MarmotApp is a cutting-edge application that enables users to run large language models locally on their devices, ensuring complete privacy and offline functionality. Our solution brings powerful AI capabilities to your fingertips without compromising data security.
Our iOS app is available through TestFlight for beta testing.
To join the beta:
- Fill out our Beta Tester Form.
- We'll send a TestFlight invitation to your Apple ID email.
- Install via TestFlight and start using locally!
Note: No App Store download required - direct TestFlight access.
The source code for building the iOS application can be found in the `ios` directory.
We currently provide pre-built executables and the corresponding libraries for Android in the `android` directory. A full Android app is coming soon.
Push the executables and corresponding libraries to the Android device:

```shell
adb push android /data/local/tmp
adb push ggml-model-llama-2-7b-chat-q4_0.gguf /data/local/tmp
```

Connect to the device shell:

```shell
adb shell
```

Text generation
```shell
cd /data/local/tmp
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -p "I believe the meaning of life is" -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0
```

Parameters:
- `-m`: model path
- `-p`: prompt; if the prompt contains special characters, use `-f` to indicate a file that contains the prompt
- `-n`: the number of tokens to generate
- `-t`: the number of computing threads
- `-am`: the available memory size
- `-tp`: the number of I/O threads
- `-c`: the context size
- `-ngl`: the number of layers computed on the GPU (must be set to 0; prefetching on GPU is not supported so far)
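When the prompt contains quotes or other shell metacharacters, writing it to a file and passing that file with `-f` avoids shell-escaping problems. A minimal sketch, assuming the model file is already in the current directory (the file name `prompt.txt` is our own choice, not part of the project):

```shell
# Write a prompt containing special characters to a file (name is hypothetical)
printf '%s\n' 'Explain what "$HOME" and "&&" mean in a shell' > prompt.txt
# Pass the file with -f instead of quoting the prompt inline with -p
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -f prompt.txt -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0
```

Single quotes in the `printf` call keep `$HOME` and the double quotes literal, so the model sees the prompt exactly as written.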
Chatting
```shell
cd /data/local/tmp
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -p "You are a helpful assistant" -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0 --cnv
```

Parameters:
- `-p`: system prompt; if the system prompt contains special characters, use `-spf` to indicate a file that contains the system prompt
- `--cnv`: conversation mode
- other parameters are the same as above
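As with `-f` for plain prompts, a system prompt with special characters can be supplied from a file via `-spf`. A sketch under the same assumptions (the file name `system_prompt.txt` is hypothetical):

```shell
# Write a system prompt with special characters to a file (name is hypothetical)
printf '%s\n' 'You are a helpful assistant. Quote sources as "[n]".' > system_prompt.txt
# Pass it with -spf instead of -p, then chat interactively with --cnv
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -spf system_prompt.txt -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0 --cnv
```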
Speed Benchmarking
```shell
cd /data/local/tmp
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-bench -m ggml-model-llama-2-7b-chat-q4_0.gguf -p 16 -n 16 -t 4 -am 2 -tp 1 -ngl 0
```

Parameters:
- `-p`: the prompt length for benchmarking
- `-n`: the generation length for benchmarking
- other parameters are the same as above
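Because `-t` (computing threads) directly affects throughput, one practical use of `llama-bench` is sweeping thread counts to find the fastest setting for a given device. The loop below is our own sketch; the flags are the ones documented above:

```shell
# Hypothetical sweep: rerun the benchmark with different computing-thread counts
for t in 1 2 4 8; do
  echo "== threads: $t =="
  LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-bench -m ggml-model-llama-2-7b-chat-q4_0.gguf -p 16 -n 16 -t "$t" -am 2 -tp 1 -ngl 0
done
```

On most phones the best value is at or below the number of big cores, so a sweep like this is worth the few minutes it takes.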
We also provide executables and corresponding libraries in the `host` directory to serve LLMs directly on your host machine.
Text generation
```shell
LD_LIBRARY_PATH=host/lib/ ./host/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -p "I believe the meaning of life is" -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0
```

Chatting
```shell
LD_LIBRARY_PATH=host/lib/ ./host/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -p "You are a helpful assistant" -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0 --cnv
```

Speed Benchmarking
```shell
LD_LIBRARY_PATH=host/lib/ ./host/bin/llama-bench -m ggml-model-llama-2-7b-chat-q4_0.gguf -p 16 -n 16 -t 4 -am 2 -tp 1 -ngl 0
```

- Release the Android application.
- Supporting prefetching techniques on GPUs
  - Mali GPU
  - Metal GPU
- Supporting chat history management
- Supporting more downstream tasks
We value your feedback and are committed to improving MarmotApp. If you encounter any issues or have suggestions, please report them through our GitHub Issues.
Please consider citing our project if you find it useful:
@software{marmotapp,
author = {{MarmotTech}},
title = {{MarmotApp}},
url = {https://github.com/MarmotTech/MarmotApp},
year = {2025}
}

The underlying techniques of MarmotApp include:
@inproceedings{euromlsys-flexinfer,
author = {Hongchao Du and
Shangyu Wu and
Arina Kharlamova and
Nan Guan and
Chun Jason Xue},
editor = {Eiko Yoneki and
Amir H. Payberah},
title = {FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading
for On-Device {LLM} Inference},
booktitle = {Proceedings of the 5th Workshop on Machine Learning and Systems, EuroMLSys
2025, World Trade Center, Rotterdam, The Netherlands, 30 March 2025-
3 April 2025},
pages = {56--65},
publisher = {{ACM}},
year = {2025},
url = {https://doi.org/10.1145/3721146.3721961},
doi = {10.1145/3721146.3721961},
}