MarmotApp is a cutting-edge application that enables users to run large language models locally on their devices, ensuring complete privacy and offline functionality. Our solution brings powerful AI capabilities to your fingertips without compromising data security.
Our iOS app is available through TestFlight for beta testing.
To join the beta:
- Fill out our Beta Tester Form.
- We'll send a TestFlight invitation to your Apple ID email.
- Install via TestFlight and start using locally!
Note: No App Store download required - direct TestFlight access.
The source code for building the iOS application can be found in the `ios` directory.
We currently provide pre-built executables and the corresponding libraries for Android in the `android` directory. A full Android app is coming soon.
Push the executables and corresponding libraries to the Android device:

```shell
adb push android /data/local/tmp
adb push ggml-model-llama-2-7b-chat-q4_0.gguf /data/local/tmp
```

Connect to the device shell:

```shell
adb shell
```

Text generation
```shell
cd /data/local/tmp
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -p "I believe the meaning of life is" -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0
```

Parameters:
- `-m`: model path
- `-p`: prompt; if the prompt contains special characters, use `-f` to indicate a file that contains the prompt
- `-n`: the number of tokens to generate
- `-t`: the number of computing threads
- `-am`: the available memory size
- `-tp`: the number of I/O threads
- `-c`: the context size
- `-ngl`: the number of layers computed on the GPU (must be set to 0; prefetching on GPU is not supported so far)
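When the prompt contains quotes or other shell metacharacters, writing it to a file and passing that file with `-f` avoids shell-escaping problems. A minimal sketch, assuming the model file is already in the current directory (the file name `prompt.txt` is our own choice, not part of the project):

```shell
# Write a prompt containing special characters to a file (name is hypothetical)
printf '%s\n' 'Explain what "$HOME" and "&&" mean in a shell' > prompt.txt
# Pass the file with -f instead of quoting the prompt inline with -p
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -f prompt.txt -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0
```

Single quotes in the `printf` call keep `$HOME` and the double quotes literal, so the model sees the prompt exactly as written.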
Chatting
```shell
cd /data/local/tmp
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -p "You are a helpful assistant" -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0 --cnv
```

Parameters:
- `-p`: system prompt; if the system prompt contains special characters, use `-spf` to indicate a file that contains the system prompt
- `--cnv`: conversation mode
- other parameters are the same as above
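As with `-f` for plain prompts, a system prompt with special characters can be supplied from a file via `-spf`. A sketch under the same assumptions (the file name `system_prompt.txt` is hypothetical):

```shell
# Write a system prompt with special characters to a file (name is hypothetical)
printf '%s\n' 'You are a helpful assistant. Quote sources as "[n]".' > system_prompt.txt
# Pass it with -spf instead of -p, then chat interactively with --cnv
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -spf system_prompt.txt -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0 --cnv
```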
Speed Benchmarking
```shell
cd /data/local/tmp
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-bench -m ggml-model-llama-2-7b-chat-q4_0.gguf -p 16 -n 16 -t 4 -am 2 -tp 1 -ngl 0
```

Parameters:
- `-p`: the prompt length for benchmarking
- `-n`: the generation length for benchmarking
- other parameters are the same as above
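Because `-t` (computing threads) directly affects throughput, one practical use of `llama-bench` is sweeping thread counts to find the fastest setting for a given device. The loop below is our own sketch; the flags are the ones documented above:

```shell
# Hypothetical sweep: rerun the benchmark with different computing-thread counts
for t in 1 2 4 8; do
  echo "== threads: $t =="
  LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-bench -m ggml-model-llama-2-7b-chat-q4_0.gguf -p 16 -n 16 -t "$t" -am 2 -tp 1 -ngl 0
done
```

On most phones the best value is at or below the number of big cores, so a sweep like this is worth the few minutes it takes.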
We also provide executables and corresponding libraries in the `host` directory to serve LLMs directly on your host machine.
Text generation
```shell
LD_LIBRARY_PATH=host/lib/ ./host/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -p "I believe the meaning of life is" -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0
```

Chatting
```shell
LD_LIBRARY_PATH=host/lib/ ./host/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -p "You are a helpful assistant" -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0 --cnv
```

Speed Benchmarking
```shell
LD_LIBRARY_PATH=host/lib/ ./host/bin/llama-bench -m ggml-model-llama-2-7b-chat-q4_0.gguf -p 16 -n 16 -t 4 -am 2 -tp 1 -ngl 0
```

- Release the Android application.
- Supporting prefetching techniques on GPUs
  - Mali GPU
  - Metal GPU
- Supporting chat history management
- Supporting more downstream tasks
We value your feedback and are committed to improving MarmotApp. If you encounter any issues or have suggestions, please report them through our GitHub Issues.
Please consider citing our project if you find it useful:
@software{marmotapp,
author = {{MarmotTech}},
title = {{MarmotApp}},
url = {https://github.com/MarmotTech/MarmotApp},
year = {2025}
}

The underlying techniques of MarmotApp include:
@inproceedings{euromlsys-flexinfer,
author = {Hongchao Du and
Shangyu Wu and
Arina Kharlamova and
Nan Guan and
Chun Jason Xue},
editor = {Eiko Yoneki and
Amir H. Payberah},
title = {FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading
for On-Device {LLM} Inference},
booktitle = {Proceedings of the 5th Workshop on Machine Learning and Systems, EuroMLSys
2025, World Trade Center, Rotterdam, The Netherlands, 30 March 2025-
3 April 2025},
pages = {56--65},
publisher = {{ACM}},
year = {2025},
url = {https://doi.org/10.1145/3721146.3721961},
doi = {10.1145/3721146.3721961},
}