MarmotApp


iOS: TestFlight Ready · Android: Libraries Supported · Host: Libraries Supported

Run any large language model locally and privately.

About

MarmotApp is a cutting-edge application that enables users to run large language models locally on their devices, ensuring complete privacy and offline functionality. Our solution brings powerful AI capabilities to your fingertips without compromising data security.

Preview

🚀 Quick Start

📱 iOS - Ready for Testing

Our iOS app is available through TestFlight for beta testing.

To join the beta:

  1. Fill out our Beta Tester Form.

  2. We'll send a TestFlight invitation to your Apple ID email.

  3. Install via TestFlight and start using locally!

Note: No App Store download required - direct TestFlight access.

The source code for building the iOS application can be found in the ios directory.

🤖 Android - Libraries Available

We currently provide pre-built executables and the corresponding libraries for Android in the android directory. A full Android app is coming soon.

Setup via ADB

Push the executable files and the corresponding libraries, together with a model file, to the Android device:

```shell
adb push android /data/local/tmp
adb push ggml-model-llama-2-7b-chat-q4_0.gguf /data/local/tmp
```

Connect to the device shell:

```shell
adb shell
```

Current Features

Text generation

```shell
cd /data/local/tmp
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -p "I believe the meaning of life is" -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0
```

Parameters:

  1. -m: path to the model file
  2. -p: the prompt; if the prompt contains special characters, use -f to point to a file containing the prompt instead
  3. -n: the number of tokens to generate
  4. -t: the number of computing threads
  5. -am: the available memory size
  6. -tp: the number of I/O threads
  7. -c: the context size
  8. -ngl: the number of layers computed on the GPU (must be set to 0; prefetching on the GPU is not supported yet)
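For prompts containing quotes or other characters the shell would mangle, the -f flag described above can read the prompt from a file. A minimal sketch, assuming the same binary and model paths as the example above (the guard makes the snippet a no-op when the binary is not present):

```shell
# Write the prompt to a file so the shell never has to escape it.
printf '%s\n' 'She asked: "what is the meaning of life?"' > prompt.txt

# Same invocation as above, but with -f instead of -p.
if [ -x android/bin/llama-cli-prefetch ]; then
  LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-cli-prefetch \
    -m ggml-model-llama-2-7b-chat-q4_0.gguf -f prompt.txt \
    -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0
fi
```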

Chatting

```shell
cd /data/local/tmp
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -p "You are a helpful assistant" -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0 --cnv
```

Parameters:

  1. -p: the system prompt; if it contains special characters, use -spf to point to a file containing the system prompt instead
  2. --cnv: enable conversation mode

The other parameters are the same as above.
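Likewise, a long or multi-line system prompt can be supplied through -spf. A small sketch under the same assumptions as above (binary and model paths as in the earlier example, guarded so it degrades gracefully):

```shell
# Store the system prompt in a file and pass it with -spf.
printf '%s\n' 'You are a helpful assistant. Answer concisely.' > system_prompt.txt

if [ -x android/bin/llama-cli-prefetch ]; then
  LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-cli-prefetch \
    -m ggml-model-llama-2-7b-chat-q4_0.gguf -spf system_prompt.txt \
    -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0 --cnv
fi
```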

Speed Benchmarking

```shell
cd /data/local/tmp
LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-bench -m ggml-model-llama-2-7b-chat-q4_0.gguf -p 16 -n 16 -t 4 -am 2 -tp 1 -ngl 0
```

Parameters:

  1. -p: the prompt length (number of tokens) for benchmarking
  2. -n: the generation length (number of tokens) for benchmarking

The other parameters are the same as above.
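To see how throughput scales with sequence length, the benchmark can be run over several -p/-n values. A hypothetical sweep (flags taken from the example above; the guard keeps it a no-op when the binary is absent):

```shell
# Sweep matched prompt/generation lengths through llama-bench.
for len in 16 32 64 128; do
  echo "benchmark: prompt=$len generate=$len"
  if [ -x android/bin/llama-bench ]; then
    LD_LIBRARY_PATH=android/lib/ ./android/bin/llama-bench \
      -m ggml-model-llama-2-7b-chat-q4_0.gguf \
      -p "$len" -n "$len" -t 4 -am 2 -tp 1 -ngl 0
  fi
done
```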

🖥️ Desktop - Executable Ready

We also provide executable files and the corresponding libraries in the host directory to serve LLMs directly on your host machine.

Current Features

Text generation

```shell
LD_LIBRARY_PATH=host/lib/ ./host/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -p "I believe the meaning of life is" -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0
```

Chatting

```shell
LD_LIBRARY_PATH=host/lib/ ./host/bin/llama-cli-prefetch -m ggml-model-llama-2-7b-chat-q4_0.gguf -p "You are a helpful assistant" -n 128 -t 4 -am 2 -tp 1 -c 512 -ngl 0 --cnv
```

Speed Benchmarking

```shell
LD_LIBRARY_PATH=host/lib/ ./host/bin/llama-bench -m ggml-model-llama-2-7b-chat-q4_0.gguf -p 16 -n 16 -t 4 -am 2 -tp 1 -ngl 0
```
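When running the host commands repeatedly, a small wrapper keeps the fixed flags in one place and forwards only the per-run ones. A sketch; the function name run_host_bench is ours, not part of the release:

```shell
# Hypothetical convenience wrapper: fixed flags live here,
# per-run flags (e.g. -p/-n) are forwarded via "$@".
run_host_bench() {
  LD_LIBRARY_PATH=host/lib/ ./host/bin/llama-bench \
    -m ggml-model-llama-2-7b-chat-q4_0.gguf \
    -t 4 -am 2 -tp 1 -ngl 0 "$@"
}

# Only invoke it if the binary is actually present.
if [ -x host/bin/llama-bench ]; then
  run_host_bench -p 16 -n 16
fi
```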

Todo List

  • Release the Android application.
  • Support prefetching on GPU:
    • Mali GPU
    • Metal GPU
  • Support chat history management.
  • Support more downstream tasks.

Feedback & Issue Reporting

We value your feedback and are committed to improving MarmotApp. If you encounter any issues or have suggestions, please report them through our GitHub Issues.

Citations

Please consider citing our project if you find it useful:

```bibtex
@software{marmotapp,
    author = {{MarmotTech}},
    title = {{MarmotApp}},
    url = {https://github.com/MarmotTech/MarmotApp},
    year = {2025}
}
```

The underlying techniques of MarmotApp include:

```bibtex
@inproceedings{euromlsys-flexinfer,
    author       = {Hongchao Du and
                    Shangyu Wu and
                    Arina Kharlamova and
                    Nan Guan and
                    Chun Jason Xue},
    editor       = {Eiko Yoneki and
                    Amir H. Payberah},
    title        = {FlexInfer: Breaking Memory Constraint via Flexible and Efficient Offloading
                    for On-Device {LLM} Inference},
    booktitle    = {Proceedings of the 5th Workshop on Machine Learning and Systems, EuroMLSys
                    2025, World Trade Center, Rotterdam, The Netherlands, 30 March 2025-
                    3 April 2025},
    pages        = {56--65},
    publisher    = {{ACM}},
    year         = {2025},
    url          = {https://doi.org/10.1145/3721146.3721961},
    doi          = {10.1145/3721146.3721961},
}
```
