Interactive CLI for managing llama.cpp on Vast.ai GPU instances via SSH.
Build, download models, start servers, and expose them locally — all from one terminal.
# Install
pip install -e .
# Configure Vast.ai API key (one of these)
export VASTAI_API_KEY="your_key_here"
# Or create ~/.llama-cli.json:
# {"api_key": "your_key_here", "ssh_key_path": "~/.ssh/id_ed25519"}
# Run
llama

The CLI will:
- Fetch your Vast.ai instances
- Let you pick one
- Connect via SSH
- Show a menu based on what's already set up on the instance
- One-command setup — Build llama.cpp, download models (with mmproj), start server
- Interactive shell — Full PTY SSH session, run any command with your own flags
- Background processes — Run servers detached like tmux, attach/detach anytime
- SSH tunneling — Expose remote ports locally, access llama-server from your browser
- Vision model support — Auto-detect and manage mmproj files for multimodal models
- Real-time monitoring — GPU/CPU/RAM dashboard

| Command | Description |
|---|---|
| `/start [model] [--port N] [--mmproj file]` | Start llama-server (auto-detects model, port, mmproj) |
| `/kill` | Kill llama-server on remote |
| `/shell` | Interactive SSH shell: run any command with your own flags |
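
To illustrate the `/start [model] [--port N] [--mmproj file]` syntax, here is a minimal sketch of how such a line could be parsed with the standard library. The function name `parse_start` is hypothetical and this is not necessarily how the CLI parses its commands.

```python
import argparse
import shlex


def parse_start(line: str) -> argparse.Namespace:
    """Parse a '/start [model] [--port N] [--mmproj file]' line (illustrative only)."""
    parser = argparse.ArgumentParser(prog="/start", add_help=False)
    parser.add_argument("model", nargs="?", default=None)  # optional positional
    parser.add_argument("--port", type=int, default=None)
    parser.add_argument("--mmproj", default=None)

    # shlex handles quoted paths that contain spaces.
    tokens = shlex.split(line)
    if not tokens or tokens[0] != "/start":
        raise ValueError("not a /start command")
    return parser.parse_args(tokens[1:])


args = parse_start("/start model.gguf --mmproj /workspace/mmproj.gguf --port 9000")
print(args.model, args.port, args.mmproj)
```

Every value left as `None` is a candidate for auto-detection.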

| Command | Description |
|---|---|
| `/bg-proc <command>` | Start any command in background (detached) |
| `/bg-list` | List running background processes with status |
| `/bg-attach <pid>` | Attach to a process's output (Ctrl+C to detach, process keeps running) |
| `/bg-stop <pid>` | Stop a background process |
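
One common way to detach a remote command so it survives the SSH session is `nohup` with output redirected to a log file. The sketch below builds such a shell line; it assumes this approach, and the helper name and log path are hypothetical. The CLI's actual mechanism may differ.

```python
import shlex


def build_background_command(command: str, log_path: str) -> str:
    """Return a shell line that runs `command` detached and prints its PID."""
    inner = shlex.quote(command)  # keep the user's command as one sh -c argument
    log = shlex.quote(log_path)
    # nohup ignores SIGHUP, stdin is closed, stdout/stderr go to the log file,
    # and $! expands to the PID of the most recent background job.
    return f"nohup sh -c {inner} > {log} 2>&1 < /dev/null & echo $!"


print(build_background_command(
    "llama-server -m model.gguf --port 8080 -ngl 99",
    "/tmp/bg.log",  # hypothetical log location
))
```

Attaching to the process could then amount to following that log file, and stopping it to sending a signal to the printed PID.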

| Command | Description |
|---|---|
| `/tunnel <remote_port> [local_port]` | Create SSH tunnel (defaults to same port on both sides) |
| `/tunnels` | List active tunnels |
| `/close <port>` | Close a specific tunnel |
| `/close --all` | Close all tunnels |

| Command | Description |
|---|---|
| `/build [version]` | Build/update llama.cpp from source |
| `/download <url> [name]` | Download a model from HuggingFace |
| `/models` | List all .gguf files on remote (models + mmproj) |
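
When `[name]` is omitted from `/download`, a sensible default is the last path segment of the URL. The sketch below shows one way to derive it; `default_filename` is a hypothetical helper, not necessarily what the CLI does.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def default_filename(url: str) -> str:
    """Return the last path segment of a URL, ignoring query and fragment."""
    path = urlparse(url).path
    name = PurePosixPath(unquote(path)).name
    if not name:
        raise ValueError(f"cannot derive a filename from {url!r}")
    return name


print(default_filename("https://huggingface.co/.../model.gguf?download=true"))
```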

| Command | Description |
|---|---|
| `/status` | GPU, server, tunnels, models overview |
| `/monitor` | Real-time GPU/CPU/RAM dashboard |
| `/logs [n]` | Tail server log (default 20 lines) |
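
GPU figures for a dashboard like `/monitor` can be read from `nvidia-smi` in CSV mode. The sketch below parses that output; it assumes the query shown in `QUERY` is run on the remote, and the sample line is illustrative, not captured from a real instance.

```python
from dataclasses import dataclass

# Command assumed to be run on the remote; these are standard nvidia-smi options.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


@dataclass
class GpuSample:
    name: str
    util_percent: int
    mem_used_mib: int
    mem_total_mib: int


def parse_gpu_csv(output: str) -> list[GpuSample]:
    """Parse one CSV line per GPU into GpuSample records."""
    samples = []
    for line in output.strip().splitlines():
        name, util, used, total = [field.strip() for field in line.split(",")]
        samples.append(GpuSample(name, int(util), int(used), int(total)))
    return samples


# Illustrative sample line only.
print(parse_gpu_csv("NVIDIA GeForce RTX 3090, 42, 10240, 24576\n"))
```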

| Command | Description |
|---|---|
| `/switch` | Switch to a different Vast.ai instance |
| `/test` | Test tunnel connectivity |
| `/clear` | Clear terminal |
| `/help` | Show all commands |
| `/exit` | Disconnect and quit |
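
A basic connectivity check of the kind `/test` performs can be done by opening a TCP connection to the local end of the tunnel. This is a minimal sketch under that assumption; `port_is_open` is a hypothetical helper and the real command may check more, such as an HTTP response from llama-server.

```python
import socket


def port_is_open(port: int, host: str = "127.0.0.1", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


print(port_is_open(8080))
```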
$ llama
# Select instance → Setup wizard builds llama.cpp and downloads model
# Then start server with /start or /bg-proc
$ llama
# Open interactive shell
root@34186938 [RTX 3090] /workspace> /shell
# Run llama-server with YOUR flags, live output
$ llama-server -m /workspace/model.gguf --port 8080 -ngl 99 -c 8192 --host 0.0.0.0
# Ctrl+C to stop the server
$ exit
# Back in the REPL, create tunnel
root@34186938 [RTX 3090] /workspace> /tunnel 8080
# Open http://localhost:8080 in your browser
$ llama
# Start server in background — it keeps running even if you close the CLI
root@34186938 [RTX 3090] /workspace> /bg-proc llama-server -m model.gguf --port 8080 -ngl 99
# Started (PID 1234). Go do other things.
# Check on it later
root@34186938 [RTX 3090] /workspace> /bg-list
# Attach to see live output (Ctrl+C to detach without stopping)
root@34186938 [RTX 3090] /workspace> /bg-attach 1234
# Stop it when done
root@34186938 [RTX 3090] /workspace> /bg-stop 1234
The setup wizard asks about vision support before downloading:
? Model URL: https://huggingface.co/.../model.gguf
? Filename: (model.gguf)
? Does this model have vision (multimodal)? Yes
? mmproj download URL: https://huggingface.co/.../mmproj.gguf
? Build llama.cpp and download model.gguf + mmproj.gguf?
Step 1/3: Building llama.cpp...
Step 2/3: Downloading model...
Step 3/3: Downloading mmproj...
/start auto-detects mmproj files and asks if you want to use them:
/start model.gguf
# "Found mmproj file: mmproj.gguf. Use it?" → Yes
# Starts: llama-server -m model.gguf --mmproj mmproj.gguf --port 8080
Or pass it manually:
/start model.gguf --mmproj /workspace/mmproj.gguf --port 9000
~/.llama-cli.json:
{
  "api_key": "your_vastai_api_key",
  "ssh_key_path": "/home/user/.ssh/id_ed25519",
  "default_instance": 12345,
  "recent_models": [
    {
      "url": "https://huggingface.co/.../model.gguf",
      "filename": "model.gguf"
    }
  ]
}

- Python 3.10+
- SSH key configured on Vast.ai
- Vast.ai API key
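
The API key can come from either `VASTAI_API_KEY` or `~/.llama-cli.json`. A minimal sketch of resolving both sources is shown here; letting the environment variable take precedence is an assumption, since this README does not state the order, and `load_settings` is a hypothetical name.

```python
import json
import os
from pathlib import Path


def load_settings(env=None, config_path=None) -> dict:
    """Merge ~/.llama-cli.json with VASTAI_API_KEY (env wins; assumed order)."""
    env = os.environ if env is None else env
    if config_path is None:
        path = Path("~/.llama-cli.json").expanduser()
    else:
        path = Path(config_path)

    settings = {}
    if path.is_file():
        settings = json.loads(path.read_text(encoding="utf-8"))

    if env.get("VASTAI_API_KEY"):
        settings["api_key"] = env["VASTAI_API_KEY"]

    # Expand "~" so the key path can be passed straight to an SSH client.
    if "ssh_key_path" in settings:
        settings["ssh_key_path"] = str(Path(settings["ssh_key_path"]).expanduser())
    return settings


print(sorted(load_settings().keys()))
```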
MIT