Interactive UI for exploring next-token probabilities from local GGUF models via llama-cpp-python. It visualizes candidate tokens and their probabilities, and exposes sampling controls.
Inspiration: https://www.youtube.com/watch?v=vrO8tZ0hHGk
Vibe-coded using OpenCode, Gemini 3 Pro, and MiniMax M2.1.
- Next-token probability explorer with color-coded confidence
- Starter texts dropdown + clear button
- Auto-inference mode (weighted random selection with visual feedback)
- Sampling controls: temperature, top-k, top-p, repetition penalty
- Model manager for local GGUF files
- Remote model browser and downloader (Hugging Face Hub)
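
The sampling controls and the auto-inference mode listed above can be illustrated with a small, library-free sketch. It is not the code in `app/llm.py`; the function names, the order in which the controls are applied, and the repetition-penalty convention (divide positive scores, multiply negative ones) are assumptions chosen for illustration, and llama-cpp-python may apply them differently.

```python
import math
import random


def next_token_distribution(logits, temperature=1.0, top_k=0, top_p=1.0,
                            repetition_penalty=1.0, previous_tokens=()):
    """Return {token: probability} after applying the sampling controls.

    `logits` maps candidate tokens to raw scores. Hypothetical helper,
    not part of llama-cpp-python.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    adjusted = dict(logits)

    # Repetition penalty (assumed convention): lower the score of tokens
    # that already appeared in the text.
    for tok in set(previous_tokens):
        if tok in adjusted:
            score = adjusted[tok]
            if score > 0:
                adjusted[tok] = score / repetition_penalty
            else:
                adjusted[tok] = score * repetition_penalty

    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {tok: score / temperature for tok, score in adjusted.items()}

    # Softmax, shifted by the maximum for numerical stability.
    peak = max(scaled.values())
    exps = {tok: math.exp(score - peak) for tok, score in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize so the surviving candidates sum to 1.
    norm = sum(prob for _, prob in kept)
    return {tok: prob / norm for tok, prob in kept}


def sample_token(distribution, rng=random):
    """Weighted random selection, as in the auto-inference mode."""
    tokens = list(distribution)
    weights = [distribution[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Calling `sample_token(next_token_distribution(logits, temperature=0.8, top_k=40, top_p=0.95))` in a loop, appending the chosen token each time, mirrors what auto-inference does conceptually.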
- Python 3.10+
- macOS with Metal GPU offload supported
- Disk space for GGUF models
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload
Open: http://127.0.0.1:8000
Default model downloads automatically on first run.
- GET /health: Health check
- POST /next-tokens: Get next token candidates
- GET /models: List local models
- GET /models/lookup?repo_id=...: Search Hugging Face
- POST /models/download: Download model
- POST /models/switch: Switch model
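
As a rough illustration of calling the GET endpoints from a script, the sketch below builds the URLs and decodes the responses using only the standard library. It assumes the server is running at the default address and that the endpoints return JSON; the helper names and the example `repo_id` are hypothetical. The POST endpoints are left out because their request bodies are not documented here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Default address used by `uvicorn app.main:app --reload`.
BASE_URL = "http://127.0.0.1:8000"


def endpoint_url(path, **query):
    """Build a full URL for an endpoint, URL-encoding any query parameters."""
    url = BASE_URL + path
    if query:
        url += "?" + urlencode(query)
    return url


def get_json(path, **query):
    """Send a GET request and decode the body (assumes a JSON response)."""
    with urlopen(endpoint_url(path, **query), timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# Example calls, to be run while the server is up:
#   get_json("/health")
#   get_json("/models")
#   get_json("/models/lookup", repo_id="some-org/some-model-GGUF")  # hypothetical repo id
```

Keeping URL construction in its own function makes it possible to check the encoded query string without a running server.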
- app/main.py: FastAPI app
- app/llm.py: LLM engine
- app/models_manager.py: Model handling
- app/static/: UI (HTML/CSS/JS)
MIT License.
