34,507 semantic chunks · 13 source repositories · 0% hallucination rate · 100% local
Ask a question about Vector's cliff sensor.
Get back the exact C++ function, the TRM page, and the cross-repo call chain.
Every citation verified against real files. Nothing fabricated.
Vectorax is a fully local RAG (Retrieval-Augmented Generation) system built to let AI agents and LLMs work intelligently with the entire Anki Vector robot codebase — firmware, SDKs, cloud services, community forks, and the 565-page Technical Reference Manual (TRM).
It has two components that work together:
| Component | Role |
|---|---|
| VaultForge | Parsing pipeline — ingests 13 repos + TRM PDF → 34,507 semantic chunks in ChromaDB |
| VectorMap | Agentic operations center — LangGraph RAG pipeline + 5-page dashboard |
Everything runs on your local machine. No OpenAI. No cloud APIs. No data exfiltration.
┌─────────────────────────────────────────────────────────────────────────┐
│ VECTORAX │
│ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ VaultForge Pipeline │ │
│ │ │ │
│ │ 13 Git Repos ──┐ │ │
│ │ (C++/Go/Python) │ │ │
│ │ ▼ │ │
│ │ VectorTRM.pdf ──► repo_parser → chunker → annotator ──────────►│ │
│ │ (565 pages) └─► trm_scanner ──► db_writer │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ChromaDB chroma_db_v2/ │ │
│ │ ┌───────────────────────────────┐ │ │
│ │ │ repo_code · 33,773 chunks │ │ │
│ │ │ trm_prose · 230 chunks │ │ │
│ │ │ trm_code · 250 chunks │ │ │
│ │ │ trm_tables · 180 chunks │ │ │
│ │ │ trm_notes · 74 chunks │ │ │
│ │ └───────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ VectorMap — Agentic Operations Center │ │
│ │ │ │
│ │ FastAPI ──► LangGraph State Machine │ │
│ │ │ │ │
│ │ ├─ retrieve() → ChromaDB multi-collection search │ │
│ │ ├─ generate() → Ollama qwen2.5-coder:7b │ │
│ │ └─ validate() → WikiLink citation check │ │
│ │ │ │ │
│ │ └── retry on failure (logged to ledger) │ │
│ │ │ │
│ │ 5-Page Dashboard · 35+ REST endpoints · SQLite history │ │
│ └──────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
Every LLM response passes through a validation node before it reaches the user:
LLM response
│
▼
① Has ## Stack Trace & Sources section? ──── NO ──► retry
│ YES
▼
② Every citation is a [[WikiLink]]? ──────── NO ──► retry
│ YES
▼
③ Every linked file exists in retrieved context? ── NO ──► retry + log to ledger
│ YES
▼
Response delivered ✓
Violations are captured in the hallucination ledger (SQLite) with the raw LLM output, violation type, and corrected response — visible in the Agentic Forge dashboard page.
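The three checks above can be sketched in a few lines of Python. This is a minimal illustration, not the code in `langgraph_agent.py`: the function name, the violation labels, and the rule that every non-empty line in the sources section must carry a WikiLink are assumptions made for the example.

```python
import re

# Matches [[target]], [[target|alias]] and [[target#heading]]; group 1 is the target.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")
SOURCES_HEADING = "## Stack Trace & Sources"


def validate_response(text: str, retrieved_files: set[str]) -> tuple[bool, str]:
    """Return (ok, violation_type); violation_type is "" when ok.

    The violation labels are hypothetical names chosen for this sketch.
    """
    # Check 1: the sources section must be present.
    if SOURCES_HEADING not in text:
        return False, "missing_sources_section"

    sources = text.split(SOURCES_HEADING, 1)[1]

    # Check 2: every non-empty line of the section must contain a [[WikiLink]].
    lines = [ln.strip() for ln in sources.splitlines() if ln.strip()]
    if not lines or any(not WIKILINK.search(ln) for ln in lines):
        return False, "non_wikilink_citation"

    # Check 3: every linked file must be part of the retrieved context.
    linked = {m.group(1).strip() for m in WIKILINK.finditer(sources)}
    if not linked <= retrieved_files:
        return False, "file_not_in_context"

    return True, ""
```

A caller would retry generation while the first element is `False`, up to the configured number of attempts, and record the violation label alongside the raw output.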
Vectorax ingests and cross-links 13 open-source Vector robot repositories:
| Repository | Language | Role |
|---|---|---|
| digital-dream-labs/vector | C++ | Core robot firmware — hardware-closest code |
| fforchino/vector-python-sdk | Python | Primary Python SDK — LLM-facing public API |
| fforchino/vector-go-sdk | Go | Go language bindings |
| digital-dream-labs/vector-cloud | Go | Cloud gateway & authentication (Protobuf) |
| kercre123/wire-pod | Go | Self-hosted cloud replacement |
| digital-dream-labs/chipper | Go | Central gRPC server — voice + intent processing |
| digital-dream-labs/vector-bluetooth | Mixed | BLE setup & onboarding |
| digital-dream-labs/vector-web-setup | JavaScript | Web configuration UI |
| fforchino/vectorx | Mixed | Community extended Vector |
| fforchino/vectorx-voiceserver | Go | Voice services for VectorX |
| digital-dream-labs/escape-pod-extension | TypeScript | VS Code extension for Vector dev |
| digital-dream-labs/dev-docs | Markdown | Official developer documentation |
| digital-dream-labs/hugh | Go | Face recognition service |
All repos are excluded from this repository due to size. Re-clone with:
bash VaultForge/sources/clone_repos.sh
┌──────────────────────────────────────────────────────────┐
│ ① COMMAND CENTER │ ② AGENTIC FORGE │ ③ OBSERVATORY │
├──────────────────────────────────────────────────────────┤
│ ④ VAULT MANAGEMENT │ ⑤ INTELLIGENCE TOOLS │
└──────────────────────────────────────────────────────────┘
RAG chat interface with real-time source retrieval scores, live log stream, conversation memory (configurable turn buffer), and one-click Obsidian export.
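A configurable turn buffer of this kind can be modelled with a bounded deque. The sketch below is illustrative only: the class name `TurnBuffer` and the prompt layout are assumptions, and the default of 4 turns mirrors the `memory_turns` setting.

```python
from collections import deque


class TurnBuffer:
    """Keep only the most recent `max_turns` question/answer pairs."""

    def __init__(self, max_turns: int = 4) -> None:
        # A deque with maxlen silently drops the oldest turn when full.
        self._turns: deque[tuple[str, str]] = deque(maxlen=max_turns)

    def add(self, question: str, answer: str) -> None:
        self._turns.append((question, answer))

    def clear(self) -> None:
        self._turns.clear()

    def as_prompt(self) -> str:
        """Render the buffered turns as plain text to prepend to the next query."""
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self._turns)
```

Bounding the history by turns rather than characters keeps the memory cost predictable, although long answers may still need trimming against the context budget.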
Live LangGraph node highlighter showing the active pipeline stage in real time. Query template library, hallucination ledger browser, A/B model benchmarking, and manual context injection zone.
Interactive 3D PCA embedding map of all 34,507 vectors — rotatable, filterable by repository and file type. Spotlight search highlights nearest neighbours. Chunk inspector shows size distribution and per-file retrieval heatmap.
ChromaDB CRUD explorer (search, view, delete individual chunks, re-embed files), Obsidian sync drift monitor, autonomous backfill queue with progress tracking, and a composite Vault Health Score (0–100).
Code refactor sub-agent, interactive architecture dependency graph (vis.js), web-search grounding toggle, Wire-Pod/Vector live log sniffer, token budget deep dive, and session export to Obsidian.
VaultForge/pipeline/
├── repo_parser.py # AST + regex extraction: functions, classes, structs
├── chunker.py # Token-aware chunking (tiktoken cl100k_base)
├── annotator.py # LLM annotation: every function, class, file, repo
├── import_resolver.py # Real import graph (not word-overlap guesses)
├── similarity_detector.py # MinHash LSH — cross-repo clone detection
├── repo_git_meta.py # Git history, authors, blame data per chunk
├── trm_scanner.py # PDF parser: prose / code / tables / notes / figures
├── trm_code.py # TRM fenced code block extractor
├── trm_tables.py # TRM table → structured rows with full metadata
├── trm_notes.py # TRM developer notes/warnings (highest priority)
├── trm_figures.py # TRM diagram → PNG + LLM vision description
├── trm_crossrefs.py # TRM ↔ repo cross-reference linker
├── trm_repo_linker.py # Hardware component → source file mapper
├── vault_generator.py # Obsidian markdown vault generator
└── db_writer.py # ChromaDB writer — nomic-embed-text 768D
Chunk metadata (25+ fields per chunk): source file, repo, language, function name, class name, git commit, author, token count, TRM cross-references, similarity cluster, import dependencies, hardware component tags, and more.
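The similarity cluster field comes from MinHash-based clone detection. The following is a rough, dependency-free sketch of the MinHash half of that idea, assuming whitespace tokens and 3-token shingles. It is not the code in `similarity_detector.py`, all function names are hypothetical, and the LSH banding step that avoids all-pairs comparison is omitted.

```python
import hashlib


def shingles(code: str, k: int = 3) -> set[str]:
    """Split code on whitespace and return the set of k-token windows."""
    tokens = code.split()
    if len(tokens) < k:
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash_signature(items: set[str], num_perm: int = 64) -> list[int]:
    """Return one minimum hash per salted hash function.

    Each salt stands in for one random permutation of the shingle universe.
    """
    if not items:
        raise ValueError("cannot build a signature from an empty set")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(4, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Two chunks whose estimate exceeds a chosen threshold would be candidates for the same similarity cluster. More hash functions reduce the variance of the estimate at the cost of longer signatures.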
| Tool | Version | Purpose |
|---|---|---|
| Python | 3.12+ | Backend runtime |
| Ollama | Latest | Local LLM inference |
| Git | Any | Repo cloning |
| macOS / Linux | — | Supported platforms |
git clone https://github.com/infraax/vectorax.git
cd vectorax
# Bootstrap Python environment + pull Ollama models
bash VectorMap/setup.sh
# Clone the 13 source repositories (~816 MB, a few minutes)
bash VaultForge/sources/clone_repos.sh
# Run the VaultForge pipeline (~30–60 min depending on hardware)
# Requires: ollama serve && ollama pull nomic-embed-text
python VaultForge/pipeline/db_writer.py
Pre-built database: The chroma_db_v2/ directory (420 MB) cannot be included in the repo due to GitHub's 100 MB file size limit. See VectorMap/data/DOWNLOADS.md for details.
bash VectorMap/start.sh
# → Opens dashboard at http://127.0.0.1:<port>
# Optional: override default paths
export VAULT_PATH="/path/to/your/obsidian/vault"
export CHROMA_PATH="/path/to/chroma_db_v2"
# Copy and edit the example env file
cp VectorMap/.env.example VectorMap/.env
vectorax/
│
├── .claude-project # claude-project v4 brain (registry, agents, automations)
├── .gitignore
├── CLAUDE.md # Auto-generated project brief for AI sessions
│
├── VaultForge/ # ── Parsing Pipeline ─────────────────────────
│ ├── config/
│ │ └── pipeline.yaml # Master pipeline configuration
│ ├── docs/ # Technical specs (8 documents)
│ ├── pipeline/ # 15 pipeline modules
│ ├── sources/
│ │ ├── REPOS.yaml # All 13 repo GitHub URLs
│ │ ├── clone_repos.sh # Re-clone script
│ │ └── VectorTRM.pdf # 565-page Technical Reference Manual
│ ├── tests/ # VaultForge test suite
│ └── vectormap_mcp/ # MCP server for VectorMap integration
│
└── VectorMap/ # ── Agentic Operations Center ────────────────
├── src/
│ ├── server.py # FastAPI — 35+ REST endpoints
│ ├── langgraph_agent.py # LangGraph pipeline (Retrieve → Generate → Validate)
│ ├── query_history.py # SQLite: sessions, templates, hallucination ledger
│ └── profiler.py # Structured logging + request timing
├── frontend/
│ ├── index.html # Dashboard shell
│ ├── css/style.css
│ └── js/ # 11 JS modules (one per feature domain)
├── tests/ # 86 pytest tests
├── data/
│ └── DOWNLOADS.md # ChromaDB rebuild instructions
├── setup.sh # One-command environment bootstrap
└── start.sh # Quick launch
Core endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Dashboard UI |
| GET | /status | System telemetry (CPU, RAM, models, indexing state) |
| POST | /chat | RAG query → response + sources with scores + token usage |
| GET/PUT | /api/config | Read / update AGENT_CONFIG live |
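A query against `/chat` might look like the sketch below, which uses only the standard library. The request body shape (a single `query` field) is an assumption and is not taken from `server.py`. The helper names are hypothetical, and the port is whatever `start.sh` reports.

```python
import json
import urllib.request


def build_chat_request(base_url: str, question: str) -> urllib.request.Request:
    """Build a POST /chat request without sending it."""
    # ASSUMPTION: the endpoint accepts JSON with a single "query" field.
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, question: str, timeout: float = 120.0) -> dict:
    """Send the question and decode the JSON response."""
    request = build_chat_request(base_url, question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `ask("http://127.0.0.1:<port>", "How does the cliff sensor report a drop?")` with the real port substituted should return the response, its sources with scores, and token usage. The generous timeout allows for local inference on slower hardware.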
Memory & history
| Method | Endpoint | Description |
|---|---|---|
| GET/DELETE | /api/memory | Conversation buffer read / clear |
| GET | /api/hallucinations | Hallucination ledger |
| GET/POST/DELETE | /api/templates[/{id}] | Query template CRUD |
| GET | /api/history | Query history with retrieval scores |
Vault & ChromaDB
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/vault/health | Composite health score (0–100, 5 dimensions) |
| GET | /api/vault/drift | Sync drift monitor: stale vs fresh files |
| GET | /api/vault/heatmap | Per-file retrieval frequency heatmap |
| GET | /api/chroma/search | Semantic chunk search |
| GET | /api/chroma/file | All chunks for a file |
| DELETE | /api/chroma/chunk/{id} | Delete single chunk |
| POST | /api/chroma/reindex | Re-embed a single source file |
| POST/GET/POST | /api/backfill/* | Autonomous backfill queue |
Intelligence tools
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/benchmark | A/B model comparison (tokens, latency, response) |
| POST | /api/vector_search | Semantic spotlight: highlight nearest vectors in 3D map |
| GET | /api/chunks/stats | Chunk size distribution + top files by chunk count |
| GET | /api/vector_map | PCA-reduced embeddings for 3D visualisation |
| POST | /api/tools/refactor | LLM code refactor + unit test generation sub-agent |
| POST | /api/tools/arch_graph | Architecture dependency graph (nodes + edges) |
| GET | /api/robot/log/stream | Wire-Pod / Vector live log tail |
| POST | /api/export/obsidian | Export chat session to Obsidian vault as Markdown |
cd VectorMap
source agent_env/bin/activate
pytest tests/ -v --tb=short
# 86 passed
All AGENT_CONFIG parameters are hot-reloadable via the dashboard or PUT /api/config.
The VectorTRM.pdf (included at VaultForge/sources/VectorTRM.pdf) is the 565-page Anki Vector Technical Reference Manual. VaultForge extracts it into four structured ChromaDB collections, stored alongside the repo_code collection built from the source repositories:
| Collection | Content | Chunks |
|---|---|---|
| trm_prose | Chapter narrative, architecture descriptions | 230 |
| trm_code | Fenced code blocks (language-tagged) | 250 |
| trm_tables | Pin maps, specs, register tables (linearised rows) | 180 |
| trm_notes | Developer notes, warnings, design decisions | 74 |
| repo_code | All 13 source repositories | 33,773 |
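Searching several collections means the per-collection results have to be merged into one ranked list before they reach the LLM. The sketch below shows one simple way to do that, assuming every hit carries a score where higher means more similar. The `Hit` type and `merge_hits` function are hypothetical, and the real retrieve step may weight collections differently.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    collection: str
    chunk_id: str
    score: float  # ASSUMPTION: higher means more similar


def merge_hits(
    per_collection: dict[str, list[Hit]],
    k: int = 8,
    threshold: float = 0.0,
) -> list[Hit]:
    """Pool hits from all collections, drop low scores, keep the best k."""
    pooled = [
        hit
        for hits in per_collection.values()
        for hit in hits
        if hit.score >= threshold
    ]
    # sorted() is stable, so ties keep their original collection order.
    return sorted(pooled, key=lambda hit: hit.score, reverse=True)[:k]
```

The defaults mirror the `retrieval_k` and `similarity_threshold` settings. Pooling by raw score only makes sense when all collections share the same embedding model and distance metric.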
This project uses claude-project v4 for persistent AI session memory:
# Show project status (registry, agents, services, automations)
claude-project status
# Sync session memory to Obsidian vault
claude-project sync
# Dispatch an agent task
claude-project dispatch create "Review new pipeline output" --agent summariser
Configured automations:
- sync-on-session-end: memory → Obsidian on every session close
- daily-standup: morning summary of yesterday's events (via summariser agent)
This research project is released for educational and research purposes.
All Vector robot source code belongs to their respective copyright holders:
| Repository | Copyright Holder |
|---|---|
| digital-dream-labs/vector | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/chipper | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/vector-cloud | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/vector-bluetooth | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/vector-web-setup | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/escape-pod-extension | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/dev-docs | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/hugh | Anki, Inc. / Digital Dream Labs |
| fforchino/vector-python-sdk | fforchino (community fork) |
| fforchino/vector-go-sdk | fforchino (community fork) |
| fforchino/vectorx | fforchino (community fork) |
| fforchino/vectorx-voiceserver | fforchino (community fork) |
| kercre123/wire-pod | kercre123 (community project) |
The VectorTRM.pdf is Anki proprietary documentation included for research purposes under fair use.
Built with Claude Code · Powered by Ollama · Indexed with ChromaDB
Default AGENT_CONFIG (hot-reloadable via the dashboard or PUT /api/config):
{
  "model": "qwen2.5-coder:7b",    // swap any Ollama model
  "temperature": 0.1,
  "retrieval_k": 8,               // chunks per query
  "max_attempts": 3,              // validation retries
  "context_budget": 20000,        // max tokens for context
  "memory_turns": 4,              // conversation history turns
  "web_search": false,            // DuckDuckGo fallback grounding
  "similarity_threshold": 0.0     // min score to include chunk
}
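Before sending a partial update to PUT /api/config, a client can check it against the defaults above. The following is a small client-side sketch. The function name and the strict type rules are assumptions for illustration and say nothing about how the server itself validates updates.

```python
DEFAULTS = {
    "model": "qwen2.5-coder:7b",
    "temperature": 0.1,
    "retrieval_k": 8,
    "max_attempts": 3,
    "context_budget": 20000,
    "memory_turns": 4,
    "web_search": False,
    "similarity_threshold": 0.0,
}


def apply_update(current: dict, update: dict) -> dict:
    """Return a new config with `update` applied.

    Raises KeyError for unknown keys and TypeError for values whose type
    does not match the corresponding default.
    """
    merged = dict(current)
    for key, value in update.items():
        if key not in DEFAULTS:
            raise KeyError(f"unknown config key: {key}")
        expected = type(DEFAULTS[key])
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if isinstance(value, bool) and expected is not bool:
            raise TypeError(f"{key} expects {expected.__name__}, got bool")
        # Accept whole numbers for float fields such as temperature.
        if expected is float and isinstance(value, int):
            value = float(value)
        if not isinstance(value, expected):
            raise TypeError(
                f"{key} expects {expected.__name__}, got {type(value).__name__}"
            )
        merged[key] = value
    return merged
```

For example, `apply_update(DEFAULTS, {"retrieval_k": 12})` returns a copy with twelve chunks per query and leaves `DEFAULTS` untouched, so a rejected update never leaves the client with a half-applied configuration.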