Vectorax

Local-first agentic AI for the Anki Vector robot ecosystem

34,507 semantic chunks · 13 source repositories · 0% hallucination rate · 100% local

Ask a question about Vector's cliff sensor.
Get back the exact C++ function, the TRM page, and the cross-repo call chain.
Every citation verified against real files. Nothing fabricated.

What Is Vectorax?

Vectorax is a fully local RAG (Retrieval-Augmented Generation) system built to let AI agents and LLMs work intelligently with the entire Anki Vector robot codebase — firmware, SDKs, cloud services, community forks, and the 565-page Technical Reference Manual (TRM).

It has two components that work together:

Component	Role
VaultForge	Parsing pipeline — ingests 13 repos + TRM PDF → 34,507 semantic chunks in ChromaDB
VectorMap	Agentic operations center — LangGraph RAG pipeline + 5-page dashboard

Everything runs on your local machine. No OpenAI. No cloud APIs. No data exfiltration.

Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                              VECTORAX                                    │
│                                                                          │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │  VaultForge Pipeline                                             │   │
│  │                                                                  │   │
│  │  13 Git Repos  ──┐                                               │   │
│  │  (C++/Go/Python) │                                               │   │
│  │                  ▼                                               │   │
│  │  VectorTRM.pdf ──► repo_parser → chunker → annotator ──────────►│   │
│  │  (565 pages)   └─► trm_scanner ──► db_writer                    │   │
│  │                                        │                         │   │
│  │                                        ▼                         │   │
│  │                              ChromaDB chroma_db_v2/              │   │
│  │                         ┌───────────────────────────────┐        │   │
│  │                         │  repo_code   · 33,773 chunks  │        │   │
│  │                         │  trm_prose   ·    230 chunks  │        │   │
│  │                         │  trm_code    ·    250 chunks  │        │   │
│  │                         │  trm_tables  ·    180 chunks  │        │   │
│  │                         │  trm_notes   ·     74 chunks  │        │   │
│  │                         └───────────────────────────────┘        │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│                                      │                                   │
│                                      ▼                                   │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │  VectorMap — Agentic Operations Center                           │   │
│  │                                                                  │   │
│  │  FastAPI ──► LangGraph State Machine                             │   │
│  │              │                                                   │   │
│  │              ├─ retrieve()  →  ChromaDB multi-collection search  │   │
│  │              ├─ generate()  →  Ollama qwen2.5-coder:7b           │   │
│  │              └─ validate()  →  WikiLink citation check           │   │
│  │                    │                                             │   │
│  │                    └── retry on failure (logged to ledger)       │   │
│  │                                                                  │   │
│  │  5-Page Dashboard  ·  35+ REST endpoints  ·  SQLite history      │   │
│  └──────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
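The retrieve, generate, validate loop in the diagram can be sketched in plain Python. This is a minimal illustration, not the project's LangGraph code: `run_pipeline` and its three callables are hypothetical names, the default of 3 attempts mirrors the `max_attempts` setting shown under Agent Configuration, and raising an error once the attempts are exhausted is an assumption.

```python
from typing import Callable, List, Tuple


def run_pipeline(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
    validate: Callable[[str, List[str]], bool],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Retrieve once, then generate and validate until a response passes.

    Returns (response, attempts_used). Raises RuntimeError if no attempt
    passes validation (an assumed policy, not taken from the project).
    """
    context = retrieve(question)
    for attempt in range(1, max_attempts + 1):
        response = generate(question, context)
        if validate(response, context):
            return response, attempt
    raise RuntimeError(f"no valid response after {max_attempts} attempts")
```

Passing the three stages in as callables keeps the retry policy testable without a running model or database.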

Hallucination Prevention

Every LLM response passes through a validation node before it reaches the user:

LLM response
    │
    ▼
① Has ## Stack Trace & Sources section?  ──── NO ──► retry
    │ YES
    ▼
② Every citation is a [[WikiLink]]?  ──────── NO ──► retry
    │ YES
    ▼
③ Every linked file exists in retrieved context?  ── NO ──► retry + log to ledger
    │ YES
    ▼
 Response delivered  ✓

Violations are captured in the hallucination ledger (SQLite) with the raw LLM output, violation type, and corrected response — visible in the Agentic Forge dashboard page.
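The three checks above could look roughly like the following. This is a sketch under stated assumptions: `find_violation` and the violation labels are hypothetical, and treating every non-empty line of the sources section as one citation is a guess at how "every citation" is determined.

```python
import re
from typing import Iterable, Optional

SOURCES_HEADING = "## Stack Trace & Sources"
# Matches [[target]], [[target|alias]] and [[target#heading]]; captures the target.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:[|#][^\[\]]*)?\]\]")


def find_violation(response: str, retrieved_files: Iterable[str]) -> Optional[str]:
    """Return a violation label, or None when the response passes all three checks."""
    # Check 1: the sources section must be present.
    if SOURCES_HEADING not in response:
        return "missing_sources_section"
    sources = response.split(SOURCES_HEADING, 1)[1]

    # Check 2: every non-empty line in the section must carry a [[WikiLink]].
    lines = [ln.strip() for ln in sources.splitlines() if ln.strip()]
    if not lines or any(not WIKILINK.search(ln) for ln in lines):
        return "citation_not_wikilink"

    # Check 3: every linked target must be one of the retrieved files.
    allowed = set(retrieved_files)
    for target in WIKILINK.findall(sources):
        if target.strip() not in allowed:
            return "file_not_in_context"
    return None
```

Returning a label rather than a boolean gives the caller something to store as the violation type when it writes a ledger entry.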

The Repositories

Vectorax ingests and cross-links 13 open-source Vector robot repositories:

Repository	Language	Role
digital-dream-labs/vector	C++	Core robot firmware — hardware-closest code
fforchino/vector-python-sdk	Python	Primary Python SDK — LLM-facing public API
fforchino/vector-go-sdk	Go	Go language bindings
digital-dream-labs/vector-cloud	Go	Cloud gateway & authentication (Protobuf)
kercre123/wire-pod	Go	Self-hosted cloud replacement
digital-dream-labs/chipper	Go	Central gRPC server — voice + intent processing
digital-dream-labs/vector-bluetooth	Mixed	BLE setup & onboarding
digital-dream-labs/vector-web-setup	JavaScript	Web configuration UI
fforchino/vectorx	Mixed	Community extended Vector
fforchino/vectorx-voiceserver	Go	Voice services for VectorX
digital-dream-labs/escape-pod-extension	TypeScript	VS Code extension for Vector dev
digital-dream-labs/dev-docs	Markdown	Official developer documentation
digital-dream-labs/hugh	Go	Face recognition service

The 13 source repos are not vendored into this repository because of their size. Re-clone them with:
bash VaultForge/sources/clone_repos.sh

VectorMap — 5-Page Dashboard

┌──────────────────────────────────────────────────────────┐
│  ① COMMAND CENTER  │  ② AGENTIC FORGE  │  ③ OBSERVATORY │
├──────────────────────────────────────────────────────────┤
│  ④ VAULT MANAGEMENT  │  ⑤ INTELLIGENCE TOOLS            │
└──────────────────────────────────────────────────────────┘

① Command Center

RAG chat interface with real-time source retrieval scores, live log stream, conversation memory (configurable turn buffer), and one-click Obsidian export.

② Agentic Forge

Live LangGraph node highlighter showing the active pipeline stage in real time. Query template library, hallucination ledger browser, A/B model benchmarking, and manual context injection zone.

③ Semantic Observatory

Interactive 3D PCA embedding map of all 34,507 vectors — rotatable, filterable by repository and file type. Spotlight search highlights nearest neighbours. Chunk inspector shows size distribution and per-file retrieval heatmap.

④ Vault Management

ChromaDB CRUD explorer (search, view, delete individual chunks, re-embed files), Obsidian sync drift monitor, autonomous backfill queue with progress tracking, and a composite Vault Health Score (0–100).

⑤ Intelligence Tools

Code refactor sub-agent, interactive architecture dependency graph (vis.js), web-search grounding toggle, Wire-Pod/Vector live log sniffer, token budget deep dive, and session export to Obsidian.

VaultForge — Pipeline

VaultForge/pipeline/
├── repo_parser.py        # AST + regex extraction: functions, classes, structs
├── chunker.py            # Token-aware chunking (tiktoken cl100k_base)
├── annotator.py          # LLM annotation: every function, class, file, repo
├── import_resolver.py    # Real import graph (not word-overlap guesses)
├── similarity_detector.py # MinHash LSH — cross-repo clone detection
├── repo_git_meta.py      # Git history, authors, blame data per chunk
├── trm_scanner.py        # PDF parser: prose / code / tables / notes / figures
├── trm_code.py           # TRM fenced code block extractor
├── trm_tables.py         # TRM table → structured rows with full metadata
├── trm_notes.py          # TRM developer notes/warnings (highest priority)
├── trm_figures.py        # TRM diagram → PNG + LLM vision description
├── trm_crossrefs.py      # TRM ↔ repo cross-reference linker
├── trm_repo_linker.py    # Hardware component → source file mapper
├── vault_generator.py    # Obsidian markdown vault generator
└── db_writer.py          # ChromaDB writer — nomic-embed-text 768D
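
As an illustration of the MinHash idea named for similarity_detector.py, here is a minimal standard-library sketch that builds signatures and estimates Jaccard similarity between two code snippets. It is not the project's implementation: the function names, the 3-token shingles, and the 64 hash functions are all assumptions, and the LSH banding step is omitted.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Split text into overlapping k-token shingles."""
    tokens = text.split()
    if len(tokens) < k:
        return {" ".join(tokens)} if tokens else set()
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash_signature(items: Set[str], num_perm: int = 64) -> List[int]:
    """For each of num_perm salted hash functions, keep the minimum hash value."""
    if not items:
        raise ValueError("cannot build a signature from an empty set")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "big")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching signature positions approximates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Two chunks whose estimated similarity exceeds a chosen threshold would be candidates for the same clone cluster; the threshold itself is a tuning decision not specified here.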

Chunk metadata (25+ fields per chunk): source file, repo, language, function name, class name, git commit, author, token count, TRM cross-references, similarity cluster, import dependencies, hardware component tags, and more.
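
One way to picture a chunk's metadata record is as a dataclass covering the fields named above. The field names and types are hypothetical (the real schema has 25+ fields and is not reproduced here). The flattening step assumes the vector store accepts only scalar metadata values, which has traditionally been true of ChromaDB but is worth checking against the installed version.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ChunkMetadata:
    """Illustrative subset of per-chunk metadata; names are assumptions."""
    source_file: str
    repo: str
    language: str
    token_count: int
    function_name: Optional[str] = None
    class_name: Optional[str] = None
    git_commit: Optional[str] = None
    author: Optional[str] = None
    similarity_cluster: Optional[int] = None
    trm_cross_references: List[str] = field(default_factory=list)
    import_dependencies: List[str] = field(default_factory=list)
    hardware_component_tags: List[str] = field(default_factory=list)

    def to_flat_dict(self) -> dict:
        """Drop unset fields and join lists into comma-separated strings."""
        flat = {}
        for key, value in asdict(self).items():
            if value is None or value == []:
                continue
            flat[key] = ",".join(value) if isinstance(value, list) else value
        return flat
```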

Quick Start

Prerequisites

Tool	Version	Purpose
Python	3.12+	Backend runtime
Ollama	Latest	Local LLM inference
Git	Any	Repo cloning
macOS / Linux	—	Supported platforms

1 — Clone and setup

git clone https://github.com/infraax/vectorax.git
cd vectorax

# Bootstrap Python environment + pull Ollama models
bash VectorMap/setup.sh

2 — Rebuild the vector database

# Clone the 13 source repositories (~816 MB, a few minutes)
bash VaultForge/sources/clone_repos.sh

# Run the VaultForge pipeline (~30–60 min depending on hardware)
# Requires a running Ollama server (`ollama serve`, in a separate terminal)
# and the embedding model: ollama pull nomic-embed-text
python VaultForge/pipeline/db_writer.py

Pre-built database: the chroma_db_v2/ directory (420 MB) is not included in the repo because of GitHub's 100 MB file size limit. See VectorMap/data/DOWNLOADS.md for details.

3 — Launch

bash VectorMap/start.sh
# → Opens dashboard at http://127.0.0.1:<port>

Environment variables

# Optional — override default paths
export VAULT_PATH="/path/to/your/obsidian/vault"
export CHROMA_PATH="/path/to/chroma_db_v2"

# Copy and edit the example env file
cp VectorMap/.env.example VectorMap/.env
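
On the Python side, the two overrides could be read as follows. This is a sketch only: `resolve_paths` is a hypothetical helper, and both fallback values are placeholders rather than the project's actual defaults.

```python
import os
from pathlib import Path
from typing import Dict, Mapping

# Placeholder fallbacks; the project's real defaults are not documented here.
DEFAULT_VAULT_PATH = "~/vector_vault"
DEFAULT_CHROMA_PATH = "./chroma_db_v2"


def resolve_paths(env: Mapping[str, str] = os.environ) -> Dict[str, Path]:
    """Read VAULT_PATH and CHROMA_PATH, falling back to the placeholders above."""
    vault = Path(env.get("VAULT_PATH", DEFAULT_VAULT_PATH)).expanduser()
    chroma = Path(env.get("CHROMA_PATH", DEFAULT_CHROMA_PATH)).expanduser()
    return {"vault_path": vault, "chroma_path": chroma}
```

Accepting the environment as a parameter makes the helper easy to exercise with a plain dictionary in tests.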

Project Layout

vectorax/
│
├── .claude-project               # claude-project v4 brain (registry, agents, automations)
├── .gitignore
├── CLAUDE.md                     # Auto-generated project brief for AI sessions
│
├── VaultForge/                   # ── Parsing Pipeline ─────────────────────────
│   ├── config/
│   │   └── pipeline.yaml         # Master pipeline configuration
│   ├── docs/                     # Technical specs (8 documents)
│   ├── pipeline/                 # 15 pipeline modules
│   ├── sources/
│   │   ├── REPOS.yaml            # All 13 repo GitHub URLs
│   │   ├── clone_repos.sh        # Re-clone script
│   │   └── VectorTRM.pdf         # 565-page Technical Reference Manual
│   ├── tests/                    # VaultForge test suite
│   └── vectormap_mcp/            # MCP server for VectorMap integration
│
└── VectorMap/                    # ── Agentic Operations Center ────────────────
    ├── src/
    │   ├── server.py             # FastAPI — 35+ REST endpoints
    │   ├── langgraph_agent.py    # LangGraph pipeline (Retrieve → Generate → Validate)
    │   ├── query_history.py      # SQLite: sessions, templates, hallucination ledger
    │   └── profiler.py           # Structured logging + request timing
    ├── frontend/
    │   ├── index.html            # Dashboard shell
    │   ├── css/style.css
    │   └── js/                   # 11 JS modules (one per feature domain)
    ├── tests/                    # 86 pytest tests
    ├── data/
    │   └── DOWNLOADS.md          # ChromaDB rebuild instructions
    ├── setup.sh                  # One-command environment bootstrap
    └── start.sh                  # Quick launch

API

Core endpoints

Method	Endpoint	Description
`GET`	`/`	Dashboard UI
`GET`	`/status`	System telemetry (CPU, RAM, models, indexing state)
`POST`	`/chat`	RAG query → response + sources with scores + token usage
`GET/PUT`	`/api/config`	Read / update AGENT_CONFIG live
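
A query against `/chat` could be prepared with the standard library as shown below. The request schema is not documented in this README, so the `"query"` field name is an assumption, as is the helper name `build_chat_request`. The base URL must use whatever port start.sh reports.

```python
import json
import urllib.request


def build_chat_request(base_url: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /chat endpoint."""
    payload = json.dumps({"query": question}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending it, once the dashboard is running:
# req = build_chat_request("http://127.0.0.1:<port>", "How does the cliff sensor work?")
# with urllib.request.urlopen(req, timeout=120) as resp:
#     print(json.load(resp))
```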

Memory & history

Method	Endpoint	Description
`GET/DELETE`	`/api/memory`	Conversation buffer read / clear
`GET`	`/api/hallucinations`	Hallucination ledger
`GET/POST/DELETE`	`/api/templates[/{id}]`	Query template CRUD
`GET`	`/api/history`	Query history with retrieval scores

Vault & ChromaDB

Method	Endpoint	Description
`GET`	`/api/vault/health`	Composite health score (0–100, 5 dimensions)
`GET`	`/api/vault/drift`	Sync drift monitor — stale vs fresh files
`GET`	`/api/vault/heatmap`	Per-file retrieval frequency heatmap
`GET`	`/api/chroma/search`	Semantic chunk search
`GET`	`/api/chroma/file`	All chunks for a file
`DELETE`	`/api/chroma/chunk/{id}`	Delete single chunk
`POST`	`/api/chroma/reindex`	Re-embed a single source file
`POST/GET/POST`	`/api/backfill/*`	Autonomous backfill queue

Intelligence tools

Method	Endpoint	Description
`POST`	`/api/benchmark`	A/B model comparison (tokens, latency, response)
`POST`	`/api/vector_search`	Semantic spotlight — highlight nearest vectors in 3D map
`GET`	`/api/chunks/stats`	Chunk size distribution + top files by chunk count
`GET`	`/api/vector_map`	PCA-reduced embeddings for 3D visualisation
`POST`	`/api/tools/refactor`	LLM code refactor + unit test generation sub-agent
`POST`	`/api/tools/arch_graph`	Architecture dependency graph (nodes + edges)
`GET`	`/api/robot/log/stream`	Wire-Pod / Vector live log tail
`POST`	`/api/export/obsidian`	Export chat session to Obsidian vault as Markdown

Running Tests

cd VectorMap
source agent_env/bin/activate
pytest tests/ -v --tb=short
# 86 passed

Agent Configuration

All parameters are hot-reloadable via dashboard or PUT /api/config:

{
  "model":                "qwen2.5-coder:7b",  // swap any Ollama model
  "temperature":          0.1,
  "retrieval_k":          8,                   // chunks per query
  "max_attempts":         3,                   // validation retries
  "context_budget":       20000,               // max tokens for context
  "memory_turns":         4,                   // conversation history turns
  "web_search":           false,               // DuckDuckGo fallback grounding
  "similarity_threshold": 0.0                  // min score to include chunk
}
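
The `//` comments in the listing above are annotations; a request body sent to PUT /api/config would need to be plain JSON. The sketch below merges overrides into the documented defaults and rejects unknown keys or mismatched types before serialising. `merge_config` is a hypothetical client-side helper, and whether the server accepts partial updates is not stated here.

```python
import json
from typing import Any, Dict

# Defaults copied from the listing above.
DEFAULT_CONFIG: Dict[str, Any] = {
    "model": "qwen2.5-coder:7b",
    "temperature": 0.1,
    "retrieval_k": 8,
    "max_attempts": 3,
    "context_budget": 20000,
    "memory_turns": 4,
    "web_search": False,
    "similarity_threshold": 0.0,
}


def merge_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return the defaults with overrides applied, validating keys and types."""
    merged = dict(DEFAULT_CONFIG)
    for key, value in overrides.items():
        if key not in DEFAULT_CONFIG:
            raise KeyError(f"unknown config key: {key}")
        expected = type(DEFAULT_CONFIG[key])
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if expected is not bool and isinstance(value, bool):
            raise TypeError(f"{key} expects {expected.__name__}, got bool")
        # Allow whole numbers where a float is expected (e.g. temperature: 0).
        if expected is float and isinstance(value, int):
            value = float(value)
        if not isinstance(value, expected):
            raise TypeError(f"{key} expects {expected.__name__}")
        merged[key] = value
    return merged


# Plain-JSON body suitable for a PUT request.
body = json.dumps(merge_config({"retrieval_k": 12, "temperature": 0}))
```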

Technical Reference Manual

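The VectorTRM.pdf (included at VaultForge/sources/VectorTRM.pdf) is the 565-page Anki Vector Technical Reference Manual. VaultForge extracts it into four structured ChromaDB collections (trm_prose, trm_code, trm_tables, trm_notes). The fifth collection, repo_code, holds the source repositories and is listed here for comparison: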

Collection	Content	Chunks
`trm_prose`	Chapter narrative, architecture descriptions	230
`trm_code`	Fenced code blocks (language-tagged)	250
`trm_tables`	Pin maps, specs, register tables (linearised rows)	180
`trm_notes`	Developer notes, warnings, design decisions	74
`repo_code`	All 13 source repositories	33,773
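
Because a query searches several collections at once, the per-collection hits have to be combined into one ranked list. The following is a minimal sketch of that merge step, not the project's retrieval code: `merge_hits` is a hypothetical name, scores are assumed to be similarities where higher is better (ChromaDB itself returns distances, so a conversion is assumed upstream), and the defaults mirror `retrieval_k` and `similarity_threshold` from the agent configuration.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (similarity score, chunk id); higher is better


def merge_hits(
    per_collection: Dict[str, List[Hit]],
    k: int = 8,
    min_score: float = 0.0,
) -> List[Tuple[float, str, str]]:
    """Merge per-collection hits into one top-k list of (score, collection, chunk_id)."""
    candidates = [
        (score, name, chunk_id)
        for name, hits in per_collection.items()
        for score, chunk_id in hits
        if score >= min_score
    ]
    # nlargest returns the k best candidates in descending score order.
    return heapq.nlargest(k, candidates, key=lambda c: c[0])
```

With collection sizes as uneven as 33,773 versus 74 chunks, a real pipeline might also reserve slots per collection so that TRM notes are not crowded out; that refinement is left out here.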

claude-project Integration

This project uses claude-project v4 for persistent AI session memory:

# Show project status (registry, agents, services, automations)
claude-project status

# Sync session memory to Obsidian vault
claude-project sync

# Dispatch an agent task
claude-project dispatch create "Review new pipeline output" --agent summariser

Configured automations:

sync-on-session-end — memory → Obsidian on every session close
daily-standup — morning summary of yesterday's events (via summariser agent)

License & Attribution

This research project is released for educational and research purposes.

All Vector robot source code belongs to its respective copyright holders:

Repository	Copyright Holder
digital-dream-labs/vector	Anki, Inc. / Digital Dream Labs
digital-dream-labs/chipper	Anki, Inc. / Digital Dream Labs
digital-dream-labs/vector-cloud	Anki, Inc. / Digital Dream Labs
digital-dream-labs/vector-bluetooth	Anki, Inc. / Digital Dream Labs
digital-dream-labs/vector-web-setup	Anki, Inc. / Digital Dream Labs
digital-dream-labs/escape-pod-extension	Anki, Inc. / Digital Dream Labs
digital-dream-labs/dev-docs	Anki, Inc. / Digital Dream Labs
digital-dream-labs/hugh	Anki, Inc. / Digital Dream Labs
fforchino/vector-python-sdk	fforchino (community fork)
fforchino/vector-go-sdk	fforchino (community fork)
fforchino/vectorx	fforchino (community fork)
fforchino/vectorx-voiceserver	fforchino (community fork)
kercre123/wire-pod	kercre123 (community project)

The VectorTRM.pdf is Anki proprietary documentation included for research purposes under fair use.

Built with Claude Code · Powered by Ollama · Indexed with ChromaDB
