infraax/vectorax

Vector Robot AI Research

Vectorax

Local-first agentic AI for the Anki Vector robot ecosystem

34,507 semantic chunks · 13 source repositories · 0% hallucination rate · 100% local


Python FastAPI LangGraph ChromaDB Ollama Tests License


Ask a question about Vector's cliff sensor.
Get back the exact C++ function, the TRM page, and the cross-repo call chain.
Every citation verified against real files. Nothing fabricated.

What Is Vectorax?

Vectorax is a fully local RAG (Retrieval-Augmented Generation) system built to let AI agents and LLMs work intelligently with the entire Anki Vector robot codebase — firmware, SDKs, cloud services, community forks, and the 565-page Technical Reference Manual (TRM).

It has two components that work together:

| Component | Role |
|---|---|
| VaultForge | Parsing pipeline: ingests 13 repos + TRM PDF → 34,507 semantic chunks in ChromaDB |
| VectorMap | Agentic operations center: LangGraph RAG pipeline + 5-page dashboard |

Everything runs on your local machine. No OpenAI. No cloud APIs. No data exfiltration.


Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                              VECTORAX                                    │
│                                                                          │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │  VaultForge Pipeline                                             │   │
│  │                                                                  │   │
│  │  13 Git Repos  ──┐                                               │   │
│  │  (C++/Go/Python) │                                               │   │
│  │                  ▼                                               │   │
│  │  VectorTRM.pdf ──► repo_parser → chunker → annotator ──────────►│   │
│  │  (565 pages)   └─► trm_scanner ──► db_writer                    │   │
│  │                                        │                         │   │
│  │                                        ▼                         │   │
│  │                              ChromaDB chroma_db_v2/              │   │
│  │                         ┌───────────────────────────────┐        │   │
│  │                         │  repo_code   · 33,773 chunks  │        │   │
│  │                         │  trm_prose   ·    230 chunks  │        │   │
│  │                         │  trm_code    ·    250 chunks  │        │   │
│  │                         │  trm_tables  ·    180 chunks  │        │   │
│  │                         │  trm_notes   ·     74 chunks  │        │   │
│  │                         └───────────────────────────────┘        │   │
│  └──────────────────────────────────────────────────────────────────┘   │
│                                      │                                   │
│                                      ▼                                   │
│  ┌──────────────────────────────────────────────────────────────────┐   │
│  │  VectorMap — Agentic Operations Center                           │   │
│  │                                                                  │   │
│  │  FastAPI ──► LangGraph State Machine                             │   │
│  │              │                                                   │   │
│  │              ├─ retrieve()  →  ChromaDB multi-collection search  │   │
│  │              ├─ generate()  →  Ollama qwen2.5-coder:7b           │   │
│  │              └─ validate()  →  WikiLink citation check           │   │
│  │                    │                                             │   │
│  │                    └── retry on failure (logged to ledger)       │   │
│  │                                                                  │   │
│  │  5-Page Dashboard  ·  35+ REST endpoints  ·  SQLite history      │   │
│  └──────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────┘
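
The Retrieve → Generate → Validate loop in the diagram can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the actual langgraph_agent.py: `run_pipeline` and its three callables are hypothetical stand-ins for the LangGraph nodes, and `max_attempts` mirrors the setting of the same name in the agent configuration.

```python
from __future__ import annotations

from typing import Callable


def run_pipeline(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str]], str],
    validate: Callable[[str, list[str]], bool],
    max_attempts: int = 3,
) -> tuple[str | None, int]:
    """Return (validated answer or None, number of attempts used)."""
    # Assumption: context is retrieved once and reused across retries.
    context = retrieve(question)
    for attempt in range(1, max_attempts + 1):
        answer = generate(question, context)   # stand-in for the Ollama call
        if validate(answer, context):          # stand-in for the citation check
            return answer, attempt
    # Every attempt failed validation; the caller decides what to show.
    return None, max_attempts
```

Passing the nodes in as callables keeps the retry logic testable without a running model or database.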

Hallucination Prevention

Every LLM response passes through a validation node before it reaches the user:

LLM response
    │
    ▼
① Has ## Stack Trace & Sources section?  ──── NO ──► retry
    │ YES
    ▼
② Every citation is a [[WikiLink]]?  ──────── NO ──► retry
    │ YES
    ▼
③ Every linked file exists in retrieved context?  ── NO ──► retry + log to ledger
    │ YES
    ▼
 Response delivered  ✓

Violations are captured in the hallucination ledger (SQLite) with the raw LLM output, violation type, and corrected response — visible in the Agentic Forge dashboard page.
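
The three checks above could look roughly like the following. This is a sketch only: the function name, the violation labels, and the rule that every non-empty line of the sources section must contain a WikiLink are assumptions made for illustration, and the real validation node may differ.

```python
from __future__ import annotations

import re

SOURCES_HEADING = "## Stack Trace & Sources"
WIKILINK = re.compile(r"\[\[([^\[\]]+)\]\]")


def check_response(response: str, retrieved_files: set[str]) -> str | None:
    """Return None if the response passes, else a (hypothetical) violation label."""
    # Check 1: the sources section must be present.
    if SOURCES_HEADING not in response:
        return "missing_sources_section"

    sources = response.split(SOURCES_HEADING, 1)[1]
    lines = [ln for ln in sources.splitlines() if ln.strip()]

    # Check 2: every citation line must carry a [[WikiLink]].
    if not lines or any(not WIKILINK.search(ln) for ln in lines):
        return "non_wikilink_citation"

    # Check 3: every linked file must be among the retrieved context files.
    cited = WIKILINK.findall(sources)
    if any(name not in retrieved_files for name in cited):
        return "file_not_in_context"

    return None
```

A non-None result would trigger a retry; for the third check, the label and the raw output are what the ledger would record.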


The Repositories

Vectorax ingests and cross-links 13 open-source Vector robot repositories:

| Repository | Language | Role |
|---|---|---|
| digital-dream-labs/vector | C++ | Core robot firmware, the hardware-closest code |
| fforchino/vector-python-sdk | Python | Primary Python SDK, the LLM-facing public API |
| fforchino/vector-go-sdk | Go | Go language bindings |
| digital-dream-labs/vector-cloud | Go | Cloud gateway & authentication (Protobuf) |
| kercre123/wire-pod | Go | Self-hosted cloud replacement |
| digital-dream-labs/chipper | Go | Central gRPC server for voice + intent processing |
| digital-dream-labs/vector-bluetooth | Mixed | BLE setup & onboarding |
| digital-dream-labs/vector-web-setup | JavaScript | Web configuration UI |
| fforchino/vectorx | Mixed | Community extended Vector |
| fforchino/vectorx-voiceserver | Go | Voice services for VectorX |
| digital-dream-labs/escape-pod-extension | TypeScript | VS Code extension for Vector dev |
| digital-dream-labs/dev-docs | Markdown | Official developer documentation |
| digital-dream-labs/hugh | Go | Face recognition service |

The source repositories are not included in this repository due to their size. Clone them with:

bash VaultForge/sources/clone_repos.sh

VectorMap — 5-Page Dashboard

┌──────────────────────────────────────────────────────────┐
│  ① COMMAND CENTER  │  ② AGENTIC FORGE  │  ③ OBSERVATORY │
├──────────────────────────────────────────────────────────┤
│  ④ VAULT MANAGEMENT  │  ⑤ INTELLIGENCE TOOLS            │
└──────────────────────────────────────────────────────────┘

① Command Center

RAG chat interface with real-time source retrieval scores, live log stream, conversation memory (configurable turn buffer), and one-click Obsidian export.

② Agentic Forge

Live LangGraph node highlighter showing the active pipeline stage in real time. Query template library, hallucination ledger browser, A/B model benchmarking, and manual context injection zone.

③ Semantic Observatory

Interactive 3D PCA embedding map of all 34,507 vectors — rotatable, filterable by repository and file type. Spotlight search highlights nearest neighbours. Chunk inspector shows size distribution and per-file retrieval heatmap.

④ Vault Management

ChromaDB CRUD explorer (search, view, delete individual chunks, re-embed files), Obsidian sync drift monitor, autonomous backfill queue with progress tracking, and a composite Vault Health Score (0–100).

⑤ Intelligence Tools

Code refactor sub-agent, interactive architecture dependency graph (vis.js), web-search grounding toggle, Wire-Pod/Vector live log sniffer, token budget deep dive, and session export to Obsidian.


VaultForge — Pipeline

VaultForge/pipeline/
├── repo_parser.py        # AST + regex extraction: functions, classes, structs
├── chunker.py            # Token-aware chunking (tiktoken cl100k_base)
├── annotator.py          # LLM annotation: every function, class, file, repo
├── import_resolver.py    # Real import graph (not word-overlap guesses)
├── similarity_detector.py # MinHash LSH — cross-repo clone detection
├── repo_git_meta.py      # Git history, authors, blame data per chunk
├── trm_scanner.py        # PDF parser: prose / code / tables / notes / figures
├── trm_code.py           # TRM fenced code block extractor
├── trm_tables.py         # TRM table → structured rows with full metadata
├── trm_notes.py          # TRM developer notes/warnings (highest priority)
├── trm_figures.py        # TRM diagram → PNG + LLM vision description
├── trm_crossrefs.py      # TRM ↔ repo cross-reference linker
├── trm_repo_linker.py    # Hardware component → source file mapper
├── vault_generator.py    # Obsidian markdown vault generator
└── db_writer.py          # ChromaDB writer — nomic-embed-text 768D

Chunk metadata (25+ fields per chunk): source file, repo, language, function name, class name, git commit, author, token count, TRM cross-references, similarity cluster, import dependencies, hardware component tags, and more.
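
To illustrate what "token-aware chunking" means in practice, here is a small greedy packer. It is a sketch under stated assumptions rather than the code in chunker.py: `chunk_lines` and `max_tokens` are hypothetical names, and the default token counter is a whitespace split standing in for tiktoken so the example runs without extra dependencies.

```python
from __future__ import annotations

from typing import Callable


def chunk_lines(
    lines: list[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> list[str]:
    """Greedily pack whole lines into chunks within a token budget.

    A single line that exceeds the budget is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for line in lines:
        n = count_tokens(line)
        # Close the current chunk when this line would push it over budget.
        if current and used + n > max_tokens:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(line)
        used += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

To count real tokens, pass `count_tokens=lambda s: len(enc.encode(s))` with `enc = tiktoken.get_encoding("cl100k_base")`.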


Quick Start

Prerequisites

| Tool | Version | Purpose |
|---|---|---|
| Python | 3.12+ | Backend runtime |
| Ollama | Latest | Local LLM inference |
| Git | Any | Repo cloning |
| macOS / Linux | n/a | Supported platforms |

1 — Clone and setup

git clone https://github.com/infraax/vectorax.git
cd vectorax

# Bootstrap Python environment + pull Ollama models
bash VectorMap/setup.sh

2 — Rebuild the vector database

# Clone the 13 source repositories (~816 MB, a few minutes)
bash VaultForge/sources/clone_repos.sh

# Run the VaultForge pipeline (~30–60 min depending on hardware)
# Requires a running Ollama server (ollama serve, in a separate terminal)
# and the embedding model (ollama pull nomic-embed-text)
python VaultForge/pipeline/db_writer.py

Pre-built database: the chroma_db_v2/ directory (420 MB) is not included in the repo because of GitHub's 100 MB file size limit. See VectorMap/data/DOWNLOADS.md for details.

3 — Launch

bash VectorMap/start.sh
# → Opens dashboard at http://127.0.0.1:<port>

Environment variables

# Optional — override default paths
export VAULT_PATH="/path/to/your/obsidian/vault"
export CHROMA_PATH="/path/to/chroma_db_v2"

# Copy and edit the example env file
cp VectorMap/.env.example VectorMap/.env

Project Layout

vectorax/
│
├── .claude-project               # claude-project v4 brain (registry, agents, automations)
├── .gitignore
├── CLAUDE.md                     # Auto-generated project brief for AI sessions
│
├── VaultForge/                   # ── Parsing Pipeline ─────────────────────────
│   ├── config/
│   │   └── pipeline.yaml         # Master pipeline configuration
│   ├── docs/                     # Technical specs (8 documents)
│   ├── pipeline/                 # 15 pipeline modules
│   ├── sources/
│   │   ├── REPOS.yaml            # All 13 repo GitHub URLs
│   │   ├── clone_repos.sh        # Re-clone script
│   │   └── VectorTRM.pdf         # 565-page Technical Reference Manual
│   ├── tests/                    # VaultForge test suite
│   └── vectormap_mcp/            # MCP server for VectorMap integration
│
└── VectorMap/                    # ── Agentic Operations Center ────────────────
    ├── src/
    │   ├── server.py             # FastAPI — 35+ REST endpoints
    │   ├── langgraph_agent.py    # LangGraph pipeline (Retrieve → Generate → Validate)
    │   ├── query_history.py      # SQLite: sessions, templates, hallucination ledger
    │   └── profiler.py           # Structured logging + request timing
    ├── frontend/
    │   ├── index.html            # Dashboard shell
    │   ├── css/style.css
    │   └── js/                   # 11 JS modules (one per feature domain)
    ├── tests/                    # 86 pytest tests
    ├── data/
    │   └── DOWNLOADS.md          # ChromaDB rebuild instructions
    ├── setup.sh                  # One-command environment bootstrap
    └── start.sh                  # Quick launch

API

Core endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | / | Dashboard UI |
| GET | /status | System telemetry (CPU, RAM, models, indexing state) |
| POST | /chat | RAG query → response + sources with scores + token usage |
| GET/PUT | /api/config | Read / update AGENT_CONFIG live |

Memory & history

| Method | Endpoint | Description |
|---|---|---|
| GET/DELETE | /api/memory | Conversation buffer read / clear |
| GET | /api/hallucinations | Hallucination ledger |
| GET/POST/DELETE | /api/templates[/{id}] | Query template CRUD |
| GET | /api/history | Query history with retrieval scores |

Vault & ChromaDB

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/vault/health | Composite health score (0–100, 5 dimensions) |
| GET | /api/vault/drift | Sync drift monitor: stale vs fresh files |
| GET | /api/vault/heatmap | Per-file retrieval frequency heatmap |
| GET | /api/chroma/search | Semantic chunk search |
| GET | /api/chroma/file | All chunks for a file |
| DELETE | /api/chroma/chunk/{id} | Delete single chunk |
| POST | /api/chroma/reindex | Re-embed a single source file |
| POST/GET/POST | /api/backfill/* | Autonomous backfill queue |

Intelligence tools

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/benchmark | A/B model comparison (tokens, latency, response) |
| POST | /api/vector_search | Semantic spotlight: highlight nearest vectors in 3D map |
| GET | /api/chunks/stats | Chunk size distribution + top files by chunk count |
| GET | /api/vector_map | PCA-reduced embeddings for 3D visualisation |
| POST | /api/tools/refactor | LLM code refactor + unit test generation sub-agent |
| POST | /api/tools/arch_graph | Architecture dependency graph (nodes + edges) |
| GET | /api/robot/log/stream | Wire-Pod / Vector live log tail |
| POST | /api/export/obsidian | Export chat session to Obsidian vault as Markdown |
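
As an example of calling the API from a script, the sketch below builds a POST /chat request using only the standard library. The request schema is not documented here, so the `query` field name is an assumption, and `build_chat_request` is a hypothetical helper; check server.py for the actual body format.

```python
import json
import urllib.request


def build_chat_request(base_url: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST /chat request."""
    # Assumption: the endpoint accepts a JSON body with a "query" field.
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the server is running, pass the result to `urllib.request.urlopen`, using the base URL that start.sh prints at launch.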

Running Tests

cd VectorMap
source agent_env/bin/activate
pytest tests/ -v --tb=short
# 86 passed

Agent Configuration

All parameters are hot-reloadable via dashboard or PUT /api/config:

{
  "model":                "qwen2.5-coder:7b",  // swap any Ollama model
  "temperature":          0.1,
  "retrieval_k":          8,                   // chunks per query
  "max_attempts":         3,                   // validation retries
  "context_budget":       20000,               // max tokens for context
  "memory_turns":         4,                   // conversation history turns
  "web_search":           false,               // DuckDuckGo fallback grounding
  "similarity_threshold": 0.0                  // min score to include chunk
}
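
The `//` comments above are annotations; strict JSON does not allow comments, so they need to be dropped from any request body. A script could validate overrides before sending them, roughly as follows. `merge_config` is a hypothetical helper, and sending the full merged configuration (rather than only the changed keys) is an assumption about what PUT /api/config expects.

```python
import json

# Keys and values copied from the configuration shown above.
DEFAULTS = {
    "model": "qwen2.5-coder:7b",
    "temperature": 0.1,
    "retrieval_k": 8,
    "max_attempts": 3,
    "context_budget": 20000,
    "memory_turns": 4,
    "web_search": False,
    "similarity_threshold": 0.0,
}


def merge_config(overrides: dict) -> str:
    """Validate overrides against DEFAULTS and return a JSON request body."""
    for key, value in overrides.items():
        if key not in DEFAULTS:
            raise KeyError(f"unknown config key: {key}")
        expected = type(DEFAULTS[key])
        # Accept ints where a float is expected (e.g. temperature=0).
        ok = isinstance(value, expected) or (expected is float and isinstance(value, int))
        # bool is a subclass of int in Python, so reject it for numeric keys.
        if not ok or (isinstance(value, bool) and expected is not bool):
            raise TypeError(f"{key} expects {expected.__name__}")
    return json.dumps({**DEFAULTS, **overrides})
```

Rejecting unknown keys early catches typos such as `retreival_k` before they reach the server.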

Technical Reference Manual

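The VectorTRM.pdf (included at VaultForge/sources/VectorTRM.pdf) is the 565-page Anki Vector Technical Reference Manual. VaultForge extracts it into four structured ChromaDB collections (trm_prose, trm_code, trm_tables, trm_notes), stored alongside the repo_code collection that holds the source repositories: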

| Collection | Content | Chunks |
|---|---|---|
| trm_prose | Chapter narrative, architecture descriptions | 230 |
| trm_code | Fenced code blocks (language-tagged) | 250 |
| trm_tables | Pin maps, specs, register tables (linearised rows) | 180 |
| trm_notes | Developer notes, warnings, design decisions | 74 |
| repo_code | All 13 source repositories | 33,773 |
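
Because retrieval searches several collections at once, the per-collection hits have to be merged into a single ranked list. One plausible way to do that is sketched below. `merge_hits` is a hypothetical helper, and it assumes each hit carries a similarity score where higher is better (ChromaDB itself returns distances, where lower is better, so a conversion step would come first). The defaults mirror `retrieval_k` and `similarity_threshold` from the agent configuration.

```python
from __future__ import annotations

import heapq


def merge_hits(
    per_collection: dict[str, list[tuple[str, float]]],
    k: int = 8,
    threshold: float = 0.0,
) -> list[tuple[str, str, float]]:
    """Merge (chunk_id, score) hits from several collections into one top-k list.

    Returns (collection, chunk_id, score) tuples, highest score first.
    """
    pooled = [
        (score, name, chunk_id)
        for name, hits in per_collection.items()
        for chunk_id, score in hits
        if score >= threshold          # drop hits below the minimum score
    ]
    best = heapq.nlargest(k, pooled)   # sorts by score, then name, then id
    return [(name, chunk_id, score) for score, name, chunk_id in best]
```

Pooling before truncating lets a small collection such as trm_notes compete with repo_code on score alone, rather than being given a fixed quota.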

claude-project Integration

This project uses claude-project v4 for persistent AI session memory:

# Show project status (registry, agents, services, automations)
claude-project status

# Sync session memory to Obsidian vault
claude-project sync

# Dispatch an agent task
claude-project dispatch create "Review new pipeline output" --agent summariser

Configured automations:

  • sync-on-session-end — memory → Obsidian on every session close
  • daily-standup — morning summary of yesterday's events (via summariser agent)

License & Attribution

This research project is released for educational and research purposes.

All Vector robot source code belongs to their respective copyright holders:

| Repository | Copyright Holder |
|---|---|
| digital-dream-labs/vector | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/chipper | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/vector-cloud | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/vector-bluetooth | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/vector-web-setup | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/escape-pod-extension | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/dev-docs | Anki, Inc. / Digital Dream Labs |
| digital-dream-labs/hugh | Anki, Inc. / Digital Dream Labs |
| fforchino/vector-python-sdk | fforchino (community fork) |
| fforchino/vector-go-sdk | fforchino (community fork) |
| fforchino/vectorx | fforchino (community fork) |
| fforchino/vectorx-voiceserver | fforchino (community fork) |
| kercre123/wire-pod | kercre123 (community project) |

The VectorTRM.pdf is Anki proprietary documentation included for research purposes under fair use.


Built with Claude Code · Powered by Ollama · Indexed with ChromaDB
