| last_validated | 2026-03-16 |
|---|---|
This document provides a comprehensive overview of Agent Brain's architecture, design decisions, and unique value proposition in the RAG ecosystem.
- Executive Summary
- Why Agent Brain?
- High-Level Architecture
- Core Components
- Data Flow
- Key Design Decisions
- Comparison with Alternatives
## Executive Summary

Agent Brain is a RAG (Retrieval-Augmented Generation) system designed specifically for AI coding assistants. It combines three retrieval paradigms into a unified, per-project knowledge system:
- BM25 Keyword Search - Fast, precise matching for technical terms, function names, and error codes
- Semantic Vector Search - Deep understanding of concepts and natural language queries
- GraphRAG Knowledge Graph - Relationship-aware retrieval for dependencies, hierarchies, and connections
Unlike generic RAG solutions, Agent Brain is built with code-first priorities: AST-aware chunking, per-project isolation, and seamless integration with Claude Code.
| Feature | Agent Brain | Generic RAG |
|---|---|---|
| Code Understanding | AST-aware chunking for 8+ languages | Text-based splitting |
| Search Modes | 5 modes (BM25, Vector, Hybrid, Graph, Multi) | Usually 1-2 modes |
| Project Isolation | Per-project servers with auto-discovery | Shared instance |
| Knowledge Graphs | GraphRAG with entity extraction | Not available |
| Claude Integration | Native plugin and skill | Manual integration |
## Why Agent Brain?

Traditional RAG systems treat all content as text. This works for documentation but fails for codebases:
- Function boundaries ignored: A function split mid-body loses context
- No structural awareness: Class hierarchies and import relationships are invisible
- Single search mode: Either keyword OR semantic, not both
- No relationship tracking: "What calls this function?" is unanswerable
Agent Brain treats code as a first-class citizen with:
- AST-Aware Chunking: Uses tree-sitter to split code at semantic boundaries (functions, classes, methods)
- Hybrid Search: Combines BM25 precision with vector semantics in a single query
- GraphRAG: Builds a knowledge graph of entities and relationships for structural queries
- Per-Project Isolation: Each project gets its own index with automatic port management
## High-Level Architecture

```
+------------------------+
|      Claude Code       |
|     (Plugin/Skill)     |
+------------------------+
            |
            v
+------------------+              +------------------------+
| agent-brain-cli  |   REST API   |  agent-brain-server    |
|                  | -----------> |                        |
| - init           |              |   FastAPI + Uvicorn    |
| - start/stop     |              +------------------------+
| - query          |                          |
| - index          |          +---------------+---------------+
| - status         |          |               |               |
+------------------+          v               v               v
                        +-----------+   +-----------+   +-----------+
                        |   BM25    |   | ChromaDB  |   | GraphRAG  |
                        |   Index   |   |  Vectors  |   | Knowledge |
                        +-----------+   +-----------+   +-----------+
                              |               |               |
                              +---------------+---------------+
                                              |
                                              v
                                     +----------------+
                                     | Fusion Engine  |
                                     |  (RRF Scoring) |
                                     +----------------+
```
| Component | Role |
|---|---|
| agent-brain-cli | User-facing CLI for all operations |
| agent-brain-server | FastAPI REST API server handling indexing and queries |
| BM25 Index | Keyword-based retrieval using rank-bm25 |
| ChromaDB | Vector similarity search with OpenAI embeddings |
| GraphRAG | Knowledge graph for entity relationships |
| Fusion Engine | Combines results using Reciprocal Rank Fusion |
## Core Components

### Document Loader

The document loader handles file discovery and content extraction.

Location: `agent-brain-server/agent_brain_server/indexing/document_loader.py`
Capabilities:
- Loads documents (.md, .txt, .pdf, .html, .rst)
- Loads code files (.py, .ts, .js, .java, .go, .rs, .c, .cpp, .cs)
- Automatic language detection via file extension and content patterns
- Metadata extraction (file size, path, source type)
Supported Languages: Python, TypeScript, JavaScript, Java, Go, Rust, C, C++, C#, Kotlin, Swift
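Extension-based detection can be pictured as a simple lookup. The sketch below is illustrative only: the function name `classify` and the returned dict shape are assumptions, not the actual `DocumentLoader` API (which also inspects content patterns).

```python
import pathlib

# Illustrative mapping from file extension to source type/language,
# mirroring the extensions listed above. Not Agent Brain's real table.
EXTENSION_MAP = {
    ".py": "python", ".ts": "typescript", ".js": "javascript",
    ".java": "java", ".go": "go", ".rs": "rust",
    ".c": "c", ".cpp": "cpp", ".cs": "csharp",
}
DOC_EXTENSIONS = {".md", ".txt", ".pdf", ".html", ".rst"}

def classify(path: str) -> dict:
    """Classify a file as code, document, or unknown by extension."""
    ext = pathlib.Path(path).suffix.lower()
    if ext in EXTENSION_MAP:
        return {"source_type": "code", "language": EXTENSION_MAP[ext]}
    if ext in DOC_EXTENSIONS:
        return {"source_type": "document", "language": None}
    return {"source_type": "unknown", "language": None}
```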
### Chunking System

The chunking system splits content into searchable units while preserving context.

Location: `agent-brain-server/agent_brain_server/indexing/chunking.py`
Two Chunking Modes:
| Mode | Used For | Strategy |
|---|---|---|
| ContextAwareChunker | Documents | Paragraph/sentence boundaries with overlap |
| CodeChunker | Source Code | AST-aware boundaries (function, class, method) |
Code Chunker Features:
- Uses LlamaIndex CodeSplitter with tree-sitter parsing
- Preserves symbol boundaries (never splits a function mid-body)
- Extracts rich metadata: symbol name, kind, line numbers, docstrings
- Generates optional LLM summaries for improved semantic search
### Embedding Generator

The embedding generator produces vector embeddings for semantic search.

Location: `agent-brain-server/agent_brain_server/indexing/embedding.py`
Configuration:
- Model: `text-embedding-3-large` (3072 dimensions)
- Batch processing: 100 chunks per batch
- Caching for repeated queries
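Batching and caching can be sketched as below. This is an assumption-laden illustration: `batched`, `CachedEmbedder`, and `embed_fn` are invented names, and `embed_fn` stands in for the real OpenAI embedding call.

```python
import hashlib

def batched(items, batch_size=100):
    """Yield successive batches (the server embeds 100 chunks per call)."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

class CachedEmbedder:
    """Wrap an embedding function with a content-hash cache so repeated
    texts are embedded only once. Illustrative sketch, not the real API."""
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.cache = {}

    def embed(self, text):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key not in self.cache:
            self.cache[key] = self.embed_fn(text)
        return self.cache[key]
```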
### Vector Store (ChromaDB)

ChromaDB provides persistent vector storage for similarity search.

Location: `agent-brain-server/agent_brain_server/storage/vector_store.py`
Features:
- Thread-safe async operations
- Cosine similarity scoring
- Metadata filtering (source type, language, file path)
- Upsert support for incremental updates
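ChromaDB handles this internally; to make "cosine similarity with metadata filtering" concrete, here is a self-contained sketch over an in-memory list of records. Nothing here is the real store's code, and the record shape is an assumption.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def query(store, query_vec, where=None, top_k=5):
    """Rank records by cosine similarity, after applying an exact-match
    metadata filter (e.g. {"language": "python"})."""
    candidates = [
        rec for rec in store
        if not where or all(rec["metadata"].get(k) == v for k, v in where.items())
    ]
    candidates.sort(key=lambda rec: cosine(rec["embedding"], query_vec), reverse=True)
    return candidates[:top_k]
```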
### BM25 Index

The BM25 index provides keyword-based retrieval for exact term matching.

Location: `agent-brain-server/agent_brain_server/indexing/bm25_index.py`
Features:
- Persistent disk-based index
- LlamaIndex BM25Retriever integration
- Language and source type filtering
- Sub-50ms query latency
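Agent Brain calls the rank-bm25 library rather than scoring by hand, but the underlying Okapi BM25 formula is compact enough to sketch over pre-tokenized documents:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Okapi BM25 scores for each document in `docs` (lists of tokens).

    Illustrative sketch of the scoring rank-bm25 implements: rare terms
    get higher IDF, repeated terms saturate via k1, and long documents
    are penalized via the b length-normalization term.
    """
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores
```

This precision on literal tokens is why BM25 beats embeddings for function names and error codes: `parse_config` either occurs in a chunk or it does not.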
### GraphRAG Index

The GraphRAG index maintains a knowledge graph for relationship-aware retrieval.

Location: `agent-brain-server/agent_brain_server/indexing/graph_index.py`
Features:
- Entity extraction (LLM-based and code metadata)
- Relationship storage (imports, contains, calls, extends)
- Graph traversal for multi-hop queries
- Two storage backends: SimplePropertyGraphStore (default) and Kuzu (production)
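The essence of the graph index is storing (subject, relation, object) triples and walking them for multi-hop queries. The sketch below is illustrative only: the real backends are SimplePropertyGraphStore and Kuzu, and `SimpleGraph` is an invented name.

```python
from collections import defaultdict

class SimpleGraph:
    """Toy triple store with breadth-first multi-hop traversal."""
    def __init__(self):
        self.edges = defaultdict(list)  # subject -> [(relation, object)]

    def add(self, subj, rel, obj):
        self.edges[subj].append((rel, obj))

    def neighbors(self, node, depth=2):
        """All entities reachable from `node` within `depth` hops."""
        seen, frontier = {node}, [node]
        for _ in range(depth):
            nxt = []
            for n in frontier:
                for _rel, obj in self.edges.get(n, []):
                    if obj not in seen:
                        seen.add(obj)
                        nxt.append(obj)
            frontier = nxt
        seen.discard(node)
        return seen
```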
### Query Service

The query service orchestrates search across all indexes.

Location: `agent-brain-server/agent_brain_server/services/query_service.py`
Query Modes:
| Mode | Description | Use Case |
|---|---|---|
| `bm25` | Keyword-only search | Technical terms, function names |
| `vector` | Semantic-only search | Concepts, explanations |
| `hybrid` | BM25 + Vector with Relative Score Fusion | Comprehensive results |
| `graph` | Knowledge graph traversal | Dependencies, relationships |
| `multi` | All three with Reciprocal Rank Fusion | Most comprehensive |
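The `multi` mode merges the three ranked lists with Reciprocal Rank Fusion, which can be sketched in a few lines (k=60 is the constant commonly used in the literature; Agent Brain's exact value is not stated here):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of doc ids.

    Each appearance contributes 1 / (k + rank) to a document's score,
    so documents ranked highly by multiple retrievers rise to the top
    without any score calibration across systems.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```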
## Data Flow

User Command: `agent-brain index /path/to/project`
1. Document Loading
/path/to/project --> DocumentLoader --> LoadedDocument[]
2. Type Detection
LoadedDocument --> LanguageDetector --> {source_type, language}
3. Chunking
Documents --> ContextAwareChunker --> TextChunk[]
Code Files --> CodeChunker (AST) --> CodeChunk[]
4. Embedding
Chunks --> EmbeddingGenerator --> embeddings[]
5. Storage (Parallel)
embeddings --> ChromaDB (vectors)
chunks --> BM25Index (keywords)
chunks --> GraphIndex (entities/relationships) [if enabled]
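The parallel storage step can be pictured with `asyncio.gather`. The store objects and their method names below are illustrative assumptions, not Agent Brain's actual interfaces:

```python
import asyncio

async def store_chunks(chunks, embeddings, vector_store, bm25_index, graph_index=None):
    """Step 5 of the indexing pipeline: fan writes out to all stores
    concurrently, skipping the graph when GraphRAG is not enabled."""
    tasks = [
        vector_store.upsert(chunks, embeddings),   # vectors
        bm25_index.add(chunks),                    # keywords
    ]
    if graph_index is not None:                    # GraphRAG is opt-in
        tasks.append(graph_index.extract_and_store(chunks))
    await asyncio.gather(*tasks)
```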
User Query: `agent-brain query "how does authentication work" --mode hybrid`
1. Query Processing
"how does..." --> QueryRequest{mode=hybrid, top_k=5}
2. Parallel Retrieval
QueryRequest --> VectorSearch --> vector_results[]
QueryRequest --> BM25Search --> bm25_results[]
3. Fusion (Hybrid Mode)
vector_results + bm25_results --> RelativeScoreFusion --> fused_results[]
4. Ranking & Filtering
fused_results --> RankByScore --> top_k results
5. Response
results --> QueryResponse{results, query_time_ms}
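The fusion step in hybrid mode can be sketched as below. This is an illustration of the idea, not Agent Brain's implementation: inputs are assumed to be (doc_id, raw_score) pairs, scores are min-max normalized per list, and the alpha parameter blends the two (0 = pure BM25, 1 = pure vector, as described under Key Design Decisions).

```python
def relative_score_fusion(vector_results, bm25_results, alpha=0.5, top_k=5):
    """Blend two scored result lists after per-list min-max normalization."""
    def normalize(results):
        if not results:
            return {}
        raw = [score for _, score in results]
        lo, hi = min(raw), max(raw)
        span = (hi - lo) or 1.0  # avoid divide-by-zero on uniform scores
        return {doc: (score - lo) / span for doc, score in results}

    v = normalize(vector_results)
    b = normalize(bm25_results)
    fused = {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * b.get(doc, 0.0)
             for doc in set(v) | set(b)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```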
## Key Design Decisions

Decision: Each project runs its own Agent Brain server with isolated indexes.
Rationale:
- No context pollution between projects
- Automatic port allocation prevents conflicts
- Server discovery via runtime.json enables multi-agent workflows
- Clean shutdown releases all resources
Implementation: .agent-brain/ directory per project stores state, indexes, and runtime info.
`agent-brain init` writes CLI/runtime state such as `.agent-brain/config.json` and `runtime.json`, while provider and search configuration for setup flows is typically authored in `.agent-brain/config.yaml`; all of these files live under the same `.agent-brain/` root.
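Discovery via `runtime.json` can be sketched as walking up from the current directory to the nearest `.agent-brain/runtime.json`. The walk-up behavior and the field names inside the file (e.g. a `port` key) are illustrative assumptions here, not documented behavior:

```python
import json
import pathlib

def discover_server(start_dir: str):
    """Find the nearest .agent-brain/runtime.json at or above start_dir
    and return its parsed contents, or None if no server state exists."""
    path = pathlib.Path(start_dir).resolve()
    for directory in [path, *path.parents]:
        runtime = directory / ".agent-brain" / "runtime.json"
        if runtime.exists():
            return json.loads(runtime.read_text())
    return None
```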
Decision: Use tree-sitter for code parsing instead of text-based splitting.
Rationale:
- Functions and classes stay intact
- Symbol metadata (name, kind, line numbers) improves search relevance
- Enables structural queries ("find all methods in class X")
- Supports 8+ languages with consistent quality
Decision: Hybrid mode (BM25 + Vector) is the default search mode.
Rationale:
- BM25 excels at exact matches (function names, error codes)
- Vector search excels at semantic understanding
- Fusion provides best of both worlds
- Alpha parameter allows tuning (0 = pure BM25, 1 = pure vector)
Decision: GraphRAG is disabled by default; users opt-in via configuration.
Rationale:
- Entity extraction adds indexing latency
- Graph storage requires additional memory
- Many use cases don't need relationship queries
- Progressive enhancement: enable when needed
Decision: Build on LlamaIndex rather than implementing RAG primitives from scratch.
Rationale:
- Battle-tested components (CodeSplitter, BM25Retriever)
- Active community and maintenance
- Plugin ecosystem (graph stores, embeddings)
- Focus on code-specific innovations, not RAG basics
## Comparison with Alternatives

| Aspect | Agent Brain | LangChain RAG |
|---|---|---|
| Code Support | AST-aware, 8+ languages | Text-based only |
| Search Modes | 5 modes with fusion | Usually 1-2 modes |
| Graph Support | Built-in GraphRAG | Requires custom setup |
| Deployment | Per-project servers | Shared service |
| Claude Integration | Native plugin | Manual integration |
| Aspect | Agent Brain | Copilot Workspace |
|---|---|---|
| Customization | Full control | Black box |
| Index Content | Your choice | Predetermined |
| Search Tuning | Mode/threshold control | None |
| Local Control | Full | Cloud-dependent |
| Cost | OpenAI embeddings only | Subscription |
| Aspect | Agent Brain | Custom ChromaDB |
|---|---|---|
| Code Understanding | Built-in AST | DIY |
| BM25 Search | Included | Separate system |
| Graph Search | Included | Not available |
| CLI/API | Ready to use | Build yourself |
| Multi-project | Automatic | Manual setup |
## Related Documentation

- GraphRAG Integration Guide - Deep dive into knowledge graph features
- Code Indexing Deep Dive - AST-aware chunking explained
- API Reference - Complete REST API documentation
- Configuration Reference - All configuration options