A tool for indexing and searching text using hybrid search (BM25 + vector embeddings), with MMR deduplication and context packing for LLM consumption. Runs as a CLI or in-browser via WebAssembly.
```shell
$ rag query -q "How did Ned Stark die" -index ./books -expand
```

Ned Stark was executed by beheading after being accused of treason. King Joffrey, despite initially suggesting Ned could take the black, ordered his execution. Ser Ilyn Payne, the King's Justice, carried out the sentence at the steps of the Great Sept of Baelor [1.txt:16974-16996].
~3k tokens (with RAG) vs ~2M tokens (without RAG)
```shell
$ rag query -q "Joffrey death" -index ./books -expand
```

Joffrey was murdered by poison at his own wedding feast. The poison used is identified as "the strangler," a rare substance that causes the throat muscles to clench, shutting off the windpipe and turning the victim's face purple [2.txt:L472-495]. During Tyrion's trial, Grand Maester Pycelle confirms that the strangler was used to kill Joffrey [3.txt:L22842-22868].
~4k tokens (with RAG) vs ~2M tokens (without RAG)
```shell
$ rag query -q "Red Wedding Robb Stark murdered" -index ./books -expand
```

Robb Stark was betrayed and murdered by the Freys and Boltons at the Twins during his uncle's wedding, an event known as the Red Wedding [4.txt:21098-21138].
~3k tokens (with RAG) vs ~2M tokens (without RAG)
```shell
$ rag query -q "How did Drogo die" -index ./books -expand
```

Drogo died after being placed in a comatose state by a bloodmagic ritual performed by Mirri Maz Duur. The ritual involved sacrificing his horse and using its blood, but it left Drogo alive yet unresponsive [1.txt:L16544-16620]. Mirri Maz Duur states that Drogo will only return to his former self under impossible conditions, implying he will never recover [1.txt:L17745-17772].
~5k tokens (with RAG) vs ~2M tokens (without RAG)
Responses generated using DeepSeek with hybrid search (BM25 + embeddings)
```shell
go build -o rag ./cmd/rag
```

First, create a rag.yaml config in your content directory:
```yaml
# books/rag.yaml
index:
  includes:
    - "**/*.txt"
  chunk_tokens: 512
  stemming: true
retrieve:
  top_k: 20
  mmr_lambda: 0.7
pack:
  token_budget: 4000
```

```shell
# Index a directory
rag index /path/to/books

# Search for relevant content
rag query -q "authentication handler"

# Pack context for LLM consumption
rag pack -q "how does auth work" -b 4000 -o context.json

# Generate a prompt for manual LLM orchestration
rag runprompt --runtime --ctx context.json -q "Explain the auth flow"
```

Index files in a directory for later retrieval. Creates a `.rag/index.db` file.
```shell
rag index .                  # Index current directory
rag index /path/to/project   # Index specific directory
```

Flags:

- `-d, --dir` - Root directory (default: current directory)
- `--config` - Path to config file (default: `./rag.yaml`)
Search indexed files using BM25 retrieval with MMR deduplication.
```shell
rag query -q "database connection"
rag query -q "error handling" --top-k 10 --json
rag query -q "how to handle errors" --semantic
```

Flags:

- `-q, --query` - Search query (required)
- `-k, --top-k` - Number of results (default from config)
- `--json` - Output as JSON
- `--no-mmr` - Disable MMR reranking
- `--semantic` - Use embedding-only search (no BM25)
- `-c, --context` - Expand results by N lines before/after
Pack relevant chunks into compressed context that fits a token budget.
```shell
rag pack -q "authentication flow" -b 2000
rag pack -q "API endpoints" -o context.json
```

Flags:

- `-q, --query` - Search query (required)
- `-b, --budget` - Token budget (default from config)
- `-o, --output` - Output file (default: stdout)
- `-k, --top-k` - Candidate pool size
Generate formatted prompts from templates for manual LLM orchestration.
```shell
# Runtime prompt for question answering
rag runprompt --runtime --ctx context.json -q "How does auth work?"

# Builder prompt for context compression
rag runprompt --builder --ctx context.json
```

Flags:

- `--runtime` - Use runtime (answering) prompt template
- `--builder` - Use builder (compression) prompt template
- `--ctx` - Path to packed context JSON file (required)
- `-q, --query` - Override query for runtime prompt
Create a rag.yaml file in your project root:
```yaml
index:
  includes:
    - "**/*.go"
    - "**/*.py"
    - "**/*.js"
    - "**/*.ts"
    - "**/*.md"
  excludes:
    - "**/node_modules/**"
    - "**/vendor/**"
    - "**/.git/**"
  stemming: true
  chunk_tokens: 512
  chunk_overlap: 50
  k1: 1.2
  b: 0.75
retrieve:
  top_k: 20
  mmr_lambda: 0.7
  dedup_jaccard: 0.8
pack:
  token_budget: 4000
  output: json
logging:
  level: info
```

| Section | Option | Description | Default |
|---|---|---|---|
| index | includes | Glob patterns for files to index | Common code extensions |
| index | excludes | Glob patterns to exclude | node_modules, vendor, .git |
| index | stemming | Enable Porter stemming | true |
| index | chunk_tokens | Max tokens per chunk | 512 |
| index | chunk_overlap | Token overlap between chunks | 50 |
| index | k1 | BM25 k1 parameter | 1.2 |
| index | b | BM25 b parameter | 0.75 |
| retrieve | top_k | Default number of results | 20 |
| retrieve | mmr_lambda | MMR relevance vs diversity (0-1) | 0.7 |
| retrieve | dedup_jaccard | Jaccard threshold for dedup | 0.8 |
| pack | token_budget | Default token budget | 4000 |
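The `dedup_jaccard` option drops near-duplicate results by comparing the Jaccard similarity of their token sets. A minimal Go sketch of that comparison (illustrative only; the `jaccard` helper is hypothetical, not the tool's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// jaccard returns |A ∩ B| / |A ∪ B| over the two texts' token sets.
func jaccard(a, b string) float64 {
	setA := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(a)) {
		setA[t] = true
	}
	inter := 0
	setB := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(b)) {
		if setB[t] {
			continue // count each distinct token once
		}
		setB[t] = true
		if setA[t] {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

func main() {
	s := jaccard("open the database connection", "close the database connection")
	fmt.Printf("%.2f\n", s) // 3 shared tokens out of 5 distinct → 0.60
	// A chunk is dropped only when similarity exceeds the threshold (default 0.8).
	fmt.Println(s > 0.8) // false
}
```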
To enable semantic search alongside BM25 keyword search, install Ollama and pull an embedding model:
```shell
# Install Ollama (macOS)
brew install ollama

# Start Ollama server
ollama serve

# Pull embedding model
ollama pull nomic-embed-text
```

Then add embedding config to your rag.yaml:
```yaml
embedding:
  enabled: true
  provider: ollama
  model: nomic-embed-text
  dimension: 768
retrieve:
  hybrid_enabled: true
  rrf_k: 60         # RRF fusion parameter
  bm25_weight: 0.5  # Balance between BM25 and vector (0-1)
```

Re-index to generate embeddings:

```shell
rag index /path/to/content
```

Hybrid search combines BM25 (keyword matching) with vector similarity (semantic matching) using Reciprocal Rank Fusion (RRF).
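The fusion step can be sketched in Go. This is an illustrative weighted RRF variant, not the tool's actual implementation; `rrfFuse` is a hypothetical helper, with `k` playing the role of `rrf_k` and the weight that of `bm25_weight`:

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges two ranked lists of chunk IDs using weighted
// Reciprocal Rank Fusion: score(d) = Σ w_i / (k + rank_i(d)).
func rrfFuse(bm25, vec []string, k, bm25Weight float64) []string {
	scores := map[string]float64{}
	for i, id := range bm25 {
		scores[id] += bm25Weight / (k + float64(i+1))
	}
	for i, id := range vec {
		scores[id] += (1 - bm25Weight) / (k + float64(i+1))
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first.
	sort.Slice(ids, func(a, b int) bool { return scores[ids[a]] > scores[ids[b]] })
	return ids
}

func main() {
	bm25 := []string{"c1", "c2", "c3"} // keyword ranking
	vec := []string{"c2", "c4", "c1"}  // vector ranking
	fmt.Println(rrfFuse(bm25, vec, 60, 0.5)) // [c2 c1 c4 c3]
}
```

Chunks that appear near the top of both rankings (like `c2` here) win out over chunks ranked highly by only one retriever.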
Use the --semantic flag to search using only vector embeddings (no BM25 keyword matching):

```shell
rag query -q "a noble man betrayed by those he trusted" --semantic
```

Semantic search is useful for:
- Natural language questions (e.g., "how to handle errors gracefully")
- Conceptual queries where exact keywords may not appear
- Finding related content even when terminology differs
Requires embeddings to be enabled and indexed (see Hybrid Search section above).
- Walks directory with glob patterns
- Checks file modification times for incremental updates
- Splits files into line-based chunks with token awareness
- Tokenizes with optional Porter stemming
- Builds inverted index with term frequencies
- Stores in BoltDB (`.rag/index.db`)
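The chunking step above (line-based, token-aware, with overlap) can be sketched roughly like this; `chunkLines` is a hypothetical helper that uses whitespace tokens as a stand-in for the real tokenizer:

```go
package main

import (
	"fmt"
	"strings"
)

// chunkLines splits text into line-based chunks, capping each chunk at
// maxTokens and restarting the next chunk overlapLines earlier so that
// adjacent chunks share context.
func chunkLines(text string, maxTokens, overlapLines int) []string {
	lines := strings.Split(text, "\n")
	var chunks []string
	start := 0
	for start < len(lines) {
		tokens, end := 0, start
		for end < len(lines) {
			tokens += len(strings.Fields(lines[end]))
			end++
			if tokens >= maxTokens {
				break
			}
		}
		chunks = append(chunks, strings.Join(lines[start:end], "\n"))
		if end == len(lines) {
			break
		}
		next := end - overlapLines
		if next <= start {
			next = start + 1 // always make forward progress
		}
		start = next
	}
	return chunks
}

func main() {
	text := "one two three\nfour five six\nseven eight nine\nten eleven twelve"
	// 3 tokens per line, 6-token chunks, 1 line of overlap → 3 chunks.
	for i, c := range chunkLines(text, 6, 1) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```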
- Tokenizes and stems query
- Scores chunks using BM25: `score(q,c) = Σ IDF(t) × (tf × (k1+1)) / (tf + k1 × (1-b + b×|c|/avgDl))`
- Applies MMR for diversity: `MMR(c) = λ × relevance(c) - (1-λ) × max_similarity(c, selected)`
- Returns ranked, deduplicated results
- Calculates utility = score / token_count
- Greedily selects chunks by utility until budget exhausted
- Merges adjacent chunks from same file
- Outputs JSON with citations (path, line range, relevance)
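The greedy selection in the steps above can be sketched as follows (hypothetical `packGreedy` helper, not the tool's code; merging of adjacent chunks is omitted):

```go
package main

import (
	"fmt"
	"sort"
)

type snippet struct {
	ID     string
	Score  float64
	Tokens int
}

// packGreedy sorts candidates by utility = score / token_count and
// takes each one that still fits in the remaining token budget.
func packGreedy(cands []snippet, budget int) []string {
	sort.Slice(cands, func(a, b int) bool {
		return cands[a].Score/float64(cands[a].Tokens) >
			cands[b].Score/float64(cands[b].Tokens)
	})
	var picked []string
	used := 0
	for _, c := range cands {
		if used+c.Tokens <= budget {
			picked = append(picked, c.ID)
			used += c.Tokens
		}
	}
	return picked
}

func main() {
	cands := []snippet{
		{"a", 2.4, 300}, // utility 0.008
		{"b", 2.0, 100}, // utility 0.020 — best value per token
		{"c", 1.0, 200}, // utility 0.005
	}
	// b (100) then a (300) fit in 450 tokens; c would overflow.
	fmt.Println(packGreedy(cands, 450)) // [b a]
}
```

Dividing by token count means a short, moderately relevant chunk can beat a long, slightly more relevant one, which keeps the packed context dense.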
```json
{
  "query": "authentication",
  "budget_tokens": 4000,
  "used_tokens": 1250,
  "snippets": [
    {
      "path": "/src/auth/handler.go",
      "range": "L45-89",
      "why": "BM25 score: 2.34",
      "text": "func Authenticate(..."
    }
  ]
}
```

RAG can run entirely in the browser via WebAssembly (BM25 search only, no embeddings).
```shell
make build-wasm

# Or manually:
GOOS=js GOARCH=wasm go build -o examples/wasm/rag.wasm ./cmd/wasm
```

```shell
cd examples/wasm
python3 -m http.server 8080
# Open http://localhost:8080
```

```js
// Index content
ragIndex("file.txt", "Your text content here...")

// Search (returns JSON string)
const results = JSON.parse(ragQuery("search term", 5))

// Clear index
ragClear()

// Get statistics
const stats = JSON.parse(ragStats())
```

See examples/wasm/README.md for details.
```
cmd/rag/main.go    # Entrypoint
cmd/wasm/main.go   # WASM entrypoint
internal/
├── domain/        # Core entities (Document, Chunk, etc.)
├── port/          # Interfaces (IndexStore, Retriever, etc.)
├── usecase/       # Business logic
│   ├── index.go       # Indexing orchestration
│   ├── retrieve.go    # Search with BM25 + MMR
│   └── pack.go        # Context packing
└── adapter/
    ├── fs/            # File system walker
    ├── store/         # BoltDB implementation
    ├── analyzer/      # Tokenizer + Porter stemmer
    ├── chunker/       # Line-based chunking
    └── retriever/     # BM25 + MMR implementations
```
MIT