hypnagonia/rag
RAG Context Compressor

A tool for indexing and searching text using hybrid search (BM25 + vector embeddings), with MMR deduplication and context packing for LLM consumption. Runs as a CLI or in-browser via WebAssembly.

Example: Querying Game of Thrones (5 books, ~2M tokens)

$ rag query -q "How did Ned Stark die" -index ./books -expand

Ned Stark was executed by beheading after being accused of treason. King Joffrey, despite initially suggesting Ned could take the black, ordered his execution. Ser Ilyn Payne, the King's Justice, carried out the sentence at the steps of the Great Sept of Baelor [1.txt:L16974-16996].

~3k tokens (with RAG) vs ~2M tokens (without RAG)

$ rag query -q "Joffrey death" -index ./books -expand

Joffrey was murdered by poison at his own wedding feast. The poison used is identified as "the strangler," a rare substance that causes the throat muscles to clench, shutting off the windpipe and turning the victim's face purple [2.txt:L472-495]. During Tyrion's trial, Grand Maester Pycelle confirms that the strangler was used to kill Joffrey [3.txt:L22842-22868].

~4k tokens (with RAG) vs ~2M tokens (without RAG)

$ rag query -q "Red Wedding Robb Stark murdered" -index ./books -expand

Robb Stark was betrayed and murdered by the Freys and Boltons at the Twins during his uncle's wedding, an event known as the Red Wedding [4.txt:L21098-21138].

~3k tokens (with RAG) vs ~2M tokens (without RAG)

$ rag query -q "How did Drogo die" -index ./books -expand

Drogo died after being placed in a comatose state by a bloodmagic ritual performed by Mirri Maz Duur. The ritual involved sacrificing his horse and using its blood, but it left Drogo alive yet unresponsive [1.txt:L16544-16620]. Mirri Maz Duur states that Drogo will only return to his former self under impossible conditions, implying he will never recover [1.txt:L17745-17772].

~5k tokens (with RAG) vs ~2M tokens (without RAG)

Responses generated using DeepSeek with hybrid search (BM25 + embeddings)


Installation

go build -o rag ./cmd/rag

Quick Start

First, create a rag.yaml config in your content directory:

# books/rag.yaml
index:
  includes:
    - "**/*.txt"
  chunk_tokens: 512
  stemming: true

retrieve:
  top_k: 20
  mmr_lambda: 0.7

pack:
  token_budget: 4000

Then index and query:

# Index a directory
rag index /path/to/books

# Search for relevant passages
rag query -q "authentication handler"

# Pack context for LLM consumption
rag pack -q "how does auth work" -b 4000 -o context.json

# Generate a prompt for manual LLM orchestration
rag runprompt --runtime --ctx context.json -q "Explain the auth flow"

Commands

rag index <path>

Index files in a directory for later retrieval. Creates a .rag/index.db file.

rag index .                      # Index current directory
rag index /path/to/project       # Index specific directory

Flags:

  • -d, --dir - Root directory (default: current directory)
  • --config - Path to config file (default: ./rag.yaml)

rag query -q "<question>"

Search indexed files using BM25 retrieval with MMR deduplication.

rag query -q "database connection"
rag query -q "error handling" --top-k 10 --json
rag query -q "how to handle errors" --semantic

Flags:

  • -q, --query - Search query (required)
  • -k, --top-k - Number of results (default from config)
  • --json - Output as JSON
  • --no-mmr - Disable MMR reranking
  • --semantic - Use embedding-only search (no BM25)
  • -c, --context - Expand results by N lines before/after

rag pack -q "<question>"

Pack relevant chunks into compressed context that fits a token budget.

rag pack -q "authentication flow" -b 2000
rag pack -q "API endpoints" -o context.json

Flags:

  • -q, --query - Search query (required)
  • -b, --budget - Token budget (default from config)
  • -o, --output - Output file (default: stdout)
  • -k, --top-k - Candidate pool size

rag runprompt

Generate formatted prompts from templates for manual LLM orchestration.

# Runtime prompt for question answering
rag runprompt --runtime --ctx context.json -q "How does auth work?"

# Builder prompt for context compression
rag runprompt --builder --ctx context.json

Flags:

  • --runtime - Use runtime (answering) prompt template
  • --builder - Use builder (compression) prompt template
  • --ctx - Path to packed context JSON file (required)
  • -q, --query - Override query for runtime prompt

Configuration

Create a rag.yaml file in your project root:

index:
  includes:
    - "**/*.go"
    - "**/*.py"
    - "**/*.js"
    - "**/*.ts"
    - "**/*.md"
  excludes:
    - "**/node_modules/**"
    - "**/vendor/**"
    - "**/.git/**"
  stemming: true
  chunk_tokens: 512
  chunk_overlap: 50
  k1: 1.2
  b: 0.75

retrieve:
  top_k: 20
  mmr_lambda: 0.7
  dedup_jaccard: 0.8

pack:
  token_budget: 4000
  output: json

logging:
  level: info

Configuration Options

| Section  | Option        | Description                       | Default                    |
|----------|---------------|-----------------------------------|----------------------------|
| index    | includes      | Glob patterns for files to index  | Common code extensions     |
| index    | excludes      | Glob patterns to exclude          | node_modules, vendor, .git |
| index    | stemming      | Enable Porter stemming            | true                       |
| index    | chunk_tokens  | Max tokens per chunk              | 512                        |
| index    | chunk_overlap | Token overlap between chunks      | 50                         |
| index    | k1            | BM25 k1 parameter                 | 1.2                        |
| index    | b             | BM25 b parameter                  | 0.75                       |
| retrieve | top_k         | Default number of results         | 20                         |
| retrieve | mmr_lambda    | MMR relevance vs diversity (0-1)  | 0.7                        |
| retrieve | dedup_jaccard | Jaccard threshold for dedup       | 0.8                        |
| pack     | token_budget  | Default token budget              | 4000                       |
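The dedup_jaccard threshold compares chunks by token-set overlap. A minimal sketch of the measure (illustrative code, not the package's actual API):

```go
package main

import (
	"fmt"
	"strings"
)

// jaccard computes |A ∩ B| / |A ∪ B| over the two chunks' token sets.
// Chunks whose similarity exceeds dedup_jaccard (0.8 by default) would
// be treated as near-duplicates during retrieval.
func jaccard(a, b string) float64 {
	setA := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(a)) {
		setA[t] = true
	}
	setB := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(b)) {
		setB[t] = true
	}
	inter := 0
	for t := range setA {
		if setB[t] {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

func main() {
	// 3 shared tokens out of 5 distinct → 0.60, below the 0.8 threshold
	fmt.Printf("%.2f\n", jaccard("ned stark was executed", "ned stark was beheaded"))
}
```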

Hybrid Search (BM25 + Vector Embeddings)

To enable semantic search alongside BM25 keyword search, install Ollama and pull an embedding model:

# Install Ollama (macOS)
brew install ollama

# Start Ollama server
ollama serve

# Pull embedding model
ollama pull nomic-embed-text

Then add embedding config to your rag.yaml:

embedding:
  enabled: true
  provider: ollama
  model: nomic-embed-text
  dimension: 768

retrieve:
  hybrid_enabled: true
  rrf_k: 60          # RRF fusion parameter
  bm25_weight: 0.5   # Balance between BM25 and vector (0-1)

Re-index to generate embeddings:

rag index /path/to/content

Hybrid search combines BM25 (keyword matching) with vector similarity (semantic matching) using Reciprocal Rank Fusion (RRF).
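A minimal sketch of RRF fusion, under the assumption that bm25_weight is applied as a per-list multiplier (function and parameter names here are illustrative, not the package's API):

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges a BM25 ranking and a vector ranking with Reciprocal
// Rank Fusion: each list contributes weight / (k + rank) per chunk ID,
// where k is the rrf_k config value (60 by default).
func rrfFuse(bm25Ranked, vecRanked []string, k, bm25Weight float64) []string {
	scores := map[string]float64{}
	for rank, id := range bm25Ranked {
		scores[id] += bm25Weight / (k + float64(rank+1))
	}
	for rank, id := range vecRanked {
		scores[id] += (1 - bm25Weight) / (k + float64(rank+1))
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return scores[ids[i]] > scores[ids[j]] })
	return ids
}

func main() {
	fused := rrfFuse(
		[]string{"c1", "c2", "c3"}, // BM25 order
		[]string{"c3", "c1", "c4"}, // vector order
		60, 0.5)
	fmt.Println(fused) // c1 ranks first: near the top of both lists
}
```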

Semantic-Only Search

Use the --semantic flag to search using only vector embeddings (no BM25 keyword matching):

rag query -q "a noble man betrayed by those he trusted" --semantic

Semantic search is useful for:

  • Natural language questions (e.g., "how to handle errors gracefully")
  • Conceptual queries where exact keywords may not appear
  • Finding related content even when terminology differs

Requires embeddings to be enabled and indexed (see Hybrid Search section above).

How It Works

Indexing

  1. Walks directory with glob patterns
  2. Checks file modification times for incremental updates
  3. Splits files into line-based chunks with token awareness
  4. Tokenizes with optional Porter stemming
  5. Builds inverted index with term frequencies
  6. Stores in BoltDB (.rag/index.db)
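As a rough sketch of step 3, assuming a whitespace-based token count and illustrative names (the real chunker lives in internal/adapter/chunker):

```go
package main

import (
	"fmt"
	"strings"
)

// chunkLines groups consecutive lines into chunks of at most maxTokens
// (approximated here as whitespace-separated words), carrying roughly
// overlapTokens of trailing lines across each chunk boundary.
func chunkLines(text string, maxTokens, overlapTokens int) []string {
	lines := strings.Split(text, "\n")
	var chunks, cur []string
	tokens := 0
	for _, line := range lines {
		n := len(strings.Fields(line))
		if tokens+n > maxTokens && len(cur) > 0 {
			chunks = append(chunks, strings.Join(cur, "\n"))
			// walk back to retain ~overlapTokens of trailing context
			back, kept := 0, 0
			for j := len(cur) - 1; j >= 0 && kept < overlapTokens; j-- {
				kept += len(strings.Fields(cur[j]))
				back++
			}
			cur = append([]string{}, cur[len(cur)-back:]...)
			tokens = kept
		}
		cur = append(cur, line)
		tokens += n
	}
	if len(cur) > 0 {
		chunks = append(chunks, strings.Join(cur, "\n"))
	}
	return chunks
}

func main() {
	text := strings.Repeat("alpha beta gamma\n", 10)
	for i, c := range chunkLines(text, 12, 3) {
		fmt.Printf("chunk %d: %d lines\n", i, len(strings.Split(c, "\n")))
	}
}
```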

Retrieval

  1. Tokenizes and stems query
  2. Scores chunks using BM25:
    score(q,c) = Σ IDF(t) × (tf × (k1+1)) / (tf + k1 × (1-b + b×|c|/avgDl))
    
  3. Applies MMR for diversity:
    MMR(c) = λ × relevance(c) - (1-λ) × max_similarity(c, selected)
    
  4. Returns ranked, deduplicated results
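The two formulas above can be sketched as plain functions (illustrative names and a common IDF variant; the real implementations live in internal/adapter/retriever):

```go
package main

import (
	"fmt"
	"math"
)

// bm25Term scores one query term t against a chunk:
// IDF(t) × tf×(k1+1) / (tf + k1×(1-b + b×|c|/avgDl)),
// where tf is the term frequency in the chunk, df the number of chunks
// containing the term, and n the total chunk count. The IDF variant
// here is one common choice; the package may use another.
func bm25Term(tf, df, n, docLen, avgDl, k1, b float64) float64 {
	idf := math.Log(1 + (n-df+0.5)/(df+0.5))
	return idf * tf * (k1 + 1) / (tf + k1*(1-b+b*docLen/avgDl))
}

// mmr trades relevance against redundancy:
// λ × relevance(c) - (1-λ) × max_similarity(c, selected).
func mmr(lambda, relevance, maxSim float64) float64 {
	return lambda*relevance - (1-lambda)*maxSim
}

func main() {
	// defaults from the config: k1=1.2, b=0.75, mmr_lambda=0.7
	s := bm25Term(3, 10, 1000, 400, 500, 1.2, 0.75)
	fmt.Printf("bm25 term score: %.3f\n", s)
	fmt.Printf("mmr-adjusted:    %.3f\n", mmr(0.7, s, 0.4))
}
```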

Packing

  1. Calculates utility = score / token_count
  2. Greedily selects chunks by utility until budget exhausted
  3. Merges adjacent chunks from same file
  4. Outputs JSON with citations (path, line range, relevance)
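Steps 1–2 amount to a greedy selection by utility; a minimal sketch with hypothetical types (merging of adjacent chunks omitted for brevity):

```go
package main

import (
	"fmt"
	"sort"
)

type snippet struct {
	Path   string
	Score  float64
	Tokens int
}

// packGreedy sorts candidates by utility (score per token) and takes
// them in order until the next candidate would exceed the budget.
func packGreedy(cands []snippet, budget int) []snippet {
	sort.Slice(cands, func(i, j int) bool {
		return cands[i].Score/float64(cands[i].Tokens) >
			cands[j].Score/float64(cands[j].Tokens)
	})
	var out []snippet
	used := 0
	for _, c := range cands {
		if used+c.Tokens <= budget {
			out = append(out, c)
			used += c.Tokens
		}
	}
	return out
}

func main() {
	cands := []snippet{
		{"a.go", 2.0, 300}, // utility ≈ 0.0067
		{"b.go", 1.5, 100}, // utility 0.015 — picked first
		{"c.go", 0.4, 900}, // too large for the remaining budget
	}
	for _, s := range packGreedy(cands, 500) {
		fmt.Println(s.Path, s.Tokens)
	}
}
```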

Output Format

Packed Context JSON

{
  "query": "authentication",
  "budget_tokens": 4000,
  "used_tokens": 1250,
  "snippets": [
    {
      "path": "/src/auth/handler.go",
      "range": "L45-89",
      "why": "BM25 score: 2.34",
      "text": "func Authenticate(..."
    }
  ]
}
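If you consume the packed context from Go, a struct mirroring this shape might look like the following (type and function names here are illustrative, not exported by the package):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PackedContext mirrors the JSON emitted by `rag pack`.
type PackedContext struct {
	Query        string    `json:"query"`
	BudgetTokens int       `json:"budget_tokens"`
	UsedTokens   int       `json:"used_tokens"`
	Snippets     []Snippet `json:"snippets"`
}

// Snippet is one cited chunk: file path, line range, relevance note, text.
type Snippet struct {
	Path  string `json:"path"`
	Range string `json:"range"`
	Why   string `json:"why"`
	Text  string `json:"text"`
}

func parsePacked(raw string) (PackedContext, error) {
	var p PackedContext
	err := json.Unmarshal([]byte(raw), &p)
	return p, err
}

func main() {
	p, err := parsePacked(`{"query":"authentication","budget_tokens":4000,
	  "used_tokens":1250,"snippets":[{"path":"/src/auth/handler.go",
	  "range":"L45-89","why":"BM25 score: 2.34","text":"func Authenticate("}]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Snippets[0].Path, p.UsedTokens)
}
```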

WebAssembly (Browser)

RAG can run entirely in the browser via WebAssembly (BM25 search only, no embeddings).

Build WASM

make build-wasm
# Or manually:
GOOS=js GOARCH=wasm go build -o examples/wasm/rag.wasm ./cmd/wasm

Run Demo

cd examples/wasm
python3 -m http.server 8080
# Open http://localhost:8080

JavaScript API

// Index content
ragIndex("file.txt", "Your text content here...")

// Search (returns JSON string)
const results = JSON.parse(ragQuery("search term", 5))

// Clear index
ragClear()

// Get statistics
const stats = JSON.parse(ragStats())

See examples/wasm/README.md for details.


Architecture

cmd/rag/main.go          # Entrypoint
cmd/wasm/main.go         # WASM entrypoint
internal/
├── domain/              # Core entities (Document, Chunk, etc.)
├── port/                # Interfaces (IndexStore, Retriever, etc.)
├── usecase/             # Business logic
│   ├── index.go         # Indexing orchestration
│   ├── retrieve.go      # Search with BM25 + MMR
│   └── pack.go          # Context packing
└── adapter/
    ├── fs/              # File system walker
    ├── store/           # BoltDB implementation
    ├── analyzer/        # Tokenizer + Porter stemmer
    ├── chunker/         # Line-based chunking
    └── retriever/       # BM25 + MMR implementations

License

MIT
