A tool for indexing and searching text using hybrid search (BM25 + vector embeddings), with MMR deduplication and context packing for LLM consumption. Runs as a CLI or in-browser via WebAssembly.
```shell
$ rag query -q "How did Ned Stark die" -index ./books -expand
```

Ned Stark was executed by beheading after being accused of treason. King Joffrey, despite initially suggesting Ned could take the black, ordered his execution. Ser Ilyn Payne, the King's Justice, carried out the sentence at the steps of the Great Sept of Baelor [1.txt:16974-16996].
~3k tokens (with RAG) vs ~2M tokens (without RAG)
```shell
$ rag query -q "Joffrey death" -index ./books -expand
```

Joffrey was murdered by poison at his own wedding feast. The poison used is identified as "the strangler," a rare substance that causes the throat muscles to clench, shutting off the windpipe and turning the victim's face purple [2.txt:L472-495]. During Tyrion's trial, Grand Maester Pycelle confirms that the strangler was used to kill Joffrey [3.txt:L22842-22868].
~4k tokens (with RAG) vs ~2M tokens (without RAG)
```shell
$ rag query -q "Red Wedding Robb Stark murdered" -index ./books -expand
```

Robb Stark was betrayed and murdered by the Freys and Boltons at the Twins during his uncle's wedding, an event known as the Red Wedding [4.txt:21098-21138].
~3k tokens (with RAG) vs ~2M tokens (without RAG)
```shell
$ rag query -q "How did Drogo die" -index ./books -expand
```

Drogo died after being placed in a comatose state by a bloodmagic ritual performed by Mirri Maz Duur. The ritual involved sacrificing his horse and using its blood, but it left Drogo alive yet unresponsive [1.txt:L16544-16620]. Mirri Maz Duur states that Drogo will only return to his former self under impossible conditions, implying he will never recover [1.txt:L17745-17772].
~5k tokens (with RAG) vs ~2M tokens (without RAG)
Responses generated using DeepSeek with hybrid search (BM25 + embeddings)
```shell
go build -o rag ./cmd/rag
```

First, create a rag.yaml config in your content directory:
```yaml
# books/rag.yaml
index:
  includes:
    - "**/*.txt"
  chunk_tokens: 512
  stemming: true
retrieve:
  top_k: 20
  mmr_lambda: 0.7
pack:
  token_budget: 4000
```

```shell
# Index a directory
rag index /path/to/books

# Search for relevant content
rag query -q "authentication handler"

# Pack context for LLM consumption
rag pack -q "how does auth work" -b 4000 -o context.json

# Generate a prompt for manual LLM orchestration
rag runprompt --runtime --ctx context.json -q "Explain the auth flow"
```

Index files in a directory for later retrieval. Creates a `.rag/index.db` file.
```shell
rag index .                  # Index current directory
rag index /path/to/project   # Index specific directory
```

Flags:

- `-d, --dir` - Root directory (default: current directory)
- `--config` - Path to config file (default: `./rag.yaml`)
Search indexed files using BM25 retrieval with MMR deduplication.
```shell
rag query -q "database connection"
rag query -q "error handling" --top-k 10 --json
rag query -q "how to handle errors" --semantic
```

Flags:

- `-q, --query` - Search query (required)
- `-k, --top-k` - Number of results (default from config)
- `--json` - Output as JSON
- `--no-mmr` - Disable MMR reranking
- `--semantic` - Use embedding-only search (no BM25)
- `-c, --context` - Expand results by N lines before/after
Pack relevant chunks into compressed context that fits a token budget.
```shell
rag pack -q "authentication flow" -b 2000
rag pack -q "API endpoints" -o context.json
```

Flags:

- `-q, --query` - Search query (required)
- `-b, --budget` - Token budget (default from config)
- `-o, --output` - Output file (default: stdout)
- `-k, --top-k` - Candidate pool size
Generate formatted prompts from templates for manual LLM orchestration.
```shell
# Runtime prompt for question answering
rag runprompt --runtime --ctx context.json -q "How does auth work?"

# Builder prompt for context compression
rag runprompt --builder --ctx context.json
```

Flags:

- `--runtime` - Use runtime (answering) prompt template
- `--builder` - Use builder (compression) prompt template
- `--ctx` - Path to packed context JSON file (required)
- `-q, --query` - Override query for runtime prompt
Create a rag.yaml file in your project root:
```yaml
index:
  includes:
    - "**/*.go"
    - "**/*.py"
    - "**/*.js"
    - "**/*.ts"
    - "**/*.md"
  excludes:
    - "**/node_modules/**"
    - "**/vendor/**"
    - "**/.git/**"
  stemming: true
  chunk_tokens: 512
  chunk_overlap: 50
  k1: 1.2
  b: 0.75
retrieve:
  top_k: 20
  mmr_lambda: 0.7
  dedup_jaccard: 0.8
pack:
  token_budget: 4000
  output: json
logging:
  level: info
```

| Section | Option | Description | Default |
|---|---|---|---|
| index | includes | Glob patterns for files to index | Common code extensions |
| index | excludes | Glob patterns to exclude | node_modules, vendor, .git |
| index | stemming | Enable Porter stemming | true |
| index | chunk_tokens | Max tokens per chunk | 512 |
| index | chunk_overlap | Token overlap between chunks | 50 |
| index | k1 | BM25 k1 parameter | 1.2 |
| index | b | BM25 b parameter | 0.75 |
| retrieve | top_k | Default number of results | 20 |
| retrieve | mmr_lambda | MMR relevance vs diversity (0-1) | 0.7 |
| retrieve | dedup_jaccard | Jaccard threshold for dedup | 0.8 |
| pack | token_budget | Default token budget | 4000 |
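The `dedup_jaccard` option drops near-duplicate results by comparing the Jaccard similarity of their token sets. A minimal Go sketch of that comparison (illustrative only; the `jaccard` helper is hypothetical, not the tool's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// jaccard returns |A ∩ B| / |A ∪ B| over the two texts' token sets.
func jaccard(a, b string) float64 {
	setA := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(a)) {
		setA[t] = true
	}
	inter := 0
	setB := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(b)) {
		if setB[t] {
			continue // count each distinct token once
		}
		setB[t] = true
		if setA[t] {
			inter++
		}
	}
	union := len(setA) + len(setB) - inter
	if union == 0 {
		return 0
	}
	return float64(inter) / float64(union)
}

func main() {
	s := jaccard("open the database connection", "close the database connection")
	fmt.Printf("%.2f\n", s) // 3 shared tokens out of 5 distinct → 0.60
	// A chunk is dropped only when similarity exceeds the threshold (default 0.8).
	fmt.Println(s > 0.8) // false
}
```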
To enable semantic search alongside BM25 keyword search, install Ollama and pull an embedding model:
```shell
# Install Ollama (macOS)
brew install ollama

# Start Ollama server
ollama serve

# Pull embedding model
ollama pull nomic-embed-text
```

Then add embedding config to your rag.yaml:
```yaml
embedding:
  enabled: true
  provider: ollama
  model: nomic-embed-text
  dimension: 768
retrieve:
  hybrid_enabled: true
  rrf_k: 60         # RRF fusion parameter
  bm25_weight: 0.5  # Balance between BM25 and vector (0-1)
```

Re-index to generate embeddings:

```shell
rag index /path/to/content
```

Hybrid search combines BM25 (keyword matching) with vector similarity (semantic matching) using Reciprocal Rank Fusion (RRF).
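The fusion step can be sketched in Go. This is an illustrative weighted RRF variant, not the tool's actual implementation; `rrfFuse` is a hypothetical helper, with `k` playing the role of `rrf_k` and the weight that of `bm25_weight`:

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges two ranked lists of chunk IDs using weighted
// Reciprocal Rank Fusion: score(d) = Σ w_i / (k + rank_i(d)).
func rrfFuse(bm25, vec []string, k, bm25Weight float64) []string {
	scores := map[string]float64{}
	for i, id := range bm25 {
		scores[id] += bm25Weight / (k + float64(i+1))
	}
	for i, id := range vec {
		scores[id] += (1 - bm25Weight) / (k + float64(i+1))
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Highest fused score first.
	sort.Slice(ids, func(a, b int) bool { return scores[ids[a]] > scores[ids[b]] })
	return ids
}

func main() {
	bm25 := []string{"c1", "c2", "c3"} // keyword ranking
	vec := []string{"c2", "c4", "c1"}  // vector ranking
	fmt.Println(rrfFuse(bm25, vec, 60, 0.5)) // [c2 c1 c4 c3]
}
```

Chunks that appear near the top of both rankings (like `c2` here) win out over chunks ranked highly by only one retriever.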
Use the --semantic flag to search using only vector embeddings (no BM25 keyword matching):

```shell
rag query -q "a noble man betrayed by those he trusted" --semantic
```

Semantic search is useful for:
- Natural language questions (e.g., "how to handle errors gracefully")
- Conceptual queries where exact keywords may not appear
- Finding related content even when terminology differs
Requires embeddings to be enabled and indexed (see Hybrid Search section above).
- Walks directory with glob patterns
- Checks file modification times for incremental updates
- Splits files into line-based chunks with token awareness
- Tokenizes with optional Porter stemming
- Builds inverted index with term frequencies
- Stores in BoltDB (`.rag/index.db`)
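The chunking step above (line-based, token-aware, with overlap) can be sketched roughly like this; `chunkLines` is a hypothetical helper that uses whitespace tokens as a stand-in for the real tokenizer:

```go
package main

import (
	"fmt"
	"strings"
)

// chunkLines splits text into line-based chunks, capping each chunk at
// maxTokens and restarting the next chunk overlapLines earlier so that
// adjacent chunks share context.
func chunkLines(text string, maxTokens, overlapLines int) []string {
	lines := strings.Split(text, "\n")
	var chunks []string
	start := 0
	for start < len(lines) {
		tokens, end := 0, start
		for end < len(lines) {
			tokens += len(strings.Fields(lines[end]))
			end++
			if tokens >= maxTokens {
				break
			}
		}
		chunks = append(chunks, strings.Join(lines[start:end], "\n"))
		if end == len(lines) {
			break
		}
		next := end - overlapLines
		if next <= start {
			next = start + 1 // always make forward progress
		}
		start = next
	}
	return chunks
}

func main() {
	text := "one two three\nfour five six\nseven eight nine\nten eleven twelve"
	// 3 tokens per line, 6-token chunks, 1 line of overlap → 3 chunks.
	for i, c := range chunkLines(text, 6, 1) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```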
- Tokenizes and stems query
- Scores chunks using BM25: `score(q,c) = Σ IDF(t) × (tf × (k1+1)) / (tf + k1 × (1-b + b×|c|/avgDl))`
- Applies MMR for diversity: `MMR(c) = λ × relevance(c) - (1-λ) × max_similarity(c, selected)`
- Returns ranked, deduplicated results
- Calculates utility = score / token_count
- Greedily selects chunks by utility until budget exhausted
- Merges adjacent chunks from same file
- Outputs JSON with citations (path, line range, relevance)
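The greedy selection in the steps above can be sketched as follows (hypothetical `packGreedy` helper, not the tool's code; merging of adjacent chunks is omitted):

```go
package main

import (
	"fmt"
	"sort"
)

type snippet struct {
	ID     string
	Score  float64
	Tokens int
}

// packGreedy sorts candidates by utility = score / token_count and
// takes each one that still fits in the remaining token budget.
func packGreedy(cands []snippet, budget int) []string {
	sort.Slice(cands, func(a, b int) bool {
		return cands[a].Score/float64(cands[a].Tokens) >
			cands[b].Score/float64(cands[b].Tokens)
	})
	var picked []string
	used := 0
	for _, c := range cands {
		if used+c.Tokens <= budget {
			picked = append(picked, c.ID)
			used += c.Tokens
		}
	}
	return picked
}

func main() {
	cands := []snippet{
		{"a", 2.4, 300}, // utility 0.008
		{"b", 2.0, 100}, // utility 0.020 — best value per token
		{"c", 1.0, 200}, // utility 0.005
	}
	// b (100) then a (300) fit in 450 tokens; c would overflow.
	fmt.Println(packGreedy(cands, 450)) // [b a]
}
```

Dividing by token count means a short, moderately relevant chunk can beat a long, slightly more relevant one, which keeps the packed context dense.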
```json
{
  "query": "authentication",
  "budget_tokens": 4000,
  "used_tokens": 1250,
  "snippets": [
    {
      "path": "/src/auth/handler.go",
      "range": "L45-89",
      "why": "BM25 score: 2.34",
      "text": "func Authenticate(..."
    }
  ]
}
```

RAG can run entirely in the browser via WebAssembly (BM25 search only, no embeddings).
```shell
make build-wasm

# Or manually:
GOOS=js GOARCH=wasm go build -o examples/wasm/rag.wasm ./cmd/wasm
```

```shell
cd examples/wasm
python3 -m http.server 8080
# Open http://localhost:8080
```

```js
// Index content
ragIndex("file.txt", "Your text content here...")

// Search (returns JSON string)
const results = JSON.parse(ragQuery("search term", 5))

// Clear index
ragClear()

// Get statistics
const stats = JSON.parse(ragStats())
```

See examples/wasm/README.md for details.
```
cmd/rag/main.go    # Entrypoint
cmd/wasm/main.go   # WASM entrypoint
internal/
├── domain/        # Core entities (Document, Chunk, etc.)
├── port/          # Interfaces (IndexStore, Retriever, etc.)
├── usecase/       # Business logic
│   ├── index.go       # Indexing orchestration
│   ├── retrieve.go    # Search with BM25 + MMR
│   └── pack.go        # Context packing
└── adapter/
    ├── fs/            # File system walker
    ├── store/         # BoltDB implementation
    ├── analyzer/      # Tokenizer + Porter stemmer
    ├── chunker/       # Line-based chunking
    └── retriever/     # BM25 + MMR implementations
```
MIT