Local RAG

A Retrieval-Augmented Generation (RAG) system that runs entirely on your own machine. Chat with your documents without sending any data to an external service.

Features

Privacy — Zero external API calls. All processing happens locally.
GPU Accelerated — CUDA-powered embeddings for fast vectorization.
Hybrid Search — Combines semantic (vector) + keyword (BM25) retrieval with Reciprocal Rank Fusion.
Re-ranking — FlashRank cross-encoder for improved relevance.
Multi-Format — Supports PDF, Markdown, Text, and Word documents.
Configurable — YAML-based configuration for all parameters.
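To illustrate the Hybrid Search feature, here is a minimal sketch of Reciprocal Rank Fusion over two ranked result lists. It is not taken from this project's `src/vectorstore.py`; the function name, the document IDs, and the constant `k=60` (a commonly used default for RRF) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used constant,
    not a value read from this project's configuration.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the two retrievers for one query.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists (`doc_b`, `doc_a`) end up ahead of documents that appear in only one, which is the property that makes RRF useful for combining vector and BM25 results without normalizing their raw scores.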

Quick Start

Prerequisites

Python 3.11 or 3.12 (3.13 not yet supported)
NVIDIA GPU with CUDA (recommended for performance)
Ollama installed and running

1. Clone & Setup

# Clone the repository
git clone https://github.com/yourusername/local-rag.git
cd local-rag

# Create virtual environment
python -m venv .venv

# Activate (Windows PowerShell)
.venv\Scripts\Activate.ps1

# Activate (Linux/macOS)
source .venv/bin/activate

2. Install Dependencies

# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install project dependencies
pip install -r requirements.txt
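After installing, it can help to confirm that PyTorch actually sees the GPU before ingesting anything. The snippet below is a small check, not part of the project; the function name `cuda_status` is hypothetical. It only uses `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
def cuda_status():
    """Return a short message describing whether PyTorch can see a CUDA GPU."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but CUDA is not available"


print(cuda_status())
```

If the last message appears on a machine with an NVIDIA GPU, a CPU-only build of `torch` is a likely cause; reinstalling with the CUDA index URL shown above usually resolves it.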

3. Setup Ollama

# Start the Ollama server (skip if it is already running)
ollama serve

# In a separate terminal, pull the LLM model
ollama pull llama3
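To confirm that the server is reachable and the model has been pulled, a quick check against Ollama's HTTP API can be useful. This sketch assumes the default address `http://localhost:11434` and uses the `/api/tags` endpoint, which lists locally available models. The helper name `list_ollama_models` is hypothetical and not part of this project.

```python
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=3.0):
    """Return the names of locally available models, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return None
    return [m.get("name", "") for m in payload.get("models", [])]


models = list_ollama_models()
if models is None:
    print("Ollama is not reachable; try `ollama serve`")
elif not any(name.startswith("llama3") for name in models):
    print("Server is up, but llama3 has not been pulled yet")
else:
    print("Ollama is ready")
```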

4. Ingest Documents

# Add your documents to data/documents/ then run:
python main.py ingest ./data/documents

# Or ingest with reset (clears existing vectors)
python main.py ingest ./data/documents --reset
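During ingestion, documents are split into chunks before embedding. As a rough illustration of what a character-based `chunk_size` of 500 means, here is a minimal fixed-window splitter. It is a simplification and not the project's `src/ingest.py`: the `overlap` parameter is an assumption (it does not appear in the configuration shown in this README), and real splitters usually prefer to break on paragraph or sentence boundaries.

```python
def chunk_text(text, chunk_size=500, overlap=0):
    """Split text into windows of at most chunk_size characters.

    overlap is a hypothetical parameter: the number of characters each
    chunk shares with the previous one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


chunks = chunk_text("x" * 1200)
print([len(c) for c in chunks])
```

For a 1200-character input this yields three chunks of 500, 500, and 200 characters.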

5. Start Chatting

python main.py chat

Usage

CLI Commands

Command	Description
`python main.py ingest <path>`	Ingest documents from a file or directory
`python main.py ingest <path> --reset`	Clear vector store and re-ingest
`python main.py chat`	Start interactive chat session
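For readers curious how a command layout like this can be wired up, the following is a sketch using the standard library's `argparse`. It mirrors the commands in the table above but is not the project's actual `main.py`, which may use a different CLI library; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser with `ingest <path> [--reset]` and `chat` subcommands."""
    parser = argparse.ArgumentParser(prog="main.py", description="Local RAG CLI")
    sub = parser.add_subparsers(dest="command", required=True)

    ingest = sub.add_parser("ingest", help="Ingest documents from a file or directory")
    ingest.add_argument("path", help="File or directory to ingest")
    ingest.add_argument("--reset", action="store_true",
                        help="Clear the vector store before ingesting")

    sub.add_parser("chat", help="Start an interactive chat session")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["ingest", "./data/documents", "--reset"])
print(args.command, args.path, args.reset)
```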

Chat Commands

Command	Description
`/help`	Show available commands
`/exit`	Exit the chat

Configuration

All settings are managed in config/config.yaml:

llm:
  model: "llama3"              # Ollama model to use
  temperature: 0.0             # 0 = deterministic responses

embedding:
  model_name: "sentence-transformers/all-MiniLM-L6-v2"
  device: "cuda"               # Use GPU for embeddings

retrieval:
  chunk_size: 500              # Characters per chunk
  k_retrieved: 20              # Candidates for re-ranking
  k_final: 5                   # Final context to LLM
  use_reranker: true           # Enable FlashRank
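The retrieval values interact: `k_retrieved` candidates are fetched, and at most `k_final` of them reach the LLM, so `k_final` should not exceed `k_retrieved`. The sketch below shows how such constraints could be checked at load time. The project validates its config with Pydantic (see `src/config.py`); this example deliberately uses a standard-library dataclass instead so it runs without dependencies, and the specific rules are assumptions rather than the project's actual checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalConfig:
    """Mirror of the `retrieval` section, with assumed sanity checks."""
    chunk_size: int = 500
    k_retrieved: int = 20
    k_final: int = 5
    use_reranker: bool = True

    def __post_init__(self):
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if self.k_final <= 0:
            raise ValueError("k_final must be positive")
        if self.k_final > self.k_retrieved:
            raise ValueError("k_final cannot exceed k_retrieved")


# Stand-in for the `retrieval` section after the YAML file has been parsed.
raw = {"chunk_size": 500, "k_retrieved": 20, "k_final": 5, "use_reranker": True}
cfg = RetrievalConfig(**raw)
print(cfg)
```

Failing fast on an inconsistent config is generally easier to debug than discovering an empty or truncated context at query time.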

Architecture

Documents --> Ingestion --> Chunks --> Embedding (CUDA) --> ChromaDB

User Question --> Query Embedding --> Retrieval (Hybrid + RRF) --> Re-ranking (FlashRank) --> Ollama --> Answer + Sources
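The query path narrows results in two stages, governed by `k_retrieved`, `k_final`, and `use_reranker` from the configuration. The sketch below illustrates that funnel under stated assumptions: `answer_context` is a hypothetical function, and the scorer is a toy term-overlap count standing in for the FlashRank cross-encoder, whose API is not shown here.

```python
def answer_context(candidates, rerank_score, k_retrieved=20, k_final=5,
                   use_reranker=True):
    """Narrow retrieved chunks down to the context passed to the LLM.

    candidates: chunks already ordered by the hybrid retriever, best first.
    rerank_score: callable returning a relevance score for one chunk.
    """
    pool = candidates[:k_retrieved]
    if use_reranker:
        # sorted() is stable, so chunks with equal scores keep retrieval order.
        pool = sorted(pool, key=rerank_score, reverse=True)
    return pool[:k_final]


query_terms = {"gpu", "memory"}


def overlap(chunk):
    # Toy stand-in for a cross-encoder: count query terms present in the chunk.
    return len(query_terms & set(chunk.lower().split()))


chunks = [
    "Install the drivers first",
    "GPU memory usage grows with batch size",
    "Memory is freed after ingestion",
    "Chat commands are listed with /help",
]
print(answer_context(chunks, overlap, k_retrieved=3, k_final=2))
```

With the re-ranker disabled, the function simply returns the first `k_final` retrieved chunks, which is why a larger `k_retrieved` only pays off when re-ranking is on.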

Project Structure

local-rag/
├── main.py                 # CLI entry point
├── config/
│   └── config.yaml         # Configuration file
├── src/
│   ├── config.py           # Config validation (Pydantic)
│   ├── ingest.py           # Document loading & chunking
│   ├── vectorstore.py      # ChromaDB + embeddings + retrieval
│   ├── rag.py              # RAG orchestration
│   └── utils.py            # VRAM monitoring utilities
├── data/
│   ├── documents/          # Your documents go here
│   └── chroma_db/          # Persisted vector store
├── tests/                  # Test suite
└── requirements.txt        # Dependencies

Testing

# Run all tests
pytest tests/ -v

# Run with coverage
pytest tests/ --cov=src --cov-report=html

Troubleshooting

Issue	Solution
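CUDA not available	Ensure up-to-date NVIDIA drivers are installed and that PyTorch was installed with the CUDA wheels from step 2 (a CPU-only build of `torch` reports CUDA as unavailable)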
Ollama connection failed	Run `ollama serve` in a separate terminal
Python version error	Use Python 3.11 or 3.12
Out of memory	Reduce `k_retrieved` in the config or use a smaller embedding model

License

MIT License - see LICENSE for details.
