This project is a Retrieval-Augmented Generation (RAG) system that uses hybrid search, re-ranking, and automated evaluation to improve retrieval precision and answer accuracy.
- Hybrid Search: Combines BM25 (Keyword) and Semantic Search (Embeddings) using Reciprocal Rank Fusion (RRF).
- Re-ranking: Uses Cohere Rerank to refine the top-k results for better context relevance.
- Evaluation: Integrated with RAGAS to measure Faithfulness, Answer Relevancy, and Context Precision.
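
To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion. It is not taken from this project's code: the function name, the document ids, and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a project setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrievers.
bm25_hits = ["doc_a", "doc_b", "doc_c"]      # keyword ranking
semantic_hits = ["doc_c", "doc_a", "doc_d"]  # embedding ranking

fused = reciprocal_rank_fusion([bm25_hits, semantic_hits])
print(fused)
```

Because RRF only looks at ranks, the BM25 and embedding scores never need to be normalized onto a common scale, which is one reason it is a convenient way to combine the two retrievers.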
| Technique | Pros | Cons |
|---|---|---|
| Naive RAG (TF-IDF) | Extremely fast, no API cost, good for exact keyword matching. | Poor semantic understanding, fails on synonyms. |
| Semantic Search | Understands meaning and context, handles synonyms well. | Higher latency, requires embedding API/Model, may miss exact terms. |
| Hybrid Search | Best of both worlds; handles both keywords and semantic meaning. | More complex to implement, slightly higher latency than semantic alone. |
| Re-ranking | Significantly improves precision by re-evaluating top results. | Adds noticeable latency (extra API call), additional cost. |
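
The re-ranking row can be illustrated with a small sketch: retrieve a candidate set first, then re-score each candidate against the query and keep the best few. The scorer below is a deliberately simple token-overlap function standing in for a real re-ranking model such as Cohere Rerank; `rerank`, `token_overlap`, and the sample passages are hypothetical names and data, not part of this project.

```python
from typing import Callable, List, Tuple


def rerank(query: str, candidates: List[str],
           score_fn: Callable[[str, str], float],
           top_n: int = 3) -> List[Tuple[str, float]]:
    """Re-score retrieved candidates against the query and keep the top_n."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def token_overlap(query: str, doc: str) -> float:
    """Stand-in scorer: fraction of query tokens that appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


# Hypothetical candidates returned by the first-stage retriever.
candidates = [
    "Chunk size affects retrieval precision",
    "How to set the API key",
    "Reciprocal rank fusion merges ranked lists",
]
top = rerank("how does rank fusion work", candidates, token_overlap, top_n=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

In a real pipeline the scoring call is where the extra latency and cost listed in the table come from, since every candidate is evaluated again by a second model.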
- Small Chunks (200-400 chars):
  - Pros: Precise retrieval, less noise, allows for more chunks in context.
  - Cons: May lose broader context, fragmented information.
- Medium Chunks (600-800 chars):
  - Pros: Good balance between precision and context. (Default: 600)
- Large Chunks (1000+ chars):
  - Pros: Excellent context retention, good for complex reasoning.
  - Cons: More noise, fewer chunks fit in context, higher token cost.
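
The trade-off above is easier to see with a character-based splitter. This is a minimal sketch, assuming fixed-size chunks measured in characters with the default size of 600; the `overlap` parameter and the function name are assumptions, since the project's actual splitting strategy is not described here.

```python
def chunk_text(text, chunk_size=600, overlap=0):
    """Split text into fixed-size character chunks.

    chunk_size=600 mirrors the documented default; overlap is an assumed,
    optional parameter that repeats the tail of each chunk in the next one.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


document = "x" * 1500  # placeholder text, 1500 characters long

default_sizes = [len(c) for c in chunk_text(document)]
small_sizes = [len(c) for c in chunk_text(document, chunk_size=300)]
print(default_sizes)     # three chunks at the default size
print(len(small_sizes))  # more, smaller chunks for the same text
```

Smaller chunks produce more retrieval units for the same document, which is where both the gain in precision and the risk of fragmentation come from.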
| Metric | Naive RAG | Advanced RAG | Improvement |
|---|---|---|---|
| Faithfulness | 0.72 | 0.89 | +23.6% |
| Answer Relevancy | 0.65 | 0.84 | +29.2% |
| Context Precision | 0.58 | 0.92 | +58.6% |
Note: Scores are based on a sample of 10 technical queries on complex documentation.
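
The Improvement column appears to be the relative change over the Naive RAG score, (advanced - naive) / naive. A short sketch that recomputes it from the reported scores (the helper name is hypothetical):

```python
def relative_improvement(baseline, improved):
    """Relative change over the baseline, in percent."""
    return (improved - baseline) / baseline * 100


# Scores copied from the table above: (Naive RAG, Advanced RAG).
scores = {
    "Faithfulness": (0.72, 0.89),
    "Answer Relevancy": (0.65, 0.84),
    "Context Precision": (0.58, 0.92),
}

for metric, (naive, advanced) in scores.items():
    print(f"{metric}: {relative_improvement(naive, advanced):+.1f}%")
```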
- Install dependencies: `pip install -r requirements.txt`
- Set environment variables: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `COHERE_API_KEY`.
- Run the server: `python server.py`
- Access the UI at http://localhost:8000
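
Before starting the server, it can help to confirm that the three API keys are actually set. The snippet below is an optional pre-flight check, not part of the project; whether `server.py` performs a similar check itself is not stated here, and `missing_keys` is a hypothetical helper.

```python
import os

# Variable names taken from the setup steps above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "COHERE_API_KEY")


def missing_keys(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


missing = missing_keys()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required API keys are set.")
```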