@@ -62,23 +62,21 @@ pip install quantumrag[all]
6262pip install quantumrag[korean]
6363```
6464
65- ### CLI
65+ ### Try Instantly
6666
6767``` bash
68- # Initialize a project
69- quantumrag init
70-
71- # Ingest documents
72- quantumrag ingest ./docs --recursive
73-
74- # Ask a question
75- quantumrag query "What chunking strategies are available?"
68+ quantumrag demo # Built-in sample doc + server: try it in 30 seconds
69+ ```
7670
77- # Interactive multi-turn chat
78- quantumrag chat
71+ ### CLI
7972
80- # Start HTTP API server with web playground
81- quantumrag serve --port 8000
73+ ``` bash
74+ quantumrag init # Initialize a project
75+ quantumrag ingest ./docs --recursive # Ingest documents
76+ quantumrag query "What are the findings?" # Ask a question
77+ quantumrag chat # Interactive multi-turn chat
78+ quantumrag serve --port 8000 # HTTP API + web playground
79+ quantumrag demo # One-command demo with sample content
8280```
8381
8482### Zero Configuration
@@ -87,8 +85,8 @@ quantumrag serve --port 8000
8785
8886| Detected Key | Provider | Embedding | Generation |
8987| -------------| ----------| -----------| ------------|
90- | `OPENAI_API_KEY` | OpenAI | text-embedding-3-small | gpt-4.1-nano / gpt-4.1-mini |
91- | `GOOGLE_API_KEY` | Gemini | text-embedding-004 | gemini-2.5-flash-lite / flash |
88+ | `GOOGLE_API_KEY` | Gemini | gemini-embedding-001 | gemini-3.1-flash-lite-preview |
89+ | `OPENAI_API_KEY` | OpenAI | text-embedding-3-small | gpt-5.4-nano / gpt-5.4-mini |
9290| `ANTHROPIC_API_KEY` | Anthropic | local (bge-m3) | claude-haiku / claude-sonnet |
9391| *(none)* | Ollama | local (bge-m3) | llama3.2:3b |
9492
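The key-detection order in the table above can be sketched as a simple priority scan over environment variables. This is an illustrative approximation only; `detect_provider` and `PROVIDER_TABLE` are hypothetical names, not QuantumRAG's actual API:

```python
import os

# Priority order mirrors the table: first matching key wins, with Ollama
# as the keyless local fallback. Model names are taken from the table;
# the real library may resolve them differently.
PROVIDER_TABLE = [
    ("GOOGLE_API_KEY", ("gemini", "gemini-embedding-001", "gemini-3.1-flash-lite-preview")),
    ("OPENAI_API_KEY", ("openai", "text-embedding-3-small", "gpt-5.4-nano")),
    ("ANTHROPIC_API_KEY", ("anthropic", "bge-m3", "claude-haiku")),
]

def detect_provider(env=os.environ):
    """Return (provider, embedding_model, generation_model) for the first key found."""
    for key, config in PROVIDER_TABLE:
        if env.get(key):
            return config
    return ("ollama", "bge-m3", "llama3.2:3b")  # no key set: fully local
```

Note that with several keys set, the first match in the table decides, which is why `GOOGLE_API_KEY` takes precedence in this sketch.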
@@ -107,13 +105,20 @@ result = engine.query("Summarize the documents")
107105
108106## Web Playground
109107
110- Start the API server and use the built-in web playground:
111-
112108``` bash
113109quantumrag serve --port 8000
110+ # or try instantly with built-in sample content:
111+ quantumrag demo
114112```
115113
116- Open http://localhost:8000/playground to ingest documents and ask questions interactively.
114+ Open http://localhost:8000 to use the interactive playground:
115+
116+ - Upload documents (drag & drop) or paste text
117+ - Ask questions with real-time streaming or detailed mode
118+ - Inspect **pipeline trace**: see every step (Retrieve/Generate/Other) with latency breakdown
119+ - View **source citations** with relevance scores and expandable excerpts
120+ - Adjust **query options** (top_k, rerank toggle, trace/stream mode)
121+ - Manage documents (list, delete)
117122
118123![ QuantumRAG Web Playground] ( assets/demo.png )
119124
@@ -162,11 +167,17 @@ User Query
162167 ├─ Query Rewrite / Expansion
163168 ├─ Entity Detection & Attribute Filtering
164169 ├─ Adaptive Routing (simple → nano, medium → mini, complex → full)
165- ├─ Triple Index Fusion Search (RRF: 0.4 / 0.35 / 0.25)
166- ├─ Reranking (FlashRank / BGE / Cohere / Jina)
167- ├─ Context Compression
170+ ├─ Triple Index Fusion Search (Score-Weighted RRF: 0.4 / 0.35 / 0.25)
171+ │ ├─ BM25 Min-Max Normalization (preserves score discrimination)
172+ │ └─ Document Coherence Boost (+5% per co-occurring chunk)
173+ ├─ Reranking with Score Blending (0.7 reranker + 0.3 fusion signal)
174+ ├─ Context Compression (75%, sentence-boundary aware)
168175 ├─ Source-Grounded Generation → Answer [1][2] + Confidence
169- └─ Post-Correction (Retrieval Retry → Self-Correct → Fact Verify → Completeness)
176+ └─ Adaptive Post-Correction (time-budgeted, auto-skip for simple queries)
177+      ├─ Retrieval Retry (BM25-dominant re-search)
178+      ├─ Self-Correction (pattern-based insufficiency detection)
179+      ├─ Fact Verification (entity + numeric cross-check, zero LLM cost)
180+      └─ Completeness Check (multi-part answer verification)
170181```
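The fusion and blending numbers in the diagram above can be sketched as plain Python. This is a minimal illustration of the named techniques with hypothetical function names, not QuantumRAG's internal implementation:

```python
def min_max_normalize(scores):
    """Scale raw BM25 scores into [0, 1] so their spread survives fusion."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def weighted_rrf(rankings, weights, k=60):
    """Score-weighted Reciprocal Rank Fusion over per-index rankings.

    rankings: {index_name: [doc_id, ...]} ordered best-first
    weights:  {index_name: float}, e.g. original 0.4 / hype 0.35 / bm25 0.25
    """
    fused = {}
    for name, ranked in rankings.items():
        w = weights[name]
        for rank, doc in enumerate(ranked):
            # Classic RRF term 1/(k + rank), scaled by the index weight.
            fused[doc] = fused.get(doc, 0.0) + w / (k + rank + 1)
    return fused

def blend(reranker_score, fusion_score, alpha=0.7):
    """Final score = 0.7 * reranker + 0.3 * fusion signal, as in the diagram."""
    return alpha * reranker_score + (1 - alpha) * fusion_score
```

With the README's weights (0.4 / 0.35 / 0.25), a document ranked first in the original index and in BM25 outscores one ranked first only in HyPE, which is the intended bias toward the dense index.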
171182
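The zero-LLM fact verification step can be approximated by cross-checking numbers and capitalized entities in the answer against the retrieved sources. A rough sketch under that assumption (the library's actual checks are likely richer):

```python
import re

NUM_RE = re.compile(r"\d+(?:\.\d+)?%?")          # integers, decimals, percentages
ENTITY_RE = re.compile(r"\b[A-Z][A-Za-z0-9-]+\b")  # naive capitalized-entity match

def verify_answer(answer, sources):
    """Return (ok, unsupported): claims in the answer absent from all sources.

    Pure string matching over numbers and entities; no LLM call involved,
    which is what makes this check effectively free.
    """
    corpus = " ".join(sources).lower()
    claims = set(NUM_RE.findall(answer)) | set(ENTITY_RE.findall(answer))
    unsupported = {c for c in claims if c.lower() not in corpus}
    return not unsupported, unsupported
```

A hallucinated figure like a percentage that appears in no retrieved chunk is flagged immediately, at which point the pipeline can retry retrieval or lower confidence.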
172183### Triple Index Fusion
@@ -216,36 +227,37 @@ Interactive API docs: `http://localhost:8000/docs`
216227``` yaml
217228# quantumrag.yaml
218229project_name: "my-knowledge-base"
219- language: "ko" # ko, en, auto
230+ language: "auto" # auto (detect from query), ko, en
220231domain: "general" # general, legal, medical, financial, technical
221232
222233models:
223234  embedding:
224-     provider: "openai" # openai, gemini, ollama, local
225-     model: "text-embedding-3-small"
235+     provider: "gemini" # gemini, openai, ollama, local
236+     model: "gemini-embedding-001"
226237  generation:
227238    simple:
228-       provider: "openai"
229-       model: "gpt-5.4-nano" # Low-cost for simple queries (~70%)
239+       provider: "gemini"
240+       model: "gemini-3.1-flash-lite-preview" # Low-cost for simple queries (~70%)
230241    medium:
231-       provider: "openai"
232-       model: "gpt-5.4-mini" # Mid-tier for moderate queries (~20%)
242+       provider: "gemini"
243+       model: "gemini-3.1-flash-lite-preview" # Mid-tier for moderate queries (~20%)
233244    complex:
234-       provider: "anthropic"
235-       model: "claude-sonnet-4-20250514" # Full model for complex queries (~10%)
245+       provider: "gemini"
246+       model: "gemini-3.1-flash-lite-preview" # Full model for complex queries (~10%)
236247  reranker:
237248    provider: "flashrank" # flashrank (free/CPU), bge, cohere, jina
238249  hype:
239-     provider: "openai"
240-     model: "gpt-5.4-nano"
241-     questions_per_chunk: 3
250+     provider: "gemini"
251+     model: "gemini-3.1-flash-lite-preview"
252+     questions_per_chunk: 4
242253
243254retrieval:
244255  top_k: 7
245256  fusion_weights:
246257    original: 0.4
247258    hype: 0.35
248259    bm25: 0.25
260+   fusion_candidate_multiplier: 5 # Candidates = top_k * multiplier
249261  rerank: true
250262  compression: true
251263
@@ -287,9 +299,9 @@ Real-world web content used for systematic RAG validation:
287299| ds-002 | Type system + cross-topic confusion | 25 | 88% |
288300| ds-003 | Dense technical + cross-document | 30 | 83-87% |
289301| ds-004 | Table extraction + contradiction detection | 30 | 77-90% |
290- | **Combined** | **All sources merged (retrieval stress test)** | **105** | **29%** |
302+ | **Combined** | **All sources merged + 50 noise docs (retrieval stress test)** | **105** | **75%** |
291303
292- The Combined QA test reveals that retrieval precision is the key bottleneck at scale: 68 of 75 failures are retrieval-caused. This is the primary area for improvement.
304+ Combined QA improved from 29% to **75%** through six measurement-driven iterations: BM25 min-max normalization, document coherence boost, reranker score blending, and full Triple Index with HyPE.
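The document coherence boost credited above can be sketched in a few lines. This is a hypothetical helper for illustration, assuming each retrieved chunk carries a `doc_id`; QuantumRAG's internals may differ:

```python
from collections import Counter

def coherence_boost(candidates, rate=0.05):
    """Boost each chunk by +5% per other surviving chunk from the same document.

    candidates: list of (chunk_id, doc_id, score) tuples.
    Chunks whose siblings also made it through retrieval are likely
    on-topic, so each score is multiplied by 1 + rate * (siblings - 1).
    """
    per_doc = Counter(doc_id for _, doc_id, _ in candidates)
    return [
        (chunk_id, doc_id, score * (1 + rate * (per_doc[doc_id] - 1)))
        for chunk_id, doc_id, score in candidates
    ]
```

In a noisy merged corpus this favors documents that answer a query with several related chunks over isolated lexical matches, which is one plausible reason it helps the Combined stress test.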
293305
294306` ` ` bash
295307# Individual dataset
@@ -362,7 +374,7 @@ datasets/ # QA datasets (4 datasets, 105 questions)
362374├── run_qa_combined.py # Combined retrieval stress test
363375└── STATUS.md # Auto-generated dashboard
364376tests/
365- ├── unit/ # 782 unit tests
377+ ├── unit/ # 850 unit tests
366378├── scenarios/ # 176 scenario test cases (v1-v4)
367379├── security/ # SSRF, path traversal, injection tests
368380└── scale/ # Scale testing framework