
Commit b764cde

unamedkr and claude committed

release: v0.4.3 — README update with v0.4 improvements + demo command
README (EN/KO) updates:

- Combined QA: 29% → 75% with 6 measurement-driven iterations
- Added `quantumrag demo` to Quick Start (one-command instant demo)
- Web Playground: pipeline trace, latency breakdown, query options
- Query Pipeline diagram: BM25 normalization, coherence boost, score blending, adaptive post-correction with time budget
- Updated auto-detect table (Gemini first, updated model names)
- Updated config example (Gemini provider, HyPE 4q, multiplier 5)
- Test count: 782 → 850

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 096ec6a commit b764cde

4 files changed: 85 additions & 59 deletions

README.ko.md

Lines changed: 34 additions & 20 deletions
@@ -62,20 +62,21 @@ pip install quantumrag[all]
 pip install quantumrag[korean]
 ```
 
-### CLI
+### Try Instantly
 
 ```bash
-# Initialize a project
-quantumrag init
-
-# Ingest documents
-quantumrag ingest ./docs --recursive
+quantumrag demo  # Built-in sample docs + server — try it in 30 seconds
+```
 
-# Ask a question
-quantumrag query "What chunking strategies are supported?"
+### CLI
 
-# Start the API server with the web playground
-quantumrag serve --port 8000
+```bash
+quantumrag init                               # Initialize a project
+quantumrag ingest ./docs --recursive          # Ingest documents
+quantumrag query "What are the key findings?" # Ask a question
+quantumrag chat                               # Interactive multi-turn chat
+quantumrag serve --port 8000                  # HTTP API + web playground
+quantumrag demo                               # One-command demo with sample content
 ```
 
 ### Local Models (no API key required)
@@ -93,13 +94,20 @@ result = engine.query("Summarize the documents")
 
 ## Web Playground
 
-Start the API server and the built-in web playground becomes available:
-
 ```bash
 quantumrag serve --port 8000
+# or try instantly with built-in sample content:
+quantumrag demo
 ```
 
-Open http://localhost:8000/playground to ingest documents and ask questions.
+Open http://localhost:8000 to use the interactive playground:
+
+- Upload documents (drag & drop) or paste text
+- Ask questions with real-time streaming or detailed mode
+- Inspect the **pipeline trace** with a latency breakdown for every step (retrieve/generate/other)
+- View **source citations** with relevance scores and excerpts
+- Adjust **query options** (top_k, rerank toggle, trace/stream mode)
+- Manage documents (list, delete)
 
 ![QuantumRAG Web Playground](assets/demo.ko.png)
 
@@ -148,11 +156,17 @@ pip install kiwipiepy  # required for Korean morphological analysis
 ├─ Query Rewrite / Expansion
 ├─ Entity Detection & Attribute Filtering
 ├─ Adaptive Routing (simple → nano, medium → mini, complex → full)
-├─ Triple Index Fusion Search (RRF: 0.4 / 0.35 / 0.25)
-├─ Reranking (FlashRank / BGE / Cohere / Jina)
-├─ Context Compression
+├─ Triple Index Fusion Search (Score-Weighted RRF: 0.4 / 0.35 / 0.25)
+│  ├─ BM25 Min-Max Normalization (preserves score discrimination)
+│  └─ Document Coherence Boost (+5% per same-document chunk)
+├─ Reranking + Score Blending (0.7 reranker + 0.3 preserved fusion signal)
+├─ Context Compression (75%, sentence-boundary aware)
 ├─ Source-Grounded Generation → Answer [1][2] + Confidence
-└─ Post-Correction (Retrieval Retry → Self-Correct → Fact Verify → Completeness)
+└─ Adaptive Post-Correction (time-budgeted, auto-skips simple queries)
+   ├─ Retrieval Retry (BM25-first re-search)
+   ├─ Self-Correction (pattern-based insufficient-answer detection)
+   ├─ Fact Verification (entity + numeric cross-check, zero LLM cost)
+   └─ Completeness Check (multi-part answer verification)
 ```
 
 ### Triple Index Fusion
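The adaptive post-correction stage in the diagram above runs its four steps under a wall-clock budget and is skipped outright for simple queries. A minimal sketch of that control flow, with hypothetical stage functions standing in for the real steps (the diff does not show their internals, and the 2-second default budget is an assumption):

```python
import time

# Placeholder stages named after the diagram; their bodies are illustrative
# no-ops, not quantumrag's actual implementations.
def retrieval_retry(answer): return answer      # BM25-first re-search
def self_correct(answer): return answer         # pattern-based insufficiency check
def fact_verify(answer): return answer          # entity + numeric cross-check
def completeness_check(answer): return answer   # multi-part answer verification

def post_correct(answer, complexity, budget_s=2.0):
    """Run correction stages only while the time budget lasts;
    simple queries skip post-correction entirely."""
    if complexity == "simple":
        return answer
    deadline = time.monotonic() + budget_s
    for stage in (retrieval_retry, self_correct, fact_verify, completeness_check):
        if time.monotonic() >= deadline:  # budget spent: stop early
            break
        answer = stage(answer)
    return answer
```

Because later stages run only if budget remains, a slow retrieval retry naturally crowds out the cheaper checks rather than blowing the latency target.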
@@ -271,9 +285,9 @@ print(result.summary)
 | ds-002 | Type system + cross-topic confusion | 25 | 88% |
 | ds-003 | Dense technical docs + cross-document | 30 | 83-87% |
 | ds-004 | Table extraction + contradiction detection | 30 | 77-90% |
-| **Combined** | **All sources merged (retrieval stress test)** | **105** | **29%** |
+| **Combined** | **All sources + 50 noise docs (retrieval stress test)** | **105** | **75%** |
 
-Combined QA result: retrieval precision is the key bottleneck (68 of 75 failures were retrieval-caused).
+Combined QA: improved from 29% to **75%** over 6 measure-and-improve loops: BM25 min-max normalization, Document Coherence Boost, Reranker Score Blending, and the full Triple Index (HyPE) enabled.
 
 ```bash
 # Individual dataset
@@ -346,7 +360,7 @@ datasets/  # QA datasets (4 datasets, 105 questions)
 ├── run_qa_combined.py # Combined retrieval stress test
 └── STATUS.md          # Auto-generated dashboard
 tests/
-├── unit/              # 782 unit tests
+├── unit/              # 850 unit tests
 ├── scenarios/         # 176 scenario tests (v1-v4)
 ├── security/          # SSRF, path traversal, injection tests
 └── scale/             # Scale testing framework

README.md

Lines changed: 49 additions & 37 deletions
@@ -62,23 +62,21 @@ pip install quantumrag[all]
 pip install quantumrag[korean]
 ```
 
-### CLI
+### Try Instantly
 
 ```bash
-# Initialize a project
-quantumrag init
-
-# Ingest documents
-quantumrag ingest ./docs --recursive
-
-# Ask a question
-quantumrag query "What chunking strategies are available?"
+quantumrag demo  # Built-in sample doc + server — try it in 30 seconds
+```
 
-# Interactive multi-turn chat
-quantumrag chat
+### CLI
 
-# Start HTTP API server with web playground
-quantumrag serve --port 8000
+```bash
+quantumrag init                           # Initialize a project
+quantumrag ingest ./docs --recursive      # Ingest documents
+quantumrag query "What are the findings?" # Ask a question
+quantumrag chat                           # Interactive multi-turn chat
+quantumrag serve --port 8000              # HTTP API + web playground
+quantumrag demo                           # One-command demo with sample content
 ```
 
 ### Zero Configuration
@@ -87,8 +85,8 @@ quantumrag serve --port 8000
 
 | Detected Key | Provider | Embedding | Generation |
 |-------------|----------|-----------|------------|
-| `OPENAI_API_KEY` | OpenAI | text-embedding-3-small | gpt-4.1-nano / gpt-4.1-mini |
-| `GOOGLE_API_KEY` | Gemini | text-embedding-004 | gemini-2.5-flash-lite / flash |
+| `GOOGLE_API_KEY` | Gemini | gemini-embedding-001 | gemini-3.1-flash-lite-preview |
+| `OPENAI_API_KEY` | OpenAI | text-embedding-3-small | gpt-5.4-nano / gpt-5.4-mini |
 | `ANTHROPIC_API_KEY` | Anthropic | local (bge-m3) | claude-haiku / claude-sonnet |
 | *(none)* | Ollama | local (bge-m3) | llama3.2:3b |
 
@@ -107,13 +105,20 @@ result = engine.query("Summarize the documents")
 
 ## Web Playground
 
-Start the API server and use the built-in web playground:
-
 ```bash
 quantumrag serve --port 8000
+# or try instantly with built-in sample content:
+quantumrag demo
 ```
 
-Open http://localhost:8000/playground to ingest documents and ask questions interactively.
+Open http://localhost:8000 to use the interactive playground:
+
+- Upload documents (drag & drop) or paste text
+- Ask questions with real-time streaming or detailed mode
+- Inspect **pipeline trace** — see every step (Retrieve/Generate/Other) with latency breakdown
+- View **source citations** with relevance scores and expandable excerpts
+- Adjust **query options** (top_k, rerank toggle, trace/stream mode)
+- Manage documents (list, delete)
 
 ![QuantumRAG Web Playground](assets/demo.png)
 
@@ -162,11 +167,17 @@ User Query
 ├─ Query Rewrite / Expansion
 ├─ Entity Detection & Attribute Filtering
 ├─ Adaptive Routing (simple → nano, medium → mini, complex → full)
-├─ Triple Index Fusion Search (RRF: 0.4 / 0.35 / 0.25)
-├─ Reranking (FlashRank / BGE / Cohere / Jina)
-├─ Context Compression
+├─ Triple Index Fusion Search (Score-Weighted RRF: 0.4 / 0.35 / 0.25)
+│  ├─ BM25 Min-Max Normalization (preserves score discrimination)
+│  └─ Document Coherence Boost (+5% per co-occurring chunk)
+├─ Reranking with Score Blending (0.7 reranker + 0.3 fusion signal)
+├─ Context Compression (75%, sentence-boundary aware)
 ├─ Source-Grounded Generation → Answer [1][2] + Confidence
-└─ Post-Correction (Retrieval Retry → Self-Correct → Fact Verify → Completeness)
+└─ Adaptive Post-Correction (time-budgeted, auto-skip for simple queries)
+   ├─ Retrieval Retry (BM25-dominant re-search)
+   ├─ Self-Correction (pattern-based insufficiency detection)
+   ├─ Fact Verification (entity + numeric cross-check, zero LLM cost)
+   └─ Completeness Check (multi-part answer verification)
 ```
 
 ### Triple Index Fusion
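The retrieval changes in this hunk (BM25 min-max normalization, score-weighted RRF over the three indexes, the +5% document coherence boost, and the 0.7/0.3 reranker blend) can be sketched roughly as below. This is an illustrative sketch only: the function names and data shapes are assumptions, not quantumrag's actual internals.

```python
from collections import defaultdict

def min_max_normalize(scores: dict) -> dict:
    """Scale raw BM25 scores into [0, 1] while preserving their spread."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def fuse(rankings: dict, weights: dict, k: int = 60) -> dict:
    """Weighted reciprocal rank fusion across per-index ranked lists."""
    fused = defaultdict(float)
    for index_name, ranked_chunks in rankings.items():
        for rank, chunk in enumerate(ranked_chunks, start=1):
            fused[chunk] += weights[index_name] / (k + rank)
    return dict(fused)

def coherence_boost(fused: dict, doc_of: dict, boost: float = 0.05) -> dict:
    """Boost a chunk +5% for each co-retrieved chunk from the same document."""
    per_doc = defaultdict(int)
    for chunk in fused:
        per_doc[doc_of[chunk]] += 1
    return {chunk: score * (1 + boost * (per_doc[doc_of[chunk]] - 1))
            for chunk, score in fused.items()}

def blend(reranker_score: float, fusion_score: float) -> float:
    """Blend reranker output with the preserved fusion signal (0.7 / 0.3)."""
    return 0.7 * reranker_score + 0.3 * fusion_score

# The README's weights: original 0.4, HyPE 0.35, BM25 0.25.
rankings = {
    "original": ["c1", "c2", "c3"],
    "hype": ["c2", "c1"],
    "bm25": ["c1", "c3"],
}
weights = {"original": 0.4, "hype": 0.35, "bm25": 0.25}
scores = coherence_boost(fuse(rankings, weights),
                         {"c1": "d1", "c2": "d1", "c3": "d2"})
```

The point of min-max normalizing BM25 before fusion is that raw BM25 magnitudes are query-dependent; rescaling to [0, 1] keeps the relative gaps between candidates meaningful when combined with the other indexes.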
@@ -216,36 +227,37 @@ Interactive API docs: `http://localhost:8000/docs`
 ```yaml
 # quantumrag.yaml
 project_name: "my-knowledge-base"
-language: "ko"               # ko, en, auto
+language: "auto"             # auto (detect from query), ko, en
 domain: "general"            # general, legal, medical, financial, technical
 
 models:
   embedding:
-    provider: "openai"       # openai, gemini, ollama, local
-    model: "text-embedding-3-small"
+    provider: "gemini"       # gemini, openai, ollama, local
+    model: "gemini-embedding-001"
   generation:
     simple:
-      provider: "openai"
-      model: "gpt-5.4-nano"                   # Low-cost for simple queries (~70%)
+      provider: "gemini"
+      model: "gemini-3.1-flash-lite-preview"  # Low-cost for simple queries (~70%)
     medium:
-      provider: "openai"
-      model: "gpt-5.4-mini"                   # Mid-tier for moderate queries (~20%)
+      provider: "gemini"
+      model: "gemini-3.1-flash-lite-preview"  # Mid-tier for moderate queries (~20%)
     complex:
-      provider: "anthropic"
-      model: "claude-sonnet-4-20250514"       # Full model for complex queries (~10%)
+      provider: "gemini"
+      model: "gemini-3.1-flash-lite-preview"  # Full model for complex queries (~10%)
   reranker:
     provider: "flashrank"    # flashrank (free/CPU), bge, cohere, jina
   hype:
-    provider: "openai"
-    model: "gpt-5.4-nano"
-    questions_per_chunk: 3
+    provider: "gemini"
+    model: "gemini-3.1-flash-lite-preview"
+    questions_per_chunk: 4
 
 retrieval:
   top_k: 7
   fusion_weights:
     original: 0.4
     hype: 0.35
     bm25: 0.25
+  fusion_candidate_multiplier: 5  # Candidates = top_k * multiplier
   rerank: true
   compression: true
 
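The new `fusion_candidate_multiplier` setting widens the candidate pool that feeds fusion and reranking, per the `Candidates = top_k * multiplier` comment in the config. A minimal sketch of that arithmetic (the helper name is hypothetical, not a quantumrag API):

```python
def candidate_pool_size(top_k: int, multiplier: int) -> int:
    """Each retrieval pass gathers top_k * multiplier candidates, so
    fusion and reranking choose from a wider pool than the final
    top_k results returned to the caller."""
    return top_k * multiplier

# With the config above (top_k: 7, fusion_candidate_multiplier: 5):
print(candidate_pool_size(7, 5))  # → 35
```

A wider pool gives the reranker more chances to surface a relevant chunk that ranked poorly in any single index, at the cost of reranking more candidates.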
@@ -287,9 +299,9 @@ Real-world web content used for systematic RAG validation:
 | ds-002 | Type system + cross-topic confusion | 25 | 88% |
 | ds-003 | Dense technical + cross-document | 30 | 83-87% |
 | ds-004 | Table extraction + contradiction detection | 30 | 77-90% |
-| **Combined** | **All sources merged (retrieval stress test)** | **105** | **29%** |
+| **Combined** | **All sources merged + 50 noise docs (retrieval stress test)** | **105** | **75%** |
 
-The Combined QA test reveals that retrieval precision is the key bottleneck at scale: 68 of 75 failures are retrieval-caused. This is the primary area for improvement.
+Combined QA improved from 29% to **75%** through 6 measurement-driven iterations: BM25 min-max normalization, document coherence boost, reranker score blending, and full Triple Index with HyPE.
 
 ```bash
 # Individual dataset
@@ -362,7 +374,7 @@ datasets/  # QA datasets (4 datasets, 105 questions)
 ├── run_qa_combined.py # Combined retrieval stress test
 └── STATUS.md          # Auto-generated dashboard
 tests/
-├── unit/              # 782 unit tests
+├── unit/              # 850 unit tests
 ├── scenarios/         # 176 scenario test cases (v1-v4)
 ├── security/          # SSRF, path traversal, injection tests
 └── scale/             # Scale testing framework

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "quantumrag"
-version = "0.4.2"
+version = "0.4.3"
 description = "Index-Heavy, Query-Light RAG Engine — Put in docs, ask questions, it just works."
 readme = "README.md"
 license = "Apache-2.0"

quantumrag/_version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.4.2"
+__version__ = "0.4.3"
