
Commit b764cde

unamedkr and claude committed

release: v0.4.3 — README update with v0.4 improvements + demo command
README (EN/KO) updates:

- Combined QA: 29% → 75% with 6 measurement-driven iterations
- Added `quantumrag demo` to Quick Start (one-command instant demo)
- Web Playground: pipeline trace, latency breakdown, query options
- Query Pipeline diagram: BM25 normalization, coherence boost, score blending, adaptive post-correction with time budget
- Updated auto-detect table (Gemini first, updated model names)
- Updated config example (Gemini provider, HyPE 4q, multiplier 5)
- Test count: 782 → 850

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 096ec6a commit b764cde

4 files changed: 85 additions & 59 deletions

README.ko.md

Lines changed: 34 additions & 20 deletions
@@ -62,20 +62,21 @@ pip install quantumrag[all]
 pip install quantumrag[korean]
 ```
 
-### CLI
+### Try Instantly
 
 ```bash
-# Initialize a project
-quantumrag init
-
-# Ingest documents
-quantumrag ingest ./docs --recursive
+quantumrag demo  # Built-in sample docs + server — try it in 30 seconds
+```
 
-# Ask a question
-quantumrag query "What chunking strategies are supported?"
+### CLI
 
-# Start the API server with the web playground
-quantumrag serve --port 8000
+```bash
+quantumrag init                               # Initialize a project
+quantumrag ingest ./docs --recursive          # Ingest documents
+quantumrag query "What are the key findings?" # Ask a question
+quantumrag chat                               # Interactive multi-turn chat
+quantumrag serve --port 8000                  # HTTP API + web playground
+quantumrag demo                               # One-command demo with sample content
 ```
 
 ### Local Models (no API key required)
@@ -93,13 +94,20 @@ result = engine.query("Summarize the documents")
 
 ## Web Playground
 
-Start the API server and the built-in web playground becomes available:
-
 ```bash
 quantumrag serve --port 8000
+# or try instantly with built-in sample content:
+quantumrag demo
 ```
 
-Open http://localhost:8000/playground to ingest documents and ask questions.
+Open http://localhost:8000 to use the interactive playground:
+
+- Upload documents (drag & drop) or paste text
+- Ask questions with real-time streaming or detailed mode
+- Inspect the **pipeline trace** with a latency breakdown for every step (retrieve/generate/other)
+- View **source citations** with relevance scores and excerpts
+- Adjust **query options** (top_k, rerank toggle, trace/stream mode)
+- Manage documents (list, delete)
 
 ![QuantumRAG Web Playground](assets/demo.ko.png)
 
@@ -148,11 +156,17 @@ pip install kiwipiepy  # required for Korean morphological analysis
 ├─ Query Rewrite / Expansion
 ├─ Entity Detection & Attribute Filtering
 ├─ Adaptive Routing (simple → nano, medium → mini, complex → full)
-├─ Triple Index Fusion Search (RRF: 0.4 / 0.35 / 0.25)
-├─ Reranking (FlashRank / BGE / Cohere / Jina)
-├─ Context Compression
+├─ Triple Index Fusion Search (Score-Weighted RRF: 0.4 / 0.35 / 0.25)
+│  ├─ BM25 Min-Max Normalization (preserves score discrimination)
+│  └─ Document Coherence Boost (+5% per same-document chunk)
+├─ Reranking + Score Blending (0.7 reranker + 0.3 preserved fusion signal)
+├─ Context Compression (75%, sentence-boundary aware)
 ├─ Source-Grounded Generation → Answer [1][2] + Confidence
-└─ Post-Correction (Retrieval Retry → Self-Correct → Fact Verify → Completeness)
+└─ Adaptive Post-Correction (time-budgeted, auto-skips simple queries)
+   ├─ Retrieval Retry (BM25-first re-search)
+   ├─ Self-Correction (pattern-based insufficient-answer detection)
+   ├─ Fact Verification (entity + numeric cross-check, zero LLM cost)
+   └─ Completeness Check (multi-part answer verification)
 ```
 
 ### Triple Index Fusion
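The adaptive post-correction stage in the diagram above runs its four steps under a wall-clock budget and is skipped outright for simple queries. A minimal sketch of that control flow, with hypothetical stage functions standing in for the real steps (the diff does not show their internals, and the 2-second default budget is an assumption):

```python
import time

# Placeholder stages named after the diagram; their bodies are illustrative
# no-ops, not quantumrag's actual implementations.
def retrieval_retry(answer): return answer      # BM25-first re-search
def self_correct(answer): return answer         # pattern-based insufficiency check
def fact_verify(answer): return answer          # entity + numeric cross-check
def completeness_check(answer): return answer   # multi-part answer verification

def post_correct(answer, complexity, budget_s=2.0):
    """Run correction stages only while the time budget lasts;
    simple queries skip post-correction entirely."""
    if complexity == "simple":
        return answer
    deadline = time.monotonic() + budget_s
    for stage in (retrieval_retry, self_correct, fact_verify, completeness_check):
        if time.monotonic() >= deadline:  # budget spent: stop early
            break
        answer = stage(answer)
    return answer
```

Because later stages run only if budget remains, a slow retrieval retry naturally crowds out the cheaper checks rather than blowing the latency target.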
@@ -271,9 +285,9 @@ print(result.summary)
 | ds-002 | Type system + cross-topic confusion | 25 | 88% |
 | ds-003 | Dense technical docs + cross-document | 30 | 83-87% |
 | ds-004 | Table extraction + contradiction detection | 30 | 77-90% |
-| **Combined** | **All sources merged (retrieval stress test)** | **105** | **29%** |
+| **Combined** | **All sources + 50 noise docs (retrieval stress test)** | **105** | **75%** |
 
-Combined QA result: retrieval precision is the key bottleneck (68 of 75 failures were retrieval-caused).
+Combined QA: improved from 29% to **75%** over 6 measure-and-improve loops: BM25 min-max normalization, Document Coherence Boost, Reranker Score Blending, and the full Triple Index (HyPE) enabled.
 
 ```bash
 # Individual dataset
@@ -346,7 +360,7 @@ datasets/  # QA datasets (4 datasets, 105 questions)
 ├── run_qa_combined.py # Combined retrieval stress test
 └── STATUS.md          # Auto-generated dashboard
 tests/
-├── unit/              # 782 unit tests
+├── unit/              # 850 unit tests
 ├── scenarios/         # 176 scenario tests (v1-v4)
 ├── security/          # SSRF, path traversal, injection tests
 └── scale/             # Scale testing framework

README.md

Lines changed: 49 additions & 37 deletions
@@ -62,23 +62,21 @@ pip install quantumrag[all]
 pip install quantumrag[korean]
 ```
 
-### CLI
+### Try Instantly
 
 ```bash
-# Initialize a project
-quantumrag init
-
-# Ingest documents
-quantumrag ingest ./docs --recursive
-
-# Ask a question
-quantumrag query "What chunking strategies are available?"
+quantumrag demo  # Built-in sample doc + server — try it in 30 seconds
+```
 
-# Interactive multi-turn chat
-quantumrag chat
+### CLI
 
-# Start HTTP API server with web playground
-quantumrag serve --port 8000
+```bash
+quantumrag init                           # Initialize a project
+quantumrag ingest ./docs --recursive      # Ingest documents
+quantumrag query "What are the findings?" # Ask a question
+quantumrag chat                           # Interactive multi-turn chat
+quantumrag serve --port 8000              # HTTP API + web playground
+quantumrag demo                           # One-command demo with sample content
 ```
 
 ### Zero Configuration
@@ -87,8 +85,8 @@ quantumrag serve --port 8000
 
 | Detected Key | Provider | Embedding | Generation |
 |-------------|----------|-----------|------------|
-| `OPENAI_API_KEY` | OpenAI | text-embedding-3-small | gpt-4.1-nano / gpt-4.1-mini |
-| `GOOGLE_API_KEY` | Gemini | text-embedding-004 | gemini-2.5-flash-lite / flash |
+| `GOOGLE_API_KEY` | Gemini | gemini-embedding-001 | gemini-3.1-flash-lite-preview |
+| `OPENAI_API_KEY` | OpenAI | text-embedding-3-small | gpt-5.4-nano / gpt-5.4-mini |
 | `ANTHROPIC_API_KEY` | Anthropic | local (bge-m3) | claude-haiku / claude-sonnet |
 | *(none)* | Ollama | local (bge-m3) | llama3.2:3b |
 
@@ -107,13 +105,20 @@ result = engine.query("Summarize the documents")
 
 ## Web Playground
 
-Start the API server and use the built-in web playground:
-
 ```bash
 quantumrag serve --port 8000
+# or try instantly with built-in sample content:
+quantumrag demo
 ```
 
-Open http://localhost:8000/playground to ingest documents and ask questions interactively.
+Open http://localhost:8000 to use the interactive playground:
+
+- Upload documents (drag & drop) or paste text
+- Ask questions with real-time streaming or detailed mode
+- Inspect **pipeline trace** — see every step (Retrieve/Generate/Other) with latency breakdown
+- View **source citations** with relevance scores and expandable excerpts
+- Adjust **query options** (top_k, rerank toggle, trace/stream mode)
+- Manage documents (list, delete)
 
 ![QuantumRAG Web Playground](assets/demo.png)
 
@@ -162,11 +167,17 @@ User Query
 ├─ Query Rewrite / Expansion
 ├─ Entity Detection & Attribute Filtering
 ├─ Adaptive Routing (simple → nano, medium → mini, complex → full)
-├─ Triple Index Fusion Search (RRF: 0.4 / 0.35 / 0.25)
-├─ Reranking (FlashRank / BGE / Cohere / Jina)
-├─ Context Compression
+├─ Triple Index Fusion Search (Score-Weighted RRF: 0.4 / 0.35 / 0.25)
+│  ├─ BM25 Min-Max Normalization (preserves score discrimination)
+│  └─ Document Coherence Boost (+5% per co-occurring chunk)
+├─ Reranking with Score Blending (0.7 reranker + 0.3 fusion signal)
+├─ Context Compression (75%, sentence-boundary aware)
 ├─ Source-Grounded Generation → Answer [1][2] + Confidence
-└─ Post-Correction (Retrieval Retry → Self-Correct → Fact Verify → Completeness)
+└─ Adaptive Post-Correction (time-budgeted, auto-skip for simple queries)
+   ├─ Retrieval Retry (BM25-dominant re-search)
+   ├─ Self-Correction (pattern-based insufficiency detection)
+   ├─ Fact Verification (entity + numeric cross-check, zero LLM cost)
+   └─ Completeness Check (multi-part answer verification)
 ```
 
 ### Triple Index Fusion
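The retrieval changes in this hunk (BM25 min-max normalization, score-weighted RRF over the three indexes, the +5% document coherence boost, and the 0.7/0.3 reranker blend) can be sketched roughly as below. This is an illustrative sketch only: the function names and data shapes are assumptions, not quantumrag's actual internals.

```python
from collections import defaultdict

def min_max_normalize(scores: dict) -> dict:
    """Scale raw BM25 scores into [0, 1] while preserving their spread."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def fuse(rankings: dict, weights: dict, k: int = 60) -> dict:
    """Weighted reciprocal rank fusion across per-index ranked lists."""
    fused = defaultdict(float)
    for index_name, ranked_chunks in rankings.items():
        for rank, chunk in enumerate(ranked_chunks, start=1):
            fused[chunk] += weights[index_name] / (k + rank)
    return dict(fused)

def coherence_boost(fused: dict, doc_of: dict, boost: float = 0.05) -> dict:
    """Boost a chunk +5% for each co-retrieved chunk from the same document."""
    per_doc = defaultdict(int)
    for chunk in fused:
        per_doc[doc_of[chunk]] += 1
    return {chunk: score * (1 + boost * (per_doc[doc_of[chunk]] - 1))
            for chunk, score in fused.items()}

def blend(reranker_score: float, fusion_score: float) -> float:
    """Blend reranker output with the preserved fusion signal (0.7 / 0.3)."""
    return 0.7 * reranker_score + 0.3 * fusion_score

# The README's weights: original 0.4, HyPE 0.35, BM25 0.25.
rankings = {
    "original": ["c1", "c2", "c3"],
    "hype": ["c2", "c1"],
    "bm25": ["c1", "c3"],
}
weights = {"original": 0.4, "hype": 0.35, "bm25": 0.25}
scores = coherence_boost(fuse(rankings, weights),
                         {"c1": "d1", "c2": "d1", "c3": "d2"})
```

The point of min-max normalizing BM25 before fusion is that raw BM25 magnitudes are query-dependent; rescaling to [0, 1] keeps the relative gaps between candidates meaningful when combined with the other indexes.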
@@ -216,36 +227,37 @@ Interactive API docs: `http://localhost:8000/docs`
 ```yaml
 # quantumrag.yaml
 project_name: "my-knowledge-base"
-language: "ko"               # ko, en, auto
+language: "auto"             # auto (detect from query), ko, en
 domain: "general"            # general, legal, medical, financial, technical
 
 models:
   embedding:
-    provider: "openai"       # openai, gemini, ollama, local
-    model: "text-embedding-3-small"
+    provider: "gemini"       # gemini, openai, ollama, local
+    model: "gemini-embedding-001"
   generation:
     simple:
-      provider: "openai"
-      model: "gpt-5.4-nano"                   # Low-cost for simple queries (~70%)
+      provider: "gemini"
+      model: "gemini-3.1-flash-lite-preview"  # Low-cost for simple queries (~70%)
     medium:
-      provider: "openai"
-      model: "gpt-5.4-mini"                   # Mid-tier for moderate queries (~20%)
+      provider: "gemini"
+      model: "gemini-3.1-flash-lite-preview"  # Mid-tier for moderate queries (~20%)
     complex:
-      provider: "anthropic"
-      model: "claude-sonnet-4-20250514"       # Full model for complex queries (~10%)
+      provider: "gemini"
+      model: "gemini-3.1-flash-lite-preview"  # Full model for complex queries (~10%)
   reranker:
     provider: "flashrank"    # flashrank (free/CPU), bge, cohere, jina
   hype:
-    provider: "openai"
-    model: "gpt-5.4-nano"
-    questions_per_chunk: 3
+    provider: "gemini"
+    model: "gemini-3.1-flash-lite-preview"
+    questions_per_chunk: 4
 
 retrieval:
   top_k: 7
   fusion_weights:
     original: 0.4
     hype: 0.35
     bm25: 0.25
+  fusion_candidate_multiplier: 5  # Candidates = top_k * multiplier
   rerank: true
   compression: true
 
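The new `fusion_candidate_multiplier` setting widens the candidate pool that feeds fusion and reranking, per the `Candidates = top_k * multiplier` comment in the config. A minimal sketch of that arithmetic (the helper name is hypothetical, not a quantumrag API):

```python
def candidate_pool_size(top_k: int, multiplier: int) -> int:
    """Each retrieval pass gathers top_k * multiplier candidates, so
    fusion and reranking choose from a wider pool than the final
    top_k results returned to the caller."""
    return top_k * multiplier

# With the config above (top_k: 7, fusion_candidate_multiplier: 5):
print(candidate_pool_size(7, 5))  # → 35
```

A wider pool gives the reranker more chances to surface a relevant chunk that ranked poorly in any single index, at the cost of reranking more candidates.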
@@ -287,9 +299,9 @@ Real-world web content used for systematic RAG validation:
 | ds-002 | Type system + cross-topic confusion | 25 | 88% |
 | ds-003 | Dense technical + cross-document | 30 | 83-87% |
 | ds-004 | Table extraction + contradiction detection | 30 | 77-90% |
-| **Combined** | **All sources merged (retrieval stress test)** | **105** | **29%** |
+| **Combined** | **All sources merged + 50 noise docs (retrieval stress test)** | **105** | **75%** |
 
-The Combined QA test reveals that retrieval precision is the key bottleneck at scale: 68 of 75 failures are retrieval-caused. This is the primary area for improvement.
+Combined QA improved from 29% to **75%** through 6 measurement-driven iterations: BM25 min-max normalization, document coherence boost, reranker score blending, and full Triple Index with HyPE.
 
 ```bash
 # Individual dataset
@@ -362,7 +374,7 @@ datasets/  # QA datasets (4 datasets, 105 questions)
 ├── run_qa_combined.py # Combined retrieval stress test
 └── STATUS.md          # Auto-generated dashboard
 tests/
-├── unit/              # 782 unit tests
+├── unit/              # 850 unit tests
 ├── scenarios/         # 176 scenario test cases (v1-v4)
 ├── security/          # SSRF, path traversal, injection tests
 └── scale/             # Scale testing framework

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "quantumrag"
-version = "0.4.2"
+version = "0.4.3"
 description = "Index-Heavy, Query-Light RAG Engine — Put in docs, ask questions, it just works."
 readme = "README.md"
 license = "Apache-2.0"

quantumrag/_version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.4.2"
+__version__ = "0.4.3"
