Commit e07abce

unamedkr and claude committed
docs: CHANGELOG + README + README.ko reference Working Memory Cliff tech report
Cross-links the Phase 1B tech report from the project's main entry points so visitors landing on README see the v2 update inline with the v1 7/7 vs 0/7 Beyond RAG result. The same content is reflected in:

- CHANGELOG.md: new [Unreleased] section with the cliff finding, the FP32-weights control, the compression-neutrality table, the synthesised-hallucination failure mode example, and the CLI seed bug fix entry.
- README.md: appends a "v2 update — the Working Memory Cliff" note immediately after the Beyond RAG honest disclaimer paragraph.
- README.ko.md: the same v2 follow-up paragraph in Korean.

No content changes elsewhere — just the cross-links so the v2 findings aren't buried in docs/paper/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 92b2f6c commit e07abce

File tree

3 files changed: +29 −1 lines changed

CHANGELOG.md

Lines changed: 24 additions & 0 deletions
@@ -1,5 +1,29 @@
 # Changelog
 
+## [Unreleased]
+
+### Research — Working Memory Cliff tech report (Phase 1B)
+
+We measured 204 NIAH trials across Llama-3.2-1B-Q8 and Llama-3.2-3B-Q4 to find where the "long-context replaces RAG" framing actually holds at edge-device scale. Both models exhibit a sharp cliff at less than 1% of their nominal 128K context window:
+
+- **Llama-3.2-1B-Q8**: 100% retrieval at ctx=512, 44% at ctx=1024, 0% by ctx=1536 (graded cliff)
+- **Llama-3.2-3B-Q4**: 100% at ctx=1024, 0% at ctx=1280 (**step-function cliff**, no degradation interval)
+
+A 6-trial FP32-weights control (`TQ_NO_Q4=1`) confirms the cliff sits in the **same place** when on-the-fly weight requantization is disabled — the cliff is a model property, not a quantization artifact. The 6.4× KV compression is bit-for-bit identical to the FP32 baseline in 18 of 20 cells, so the cliff is also independent of the KV cache.
+
+Above the cliff, the dominant failure mode is **synthesised hallucination** — the model fuses the planted needle into the haystack subject's biography (e.g., "In 2023 Boulter was hired as the chief financial officer..." where Boulter is the wikitext subject and Sarah Chen is the needle). This is the same silent-hallucination failure that vector RAG produces on a retrieval miss, occurring in the regime that was supposed to *eliminate* it.
+
+The honest reframing of v0.12's Beyond RAG result: it works for documents that fit in the model's *effective* working memory, which is two to three orders of magnitude smaller than the nominal context window for the configurations we measured.
+
+- 📄 Tech report: [`docs/paper/working-memory-cliff.md`](docs/paper/working-memory-cliff.md)
+- 📊 Master table: [`bench/results/niah/master_table.md`](bench/results/niah/master_table.md)
+- 🐦 Launch thread: [`docs/paper/twitter-thread.md`](docs/paper/twitter-thread.md)
+- 📝 HF blog draft: [`docs/paper/hf-blog-draft.md`](docs/paper/hf-blog-draft.md)
+
+### Fixed
+
+- **`-s <seed>` CLI flag**: documented in `--help` since the project's first release but never actually wired up. Passing `-s 42` previously fell through to the positional-arg branch and was parsed as a model path (`Loading model from 42... cannot open '42'`). Discovered while attempting a sampled NIAH seed sweep for the Working Memory Cliff tech report; fixed in `a8f6d8a`. Backwards compatible: callers that don't pass `-s` get bit-identical behaviour.
+
 ## [0.12.0] — 2026-04-11 — Beyond RAG
 
 > **Chunking RAG was a workaround for small context windows.**

README.ko.md

Lines changed: 2 additions & 0 deletions
@@ -58,6 +58,8 @@ When Chunk-RAG retrieves the wrong section, the model does not say "I don't know" but
 
 **The key point**: KV compression is not just a memory saving — it enables a **fundamentally different RAG approach**. RAG decides "which document to look at"; long-context decides "how deeply to understand that document". Full results: [bench/results/document_level_rag_breakthrough.md](bench/results/document_level_rag_breakthrough.md)
 
+> **v2 follow-up — Working Memory Cliff (2026-04-11)**: We extended the v1 result to a larger measurement grid (1B/3B models, ctx 256–2048, 204 NIAH trials plus an FP32-weights control experiment). Both models show a sharp cliff at **less than 1%** of the nominal 128K context window (1B Q8 cliff at 512–1024, 3B Q4 cliff at 1024–1280 as a **step function**). The 6.4× KV compression matches the FP32 baseline bit-for-bit in 18 of 20 cells — the cliff is a model property, not a KV/weight quantization artifact. The honest reinterpretation: Beyond RAG only works for documents that fit in the *effective* working memory, which is one hundredth to one thousandth of the nominal context window. Full tech report: [`docs/paper/working-memory-cliff.md`](docs/paper/working-memory-cliff.md). HuggingFace blog post draft: [`docs/paper/hf-blog-draft.md`](docs/paper/hf-blog-draft.md).
+
 ---
 
 ## Why quant.cpp?

README.md

Lines changed: 3 additions & 1 deletion
@@ -149,7 +149,9 @@ The bug was using the same tool for both. The fix is using each for what it's go
 **Full benchmark report:** [bench/results/document_level_rag_breakthrough.md](bench/results/document_level_rag_breakthrough.md)
 **Manifesto:** [docs/beyond-rag-manifesto.md](docs/beyond-rag-manifesto.md)
 
-> **Honest disclaimer:** v1 is a synthetic 5-section document with 7 questions on a single 3B model. We're not claiming this is LongBench. We *are* claiming it's enough to start a conversation about the failure mode chunk-RAG has been hiding. v2 with real benchmarks is in progress.
+> **Honest disclaimer:** v1 is a synthetic 5-section document with 7 questions on a single 3B model. We're not claiming this is LongBench. We *are* claiming it's enough to start a conversation about the failure mode chunk-RAG has been hiding.
+
+> **v2 update — the Working Memory Cliff (2026-04-11):** We followed up the v1 result with 204 NIAH trials across 1B and 3B at context lengths 256–2048, plus a 6-trial FP32-weights control. Both models hit a sharp cliff at **less than 1% of their nominal 128K context window** (1B Q8 at 512–1024, 3B Q4 at 1024–1280 *as a step function*). The 6.4× KV compression is bit-for-bit identical to the FP32 baseline in 18 of 20 cells, so the cliff is a model property — not a KV property and not a weight-quantization artifact. The honest reframing: Beyond RAG works for documents that fit in the model's *effective* working memory, which is 2–3 orders of magnitude smaller than the nominal context window. Full tech report: [`docs/paper/working-memory-cliff.md`](docs/paper/working-memory-cliff.md). HF blog post draft: [`docs/paper/hf-blog-draft.md`](docs/paper/hf-blog-draft.md).
 
 ---
