Following up on the design paper posted here earlier — we ran an empirical test of the central claim.
Setup: a 78-turn conversation (~6,400 words) with natural noise accumulation (topic shifts, corrections, abandoned approaches, off-topic tangents), fed to Llama 3.1 8B via Ollama. Ten fact-retrieval questions were tested under two conditions: the full flat-log context vs. a curated thread-only context.
Result: 43.3% accuracy (full context) vs. 100% accuracy (curated context).
The model hallucinated facts, denied that information existed in the conversation, and picked up superseded details from the noise. These are exactly the failure modes predicted by the paper.
The takeaway for kernel-level orchestration: a well-curated short context dramatically outperforms a noisy long context, even when the long context fits comfortably within the model's window. Context selection dominates context length.
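For concreteness, here is a minimal sketch of how the two conditions differ. The turn and thread structure below is hypothetical (the real harness is in the linked repo); the point is only that the curated condition filters the flat log down to the turns relevant to the question being asked:

```python
# Sketch of the two context conditions: full flat log vs. curated thread.
# Turn/thread tagging is a hypothetical illustration, not the actual harness.

def build_contexts(turns, thread):
    """Return (full flat-log context, curated thread-only context).

    `turns` is a list of dicts like {"thread": str, "text": str};
    `thread` selects the turns relevant to the current question.
    """
    full = "\n".join(t["text"] for t in turns)
    curated = "\n".join(t["text"] for t in turns if t["thread"] == thread)
    return full, curated

# Noise (tangents, superseded statements from other threads) is dropped
# from the curated view; the relevant thread, including its own
# corrections, is kept in order.
turns = [
    {"thread": "budget",  "text": "The Q3 budget is $40k."},
    {"thread": "tangent", "text": "Unrelated: lunch at noon?"},
    {"thread": "budget",  "text": "Correction: the Q3 budget is $45k."},
]
full, curated = build_contexts(turns, "budget")
```

Both strings would then be sent as context to the same model with the same question; the only variable is which turns survive curation.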
Full results, methodology, and reproducible code: github.com/MikeyBeez/fuzzyOS/discussions/2