Context
Local branch `tune/agent-prompts` carries a single commit that compresses `agents/code-reviewer.md` from 266 lines to 36 (roughly an 86% reduction) into a density-matched subagent prompt. The commit has been sitting since 2026-04-14 and was deliberately set aside during the ADR-123 firing-dynamics work (PRs #48, #49) so the prompt-compression refactor wouldn't mix signals with the architectural work in flight. Shipping both in the same window would have made it impossible to tell which change was responsible for any behavioral shift.
The rationale per Aaron: a more cohesive scoring system should land before testing compression, so the experiment runs against the new scoring rather than the old.
Empirical baseline we already have
The verbose 266-line version was invoked against PR #49 late in the session that merged it. It produced:
That's the bar the dense version needs to match or exceed.
Proposed A/B
Check out `tune/agent-prompts` and re-invoke the reviewer against the same PR (#49). Save the output.
Compare:
Finding count and distribution by category (correctness / safety / design / nit)
False-positive rate
File:line citation precision
Adherence to the report structure
Total tokens consumed (cheaper is only a win if quality holds)
Any regression in edge-case coverage vs the verbose version
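The comparison dimensions above can be tallied mechanically once each finding has been hand-labeled. The following is a minimal sketch, not part of the existing tooling: the `Finding` record, its field names, and `summarize` are hypothetical, and it assumes a human has already marked each finding as a true or false positive and checked its file:line citation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One reviewer finding after manual triage (hypothetical schema)."""
    category: str        # "correctness", "safety", "design", or "nit"
    true_positive: bool  # assumed to be hand-labeled during triage
    has_citation: bool   # True if the finding cites a valid file:line


def summarize(findings):
    """Tally count, category distribution, false-positive rate, and citation rate."""
    total = len(findings)
    false_pos = sum(1 for f in findings if not f.true_positive)
    cited = sum(1 for f in findings if f.has_citation)
    return {
        "count": total,
        "by_category": dict(Counter(f.category for f in findings)),
        "true_positives": total - false_pos,
        "false_positive_rate": false_pos / total if total else 0.0,
        "citation_rate": cited / total if total else 0.0,
    }


# Placeholder data for illustration only; these are not results from PR #49.
example = [
    Finding("correctness", True, True),
    Finding("nit", False, True),
    Finding("design", True, False),
    Finding("nit", True, True),
]
print(summarize(example))
```

Running `summarize` once on the verbose output and once on the dense output gives two comparable records. Token totals and report-structure adherence are not covered here and would need to be recorded separately.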
Accept / reject criteria
Ship compressed if it reproduces at least 90% of the baseline's true-positive findings, its false-positive rate is no higher than the baseline's, and the compressed prompt is easier to read and maintain.
Keep verbose if the dense version drops findings, misclassifies, or loses citation precision. Token savings are not sufficient justification if quality regresses.
Iterate on dense if it's close but not matching. The 36-line version is likely not the final answer; it's a directional experiment.
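One way to read the three outcomes above as a decision rule is sketched below. The 90% threshold and the false-positive condition come from the criteria. The `near_miss` cutoff for "close but not matching" is an assumption, since the criteria do not quantify it, and the readability judgment is left to a human. The function name and parameters are hypothetical.

```python
def decide(baseline_tp, reproduced_tp, baseline_fp_rate, dense_fp_rate, near_miss=0.8):
    """Map A/B results to "ship compressed", "iterate on dense", or "keep verbose".

    baseline_tp:   true-positive findings from the verbose prompt
    reproduced_tp: how many of those the dense prompt also found
    near_miss:     assumed cutoff for "close but not matching" (not from the criteria)
    """
    if baseline_tp <= 0:
        raise ValueError("baseline must contain at least one true-positive finding")

    # Integer comparison avoids float rounding at exactly 90%.
    meets_recall = 10 * reproduced_tp >= 9 * baseline_tp
    fp_ok = dense_fp_rate <= baseline_fp_rate

    if meets_recall and fp_ok:
        return "ship compressed"
    if reproduced_tp / baseline_tp >= near_miss:
        return "iterate on dense"
    return "keep verbose"


# Illustrative inputs only, not measured results.
print(decide(baseline_tp=10, reproduced_tp=9, baseline_fp_rate=0.2, dense_fp_rate=0.2))
```

This sketch treats a dense run that hits 90% but has a worse false-positive rate as "iterate" rather than "keep verbose". That is a judgment call, and the stricter reading is equally defensible.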
Notes on sequencing
Do NOT merge `tune/agent-prompts` before running the A/B. That would make the dense version the default and forfeit the comparison.
Do NOT delete `tune/agent-prompts` before the experiment. The compressed version is meaningful work that represents a directional hypothesis about density-matched prompts.
The branch has one commit, 7 hours old as of merge, and a single-file diff. It's cheap to carry indefinitely.