feat(health): per-model context-rot warnings [GET-28] #57
Pure agent. Single source of truth for per-model context-rot warn / critical thresholds and coaching copy. Grounded in two Anthropic-published MRCR scores (Opus 4.6 = 76%, Sonnet 4.5 = 18.5% on 8-needle 1M MRCR v2); all other thresholds explicitly marked as Saar coaching defaults.
Lookup uses longest-prefix match so claude-sonnet-4-6-20250514 resolves to the canonical claude-sonnet-4-6 entry. Detail-heavy adjustment (-15pp) shifts thresholds earlier when the prompt demands precise recall, floored at MIN_THRESHOLD_FLOOR (30%) to guard future tuning. ABSOLUTE_CRITICAL_FLOOR (90%) acts as a model-agnostic safety net.
Coaching copy is claude.ai-specific: never mentions /compact (Claude Code only), softens for compaction-aware models (Opus 4.7/4.6, Sonnet 4.6), points 200k models at "start a new chat" + "use Projects."
Spec doc pins verbatim Anthropic quotes and documents the threshold derivation, so the runtime file stays terse and the rationale lives in prose. Drift test enforces that every profile with an mrcrAt1MPct field also carries a sourceUrl + sourceQuote, and that the quote contains the figure verbatim.
44 unit tests covering lookup, threshold math, zone classification, copy generation, and the floor branch via a directly-exposed helper.
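
The detail-heavy adjustment and its floor can be illustrated with a minimal sketch. This is not the code in lib/context-rot-thresholds.ts: the `Thresholds` shape, the `criticalAtPct` field, the function name `applyDetailHeavy`, and the choice to floor both values are assumptions. Only the -15pp shift and the 30% floor come from the commit message.

```typescript
// Sketch only. Names other than MIN_THRESHOLD_FLOOR are hypothetical.
interface Thresholds {
  warnAtPct: number;
  criticalAtPct: number; // assumed field name
}

const DETAIL_HEAVY_ADJUSTMENT_PP = 15; // "-15pp" in the commit message
const MIN_THRESHOLD_FLOOR = 30; // "30%" in the commit message

// Shift both thresholds earlier for detail-heavy prompts, never below the floor.
function applyDetailHeavy(base: Thresholds, isDetailHeavy: boolean): Thresholds {
  if (!isDetailHeavy) return { ...base };
  return {
    warnAtPct: Math.max(MIN_THRESHOLD_FLOOR, base.warnAtPct - DETAIL_HEAVY_ADJUSTMENT_PP),
    criticalAtPct: Math.max(MIN_THRESHOLD_FLOOR, base.criticalAtPct - DETAIL_HEAVY_ADJUSTMENT_PP),
  };
}
```

With illustrative inputs, a 70/85 profile would become 55/70 under this sketch, and a 40/50 profile would become 30/35 because the warn value hits the floor.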
…ET-28] New pure function isDetailHeavy(promptText) returns true when the user's turn demands precise / exhaustive recall on prior context. Two independent triggers: a fenced code block (cheapest check) or any DETAIL_HEAVY_KEYWORDS substring (case-insensitive after a single toLowerCase). The Health Agent reads the resulting flag to shift its per-model warn / critical thresholds earlier.
Why this lives in prompt-analysis: prompt inspection is the Prompt Agent's job, the rot agent only consumes a flag. Keeps each lib file focused on one input shape.
Tests: 7 cases plus a cross-file mirror drift guard. inject.ts duplicates DETAIL_HEAVY_KEYWORDS inline (no lib imports allowed in the MAIN world), so the test reads inject.ts as text and asserts every keyword on the lib side appears verbatim there. Same pattern as the SHORT_FOLLOWUP_MAX_CHARS comment but with a real assertion behind it.
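
The two triggers described above could look roughly like this. The keyword list here is invented for illustration (the real DETAIL_HEAVY_KEYWORDS entries are not shown in this thread), and the function is named `isDetailHeavySketch` to make clear it is not the shipped implementation.

```typescript
// Hypothetical keywords; the real DETAIL_HEAVY_KEYWORDS list is not quoted in this PR text.
const EXAMPLE_KEYWORDS: readonly string[] = ["exact", "verbatim", "line by line"];

// Build the fence marker at runtime so this sample does not embed a literal fence.
const FENCE = "`".repeat(3);

function isDetailHeavySketch(promptText: string): boolean {
  // Trigger 1: a fenced code block (the cheapest check, so it runs first).
  if (promptText.includes(FENCE)) return true;
  // Trigger 2: any keyword as a case-insensitive substring, after one toLowerCase.
  const lowered = promptText.toLowerCase();
  return EXAMPLE_KEYWORDS.some((keyword) => lowered.includes(keyword));
}
```

Because the match is a plain substring test, a prompt containing "inexact" would also trigger on the hypothetical keyword "exact". That is the behavior the later docstring correction describes as substring-based rather than word-bounded.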
Add an optional isDetailHeavy boolean to StreamCompletePayload. inject.ts computes it from the live promptText (mirrored keyword list, code-fence check) before posting; raw prompt text never crosses the bridge. bridge-validation.ts type-checks the new field to keep the Layer-5 schema validator strict. The mirrored keyword list in inject.ts is guarded by the cross-file drift test added alongside lib/prompt-analysis.ts isDetailHeavy().
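
A type check for an optional boolean field might be sketched as below. The helper name `hasValidDetailHeavyFlag` and the way the payload is narrowed are assumptions; the actual validator in lib/bridge-validation.ts is not shown here.

```typescript
// Sketch of validating one optional field on an untrusted bridge payload.
// Hypothetical helper, not the real Layer-5 validator.
function hasValidDetailHeavyFlag(payload: unknown): boolean {
  if (typeof payload !== "object" || payload === null) return false;
  const value = (payload as Record<string, unknown>).isDetailHeavy;
  // The field is optional: absent is fine, but anything present must be a boolean.
  return value === undefined || typeof value === "boolean";
}
```

Rejecting non-boolean values rather than coercing them keeps a malformed message from silently shifting thresholds.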
Replace the model-agnostic 70/90 ceilings with per-model warn / critical
thresholds from lib/context-rot-thresholds.ts. HealthInput now requires
two new fields:
- model: string (from SSE message_start; "" falls back to the
conservative 200k profile)
- isDetailHeavy: boolean (from inject.ts on STREAM_COMPLETE)
Computation is two independent classifiers, more severe wins:
PRIMARY (per-model utilization). Three-zone classification: healthy
/ approaching / in-rot. The approaching and in-rot zones use the
per-model coaching string (cites MRCR when published, mentions
compaction for 1M models, points 200k models at Projects + new chat).
SECONDARY (turn count, growth rate). Catches the cases the primary
classifier misses: a long conversation with low context still shows
attention drift, and a fast-filling short conversation deserves a
forward-looking warning before the per-model warn fires. Secondary
rules use generic copy with the model name; they never quote MRCR
data they did not consume.
TURN_AWARE_WARN_OFFSET (10pp): a turn-aware degrading rule fires when
context is within 10 points of warn AND turnCount > TURN_HEALTHY_CEIL.
This preserves the legacy "70% + 11 turns = degrading" coverage on
Sonnet 4.5-class models without locking it to 70.
Critical: the absolute floor at 90% (DEGRADING_CEIL, re-exported for
back-compat) always trips, even on the most-permissive 1M profile.
Content script (claude-ai.content.ts) now plumbs model + isDetailHeavy
into all four computeHealthScore call sites: page-load restore,
async restore, post-STREAM_COMPLETE recompute, and SPA-navigation
restore. lastDetailHeavy is reset alongside other per-conversation
state on SPA navigation. Side panel (useDashboardData.ts) passes the
stored conversation's model with isDetailHeavy=false (no live draft).
Tests rewritten to use named models (Sonnet 4.5 as the rot exemplar,
Opus 4.6 for differentiated behavior, Sonnet 4.6 for compaction-aware
copy assertions). Audit, integration, perf, fuzz, and smoke suites all
updated to pass the two new fields. 1709 passing, +52 from baseline.
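
As a rough sketch of "two independent classifiers, more severe wins": the snippet below models only the utilization rule, the turn-aware rule, and the absolute floor. The growth-rate rule and all coaching copy are omitted. The `Level` names, the mapping of zones to levels, the input shape, and the value of TURN_HEALTHY_CEIL are assumptions; the 10pp offset and the 90% floor come from the commit message.

```typescript
type Level = "healthy" | "degrading" | "critical";
const SEVERITY: Record<Level, number> = { healthy: 0, degrading: 1, critical: 2 };

const ABSOLUTE_CRITICAL_FLOOR = 90; // from the commit message
const TURN_AWARE_WARN_OFFSET = 10; // from the commit message
const TURN_HEALTHY_CEIL = 10; // assumed value, not confirmed by the source

interface SketchInput {
  contextPct: number;
  turnCount: number;
  warnAtPct: number; // per-model, already adjusted for detail-heavy prompts
  criticalAtPct: number; // assumed field name
}

// Primary: per-model utilization, with the model-agnostic floor always applied.
function primaryLevel(i: SketchInput): Level {
  if (i.contextPct >= ABSOLUTE_CRITICAL_FLOOR || i.contextPct >= i.criticalAtPct) return "critical";
  if (i.contextPct >= i.warnAtPct) return "degrading";
  return "healthy";
}

// Secondary: a long conversation that is within 10 points of warn.
function secondaryLevel(i: SketchInput): Level {
  const nearWarn = i.contextPct >= i.warnAtPct - TURN_AWARE_WARN_OFFSET;
  return nearWarn && i.turnCount > TURN_HEALTHY_CEIL ? "degrading" : "healthy";
}

// Both classifiers run independently; the more severe result wins.
function moreSevere(i: SketchInput): Level {
  const p = primaryLevel(i);
  const s = secondaryLevel(i);
  return SEVERITY[p] >= SEVERITY[s] ? p : s;
}
```

Under these assumed numbers, 65% context at 11 turns against a warn threshold of 70 is "degrading" via the secondary rule, while the same context at 5 turns stays "healthy".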
📝 Walkthrough
This PR adds model-aware context-rot thresholds and detail-heavy prompt detection. It introduces a new thresholds module, propagates an …
Sequence Diagram
sequenceDiagram
participant User
participant ContentScript as Content Script<br/>(inject.ts)
participant Bridge as Message Bridge
participant HealthAgent as Health Agent
participant ThresholdModule as Context-Rot<br/>Thresholds
participant Coaching as Coaching<br/>Generator
User->>ContentScript: Submit prompt
ContentScript->>ContentScript: isDetailHeavy detection<br/>(code fence + keywords)
ContentScript->>Bridge: STREAM_COMPLETE (model, isDetailHeavy, contextPct)
Bridge->>HealthAgent: deliver STREAM_COMPLETE
HealthAgent->>ThresholdModule: getRotProfile(model)
ThresholdModule-->>HealthAgent: profile (warn/crit, compaction, MRCR)
HealthAgent->>ThresholdModule: getEffectiveThresholds(model, isDetailHeavy)
ThresholdModule-->>HealthAgent: adjusted warn/crit
HealthAgent->>ThresholdModule: getRotZone(model, contextPct, isDetailHeavy)
ThresholdModule-->>HealthAgent: zone (healthy/approaching/in-rot)
alt zone is approaching/in-rot or turn-growth escalation
HealthAgent->>Coaching: getRotCoaching(model, contextPct, isDetailHeavy)
Coaching-->>HealthAgent: model-aware coaching (MRCR/compaction-aware)
end
HealthAgent-->>Bridge: health result
Bridge-->>User: display health + coaching
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/context-rot-thresholds-spec.md (1)
1-257: ⚠️ Potential issue | 🟡 Minor
Remove em dashes from the spec copy.
This markdown file uses em dashes in several sections, which violates the repo rule for *.md. Please rewrite them with periods, semicolons, or commas. As per coding guidelines, no em dashes. Use colons, semicolons, or rewrite the sentence.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/context-rot-thresholds-spec.md` around lines 1 - 257, The spec file uses em dashes throughout (e.g., in headings/sections like "What is NOT an Anthropic fact", "Why -15 percentage points for detail-heavy", and sentences such as "Compaction provides server-side summarization — that automatically condenses..."), which violates the markdown rule; find and replace all em dashes (—) with appropriate punctuation (periods, commas, colons, or semicolons) or rewrite the sentences to remove the dash while preserving meaning, updating occurrences in paragraphs under headings like "What problem are we solving", "Anthropic-published facts we rely on", "What is NOT an Anthropic fact", "Why -15 percentage points for detail-heavy", and "Coaching copy contract".
🧹 Nitpick comments (2)
lib/context-rot-thresholds.ts (1)
357-363: Centralize the low-context cutoff to prevent cross-file drift.
30 is hardcoded here while lib/health-score.ts defines LOW_CONTEXT_REASSURANCE_CEIL = 30. Using one exported constant would keep coaching and secondary-rule behavior aligned over time.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/context-rot-thresholds.ts` around lines 357 - 363, Replace the hardcoded 30 in the healthy-zone low-context check with the centralized exported constant LOW_CONTEXT_REASSURANCE_CEIL from lib/health-score.ts: update the condition in the zone === 'healthy' branch (currently contextPct < 30) to use LOW_CONTEXT_REASSURANCE_CEIL, add the appropriate import for LOW_CONTEXT_REASSURANCE_CEIL at the top of lib/context-rot-thresholds.ts, and keep the surrounding logic that returns the reassurance message using pctRounded, profile.label and windowLabel unchanged.
tests/unit/context-rot-thresholds.test.ts (1)
57-63: Add a non-boundary prefix regression case.
Please add a test that a model like claude-sonnet-4-50 does not resolve to the claude-sonnet-4-5 profile. This protects lookup behavior from partial-prefix collisions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/context-rot-thresholds.test.ts` around lines 57 - 63, Add a regression test that calls getRotProfile('claude-sonnet-4-50') and asserts it does NOT resolve to the shorter prefix 'claude-sonnet-4-5' (e.g., expect(p.modelPrefix).not.toBe('claude-sonnet-4-5')); use the same test structure as the existing cases so you verify getRotProfile and its modelPrefix field avoid partial-prefix collisions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/context-rot-thresholds.ts`:
- Around line 260-263: The current prefix match in the ROT_PROFILES loop uses
normalized.startsWith(profile.modelPrefix) which can falsely match longer
versioned names (e.g. "claude-sonnet-4-50" matching "claude-sonnet-4-5"); update
the condition to only accept the prefix when either normalized.length ===
profile.modelPrefix.length or the character immediately after the prefix is not
a digit (i.e. check normalized[profile.modelPrefix.length] and reject when it's
a numeric char). Apply this boundary check before assigning best so
profile.modelPrefix, normalized, ROT_PROFILES and best are used to locate and
fix the logic.
In `@lib/health-score.ts`:
- Around line 207-215: The coaching string can show "0 messages" because
remaining is computed with Math.round(headroom / growthRate) and can be 0 even
when not at warn threshold; in the block that returns the degrading warning
(uses growthRate, FAST_GROWTH_PCT, contextPct, thresholds.warnAtPct, remaining,
target, profile.label), ensure remaining is at least 1 before formatting the
message (e.g., compute remaining = Math.max(1, Math.round(headroom /
growthRate)) or conditionally set remainingForDisplay = remaining === 0 ? 1 :
remaining) and choose the correct singular/plural target based on that display
value so the coaching text never says "0 messages."
In `@lib/prompt-analysis.ts`:
- Around line 75-103: Update the documentation block above the
DETAIL_HEAVY_KEYWORDS constant to state that the matching is substring-based
(case-insensitive includes()), not word-bounded, and adjust the explanatory text
about single-word vs multi-word triggers accordingly; then remove the entry
'comprehensive' from the DETAIL_HEAVY_KEYWORDS array so the list no longer
contains that AI-filler term.
In `@tests/audit/health-score-audit.test.ts`:
- Around line 51-57: The test "just below DEGRADING_CEIL is not automatically
critical from this rule" uses computeHealthScore(input({ contextPct: 89.9,
model: OPUS_46 })) but does not isolate DEGRADING_CEIL because OPUS_46 already
triggers its per-model critical threshold; either remove the test or update it
to accurately reflect what it verifies: rename the test and update its comment
to state it asserts that the per-model rule (for OPUS_46) takes precedence over
the floor, and keep the call to computeHealthScore/input/OPUS_46 and the
expectation as-is, or alternatively replace OPUS_46 with a model whose per-model
critical threshold is above 89.9 if the intention is to test the floor
specifically.
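
The "0 messages" finding for lib/health-score.ts above can be illustrated with a small sketch. The function name `messagesUntilWarn`, the definition of headroom as warn threshold minus current context, and the assumption that `growthRate` is positive are all illustrative; only the Math.max(1, Math.round(...)) floor and the "until the rot zone" wording are taken from this thread.

```typescript
// Sketch: floor the displayed count at 1 and pick singular/plural from the displayed value.
// Assumes growthRate > 0 and contextPct < warnAtPct.
function messagesUntilWarn(contextPct: number, warnAtPct: number, growthRate: number): string {
  const headroom = warnAtPct - contextPct; // assumed definition of headroom
  // Math.round can yield 0 when headroom is small relative to growthRate.
  const remaining = Math.max(1, Math.round(headroom / growthRate));
  const noun = remaining === 1 ? "message" : "messages";
  return `About ${remaining} ${noun} until the rot zone`;
}
```

For example, 2 points of headroom at 8 points per turn rounds to 0 and is displayed as 1, so the copy reads "About 1 message" rather than "About 0 messages".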
📒 Files selected for processing (20)
- docs/context-rot-thresholds-spec.md
- entrypoints/claude-ai.content.ts
- entrypoints/inject.ts
- entrypoints/sidepanel/hooks/useDashboardData.ts
- lib/bridge-validation.ts
- lib/context-rot-thresholds.ts
- lib/health-score.ts
- lib/message-types.ts
- lib/prompt-analysis.ts
- tests/audit/fuzz.test.ts
- tests/audit/health-score-audit.test.ts
- tests/audit/smoke.test.ts
- tests/integration/agent-pipeline.test.ts
- tests/integration/restore-pipeline.test.ts
- tests/integration/session-state.test.ts
- tests/perf/benchmarks.bench.ts
- tests/perf/memory.test.ts
- tests/unit/context-rot-thresholds.test.ts
- tests/unit/health-score.test.ts
- tests/unit/prompt-analysis.test.ts
Seven small fixes from the second pipeline pass, none behavior-breaking.
1. getRotProfile prefix match now requires a digit boundary, so a hypothetical "claude-sonnet-4-50" cannot silently match the "claude-sonnet-4-5" profile and inherit its 200k-window coaching. Two regression tests pin this and the still-valid date-suffixed form.
2. Centralize LOW_CONTEXT_REASSURANCE_CEIL in context-rot-thresholds.ts and import it from health-score.ts so both the coaching layer and the fast-growth rule see the same value. (Cannot import the other direction without a circular dep, hence the move.)
3. Fast-growth coaching no longer says "About 0 messages until the rot zone". Floors the displayed count at 1 since rounding can produce 0 when headroom is small relative to growth rate.
4. DETAIL_HEAVY_KEYWORDS docstring corrected: matching is substring via String.prototype.includes, not word-bounded as the comment claimed. Removed 'comprehensive' from the keyword list (filler-y trigger, produces false positives on prose like 'comprehensive overview'). Mirror in inject.ts updated; the cross-file drift test still passes.
5. Audit test "just below DEGRADING_CEIL is not automatically critical" was misleading: it asserts critical and the assertion is correct (Opus 4.6's per-model crit at 85 fires before the 90 floor). Renamed and clarified the comment.
6. Two em dashes in the spec doc replaced with periods.
1711 tests passing.
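
The digit-boundary rule from the first fix can be sketched as follows. This is an illustration under assumptions, not the actual getRotProfile: the function name `matchPrefix`, the two-entry prefix list, and the trim/lowercase normalization are hypothetical.

```typescript
// Illustrative prefixes only; the real ROT_PROFILES table is not reproduced here.
const EXAMPLE_PREFIXES: readonly string[] = ["claude-sonnet-4-5", "claude-sonnet-4-6"];

// Longest-prefix match that rejects a prefix when the next character is a digit,
// so "claude-sonnet-4-50" does not resolve to "claude-sonnet-4-5".
function matchPrefix(model: string, prefixes: readonly string[]): string | null {
  const normalized = model.trim().toLowerCase(); // assumed normalization
  let best: string | null = null;
  for (const prefix of prefixes) {
    if (!normalized.startsWith(prefix)) continue;
    const next = normalized.charAt(prefix.length); // "" on an exact match
    if (/^\d$/.test(next)) continue; // a digit here means a different version number
    if (best === null || prefix.length > best.length) best = prefix;
  }
  return best;
}
```

A date-suffixed id such as claude-sonnet-4-6-20250514 still matches, because the character after the prefix is a hyphen rather than a digit. A caller would fall back to the conservative profile when the sketch returns null.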
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/context-rot-thresholds.ts`:
- Around line 118-126: Fix the typo in the doc comment for the constant
LOW_CONTEXT_REASSURANCE_CEIL: change "precentage" to "percentage" in the comment
describing the threshold so the documentation reads correctly; locate the
comment block above the exported constant LOW_CONTEXT_REASSURANCE_CEIL and
update the misspelled word.
📒 Files selected for processing (7)
- docs/context-rot-thresholds-spec.md
- entrypoints/inject.ts
- lib/context-rot-thresholds.ts
- lib/health-score.ts
- lib/prompt-analysis.ts
- tests/audit/health-score-audit.test.ts
- tests/unit/context-rot-thresholds.test.ts
✅ Files skipped from review due to trivial changes (1)
- docs/context-rot-thresholds-spec.md
Summary
Replaces the model-agnostic 70/90 health-indicator ceilings with per-model warn / critical thresholds grounded in Anthropic's published MRCR scores. Each Claude model on claude.ai now gets coaching copy tailored to its actual context-rot behavior, with claude.ai-specific actions (start a new chat, use Projects) instead of /compact instructions that only exist in Claude Code.
Type of Change
- feat: New feature
- fix: Bug fix
- refactor: Code restructure, no behavior change
- test: Tests only
- chore: Build, CI, tooling, dependencies
- docs: Documentation only
What Was Changed
New agent: lib/context-rot-thresholds.ts. Single source of truth for per-model thresholds + coaching copy. Longest-prefix lookup, conservative fallback profile, detail-heavy adjustment (-15pp) with a 30% floor. Each profile carries optional mrcrAt1MPct / sourceUrl / sourceQuote provenance fields for the two Anthropic-published scores (Opus 4.6 = 76%, Sonnet 4.5 = 18.5%).
New spec: docs/context-rot-thresholds-spec.md. Verbatim Anthropic quotes pinned, threshold derivation explained, drift policy documented. Same .gitignore exception pattern as attachment-cost-spec.md.
Health rewrite: lib/health-score.ts. HealthInput now requires model: string and isDetailHeavy: boolean. Two independent classifiers, more severe wins:
- Primary: per-model utilization (healthy / approaching / in-rot).
- Secondary: turn count and growth rate.
- TURN_AWARE_WARN_OFFSET = 10: a turn-aware degrading rule keeps the legacy "70% + 11 turns = degrading" coverage on Sonnet 4.5-class models without locking it to 70.
- ABSOLUTE_CRITICAL_FLOOR = 90%: model-agnostic safety net.
Detail-heavy detector: lib/prompt-analysis.ts exports isDetailHeavy(promptText) and a DETAIL_HEAVY_KEYWORDS constant. Pure function; the bridge schema carries the precomputed flag so raw prompt text never crosses worlds. inject.ts mirrors the keyword list inline (no lib/ imports allowed in the MAIN world); a cross-file drift test asserts the mirror stays in sync.
Bridge: lib/message-types.ts adds optional isDetailHeavy?: boolean to StreamCompletePayload. lib/bridge-validation.ts type-checks the new field.
Wiring: entrypoints/claude-ai.content.ts plumbs model + isDetailHeavy into all four computeHealthScore call sites and resets lastDetailHeavy on SPA navigation. entrypoints/sidepanel/hooks/useDashboardData.ts passes the stored conversation's model with isDetailHeavy=false (no live draft on the side panel).
Tests: 52 new tests (44 for context-rot-thresholds, 7 for isDetailHeavy, 1 cross-file mirror drift assertion, plus a directly-exposed helper for the floor branch). All existing audit, integration, perf, fuzz, and smoke suites updated to pass the two new HealthInput fields. bun run test reports 1709 passing (+52 from baseline).
How to Test
bun run compile && bun run test && bun run build: all clean.
… /compact (Claude Code feature, not on the web).
Checklist
… bun run test): 1709 passing
… bun run compile)
… bun run build): 73.99 KB content script
Related Issues
Closes GET-28.
Notes for Reviewer
… main), the "200% session %" calibration bug (weekly-cap display, not context rot), and detail-heavy detection on the draft (vs last-sent) prompt: Cycle 2.