feat(cache-proposals): confidence score on threshold recommendations #224
jamby77 wants to merge 9 commits into
@BugBot review
✅ Bugbot reviewed your changes and found no new issues!
Reviewed by Cursor Bugbot for commit 6ab303b.
    }, 0);
    const result = computeConfidence({
      sampleCount,
      signalRate: isTighten ? uncertainHitRate : nearMissRate,
When the LOOSEN comes from the low_hit_rate path, won't nearMissRate always be ≤ 0.25 here (since we fell through the nearMissRate > 0.25 branch)? That would make signalRate - LOOSEN_BOUNDARY negative, clamp to 0, and give every low_hit_rate LOOSEN a confidence_score: 0 regardless of actual signal strength.
Good catch. Fixed in db11587: the wiring now captures the engine's chosen signalRate alongside the signal discriminator and routes per-signal boundaries (TIGHTEN_BOUNDARY for uncertain_hits, DISTANT_HITS_BOUNDARY=0.25 for distant_hits, LOOSEN_BOUNDARY for near_misses, LOW_HIT_RATE_BOUNDARY=0.1 for low_hit_rate). Tests added for the two paths that were collapsing to 0.
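The per-signal routing described in this reply could look roughly like the sketch below. It is illustrative only: `signalBoundaryFor` and the constant names come from this thread, `DISTANT_HITS_BOUNDARY` and `LOW_HIT_RATE_BOUNDARY` use the values quoted above, `LOOSEN_BOUNDARY` is assumed to match the engine's `nearMissRate > 0.25` trigger, and the `TIGHTEN_BOUNDARY` value is a placeholder because the thread never states it.

```typescript
// Hypothetical sketch of the boundary routing; the real module may differ.
type EngineSignal = "uncertain_hits" | "distant_hits" | "near_misses" | "low_hit_rate";

const TIGHTEN_BOUNDARY = 0.2;        // ASSUMPTION: placeholder, not stated in the PR
const DISTANT_HITS_BOUNDARY = 0.25;  // quoted in the reply
const LOOSEN_BOUNDARY = 0.25;        // ASSUMPTION: matches the engine's nearMissRate > 0.25 trigger
const LOW_HIT_RATE_BOUNDARY = 0.1;   // quoted in the reply

// Returns the decision boundary for the signal the engine acted on,
// or null when the signal is unknown or undefined.
function signalBoundaryFor(signal: string | undefined): number | null {
  switch (signal as EngineSignal | undefined) {
    case "uncertain_hits":
      return TIGHTEN_BOUNDARY;
    case "distant_hits":
      return DISTANT_HITS_BOUNDARY;
    case "near_misses":
      return LOOSEN_BOUNDARY;
    case "low_hit_rate":
      return LOW_HIT_RATE_BOUNDARY;
    default:
      return null;
  }
}
```

Returning `null` for an unrecognised signal lets the caller skip the confidence calculation instead of silently scoring against the wrong boundary.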
Still slightly better than BugBot, lol
…cision path
The engine has four TIGHTEN/LOOSEN paths (uncertain_hits, distant_hits, near_misses, low_hit_rate). The confidence-score wiring was passing the wrong signal/boundary for distant_hits (used uncertainHitRate instead of distantHitRate) and low_hit_rate (used nearMissRate, which is below its own decision cutoff on that path), collapsing the score to 0 in both cases. Capture the engine's chosen signalRate alongside the existing `signal` discriminator, add per-signal boundary constants, and route both into the confidence calc. Reported by @KIvanow on PR #224.
The engine triggers LOOSEN when nearMissRate > 0.25, but the confidence score was using 0.3 as the signal boundary — creating a dead zone where recommendations between 0.25 and 0.3 collapsed to score 0 because the signal component went negative.
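A small worked example of that dead zone, as a sketch. The formula for the signal component is an assumption (distance past the boundary, scaled to reach 1 at a rate of 0.8, clamped to [0, 1]); `signalComponent` and `SIGNAL_SATURATION` are hypothetical names, not taken from the PR.

```typescript
// ASSUMPTION: the signal component is the distance past the boundary,
// normalised so it hits 1 at the saturation rate and clamped to [0, 1].
const SIGNAL_SATURATION = 0.8;

function signalComponent(rate: number, boundary: number): number {
  const raw = (rate - boundary) / (SIGNAL_SATURATION - boundary);
  return Math.min(1, Math.max(0, raw));
}

// 0.27 is above the engine's 0.25 trigger, so a LOOSEN is emitted.
const nearMissRate = 0.27;

// Old wiring: boundary 0.3 puts the rate below the boundary, so the
// component clamps to 0 and any product-based score collapses with it.
const withOldBoundary = signalComponent(nearMissRate, 0.3);

// Aligned wiring: boundary 0.25 gives a small but positive component.
const withAlignedBoundary = signalComponent(nearMissRate, 0.25);

console.log({ withOldBoundary, withAlignedBoundary });
```

Any rate in the 0.25 to 0.3 range behaves the same way under the old boundary, which is why the boundary has to match the engine's own trigger.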
…g bug
Adds unit tests that:
- assert signalBoundaryFor returns the right boundary for each of the four engine signals, and null for unknown/undefined input.
- document the original mapping bug: feeding a path's rate against the wrong boundary (the old wiring) yields signal=0 and score=0, while feeding the right pair yields signal>0 and score>0. Two cases cover the low_hit_rate and distant_hits paths.

Belt-and-suspenders over the existing service-spec regression guards.
05305d7 to bbcd2c9
Summary
- Adds `confidence_score: number | null` and `confidence_breakdown: { sample, signal, freshness } | null` to `ThresholdRecommendation`. Populated only for `TIGHTEN`/`LOOSEN` (null for `OPTIMAL`/`INSUFFICIENT_DATA`).
- The score is the geometric mean of sample volume (saturating at 200 samples), signal strength past the decision boundary (saturating at `0.8`), and sample freshness in the last hour. One weak component drags the whole score down by design.
- The score and breakdown are logged at info level whenever `TIGHTEN`/`LOOSEN` is emitted, so we can observe the production distribution before designing auto-approval cutoffs.

Pure math lives in its own module (`confidence-score.ts`) so it's unit-testable in isolation. The service composes it into the existing flow with no schema or endpoint changes. Tier gating is already enforced by `@RequiresFeature(Feature.CACHE_INTELLIGENCE)` on the controllers, so no new license plumbing is needed.

This is step 1 of the Self-Optimization track. Auto-approval gate and historical-accuracy weighting come later (the latter is blocked on the Outcome Tracking work).
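For readers who want to see the shape of the calculation, here is a minimal sketch of a geometric-mean confidence score. It is not the contents of `confidence-score.ts`: the function name `computeConfidenceSketch`, the input fields `signalBoundary` and `freshSampleCount`, and the definition of freshness as the fraction of samples from the last hour are all assumptions made for illustration.

```typescript
// Hypothetical sketch; field names and the freshness definition are assumptions.
interface ConfidenceInput {
  sampleCount: number;       // total samples behind the recommendation
  signalRate: number;        // the rate the engine acted on
  signalBoundary: number;    // the decision boundary for that signal
  freshSampleCount: number;  // ASSUMPTION: samples observed in the last hour
}

interface ConfidenceResult {
  confidence_score: number;
  confidence_breakdown: { sample: number; signal: number; freshness: number };
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function computeConfidenceSketch(input: ConfidenceInput): ConfidenceResult {
  // Sample volume saturates at 200 samples.
  const sample = clamp01(input.sampleCount / 200);
  // Signal strength: distance past the boundary, reaching 1 at a rate of 0.8.
  const signal = clamp01(
    (input.signalRate - input.signalBoundary) / (0.8 - input.signalBoundary),
  );
  // Freshness: share of samples that are recent (assumed definition).
  const freshness =
    input.sampleCount > 0 ? clamp01(input.freshSampleCount / input.sampleCount) : 0;
  // Geometric mean of three components: any zero component yields a zero score.
  const confidence_score = Math.cbrt(sample * signal * freshness);
  return { confidence_score, confidence_breakdown: { sample, signal, freshness } };
}

// Usage: plenty of fresh samples, but a rate only just past the boundary.
const example = computeConfidenceSketch({
  sampleCount: 400,
  signalRate: 0.3,
  signalBoundary: 0.25,
  freshSampleCount: 400,
});
console.log(example.confidence_breakdown, example.confidence_score);
```

The geometric mean is what makes one weak component dominate: unlike an arithmetic mean, two saturated components cannot compensate for a third that sits near zero.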
Test plan
Note
Medium Risk
Changes how threshold recommendations are interpreted (new confidence fields and signal-rate mapping) in cache intelligence paths, though behavior is additive with no API schema break and extensive tests.
Overview
Adds `confidence_score` and `confidence_breakdown` (`sample`, `signal`, `freshness`) to semantic cache `ThresholdRecommendation` responses. Values are set only for actionable `tighten_threshold`/`loosen_threshold` outcomes; `optimal` and `insufficient_data` return `null`.

Scoring lives in new `confidence-score.ts`: geometric mean of sample volume (saturates at 200), signal strength past the same decision boundary the engine used (saturates at 0.8), and freshness within the last hour. The service now records `signalRate` per trigger (`uncertain_hits`, `distant_hits`, `near_misses`, `low_hit_rate`) and maps each to the correct boundary, fixing cases where `low_hit_rate`/`distant_hits` would have collapsed confidence to zero. Info-level logs emit score and breakdown when a tighten/loosen is returned.

Unit and service tests cover the pure math, null flow-through, stale samples, and non-zero confidence on `low_hit_rate` loosen and `distant_hits` tighten paths.

Reviewed by Cursor Bugbot for commit bbcd2c9. Bugbot is set up for automated code reviews on this repo.