feat(cache-proposals): confidence score on threshold recommendations #224

Open
jamby77 wants to merge 9 commits into master from feature/cache-proposals-confidence-score

jamby77 commented May 27, 2026

Summary

  • Adds confidence_score: number | null and confidence_breakdown: { sample, signal, freshness } | null to ThresholdRecommendation. Populated only for TIGHTEN / LOOSEN (null for OPTIMAL / INSUFFICIENT_DATA).
  • Score is the geometric mean of three 0–1 components — sample-count saturation (/200), signal strength past the decision boundary (saturating at 0.8), and sample freshness in the last hour. One weak component drags the whole score down by design.
  • Logs the score + breakdown at info level when a TIGHTEN / LOOSEN is emitted, so we can observe the production distribution before designing auto-approval cutoffs.
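As a rough illustration of the scoring described in the bullets above, the sketch below combines three clamped components with a geometric mean. It is not the code in confidence-score.ts: the name `confidenceSketch`, the input shape, the `recentSampleCount` field, and the linear ramp from the decision boundary up to 0.8 are all assumptions made for the example. It also assumes the boundary is below 0.8.

```typescript
// Hypothetical sketch of the scoring idea; names and input shape are assumed.
interface ConfidenceInput {
  sampleCount: number;       // total samples behind the recommendation
  recentSampleCount: number; // assumed: samples observed in the last hour
  signalRate: number;        // rate that triggered TIGHTEN / LOOSEN
  boundary: number;          // decision boundary the engine used (assumed < 0.8)
}

interface ConfidenceBreakdown {
  sample: number;
  signal: number;
  freshness: number;
}

const SAMPLE_SATURATION = 200; // from the PR summary
const SIGNAL_SATURATION = 0.8; // from the PR summary

// Clamp to [0, 1]; NaN is treated as 0 so a bad input cannot poison the score.
function clamp01(x: number): number {
  if (Number.isNaN(x)) return 0;
  return Math.min(1, Math.max(0, x));
}

function confidenceSketch(input: ConfidenceInput): {
  score: number;
  breakdown: ConfidenceBreakdown;
} {
  const sample = clamp01(input.sampleCount / SAMPLE_SATURATION);
  // Assumed shape: linear ramp from the boundary (0) up to 0.8 (1).
  const signal = clamp01(
    (input.signalRate - input.boundary) / (SIGNAL_SATURATION - input.boundary),
  );
  const freshness =
    input.sampleCount > 0
      ? clamp01(input.recentSampleCount / input.sampleCount)
      : 0;
  // Geometric mean of three components: cube root of their product.
  const score = Math.cbrt(sample * signal * freshness);
  return { score, breakdown: { sample, signal, freshness } };
}

// Half the saturation sample count, full signal, fully fresh.
const example = confidenceSketch({
  sampleCount: 100,
  recentSampleCount: 100,
  signalRate: 0.8,
  boundary: 0.25,
});
console.log(example.score.toFixed(3)); // "0.794"
```

Because the components are multiplied before the cube root, a freshness of 0 yields a score of 0 regardless of the other two, which is the "one weak component drags the whole score down" behavior the summary describes.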

Pure math lives in its own module (confidence-score.ts) so it's unit-testable in isolation. The service composes it into the existing flow with no schema or endpoint changes. Tier gating is already enforced by @RequiresFeature(Feature.CACHE_INTELLIGENCE) on the controllers — no new license plumbing.

This is step 1 of the Self-Optimization track. Auto-approval gate and historical-accuracy weighting come later (the latter is blocked on the Outcome Tracking work).

Test plan

  • 9 unit tests on the pure scoring function (saturation, boundary short-circuit, clamping, clock skew, over-count, LOOSEN path, fuzz).
  • 3 service-level tests covering field flow-through (null on INSUFFICIENT_DATA, populated on TIGHTEN, freshness=0 on stale samples).
  • Full `api` unit suite: 1668 / 1675 pass (1 pre-existing license-spec failure unrelated to this branch).
  • No e2e needed — verified controllers are bare passthroughs with no DTO / class-transformer that could drop the new fields.
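The freshness-related cases in the test plan (stale samples, clock skew, over-count) could be exercised against something like the following. This is a guess at the shape, not the PR's code: the function name `freshnessSketch`, the millisecond timestamps, and the two handling choices (future-dated samples count as fresh, and a recent count above the reported total is clamped to 1) are assumptions for illustration.

```typescript
// Hypothetical sketch: fraction of samples seen within the last hour.
const ONE_HOUR_MS = 60 * 60 * 1000;

function freshnessSketch(
  timestampsMs: number[], // sample timestamps, epoch milliseconds (assumed)
  nowMs: number,          // current time, epoch milliseconds
  reportedCount: number,  // total sample count reported by the caller (assumed)
): number {
  const recent = timestampsMs.reduce((count, ts) => {
    // Clock skew (assumption): a timestamp in the future is treated as age 0.
    const ageMs = Math.max(0, nowMs - ts);
    return ageMs <= ONE_HOUR_MS ? count + 1 : count;
  }, 0);
  if (reportedCount <= 0) return 0;
  // Over-count (assumption): clamp if more recent samples than reported total.
  return Math.min(1, recent / reportedCount);
}

const now = 1_000_000_000_000;
const twoHoursAgo = now - 2 * ONE_HOUR_MS;
console.log(freshnessSketch([twoHoursAgo, twoHoursAgo], now, 2)); // 0
console.log(freshnessSketch([now - 1000, twoHoursAgo], now, 2));  // 0.5
```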

Note

Medium Risk
Changes how threshold recommendations are interpreted (new confidence fields and signal-rate mapping) in cache intelligence paths, though behavior is additive with no API schema break and extensive tests.

Overview
Adds confidence_score and confidence_breakdown (sample, signal, freshness) to semantic cache ThresholdRecommendation responses. Values are set only for actionable tighten_threshold / loosen_threshold outcomes; optimal and insufficient_data return null.

Scoring lives in new confidence-score.ts: geometric mean of sample volume (saturates at 200), signal strength past the same decision boundary the engine used (saturates at 0.8), and freshness within the last hour. The service now records signalRate per trigger (uncertain_hits, distant_hits, near_misses, low_hit_rate) and maps each to the correct boundary—fixing cases where low_hit_rate / distant_hits would have collapsed confidence to zero. Info-level logs emit score and breakdown when a tighten/loosen is returned.

Unit and service tests cover the pure math, null flow-through, stale samples, and non-zero confidence on low_hit_rate loosen and distant_hits tighten paths.

Reviewed by Cursor Bugbot for commit bbcd2c9. Bugbot is set up for automated code reviews on this repo.

Comment thread proprietary/cache-proposals/cache-readonly.service.ts Outdated

jamby77 commented May 27, 2026

@BugBot review

Comment thread proprietary/cache-proposals/confidence-score.ts Outdated

jamby77 commented May 27, 2026

@BugBot review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Reviewed by Cursor Bugbot for commit 6ab303b.

jamby77 requested a review from KIvanow May 28, 2026 05:46
}, 0);
const result = computeConfidence({
  sampleCount,
  signalRate: isTighten ? uncertainHitRate : nearMissRate,

KIvanow (Member) commented:

When the LOOSEN comes from the low_hit_rate path, won't nearMissRate always be ≤ 0.25 here (since we fell through the nearMissRate > 0.25 branch)? That would make signalRate - LOOSEN_BOUNDARY negative, clamp to 0, and give every low_hit_rate LOOSEN a confidence_score: 0 regardless of actual signal strength.

jamby77 (Collaborator, Author) replied:

Good catch. Fixed in db11587: the wiring now captures the engine's chosen signalRate alongside the signal discriminator and routes per-signal boundaries (TIGHTEN_BOUNDARY for uncertain_hits, DISTANT_HITS_BOUNDARY=0.25 for distant_hits, LOOSEN_BOUNDARY for near_misses, LOW_HIT_RATE_BOUNDARY=0.1 for low_hit_rate). Tests added for the two paths that were collapsing to 0.
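A minimal sketch of the per-signal routing described in the reply above, for readers following the thread. Only `DISTANT_HITS_BOUNDARY = 0.25` and `LOW_HIT_RATE_BOUNDARY = 0.1` are stated here; the `TIGHTEN_BOUNDARY` value is a placeholder, `LOOSEN_BOUNDARY = 0.25` is inferred from the engine's `nearMissRate > 0.25` trigger, and the function names, the `Map` lookup, and the linear ramp to 0.8 are assumptions rather than the PR's implementation.

```typescript
// Hypothetical sketch of routing each engine signal to its own boundary.
const TIGHTEN_BOUNDARY = 0.2;       // PLACEHOLDER: value not stated in this thread
const DISTANT_HITS_BOUNDARY = 0.25; // stated in the reply above
const LOOSEN_BOUNDARY = 0.25;       // assumed from the nearMissRate > 0.25 trigger
const LOW_HIT_RATE_BOUNDARY = 0.1;  // stated in the reply above
const SIGNAL_SATURATION = 0.8;      // from the PR summary

const SIGNAL_BOUNDARIES = new Map<string, number>([
  ["uncertain_hits", TIGHTEN_BOUNDARY],
  ["distant_hits", DISTANT_HITS_BOUNDARY],
  ["near_misses", LOOSEN_BOUNDARY],
  ["low_hit_rate", LOW_HIT_RATE_BOUNDARY],
]);

// Returns the boundary for a known signal, or null for unknown/missing input.
function boundaryForSketch(signal: string | null | undefined): number | null {
  if (signal == null) return null;
  const boundary = SIGNAL_BOUNDARIES.get(signal);
  return boundary === undefined ? null : boundary;
}

// Assumed signal component: 0 at the boundary, 1 at 0.8, clamped to [0, 1].
function signalStrength(rate: number, boundary: number): number {
  const raw = (rate - boundary) / (SIGNAL_SATURATION - boundary);
  return Math.min(1, Math.max(0, raw));
}

// Hypothetical rate chosen by the engine on the low_hit_rate path.
const chosenRate = 0.18;

// Old wiring: compared against the near-miss boundary, so it clamps to 0.
const oldWiring = signalStrength(chosenRate, LOOSEN_BOUNDARY);

// Routed wiring: compared against the boundary for the signal that fired.
const lowHitRateBoundary = boundaryForSketch("low_hit_rate");
const routed =
  lowHitRateBoundary === null ? 0 : signalStrength(chosenRate, lowHitRateBoundary);

console.log({ oldWiring, routed }); // oldWiring is 0, routed is positive
```

Since the overall score is a geometric mean, a signal component of 0 forces the score to 0, which is why routing the wrong boundary zeroed every recommendation on that path.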

KIvanow (Member) replied:

Still slightly better than BugBot, lol

jamby77 added a commit that referenced this pull request May 28, 2026
…cision path

The engine has four TIGHTEN/LOOSEN paths (uncertain_hits, distant_hits,
near_misses, low_hit_rate). The confidence-score wiring was passing the
wrong signal/boundary for distant_hits (used uncertainHitRate instead
of distantHitRate) and low_hit_rate (used nearMissRate, which is below
its own decision cutoff on that path), collapsing the score to 0 in
both cases.

Capture the engine's chosen signalRate alongside the existing `signal`
discriminator, add per-signal boundary constants, and route both into
the confidence calc.

Reported by @KIvanow on PR #224.
jamby77 added 9 commits May 30, 2026 12:41
The engine triggers LOOSEN when nearMissRate > 0.25, but the confidence
score was using 0.3 as the signal boundary — creating a dead zone where
recommendations between 0.25 and 0.3 collapsed to score 0 because the
signal component went negative.
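
The dead zone described in the commit message above can be reproduced with a few lines of arithmetic. This is only an illustration: `signalStrength` and its linear ramp up to 0.8 are assumed, while the 0.25 trigger and the old 0.3 boundary come from the message itself.

```typescript
// Hypothetical sketch of the 0.25 to 0.3 dead zone.
const ENGINE_LOOSEN_TRIGGER = 0.25; // engine emits LOOSEN above this (from the commit message)
const OLD_SCORE_BOUNDARY = 0.3;     // boundary the score used before the fix (from the commit message)
const SIGNAL_SATURATION = 0.8;      // from the PR summary

// Assumed signal component: 0 at the boundary, 1 at 0.8, clamped to [0, 1].
function signalStrength(rate: number, boundary: number): number {
  const raw = (rate - boundary) / (SIGNAL_SATURATION - boundary);
  return Math.min(1, Math.max(0, raw));
}

for (const nearMissRate of [0.26, 0.28, 0.3, 0.4]) {
  const before = signalStrength(nearMissRate, OLD_SCORE_BOUNDARY);
  const after = signalStrength(nearMissRate, ENGINE_LOOSEN_TRIGGER);
  console.log(
    `nearMissRate=${nearMissRate} before=${before.toFixed(3)} after=${after.toFixed(3)}`,
  );
}
```

Under these assumptions, every rate the engine acts on between 0.25 and 0.3 has a `before` value of 0, while the aligned boundary gives a small positive signal that grows with the rate.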
…cision path

The engine has four TIGHTEN/LOOSEN paths (uncertain_hits, distant_hits,
near_misses, low_hit_rate). The confidence-score wiring was passing the
wrong signal/boundary for distant_hits (used uncertainHitRate instead
of distantHitRate) and low_hit_rate (used nearMissRate, which is below
its own decision cutoff on that path), collapsing the score to 0 in
both cases.

Capture the engine's chosen signalRate alongside the existing `signal`
discriminator, add per-signal boundary constants, and route both into
the confidence calc.

Reported by @KIvanow on PR #224.
…g bug

Adds unit tests that:
- assert signalBoundaryFor returns the right boundary for each of the
  four engine signals, and null for unknown/undefined input.
- document the original mapping bug: feeding a path's rate against the
  wrong boundary (the old wiring) yields signal=0 and score=0, while
  feeding the right pair yields signal>0 and score>0. Two cases cover
  the low_hit_rate and distant_hits paths.

Belt-and-suspenders over the existing service-spec regression guards.
jamby77 force-pushed the feature/cache-proposals-confidence-score branch from 05305d7 to bbcd2c9 May 30, 2026 09:42
KIvanow (Member) left a comment

very nicely done!
