feat(cache-proposals): confidence score on threshold recommendations by jamby77 · Pull Request #224 · BetterDB-inc/monitor

jamby77 · 2026-05-27T13:30:50Z

Summary

Adds confidence_score: number | null and confidence_breakdown: { sample, signal, freshness } | null to ThresholdRecommendation. Populated only for TIGHTEN / LOOSEN (null for OPTIMAL / INSUFFICIENT_DATA).
Score is the geometric mean of three 0–1 components: sample-count saturation (saturating at 200 samples), signal strength past the decision boundary (saturating at 0.8), and sample freshness within the last hour. By design, one weak component drags the whole score down.
Logs the score + breakdown at info level when a TIGHTEN / LOOSEN is emitted, so we can observe the production distribution before designing auto-approval cutoffs.
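A minimal sketch of the scoring shape described above, for readers who want to reason about the geometric mean concretely. The constants (200 samples, 0.8, one hour) and the four action values come from this PR. `computeConfidence` is the function called in the reviewed diff hunk, but only its `sampleCount` and `signalRate` arguments are visible there. Everything else is an assumption and not the contents of `confidence-score.ts`: the `ConfidenceInput` shape, the hypothetical `confidenceFields` helper, the signal normalization (0 at the boundary, 1 once the rate reaches 0.8), and the linear freshness decay.

```typescript
// Sketch only: input shape, helper names, and the signal/freshness formulas are assumptions.

type ConfidenceBreakdown = { sample: number; signal: number; freshness: number };
type Action = "tighten_threshold" | "loosen_threshold" | "optimal" | "insufficient_data";

interface ConfidenceInput {
  sampleCount: number;
  signalRate: number; // rate on the path that triggered the recommendation
  boundary: number; // decision boundary the engine used on that path
  latestRecordedAt: number; // epoch ms of the newest sample (assumed field)
  now: number; // epoch ms
}

const SAMPLE_SATURATION = 200; // PR: sample component saturates at 200
const SIGNAL_SATURATION = 0.8; // PR: signal component saturates at 0.8
const FRESHNESS_WINDOW_MS = 60 * 60 * 1000; // PR: freshness over the last hour

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function computeConfidence(input: ConfidenceInput): { score: number; breakdown: ConfidenceBreakdown } {
  // Over-count clamps to 1.
  const sample = clamp01(input.sampleCount / SAMPLE_SATURATION);

  // Assumed form: 0 at the boundary, rising linearly to 1 when the rate reaches 0.8.
  const span = SIGNAL_SATURATION - input.boundary;
  const signal =
    span > 0
      ? clamp01((input.signalRate - input.boundary) / span)
      : input.signalRate > input.boundary ? 1 : 0;

  // Assumed form: linear decay over one hour; a future timestamp (clock skew) counts as age 0.
  const ageMs = Math.max(0, input.now - input.latestRecordedAt);
  const freshness = clamp01(1 - ageMs / FRESHNESS_WINDOW_MS);

  // Geometric mean of three components: a single zero component zeroes the score.
  const score = Math.cbrt(sample * signal * freshness);
  return { score, breakdown: { sample, signal, freshness } };
}

// Hypothetical helper: only actionable outcomes carry a score; the rest get nulls.
function confidenceFields(
  action: Action,
  input: ConfidenceInput,
): { confidence_score: number | null; confidence_breakdown: ConfidenceBreakdown | null } {
  if (action !== "tighten_threshold" && action !== "loosen_threshold") {
    return { confidence_score: null, confidence_breakdown: null };
  }
  const { score, breakdown } = computeConfidence(input);
  return { confidence_score: score, confidence_breakdown: breakdown };
}

const now = Date.now();
const strong = computeConfidence({ sampleCount: 250, signalRate: 0.9, boundary: 0.25, latestRecordedAt: now, now });
const fewSamples = computeConfidence({ sampleCount: 100, signalRate: 0.9, boundary: 0.25, latestRecordedAt: now, now });
console.log(strong.score, fewSamples.score); // 1, then about 0.79 (cube root of 0.5)
console.log(confidenceFields("optimal", { sampleCount: 250, signalRate: 0.9, boundary: 0.25, latestRecordedAt: now, now }));
```

Halving the sample count alone pulls the score from 1 to roughly 0.79. This is the "one weak component drags the whole score down" property. An arithmetic mean would only drop it to about 0.83, and a zero component would not zero the score.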

Pure math lives in its own module (confidence-score.ts) so it's unit-testable in isolation. The service composes it into the existing flow with no schema or endpoint changes. Tier gating is already enforced by @RequiresFeature(Feature.CACHE_INTELLIGENCE) on the controllers — no new license plumbing.

This is step 1 of the Self-Optimization track. Auto-approval gate and historical-accuracy weighting come later (the latter is blocked on the Outcome Tracking work).

Test plan

9 unit tests on the pure scoring function (saturation, boundary short-circuit, clamping, clock skew, over-count, LOOSEN path, fuzz).
3 service-level tests covering field flow-through (null on INSUFFICIENT_DATA, populated on TIGHTEN, freshness=0 on stale samples).
Full `api` unit suite: 1668 / 1675 pass (1 pre-existing license-spec failure unrelated to this branch).
No e2e needed — verified controllers are bare passthroughs with no DTO / class-transformer that could drop the new fields.

Note

Medium Risk
Changes how threshold recommendations are interpreted (new confidence fields and signal-rate mapping) in cache intelligence paths, though behavior is additive with no API schema break and extensive tests.

Overview
Adds confidence_score and confidence_breakdown (sample, signal, freshness) to semantic cache ThresholdRecommendation responses. Values are set only for actionable tighten_threshold / loosen_threshold outcomes; optimal and insufficient_data return null.

Scoring lives in new confidence-score.ts: geometric mean of sample volume (saturates at 200), signal strength past the same decision boundary the engine used (saturates at 0.8), and freshness within the last hour. The service now records signalRate per trigger (uncertain_hits, distant_hits, near_misses, low_hit_rate) and maps each to the correct boundary—fixing cases where low_hit_rate / distant_hits would have collapsed confidence to zero. Info-level logs emit score and breakdown when a tighten/loosen is returned.

Unit and service tests cover the pure math, null flow-through, stale samples, and non-zero confidence on low_hit_rate loosen and distant_hits tighten paths.

Reviewed by Cursor Bugbot for commit bbcd2c9. Bugbot is set up for automated code reviews on this repo.

jamby77 · 2026-05-27T13:58:20Z

@BugBot review

jamby77 · 2026-05-27T14:08:45Z

@BugBot review

cursor

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 6ab303b.

KIvanow · 2026-05-28T08:41:19Z

```diff
+      }, 0);
+      const result = computeConfidence({
+        sampleCount,
+        signalRate: isTighten ? uncertainHitRate : nearMissRate,
```


When the LOOSEN comes from the low_hit_rate path, won't nearMissRate always be ≤ 0.25 here (since we fell through the nearMissRate > 0.25 branch)? That would make signalRate - LOOSEN_BOUNDARY negative, clamp to 0, and give every low_hit_rate LOOSEN a confidence_score: 0 regardless of actual signal strength.

Good catch. Fixed in db11587: the wiring now captures the engine's chosen signalRate alongside the signal discriminator and routes per-signal boundaries (TIGHTEN_BOUNDARY for uncertain_hits, DISTANT_HITS_BOUNDARY=0.25 for distant_hits, LOOSEN_BOUNDARY for near_misses, LOW_HIT_RATE_BOUNDARY=0.1 for low_hit_rate). Tests added for the two paths that were collapsing to 0.
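The fix described here pairs the engine's chosen `signal` with its own rate and boundary. It can be sketched as a lookup keyed by that discriminator. `signalBoundaryFor` is the helper named in the follow-up test commit, but the signature below is assumed. `DISTANT_HITS_BOUNDARY` and `LOW_HIT_RATE_BOUNDARY` use the values stated in the reply. `LOOSEN_BOUNDARY` is assumed to be 0.25 to match the engine's `nearMissRate > 0.25` trigger. `TIGHTEN_BOUNDARY` is a placeholder because this PR does not state its value. `EngineDecision` and `signalInputs` are hypothetical.

```typescript
// Sketch of per-signal boundary routing; the signature and TIGHTEN_BOUNDARY value are assumptions.

type EngineSignal = "uncertain_hits" | "distant_hits" | "near_misses" | "low_hit_rate";

const TIGHTEN_BOUNDARY = 0.2; // PLACEHOLDER: the real value is not stated in this PR
const DISTANT_HITS_BOUNDARY = 0.25; // from the fix description
const LOOSEN_BOUNDARY = 0.25; // assumed to match the engine's nearMissRate > 0.25 trigger
const LOW_HIT_RATE_BOUNDARY = 0.1; // from the fix description

// A Map avoids prototype keys such as "toString" resolving to a value.
const BOUNDARIES = new Map<EngineSignal, number>([
  ["uncertain_hits", TIGHTEN_BOUNDARY],
  ["distant_hits", DISTANT_HITS_BOUNDARY],
  ["near_misses", LOOSEN_BOUNDARY],
  ["low_hit_rate", LOW_HIT_RATE_BOUNDARY],
]);

function signalBoundaryFor(signal: string | undefined): number | null {
  if (signal === undefined) return null;
  return BOUNDARIES.get(signal as EngineSignal) ?? null;
}

// Hypothetical: what the engine records when it emits TIGHTEN / LOOSEN.
interface EngineDecision {
  signal: EngineSignal;
  signalRate: number; // the rate the engine actually compared on this path
}

// Route the engine's own (rate, boundary) pair into the confidence calculation,
// instead of choosing a rate by TIGHTEN vs LOOSEN alone (the original bug).
function signalInputs(decision: EngineDecision): { signalRate: number; boundary: number } | null {
  const boundary = signalBoundaryFor(decision.signal);
  return boundary === null ? null : { signalRate: decision.signalRate, boundary };
}

console.log(signalInputs({ signal: "low_hit_rate", signalRate: 0.15 }));
```

Keying on the discriminator means every new engine path needs a boundary entry. An unknown signal returns `null` rather than silently borrowing another path's boundary.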

Still slightly better than BugBot, lol

@KIvanow

…cision path

The engine has four TIGHTEN/LOOSEN paths (uncertain_hits, distant_hits, near_misses, low_hit_rate). The confidence-score wiring was passing the wrong signal/boundary for distant_hits (it used uncertainHitRate instead of distantHitRate) and for low_hit_rate (it used nearMissRate, which is below its own decision cutoff on that path), collapsing the score to 0 in both cases. Capture the engine's chosen signalRate alongside the existing `signal` discriminator, add per-signal boundary constants, and route both into the confidence calc. Reported by @KIvanow on PR #224.


The engine triggers LOOSEN when nearMissRate > 0.25, but the confidence score was using 0.3 as the signal boundary. That created a dead zone: recommendations with nearMissRate between 0.25 and 0.3 collapsed to a score of 0, because the signal component went negative and was clamped to 0.
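A small sketch of the dead-zone arithmetic. The 0.25 engine trigger, the old 0.3 boundary, the clamp to 0, and the geometric mean come from this PR. Normalizing by 0.8 minus the boundary is an assumption, and `signalComponent` is a hypothetical name.

```typescript
// Sketch: signal component = distance past the boundary, clamped to [0, 1].
// The normalization by (0.8 - boundary) is assumed; the clamp-to-0 behavior is from the PR.
const SIGNAL_SATURATION = 0.8;

function signalComponent(rate: number, boundary: number): number {
  const raw = (rate - boundary) / (SIGNAL_SATURATION - boundary);
  return Math.min(1, Math.max(0, raw));
}

const nearMissRate = 0.27; // inside the dead zone: the engine LOOSENs because 0.27 > 0.25

const oldSignal = signalComponent(nearMissRate, 0.3); // raw is negative, so it clamps to 0
const newSignal = signalComponent(nearMissRate, 0.25); // small but positive

// Hold the other two components at 1 to isolate the effect of the boundary.
const sample = 1;
const freshness = 1;
const oldScore = Math.cbrt(sample * oldSignal * freshness); // 0: one zero component zeroes the mean
const newScore = Math.cbrt(sample * newSignal * freshness); // above 0

console.log({ oldSignal, newSignal, oldScore, newScore });
```

Because the score is a geometric mean, a mismatched boundary produces a hard 0 rather than a slightly lower score.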


…g bug

Adds unit tests that:

- assert signalBoundaryFor returns the right boundary for each of the four engine signals, and null for unknown/undefined input.
- document the original mapping bug: feeding a path's rate against the wrong boundary (the old wiring) yields signal=0 and score=0, while feeding the right pair yields signal>0 and score>0. Two cases cover the low_hit_rate and distant_hits paths.

Belt-and-suspenders over the existing service-spec regression guards.

KIvanow

very nicely done!

cursor Bot reviewed May 27, 2026

Comment thread on `proprietary/cache-proposals/cache-readonly.service.ts` (outdated)

cursor Bot reviewed May 27, 2026

Comment thread on `proprietary/cache-proposals/confidence-score.ts` (outdated)

cursor Bot reviewed May 27, 2026

jamby77 requested a review from KIvanow May 28, 2026 05:46

KIvanow reviewed May 28, 2026

KIvanow approved these changes May 28, 2026

jamby77 mentioned this pull request May 30, 2026

feat(cache-proposals): cost-weighted threshold optimization #227 (open, 4 tasks)

jamby77 added 9 commits May 30, 2026 12:41

feat(cache-proposals): pure confidence-score function for threshold recommendations

fa11b5b

chore(cache-proposals): expand freshness ternary to braced if/else

f5537e6

feat(cache-proposals): add confidence fields to ThresholdRecommendation type

195fd5c

feat(cache-proposals): surface confidence score on threshold recommendations

2f8a205

chore(cache-proposals): expand latestRecordedAt reduce to braced form

2c235f5

test(cache-proposals): assert confidence fields flow through threshold recommendation

216e15c

jamby77 force-pushed the feature/cache-proposals-confidence-score branch from 05305d7 to bbcd2c9 May 30, 2026 09:42

KIvanow approved these changes Jun 2, 2026

jamby77 wants to merge 9 commits into master from feature/cache-proposals-confidence-score