Wave 1 groundwork: Trust Layer verification report + per-build cost attribution #361
Full plan in docs/strategy/roadmap.md (per-goal tasks, file touchpoints, acceptance criteria) plus a top-level ROADMAP.md pointer. Derived from AI-coding market/competitor research; sequences work into three waves and frames positioning around the trust gap. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add ArtifactManager.save_verification_report() and a pure build_verification_report() helper that assembles a normalized report (verdict, confidence, tests run, diff summary, uncertainty, out-of-scope edits) from the QA loop's existing signals. Establishes the verification-report.json contract the desktop UI and GitHub PR comments will surface. uncertainty/out_of_scope_edits are contract placeholders until P1.T2/T3 populate them; wiring into the QA reviewer/fixer sessions is the next step. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
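For readers who want a concrete picture of the contract, here is a minimal sketch of a pure report builder. The function name `sketch_verification_report`, its parameters, and the exact key names are assumptions made for illustration; the PR's `build_verification_report()` may differ in signature and shape.

```python
from __future__ import annotations

from typing import Any


def sketch_verification_report(
    verdict: str,
    confidence: float | None = None,
    tests_run: list[dict[str, Any]] | None = None,
    changed_files: list[str] | None = None,
    uncertainty: list[dict[str, Any]] | None = None,
    out_of_scope_edits: list[str] | None = None,
) -> dict[str, Any]:
    """Assemble a normalized report dict. Pure: no I/O, no globals.

    Key names mirror the fields listed in the commit message, but the
    exact layout here is hypothetical.
    """
    # De-duplicate and sort so the same inputs always yield the same report.
    files = sorted(set(changed_files or []))
    return {
        "verdict": verdict,
        "confidence": confidence,
        "tests_run": list(tests_run or []),
        "diff_summary": {"files_changed": len(files), "files": files},
        # Placeholders stay as empty lists until later tasks populate them.
        "uncertainty": list(uncertainty or []),
        "out_of_scope_edits": list(out_of_scope_edits or []),
    }
```

Keeping the builder free of I/O means it can be unit-tested with plain dicts, while the write to disk stays in a separate method.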
…tart Reading the real code showed coder.py already gates direct-API autonomy via the AUTO_CODE_AUTONOMY level, so P3.T1 needs no work. Mark it done with file evidence, repoint P3 start to the UX/observability gap (T4/T5), and update the start-here list to reflect P1.T1 landing and the next smallest commits. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
After the QA loop, build_commands now assembles the verification report from the persisted qa_signoff (verdict, issues, test_results, iteration) and writes artifacts/verification-report.json. Writes on every build (not only CI/json) via a best-effort helper that reuses or creates an ArtifactManager, so the desktop UI and PR comments can surface it. Never breaks the build on failure. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
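The "never breaks the build" behaviour can be sketched as a writer that swallows and logs its own failures. `save_report_best_effort` and the `spec_dir` parameter are hypothetical names; only the `artifacts/verification-report.json` path comes from the commit message, and the real helper goes through `ArtifactManager` rather than writing the file directly.

```python
from __future__ import annotations

import json
import logging
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)


def save_report_best_effort(spec_dir: Path, report: dict[str, Any]) -> Path | None:
    """Write artifacts/verification-report.json under spec_dir.

    Returns the written path, or None if anything went wrong. It never
    raises, so a reporting problem cannot fail the surrounding build.
    """
    try:
        target = spec_dir / "artifacts" / "verification-report.json"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(report, indent=2), encoding="utf-8")
        return target
    except Exception as exc:  # deliberately broad: reporting is best-effort
        logger.warning("Could not write verification report: %s", exc)
        return None
```

The caller can ignore the return value entirely; it is there only so tests and logs can tell whether the artifact was produced.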
Add optional model/provider to PhaseTokenStats (serialized in to_dict and reconstructed on load). save_token_stats accepts them and auto-resolves the active provider from provider config when omitted, so every caller records provider with no call-site change; the shared session path also records the model. Backward compatible (new fields default to null); foundation for the per-build cost dashboard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
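The backward-compatibility claim is easiest to see in a small round-trip sketch. `PhaseStatsSketch` and its token fields are stand-ins invented for this example; only the optional `model`/`provider` fields and the `to_dict` name come from the commit message.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class PhaseStatsSketch:
    """Hypothetical stand-in for PhaseTokenStats."""

    phase: str
    input_tokens: int = 0
    output_tokens: int = 0
    # New optional attribution fields; None serializes to JSON null.
    model: str | None = None
    provider: str | None = None

    def to_dict(self) -> dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> PhaseStatsSketch:
        # .get() keeps files written before this change loadable:
        # missing keys simply become None.
        return cls(
            phase=data["phase"],
            input_tokens=int(data.get("input_tokens", 0)),
            output_tokens=int(data.get("output_tokens", 0)),
            model=data.get("model"),
            provider=data.get("provider"),
        )
```

An older record such as `{"phase": "coding", "input_tokens": 10}` loads with `model` and `provider` set to `None`, which is the behaviour the commit describes as defaulting to null.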
📝 Walkthrough
Adds a Trust Layer verification report artifact and UI path, extends token stats with model/provider attribution, and updates roadmap/autonomy documentation.
Changes
- Trust Layer Verification Report
- Token Stats Model/Provider Attribution
- Roadmap and autonomy docs
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
apps/backend/agents/session.py (1)
1085-1085: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Stale docstring: `model` is not a parameter of `run_agent_session`.
The Args section documents a `model` parameter, but the signature (Lines 1063-1071) has no such parameter. Remove the stale entry to avoid confusion.
📝 Proposed doc fix
      conversation_history: Optional existing history for resuming sessions
      subtask_id: Optional subtask ID for session tracking
    - model: The AI model being used (e.g., "claude-sonnet-4-5-20250929")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@apps/backend/agents/session.py` at line 1085, The Args docstring for run_agent_session contains a stale model entry that does not match the function signature. Update the run_agent_session docstring in session.py by removing the model parameter description from the Args section and keeping the documented arguments aligned with the actual parameters, using run_agent_session as the reference point.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@apps/backend/cli/build_commands.py`:
- Around line 581-586: Populate changed_files before calling
_save_verification_report in build_commands.py, because it is still empty at
this point and causes diff_summary to be blank for the persisted report. Move or
reuse the changed-files discovery logic so the verification report is written
only after the list is populated, and ensure the same source of truth is used
for both the normal build path and the JSON-output tail.
- Around line 198-205: The `build_verification_report` call in
`build_commands.py` is incorrectly coupling `qa_session` and `iteration` by
falling each one back to the other source, which can invent metadata when only
one value exists. Update the logic in the report assembly so `qa_session` is
taken only from `qa_signoff.get("qa_session")` and `iteration` is taken only
from `qa_stats.get("last_iteration")`, keeping the two fields independent in
`build_verification_report`.
In `@docs/strategy/roadmap.md`:
- Line 48: The roadmap wiring reference is outdated: it points to qa/reviewer.py
and qa/fixer.py, but the implemented persistence path is now
_save_verification_report in build_commands and the behavior is covered by
test_build_verification_wiring. Update the T1/P1 entry so it names the
build-path integration and the relevant verification test, and remove the
QA-session references to avoid conflicting follow-up work.
---
Outside diff comments:
In `@apps/backend/agents/session.py`:
- Line 1085: The Args docstring for run_agent_session contains a stale model
entry that does not match the function signature. Update the run_agent_session
docstring in session.py by removing the model parameter description from the
Args section and keeping the documented arguments aligned with the actual
parameters, using run_agent_session as the reference point.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0f49d1f1-3169-4340-9427-62058354c49a
📒 Files selected for processing (9)
- ROADMAP.md
- apps/backend/agents/session.py
- apps/backend/cli/artifacts.py
- apps/backend/cli/build_commands.py
- apps/backend/core/token_stats.py
- docs/strategy/roadmap.md
- tests/test_build_verification_wiring.py
- tests/test_token_stats_model_provider.py
- tests/test_verification_report.py
Add qa/scope_check.py (pure: get_planned_files + detect_out_of_scope_edits) comparing the plan's declared files (files_to_modify + files_to_create) against the files the build changed; ignores .auto-claude/ bookkeeping. Wired into the build's verification report (lazy/guarded import) and _save_verification_report now derives changed files from the spec worktree via WorktreeManager.get_changed_files. Populates out_of_scope_edits. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
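The comparison described in this commit amounts to a set difference with one ignored prefix. The sketch below uses hypothetical function names and assumes a plan laid out as `phases` containing `subtasks`; only the `files_to_modify` / `files_to_create` keys and the `.auto-claude/` exclusion are taken from the commit message.

```python
from __future__ import annotations

from typing import Any, Iterable

# Bookkeeping directory that should never count as an out-of-scope edit.
BOOKKEEPING_PREFIX = ".auto-claude/"


def _normalize(path: str) -> str:
    """Use forward slashes and drop a leading './' so paths compare equal."""
    p = path.replace("\\", "/")
    return p[2:] if p.startswith("./") else p


def planned_files_sketch(plan: dict[str, Any]) -> set[str]:
    """Collect every file the plan declares it will modify or create.

    The phases -> subtasks nesting is an assumption for this example.
    """
    planned: set[str] = set()
    for phase in plan.get("phases", []):
        for subtask in phase.get("subtasks", []):
            for key in ("files_to_modify", "files_to_create"):
                planned.update(_normalize(f) for f in subtask.get(key) or [])
    return planned


def out_of_scope_edits_sketch(
    plan: dict[str, Any], changed_files: Iterable[str]
) -> list[str]:
    """Return changed files that the plan never declared, sorted."""
    planned = planned_files_sketch(plan)
    changed = {_normalize(f) for f in changed_files}
    return sorted(
        f
        for f in changed
        if f not in planned and not f.startswith(BOOKKEEPING_PREFIX)
    )
```

Both functions are pure, so they can be exercised with literal dicts and lists, independently of `WorktreeManager.get_changed_files`.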
…t (P1.T3) Reviewer now reports a self-assessed confidence (0-1) and an uncertainty list in its qa_signoff (qa_reviewer.md prompt + runtime sign-off examples). merge_runtime_qa_signoff_artifact carries them through the runtime path, sanitized like the other fields (real number clamped to [0,1]; dict-only uncertainty items; bool confidence dropped). The build reads both into the verification report so the UI/PR comment can show how sure QA was and what it could not verify. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
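The sanitization rules listed here (clamp real numbers to [0, 1], drop bools, keep only dict items) can be sketched as two small helpers. The function names are hypothetical, and dropping NaN is an extra assumption of this sketch that the commit message does not mention.

```python
from __future__ import annotations

import math
from typing import Any


def sanitize_confidence(value: Any) -> float | None:
    """Return a float in [0, 1], or None if the value is unusable."""
    # bool is a subclass of int in Python, so it must be rejected first.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    # Assumption: NaN carries no information, so treat it as missing.
    if isinstance(value, float) and math.isnan(value):
        return None
    return float(min(1, max(0, value)))


def sanitize_uncertainty(items: Any) -> list[dict[str, Any]]:
    """Keep only dict entries; anything else is silently dropped."""
    if not isinstance(items, list):
        return []
    return [item for item in items if isinstance(item, dict)]
```

The explicit bool check matters because `isinstance(True, int)` is true, so without it a reviewer answering `true` would be recorded as a confidence of 1.0.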
…p progress (P3.T5) Add an 'Autonomy: one knob' section to CLI-USAGE leading with AUTO_CODE_AUTONOMY (off/claude/safe/bold) + preset and a local-model privacy recipe; relegate the env-var matrix to an advanced-overrides pointer. Mark ADR-006 Accepted (core implemented) with remaining follow-ups. Add a QUICK-START pointer. Update docs/strategy/roadmap.md to show Wave 1 backend complete. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a verification-report.json reader + IPC channel (TASK_SPEC_VERIFICATION_REPORT_GET) + preload bridge, a VerificationReport type, and a VerificationReportPanel that renders verdict, confidence, tests, out-of-scope edits, and uncertainty. Mounted in TaskOverview between the runtime artifacts and the QA report. i18n keys added for en + fr. tsc --noEmit passes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add an AppSettings.autonomyLevel (off/claude/safe/bold, default claude) with a one-knob dropdown in Provider Settings (en+fr i18n). The build's process env injects AUTO_CODE_AUTONOMY from the persisted setting in AgentProcessManager.setupProcessEnvironment, with explicit environment variables taking precedence (ADR-006). tsc --noEmit passes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Reliability gate: use pytest.approx for float-equality asserts (S1244 bugs) across the 3 trust-layer test files; this restores Reliability A on new code.
- CodeRabbit (Major): keep qa_session and iteration independent in the verification report instead of cross-falling-back (avoids mislabeled metadata).
- CodeQL: explain the duration except-block (no longer an empty except).
- Sonar smells: VerificationReportPanel props read-only, extract VerdictIcon (no nested ternary), content-based list keys (no array index); extract _compute_out_of_scope and _subtask_planned_files helpers to cut cognitive complexity.
Verified: 51 backend tests pass; tsc --noEmit, ruff, biome clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
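The qa_session/iteration fix can be illustrated with a short sketch. `report_metadata` is a hypothetical helper; the two lookups, `qa_signoff.get("qa_session")` and `qa_stats.get("last_iteration")`, are the ones named in the CodeRabbit comment.

```python
from __future__ import annotations

from typing import Any


def report_metadata(qa_signoff: dict[str, Any], qa_stats: dict[str, Any]) -> dict[str, Any]:
    """Read each field from its own source only.

    A missing value stays None instead of borrowing the other field's
    value, so the report never shows invented metadata.
    """
    return {
        "qa_session": qa_signoff.get("qa_session"),
        "iteration": qa_stats.get("last_iteration"),
    }
```

With only a sign-off present, `iteration` is `None` rather than a copy of the session number, and the reverse holds when only stats are present.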
Summary
Wave 1 groundwork from the new development roadmap. Establishes the Trust Layer
(our key differentiator — a structured "what was verified" report shipped with every
build) and the foundation for per-build cost transparency.
This is a focused, additive, fully-tested first slice. Larger roadmap items
(GitHub App issue→PR, multi-user cloud) will land as separate PRs.
What's included
- `docs/strategy/roadmap.md` (full plan: 5 goals, 3 waves, per-task files + acceptance criteria) and a top-level `ROADMAP.md` pointer.
- `build_verification_report()` + `ArtifactManager.save_verification_report()`: the `verification-report.json` contract (verdict, confidence, tests run, diff summary, uncertainty, out-of-scope edits).
- The build assembles the report from the persisted `qa_signoff` and writes it on every build (not only CI/json), best-effort so it never breaks the build. This is what the desktop UI and (later) PR comments will surface.
- Optional model/provider attribution in `token_stats.json` (`provider` auto-resolved centrally, so all callers record it with no call-site change). Foundation for the cost dashboard. Backward compatible (new fields default to null).
While wiring P1.T1 I confirmed P3.T1 (coder respects `AUTO_CODE_AUTONOMY`) is already
implemented; the roadmap was corrected to reflect this rather than duplicating it.
Tests
New unit tests, all green (`apps/backend/.venv/bin/pytest`):
- `tests/test_verification_report.py`: 7 passed
- `tests/test_build_verification_wiring.py`: 4 passed
- `tests/test_token_stats_model_provider.py`: 4 passed
- `tests/test_planner.py` (8 passed) confirms no regression in the `save_token_stats` callers.
Notes
- `out_of_scope_edits` / `confidence` / `uncertainty` are part of the report contract and stay empty until P1.T2 (scope detection) and P1.T3 (confidence extraction) populate them.
- `tests/test_qa_reviewer.py` fails to collect in isolation due to an `integrations.webhooks` import-path quirk (not touched by this PR).
🤖 Generated with Claude Code
Summary by CodeRabbit