Wave 1 groundwork: Trust Layer verification report + per-build cost attribution by OBenner · Pull Request #361 · OBenner/Auto-Coding

OBenner · 2026-06-24T15:00:13Z

Summary

Wave 1 groundwork from the new development roadmap. Establishes the Trust Layer
(our key differentiator — a structured "what was verified" report shipped with every
build) and the foundation for per-build cost transparency.

This is a focused, additive, fully-tested first slice. Larger roadmap items
(GitHub App issue→PR, multi-user cloud) will land as separate PRs.

What's included

docs(strategy) — docs/strategy/roadmap.md (full plan: 5 goals, 3 waves, per-task
files + acceptance criteria) and a top-level ROADMAP.md pointer.
feat(qa) P1.T1 — build_verification_report() + ArtifactManager.save_verification_report():
the verification-report.json contract (verdict, confidence, tests run, diff summary,
uncertainty, out-of-scope edits).
feat(qa) P1.T1-wire — the QA build path now assembles the report from the persisted
qa_signoff and writes it on every build (not only CI/json), best-effort so it never
breaks the build. This is what the desktop UI and (later) PR comments will surface.
feat(cost) P5.T1 — record model/provider per phase in token_stats.json (provider
auto-resolved centrally, so all callers record it with no call-site change). Foundation
for the cost dashboard. Backward compatible (new fields default to null).

While wiring P1.T1 I confirmed P3.T1 (coder respects AUTO_CODE_AUTONOMY) is already
implemented — the roadmap was corrected to reflect this rather than duplicating it.

Tests

New unit tests, all green (apps/backend/.venv/bin/pytest):

tests/test_verification_report.py — 7 passed
tests/test_build_verification_wiring.py — 4 passed
tests/test_token_stats_model_provider.py — 4 passed

tests/test_planner.py (8 passed) confirms no regression in the save_token_stats callers.

Notes

out_of_scope_edits / confidence / uncertainty are part of the report contract and
stay empty until P1.T2 (scope detection) and P1.T3 (confidence extraction) populate them.
Pre-existing, unrelated: tests/test_qa_reviewer.py fails to collect in isolation due to
an integrations.webhooks import-path quirk (not touched by this PR).

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added a “Trust Layer” verification report artifact (verdict, confidence, uncertainty, tests/diff summary, issues, and out-of-scope edits) and integrated it into build output.
- Extended token statistics to persist per-phase model/provider for cost transparency.
- Added a UI panel and IPC/API endpoints to display the verification report, with updated i18n.
Bug Fixes
- Confidence/uncertainty signals are validated/clamped and missing fields no longer break report generation.
- Model/provider attribution is not clobbered when later updates omit values.
Documentation
- Updated autonomy CLI guidance and added a product roadmap.
Tests
- Expanded automated coverage for report generation/writing, trust-signal handling, token attribution, and out-of-scope detection.

Full plan in docs/strategy/roadmap.md (per-goal tasks, file touchpoints, acceptance criteria) plus a top-level ROADMAP.md pointer. Derived from AI-coding market/competitor research; sequences work into three waves and frames positioning around the trust gap. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add ArtifactManager.save_verification_report() and a pure build_verification_report() helper that assembles a normalized report (verdict, confidence, tests run, diff summary, uncertainty, out-of-scope edits) from the QA loop's existing signals. Establishes the verification-report.json contract the desktop UI and GitHub PR comments will surface. uncertainty/out_of_scope_edits are contract placeholders until P1.T2/T3 populate them; wiring into the QA reviewer/fixer sessions is the next step. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
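
To make the contract described in this commit concrete, here is a minimal sketch of a pure report builder. Only the function name `build_verification_report` and the field list come from the commit message. The JSON key names, the `schema_version` value, the `status` key on the sign-off, and the verdict vocabulary are assumptions made for illustration, not the code in `artifacts.py`.

```python
from typing import Any, Optional

# Hypothetical verdict vocabulary; the real normalization table may differ.
_VERDICT_ALIASES = {
    "approved": "approved", "pass": "approved", "passed": "approved",
    "rejected": "rejected", "fail": "rejected", "failed": "rejected",
}


def normalize_verdict(raw: Any) -> str:
    """Map a free-form QA status onto a small closed set, defaulting to 'unknown'."""
    if not isinstance(raw, str):
        return "unknown"
    return _VERDICT_ALIASES.get(raw.strip().lower(), "unknown")


def build_verification_report(
    qa_signoff: Optional[dict],
    changed_files: Optional[list] = None,
) -> dict:
    """Pure helper: performs no I/O and tolerates missing fields."""
    signoff = qa_signoff or {}
    files = sorted(changed_files or [])
    return {
        "schema_version": 1,  # assumed value
        "verdict": normalize_verdict(signoff.get("status")),  # "status" key is assumed
        "confidence": signoff.get("confidence"),
        "tests_run": signoff.get("test_results") or {},
        "diff_summary": {"files_changed": len(files), "files": files},
        "uncertainty": list(signoff.get("uncertainty") or []),
        "out_of_scope_edits": [],  # placeholder until scope detection fills it
    }
```

Keeping the builder free of I/O means it can be unit-tested with plain dicts, which matches the "pure helper" framing above.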

…tart Reading the real code showed coder.py already gates direct-API autonomy via the AUTO_CODE_AUTONOMY level, so P3.T1 needs no work. Mark it done with file evidence, repoint P3 start to the UX/observability gap (T4/T5), and update the start-here list to reflect P1.T1 landing and the next smallest commits. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

After the QA loop, build_commands now assembles the verification report from the persisted qa_signoff (verdict, issues, test_results, iteration) and writes artifacts/verification-report.json. Writes on every build (not only CI/json) via a best-effort helper that reuses or creates an ArtifactManager, so the desktop UI and PR comments can surface it. Never breaks the build on failure. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
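
The "best-effort, never breaks the build" behaviour can be sketched roughly as below. The helper name `save_report_best_effort`, the `generated_at` key, and the choice of caught exceptions are assumptions; only the file name `verification-report.json` and the intent come from the commit message.

```python
import json
import logging
from datetime import datetime, timezone
from pathlib import Path

logger = logging.getLogger(__name__)


def save_report_best_effort(artifacts_dir: Path, report: dict) -> bool:
    """Write verification-report.json; return False instead of raising on failure."""
    try:
        artifacts_dir.mkdir(parents=True, exist_ok=True)
        # Stamp the report at write time ("generated_at" is a hypothetical key).
        stamped = {**report, "generated_at": datetime.now(timezone.utc).isoformat()}
        target = artifacts_dir / "verification-report.json"
        target.write_text(json.dumps(stamped, indent=2), encoding="utf-8")
        return True
    except (OSError, TypeError, ValueError) as exc:
        # Swallow the error on purpose: a missing report must not fail the build.
        logger.warning("verification report not written: %s", exc)
        return False
```

Returning a boolean rather than raising lets the caller log or ignore the outcome without wrapping every call site in its own `try`.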

Add optional model/provider to PhaseTokenStats (serialized in to_dict and reconstructed on load). save_token_stats accepts them and auto-resolves the active provider from provider config when omitted, so every caller records provider with no call-site change; the shared session path also records the model. Backward compatible (new fields default to null); foundation for the per-build cost dashboard. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
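
A rough sketch of the backward-compatible shape described here, assuming a dataclass with two hypothetical counter fields (`input_tokens`, `output_tokens`) and a hypothetical `record` method. The class name `PhaseTokenStats`, `to_dict`, and the optional `model`/`provider` fields are from the commit message; everything else is illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhaseTokenStats:
    input_tokens: int = 0            # hypothetical counter field
    output_tokens: int = 0           # hypothetical counter field
    model: Optional[str] = None      # new, defaults to null
    provider: Optional[str] = None   # new, defaults to null

    def to_dict(self) -> dict:
        return {
            "input_tokens": self.input_tokens,
            "output_tokens": self.output_tokens,
            "model": self.model,
            "provider": self.provider,
        }

    @classmethod
    def from_dict(cls, data: dict) -> "PhaseTokenStats":
        # Files written before this change lack model/provider; .get() yields None.
        return cls(
            input_tokens=int(data.get("input_tokens", 0)),
            output_tokens=int(data.get("output_tokens", 0)),
            model=data.get("model"),
            provider=data.get("provider"),
        )

    def record(self, input_tokens: int, output_tokens: int,
               model: Optional[str] = None, provider: Optional[str] = None) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        # Keep earlier attribution when a later update omits the values.
        self.model = model or self.model
        self.provider = provider or self.provider
```

Loading an old `token_stats.json` entry through `from_dict` yields `model=None` and `provider=None`, and a later `record` call without attribution leaves previously stored values untouched.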

coderabbitai · 2026-06-24T15:00:31Z

Warning

Review limit reached

@OBenner, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 31 minutes and 53 seconds. Learn how PR review limits work.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1bc3e098-c7b6-4876-aaa1-8f4d5e3ac32b

📥 Commits

Reviewing files that changed from the base of the PR and between 8ad98d3 and e2f9c5b.

📒 Files selected for processing (14)

apps/backend/cli/artifacts.py
apps/backend/cli/build_commands.py
apps/backend/qa/scope_check.py
apps/frontend/src/main/agent/agent-process.ts
apps/frontend/src/renderer/components/settings/ProviderSettings.tsx
apps/frontend/src/renderer/components/task-detail/VerificationReportPanel.tsx
apps/frontend/src/shared/constants/config.ts
apps/frontend/src/shared/i18n/locales/en/settings.json
apps/frontend/src/shared/i18n/locales/fr/settings.json
apps/frontend/src/shared/types/settings.ts
docs/strategy/roadmap.md
tests/test_build_verification_wiring.py
tests/test_qa_trust_signals.py
tests/test_verification_report.py

📝 Walkthrough

Adds a Trust Layer verification report artifact and UI path, extends token stats with model/provider attribution, and updates roadmap/autonomy documentation.

Changes

Trust Layer Verification Report

| Layer / File(s) | Summary |
| --- | --- |
| Report schema and persistence: `apps/backend/cli/artifacts.py`, `tests/test_verification_report.py` | Defines the verification report filename, schema version, and verdict normalization, then adds pure report building and JSON artifact saving with timestamp stamping and error handling. |
| Build, scope, and QA trust signals: `apps/backend/cli/build_commands.py`, `apps/backend/qa/scope_check.py`, `apps/backend/qa/reviewer.py`, `apps/backend/prompts/qa_reviewer.md`, `tests/test_build_verification_wiring.py`, `tests/test_qa_trust_signals.py`, `tests/test_scope_check.py` | Generates verification report data from `implementation_plan.json`, detects out-of-scope edits, carries `confidence` and `uncertainty` through QA sign-off handling, and persists the report after QA completion. |
| Frontend verification report display: `apps/frontend/src/...` | Adds the IPC channel, shared types, preload API, renderer fetch path, panel UI, i18n strings, and mocks needed to load and show the verification report in task overview. |

Token Stats Model/Provider Attribution

| Layer / File(s) | Summary |
| --- | --- |
| Phase token stats attribution: `apps/backend/core/token_stats.py`, `apps/backend/agents/session.py`, `tests/test_token_stats_model_provider.py` | Adds optional `model` and `provider` fields to phase token stats, restores them from disk, resolves missing providers, and preserves previously stored values on later updates. |

Roadmap and autonomy docs

| Layer / File(s) | Summary |
| --- | --- |
| Roadmap and autonomy documentation: `ROADMAP.md`, `docs/strategy/roadmap.md`, `docs/architecture/adr/ADR-006-autonomy-levels.md`, `guides/CLI-USAGE.md`, `guides/QUICK-START.md` | Adds a roadmap overview and detailed strategy document, updates the autonomy ADR status, expands CLI autonomy guidance, and links the quick-start guide to the new autonomy section. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

OBenner/Auto-Coding#276: Shares the QA reviewer and sign-off flow that now carries confidence and uncertainty through runtime handling.
OBenner/Auto-Coding#340: Closely matches the runtime/persisted QA sign-off merge path extended here with trust-signal fields.
OBenner/Auto-Coding#361: Appears to be the larger parent change set for the Trust Layer report and related backend/frontend wiring.

Suggested labels

area/backend, area/frontend, feature

Poem

🐇 A trust report hops into view,
With clues and confidence too.
Model and provider now leave a trail,
While roadmap winds and docs set sail.
The bunny nods: “This run is neat—
Signals, stats, and screens all meet.”

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.37%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: Trust Layer verification reporting and per-build cost attribution. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

apps/backend/agents/session.py (1)
1085-1085: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Stale docstring: model is not a parameter of run_agent_session.

The Args section documents a model parameter, but the signature (Lines 1063-1071) has no such parameter. Remove the stale entry to avoid confusion.
📝 Proposed doc fix
         conversation_history: Optional existing history for resuming sessions
         subtask_id: Optional subtask ID for session tracking
-        model: The AI model being used (e.g., "claude-sonnet-4-5-20250929")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@apps/backend/agents/session.py` at line 1085, The Args docstring for
run_agent_session contains a stale model entry that does not match the function
signature. Update the run_agent_session docstring in session.py by removing the
model parameter description from the Args section and keeping the documented
arguments aligned with the actual parameters, using run_agent_session as the
reference point.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@apps/backend/cli/build_commands.py`:
- Around line 581-586: Populate changed_files before calling
_save_verification_report in build_commands.py, because it is still empty at
this point and causes diff_summary to be blank for the persisted report. Move or
reuse the changed-files discovery logic so the verification report is written
only after the list is populated, and ensure the same source of truth is used
for both the normal build path and the JSON-output tail.
- Around line 198-205: The `build_verification_report` call in
`build_commands.py` is incorrectly coupling `qa_session` and `iteration` by
falling each one back to the other source, which can invent metadata when only
one value exists. Update the logic in the report assembly so `qa_session` is
taken only from `qa_signoff.get("qa_session")` and `iteration` is taken only
from `qa_stats.get("last_iteration")`, keeping the two fields independent in
`build_verification_report`.

In `@docs/strategy/roadmap.md`:
- Line 48: The roadmap wiring reference is outdated: it points to qa/reviewer.py
and qa/fixer.py, but the implemented persistence path is now
_save_verification_report in build_commands and the behavior is covered by
test_build_verification_wiring. Update the T1/P1 entry so it names the
build-path integration and the relevant verification test, and remove the
QA-session references to avoid conflicting follow-up work.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0f49d1f1-3169-4340-9427-62058354c49a

📥 Commits

Reviewing files that changed from the base of the PR and between f192d77 and 17a31d0.

📒 Files selected for processing (9)

ROADMAP.md
apps/backend/agents/session.py
apps/backend/cli/artifacts.py
apps/backend/cli/build_commands.py
apps/backend/core/token_stats.py
docs/strategy/roadmap.md
tests/test_build_verification_wiring.py
tests/test_token_stats_model_provider.py
tests/test_verification_report.py

Add qa/scope_check.py (pure: get_planned_files + detect_out_of_scope_edits) comparing the plan's declared files (files_to_modify + files_to_create) against the files the build changed; ignores .auto-claude/ bookkeeping. Wired into the build's verification report (lazy/guarded import) and _save_verification_report now derives changed files from the spec worktree via WorktreeManager.get_changed_files. Populates out_of_scope_edits. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
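
The scope comparison described in this commit might look roughly like the sketch below. The function names, the `files_to_modify`/`files_to_create` keys, and the `.auto-claude/` exclusion are from the commit message. The plan layout (`phases` containing `subtasks`) is an assumption about `implementation_plan.json`, not a confirmed schema.

```python
from typing import Iterable

# Bookkeeping paths that never count as out-of-scope edits.
IGNORED_PREFIXES = (".auto-claude/",)


def get_planned_files(plan: dict) -> set:
    """Collect every file the plan declares it will modify or create."""
    planned = set()
    for phase in plan.get("phases", []):           # assumed plan layout
        for subtask in phase.get("subtasks", []):  # assumed plan layout
            planned.update(subtask.get("files_to_modify") or [])
            planned.update(subtask.get("files_to_create") or [])
    return planned


def detect_out_of_scope_edits(plan: dict, changed_files: Iterable[str]) -> list:
    """Return changed files that the plan never declared, sorted for stable output."""
    planned = get_planned_files(plan)
    return sorted(
        path for path in set(changed_files)
        if path not in planned and not path.startswith(IGNORED_PREFIXES)
    )
```

For example, with a plan that declares `a.py` and `b.py`, a build that changed `a.py`, `c.py`, and `.auto-claude/state.json` would report only `c.py`.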

…t (P1.T3) Reviewer now reports a self-assessed confidence (0-1) and an uncertainty list in its qa_signoff (qa_reviewer.md prompt + runtime sign-off examples). merge_runtime_qa_signoff_artifact carries them through the runtime path, sanitized like the other fields (real number clamped to [0,1]; dict-only uncertainty items; bool confidence dropped). The build reads both into the verification report so the UI/PR comment can show how sure QA was and what it could not verify. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
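
The sanitization rules listed in this commit (real number clamped to [0,1], bool dropped, dict-only uncertainty items) can be sketched as follows. The function names are hypothetical, and dropping NaN is an added assumption that the commit message does not state.

```python
from typing import Any, Optional


def sanitize_confidence(value: Any) -> Optional[float]:
    """Return a float in [0, 1], or None when the value is not a usable number."""
    # bool is a subclass of int in Python, so it must be rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    if value != value:  # NaN is the only value unequal to itself (assumed: drop it)
        return None
    return float(min(1.0, max(0.0, value)))


def sanitize_uncertainty(items: Any) -> list:
    """Keep only dict entries; anything else is discarded."""
    if not isinstance(items, list):
        return []
    return [item for item in items if isinstance(item, dict)]
```

The explicit `bool` check matters because `isinstance(True, int)` is true, so without it `True` would silently become a confidence of 1.0.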

…p progress (P3.T5) Add an 'Autonomy: one knob' section to CLI-USAGE leading with AUTO_CODE_AUTONOMY (off/claude/safe/bold) + preset and a local-model privacy recipe; relegate the env-var matrix to an advanced-overrides pointer. Mark ADR-006 Accepted (core implemented) with remaining follow-ups. Add a QUICK-START pointer. Update docs/strategy/roadmap.md to show Wave 1 backend complete. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add a verification-report.json reader + IPC channel (TASK_SPEC_VERIFICATION_REPORT_GET) + preload bridge, a VerificationReport type, and a VerificationReportPanel that renders verdict, confidence, tests, out-of-scope edits, and uncertainty. Mounted in TaskOverview between the runtime artifacts and the QA report. i18n keys added for en + fr. tsc --noEmit passes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add an AppSettings.autonomyLevel (off/claude/safe/bold, default claude) with a one-knob dropdown in Provider Settings (en+fr i18n). The build's process env injects AUTO_CODE_AUTONOMY from the persisted setting in AgentProcessManager.setupProcessEnvironment, with explicit environment variables taking precedence (ADR-006). tsc --noEmit passes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
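
The precedence rule in this commit (an explicit environment variable beats the persisted setting) is implemented in the TypeScript frontend. The sketch below restates the same logic in Python purely for illustration. The function name is hypothetical, and falling back to the default for an unrecognized saved value is an assumption.

```python
from typing import Mapping, Optional

VALID_LEVELS = ("off", "claude", "safe", "bold")
DEFAULT_LEVEL = "claude"


def resolve_process_env(base_env: Mapping[str, str], saved_level: Optional[str]) -> dict:
    """Build the child-process environment with AUTO_CODE_AUTONOMY resolved."""
    env = dict(base_env)
    if env.get("AUTO_CODE_AUTONOMY"):
        # An explicit environment variable takes precedence over the saved setting.
        return env
    # Assumed behaviour: unknown or missing saved values fall back to the default.
    level = saved_level if saved_level in VALID_LEVELS else DEFAULT_LEVEL
    env["AUTO_CODE_AUTONOMY"] = level
    return env
```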

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

- Reliability gate: use pytest.approx for float-equality asserts (S1244 bugs) across the 3 trust-layer test files; restores Reliability A on new code.
- CodeRabbit (Major): keep qa_session and iteration independent in the verification report instead of cross-falling-back (avoids mislabeled metadata).
- CodeQL: explain the duration except-block (no longer an empty except).
- Sonar smells: VerificationReportPanel props read-only, extract VerdictIcon (no nested ternary), content-based list keys (no array index); extract _compute_out_of_scope and _subtask_planned_files helpers to cut cognitive complexity.

Verified: 51 backend tests pass; tsc --noEmit, ruff, biome clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
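
The qa_session/iteration fix can be illustrated with a small before/after sketch. The source keys `qa_session` and `last_iteration` come from the CodeRabbit comment. Both function names are hypothetical, and the "coupled" version is a guess at the flagged pattern rather than the original code.

```python
def report_metadata_coupled(qa_signoff: dict, qa_stats: dict) -> dict:
    """Flagged pattern (reconstructed): each field falls back to the other source."""
    return {
        "qa_session": qa_signoff.get("qa_session") or qa_stats.get("last_iteration"),
        "iteration": qa_stats.get("last_iteration") or qa_signoff.get("qa_session"),
    }


def report_metadata(qa_signoff: dict, qa_stats: dict) -> dict:
    """Fixed pattern: each field reads only from its own source."""
    return {
        "qa_session": qa_signoff.get("qa_session"),
        "iteration": qa_stats.get("last_iteration"),
    }
```

With only a session number available, the coupled version reports an iteration that was never recorded, while the independent version leaves it as `None`.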

sonarqubecloud · 2026-06-24T16:51:07Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

OBenner and others added 5 commits June 24, 2026 17:23

github-actions Bot added area/backend size/L labels Jun 24, 2026

github-advanced-security AI found potential problems Jun 24, 2026

Comment thread apps/backend/cli/artifacts.py Fixed

coderabbitai Bot reviewed Jun 24, 2026

Comment thread apps/backend/cli/build_commands.py

Comment thread apps/backend/cli/build_commands.py

Comment thread docs/strategy/roadmap.md

OBenner and others added 3 commits June 24, 2026 19:12

github-actions Bot added size/XL and removed size/L labels Jun 24, 2026

github-actions Bot added area/fullstack and removed area/backend labels Jun 24, 2026

OBenner and others added 3 commits June 24, 2026 20:14

docs(strategy): mark Wave 1 fully complete (P1.T4 + P3.T4 landed)

57c48f8

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

OBenner merged commit e5824e5 into develop Jun 24, 2026
19 checks passed

OBenner deleted the claude/infallible-wilbur-44ffb3 branch June 24, 2026 16:41

OBenner mentioned this pull request Jun 24, 2026

refactor(artifacts): timezone-aware UTC timestamps (Sonar S6903) #362

Merged

OBenner merged 12 commits into develop from claude/infallible-wilbur-44ffb3

2 participants
