Phase 2B: DAG parallelization + analyst copilot UX + confidence calibration by R00T-Kim · Pull Request #6 · R00T-Kim/SCOUT

R00T-Kim · 2026-04-13T09:58:29Z

Summary

Phase 2B integration — 6 commits addressing external reviewer critiques on performance, analyst UX, and confidence calibration. Follows Phase 2A (trust baseline + typecheck cleanup). All commits atomic; any single commit could ship independently.

| Commit | PR | Focus |
| --- | --- | --- |
| `1a6d0e0` | #11 | reasoning_trail field: adversarial_triage/fp_verification capture advocate/critic/decision/pattern-hit entries (200-char excerpt cap, additive PR #7a pattern) |
| `8d14e2d` | #14 | extraction failure analyst guidance: vendor_decrypt hint, --rootfs option, binwalk variants, docs/runbook.md#extraction-failure pointer |
| `990957d` | #12 | 4 MCP analyst tools: scout_get_finding_reasoning, scout_inject_hint, scout_override_verdict, scout_filter_by_category; terminator_feedback extended with fcntl.flock-safe hint channel; adversarial_triage advocate prompt reads analyst_hints |
| `23ac24f` | #10 | DAG parallelization PoC: `stage_dag.py` STAGE_DEPS + Kahn topo_levels + `run_stages_parallel()` ThreadPoolExecutor, `--experimental-parallel [N]` flag, out-of-order ProgressTracker mode. Sequential `run_stages()` unchanged. |
| `9103b12` | #15 | detection vs priority calibration: new `scoring.py` with `PriorityInputs` + `compute_priority_score()` (detection 50% / EPSS 25% / reach 15% / CVSS 10% / backport -0.20). `cve_scan.py:1140-1170` refactored: detection_confidence strictly capped at `STATIC_CODE_VERIFIED_CAP=0.55`, EPSS/reachability/backport feed `priority_score` instead. Closes reviewer critique that EPSS-additive confidence looked like a ranking heuristic. |
| `3127de9` | #13 | reasoning_trail viewer: collapsible section in embedded web viewer template, numbered subsection in analyst markdown, `render_finding_detail_with_trail()` in TUI (AIEDGE_TUI_ASCII-compatible). All surfaces hide when trail absent. |

Verification

pytest: 1027 passed, 1 skipped (865 → 1027, +162 new tests)
ruff: All checks passed
pyright: 0 errors, 0 warnings (Phase 2A baseline preserved)
R7000 smoke (PR #15): 3 findings, all carry priority_score + priority_inputs; cve_confidence_above_0.55_cap = 0 (detection cap correctly enforced)

Design Invariants Preserved

Additive only on findings.py (PR #7a pattern for category, reasoning_trail, priority_score, priority_inputs). No report schema version bump.
All file writes continue to route through assert_under_dir() (path_safety.py).
Sequential run_stages() behavior bit-identical to pre-PR state.
Existing LLM driver contracts untouched.
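
The write-path invariant above can be pictured as a small containment check. This is only a sketch: the real `assert_under_dir()` in `path_safety.py` is not shown in this PR, so the signature, return value, and exception type below are assumptions.

```python
from pathlib import Path


def assert_under_dir(base_dir, candidate):
    """Return the resolved candidate path, or raise if it escapes base_dir.

    Assumed signature; the helper in path_safety.py may differ.
    """
    base = Path(base_dir).resolve()
    target = Path(candidate).resolve()
    # Accept base itself, or any path whose resolved ancestors include base.
    if target != base and base not in target.parents:
        raise ValueError(f"{target} is outside {base}")
    return target
```

Resolving both paths before comparing means `..` segments and symlinks are normalized first, so a write target such as `run_dir/../escape.txt` is rejected rather than silently accepted.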

Test plan

Full test suite green on merged state
New test files (test_reasoning_trail.py, test_extraction_guidance.py, test_mcp_analyst_tools.py, test_stage_dag.py, test_run_stages_parallel.py, test_scoring.py, test_reasoning_trail_viewer.py) — 162 new tests, all pass
Merge-conflict resolution verified on findings.py, sarif_export.py, __main__.py, run.py (additive field coexistence + CLI flag composition)
Wall-clock comparison sequential vs --experimental-parallel 4 on R7000 (deferred — R7000 input not present in agent worktree)
Adversarial triage hint injection loop smoke (requires AIEDGE_FEEDBACK_DIR + real LLM)

Context

Follows Phase 2A (PRs #1-#9, merged at 484dbf4 + typecheck cleanup at 993dad2). Phase 2B positions SCOUT as a single-firmware analyst copilot per external reviewer guidance, with DAG parallelization as the performance lever and reasoning trail + MCP override loop as the UX lever.

🤖 Generated with Claude Code

…p_verification capture PR #11 of Phase 2B. Adds reasoning_trail.py helper + ReasoningEntry dataclass. adversarial_triage now records advocate/critic LLM responses (with 200-char excerpt redaction) and the synthesizing decision entry. fp_verification records sanitizer/non-propagating/sysfile pattern hits with per-pattern delta. SARIF export includes scout_reasoning_trail in properties bag. Additive field (PR #7a pattern): no schema bump, no downstream consumer changes. R7000 smoke test skipped; run-time validation deferred (full pipeline run not exercised in this PR; integration tests cover both stages end-to-end via FakeLLMDriver). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
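
A rough sketch of what a trail entry and the 200-character excerpt cap might look like. `ReasoningEntry` and the cap come from the commit message; the field names `step` and `rationale`, the `excerpt()` and `append_entry()` helpers, and the dict-based finding are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional

EXCERPT_CAP = 200  # excerpt cap stated in the PR


def excerpt(text, cap=EXCERPT_CAP):
    """Collapse whitespace and keep at most `cap` characters of a raw response."""
    flat = " ".join(text.split())
    return flat[:cap]


@dataclass
class ReasoningEntry:
    stage: str                      # e.g. "adversarial_triage" or "fp_verification"
    step: str                       # assumed labels: advocate / critic / decision / pattern_hit
    rationale: str
    delta: float = 0.0              # confidence change attributed to this step
    llm_model: Optional[str] = None
    raw_response_excerpt: Optional[str] = None


def append_entry(finding, entry):
    """Append the entry to finding['reasoning_trail'], creating the list on first use."""
    finding.setdefault("reasoning_trail", []).append(asdict(entry))
    return finding
```

Because the key is only created on first append, findings that never pass through a recording stage keep their previous shape, which is the additive behavior the commit describes.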

PR #14 of Phase 2B. When the extraction stage fails, attach a structured guidance message to the stage outcome with detected reason, 3-4 actionable suggestions (vendor_decrypt, --rootfs, binwalk variants, issue filing), and a pointer to docs/runbook.md. run.py prints the guidance to stderr unless --quiet is set. Additive — failure status semantics unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
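
One way the guidance payload could be shaped, as a sketch only. The runbook pointer, `--rootfs`, and the `--quiet` behavior are taken from the text above; the function names, dictionary layout, and suggestion wording are hypothetical.

```python
import sys

RUNBOOK_POINTER = "docs/runbook.md#extraction-failure"  # pointer named in the PR


def build_extraction_guidance(reason):
    """Return a structured guidance payload (hypothetical layout)."""
    return {
        "reason": reason,
        "suggestions": [
            "Check whether the image is vendor-encrypted (vendor_decrypt hint).",
            "Supply a pre-extracted filesystem with --rootfs.",
            "Retry extraction with a different binwalk variant.",
            "File an issue with the firmware model and version.",
        ],
        "runbook": RUNBOOK_POINTER,
    }


def print_guidance(guidance, quiet=False, stream=None):
    """Write guidance to stderr unless quiet is set; return True if printed."""
    if quiet:
        return False
    out = stream if stream is not None else sys.stderr
    print(f"extraction failed: {guidance['reason']}", file=out)
    for n, tip in enumerate(guidance["suggestions"], 1):
        print(f"  {n}. {tip}", file=out)
    print(f"  see {guidance['runbook']}", file=out)
    return True
```

Keeping the payload separate from the printing step lets the stage outcome carry the guidance even when nothing is written to the terminal.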

…verride, category filter PR #12 of Phase 2B. Adds scout_get_finding_reasoning, scout_inject_hint, scout_override_verdict, scout_filter_by_category to mcp_server.py. Extends terminator_feedback.py with add_analyst_hint / get_analyst_hints / set_verdict_override helpers (fcntl.flock-safe, path_safety enforced). adversarial_triage advocate prompt now reads analyst_hints from feedback registry and prefixes them when present, closing the loop from analyst input back into next-run LLM judgment. Reasoning trail (PR #11) and category (PR #7a) both queryable via MCP.
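
The hint channel described above can be sketched with an exclusive `fcntl.flock` around a read-modify-write of a JSON file. The helper names match the commit message, but their signatures, the single-file registry layout, and `with_hints()` are assumptions; `fcntl` is POSIX-only.

```python
import fcntl  # POSIX only
import json
import os


def add_analyst_hint(registry_path, finding_id, hint):
    """Append a hint for finding_id under an exclusive lock."""
    fd = os.open(registry_path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until no other holder
        try:
            raw = fh.read()
            data = json.loads(raw) if raw.strip() else {}
            data.setdefault(finding_id, []).append(hint)
            fh.seek(0)
            fh.truncate()
            json.dump(data, fh)
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def get_analyst_hints(registry_path, finding_id):
    """Read hints for finding_id under a shared lock; empty list if none."""
    if not os.path.exists(registry_path):
        return []
    with open(registry_path, "r", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_SH)
        try:
            raw = fh.read()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
    data = json.loads(raw) if raw.strip() else {}
    return list(data.get(finding_id, []))


def with_hints(base_prompt, hints):
    """Prefix analyst hints to a prompt; return it unchanged when there are none."""
    if not hints:
        return base_prompt
    bullet_list = "\n".join(f"- {h}" for h in hints)
    return f"Analyst hints:\n{bullet_list}\n\n{base_prompt}"
```

The lock is advisory, so it only protects against other writers that also call `flock` on the same file; a production helper would additionally route the path through the project's path-safety check.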

PR #10 of Phase 2B. Adds stage_dag.py with manual STAGE_DEPS + Kahn topological levels, run_stages_parallel() using ThreadPoolExecutor, out-of-order ProgressTracker mode, and --experimental-parallel CLI flag. Sequential run_stages() unchanged. Skipped-on-failed-dep semantics + optional fail_fast. R7000 smoke test: skipped, aiedge-inputs/netgear/R7000-V1.0.11.136_10.2.120.chk not present in this worktree. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
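
To make the scheduling idea concrete, here is a minimal sketch of Kahn-style level grouping with a `ThreadPoolExecutor` and skipped-on-failed-dependency semantics. It is not the code in `stage_dag.py`: the stage names, `run_levels_parallel()`, and the status strings are hypothetical, and the sketch assumes every stage appears as a key in the dependency map.

```python
from concurrent.futures import ThreadPoolExecutor


def topo_levels(deps):
    """Group stages into levels; deps maps stage -> set of prerequisite stages."""
    indegree = {stage: len(prereqs) for stage, prereqs in deps.items()}
    dependents = {stage: [] for stage in deps}
    for stage, prereqs in deps.items():
        for p in prereqs:
            dependents[p].append(stage)
    level = sorted(s for s, n in indegree.items() if n == 0)
    levels, seen = [], 0
    while level:
        levels.append(level)
        seen += len(level)
        ready = []
        for s in level:
            for d in dependents[s]:
                indegree[d] -= 1
                if indegree[d] == 0:
                    ready.append(d)
        level = sorted(ready)
    if seen != len(deps):
        raise ValueError("cycle in stage dependencies")
    return levels


def run_levels_parallel(deps, stage_fns, max_workers=4):
    """Run each level concurrently; skip stages whose prerequisites did not succeed."""
    status = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in topo_levels(deps):
            futures = {}
            for stage in level:
                if any(status[p] != "ok" for p in deps[stage]):
                    status[stage] = "skipped"
                else:
                    futures[stage] = pool.submit(stage_fns[stage])
            for stage, fut in futures.items():
                # exception() waits for completion and returns None on success.
                status[stage] = "ok" if fut.exception() is None else "failed"
    return status
```

Level-by-level execution is simpler than a fully dynamic ready queue, at the cost of waiting for the slowest stage in each level before starting the next.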

PR #15 of Phase 2B. Adds scoring.py with PriorityInputs dataclass + compute_priority_score() weighting (detection 50%, EPSS 25%, reach 15%, CVSS 10%, backport -0.20 penalty). cve_scan.py:1140-1170 refactored: detection_confidence stays strictly at static-evidence cap (STATIC_CODE_VERIFIED_CAP = 0.55), EPSS / reachability / backport / CVSS now feed priority_score instead. findings.py adds priority_score + priority_inputs as additive optional fields (PR #7a pattern). quality_metrics adds per-priority bucket aggregation. New doc: docs/scoring_calibration.md explains the split. Addresses external reviewer critique that EPSS-additive confidence looked like a ranking heuristic, not a detection probability.
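
The split between detection confidence and priority can be illustrated with the weights quoted above. This is a sketch under assumptions, not the contents of `scoring.py`: the field names, the normalization of every input to the range 0 to 1 (CVSS divided by 10), the final clamp, and `cap_detection_confidence()` are all hypothetical.

```python
from dataclasses import dataclass

STATIC_CODE_VERIFIED_CAP = 0.55  # value stated in the PR


def _clamp01(x):
    return max(0.0, min(1.0, x))


def cap_detection_confidence(raw_confidence):
    """Keep detection confidence at or below the static-evidence cap."""
    return min(_clamp01(raw_confidence), STATIC_CODE_VERIFIED_CAP)


@dataclass(frozen=True)
class PriorityInputs:
    detection_confidence: float     # 0..1, already capped
    epss: float                     # 0..1 exploit probability
    reachability: float             # 0..1 (assumed encoding)
    cvss: float                     # 0..10 base score
    backport_present: bool = False


def compute_priority_score(inputs):
    """Weighted sum: detection 50%, EPSS 25%, reach 15%, CVSS 10%, backport -0.20."""
    score = (
        0.50 * _clamp01(inputs.detection_confidence)
        + 0.25 * _clamp01(inputs.epss)
        + 0.15 * _clamp01(inputs.reachability)
        + 0.10 * _clamp01(inputs.cvss / 10.0)
    )
    if inputs.backport_present:
        score -= 0.20
    return round(_clamp01(score), 4)
```

Under these assumptions, a finding at the detection cap with EPSS 0.9, full reachability, and CVSS 9.8 scores 0.748, and the same finding with a backport present drops to 0.548, while its detection confidence stays at 0.55 in both cases.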

… and TUI PR #13 of Phase 2B. Web viewer gains a collapsible "Reasoning Trail" section in finding detail panels. report_assembler analyst markdown gets a numbered Reasoning Trail subsection per finding. cli_tui_render appends a Reasoning Trail block to the finding detail view (ASCII-mode compatible). All three surfaces hide the section when the trail is absent. Closes the analyst-visibility loop opened by PR #11.

Surfaces:
- reasoning_trail.py grows format_trail_for_markdown, format_trail_for_tui, normalize_trail helpers (single source of truth for all rendering).
- report_assembler.py exposes build_finding_reasoning_trail_md and reasoning_trail_for_analyst_json as the analyst-facing entry points.
- reporting.py: _normalize_v2_claim_from_finding now passes through reasoning_trail (additive, no schema bump). write_analyst_report_v2_md emits the numbered subsection. write_analyst_report_v2_viewer adds collapsible <details> JS rendering with stage badge, llm_model, delta with sign+color, plain-Date timestamp formatting, and a raw_response_excerpt sub-details panel.
- cli_tui_data.py loads findings.json and surfaces findings_with_trails in the snapshot dict.
- cli_tui_render.py adds render_finding_detail_with_trail (testable in isolation) and a "Findings with Reasoning Trail" snapshot section that respects AIEDGE_TUI_ASCII via use_unicode plumbing.

Invariants:
- Additive only: surfaces hide the section when trail is empty/absent.
- No analyst report v2 schema version bump.
- AIEDGE_TUI_ASCII produces zero non-ASCII glyphs in TUI output.
- pyright src: 0 errors. ruff src tests: clean.
- pytest: 928 passed, 2 skipped (44 new tests for PR #13).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
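
The hide-when-absent and ASCII-mode behavior can be sketched as two small formatters. The function names and the `use_unicode` parameter follow the commit message; the entry keys (`stage`, `step`, `rationale`, `delta`), the heading text, and the exact line layout are assumptions.

```python
def _signed_delta(entry):
    """Format the delta with an explicit sign, or return '' when it is missing."""
    delta = entry.get("delta")
    return f"{delta:+.2f}" if isinstance(delta, (int, float)) else ""


def format_trail_for_markdown(trail):
    """Numbered markdown subsection; empty string when there is no trail."""
    if not trail:
        return ""
    lines = ["Reasoning Trail", ""]
    for n, entry in enumerate(trail, 1):
        delta = _signed_delta(entry)
        suffix = f" ({delta})" if delta else ""
        lines.append(
            f"{n}. [{entry.get('stage', '?')}] {entry.get('step', '?')}{suffix}: "
            f"{entry.get('rationale', '')}"
        )
    return "\n".join(lines) + "\n"


def format_trail_for_tui(trail, use_unicode=True):
    """One line per entry; falls back to an ASCII bullet when use_unicode is False."""
    if not trail:
        return ""
    bullet = "\u2022" if use_unicode else "*"
    rows = []
    for entry in trail:
        row = f"{bullet} {entry.get('stage', '?')}/{entry.get('step', '?')} {_signed_delta(entry)}"
        rows.append(row.rstrip())
    return "\n".join(rows)
```

Returning an empty string for a missing trail lets each caller concatenate the result unconditionally, so no surface needs its own presence check.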

…libration Bumps version to 2.6.0. CHANGELOG.md v2.6.0 section summarizes the 6 atomic commits merged via PR #6 (Phase 2B integration). docs/status.md v2.6.0 section covers all three axes: DAG parallelization PoC (PR #10), reasoning trail + MCP analyst tools (PRs #11 #12 #13), extraction guidance (PR #14), detection vs priority calibration (PR #15). README badges bumped 2.5.0 → 2.6.0 (benchmark carry-over disclaimer unchanged — fresh corpus re-validation still pending). Verified: pytest 865 → 1027 (+162), pyright 0 errors, ruff clean, CI 5/5 green. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

R00T-Kim and others added 6 commits April 13, 2026 17:59

R00T-Kim merged commit aa6e554 into main Apr 13, 2026
5 checks passed

R00T-Kim deleted the phase2b-integration branch April 13, 2026 10:02

R00T-Kim merged 6 commits into main from phase2b-integration
