Phase 2B: DAG parallelization + analyst copilot UX + confidence calibration #6

Merged
R00T-Kim merged 6 commits into main from phase2b-integration on Apr 13, 2026
@R00T-Kim
Owner

Summary

Phase 2B integration — 6 commits addressing external reviewer critiques on performance, analyst UX, and confidence calibration. Follows Phase 2A (trust baseline + typecheck cleanup). All commits atomic; any single commit could ship independently.

| Commit | PR | Focus |
|---|---|---|
| 1a6d0e0 | #11 | reasoning_trail field: adversarial_triage/fp_verification capture advocate/critic/decision/pattern-hit entries (200-char excerpt cap, additive PR #7a pattern) |
| 8d14e2d | #14 | extraction failure analyst guidance: vendor_decrypt hint, --rootfs option, binwalk variants, docs/runbook.md#extraction-failure pointer |
| 990957d | #12 | 4 MCP analyst tools: scout_get_finding_reasoning, scout_inject_hint, scout_override_verdict, scout_filter_by_category; terminator_feedback extended with fcntl.flock-safe hint channel; adversarial_triage advocate prompt reads analyst_hints |
| 23ac24f | #10 | DAG parallelization PoC: stage_dag.py STAGE_DEPS + Kahn topo_levels + run_stages_parallel() ThreadPoolExecutor, --experimental-parallel [N] flag, out-of-order ProgressTracker mode. Sequential run_stages() unchanged. |
| 9103b12 | #15 | detection vs priority calibration: new scoring.py with PriorityInputs + compute_priority_score() (detection 50% / EPSS 25% / reach 15% / CVSS 10% / backport -0.20). cve_scan.py:1140-1170 refactored: detection_confidence strictly capped at STATIC_CODE_VERIFIED_CAP=0.55, EPSS/reachability/backport feed priority_score instead. Closes reviewer critique that EPSS-additive confidence looked like a ranking heuristic. |
| 3127de9 | #13 | reasoning_trail viewer: collapsible section in embedded web viewer template, numbered subsection in analyst markdown, render_finding_detail_with_trail() in TUI (AIEDGE_TUI_ASCII-compatible). All surfaces hide when trail absent. |

Verification

  • pytest: 1027 passed, 1 skipped (865 → 1027, +162 new tests)
  • ruff: All checks passed
  • pyright: 0 errors, 0 warnings (Phase 2A baseline preserved)
  • R7000 smoke (PR #15): 3 findings, all carry priority_score + priority_inputs; cve_confidence_above_0.55_cap = 0 (detection cap correctly enforced)

Design Invariants Preserved

  • Additive only on findings.py (PR #7a pattern for category, reasoning_trail, priority_score, priority_inputs). No report schema version bump.
  • All file writes continue to route through assert_under_dir() (path_safety.py).
  • Sequential run_stages() behavior bit-identical to pre-PR state.
  • Existing LLM driver contracts untouched.
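
For readers unfamiliar with the path-safety invariant above, the following is a minimal sketch of what a guard like `assert_under_dir()` typically does. The signature, the `PathSafetyError` name, and the resolution strategy are assumptions for illustration; the real helper in path_safety.py may differ.

```python
from pathlib import Path


class PathSafetyError(ValueError):
    """Hypothetical error type: raised when a path escapes the run directory."""


def assert_under_dir(base_dir: Path, target: Path) -> Path:
    """Return the resolved target, or raise if it lies outside base_dir.

    Assumed signature; not taken from the SCOUT source.
    """
    base = base_dir.resolve()
    # Joining an absolute target discards base, so absolute escapes are
    # caught by the same check as "../" traversal.
    resolved = (base / target).resolve()
    # is_relative_to (Python 3.9+) compares path components, so a sibling
    # such as "/run_evil" is not mistaken for a child of "/run".
    if not resolved.is_relative_to(base):
        raise PathSafetyError(f"{resolved} escapes {base}")
    return resolved
```

Resolving both sides before comparing is the design point: a plain string-prefix check would accept symlinked or `..`-laden paths that end up outside the run directory.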

Test plan

  • Full test suite green on merged state
  • New test files (test_reasoning_trail.py, test_extraction_guidance.py, test_mcp_analyst_tools.py, test_stage_dag.py, test_run_stages_parallel.py, test_scoring.py, test_reasoning_trail_viewer.py) — 162 new tests, all pass
  • Merge-conflict resolution verified on findings.py, sarif_export.py, __main__.py, run.py (additive field coexistence + CLI flag composition)
  • Wall-clock comparison sequential vs --experimental-parallel 4 on R7000 (deferred — R7000 input not present in agent worktree)
  • Adversarial triage hint injection loop smoke (requires AIEDGE_FEEDBACK_DIR + real LLM)
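
The hint-injection loop in the last item can be sketched as follows. This is an illustration under assumptions, not the terminator_feedback.py implementation: the JSON registry layout, the function signatures, and `build_advocate_prompt` are hypothetical, and `fcntl.flock` is POSIX-only.

```python
import fcntl
import json
import os
from pathlib import Path


def add_analyst_hint(feedback_file: Path, finding_id: str, hint: str) -> None:
    """Append one hint for a finding while holding an exclusive lock."""
    feedback_file.parent.mkdir(parents=True, exist_ok=True)
    # O_CREAT without O_TRUNC: create if missing, never clobber existing hints.
    fd = os.open(feedback_file, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until the lock is held
        try:
            raw = fh.read()
            registry = json.loads(raw) if raw.strip() else {}
            registry.setdefault(finding_id, []).append(hint)
            fh.seek(0)
            fh.truncate()
            json.dump(registry, fh)
            fh.flush()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def get_analyst_hints(feedback_file: Path, finding_id: str) -> list[str]:
    """Read hints for a finding under a shared lock."""
    if not feedback_file.exists():
        return []
    with open(feedback_file, "r", encoding="utf-8") as fh:
        fcntl.flock(fh, fcntl.LOCK_SH)
        try:
            raw = fh.read()
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)
    registry = json.loads(raw) if raw.strip() else {}
    return list(registry.get(finding_id, []))


def build_advocate_prompt(base_prompt: str, hints: list[str]) -> str:
    """Prefix analyst hints to the prompt; leave it unchanged when none exist."""
    if not hints:
        return base_prompt
    bullet_list = "\n".join(f"- {h}" for h in hints)
    return f"Analyst hints:\n{bullet_list}\n\n{base_prompt}"
```

Holding the lock across the whole read-modify-write cycle is what keeps two concurrent writers from dropping each other's hints.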

Context

Follows Phase 2A (PRs #1-#9, merged at 484dbf4 + typecheck cleanup at 993dad2). Phase 2B positions SCOUT as a single-firmware analyst copilot per external reviewer guidance, with DAG parallelization as the performance lever and reasoning trail + MCP override loop as the UX lever.

🤖 Generated with Claude Code

R00T-Kim and others added 6 commits April 13, 2026 17:59
…p_verification capture

PR #11 of Phase 2B. Adds reasoning_trail.py helper + ReasoningEntry
dataclass. adversarial_triage now records advocate/critic LLM
responses (with 200-char excerpt redaction) and the synthesizing
decision entry. fp_verification records sanitizer/non-propagating/
sysfile pattern hits with per-pattern delta. SARIF export includes
scout_reasoning_trail in properties bag. Additive field — PR #7a
pattern, no schema bump, no downstream consumer changes.
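
A minimal sketch of how a trail entry with the 200-character excerpt cap might be modeled. Only the `ReasoningEntry` name and the cap come from the commit message; the field set, `cap_excerpt`, and `record_llm_step` are assumptions for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Optional

EXCERPT_CAP = 200  # the 200-char cap named in the commit message


def cap_excerpt(text: str, cap: int = EXCERPT_CAP) -> str:
    """Truncate raw LLM output so at most `cap` characters are stored."""
    return text if len(text) <= cap else text[: cap - 3] + "..."


@dataclass(frozen=True)
class ReasoningEntry:
    # Field names are assumed, not taken from reasoning_trail.py.
    stage: str                     # e.g. "adversarial_triage"
    step: str                      # "advocate", "critic", "decision", "pattern_hit"
    rationale: str
    delta: float = 0.0             # confidence change attributed to this step
    llm_model: Optional[str] = None
    raw_response_excerpt: Optional[str] = None


def record_llm_step(trail: list, stage: str, step: str, rationale: str,
                    raw_response: str, llm_model: str) -> None:
    """Append a plain-dict entry so the trail stays JSON-serializable."""
    entry = ReasoningEntry(
        stage=stage,
        step=step,
        rationale=rationale,
        llm_model=llm_model,
        raw_response_excerpt=cap_excerpt(raw_response),
    )
    trail.append(asdict(entry))
```

Storing plain dicts rather than dataclass instances is one way to keep the field additive: consumers that do not know about the trail can ignore the key.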

R7000 smoke test skipped — run-time validation deferred (full pipeline
run not exercised in this PR; integration tests cover both stages
end-to-end via FakeLLMDriver).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PR #14 of Phase 2B. When the extraction stage fails, attach a
structured guidance message to the stage outcome with detected
reason, 3-4 actionable suggestions (vendor_decrypt, --rootfs,
binwalk variants, issue filing), and a pointer to docs/runbook.md.
run.py prints the guidance to stderr unless --quiet is set.
Additive — failure status semantics unchanged.
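
The shape of such a guidance message could look like the sketch below. The dataclass, function names, and suggestion wording are hypothetical; only the four suggestion topics, the runbook pointer, and the stderr/--quiet behavior are taken from the commit message.

```python
import sys
from dataclasses import dataclass, field
from typing import Optional, TextIO


@dataclass
class ExtractionGuidance:
    # Hypothetical structure; the real stage outcome payload may differ.
    reason: str
    suggestions: list = field(default_factory=list)
    runbook: str = "docs/runbook.md#extraction-failure"


def build_extraction_guidance(reason: str) -> ExtractionGuidance:
    """Attach the actionable suggestions to a detected failure reason."""
    return ExtractionGuidance(
        reason=reason,
        suggestions=[
            "Check whether the image needs vendor_decrypt first.",
            "Pass a pre-extracted root filesystem with --rootfs.",
            "Retry extraction with a different binwalk variant.",
            "File an issue if none of the above helps.",
        ],
    )


def print_guidance(guidance: ExtractionGuidance, quiet: bool = False,
                   stream: Optional[TextIO] = None) -> None:
    """Write guidance to stderr (or `stream`) unless quiet is set."""
    if quiet:
        return
    out = stream if stream is not None else sys.stderr
    print(f"Extraction failed: {guidance.reason}", file=out)
    for number, suggestion in enumerate(guidance.suggestions, start=1):
        print(f"  {number}. {suggestion}", file=out)
    print(f"  See {guidance.runbook}", file=out)
```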

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…verride, category filter

PR #12 of Phase 2B. Adds scout_get_finding_reasoning,
scout_inject_hint, scout_override_verdict, scout_filter_by_category
to mcp_server.py. Extends terminator_feedback.py with
add_analyst_hint / get_analyst_hints / set_verdict_override
helpers (fcntl.flock-safe, path_safety enforced). adversarial_triage
advocate prompt now reads analyst_hints from feedback registry and
prefixes them when present, closing the loop from analyst input
back into next-run LLM judgment. Reasoning trail (PR #11) and
category (PR #7a) both queryable via MCP.

PR #10 of Phase 2B. Adds stage_dag.py with manual STAGE_DEPS + Kahn
topological levels, run_stages_parallel() using ThreadPoolExecutor,
out-of-order ProgressTracker mode, and --experimental-parallel CLI
flag. Sequential run_stages() unchanged. Skipped-on-failed-dep
semantics + optional fail_fast.
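
The approach described above (Kahn levels, one thread pool, skip on failed dependency) can be sketched as follows. The stage names and edges in `STAGE_DEPS` are hypothetical, the status strings are assumed, and the optional fail_fast mode is omitted; this is not the stage_dag.py code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical dependency map: stage -> set of stages it depends on.
STAGE_DEPS = {
    "extraction": set(),
    "inventory": {"extraction"},
    "cve_scan": {"inventory"},
    "taint_scan": {"inventory"},
    "triage": {"cve_scan", "taint_scan"},
}


def topo_levels(deps):
    """Kahn's algorithm, grouped into levels of mutually independent stages."""
    indegree = {stage: len(parents) for stage, parents in deps.items()}
    dependents = {stage: [] for stage in deps}
    for stage, parents in deps.items():
        for parent in parents:
            dependents[parent].append(stage)
    level = sorted(s for s, n in indegree.items() if n == 0)
    levels = []
    while level:
        levels.append(level)
        next_level = []
        for stage in level:
            for child in dependents[stage]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_level.append(child)
        level = sorted(next_level)
    if sum(len(lvl) for lvl in levels) != len(deps):
        raise ValueError("cycle detected in stage dependencies")
    return levels


def run_stages_parallel(deps, runners, max_workers=4):
    """Run each level concurrently; skip stages whose dependencies did not succeed."""
    status = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in topo_levels(deps):
            futures = {}
            for stage in level:
                if any(status[parent] != "ok" for parent in deps[stage]):
                    status[stage] = "skipped"
                else:
                    futures[stage] = pool.submit(runners[stage])
            for stage, future in futures.items():
                try:
                    future.result()  # re-raises the stage's exception, if any
                    status[stage] = "ok"
                except Exception:
                    status[stage] = "failed"
    return status
```

Waiting for a whole level before starting the next is simpler than per-stage scheduling but can leave workers idle when one stage in a level is slow, which is a reasonable trade-off for a proof of concept.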

R7000 smoke test: skipped, aiedge-inputs/netgear/R7000-V1.0.11.136_10.2.120.chk
not present in this worktree.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PR #15 of Phase 2B. Adds scoring.py with PriorityInputs dataclass +
compute_priority_score() weighting (detection 50%, EPSS 25%, reach
15%, CVSS 10%, backport -0.20 penalty). cve_scan.py:1140-1170
refactored: detection_confidence stays strictly at static-evidence
cap (STATIC_CODE_VERIFIED_CAP = 0.55), EPSS / reachability /
backport / CVSS now feed priority_score instead. findings.py adds
priority_score + priority_inputs as additive optional fields (PR
quality_metrics adds per-priority bucket aggregation. New doc:
docs/scoring_calibration.md explains the split.
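
One way the stated weights could combine is sketched below. The weights, the 0.55 cap, and the 0.20 backport penalty come from the commit message; the field names, the 0..1 reachability scale, dividing CVSS by 10, and clamping the result to [0, 1] are assumptions, so the real scoring.py may differ.

```python
from dataclasses import dataclass

STATIC_CODE_VERIFIED_CAP = 0.55
BACKPORT_PENALTY = 0.20


@dataclass(frozen=True)
class PriorityInputs:
    # Field names and value ranges are assumed for illustration.
    detection_confidence: float     # 0..1, expected to be <= the static cap
    epss: float                     # 0..1 exploit-prediction probability
    reachability: float             # 0..1 (assumed scale)
    cvss: float                     # 0..10 base score
    backport_present: bool = False


def compute_priority_score(inputs: PriorityInputs) -> float:
    """Weighted blend: detection 50%, EPSS 25%, reach 15%, CVSS 10%."""
    # Re-applying the cap here is defensive; the commit caps it upstream.
    detection = min(inputs.detection_confidence, STATIC_CODE_VERIFIED_CAP)
    score = (
        0.50 * detection
        + 0.25 * inputs.epss
        + 0.15 * inputs.reachability
        + 0.10 * (inputs.cvss / 10.0)
    )
    if inputs.backport_present:
        score -= BACKPORT_PENALTY
    return max(0.0, min(1.0, score))
```

For example, detection 0.55, EPSS 0.8, reachability 1.0, and CVSS 9.8 give 0.275 + 0.200 + 0.150 + 0.098 = 0.723 under these assumptions, while detection_confidence itself never moves above 0.55.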

Addresses external reviewer critique that EPSS-additive confidence
looked like a ranking heuristic, not a detection probability.

… and TUI

PR #13 of Phase 2B. Web viewer gains a collapsible "Reasoning Trail"
section in finding detail panels. report_assembler analyst markdown
gets a numbered Reasoning Trail subsection per finding. cli_tui_render
appends a Reasoning Trail block to the finding detail view (ASCII-mode
compatible). All three surfaces hide the section when the trail is
absent. Closes the analyst-visibility loop opened by PR #11.

Surfaces:
- reasoning_trail.py grows format_trail_for_markdown,
  format_trail_for_tui, normalize_trail helpers (single source of
  truth for all rendering).
- report_assembler.py exposes build_finding_reasoning_trail_md and
  reasoning_trail_for_analyst_json as the analyst-facing entry points.
- reporting.py: _normalize_v2_claim_from_finding now passes through
  reasoning_trail (additive, no schema bump). write_analyst_report_v2_md
  emits the numbered subsection. write_analyst_report_v2_viewer adds
  collapsible <details> JS rendering with stage badge, llm_model,
  delta with sign+color, plain-Date timestamp formatting, and a
  raw_response_excerpt sub-details panel.
- cli_tui_data.py loads findings.json and surfaces findings_with_trails
  in the snapshot dict.
- cli_tui_render.py adds render_finding_detail_with_trail (testable
  in isolation) and a "Findings with Reasoning Trail" snapshot section
  that respects AIEDGE_TUI_ASCII via use_unicode plumbing.
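
To illustrate the "single source of truth" idea in the first bullet, here is a hedged sketch of a normalizer plus a markdown formatter that returns an empty string when no trail exists, so callers can hide the section. The helper names match the commit message, but their signatures, the entry keys, and the output layout are assumptions.

```python
def normalize_trail(raw) -> list:
    """Return only well-formed entries; anything else yields an empty trail."""
    if not isinstance(raw, list):
        return []
    return [e for e in raw if isinstance(e, dict) and e.get("rationale")]


def format_trail_for_markdown(raw) -> str:
    """Render a numbered subsection, or "" when the trail is empty or absent."""
    trail = normalize_trail(raw)
    if not trail:
        return ""
    lines = ["#### Reasoning Trail", ""]
    for number, entry in enumerate(trail, start=1):
        stage = entry.get("stage", "unknown")
        step = entry.get("step", "step")
        delta = float(entry.get("delta") or 0.0)
        # "+.2f" always prints the sign, mirroring the signed delta display.
        lines.append(
            f"{number}. [{stage}] {step}: {entry['rationale']} (delta {delta:+.2f})"
        )
    return "\n".join(lines)
```

Returning an empty string rather than a placeholder keeps the additive invariant: reports for findings without a trail render exactly as before.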

Invariants:
- Additive only: surfaces hide the section when trail is empty/absent.
- No analyst report v2 schema version bump.
- AIEDGE_TUI_ASCII produces zero non-ASCII glyphs in TUI output.
- pyright src: 0 errors. ruff src tests: clean.
- pytest: 928 passed, 2 skipped (44 new tests for PR #13).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@R00T-Kim R00T-Kim merged commit aa6e554 into main Apr 13, 2026
5 checks passed
@R00T-Kim R00T-Kim deleted the phase2b-integration branch April 13, 2026 10:02
R00T-Kim added a commit that referenced this pull request Apr 13, 2026
…libration

Bumps version to 2.6.0. CHANGELOG.md v2.6.0 section summarizes the 6
atomic commits merged via PR #6 (Phase 2B integration). docs/status.md
v2.6.0 section covers all three axes: DAG parallelization PoC (PR #10),
reasoning trail + MCP analyst tools (PRs #11 #12 #13), extraction
guidance (PR #14), detection vs priority calibration (PR #15). README
badges bumped 2.5.0 → 2.6.0 (benchmark carry-over disclaimer unchanged
— fresh corpus re-validation still pending).

Verified: pytest 865 → 1027 (+162), pyright 0 errors, ruff clean,
CI 5/5 green.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>