Phase 2B: DAG parallelization + analyst copilot UX + confidence calibration #6
Merged
…p_verification capture

PR #11 of Phase 2B. Adds reasoning_trail.py helper + ReasoningEntry dataclass. adversarial_triage now records advocate/critic LLM responses (with 200-char excerpt redaction) and the synthesizing decision entry. fp_verification records sanitizer/non-propagating/sysfile pattern hits with per-pattern delta. SARIF export includes scout_reasoning_trail in properties bag. Additive field (PR #7a pattern): no schema bump, no downstream consumer changes.

R7000 smoke test skipped; run-time validation deferred (full pipeline run not exercised in this PR; integration tests cover both stages end-to-end via FakeLLMDriver).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
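As an illustration of the additive reasoning-trail idea described in PR #11, here is a minimal sketch. Only the `ReasoningEntry` name, the 200-character excerpt limit, and the `reasoning_trail` field come from the commit message. The dataclass fields, `make_excerpt`, and `append_entry` are assumptions made for this example, and plain truncation stands in for whatever redaction the real helper performs.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass

EXCERPT_LIMIT = 200  # the commit message mentions a 200-char excerpt


@dataclass(frozen=True)
class ReasoningEntry:
    # Field names are assumptions, not the project's actual schema.
    stage: str
    step: str
    verdict: str
    delta: float
    raw_response_excerpt: str = ""
    llm_model: str | None = None
    timestamp: str = ""


def make_excerpt(text: str, limit: int = EXCERPT_LIMIT) -> str:
    """Collapse whitespace and cap the text at `limit` characters."""
    flat = " ".join(text.split())
    return flat if len(flat) <= limit else flat[: limit - 3] + "..."


def append_entry(finding: dict, entry: ReasoningEntry) -> None:
    """Append to an optional `reasoning_trail` list on the finding.

    Findings without the key are left untouched elsewhere, which is what
    keeps the field additive (no schema bump needed).
    """
    finding.setdefault("reasoning_trail", []).append(asdict(entry))


finding = {"id": "F-1"}  # hypothetical finding record
append_entry(
    finding,
    ReasoningEntry(
        stage="adversarial_triage",
        step="advocate",
        verdict="maintain",
        delta=0.0,
        raw_response_excerpt=make_excerpt("x" * 500),
    ),
)
```

Because the trail lives under a single optional key, a consumer that does not know about it can ignore it without any change.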
PR #14 of Phase 2B. When the extraction stage fails, attach a structured guidance message to the stage outcome with detected reason, 3-4 actionable suggestions (vendor_decrypt, --rootfs, binwalk variants, issue filing), and a pointer to docs/runbook.md. run.py prints the guidance to stderr unless --quiet is set. Additive — failure status semantics unchanged. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
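The guidance flow in PR #14 could look roughly like the sketch below. `ExtractionGuidance`, `build_guidance`, and `print_guidance` are hypothetical names, and the suggestion wording is illustrative. Only the suggestion topics, the `docs/runbook.md` pointer, the stderr target, and the quiet behaviour are taken from the commit message.

```python
from __future__ import annotations

import sys
from dataclasses import dataclass, field
from typing import TextIO


@dataclass
class ExtractionGuidance:
    # Hypothetical container; the real stage outcome shape is not shown here.
    reason: str
    suggestions: list[str] = field(default_factory=list)
    runbook: str = "docs/runbook.md"


def build_guidance(reason: str) -> ExtractionGuidance:
    """Pair a detected failure reason with the topics named in the commit."""
    return ExtractionGuidance(
        reason=reason,
        suggestions=[
            "try vendor_decrypt",  # wording is illustrative only
            "retry with --rootfs",
            "try a binwalk variant",
            "file an issue describing the failing image",
        ],
    )


def print_guidance(
    guidance: ExtractionGuidance,
    quiet: bool = False,
    stream: TextIO | None = None,
) -> None:
    """Write guidance to stderr unless quiet; the failure status is not touched."""
    if quiet:
        return
    out = sys.stderr if stream is None else stream
    print(f"extraction failed: {guidance.reason}", file=out)
    for index, suggestion in enumerate(guidance.suggestions, 1):
        print(f"  {index}. {suggestion}", file=out)
    print(f"  see {guidance.runbook}", file=out)
```

Keeping the guidance in its own object, separate from the status, is one way to make the change additive: callers that only check the status see no difference.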
…verride, category filter

PR #12 of Phase 2B. Adds scout_get_finding_reasoning, scout_inject_hint, scout_override_verdict, scout_filter_by_category to mcp_server.py. Extends terminator_feedback.py with add_analyst_hint / get_analyst_hints / set_verdict_override helpers (fcntl.flock-safe, path_safety enforced). adversarial_triage advocate prompt now reads analyst_hints from feedback registry and prefixes them when present, closing the loop from analyst input back into next-run LLM judgment. Reasoning trail (PR #11) and category (PR #7a) both queryable via MCP.
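The hint loop in PR #12 can be sketched as a JSON registry updated under an advisory `fcntl.flock` lock. The function names `add_analyst_hint` and `get_analyst_hints` mirror the commit message, but their signatures, the registry layout, and `build_advocate_prompt` are assumptions for this example. The path-safety check is omitted, and `fcntl` is only available on Unix-like systems.

```python
from __future__ import annotations

import fcntl
import json
import os
from pathlib import Path


def add_analyst_hint(registry: Path, finding_id: str, hint: str) -> None:
    """Read-modify-write the registry under an exclusive advisory lock."""
    registry.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(registry, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+", encoding="utf-8") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)
        try:
            raw = fh.read()
            data = json.loads(raw) if raw.strip() else {}
            data.setdefault("hints", {}).setdefault(finding_id, []).append(hint)
            fh.seek(0)
            fh.truncate()
            json.dump(data, fh, indent=2)
            fh.flush()
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


def get_analyst_hints(registry: Path, finding_id: str) -> list[str]:
    """Read hints under a shared lock; a missing registry means no hints."""
    if not registry.exists():
        return []
    with registry.open("r", encoding="utf-8") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_SH)
        try:
            raw = fh.read()
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)
    data = json.loads(raw) if raw.strip() else {}
    return list(data.get("hints", {}).get(finding_id, []))


def build_advocate_prompt(base_prompt: str, hints: list[str]) -> str:
    """Prefix analyst hints when present; otherwise return the prompt unchanged."""
    if not hints:
        return base_prompt
    lines = "\n".join(f"- {h}" for h in hints)
    return f"Analyst hints:\n{lines}\n\n{base_prompt}"
```

Returning the prompt unchanged when no hints exist means a run with no analyst input sends the same prompt as before.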
PR #10 of Phase 2B. Adds stage_dag.py with manual STAGE_DEPS + Kahn topological levels, run_stages_parallel() using ThreadPoolExecutor, out-of-order ProgressTracker mode, and --experimental-parallel CLI flag. Sequential run_stages() unchanged. Skipped-on-failed-dep semantics + optional fail_fast. R7000 smoke test: skipped, aiedge-inputs/netgear/R7000-V1.0.11.136_10.2.120.chk not present in this worktree. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
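The scheduling approach in PR #10 (dependency levels via Kahn's algorithm, each level run on a thread pool, dependents of a failed stage skipped) can be sketched as follows. `kahn_levels` and `run_levels_parallel` are hypothetical names chosen to avoid implying the real `run_stages_parallel()` signature. The stage names in the usage example and the `"ok"`/`"failed"`/`"skipped"` status strings are assumptions as well.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def kahn_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group stages into levels; each level depends only on earlier levels."""
    remaining = {stage: set(d) for stage, d in deps.items()}
    levels: list[list[str]] = []
    while remaining:
        ready = sorted(s for s, d in remaining.items() if not d)
        if not ready:
            raise ValueError(f"cycle or unknown dependency among {sorted(remaining)}")
        levels.append(ready)
        for stage in ready:
            del remaining[stage]
        for pending in remaining.values():
            pending.difference_update(ready)
    return levels


def run_levels_parallel(
    stages: dict[str, Callable[[], None]],
    deps: dict[str, set[str]],
    max_workers: int = 4,
    fail_fast: bool = False,
) -> dict[str, str]:
    """Run each level concurrently; skip stages whose dependencies did not succeed."""
    status: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for level in kahn_levels(deps):
            runnable = []
            for name in level:
                if any(status.get(d) != "ok" for d in deps[name]):
                    status[name] = "skipped"
                else:
                    runnable.append(name)
            futures = {name: pool.submit(stages[name]) for name in runnable}
            for name, future in futures.items():
                try:
                    future.result()
                    status[name] = "ok"
                except Exception:
                    status[name] = "failed"
            if fail_fast and any(status[n] == "failed" for n in level):
                break
    for name in deps:  # anything not reached after a fail_fast break
        status.setdefault(name, "skipped")
    return status


def _fail() -> None:
    raise RuntimeError("simulated stage failure")


# Hypothetical stage graph for illustration only.
DEMO_DEPS = {
    "extract": set(),
    "inventory": {"extract"},
    "strings": {"extract"},
    "cve_scan": {"inventory"},
}
DEMO_STAGES = {
    "extract": lambda: None,
    "inventory": _fail,
    "strings": lambda: None,
    "cve_scan": lambda: None,
}
demo_status = run_levels_parallel(DEMO_STAGES, DEMO_DEPS)
```

In this demo `inventory` fails, so `cve_scan` is skipped while the independent `strings` stage still runs. Level-by-level execution is simpler than fully dynamic scheduling, at the cost of waiting for the slowest stage in each level.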
PR #15 of Phase 2B. Adds scoring.py with PriorityInputs dataclass + compute_priority_score() weighting (detection 50%, EPSS 25%, reach 15%, CVSS 10%, backport -0.20 penalty). cve_scan.py:1140-1170 refactored: detection_confidence stays strictly at static-evidence cap (STATIC_CODE_VERIFIED_CAP = 0.55); EPSS / reachability / backport / CVSS now feed priority_score instead. findings.py adds priority_score + priority_inputs as additive optional fields (PR …). quality_metrics adds per-priority bucket aggregation. New doc: docs/scoring_calibration.md explains the split.

Addresses external reviewer critique that EPSS-additive confidence looked like a ranking heuristic, not a detection probability.
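To make the detection/priority split concrete, here is one way the stated weights could combine. The weights, the -0.20 backport penalty, and the 0.55 cap come from the commit message. The field names on `PriorityInputs`, the normalisation of CVSS to 0..1, the weighted-sum form, and the clamping are all assumptions, so the body below should not be read as the actual `scoring.py`.

```python
from __future__ import annotations

from dataclasses import dataclass

STATIC_CODE_VERIFIED_CAP = 0.55  # from the commit message
BACKPORT_PENALTY = 0.20          # from the commit message


@dataclass(frozen=True)
class PriorityInputs:
    # Field names are assumptions made for this sketch.
    detection_confidence: float  # 0..1, already capped by the detector
    epss: float                  # 0..1 exploit-prediction probability
    reachability: float          # 0..1
    cvss_base: float             # 0..10
    backport_present: bool = False


def _clamp01(value: float) -> float:
    return max(0.0, min(1.0, value))


def cap_detection_confidence(raw: float) -> float:
    """Detection confidence never exceeds the static-evidence cap."""
    return min(_clamp01(raw), STATIC_CODE_VERIFIED_CAP)


def compute_priority_score(inputs: PriorityInputs) -> float:
    """Weighted sum (assumed form) with a flat penalty when a backport is present."""
    score = (
        0.50 * _clamp01(inputs.detection_confidence)
        + 0.25 * _clamp01(inputs.epss)
        + 0.15 * _clamp01(inputs.reachability)
        + 0.10 * _clamp01(inputs.cvss_base / 10.0)
    )
    if inputs.backport_present:
        score -= BACKPORT_PENALTY
    return round(_clamp01(score), 4)


example = PriorityInputs(
    detection_confidence=cap_detection_confidence(0.9),
    epss=0.9,
    reachability=1.0,
    cvss_base=9.8,
)
example_score = compute_priority_score(example)
```

Under these assumptions a finding can rank high (here 0.275 + 0.225 + 0.15 + 0.098 = 0.748) while its detection confidence stays at 0.55. That is the separation the commit describes: exploit likelihood moves the ranking, not the claimed detection probability.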
… and TUI

PR #13 of Phase 2B. Web viewer gains a collapsible "Reasoning Trail" section in finding detail panels. report_assembler analyst markdown gets a numbered Reasoning Trail subsection per finding. cli_tui_render appends a Reasoning Trail block to the finding detail view (ASCII-mode compatible). All three surfaces hide the section when the trail is absent. Closes the analyst-visibility loop opened by PR #11.

Surfaces:
- reasoning_trail.py grows format_trail_for_markdown, format_trail_for_tui, normalize_trail helpers (single source of truth for all rendering).
- report_assembler.py exposes build_finding_reasoning_trail_md and reasoning_trail_for_analyst_json as the analyst-facing entry points.
- reporting.py: _normalize_v2_claim_from_finding now passes through reasoning_trail (additive, no schema bump). write_analyst_report_v2_md emits the numbered subsection. write_analyst_report_v2_viewer adds collapsible <details> JS rendering with stage badge, llm_model, delta with sign+color, plain-Date timestamp formatting, and a raw_response_excerpt sub-details panel.
- cli_tui_data.py loads findings.json and surfaces findings_with_trails in the snapshot dict.
- cli_tui_render.py adds render_finding_detail_with_trail (testable in isolation) and a "Findings with Reasoning Trail" snapshot section that respects AIEDGE_TUI_ASCII via use_unicode plumbing.

Invariants:
- Additive only: surfaces hide the section when trail is empty/absent.
- No analyst report v2 schema version bump.
- AIEDGE_TUI_ASCII produces zero non-ASCII glyphs in TUI output.
- pyright src: 0 errors. ruff src tests: clean.
- pytest: 928 passed, 2 skipped (44 new tests for PR #13).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
R00T-Kim added a commit that referenced this pull request on Apr 13, 2026
…libration

Bumps version to 2.6.0. CHANGELOG.md v2.6.0 section summarizes the 6 atomic commits merged via PR #6 (Phase 2B integration). docs/status.md v2.6.0 section covers all three axes: DAG parallelization PoC (PR #10), reasoning trail + MCP analyst tools (PRs #11, #12, #13), extraction guidance (PR #14), detection vs priority calibration (PR #15). README badges bumped 2.5.0 → 2.6.0 (benchmark carry-over disclaimer unchanged; fresh corpus re-validation still pending).

Verified: pytest 865 → 1027 (+162), pyright 0 errors, ruff clean, CI 5/5 green.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Phase 2B integration — 6 commits addressing external reviewer critiques on performance, analyst UX, and confidence calibration. Follows Phase 2A (trust baseline + typecheck cleanup). All commits atomic; any single commit could ship independently.
1a6d0e0
8d14e2d
990957d
23ac24f  stage_dag.py STAGE_DEPS + Kahn topo_levels + run_stages_parallel() ThreadPoolExecutor, --experimental-parallel [N] flag, out-of-order ProgressTracker mode. Sequential run_stages() unchanged.
9103b12  scoring.py with PriorityInputs + compute_priority_score() (detection 50% / EPSS 25% / reach 15% / CVSS 10% / backport -0.20). cve_scan.py:1140-1170 refactored: detection_confidence strictly capped at STATIC_CODE_VERIFIED_CAP=0.55, EPSS/reachability/backport feed priority_score instead. Closes reviewer critique that EPSS-additive confidence looked like a ranking heuristic.
3127de9  render_finding_detail_with_trail() in TUI (AIEDGE_TUI_ASCII-compatible). All surfaces hide when trail absent.
Verification
cve_confidence_above_0.55_cap = 0 (detection cap correctly enforced)
Design Invariants Preserved
category, reasoning_trail, priority_score, priority_inputs). No report schema version bump.
assert_under_dir() (path_safety.py).
run_stages() behavior bit-identical to pre-PR state.
Test plan
findings.py, sarif_export.py, __main__.py, run.py (additive field coexistence + CLI flag composition)
--experimental-parallel 4 on R7000 (deferred; R7000 input not present in agent worktree)
Context
Follows Phase 2A (PRs #1-#9, merged at 484dbf4 + typecheck cleanup at 993dad2). Phase 2B positions SCOUT as a single-firmware analyst copilot per external reviewer guidance, with DAG parallelization as the performance lever and reasoning trail + MCP override loop as the UX lever.
🤖 Generated with Claude Code