Add finding diversity gate and pair-eval timeout diagnostic (Phase 2C+.5) #9
Merged
…+.5)
Reviewer eval lane analysis (2026-04-19) surfaced two blockers on the path
into Phase 2D':
- the local-7 lane mapped every pair-side row to a single finding_id
(degenerate ROC: all 14 rows on aiedge.findings.web.exec_sink_overlap)
- the dedicated reviewer reruns (claude-6h, codex-6h) terminated at
`run_index rows = 0` with no actionable diagnostic
This adds measurement scaffolding for both:
- quality_policy.py: compute_pair_eval_diversity_index() (max-share over
finding_id), load_pair_eval_finding_ids() (CSV reader with optional
ground_truth filter), evaluate_pair_eval_diversity_gate() (fails when
index >= AIEDGE_PAIR_DIVERSITY_MAX, default 0.5). New violation tokens
QUALITY_GATE_DIVERSITY_MISS / QUALITY_GATE_INVALID_PAIR_EVAL.
- run_pair_eval.py: TimeoutExpired now writes <side>/timeout_diagnostic.json
with last 200 stderr / 50 stdout lines, best-effort run_dir guess, and
the most recent stage's name/status.
- release_gate.sh: opt-in PAIR_EVAL_DIVERSITY sub-gate via
--pair-eval-findings; absent flag emits an INFO skip line.
- docs/finding_diversity_gate.md: threshold rationale, output schema,
Phase 2D entry exit-gate hook (recall >= 0.40 / tier >= 2 nonzero TP /
diversity < 0.5 / dedicated rerun success / corpus >= 10).
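As a rough illustration of the max-share index and gate described in the list above, a minimal sketch follows. The function names and the empty-input behaviour are assumptions made for this example; they are not the signatures of compute_pair_eval_diversity_index() or evaluate_pair_eval_diversity_gate() in quality_policy.py.

```python
from collections import Counter
from typing import Iterable


def max_share_diversity_index(finding_ids: Iterable[str]) -> float:
    """Share of rows held by the most frequent finding_id (1.0 = degenerate)."""
    counts = Counter(finding_ids)
    total = sum(counts.values())
    if total == 0:
        # Assumption: an empty input is treated as invalid rather than scored.
        raise ValueError("no finding_id rows to score")
    return max(counts.values()) / total


def diversity_gate_passes(finding_ids: Iterable[str], max_share: float = 0.5) -> bool:
    """Gate fails when the index is >= max_share (0.5 is the stated default)."""
    return max_share_diversity_index(finding_ids) < max_share
```

With the local-7 situation described above (all 14 rows on one finding_id), the index is 1.0 and the gate fails; an even split over two ids gives exactly 0.5, which also fails because the comparison is `>=`.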
Verification:
pytest -q tests/test_finding_diversity_gate.py # 12 passed
pytest -q # full suite green
ruff check, pyright, check_doc_consistency # clean
bash -n scripts/release_gate.sh # clean
Phase 2C+ Track A first PR (Pivot 2026-04-19).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ector (Phase 2C+.3)
Nearly doubles the dangerous-call catalogue covered by taint_propagation
(29 -> 51 symbols) so the diversity gate added in 2C+.5 has more
candidates to discriminate against.
The pre-Pivot _SINK_SYMBOLS only covered the cmd-injection / strcpy /
printf families (29 symbols); the firmware corpus routinely surfaces
sinks across at least nine CWE families that were silently missed.
- _SINK_SYMBOLS 29 -> 51, with explicit CWE comments per group:
* CWE-78 + wordexp / posix_spawn / posix_spawnp
* CWE-22 + fopen / open / openat / freopen / chdir
* CWE-426 + dlsym / dlmopen
* CWE-732 + chmod / fchmod / chown / fchown / lchown
* CWE-377 + mktemp / tmpnam / tempnam / tmpfile
* CWE-250/269 + chroot / setuid / seteuid / setgid / setegid
* CWE-454 + putenv / setenv / unsetenv
* CWE-134 + vsnprintf / dprintf / vdprintf
- _FORMAT_STRING_SINKS 6 -> 15 with size-bounded, fd-based, and
wide-character format-string variants.
- _is_format_string_variable() is widened to flag any first argument
that, after leading whitespace, does *not* begin with a string
literal: bare identifiers, function calls, struct field access
(`obj->field`), array subscripts, C-style casts, parenthesised
ternaries, and pointer dereferences (`*p_fmt`). Previously only bare
identifiers matched, so `printf(obj->field)` was silently considered
safe.
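The widened check described in the last bullet can be sketched as below. This is a simplified stand-in, not the body of _is_format_string_variable(): the function name is hypothetical, and treating prefixed literals (`L"..."`, `u8"..."`) as safe and empty arguments as unflagged are assumptions of this example.

```python
import re

# Matches the opening of a C string literal, optionally with an encoding
# prefix (assumption: prefixed literals count as literals, not variables).
_STRING_LITERAL_START = re.compile(r'(?:u8|[uUL])?"')


def format_arg_is_non_literal(arg_text: str) -> bool:
    """True when the format argument does not start with a string literal."""
    stripped = arg_text.lstrip()
    if not stripped:
        return False  # assumption: nothing to flag for an empty argument
    return _STRING_LITERAL_START.match(stripped) is None
```

Under this rule `"%s\n"` is treated as safe, while `obj->field`, `*p_fmt`, `(char *)buf`, and `get_fmt()` are all flagged, which over-approximates in the same direction the commit describes.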
Verification:
pytest -q tests/test_taint_propagation.py # 20 passed
pytest -q # full suite green
ruff check src/ tests/ # clean
pyright (changed files) # 0 errors
python3 scripts/check_doc_consistency.py # OK
Phase 2C+ Track A second commit on PR #9 (Pivot 2026-04-19, Plan
~/.claude/plans/twinkly-hugging-leaf.md).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…+.2)
EnhancedSourceStage now widens source identification beyond the
INPUT_APIS dynstr scan to cover the three attacker-influenced string
families that LARA (USENIX Sec 2024) showed are missing from traditional
source pools:
* URI prefixes (20 patterns)
* CGI environment variables (17 patterns)
* NVRAM / sysconf keys (24 patterns)
This is the source-side counterpart to PR #9's sink expansion (2C+.3):
together they grow both ends of the source->sink graph so the diversity
gate (2C+.5) has a meaningfully larger candidate pool.
- _URI_SOURCE_PATTERNS / _CGI_VAR_PATTERNS / _CONFIG_KEY_PATTERNS
frozensets with CWE / RFC / OEM provenance comments.
- _extract_uri_key_sources(bin_path, symbols, ascii_strings=None)
returns deduplicated (pattern, kind) tuples. Matching policy:
* URI: substring vs bin_path AND ascii_strings (symbols
intentionally excluded -- '/' is not a valid identifier char)
* CGI var: exact lower-case match against symbols OR ascii_strings
* config key: substring vs bin_path, symbols, AND ascii_strings
- EnhancedSourceStage.run() loop wraps each match into a source dict
with confidence=0.40 (SYMBOL_COOCCURRENCE cap), method="lara_pattern",
and source_type set to the match kind.
ascii_strings wiring is intentionally deferred -- a follow-up will plumb
inventory's string_hits / sbom _extract_ascii_runs through into this
call site.
Verification:
pytest -q tests/test_uri_source_extraction.py # 13 passed
pytest -q # full suite green
ruff check src/ tests/ # clean
pyright (changed files) # 0 errors
python3 scripts/check_doc_consistency.py # OK
Phase 2C+ Track A third commit on PR #9 (Pivot 2026-04-19).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
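The matching policy stated in this commit (URI, CGI var, config key) can be illustrated with a small sketch. Everything concrete here is an assumption for the example: the pattern sets are tiny placeholders rather than the real 20/17/24-entry frozensets, the kind labels and the `pattern` dict key are hypothetical, matching is done case-insensitively, and "AND" is read as "search each listed haystack and accept a hit in any of them".

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Placeholder patterns for illustration only.
URI_PATTERNS = frozenset({"/cgi-bin/", "/api/"})
CGI_PATTERNS = frozenset({"query_string", "http_cookie"})
CONFIG_PATTERNS = frozenset({"lan_ipaddr", "http_passwd"})


def extract_sources(
    bin_path: str,
    symbols: Iterable[str],
    ascii_strings: Optional[Iterable[str]] = None,
) -> List[Tuple[str, str]]:
    """Return deduplicated, sorted (pattern, kind) tuples."""
    path = bin_path.lower()
    syms = [s.lower() for s in symbols]
    strings = [s.lower() for s in (ascii_strings or ())]
    found = set()
    for pat in URI_PATTERNS:
        # URI: substring match; symbols are skipped because '/' cannot
        # appear in an identifier.
        if pat in path or any(pat in s for s in strings):
            found.add((pat, "uri"))
    for pat in CGI_PATTERNS:
        # CGI var: exact match after lower-casing.
        if pat in syms or pat in strings:
            found.add((pat, "cgi_var"))
    for pat in CONFIG_PATTERNS:
        # Config key: substring match against every haystack.
        if any(pat in hay for hay in [path, *syms, *strings]):
            found.add((pat, "config_key"))
    return sorted(found)


def to_source_dict(pattern: str, kind: str) -> Dict[str, object]:
    """Wrap a match using the confidence and method values given in the commit."""
    return {
        "pattern": pattern,
        "source_type": kind,
        "method": "lara_pattern",
        "confidence": 0.40,
    }
```

Because ascii_strings wiring is deferred in this commit, a call without that argument can only hit URI patterns through bin_path, and CGI variables only through symbols.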
First-cut implementation of the LATTE (Liu et al., TOSEM 2025)
prompt-slicing idea: when AIEDGE_LATTE_SLICING=1 is set,
_build_taint_prompt() replaces the full function body with a
sink-rooted backward slice so the LLM spends its token budget on
the data-dependency chain instead of the entire function.
- src/aiedge/code_slicing.py (new, 190 lines):
* latte_slicing_enabled() -- env-gate helper
* find_sink_line(body, sink_sym) -- first sink call location
* extract_backward_slice(body, sink_line_idx, max_lines=30)
bottom-up walker: start from the sink line, track identifiers,
keep earlier lines whose identifier set intersects the tracked
set. Blank/comment lines are kept for structural context.
Source order preserved; the sink line and defining lines of
its arguments always land in the slice.
* extract_slice_around_sink() -- convenience wrapper
* maybe_slice() -- env-gated entry point (recommended for
taint_propagation call site; default-off returns body
unchanged so existing prompts are byte-identical)
* slice_compression_ratio() -- telemetry helper
- src/aiedge/taint_propagation.py (+5 lines):
_build_taint_prompt() pipes each function body through
maybe_slice(body, sink_symbol) before the _truncate_text() cap.
- tests/test_code_slicing.py (new, 32 tests):
sink location + word boundary / slice invariants (subset,
source order, sink kept, defining lines pulled in) / max_lines
cap / degenerate inputs / env-gate parsing (truthy/falsy) /
byte-identical default-off / compression-ratio telemetry.
- docs/code_slicing_contract.md (new): algorithm description,
over-approximation caveats, env gate, call site, Phase 2D entry
interaction guidance.
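The bottom-up walker described for extract_backward_slice() can be approximated as below. This is a simplified sketch, not the contents of code_slicing.py: the keyword list, the comment detection, and the exact signatures are assumptions for this example. Like the original, it over-approximates, since any shared identifier (including call names and words inside string literals) pulls a line in.

```python
import re
from typing import Optional, Set

_IDENT = re.compile(r"[A-Za-z_]\w*")
# Assumption: a small keyword list is enough to avoid trivial matches.
_KEYWORDS = frozenset({
    "if", "else", "for", "while", "return", "sizeof", "struct", "static",
    "const", "unsigned", "void", "char", "int", "long", "short",
})


def _idents(line: str) -> Set[str]:
    return {tok for tok in _IDENT.findall(line) if tok not in _KEYWORDS}


def find_sink_line(body: str, sink_sym: str) -> Optional[int]:
    """Index of the first line calling sink_sym (word-boundary match)."""
    pattern = re.compile(rf"\b{re.escape(sink_sym)}\s*\(")
    for idx, line in enumerate(body.splitlines()):
        if pattern.search(line):
            return idx
    return None


def backward_slice(body: str, sink_line_idx: int, max_lines: int = 30) -> str:
    """Keep the sink line plus earlier lines sharing a tracked identifier."""
    lines = body.splitlines()
    tracked = _idents(lines[sink_line_idx])
    kept = [sink_line_idx]
    for idx in range(sink_line_idx - 1, -1, -1):
        if len(kept) >= max_lines:
            break
        stripped = lines[idx].strip()
        if not stripped or stripped.startswith(("//", "/*", "*")):
            kept.append(idx)  # blank/comment lines kept for context
            continue
        ids = _idents(lines[idx])
        if ids & tracked:
            tracked |= ids  # follow the dependency chain upward
            kept.append(idx)
    return "\n".join(lines[i] for i in sorted(kept))  # source order preserved
```

For a body that builds a command in `buf` from `getenv("NAME")` and then calls `system(buf)`, the slice keeps the declaration, the `getenv` line, the formatting line, and the sink, while dropping statements that touch only unrelated variables.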
Verification:
pytest -q tests/test_code_slicing.py # 32 passed
pytest -q # full suite green
ruff check, pyright (changed files) # clean / 0 errors
python3 scripts/check_doc_consistency.py # OK
Phase 2C+ Track A fourth commit on PR #9 (Pivot 2026-04-19). This
closes 2C+.1, leaving 2C+.4 (vendor extraction chain -- requires
five external firmware binaries) as the only remaining Track A
step before the Phase 2D entry exit-gate evaluation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
#   CHANGELOG.md
R00T-Kim added a commit that referenced this pull request on Apr 22, 2026
…corecard
- Mark Phase 1.5 (LATTE) / 1.6 (LARA) as landed in v2.7.0 PR #9 in the
phase-mapping table (previously only labeled 'moved to Phase 2C+')
- Add 'v2.7.0 / v2.7.1 landed status' subsection under Phase 2C+ Insert
that captures: all 2C+ items shipped, Phase 2D' Entry Gate FINAL 2/5
PASS scorecard (12-pair, WRT ok measurement), partial-extraction
artifact back-slide lesson, and Pivot Option D unchanged (v2.7.1 is a
quantitative refinement of scenario C, not a re-pivot)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Phase 2C+ Track A first PR. Adds measurement scaffolding for the two reviewer-eval-lane blockers identified in the 2026-04-19 Direction Pivot:
- The local-7 lane mapped every pair-side row to a single `finding_id` (`aiedge.findings.web.exec_sink_overlap`), giving `finding_diversity_index = 1.0` (degenerate). Gate fails when `max_share(finding_id) >= AIEDGE_PAIR_DIVERSITY_MAX` (default 0.5).
- The dedicated reviewer reruns (`pair-eval-dedicated-local7-claude-6h`, `codex-6h`) terminated at `run_index rows = 0` with no actionable signal. `_dump_timeout_diagnostic()` now captures the last 200 stderr / 50 stdout lines, a best-effort run_dir guess, and the last stage status into `<side>/timeout_diagnostic.json`.
Phase 2D entry exit-gate
This PR delivers gate 3 of 5 for the Phase 2D entry threshold:
Test plan
- `pytest -q tests/test_finding_diversity_gate.py`: 12 passed
- `pytest -q`: full suite green
- `ruff check src/ tests/ scripts/`: clean
- `pyright src/aiedge/quality_policy.py scripts/run_pair_eval.py tests/test_finding_diversity_gate.py`: 0 errors
- `python3 scripts/check_doc_consistency.py`: OK
- `bash -n scripts/release_gate.sh`: clean
- `release_gate.sh --pair-eval-findings benchmark-results/pair-eval/pair_eval_findings.csv` against the local-7 baseline → expect FAIL with `QUALITY_GATE_DIVERSITY_MISS`, actual=1.0
Files
- `src/aiedge/quality_policy.py` (+115)
- `scripts/run_pair_eval.py` (+82, helpers + TimeoutExpired branch)
- `scripts/release_gate.sh` (+57, opt-in `PAIR_EVAL_DIVERSITY` sub-gate)
- `tests/test_finding_diversity_gate.py` (new, 12 tests)
- `docs/finding_diversity_gate.md` (new)
- `CHANGELOG.md` (Unreleased ### Added)
Plan reference
See `~/.claude/plans/twinkly-hugging-leaf.md` for the broader Track A / Track B parallel plan.
🤖 Generated with Claude Code