R00T-Kim · Apr 19, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,11 @@ Format based on [Keep a Changelog](https://keepachangelog.com/).
 
 ### Added
 
+- **LATTE-inspired text-based backward slicing (Phase 2C+.1)** (`src/aiedge/code_slicing.py`, `src/aiedge/taint_propagation.py`, `tests/test_code_slicing.py`, `docs/code_slicing_contract.md`). First-cut implementation of the LATTE (Liu et al., TOSEM 2025) prompt-slicing idea: when `AIEDGE_LATTE_SLICING=1` is set, `_build_taint_prompt()` replaces the full function body with a sink-rooted backward slice. The slice walks bottom-up from the sink call, keeping earlier lines whose identifiers overlap the tracked variables-of-interest (minus a conservative noise set of C keywords / literals / common macros). The slice is a line-level subset of the original body with source order preserved; the sink line and, within the `max_lines` window, the defining lines of its arguments are always retained. Public API: `find_sink_line`, `extract_backward_slice`, `extract_slice_around_sink`, `maybe_slice`, `slice_compression_ratio`, `latte_slicing_enabled`. Default-off keeps existing LLM prompts byte-identical. _(32 new tests in `tests/test_code_slicing.py`.)_
+- **LARA-style URI / CGI / config-key source identification (Phase 2C+.2)** (`enhanced_source.py`, `tests/test_uri_source_extraction.py`). `EnhancedSourceStage` now widens source identification beyond C-level input APIs by recognising attacker-influenced strings, taking inspiration from the LARA paper (USENIX Sec 2024). Three new pattern sets totalling 50 entries cover URI prefixes (`/cgi-bin/`, `/api/`, `/upnp/`, `/admin/`, `/goform/`, ...), CGI environment variables (`QUERY_STRING`, `REQUEST_METHOD`, `HTTP_*`, ...), and NVRAM / sysconf config keys (`http_passwd`, `wpa_psk`, `cloud_token`, `firmware_url`, ...). New helper `_extract_uri_key_sources(bin_path, symbols, ascii_strings=None)` produces `(pattern, kind)` tuples that are wrapped per-binary into source dicts with `confidence=0.40` (SYMBOL_COOCCURRENCE cap, since string presence alone does not prove reachability) and `method="lara_pattern"`. Symbol-based URI matching is intentionally skipped to avoid noise; the optional `ascii_strings` parameter is the path for string-literal evidence (to be wired through inventory data in a follow-up). _(13 new tests in `tests/test_uri_source_extraction.py`.)_
+- **Sink coverage expansion (Phase 2C+.3)** (`taint_propagation.py`, `tests/test_taint_propagation.py`). `_SINK_SYMBOLS` grows from 29 to 51 symbols, covering the CWE classes that the firmware corpus actually exercises: CWE-78 cmd injection (now incl. `wordexp`, `posix_spawn`, `posix_spawnp`), CWE-22 path traversal (`fopen`, `open`, `openat`, `freopen`, `chdir`), CWE-426 search path (`dlsym`, `dlmopen`), CWE-732 perms (`chmod`/`fchmod`/`chown`/`fchown`/`lchown`), CWE-377 insecure tmp (`mktemp`, `tmpnam`, `tempnam`, `tmpfile`), CWE-250/269 privilege (`chroot`, `setuid`, `seteuid`, `setgid`, `setegid`), and CWE-454 env injection (`putenv`, `setenv`, `unsetenv`). `_FORMAT_STRING_SINKS` grows from 6 to 15 with size-bounded (`vsnprintf`), file-descriptor (`dprintf`/`vdprintf`), and wide-char (`swprintf`, `vswprintf`, `wprintf`, `vwprintf`, `fwprintf`, `vfwprintf`) variants. `_is_format_string_variable()` is strengthened to flag struct field access, array subscripts, function-call results, C-style casts, parenthesised ternaries, and pointer dereferences as variable first-arguments, not just bare identifiers. _(20 new tests in `tests/test_taint_propagation.py`.)_
+- **Finding diversity gate (Phase 2C+.5)** (`quality_policy.py`, `release_gate.sh`, `tests/test_finding_diversity_gate.py`, `docs/finding_diversity_gate.md`). Detects degenerate pair-eval coverage where pair-side rows concentrate on a single `finding_id`, the structural failure surfaced by the 2026-04-19 reviewer eval lane analysis (local-7 baseline `finding_diversity_index = 1.0`, all 14 rows on `aiedge.findings.web.exec_sink_overlap`). New helpers `compute_pair_eval_diversity_index()`, `load_pair_eval_finding_ids()`, `evaluate_pair_eval_diversity_gate()` produce a `QUALITY_GATE_DIVERSITY_MISS` violation when `max_share(finding_id) >= AIEDGE_PAIR_DIVERSITY_MAX` (default 0.5). `release_gate.sh` wires this in as the opt-in `PAIR_EVAL_DIVERSITY` sub-gate via `--pair-eval-findings`. _(12 new tests in `tests/test_finding_diversity_gate.py`.)_
+- **Pair-eval timeout diagnostic** (`scripts/run_pair_eval.py`). When a pair-side run hits the wall-clock timeout, `_dump_timeout_diagnostic()` writes `<side>/timeout_diagnostic.json` capturing the last 200 stderr / 50 stdout lines, a best-effort run_dir guess, and the most recent stage's name/status. Closes the visibility gap that left the dedicated reviewer rerun lanes (`pair-eval-dedicated-local7-claude-6h`, `codex-6h`) stuck at `run_index rows = 0` without actionable signal.
 - **FDA Section 524B compatibility mapping (Phase 3'.1 step B-2)** (`docs/compliance_mapping/fda_section_524b.md`). Maps SCOUT outputs to the four §524B(b) statutory obligations (postmarket vulnerability monitoring plan, secure design/develop/maintain processes, postmarket updates/patches, SBOM) and to the September 2023 FDA premarket cybersecurity guidance content elements (security objectives, threat modelling, security risk management, cybersecurity testing, architecture views, SBOM, vulnerability management, labelling, postmarket plan). Coverage is documented per element with explicit "out of scope" callouts for sponsor-side QMS deliverables. Disclaimer reuses the directory-wide "compatible with" wording rule.
 - **ISO/SAE 21434 compatibility mapping (Phase 3'.1 step B-3)** (`docs/compliance_mapping/iso_21434.md`). Maps SCOUT outputs to ISO/SAE 21434:2021 work products across clauses 8 (continual cybersecurity activities), 9 (concept), 10 (product development), 11 (cybersecurity validation), 13 (operations and maintenance), and 15 (TARA methods). Identifies which work products are tool-friendly (WP-08-01..04, WP-10-04, WP-10-05, WP-13-02) versus manufacturer-side narratives (WP-09-02, WP-10-01, WP-10-02, etc.).
 - **UN R155 compatibility mapping (Phase 3'.1 step B-3)** (`docs/compliance_mapping/un_r155.md`). Maps SCOUT outputs to UN R155 §7.2 (CSMS) and §7.3 (vehicle-type approval) requirements, plus per-threat guidance for the 15 most-relevant Annex 5 threat categories (manipulation, replay, malware insertion, network-design vulnerabilities, etc.). Co-published with the ISO/SAE 21434 mapping per the standard / regulation pairing.
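
The strengthened `_is_format_string_variable()` check in the Phase 2C+.3 entry amounts to classifying the syntactic shape of a printf-family format argument. A minimal sketch, assuming a regex-per-shape design; `is_format_string_variable` and its patterns are hypothetical illustrations, not the code in `taint_propagation.py`:

```python
import re

# Hypothetical shape classifier; NOT the actual _is_format_string_variable().
# A plain C string literal, optionally with an L / u / U / u8 prefix.
_STRING_LITERAL = re.compile(r'^(?:u8|[LuU])?"(?:[^"\\]|\\.)*"$')

# Non-literal argument shapes listed in the changelog entry (assumed regexes).
_VARIABLE_SHAPES = (
    re.compile(r"^[A-Za-z_]\w*$"),                            # bare identifier: fmt
    re.compile(r"^[A-Za-z_]\w*(?:(?:\.|->)[A-Za-z_]\w*)+$"),  # struct field: req->fmt
    re.compile(r"^[A-Za-z_]\w*\s*\[.+\]$"),                   # array subscript: msgs[i]
    re.compile(r"^[A-Za-z_]\w*\s*\(.*\)$"),                   # call result: get_fmt(ctx)
    re.compile(r"^\(\s*[A-Za-z_][\w\s]*\**\s*\)\s*\S+$"),     # C-style cast: (char *)buf
    re.compile(r"^\(.*\?.*:.*\)$"),                           # parenthesised ternary
    re.compile(r"^\*+\s*\S+$"),                               # dereference: *pfmt
)


def is_format_string_variable(first_arg: str) -> bool:
    """True when the format argument matches a known non-literal shape."""
    arg = first_arg.strip()
    if not arg or _STRING_LITERAL.match(arg):
        return False
    return any(shape.match(arg) for shape in _VARIABLE_SHAPES)
```

String literals are rejected first; anything the shapes do not recognise, such as adjacent literal concatenation or a bare integer, also returns `False` in this sketch, which keeps it conservative at the cost of missing more exotic expressions.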

diff --git a/docs/code_slicing_contract.md b/docs/code_slicing_contract.md
@@ -0,0 +1,111 @@
+# LATTE Code Slicing Contract
+
+> Phase 2C+.1 (Pivot 2026-04-19) — text-based backward slicing that the taint
+> propagation stage uses to compress LLM prompts when
+> `AIEDGE_LATTE_SLICING=1` is set.
+
+## Why this exists
+
+LATTE (Liu et al., "LATTE: LLM-Powered Static Binary Taint Analysis",
+TOSEM 2025) reported that feeding the LLM the **sink-rooted backward
+slice** instead of the full decompiled function body improved new-bug
+discovery and reduced token usage. SCOUT's first-cut implementation
+takes the same idea but stays conservative: it operates on plain text,
+does not require a Ghidra-grade SSA backend, and is opt-in so the
+existing prompt behaviour stays byte-identical when the env var is
+unset.
+
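+The slicing is **over-approximate**: it keeps every earlier line whose
+identifier set overlaps the already-tracked variables-of-interest. The
+slice is therefore a line-level subset of the original body (ordering
+preserved, no line rewritten), but it may retain irrelevant lines that
+happen to mention a tainted variable name in passing. In exchange, within
+the `max_lines` window it keeps every line that names a tracked variable,
+so the LLM does not have to reason about a variable whose direct
+definition disappeared. A single bottom-up pass can still miss flows
+through aliases: for a sink `system(buf)` preceded by `p = buf;` and then
+`strcpy(p, input);`, the `strcpy` line is visited before `p` joins the
+interest set, so it is dropped.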
+
+## Public API
+
+Source: `src/aiedge/code_slicing.py`.
+
+| Function | Purpose |
+|---|---|
+| `latte_slicing_enabled()` | Returns `True` when `AIEDGE_LATTE_SLICING` is set to `1`/`true`/`yes`/`on` (case-insensitive). |
+| `find_sink_line(body, sink_sym)` | 0-based line index of the first `sink_sym(` call, or `None`. |
+| `extract_backward_slice(body, sink_line_idx, max_lines=30)` | Backward-walks from `sink_line_idx`, keeps lines whose identifiers overlap the tracked set. Returns a string of the retained lines in source order. |
+| `extract_slice_around_sink(body, sink_sym, max_lines=30)` | Convenience: `find_sink_line` then `extract_backward_slice`. Returns `None` when the sink is absent. |
+| `maybe_slice(body, sink_sym, max_lines=30)` | Recommended entry point for call sites: when the env gate is off it returns the body unchanged; when on it returns the slice (falling back to the full body if the sink is not found). Never returns `None`. |
+| `slice_compression_ratio(original, sliced)` | Telemetry helper — ratio of kept lines to original lines. |
+
+## Env gate
+
+```
+AIEDGE_LATTE_SLICING=1   # enable slicing (any of 1/true/yes/on)
+```
+
+When the variable is unset (or set to any value other than those above),
+`maybe_slice` returns the input body verbatim, so every LLM prompt is
+byte-identical to the pre-2C+.1 prompt.
+
+## Algorithm (first-cut)
+
+```
+1. Locate the sink line (first occurrence of `<sink_sym>(`).
+2. Initial variables-of-interest = identifiers on the sink line
+   (minus the noise set: C keywords, literals, common macros).
+3. For each earlier line (bottom-up):
+     a. If its identifier set intersects the variables-of-interest,
+        include it and union its identifiers into the interest set.
+     b. If the line has no usable identifier (blank, comment-only),
+        include it so the LLM keeps structural context.
+     c. Stop at `max_lines` or the function start.
+4. Emit retained lines in source order.
+```
+
+Noise identifiers (`_NOISE_IDENTIFIERS`) are kept minimal on purpose: we
+filter only what is guaranteed not to carry data (`if`, `int`, `NULL`,
+`true`, ...). Vendor-specific tokens are *not* filtered because they
+often *are* the relevant variables in router firmware decompilation.
+
+## Over-approximation behaviour
+
+Because the algorithm tracks identifiers and not their scopes, a slice
+may include lines that merely reference a same-named variable elsewhere
+in the function. This is acceptable for prompt compression but analysts
+who need an exact data-flow trace should still consult the Ghidra
+P-code SSA path (`pcode_taint.py`).
+
+## Call site
+
+The only caller today is `_build_taint_prompt()` in
+`src/aiedge/taint_propagation.py`:
+
+```python
+body_raw = fb.get("body", "")
+body_sliced = maybe_slice(body_raw, sink_symbol)
+body = _truncate_text(body_sliced, max_chars=2000)
+```
+
+When `AIEDGE_LATTE_SLICING` is unset the call returns `body_raw`
+unchanged and the subsequent `_truncate_text` path is byte-identical to
+pre-2C+.1 behaviour.
+
+## Phase 2D entry interaction
+
+Phase 2D.1 (reasoning_trail + MCP loop validation) depends on the LLM
+actually producing useful verdicts across diverse findings. Slicing is
+the main lever available today for fitting *more* findings into the same
+token budget. Phase 2D.1 does not strictly require slicing, but leaving
+it disabled in production runs shrinks the effective corpus the analyst
+can cycle through. Operators planning a Phase 2D.1 walkthrough should
+enable `AIEDGE_LATTE_SLICING=1` for the run.
+
+## Related artifacts
+
+- `src/aiedge/code_slicing.py` — implementation
+- `src/aiedge/taint_propagation.py` — call site in `_build_taint_prompt`
+- `tests/test_code_slicing.py` — unit tests (32 cases) that pin:
+  - sink-line location and word-boundary behaviour
+  - slice invariants (subset, source order, sink kept, defining lines
+    pulled in)
+  - `max_lines` cap and degenerate inputs
+  - env-gate parsing and byte-identical default-off
+  - compression-ratio telemetry
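
The first-cut algorithm in `docs/code_slicing_contract.md` fits in a few dozen lines of Python. The sketch below is a hypothetical reimplementation for illustration, not `src/aiedge/code_slicing.py`: the noise set is a small stand-in for `_NOISE_IDENTIFIERS`, comments are stripped with a naive single-line regex, and `find_sink_line` tolerating whitespace before `(` is an assumption.

```python
import os
import re
from typing import Optional

# Illustrative stand-in for _NOISE_IDENTIFIERS (C keywords, literals, macros).
_NOISE = frozenset({
    "if", "else", "for", "while", "do", "return", "int", "char", "unsigned",
    "long", "void", "const", "sizeof", "NULL", "true", "false",
})
_IDENT = re.compile(r"\b[A-Za-z_]\w*")
_COMMENT = re.compile(r"/\*.*?\*/|//.*$")  # naive: single-line comments only


def latte_slicing_enabled() -> bool:
    value = os.environ.get("AIEDGE_LATTE_SLICING", "").strip().lower()
    return value in {"1", "true", "yes", "on"}


def find_sink_line(body: str, sink_sym: str) -> Optional[int]:
    # \b keeps "my_system(" from matching the sink "system".
    call = re.compile(rf"\b{re.escape(sink_sym)}\s*\(")
    for idx, line in enumerate(body.splitlines()):
        if call.search(line):
            return idx
    return None


def _identifiers(line: str) -> set[str]:
    code = _COMMENT.sub("", line)
    return {tok for tok in _IDENT.findall(code) if tok not in _NOISE}


def extract_backward_slice(body: str, sink_line_idx: int, max_lines: int = 30) -> str:
    lines = body.splitlines()
    interest = _identifiers(lines[sink_line_idx])   # step 2: seed from the sink line
    kept = [sink_line_idx]
    for idx in range(sink_line_idx - 1, -1, -1):    # step 3: walk bottom-up
        if len(kept) >= max_lines:                  # step 3c: line cap
            break
        ids = _identifiers(lines[idx])
        if not ids:                                 # 3b: blank, comment, brace, noise-only
            kept.append(idx)
        elif ids & interest:                        # 3a: overlap, keep and grow the set
            kept.append(idx)
            interest |= ids
    return "\n".join(lines[i] for i in sorted(kept))  # step 4: source order


def maybe_slice(body: str, sink_sym: str, max_lines: int = 30) -> str:
    if not latte_slicing_enabled():
        return body                                 # default-off: byte-identical
    idx = find_sink_line(body, sink_sym)
    return body if idx is None else extract_backward_slice(body, idx, max_lines)
```

On a toy CGI handler with the gate on, the slice keeps the `cmd` buffer, the `getenv("QUERY_STRING")` source, and the `snprintf` that joins them, while an unrelated `log_debug` call and `int n = 0;` declaration drop out.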
diff --git a/docs/finding_diversity_gate.md b/docs/finding_diversity_gate.md
@@ -0,0 +1,137 @@
+# Finding Diversity Gate
+
+> Phase 2C+.5 (Pivot 2026-04-19) — pair-eval lane gate that detects degenerate
+> evidence-tier coverage by measuring finding-id share concentration.
+
+## Why this gate exists
+
+The 2026-04-19 reviewer eval lane analysis surfaced a structural failure that
+neither precision/recall nor confidence caps caught: **every pair-side row in the
+local-7 lane mapped to the same `finding_id`** (`aiedge.findings.web.exec_sink_overlap`,
+`evidence_tier=symbol_only`). The pair-level recall and FP rate looked plausible
+(0.142857 each) yet the underlying tier-ROC was *degenerate* — there was nothing
+to discriminate between vulnerable and patched runs because the detection layer
+collapsed onto a single finding.
+
+The diversity gate quantifies this collapse and blocks releases that ship it.
+
+## Definition
+
+```
+finding_diversity_index = max_count(finding_id) / total_rows
+```
+
+- `1.0` — degenerate (every row mapped to a single `finding_id`)
+- `1/N` — fully diverse (every row a distinct `finding_id`)
+- `0.0` — empty input (callers decide whether to treat as violation)
+
+The index is a **maximum-share** metric, not entropy. It is robust to long-tail
+distributions and surfaces the dominant finding bucket directly.
+
+## Threshold
+
+| Env variable | Default | Direction |
+|---|---|---|
+| `AIEDGE_PAIR_DIVERSITY_MAX` | `0.5` | gate fails when index `>=` threshold |
+
+The default `0.5` was chosen as a first cut: any single `finding_id` accounting
+for 50%+ of pair rows is treated as a degenerate signal. Once the corpus grows
+past 10 pairs the threshold should be re-evaluated against representative runs
+(see Phase 2C+.4 vendor-extraction expansion).
+
+## Inputs
+
+The gate consumes the pair-eval findings CSV produced by
+`scripts/run_pair_eval.py`. Schema (relevant columns):
+
+| Column | Use |
+|---|---|
+| `finding_id` | counted into the share distribution |
+| `ground_truth` | optional filter via `load_pair_eval_finding_ids(only_ground_truth=...)` |
+
+Empty `finding_id` rows are skipped silently. Missing CSV raises
+`QUALITY_GATE_INVALID_PAIR_EVAL`.
+
+## Output schema
+
+```json
+{
+  "schema_version": 1,
+  "verdict": "pass" | "fail",
+  "passed": true | false,
+  "findings_source": "<path string>",
+  "policy": {
+    "finding_diversity_max": 0.5,
+    "finding_diversity_max_env": "AIEDGE_PAIR_DIVERSITY_MAX"
+  },
+  "measured": {
+    "finding_diversity_index": 0.0..1.0,
+    "sample_size": <int>
+  },
+  "errors": [
+    {
+      "error_token": "QUALITY_GATE_DIVERSITY_MISS",
+      "metric": "finding_diversity_index",
+      "source_field": "pair_eval_findings.finding_id",
+      "actual": 1.0,
+      "threshold": 0.5,
+      "operator": "<",
+      "sample_size": 14,
+      "message": "..."
+    }
+  ]
+}
+```
+
+## Wiring into `release_gate.sh`
+
+The unified release gate wires this in as the `PAIR_EVAL_DIVERSITY` sub-gate. It
+is **opt-in** via `--pair-eval-findings`:
+
+```bash
+scripts/release_gate.sh \
+  --run-dir aiedge-runs/<id> \
+  --pair-eval-findings benchmark-results/pair-eval/pair_eval_findings.csv
+```
+
+When the flag is omitted the gate is skipped with an `INFO` line so existing
+release flows continue working unchanged.
+
+## Current baseline (2026-04-19)
+
+Running the gate against the trusted summary-reuse local-7 lane:
+
+```
+sample_size = 14   (7 pairs × 2 sides)
+finding_diversity_index = 1.0   (degenerate — single finding for all rows)
+verdict = fail
+```
+
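+This matches the Pivot 2026-04-19 [diagnosis](status.md): Phase 2D entry
+is gated until detection coverage produces at least two distinct findings across
+the pair lane. The gate makes that requirement enforceable instead of advisory,
+and in practice stricter: with `k` distinct findings the index is at least `1/k`,
+so a two-finding lane scores at best `0.5` and still fails the default threshold.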
+
+## Phase 2D entry exit-gate hook
+
+The diversity gate is one of the five Phase 2D entry exit-gate thresholds
+defined in [`docs/status.md`](status.md):
+
+| Gate | Threshold | Tooling |
+|---|---|---|
+| Detection recall | `≥ 0.40` | `pair_eval_summary.json` |
+| Tier discriminability | `≥ 2 nonzero TP tiers` | `pair_eval_findings.csv` |
+| **Finding diversity** | **`< 0.5`** | **this gate** |
+| Dedicated rerun | `≥ 1 driver success` | `pair-eval-dedicated-*` lanes |
+| Corpus size | `≥ 10 pairs` | `benchmarks/pair-eval/pairs.json` |
+
+The other four are tracked in their own places; this gate only owns the
+diversity threshold.
+
+## Related artifacts
+
+- `src/aiedge/quality_policy.py` — `compute_pair_eval_diversity_index`,
+  `load_pair_eval_finding_ids`, `evaluate_pair_eval_diversity_gate`
+- `scripts/run_pair_eval.py` — adds `timeout_diagnostic.json` for dedicated
+  rerun timeout investigations (companion 2C+.5 work)
+- `scripts/release_gate.sh` — `PAIR_EVAL_DIVERSITY` sub-gate
+- `tests/test_finding_diversity_gate.py` — unit + baseline tests
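
The definition, threshold, and CSV inputs of the diversity gate can be combined into a small evaluator. A minimal sketch, assuming `csv.DictReader` over the findings CSV; `diversity_index`, `load_finding_ids`, and `evaluate` are hypothetical stand-ins for the `quality_policy.py` helpers and emit only a subset of the documented output schema. Empty input passes here, whereas the doc leaves that decision to callers.

```python
import csv
import os
from collections import Counter
from pathlib import Path

# Hypothetical sketch of the gate; NOT the helpers in src/aiedge/quality_policy.py.
DEFAULT_MAX = 0.5
ENV_VAR = "AIEDGE_PAIR_DIVERSITY_MAX"


def diversity_index(finding_ids: list[str]) -> float:
    """max_count(finding_id) / total_rows; 0.0 for empty input."""
    if not finding_ids:
        return 0.0
    return max(Counter(finding_ids).values()) / len(finding_ids)


def load_finding_ids(csv_path: Path) -> list[str]:
    """Read the finding_id column, silently skipping empty values."""
    with csv_path.open(newline="", encoding="utf-8") as fh:
        return [
            row["finding_id"].strip()
            for row in csv.DictReader(fh)
            if (row.get("finding_id") or "").strip()
        ]


def evaluate(finding_ids: list[str]) -> dict:
    threshold = float(os.environ.get(ENV_VAR, DEFAULT_MAX))
    index = diversity_index(finding_ids)
    passed = index < threshold  # gate fails when index >= threshold
    errors = [] if passed else [{
        "error_token": "QUALITY_GATE_DIVERSITY_MISS",
        "metric": "finding_diversity_index",
        "actual": index,
        "threshold": threshold,
        "operator": "<",
        "sample_size": len(finding_ids),
    }]
    return {
        "verdict": "pass" if passed else "fail",
        "passed": passed,
        "measured": {"finding_diversity_index": index, "sample_size": len(finding_ids)},
        "errors": errors,
    }
```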
diff --git a/scripts/release_gate.sh b/scripts/release_gate.sh
@@ -10,12 +10,13 @@ CORPUS_MANIFEST="benchmarks/corpus/manifest.json"
 METRICS_OUT=""
 QUALITY_OUT=""
 LLM_FIXTURE=""
+PAIR_EVAL_FINDINGS=""
 
 FAILED=0
 
 usage() {
   cat <<'EOF'
-Usage: scripts/release_gate.sh --run-dir <PATH> [--manifest <PATH>] [--metrics-out <PATH>] [--quality-out <PATH>] [--llm-fixture <PATH>]
+Usage: scripts/release_gate.sh --run-dir <PATH> [--manifest <PATH>] [--metrics-out <PATH>] [--quality-out <PATH>] [--llm-fixture <PATH>] [--pair-eval-findings <PATH>]
 
 Unified release governance gate (single entrypoint).
 
@@ -25,6 +26,7 @@ Sub-gates:
   - QUALITY_METRICS: aiedge quality-metrics
   - QUALITY_POLICY: aiedge release-quality-gate
   - EXPLOIT_TIER_POLICY: schema tier checks plus exploit_policy artifact checks when present
+  - PAIR_EVAL_DIVERSITY: finding-diversity gate over pair_eval_findings.csv (skipped when --pair-eval-findings absent)
   - TAMPER_SUITE: pytest tests/test_tamper_suite.py
 EOF
 }
@@ -97,6 +99,10 @@ while [[ $# -gt 0 ]]; do
       LLM_FIXTURE="$2"
       shift 2
       ;;
+    --pair-eval-findings)
+      PAIR_EVAL_FINDINGS="$2"
+      shift 2
+      ;;
     -h|--help)
       usage
       exit 0
@@ -203,6 +209,66 @@ else
 fi
 rm -f "$EXPLOIT_CHECK_OUTPUT"
 
+if [[ -n "$PAIR_EVAL_FINDINGS" ]]; then
+  PAIR_EVAL_OUTPUT="$(mktemp)"
+  set +e
+  PYTHONPATH="$PYTHONPATH" python3 - <<'PY' "$PAIR_EVAL_FINDINGS" "$RUN_DIR" >"$PAIR_EVAL_OUTPUT" 2>&1
+import json
+import sys
+from pathlib import Path
+
+from aiedge.quality_policy import (
+    QualityGateError,
+    evaluate_pair_eval_diversity_gate,
+    load_pair_eval_finding_ids,
+)
+
+csv_path = Path(sys.argv[1]).resolve()
+run_dir = Path(sys.argv[2]).resolve()
+out_path = run_dir / "pair_eval_diversity_gate.json"
+try:
+    finding_ids = load_pair_eval_finding_ids(csv_path)
+except QualityGateError as exc:
+    print(f"{exc.token}: {exc}")
+    raise SystemExit(1) from exc
+
+result = evaluate_pair_eval_diversity_gate(
+    finding_ids=finding_ids,
+    findings_source=str(csv_path),
+)
+out_path.write_text(
+    json.dumps(result, indent=2, sort_keys=True) + "\n", encoding="utf-8"
+)
+if not result["passed"]:
+    for err in result["errors"]:
+        print(err.get("message") or err.get("error_token"))
+    raise SystemExit(1)
+measured = result["measured"]
+print(
+    "diversity_index="
+    + str(measured["finding_diversity_index"])
+    + " sample_size="
+    + str(measured["sample_size"])
+)
+PY
+  PAIR_EVAL_RC=$?
+  set -e
+  if [[ "$PAIR_EVAL_RC" -ne 0 ]]; then
+    gate_fail "PAIR_EVAL_DIVERSITY" "diversity gate failed (violation or invalid input)"
+    while IFS= read -r line; do
+      [[ -n "$line" ]] && echo "[GATE][LOG][PAIR_EVAL_DIVERSITY] $line"
+    done <"$PAIR_EVAL_OUTPUT"
+  else
+    gate_pass "PAIR_EVAL_DIVERSITY" "diversity gate passed"
+    while IFS= read -r line; do
+      [[ -n "$line" ]] && gate_info "PAIR_EVAL_DIVERSITY" "$line"
+    done <"$PAIR_EVAL_OUTPUT"
+  fi
+  rm -f "$PAIR_EVAL_OUTPUT"
+else
+  gate_info "PAIR_EVAL_DIVERSITY" "skipped (no --pair-eval-findings)"
+fi
+
 if [[ "${AIEDGE_SKIP_TAMPER_TESTS:-0}" == "1" ]]; then
   gate_info "TAMPER_SUITE" "skipped by AIEDGE_SKIP_TAMPER_TESTS=1"
 else