Dependency-as-ranking-signal: reverse-dependency WalkPolicy (ir.traverse) + post-fusion dep-evidence boost (ir.retrieve), seeded by the caller #64

## Context
Thread: i2mint/ir#61. The dependency list is our most discriminative signal, but `ir` uses deps only as a hard FILTER field and as forward `REF` edges (`package -> its deps`) that it never walks for ranking. `graph.py` already builds `deps->REF` edges (`graph.py:179`) and `traverse.py` already has a bounded `WalkPolicy` operator (`traverse.py:105`), but the only shipped policy is `collapsed_tree_policy` (summary-routing). No policy walks `REF` edges, and `CorpusGraph` has no reverse (`dep -> dependents`) index. Both mechanisms below improve `ir`'s **own single-shot search** when a seed/lib set is supplied, which satisfies the #38 decision rule.

## Problem (with our FP/FN evidence)
Every "uses-tools" package that DEPENDS ON a domain library but never says so in prose is a false negative: `chromadol` (0.02, vector-DB DOL), `http_cosmo_prep` (0.01, embeddings service), `allude` (no score, depends on `meshed`), `unbox` (no score, import-dependency graph), `cosmo_data_prep` (no score), `xcosmo` (0.29, cosmograph viz). Meanwhile the dense leg promotes prose-similar distractors whose deps CONTRADICT the match: `au` #2 (0.51) for graphs depends on async-task libs not graph libs; `su`/`csm`/`voxy`/`theremin` (0.44–0.51) for embeddings depend on audio/DSP libs; `ef`/`imbed` (0.46/0.34) for graphs are embedding *flows*, not graph libs.

## Proposal (two composable mechanisms, both pure-structural, offline, model-free)
**(1) Recall-time — reverse-dependency walk** in `ir/graph.py` + `ir/traverse.py`:
- Extend `CorpusGraph` with `reverse_neighbors(node_id, *, edge_type='REF')` and a `fan_in(node_id)` count. The reverse view inverts the stored links once into a `{dep_name: [dependents]}` map and caches it as derived state, like the forward edges; `fan_in` serves as a PageRank-style centrality prior.
- Ship `reverse_dependency_policy(*, seeds, edge_type='REF', max_depth, fan_in_weight)` in `ir/traverse.py`. It starts from a **caller-supplied** set of known domain-library ids, walks reverse `REF` edges to surface dependents, and scores each committed node by combining cosine-to-query with a normalized fan-in/proximity-to-seed term. It reuses `traverse()`'s existing visited-set/depth/budget primitives unchanged.
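A minimal, standalone sketch of the reverse index and bounded reverse walk described in (1). It does not use the real `CorpusGraph` or `traverse()` API: the function names, the plain-dict graph, the `pkg_*` packages, and the way proximity and fan-in are blended are all assumptions made for illustration. Only the `allude -> meshed` dependency comes from this issue.

```python
from collections import defaultdict, deque


def build_reverse_index(forward_deps):
    """Invert {package: [deps]} into {dep: sorted list of dependents}."""
    reverse = defaultdict(set)
    for package, deps in forward_deps.items():
        for dep in deps:
            reverse[dep].add(package)
    return {dep: sorted(pkgs) for dep, pkgs in reverse.items()}


def fan_in(reverse, node_id):
    """Number of packages that depend directly on node_id."""
    return len(reverse.get(node_id, ()))


def reverse_walk(reverse, seeds, max_depth=2):
    """Breadth-first walk over reverse edges, bounded by max_depth.

    Returns {dependent: depth from the nearest seed}; seeds are excluded.
    """
    visited = set(seeds)
    depth_of = {}
    queue = deque((seed, 0) for seed in sorted(seeds))
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth budget exhausted on this branch
        for dependent in reverse.get(node, ()):
            if dependent not in visited:
                visited.add(dependent)
                depth_of[dependent] = depth + 1
                queue.append((dependent, depth + 1))
    return depth_of


def structural_prior(depth, node_fan_in, max_fan_in):
    """Hypothetical blend: closer to a seed and more depended-upon scores higher."""
    proximity = 1.0 / depth
    centrality = node_fan_in / max_fan_in if max_fan_in else 0.0
    return 0.5 * proximity + 0.5 * centrality


# Toy graph; the pkg_* names are placeholders, not real packages.
forward = {
    "allude": ["meshed"],
    "pkg_a": ["meshed", "networkx"],
    "pkg_b": ["pkg_a"],
    "pkg_c": ["numpy"],
}
reverse = build_reverse_index(forward)
reached = reverse_walk(reverse, seeds={"meshed", "networkx"}, max_depth=2)
```

In this toy graph the walk reaches `allude` and `pkg_a` at depth 1 and `pkg_b` at depth 2, while `pkg_c` is never surfaced because none of its dependencies is a seed. A real policy would add the prior to cosine-to-query, scaled by `fan_in_weight`.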

**(2) Ranking-time — dep-evidence boost** in `ir/retrieve.py`:
- Add an optional `dep_boost: Callable[[SearchHit, Mapping], float] | None = None` to `search()`, applied **after fusion and BEFORE per-artifact collapse** (composes with the existing `rerank=` seam at `retrieve.py:291`).
- Ship `dependency_evidence_boost(relevant_libs: set[str], *, weight, mode='additive')`: reads the candidate's `deps` filter field and returns `weight * normalized_overlap` with the query-relevant library set. It is a cheaper, structural complement to the text reranker, which can still be fooled by "DAG".
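The boost in (2) could look roughly like the sketch below. `Hit` is a hypothetical stand-in for `SearchHit`, the `Mapping` argument is assumed to carry the candidate's filter fields (including `deps`), and `normalized_overlap` is assumed to be the overlap coefficient `|deps ∩ relevant| / min(|deps|, |relevant|)`. The issue does not fix the normalization, so this is one plausible choice among several.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Mapping


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for ir's SearchHit."""
    artifact_id: str
    score: float


def dependency_evidence_boost(relevant_libs, *, weight=0.1, mode="additive"):
    """Return a callable that scores dep overlap with the relevant library set."""
    if mode != "additive":
        raise ValueError(f"unsupported mode in this sketch: {mode!r}")
    relevant = frozenset(relevant_libs)

    def boost(hit: Hit, fields: Mapping) -> float:
        deps = set(fields.get("deps", ()))
        if not deps or not relevant:
            return 0.0
        overlap = len(deps & relevant)
        # Overlap coefficient (an assumption, see the lead-in).
        return weight * overlap / min(len(deps), len(relevant))

    return boost


def apply_boost(
    hits: Iterable[Hit],
    fields_by_id: Mapping[str, Mapping],
    boost: Callable[[Hit, Mapping], float],
) -> List[Hit]:
    """Add the boost to each fused score and re-sort, before any collapse step."""
    rescored = [
        Hit(h.artifact_id, h.score + boost(h, fields_by_id.get(h.artifact_id, {})))
        for h in hits
    ]
    return sorted(rescored, key=lambda h: (-h.score, h.artifact_id))


# Toy candidates; names, scores, and deps are placeholders.
hits = [Hit("pkg_prose_match", 0.51), Hit("pkg_uses_tools", 0.45)]
fields_by_id = {
    "pkg_prose_match": {"deps": ["aiohttp"]},
    "pkg_uses_tools": {"deps": ["meshed", "requests"]},
}
boost = dependency_evidence_boost({"meshed", "networkx"}, weight=0.2)
reranked = apply_boost(hits, fields_by_id, boost)
```

With these toy numbers the package that actually depends on a relevant library overtakes the prose-similar candidate. The caller still supplies `relevant_libs`; nothing in the sketch decides which libraries are relevant.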

**CRITICAL BOUNDARY:** both take the seed set / `relevant_libs` as an **argument**. `ir` does NOT decide which libraries are domain-relevant for a goal — that source-selection/planning decision is **raglab**'s Planner (thorwhalen/raglab#2), the same seam in both mechanisms. `ir` ships only the operators + the pure-vector + structural scoring. Deliberately a SIGNAL, never a back-edge or loop (guards #38's "no control loop in ir").

## Experiment
Build the corpus with the default `Package` + `default_edge_extractor` on `all-MiniLM`. Cases come from the **private** benchmark repo **`thorwhalen/ir-eval-data`** (access-controlled): `package_relevance_labels.jsonl` (full 231-package graded gold labeling), `named_sets.json` (per-theme `distractors` + `hard_positives`), and `benchmark_analysis.json` (frozen `all-MiniLM-L6-v2` baseline: precision@K = recall@K ≈ 0.42 for both themes). Clone with repo access; package names are not mirrored into this public repo.

- Baseline arm: `ir.search(mode='hybrid')`.
- Treatment arm (walk): `fuse_hits` over `[flat hybrid hits] + [traverse(query, CorpusGraph, policy=reverse_dependency_policy(seeds=...))]`.
- Treatment arm (boost), run separately: `search(..., dep_boost=dependency_evidence_boost(relevant_libs=...))`.
- Seeds/libs for embeddings = `{ef, imbed, vd, grub, sentence-transformers, transformers, torch, openai, oa, aix, sklearn}`; for graphs = `{meshed, linked, networkx, graphviz, igraph, dagapp, cosmograph}`.
- Ablate the boost `weight` to plot FP-rate-vs-recall.
- Compare against an `ef.Reranker` arm to show the dep signal is complementary (the reranker reads prose and can be fooled by "DAG").
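The weight ablation could be driven by a loop like the one below. This is a sketch on made-up scores, not on benchmark data. It assumes an additive boost, defines FP-rate as the fraction of the top-k occupied by named distractors, and uses placeholder package names; all three are illustrative choices.

```python
def rerank(base_scores, dep_overlap, weight):
    """Rank ids by base score plus weight * normalized dep overlap."""
    boosted = {
        pkg: score + weight * dep_overlap.get(pkg, 0.0)
        for pkg, score in base_scores.items()
    }
    return sorted(boosted, key=lambda pkg: (-boosted[pkg], pkg))


def sweep(base_scores, dep_overlap, hard_positives, distractors, weights, k):
    """One row per weight: recall on hard positives and FP-rate in the top-k."""
    rows = []
    for weight in weights:
        top_k = set(rerank(base_scores, dep_overlap, weight)[:k])
        rows.append({
            "weight": weight,
            "recall": len(top_k & hard_positives) / len(hard_positives),
            "fp_rate": len(top_k & distractors) / k,
        })
    return rows


# Made-up inputs for illustration only.
base_scores = {
    "distractor_1": 0.51,
    "distractor_2": 0.46,
    "neutral": 0.40,
    "positive_1": 0.29,
    "positive_2": 0.02,
}
dep_overlap = {"positive_1": 1.0, "positive_2": 0.5}
rows = sweep(
    base_scores,
    dep_overlap,
    hard_positives={"positive_1", "positive_2"},
    distractors={"distractor_1", "distractor_2"},
    weights=[0.0, 0.25, 0.5, 1.0],
    k=2,
)
for row in rows:
    print(row)
```

Plotting `fp_rate` against `recall` across the rows gives the FP-rate-vs-recall curve; on the real benchmark the inputs would come from `named_sets.json` and the hybrid-search scores.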

## Success metric
- nDCG@20 (graded) + recall@20 per theme vs the all-MiniLM hybrid baseline.
- Targeted: recall@20 on the 8 named uses-tools/thin-desc hard-positives `{chromadol, http_cosmo_prep, allude, unbox, cosmo_data_prep, xcosmo, kroki, lexis}` — baseline ~0/8 in top-20 → target ≥6/8.
- Guardrail: FP-rate on `{au, strand, reci, creek, su, csm, voxy}` must NOT increase (they don't depend on seed libs). For the boost: `au` drops out of top-10 for graphs; the audio cluster drops out of top-10 for embeddings.
- Report the fan-in distribution as a sanity check that the centrality prior is non-degenerate.
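For reference, the two headline metrics can be computed as in the sketch below. It assumes linear gain (the grade itself, not `2**grade - 1`) and the standard `log2(rank + 1)` discount; whether the benchmark uses the linear or exponential gain variant is not stated here, so treat that as an assumption. The ids and grades are toy values.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (ranks start at 1)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, grades, k=20):
    """nDCG@k with graded relevance; ids missing from `grades` count as 0."""
    gains = [grades.get(item, 0) for item in ranked_ids]
    ideal = sorted(grades.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg else 0.0


def recall_at_k(ranked_ids, relevant, k=20):
    """Fraction of the relevant set that appears in the top k."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


# Toy ranking and grades for illustration only.
ranked = ["a", "b", "c", "d"]
grades = {"a": 3, "c": 2, "e": 1}
ndcg = ndcg_at_k(ranked, grades, k=3)
recall = recall_at_k(ranked, grades.keys(), k=3)
```

The targeted hard-positive check is `recall_at_k` with the eight named packages as the relevant set and `k=20`.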

## Data
Full 231-package graded cases from the **private** benchmark repo **`thorwhalen/ir-eval-data`** (access-controlled), plus the named distractor/hard-positive subsets. The repo holds `package_relevance_labels.jsonl` (full 231-package graded gold labeling), `named_sets.json` (per-theme `distractors` + `hard_positives`), and `benchmark_analysis.json` (frozen `all-MiniLM-L6-v2` baseline: precision@K = recall@K ≈ 0.42 for both themes). Clone with repo access; package names are not mirrored into this public repo. The reverse index + dep overlap are built from the already-stored `deps` filter field, so no re-embedding is required.

https://claude.ai/code/session_01D229oNHVN1drd1mdbQL5MV
