Context
Thread: #61. The dependency list is our most discriminative signal, but ir uses deps only as a hard FILTER field and as forward REF edges (package -> its deps) that it never walks for ranking. graph.py already builds deps->REF edges (graph.py:179), and traverse.py already has a bounded WalkPolicy operator (traverse.py:105). However, the only shipped policy is collapsed_tree_policy (summary-routing): no policy walks REF edges, and CorpusGraph has no reverse (deps -> dependents) index. Both mechanisms below improve ir's own single-shot search when a seed/lib set is supplied, which satisfies the #38 decision rule.
Problem (with our FP/FN evidence)
Every "uses-tools" package that DEPENDS ON a domain library but never says so in prose is a false negative: chromadol (0.02, vector-DB DOL), http_cosmo_prep (0.01, embeddings service), allude (no score, depends on meshed), unbox (no score, import-dependency graph), cosmo_data_prep (no score), xcosmo (0.29, cosmograph viz). Meanwhile, the dense leg promotes prose-similar distractors whose deps CONTRADICT the match. For graphs, au (rank #2, 0.51) depends on async-task libs, not graph libs, and ef/imbed (0.46/0.34) are embedding flows, not graph libs. For embeddings, su/csm/voxy/theremin (0.44–0.51) depend on audio/DSP libs.
Proposal (two composable mechanisms, both pure-structural, offline, model-free)
(1) Recall-time — reverse-dependency walk in ir/graph.py + ir/traverse.py:
- Extend CorpusGraph with reverse_neighbors(node_id, *, edge_type='REF'), which inverts the stored links view once into a {dep_name: [dependents]} map (cached derived state, like the forward edges), and a fan_in(node_id) count that serves as a PageRank-style centrality prior.
- Ship reverse_dependency_policy(*, seeds, edge_type='REF', max_depth, fan_in_weight) in ir/traverse.py. It seeds on a caller-supplied set of known domain-library ids, walks reverse REF edges to surface dependents, and scores each committed node by combining cosine-to-query with a normalized fan-in/proximity-to-seed term. It reuses traverse()'s existing visited-set/depth/budget primitives unchanged.
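The reverse index and bounded reverse walk described above can be sketched in isolation. This is a minimal illustration, not ir's implementation: it assumes the forward edges are available as a plain {package: [deps]} dict, and build_reverse_index, fan_in, and reverse_walk are hypothetical stand-alone helpers (the proposed CorpusGraph methods and WalkPolicy would wrap equivalent logic). Package names are toy placeholders, and the scoring term is left out.

```python
from collections import defaultdict, deque


def build_reverse_index(forward_edges):
    """Invert {package: [deps]} into {dep: [dependents]} (deduplicated, sorted)."""
    reverse = defaultdict(set)
    for package, deps in forward_edges.items():
        for dep in deps:
            reverse[dep].add(package)
    return {dep: sorted(dependents) for dep, dependents in reverse.items()}


def fan_in(reverse_index, node_id):
    """Number of direct dependents of node_id (0 if nothing depends on it)."""
    return len(reverse_index.get(node_id, ()))


def reverse_walk(reverse_index, seeds, max_depth):
    """Breadth-first walk over reverse edges, bounded by max_depth.

    Returns {dependent: depth_from_nearest_seed}, excluding the seeds themselves.
    The depth_of dict doubles as the visited set, so cycles terminate.
    """
    depth_of = {seed: 0 for seed in seeds}
    queue = deque(sorted(seeds))
    while queue:
        node = queue.popleft()
        if depth_of[node] >= max_depth:
            continue  # depth budget exhausted on this branch
        for dependent in reverse_index.get(node, ()):
            if dependent not in depth_of:
                depth_of[dependent] = depth_of[node] + 1
                queue.append(dependent)
    return {node: d for node, d in depth_of.items() if node not in seeds}


# Toy forward edges (hypothetical names): app_c reaches lib_x only via app_a.
forward = {"app_a": ["lib_x"], "app_b": ["lib_x", "util"], "app_c": ["app_a"]}
reverse = build_reverse_index(forward)

print(fan_in(reverse, "lib_x"))                       # 2
print(reverse_walk(reverse, {"lib_x"}, max_depth=1))  # {'app_a': 1, 'app_b': 1}
print(reverse_walk(reverse, {"lib_x"}, max_depth=2))  # adds 'app_c' at depth 2
```

Building the inverted map once and reading fan-in off it keeps both lookups O(1) per node, which is consistent with treating the reverse index as cached derived state.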
(2) Ranking-time — dep-evidence boost in ir/retrieve.py:
- Add an optional dep_boost: Callable[[SearchHit, Mapping], float] | None = None parameter to search(), applied after fusion and BEFORE per-artifact collapse (it composes with the existing rerank= seam at retrieve.py:291).
- Ship dependency_evidence_boost(relevant_libs: set[str], *, weight, mode='additive'), which reads the candidate's deps filter field and returns weight * normalized_overlap with the query-relevant library set. It is a cheaper, structural complement to the text reranker, which can still be fooled by "DAG".
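A rough sketch of how such a boost factory might look, under stated assumptions: Hit is a hypothetical stand-in for ir's SearchHit, the Mapping argument is assumed to expose the candidate's deps under a "deps" key, and normalized_overlap is taken to be the fraction of the candidate's deps that fall in relevant_libs (the proposal does not fix the normalization). Only the 'additive' mode is sketched, and apply_boost is an illustrative helper rather than the proposed search() wiring.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for ir's SearchHit."""
    artifact_id: str
    score: float


def dependency_evidence_boost(
    relevant_libs: set[str], *, weight: float, mode: str = "additive"
) -> Callable[[Hit, Mapping], float]:
    """Return a boost function: weight * (share of deps in relevant_libs)."""
    if mode != "additive":
        raise ValueError(f"only mode='additive' is sketched here, got {mode!r}")
    relevant = frozenset(relevant_libs)

    def boost(hit: Hit, fields: Mapping) -> float:
        deps = set(fields.get("deps", ()))
        if not deps:
            return 0.0  # no dependency evidence either way
        # Assumed normalization: overlap relative to the candidate's own deps.
        return weight * len(deps & relevant) / len(deps)

    return boost


def apply_boost(hits, fields_by_id, boost):
    """Add the boost to each fused score and re-sort (illustrative only)."""
    rescored = [
        Hit(h.artifact_id, h.score + boost(h, fields_by_id.get(h.artifact_id, {})))
        for h in hits
    ]
    return sorted(rescored, key=lambda h: h.score, reverse=True)


# Toy case: a prose-similar distractor vs. a thin-description true positive.
hits = [Hit("prose_match", 0.51), Hit("thin_desc", 0.29)]
fields_by_id = {
    "prose_match": {"deps": ["audio_lib"]},
    "thin_desc": {"deps": ["graph_lib", "util"]},
}
boost = dependency_evidence_boost({"graph_lib"}, weight=0.5)
ranked = apply_boost(hits, fields_by_id, boost)
print([h.artifact_id for h in ranked])  # ['thin_desc', 'prose_match']
```

Because the factory closes over relevant_libs, the caller (not ir) remains the only place where domain relevance is decided.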
CRITICAL BOUNDARY: both mechanisms take the seed set / relevant_libs as an argument. ir does NOT decide which libraries are domain-relevant for a goal. That source-selection/planning decision belongs to raglab's Planner (thorwhalen/raglab#2), and it is the same seam in both mechanisms. ir ships only the operators plus the pure-vector and structural scoring. The dependency evidence is deliberately a SIGNAL, never a back-edge or loop (this guards #38's "no control loop in ir").
Experiment
Build the corpus with default Package + default_edge_extractor on all-MiniLM. Cases, with graded gold, come from the private benchmark repo thorwhalen/ir-eval-data (access-controlled): package_relevance_labels.jsonl (full 231-package graded gold labeling), named_sets.json (per-theme distractors + hard_positives), and benchmark_analysis.json (frozen all-MiniLM-L6-v2 baseline: precision@K = recall@K ≈ 0.42 for both themes). Clone with repo access; package names are not mirrored into this public repo.
- Baseline arm: ir.search(mode='hybrid').
- Treatment arm 1: fuse_hits over [flat hybrid hits] + [traverse(query, CorpusGraph, policy=reverse_dependency_policy(seeds=...))].
- Treatment arm 2 (run separately): search(..., dep_boost=dependency_evidence_boost(relevant_libs=...)).
- Seeds/libs for embeddings = {ef, imbed, vd, grub, sentence-transformers, transformers, torch, openai, oa, aix, sklearn}; for graphs = {meshed, linked, networkx, graphviz, igraph, dagapp, cosmograph}.
- Ablate the boost weight to plot FP-rate-vs-recall.
- Compare against an ef.Reranker arm to show the dep signal is complementary (the reranker reads prose and can be fooled by "DAG").
Success metric
- nDCG@20 (graded) + recall@20 per theme vs the all-MiniLM hybrid baseline.
- Targeted: recall@20 on the 8 named uses-tools/thin-desc hard-positives {chromadol, http_cosmo_prep, allude, unbox, cosmo_data_prep, xcosmo, kroki, lexis}: baseline ~0/8 in top-20 → target ≥6/8.
- Guardrail: FP-rate on {au, strand, reci, creek, su, csm, voxy} must NOT increase (they don't depend on seed libs). For the boost: au drops out of top-10 for graphs; the audio cluster drops out of top-10 for embeddings.
- Report the fan-in distribution as a sanity check that the centrality prior is non-degenerate.
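For reference, the two headline metrics above could be computed along these lines. This is a sketch under assumptions: graded labels are taken to be a {package_id: grade} dict with 0 meaning irrelevant, nDCG uses linear gain (grade / log2(rank + 1)) rather than the exponential variant, and the function names are illustrative, not part of ir or the benchmark repo.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant ids that appear in the top-k of the ranking."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


def dcg_at_k(gains, k):
    """Discounted cumulative gain with linear gain and log2 rank discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, grades, k):
    """nDCG@k against graded labels; unlabeled ids count as grade 0."""
    gains = [grades.get(pid, 0) for pid in ranked_ids]
    ideal_dcg = dcg_at_k(sorted(grades.values(), reverse=True), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy labels (hypothetical ids): "d" is relevant but never retrieved.
grades = {"a": 2, "c": 1, "d": 2}
ranked = ["a", "b", "c"]

print(recall_at_k(ranked, grades.keys(), k=2))  # 1 of 3 relevant ids found
print(ndcg_at_k(ranked, grades, k=3))           # below 1.0 because "d" is missing
```

The targeted hard-positive metric is the same recall_at_k call with relevant_ids set to the 8 named packages and k=20.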
Data
Full 231-package graded cases from the private benchmark repo thorwhalen/ir-eval-data (access-controlled; files as listed under Experiment: package_relevance_labels.jsonl, named_sets.json, benchmark_analysis.json), plus the named distractor/hard-positive subsets. Clone with repo access; package names are not mirrored into this public repo. The reverse index + dep overlap are built from the already-stored deps filter field, so no re-embedding is required.
https://claude.ai/code/session_01D229oNHVN1drd1mdbQL5MV