Dependency-as-ranking-signal: reverse-dependency WalkPolicy (ir.traverse) + post-fusion dep-evidence boost (ir.retrieve), seeded by the caller #64

Description

@thorwhalen

Context

Thread: #61. The dependency list is our most discriminative signal, but ir uses deps only as a hard FILTER field and as forward REF edges (package -> its deps) that it never walks for ranking. graph.py already builds deps->REF edges (graph.py:179), and traverse.py already has a bounded WalkPolicy operator (traverse.py:105), but the only shipped policy is collapsed_tree_policy (summary-routing). No policy walks REF edges, and CorpusGraph has no reverse (deps -> dependents) index. Both mechanisms below improve ir's own single-shot search when a seed/lib set is supplied, which satisfies the #38 decision rule.

Problem (with our FP/FN evidence)

Every "uses-tools" package that DEPENDS ON a domain library but never says so in prose is a false negative: chromadol (0.02, vector-DB DOL), http_cosmo_prep (0.01, embeddings service), allude (no score, depends on meshed), unbox (no score, import-dependency graph), cosmo_data_prep (no score), xcosmo (0.29, cosmograph viz). Meanwhile, the dense leg promotes prose-similar distractors whose deps CONTRADICT the match: for graphs, au (#2, 0.51) depends on async-task libs, not graph libs; for embeddings, su/csm/voxy/theremin (0.44–0.51) depend on audio/DSP libs; for graphs, ef/imbed (0.46/0.34) are embedding flows, not graph libs.

Proposal (two composable mechanisms, both pure-structural, offline, model-free)

(1) Recall-time — reverse-dependency walk in ir/graph.py + ir/traverse.py:

  • Extend CorpusGraph with reverse_neighbors(node_id, *, edge_type='REF') (invert the stored links view once into a {dep_name: [dependents]} map, cached derived state like the forward edges) and a fan_in(node_id) count as a PageRank-style centrality prior.
  • Ship reverse_dependency_policy(*, seeds, edge_type='REF', max_depth, fan_in_weight) in ir/traverse.py: seeds on a caller-supplied set of known domain-library ids, walks reverse REF edges to surface dependents, scores each committed node by combining cosine-to-query with a normalized fan-in/proximity-to-seed term. Reuses traverse()'s existing visited-set/depth/budget primitives unchanged.
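The two bullets above can be sketched outside of ir as plain functions. This is a minimal illustration, not ir's implementation: `build_reverse_index`, `reverse_walk`, and `structural_score` are hypothetical names, the `{package: [deps]}` input shape is an assumption about the stored links view, and the equal-weight blend of proximity and fan-in is just one possible way to "combine" the terms.

```python
from collections import defaultdict, deque
from typing import Iterable, Mapping


def build_reverse_index(links: Mapping[str, Iterable[str]]) -> dict[str, list[str]]:
    """Invert {package: [deps]} into {dep: [dependents]} (sorted, de-duplicated)."""
    reverse: dict[str, set[str]] = defaultdict(set)
    for package, deps in links.items():
        for dep in deps:
            reverse[dep].add(package)
    return {dep: sorted(dependents) for dep, dependents in reverse.items()}


def fan_in(reverse: Mapping[str, list[str]], node_id: str) -> int:
    """Number of direct dependents of node_id (0 if nothing depends on it)."""
    return len(reverse.get(node_id, ()))


def reverse_walk(
    reverse: Mapping[str, list[str]], seeds: Iterable[str], *, max_depth: int = 2
) -> dict[str, int]:
    """Breadth-first walk over reverse edges; returns {dependent: depth first reached}."""
    seeds = set(seeds)
    visited = set(seeds)
    depth_of: dict[str, int] = {}
    queue = deque((seed, 0) for seed in sorted(seeds))
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:  # depth bound: do not expand further
            continue
        for dependent in reverse.get(node, ()):
            if dependent not in visited:
                visited.add(dependent)
                depth_of[dependent] = depth + 1
                queue.append((dependent, depth + 1))
    return depth_of


def structural_score(
    cosine: float, depth: int, node_fan_in: int, max_fan_in: int, *, fan_in_weight: float = 0.2
) -> float:
    """ASSUMED blend: cosine plus a weighted mean of seed proximity and normalized fan-in."""
    proximity = 1.0 / depth  # depth >= 1 for any walked dependent
    centrality = node_fan_in / max_fan_in if max_fan_in else 0.0
    return cosine + fan_in_weight * (0.5 * proximity + 0.5 * centrality)


# Toy graph with placeholder names (not real packages).
links = {"app_a": ["lib_graph"], "app_b": ["app_a", "lib_audio"], "app_c": ["lib_audio"]}
reverse = build_reverse_index(links)
reached = reverse_walk(reverse, {"lib_graph"}, max_depth=2)
```

On the toy graph, seeding on `lib_graph` reaches `app_a` directly and `app_b` transitively, while `app_c` (which depends only on `lib_audio`) is never surfaced. In ir itself the visited-set, depth, and budget handling would come from `traverse()` rather than from a hand-rolled queue.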

(2) Ranking-time — dep-evidence boost in ir/retrieve.py:

  • Add an optional dep_boost: Callable[[SearchHit, Mapping], float] | None = None to search(), applied after fusion and BEFORE per-artifact collapse (composes with the existing rerank= seam at retrieve.py:291).
  • Ship dependency_evidence_boost(relevant_libs: set[str], *, weight, mode='additive'): reads the candidate's deps filter field and returns weight * normalized_overlap with the query-relevant library set. It is a cheaper, structural complement to the text reranker, which can still be fooled by "DAG".
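A rough sketch of the boost factory follows. Several things here are assumptions made for illustration: `Hit` is a stand-in for ir's `SearchHit`, the second argument is taken to be a mapping of filter fields with a `deps` entry, and "normalized_overlap" is interpreted as the overlap coefficient (shared items divided by the size of the smaller set). The issue does not fix the normalization, and only `mode='additive'` is sketched.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping


@dataclass(frozen=True)
class Hit:
    """Hypothetical stand-in for ir's SearchHit: just an id and a fused score."""
    id: str
    score: float


def dependency_evidence_boost(
    relevant_libs: Iterable[str], *, weight: float = 0.2, mode: str = "additive"
) -> Callable[[Hit, Mapping], float]:
    """Return a boost function: weight * overlap between a hit's deps and relevant_libs."""
    if mode != "additive":
        raise ValueError(f"only mode='additive' is sketched here, got {mode!r}")
    relevant = frozenset(relevant_libs)

    def boost(hit: Hit, fields: Mapping) -> float:
        deps = set(fields.get("deps", ()))
        if not deps or not relevant:
            return 0.0
        # ASSUMED normalization: overlap coefficient, always in [0, 1].
        overlap = len(deps & relevant) / min(len(deps), len(relevant))
        return weight * overlap

    return boost


def apply_boost(
    hits: Iterable[Hit], fields_by_id: Mapping[str, Mapping], boost: Callable[[Hit, Mapping], float]
) -> list[Hit]:
    """Add the boost to each fused score and re-sort (would run before per-artifact collapse)."""
    rescored = [Hit(h.id, h.score + boost(h, fields_by_id.get(h.id, {}))) for h in hits]
    return sorted(rescored, key=lambda h: h.score, reverse=True)


# Toy example with placeholder names: a prose-similar distractor vs. a thin-description user.
hits = [Hit("prose_lookalike", 0.51), Hit("thin_desc_user", 0.40)]
fields_by_id = {
    "prose_lookalike": {"deps": ["async_lib"]},
    "thin_desc_user": {"deps": ["graph_lib", "http_lib"]},
}
boost = dependency_evidence_boost({"graph_lib", "other_graph_lib"}, weight=0.3)
ranked = apply_boost(hits, fields_by_id, boost)
```

With these toy numbers, the thin-description package gains 0.3 × 0.5 = 0.15 and overtakes the distractor, whose deps share nothing with the relevant set. The caller still supplies `relevant_libs`; nothing in the sketch decides which libraries matter for a query.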

CRITICAL BOUNDARY: both take the seed set / relevant_libs as an argument. ir does NOT decide which libraries are domain-relevant for a goal — that source-selection/planning decision is raglab's Planner (thorwhalen/raglab#2), the same seam in both mechanisms. ir ships only the operators + the pure-vector + structural scoring. Deliberately a SIGNAL, never a back-edge or loop (guards #38's "no control loop in ir").

Experiment

Build the corpus with default Package + default_edge_extractor on all-MiniLM. Cases with graded gold come from the private benchmark repo thorwhalen/ir-eval-data (access-controlled): package_relevance_labels.jsonl (full 231-package graded gold labeling), named_sets.json (per-theme distractors + hard_positives), and benchmark_analysis.json (frozen all-MiniLM-L6-v2 baseline: precision@K = recall@K ≈ 0.42 for both themes). Clone with repo access; package names are not mirrored into this public repo.

  • Baseline arm: ir.search(mode='hybrid').
  • Treatment arm (walk): fuse_hits over [flat hybrid hits] + [traverse(query, CorpusGraph, policy=reverse_dependency_policy(seeds=...))].
  • Treatment arm (boost), run separately: search(..., dep_boost=dependency_evidence_boost(relevant_libs=...)).
  • Comparison arm: ef.Reranker, to show the dep signal is complementary (the reranker reads prose and can be fooled by "DAG").

Seeds/libs for embeddings = {ef, imbed, vd, grub, sentence-transformers, transformers, torch, openai, oa, aix, sklearn}; for graphs = {meshed, linked, networkx, graphviz, igraph, dagapp, cosmograph}. Ablate the boost weight to plot FP-rate-vs-recall.
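The boost-weight ablation could be driven by a loop like the one below. It is a sketch on made-up scores with placeholder ids, not benchmark data. `sweep_boost_weight` is a hypothetical helper, and "FP-rate" is assumed here to mean the fraction of named distractors that land in the top K.

```python
from typing import Iterable, Mapping


def top_k(scores: Mapping[str, float], k: int) -> list[str]:
    """Ids of the k highest-scoring items (ties broken by id for determinism)."""
    return [pid for pid, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def sweep_boost_weight(
    base_scores: Mapping[str, float],
    dep_overlap: Mapping[str, float],
    positives: set[str],
    distractors: set[str],
    weights: Iterable[float],
    k: int,
) -> list[tuple[float, float, float]]:
    """For each weight, return (weight, recall@k on positives, share of distractors in top k)."""
    rows = []
    for weight in weights:
        boosted = {pid: s + weight * dep_overlap.get(pid, 0.0) for pid, s in base_scores.items()}
        top = set(top_k(boosted, k))
        recall = len(top & positives) / len(positives)
        fp_rate = len(top & distractors) / len(distractors)  # ASSUMED definition of FP-rate
        rows.append((weight, recall, fp_rate))
    return rows


# Made-up numbers: two distractors score high on prose, two positives only have dep evidence.
base_scores = {"d1": 0.51, "d2": 0.46, "p1": 0.29, "p2": 0.02, "n1": 0.10}
dep_overlap = {"p1": 1.0, "p2": 1.0}
rows = sweep_boost_weight(
    base_scores, dep_overlap, positives={"p1", "p2"}, distractors={"d1", "d2"},
    weights=[0.0, 0.3, 0.6], k=2,
)
```

Each row is one point on the FP-rate-vs-recall curve; on the real benchmark, the positives and distractors would come from named_sets.json and K would be 20.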

Success metric

  • nDCG@20 (graded) + recall@20 per theme vs the all-MiniLM hybrid baseline.
  • Targeted: recall@20 on the 8 named uses-tools/thin-desc hard-positives {chromadol, http_cosmo_prep, allude, unbox, cosmo_data_prep, xcosmo, kroki, lexis} — baseline ~0/8 in top-20 → target ≥6/8.
  • Guardrail: FP-rate on {au, strand, reci, creek, su, csm, voxy} must NOT increase (they don't depend on seed libs). For the boost: au drops out of top-10 for graphs; the audio cluster drops out of top-10 for embeddings.
  • Report the fan-in distribution as a sanity check that the centrality prior is non-degenerate.
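For reference, the two headline metrics can be computed as follows. This is a sketch using linear gain (the grade itself) with a log2 discount for nDCG. The issue does not say whether linear or exponential gain is intended, so that choice is an assumption, and the ids and grades in the example are placeholders.

```python
import math
from typing import Mapping, Sequence


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain with a log2(rank + 1) discount (rank starts at 1)."""
    return sum(gain / math.log2(i + 2) for i, gain in enumerate(gains))


def ndcg_at_k(ranked_ids: Sequence[str], grades: Mapping[str, float], k: int) -> float:
    """Graded nDCG@k; ids missing from grades count as grade 0."""
    gains = [grades.get(pid, 0.0) for pid in ranked_ids[:k]]
    ideal = sorted(grades.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg else 0.0


def recall_at_k(ranked_ids: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant set that appears in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked_ids[:k]) & relevant) / len(relevant)


# Placeholder ids and grades, only to exercise the functions.
grades = {"a": 2, "b": 1}
perfect = ndcg_at_k(["a", "b", "c"], grades, k=3)
shuffled = ndcg_at_k(["c", "a", "b"], grades, k=3)
hard_positive_recall = recall_at_k(["c", "a", "b"], {"a", "b"}, k=2)
```

The targeted hard-positive check is `recall_at_k` with the named set as `relevant` and k=20, so a 6/8 target corresponds to a value of 0.75.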

Data

Full 231-package graded cases from the private benchmark repo thorwhalen/ir-eval-data (access-controlled; files as listed under Experiment) + named distractor/hard-positive subsets. The reverse index + dep overlap are built from the already-stored deps filter field, so no re-embedding is required.

https://claude.ai/code/session_01D229oNHVN1drd1mdbQL5MV

Metadata

    Labels

    enhancement: New feature or request
