Up-weight the 'deps' surface at hybrid fusion (per-surface-kind weighting in ir.retrieve) #68

Description

@thorwhalen

Context

Follow-up to #62 (deps-as-text surface), part of thread #61. #62 added an opt-in deps surface to ir.strategy.Package. Measured on the 231-package corpus (all-MiniLM, hybrid), it produces real, correctly targeted dependency signal: ranks improve for allude 83→28 (depends on meshed) and xcosmo 103→40 (depends on cosmograph), while a true non-dependent stays flat (au 22→23). But as an equal-weighted surface in hybrid RRF, the lift mostly lands in the rank 20–50 band, so aggregate nDCG@20 moves only +0.023 (embeddings) / +0.00 (graphs), and a few tail items jitter by 1–4 ranks.

Problem

The deps surface is a high-precision, low-noise signal (exact library tokens), but RRF weights every surface hit equally, so a single dep-token match is averaged in with prose chunks rather than boosted. ir/retrieve.py already filters and branches on surface_kind (~line 106), so the hook exists. What is missing is a per-surface-kind weight at fusion.

Proposal (single-shot retrieval seam → ir, per #38)

Add an optional surface_weights: Mapping[str, float] | None parameter (e.g. {"deps": 2.0}) to ir.search / fuse_hits, applied as a multiplier on each surface hit's fused contribution (RRF rank weight or blend score) before the per-artifact collapse. The default, None, preserves today's equal weighting (progressive disclosure). Keep it embedder-agnostic and offline.
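
A minimal sketch of what the weighted fusion could look like, to make the proposal concrete. Everything here is hypothetical and not taken from ir: the `SurfaceHit` record, the `weighted_rrf` name, the RRF constant `k=60`, and the choice to collapse per artifact by summing contributions are all assumptions. The real `fuse_hits` may collapse differently (e.g. by max) or use a blend score instead of a rank weight.

```python
from collections import defaultdict
from typing import Iterable, List, Mapping, NamedTuple, Optional, Tuple


class SurfaceHit(NamedTuple):
    """Hypothetical record for one surface-level hit (not the ir type)."""
    artifact_id: str
    surface_kind: str
    rank: int  # 1-based rank of this hit within its own ranked list


def weighted_rrf(
    hits: Iterable[SurfaceHit],
    surface_weights: Optional[Mapping[str, float]] = None,
    k: int = 60,  # assumed RRF smoothing constant
) -> List[Tuple[str, float]]:
    """Fuse surface hits into per-artifact scores with per-kind multipliers."""
    weights = surface_weights or {}  # None means equal weighting
    scores = defaultdict(float)
    for hit in hits:
        # Surface kinds without an explicit weight keep the neutral 1.0.
        w = weights.get(hit.surface_kind, 1.0)
        # The weight scales the hit's RRF contribution before the collapse.
        scores[hit.artifact_id] += w / (k + hit.rank)
    # Collapse is a sum here (an assumption); ties break on artifact id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

With `surface_weights=None` the function reduces to plain RRF, which matches the "default = today's behavior" requirement. Passing `{"deps": 2.0}` doubles only the deps hits' contributions and leaves every other surface untouched.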

Experiment

Reuse the #66 harness and the private benchmark. Sweep surface_weights={"deps": w} for w ∈ {1, 1.5, 2, 3} via compare_indexings, and pick the w that maximizes hard-positive recall@20 on the dep-revealing set (allude, xcosmo, chromadol, http_cosmo_prep, lexis, unbox) without raising the distractor fp_rate (in particular, au, creek, and strand must stay put).
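
The selection rule above could be sketched as follows. This is only an illustration of the "maximize recall@20 without raising fp_rate" criterion: `SweepResult` and `pick_weight` are hypothetical names, the sketch does not call `compare_indexings` (its signature is not assumed here), and the tie-break toward the smaller weight is an added assumption.

```python
from typing import Mapping, NamedTuple


class SweepResult(NamedTuple):
    """Hypothetical per-weight summary produced by the sweep."""
    recall_at_20: float  # hard-positive recall@20 on the dep-revealing set
    fp_rate: float       # distractor false-positive rate


def pick_weight(
    results: Mapping[float, SweepResult],
    baseline: float = 1.0,
    tol: float = 1e-9,
) -> float:
    """Pick the weight with the best recall@20 whose fp_rate does not
    exceed the baseline's (w = 1 is today's equal weighting)."""
    base = results[baseline]
    # Keep only weights that do not raise the distractor fp_rate.
    admissible = {
        w: r for w, r in results.items() if r.fp_rate <= base.fp_rate + tol
    }
    # Highest recall wins; on ties prefer the smaller (more conservative) w.
    return max(admissible, key=lambda w: (admissible[w].recall_at_20, -w))
```

The baseline is always admissible, so the function falls back to w = 1 when no larger weight improves recall without hurting the distractors.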

Success metric

Part of #61.

https://claude.ai/code/session_01D229oNHVN1drd1mdbQL5MV
