
Add opt-in embedded deps surface to ir.strategy.Package (#62) #69

Merged: thorwhalen merged 1 commit into master from claude/issue-62-deps-surface on Jun 18, 2026


@thorwhalen


What

Adds an opt-in embedded deps surface to ir.strategy.Package (issue #62, tracking #61). Package(embed_deps=True) emits one extra Surface(kind="deps", granularity="field") serializing the bare dependency names (version specifiers / extras / markers stripped via ir.graph._dep_name), e.g. "Depends on: sentence-transformers, networkx, meshed".
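A minimal sketch of the serialization step. It assumes a regex stand-in for ir.graph._dep_name (the real helper is not shown in this PR) and a hypothetical "{deps}" placeholder in the template. It strips extras, version specifiers and environment markers, then lowercases and de-duplicates in order, which is the strip/dedup/lowercase behavior the tests below attribute to _default_deps_text:

```python
import re

# Hypothetical stand-in for ir.graph._dep_name: keep only the leading
# distribution name of a PEP 508 requirement string, dropping extras
# ("[...]"), version specifiers (">=2.0") and environment markers ("; ...").
_NAME_RE = re.compile(r"\s*([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)")


def dep_name(requirement: str) -> str:
    match = _NAME_RE.match(requirement)
    return match.group(1).lower() if match else ""


def deps_text(requirements, template="Depends on: {deps}"):
    """Serialize bare, lowercased, de-duplicated dep names (order preserved).

    Returns "" for an empty list, i.e. no deps surface would be emitted.
    The "{deps}" placeholder is an assumption, not ir's deps_template format.
    """
    names = []
    for req in requirements:
        name = dep_name(req)
        if name and name not in names:
            names.append(name)
    return template.format(deps=", ".join(names)) if names else ""


print(deps_text(["sentence-transformers>=2.0", "networkx",
                 "meshed[extra]; python_version>'3.8'", "NetworkX"]))
```

Lowercasing before the membership check is what collapses NetworkX and networkx into a single token, so the BM25 leg sees each dependency once.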

  • The deps bag is its own surface, separate from prose, so a rare library name isn't diluted and the BM25 leg picks up exact dep-token matches.
  • Deps also remain a filter field. The deps surface is appended last, so the existing surface_index contract (description at 0, followed by the readme_chunk surfaces) is unchanged.
  • Defaults to False (progressive disclosure; today's behavior is unchanged). embed_deps folds into the strategy id, so toggling it re-decomposes incrementally, and it round-trips via strategy_to_spec. deps_template overrides the serialization.
  • Touches only ir/strategy.py (deps are already scanned by sources._scan_packages).
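The surface-ordering rules above can be pictured with a small sketch. Surface and decompose here are hypothetical mirrors of ir's types, not its actual API, and the "chunk" granularity for readme chunks is an assumption. It shows the three properties the bullets describe: the deps surface is opt-in, it is skipped for an empty deps list, and it is appended after every existing surface so earlier indices do not shift:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Surface:
    # Hypothetical mirror of ir's Surface; only the fields discussed in the PR.
    kind: str
    granularity: str
    text: str


def decompose(description, readme_chunks, deps, *, embed_deps=False,
              deps_template="Depends on: {deps}"):
    """Sketch of a Package-style decomposition into surfaces.

    The description is surface 0 and readme chunks follow, so appending the
    deps surface last leaves every existing surface_index unchanged.
    """
    surfaces = [Surface("description", "field", description)]
    # "chunk" granularity for readme chunks is an assumption of this sketch.
    surfaces += [Surface("readme_chunk", "chunk", c) for c in readme_chunks]
    if embed_deps and deps:  # opt-in, and no surface for an empty deps list
        text = deps_template.format(deps=", ".join(deps))
        surfaces.append(Surface("deps", "field", text))
    return surfaces


base = decompose("Graph utils", ["## Install", "## Usage"], ["meshed", "networkx"])
with_deps = decompose("Graph utils", ["## Install", "## Usage"],
                      ["meshed", "networkx"], embed_deps=True)
```

Because with_deps[:3] equals base, any consumer that addresses the description or readme chunks by surface_index keeps working when the flag is turned on.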

Per the #38 decision rule this is a single-shot retrieval-quality change → lands in ir.

Why

The 231-package run found the dependency list was the single most discriminative signal, yet Package stored deps as a filter field only and never embedded them — so the index couldn't see that a package depends on meshed/networkx (⇒ graphs) or sentence-transformers/ef (⇒ embeddings).

Measured (all-MiniLM, hybrid, k=20, vs Package() baseline)

Using the #66 harness + the private benchmark:

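Each cell is Package() baseline → Package(embed_deps=True).

| theme | nDCG@20 | distractor fp_rate@20 | hard-pos recall@20 |
|---|---|---|---|
| embeddings | 0.861 → 0.884 | 0.071 → 0.071 | 0.40 → 0.40 |
| graphs | 0.745 → 0.745 | 0.222 → 0.222 | 0.167 → 0.167 |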
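For readers unfamiliar with the columns, here is a sketch of the standard binary-relevance definitions of nDCG@k and recall@k (hard-pos recall would apply recall to the hard-positive set). Whether the #66 harness uses graded relevance, a different discount, or this exact form is not stated here, so treat this as an assumption. The distractor fp_rate is left out because its precise definition lives in the harness:

```python
import math


def ndcg_at_k(ranking, relevant, k=20):
    """Binary-relevance nDCG@k: DCG of the ranking divided by the ideal DCG."""
    # Position i (0-based) is discounted by log2(i + 2), i.e. log2(rank + 1).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(ranking[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0


def recall_at_k(ranking, relevant, k=20):
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranking[:k]) & set(relevant)) / len(relevant)


score = ndcg_at_k(["a", "x", "b"], {"a", "b"}, k=3)
```

This also explains why a package moving from rank 83 to 28 barely touches nDCG@20: anything outside the top 20 contributes nothing to either metric.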

Per-package ranks (the real story): dep-revealing packages climb sharply, e.g. allude 83→28 (depends on meshed) and xcosmo 103→40 (depends on cosmograph), while a true non-dependent stays flat: au 22→23 (no graph dep, so deps correctly don't lift it; the FP guard holds). This confirms the lift is real dependency signal, not added text volume.

As an equal-weighted surface, the deps lift mostly lands in the rank 20–50 band, so aggregate nDCG@20 moves modestly (+0.023 for embeddings) and a few tail items jitter by ±1–4 ranks. Getting allude and xcosmo into the top 20 needs deps-surface up-weighting at fusion, filed as #68. Whether to flip embed_deps on by default (and rebuild the live packages corpus) is gated on that experiment.
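To make "up-weighting at fusion" concrete, here is a toy sketch using weighted reciprocal rank fusion (RRF, k=60). This is an assumed fusion rule chosen for illustration; the PR does not say how ir fuses surfaces, and the rankings below are hypothetical. Each surface contributes w / (k + rank) to a document's score, so raising the deps weight lets a strong deps-surface hit outrank documents that only do well on prose:

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, k=60):
    """Weighted reciprocal rank fusion: score(d) = sum_s w_s / (k + rank_s(d))."""
    scores = defaultdict(float)
    for surface, ranking in ranked_lists.items():
        w = weights.get(surface, 1.0)
        for rank, doc in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc] += w / (k + rank)
    # Sort by descending score; break ties by name so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical per-surface rankings for one query.
lists = {
    "prose": ["a", "b", "c", "d", "allude"],
    "deps": ["allude", "b"],
}
equal = weighted_rrf(lists, {"prose": 1.0, "deps": 1.0})
boosted = weighted_rrf(lists, {"prose": 1.0, "deps": 3.0})
print(equal)    # allude sits second under equal weights
print(boosted)  # tripling the deps weight moves allude to first
```

With equal weights, b's two mid-table hits (1/62 + 1/62) beat allude's deps rank-1 hit plus prose rank-5 hit (1/61 + 1/65); tripling the deps weight flips that order, which is the kind of effect the #68 experiment would measure.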

Tests

Six new cases in tests/test_strategy.py (off-by-default, surface emitted and appended last, empty deps → no surface, custom template, _default_deps_text strip/dedup/lowercase, spec round-trip). Full suite: 420 passed; ruff E/W/F/B clean at line length 88.
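For illustration, the off-by-default, strategy-id and spec round-trip behaviors can be written as pytest-style checks against a hypothetical PackageStrategy dataclass. Here to_spec/from_spec stand in for strategy_to_spec, whose real signature is not shown in this PR, and hashing the config into the id is an assumed mechanism, not necessarily how ir computes strategy ids:

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PackageStrategy:
    # Hypothetical stand-in for ir.strategy.Package's configuration.
    embed_deps: bool = False
    deps_template: str = "Depends on: {deps}"

    @property
    def strategy_id(self) -> str:
        # Every config field feeds the id, so toggling embed_deps yields a
        # different id, which is what would trigger a re-decomposition.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:12]


def to_spec(strategy: PackageStrategy) -> dict:
    # Hypothetical stand-in for strategy_to_spec.
    return {"type": "Package", **asdict(strategy)}


def from_spec(spec: dict) -> PackageStrategy:
    params = {key: value for key, value in spec.items() if key != "type"}
    return PackageStrategy(**params)


def test_off_by_default():
    assert PackageStrategy().embed_deps is False


def test_toggle_changes_strategy_id():
    assert PackageStrategy().strategy_id != PackageStrategy(embed_deps=True).strategy_id


def test_spec_round_trip():
    s = PackageStrategy(embed_deps=True, deps_template="deps: {deps}")
    assert from_spec(to_spec(s)) == s
```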

Part of #61. Closes #62. Follow-up: #68.

https://claude.ai/code/session_01D229oNHVN1drd1mdbQL5MV

Package(embed_deps=True) emits one extra Surface(kind='deps', granularity='field')
serializing the BARE dependency names (via ir.graph._dep_name), e.g.
'Depends on: sentence-transformers, networkx, meshed'. The deps bag is kept in its
own surface (separate from prose, so a rare library name is not diluted and the
BM25 leg picks up exact dep-token matches) and remains a filter field. Appended
last, so the description (0) and readme_chunk surface_index contract is preserved.
Default False (progressive disclosure); folds into the strategy id so toggling
re-decomposes incrementally; round-trips via strategy_to_spec. deps_template
overrides the serialization. Touches only strategy.py.

Measured on the 231-package corpus (all-MiniLM, hybrid) vs Package() baseline:
dep-revealing packages climb sharply (allude 83->28, xcosmo 103->40) while a true
non-dependent stays flat (au 22->23 — the FP guard holds), confirming real
dependency signal. Aggregate nDCG@20 +0.023 (embeddings); full payoff needs
deps-surface up-weighting at fusion (follow-up).

Part of #61. Closes #62.

Claude-Session: https://claude.ai/code/session_01D229oNHVN1drd1mdbQL5MV
thorwhalen merged commit 1768b2e into master on Jun 18, 2026
12 checks passed
thorwhalen deleted the claude/issue-62-deps-surface branch on June 18, 2026 at 15:23


Development

Linked issue closed by this pull request: Embed the dependency list as a first-class 'deps' surface in ir.strategy.Package (deps-as-text), keeping the filter field
