Add opt-in embedded deps surface to ir.strategy.Package (#62) #69
Merged
Package(embed_deps=True) emits one extra Surface(kind='deps', granularity='field') serializing the BARE dependency names (via ir.graph._dep_name), e.g. 'Depends on: sentence-transformers, networkx, meshed'. The deps bag is kept in its own surface (separate from prose, so a rare library name is not diluted and the BM25 leg picks up exact dep-token matches) and remains a filter field. Appended last, so the description (0) and readme_chunk surface_index contract is preserved. Default False (progressive disclosure); folds into the strategy id so toggling re-decomposes incrementally; round-trips via strategy_to_spec. deps_template overrides the serialization. Touches only strategy.py. Measured on the 231-package corpus (all-MiniLM, hybrid) vs Package() baseline: dep-revealing packages climb sharply (allude 83->28, xcosmo 103->40) while a true non-dependent stays flat (au 22->23 — the FP guard holds), confirming real dependency signal. Aggregate nDCG@20 +0.023 (embeddings); full payoff needs deps-surface up-weighting at fusion (follow-up). Part of #61. Closes #62. Claude-Session: https://claude.ai/code/session_01D229oNHVN1drd1mdbQL5MV
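The serialization described above can be sketched roughly as follows. This is a minimal illustration, not the code in ir/strategy.py: `bare_dep_name` is a hypothetical stand-in for ir.graph._dep_name, `deps_text` is a hypothetical stand-in for the default deps text helper, and the `{deps}` placeholder in the template is an assumption about how deps_template might be parameterized.

```python
import re

# Hypothetical stand-in for ir.graph._dep_name: keep only the leading
# distribution name of a requirement string, so extras, version specifiers
# and environment markers are dropped.
_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def bare_dep_name(requirement: str) -> str:
    """Return the lowercased bare name, or '' if none can be parsed."""
    match = _NAME_RE.match(requirement)
    return match.group(1).lower() if match else ""


def deps_text(requirements, template="Depends on: {deps}"):
    """Serialize bare dependency names; return None when there are no deps.

    The '{deps}' placeholder is an assumption made for this sketch.
    """
    names = []
    for req in requirements:
        name = bare_dep_name(req)
        if name and name not in names:  # de-duplicate, keep first-seen order
            names.append(name)
    if not names:
        return None  # empty deps: no surface would be emitted
    return template.format(deps=", ".join(names))


example = deps_text([
    "sentence-transformers>=2.2",
    "networkx[default]; python_version >= '3.8'",
    "Meshed",
    "networkx",
])
print(example)  # Depends on: sentence-transformers, networkx, meshed
```

Keeping only bare names means a version bump in a requirement does not change the serialized text, so the surface stays stable across releases that keep the same dependency set.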
What
Adds an opt-in embedded deps surface to ir.strategy.Package (issue #62, tracking #61).
- Package(embed_deps=True) emits one extra Surface(kind="deps", granularity="field") serializing the bare dependency names (version specifiers / extras / markers stripped via ir.graph._dep_name), e.g. "Depends on: sentence-transformers, networkx, meshed".
- The deps surface is appended last, so the description (0) / readme_chunk surface_index contract is unchanged.
- Default False (progressive disclosure; today's behavior). Folds into the strategy id, so toggling re-decomposes incrementally; round-trips via strategy_to_spec. deps_template overrides the serialization.
- Touches only ir/strategy.py (deps are already scanned by sources._scan_packages).

Per the #38 decision rule this is a single-shot retrieval-quality change → lands in ir.

Why
The 231-package run found the dependency list was the single most discriminative signal, yet Package stored deps as a filter field only and never embedded them, so the index couldn't see that a package depends on meshed / networkx (⇒ graphs) or sentence-transformers / ef (⇒ embeddings).

Measured (all-MiniLM, hybrid, k=20, vs Package() baseline)
Using the #66 harness + the private benchmark:
Per-package (the real story): dep-revealing packages climb sharply (allude 83→28, which depends on meshed; xcosmo 103→40, which depends on cosmograph) while a true non-dependent stays flat: au 22→23 (no graph dep, so deps correctly don't lift it; the FP guard holds). This confirms the lift is real dependency signal, not added text volume.

As an equal-weighted surface the lift mostly lands in the rank 20–50 band, so aggregate nDCG@20 moves modestly (+0.023 embeddings) and a few tail items jitter ±1–4 ranks. Getting allude / xcosmo into the top-20 needs deps-surface up-weighting at fusion → filed as #68. Whether to flip embed_deps on by default (and rebuild the live packages corpus) is gated on that experiment.

Tests
6 new tests/test_strategy.py cases (off-by-default, surface emitted + appended last, empty-deps → no surface, custom template, _default_deps_text strip/dedup/lowercase, spec round-trip). Full suite 420 passed; ruff E/W/F/B clean at LL88.

Part of #61. Closes #62. Follow-up: #68.
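For readers wondering why large per-package climbs reported under Measured barely move the aggregate, here is a small worked sketch of nDCG@k. It assumes binary relevance and a single relevant package per query, which may not match how the private benchmark grades relevance; `ndcg_at_k` is a hypothetical helper written for this illustration, not part of the #66 harness.

```python
import math


def ndcg_at_k(relevant_ranks, k):
    """Binary-relevance nDCG@k from the 1-based ranks of the relevant items.

    Assumption for this sketch: every relevant item has gain 1.
    """
    # Only relevant items ranked inside the cutoff contribute to DCG.
    dcg = sum(1.0 / math.log2(rank + 1) for rank in relevant_ranks if rank <= k)
    # Ideal ordering places all relevant items at the top of the list.
    ideal_hits = min(k, len(relevant_ranks))
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# A move like allude's 83 -> 28 stays outside the top 20, so a query whose
# only relevant package is that one scores the same before and after.
before = ndcg_at_k([83], k=20)
after = ndcg_at_k([28], k=20)
print(before, after)  # 0.0 0.0

# Only once the item crosses the cutoff does the metric respond.
print(ndcg_at_k([15], k=20) > 0.0)  # True
```

Under these assumptions, a rank improvement that lands in the 20–50 band is invisible to nDCG@20, which is consistent with the modest aggregate change alongside sharp per-package movement.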
https://claude.ai/code/session_01D229oNHVN1drd1mdbQL5MV