Context
Thread: #61.
ir.with_synopsis (the document-summary-index fix from ADR#43 / report 12) already exists in ir/synopsis.py and would write a richer LLM-generated topical surface for exactly the packages that failed. However, it is opt-in, was not used in this run, and when enabled wraps every artifact, paying LLM cost on the 170+ packages that already have good descriptions. There is no cheap, targeted "only synthesize where the description is thin" path. Single-shot indexing improvement ⇒ ir per #38.
Problem (with our FN evidence)
Nearly every false negative had an empty or thin [project].description, leaving ir only a weak README signal: chromadol (0.02), http_cosmo_prep (0.01), cosmo_data_prep/allude/unbox (no score), and the thin-description CORE packages imbed/meshed/linked (empty descriptions; they survive on README alone and are fragile).
Proposal
Add a conditional-synopsis path in ir/synopsis.py:
with_synopsis(strategy, when=predicate) # predicate: (Artifact | filter_fields) -> bool
plus a shipped thin_description_predicate (true when the description is empty or below N chars, or when has_readme is the only signal). _SynopsisStrategy.decompose (synopsis.py:203-217) already drops empty synopses; extend it to skip synthesis entirely when when(...) is False, so that cost is paid only on the ~dozen thin-description packages, not all 200. The default when='always' preserves current with_synopsis semantics (progressive disclosure). The LLM call stays lazy and opt-in via aix exactly as today, so import ir stays offline. The synopsis prompt should be topical and classification-oriented ("what domain is this package about"), complementing the ef instruction-conditioning issue.
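The gating described above can be sketched as follows. This is a minimal illustration and not the code in ir/synopsis.py: the dict-shaped artifact, the maybe_synthesize helper, and the 40-character threshold for N are all assumptions made for the example, and the "has_readme is the only signal" branch is omitted.

```python
from typing import Callable, Mapping, Optional

# Hypothetical stand-ins: the real Artifact type and with_synopsis signature
# live in ir/synopsis.py and may look different.
Artifact = Mapping[str, object]
Predicate = Callable[[Artifact], bool]
Synthesizer = Callable[[Artifact], str]

MIN_DESCRIPTION_CHARS = 40  # assumed value for "N"; the proposal leaves N open


def thin_description_predicate(
    artifact: Artifact, *, min_chars: int = MIN_DESCRIPTION_CHARS
) -> bool:
    """Return True when the description is empty or shorter than min_chars."""
    description = str(artifact.get("description") or "").strip()
    return len(description) < min_chars


def maybe_synthesize(
    artifact: Artifact,
    synthesize: Synthesizer,
    when: Optional[Predicate] = None,
) -> Optional[str]:
    """Return a synopsis, or None when synthesis is skipped or comes back empty."""
    if when is not None and not when(artifact):
        return None  # predicate is False: the synthesizer is never called
    synopsis = synthesize(artifact).strip()
    return synopsis or None  # empty synopses are dropped


# Placeholder packages, not taken from the benchmark corpus.
packages = [
    {"name": "pkg_a", "description": ""},
    {
        "name": "pkg_b",
        "description": "A well-described package for building and querying vector indexes.",
    },
]

calls = []


def fake_synthesize(artifact: Artifact) -> str:
    calls.append(artifact["name"])
    return f"synopsis for {artifact['name']}"


results = {
    p["name"]: maybe_synthesize(p, fake_synthesize, when=thin_description_predicate)
    for p in packages
}
```

Passing when=None in this sketch reproduces the unconditional behaviour, which plays the role of the proposed when='always' default.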
Experiment
Three corpora on all-MiniLM, identical except for strategy: (a) Package() baseline; (b) with_synopsis(Package()), unconditional; (c) with_synopsis(Package(), when=thin_description_predicate). Inject a deterministic test-double synthesizer for a hermetic CI run, and a real aix synopsis for the quality run. Focus the eval on the thin-description FN set {chromadol, http_cosmo_prep, cosmo_data_prep, allude, unbox, imbed, meshed, linked} as gold positives. Count LLM invocations as a cost metric.
The gold labels come from the private, access-controlled benchmark repo thorwhalen/ir-eval-data: package_relevance_labels.jsonl (full 231-package graded gold labeling), named_sets.json (per-theme distractors + hard_positives), and benchmark_analysis.json (frozen all-MiniLM-L6-v2 baseline: precision@K = recall@K ≈ 0.42 for both themes). Clone it with repo access; package names are not mirrored into this public repo.
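One way the hermetic CI arm could count synthesizer invocations is sketched below. The CountingSynthesizer class, the count_calls helper, and the toy corpus are hypothetical; in the real run the counting would happen inside whatever synthesizer is injected into with_synopsis.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Mapping, Optional, Sequence


@dataclass
class CountingSynthesizer:
    """Deterministic test double: no network access, records every invocation."""

    calls: List[str] = field(default_factory=list)

    def __call__(self, artifact: Mapping[str, str]) -> str:
        self.calls.append(artifact["name"])
        return f"Topical synopsis for {artifact['name']}"


def count_calls(
    corpus: Sequence[Mapping[str, str]],
    when: Optional[Callable[[Mapping[str, str]], bool]] = None,
) -> int:
    """Run the double over the corpus and return how many times it was invoked."""
    synthesizer = CountingSynthesizer()
    for artifact in corpus:
        if when is None or when(artifact):
            synthesizer(artifact)
    return len(synthesizer.calls)


def is_thin(artifact: Mapping[str, str]) -> bool:
    # Assumed threshold of 40 characters; the proposal leaves N open.
    return len(artifact["description"].strip()) < 40


# Toy corpus with placeholder names: 3 thin descriptions out of 25 packages.
corpus = [
    {"name": f"pkg_{i}", "description": "" if i < 3 else "x" * 80}
    for i in range(25)
]

unconditional_calls = count_calls(corpus)          # arm (b)
conditional_calls = count_calls(corpus, is_thin)   # arm (c)
call_ratio = conditional_calls / unconditional_calls
```

Because the double returns a string derived only from the package name, repeated CI runs produce identical corpora, and call_ratio can be compared directly against the cost threshold.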
Success metric
- Recall@10 and mean rank of the 8 thin-description FN packages: chromadol/http_cosmo_prep move from scores below 0.03 into the top 10; imbed/meshed/linked rank robustly.
- LLM-synthesis call count for (c) ≤ 15% of (b) (only thin packages synthesized) while (c) matches (b)'s recall on the FN set.
- No nDCG regression on the well-described packages.
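The two rank-based metrics above could be computed as in this sketch. The function names are illustrative, ranks are 1-based, and giving unretrieved gold items a rank of len(ranking) + 1 is an assumption rather than a convention stated in this issue.

```python
from typing import Optional, Sequence


def recall_at_k(ranking: Sequence[str], gold: Sequence[str], k: int = 10) -> float:
    """Fraction of gold items that appear in the top-k of the ranking."""
    gold_set = set(gold)
    if not gold_set:
        raise ValueError("gold set must not be empty")
    return len(gold_set & set(ranking[:k])) / len(gold_set)


def mean_rank(
    ranking: Sequence[str],
    gold: Sequence[str],
    missing_rank: Optional[int] = None,
) -> float:
    """Mean 1-based rank of the gold items.

    Gold items absent from the ranking receive missing_rank, which defaults
    to len(ranking) + 1 (an assumed penalty, not a fixed convention).
    """
    if not gold:
        raise ValueError("gold set must not be empty")
    position = {name: i for i, name in enumerate(ranking, start=1)}
    default = len(ranking) + 1 if missing_rank is None else missing_rank
    ranks = [position.get(name, default) for name in gold]
    return sum(ranks) / len(ranks)


# Toy example with placeholder names.
ranking = ["a", "b", "c", "d"]
gold = ["b", "d", "z"]  # "z" was never retrieved

toy_recall = recall_at_k(ranking, gold, k=2)  # only "b" is in the top 2
toy_mean_rank = mean_rank(ranking, gold)      # ranks 2, 4 and the penalty 5
```

Reporting mean rank alongside Recall@10 shows movement for packages that improve but still fall outside the top 10.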
Data
Full 231-package corpus, the thin-description FN subset as gold positives, and the verified labels to confirm no regression, all from the private, access-controlled benchmark repo thorwhalen/ir-eval-data: package_relevance_labels.jsonl (full 231-package graded gold labeling), named_sets.json (per-theme distractors + hard_positives), and benchmark_analysis.json (frozen all-MiniLM-L6-v2 baseline: precision@K = recall@K ≈ 0.42 for both themes). Clone it with repo access; package names are not mirrored into this public repo.
https://claude.ai/code/session_01D229oNHVN1drd1mdbQL5MV