
Conditional synopsis: auto-apply ir.with_synopsis only to thin-description packages to rescue empty-pyproject false negatives #63

Description

@thorwhalen

Context

Thread: #61. ir.with_synopsis (the document-summary-index fix from ADR#43 / report 12) already exists in ir/synopsis.py and would write a richer LLM topical surface for exactly the packages that failed. However, it is opt-in, was not used in this run, and when used it wraps every artifact, paying LLM cost on the 170+ packages that already have good descriptions. There is no cheap, targeted path that synthesizes only where the description is thin. This is a single-shot indexing improvement, so it belongs in ir per #38.

Problem (with our FN evidence)

Nearly every false negative had an empty or thin [project].description, leaving ir with only a weak README signal: chromadol (0.02), http_cosmo_prep (0.01), cosmo_data_prep/allude/unbox (no score), and the thin-description CORE packages imbed/meshed/linked (empty descriptions, surviving on README alone and therefore fragile).

Proposal

Add a conditional-synopsis path in ir/synopsis.py:

with_synopsis(strategy, when=predicate)  # predicate: (Artifact | filter_fields) -> bool

plus a shipped thin_description_predicate (true when the description is empty or shorter than N chars, or when has_readme is the only signal). _SynopsisStrategy.decompose (synopsis.py:203-217) already drops empty synopses; extend it to skip synthesis entirely when when(...) is False, so that cost is paid only on the ~dozen thin-description packages, not all 200. The default when='always' preserves current with_synopsis semantics (progressive disclosure). The LLM call stays lazy and opt-in via aix exactly as today, so import ir stays offline. The synopsis prompt should be topical and classification-oriented ("what domain is this package about"), complementing the ef instruction-conditioning issue.
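A minimal sketch of the gating logic described above, not the actual ir implementation. The Artifact dataclass, the maybe_synopsis helper, and the min_chars=40 default are all hypothetical stand-ins chosen for illustration; the real types in ir/synopsis.py and the real value of N may differ, and the "has_readme is the only signal" branch is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Artifact:
    """Hypothetical stand-in for whatever ir hands to a strategy."""
    name: str
    description: str = ""
    has_readme: bool = False


Predicate = Callable[[Artifact], bool]
Synthesizer = Callable[[Artifact], str]


def thin_description_predicate(artifact: Artifact, *, min_chars: int = 40) -> bool:
    """True when the description is empty or shorter than min_chars.

    min_chars plays the role of N in the proposal; 40 is an assumed default.
    """
    return len((artifact.description or "").strip()) < min_chars


def _resolve(when: Union[str, Predicate]) -> Predicate:
    """Accept the string 'always' (the proposed default) or a callable."""
    if when == "always":
        return lambda artifact: True
    if callable(when):
        return when
    raise ValueError(f"when must be 'always' or a callable, got {when!r}")


def maybe_synopsis(
    artifact: Artifact,
    synthesize: Synthesizer,
    when: Union[str, Predicate] = "always",
) -> Optional[str]:
    """Return a synopsis, or None when the gate is closed or the result is empty.

    The synthesizer (the costly LLM call in practice) is never invoked
    when the predicate returns False.
    """
    if not _resolve(when)(artifact):
        return None
    text = synthesize(artifact).strip()
    return text or None
```

With when left at 'always', every artifact is synthesized as today; passing thin_description_predicate restricts the synthesizer calls to artifacts whose description falls under the threshold.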

Experiment

Three corpora on all-MiniLM, identical except for strategy: (a) Package() baseline; (b) with_synopsis(Package()) unconditional; (c) with_synopsis(Package(), when=thin_description_predicate). Inject a deterministic test-double synthesizer for a hermetic CI run, and use a real aix synopsis for the quality run. Focus the eval on the thin-description FN set {chromadol, http_cosmo_prep, cosmo_data_prep, allude, unbox, imbed, meshed, linked} as gold positives. The gold labels come from the private, access-controlled benchmark repo thorwhalen/ir-eval-data (files listed under Data). Count LLM invocations as a cost metric.
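The hermetic CI run could use a test double along these lines. This is a sketch under assumptions: CountingSynthesizer, build_corpus, is_thin, and the toy package names are hypothetical and stand in for the real ir strategies and the private corpus. It shows only how conditions (a), (b), and (c) differ in synthesizer call count.

```python
import hashlib
from typing import Callable, Dict, Optional


class CountingSynthesizer:
    """Deterministic stand-in for the LLM synopsis call; counts invocations."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, name: str, text: str) -> str:
        self.calls += 1
        # Hash the inputs so the output is stable across runs and machines.
        digest = hashlib.sha256(f"{name}\n{text}".encode("utf-8")).hexdigest()[:8]
        return f"{name} synopsis {digest}"


def is_thin(description: str, min_chars: int = 40) -> bool:
    """Assumed thinness rule: empty or shorter than min_chars."""
    return len(description.strip()) < min_chars


def build_corpus(
    packages: Dict[str, str],
    synthesizer: Optional[Callable[[str, str], str]] = None,
    when: Callable[[str], bool] = lambda description: True,
) -> Dict[str, str]:
    """Map package name -> text to index, adding a synopsis where the gate allows."""
    corpus = {}
    for name, description in packages.items():
        text = description
        if synthesizer is not None and when(description):
            text = f"{description}\n{synthesizer(name, description)}".strip()
        corpus[name] = text
    return corpus


# Toy data only; the real package names live in the private benchmark repo.
TOY = {
    "pkg_rich_1": "A well described package for parsing configuration files.",
    "pkg_rich_2": "Utilities for plotting time series with sensible defaults.",
    "pkg_thin": "",
}

corpus_a = build_corpus(TOY)                      # (a) baseline
unconditional = CountingSynthesizer()
corpus_b = build_corpus(TOY, unconditional)       # (b) synopsis everywhere
conditional = CountingSynthesizer()
corpus_c = build_corpus(TOY, conditional, when=is_thin)  # (c) thin only

print(unconditional.calls, conditional.calls)  # prints: 3 1
```

Because the double is a pure function of its inputs, (b) and (c) produce identical text for the thin package, so any ranking difference between them in CI comes from the well-described packages alone.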

Success metric

  • Recall@10 and mean rank of the 8 thin-description FN packages: chromadol/http_cosmo_prep move from scores below 0.03 into the top 10; imbed/meshed/linked rank robustly.
  • LLM-synthesis call count for (c) ≤ 15% of (b) (only thin packages synthesized) while (c) matches (b)'s recall on the FN set.
  • No nDCG regression on the well-described packages.
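The first two criteria above can be computed with a few small helpers. This is an illustrative sketch: the function names and the convention of assigning unretrieved items a rank one past the end of the list are assumptions, not something fixed by this issue.

```python
from typing import Sequence, Set


def recall_at_k(ranking: Sequence[str], gold: Set[str], k: int = 10) -> float:
    """Fraction of gold items that appear in the top k of the ranking."""
    if not gold:
        raise ValueError("gold set must not be empty")
    return len(set(ranking[:k]) & gold) / len(gold)


def mean_rank(ranking: Sequence[str], gold: Set[str]) -> float:
    """Mean 1-based rank of the gold items.

    Gold items missing from the ranking get rank len(ranking) + 1
    (an assumed penalty convention).
    """
    if not gold:
        raise ValueError("gold set must not be empty")
    position = {name: i + 1 for i, name in enumerate(ranking)}
    missing = len(ranking) + 1
    return sum(position.get(name, missing) for name in gold) / len(gold)


def cost_within_budget(calls_c: int, calls_b: int, max_ratio: float = 0.15) -> bool:
    """True when conditional calls (c) are at most max_ratio of unconditional calls (b)."""
    return calls_b > 0 and calls_c <= max_ratio * calls_b
```

For example, with ranking ["a", "b", "c", "d"] and gold {"b", "d"}, recall_at_k at k=2 is 0.5 and mean_rank is 3.0.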

Data

Full 231-package corpus, the thin-description FN subset as gold positives, and the verified labels to confirm no regression, all from the private, access-controlled benchmark repo thorwhalen/ir-eval-data: package_relevance_labels.jsonl (full 231-package graded gold labeling), named_sets.json (per-theme distractors + hard_positives), and benchmark_analysis.json (frozen all-MiniLM-L6-v2 baseline: precision@K = recall@K ≈ 0.42 for both themes). Clone with repo access; package names are not mirrored into this public repo.

https://claude.ai/code/session_01D229oNHVN1drd1mdbQL5MV

Labels: enhancement (New feature or request)
