with_synopsis — LLM-derived synopsis surfaces as collapsed-tree routing fuel (#48) #55

Merged: thorwhalen merged 2 commits into master from feature/synopsis-surfaces on Jun 13, 2026

@thorwhalen

Closes #48.

Adds ir.with_synopsis(strategy, *, synthesize=None): a strategy wrapper that adds one LLM-derived synopsis surface per artifact at build time — the document-summary-index / collapsed-tree fuel (report 12, ADR #43). A query matching the synopsis routes (via ir.traverse + collapsed_tree_policy) down to that artifact's chunks; build-time cost, ≈free at query time, fully incremental.

What landed

  • ir/synopsis.py: with_synopsis (prepends the synopsis so that it is the first summary surface, which makes it the router; an empty synopsis is dropped), make_llm_synthesizer (Artifact -> str; mirrors make_llm_formulator/make_llm_selector: injectable summarize double, lazy oa import on first synthesis so that import ir stays offline, errors/empty → "" and never a fabricated summary), and the Synthesizer type.
  • Staleness via the existing ledger — the wrapper exposes scalar synthesizer_id + synopsis_kind and holds the inner strategy; index._strategy_id now recurses into nested strategies, so a model/prompt change OR an inner-strategy-param change re-synthesizes exactly the affected artifacts. No new bookkeeping.
  • strategy._text_of → public text_of (now a cross-module SSOT for artifact-text extraction; _text_of kept as an alias).
  • Exports + README note.
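
To make the prepend-and-drop behaviour concrete, here is a minimal sketch of a wrapper of this shape. It is not the code in ir/synopsis.py: `Surface`, `ChunkedSketch`, `SynopsisWrapperSketch`, and `fake_synthesizer` are hypothetical stand-ins, and the dict-shaped artifact is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Surface:
    """Hypothetical stand-in for an indexed surface (kind + text)."""
    kind: str
    text: str


class ChunkedSketch:
    """Hypothetical inner strategy: one 'chunk' surface per paragraph."""

    def decompose(self, artifact: dict) -> list[Surface]:
        paragraphs = artifact["text"].split("\n\n")
        return [Surface("chunk", p.strip()) for p in paragraphs if p.strip()]


class SynopsisWrapperSketch:
    """Wraps an inner strategy and prepends one synopsis surface."""

    def __init__(self, inner, synthesize: Callable[[dict], str],
                 synopsis_kind: str = "synopsis"):
        self.inner = inner
        self.synthesize = synthesize
        self.synopsis_kind = synopsis_kind

    def decompose(self, artifact: dict) -> list[Surface]:
        surfaces = list(self.inner.decompose(artifact))
        synopsis = (self.synthesize(artifact) or "").strip()
        if not synopsis:
            # Empty synopsis: drop it rather than index a blank surface.
            return surfaces
        # Prepend so the synopsis is the first surface seen by the router.
        return [Surface(self.synopsis_kind, synopsis), *surfaces]


def fake_synthesizer(artifact: dict) -> str:
    """Injected double: no model call, so the example stays hermetic."""
    return "About " + artifact["title"]


doc = {"title": "caching", "text": "First paragraph.\n\nSecond paragraph."}
with_summary = SynopsisWrapperSketch(ChunkedSketch(), fake_synthesizer).decompose(doc)
without_summary = SynopsisWrapperSketch(ChunkedSketch(), lambda a: "").decompose(doc)
```

In this sketch the synopsis lands at index 0 when the synthesizer returns text, and the surface list is unchanged when it returns an empty string.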

Design decisions (posted on #48 before coding)

  • Synopsis is prepended → router precedence over a terse description.
  • Routing is surface-level (within-artifact records_for_artifact descent), so no links-view edges are written — that view is for cross-artifact edges (different grain). Refines the issue body's wording.
  • Caveat documented: with_synopsis + edge_extractor= re-runs synthesis every build (eager edge ingest calls decompose for all artifacts). The common path (no edge_extractor) stays fully incremental.
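
The surface-level routing decision can be illustrated with a toy index. This is only a sketch under stated assumptions: `Record`, `RECORDS`, and `route` are hypothetical, word overlap stands in for the real embedding search, and the signature given to `records_for_artifact` here is a guess rather than the library's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical indexed record: which artifact, which surface kind, what text."""
    artifact_id: str
    kind: str
    text: str


RECORDS = [
    Record("gold", "synopsis", "how to rotate api keys safely"),
    Record("gold", "chunk", "run the rotate command, then revoke the old key"),
    Record("trap", "synopsis", "release notes for the billing dashboard"),
    Record("trap", "chunk", "we also rotate api keys in this release"),
]


def records_for_artifact(records, artifact_id, kind="chunk"):
    """Within-artifact descent: no cross-artifact edges are consulted."""
    return [r for r in records if r.artifact_id == artifact_id and r.kind == kind]


def route(records, query, min_overlap=2):
    """Match the query against synopsis surfaces only, then descend to chunks."""
    terms = set(query.lower().split())
    routed = []
    for record in records:
        if record.kind != "synopsis":
            continue
        if len(terms & set(record.text.lower().split())) >= min_overlap:
            routed.extend(records_for_artifact(records, record.artifact_id))
    return routed


hits = route(RECORDS, "rotate api keys")
```

The trap artifact's chunk contains every query term, yet it is never returned, because its synopsis did not match and routing happens at the synopsis level.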

Acceptance criteria

  • build(strategy=with_synopsis(Chunked())) with an injected fake synthesizer indexes a synopsis surface per artifact, hermetically.
  • Search restricted to synopsis surfaces + traverse routes to the right chunks end-to-end (gold routed, trap whose synopsis didn't match excluded though its chunk matches the query).
  • Incremental rebuild re-synthesizes only changed artifacts; synthesizer identity stamped + staleness-checked (and inner-strategy change too).
  • Offline import preserved; no oa at import time.
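
The incremental criterion can be pictured with a toy ledger. The sketch below assumes a ledger that stores a (content hash, strategy id) stamp per artifact; `build_sketch`, `fake_synthesizer`, and the plain-dict ledger and store are hypothetical and are not the project's actual ledger format.

```python
import hashlib


def build_sketch(artifacts, synthesize, strategy_id, ledger, store):
    """Re-synthesize only artifacts whose stamp differs from the ledger."""
    resynthesized = []
    for artifact_id, text in artifacts.items():
        content_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        stamp = (content_hash, strategy_id)
        if ledger.get(artifact_id) == stamp:
            continue  # unchanged content and unchanged strategy: skip the model call
        store[artifact_id] = synthesize(text)
        ledger[artifact_id] = stamp
        resynthesized.append(artifact_id)
    return resynthesized


def fake_synthesizer(text):
    """Injected double standing in for the LLM summarizer."""
    return text[:20]


ledger, store = {}, {}
docs = {"a": "alpha document", "b": "beta document"}
first = build_sketch(docs, fake_synthesizer, "synth-v1", ledger, store)
second = build_sketch(docs, fake_synthesizer, "synth-v1", ledger, store)
docs["b"] = "beta document, revised"
third = build_sketch(docs, fake_synthesizer, "synth-v1", ledger, store)
fourth = build_sketch(docs, fake_synthesizer, "synth-v2", ledger, store)
```

Here the second build does no work, the third touches only the edited artifact, and changing the synthesizer identity in the fourth invalidates everything.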

10 hermetic tests (light embedder + memory store + injected synthesizers); 358 total green, lint + format clean.

… surfaces (#48)

The document-summary-index / collapsed-tree fuel (report 12, ADR #43): run a
summarizer over each artifact at build time, index it as a 'synopsis' surface,
and let the collapsed-tree policy (#47) route a synopsis match to the artifact's
chunks. Build-time cost, ~free at query time, incremental.

ir/synopsis.py:
- with_synopsis(strategy, *, synthesize=None, synthesizer_id=None) wraps any
  IndexingStrategy and PREPENDS one synopsis surface (so it is the first summary
  surface -> the router). Empty synopsis is dropped.
- make_llm_synthesizer (Artifact -> str) mirrors make_llm_formulator/selector:
  injectable summarize double, lazy oa on first synthesis (import ir stays
  offline), errors/empty -> '' (never a fabricated summary).
- Synthesizer type exported.
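
A factory with these properties (injectable double, backend loaded lazily on first use, failures mapped to an empty string) might look roughly like the sketch below. `make_synthesizer_sketch` and its `load_backend` parameter are hypothetical; the real make_llm_synthesizer loads oa, whose API is not reproduced here.

```python
from typing import Callable, Optional


def make_synthesizer_sketch(
    summarize: Optional[Callable[[str], str]] = None,
    *,
    load_backend: Optional[Callable[[], Callable[[str], str]]] = None,
):
    """Return an artifact -> str callable; the backend is loaded on first call only."""
    state = {"summarize": summarize}

    def synthesize(artifact: dict) -> str:
        if state["summarize"] is None:
            if load_backend is None:
                return ""
            try:
                state["summarize"] = load_backend()  # lazy: nothing loaded at import
            except Exception:
                return ""
        try:
            result = state["summarize"](artifact.get("text", ""))
        except Exception:
            return ""  # an error never turns into a made-up summary
        return result.strip() if isinstance(result, str) else ""

    return synthesize


loads = []


def load_fake_backend():
    """Hypothetical loader that records when it was actually invoked."""
    loads.append("loaded")
    return lambda text: "summary: " + text.split(".")[0]


lazy = make_synthesizer_sketch(load_backend=load_fake_backend)
loads_before_first_call = len(loads)
summary = lazy({"text": "Caches expire hourly. More detail follows."})


def broken(text):
    raise RuntimeError("model unavailable")


failed = make_synthesizer_sketch(broken)({"text": "anything"})
```

Constructing the synthesizer does not touch the backend; only the first synthesis does, and a failing summarizer yields an empty string that the wrapper can then drop.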

Staleness reuses the ledger mechanism: the wrapper exposes scalar synthesizer_id
+ synopsis_kind and holds the inner strategy; index._strategy_id now RECURSES
into nested strategies, so a model/prompt change OR an inner-strategy-param
change re-synthesizes exactly the affected artifacts. No new bookkeeping.
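
One way to picture an id that recurses into nested strategies is the sketch below. It is an assumption-laden illustration, not index._strategy_id: the hashing scheme, the rule that only public scalar attributes count, and the "has a decompose method" test for nested strategies are all hypothetical choices.

```python
import hashlib
import json

_SCALARS = (str, int, float, bool, type(None))


def describe(strategy) -> dict:
    """Collect public scalar params, recursing into anything strategy-like."""
    params = {}
    for name, value in sorted(vars(strategy).items()):
        if name.startswith("_"):
            continue
        if isinstance(value, _SCALARS):
            params[name] = value
        elif hasattr(value, "decompose"):
            params[name] = describe(value)  # nested strategy: recurse
    return {"type": type(strategy).__name__, "params": params}


def strategy_id_sketch(strategy) -> str:
    payload = json.dumps(describe(strategy), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


class ChunkedSketch:
    """Hypothetical inner strategy with one scalar parameter."""

    def __init__(self, max_chars=500):
        self.max_chars = max_chars

    def decompose(self, artifact):
        return []


class SynopsisWrapperSketch:
    """Hypothetical wrapper exposing scalar ids and holding the inner strategy."""

    def __init__(self, inner, synthesizer_id, synopsis_kind="synopsis"):
        self.inner = inner
        self.synthesizer_id = synthesizer_id
        self.synopsis_kind = synopsis_kind

    def decompose(self, artifact):
        return self.inner.decompose(artifact)


base = strategy_id_sketch(SynopsisWrapperSketch(ChunkedSketch(500), "model-a/prompt-1"))
same = strategy_id_sketch(SynopsisWrapperSketch(ChunkedSketch(500), "model-a/prompt-1"))
new_model = strategy_id_sketch(SynopsisWrapperSketch(ChunkedSketch(500), "model-b/prompt-1"))
new_inner = strategy_id_sketch(SynopsisWrapperSketch(ChunkedSketch(200), "model-a/prompt-1"))
```

Both a synthesizer change and an inner-parameter change produce a new id in this sketch, which is what lets a ledger keyed on the id notice either kind of change without extra bookkeeping.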

Routing needs no edges: collapsed-tree descends synopsis->chunks within an
artifact via records_for_artifact (surface grain), distinct from the #46 links
view (cross-artifact edges). Promote strategy._text_of -> public text_of (now a
cross-module SSOT; _text_of kept as alias). 10 hermetic tests, 358 total.
…esizer (review)

Adversarial review of PR #55 confirmed two should-fixes (both contradicted
docstrings shipped in this PR) and four test-hardening nits.

should-fix 1 — text_key alignment: the default synthesizer extracted text via
text_of(raw) with NO text_key, so with_synopsis(Chunked(text_key='body')) +
the default synthesizer summarized a different field than the strategy indexed
(silent wrong-field synopsis). Thread the inner strategy's text_key into the
default: make_llm_synthesizer gains text_key=; _SynopsisStrategy passes
getattr(strategy, 'text_key', None). Restores the SSOT-alignment promise.
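
The wrong-field failure and its fix can be reproduced in miniature. In the sketch below, `text_of_sketch`, `make_synthesizer_sketch`, `default_synthesizer_for`, and the fallback to a "text" field are assumptions for illustration; only the `getattr(strategy, 'text_key', None)` threading is taken from the commit message.

```python
def text_of_sketch(raw, text_key=None):
    """Hypothetical text extraction: explicit key wins, else a default field."""
    if isinstance(raw, str):
        return raw
    if text_key is not None:
        return str(raw.get(text_key, ""))
    return str(raw.get("text", ""))  # assumed default field


def make_synthesizer_sketch(summarize, *, text_key=None):
    """Build an artifact -> str synthesizer that reads the given field."""
    def synthesize(raw):
        return summarize(text_of_sketch(raw, text_key))
    return synthesize


class ChunkedSketch:
    """Hypothetical inner strategy that indexes the field named by text_key."""

    def __init__(self, text_key=None):
        self.text_key = text_key


def default_synthesizer_for(strategy, summarize):
    # Thread the inner strategy's text_key so both read the same field.
    return make_synthesizer_sketch(
        summarize, text_key=getattr(strategy, "text_key", None)
    )


seen = []


def recording_summarize(text):
    seen.append(text)
    return text.upper()


raw = {"text": "navigation sidebar", "body": "the actual article"}
aligned = default_synthesizer_for(ChunkedSketch(text_key="body"), recording_summarize)(raw)
unaligned = make_synthesizer_sketch(recording_summarize)(raw)
```

Without the threaded key, the summarizer sees the default field while the strategy indexes `body`; with it, both operate on the same text.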

should-fix 2 — non-identifiable synthesizer: an unnamed lambda / local closure
falls through to a '<lambda>'/'<locals>' qualname that distinct callables share,
so swapping one for another left strategy_id unchanged -> silent staleness,
contradicting 'no silent staleness'. Now warn (UserWarning) and use a 'custom'
sentinel, surfacing the lost guarantee at construction; named functions and
explicit/stamped ids still track.
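
A rough sketch of the identification rule follows. The precedence order (explicit id, then a stamped attribute, then the qualified name) and the attribute name `synthesizer_id` on the callable are assumptions; the '<lambda>'/'<locals>' check, the UserWarning, and the 'custom' sentinel follow the description above.

```python
import warnings


def synthesizer_id_sketch(synthesize, explicit_id=None):
    """Derive a stable id for a synthesizer callable, warning when that is impossible."""
    if explicit_id is not None:
        return explicit_id
    stamped = getattr(synthesize, "synthesizer_id", None)
    if stamped:
        return stamped
    qualname = getattr(synthesize, "__qualname__", "")
    if not qualname or "<lambda>" in qualname or "<locals>" in qualname:
        # Distinct lambdas and closures share these names, so swapping one
        # for another would not change the id: say so at construction time.
        warnings.warn(
            "synthesizer is not identifiable; pass an explicit id to keep "
            "staleness tracking",
            UserWarning,
            stacklevel=2,
        )
        return "custom"
    return f"{synthesize.__module__}.{qualname}"


def named_summarizer(artifact):
    return "summary"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    named_id = synthesizer_id_sketch(named_summarizer)
    lambda_id = synthesizer_id_sketch(lambda artifact: "summary")
    explicit_id = synthesizer_id_sketch(
        lambda artifact: "summary", explicit_id="model-x/prompt-3"
    )
```

A module-level function gets an id derived from its module and name, an anonymous callable gets the sentinel plus a warning, and an explicit id bypasses both paths.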

Tests (+7): default-synth text_key threading; lambda-warns + named-tracks;
with_synopsis(Package()) synopsis-over-description router precedence;
mutation-resistant offline (poison oa); default-id content-stability;
non-str synthesize at decompose; file-backed incremental round-trip. 365 total.
@thorwhalen thorwhalen merged commit 746d859 into master Jun 13, 2026
12 checks passed
@thorwhalen thorwhalen deleted the feature/synopsis-surfaces branch June 13, 2026 14:40

Development

Successfully merging this pull request may close these issues.

Synopsis surfaces: LLM-derived summaries as indexed surfaces (document-summary routing fuel)
