Synopsis surfaces: LLM-derived summaries as indexed surfaces (document-summary routing fuel) #48

Description

@thorwhalen

Problem

The motivating case for linked retrieval (ADR #43) is synopsis routing: run an LLM over each document to produce a synopsis, index the synopses, and route from a synopsis match down to the document's chunks. ir's data model already reserves the slot: "synopsis" is a documented surface kind, and the strategy docstrings name "AI synopsis / problem-class surfaces" as the extension point. No strategy produces them yet. This is report 12's document-summary-index pattern (LLM cost paid at build time, ≈free at query time) and the natural fuel for the collapsed-tree policy (#47).
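To make the routing idea concrete, here is a toy sketch, not ir's actual API. The `Surface` dataclass, the `overlap` scorer (a word-overlap stand-in for embedding similarity), and `route` are all hypothetical names invented for illustration; only the "synopsis" surface kind comes from the text above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Surface:
    """Hypothetical stand-in for an indexed surface."""
    artifact_id: str
    kind: str  # "synopsis" or "chunk" ("chunk" is an assumed kind name)
    text: str


def overlap(query: str, text: str) -> int:
    """Toy relevance score: count of shared lowercase words (stands in for embeddings)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def route(query: str, surfaces: List[Surface]) -> List[Surface]:
    """Match the query against synopsis surfaces only, then return the winner's chunks."""
    synopses = [s for s in surfaces if s.kind == "synopsis"]
    best = max(synopses, key=lambda s: overlap(query, s.text))
    return [
        s for s in surfaces
        if s.kind == "chunk" and s.artifact_id == best.artifact_id
    ]


surfaces = [
    Surface("a", "synopsis", "guide to tuning database indexes"),
    Surface("a", "chunk", "B-trees keep keys sorted"),
    Surface("a", "chunk", "Hash lookups avoid scans"),
    Surface("b", "synopsis", "recipes for sourdough bread"),
    Surface("b", "chunk", "Feed the starter daily"),
]
hits = route("database tuning", surfaces)
```

Note that neither chunk of artifact "a" contains the query words; the synopsis is what makes them reachable, which is the point of the pattern.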

Scope

  • A synopsis-producing strategy decorator/wrapper (e.g. with_synopsis(strategy, *, synthesize=None)) that wraps any IndexingStrategy and adds one synopsis surface per artifact. synthesize: Callable[[Artifact], str] is injectable; when omitted, a default is built lazily on oa (the make_llm_* idiom), so import ir stays offline and tests inject doubles.
  • Build-time cost is the regime: per report 12, summary indexing costs ≈30 LLM calls/doc and happens once. The ledger's incremental rebuild means only new or changed artifacts get synthesized.
  • Synopsis text rides the normal embed/index path (it's just a surface), so surfaces={"synopsis"} search works immediately. PARENT/CHILD edges to the artifact's chunks land in the links view at build time (#46, "links: typed-edge view on CorpusStore + GraphStore protocol (semantic link graph)").
  • Persistence/staleness: synopses are derived state. Stamp the synthesizer identity (model/prompt hash) the way embedder_id/strategy_id are stamped, so a prompt change triggers re-synthesis rather than silent staleness.
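
A minimal sketch of what the wrapper in the first bullet could look like, under stated assumptions. `Artifact`, `Surface`, the `surfaces(artifact)` method on the strategy, the toy `Chunked`, and `_make_llm_synthesizer` are hypothetical shapes invented here, not ir's real types; only the `with_synopsis(strategy, *, synthesize=None)` signature and the `Callable[[Artifact], str]` type come from this issue. The default synthesizer is left as a placeholder so that no oa call is invented.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Protocol


@dataclass(frozen=True)
class Artifact:  # hypothetical shape
    id: str
    text: str


@dataclass(frozen=True)
class Surface:  # hypothetical shape
    artifact_id: str
    kind: str
    text: str


class IndexingStrategy(Protocol):  # assumed protocol: one method yielding surfaces
    def surfaces(self, artifact: Artifact) -> Iterable[Surface]: ...


Synthesizer = Callable[[Artifact], str]


class Chunked:
    """Toy base strategy: one chunk surface per non-empty line."""

    def surfaces(self, artifact: Artifact) -> Iterator[Surface]:
        for line in artifact.text.splitlines():
            if line.strip():
                yield Surface(artifact.id, "chunk", line.strip())


def _make_llm_synthesizer() -> Synthesizer:
    """Placeholder for the lazy default; an oa import would live here, not at module level."""
    raise NotImplementedError("placeholder: build the oa-backed synthesizer here")


class _WithSynopsis:
    def __init__(self, inner: IndexingStrategy, synthesize: Optional[Synthesizer]):
        self._inner = inner
        self._synthesize = synthesize

    def surfaces(self, artifact: Artifact) -> Iterator[Surface]:
        yield from self._inner.surfaces(artifact)
        if self._synthesize is None:
            # Resolved on first use only, so constructing the wrapper stays offline.
            self._synthesize = _make_llm_synthesizer()
        yield Surface(artifact.id, "synopsis", self._synthesize(artifact))


def with_synopsis(
    strategy: IndexingStrategy, *, synthesize: Optional[Synthesizer] = None
) -> IndexingStrategy:
    """Wrap a strategy so it also emits one synopsis surface per artifact."""
    return _WithSynopsis(strategy, synthesize)


fake = lambda artifact: f"summary of {artifact.id}"  # injected test double
out = list(
    with_synopsis(Chunked(), synthesize=fake).surfaces(Artifact("d1", "alpha\n\nbeta"))
)
```

Deferring the default to first use is one way to keep construction and import free of any LLM dependency; resolving it in the constructor would work equally well as long as the import stays inside the factory.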

Acceptance criteria

  • build(source, strategy=with_synopsis(Chunked())) with an injected fake synthesizer indexes a synopsis surface per artifact, hermetically.
  • Search restricted to synopsis surfaces, followed by traverse (#47, traverse(query, store, policy): pluggable graph traversal with operator-enforced safety), routes to the right chunks end-to-end.
  • Incremental rebuild re-synthesizes only changed artifacts; synthesizer identity is stamped and staleness-checked.
  • Offline import preserved; no oa at import time.
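
The incremental-rebuild and staleness criteria above could be exercised with something like the following sketch. The ledger here is a plain dict and `synthesizer_id`, `content_hash`, and `rebuild` are hypothetical helpers, not ir's ledger API. The idea illustrated is that a synopsis is reused only when both the artifact content hash and the synthesizer identity (model plus prompt) are unchanged.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

# Hypothetical ledger entry: (content hash, synthesizer id, synopsis text)
Ledger = Dict[str, Tuple[str, str, str]]


def synthesizer_id(model: str, prompt: str) -> str:
    """Short, stable identity for a (model, prompt) pair."""
    return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()[:12]


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def rebuild(
    docs: Dict[str, str],
    ledger: Ledger,
    synthesize: Callable[[str], str],
    synth_id: str,
) -> List[str]:
    """Re-synthesize only artifacts whose content or synthesizer stamp changed."""
    resynthesized = []
    for doc_id, text in docs.items():
        stamp = (content_hash(text), synth_id)
        entry = ledger.get(doc_id)
        if entry is not None and entry[:2] == stamp:
            continue  # fresh: same content, same synthesizer identity
        ledger[doc_id] = (*stamp, synthesize(text))
        resynthesized.append(doc_id)
    return resynthesized


docs = {"a": "first document", "b": "second document"}
ledger: Ledger = {}
fake = lambda text: text.upper()  # injected double, no LLM involved
v1 = synthesizer_id("model-x", "Summarize:")
v2 = synthesizer_id("model-x", "Summarize briefly:")

first = rebuild(docs, ledger, fake, v1)    # everything is new
second = rebuild(docs, ledger, fake, v1)   # nothing changed
docs["b"] = "second document, edited"
third = rebuild(docs, ledger, fake, v1)    # only the edited artifact
fourth = rebuild(docs, ledger, fake, v2)   # prompt change invalidates all
```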

Size: M. Depends on #46 (edges) and feeds #47 (routing policy). Capability 1 (ir). Refs ADR #43, report 12.

Metadata

    Labels

    enhancement: New feature or request
