
traverse(query, store, policy) — pluggable graph traversal with operator-enforced safety (#47) #54

Merged: thorwhalen merged 3 commits into master from feature/traverse-operator on Jun 13, 2026


@thorwhalen (Member)

Closes #47. This PR adds the query-time traversal operator (ADR discussion #43, report 12), built on the GraphStore substrate from #46. The design decisions are recorded in the issue comment. It is the second of the two PRs in the linked-retrieval arc.

What

  • ir/traverse.py (new): traverse(query, store, *, policy, max_depth, node_budget, k) -> list[SearchHit]. The operator owns both the loop (score frontier → select → commit → expand) and the safety primitives, which it enforces regardless of policy: a visited-set (keyed on policy.node_id), a depth cap, and a node budget, all held in WalkState. A cyclic graph or a never-stopping policy therefore still terminates.
  • WalkPolicy protocol: report 12's seed/score/select/expand/stop, plus node_id for the visited-set and to_hit for materialization (to_hit returning None marks a router-only node). The operator passes store to the policy verbatim and never interprets it: it is a Corpus for collapsed-tree and a CorpusGraph for artifact-link policies.
  • collapsed_tree_policy: the first shipped policy, pure-vector (no LLM in the loop). It seeds on summary surfaces (routers, not emitted) and descends to the artifact's leaf chunks (the results), scoring by cosine. A query that matches an artifact's summary is routed to that artifact's best chunk.
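The loop and safety primitives described above can be sketched roughly as follows. This is a minimal sketch, not the code in ir/traverse.py: the SearchHit stand-in, the WalkPolicy method signatures, the WalkState fields, and the toy CyclePolicy / FreshIdPolicy classes are all assumptions. Only the method names, the loop order, and the three operator-owned safety checks come from the PR.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any, Hashable, Iterable, Optional, Protocol


@dataclass
class SearchHit:
    # Hypothetical stand-in for the package's SearchHit type.
    id: str
    score: float
    metadata: dict = field(default_factory=dict)


class WalkPolicy(Protocol):
    # Method names follow the PR; these signatures are assumptions.
    def seed(self, query: Any, store: Any) -> Iterable[Any]: ...
    def score(self, query: Any, node: Any, store: Any) -> float: ...
    def select(self, scored: list) -> list: ...
    def expand(self, node: Any, store: Any) -> Iterable[Any]: ...
    def stop(self, state: WalkState) -> bool: ...
    def node_id(self, node: Any) -> Hashable: ...
    def to_hit(self, node: Any, score: float) -> Optional[SearchHit]: ...


@dataclass
class WalkState:
    # Safety primitives owned by the operator, never by the policy.
    max_depth: int
    node_budget: int
    visited: set = field(default_factory=set)
    depth: int = 0
    hits: list = field(default_factory=list)

    def can_continue(self) -> bool:
        return self.depth <= self.max_depth and len(self.visited) < self.node_budget


def traverse(query, store, *, policy: WalkPolicy, max_depth=3, node_budget=100, k=10):
    state = WalkState(max_depth=max_depth, node_budget=node_budget)
    # Each frontier entry remembers the seed it descends from (provenance).
    frontier = [(n, policy.node_id(n)) for n in policy.seed(query, store)]
    while frontier and state.can_continue() and not policy.stop(state):
        # score: only nodes not already on the visited-set are scored
        scored = [(policy.score(query, n, store), n, seed)
                  for n, seed in frontier if policy.node_id(n) not in state.visited]
        next_frontier = []
        for s, n, seed in policy.select(scored):  # select
            nid = policy.node_id(n)
            if nid in state.visited or len(state.visited) >= state.node_budget:
                continue  # dedup within a layer; never commit past the budget
            state.visited.add(nid)  # commit
            hit = policy.to_hit(n, s)
            if hit is not None:  # None marks a router-only node
                hit.metadata.update(walk_depth=state.depth, seed=seed)
                state.hits.append(hit)
            next_frontier.extend((c, seed) for c in policy.expand(n, store))  # expand
        frontier = next_frontier
        state.depth += 1
    return sorted(state.hits, key=lambda h: h.score, reverse=True)[:k]


class CyclePolicy:
    """Toy policy over a fixed edge dict that never asks to stop."""

    def __init__(self, edges):
        self.edges = edges

    def seed(self, query, store): return [query]
    def score(self, query, node, store): return 1.0
    def select(self, scored): return scored
    def expand(self, node, store): return self.edges.get(node, [])
    def stop(self, state): return False
    def node_id(self, node): return node
    def to_hit(self, node, score): return SearchHit(id=node, score=score)


class FreshIdPolicy(CyclePolicy):
    """Adversarial: every expansion mints a new id, so the visited-set can't dedup."""

    def expand(self, node, store): return [node + "'"]


hits = traverse("a", None, policy=CyclePolicy({"a": ["b"], "b": ["c"], "c": ["a"]}))
runaway = traverse("x", None, policy=FreshIdPolicy({}), max_depth=10_000, node_budget=5)
print(json.dumps([h.metadata for h in hits]))  # provenance stays JSON-clean
```

In this sketch the cycle a → b → c → a ends because the visited-set refuses to re-score a, and the fresh-id walk ends on the node budget even though its depth cap is effectively unbounded. The budget is checked both in the loop condition and per commit, so one wide layer cannot overshoot it.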

Acceptance (from #47)

  • A walk over a cyclic edge set terminates: the operator enforces the visited-set, depth cap, and budget regardless of policy. This is tested with an adversarial never-stop policy, and with a policy emitting unbounded fresh node ids that the visited-set can't dedup, where the node budget is the backstop.
  • Collapsed-tree routes a summary match down to that artifact's chunks and beats flat top-k on a constructed routing case: the gold answer chunk lands at rank 0 under traversal vs rank 3 under flat, because the distractor (whose chunk matches the query terms but whose summary does not) is excluded by routing before its chunk can compete.
  • PPR is expressible as a degenerate policy (closed-form score, stop returns True immediately, expand via CorpusGraph.neighbors); it is shape-tested only, not implemented or promoted.
  • select / disclose compose downstream unchanged; flat search paths untouched (traverse is opt-in; flat stays the default).
  • Hermetic tests; walk provenance (metadata["walk_depth"] / ["seed"]) additive + JSON-clean.
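The routing acceptance case can be illustrated with a toy in the same spirit. This is not the package's collapsed_tree_policy or its test fixture: the 3-d vectors are invented, and routing into only the top `route_k` artifacts by summary cosine is an assumption about how routing excludes the distractor. The point it shows is structural: under flat search every chunk competes, while under routing the distractor's chunk never gets the chance because its summary loses first.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Invented toy corpus: artifact -> (summary vector, {chunk id: chunk vector}).
CORPUS = {
    "gold": ((0.0, 1.0, 0.0),
             {"gold/answer": (0.5, 1.0, 0.0), "gold/other": (0.0, 0.2, 1.0)}),
    "distractor": ((0.0, 0.0, 1.0),
                   {"distractor/hit": (1.0, 1.0, 0.1), "distractor/other": (0.0, 0.0, 1.0)}),
}


def flat_search(q, corpus):
    """Flat top-k: every chunk in the corpus competes directly on cosine."""
    scored = [(cosine(q, vec), cid)
              for _, leaves in corpus.values() for cid, vec in leaves.items()]
    return [cid for _, cid in sorted(scored, reverse=True)]


def routed_search(q, corpus, route_k=1):
    """Hypothetical collapsed-tree routing: summaries pick artifacts, chunks are ranked."""
    # Summaries act as routers and are never returned themselves.
    routes = sorted(corpus, key=lambda a: cosine(q, corpus[a][0]), reverse=True)[:route_k]
    # Only chunks belonging to routed artifacts compete.
    scored = [(cosine(q, vec), cid) for a in routes for cid, vec in corpus[a][1].items()]
    return [cid for _, cid in sorted(scored, reverse=True)]


q = (1.0, 1.0, 0.0)
flat = flat_search(q, CORPUS)
routed = routed_search(q, CORPUS)
print("flat:", flat)
print("routed:", routed)
```

Here the distractor chunk outranks the gold chunk under flat search, while routing sends the query only into the gold artifact, so the gold chunk comes first.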

24 tests in tests/test_traverse.py; 343 total pass. An adversarial multi-agent review will run before merge.

…policy (#47)

Capability 1's traversal operator (ADR discussion #43, report 12), on
the GraphStore substrate from #46:

- ir/traverse.py: traverse(query, store, *, policy, max_depth,
  node_budget, k) -> list[SearchHit]. The loop (score frontier ->
  select -> commit -> expand) is the operator's; the SAFETY primitives
  are non-negotiable and enforced by the operator — a visited-set (on
  the policy's node id), depth cap, and node budget live in WalkState,
  so a cyclic graph or a never-stopping policy still terminates.
- WalkPolicy protocol (seed/score/select/expand/stop + node_id +
  to_hit, where to_hit->None marks a router-only node). store is passed
  to the policy verbatim — the operator never interprets it.
- collapsed_tree_policy: pure-vector summary-routing. Seeds on summary
  surfaces (routers, not emitted), descends to the artifact's leaf
  chunks (the results). Routes a query that matches an artifact's
  summary down to its best chunk — beating flat top-k on a constructed
  routing case where the gold chunk is buried under a distractor whose
  summary doesn't match (so routing excludes it).
- PPR is expressible as a degenerate policy (closed-form score, stop
  immediately, expand via CorpusGraph.neighbors) — shape-tested, not
  shipped. Flat stays the default; traverse is opt-in.
- Walk provenance (walk_depth, seed) additive + JSON-clean; select /
  disclose compose unchanged. README + exports.

24 new tests; 343 total pass.
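The degenerate-PPR claim can be illustrated with a small sketch. It is neither the PR's shape test nor a shipped policy: personalized_pagerank is a plain power iteration in which alpha is the restart probability and dangling mass returns to the seeds (both conventional choices assumed here), the neighbors dict stands in for CorpusGraph.neighbors, and PPRPolicy implements only the seed/score/expand/stop slice of the WalkPolicy shape, with assumed signatures.

```python
def personalized_pagerank(neighbors, seeds, alpha=0.15, iters=100):
    """Power iteration for PPR; alpha is the probability of restarting at a seed."""
    nodes = sorted(set(neighbors) | {m for outs in neighbors.values() for m in outs})
    p = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    r = dict(p)
    for _ in range(iters):
        nxt = {n: alpha * p[n] for n in nodes}  # restart mass
        for n in nodes:
            outs = neighbors.get(n, [])
            if outs:
                share = (1 - alpha) * r[n] / len(outs)
                for m in outs:
                    nxt[m] += share
            else:  # dangling node: hand its mass back to the seeds
                for s in nodes:
                    nxt[s] += (1 - alpha) * r[n] * p[s]
        r = nxt
    return r


class PPRPolicy:
    """Hypothetical degenerate walk: closed-form score, stop at once."""

    def __init__(self, neighbors, seeds, alpha=0.15):
        self.neighbors = neighbors  # stand-in for CorpusGraph.neighbors
        self.scores = personalized_pagerank(neighbors, seeds, alpha)

    def seed(self, query, store):
        return list(self.scores)  # every node is a candidate

    def score(self, query, node, store):
        return self.scores[node]  # precomputed, no walking needed

    def expand(self, node, store):
        return self.neighbors.get(node, [])  # defined, but never reached

    def stop(self, state):
        return True  # stop immediately


graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
policy = PPRPolicy(graph, seeds={"a"})
ranking = sorted(policy.seed(None, None),
                 key=lambda n: policy.score(None, n, None), reverse=True)
print(ranking)
```

Because stop() is True from the start in this sketch, nothing is ever expanded: the result is just the seeds ranked by their closed-form score, which is what makes the policy degenerate.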
… silent-empty

DFLT_SUMMARY_KINDS includes 'document'/'capability', so a WholeText- or
Skill-only corpus (one summary-kind surface per artifact, no leaf kinds)
had every surface classified as a router: to_hit returned None and expand
found no leaves, so traverse() silently returned [].

Make the router/leaf split structural rather than kind-based: a summary
surface routes (and is suppressed) only when its artifact actually has
leaf surfaces to descend to. A leaf-less summary is emitted directly, so
a single-surface corpus degrades to flat-over-summaries. Keeps the broad
summary_kinds defaults on purpose (that is what lets such corpora seed).

Regression tests: WholeText + Skill corpora no longer return empty;
Package (genuine tree) still routes its description summary.
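The difference between the old kind-based rule and the structural rule can be sketched as follows. Everything here is hypothetical: the Surface record, the SUMMARY_KINDS set (which includes 'document' and 'capability' as the commit says, plus an assumed 'description'), and the assumption that any non-summary kind counts as a leaf are illustrations, not the package's types or defaults.

```python
from dataclasses import dataclass

# Hypothetical stand-in for DFLT_SUMMARY_KINDS; 'description' is assumed.
SUMMARY_KINDS = {"document", "capability", "description"}


@dataclass(frozen=True)
class Surface:
    artifact: str
    kind: str
    id: str


def kind_based_roles(surfaces, summary_kinds=SUMMARY_KINDS):
    """Pre-fix rule: every summary-kind surface is a router."""
    return {s.id: ("router" if s.kind in summary_kinds else "leaf") for s in surfaces}


def structural_roles(surfaces, summary_kinds=SUMMARY_KINDS):
    """Post-fix rule: a summary routes only if its own artifact has leaves."""
    has_leaves = {s.artifact for s in surfaces if s.kind not in summary_kinds}
    roles = {}
    for s in surfaces:
        if s.kind not in summary_kinds:
            roles[s.id] = "leaf"
        elif s.artifact in has_leaves:
            roles[s.id] = "router"  # suppressed; the walk descends to its leaves
        else:
            roles[s.id] = "emit"  # leaf-less summary is returned directly
    return roles


def emitted(roles):
    return [sid for sid, role in roles.items() if role != "router"]


whole_text = [Surface("doc1", "document", "doc1#document"),
              Surface("doc2", "document", "doc2#document")]
package = [Surface("pkg", "description", "pkg#description"),
           Surface("pkg", "chunk", "pkg#chunk0")]

print(emitted(kind_based_roles(whole_text)))  # the silent-empty bug
print(emitted(structural_roles(whole_text)))  # degrades to flat-over-summaries
print(emitted(structural_roles(package)))     # genuine tree still routes
```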
…s completeness

Adversarial review (PR #54) found the leaf-less-emit fix's most load-bearing
behavior — the PER-ARTIFACT has_leaves decision — was unpinned: replacing it
with a corpus-global flag still passed every test. Add:

- test_mixed_corpus_routes_leaf_having_emits_leaf_less: one walk over a corpus
  with both a leaf-having (routes) and a leaf-less (emits) artifact; fails under
  the global-flag mutation.
- completeness assertions on the WholeText/Skill tests (emit ALL leaf-less
  seeds, not just the top one); guards against a select[:1]/early-stop regression.
- test_chunked_only_corpus_is_out_of_scope_documented_boundary: pins the
  documented boundary (chunk-only corpus -> [] -> use ir.search).
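Why the mixed-corpus test kills the global-flag mutation, while single-kind corpora cannot, can be seen in a small sketch. The collapsed_emit helper, the (artifact, kind, surface_id) tuples, and SUMMARY_KINDS are hypothetical; the helper ignores scoring and only computes which surfaces a collapsed-tree walk would emit, with seeding restricted to summary surfaces as the PR describes.

```python
# Hypothetical summary kinds; any other kind is treated as a leaf.
SUMMARY_KINDS = {"document", "description"}


def collapsed_emit(surfaces, per_artifact=True):
    """Surface ids a collapsed-tree walk would emit (scoring ignored)."""
    leaves = {}
    for artifact, kind, sid in surfaces:
        if kind not in SUMMARY_KINDS:
            leaves.setdefault(artifact, []).append(sid)
    corpus_has_leaves = bool(leaves)  # the mutation: one flag for the whole corpus
    out = []
    for artifact, kind, sid in surfaces:
        if kind not in SUMMARY_KINDS:
            continue  # leaves are reached only by descending from a summary seed
        routes = (artifact in leaves) if per_artifact else corpus_has_leaves
        out.extend(leaves.get(artifact, []) if routes else [sid])
    return out


mixed = [("pkg", "description", "pkg#desc"),
         ("pkg", "chunk", "pkg#c0"),
         ("note", "document", "note#doc")]
whole_text = [("doc1", "document", "doc1#doc"), ("doc2", "document", "doc2#doc")]
chunk_only = [("c", "chunk", "c#0")]

print(collapsed_emit(mixed))                      # per-artifact decision
print(collapsed_emit(mixed, per_artifact=False))  # global flag drops note#doc
print(collapsed_emit(chunk_only))                 # no summary seeds at all
```

On the WholeText-style corpus both variants agree (the global flag is False, just as every per-artifact flag is), which is why only a corpus mixing leaf-having and leaf-less artifacts separates them. The chunk-only corpus has no summary surfaces to seed from, so it yields nothing; that is the documented boundary where ir.search is the right tool.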
thorwhalen merged commit 509e9ff into master on Jun 13, 2026
12 checks passed
thorwhalen deleted the feature/traverse-operator branch on June 13, 2026 at 11:41