Skip to content

traverse(query, store, policy) — pluggable graph traversal with operator-enforced safety #47

Description

@thorwhalen

Problem

With edges in place (#46), ir needs the query-time traversal operator. Report 12's central claim: recursive retrieval, routing, RAPTOR collapsed-tree, PPR/spreading-activation, and beam walks are one operator — "score frontier → select → expand → test stop" — parameterized by a pluggable policy; and the engineering substance is termination: safety ("will it stop?" — structural, enforced by the operator) vs sufficiency ("should it stop?" — injected, fallible). Graphs here may contain directed cycles (citations, cross-refs), so safety cannot be a policy's responsibility.

Scope (report 12 §5 contract, adapted to ir idioms)

  • traverse(query, store, *, policy, max_depth=..., node_budget=..., k=...) -> list[SearchHit] with a WalkPolicy protocol: seed / score / select / expand / stop. Safety primitives (visited-set on (source, artifact_id), depth cap, node budget) are non-optional and enforced by traverse itself — report 12 calls this "the single highest-leverage robustness decision". stop() is the injected sufficiency check; safe default is None (run to budget).
  • First policy, pure-vector (no LLM in the query loop): summary-routing / collapsed-tree — seed on summary/description surfaces, expand down to the artifact's chunks (the document-summary-index pattern; ir's Skill/Package corpora already index description surfaces). RAPTOR's own ablation: collapsed-tree ≥ tree-traversal.
  • Deferred to follow-ups (record, don't build): PPRPolicy (HippoRAG-2 style; the best multi-hop/cost trade-off), router/recursive policies, anything LLM-in-the-loop (cost-gated; the when-to-traverse decision is the agent layer's — raglab).
  • Flat stays the default: traverse is opt-in; promotion of any policy requires beating flat+rerank on our eval set (report 12: flat wins simple lookup decisively; GraphRAG-global ≈57×/≈210× cost). Wire an eval harness case before promoting (ir.eval has the machinery).
  • Returned hits carry provenance of the walk (e.g. metadata["walk_depth"], seed id) — JSON-clean, additive.

Acceptance criteria

  • A cyclic edge set terminates: visited-set + depth + budget enforced by the operator regardless of policy behavior (test with an adversarial policy that never stops).
  • Collapsed-tree policy routes a synopsis/description match down to that artifact's chunks on a hermetic corpus, and beats flat top-k on a constructed routing case.
  • PPR expressible as a degenerate policy (closed-form in score(), stop() immediately true) — shape-tested, not implemented.
  • select/disclose compose downstream unchanged; flat search paths untouched.

Size: L. Depends on #46; synergizes with #48 (synopsis surfaces). Capability 1 (ir). Refs ADR #43, report 12, epic #38 (boundary revision recorded there).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions