
Build a trace-replay regression harness for policy changes #213

Description

@dgenio

Summary

Add a harness that replays a recorded set of grant/invoke decisions (principals,
capabilities, args metadata, outcomes) against a candidate policy and reports
decision diffs, so that policy edits can be validated against historical traffic before
deployment.

Why this matters

Policy changes are the highest-blast-radius configuration change in this system: one
edited rule can silently widen access or break every agent in production. Shadow
mode (#139) compares policies on live traffic; a replay harness covers the
pre-deployment gap using recorded traces, and gives policy authors a deterministic
"what would have changed" answer. It also complements the fixture-based policy
testing framework (#138) with real-traffic coverage.

Current evidence

External context

Decision-replay ("what-if") tooling is an established pattern in authorization
systems for validating policy changes against historical request logs.

Proposed implementation

  1. Define a DecisionRecord (serializable subset of trace + grant context) and an
    exporter from TraceStore (depends on ISSUE 7 for denials, ISSUE 8 for querying).
  2. replay(records, engine) -> DecisionDiff listing flips (allow→deny, deny→allow,
    reason-code changes) in a deterministic order.
  3. Ship as a module (e.g., alongside policy testing utilities) plus an example
    script under examples/.
  4. Coordinate scope with #138 ([DX] Policy testing framework: assert allow/deny/ask decisions as fixtures) and #139 ([DX] Policy shadow mode: evaluate a candidate policy alongside the active one) in the issue body so the three pieces compose.
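Steps 1 and 2 can be sketched as follows. This is a minimal sketch, not the project's implementation: the DecisionRecord fields, the Flip and DecisionDiff shapes, and the engine exposed as a plain callable returning (decision, reason_code) are all hypothetical. The real TraceStore exporter and policy-engine API are not shown and may differ.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class DecisionRecord:
    """Serializable subset of a trace plus grant context (illustrative fields)."""
    record_id: str
    principal: str
    capability: str
    args_meta: dict = field(default_factory=dict)  # redacted arg metadata only
    decision: str = "allow"  # recorded outcome: "allow", "deny" or "ask"
    reason_code: str = ""

    def to_json(self) -> str:
        # One JSON object per record, keys sorted so exports diff cleanly.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, line: str) -> DecisionRecord:
        return cls(**json.loads(line))


@dataclass(frozen=True)
class Flip:
    record_id: str
    before: Tuple[str, str]  # (decision, reason_code) as recorded
    after: Tuple[str, str]   # (decision, reason_code) under the candidate

    @property
    def kind(self) -> str:
        if self.before[0] != self.after[0]:
            return f"{self.before[0]}->{self.after[0]}"
        return "reason-code-change"


@dataclass
class DecisionDiff:
    flips: list[Flip]

    def is_empty(self) -> bool:
        return not self.flips


# Hypothetical engine interface: record -> (decision, reason_code).
Evaluate = Callable[[DecisionRecord], Tuple[str, str]]


def replay(records: Iterable[DecisionRecord], evaluate: Evaluate) -> DecisionDiff:
    flips = []
    for rec in records:
        before = (rec.decision, rec.reason_code)
        after = tuple(evaluate(rec))
        if after != before:
            flips.append(Flip(rec.record_id, before, after))
    # Sort by record id so the diff does not depend on input order.
    flips.sort(key=lambda f: f.record_id)
    return DecisionDiff(flips)


# Usage with two synthetic records and two policy variants.
records = [
    DecisionRecord("r2", "agent:billing", "invoices.read", decision="allow", reason_code="grant"),
    DecisionRecord("r1", "agent:support", "invoices.delete", decision="deny", reason_code="no_grant"),
]


def current_policy(rec: DecisionRecord) -> Tuple[str, str]:
    return ("allow", "grant") if rec.capability.endswith(".read") else ("deny", "no_grant")


def candidate_policy(rec: DecisionRecord) -> Tuple[str, str]:
    # Edited rule: support agents are now allowed everything.
    if rec.principal == "agent:support":
        return ("allow", "grant")
    return current_policy(rec)


assert replay(records, current_policy).is_empty()
print([(f.record_id, f.kind) for f in replay(records, candidate_policy).flips])
# prints [('r1', 'deny->allow')]
```

Comparing against the recorded outcomes, rather than against a second engine, means replaying the policy that produced the traces should yield an empty diff, which lines up with the first acceptance criterion.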

AI-agent execution notes

Acceptance criteria

  • Replaying records against the same policy yields an empty diff.
  • A rule change produces the expected flip set in tests.
  • Rate-limit-dependent decisions are handled explicitly (documented behavior).
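One way to make the rate-limit criterion explicit is sketched below, under assumptions: a rate-limited outcome is identified by a hypothetical reason code "rate_limited", and because limiter counters are not part of the trace, such records are reported as skipped instead of being compared. The Record shape and replay_with_rate_limits are illustrative names, not existing APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

# Hypothetical reason code marking decisions that depended on limiter state.
RATE_LIMIT_REASONS = frozenset({"rate_limited"})


@dataclass(frozen=True)
class Record:
    record_id: str
    capability: str
    decision: str
    reason_code: str


@dataclass
class ReplayResult:
    flips: list = field(default_factory=list)    # (record_id, before, after)
    skipped: list = field(default_factory=list)  # record_ids not replayable


def replay_with_rate_limits(
    records: Iterable[Record], evaluate: Callable[[Record], tuple]
) -> ReplayResult:
    """Compare recorded outcomes with a stateless candidate evaluation."""
    result = ReplayResult()
    for rec in sorted(records, key=lambda r: r.record_id):
        if rec.reason_code in RATE_LIMIT_REASONS:
            # The recorded outcome came from counters the trace does not
            # capture, so report it rather than guess at a comparison.
            result.skipped.append(rec.record_id)
            continue
        before = (rec.decision, rec.reason_code)
        after = evaluate(rec)
        if after != before:
            result.flips.append((rec.record_id, before, after))
    return result


records = [
    Record("a", "search", "allow", "grant"),
    Record("b", "search", "deny", "rate_limited"),
]


def same_policy(rec: Record) -> tuple:
    return ("allow", "grant")


result = replay_with_rate_limits(records, same_policy)
print(result.flips, result.skipped)  # prints [] ['b']
```

Listing skipped records separately keeps the "same policy yields an empty diff" criterion honest without claiming the replay reproduced limiter state; reconstructing counters from trace timestamps would be an alternative that this sketch does not attempt.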

Test plan

Unit tests with synthetic record sets and policy variants, plus an example script run by
make example if it is added to the Makefile. Run make ci.

Documentation plan

Add a section to docs/capabilities.md (or to future policy docs), cross-link #138 and
#139, and add a CHANGELOG entry under Added.

Migration and compatibility notes

Not expected to require migration.

Risks and tradeoffs

Replay fidelity is limited by what traces capture (args are redacted; see ISSUE 6).
Document that replay validates policy structure, not arg-dependent rules, unless
arg metadata is retained.
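The documented limitation could also be enforced by the harness itself. The sketch below assumes a hypothetical REDACTED placeholder written by the exporter and a hypothetical per-capability map of the arg keys a candidate rule reads; records whose needed args are missing or redacted are reported as unverifiable instead of being compared. None of these names come from the existing codebase.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical placeholder the exporter might write in place of a redacted value.
REDACTED = "<redacted>"


@dataclass(frozen=True)
class Record:
    record_id: str
    capability: str
    args_meta: dict
    decision: str


@dataclass
class Report:
    flips: List[Tuple[str, str, str]] = field(default_factory=list)  # (id, recorded, candidate)
    unverifiable: List[Tuple[str, List[str]]] = field(default_factory=list)  # (id, missing keys)


def replay_arg_aware(
    records: Iterable[Record],
    evaluate: Callable[[Record], str],
    arg_keys_read: Dict[str, Set[str]],  # capability -> arg keys the candidate reads
) -> Report:
    report = Report()
    for rec in sorted(records, key=lambda r: r.record_id):
        needed = arg_keys_read.get(rec.capability, set())
        # An arg counts as unavailable if it is absent or was redacted.
        missing = sorted(k for k in needed if rec.args_meta.get(k, REDACTED) == REDACTED)
        if missing:
            report.unverifiable.append((rec.record_id, missing))
            continue
        after = evaluate(rec)
        if after != rec.decision:
            report.flips.append((rec.record_id, rec.decision, after))
    return report


def candidate(rec: Record) -> str:
    # Edited, arg-dependent rule: transfers above 1000 are denied.
    if rec.capability == "payments.transfer" and int(rec.args_meta["amount"]) > 1000:
        return "deny"
    return "allow"


records = [
    Record("t1", "payments.transfer", {"amount": "5000"}, "allow"),
    Record("t2", "payments.transfer", {"amount": REDACTED}, "allow"),
    Record("t3", "files.read", {}, "allow"),
]
report = replay_arg_aware(records, candidate, {"payments.transfer": {"amount"}})
print(report.flips)         # prints [('t1', 'allow', 'deny')]
print(report.unverifiable)  # prints [('t2', ['amount'])]
```

Keeping unverifiable records out of the flip list means an empty diff never silently covers traffic the replay could not actually evaluate.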

Suggested labels

testing, product, developer-experience
