Build a trace-replay regression harness for policy changes #213

## Summary

Add a harness that replays a recorded set of grant/invoke decisions (principals,
capabilities, args metadata, outcomes) against a *candidate* policy and reports
decision diffs — so policy edits can be validated against historical traffic before
deployment.

## Why this matters

Policy changes are the highest-blast-radius configuration change in this system: one
edited rule can silently widen access or break every agent in production. Shadow
mode (#139) compares policies on *live* traffic; a replay harness covers the
*pre-deployment* gap using recorded traces, and gives policy authors a deterministic
"what would have changed" answer. It also complements the fixture-based policy
testing framework (#138) with real-traffic coverage.

## Current evidence

- `policy.py` `DefaultPolicyEngine.evaluate()` is a nearly pure decision function over (principal, capability, context), so recorded decisions are replayable by construction; the known exception is rate-limit state (see the execution notes below).
- `trace.py` `ActionTrace` records capability, principal, and outcome, so most of a replay record already exists; recording denials depends on ISSUE 7.
- Open issues #138 and #139 are adjacent but neither replays recorded history against a candidate policy offline.

## External context

Decision-replay ("what-if") tooling is an established pattern in authorization
systems for validating policy changes against historical request logs.

## Proposed implementation

1. Define a `DecisionRecord` (serializable subset of trace + grant context) and an
   exporter from `TraceStore` (depends on ISSUE 7 for denials, ISSUE 8 for querying).
2. `replay(records, engine) -> DecisionDiff` listing flips (allow→deny, deny→allow,
   reason-code changes), deterministic ordering.
3. Ship as a module (e.g., alongside policy testing utilities) plus an example
   script under `examples/`.
4. Coordinate scope with #138/#139 in the issue body so the three pieces compose.
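
A minimal sketch of steps 1 and 2, to make the intended shape concrete. Everything named here is an assumption for illustration: the `DecisionRecord` fields, the `Flip` type, the reason-code strings, and the `evaluate` callable (a stand-in that returns `(allowed, reason)`). The real `DefaultPolicyEngine.evaluate()` signature and return type in `policy.py` should be checked first and will likely need a thin adapter.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable

# All names and fields below are hypothetical; the real shapes live in
# policy.py and trace.py.


@dataclass(frozen=True)
class DecisionRecord:
    record_id: str
    principal: str
    capability: str
    context: dict[str, Any]
    allowed: bool  # recorded outcome
    reason: str    # recorded reason code


@dataclass(frozen=True)
class Flip:
    record_id: str
    kind: str  # "allow->deny", "deny->allow", or "reason-changed"
    old_reason: str
    new_reason: str


@dataclass
class DecisionDiff:
    flips: list[Flip] = field(default_factory=list)

    def is_empty(self) -> bool:
        return not self.flips


# Stand-in for the candidate engine: (principal, capability, context) -> (allowed, reason).
Evaluate = Callable[[str, str, dict], tuple[bool, str]]


def replay(records: Iterable[DecisionRecord], evaluate: Evaluate) -> DecisionDiff:
    """Re-evaluate each record and collect decisions that differ from the recording."""
    flips = []
    for rec in records:
        allowed, reason = evaluate(rec.principal, rec.capability, rec.context)
        if allowed != rec.allowed:
            kind = "allow->deny" if rec.allowed else "deny->allow"
        elif reason != rec.reason:
            kind = "reason-changed"
        else:
            continue
        flips.append(Flip(rec.record_id, kind, rec.reason, reason))
    # Sort so the diff does not depend on the order records were supplied in.
    flips.sort(key=lambda f: (f.record_id, f.kind))
    return DecisionDiff(flips)


def baseline(principal, capability, context):
    if capability == "fs.read":
        return True, "allowed_by_rule"
    return False, "no_matching_rule"


def candidate(principal, capability, context):
    # Candidate policy drops the fs.read rule.
    return False, "no_matching_rule"


records = [
    DecisionRecord("r2", "agent-b", "net.fetch", {}, False, "no_matching_rule"),
    DecisionRecord("r1", "agent-a", "fs.read", {}, True, "allowed_by_rule"),
]
same = replay(records, baseline)
changed = replay(records, candidate)
```

Here `same` is empty because the baseline reproduces the recorded outcomes, while `changed` contains a single allow-to-deny flip for `r1`. Passing a callable rather than the engine object keeps the sketch independent of the engine's actual interface.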

## AI-agent execution notes

- Inspect first: `policy.py` (evaluate signature and inputs), `trace.py`, `policy_reasons.py`, `tests/test_policy.py`.
- Rate-limit state makes replay order-sensitive: replay must either reset limiter state per run or exclude rate-based denials from diffs (flag them separately).
- Determinism: identical inputs → identical diff.
- Do not add persistence here (that is #126); records are plain dicts/JSONL.
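
One way to handle the rate-limit and JSONL notes above is to load records as plain dicts and report rate-limit-dependent decisions in a separate bucket instead of the policy diff. This is a sketch under assumptions: the `rate_limited` reason code, the record keys, and the `evaluate` stand-in are all hypothetical, and the actual reason codes should come from `policy_reasons.py`.

```python
import io
import json

# Hypothetical reason code; take the real values from policy_reasons.py.
RATE_LIMIT_REASONS = {"rate_limited"}


def load_records(stream):
    """Parse JSONL: one decision record (a plain dict) per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


def replay_partitioned(records, evaluate):
    """Return (policy_flips, rate_limited), each sorted by record id.

    `evaluate` is a stand-in for the candidate engine and returns
    (allowed, reason) for (principal, capability, context).
    """
    policy_flips, rate_limited = [], []
    for rec in records:
        allowed, reason = evaluate(
            rec["principal"], rec["capability"], rec.get("context", {})
        )
        entry = {
            "id": rec["id"],
            "old": [rec["allowed"], rec["reason"]],
            "new": [allowed, reason],
        }
        if rec["reason"] in RATE_LIMIT_REASONS or reason in RATE_LIMIT_REASONS:
            # Order-sensitive decision: flag it separately, never count it as a flip.
            rate_limited.append(entry)
        elif (allowed, reason) != (rec["allowed"], rec["reason"]):
            policy_flips.append(entry)
    by_id = lambda e: e["id"]
    return sorted(policy_flips, key=by_id), sorted(rate_limited, key=by_id)


def deny_all(principal, capability, context):
    return False, "no_matching_rule"


jsonl = io.StringIO(
    '{"id": "r1", "principal": "agent-a", "capability": "fs.read", '
    '"allowed": true, "reason": "allowed_by_rule"}\n'
    '{"id": "r2", "principal": "agent-a", "capability": "fs.read", '
    '"allowed": false, "reason": "rate_limited"}\n'
)
flips, limited = replay_partitioned(load_records(jsonl), deny_all)
```

In this run `r1` lands in `flips` and `r2` lands in `limited`, because its recorded denial came from the limiter and not from a policy rule. The alternative named above, resetting limiter state per run, would keep such records in the main diff but requires replaying in recorded order.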

## Acceptance criteria

- Replaying records against the same policy yields an empty diff.
- A rule change produces the expected flip set in tests.
- Rate-limit-dependent decisions are handled explicitly (documented behavior).

## Test plan

Unit tests with synthetic record sets and policy variants. The example script runs
under `make example` if it is added to the Makefile. Run `make ci`.

## Documentation plan

Section in `docs/capabilities.md` or future policy docs; cross-link #138/#139;
CHANGELOG `Added`.

## Migration and compatibility notes

Not expected to require migration.

## Risks and tradeoffs

Replay fidelity is limited by what traces capture (args are redacted — ISSUE 6);
document that replay validates policy structure, not arg-dependent rules, unless
arg metadata is retained.

## Suggested labels

testing, product, developer-experience

