Summary
Add a harness that replays a recorded set of grant/invoke decisions (principals,
capabilities, args metadata, outcomes) against a candidate policy and reports
decision diffs, so that policy edits can be validated against historical traffic before
deployment.
Why this matters
Policy changes are the highest-blast-radius configuration change in this system: one
edited rule can silently widen access or break every agent in production. Shadow
mode (#139) compares policies on live traffic; a replay harness covers the
pre-deployment gap using recorded traces and gives policy authors a deterministic
"what would have changed" answer. It also complements the fixture-based policy
testing framework (#138) with real-traffic coverage.
Current evidence
policy.py: DefaultPolicyEngine.evaluate() is a pure-ish decision function over (principal, capability, context), and so replayable by construction.
trace.py: ActionTrace records capability, principal, and outcome, so most of a replay record already exists (denials need ISSUE 7 to be recorded).
External context
Decision-replay ("what-if") tooling is an established pattern in authorization
systems for validating policy changes against historical request logs.
Proposed implementation
Define a DecisionRecord (serializable subset of trace + grant context) and an exporter from TraceStore (depends on ISSUE 7 for denials, ISSUE 8 for querying).
replay(records, engine) -> DecisionDiff listing flips (allow→deny, deny→allow, reason-code changes), with deterministic ordering.
An example script under examples/.
AI-agent execution notes
Inspect first: policy.py (evaluate signature and inputs), trace.py, policy_reasons.py, tests/test_policy.py.
Rate-limit state makes replay order-sensitive: replay must either reset limiter state per run or exclude rate-based denials from diffs (flag them separately).
Acceptance criteria
Replaying records against the same policy yields an empty diff.
A rule change produces the expected flip set in tests.
Rate-limit-dependent decisions are handled explicitly (documented behavior).
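As a rough illustration of the proposed replay(records, engine) -> DecisionDiff step and the three acceptance criteria, here is a minimal Python sketch. Only the names DecisionRecord, DecisionDiff, and replay come from this issue. Everything else (Decision, Flip, AllowListEngine, the record fields, the rate_limited flag, the reason strings) is hypothetical, and the evaluate(principal, capability, context) call shape is assumed from the description here, not read from policy.py.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Decision:
    """Hypothetical result of a policy evaluation."""
    allowed: bool
    reason: str


@dataclass(frozen=True)
class DecisionRecord:
    """Hypothetical serializable subset of a trace entry plus grant context."""
    record_id: str
    principal: str
    capability: str
    allowed: bool                 # recorded outcome
    reason: str                   # recorded reason code
    rate_limited: bool = False    # assumption: rate-based decisions are marked
    context: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Flip:
    record_id: str
    kind: str                     # "allow->deny", "deny->allow", "reason-change"
    old_reason: str
    new_reason: str


@dataclass
class DecisionDiff:
    flips: List[Flip] = field(default_factory=list)
    rate_limited: List[str] = field(default_factory=list)  # flagged, not diffed


class AllowListEngine:
    """Toy stand-in for a policy engine; not the real DefaultPolicyEngine."""

    def __init__(self, allowed: Set[Tuple[str, str]]) -> None:
        self.allowed = allowed

    def evaluate(self, principal: str, capability: str,
                 context: Dict[str, str]) -> Decision:
        if (principal, capability) in self.allowed:
            return Decision(True, "allowlisted")
        return Decision(False, "not_allowlisted")


def replay(records: Iterable[DecisionRecord], engine) -> DecisionDiff:
    """Re-evaluate each record against the engine and collect decision flips."""
    diff = DecisionDiff()
    # Sort by record id so the output order does not depend on input order.
    for rec in sorted(records, key=lambda r: r.record_id):
        if rec.rate_limited:
            # Order-sensitive decisions are excluded from flips, flagged apart.
            diff.rate_limited.append(rec.record_id)
            continue
        new = engine.evaluate(rec.principal, rec.capability, rec.context)
        if new.allowed != rec.allowed:
            kind = "allow->deny" if rec.allowed else "deny->allow"
        elif new.reason != rec.reason:
            kind = "reason-change"
        else:
            continue
        diff.flips.append(Flip(rec.record_id, kind, rec.reason, new.reason))
    return diff


records = [
    DecisionRecord("r2", "agent-a", "net.fetch", False, "not_allowlisted"),
    DecisionRecord("r1", "agent-a", "fs.read", True, "allowlisted"),
    DecisionRecord("r3", "agent-b", "fs.read", False, "rate_limited",
                   rate_limited=True),
]
baseline = AllowListEngine({("agent-a", "fs.read")})
candidate = AllowListEngine({("agent-a", "net.fetch")})

same = replay(records, baseline)      # same policy that produced the records
changed = replay(records, candidate)  # edited policy
```

In this sketch, excluding rate-limited records is one of the two options named in the execution notes; resetting limiter state per run would be the alternative. Sorting by a record identifier is only one way to get deterministic ordering, and a real exporter might prefer a timestamp or sequence number.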
Test plan
Unit tests with synthetic record sets and policy variants; an example script run in
make example if added to the Makefile. Run make ci.
Documentation plan
Section in docs/capabilities.md or future policy docs; cross-link #138/#139;
CHANGELOG Added.
Migration and compatibility notes
Not expected to require migration.
Risks and tradeoffs
Replay fidelity is limited by what traces capture (args are redacted — ISSUE 6);
document that replay validates policy structure, not arg-dependent rules, unless
arg metadata is retained.
Suggested labels
testing, product, developer-experience