dgenio · dgenio · Jun 20, 2026 · Jun 18, 2026 · Jun 18, 2026 · Jun 19, 2026
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -48,6 +48,8 @@ jobs:
           python examples/tutorial.py
           python examples/readme_quickstart.py
           python examples/trace_export_demo.py
+          python examples/ocsf_export_demo.py
+          python examples/trace_replay_demo.py
 
   conformance_stub:
     name: "Weaver Spec Conformance Stub (v0.1.0)"

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,48 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- **Handle expansions and policy denials are now first-class audit records
+  (#175).** `ActionTrace` gained an additive `event_type`
+  (`invoke`/`expand`/`deny`) and `reason_code`. A successful `Kernel.expand()`
+  records an `expand` event (and expansion Frames now carry a non-empty
+  `Provenance.principal_id`); a `PolicyDenied` grant records a `deny` event with
+  the stable reason code before the exception propagates, so `explain()` and the
+  trace listing answer "who was refused what, when, and why" (I-02).
+- **TraceStore query API (#177).** New `TraceQuery` dataclass and pure
+  `query_traces()` filter by principal, capability, event type, outcome, reason
+  code, and time window (since-inclusive / until-exclusive) with deterministic
+  `(invoked_at, action_id)` ordering and pagination. Exposed as
+  `TraceStore.query()` / `Kernel.query_traces()` and added to
+  `TraceStoreProtocol`, so the SQLite and JSONL backends share the contract.
+- **Programmatic kernel metrics counters (#179).** `Kernel.stats` returns an
+  immutable `StatsSnapshot` (grants, denials by reason code, invocations,
+  invocation failures, fallback activations, redaction events, budget downgrades,
+  handle stores, expansions); `Kernel.reset_stats()` zeroes them. Dependency-free
+  and lock-guarded — telemetry without exporting the full trail or installing the
+  `otel` extra.
+- **OCSF / OWASP-AOS SIEM export (#176).** `trace_to_ocsf()` / `traces_to_ocsf()`
+  map any `ActionTrace` (invoke/expand/deny) to OCSF API Activity (class 6003)
+  events, AOS-enriched, as a pure dependency-free dict construction. See the SIEM
+  section in `docs/integrations.md` for the field-mapping table and a runnable
+  recipe (`examples/ocsf_export_demo.py`).
+- **Policy-replay regression harness (#213).** `DecisionRecord`,
+  `record_decision()`, and `replay(records, engine) -> DecisionDiff` re-evaluate a
+  recorded decision corpus against a candidate policy and report allow→deny,
+  deny→allow, and reason-code flips deterministically. Rate-limit-dependent flips
+  are surfaced separately (`DecisionDiff.rate_limited`).
+  Companion: `examples/trace_replay_demo.py`.
+
+### Changed
+- **Bounded memory for in-memory audit and revocation state (#182).**
+  `TraceStore` now caps at `max_entries` (default 10 000) with oldest-first
+  eviction, a one-time warning, and an observable `evicted_count`. The revocation
+  store tracks each token's expiry and sweeps state for already-expired tokens
+  (lazily on an interval and via `HMACTokenProvider.sweep_revocations()`), never
+  un-revoking a live token. `RevocationStoreProtocol.track()` now takes an
+  `expires_at` argument and the protocol gained `sweep_expired()` (breaking for
+  custom revocation backends).
+
 ## [0.11.0] - 2026-06-19
 
 ### Fixed
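
Review note on the bounded-memory entry above (#182): a minimal sketch of oldest-first eviction with a one-time warning and an eviction counter. `BoundedStoreSketch` is a hypothetical stand-in written for this note, not the kernel's `TraceStore`; only `max_entries` and `evicted_count` are names taken from the changelog, and the real store may differ in detail.

```python
import logging
from collections import OrderedDict

log = logging.getLogger("bounded_store_sketch")


class BoundedStoreSketch:
    """Hypothetical bounded store: insertion-ordered, capped, oldest-first eviction."""

    def __init__(self, max_entries: int = 10_000) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self._max = max_entries
        self._items: "OrderedDict[str, dict]" = OrderedDict()
        self.evicted_count = 0

    def record(self, action_id: str, trace: dict) -> None:
        if action_id in self._items:
            # Overwrite in place: the key keeps its position and nothing is evicted.
            self._items[action_id] = trace
            return
        self._items[action_id] = trace
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # drop the oldest entry
            if self.evicted_count == 0:
                # Eviction discards audit data, so the first one is logged loudly.
                log.warning("store full (max_entries=%d); evicting oldest", self._max)
            self.evicted_count += 1

    def ids(self) -> list:
        return list(self._items)


store = BoundedStoreSketch(max_entries=2)
store.record("a", {"n": 1})
store.record("b", {"n": 2})
store.record("a", {"n": 3})  # re-record: overwrites, "a" stays the oldest entry
store.record("c", {"n": 4})  # exceeds the cap: "a" is evicted
print(store.ids(), store.evicted_count)
```

The sketch keeps re-recorded entries at their original position, which is one reading of "overwrites in place"; a store that moved them to the newest position would evict in a different order.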

diff --git a/Makefile b/Makefile
@@ -27,5 +27,7 @@ example:
 	python examples/evaluation_artifact_policy.py
 	python examples/trace_export_demo.py
 	python examples/persistent_audit_demo.py
+	python examples/ocsf_export_demo.py
+	python examples/trace_replay_demo.py
 
 ci: fmt-check lint type test example
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -158,6 +158,51 @@ Records every `ActionTrace`. `explain(action_id)` returns the full audit record.
 
 `export_action_trace` / `export_action_traces` serialise traces into a stable, versioned, JSON-serialisable shape for downstream analysis tools (distinct from the OpenTelemetry observability export); `Kernel.list_traces()` is the public accessor that feeds them the audit trail. See [trace_export.md](trace_export.md).
 
+#### Audited event types (#175)
+
+`ActionTrace.event_type` distinguishes three kinds of audited event, so the
+audit trail covers authorization decisions and data-access events, not only
+successful invocations (I-02):
+
+| `event_type` | Recorded when | Notable fields |
+|--------------|---------------|----------------|
+| `invoke` (default) | A capability invocation runs | `driver_id`, `result_summary`, `handle_id` |
+| `expand` | `Kernel.expand()` serves more rows of a handle | `handle_id`, `result_summary`; expansion Frames carry `Provenance.principal_id` |
+| `deny` | A `grant_capability()` call is rejected by policy | `reason_code` (stable `DenialReason`), redacted `error`; no token is issued |
+
+`reason_code` is populated for `deny` events. The new fields are additive with
+defaults, so a directly-constructed trace keeps the original `invoke` meaning.
+
+#### Querying the audit trail (#177)
+
+`Kernel.query_traces(TraceQuery(...))` (and `TraceStore.query(...)` on any
+backend) filters records by `principal_id`, `capability_id`, `event_type`,
+`outcome` (`succeeded`/`failed`), `reason_code`, and a `since`/`until` window
+(`since` inclusive, `until` exclusive), with `limit`/`offset` pagination. Results
+are ordered deterministically by `(invoked_at, action_id)`, so successive pages
+over an unchanged store are disjoint and complete. The pure `query_traces()`
+function applies the same semantics to any iterable of traces.
+
+#### Bounded memory (#182)
+
+The in-memory `TraceStore` caps itself at `max_entries` (default 10 000) and
+evicts oldest-first when exceeded; eviction is *loud* (first eviction logs a
+warning) and observable via `TraceStore.evicted_count`. Re-recording an existing
+`action_id` overwrites in place and never evicts. Deployments needing unbounded
+retention should use a durable backend. Revocation state is bounded similarly —
+see [Persistence & durable stores](#persistence--durable-stores) and
+[security.md](security.md).
+
+#### Kernel metrics counters (#179)
+
+`Kernel.stats` returns an immutable `StatsSnapshot` of aggregate counters
+(grants, denials by reason code, invocations, invocation failures, fallback
+activations, redaction events, budget downgrades, handle stores, expansions);
+`Kernel.reset_stats()` zeroes them. The counters are dependency-free and
+lock-guarded — cheap health-check telemetry that needs neither a trace export nor
+the `otel` extra. They are *aggregates*; the `TraceStore` remains the record of
+individual events.
+
 ## Persistence & durable stores
 
 The stateful stores are protocol-based seams (`weaver_kernel.stores`), mirroring
@@ -189,6 +234,14 @@ an append-only log that is easy to ship to a collector. Use
 or apply across workers sharing a database file. All durable backends use only
 the standard library (`sqlite3`, `json`) — no new runtime dependency.
 
+**Bounded revocation state (#182).** Every revocation backend tracks each
+token's `expires_at` and can `sweep_expired(now)` to drop bookkeeping for tokens
+that have already expired — they fail the verifier's expiry check regardless, so
+a sweep never un-revokes a *live* token. The in-memory store sweeps lazily on an
+interval; `HMACTokenProvider.sweep_revocations()` triggers it explicitly (call it
+on a schedule for durable backends). `RevocationStoreProtocol.track()` therefore
+takes an `expires_at` argument and the protocol includes `sweep_expired()`.
+
 **Verifiable audit chain.** Persisted traces are hash-chained
 (`prev_hash`/`record_hash`, HMAC-SHA256 keyed by `WEAVER_KERNEL_SECRET`).
 `verify_chain()` detects mutation, insertion, deletion, and reordering;
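
Review note on the bounded revocation state paragraph (#182): a minimal sketch of why sweeping expired entries cannot un-revoke a live token. `RevocationSketch` and its `revoke()` / `is_revoked()` methods are hypothetical; `track()` and `sweep_expired()` reuse names from the docs, but the signatures, the return value, and the rule that a token counts as expired once `expires_at <= now` are assumptions.

```python
from datetime import datetime, timedelta, timezone


class RevocationSketch:
    """Hypothetical in-memory revocation store that remembers each token's expiry."""

    def __init__(self) -> None:
        self._expires_at: dict = {}
        self._revoked: set = set()

    def track(self, token_id: str, expires_at: datetime) -> None:
        self._expires_at[token_id] = expires_at

    def revoke(self, token_id: str) -> None:
        self._revoked.add(token_id)

    def is_revoked(self, token_id: str) -> bool:
        return token_id in self._revoked

    def sweep_expired(self, now: datetime) -> int:
        # Only tokens whose expiry has passed are dropped. They would fail an
        # expiry check anyway, so forgetting them cannot re-enable anything.
        expired = [t for t, exp in self._expires_at.items() if exp <= now]
        for token_id in expired:
            del self._expires_at[token_id]
            self._revoked.discard(token_id)
        return len(expired)


now = datetime(2026, 6, 20, tzinfo=timezone.utc)
revocations = RevocationSketch()
revocations.track("old-token", now - timedelta(hours=1))   # already expired
revocations.track("live-token", now + timedelta(hours=1))  # still valid
revocations.revoke("old-token")
revocations.revoke("live-token")

removed = revocations.sweep_expired(now)
print(removed, revocations.is_revoked("live-token"))
```

After the sweep the expired token is no longer listed as revoked, which is safe only because a verifier rejects it on expiry before consulting revocation state; that ordering is part of the assumption here.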

diff --git a/docs/capabilities.md b/docs/capabilities.md
@@ -222,3 +222,42 @@ Both built-in engines support `explain()`. If you bring a custom policy
 engine that implements only `PolicyEngine.evaluate`, `explain_denial` raises
 `AgentKernelError` with guidance — implement the `ExplainingPolicyEngine`
 protocol to enable structured explanations.
+
+## Validating a policy change with replay (#213)
+
+A policy edit is the highest-blast-radius change in the system: one rule can
+silently widen access or break every agent. The replay harness re-evaluates a
+corpus of recorded decisions against a *candidate* policy and reports the
+decision diff, so you get a deterministic "what would have changed" answer before
+deploying.
+
+```python
+from weaver_kernel import DefaultPolicyEngine, record_decision, replay
+
+baseline = DefaultPolicyEngine()
+# Build a corpus (real ones come from historical traffic; inputs defined elsewhere).
+records = [
+    record_decision(baseline, request, capability, principal, justification="..."),
+    # ...
+]
+
+assert replay(records, baseline).empty    # same engine that recorded them → no flips
+diff = replay(records, candidate_engine)  # candidate_engine: the policy under review
+for flip in diff.flips:    # allow_to_deny | deny_to_allow | reason_code_change
+    print(flip.record.capability.capability_id, flip.kind,
+          flip.baseline_reason_code, "->", flip.candidate_reason_code)
+```
+
+Determinism and fidelity:
+
+- Output order is the input record order; replaying records against the engine
+  that produced them yields `diff.empty`.
+- Rate-limit decisions are replay-order-sensitive (the default engine's limiter is
+  stateful), so flips involving `DenialReason.RATE_LIMITED` are surfaced in
+  `diff.rate_limited` rather than `diff.flips`.
+- Replay validates **policy structure** (role/justification/constraint rules), not
+  argument-dependent rules whose inputs the audit trail redacts.
+
+Runnable recipe: [`examples/trace_replay_demo.py`](../examples/trace_replay_demo.py).
+This complements shadow mode (live-traffic comparison) and the fixture-based
+policy testing framework with real-traffic, pre-deployment coverage.
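
Review note on the replay section above (#213): a minimal sketch of how recorded decisions can be classified into the three flip kinds named in the docs. Everything here is a hypothetical stand-in (`Decision`, `Record`, `Flip`, `replay_sketch`, the role-based engines, and the reason codes `missing_role` / `unknown_principal`); it does not use `weaver_kernel` and leaves out the separate rate-limit bucket.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason_code: Optional[str] = None


@dataclass(frozen=True)
class Record:
    role: str            # the only request input this sketch models
    baseline: Decision   # what the baseline engine decided at record time


@dataclass(frozen=True)
class Flip:
    record: Record
    kind: str
    candidate: Decision


Engine = Callable[[str], Decision]


def replay_sketch(records: List[Record], engine: Engine) -> List[Flip]:
    """Re-evaluate each record and keep only the decisions that changed."""
    flips = []
    for rec in records:  # output order follows input order
        cand = engine(rec.role)
        if rec.baseline.allowed and not cand.allowed:
            kind = "allow_to_deny"
        elif not rec.baseline.allowed and cand.allowed:
            kind = "deny_to_allow"
        elif rec.baseline.reason_code != cand.reason_code:
            kind = "reason_code_change"
        else:
            continue
        flips.append(Flip(rec, kind, cand))
    return flips


def baseline_engine(role: str) -> Decision:
    if role in ("admin", "analyst"):
        return Decision(True)
    return Decision(False, "missing_role")


def candidate_engine(role: str) -> Decision:
    if role == "admin":
        return Decision(True)
    if role == "analyst":
        return Decision(False, "missing_role")
    return Decision(False, "unknown_principal")


records = [Record(r, baseline_engine(r)) for r in ("admin", "analyst", "guest")]
unchanged = replay_sketch(records, baseline_engine)
changed = replay_sketch(records, candidate_engine)
print([f.kind for f in changed])
```

Replaying against the recording engine yields no flips because both engines in the sketch are stateless; a stateful limiter would break that property, which is the reason the docs route rate-limit flips to a separate list.
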
diff --git a/docs/integrations.md b/docs/integrations.md
@@ -363,6 +363,45 @@ instrument_kernel(kernel)
 no-op. Use `weaver_kernel.otel.reset_instrumentation(kernel)` in tests to
 re-instrument with a different provider.
 
+## SIEM export (OCSF / OWASP AOS)
+
+OpenTelemetry feeds the *observability* pipeline; the *security-operations*
+pipeline runs through SIEMs, which speak **OCSF** (the Open Cybersecurity Schema
+Framework). The audit trail maps to OCSF **API Activity** events (class `6003`),
+enriched per the OWASP Agent Observability Standard (AOS), with no new
+dependency: the mapping is a pure dict construction.
+
+```python
+from weaver_kernel import traces_to_ocsf
+
+events = traces_to_ocsf(kernel.list_traces())   # list[dict], OCSF-shaped
+# ship `events` to your SIEM (one JSON object per event)
+```
+
+`trace_to_ocsf(trace)` maps a single record. Runnable recipe:
+[`examples/ocsf_export_demo.py`](../examples/ocsf_export_demo.py).
+
+Field mapping (kernel `ActionTrace` → OCSF API Activity 6003):
+
+| OCSF field | Source |
+|------------|--------|
+| `class_uid` / `class_name` | constant `6003` / `"API Activity"` |
+| `category_uid` / `category_name` | constant `6` / `"Application Activity"` |
+| `activity_id` / `activity_name` | `event_type`: invoke→Other(99), expand→Read(2), deny→Other(99) |
+| `type_uid` | `class_uid * 100 + activity_id` |
+| `status_id` / `status` | `2`/`Failure` when `error` is set, else `1`/`Success` |
+| `status_detail` | `error` (already redacted at record time) |
+| `severity_id` / `severity` | deny→Medium(3), else Informational(1) |
+| `time` | `invoked_at` as epoch milliseconds (UTC) |
+| `actor.user.uid` | `principal_id` |
+| `api.operation` / `api.service.name` | `capability_id` / `driver_id` (or `"weaver-kernel"`) |
+| `metadata` | product + OCSF version (`OCSF_VERSION`) + AOS extension marker |
+| `unmapped` | kernel specifics: `action_id`, `token_id`, `event_type`, `response_mode`, `sensitivity`, `reason_code`, `handle_id`, `result_summary` |
+
+The mapping is built only from already-redaction-safe trace fields, so exporting
+cannot widen the I-01 boundary. AOS is young, so the mapping is versioned and
+isolated in `weaver_kernel.ocsf`; output is validated structurally in the tests.
+
 ## Ecosystem integration patterns
 
 These reference flows show how agent-kernel composes with neighboring Weaver
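
Review note on the field-mapping table above (#176): a minimal sketch that applies a subset of the table's rows to plain arguments, to make the `type_uid` arithmetic and the epoch-millisecond conversion concrete. `to_ocsf_sketch` and its parameters are hypothetical and written for this note; it is not `trace_to_ocsf()` and omits `metadata`, most of `unmapped`, and the `driver_id` lookup.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional

CLASS_UID = 6003  # OCSF API Activity
ACTIVITY = {"invoke": (99, "Other"), "expand": (2, "Read"), "deny": (99, "Other")}


def to_ocsf_sketch(
    event_type: str,
    invoked_at: datetime,
    principal_id: str,
    capability_id: str,
    error: Optional[str] = None,
) -> Dict[str, Any]:
    """Build an OCSF-shaped dict from a few trace fields (subset of the table)."""
    activity_id, activity_name = ACTIVITY[event_type]
    failed = bool(error)
    is_deny = event_type == "deny"
    event: Dict[str, Any] = {
        "class_uid": CLASS_UID,
        "class_name": "API Activity",
        "category_uid": 6,
        "category_name": "Application Activity",
        "activity_id": activity_id,
        "activity_name": activity_name,
        "type_uid": CLASS_UID * 100 + activity_id,
        "status_id": 2 if failed else 1,
        "status": "Failure" if failed else "Success",
        "severity_id": 3 if is_deny else 1,
        "severity": "Medium" if is_deny else "Informational",
        # invoked_at must be timezone-aware for the epoch conversion to be UTC-correct.
        "time": int(invoked_at.timestamp() * 1000),
        "actor": {"user": {"uid": principal_id}},
        "api": {"operation": capability_id, "service": {"name": "weaver-kernel"}},
        "unmapped": {"event_type": event_type},
    }
    if failed:
        event["status_detail"] = error
    return event


when = datetime(2026, 6, 20, 12, 0, tzinfo=timezone.utc)
denied = to_ocsf_sketch("deny", when, "agent-7", "billing.refund", error="policy denied")
print(json.dumps(denied, indent=2))
```

Because `invoke` and `deny` share activity 99, they also share a `type_uid`; in this sketch a consumer separates them by `severity_id` or the `unmapped.event_type` value.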

diff --git a/docs/security.md b/docs/security.md
@@ -120,6 +120,38 @@ in-memory trace did not already hold and cannot widen the I-01 boundary.
 The CLI exposes verification to operators: `weaver-kernel audit verify --store
 audit.db` exits non-zero on any divergence (see [cli.md](cli.md)).
 
+## What the audit trail captures (#175)
+
+Auditability (I-02) covers authorization decisions and data-access events, not
+only successful invocations. Every recorded `ActionTrace` carries an `event_type`:
+
+- `invoke` — a capability invocation (success or driver failure).
+- `expand` — a `Kernel.expand()` data-access event (more rows of a stored
+  handle). Expansion Frames carry the expanding principal in
+  `Provenance.principal_id`.
+- `deny` — a `grant_capability()` rejected by policy, recorded with the stable
+  `reason_code` (a `DenialReason`) and a redacted reason message *before* the
+  `PolicyDenied` exception propagates.
+
+So `explain()` and `query_traces()` can answer "who was refused what, when, and
+why" and "which rows were expanded by whom". Expansion query arguments and denial
+messages pass through the same firewall redactor as invocation args, so these new
+records never make the trace store a sensitive-data sink.
+
+## Retention bounding (#182)
+
+Long-lived processes accumulate one trace per audited event and one revocation
+entry per revoked token. Both in-memory structures are bounded:
+
+- The in-memory `TraceStore` caps at `max_entries` (default 10 000), evicting
+  oldest-first. Eviction discards audit data, so it is deliberately loud (a
+  warning on first eviction) and counted (`evicted_count`). For unbounded
+  retention, use a durable backend.
+- Revocation state records each token's expiry and is swept for already-expired
+  tokens (lazily, and via `HMACTokenProvider.sweep_revocations()`). A sweep never
+  un-revokes a live token — only entries for tokens that already fail the expiry
+  check are removed.
+
 ## Security disclaimers
 
 > **v0.1 is not production-hardened for real authentication.**

diff --git a/docs/trace_export.md b/docs/trace_export.md
@@ -134,10 +134,24 @@ envelope = export_action_traces(
 }
 ```
 
+The envelope also carries `event_type` (`invoke`/`expand`/`deny`) and, for
+denials, a stable `reason_code` (#175), so an exported trail distinguishes
+invocations, handle expansions, and policy denials.
+
 ## Stability
 
 `TRACE_EXPORT_VERSION` is bumped only on a **breaking** change to the field
 shape. New optional fields may be added without a bump, so consumers should
 ignore unknown keys. Assert on `status`, `sensitivity`, and the presence of
 `error` rather than on human-readable strings (the `error` text itself may
 evolve).
+
+## Related
+
+- **Querying the trail:** `Kernel.query_traces(TraceQuery(...))` filters by
+  principal, capability, event type, outcome, reason code, and time window — see
+  [architecture.md](architecture.md#querying-the-audit-trail-177).
+- **SIEM export:** `traces_to_ocsf()` renders the trail as OCSF/AOS events for a
+  security pipeline — see
+  [integrations.md](integrations.md#siem-export-ocsf--owasp-aos). The OTel
+  observability export is distinct from both (live spans/metrics).
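
Review note on the query semantics referenced above (#177): a minimal sketch of a `since`-inclusive / `until`-exclusive window, `(invoked_at, action_id)` ordering, and `limit`/`offset` pagination over an in-memory list. `TraceSketch` and `query_sketch` are hypothetical stand-ins written for this note; they model only four fields and are not the kernel's `TraceQuery` or `query_traces()`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class TraceSketch:
    action_id: str
    principal_id: str
    event_type: str
    invoked_at: datetime


def query_sketch(
    traces: Iterable[TraceSketch],
    *,
    principal_id: Optional[str] = None,
    event_type: Optional[str] = None,
    since: Optional[datetime] = None,
    until: Optional[datetime] = None,
    limit: Optional[int] = None,
    offset: int = 0,
) -> List[TraceSketch]:
    """Filter, sort deterministically, then slice out one page."""
    matched = [
        t
        for t in traces
        if (principal_id is None or t.principal_id == principal_id)
        and (event_type is None or t.event_type == event_type)
        and (since is None or t.invoked_at >= since)   # inclusive lower bound
        and (until is None or t.invoked_at < until)    # exclusive upper bound
    ]
    # action_id breaks ties between traces with the same timestamp.
    matched.sort(key=lambda t: (t.invoked_at, t.action_id))
    end = None if limit is None else offset + limit
    return matched[offset:end]


t0 = datetime(2026, 6, 20, tzinfo=timezone.utc)
traces = [
    TraceSketch("a3", "alice", "invoke", t0),
    TraceSketch("a1", "alice", "deny", t0),
    TraceSketch("a2", "bob", "invoke", t0 + timedelta(minutes=1)),
    TraceSketch("a4", "alice", "expand", t0 + timedelta(minutes=2)),
]

page1 = query_sketch(traces, limit=2)
page2 = query_sketch(traces, limit=2, offset=2)
window = query_sketch(traces, since=t0, until=t0 + timedelta(minutes=2))
print([t.action_id for t in page1], [t.action_id for t in page2])
```

The total order is what makes pages over an unchanged list disjoint and complete; without the `action_id` tie-breaker, two traces with equal timestamps could swap places between calls.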