2 changes: 2 additions & 0 deletions .github/workflows/ci.yml
@@ -48,6 +48,8 @@ jobs:
python examples/tutorial.py
python examples/readme_quickstart.py
python examples/trace_export_demo.py
python examples/ocsf_export_demo.py
python examples/trace_replay_demo.py

conformance_stub:
name: "Weaver Spec Conformance Stub (v0.1.0)"
42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,48 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

### Added
- **Handle expansions and policy denials are now first-class audit records
(#175).** `ActionTrace` gained an additive `event_type`
(`invoke`/`expand`/`deny`) and `reason_code`. A successful `Kernel.expand()`
records an `expand` event (and expansion Frames now carry a non-empty
`Provenance.principal_id`); a grant rejected with `PolicyDenied` records a `deny` event with
the stable reason code before the exception propagates, so `explain()` and the
trace listing answer "who was refused what, when, and why" (I-02).
- **TraceStore query API (#177).** New `TraceQuery` dataclass and pure
`query_traces()` filter by principal, capability, event type, outcome, reason
code, and time window (since-inclusive / until-exclusive) with deterministic
`(invoked_at, action_id)` ordering and pagination. Exposed as
`TraceStore.query()` / `Kernel.query_traces()` and added to
`TraceStoreProtocol`, so the SQLite and JSONL backends share the contract.
- **Programmatic kernel metrics counters (#179).** `Kernel.stats` returns an
immutable `StatsSnapshot` (grants, denials by reason code, invocations,
invocation failures, fallback activations, redaction events, budget downgrades,
handle stores, expansions); `Kernel.reset_stats()` zeroes them. The counters are
dependency-free and lock-guarded, giving telemetry without exporting the full trail
or installing the `otel` extra.
- **OCSF / OWASP-AOS SIEM export (#176).** `trace_to_ocsf()` / `traces_to_ocsf()`
map any `ActionTrace` (invoke/expand/deny) to OCSF API Activity (class 6003)
events, AOS-enriched, as a pure dependency-free dict construction. See the SIEM
section in `docs/integrations.md` for the field-mapping table and a runnable
recipe (`examples/ocsf_export_demo.py`).
- **Policy-replay regression harness (#213).** `DecisionRecord`,
`record_decision()`, and `replay(records, engine) -> DecisionDiff` re-evaluate a
recorded decision corpus against a candidate policy and report allow→deny,
deny→allow, and reason-code flips deterministically. Rate-limit-dependent flips
are surfaced separately (`DecisionDiff.rate_limited`).
Companion: `examples/trace_replay_demo.py`.

### Changed
- **Bounded memory for in-memory audit and revocation state (#182).**
`TraceStore` now caps at `max_entries` (default 10 000) with oldest-first
eviction, a one-time warning, and an observable `evicted_count`. The revocation
store tracks each token's expiry and sweeps state for already-expired tokens
(lazily on an interval and via `HMACTokenProvider.sweep_revocations()`), never
un-revoking a live token. `RevocationStoreProtocol.track()` now takes an
`expires_at` argument and the protocol gained `sweep_expired()` (breaking for
custom revocation backends).

## [0.11.0] - 2026-06-19

### Fixed
2 changes: 2 additions & 0 deletions Makefile
@@ -27,5 +27,7 @@ example:
python examples/evaluation_artifact_policy.py
python examples/trace_export_demo.py
python examples/persistent_audit_demo.py
python examples/ocsf_export_demo.py
python examples/trace_replay_demo.py

ci: fmt-check lint type test example
53 changes: 53 additions & 0 deletions docs/architecture.md
@@ -158,6 +158,51 @@ Records every `ActionTrace`. `explain(action_id)` returns the full audit record.

`export_action_trace` / `export_action_traces` serialise traces into a stable, versioned, JSON-serialisable shape for downstream analysis tools (distinct from the OpenTelemetry observability export); `Kernel.list_traces()` is the public accessor that feeds them the audit trail. See [trace_export.md](trace_export.md).

#### Audited event types (#175)

`ActionTrace.event_type` distinguishes three kinds of audited event, so the
audit trail covers authorization decisions and data-access events, not only
successful invocations (I-02):

| `event_type` | Recorded when | Notable fields |
|--------------|---------------|----------------|
| `invoke` (default) | A capability invocation runs | `driver_id`, `result_summary`, `handle_id` |
| `expand` | `Kernel.expand()` serves more rows of a handle | `handle_id`, `result_summary`; expansion Frames carry `Provenance.principal_id` |
| `deny` | A `grant_capability()` call is rejected by policy | `reason_code` (stable `DenialReason`), redacted `error`; no token is issued |

`reason_code` is populated for `deny` events. The new fields are additive with
defaults, so a directly-constructed trace keeps the original `invoke` meaning.
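To illustrate why additive fields with defaults stay backward-compatible, here is a minimal sketch using a hypothetical `TraceSketch` dataclass (not the real `ActionTrace`, whose full field list is not shown here). The reason-code string is made up for illustration and is not a real `DenialReason` member.

```python
from dataclasses import dataclass
from typing import Literal, Optional

EventType = Literal["invoke", "expand", "deny"]


@dataclass(frozen=True)
class TraceSketch:
    """Hypothetical, trimmed-down stand-in for an audit record."""

    # Pre-existing required fields (simplified).
    action_id: str
    principal_id: str
    capability_id: str
    # Additive fields: the defaults reproduce the original "invoke" meaning,
    # so constructions written before these fields existed keep working.
    event_type: EventType = "invoke"
    reason_code: Optional[str] = None


legacy = TraceSketch("a-1", "alice", "billing.read")  # no new fields supplied
denial = TraceSketch(
    "a-2", "bob", "billing.write",
    event_type="deny", reason_code="missing_role",  # illustrative reason code
)
```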

#### Querying the audit trail (#177)

`Kernel.query_traces(TraceQuery(...))` (and `TraceStore.query(...)` on any
backend) filters records by `principal_id`, `capability_id`, `event_type`,
`outcome` (`succeeded`/`failed`), `reason_code`, and a `since`/`until` window
(`since` inclusive, `until` exclusive), with `limit`/`offset` pagination. Results
are ordered deterministically by `(invoked_at, action_id)`, so successive pages
over an unchanged store are disjoint and complete. The pure `query_traces()`
function applies the same semantics to any iterable of traces.
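The filtering, window, ordering, and pagination rules above can be sketched in plain Python. This is not `weaver_kernel`'s `query_traces()`: `Trace`, `Query`, and `query_traces_sketch` are hypothetical stand-ins carrying only the fields the filters need, and treating a trace as `failed` when its `error` is set is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Trace:
    """Hypothetical trace with just the fields the filters need."""

    action_id: str
    principal_id: str
    capability_id: str
    invoked_at: datetime
    event_type: str = "invoke"
    error: Optional[str] = None
    reason_code: Optional[str] = None


@dataclass(frozen=True)
class Query:
    """Hypothetical filter; None means "don't filter on this field"."""

    principal_id: Optional[str] = None
    capability_id: Optional[str] = None
    event_type: Optional[str] = None
    outcome: Optional[str] = None  # "succeeded" | "failed"
    reason_code: Optional[str] = None
    since: Optional[datetime] = None  # inclusive lower bound
    until: Optional[datetime] = None  # exclusive upper bound
    limit: Optional[int] = None
    offset: int = 0


def _matches(t: Trace, q: Query) -> bool:
    exact = [
        (q.principal_id, t.principal_id),
        (q.capability_id, t.capability_id),
        (q.event_type, t.event_type),
        (q.reason_code, t.reason_code),
    ]
    if any(want is not None and want != got for want, got in exact):
        return False
    if q.outcome is not None:
        outcome = "failed" if t.error is not None else "succeeded"  # assumption
        if outcome != q.outcome:
            return False
    if q.since is not None and t.invoked_at < q.since:
        return False
    if q.until is not None and t.invoked_at >= q.until:
        return False
    return True


def query_traces_sketch(traces: Iterable[Trace], q: Query) -> List[Trace]:
    hits = sorted(
        (t for t in traces if _matches(t, q)),
        key=lambda t: (t.invoked_at, t.action_id),  # deterministic total order
    )
    stop = None if q.limit is None else q.offset + q.limit
    return hits[q.offset:stop]


base = datetime(2026, 1, 1, tzinfo=timezone.utc)
trail = [
    Trace("b", "alice", "crm.read", base),
    Trace("a", "alice", "crm.read", base),  # same timestamp: action_id breaks the tie
    Trace("c", "bob", "crm.write", base + timedelta(minutes=5),
          event_type="deny", error="denied", reason_code="missing_role"),
]
page1 = query_traces_sketch(trail, Query(limit=2))
page2 = query_traces_sketch(trail, Query(limit=2, offset=2))
```

Because the sort key ends with the unique `action_id`, ties on `invoked_at` still have a total order, which is what keeps successive `offset` pages disjoint and complete over an unchanged trail.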

#### Bounded memory (#182)

The in-memory `TraceStore` caps itself at `max_entries` (default 10 000) and
evicts oldest-first when exceeded; eviction is *loud* (first eviction logs a
warning) and observable via `TraceStore.evicted_count`. Re-recording an existing
`action_id` overwrites in place and never evicts. Deployments needing unbounded
retention should use a durable backend. Revocation state is bounded similarly —
see [Persistence & durable stores](#persistence--durable-stores) and
[security.md](security.md).
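A minimal sketch of the bounded, loud-eviction behaviour described above, assuming traces are keyed by `action_id` and aged by insertion order; `BoundedTraceStore` and its methods are hypothetical, not the real `TraceStore` API.

```python
import logging
from collections import OrderedDict
from typing import Any, Optional

log = logging.getLogger("bounded_trace_store_sketch")


class BoundedTraceStore:
    """Hypothetical in-memory store capped at max_entries, evicting oldest-first."""

    def __init__(self, max_entries: int = 10_000) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self.evicted_count = 0
        self._warned = False
        # Insertion order doubles as age order for eviction.
        self._entries: "OrderedDict[str, Any]" = OrderedDict()

    def record(self, action_id: str, trace: Any) -> None:
        if action_id in self._entries:
            # Re-recording an existing id overwrites in place: OrderedDict keeps
            # the key's original position and the size does not grow.
            self._entries[action_id] = trace
            return
        self._entries[action_id] = trace
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # drop the oldest record
            self.evicted_count += 1
            if not self._warned:  # loud, but only on the first eviction
                log.warning(
                    "trace store exceeded max_entries=%d; evicting oldest records",
                    self.max_entries,
                )
                self._warned = True

    def get(self, action_id: str) -> Optional[Any]:
        return self._entries.get(action_id)

    def __len__(self) -> int:
        return len(self._entries)


store = BoundedTraceStore(max_entries=2)
store.record("a", "trace-a")
store.record("b", "trace-b")
store.record("a", "trace-a-v2")  # overwrite: no eviction
store.record("c", "trace-c")     # evicts "a", the oldest entry
```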

#### Kernel metrics counters (#179)

`Kernel.stats` returns an immutable `StatsSnapshot` of aggregate counters
(grants, denials by reason code, invocations, invocation failures, fallback
activations, redaction events, budget downgrades, handle stores, expansions);
`Kernel.reset_stats()` zeroes them. The counters are dependency-free and
lock-guarded — cheap health-check telemetry that needs neither a trace export nor
the `otel` extra. They are *aggregates*; the `TraceStore` remains the record of
individual events.
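The counter/snapshot split can be sketched with a single lock and a frozen dataclass. `KernelStatsSketch` and `StatsSnapshotSketch` are hypothetical and track only a subset of the real counters; copying the state under the lock and wrapping the per-reason dict in a read-only proxy is one way (an assumption, not necessarily the kernel's) to make each snapshot consistent and immutable.

```python
import threading
from collections import Counter
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class StatsSnapshotSketch:
    """Hypothetical immutable view of a subset of the kernel counters."""

    grants: int
    invocations: int
    invocation_failures: int
    denials_by_reason: Mapping[str, int]


class KernelStatsSketch:
    """Hypothetical lock-guarded counters using only the standard library."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._zero()

    def _zero(self) -> None:
        # Caller must hold the lock (or be in __init__).
        self._grants = 0
        self._invocations = 0
        self._failures = 0
        self._denials: Counter = Counter()

    def on_grant(self) -> None:
        with self._lock:
            self._grants += 1

    def on_denial(self, reason_code: str) -> None:
        with self._lock:
            self._denials[reason_code] += 1

    def on_invocation(self, failed: bool = False) -> None:
        with self._lock:
            self._invocations += 1
            if failed:
                self._failures += 1

    @property
    def stats(self) -> StatsSnapshotSketch:
        # Copy under the lock so the snapshot is internally consistent,
        # and wrap the dict so callers cannot mutate it.
        with self._lock:
            return StatsSnapshotSketch(
                grants=self._grants,
                invocations=self._invocations,
                invocation_failures=self._failures,
                denials_by_reason=MappingProxyType(dict(self._denials)),
            )

    def reset_stats(self) -> None:
        with self._lock:
            self._zero()
```

A snapshot taken before `reset_stats()` keeps its values afterwards, because it holds copies rather than references to the live counters.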

## Persistence & durable stores

The stateful stores are protocol-based seams (`weaver_kernel.stores`), mirroring
@@ -189,6 +234,14 @@ an append-only log that is easy to ship to a collector. Use
or apply across workers sharing a database file. All durable backends use only
the standard library (`sqlite3`, `json`) — no new runtime dependency.

**Bounded revocation state (#182).** Every revocation backend tracks each
token's `expires_at` and can `sweep_expired(now)` to drop bookkeeping for tokens
that have already expired — they fail the verifier's expiry check regardless, so
a sweep never un-revokes a *live* token. The in-memory store sweeps lazily on an
interval; `HMACTokenProvider.sweep_revocations()` triggers it explicitly (call it
on a schedule for durable backends). `RevocationStoreProtocol.track()` therefore
takes an `expires_at` argument and the protocol includes `sweep_expired()`.
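A hypothetical in-memory revocation store, sketched under the assumption that a token counts as expired once `now >= expires_at` (epoch seconds). It shows why sweeping is safe: only entries whose expiry has already passed are dropped, so a live revoked token stays revoked. `RevocationStoreSketch` and its `sweep_interval` parameter are illustrative names, not the real protocol.

```python
import time
from typing import Dict, Optional, Set


class RevocationStoreSketch:
    """Hypothetical revocation store that forgets tokens once they have expired."""

    def __init__(self, sweep_interval: float = 60.0) -> None:
        self._expires_at: Dict[str, float] = {}  # token_id -> expiry (epoch seconds)
        self._revoked: Set[str] = set()
        self._sweep_interval = sweep_interval
        self._last_sweep = 0.0

    def track(self, token_id: str, expires_at: float) -> None:
        self._expires_at[token_id] = expires_at

    def revoke(self, token_id: str) -> None:
        self._revoked.add(token_id)

    def is_revoked(self, token_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        if now - self._last_sweep >= self._sweep_interval:
            self.sweep_expired(now)  # lazy sweep, at most once per interval
        return token_id in self._revoked

    def sweep_expired(self, now: float) -> int:
        # Assumption: a token is expired once now >= expires_at. Such tokens
        # already fail the verifier's expiry check, so forgetting them cannot
        # un-revoke anything that would otherwise be accepted.
        expired = [t for t, exp in self._expires_at.items() if now >= exp]
        for token_id in expired:
            del self._expires_at[token_id]
            self._revoked.discard(token_id)
        self._last_sweep = now
        return len(expired)
```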

**Verifiable audit chain.** Persisted traces are hash-chained
(`prev_hash`/`record_hash`, HMAC-SHA256 keyed by `WEAVER_KERNEL_SECRET`).
`verify_chain()` detects mutation, insertion, deletion, and reordering;
39 changes: 39 additions & 0 deletions docs/capabilities.md
@@ -222,3 +222,42 @@ Both built-in engines support `explain()`. If you bring a custom policy
engine that implements only `PolicyEngine.evaluate`, `explain_denial` raises
`AgentKernelError` with guidance — implement the `ExplainingPolicyEngine`
protocol to enable structured explanations.

## Validating a policy change with replay (#213)

A policy edit is the highest-blast-radius change in the system: one rule can
silently widen access or break every agent. The replay harness re-evaluates a
corpus of recorded decisions against a *candidate* policy and reports the
decision diff, so you get a deterministic "what would have changed" answer before
deploying.

```python
from weaver_kernel import DefaultPolicyEngine, record_decision, replay

baseline = DefaultPolicyEngine()
# `request`, `capability`, `principal`, and `candidate_engine` are assumed to be
# defined already; a real corpus would come from historical traffic.
records = [
    record_decision(baseline, request, capability, principal, justification="..."),
    # ...
]

# Sanity check: replaying against the engine that produced the records → no flips.
assert replay(records, baseline).empty

diff = replay(records, candidate_engine)
for flip in diff.flips:  # flip.kind: allow_to_deny | deny_to_allow | reason_code_change
    print(flip.record.capability.capability_id, flip.kind,
          flip.baseline_reason_code, "->", flip.candidate_reason_code)
```

Determinism and fidelity:

- Output order is the input record order; replaying records against the engine
that produced them yields `diff.empty`.
- Rate-limit decisions are replay-order-sensitive (the default engine's limiter is
stateful), so flips involving `DenialReason.RATE_LIMITED` are surfaced in
`diff.rate_limited` rather than `diff.flips`.
- Replay validates **policy structure** (role/justification/constraint rules), not
argument-dependent rules whose inputs the audit trail redacts.

Runnable recipe: [`examples/trace_replay_demo.py`](../examples/trace_replay_demo.py).
This complements shadow mode (live-traffic comparison) and the fixture-based
policy testing framework with real-traffic, pre-deployment coverage.
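The diff classification described above can be sketched without the real engine. Here a decision is an `(allowed, reason_code)` pair and an engine is any callable from a request dict to a decision; `RecordSketch`, `FlipSketch`, `DiffSketch`, `replay_sketch`, and the `RATE_LIMITED` string are hypothetical stand-ins for `DecisionRecord`, the flip type, `DecisionDiff`, `replay()`, and `DenialReason.RATE_LIMITED`. Treating `empty` as "no flips outside `rate_limited`" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional, Tuple

Decision = Tuple[bool, Optional[str]]  # (allowed, reason_code)
Engine = Callable[[dict], Decision]
RATE_LIMITED = "rate_limited"  # hypothetical stand-in for DenialReason.RATE_LIMITED


@dataclass(frozen=True)
class RecordSketch:
    request: dict
    baseline: Decision  # what the original engine decided


@dataclass(frozen=True)
class FlipSketch:
    record: RecordSketch
    kind: str  # allow_to_deny | deny_to_allow | reason_code_change
    candidate: Decision


@dataclass
class DiffSketch:
    flips: List[FlipSketch] = field(default_factory=list)
    rate_limited: List[FlipSketch] = field(default_factory=list)

    @property
    def empty(self) -> bool:
        return not self.flips  # assumption: rate-limited flips are reported apart


def classify(baseline: Decision, candidate: Decision) -> Optional[str]:
    (b_ok, b_reason), (c_ok, c_reason) = baseline, candidate
    if b_ok and not c_ok:
        return "allow_to_deny"
    if not b_ok and c_ok:
        return "deny_to_allow"
    if not b_ok and not c_ok and b_reason != c_reason:
        return "reason_code_change"
    return None  # unchanged


def replay_sketch(records: Iterable[RecordSketch], engine: Engine) -> DiffSketch:
    diff = DiffSketch()
    for rec in records:  # iterate in input order, so output order matches it
        candidate = engine(rec.request)
        kind = classify(rec.baseline, candidate)
        if kind is None:
            continue
        flip = FlipSketch(rec, kind, candidate)
        # Rate-limit outcomes depend on replay order, so keep them apart.
        if RATE_LIMITED in (rec.baseline[1], candidate[1]):
            diff.rate_limited.append(flip)
        else:
            diff.flips.append(flip)
    return diff


def baseline_engine(req: dict) -> Decision:
    return (True, None) if req["role"] == "admin" else (False, "missing_role")


def candidate_engine(req: dict) -> Decision:  # now also requires a justification
    if req["role"] != "admin":
        return (False, "missing_role")
    if not req.get("justification"):
        return (False, "missing_justification")
    return (True, None)


requests = [{"role": "admin", "justification": "ticket-1"}, {"role": "admin"}, {"role": "viewer"}]
records = [RecordSketch(r, baseline_engine(r)) for r in requests]
diff = replay_sketch(records, candidate_engine)
```

In this toy corpus only the unjustified admin request flips (`allow_to_deny`); the viewer was already denied for the same reason, so it does not appear in the diff.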
39 changes: 39 additions & 0 deletions docs/integrations.md
@@ -363,6 +363,45 @@ instrument_kernel(kernel)
no-op. Use `weaver_kernel.otel.reset_instrumentation(kernel)` in tests to
re-instrument with a different provider.

## SIEM export (OCSF / OWASP AOS)

OpenTelemetry feeds the *observability* pipeline; the *security-operations*
pipeline (SIEMs) speaks **OCSF** (the Open Cybersecurity Schema Framework). The
audit trail maps to OCSF **API Activity** events (class `6003`), enriched per the
OWASP Agent Observability Standard (AOS), with no new dependency: the mapping is
a pure dict construction.

```python
from weaver_kernel import traces_to_ocsf

events = traces_to_ocsf(kernel.list_traces()) # list[dict], OCSF-shaped
# ship `events` to your SIEM (one JSON object per event)
```

`trace_to_ocsf(trace)` maps a single record. Runnable recipe:
[`examples/ocsf_export_demo.py`](../examples/ocsf_export_demo.py).

Field mapping (kernel `ActionTrace` → OCSF API Activity 6003):

| OCSF field | Source |
|------------|--------|
| `class_uid` / `class_name` | constant `6003` / `"API Activity"` |
| `category_uid` / `category_name` | constant `6` / `"Application Activity"` |
| `activity_id` / `activity_name` | `event_type`: invoke→Other(99), expand→Read(2), deny→Other(99) |
| `type_uid` | `class_uid * 100 + activity_id` |
| `status_id` / `status` | `2`/`Failure` when `error` is set, else `1`/`Success` |
| `status_detail` | `error` (already redacted at record time) |
| `severity_id` / `severity` | deny→Medium(3), else Informational(1) |
| `time` | `invoked_at` as epoch milliseconds (UTC) |
| `actor.user.uid` | `principal_id` |
| `api.operation` / `api.service.name` | `capability_id` / `driver_id` (or `"weaver-kernel"`) |
| `metadata` | product + OCSF version (`OCSF_VERSION`) + AOS extension marker |
| `unmapped` | kernel specifics: `action_id`, `token_id`, `event_type`, `response_mode`, `sensitivity`, `reason_code`, `handle_id`, `result_summary` |

The mapping is built only from already-redaction-safe trace fields, so exporting
cannot widen the I-01 boundary. AOS is young, so the mapping is versioned and
isolated in `weaver_kernel.ocsf`; output is validated structurally in the tests.
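For readers wiring up a different SIEM, here is a sketch of the field-mapping table as a plain function over a dict-shaped trace. It is not `weaver_kernel.ocsf`: `trace_to_ocsf_sketch` is hypothetical, the caller supplies the OCSF version string, the AOS extension marker is omitted because its shape is not documented here, and `invoked_at` is assumed to be a timezone-aware UTC `datetime`.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# event_type -> (activity_id, activity_name), per the mapping table
_ACTIVITY = {"invoke": (99, "Other"), "expand": (2, "Read"), "deny": (99, "Other")}
_UNMAPPED_KEYS = (
    "action_id", "token_id", "event_type", "response_mode",
    "sensitivity", "reason_code", "handle_id", "result_summary",
)


def trace_to_ocsf_sketch(trace: Dict[str, Any], ocsf_version: str) -> Dict[str, Any]:
    event_type = trace.get("event_type", "invoke")
    activity_id, activity_name = _ACTIVITY[event_type]
    failed = trace.get("error") is not None
    is_deny = event_type == "deny"
    event: Dict[str, Any] = {
        "class_uid": 6003,
        "class_name": "API Activity",
        "category_uid": 6,
        "category_name": "Application Activity",
        "activity_id": activity_id,
        "activity_name": activity_name,
        "type_uid": 6003 * 100 + activity_id,
        "status_id": 2 if failed else 1,
        "status": "Failure" if failed else "Success",
        "severity_id": 3 if is_deny else 1,
        "severity": "Medium" if is_deny else "Informational",
        # Integer epoch milliseconds; timedelta floor division avoids float rounding.
        "time": (trace["invoked_at"] - _EPOCH) // timedelta(milliseconds=1),
        "actor": {"user": {"uid": trace["principal_id"]}},
        "api": {
            "operation": trace["capability_id"],
            "service": {"name": trace.get("driver_id") or "weaver-kernel"},
        },
        "metadata": {"product": {"name": "weaver-kernel"}, "version": ocsf_version},
        "unmapped": {k: trace.get(k) for k in _UNMAPPED_KEYS},
    }
    if failed:
        event["status_detail"] = trace["error"]  # assumed already redacted upstream
    return event
```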

## Ecosystem integration patterns

These reference flows show how agent-kernel composes with neighboring Weaver
32 changes: 32 additions & 0 deletions docs/security.md
@@ -120,6 +120,38 @@ in-memory trace did not already hold and cannot widen the I-01 boundary.
The CLI exposes verification to operators: `weaver-kernel audit verify --store
audit.db` exits non-zero on any divergence (see [cli.md](cli.md)).

## What the audit trail captures (#175)

Auditability (I-02) covers authorization decisions and data-access events, not
only successful invocations. Every recorded `ActionTrace` carries an `event_type`:

- `invoke` — a capability invocation (success or driver failure).
- `expand` — a `Kernel.expand()` data-access event (more rows of a stored
handle). Expansion Frames carry the expanding principal in
`Provenance.principal_id`.
- `deny` — a `grant_capability()` rejected by policy, recorded with the stable
`reason_code` (a `DenialReason`) and a redacted reason message *before* the
`PolicyDenied` exception propagates.

So `explain()` and `query_traces()` can answer "who was refused what, when, and
why" and "which rows were expanded by whom". Expansion query arguments and denial
messages pass through the same firewall redactor as invocation args, so these new
records never make the trace store a sensitive-data sink.
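The ordering guarantee for `deny` events (record first, then raise) can be sketched as follows. Everything here is hypothetical: a single role check stands in for the policy engine, `missing_role` stands in for a `DenialReason`, the returned string is a placeholder rather than a real token, and `redact` defaults to the identity function where the real kernel would apply the firewall redactor.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


class PolicyDeniedSketch(Exception):
    """Hypothetical stand-in for PolicyDenied, carrying a stable reason code."""

    def __init__(self, reason_code: str, message: str) -> None:
        super().__init__(message)
        self.reason_code = reason_code


@dataclass(frozen=True)
class AuditEntry:
    principal_id: str
    capability_id: str
    event_type: str
    reason_code: Optional[str] = None
    error: Optional[str] = None


def grant_sketch(
    trail: List[AuditEntry],
    principal_id: str,
    capability_id: str,
    roles: Set[str],
    required_role: str,
    redact: Callable[[str], str] = lambda s: s,
) -> str:
    if required_role not in roles:
        message = f"principal {principal_id!r} lacks role {required_role!r}"
        # Append the deny record first: even if the caller swallows the
        # exception, the refusal is already in the audit trail.
        trail.append(AuditEntry(principal_id, capability_id, "deny",
                                reason_code="missing_role", error=redact(message)))
        raise PolicyDeniedSketch("missing_role", message)
    return f"token:{principal_id}:{capability_id}"  # placeholder, not a real token
```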

## Retention bounding (#182)

Long-lived processes accumulate one trace per invocation and one revocation entry
per revoked token. Both in-memory structures are bounded:

- The in-memory `TraceStore` caps at `max_entries` (default 10 000), evicting
oldest-first. Eviction discards audit data, so it is deliberately loud (a
warning on first eviction) and counted (`evicted_count`). For unbounded
retention, use a durable backend.
- Revocation state records each token's expiry and is swept for already-expired
tokens (lazily, and via `HMACTokenProvider.sweep_revocations()`). A sweep never
un-revokes a live token — only entries for tokens that already fail the expiry
check are removed.

## Security disclaimers

> **v0.1 is not production-hardened for real authentication.**
14 changes: 14 additions & 0 deletions docs/trace_export.md
@@ -134,10 +134,24 @@ envelope = export_action_traces(
}
```

The envelope also carries `event_type` (`invoke`/`expand`/`deny`) and, for
denials, a stable `reason_code` (#175), so an exported trail distinguishes
invocations, handle expansions, and policy denials.

## Stability

`TRACE_EXPORT_VERSION` is bumped only on a **breaking** change to the field
shape. New optional fields may be added without a bump, so consumers should
ignore unknown keys. Assert on `status`, `sensitivity`, and the presence of
`error` rather than on human-readable strings (the `error` text itself may
evolve).
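A consumer that follows these rules might look like this sketch, which tallies exported records by `event_type` and by the presence of `error`. It assumes the traces have already been pulled out of the envelope as a list of dicts (the envelope layout is not assumed here), and it treats a missing `event_type` as `invoke`, an assumption for exports that predate #175. Unknown keys are simply never read. `tally_exported` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable, Tuple


def tally_exported(traces: Iterable[Dict[str, Any]]) -> "Counter[Tuple[str, str]]":
    """Count records by (event_type, outcome), reading only known keys."""
    counts: "Counter[Tuple[str, str]]" = Counter()
    for record in traces:
        # Assumption: a missing event_type means an export from before #175.
        event_type = record.get("event_type", "invoke")
        # Key on the *presence* of error, never on its (evolving) text.
        outcome = "failed" if record.get("error") is not None else "ok"
        counts[(event_type, outcome)] += 1
    return counts


exported = json.loads("""[
  {"action_id": "a", "error": null, "some_future_field": 1},
  {"action_id": "b", "event_type": "deny", "reason_code": "missing_role",
   "error": "denied"}
]""")
summary = tally_exported(exported)
```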

## Related

- **Querying the trail:** `Kernel.query_traces(TraceQuery(...))` filters by
principal, capability, event type, outcome, reason code, and time window — see
[architecture.md](architecture.md#querying-the-audit-trail-177).
- **SIEM export:** `traces_to_ocsf()` renders the trail as OCSF/AOS events for a
security pipeline — see
[integrations.md](integrations.md#siem-export-ocsf--owasp-aos). The OTel
observability export is distinct from both (live spans/metrics).