14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -20,6 +20,20 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
tables (`durable_executions`, `durable_journal`, `durable_promises`, `durable_timers`) were added
as numbered migrations `097`–`100` in both `zeph-db/migrations/sqlite/` and `.../postgres/`;
`zeph-durable` owns no `.sql` files and no `sqlx::migrate!` (INV-14). (#4944)
- `feat(durable)`: added the journal payload AEAD boundary. `zeph-durable` now defines the
`PayloadCipher` seal/open trait, the `PayloadAad` location binding
(`execution_id`/`step_id`/`entry_kind`/`idem_key`) with a deterministic injective
`canonical_bytes` encoding, the `EntryKindTag` discriminator (`EntryKind::tag_enum`/`tag` now
delegate to a single source of truth), the metadata-only `CipherError` (with a fail-closed
`From<CipherError>` for `DurableError`), and the `ensure_payload_within_limit` read-side
`max_payload` guard (INV-11, no decode before the size check). `DurableConfig::encryption_gate`
enforces INV-8: AEAD may be disabled only for a single-user local backend (startup `WARN`), and
is rejected for shared-database or Restate deployments (`DurableError::EncryptionRequired`). The
concrete XChaCha20-Poly1305 cipher lives in `zeph-core::durable::XChaCha20Poly1305Cipher` (keeps
`zeph-durable` crypto-dependency-free, INV-1): fresh 192-bit CSPRNG nonce per seal (INV-7),
`key_id || nonce(24) || ciphertext || tag(16)` blob layout with a one-key rotation window, and
zeroized key material. Keyed from the vault `ZEPH_DURABLE_KEY`; see the new "Durable Journal
Encryption" security reference page for the key-rotation policy. (#4945)

### Fixed

3 changes: 3 additions & 0 deletions Cargo.lock

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -34,6 +34,7 @@ bytes = "1.11.1"
candle-core = { version = "0.10.2", default-features = false }
candle-nn = { version = "0.10.2", default-features = false }
candle-transformers = { version = "0.10.2", default-features = false }
chacha20poly1305 = "0.10.1"
chrono = { version = "0.4.44", default-features = false }
clap = "4.6.1"
cpu-time = "1.0"
@@ -162,6 +163,7 @@ zeph-config = { path = "crates/zeph-config", version = "0.21.4" }
zeph-context = { path = "crates/zeph-context", version = "0.21.4" }
zeph-core = { path = "crates/zeph-core", version = "0.21.4" }
zeph-db = { path = "crates/zeph-db", default-features = false, version = "0.21.4" }
zeph-durable = { path = "crates/zeph-durable", default-features = false, version = "0.21.4" }
zeph-experiments = { path = "crates/zeph-experiments", version = "0.21.4" }
zeph-gateway = { path = "crates/zeph-gateway", version = "0.21.4" }
zeph-index = { path = "crates/zeph-index", version = "0.21.4" }
1 change: 1 addition & 0 deletions book/src/SUMMARY.md
@@ -99,6 +99,7 @@
- [Untrusted Content Isolation](reference/security/untrusted-content-isolation.md)
- [File Read Sandbox](reference/security/file-sandbox.md)
- [ShadowSentinel Safety Probing](reference/security/shadow-sentinel.md)
- [Durable Journal Encryption](reference/security/durable-encryption.md)

# Development

77 changes: 77 additions & 0 deletions book/src/reference/security/durable-encryption.md
@@ -0,0 +1,77 @@
# Durable Journal Encryption

The durable execution layer journals the control flow of an execution — step
results, promise resolutions, and checkpoint snapshots — to a dedicated
`durable.db` database so an interrupted execution can resume rather than restart.
Those payloads can contain sensitive intermediate data, so they are sealed with
an authenticated cipher before they touch disk.

## Cipher

Payloads are encrypted with **XChaCha20-Poly1305** (AEAD), an extended-nonce
construction with a 192-bit nonce. A fresh random nonce is drawn from the
operating system CSPRNG on every seal, so no nonce-sequencing state has to be
persisted and the probability of nonce reuse under a fixed key is negligible.

The stored blob layout is:

```text
key_id(1 byte) || nonce(24 bytes) || ciphertext || Poly1305 tag(16 bytes)
```

The leading `key_id` byte selects which key decrypts the blob, enabling the
rotation window described below.
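
The layout can be illustrated with a small parser. This is a minimal sketch that only splits a blob into its parts; `SealedBlob` and `split_blob` are hypothetical names chosen for illustration, not the crate's API, and no decryption or authentication happens here.

```rust
/// Fixed field sizes taken from the blob layout above.
const KEY_ID_LEN: usize = 1;
const NONCE_LEN: usize = 24;
const TAG_LEN: usize = 16;

/// Borrowed view over the parts of a sealed blob (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct SealedBlob<'a> {
    pub key_id: u8,
    pub nonce: &'a [u8],
    pub ciphertext: &'a [u8],
    pub tag: &'a [u8],
}

/// Splits a blob into its parts. Returns `None` when the blob is too short
/// to hold the key id, the nonce, and the tag. An empty ciphertext is allowed.
pub fn split_blob(blob: &[u8]) -> Option<SealedBlob<'_>> {
    if blob.len() < KEY_ID_LEN + NONCE_LEN + TAG_LEN {
        return None;
    }
    let (head, rest) = blob.split_at(KEY_ID_LEN);
    let (nonce, rest) = rest.split_at(NONCE_LEN);
    let (ciphertext, tag) = rest.split_at(rest.len() - TAG_LEN);
    Some(SealedBlob {
        key_id: head[0],
        nonce,
        ciphertext,
        tag,
    })
}
```

The fixed overhead is therefore 41 bytes per entry (1 + 24 + 16), which is also the minimum length of a well-formed blob.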

### Associated data (tamper-evidence)

Every seal binds the payload to its journal location through the AEAD associated
data: `(execution_id, step_id, entry_kind, idempotency_key)`. A sealed payload
therefore cannot be silently relocated: moving a blob to a different step, or
replaying it under a different execution, changes the associated data and makes
decryption fail authentication. A forged or moved entry is rejected (fail-closed)
rather than decrypted into a bogus result.
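
For the binding to be sound, the associated data must be encoded so that two different locations never produce the same bytes. The sketch below shows one common way to get that property, length-prefixing every variable-length field. It is an assumption for illustration only: the `Location` type, the field types, and this particular encoding are hypothetical and may differ from the crate's actual `canonical_bytes`.

```rust
/// Hypothetical stand-in for the location tuple described above.
pub struct Location<'a> {
    pub execution_id: &'a str,
    pub step_id: &'a str,
    /// One-byte discriminator for the entry kind (assumed width).
    pub entry_kind: u8,
    pub idempotency_key: &'a str,
}

/// Appends a field as an 8-byte big-endian length followed by its bytes.
fn push_field(out: &mut Vec<u8>, field: &[u8]) {
    out.extend_from_slice(&(field.len() as u64).to_be_bytes());
    out.extend_from_slice(field);
}

/// Deterministic encoding: fixed field order, explicit lengths, fixed-width kind.
/// Without the length prefixes, ("ab", "c") and ("a", "bc") would collide.
pub fn canonical_bytes(loc: &Location<'_>) -> Vec<u8> {
    let mut out = Vec::new();
    push_field(&mut out, loc.execution_id.as_bytes());
    push_field(&mut out, loc.step_id.as_bytes());
    out.push(loc.entry_kind);
    push_field(&mut out, loc.idempotency_key.as_bytes());
    out
}
```

Plain concatenation of the fields would not be enough, because a boundary shift between `execution_id` and `step_id` would yield identical bytes and let a blob authenticate at the wrong location.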

## Vault key: `ZEPH_DURABLE_KEY`

The cipher key is resolved from the age vault under the key name
`ZEPH_DURABLE_KEY`, never from inline TOML or environment variables (the standard
Zeph vault contract). It must be exactly **32 bytes** of high-entropy key
material.

Generate and store it once:

```bash
# Generate 32 random bytes and store them in the age vault.
head -c 32 /dev/urandom | zeph vault set ZEPH_DURABLE_KEY --stdin
```

## Encryption requirement (`encrypt_payload`)

AEAD encryption is **on by default** (`[durable].encrypt_payload = true`).
Disabling it is a development-only override and is governed by the deployment:

| Deployment | `encrypt_payload = false` |
| --------------------------------------- | ------------------------- |
| Single-user **local SQLite** | Allowed; logs a startup `WARN` |
| **Shared database** (Postgres / shared) | **Forbidden** — startup error |
| **Restate** backend | **Forbidden** — startup error |

The rationale is the trust boundary: a single-user SQLite file inherits the
operating-system file permissions, but a shared or networked database does not,
so the journal must protect its own payloads there.
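
The table above reduces to a small decision function. The following is a sketch under stated assumptions: `Deployment`, `GateDecision`, `EncryptionRequired`, and `check_encryption_gate` are hypothetical names that mirror the table, not the signature of the real `DurableConfig::encryption_gate`.

```rust
/// Deployment kinds from the table above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Deployment {
    LocalSqlite,
    SharedDatabase,
    Restate,
}

/// Outcome of a successful gate check (hypothetical enum).
#[derive(Debug, PartialEq, Eq)]
pub enum GateDecision {
    /// Payloads are sealed as usual.
    Encrypted,
    /// Plaintext is tolerated; the caller should log a startup `WARN`.
    PlaintextWithWarning,
}

/// Error returned when plaintext is requested where it is forbidden.
#[derive(Debug, PartialEq, Eq)]
pub struct EncryptionRequired;

/// Applies the rule: plaintext is allowed only for single-user local SQLite.
pub fn check_encryption_gate(
    deployment: Deployment,
    encrypt_payload: bool,
) -> Result<GateDecision, EncryptionRequired> {
    match (encrypt_payload, deployment) {
        (true, _) => Ok(GateDecision::Encrypted),
        (false, Deployment::LocalSqlite) => Ok(GateDecision::PlaintextWithWarning),
        (false, Deployment::SharedDatabase | Deployment::Restate) => Err(EncryptionRequired),
    }
}
```

Running the check once at startup, before any journal write, keeps the failure fail-closed: a misconfigured shared deployment never gets the chance to persist a plaintext payload.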

## Key rotation

The `key_id` byte makes rotation possible without rewriting the journal:

1. Generate a new key and assign it the next `key_id`.
2. Run with the new key as **current** and the old key registered as the
**previous** key. New entries seal under the new key; in-flight entries sealed
under the old key still decrypt during this window.
3. Once all executions that used the old key have reached a terminal status
(drain), remove the old key.
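
The steps above can be sketched as a two-slot key ring that always seals under the current key and opens under whichever registered key matches the blob's `key_id`. `KeyRing` and its methods are hypothetical names for illustration; the sketch also omits the zeroization of key material that a real implementation would need.

```rust
/// A key id paired with 32 bytes of key material.
type KeyEntry = (u8, [u8; 32]);

/// Holds the current key and, during a rotation window, the previous key.
/// Hypothetical type: real key material should be zeroized on drop.
pub struct KeyRing {
    current: KeyEntry,
    previous: Option<KeyEntry>,
}

impl KeyRing {
    pub fn new(current: KeyEntry, previous: Option<KeyEntry>) -> Self {
        Self { current, previous }
    }

    /// New entries are always sealed under the current key.
    pub fn seal_key(&self) -> (u8, &[u8; 32]) {
        (self.current.0, &self.current.1)
    }

    /// Selects the key for a blob by its leading `key_id` byte.
    /// Returns `None` for an unknown id so the caller can fail closed.
    pub fn open_key(&self, key_id: u8) -> Option<&[u8; 32]> {
        if key_id == self.current.0 {
            return Some(&self.current.1);
        }
        match &self.previous {
            Some((id, key)) if *id == key_id => Some(key),
            _ => None,
        }
    }
}
```

Ending the rotation window corresponds to rebuilding the ring with `previous` set to `None`, after which blobs carrying the old `key_id` no longer open.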

If you prefer not to run a rotation window, the simpler drain-based policy is to
**quiesce** the durable layer (let all running executions reach a terminal
status) before swapping `ZEPH_DURABLE_KEY`. After a clean drain, no in-flight
entries sealed under the old key remain to be opened, so no previous-key window
is needed.
6 changes: 4 additions & 2 deletions crates/zeph-core/Cargo.toml
@@ -22,16 +22,17 @@ cocoon = ["zeph-llm/cocoon", "zeph-commands/cocoon"]
index = ["zeph-agent-context/index"]
metal = ["zeph-llm/metal"]
mock = ["zeph-vault/mock"]
-postgres = ["zeph-db/postgres", "zeph-agent-context/postgres", "zeph-agent-persistence/postgres"]
+postgres = ["zeph-db/postgres", "zeph-agent-context/postgres", "zeph-agent-persistence/postgres", "zeph-durable/postgres"]
profiling = ["dep:tracing-subscriber", "zeph-commands/profiling"]
profiling-alloc = ["profiling"]
scheduler = []
-sqlite = ["zeph-db/sqlite", "zeph-agent-context/sqlite", "zeph-agent-persistence/sqlite"]
+sqlite = ["zeph-db/sqlite", "zeph-agent-context/sqlite", "zeph-agent-persistence/sqlite", "zeph-durable/sqlite"]
sysinfo = ["dep:sysinfo"]

[dependencies]
base64.workspace = true
blake3.workspace = true
chacha20poly1305.workspace = true
chrono.workspace = true
cpu-time.workspace = true
dirs.workspace = true
@@ -71,6 +72,7 @@ zeph-common.workspace = true
zeph-config.workspace = true
zeph-context.workspace = true
zeph-db.workspace = true
zeph-durable.workspace = true
zeph-experiments.workspace = true
zeph-index.workspace = true
zeph-llm.workspace = true