Merged
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -88,6 +88,17 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
hard cap (100%) aborts with `DurableError::StepCapExceeded`. A resume replays folded steps from the
checkpoint snapshot — preserving each step's idempotency key for the divergence guard — without
re-running their operations. (#4948)
- `feat(durable)`: wired the durable execution layer into all mandatory integration points (spec-064
C6). A new `zeph durable` CLI group (`list`/`show`/`inspect`/`prune`/`resume`) connects directly to
`durable.db` with no running agent; output is redacted by default (INV-5) and `--reveal` decrypts
through the vault-resolved `ZEPH_DURABLE_KEY` (FR-DE-07/FR-DE-08). The `[durable]` config section is
now part of the root `Config`, the `--init` wizard generates and stores `ZEPH_DURABLE_KEY` in the
age vault (never inline), and an additive, idempotent `--migrate-config` step adds `[durable]`
(default-off) to existing configs. A ratatui `DurableView` (command-palette `durable`, `D` key)
shows in-flight executions with mandatory status spinners (spec-011), fed by a read-only poll task.
The pure-data `DurableConfig`/`RetentionPolicy`/`DurableBackend` moved to `zeph-config` (single
source of truth, re-exported by `zeph-durable`); the AEAD enforcement gate `encryption_gate` is now
a free function in `zeph-durable`. (#4949)

### Fixed

2 changes: 2 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions Cargo.toml
@@ -262,6 +262,7 @@ sqlite = [
"zeph-tools/sqlite",
"zeph-scheduler?/sqlite",
"zeph-core/sqlite",
"zeph-durable/sqlite",
]
postgres = [
"zeph-db/postgres",
@@ -272,6 +273,7 @@ postgres = [
"zeph-tools/postgres",
"zeph-scheduler?/postgres",
"zeph-core/postgres",
"zeph-durable/postgres",
]

[dependencies]
@@ -322,6 +324,7 @@ zeph-config.workspace = true
zeph-context.workspace = true
zeph-core.workspace = true
zeph-db.workspace = true
zeph-durable.workspace = true
zeph-experiments.workspace = true
zeph-gateway = { workspace = true, optional = true }
zeph-index.workspace = true
23 changes: 23 additions & 0 deletions book/src/reference/cli.md
@@ -24,6 +24,7 @@ zeph [OPTIONS] [COMMAND]
| `sessions` | Manage ACP session history — list, show, delete (requires `acp` feature) |
| `schedule` | Manage cron-based scheduled jobs — list, add, remove, show (requires `scheduler` feature; see [Scheduler](../concepts/scheduler.md)) |
| `db` | Database management — run migrations, check status (see [Database Abstraction](../concepts/database.md)) |
| `durable` | Inspect the durable execution journal — list, show, inspect, prune, resume (see [Durable Journal Encryption](security/durable-encryption.md)) |
| `migrate-config` | Add missing config parameters as commented-out blocks and reformat the file (see [Migrate Config](../guides/migrate-config.md)) |
| `worktree` | Manage background sub-agent git worktrees — list active, remove stale (requires `[worktree] enabled = true`; see [Worktree Isolation](../guides/worktree.md)) |

@@ -43,6 +44,28 @@ zeph db migrate # apply pending migrations
zeph db migrate --status # check what would be applied
```

### `zeph durable`

Inspect the durable execution journal directly — no running agent process is
required. Output is **redacted by default** (INV-5): payload bytes and resolver
tokens are shown only with `--reveal`, which decrypts through the vault-resolved
`ZEPH_DURABLE_KEY`.

| Subcommand | Description |
|------------|-------------|
| `durable list [--status <s>] [--kind <k>] [--limit <n>]` | List executions, newest first |
| `durable show <id> [--reveal]` | Show an execution's journal entries (metadata only by default) |
| `durable inspect <id> --step <n> [--reveal]` | Inspect a single step entry |
| `durable prune [--dry-run]` | Sweep terminal executions past their TTL |
| `durable resume <id>` | Report resume state for an execution |

```bash
zeph durable list --status running # in-flight executions
zeph durable show <uuid> # redacted journal entries
zeph durable show <uuid> --reveal # decrypted payloads (prints a warning)
zeph durable prune --dry-run # how many would be pruned
```
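
The redacted-by-default behaviour can be pictured as a display gate in front of the payload bytes. The following is a minimal sketch under stated assumptions, not the code in this PR: the `Entry` struct, its field names, and the `render` function are hypothetical, and the decryption step that `--reveal` performs through `ZEPH_DURABLE_KEY` is left out entirely.

```rust
/// Hypothetical view of one journal entry; the field names are illustrative
/// and are not taken from `zeph-durable`.
pub struct Entry {
    pub step: u32,
    pub kind: String,
    pub payload: Vec<u8>,
}

/// Render an entry for display. Metadata is always shown; payload bytes are
/// shown (hex-encoded) only when `reveal` is true.
pub fn render(entry: &Entry, reveal: bool) -> String {
    let payload = if reveal {
        // Hex-encode so arbitrary bytes print safely on a terminal.
        entry
            .payload
            .iter()
            .map(|b| format!("{:02x}", b))
            .collect::<String>()
    } else {
        // Redacted form: only the payload length leaks.
        format!("<redacted, {} bytes>", entry.payload.len())
    };
    format!("step={} kind={} payload={}", entry.step, entry.kind, payload)
}
```

Calling `render(&entry, false)` corresponds to plain `zeph durable show <uuid>`, and `render(&entry, true)` to the `--reveal` variant. Keeping the gate in one function makes it harder to print payloads by accident from a new subcommand.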

### `zeph init`

Generate a `config.toml` through a guided wizard.
26 changes: 26 additions & 0 deletions book/src/reference/configuration.md
@@ -794,6 +794,32 @@ self_check = false # Enable MARCH Proposer+Checker self-
[cli.loop]
min_interval_secs = 5 # Minimum loop interval in seconds (default: 5)
max_iterations = 1000 # Max repetitions before loop auto-stops (default: 1000)

# Durable execution layer (spec-064). Opt-in, default-off. When enabled, the
# agent journals control flow to a dedicated durable.db for crash-resume.
# The AEAD key is vault-only (ZEPH_DURABLE_KEY); see Durable Journal Encryption.
[durable]
enabled = false # Master opt-in (default: false — current behavior)
backend = "local" # "local" (durable.db) or "restate" (server feature)
encrypt_payload = true # AEAD-encrypt payloads (disabling is a dev-only override; see security docs)
agent_turns = true # Wrap agent-loop steps when enabled
orchestration = true # Journal /plan resume replan budget when enabled
scheduler = true # Exactly-once scheduler job fire when enabled
subagent = true # Durable promise for subagent spawn/await when enabled
journal_flush_interval_ms = 10 # Group-commit interval for buffered appends (ms)
journal_ack_timeout_ms = 5000 # Acknowledged-append timeout before non-durable degrade (ms)
max_steps_per_execution = 10000 # In-execution step cap (soft fold 90%, hard abort 100%)
max_payload_bytes = 1048576 # Max payload size, enforced on append + read (1 MiB)
promise_poll_interval_secs = 2 # DB fallback poll interval for parked promises (s)
max_parked_promises = 1000 # Above this, promise resolution falls back to polling

[durable.retention]
ttl_completed_secs = 604800 # Prune completed executions older than this (7 days)
ttl_failed_secs = 2592000 # Prune failed/aborted executions older than this (30 days)
max_executions = 10000 # LRU cap on stored executions
max_journal_bytes = 1073741824 # Journal size cap in bytes (1 GiB)
prune_batch_size = 500 # Rows deleted per transaction during a sweep
prune_interval_secs = 3600 # Background prune poll interval (s)
```
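
Two of the numeric rules above are easy to misread, so here is a rough sketch of how they could be evaluated: the step cap (soft fold at 90%, hard abort at 100% of `max_steps_per_execution`) and the TTL check behind `ttl_completed_secs` / `ttl_failed_secs`. The function names, the `StepCap` enum, and the exact boundary choices (`>=` for the cap, strictly older than the TTL for pruning) are assumptions made for illustration; the actual logic lives in `zeph-durable` and may differ.

```rust
/// Hypothetical outcome of a step-cap check; not a type from `zeph-durable`.
#[derive(Debug, PartialEq, Eq)]
pub enum StepCap {
    /// Below the soft threshold: keep appending steps.
    Within,
    /// At or above 90% of the cap: fold older steps into a checkpoint.
    SoftFold,
    /// At or above 100% of the cap: abort the execution.
    HardAbort,
}

/// Classify the current step count against the configured cap.
/// Assumption: both thresholds are inclusive.
pub fn check_step_cap(steps: u64, max_steps: u64) -> StepCap {
    // 90% of the cap in integer math; saturating to avoid overflow.
    let soft = max_steps.saturating_mul(9) / 10;
    if steps >= max_steps {
        StepCap::HardAbort
    } else if steps >= soft {
        StepCap::SoftFold
    } else {
        StepCap::Within
    }
}

/// True when a terminal execution is older than its TTL.
/// Assumption: "older than" means strictly greater than `ttl_secs`.
pub fn is_expired(finished_at_secs: u64, now_secs: u64, ttl_secs: u64) -> bool {
    now_secs.saturating_sub(finished_at_secs) > ttl_secs
}
```

With the defaults shown above, `check_step_cap(9_000, 10_000)` yields `StepCap::SoftFold`, and a completed execution becomes prunable once more than 604800 seconds (7 days) have passed since it finished.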

### Provider Entry Fields
15 changes: 10 additions & 5 deletions book/src/reference/security/durable-encryption.md
@@ -35,16 +35,21 @@ rather than decrypted into a bogus result.

The cipher key is resolved from the age vault under the key name
`ZEPH_DURABLE_KEY`, never from inline TOML or environment variables (the standard
-Zeph vault contract). It must be exactly **32 bytes** of high-entropy key
-material.
Zeph vault contract). It is exactly **32 bytes** of high-entropy key material,
**base64-encoded** for storage as a vault string value.

-Generate and store it once:
The easiest path is the configuration wizard: `zeph --init` generates a fresh
key and stores it in the age vault automatically when you enable durable
execution. To generate and store it manually instead:

```bash
-# Generate 32 random bytes and store them in the age vault.
-head -c 32 /dev/urandom | zeph vault set ZEPH_DURABLE_KEY --stdin
# Generate 32 random bytes, base64-encode them, and store in the age vault.
head -c 32 /dev/urandom | base64 | zeph vault set ZEPH_DURABLE_KEY --stdin
```

Inspect a journal with decrypted payloads using `zeph durable show <id>
--reveal`, which resolves and decodes this key.
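
To make the "32 bytes, base64-encoded" requirement concrete, the sketch below decodes a vault string and checks the decoded length. It is an illustration only: `decode_base64` and `parse_durable_key` are hypothetical helpers written against the standard library, not functions from `zeph-durable`, and the decoder is deliberately lenient (it stops at the first `=` and does not validate padding).

```rust
/// Decode standard-alphabet base64 (RFC 4648). Surrounding whitespace, such as
/// the trailing newline emitted by the `base64` CLI, is trimmed first.
/// Lenient by design: decoding stops at the first `=`.
pub fn decode_base64(input: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits currently in `buf`
    for c in input.trim().bytes() {
        let v = match c {
            b'A'..=b'Z' => c - b'A',
            b'a'..=b'z' => c - b'a' + 26,
            b'0'..=b'9' => c - b'0' + 52,
            b'+' => 62,
            b'/' => 63,
            b'=' => break, // padding: no more data follows
            _ => return None, // any other character is invalid
        };
        buf = (buf << 6) | u32::from(v);
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buf >> bits) as u8);
            buf &= (1 << bits) - 1; // keep only the leftover low bits
        }
    }
    Some(out)
}

/// Parse a vault string into a 32-byte key, rejecting any other length.
pub fn parse_durable_key(encoded: &str) -> Result<[u8; 32], String> {
    let bytes = decode_base64(encoded).ok_or_else(|| "not valid base64".to_string())?;
    if bytes.len() != 32 {
        return Err(format!("expected 32 bytes, got {}", bytes.len()));
    }
    let mut key = [0u8; 32];
    key.copy_from_slice(&bytes);
    Ok(key)
}
```

A value produced by `head -c 32 /dev/urandom | base64` is 44 characters long (43 data characters plus one `=`) and passes this check. A key stored as raw bytes, or one of the wrong length, is rejected before it could reach the cipher.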

## Encryption requirement (`encrypt_payload`)

AEAD encryption is **on by default** (`[durable].encrypt_payload = true`).
47 changes: 47 additions & 0 deletions config/default.toml
@@ -1545,3 +1545,50 @@ provider_persistence = true
# command = "osascript -e 'display notification \"Task complete\" with title \"Zeph\"'"
# timeout_secs = 3
# fail_closed = false

# ----------------------------------------------------------------------------
# Durable execution layer (spec-064). Opt-in, default-off. When enabled, the
# agent journals control flow to a dedicated durable.db so a crashed or
# interrupted execution can resume at the point of failure instead of
# restarting. Inspect with `zeph durable`. The AEAD key is vault-only
# (ZEPH_DURABLE_KEY) and never written inline here.
# ----------------------------------------------------------------------------
[durable]
# Master opt-in. false = current behavior, no journal opened.
enabled = false
# Journal backend: "local" (dedicated durable.db) | "restate" (server feature).
backend = "local"
# Encrypt payloads with AEAD. Disabling is a dev-only override (forbidden for
# non-local backends and shared databases).
encrypt_payload = true
# Per-adapter opt-in (only take effect when enabled = true).
agent_turns = true
orchestration = true
scheduler = true
subagent = true
# Group-commit interval for buffered appends (ms).
journal_flush_interval_ms = 10
# Acknowledged-append timeout before degrading to non-durable mode (ms).
journal_ack_timeout_ms = 5000
# In-execution step cap (soft fold at 90%, hard abort at 100%).
max_steps_per_execution = 10000
# Maximum payload size in bytes, enforced on append and read (1 MiB).
max_payload_bytes = 1048576
# Database fallback poll interval for parked promises (seconds).
promise_poll_interval_secs = 2
# Above this many parked promises, resolution falls back to pure polling.
max_parked_promises = 1000

[durable.retention]
# Prune completed executions older than this (seconds, 7 days).
ttl_completed_secs = 604800
# Prune failed/aborted executions older than this (seconds, 30 days).
ttl_failed_secs = 2592000
# LRU cap on stored executions.
max_executions = 10000
# Size cap on the journal in bytes (1 GiB).
max_journal_bytes = 1073741824
# Rows deleted per transaction during a prune sweep.
prune_batch_size = 500
# Background prune poll interval (seconds).
prune_interval_secs = 3600