Merged
15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,21 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

## [Unreleased]

### Added

- `feat(durable)`: scaffolded the new Layer-0 `zeph-durable` crate (spec-064) — the foundation of
the native durable execution layer. This first slice is type-level only, with no runtime
behavior: journal-boundary newtypes (`ExecutionId`/`PromiseId`/`TimerId` as UUIDv7, `StepId`,
`JournalSeq`, `IdempotencyKey`, plus the `ExecutionKind` discriminator), the `Journal` trait and
its `JournalEntry`/`EntryKind`/`ExecutionStatus` data model, the `EffectClass` side-effect
contract, the pure-data `DurableConfig`/`RetentionPolicy` mirroring `[durable]` TOML (all
spec-default-backed), and the `DurableError` type. `IdempotencyKey::derive` uses BLAKE3
`derive_key` with a domain-separation context and length-delimited (injective) input. The crate
is pure infrastructure with no business-layer dependencies (INV-1). The four `durable_*` schema
tables (`durable_executions`, `durable_journal`, `durable_promises`, `durable_timers`) were added
as numbered migrations `097`–`100` in both `zeph-db/migrations/sqlite/` and `.../postgres/`;
`zeph-durable` owns no `.sql` files and no `sqlx::migrate!` (INV-14). (#4944)

### Fixed

- `perf(acp)`: `list_directory` and `find_path` tools now offload their synchronous
14 changes: 14 additions & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions Cargo.toml
@@ -30,6 +30,7 @@ base64 = "0.22.1"
bech32 = "0.11.1"
blake3 = "1.8.5"
bytemuck = "1.25"
bytes = "1.11.1"
candle-core = { version = "0.10.2", default-features = false }
candle-nn = { version = "0.10.2", default-features = false }
candle-transformers = { version = "0.10.2", default-features = false }
13 changes: 13 additions & 0 deletions crates/zeph-db/migrations/postgres/097_durable_executions.sql
@@ -0,0 +1,13 @@
-- Durable execution layer (spec-064, #4944): one row per durable execution.
-- Applied via zeph_db::run_migrations against the dedicated durable pool (INV-14).
-- The owning zeph-durable crate holds no .sql files and no sqlx::migrate!.
CREATE TABLE durable_executions (
    execution_id TEXT PRIMARY KEY,
    kind TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('running', 'completed', 'failed', 'aborted')),
    created_at BIGINT NOT NULL,
    updated_at BIGINT NOT NULL,
    finalized_at BIGINT -- NULL until terminal; drives retention.
);

CREATE INDEX idx_durable_exec_status_time ON durable_executions(status, finalized_at);
27 changes: 27 additions & 0 deletions crates/zeph-db/migrations/postgres/098_durable_journal.sql
@@ -0,0 +1,27 @@
-- Durable execution layer (spec-064, #4944): append-only journal entries.
-- payload holds AEAD-sealed bytes (nonce || ciphertext || tag); control entries leave it NULL.
CREATE TABLE durable_journal (
    seq BIGSERIAL PRIMARY KEY, -- global append order (durability anchor)
    execution_id TEXT NOT NULL REFERENCES durable_executions(execution_id),
    step_id BIGINT NOT NULL,
    entry_kind TEXT NOT NULL,
    idem_key BYTEA, -- IdempotencyKey (32B); NULL for non-step entries
    effect_class TEXT,
    payload BYTEA, -- AEAD-sealed; NULL for control entries
    payload_version INTEGER,
    hmac BYTEA, -- row-level HMAC for shared-DB / Restate
    created_at BIGINT NOT NULL
);

CREATE INDEX idx_durable_journal_exec_step
    ON durable_journal(execution_id, step_id, seq);

-- Enforce at most one committed result per step (defense in depth alongside the writer).
CREATE UNIQUE INDEX idx_durable_journal_result
    ON durable_journal(execution_id, step_id)
    WHERE entry_kind = 'step_result';

-- Efficient exactly-once intent lookup ("does this intent already exist?").
CREATE INDEX idx_durable_journal_idem_key
    ON durable_journal(execution_id, idem_key)
    WHERE idem_key IS NOT NULL;
11 changes: 11 additions & 0 deletions crates/zeph-db/migrations/postgres/099_durable_promises.sql
@@ -0,0 +1,11 @@
-- Durable execution layer (spec-064, #4944): external-completion handles.
-- The 32-byte resolver token is never stored; only its BLAKE3 hash (INV-9).
CREATE TABLE durable_promises (
    promise_id TEXT PRIMARY KEY,
    execution_id TEXT NOT NULL REFERENCES durable_executions(execution_id),
    resolver_token_hash BYTEA NOT NULL,
    resolved INTEGER NOT NULL DEFAULT 0,
    payload BYTEA,
    created_at BIGINT NOT NULL,
    resolved_at BIGINT
);
10 changes: 10 additions & 0 deletions crates/zeph-db/migrations/postgres/100_durable_timers.sql
@@ -0,0 +1,10 @@
-- Durable execution layer (spec-064, #4944): durable wakes persisted across restarts.
CREATE TABLE durable_timers (
    timer_id TEXT PRIMARY KEY,
    execution_id TEXT NOT NULL REFERENCES durable_executions(execution_id),
    due_at BIGINT NOT NULL,
    fired INTEGER NOT NULL DEFAULT 0,
    created_at BIGINT NOT NULL
);

CREATE INDEX idx_durable_timers_due ON durable_timers(fired, due_at);
13 changes: 13 additions & 0 deletions crates/zeph-db/migrations/sqlite/097_durable_executions.sql
@@ -0,0 +1,13 @@
-- Durable execution layer (spec-064, #4944): one row per durable execution.
-- Applied via zeph_db::run_migrations against the dedicated durable.db pool (INV-14).
-- The owning zeph-durable crate holds no .sql files and no sqlx::migrate!.
CREATE TABLE durable_executions (
    execution_id TEXT PRIMARY KEY,
    kind TEXT NOT NULL,
    status TEXT NOT NULL CHECK(status IN ('running', 'completed', 'failed', 'aborted')),
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL,
    finalized_at INTEGER -- NULL until terminal; drives retention.
);

CREATE INDEX idx_durable_exec_status_time ON durable_executions(status, finalized_at);
27 changes: 27 additions & 0 deletions crates/zeph-db/migrations/sqlite/098_durable_journal.sql
@@ -0,0 +1,27 @@
-- Durable execution layer (spec-064, #4944): append-only journal entries.
-- payload holds AEAD-sealed bytes (nonce || ciphertext || tag); control entries leave it NULL.
CREATE TABLE durable_journal (
    seq INTEGER PRIMARY KEY AUTOINCREMENT, -- global append order (durability anchor)
    execution_id TEXT NOT NULL REFERENCES durable_executions(execution_id),
    step_id INTEGER NOT NULL,
    entry_kind TEXT NOT NULL,
    idem_key BLOB, -- IdempotencyKey (32B); NULL for non-step entries
    effect_class TEXT,
    payload BLOB, -- AEAD-sealed; NULL for control entries
    payload_version INTEGER,
    hmac BLOB, -- row-level HMAC for shared-DB / Restate
    created_at INTEGER NOT NULL
);

CREATE INDEX idx_durable_journal_exec_step
    ON durable_journal(execution_id, step_id, seq);

-- Enforce at most one committed result per step (defense in depth alongside the writer).
CREATE UNIQUE INDEX idx_durable_journal_result
    ON durable_journal(execution_id, step_id)
    WHERE entry_kind = 'step_result';

-- Efficient exactly-once intent lookup ("does this intent already exist?").
CREATE INDEX idx_durable_journal_idem_key
    ON durable_journal(execution_id, idem_key)
    WHERE idem_key IS NOT NULL;
11 changes: 11 additions & 0 deletions crates/zeph-db/migrations/sqlite/099_durable_promises.sql
@@ -0,0 +1,11 @@
-- Durable execution layer (spec-064, #4944): external-completion handles.
-- The 32-byte resolver token is never stored; only its BLAKE3 hash (INV-9).
CREATE TABLE durable_promises (
    promise_id TEXT PRIMARY KEY,
    execution_id TEXT NOT NULL REFERENCES durable_executions(execution_id),
    resolver_token_hash BLOB NOT NULL,
    resolved INTEGER NOT NULL DEFAULT 0,
    payload BLOB,
    created_at INTEGER NOT NULL,
    resolved_at INTEGER
);
10 changes: 10 additions & 0 deletions crates/zeph-db/migrations/sqlite/100_durable_timers.sql
@@ -0,0 +1,10 @@
-- Durable execution layer (spec-064, #4944): durable wakes persisted across restarts.
CREATE TABLE durable_timers (
    timer_id TEXT PRIMARY KEY,
    execution_id TEXT NOT NULL REFERENCES durable_executions(execution_id),
    due_at INTEGER NOT NULL,
    fired INTEGER NOT NULL DEFAULT 0,
    created_at INTEGER NOT NULL
);

CREATE INDEX idx_durable_timers_due ON durable_timers(fired, due_at);
35 changes: 35 additions & 0 deletions crates/zeph-durable/Cargo.toml
@@ -0,0 +1,35 @@
[package]
name = "zeph-durable"
version.workspace = true
edition.workspace = true
authors.workspace = true
license.workspace = true
repository.workspace = true
homepage.workspace = true
keywords.workspace = true
categories.workspace = true
publish.workspace = true
description = "Native durable execution layer for Zeph: journaled control flow with crash-resume"
readme = "README.md"

[features]
# Backend selection forwarded to zeph-db (mutually exclusive, mirrors zeph-scheduler).
# default = ["sqlite"] keeps standalone `cargo check -p zeph-durable` and rust-analyzer working.
default = ["sqlite"]
sqlite = ["zeph-db/sqlite"]
postgres = ["zeph-db/postgres"]

[dependencies]
blake3.workspace = true
bytes.workspace = true
serde = { workspace = true, features = ["derive"] }
thiserror.workspace = true
uuid = { workspace = true, features = ["serde", "v7"] }
zeph-db.workspace = true

[dev-dependencies]
serde_json.workspace = true
toml.workspace = true

[lints]
workspace = true
121 changes: 121 additions & 0 deletions crates/zeph-durable/README.md
@@ -0,0 +1,121 @@
# zeph-durable

[![Crates.io](https://img.shields.io/crates/v/zeph-durable)](https://crates.io/crates/zeph-durable)
[![docs.rs](https://img.shields.io/docsrs/zeph-durable)](https://docs.rs/zeph-durable)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](../../LICENSE)
[![MSRV](https://img.shields.io/badge/MSRV-1.95-blue)](https://www.rust-lang.org)

Native durable execution layer for [Zeph](https://github.com/bug-ops/zeph) — journals the *control
flow* of an execution (steps, promises, timers) so a crashed or interrupted run can resume at the
point of failure instead of restarting from scratch.

> [!IMPORTANT]
> This crate is a **foundational scaffold** (spec-064, issue #4944). It currently exposes
> *type-level* building blocks only — there is **no execution behavior yet**. The journal writer,
> execution backends, replay cursor, and the durable step primitive land in follow-up issues of
> epic [#4707](https://github.com/bug-ops/zeph/issues/4707).

## Overview

`zeph-durable` is a Layer-0 infrastructure crate, analogous to `zeph-db` and `zeph-common`. It is a
pure infrastructure primitive: it sees opaque serialized payloads, never domain types. Domain
meaning lives in thin adapter modules inside each consuming crate (the agent tool-loop,
orchestration, scheduler, and subagent layers).

The eventual design provides a `DurableContext` facade (`step()` / `parallel()` / `promise()` /
`sleep_until()`), an explicit `EffectClass` contract per step, a background journal-writer actor
with group-commit, AEAD payload encryption, and a fingerprint-guarded replay cursor — all backed by
a dedicated `durable.db` (SQLite) or a feature-gated Restate backend.

## Key Modules

- **ids** — journal-boundary newtypes: `ExecutionId` / `PromiseId` / `TimerId` (UUIDv7), `StepId`,
`JournalSeq`, `IdempotencyKey`, and the `ExecutionKind` discriminator. Private fields, smart
constructors, serde-round-trip stable.
- **journal** — the `Journal` trait plus its data model: `JournalEntry`, the closed `EntryKind`
enum, and `ExecutionStatus`.
- **effect** — `EffectClass`, the per-step side-effect contract (`Idempotent` / `AtLeastOnce` /
`ExactlyOnceGuarded`).
- **config** — pure-data `DurableConfig` and `RetentionPolicy` mirroring the `[durable]` TOML
section, with spec defaults applied on deserialization.
- **error** — the crate-wide `DurableError`.

## Architecture & invariants

- **Layer 0, no business-logic dependencies (INV-1).** `zeph-durable` MUST NOT depend on
`zeph-llm`, `zeph-memory`, `zeph-core`, `zeph-sanitizer`, or any business-layer crate. In this
slice its only workspace dependency is `zeph-db`; the remaining dependencies (`blake3`, `bytes`,
`serde`, `thiserror`, `uuid`) are third-party crates.
- **Closed enums make illegal states unrepresentable.** Control entries (`EffectIntent`,
`PromiseCreated`, `TimerArmed`) carry no payload field — a "control entry with payload" cannot be
constructed.
- **Domain-separated idempotency keys.** `IdempotencyKey::derive` uses BLAKE3 `derive_key` with a
fixed context string and length-delimited (injective) input, so an attacker-controlled
fingerprint cannot collide with a different `(execution_id, step_id)` pair.
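
The closed-enum invariant above can be illustrated with a minimal sketch. The variant names `EffectIntent`, `PromiseCreated`, and `TimerArmed` come from this README; the `StepResult` variant, its `payload` field, and the `payload()` helper are assumptions made for illustration and are not the crate's actual definitions.

```rust
/// Hypothetical stand-in for the crate's `EntryKind`; the shapes are assumed.
#[derive(Debug, Clone, PartialEq, Eq)]
enum EntryKind {
    // Control entries: no payload field exists, so none can be attached.
    EffectIntent,
    PromiseCreated,
    TimerArmed,
    // Assumed data-carrying variant holding opaque, already-sealed bytes.
    StepResult { payload: Vec<u8> },
}

impl EntryKind {
    /// Returns the payload for data entries and `None` for control entries.
    fn payload(&self) -> Option<&[u8]> {
        match self {
            EntryKind::StepResult { payload } => Some(payload),
            EntryKind::EffectIntent
            | EntryKind::PromiseCreated
            | EntryKind::TimerArmed => None,
        }
    }
}

fn main() {
    let control = EntryKind::TimerArmed;
    let data = EntryKind::StepResult { payload: vec![1, 2, 3] };
    assert_eq!(control.payload(), None);
    assert_eq!(data.payload(), Some(&[1u8, 2, 3][..]));
}
```

Because the match is exhaustive, adding a new variant forces every such accessor to state explicitly whether that variant carries a payload.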

> [!NOTE]
> **Schema ownership (INV-14).** `zeph-durable` owns **no** `.sql` files and **no**
> `sqlx::migrate!`. The four `durable_*` tables (`durable_executions`, `durable_journal`,
> `durable_promises`, `durable_timers`) live as numbered migrations in
> `zeph-db/migrations/{sqlite,postgres}/` and are applied via `zeph_db::run_migrations` against a
> dedicated `durable.db` pool.

## Installation

This crate is an internal workspace member of Zeph. To use it from another workspace crate:

```toml
[dependencies]
zeph-durable = { path = "../zeph-durable" }
# or, with the postgres backend instead (use one entry, not both):
# zeph-durable = { path = "../zeph-durable", default-features = false, features = ["postgres"] }
```

## Feature Flags

Backend selection is forwarded to `zeph-db`; exactly one backend is active at a time.

| Feature | Description | Default |
|---------|-------------|---------|
| `sqlite` | Enables the SQLite backend via `zeph-db/sqlite` | Yes |
| `postgres` | Enables the PostgreSQL backend via `zeph-db/postgres` | No |

> [!WARNING]
> `sqlite` and `postgres` are mutually exclusive (enforced by `zeph-db`). Building with
> `--all-features` is intentionally unsupported — use `--features full` or `--features full,postgres`.

## Usage

Idempotency keys are deterministic for a given `(execution, step, fingerprint)` and domain-separated
from any other BLAKE3 use:

```rust
use zeph_durable::{ExecutionId, IdempotencyKey, StepId};

let execution = ExecutionId::new(); // fresh, time-ordered UUIDv7

let key = IdempotencyKey::derive(execution, StepId::new(0), b"tool:read_file");
assert_eq!(
    key,
    IdempotencyKey::derive(execution, StepId::new(0), b"tool:read_file"),
);
```
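
Why length-delimited input matters for key derivation can be seen in a small standard-library sketch. The `encode_fields` helper and its 8-byte little-endian length prefix are assumptions for illustration; the crate's actual framing is not shown in this README, and the BLAKE3 `derive_key` step is omitted entirely.

```rust
/// Hypothetical helper: prefix each field with its length (8 little-endian bytes).
fn encode_fields(fields: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for field in fields {
        out.extend_from_slice(&(field.len() as u64).to_le_bytes());
        out.extend_from_slice(field);
    }
    out
}

/// Naive concatenation without field boundaries, shown only for contrast.
fn concat_fields(fields: &[&[u8]]) -> Vec<u8> {
    fields.concat()
}

fn main() {
    let a: [&[u8]; 2] = [b"ab", b"c"];
    let b: [&[u8]; 2] = [b"a", b"bc"];

    // Plain concatenation loses the field boundary: both inputs become "abc".
    assert_eq!(concat_fields(&a), concat_fields(&b));

    // Length prefixes keep the boundary, so distinct inputs stay distinct.
    assert_ne!(encode_fields(&a), encode_fields(&b));
}
```

With an injective encoding like this, two different field tuples can never feed identical bytes into the hash, which is the property the README relies on.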

Configuration deserializes from the `[durable]` TOML table with every field defaulted to its spec
value:

```rust
use zeph_durable::DurableConfig;

let cfg: DurableConfig = toml::from_str("").unwrap(); // empty table => all defaults
assert!(!cfg.enabled);
assert_eq!(cfg.journal_ack_timeout_ms, 5_000);
assert_eq!(cfg.max_payload_bytes, 1_048_576);
```
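
One common way to back every field with a default is a hand-written `Default` implementation, sketched below with the standard library only. The struct name `DurableConfigSketch` and the field types are assumptions; only the three field names and their default values are taken from the assertions above, and the real `DurableConfig` may define its defaults differently.

```rust
/// Hypothetical mirror of three `DurableConfig` fields; not the crate's type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct DurableConfigSketch {
    enabled: bool,
    journal_ack_timeout_ms: u64, // assumed integer width
    max_payload_bytes: usize,    // assumed integer width
}

impl Default for DurableConfigSketch {
    fn default() -> Self {
        Self {
            enabled: false,
            journal_ack_timeout_ms: 5_000,
            max_payload_bytes: 1_048_576, // 1 MiB
        }
    }
}

fn main() {
    // Override one field and take the rest from the defaults.
    let cfg = DurableConfigSketch { enabled: true, ..Default::default() };
    assert!(cfg.enabled);
    assert_eq!(cfg.journal_ack_timeout_ms, 5_000);
    assert_eq!(cfg.max_payload_bytes, 1_048_576);
}
```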

## MSRV

Rust **1.95** (Edition 2024, resolver 3).

## License

MIT — see [LICENSE](../../LICENSE).