Merged
13 changes: 13 additions & 0 deletions CHANGELOG.md
@@ -34,6 +34,19 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
`key_id || nonce(24) || ciphertext || tag(16)` blob layout with a one-key rotation window, and
zeroized key material. Keyed from the vault `ZEPH_DURABLE_KEY`; see the new "Durable Journal
Encryption" security reference page for the key-rotation policy. (#4945)
- `feat(durable)`: added the durable persistence engine to `zeph-durable`. `LocalBackend` owns a
dedicated `durable.db` pool (INV-14, schema applied via `zeph_db::run_migrations`), implements the
`Journal` trait (append, full and ranged reads, `finalize`, and a `prune` stub), AEAD-seals
payload-bearing entries through the injected `Option<Arc<dyn PayloadCipher>>` with the entry's
location bound as associated data, and stamps a keyed-BLAKE3 row HMAC over control entries when a
key is configured. The background `JournalWriter` actor decouples writes from the calling path:
buffered appends group-commit on a flush interval (dropped with a `WARN` under backpressure),
exactly-once appends flush all causally-preceding entries before committing (INV-4) and return
their `JournalSeq` over a oneshot bounded by `journal_ack_timeout_ms` (INV-12, FR-DE-11), and the
writer resumes from `MAX(seq)` on restart (FR-DE-12). The sealed `ExecutionBackend` trait and the
`DurableBackendEnum` enum dispatcher keep the backend surface closed with no `Box<dyn>` on the
hot path. Promise, timer, and checkpoint journal entries fail closed
(`DurableError::UnsupportedEntryKind`) until the promise/timer layer lands. (#4946)
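
The buffered versus exactly-once behavior described in the #4946 entry above can be modeled without any async machinery. The sketch below is an illustration only: `ModelWriter`, its fields, and the `capacity` parameter are hypothetical names that do not come from `zeph-durable`, a plain `Vec` stands in for `durable.db`, and starting an empty journal at sequence 1 is an assumption.

```rust
/// Toy, synchronous model of the writer semantics in the changelog entry.
/// Every name here is hypothetical; the real `JournalWriter` is an async actor.
#[derive(Debug)]
struct ModelWriter {
    /// Committed rows as (seq, payload); stands in for the journal table.
    committed: Vec<(u64, String)>,
    /// Buffered appends waiting for the next group commit.
    pending: Vec<String>,
    /// Maximum number of buffered appends held before new ones are dropped.
    capacity: usize,
    /// Number of buffered appends dropped under backpressure.
    dropped: u64,
}

impl ModelWriter {
    /// Resume over existing rows, so numbering continues after MAX(seq).
    fn resume(committed: Vec<(u64, String)>, capacity: usize) -> Self {
        Self { committed, pending: Vec::new(), capacity, dropped: 0 }
    }

    /// Next sequence number: MAX(seq) + 1, or 1 for an empty journal (assumed).
    fn next_seq(&self) -> u64 {
        self.committed
            .iter()
            .map(|(seq, _)| *seq)
            .max()
            .map_or(1, |max| max + 1)
    }

    /// Buffered append: queued for the next group commit, dropped when full.
    fn append_buffered(&mut self, payload: &str) -> bool {
        if self.pending.len() >= self.capacity {
            self.dropped += 1; // the changelog says the real writer logs a WARN
            return false;
        }
        self.pending.push(payload.to_owned());
        true
    }

    /// Group commit: persist every pending entry in arrival order.
    fn flush(&mut self) {
        for payload in std::mem::take(&mut self.pending) {
            let seq = self.next_seq();
            self.committed.push((seq, payload));
        }
    }

    /// Exactly-once append: flush everything queued before it, then commit
    /// the entry itself and hand its sequence number back to the caller.
    fn append_exactly_once(&mut self, payload: &str) -> u64 {
        self.flush();
        let seq = self.next_seq();
        self.committed.push((seq, payload.to_owned()));
        seq
    }
}
```

In this model, resuming over a journal whose highest committed sequence is 7 with a `capacity` of 2, two buffered appends followed by `append_exactly_once` return 10, because the buffered entries are committed first as 8 and 9.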

### Fixed

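The `key_id || nonce(24) || ciphertext || tag(16)` blob layout named in the #4945 changelog entry can be split with plain slice arithmetic. This is a rough sketch under stated assumptions: the changelog does not give the width of `key_id`, so it is passed in as the hypothetical `key_id_len` parameter, and `SealedBlob` and `split_blob` are illustrative names rather than anything in `zeph-durable`.

```rust
/// Borrowed view over a blob laid out as
/// `key_id || nonce(24) || ciphertext || tag(16)`. Hypothetical type.
#[derive(Debug)]
struct SealedBlob<'a> {
    key_id: &'a [u8],
    nonce: &'a [u8; 24],
    ciphertext: &'a [u8],
    tag: &'a [u8; 16],
}

/// Split a sealed blob into its four parts without copying.
/// `key_id_len` is an assumption: the source does not state the key id width.
/// Returns `None` when the blob is too short to hold all fixed-size parts.
fn split_blob(blob: &[u8], key_id_len: usize) -> Option<SealedBlob<'_>> {
    let fixed = key_id_len.checked_add(24 + 16)?;
    if blob.len() < fixed {
        return None;
    }
    let (key_id, rest) = blob.split_at(key_id_len);
    let (nonce, rest) = rest.split_at(24);
    // Whatever remains is the ciphertext followed by the 16-byte tag.
    let (ciphertext, tag) = rest.split_at(rest.len() - 16);
    Some(SealedBlob {
        key_id,
        nonce: nonce.try_into().ok()?,
        ciphertext,
        tag: tag.try_into().ok()?,
    })
}
```

Reading `key_id` before attempting to open the ciphertext is what lets a reader pick between the current and the previous key during a rotation window; the key lookup itself is out of scope for this sketch.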
4 changes: 4 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

5 changes: 5 additions & 0 deletions crates/zeph-durable/Cargo.toml
@@ -22,13 +22,18 @@ postgres = ["zeph-db/postgres"]
[dependencies]
blake3.workspace = true
bytes.workspace = true
metrics.workspace = true
serde = { workspace = true, features = ["derive"] }
thiserror.workspace = true
tokio = { workspace = true, features = ["macros", "sync", "time"] }
tracing.workspace = true
uuid = { workspace = true, features = ["serde", "v7"] }
zeph-db.workspace = true

[dev-dependencies]
serde_json.workspace = true
tempfile.workspace = true
tokio = { workspace = true, features = ["macros", "rt-multi-thread", "sync", "time"] }
toml.workspace = true

[lints]
22 changes: 17 additions & 5 deletions crates/zeph-durable/README.md
@@ -10,10 +10,12 @@ flow* of an execution (steps, promises, timers) so a crashed or interrupted run
point of failure instead of restarting from scratch.

> [!IMPORTANT]
-> This crate is a **foundational scaffold** (spec-064, issue #4944). It currently exposes
-> *type-level* building blocks only — there is **no execution behavior yet**. The journal writer,
-> execution backends, replay cursor, and the durable step primitive land in follow-up issues of
-> epic [#4707](https://github.com/bug-ops/zeph/issues/4707).
> This crate is **under active construction** (spec-064, epic
> [#4707](https://github.com/bug-ops/zeph/issues/4707)). The type-level foundation, the AEAD payload
> contract, and the persistence engine — `LocalBackend` (a dedicated `durable.db` pool), the
> background `JournalWriter` actor, and the sealed `ExecutionBackend` dispatcher — have landed. The
> `DurableContext` facade, the replay cursor, the promise/timer layer, and the consuming adapters
> land in follow-up issues of the epic.

## Overview

@@ -36,15 +38,25 @@ a dedicated `durable.db` (SQLite) or a feature-gated Restate backend.
enum, and `ExecutionStatus`.
- **effect** — `EffectClass`, the per-step side-effect contract (`Idempotent` / `AtLeastOnce` /
`ExactlyOnceGuarded`).
- **cipher** — the `PayloadCipher` AEAD seal/open contract, the `PayloadAad` location binding, and
the read-side `ensure_payload_within_limit` guard. The concrete cipher lives in a consuming crate
(INV-1).
- **config** — pure-data `DurableConfig` and `RetentionPolicy` mirroring the `[durable]` TOML
section, with spec defaults applied on deserialization.
- **backend** — the sealed `ExecutionBackend` trait, `BackendCapabilities`, the `DurableBackendEnum`
enum dispatcher, and `LocalBackend` (a dedicated `durable.db` pool implementing `Journal`, sealing
payloads through the injected cipher).
- **writer** — the background `JournalWriter` actor and its cloneable `JournalWriterHandle`:
group-commit for buffered appends, flush-before-commit ACKs for exactly-once entries, and
`MAX(seq)` restart resume.
- **error** — the crate-wide `DurableError`.

## Architecture & invariants

- **Layer 0, no business-logic dependencies (INV-1).** `zeph-durable` MUST NOT depend on
`zeph-llm`, `zeph-memory`, `zeph-core`, `zeph-sanitizer`, or any business-layer crate. Its only
-dependencies are `zeph-db` and `zeph-common`.
direct `zeph-*` dependency is `zeph-db`; the rest are infrastructure crates (`tokio`, `tracing`,
`metrics`, `bytes`, `blake3`, `serde`, `uuid`). The concrete payload cipher lives in `zeph-core`.
- **Closed enums make illegal states unrepresentable.** Control entries (`EffectIntent`,
`PromiseCreated`, `TimerArmed`) carry no payload field — a "control entry with payload" cannot be
constructed.
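The "closed enums make illegal states unrepresentable" bullet above can be illustrated with a small standalone enum. This is a simplified sketch, not the crate's `JournalEntry`: only the variant names `EffectIntent`, `PromiseCreated`, and `TimerArmed` come from the README, while every field and the payload-bearing `StepCompleted` variant are assumptions made for illustration.

```rust
/// Simplified, hypothetical journal entry. Control variants have no payload
/// field, so a "control entry with payload" cannot even be written down.
#[derive(Debug)]
enum Entry {
    // Control entries (names from the README, fields assumed).
    EffectIntent { step_id: u32 },
    PromiseCreated { promise_id: u64 },
    TimerArmed { fire_at_ms: u64 },
    // Payload-bearing entry (hypothetical variant).
    StepCompleted { step_id: u32, payload: Vec<u8> },
}

impl Entry {
    /// The payload, if this kind of entry can carry one at all.
    /// The match is exhaustive, so a new variant forces a decision here.
    fn payload(&self) -> Option<&[u8]> {
        match self {
            Entry::StepCompleted { payload, .. } => Some(payload.as_slice()),
            Entry::EffectIntent { .. }
            | Entry::PromiseCreated { .. }
            | Entry::TimerArmed { .. } => None,
        }
    }
}
```

The design choice being sketched is that the invariant is enforced by the type definition rather than by a runtime check: there is no field in which a control entry could hold a payload, so no validation code is needed on the append path.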
159 changes: 159 additions & 0 deletions crates/zeph-durable/src/backend.rs
@@ -0,0 +1,159 @@
// SPDX-FileCopyrightText: 2026 Andrei G <bug-ops>
// SPDX-License-Identifier: MIT OR Apache-2.0

//! The sealed execution-backend abstraction and its enum-dispatch front door.
//!
//! A backend is the persistence engine behind a durable execution: it journals control flow and,
//! for the local backend, owns the dedicated `durable.db` pool. The [`ExecutionBackend`] trait is
//! **sealed** (it requires [`crate::sealed::Sealed`]), so only backends declared inside this crate
//! can implement it. External crates never name a concrete backend; they hold a
//! [`DurableBackendEnum`] and dispatch through it.
//!
//! # Why enum dispatch instead of `Box<dyn ExecutionBackend>`
//!
//! The journal append path is hot. A trait object would force a virtual call and a heap allocation
//! per dispatch; [`DurableBackendEnum`] resolves the backend with a single `match` and no
//! allocation (the spec's NEVER list forbids `Box<dyn ExecutionBackend>` on the dispatch path).
//! Because the trait is sealed, adding methods to it later — when the `DurableContext`, promise,
//! and timer entry points land — is a non-breaking change.
//!
//! # Scope
//!
//! This module defines [`BackendCapabilities`], the sealed [`ExecutionBackend`] trait (with its
//! `capabilities` accessor), and the [`DurableBackendEnum`] dispatcher. The execution-open,
//! promise-resolution, and timer-scan methods named in the spec land alongside the
//! `DurableContext` (the trait can gain them without breaking callers).

use crate::config::RetentionPolicy;
use crate::error::DurableError;
use crate::ids::{ExecutionId, JournalSeq};
use crate::journal::{ExecutionStatus, Journal, JournalEntry};

pub mod local;

pub use local::LocalBackend;

/// The capabilities a backend advertises so callers can adapt their journaling strategy.
///
/// The replay cursor and the durable-step primitive read these flags to decide, for example,
/// whether parallel steps may journal concurrently or must be serialized into reserved-id order.
///
/// # Examples
///
/// ```
/// use zeph_durable::BackendCapabilities;
///
/// // The local backend journals parallel steps concurrently and stays in-process.
/// let caps = BackendCapabilities {
///     parallel_steps: true,
///     cross_process: false,
///     max_payload: 1_048_576,
/// };
/// assert!(caps.parallel_steps);
/// ```
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BackendCapabilities {
    /// Whether the backend may record parallel steps concurrently. `false` (e.g. Restate) requires
    /// the durable wrapper to serialize *recording* into reserved-`StepId` order.
    pub parallel_steps: bool,
    /// Whether the journal lives on a database shared across processes. Drives the INV-8 encryption
    /// gate and row-level HMAC requirement.
    pub cross_process: bool,
    /// The maximum payload size, in bytes, the backend accepts on append.
    pub max_payload: usize,
}

/// A durable-execution persistence backend.
///
/// `ExecutionBackend` is the closed set of journal engines Zeph ships. It is sealed via
/// [`crate::sealed::Sealed`]: external crates cannot implement it and must dispatch through
/// [`DurableBackendEnum`]. Every backend is also a [`Journal`], so the append/read/finalize/prune
/// surface is available uniformly.
///
/// # Contract for implementors
///
/// - [`capabilities`](ExecutionBackend::capabilities) MUST return a stable description of the
/// backend; callers cache it and adapt their journaling strategy to it.
/// - The [`Journal`] half MUST serialize writes through a single connection so appends receive a
/// monotonic [`JournalSeq`].
///
/// Additional entry points (execution open, promise resolution, timer scan) are added as the
/// higher layers land; because the trait is sealed, those additions do not break callers.
pub trait ExecutionBackend: Journal + Send + Sync + crate::sealed::Sealed {
    /// Return this backend's stable capability description.
    fn capabilities(&self) -> BackendCapabilities;
}

/// Closed enum dispatch over the compiled-in backends.
///
/// Construct it from a concrete backend and hand it across the crate boundary behind an `Arc`;
/// callers invoke the [`Journal`] and [`ExecutionBackend`] methods on the enum and the dispatch
/// resolves to the active variant with a single `match`. The enum is `#[non_exhaustive]`: the
/// feature-gated `Restate` variant joins it with the `restate` feature without breaking
/// downstream matches (in-crate matches still need the new arm).
///
/// # Examples
///
/// ```
/// use std::sync::Arc;
/// use zeph_durable::{BackendCapabilities, DurableBackendEnum};
///
/// fn max_payload(backend: &DurableBackendEnum) -> usize {
///     use zeph_durable::ExecutionBackend as _;
///     backend.capabilities().max_payload
/// }
/// # let _ = max_payload;
/// ```
#[derive(Debug)]
#[non_exhaustive]
pub enum DurableBackendEnum {
    /// The always-compiled local backend journaling to a dedicated `durable.db`.
    Local(LocalBackend),
}

impl crate::sealed::Sealed for DurableBackendEnum {}

impl Journal for DurableBackendEnum {
    async fn append(&self, entry: JournalEntry) -> Result<JournalSeq, DurableError> {
        match self {
            Self::Local(backend) => backend.append(entry).await,
        }
    }

    async fn read_execution(&self, id: ExecutionId) -> Result<Vec<JournalEntry>, DurableError> {
        match self {
            Self::Local(backend) => backend.read_execution(id).await,
        }
    }

    async fn read_execution_range(
        &self,
        id: ExecutionId,
        from_step_id: u32,
        limit: usize,
    ) -> Result<Vec<JournalEntry>, DurableError> {
        match self {
            Self::Local(backend) => backend.read_execution_range(id, from_step_id, limit).await,
        }
    }

    async fn finalize(&self, id: ExecutionId, status: ExecutionStatus) -> Result<(), DurableError> {
        match self {
            Self::Local(backend) => backend.finalize(id, status).await,
        }
    }

    async fn prune(&self, policy: &RetentionPolicy) -> Result<u64, DurableError> {
        match self {
            Self::Local(backend) => backend.prune(policy).await,
        }
    }
}

impl ExecutionBackend for DurableBackendEnum {
    fn capabilities(&self) -> BackendCapabilities {
        match self {
            Self::Local(backend) => backend.capabilities(),
        }
    }
}
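
The module documentation in `backend.rs` above motivates two patterns: a sealed trait and a closed enum dispatcher. A minimal standalone sketch of how they fit together follows. `Backend`, `Local`, `Remote`, `AnyBackend`, and `accepts` are hypothetical stand-ins rather than the `zeph-durable` API, and a synchronous `max_payload` method replaces the real async `Journal` surface to keep the example dependency-free.

```rust
// Private module: other crates can see the supertrait bound on `Backend`
// but cannot name `Sealed`, so they cannot implement `Backend` themselves.
mod sealed {
    pub trait Sealed {}
}

/// Sealed trait: only types in this crate can implement it, so adding a
/// method later cannot break an external implementor.
pub trait Backend: sealed::Sealed {
    fn max_payload(&self) -> usize;
}

/// Two hypothetical concrete backends.
pub struct Local {
    limit: usize,
}
pub struct Remote {
    limit: usize,
}

impl sealed::Sealed for Local {}
impl sealed::Sealed for Remote {}

impl Backend for Local {
    fn max_payload(&self) -> usize {
        self.limit
    }
}
impl Backend for Remote {
    fn max_payload(&self) -> usize {
        self.limit
    }
}

/// Closed dispatcher: a `match` over known variants instead of a
/// `Box<dyn Backend>` trait object.
pub enum AnyBackend {
    Local(Local),
    Remote(Remote),
}

impl sealed::Sealed for AnyBackend {}

impl Backend for AnyBackend {
    fn max_payload(&self) -> usize {
        match self {
            Self::Local(backend) => backend.max_payload(),
            Self::Remote(backend) => backend.max_payload(),
        }
    }
}

/// Callers stay generic over the sealed trait and never name a variant.
pub fn accepts(backend: &impl Backend, payload: &[u8]) -> bool {
    payload.len() <= backend.max_payload()
}
```

Because the dispatcher implements the same trait as the variants it wraps, code such as `accepts` works unchanged whether it is handed a concrete backend or the enum.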