bug-ops · bug-ops · Jun 6, 2026 · Jun 6, 2026 · Jun 6, 2026
diff --git a/.github/AGENTS.md b/.github/AGENTS.md
@@ -0,0 +1,8 @@
+# GitHub Automation Guide
+
+This directory contains CI, release, and repository automation files.
+
+- Keep workflow commands aligned with the actual supported local commands in the repository.
+- Prefer explicit, reproducible CI steps over clever shell tricks.
+- If you change checks, matrix entries, release flow, or badges, update the relevant docs and README references.
+- Treat permission scopes, tokens, and supply-chain settings as security-sensitive.
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -6,12 +6,14 @@ applyTo: "**"
 
 ## Project Context
 
-Zeph is a Rust AI agent workspace (Edition 2024, resolver 3) with 10 crates. Review all changes against these standards.
+Zeph is a Rust AI agent workspace (Edition 2024, resolver 3) with 31 library crates under `crates/` plus the `zeph` CLI binary. Review all changes against these standards.
 
 ## Architecture Rules
 
 - `zeph-core` orchestrates all leaf crates; no reverse dependencies allowed
-- `zeph-a2a`, `zeph-mcp`, `zeph-tui` are feature-gated and optional
+- `zeph-agent-context`, `zeph-agent-persistence`, and `zeph-agent-tools` MUST NOT depend on `zeph-core` — the dependency runs `zeph-core` → these crates, never the reverse; flag any new `zeph-core` import added to them
+- `zeph-durable` is a Layer-0 infra crate (alongside `zeph-db`, `zeph-common`); it MUST NOT depend on `zeph-llm`, `zeph-memory`, `zeph-core`, `zeph-sanitizer`, or any business-layer crate
+- `zeph-tui`, `zeph-a2a`, `zeph-acp`, `zeph-gateway`, `zeph-scheduler`, and `zeph-bench` are feature-gated and optional
 - All dependencies defined in root `[workspace.dependencies]` with no features at workspace level
 - Crates inherit via `workspace = true` and specify features locally
 - TLS: `rustls` only — reject any `openssl-sys` introduction

diff --git a/.gitignore b/.gitignore
@@ -14,3 +14,6 @@ book/book/
 *.log
 zeph.log
 *.session
+
+# Track per-directory agent guides despite the global AGENTS.md ignore
+!AGENTS.md
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,91 @@
+# Repository Guidelines
+
+## Project Structure & Module Organization
+Zeph is a Rust workspace (`Cargo.toml`) with the CLI entrypoint in `src/` and domain crates in `crates/` (for example: `zeph-core`, `zeph-llm`, `zeph-memory`, `zeph-tools`, `zeph-tui`).
+Top-level integration tests live in `tests/`, while crate-specific tests and benches live under each crate’s `tests/` and `benches/` directories. Documentation sources are in `docs/src/` (mdBook), runtime defaults are in `config/default.toml`, and container assets are in `docker/`.
+Do not create git worktrees inside this repository. Create separate worktrees only under the sibling directory `../worktrees/`.
+
+## Specifications (MANDATORY)
+
+All feature and system specifications live in `specs/`. **Compliance is non-negotiable.**
+
+- **Before implementing any feature**: read the relevant spec in `specs/` and the system
+  invariants in `specs/001-system-invariants/spec.md`.
+- **Before modifying any subsystem**: read the corresponding spec document. The `## Key Invariants`
+  and `NEVER` sections define hard constraints — violating them requires an explicit architectural
+  decision, not just a code review.
+- **Index**: `specs/README.md` — maps every subsystem to its spec file.
+- **Constitution**: `specs/constitution.md` — project-wide non-negotiable rules that apply to every change.
+- If a spec is missing or outdated for the area you are changing, create or update it before writing code.
+
+## Agent Operating Notes
+
+`AGENTS.md` is the primary instruction file for all coding agents (Codex, Copilot, etc.). `.claude/CLAUDE.md` and `.claude/rules/*` are Claude-specific or supplemental context — not the primary source of operating rules.
+Path-specific instructions for GitHub Copilot live in `.github/instructions/*.instructions.md` with `applyTo` frontmatter.
+
+- Use `cargo nextest run` as the default test runner.
+- Keep Rust changes compatible with Edition 2024 and MSRV `1.88`.
+- Prefer zero-warning `clippy`; avoid `unwrap`/`expect` in production code when proper error propagation is possible.
+- Any user-facing change must update relevant docs, config defaults, and `CHANGELOG.md` (`[Unreleased]` section).
+- For new functionality, provide all integration points: config section, CLI subcommand/argument, TUI command palette entry, `--init` wizard, `--migrate-config` migration step, live testing playbook in `.local/testing/playbooks/`, and coverage row in `.local/testing/coverage-status.md`.
+- In the TUI, every background operation (LLM inference, memory search, tool execution, MCP connection, etc.) must have a visible spinner with a short, descriptive status message.
+
+## Multi-Model Design
+
+All new functionality must be designed for multi-model operation. Match models to task complexity:
+
+- **Simple tasks** (entity extraction, embeddings, classification, routing): fast, cheap models (e.g. `gpt-4o-mini`, `qwen3:8b`).
+- **Medium tasks** (summarization, compaction, skill matching): mid-tier models.
+- **Complex tasks** (planning, code generation, multi-step reasoning): most capable models (e.g. `claude-opus-4`).
+
+All providers declared once in `[[llm.providers]]` with a `name` field. Subsystems reference a provider by name via a single `*_provider` field — no inline provider config duplication. When the field is empty, fall back to the default provider. Never hardcode a model name.
+
+## Secrets & Vault
+
+All secrets and API keys are stored exclusively in the Zeph age vault. Never use environment variables, `.env` files, or shell profiles for secrets. All `ZEPH_*` keys are resolved automatically from the age vault at startup. To read a key for external scripts: `cargo run --features full -- vault get <KEY_NAME>`.
+
+## Continuous Improvement Protocol
+- Continuous-improvement sessions are read-only: use them for live manual testing, anomaly detection, research, dependency review, and issue triage; do not make code changes in those sessions.
+- Do not treat unit/integration tests as sufficient validation for user-facing or agent-loop changes. Run real end-to-end agent sessions with real LLM/tool execution via `cargo run --features full -- --config .local/config/testing.toml` and use `--tui` or `RUST_LOG=debug` when needed.
+- For continuous-improvement and ACP/IDE test runs, override runtime artifact paths in the active config to absolute temporary locations so logs, SQLite state, debug dumps, audit files, and other ephemeral outputs land under a dedicated temp directory instead of polluting the default user data directories. Prefer a per-session root such as `/tmp/zeph-ci/<timestamp>/`.
+- At minimum, override `logging.file`, `memory.sqlite_path`, `debug.output_dir`, and any other runtime-writing paths used by the scenario under test to absolute temp paths before launching the agent.
+- After each live test session, review the resolved debug dump and log directories from the active config, plus metrics and agent responses, for correctness, regressions, retries, timeouts, hallucinations, and tool-use quality.
+- Treat the configured runtime artifact directory as the primary evidence location for live runs: request/response payloads in the configured debug dump directory and runtime traces in the configured log directory should be checked before drawing conclusions from terminal output alone.
+- For GonkaGate continuous-improvement runs, use `.local/config/testing-gonkagate.toml`. It must resolve `ZEPH_COMPATIBLE_GONKAGATE_API_KEY` from the age vault only, target `https://api.gonkagate.com/v1`, use `Qwen/Qwen3-235B-A22B-Instruct-2507-FP8`, and keep logs/debug/SQLite/audit artifacts under `/tmp/zeph-ci/gonkagate`.
+- Record every session in `.local/testing/journal.md`, including positive confirmations, failures, scope, config used, and component status. When behavior changes, reset the affected component to untested until re-verified.
+- When you find an anomaly, reproduce it, classify severity, and file a GitHub issue with clear repro steps and expected vs actual behavior. Fixes must happen later in a dedicated implementation session, not during the continuous-improvement pass.
+- Testing is a hard gate: when many components are untested or a batch of core changes landed, prioritize live validation before research, dependency updates, or new feature work.
+- Keep the test environment clean when relevant: clear the temporary runtime artifact directory, test DB state, audit logs, session logs, and overflow files before comprehensive runs; do not delete `.local/testing/`.
+- Maintain the testing knowledge base in `.local/testing/`: `journal.md`, `process-notes.md`, `playbooks/`, and `regressions.md`. Add regressions for newly found bugs and capture process improvements after each session.
+- Research new agent/testing techniques and monitor dependency health (`cargo outdated --workspace`, `cargo deny check advisories`) as part of the cycle, but only after current critical functionality has been live-tested.
+
+## Build, Test, and Development Commands
+- `cargo build --workspace --features full`: Build the full workspace feature set used in CI.
+- `cargo run -- --help`: Run the CLI locally and list commands.
+- `cargo nextest run --workspace --lib --bins`: Fast unit/binary test pass.
+- `cargo nextest run --config-file .github/nextest.toml --workspace --profile ci --features full`: Full suite with the CI config and full feature set (Docker required).
+- `cargo +nightly fmt --check`: Verify formatting.
+- `cargo clippy --profile ci --workspace --features full -- -D warnings`: Run CI-equivalent lint checks with the full feature set.
+- `cargo llvm-cov --all-features --workspace`: Generate coverage locally.
+
+## Coding Style & Naming Conventions
+Use Rust 2024 edition and MSRV `1.88`. Follow `rustfmt` defaults (4-space indentation) and keep Clippy warnings at zero where practical.
+Use `snake_case` for functions/modules/files, `PascalCase` for types/traits, and `SCREAMING_SNAKE_CASE` for constants. Prefer small modules with explicit responsibilities; keep public APIs in `lib.rs` minimal and re-export intentionally.
+
+## Testing Guidelines
+Use `cargo-nextest` as the default runner. Name integration tests with `*_integration.rs` and keep deterministic unit tests close to implementation.
+For features touching external systems (for example Qdrant), use container-backed tests and document required services. Add regression tests for every bug fix.
+
+## Commit & Pull Request Guidelines
+Git history follows Conventional Commits with scopes, e.g. `feat(zeph-tools): ...`, `fix: ...`, `ci: ...`.
+Keep subject lines imperative and concise, and reference issues/PRs when relevant (`(#736)`).
+Before every commit, run the full validation suite against the CI feature set (`--features full` where supported) and do not commit if any step fails:
+- `cargo +nightly fmt --check`
+- `cargo clippy --profile ci --workspace --features full -- -D warnings`
+- `cargo nextest run --config-file .github/nextest.toml --workspace --profile ci --features full`
+- `cargo check --workspace --features full`
+- `cargo build --workspace --features full`
+PRs should include: a clear problem statement, implementation summary, test evidence (commands + results), and docs/config updates when behavior changes.
+
+## Security & Configuration Tips
+Do not commit secrets. All secrets must go through the Zeph age vault (`zeph vault` commands) — never environment variables or `.env` files. Review `SECURITY.md` before changing tool execution, sandboxing, or permission flows.
diff --git a/book/AGENTS.md b/book/AGENTS.md
@@ -0,0 +1,8 @@
+# Docs Guide
+
+This directory holds the mdBook source for Zeph documentation.
+
+- Keep docs aligned with the current code, config defaults, and CLI behavior.
+- When adding or moving pages under `docs/src/`, update `docs/src/SUMMARY.md`.
+- Prefer concise, technical documentation over marketing copy in reference and guide pages.
+- If a code change affects user-facing behavior, configuration, architecture, or workflows, update the relevant docs in the same change.
diff --git a/config/AGENTS.md b/config/AGENTS.md
@@ -0,0 +1,8 @@
+# Config Guide
+
+This directory holds runtime defaults and configuration templates.
+
+- Treat `config/default.toml` as the source of truth for documented defaults unless a feature explicitly overrides them at runtime.
+- Keep config keys, comments, and docs in sync. Config supports `ZEPH_*` env var overrides for non-secret values only — secrets must go through the age vault, never env vars.
+- When adding or changing config keys, update the relevant docs and any setup/instruction materials that reference them.
+- Preserve readability: keep sections coherent and comments directly attached to the settings they describe.
diff --git a/crates/AGENTS.md b/crates/AGENTS.md
@@ -0,0 +1,8 @@
+# Crates Guide
+
+This directory contains the workspace crates. The root [`AGENTS.md`](../AGENTS.md) remains the primary instruction file; use local `AGENTS.md` files in individual crates for crate-specific workflow notes.
+
+- Keep changes local to the crate you are editing unless shared APIs require coordinated updates.
+- Default verification is crate-first: `cargo build -p <crate>`, `cargo nextest run -p <crate>`, then workspace checks only if the change crosses crate boundaries.
+- Use `cargo clippy -p <crate> --all-targets -- -D warnings` before considering crate-local work done.
+- If a crate's public behavior or configuration changes, update the crate `README.md`, root docs in `docs/src/`, and any relevant config or CLI surfaces.
diff --git a/crates/zeph-a2a/AGENTS.md b/crates/zeph-a2a/AGENTS.md
@@ -0,0 +1,8 @@
+# zeph-a2a Guide
+
+Protocol client/server work for A2A lives here.
+
+- Start with crate-local checks: `cargo build -p zeph-a2a`, `cargo nextest run -p zeph-a2a`, `cargo clippy -p zeph-a2a --all-targets -- -D warnings`.
+- Keep changes isolated to A2A transport, discovery, JSON-RPC, and server behavior unless a shared API truly requires cross-crate edits.
+- Preserve protocol-facing behavior unless the task explicitly calls for a spec-aligned change.
+- If external behavior changes, update `crates/zeph-a2a/README.md` and the relevant docs in `docs/src/advanced/a2a.md`.
diff --git a/crates/zeph-acp/AGENTS.md b/crates/zeph-acp/AGENTS.md
@@ -0,0 +1,8 @@
+# zeph-acp Guide
+
+IDE embedding, ACP transport, permissions, and session handling live here.
+
+- Start with crate-local checks: `cargo build -p zeph-acp`, `cargo nextest run -p zeph-acp`, `cargo clippy -p zeph-acp --all-targets -- -D warnings`.
+- Be careful with session lifecycle, permission gates, HTTP/WebSocket transport, and filesystem/terminal bridging.
+- Changes in ACP behavior should stay aligned with the CLI flags in the root binary and with ACP-related docs.
+- If user-visible behavior changes, update `crates/zeph-acp/README.md` and the relevant docs in `docs/src/advanced/acp.md` or `docs/src/guides/ide-integration.md`.
diff --git a/crates/zeph-agent-context/AGENTS.md b/crates/zeph-agent-context/AGENTS.md
@@ -0,0 +1,11 @@
+# zeph-agent-context Guide
+
+Context-assembly service (`ContextService`): system prompt rebuild, memory injection, semantic recall, summarization, and compaction. A stateless façade extracted from `Agent<C>` so context-assembly edits do not recompile the tool dispatcher or persistence layer.
+
+- Start with crate-local checks: `cargo build -p zeph-agent-context`, `cargo nextest run -p zeph-agent-context`, `cargo clippy -p zeph-agent-context --all-targets -- -D warnings`.
+- Read `specs/021-zeph-context/spec.md` before changing assembly, budget, or compaction behavior; honor its `## Key Invariants` and `NEVER` sections.
+- Core invariant: this crate MUST NOT depend on `zeph-core`. The decoupling is the whole point — never add a `zeph-core` dependency to satisfy a borrow.
+- Keep the borrow-lens views (`MessageWindowView`, `ContextAssemblyView`, `ContextSummarizationView`) narrow; `zeph-core` constructs them from `Agent` field projections.
+- Features: `sqlite` (default) / `postgres` forwarded to `zeph-memory`, plus `index` for `IndexAccess` integration. Run `cargo nextest run -p zeph-agent-context --features index` when touching index-backed assembly views.
+- LLM serialization gate: summarization and compaction build LLM request payloads — changes to `summarization/` or `compaction.rs` require a live API session test (no 400/422, well-formed `messages` array in the debug dump) before merge.
+- Multi-model: summarization and compaction call an LLM — resolve the provider via a `*_provider` field referencing a named `[[llm.providers]]` entry; never hardcode a model.
diff --git a/crates/zeph-agent-feedback/AGENTS.md b/crates/zeph-agent-feedback/AGENTS.md
@@ -0,0 +1,10 @@
+# zeph-agent-feedback Guide
+
+Implicit correction detection from user messages: the regex-only `FeedbackDetector` (zero LLM calls) and the LLM-backed `JudgeDetector` for borderline or missed cases.
+
+- Start with crate-local checks: `cargo build -p zeph-agent-feedback`, `cargo nextest run -p zeph-agent-feedback`, `cargo clippy -p zeph-agent-feedback --all-targets -- -D warnings`.
+- Read `specs/016-agent-feedback/spec.md` before changing detection strategy, thresholds, or rate limits.
+- Multi-language patterns span 7 languages (en, ru, es, de, fr, zh, ja). Any new or altered pattern needs regression tests for both tiers — anchored (`^`, base confidence) and unanchored (mid-sentence, base − 0.10). Preserve that tier semantics.
+- Mind the CJK limitations documented in `lib.rs`: `token_overlap()` uses whitespace tokenisation and does not segment Chinese/Japanese; keep the 2+ character minimum for unanchored CJK patterns to limit false positives.
+- Keep regexes compiled once into the flat `Vec<(Regex, f32)>` per correction kind — this is a hot path; never recompile per message.
+- Multi-model: `JudgeDetector` calls an LLM — resolve the provider via a `*_provider` field referencing a named `[[llm.providers]]` entry; never hardcode a model. Do not bypass the `judge_rate_limit` / `judge_rate_window_secs` rate limiting from `LearningConfig`.
diff --git a/crates/zeph-agent-persistence/AGENTS.md b/crates/zeph-agent-persistence/AGENTS.md
@@ -0,0 +1,11 @@
+# zeph-agent-persistence Guide
+
+Persistence service (`PersistenceService`): loads conversation history from and writes messages to `SemanticMemory` (SQLite + Qdrant), plus tool-pair sanitization, embedding decisions, and graph-extraction configuration.
+
+- Start with crate-local checks: `cargo build -p zeph-agent-persistence`, `cargo nextest run -p zeph-agent-persistence`, `cargo clippy -p zeph-agent-persistence --all-targets -- -D warnings`.
+- Read `specs/057-agent-persistence/spec.md` before changing history loading, persistence, or extraction enqueueing.
+- Core invariant: this crate MUST NOT depend on `zeph-core`. Keep the borrow-lens views (`MemoryPersistenceView`, `SecurityView`, `MetricsView`) narrow; `zeph-core` builds them from `Agent` fields.
+- Features: `sqlite` (default) / `postgres` forwarded to `zeph-memory` — verify behavior is identical across both backends; silent divergence is a first-class bug.
+- LLM serialization gate: tool-pair sanitization (`sanitize.rs`, `request.rs`) controls whether `tool_use`/`tool_result` blocks are well-formed. A malformed pairing causes hard LLM 400/422 errors that unit tests do not catch — changes here require a live multi-turn + tool-call session test before merge.
+- Embedding dimension mismatches are a recurring source of bugs: whenever the embedding model or vector collection config changes, verify stored and query vector dimensions match before running tests.
+- Multi-model: graph extraction and embedding each call an LLM/embedder — resolve via `*_provider` fields referencing named `[[llm.providers]]` entries; never hardcode a model.