From ee048ecb9c7fd03a229c012119355f4c34581df0 Mon Sep 17 00:00:00 2001
From: Alisha Kawaguchi <alisha@entire.io>
Date: Fri, 27 Feb 2026 16:20:02 -0800
Subject: [PATCH 1/4] Update agent-integration skill docs

Entire-Checkpoint: ea560cc631ef
---
 .claude/skills/agent-integration/SKILL.md     |  62 +++-
 .../skills/agent-integration/implementer.md   | 312 ++++++++++++++----
 .../skills/agent-integration/researcher.md    | 129 ++++----
 .../skills/agent-integration/test-writer.md   |  97 ++----
 4 files changed, 397 insertions(+), 203 deletions(-)
diff --git a/.claude/skills/agent-integration/SKILL.md b/.claude/skills/agent-integration/SKILL.md
index 88be7ebc5..ba2ce1560 100644
--- a/.claude/skills/agent-integration/SKILL.md
+++ b/.claude/skills/agent-integration/SKILL.md
@@ -2,7 +2,8 @@
 name: agent-integration
 description: >
   Run all three agent integration phases sequentially: research, write-tests,
-  and implement. For individual phases, use /agent-integration:research,
+  and implement using E2E-first TDD (unit tests written last).
+  For individual phases, use /agent-integration:research,
   /agent-integration:write-tests, or /agent-integration:implement.
   Use when the user says "integrate agent", "add agent support", or wants
   to run the full agent integration pipeline end-to-end.
@@ -16,13 +17,16 @@ Run all three phases of agent integration in a single session. Parameters are co
 
 Collect these before starting (ask the user if not provided):
 
-| Parameter | Example | Description |
-|-----------|---------|-------------|
-| `AGENT_NAME` | "Windsurf" | Human-readable agent name |
-| `AGENT_SLUG` | "windsurf" | Lowercase slug for file/directory paths |
-| `AGENT_BIN` | "windsurf" | CLI binary name |
-| `LIVE_COMMAND` | "windsurf --project ." | Full command to launch agent |
-| `EVENTS_OR_UNKNOWN` | "unknown" | Known hook event names, or "unknown" |
+| Parameter | Description | How to derive |
+|-----------|-------------|---------------|
+| `AGENT_NAME` | Human-readable name (e.g., "Gemini CLI") | User provides |
+| `AGENT_PACKAGE` | Go package dir name — **no hyphens** | Lowercase, remove hyphens/spaces |
+| `AGENT_KEY` | Registry key for `agent.Register()` and `entire enable` | Check existing patterns in `cmd/entire/cli/agent/registry.go` |
+| `AGENT_BIN` | CLI binary name | `command -v <binary>` |
+| `LIVE_COMMAND` | Full command to launch agent | User provides |
+| `EVENTS_OR_UNKNOWN` | Known hook event names, or "unknown" | From agent docs or "unknown" |
+
+**Note:** These identifiers can differ. Run `grep -r 'AgentName\|func.*Name()' cmd/entire/cli/agent/*/` and inspect `e2e/agents/` to see how existing agents handle the split.
 
 ## Architecture References
 
@@ -31,35 +35,59 @@ These documents define the agent integration contract:
 - **Implementation guide**: `docs/architecture/agent-guide.md` — Step-by-step code templates, event mapping, testing patterns
 - **Integration checklist**: `docs/architecture/agent-integration-checklist.md` — Design principles and validation criteria
 
+## Scope
+
+This skill targets **hook-capable agents** — those that support lifecycle hooks
+(implementing `HookSupport` from `agent.go`). Agents that use file-based detection
+(implementing `FileWatcher`) require a different integration approach not covered here.
+Check `agent.go` for the current interface definitions.
+
+## Core Rule: E2E-First TDD
+
+This skill enforces strict E2E-first test-driven development. The rules:
+
+1. **E2E tests are the spec.** The existing `ForEachAgent` test scenarios define what "working" means. The agent runner makes those tests runnable for the new agent.
+2. **Run E2E tests at every step.** Each implementation tier starts by running the E2E test and watching it fail. You implement until it passes. No exceptions.
+3. **Unit tests are written last.** Only after all E2E tiers pass do you write unit tests (Step 14 of the implementer), using real data collected from E2E runs as golden fixtures.
+4. **If you didn't watch it fail, you don't know if it tests the right thing.** Never write a test you haven't seen fail first.
+5. **Minimum viable fix.** At each E2E failure, implement only the code needed to fix that failure. Don't anticipate future tiers.
+6. **`/debug-e2e` is your debugger.** When an E2E test fails, use the artifact directory with `/debug-e2e` before guessing at fixes.
+
 ## Pipeline
 
 Run these three phases in order. Each phase builds on the previous phase's output.
 
 ### Phase 1: Research
 
-Assess whether the agent's hook/lifecycle model is compatible with the Entire CLI.
+Discover the agent's hook mechanism, transcript format, and configuration through binary probing and documentation research. Produces an implementation one-pager at `cmd/entire/cli/agent/$AGENT_PACKAGE/AGENT.md` that the other phases use as their single source of agent-specific information.
 
 Read and follow the research procedure from `.claude/skills/agent-integration/researcher.md`.
 
-**Expected output:** Compatibility report with lifecycle event mapping, interface feasibility assessment, and a test script at `scripts/test-$AGENT_SLUG-agent-integration.sh`.
+**Expected output:** Implementation one-pager at `cmd/entire/cli/agent/$AGENT_PACKAGE/AGENT.md` and a test script at `scripts/test-$AGENT_SLUG-agent-integration.sh`.
+
+**Commit:** After the research phase completes, use `/commit` to commit all files.
 
 **Gate:** If the verdict is INCOMPATIBLE, stop and discuss with the user before proceeding.
 
-### Phase 2: Write Tests
+### Phase 2: Write E2E Runner
 
-Generate the E2E test suite based on the research findings.
+Create the E2E agent runner so existing test scenarios can exercise the new agent. No unit tests are written in this phase — no new test scenarios either (existing `ForEachAgent` tests are the spec).
 
-Read and follow the write-tests procedure from `.claude/skills/agent-integration/test-writer.md`.
+Read and follow the procedure from `.claude/skills/agent-integration/test-writer.md`.
 
-**Expected output:** E2E agent runner at `e2e/agents/$AGENT_SLUG.go` and any agent-specific test scenarios.
+**Expected output:** E2E agent runner at `e2e/agents/$AGENT_SLUG.go` that compiles and registers with the test framework.
 
-### Phase 3: Implement
+**Commit:** After the E2E runner compiles and registers, use `/commit` to commit all files.
 
-Build the Go agent package using test-driven development.
+### Phase 3: Implement (E2E-First, Unit Tests Last)
+
+Build the Go agent package using strict E2E-first TDD. E2E tests drive development at every step — run each tier, watch it fail, implement the minimum fix, repeat. Unit tests are written only after all E2E tiers pass, using real data from E2E runs as golden fixtures.
 
 Read and follow the implement procedure from `.claude/skills/agent-integration/implementer.md`.
 
-**Expected output:** Complete agent package at `cmd/entire/cli/agent/$AGENT_SLUG/` with all tests passing.
+**Expected output:** Complete agent package at `cmd/entire/cli/agent/$AGENT_PACKAGE/` with all E2E tiers passing and unit tests locking in behavior.
+
+**Note:** `AGENT.md` is a living document — Phases 2 and 3 update it when they discover new information during testing or implementation.
 
 ## Final Validation
 
diff --git a/.claude/skills/agent-integration/implementer.md b/.claude/skills/agent-integration/implementer.md
index 24174f7e3..2d2a4321a 100644
--- a/.claude/skills/agent-integration/implementer.md
+++ b/.claude/skills/agent-integration/implementer.md
@@ -1,12 +1,24 @@
 # Implement Command
 
-Build the agent Go package using test-driven development. Uses the research report findings and the E2E test suite as the spec.
+Build the agent Go package using strict E2E-first TDD. Unit tests are written ONLY after all E2E tests pass.
 
 ## Prerequisites
 
-- The research command's findings (hook events, transcript format, config mechanism)
+- The research command's one-pager at `cmd/entire/cli/agent/$AGENT_PACKAGE/AGENT.md`
 - The E2E test runner already added (from `write-tests` command)
-- If neither exists, read the agent's docs and ask the user about hook events, transcript format, and config
+- If no one-pager exists, read the agent's docs and ask the user about hook events, transcript format, and config
+
+## Core Principle: E2E-First TDD
+
+1. **E2E tests are the spec.** The existing `ForEachAgent` test scenarios define "working". You implement until they pass.
+2. **Watch it fail first.** Every E2E tier starts by running the test and observing the failure. If you haven't seen the failure, you don't understand what needs fixing.
+3. **Minimum viable fix.** At each failure, implement only the code needed to make that specific assertion pass. Don't anticipate future tiers.
+4. **`/debug-e2e` is your debugger.** When an E2E test fails, use the artifact directory with `/debug-e2e` before guessing at fixes.
+5. **No unit tests during Steps 4-13.** Unit tests are written in Step 14 after all E2E tiers pass, using real data from E2E runs as golden fixtures.
+6. **Format and lint, don't unit test.** Between E2E tiers, run `mise run fmt && mise run lint` to keep code clean. No `mise run test` until Step 14.
+7. **If you didn't watch it fail, you don't know if it tests the right thing.**
+
+**Do NOT write unit tests during Steps 4-13.** All test writing is consolidated in Step 14.
 
 ## Procedure
 
@@ -21,103 +33,268 @@ Read these files thoroughly before writing any code:
 
 ### Step 2: Read Reference Implementation
 
-Run `Glob("cmd/entire/cli/agent/*/")` to find all existing agent packages. Pick the closest match based on research findings — read a few agents' `hooks.go` files to find one with a similar hook mechanism to your target. Read all `*.go` files (skip `*_test.go` on first pass) in the chosen reference.
+Read `cmd/entire/cli/agent/$AGENT_PACKAGE/AGENT.md` (the one-pager from the research phase) for the agent's hook mechanism, transcript format, and config structure.
 
-### Step 3: Create Package Structure
+Run `Glob("cmd/entire/cli/agent/*/")` to find all existing agent packages. Check the one-pager's "Hook Mechanism" and "Gaps & Limitations" sections to pick the best reference — choose an agent with a similar hook mechanism to your target. Read all `*.go` files (skip `*_test.go` on first pass) in the chosen reference.
 
-Create the agent package directory:
+### Step 3: Create Bare-Minimum Compiling Package
+
+Create the agent package directory and stub out every required interface method so the project compiles.
 
 ```
-cmd/entire/cli/agent/$AGENT_SLUG/
+cmd/entire/cli/agent/$AGENT_PACKAGE/
 ```
 
-### Step 4: TDD Cycle — Types
+**What to create:**
 
-**Red**: Write `types_test.go` with tests for hook input struct parsing:
+1. **`${AGENT_PACKAGE}.go`** — Struct definition, `init()` with `agent.Register(agent.AgentName("$AGENT_KEY"), New)`, and stub implementations for every method in the `Agent` interface — refer to `agent.go` from Step 1. Include `HookSupport` methods in `lifecycle.go` and `hooks.go`.
+2. **`types.go`** — Hook input struct(s) with JSON tags matching the one-pager's "Hook input (stdin JSON)" section.
+3. **`lifecycle.go`** — Stub `ParseHookEvent()` that returns `nil, nil` for all inputs. Use the one-pager's "Hook names" table for the native hook name → Entire EventType mapping.
+4. **`hooks.go`** — Stub `InstallHooks()`, `UninstallHooks()`, `AreHooksInstalled()` that return nil/false. Use the one-pager's "Config file" and "Hook registration" sections for the config path and format.
+5. **`transcript.go`** — Stub `TranscriptAnalyzer` methods if the one-pager's "Transcript" section indicates the agent supports transcript analysis. Use the one-pager for transcript location and format.
 
-```go
-//go:build !e2e
+**Wire up blank imports:**
 
-package $AGENT_SLUG
+- Ensure the blank import `_ "github.com/entireio/cli/cmd/entire/cli/agent/$AGENT_PACKAGE"` exists in `cmd/entire/cli/hooks_cmd.go`
 
-import (
-    "encoding/json"
-    "testing"
-)
+**Verify compilation:**
 
-func TestHookInput_Parsing(t *testing.T) {
-    t.Parallel()
-    // Test that hook JSON payloads deserialize correctly
-}
+```bash
+mise run fmt && mise run lint && mise run test
 ```
 
-**Green**: Write `types.go` with hook input structs:
+Everything must pass before proceeding. Fix any issues.
 
-```go
-package $AGENT_SLUG
+**Commit:** Use `/commit` to commit all files. Skip if no files changed.
 
-// HookInput represents the JSON payload from the agent's hooks.
-type HookInput struct {
-    SessionID      string `json:"session_id"`
-    TranscriptPath string `json:"transcript_path"`
-    // ... fields from research report's captured payloads
-}
-```
+**Standing instruction for Steps 4-12:** If you need agent-specific information (hook format, transcript location, config structure), check `AGENT.md` first. If `AGENT.md` doesn't cover what you need, you may search external docs — but always update `AGENT.md` with anything new you discover so future steps don't need to re-search.
+
+### Step 4: E2E Tier 1 — `TestHumanOnlyChangesAndCommits`
+
+This test requires no agent prompts — it only exercises hooks, so it's the fastest feedback loop.
+
+**What it exercises:**
+- `InstallHooks()` — real hook installation in the agent's config
+- `AreHooksInstalled()` — detection that hooks are present
+- `ParseHookEvent()` — at minimum, the event types needed for session start and turn end (see `EventType` constants in `event.go`)
+- Basic hook invocation flow (the test calls hooks directly via the CLI)
+
+**Cycle:**
+
+1. Run: `mise run test:e2e --agent $AGENT_SLUG TestHumanOnlyChangesAndCommits`
+2. **Watch it fail** — read the failure output carefully
+3. If there are artifact dirs, use `/debug-e2e {artifact-dir}` to understand what happened
+4. Implement the minimum code to fix the first failure
+5. Repeat until the test passes
+
+Run: `mise run fmt && mise run lint`
+
+**Commit:** Use `/commit` to commit all files. Skip if no files changed.
+
+### Step 5: E2E Tier 2 — `TestSingleSessionManualCommit`
+
+The foundational test. This exercises the full agent lifecycle: start session → agent prompt → agent produces files → user commits → session ends.
+
+**What it exercises:**
+- Complete `ParseHookEvent()` for all lifecycle event types from `event.go`. Use the one-pager's hook mapping table to translate native hook names to `EventType` constants.
+- `GetSessionDir` / `ResolveSessionFile` — finding the agent's session/transcript files
+- `ReadTranscript` / `ChunkTranscript` / `ReassembleTranscript` — reading native transcript format
+- `TranscriptAnalyzer` methods (see `agent.go` for current method signatures)
+
+**Cycle:**
+
+1. Run: `mise run test:e2e --agent $AGENT_SLUG TestSingleSessionManualCommit`
+2. **Watch it fail** — read the failure output carefully
+3. Use `/debug-e2e {artifact-dir}` to understand what happened
+4. Implement the minimum code to fix the first failure
+5. Repeat until the test passes
+
+Run: `mise run fmt && mise run lint`
+
+**Commit:** Use `/commit` to commit all files. Skip if no files changed.
+
+### Step 6: E2E Tier 2b — `TestCheckpointMetadataDeepValidation`
+
+Validates transcript quality: JSONL validity, content hash correctness, prompt extraction accuracy.
+
+**What it exercises:**
+- Transcript content stored at checkpoints is valid JSONL
+- Content hash matches the stored transcript
+- User prompts are correctly extracted
+- Metadata fields are populated
+
+**Cycle:**
+
+1. Run: `mise run test:e2e --agent $AGENT_SLUG TestCheckpointMetadataDeepValidation`
+2. **Watch it fail** — this test often exposes subtle transcript formatting bugs
+3. Use `/debug-e2e {artifact-dir}` on any failures
+4. Fix and repeat
+
+Run: `mise run fmt && mise run lint`
+
+**Commit:** Use `/commit` to commit all files. Skip if no files changed.
+
+### Step 7: E2E Tier 3 — `TestSingleSessionAgentCommitInTurn`
+
+Agent creates files and commits them within a single prompt turn. Tests the in-turn commit path.
+
+**What it exercises:**
+- Hook events firing during an agent's commit (post-commit hooks while agent is active)
+- Checkpoint creation when agent commits mid-turn
+- Usually no new agent-specific code needed — this tests the strategy's handling of agent commits
+
+**Cycle:**
+
+1. Run: `mise run test:e2e --agent $AGENT_SLUG TestSingleSessionAgentCommitInTurn`
+2. **Watch it fail** — use `/debug-e2e {artifact-dir}` on failures
+3. Fix and repeat — if the agent doesn't support committing, skip this test
+
+Run: `mise run fmt && mise run lint`
+
+**Commit:** Use `/commit` to commit all files. Skip if no files changed.
+
+### Step 8: E2E Tier 4 — Multi-Session Tests
+
+Run these tests to validate multi-session behavior:
+
+- `TestMultiSessionManualCommit` — Two sessions, both produce files, user commits
+- `TestMultiSessionSequential` — Sessions run one after another
+- `TestEndedSessionUserCommitsAfterExit` — User commits after session ends
+
+**Cycle (for each test):**
 
-**Refactor**: Ensure struct tags match the actual JSON field names from the research captures.
+1. Run: `mise run test:e2e --agent $AGENT_SLUG TestMultiSessionManualCommit`
+2. **Watch it fail** — use `/debug-e2e {artifact-dir}` on failures
+3. Fix and repeat
+4. Move to next test
 
-Run: `mise run test` to verify.
+These tests rarely need new agent code — they exercise the strategy layer.
 
-### Step 5: TDD Cycle — Core Agent
+Run: `mise run fmt && mise run lint`
 
-**Red**: Write `${AGENT_SLUG}_test.go` with tests for Identity methods (Name, Type, Description, IsPreview, DetectPresence, ProtectedDirs) and session management methods.
+**Commit:** Use `/commit` to commit all files. Skip if no files changed.
 
-**Green**: Create `${AGENT_SLUG}.go`. Read the `Agent` interface in `cmd/entire/cli/agent/agent.go` for exact method signatures. Read `docs/architecture/agent-guide.md` Step 3 for the full code template. Use `agent.Register(agent.AgentName("$AGENT_SLUG"), New)` in `init()`.
+### Step 9: E2E Tier 5 — File Operation Edge Cases
 
-Run: `mise run test`
+Run these tests for file operation correctness:
 
-### Step 6: TDD Cycle — Lifecycle (ParseHookEvent)
+- `TestModifyExistingTrackedFile` — Agent modifies (not creates) a file
+- `TestUserSplitsAgentChanges` — User stages only some of the agent's changes
+- `TestDeletedFilesCommitDeletion` — Agent deletes a file, user commits the deletion
+- `TestMixedNewAndModifiedFiles` — Agent both creates and modifies files
 
-This is the **main contribution surface** — mapping native hooks to Entire events.
+**Cycle:** Same as above — run each test, **watch it fail**, use `/debug-e2e` on failures, fix, repeat.
 
-**Red**: Write `lifecycle_test.go` with tests for each hook name from the research report. Use actual JSON payloads from research captures. Test every EventType mapping, nil returns for pass-through hooks, empty input, and malformed JSON.
+Run: `mise run fmt && mise run lint`
 
-**Green**: Create `lifecycle.go`. Read the `HookSupport` interface in `cmd/entire/cli/agent/agent.go` for exact method signatures. Read `docs/architecture/agent-guide.md` Step 4 for the switch-case pattern. Read a reference agent's `lifecycle.go` (find via `Glob("cmd/entire/cli/agent/*/lifecycle.go")`) for the implementation pattern.
+**Commit:** Use `/commit` to commit all files. Skip if no files changed.
 
-Run: `mise run test`
+### Step 10: Optional Interfaces
 
-### Step 7: TDD Cycle — Hooks (HookSupport)
+Read `cmd/entire/cli/agent/agent.go` for all optional interfaces. Implement each one that the one-pager's "Gaps & Limitations" or "Transcript" sections indicate is feasible:
 
-**Red**: Write `hooks_test.go` with tests for InstallHooks (creates config, idempotent), UninstallHooks (removes hooks), and AreHooksInstalled (detects presence).
+- **`TranscriptPreparer`** — If the agent needs pre-processing before transcript storage
+- **`TokenCalculator`** — If the agent provides token usage data
+- **`SubagentAwareExtractor`** — If the agent has subagent/tool-use patterns
 
-**Green**: Create `hooks.go`. Read the `HookSupport` interface in `cmd/entire/cli/agent/agent.go` for exact signatures. Read `docs/architecture/agent-guide.md` Step 8 for the installation pattern. Read a reference agent's `hooks.go` (find via `Glob("cmd/entire/cli/agent/*/hooks.go")`) for the JSON config file pattern.
+For each optional interface:
 
-Use the research report to determine:
-- Which config file to modify (e.g., `.agent/settings.json`)
-- How hooks are registered (JSON objects, env vars, etc.)
-- What command format to use (`entire hooks $AGENT_SLUG <verb>`)
+1. Implement the methods based on `AGENT.md` and reference implementation
+2. Run relevant E2E tests to verify integration (e.g., `TestCheckpointMetadataDeepValidation` for transcript methods)
 
-Run: `mise run test`
+Run: `mise run fmt && mise run lint`
 
-### Step 8: TDD Cycle — Transcript
+**Commit:** Use `/commit` to commit all files. Skip if no files changed.
 
-**Red**: Write `transcript_test.go` with tests for reading, chunking, and reassembling transcripts. Use sample data in the agent's native format.
+### Step 11: E2E Tier 6 — Interactive and Rewind Tests
 
-**Green**: Create `transcript.go`. Read the `TranscriptAnalyzer` interface in `cmd/entire/cli/agent/agent.go` if implementing analysis. Read `docs/architecture/agent-guide.md` Transcript Format Guide for JSONL vs JSON patterns. Read a reference agent's `transcript.go` (find via `Glob("cmd/entire/cli/agent/*/transcript.go")`) for the implementation pattern.
+Run these if the agent supports interactive multi-step sessions:
 
-Run: `mise run test`
+- `TestInteractiveMultiStep` — Multiple prompts in one session
+- `TestRewindPreCommit` — Rewind to a checkpoint before committing
+- `TestRewindAfterCommit` — Rewind to a checkpoint after committing
+- `TestRewindMultipleFiles` — Rewind with multiple files changed
 
-### Step 9: Optional Interfaces
+**Cycle:** Same pattern — run, **watch it fail**, `/debug-e2e` on failures, fix, repeat.
 
-Read `cmd/entire/cli/agent/agent.go` for all optional interfaces. For each one the research report marked as feasible, follow the same TDD cycle: write tests, implement, refactor. Read the corresponding section in `docs/architecture/agent-guide.md` (Optional Interface Decision Tree) for guidance on when each is needed.
+Run: `mise run fmt && mise run lint`
 
-### Step 10: Register and Wire Up
+**Commit:** Use `/commit` to commit all files. Skip if no files changed.
 
-1. **Register hook commands**: Search `cmd/entire/cli/` for where hook subcommands are registered and add the new agent
-2. **Verify registration**: The `init()` function in `${AGENT_SLUG}.go` should call `agent.Register(agent.AgentName("$AGENT_SLUG"), New)`
-3. **Run full test suite**: `mise run test:ci`
+### Step 12: E2E Tier 7 — Complex Scenarios
 
-### Step 11: Final Validation
+Run the remaining edge case and stress tests:
+
+- `TestPartialCommitStashNewPrompt` — Partial commit, stash, new prompt
+- `TestStashSecondPromptUnstashCommitAll` — Stash workflow across prompts
+- `TestRapidSequentialCommits` — Multiple commits in quick succession
+- `TestAgentContinuesAfterCommit` — Agent keeps working after a commit
+- `TestSubagentCommitFlow` — If the agent has subagent support
+- `TestSingleSessionSubagentCommitInTurn` — Subagent commits during a turn
+
+**Cycle:** Same pattern — **watch it fail**, fix, repeat. Many of these require no new agent code — they exercise strategy-layer behavior.
+
+Run: `mise run fmt && mise run lint`
+
+**Commit:** Use `/commit` to commit all files. Skip if no files changed.
+
+### Step 13: Full E2E Suite Pass
+
+Run the complete E2E suite for the agent to catch any regressions or tests that were skipped in earlier tiers:
+
+```bash
+mise run test:e2e --agent $AGENT_SLUG
+```
+
+This runs every `ForEachAgent` test, not just the ones targeted in Steps 4-12.
+
+**Important: E2E tests can be flaky when run all at once.** Do NOT run them in parallel — always use sequential execution. If some tests fail when running the full suite, re-run each failing test individually before investigating:
+
+```bash
+mise run test:e2e --agent $AGENT_SLUG TestFailingTestName
+```
+
+If a test passes when run individually but fails in the full suite, it's a flaky failure — not a real error. Only investigate failures that reproduce consistently when run in isolation.
+
+Fix any real failures before proceeding — the same cycle applies: read the failure, use `/debug-e2e {artifact-dir}`, implement the minimum fix, re-run.
+
+All E2E tests must pass before writing unit tests.
+
+### Step 14: Write Unit Tests
+
+Now that all E2E tiers pass, write unit tests to lock in behavior. Use real data from E2E runs (captured JSON payloads, transcript snippets, config file contents) as golden fixtures.
+
+**Test files to create:**
+
+1. **`hooks_test.go`** — Test `InstallHooks` (creates config, idempotent), `UninstallHooks` (removes hooks), `AreHooksInstalled` (detects presence). Use a temp directory to avoid touching real config.
+
+2. **`lifecycle_test.go`** — Test `ParseHookEvent` for all event types. Use actual JSON payloads from E2E artifacts or `AGENT.md` examples. Test every `EventType` mapping, nil returns for unknown hook names, pass-through hooks, empty input, and malformed JSON. **Important:** Test against `EventType` constants from `event.go`, not native hook names — the agent's native hook verbs (e.g., "stop") map to normalized EventTypes (e.g., `TurnEnd`).
+
+3. **`types_test.go`** — Test hook input struct parsing with actual JSON payloads from E2E artifacts or `AGENT.md` examples.
+
+4. **`transcript_test.go`** — Test `ReadTranscript`, `ChunkTranscript`, `ReassembleTranscript` with sample data in the agent's native format. Test all `TranscriptAnalyzer` methods (from `agent.go`) if implemented. Use transcript snippets from E2E artifact directories as golden test data.
+
+5. **`${AGENT_PACKAGE}_test.go`** — Test agent constructor (`New`), `Name()`, `AgentName()`, and any other agent-level methods. Verify the agent satisfies all expected interfaces using compile-time checks (`var _ agent.Agent = (*${AgentType})(nil)`).
+
+**Where to find golden test data:**
+
+- E2E artifact directories contain captured transcripts, hook payloads, and config files
+- `AGENT.md` has example JSON payloads in the "Hook input" sections
+- The agent's actual config file format from E2E test repos
+
+Run: `mise run fmt && mise run lint && mise run test`
+
+**Commit:** Use `/commit` to commit all files. Skip if no files changed.
+
+### Step 15: Verify Registration
+
+Verify that registration from Step 3 is correct and complete:
+
+1. The `init()` function in `${AGENT_PACKAGE}.go` calls `agent.Register(agent.AgentName("$AGENT_KEY"), New)`
+2. The blank import in `cmd/entire/cli/hooks_cmd.go` is present
+3. Run the full test suite: `mise run test:ci`
+
+### Step 16: Final Validation
 
 Run the complete validation:
 
@@ -136,6 +313,18 @@ Check against the integration checklist (`docs/architecture/agent-integration-ch
 - [ ] Hook installation/uninstallation working
 - [ ] Tests pass with `t.Parallel()`
 
+**Commit:** Use `/commit` to commit all files. Skip if no files changed.
+
+## E2E Debugging Protocol
+
+At every E2E failure, follow this protocol:
+
+1. **Read the test output** — the assertion message often tells you exactly what's wrong
+2. **Find the artifact directory** — E2E tests save artifacts (logs, transcripts, git state) to a temp dir printed in the output
+3. **Run `/debug-e2e {artifact-dir}`** — this skill analyzes artifacts and diagnoses the root cause
+4. **Implement the minimum fix** — don't over-engineer; fix only what the test demands
+5. **Re-run the failing test** — not the whole suite, just the one test
+
 ## Key Patterns to Follow
 
 - **Use `agent.ReadAndParseHookInput[T]`** for parsing hook stdin JSON
@@ -151,6 +340,7 @@ Summarize what was implemented:
 - Package directory and files created
 - Interfaces implemented (core + optional)
 - Hook names registered
-- Test coverage (number of test functions, what they cover)
+- E2E tiers passing (list which E2E tests pass)
+- Unit test coverage (number of test functions and what they cover; written in Step 14)
 - Any gaps or TODOs remaining
 - Commands to run full validation
diff --git a/.claude/skills/agent-integration/researcher.md b/.claude/skills/agent-integration/researcher.md
index c5ef5d3e1..6f4223d43 100644
--- a/.claude/skills/agent-integration/researcher.md
+++ b/.claude/skills/agent-integration/researcher.md
@@ -4,20 +4,11 @@ Assess whether a target AI coding agent's hook/lifecycle model is compatible wit
 
 ## Procedure
 
-### Phase 1: Architecture Inspection
+### Phase 1: Understand Entire's Expectations
 
-Read these repo files to understand the Entire lifecycle model that the agent must integrate with:
+Read `docs/architecture/agent-guide.md` to understand what Entire expects from agents: EventType names, required interfaces, hook patterns, and lifecycle flow. This gives you the vocabulary to map the target agent's native hooks to Entire's event model.
 
-**Required reading:**
-
-1. `cmd/entire/cli/agent/agent.go` — Read to find the `Agent` interface and all optional capability interfaces
-2. `cmd/entire/cli/agent/event.go` — Read to find all `EventType` constants (the normalized lifecycle events agents must map to)
-3. `cmd/entire/cli/hook_registry.go` — How native hook names are registered and routed
-4. `cmd/entire/cli/lifecycle.go` — `DispatchLifecycleEvent` handler
-5. `docs/architecture/agent-guide.md` — Full implementation guide
-6. `docs/architecture/agent-integration-checklist.md` — Validation criteria
-
-**Reference implementations:** Run `Glob("cmd/entire/cli/agent/*/")` to discover all existing agent packages. Pick 1-2 as reference. In each, focus on `lifecycle.go` (ParseHookEvent), `hooks.go` (HookSupport), and `types.go` (hook input structs).
+**Do NOT read other internal Entire source files** (`agent.go`, `event.go`, `hook_registry.go`, `lifecycle.go`, or reference implementations). The implementer handles those.
 
 ### Phase 2: Static Capability Checks
 
@@ -73,65 +64,84 @@ Run the script and analyze:
 1. **Execute**: `chmod +x scripts/test-$AGENT_SLUG-agent-integration.sh && scripts/test-$AGENT_SLUG-agent-integration.sh --manual-live`
 2. **For each captured payload**: show command, artifact path, decoded JSON
 3. **Lifecycle mapping**: native hook name → Entire EventType
-4. **Field coverage**: which `Event` struct fields can be populated per event
 
-### Phase 5: Compatibility Report
+### Phase 5: Implementation One-Pager
 
-Generate structured markdown output directly to the user:
+Write the research findings to `cmd/entire/cli/agent/$AGENT_PACKAGE/AGENT.md` as a structured one-pager that the test-writer and implementer phases will use as their single source of agent-specific information.
 
-```markdown
-# Agent Compatibility Report: $AGENT_NAME
+**Create the agent package directory first** (if it doesn't exist):
 
-**Date:** YYYY-MM-DD
-**Agent:** $AGENT_NAME v$VERSION
-**Binary:** $AGENT_BIN
-**Verdict:** COMPATIBLE / PARTIAL / INCOMPATIBLE
+```bash
+mkdir -p cmd/entire/cli/agent/$AGENT_PACKAGE
+```
 
-## Static Capability Checks
+**Write the one-pager using this template:**
 
+```markdown
+# $AGENT_NAME — Integration One-Pager
+
+## Verdict: COMPATIBLE / PARTIAL / INCOMPATIBLE
+
+## Static Checks
 | Check | Result | Notes |
 |-------|--------|-------|
 | Binary present | PASS/FAIL | path |
 | Help available | PASS/FAIL | |
-| Hook keywords found | PASS/WARN/FAIL | keywords found |
-| Session concept | PASS/WARN/FAIL | |
-| Config directory | PASS/WARN/FAIL | path |
-| Documentation | PASS/WARN/FAIL | URLs |
-
-## Lifecycle Event Mapping
-
-For each EventType constant found in `cmd/entire/cli/agent/event.go`, create a row:
-
-| Entire EventType | Native Hook | Status | Fields Available |
-|-----------------|-------------|--------|-----------------|
-| (one row per EventType from event.go) | ? | MAPPED/PARTIAL/MISSING | |
-
-## Required Interface Feasibility
-
-For each interface defined in `cmd/entire/cli/agent/agent.go`, assess feasibility:
-
-| Interface | Feasible | Complexity | Notes |
-|-----------|----------|------------|-------|
-| Agent (core) | Yes/No/Partial | Low/Med/High | |
-| (one row per optional interface from agent.go) | ... | ... | |
-
-## Integration Gaps
-
-1. **[HIGH/MED/LOW]** Description and impact
-2. ...
+| Version info | PASS/FAIL | version string |
+| Hook keywords | PASS/FAIL | keywords found |
+| Session keywords | PASS/FAIL | keywords found |
+| Config directory | PASS/FAIL | path |
+| Documentation | PASS/FAIL | URL |
+
+## Binary
+- Name: `$AGENT_BIN`
+- Version: ...
+- Install: ... (how to install if not present)
+
+## Hook Mechanism
+- Config file: `~/.config/$AGENT_SLUG/settings.json` (exact path)
+- Config format: JSON / YAML / TOML
+- Hook registration: ... (how hooks are declared — JSON objects, env vars, etc.)
+- Hook names and when they fire:
+  | Native Hook Name | When It Fires | Entire EventType |
+  |-----------------|---------------|-----------------|
+  | `on_session_start` | Agent session begins | `SessionStart` |
+  | ... | ... | ... |
+- Valid Entire EventTypes: `SessionStart`, `TurnStart`, `TurnEnd`, `Compaction`, `SessionEnd`, `SubagentStart`, `SubagentEnd`
+- Hook input (stdin JSON): ... (exact fields with example payload)
+
+## Transcript
+- Location: `~/.config/$AGENT_SLUG/sessions/<id>/transcript.jsonl`
+- Format: JSONL / JSON array / other
+- Session ID extraction: ... (from hook payload field or directory name)
+- Example entry: `{"role": "user", "content": "..."}`
+
+## Config Preservation
+- Keys to preserve when modifying: ... (or "use read-modify-write on entire file")
+- Settings that affect hook behavior: ...
+
+## CLI Flags
+- Non-interactive prompt: `$AGENT_BIN --prompt "..." --no-confirm`
+- Interactive mode: `$AGENT_BIN` (or "not supported")
+- Relevant env vars: ...
+
+## Gaps & Limitations
+- ... (anything that doesn't map cleanly)
+
+## Captured Payloads
+- See `.entire/tmp/probe-$AGENT_SLUG-*/captures/` for raw JSON captures
+```
 
-## Recommended Adapter Approach
+**Key points about the one-pager:**
 
-- Which interfaces to implement
-- Complexity estimate (files, LOC)
-- Similar implementation to use as template
-- Key challenges
+- The **Entire EventType mapping** (which native hook → which EventType) uses the event names learned from `agent-guide.md` in Phase 1. The researcher can do this mapping because it's a simple table — it doesn't need Entire source code.
+- Fill in every section with concrete values from Phases 2-4. Don't leave placeholders.
+- If a section doesn't apply (e.g., no transcript support), say so explicitly.
+- This file persists as development documentation — future maintainers will reference it.
 
-## Artifacts
+### Phase 6: Commit
 
-- Test script: `scripts/test-$AGENT_SLUG-agent-integration.sh`
-- Captured payloads: `.entire/tmp/probe-$AGENT_SLUG-*/captures/`
-```
+Use `/commit` to commit all files.
 
 ## Blocker Handling
 
@@ -144,7 +154,8 @@ If blocked at any point (auth, sandbox, binary not found):
 
 ## Constraints
 
-- **No Go code.** This command produces a feasibility report and test script only.
-- **Non-destructive.** All artifacts go under `.entire/tmp/` (gitignored).
+- **No Go code.** This command produces a one-pager and test script only.
+- **Non-destructive.** All probe artifacts go under `.entire/tmp/` (gitignored); the one-pager is the exception and goes in the agent package directory.
 - **Agent-specific scripts.** Adapt based on Phase 2 findings, not a generic template.
 - **Ask, don't assume.** If the hook mechanism is unclear, ask the user.
+- **External focus.** Do not read internal Entire source files beyond `agent-guide.md`. The implementer reads those.
diff --git a/.claude/skills/agent-integration/test-writer.md b/.claude/skills/agent-integration/test-writer.md
index 601e950a4..fba42a80b 100644
--- a/.claude/skills/agent-integration/test-writer.md
+++ b/.claude/skills/agent-integration/test-writer.md
@@ -1,17 +1,19 @@
 # Write-Tests Command
 
-Generate the E2E test suite for a new agent integration. Uses the research report's findings and the existing E2E test infrastructure.
+Create the E2E agent runner only — no unit tests, no new test scenarios. The runner registers the agent with the E2E framework so existing `ForEachAgent` tests can exercise it. Uses the implementation one-pager (`AGENT.md`) and the existing E2E test infrastructure.
 
 ## Prerequisites
 
-- The research command should have been run first (or equivalent knowledge of the agent's hook model)
-- If no research report exists, ask the user about the agent's hook events, transcript format, and config mechanism
+- The research command's one-pager at `cmd/entire/cli/agent/$AGENT_PACKAGE/AGENT.md`
+- If no one-pager exists, ask the user for: binary name, prompt CLI flags, interactive mode support, and hook event names
 
 ## Procedure
 
 ### Step 1: Read E2E Test Infrastructure
 
-Read these files to understand the existing test patterns:
+Read these files to understand the existing test patterns.
+
+**Most critical:** Focus on item 3 (`agent.go` — the interface you must implement) and read one existing agent implementation (e.g., `e2e/agents/claude.go`) as a reference. Skim the rest for context.
 
 1. `e2e/tests/main_test.go` — `TestMain` builds the CLI binary (via `entire.BinPath()`), runs preflight checks for required binaries (git, tmux, agent CLIs), sets up artifact directories, and configures env
 2. `e2e/testutil/repo.go` — `RepoState` struct (holds agent, dir, artifact dir, head/checkpoint refs), `SetupRepo` (creates temp git repo, runs `entire enable`, patches settings), `ForEachAgent` (runs a test per registered agent with repo setup, concurrency gating, and timeout scaling)
@@ -36,6 +38,15 @@ Read `docs/architecture/checkpoint-scenarios.md` for the state machine and scena
 
 ### Step 4: Create Agent Implementation
 
+Read `cmd/entire/cli/agent/$AGENT_PACKAGE/AGENT.md` (the one-pager from the research phase) for all agent-specific information:
+- Binary name → "Binary" section
+- Prompt flags → "CLI Flags" section
+- Interactive mode → "CLI Flags" section
+- Transient error patterns → "Gaps & Limitations" section (use defaults if not listed)
+- Bootstrap setup → "Config Preservation" section
+
+**If something is missing from the one-pager**, you may search external docs — but update `AGENT.md` with anything new you discover.
+
 Add a new `Agent` implementation in `e2e/agents/${agent_slug}.go`:
 
 **Pattern to follow** (based on existing implementations like `claude.go`, `gemini.go`, `opencode.go`):
@@ -147,7 +158,7 @@ Key implementation details:
 - `IsTransientError()` identifies retryable API failures — `RepoState.RunPrompt` retries once on transient errors
 - `RunPrompt()` uses `exec.CommandContext` with `Setpgid: true` and process-group kill for clean cancellation
 - `StartSession()` uses `NewTmuxSession` for interactive PTY tests; return `nil` if interactive mode isn't supported
-- Use the research report to determine CLI flags, prompt passing mechanism, and env vars
+- Use `AGENT.md` (the one-pager) for CLI flags, prompt passing mechanism, and env vars
 
 ### Step 5: Update SetupRepo (if needed)
 
@@ -159,64 +170,19 @@ Check if `testutil.SetupRepo` in `e2e/testutil/repo.go` needs agent-specific con
 
 If no special setup is needed, skip this step.
 
-### Step 6: Write E2E Test Scenarios
-
-Existing tests are agent-agnostic (they use `ForEachAgent`), so they should already work with the new agent. **Only create new test files if the agent has unique behaviors** that existing scenarios don't cover.
-
-Check if all existing scenarios work by reviewing:
-- Does the agent support non-interactive prompt mode? (required for `RunPrompt`)
-- Does the agent create files when prompted? (required for basic workflow)
-- Does the agent support git operations? (required for commit scenarios)
-- Does the agent support interactive mode? (required for interactive tests — can return nil from `StartSession`)
-
-If the agent has unique behaviors, create new test files in `e2e/tests/`:
-
-```go
-//go:build e2e
-
-package tests
-
-import (
-	"context"
-	"testing"
-	"time"
-
-	"github.com/entireio/cli/e2e/testutil"
-)
-
-func TestAgentSpecificBehavior(t *testing.T) {
-	testutil.ForEachAgent(t, 2*time.Minute, func(t *testing.T, s *testutil.RepoState, ctx context.Context) {
-		// Skip for agents that don't apply
-		if s.Agent.Name() != "${agent-slug}" {
-			t.Skip("only applies to ${agent-slug}")
-		}
-
-		// Use s.RunPrompt for non-interactive, s.StartSession for interactive
-		_, err := s.RunPrompt(t, ctx,
-			"create a file at hello.txt with 'hello world'. Do not ask for confirmation.")
-		if err != nil {
-			t.Fatalf("agent failed: %v", err)
-		}
-
-		testutil.AssertFileExists(t, s.Dir, "hello.txt")
-	})
-}
-```
+### Step 6: Verify
 
-See `e2e/README.md` for the canonical reference on structure, debugging, and CI workflows.
+After writing the runner code:
 
-### Step 7: Verify
+1. **Lint check**: `mise run lint` — ensure no lint errors
+2. **Compile check**: `go test -c -tags=e2e ./e2e/tests` — compile-only with the build tag to verify the runner compiles and registers
+3. **Verify registration**: The runner's `init()` calls `Register()` and will be picked up by `ForEachAgent` in existing tests
+4. **Add mise task**: Remind the user to add a `test:e2e:${agent_slug}` task in `mise.toml` and update CI workflows
+5. **Next step**: The implement phase will run E2E tests against this runner — that's where failures are diagnosed and fixed
 
-After writing the code:
+### Step 7: Commit
 
-1. **Lint check**: `mise run lint` — ensure no lint errors
-2. **Compile check**: `go test -c -tags=e2e ./e2e/tests` — compile-only with the build tag to verify the code compiles
-3. **List what to run**: Print the exact E2E commands but do NOT run them (they cost money):
-   ```bash
-   mise run test:e2e:${agent_slug} TestSingleSessionManualCommit
-   ```
-4. **Debug failures**: If tests fail, use `/debug-e2e {artifact-dir}` to diagnose — artifacts are auto-captured to `e2e/artifacts/{timestamp}/`
-5. **Add mise task**: Remind the user to add a `test:e2e:${agent_slug}` task in `mise.toml` and update CI workflows
+Use `/commit` to commit all files.
 
 ## Key Conventions
 
@@ -231,16 +197,15 @@ After writing the code:
 - **Console logging**: All operations through `s.RunPrompt`, `s.Git`, `s.Send`, `s.WaitFor` are automatically logged to `console.log`
 - **Transient errors**: `s.RunPrompt` auto-retries once on transient API errors via `IsTransientError`
 - **Interactive tests**: Use `s.StartSession`, `s.Send`, `s.WaitFor` — tmux pane is auto-captured in artifacts
-- **Run commands**: `mise run test:e2e:${slug} TestName` — see `e2e/README.md` for all options
-- **Do NOT run E2E tests**: They make real API calls. Only write the code and print commands.
-- **Debugging failures**: If the user runs tests and they fail, use `/debug-e2e` with the artifact directory to diagnose CLI-level issues (hooks, checkpoints, session phases, attribution)
+- **Run commands**: `mise run test:e2e --agent ${slug} TestName` — see `e2e/README.md` for all options
+- **E2E tests are run during the implement phase**: This phase only creates the runner. The implement phase runs E2E tests at each tier to drive development.
+- **Debugging failures**: If tests fail during the implement phase, use `/debug-e2e` with the artifact directory to diagnose CLI-level issues (hooks, checkpoints, session phases, attribution)
 
 ## Output
 
 Summarize what was created/modified:
 - Files added or modified
-- New agent implementation details (how it invokes the agent, auth setup, concurrency gate)
-- Any agent-specific test scenarios added
-- Commands to run the tests (for user to execute manually)
-- If tests fail, suggest using `/debug-e2e {artifact-dir}` for root cause analysis
+- New agent runner details (how it invokes the agent, auth setup, concurrency gate)
+- Confirmation that the runner compiles and registers with the E2E framework
 - Reminder to update `mise.toml` and CI workflows
+- Note that the implement phase will run E2E tests against this runner

From 75be1c05f2e6f48b3098ddb5bc74aee08d98176e Mon Sep 17 00:00:00 2001
From: Alisha Kawaguchi <alisha@entire.io>
Date: Fri, 27 Feb 2026 16:25:34 -0800
Subject: [PATCH 2/4] Include .claude/plugins/agent-integration updates

---
 .claude/plugins/agent-integration/commands/implement.md   | 2 +-
 .claude/plugins/agent-integration/commands/write-tests.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/.claude/plugins/agent-integration/commands/implement.md b/.claude/plugins/agent-integration/commands/implement.md
index 191ff6e05..cad196e49 100644
--- a/.claude/plugins/agent-integration/commands/implement.md
+++ b/.claude/plugins/agent-integration/commands/implement.md
@@ -1,5 +1,5 @@
 ---
-description: "Build the agent Go package via TDD using research findings and E2E tests as spec"
+description: "E2E-first Test driven develpoment — unit tests written last"
 ---
 
 # Implement Command
diff --git a/.claude/plugins/agent-integration/commands/write-tests.md b/.claude/plugins/agent-integration/commands/write-tests.md
index 5f3e019ec..54c347d00 100644
--- a/.claude/plugins/agent-integration/commands/write-tests.md
+++ b/.claude/plugins/agent-integration/commands/write-tests.md
@@ -1,5 +1,5 @@
 ---
-description: "Generate E2E test suite for a new agent integration"
+description: "Create E2E agent runner (no unit tests)"
 ---
 
 # Write-Tests Command

From 2ea43db1b74f569a159fd308f1cf47d8d0c975d0 Mon Sep 17 00:00:00 2001
From: Alisha Kawaguchi <alisha@entire.io>
Date: Fri, 27 Feb 2026 16:33:23 -0800
Subject: [PATCH 3/4] Fix PR review comments: mise syntax, step numbering, and
 missing parameter
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Fix single-dash `-agent` to double-dash `--agent` in implementer.md
- Fix `test:e2e:$SLUG` colon syntax to `test:e2e --agent $SLUG` in
  implementer.md and test-writer.md
- Clarify Step 3's `mise run test` is a compile-only sanity check
- Fix "Step 13" → "Step 14" cross-reference in output section
- Add missing `AGENT_SLUG` parameter to SKILL.md parameter table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 17a7922e66f6
---
 .claude/skills/agent-integration/SKILL.md       |  1 +
 .claude/skills/agent-integration/implementer.md | 12 ++++++------
 .claude/skills/agent-integration/test-writer.md |  2 +-
 3 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/.claude/skills/agent-integration/SKILL.md b/.claude/skills/agent-integration/SKILL.md
index ba2ce1560..1c099efd7 100644
--- a/.claude/skills/agent-integration/SKILL.md
+++ b/.claude/skills/agent-integration/SKILL.md
@@ -22,6 +22,7 @@ Collect these before starting (ask the user if not provided):
 | `AGENT_NAME` | Human-readable name (e.g., "Gemini CLI") | User provides |
 | `AGENT_PACKAGE` | Go package dir name — **no hyphens** | Lowercase, remove hyphens/spaces |
 | `AGENT_KEY` | Registry key for `agent.Register()` and `entire enable` | Check existing patterns in `cmd/entire/cli/agent/registry.go` |
+| `AGENT_SLUG` | Filesystem/URL-safe slug (kebab-case) used in E2E runner filenames and script names | Kebab-case of agent name; align with existing entries in `e2e/agents/` |
 | `AGENT_BIN` | CLI binary name | `command -v <binary>` |
 | `LIVE_COMMAND` | Full command to launch agent | User provides |
 | `EVENTS_OR_UNKNOWN` | Known hook event names, or "unknown" | From agent docs or "unknown" |
diff --git a/.claude/skills/agent-integration/implementer.md b/.claude/skills/agent-integration/implementer.md
index 2d2a4321a..70bbbb191 100644
--- a/.claude/skills/agent-integration/implementer.md
+++ b/.claude/skills/agent-integration/implementer.md
@@ -15,7 +15,7 @@ Build the agent Go package using strict E2E-first TDD. Unit tests are written ON
 3. **Minimum viable fix.** At each failure, implement only the code needed to make that specific assertion pass. Don't anticipate future tiers.
 4. **`/debug-e2e` is your debugger.** When an E2E test fails, use the artifact directory with `/debug-e2e` before guessing at fixes.
 5. **No unit tests during Steps 4-13.** Unit tests are written in Step 14 after all E2E tiers pass, using real data from E2E runs as golden fixtures.
-6. **Format and lint, don't unit test.** Between E2E tiers, run `mise run fmt && mise run lint` to keep code clean. No `mise run test` until Step 14.
+6. **Format and lint, don't unit test.** Between E2E tiers, run `mise run fmt && mise run lint` to keep code clean. Any earlier `mise run test` invocations (e.g., in Step 3) are strictly compile-only sanity checks — no `mise run test` between E2E tiers (Steps 4-13).
 7. **If you didn't watch it fail, you don't know if it tests the right thing.**
 
 **Do NOT write unit tests during Steps 4-13.** All test writing is consolidated in Step 14.
@@ -57,13 +57,13 @@ cmd/entire/cli/agent/$AGENT_PACKAGE/
 
 - Ensure the blank import `_ "github.com/entireio/cli/cmd/entire/cli/agent/$AGENT_PACKAGE"` exists in `cmd/entire/cli/hooks_cmd.go`
 
-**Verify compilation:**
+**Verify compilation (compile-only sanity check, not unit-test-driven development):**
 
 ```bash
 mise run fmt && mise run lint && mise run test
 ```
 
-Everything must pass before proceeding. Fix any issues.
+Everything must compile and pass existing tests before proceeding. Fix any issues.
 
 **Commit:** Use `/commit` to commit all files. Skip if no files changed.
 
@@ -103,7 +103,7 @@ The foundational test. This exercises the full agent lifecycle: start session 
 
 **Cycle:**
 
-1. Run: `mise run test:e2e -agent $AGENT_SLUG TestSingleSessionManualCommit`
+1. Run: `mise run test:e2e --agent $AGENT_SLUG TestSingleSessionManualCommit`
 2. **Watch it fail** — read the failure output carefully
 3. Use `/debug-e2e {artifact-dir}` to understand what happened
 4. Implement the minimum code to fix the first failure
@@ -163,7 +163,7 @@ Run these tests to validate multi-session behavior:
 
 **Cycle (for each test):**
 
-1. Run: `mise run test:e2e:$AGENT_SLUG TestMultiSessionManualCommit`
+1. Run: `mise run test:e2e --agent $AGENT_SLUG TestMultiSessionManualCommit`
 2. **Watch it fail** — use `/debug-e2e {artifact-dir}` on failures
 3. Fix and repeat
 4. Move to next test
@@ -341,6 +341,6 @@ Summarize what was implemented:
 - Interfaces implemented (core + optional)
 - Hook names registered
 - E2E tiers passing (list which E2E tests pass)
-- Unit test coverage (number of test functions, what they cover — written in Step 13)
+- Unit test coverage (number of test functions, what they cover — written in Step 14)
 - Any gaps or TODOs remaining
 - Commands to run full validation
diff --git a/.claude/skills/agent-integration/test-writer.md b/.claude/skills/agent-integration/test-writer.md
index fba42a80b..042561362 100644
--- a/.claude/skills/agent-integration/test-writer.md
+++ b/.claude/skills/agent-integration/test-writer.md
@@ -177,7 +177,7 @@ After writing the runner code:
 1. **Lint check**: `mise run lint` — ensure no lint errors
 2. **Compile check**: `go test -c -tags=e2e ./e2e/tests` — compile-only with the build tag to verify the runner compiles and registers
 3. **Verify registration**: The runner's `init()` calls `Register()` and will be picked up by `ForEachAgent` in existing tests
-4. **Add mise task**: Remind the user to add a `test:e2e:${agent_slug}` task in `mise.toml` and update CI workflows
+4. **Run command**: Remind the user that E2E tests run via `mise run test:e2e --agent ${agent_slug}`, and to update CI workflows if needed
 5. **Next step**: The implement phase will run E2E tests against this runner — that's where failures are diagnosed and fixed
 
 ### Step 7: Commit

From d5eea9ffdb43a6790a3a9e8bd94af4c0a5bfd90c Mon Sep 17 00:00:00 2001
From: Alex Ong <alex@entire.io>
Date: Mon, 2 Mar 2026 10:40:50 +1100
Subject: [PATCH 4/4] Fix typo in implement command description

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 17297f0e466a
---
 .claude/plugins/agent-integration/commands/implement.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.claude/plugins/agent-integration/commands/implement.md b/.claude/plugins/agent-integration/commands/implement.md
index cad196e49..b0c9201fd 100644
--- a/.claude/plugins/agent-integration/commands/implement.md
+++ b/.claude/plugins/agent-integration/commands/implement.md
@@ -1,5 +1,5 @@
 ---
-description: "E2E-first Test driven develpoment — unit tests written last"
+description: "E2E-first Test driven development — unit tests written last"
 ---
 
 # Implement Command