PostHog · gewenyu99 · Jun 21, 2026 · Jun 21, 2026 · Jun 21, 2026 · Jun 22, 2026
diff --git a/.claude/skills/exploring-the-wizard/SKILL.md b/.claude/skills/exploring-the-wizard/SKILL.md
@@ -0,0 +1,68 @@
+---
+name: exploring-the-wizard
+description: Run, drive, and explore the PostHog wizard headlessly against an app. Boot it on the app and decide each screen yourself over the wizard-ci MCP tools (open_app / read_state / perform_action / render_screen / run_agent), snapshotting the TUI to see what happened. Use to test or explore the wizard end-to-end.
+compatibility: Designed for Claude Code working on the PostHog wizard codebase.
+metadata:
+  author: posthog
+  version: "3.0"
+---
+
+# Exploring the wizard as an agent
+
+Drive a real wizard run yourself: boot it on an app, read each screen, decide, act,
+snapshot. You do this through the **`wizard-ci` MCP tools**, which are already bound
+in this repo (registered in `.mcp.json`). For _how_ it works underneath, read
+[`e2e-harness/ARCHITECTURE.md`](../../../e2e-harness/ARCHITECTURE.md).
+
+If you don't see the `wizard-ci` tools (`open_app`, `read_state`, …), the server
+isn't approved yet — ask the user to approve `wizard-ci`, then retry.
+
+## Set up
+
+Ask the user for the absolute path to their PostHog key file — e.g. "What's the
+path to your phx key file?" — plus the project id and region if you don't have
+them. Clone or copy the target app to a **throwaway `/tmp` copy** (never a real
+fixture). Never print or commit the key.
+
+## Drive
+
+1. **`open_app({ appDir, keyFile, projectId, region })`** — boots a live wizard on
+   the app and returns the first screen. `appDir` is the throwaway copy.
+2. **`read_state`** — current screen, run phase, secret-free session, tasks, and
+   the actions legal right now. Call after every move.
+3. **`perform_action({ action, params? })`** — commit a decision: `confirm_setup`,
+   `dismiss_outage`, `choose` (a setup question, e.g. `{ key, value }`),
+   `dismiss_outro`, `set_mcp_outcome`, `dismiss_slack`, `keep_skills`.
+4. **`render_screen`** — render the current TUI to ANSI so you can _see_ it.
+5. **`run_agent`** — kicks off the **real integration** in the background and
+   returns immediately; it bootstraps credentials, so it's what advances `auth`
+   and `run`. Then **poll `read_state`** — `runPhase` goes `running → completed`
+   and the screen advances to `outro`.
+
+A typical walk:
+
+```
+open_app → intro → perform_action confirm_setup
+read_state → health-check → perform_action dismiss_outage
+read_state → auth → run_agent           (returns at once; integration runs in background)
+read_state (poll) → runPhase running → completed, screen → outro
+outro → perform_action dismiss_outro → … → keep_skills
+```
+
+Snapshot with `render_screen` at each key moment so you (and the user) can see what
+the wizard showed.
+
+## Key facts
+
+- **State → screen.** You never navigate; you commit a decision (an action) and the
+  router re-derives the active screen. Name actions, not keys.
+- **`auth` and `run` advance only via `run_agent`.** They expose no action and
+  don't self-advance. `run_agent` returns immediately and runs the integration in
+  the background — poll `read_state` for `runPhase` (`running → completed`).
+  Everything else is an instant commit.
+- **`run_agent` creates real PostHog resources** (a dashboard + insights) in the
+  project; each run duplicates them.
+- **A green run ≠ a valid integration.** `runPhase=completed` means the flow
+  finished, not that the wizard understood the framework (e.g. it'll treat a Wasp
+  app as react-router). Read what it actually changed.
+- **None of this ships.** The harness lives in `e2e-harness/`, out of `src/`.
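
The `run_agent` step described in the skill above (kick off, then poll `read_state` until `runPhase` reaches `completed`) can be sketched as a bounded poll loop. This is a minimal sketch, not the harness's code: `waitForRun`, the `WizardState` shape, and the injected `readState` callback are hypothetical stand-ins for however a client invokes the `read_state` tool. Only `runPhase` and its `running → completed` values come from the skill text. The timeout is an added assumption, since the skill names no failure phase.

```typescript
// Hypothetical state shape: the skill names `runPhase` and the current screen,
// but does not document the exact field layout.
type WizardState = { screen: string; runPhase: string };

// Hypothetical callback that wraps a `read_state` tool call.
type ReadState = () => Promise<WizardState>;

type WaitOptions = {
  intervalMs?: number; // delay between polls
  timeoutMs?: number; // give up after this much waiting
  sleep?: (ms: number) => Promise<void>; // injectable so tests need no real timers
};

async function waitForRun(
  readState: ReadState,
  opts: WaitOptions = {},
): Promise<WizardState> {
  const intervalMs = opts.intervalMs ?? 5_000;
  const timeoutMs = opts.timeoutMs ?? 15 * 60_000;
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((r) => setTimeout(r, ms)));

  let waited = 0;
  for (;;) {
    const state = await readState();
    // `completed` is the only terminal phase the skill documents.
    if (state.runPhase === 'completed') return state;
    if (waited >= timeoutMs) {
      throw new Error(`run still "${state.runPhase}" after ${waited} ms`);
    }
    await sleep(intervalMs);
    waited += intervalMs;
  }
}
```

A caller would invoke `run_agent` once, then `await waitForRun(...)` and read what the wizard actually changed, since a completed run does not by itself mean the integration is valid.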
diff --git a/.mcp.json b/.mcp.json
@@ -0,0 +1,8 @@
+{
+  "mcpServers": {
+    "wizard-ci": {
+      "command": "npx",
+      "args": ["tsx", "scripts/wizard-ci-mcp.no-jest.ts"]
+    }
+  }
+}
diff --git a/AGENTS.md b/AGENTS.md
@@ -31,14 +31,15 @@ boundaries, screen resolution
 
 ## Skills available
 
-Four skills live under `.claude/skills/`. Read `wizard-development` first for any structural change; then load the relevant procedural skill:
+Five skills live under `.claude/skills/`. Read `wizard-development` first for any structural change; then load the relevant procedural skill:
 
 | Skill | When to use |
 |---|---|
 | `wizard-development` | Before any structural change. Design principles + decision framework. |
 | `adding-framework-support` | Adding a new framework integration (e.g. Ruby on Rails, Go, Angular). |
 | `adding-skill-program` | Adding a new skill-based program (e.g. a new product feature setup). |
 | `ink-tui` | Building or modifying TUI screens, layouts, and primitives. |
+| `exploring-the-wizard` | Running/driving/exploring the wizard headlessly (read_state/perform_action, TUI snapshots). |
 
 ## CLI command surface
 

diff --git a/README.md b/README.md
@@ -398,7 +398,7 @@ wizard --integration=nextjs
 wizard --integration=nextjs --local-mcp
 ```
 
-## Testing
+### Testing
 
 To run unit tests, run:
 
@@ -415,6 +415,27 @@ bin/test-e2e
 E2E tests are a bit more complicated to create and adjust due to their mocked
 LLM calls. See the `e2e-tests/README.md` for more information.
 
+#### Explore with an agent
+
+You can hand the wizard to an AI agent and have it drive the real flow itself —
+deciding each screen and snapshotting the TUI to see what happened. The agent
+drives through the `wizard-ci` MCP tools (`open_app` / `read_state` /
+`perform_action` / `render_screen` / `run_agent`), which are registered in this
+repo's `.mcp.json` and bound in every session here — approve `wizard-ci` the first
+time you're prompted. The how-to is the `exploring-the-wizard` skill
+(`.claude/skills/exploring-the-wizard/SKILL.md`), which an agent discovers
+automatically.
+
+Example prompt — explore against
+[open-saas](https://github.com/wasp-lang/open-saas):
+
+> Explore the PostHog wizard against open-saas, following the
+> `exploring-the-wizard` skill. Ask me for my phx key file path, clone
+> `https://github.com/wasp-lang/open-saas` into a throwaway `/tmp` copy, then use
+> the `wizard-ci` MCP tools to open it and drive the whole flow — deciding each
+> screen yourself and snapshotting key moments — and tell me what it did and
+> anything that broke.
+
 ## Publishing your tool
 
 To make your version of a tool usable with a one-line `npx` command:

diff --git a/e2e-harness/ARCHITECTURE.md b/e2e-harness/ARCHITECTURE.md
@@ -0,0 +1,87 @@
+# e2e-harness — Headless e2e Control Plane
+
+How an agent (or CI) drives a **real** wizard run end-to-end — the **real TUI**,
+no browser, no keystrokes — and captures what it rendered. Both e2e routes share
+one idea: run the real `startTUI` (the real ink render) and drive its store by
+**state manipulation**, then capture the real rendered screen from a PTY.
+
+> If you're an agent that just wants to run and explore the wizard, use the
+> `exploring-the-wizard` skill
+> ([`.claude/skills/exploring-the-wizard/SKILL.md`](../.claude/skills/exploring-the-wizard/SKILL.md)).
+> This doc is the _how it works_ underneath.
+
+## The pieces
+
+This whole harness lives in `e2e-harness/` at the repo root — deliberately OUT of
+`src/` so none of it is part of the wizard's production source (nothing in `src/`
+imports it; the tsdown bundle never includes it).
+
+```
+e2e-harness/
+  wizard-ci-driver.ts   WizardCiDriver — read_state / perform_action over the store
+  action-registry.ts    screen → the actions legal on it (+ NO_ACTION_SCREENS)
+  e2e-profile.ts        WizardE2eProfile + decideE2eAction — the scripted walk policy
+  profiles.ts           per-program profiles + profileFor(programId)
+  tui-capture.ts        run a command in a PTY (node-pty) + read its real screen (@xterm/headless)
+scripts/
+  tui-host.no-jest.ts   the real-TUI host: startTUI + WizardCiDriver, MODE=fixed | serve
+  tui-snapshots.no-jest.ts   CI route: host(fixed) in a PTY → per-screen real-TUI snapshots
+  wizard-ci-mcp.no-jest.ts   agent route: MCP server proxying host(serve)
+```
+
+The driver reads and mutates the **real** `WizardStore` that the TUI renders from:
+the router resolves the active screen from session state, every action goes
+through a store setter, and the render is a pure projection of that state. So
+manipulating the store makes the real TUI react — the driver and the renderer
+share one store and never conflict; you never touch the TUI's input.
+
+## Auth without a browser
+
+The real TUI runs `ci: true`, and auth is satisfied by **state manipulation**:
+`getOrAskForProjectData({ ci: true, apiKey })` resolves the phx personal key into
+credentials, and `store.setCredentials(...)` sets them — the same bearer path an
+OAuth token takes, so the auth screen advances with no browser and no keystrokes.
+(`run_agent` does the same bootstrap as part of the real integration.)
+
+## The two routes
+
+- **CI snapshots** — `tui-snapshots.no-jest.ts` spawns `tui-host` (`MODE=fixed`)
+  in a PTY. The host self-drives the fixed profile (`decideE2eAction`) through the
+  real agent run and signals each key moment; the parent writes the real rendered
+  screen to `SNAP_OUT/NN-<screen>.txt` (including the run screen's progression).
+- **Agent** — `wizard-ci-mcp.no-jest.ts` is a stdio MCP server that spawns
+  `tui-host` (`MODE=serve`) and proxies: `read_state` / `perform_action` /
+  `run_agent` forward over a unix socket; `render_screen` returns the real
+  captured frame. The agent decides each screen itself.
+
+## Things that bite
+
+1. **Running inside an agent session.** Host env (`CLAUDECODE`, `ANTHROPIC_*`,
+   `CLAUDE_CODE_*`) makes the wizard's spawned agent defer auth to the host →
+   `apiKeySource: none` → 401. The harness strips these for the child. A plain CI
+   shell never has them.
+2. **A project-scoped key needs its project id.** Pass the team's `--project-id`
+   (or `POSTHOG_WIZARD_PROJECT_ID`), or bootstrap 403s on project-data fetch.
+3. **Never run on a real fixture.** Always a throwaway copy.
+4. **`run_agent` is minutes long and creates real resources** (a dashboard +
+   insights) each run; the agent log is one shared file — never run two at once.
+5. **node-pty's spawn-helper.** When the package is extracted without running its
+   build script (pnpm skips it), the prebuilt `spawn-helper` loses its execute
+   bit and `pty.spawn` fails with `posix_spawnp failed`. `tui-capture.ts` restores
+   it best-effort on each spawn.
+
+## Changing what the run does
+
+Per-program UI choices live in the harness (`profiles.ts`, keyed by program id) —
+not on the program config — so this machinery stays out of production source. Edit
+the program's entry (typed by `WizardE2eProfile`); the host asks
+`decideE2eAction(state, profile)` what to commit on each screen. The (screen →
+decision) trace is snapshot-tested offline in `__tests__/` (`jest -u` to update).
+
+## Visual-regression snapshots (the workbench flow)
+
+[wizard-workbench](https://github.com/PostHog/wizard-workbench) runs the CI route
+for real-run visual regression: each test definition runs `tui-snapshots`, the
+real-TUI screens are rasterized to a side-by-side baseline-vs-current review, and
+run-to-run differences are surfaced for a human, not asserted away. See
+`services/wizard-ci/` there.
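
Item 1 of "Things that bite" in the ARCHITECTURE.md above says the harness strips the host agent's environment variables before spawning the child. The filter could look roughly like the sketch below. The variable names and prefixes (`CLAUDECODE`, `ANTHROPIC_*`, `CLAUDE_CODE_*`) come from that list; the helper name `stripHostAgentEnv` and its shape are assumptions, not the harness's actual code.

```typescript
// Patterns taken from the "Things that bite" list: one exact name, two prefixes.
const HOST_AGENT_ENV: RegExp[] = [/^CLAUDECODE$/, /^ANTHROPIC_/, /^CLAUDE_CODE_/];

// Hypothetical helper: returns a copy of `env` without the host-agent variables,
// so a spawned wizard does not defer auth to the surrounding agent session.
function stripHostAgentEnv(
  env: Record<string, string | undefined>,
): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue; // drop unset entries
    if (HOST_AGENT_ENV.some((re) => re.test(key))) continue; // drop host-agent vars
    clean[key] = value;
  }
  return clean;
}
```

The result would be passed as the `env` option when spawning the child process, leaving unrelated variables such as `PATH` or `POSTHOG_WIZARD_PROJECT_ID` untouched.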
diff --git a/e2e-harness/__tests__/__snapshots__/e2e-flow-snapshot.test.ts.snap b/e2e-harness/__tests__/__snapshots__/e2e-flow-snapshot.test.ts.snap
@@ -0,0 +1,93 @@
+// Jest Snapshot v1, https://goo.gl/fbAQLP
+
+exports[`e2e flow snapshot — posthog-integration Next.js (with a setup question) walks a stable path 1`] = `
+{
+  "profile": {
+    "ask": "first",
+    "healthCheck": "dismiss",
+    "mcp": "skip",
+    "setup": "first",
+    "skills": "delete",
+    "slack": "skip",
+  },
+  "program": "posthog-integration",
+  "trace": [
+    {
+      "action": "confirm_setup",
+      "screen": "intro",
+    },
+    {
+      "action": "dismiss_outage",
+      "screen": "health-check",
+    },
+    {
+      "action": "choose",
+      "screen": "setup",
+    },
+    {
+      "action": "(external)",
+      "screen": "auth",
+    },
+    {
+      "action": "(external)",
+      "screen": "run",
+    },
+    {
+      "action": "dismiss_outro",
+      "screen": "outro",
+    },
+    {
+      "action": "set_mcp_outcome",
+      "screen": "mcp",
+    },
+    {
+      "action": "dismiss_slack",
+      "screen": "slack-connect",
+    },
+    {
+      "action": "keep_skills",
+      "screen": "keep-skills",
+    },
+  ],
+}
+`;
+
+exports[`e2e flow snapshot — posthog-integration Node (no setup question) walks a stable path 1`] = `
+{
+  "program": "posthog-integration",
+  "trace": [
+    {
+      "action": "confirm_setup",
+      "screen": "intro",
+    },
+    {
+      "action": "dismiss_outage",
+      "screen": "health-check",
+    },
+    {
+      "action": "(external)",
+      "screen": "auth",
+    },
+    {
+      "action": "(external)",
+      "screen": "run",
+    },
+    {
+      "action": "dismiss_outro",
+      "screen": "outro",
+    },
+    {
+      "action": "set_mcp_outcome",
+      "screen": "mcp",
+    },
+    {
+      "action": "dismiss_slack",
+      "screen": "slack-connect",
+    },
+    {
+      "action": "keep_skills",
+      "screen": "keep-skills",
+    },
+  ],
+}
+`;
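
To read the snapshot traces above: each entry pairs a screen with the action committed on it, and `auth` and `run` show `(external)` because they expose no action and advance only through the agent run. A minimal sketch that reproduces that shape follows. The screen and action names are copied from the snapshot; the lookup table and `traceFor` are hypothetical simplifications, and the real `action-registry.ts` and `decideE2eAction(state, profile)` may differ.

```typescript
// Screens that expose no action (they advance via the agent run).
const NO_ACTION_SCREENS = new Set<string>(['auth', 'run']);

// Screen -> action pairs as they appear in the snapshot trace.
const ACTION_FOR_SCREEN: Record<string, string> = {
  intro: 'confirm_setup',
  'health-check': 'dismiss_outage',
  setup: 'choose',
  outro: 'dismiss_outro',
  mcp: 'set_mcp_outcome',
  'slack-connect': 'dismiss_slack',
  'keep-skills': 'keep_skills',
};

type TraceEntry = { action: string; screen: string };

// Hypothetical helper: map a sequence of visited screens to trace entries.
function traceFor(screens: string[]): TraceEntry[] {
  return screens.map((screen) => {
    if (NO_ACTION_SCREENS.has(screen)) return { action: '(external)', screen };
    const action = ACTION_FOR_SCREEN[screen];
    if (action === undefined) {
      throw new Error(`no action known for screen "${screen}"`);
    }
    return { action, screen };
  });
}
```

Feeding it the Node walk (`intro`, `health-check`, `auth`, `run`, `outro`, `mcp`, `slack-connect`, `keep-skills`) yields the same eight screen/action pairs as the second snapshot.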