Open
Commits
41 commits
ec8da57
feat(ci-driver): wizard-ci-tools control plane for headless e2e + rec…
gewenyu99 Jun 21, 2026
cd437a7
refactor(posthog-integration): extract e2e profile to its own file
gewenyu99 Jun 21, 2026
1c2dca8
docs(ci-driver): point the agent guide at the extracted e2e profile file
gewenyu99 Jun 21, 2026
f5e6a65
test(ci-driver): add offline sample-recording generator for replay
gewenyu99 Jun 22, 2026
17a8777
docs(scripts): add README indexing the ci-driver/e2e scripts
gewenyu99 Jun 22, 2026
b0f4e53
Merge remote-tracking branch 'origin/main' into e2e-control-plane
gewenyu99 Jun 22, 2026
c8eacca
fix(ci-driver): classify warehouse-intro + self-driving-intro screens
gewenyu99 Jun 22, 2026
b1a43ee
docs(ci-driver): rename agent guide to ARCHITECTURE.md, strip interna…
gewenyu99 Jun 22, 2026
61527e8
feat(ci-driver): render a recording to per-frame TUI snapshots
gewenyu99 Jun 22, 2026
a55ccbd
refactor: move e2e/recording harness out of prod src into e2e-harness/
gewenyu99 Jun 22, 2026
e44fe55
docs(e2e-harness): cross-link the workbench visual-snapshots flow + env
gewenyu99 Jun 22, 2026
2a325ef
docs(posthog-integration): describe the e2e test path next to the pro…
gewenyu99 Jun 22, 2026
e7484ee
refactor(e2e): make the test definition a readable JSON the harness l…
gewenyu99 Jun 22, 2026
18853dd
docs(e2e-harness): instrument the perform_action trace across the hops
gewenyu99 Jun 22, 2026
fcb2e54
docs(e2e-harness): state the never-ships-to-prod guarantee in each mo…
gewenyu99 Jun 22, 2026
0e63a30
Merge branch 'main' into e2e-control-plane
gewenyu99 Jun 22, 2026
cb439ff
revert: drop the explanatory comments from source
gewenyu99 Jun 22, 2026
6e88d7d
chore(scripts): remove demo/proof scaffolding from the PR
gewenyu99 Jun 22, 2026
5f214b7
docs(e2e-harness): add the agent exploration runbook
gewenyu99 Jun 22, 2026
5491e27
docs: move agent-exploration to wizard README, trim comments to curre…
gewenyu99 Jun 22, 2026
a919c3d
feat(skills): promote the agent-exploration runbook to a skill
gewenyu99 Jun 22, 2026
9d49433
feat(e2e-harness): live MCP server so an agent drives the wizard turn…
gewenyu99 Jun 22, 2026
59ffdc1
chore: align zod spec to ^3.25.76 (matches the pi stack #701)
gewenyu99 Jun 22, 2026
217e717
refactor(e2e-harness): drop redundant list_actions from the MCP server
gewenyu99 Jun 22, 2026
9b870f3
docs: revert prettier reflow of README + AGENTS, keep only the real c…
gewenyu99 Jun 22, 2026
119dba0
docs: fix dead link — point ARCHITECTURE at the skill, not the delete…
gewenyu99 Jun 22, 2026
e7129c4
fix(skill): correct the driving instructions — MCP tools bind at sess…
gewenyu99 Jun 22, 2026
6dd01e5
fix(e2e-harness): make the MCP server actually loadable; skill leads …
gewenyu99 Jun 22, 2026
f702da7
feat(e2e-harness): bind wizard-ci as committed MCP tools so an agent …
gewenyu99 Jun 22, 2026
d50da8d
docs(e2e-harness): drop "monorepo" wording from open_app guidance
gewenyu99 Jun 22, 2026
332d9d6
docs(e2e-harness): drop the app-dir hand-holding from open_app
gewenyu99 Jun 22, 2026
6dde710
fix(e2e-harness): point agents to run_agent on the auth screen
gewenyu99 Jun 23, 2026
51b9f4c
fix(e2e-harness): make run_agent non-blocking so the MCP server survi…
gewenyu99 Jun 23, 2026
94ac289
refactor(e2e-harness): drive + snapshot the real TUI; one primitive f…
gewenyu99 Jun 23, 2026
e120b2a
docs: drop stale e2e-full-run reference from a comment
gewenyu99 Jun 23, 2026
e87a322
fix(tui-snapshots): capture the run screen's progression + every tran…
gewenyu99 Jun 23, 2026
3bc6b1a
fix(tui-snapshots): drop the throttle; don't hang at exit on the park…
gewenyu99 Jun 23, 2026
fd5599f
refactor(tui-snapshots): always run the agent; drop the RUN_AGENT toggle
gewenyu99 Jun 23, 2026
4ed8691
build: allow node-pty's build script (compiles pty.node on Linux CI)
gewenyu99 Jun 23, 2026
72a1051
fix(tui-capture): strip CI markers so ink renders the real TUI
gewenyu99 Jun 23, 2026
426da5a
Merge remote-tracking branch 'origin/main' into e2e-control-plane
gewenyu99 Jun 23, 2026
68 changes: 68 additions & 0 deletions .claude/skills/exploring-the-wizard/SKILL.md
@@ -0,0 +1,68 @@
---
name: exploring-the-wizard
description: Run, drive, and explore the PostHog wizard headlessly against an app — boot it on the app and decide each screen yourself over the wizard-ci MCP tools (open_app / read_state / perform_action / run_agent), snapshotting the TUI to see what happened. Use to test or explore the wizard end-to-end.
compatibility: Designed for Claude Code working on the PostHog wizard codebase.
metadata:
author: posthog
version: "3.0"
---

# Exploring the wizard as an agent

Drive a real wizard run yourself: boot it on an app, read each screen, decide, act,
snapshot. You do this through the **`wizard-ci` MCP tools**, which are already bound
in this repo (registered in `.mcp.json`). For _how_ it works underneath, read
[`e2e-harness/ARCHITECTURE.md`](../../../e2e-harness/ARCHITECTURE.md).

If you don't see the `wizard-ci` tools (`open_app`, `read_state`, …), the server
isn't approved yet — ask the user to approve `wizard-ci`, then retry.

## Set up

Ask the user for the absolute path to their PostHog key file — e.g. "What's the
path to your phx key file?" — plus the project id and region if you don't have
them. Clone or copy the target app to a **throwaway `/tmp` copy** (never a real
fixture). Never print or commit the key.

## Drive

1. **`open_app({ appDir, keyFile, projectId, region })`** — boots a live wizard on
the app and returns the first screen. `appDir` is the throwaway copy.
2. **`read_state`** — current screen, run phase, secret-free session, tasks, and
the actions legal right now. Call after every move.
3. **`perform_action({ action, params? })`** — commit a decision: `confirm_setup`,
`dismiss_outage`, `choose` (a setup question, e.g. `{ key, value }`),
`dismiss_outro`, `set_mcp_outcome`, `dismiss_slack`, `keep_skills`.
4. **`render_screen`** — render the current TUI to ANSI so you can _see_ it.
5. **`run_agent`** — kicks off the **real integration** in the background and
returns immediately; it bootstraps credentials, so it's what advances `auth`
and `run`. Then **poll `read_state`**: `runPhase` goes `running → completed`
and the screen advances to `outro`.

A typical walk:

```
open_app → intro → perform_action confirm_setup
read_state → health-check → perform_action dismiss_outage
read_state → auth → run_agent (returns at once; integration runs in background)
read_state (poll) → runPhase running → completed, screen → outro
outro → perform_action dismiss_outro → … → keep_skills
```

Snapshot with `render_screen` at each key moment so you (and the user) can see what
the wizard showed.
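
The polling step after `run_agent` can be sketched as a small loop. This is a minimal sketch, not the harness's code: `WizardState`, `ReadState`, and `waitForRun` are hypothetical names, and the only fields assumed on the state are `screen` and `runPhase`, with the `running` and `completed` values named above. `readState` stands in for whatever call reaches the `read_state` tool.

```typescript
// Hypothetical subset of what read_state returns; only `screen` and
// `runPhase` are assumed here, every other field is left out.
interface WizardState {
  screen: string;
  runPhase: string; // "running" then "completed", per the skill text
}

// Stand-in for the call that reaches the read_state tool (an assumption).
type ReadState = () => Promise<WizardState>;

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Poll read_state until runPhase is "completed", or give up after maxPolls.
async function waitForRun(
  readState: ReadState,
  intervalMs = 5000,
  maxPolls = 120,
): Promise<WizardState> {
  for (let i = 0; i < maxPolls; i++) {
    const state = await readState();
    if (state.runPhase === "completed") return state;
    await sleep(intervalMs);
  }
  throw new Error(`run did not complete after ${maxPolls} polls`);
}
```

The poll cap is there because the integration runs for minutes in the background; a bounded loop fails loudly instead of waiting forever if the run never reports `completed`.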

## Key facts

- **State → screen.** You never navigate; you commit a decision (an action) and the
router re-derives the active screen. Name actions, not keys.
- **`auth` and `run` advance only via `run_agent`.** They expose no action and
don't self-advance. `run_agent` returns immediately and runs the integration in
the background — poll `read_state` for `runPhase` (`running → completed`).
Everything else is an instant commit.
- **`run_agent` creates real PostHog resources** (a dashboard + insights) in the
project; each run duplicates them.
- **A green run ≠ a valid integration.** `runPhase=completed` means the flow
finished, not that the wizard understood the framework (e.g. it'll treat a Wasp
app as react-router). Read what it actually changed.
- **None of this ships.** The harness lives in `e2e-harness/`, out of `src/`.
8 changes: 8 additions & 0 deletions .mcp.json
@@ -0,0 +1,8 @@
{
  "mcpServers": {
    "wizard-ci": {
      "command": "npx",
      "args": ["tsx", "scripts/wizard-ci-mcp.no-jest.ts"]
    }
  }
}
3 changes: 2 additions & 1 deletion AGENTS.md
@@ -31,14 +31,15 @@ boundaries, screen resolution

## Skills available

-Four skills live under `.claude/skills/`. Read `wizard-development` first for any structural change; then load the relevant procedural skill:
+Five skills live under `.claude/skills/`. Read `wizard-development` first for any structural change; then load the relevant procedural skill:

| Skill | When to use |
|---|---|
| `wizard-development` | Before any structural change. Design principles + decision framework. |
| `adding-framework-support` | Adding a new framework integration (e.g. Ruby on Rails, Go, Angular). |
| `adding-skill-program` | Adding a new skill-based program (e.g. a new product feature setup). |
| `ink-tui` | Building or modifying TUI screens, layouts, and primitives. |
| `exploring-the-wizard` | Running/driving/exploring the wizard headlessly (read_state/perform_action, TUI snapshots). |

## CLI command surface

23 changes: 22 additions & 1 deletion README.md
@@ -398,7 +398,7 @@ wizard --integration=nextjs
wizard --integration=nextjs --local-mcp
```

-## Testing
+### Testing

To run unit tests, run:

@@ -415,6 +415,27 @@ bin/test-e2e
E2E tests are a bit more complicated to create and adjust due to their mocked
LLM calls. See the `e2e-tests/README.md` for more information.

#### Explore with an agent

You can hand the wizard to an AI agent and have it drive the real flow itself —
deciding each screen and snapshotting the TUI to see what happened. The agent
drives through the `wizard-ci` MCP tools (`open_app` / `read_state` /
`perform_action` / `render_screen` / `run_agent`), which are registered in this
repo's `.mcp.json` and bound in every session here — approve `wizard-ci` the first
time you're prompted. The how-to is the `exploring-the-wizard` skill
(`.claude/skills/exploring-the-wizard/SKILL.md`), which an agent discovers
automatically.

Example prompt — explore against
[open-saas](https://github.com/wasp-lang/open-saas):

> Explore the PostHog wizard against open-saas, following the
> `exploring-the-wizard` skill. Ask me for my phx key file path, clone
> `https://github.com/wasp-lang/open-saas` into a throwaway `/tmp` copy, then use
> the `wizard-ci` MCP tools to open it and drive the whole flow — deciding each
> screen yourself and snapshotting key moments — and tell me what it did and
> anything that broke.

## Publishing your tool

To make your version of a tool usable with a one-line `npx` command:
87 changes: 87 additions & 0 deletions e2e-harness/ARCHITECTURE.md
@@ -0,0 +1,87 @@
# e2e-harness — Headless e2e Control Plane

How an agent (or CI) drives a **real** wizard run end-to-end — the **real TUI**,
no browser, no keystrokes — and captures what it rendered. Both e2e routes share
one idea: run the real `startTUI` (the real ink render) and drive its store by
**state manipulation**, then capture the real rendered screen from a PTY.

> If you're an agent that just wants to run and explore the wizard, use the
> `exploring-the-wizard` skill
> ([`.claude/skills/exploring-the-wizard/SKILL.md`](../.claude/skills/exploring-the-wizard/SKILL.md)).
> This doc is the _how it works_ underneath.

## The pieces

This whole harness lives in `e2e-harness/` at the repo root — deliberately OUT of
`src/` so none of it is part of the wizard's production source (nothing in `src/`
imports it; the tsdown bundle never includes it).

```
e2e-harness/
wizard-ci-driver.ts WizardCiDriver — read_state / perform_action over the store
action-registry.ts screen → the actions legal on it (+ NO_ACTION_SCREENS)
e2e-profile.ts WizardE2eProfile + decideE2eAction — the scripted walk policy
profiles.ts per-program profiles + profileFor(programId)
tui-capture.ts run a command in a PTY (node-pty) + read its real screen (@xterm/headless)
scripts/
tui-host.no-jest.ts the real-TUI host: startTUI + WizardCiDriver, MODE=fixed | serve
tui-snapshots.no-jest.ts CI route: host(fixed) in a PTY → per-screen real-TUI snapshots
wizard-ci-mcp.no-jest.ts agent route: MCP server proxying host(serve)
```

The driver reads and mutates the **real** `WizardStore` that the TUI renders from:
the router resolves the active screen from session state, every action goes
through a store setter, and the render is a pure projection of that state. So
manipulating the store makes the real TUI react — the driver and the renderer
share one store and never conflict; you never touch the TUI's input.
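
The relationship between store, router, and driver described above can be sketched in a few lines. Everything here is an illustrative stand-in: `Session`, `Store`, `Driver`, `resolveScreen`, and the two session flags are hypothetical names, not the wizard's real `WizardStore` or `WizardCiDriver`. Only the screen and action names come from this document.

```typescript
// Illustrative stand-ins only; the real store and driver are richer.
type Screen = "intro" | "health-check" | "auth";

interface Session {
  setupConfirmed: boolean;
  outageDismissed: boolean;
}

// Router: the active screen is a pure function of session state.
function resolveScreen(s: Session): Screen {
  if (!s.setupConfirmed) return "intro";
  if (!s.outageDismissed) return "health-check";
  return "auth";
}

// Store: state changes only through setters.
class Store {
  private session: Session = { setupConfirmed: false, outageDismissed: false };
  getSession(): Readonly<Session> {
    return this.session;
  }
  confirmSetup(): void {
    this.session = { ...this.session, setupConfirmed: true };
  }
  dismissOutage(): void {
    this.session = { ...this.session, outageDismissed: true };
  }
}

// Registry: the actions legal on each screen (auth exposes none).
const ACTIONS: Record<Screen, string[]> = {
  intro: ["confirm_setup"],
  "health-check": ["dismiss_outage"],
  auth: [],
};

class Driver {
  private store: Store;
  constructor(store: Store) {
    this.store = store;
  }
  // Re-derive the screen from state on every read; nothing is cached.
  readState(): { screen: Screen; actions: string[] } {
    const screen = resolveScreen(this.store.getSession());
    return { screen, actions: ACTIONS[screen] };
  }
  // Commit a decision through a store setter, then report the new screen.
  performAction(action: string): { screen: Screen; actions: string[] } {
    if (!this.readState().actions.includes(action)) {
      throw new Error(`illegal action: ${action}`);
    }
    if (action === "confirm_setup") this.store.confirmSetup();
    if (action === "dismiss_outage") this.store.dismissOutage();
    return this.readState();
  }
}
```

Because the driver never stores a "current screen" of its own, a renderer reading the same store would see the same screen the driver reports, which is the property the paragraph above relies on.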

## Auth without a browser

The real TUI runs `ci: true`, and auth is satisfied by **state manipulation**:
`getOrAskForProjectData({ ci: true, apiKey })` resolves the phx personal key into
credentials, and `store.setCredentials(...)` sets them — the same bearer path an
OAuth token takes, so the auth screen advances with no browser and no keystrokes.
(`run_agent` does the same bootstrap as part of the real integration.)

## The two routes

- **CI snapshots** — `tui-snapshots.no-jest.ts` spawns `tui-host` (`MODE=fixed`)
in a PTY. The host self-drives the fixed profile (`decideE2eAction`) through the
real agent run and signals each key moment; the parent writes the real rendered
screen to `SNAP_OUT/NN-<screen>.txt` (including the run screen's progression).
- **Agent** — `wizard-ci-mcp.no-jest.ts` is a stdio MCP server that spawns
`tui-host` (`MODE=serve`) and proxies: `read_state` / `perform_action` /
`run_agent` forward over a unix socket; `render_screen` returns the real
captured frame. The agent decides each screen itself.

## Things that bite

1. **Running inside an agent session.** Host env (`CLAUDECODE`, `ANTHROPIC_*`,
`CLAUDE_CODE_*`) makes the wizard's spawned agent defer auth to the host →
`apiKeySource: none` → 401. The harness strips these for the child. A plain CI
shell never has them.
2. **A project-scoped key needs its project id.** Pass the team's `--project-id`
(or `POSTHOG_WIZARD_PROJECT_ID`), or bootstrap 403s on project-data fetch.
3. **Never run on a real fixture.** Always a throwaway copy.
4. **`run_agent` is minutes long and creates real resources** (a dashboard +
insights) each run; the agent log is one shared file — never run two at once.
5. **node-pty's spawn-helper.** When the package is extracted without running its
build script (pnpm skips it), the prebuilt `spawn-helper` loses its execute
bit and `pty.spawn` fails with `posix_spawnp failed`. `tui-capture.ts` restores
it best-effort on each spawn.
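
Item 1 amounts to filtering the child's environment before spawning it. A minimal sketch, assuming only the variable names and prefixes listed there; `stripHostAgentEnv` and `HOST_AGENT_ENV` are hypothetical names, and the harness's own filter may cover more or differ in detail.

```typescript
// Patterns taken from item 1 above: CLAUDECODE, ANTHROPIC_*, CLAUDE_CODE_*.
const HOST_AGENT_ENV: RegExp[] = [/^CLAUDECODE$/, /^ANTHROPIC_/, /^CLAUDE_CODE_/];

// Return a copy of env without the host agent's variables, so a spawned
// child does not inherit them. The input object is not modified.
function stripHostAgentEnv(
  env: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const clean: Record<string, string | undefined> = {};
  for (const [key, value] of Object.entries(env)) {
    if (!HOST_AGENT_ENV.some((re) => re.test(key))) clean[key] = value;
  }
  return clean;
}
```

The result would be passed as the `env` option when spawning the child, for example `stripHostAgentEnv(process.env)`, so everything else (such as `PATH` or `POSTHOG_WIZARD_PROJECT_ID`) still reaches it.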

## Changing what the run does

Per-program UI choices live in the harness (`profiles.ts`, keyed by program id) —
not on the program config — so this machinery stays out of production source. Edit
the program's entry (typed by `WizardE2eProfile`); the host asks
`decideE2eAction(state, profile)` what to commit on each screen. The (screen →
decision) trace is snapshot-tested offline in `__tests__/` (`jest -u` to update).
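
The (screen → decision) trace can be pictured as a lookup applied to each screen in turn. This is a simplified sketch, not the real `decideE2eAction`: it ignores the profile entirely (how profile values turn into action params is not shown here), and `decideSketch`, `traceWalk`, and `SCREEN_ACTION` are hypothetical names. The screen and action names, and the `(external)` marker for `auth` and `run`, are taken from the snapshot in this PR.

```typescript
// Screens that expose no action and only advance via the real agent run.
const NO_ACTION_SCREENS = new Set<string>(["auth", "run"]);

// One action per screen, as it appears in the snapshot trace.
const SCREEN_ACTION: Record<string, string> = {
  intro: "confirm_setup",
  "health-check": "dismiss_outage",
  setup: "choose",
  outro: "dismiss_outro",
  mcp: "set_mcp_outcome",
  "slack-connect": "dismiss_slack",
  "keep-skills": "keep_skills",
};

interface TraceEntry {
  screen: string;
  action: string;
}

// Decide what to commit on a screen; fail loudly on an unknown screen
// rather than silently skipping it.
function decideSketch(screen: string): string {
  if (NO_ACTION_SCREENS.has(screen)) return "(external)";
  const action = SCREEN_ACTION[screen];
  if (action === undefined) throw new Error(`unclassified screen: ${screen}`);
  return action;
}

// Build the (screen → decision) trace for a sequence of screens.
function traceWalk(screens: string[]): TraceEntry[] {
  return screens.map((screen) => ({ screen, action: decideSketch(screen) }));
}
```

Throwing on an unclassified screen mirrors the kind of fix in this PR's history, where newly added screens had to be classified before the walk could pass through them.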

## Visual-regression snapshots (the workbench flow)

[wizard-workbench](https://github.com/PostHog/wizard-workbench) runs the CI route
for real-run visual regression: each test definition runs `tui-snapshots`, the
real-TUI screens are rasterized to a side-by-side baseline-vs-current review, and
run-to-run differences are surfaced for a human, not asserted away. See
`services/wizard-ci/` there.
93 changes: 93 additions & 0 deletions e2e-harness/__tests__/__snapshots__/e2e-flow-snapshot.test.ts.snap

Collaborator Author


This is a test of snapshotting, not a snapshot.

@@ -0,0 +1,93 @@
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`e2e flow snapshot — posthog-integration Next.js (with a setup question) walks a stable path 1`] = `
{
  "profile": {
    "ask": "first",
    "healthCheck": "dismiss",
    "mcp": "skip",
    "setup": "first",
    "skills": "delete",
    "slack": "skip",
  },
  "program": "posthog-integration",
  "trace": [
    {
      "action": "confirm_setup",
      "screen": "intro",
    },
    {
      "action": "dismiss_outage",
      "screen": "health-check",
    },
    {
      "action": "choose",
      "screen": "setup",
    },
    {
      "action": "(external)",
      "screen": "auth",
    },
    {
      "action": "(external)",
      "screen": "run",
    },
    {
      "action": "dismiss_outro",
      "screen": "outro",
    },
    {
      "action": "set_mcp_outcome",
      "screen": "mcp",
    },
    {
      "action": "dismiss_slack",
      "screen": "slack-connect",
    },
    {
      "action": "keep_skills",
      "screen": "keep-skills",
    },
  ],
}
`;

exports[`e2e flow snapshot — posthog-integration Node (no setup question) walks a stable path 1`] = `
{
  "program": "posthog-integration",
  "trace": [
    {
      "action": "confirm_setup",
      "screen": "intro",
    },
    {
      "action": "dismiss_outage",
      "screen": "health-check",
    },
    {
      "action": "(external)",
      "screen": "auth",
    },
    {
      "action": "(external)",
      "screen": "run",
    },
    {
      "action": "dismiss_outro",
      "screen": "outro",
    },
    {
      "action": "set_mcp_outcome",
      "screen": "mcp",
    },
    {
      "action": "dismiss_slack",
      "screen": "slack-connect",
    },
    {
      "action": "keep_skills",
      "screen": "keep-skills",
    },
  ],
}
`;