feat(e2e-harness): drive and snapshot the real wizard TUI #702
Open: gewenyu99 wants to merge 41 commits into main from e2e-control-plane
Commits (41, all by gewenyu99)
ec8da57 feat(ci-driver): wizard-ci-tools control plane for headless e2e + rec…
cd437a7 refactor(posthog-integration): extract e2e profile to its own file
1c2dca8 docs(ci-driver): point the agent guide at the extracted e2e profile file
f5e6a65 test(ci-driver): add offline sample-recording generator for replay
17a8777 docs(scripts): add README indexing the ci-driver/e2e scripts
b0f4e53 Merge remote-tracking branch 'origin/main' into e2e-control-plane
c8eacca fix(ci-driver): classify warehouse-intro + self-driving-intro screens
b1a43ee docs(ci-driver): rename agent guide to ARCHITECTURE.md, strip interna…
61527e8 feat(ci-driver): render a recording to per-frame TUI snapshots
a55ccbd refactor: move e2e/recording harness out of prod src into e2e-harness/
e44fe55 docs(e2e-harness): cross-link the workbench visual-snapshots flow + env
2a325ef docs(posthog-integration): describe the e2e test path next to the pro…
e7484ee refactor(e2e): make the test definition a readable JSON the harness l…
18853dd docs(e2e-harness): instrument the perform_action trace across the hops
fcb2e54 docs(e2e-harness): state the never-ships-to-prod guarantee in each mo…
0e63a30 Merge branch 'main' into e2e-control-plane
cb439ff revert: drop the explanatory comments from source
6e88d7d chore(scripts): remove demo/proof scaffolding from the PR
5f214b7 docs(e2e-harness): add the agent exploration runbook
5491e27 docs: move agent-exploration to wizard README, trim comments to curre…
a919c3d feat(skills): promote the agent-exploration runbook to a skill
9d49433 feat(e2e-harness): live MCP server so an agent drives the wizard turn…
59ffdc1 chore: align zod spec to ^3.25.76 (matches the pi stack #701)
217e717 refactor(e2e-harness): drop redundant list_actions from the MCP server
9b870f3 docs: revert prettier reflow of README + AGENTS, keep only the real c…
119dba0 docs: fix dead link — point ARCHITECTURE at the skill, not the delete…
e7129c4 fix(skill): correct the driving instructions — MCP tools bind at sess…
6dd01e5 fix(e2e-harness): make the MCP server actually loadable; skill leads …
f702da7 feat(e2e-harness): bind wizard-ci as committed MCP tools so an agent …
d50da8d docs(e2e-harness): drop "monorepo" wording from open_app guidance
332d9d6 docs(e2e-harness): drop the app-dir hand-holding from open_app
6dde710 fix(e2e-harness): point agents to run_agent on the auth screen
51b9f4c fix(e2e-harness): make run_agent non-blocking so the MCP server survi…
94ac289 refactor(e2e-harness): drive + snapshot the real TUI; one primitive f…
e120b2a docs: drop stale e2e-full-run reference from a comment
e87a322 fix(tui-snapshots): capture the run screen's progression + every tran…
3bc6b1a fix(tui-snapshots): drop the throttle; don't hang at exit on the park…
fd5599f refactor(tui-snapshots): always run the agent; drop the RUN_AGENT toggle
4ed8691 build: allow node-pty's build script (compiles pty.node on Linux CI)
72a1051 fix(tui-capture): strip CI markers so ink renders the real TUI
426da5a Merge remote-tracking branch 'origin/main' into e2e-control-plane
.claude/skills/exploring-the-wizard/SKILL.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| --- | ||
| name: exploring-the-wizard | ||
| description: Run, drive, and explore the PostHog wizard headlessly against an app — boot it on the app and decide each screen yourself over the wizard-ci MCP tools (open_app / read_state / perform_action / run_agent), snapshotting the TUI to see what happened. Use to test or explore the wizard end-to-end. | ||
| compatibility: Designed for Claude Code working on the PostHog wizard codebase. | ||
| metadata: | ||
| author: posthog | ||
| version: "3.0" | ||
| --- | ||
|
|
||
| # Exploring the wizard as an agent | ||
|
|
||
| Drive a real wizard run yourself: boot it on an app, read each screen, decide, act, | ||
| snapshot. You do this through the **`wizard-ci` MCP tools**, which are already bound | ||
| in this repo (registered in `.mcp.json`). For _how_ it works underneath, read | ||
| [`e2e-harness/ARCHITECTURE.md`](../../../e2e-harness/ARCHITECTURE.md). | ||
|
|
||
| If you don't see the `wizard-ci` tools (`open_app`, `read_state`, …), the server | ||
| isn't approved yet — ask the user to approve `wizard-ci`, then retry. | ||
|
|
||
| ## Set up | ||
|
|
||
| Ask the user for the absolute path to their PostHog key file — e.g. "What's the | ||
| path to your phx key file?" — plus the project id and region if you don't have | ||
| them. Clone or copy the target app to a **throwaway `/tmp` copy** (never a real | ||
| fixture). Never print or commit the key. | ||
|
|
||
| ## Drive | ||
|
|
||
| 1. **`open_app({ appDir, keyFile, projectId, region })`** — boots a live wizard on | ||
| the app and returns the first screen. `appDir` is the throwaway copy. | ||
| 2. **`read_state`** — current screen, run phase, secret-free session, tasks, and | ||
| the actions legal right now. Call after every move. | ||
| 3. **`perform_action({ action, params? })`**: commits a decision. Actions include | ||
| `confirm_setup`, `dismiss_outage`, `choose` (answers a setup question, e.g. | ||
| `{ key, value }`), `dismiss_outro`, `set_mcp_outcome`, `dismiss_slack`, and `keep_skills`. | ||
| 4. **`render_screen`** — render the current TUI to ANSI so you can _see_ it. | ||
| 5. **`run_agent`** — kicks off the **real integration** in the background and | ||
| returns immediately; it bootstraps credentials, so it's what advances `auth` | ||
| and `run`. Then **poll `read_state`** — `runPhase` goes `running → completed` | ||
| and the screen advances to `outro`. | ||
|
|
||
| A typical walk: | ||
|
|
||
| ``` | ||
| open_app → intro → perform_action confirm_setup | ||
| read_state → health-check → perform_action dismiss_outage | ||
| read_state → auth → run_agent (returns at once; integration runs in background) | ||
| read_state (poll) → runPhase running → completed, screen → outro | ||
| outro → perform_action dismiss_outro → … → keep_skills | ||
| ``` | ||
|
|
||
| Snapshot with `render_screen` at each key moment so you (and the user) can see what | ||
| the wizard showed. | ||
|
|
||
| ## Key facts | ||
|
|
||
| - **State → screen.** You never navigate; you commit a decision (an action) and the | ||
| router re-derives the active screen. Name actions, not keys. | ||
| - **`auth` and `run` advance only via `run_agent`.** They expose no action and | ||
| don't self-advance. `run_agent` returns immediately and runs the integration in | ||
| the background — poll `read_state` for `runPhase` (`running → completed`). | ||
| Everything else is an instant commit. | ||
| - **`run_agent` creates real PostHog resources** (a dashboard + insights) in the | ||
| project; each run duplicates them. | ||
| - **A green run ≠ a valid integration.** `runPhase=completed` means the flow | ||
| finished, not that the wizard understood the framework (e.g. it'll treat a Wasp | ||
| app as react-router). Read what it actually changed. | ||
| - **None of this ships.** The harness lives in `e2e-harness/`, out of `src/`. |
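The drive loop in the skill above can be read as a small decision function: given what `read_state` returned, pick the next tool call. What follows is a minimal sketch under stated assumptions, not the harness's own code. The `WizardState` shape, the `idle` phase value, and the take-the-first-legal-action policy are all illustrative; only `runPhase`, the screen names, and the tool names come from the skill text, and a real agent would choose among the legal actions itself.

```typescript
// Hypothetical shape of a read_state result (assumed for illustration).
type RunPhase = "idle" | "running" | "completed";

interface WizardState {
  screen: string;
  runPhase: RunPhase;
  actions: string[]; // actions legal on the current screen
}

type Move =
  | { tool: "perform_action"; action: string }
  | { tool: "run_agent" }
  | { tool: "read_state" } // keep polling
  | { tool: "done" };

// Screens that expose no action and only advance via run_agent.
const EXTERNAL_SCREENS = new Set(["auth", "run"]);

function nextMove(state: WizardState): Move {
  if (EXTERNAL_SCREENS.has(state.screen)) {
    // Kick off the integration once, then poll until the screen moves on.
    return state.runPhase === "idle" ? { tool: "run_agent" } : { tool: "read_state" };
  }
  // Simplification: commit the first legal action; stop when none is left.
  const [first] = state.actions;
  return first ? { tool: "perform_action", action: first } : { tool: "done" };
}
```

Keeping the decision pure makes it easy to check offline; the actual tool calls (and a `render_screen` snapshot after each one) would wrap around it.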
.mcp.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| { | ||
| "mcpServers": { | ||
| "wizard-ci": { | ||
| "command": "npx", | ||
| "args": ["tsx", "scripts/wizard-ci-mcp.no-jest.ts"] | ||
| } | ||
| } | ||
| } |
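For orientation, the registration above boils down to one command line. The sketch below shows how a client could turn such an entry into an argv array. `resolveServerCommand` and the two interfaces are hypothetical names introduced for illustration; this is not how any particular MCP client loads its configuration.

```typescript
// Assumed minimal shape of the config shown in the diff above.
interface McpServerEntry {
  command: string;
  args?: string[];
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

// Hypothetical helper: look up a registered server and return its argv.
function resolveServerCommand(configJson: string, name: string): string[] {
  const config = JSON.parse(configJson) as McpConfig;
  const entry = config.mcpServers?.[name];
  if (!entry) throw new Error(`MCP server "${name}" is not registered`);
  return [entry.command, ...(entry.args ?? [])];
}

const exampleConfig = `{
  "mcpServers": {
    "wizard-ci": {
      "command": "npx",
      "args": ["tsx", "scripts/wizard-ci-mcp.no-jest.ts"]
    }
  }
}`;

// argv is ["npx", "tsx", "scripts/wizard-ci-mcp.no-jest.ts"]
const argv = resolveServerCommand(exampleConfig, "wizard-ci");
```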
e2e-harness/ARCHITECTURE.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| # e2e-harness — Headless e2e Control Plane | ||
|
|
||
| How an agent (or CI) drives a **real** wizard run end-to-end — the **real TUI**, | ||
| no browser, no keystrokes — and captures what it rendered. Both e2e routes share | ||
| one idea: run the real `startTUI` (the real ink render) and drive its store by | ||
| **state manipulation**, then capture the real rendered screen from a PTY. | ||
|
|
||
| > If you're an agent that just wants to run and explore the wizard, use the | ||
| > `exploring-the-wizard` skill | ||
| > ([`.claude/skills/exploring-the-wizard/SKILL.md`](../.claude/skills/exploring-the-wizard/SKILL.md)). | ||
| > This doc is the _how it works_ underneath. | ||
|
|
||
| ## The pieces | ||
|
|
||
| This whole harness lives in `e2e-harness/` at the repo root — deliberately OUT of | ||
| `src/` so none of it is part of the wizard's production source (nothing in `src/` | ||
| imports it; the tsdown bundle never includes it). | ||
|
|
||
| ``` | ||
| e2e-harness/ | ||
| wizard-ci-driver.ts WizardCiDriver — read_state / perform_action over the store | ||
| action-registry.ts screen → the actions legal on it (+ NO_ACTION_SCREENS) | ||
| e2e-profile.ts WizardE2eProfile + decideE2eAction — the scripted walk policy | ||
| profiles.ts per-program profiles + profileFor(programId) | ||
| tui-capture.ts run a command in a PTY (node-pty) + read its real screen (@xterm/headless) | ||
| scripts/ | ||
| tui-host.no-jest.ts the real-TUI host: startTUI + WizardCiDriver, MODE=fixed | serve | ||
| tui-snapshots.no-jest.ts CI route: host(fixed) in a PTY → per-screen real-TUI snapshots | ||
| wizard-ci-mcp.no-jest.ts agent route: MCP server proxying host(serve) | ||
| ``` | ||
|
|
||
| The driver reads and mutates the **real** `WizardStore` that the TUI renders from: | ||
| the router resolves the active screen from session state, every action goes | ||
| through a store setter, and the render is a pure projection of that state. So | ||
| manipulating the store makes the real TUI react — the driver and the renderer | ||
| share one store and never conflict; you never touch the TUI's input. | ||
|
|
||
| ## Auth without a browser | ||
|
|
||
| The real TUI runs with `ci: true`, and auth is satisfied by **state manipulation**: | ||
| `getOrAskForProjectData({ ci: true, apiKey })` resolves the phx personal key into | ||
| credentials, and `store.setCredentials(...)` sets them — the same bearer path an | ||
| OAuth token takes, so the auth screen advances with no browser and no keystrokes. | ||
| (`run_agent` does the same bootstrap as part of the real integration.) | ||
|
|
||
| ## The two routes | ||
|
|
||
| - **CI snapshots** — `tui-snapshots.no-jest.ts` spawns `tui-host` (`MODE=fixed`) | ||
| in a PTY. The host self-drives the fixed profile (`decideE2eAction`) through the | ||
| real agent run and signals each key moment; the parent writes the real rendered | ||
| screen to `SNAP_OUT/NN-<screen>.txt` (including the run screen's progression). | ||
| - **Agent** — `wizard-ci-mcp.no-jest.ts` is a stdio MCP server that spawns | ||
| `tui-host` (`MODE=serve`) and proxies: `read_state` / `perform_action` / | ||
| `run_agent` forward over a unix socket; `render_screen` returns the real | ||
| captured frame. The agent decides each screen itself. | ||
|
|
||
| ## Things that bite | ||
|
|
||
| 1. **Running inside an agent session.** Host env (`CLAUDECODE`, `ANTHROPIC_*`, | ||
| `CLAUDE_CODE_*`) makes the wizard's spawned agent defer auth to the host → | ||
| `apiKeySource: none` → 401. The harness strips these for the child. A plain CI | ||
| shell never has them. | ||
| 2. **A project-scoped key needs its project id.** Pass the team's `--project-id` | ||
| (or `POSTHOG_WIZARD_PROJECT_ID`), or bootstrap 403s on project-data fetch. | ||
| 3. **Never run on a real fixture.** Always a throwaway copy. | ||
| 4. **`run_agent` is minutes long and creates real resources** (a dashboard + | ||
| insights) each run; the agent log is one shared file — never run two at once. | ||
| 5. **node-pty's spawn-helper.** When the package is extracted without running its | ||
| build script (pnpm skips it), the prebuilt `spawn-helper` loses its execute | ||
| bit and `pty.spawn` fails with `posix_spawnp failed`. `tui-capture.ts` restores | ||
| it best-effort on each spawn. | ||
|
|
||
| ## Changing what the run does | ||
|
|
||
| Per-program UI choices live in the harness (`profiles.ts`, keyed by program id) — | ||
| not on the program config — so this machinery stays out of production source. Edit | ||
| the program's entry (typed by `WizardE2eProfile`); the host asks | ||
| `decideE2eAction(state, profile)` what to commit on each screen. The (screen → | ||
| decision) trace is snapshot-tested offline in `__tests__/` (`jest -u` to update). | ||
|
|
||
| ## Visual-regression snapshots (the workbench flow) | ||
|
|
||
| [wizard-workbench](https://github.com/PostHog/wizard-workbench) runs the CI route | ||
| for real-run visual regression: each test definition runs `tui-snapshots`, the | ||
| real-TUI screens are rasterized to a side-by-side baseline-vs-current review, and | ||
| run-to-run differences are surfaced for a human, not asserted away. See | ||
| `services/wizard-ci/` there. |
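Item 1 under "Things that bite" says the harness strips the host agent's environment variables before spawning the child. A minimal sketch of that filtering, assuming the three name patterns listed there are the complete set; `stripHostAgentEnv` is a hypothetical name, and the harness's real implementation may differ.

```typescript
// Variable names and prefixes taken from "Things that bite", item 1.
const HOST_AGENT_ENV = [/^CLAUDECODE$/, /^ANTHROPIC_/, /^CLAUDE_CODE_/];

// Hypothetical helper: return a copy of env without host-agent variables,
// so the spawned wizard does not defer auth to the host session.
function stripHostAgentEnv(
  env: Record<string, string | undefined>
): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue; // skip unset entries
    if (HOST_AGENT_ENV.some((pattern) => pattern.test(key))) continue;
    clean[key] = value;
  }
  return clean;
}
```

The result would be passed as the `env` option when spawning the child (for example `stripHostAgentEnv(process.env)`), leaving the parent's own environment untouched.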
93 changes: 93 additions & 0 deletions
e2e-harness/__tests__/__snapshots__/e2e-flow-snapshot.test.ts.snap
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,93 @@ | ||
| // Jest Snapshot v1, https://goo.gl/fbAQLP | ||
|
|
||
| exports[`e2e flow snapshot — posthog-integration Next.js (with a setup question) walks a stable path 1`] = ` | ||
| { | ||
| "profile": { | ||
| "ask": "first", | ||
| "healthCheck": "dismiss", | ||
| "mcp": "skip", | ||
| "setup": "first", | ||
| "skills": "delete", | ||
| "slack": "skip", | ||
| }, | ||
| "program": "posthog-integration", | ||
| "trace": [ | ||
| { | ||
| "action": "confirm_setup", | ||
| "screen": "intro", | ||
| }, | ||
| { | ||
| "action": "dismiss_outage", | ||
| "screen": "health-check", | ||
| }, | ||
| { | ||
| "action": "choose", | ||
| "screen": "setup", | ||
| }, | ||
| { | ||
| "action": "(external)", | ||
| "screen": "auth", | ||
| }, | ||
| { | ||
| "action": "(external)", | ||
| "screen": "run", | ||
| }, | ||
| { | ||
| "action": "dismiss_outro", | ||
| "screen": "outro", | ||
| }, | ||
| { | ||
| "action": "set_mcp_outcome", | ||
| "screen": "mcp", | ||
| }, | ||
| { | ||
| "action": "dismiss_slack", | ||
| "screen": "slack-connect", | ||
| }, | ||
| { | ||
| "action": "keep_skills", | ||
| "screen": "keep-skills", | ||
| }, | ||
| ], | ||
| } | ||
| `; | ||
|
|
||
| exports[`e2e flow snapshot — posthog-integration Node (no setup question) walks a stable path 1`] = ` | ||
| { | ||
| "program": "posthog-integration", | ||
| "trace": [ | ||
| { | ||
| "action": "confirm_setup", | ||
| "screen": "intro", | ||
| }, | ||
| { | ||
| "action": "dismiss_outage", | ||
| "screen": "health-check", | ||
| }, | ||
| { | ||
| "action": "(external)", | ||
| "screen": "auth", | ||
| }, | ||
| { | ||
| "action": "(external)", | ||
| "screen": "run", | ||
| }, | ||
| { | ||
| "action": "dismiss_outro", | ||
| "screen": "outro", | ||
| }, | ||
| { | ||
| "action": "set_mcp_outcome", | ||
| "screen": "mcp", | ||
| }, | ||
| { | ||
| "action": "dismiss_slack", | ||
| "screen": "slack-connect", | ||
| }, | ||
| { | ||
| "action": "keep_skills", | ||
| "screen": "keep-skills", | ||
| }, | ||
| ], | ||
| } | ||
| `; |
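The traces above pair each screen with the action committed on it. As a toy illustration of how such a trace can fall out of "state → screen" routing, the sketch below derives the active screen from session flags and lets each action flip exactly one flag. Every field name in `Session`, the router order, and the `STEPS` table are assumptions chosen to mirror the Node snapshot; this is not the harness's `WizardStore`, router, or `decideE2eAction`.

```typescript
// Hypothetical session flags; the real store's fields are not shown in the PR.
interface Session {
  setupConfirmed: boolean;
  outageDismissed: boolean;
  hasCredentials: boolean;
  runCompleted: boolean;
  outroDismissed: boolean;
  mcpOutcome: string | null;
  slackDismissed: boolean;
  skillsDecided: boolean;
}

// Router: the active screen is a pure function of state.
function activeScreen(s: Session): string | null {
  if (!s.setupConfirmed) return "intro";
  if (!s.outageDismissed) return "health-check";
  if (!s.hasCredentials) return "auth";
  if (!s.runCompleted) return "run";
  if (!s.outroDismissed) return "outro";
  if (s.mcpOutcome === null) return "mcp";
  if (!s.slackDismissed) return "slack-connect";
  if (!s.skillsDecided) return "keep-skills";
  return null; // nothing left to show
}

type Step = { action: string; commit: (s: Session) => Session };

// Per screen: the action recorded in the trace and the state change it commits.
const STEPS: Record<string, Step> = {
  intro: { action: "confirm_setup", commit: (s) => ({ ...s, setupConfirmed: true }) },
  "health-check": { action: "dismiss_outage", commit: (s) => ({ ...s, outageDismissed: true }) },
  auth: { action: "(external)", commit: (s) => ({ ...s, hasCredentials: true }) },
  run: { action: "(external)", commit: (s) => ({ ...s, runCompleted: true }) },
  outro: { action: "dismiss_outro", commit: (s) => ({ ...s, outroDismissed: true }) },
  mcp: { action: "set_mcp_outcome", commit: (s) => ({ ...s, mcpOutcome: "skip" }) },
  "slack-connect": { action: "dismiss_slack", commit: (s) => ({ ...s, slackDismissed: true }) },
  "keep-skills": { action: "keep_skills", commit: (s) => ({ ...s, skillsDecided: true }) },
};

const INITIAL: Session = {
  setupConfirmed: false,
  outageDismissed: false,
  hasCredentials: false,
  runCompleted: false,
  outroDismissed: false,
  mcpOutcome: null,
  slackDismissed: false,
  skillsDecided: false,
};

// Walk: ask the router for the screen, commit its step, repeat until done.
function walk(start: Session): { screen: string; action: string }[] {
  const trace: { screen: string; action: string }[] = [];
  let state = start;
  for (let screen = activeScreen(state); screen !== null; screen = activeScreen(state)) {
    const step = STEPS[screen];
    if (!step) throw new Error(`no step defined for screen "${screen}"`);
    trace.push({ screen, action: step.action });
    state = step.commit(state);
  }
  return trace;
}
```

Nothing in `walk` navigates; it only commits state changes and re-asks the router, which is why a serialized trace like the one above can be compared run to run.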
This is a test of snapshotting, not a snapshot.