diff --git a/.claude/skills/exploring-the-wizard/SKILL.md b/.claude/skills/exploring-the-wizard/SKILL.md new file mode 100644 index 00000000..447c15a9 --- /dev/null +++ b/.claude/skills/exploring-the-wizard/SKILL.md @@ -0,0 +1,68 @@ +--- +name: exploring-the-wizard +description: Run, drive, and explore the PostHog wizard headlessly against an app — boot it on the app and decide each screen yourself over the wizard-ci MCP tools (open_app / read_state / perform_action / run_agent), snapshotting the TUI to see what happened. Use to test or explore the wizard end-to-end. +compatibility: Designed for Claude Code working on the PostHog wizard codebase. +metadata: + author: posthog + version: "3.0" +--- + +# Exploring the wizard as an agent + +Drive a real wizard run yourself: boot it on an app, read each screen, decide, act, +snapshot. You do this through the **`wizard-ci` MCP tools**, which are already bound +in this repo (registered in `.mcp.json`). For _how_ it works underneath, read +[`e2e-harness/ARCHITECTURE.md`](../../../e2e-harness/ARCHITECTURE.md). + +If you don't see the `wizard-ci` tools (`open_app`, `read_state`, …), the server +isn't approved yet — ask the user to approve `wizard-ci`, then retry. + +## Set up + +Ask the user for the absolute path to their PostHog key file — e.g. "What's the +path to your phx key file?" — plus the project id and region if you don't have +them. Clone or copy the target app to a **throwaway `/tmp` copy** (never a real +fixture). Never print or commit the key. + +## Drive + +1. **`open_app({ appDir, keyFile, projectId, region })`** — boots a live wizard on + the app and returns the first screen. `appDir` is the throwaway copy. +2. **`read_state`** — current screen, run phase, secret-free session, tasks, and + the actions legal right now. Call after every move. +3. **`perform_action({ action, params? })`** — commit a decision: `confirm_setup`, + `dismiss_outage`, `choose` (a setup question, e.g. 
`{ key, value }`), + `set_mcp_outcome`, `dismiss_slack`, `keep_skills`. +4. **`render_screen`** — render the current TUI to ANSI so you can _see_ it. +5. **`run_agent`** — kicks off the **real integration** in the background and + returns immediately; it bootstraps credentials, so it's what advances `auth` + and `run`. Then **poll `read_state`** — `runPhase` goes `running → completed` + and the screen advances to `outro`. + +A typical walk: + +``` +open_app → intro → perform_action confirm_setup +read_state → health-check → perform_action dismiss_outage +read_state → auth → run_agent (returns at once; integration runs in background) +read_state (poll) → runPhase running → completed, screen → outro +outro → perform_action dismiss_outro → … → keep_skills +``` + +Snapshot with `render_screen` at each key moment so you (and the user) can see what +the wizard showed. + +## Key facts + +- **State → screen.** You never navigate; you commit a decision (an action) and the + router re-derives the active screen. Name actions, not keys. +- **`auth` and `run` advance only via `run_agent`.** They expose no action and + don't self-advance. `run_agent` returns immediately and runs the integration in + the background — poll `read_state` for `runPhase` (`running → completed`). + Everything else is an instant commit. +- **`run_agent` creates real PostHog resources** (a dashboard + insights) in the + project; each run duplicates them. +- **A green run ≠ a valid integration.** `runPhase=completed` means the flow + finished, not that the wizard understood the framework (e.g. it'll treat a Wasp + app as react-router). Read what it actually changed. +- **None of this ships.** The harness lives in `e2e-harness/`, out of `src/`. 
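The "Key facts" above boil down to a small per-poll decision: `auth` and `run` advance only through `run_agent`, and everything else is an instant commit. A minimal sketch of that decision, assuming a simplified state shape: the `StateView` fields other than `currentScreen` and `runPhase`, the `"idle"` phase name, and the "take the first legal action" policy are illustrative assumptions, not the harness's actual API.

```typescript
// Hypothetical, simplified view of what read_state returns. The real shape
// lives in e2e-harness/wizard-ci-driver.ts and may differ.
type RunPhase = "idle" | "running" | "completed"; // "idle" is an assumed name

interface StateView {
  currentScreen: string;
  runPhase: RunPhase;
  actions: string[]; // assumed: ids of the actions legal right now
}

type NextMove =
  | { tool: "run_agent" }
  | { tool: "read_state" } // poll again
  | { tool: "perform_action"; action: string }
  | { tool: "stop" };

function nextMove(state: StateView): NextMove {
  const external =
    state.currentScreen === "auth" || state.currentScreen === "run";
  if (external) {
    // auth and run expose no action: start the integration once, then poll.
    return state.runPhase === "idle"
      ? { tool: "run_agent" }
      : { tool: "read_state" };
  }
  // Placeholder policy: commit the first legal action. A real agent would
  // read the screen and choose deliberately.
  const [first] = state.actions;
  return first ? { tool: "perform_action", action: first } : { tool: "stop" };
}
```

An exploring agent would call something like `nextMove` after every `read_state`, replacing the placeholder policy with its own judgment about which legal action to commit.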
diff --git a/.mcp.json b/.mcp.json new file mode 100644 index 00000000..6693fbb9 --- /dev/null +++ b/.mcp.json @@ -0,0 +1,8 @@ +{ + "mcpServers": { + "wizard-ci": { + "command": "npx", + "args": ["tsx", "scripts/wizard-ci-mcp.no-jest.ts"] + } + } +} diff --git a/AGENTS.md b/AGENTS.md index 68189191..dc058a83 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -31,7 +31,7 @@ boundaries, screen resolution ## Skills available -Four skills live under `.claude/skills/`. Read `wizard-development` first for any structural change; then load the relevant procedural skill: +Five skills live under `.claude/skills/`. Read `wizard-development` first for any structural change; then load the relevant procedural skill: | Skill | When to use | |---|---| @@ -39,6 +39,7 @@ Four skills live under `.claude/skills/`. Read `wizard-development` first for an | `adding-framework-support` | Adding a new framework integration (e.g. Ruby on Rails, Go, Angular). | | `adding-skill-program` | Adding a new skill-based program (e.g. a new product feature setup). | | `ink-tui` | Building or modifying TUI screens, layouts, and primitives. | +| `exploring-the-wizard` | Running/driving/exploring the wizard headlessly (read_state/perform_action, TUI snapshots). | ## CLI command surface diff --git a/README.md b/README.md index f48ae2a3..5386d7f6 100644 --- a/README.md +++ b/README.md @@ -398,7 +398,7 @@ wizard --integration=nextjs wizard --integration=nextjs --local-mcp ``` -## Testing +### Testing To run unit tests, run: @@ -415,6 +415,27 @@ bin/test-e2e E2E tests are a bit more complicated to create and adjust due to to their mocked LLM calls. See the `e2e-tests/README.md` for more information. +#### Explore with an agent + +You can hand the wizard to an AI agent and have it drive the real flow itself — +deciding each screen and snapshotting the TUI to see what happened. 
The agent +drives through the `wizard-ci` MCP tools (`open_app` / `read_state` / +`perform_action` / `render_screen` / `run_agent`), which are registered in this +repo's `.mcp.json` and bound in every session here — approve `wizard-ci` the first +time you're prompted. The how-to is the `exploring-the-wizard` skill +(`.claude/skills/exploring-the-wizard/SKILL.md`), which an agent discovers +automatically. + +Example prompt — explore against +[open-saas](https://github.com/wasp-lang/open-saas): + +> Explore the PostHog wizard against open-saas, following the +> `exploring-the-wizard` skill. Ask me for my phx key file path, clone +> `https://github.com/wasp-lang/open-saas` into a throwaway `/tmp` copy, then use +> the `wizard-ci` MCP tools to open it and drive the whole flow — deciding each +> screen yourself and snapshotting key moments — and tell me what it did and +> anything that broke. + ## Publishing your tool To make your version of a tool usable with a one-line `npx` command: diff --git a/e2e-harness/ARCHITECTURE.md b/e2e-harness/ARCHITECTURE.md new file mode 100644 index 00000000..0317ccfb --- /dev/null +++ b/e2e-harness/ARCHITECTURE.md @@ -0,0 +1,87 @@ +# e2e-harness — Headless e2e Control Plane + +How an agent (or CI) drives a **real** wizard run end-to-end — the **real TUI**, +no browser, no keystrokes — and captures what it rendered. Both e2e routes share +one idea: run the real `startTUI` (the real ink render) and drive its store by +**state manipulation**, then capture the real rendered screen from a PTY. + +> If you're an agent that just wants to run and explore the wizard, use the +> `exploring-the-wizard` skill +> ([`.claude/skills/exploring-the-wizard/SKILL.md`](../.claude/skills/exploring-the-wizard/SKILL.md)). +> This doc is the _how it works_ underneath. 
+ +## The pieces + +This whole harness lives in `e2e-harness/` at the repo root — deliberately OUT of +`src/` so none of it is part of the wizard's production source (nothing in `src/` +imports it; the tsdown bundle never includes it). + +``` +e2e-harness/ + wizard-ci-driver.ts WizardCiDriver — read_state / perform_action over the store + action-registry.ts screen → the actions legal on it (+ NO_ACTION_SCREENS) + e2e-profile.ts WizardE2eProfile + decideE2eAction — the scripted walk policy + profiles.ts per-program profiles + profileFor(programId) + tui-capture.ts run a command in a PTY (node-pty) + read its real screen (@xterm/headless) +scripts/ + tui-host.no-jest.ts the real-TUI host: startTUI + WizardCiDriver, MODE=fixed | serve + tui-snapshots.no-jest.ts CI route: host(fixed) in a PTY → per-screen real-TUI snapshots + wizard-ci-mcp.no-jest.ts agent route: MCP server proxying host(serve) +``` + +The driver reads and mutates the **real** `WizardStore` that the TUI renders from: +the router resolves the active screen from session state, every action goes +through a store setter, and the render is a pure projection of that state. So +manipulating the store makes the real TUI react — the driver and the renderer +share one store and never conflict; you never touch the TUI's input. + +## Auth without a browser + +The real TUI runs `ci: true`, and auth is satisfied by **state manipulation**: +`getOrAskForProjectData({ ci: true, apiKey })` resolves the phx personal key into +credentials, and `store.setCredentials(...)` sets them — the same bearer path an +OAuth token takes, so the auth screen advances with no browser and no keystrokes. +(`run_agent` does the same bootstrap as part of the real integration.) + +## The two routes + +- **CI snapshots** — `tui-snapshots.no-jest.ts` spawns `tui-host` (`MODE=fixed`) + in a PTY. 
The host self-drives the fixed profile (`decideE2eAction`) through the + real agent run and signals each key moment; the parent writes the real rendered + screen to `SNAP_OUT/NN-.txt` (including the run screen's progression). +- **Agent** — `wizard-ci-mcp.no-jest.ts` is a stdio MCP server that spawns + `tui-host` (`MODE=serve`) and proxies: `read_state` / `perform_action` / + `run_agent` forward over a unix socket; `render_screen` returns the real + captured frame. The agent decides each screen itself. + +## Things that bite + +1. **Running inside an agent session.** Host env (`CLAUDECODE`, `ANTHROPIC_*`, + `CLAUDE_CODE_*`) makes the wizard's spawned agent defer auth to the host → + `apiKeySource: none` → 401. The harness strips these for the child. A plain CI + shell never has them. +2. **A project-scoped key needs its project id.** Pass the team's `--project-id` + (or `POSTHOG_WIZARD_PROJECT_ID`), or bootstrap 403s on project-data fetch. +3. **Never run on a real fixture.** Always a throwaway copy. +4. **`run_agent` is minutes long and creates real resources** (a dashboard + + insights) each run; the agent log is one shared file — never run two at once. +5. **node-pty's spawn-helper.** When the package is extracted without running its + build script (pnpm skips it), the prebuilt `spawn-helper` loses its execute + bit and `pty.spawn` fails with `posix_spawnp failed`. `tui-capture.ts` restores + it best-effort on each spawn. + +## Changing what the run does + +Per-program UI choices live in the harness (`profiles.ts`, keyed by program id) — +not on the program config — so this machinery stays out of production source. Edit +the program's entry (typed by `WizardE2eProfile`); the host asks +`decideE2eAction(state, profile)` what to commit on each screen. The (screen → +decision) trace is snapshot-tested offline in `__tests__/` (`jest -u` to update). 
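The profile-to-decision mapping described in "Changing what the run does" can be pictured as a pure function over the current screen. This is a reduced sketch, not the real `decideE2eAction`: `MiniProfile`, `MiniState`, `MiniDecision`, and `sketchDecide` are hypothetical stand-ins covering only a few screens, and the `kept` param derived from the profile is an assumption.

```typescript
// Reduced, illustrative stand-ins for WizardE2eProfile / CiState / E2eDecision.
interface MiniProfile {
  setup: "first" | "last";
  mcp: "skip" | "install";
  skills: "keep" | "delete";
}

interface MiniState {
  currentScreen: string;
  setupQuestions: { key: string; options: { value: string }[] }[];
}

interface MiniDecision {
  action?: { id: string; params?: Record<string, unknown> };
  wait?: boolean;
  done?: boolean;
}

function sketchDecide(state: MiniState, profile: MiniProfile): MiniDecision {
  switch (state.currentScreen) {
    case "intro":
      return { action: { id: "confirm_setup" } };
    case "setup": {
      const question = state.setupQuestions[0];
      if (!question) return { wait: true };
      const options = question.options;
      const option =
        profile.setup === "first" ? options[0] : options[options.length - 1];
      if (!option) return { wait: true };
      return {
        action: {
          id: "choose",
          params: { key: question.key, value: option.value },
        },
      };
    }
    case "mcp": {
      const outcome = profile.mcp === "install" ? "installed" : "skipped";
      return { action: { id: "set_mcp_outcome", params: { outcome } } };
    }
    case "keep-skills":
      // Assumption: the terminal commit carries the profile's keep/delete choice.
      return {
        action: { id: "keep_skills", params: { kept: profile.skills === "keep" } },
        done: true,
      };
    default:
      // auth, run, and other externally advanced screens: nothing to commit.
      return { wait: true };
  }
}
```

Keeping the function free of store and filesystem access is what lets the (screen → decision) trace be snapshot-tested offline.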
+ +## Visual-regression snapshots (the workbench flow) + +[wizard-workbench](https://github.com/PostHog/wizard-workbench) runs the CI route +for real-run visual regression: each test definition runs `tui-snapshots`, the +real-TUI screens are rasterized to a side-by-side baseline-vs-current review, and +run-to-run differences are surfaced for a human, not asserted away. See +`services/wizard-ci/` there. diff --git a/e2e-harness/__tests__/__snapshots__/e2e-flow-snapshot.test.ts.snap b/e2e-harness/__tests__/__snapshots__/e2e-flow-snapshot.test.ts.snap new file mode 100644 index 00000000..a7574afb --- /dev/null +++ b/e2e-harness/__tests__/__snapshots__/e2e-flow-snapshot.test.ts.snap @@ -0,0 +1,93 @@ +// Jest Snapshot v1, https://goo.gl/fbAQLP + +exports[`e2e flow snapshot — posthog-integration Next.js (with a setup question) walks a stable path 1`] = ` +{ + "profile": { + "ask": "first", + "healthCheck": "dismiss", + "mcp": "skip", + "setup": "first", + "skills": "delete", + "slack": "skip", + }, + "program": "posthog-integration", + "trace": [ + { + "action": "confirm_setup", + "screen": "intro", + }, + { + "action": "dismiss_outage", + "screen": "health-check", + }, + { + "action": "choose", + "screen": "setup", + }, + { + "action": "(external)", + "screen": "auth", + }, + { + "action": "(external)", + "screen": "run", + }, + { + "action": "dismiss_outro", + "screen": "outro", + }, + { + "action": "set_mcp_outcome", + "screen": "mcp", + }, + { + "action": "dismiss_slack", + "screen": "slack-connect", + }, + { + "action": "keep_skills", + "screen": "keep-skills", + }, + ], +} +`; + +exports[`e2e flow snapshot — posthog-integration Node (no setup question) walks a stable path 1`] = ` +{ + "program": "posthog-integration", + "trace": [ + { + "action": "confirm_setup", + "screen": "intro", + }, + { + "action": "dismiss_outage", + "screen": "health-check", + }, + { + "action": "(external)", + "screen": "auth", + }, + { + "action": "(external)", + "screen": "run", + }, + { 
+ "action": "dismiss_outro", + "screen": "outro", + }, + { + "action": "set_mcp_outcome", + "screen": "mcp", + }, + { + "action": "dismiss_slack", + "screen": "slack-connect", + }, + { + "action": "keep_skills", + "screen": "keep-skills", + }, + ], +} +`; diff --git a/e2e-harness/__tests__/e2e-flow-snapshot.test.ts b/e2e-harness/__tests__/e2e-flow-snapshot.test.ts new file mode 100644 index 00000000..ac549020 --- /dev/null +++ b/e2e-harness/__tests__/e2e-flow-snapshot.test.ts @@ -0,0 +1,96 @@ +/** + * E2E flow snapshot — the structured-state analog of Sarah's TUI ANSI + * screenshots (`scripts/cli-screenshots.mjs`, `__screenshots__/*.ans`). + * + * Her harness snapshots what a screen *renders*; this snapshots the + * deterministic control-plane *trace* a `wizard-ci --e2e` run walks: the + * ordered (screen → committed decision) path the program's `e2e` profile + * produces. It runs fully offline — the agent and auth are stubbed by injecting + * the external transitions the runner/agent would make — so it's deterministic + * and CI-safe, and it fails when the flow shape regresses (a screen appears or + * disappears, the order changes, or a profile decision changes). + * + * Update goldens with `jest -u` after an intentional flow change. 
+ */ + +import { WizardStore } from '@ui/tui/store'; +import { InkUI } from '@ui/tui/ink-ui'; +import { setUI } from '@ui/index'; +import { buildSession, RunPhase } from '@lib/wizard-session'; +import { Integration } from '@lib/constants'; +import { FRAMEWORK_REGISTRY } from '@lib/registry'; +import { WizardReadiness } from '@lib/health-checks/readiness'; +import { Program } from '@lib/programs/program-registry'; +import { ScreenId } from '@ui/tui/router'; +import { WizardCiDriver } from '../wizard-ci-driver'; +import { decideE2eAction } from '../e2e-profile'; +import { profileFor } from '../profiles'; + +/** + * Walk the program flow offline using its e2e profile, injecting the external + * transitions a real run gets from the runner (auth) and the agent (runPhase) + * and the health probe. Returns the ordered (screen, action) trace. + */ +function traceFlow( + integration: Integration, +): Array<{ screen: string; action: string }> { + const store = new WizardStore(Program.PostHogIntegration); + setUI(new InkUI(store)); + const session = buildSession({ installDir: '/tmp/e2e-snap', ci: true }); + session.integration = integration; + session.frameworkConfig = FRAMEWORK_REGISTRY[integration]; + store.session = session; + + const driver = new WizardCiDriver(store); + const profile = profileFor(Program.PostHogIntegration); + + const trace: Array<{ screen: string; action: string }> = []; + for (let guard = 0; guard < 40; guard++) { + const state = driver.readState(); + const screen = state.currentScreen; + const decision = decideE2eAction(state, profile); + trace.push({ screen, action: decision.action?.id ?? '(external)' }); + + if (decision.action) { + driver.performAction(decision.action.id, decision.action.params ?? {}); + } + + // Inject the transitions a real run gets from outside the driver. 
+ if (screen === ScreenId.HealthCheck) { + store.setReadinessResult({ + decision: WizardReadiness.Yes, + health: {} as never, + reasons: [], + }); + } else if (screen === ScreenId.Auth) { + store.setCredentials({ + accessToken: 'phx_x', + projectApiKey: 'phc_x', + host: 'https://us.posthog.com', + projectId: 1, + }); + } else if (screen === ScreenId.Run) { + store.setRunPhase(RunPhase.Completed); + } + + if (decision.done || store.session.skillsComplete) break; + } + return trace; +} + +describe('e2e flow snapshot — posthog-integration', () => { + it('Next.js (with a setup question) walks a stable path', () => { + expect({ + program: 'posthog-integration', + profile: profileFor(Program.PostHogIntegration), + trace: traceFlow(Integration.nextjs), + }).toMatchSnapshot(); + }); + + it('Node (no setup question) walks a stable path', () => { + expect({ + program: 'posthog-integration', + trace: traceFlow(Integration.javascriptNode), + }).toMatchSnapshot(); + }); +}); diff --git a/e2e-harness/__tests__/wizard-ci-driver.test.ts b/e2e-harness/__tests__/wizard-ci-driver.test.ts new file mode 100644 index 00000000..230923b5 --- /dev/null +++ b/e2e-harness/__tests__/wizard-ci-driver.test.ts @@ -0,0 +1,183 @@ +/** + * Control-plane test: drive a REAL WizardStore through the full integration + * screen sequence using only the WizardCiDriver — proving read_state is a + * truthful projection of router-resolved state and that perform_action commits + * cause the same transitions the interactive UI would. + * + * The agent/auth steps are simulated by committing through the same store the + * runner mutates (the SDK is mocked in jest); every *human* decision goes + * through the driver. 
+ */ + +import { WizardStore } from '@ui/tui/store'; +import { InkUI } from '@ui/tui/ink-ui'; +import { setUI } from '@ui/index'; +import { buildSession, RunPhase, McpOutcome } from '@lib/wizard-session'; +import { Integration } from '@lib/constants'; +import { FRAMEWORK_REGISTRY } from '@lib/registry'; +import { WizardReadiness } from '@lib/health-checks/readiness'; +import { ScreenId, Overlay } from '@ui/tui/router'; +import { Program } from '@lib/programs/program-registry'; +import { WizardCiDriver, UnknownActionError } from '../wizard-ci-driver'; +import { ACTION_REGISTRY, NO_ACTION_SCREENS } from '../action-registry'; + +function freshStore(): WizardStore { + const store = new WizardStore(Program.PostHogIntegration); + // Headless: a real store + InkUI (which only forwards to the store), no Ink + // render. setUI so any getUI() path the store touches resolves. + setUI(new InkUI(store)); + const session = buildSession({ + installDir: '/tmp/ci-driver-test', + ci: true, // OAuth-bypass + ai-opt-in auto-consent semantics + }); + session.integration = Integration.nextjs; + session.frameworkConfig = FRAMEWORK_REGISTRY[Integration.nextjs]; + store.session = session; + return store; +} + +const cleanReadiness = { + decision: WizardReadiness.Yes, + health: {} as never, + reasons: [] as string[], +}; + +describe('WizardCiDriver — full integration flow', () => { + it('walks intro → setup → run → outro → mcp → slack → keep-skills', () => { + const store = freshStore(); + const driver = new WizardCiDriver(store); + + // 1. Intro + expect(driver.readState().currentScreen).toBe(ScreenId.Intro); + expect(driver.listActions().map((a) => a.id)).toContain('confirm_setup'); + driver.performAction('confirm_setup'); + + // 2. Health check — blocks until a readiness result lands (mirrors onInit + // probe). Simulate a clean probe; router advances past it. + expect(driver.readState().currentScreen).toBe(ScreenId.HealthCheck); + store.setReadinessResult(cleanReadiness); + + // 3. 
Setup — Next.js asks for the router. The driver reads the question + // off read_state and commits the answer via `choose`. + const state = driver.readState(); + expect(state.currentScreen).toBe(ScreenId.Setup); + expect(state.setupQuestions).toHaveLength(1); + expect(state.setupQuestions[0].key).toBe('router'); + const appValue = state.setupQuestions[0].options[0].value; + driver.performAction('choose', { key: 'router', value: appValue }); + + // 4. Auth — no user action; the runner sets credentials headlessly using + // the phx key. Simulate that commit. + expect(driver.readState().currentScreen).toBe(ScreenId.Auth); + store.setCredentials({ + accessToken: 'phx_secret_should_not_leak', + projectApiKey: 'phc_public', + host: 'https://us.posthog.com', + projectId: 42, + }); + + // 5. ai-opt-in auto-completes (ci=true), so we land on Run. The agent runs + // here; simulate it finishing. + expect(driver.readState().currentScreen).toBe(ScreenId.Run); + store.setRunPhase(RunPhase.Running); + store.setRunPhase(RunPhase.Completed); + + // 6. Outro + expect(driver.readState().currentScreen).toBe(ScreenId.Outro); + driver.performAction('dismiss_outro'); + + // 7. MCP + expect(driver.readState().currentScreen).toBe(ScreenId.Mcp); + driver.performAction('set_mcp_outcome', { outcome: 'skipped' }); + expect(store.session.mcpOutcome).toBe(McpOutcome.Skipped); + + // 8. Slack + expect(driver.readState().currentScreen).toBe(ScreenId.SlackConnect); + driver.performAction('dismiss_slack'); + + // 9. Keep skills — terminal commit. + expect(driver.readState().currentScreen).toBe(ScreenId.KeepSkills); + const done = driver.performAction('keep_skills', { kept: true }); + + // keep-skills is the terminal step: it has no isComplete predicate, so the + // router rests on it. Completion is signalled by skillsComplete — the exact + // condition run-wizard.ts awaits to end the run. 
+ expect(store.session.skillsComplete).toBe(true); + expect(done.currentScreen).toBe(ScreenId.KeepSkills); + }); + + it('read_state is a truthful projection and never leaks the access token', () => { + const store = freshStore(); + const driver = new WizardCiDriver(store); + store.setCredentials({ + accessToken: 'phx_secret_should_not_leak', + projectApiKey: 'phc_public', + host: 'https://us.posthog.com', + projectId: 7, + }); + const state = driver.readState(); + // currentScreen always equals what the router resolves. + expect(state.currentScreen).toBe(store.currentScreen); + expect(state.session.hasCredentials).toBe(true); + expect(state.session.projectId).toBe(7); + // No raw secret anywhere in the serialized snapshot. + expect(JSON.stringify(state)).not.toContain('phx_secret_should_not_leak'); + }); + + it('rejects actions that are not legal on the current screen', () => { + const store = freshStore(); + const driver = new WizardCiDriver(store); + expect(driver.readState().currentScreen).toBe(ScreenId.Intro); + expect(() => driver.performAction('keep_skills')).toThrow( + UnknownActionError, + ); + }); +}); + +describe('WizardCiDriver — wizard_ask overlay', () => { + it('answers a pending question through the driver, resolving the agent promise', async () => { + const store = freshStore(); + const driver = new WizardCiDriver(store); + + // The agent (via the ask bridge) opens a question and awaits the answers. 
+ const answersPromise = store.requestQuestion({ + id: 'q1', + source: 'integration-nextjs', + questions: [ + { + id: 'router', + prompt: 'Which router?', + kind: 'single', + options: [ + { label: 'App', value: 'app' }, + { label: 'Pages', value: 'pages' }, + ], + }, + ], + }); + + const state = driver.readState(); + expect(state.currentScreen).toBe(Overlay.WizardAsk); + expect(state.hasOverlay).toBe(true); + expect(state.pendingQuestion?.questions[0].id).toBe('router'); + expect(driver.listActions().map((a) => a.id)).toContain('answer_question'); + + // The driver commits the complete answer map directly — skipping the + // per-question keystroke walk that lives in React-local state. + driver.performAction('answer_question', { answers: { router: 'app' } }); + + await expect(answersPromise).resolves.toEqual({ router: 'app' }); + // Overlay popped; back to the underlying screen. + expect(driver.readState().currentScreen).not.toBe(Overlay.WizardAsk); + }); +}); + +describe('action registry exhaustiveness', () => { + it('every screen and overlay is either actionable or explicitly no-action', () => { + const allScreens = [...Object.values(ScreenId), ...Object.values(Overlay)]; + const uncovered = allScreens.filter( + (s) => !(s in ACTION_REGISTRY) && !NO_ACTION_SCREENS.has(s), + ); + expect(uncovered).toEqual([]); + }); +}); diff --git a/e2e-harness/action-registry.ts b/e2e-harness/action-registry.ts new file mode 100644 index 00000000..13f274e4 --- /dev/null +++ b/e2e-harness/action-registry.ts @@ -0,0 +1,272 @@ +/** + * Screen → action registry for the CI driver. + * + * Maps every screen/overlay to the set of *commit* actions a user could + * perform on it — and, for each, the single WizardStore setter/resolver that + * commit goes through. This is the actuation half of the driver: instead of + * injecting keystrokes, a harness names an action and the driver invokes the + * same store method the Ink screen's keyboard handler would. 
+ * + * Discipline mirrors screen-registry.tsx: one entry per screen, kept exhaustive + * by a test over the ScreenId/Overlay enums. No product knowledge leaks in — + * actions speak only in store setters and generic params. + */ + +import type { WizardStore } from '@ui/tui/store'; +import { ScreenId, Overlay, type ScreenName } from '@ui/tui/router'; +import { McpOutcome } from '@lib/wizard-session'; +import type { AskAnswers } from '@lib/wizard-session'; + +/** One commit action legal on a given screen. */ +export interface DriverAction { + /** Stable action id named in perform_action. */ + id: string; + /** One-line description of what committing this does. */ + description: string; + /** + * Parameter name → human/type hint. Absent = no params. The driver + * validates presence of required params before applying. + */ + params?: Record<string, string>; + /** Apply the commit by calling exactly one store setter/resolver. */ + apply: (store: WizardStore, params: Record<string, unknown>) => void; +} + +/** Thrown when perform_action references a missing required param. */ +export class MissingParamError extends Error { + constructor(action: string, param: string) { + super(`Action "${action}" requires param "${param}".`); + this.name = 'MissingParamError'; + } +} + +function requireString( + action: string, + params: Record<string, unknown>, + key: string, +): string { + const v = params[key]; + if (typeof v !== 'string' || v.length === 0) { + throw new MissingParamError(action, key); + } + return v; +} + +/** + * Screens with no committable user action (the runner or agent advances them): + * auth (runner sets credentials), run (agent sets runPhase), ai-opt-in (gated on + * org approval / ci auto-consent), exit, and the no-dismiss terminal overlays. + * Listed explicitly so the exhaustiveness test can tell "intentionally empty" + * from "forgotten". 
+ */ +export const NO_ACTION_SCREENS: ReadonlySet<ScreenName> = new Set([ + ScreenId.Auth, + ScreenId.Run, + ScreenId.AiOptIn, + ScreenId.Exit, + ScreenId.AuditRun, + ScreenId.DoctorReport, + ScreenId.SourceMapsOutro, + ScreenId.AuditOutro, + Overlay.ManagedSettings, + Overlay.AuthError, + Overlay.SessionTimeout, +]); + +/** + * Intro-style screens whose only action is "confirm and continue", committing + * the same `setupConfirmed` flag the IntroScreen sets. Several programs reuse + * this shape, so they share one action via this helper. + */ +const confirmSetupAction: DriverAction = { + id: 'confirm_setup', + description: 'Confirm the intro and continue (sets setupConfirmed).', + apply: (store) => store.completeSetup(), +}; + +export const ACTION_REGISTRY: Partial<Record<ScreenName, DriverAction[]>> = { + // ── Program intros — confirm & continue ─────────────────────────────── + [ScreenId.Intro]: [confirmSetupAction], + [ScreenId.RevenueIntro]: [confirmSetupAction], + [ScreenId.SourceMapsIntro]: [confirmSetupAction], + [ScreenId.MigrationIntro]: [confirmSetupAction], + [ScreenId.AgentSkillIntro]: [confirmSetupAction], + [ScreenId.AuditIntro]: [confirmSetupAction], + [ScreenId.DoctorIntro]: [confirmSetupAction], + [ScreenId.WarehouseIntro]: [confirmSetupAction], + [ScreenId.SelfDrivingIntro]: [confirmSetupAction], + + // ── Health check — dismiss a blocking outage ────────────────────────── + [ScreenId.HealthCheck]: [ + { + id: 'dismiss_outage', + description: 'Dismiss the blocking outage screen and continue.', + apply: (store) => store.dismissOutage(), + }, + ], + + // ── Framework disambiguation ────────────────────────────────────────── + [ScreenId.Setup]: [ + { + id: 'choose', + description: + 'Answer one setup question by committing a framework-context value. 
' + + 'Read read_state.setupQuestions for the key and allowed values.', + params: { key: 'setup question key', value: 'chosen option value' }, + apply: (store, params) => { + const key = requireString('choose', params, 'key'); + const value = requireString('choose', params, 'value'); + store.setFrameworkContext(key, value); + }, + }, + ], + + // ── Outro ───────────────────────────────────────────────────────────── + [ScreenId.Outro]: [ + { + id: 'dismiss_outro', + description: 'Dismiss the outro and advance to the MCP step.', + apply: (store) => store.setOutroDismissed(), + }, + ], + + // ── MCP install ─────────────────────────────────────────────────────── + [ScreenId.Mcp]: [ + { + id: 'set_mcp_outcome', + description: + 'Complete the MCP step. outcome ∈ {installed, skipped}; clients optional.', + params: { + outcome: '"installed" | "skipped"', + clients: 'string[] (optional)', + }, + apply: (store, params) => { + const raw = (params.outcome as string) ?? 'skipped'; + const outcome = + raw === 'installed' ? McpOutcome.Installed : McpOutcome.Skipped; + const clients = Array.isArray(params.clients) + ? (params.clients as string[]) + : []; + store.setMcpComplete(outcome, clients); + }, + }, + ], + [ScreenId.McpAdd]: [ + { + id: 'set_mcp_outcome', + description: 'Complete the standalone MCP-add flow.', + params: { outcome: '"installed" | "skipped"' }, + apply: (store, params) => { + const raw = (params.outcome as string) ?? 'skipped'; + store.setMcpComplete( + raw === 'installed' ? McpOutcome.Installed : McpOutcome.Skipped, + ); + }, + }, + ], + [ScreenId.McpRemove]: [ + { + id: 'set_mcp_outcome', + description: 'Complete the standalone MCP-remove flow.', + params: { outcome: '"installed" | "skipped"' }, + apply: (store, params) => { + const raw = (params.outcome as string) ?? 'skipped'; + store.setMcpComplete( + raw === 'installed' ? 
McpOutcome.Installed : McpOutcome.Skipped, + ); + }, + }, + ], + [ScreenId.McpSuggestedPrompts]: [ + { + id: 'dismiss', + description: 'Dismiss the suggested-prompts step.', + apply: (store) => store.setMcpSuggestedPromptsDismissed(), + }, + ], + + // ── Slack ───────────────────────────────────────────────────────────── + [ScreenId.SlackConnect]: [ + { + id: 'dismiss_slack', + description: 'Skip or finish the Connect-Slack step.', + apply: (store) => store.setSlackStepDismissed(), + }, + { + id: 'set_slack_connected', + description: 'Mark Slack as connected (then dismiss to advance).', + params: { connected: 'boolean' }, + apply: (store, params) => + store.setSlackConnected(params.connected !== false), + }, + ], + + // ── Keep skills (terminal step of the integration flow) ─────────────── + [ScreenId.KeepSkills]: [ + { + id: 'keep_skills', + description: + 'Decide whether to keep installed skills; completes the run.', + params: { kept: 'boolean (default true)' }, + apply: (store, params) => store.setSkillsComplete(params.kept !== false), + }, + ], + + // ── Overlays ────────────────────────────────────────────────────────── + [Overlay.WizardAsk]: [ + { + id: 'answer_question', + description: + 'Resolve the pending wizard_ask request. Supply a complete answers ' + + 'map: { [questionId]: string | string[] }. See read_state.pendingQuestion.', + params: { answers: 'Record' }, + apply: (store, params) => { + const answers = (params.answers ?? 
{}) as AskAnswers; + store.resolvePendingQuestion(answers); + }, + }, + { + id: 'cancel_question', + description: 'Cancel the pending wizard_ask request (sentinel answers).', + apply: (store) => store.cancelPendingQuestion(), + }, + ], + [Overlay.SettingsOverride]: [ + { + id: 'backup_and_fix', + description: 'Back up and fix conflicting .claude/settings.json.', + apply: (store) => { + store.backupAndFixSettingsOverride(); + }, + }, + ], + [Overlay.PortConflict]: [ + { + id: 'resolve_port_conflict', + description: + 'Dismiss the port-conflict overlay and retry the OAuth port loop.', + apply: (store) => store.resolvePortConflict(), + }, + ], + [Overlay.ManualAuthCode]: [ + { + id: 'submit_auth_code', + description: 'Submit a manually-entered OAuth authorization code.', + params: { code: 'authorization code' }, + apply: (store, params) => + store.submitManualAuthCode( + requireString('submit_auth_code', params, 'code'), + ), + }, + { + id: 'dismiss_auth_code', + description: 'Dismiss the manual auth-code overlay without submitting.', + apply: (store) => store.dismissManualAuthCode(), + }, + ], +}; + +/** Actions legal on the given screen — empty array if none. */ +export function actionsForScreen(screen: ScreenName): DriverAction[] { + return ACTION_REGISTRY[screen] ?? []; +} diff --git a/e2e-harness/e2e-profile.ts b/e2e-harness/e2e-profile.ts new file mode 100644 index 00000000..dda16295 --- /dev/null +++ b/e2e-harness/e2e-profile.ts @@ -0,0 +1,154 @@ +/** + * WizardE2eProfile — a program's declarative e2e "test definition": the UI + * choices a headless e2e run makes at each decision point. + * + * Per-program choices live in {@link ./profiles}, keyed by program id. + * {@link decideE2eAction} maps the current screen + a profile to the commit to + * make. Add a program's profile to {@link ./profiles} to make it e2e-drivable. 
+ */ + +import { ScreenId, Overlay, type ScreenName } from '@ui/tui/router'; +import type { CiState } from './wizard-ci-driver.js'; + +/** Which option to pick for a setup disambiguation question. */ +export type SetupChoice = 'first' | 'last'; + +export interface WizardE2eProfile { + /** Setup disambiguation (e.g. Next.js router): which option to commit. */ + setup: SetupChoice; + /** + * Health-check screen: `dismiss` continues even if the probe flags an + * outage (sets outageDismissed); `wait` lets only a clean probe through. + */ + healthCheck: 'dismiss' | 'wait'; + /** Post-agent MCP-install step. */ + mcp: 'skip' | 'install'; + /** Connect-Slack step. */ + slack: 'skip'; + /** Keep or delete the wizard-installed skills at the end. */ + skills: 'keep' | 'delete'; + /** Default answer strategy for an agent `wizard_ask` overlay. */ + ask: 'first'; +} + +/** Happy-path default: take every screen forward, leave nothing behind. */ +export const DEFAULT_E2E_PROFILE: WizardE2eProfile = { + setup: 'first', + healthCheck: 'dismiss', + mcp: 'skip', + slack: 'skip', + skills: 'delete', + ask: 'first', +}; + +/** What the harness should do for the current screen. */ +export interface E2eDecision { + /** A driver action to commit, if any. */ + action?: { id: string; params?: Record }; + /** Set on the keep-skills screen — the orchestrator does the fs deletion. */ + skillsPolicy?: 'keep' | 'delete'; + /** True once the terminal commit has been made. */ + done?: boolean; + /** No action — wait for an external transition (probe, auth, agent run). */ + wait?: boolean; +} + +/** + * Map the current screen + profile to the commit to make. Pure: no store, no + * fs — the caller applies the returned action via the driver and handles + * `skillsPolicy` itself. Returns `{ wait: true }` for screens the runner/agent + * advances on their own (auth, run, ai-opt-in, a clean health probe). 
+ */ +export function decideE2eAction( + state: CiState, + profile: WizardE2eProfile, +): E2eDecision { + switch (state.currentScreen) { + case ScreenId.Intro: + case ScreenId.RevenueIntro: + case ScreenId.MigrationIntro: + case ScreenId.AgentSkillIntro: + case ScreenId.AuditIntro: + case ScreenId.SourceMapsIntro: + case ScreenId.DoctorIntro: + case ScreenId.WarehouseIntro: + case ScreenId.SelfDrivingIntro: + return { action: { id: 'confirm_setup' } }; + + case ScreenId.HealthCheck: + return profile.healthCheck === 'dismiss' + ? { action: { id: 'dismiss_outage' } } + : { wait: true }; + + case ScreenId.Setup: { + const q = state.setupQuestions[0]; + if (!q) return { wait: true }; + const opt = + profile.setup === 'last' + ? q.options[q.options.length - 1] + : q.options[0]; + return { + action: { id: 'choose', params: { key: q.key, value: opt.value } }, + }; + } + + case ScreenId.Outro: + return { action: { id: 'dismiss_outro' } }; + + case ScreenId.Mcp: + return { + action: { + id: 'set_mcp_outcome', + params: { + outcome: profile.mcp === 'install' ? 'installed' : 'skipped', + }, + }, + }; + + case ScreenId.McpSuggestedPrompts: + return { action: { id: 'dismiss' } }; + + case ScreenId.SlackConnect: + return { action: { id: 'dismiss_slack' } }; + + case ScreenId.KeepSkills: + return { + action: { + id: 'keep_skills', + params: { kept: profile.skills === 'keep' }, + }, + skillsPolicy: profile.skills, + done: true, + }; + + case Overlay.WizardAsk: { + const q = state.pendingQuestion?.questions[0]; + if (!q) return { wait: true }; + // 'first': first option for single/multi, sentinel for free text. + const answer = q.options?.[0]?.value ?? 'e2e'; + return { + action: { + id: 'answer_question', + params: { answers: { [q.id]: answer } }, + }, + }; + } + + // auth (runner), run (agent), ai-opt-in (ci), exit, terminal overlays. + default: + return { wait: true }; + } +} + +/** Screens this profile knows how to act on — for completeness checks/tests. 
*/ +export const E2E_DRIVABLE_SCREENS: readonly ScreenName[] = [ + ScreenId.Intro, + ScreenId.HealthCheck, + ScreenId.Setup, + ScreenId.Outro, + ScreenId.Mcp, + ScreenId.McpSuggestedPrompts, + ScreenId.SlackConnect, + ScreenId.KeepSkills, + Overlay.WizardAsk, +]; diff --git a/e2e-harness/profiles.ts b/e2e-harness/profiles.ts new file mode 100644 index 00000000..fd094ccf --- /dev/null +++ b/e2e-harness/profiles.ts @@ -0,0 +1,28 @@ +/** + * Per-program e2e profiles — the UI choices a headless run makes driving each + * program's flow. + * + * Each program declares its test path as JSON next to it + * (`src/lib/programs//test/e2e.json`): a `profile` (the options the run + * auto-takes) plus a documented `path`. {@link profileFor} loads the `profile` + * and maps it by program id. + */ + +import { Program, type ProgramId } from '@lib/programs/program-registry'; +import { DEFAULT_E2E_PROFILE, type WizardE2eProfile } from './e2e-profile.js'; +import posthogIntegrationE2e from '@lib/programs/posthog-integration/test/e2e.json'; + +const PROFILES: Partial<Record<ProgramId, WizardE2eProfile>> = { + [Program.PostHogIntegration]: + posthogIntegrationE2e.profile as WizardE2eProfile, +}; + +/** The e2e profile for a program, or the happy-path default if none is set. */ +export function profileFor(program: ProgramId): WizardE2eProfile { + return PROFILES[program] ?? DEFAULT_E2E_PROFILE; +} + +/** Whether a program has an explicit (non-default) e2e profile. */ +export function hasProfile(program: ProgramId): boolean { + return program in PROFILES; +} diff --git a/e2e-harness/tui-capture.ts b/e2e-harness/tui-capture.ts new file mode 100644 index 00000000..0a7ac388 --- /dev/null +++ b/e2e-harness/tui-capture.ts @@ -0,0 +1,107 @@ +/** + * Run a command in a PTY and read its real terminal screen.
+ * + * The shared capture primitive for both e2e routes: spawn the real-TUI host in a + * pseudo-terminal (node-pty) so it renders the real ink TUI, feed its output to a + * headless xterm emulator, and read the current screen as clean text on demand. + */ +import fsmod from 'fs'; +import pathmod from 'path'; +import * as pty from 'node-pty'; +import { createRequire } from 'module'; + +// @xterm/headless ships CJS; its `module` field points at the full browser build, +// so import the headless CJS entry directly to get a working Terminal in Node. +const require = createRequire(import.meta.url); +const { Terminal } = + require('@xterm/headless') as typeof import('@xterm/headless'); + +// node-pty's prebuilt macOS/Linux spawn-helper can lose its execute bit when the +// package is extracted without running its build script (e.g. pnpm skips it), +// which makes pty.spawn fail with "posix_spawnp failed". Restore it, best-effort. +function ensureSpawnHelper(): void { + try { + const root = pathmod.dirname(require.resolve('node-pty/package.json')); + const dir = pathmod.join( + root, + 'prebuilds', + `${process.platform}-${process.arch}`, + ); + const helper = pathmod.join(dir, 'spawn-helper'); + if (fsmod.existsSync(helper)) fsmod.chmodSync(helper, 0o755); + } catch { + /* best-effort */ + } +} + +export interface TuiCapture { + /** The current rendered screen as clean text (trailing blank lines trimmed). */ + frame(): string; + /** Fires after each chunk of terminal output is applied. */ + onData(cb: () => void): void; + kill(): void; + /** Resolves when the child exits. */ + exited: Promise<void>; +} + +export function captureTui(opts: { + cmd: string; + args: string[]; + cwd: string; + env: NodeJS.ProcessEnv; + cols?: number; + rows?: number; +}): TuiCapture { + // Default to a roomy, full-screen-terminal-ish size (overridable per call or + // via PTY_COLS / PTY_ROWS) so the TUI renders the way it would on a real Mac + // terminal rather than cramped.
The PTY winsize drives ink's layout. + const cols = opts.cols ?? (Number(process.env.PTY_COLS) || 180); + const rows = opts.rows ?? (Number(process.env.PTY_ROWS) || 50); + ensureSpawnHelper(); + const term = new Terminal({ cols, rows, allowProposedApi: true }); + // Strip CI markers: ink renders non-interactively when it detects CI, which + // leaves the captured screen blank. We want the real interactive TUI. + const childEnv = { ...opts.env }; + for (const k of ['CI', 'CONTINUOUS_INTEGRATION', 'GITHUB_ACTIONS']) + delete childEnv[k]; + const child = pty.spawn(opts.cmd, opts.args, { + name: 'xterm-256color', + cols, + rows, + cwd: opts.cwd, + env: childEnv as { [key: string]: string }, + }); + + const cbs: Array<() => void> = []; + child.onData((d) => { + term.write(d); + for (const cb of cbs) cb(); + }); + let resolveExit!: () => void; + const exited = new Promise<void>((r) => (resolveExit = r)); + child.onExit(() => resolveExit()); + + return { + frame() { + const buf = term.buffer.active; + const lines: string[] = []; + for (let i = 0; i < rows; i++) { + const line = buf.getLine(i); + lines.push(line ? line.translateToString(true) : ''); + } + while (lines.length && !lines[lines.length - 1].trim()) lines.pop(); + return lines.join('\n') + '\n'; + }, + onData(cb) { + cbs.push(cb); + }, + kill() { + try { + child.kill(); + } catch { + /* already gone */ + } + }, + exited, + }; +} diff --git a/e2e-harness/wizard-ci-driver.ts b/e2e-harness/wizard-ci-driver.ts new file mode 100644 index 00000000..e23ce973 --- /dev/null +++ b/e2e-harness/wizard-ci-driver.ts @@ -0,0 +1,192 @@ +/** + * WizardCiDriver — the read/act control plane over a live WizardStore. + * + * This is the read/act core both e2e routes drive.
A test harness or a driver + * LLM uses these primitives to run a real wizard end-to-end without keystrokes: + * + * readState() — a truthful projection of the committed store state + * (the same state the Ink render is a pure function of), + * plus the derived currentScreen/hasOverlay so the snapshot + * is complete without reaching into router internals. + * listActions() — the commit actions legal on the current screen. + * performAction() — invoke one, via the exact store setter the Ink screen's + * keyboard handler would call, and return the next state. + * + * It observes *committed* state and actuates *commits*. In-progress keystroke + * state (typed-but-unsubmitted text, highlighted option, the wizard_ask + * per-question accumulator) is React-local and deliberately invisible here — + * the driver issues the final commit directly instead. + */ + +import type { WizardStore } from '@ui/tui/store'; +import type { ScreenName } from '@ui/tui/router'; +import type { PendingQuestion, RunPhase } from '@lib/wizard-session'; +import { actionsForScreen, MissingParamError } from './action-registry.js'; + +/** A setup question projected for the harness (no `detect` fn, no closures). */ +export interface SetupQuestionView { + key: string; + message: string; + options: Array<{ label: string; value: string; hint?: string }>; +} + +/** The action surface as seen by a caller (no `apply` closure). */ +export interface ActionView { + id: string; + description: string; + params?: Record; +} + +/** + * The serialized observable state. A whitelist of WizardSession — credentials + * are reduced to a boolean so secrets never reach a driver LLM. 
+ */ +export interface CiState { + currentScreen: ScreenName; + hasOverlay: boolean; + runPhase: RunPhase; + session: { + installDir: string; + integration: string | null; + detectedFrameworkLabel: string | null; + detectionComplete: boolean; + setupConfirmed: boolean; + hasCredentials: boolean; + projectId: number | null; + mcpComplete: boolean; + slackStepDismissed: boolean; + skillsComplete: boolean; + outroDismissed: boolean; + llmOptIn: boolean; + discoveredFeatures: string[]; + }; + tasks: Array<{ label: string; status: string; activeForm?: string }>; + statusMessages: string[]; + eventPlan: Array<{ name: string; description: string }>; + /** Present iff a wizard_ask overlay is up. */ + pendingQuestion: PendingQuestion | null; + /** Unresolved framework-setup questions when on the setup screen. */ + setupQuestions: SetupQuestionView[]; + /** Commit actions legal on currentScreen. */ + actions: ActionView[]; +} + +export class UnknownActionError extends Error { + constructor(action: string, screen: ScreenName) { + super( + `No action "${action}" on screen "${screen}". ` + + `Call list_actions / read read_state.actions first.`, + ); + this.name = 'UnknownActionError'; + } +} + +export class WizardCiDriver { + constructor(private readonly store: WizardStore) {} + + /** Snapshot the committed state plus the derived screen. */ + readState(): CiState { + const s = this.store.session; + const screen = this.store.currentScreen; + return { + currentScreen: screen, + hasOverlay: this.store.router.hasOverlay, + runPhase: s.runPhase, + session: { + installDir: s.installDir, + integration: s.integration, + detectedFrameworkLabel: s.detectedFrameworkLabel, + detectionComplete: s.detectionComplete, + setupConfirmed: s.setupConfirmed, + hasCredentials: s.credentials !== null, + projectId: s.credentials?.projectId ?? 
null, + mcpComplete: s.mcpComplete, + slackStepDismissed: s.slackStepDismissed, + skillsComplete: s.skillsComplete, + outroDismissed: s.outroDismissed, + llmOptIn: s.llmOptIn, + discoveredFeatures: [...s.discoveredFeatures], + }, + tasks: this.store.tasks.map((t) => ({ + label: t.label, + status: t.status, + activeForm: t.activeForm, + })), + statusMessages: [...this.store.statusMessages], + eventPlan: this.store.eventPlan.map((e) => ({ + name: e.name, + description: e.description, + })), + pendingQuestion: s.pendingQuestion ?? null, + setupQuestions: this.unresolvedSetupQuestions(), + actions: this.listActions(), + }; + } + + /** Commit actions legal on the current screen. */ + listActions(): ActionView[] { + return actionsForScreen(this.store.currentScreen).map((a) => ({ + id: a.id, + description: a.description, + ...(a.params ? { params: a.params } : {}), + })); + } + + /** + * Apply a named action via its store setter, then return the next state. + * Throws UnknownActionError if the action isn't legal on the current screen, + * or MissingParamError if a required param is absent. + */ + performAction( + actionId: string, + params: Record = {}, + ): CiState { + const screen = this.store.currentScreen; + const action = actionsForScreen(screen).find((a) => a.id === actionId); + if (!action) throw new UnknownActionError(actionId, screen); + action.apply(this.store, params); // may throw MissingParamError + return this.readState(); + } + + /** + * Resolve once the rendered screen changes (or a wizard_ask overlay opens), + * or after timeoutMs. Lets a driver loop block on the next decision point + * instead of polling — the store fires its version listener on every commit, + * including the agent's getUI() calls. 
+ */ + waitForChange(timeoutMs = 120_000): Promise<CiState> { + const before = this.store.currentScreen; + return new Promise((resolve) => { + let settled = false; + const finish = () => { + if (settled) return; + settled = true; + clearTimeout(timer); + unsub(); + resolve(this.readState()); + }; + const timer = setTimeout(finish, timeoutMs); + const unsub = this.store.subscribe(() => { + if (this.store.currentScreen !== before) finish(); + }); + }); + } + + private unresolvedSetupQuestions(): SetupQuestionView[] { + const s = this.store.session; + const questions = s.frameworkConfig?.metadata.setup?.questions ?? []; + return questions + .filter((q) => !(q.key in s.frameworkContext)) + .map((q) => ({ + key: q.key, + message: q.message, + options: q.options.map((o) => ({ + label: o.label, + value: o.value, + ...(o.hint ? { hint: o.hint } : {}), + })), + })); + } +} + +export { MissingParamError }; diff --git a/package.json b/package.json index eabb2acc..fb3005ee 100644 --- a/package.json +++ b/package.json @@ -54,7 +54,7 @@ "xcode": "3.0.1", "xml-js": "^1.6.11", "yargs": "^16.2.0", - "zod": "^3.24.2", + "zod": "^3.25.76", "zod-to-json-schema": "^3.24.3" }, "devDependencies": { @@ -62,6 +62,7 @@ "@babel/plugin-transform-modules-commonjs": "^7.28.6", "@babel/preset-env": "^7.29.0", "@babel/types": "~7.21.4", + "@modelcontextprotocol/sdk": "^1.29.0", "@types/chai": "^4.3.17", "@types/glob": "^7.2.0", "@types/inquirer": "^0.0.43", @@ -75,6 +76,7 @@ "@types/yargs": "^16.0.9", "@typescript-eslint/eslint-plugin": "^5.13.0", "@typescript-eslint/parser": "^5.13.0", + "@xterm/headless": "^6.0.0", "babel-jest": "^29.7.0", "dotenv": "^16.4.7", "eslint": "^8.18.0", @@ -85,6 +87,7 @@ "jest": "^29.5.0", "lint-staged": "^15.5.1", "msw": "^2.10.4", + "node-pty": "^1.1.0", "prettier": "^2.8.7", "rimraf": "^3.0.2", "ts-jest": "^29.1.0", @@ -120,7 +123,9 @@ "dev": "pnpm build && pnpm link --global && pnpm build:watch", "test:watch": "jest --watch", "prepare": "husky", - "screens:check": "tsx
scripts/check-screens.tsx" + "screens:check": "tsx scripts/check-screens.tsx", + "wizard-ci-explore": "tsx scripts/wizard-ci-explore.no-jest.ts", + "wizard-ci-replay": "tsx scripts/tui-replay.no-jest.ts" }, "jest": { "collectCoverage": true, @@ -158,6 +163,7 @@ "^ink$": "<rootDir>/__mocks__/ink.ts", "^@env$": "<rootDir>/src/env.ts", "^@lib/(.*)$": "<rootDir>/src/lib/$1", + "^@e2e-harness/(.*)$": "<rootDir>/e2e-harness/$1", "^@utils/(.*)$": "<rootDir>/src/utils/$1", "^@ui$": "<rootDir>/src/ui/index.ts", "^@ui/(.*)$": "<rootDir>/src/ui/$1", @@ -179,5 +185,10 @@ "volta": { "node": "24.14.1", "pnpm": "10.23.0" + }, + "pnpm": { + "onlyBuiltDependencies": [ + "node-pty" + ] } } diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index 336708a7..d268d4c6 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -10,13 +10,13 @@ importers: dependencies: '@anthropic-ai/claude-agent-sdk': specifier: 0.3.169 - version: 0.3.169(@anthropic-ai/sdk@0.81.0(zod@3.24.2))(@modelcontextprotocol/sdk@1.29.0(@cfworker/json-schema@4.1.1)(zod@3.24.2))(zod@3.24.2) + version: 0.3.169(@anthropic-ai/sdk@0.81.0(zod@3.25.76))(@modelcontextprotocol/sdk@1.29.0(@cfworker/json-schema@4.1.1)(zod@3.25.76))(zod@3.25.76) '@inkjs/ui': specifier: ^2.0.0 version: 2.0.0(ink@6.8.0(@types/react@19.2.14)(react@19.2.4)) '@langchain/core': specifier: ^0.3.40 - version: 0.3.40(openai@6.7.0(ws@8.18.1)(zod@3.24.2)) + version: 0.3.40(openai@6.7.0(ws@8.18.1)(zod@3.25.76)) axios: specifier: 1.7.4 version: 1.7.4 @@ -75,11 +75,11 @@ importers: specifier: ^16.2.0 version: 16.2.0 zod: - specifier: ^3.24.2 - version: 3.24.2 + specifier: ^3.25.76 + version: 3.25.76 zod-to-json-schema: specifier: ^3.24.3 - version: 3.24.3(zod@3.24.2) + version: 3.24.3(zod@3.25.76) devDependencies: '@babel/core': specifier: ^7.29.0 @@ -93,6 +93,9 @@ importers: '@babel/types': specifier: ~7.21.4 version: 7.21.5 + '@modelcontextprotocol/sdk': + specifier: ^1.29.0 + version: 1.29.0(@cfworker/json-schema@4.1.1)(zod@3.25.76) '@types/chai': specifier: ^4.3.17 version: 4.3.20 @@ -132,6 +135,9 @@ importers:
'@typescript-eslint/parser': specifier: ^5.13.0 version: 5.62.0(eslint@8.57.1)(typescript@5.7.3) + '@xterm/headless': + specifier: ^6.0.0 + version: 6.0.0 babel-jest: specifier: ^29.7.0 version: 29.7.0(@babel/core@7.29.0) @@ -162,6 +168,9 @@ importers: msw: specifier: ^2.10.4 version: 2.10.4(@types/node@18.19.76)(typescript@5.7.3) + node-pty: + specifier: ^1.1.0 + version: 1.1.0 prettier: specifier: ^2.8.7 version: 2.8.8 @@ -1639,6 +1648,9 @@ packages: resolution: {integrity: sha512-2WALfTl4xo2SkGCYRt6rDTFfk9R1czmBvUQy12gK2KuRKIpWEhcbbzy8EZXtz/jkRqHX8bFEc6FC1HjX4TUWYw==} engines: {node: '>=10.0.0'} + '@xterm/headless@6.0.0': + resolution: {integrity: sha512-5Yj1QINYCyzrZtf8OFIHi47iQtI+0qYFPHmouEfG8dHNxbZ9Tb9YGSuLcsEwj9Z+OL75GJqPyJbyoFer80a2Hw==} + accepts@2.0.0: resolution: {integrity: sha512-5cvg6CtKwfgdmVqY1WIiXKc3Q1bkRqGLi+2W/6ao+6Y7gu/RCwRuAhGEzh5B4KlszSuTLgZYuqFqo5bImjNKng==} engines: {node: '>= 0.6'} @@ -3134,9 +3146,15 @@ packages: resolution: {integrity: sha512-8Ofs/AUQh8MaEcrlq5xOX0CQ9ypTF5dl78mjlMNfOK08fzpgTHQRQPBxcPlEtIw0yRpws+Zo/3r+5WRby7u3Gg==} engines: {node: '>= 0.6'} + node-addon-api@7.1.1: + resolution: {integrity: sha512-5m3bsyrjFWE1xf7nz7YXdN4udnVtXK6/Yfgn5qnahL6bCkf2yKt4k3nuTKAtT4r3IG8JNR2ncsIMdZuAzJjHQQ==} + node-int64@0.4.0: resolution: {integrity: sha512-O5lz91xSOeoXP6DulyHfllpq+Eg00MWitZIbtPfoSEvqIHdl5gfcY6hYzDWnj0qD5tz52PI08u9qUvSVeUBeHw==} + node-pty@1.1.0: + resolution: {integrity: sha512-20JqtutY6JPXTUnL0ij1uad7Qe1baT46lyolh2sSENDd4sTzKZ4nmAFkeAARDKwmlLjPx6XKRlwRUxwjOy+lUg==} + node-releases@2.0.19: resolution: {integrity: sha512-xxOWJsBKtzAq7DY0J+DTzuz58K8e7sJbdgwkbMWQe8UYB6ekmsQ45q0M/tJDsGaZmbC+l7n57UV8Hl5tHxO9uw==} @@ -4099,8 +4117,8 @@ packages: peerDependencies: zod: ^3.25.28 || ^4 - zod@3.24.2: - resolution: {integrity: sha512-lY7CDW43ECgW9u1TcT3IoXHflywfVqDYze4waEz812jR/bZ8FHDsl7pFQoSZTz5N+2NqRXs8GBwnAwo3ZNxqhQ==} + zod@3.25.76: + resolution: {integrity: 
sha512-gzUt/qt81nXsFGKIFcC3YnfEAx5NkunCfnDlvuBSSFS02bcXu4Lmea0AFIUwbLWxWPx3d9p8S5QoaujKcNQxcQ==} snapshots: @@ -4133,11 +4151,11 @@ snapshots: '@anthropic-ai/claude-agent-sdk-win32-x64@0.3.169': optional: true - '@anthropic-ai/claude-agent-sdk@0.3.169(@anthropic-ai/sdk@0.81.0(zod@3.24.2))(@modelcontextprotocol/sdk@1.29.0(@cfworker/json-schema@4.1.1)(zod@3.24.2))(zod@3.24.2)': + '@anthropic-ai/claude-agent-sdk@0.3.169(@anthropic-ai/sdk@0.81.0(zod@3.25.76))(@modelcontextprotocol/sdk@1.29.0(@cfworker/json-schema@4.1.1)(zod@3.25.76))(zod@3.25.76)': dependencies: - '@anthropic-ai/sdk': 0.81.0(zod@3.24.2) - '@modelcontextprotocol/sdk': 1.29.0(@cfworker/json-schema@4.1.1)(zod@3.24.2) - zod: 3.24.2 + '@anthropic-ai/sdk': 0.81.0(zod@3.25.76) + '@modelcontextprotocol/sdk': 1.29.0(@cfworker/json-schema@4.1.1)(zod@3.25.76) + zod: 3.25.76 optionalDependencies: '@anthropic-ai/claude-agent-sdk-darwin-arm64': 0.3.169 '@anthropic-ai/claude-agent-sdk-darwin-x64': 0.3.169 @@ -4148,11 +4166,11 @@ snapshots: '@anthropic-ai/claude-agent-sdk-win32-arm64': 0.3.169 '@anthropic-ai/claude-agent-sdk-win32-x64': 0.3.169 - '@anthropic-ai/sdk@0.81.0(zod@3.24.2)': + '@anthropic-ai/sdk@0.81.0(zod@3.25.76)': dependencies: json-schema-to-ts: 3.1.1 optionalDependencies: - zod: 3.24.2 + zod: 3.25.76 '@babel/code-frame@7.26.2': dependencies: @@ -5081,7 +5099,7 @@ snapshots: '@eslint/eslintrc@2.1.4': dependencies: ajv: 6.12.6 - debug: 4.4.0 + debug: 4.4.3 espree: 9.6.1 globals: 13.24.0 ignore: 5.3.2 @@ -5101,7 +5119,7 @@ snapshots: '@humanwhocodes/config-array@0.13.0': dependencies: '@humanwhocodes/object-schema': 2.0.3 - debug: 4.4.0 + debug: 4.4.3 minimatch: 3.1.2 transitivePeerDependencies: - supports-color @@ -5353,24 +5371,24 @@ snapshots: '@jridgewell/resolve-uri': 3.1.2 '@jridgewell/sourcemap-codec': 1.5.0 - '@langchain/core@0.3.40(openai@6.7.0(ws@8.18.1)(zod@3.24.2))': + '@langchain/core@0.3.40(openai@6.7.0(ws@8.18.1)(zod@3.25.76))': dependencies: '@cfworker/json-schema': 4.1.1 ansi-styles: 
5.2.0 camelcase: 6.3.0 decamelize: 1.2.0 js-tiktoken: 1.0.19 - langsmith: 0.3.11(openai@6.7.0(ws@8.18.1)(zod@3.24.2)) + langsmith: 0.3.11(openai@6.7.0(ws@8.18.1)(zod@3.25.76)) mustache: 4.2.0 p-queue: 6.6.2 p-retry: 4.6.2 uuid: 10.0.0 - zod: 3.24.2 - zod-to-json-schema: 3.24.3(zod@3.24.2) + zod: 3.25.76 + zod-to-json-schema: 3.24.3(zod@3.25.76) transitivePeerDependencies: - openai - '@modelcontextprotocol/sdk@1.29.0(@cfworker/json-schema@4.1.1)(zod@3.24.2)': + '@modelcontextprotocol/sdk@1.29.0(@cfworker/json-schema@4.1.1)(zod@3.25.76)': dependencies: '@hono/node-server': 1.19.14(hono@4.12.18) ajv: 8.20.0 @@ -5387,8 +5405,8 @@ snapshots: json-schema-typed: 8.0.2 pkce-challenge: 5.0.1 raw-body: 3.0.2 - zod: 3.24.2 - zod-to-json-schema: 3.25.2(zod@3.24.2) + zod: 3.25.76 + zod-to-json-schema: 3.25.2(zod@3.25.76) optionalDependencies: '@cfworker/json-schema': 4.1.1 transitivePeerDependencies: @@ -5732,7 +5750,7 @@ snapshots: dependencies: '@typescript-eslint/typescript-estree': 5.62.0(typescript@5.7.3) '@typescript-eslint/utils': 5.62.0(eslint@8.57.1)(typescript@5.7.3) - debug: 4.4.0 + debug: 4.4.3 eslint: 8.57.1 tsutils: 3.21.0(typescript@5.7.3) optionalDependencies: @@ -5746,7 +5764,7 @@ snapshots: dependencies: '@typescript-eslint/types': 5.62.0 '@typescript-eslint/visitor-keys': 5.62.0 - debug: 4.4.0 + debug: 4.4.3 globby: 11.1.0 is-glob: 4.0.3 semver: 7.7.1 @@ -5780,6 +5798,8 @@ snapshots: '@xmldom/xmldom@0.8.10': {} + '@xterm/headless@6.0.0': {} + accepts@2.0.0: dependencies: mime-types: 3.0.2 @@ -7286,7 +7306,7 @@ snapshots: kleur@3.0.3: {} - langsmith@0.3.11(openai@6.7.0(ws@8.18.1)(zod@3.24.2)): + langsmith@0.3.11(openai@6.7.0(ws@8.18.1)(zod@3.25.76)): dependencies: '@types/uuid': 10.0.0 chalk: 4.1.2 @@ -7296,7 +7316,7 @@ snapshots: semver: 7.7.1 uuid: 10.0.0 optionalDependencies: - openai: 6.7.0(ws@8.18.1)(zod@3.24.2) + openai: 6.7.0(ws@8.18.1)(zod@3.25.76) leven@3.1.0: {} @@ -7471,8 +7491,14 @@ snapshots: negotiator@1.0.0: {} + node-addon-api@7.1.1: {} + 
node-int64@0.4.0: {} + node-pty@1.1.0: + dependencies: + node-addon-api: 7.1.1 + node-releases@2.0.19: {} node-releases@2.0.27: {} @@ -7517,10 +7543,10 @@ snapshots: dependencies: mimic-function: 5.0.1 - openai@6.7.0(ws@8.18.1)(zod@3.24.2): + openai@6.7.0(ws@8.18.1)(zod@3.25.76): optionalDependencies: ws: 8.18.1 - zod: 3.24.2 + zod: 3.25.76 optional: true opn@5.5.0: @@ -8342,12 +8368,12 @@ snapshots: yoga-layout@3.2.1: {} - zod-to-json-schema@3.24.3(zod@3.24.2): + zod-to-json-schema@3.24.3(zod@3.25.76): dependencies: - zod: 3.24.2 + zod: 3.25.76 - zod-to-json-schema@3.25.2(zod@3.24.2): + zod-to-json-schema@3.25.2(zod@3.25.76): dependencies: - zod: 3.24.2 + zod: 3.25.76 - zod@3.24.2: {} + zod@3.25.76: {} diff --git a/scripts/README.md b/scripts/README.md new file mode 100644 index 00000000..30bd9546 --- /dev/null +++ b/scripts/README.md @@ -0,0 +1,33 @@ +# scripts/ + +Helper scripts. The build-related ones (`generate-version.cjs`, +`smoke-test*.sh`, `check-screens.tsx`) are wired into `package.json`. The rest +below are **manual, runnable tools** for headless e2e + snapshots — each is a +standalone `tsx` entry, named `*.no-jest.ts` so Jest ignores it. + +Run from the repo root, e.g. `npx tsx scripts/.no-jest.ts`. + +Both e2e routes share one primitive: the **real TUI host** runs `startTUI` (the +real ink render) and is driven purely by store state manipulation; a PTY parent +([`e2e-harness/tui-capture.ts`](../e2e-harness/tui-capture.ts), node-pty + +`@xterm/headless`) captures the real rendered screen. + +| Script | What it does | Needs | +| ------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------ | +| **`tui-host.no-jest.ts`** | The real-TUI host. 
`MODE=fixed` self-drives the fixed e2e profile and signals each screen; `MODE=serve` accepts drive commands (`read_state`/`perform_action`/`run_agent`) over a unix socket. | `APP_DIR`, `POSTHOG_KEY_FILE`, `PROJECT_ID`; run under a PTY | +| **`tui-snapshots.no-jest.ts`** | CI snapshot route: spawns `tui-host` (`MODE=fixed`) in a PTY, runs the full real agent flow, and writes the **real rendered** screen to `SNAP_OUT/NN-.txt` at each key moment (incl. the run screen's progression). | `SNAP_OUT`, `APP_DIR`, `POSTHOG_KEY_FILE`, `PROJECT_ID` | +| **`wizard-ci-mcp.no-jest.ts`** | Agent route: a stdio **MCP server** that proxies `tui-host` (`MODE=serve`) — `read_state`/`perform_action`/`run_agent` forward over the socket, `render_screen` returns the real captured frame. | spawns the host itself; key passed via `open_app` | +| **`wizard-ci-explore.no-jest.ts`** | Quick eyeball of the agent route: drives the MCP server (`open_app → confirm_setup → render_screen`) and prints the real TUI. `pnpm wizard-ci-explore`. | `APP_DIR`, `POSTHOG_KEY_FILE`, `PROJECT_ID` | + +> You usually don't call these directly — `pnpm wizard-ci-snapshots` (in +> [wizard-workbench](https://github.com/PostHog/wizard-workbench)) orchestrates +> the snapshot route; the MCP server is registered in this repo's `.mcp.json` and +> used via the `exploring-the-wizard` skill. + +## Background + +The control plane lives in [`e2e-harness/`](../e2e-harness/) — out of `src/`, so +none of it ships in prod. `WizardCiDriver` (read/act over the store), the +screen→action registry, the e2e profiles, and `tui-capture` (real-TUI PTY +capture). See [`ARCHITECTURE.md`](../e2e-harness/ARCHITECTURE.md) for how the two +routes drive these (env strip, scoped project id, gotchas). 
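The `MODE=serve` route can also be exercised without the MCP server. As a minimal sketch, assuming the control socket speaks newline-delimited JSON (one request object per line, one response per line, which is what the host's `serve()` loop in this change appears to do), the framing could look like the block below. `DriveCommand`, `encodeCommand`, and `splitFrames` are hypothetical names used for illustration; they are not part of the harness.

```typescript
// Hypothetical helpers, not part of the harness. They mirror the framing the
// host's serve() loop appears to use: one JSON object per "\n"-terminated line.

/** A drive command; the `type` values are taken from the host's command switch. */
interface DriveCommand {
  type: 'read_state' | 'perform_action' | 'set_credentials' | 'run_agent';
  action?: string;
  params?: Record<string, unknown>;
}

/** Serialize one command as a single newline-terminated JSON line. */
function encodeCommand(cmd: DriveCommand): string {
  return JSON.stringify(cmd) + '\n';
}

/**
 * Split a receive buffer into parsed JSON lines plus the unfinished tail.
 * Blank lines are skipped; the tail is returned so the caller can prepend it
 * to the next chunk read from the socket.
 */
function splitFrames(buffer: string): { frames: unknown[]; rest: string } {
  const frames: unknown[] = [];
  let rest = buffer;
  let i: number;
  while ((i = rest.indexOf('\n')) >= 0) {
    const line = rest.slice(0, i);
    rest = rest.slice(i + 1);
    if (line.trim()) frames.push(JSON.parse(line));
  }
  return { frames, rest };
}
```

A client would write `encodeCommand(...)` to the socket at `CONTROL_SOCK` and pass each received chunk through `splitFrames`, carrying `rest` over into the next chunk.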
diff --git a/scripts/tui-host.no-jest.ts b/scripts/tui-host.no-jest.ts new file mode 100644 index 00000000..119563ca --- /dev/null +++ b/scripts/tui-host.no-jest.ts @@ -0,0 +1,275 @@ +/** + * Shared real-TUI host — the one primitive both e2e routes use. + * + * Runs the real `startTUI` (real ink render → this process's stdout, which the + * PTY parent captures) and drives its store by pure state manipulation via + * `WizardCiDriver` — no keystrokes. Auth is satisfied by `setCredentials` with + * the phx key (same bearer as an OAuth token). + * + * MODE=fixed — self-drive the fixed e2e profile, snapshotting each screen + * (the CI snapshot route). + * MODE=serve — listen on CONTROL_SOCK for {read_state, perform_action, + * set_credentials, run_agent} commands (the agent/MCP route). + * + * Never writes to stdout (that's the TUI); diagnostics go to the wizard log file. + */ +import fs from 'fs'; +import net from 'net'; +import { startTUI } from '@ui/tui/start-tui'; +import { VERSION } from '@lib/version'; +import { Program } from '@lib/programs/program-registry'; +import { buildSession } from '@lib/wizard-session'; +import { posthogIntegrationConfig } from '@lib/programs/posthog-integration'; +import { runAgent } from '@lib/agent/agent-runner'; +import { getOrAskForProjectData } from '@utils/setup-utils'; +import { logToFile } from '@utils/debug'; +import { WizardCiDriver } from '@e2e-harness/wizard-ci-driver'; +import { + decideE2eAction, + type WizardE2eProfile, +} from '@e2e-harness/e2e-profile'; +import { profileFor } from '@e2e-harness/profiles'; + +const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms)); +const mark = (m: string) => logToFile(`[tui-host] ${m}`); + +async function main() { + const apiKey = ( + process.env.POSTHOG_PERSONAL_API_KEY ?? + (process.env.POSTHOG_KEY_FILE + ? 
fs.readFileSync(process.env.POSTHOG_KEY_FILE, 'utf8') + : '') + ).trim(); + const projectId = process.env.PROJECT_ID!; + + const { store } = startTUI(VERSION, Program.PostHogIntegration); + store.session = buildSession({ + installDir: process.env.APP_DIR!, + ci: true, + apiKey, + projectId, + region: 'us', + }); + const driver = new WizardCiDriver(store); + + // Resolve credentials from the phx key (same bearer as an OAuth token) and set + // them on the store — advances the auth screen with no browser, no keystrokes. + const authByState = async () => { + const d = await getOrAskForProjectData({ + signup: false, + ci: true, + apiKey, + projectId, + programId: Program.PostHogIntegration, + }); + store.setCredentials({ + accessToken: d.accessToken, + projectApiKey: d.projectApiKey, + host: d.host, + projectId: d.projectId, + }); + }; + + if (process.env.MODE === 'serve') return serve(); + return fixed(); + + // ---- agent route: drive commands over a unix socket ---- + function serve() { + let runStatus: 'idle' | 'running' | 'done' | 'failed' = 'idle'; + const handle = async (req: { + type: string; + action?: string; + params?: Record; + }) => { + try { + switch (req.type) { + case 'read_state': + return { + ok: true, + state: { ...driver.readState(), integration: runStatus }, + }; + case 'perform_action': + return { + ok: true, + state: driver.performAction(req.action!, req.params ?? 
{}), + }; + case 'set_credentials': + await authByState(); + return { ok: true, state: driver.readState() }; + case 'run_agent': { + if (runStatus === 'running' || runStatus === 'done') + return { ok: true, runStatus }; + runStatus = 'running'; + void (async () => { + try { + await store.getGate('intro'); + await store.getGate('health-check'); + await runAgent(posthogIntegrationConfig, store.session); + runStatus = 'done'; + } catch (e) { + runStatus = 'failed'; + mark('run_agent error ' + (e as Error).message); + } + })(); + return { ok: true, runStatus: 'running' }; + } + default: + return { ok: false, error: `unknown command ${req.type}` }; + } + } catch (e) { + return { ok: false, error: (e as Error).message }; + } + }; + const server = net.createServer((sock) => { + let buf = ''; + sock.on('data', (d) => { + buf += d; + let i; + while ((i = buf.indexOf('\n')) >= 0) { + const line = buf.slice(0, i); + buf = buf.slice(i + 1); + if (!line.trim()) continue; + void handle(JSON.parse(line)).then((res) => + sock.write(JSON.stringify(res) + '\n'), + ); + } + }); + }); + const sockPath = process.env.CONTROL_SOCK!; + try { + fs.unlinkSync(sockPath); + } catch { + /* fresh */ + } + server.listen(sockPath, () => mark(`serving on ${sockPath}`)); + void store.runReadyHooks(); // detection so the intro screen fills in + } + + // ---- CI route: self-drive the fixed profile, snapshot each screen ---- + async function fixed() { + const CTRL = process.env.SNAP_CTRL!; + const profile: WizardE2eProfile = profileFor(Program.PostHogIntegration); + const screenPath: string[] = []; + // Snapshot on key moments — a screen change, a task-list update, or a + // runPhase change — so the run screen's progression (the agent working) is + // captured, not just screen transitions. The driver loop snaps each screen + // before acting on it (so transitions are caught as presented); a store + // subscription catches within-screen changes (the run). Deduped by + // signature and serialized. 
+ let lastSig = ''; + let chain: Promise<void> = Promise.resolve(); + const signature = () => + JSON.stringify({ + screen: store.currentScreen, + overlay: store.router.hasOverlay, + tasks: store.tasks.map((t) => [t.label, t.status, t.done]), + phase: store.session.runPhase, + }); + const snap = (): Promise<void> => { + const sig = signature(); + if (sig === lastSig) return chain; + lastSig = sig; + const screen = store.currentScreen; + if (screenPath[screenPath.length - 1] !== screen) screenPath.push(screen); + chain = chain.then(async () => { + await sleep(500); // settle: let the frame finish drawing + fs.appendFileSync(CTRL, store.currentScreen + '\n'); + await sleep(300); // let the capturer capture before the screen moves on + }); + return chain; + }; + const unsub = store.subscribe(() => void snap()); + + let stop = false; + const driverLoop = async () => { + while (!stop && !store.session.skillsComplete) { + await snap(); // capture this screen as presented, before acting + const state = driver.readState(); + const before = state.currentScreen; + let acted = false; + try { + const decision = decideE2eAction(state, profile); + if (decision.action) { + driver.performAction( + decision.action.id, + decision.action.params ?? {}, + ); + acted = true; + } + if (decision.done) stop = true; + } catch (e) { + mark(`action error on ${before}: ${(e as Error).message}`); + } + if (acted && store.currentScreen !== before) continue; + if (!stop) await driver.waitForChange(600_000); + } + }; + const drive = driverLoop(); + + await store.runReadyHooks(); + await store.getGate('intro'); + await store.getGate('health-check'); + + await runAgent(posthogIntegrationConfig, store.session); + const deadline = Date.now() + 120_000; + while (!store.session.skillsComplete && Date.now() < deadline) + await driver.waitForChange(5_000); + // skillsComplete was reached or the 120 s deadline passed. The driver loop may + // still be parked in waitForChange, so don't block on it; the process exit ends it. 
+ stop = true; + void drive; + unsub(); + await snap(); // the final screen + await chain; // flush any pending snapshots + + // Structured result the --e2e assertion path reads: run phase, posthog deps, + // env file, and the screens walked. + if (process.env.E2E_RESULT_JSON) { + const appDir = process.env.APP_DIR!; + let deps: string[] = []; + try { + const pkg = JSON.parse( + fs.readFileSync(`${appDir}/package.json`, 'utf8'), + ); + deps = Object.keys({ ...pkg.dependencies, ...pkg.devDependencies }); + } catch { + /* some frameworks have no package.json */ + } + const posthogDeps = deps.filter((d) => d.includes('posthog')); + let envFile: string | null = null; + try { + const hit = fs + .readdirSync(appDir) + .find( + (f) => + f.startsWith('.env') && + /posthog/i.test(fs.readFileSync(`${appDir}/${f}`, 'utf8')), + ); + envFile = hit ? `${appDir}/${hit}` : null; + } catch { + /* none */ + } + fs.writeFileSync( + process.env.E2E_RESULT_JSON, + JSON.stringify( + { + runPhase: store.session.runPhase, + hasPosthogDep: posthogDeps.length > 0, + newDeps: posthogDeps, + envFile, + screenPath, + skillsComplete: store.session.skillsComplete, + }, + null, + 2, + ), + ); + } + process.exit(0); + } +} + +main().catch((e) => { + mark('FATAL ' + (e?.stack ?? e)); + process.exit(1); +}); diff --git a/scripts/tui-replay.no-jest.ts b/scripts/tui-replay.no-jest.ts new file mode 100644 index 00000000..55b2fa4b --- /dev/null +++ b/scripts/tui-replay.no-jest.ts @@ -0,0 +1,59 @@ +/** + * Replay captured real-TUI snapshots in the terminal — step through or auto-play + * the `NN-.txt` frames a snapshot run dropped in SNAP_OUT. 
+ * + * npx tsx scripts/tui-replay.no-jest.ts [--step | --delay ] + * pnpm wizard-ci-replay /tmp/snaps # Enter ▸ advance (default) + * pnpm wizard-ci-replay /tmp/snaps --delay 1200 # auto-play + */ +import fs from 'fs'; +import path from 'path'; +import { createInterface } from 'readline'; + +const dir = process.argv[2] || process.env.SNAP_OUT; +const args = process.argv.slice(3); +const delayIdx = args.indexOf('--delay'); +const delay = delayIdx >= 0 ? Number(args[delayIdx + 1]) : null; + +const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms)); +const clear = () => process.stdout.write('\x1b[2J\x1b[3J\x1b[H'); + +async function pressEnter(prompt: string): Promise<void> { + const rl = createInterface({ input: process.stdin, output: process.stdout }); + await new Promise<void>((res) => rl.question(prompt, () => res())); + rl.close(); +} + +async function main() { + if (!dir || !fs.existsSync(dir)) { + console.error( + `✖ snapshot dir not found: ${ + dir ?? '(none)' + }\n usage: tui-replay [--step | --delay ]`, + ); + process.exit(2); + } + const frames = fs + .readdirSync(dir) + .filter((f) => f.endsWith('.txt') && f !== 'latest.txt') + .sort(); + if (frames.length === 0) { + console.error(`✖ no NN-.txt snapshots in ${dir}`); + process.exit(1); + } + // Step (Enter to advance) is the default; fall back to a timed play when not a + // TTY (e.g. CI) so it never hangs, or when --delay is given. + const autoMs = delay ?? (process.stdin.isTTY ? 
null : 600); + + for (let i = 0; i < frames.length; i++) { + clear(); + process.stdout.write(fs.readFileSync(path.join(dir, frames[i]), 'utf8')); + process.stdout.write(`\n [${i + 1}/${frames.length}] ${frames[i]}\n`); + if (i === frames.length - 1) break; + if (autoMs != null) await sleep(autoMs); + else await pressEnter(' ⏎ next ▸ '); + } + process.stdout.write('\n ✓ end of snapshots\n'); + process.exit(0); +} +main(); diff --git a/scripts/tui-snapshots.no-jest.ts b/scripts/tui-snapshots.no-jest.ts new file mode 100644 index 00000000..4c0ee881 --- /dev/null +++ b/scripts/tui-snapshots.no-jest.ts @@ -0,0 +1,60 @@ +/** + * Fixed-route snapshots of the REAL TUI (Node, single-stack). + * + * Spawns the real-TUI host (MODE=fixed) in a PTY, lets it self-drive the fixed + * e2e profile through the real agent run, and writes the real rendered screen to + * SNAP_OUT/NN-.txt at each key moment the host signals. + * + * SNAP_OUT=/tmp/snaps APP_DIR=/tmp/app POSTHOG_KEY_FILE=… PROJECT_ID=… \ + * npx tsx scripts/tui-snapshots.no-jest.ts + */ +import fs from 'fs'; +import path from 'path'; +import { captureTui } from '@e2e-harness/tui-capture'; + +const OUT = process.env.SNAP_OUT!; +const CTRL = path.join(OUT, 'ctrl'); +fs.mkdirSync(OUT, { recursive: true }); +fs.writeFileSync(CTRL, ''); + +const env: NodeJS.ProcessEnv = { + ...process.env, + MODE: 'fixed', + SNAP_CTRL: CTRL, +}; +for (const k of Object.keys(env)) + if (/^(CLAUDE|ANTHROPIC)/.test(k)) delete env[k]; // gateway auth via phx, not host creds + +const cap = captureTui({ + cmd: path.join(process.cwd(), 'node_modules/.bin/tsx'), + args: ['scripts/tui-host.no-jest.ts'], + cwd: process.cwd(), + env, +}); + +const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms)); +let pos = 0; +let seq = 0; +async function drainCtrl() { + const data = fs.readFileSync(CTRL, 'utf8').slice(pos); + pos += data.length; + for (const raw of data.split('\n')) { + const label = raw.trim(); + if (!label) continue; + await sleep(200); 
// let xterm apply the final writes for this screen + seq += 1; + const fn = path.join(OUT, `${String(seq).padStart(2, '0')}-${label}.txt`); + fs.writeFileSync(fn, cap.frame()); + // eslint-disable-next-line no-console + console.log('snap ->', path.basename(fn)); + } +} + +const timer = setInterval(() => void drainCtrl(), 150); +void cap.exited.then(async () => { + await drainCtrl(); + clearInterval(timer); + // eslint-disable-next-line no-console + console.log(`done; ${seq} snapshots in ${OUT}`); + process.exit(0); +}); diff --git a/scripts/wizard-ci-explore.no-jest.ts b/scripts/wizard-ci-explore.no-jest.ts new file mode 100644 index 00000000..79313388 --- /dev/null +++ b/scripts/wizard-ci-explore.no-jest.ts @@ -0,0 +1,105 @@ +/** + * Quick eyeball test of the agent (MCP) route — without a full Claude session. + * + * Spawns the wizard-ci MCP server (which boots the real TUI host), drives a few + * steps over stdio JSON-RPC, and prints the REAL rendered screen that + * render_screen returns. It does not call run_agent, so no integration is + * started. 
+ * + * APP_DIR=/tmp/app POSTHOG_KEY_FILE=/path/phx.txt PROJECT_ID=228144 \ + * npx tsx scripts/wizard-ci-explore.no-jest.ts + */ +import { spawn } from 'child_process'; +import path from 'path'; + +const srv = spawn( + path.join(process.cwd(), 'node_modules/.bin/tsx'), + ['scripts/wizard-ci-mcp.no-jest.ts'], + { cwd: process.cwd(), stdio: ['pipe', 'pipe', 'inherit'] }, +); + +let buf = ''; +const pending = new Map< + number, + (m: { result: { content: Array<{ text: string }> } }) => void +>(); +srv.stdout.on('data', (d) => { + buf += d; + let i; + while ((i = buf.indexOf('\n')) >= 0) { + const line = buf.slice(0, i); + buf = buf.slice(i + 1); + if (!line.trim()) continue; + let m; + try { + m = JSON.parse(line); + } catch { + continue; + } + if (m.id && pending.has(m.id)) { + pending.get(m.id)!(m); + pending.delete(m.id); + } + } +}); + +let idc = 0; +const send = ( + method: string, + params: unknown, +): Promise<{ result: { content: Array<{ text: string }> } }> => + new Promise((r) => { + const id = ++idc; + pending.set(id, r); + srv.stdin.write( + JSON.stringify({ jsonrpc: '2.0', id, method, params }) + '\n', + ); + }); +const notify = (method: string, params?: unknown) => + srv.stdin.write(JSON.stringify({ jsonrpc: '2.0', method, params }) + '\n'); +const call = (name: string, args: Record<string, unknown> = {}) => + send('tools/call', { name, arguments: args }); +const out = (r: { result: { content: Array<{ text: string }> } }) => + r.result.content[0].text; +const screen = (r: { result: { content: Array<{ text: string }> } }) => { + try { + return JSON.parse(out(r)).currentScreen as string; + } catch { + return out(r); + } +}; + +async function main() { + await send('initialize', { + protocolVersion: '2024-11-05', + capabilities: {}, + clientInfo: { name: 'explore', version: '1' }, + }); + notify('notifications/initialized'); + + const open = await call('open_app', { + appDir: process.env.APP_DIR, + keyFile: process.env.POSTHOG_KEY_FILE, + projectId: process.env.PROJECT_ID, 
+ region: process.env.POSTHOG_REGION ?? 'us', + }); + process.stdout.write(`open_app → ${screen(open)}\n`); + process.stdout.write( + `confirm_setup → ${screen( + await call('perform_action', { action: 'confirm_setup' }), + )}\n`, + ); + process.stdout.write( + `read_state → ${screen(await call('read_state'))}\n`, + ); + + process.stdout.write('\n=== render_screen (the REAL TUI) ===\n'); + process.stdout.write(out(await call('render_screen'))); + + srv.kill(); + process.exit(0); +} +main().catch((e) => { + process.stderr.write(`explore error: ${e?.stack ?? e}\n`); + srv.kill(); + process.exit(1); +}); diff --git a/scripts/wizard-ci-mcp.no-jest.ts b/scripts/wizard-ci-mcp.no-jest.ts new file mode 100644 index 00000000..79f7beec --- /dev/null +++ b/scripts/wizard-ci-mcp.no-jest.ts @@ -0,0 +1,233 @@ +/** + * wizard-ci-mcp — MCP server that lets an agent drive the REAL wizard TUI. + * + * A thin proxy: it spawns the shared real-TUI host (scripts/tui-host.no-jest.ts, + * MODE=serve) in a PTY via the Node capturer, forwards read_state/perform_action/ + * run_agent to it over a unix socket, and returns the REAL rendered screen for + * render_screen. No store or rendering lives here — same host the CI snapshot + * route uses. stdout is the JSON-RPC channel; nothing else writes to it. + * + * Registered in this repo's `.mcp.json`, so the tools are bound in every session. + */ +import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'; +import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js'; +import { z } from 'zod'; +import fs from 'fs'; +import os from 'os'; +import path from 'path'; +import net from 'net'; +import { captureTui, type TuiCapture } from '@e2e-harness/tui-capture'; + +const text = (data: unknown) => ({ + content: [ + { + type: 'text' as const, + text: typeof data === 'string' ? 
data : JSON.stringify(data, null, 2), + }, + ], +}); +const errorOut = (e: unknown) => ({ + content: [ + { + type: 'text' as const, + text: `Error: ${e instanceof Error ? e.message : String(e)}`, + }, + ], + isError: true, +}); + +const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms)); + +let cap: TuiCapture | null = null; +let sockPath = ''; + +/** One request/response over the host's control socket (newline-delimited JSON). */ +function rpc( + req: object, +): Promise<{ + ok: boolean; + state?: unknown; + error?: string; + runStatus?: string; +}> { + return new Promise((resolve, reject) => { + if (!sockPath) + return reject(new Error('No app open. Call open_app first.')); + const sock = net.connect(sockPath); + let buf = ''; + const timer = setTimeout(() => { + sock.destroy(); + reject(new Error('control socket timeout')); + }, 600_000); + sock.on('connect', () => sock.write(JSON.stringify(req) + '\n')); + sock.on('data', (d) => { + buf += d; + const i = buf.indexOf('\n'); + if (i >= 0) { + clearTimeout(timer); + sock.end(); + resolve(JSON.parse(buf.slice(0, i))); + } + }); + sock.on('error', (e) => { + clearTimeout(timer); + reject(e); + }); + }); +} + +async function waitFor(cond: () => boolean, ms: number): Promise<boolean> { + const end = Date.now() + ms; + while (Date.now() < end) { + if (cond()) return true; + await sleep(150); + } + return false; +} + +async function main() { + const server = new McpServer({ name: 'wizard-ci', version: '1.0.0' }); + + server.tool( + 'open_app', + 'Boot the real wizard TUI on an app and make it active. Call once before the other tools. appDir is a throwaway copy of the app to integrate. 
Returns the first screen.', + { + appDir: z + .string() + .describe('Absolute path to the app (a throwaway /tmp copy)'), + keyFile: z + .string() + .optional() + .describe( + 'Absolute path to a file holding the PostHog phx key (preferred)', + ), + apiKey: z + .string() + .optional() + .describe('The phx key inline (prefer keyFile to keep it out of logs)'), + projectId: z.string().describe('PostHog project id the key is scoped to'), + region: z + .enum(['us', 'eu']) + .optional() + .describe('PostHog region (default us)'), + }, + async ({ appDir, keyFile, apiKey, projectId, region }) => { + try { + if (cap) cap.kill(); + const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'wizard-ci-')); + sockPath = path.join(dir, 'host.sock'); + const key = + keyFile ?? + (() => { + const p = path.join(dir, 'key'); + fs.writeFileSync(p, (apiKey ?? '').trim(), { mode: 0o600 }); + return p; + })(); + const env: NodeJS.ProcessEnv = { ...process.env }; + for (const k of Object.keys(env)) + if (/^(CLAUDE|ANTHROPIC)/.test(k)) delete env[k]; + Object.assign(env, { + MODE: 'serve', + CONTROL_SOCK: sockPath, + SNAP_CTRL: path.join(dir, 'ctrl'), + APP_DIR: appDir, + POSTHOG_KEY_FILE: key, + PROJECT_ID: projectId, + POSTHOG_REGION: region ?? 'us', + }); + cap = captureTui({ + cmd: path.join(process.cwd(), 'node_modules/.bin/tsx'), + args: ['scripts/tui-host.no-jest.ts'], + cwd: process.cwd(), + env, + }); + if (!(await waitFor(() => fs.existsSync(sockPath), 30_000))) + return errorOut(new Error('the TUI host did not start')); + await waitFor(() => cap!.frame().includes('PostHog'), 30_000); + const r = await rpc({ type: 'read_state' }); + return text(r.state ?? r); + } catch (e) { + return errorOut(e); + } + }, + ); + + server.tool( + 'read_state', + "Read the wizard's committed state: current screen, run phase, a secret-free session view, tasks, pending question, and the actions legal now. 
Call after every perform_action and to poll run_agent (integration: running → done).", + {}, + async () => { + try { + const r = await rpc({ type: 'read_state' }); + return text(r.state ?? r); + } catch (e) { + return errorOut(e); + } + }, + ); + + server.tool( + 'perform_action', + 'Commit a decision on the current screen (confirm_setup, dismiss_outage, choose, set_mcp_outcome, dismiss_slack, keep_skills). The action must appear in read_state.actions. Returns the next state.', + { + action: z.string().describe('Action id from read_state.actions'), + params: z + .record(z.string(), z.unknown()) + .optional() + .describe('Action params, e.g. { key: "router", value: "app-router" }'), + }, + async ({ action, params }) => { + try { + const r = await rpc({ + type: 'perform_action', + action, + params: params ?? {}, + }); + return text(r.state ?? r); + } catch (e) { + return errorOut(e); + } + }, + ); + + server.tool( + 'render_screen', + 'Return the REAL rendered TUI screen (ANSI-stripped text) — exactly what the user would see.', + {}, + async () => { + try { + if (!cap) throw new Error('No app open. Call open_app first.'); + await sleep(150); // let the emulator apply the latest frame + return text(cap.frame()); + } catch (e) { + return errorOut(e); + } + }, + ); + + server.tool( + 'run_agent', + 'Kick off the real integration in the background and return immediately. It advances the auth and run screens (they never advance on their own). Then poll read_state — integration goes running → done and currentScreen advances to outro. Creates real PostHog resources (a dashboard + insights). 
Call once setup is confirmed.', + {}, + async () => { + try { + const r = await rpc({ type: 'run_agent' }); + return text({ + status: + 'integration started in the background — poll read_state (integration: running → done; screen advances to outro)', + ...r, + }); + } catch (e) { + return errorOut(e); + } + }, + ); + + await server.connect(new StdioServerTransport()); + process.stderr.write('wizard-ci-mcp: proxy ready on stdio\n'); +} + +main().catch((e) => { + process.stderr.write(`wizard-ci-mcp fatal: ${e?.stack ?? e}\n`); + process.exit(1); +}); diff --git a/src/lib/programs/posthog-integration/index.ts b/src/lib/programs/posthog-integration/index.ts index bbff84be..8da51862 100644 --- a/src/lib/programs/posthog-integration/index.ts +++ b/src/lib/programs/posthog-integration/index.ts @@ -49,6 +49,7 @@ export const posthogIntegrationConfig: ProgramConfig = { id: 'posthog-integration', steps: POSTHOG_INTEGRATION_PROGRAM, getContentBlocks, + // Basic integration runs without structured user input; drop wizard_ask // so the model can't pop modal prompts mid-run. The runner forwards this // list to the general-purpose subagent as well, so dispatched subagents diff --git a/src/lib/programs/posthog-integration/test/README.md b/src/lib/programs/posthog-integration/test/README.md new file mode 100644 index 00000000..34a008a6 --- /dev/null +++ b/src/lib/programs/posthog-integration/test/README.md @@ -0,0 +1,17 @@ +# PostHog Integration — e2e test definition + +[`e2e.json`](e2e.json) is this program's **test definition**: the options a +headless e2e run auto-takes at each decision point of the flow, plus a +documented `path` of every screen and what it does. + +- **`profile`** — the machine-read part. The harness loads it via + `profileFor(Program.PostHogIntegration)` + ([`e2e-harness/profiles.ts`](../../../../../e2e-harness/profiles.ts)) and asks + `decideE2eAction` what to commit on each screen. 
+- **`path`** — the human-read part: each screen in order and the auto-decision, + so you can see the whole walk at a glance. + +It's **data, not code** — imported only by the harness, never by prod, so it +doesn't ship in the bundle. To change the test path, edit `e2e.json`. To add a +new program's test path, drop an `e2e.json` in its own `test/` folder and map it +in `profiles.ts`. diff --git a/src/lib/programs/posthog-integration/test/e2e.json b/src/lib/programs/posthog-integration/test/e2e.json new file mode 100644 index 00000000..946435c9 --- /dev/null +++ b/src/lib/programs/posthog-integration/test/e2e.json @@ -0,0 +1,35 @@ +{ + "program": "posthog-integration", + "summary": "Happy path: confirm intro, push past health issues, take the first setup option, skip MCP + Slack, delete installed skills.", + "profile": { + "setup": "first", + "healthCheck": "dismiss", + "mcp": "skip", + "slack": "skip", + "skills": "delete", + "ask": "first" + }, + "path": [ + { "screen": "intro", "auto": "confirm & continue" }, + { + "screen": "health-check", + "auto": "dismiss outage — proceed even if the readiness probe flags an issue" + }, + { + "screen": "setup", + "auto": "pick the first option — only appears when the framework needs disambiguation (e.g. 
Next.js router); Node/Express has none" + }, + { "screen": "auth", "auto": "(external) — the runner injects credentials" }, + { + "screen": "run", + "auto": "(external) — the real agent integrates the SDK + instruments events" + }, + { "screen": "outro", "auto": "dismiss" }, + { "screen": "mcp", "auto": "skip — don't install the MCP server" }, + { "screen": "slack-connect", "auto": "skip" }, + { + "screen": "keep-skills", + "auto": "delete — leave nothing behind (terminal: the run's done-signal)" + } + ] +} diff --git a/tsconfig.build.json b/tsconfig.build.json index ee22a19d..8618277c 100644 --- a/tsconfig.build.json +++ b/tsconfig.build.json @@ -20,6 +20,7 @@ "paths": { "@env": ["./src/env.ts"], "@lib/*": ["./src/lib/*"], + "@e2e-harness/*": ["./e2e-harness/*"], "@utils/*": ["./src/utils/*"], "@ui": ["./src/ui/index.ts"], "@ui/*": ["./src/ui/*"], diff --git a/tsconfig.json b/tsconfig.json index 882ce510..a54d5a08 100644 --- a/tsconfig.json +++ b/tsconfig.json @@ -19,11 +19,10 @@ "src/**/*", "test/**/*", "e2e-tests/**/*", + "e2e-harness/**/*", "types/**/*" ], - "exclude": [ - "e2e-tests/test-applications/**/*" - ], + "exclude": ["e2e-tests/test-applications/**/*"], "ts-node": { "files": true }