Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
244 changes: 131 additions & 113 deletions .claude/skills/ai-test-runner/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,13 @@
---
name: ai-test-runner
description: >-
Run a suite of AI-driven test cases against the WordPress and Jetpack iOS
app in a simulator. Use when asked to run a test suite, run AI tests, or
execute test cases in a directory.
Run a suite of plain-language markdown test cases (Prerequisites, Steps,
Expected Outcome) against the WordPress or Jetpack iOS app via
WebDriverAgent on an iOS Simulator. Each test case is a markdown file;
the runner drives the app UI autonomously and reports pass/fail per test.
Use when the user asks to run agent tests, AI tests, a UI test suite, a
smoke run against the app, or to execute test case markdown files in a
directory like `Tests/AgentTests/`.
---

# AI Test Runner
Expand All @@ -13,20 +17,29 @@ iOS Simulator. Each test case is a markdown file with Prerequisites, Steps,
and Expected Outcome. Claude Code navigates the app UI autonomously using
WebDriverAgent.

## Phase 1: Collect Credentials
## Phase 1: Gather Inputs

Before running any tests, ask the user for the following using AskUserQuestion.
Before running any tests, ask the user for:

- **App**: Which app to test — WordPress or Jetpack
- **Site URL**: The WordPress site URL (e.g., `https://example.com`)
- **Username**: The WordPress username or email
- **Application Password**: A WordPress application password for REST API access
- **Test directory**: Path to the directory containing test case markdown files
- **App**: WordPress or Jetpack
- **Test directory**: directory containing test case markdown files
- **Site URL**: the WordPress site to test against (used for REST API
calls and for picking the right site in the app)
- **Sign-in credentials**: see `docs/simulator-sign-in.md` for the two
supported flows. Infer the flow from what the user provides — a username
with application password is the self-hosted flow; a WordPress.com
bearer token is the WordPress.com flow.

Here are the app bundle IDs:
App bundle IDs:
- **WordPress**: `org.wordpress`
- **Jetpack**: `com.automattic.jetpack`

Also resolve the absolute path to the `ios-sim-navigation` skill's
`scripts/` directory and store it as `<WDA_SCRIPTS_DIR>` for use in
Phases 3, 6, and 7 — typically
`<project-root>/.claude/skills/ios-sim-navigation/scripts` on this
project.

## Phase 2: Discover Tests

1. Use Glob to find all `*.md` files in the directory the user specified.
Expand All @@ -43,36 +56,69 @@ If no `.md` files are found, tell the user and stop.

## Phase 3: Start WDA

1. Run the WDA start script, which locates at `scripts/wda-start.rb` in the
`ios-sim-navigation` skill directory. This may take up to 60 seconds the first time.
1. Run `<WDA_SCRIPTS_DIR>/wda-start.rb` from the project root that
should own `.build/WebDriverAgent`. First run takes a couple of
minutes; warm runs ~60 s.

2. Create a WDA session:
2. Confirm WDA is responding:
```bash
curl -s -X POST http://localhost:8100/session \
-H 'Content-Type: application/json' \
-d '{"capabilities":{"alwaysMatch":{}}}'
curl -sf http://localhost:8100/status >/dev/null
```
Extract the session ID from `value.sessionId` in the response.

3. Get the booted simulator UDID for screenshots:
3. Get the booted simulator UDID for screenshots and per-test relaunches:
```bash
xcrun simctl list devices booted -j | jq -r '.devices | to_entries[].value[] | select(.state == "Booted") | .udid'
```

If WDA fails to start or no simulator is booted, tell the user and stop.
If WDA fails to start, doesn't respond, or no simulator is booted, run
`<WDA_SCRIPTS_DIR>/wda-stop.rb` to clean up any stray `xcodebuild`
process, tell the user, and stop.

## Phase 4: Initialize Results Directory

```
<base>/results/<timestamp>-<suite>/
├── <test-filename>.md # per-test files (Phase 6)
└── screenshots/
└── <test-filename>-failure.png # failures only (Phase 6)
```

1. Compute the timestamp as `YYYY-MM-DD-HHmm` from the current date and time.
2. Determine the suite name from the test directory's last path component (e.g., `ui-tests`).
3. Derive the base directory as the **parent** of the test directory (e.g., if the test
directory is `ai-tests/ui-tests`, the base directory is `ai-tests/`).
4. Create the per-test results directory: `mkdir -p <base>/results/<timestamp>-<suite>`
5. Create the screenshots directory: `mkdir -p <base>/results/screenshots`
6. Store the results directory path, screenshots directory path, timestamp, and suite name
in context for use in later phases.

## Phase 5: Run Tests
4. Create the run directory and its screenshots subdirectory in one
call: `mkdir -p <base>/results/<timestamp>-<suite>/screenshots`.
5. Store these paths in context for use in later phases:
- `<RESULTS_DIR>` = `<base>/results/<timestamp>-<suite>`
- `<SCREENSHOTS_DIR>` = `<RESULTS_DIR>/screenshots`

## Phase 5: Sign In

Sign in once, following `docs/simulator-sign-in.md`. Test relaunches in
Phase 6 preserve the signed-in state, so each test skips sign-in. This is
the only phase that uses `-ui-test-reset-everything`. If any step below
fails, run `<WDA_SCRIPTS_DIR>/wda-stop.rb` to release the simulator,
tell the user, and stop.

1. Launch the app using `<SITE_URL>` and `<SIGN_IN_CREDENTIALS>`, with
`-ui-test-reset-everything` in the launch arguments. Poll the WDA
accessibility tree until the app's initial screen appears.

2. Pick the matching sign-in path on the welcome screen per
`docs/simulator-sign-in.md` — self-hosted or WordPress.com, inferred
from `<SIGN_IN_CREDENTIALS>`. The launch arguments hold the
credentials, so do not type a username, password, or bearer token
into the UI. If the launch already dropped you on a signed-in
screen, skip this step. Poll the accessibility tree until a
signed-in screen appears.

3. Verify the active site matches `<SITE_URL>`. For the self-hosted flow
this is automatic (the launch arguments target that site directly).
For the WordPress.com flow, use the site switcher if a different site
is currently selected.

## Phase 6: Run Tests

Run each test case **sequentially**. Tests share one simulator so they must not
run in parallel.
Expand All @@ -83,77 +129,75 @@ Track pass/fail/remaining counts in-context (incrementing counters).

#### Step 1: Dispatch subagent

Call the Agent tool with `subagent_type: general-purpose` and a prompt
constructed from the template below.
Derive `<test-filename>` as the test file's basename without the `.md`
extension (e.g. `create-blank-page.md` → `create-blank-page`). Store
it for use here and in Step 2.

Call the Agent tool with `subagent_type: general-purpose`, `model: "sonnet"`,
and a prompt constructed from the template below.

Build the prompt by filling in the `<PLACEHOLDERS>` with actual values:

````
You are running a single test case against the <APP_NAME> iOS app in a simulator
using WebDriverAgent (WDA).

Use the ios-sim-navigation skill for WDA interaction reference.
You are running a single test case against an iOS app in a simulator
using WebDriverAgent (WDA). The app under test is `<APP_BUNDLE_ID>`.

## Context

- App Bundle ID: <APP_BUNDLE_ID>
- WDA Session ID: <SESSION_ID>
- Simulator UDID: <UDID>
- WDA: already running on http://localhost:8100 (do not start or stop it)
- Test file: <TEST_FILE_PATH> (absolute path)
- Per-test results directory: <PER_TEST_RESULTS_DIR> (absolute path)
- Results directory: <RESULTS_DIR> (absolute path; write your per-test
result file here as `<test-filename>.md`)
- Screenshots directory: <SCREENSHOTS_DIR> (absolute; sibling
`screenshots/` of the per-test result file)
- Site URL: <SITE_URL>
- Username: <USERNAME>
- Application Password: <APPLICATION_PASSWORD>
- Screenshots directory: <SCREENSHOTS_DIR> (absolute path)
- Sign-in credentials: <SIGN_IN_CREDENTIALS> (self-hosted: username +
application password; WordPress.com: bearer token; see
`docs/simulator-sign-in.md`)
- WDA scripts directory: <WDA_SCRIPTS_DIR> (absolute path; contains
`tap.rb`, `wda-start.rb`, `wda-stop.rb`)

## Instructions

1. **Read the test file** at `<TEST_FILE_PATH>`. It contains the information
needed to execute the test: prerequisites, steps, expected outcome, etc.
0. **Load WDA guidance.** Invoke the `ios-sim-navigation` skill via the
Skill tool before any WDA work. Rewrite any `scripts/tap.rb`
reference in that skill as `<WDA_SCRIPTS_DIR>/tap.rb`.

Derive the test filename (without extension) from the file path for use
in result files and screenshots.
1. **Read the test file** at `<TEST_FILE_PATH>`. It contains the
information needed to execute the test: prerequisites, steps,
expected outcome, etc.

2. **Relaunch the app** for a clean state:
2. **Relaunch the app** for a clean per-test UI state. The app is
already signed in — do not pass `-ui-test-reset-everything` and do
not re-drive the sign-in flow.

```bash
xcrun simctl launch --terminate-running-process <UDID> <APP_BUNDLE_ID> \
-ui-test-site-url <SITE_URL> \
-ui-test-site-user <USERNAME> \
-ui-test-site-pass <APPLICATION_PASSWORD>
xcrun simctl launch --terminate-running-process <UDID> <APP_BUNDLE_ID>
```

Wait 2-3 seconds for the app to finish loading.

The app may already be logged in to the site. Check the accessibility tree
to determine if login is required. If the app is already showing the
logged-in state (e.g., My Site screen), skip login.

If the app shows a login/signup screen, log in using these steps:

1. Tap the **"Enter your existing site address"** button.
2. Type the exact site URL value into the site address text field.
3. Tap **Continue**. The app will auto-login after this.

Wait 2-3 seconds for the app to finish loading after login.
Poll the accessibility tree until a signed-in screen (e.g. My Site)
appears. If it doesn't appear within 15 s, mark the test as FAIL
with reason "Not signed in after relaunch".

3. **Fulfill prerequisites** from the test file.

For REST API prerequisites (e.g., creating tags, categories, or posts),
make the API calls using the site URL, username, and application password.
make the API calls using the credentials in `<SIGN_IN_CREDENTIALS>`.
For UI prerequisites like "Logged in to the app with the test account",
the app relaunch in step 2 handles this automatically.

If a prerequisite cannot be fulfilled, mark the test as FAIL with reason
"Prerequisite not met: <details>" and skip to the result writing step.
If a prerequisite cannot be fulfilled, mark the test as FAIL with
reason "Prerequisite not met: <details>".

4. **Execute the test case** following the steps, expected outcome, and any
verification/cleanup sections in the test file. Use WDA for all UI
interactions (refer to the ios-sim-navigation skill). Perform any REST API
cleanup regardless of pass/fail.

5. **Write per-test result file** at
`<PER_TEST_RESULTS_DIR>/<test-filename>.md`:
`<RESULTS_DIR>/<test-filename>.md`:

On pass — write:
```
Expand All @@ -171,25 +215,22 @@ Use the ios-sim-navigation skill for WDA interaction reference.
**Screenshot:** screenshots/<test-filename>-failure.png
```

6. **End your response** with exactly one of these lines as the very last line:
```
RESULT: PASS
```
or:
```
RESULT: FAIL: <reason>
```

IMPORTANT: Prefer the accessibility tree over screenshots. After every tap or
swipe, wait 0.5-1 seconds then re-fetch the tree to see the updated UI state.
The per-test result file is the source of truth for pass/fail. The parent
orchestrator reads it after this subagent returns, so the heading line
(`### PASS <title>` or `### FAIL <title>`) must be written correctly.
````

#### Step 2: Parse subagent response
#### Step 2: Read the per-test result file

After the subagent returns, read `<RESULTS_DIR>/<test-filename>.md`:

After the subagent returns, parse its response:
- A line starting with `### PASS ` means pass.
- A line starting with `### FAIL ` means fail; the failure reason is on
the `**Failure reason:**` line beneath it.
- If the file is missing, count the test as fail with reason "Subagent
did not produce a result file".

- Extract the last line for `RESULT: PASS` or `RESULT: FAIL: <reason>`.
- Update the in-context counters accordingly.
Update the in-context counters accordingly.

#### Step 3: Print status update

Expand All @@ -201,48 +242,25 @@ or:
[2/5] FAIL: create-blank-page — <reason>
```

## Phase 6: Cleanup and Assemble Results

1. Stop WDA:
```bash
ruby ~/.claude/skills/ios-sim-navigation/scripts/wda-stop.rb
```

2. **Assemble the final results file** at `<base>/results/<timestamp>-<suite>.md`:
- Read all per-test result files from `<base>/results/<timestamp>-<suite>/`
- Sort them alphabetically by filename
- Write the assembled file with this structure:
```
# Test Results: <suite>

- **Date:** <YYYY-MM-DD HH:mm>
- **Site:** <site_url>
- **Total:** N | **Passed:** P | **Failed:** F

## Results
## Phase 7: Cleanup and Summary

<contents of per-test result files, concatenated with blank lines between>
```
1. Stop WDA by running `<WDA_SCRIPTS_DIR>/wda-stop.rb`.

3. Print the final summary to the terminal:
2. Print the final summary to the terminal:
```
Test run complete.
Total: N | Passed: P | Failed: F
Results: <base>/results/<timestamp>-<suite>.md
Per-test results: <RESULTS_DIR>
```
If any tests failed, list the failing filenames under the summary so
the user can jump straight to the relevant per-test files and
screenshots.

## Important Notes

- The app MUST already be built and installed on a booted simulator. The app
is relaunched and logged in if needed at the start of each test.
- Assumes the app is already built and installed on a booted simulator.
- Continue to the next test even on failure. The suite reports the full
pass/fail tally at the end, so a single failure should not stop the run.
- Each test case runs in its own subagent to keep the main context lean.
The subagent relaunches the app for a clean state before each test.
- Prefer the accessibility tree over screenshots for all simulator interactions.
- NEVER stop on a test failure. Always continue to the next test.
- After every tap or swipe, wait 0.5-1 seconds then re-fetch the accessibility
tree to see the updated UI state.
- For scrolling, swipe from `(screen_width - 30, screen_height / 2)` upward
to avoid accidentally tapping interactive elements in the center.
- Save failure screenshots to the derived screenshots directory (`<base>/results/screenshots/`).
- Each subagent writes its own per-test result file. The final results file is
assembled in Phase 6 after all tests complete.
Per-test result files in `<RESULTS_DIR>` are the durable record of the
run.
Loading