diff --git a/.claude/skills/ai-test-runner/SKILL.md b/.claude/skills/ai-test-runner/SKILL.md index 0828d9640920..62d63ed9bd7d 100644 --- a/.claude/skills/ai-test-runner/SKILL.md +++ b/.claude/skills/ai-test-runner/SKILL.md @@ -1,9 +1,13 @@ --- name: ai-test-runner description: >- - Run a suite of AI-driven test cases against the WordPress and Jetpack iOS - app in a simulator. Use when asked to run a test suite, run AI tests, or - execute test cases in a directory. + Run a suite of plain-language markdown test cases (Prerequisites, Steps, + Expected Outcome) against the WordPress or Jetpack iOS app via + WebDriverAgent on an iOS Simulator. Each test case is a markdown file; + the runner drives the app UI autonomously and reports pass/fail per test. + Use when the user asks to run agent tests, AI tests, a UI test suite, a + smoke run against the app, or to execute test case markdown files in a + directory like `Tests/AgentTests/`. --- # AI Test Runner @@ -13,20 +17,29 @@ iOS Simulator. Each test case is a markdown file with Prerequisites, Steps, and Expected Outcome. Claude Code navigates the app UI autonomously using WebDriverAgent. -## Phase 1: Collect Credentials +## Phase 1: Gather Inputs -Before running any tests, ask the user for the following using AskUserQuestion. +Before running any tests, ask the user for: -- **App**: Which app to test — WordPress or Jetpack -- **Site URL**: The WordPress site URL (e.g., `https://example.com`) -- **Username**: The WordPress username or email -- **Application Password**: A WordPress application password for REST API access -- **Test directory**: Path to the directory containing test case markdown files +- **App**: WordPress or Jetpack +- **Test directory**: directory containing test case markdown files +- **Site URL**: the WordPress site to test against (used for REST API + calls and for picking the right site in the app) +- **Sign-in credentials**: see `docs/simulator-sign-in.md` for the two + supported flows. 
Infer the flow from what the user provides — a username + with application password is the self-hosted flow; a WordPress.com + bearer token is the WordPress.com flow. -Here are the app bundle IDs: +App bundle IDs: - **WordPress**: `org.wordpress` - **Jetpack**: `com.automattic.jetpack` +Also resolve the absolute path to the `ios-sim-navigation` skill's +`scripts/` directory and store it as `` for use in +Phases 3, 6, and 7 — typically +`/.claude/skills/ios-sim-navigation/scripts` on this +project. + ## Phase 2: Discover Tests 1. Use Glob to find all `*.md` files in the directory the user specified. @@ -43,36 +56,69 @@ If no `.md` files are found, tell the user and stop. ## Phase 3: Start WDA -1. Run the WDA start script, which locates at `scripts/wda-start.rb` in the - `ios-sim-navigation` skill directory. This may take up to 60 seconds the first time. +1. Run `/wda-start.rb` from the project root that + should own `.build/WebDriverAgent`. First run takes a couple of + minutes; warm runs ~60 s. -2. Create a WDA session: +2. Confirm WDA is responding: ```bash - curl -s -X POST http://localhost:8100/session \ - -H 'Content-Type: application/json' \ - -d '{"capabilities":{"alwaysMatch":{}}}' + curl -sf http://localhost:8100/status >/dev/null ``` - Extract the session ID from `value.sessionId` in the response. -3. Get the booted simulator UDID for screenshots: +3. Get the booted simulator UDID for screenshots and per-test relaunches: ```bash xcrun simctl list devices booted -j | jq -r '.devices | to_entries[].value[] | select(.state == "Booted") | .udid' ``` -If WDA fails to start or no simulator is booted, tell the user and stop. +If WDA fails to start, doesn't respond, or no simulator is booted, run +`/wda-stop.rb` to clean up any stray `xcodebuild` +process, tell the user, and stop. ## Phase 4: Initialize Results Directory +``` +/results/-/ +├── .md # per-test files (Phase 6) +└── screenshots/ + └── -failure.png # failures only (Phase 6) +``` + 1. 
Compute the timestamp as `YYYY-MM-DD-HHmm` from the current date and time. 2. Determine the suite name from the test directory's last path component (e.g., `ui-tests`). 3. Derive the base directory as the **parent** of the test directory (e.g., if the test directory is `ai-tests/ui-tests`, the base directory is `ai-tests/`). -4. Create the per-test results directory: `mkdir -p /results/-` -5. Create the screenshots directory: `mkdir -p /results/screenshots` -6. Store the results directory path, screenshots directory path, timestamp, and suite name - in context for use in later phases. - -## Phase 5: Run Tests +4. Create the run directory and its screenshots subdirectory in one + call: `mkdir -p /results/-/screenshots`. +5. Store these paths in context for use in later phases: + - `` = `/results/-` + - `` = `/screenshots` + +## Phase 5: Sign In + +Sign in once, following `docs/simulator-sign-in.md`. Test relaunches in +Phase 6 preserve the signed-in state, so each test skips sign-in. This is +the only phase that uses `-ui-test-reset-everything`. If any step below +fails, run `/wda-stop.rb` to release the simulator, +tell the user, and stop. + +1. Launch the app using `` and ``, with + `-ui-test-reset-everything` in the launch arguments. Poll the WDA + accessibility tree until the app's initial screen appears. + +2. Pick the matching sign-in path on the welcome screen per + `docs/simulator-sign-in.md` — self-hosted or WordPress.com, inferred + from ``. The launch arguments hold the + credentials, so do not type a username, password, or bearer token + into the UI. If the launch already dropped you on a signed-in + screen, skip this step. Poll the accessibility tree until a + signed-in screen appears. + +3. Verify the active site matches ``. For the self-hosted flow + this is automatic (the launch arguments target that site directly). + For the WordPress.com flow, use the site switcher if a different site + is currently selected. 
+ +## Phase 6: Run Tests Run each test case **sequentially**. Tests share one simulator so they must not run in parallel. @@ -83,69 +129,67 @@ Track pass/fail/remaining counts in-context (incrementing counters). #### Step 1: Dispatch subagent -Call the Agent tool with `subagent_type: general-purpose` and a prompt -constructed from the template below. +Derive `` as the test file's basename without the `.md` +extension (e.g. `create-blank-page.md` → `create-blank-page`). Store +it for use here and in Step 2. + +Call the Agent tool with `subagent_type: general-purpose`, `model: "sonnet"`, +and a prompt constructed from the template below. Build the prompt by filling in the `` with actual values: ```` -You are running a single test case against the iOS app in a simulator -using WebDriverAgent (WDA). - -Use the ios-sim-navigation skill for WDA interaction reference. +You are running a single test case against an iOS app in a simulator +using WebDriverAgent (WDA). The app under test is ``. ## Context - App Bundle ID: -- WDA Session ID: - Simulator UDID: +- WDA: already running on http://localhost:8100 (do not start or stop it) - Test file: (absolute path) -- Per-test results directory: (absolute path) +- Results directory: (absolute path; write your per-test + result file here as `.md`) +- Screenshots directory: (absolute; sibling + `screenshots/` of the per-test result file) - Site URL: -- Username: -- Application Password: -- Screenshots directory: (absolute path) +- Sign-in credentials: (self-hosted: username + + application password; WordPress.com: bearer token; see + `docs/simulator-sign-in.md`) +- WDA scripts directory: (absolute path; contains + `tap.rb`, `wda-start.rb`, `wda-stop.rb`) ## Instructions -1. **Read the test file** at ``. It contains the information - needed to execute the test: prerequisites, steps, expected outcome, etc. +0. **Load WDA guidance.** Invoke the `ios-sim-navigation` skill via the + Skill tool before any WDA work. 
Rewrite any `scripts/tap.rb` + reference in that skill as `/tap.rb`. - Derive the test filename (without extension) from the file path for use - in result files and screenshots. +1. **Read the test file** at ``. It contains the + information needed to execute the test: prerequisites, steps, + expected outcome, etc. -2. **Relaunch the app** for a clean state: +2. **Relaunch the app** for a clean per-test UI state. The app is + already signed in — do not pass `-ui-test-reset-everything` and do + not re-drive the sign-in flow. ```bash - xcrun simctl launch --terminate-running-process \ - -ui-test-site-url \ - -ui-test-site-user \ - -ui-test-site-pass + xcrun simctl launch --terminate-running-process ``` - Wait 2-3 seconds for the app to finish loading. - - The app may already be logged in to the site. Check the accessibility tree - to determine if login is required. If the app is already showing the - logged-in state (e.g., My Site screen), skip login. - - If the app shows a login/signup screen, log in using these steps: - - 1. Tap the **"Enter your existing site address"** button. - 2. Type the exact site URL value into the site address text field. - 3. Tap **Continue**. The app will auto-login after this. - - Wait 2-3 seconds for the app to finish loading after login. + Poll the accessibility tree until a signed-in screen (e.g. My Site) + appears. If it doesn't appear within 15 s, mark the test as FAIL + with reason "Not signed in after relaunch". 3. **Fulfill prerequisites** from the test file. For REST API prerequisites (e.g., creating tags, categories, or posts), - make the API calls using the site URL, username, and application password. + make the API calls using the credentials in ``. For UI prerequisites like "Logged in to the app with the test account", the app relaunch in step 2 handles this automatically. - If a prerequisite cannot be fulfilled, mark the test as FAIL with reason - "Prerequisite not met:
" and skip to the result writing step. + If a prerequisite cannot be fulfilled, mark the test as FAIL with + reason "Prerequisite not met:
". 4. **Execute the test case** following the steps, expected outcome, and any verification/cleanup sections in the test file. Use WDA for all UI @@ -153,7 +197,7 @@ Use the ios-sim-navigation skill for WDA interaction reference. cleanup regardless of pass/fail. 5. **Write per-test result file** at - `/.md`: + `/.md`: On pass — write: ``` @@ -171,25 +215,22 @@ Use the ios-sim-navigation skill for WDA interaction reference. **Screenshot:** screenshots/-failure.png ``` -6. **End your response** with exactly one of these lines as the very last line: - ``` - RESULT: PASS - ``` - or: - ``` - RESULT: FAIL: - ``` - -IMPORTANT: Prefer the accessibility tree over screenshots. After every tap or -swipe, wait 0.5-1 seconds then re-fetch the tree to see the updated UI state. +The per-test result file is the source of truth for pass/fail. The parent +orchestrator reads it after this subagent returns, so the heading line +(`### PASS ` or `### FAIL <title>`) must be written correctly. ```` -#### Step 2: Parse subagent response +#### Step 2: Read the per-test result file + +After the subagent returns, read `<RESULTS_DIR>/<test-filename>.md`: -After the subagent returns, parse its response: +- A line starting with `### PASS ` means pass. +- A line starting with `### FAIL ` means fail; the failure reason is on + the `**Failure reason:**` line beneath it. +- If the file is missing, count the test as fail with reason "Subagent + did not produce a result file". -- Extract the last line for `RESULT: PASS` or `RESULT: FAIL: <reason>`. -- Update the in-context counters accordingly. +Update the in-context counters accordingly. #### Step 3: Print status update @@ -201,48 +242,25 @@ or: [2/5] FAIL: create-blank-page — <reason> ``` -## Phase 6: Cleanup and Assemble Results - -1. Stop WDA: - ```bash - ruby ~/.claude/skills/ios-sim-navigation/scripts/wda-stop.rb - ``` - -2. 
**Assemble the final results file** at `<base>/results/<timestamp>-<suite>.md`: - - Read all per-test result files from `<base>/results/<timestamp>-<suite>/` - - Sort them alphabetically by filename - - Write the assembled file with this structure: - ``` - # Test Results: <suite> - - - **Date:** <YYYY-MM-DD HH:mm> - - **Site:** <site_url> - - **Total:** N | **Passed:** P | **Failed:** F - - ## Results +## Phase 7: Cleanup and Summary - <contents of per-test result files, concatenated with blank lines between> - ``` +1. Stop WDA by running `<WDA_SCRIPTS_DIR>/wda-stop.rb`. -3. Print the final summary to the terminal: +2. Print the final summary to the terminal: ``` Test run complete. Total: N | Passed: P | Failed: F - Results: <base>/results/<timestamp>-<suite>.md + Per-test results: <RESULTS_DIR> ``` + If any tests failed, list the failing filenames under the summary so + the user can jump straight to the relevant per-test files and + screenshots. ## Important Notes -- The app MUST already be built and installed on a booted simulator. The app - is relaunched and logged in if needed at the start of each test. +- Assumes the app is already built and installed on a booted simulator. +- Continue to the next test even on failure. The suite reports the full + pass/fail tally at the end, so a single failure should not stop the run. - Each test case runs in its own subagent to keep the main context lean. - The subagent relaunches the app for a clean state before each test. -- Prefer the accessibility tree over screenshots for all simulator interactions. -- NEVER stop on a test failure. Always continue to the next test. -- After every tap or swipe, wait 0.5-1 seconds then re-fetch the accessibility - tree to see the updated UI state. -- For scrolling, swipe from `(screen_width - 30, screen_height / 2)` upward - to avoid accidentally tapping interactive elements in the center. -- Save failure screenshots to the derived screenshots directory (`<base>/results/screenshots/`). 
-- Each subagent writes its own per-test result file. The final results file is - assembled in Phase 6 after all tests complete. + Per-test result files in `<RESULTS_DIR>` are the durable record of the + run. diff --git a/.claude/skills/ios-sim-navigation/SKILL.md b/.claude/skills/ios-sim-navigation/SKILL.md index 96cb6311aeb4..6ec6393dc54b 100644 --- a/.claude/skills/ios-sim-navigation/SKILL.md +++ b/.claude/skills/ios-sim-navigation/SKILL.md @@ -1,37 +1,94 @@ --- name: ios-sim-navigation -description: General-purpose skill for navigating and interacting with an iOS app running in a Simulator using WebDriverAgent (WDA). Use when the user asks to tap buttons, swipe, scroll, type text, check what's on screen, go to a tab or screen, automate a flow, or verify UI state in a simulator app. Also use when the user wants to take screenshots, inspect the accessibility tree, explore screen hierarchy, or test a UI flow end-to-end on a simulator. Even if the user says something casual like "open settings in the app", "click that button", or "what's showing on the simulator" — this skill applies. +description: Drive an iOS app running in a Simulator via WebDriverAgent (WDA) — tap, swipe, scroll, type, take screenshots, inspect the accessibility tree, automate or verify a UI flow. Use when the work specifically targets a running Simulator app (e.g. running an end-to-end test, automating an in-app flow, verifying on-screen state via the WDA tree, scripting taps in a simulator). Do not use for non-Simulator UI work, headless code paths, or UI tasks on real devices. --- # iOS Simulator Navigation with WebDriverAgent -## Prerequisites +Drive an iOS app running in a Simulator via WebDriverAgent (WDA). -- Xcode with iOS Simulators installed -- WebDriverAgent built for simulator use (see Setup below) -- The app must be built and installed on the target simulator +## Fast-Path Cadence — read this first + +End-to-end test runs have a hard time budget — usually a few minutes per +test. 
Every tool call costs roughly 5 s of WDA + Claude round-trip +overhead, so **keep each user-visible action to about one tool call**. +The patterns below — use `tap.rb` over raw curl, never `Read` PNG +screenshots, one tree dump per screen — compound across a long test to +keep you inside the budget. The inverse patterns (three curl turns per +tap, `Read`-ing screenshot PNGs, re-dumping the tree after every action) +burn it. -### First-Time Setup +### Rule 1 — One bash call per tap. Always reach for `scripts/tap.rb`. -Clone and build WebDriverAgent: +`tap.rb` does session creation, element lookup, coordinate computation, +the tap, and an in-band readiness probe in a single Ruby invocation: ```bash -mkdir -p .build -git clone https://github.com/appium/WebDriverAgent.git .build/WebDriverAgent -cd .build/WebDriverAgent -xcodebuild build-for-testing \ - -project WebDriverAgent.xcodeproj \ - -scheme WebDriverAgentRunner \ - -destination "platform=iOS Simulator,name=iPhone 17" \ - CODE_SIGNING_ALLOWED=NO +# Tap a control AND wait up to 3 s for the next screen's marker to appear. +# Replaces: find /elements + get /rect + POST /actions + sleep + tree dump. +ruby scripts/tap.rb aid "create-post-button" --wait-aid "post-title-field" ``` +If you find yourself stringing together `/elements`, `/rect`, and +`/actions` curls by hand, stop. You're about to burn three turns on what +one `tap.rb` invocation does. Reach for raw curl only for genuinely +custom gestures (multi-touch, long-press chains) that `tap.rb` doesn't +model. + +### Rule 2 — Never `Read` a screenshot PNG back into context. + +Decisions come from the **accessibility tree (text)**, not images. Pulling +a PNG back through `Read` inflates the conversation by megabytes per turn +*and* burns an extra round-trip. The tree already contains every label, +identifier, and coordinate you'd see in the screenshot. Screenshots are +an output artifact (failure capture for human review), never an input to +your reasoning. 
If you're about to `Read /tmp/*.png`, you've already gone +wrong: re-fetch the tree instead. + +### Rule 3 — One tree dump per screen, not per action. + +Fetch `GET /source?format=description` once when you arrive on a new +screen. From that single dump, locate every control you need for the +screen (FAB, fields, buttons), then drive the screen with `tap.rb`. +`tap.rb` itself probes `/elements` for each individual tap, so you do +**not** need to re-dump the full tree between taps. The wait flag is your +between-tap confirmation, not a re-dump. + +Re-dump the tree only when (a) you've landed on a screen you haven't +seen yet this run, or (b) `--wait-aid` timed out and you genuinely need +to figure out what's on screen. + +### Anti-pattern: the slow loop + +``` +# DON'T do this — 4 turns per action, plus megabytes of PNG. +ruby scripts/tap.rb aid create-post-button +xcrun simctl io <UDID> screenshot /tmp/after.png +Read /tmp/after.png +curl -s 'http://localhost:8100/source?format=description' | jq -r .value +``` + +``` +# DO this — 1 turn, no PNG, in-band verification. +ruby scripts/tap.rb aid create-post-button --wait-aid post-title-field +``` + +A test case that says "Verify the post-publish confirmation screen shows +the correct title" is asking you to confirm that text via the tree (a +single targeted `/elements` query by label, or one tree dump and grep), +not to take a picture of it. See "Verifying step success" below. + +## Prerequisites + +- Xcode with iOS Simulators installed +- The app must be built and installed on the target simulator + ## WDA Lifecycle Start and stop WDA using the lifecycle scripts. **WDA must be running before using any curl commands below.** ```bash -# Start WDA (waits until ready, ~60s first time) +# Start WDA. Cold runs do the build first (minutes); warm runs ~60s. 
ruby scripts/wda-start.rb [--udid <UDID>] [--port <PORT>] # Check if WDA is running @@ -43,330 +100,304 @@ ruby scripts/wda-stop.rb [--port <PORT>] Both scripts auto-detect the first booted simulator. Use `--udid` to target a specific one. -## Strategy: Tree-First Navigation +Run these from the project root that should own the +`.build/WebDriverAgent` cache. `wda-start.rb` resolves the path +relative to its working directory and clones into it on first run. -**Always prefer the accessibility tree over screenshots.** The tree is text-based, faster to process, and doesn't require viewing an image. +## Tap — the default action -1. Fetch the tree with `GET /source?format=description` -2. Make decisions from the tree alone -3. Only take a screenshot when the tree doesn't contain enough info (e.g., verifying visual layout) - -## Accessibility Tree - -WDA offers two tree formats via `GET /source?format=<FORMAT>`: - -### `format=description` -- compact plaintext (~25 KB) +**Use `scripts/tap.rb` for every tap.** It collapses session creation +(with the required `bundleId` binding — see `references/sessions.md`), +element lookup, coordinate computation, the tap dispatch, and an +optional wait into one bash invocation. Three forms: ```bash -curl -s 'http://localhost:8100/source?format=description' | jq -r .value -``` +# Tap by accessibility id (most reliable; developer-assigned, locale-stable). +ruby scripts/tap.rb aid settings-button -Returns a human-readable indented tree. Each line shows an element with its type, memory address, frame as `{{x, y}, {width, height}}`, and optional attributes (identifier, label, Selected, etc.): +# Tap by visible label (matches accessibility id OR label). 
+ruby scripts/tap.rb text "Continue" -``` -NavigationBar, 0x105351660, {{0.0, 62.0}, {402.0, 54.0}}, identifier: 'my-site-navigation-bar' - Button, 0x105351a20, {{16.0, 62.0}, {44.0, 44.0}}, identifier: 'BackButton', label: 'Site Name' - StaticText, 0x105351b40, {{178.7, 73.7}, {44.7, 20.7}}, label: 'Posts' +# Tap at exact coordinates (only when no stable id/label exists, +# e.g. tapping into an empty area to dismiss a sheet). +ruby scripts/tap.rb at 196,504 ``` -**Use this format by default.** It's ~15x smaller than JSON, easy to reason about, and contains all the information needed for navigation (types, labels, identifiers, and coordinates). +### `--wait-aid` / `--wait-text` — fuse tap and verification -### `format=json` -- structured data (~375 KB) +After most taps you need to confirm the next screen is up before the +next action. When you can name an element you're confident will appear, +pass it to `tap.rb` and let the wait happen in the same call: ```bash -curl -s 'http://localhost:8100/source?format=json' > /tmp/wda-tree.json +# Tap, then wait up to 3s for "Site address" field to appear. ONE turn. +ruby scripts/tap.rb aid "Prologue Self Hosted Button" --wait-aid "Site address" + +# Tab-switch: wait for a known element on the destination screen. +ruby scripts/tap.rb aid tabbar_mysites --wait-aid switch-site-button + +# Wait by visible label instead of aid. +ruby scripts/tap.rb text "Continue" --wait-text "My Site" + +# Bump --timeout for known-slow transitions (network, large lists). +ruby scripts/tap.rb aid publish-button --wait-aid "Post Published" --timeout 15 ``` -Returns deeply nested JSON. Use this when you need to programmatically extract coordinates or search for elements with `jq`. The response has the structure `{"value": <root_node>, "sessionId": "..."}`. Each node has: +The wait polls `/elements` every 250 ms (cheap probe, ~200 B per response) +and exits as soon as the target appears. 
-| Field | Description | -|-------|-------------| -| `type` | Element type (e.g., `Button`, `StaticText`, `NavigationBar`) | -| `label` | Accessibility label (user-visible text) | -| `name` | Accessibility identifier (developer-assigned ID) | -| `value` | Current value (e.g., text field contents, switch state) | -| `rect` | `{"x": N, "y": N, "width": N, "height": N}` -- structured, use for tap coordinates | -| `frame` | Same as rect but as a string: `"{{x, y}, {w, h}}"` | -| `isEnabled` | Whether the element is interactive | -| `children` | Array of child nodes | +**When to use the wait flag.** Use it whenever you can plausibly name +something on the next screen. Even if you're not 100% sure of the +identifier, naming the most likely candidate is still cheaper than +tapping plain and re-dumping the tree. The downside of a wrong guess is +small: the wait times out (default 3 s) and `tap.rb` exits 1, at which +point you fall back to a tree dump. The upside on a right guess is +saving 2-3 turns. -Search example with `jq`: +**Naming hints** +- `--wait-aid` matches the developer-assigned accessibility identifier + (most stable). +- `--wait-text` matches accessibility id OR visible label, so it's more + forgiving but slightly slower to evaluate. +- `--wait-text` does exact equality, not partial match. If you only have + a substring, omit the wait flag and do one targeted `/elements` query + after the tap. -```bash -cat /tmp/wda-tree.json | jq '.. | objects | select(.label == "Settings")' -``` +Exit codes: `0` on success (tap + wait if specified), `1` if the tap +target wasn't found OR the wait target didn't appear in time, `2` for +WDA / usage errors. -### Computing Tap Coordinates +For W3C pointer gestures `tap.rb` doesn't model (long press), see +`references/raw-actions.md`. 
-From the description format, parse the frame `{{x, y}, {width, height}}` and compute: +### Anti-pattern: rolling your own tap ``` -tap_x = x + width / 2 -tap_y = y + height / 2 +# DON'T — 3-4 turns to tap one button. +curl -s -X POST http://localhost:8100/session/$SID/elements \ + -H 'Content-Type: application/json' \ + -d '{"using":"accessibility id","value":"create-post-button"}' +# ... extract element id ... +curl -s http://localhost:8100/session/$SID/element/$EID/rect +# ... compute center ... +curl -s -X POST http://localhost:8100/session/$SID/actions ... +curl -s 'http://localhost:8100/source?format=description' # "check state" ``` -From the JSON format, use the `rect` object: - ``` -tap_x = rect.x + rect.width / 2 -tap_y = rect.y + rect.height / 2 +# DO — 1 turn. +ruby scripts/tap.rb aid create-post-button --wait-aid post-title-field ``` -### Finding Elements - -Use this priority order when locating elements in the tree: +## Accessibility Tree -1. **`identifier` / `name`** -- most stable; developer-assigned, unlikely to change across locales -2. **`label`** -- accessibility label; user-visible text, may change with localization -3. **`type` + context** -- e.g., "Button inside NavigationBar" or "Cell inside Table" -4. **Partial matching** -- element label *contains* the target text (useful for dynamic labels like "3 Posts") -5. **Positional heuristics** -- last resort; fragile across screen sizes +**Always prefer the accessibility tree over screenshots.** The tree is +text-based, fast to grep, and contains everything you need (types, labels, +identifiers, coordinates). -In the description format, search the text output for labels or identifiers. In the JSON format, use `jq`: +### `format=description` — compact plaintext (default, ~25 KB) ```bash -# Exact match by identifier -cat /tmp/wda-tree.json | jq '.. | objects | select(.name == "settings-button")' - -# Exact match by label -cat /tmp/wda-tree.json | jq '.. 
| objects | select(.label == "Settings")' +curl -s 'http://localhost:8100/source?format=description' | jq -r .value +``` -# Partial match by label -cat /tmp/wda-tree.json | jq '.. | objects | select(.label? // "" | contains("Settings"))' +Returns a human-readable indented tree. Each line shows an element with +its type, memory address, frame as `{{x, y}, {width, height}}`, and +optional attributes (identifier, label, Selected, etc.): -# Type + context: find Buttons inside NavigationBar -cat /tmp/wda-tree.json | jq '.. | objects | select(.type == "NavigationBar") | .. | objects | select(.type == "Button")' +``` +NavigationBar, 0x105351660, {{0.0, 62.0}, {402.0, 54.0}}, identifier: 'my-site-navigation-bar' + Button, 0x105351a20, {{16.0, 62.0}, {44.0, 44.0}}, identifier: 'BackButton', label: 'Site Name' + StaticText, 0x105351b40, {{178.7, 73.7}, {44.7, 20.7}}, label: 'Posts' ``` -### Screen Size +**Use this format by default.** It's ~15× smaller than JSON, easy to +reason about, and contains all the navigation info you need. You can +pipe it directly to `grep` to find the few lines that matter. -The root node's `rect` gives the screen dimensions (e.g., `width: 393, height: 852`). +For the larger `format=json` structure (when you need to walk the tree +programmatically, e.g. to read a `value` attribute by element), see +`references/json-tree.md`. -## Session Management +### Finding Elements -Most action endpoints require a session ID. Create one if `/status` doesn't return a `sessionId`: +Priority order when locating something in the tree: -```bash -# Create session -curl -s -X POST http://localhost:8100/session \ - -H 'Content-Type: application/json' \ - -d '{"capabilities":{"alwaysMatch":{}}}' | jq . -``` +1. **`identifier` / `name`** — most stable; developer-assigned, unlikely + to change across locales. +2. **`label`** — accessibility label; user-visible text, may shift with + localization. +3. **`type` + context** — e.g. "Button inside NavigationBar". +4. 
**Partial matching** — element label *contains* the target text + (useful for dynamic labels like "3 Posts"). +5. **Positional heuristics** — last resort; fragile across screen sizes. -The session ID is at `value.sessionId` in the response. Use it in subsequent action URLs as `SESSION_ID`. +In description format, grep the tree text. Tap coordinates: from a +`{{x, y}, {w, h}}` frame the center is `(x + w/2, y + h/2)`. You almost +never need to compute this yourself, because `tap.rb` does it. -To check for an existing session, look at the `sessionId` field in the `/status` response. +The root node's `rect` gives screen dimensions (e.g. `width: 393, height: 852`). -## Actions +## Verifying step success without screenshots -All action endpoints use `POST /session/SESSION_ID/actions` with W3C WebDriver pointer actions. +When a test step ends in "verify <something is on screen>", do it through +the tree, not a screenshot. The common patterns: -### Tap +**Verify a specific element is present.** Query `/elements` directly: ```bash -curl -s -X POST http://localhost:8100/session/SESSION_ID/actions \ +# Cheap presence probe (~200 B response). +SID=$(jq -r .session_id /tmp/wda-8100.session) +curl -s -X POST "http://localhost:8100/session/$SID/elements" \ -H 'Content-Type: application/json' \ - -d '{ - "actions": [{ - "type": "pointer", - "id": "finger1", - "parameters": {"pointerType": "touch"}, - "actions": [ - {"type": "pointerMove", "duration": 0, "x": X, "y": Y}, - {"type": "pointerDown"}, - {"type": "pointerUp"} - ] - }] - }' + -d '{"using":"accessibility id","value":"post-published-banner"}' \ + | jq -e '.value | length > 0' ``` -#### Alternative: Element-Based Tapping - -WDA can find and tap elements directly without computing coordinates. 
This is useful when an element has a stable accessibility identifier: +**Verify a specific text is on screen.** One tree dump + grep: ```bash -# Find the element by accessibility identifier -curl -s -X POST http://localhost:8100/session/SESSION_ID/elements \ - -H 'Content-Type: application/json' \ - -d '{"using": "accessibility id", "value": "settings-button"}' | jq . - -# Tap it (ELEMENT_ID comes from the response above, at value[0].ELEMENT) -curl -s -X POST http://localhost:8100/session/SESSION_ID/element/ELEMENT_ID/click +curl -s 'http://localhost:8100/source?format=description' | jq -r .value \ + | grep -F "Category tag post" # exit 0 == found ``` -The coordinate approach above is preferred because it works directly with the tree data already being fetched. Use element-based tapping when coordinate parsing is awkward or when interacting with elements found by predicate. - -### Long Press - -Add a `pause` between `pointerDown` and `pointerUp`. Duration is in milliseconds. +**Verify post-publish / save success.** Most apps surface a confirmation +toast or banner with a stable label or aid. Wait for it as part of the +tap that triggered it: ```bash -curl -s -X POST http://localhost:8100/session/SESSION_ID/actions \ - -H 'Content-Type: application/json' \ - -d '{ - "actions": [{ - "type": "pointer", - "id": "finger1", - "parameters": {"pointerType": "touch"}, - "actions": [ - {"type": "pointerMove", "duration": 0, "x": X, "y": Y}, - {"type": "pointerDown"}, - {"type": "pause", "duration": 1000}, - {"type": "pointerUp"} - ] - }] - }' +ruby scripts/tap.rb aid publish-confirm-button \ + --wait-text "Post published" --timeout 15 ``` -### Swipe +If the verification fails (text not found, exit non-zero), *then* capture +a screenshot for the human-readable failure report. Do not `Read` it +back; just write the path into the failure report. -Move from `(x1, y1)` to `(x2, y2)` with a duration (milliseconds) on the second `pointerMove`. 
+## Swipe + +**Use `scripts/swipe.rb` for every swipe.** It auto-detects the +simulator's window size, computes direction-to-coordinates from the +guide below, and dispatches the gesture in one call: ```bash -curl -s -X POST http://localhost:8100/session/SESSION_ID/actions \ - -H 'Content-Type: application/json' \ - -d '{ - "actions": [{ - "type": "pointer", - "id": "finger1", - "parameters": {"pointerType": "touch"}, - "actions": [ - {"type": "pointerMove", "duration": 0, "x": X1, "y": Y1}, - {"type": "pointerDown"}, - {"type": "pointerMove", "duration": 500, "x": X2, "y": Y2}, - {"type": "pointerUp"} - ] - }] - }' +ruby scripts/swipe.rb up # vertical swipe up (scrolls content down) +ruby scripts/swipe.rb down # vertical swipe down (scrolls content up) +ruby scripts/swipe.rb left +ruby scripts/swipe.rb right +ruby scripts/swipe.rb back # edge swipe from left edge → right (back nav fallback) + +# Explicit coordinates if you need a custom gesture. +ruby scripts/swipe.rb at 196,500,196,200 + +# Slow swipe (1 s) when the gesture originates on a tappable item so it +# isn't misread as a tap. +ruby scripts/swipe.rb up --duration 1000 ``` -**Swipe direction guide** (given screen size `W x H`): -- **Up** (scroll down): from `(W/2, H/2 + H/6)` to `(W/2, H/2 - H/6)` -- **Down** (scroll up): from `(W/2, H/2 - H/6)` to `(W/2, H/2 + H/6)` -- **Left**: from `(W/2 + W/4, H/2)` to `(W/2 - W/4, H/2)` -- **Right**: from `(W/2 - W/4, H/2)` to `(W/2 + W/4, H/2)` -- **Back** (swipe from left edge): from `(5, H/2)` to `(W*2/3, H/2)` - -### Back Navigation - -To go back to the previous screen: - -- **Primary**: find a Button inside NavigationBar -- its label is typically the previous screen's title. Tap it. -- **Fallback**: edge swipe from `(5, H/2)` to `(W*2/3, H/2)` (see Swipe direction guide above) - -The button approach is more reliable because edge swipes can be finicky depending on gesture recognizers. 
+Vertical swipes use the right-edge x (`window_width - 30`) so they +don't land on interactive elements in the center. For the raw W3C +pointer-actions JSON body (e.g. multi-finger gestures or long-press +chains the script doesn't model), see `references/raw-actions.md`. -### Type Text - -```bash -curl -s -X POST http://localhost:8100/session/SESSION_ID/wda/keys \ - -H 'Content-Type: application/json' \ - -d '{"value": ["h","e","l","l","o"]}' -``` +## Scroll View Navigation -The `value` array contains individual characters. An element must be focused first (tap a text field before typing). +To find an element in a long scrollable list: -### Clear Text Field +1. Fetch the tree (description format) and grep for the target. +2. If found, `tap.rb` it. Done. +3. If not found, swipe up from the right edge to scroll down + (`x = screen_width - 30`). +4. Re-fetch the tree and grep again. +5. **Detect end of list**: if the tree text is unchanged after a scroll, + you've hit the bottom. +6. Stop and report element-not-found if the bottom is reached without + finding the target. -Select all text and delete it: +Same pattern for horizontal scroll views with horizontal swipes. -```bash -# Select all (Ctrl+A) then delete -curl -s -X POST http://localhost:8100/session/SESSION_ID/wda/keys \ - -H 'Content-Type: application/json' \ - -d '{"value": ["\u0001"]}' -curl -s -X POST http://localhost:8100/session/SESSION_ID/wda/keys \ - -H 'Content-Type: application/json' \ - -d '{"value": ["\u007F"]}' -``` +## Type Text -Alternatively, if you have an element ID: +**Use `scripts/type.rb` for every typing action.** It collapses +"tap-to-focus -> wait for keyboard -> send keys -> read value back" +into one call: ```bash -curl -s -X POST http://localhost:8100/session/SESSION_ID/element/ELEMENT_ID/clear +# Locate the field by aid (or by visible label), type the text. 
+# By default the script verifies the typed text landed: after typing it +# reads the field's `value` (or `label` as fallback) and exits 1 if the +# attribute doesn't contain TXT — catching dropped keys without you +# having to spend an extra tool call on a manual readback. +ruby scripts/type.rb aid post-title --text "Hello world" +ruby scripts/type.rb text "Email" --text "user@example.com" + +# Opt out of the readback if the field genuinely doesn't expose its +# typed content via value/label (rare — most do). +ruby scripts/type.rb aid post-title --text "Hello world" --no-verify + +# Skip the tap + keyboard wait if the field is already focused +# (e.g. a fresh post editor that auto-focuses its title). +ruby scripts/type.rb aid post-title --text "Hello world" --no-focus ``` -## Waiting for UI Stability - -After performing an action (tap, swipe, type), the UI may be animating or loading. Instead of using a fixed sleep, poll for the expected state: +The script polls for `XCUIElementTypeKeyboard` to appear before sending +keys, which is the cheap focus check from the WDA API. If the keyboard +doesn't appear within `--keyboard-timeout` seconds (default 3), it +exits 1 — at which point you usually need to re-fetch the tree and tap +again at fresh coordinates. `/element/<id>/click` does not reliably +raise the keyboard for text fields; the coordinate-based tap that +`tap.rb` (and `type.rb`) does is more reliable. -1. Fetch the accessibility tree -2. Check if the expected element or screen is present -3. If not found, sleep 0.5s and retry -4. After 10 failed attempts (5 seconds total), declare the element not found +The verify step checks the field's `value` attribute first, then falls +back to `label`. For most SwiftUI / UIKit text inputs the typed content +ends up in the enclosing element's `label` ("Post title. Hello world") +even when the element's own `value` is nil because the text lives on a +descendant `TextView`. Either is sufficient to catch dropped keys. 
-This approach is more reliable than fixed delays because it adapts to variable animation durations and network load times. +**Don't use `hasKeyboardFocus`.** That attribute is rejected on iOS 26 +("attribute is unknown"); the valid name is `focused`. -## Scroll View Navigation +**Fast typing pattern.** Use `type.rb`, then move on. Don't tree-dump +between each character or after typing. If your text is wrong on +screen, the publish/save step will surface it. Don't take a screenshot +to "see" the typed text. -To find an element in a long scrollable list: +For the raw `/wda/keys` curl (e.g. mixing in control codes for a +clear-field sequence) and clear-field caveats on iOS 26, see +`references/raw-actions.md`. -1. Fetch the tree and search for the target element -2. If found, tap it -- done -3. If not found, swipe up from the right edge to scroll down (use x = `screen_width - 30` to avoid tapping interactive elements) -4. Re-fetch the tree and search again -5. **Detect end of list**: if the tree content is identical after a scroll, you've reached the bottom -6. Stop and report element not found if the bottom is reached without finding the target +## Back Navigation -Use the same pattern for horizontal scroll views, adjusting swipe direction accordingly. +To return to the previous screen, find a Button inside `NavigationBar`. +Its label is typically the previous screen's title. Tap it via +`tap.rb text "<Prev Title>"` (with `--wait-aid` for the destination's +marker). For the edge-swipe fallback, see `references/raw-actions.md`. ## Screenshots -Use `simctl` for screenshots -- more reliable than WDA's base64 approach: +Screenshots are an output artifact for **human review only** (e.g. +attaching a failure image to a test report). 
Capture with `simctl`: ```bash xcrun simctl io <UDID> screenshot /tmp/screenshot.png ``` -To get the booted simulator's UDID: - -```bash -xcrun simctl list devices booted -j | jq -r '.devices | to_entries[].value[] | select(.state == "Booted")' -``` - -## Tips - -- **Tree coordinates, not screenshot pixels** -- screenshots may be at a different resolution than the tree's point-based coordinates. -- **Vertical swipes**: use the right edge x-coordinate (`screen_width - 30`) to avoid accidentally tapping interactive elements in the center. Use center only when needed. -- **Slow swipes on tappable items**: swipe gestures on tappable items may register as a tap. Use `duration: 1000` (1 second) for more reliable swipes. -- **WDA startup time**: ~60s the first time. Subsequent starts are faster with cached DerivedData. -- **Reconnecting**: if WDA disconnects, run `wda-start.rb` again -- it will reconnect. -- **Tab bar**: look for elements with type containing `TabBar` in the tree. Its children are the individual tabs. - -## Common Failures and Recovery - -### WDA Session Expiry - -WDA sessions can expire after inactivity. If action requests return HTTP 4xx errors, re-create the session: +Booted simulator UDID: ```bash -curl -s -X POST http://localhost:8100/session \ - -H 'Content-Type: application/json' \ - -d '{"capabilities":{"alwaysMatch":{}}}' | jq . +xcrun simctl list devices booted -j | jq -r '.devices | to_entries[].value[] | select(.state == "Booted") | .udid' ``` -### Stale Element Coordinates - -After animations or screen transitions, previously fetched coordinates may be wrong. Always re-fetch the tree and recompute coordinates before tapping after any navigation action. - -### System Alert Interception +See Rule 2 above: never `Read` the resulting PNG back into context. -System alerts (location permissions, notification permissions, tracking prompts) can block interactions with the app. Before retrying a failed tap: +## Reference files -1. 
Fetch the tree and look for elements of type `Alert` or `Sheet` -2. If found, look for a dismiss button ("Allow", "Don't Allow", "OK", "Cancel") and tap it -3. Then retry the original action - -### App Crash Detection - -If actions consistently fail or the tree looks unexpected, the app may have crashed. Check and re-launch: - -```bash -# Check if the app process is running -xcrun simctl list devices booted - -# Re-launch the app -xcrun simctl launch <UDID> <APP_BUNDLE_ID> -``` +For details that you only need when something specific is happening, +read the matching reference file: -After re-launching, create a new WDA session before continuing. +| Read this | When you need to | +|-----------|------------------| +| `references/sessions.md` | Interact with `/session/*` endpoints directly, debug "HTTP 200 but no UI effect," or understand the `bundleId` binding | +| `references/raw-actions.md` | Long-press, clear a text field (with iOS 26 caveats), or the edge-swipe back fallback | +| `references/json-tree.md` | Walk the tree programmatically with `jq` (e.g. read a `value` attribute by id) instead of grepping description format | +| `references/troubleshooting.md` | A tap silently no-ops, the app may have crashed, a system alert is intercepting input, or you need the swipe/deep-link tips | diff --git a/.claude/skills/ios-sim-navigation/references/json-tree.md b/.claude/skills/ios-sim-navigation/references/json-tree.md new file mode 100644 index 000000000000..c9a0f7a4f37b --- /dev/null +++ b/.claude/skills/ios-sim-navigation/references/json-tree.md @@ -0,0 +1,48 @@ +# JSON Tree Format + +The default tree format (`format=description`) is plaintext and grep-friendly +and covers ~90% of needs. Reach for `format=json` when you need to walk the +tree programmatically — for example reading a specific element's `value` +attribute after typing into it, or finding all elements matching a filter. 
+ +```bash +curl -s 'http://localhost:8100/source?format=json' > /tmp/wda-tree.json +``` + +The JSON tree is ~375 KB (vs ~25 KB for description), so save to a file +and `jq` against it rather than piping it through your conversation +context. + +## Node shape + +Each node has these fields: + +| Field | Description | +|-------|-------------| +| `type` | Element type (`Button`, `StaticText`, …) | +| `label` | Accessibility label (user-visible text) | +| `name` | Accessibility identifier (developer-assigned id) | +| `value` | Current value (text field contents, switch state, …) | +| `rect` | `{"x": N, "y": N, "width": N, "height": N}` | +| `isEnabled` | Whether interactive | +| `children` | Array of child nodes | + +## Common jq patterns + +```bash +# Find a node by accessibility identifier. +jq '.. | objects | select(.name == "post-title")' /tmp/wda-tree.json + +# Find a node by visible label. +jq '.. | objects | select(.label == "Settings")' /tmp/wda-tree.json + +# Partial match on label. +jq '.. | objects | select(.label? // "" | contains("Posts"))' /tmp/wda-tree.json + +# Read a text field's current value. +jq '.. | objects | select(.name == "post-title") | .value' /tmp/wda-tree.json +``` + +For reading a single attribute, the targeted `/element/<id>/attribute/<name>` +endpoint is cheaper than dumping the whole JSON tree. Use the full tree +only when you genuinely need to enumerate or filter across many nodes. diff --git a/.claude/skills/ios-sim-navigation/references/raw-actions.md b/.claude/skills/ios-sim-navigation/references/raw-actions.md new file mode 100644 index 000000000000..538bf9a447b2 --- /dev/null +++ b/.claude/skills/ios-sim-navigation/references/raw-actions.md @@ -0,0 +1,70 @@ +# Raw W3C Actions and Less-Common Gestures + +Use these only when `tap.rb` doesn't cover the gesture (long press, +multi-touch) or when you need to clear a text field. All examples assume +a bound `SESSION_ID` — see `references/sessions.md`. 
+ +```bash +SID=$(jq -r .session_id /tmp/wda-8100.session) +``` + +## Long Press + +A `pause` between `pointerDown` and `pointerUp`. Duration is milliseconds. + +```bash +curl -s -X POST http://localhost:8100/session/$SID/actions \ + -H 'Content-Type: application/json' \ + -d '{ + "actions": [{ + "type": "pointer", "id": "finger1", + "parameters": {"pointerType": "touch"}, + "actions": [ + {"type": "pointerMove", "duration": 0, "x": X, "y": Y}, + {"type": "pointerDown"}, + {"type": "pause", "duration": 1000}, + {"type": "pointerUp"} + ] + }] + }' +``` + +## Clear Text Field + +**Both methods below are unreliable on iOS 26** — verified to silently +no-op against `SearchField` and similar controls. Prefer the field's own +clear control (the small X icon present on most search/text fields) or +tap-and-hold to bring up the iOS edit menu. + +If you must try the programmatic path, **always read the field's +`value` attribute afterwards to confirm it actually cleared.** Don't +trust the HTTP 200. + +```bash +# Select all (Ctrl+A) then delete +curl -s -X POST http://localhost:8100/session/$SID/wda/keys \ + -H 'Content-Type: application/json' \ + -d '{"value": ["\u0001"]}' +curl -s -X POST http://localhost:8100/session/$SID/wda/keys \ + -H 'Content-Type: application/json' \ + -d '{"value": ["\u007F"]}' +``` + +Alternatively, if you have an element id: + +```bash +curl -s -X POST http://localhost:8100/session/$SID/element/ELEMENT_ID/clear +``` + +## Back Navigation Deep-Dive + +To return to the previous screen: + +- **Primary**: find a Button inside `NavigationBar`. Its label is + typically the previous screen's title. Tap it via + `tap.rb text "<Prev Title>"` (with `--wait-aid` for the destination's + marker). +- **Fallback**: edge swipe from `(5, H/2)` to `(W*2/3, H/2)`. + +The button approach is more reliable because edge swipes can be finicky +depending on gesture recognizers. 
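The fallback coordinates above can be sketched in Ruby. This is a minimal illustration, not part of the skill's scripts: `back_swipe_coords` is a hypothetical helper name, and the 393 x 852 window size is the example value quoted for the root node's `rect`. In practice `swipe.rb back` performs the same calculation from the live window size, so a sketch like this is only useful when composing an explicit `swipe.rb at` call.

```ruby
# Hypothetical helper (an assumption for illustration, not in scripts/):
# edge-swipe "back" gesture from (5, H/2) to (W*2/3, H/2).
# Returns [x1, y1, x2, y2] in tree points, not screenshot pixels.
def back_swipe_coords(width, height)
  y = height / 2.0                  # vertical midpoint of the window
  [5.0, y, width * 2 / 3.0, y]      # start near the left edge, end two thirds across
end

# Example window size taken from the root node's rect (assumed device).
x1, y1, x2, y2 = back_swipe_coords(393, 852)

# Print the equivalent explicit-coordinate invocation of swipe.rb.
puts "ruby scripts/swipe.rb at #{x1},#{y1},#{x2},#{y2}"
```

Keeping the start x at 5 points matters more than the exact end point: the gesture has to begin on the screen edge for iOS to treat it as a back swipe rather than a content drag.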
diff --git a/.claude/skills/ios-sim-navigation/references/sessions.md b/.claude/skills/ios-sim-navigation/references/sessions.md new file mode 100644 index 000000000000..8756aebb460b --- /dev/null +++ b/.claude/skills/ios-sim-navigation/references/sessions.md @@ -0,0 +1,47 @@ +# WDA Session Management + +Most action endpoints (`/element/...`, `/elements`, `/wda/keys`, `/actions`) +require a session id. `tap.rb` manages this for you automatically — it +creates a session bound to the foreground app's bundle id, persists it at +`/tmp/wda-<port>.session`, and reuses it across calls. **Read this only +when interacting with `/session/...` endpoints directly.** + +## Why the bundleId binding matters + +Without `bundleId` in the session's capabilities, `/actions` returns +HTTP 200 but the taps never reach the UI: a silent failure that's easy +to mistake for "WDA is broken." Read the active bundle from +`/wda/activeAppInfo` and include it in `alwaysMatch`: + +```bash +BUNDLE=$(curl -s http://localhost:8100/wda/activeAppInfo | jq -r .value.bundleId) +SID=$(curl -s -X POST http://localhost:8100/session \ + -H 'Content-Type: application/json' \ + -d "{\"capabilities\":{\"alwaysMatch\":{\"bundleId\":\"$BUNDLE\"}}}" \ + | jq -r .value.sessionId) +echo "$SID" +``` + +## Reusing the session `tap.rb` persists + +For non-tap curl calls (e.g. `/actions` swipes, `/wda/keys`), reuse the +session id that `tap.rb` already established: + +```bash +SID=$(jq -r .session_id /tmp/wda-8100.session) +``` + +Don't mint a fresh session with `alwaysMatch:{}` for these calls — that +produces an unbound session whose `/actions` requests return HTTP 200 +but never reach the UI. + +## When sessions break + +If the foreground app changes (you launch a different bundle, or iOS +pushes you to Springboard), the existing session may stop dispatching +events. Recreate it with the new bundle id, or just call any `tap.rb` +command (it auto-rebinds). 
+ +Symptoms of a dead session: action requests return HTTP 4xx, or (more +confusingly) return HTTP 200 with no visible UI effect. `tap.rb` +recreates the session automatically on the next call. diff --git a/.claude/skills/ios-sim-navigation/references/troubleshooting.md b/.claude/skills/ios-sim-navigation/references/troubleshooting.md new file mode 100644 index 000000000000..f6d180e3b51d --- /dev/null +++ b/.claude/skills/ios-sim-navigation/references/troubleshooting.md @@ -0,0 +1,62 @@ +# Troubleshooting and Tips + +## Common Failures + +### WDA session expiry + +WDA sessions can expire after inactivity, or stop dispatching events +when the foreground app changes. Symptoms: action requests return HTTP +4xx, or (more confusingly) return HTTP 200 with no visible UI effect +(see `references/sessions.md` for the bundleId gotcha). `tap.rb` +recreates the session automatically on the next call. For direct curl +work, recreate it bound to the current foreground app using the snippet +in `references/sessions.md`. + +### Stale element coordinates + +After animations or screen transitions, previously fetched coordinates +may be wrong. `tap.rb` always re-resolves the element by aid/text before +tapping, so prefer it over caching coordinates yourself. + +### System alert interception + +System alerts (location permissions, notification permissions, tracking +prompts) can block interactions with the app. If a tap silently does +nothing: + +1. Fetch the tree and look for elements of type `Alert` or `Sheet`. +2. If found, look for a dismiss button ("Allow", "Don't Allow", "OK", + "Cancel") and tap it with `tap.rb text "<button>"`. +3. Retry the original action. + +### App crash detection + +If actions consistently fail or the tree looks unexpected, the app may +have crashed. Check and re-launch: + +```bash +xcrun simctl list devices booted +xcrun simctl launch <UDID> <APP_BUNDLE_ID> +``` + +After re-launching, the next `tap.rb` call will create a fresh session +automatically. 
+ +## Tips + +- **Tree coordinates, not screenshot pixels** — screenshots may be at a + different resolution than the tree's point-based coordinates. +- **Vertical swipes**: right-edge x (`screen_width - 30`) avoids + accidentally tapping interactive elements in the center. +- **Slow swipes on tappable items**: gestures may register as a tap. + Use `duration: 1000` (1 s) for reliability. +- **WDA startup time**: minutes on a cold checkout (the build phase + runs first); ~60 s once DerivedData is warm. +- **Reconnecting**: if WDA disconnects, re-run `wda-start.rb`. +- **Tab bar**: look for elements with type containing `TabBar`. Its + children are the individual tabs. +- **Deep links for navigation**: when the target app supports URL + schemes, `xcrun simctl openurl <UDID> <url>` (e.g. + `wordpress://post/new`) jumps straight to a screen and skips + multi-tap navigation chains. Both faster and less flaky than driving + the UI to get there. diff --git a/.claude/skills/ios-sim-navigation/scripts/swipe.rb b/.claude/skills/ios-sim-navigation/scripts/swipe.rb new file mode 100755 index 000000000000..e412519c017a --- /dev/null +++ b/.claude/skills/ios-sim-navigation/scripts/swipe.rb @@ -0,0 +1,164 @@ +#!/usr/bin/env ruby +# swipe.rb — Perform a directional or coordinate swipe via WebDriverAgent. +# +# One Ruby invocation = one Claude turn. Wraps the W3C pointer-actions +# JSON body and computes direction-to-coordinates from the simulator's +# window size automatically. Reuses the session that tap.rb persists at +# /tmp/wda-<port>.session. +# +# Usage: +# swipe.rb up # vertical swipe up (scrolls content down) +# swipe.rb down # vertical swipe down (scrolls content up) +# swipe.rb left +# swipe.rb right +# swipe.rb back # edge swipe from left edge → right (back nav fallback) +# swipe.rb at X1,Y1,X2,Y2 # explicit coordinates +# +# Options: +# --duration MS Swipe duration in milliseconds (default 500). 
+# Bump to 1000 if the gesture lands on a tappable +# item so it isn't misread as a tap. +# --port PORT WDA port (default: 8100, or $WDA_PORT). +# +# Vertical swipes use the right-edge x (window_width - 30) so the +# gesture doesn't land on interactive elements in the center. See +# SKILL.md "Swipe direction guide" for the underlying math. +# +# Exit codes: +# 0 swipe dispatched +# 2 WDA error / usage error + +require "net/http" +require "json" +require "uri" +require "optparse" + +PORT = (ENV["WDA_PORT"] || 8100).to_i +SESSION_FILE = "/tmp/wda-#{PORT}.session" + +def base_url(port = PORT) + "http://localhost:#{port}" +end + +def http_get(path, port = PORT) + uri = URI("#{base_url(port)}#{path}") + res = Net::HTTP.start(uri.host, uri.port) { |h| h.request(Net::HTTP::Get.new(uri)) } + [res.code.to_i, res.body] +rescue Errno::ECONNREFUSED + raise "WDA not reachable on port #{port}. Run wda-start.rb first." +end + +def http_post(path, body, port = PORT) + uri = URI("#{base_url(port)}#{path}") + req = Net::HTTP::Post.new(uri, "Content-Type" => "application/json") + req.body = JSON.dump(body) + res = Net::HTTP.start(uri.host, uri.port) { |h| h.request(req) } + [res.code.to_i, res.body] +rescue Errno::ECONNREFUSED + raise "WDA not reachable on port #{port}. Run wda-start.rb first." 
+end + +def active_bundle(port = PORT) + code, body = http_get("/wda/activeAppInfo", port) + return nil unless code == 200 + JSON.parse(body).dig("value", "bundleId") +rescue + nil +end + +def session_id(port = PORT) + if File.exist?(SESSION_FILE) + sid = JSON.parse(File.read(SESSION_FILE))["session_id"] rescue nil + if sid + code, _ = http_get("/session/#{sid}", port) + return sid if code == 200 + end + end + + caps = { "alwaysMatch" => {} } + bundle = active_bundle(port) + caps["alwaysMatch"]["bundleId"] = bundle if bundle + + code, body = http_post("/session", { "capabilities" => caps }, port) + raise "session create failed: HTTP #{code}: #{body}" unless code.between?(200, 299) + sid = JSON.parse(body).dig("value", "sessionId") + raise "no sessionId in response: #{body}" unless sid + File.write(SESSION_FILE, JSON.dump({ session_id: sid, port: port })) + sid +end + +def window_size(port = PORT) + sid = session_id(port) + code, body = http_get("/session/#{sid}/window/size", port) + raise "window/size failed: HTTP #{code}: #{body}" unless code.between?(200, 299) + v = JSON.parse(body)["value"] + [v["width"].to_f, v["height"].to_f] +end + +def swipe_from_to(x1, y1, x2, y2, duration_ms, port = PORT) + sid = session_id(port) + body = { + "actions" => [{ + "type" => "pointer", "id" => "finger1", + "parameters" => { "pointerType" => "touch" }, + "actions" => [ + { "type" => "pointerMove", "duration" => 0, "x" => x1, "y" => y1 }, + { "type" => "pointerDown" }, + { "type" => "pointerMove", "duration" => duration_ms, "x" => x2, "y" => y2 }, + { "type" => "pointerUp" } + ] + }] + } + code, resp = http_post("/session/#{sid}/actions", body, port) + raise "swipe failed: HTTP #{code}: #{resp}" unless code.between?(200, 299) +end + +# Direction-to-coordinates math from SKILL.md "Swipe direction guide". +# Returns [x1, y1, x2, y2]. 
+def coords_for_direction(direction, port = PORT) + w, h = window_size(port) + case direction + when "up" then [w - 30, h * 2 / 3.0, w - 30, h * 1 / 3.0] + when "down" then [w - 30, h * 1 / 3.0, w - 30, h * 2 / 3.0] + when "left" then [w * 3 / 4.0, h / 2.0, w / 4.0, h / 2.0] + when "right" then [w / 4.0, h / 2.0, w * 3 / 4.0, h / 2.0] + when "back" then [5.0, h / 2.0, w * 2 / 3.0, h / 2.0] + else raise "unknown direction: #{direction}" + end +end + +port = PORT +duration_ms = 500 + +parser = OptionParser.new do |opts| + opts.banner = "Usage: swipe.rb <up|down|left|right|back|at> [X1,Y1,X2,Y2] [--duration MS] [--port PORT]" + opts.on("--duration MS", Integer) { |v| duration_ms = v } + opts.on("--port PORT", Integer) { |v| port = v } +end +parser.parse! + +if ARGV.empty? + $stderr.puts parser.help + exit 2 +end + +direction = ARGV[0] + +begin + if direction == "at" + coords = (ARGV[1] || "").split(",").map { |s| Float(s) rescue nil } + if coords.size != 4 || coords.any?(&:nil?) + $stderr.puts "usage: swipe.rb at X1,Y1,X2,Y2" + exit 2 + end + x1, y1, x2, y2 = coords + else + x1, y1, x2, y2 = coords_for_direction(direction, port) + end + + swipe_from_to(x1, y1, x2, y2, duration_ms, port) + puts "swipe ok #{direction} (%.1f,%.1f → %.1f,%.1f, %dms)" % [x1, y1, x2, y2, duration_ms] +rescue => e + $stderr.puts "error: #{e.message}" + exit 2 +end diff --git a/.claude/skills/ios-sim-navigation/scripts/tap.rb b/.claude/skills/ios-sim-navigation/scripts/tap.rb new file mode 100755 index 000000000000..cb2153b47272 --- /dev/null +++ b/.claude/skills/ios-sim-navigation/scripts/tap.rb @@ -0,0 +1,212 @@ +#!/usr/bin/env ruby +# tap.rb — Tap an element on the iOS Simulator via WebDriverAgent. +# +# One Ruby invocation = one Claude turn. Handles session creation +# (bound to the foreground app's bundle id, which is required for +# pointer events to dispatch), element lookup, the actual tap, and +# (optionally) waiting for the next expected element to appear. 
+# +# Usage: +# tap.rb aid <accessibility-id> +# tap.rb text "<visible label>" # matches accessibility id OR label +# tap.rb at <x>,<y> # raw coordinates +# +# Options: +# --wait-aid AID After the tap, poll for an element with this aid. +# Exits 0 once found, 1 if --timeout elapses. +# --wait-text TXT Same, but matches by visible label OR aid. +# --timeout SEC Max wait, default 3 seconds. Bump up for known-slow +# transitions (network calls, large list loads). +# --port PORT WDA port (default: 8100, or $WDA_PORT). +# +# Use --wait-aid / --wait-text instead of `tap; sleep N; curl source` +# whenever you can name the element you expect afterwards. It's one +# turn instead of three, ~250ms cadence instead of fixed sleep, and +# returns ~200 B instead of a 25 KB tree dump. +# +# Exit codes: +# 0 tap dispatched (and wait condition met if specified) +# 1 tap target not found, OR wait target didn't appear in time +# 2 WDA error / usage error + +require "net/http" +require "json" +require "uri" +require "optparse" + +PORT = (ENV["WDA_PORT"] || 8100).to_i +SESSION_FILE = "/tmp/wda-#{PORT}.session" + +def base_url(port = PORT) + "http://localhost:#{port}" +end + +def http_get(path, port = PORT) + uri = URI("#{base_url(port)}#{path}") + res = Net::HTTP.start(uri.host, uri.port) { |h| h.request(Net::HTTP::Get.new(uri)) } + [res.code.to_i, res.body] +rescue Errno::ECONNREFUSED + raise "WDA not reachable on port #{port}. Run wda-start.rb first." +end + +def http_post(path, body, port = PORT) + uri = URI("#{base_url(port)}#{path}") + req = Net::HTTP::Post.new(uri, "Content-Type" => "application/json") + req.body = JSON.dump(body) + res = Net::HTTP.start(uri.host, uri.port) { |h| h.request(req) } + [res.code.to_i, res.body] +rescue Errno::ECONNREFUSED + raise "WDA not reachable on port #{port}. Run wda-start.rb first." 
+end + +def active_bundle(port = PORT) + code, body = http_get("/wda/activeAppInfo", port) + return nil unless code == 200 + JSON.parse(body).dig("value", "bundleId") +rescue + nil +end + +def session_id(port = PORT) + if File.exist?(SESSION_FILE) + sid = JSON.parse(File.read(SESSION_FILE))["session_id"] rescue nil + if sid + code, _ = http_get("/session/#{sid}", port) + return sid if code == 200 + end + end + + caps = { "alwaysMatch" => {} } + bundle = active_bundle(port) + caps["alwaysMatch"]["bundleId"] = bundle if bundle + + code, body = http_post("/session", { "capabilities" => caps }, port) + raise "session create failed: HTTP #{code}: #{body}" unless code.between?(200, 299) + sid = JSON.parse(body).dig("value", "sessionId") + raise "no sessionId in response: #{body}" unless sid + File.write(SESSION_FILE, JSON.dump({ session_id: sid, port: port })) + sid +end + +def locator_for(strategy, value) + case strategy + when "aid" + ["accessibility id", value] + when "text" + escaped = value.gsub("'", "\\\\'") + ["predicate string", "label == '#{escaped}' OR name == '#{escaped}'"] + else + raise "unknown strategy: #{strategy}" + end +end + +def find_first(strategy, value, port = PORT) + using, val = locator_for(strategy, value) + sid = session_id(port) + code, body = http_post("/session/#{sid}/elements", { "using" => using, "value" => val }, port) + return nil if code == 404 + raise "find failed: HTTP #{code}: #{body}" unless code.between?(200, 299) + matches = JSON.parse(body)["value"] || [] + return nil if matches.empty? + m = matches.first + m["ELEMENT"] || m["element-6066-11e4-a52e-4f735466cecf"] || m.values.first +end + +# Poll /elements every interval_s until target appears or deadline passes. +# Returns the elapsed time on success, nil on timeout. 
+def wait_for(strategy, value, timeout, port = PORT) + using, val = locator_for(strategy, value) + sid = session_id(port) + deadline = Time.now + timeout + start = Time.now + loop do + code, body = http_post("/session/#{sid}/elements", { "using" => using, "value" => val }, port) + if code.between?(200, 299) + matches = (JSON.parse(body)["value"] rescue nil) || [] + return Time.now - start unless matches.empty? + end + return nil if Time.now >= deadline + sleep 0.25 + end +end + +def element_center(eid, port = PORT) + sid = session_id(port) + code, body = http_get("/session/#{sid}/element/#{eid}/rect", port) + raise "rect failed: HTTP #{code}: #{body}" unless code.between?(200, 299) + rect = JSON.parse(body)["value"] + [rect["x"] + rect["width"] / 2.0, rect["y"] + rect["height"] / 2.0] +end + +def tap_at(x, y, port = PORT) + sid = session_id(port) + body = { + "actions" => [{ + "type" => "pointer", "id" => "finger1", + "parameters" => { "pointerType" => "touch" }, + "actions" => [ + { "type" => "pointerMove", "duration" => 0, "x" => x, "y" => y }, + { "type" => "pointerDown" }, + { "type" => "pointerUp" } + ] + }] + } + code, resp = http_post("/session/#{sid}/actions", body, port) + raise "tap failed: HTTP #{code}: #{resp}" unless code.between?(200, 299) +end + +port = PORT +wait_strategy = nil +wait_value = nil +wait_timeout = 3.0 + +parser = OptionParser.new do |opts| + opts.banner = "Usage: tap.rb <aid|text|at> <value> [--wait-aid AID|--wait-text TXT] [--timeout SEC] [--port PORT]" + opts.on("--wait-aid AID") { |v| wait_strategy = "aid"; wait_value = v } + opts.on("--wait-text TXT") { |v| wait_strategy = "text"; wait_value = v } + opts.on("--timeout SEC", Float) { |v| wait_timeout = v } + opts.on("--port PORT", Integer) { |v| port = v } +end +parser.parse! 
+ +if ARGV.size < 2 + $stderr.puts parser.help + exit 2 +end + +strategy = ARGV[0] +value = ARGV[1..].join(" ") + +begin + case strategy + when "at" + x, y = value.split(",").map { |s| Float(s) } + tap_at(x, y, port) + puts "tap ok at #{x.round(1)},#{y.round(1)}" + when "aid", "text" + eid = find_first(strategy, value, port) + unless eid + $stderr.puts "no match for #{strategy}: #{value}" + exit 1 + end + x, y = element_center(eid, port) + tap_at(x, y, port) + puts "tap ok #{strategy}=#{value} at #{x.round(1)},#{y.round(1)}" + else + $stderr.puts "unknown strategy: #{strategy} (use aid, text, or at)" + exit 2 + end + + if wait_strategy + elapsed = wait_for(wait_strategy, wait_value, wait_timeout, port) + if elapsed + puts "wait ok #{wait_strategy}=#{wait_value} in #{elapsed.round(2)}s" + else + $stderr.puts "wait timeout: #{wait_strategy}=#{wait_value} not seen in #{wait_timeout}s" + exit 1 + end + end +rescue => e + $stderr.puts "error: #{e.message}" + exit 2 +end diff --git a/.claude/skills/ios-sim-navigation/scripts/type.rb b/.claude/skills/ios-sim-navigation/scripts/type.rb new file mode 100755 index 000000000000..230a63b2dc9d --- /dev/null +++ b/.claude/skills/ios-sim-navigation/scripts/type.rb @@ -0,0 +1,247 @@ +#!/usr/bin/env ruby +# type.rb — Focus a text field and type into it via WebDriverAgent. +# +# One Ruby invocation = one Claude turn. Collapses the four-step +# "tap field / wait for keyboard / send keys / read value back" loop +# into a single call. Reuses the session that tap.rb persists at +# /tmp/wda-<port>.session (bound to the foreground app's bundleId). +# +# Usage: +# type.rb aid <field-aid> --text "Hello world" +# type.rb text "<field-label>" --text "Hello world" +# +# Options: +# --text TXT Text to send. Required. +# --no-verify Skip the post-type readback. By default the +# script reads the field's `value` (or `label` +# as fallback) and exits 1 if it doesn't +# contain TXT — catching dropped keypresses +# without an extra tool call. 
+# --keyboard-timeout SEC Max seconds to wait for the keyboard to +# appear after tapping the field (default 3.0). +# --no-focus Skip the tap + keyboard wait. Use when the +# field is already focused (e.g. a fresh post +# editor that auto-focuses its title). +# --port PORT WDA port (default: 8100, or $WDA_PORT). +# +# Why a single string and not per-character: WDA's /wda/keys accepts +# an array whose entries are sent as keystrokes. A single entry like +# `"hello"` types the whole word. Per-character arrays are only useful +# when you need to mix in control codes (e.g. `""` for Ctrl+A +# inside a clear-field sequence). +# +# Exit codes: +# 0 field tapped (if needed), keyboard up, text sent +# (and field value/label contains TXT unless --no-verify) +# 1 field not found, keyboard didn't appear, or verify mismatch +# 2 WDA error / usage error + +require "net/http" +require "json" +require "uri" +require "optparse" + +PORT = (ENV["WDA_PORT"] || 8100).to_i +SESSION_FILE = "/tmp/wda-#{PORT}.session" + +def base_url(port = PORT) + "http://localhost:#{port}" +end + +def http_get(path, port = PORT) + uri = URI("#{base_url(port)}#{path}") + res = Net::HTTP.start(uri.host, uri.port) { |h| h.request(Net::HTTP::Get.new(uri)) } + [res.code.to_i, res.body] +rescue Errno::ECONNREFUSED + raise "WDA not reachable on port #{port}. Run wda-start.rb first." +end + +def http_post(path, body, port = PORT) + uri = URI("#{base_url(port)}#{path}") + req = Net::HTTP::Post.new(uri, "Content-Type" => "application/json") + req.body = JSON.dump(body) + res = Net::HTTP.start(uri.host, uri.port) { |h| h.request(req) } + [res.code.to_i, res.body] +rescue Errno::ECONNREFUSED + raise "WDA not reachable on port #{port}. Run wda-start.rb first." 
+end + +def active_bundle(port = PORT) + code, body = http_get("/wda/activeAppInfo", port) + return nil unless code == 200 + JSON.parse(body).dig("value", "bundleId") +rescue + nil +end + +def session_id(port = PORT) + if File.exist?(SESSION_FILE) + sid = JSON.parse(File.read(SESSION_FILE))["session_id"] rescue nil + if sid + code, _ = http_get("/session/#{sid}", port) + return sid if code == 200 + end + end + + caps = { "alwaysMatch" => {} } + bundle = active_bundle(port) + caps["alwaysMatch"]["bundleId"] = bundle if bundle + + code, body = http_post("/session", { "capabilities" => caps }, port) + raise "session create failed: HTTP #{code}: #{body}" unless code.between?(200, 299) + sid = JSON.parse(body).dig("value", "sessionId") + raise "no sessionId in response: #{body}" unless sid + File.write(SESSION_FILE, JSON.dump({ session_id: sid, port: port })) + sid +end + +def locator_for(strategy, value) + case strategy + when "aid" + ["accessibility id", value] + when "text" + escaped = value.gsub("'", "\\\\'") + ["predicate string", "label == '#{escaped}' OR name == '#{escaped}'"] + else + raise "unknown strategy: #{strategy}" + end +end + +def find_first(strategy, value, port = PORT) + using, val = locator_for(strategy, value) + sid = session_id(port) + code, body = http_post("/session/#{sid}/elements", { "using" => using, "value" => val }, port) + return nil if code == 404 + raise "find failed: HTTP #{code}: #{body}" unless code.between?(200, 299) + matches = JSON.parse(body)["value"] || [] + return nil if matches.empty? 
+ m = matches.first + m["ELEMENT"] || m["element-6066-11e4-a52e-4f735466cecf"] || m.values.first +end + +def element_center(eid, port = PORT) + sid = session_id(port) + code, body = http_get("/session/#{sid}/element/#{eid}/rect", port) + raise "rect failed: HTTP #{code}: #{body}" unless code.between?(200, 299) + rect = JSON.parse(body)["value"] + [rect["x"] + rect["width"] / 2.0, rect["y"] + rect["height"] / 2.0] +end + +def tap_at(x, y, port = PORT) + sid = session_id(port) + body = { + "actions" => [{ + "type" => "pointer", "id" => "finger1", + "parameters" => { "pointerType" => "touch" }, + "actions" => [ + { "type" => "pointerMove", "duration" => 0, "x" => x, "y" => y }, + { "type" => "pointerDown" }, + { "type" => "pointerUp" } + ] + }] + } + code, resp = http_post("/session/#{sid}/actions", body, port) + raise "tap failed: HTTP #{code}: #{resp}" unless code.between?(200, 299) +end + +# Poll /elements for XCUIElementTypeKeyboard. The cheap check from SKILL.md. +def wait_for_keyboard(timeout, port = PORT) + sid = session_id(port) + deadline = Time.now + timeout + loop do + code, body = http_post( + "/session/#{sid}/elements", + { "using" => "class name", "value" => "XCUIElementTypeKeyboard" }, + port + ) + if code.between?(200, 299) + matches = (JSON.parse(body)["value"] rescue nil) || [] + return true unless matches.empty? + end + return false if Time.now >= deadline + sleep 0.1 + end +end + +def send_keys(text, port = PORT) + sid = session_id(port) + code, resp = http_post("/session/#{sid}/wda/keys", { "value" => [text] }, port) + raise "send_keys failed: HTTP #{code}: #{resp}" unless code.between?(200, 299) +end + +# Try the `value` attribute first; fall back to `label` if `value` is nil. +# Many SwiftUI / UIKit text-input controls expose the typed text via the +# enclosing element's `label` ("Post title. Hello world") even when that +# element's `value` is nil because the text lives on a descendant TextView. 
+def element_observed_text(eid, port = PORT) + sid = session_id(port) + ["value", "label"].each do |attr| + code, body = http_get("/session/#{sid}/element/#{eid}/attribute/#{attr}", port) + next unless code.between?(200, 299) + observed = JSON.parse(body)["value"] + return observed if observed && !observed.to_s.empty? + end + nil +end + +port = PORT +text_to_send = nil +verify = true +keyboard_timeout = 3.0 +no_focus = false + +parser = OptionParser.new do |opts| + opts.banner = "Usage: type.rb <aid|text> <field-locator> --text TXT [--no-verify] [--no-focus] [--keyboard-timeout SEC] [--port PORT]" + opts.on("--text TXT") { |v| text_to_send = v } + opts.on("--no-verify") { verify = false } + opts.on("--no-focus") { no_focus = true } + opts.on("--keyboard-timeout SEC", Float) { |v| keyboard_timeout = v } + opts.on("--port PORT", Integer) { |v| port = v } +end +parser.parse! + +if ARGV.size < 2 || text_to_send.nil? + $stderr.puts parser.help + exit 2 +end + +strategy = ARGV[0] +locator_value = ARGV[1..].join(" ") + +unless %w[aid text].include?(strategy) + $stderr.puts "unknown strategy: #{strategy} (use aid or text)" + exit 2 +end + +begin + eid = find_first(strategy, locator_value, port) + unless eid + $stderr.puts "no match for #{strategy}: #{locator_value}" + exit 1 + end + + unless no_focus + x, y = element_center(eid, port) + tap_at(x, y, port) + unless wait_for_keyboard(keyboard_timeout, port) + $stderr.puts "keyboard didn't appear within #{keyboard_timeout}s after tapping #{strategy}=#{locator_value}" + exit 1 + end + end + + send_keys(text_to_send, port) + + if verify + observed = element_observed_text(eid, port) + if observed.nil? 
|| !observed.to_s.include?(text_to_send) + $stderr.puts "verify failed: expected to contain #{text_to_send.inspect}, got #{observed.inspect}" + exit 1 + end + puts "type ok #{strategy}=#{locator_value} verified=#{observed.inspect}" + else + puts "type ok #{strategy}=#{locator_value} sent #{text_to_send.inspect}" + end +rescue => e + $stderr.puts "error: #{e.message}" + exit 2 +end diff --git a/.claude/skills/ios-sim-navigation/scripts/wda-start.rb b/.claude/skills/ios-sim-navigation/scripts/wda-start.rb index 68c5dd8e3ee0..331a451f089e 100755 --- a/.claude/skills/ios-sim-navigation/scripts/wda-start.rb +++ b/.claude/skills/ios-sim-navigation/scripts/wda-start.rb @@ -1,8 +1,16 @@ #!/usr/bin/env ruby # Start WebDriverAgent server on a simulator. # -# Runs `xcodebuild test-without-building` in the background and waits -# for WDA to respond on the specified port. +# Workflow: +# 1. Clone WebDriverAgent into `<cwd>/.build/WebDriverAgent` if absent. +# 2. Run `xcodebuild build-for-testing` synchronously (foreground, so +# its progress is visible). Incremental — fast on warm cache. +# 3. Spawn `xcodebuild test-without-building` in the background and +# poll `/status` until WDA responds (~60 s). +# +# Invoke from the project root that should own the `.build/` cache — +# the WebDriverAgent path is resolved relative to the current working +# directory. # # Usage: ./wda-start.rb [--udid <UDID>] [--port <PORT>] # @@ -18,6 +26,7 @@ require "optparse" require "net/http" require "json" +require "fileutils" DEFAULT_PORT = 8100 @@ -71,15 +80,40 @@ def wda_running?(port) exit 0 end -# Find the WDA project -wda_project = File.join(Dir.pwd, ".build", "WebDriverAgent", "WebDriverAgent.xcodeproj") +# Find (or clone) the WDA project. `.build/WebDriverAgent` lives next to +# the caller's cwd so test runs share one cache per project root. 
+wda_dir = File.join(Dir.pwd, ".build", "WebDriverAgent") +wda_project = File.join(wda_dir, "WebDriverAgent.xcodeproj") unless File.exist?(wda_project) - $stderr.puts "Error: WebDriverAgent project not found at #{wda_project}" - $stderr.puts "Clone it: git clone https://github.com/appium/WebDriverAgent.git .build/WebDriverAgent" - exit 2 + puts "WebDriverAgent not found at #{wda_dir}; cloning..." + FileUtils.mkdir_p(File.dirname(wda_dir)) + clone_ok = system("git", "clone", "--depth", "1", + "https://github.com/appium/WebDriverAgent.git", wda_dir) + unless clone_ok && File.exist?(wda_project) + $stderr.puts "Error: failed to clone WebDriverAgent into #{wda_dir}" + exit 2 + end +end + +# Build first, synchronously, so cold checkouts complete their build +# phase before we start polling for WDA to come up. Incremental, so +# warm runs cost nothing. +puts "Building WebDriverAgent for testing (incremental on warm cache)..." +build_ok = system( + "xcodebuild", "build-for-testing", + "-project", wda_project, + "-scheme", "WebDriverAgentRunner", + "-destination", "id=#{udid}", + "CODE_SIGNING_ALLOWED=NO" +) +unless build_ok + $stderr.puts "Error: xcodebuild build-for-testing failed" + exit 1 end -# Start xcodebuild test-without-building in the background +# Then run the test bundle (which hosts the WDA server) in the +# background. The 60 s `/status` poll below is for WDA to come up — +# the build is already done. cmd = [ "xcodebuild", "test-without-building", "-project", wda_project, diff --git a/docs/simulator-sign-in.md b/docs/simulator-sign-in.md index 99ea9bb77f87..2eb57c16c2ad 100644 --- a/docs/simulator-sign-in.md +++ b/docs/simulator-sign-in.md @@ -8,7 +8,6 @@ Launch the app with credentials to enable automatic sign-in on the simulator. 
```bash xcrun simctl launch --terminate-running-process booted org.wordpress \ - -ui-test-reset-everything \ -ui-test-site-url https://example.com \ -ui-test-site-user <username> \ -ui-test-site-pass <application-password> @@ -18,10 +17,12 @@ xcrun simctl launch --terminate-running-process booted org.wordpress \ ```bash xcrun simctl launch --terminate-running-process booted org.wordpress \ - -ui-test-reset-everything \ -ui-test-wpcom-token <bearer-token> ``` +If the app has lingering state from a previous run, reset it in a separate +command before the sign-in launch — see [Launch Arguments](#launch-arguments). + ## Step 2: Signing In After launching with credentials, the app displays a sign-in page with two buttons: **"Continue with WordPress.com"** and **"Enter your existing site address"**. @@ -37,3 +38,21 @@ Tap **"Continue with WordPress.com"**. You will be automatically signed in. 3. Tap **"Continue"** You will be automatically signed in. + +## Launch Arguments + +- `-ui-test-reset-everything` — wipes the Core Data store and `UserDefaults` + on launch. Skip it when the simulator is already fresh (e.g. just after + `xcrun simctl erase`). + + Don't combine it with the sign-in arguments in one command. Run it on its + own, then relaunch with the credentials: + + ```bash + xcrun simctl launch --terminate-running-process booted org.wordpress \ + -ui-test-reset-everything + xcrun simctl launch --terminate-running-process booted org.wordpress \ + -ui-test-site-url https://example.com \ + -ui-test-site-user <username> \ + -ui-test-site-pass <application-password> + ```
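Reviewer note: `wait_for` in `tap.rb` and `wait_for_keyboard` in `type.rb` share the same poll-until-deadline loop. A minimal sketch of that loop in isolation follows, assuming a hypothetical helper named `poll_until` (not part of this PR). It swaps `Time.now` for the monotonic clock so a wall-clock adjustment mid-wait cannot stretch or shorten the timeout; that swap is a suggestion, not what the scripts currently do.

```ruby
# Hypothetical helper, not part of the PR: run the block repeatedly until it
# returns a truthy value or `timeout` seconds pass. Returns the elapsed
# seconds on success and nil on timeout, like wait_for in tap.rb.
def poll_until(timeout, interval: 0.25)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  deadline = start + timeout
  loop do
    # The check always runs at least once, even with a zero timeout.
    return Process.clock_gettime(Process::CLOCK_MONOTONIC) - start if yield
    return nil if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end

# Example: a condition that becomes true on the third check.
attempts = 0
elapsed = poll_until(1.0, interval: 0.01) { (attempts += 1) >= 3 }
puts(elapsed ? "condition met after #{attempts} checks" : "timed out")
```

In the scripts, the block would wrap the `/session/<sid>/elements` lookup and return whether the match list is non-empty; the 0.25 s and 0.1 s sleep intervals would map onto `interval:`.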