1 change: 1 addition & 0 deletions .github/workflows/release.yaml
@@ -266,3 +266,4 @@ jobs:
release-staging/ios-automation-server.tar.gz.sha256
install.sh
run-visiontest.sh
AGENT_INSTRUCTIONS.md
86 changes: 86 additions & 0 deletions AGENT_INSTRUCTIONS.md
@@ -0,0 +1,86 @@
# VisionTest Mobile Automation

VisionTest provides a CLI for automating Android devices and iOS simulators. Every command requires `--platform android` or `--platform ios` (alias `-p`). The CLI reuses the same backend as the MCP server tools.

## Standard Automation Loop

```
1. Start the server → visiontest start_automation_server -p <platform>
2. Take a screenshot → visiontest screenshot -p <platform>
3. Inspect elements → visiontest get_interactive_elements -p <platform>
4. Interact → visiontest tap_by_coordinates -p <platform> <x> <y>
5. Repeat from step 2
```

## Commands

### Setup
| Command | Platforms | Description |
|---------|-----------|-------------|
| `install_automation_server` | android | Install automation APKs on device |
| `start_automation_server` | android, ios | Start the automation server |
| `automation_server_status` | android, ios | Check if server is running |

### Inspection
| Command | Platforms | Description |
|---------|-----------|-------------|
| `get_interactive_elements [--include-disabled]` | android, ios | List tappable elements with coordinates |
| `get_ui_hierarchy` | android, ios | Full UI tree as XML |
| `get_device_info` | android, ios | Display size, rotation, SDK/iOS version |
| `screenshot [--output PATH]` | android, ios | Save PNG (default: `./screenshots/`) |

### Interaction
| Command | Platforms | Description |
|---------|-----------|-------------|
| `tap_by_coordinates <x> <y>` | android, ios | Tap at screen coordinates |
| `input_text <text>` | android, ios | Type into focused element |
| `swipe_direction <up\|down\|left\|right> [--distance short\|medium\|long] [--speed slow\|normal\|fast]` | android, ios | Swipe gesture |

### Navigation
| Command | Platforms | Description |
|---------|-----------|-------------|
| `press_back` | android | Press back button |
| `press_home` | android, ios | Press home button |

### Apps
| Command | Platforms | Description |
|---------|-----------|-------------|
| `launch_app <id>` | android, ios | Launch by package name or bundle ID |

## The `--platform` Flag

Every command requires `--platform` (or `-p`). There is no default and no auto-detection. Android-only commands (`install_automation_server`, `press_back`) reject `--platform ios`.
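
Because Android-only commands reject `--platform ios`, a calling script may prefer to check the pairing before invoking the CLI. The following is a minimal sketch, not part of VisionTest: `platform_supports` is a hypothetical helper that encodes only the two Android-only commands named above and treats every other command name as available on both platforms.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not shipped with VisionTest).
# Returns 0 when <command> may be run with <platform>, 1 otherwise.
platform_supports() {
  local cmd="$1" platform="$2"
  case "$platform" in
    android)
      return 0 ;;                      # every documented command supports android
    ios)
      case "$cmd" in
        install_automation_server|press_back)
          return 1 ;;                  # documented as Android-only
        *)
          return 0 ;;
      esac ;;
    *)
      return 1 ;;                      # only android and ios are valid values
  esac
}
```

A script could call `platform_supports press_back "$PLATFORM" || echo "skipping press_back" >&2` before running the command. The helper does not validate command names, so an unknown command still passes the check and is left for the CLI to reject.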

## Exit Codes

| Code | Meaning | What to do |
|------|---------|------------|
| 0 | Success | Continue |
| 1 | Generic failure | Read stderr, retry or escalate |
| 2 | Usage error | Fix the command arguments |
| 3 | Server not reachable | Run `start_automation_server` first |
| 4 | Device not found | Connect a device or boot a simulator |
| 5 | Platform not supported | Use the correct `--platform` value |
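
The "What to do" column can be turned into a small wrapper. The sketch below is illustrative only: `vt` and `run_step` are hypothetical helper names, and the `VISIONTEST` variable is an assumed override (not a VisionTest setting) that lets the wrapper be pointed at a stub. It retries once after starting the server when a command exits with code 3, and prints a hint for the other codes.

```bash
#!/usr/bin/env bash
# VISIONTEST is a hypothetical override used to swap in a stub binary.
vt() { "${VISIONTEST:-visiontest}" "$@"; }

# Usage: run_step <platform> <command> [args...]
run_step() {
  local platform="$1" cmd="$2"
  shift 2
  local status=0
  vt "$cmd" -p "$platform" "$@" || status=$?

  if [ "$status" -eq 3 ]; then
    # Code 3: server not reachable. Start it, then retry exactly once.
    vt start_automation_server -p "$platform" || return $?
    status=0
    vt "$cmd" -p "$platform" "$@" || status=$?
  fi

  case "$status" in
    0) ;;
    2) echo "usage error: fix the arguments to $cmd" >&2 ;;
    4) echo "device not found: connect a device or boot a simulator" >&2 ;;
    5) echo "$cmd is not supported on $platform" >&2 ;;
    *) echo "$cmd failed with exit code $status; see stderr above" >&2 ;;
  esac
  return "$status"
}
```

For example, `run_step android tap_by_coordinates 540 1200` runs `visiontest tap_by_coordinates -p android 540 1200` and returns the CLI's exit code, so the caller can still branch on it.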

## Flutter Apps

In Flutter apps, text labels appear in `content-desc` (contentDescription) rather than `text`. If `get_interactive_elements` returns elements without visible text, look at the `contentDescription` field instead.

## Example Session

```bash
# Android workflow
visiontest install_automation_server -p android
visiontest start_automation_server -p android
visiontest screenshot -p android
visiontest get_interactive_elements -p android
visiontest tap_by_coordinates -p android 540 1200
visiontest input_text -p android "hello"
visiontest screenshot -p android --output ./after.png

# iOS workflow
visiontest start_automation_server -p ios
visiontest screenshot -p ios
visiontest get_interactive_elements -p ios
visiontest tap_by_coordinates -p ios 200 400
```
31 changes: 31 additions & 0 deletions CLAUDE.md
@@ -135,6 +135,37 @@ Both automation servers expose `GET /health` and `POST /jsonrpc` (JSON-RPC 2.0)

> **Flutter apps:** Text labels use `content-desc` (contentDescription) instead of `text`. If `find_element` by `text` fails, retry with `contentDescription`.

## CLI Usage

The same operations available as MCP tools can be invoked directly from the command line. Every command requires `--platform android` or `--platform ios` (alias `-p`). With no arguments, `visiontest` starts the MCP stdio server as before.

| Command | Platforms | Required args | Optional flags |
|---------|-----------|---------------|----------------|
| `install_automation_server` | android | — | — |
| `start_automation_server` | android, ios | — | — |
| `automation_server_status` | android, ios | — | — |
| `get_interactive_elements` | android, ios | — | `--include-disabled` |
| `get_ui_hierarchy` | android, ios | — | — |
| `get_device_info` | android, ios | — | — |
| `screenshot` | android, ios | — | `--output PATH` |
| `tap_by_coordinates` | android, ios | `x` `y` (ints) | — |
| `input_text` | android, ios | `text` (string) | — |
| `swipe_direction` | android, ios | `direction` (up\|down\|left\|right) | `--distance`, `--speed` |
| `press_back` | android | — | — |
| `press_home` | android, ios | — | — |
| `launch_app` | android, ios | `id` (string) | — |

### CLI Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Generic failure |
| 2 | Usage error (missing/invalid args) |
| 3 | Automation server not reachable |
| 4 | Device/simulator not found |
| 5 | Platform not supported for this command |

## Key Patterns

- All device operations use suspend functions with coroutine-based async
41 changes: 39 additions & 2 deletions CONTRIBUTING.md
@@ -51,8 +51,15 @@ VisionTest has three components:
visiontest/
├── app/ # MCP Server (Kotlin/JVM)
│ └── src/main/kotlin/com/example/visiontest/
│ ├── Main.kt # Entry point
│ ├── Main.kt # Entry point (MCP server or CLI dispatch)
│ ├── ToolFactory.kt # Thin coordinator wiring registrars
│ ├── cli/
│ │ ├── VisionTestCli.kt # Root Clikt command with 13 subcommands
│ │ ├── CliErrorHandler.kt # Exit-code mapping + runCliCommand
│ │ ├── CliExit.kt # CliExit exception + ExitCode enum
│ │ ├── PlatformOption.kt # Platform enum + --platform option helpers
│ │ ├── ComponentHolder.kt # Lazy DI graph for CLI commands
│ │ └── commands/ # 13 Clikt subcommand files
│ ├── tools/
│ │ ├── ToolDsl.kt # ToolScope DSL + CallToolRequest helpers
│ │ ├── ToolRegistrar.kt # Interface for modular registration
@@ -259,6 +266,13 @@ All Gradle tests are pure JVM unit tests (no device or emulator required). iOS t
| `app/` | `AppConfigTest.kt` | Default configuration values |
| `app/` | `ToolFactoryHelpersTest.kt` | `ToolHelpers.extractProperty`, `extractPattern`, `formatAppInfo` |
| `app/` | `ToolFactoryPathTest.kt` | `ToolDiscovery.findProjectRoot`, `findAutomationServerApk`, `resolveMainApkPath`, `findXctestrun`; `IOSAutomationToolRegistrar.buildXcodebuildCommand` |
| `app/` | `MainDispatchTest.kt` | CLI vs MCP server routing based on args |
| `app/` | `CliErrorHandlerTest.kt` | Exit-code mapping for all exception types |
| `app/` | `VisionTestCliTest.kt` | Clikt argument parsing, platform options, validation |
| `app/` | `CliCommandIntegrationTest.kt` | End-to-end CLI command delegation with MockWebServer |
| `app/` | `AndroidAutomationToolRegistrarTest.kt` | Extracted handler functions with mocked HTTP |
| `app/` | `AndroidDeviceToolRegistrarTest.kt` | Device tool functions with faked DeviceConfig |
| `app/` | `IOSDeviceToolRegistrarTest.kt` | iOS device tool functions with faked DeviceConfig |
| `automation-server/` | `JsonRpcModelsTest.kt` | JSON-RPC error factory methods, request/response defaults |
| `automation-server/` | `UiAutomatorModelsTest.kt` | Data classes, default values, enum entries |
| `automation-server/` | `ServerConfigPortTest.kt` | Port validation boundaries |
@@ -303,6 +317,27 @@ curl -X POST http://localhost:9009/jsonrpc -H 'Content-Type: application/json' \
# Stop the server: kill the xcodebuild process (Ctrl+C in Terminal 1)
```

### Testing the Installer

You can test `install.sh` locally without publishing a release using `--local-jar`:

```bash
# Build the fat JAR first
./gradlew shadowJar

# Run the installer against a temporary directory
VISIONTEST_DIR=~/.local/share/visiontest-test \
bash install.sh --local-jar app/build/libs/visiontest.jar

# Verify it works
~/.local/bin/visiontest --help

# Clean up
rm -rf ~/.local/share/visiontest-test
```

This skips downloading the JAR, APKs, and iOS bundle from GitHub Releases — it copies your local build instead. Agent instructions are still installed from `AGENT_INSTRUCTIONS.md` in the repo root. Add `--skip-agent-setup` to skip that too.

## Extending VisionTest

### Adding New JSON-RPC Methods
@@ -315,7 +350,9 @@ curl -X POST http://localhost:9009/jsonrpc -H 'Content-Type: application/json' \
### Adding New MCP Tools

1. Add the tool to the appropriate registrar in `tools/` using the `ToolScope` DSL
2. The tool is automatically registered via `ToolFactory.registerAllTools()`
2. Extract the handler body into an `internal suspend fun` on the registrar (for CLI reuse)
3. The MCP tool is automatically registered via `ToolFactory.registerAllTools()`
4. Optionally, add a CLI subcommand in `cli/commands/` and register it in `VisionTestCli.kt`

## Error Codes

16 changes: 16 additions & 0 deletions LEARNING.md
@@ -645,3 +645,19 @@ iOS elements expose different properties than Android. The bridge maps them to a
- [Model Context Protocol (MCP)](https://modelcontextprotocol.io/)
- [Template Method Pattern](https://refactoring.guru/design-patterns/template-method)
- [OWASP Mobile Security Guide](https://owasp.org/www-project-mobile-app-security/)

## Dual Facade: MCP Tools + CLI

### Handler Extraction Refactor

Each MCP tool's handler body was extracted into an `internal suspend fun` on its registrar class. The MCP `scope.tool { }` block now handles only arg extraction from `CallToolRequest` and delegates to the extracted function. The CLI subcommands call the same functions directly with typed parameters.

This means both facades — MCP and CLI — share one implementation. Bug fixes and behavior changes apply to both paths automatically.

### Why Not Interfaces?

The extracted functions live on the concrete registrar classes rather than behind interfaces. This keeps the refactor minimal (no new types). The functions are `internal`, so they are accessible only within the `app/` module. If a third facade is needed later, introducing interfaces would be straightforward.

### Deferred: `--json` Output and Daemon Mode

The MVP CLI outputs plain text (the same strings the MCP tools return). Structured `--json` output was deferred because the MCP tools already return prose strings — adding JSON would require changing the return types of every extracted function. Daemon mode (keeping the process alive across multiple commands) was deferred to avoid the complexity of long-lived state management in the CLI path.
31 changes: 30 additions & 1 deletion README.md
@@ -156,6 +156,33 @@ Your AI coding tool discovers all available tools automatically via MCP. Just as

**iOS Automation:** `ios_start_automation_server`, `ios_automation_server_status`, `ios_get_ui_hierarchy`, `ios_get_interactive_elements`, `ios_find_element`, `ios_tap_by_coordinates`, `ios_swipe`, `ios_swipe_direction`, `ios_get_device_info`, `ios_input_text`, `ios_press_home`, `ios_stop_automation_server`

## CLI Usage

The same operations are also available as direct CLI commands — no MCP client needed:

```bash
visiontest automation_server_status -p android
visiontest get_interactive_elements -p ios
visiontest tap_by_coordinates -p android 100 200
visiontest screenshot -p ios --output ./screenshot.png
visiontest swipe_direction -p android up --distance long --speed fast
```

Every command requires `--platform android` or `--platform ios` (alias `-p`). Run `visiontest --help` for the full command list, or `visiontest <command> --help` for per-command usage.

With no arguments, or with `serve` as the first argument, `visiontest` starts the MCP stdio server.

### Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Generic failure |
| 2 | Usage error (missing/invalid args) |
| 3 | Automation server not reachable |
| 4 | Device/simulator not found |
| 5 | Platform not supported for this command |

## Configuration

### Environment Variables
@@ -175,7 +202,8 @@ Your AI coding tool discovers all available tools automatically via MCP. Just as
## Future Plans

- [x] Text input/typing support
- [ ] Screenshot capture via UIAutomator / XCUITest
- [x] Screenshot capture via UIAutomator / XCUITest
- [x] CLI mode (direct command-line usage without MCP)
- [ ] Long press operations
- [ ] Wait/sync operations for E2E testing
- [ ] Multi-device coordination
@@ -186,6 +214,7 @@ Your AI coding tool discovers all available tools automatically via MCP. Just as
- [ ] Notification/status bar interaction
- [ ] Permission dialog automation
- [ ] Video recording of automation sessions
- [ ] Separate CLI-only artifact (smaller download, no MCP dependencies)

## Contributing

4 changes: 3 additions & 1 deletion app/build.gradle.kts
@@ -66,7 +66,9 @@ dependencies {
// Gson for JSON serialization
implementation("com.google.code.gson:gson:2.10.1")


// Clikt — CLI argument parsing for the `visiontest <subcommand>` surface.
// Used only when the JAR is invoked with CLI args; MCP stdio path does not touch clikt.
implementation("com.github.ajalt.clikt:clikt:4.4.0")

// This dependency is used by the application.
implementation(libs.guava)
9 changes: 8 additions & 1 deletion app/src/main/kotlin/com/example/visiontest/Exceptions.kt
@@ -52,4 +52,11 @@ class NoSimulatorAvailableException(message: String) : Exception(message)
/**
* Thrown when a specified app cannot be found on the iOS device/simulator.
*/
class AppNotFoundException(message: String) : Exception(message)
class AppNotFoundException(message: String) : Exception(message)

/**
* Thrown when a command requires the automation server but it is not running.
* Used by both Android and iOS registrars so the CLI path can map to exit code 3
* and the MCP path can surface the message via [ErrorHandler].
*/
class ServerNotRunningException(message: String) : Exception(message)
52 changes: 50 additions & 2 deletions app/src/main/kotlin/com/example/visiontest/Main.kt
@@ -1,8 +1,14 @@
package com.example.visiontest

import com.example.visiontest.android.Android
import com.example.visiontest.cli.CliExit
import com.example.visiontest.cli.ExitCode
import com.example.visiontest.cli.VisionTestCli
import com.example.visiontest.ios.IOSManager
import com.example.visiontest.config.AppConfig
import com.github.ajalt.clikt.core.CliktError
import com.github.ajalt.clikt.core.PrintHelpMessage
import com.github.ajalt.clikt.core.UsageError
import io.ktor.utils.io.streams.asInput
import io.modelcontextprotocol.kotlin.sdk.*
import io.modelcontextprotocol.kotlin.sdk.server.*
@@ -13,7 +19,49 @@ import kotlinx.io.buffered
import org.slf4j.LoggerFactory


fun main() {
fun main(args: Array<String>) {
when (route(args)) {
Route.McpServer -> runMcpServer()
Route.Cli -> runCli(args)
}
}

private fun runCli(args: Array<String>) {
try {
VisionTestCli().parse(args)
} catch (e: CliExit) {
// Safety net: all CliExit exceptions should be caught by runCliCommand inside
// each subcommand's run(). This catch handles any that escape during arg parsing.
System.err.println(e.message)
kotlin.system.exitProcess(e.code.value)
} catch (e: UsageError) {
val defaultFormatter = object : com.github.ajalt.clikt.output.ParameterFormatter {
override fun formatOption(name: String) = name
override fun formatArgument(name: String) = "<$name>"
override fun formatSubcommand(name: String) = name
}
val loc = e.context?.localization ?: object : com.github.ajalt.clikt.output.Localization {}
val msg = e.formatMessage(loc, defaultFormatter)
System.err.println(msg)
kotlin.system.exitProcess(ExitCode.UsageError.value)
} catch (e: PrintHelpMessage) {
val cmd = e.context?.command
if (cmd != null) {
println(cmd.getFormattedHelp())
}
kotlin.system.exitProcess(if (e.error) 1 else 0)
} catch (e: CliktError) {
System.err.println(e.message.orEmpty())
kotlin.system.exitProcess(ExitCode.GenericFailure.value)
}
}

internal enum class Route { McpServer, Cli }

internal fun route(args: Array<String>): Route =
if (args.isEmpty() || args[0] == "serve") Route.McpServer else Route.Cli

private fun runMcpServer() {

val config = AppConfig.createDefault()

@@ -84,4 +132,4 @@ private fun createServer(config: AppConfig): Server {
)
)
)
}
}