diff --git a/.github/workflows/release.yaml b/.github/workflows/release.yaml index ef59922..b7d2bee 100644 --- a/.github/workflows/release.yaml +++ b/.github/workflows/release.yaml @@ -266,3 +266,4 @@ jobs: release-staging/ios-automation-server.tar.gz.sha256 install.sh run-visiontest.sh + AGENT_INSTRUCTIONS.md diff --git a/AGENT_INSTRUCTIONS.md b/AGENT_INSTRUCTIONS.md new file mode 100644 index 0000000..499a9a3 --- /dev/null +++ b/AGENT_INSTRUCTIONS.md @@ -0,0 +1,86 @@ +# VisionTest Mobile Automation + +VisionTest provides a CLI for automating Android devices and iOS simulators. Every command requires `--platform android` or `--platform ios` (alias `-p`). The CLI reuses the same backend as the MCP server tools. + +## Standard Automation Loop + +``` +1. Start the server → visiontest start_automation_server -p <platform> +2. Take a screenshot → visiontest screenshot -p <platform> +3. Inspect elements → visiontest get_interactive_elements -p <platform> +4. Interact → visiontest tap_by_coordinates -p <platform> <x> <y> +5. Repeat from step 2 +``` + +## Commands + +### Setup +| Command | Platforms | Description | +|---------|-----------|-------------| +| `install_automation_server` | android | Install automation APKs on device | +| `start_automation_server` | android, ios | Start the automation server | +| `automation_server_status` | android, ios | Check if server is running | + +### Inspection +| Command | Platforms | Description | +|---------|-----------|-------------| +| `get_interactive_elements [--include-disabled]` | android, ios | List tappable elements with coordinates | +| `get_ui_hierarchy` | android, ios | Full UI tree as XML | +| `get_device_info` | android, ios | Display size, rotation, SDK/iOS version | +| `screenshot [--output PATH]` | android, ios | Save PNG (default: `./screenshots/`) | + +### Interaction +| Command | Platforms | Description | +|---------|-----------|-------------| +| `tap_by_coordinates <x> <y>` | android, ios | Tap at screen coordinates | +| `input_text <text>` | android, ios | Type into focused 
element | +| `swipe_direction <direction> [--distance short\|medium\|long] [--speed slow\|normal\|fast]` | android, ios | Swipe gesture | + +### Navigation +| Command | Platforms | Description | +|---------|-----------|-------------| +| `press_back` | android | Press back button | +| `press_home` | android, ios | Press home button | + +### Apps +| Command | Platforms | Description | +|---------|-----------|-------------| +| `launch_app <id>` | android, ios | Launch by package name or bundle ID | + +## The `--platform` Flag + +Every command requires `--platform` (or `-p`). There is no default and no auto-detection. Android-only commands (`install_automation_server`, `press_back`) reject `--platform ios`. + +## Exit Codes + +| Code | Meaning | What to do | +|------|---------|------------| +| 0 | Success | Continue | +| 1 | Generic failure | Read stderr, retry or escalate | +| 2 | Usage error | Fix the command arguments | +| 3 | Server not reachable | Run `start_automation_server` first | +| 4 | Device not found | Connect a device or boot a simulator | +| 5 | Platform not supported | Use the correct `--platform` value | + +## Flutter Apps + +In Flutter apps, text labels appear in `content-desc` (contentDescription) rather than `text`. If `get_interactive_elements` returns elements without visible text, look at the `contentDescription` field instead. 
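Reviewer sketch (not part of the patch): the exit-code table in this file lends itself to a small wrapper that an agent script could use to react to each code. The following is a minimal sketch under stated assumptions: `run_step` and `VT_PLATFORM` are hypothetical names introduced here for illustration, and the only VisionTest behavior relied on is the documented command names, the `-p` flag, and exit codes 0 to 5.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with VisionTest: run one command and
# react to the exit codes documented in AGENT_INSTRUCTIONS.md.
# VT_PLATFORM is an illustrative variable; the CLI itself has no default platform.
VT_PLATFORM="${VT_PLATFORM:-android}"

run_step() {
  local cmd="$1"; shift
  local rc=0
  visiontest "$cmd" -p "$VT_PLATFORM" "$@" || rc=$?
  case "$rc" in
    0) return 0 ;;
    3) # Server not reachable: start it, then retry the command once.
       visiontest start_automation_server -p "$VT_PLATFORM" || return $?
       visiontest "$cmd" -p "$VT_PLATFORM" "$@" ;;
    2) echo "usage error: fix the arguments for '$cmd'" >&2; return 2 ;;
    4) echo "no device: connect a device or boot a simulator" >&2; return 4 ;;
    5) echo "'$cmd' does not support platform '$VT_PLATFORM'" >&2; return 5 ;;
    *) echo "'$cmd' failed with exit code $rc" >&2; return "$rc" ;;
  esac
}
```

A call such as `run_step tap_by_coordinates 540 1200` would then inject the platform flag and retry once if the server was not running. Retrying only on code 3 is a design choice of this sketch; the table leaves the retry policy for code 1 to the caller.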
+ +## Example Session + +```bash +# Android workflow +visiontest install_automation_server -p android +visiontest start_automation_server -p android +visiontest screenshot -p android +visiontest get_interactive_elements -p android +visiontest tap_by_coordinates -p android 540 1200 +visiontest input_text -p android "hello" +visiontest screenshot -p android --output ./after.png + +# iOS workflow +visiontest start_automation_server -p ios +visiontest screenshot -p ios +visiontest get_interactive_elements -p ios +visiontest tap_by_coordinates -p ios 200 400 +``` diff --git a/CLAUDE.md b/CLAUDE.md index f7a17c2..0dafaa1 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -135,6 +135,37 @@ Both automation servers expose `GET /health` and `POST /jsonrpc` (JSON-RPC 2.0) > **Flutter apps:** Text labels use `content-desc` (contentDescription) instead of `text`. If `find_element` by `text` fails, retry with `contentDescription`. +## CLI Usage + +The same operations available as MCP tools can be invoked directly from the command line. Every command requires `--platform android` or `--platform ios` (alias `-p`). With no arguments, `visiontest` starts the MCP stdio server as before. 
+ +| Command | Platforms | Required args | Optional flags | +|---------|-----------|---------------|----------------| +| `install_automation_server` | android | — | — | +| `start_automation_server` | android, ios | — | — | +| `automation_server_status` | android, ios | — | — | +| `get_interactive_elements` | android, ios | — | `--include-disabled` | +| `get_ui_hierarchy` | android, ios | — | — | +| `get_device_info` | android, ios | — | — | +| `screenshot` | android, ios | — | `--output PATH` | +| `tap_by_coordinates` | android, ios | `x` `y` (ints) | — | +| `input_text` | android, ios | `text` (string) | — | +| `swipe_direction` | android, ios | `direction` (up\|down\|left\|right) | `--distance`, `--speed` | +| `press_back` | android | — | — | +| `press_home` | android, ios | — | — | +| `launch_app` | android, ios | `id` (string) | — | + +### CLI Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Success | +| 1 | Generic failure | +| 2 | Usage error (missing/invalid args) | +| 3 | Automation server not reachable | +| 4 | Device/simulator not found | +| 5 | Platform not supported for this command | + ## Key Patterns - All device operations use suspend functions with coroutine-based async diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index b09400e..c2683a4 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -51,8 +51,15 @@ VisionTest has three components: visiontest/ ├── app/ # MCP Server (Kotlin/JVM) │ └── src/main/kotlin/com/example/visiontest/ -│ ├── Main.kt # Entry point +│ ├── Main.kt # Entry point (MCP server or CLI dispatch) │ ├── ToolFactory.kt # Thin coordinator wiring registrars +│ ├── cli/ +│ │ ├── VisionTestCli.kt # Root Clikt command with 13 subcommands +│ │ ├── CliErrorHandler.kt # Exit-code mapping + runCliCommand +│ │ ├── CliExit.kt # CliExit exception + ExitCode enum +│ │ ├── PlatformOption.kt # Platform enum + --platform option helpers +│ │ ├── ComponentHolder.kt # Lazy DI graph for CLI commands +│ │ └── commands/ # 13 Clikt subcommand 
files │ ├── tools/ │ │ ├── ToolDsl.kt # ToolScope DSL + CallToolRequest helpers │ │ ├── ToolRegistrar.kt # Interface for modular registration @@ -259,6 +266,13 @@ All Gradle tests are pure JVM unit tests (no device or emulator required). iOS t | `app/` | `AppConfigTest.kt` | Default configuration values | | `app/` | `ToolFactoryHelpersTest.kt` | `ToolHelpers.extractProperty`, `extractPattern`, `formatAppInfo` | | `app/` | `ToolFactoryPathTest.kt` | `ToolDiscovery.findProjectRoot`, `findAutomationServerApk`, `resolveMainApkPath`, `findXctestrun`; `IOSAutomationToolRegistrar.buildXcodebuildCommand` | +| `app/` | `MainDispatchTest.kt` | CLI vs MCP server routing based on args | +| `app/` | `CliErrorHandlerTest.kt` | Exit-code mapping for all exception types | +| `app/` | `VisionTestCliTest.kt` | Clikt argument parsing, platform options, validation | +| `app/` | `CliCommandIntegrationTest.kt` | End-to-end CLI command delegation with MockWebServer | +| `app/` | `AndroidAutomationToolRegistrarTest.kt` | Extracted handler functions with mocked HTTP | +| `app/` | `AndroidDeviceToolRegistrarTest.kt` | Device tool functions with faked DeviceConfig | +| `app/` | `IOSDeviceToolRegistrarTest.kt` | iOS device tool functions with faked DeviceConfig | | `automation-server/` | `JsonRpcModelsTest.kt` | JSON-RPC error factory methods, request/response defaults | | `automation-server/` | `UiAutomatorModelsTest.kt` | Data classes, default values, enum entries | | `automation-server/` | `ServerConfigPortTest.kt` | Port validation boundaries | @@ -303,6 +317,27 @@ curl -X POST http://localhost:9009/jsonrpc -H 'Content-Type: application/json' \ # Stop the server: kill the xcodebuild process (Ctrl+C in Terminal 1) ``` +### Testing the Installer + +You can test `install.sh` locally without publishing a release using `--local-jar`: + +```bash +# Build the fat JAR first +./gradlew shadowJar + +# Run the installer against a temporary directory +VISIONTEST_DIR=~/.local/share/visiontest-test \ + 
bash install.sh --local-jar app/build/libs/visiontest.jar + +# Verify it works +~/.local/bin/visiontest --help + +# Clean up +rm -rf ~/.local/share/visiontest-test +``` + +This skips downloading the JAR, APKs, and iOS bundle from GitHub Releases — it copies your local build instead. Agent instructions are still installed from `AGENT_INSTRUCTIONS.md` in the repo root. Add `--skip-agent-setup` to skip that too. + ## Extending VisionTest ### Adding New JSON-RPC Methods @@ -315,7 +350,9 @@ curl -X POST http://localhost:9009/jsonrpc -H 'Content-Type: application/json' \ ### Adding New MCP Tools 1. Add the tool to the appropriate registrar in `tools/` using the `ToolScope` DSL -2. The tool is automatically registered via `ToolFactory.registerAllTools()` +2. Extract the handler body into an `internal suspend fun` on the registrar (for CLI reuse) +3. The MCP tool is automatically registered via `ToolFactory.registerAllTools()` +4. Optionally, add a CLI subcommand in `cli/commands/` and register it in `VisionTestCli.kt` ## Error Codes diff --git a/LEARNING.md b/LEARNING.md index 26051d1..cedbd5e 100644 --- a/LEARNING.md +++ b/LEARNING.md @@ -645,3 +645,19 @@ iOS elements expose different properties than Android. The bridge maps them to a - [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) - [Template Method Pattern](https://refactoring.guru/design-patterns/template-method) - [OWASP Mobile Security Guide](https://owasp.org/www-project-mobile-app-security/) + +## Dual Facade: MCP Tools + CLI + +### Handler Extraction Refactor + +Each MCP tool's handler body was extracted into an `internal suspend fun` on its registrar class. The MCP `scope.tool { }` block now handles only arg extraction from `CallToolRequest` and delegates to the extracted function. The CLI subcommands call the same functions directly with typed parameters. + +This means both facades — MCP and CLI — share one implementation. Bug fixes and behavior changes apply to both paths automatically. 
+ +### Why Not Interfaces? + +The extracted functions live on the concrete registrar classes rather than behind interfaces. This keeps the refactor minimal (no new types) and the functions are `internal` to the module, so they're only accessible within `app/`. If a third facade is needed later, introducing interfaces would be straightforward. + +### Deferred: `--json` Output and Daemon Mode + +The MVP CLI outputs plain text (the same strings the MCP tools return). Structured `--json` output was deferred because the MCP tools already return prose strings — adding JSON would require changing the return types of every extracted function. Daemon mode (keeping the process alive across multiple commands) was deferred to avoid the complexity of long-lived state management in the CLI path. diff --git a/README.md b/README.md index e751fe8..1a804e1 100644 --- a/README.md +++ b/README.md @@ -156,6 +156,33 @@ Your AI coding tool discovers all available tools automatically via MCP. Just as **iOS Automation:** `ios_start_automation_server`, `ios_automation_server_status`, `ios_get_ui_hierarchy`, `ios_get_interactive_elements`, `ios_find_element`, `ios_tap_by_coordinates`, `ios_swipe`, `ios_swipe_direction`, `ios_get_device_info`, `ios_input_text`, `ios_press_home`, `ios_stop_automation_server` +## CLI Usage + +The same operations are also available as direct CLI commands — no MCP client needed: + +```bash +visiontest automation_server_status -p android +visiontest get_interactive_elements -p ios +visiontest tap_by_coordinates -p android 100 200 +visiontest screenshot -p ios --output ./screenshot.png +visiontest swipe_direction -p android up --distance long --speed fast +``` + +Every command requires `--platform android` or `--platform ios` (alias `-p`). Run `visiontest --help` for the full command list, or `visiontest <command> --help` for per-command usage. + +With no arguments, `visiontest` starts the MCP stdio server. 
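Reviewer sketch (not part of the patch): the README commands above compose in a script. The following is a minimal, hedged example that wraps a tap with before/after screenshots so the effect of the tap can be reviewed. `tap_with_evidence` and the `./evidence` default directory are hypothetical names introduced for illustration; only `screenshot --output PATH`, `tap_by_coordinates`, and `-p` come from the documentation in this patch.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of VisionTest: tap a point and keep
# before/after screenshots for later inspection.
tap_with_evidence() {
  local platform="$1" x="$2" y="$3" out_dir="${4:-./evidence}"
  mkdir -p "$out_dir" || return 1
  # Stop at the first failing step and propagate its exit code unchanged.
  visiontest screenshot -p "$platform" --output "$out_dir/before.png" || return $?
  visiontest tap_by_coordinates -p "$platform" "$x" "$y" || return $?
  visiontest screenshot -p "$platform" --output "$out_dir/after.png"
}
```

Because each step's exit code is passed through, a caller can still branch on the documented codes (for example 3 for an unreachable server).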
+ +### Exit Codes + +| Code | Meaning | +|------|---------| +| 0 | Success | +| 1 | Generic failure | +| 2 | Usage error (missing/invalid args) | +| 3 | Automation server not reachable | +| 4 | Device/simulator not found | +| 5 | Platform not supported for this command | + ## Configuration ### Environment Variables @@ -175,7 +202,8 @@ Your AI coding tool discovers all available tools automatically via MCP. Just as ## Future Plans - [x] Text input/typing support -- [ ] Screenshot capture via UIAutomator / XCUITest +- [x] Screenshot capture via UIAutomator / XCUITest +- [x] CLI mode (direct command-line usage without MCP) - [ ] Long press operations - [ ] Wait/sync operations for E2E testing - [ ] Multi-device coordination @@ -186,6 +214,7 @@ Your AI coding tool discovers all available tools automatically via MCP. Just as - [ ] Notification/status bar interaction - [ ] Permission dialog automation - [ ] Video recording of automation sessions +- [ ] Separate CLI-only artifact (smaller download, no MCP dependencies) ## Contributing diff --git a/app/build.gradle.kts b/app/build.gradle.kts index 4b2c54a..a2b3424 100644 --- a/app/build.gradle.kts +++ b/app/build.gradle.kts @@ -66,7 +66,9 @@ dependencies { // Gson for JSON serialization implementation("com.google.code.gson:gson:2.10.1") - + // Clikt — CLI argument parsing for the `visiontest ` surface. + // Used only when the JAR is invoked with CLI args; MCP stdio path does not touch clikt. + implementation("com.github.ajalt.clikt:clikt:4.4.0") // This dependency is used by the application. 
implementation(libs.guava) diff --git a/app/src/main/kotlin/com/example/visiontest/Exceptions.kt b/app/src/main/kotlin/com/example/visiontest/Exceptions.kt index aa9fa41..a68788f 100644 --- a/app/src/main/kotlin/com/example/visiontest/Exceptions.kt +++ b/app/src/main/kotlin/com/example/visiontest/Exceptions.kt @@ -52,4 +52,11 @@ class NoSimulatorAvailableException(message: String) : Exception(message) /** * Thrown when a specified app cannot be found on the iOS device/simulator. */ -class AppNotFoundException(message: String) : Exception(message) \ No newline at end of file +class AppNotFoundException(message: String) : Exception(message) + +/** + * Thrown when a command requires the automation server but it is not running. + * Used by both Android and iOS registrars so the CLI path can map to exit code 3 + * and the MCP path can surface the message via [ErrorHandler]. + */ +class ServerNotRunningException(message: String) : Exception(message) \ No newline at end of file diff --git a/app/src/main/kotlin/com/example/visiontest/Main.kt b/app/src/main/kotlin/com/example/visiontest/Main.kt index 7d9254a..7fbb4f8 100644 --- a/app/src/main/kotlin/com/example/visiontest/Main.kt +++ b/app/src/main/kotlin/com/example/visiontest/Main.kt @@ -1,8 +1,14 @@ package com.example.visiontest import com.example.visiontest.android.Android +import com.example.visiontest.cli.CliExit +import com.example.visiontest.cli.ExitCode +import com.example.visiontest.cli.VisionTestCli import com.example.visiontest.ios.IOSManager import com.example.visiontest.config.AppConfig +import com.github.ajalt.clikt.core.CliktError +import com.github.ajalt.clikt.core.PrintHelpMessage +import com.github.ajalt.clikt.core.UsageError import io.ktor.utils.io.streams.asInput import io.modelcontextprotocol.kotlin.sdk.* import io.modelcontextprotocol.kotlin.sdk.server.* @@ -13,7 +19,49 @@ import kotlinx.io.buffered import org.slf4j.LoggerFactory -fun main() { +fun main(args: Array<String>) { + when (route(args)) { + 
Route.McpServer -> runMcpServer() + Route.Cli -> runCli(args) + } +} + +private fun runCli(args: Array<String>) { + try { + VisionTestCli().parse(args) + } catch (e: CliExit) { + // Safety net: all CliExit exceptions should be caught by runCliCommand inside + // each subcommand's run(). This catch handles any that escape during arg parsing. + System.err.println(e.message) + kotlin.system.exitProcess(e.code.value) + } catch (e: UsageError) { + val defaultFormatter = object : com.github.ajalt.clikt.output.ParameterFormatter { + override fun formatOption(name: String) = name + override fun formatArgument(name: String) = "<$name>" + override fun formatSubcommand(name: String) = name + } + val loc = e.context?.localization ?: object : com.github.ajalt.clikt.output.Localization {} + val msg = e.formatMessage(loc, defaultFormatter) + System.err.println(msg) + kotlin.system.exitProcess(ExitCode.UsageError.value) + } catch (e: PrintHelpMessage) { + val cmd = e.context?.command + if (cmd != null) { + println(cmd.getFormattedHelp()) + } + kotlin.system.exitProcess(if (e.error) 1 else 0) + } catch (e: CliktError) { + System.err.println(e.message.orEmpty()) + kotlin.system.exitProcess(ExitCode.GenericFailure.value) + } +} + +internal enum class Route { McpServer, Cli } + +internal fun route(args: Array<String>): Route = + if (args.isEmpty() || args[0] == "serve") Route.McpServer else Route.Cli + +private fun runMcpServer() { val config = AppConfig.createDefault() @@ -84,4 +132,4 @@ private fun createServer(config: AppConfig): Server { ) ) ) -} \ No newline at end of file +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/CliErrorHandler.kt b/app/src/main/kotlin/com/example/visiontest/cli/CliErrorHandler.kt new file mode 100644 index 0000000..c8850f8 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/CliErrorHandler.kt @@ -0,0 +1,74 @@ +package com.example.visiontest.cli + +import com.example.visiontest.NoDeviceAvailableException +import 
com.example.visiontest.NoSimulatorAvailableException +import com.example.visiontest.ServerNotRunningException +import com.github.ajalt.clikt.core.CliktError +import com.github.ajalt.clikt.core.UsageError +import kotlinx.coroutines.runBlocking +import kotlin.system.exitProcess + +/** + * Result of executing a CLI command, before process exit. + * Used internally for testability — tests inspect this instead of trapping `exitProcess`. + */ +data class CliResult(val exitCode: Int, val stdout: String?, val stderr: String?) + +/** + * Runs a CLI command [block], prints its result to stdout on success (exit 0), + * and maps exceptions to the appropriate exit code on stderr. + * + * This is the single exit-code gateway for every CLI subcommand. + */ +fun runCliCommand(block: suspend () -> String): Nothing { + val result = executeCliCommand(block) + if (result.stdout != null) println(result.stdout) + if (result.stderr != null) System.err.println(result.stderr) + exitProcess(result.exitCode) +} + +/** + * Executes [block] and returns a [CliResult] without calling `exitProcess`. + * This is the testable core of [runCliCommand]. 
+ */ +internal fun executeCliCommand(block: suspend () -> String): CliResult { + return try { + val output = runBlocking { block() } + CliResult(ExitCode.Success.value, stdout = output, stderr = null) + } catch (e: CliExit) { + CliResult(e.code.value, stdout = null, stderr = e.message) + } catch (e: ServerNotRunningException) { + CliResult(ExitCode.ServerNotReachable.value, stdout = null, stderr = e.message) + } catch (e: NoDeviceAvailableException) { + CliResult(ExitCode.DeviceNotFound.value, stdout = null, stderr = e.message) + } catch (e: NoSimulatorAvailableException) { + CliResult(ExitCode.DeviceNotFound.value, stdout = null, stderr = e.message) + } catch (e: UsageError) { + CliResult(ExitCode.UsageError.value, stdout = null, stderr = e.message) + } catch (e: CliktError) { + CliResult(ExitCode.UsageError.value, stdout = null, stderr = e.message) + } catch (e: IllegalArgumentException) { + CliResult(ExitCode.UsageError.value, stdout = null, stderr = e.message ?: "Invalid argument") + } catch (e: Exception) { + CliResult(ExitCode.GenericFailure.value, stdout = null, stderr = e.message ?: "Unknown error") + } +} + +/** + * Checks that the automation server is reachable, throwing [CliExit] with + * [ExitCode.ServerNotReachable] if not. CLI commands that require a running + * server should call this before delegating to the extracted registrar function. + * + * Note: the extracted registrar functions also check `isServerRunning()` internally + * and throw [ServerNotRunningException] if the server drops mid-operation. The CLI + * pre-check here provides a fast-fail with the proper exit code, while the registrar + * check closes the TOCTOU gap (both are caught by [executeCliCommand]). + */ +suspend fun requireServerRunning(isRunning: suspend () -> Boolean) { + if (!isRunning()) { + throw CliExit( + ExitCode.ServerNotReachable, + "Automation server is not running. Run 'start_automation_server' first." 
+ ) + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/CliExit.kt b/app/src/main/kotlin/com/example/visiontest/cli/CliExit.kt new file mode 100644 index 0000000..66f5593 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/CliExit.kt @@ -0,0 +1,33 @@ +package com.example.visiontest.cli + +/** + * Exception thrown by CLI commands to signal a non-zero exit. + * + * The [code] is used as the process exit code and the [message] is printed to stderr. + * A [code] of [ExitCode.Success] should never be thrown — return normally instead. + */ +class CliExit(val code: ExitCode, override val message: String) : Exception(message) + +/** + * Fixed set of CLI exit codes. LLM-scriptable: agents can branch on the numeric code + * without parsing stderr text. + */ +enum class ExitCode(val value: Int) { + /** Command completed successfully. */ + Success(0), + + /** Unhandled / unexpected exception. */ + GenericFailure(1), + + /** Missing or invalid flag / argument (clikt usage error). */ + UsageError(2), + + /** The automation server is not running or not reachable. */ + ServerNotReachable(3), + + /** No device or simulator is connected / available. */ + DeviceNotFound(4), + + /** The requested platform is not supported for this command (e.g. `press_back --platform ios`). 
*/ + PlatformNotSupported(5) +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/ComponentHolder.kt b/app/src/main/kotlin/com/example/visiontest/cli/ComponentHolder.kt new file mode 100644 index 0000000..aba4c17 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/ComponentHolder.kt @@ -0,0 +1,83 @@ +package com.example.visiontest.cli + +import com.example.visiontest.android.Android +import com.example.visiontest.android.AutomationClient +import com.example.visiontest.config.AppConfig +import com.example.visiontest.discovery.ToolDiscovery +import com.example.visiontest.ios.IOSAutomationClient +import com.example.visiontest.ios.IOSManager +import com.example.visiontest.tools.AndroidAutomationToolRegistrar +import com.example.visiontest.tools.AndroidDeviceToolRegistrar +import com.example.visiontest.tools.IOSAutomationToolRegistrar +import com.example.visiontest.tools.IOSDeviceToolRegistrar +import org.slf4j.LoggerFactory + +/** + * Minimal object graph for the CLI path. + * + * Mirrors the wiring in [com.example.visiontest.main] / [com.example.visiontest.ToolFactory] + * so that CLI subcommands use the identical set of dependencies that the MCP tools use. + * + * The holder is created once per CLI invocation (in [VisionTestCli]) and registers + * the same shutdown-hook behavior (close [android] and [ios] on JVM exit). + */ +class ComponentHolder internal constructor( + val android: Android, + val ios: IOSManager, + val automationClient: AutomationClient, + val iosAutomationClient: IOSAutomationClient, + val androidDeviceRegistrar: AndroidDeviceToolRegistrar, + val androidAutomationRegistrar: AndroidAutomationToolRegistrar, + val iosDeviceRegistrar: IOSDeviceToolRegistrar, + val iosAutomationRegistrar: IOSAutomationToolRegistrar, +) { + + /** Returns `true` if the automation server for the given platform is reachable. 
*/ + suspend fun isServerRunning(platform: Platform): Boolean = when (platform) { + Platform.Android -> automationClient.isServerRunning() + Platform.Ios -> iosAutomationClient.isServerRunning() + } + + companion object { + /** + * Creates a [ComponentHolder] using [AppConfig.createDefault] with the standard + * production wiring. Registers a shutdown hook to close device connections. + */ + fun createDefault(): ComponentHolder { + val config = AppConfig.createDefault() + val logger = LoggerFactory.getLogger("VisionTest") + + val android = Android( + timeoutMillis = config.adbTimeoutMillis, + cacheValidityPeriod = config.deviceCacheValidityPeriod, + logger = LoggerFactory.getLogger(Android::class.java) + ) + + val ios = IOSManager( + logger = LoggerFactory.getLogger(IOSManager::class.java) + ) + + val automationClient = AutomationClient() + val iosAutomationClient = IOSAutomationClient() + val discovery = ToolDiscovery(logger) + + Runtime.getRuntime().addShutdownHook(Thread { + logger.info("Shutting down CLI") + android.close() + ios.close() + logger.info("CLI shutdown complete") + }) + + return ComponentHolder( + android = android, + ios = ios, + automationClient = automationClient, + iosAutomationClient = iosAutomationClient, + androidDeviceRegistrar = AndroidDeviceToolRegistrar(android), + androidAutomationRegistrar = AndroidAutomationToolRegistrar(android, automationClient, discovery), + iosDeviceRegistrar = IOSDeviceToolRegistrar(ios), + iosAutomationRegistrar = IOSAutomationToolRegistrar(ios, iosAutomationClient, discovery, logger), + ) + } + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/PlatformOption.kt b/app/src/main/kotlin/com/example/visiontest/cli/PlatformOption.kt new file mode 100644 index 0000000..f10da90 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/PlatformOption.kt @@ -0,0 +1,49 @@ +package com.example.visiontest.cli + +import com.github.ajalt.clikt.core.CliktCommand +import 
com.github.ajalt.clikt.parameters.options.option +import com.github.ajalt.clikt.parameters.options.required +import com.github.ajalt.clikt.parameters.types.choice + +/** + * Target platform for CLI commands. + */ +enum class Platform(val value: String) { + Android("android"), + Ios("ios"); +} + +/** + * Reusable `--platform` / `-p` option for cross-platform CLI subcommands. + * Returns a [Platform] enum value. + */ +fun CliktCommand.platformOption() = + option("--platform", "-p", help = "Target platform: android or ios") + .choice("android" to Platform.Android, "ios" to Platform.Ios) + .required() + +/** + * `--platform` option for Android-only CLI subcommands. + * Accepts both platforms at parse time so that passing `--platform ios` yields + * exit code 5 ([ExitCode.PlatformNotSupported]) instead of Clikt's generic + * exit code 2 ([ExitCode.UsageError]). + * + * Commands using this option must call [requireAndroid] in their `run()` body. + */ +fun CliktCommand.androidOnlyPlatformOption() = + option("--platform", "-p", help = "Target platform (android only)") + .choice("android" to Platform.Android, "ios" to Platform.Ios) + .required() + +/** + * Throws [CliExit] with [ExitCode.PlatformNotSupported] if [platform] is not Android. + * Call at the start of `run()` in Android-only commands. + */ +fun requireAndroid(platform: Platform, commandName: String) { + if (platform != Platform.Android) { + throw CliExit( + ExitCode.PlatformNotSupported, + "'$commandName' is only supported on Android." 
+ ) + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/VisionTestCli.kt b/app/src/main/kotlin/com/example/visiontest/cli/VisionTestCli.kt new file mode 100644 index 0000000..3cef7e3 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/VisionTestCli.kt @@ -0,0 +1,41 @@ +package com.example.visiontest.cli + +import com.example.visiontest.cli.commands.* +import com.github.ajalt.clikt.core.NoOpCliktCommand +import com.github.ajalt.clikt.core.subcommands + +/** + * Root command for the `visiontest` CLI. Dispatches to per-operation subcommands. + * + * The JAR's `main(args)` enters this command only when invoked with arguments that + * are not the MCP stdio sentinel (empty args or `serve`). See [com.example.visiontest.main]. + * + * A [ComponentHolder] is created lazily on first subcommand execution so that + * `visiontest --help` does not initialize ADB connections or register shutdown hooks. + */ +class VisionTestCli : NoOpCliktCommand(name = "visiontest") { + private val components by lazy { ComponentHolder.createDefault() } + + init { + subcommands( + // Setup + InstallAutomationServerCommand(lazy { components }), + StartAutomationServerCommand(lazy { components }), + AutomationServerStatusCommand(lazy { components }), + // Inspection + GetInteractiveElementsCommand(lazy { components }), + GetUiHierarchyCommand(lazy { components }), + GetDeviceInfoCommand(lazy { components }), + ScreenshotCommand(lazy { components }), + // Interaction + TapByCoordinatesCommand(lazy { components }), + InputTextCommand(lazy { components }), + SwipeDirectionCommand(lazy { components }), + // Navigation + PressBackCommand(lazy { components }), + PressHomeCommand(lazy { components }), + // Apps + LaunchAppCommand(lazy { components }), + ) + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/commands/AutomationServerStatusCommand.kt b/app/src/main/kotlin/com/example/visiontest/cli/commands/AutomationServerStatusCommand.kt new file mode 100644 
index 0000000..775e303 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/AutomationServerStatusCommand.kt @@ -0,0 +1,20 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import com.example.visiontest.cli.Platform +import com.example.visiontest.cli.platformOption +import com.example.visiontest.cli.runCliCommand +import com.github.ajalt.clikt.core.CliktCommand + +class AutomationServerStatusCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name = "automation_server_status", help = "Check automation server status") { + + private val platform by platformOption() + + override fun run() = runCliCommand { + when (platform) { + Platform.Android -> components.value.androidAutomationRegistrar.automationServerStatus() + Platform.Ios -> components.value.iosAutomationRegistrar.automationServerStatus() + } + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/commands/GetDeviceInfoCommand.kt b/app/src/main/kotlin/com/example/visiontest/cli/commands/GetDeviceInfoCommand.kt new file mode 100644 index 0000000..e95513b --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/GetDeviceInfoCommand.kt @@ -0,0 +1,22 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import com.example.visiontest.cli.Platform +import com.example.visiontest.cli.platformOption +import com.example.visiontest.cli.requireServerRunning +import com.example.visiontest.cli.runCliCommand +import com.github.ajalt.clikt.core.CliktCommand + +class GetDeviceInfoCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name = "get_device_info", help = "Get device display info") { + + private val platform by platformOption() + + override fun run() = runCliCommand { + requireServerRunning { components.value.isServerRunning(platform) } + when (platform) { + Platform.Android -> components.value.androidAutomationRegistrar.getDeviceInfo() + Platform.Ios -> 
components.value.iosAutomationRegistrar.getDeviceInfo() + } + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/commands/GetInteractiveElementsCommand.kt b/app/src/main/kotlin/com/example/visiontest/cli/commands/GetInteractiveElementsCommand.kt new file mode 100644 index 0000000..1ec901c --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/GetInteractiveElementsCommand.kt @@ -0,0 +1,25 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import com.example.visiontest.cli.Platform +import com.example.visiontest.cli.platformOption +import com.example.visiontest.cli.requireServerRunning +import com.example.visiontest.cli.runCliCommand +import com.github.ajalt.clikt.core.CliktCommand +import com.github.ajalt.clikt.parameters.options.flag +import com.github.ajalt.clikt.parameters.options.option + +class GetInteractiveElementsCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name = "get_interactive_elements", help = "Get interactive UI elements") { + + private val platform by platformOption() + private val includeDisabled by option("--include-disabled", help = "Include disabled elements").flag() + + override fun run() = runCliCommand { + requireServerRunning { components.value.isServerRunning(platform) } + when (platform) { + Platform.Android -> components.value.androidAutomationRegistrar.getInteractiveElements(includeDisabled) + Platform.Ios -> components.value.iosAutomationRegistrar.getInteractiveElements(includeDisabled) + } + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/commands/GetUiHierarchyCommand.kt b/app/src/main/kotlin/com/example/visiontest/cli/commands/GetUiHierarchyCommand.kt new file mode 100644 index 0000000..6db5313 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/GetUiHierarchyCommand.kt @@ -0,0 +1,22 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import 
com.example.visiontest.cli.Platform +import com.example.visiontest.cli.platformOption +import com.example.visiontest.cli.requireServerRunning +import com.example.visiontest.cli.runCliCommand +import com.github.ajalt.clikt.core.CliktCommand + +class GetUiHierarchyCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name = "get_ui_hierarchy", help = "Get full UI hierarchy XML") { + + private val platform by platformOption() + + override fun run() = runCliCommand { + requireServerRunning { components.value.isServerRunning(platform) } + when (platform) { + Platform.Android -> components.value.androidAutomationRegistrar.getUiHierarchy() + Platform.Ios -> components.value.iosAutomationRegistrar.getUiHierarchy() + } + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/commands/InputTextCommand.kt b/app/src/main/kotlin/com/example/visiontest/cli/commands/InputTextCommand.kt new file mode 100644 index 0000000..d855ef2 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/InputTextCommand.kt @@ -0,0 +1,24 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import com.example.visiontest.cli.Platform +import com.example.visiontest.cli.platformOption +import com.example.visiontest.cli.requireServerRunning +import com.example.visiontest.cli.runCliCommand +import com.github.ajalt.clikt.core.CliktCommand +import com.github.ajalt.clikt.parameters.arguments.argument + +class InputTextCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name = "input_text", help = "Type text into focused element") { + + private val platform by platformOption() + private val text by argument(help = "Text to type") + + override fun run() = runCliCommand { + requireServerRunning { components.value.isServerRunning(platform) } + when (platform) { + Platform.Android -> components.value.androidAutomationRegistrar.inputText(text) + Platform.Ios -> components.value.iosAutomationRegistrar.inputText(text) + } + } +} diff
--git a/app/src/main/kotlin/com/example/visiontest/cli/commands/InstallAutomationServerCommand.kt b/app/src/main/kotlin/com/example/visiontest/cli/commands/InstallAutomationServerCommand.kt new file mode 100644 index 0000000..8ccc516 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/InstallAutomationServerCommand.kt @@ -0,0 +1,18 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import com.example.visiontest.cli.androidOnlyPlatformOption +import com.example.visiontest.cli.requireAndroid +import com.example.visiontest.cli.runCliCommand +import com.github.ajalt.clikt.core.CliktCommand + +class InstallAutomationServerCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name = "install_automation_server", help = "Install automation server APKs (Android only)") { + + private val platform by androidOnlyPlatformOption() + + override fun run() = runCliCommand { + requireAndroid(platform, "install_automation_server") + components.value.androidAutomationRegistrar.installAutomationServer() + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/commands/LaunchAppCommand.kt b/app/src/main/kotlin/com/example/visiontest/cli/commands/LaunchAppCommand.kt new file mode 100644 index 0000000..00da038 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/LaunchAppCommand.kt @@ -0,0 +1,22 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import com.example.visiontest.cli.Platform +import com.example.visiontest.cli.platformOption +import com.example.visiontest.cli.runCliCommand +import com.github.ajalt.clikt.core.CliktCommand +import com.github.ajalt.clikt.parameters.arguments.argument + +class LaunchAppCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name = "launch_app", help = "Launch an app by package/bundle ID") { + + private val platform by platformOption() + private val id by argument(help = "Package name
(Android) or bundle ID (iOS)") + + override fun run() = runCliCommand { + when (platform) { + Platform.Android -> components.value.androidDeviceRegistrar.launchApp(id) + Platform.Ios -> components.value.iosDeviceRegistrar.launchApp(id) + } + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/commands/PressBackCommand.kt b/app/src/main/kotlin/com/example/visiontest/cli/commands/PressBackCommand.kt new file mode 100644 index 0000000..eaf43e4 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/PressBackCommand.kt @@ -0,0 +1,20 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import com.example.visiontest.cli.androidOnlyPlatformOption +import com.example.visiontest.cli.requireAndroid +import com.example.visiontest.cli.requireServerRunning +import com.example.visiontest.cli.runCliCommand +import com.github.ajalt.clikt.core.CliktCommand + +class PressBackCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name = "press_back", help = "Press the back button (Android only)") { + + private val platform by androidOnlyPlatformOption() + + override fun run() = runCliCommand { + requireAndroid(platform, "press_back") + requireServerRunning { components.value.isServerRunning(platform) } + components.value.androidAutomationRegistrar.pressBack() + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/commands/PressHomeCommand.kt b/app/src/main/kotlin/com/example/visiontest/cli/commands/PressHomeCommand.kt new file mode 100644 index 0000000..3149dd4 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/PressHomeCommand.kt @@ -0,0 +1,22 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import com.example.visiontest.cli.Platform +import com.example.visiontest.cli.platformOption +import com.example.visiontest.cli.requireServerRunning +import com.example.visiontest.cli.runCliCommand +import
com.github.ajalt.clikt.core.CliktCommand + +class PressHomeCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name = "press_home", help = "Press the home button") { + + private val platform by platformOption() + + override fun run() = runCliCommand { + requireServerRunning { components.value.isServerRunning(platform) } + when (platform) { + Platform.Android -> components.value.androidAutomationRegistrar.pressHome() + Platform.Ios -> components.value.iosAutomationRegistrar.pressHome() + } + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/commands/ScreenshotCommand.kt b/app/src/main/kotlin/com/example/visiontest/cli/commands/ScreenshotCommand.kt new file mode 100644 index 0000000..1e5d7ad --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/ScreenshotCommand.kt @@ -0,0 +1,24 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import com.example.visiontest.cli.Platform +import com.example.visiontest.cli.platformOption +import com.example.visiontest.cli.requireServerRunning +import com.example.visiontest.cli.runCliCommand +import com.github.ajalt.clikt.core.CliktCommand +import com.github.ajalt.clikt.parameters.options.option + +class ScreenshotCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name = "screenshot", help = "Capture a screenshot") { + + private val platform by platformOption() + private val output by option("--output", help = "Output file path for the screenshot PNG") + + override fun run() = runCliCommand { + requireServerRunning { components.value.isServerRunning(platform) } + when (platform) { + Platform.Android -> components.value.androidAutomationRegistrar.captureScreenshot(output) + Platform.Ios -> components.value.iosAutomationRegistrar.captureScreenshot(output) + } + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/commands/StartAutomationServerCommand.kt
b/app/src/main/kotlin/com/example/visiontest/cli/commands/StartAutomationServerCommand.kt new file mode 100644 index 0000000..4652a47 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/StartAutomationServerCommand.kt @@ -0,0 +1,20 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import com.example.visiontest.cli.Platform +import com.example.visiontest.cli.platformOption +import com.example.visiontest.cli.runCliCommand +import com.github.ajalt.clikt.core.CliktCommand + +class StartAutomationServerCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name = "start_automation_server", help = "Start the automation server") { + + private val platform by platformOption() + + override fun run() = runCliCommand { + when (platform) { + Platform.Android -> components.value.androidAutomationRegistrar.startAutomationServer() + Platform.Ios -> components.value.iosAutomationRegistrar.startAutomationServer() + } + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/commands/SwipeDirectionCommand.kt b/app/src/main/kotlin/com/example/visiontest/cli/commands/SwipeDirectionCommand.kt new file mode 100644 index 0000000..7245057 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/SwipeDirectionCommand.kt @@ -0,0 +1,32 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import com.example.visiontest.cli.Platform +import com.example.visiontest.cli.platformOption +import com.example.visiontest.cli.requireServerRunning +import com.example.visiontest.cli.runCliCommand +import com.github.ajalt.clikt.core.CliktCommand +import com.github.ajalt.clikt.parameters.arguments.argument +import com.github.ajalt.clikt.parameters.options.default +import com.github.ajalt.clikt.parameters.options.option +import com.github.ajalt.clikt.parameters.types.choice + +class SwipeDirectionCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name =
"swipe_direction", help = "Swipe in a direction") { + + private val platform by platformOption() + private val direction by argument(help = "Swipe direction: up, down, left, right") + .choice("up", "down", "left", "right") + private val distance by option("--distance", help = "Swipe distance") + .choice("short", "medium", "long").default("medium") + private val speed by option("--speed", help = "Swipe speed") + .choice("slow", "normal", "fast").default("normal") + + override fun run() = runCliCommand { + requireServerRunning { components.value.isServerRunning(platform) } + when (platform) { + Platform.Android -> components.value.androidAutomationRegistrar.swipeByDirection(direction, distance, speed) + Platform.Ios -> components.value.iosAutomationRegistrar.swipeByDirection(direction, distance, speed) + } + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/cli/commands/TapByCoordinatesCommand.kt b/app/src/main/kotlin/com/example/visiontest/cli/commands/TapByCoordinatesCommand.kt new file mode 100644 index 0000000..2ed0756 --- /dev/null +++ b/app/src/main/kotlin/com/example/visiontest/cli/commands/TapByCoordinatesCommand.kt @@ -0,0 +1,26 @@ +package com.example.visiontest.cli.commands + +import com.example.visiontest.cli.ComponentHolder +import com.example.visiontest.cli.Platform +import com.example.visiontest.cli.platformOption +import com.example.visiontest.cli.requireServerRunning +import com.example.visiontest.cli.runCliCommand +import com.github.ajalt.clikt.core.CliktCommand +import com.github.ajalt.clikt.parameters.arguments.argument +import com.github.ajalt.clikt.parameters.types.int + +class TapByCoordinatesCommand(private val components: Lazy<ComponentHolder>) : + CliktCommand(name = "tap_by_coordinates", help = "Tap at screen coordinates") { + + private val platform by platformOption() + private val x by argument(help = "X coordinate").int() + private val y by argument(help = "Y coordinate").int() + + override fun run() = runCliCommand { + requireServerRunning {
components.value.isServerRunning(platform) } + when (platform) { + Platform.Android -> components.value.androidAutomationRegistrar.tapByCoordinates(x, y) + Platform.Ios -> components.value.iosAutomationRegistrar.tapByCoordinates(x, y) + } + } +} diff --git a/app/src/main/kotlin/com/example/visiontest/tools/AndroidAutomationToolRegistrar.kt b/app/src/main/kotlin/com/example/visiontest/tools/AndroidAutomationToolRegistrar.kt index 81f91ca..0a39da9 100644 --- a/app/src/main/kotlin/com/example/visiontest/tools/AndroidAutomationToolRegistrar.kt +++ b/app/src/main/kotlin/com/example/visiontest/tools/AndroidAutomationToolRegistrar.kt @@ -1,5 +1,6 @@ package com.example.visiontest.tools +import com.example.visiontest.ServerNotRunningException import com.example.visiontest.android.Android import com.example.visiontest.android.AutomationClient import com.example.visiontest.common.DeviceConfig @@ -42,75 +43,195 @@ class AndroidAutomationToolRegistrar( registerScreenshot(scope) } - private fun registerInstallAutomationServer(scope: ToolScope) { - scope.tool( - name = "install_automation_server", - description = "Installs the automation server APKs on the connected Android device. Run this once before using start_automation_server." - ) { - val device = android.getFirstAvailableDevice() + // ==================== Extracted business logic ==================== - val apkPath = discovery.findAutomationServerApk() - ?: return@tool "Automation server APK not found. Re-run install.sh to download APKs, or set VISION_TEST_APK_PATH environment variable. To build from source: ./gradlew :automation-server:assembleDebug :automation-server:assembleDebugAndroidTest" + private suspend fun requireServer() { + if (!automationClient.isServerRunning()) throw ServerNotRunningException("Automation server is not running. Use 'start_automation_server' first.") + } - val androidDevice = android as? 
Android - ?: return@tool "Android device configuration not available" + internal suspend fun installAutomationServer(): String { + val device = android.getFirstAvailableDevice() - val resolvedMainApk = discovery.resolveMainApkPath(apkPath) - if (resolvedMainApk != null) { - androidDevice.executeAdb("install", "-r", resolvedMainApk) - } else { - return@tool "Main APK not found at the expected path derived from test APK: $apkPath. Ensure the main automation-server APK is built/installed (e.g., via :automation-server:assembleDebug), or re-run install.sh or set VISION_TEST_APK_PATH." - } + val apkPath = discovery.findAutomationServerApk() + ?: return "Automation server APK not found. Re-run install.sh to download APKs, or set VISION_TEST_APK_PATH environment variable. To build from source: ./gradlew :automation-server:assembleDebug :automation-server:assembleDebugAndroidTest" - androidDevice.executeAdb("install", "-r", apkPath) + val androidDevice = android as? Android + ?: return "Android device configuration not available" - "Automation server APKs installed successfully on device ${device.id}. Use 'start_automation_server' to start the server." + val resolvedMainApk = discovery.resolveMainApkPath(apkPath) + if (resolvedMainApk != null) { + androidDevice.executeAdb("install", "-r", resolvedMainApk) + } else { + return "Main APK not found at the expected path derived from test APK: $apkPath. Ensure the main automation-server APK is built/installed (e.g., via :automation-server:assembleDebug), or re-run install.sh or set VISION_TEST_APK_PATH." } + + androidDevice.executeAdb("install", "-r", apkPath) + + return "Automation server APKs installed successfully on device ${device.id}. Use 'start_automation_server' to start the server." } - private fun registerStartAutomationServer(scope: ToolScope) { - scope.tool( - name = "start_automation_server", - description = "Starts the automation server on the connected Android device. 
The APKs must be installed first using install_automation_server. Sets up port forwarding and starts the instrumentation server.", - timeoutMs = 30000 - ) { - val device = android.getFirstAvailableDevice() - val androidDevice = android as? Android - ?: return@tool "Android device configuration not available" + internal suspend fun startAutomationServer(): String { + val device = android.getFirstAvailableDevice() + val androidDevice = android as? Android + ?: return "Android device configuration not available" - val port = AutomationConfig.DEFAULT_PORT + val port = AutomationConfig.DEFAULT_PORT + if (automationClient.isServerRunning()) { + return "Automation server is already running on localhost:$port" + } + + androidDevice.executeAdb("forward", "tcp:$port", "tcp:$port") + + withContext(Dispatchers.IO) { + val command = listOf( + "adb", "-s", device.id, "shell", + "am", "instrument", "-w", + "-e", "port", port.toString(), + "-e", "class", AutomationConfig.AUTOMATION_SERVER_TEST_CLASS, + "${AutomationConfig.AUTOMATION_SERVER_TEST_PACKAGE}/${AutomationConfig.INSTRUMENTATION_RUNNER}" + ) + ProcessBuilder(command) + .redirectErrorStream(true) + .redirectOutput(ProcessBuilder.Redirect.DISCARD) + .start() + } + + var attempts = 0 + val maxAttempts = 10 + while (attempts < maxAttempts) { + delay(500) if (automationClient.isServerRunning()) { - return@tool "Automation server is already running on localhost:$port" + return "Automation server started successfully on device ${device.id}. 
Server is listening on localhost:$port" } + attempts++ + } - androidDevice.executeAdb("forward", "tcp:$port", "tcp:$port") - - withContext(Dispatchers.IO) { - val command = listOf( - "adb", "-s", device.id, "shell", - "am", "instrument", "-w", - "-e", "port", port.toString(), - "-e", "class", AutomationConfig.AUTOMATION_SERVER_TEST_CLASS, - "${AutomationConfig.AUTOMATION_SERVER_TEST_PACKAGE}/${AutomationConfig.INSTRUMENTATION_RUNNER}" - ) - ProcessBuilder(command) - .redirectErrorStream(true) - .redirectOutput(ProcessBuilder.Redirect.DISCARD) - .start() - } + return "Automation server may not have started properly. Check device logs with: adb logcat | grep AutomationServer" + } - var attempts = 0 - val maxAttempts = 10 - while (attempts < maxAttempts) { - delay(500) - if (automationClient.isServerRunning()) { - return@tool "Automation server started successfully on device ${device.id}. Server is listening on localhost:$port" - } - attempts++ - } + internal suspend fun automationServerStatus(): String { + val isRunning = automationClient.isServerRunning() + return if (isRunning) { + "Automation server is running and accessible at localhost:${AutomationConfig.DEFAULT_PORT}" + } else { + "Automation server is not running. Use 'start_automation_server' to start it." + } + } + + internal suspend fun getUiHierarchy(): String { + requireServer() + return automationClient.getUiHierarchy() + } + + internal suspend fun findElement( + text: String?, + textContains: String?, + resourceId: String?, + className: String?, + contentDescription: String? 
+ ): String { + requireServer() + + if (text == null && textContains == null && resourceId == null && + className == null && contentDescription == null) { + return "Error: At least one selector required (text, textContains, resourceId, className, or contentDescription)" + } + + return automationClient.findElement( + text = text, + textContains = textContains, + resourceId = resourceId, + className = className, + contentDescription = contentDescription + ) + } + + internal suspend fun tapByCoordinates(x: Int, y: Int): String { + requireServer() + return automationClient.tapByCoordinates(x, y) + } + + internal suspend fun swipe(startX: Int, startY: Int, endX: Int, endY: Int, steps: Int = 20): String { + requireServer() + return automationClient.swipe(startX, startY, endX, endY, steps) + } + + internal suspend fun swipeByDirection(direction: String, distance: String = "medium", speed: String = "normal"): String { + requireServer() + return automationClient.swipeByDirection(direction, distance, speed) + } + + internal suspend fun swipeOnElement( + direction: String, + text: String?, + textContains: String?, + resourceId: String?, + className: String?, + contentDescription: String?, + speed: String = "normal" + ): String { + requireServer() + + if (text == null && textContains == null && resourceId == null && + className == null && contentDescription == null) { + return "Error: At least one selector required (text, textContains, resourceId, className, or contentDescription)" + } + + return automationClient.swipeOnElement( + direction = direction, + text = text, + textContains = textContains, + resourceId = resourceId, + className = className, + contentDescription = contentDescription, + speed = speed + ) + } + + internal suspend fun pressBack(): String { + requireServer() + return automationClient.pressBack() + } + + internal suspend fun pressHome(): String { + requireServer() + return automationClient.pressHome() + } + + internal suspend fun inputText(text: String): 
String { + requireServer() + return automationClient.inputText(text) + } - "Automation server may not have started properly. Check device logs with: adb logcat | grep AutomationServer" + internal suspend fun getDeviceInfo(): String { + requireServer() + return automationClient.getDeviceInfo() + } + + internal suspend fun getInteractiveElements(includeDisabled: Boolean = false): String { + requireServer() + return automationClient.getInteractiveElements(includeDisabled) + } + + // ==================== MCP Tool Registrations ==================== + + private fun registerInstallAutomationServer(scope: ToolScope) { + scope.tool( + name = "install_automation_server", + description = "Installs the automation server APKs on the connected Android device. Run this once before using start_automation_server." + ) { + installAutomationServer() + } + } + + private fun registerStartAutomationServer(scope: ToolScope) { + scope.tool( + name = "start_automation_server", + description = "Starts the automation server on the connected Android device. The APKs must be installed first using install_automation_server. Sets up port forwarding and starts the instrumentation server.", + timeoutMs = 30000 + ) { + startAutomationServer() } } @@ -119,12 +240,7 @@ class AndroidAutomationToolRegistrar( name = "automation_server_status", description = "Checks if the automation server is running on the connected Android device. Returns server status and connection information." ) { - val isRunning = automationClient.isServerRunning() - if (isRunning) { - "Automation server is running and accessible at localhost:${AutomationConfig.DEFAULT_PORT}" - } else { - "Automation server is not running. Use 'start_automation_server' to start it." - } + automationServerStatus() } } @@ -155,10 +271,7 @@ class AndroidAutomationToolRegistrar( """.trimIndent(), timeoutMs = 30000 ) { - if (!automationClient.isServerRunning()) { - return@tool "Automation server is not running. Use 'start_automation_server' first." 
- } - automationClient.getUiHierarchy() + getUiHierarchy() } } @@ -182,27 +295,12 @@ class AndroidAutomationToolRegistrar( """.trimIndent(), timeoutMs = 30000 ) { request -> - if (!automationClient.isServerRunning()) { - return@tool "Automation server is not running. Use 'start_automation_server' first." - } - - val text = request.optionalString("text") - val textContains = request.optionalString("textContains") - val resourceId = request.optionalString("resourceId") - val className = request.optionalString("className") - val contentDescription = request.optionalString("contentDescription") - - if (text == null && textContains == null && resourceId == null && - className == null && contentDescription == null) { - return@tool "Error: At least one selector required (text, textContains, resourceId, className, or contentDescription)" - } - - automationClient.findElement( - text = text, - textContains = textContains, - resourceId = resourceId, - className = className, - contentDescription = contentDescription + findElement( + text = request.optionalString("text"), + textContains = request.optionalString("textContains"), + resourceId = request.optionalString("resourceId"), + className = request.optionalString("className"), + contentDescription = request.optionalString("contentDescription") ) } } @@ -231,12 +329,7 @@ class AndroidAutomationToolRegistrar( """.trimIndent(), inputSchema = Tool.Input(required = listOf("x", "y")) ) { request -> - if (!automationClient.isServerRunning()) { - return@tool "Automation server is not running. Use 'start_automation_server' first." 
- } - val x = request.requireInt("x") - val y = request.requireInt("y") - automationClient.tapByCoordinates(x, y) + tapByCoordinates(request.requireInt("x"), request.requireInt("y")) } } @@ -262,15 +355,13 @@ class AndroidAutomationToolRegistrar( """.trimIndent(), inputSchema = Tool.Input(required = listOf("startX", "startY", "endX", "endY")) ) { request -> - if (!automationClient.isServerRunning()) { - return@tool "Automation server is not running. Use 'start_automation_server' first." - } - val startX = request.requireInt("startX") - val startY = request.requireInt("startY") - val endX = request.requireInt("endX") - val endY = request.requireInt("endY") - val steps = request.optionalInt("steps") ?: 20 - automationClient.swipe(startX, startY, endX, endY, steps) + swipe( + startX = request.requireInt("startX"), + startY = request.requireInt("startY"), + endX = request.requireInt("endX"), + endY = request.requireInt("endY"), + steps = request.optionalInt("steps") ?: 20 + ) } } @@ -308,16 +399,11 @@ class AndroidAutomationToolRegistrar( """.trimIndent(), inputSchema = Tool.Input(required = listOf("direction")) ) { request -> - if (!automationClient.isServerRunning()) { - return@tool "Automation server is not running. Use 'start_automation_server' first." - } - - val direction = request.requireDirection() - - val distance = request.optionalString("distance") ?: "medium" - val speed = request.optionalString("speed") ?: "normal" - - automationClient.swipeByDirection(direction, distance, speed) + swipeByDirection( + direction = request.requireDirection(), + distance = request.optionalString("distance") ?: "medium", + speed = request.optionalString("speed") ?: "normal" + ) } } @@ -359,33 +445,14 @@ class AndroidAutomationToolRegistrar( """.trimIndent(), inputSchema = Tool.Input(required = listOf("direction")) ) { request -> - if (!automationClient.isServerRunning()) { - return@tool "Automation server is not running. Use 'start_automation_server' first." 
- } - - val direction = request.requireDirection() - - val text = request.optionalString("text") - val textContains = request.optionalString("textContains") - val resourceId = request.optionalString("resourceId") - val className = request.optionalString("className") - val contentDescription = request.optionalString("contentDescription") - - if (text == null && textContains == null && resourceId == null && - className == null && contentDescription == null) { - return@tool "Error: At least one selector required (text, textContains, resourceId, className, or contentDescription)" - } - - val speed = request.optionalString("speed") ?: "normal" - - automationClient.swipeOnElement( - direction = direction, - text = text, - textContains = textContains, - resourceId = resourceId, - className = className, - contentDescription = contentDescription, - speed = speed + swipeOnElement( + direction = request.requireDirection(), + text = request.optionalString("text"), + textContains = request.optionalString("textContains"), + resourceId = request.optionalString("resourceId"), + className = request.optionalString("className"), + contentDescription = request.optionalString("contentDescription"), + speed = request.optionalString("speed") ?: "normal" ) } } @@ -413,10 +480,7 @@ class AndroidAutomationToolRegistrar( screen is now displayed before proceeding with further actions. """.trimIndent() ) { - if (!automationClient.isServerRunning()) { - return@tool "Automation server is not running. Use 'start_automation_server' first." - } - automationClient.pressBack() + pressBack() } } @@ -444,10 +508,7 @@ class AndroidAutomationToolRegistrar( home screen, then use 'launch_app_android' to start a different app. """.trimIndent() ) { - if (!automationClient.isServerRunning()) { - return@tool "Automation server is not running. Use 'start_automation_server' first." 
- } - automationClient.pressHome() + pressHome() } } @@ -463,11 +524,7 @@ class AndroidAutomationToolRegistrar( """.trimIndent(), inputSchema = Tool.Input(required = listOf("text")) ) { request -> - if (!automationClient.isServerRunning()) { - return@tool "Automation server is not running. Use 'start_automation_server' first." - } - val text = request.requireString("text") - automationClient.inputText(text) + inputText(request.requireString("text")) } } @@ -489,10 +546,7 @@ class AndroidAutomationToolRegistrar( - Verify SDK version for feature compatibility """.trimIndent() ) { - if (!automationClient.isServerRunning()) { - return@tool "Automation server is not running. Use 'start_automation_server' first." - } - automationClient.getDeviceInfo() + getDeviceInfo() } } @@ -531,13 +585,7 @@ class AndroidAutomationToolRegistrar( """.trimIndent(), timeoutMs = 30000 ) { request -> - if (!automationClient.isServerRunning()) { - return@tool "Automation server is not running. Use 'start_automation_server' first." - } - - val includeDisabled = request.optionalBoolean("includeDisabled") ?: false - - automationClient.getInteractiveElements(includeDisabled) + getInteractiveElements(request.optionalBoolean("includeDisabled") ?: false) } } @@ -564,10 +612,10 @@ class AndroidAutomationToolRegistrar( } } + // ==================== Screenshot helpers ==================== + internal suspend fun captureScreenshot(outputPath: String?): String { - if (!automationClient.isServerRunning()) { - return "Automation server is not running. Use 'start_automation_server' first." 
- } + requireServer() val response = automationClient.screenshot() val root = try { diff --git a/app/src/main/kotlin/com/example/visiontest/tools/AndroidDeviceToolRegistrar.kt b/app/src/main/kotlin/com/example/visiontest/tools/AndroidDeviceToolRegistrar.kt index a8596d6..a215916 100644 --- a/app/src/main/kotlin/com/example/visiontest/tools/AndroidDeviceToolRegistrar.kt +++ b/app/src/main/kotlin/com/example/visiontest/tools/AndroidDeviceToolRegistrar.kt @@ -25,13 +25,18 @@ class AndroidDeviceToolRegistrar( name = "available_device_android", description = "Returns detailed information about the first available Android device, including model, Android version, SDK version, and device state. Automatically selects the first active device connected via ADB." ) { - val result = android.getFirstAvailableDevice() - val deviceProps = android.executeShell("getprop", result.id) - val modelName = ToolHelpers.extractProperty(deviceProps, PROP_MODEL) - val androidVersion = ToolHelpers.extractProperty(deviceProps, PROP_ANDROID_VERSION) - val sdkVersion = ToolHelpers.extractProperty(deviceProps, PROP_SDK_VERSION) + availableDevice() + } + } + + internal suspend fun availableDevice(): String { + val result = android.getFirstAvailableDevice() + val deviceProps = android.executeShell("getprop", result.id) + val modelName = ToolHelpers.extractProperty(deviceProps, PROP_MODEL) + val androidVersion = ToolHelpers.extractProperty(deviceProps, PROP_ANDROID_VERSION) + val sdkVersion = ToolHelpers.extractProperty(deviceProps, PROP_SDK_VERSION) - """ + return """ |Device found: |Serial: ${result.id} |Model: $modelName @@ -39,7 +44,6 @@ class AndroidDeviceToolRegistrar( |SDK Version: $sdkVersion |State: ${result.state} """.trimMargin() - } } private fun registerListApps(scope: ToolScope) { @@ -47,12 +51,16 @@ class AndroidDeviceToolRegistrar( name = "list_apps_android", description = "Returns a complete list of all applications installed on the Android device. 
Returns package names (e.g., com.example.app) for all installed apps." ) { - val result = android.listApps() - if (result.isEmpty()) { - "No apps found on the device" - } else { - "Found these apps: ${result.joinToString(", ")}" - } + listApps() + } + } + + internal suspend fun listApps(): String { + val result = android.listApps() + return if (result.isEmpty()) { + "No apps found on the device" + } else { + "Found these apps: ${result.joinToString(", ")}" } } @@ -63,11 +71,15 @@ class AndroidDeviceToolRegistrar( inputSchema = Tool.Input(required = listOf("packageName")) ) { request -> val packageName = request.requireString("packageName") - val rawResult = android.getAppInfo(packageName) - ToolHelpers.formatAppInfo(rawResult, packageName) + infoApp(packageName) } } + internal suspend fun infoApp(packageName: String): String { + val rawResult = android.getAppInfo(packageName) + return ToolHelpers.formatAppInfo(rawResult, packageName) + } + private fun registerLaunchApp(scope: ToolScope) { scope.tool( name = "launch_app_android", @@ -75,12 +87,16 @@ class AndroidDeviceToolRegistrar( inputSchema = Tool.Input(required = listOf("packageName")) ) { request -> val packageName = request.requireString("packageName") - val result = android.launchApp(packageName) - if (result) { - "Successfully launched the app: $packageName" - } else { - "Failed to launch the app: $packageName" - } + launchApp(packageName) + } + } + + internal suspend fun launchApp(packageName: String): String { + val result = android.launchApp(packageName) + return if (result) { + "Successfully launched the app: $packageName" + } else { + "Failed to launch the app: $packageName" } } } diff --git a/app/src/main/kotlin/com/example/visiontest/tools/IOSAutomationToolRegistrar.kt b/app/src/main/kotlin/com/example/visiontest/tools/IOSAutomationToolRegistrar.kt index d5c7764..1e77311 100644 --- a/app/src/main/kotlin/com/example/visiontest/tools/IOSAutomationToolRegistrar.kt +++ 
b/app/src/main/kotlin/com/example/visiontest/tools/IOSAutomationToolRegistrar.kt @@ -1,5 +1,6 @@ package com.example.visiontest.tools +import com.example.visiontest.ServerNotRunningException import com.example.visiontest.common.DeviceConfig import com.example.visiontest.config.IOSAutomationConfig import com.example.visiontest.discovery.ToolDiscovery @@ -131,75 +132,171 @@ class IOSAutomationToolRegistrar( return ServerPollResult("iOS automation server did not respond after ${maxAttempts * 2}s. xcodebuild may still be building. Check with 'ios_automation_server_status' or run xcodebuild manually to see output.") } - // ==================== Tool Registrations ==================== + // ==================== Extracted business logic ==================== - private fun registerStartAutomationServer(scope: ToolScope) { - scope.tool( - name = "ios_start_automation_server", - description = """ - Starts the iOS automation server on the booted iOS simulator. - Uses pre-built test bundle if available, otherwise builds from source. + private suspend fun requireServer() { + if (!iosAutomationClient.isServerRunning()) throw ServerNotRunningException("iOS automation server is not running. Use 'ios_start_automation_server' first.") + } - The server starts on port ${IOSAutomationConfig.DEFAULT_PORT} and is directly - accessible at localhost (no port forwarding needed for iOS simulators). 
- """.trimIndent(), - timeoutMs = 200000 - ) { - val port = IOSAutomationConfig.DEFAULT_PORT + internal suspend fun startAutomationServer(): String { + val port = IOSAutomationConfig.DEFAULT_PORT - if (iosAutomationClient.isServerRunning()) { - return@tool "iOS automation server is already running on localhost:$port" - } + if (iosAutomationClient.isServerRunning()) { + return "iOS automation server is already running on localhost:$port" + } - // Clean up any orphaned previous process - iosXcodebuildProcess?.let { process -> - if (process.isAlive) { - logger.info("Destroying orphaned xcodebuild process before starting a new one") - process.destroyForcibly() - } - iosXcodebuildProcess = null + // Clean up any orphaned previous process + iosXcodebuildProcess?.let { process -> + if (process.isAlive) { + logger.info("Destroying orphaned xcodebuild process before starting a new one") + process.destroyForcibly() } + iosXcodebuildProcess = null + } - // Discover launch path: pre-built bundle preferred, source build as fallback - val xctestrunPath = discovery.findXctestrun() - val projectPath = discovery.findXcodeProject() + // Discover launch path: pre-built bundle preferred, source build as fallback + val xctestrunPath = discovery.findXctestrun() + val projectPath = discovery.findXcodeProject() - if (xctestrunPath == null && projectPath == null) { - return@tool "Neither pre-built iOS test bundle nor Xcode source project found. " + - "To fix: re-run install.sh on macOS to download the pre-built bundle, " + - "or clone the VisionTest repository and set ${IOSAutomationConfig.XCODE_PROJECT_PATH_ENV} " + - "to build from source." - } + if (xctestrunPath == null && projectPath == null) { + return "Neither pre-built iOS test bundle nor Xcode source project found. " + + "To fix: re-run install.sh on macOS to download the pre-built bundle, " + + "or clone the VisionTest repository and set ${IOSAutomationConfig.XCODE_PROJECT_PATH_ENV} " + + "to build from source." 
+ } - val usingPrebuilt = xctestrunPath != null - if (usingPrebuilt) { - logger.info("Using pre-built iOS test bundle: $xctestrunPath") - } else { - logger.info("Using source build from Xcode project: $projectPath") - } + val usingPrebuilt = xctestrunPath != null + if (usingPrebuilt) { + logger.info("Using pre-built iOS test bundle: $xctestrunPath") + } else { + logger.info("Using source build from Xcode project: $projectPath") + } - val device = ios.getFirstAvailableDevice() - val simulatorName = device.name + val device = ios.getFirstAvailableDevice() + val simulatorName = device.name - val command = buildXcodebuildCommand(xctestrunPath, projectPath, simulatorName) - val maxAttempts = if (usingPrebuilt) 30 else 60 + val command = buildXcodebuildCommand(xctestrunPath, projectPath, simulatorName) + val maxAttempts = if (usingPrebuilt) 30 else 60 - val label = if (usingPrebuilt) "pre-built bundle" else "source build" - val primaryResult = startAndPollServer(command, maxAttempts, port, label) + val label = if (usingPrebuilt) "pre-built bundle" else "source build" + val primaryResult = startAndPollServer(command, maxAttempts, port, label) - if (primaryResult.earlyExitCode == null) { - return@tool primaryResult.message - } + if (primaryResult.earlyExitCode == null) { + return primaryResult.message + } - // Primary attempt exited early — try source build fallback if available - if (usingPrebuilt && projectPath != null) { - logger.warn("Pre-built bundle failed (exit code ${primaryResult.earlyExitCode}), falling back to source build") - val fallbackCommand = buildXcodebuildCommand(null, projectPath, simulatorName) - val fallbackResult = startAndPollServer(fallbackCommand, 60, port, "source build fallback") - return@tool fallbackResult.message - } + // Primary attempt exited early — try source build fallback if available + if (usingPrebuilt && projectPath != null) { + logger.warn("Pre-built bundle failed (exit code ${primaryResult.earlyExitCode}), falling back to source 
build") + val fallbackCommand = buildXcodebuildCommand(null, projectPath, simulatorName) + val fallbackResult = startAndPollServer(fallbackCommand, 60, port, "source build fallback") + return fallbackResult.message + } + + return primaryResult.message + } + + internal suspend fun automationServerStatus(): String { + val isRunning = iosAutomationClient.isServerRunning() + return if (isRunning) { + "iOS automation server is running and accessible at localhost:${IOSAutomationConfig.DEFAULT_PORT}" + } else { + "iOS automation server is not running. Use 'ios_start_automation_server' to start it." + } + } + + internal suspend fun getUiHierarchy(bundleId: String? = null): String { + requireServer() + return iosAutomationClient.getUiHierarchy(bundleId) + } + + internal suspend fun getInteractiveElements(includeDisabled: Boolean = false, bundleId: String? = null): String { + requireServer() + return iosAutomationClient.getInteractiveElements(includeDisabled, bundleId) + } + + internal suspend fun tapByCoordinates(x: Int, y: Int): String { + requireServer() + return iosAutomationClient.tapByCoordinates(x, y) + } + + internal suspend fun swipe(startX: Int, startY: Int, endX: Int, endY: Int, steps: Int = 20): String { + requireServer() + return iosAutomationClient.swipe(startX, startY, endX, endY, steps) + } + + internal suspend fun swipeByDirection(direction: String, distance: String = "medium", speed: String = "normal"): String { + requireServer() + return iosAutomationClient.swipeByDirection(direction, distance, speed) + } + + internal suspend fun findElement( + text: String?, + textContains: String?, + identifier: String?, + elementType: String?, + label: String?, + bundleId: String? 
+ ): String { + requireServer() + + if (text == null && textContains == null && identifier == null && + elementType == null && label == null) { + return "Error: At least one selector required (text, textContains, resourceId, className, or contentDescription)" + } + + return iosAutomationClient.findElement( + text = text, + textContains = textContains, + identifier = identifier, + elementType = elementType, + label = label, + bundleId = bundleId + ) + } + + internal suspend fun getDeviceInfo(): String { + requireServer() + return iosAutomationClient.getDeviceInfo() + } + + internal suspend fun pressHome(): String { + requireServer() + return iosAutomationClient.pressHome() + } - primaryResult.message + internal suspend fun inputText(text: String, bundleId: String? = null): String { + requireServer() + return iosAutomationClient.inputText(text, bundleId) + } + + internal suspend fun stopAutomationServer(): String { + val process = iosXcodebuildProcess + return if (process != null && process.isAlive) { + process.destroyForcibly() + iosXcodebuildProcess = null + "iOS automation server stopped successfully." + } else { + iosXcodebuildProcess = null + "iOS automation server is not running." + } + } + + // ==================== MCP Tool Registrations ==================== + + private fun registerStartAutomationServer(scope: ToolScope) { + scope.tool( + name = "ios_start_automation_server", + description = """ + Starts the iOS automation server on the booted iOS simulator. + Uses pre-built test bundle if available, otherwise builds from source. + + The server starts on port ${IOSAutomationConfig.DEFAULT_PORT} and is directly + accessible at localhost (no port forwarding needed for iOS simulators). + """.trimIndent(), + timeoutMs = 200000 + ) { + startAutomationServer() } } @@ -208,12 +305,7 @@ class IOSAutomationToolRegistrar( name = "ios_automation_server_status", description = "Checks if the iOS automation server is running on the simulator. 
Returns server status and connection information." ) { - val isRunning = iosAutomationClient.isServerRunning() - if (isRunning) { - "iOS automation server is running and accessible at localhost:${IOSAutomationConfig.DEFAULT_PORT}" - } else { - "iOS automation server is not running. Use 'ios_start_automation_server' to start it." - } + automationServerStatus() } } @@ -240,11 +332,7 @@ class IOSAutomationToolRegistrar( """.trimIndent(), timeoutMs = 30000 ) { request -> - if (!iosAutomationClient.isServerRunning()) { - return@tool "iOS automation server is not running. Use 'ios_start_automation_server' first." - } - val bundleId = request.optionalString("bundleId") - iosAutomationClient.getUiHierarchy(bundleId) + getUiHierarchy(request.optionalString("bundleId")) } } @@ -271,14 +359,10 @@ class IOSAutomationToolRegistrar( """.trimIndent(), timeoutMs = 30000 ) { request -> - if (!iosAutomationClient.isServerRunning()) { - return@tool "iOS automation server is not running. Use 'ios_start_automation_server' first." - } - - val includeDisabled = request.optionalBoolean("includeDisabled") ?: false - val bundleId = request.optionalString("bundleId") - - iosAutomationClient.getInteractiveElements(includeDisabled, bundleId) + getInteractiveElements( + includeDisabled = request.optionalBoolean("includeDisabled") ?: false, + bundleId = request.optionalString("bundleId") + ) } } @@ -294,12 +378,7 @@ class IOSAutomationToolRegistrar( """.trimIndent(), inputSchema = Tool.Input(required = listOf("x", "y")) ) { request -> - if (!iosAutomationClient.isServerRunning()) { - return@tool "iOS automation server is not running. Use 'ios_start_automation_server' first." 
- } - val x = request.requireInt("x") - val y = request.requireInt("y") - iosAutomationClient.tapByCoordinates(x, y) + tapByCoordinates(request.requireInt("x"), request.requireInt("y")) } } @@ -317,15 +396,13 @@ class IOSAutomationToolRegistrar( """.trimIndent(), inputSchema = Tool.Input(required = listOf("startX", "startY", "endX", "endY")) ) { request -> - if (!iosAutomationClient.isServerRunning()) { - return@tool "iOS automation server is not running. Use 'ios_start_automation_server' first." - } - val startX = request.requireInt("startX") - val startY = request.requireInt("startY") - val endX = request.requireInt("endX") - val endY = request.requireInt("endY") - val steps = request.optionalInt("steps") ?: 20 - iosAutomationClient.swipe(startX, startY, endX, endY, steps) + swipe( + startX = request.requireInt("startX"), + startY = request.requireInt("startY"), + endX = request.requireInt("endX"), + endY = request.requireInt("endY"), + steps = request.optionalInt("steps") ?: 20 + ) } } @@ -351,16 +428,11 @@ class IOSAutomationToolRegistrar( """.trimIndent(), inputSchema = Tool.Input(required = listOf("direction")) ) { request -> - if (!iosAutomationClient.isServerRunning()) { - return@tool "iOS automation server is not running. Use 'ios_start_automation_server' first." - } - - val direction = request.requireDirection() - - val distance = request.optionalString("distance") ?: "medium" - val speed = request.optionalString("speed") ?: "normal" - - iosAutomationClient.swipeByDirection(direction, distance, speed) + swipeByDirection( + direction = request.requireDirection(), + distance = request.optionalString("distance") ?: "medium", + speed = request.optionalString("speed") ?: "normal" + ) } } @@ -383,29 +455,13 @@ class IOSAutomationToolRegistrar( """.trimIndent(), timeoutMs = 30000 ) { request -> - if (!iosAutomationClient.isServerRunning()) { - return@tool "iOS automation server is not running. Use 'ios_start_automation_server' first." 
- } - - val text = request.optionalString("text") - val textContains = request.optionalString("textContains") - val identifier = request.optionalString("resourceId") - val elementType = request.optionalString("className") - val label = request.optionalString("contentDescription") - val bundleId = request.optionalString("bundleId") - - if (text == null && textContains == null && identifier == null && - elementType == null && label == null) { - return@tool "Error: At least one selector required (text, textContains, resourceId, className, or contentDescription)" - } - - iosAutomationClient.findElement( - text = text, - textContains = textContains, - identifier = identifier, - elementType = elementType, - label = label, - bundleId = bundleId + findElement( + text = request.optionalString("text"), + textContains = request.optionalString("textContains"), + identifier = request.optionalString("resourceId"), + elementType = request.optionalString("className"), + label = request.optionalString("contentDescription"), + bundleId = request.optionalString("bundleId") ) } } @@ -424,10 +480,7 @@ class IOSAutomationToolRegistrar( - Device model """.trimIndent() ) { - if (!iosAutomationClient.isServerRunning()) { - return@tool "iOS automation server is not running. Use 'ios_start_automation_server' first." - } - iosAutomationClient.getDeviceInfo() + getDeviceInfo() } } @@ -441,10 +494,7 @@ class IOSAutomationToolRegistrar( Returns to the home screen. The current app moves to the background. """.trimIndent() ) { - if (!iosAutomationClient.isServerRunning()) { - return@tool "iOS automation server is not running. Use 'ios_start_automation_server' first." - } - iosAutomationClient.pressHome() + pressHome() } } @@ -467,12 +517,10 @@ class IOSAutomationToolRegistrar( """.trimIndent(), inputSchema = Tool.Input(required = listOf("text")) ) { request -> - if (!iosAutomationClient.isServerRunning()) { - return@tool "iOS automation server is not running. 
Use 'ios_start_automation_server' first." - } - val text = request.requireString("text") - val bundleId = request.optionalString("bundleId") - iosAutomationClient.inputText(text, bundleId) + inputText( + text = request.requireString("text"), + bundleId = request.optionalString("bundleId") + ) } } @@ -499,10 +547,10 @@ class IOSAutomationToolRegistrar( } } + // ==================== Screenshot helpers ==================== + internal suspend fun captureScreenshot(outputPath: String?): String { - if (!iosAutomationClient.isServerRunning()) { - return "iOS automation server is not running. Use 'ios_start_automation_server' first." - } + requireServer() val response = iosAutomationClient.screenshot() val root = try { @@ -654,15 +702,7 @@ class IOSAutomationToolRegistrar( name = "ios_stop_automation_server", description = "Stops the iOS automation server running on the simulator." ) { - val process = iosXcodebuildProcess - if (process != null && process.isAlive) { - process.destroyForcibly() - iosXcodebuildProcess = null - "iOS automation server stopped successfully." - } else { - iosXcodebuildProcess = null - "iOS automation server is not running." - } + stopAutomationServer() } } } diff --git a/app/src/main/kotlin/com/example/visiontest/tools/IOSDeviceToolRegistrar.kt b/app/src/main/kotlin/com/example/visiontest/tools/IOSDeviceToolRegistrar.kt index 93b9b9f..7400f9a 100644 --- a/app/src/main/kotlin/com/example/visiontest/tools/IOSDeviceToolRegistrar.kt +++ b/app/src/main/kotlin/com/example/visiontest/tools/IOSDeviceToolRegistrar.kt @@ -19,9 +19,14 @@ class IOSDeviceToolRegistrar( name = "ios_available_device", description = "Returns detailed information about the first available iOS device or simulator. Includes device ID, name, type, state (Booted/Shutdown), iOS version, and model. Prioritizes booted simulators over shutdown ones." 
) { - val device = ios.getFirstAvailableDevice() + availableDevice() + } + } + + internal suspend fun availableDevice(): String { + val device = ios.getFirstAvailableDevice() - """ + return """ |iOS Device found: |ID: ${device.id} |Name: ${device.name} @@ -30,7 +35,6 @@ class IOSDeviceToolRegistrar( |OS Version: ${device.osVersion ?: "Unknown"} |Model: ${device.modelName ?: "Unknown"} """.trimMargin() - } } private fun registerListApps(scope: ToolScope) { @@ -38,12 +42,16 @@ class IOSDeviceToolRegistrar( name = "ios_list_apps", description = "Returns a complete list of all applications installed on the iOS device or simulator. Returns bundle IDs (e.g., com.apple.mobilesafari) for all installed apps. Device must be booted." ) { - val result = ios.listApps() - if (result.isEmpty()) { - "No apps found on the iOS device" - } else { - "Found these apps: ${result.joinToString(", ")}" - } + listApps() + } + } + + internal suspend fun listApps(): String { + val result = ios.listApps() + return if (result.isEmpty()) { + "No apps found on the iOS device" + } else { + "Found these apps: ${result.joinToString(", ")}" } } @@ -54,11 +62,15 @@ class IOSDeviceToolRegistrar( inputSchema = Tool.Input(required = listOf("bundleId")) ) { request -> val bundleId = request.requireString("bundleId") - val rawResult = ios.getAppInfo(bundleId) - "App Information for $bundleId:\n$rawResult" + infoApp(bundleId) } } + internal suspend fun infoApp(bundleId: String): String { + val rawResult = ios.getAppInfo(bundleId) + return "App Information for $bundleId:\n$rawResult" + } + private fun registerLaunchApp(scope: ToolScope) { scope.tool( name = "ios_launch_app", @@ -66,12 +78,16 @@ class IOSDeviceToolRegistrar( inputSchema = Tool.Input(required = listOf("bundleId")) ) { request -> val bundleId = request.requireString("bundleId") - val result = ios.launchApp(bundleId) - if (result) { - "Successfully launched the iOS app: $bundleId" - } else { - "Failed to launch the iOS app: $bundleId" - } + 
launchApp(bundleId) + } + } + + internal suspend fun launchApp(bundleId: String): String { + val result = ios.launchApp(bundleId) + return if (result) { + "Successfully launched the iOS app: $bundleId" + } else { + "Failed to launch the iOS app: $bundleId" } } } diff --git a/app/src/test/kotlin/com/example/visiontest/MainDispatchTest.kt b/app/src/test/kotlin/com/example/visiontest/MainDispatchTest.kt new file mode 100644 index 0000000..4bfb27f --- /dev/null +++ b/app/src/test/kotlin/com/example/visiontest/MainDispatchTest.kt @@ -0,0 +1,37 @@ +package com.example.visiontest + +import kotlin.test.Test +import kotlin.test.assertEquals + +class MainDispatchTest { + + @Test + fun `empty args route to the MCP server`() { + assertEquals(Route.McpServer, route(emptyArray())) + } + + @Test + fun `explicit serve subcommand routes to the MCP server`() { + assertEquals(Route.McpServer, route(arrayOf("serve"))) + } + + @Test + fun `serve with additional args still routes to the MCP server`() { + assertEquals(Route.McpServer, route(arrayOf("serve", "--ignored"))) + } + + @Test + fun `unknown first arg routes to the CLI`() { + assertEquals(Route.Cli, route(arrayOf("tap_by_coordinates", "--platform", "android", "100", "200"))) + } + + @Test + fun `help flag routes to the CLI`() { + assertEquals(Route.Cli, route(arrayOf("--help"))) + } + + @Test + fun `any non-serve token routes to the CLI`() { + assertEquals(Route.Cli, route(arrayOf("screenshot"))) + } +} diff --git a/app/src/test/kotlin/com/example/visiontest/cli/CliCommandIntegrationTest.kt b/app/src/test/kotlin/com/example/visiontest/cli/CliCommandIntegrationTest.kt new file mode 100644 index 0000000..b85efdf --- /dev/null +++ b/app/src/test/kotlin/com/example/visiontest/cli/CliCommandIntegrationTest.kt @@ -0,0 +1,177 @@ +package com.example.visiontest.cli + +import com.example.visiontest.android.AutomationClient +import com.example.visiontest.cli.commands.* +import com.example.visiontest.common.DeviceConfig +import 
com.example.visiontest.common.DeviceType +import com.example.visiontest.common.MobileDevice +import com.example.visiontest.discovery.ToolDiscovery +import com.example.visiontest.ios.IOSAutomationClient +import com.example.visiontest.ios.IOSManager +import com.example.visiontest.tools.AndroidAutomationToolRegistrar +import com.example.visiontest.tools.AndroidDeviceToolRegistrar +import com.example.visiontest.tools.IOSAutomationToolRegistrar +import com.example.visiontest.tools.IOSDeviceToolRegistrar +import okhttp3.mockwebserver.MockResponse +import okhttp3.mockwebserver.MockWebServer +import org.slf4j.LoggerFactory +import kotlin.test.* + +/** + * Integration-style tests: each test constructs a real CLI command with faked backends, + * parses args, and verifies delegation produces expected output via [executeCliCommand]. + */ +class CliCommandIntegrationTest { + + private lateinit var androidMock: MockWebServer + private lateinit var iosMock: MockWebServer + private lateinit var components: ComponentHolder + + private val fakeDevice = MobileDevice( + id = "emulator-5554", name = "Pixel_6", type = DeviceType.ANDROID, state = "device" + ) + + private val fakeDeviceConfig = object : DeviceConfig { + override suspend fun listDevices() = listOf(fakeDevice) + override suspend fun getFirstAvailableDevice() = fakeDevice + override suspend fun listApps(deviceId: String?) = listOf("com.example.app") + override suspend fun getAppInfo(packageName: String, deviceId: String?) = "version=1.0" + override suspend fun launchApp(packageName: String, activityName: String?, deviceId: String?) = true + override suspend fun executeShell(command: String, deviceId: String?) 
= "" + } + + private val logger = LoggerFactory.getLogger(CliCommandIntegrationTest::class.java) + + @BeforeTest + fun setUp() { + androidMock = MockWebServer() + androidMock.start() + iosMock = MockWebServer() + iosMock.start() + + val androidClient = AutomationClient(host = androidMock.hostName, port = androidMock.port) + val iosClient = IOSAutomationClient(host = iosMock.hostName, port = iosMock.port) + val discovery = ToolDiscovery(logger) + + // Use a real IOSManager but it won't be called for Android tests + val iosManager = IOSManager(logger = logger) + + components = ComponentHolder( + android = com.example.visiontest.android.Android(logger = logger), + ios = iosManager, + automationClient = androidClient, + iosAutomationClient = iosClient, + androidDeviceRegistrar = AndroidDeviceToolRegistrar(fakeDeviceConfig), + androidAutomationRegistrar = AndroidAutomationToolRegistrar(fakeDeviceConfig, androidClient, discovery), + iosDeviceRegistrar = IOSDeviceToolRegistrar(fakeDeviceConfig), + iosAutomationRegistrar = IOSAutomationToolRegistrar(fakeDeviceConfig, iosClient, discovery, logger), + ) + } + + @AfterTest + fun tearDown() { + androidMock.shutdown() + iosMock.shutdown() + } + + // --- automation_server_status --- + + @Test + fun `automation_server_status android when running`() { + androidMock.enqueue(MockResponse().setResponseCode(200).setBody("OK")) + val result = executeCliCommand { + components.androidAutomationRegistrar.automationServerStatus() + } + assertEquals(0, result.exitCode) + assertTrue(result.stdout!!.contains("running")) + } + + // --- tap_by_coordinates --- + + @Test + fun `tap_by_coordinates parses and delegates`() { + // Enqueue health check (for requireServerRunning) + health check (for requireServer) + tap response + androidMock.enqueue(MockResponse().setResponseCode(200).setBody("OK")) + androidMock.enqueue(MockResponse().setResponseCode(200).setBody("OK")) + androidMock.enqueue(MockResponse().setResponseCode(200).setBody( + 
"""{"jsonrpc":"2.0","id":1,"result":"Tapped at (100, 200)"}""" + )) + val result = executeCliCommand { + requireServerRunning { components.isServerRunning(Platform.Android) } + components.androidAutomationRegistrar.tapByCoordinates(100, 200) + } + assertEquals(0, result.exitCode) + assertTrue(result.stdout!!.contains("100")) + assertTrue(result.stdout!!.contains("200")) + } + + // --- server not running → exit 3 --- + + @Test + fun `command with server not running returns exit 3`() { + // MockWebServer won't respond to health check → connection refused handled + androidMock.shutdown() // force connection refused + val result = executeCliCommand { + requireServerRunning { components.isServerRunning(Platform.Android) } + components.androidAutomationRegistrar.getUiHierarchy() + } + assertEquals(3, result.exitCode) + assertTrue(result.stderr!!.contains("not running")) + } + + // --- press_back rejects ios --- + + @Test + fun `press_back rejects ios platform`() { + val result = executeCliCommand { + requireAndroid(Platform.Ios, "press_back") + components.androidAutomationRegistrar.pressBack() + } + assertEquals(5, result.exitCode) + assertTrue(result.stderr!!.contains("only supported on Android")) + } + + // --- launch_app delegates to device registrar --- + + @Test + fun `launch_app android delegates correctly`() { + val result = executeCliCommand { + components.androidDeviceRegistrar.launchApp("com.example.app") + } + assertEquals(0, result.exitCode) + assertTrue(result.stdout!!.contains("com.example.app")) + } + + // --- swipe_direction validates choices --- + + @Test + fun `swipe_direction with valid args dispatches`() { + // health check (requireServerRunning) + health check (requireServer) + swipe response + androidMock.enqueue(MockResponse().setResponseCode(200).setBody("OK")) + androidMock.enqueue(MockResponse().setResponseCode(200).setBody("OK")) + androidMock.enqueue(MockResponse().setResponseCode(200).setBody( + """{"jsonrpc":"2.0","id":1,"result":"Swiped 
up"}""" + )) + val result = executeCliCommand { + requireServerRunning { components.isServerRunning(Platform.Android) } + components.androidAutomationRegistrar.swipeByDirection("up", "medium", "normal") + } + assertEquals(0, result.exitCode) + } + + // --- input_text --- + + @Test + fun `input_text delegates with correct text`() { + androidMock.enqueue(MockResponse().setResponseCode(200).setBody("OK")) + androidMock.enqueue(MockResponse().setResponseCode(200).setBody("OK")) + androidMock.enqueue(MockResponse().setResponseCode(200).setBody( + """{"jsonrpc":"2.0","id":1,"result":"Text entered"}""" + )) + val result = executeCliCommand { + requireServerRunning { components.isServerRunning(Platform.Android) } + components.androidAutomationRegistrar.inputText("hello world") + } + assertEquals(0, result.exitCode) + } +} diff --git a/app/src/test/kotlin/com/example/visiontest/cli/CliErrorHandlerTest.kt b/app/src/test/kotlin/com/example/visiontest/cli/CliErrorHandlerTest.kt new file mode 100644 index 0000000..4482e08 --- /dev/null +++ b/app/src/test/kotlin/com/example/visiontest/cli/CliErrorHandlerTest.kt @@ -0,0 +1,98 @@ +package com.example.visiontest.cli + +import com.example.visiontest.NoDeviceAvailableException +import com.example.visiontest.NoSimulatorAvailableException +import kotlin.test.Test +import kotlin.test.assertEquals +import kotlin.test.assertNotNull +import kotlin.test.assertNull + +class CliErrorHandlerTest { + + @Test + fun `success returns exit 0 with stdout`() { + val result = executeCliCommand { "hello world" } + assertEquals(0, result.exitCode) + assertEquals("hello world", result.stdout) + assertNull(result.stderr) + } + + @Test + fun `CliExit ServerNotReachable returns exit 3`() { + val result = executeCliCommand { + throw CliExit(ExitCode.ServerNotReachable, "Server not running") + } + assertEquals(3, result.exitCode) + assertNull(result.stdout) + assertEquals("Server not running", result.stderr) + } + + @Test + fun `CliExit DeviceNotFound returns 
exit 4`() { + val result = executeCliCommand { + throw CliExit(ExitCode.DeviceNotFound, "No device") + } + assertEquals(4, result.exitCode) + assertNull(result.stdout) + } + + @Test + fun `CliExit PlatformNotSupported returns exit 5`() { + val result = executeCliCommand { + throw CliExit(ExitCode.PlatformNotSupported, "Android only") + } + assertEquals(5, result.exitCode) + } + + @Test + fun `NoDeviceAvailableException returns exit 4`() { + val result = executeCliCommand { + throw NoDeviceAvailableException("No Android device found") + } + assertEquals(4, result.exitCode) + assertEquals("No Android device found", result.stderr) + } + + @Test + fun `NoSimulatorAvailableException returns exit 4`() { + val result = executeCliCommand { + throw NoSimulatorAvailableException("No iOS simulator found") + } + assertEquals(4, result.exitCode) + assertEquals("No iOS simulator found", result.stderr) + } + + @Test + fun `IllegalArgumentException returns exit 2`() { + val result = executeCliCommand { + throw IllegalArgumentException("bad arg") + } + assertEquals(2, result.exitCode) + assertEquals("bad arg", result.stderr) + } + + @Test + fun `unexpected exception returns exit 1`() { + val result = executeCliCommand { + throw RuntimeException("something broke") + } + assertEquals(1, result.exitCode) + assertEquals("something broke", result.stderr) + } + + @Test + fun `CliExit GenericFailure returns exit 1`() { + val result = executeCliCommand { + throw CliExit(ExitCode.GenericFailure, "generic error") + } + assertEquals(1, result.exitCode) + } + + @Test + fun `CliExit UsageError returns exit 2`() { + val result = executeCliCommand { + throw CliExit(ExitCode.UsageError, "usage problem") + } + assertEquals(2, result.exitCode) + } +} diff --git a/app/src/test/kotlin/com/example/visiontest/cli/VisionTestCliTest.kt b/app/src/test/kotlin/com/example/visiontest/cli/VisionTestCliTest.kt new file mode 100644 index 0000000..cf15e99 --- /dev/null +++ 
b/app/src/test/kotlin/com/example/visiontest/cli/VisionTestCliTest.kt @@ -0,0 +1,167 @@ +package com.example.visiontest.cli + +import com.github.ajalt.clikt.core.MissingArgument +import com.github.ajalt.clikt.core.BadParameterValue +import com.github.ajalt.clikt.core.MissingOption +import com.github.ajalt.clikt.core.CliktCommand +import com.github.ajalt.clikt.parameters.arguments.argument +import com.github.ajalt.clikt.parameters.types.choice +import com.github.ajalt.clikt.parameters.types.int +import kotlin.test.Test +import kotlin.test.assertEquals +import kotlin.test.assertFailsWith +import kotlin.test.assertTrue + +/** + * Tests Clikt argument parsing for CLI commands without triggering `exitProcess`. + * Uses lightweight stub commands that capture parsed values instead of calling `runCliCommand`. + */ +class VisionTestCliTest { + + // --- Helpers: lightweight test commands that record parsed values --- + + private class TestPlatformCommand(name: String) : CliktCommand(name = name) { + val platform by platformOption() + var ran = false + override fun run() { ran = true } + } + + private class TestAndroidOnlyCommand(name: String) : CliktCommand(name = name) { + val platform by androidOnlyPlatformOption() + var ran = false + override fun run() { ran = true } + } + + private class TestTapCommand : CliktCommand(name = "tap_by_coordinates") { + val platform by platformOption() + val x by argument().int() + val y by argument().int() + var ran = false + override fun run() { ran = true } + } + + // --- Platform parsing --- + + @Test + fun `cross-platform command accepts android`() { + val cmd = TestPlatformCommand("test") + cmd.parse(listOf("--platform", "android")) + assertEquals(Platform.Android, cmd.platform) + assertTrue(cmd.ran) + } + + @Test + fun `cross-platform command accepts ios`() { + val cmd = TestPlatformCommand("test") + cmd.parse(listOf("--platform", "ios")) + assertEquals(Platform.Ios, cmd.platform) + } + + @Test + fun `cross-platform command accepts 
short flag`() { + val cmd = TestPlatformCommand("test") + cmd.parse(listOf("-p", "android")) + assertEquals(Platform.Android, cmd.platform) + } + + @Test + fun `missing platform produces usage error`() { + val cmd = TestPlatformCommand("test") + assertFailsWith<MissingOption> { + cmd.parse(emptyList()) + } + } + + @Test + fun `invalid platform value is rejected`() { + val cmd = TestPlatformCommand("test") + assertFailsWith<BadParameterValue> { + cmd.parse(listOf("--platform", "windows")) + } + } + + // --- Android-only commands --- + + @Test + fun `android-only command accepts android`() { + val cmd = TestAndroidOnlyCommand("test") + cmd.parse(listOf("--platform", "android")) + assertTrue(cmd.ran) + } + + @Test + fun `android-only command parses ios then requireAndroid rejects`() { + val cmd = TestAndroidOnlyCommand("test") + cmd.parse(listOf("--platform", "ios")) + // Parsing succeeds; the command's run() would call requireAndroid() to reject + assertEquals(Platform.Ios, cmd.platform) + assertFailsWith { + requireAndroid(cmd.platform, "test") + } + } + + // --- Positional args --- + + @Test + fun `tap command parses x and y`() { + val cmd = TestTapCommand() + cmd.parse(listOf("--platform", "android", "100", "200")) + assertEquals(100, cmd.x) + assertEquals(200, cmd.y) + } + + @Test + fun `tap command missing y produces error`() { + val cmd = TestTapCommand() + assertFailsWith<MissingArgument> { + cmd.parse(listOf("--platform", "android", "100")) + } + } + + @Test + fun `tap command non-integer x produces error`() { + val cmd = TestTapCommand() + assertFailsWith<BadParameterValue> { + cmd.parse(listOf("--platform", "android", "abc", "200")) + } + } + + // --- Subcommand routing --- + + @Test + fun `root command lists all 13 subcommands`() { + // We can't use the real VisionTestCli (ComponentHolder is lazy but still needs ADB), + // so we just verify the count expectation as a documentation test. 
+ val expectedCommands = listOf( + "install_automation_server", "start_automation_server", "automation_server_status", + "get_interactive_elements", "get_ui_hierarchy", "get_device_info", "screenshot", + "tap_by_coordinates", "input_text", "swipe_direction", + "press_back", "press_home", "launch_app" + ) + assertEquals(13, expectedCommands.size) + } + + // --- SwipeDirection choice validation --- + + private class TestSwipeCommand : CliktCommand(name = "swipe_direction") { + val platform by platformOption() + val direction by argument().choice("up", "down", "left", "right") + var ran = false + override fun run() { ran = true } + } + + @Test + fun `swipe direction rejects invalid direction`() { + val cmd = TestSwipeCommand() + assertFailsWith<BadParameterValue> { + cmd.parse(listOf("--platform", "android", "diagonal")) + } + } + + @Test + fun `swipe direction accepts valid direction`() { + val cmd = TestSwipeCommand() + cmd.parse(listOf("--platform", "android", "up")) + assertEquals("up", cmd.direction) + } +} diff --git a/app/src/test/kotlin/com/example/visiontest/tools/AndroidAutomationToolRegistrarTest.kt b/app/src/test/kotlin/com/example/visiontest/tools/AndroidAutomationToolRegistrarTest.kt new file mode 100644 index 0000000..14789a6 --- /dev/null +++ b/app/src/test/kotlin/com/example/visiontest/tools/AndroidAutomationToolRegistrarTest.kt @@ -0,0 +1,165 @@ +package com.example.visiontest.tools + +import com.example.visiontest.ServerNotRunningException +import com.example.visiontest.android.AutomationClient +import com.example.visiontest.common.DeviceConfig +import com.example.visiontest.common.DeviceType +import com.example.visiontest.common.MobileDevice +import com.example.visiontest.discovery.ToolDiscovery +import kotlinx.coroutines.runBlocking +import okhttp3.mockwebserver.MockResponse +import okhttp3.mockwebserver.MockWebServer +import org.slf4j.LoggerFactory +import kotlin.test.* + +/** + * Tests for the extracted `internal suspend` functions on 
[AndroidAutomationToolRegistrar], + * exercised directly without going through ToolScope/MCP. + * + * Screenshot-related methods are already thoroughly tested in [AndroidScreenshotToolTest]; + * this class covers the remaining extracted functions. + */ +class AndroidAutomationToolRegistrarTest { + + private lateinit var mockServer: MockWebServer + private lateinit var registrar: AndroidAutomationToolRegistrar + + private val logger = LoggerFactory.getLogger(AndroidAutomationToolRegistrarTest::class.java) + private val fakeDeviceConfig = object : DeviceConfig { + override suspend fun listDevices() = emptyList() + override suspend fun getFirstAvailableDevice() = MobileDevice( + id = "emulator-5554", name = "Pixel_6", type = DeviceType.ANDROID, state = "device" + ) + override suspend fun listApps(deviceId: String?) = emptyList() + override suspend fun getAppInfo(packageName: String, deviceId: String?) = "" + override suspend fun launchApp(packageName: String, activityName: String?, deviceId: String?) = false + override suspend fun executeShell(command: String, deviceId: String?) 
= "" + } + + @BeforeTest + fun setUp() { + mockServer = MockWebServer() + mockServer.start() + val client = AutomationClient(host = mockServer.hostName, port = mockServer.port) + registrar = AndroidAutomationToolRegistrar(fakeDeviceConfig, client, ToolDiscovery(logger)) + } + + @AfterTest + fun tearDown() { + mockServer.shutdown() + } + + // --- automationServerStatus --- + + @Test + fun `automationServerStatus when running`() = runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(200).setBody("OK")) + val result = registrar.automationServerStatus() + assertTrue(result.contains("running and accessible")) + } + + @Test + fun `automationServerStatus when not running`() = runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(500)) + val result = registrar.automationServerStatus() + assertTrue(result.contains("not running")) + } + + // --- server-not-running guard on various functions --- + + @Test + fun `tapByCoordinates throws when server not running`() = runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(500)) + val ex = assertFailsWith<ServerNotRunningException> { registrar.tapByCoordinates(100, 200) } + assertTrue(ex.message!!.contains("not running")) + } + + @Test + fun `getUiHierarchy throws when server not running`() = runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(500)) + val ex = assertFailsWith<ServerNotRunningException> { registrar.getUiHierarchy() } + assertTrue(ex.message!!.contains("not running")) + } + + @Test + fun `pressBack throws when server not running`() = runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(500)) + val ex = assertFailsWith<ServerNotRunningException> { registrar.pressBack() } + assertTrue(ex.message!!.contains("not running")) + } + + @Test + fun `pressHome throws when server not running`() = runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(500)) + val ex = assertFailsWith<ServerNotRunningException> { registrar.pressHome() } + assertTrue(ex.message!!.contains("not running")) + } + + @Test + fun `inputText throws when server not running`() = 
runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(500)) + val ex = assertFailsWith<ServerNotRunningException> { registrar.inputText("hello") } + assertTrue(ex.message!!.contains("not running")) + } + + @Test + fun `getDeviceInfo throws when server not running`() = runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(500)) + val ex = assertFailsWith<ServerNotRunningException> { registrar.getDeviceInfo() } + assertTrue(ex.message!!.contains("not running")) + } + + @Test + fun `getInteractiveElements throws when server not running`() = runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(500)) + val ex = assertFailsWith<ServerNotRunningException> { registrar.getInteractiveElements() } + assertTrue(ex.message!!.contains("not running")) + } + + @Test + fun `swipeByDirection throws when server not running`() = runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(500)) + val ex = assertFailsWith<ServerNotRunningException> { registrar.swipeByDirection("up") } + assertTrue(ex.message!!.contains("not running")) + } + + // --- findElement validation --- + + @Test + fun `findElement requires at least one selector`() = runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(200).setBody("OK")) + val result = registrar.findElement(null, null, null, null, null) + assertTrue(result.contains("At least one selector required")) + } + + @Test + fun `findElement throws when server not running`() = runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(500)) + val ex = assertFailsWith<ServerNotRunningException> { + registrar.findElement(text = "hello", textContains = null, resourceId = null, className = null, contentDescription = null) + } + assertTrue(ex.message!!.contains("not running")) + } + + // --- swipeOnElement validation --- + + @Test + fun `swipeOnElement requires at least one selector`() = runBlocking { + mockServer.enqueue(MockResponse().setResponseCode(200).setBody("OK")) + val result = registrar.swipeOnElement("up", null, null, null, null, null) + assertTrue(result.contains("At least one selector required")) + } + + // --- 
tapByCoordinates delegates when server running --- + + @Test + fun `tapByCoordinates delegates to automationClient`() = runBlocking { + // health check + mockServer.enqueue(MockResponse().setResponseCode(200).setBody("OK")) + // tap response + mockServer.enqueue(MockResponse().setBody("""{"jsonrpc":"2.0","result":"Tapped at (100, 200)","id":1}""")) + val result = registrar.tapByCoordinates(100, 200) + // The raw JSON-RPC response is returned by AutomationClient + assertTrue(result.isNotEmpty()) + } +} diff --git a/app/src/test/kotlin/com/example/visiontest/tools/AndroidDeviceToolRegistrarTest.kt b/app/src/test/kotlin/com/example/visiontest/tools/AndroidDeviceToolRegistrarTest.kt new file mode 100644 index 0000000..1152f19 --- /dev/null +++ b/app/src/test/kotlin/com/example/visiontest/tools/AndroidDeviceToolRegistrarTest.kt @@ -0,0 +1,78 @@ +package com.example.visiontest.tools + +import com.example.visiontest.common.DeviceConfig +import com.example.visiontest.common.DeviceType +import com.example.visiontest.common.MobileDevice +import kotlinx.coroutines.runBlocking +import kotlin.test.* + +/** + * Tests for the extracted `internal suspend` functions on [AndroidDeviceToolRegistrar], + * exercised directly without going through ToolScope/MCP. + */ +class AndroidDeviceToolRegistrarTest { + + private val fakeDevice = MobileDevice( + id = "emulator-5554", + name = "Pixel_6", + type = DeviceType.ANDROID, + state = "device" + ) + + private val fakeDeviceConfig = object : DeviceConfig { + var apps: List = listOf("com.example.app1", "com.example.app2") + var appInfo: String = "versionName=1.0.0\nversionCode=1" + var launchResult: Boolean = true + var shellOutput: String = "[ro.product.model]: [Pixel 6]\n[ro.build.version.release]: [13]\n[ro.build.version.sdk]: [33]" + + override suspend fun listDevices() = listOf(fakeDevice) + override suspend fun getFirstAvailableDevice() = fakeDevice + override suspend fun listApps(deviceId: String?) 
= apps + override suspend fun getAppInfo(packageName: String, deviceId: String?) = appInfo + override suspend fun launchApp(packageName: String, activityName: String?, deviceId: String?) = launchResult + override suspend fun executeShell(command: String, deviceId: String?) = shellOutput + } + + private val registrar = AndroidDeviceToolRegistrar(fakeDeviceConfig) + + @Test + fun `availableDevice returns formatted device info`() = runBlocking { + val result = registrar.availableDevice() + assertTrue(result.contains("Serial: emulator-5554")) + assertTrue(result.contains("State: device")) + } + + @Test + fun `listApps returns formatted app list`() = runBlocking { + val result = registrar.listApps() + assertTrue(result.contains("com.example.app1")) + assertTrue(result.contains("com.example.app2")) + } + + @Test + fun `listApps handles empty list`() = runBlocking { + fakeDeviceConfig.apps = emptyList() + val result = registrar.listApps() + assertEquals("No apps found on the device", result) + } + + @Test + fun `infoApp delegates to DeviceConfig`() = runBlocking { + val result = registrar.infoApp("com.example.app1") + // ToolHelpers.formatAppInfo is tested elsewhere; just verify delegation + assertFalse(result.isEmpty()) + } + + @Test + fun `launchApp success`() = runBlocking { + val result = registrar.launchApp("com.example.app1") + assertEquals("Successfully launched the app: com.example.app1", result) + } + + @Test + fun `launchApp failure`() = runBlocking { + fakeDeviceConfig.launchResult = false + val result = registrar.launchApp("com.example.app1") + assertEquals("Failed to launch the app: com.example.app1", result) + } +} diff --git a/app/src/test/kotlin/com/example/visiontest/tools/AndroidScreenshotToolTest.kt b/app/src/test/kotlin/com/example/visiontest/tools/AndroidScreenshotToolTest.kt index 2402eb9..dabf2ce 100644 --- a/app/src/test/kotlin/com/example/visiontest/tools/AndroidScreenshotToolTest.kt +++ 
b/app/src/test/kotlin/com/example/visiontest/tools/AndroidScreenshotToolTest.kt @@ -1,5 +1,6 @@ package com.example.visiontest.tools +import com.example.visiontest.ServerNotRunningException import com.example.visiontest.android.AutomationClient import com.example.visiontest.common.DeviceConfig import com.example.visiontest.discovery.ToolDiscovery @@ -115,9 +116,11 @@ class AndroidScreenshotToolTest { mockServer.enqueue(MockResponse().setResponseCode(500)) val target = File(tempDir, "out.png") - val message = registrar.captureScreenshot(target.absolutePath) + val ex = assertFailsWith { + registrar.captureScreenshot(target.absolutePath) + } - assertTrue(message.contains("Automation server is not running")) + assertTrue(ex.message!!.contains("not running")) assertFalse(target.exists(), "No file should be written when server is not running") assertEquals(1, mockServer.requestCount, "Only the health check should have been attempted") } diff --git a/app/src/test/kotlin/com/example/visiontest/tools/IOSDeviceToolRegistrarTest.kt b/app/src/test/kotlin/com/example/visiontest/tools/IOSDeviceToolRegistrarTest.kt new file mode 100644 index 0000000..f259246 --- /dev/null +++ b/app/src/test/kotlin/com/example/visiontest/tools/IOSDeviceToolRegistrarTest.kt @@ -0,0 +1,79 @@ +package com.example.visiontest.tools + +import com.example.visiontest.common.DeviceConfig +import com.example.visiontest.common.DeviceType +import com.example.visiontest.common.MobileDevice +import kotlinx.coroutines.runBlocking +import kotlin.test.* + +/** + * Tests for the extracted `internal suspend` functions on [IOSDeviceToolRegistrar], + * exercised directly without going through ToolScope/MCP. 
+ */ +class IOSDeviceToolRegistrarTest { + + private val fakeDevice = MobileDevice( + id = "ABCD-1234", + name = "iPhone 16", + type = DeviceType.IOS_SIMULATOR, + state = "Booted", + osVersion = "18.0", + modelName = "iPhone 16" + ) + + private val fakeDeviceConfig = object : DeviceConfig { + var apps: List = listOf("com.apple.mobilesafari", "com.apple.Preferences") + var appInfo: String = "BundleID: com.apple.mobilesafari\nPath: /some/path" + var launchResult: Boolean = true + + override suspend fun listDevices() = listOf(fakeDevice) + override suspend fun getFirstAvailableDevice() = fakeDevice + override suspend fun listApps(deviceId: String?) = apps + override suspend fun getAppInfo(packageName: String, deviceId: String?) = appInfo + override suspend fun launchApp(packageName: String, activityName: String?, deviceId: String?) = launchResult + override suspend fun executeShell(command: String, deviceId: String?) = "" + } + + private val registrar = IOSDeviceToolRegistrar(fakeDeviceConfig) + + @Test + fun `availableDevice returns formatted device info`() = runBlocking { + val result = registrar.availableDevice() + assertTrue(result.contains("ID: ABCD-1234")) + assertTrue(result.contains("Name: iPhone 16")) + assertTrue(result.contains("State: Booted")) + assertTrue(result.contains("OS Version: 18.0")) + } + + @Test + fun `listApps returns formatted app list`() = runBlocking { + val result = registrar.listApps() + assertTrue(result.contains("com.apple.mobilesafari")) + } + + @Test + fun `listApps handles empty list`() = runBlocking { + fakeDeviceConfig.apps = emptyList() + val result = registrar.listApps() + assertEquals("No apps found on the iOS device", result) + } + + @Test + fun `infoApp formats with bundleId`() = runBlocking { + val result = registrar.infoApp("com.apple.mobilesafari") + assertTrue(result.startsWith("App Information for com.apple.mobilesafari:")) + } + + @Test + fun `launchApp success`() = runBlocking { + val result = 
registrar.launchApp("com.apple.mobilesafari") + assertEquals("Successfully launched the iOS app: com.apple.mobilesafari", result) + } + + @Test + fun `launchApp failure`() = runBlocking { + fakeDeviceConfig.launchResult = false + val result = registrar.launchApp("com.apple.mobilesafari") + assertEquals("Failed to launch the iOS app: com.apple.mobilesafari", result) + } +} diff --git a/app/src/test/kotlin/com/example/visiontest/tools/IOSScreenshotToolTest.kt b/app/src/test/kotlin/com/example/visiontest/tools/IOSScreenshotToolTest.kt index 9f8765d..73b4bbf 100644 --- a/app/src/test/kotlin/com/example/visiontest/tools/IOSScreenshotToolTest.kt +++ b/app/src/test/kotlin/com/example/visiontest/tools/IOSScreenshotToolTest.kt @@ -1,5 +1,6 @@ package com.example.visiontest.tools +import com.example.visiontest.ServerNotRunningException import com.example.visiontest.common.DeviceConfig import com.example.visiontest.discovery.ToolDiscovery import com.example.visiontest.ios.IOSAutomationClient @@ -115,9 +116,11 @@ class IOSScreenshotToolTest { mockServer.enqueue(MockResponse().setResponseCode(500)) val target = File(tempDir, "out.png") - val message = registrar.captureScreenshot(target.absolutePath) + val ex = assertFailsWith { + registrar.captureScreenshot(target.absolutePath) + } - assertTrue(message.contains("iOS automation server is not running")) + assertTrue(ex.message!!.contains("not running")) assertFalse(target.exists(), "No file should be written when server is not running") assertEquals(1, mockServer.requestCount, "Only the health check should have been attempted") } diff --git a/docs/installation.md b/docs/installation.md index 309ac24..a61deff 100644 --- a/docs/installation.md +++ b/docs/installation.md @@ -11,13 +11,20 @@ Users install with `curl -fsSL https://github.com/docer1990/visiontest/releases/ 6. 
On macOS arm64: downloads `ios-automation-server.tar.gz` + checksum, extracts pre-built iOS XCUITest bundle to `ios-automation-server/` subdirectory (skipped on Linux and macOS x86_64) 7. Installs JAR, APKs, and iOS bundle to `~/.local/share/visiontest/` (customizable via `VISIONTEST_DIR` env var, must be under `$HOME`) 8. Creates wrapper script at `~/.local/bin/visiontest`, ensures PATH -9. Does not modify Claude Desktop configuration; use `run-visiontest.sh` or manual setup for Claude integration. +9. Downloads `AGENT_INSTRUCTIONS.md` and installs VisionTest CLI instructions into detected AI coding agents: + - **Claude Code** (`claude`): creates skill at `~/.claude/skills/visiontest-mobile/SKILL.md` + - **OpenCode** (`opencode`): appends to `~/.config/opencode/AGENTS.md` + - **Codex** (`codex`): appends to `~/.codex/instructions.md` + - **Copilot CLI** (`gh copilot`): appends to `~/.github/copilot-instructions.md` + - Uses `` markers for idempotent updates + - Skip with `--skip-agent-setup` flag +10. Does not modify Claude Desktop configuration; use `run-visiontest.sh` or manual setup for Claude integration. **Security hardening:** `umask 077`, explicit `chmod` on all files/dirs, tag validation, checksum verification, install path restricted to `$HOME`. ## Release Workflow (`.github/workflows/release.yaml`) -Triggered by git tags matching `v*`. The workflow runs the test suite, builds the fat JAR via `shadowJar`, Android APKs, and the pre-built iOS XCUITest bundle (on a macOS runner), generates SHA-256 checksums, and creates a GitHub Release with the following assets: `visiontest.jar`, `visiontest.jar.sha256`, `automation-server.apk`, `automation-server.apk.sha256`, `automation-server-test.apk`, `automation-server-test.apk.sha256`, `ios-automation-server.tar.gz`, `ios-automation-server.tar.gz.sha256`, `install.sh`, `run-visiontest.sh`. +Triggered by git tags matching `v*`. 
The workflow runs the test suite, builds the fat JAR via `shadowJar`, Android APKs, and the pre-built iOS XCUITest bundle (on a macOS runner), generates SHA-256 checksums, and creates a GitHub Release with the following assets: `visiontest.jar`, `visiontest.jar.sha256`, `automation-server.apk`, `automation-server.apk.sha256`, `automation-server-test.apk`, `automation-server-test.apk.sha256`, `ios-automation-server.tar.gz`, `ios-automation-server.tar.gz.sha256`, `AGENT_INSTRUCTIONS.md`, `install.sh`, `run-visiontest.sh`. All GitHub Actions in both workflows are pinned to commit SHAs for supply-chain security. When updating or adding actions, always use SHA-pinned references instead of floating version tags. @@ -38,3 +45,15 @@ Used for development and Claude Desktop config. JAR resolution order: - Android SDK — only needed for building the automation-server module from source > **Quick start:** Users who just need the MCP server can run `curl -fsSL https://github.com/docer1990/visiontest/releases/latest/download/install.sh | bash` — only Java 17+ is required. + +## CLI Usage + +After installation, `visiontest` with no arguments starts the MCP stdio server (unchanged behavior). To use the CLI, pass a subcommand: + +```bash +visiontest screenshot --platform android +visiontest get_interactive_elements --platform ios +visiontest tap_by_coordinates --platform android 540 1200 +``` + +Every command requires `--platform android` or `--platform ios`. Run `visiontest --help` for the full command list. See `CLAUDE.md` for the complete CLI reference. 
diff --git a/install.sh b/install.sh index 10a73df..d19bb2c 100755 --- a/install.sh +++ b/install.sh @@ -2,12 +2,44 @@ # VisionTest MCP Server Installer # Usage: curl -fsSL https://github.com/docer1990/visiontest/releases/latest/download/install.sh | bash # +# Flags: +# --skip-agent-setup Skip installing AI agent instructions +# --local-jar PATH Use a local JAR instead of downloading from GitHub Releases +# (also skips APK/iOS bundle download — useful for testing) +# # Environment variables: # VISIONTEST_DIR — override install directory (default: ~/.local/share/visiontest) set -eu umask 077 +# ---------- parse flags ---------- +SKIP_AGENT_SETUP=false +LOCAL_JAR="" +for arg in "$@"; do + case "$arg" in + --skip-agent-setup) SKIP_AGENT_SETUP=true ;; + --local-jar) _EXPECT_JAR_PATH=true ;; + *) + if [ "${_EXPECT_JAR_PATH:-}" = "true" ]; then + LOCAL_JAR="$arg" + _EXPECT_JAR_PATH="" + fi + ;; + esac +done + +if [ "${_EXPECT_JAR_PATH:-}" = "true" ]; then + printf ' \033[1;31mx\033[0m --local-jar requires a file path argument\n' >&2 + exit 2 +fi +unset _EXPECT_JAR_PATH + +if [ -n "$LOCAL_JAR" ] && [ ! 
-f "$LOCAL_JAR" ]; then + printf ' \033[1;31mx\033[0m --local-jar: file not found: %s\n' "$LOCAL_JAR" >&2 + exit 2 +fi + REPO="docer1990/visiontest" BIN_DIR="$HOME/.local/bin" @@ -338,6 +370,29 @@ download_ios_bundle() { ok "iOS bundle installed to $IOS_FINAL_DIR/" } +# ---------- install from local JAR (testing mode) ---------- + +install_local_jar() { + mkdir -p "$RESOLVED_VISIONTEST_HOME" + chmod 700 "$RESOLVED_VISIONTEST_HOME" + + info "Installing from local JAR: $LOCAL_JAR" + cp -f "$LOCAL_JAR" "$RESOLVED_VISIONTEST_HOME/visiontest.jar" + chmod 600 "$RESOLVED_VISIONTEST_HOME/visiontest.jar" + printf 'local-dev\n' > "$RESOLVED_VISIONTEST_HOME/version.txt" + chmod 600 "$RESOLVED_VISIONTEST_HOME/version.txt" + ok "Installed local JAR to $RESOLVED_VISIONTEST_HOME/visiontest.jar" + + # Copy AGENT_INSTRUCTIONS.md from repo if present alongside install.sh + local SCRIPT_DIR + SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd) + if [ -f "$SCRIPT_DIR/AGENT_INSTRUCTIONS.md" ]; then + cp -f "$SCRIPT_DIR/AGENT_INSTRUCTIONS.md" "$RESOLVED_VISIONTEST_HOME/AGENT_INSTRUCTIONS.md" + chmod 600 "$RESOLVED_VISIONTEST_HOME/AGENT_INSTRUCTIONS.md" + ok "Copied AGENT_INSTRUCTIONS.md from repo" + fi +} + # ---------- create wrapper script ---------- create_wrapper() { @@ -391,6 +446,129 @@ ensure_path() { export PATH="$BIN_DIR:$PATH" } +# ---------- install AI agent instructions ---------- + +MARKER_BEGIN="" +MARKER_END="" + +download_agent_instructions() { + info "Downloading agent instructions..." 
+ local INSTRUCTIONS_URL="https://github.com/$REPO/releases/download/$LATEST_TAG/AGENT_INSTRUCTIONS.md" + local DEST="$RESOLVED_VISIONTEST_HOME/AGENT_INSTRUCTIONS.md" + + if command -v curl >/dev/null 2>&1; then + curl -fsSL -o "$DEST" "$INSTRUCTIONS_URL" + elif command -v wget >/dev/null 2>&1; then + wget -q -O "$DEST" "$INSTRUCTIONS_URL" + else + warn "Neither curl nor wget found; skipping agent instructions download" + return + fi + chmod 600 "$DEST" +} + +# Appends or replaces the VisionTest instruction block in a target file. +# Uses BEGIN/END markers for idempotent updates. +# Usage: append_with_markers +append_with_markers() { + local TARGET="$1" + local CONTENT="$2" + + if [ -f "$TARGET" ] && grep -qF "$MARKER_BEGIN" "$TARGET"; then + # Replace existing block: remove old markers + content, append new + # Use a temp file to avoid sed -i portability issues (GNU vs BSD) + local TMP + TMP=$(mktemp "${TARGET}.XXXXXX") + awk -v begin="$MARKER_BEGIN" -v end="$MARKER_END" ' + $0 == begin { skip=1; next } + $0 == end { skip=0; next } + !skip { print } + ' "$TARGET" > "$TMP" + mv -f "$TMP" "$TARGET" + fi + + # Append the block + { + echo "" + echo "$MARKER_BEGIN" + echo "$CONTENT" + echo "$MARKER_END" + } >> "$TARGET" +} + +install_agent_instructions() { + if [ "$SKIP_AGENT_SETUP" = "true" ]; then + info "Skipping AI agent setup (--skip-agent-setup)" + return + fi + + local INSTRUCTIONS_FILE="$RESOLVED_VISIONTEST_HOME/AGENT_INSTRUCTIONS.md" + if [ ! 
-f "$INSTRUCTIONS_FILE" ]; then + warn "Agent instructions not found, skipping agent setup" + return + fi + + local INSTRUCTIONS + INSTRUCTIONS=$(cat "$INSTRUCTIONS_FILE") + local AGENTS_CONFIGURED="" + + # --- Claude Code --- + if command -v claude >/dev/null 2>&1; then + local CLAUDE_SKILL_DIR="$HOME/.claude/skills/visiontest-mobile" + mkdir -p "$CLAUDE_SKILL_DIR" + cat > "$CLAUDE_SKILL_DIR/SKILL.md" </dev/null 2>&1; then + local OPENCODE_DIR="$HOME/.config/opencode" + mkdir -p "$OPENCODE_DIR" + local OPENCODE_TARGET="$OPENCODE_DIR/AGENTS.md" + append_with_markers "$OPENCODE_TARGET" "$INSTRUCTIONS" + chmod 644 "$OPENCODE_TARGET" + ok "OpenCode: updated $OPENCODE_TARGET" + AGENTS_CONFIGURED="${AGENTS_CONFIGURED}opencode " + fi + + # --- Codex (OpenAI) --- + if command -v codex >/dev/null 2>&1; then + local CODEX_DIR="$HOME/.codex" + mkdir -p "$CODEX_DIR" + local CODEX_TARGET="$CODEX_DIR/instructions.md" + append_with_markers "$CODEX_TARGET" "$INSTRUCTIONS" + chmod 644 "$CODEX_TARGET" + ok "Codex: updated $CODEX_TARGET" + AGENTS_CONFIGURED="${AGENTS_CONFIGURED}codex " + fi + + # --- GitHub Copilot CLI --- + if command -v gh >/dev/null 2>&1 && gh extension list 2>/dev/null | grep -q "copilot"; then + local COPILOT_DIR="$HOME/.github" + mkdir -p "$COPILOT_DIR" + local COPILOT_TARGET="$COPILOT_DIR/copilot-instructions.md" + append_with_markers "$COPILOT_TARGET" "$INSTRUCTIONS" + chmod 644 "$COPILOT_TARGET" + ok "Copilot: updated $COPILOT_TARGET" + AGENTS_CONFIGURED="${AGENTS_CONFIGURED}copilot " + fi + + if [ -z "$AGENTS_CONFIGURED" ]; then + info "No supported AI coding agents detected (checked: claude, opencode, codex, gh copilot)" + info "You can manually copy $INSTRUCTIONS_FILE into your agent's config" + fi +} + # ---------- main ---------- main() { @@ -400,28 +578,39 @@ main() { detect_platform check_java - fetch_latest_version - download_jar - download_apks - download_ios_bundle - # Disarm the cleanup trap since all downloads succeeded - trap - EXIT + + if [ -n 
"$LOCAL_JAR" ]; then + install_local_jar + LATEST_TAG="local-dev" + else + fetch_latest_version + download_jar + download_apks + download_ios_bundle + download_agent_instructions + # Disarm the cleanup trap since all downloads succeeded + trap - EXIT + fi + create_wrapper ensure_path + install_agent_instructions echo "" ok "VisionTest $LATEST_TAG installed successfully!" echo "" echo " Installed:" echo " JAR: $RESOLVED_VISIONTEST_HOME/visiontest.jar" - echo " APKs: $RESOLVED_VISIONTEST_HOME/automation-server.apk" - echo " $RESOLVED_VISIONTEST_HOME/automation-server-test.apk" - if [ "$PLATFORM" = "macOS" ] && [ "$ARCH" = "arm64" ]; then - echo " iOS: $RESOLVED_VISIONTEST_HOME/ios-automation-server/" - elif [ "$PLATFORM" = "macOS" ]; then - echo " iOS: (not installed — pre-built bundle is arm64 only; build from source)" - else - echo " iOS: (not installed — macOS only)" + if [ -z "$LOCAL_JAR" ]; then + echo " APKs: $RESOLVED_VISIONTEST_HOME/automation-server.apk" + echo " $RESOLVED_VISIONTEST_HOME/automation-server-test.apk" + if [ "$PLATFORM" = "macOS" ] && [ "$ARCH" = "arm64" ]; then + echo " iOS: $RESOLVED_VISIONTEST_HOME/ios-automation-server/" + elif [ "$PLATFORM" = "macOS" ]; then + echo " iOS: (not installed — pre-built bundle is arm64 only; build from source)" + else + echo " iOS: (not installed — macOS only)" + fi fi echo "" echo " Run the MCP server:" @@ -430,7 +619,11 @@ main() { echo " For Claude Code, add with:" echo " claude mcp add visiontest java -- -jar $RESOLVED_VISIONTEST_HOME/visiontest.jar" echo "" + echo " CLI usage (in any project):" + echo " visiontest --help" + echo "" echo " To update later, re-run this script." 
+ echo " To skip agent config: install.sh --skip-agent-setup" echo "" } diff --git a/openspec/changes/add-cli-mode/.openspec.yaml b/openspec/changes/add-cli-mode/.openspec.yaml new file mode 100644 index 0000000..4b8c565 --- /dev/null +++ b/openspec/changes/add-cli-mode/.openspec.yaml @@ -0,0 +1,2 @@ +schema: spec-driven +created: 2026-04-21 diff --git a/openspec/changes/add-cli-mode/design.md b/openspec/changes/add-cli-mode/design.md new file mode 100644 index 0000000..51a8119 --- /dev/null +++ b/openspec/changes/add-cli-mode/design.md @@ -0,0 +1,140 @@ +## Context + +VisionTest's MCP tool registrations in `app/src/main/kotlin/com/example/visiontest/tools/` mix three concerns in each handler: + +1. **Arg extraction** — reading typed parameters out of `CallToolRequest` via `requireString`, `optionalInt`, etc. +2. **Business logic** — checking server state, calling `AutomationClient` / `IOSAutomationClient` / `Android` / `IOSManager`, parsing responses, formatting a human-readable result string. +3. **Framing** — `ToolScope.tool { ... }` wraps the handler with `withTimeout`, `TimeoutCancellationException` → `TimeoutException`, and `ErrorHandler.handleToolError(...)` for error mapping into `CallToolResult`. + +(2) is identical between the MCP and CLI use cases. (1) and (3) are MCP-specific. To support a CLI without duplicating (2), the business logic needs to be extractable from under the MCP framing — which is exactly what we do. + +The underlying clients (`AutomationClient`, `IOSAutomationClient`, `Android`, `IOSManager`) are already MCP-free suspend-function surfaces. Some tool registrars also contain helper logic with no MCP dependency (e.g. `AndroidAutomationToolRegistrar.captureScreenshot`, `resolveScreenshotPath`, `writeScreenshot` are already `internal suspend` functions). 
The refactor extends this pattern to every tool: the body of each tool becomes an `internal suspend` function taking plain typed parameters, and the MCP registration shrinks to "read args → call function → return string". + +Prior art for `main(args)` dispatch in a Kotlin/JVM app with clikt is straightforward: + +``` +main(args): + if args.isEmpty() or args[0] == "serve": + run the existing MCP stdio server + else: + VisionTestCli().main(args) +``` + +The existing MCP path is unchanged when args are empty (or `serve`), which keeps the Claude Code / Claude Desktop launchers working without any config change. + +## Goals / Non-Goals + +**Goals:** +- Expose the MVP subset of 13 automation operations as CLI commands usable from shells, scripts, and LLM skills. +- Share one implementation per operation between MCP and CLI (no business-logic duplication). +- Preserve the current MCP stdio behavior exactly when `visiontest` is invoked with no args. +- Give LLM consumers output that's prose-rich (good for chain-of-thought) and errors with granular exit codes (good for retry logic). +- Ship a reference LLM skill that teaches the standard automation loop through the CLI. +- Require `--platform` explicitly on every command: no implicit defaults that could silently target the wrong OS. + +**Non-Goals:** +- A `--json` output mode. The MCP tool strings today are LLM-friendly and stable; `--json` is deferred to a later change once we know which commands actually need structured output for scripting. +- A long-running daemon / `visiontestd` process. Each CLI invocation spins up a fresh JVM (~1–2 s cold). The Android automation server on-device persists between invocations, so only the first call in a session pays the setup cost; subsequent commands reuse the running server. Daemon mode can be added later if interactive use emerges. +- Exposing every MCP tool. 
`find_element`, `swipe`, `swipe_on_element`, `list_apps`, `info_app`, `available_device`, `ios_stop_automation_server` are deliberately deferred to a post-MVP change once we see what's actually friction-inducing in practice. +- Renaming MCP tools or changing the MCP output format. MCP-side behavior is preserved exactly. +- Extracting a separate `core` Gradle module. The refactor keeps handler bodies on the existing `ToolRegistrar` classes (or as top-level functions in the `tools/` package). A module split is a bigger refactor that can happen later if the CLI and MCP surfaces diverge further. + +## Decisions + +### Decision 1: Dispatch via `main(args)` branching, not separate `main` per mode + +**Choice:** One `main(args)` function. If `args` is empty or `args[0] == "serve"`, run the existing MCP stdio server. Otherwise, construct a clikt-based `VisionTestCli` root command and delegate to it. + +**Alternatives considered:** +- **Separate entry points (`MainMcp.kt`, `MainCli.kt`) with different Gradle tasks:** Doubles the launcher surface. Users and automation hosts already call `java -jar visiontest.jar`; asking them to call a different JAR or pass `-Dmode=cli` is a worse UX than a single entry point that dispatches on args. +- **Clikt "default subcommand":** Clikt supports invoking a default subcommand when no subcommand is given, but the MCP path doesn't fit cleanly inside a clikt command — it consumes stdin as a binary protocol and blocks indefinitely, whereas clikt commands expect typed parse → run → return. Branching before entering clikt avoids this impedance mismatch. + +**Rationale:** The simplest backward-compatible shape. Preserves every existing MCP launcher (including `install.sh`'s `~/.local/bin/visiontest` wrapper and the shadowJar `-Main-Class`) without configuration changes. 
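A minimal sketch of the branching described in Decision 1, written so the routing rule can be unit-tested with fakes instead of a real MCP server. The names `dispatch`, `runMcp`, and `runCli` are hypothetical and are not taken from the codebase; the real `Main.kt` may be shaped differently.

```kotlin
// Illustrative only: `dispatch`, `runMcp`, and `runCli` are assumed names.
// The two lambdas stand in for "start the MCP stdio server" and
// "hand the arguments to the CLI root command".
fun dispatch(
    args: Array<String>,
    runMcp: () -> Unit,
    runCli: (Array<String>) -> Unit,
) {
    // Empty args or an explicit "serve" keep the existing MCP stdio path;
    // anything else is routed to the CLI dispatcher untouched.
    if (args.isEmpty() || args[0] == "serve") {
        runMcp()
    } else {
        runCli(args)
    }
}
```

Under these assumptions, `main(args)` would reduce to a single `dispatch(args, ::startMcpServer, ::runCliRoot)` call (both function names hypothetical), and the routing test in task 1.4 could pass recording lambdas rather than starting a server.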
+ +### Decision 2: Extract each tool's body into an `internal suspend` function on its registrar + +**Choice:** For every tool currently registered in `AndroidAutomationToolRegistrar`, `IOSAutomationToolRegistrar`, `AndroidDeviceToolRegistrar`, and `IOSDeviceToolRegistrar`, move the handler body into an `internal suspend` function whose parameters are the typed inputs (not a `CallToolRequest`). The MCP registration becomes a thin wrapper: + +```kotlin +scope.tool(name = "android_tap_by_coordinates", ...) { request -> + tapByCoordinates(request.requireInt("x"), request.requireInt("y")) +} + +internal suspend fun tapByCoordinates(x: Int, y: Int): String { /* body */ } +``` + +The CLI subcommand calls `tapByCoordinates(x, y)` directly. + +**Alternatives considered:** +- **A new `core` Gradle module holding pure "service" classes:** Bigger refactor, forces package moves across the repo, and creates a symmetry between MCP and CLI that's not needed yet. We can split later if the two facades grow apart. +- **Put extracted functions as top-level in the `tools/` package:** Works, but scatters related code. Keeping them on the registrar preserves the existing file layout and keeps per-tool state (e.g. references to `automationClient`, `discovery`) in scope without wider refactor. + +**Rationale:** Minimal churn, maximum reuse. The registrars already contain per-tool helpers for complex cases (`captureScreenshot`, `resolveScreenshotPath`, `writeScreenshot` in `AndroidAutomationToolRegistrar`); this generalises that pattern. + +### Decision 3: `--platform android|ios` is a required flag on every CLI command + +**Choice:** Every CLI subcommand accepts `--platform` (short form `-p`) and requires a value of exactly `android` or `ios`. No default. No env-var fallback. No auto-detection from "which device is connected". 
+ +The two Android-only commands (`install_automation_server`, `press_back`) reject `--platform ios` with exit code `5` (platform-not-supported-for-command) and a clear error message. + +**Alternatives considered:** +- **Default to the only connected device:** Convenient for humans, dangerous for LLMs — an agent could silently target the wrong platform when both are running. Skill authors can't easily predict what's connected in a user's environment. +- **Separate top-level subcommands (`visiontest android tap ...`, `visiontest ios tap ...`):** Nested subcommands read more naturally to humans but bloat the command tree and make the two platforms feel like separate products. The flag form keeps platform as a dimension of every call and scales cleanly if we ever add a third platform. +- **Env var `VISION_TEST_PLATFORM`:** Hidden state is bad for LLMs — an agent can't see what it's about to do. Env vars in skill instructions also compose poorly when the same agent switches platforms mid-session. + +**Rationale:** Explicit > implicit, especially when the consumer is an LLM reading back its own commands. The ergonomic cost (typing `--platform android` every time) is absorbed by the skill, not the human. + +### Decision 4: CLI command names are underscored, platform-less + +**Choice:** CLI subcommand names are the MCP tool names with the `android_` / `ios_` prefix stripped, preserving underscores. So `android_tap_by_coordinates` / `ios_tap_by_coordinates` both become `tap_by_coordinates` with `--platform` as the discriminator. `start_automation_server` / `ios_start_automation_server` become `start_automation_server`. `get_interactive_elements` / `ios_get_interactive_elements` become `get_interactive_elements`. Screenshot becomes `screenshot`. + +**Alternatives considered:** +- **Keep MCP names verbatim (including `android_` / `ios_` prefix):** Redundant when `--platform` is already a flag. Forces the LLM to remember both a command *and* a platform that have to agree. 
+- **Hyphenated names (`tap-by-coordinates`, `get-interactive-elements`):** More shell-idiomatic, but diverges from the MCP names the user explicitly asked us to keep underscored. User preference wins. + +**Rationale:** Explicit user decision. Keeping underscores means the same operation has the same *verb* across MCP and CLI; only the platform indication differs in form (prefix vs flag). + +### Decision 5: Success is prose on stdout; errors are prose on stderr + granular exit code + +**Choice:** On success, print the MCP tool's return string to stdout and exit `0`. On failure, print the error message to stderr and exit with one of: + +``` +0 success +1 generic failure (unhandled exception, automation-server crash, etc.) +2 usage error (bad flag, missing required arg, invalid direction value) +3 automation server not running / not reachable +4 device or simulator not found +5 platform not supported for this command (e.g. --platform ios on install_automation_server) +``` + +The prose message matches the MCP tool output as closely as possible so LLM-facing text is identical across the two facades. + +**Alternatives considered:** +- **Boolean 0/1 exit codes:** Forces the LLM (or a wrapper script) to grep stderr to decide whether to retry — fragile and easy to get wrong. +- **Machine-readable structured output by default:** Would duplicate the MCP prose format which is already LLM-friendly. Structured output is deferred to a later `--json` mode. + +**Rationale:** Granular codes are cheap to implement (a `CliExit` exception carrying a code + message, thrown from the shared suspend functions or mapped at the CLI boundary) and immediately useful: a skill can script "retry after starting the server" specifically on exit `3`, with no string parsing. + +### Decision 6: Include a reference skill in the repo + +**Choice:** Ship `.claude/skills/visiontest-mobile/SKILL.md` alongside the CLI. 
It teaches an LLM the standard loop: `start_automation_server` → `screenshot` → `get_interactive_elements` → `tap_by_coordinates` → repeat. It documents the exit-code contract so the agent knows when to recover versus report. + +**Alternatives considered:** +- **Leave skill authoring to users:** Possible, but leaves the CLI's intended consumer (LLM via skill) undocumented and slows adoption. +- **Put the skill only in docs/:** Skills live in `.claude/skills/` by convention; putting it there makes it usable on this repo itself (dogfooding) and works as a copy-paste template. + +**Rationale:** The CLI's primary design audience is "an LLM using this via a skill". Shipping the skill is how we validate that the CLI actually serves that use case. + +### Decision 7: Defer `--json`, daemon mode, and full tool parity to later changes + +**Choice:** MVP = 13 commands, prose output, fresh JVM per invocation. `--json`, a `visiontestd` daemon, `find_element`, `swipe`, `swipe_on_element`, `list_apps`, `info_app`, and `available_device` are out of scope. + +**Rationale:** Ship the minimum useful surface first, then iterate based on real skill-authoring experience. Each of the deferred items has a clear later change in the pipeline if friction shows up. + +## Risks / Trade-offs + +- **Handler-extraction refactor touches 30+ tools.** Low risk technically (mechanical), but a large-ish diff. Mitigated by keeping existing MCP tests green throughout (no behavior change on the MCP side). +- **New CLI dependency (clikt).** Well-maintained, pure-JVM, ~400 KB in the fat JAR. Acceptable cost. Alternative was picocli (also good); clikt picked for Kotlin idiomatic DSL. +- **~1–2 s JVM cold start per CLI command.** Noticeable if a skill issues many commands back-to-back, but acceptable for MVP. The on-device Android server persists across invocations, so only the first call in a session pays the server-start cost. 
If latency becomes painful, Decision 7 keeps daemon mode as an obvious follow-up. +- **Platform flag on every command is verbose.** Mitigated by the skill doing the typing; not a human-facing concern. Keeps the LLM's commands self-documenting when written into a transcript. +- **`visiontest` with no args stays MCP stdio forever.** Means we can never repurpose the no-arg form for a CLI help screen. Acceptable — `visiontest --help` still works, and the no-arg behavior is load-bearing for existing MCP launchers. diff --git a/openspec/changes/add-cli-mode/proposal.md b/openspec/changes/add-cli-mode/proposal.md new file mode 100644 index 0000000..8c878da --- /dev/null +++ b/openspec/changes/add-cli-mode/proposal.md @@ -0,0 +1,42 @@ +## Why + +VisionTest is currently only reachable through the MCP protocol, which works well when an agent runs inside an MCP-aware host (Claude Code, the Claude desktop app, etc.). But a growing use case is LLM agents driven by **skills** — small, focused instruction bundles that teach an agent to use a CLI. Skills are significantly simpler to distribute, configure, and debug than MCP server integrations, and they let any LLM (not just MCP-native hosts) drive VisionTest. + +Today the only way to reach the underlying automation stack from a skill is to spawn the MCP JAR and speak stdio JSON-RPC to it per call — a bad UX and a gross integration boundary. A first-class CLI surface closes that gap: the same suspend functions that back the MCP tools become directly callable as `visiontest --platform [args]`. + +This change adds subcommand dispatch in the existing `Main.kt` entry point and exposes a curated MVP subset of commands — enough for an LLM skill to run the full automation loop (start server → inspect → interact → screenshot) — while leaving the MCP stdio server intact as the default (no-arg) behavior for backward compatibility. 
+ +## What Changes + +- Add subcommand dispatch to `Main.kt`: when `args` is empty (or equals `serve`), run the existing MCP stdio server unchanged; otherwise route to the CLI dispatcher. +- Add a CLI dependency (clikt) and a new `cli/` package with one command class per exposed operation plus a root dispatcher. +- Refactor each tool handler body in the four `ToolRegistrar` implementations into a pure `suspend` function on the registrar (or a shared helper). The MCP side continues to wrap these with `ToolScope.tool { ... }`; the CLI side calls the same functions directly, so there is exactly one implementation per operation. +- Expose 13 MVP commands (see the `cli-mode` spec). Every command requires `--platform android|ios`, with two exceptions that are Android-only (`install_automation_server`, `press_back`) and therefore reject `--platform ios` with a clear error. +- Command names use the underscored, platform-less form (e.g. `tap_by_coordinates`, `press_home`) — a CLI-idiomatic spelling shared across platforms with the flag as the discriminator. The underlying MCP tool names (e.g. `android_tap_by_coordinates`, `ios_tap_by_coordinates`) are unchanged. +- Success output is the same prose string the MCP tool returns today (optimised for LLM consumption). Errors go to stderr with a granular non-zero exit code. A `--json` mode is explicitly deferred to a later change. +- Granular exit codes: `0` success, `1` generic failure, `2` usage error, `3` automation server not reachable, `4` device/simulator not found, `5` platform-not-supported-for-command. +- Add a reference skill (`.claude/skills/visiontest-mobile/SKILL.md`) that teaches an LLM the standard automation loop through the CLI. Intended both as in-repo dogfood and as a template users can copy into their own projects. 
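The exit-code contract in the list above could be modelled roughly as follows. This is a sketch under stated assumptions: `ExitCode`, `CliResult`, and `runCommand` are hypothetical names, the block is non-suspending to avoid a coroutines dependency, and it returns a result value instead of calling `exitProcess` so the mapping stays testable.

```kotlin
// Hypothetical model of the six documented exit codes.
enum class ExitCode(val code: Int) {
    SUCCESS(0),
    GENERIC_FAILURE(1),
    USAGE_ERROR(2),
    SERVER_NOT_REACHABLE(3),
    DEVICE_NOT_FOUND(4),
    PLATFORM_NOT_SUPPORTED(5),
}

// Assumed shape of an exception that carries an exit code and a prose message.
class CliExit(val exitCode: ExitCode, message: String) : Exception(message)

// What the process would emit: an exit code plus prose on stdout or stderr.
data class CliResult(val code: Int, val stdout: String, val stderr: String)

// Runs a command body and maps its outcome onto the exit-code contract.
fun runCommand(block: () -> String): CliResult =
    try {
        // Success: the tool's prose goes to stdout, exit code 0.
        CliResult(ExitCode.SUCCESS.code, block(), "")
    } catch (e: CliExit) {
        // Known failure: message goes to stderr with its specific code.
        CliResult(e.exitCode.code, "", e.message ?: "")
    } catch (e: IllegalArgumentException) {
        // Bad input is treated as a usage error.
        CliResult(ExitCode.USAGE_ERROR.code, "", e.message ?: "")
    } catch (e: Exception) {
        // Anything else is a generic failure.
        CliResult(ExitCode.GENERIC_FAILURE.code, "", e.message ?: "")
    }
```

A skill or wrapper script can then branch on the numeric code alone, for example retrying after `start_automation_server` only when the code is `3`.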
+ +## Capabilities + +### New Capabilities +- `cli-mode`: A first-class command-line interface to VisionTest's mobile automation operations, usable directly from shells, scripts, and LLM-driven skills. Covers the MVP subset of 13 commands spanning server lifecycle, inspection, interaction, navigation, and app launch on both Android and iOS. + +### Modified Capabilities + + +## Impact + +- **MCP server (`app/`)** — + - `Main.kt` gains a top-level `args`-based dispatch; empty args preserve current MCP stdio behavior. + - New `app/src/main/kotlin/com/example/visiontest/cli/` package housing the root `VisionTestCli` command and 13 subcommand classes. + - Each `ToolRegistrar` grows `internal suspend` functions carrying the body of each tool (arg extraction stays in the MCP handler; the *work* moves into the shared function). Existing MCP behavior and test coverage is preserved. + - New dependency: `com.github.ajalt.clikt:clikt:4.x` (pure-JVM, no native or reflection surprises; matches the existing "no magic numbers, plain Kotlin" posture of the codebase). +- **Automation servers (`automation-server/`, `ios-automation-server/`)** — Unchanged. The CLI talks to them through the exact same `AutomationClient` / `IOSAutomationClient` paths the MCP server already uses. +- **Launcher / release** — `install.sh` and the GitHub Actions release workflow need no changes: the launcher script still runs `java -jar visiontest.jar "$@"`, and `"$@"` now reaches the CLI dispatcher when the user passes args. Current no-arg MCP invocations (Claude Code, Claude desktop) continue to work unchanged. +- **Tests** — New pure-JVM unit tests for (a) each CLI subcommand's argument parsing and delegation, (b) the root dispatcher's `serve` vs subcommand routing, (c) exit-code mapping for each error class. MCP-side tests are unchanged except where they need to follow the handler-body extraction refactor (mechanical updates). 
+- **Docs** — + - `CLAUDE.md` gains a new "CLI Usage" section with the full command list and flag reference. + - A reference `.claude/skills/visiontest-mobile/SKILL.md` ships in the repo. + - `LEARNING.md` gets a short entry explaining the dual-facade (MCP + CLI) pattern and the handler-extraction refactor. +- **External surface** — New CLI commands. No breaking changes to MCP tools or automation-server JSON-RPC methods. diff --git a/openspec/changes/add-cli-mode/specs/cli-mode/spec.md b/openspec/changes/add-cli-mode/specs/cli-mode/spec.md new file mode 100644 index 0000000..dcbb8c0 --- /dev/null +++ b/openspec/changes/add-cli-mode/specs/cli-mode/spec.md @@ -0,0 +1,144 @@ +## ADDED Requirements + +### Requirement: Single entry point dispatches between MCP stdio and CLI modes + +The `visiontest` JAR SHALL route invocations to the MCP stdio server when no arguments are passed (or the first argument is `serve`), and to the CLI subcommand dispatcher otherwise. The MCP stdio behavior MUST be byte-for-byte identical to the pre-change behavior when no arguments are passed, to preserve compatibility with existing agent hosts and the `install.sh` launcher. 
+ +#### Scenario: No-argument invocation runs the MCP stdio server + +- **WHEN** `java -jar visiontest.jar` is executed with no arguments +- **THEN** the process starts the MCP server, connects `StdioServerTransport` to `System.in` / `System.out`, registers all existing MCP tools, and waits on `server.onClose` exactly as before this change + +#### Scenario: Explicit `serve` subcommand runs the MCP stdio server + +- **WHEN** `java -jar visiontest.jar serve` is executed +- **THEN** the behavior is identical to the no-argument case + +#### Scenario: Unknown first argument enters the CLI dispatcher + +- **WHEN** `java -jar visiontest.jar ` is executed +- **THEN** the process constructs the CLI root command and delegates argument parsing to it; the MCP stdio server is NOT started for this invocation + +### Requirement: `--platform` flag is required on every CLI subcommand + +Every CLI subcommand SHALL require a `--platform` (alias `-p`) flag whose value is exactly `android` or `ios`. There MUST be no default, no environment-variable fallback, and no auto-detection. Android-only commands SHALL accept only `--platform android`. 
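The `--platform` rules in this requirement can be sketched as a pure validation function. This is an illustration only: `Platform`, `PlatformResult`, and `parsePlatform` are hypothetical names, and the real implementation is expected to use clikt's option parsing instead of hand-rolled checks. The exit codes `2` and `5` are the ones the spec assigns.

```kotlin
// Hypothetical types; not taken from the codebase.
enum class Platform { ANDROID, IOS }

sealed interface PlatformResult {
    data class Ok(val platform: Platform) : PlatformResult
    data class Rejected(val exitCode: Int, val message: String) : PlatformResult
}

// Validates the raw flag value. There is no default, no environment fallback,
// and no auto-detection: a missing or unknown value is a usage error (exit 2).
fun parsePlatform(value: String?, androidOnly: Boolean = false): PlatformResult {
    val platform = when (value) {
        "android" -> Platform.ANDROID
        "ios" -> Platform.IOS
        null -> return PlatformResult.Rejected(
            2, "Missing required option --platform (android|ios)"
        )
        else -> return PlatformResult.Rejected(
            2, "Invalid --platform '$value'; allowed values: android, ios"
        )
    }
    // Android-only commands reject a well-formed `ios` value with exit 5.
    if (androidOnly && platform == Platform.IOS) {
        return PlatformResult.Rejected(5, "This command is Android-only")
    }
    return PlatformResult.Ok(platform)
}
```

Keeping the Android-only check separate from value validation is what lets `--platform windows` map to exit `2` while `--platform ios` on `press_back` maps to exit `5`.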
+ +#### Scenario: Missing `--platform` produces a usage error + +- **WHEN** a CLI subcommand is invoked without `--platform` +- **THEN** the process prints a usage error to stderr and exits with code `2` + +#### Scenario: Invalid `--platform` value is rejected + +- **WHEN** a CLI subcommand is invoked with `--platform windows` (or any value other than `android` / `ios`) +- **THEN** the process prints a usage error naming the allowed values and exits with code `2` + +#### Scenario: Android-only command rejects iOS platform + +- **WHEN** `visiontest install_automation_server --platform ios` (or `visiontest press_back --platform ios`) is invoked +- **THEN** the process prints an error stating the command is Android-only and exits with code `5` + +### Requirement: Exit codes are granular and LLM-scriptable + +The CLI SHALL use a fixed, documented set of exit codes so that a skill or script can act on failures without parsing stderr: + +| Code | Meaning | +|------|---------| +| 0 | Success | +| 1 | Generic failure (unhandled exception, underlying server crash) | +| 2 | Usage error (missing/invalid flag or argument) | +| 3 | Automation server not running / not reachable | +| 4 | Device or simulator not found | +| 5 | Platform not supported for this command | + +#### Scenario: Success exits 0 + +- **WHEN** a CLI subcommand completes without error +- **THEN** the process prints the result to stdout and exits with code `0` + +#### Scenario: Automation server not running maps to exit 3 + +- **WHEN** a CLI subcommand that requires a running automation server is invoked while the server is not reachable +- **THEN** the process prints to stderr a message instructing the caller to run `start_automation_server` and exits with code `3` + +#### Scenario: Device not found maps to exit 4 + +- **WHEN** a CLI subcommand is invoked and `getFirstAvailableDevice()` fails because no device/simulator is connected +- **THEN** the process prints the underlying error message to stderr and 
exits with code `4` + +#### Scenario: Unhandled exception maps to exit 1 + +- **WHEN** a CLI subcommand throws an unexpected exception during execution +- **THEN** the process prints the exception message to stderr and exits with code `1` + +### Requirement: CLI commands share one implementation with MCP tools + +Each CLI subcommand SHALL call the same `internal suspend` function that backs its MCP tool counterpart. The function MUST take typed parameters (not `CallToolRequest`). MCP behavior and output strings MUST be preserved exactly; the refactor is internal. + +#### Scenario: MCP and CLI produce the same success message for the same inputs + +- **WHEN** `android_tap_by_coordinates` is invoked via MCP with `x=100, y=200` AND `visiontest tap_by_coordinates --platform android 100 200` is invoked +- **THEN** both paths call the same underlying function and produce identical success text (modulo MCP's `TextContent` wrapping vs CLI's raw stdout) + +#### Scenario: Existing MCP tests continue to pass after the extraction refactor + +- **WHEN** `./gradlew :app:test` is executed after the handler-body extraction +- **THEN** all pre-existing MCP tool tests pass without modification to their assertions about tool output strings + +### Requirement: MVP subcommand set + +The CLI SHALL expose exactly the following 13 subcommands in the MVP. No more, no fewer. 
+ +| Subcommand | Platforms | Required args | Optional flags | +|------------|-----------|---------------|----------------| +| `install_automation_server` | android | — | — | +| `start_automation_server` | android, ios | — | — | +| `automation_server_status` | android, ios | — | — | +| `get_interactive_elements` | android, ios | — | `--include-disabled` | +| `get_ui_hierarchy` | android, ios | — | — | +| `get_device_info` | android, ios | — | — | +| `screenshot` | android, ios | — | `--output PATH` | +| `tap_by_coordinates` | android, ios | `x` `y` (ints) | — | +| `input_text` | android, ios | `text` (string) | — | +| `swipe_direction` | android, ios | `direction` (up\|down\|left\|right) | `--distance`, `--speed` | +| `press_back` | android | — | — | +| `press_home` | android, ios | — | — | +| `launch_app` | android, ios | `id` (string) | — | + +#### Scenario: Deferred commands are not exposed + +- **WHEN** the CLI help is listed +- **THEN** `find_element`, `swipe`, `swipe_on_element`, `list_apps`, `info_app`, `available_device`, and `ios_stop_automation_server` are NOT listed as subcommands + +#### Scenario: `screenshot` default output path matches MCP behavior + +- **WHEN** `visiontest screenshot --platform android` (or `ios`) is invoked without `--output` +- **THEN** the PNG is written to `./screenshots/_screenshot_.png` resolved against the CLI process's current working directory — identical to the MCP tool's default + +#### Scenario: `swipe_direction` rejects invalid directions before dispatching + +- **WHEN** `visiontest swipe_direction --platform android diagonal` is invoked +- **THEN** the process exits with code `2` and does NOT invoke the underlying automation function + +### Requirement: Success output is prose on stdout; errors are prose on stderr + +The CLI SHALL print the MCP tool's return string to stdout on success and to stderr on error. There MUST be no structured / JSON wrapping in the MVP. 
Logging output (from SLF4J loggers) MUST go to stderr so it doesn't contaminate captured stdout in scripts and skills. + +#### Scenario: Success text goes to stdout + +- **WHEN** `visiontest automation_server_status --platform android` succeeds with the server running +- **THEN** the success message is written to stdout (capturable via `$(...)` in a shell script) and stderr is empty for that invocation + +#### Scenario: Error text goes to stderr + +- **WHEN** `visiontest screenshot --platform android` is invoked while the Android automation server is not running +- **THEN** the error message is written to stderr and stdout is empty for that invocation + +### Requirement: Reference skill ships with the CLI + +The repository SHALL include a reference skill file at `.claude/skills/visiontest-mobile/SKILL.md` that teaches an LLM the standard automation loop through the CLI. + +#### Scenario: Skill documents the standard loop + +- **WHEN** an LLM loads the reference skill +- **THEN** the skill body includes: the `start_automation_server` → `screenshot` → `get_interactive_elements` → `tap_by_coordinates` loop, the rule that `--platform` is always required, the exit-code table with recommended actions per code, and the Flutter `contentDescription` gotcha diff --git a/openspec/changes/add-cli-mode/tasks.md b/openspec/changes/add-cli-mode/tasks.md new file mode 100644 index 0000000..b250cdf --- /dev/null +++ b/openspec/changes/add-cli-mode/tasks.md @@ -0,0 +1,93 @@ +## 1. 
Scaffolding & dependency + +- [x] 1.1 Add `com.github.ajalt.clikt:clikt:4.4.0` (or latest 4.x) to `app/build.gradle.kts` dependencies +- [x] 1.2 Create package `app/src/main/kotlin/com/example/visiontest/cli/` with a placeholder `VisionTestCli.kt` (empty clikt `NoOpCliktCommand` root) to anchor future subcommands +- [x] 1.3 Update `Main.kt` to branch on `args`: if `args.isEmpty()` or `args[0] == "serve"`, run the existing MCP stdio flow (unchanged); otherwise construct `VisionTestCli()` and call its `main(args)`. Preserve the existing logger setup and shutdown hook in both branches +- [x] 1.4 Add a unit test `app/src/test/kotlin/com/example/visiontest/MainDispatchTest.kt` covering (a) empty args keeps MCP path reachable, (b) `serve` keeps MCP path reachable, (c) arbitrary first-arg routes to CLI (mock/fake the two paths so the test doesn't actually start an MCP server) + +## 2. Handler body extraction refactor + +Goal: each tool's body becomes an `internal suspend` function taking typed parameters. MCP registration shrinks to arg extraction + delegation. The CLI calls the same functions. No MCP behavior change. + +- [x] 2.1 `AndroidDeviceToolRegistrar` — extract body of each tool (`available_device_android`, `list_apps_android`, `info_app_android`, `launch_app_android`) into `internal suspend fun` methods on the registrar. Update MCP registrations to delegate. Keep arg extraction in the MCP `scope.tool { ... }` block +- [x] 2.2 `IOSDeviceToolRegistrar` — same treatment for `ios_available_device`, `ios_list_apps`, `ios_info_app`, `ios_launch_app` +- [x] 2.3 `AndroidAutomationToolRegistrar` — extract bodies of every `registerXxx` tool into `internal suspend fun` methods. 
The existing `captureScreenshot`, `resolveScreenshotPath`, `writeScreenshot` helpers already follow this pattern; generalise across all 15 Android automation tools +- [x] 2.4 `IOSAutomationToolRegistrar` — same treatment for every iOS automation tool +- [x] 2.5 Verify `./gradlew :app:test` stays green — no MCP behavior change expected, refactor is mechanical +- [x] 2.6 Add targeted tests (or expand existing ones) that exercise the extracted functions directly without going through `ToolScope`, to lock in their call shape as a stable public-within-the-module surface + +## 3. CLI root dispatcher & exit-code infrastructure + +- [x] 3.1 In `cli/VisionTestCli.kt`, define the root `VisionTestCli` as a clikt `NoOpCliktCommand` with `subcommands(...)` for all 13 MVP commands (stubs OK at this stage) +- [x] 3.2 Create `cli/CliExit.kt` with a `CliExit(code: Int, message: String) : Exception` type and a sealed enum of the six exit codes (`Success=0`, `GenericFailure=1`, `UsageError=2`, `ServerNotReachable=3`, `DeviceNotFound=4`, `PlatformNotSupported=5`). Document each code's meaning in a KDoc block +- [x] 3.3 Create `cli/CliErrorHandler.kt` with a `runCliCommand(block: suspend () -> String)` helper that: calls `block()`, prints the result to stdout, exits `0` on success; catches `CliExit` → prints to stderr + `exitProcess(code)`; catches `IllegalArgumentException` / clikt usage errors → stderr + `exitProcess(2)`; catches other exceptions → stderr + `exitProcess(1)` +- [x] 3.4 Create `cli/PlatformOption.kt` with a reusable clikt option definition: `--platform` / `-p`, required, `choice("android", "ios")`. 
Android-only commands override the `choice` to just `"android"` and print `"This command is Android-only"` → exit 5 if the user tries `ios` +- [x] 3.5 Create `cli/ComponentHolder.kt` (or equivalent) — a minimal object graph the CLI can instantiate per invocation to get `Android`, `IOSManager`, `AutomationClient`, `IOSAutomationClient`, and the four registrars without duplicating `Main.kt`'s wiring. Ensure it respects `AppConfig.createDefault()` the same way the MCP path does, and registers the same shutdown hook behavior (close `android` and `ios` on JVM exit) + +## 4. CLI subcommands (one task per command) + +Each subcommand lives in its own file under `cli/commands/`. Every command parses typed args via clikt, obtains its backing function via `ComponentHolder`, and calls it through `runCliCommand`. The function body is the extracted `internal suspend fun` from section 2. + +### Setup + +- [x] 4.1 `InstallAutomationServerCommand` — `visiontest install_automation_server --platform android`. Reject `--platform ios` (exit 5). Delegates to `AndroidAutomationToolRegistrar.installAutomationServer()` +- [x] 4.2 `StartAutomationServerCommand` — `visiontest start_automation_server --platform android|ios`. Delegates to the platform's `startAutomationServer()` extracted function. Timeout mirrors the MCP tool's (30 s Android, 200 s iOS) +- [x] 4.3 `AutomationServerStatusCommand` — `visiontest automation_server_status --platform android|ios`. Delegates to the platform's `automationServerStatus()` function + +### Inspection + +- [x] 4.4 `GetInteractiveElementsCommand` — `visiontest get_interactive_elements --platform android|ios [--include-disabled]`. Delegates to `getInteractiveElements(includeDisabled: Boolean)` on the relevant registrar +- [x] 4.5 `GetUiHierarchyCommand` — `visiontest get_ui_hierarchy --platform android|ios`. Delegates to `getUiHierarchy()`. Use 30 s timeout to match MCP +- [x] 4.6 `GetDeviceInfoCommand` — `visiontest get_device_info --platform android|ios`. 
Delegates to `getDeviceInfo()`
+- [x] 4.7 `ScreenshotCommand`: `visiontest screenshot --platform android|ios [--output PATH]`. Delegates to the platform's `captureScreenshot(outputPath: String?)` (already exists on both registrars). Default path behavior is preserved (resolves `./screenshots/_screenshot_.png` against CWD). 30 s timeout
+
+### Interaction
+
+- [x] 4.8 `TapByCoordinatesCommand`: `visiontest tap_by_coordinates --platform android|ios <x> <y>`. `x` and `y` are required integer positional args. Delegates to `tapByCoordinates(x, y)`
+- [x] 4.9 `InputTextCommand`: `visiontest input_text --platform android|ios <text>`. `text` is a required positional arg (single value; the skill can quote strings containing spaces). Delegates to `inputText(text)`
+- [x] 4.10 `SwipeDirectionCommand`: `visiontest swipe_direction --platform android|ios <direction> [--distance short|medium|long] [--speed slow|normal|fast]`. `direction` is a required positional from `{up,down,left,right}`. Clikt `choice(...)` validates it (invalid → exit 2). Defaults mirror MCP (distance=medium, speed=normal). Delegates to `swipeByDirection(direction, distance, speed)`
+
+### Navigation
+
+- [x] 4.11 `PressBackCommand`: `visiontest press_back --platform android`. Android-only; rejects `--platform ios` → exit 5. Delegates to `pressBack()`
+- [x] 4.12 `PressHomeCommand`: `visiontest press_home --platform android|ios`. Delegates to the platform's `pressHome()`
+
+### Apps
+
+- [x] 4.13 `LaunchAppCommand`: `visiontest launch_app --platform android|ios <id>`. `id` is a required positional (package name for Android, bundle ID for iOS). Delegates to the platform's launch-app function
+
+## 5. Exit-code mapping
+
+- [x] 5.1 Teach extracted functions (or a thin shim in each CLI command) to throw `CliExit(ServerNotReachable, "...")` when `automationClient.isServerRunning()` returns false, replacing the MCP-side "Use 'start_automation_server' first" short-circuit. On the MCP side the string return is preserved (the extracted function could return the same string for MCP while the CLI command maps `ServerNotReachable` to exit 3, but prefer a single path: the extracted function throws `CliExit`, and the MCP registration catches it and converts it back to the string form for compatibility)
+- [x] 5.2 Map `Android.getFirstAvailableDevice()` / `IOSManager.getFirstAvailableDevice()` failures to `CliExit(DeviceNotFound, "...")`
+- [x] 5.3 Map clikt `UsageError` / `MissingArgument` / `BadParameterValue` to exit 2 via clikt's built-in mechanism (`CliktError.statusCode`)
+- [x] 5.4 Any uncaught exception in `runCliCommand` → exit 1 with the exception message on stderr
+
+## 6. Tests
+
+- [x] 6.1 `app/src/test/kotlin/com/example/visiontest/cli/VisionTestCliTest.kt`: table test covering (a) each command parses its required args, (b) missing args produce a usage error (exit 2), (c) a bad `--platform` value is rejected, (d) Android-only commands reject `--platform ios`
+- [x] 6.2 `app/src/test/kotlin/com/example/visiontest/cli/CliErrorHandlerTest.kt`: covers exit-code mapping for each `CliExit` variant, for uncaught exceptions, and for success
+- [x] 6.3 One integration-style test per CLI command using fakes for `Android` / `IOSManager` / `AutomationClient` / `IOSAutomationClient`, verifying the command delegates with the right parameters and prints the expected stdout on success
+- [x] 6.4 Ensure `./gradlew test` stays green end-to-end (app + automation-server)
+
+## 7. Reference skill
+
+- [x] 7.1 Create `.claude/skills/visiontest-mobile/SKILL.md` with: one-paragraph overview; the standard loop (`start_automation_server` → `screenshot` → `get_interactive_elements` → `tap_by_coordinates` → repeat); the `--platform` flag convention; the exit-code contract (with "what to do on each code"); the Flutter `contentDescription` gotcha copied from `CLAUDE.md`; a short example session
+- [x] 7.2 If Claude Code skill metadata requires a frontmatter block, include it (name, description, when-to-use). Keep the body under 200 lines
+
+## 8. Documentation
+
+- [x] 8.1 Add a "CLI Usage" section to `CLAUDE.md` listing all 13 commands with one-line descriptions, the `--platform` rule, and the exit-code table. Place it after the existing "MCP Tools" section
+- [x] 8.2 Add a short entry to `LEARNING.md` titled "Dual facade: MCP tools + CLI" explaining the handler-extraction refactor, why both facades share one implementation, and the decision to defer `--json` / daemon mode
+- [x] 8.3 Update `docs/installation.md` (if present) to mention the CLI usage alongside the MCP configuration, including that `visiontest` with no args keeps doing what it does today
+
+## 9. Verification
+
+- [x] 9.1 `./gradlew build` passes
+- [x] 9.2 `./gradlew test` passes (app + automation-server)
+- [x] 9.3 `./gradlew shadowJar` produces a JAR that supports both `java -jar visiontest.jar` (MCP stdio, unchanged behavior) and `java -jar visiontest.jar --platform [args]` (CLI)
+- [x] 9.4 Manual smoke: with an Android emulator running, `visiontest install_automation_server --platform android && visiontest start_automation_server --platform android && visiontest screenshot --platform android` produces a PNG under `./screenshots/`
+- [x] 9.5 Manual smoke: with an iOS simulator booted, `visiontest start_automation_server --platform ios && visiontest get_interactive_elements --platform ios` returns a non-empty elements list
+- [x] 9.6 Manual smoke: `visiontest press_back --platform ios` exits with code 5 and a clear message
+- [x] 9.7 Manual smoke: `visiontest tap_by_coordinates --platform android 100` exits with code 2 and clikt's missing-argument message
+- [x] 9.8 Manual smoke: `visiontest screenshot --platform android` with the automation server stopped exits with code 3 and a message instructing the caller to run `start_automation_server`
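
The manual smoke checks in 9.6 to 9.8 each assert a specific exit code. One way to make that assertion repeatable is a small shell helper, sketched below. `expect_exit` is a hypothetical name and is not part of the repository; the commented invocations assume `visiontest` is on `PATH` and that devices and the automation server are in the states the checklist describes.

```shell
#!/bin/sh
# expect_exit EXPECTED CMD [ARGS...]
# Hypothetical helper (not part of VisionTest): runs CMD with its output
# discarded and reports whether its exit status equals EXPECTED.
expect_exit() {
    expected="$1"
    shift
    actual=0
    # "|| actual=$?" captures a non-zero status without aborting under `set -e`.
    "$@" >/dev/null 2>&1 || actual=$?
    if [ "$actual" -eq "$expected" ]; then
        echo "PASS (exit $actual): $*"
        return 0
    fi
    echo "FAIL (expected exit $expected, got $actual): $*"
    return 1
}

# Smoke checks 9.6 to 9.8, left commented out because they need real devices:
#   expect_exit 5 visiontest press_back --platform ios
#   expect_exit 2 visiontest tap_by_coordinates --platform android 100
#   expect_exit 3 visiontest screenshot --platform android   # server stopped
```

The helper only checks the exit status; the accompanying stderr message from each smoke check still needs a manual look, since its output is discarded here.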