feat: adb_vision - clean ADB + VLM game automation stack#43
feat: adb_vision - clean ADB + VLM game automation stack#43
Conversation
- Clean MCP server (server.py) with 6 tools: screenshot, tap, swipe, keyevent, launch_game, get_focus - Pluggable screenshot backends (screenshot.py) with dispatch + stubs for DroidCast, scrcpy, u2, screencap - 10/10 unit tests passing (all mocked, no device needed) - Gemini CLI wired as vision driver via MCP (drive.bat + system prompt) - .mcp.json updated to point to adb_vision server - .gemini/settings.json registers adb-vision MCP server GitHub issues created for Jules: - #40: DroidCast APK HTTP stream backend - #41: scrcpy virtual display capture backend - #42: uiautomator2 ATX HTTP backend Zero ALAS dependency. Pure async ADB + FastMCP.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
| "server.py", | ||
| "--serial", | ||
| "127.0.0.1:21513" | ||
| ] |
There was a problem hiding this comment.
🔥 The Roast: Changing the port from 21503 to 21513? I see we're playing 'guess which MEmu instance' today. According to AGENTS.md, the documented environment uses 127.0.0.1:21503 for MEmuPlayer. This change breaks the 'Known local environment' documented in your own setup guide.
🩹 The Fix: Either document why port 21513 is needed, make this configurable via environment variable, or keep the documented default. Don't leave developers playing port roulette.
📏 Severity: critical
| ] | ||
| } | ||
| } | ||
| } No newline at end of file |
There was a problem hiding this comment.
🔥 The Roast: Missing newline at end of file. It's 2026 and we're still having this conversation? Your editor is gaslighting you into believing this file is complete when git diff is side-eyeing you hard.
🩹 The Fix:
| } | |
| "PYTHONIOENCODING": "utf-8" | |
| } | |
| } | |
| } |
📏 Severity: nitpick
| "server.py", | ||
| "--serial", | ||
| "127.0.0.1:21513" | ||
| ], |
There was a problem hiding this comment.
🔥 The Roast: Same port switcharoo here — 21503 became 21513 like it's no big deal. The AGENTS.md clearly states 127.0.0.1:21503 as the known environment. If you're changing the default, at least have the courtesy to update the docs or explain why instance 0 suddenly became instance 1.
🩹 The Fix: Align with documented environment or update AGENTS.md to explain the port change rationale.
📏 Severity: critical
|
|
||
|
|
||
| ADB_EXECUTABLE: str = _find_adb() | ||
| ADB_SERIAL: str = "127.0.0.1:21513" |
There was a problem hiding this comment.
🔥 The Roast: Hardcoded serial at module level — the module loads once and this value is cast in stone. It's like defining your WiFi password in a global constant and calling it 'configuration'. The argparse default below becomes purely decorative since this is already set.
🩹 The Fix: Set this to None here and let main() initialize it properly from args or environment. Or use a class-based approach that doesn't rely on module-level globals.
📏 Severity: warning
| try: | ||
| log.debug("trying screenshot backend: %s", name) | ||
| b64 = await capture_fn(adb_run=adb_run, serial=serial, adb_exe=adb_exe) | ||
| if b64 and len(base64.b64decode(b64)) > 5000: |
There was a problem hiding this comment.
🔥 The Roast: 5000 — the magic number that somehow distinguishes 'real screenshot' from 'blank MEmu garbage'. This threshold is doing more heavy lifting than a CrossFit enthusiast on leg day. Where did 5000 come from? The comment says '>> 5KB' but the check is '> 5000 bytes' which is a hair under 5KB. Close enough for government work, I suppose.
🩹 The Fix: Make this a named constant with a descriptive name like MIN_VALID_PNG_SIZE_BYTES = 5000 and document how you arrived at this value.
📏 Severity: suggestion
| # Stub — implementation will be filled by a Jules issue. | ||
| # --------------------------------------------------------------------------- | ||
| async def _capture_droidcast(*, adb_run: AdbRunFn, serial: str, adb_exe: str) -> str: | ||
| raise NotImplementedError( |
There was a problem hiding this comment.
🔥 The Roast: Three unimplemented backends that just raise NotImplementedError. I appreciate the optimism of providing a pluggable architecture when only one plugin actually works. It's like building a 4-car garage for a unicycle.
🩹 The Fix: Either implement at least one more backend so 'auto' failover makes sense, or remove the stubs and simplify until they're ready. Dead code is technical debt wearing a Halloween costume.
📏 Severity: suggestion
| with open(_ACTION_LOG_PATH, "a", encoding="utf-8") as fh: | ||
| fh.write(json.dumps(record, ensure_ascii=True) + "\n") | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
🔥 The Roast: except Exception: pass — the 'see no evil, hear no evil' of error handling. When logging fails, we just... don't? Silent failures are like ninja bugs: you don't know they're there until they've stolen your sanity at 3 AM.
🩹 The Fix: At minimum, log to stderr as a fallback: except Exception as e: print(f'Failed to log action: {e}', file=sys.stderr)
📏 Severity: warning
| found = shutil.which("adb") | ||
| if found: | ||
| return found | ||
| candidates = [ |
There was a problem hiding this comment.
🔥 The Roast: Windows-only hardcoded paths. I see C:\ and D:\ and I know this code won't work anywhere else without modification. It's like building a house with doors that only open in one direction — sure it works, but you're gonna confuse a lot of people.
🩹 The Fix: Add POSIX paths like /usr/bin/adb, /opt/android-sdk/platform-tools/adb, or just rely on shutil.which() entirely for non-Windows systems.
📏 Severity: suggestion
Code Review Roast 🔥Verdict: 8 Issues Found | Recommendation: Address before merge Overview
Issue Details (click to expand)
🏆 Best part: The pluggable screenshot backend architecture is actually well-designed. If the stubs get implemented, this will be a solid foundation. The separation of ADB logic from ALAS dependency is a good architectural move toward the NORTH_STAR vision. 💀 Worst part: The port change from 21503 → 21513 without documentation or explanation is a breaking change that contradicts your own project's documented environment setup. It's like changing the locks on your house but leaving the old key under the mat with a note saying 'good luck'. 📊 Overall: This PR introduces a clean standalone ADB+VLM MCP server that aligns with the tool-first vision, but the undocumented port change and hardcoded configuration values suggest it wasn't tested against the documented MEmu environment. Like a beautifully architected house built on a foundation of sand — the structure is sound, but the assumptions need validation. Files Reviewed (6 files)
|
There was a problem hiding this comment.
Pull request overview
Introduces a new standalone adb_vision/ module that provides a clean FastMCP stdio server for ADB-based “vision-driven” emulator control, explicitly avoiding any ALAS dependency, and wires editor/Gemini configs to use it.
Changes:
- Added
adb_vision/MCP server (server.py) exposing ADB control + screenshot tools. - Implemented a pluggable screenshot dispatcher (
screenshot.py) with stub backends and a screencap fallback. - Added a mocked unit test suite and Gemini CLI integration/config (
test_server.py,drive.bat,.gemini/settings.json,.mcp.json).
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| adb_vision/test_server.py | Adds mocked async unit tests for screenshot dispatch + MCP tools. |
| adb_vision/server.py | New FastMCP server exposing ADB tools, action logging, and screenshot saving. |
| adb_vision/screenshot.py | Screenshot backend dispatcher with stubs + screencap backend and blank-image guard. |
| adb_vision/pyproject.toml | Declares adb_vision dependencies and dev test deps. |
| adb_vision/drive.bat | Launch helper for Gemini CLI using the adb_vision policy prompt. |
| adb_vision/conftest.py | Ensures local imports work for tests by adding adb_vision/ to sys.path. |
| adb_vision/init.py | Package marker + brief module docstring. |
| adb_vision/README.md | Usage/architecture documentation for adb_vision. |
| adb_vision/GEMINI_SYSTEM_PROMPT.md | Gemini system prompt describing tools and workflow. |
| adb_vision/.gitignore | Ignores runtime artifacts (JSONL logs, screenshots, venv, caches). |
| .mcp.json | Points VS Code MCP config to the new adb_vision server entrypoint. |
| .gemini/settings.json | Registers the adb_vision MCP server for Gemini CLI usage. |
| def _fake_png(width: int = 1280, height: int = 720) -> bytes: | ||
| """Generate a valid PNG with random noise that is > 5KB.""" | ||
| from PIL import Image | ||
| import io | ||
| import random | ||
|
|
||
| random.seed(42) | ||
| # Random pixel data won't compress as small as solid color | ||
| pixels = bytes(random.randint(0, 255) for _ in range(width * height * 3)) | ||
| img = Image.frombytes("RGB", (width, height), pixels) | ||
| buf = io.BytesIO() | ||
| img.save(buf, format="PNG") | ||
| return buf.getvalue() | ||
|
|
||
|
|
||
| FAKE_PNG_B64 = base64.b64encode(_fake_png()).decode("ascii") | ||
| FAKE_PNG_BYTES = base64.b64decode(FAKE_PNG_B64) |
There was a problem hiding this comment.
_fake_png() defaults to generating a full 1280×720 random-noise image and is called at import time to build FAKE_PNG_B64. This makes the test module expensive to import and will slow the unit test suite significantly; consider using a much smaller deterministic PNG that still exceeds the 5KB threshold (or storing a small precomputed PNG bytes constant).
| "mcpServers": { | ||
| "alas": { | ||
| "command": "uv", | ||
| "args": [ | ||
| "run", | ||
| "--directory", | ||
| "agent_orchestrator", | ||
| "alas_mcp_server.py", | ||
| "--config", | ||
| "alas" | ||
| "adb_vision", | ||
| "server.py", | ||
| "--serial", | ||
| "127.0.0.1:21513" | ||
| ], |
There was a problem hiding this comment.
The VS Code MCP server entry is still named alas even though it now launches adb_vision/server.py. Renaming the key to something like adb-vision would reduce confusion (and avoids clobbering any existing ALAS MCP configuration).
| parser.add_argument( | ||
| "--screenshot-method", | ||
| default="auto", | ||
| choices=["auto", "droidcast", "scrcpy", "u2", "screencap"], | ||
| help="Default screenshot method", | ||
| ) |
There was a problem hiding this comment.
The --screenshot-method CLI option is parsed but never used, so it currently has no effect on tool behavior. Either remove this argument or wire it into adb_screenshot() (e.g., set a module-level default method in main() and have adb_screenshot() use it when the caller doesn’t explicitly pass method).
| parser.add_argument( | |
| "--screenshot-method", | |
| default="auto", | |
| choices=["auto", "droidcast", "scrcpy", "u2", "screencap"], | |
| help="Default screenshot method", | |
| ) |
| t0 = time.monotonic() | ||
| await _adb_run("shell", "input", "tap", str(x), str(y), timeout=5.0) | ||
| ms = int((time.monotonic() - t0) * 1000) | ||
| _action_log("adb_tap", {"x": x, "y": y}, f"tapped {x},{y}", "", ms) |
There was a problem hiding this comment.
Action log records for adb_tap omit the active ADB serial, which makes the JSONL log ambiguous if the server is started with a non-default --serial. Consider including serial in the logged args (as is already done for some other tools).
| _action_log("adb_tap", {"x": x, "y": y}, f"tapped {x},{y}", "", ms) | |
| _action_log("adb_tap", {"serial": ADB_SERIAL, "x": x, "y": y}, f"tapped {x},{y}", "", ms) |
| _action_log( | ||
| "adb_swipe", | ||
| {"x1": x1, "y1": y1, "x2": x2, "y2": y2, "duration_ms": duration_ms}, | ||
| f"swiped {x1},{y1}->{x2},{y2}", | ||
| "", | ||
| ms, | ||
| ) |
There was a problem hiding this comment.
Action log records for adb_swipe omit the active ADB serial, which makes the JSONL log ambiguous if the server is started with a non-default --serial. Consider including serial in the logged args (as is already done for some other tools).
| t0 = time.monotonic() | ||
| await _adb_run("shell", "input", "keyevent", str(keycode), timeout=5.0) | ||
| ms = int((time.monotonic() - t0) * 1000) | ||
| _action_log("adb_keyevent", {"keycode": keycode}, f"keyevent {keycode}", "", ms) |
There was a problem hiding this comment.
Action log records for adb_keyevent omit the active ADB serial, which makes the JSONL log ambiguous if the server is started with a non-default --serial. Consider including serial in the logged args (as is already done for some other tools).
| _action_log("adb_keyevent", {"keycode": keycode}, f"keyevent {keycode}", "", ms) | |
| _action_log( | |
| "adb_keyevent", | |
| {"keycode": keycode, "serial": ADB_SERIAL}, | |
| f"keyevent {keycode}", | |
| "", | |
| ms, | |
| ) |
| if b64 and len(base64.b64decode(b64)) > 5000: | ||
| # Sanity check: real screenshots are >> 5KB; blank MEmu | ||
| # screencaps are ~3.6KB | ||
| log.info("screenshot captured via %s", name) | ||
| return b64 | ||
| log.warning("%s returned suspiciously small image (%d bytes)", name, len(base64.b64decode(b64))) |
There was a problem hiding this comment.
take_screenshot() base64-decodes the returned image multiple times (size check + warning), which does redundant work on potentially large screenshots. Decode once into bytes, reuse the length for the sanity check/logging, and only re-encode if needed.
| if b64 and len(base64.b64decode(b64)) > 5000: | |
| # Sanity check: real screenshots are >> 5KB; blank MEmu | |
| # screencaps are ~3.6KB | |
| log.info("screenshot captured via %s", name) | |
| return b64 | |
| log.warning("%s returned suspiciously small image (%d bytes)", name, len(base64.b64decode(b64))) | |
| img_bytes = base64.b64decode(b64) | |
| img_size = len(img_bytes) | |
| if b64 and img_size > 5000: | |
| # Sanity check: real screenshots are >> 5KB; blank MEmu | |
| # screencaps are ~3.6KB | |
| log.info("screenshot captured via %s", name) | |
| return b64 | |
| log.warning( | |
| "%s returned suspiciously small image (%d bytes)", | |
| name, | |
| img_size, | |
| ) |
Summary
New
adb_vision/folder: clean ADB game automation with zero ALAS dependency.What's in this PR
server.py) — 6 tools:adb_screenshot,adb_tap,adb_swipe,adb_keyevent,adb_launch_game,adb_get_focusscreenshot.py) — dispatcher with stubs for DroidCast, scrcpy, u2, and screencap. Each backend has a dedicated GitHub issue for Jules (Screenshot backend: DroidCast APK HTTP stream #40, Screenshot backend: scrcpy virtual display capture #41, Screenshot backend: direct uiautomator2 ATX HTTP #42).drive.batlaunches interactive game control sessionArchitecture
Gemini CLI connects to adb_vision MCP server via stdio. Gemini sees the screen (screenshot), reasons about what to do, and controls the emulator (tap/swipe). Gemini IS the bot.
Screenshot Methods (Issues for Jules)
Testing