feat: adb_vision - clean ADB + VLM game automation stack by Coldaine · Pull Request #43 · Coldaine/ALAS

Coldaine · 2026-03-03T04:44:44Z

Summary

New adb_vision/ folder: clean ADB game automation with zero ALAS dependency.

What's in this PR

Clean MCP server (server.py) — 6 tools: adb_screenshot, adb_tap, adb_swipe, adb_keyevent, adb_launch_game, adb_get_focus
Pluggable screenshot backends (screenshot.py) — dispatcher with stubs for DroidCast, scrcpy, u2, and screencap. Each backend has a dedicated GitHub issue for Jules (Screenshot backend: DroidCast APK HTTP stream #40, Screenshot backend: scrcpy virtual display capture #41, Screenshot backend: direct uiautomator2 ATX HTTP #42).
10/10 unit tests — all mocked, no device needed
Gemini CLI integration — MCP server registered with Gemini CLI; drive.bat launches interactive game control session
.mcp.json updated to point VS Code at the new server
System prompt for Gemini — describes tools, workflow, and game UI elements

Architecture

Gemini CLI connects to adb_vision MCP server via stdio. Gemini sees the screen (screenshot), reasons about what to do, and controls the emulator (tap/swipe). Gemini IS the bot.

Screenshot Methods (Issues for Jules)

Screenshot backend: DroidCast APK HTTP stream #40 DroidCast APK HTTP stream
Screenshot backend: scrcpy virtual display capture #41 scrcpy virtual display capture
Screenshot backend: direct uiautomator2 ATX HTTP #42 uiautomator2 ATX HTTP

Testing

cd adb_vision && uv run pytest test_server.py -v   # 10/10 pass

- Clean MCP server (server.py) with 6 tools: screenshot, tap, swipe, keyevent, launch_game, get_focus - Pluggable screenshot backends (screenshot.py) with dispatch + stubs for DroidCast, scrcpy, u2, screencap - 10/10 unit tests passing (all mocked, no device needed) - Gemini CLI wired as vision driver via MCP (drive.bat + system prompt) - .mcp.json updated to point to adb_vision server - .gemini/settings.json registers adb-vision MCP server GitHub issues created for Jules: - #40: DroidCast APK HTTP stream backend - #41: scrcpy virtual display capture backend - #42: uiautomator2 ATX HTTP backend Zero ALAS dependency. Pure async ADB + FastMCP.

chatgpt-codex-connector · 2026-03-03T04:44:49Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

kilo-code-bot · 2026-03-03T04:46:35Z

.gemini/settings.json

+        "server.py",
+        "--serial",
+        "127.0.0.1:21513"
+      ]


🔥 The Roast: Changing the port from 21503 to 21513? I see we're playing 'guess which MEmu instance' today. According to AGENTS.md, the documented environment uses 127.0.0.1:21503 for MEmuPlayer. This change breaks the 'Known local environment' documented in your own setup guide.

🩹 The Fix: Either document why port 21513 is needed, make this configurable via environment variable, or keep the documented default. Don't leave developers playing port roulette.

📏 Severity: critical

kilo-code-bot · 2026-03-03T04:46:35Z

.gemini/settings.json

+      ]
+    }
+  }
+}


🔥 The Roast: Missing newline at end of file. It's 2026 and we're still having this conversation? Your editor is gaslighting you into believing this file is complete when git diff is side-eyeing you hard.

🩹 The Fix:

Suggested change

}

"PYTHONIOENCODING": "utf-8"

}

}

}

📏 Severity: nitpick

kilo-code-bot · 2026-03-03T04:46:35Z

.mcp.json

+        "server.py",
+        "--serial",
+        "127.0.0.1:21513"
      ],


🔥 The Roast: Same port switcharoo here — 21503 became 21513 like it's no big deal. The AGENTS.md clearly states 127.0.0.1:21503 as the known environment. If you're changing the default, at least have the courtesy to update the docs or explain why instance 0 suddenly became instance 1.

🩹 The Fix: Align with documented environment or update AGENTS.md to explain the port change rationale.

📏 Severity: critical

kilo-code-bot · 2026-03-03T04:46:35Z

adb_vision/server.py

+
+
+ADB_EXECUTABLE: str = _find_adb()
+ADB_SERIAL: str = "127.0.0.1:21513"


🔥 The Roast: Hardcoded serial at module level — the module loads once and this value is cast in stone. It's like defining your WiFi password in a global constant and calling it 'configuration'. The argparse default below becomes purely decorative since this is already set.

🩹 The Fix: Set this to None here and let main() initialize it properly from args or environment. Or use a class-based approach that doesn't rely on module-level globals.

📏 Severity: warning

kilo-code-bot · 2026-03-03T04:46:35Z

adb_vision/screenshot.py

+        try:
+            log.debug("trying screenshot backend: %s", name)
+            b64 = await capture_fn(adb_run=adb_run, serial=serial, adb_exe=adb_exe)
+            if b64 and len(base64.b64decode(b64)) > 5000:


🔥 The Roast: 5000 — the magic number that somehow distinguishes 'real screenshot' from 'blank MEmu garbage'. This threshold is doing more heavy lifting than a CrossFit enthusiast on leg day. Where did 5000 come from? The comment says '>> 5KB' but the check is '> 5000 bytes' which is a hair under 5KB. Close enough for government work, I suppose.

🩹 The Fix: Make this a named constant with a descriptive name like MIN_VALID_PNG_SIZE_BYTES = 5000 and document how you arrived at this value.

📏 Severity: suggestion

kilo-code-bot · 2026-03-03T04:46:35Z

adb_vision/screenshot.py

+# Stub — implementation will be filled by a Jules issue.
+# ---------------------------------------------------------------------------
+async def _capture_droidcast(*, adb_run: AdbRunFn, serial: str, adb_exe: str) -> str:
+    raise NotImplementedError(


🔥 The Roast: Three unimplemented backends that just raise NotImplementedError. I appreciate the optimism of providing a pluggable architecture when only one plugin actually works. It's like building a 4-car garage for a unicycle.

🩹 The Fix: Either implement at least one more backend so 'auto' failover makes sense, or remove the stubs and simplify until they're ready. Dead code is technical debt wearing a Halloween costume.

📏 Severity: suggestion

kilo-code-bot · 2026-03-03T04:46:35Z

adb_vision/server.py

+        with open(_ACTION_LOG_PATH, "a", encoding="utf-8") as fh:
+            fh.write(json.dumps(record, ensure_ascii=True) + "\n")
+    except Exception:
+        pass


🔥 The Roast: except Exception: pass — the 'see no evil, hear no evil' of error handling. When logging fails, we just... don't? Silent failures are like ninja bugs: you don't know they're there until they've stolen your sanity at 3 AM.

🩹 The Fix: At minimum, log to stderr as a fallback: except Exception as e: print(f'Failed to log action: {e}', file=sys.stderr)

📏 Severity: warning

kilo-code-bot · 2026-03-03T04:46:35Z

adb_vision/server.py

+    found = shutil.which("adb")
+    if found:
+        return found
+    candidates = [


🔥 The Roast: Windows-only hardcoded paths. I see C:\ and D:\ and I know this code won't work anywhere else without modification. It's like building a house with doors that only open in one direction — sure it works, but you're gonna confuse a lot of people.

🩹 The Fix: Add POSIX paths like /usr/bin/adb, /opt/android-sdk/platform-tools/adb, or just rely on shutil.which() entirely for non-Windows systems.

📏 Severity: suggestion

kilo-code-bot · 2026-03-03T04:46:52Z

Code Review Roast 🔥

Verdict: 8 Issues Found | Recommendation: Address before merge

Overview

Severity	Count
🚨 critical	2
⚠️ warning	2
💡 suggestion	4
🤏 nitpick	0

Issue Details (click to expand)

File	Line	Roast
`.gemini/settings.json`	12	Port changed from documented 21503 to 21513 — breaks known environment
`.mcp.json`	12	Same port change — inconsistent with AGENTS.md documentation
`adb_vision/server.py`	107	Hardcoded serial at module level reduces configurability
`adb_vision/server.py`	73	Silent exception handling in `_action_log` hides failures
`adb_vision/server.py`	94	Windows-only hardcoded paths reduce portability
`adb_vision/screenshot.py`	45	Magic number 5000 lacks named constant and documentation
`adb_vision/screenshot.py`	95	Three unimplemented screenshot backends (technical debt)
`.gemini/settings.json`	15	Missing newline at end of file

🏆 Best part: The pluggable screenshot backend architecture is actually well-designed. If the stubs get implemented, this will be a solid foundation. The separation of ADB logic from ALAS dependency is a good architectural move toward the NORTH_STAR vision.

💀 Worst part: The port change from 21503 → 21513 without documentation or explanation is a breaking change that contradicts your own project's documented environment setup. It's like changing the locks on your house but leaving the old key under the mat with a note saying 'good luck'.

📊 Overall: This PR introduces a clean standalone ADB+VLM MCP server that aligns with the tool-first vision, but the undocumented port change and hardcoded configuration values suggest it wasn't tested against the documented MEmu environment. Like a beautifully architected house built on a foundation of sand — the structure is sound, but the assumptions need validation.

Files Reviewed (6 files)

.gemini/settings.json - 2 issues (critical: port change, nitpick: missing newline)
.mcp.json - 1 issue (critical: port change)
adb_vision/server.py - 3 issues (warning: hardcoded serial, warning: silent exception, suggestion: Windows-only paths)
adb_vision/screenshot.py - 2 issues (suggestion: magic number, suggestion: unimplemented backends)
adb_vision/test_server.py - Clean, well-structured tests
adb_vision/pyproject.toml - Clean

Fix these issues in Kilo Cloud

Copilot

Pull request overview

Introduces a new standalone adb_vision/ module that provides a clean FastMCP stdio server for ADB-based “vision-driven” emulator control, explicitly avoiding any ALAS dependency, and wires editor/Gemini configs to use it.

Changes:

Added adb_vision/ MCP server (server.py) exposing ADB control + screenshot tools.
Implemented a pluggable screenshot dispatcher (screenshot.py) with stub backends and a screencap fallback.
Added a mocked unit test suite and Gemini CLI integration/config (test_server.py, drive.bat, .gemini/settings.json, .mcp.json).

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
adb_vision/test_server.py	Adds mocked async unit tests for screenshot dispatch + MCP tools.
adb_vision/server.py	New FastMCP server exposing ADB tools, action logging, and screenshot saving.
adb_vision/screenshot.py	Screenshot backend dispatcher with stubs + screencap backend and blank-image guard.
adb_vision/pyproject.toml	Declares adb_vision dependencies and dev test deps.
adb_vision/drive.bat	Launch helper for Gemini CLI using the adb_vision policy prompt.
adb_vision/conftest.py	Ensures local imports work for tests by adding `adb_vision/` to `sys.path`.
adb_vision/init.py	Package marker + brief module docstring.
adb_vision/README.md	Usage/architecture documentation for adb_vision.
adb_vision/GEMINI_SYSTEM_PROMPT.md	Gemini system prompt describing tools and workflow.
adb_vision/.gitignore	Ignores runtime artifacts (JSONL logs, screenshots, venv, caches).
.mcp.json	Points VS Code MCP config to the new adb_vision server entrypoint.
.gemini/settings.json	Registers the adb_vision MCP server for Gemini CLI usage.

Copilot · 2026-03-03T04:51:43Z

adb_vision/test_server.py

+def _fake_png(width: int = 1280, height: int = 720) -> bytes:
+    """Generate a valid PNG with random noise that is > 5KB."""
+    from PIL import Image
+    import io
+    import random
+
+    random.seed(42)
+    # Random pixel data won't compress as small as solid color
+    pixels = bytes(random.randint(0, 255) for _ in range(width * height * 3))
+    img = Image.frombytes("RGB", (width, height), pixels)
+    buf = io.BytesIO()
+    img.save(buf, format="PNG")
+    return buf.getvalue()
+
+
+FAKE_PNG_B64 = base64.b64encode(_fake_png()).decode("ascii")
+FAKE_PNG_BYTES = base64.b64decode(FAKE_PNG_B64)


_fake_png() defaults to generating a full 1280×720 random-noise image and is called at import time to build FAKE_PNG_B64. This makes the test module expensive to import and will slow the unit test suite significantly; consider using a much smaller deterministic PNG that still exceeds the 5KB threshold (or storing a small precomputed PNG bytes constant).

Copilot · 2026-03-03T04:51:43Z

.mcp.json

  "mcpServers": {
    "alas": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
-        "agent_orchestrator",
-        "alas_mcp_server.py",
-        "--config",
-        "alas"
+        "adb_vision",
+        "server.py",
+        "--serial",
+        "127.0.0.1:21513"
      ],


The VS Code MCP server entry is still named alas even though it now launches adb_vision/server.py. Renaming the key to something like adb-vision would reduce confusion (and avoids clobbering any existing ALAS MCP configuration).

Copilot · 2026-03-03T04:51:43Z

adb_vision/server.py

+    parser.add_argument(
+        "--screenshot-method",
+        default="auto",
+        choices=["auto", "droidcast", "scrcpy", "u2", "screencap"],
+        help="Default screenshot method",
+    )


The --screenshot-method CLI option is parsed but never used, so it currently has no effect on tool behavior. Either remove this argument or wire it into adb_screenshot() (e.g., set a module-level default method in main() and have adb_screenshot() use it when the caller doesn’t explicitly pass method).

Suggested change

parser.add_argument(

"--screenshot-method",

default="auto",

choices=["auto", "droidcast", "scrcpy", "u2", "screencap"],

help="Default screenshot method",

)

Copilot · 2026-03-03T04:51:44Z

adb_vision/server.py

+    t0 = time.monotonic()
+    await _adb_run("shell", "input", "tap", str(x), str(y), timeout=5.0)
+    ms = int((time.monotonic() - t0) * 1000)
+    _action_log("adb_tap", {"x": x, "y": y}, f"tapped {x},{y}", "", ms)


Action log records for adb_tap omit the active ADB serial, which makes the JSONL log ambiguous if the server is started with a non-default --serial. Consider including serial in the logged args (as is already done for some other tools).

Suggested change

_action_log("adb_tap", {"x": x, "y": y}, f"tapped {x},{y}", "", ms)

_action_log("adb_tap", {"serial": ADB_SERIAL, "x": x, "y": y}, f"tapped {x},{y}", "", ms)

Copilot · 2026-03-03T04:51:44Z

adb_vision/server.py

+    _action_log(
+        "adb_swipe",
+        {"x1": x1, "y1": y1, "x2": x2, "y2": y2, "duration_ms": duration_ms},
+        f"swiped {x1},{y1}->{x2},{y2}",
+        "",
+        ms,
+    )


Action log records for adb_swipe omit the active ADB serial, which makes the JSONL log ambiguous if the server is started with a non-default --serial. Consider including serial in the logged args (as is already done for some other tools).

Copilot · 2026-03-03T04:51:44Z

adb_vision/server.py

+    t0 = time.monotonic()
+    await _adb_run("shell", "input", "keyevent", str(keycode), timeout=5.0)
+    ms = int((time.monotonic() - t0) * 1000)
+    _action_log("adb_keyevent", {"keycode": keycode}, f"keyevent {keycode}", "", ms)


Action log records for adb_keyevent omit the active ADB serial, which makes the JSONL log ambiguous if the server is started with a non-default --serial. Consider including serial in the logged args (as is already done for some other tools).

Suggested change

_action_log("adb_keyevent", {"keycode": keycode}, f"keyevent {keycode}", "", ms)

_action_log(

"adb_keyevent",

{"keycode": keycode, "serial": ADB_SERIAL},

f"keyevent {keycode}",

"",

ms,

)

Copilot · 2026-03-03T04:51:44Z

adb_vision/screenshot.py

+            if b64 and len(base64.b64decode(b64)) > 5000:
+                # Sanity check: real screenshots are >> 5KB; blank MEmu
+                # screencaps are ~3.6KB
+                log.info("screenshot captured via %s", name)
+                return b64
+            log.warning("%s returned suspiciously small image (%d bytes)", name, len(base64.b64decode(b64)))


take_screenshot() base64-decodes the returned image multiple times (size check + warning), which does redundant work on potentially large screenshots. Decode once into bytes, reuse the length for the sanity check/logging, and only re-encode if needed.

Suggested change

if b64 and len(base64.b64decode(b64)) > 5000:

# Sanity check: real screenshots are >> 5KB; blank MEmu

# screencaps are ~3.6KB

log.info("screenshot captured via %s", name)

return b64

log.warning("%s returned suspiciously small image (%d bytes)", name, len(base64.b64decode(b64)))

img_bytes = base64.b64decode(b64)

img_size = len(img_bytes)

if b64 and img_size > 5000:

# Sanity check: real screenshots are >> 5KB; blank MEmu

# screencaps are ~3.6KB

log.info("screenshot captured via %s", name)

return b64

log.warning(

"%s returned suspiciously small image (%d bytes)",

name,

img_size,

)

Copilot AI review requested due to automatic review settings March 3, 2026 04:44

Copilot started reviewing on behalf of Coldaine March 3, 2026 04:45 View session

kilo-code-bot bot reviewed Mar 3, 2026

View reviewed changes

Copilot AI reviewed Mar 3, 2026

View reviewed changes

+                    ]
+                  }
+                }
+              }

                
                    No newline at end of file

-}
+      "PYTHONIOENCODING": "utf-8"
+    }
+  }
+}



		ADB_EXECUTABLE: str = _find_adb()
		ADB_SERIAL: str = "127.0.0.1:21513"

	_action_log("adb_tap", {"x": x, "y": y}, f"tapped {x},{y}", "", ms)
	_action_log("adb_tap", {"serial": ADB_SERIAL, "x": x, "y": y}, f"tapped {x},{y}", "", ms)

Conversation

Coldaine commented Mar 3, 2026

Summary

What's in this PR

Architecture

Screenshot Methods (Issues for Jules)

Testing

Uh oh!

chatgpt-codex-connector bot commented Mar 3, 2026

Uh oh!

kilo-code-bot bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot bot Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

kilo-code-bot bot commented Mar 3, 2026

Code Review Roast 🔥

Overview

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants