Skip to content

feat: adb_vision - clean ADB + VLM game automation stack#43

Open
Coldaine wants to merge 1 commit intomasterfrom
feature/adb-vision-clean
Open

feat: adb_vision - clean ADB + VLM game automation stack#43
Coldaine wants to merge 1 commit intomasterfrom
feature/adb-vision-clean

Conversation

@Coldaine
Copy link
Owner

@Coldaine Coldaine commented Mar 3, 2026

Summary

New adb_vision/ folder: clean ADB game automation with zero ALAS dependency.

What's in this PR

Architecture

Gemini CLI connects to adb_vision MCP server via stdio. Gemini sees the screen (screenshot), reasons about what to do, and controls the emulator (tap/swipe). Gemini IS the bot.

Screenshot Methods (Issues for Jules)

Testing

cd adb_vision && uv run pytest test_server.py -v   # 10/10 pass

- Clean MCP server (server.py) with 6 tools: screenshot, tap, swipe, keyevent, launch_game, get_focus
- Pluggable screenshot backends (screenshot.py) with dispatch + stubs for DroidCast, scrcpy, u2, screencap
- 10/10 unit tests passing (all mocked, no device needed)
- Gemini CLI wired as vision driver via MCP (drive.bat + system prompt)
- .mcp.json updated to point to adb_vision server
- .gemini/settings.json registers adb-vision MCP server

GitHub issues created for Jules:
- #40: DroidCast APK HTTP stream backend
- #41: scrcpy virtual display capture backend
- #42: uiautomator2 ATX HTTP backend

Zero ALAS dependency. Pure async ADB + FastMCP.
Copilot AI review requested due to automatic review settings March 3, 2026 04:44
@chatgpt-codex-connector
Copy link

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

"server.py",
"--serial",
"127.0.0.1:21513"
]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: Changing the port from 21503 to 21513? I see we're playing 'guess which MEmu instance' today. According to AGENTS.md, the documented environment uses 127.0.0.1:21503 for MEmuPlayer. This change breaks the 'Known local environment' documented in your own setup guide.

🩹 The Fix: Either document why port 21513 is needed, make this configurable via environment variable, or keep the documented default. Don't leave developers playing port roulette.

📏 Severity: critical

]
}
}
} No newline at end of file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: Missing newline at end of file. It's 2026 and we're still having this conversation? Your editor is gaslighting you into believing this file is complete when git diff is side-eyeing you hard.

🩹 The Fix:

Suggested change
}
"PYTHONIOENCODING": "utf-8"
}
}
}

📏 Severity: nitpick

"server.py",
"--serial",
"127.0.0.1:21513"
],
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: Same port switcharoo here — 21503 became 21513 like it's no big deal. The AGENTS.md clearly states 127.0.0.1:21503 as the known environment. If you're changing the default, at least have the courtesy to update the docs or explain why instance 0 suddenly became instance 1.

🩹 The Fix: Align with documented environment or update AGENTS.md to explain the port change rationale.

📏 Severity: critical



ADB_EXECUTABLE: str = _find_adb()
ADB_SERIAL: str = "127.0.0.1:21513"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: Hardcoded serial at module level — the module loads once and this value is cast in stone. It's like defining your WiFi password in a global constant and calling it 'configuration'. The argparse default below becomes purely decorative since this is already set.

🩹 The Fix: Set this to None here and let main() initialize it properly from args or environment. Or use a class-based approach that doesn't rely on module-level globals.

📏 Severity: warning

try:
log.debug("trying screenshot backend: %s", name)
b64 = await capture_fn(adb_run=adb_run, serial=serial, adb_exe=adb_exe)
if b64 and len(base64.b64decode(b64)) > 5000:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: 5000 — the magic number that somehow distinguishes 'real screenshot' from 'blank MEmu garbage'. This threshold is doing more heavy lifting than a CrossFit enthusiast on leg day. Where did 5000 come from? The comment says '>> 5KB' but the check is '> 5000 bytes' which is a hair under 5KB. Close enough for government work, I suppose.

🩹 The Fix: Make this a named constant with a descriptive name like MIN_VALID_PNG_SIZE_BYTES = 5000 and document how you arrived at this value.

📏 Severity: suggestion

# Stub — implementation will be filled by a Jules issue.
# ---------------------------------------------------------------------------
async def _capture_droidcast(*, adb_run: AdbRunFn, serial: str, adb_exe: str) -> str:
raise NotImplementedError(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: Three unimplemented backends that just raise NotImplementedError. I appreciate the optimism of providing a pluggable architecture when only one plugin actually works. It's like building a 4-car garage for a unicycle.

🩹 The Fix: Either implement at least one more backend so 'auto' failover makes sense, or remove the stubs and simplify until they're ready. Dead code is technical debt wearing a Halloween costume.

📏 Severity: suggestion

with open(_ACTION_LOG_PATH, "a", encoding="utf-8") as fh:
fh.write(json.dumps(record, ensure_ascii=True) + "\n")
except Exception:
pass
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: except Exception: pass — the 'see no evil, hear no evil' of error handling. When logging fails, we just... don't? Silent failures are like ninja bugs: you don't know they're there until they've stolen your sanity at 3 AM.

🩹 The Fix: At minimum, log to stderr as a fallback: except Exception as e: print(f'Failed to log action: {e}', file=sys.stderr)

📏 Severity: warning

found = shutil.which("adb")
if found:
return found
candidates = [
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 The Roast: Windows-only hardcoded paths. I see C:\ and D:\ and I know this code won't work anywhere else without modification. It's like building a house with doors that only open in one direction — sure it works, but you're gonna confuse a lot of people.

🩹 The Fix: Add POSIX paths like /usr/bin/adb, /opt/android-sdk/platform-tools/adb, or just rely on shutil.which() entirely for non-Windows systems.

📏 Severity: suggestion

@kilo-code-bot
Copy link

kilo-code-bot bot commented Mar 3, 2026

Code Review Roast 🔥

Verdict: 8 Issues Found | Recommendation: Address before merge

Overview

Severity Count
🚨 critical 2
⚠️ warning 2
💡 suggestion 4
🤏 nitpick 0
Issue Details (click to expand)
File Line Roast
.gemini/settings.json 12 Port changed from documented 21503 to 21513 — breaks known environment
.mcp.json 12 Same port change — inconsistent with AGENTS.md documentation
adb_vision/server.py 107 Hardcoded serial at module level reduces configurability
adb_vision/server.py 73 Silent exception handling in _action_log hides failures
adb_vision/server.py 94 Windows-only hardcoded paths reduce portability
adb_vision/screenshot.py 45 Magic number 5000 lacks named constant and documentation
adb_vision/screenshot.py 95 Three unimplemented screenshot backends (technical debt)
.gemini/settings.json 15 Missing newline at end of file

🏆 Best part: The pluggable screenshot backend architecture is actually well-designed. If the stubs get implemented, this will be a solid foundation. The separation of ADB logic from ALAS dependency is a good architectural move toward the NORTH_STAR vision.

💀 Worst part: The port change from 21503 → 21513 without documentation or explanation is a breaking change that contradicts your own project's documented environment setup. It's like changing the locks on your house but leaving the old key under the mat with a note saying 'good luck'.

📊 Overall: This PR introduces a clean standalone ADB+VLM MCP server that aligns with the tool-first vision, but the undocumented port change and hardcoded configuration values suggest it wasn't tested against the documented MEmu environment. Like a beautifully architected house built on a foundation of sand — the structure is sound, but the assumptions need validation.

Files Reviewed (6 files)
  • .gemini/settings.json - 2 issues (critical: port change, nitpick: missing newline)
  • .mcp.json - 1 issue (critical: port change)
  • adb_vision/server.py - 3 issues (warning: hardcoded serial, warning: silent exception, suggestion: Windows-only paths)
  • adb_vision/screenshot.py - 2 issues (suggestion: magic number, suggestion: unimplemented backends)
  • adb_vision/test_server.py - Clean, well-structured tests
  • adb_vision/pyproject.toml - Clean

Fix these issues in Kilo Cloud

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Introduces a new standalone adb_vision/ module that provides a clean FastMCP stdio server for ADB-based “vision-driven” emulator control, explicitly avoiding any ALAS dependency, and wires editor/Gemini configs to use it.

Changes:

  • Added adb_vision/ MCP server (server.py) exposing ADB control + screenshot tools.
  • Implemented a pluggable screenshot dispatcher (screenshot.py) with stub backends and a screencap fallback.
  • Added a mocked unit test suite and Gemini CLI integration/config (test_server.py, drive.bat, .gemini/settings.json, .mcp.json).

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
adb_vision/test_server.py Adds mocked async unit tests for screenshot dispatch + MCP tools.
adb_vision/server.py New FastMCP server exposing ADB tools, action logging, and screenshot saving.
adb_vision/screenshot.py Screenshot backend dispatcher with stubs + screencap backend and blank-image guard.
adb_vision/pyproject.toml Declares adb_vision dependencies and dev test deps.
adb_vision/drive.bat Launch helper for Gemini CLI using the adb_vision policy prompt.
adb_vision/conftest.py Ensures local imports work for tests by adding adb_vision/ to sys.path.
adb_vision/init.py Package marker + brief module docstring.
adb_vision/README.md Usage/architecture documentation for adb_vision.
adb_vision/GEMINI_SYSTEM_PROMPT.md Gemini system prompt describing tools and workflow.
adb_vision/.gitignore Ignores runtime artifacts (JSONL logs, screenshots, venv, caches).
.mcp.json Points VS Code MCP config to the new adb_vision server entrypoint.
.gemini/settings.json Registers the adb_vision MCP server for Gemini CLI usage.

Comment on lines +13 to +29
def _fake_png(width: int = 1280, height: int = 720) -> bytes:
"""Generate a valid PNG with random noise that is > 5KB."""
from PIL import Image
import io
import random

random.seed(42)
# Random pixel data won't compress as small as solid color
pixels = bytes(random.randint(0, 255) for _ in range(width * height * 3))
img = Image.frombytes("RGB", (width, height), pixels)
buf = io.BytesIO()
img.save(buf, format="PNG")
return buf.getvalue()


FAKE_PNG_B64 = base64.b64encode(_fake_png()).decode("ascii")
FAKE_PNG_BYTES = base64.b64decode(FAKE_PNG_B64)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_fake_png() defaults to generating a full 1280×720 random-noise image and is called at import time to build FAKE_PNG_B64. This makes the test module expensive to import and will slow the unit test suite significantly; consider using a much smaller deterministic PNG that still exceeds the 5KB threshold (or storing a small precomputed PNG bytes constant).

Copilot uses AI. Check for mistakes.
Comment on lines 2 to 12
"mcpServers": {
"alas": {
"command": "uv",
"args": [
"run",
"--directory",
"agent_orchestrator",
"alas_mcp_server.py",
"--config",
"alas"
"adb_vision",
"server.py",
"--serial",
"127.0.0.1:21513"
],
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VS Code MCP server entry is still named alas even though it now launches adb_vision/server.py. Renaming the key to something like adb-vision would reduce confusion (and avoids clobbering any existing ALAS MCP configuration).

Copilot uses AI. Check for mistakes.
Comment on lines +298 to +303
parser.add_argument(
"--screenshot-method",
default="auto",
choices=["auto", "droidcast", "scrcpy", "u2", "screencap"],
help="Default screenshot method",
)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The --screenshot-method CLI option is parsed but never used, so it currently has no effect on tool behavior. Either remove this argument or wire it into adb_screenshot() (e.g., set a module-level default method in main() and have adb_screenshot() use it when the caller doesn’t explicitly pass method).

Suggested change
parser.add_argument(
"--screenshot-method",
default="auto",
choices=["auto", "droidcast", "scrcpy", "u2", "screencap"],
help="Default screenshot method",
)

Copilot uses AI. Check for mistakes.
t0 = time.monotonic()
await _adb_run("shell", "input", "tap", str(x), str(y), timeout=5.0)
ms = int((time.monotonic() - t0) * 1000)
_action_log("adb_tap", {"x": x, "y": y}, f"tapped {x},{y}", "", ms)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action log records for adb_tap omit the active ADB serial, which makes the JSONL log ambiguous if the server is started with a non-default --serial. Consider including serial in the logged args (as is already done for some other tools).

Suggested change
_action_log("adb_tap", {"x": x, "y": y}, f"tapped {x},{y}", "", ms)
_action_log("adb_tap", {"serial": ADB_SERIAL, "x": x, "y": y}, f"tapped {x},{y}", "", ms)

Copilot uses AI. Check for mistakes.
Comment on lines +213 to +219
_action_log(
"adb_swipe",
{"x1": x1, "y1": y1, "x2": x2, "y2": y2, "duration_ms": duration_ms},
f"swiped {x1},{y1}->{x2},{y2}",
"",
ms,
)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action log records for adb_swipe omit the active ADB serial, which makes the JSONL log ambiguous if the server is started with a non-default --serial. Consider including serial in the logged args (as is already done for some other tools).

Copilot uses AI. Check for mistakes.
t0 = time.monotonic()
await _adb_run("shell", "input", "keyevent", str(keycode), timeout=5.0)
ms = int((time.monotonic() - t0) * 1000)
_action_log("adb_keyevent", {"keycode": keycode}, f"keyevent {keycode}", "", ms)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action log records for adb_keyevent omit the active ADB serial, which makes the JSONL log ambiguous if the server is started with a non-default --serial. Consider including serial in the logged args (as is already done for some other tools).

Suggested change
_action_log("adb_keyevent", {"keycode": keycode}, f"keyevent {keycode}", "", ms)
_action_log(
"adb_keyevent",
{"keycode": keycode, "serial": ADB_SERIAL},
f"keyevent {keycode}",
"",
ms,
)

Copilot uses AI. Check for mistakes.
Comment on lines +45 to +50
if b64 and len(base64.b64decode(b64)) > 5000:
# Sanity check: real screenshots are >> 5KB; blank MEmu
# screencaps are ~3.6KB
log.info("screenshot captured via %s", name)
return b64
log.warning("%s returned suspiciously small image (%d bytes)", name, len(base64.b64decode(b64)))
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take_screenshot() base64-decodes the returned image multiple times (size check + warning), which does redundant work on potentially large screenshots. Decode once into bytes, reuse the length for the sanity check/logging, and only re-encode if needed.

Suggested change
if b64 and len(base64.b64decode(b64)) > 5000:
# Sanity check: real screenshots are >> 5KB; blank MEmu
# screencaps are ~3.6KB
log.info("screenshot captured via %s", name)
return b64
log.warning("%s returned suspiciously small image (%d bytes)", name, len(base64.b64decode(b64)))
img_bytes = base64.b64decode(b64)
img_size = len(img_bytes)
if b64 and img_size > 5000:
# Sanity check: real screenshots are >> 5KB; blank MEmu
# screencaps are ~3.6KB
log.info("screenshot captured via %s", name)
return b64
log.warning(
"%s returned suspiciously small image (%d bytes)",
name,
img_size,
)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants