Project Structure

MobileAgent-Virtual-Runner/
│
├── README.md                    # Main documentation
├── LICENSE                      # MIT License
├── CONTRIBUTING.md              # Contribution guidelines
├── PROJECT_STRUCTURE.md         # This file
│
├── setup.sh                     # Setup script
├── requirements.txt             # Python dependencies
├── Dockerfile                   # Docker image definition
├── docker-compose.yml           # Docker Compose configuration
│
├── .env.example                 # Environment variables template
├── .gitignore                   # Git ignore patterns
│
├── run_virtual.py               # Main execution script
├── run_virtual.sh               # Bash wrapper with presets
├── virtual_env_adapter.py       # Environment adapter
├── virtual_env_gemini3.py       # Gemini-based virtual environment
├── start_Android.png            # Initial Android screenshot
│
├── android_world/               # Core framework (from android_world)
│   ├── __init__.py
│   ├── agents/                  # Agent implementations
│   │   ├── base_agent.py
│   │   ├── mobile_agent_v3.py
│   │   ├── gui_owl.py
│   │   ├── infer_ma3.py
│   │   └── ...
│   ├── env/                     # Environment interfaces
│   │   ├── interface.py
│   │   ├── representation_utils.py
│   │   └── ...
│   ├── task_evals/              # Task evaluation modules
│   ├── utils/                   # Utility functions
│   ├── registry.py              # Task registry
│   ├── suite_utils.py           # Suite utilities
│   ├── checkpointer.py          # Checkpoint management
│   └── constants.py             # Constants
│
├── config/                      # Configuration files
│   └── README.md                # Config documentation
│
├── examples/                    # Usage examples
│   ├── quick_start.sh           # Quick start guide
│   └── custom_task.py           # Custom task example
│
└── outputs/                     # Generated at runtime
    ├── logs/                    # Execution logs
    ├── trajectories/            # Action trajectories
    └── results/                 # Task results

Key Components

Core Scripts

  • run_virtual.py: Main Python script that orchestrates the virtual environment and agent execution
  • run_virtual.sh: Bash wrapper providing convenient presets for common task categories
  • setup.sh: One-time setup script for environment configuration

Virtual Environment

  • virtual_env_gemini3.py: Implements the virtual Android environment using the Gemini API
  • virtual_env_adapter.py: Adapts the virtual environment to the android_world interface
  • start_Android.png: Initial Android home screen screenshot

Framework Integration

  • android_world/: Minimal subset of the android_world framework
    • Contains agent implementations (Mobile-Agent-v3, GUI-Owl)
    • Environment interfaces and utilities
    • Task registry and evaluation modules
    • Only essential files included (no physical device/emulator dependencies)

Configuration

  • .env.example: Template for environment variables
  • config/: Directory for task definitions and model configurations

Examples & Documentation

  • examples/: Ready-to-run examples
  • README.md: Comprehensive usage guide
  • CONTRIBUTING.md: Guidelines for contributors
  • PROJECT_STRUCTURE.md: This file

File Responsibilities

Main Execution Flow

  1. run_virtual.sh → Parses arguments, sets environment variables
  2. run_virtual.py → Creates virtual environment and agent
  3. virtual_env_adapter.py → Adapts virtual env to expected interface
  4. virtual_env_gemini3.py → Generates screenshots via the Gemini API
  5. android_world/agents/ → Agent processes screenshots and decides actions
  6. Output → Logs, trajectories, and results are saved under outputs/
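
The flow above can be sketched with stand-in classes. This is only an illustration of how the pieces hand off to each other: StubVirtualEnv, EnvAdapter, StubAgent, and run_episode are hypothetical names, not the classes defined in virtual_env_gemini3.py, virtual_env_adapter.py, or android_world/agents/, and no Gemini call is made.

```python
class StubVirtualEnv:
    """Hypothetical stand-in for the Gemini-backed environment."""

    def __init__(self):
        self.frame = 0

    def render(self, action=None):
        # A real environment would request a generated screenshot here.
        self.frame += 1
        return f"screenshot-{self.frame}".encode()


class EnvAdapter:
    """Hypothetical adapter exposing a small agent-facing interface."""

    def __init__(self, virtual_env):
        self._env = virtual_env
        self._screenshot = virtual_env.render()

    def get_screenshot(self):
        return self._screenshot

    def execute_action(self, action):
        # Applying an action produces the next screenshot.
        self._screenshot = self._env.render(action)


class StubAgent:
    """Hypothetical agent: taps once, then reports that it is done."""

    def __init__(self, env):
        self.env = env
        self.steps = 0

    def step(self, goal):
        screenshot = self.env.get_screenshot()
        self.steps += 1
        if self.steps < 2:
            action = {"type": "tap", "seen": screenshot.decode()}
            self.env.execute_action(action)
        else:
            action = {"type": "done"}
        return action


def run_episode(goal, max_steps=5):
    """Create the environment and agent, then loop until the agent is done."""
    env = EnvAdapter(StubVirtualEnv())
    agent = StubAgent(env)
    trajectory = []
    for _ in range(max_steps):
        action = agent.step(goal)
        trajectory.append(action)
        if action["type"] == "done":
            break
    return trajectory
```

Calling run_episode("open settings") returns the list of actions the stub agent took, which plays the role of the saved trajectory in step 6.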

Configuration Loading Priority

  1. Command-line arguments (highest priority)
  2. Environment variables
  3. .env file
  4. Default values (lowest priority)
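
A minimal sketch of this lookup order, assuming settings are plain strings. The helper names parse_dotenv and resolve_setting and the setting name AGENT_MODEL are hypothetical and used only for illustration; run_virtual.py may resolve its settings differently.

```python
import os


def parse_dotenv(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_setting(name, cli_value=None, environ=None, dotenv=None, default=None):
    """Return the first value found: CLI, environment, .env file, default."""
    environ = os.environ if environ is None else environ
    dotenv = {} if dotenv is None else dotenv
    if cli_value is not None:      # 1. command-line argument
        return cli_value
    if name in environ:            # 2. environment variable
        return environ[name]
    if name in dotenv:             # 3. .env file
        return dotenv[name]
    return default                 # 4. default value
```

Passing environ and dotenv explicitly keeps the function easy to test. In a script, dotenv would typically come from parse_dotenv applied to the contents of the .env file.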

Output Structure

outputs/
├── logs/
│   └── log_virtual_2024-12-02_11-14-51.log
├── trajectories/
│   └── traj_virtual_2024-12-02_11-14-51/
│       ├── ForwardMultipleMessagesToGroup/
│       │   └── traj.jsonl
│       └── BatchEditContactTags/
│           └── traj.jsonl
└── results/
    └── checkpoint_*.json
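
The layout above can be reproduced from a run timestamp. The sketch below assumes the timestamp format is YYYY-MM-DD_HH-MM-SS, inferred from the example names shown; run_paths is a hypothetical helper, not a function from this project.

```python
from datetime import datetime
from pathlib import Path


def run_paths(root, started_at, task_names):
    """Build the log, trajectory, and results paths for one run."""
    # Assumed format, inferred from names like log_virtual_2024-12-02_11-14-51.log
    stamp = started_at.strftime("%Y-%m-%d_%H-%M-%S")
    root = Path(root)
    traj_dir = root / "trajectories" / f"traj_virtual_{stamp}"
    return {
        "log": root / "logs" / f"log_virtual_{stamp}.log",
        # One traj.jsonl per task, in a directory named after the task.
        "trajectories": {name: traj_dir / name / "traj.jsonl" for name in task_names},
        "results": root / "results",
    }


paths = run_paths(
    "outputs",
    datetime(2024, 12, 2, 11, 14, 51),
    ["ForwardMultipleMessagesToGroup", "BatchEditContactTags"],
)
```

Here paths["log"] points at outputs/logs/log_virtual_2024-12-02_11-14-51.log, matching the example tree.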

Dependencies

Required

  • Python 3.11+
  • Gemini API access (for virtual environment)
  • GUI-Owl or compatible LLM (for agent)

Optional

  • Docker (for containerized execution)
  • GPU (for local GUI-Owl deployment)

Minimal vs Full Install

Minimal (Virtual Only)

  • Core scripts: run_virtual.py, virtual_env_*.py
  • Required: Gemini API access for the virtual environment, plus an agent model (see Dependencies)
  • No Android SDK/emulator needed

Full (With Device Support)

  • Full android_world framework
  • Android SDK and emulator
  • Physical device support

This project is configured for the Minimal installation.

Customization Points

  1. Tasks: Modify task definitions in run_virtual.py or add to config/
  2. Models: Switch agent models via environment variables
  3. Resolution: Adjust virtual screen resolution
  4. Prompts: Customize Gemini prompts in virtual_env_gemini3.py

Notes

  • The android_world directory contains only essential files
  • No test files or device-specific code are included
  • The Dockerfile is simplified (no Android SDK installation)
  • All hardcoded paths have been replaced with environment variables