feat(azure): implement Azure ML parallelization for WAA evaluation #24
Open: abrichr wants to merge 12 commits into main from feature/azure-parallelization
- Remove unvalidated badges (95%+ success rate, 67% cost savings)
- Add "First open-source WAA reproduction" as headline
- Move WAA to top as main feature with status indicator
- Change "Recent Improvements" to "Roadmap (In Progress)"
- Remove v0.2.0 version references (current is v0.1.1)
- Add Azure quota requirements note for parallelization
- Mark features as [IN PROGRESS] where appropriate

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Complete the Azure ML parallelization implementation:

1. Agent config serialization (_serialize_agent_config):
   - Extracts provider, model, and API keys from agent
   - Passes OPENAI_API_KEY/ANTHROPIC_API_KEY via env vars
   - Supports OpenAI and Anthropic agents
2. Worker command building (_build_worker_command):
   - Uses vanilla WAA run.py with --worker_id and --num_workers
   - Matches Microsoft's official Azure deployment pattern
   - Task distribution handled by WAA internally
3. Result fetching (_fetch_worker_results, _parse_waa_results):
   - Downloads job outputs via Azure ML SDK
   - Parses WAA result.txt files (0.0 or 1.0 score)
   - Handles partial results for failed jobs
4. Job status tracking:
   - Added job_name field to WorkerState
   - Updated _wait_and_collect_results to poll job status
   - Fixed: was checking compute status instead of job status
5. Log fetching (get_job_logs in AzureMLClient):
   - Downloads logs via az ml job download
   - Supports tail parameter for last N lines
   - Updated health_checker to use new method

Uses vanilla windowsarena/winarena:latest with VERSION=11e.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
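The result-parsing step described in this commit (result.txt files holding 0.0 or 1.0, with partial results for failed jobs) could look roughly like the sketch below. It is not the PR's `_parse_waa_results`; the function names and the `<task_id>/result.txt` directory layout are assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_result_file(path: Path) -> Optional[float]:
    """Return the score stored in a result.txt file, or None if missing or malformed."""
    try:
        return float(path.read_text().strip())
    except (OSError, ValueError):
        return None


def collect_scores(output_dir: Path) -> Dict[str, Optional[float]]:
    """Map each task directory to its score (assumed layout: <task_id>/result.txt)."""
    scores: Dict[str, Optional[float]] = {}
    for task_dir in sorted(p for p in output_dir.iterdir() if p.is_dir()):
        scores[task_dir.name] = parse_result_file(task_dir / "result.txt")
    return scores


def success_rate(scores: Dict[str, Optional[float]]) -> float:
    """Average over tasks that produced a score; tasks without one are excluded."""
    valid = [s for s in scores.values() if s is not None]
    return sum(valid) / len(valid) if valid else 0.0
```

Excluding unscored tasks from the average is one possible choice; counting them as failures would be equally defensible and only changes `success_rate`.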
WAA is already open-source from Microsoft. Changed to accurate claim: "Simplified CLI toolkit for Windows Agent Arena"

Updated value proposition to reflect what we actually provide:
- Azure VM setup and SSH tunnel management
- Agent adapters for Claude/GPT/custom agents
- Results viewer
- Parallelization support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The code uses Standard_D4s_v5 (4 vCPUs) by default, not D8ds_v5. Updated all references to be accurate.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New command that:
- Checks Azure CLI installation and login status
- Creates resource group (default: openadapt-agents)
- Creates ML workspace (default: openadapt-ml)
- Writes config to .env file

Usage: uv run python -m openadapt_evals.benchmarks.cli azure-setup

Also improved azure command error message to guide users to run setup.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
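Two of the steps in this commit, detecting the Azure CLI and writing config to a .env file, can be sketched with the standard library alone. This is an illustrative sketch, not the command's actual code: `azure_cli_installed`, `upsert_env`, and the variable names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Dict


def azure_cli_installed() -> bool:
    """True if an `az` executable is found on PATH."""
    return shutil.which("az") is not None


def upsert_env(path: Path, updates: Dict[str, str]) -> None:
    """Set KEY=value pairs in a .env file, replacing existing keys and keeping other lines."""
    lines = path.read_text().splitlines() if path.exists() else []
    remaining = dict(updates)
    out = []
    for line in lines:
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")  # overwrite the existing value
        else:
            out.append(line)  # comments and unrelated keys are preserved as-is
    out.extend(f"{k}={v}" for k, v in remaining.items())  # append keys not yet present
    path.write_text("\n".join(out) + "\n")
```

For example, `upsert_env(Path(".env"), {"AZURE_ML_RESOURCE_GROUP": "openadapt-agents"})` would update or append that one key; the key name here is a placeholder, since the commit does not list the variables it writes.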
The vanilla windowsarena/winarena:latest image does NOT work for unattended WAA installation. This adds:
- `waa-image build` - Build custom waa-auto image locally
- `waa-image push` - Push to Docker Hub or ACR
- `waa-image build-push` - Build and push in one command
- `waa-image check` - Check if image exists in registry

Also updates azure.py to use openadaptai/waa-auto:latest as default image.

The custom Dockerfile (in waa_deploy/) includes:
- Modern dockurr/windows base (auto-downloads Windows 11)
- FirstLogonCommands patches for unattended installation
- Python 3.9 with transformers 4.46.2 (navi agent compatibility)
- api_agent.py for Claude/GPT support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add ECR as the default registry (ecr, dockerhub, acr options)
- Auto-create ECR repository if it doesn't exist
- Auto-login to ECR Public using AWS CLI
- Update azure.py to use public.ecr.aws/g3w3k7s5/waa-auto:latest as default
- Update docs with new default image

ECR Public is preferred because:
- No Docker Hub login required
- Uses existing AWS credentials
- Public access for Azure ML to pull without cross-cloud auth

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The windowsarena/winarena base image is only available for linux/amd64. This fixes builds on macOS (arm64) by explicitly specifying the target platform.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
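Pinning the target platform usually means passing `--platform linux/amd64` to `docker build`. A minimal sketch of assembling that command follows; the helper name and its defaults are hypothetical, and the PR's own build code may differ.

```python
from typing import List


def docker_build_command(tag: str, context: str = ".", platform: str = "linux/amd64") -> List[str]:
    """Build the argv for `docker build`, forcing the target platform."""
    return ["docker", "build", "--platform", platform, "-t", tag, context]
```

The list can be passed to `subprocess.run(cmd, check=True)`. On an arm64 Mac the build then runs under emulation, which is slower but yields an amd64 image.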
- Add `aws-costs` command to show AWS cost breakdown using Cost Explorer API
  - Shows current month costs (total and by service)
  - Shows historical monthly costs
  - Shows ECR storage costs specifically
- Add `waa-image delete` action to clean up registry resources
  - ECR: Deletes repository with --force
  - Docker Hub: Shows manual instructions (free tier)
  - ACR: Deletes repository
- Change default registry from ECR to Docker Hub
  - Docker Hub is free (no storage charges)
  - Use ECR when rate limiting becomes an issue

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
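The per-service breakdown can be sketched as a pure function over a Cost Explorer response. The function name is hypothetical, and the code assumes the response was requested with a `GroupBy` on the `SERVICE` dimension, so each group's first key is a service name; it is not the `aws-costs` implementation.

```python
from collections import defaultdict
from typing import Any, Dict


def costs_by_service(response: Dict[str, Any], metric: str = "UnblendedCost") -> Dict[str, float]:
    """Sum the chosen metric per service across all time periods in the response."""
    totals: Dict[str, float] = defaultdict(float)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]  # first GroupBy key (assumed to be SERVICE)
            totals[service] += float(group["Metrics"][metric]["Amount"])  # Amount is a string
    return dict(totals)
```

A matching request would be `boto3.client("ce").get_cost_and_usage(TimePeriod={"Start": ..., "End": ...}, Granularity="MONTHLY", Metrics=["UnblendedCost"], GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}])`, with the dates supplied by the caller.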
Automatically bumps version and creates tags on PR merge:
- feat: minor version bump
- fix/perf: patch version bump
- docs/style/refactor/test/chore/ci/build: patch version bump

Triggers publish.yml which deploys to PyPI.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
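The bump rules listed in this commit map directly onto a small function. The sketch below assumes the commit type is read from a conventional-commit style PR title and that unrecognized types leave the version unchanged; neither assumption is confirmed by the commit message, and the workflow may use a dedicated release tool instead.

```python
import re

MINOR_TYPES = {"feat"}
PATCH_TYPES = {"fix", "perf", "docs", "style", "refactor", "test", "chore", "ci", "build"}


def bump_version(version: str, pr_title: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a title such as 'feat(scope): ...'."""
    major, minor, patch = (int(part) for part in version.split("."))
    match = re.match(r"^(\w+)(?:\([^)]*\))?:", pr_title)  # type, optional (scope), then colon
    commit_type = match.group(1) if match else None
    if commit_type in MINOR_TYPES:
        return f"{major}.{minor + 1}.0"
    if commit_type in PATCH_TYPES:
        return f"{major}.{minor}.{patch + 1}"
    return version  # assumption: unknown types do not trigger a bump
```

Under these rules, merging this PR (a `feat` title) on top of v0.1.1 would produce v0.2.0.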
Root cause: Azure ML compute instances don't have Docker installed. Our code used SDK V2 command jobs which run in bare Python environment, never calling /entry_setup.sh to start QEMU/Windows.

Fix follows Microsoft's official WAA Azure pattern:
- Add azureml-core dependency (SDK V1)
- Use DockerConfiguration with NET_ADMIN capability for QEMU networking
- Create run_entry.py that calls /entry_setup.sh before running client
- Create compute-instance-startup.sh to stop conflicting services (DNS, nginx)
- Use ScriptRunConfig instead of raw command jobs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
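The ordering that run_entry.py enforces, setup script first and client second, can be sketched as below. This is a guess at the shape of such an entry script, not the file added in this PR: the helper names, the location of `run.py`, and the assumption that `/entry_setup.sh` returns once setup is done are all hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence


def entry_commands(worker_id: int, num_workers: int) -> List[List[str]]:
    """Commands in the order they must run: environment setup, then the WAA client."""
    return [
        ["/entry_setup.sh"],  # script named in the commit message; starts QEMU/Windows
        [sys.executable, "run.py",  # path to run.py is an assumption
         "--worker_id", str(worker_id), "--num_workers", str(num_workers)],
    ]


def run_in_order(commands: Sequence[List[str]]) -> int:
    """Run commands sequentially; stop at the first non-zero exit code and return it."""
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```

An entry script would end with `sys.exit(run_in_order(entry_commands(worker_id, num_workers)))`, so a failed setup surfaces as a failed job rather than as a client running without a VM.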
Summary
Completes the Azure ML parallelization implementation for WAA evaluation, enabling parallel execution across multiple Azure VMs.
Key changes:
1. Agent config serialization (`_serialize_agent_config`):
   - Passes `OPENAI_API_KEY`/`ANTHROPIC_API_KEY` via environment variables
2. Worker command building (`_build_worker_command`):
   - Uses vanilla WAA `run.py` with `--worker_id` and `--num_workers`
3. Result fetching (`_fetch_worker_results`, `_parse_waa_results`):
   - Downloads job outputs via Azure ML SDK (`client.jobs.download()`)
   - Parses WAA `result.txt` files (0.0 or 1.0 score)
4. Job status tracking:
   - Added `job_name` field to `WorkerState` dataclass
   - Updated `_wait_and_collect_results` to poll job status (not compute status)
5. Log fetching (`get_job_logs` in `AzureMLClient`):
   - Downloads logs via `az ml job download`
   - Supports `tail` parameter for last N lines
   - Updated `health_checker` to use new method instead of returning empty string

Design decisions:
- Uses vanilla `windowsarena/winarena:latest` with `VERSION=11e` (no custom Dockerfile)
- `run_azure.py`
- `--worker_id`/`--num_workers` mechanism

Test plan
Generated with Claude Code