From de280ba0b4261048b253f46ef98bfa6d74505179 Mon Sep 17 00:00:00 2001 From: isaacbmiller Date: Sun, 1 Mar 2026 17:31:35 -0500 Subject: [PATCH] feat: improve fly E2E skill and add load test skill fly-e2e-test: - Remove tmux dependency, use direct shell commands - Fix shell redirection in fly ssh (sh -c wrapper) - Fix API key prompt (echo "Y" | pipe) - Add ha = false for single-machine tests - Add health check config (/health/live) fly-load-test (new): - SleepModule-based load testing (zero LLM cost) - Sweep --sync-workers 64/128/200 with hey - Multi-machine autoscaling validation - Production sizing guide template Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com> --- .claude/skills/fly-e2e-test/SKILL.md | 226 +++++++------ .claude/skills/fly-load-test/SKILL.md | 459 ++++++++++++++++++++++++++ 2 files changed, 581 insertions(+), 104 deletions(-) create mode 100644 .claude/skills/fly-load-test/SKILL.md diff --git a/.claude/skills/fly-e2e-test/SKILL.md b/.claude/skills/fly-e2e-test/SKILL.md index f7eade0..5b2d4d9 100644 --- a/.claude/skills/fly-e2e-test/SKILL.md +++ b/.claude/skills/fly-e2e-test/SKILL.md @@ -9,11 +9,12 @@ allowed-tools: Deploy a fresh dspy-cli project to Fly.io using your local code changes, run full integration tests (health, auth, LLM execution), and **guarantee cleanup** regardless of success or failure. -## ⚠️ CRITICAL RULES +## CRITICAL RULES 1. **NEVER commit directly to main** - Always create a side branch first, even for small changes 2. **ALWAYS clean up** - Destroy Fly apps and delete temp branches, even if tests fail 3. **Use temp branches** - Name them `e2e-test/{timestamp}-{random}` for easy identification +4. **Run cleanup in a trap** - Use bash trap or always-run-cleanup pattern ## Prerequisites @@ -24,80 +25,76 @@ Deploy a fresh dspy-cli project to Fly.io using your local code changes, run ful ## Quick Start -Run each phase in a tmux session to enable output capture and cleanup tracking. 
+All commands run directly in the shell (no tmux required). Use environment variables to pass state between steps. ### Phase 1: Setup Environment ```bash -# Create tmux session -tmux new-session -d -s e2e-fly -c /Users/isaac/projects/dspy-cli - -# Set variables -tmux send-keys -t e2e-fly 'export DSPY_CLI_DIR="/Users/isaac/projects/dspy-cli"' C-m -tmux send-keys -t e2e-fly 'export TIMESTAMP=$(date +%s)' C-m -tmux send-keys -t e2e-fly 'export RANDOM_SUFFIX=$(head -c 4 /dev/urandom | xxd -p)' C-m -tmux send-keys -t e2e-fly 'export FLY_APP_NAME="dspy-e2e-${RANDOM_SUFFIX}"' C-m -tmux send-keys -t e2e-fly 'export TEMP_BRANCH="e2e-test/${TIMESTAMP}-${RANDOM_SUFFIX}"' C-m +export DSPY_CLI_DIR="/Users/isaac/projects/dspy-cli" +export TIMESTAMP=$(date +%s) +export RANDOM_SUFFIX=$(head -c 4 /dev/urandom | xxd -p) +export FLY_APP_NAME="dspy-e2e-${RANDOM_SUFFIX}" +export TEMP_BRANCH="e2e-test/${TIMESTAMP}-${RANDOM_SUFFIX}" +export DSPY_API_KEY_VALUE="test-e2e-$(head -c 8 /dev/urandom | xxd -p)" # Source .env for OPENAI_API_KEY -tmux send-keys -t e2e-fly 'set -a && source .env && set +a' C-m +set -a && source "$DSPY_CLI_DIR/.env" && set +a # Verify setup -tmux send-keys -t e2e-fly 'echo "App: $FLY_APP_NAME Branch: $TEMP_BRANCH"' C-m +echo "App: $FLY_APP_NAME Branch: $TEMP_BRANCH" ``` ### Phase 2: Pre-flight Checks ```bash -# Verify fly CLI -tmux send-keys -t e2e-fly 'fly version && fly auth whoami' C-m - -# Check for uncommitted changes (stash if needed) -tmux send-keys -t e2e-fly 'git status --porcelain' C-m +fly version && fly auth whoami +git -C "$DSPY_CLI_DIR" status --porcelain # Clean up any orphaned e2e resources -tmux send-keys -t e2e-fly 'fly apps list 2>/dev/null | grep "dspy-e2e" || echo "No orphaned apps"' C-m +fly apps list 2>/dev/null | grep "dspy-e2e" || echo "No orphaned apps" ``` ### Phase 3: Create and Push Temp Branch ```bash -tmux send-keys -t e2e-fly 'git checkout -b "$TEMP_BRANCH"' C-m -tmux send-keys -t e2e-fly 'git push -u origin "$TEMP_BRANCH"' C-m +git 
-C "$DSPY_CLI_DIR" checkout -b "$TEMP_BRANCH" +git -C "$DSPY_CLI_DIR" push -u origin "$TEMP_BRANCH" ``` ### Phase 4: Create Test Project ```bash -# Create temp directory -tmux send-keys -t e2e-fly 'export TEST_DIR=$(mktemp -d) && echo "TEST_DIR=$TEST_DIR"' C-m - -# Create project (will prompt for API key confirmation - send Y) -tmux send-keys -t e2e-fly 'uv run --directory "$DSPY_CLI_DIR" dspy-cli new fly-e2e-test --program-name qa_module --signature "question:str -> answer:str" --module-type Predict --model openai/gpt-4o-mini' C-m +export TEST_DIR=$(mktemp -d) && echo "TEST_DIR=$TEST_DIR" -# When prompted "Proceed with this API key? [Y/n]:", send: -tmux send-keys -t e2e-fly 'Y' C-m +# Pipe "Y" to accept the API key confirmation prompt +echo "Y" | uv run --directory "$DSPY_CLI_DIR" dspy-cli new fly-e2e-test \ + --program-name qa_module \ + --signature "question:str -> answer:str" \ + --module-type Predict \ + --model openai/gpt-4o-mini -# Move project to temp dir (dspy-cli creates in current dir) -tmux send-keys -t e2e-fly 'mv "$DSPY_CLI_DIR/fly-e2e-test" "$TEST_DIR/" && cd "$TEST_DIR/fly-e2e-test"' C-m +# Move project to temp dir (dspy-cli new creates in current dir) +mv "$DSPY_CLI_DIR/fly-e2e-test" "$TEST_DIR/" +cd "$TEST_DIR/fly-e2e-test" ``` ### Phase 5: Modify for Git-Based dspy-cli ```bash -# Update pyproject.toml to install dspy-cli from temp branch -tmux send-keys -t e2e-fly 'sed -i.bak "s|\"dspy-cli\"|\"dspy-cli @ git+https://github.com/cmpnd-ai/dspy-cli.git@$TEMP_BRANCH\"|" pyproject.toml' C-m +cd "$TEST_DIR/fly-e2e-test" + +# Update pyproject.toml to install dspy-cli from temp branch (use double quotes for variable expansion) +sed -i.bak "s|\"dspy-cli\"|\"dspy-cli @ git+https://github.com/cmpnd-ai/dspy-cli.git@$TEMP_BRANCH\"|" pyproject.toml -# IMPORTANT: Update Dockerfile to include git (required for git-based deps) -# NOTE: This is an example dockerfile. There may be specific changes in a newer version of dspy-cli. 
Check the current Dockerfile and add the Git install line -tmux send-keys -t e2e-fly 'cat > Dockerfile << '"'"'EOF'"'"' +# Update Dockerfile: add git (required for git-based deps) +# NOTE: Check the current Dockerfile.template for the latest CMD format and update accordingly +cat > Dockerfile << 'EOF' FROM python:3.11-slim ENV PYTHONDONTWRITEBYTECODE=1 ENV PYTHONUNBUFFERED=1 ENV XDG_CACHE_HOME=/tmp/.cache -# Install git for fetching dspy-cli from git URL RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/* COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv @@ -108,16 +105,17 @@ RUN uv sync --no-dev EXPOSE 8000 CMD ["uv", "run", "dspy-cli", "serve", "--host", "0.0.0.0", "--port", "8000", "--auth", "--no-reload"] -EOF' C-m +EOF ``` ### Phase 6: Create fly.toml and Deploy ```bash -# Create fly.toml -tmux send-keys -t e2e-fly 'cat > fly.toml << EOF -app = '"'"'$FLY_APP_NAME'"'"' -primary_region = '"'"'ewr'"'"' +cd "$TEST_DIR/fly-e2e-test" + +cat > fly.toml << EOF +app = '$FLY_APP_NAME' +primary_region = 'ewr' [build] @@ -127,97 +125,120 @@ primary_region = '"'"'ewr'"'"' auto_stop_machines = true auto_start_machines = true min_machines_running = 0 - processes = ['"'"'app'"'"'] + processes = ['app'] -[[vm]] - memory = '"'"'512mb'"'"' - cpu_kind = '"'"'shared'"'"' - cpus = 1 -EOF' C-m +[deploy] + ha = false -# Create app -tmux send-keys -t e2e-fly 'fly apps create "$FLY_APP_NAME" --org personal' C-m +[checks] + [checks.health] + port = 8000 + type = "http" + interval = "10s" + timeout = "5s" + grace_period = "30s" + method = "GET" + path = "/health/live" -# Generate a random API key for testing -tmux send-keys -t e2e-fly 'export DSPY_API_KEY_VALUE="test-e2e-$(head -c 8 /dev/urandom | xxd -p)"' C-m +[[vm]] + memory = '512mb' + cpu_kind = 'shared' + cpus = 1 +EOF -# Set secrets using fly secrets (required env vars for your app) -# Add any additional env vars your project needs here -tmux send-keys -t e2e-fly 'fly secrets set 
OPENAI_API_KEY="$OPENAI_API_KEY" DSPY_API_KEY="$DSPY_API_KEY_VALUE" --app "$FLY_APP_NAME"' C-m +# Create app and set secrets +fly apps create "$FLY_APP_NAME" --org personal +fly secrets set OPENAI_API_KEY="$OPENAI_API_KEY" DSPY_API_KEY="$DSPY_API_KEY_VALUE" --app "$FLY_APP_NAME" # Deploy (takes ~2-3 minutes) -tmux send-keys -t e2e-fly 'fly deploy --app "$FLY_APP_NAME" --wait-timeout 300' C-m +fly deploy --app "$FLY_APP_NAME" --wait-timeout 300 ``` ### Phase 7: Run Integration Tests ```bash -tmux send-keys -t e2e-fly 'export FLY_APP_URL="https://$FLY_APP_NAME.fly.dev"' C-m +export FLY_APP_URL="https://$FLY_APP_NAME.fly.dev" + +# Wait for app to be ready (poll /health/ready) +for i in $(seq 1 30); do + STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$FLY_APP_URL/health/ready") + if [ "$STATUS" = "200" ]; then echo "App ready after ${i}s"; break; fi + sleep 1 +done + +# Test 1: Health endpoints (no auth required) +echo "=== Test 1: Liveness ===" && curl -s "$FLY_APP_URL/health/live" +echo "=== Test 2: Readiness ===" && curl -s "$FLY_APP_URL/health/ready" +echo "=== Test 3: Legacy health ===" && curl -s "$FLY_APP_URL/health" + +# Test 4: Auth redirect (unauthenticated) +echo "=== Test 4: Auth Redirect ===" && curl -s -o /dev/null -w "HTTP: %{http_code}\n" "$FLY_APP_URL/programs" + +# Test 5: Auth success (authenticated) +echo "=== Test 5: Auth Success ===" && curl -s -H "Authorization: Bearer $DSPY_API_KEY_VALUE" "$FLY_APP_URL/programs" + +# Test 6: LLM Module Execution +echo "=== Test 6: LLM Execution ===" && curl -s -X POST \ + -H "Authorization: Bearer $DSPY_API_KEY_VALUE" \ + -H "Content-Type: application/json" \ + -d '{"question": "What is 2+2? 
Reply with just the number."}' \ + "$FLY_APP_URL/QaModulePredict" +``` -# Test 1: Health Check -tmux send-keys -t e2e-fly 'echo "=== Test 1: Health Check ===" && curl -s "$FLY_APP_URL/health"' C-m -# Expected: {"status":"ok"} +### Phase 8: SSH Inspection (optional) -# Test 2: Auth Redirect (unauthenticated) -tmux send-keys -t e2e-fly 'echo "=== Test 2: Auth Redirect ===" && curl -s -o /dev/null -w "HTTP: %{http_code}\n" "$FLY_APP_URL/programs"' C-m -# Expected: HTTP: 303 +With `ha = false`, there's only one machine so SSH always targets it. +Shell redirects like `2>/dev/null` don't work in `-C` commands -- wrap in `sh -c`: -# Test 3: Auth Success (authenticated) -tmux send-keys -t e2e-fly 'echo "=== Test 3: Auth Success ===" && curl -s -H "Authorization: Bearer $DSPY_API_KEY_VALUE" "$FLY_APP_URL/programs"' C-m -# Expected: {"programs":[{"name":"QaModulePredict",...}]} +```bash +# Inspect the machine filesystem +fly ssh console --app "$FLY_APP_NAME" -C "sh -c 'find /root -name \"*.log\" 2>/dev/null'" -# Test 4: LLM Module Execution -tmux send-keys -t e2e-fly 'echo "=== Test 4: LLM Execution ===" && curl -s -X POST -H "Authorization: Bearer $DSPY_API_KEY_VALUE" -H "Content-Type: application/json" -d '"'"'{"question": "What is 2+2?"}'"'"' "$FLY_APP_URL/QaModulePredict"' C-m -# Expected: {"answer":"4"} (or similar) +# Check inference logs +fly ssh console --app "$FLY_APP_NAME" -C "cat /logs/QaModulePredict.log" ``` -### Phase 8: Guaranteed Cleanup +### Phase 9: Guaranteed Cleanup **ALWAYS run cleanup, even if tests fail:** ```bash # Destroy Fly app -tmux send-keys -t e2e-fly 'fly apps destroy "$FLY_APP_NAME" --yes' C-m +fly apps destroy "$FLY_APP_NAME" --yes # Delete remote branch -tmux send-keys -t e2e-fly 'git -C "$DSPY_CLI_DIR" push origin --delete "$TEMP_BRANCH"' C-m +git -C "$DSPY_CLI_DIR" push origin --delete "$TEMP_BRANCH" # Return to main and delete local branch -tmux send-keys -t e2e-fly 'git -C "$DSPY_CLI_DIR" checkout main' C-m -tmux send-keys -t e2e-fly 
'git -C "$DSPY_CLI_DIR" branch -D "$TEMP_BRANCH"' C-m +git -C "$DSPY_CLI_DIR" checkout main +git -C "$DSPY_CLI_DIR" branch -D "$TEMP_BRANCH" # Remove temp directory -tmux send-keys -t e2e-fly 'rm -rf "$TEST_DIR"' C-m - -# Kill tmux session -tmux kill-session -t e2e-fly +rm -rf "$TEST_DIR" ``` ## Verification Checklist | Test | Expected Result | |------|-----------------| -| Health Check | `{"status":"ok"}` | +| `/health/live` (no auth) | `{"status":"alive"}` | +| `/health/ready` (no auth) | `{"status":"ready","programs":1}` | +| `/health` (no auth) | `{"status":"ok"}` | | Auth Redirect (no auth) | HTTP 303 | | Auth Success (Bearer token) | JSON with `QaModulePredict` | | LLM Execution | JSON with `"answer"` field | ## Cleanup Verification -After running cleanup, verify: - ```bash -# No orphaned Fly apps -fly apps list | grep "dspy-e2e" || echo "Clean" - -# No orphaned branches -git branch -r | grep "e2e-test/" || echo "Clean" +fly apps list | grep "dspy-e2e" || echo "No orphaned apps" +git branch -r | grep "e2e-test/" || echo "No orphaned branches" ``` ## Troubleshooting ### Deploy fails with "Git executable not found" -The Dockerfile must include git installation. Ensure the Dockerfile has: +The Dockerfile must include git installation: ```dockerfile RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/* ``` @@ -230,13 +251,19 @@ sed -i.bak 's|...|...|' pyproject.toml # Won't expand $TEMP_BRANCH ``` ### Project created in wrong directory -`dspy-cli new` creates projects relative to the current working directory, not where it's run from. Move the project after creation: +`dspy-cli new` creates projects relative to the current working directory. 
Move the project after creation: ```bash mv "$DSPY_CLI_DIR/fly-e2e-test" "$TEST_DIR/" ``` +### Shell redirects fail in fly ssh -C +Wrap the remote command in `sh -c`: +```bash +fly ssh console --app "$FLY_APP_NAME" -C "sh -c 'find / -name \"*dspy*\" 2>/dev/null'" +``` + ### Cleanup fails -If any cleanup step fails, run them individually: +Run each step individually: ```bash fly apps destroy "dspy-e2e-XXXX" --yes git push origin --delete "e2e-test/XXXX" @@ -245,27 +272,18 @@ git branch -D "e2e-test/XXXX" ``` ### App crashes due to missing environment variables -Use `fly secrets` to set any required env vars. Check the app logs to see which vars are missing: ```bash -# View logs to find missing env vars fly logs --app "$FLY_APP_NAME" --no-tail - -# Set additional secrets as needed -fly secrets set VAR_NAME="value" ANOTHER_VAR="value" --app "$FLY_APP_NAME" - -# List current secrets +fly secrets set VAR_NAME="value" --app "$FLY_APP_NAME" fly secrets list --app "$FLY_APP_NAME" ``` -Common env vars that might be needed: +Common env vars: - `OPENAI_API_KEY` - Required for OpenAI models - `DSPY_API_KEY` - Required when `--auth` is enabled -- Project-specific vars (check your gateway's `setup()` method) - -## Multi-Layer Cleanup Protection -1. **Unique naming**: `dspy-e2e-{random}` prevents conflicts -2. **Pre-test orphan cleanup**: Removes stale resources before starting -3. **tmux session**: Enables output capture and manual recovery -4. **Explicit cleanup phase**: Always runs after tests -5. **Verification commands**: Confirm cleanup succeeded +### Per-machine cache fragmentation (multi-machine deployments) +The `ha = false` setting in fly.toml keeps E2E tests on a single machine, +avoiding this issue. For production deployments with multiple machines, the +LM response cache (`.dspy_cache`) is local to each VM, so requests hitting +different machines may miss the cache. 
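Rule 4 in CRITICAL RULES asks for cleanup to run in a trap, but the phases above only list the cleanup commands. A minimal sketch of that pattern follows, assuming all phases run inside one script (an `EXIT` trap fires when that shell exits, not after each separately launched command). The function name `cleanup` and the warning messages are illustrative, not part of dspy-cli or flyctl; the commands are the ones from the cleanup phase.

```shell
#!/usr/bin/env bash
# Sketch: register cleanup once, right after the Phase 1 variables are set.
# Every step is guarded by its variable and allowed to fail, so one failure
# does not block the remaining steps.
cleanup() {
  if [ -n "${FLY_APP_NAME:-}" ]; then
    fly apps destroy "$FLY_APP_NAME" --yes \
      || echo "WARN: could not destroy $FLY_APP_NAME" >&2
  fi
  if [ -n "${TEMP_BRANCH:-}" ] && [ -n "${DSPY_CLI_DIR:-}" ]; then
    git -C "$DSPY_CLI_DIR" push origin --delete "$TEMP_BRANCH" \
      || echo "WARN: could not delete remote branch $TEMP_BRANCH" >&2
    git -C "$DSPY_CLI_DIR" checkout main \
      || echo "WARN: could not check out main" >&2
    git -C "$DSPY_CLI_DIR" branch -D "$TEMP_BRANCH" \
      || echo "WARN: could not delete local branch $TEMP_BRANCH" >&2
  fi
  # The -n guard keeps rm from ever running against an empty path.
  if [ -n "${TEST_DIR:-}" ]; then
    rm -rf "$TEST_DIR"
  fi
}

# Runs on normal exit and on failure (for example under `set -e`).
trap cleanup EXIT
```

When the phases are run as separate commands instead of one script, the trap does not carry over between them, so the explicit cleanup phase and the Cleanup Verification commands still apply.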
diff --git a/.claude/skills/fly-load-test/SKILL.md b/.claude/skills/fly-load-test/SKILL.md
new file mode 100644
index 0000000..934c1f8
--- /dev/null
+++ b/.claude/skills/fly-load-test/SKILL.md
@@ -0,0 +1,459 @@
+---
+name: fly-load-test
+description: Load test dspy-cli on Fly.io with synthetic delay module (zero LLM cost). Finds per-machine concurrency ceiling, tests autoscaling, produces sizing guide. (project)
+allowed-tools:
+  - Bash
+---
+
+# Fly.io Load Test Skill
+
+Deploy dspy-cli with a **SleepModule** (zero LLM cost) to Fly.io, use `hey` to find the per-machine concurrency ceiling, test multi-machine autoscaling, and produce a production sizing guide.
+
+## CRITICAL RULES
+
+1. **NEVER commit directly to main** - Always create a side branch
+2. **ALWAYS clean up** - Destroy Fly apps and delete temp branches, even if tests fail
+3. **No real LLM calls** - The SleepModule simulates latency with `time.sleep()`
+4. **Record all results** - Print `hey` output and memory stats for every phase
+
+## Prerequisites
+
+1. **fly CLI**: Installed and authenticated (`fly auth whoami`)
+2. **hey**: Load testing tool (`brew install hey`)
+3. **Git**: Clean working directory (stash uncommitted changes first)
+4. **Git push access**: Ability to push to origin
+
+## Quick Start
+
+### Phase 1: Setup Environment
+
+```bash
+export DSPY_CLI_DIR="/Users/isaac/projects/dspy-cli"
+export TIMESTAMP=$(date +%s)
+export RANDOM_SUFFIX=$(head -c 4 /dev/urandom | xxd -p)
+export FLY_APP_NAME="dspy-load-${RANDOM_SUFFIX}"
+export TEMP_BRANCH="load-test/${TIMESTAMP}-${RANDOM_SUFFIX}"
+export DSPY_API_KEY_VALUE="load-test-$(head -c 8 /dev/urandom | xxd -p)"
+
+echo "App: $FLY_APP_NAME Branch: $TEMP_BRANCH"
+```
+
+### Phase 2: Pre-flight Checks
+
+```bash
+fly version && fly auth whoami
+which hey || echo "INSTALL hey: brew install hey"
+git -C "$DSPY_CLI_DIR" status --porcelain
+
+# List orphaned load test apps from earlier runs (destroy any that appear)
+fly apps list 2>/dev/null | grep "dspy-load" || echo "No orphaned apps"
+```
+
+### Phase 3: Create and Push Temp Branch
+
+```bash
+git -C "$DSPY_CLI_DIR" checkout -b "$TEMP_BRANCH"
+git -C "$DSPY_CLI_DIR" push -u origin "$TEMP_BRANCH"
+```
+
+### Phase 4: Create Test Project with SleepModule
+
+```bash
+export TEST_DIR=$(mktemp -d) && echo "TEST_DIR=$TEST_DIR"
+
+# Create project (pipe "Y" to accept the API key prompt)
+echo "Y" | uv run --directory "$DSPY_CLI_DIR" dspy-cli new load-test-app \
+  --program-name sleep_module \
+  --signature "delay_seconds:float -> result:str" \
+  --module-type Predict \
+  --model openai/gpt-4o-mini
+
+mv "$DSPY_CLI_DIR/load-test-app" "$TEST_DIR/"
+cd "$TEST_DIR/load-test-app"
+```
+
+Now replace the generated module with SleepModule (which never calls an LM):
+
+```bash
+cd "$TEST_DIR/load-test-app"
+
+# Find the generated module file and replace it
+MODULE_FILE=$(find src/*/modules/ -name "*.py" ! -name "__init__.py" | head -1)
+echo "Replacing module: $MODULE_FILE"
+
+cat > "$MODULE_FILE" << 'PYEOF'
+import time
+import dspy
+
+
+class SleepModule(dspy.Module):
+    """Synthetic delay module for load testing. Never calls an LLM."""
+
+    def forward(self, delay_seconds: float = 1.0) -> str:
+        time.sleep(delay_seconds)
+        return f"slept {delay_seconds}s"
+PYEOF
+```
+
+### Phase 5: Modify for Git-Based dspy-cli
+
+```bash
+cd "$TEST_DIR/load-test-app"
+
+# Install dspy-cli from temp branch
+sed -i.bak "s|\"dspy-cli\"|\"dspy-cli @ git+https://github.com/cmpnd-ai/dspy-cli.git@$TEMP_BRANCH\"|" pyproject.toml
+
+# Custom Dockerfile with git support
+cat > Dockerfile << 'EOF'
+FROM python:3.11-slim
+
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+ENV XDG_CACHE_HOME=/tmp/.cache
+
+RUN apt-get update && apt-get install -y git && rm -rf /var/lib/apt/lists/*
+
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
+
+COPY . .
+RUN uv sync --no-dev
+
+EXPOSE 8000
+
+CMD ["uv", "run", "dspy-cli", "serve", "--host", "0.0.0.0", "--port", "8000", "--auth", "--no-reload", "--sync-workers", "64"]
+EOF
+```
+
+**IMPORTANT**: The `--sync-workers` value in the CMD changes per phase:
+- Phase A: `--sync-workers 64`
+- Phase B: `--sync-workers 128`
+- Phase C: `--sync-workers 256`
+
+To change it between phases, edit the Dockerfile CMD (exec form, so the value is its own quoted array element) and redeploy:
+```bash
+sed -i.bak 's/"--sync-workers", "[0-9]*"/"--sync-workers", "128"/' Dockerfile
+fly deploy --app "$FLY_APP_NAME" --wait-timeout 300
+```
+
+### Phase 6: Create fly.toml and Deploy
+
+```bash
+cd "$TEST_DIR/load-test-app"
+
+cat > fly.toml << EOF
+app = '$FLY_APP_NAME'
+primary_region = 'ewr'
+
+[build]
+
+[deploy]
+  ha = false
+
+[http_service]
+  internal_port = 8000
+  force_https = true
+  auto_stop_machines = 'stop'
+  auto_start_machines = true
+  min_machines_running = 1
+  processes = ['app']
+
+  [http_service.concurrency]
+    type = 'requests'
+    soft_limit = 100
+    hard_limit = 128
+
+[checks]
+  [checks.health]
+    port = 8000
+    type = "http"
+    interval = "10s"
+    timeout = "5s"
+    grace_period = "30s"
+    method = "GET"
+    path = "/health/live"
+
+[[vm]]
+  memory = '1gb'
+  cpu_kind = 'shared'
+  cpus = 2
+EOF
+
+fly apps create "$FLY_APP_NAME" --org personal
+
+# Dummy OpenAI key -- SleepModule never calls an LM, but dspy.LM() init needs it
+fly secrets set OPENAI_API_KEY="sk-dummy-not-used" DSPY_API_KEY="$DSPY_API_KEY_VALUE" --app "$FLY_APP_NAME"
+
+fly deploy --app "$FLY_APP_NAME" --wait-timeout 300
+```
+
+### Phase 7: Wait for Ready
+
+```bash
+export FLY_APP_URL="https://$FLY_APP_NAME.fly.dev"
+
+for i in $(seq 1 60); do
+  STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$FLY_APP_URL/health/ready")
+  if [ "$STATUS" = "200" ]; then echo "App ready after ${i} attempts"; break; fi
+  echo "Waiting... ($STATUS)"
+  sleep 2
+done
+
+# Confirm SleepModule is discovered
+curl -s -H "Authorization: Bearer $DSPY_API_KEY_VALUE" "$FLY_APP_URL/programs"
+```
+
+### Phase 8: Single-Machine Load Tests
+
+Run `hey` sweeps at increasing concurrency. Each run sends at least 200 requests with a 1s sleep delay. Because each request holds a worker for 1s, the theoretical max throughput is `min(concurrency, sync_workers)` rps.
+
+**Phase A: --sync-workers 64**
+
+```bash
+export URL="$FLY_APP_URL/SleepModule"
+export AUTH="Authorization: Bearer $DSPY_API_KEY_VALUE"
+export BODY='{"delay_seconds": 1.0}'
+
+echo "=== Phase A: 64 workers, c=10 ==="
+hey -n 200 -c 10 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+echo "=== Phase A: 64 workers, c=32 ==="
+hey -n 200 -c 32 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+echo "=== Phase A: 64 workers, c=64 ==="
+hey -n 200 -c 64 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+echo "=== Phase A: 64 workers, c=100 ==="
+hey -n 200 -c 100 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+echo "=== Phase A: 64 workers, c=128 ==="
+hey -n 200 -c 128 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+# Check memory after heavy load (no pipe: shell syntax is not interpreted in -C)
+fly ssh console --app "$FLY_APP_NAME" -C "head -5 /proc/meminfo"
+```
+
+**Phase B: --sync-workers 128** (redeploy first)
+
+```bash
+cd "$TEST_DIR/load-test-app"
+sed -i.bak 's/"--sync-workers", "[0-9]*"/"--sync-workers", "128"/' Dockerfile
+fly deploy --app "$FLY_APP_NAME" --wait-timeout 300
+
+# Wait for ready
+for i in $(seq 1 60); do
+  STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$FLY_APP_URL/health/ready")
+  if [ "$STATUS" = "200" ]; then echo "Ready after ${i} attempts"; break; fi
+  sleep 2
+done
+
+echo "=== Phase B: 128 workers, c=64 ==="
+hey -n 200 -c 64 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+echo "=== Phase B: 128 workers, c=128 ==="
+hey -n 200 -c 128 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+echo "=== Phase B: 128 workers, c=200 ==="
+hey -n 300 -c 200 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+echo "=== Phase B: 128 workers, c=256 ==="
+hey -n 300 -c 256 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+fly ssh console --app "$FLY_APP_NAME" -C "head -5 /proc/meminfo"
+```
+
+**Phase C: --sync-workers 256** (redeploy, only if Phase B didn't OOM; `hard_limit = 128` in fly.toml likely caps per-machine concurrency below 256 unless you raise it first)
+
+```bash
+cd "$TEST_DIR/load-test-app"
+sed -i.bak 's/"--sync-workers", "[0-9]*"/"--sync-workers", "256"/' Dockerfile
+fly deploy --app "$FLY_APP_NAME" --wait-timeout 300
+
+for i in $(seq 1 60); do
+  STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$FLY_APP_URL/health/ready")
+  if [ "$STATUS" = "200" ]; then echo "Ready after ${i} attempts"; break; fi
+  sleep 2
+done
+
+echo "=== Phase C: 256 workers, c=128 ==="
+hey -n 200 -c 128 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+echo "=== Phase C: 256 workers, c=256 ==="
+hey -n 300 -c 256 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+echo "=== Phase C: 256 workers, c=300 ==="
+hey -n 300 -c 300 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+fly ssh console --app "$FLY_APP_NAME" -C "head -5 /proc/meminfo"
+```
+
+### Phase 9: Multi-Machine Autoscaling Test
+
+Use the best `--sync-workers` from Phases A-C. Remove `ha = false` and scale to 3 machines:
+
+```bash
+cd "$TEST_DIR/load-test-app"
+
+# Update fly.toml: remove ha = false
+sed -i.bak '/ha = false/d' fly.toml
+
+# Set concurrency limits based on findings (adjust these!)
+# soft_limit = ~80% of sync_workers, hard_limit = sync_workers
+# Example for 128 workers:
+sed -i.bak 's/soft_limit = [0-9]*/soft_limit = 100/' fly.toml
+sed -i.bak 's/hard_limit = [0-9]*/hard_limit = 128/' fly.toml
+
+fly deploy --app "$FLY_APP_NAME" --wait-timeout 300
+
+# Scale to 3 machines
+fly scale count 3 --app "$FLY_APP_NAME"
+
+# Wait for all machines to be ready
+sleep 30
+fly machines list --app "$FLY_APP_NAME"
+
+for i in $(seq 1 60); do
+  STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$FLY_APP_URL/health/ready")
+  if [ "$STATUS" = "200" ]; then echo "Ready after ${i} attempts"; break; fi
+  sleep 2
+done
+```
+
+Now blast at concurrency levels that should trigger multi-machine distribution:
+
+```bash
+# Should spread across machines (3 x 128 = 384 slots)
+echo "=== Autoscale: c=100 (fits in 1 machine) ==="
+hey -n 300 -c 100 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+echo "=== Autoscale: c=200 (needs 2 machines) ==="
+hey -n 400 -c 200 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+echo "=== Autoscale: c=300 (needs 3 machines) ==="
+hey -n 600 -c 300 -t 30 -m POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+# Check machine status (which ones are started/stopped)
+fly machines list --app "$FLY_APP_NAME"
+```
+
+**Test auto-stop/auto-start:**
+
+```bash
+# Wait for idle machines to stop (~5 min)
+echo "Waiting 5 minutes for auto-stop..."
+sleep 300
+fly machines list --app "$FLY_APP_NAME"
+
+# Hit the endpoint -- should auto-start a machine
+echo "=== Cold start test ==="
+time curl -s -X POST -H "$AUTH" -H "Content-Type: application/json" -d "$BODY" "$URL"
+
+# Check: how long did the cold start take?
+fly machines list --app "$FLY_APP_NAME"
+```
+
+### Phase 10: Guaranteed Cleanup
+
+**ALWAYS run cleanup, even if tests fail:**
+
+```bash
+fly apps destroy "$FLY_APP_NAME" --yes
+
+git -C "$DSPY_CLI_DIR" push origin --delete "$TEMP_BRANCH"
+git -C "$DSPY_CLI_DIR" checkout main
+git -C "$DSPY_CLI_DIR" branch -D "$TEMP_BRANCH"
+
+rm -rf "$TEST_DIR"
+```
+
+## Interpreting Results
+
+### hey Output Key Metrics
+
+```
+Summary:
+  Total:        X.XXX secs   ← wall clock time
+  Requests/sec: XX.XX        ← throughput (target: min(concurrency, sync_workers))
+
+Latency distribution:
+  50% in X.XXX secs   ← should be ~1s (the sleep duration) when not queuing
+  95% in X.XXX secs   ← spikes here = queuing
+  99% in X.XXX secs   ← worst case
+
+Status code distribution:
+  [200] XXX responses   ← success
+  [503] XXX responses   ← server overloaded (hit hard_limit or OOM)
+```
+
+### What Good Looks Like
+
+| Concurrency | Expected RPS (128 workers) | Expected p50 | Sign of trouble |
+|-------------|---------------------------|-------------|-----------------|
+| c < workers | ~c rps | ~1.0s | - |
+| c = workers | ~workers rps | ~1.0s | Perfect saturation |
+| c = 1.5x workers | ~workers rps | ~1.5s | Queuing (expected) |
+| c = 2x workers | ~workers rps | ~2.0s | Deep queue |
+| Any | < expected | > 3s | OOM, CPU thrash, or errors |
+
+### Memory Check
+
+```bash
+fly ssh console --app "$FLY_APP_NAME" -C "head -5 /proc/meminfo"
+```
+
+If `MemAvailable` drops below ~100MB under load, you've found the memory wall. Reduce `--sync-workers` or increase VM memory.
+
+## Production Sizing Guide
+
+*Fill in after running tests. Template:*
+
+| Target Concurrent | VM | `--sync-workers` | `soft_limit` | `hard_limit` | Machines |
+|-------------------|-----|-----------------|-------------|-------------|----------|
+| 50 | shared-cpu-2x 1gb | ? | ? | ? | 1 |
+| 100 | shared-cpu-2x 1gb | ? | ? | ? | 1 |
+| 200 | shared-cpu-2x 1gb | ? | ? | ? | 2 |
+| 500 | shared-cpu-2x 1gb | ? | ? | ? | 4-5 |
+
+**Rules:**
+- `hard_limit = sync_workers` (the thread pool ceiling; no more concurrent work is possible)
+- `soft_limit = ~80% of sync_workers` (leaves Fly a few seconds of headroom to start another machine before the hard limit is hit)
+- Machines = `ceil(target_concurrent / hard_limit)`
+
+## Cleanup Verification
+
+```bash
+fly apps list | grep "dspy-load" || echo "No orphaned apps"
+git branch -r | grep "load-test/" || echo "No orphaned branches"
+```
+
+## Troubleshooting
+
+### SleepModule not discovered
+Check that the module file is in `src/*/modules/` and the class inherits from `dspy.Module`. Verify with:
+```bash
+fly ssh console --app "$FLY_APP_NAME" -C "sh -c 'find /src -name \"*.py\" | head -20'"
+```
+
+### "No module named dspy" during build
+The `uv sync --no-dev` step in the Dockerfile should install dspy via the dspy-cli dependency. Check that `pyproject.toml` contains the correct git URL.
+
+### OOM kills during load test
+Reduce `--sync-workers`, or increase VM memory. Check:
+```bash
+fly logs --app "$FLY_APP_NAME" --no-tail | grep -i "oom\|kill\|memory"
+```
+
+### hey: "socket: too many open files"
+On macOS, increase ulimit before running hey:
+```bash
+ulimit -n 10240
+```
+
+### Autoscaling doesn't trigger
+Verify concurrency limits in fly.toml match what's deployed:
+```bash
+fly config show --app "$FLY_APP_NAME" | grep -A5 concurrency
+```
+Fly only wakes stopped machines when `soft_limit` is exceeded. If all machines are already running, no new ones start (Fly doesn't create machines, only starts/stops existing ones).
+
+### Machines don't auto-stop
+`auto_stop_machines = 'stop'` only stops machines with zero connections. If hey keeps connections alive, wait for them to close. The default idle timeout is roughly 5 minutes.
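
The three rules under the Production Sizing Guide can be sketched as plain shell arithmetic for filling in the template. This is only an illustration: `size_for` is a hypothetical helper name, and the 128-worker input is the example value used in Phase 9, not a measured result.

```shell
# size_for TARGET_CONCURRENT SYNC_WORKERS
# Applies the sizing rules: hard_limit = sync_workers,
# soft_limit = ~80% of sync_workers, machines = ceil(target / hard_limit).
size_for() {
  target=$1
  workers=$2
  hard_limit=$workers
  soft_limit=$(( workers * 80 / 100 ))                     # integer division rounds down
  machines=$(( (target + hard_limit - 1) / hard_limit ))   # ceiling division
  echo "sync_workers=$workers soft_limit=$soft_limit hard_limit=$hard_limit machines=$machines"
}

size_for 500 128
# prints: sync_workers=128 soft_limit=102 hard_limit=128 machines=4
```

The result is a lower bound on machine count; the template's "4-5" for 500 concurrent requests leaves one machine of slack on top of it.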