Merged
136 changes: 126 additions & 10 deletions .claude/skills/mobius-run/SKILL.md
@@ -2,37 +2,153 @@
name: mobius-run
description: Use when the user says "compete", "mobius run", or wants to pit agents against each other on a task.
user-invocable: true
argument-hint: <task description>
argument-hint: <task description> [--free] [--api]
---

# Mobius Competition Runner

You are the orchestrator for Mobius, an adversarial agent swarm system. The user wants to run a competition.

## What to do
## Determine mode

- **`--free`** (DEFAULT): Run the competition entirely within Claude Code using subagents. Zero API cost. You generate challenger personas on the fly, spawn them as haiku subagents, collect outputs, and judge them yourself.
- **`--api`**: Run via the CLI with real API calls (cross-family diversity, costs money).

If neither flag is given, default to `--free` mode.

---

## MODE: --free (Subagent Competition)

This is the exciting part. You ARE the competition engine — no API calls needed.

### Step 1: Initialize

1. Check that Mobius is initialized:
```bash
python -m mobius.cli stats
```

2. If not initialized: `python -m mobius.cli init`

### Step 2: Choose agents

You have three options. Use whichever fits:

**Option A — Use existing agents from the registry:**
```bash
python .claude/skills/mobius-run/scripts/create_match.py "<TASK>" --count 6
```
This returns JSON with agent details including their system_prompts. Use these prompts for the subagents.
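Consuming that JSON might look like the following sketch (the sample payload is illustrative; the field names match create_match.py's output below):

```python
import json

# Illustrative sample of what create_match.py prints to stdout.
match_json = """
{"match_id": "m1", "task": "demo task",
 "agents": [{"id": "a1", "name": "Edge-Case Hunter", "slug": "edge-case-hunter",
             "system_prompt": "Hunt edge cases relentlessly.",
             "specializations": ["testing"], "elo_rating": 1000.0}]}
"""

match = json.loads(match_json)
for agent in match["agents"]:
    # Each agent's system_prompt seeds one subagent in Step 3.
    print(agent["slug"], agent["system_prompt"])
```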

**Option B — Generate fresh challengers on the fly (PREFERRED for interesting results):**
Analyze the task and design 4-8 complementary approaches that attack it from deliberately different angles. Think about what dimensions of variation would produce genuinely diverse solutions — not just "creative vs analytical" but specific strategic differences relevant to THIS task.

For each challenger, create a short but specific system prompt (3-5 sentences) that defines their approach. Then register them:
```bash
python .claude/skills/mobius-seed/scripts/create_agent.py '{"name":"...", "slug":"...", "description":"...", "system_prompt":"...", "specializations":[...], "provider":"anthropic", "model":"claude-haiku-4-5-20251001"}'
```

Then create the match:
```bash
python .claude/skills/mobius-run/scripts/create_match.py "<TASK>" --agents slug1,slug2,slug3,...
```

**Option C — Mix both:** Pull veterans from the registry AND generate fresh challengers. Pit them against each other.

### Step 3: Spawn subagents

For each agent from the match JSON, spawn a haiku subagent using the Agent tool:
- Set `model: "haiku"` on each agent
- Pass the agent's system_prompt as context plus the competition task
- Use `subagent_type: "general-purpose"`
- **IMPORTANT: Launch ALL subagents in a SINGLE message** so they run in parallel
- Each subagent prompt should be structured as:

```
You are competing in a Mobius adversarial swarm competition.

YOUR IDENTITY AND APPROACH:
<the agent's system_prompt from the registry>

YOUR TASK:
<the competition task>

Produce your best solution. Be thorough but focused. Output ONLY your solution.
```

If you have more than 6 agents, batch them: spawn the first 6, wait for results, then spawn the next batch.
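Assembling the prompt template above is straightforward string formatting; a minimal sketch (the helper name is illustrative, not part of Mobius):

```python
def build_challenger_prompt(system_prompt: str, task: str) -> str:
    """Fill the competition prompt template for one subagent."""
    return (
        "You are competing in a Mobius adversarial swarm competition.\n\n"
        "YOUR IDENTITY AND APPROACH:\n"
        f"{system_prompt}\n\n"
        "YOUR TASK:\n"
        f"{task}\n\n"
        "Produce your best solution. Be thorough but focused. "
        "Output ONLY your solution."
    )
```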

### Step 4: Record outputs

After each subagent returns, pipe its output to the match record. A quoted heredoc is safer than `echo` for multi-line output with quotes or backticks:
```bash
python .claude/skills/mobius-run/scripts/record_outputs.py <match_id> <agent_id> <<'EOF'
<agent_output>
EOF
```

You can record outputs incrementally as agents finish — each call merges into the existing record. Or record all at once with `--bulk`:
```bash
echo '<outputs_json>' | python .claude/skills/mobius-run/scripts/record_outputs.py <match_id> --bulk
```

### Step 5: Judge

You ARE the judge. Score each output on:
- **Correctness** (0-10): Does it solve the task accurately?
- **Quality** (0-10): Is it well-structured, readable, best practices?
- **Completeness** (0-10): Does it fully address all aspects?

Be ruthless and fair. Don't let positional bias affect you — judge purely on merit.
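Aggregating the three criteria into the totals that record_verdict.py expects can be sketched like this (the helper is illustrative; Mobius does not ship it):

```python
def pick_winner(scores: dict[str, dict[str, float]]) -> tuple[str, dict[str, float]]:
    """Sum correctness/quality/completeness (each 0-10) per agent.

    Returns (winning agent_id, per-agent totals); the totals dict is the
    JSON scores argument passed to record_verdict.py.
    """
    totals = {
        agent_id: s["correctness"] + s["quality"] + s["completeness"]
        for agent_id, s in scores.items()
    }
    winner = max(totals, key=totals.get)
    return winner, totals
```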

### Step 6: Record verdict

Record the winner, per-agent scores, and your reasoning:
```bash
python .claude/skills/mobius-judge/scripts/record_verdict.py \
--match <match_id> \
<winner_agent_id> \
'{"agent_id_1": 28.5, "agent_id_2": 22.0, ...}' \
"Your detailed reasoning"
```

Use the match_id from Step 2 to ensure the verdict is recorded against the correct match.

### Step 7: Show results

```bash
python -m mobius.cli leaderboard
```

Present: the winner, your reasoning, Elo changes, and the winning solution.

---

## MODE: --api (CLI Competition)

Traditional mode using real API calls.

1. Check initialization:
```bash
python -m mobius.cli stats
```

2. If no agents exist, suggest `/mobius-seed` first.

3. Run the competition:
```bash
python -m mobius.cli run "<TASK>"
```

4. Show results:
```bash
python -m mobius.cli explain
```

5. Present the winning output and judge reasoning to the user.

---

## Tips

- If the user didn't provide a task argument, ask them what they want the agents to compete on.
- For `--free` mode, you can scale to 12+ agents easily — haiku is fast and cheap (free on Pro)
- Generate challengers that are *orthogonal*, not just variations. Each should have a genuinely different strategy.
- If an existing champion agent loses to a fresh challenger, that's interesting — note it for the user
- The `--free` mode integrates with the same Elo system as `--api` — results are comparable
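For reference, a standard Elo update looks like the sketch below. The K-factor of 32 is an assumption for illustration; Mobius's rating code may use different constants:

```python
def elo_update(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Standard Elo: expected score from the rating gap, then a K-scaled shift."""
    expected_winner = 1.0 / (1.0 + 10 ** ((loser - winner) / 400.0))
    delta = k * (1.0 - expected_winner)
    return winner + delta, loser - delta
```

Evenly matched agents (equal ratings) trade exactly K/2 points; an upset by a low-rated challenger moves more.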
114 changes: 114 additions & 0 deletions .claude/skills/mobius-run/scripts/create_match.py
@@ -0,0 +1,114 @@
"""Create a match record for a free (subagent-based) competition.

Usage:
python create_match.py "<task>" [--agents <slug1,slug2,...>] [--count N]

Modes:
--agents slug1,slug2 Use specific agents from registry by slug
--count N Pick top N agents by Elo (default: 5)

Outputs JSON with match_id and agent details for the skill to orchestrate.
"""

import json
import sys

sys.path.insert(0, "src")

from mobius.config import get_config
from mobius.db import init_db
from mobius.models import MatchRecord
from mobius.registry import Registry


def main():
    args = sys.argv[1:]
    if not args:
        print("Usage: python create_match.py '<task>' [--agents s1,s2] [--count N]")
        sys.exit(1)

    task = args[0]
    slugs = None
    count = 5

    i = 1
    while i < len(args):
        if args[i] == "--agents" and i + 1 < len(args):
            slugs = [s.strip() for s in args[i + 1].split(",")]
            i += 2
        elif args[i] == "--count" and i + 1 < len(args):
            count = int(args[i + 1])
            i += 2
        else:
            i += 1

    config = get_config()
    conn, _ = init_db(config)
    registry = Registry(conn, config)

    # Select agents
    agents = []
    if slugs:
        for slug in slugs:
            agent = registry.get_agent_by_slug(slug)
            if agent:
                agents.append(agent)
            else:
                print(f"Warning: agent '{slug}' not found, skipping", file=sys.stderr)
    else:
        all_agents = registry.list_agents()
        all_agents.sort(key=lambda a: a.elo_rating, reverse=True)
        agents = all_agents[:count]

    if len(agents) < 2:
        print(json.dumps({"error": "Need at least 2 agents", "agent_count": len(agents)}))
        sys.exit(1)

    # Create match record (outputs empty — skill will fill them)
    match = MatchRecord(
        task_description=task,
        competitor_ids=[a.id for a in agents],
    )

    conn.execute(
        """INSERT INTO matches (id, task_description, competitor_ids, outputs, judge_models,
           judge_reasoning, winner_id, scores, voided, created_at)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
        (
            match.id,
            match.task_description,
            json.dumps(match.competitor_ids),
            json.dumps({}),
            json.dumps([]),
            "",
            None,
            json.dumps({}),
            0,
            match.created_at.isoformat(),
        ),
    )
    conn.commit()

    # Output agent details for the skill
    result = {
        "match_id": match.id,
        "task": task,
        "agents": [
            {
                "id": a.id,
                "name": a.name,
                "slug": a.slug,
                "system_prompt": a.system_prompt,
                "specializations": a.specializations,
                "elo_rating": a.elo_rating,
            }
            for a in agents
        ],
    }

    print(json.dumps(result, indent=2))
    conn.close()


if __name__ == "__main__":
    main()
61 changes: 61 additions & 0 deletions .claude/skills/mobius-run/scripts/record_outputs.py
@@ -0,0 +1,61 @@
"""Record agent outputs for a free competition match.

Usage:
echo "output text" | python record_outputs.py <match_id> <agent_id>
echo '{"id1": "out1", "id2": "out2"}' | python record_outputs.py <match_id> --bulk

Reads output from stdin to avoid shell escaping issues.
"""

import json
import sys

sys.path.insert(0, "src")

from mobius.config import get_config
from mobius.db import init_db


def main():
    if len(sys.argv) < 3:
        print("Usage:", file=sys.stderr)
        print("  echo 'output' | python record_outputs.py <match_id> <agent_id>", file=sys.stderr)
        print("  echo '{...}' | python record_outputs.py <match_id> --bulk", file=sys.stderr)
        sys.exit(1)

    match_id = sys.argv[1]
    mode = sys.argv[2]
    sys.stdin.reconfigure(encoding="utf-8", errors="replace")
    stdin_data = sys.stdin.read()

    config = get_config()
    conn, _ = init_db(config)

    row = conn.execute(
        "SELECT id, outputs FROM matches WHERE id LIKE ?", (f"{match_id}%",)
    ).fetchone()
    if not row:
        print(f"Match '{match_id}' not found.", file=sys.stderr)
        sys.exit(1)

    full_id = row[0]
    existing = json.loads(row[1]) if row[1] else {}

    if mode == "--bulk":
        new_outputs = json.loads(stdin_data)
        existing.update(new_outputs)
    else:
        agent_id = mode
        existing[agent_id] = stdin_data.strip()

    conn.execute(
        "UPDATE matches SET outputs = ? WHERE id = ?",
        (json.dumps(existing), full_id),
    )
    conn.commit()
    print(f"Recorded {len(existing)} outputs for match {full_id[:8]}")
    conn.close()


if __name__ == "__main__":
    main()