Merged
136 changes: 126 additions & 10 deletions .claude/skills/mobius-run/SKILL.md
@@ -2,37 +2,153 @@
name: mobius-run
description: Use when the user says "compete", "mobius run", or wants to pit agents against each other on a task.
user-invocable: true
argument-hint: <task description>
argument-hint: <task description> [--free] [--api]
---

# Mobius Competition Runner

You are the orchestrator for Mobius, an adversarial agent swarm system. The user wants to run a competition.

## What to do
## Determine mode

- **`--free`** (DEFAULT): Run the competition entirely within Claude Code using subagents. Zero API cost. You generate challenger personas on the fly, spawn them as haiku subagents, collect outputs, and judge them yourself.
- **`--api`**: Run via the CLI with real API calls (cross-family diversity, costs money).

If neither flag is given, default to `--free` mode.

---

## MODE: --free (Subagent Competition)

This is the exciting part. You ARE the competition engine — no API calls needed.

### Step 1: Initialize

1. Check that Mobius is initialized:
```bash
python -m mobius.cli stats
```

2. If not initialized: `python -m mobius.cli init`

### Step 2: Choose agents

You have three options. Use whichever fits:

**Option A — Use existing agents from the registry:**
```bash
python .claude/skills/mobius-run/scripts/create_match.py "<TASK>" --count 6
```
This returns JSON with agent details including their system_prompts. Use these prompts for the subagents.
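Consuming that JSON might look like the following sketch (the sample payload is illustrative; the field names match create_match.py's output below):

```python
import json

# Illustrative sample of what create_match.py prints to stdout.
match_json = """
{"match_id": "m1", "task": "demo task",
 "agents": [{"id": "a1", "name": "Edge-Case Hunter", "slug": "edge-case-hunter",
             "system_prompt": "Hunt edge cases relentlessly.",
             "specializations": ["testing"], "elo_rating": 1000.0}]}
"""

match = json.loads(match_json)
for agent in match["agents"]:
    # Each agent's system_prompt seeds one subagent in Step 3.
    print(agent["slug"], agent["system_prompt"])
```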

**Option B — Generate fresh challengers on the fly (PREFERRED for interesting results):**
Analyze the task and design 4-8 complementary approaches that attack it from deliberately different angles. Think about what dimensions of variation would produce genuinely diverse solutions — not just "creative vs analytical" but specific strategic differences relevant to THIS task.

For each challenger, create a short but specific system prompt (3-5 sentences) that defines their approach. Then register them:
```bash
python .claude/skills/mobius-seed/scripts/create_agent.py '{"name":"...", "slug":"...", "description":"...", "system_prompt":"...", "specializations":[...], "provider":"anthropic", "model":"claude-haiku-4-5-20251001"}'
```

Then create the match:
```bash
python .claude/skills/mobius-run/scripts/create_match.py "<TASK>" --agents slug1,slug2,slug3,...
```

**Option C — Mix both:** Pull veterans from the registry AND generate fresh challengers. Pit them against each other.

### Step 3: Spawn subagents

For each agent from the match JSON, spawn a haiku subagent using the Agent tool:
- Set `model: "haiku"` on each agent
- Pass the agent's system_prompt as context plus the competition task
- Use `subagent_type: "general-purpose"`
- **IMPORTANT: Launch ALL subagents in a SINGLE message** so they run in parallel
- Each subagent prompt should be structured as:

```
You are competing in a Mobius adversarial swarm competition.

YOUR IDENTITY AND APPROACH:
<the agent's system_prompt from the registry>

YOUR TASK:
<the competition task>

Produce your best solution. Be thorough but focused. Output ONLY your solution.
```

If you have more than 6 agents, batch them: spawn the first 6, wait for results, then spawn the next batch.
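Assembling the prompt template above is straightforward string formatting; a minimal sketch (the helper name is illustrative, not part of Mobius):

```python
def build_challenger_prompt(system_prompt: str, task: str) -> str:
    """Fill the competition prompt template for one subagent."""
    return (
        "You are competing in a Mobius adversarial swarm competition.\n\n"
        "YOUR IDENTITY AND APPROACH:\n"
        f"{system_prompt}\n\n"
        "YOUR TASK:\n"
        f"{task}\n\n"
        "Produce your best solution. Be thorough but focused. "
        "Output ONLY your solution."
    )
```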

### Step 4: Record outputs

After each subagent returns, pipe its output to the match record. A quoted heredoc is safer than `echo` for multi-line output with quotes or backticks:
```bash
python .claude/skills/mobius-run/scripts/record_outputs.py <match_id> <agent_id> <<'EOF'
<agent_output>
EOF
```

You can record outputs incrementally as agents finish — each call merges into the existing record. Or record all at once with `--bulk`:
```bash
echo '<outputs_json>' | python .claude/skills/mobius-run/scripts/record_outputs.py <match_id> --bulk
```

### Step 5: Judge

You ARE the judge. Score each output on:
- **Correctness** (0-10): Does it solve the task accurately?
- **Quality** (0-10): Is it well-structured, readable, best practices?
- **Completeness** (0-10): Does it fully address all aspects?

Be ruthless and fair. Don't let positional bias affect you — judge purely on merit.
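Aggregating the three criteria into the totals that record_verdict.py expects can be sketched like this (the helper is illustrative; Mobius does not ship it):

```python
def pick_winner(scores: dict[str, dict[str, float]]) -> tuple[str, dict[str, float]]:
    """Sum correctness/quality/completeness (each 0-10) per agent.

    Returns (winning agent_id, per-agent totals); the totals dict is the
    JSON scores argument passed to record_verdict.py.
    """
    totals = {
        agent_id: s["correctness"] + s["quality"] + s["completeness"]
        for agent_id, s in scores.items()
    }
    winner = max(totals, key=totals.get)
    return winner, totals
```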

### Step 6: Record verdict

Record the winner, per-agent scores, and your reasoning:
```bash
python .claude/skills/mobius-judge/scripts/record_verdict.py \
--match <match_id> \
<winner_agent_id> \
'{"agent_id_1": 28.5, "agent_id_2": 22.0, ...}' \
"Your detailed reasoning"
```

Use the match_id from Step 2 to ensure the verdict is recorded against the correct match.

### Step 7: Show results

```bash
python -m mobius.cli leaderboard
```

Present: the winner, your reasoning, Elo changes, and the winning solution.

---

## MODE: --api (CLI Competition)

Traditional mode using real API calls.

1. Check initialization:
```bash
python -m mobius.cli stats
```

2. If no agents exist, suggest `/mobius-seed` first.

3. Run the competition:
```bash
python -m mobius.cli run "<TASK>"
```

4. Show results:
```bash
python -m mobius.cli explain
```

5. Present the winning output and judge reasoning to the user.

---

## Tips

- If the user didn't provide a task argument, ask them what they want the agents to compete on.
- For `--free` mode, you can scale to 12+ agents easily — haiku is fast and cheap (free on Pro)
- Generate challengers that are *orthogonal*, not just variations. Each should have a genuinely different strategy.
- If an existing champion agent loses to a fresh challenger, that's interesting — note it for the user
- The `--free` mode integrates with the same Elo system as `--api` — results are comparable
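For reference, a standard Elo update looks like the sketch below. The K-factor of 32 is an assumption for illustration; Mobius's rating code may use different constants:

```python
def elo_update(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Standard Elo: expected score from the rating gap, then a K-scaled shift."""
    expected_winner = 1.0 / (1.0 + 10 ** ((loser - winner) / 400.0))
    delta = k * (1.0 - expected_winner)
    return winner + delta, loser - delta
```

Evenly matched agents (equal ratings) trade exactly K/2 points; an upset by a low-rated challenger moves more.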
114 changes: 114 additions & 0 deletions .claude/skills/mobius-run/scripts/create_match.py
@@ -0,0 +1,114 @@
"""Create a match record for a free (subagent-based) competition.

Usage:
python create_match.py "<task>" [--agents <slug1,slug2,...>] [--count N]

Modes:
--agents slug1,slug2 Use specific agents from registry by slug
--count N Pick top N agents by Elo (default: 5)

Outputs JSON with match_id and agent details for the skill to orchestrate.
"""

import json
import sys

sys.path.insert(0, "src")

from mobius.config import get_config
from mobius.db import init_db
from mobius.models import MatchRecord
from mobius.registry import Registry


def main():
    args = sys.argv[1:]
    if not args:
        print("Usage: python create_match.py '<task>' [--agents s1,s2] [--count N]")
        sys.exit(1)

    task = args[0]
    slugs = None
    count = 5

    i = 1
    while i < len(args):
        if args[i] == "--agents" and i + 1 < len(args):
            slugs = [s.strip() for s in args[i + 1].split(",")]
            i += 2
        elif args[i] == "--count" and i + 1 < len(args):
            count = int(args[i + 1])
            i += 2
        else:
            i += 1

    config = get_config()
    conn, _ = init_db(config)
    registry = Registry(conn, config)

    # Select agents
    agents = []
    if slugs:
        for slug in slugs:
            agent = registry.get_agent_by_slug(slug)
            if agent:
                agents.append(agent)
            else:
                print(f"Warning: agent '{slug}' not found, skipping", file=sys.stderr)
    else:
        all_agents = registry.list_agents()
        all_agents.sort(key=lambda a: a.elo_rating, reverse=True)
        agents = all_agents[:count]

    if len(agents) < 2:
        print(json.dumps({"error": "Need at least 2 agents", "agent_count": len(agents)}))
        sys.exit(1)

    # Create match record (outputs empty — skill will fill them)
    match = MatchRecord(
        task_description=task,
        competitor_ids=[a.id for a in agents],
    )

    conn.execute(
        """INSERT INTO matches (id, task_description, competitor_ids, outputs, judge_models,
           judge_reasoning, winner_id, scores, voided, created_at)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
        (
            match.id,
            match.task_description,
            json.dumps(match.competitor_ids),
            json.dumps({}),
            json.dumps([]),
            "",
            None,
            json.dumps({}),
            0,
            match.created_at.isoformat(),
        ),
    )
    conn.commit()

    # Output agent details for the skill
    result = {
        "match_id": match.id,
        "task": task,
        "agents": [
            {
                "id": a.id,
                "name": a.name,
                "slug": a.slug,
                "system_prompt": a.system_prompt,
                "specializations": a.specializations,
                "elo_rating": a.elo_rating,
            }
            for a in agents
        ],
    }

    print(json.dumps(result, indent=2))
    conn.close()


if __name__ == "__main__":
    main()
61 changes: 61 additions & 0 deletions .claude/skills/mobius-run/scripts/record_outputs.py
@@ -0,0 +1,61 @@
"""Record agent outputs for a free competition match.

Usage:
echo "output text" | python record_outputs.py <match_id> <agent_id>
echo '{"id1": "out1", "id2": "out2"}' | python record_outputs.py <match_id> --bulk

Reads output from stdin to avoid shell escaping issues.
"""

import json
import sys

sys.path.insert(0, "src")

from mobius.config import get_config
from mobius.db import init_db


def main():
    if len(sys.argv) < 3:
        print("Usage:", file=sys.stderr)
        print("  echo 'output' | python record_outputs.py <match_id> <agent_id>", file=sys.stderr)
        print("  echo '{...}' | python record_outputs.py <match_id> --bulk", file=sys.stderr)
        sys.exit(1)

    match_id = sys.argv[1]
    mode = sys.argv[2]
    sys.stdin.reconfigure(encoding="utf-8", errors="replace")
    stdin_data = sys.stdin.read()

    config = get_config()
    conn, _ = init_db(config)

    row = conn.execute(
        "SELECT id, outputs FROM matches WHERE id LIKE ?", (f"{match_id}%",)
    ).fetchone()
    if not row:
        print(f"Match '{match_id}' not found.", file=sys.stderr)
        sys.exit(1)

    full_id = row[0]
    existing = json.loads(row[1]) if row[1] else {}

    if mode == "--bulk":
        new_outputs = json.loads(stdin_data)
        existing.update(new_outputs)
    else:
        agent_id = mode
        existing[agent_id] = stdin_data.strip()

    conn.execute(
        "UPDATE matches SET outputs = ? WHERE id = ?",
        (json.dumps(existing), full_id),
    )
    conn.commit()
    print(f"Recorded {len(existing)} outputs for match {full_id[:8]}")
    conn.close()


if __name__ == "__main__":
    main()