Add skill forge lab, /mobius-profile skill, and ideas backlog#6
AaronGoldsmith merged 8 commits into `main`
Conversation
…flow
- Add "when Mobius is worth it" section with honest trade-off framing
- Add prerequisites section with mandatory vs optional API keys
- Show all three bootstrap options (API, Claude Code, scout) upfront
- Add Trade-off column to "Why Mobius?" table
- Replace architecture file listing with orchestrator flow diagram
- Add context and caveats to cost table (token range, date, local embeddings)
- Note env-overridable vs code-only config options
- Remove duplicate `mobius train` command entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ount
- Fix cost date from March 2025 to March 2026
- Add `mobius agent export` to commands list (exists in CLI but was undocumented)
- Fix scout demo showing 4 agents when `--count 5` was requested
- Fix scout bootstrap: "free API cost" → "~$0.50" (uses Opus API)
- Add scout row to cost table
- Clarify Claude Code note: CLI commands still require API keys
- Standardize subscription naming to "Pro/Team" everywhere
- Clarify Anthropic needed for default judge panel (not just "recommended")
- Change "rate limiting" to "concurrency control" (semaphore in swarm.py)
Covers Pro, Max, and Team without needing updates as plans change.
…uirements, and trade-off wording
First two-phase competition: 6 agents designed Claude Code skills, then 6 testers validated them against the live DB. The generate → test pattern proved more valuable than any individual skill produced.

Relates to #5 (mobius improve self-diagnosing loop)
Pull request overview
Adds documentation and a new Claude Code skill focused on analyzing a single Mobius agent’s performance, plus updates the project README to better explain when/how to use Mobius and how to bootstrap agents.
Changes:
- Added a lab write-up documenting the “Skill Forge” two-phase (generate → functional test) competition pattern.
- Added a new `/mobius-profile` Claude Code skill (`SKILL.md` + `show_profile.py`) to display agent match history and recommend challengers.
- Added an `ideas.md` backlog and refreshed README positioning, quickstart, and architecture/cost sections.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| `labs/2026-03-14-skill-forge.md` | New lab entry documenting the competition and findings (formatting issue in markdown tables). |
| `ideas.md` | New ideas/backlog document capturing follow-on concepts from the competition. |
| `README.md` | Expanded framing, quickstart, and architecture/cost explanations; updated command list and skill notes. |
| `.claude/skills/mobius-profile/scripts/show_profile.py` | New script that prints agent stats/matches and suggests challengers (has correctness issues in loss attribution and match filtering). |
| `.claude/skills/mobius-profile/SKILL.md` | New skill documentation instructing how to run the profile script and interpret results. |
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0f454f1507
- Remove unused imports (`json`, `row_to_dict`)
- Skip voided/undecided matches (`winner_id=None`) instead of counting them as losses
- Only count losses against the actual match winner, not all opponents
- Key win/loss counters by slug instead of name to avoid collisions
- Filter retired agents (`elo_rating=0`) and untested agents out of challenger recommendations
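A minimal sketch of the corrected win/loss attribution described in these bullets. The row shape and field names (`participant_slugs`, `winner_slug`) are hypothetical stand-ins for whatever `show_profile.py` actually reads from the DB; the point is the three fixes: skip undecided matches, attribute a loss only to the actual winner, and key counters by slug.

```python
from collections import defaultdict

def tally_records(matches, agent_slug):
    """Count one agent's wins and losses, keyed by opponent slug.

    Voided/undecided matches (winner is None) are skipped rather than
    counted as losses, and a loss is attributed only to the match
    winner, not to every opponent in the match.
    """
    wins = defaultdict(int)
    losses = defaultdict(int)
    for m in matches:
        participants = m["participant_slugs"]
        if agent_slug not in participants or m["winner_slug"] is None:
            continue  # not our match, or voided/undecided
        if m["winner_slug"] == agent_slug:
            for opp in participants:
                if opp != agent_slug:
                    wins[opp] += 1  # a win counts against each opponent
        else:
            losses[m["winner_slug"]] += 1  # a loss counts only against the winner
    return dict(wins), dict(losses)
```

With this shape, a three-way match the agent loses produces exactly one loss entry (against the winner), and a voided match produces nothing.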
Summary
- `labs/2026-03-14-skill-forge.md` — First two-phase competition: 6 agents designed Claude Code skills, then 6 testers functionally validated them against the live DB. Full results, scores, and findings documented.
- `/mobius-profile` — Competition winner. Single-script skill that shows agent deep-dives with match history, win/loss analysis, and challenger recommendations. Tested and working.
- `ideas.md` — Living backlog of concepts that emerged from the competition (forge skill, match replay, agent factory, self-audit).

Key Finding
The generate → test two-phase pattern was more valuable than any individual skill. Design scores and functional scores diverged significantly — one skill scored 2nd in design but last in functional testing (shipped with crashing bugs). This pattern should become repeatable.
Relates to #5 (mobius improve self-diagnosing loop)
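The generate → test pattern above can be sketched as a small ranking function. This is an illustrative model, not the actual competition harness: the names `designs` and `run_functional_tests` are invented, and the real lab scored agents, not dictionaries. It shows the key finding in miniature: ranking by functional score can reorder a design-score ranking.

```python
def two_phase_competition(designs, run_functional_tests):
    """Rank candidate skills by functional test results, not design scores.

    `designs` maps skill name -> design-phase score; `run_functional_tests`
    returns a functional score for a skill name. The final ranking is
    decided by functional score, with design score as a tiebreaker.
    """
    results = []
    for name, design_score in designs.items():
        functional_score = run_functional_tests(name)
        results.append((name, design_score, functional_score))
    results.sort(key=lambda r: (r[2], r[1]), reverse=True)
    return results
```

Under this model, a skill that ranks 2nd in design but crashes in functional testing drops to last place, which is the divergence the lab observed.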
Test plan
- `/mobius-profile` tested against live DB with 40 agents

🤖 Generated with Claude Code