feat: add skill-creator skill with eval-driven validation and blind A/B testing #178

Merged
notque merged 1 commit into main from feat/skill-creator-eval-pipeline
Mar 27, 2026
@notque commented Mar 27, 2026

Summary

Replace the skill-creator-engineer agent with a self-contained skill that follows an eval-driven model: draft a skill, test it against real prompts, grade the outputs with blind agent review, and iterate based on measured results.

  • skill-creator skill — workflow-first SKILL.md with bundled agents (grader, comparator, analyzer), scripts (run_eval, aggregate_benchmark, optimize_description, eval_compare, package_results), assets (blind A/B/C comparison viewer with syntax highlighting), and references
  • Philosophy updates — "Skills Are Self-Contained Packages" and "Workflow First, Constraints Inline" principles added, both evidence-backed by blind A/B testing
  • Old agent removed — skill-creator-engineer deleted, all 25 referencing files updated
  • Hook matcher groups — PreToolUse/PostToolUse performance improvement
  • Eval workspace gitignore — *-workspace/ and .feature/ directories excluded
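To make the "blind" part of the comparison concrete, here is a minimal sketch of how a blinded A/B tally could work. It is an illustration only and not the bundled comparator agent or `eval_compare` script: the names `blind_ab_trial`, `run_trials`, and `longer_wins`, the `old`/`new` labels, and the toy cases are all hypothetical. The idea is simply that the judge sees two anonymous outputs in shuffled order, and the verdict is mapped back to the variant afterwards.

```python
import random
from collections import Counter


def blind_ab_trial(old_output, new_output, judge, rng):
    """Show two outputs in shuffled order and map the verdict back to its variant."""
    pair = [("old", old_output), ("new", new_output)]
    rng.shuffle(pair)  # the judge cannot tell which variant came first
    # Hypothetical judge contract: takes two texts, returns 0 or 1 (preferred index).
    choice = judge(pair[0][1], pair[1][1])
    return pair[choice][0]


def run_trials(cases, judge, seed=0):
    """Tally wins per variant across (old_output, new_output) cases."""
    rng = random.Random(seed)
    return Counter(blind_ab_trial(old, new, judge, rng) for old, new in cases)


def longer_wins(first, second):
    """Stand-in judge for the demo: prefers the longer text."""
    return 0 if len(first) >= len(second) else 1


# Toy data, invented for illustration only.
cases = [
    ("constraints", "workflow first, then constraints"),
    ("rules", "steps with inline reasoning"),
    ("context", "instructions before context"),
]
tally = run_trials(cases, longer_wins)
print(dict(tally))
```

In a real run the judge would be a grading agent rather than a length check; the shuffle is what keeps position or label from leaking which variant is which.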

Test plan

  • Verify skill-creator skill loads correctly (/skill-creator triggers)
  • Verify old agent references are fully removed (grep -r skill-creator-engineer)
  • Verify eval viewer opens correctly from workspace
  • Verify hook matcher groups don't break existing hook behavior
  • Verify .gitignore excludes workspace and .feature directories
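The reference check in the test plan can also be scripted. The sketch below is a rough Python equivalent of `grep -r skill-creator-engineer`, assuming UTF-8 text files and skipping anything under `.git`; the helper name `find_references` and the demo file layout are hypothetical and not part of this PR.

```python
import tempfile
from pathlib import Path


def find_references(root, needle="skill-creator-engineer"):
    """Return (relative_path, line_number) for every line containing needle."""
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((rel.as_posix(), lineno))
    return hits


# Demo on a throwaway directory with one stale reference (invented layout).
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "agents").mkdir()
    (base / "agents" / "README.md").write_text(
        "# Agents\nSee skill-creator-engineer for details.\n", encoding="utf-8"
    )
    (base / "SKILL.md").write_text("# skill-creator\n", encoding="utf-8")
    demo_hits = find_references(base)

print(demo_hits)
```

An empty result on the real repository would correspond to the "fully removed" condition in the test plan.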

@notque force-pushed the feat/skill-creator-eval-pipeline branch 2 times, most recently from 15aac68 to 3fc508d on March 27, 2026 03:21
…/B testing

Replace the skill-creator-engineer agent with a self-contained skill at
skills/skill-creator/ that follows the eval-driven model: draft a skill,
test against real prompts, grade with agent blind review, iterate based
on measured results.

Includes:
- SKILL.md: workflow-first ordering, constraints inline with reasoning
- agents/: grader (assertion evaluation), comparator (blind A/B),
  analyzer (post-hoc analysis)
- scripts/: run_eval, aggregate_benchmark, optimize_description,
  eval_compare, package_results
- assets/: blind A/B/C comparison viewer with syntax highlighting,
  agent review panels, human feedback, skip-to-results
- references/: artifact schemas, skill template, complexity tiers,
  workflow patterns, error catalog

Philosophy updates:
- Skills Are Self-Contained Packages: scripts, agents, assets bundled
  inside the skill directory
- Workflow First, Constraints Inline: instructions before context,
  measured to produce better downstream output (3-0 in blind eval)

Also: hook matcher groups for PreToolUse/PostToolUse performance,
eval workspace gitignore, all references updated from old agent.
@notque force-pushed the feat/skill-creator-eval-pipeline branch from 3fc508d to f276a1c on March 27, 2026 03:41
@notque merged commit 46c6d21 into main on Mar 27, 2026
4 checks passed
@notque deleted the feat/skill-creator-eval-pipeline branch on March 27, 2026 13:56