75 changes: 75 additions & 0 deletions skills/skill-creator/SKILL.md
@@ -353,6 +353,76 @@ the best description by test-set score to avoid overfitting.

---

## Enriching existing skills

Use this mode when a skill already exists but produces shallow, generic output — it
has thin `references/`, no `scripts/`, and passes an eval by luck rather than
by containing domain knowledge that changes behavior.

Indicators that this mode is appropriate:
- `references/` has fewer than 2 files, or none at all
- No `scripts/` directory
- Eval outputs look plausible but lack domain idioms, concrete examples, or
checklists specific to the skill's domain
- The skill passes a test because the model already knows the domain, not because
the skill contributes anything

### The enrichment loop

Six phases, with at most 3 iterations before escalating to the user:

**AUDIT** — measure the skill's current depth before changing anything.
Count the files in `references/`, `scripts/`, and `agents/`. Run the skill against 2-3
realistic prompts. Save outputs to `enrichment-workspace/baseline/`.
See `references/enrichment-workflow.md` → AUDIT phase for the exact checklist.
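The file-count part of AUDIT, together with the two structural indicators from the list above (fewer than 2 reference files, no `scripts/`), can be sketched in a few lines. This is a minimal sketch assuming a skill is a directory with optional `references/`, `scripts/`, and `agents/` subdirectories; `audit_depth` is a hypothetical helper, not one of this skill's bundled scripts.

```python
from pathlib import Path


def audit_depth(skill_dir: Path) -> dict:
    """Count bundled files and flag the structural enrichment indicators (hypothetical helper)."""
    report = {}
    for sub in ("references", "scripts", "agents"):
        path = skill_dir / sub
        if path.is_dir():
            # Count files at any depth; subdirectories themselves are not counted.
            report[sub] = sum(1 for p in path.rglob("*") if p.is_file())
        else:
            report[sub] = 0  # a missing directory counts as zero files
    # Structural indicators only; the behavioral ones need the baseline runs.
    report["thin_references"] = report["references"] < 2
    report["no_scripts"] = not (skill_dir / "scripts").is_dir()
    return report
```

Running `audit_depth(Path("skills/<name>"))` before and after ENRICH gives a quick depth diff. It says nothing about output quality, which is what the saved baseline outputs are for.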

**RESEARCH** — find domain knowledge the skill is missing.
Read the skill's SKILL.md and existing references to identify gaps. Search for
best practices, pattern catalogs with before/after examples, common mistakes,
and validation criteria. Where to look depends on the skill's domain — consult
`references/domain-research-targets.md` for a lookup table of primary and
secondary sources per domain.

**ENRICH** — add the research as reference content.
Create new files in the skill's `references/` directory. Add deterministic
`scripts/` where operations are repeatable. Update SKILL.md only with one-line
pointers to the new references — keep the orchestrator lean. Focus on content
that changes behavior: concrete examples beat abstract advice.
See `references/enrichment-workflow.md` → ENRICH phase for structuring guidance.

**TEST** — A/B test the enriched skill against baseline.
Write 2-3 realistic prompts that exercise the skill's domain. Use
`scripts/run_eval.py` to run the enriched and baseline versions on identical
prompts. Save outputs to `enrichment-workspace/iteration-N/`.
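The key property of the A/B run is that both versions see exactly the same prompts. A minimal sketch of that loop follows. The two runner callables are placeholders for however `scripts/run_eval.py` actually invokes the skill (its interface isn't described here, so this sketch does not call it), and the `baseline/` and `enriched/` subdirectory names and `prompt-N.md` file names are assumptions.

```python
from pathlib import Path
from typing import Callable


def run_ab(
    prompts: list[str],
    run_baseline: Callable[[str], str],  # placeholder: baseline skill runner
    run_enriched: Callable[[str], str],  # placeholder: enriched skill runner
    workspace: Path,
    iteration: int,
) -> Path:
    """Run both skill versions on the same prompts and save outputs side by side."""
    out = workspace / f"iteration-{iteration}"
    for variant, runner in (("baseline", run_baseline), ("enriched", run_enriched)):
        variant_dir = out / variant
        variant_dir.mkdir(parents=True, exist_ok=True)
        for i, prompt in enumerate(prompts, start=1):
            # Identical prompt text goes to both variants; only the skill differs.
            (variant_dir / f"prompt-{i}.md").write_text(runner(prompt))
    return out
```

Keeping the prompt index in the file name makes it trivial to pair `baseline/prompt-1.md` with `enriched/prompt-1.md` for the comparator.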

**EVALUATE** — dispatch blind comparators on each test prompt.
Use `agents/comparator.md` (already bundled in this skill). The comparator scores
each output on depth, accuracy, actionability, and domain idioms without knowing
which version is which. If the enriched version wins at least 2/3 of the comparisons
→ PUBLISH. On a tie or loss → run `agents/analyzer.md` to understand why, then RETRY
with a different research angle.
See `references/enrichment-workflow.md` → EVALUATE phase for scoring details.
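Blinding and the 2/3 threshold can be sketched as follows. This is an illustration under stated assumptions, not the logic inside `agents/comparator.md`: the comparator is assumed to see only outputs labeled A and B, the key is used afterwards to unblind, and a per-prompt tie is treated as a non-win for the enriched version.

```python
import random
from collections import Counter


def blind_pair(baseline: str, enriched: str, rng: random.Random) -> tuple[dict, dict]:
    """Label the two outputs A/B in random order; keep the key away from the comparator."""
    pair = [("baseline", baseline), ("enriched", enriched)]
    rng.shuffle(pair)
    shown = {"A": pair[0][1], "B": pair[1][1]}  # what the comparator sees
    key = {"A": pair[0][0], "B": pair[1][0]}  # used only to unblind afterwards
    return shown, key


def decide(winners: list[str]) -> str:
    """winners: 'baseline', 'enriched', or 'tie' per prompt, after unblinding."""
    if not winners:
        return "RETRY"  # no comparisons means no evidence
    enriched_wins = Counter(winners)["enriched"]
    # PUBLISH only if the enriched version wins at least 2/3 of comparisons;
    # per-prompt ties count as non-wins (an assumption of this sketch).
    return "PUBLISH" if 3 * enriched_wins >= 2 * len(winners) else "RETRY"
```

Comparing with integer arithmetic (`3 * wins >= 2 * n`) avoids floating-point rounding at exactly 2/3. With only two prompts, the rule requires both to be wins.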

**PUBLISH** — commit validated improvements.
Create branch `feat/enrich-{skill-name}`, commit the new references, scripts, and
SKILL.md pointer updates, push, and open a PR. See `references/enrichment-workflow.md` →
PUBLISH phase for the exact commit/PR flow.
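As a rough sketch only (the authoritative flow is in `references/enrichment-workflow.md`), the PUBLISH steps map onto a short command sequence. The remote name `origin`, running from the skill's directory, and using `gh pr create --fill` (title and body taken from the commit) are assumptions; `paths` should list only directories that actually exist, because `git add` fails on a pathspec that matches nothing.

```python
def publish_commands(skill_name: str, message: str, paths: list[str]) -> list[list[str]]:
    """Build the git/gh command sequence for PUBLISH without executing anything."""
    branch = f"feat/enrich-{skill_name}"
    return [
        ["git", "checkout", "-b", branch],
        ["git", "add", *paths],  # e.g. references, scripts, SKILL.md
        ["git", "commit", "-m", message],
        ["git", "push", "-u", "origin", branch],  # assumes the remote is named origin
        ["gh", "pr", "create", "--fill"],
    ]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order, so the sequence stops at the first failing step.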

### Retry logic

Each retry uses a different research angle to avoid retreading the same ground:

| Iteration | Research angle |
|-----------|---------------|
| 1 | Official docs + canonical best practices |
| 2 | Common mistakes + anti-patterns (what goes wrong) |
| 3 | Advanced patterns + edge cases (what experts know) |

After 3 failed iterations, report to the user: summarize what was tried, what the
evaluator found lacking, and ask whether to try a different approach or accept the
current state.
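The retry table and the escalation rule reduce to a small lookup. This is a sketch assuming iterations are numbered from 1 and that EVALUATE yields either `"PUBLISH"` or some other verdict; `next_step` is a hypothetical name.

```python
# Research angle per iteration, copied from the retry table.
RESEARCH_ANGLES = {
    1: "Official docs + canonical best practices",
    2: "Common mistakes + anti-patterns (what goes wrong)",
    3: "Advanced patterns + edge cases (what experts know)",
}
MAX_ITERATIONS = 3


def next_step(iteration: int, verdict: str) -> str:
    """Decide what follows an EVALUATE verdict for the given 1-based iteration."""
    if verdict == "PUBLISH":
        return "PUBLISH"
    if iteration >= MAX_ITERATIONS:
        # Out of angles: summarize attempts and evaluator findings for the user.
        return "ESCALATE"
    return f"RETRY: {RESEARCH_ANGLES[iteration + 1]}"
```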

---

## Bundled agents

The `agents/` directory contains prompts for specialized subagents used by this
@@ -384,6 +454,11 @@ skill. Read them when you need to spawn the relevant subagent.
- `references/complexity-tiers.md` — Skill examples by complexity tier
- `references/workflow-patterns.md` — Reusable phase structures and gate patterns
- `references/error-catalog.md` — Common skill creation errors with solutions
- `references/enrichment-workflow.md` — Deep reference for the enrichment loop:
AUDIT checklist, RESEARCH strategy, ENRICH structuring, TEST/EVALUATE/PUBLISH phases,
and retry logic in detail
- `references/domain-research-targets.md` — Lookup table: given a skill's domain,
which primary sources, secondary sources, and extraction targets to use during RESEARCH

---

281 changes: 281 additions & 0 deletions skills/skill-creator/references/domain-research-targets.md
@@ -0,0 +1,281 @@
# Domain Research Targets

Lookup table for the enrichment loop RESEARCH phase. Given a skill's domain, this
file tells you where to look for knowledge, what authority each source carries, and
what to extract from it.

Format per entry:
- **Primary sources** — official docs, specs, canonical reference material (highest authority)
- **Secondary sources** — blogs, talks, books, community guides (patterns and examples)
- **Extract** — what form of knowledge to pull out (checklists, before/after, decision trees)
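If this table were loaded programmatically, each entry could be a small record keyed by the skill names in its heading, with glob patterns covering families such as `perses-*`. This is a hypothetical sketch: the `ResearchTarget` type, the abbreviated entries, and the skill name `perses-dashboard-create` used in the example are illustrative, not part of the skill.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import Optional


@dataclass
class ResearchTarget:
    domain: str
    skills: tuple[str, ...]  # skill names or glob patterns such as "perses-*"
    primary: list[str] = field(default_factory=list)
    secondary: list[str] = field(default_factory=list)
    extract: list[str] = field(default_factory=list)


# Two abbreviated entries from this file.
TARGETS = [
    ResearchTarget(
        domain="Python",
        skills=("python-quality-gate",),
        primary=["PEP 8", "PEP 484", "typing module docs", "mypy docs"],
        secondary=["ruff rules reference"],
        extract=["checklist", "before/after", "decision tree", "anti-pattern catalog"],
    ),
    ResearchTarget(
        domain="Perses",
        skills=("perses-*",),
        primary=["Perses docs", "PromQL docs"],
        secondary=["Perses GitHub issues and PR discussions"],
    ),
]


def lookup(skill: str, targets: list[ResearchTarget]) -> Optional[ResearchTarget]:
    """Return the first entry whose skill names or globs match the given skill."""
    for target in targets:
        if any(fnmatch(skill, pattern) for pattern in target.skills):
            return target
    return None
```

A `None` result means the domain has no entry yet, which is itself a signal to add one before starting RESEARCH.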

---

## Go general (go-testing, go-concurrency, go-error-handling, go-anti-patterns, go-code-review)

**Primary sources**
- [Effective Go](https://go.dev/doc/effective_go) — canonical idioms; extract named
patterns with rationale
- [Go specification](https://go.dev/ref/spec) — authoritative on language semantics;
useful for edge cases and subtle behavior
- [Go standard library source](https://cs.opensource.google/go/go) — how the stdlib
itself applies patterns; extract struct design, error handling, and interface choices
- [Go Blog](https://go.dev/blog) — official in-depth articles; especially errors,
modules, generics, and concurrency posts
- [Go wiki: CodeReviewComments](https://github.com/golang/go/wiki/CodeReviewComments) —
community-maintained list of Go code review feedback; extract as checklist
- [Go wiki: CommonMistakes](https://github.com/golang/go/wiki/CommonMistakes) —
extract directly as anti-pattern catalog

**Secondary sources**
- [Go Proverbs](https://go-proverbs.github.io) (Rob Pike) — memorable heuristics;
useful for decision criteria
- Dave Cheney's blog (dave.cheney.net) and talks — especially error handling, interfaces,
and performance; extract before/after examples
- [100 Go Mistakes](https://100go.co) (Teiva Harsanyi) — structured mistake catalog;
extract mistake + root cause + fix format
- Go 1.22+ release notes — new patterns and deprecations worth knowing

**Extract**
- Checklist: idiomatic Go review (interface size, error wrapping, goroutine hygiene)
- Before/after: common rewrites (bare error returns → wrapped; goroutine leak → context cancel)
- Decision tree: when to use channels vs mutexes, when to define an interface vs use a concrete type
- Anti-pattern catalog: goroutine leaks, error shadowing, interface pollution, unnecessary abstractions

---

## Go SAPCC (go-sapcc-conventions)

This skill is already rich — it was built from extracted PR review comments from
sapcc/keppel and sapcc/go-bits. Enrichment is low-value unless new PR review
patterns have accumulated.

**When to enrich**: mine new merged PRs from sapcc/keppel and sapcc/go-bits since
the skill's last update date. Look for reviewer comments that establish new patterns
not yet in the skill's references.

**Primary sources**: sapcc/keppel and sapcc/go-bits PR review history (via the pr-miner in `skills/skill-creator/scripts/`)

**Extract**: reviewer comment → pattern name → before/after example, same format as
existing sapcc references

---

## Python (python-quality-gate)

**Primary sources**
- [PEP 8](https://peps.python.org/pep-0008/) — style; extract checklist of the
non-obvious rules (the obvious ones are already in every model's training)
- [PEP 484](https://peps.python.org/pep-0484/) — type hints; extract annotation patterns
- [PEP 526](https://peps.python.org/pep-0526/) — variable annotations
- [PEP 3107](https://peps.python.org/pep-3107/) — function annotations
- [Python docs: typing module](https://docs.python.org/3/library/typing.html) —
extract: when to use Protocol vs ABC, TypeVar constraints, overload patterns
- [mypy docs](https://mypy.readthedocs.io) — extract: common type errors and their
fixes, strict mode implications

**Secondary sources**
- [ruff rules reference](https://docs.astral.sh/ruff/rules/) — every rule has a
rationale; extract the non-obvious ones as checklist
- Real Python tutorials — extract before/after examples from "Pythonic" articles
- Hynek Schlawack's blog — especially async and attrs patterns

**Extract**
- Checklist: pre-commit quality gate (ruff, mypy, bandit checks that matter most)
- Before/after: common Python anti-patterns with idiomatic rewrites
- Decision tree: when to use dataclass vs TypedDict vs NamedTuple vs attrs
- Anti-pattern catalog: mutable default arguments, broad `except`, `# type: ignore` abuse
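One entry of the anti-pattern catalog in the intended before/after form might look like this. The function names are illustrative. The pattern itself is standard: a default value is evaluated once, when the function is defined, so a mutable default is shared across calls. Ruff flags it as B006 (from flake8-bugbear).

```python
def append_bad(item, bucket=[]):
    # Anti-pattern: the default list is created once, at definition time,
    # and shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Idiomatic rewrite: use None as a sentinel and build a fresh list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling `append_bad(1)` and then `append_bad(2)` returns `[1, 2]` the second time, while `append_good` returns `[2]`. Pairing that observable symptom with the fix is what makes a catalog entry change model behavior.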

---

## Kubernetes (kubernetes-debugging, kubernetes-security)

**Primary sources**
- [Kubernetes official docs](https://kubernetes.io/docs/) — especially Concepts and
Tasks sections; extract patterns, not API reference
- [RBAC best practices](https://kubernetes.io/docs/concepts/security/rbac-good-practices/)
- [Network Policy docs](https://kubernetes.io/docs/concepts/services-networking/network-policies/)
- [CIS Kubernetes Benchmark](https://www.cisecurity.org/benchmark/kubernetes) —
extract as security checklist with severity levels
- [Pod Security Standards](https://kubernetes.io/docs/concepts/security/pod-security-standards/)

**Secondary sources**
- *Kubernetes Patterns* (Ibryam & Huss) — extract named patterns with use-case criteria
- Learnk8s blog — extract debugging decision trees and before/after manifests
- [kubectl cheat sheet](https://kubernetes.io/docs/reference/kubectl/cheatsheet/) —
extract as debugging command reference

**Extract**
- Checklist: security hardening (RBAC, network policies, pod security, secret management)
- Decision tree: debugging pod failures, branching on the failure state (CrashLoopBackOff, ImagePullBackOff, OOMKilled)
- Before/after: insecure manifest → hardened manifest examples
- Anti-pattern catalog: over-privileged service accounts, missing resource limits, secrets in env vars
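The pod-debugging decision tree could start as a triage table that maps a failure state to the first commands to run. This is a hypothetical, non-exhaustive sketch. The `kubectl` invocations are standard, but which checks belong in each branch is an editorial assumption, not taken from the sources above.

```python
# First checks per pod failure state (hypothetical triage table, not exhaustive).
FIRST_CHECKS = {
    "ImagePullBackOff": [
        "kubectl describe pod {pod}",  # Events show why the pull failed: tag, auth, registry
    ],
    "CrashLoopBackOff": [
        "kubectl logs {pod} --previous",  # output of the last crashed container
        "kubectl describe pod {pod}",  # exit code, restart count, Last State reason
    ],
    "OOMKilled": [
        "kubectl describe pod {pod}",  # Last State: Terminated, Reason: OOMKilled
        "kubectl get pod {pod} -o yaml",  # shows resources.limits.memory to compare with usage
    ],
}
# Fallback for states not in the table (e.g. Pending).
DEFAULT_CHECKS = [
    "kubectl describe pod {pod}",
    "kubectl get events --field-selector involvedObject.name={pod}",
]


def triage(reason: str, pod: str) -> list[str]:
    """Return the first kubectl commands to run for a pod's failure state."""
    return [cmd.format(pod=pod) for cmd in FIRST_CHECKS.get(reason, DEFAULT_CHECKS)]
```

The branches are not a linear sequence. A container that is repeatedly OOM-killed usually surfaces as CrashLoopBackOff, and `describe` then shows OOMKilled as the last termination reason, which routes the investigation to the memory branch.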

---

## TypeScript (typescript-check)

**Primary sources**
- [TypeScript handbook](https://www.typescriptlang.org/docs/handbook/) — extract
non-obvious type patterns: conditional types, mapped types, template literals
- [TypeScript release notes](https://www.typescriptlang.org/docs/handbook/release-notes/overview.html)
— new features per version; extract patterns introduced in 5.x
- [@types conventions](https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/README.md)

**Secondary sources**
- Matt Pocock (total-typescript.com) — extract: advanced type patterns with before/after,
common TS mistakes with fixes
- [TypeScript Deep Dive](https://basarat.gitbook.io/typescript/) — extract anti-patterns section

**Extract**
- Checklist: strict mode implications and what each flag catches
- Before/after: `any` abuse → proper generics, type assertion abuse → type guards
- Decision tree: when to use `interface` vs `type`, `unknown` vs `any`, generics vs overloads
- Anti-pattern catalog: type assertions without guards, overly broad union types, enum misuse

---

## React / Next.js (distinctive-frontend-design, threejs-builder)

**Primary sources**
- [React docs](https://react.dev) — especially the "Thinking in React" and hooks
reference sections; extract composability patterns
- [Next.js docs](https://nextjs.org/docs) — extract: App Router patterns, server
component vs client component decision criteria, data fetching patterns

**Secondary sources**
- Vercel blog — extract: App Router migration patterns, performance optimization cases
- Kent C. Dodds (kentcdodds.com) — extract: compound component pattern, custom hooks
patterns, testing philosophy
- Josh Comeau (joshwcomeau.com) — extract: CSS-in-JS patterns, animation approaches

**Extract**
- Decision tree: server component vs client component selection criteria
- Before/after: common React anti-patterns (prop drilling → context, useEffect abuse → derived state)
- Checklist: performance review (unnecessary re-renders, missing keys, large bundle items)
- Pattern catalog: compound components, render props, custom hooks with clear interfaces

---

## Testing (test-driven-development, testing-anti-patterns, e2e-testing)

**Primary sources**
- [Playwright docs](https://playwright.dev/docs/intro) — extract: Page Object Model
structure, locator best practices, network interception patterns
- [pytest docs](https://docs.pytest.org) — extract: fixture patterns, parametrize,
conftest scope decisions

**Secondary sources**
- Kent C. Dodds — Testing Trophy and [testing-library principles](https://testing-library.com/docs/guiding-principles)
— extract: what to test at each level
- *Growing Object-Oriented Software, Guided by Tests* (Freeman & Pryce) — extract:
outside-in TDD pattern, listening to tests as design signal
- *xUnit Test Patterns* (Meszaros) — extract: test smell catalog with names and fixes
- Martin Fowler's [bliki on test doubles](https://martinfowler.com/bliki/TestDouble.html)

**Extract**
- Checklist: test quality review (one concept per test, arrange-act-assert structure, test isolation)
- Anti-pattern catalog with names: Mystery Guest, Eager Test, Fragile Test, Slow Test
- Decision tree: unit vs integration vs E2E for a given scenario
- Before/after: brittle selector → resilient locator, over-mocked test → integrated test

---

## Security (security-threat-model)

**Primary sources**
- [OWASP Top 10](https://owasp.org/www-project-top-ten/) — extract each category as
a named vulnerability with detection criteria and mitigation checklist
- [OWASP Cheat Sheets](https://cheatsheetseries.owasp.org) — extract checklists per
topic (SQL injection, XSS, CSRF, auth, etc.)
- [CWE Top 25](https://cwe.mitre.org/top25/) — extract as severity-ranked catalog
- [NIST guidelines](https://csrc.nist.gov/publications) — especially SP 800-53 controls

**Secondary sources**
- PortSwigger Web Security Academy — extract: attack pattern → detection → fix format
- Troy Hunt's blog — extract: real-world mistake catalog

**Extract**
- Checklist: threat modeling prompts (per STRIDE category)
- Before/after: vulnerable code → remediated code for each OWASP Top 10 item
- Decision tree: severity classification (Critical/High/Medium/Low with criteria)
- Anti-pattern catalog: hard-coded secrets, overly permissive CORS, missing auth checks
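A vulnerable-to-remediated pair for the OWASP Injection category shows the intended shape of a before/after entry. The `users` table and function names are illustrative. The fix is standard parameter binding through the `sqlite3` placeholder syntax.

```python
import sqlite3


def find_user_vulnerable(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is spliced into the SQL text, so a crafted
    # name can rewrite the WHERE clause.
    return conn.execute(f"SELECT id FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Remediated: the driver binds `name` as a value, never as SQL.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()
```

With the payload `x' OR '1'='1`, the vulnerable version returns every row, while the safe version looks for a user literally named `x' OR '1'='1` and returns nothing.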

---

## Perses (perses-*)

**Primary sources**
- [Perses docs](https://perses.dev/docs/) — extract: dashboard definition spec,
plugin architecture, variable interpolation formats
- [Perses GitHub wiki](https://github.com/perses/perses/wiki) — supplementary patterns
- [PromQL docs](https://prometheus.io/docs/prometheus/latest/querying/basics/) —
extract: query optimization patterns, recording rules, alerting rule structure

**Secondary sources**
- Perses GitHub issues and PR discussions — extract: community-documented gotchas
and workarounds

**Extract**
- Checklist: dashboard quality (variable usage, panel alignment, datasource scoping)
- Before/after: raw PromQL → optimized PromQL with recording rules
- Decision tree: when to use global vs project vs dashboard scope for variables
- Anti-pattern catalog: hardcoded datasource names, missing variable fallbacks, over-complex queries

---

## Voice skills (create-voice, voice-writer, voice-calibrator)

These skills are already rich — they have deterministic Python validators and
wabi-sabi calibration built in. Enrichment is rarely warranted.

**When to enrich**: if the banned-pattern list in `voice_validator.py` needs
expansion, or a new voice profile introduces patterns the existing rules don't cover.
Mine the validator's false-positive/false-negative log if one exists.

---

## Code review (systematic-code-review, parallel-code-review)

**Primary sources**
- [Google Engineering Practices: Code Review](https://google.github.io/eng-practices/review/)
— extract: reviewer standards, author responsibilities, speed guidelines
- [Conventional Comments](https://conventionalcomments.org) — label taxonomy for
review comments (nitpick, suggestion, issue, question, etc.)

**Secondary sources**
- Michaela Greiler (michaelagreiler.com) — extract: research-backed review effectiveness
checklist, anti-patterns in reviewer behavior
- SmartBear Code Review research papers — extract: optimal review size, defect density
findings as concrete thresholds

**Extract**
- Checklist: what to check at each review tier (security, logic, style, naming)
- Before/after: vague review comment → actionable comment with label
- Decision tree: block vs request-changes vs comment vs approve criteria
- Anti-pattern catalog: rubber-stamping, nitpick overload, missing context in comments
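The "vague comment → actionable comment with label" rewrite follows the Conventional Comments shape `<label> [decorations]: <subject>`. This sketch validates against a subset of the spec's labels and renders decorations such as `non-blocking` in parentheses. The helper name `format_comment` is hypothetical.

```python
# A subset of the labels defined by Conventional Comments.
LABELS = {"praise", "nitpick", "suggestion", "issue", "todo", "question", "thought", "chore", "note"}


def format_comment(label: str, subject: str, decorations: tuple[str, ...] = ()) -> str:
    """Render a review comment as '<label> (<decorations>): <subject>'."""
    if label not in LABELS:
        raise ValueError(f"unknown label: {label}")
    deco = f" ({', '.join(decorations)})" if decorations else ""
    return f"{label}{deco}: {subject}"
```

For example, a bare "this is weird" could become `format_comment("suggestion", "Extract the retry loop into a helper so it can be tested.", ("non-blocking",))`. The label signals intent, and the decoration tells the author whether the comment blocks the merge.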

---

## Git / PR workflows (pr-pipeline, pr-sync, git-commit-flow)

**Primary sources**
- [Conventional Commits spec](https://www.conventionalcommits.org) — extract:
type taxonomy, breaking change notation, footer format
- [GitHub API docs](https://docs.github.com/en/rest) — extract: PR creation fields,
check run status, review request patterns
- [gh CLI reference](https://cli.github.com/manual/) — extract: useful command
combinations for PR workflows

**Secondary sources**
- [Git best practices](https://sethrobertson.github.io/GitBestPractices/) — extract:
commit hygiene rules
- [Chris Beams: How to Write a Git Commit Message](https://cbea.ms/git-commit/) —
extract: 7 rules as checklist

**Extract**
- Checklist: pre-PR commit hygiene (message format, squash policy, branch naming)
- Before/after: bad commit message → conventional commit message
- Decision tree: squash vs merge vs rebase for different PR types
- Anti-pattern catalog: fixup commits left in history, force-push to shared branch,
PR too large to review
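A pre-PR hygiene check can verify commit headers against the Conventional Commits shape `<type>[(scope)][!]: <description>`. This is a sketch with two stated assumptions. First, the type list beyond `feat` and `fix` is a common team convention, not a spec requirement. Second, types are matched in lowercase only, although the spec treats them as case-insensitive.

```python
import re

# <type>[(scope)][!]: <description>
HEADER = re.compile(
    r"^(?P<type>[a-z]+)(?:\((?P<scope>[^()\s]+)\))?(?P<breaking>!)?: (?P<description>\S.*)$"
)
# The spec requires only feat and fix; the rest are an assumed team convention.
ALLOWED_TYPES = {"feat", "fix", "docs", "refactor", "test", "chore", "ci", "build", "perf", "style"}


def check_header(header: str) -> list[str]:
    """Return problems with a commit header; an empty list means it passes."""
    match = HEADER.match(header)
    if match is None:
        return ["header does not match '<type>[(scope)][!]: <description>'"]
    problems = []
    if match["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type '{match['type']}'")
    return problems
```

The `!` group captures the breaking-change marker, so a later check could require a matching `BREAKING CHANGE:` footer if the team wants both.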