notque · notque · Mar 27, 2026 · Mar 27, 2026
diff --git a/skills/skill-creator/SKILL.md b/skills/skill-creator/SKILL.md
@@ -137,10 +137,34 @@ Explain the reasoning behind constraints rather than issuing bare imperatives.
 effective than "ALWAYS run with -race" because the model can generalize the
 reasoning to situations the skill author didn't anticipate.
 
-**Progressive disclosure** — keep SKILL.md navigable:
-- Summary in frontmatter, workflow in body, deep reference in `references/`
-- If SKILL.md exceeds ~500 lines, move detailed catalogs to reference files
-- Reference files clearly linked from SKILL.md with guidance on when to read them
+**Progressive disclosure** — SKILL.md is the routing target, not the reference
+library. It stays lean so it loads fast when Claude considers invoking it, then
+reads `references/` on demand as phases execute. See
+`references/progressive-disclosure.md` for the full model, economics, and
+extraction decision tree.
+
+Key rules:
+- SKILL.md: brief overview, phase structure with gates, one-line pointers to
+  reference files, error handling
+- `references/`: checklists, rubrics, agent dispatch prompts, report templates,
+  pattern catalogs, example collections — anything only needed at execution time
+- If SKILL.md exceeds **500 lines** after writing, extract detailed content to
+  `references/` before proceeding
+- If SKILL.md exceeds **700 lines**, extraction is mandatory — it is carrying
+  reference content that should not be loaded on every routing decision
+
+**Maximizing skill effectiveness:**
+
+| More of this → better skill | Why |
+|-----------------------------|-----|
+| Rich `references/` content | Depth available at execution; zero cost at routing time |
+| Deterministic `scripts/` | Consistency, token savings, independent testability |
+| Bundled `agents/` prompts | Specialized dispatch without routing system overhead |
+
+The most effective complex skills in this toolkit (`comprehensive-review`,
+`sapcc-review`, `voice-writer`) have SKILL.md under 600 lines and put all
+operational depth in `references/` and `agents/`. See
+`references/progressive-disclosure.md` for the real numbers.
 
 ### Bundled scripts
 
@@ -352,9 +376,11 @@ skill. Read them when you need to spawn the relevant subagent.
 
 ## Reference files
 
+- `references/progressive-disclosure.md` — The disclosure model: economics, size
+  gates, what to extract, real examples from the toolkit, script and agent patterns
+- `references/skill-template.md` — Complete SKILL.md template with all sections
 - `references/artifact-schemas.md` — JSON schemas for eval artifacts (evals.json,
   grading.json, benchmark.json, comparison.json, timing.json, metrics.json)
-- `references/skill-template.md` — Complete SKILL.md template with all sections
 - `references/complexity-tiers.md` — Skill examples by complexity tier
 - `references/workflow-patterns.md` — Reusable phase structures and gate patterns
 - `references/error-catalog.md` — Common skill creation errors with solutions

diff --git a/skills/skill-creator/references/progressive-disclosure.md b/skills/skill-creator/references/progressive-disclosure.md
@@ -0,0 +1,217 @@
+# Progressive Disclosure Model
+
+How to structure skills so they load fast when Claude considers them and deliver
+full depth when Claude executes them.
+
+---
+
+## The Core Model
+
+```
+SKILL.md            ← always loaded when Claude considers invoking the skill
+references/         ← loaded on demand as the skill executes
+scripts/            ← deterministic CLI tools, called from SKILL.md phases
+agents/             ← specialized subagent prompts, dispatched from SKILL.md
+```
+
+**SKILL.md** is the routing target. It stays lean so it loads fast, then reads
+reference files on demand as phases execute.
+
+**`references/`** holds deep content: checklists, rubrics, templates, patterns,
+agent dispatch prompts, scoring systems, example collections. Loaded only when
+the skill is actually running and reaches the phase that needs them.
+
+**`scripts/`** holds deterministic CLI tools. If an operation is repeatable and
+doesn't require LLM judgment, it should be a Python script — not inline
+instructions that the model reinvents each run.
+
+**`agents/`** holds specialized subagent prompts for skills that dispatch
+parallel reviewers, graders, or domain specialists. Each agent file contains
+the full prompt for one specialized role.
+
+---
+
+## The Economics
+
+| Moment | What loads | Token cost |
+|--------|------------|------------|
+| Claude considers invoking the skill | SKILL.md only | Low (300–400 lines) |
+| Skill executes Phase 1 | SKILL.md + Phase 1 reference | Medium |
+| Skill executes all phases | SKILL.md + all referenced files | Full depth |
+
+A 300-line SKILL.md with 5 reference files totaling 800 lines costs **300 tokens
+to consider** and **1100 tokens when executing**. A 1100-line SKILL.md costs
+1100 tokens on every routing decision, whether or not the skill gets invoked.
+
+This is the key asymmetry. Keep SKILL.md lean.
+
+---
+
+## Size Gates
+
+| SKILL.md length | Action |
+|-----------------|--------|
+| Under 400 lines | Fine — no extraction needed |
+| 400–500 lines | Consider extracting if there are obvious deep-content sections |
+| Over 500 lines | Should extract detailed catalogs to `references/` |
+| Over 700 lines | Must extract — SKILL.md is carrying reference content |
+
+After writing a SKILL.md, check its length. If it exceeds 500 lines, identify
+the heaviest sections (checklists, rubrics, pattern catalogs, agent prompts,
+example collections) and move them to `references/`.
+
+---
+
+## What to Extract to `references/`
+
+**Extract these** — they are deep content that only matters when the skill runs:
+
+- Detailed checklists and rubrics (e.g., severity classification tables, joy-check
+  rubric, grading criteria)
+- Agent dispatch prompts (e.g., the 10 specialist prompts in `sapcc-review`, wave
+  agent prompts in `comprehensive-review`)
+- Report and output templates (e.g., the structured markdown template for
+  `sapcc-review` findings)
+- Domain-specific pattern catalogs (e.g., Go anti-patterns with before/after
+  examples, common error patterns)
+- Validation criteria and scoring systems
+- Example collections (realistic input/output pairs, prompt examples)
+- Phase-specific deep guides (e.g., "how to run the voice extraction phase")
+
+**Keep in SKILL.md** — these guide routing and orchestration:
+
+- Frontmatter (name, description, routing — never extracted)
+- Brief overview (2-3 sentences)
+- Phase/step structure with gates
+- One-line pointers to reference files ("See `references/X.md` for...")
+- Error handling (cause/solution pairs for common failures)
+- Brief examples showing trigger context
+
+---
+
+## Real Examples from This Toolkit
+
+These skills were built following this model. Use them as reference.
+
+| Skill | SKILL.md | `references/` | Total | What's in references |
+|-------|----------|----------------|-------|----------------------|
+| `comprehensive-review` | 564 lines | 765 lines (5 files) | 1329 | Wave-specific agent prompts per wave |
+| `create-voice` | 444 lines | 426 lines (4 files) | 870 | Phase-specific deep guides |
+| `pr-pipeline` | 417 lines | 365 lines (4 files) | 782 | Checklist, templates, loop details |
+| `sapcc-review` | 269 lines | 323 lines (2 files) | 592 | 10 agent dispatch prompts, report template |
+| `systematic-code-review` | 301 lines | 252 lines (3 files) | 553 | Severity rules, Go patterns, feedback guide |
+| `voice-writer` | 307 lines | 462 lines (6 files) | 769 | Rubrics, checklists, joy-check criteria, schemas |
+
+Notice that the most complex skills (`comprehensive-review`, `sapcc-review`) have
+the *smallest* SKILL.md-to-total ratios. All their operational depth lives in
+`references/` and `agents/`, loaded only when the skill executes.
+
+### Pattern: Agent Dispatch Prompts in `agents/`
+
+`sapcc-review` dispatches 10 parallel domain-specialist agents. Their prompts
+live in `agents/` (one file per specialist). SKILL.md says:
+
+```
+Spawn 10 parallel subagents, each loaded with their agent prompt from agents/:
+- agents/error-handling-reviewer.md
+- agents/api-contracts-reviewer.md
+...
+```
+
+SKILL.md stays at 269 lines. The 10 agent prompts are only loaded when the
+skill actually runs.
+
+### Pattern: Wave Prompts in `references/`
+
+`comprehensive-review` runs 4 waves of parallel review. Each wave's agent
+prompts are in a separate reference file (`references/wave1-agents.md`, etc.).
+SKILL.md describes the structure; the actual prompts are loaded per-wave.
+
+### Pattern: Checklist Extraction
+
+`pr-pipeline` has a pre-PR checklist that would bulk out SKILL.md. It lives in
+`references/pre-pr-checklist.md`. SKILL.md says: "Before creating the PR, work
+through `references/pre-pr-checklist.md`."
+
+---
+
+## Deterministic Script Principle
+
+If an operation is repeatable and doesn't require LLM judgment, it **should** be
+a Python CLI script in `scripts/`, not inline instructions that the model
+reinvents on each invocation.
+
+Scripts:
+- Save tokens — the model calls a script rather than reasoning through the same
+  steps from scratch each time
+- Ensure consistency — the same input produces the same output every run
+- Can be tested independently — unit tests for scripts, not for model reasoning
+- Are version-controlled and reviewable — changes are explicit diffs
+- Have predictable outputs — scripts fail deterministically; model reasoning fails
+  silently
+
+**Good candidates for scripts:**
+- Validation (voice validation, format checking, lint)
+- Metric extraction (line counts, token counts, benchmark aggregation)
+- Template rendering (fill a report template with data)
+- Link checking, path resolution, file discovery
+- Format conversion (CSV to JSON, markdown to HTML)
+- API calls with structured output (GitHub, linear, Slack)
+
+**Keep as SKILL.md instructions** — things that require judgment:
+- Deciding what to review and how deeply
+- Interpreting ambiguous outputs
+- Adapting approach to context
+
+The right split: `scripts/` for mechanical operations, SKILL.md for orchestration
+and judgment.
+
+---
+
+## Bundled Agents
+
+For skills that dispatch subagents with specialized roles, bundle agent prompts
+in `agents/`. These are not registered in the routing system — they are internal
+to the skill's workflow, loaded only when the skill dispatches them.
+
+```
+skill-name/
+├── SKILL.md
+├── agents/
+│   ├── security-reviewer.md    # Prompt for the security specialist
+│   ├── arch-reviewer.md        # Prompt for the architecture specialist
+│   └── grader.md               # Prompt for output grading
+├── scripts/
+└── references/
+```
+
+SKILL.md references them with a dispatch instruction:
+```
+Spawn a subagent using the prompt in agents/security-reviewer.md.
+Pass it: the diff, the package list, and the Wave 1 findings.
+```
+
+When to bundle vs. use repo-level agents:
+
+| Scenario | Where |
+|----------|-------|
+| Agent only used by this skill | Bundle in `agents/` |
+| Agent shared across multiple skills | Repo `agents/` directory |
+| Agent needs to appear in routing | Repo `agents/` directory |
+
+---
+
+## Applying This Model When Creating a New Skill
+
+1. **Write SKILL.md first** — get the workflow right without worrying about length
+2. **Check length** — if over 500 lines, identify extraction candidates
+3. **Extract** — move checklists, rubrics, agent prompts, templates to `references/`
+4. **Replace with pointers** — each extracted section becomes one line in SKILL.md:
+   `"See references/X.md for the full checklist."`
+5. **Identify deterministic operations** — anything the model would reinvent each
+   run is a script candidate; write `scripts/X.py` and replace with a `Run:` line
+6. **Identify specialized roles** — if the skill dispatches agents with distinct
+   expertise, write their prompts in `agents/` and reference from SKILL.md
+
+The result: a lean SKILL.md that orchestrates, and a rich `references/` + `scripts/`
++ `agents/` that delivers depth on demand.
diff --git a/skills/skill-creator/references/skill-template.md b/skills/skill-creator/references/skill-template.md
@@ -147,64 +147,53 @@ skill-name/
 
 Bundled agents are referenced from SKILL.md: "Spawn a subagent using the prompt in `agents/grader.md`". They don't appear in the routing system — they're internal to the skill's workflow.
 
-## Operator Context Section
+## Instructions Section
+
+Constraints belong **inline** within the workflow step where they apply, not in a
+separate `## Operator Context` block. If a constraint matters during Phase 2, put
+it in Phase 2 — not in a preamble 200 lines above where the model encounters it.
+Explain the reasoning alongside each constraint (see "Motivation over Mandate" below).
 
 ```markdown
-## Operator Context
+## Instructions
 
-This skill operates as an operator for [workflow], configuring Claude's behavior for [automation context].
+### Overview
 
-### Hardcoded Behaviors (Always Apply)
-- **CLAUDE.md Compliance**: Read and follow repository CLAUDE.md files
-- **Over-Engineering Prevention**: Only implement what's directly requested
-- [Domain-specific constraint]
+[2-3 sentences: what this skill does and how it works end-to-end]
 
-### Default Behaviors (ON unless disabled)
-- **Communication**: Show complete output, never summarize
-- **Temp File Cleanup**: Remove iteration files at completion
-- [Workflow-specific default]
+### Phase 1: [First Phase Name]
 
-### Optional Behaviors (OFF unless enabled)
-- [Capability available on request]
+[What to do here — goal and actions]
 
-## What This Skill CAN Do
-- [Explicit capability 1]
-- [Explicit capability 2]
+Run: `python3 ~/.claude/scripts/main.py --input {input_file}`
+Expect: [Specific output format]
 
-## What This Skill CANNOT Do
-- [Limitation 1]: [Reason]
-- [Limitation 2]: [Reason]
-```
+Gate: [Condition that must be true before moving to Phase 2]
+— because [reason the gate exists]
 
-## Instructions Section
+### Phase 2: [Second Phase Name]
 
-```markdown
-## Instructions
+[What to do here]
 
-### Step 1: [First Action]
-Run: `python3 ~/.claude/scripts/main.py --input {input_file}`
-Expect: [Specific output format]
-Validate: [How to verify success]
+Constraint: [Domain-specific rule that applies HERE]
+— because [why this matters in this context]
 
-### Step 2: [Next Action]
-[Continue with explicit steps]
+> If SKILL.md exceeds 500 lines: extract detailed content to `references/`
+> and add a one-liner here: "See `references/X.md` for the full [checklist/rubric/template]."
 
-## Examples
+### Phase 3: [Output Phase]
 
-### Example 1: [Common Scenario]
-User says: "[trigger phrase]"
-Actions:
-1. [Step]
-2. [Step]
-Result: [Concrete outcome]
+[Produce the output artifact]
 
 ## Error Handling
+
 **Error: "[Error message]"**
 - Cause: [Why it happens]
 - Solution: [How to fix]
 
 ## Reference Files
-- `references/examples.md`: [Purpose]
+- `references/examples.md`: [Purpose — loaded only when this skill executes]
+- `references/checklist.md`: [Phase 2 checklist — deep content extracted from SKILL.md]
 ```
 
 ### Best Practices for Instructions