Skip to content

Commit 2040fd5

Browse files
authored
Merge pull request #179 from notque/feat/workflow-first-wave-1
refactor: migrate 26 skills to workflow-first ordering (Wave 1)
2 parents 46c6d21 + e1f5a46 commit 2040fd5

190 files changed

Lines changed: 10288 additions & 20896 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

docs/PHILOSOPHY.md

Lines changed: 25 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -54,7 +54,7 @@ The solution: only pull the relevant information for the specific task. This is
5454
Three mechanisms enforce this:
5555
- **Agents**: specialized instruction files tailored to specific domains, loaded only when their triggers match
5656
- **Skills**: workflow methodologies that invoke deterministic scripts (Python CLIs, validation tools) rather than relying on LLM judgment alone, activated only when their workflow applies
57-
- **Progressive Disclosure**: summary in the main file, details in `references/` subdirectory. Right context at the right time, not everything at once
57+
- **Progressive Disclosure**: SKILL.md contains the workflow orchestration and tells the model *when* to load deep context. Detailed catalogs, agent rosters, specification tables, and output templates live in `references/` and are loaded only when the current workflow phase needs them. A skill with 26 chart types keeps the selection logic in SKILL.md and each chart's parameter spec in its own reference file — the model loads only the spec for the chart it selected. A review skill with 4 waves keeps the orchestration in SKILL.md and each wave's agent roster in a separate reference file — Wave 2 agents don't consume tokens during Wave 1
5858

5959
## Tokens Are Cheap, Quality Is Expensive
6060

@@ -221,26 +221,38 @@ Everything a skill needs lives inside the skill directory. Scripts, viewer templ
221221

222222
```
223223
skills/my-skill/
224-
├── SKILL.md # The workflow
224+
├── SKILL.md # The orchestrator — workflow + when to load references
225225
├── agents/ # Subagent prompts used only by this skill
226226
├── scripts/ # Deterministic CLI tools this skill invokes
227227
├── assets/ # Templates, HTML viewers, static files
228228
└── references/ # Deep context loaded on demand
229229
```
230230

231+
**The orchestrator pattern:** SKILL.md is a thin workflow orchestrator, not a monolithic document. It tells the model *what to do* (phases, gates, decisions) and *when to load deep context* (reference files). The heavy content — detailed catalogs, agent dispatch prompts, output templates, specification tables — lives in `references/` and gets loaded only when the current phase needs it.
232+
233+
This is the difference between a skill that works and a skill that works *efficiently*:
234+
235+
| Approach | Token Cost | Quality |
236+
|----------|-----------|---------|
237+
| Everything in SKILL.md | High — full content loaded on every invocation | Good but wasteful |
238+
| Thin SKILL.md, no references | Low — but missing context | Degraded — lost domain knowledge |
239+
| **Orchestrator + references** | **Proportional to task** — load what the phase needs | **Best — full knowledge, minimal waste** |
240+
241+
Making a skill shorter by deleting content is not progressive disclosure — it's content loss. Progressive disclosure means the content still exists, organized so only the relevant slice enters the context window at any given phase.
242+
243+
**Example:** A review skill with 4 waves of agents keeps the wave orchestration logic in SKILL.md (~500 lines) and puts each wave's agent roster and dispatch prompts in separate reference files (`references/wave-1-foundation.md`, `references/wave-2-deep-dive.md`). When executing Wave 1, only the Wave 1 reference is loaded. Wave 2's agents don't consume tokens until Wave 2 begins.
244+
231245
**Why this matters:** A skill that depends on scripts scattered across the repo is fragile to move, hard to test, and impossible to evaluate in isolation. When everything is bundled, the skill can be:
232246
- Copied to another project and it works
233247
- Tested via `run_eval.py` against its own workspace
234248
- Reviewed as a single unit — all the tooling is visible in one tree
235249
- Deleted without orphaning dependencies elsewhere
236250

237-
**The exception:** Shared patterns (`shared-patterns/anti-rationalization-core.md`) are referenced across skills. These stay shared. But skill-specific scripts, assets, and agents are always bundled.
238-
239251
**Repo-level `scripts/`** is reserved for toolkit-wide operations (learning-db.py, sync-to-user-claude.py, INDEX generation) — tools that operate on the system as a whole, not on a single skill's workflow.
240252

241253
## Workflow First, Constraints Inline
242254

243-
Skill documents place the workflow (Instructions/Phases) immediately after the frontmatter. Constraints appear inline within the phases they govern, not in a separate upfront section.
255+
Skill documents place the workflow (Instructions/Phases) immediately after the frontmatter. Constraints appear inline within the phases they govern, with reasoning attached ("because X"), not in a separate upfront section.
244256

245257
**Measured result:** A/B/C testing on Go code generation showed workflow-first ordering (C) swept constraints-first ordering (B) 3-0 across simple, medium, and complex prompts. Agent blind reviewers consistently scored workflow-first higher on testing depth, Go idioms, and benchmark coverage.
246258

@@ -249,18 +261,19 @@ Skill documents place the workflow (Instructions/Phases) immediately after the f
249261
```
250262
1. YAML frontmatter (What + When)
251263
2. Brief overview (How — one paragraph)
252-
3. Instructions/Phases (The actual workflow, with inline constraints)
253-
4. Benchmark/Commands Guide (Reference material)
264+
3. Instructions/Phases (The workflow, constraints inline with reasoning)
265+
4. Reference Material (Commands, guides — or pointers to references/)
254266
5. Error Handling (Failure context)
255-
6. Anti-Patterns (What went wrong before)
256-
7. References (Pointers to deep context)
267+
6. References (Pointers to bundled files)
257268
```
258269

259-
**Why it works:** The model encounters the task structure before the constraint framework. Constraints appear at the decision point where they apply — "use table-driven tests because they make adding cases trivial" inside the testing phase, not in a separate Hardcoded Behaviors section 200 lines earlier. The model spends attention on understanding the task, not parsing a constraint taxonomy.
270+
**Why it works:** The model encounters the task structure before any constraint framework. Constraints appear at the decision point where they apply — "use table-driven tests because they make adding cases trivial" inside the testing phase, not in a separate Hardcoded Behaviors section 200 lines earlier. Attaching reasoning ("because X") lets the model generalize constraints to situations the skill author didn't anticipate.
271+
272+
**What was removed:** Operator Context sections (Hardcoded/Default/Optional taxonomy), standalone Anti-Patterns sections, Anti-Rationalization tables, and Capabilities & Limitations boilerplate. These were structural overhead that separated constraints from the workflow steps where they apply.
260273

261-
**What moves:** The Operator Context section (Hardcoded/Default/Optional behaviors) decomposes. Each constraint migrates to the phase where it applies. "Run with -race for concurrent code" belongs in Phase 3 (RUN), not in a behavior table.
274+
**Where the content went:** Every constraint was distributed inline to the workflow step where it matters. Anti-pattern wisdom became reasoning attached to the relevant instruction. Nothing was deleted — it was reorganized to be at point-of-use.
262275

263-
**What stays:** Error Handling, Anti-Patterns, and References remain at the end as context that's consulted when things go wrong — not before the model has understood what "going right" looks like.
276+
**Progressive disclosure completes the picture:** Workflow-first ordering keeps SKILL.md navigable. For skills exceeding ~500 lines, detailed catalogs, agent rosters, and specification tables move to `references/` files. The SKILL.md workflow tells the model when to load each reference — "Read `references/wave-1-foundation.md` for the agent list and dispatch prompts." The model gets the orchestration logic upfront and loads deep context only when the current phase needs it.
264277

265278
## Open Sharing Over Individual Ownership
266279

0 commit comments

Comments
 (0)