
feat(skills): align YAML frontmatter to agentskills.io spec + add eval framework #798

Open
sytone wants to merge 1 commit into bradygaster:dev from sytone:squad/skills-schema-alignment-v2


@sytone

@sytone sytone commented Apr 3, 2026

What

Aligns all skill YAML frontmatter across 3 canonical locations to the agentskills.io specification and adds a comprehensive eval framework for testing skill trigger quality.

Part one of a multi-PR skills normalization effort — this PR touches ONLY frontmatter (no skill body content changes) and adds the eval tooling.

Why

  • Skills used an inconsistent internal schema (domain, confidence, source as top-level fields, tools arrays, some skills had no frontmatter at all)
  • No mechanism existed to validate skill descriptions trigger correctly on user prompts
  • No contribution guide or review checklist existed for skill quality

How

Schema alignment (34 skills across 3 directories):

  • .squad/skills/ (14 skills) — team-level patterns
  • .copilot/skills/ (17 skills) — coordinator playbook
  • templates/skills/ (3 skills) — product templates

All skills now have:

  • name, description, license as top-level (agentskills.io required)
  • domain, confidence, source, triggers, roles, compatibility inside metadata: map
  • allowed-tools as top-level string where applicable (agentskills.io optional)
  • Skills without frontmatter got complete --- blocks added
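The migration described in this list can be sketched as a small transform from a legacy flat header to the new shape. This is a hypothetical helper, not the tooling used in this PR; it assumes legacy `tools` is an array of tool-name strings and that `allowed-tools` is written as one space-separated string.

```javascript
// Hypothetical migration helper: illustrates the schema change, not the PR's actual tooling.
// Assumptions: legacy `tools` is an array of tool-name strings, and
// `allowed-tools` is written as a single space-separated string.
const TOP_LEVEL = new Set(["name", "description", "license", "allowed-tools"]);

function migrateFrontmatter(legacy, defaultLicense) {
  const { tools, metadata: existing = {}, ...fields } = legacy;
  const result = {};
  const metadata = { ...existing };

  for (const [key, value] of Object.entries(fields)) {
    if (TOP_LEVEL.has(key)) {
      result[key] = value;
    } else {
      // domain, confidence, source, triggers, roles, compatibility, author, ...
      metadata[key] = value;
    }
  }
  if (!("license" in result)) result.license = defaultLicense;
  if (Array.isArray(tools) && tools.length > 0) {
    result["allowed-tools"] = tools.join(" ");
  }
  if (Object.keys(metadata).length > 0) result.metadata = metadata;
  return result;
}

// Example input with made-up values.
const legacy = {
  name: "example-skill",
  description: "Use when the user asks about example workflows.",
  domain: "workflow",
  confidence: "medium",
  source: "manual",
  tools: ["bash", "view"],
};
console.log(migrateFrontmatter(legacy, "MIT"));
```

Anything that is not one of the assumed top-level keys is moved under `metadata`, which also covers cases like `author` in github-multi-account.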

Eval framework (new):

  • Phase 1 (run-evals.mjs): keyword-based trigger matching (fast, CI-ready). 88.9% baseline.
  • Phase 2 (run-llm-evals.mjs): LLM-based trigger + execution evals via Copilot CLI models. Supports --type trigger|exec|all, --runs N for nondeterminism, --split for train/validation.
  • Phase 3 (optimize-description.mjs): iterative description optimization loop with train/validation split.
  • Schema validator (validate-schema.mjs): checks frontmatter compliance across all directories.
  • 31 trigger eval fixtures (342 test cases) + 10 execution eval fixtures with LLM-as-judge assertion grading.
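Phase 1's keyword-based trigger matching and its pass-rate arithmetic can be sketched as follows. This is a hypothetical illustration: the fixture shape `{ prompt, shouldTrigger }` and the plain substring match are assumptions, and run-evals.mjs may tokenize, weight, or score differently.

```javascript
// Hypothetical sketch of Phase 1 keyword matching; run-evals.mjs may tokenize,
// weight, or score differently. Fixture shape { prompt, shouldTrigger } is assumed.
function fires(prompt, keywords) {
  const text = prompt.toLowerCase();
  return keywords.some((keyword) => text.includes(keyword.toLowerCase()));
}

function runTriggerEval(keywords, cases) {
  let passed = 0;
  const failures = [];
  for (const testCase of cases) {
    if (fires(testCase.prompt, keywords) === testCase.shouldTrigger) {
      passed += 1;
    } else {
      failures.push(testCase.prompt);
    }
  }
  const rate = cases.length > 0 ? passed / cases.length : 0;
  return { passed, total: cases.length, rate, failures };
}

// Made-up keywords and prompts for illustration.
const keywords = ["release", "publish", "changeset"];
const cases = [
  { prompt: "Cut a new release of the CLI", shouldTrigger: true },
  { prompt: "Publish version 1.2.0 to npm", shouldTrigger: true },
  { prompt: "Rename this variable", shouldTrigger: false },
  { prompt: "Ship it", shouldTrigger: true }, // a paraphrase that keyword matching misses
];
const result = runTriggerEval(keywords, cases);
console.log(`${result.passed}/${result.total} (${(result.rate * 100).toFixed(1)}%)`);
// prints: 3/4 (75.0%)
```

Applied to the reported baseline, 304 passes out of 342 cases gives 88.9%.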

New team member:

  • SPAN (Skill Curator) — owns skill quality, schema compliance, eval coverage, and trigger testing.

Docs:

  • CONTRIBUTING.md — skill creation/modification workflow, schema reference, eval guide
  • skill-review-checklist.md — 40+ checkpoint review gate
  • README.md — eval framework documentation

Testing

  • node .squad/skills/evals/validate-schema.mjs passes (34/34 skills valid)
  • node .squad/skills/evals/run-evals.mjs passes (88.9%, 304/342)
  • node .squad/skills/evals/run-llm-evals.mjs --dry-run works for all modes
  • No source code changes — npm run build and npm test not applicable

Docs

  • CONTRIBUTING.md for skills
  • Eval framework README
  • Skill review checklist template

Breaking Changes

None. These are frontmatter-only changes. The SDK's simple YAML parser flattens nested metadata, so triggers and roles inside metadata: remain accessible as top-level fields at runtime.
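To illustrate what "flattening" means here, the following is a hypothetical line-based reader, not the SDK's parser. It ignores indentation, so keys nested under `metadata:` surface as top-level fields, and a parent key with no inline value is skipped.

```javascript
// Hypothetical illustration of "flattening"; this is not the SDK's actual parser.
// Reads `key: value` lines between the --- fences and ignores indentation,
// so keys nested under `metadata:` end up at the top level.
function parseFlatFrontmatter(markdown) {
  const match = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---/);
  if (!match) return {};
  const fields = {};
  for (const line of match[1].split(/\r?\n/)) {
    const m = line.match(/^\s*([A-Za-z][\w-]*):\s*(.*)$/);
    if (!m) continue; // skip list items, blank lines, comments
    const [, key, value] = m;
    if (value === "") continue; // a parent key such as `metadata:` has no value of its own
    fields[key] = value.replace(/^["']|["']$/g, ""); // strip surrounding quotes
  }
  return fields;
}

// Example header with illustrative values.
const skill = [
  "---",
  "name: economy-mode",
  'description: "Use when the user asks to save cost"',
  "license: MIT",
  "metadata:",
  "  domain: model-selection",
  "  triggers: [economy, cheap model]",
  "---",
  "# Body",
].join("\n");
console.log(parseFlatFrontmatter(skill).triggers); // prints: [economy, cheap model]
```

Values such as inline arrays stay as raw strings in this sketch; a real parser would need to decide how to represent them.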

Waivers

Copilot AI review requested due to automatic review settings April 3, 2026 19:41
@github-actions
Contributor

github-actions bot commented Apr 3, 2026

🛫 PR Readiness Check

⚠️ 2 item(s) to address before review

| Check | Details |
|---|---|
| Single commit | 1 commit, clean history |
| Not in draft | Ready for review |
| Branch up to date | Up to date with dev |
| Copilot review | No Copilot review yet; it may still be processing |
| Changeset present | No source files changed; changeset not required |
| Scope clean | ⚠️ PR includes 68 .squad/ file(s); ensure these are intentional |
| No merge conflicts | No merge conflicts |
| Copilot threads resolved | All 4 Copilot thread(s) resolved |
| CI passing | No CI checks have run yet |

This check runs automatically on every push. Fix any ❌ items and push again.
See CONTRIBUTING.md and PR Requirements for details.

Contributor

Copilot AI left a comment


Pull request overview

Aligns skill YAML frontmatter across .squad/skills/, .copilot/skills/, and templates/skills/ to the agentskills.io schema shape, and introduces a new eval framework (schema validation + trigger/execution eval fixtures) to measure skill routing quality.

Changes:

  • Normalizes skill frontmatter (adds license, introduces metadata: with domain/confidence/source/compatibility/triggers/roles, and allowed-tools where applicable).
  • Adds skill eval tooling (validate-schema.mjs, run-evals.mjs) plus a large set of trigger (*.eval.yaml) and execution (*.exec-eval.yaml) fixtures.
  • Adds skill contribution/review docs and registers a new “SPAN” Skill Curator role in team routing/casting.

Reviewed changes

Copilot reviewed 88 out of 88 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| templates/skills/squad-conventions/SKILL.md | Updates template skill frontmatter to agentskills.io-aligned structure. |
| templates/skills/rework-rate/SKILL.md | Updates template skill frontmatter; replaces tool list with allowed-tools. |
| templates/skills/nap/SKILL.md | Adds missing frontmatter block in template skill. |
| .squad/templates/skill.md | Updates canonical skill template to new frontmatter schema. |
| .squad/templates/skill-review-checklist.md | Adds a comprehensive skill PR review checklist aligned to the new schema/eval process. |
| .squad/team.md | Adds SPAN as an active team member. |
| .squad/skills/versioning-policy/SKILL.md | Migrates skill frontmatter to new schema and updates description. |
| .squad/skills/session-recovery/SKILL.md | Migrates skill frontmatter to new schema; sets allowed-tools. |
| .squad/skills/release-process/SKILL.md | Adds frontmatter block for release-process skill. |
| .squad/skills/ralph-two-pass-scan/SKILL.md | Adds frontmatter block for ralph-two-pass-scan skill. |
| .squad/skills/pr-screenshots/SKILL.md | Migrates frontmatter to new schema; updates description line. |
| .squad/skills/personal-squad/SKILL.md | Adds frontmatter block for personal-squad skill. |
| .squad/skills/model-selection/SKILL.md | Adds frontmatter block for model-selection skill. |
| .squad/skills/humanizer/SKILL.md | Migrates frontmatter to new schema; updates description line. |
| .squad/skills/gh-auth-isolation/SKILL.md | Migrates frontmatter to new schema; sets allowed-tools. |
| .squad/skills/external-comms/SKILL.md | Migrates frontmatter to new schema; sets allowed-tools. |
| .squad/skills/economy-mode/SKILL.md | Migrates frontmatter to new schema; enriches description/triggers. |
| .squad/skills/cross-squad/SKILL.md | Migrates frontmatter; consolidates tool list into allowed-tools. |
| .squad/skills/cross-machine-coordination/SKILL.md | Adds frontmatter block for cross-machine-coordination skill. |
| .squad/skills/CONTRIBUTING.md | Adds contribution guide for skills and the eval framework workflow. |
| .squad/skill.md | Updates root skill template to match .squad/templates/skill.md. |
| .squad/routing.md | Adds explicit ownership line for “Skill quality & eval” under SPAN. |
| .squad/casting/registry.json | Registers SPAN agent identity in casting registry. |
| .squad/agents/span/history.md | Adds SPAN agent history file capturing skill/eval context. |
| .squad/agents/span/charter.md | Adds SPAN agent charter defining responsibilities and hard rules. |
| .copilot/skills/squad-conventions/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .copilot/skills/secret-handling/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .copilot/skills/reviewer-protocol/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .copilot/skills/reskill/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .copilot/skills/release-process/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .copilot/skills/model-selection/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .copilot/skills/init-mode/SKILL.md | Migrates Copilot skill frontmatter; sets allowed-tools. |
| .copilot/skills/history-hygiene/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .copilot/skills/github-multi-account/SKILL.md | Migrates Copilot skill frontmatter; moves author into metadata. |
| .copilot/skills/git-workflow/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .copilot/skills/distributed-mesh/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .copilot/skills/client-compatibility/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .copilot/skills/cli-wiring/SKILL.md | Adds missing frontmatter block for cli-wiring skill. |
| .copilot/skills/ci-validation-gates/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .copilot/skills/architectural-proposals/SKILL.md | Migrates Copilot skill frontmatter; sets allowed-tools. |
| .copilot/skills/agent-conduct/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .copilot/skills/agent-collaboration/SKILL.md | Migrates Copilot skill frontmatter to new schema. |
| .squad/skills/evals/validate-schema.mjs | Adds schema validator for skill frontmatter + eval coverage reporting. |
| .squad/skills/evals/run-evals.mjs | Adds Phase 1 keyword-based trigger eval runner. |
| .squad/skills/evals/README.md | Documents the eval framework, fixture formats, and running instructions. |
| .squad/skills/evals/versioning-policy.eval.yaml | Adds trigger eval fixture for versioning-policy. |
| .squad/skills/evals/squad-conventions.eval.yaml | Adds trigger eval fixture for squad-conventions. |
| .squad/skills/evals/session-recovery.eval.yaml | Adds trigger eval fixture for session-recovery. |
| .squad/skills/evals/session-recovery.exec-eval.yaml | Adds execution eval fixture for session-recovery. |
| .squad/skills/evals/secret-handling.eval.yaml | Adds trigger eval fixture for secret-handling. |
| .squad/skills/evals/secret-handling.exec-eval.yaml | Adds execution eval fixture for secret-handling. |
| .squad/skills/evals/rework-rate.eval.yaml | Adds trigger eval fixture for rework-rate. |
| .squad/skills/evals/reviewer-protocol.eval.yaml | Adds trigger eval fixture for reviewer-protocol. |
| .squad/skills/evals/reviewer-protocol.exec-eval.yaml | Adds execution eval fixture for reviewer-protocol. |
| .squad/skills/evals/reskill.eval.yaml | Adds trigger eval fixture for reskill. |
| .squad/skills/evals/release-process.eval.yaml | Adds trigger eval fixture for release-process. |
| .squad/skills/evals/release-process.exec-eval.yaml | Adds execution eval fixture for release-process. |
| .squad/skills/evals/ralph-two-pass-scan.eval.yaml | Adds trigger eval fixture for ralph-two-pass-scan. |
| .squad/skills/evals/pr-screenshots.eval.yaml | Adds trigger eval fixture for pr-screenshots. |
| .squad/skills/evals/personal-squad.eval.yaml | Adds trigger eval fixture for personal-squad. |
| .squad/skills/evals/nap.eval.yaml | Adds trigger eval fixture for nap. |
| .squad/skills/evals/model-selection.eval.yaml | Adds trigger eval fixture for model-selection. |
| .squad/skills/evals/model-selection.exec-eval.yaml | Adds execution eval fixture for model-selection. |
| .squad/skills/evals/init-mode.eval.yaml | Adds trigger eval fixture for init-mode. |
| .squad/skills/evals/humanizer.eval.yaml | Adds trigger eval fixture for humanizer. |
| .squad/skills/evals/history-hygiene.eval.yaml | Adds trigger eval fixture for history-hygiene. |
| .squad/skills/evals/github-multi-account.eval.yaml | Adds trigger eval fixture for github-multi-account. |
| .squad/skills/evals/git-workflow.eval.yaml | Adds trigger eval fixture for git-workflow. |
| .squad/skills/evals/git-workflow.exec-eval.yaml | Adds execution eval fixture for git-workflow. |
| .squad/skills/evals/gh-auth-isolation.eval.yaml | Adds trigger eval fixture for gh-auth-isolation. |
| .squad/skills/evals/gh-auth-isolation.exec-eval.yaml | Adds execution eval fixture for gh-auth-isolation. |
| .squad/skills/evals/fact-checking.eval.yaml | Adds trigger eval fixture for fact-checking. |
| .squad/skills/evals/external-comms.eval.yaml | Adds trigger eval fixture for external-comms. |
| .squad/skills/evals/external-comms.exec-eval.yaml | Adds execution eval fixture for external-comms. |
| .squad/skills/evals/economy-mode.eval.yaml | Adds trigger eval fixture for economy-mode. |
| .squad/skills/evals/economy-mode.exec-eval.yaml | Adds execution eval fixture for economy-mode. |
| .squad/skills/evals/distributed-mesh.eval.yaml | Adds trigger eval fixture for distributed-mesh. |
| .squad/skills/evals/cross-squad.eval.yaml | Adds trigger eval fixture for cross-squad. |
| .squad/skills/evals/cross-machine-coordination.eval.yaml | Adds trigger eval fixture for cross-machine-coordination. |
| .squad/skills/evals/client-compatibility.eval.yaml | Adds trigger eval fixture for client-compatibility. |
| .squad/skills/evals/cli-wiring.eval.yaml | Adds trigger eval fixture for cli-wiring. |
| .squad/skills/evals/ci-validation-gates.eval.yaml | Adds trigger eval fixture for ci-validation-gates. |
| .squad/skills/evals/ci-validation-gates.exec-eval.yaml | Adds execution eval fixture for ci-validation-gates. |
| .squad/skills/evals/architectural-proposals.eval.yaml | Adds trigger eval fixture for architectural-proposals. |
| .squad/skills/evals/agent-conduct.eval.yaml | Adds trigger eval fixture for agent-conduct. |
| .squad/skills/evals/agent-collaboration.eval.yaml | Adds trigger eval fixture for agent-collaboration. |

sytone added a commit to sytone/squad that referenced this pull request Apr 3, 2026
- validate-schema.mjs: update header comment to include templates/skills/,
  remove author/version from ALLOWED_TOP_LEVEL (should be in metadata)
- README.md: fix typo 'seletion' -> 'selection' in example prompt
- CONTRIBUTING.md: fix relative link to skill-review-checklist.md

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
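This commit narrows ALLOWED_TOP_LEVEL in validate-schema.mjs so that `author` and `version` have to live under `metadata`. A minimal sketch of that kind of check follows; the set contents, the required-field list, and the `checkTopLevel` name are assumptions, not the validator's actual code.

```javascript
// Assumed contents: the real ALLOWED_TOP_LEVEL in validate-schema.mjs may differ.
const ALLOWED_TOP_LEVEL = new Set(["name", "description", "license", "allowed-tools", "metadata"]);
const REQUIRED_TOP_LEVEL = ["name", "description", "license"];

// Hypothetical check: returns a list of human-readable problems (empty when valid).
function checkTopLevel(frontmatter) {
  const errors = [];
  for (const key of REQUIRED_TOP_LEVEL) {
    if (!(key in frontmatter)) errors.push(`missing required field: ${key}`);
  }
  for (const key of Object.keys(frontmatter)) {
    if (!ALLOWED_TOP_LEVEL.has(key)) {
      errors.push(`unexpected top-level field: ${key} (move it under metadata)`);
    }
  }
  return errors;
}

// author and version are flagged because they are no longer allowed at the top level.
console.log(checkTopLevel({ name: "nap", description: "d", license: "MIT", author: "x", version: "1.0" }));
```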
@sytone
Author

sytone commented Apr 3, 2026

@bradygaster have a look; this just brings the skills to the standard skills format and adds the basic eval process. The next step will be to remove the Squad-specific items that belong only to this repo and update SPAN to help catch this in the future, so things stay cleaner. The other option is to move all skills to a template folder and have Squad pull from that folder on local repo updates; for other users, the current process should continue to work.

…l framework

Migrates all 34 skills across .squad/skills/, .copilot/skills/, and
templates/skills/ to the agentskills.io specification schema:
- name, description, license as top-level fields (spec required)
- domain, confidence, source, triggers, roles, compatibility in metadata
- tools arrays converted to allowed-tools strings
- Skills without frontmatter get complete --- blocks added
- Body content of all SKILL.md files unchanged (frontmatter only)

Adds three-phase eval framework for skill trigger quality testing:
- Phase 1 (run-evals.mjs): keyword matching, 88.9% baseline, CI-ready
- Phase 2 (run-llm-evals.mjs): LLM trigger + execution evals via Copilot
- Phase 3 (optimize-description.mjs): iterative description optimization
- Schema validator, 31 trigger fixtures, 10 execution fixtures
- CONTRIBUTING.md, skill-review-checklist, eval README

Adds SPAN (Skill Curator) team member for skill quality gating.

Spec: https://agentskills.io/specification

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@sytone sytone force-pushed the squad/skills-schema-alignment-v2 branch from 006e985 to 8e4a56e April 3, 2026 23:26
@sytone
Author

sytone commented Apr 3, 2026

Following up: the skills used by the squad need to be in a path that the agent(s) can access. If you look at Copilot, the following paths are searched. I notice that the .copilot folder in the repo is created by running 'squad init', which would mean the skills are not loaded by default.

The init should take an agent environment (Copilot, Claude, etc.) and pick the best folder. We can default to Copilot ;)

That way, skills would go to .github/skills. I have thoughts on consolidating the skills into a single source location, to make it clear which are Squad-supporting skills distributed in the repo as opposed to skills used by this repo itself.

If a skills folder under templates were the 'master' for all Squad-supporting skills, we could refactor them down and also make it clear which apply to GitHub vs. ADO; again, init could help there, or potentially an auto-detect? Either way, the right set of skills would then be provisioned into the right skill folder, and the squad can kick in and help run the show.

This would mean removing the two skill copy scripts and creating a single replication script for the build process. The skills in Squad would then need to be reviewed, since technically the agent does not know about this folder and would require a lot of hinting. I have not looked at how this is done, but it would be better if the agent handled it and we evaluated the descriptions for targeting.
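The provisioning idea in this comment (pick a skill folder per agent environment, defaulting to Copilot, then copy only the skills that apply to GitHub or ADO) could look roughly like the sketch below. Everything here is hypothetical: `squad init` does not take these options today, and the `.claude/skills` mapping, the `platforms` tag, and the function names are assumptions.

```javascript
// Hypothetical sketch of the proposal: `squad init` does not take these options today.
// Folder mapping is an assumption (.github/skills for Copilot, .claude/skills for Claude).
const SKILL_FOLDERS = new Map([
  ["copilot", ".github/skills"],
  ["claude", ".claude/skills"],
]);

function skillFolderFor(agentEnv = "copilot") {
  // Unknown environments fall back to the Copilot folder.
  return SKILL_FOLDERS.get(String(agentEnv).toLowerCase()) ?? SKILL_FOLDERS.get("copilot");
}

// Each template skill may list the platforms it applies to (e.g. "github", "ado");
// skills without a list are treated as platform-neutral.
function provisionPlan(templateSkills, { agentEnv, platform }) {
  const target = skillFolderFor(agentEnv);
  return templateSkills
    .filter((skill) => !skill.platforms || skill.platforms.includes(platform))
    .map((skill) => `${target}/${skill.name}/SKILL.md`);
}

// Example with made-up skill names and platform tags.
const templateSkills = [
  { name: "shared-skill" },
  { name: "github-only-skill", platforms: ["github"] },
  { name: "ado-only-skill", platforms: ["ado"] },
];
console.log(provisionPlan(templateSkills, { agentEnv: "copilot", platform: "ado" }));
```

Keeping the mapping in one table means a single replication script could serve every agent environment instead of one copy script per target.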

Also, the agent has the following instruction, which does not apply when running squad init from the CLI:

Skill-aware routing: Before spawning, check .squad/skills/ for skills relevant to the task domain. If a matching skill exists, add to the spawn prompt: Relevant skill: .squad/skills/{name}/SKILL.md — read before starting. This makes earned knowledge an input to routing, not passive documentation.

The defaults are:

[image: default skill paths]
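The skill-aware routing instruction quoted in this comment amounts to a lookup before spawning. A minimal sketch follows, assuming each skill exposes a list of trigger phrases and that a skill counts as relevant when one of them appears in the task text; the function name and the exact prompt-line format are illustrative, not Squad's implementation.

```javascript
// Hypothetical sketch of skill-aware routing; names and the matching rule are assumptions.
// A skill counts as relevant when one of its trigger phrases appears in the task text.
function relevantSkillLines(task, skills, root = ".squad/skills") {
  const text = task.toLowerCase();
  return skills
    .filter((skill) => (skill.triggers ?? []).some((t) => text.includes(t.toLowerCase())))
    .map((skill) => `Relevant skill: ${root}/${skill.name}/SKILL.md (read before starting)`);
}

// Example with made-up trigger lists.
const skills = [
  { name: "git-workflow", triggers: ["branch", "rebase"] },
  { name: "humanizer", triggers: ["tone", "rewrite this"] },
];
const lines = relevantSkillLines("Rebase the feature branch onto dev", skills);
console.log(lines.join("\n"));
// prints: Relevant skill: .squad/skills/git-workflow/SKILL.md (read before starting)
```

Because the lookup is hard-wired to one root, skills that live in another folder (for example, one created by `squad init`) are never offered, which is the gap described above.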

@jazz127

jazz127 commented Apr 4, 2026

> Following up, for skills the ones used by the squad need to be in a path that agent/s can access. If you look at copilot the following paths are looked at. I notice that the .copilot folder in the repo is created running 'squad init' this would mean the skills are not loaded by default.

I opened a discussion about this. One other avenue worth exploring (I think?) is using the config.json in .copilot. We could potentially add something like this to allow flexibility in choosing where the skills actually end up, while still having them available in Copilot CLI:

{
  "skillDirectories": [
    "/Users/jazz127/jorgmatrix/.copilot/skills"
  ]
}
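A loader honoring a setting like this might merge it with its built-in defaults. The sketch below is hypothetical: whether Copilot CLI reads a `skillDirectories` key is the idea under discussion, not something confirmed here, and the default list passed in is illustrative only.

```javascript
// Hypothetical loader sketch: merges default skill folders with an optional
// `skillDirectories` array from a config.json string. Whether Copilot CLI
// honors this key is the idea under discussion, not something shown here.
function resolveSkillDirectories(defaults, configText) {
  let extra = [];
  try {
    const config = JSON.parse(configText);
    if (config && Array.isArray(config.skillDirectories)) {
      extra = config.skillDirectories.filter((d) => typeof d === "string");
    }
  } catch {
    // Missing or malformed config: keep only the defaults.
  }
  // Preserve order and drop duplicates.
  return [...new Set([...defaults, ...extra])];
}

const configText = JSON.stringify({
  skillDirectories: ["/Users/jazz127/jorgmatrix/.copilot/skills"],
});
// The default list here is illustrative only.
console.log(resolveSkillDirectories([".github/skills"], configText));
```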

Totally agree on making the difference between "team skills" and "copilot skills" clearer. I'm also wondering why these can't live in a single, centralised location in the squad repo; it seems like they get duplicated between the CLI and SDK packages, etc. My (limited) experience is on the .NET stack, so I'm probably unaware of something specific to TypeScript/Node.

It's just a thought, but is the agent-based prompt what makes the team's skills work once they are in the environment, outside of the init procedure?

tamirdresher pushed a commit that referenced this pull request Apr 4, 2026
- --self installs @bradygaster/squad-cli@latest (stable)
- --self --insider installs @bradygaster/squad-cli@insider (prerelease)
- Auto-continues with repo upgrade after self-install
- Detects package manager (npm/pnpm/yarn)
- 4 new tests

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Omzig
Copy link
Copy Markdown

Omzig commented Apr 6, 2026

If I could make one change, it would be to put the skills in .claude/skills and the agent in .claude/agents. That way they can still work out of the box without changing the user's environment too much, unless I misunderstand what is really going on ;)

This would keep Squad out of the .github/ folder in my projects ;)

If we could get someone to update the Copilot settings to include .copilot/skills, .copilot/agents, and .copilot/prompts, oh my.

@Omzig

Omzig commented Apr 7, 2026

@sytone, I tried to get them to add the extra path locations to the default settings, but they closed it :(

Would you happen to have any words that would encourage them to reopen it and apply the enhancement request?

@Omzig

Omzig commented Apr 7, 2026

My Workaround

---
name: apply-bug-fix-to-squad
description: "Prepare the next Squad release by normalizing skill front matter across `.copilot/skills` and `.squad/skills`, then updating Squad's skill lookup instructions so `.copilot/skills` is checked alongside `.squad/skills`."
agent: agent
model: GPT-5.4 (copilot)
---

# Apply Bug Fix To Squad

Use this prompt when the next Squad release needs the same workflow-level bug fix applied or re-verified. Keep the work narrow: fix skill metadata in the live skill trees, then fix skill discovery wording in the live Squad coordinator.

## Primary Outcome

- Every live skill in `.copilot/skills/*/SKILL.md` and `.squad/skills/*/SKILL.md` has valid repo-standard YAML front matter.
- The primary live coordinator file `.github/agents/squad.agent.md` tells agents to look in `.copilot/skills` as well as `.squad/skills` wherever the text is about skill discovery or lookup.
- Any mirror or template files that directly ship the same behavior stay aligned, but mirror work remains secondary to the two required fixes above.

## Scope

In scope:

- `.copilot/skills/*/SKILL.md`
- `.squad/skills/*/SKILL.md`
- `.github/agents/squad.agent.md`

Secondary alignment only if needed because the live files above changed:

- `.squad/templates/squad.agent.md`
- `.squad/templates/skills/*/SKILL.md`

Out of scope:

- PowerShell module internals
- New skills or new agents
- Broad wording cleanup unrelated to metadata or lookup behavior
- Branch creation, commits, pushes, or PR creation

## Front Matter Standard

For each skill file, use a standard YAML block at the top with these fields unless the repo already requires more:

- `name`
- `description`
- `metadata`

Under `metadata`, include:

- `domain`
- `confidence`
- `source`

Follow these rules:

- Keep `description` on a single line.
- Quote `description` whenever it contains YAML-sensitive punctuation, especially colons.
- Keep `metadata` as a top-level mapping that contains `domain`, `confidence`, and `source`.
- Keep `name` aligned to the skill folder name.
- Preserve existing body content and headings unless a minimal wording adjustment is required to support the new metadata.
- For `.copilot/skills`, normalize only what is needed for consistency and parse safety.
- For `.squad/skills`, add missing front matter without turning the task into a content rewrite.

## Lookup Fix Rules

When updating `.github/agents/squad.agent.md`:

- Treat `.github/agents/squad.agent.md` as the primary live file.
- Change only instructions that are about finding, reading, or loading relevant skills.
- Expand those lookup instructions so they cover both `.squad/skills` and `.copilot/skills`.
- Do not blindly replace every `.squad/skills` occurrence.
- Leave creation, installation, and write-destination paths alone when they are intentionally about writing to `.squad/skills`, unless the surrounding text is clearly wrong.
- If the same live lookup wording exists in `.squad/templates/squad.agent.md`, keep the template mirror aligned after the live file is correct.

## Workflow

### 1. Inventory the live files

1. Enumerate every `SKILL.md` under `.copilot/skills` and `.squad/skills`.
2. Note which files already have front matter and which do not.
3. Note any `.copilot/skills` headers that are malformed or inconsistent with the repo standard.

### 2. Normalize live skill front matter

1. Add missing YAML front matter to each `.squad/skills/*/SKILL.md`.
2. Repair or normalize existing YAML front matter in `.copilot/skills/*/SKILL.md` only where necessary.
3. Derive metadata from the skill's actual content; do not use placeholder values like `TBD` or `misc`.
4. Keep edits minimal outside the header block.

### 3. Fix live Squad skill lookup

1. Search `.github/agents/squad.agent.md` for instructions that currently look only in `.squad/skills`.
2. Update those lookup and discovery instructions to mention `.copilot/skills` alongside `.squad/skills`.
3. Re-read each changed occurrence to confirm it still makes sense in context and did not accidentally change write destinations or plugin-install behavior.

### 4. Align mirrors only after the live fix

1. If the changed agent lookup text is mirrored in `.squad/templates/squad.agent.md`, update that template to match.
2. If this repo intentionally keeps `.copilot/skills` mirrored into `.squad/templates/skills`, update template headers only when needed to keep shipped mirrors consistent.
3. Do not let template parity work replace or delay the required live fixes.

## Validation

Before finishing, verify all of the following:

1. Every live file in `.copilot/skills/*/SKILL.md` and `.squad/skills/*/SKILL.md` starts with a valid YAML front matter block.
2. Each live skill header includes top-level `name`, `description`, and `metadata`, with `metadata.domain`, `metadata.confidence`, and `metadata.source` present.
3. `.github/agents/squad.agent.md` now mentions both `.copilot/skills` and `.squad/skills` in each lookup or discovery instruction you changed.
4. No install or authoring instruction was accidentally changed from a `.squad/skills` destination to `.copilot/skills`.
5. If any mirror or template files were updated, they match the final live wording they mirror.

## Final Report

Report these items at the end:

1. Which live skill files were updated in `.copilot/skills` and which were updated in `.squad/skills`.
2. Which `.squad/skills` files were missing front matter before the fix.
3. Which lookup or discovery passages were updated in `.github/agents/squad.agent.md`.
4. Whether `.squad/templates/squad.agent.md` or any `.squad/templates/skills/*/SKILL.md` files were also updated for alignment.
5. Any files intentionally left unchanged and why.
6. What validation you performed, and what you did not run.
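Validation steps 1 and 2 of this prompt can also be checked mechanically. The following is a minimal sketch under stated assumptions: a line-based reader with two-space nesting instead of a real YAML parser, and a hypothetical `checkSkillHeader` function. It also flags an unquoted description containing `: ` and a block-style (`|` or `>`) description, per the Front Matter Standard rules.

```javascript
// Hypothetical checker for validation steps 1-2; a deliberately tiny line-based
// reader, not a full YAML parser (assumption: metadata children are indented).
function checkSkillHeader(markdown, folderName) {
  const match = markdown.match(/^---\r?\n([\s\S]*?)\r?\n---(\r?\n|$)/);
  if (!match) return ["missing YAML front matter block"];
  const errors = [];
  const top = {};
  const metadata = {};
  let inMetadata = false;
  for (const line of match[1].split(/\r?\n/)) {
    const m = line.match(/^(\s*)([A-Za-z][\w-]*):\s*(.*)$/);
    if (!m) continue;
    const [, indent, key, value] = m;
    if (indent === "") {
      inMetadata = key === "metadata"; // entering or leaving the metadata map
      top[key] = value;
    } else if (inMetadata) {
      metadata[key] = value;
    }
  }
  for (const key of ["name", "description", "metadata"]) {
    if (!(key in top)) errors.push(`missing top-level ${key}`);
  }
  for (const key of ["domain", "confidence", "source"]) {
    if (!metadata[key]) errors.push(`missing metadata.${key}`);
  }
  if (top.name && top.name !== folderName) {
    errors.push(`name "${top.name}" does not match folder "${folderName}"`);
  }
  const desc = top.description ?? "";
  if (/^[|>]/.test(desc)) errors.push("description must be on a single line");
  if (desc.includes(": ") && !/^["'].*["']$/.test(desc)) {
    errors.push("description contains a colon but is not quoted");
  }
  return errors;
}

const sample = [
  "---",
  "name: nap",
  "description: Use when: context is getting long",
  "metadata:",
  "  domain: session-management",
  "  confidence: medium",
  "---",
].join("\n");
console.log(checkSkillHeader(sample, "nap"));
// logs two problems: missing metadata.source, and the unquoted colon in description
```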

4 participants