# Skill Forge

Skill Forge is a quality gate for Agent Skills.

It reviews OpenAI, Claude, and generic Agent Skill packages before they get installed, published, or shipped. It combines a deterministic package inspector with a qualitative review workflow so you can catch the boring structural failures, the dangerous edge cases, and the subtle instruction problems that only show up once an agent starts interpreting the skill in the wild.

In plain English: Skill Forge helps you find out whether a skill is actually ready — or just looks ready because the folder has a SKILL.md in it.

## What this repository contains

```
skill-forge/
├── SKILL.md
├── README.md
├── LICENSE
├── .gitignore
├── .github/
│   └── workflows/
│       └── self-tests.yml
├── agents/
│   └── openai.yaml
├── scripts/
│   ├── inspect_skill_package.py
│   └── run_self_tests.py
└── references/
    ├── audit-checklist.md
    ├── evaluation-rubric.md
    ├── example-report.md
    ├── inspector-output-schema.md
    ├── platform-compatibility.md
    ├── pressure-test-suite.md
    ├── release-gate-checklist.md
    ├── report-template.md
    └── severity-framework.md
```

## What Skill Forge checks

Skill Forge looks at two different failure modes:

1. Package integrity — the things a script can inspect reliably.
2. Agent behavior quality — the things that require judgment, pressure testing, and a working understanding of how skills fail in real use.

It can help evaluate:

- Uploaded Skill ZIPs, folders, or `SKILL.md` drafts.
- Package structure and expected entrypoints.
- Cross-platform compatibility risks across OpenAI, Claude, and generic agent environments.
- Unsafe archive paths, symlinks, oversized files, suspected secrets, missing resources, and leftover template content.
- Trigger quality: when the skill should activate, when it should stay out of the way, and where ambiguity may cause bad routing.
- Instruction clarity, contradiction risk, and unnecessary context bloat.
- Progressive loading: whether the core skill stays lean while detailed rubrics and references live in linked files.
- Safety posture and risky workflow assumptions.
- Pressure-test results and release readiness.
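
As an illustration of the first category, an unsafe-archive-path check can be sketched like this. This is a simplified stand-in written for this README, not the inspector's actual implementation:

```python
import zipfile
from pathlib import PurePosixPath

def unsafe_entries(zip_path: str) -> list[str]:
    """Flag archive members with absolute paths or '..' traversal.

    A hypothetical helper illustrating one package-integrity check;
    the real inspect_skill_package.py may use different logic.
    """
    bad = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            p = PurePosixPath(name)
            # Absolute paths and parent-directory components can escape
            # the extraction directory when the archive is unpacked.
            if p.is_absolute() or ".." in p.parts:
                bad.append(name)
    return bad
```

A skill package whose members all live under a single `skill-forge/` root would produce an empty list here; a hostile archive with traversal entries would not.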

The goal is not to produce a pretty audit for a dashboard. The goal is to stop weak, unsafe, confusing, or overbuilt skills from getting shipped with a confident little smile on their face.

## Running the inspector locally

The inspector is dependency-free and uses only the Python standard library.

```bash
python -S scripts/inspect_skill_package.py /path/to/skill-or-skill.zip --json
```

For CI-style checks, use strict mode. Strict mode exits with code `2` when any error-severity finding is present.

```bash
python -S scripts/inspect_skill_package.py /path/to/skill-or-skill.zip --json --strict
```

Human-readable markdown output is available by omitting `--json`:

```bash
python -S scripts/inspect_skill_package.py /path/to/skill-or-skill.zip
```

The markdown output ends with a status and finding-count footer. JSON output includes a top-level summary object for release-gate and CI integrations.

## Compatibility notes

Use the inspector process exit code as the canonical machine signal for CI and release gates.

In strict mode:

- Exit code `0` means the inspected package passed strict checks.
- Exit code `2` means one or more error-severity findings were present.
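
Under those rules, a CI gate can be sketched as the following hypothetical wrapper. The script path and flags come from this README; the wrapper itself is not part of the repository:

```python
import subprocess
import sys

def interpret_exit_code(code: int) -> str:
    """Map the strict-mode exit code to a gate decision."""
    if code == 0:
        return "pass"
    if code == 2:
        return "fail: error-severity findings present"
    # Any other code means the inspector itself misbehaved; treat as failure.
    return f"fail: unexpected inspector exit code {code}"

def gate(skill_path: str) -> str:
    """Run the inspector in strict mode and use the process exit code,
    not the JSON body, as the pass/fail signal."""
    result = subprocess.run(
        [sys.executable, "-S", "scripts/inspect_skill_package.py",
         skill_path, "--json", "--strict"],
        capture_output=True,
    )
    return interpret_exit_code(result.returncode)
```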

The top-level JSON summary object is retained for compatibility with existing automation that reads fields such as:

- `summary.status`
- `summary.strict_pass`
- `summary.error_count`
- `summary.finding_codes`

New integrations should still treat the process exit code as the source of truth for pass/fail decisions. Use summary for reporting, dashboards, or compatibility with older pipelines.
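
For reporting, the summary object can be read like this. The field names are the ones listed above; the output format is an illustrative choice, not something the inspector emits:

```python
import json

def summary_line(report: str) -> str:
    """Build a one-line dashboard summary from the inspector's JSON output.

    Uses the documented summary fields; suitable for reporting only.
    The process exit code remains the source of truth for pass/fail.
    """
    s = json.loads(report)["summary"]
    codes = ", ".join(s["finding_codes"]) or "none"
    return f"status={s['status']} errors={s['error_count']} findings=[{codes}]"
```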

## Running self-tests

Run the regression suite after changing the inspector, output schema, release-gate behavior, or any rule that could change pass/fail results.

```bash
python -S scripts/run_self_tests.py
```

The tests build temporary valid, malformed, and hostile Skill fixtures. They do not require network access or third-party packages.

## Continuous integration

This repository includes a GitHub Actions workflow at `.github/workflows/self-tests.yml`.

It runs the bundled regression suite on pushes, pull requests, and manual dispatches:

```bash
python -S scripts/run_self_tests.py
```

Keep the workflow dependency-free unless the inspector intentionally adds third-party runtime requirements.

## Release-gate workflow

For install, publish, ship, or release-candidate decisions, use `references/release-gate-checklist.md` and report results using the Release Gate Review section from `references/report-template.md`.

A skill should not be called release-ready when a Critical gate fails.

Treat the following as blockers unless the user explicitly requested only a draft review:

- Strict inspector failures.
- Likely bundled secrets.
- Unsafe archive structures.
- Missing or multiple `SKILL.md` entrypoints.
- Official validator failures.
- Safety risks that could cause the agent to take unsafe, misleading, destructive, or unsupported actions.

Release readiness should mean something. Otherwise it is just ceremony wearing a badge.

## Safety notes

Treat uploaded archives and bundled scripts as untrusted input.

The secret scan is heuristic and non-exhaustive. A clean scan does not prove that no secrets exist.

Do not run bundled scripts that appear destructive, credential-harvesting, network-dependent, or unrelated to validation.

Keep `SKILL.md` compact. Move detailed rubrics, examples, schemas, and extended guidance into directly linked files under `references/`.

The skill should help the agent load the right amount of context at the right time — not bury it under a pile of instructions and then act surprised when behavior drifts.

## Packaging

A distributable Skill ZIP should contain the skill-forge/ directory as the archive root entry.

Keep the package under the target platform upload limit and exclude generated caches.
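
A minimal packaging helper matching that layout might look like the following. This is a hypothetical script written for this README (the exclusion set in particular is an assumption), not a tool shipped in the repository:

```python
import zipfile
from pathlib import Path

def package_skill(src: Path, out_zip: Path) -> None:
    """Write a distributable Skill ZIP with skill-forge/ as the archive root.

    Skips common generated caches; adjust the exclusion set for your setup.
    """
    excluded = {"__pycache__", ".git"}
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if any(part in excluded for part in path.parts):
                continue
            if path.is_file():
                # Re-root every member under skill-forge/ so the archive
                # has a single top-level directory entry.
                arcname = (Path("skill-forge") / path.relative_to(src)).as_posix()
                zf.write(path, arcname)
```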

## License

This repository is distributed under the MIT License.

Update the copyright holder in LICENSE if your organization requires a different owner or license.
