Skill Forge is a quality gate for Agent Skills.
It reviews OpenAI, Claude, and generic Agent Skill packages before they get installed, published, or shipped. It combines a deterministic package inspector with a qualitative review workflow so you can catch the boring structural failures, the dangerous edge cases, and the subtle instruction problems that only show up once an agent starts interpreting the skill in the wild.
In plain English: Skill Forge helps you find out whether a skill is actually ready — or just looks ready because the folder has a SKILL.md in it.
```
skill-forge/
├── SKILL.md
├── README.md
├── LICENSE
├── .gitignore
├── .github/
│   └── workflows/
│       └── self-tests.yml
├── agents/
│   └── openai.yaml
├── scripts/
│   ├── inspect_skill_package.py
│   └── run_self_tests.py
└── references/
    ├── audit-checklist.md
    ├── evaluation-rubric.md
    ├── example-report.md
    ├── inspector-output-schema.md
    ├── platform-compatibility.md
    ├── pressure-test-suite.md
    ├── release-gate-checklist.md
    ├── report-template.md
    └── severity-framework.md
```
Skill Forge looks at two different failure modes:
- Package integrity — the things a script can inspect reliably.
- Agent behavior quality — the things that require judgment, pressure testing, and a working understanding of how skills fail in real use.
It can help evaluate:
- Uploaded Skill ZIPs, folders, or `SKILL.md` drafts.
- Package structure and expected entrypoints.
- Cross-platform compatibility risks across OpenAI, Claude, and generic agent environments.
- Unsafe archive paths, symlinks, oversized files, suspected secrets, missing resources, and leftover template content.
- Trigger quality: when the skill should activate, when it should stay out of the way, and where ambiguity may cause bad routing.
- Instruction clarity, contradiction risk, and unnecessary context bloat.
- Progressive loading: whether the core skill stays lean while detailed rubrics and references live in linked files.
- Safety posture and risky workflow assumptions.
- Pressure-test results and release readiness.
The goal is not to produce a pretty audit for a dashboard. The goal is to stop weak, unsafe, confusing, or overbuilt skills from getting shipped with a confident little smile on their face.
The inspector is dependency-free and uses only the Python standard library.
```
python -S scripts/inspect_skill_package.py /path/to/skill-or-skill.zip --json
```

For CI-style checks, use strict mode. Strict mode exits with code 2 when any error-severity finding is present.

```
python -S scripts/inspect_skill_package.py /path/to/skill-or-skill.zip --json --strict
```

Human-readable markdown output is available by omitting `--json`:

```
python -S scripts/inspect_skill_package.py /path/to/skill-or-skill.zip
```

The markdown output ends with a status and finding-count footer. JSON output includes a top-level `summary` object for release-gate and CI integrations.
Use the inspector process exit code as the canonical machine signal for CI and release gates.
In strict mode:
- Exit code `0` means the inspected package passed strict checks.
- Exit code `2` means one or more error-severity findings were present.
The top-level JSON summary object is retained for compatibility with existing automation that reads fields such as:
- `summary.status`
- `summary.strict_pass`
- `summary.error_count`
- `summary.finding_codes`
New integrations should still treat the process exit code as the source of truth for pass/fail decisions. Use `summary` for reporting, dashboards, or compatibility with older pipelines.
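As a minimal sketch of that contract, the wrapper below runs the inspector in strict mode, uses the exit code to decide pass/fail, and reads the `summary` fields only for reporting. The flags, exit codes, and field names come from this document; the skill path and report formatting are placeholders, and it assumes the JSON report is written to stdout.

```python
import json
import subprocess
import sys

# Hypothetical gate wrapper: the exit code is the source of truth,
# summary fields are used only for reporting.
result = subprocess.run(
    [sys.executable, "-S", "scripts/inspect_skill_package.py",
     "path/to/skill.zip", "--json", "--strict"],
    capture_output=True, text=True,
)

summary = json.loads(result.stdout).get("summary", {})
print(f"status={summary.get('status')} "
      f"errors={summary.get('error_count')} "
      f"codes={summary.get('finding_codes')}")

# Strict mode: 0 = passed, 2 = error-severity findings present.
sys.exit(result.returncode)
```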
Run the regression suite after changing the inspector, output schema, release-gate behavior, or any rule that could change pass/fail results.
```
python -S scripts/run_self_tests.py
```

The tests build temporary valid, malformed, and hostile Skill fixtures. They do not require network access or third-party packages.
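As a rough illustration of the fixture approach (not the actual test code), a test can build a throwaway skill directory and assert on the inspector's exit code. The fixture contents are assumptions; only the strict-mode exit codes come from this document.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Illustrative only: builds a minimal fixture and runs the inspector
# against it in strict mode. The real suite also covers malformed and
# hostile fixtures.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "demo-skill"
    skill.mkdir()
    (skill / "SKILL.md").write_text("# Demo skill\n\nMinimal fixture.\n")

    result = subprocess.run(
        [sys.executable, "-S", "scripts/inspect_skill_package.py",
         str(skill), "--json", "--strict"],
        capture_output=True, text=True,
    )
    # Strict mode defines only these two outcomes.
    assert result.returncode in (0, 2)
```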
This repository includes a GitHub Actions workflow at .github/workflows/self-tests.yml.
It runs the bundled regression suite on pushes, pull requests, and manual dispatches:
```
python -S scripts/run_self_tests.py
```

Keep the workflow dependency-free unless the inspector intentionally adds third-party runtime requirements.
For install, publish, ship, or release-candidate decisions, use references/release-gate-checklist.md and report results using the Release Gate Review section from references/report-template.md.
A skill should not be called release-ready when a Critical gate fails.
Treat the following as blockers unless the user explicitly requested only a draft review:
- Strict inspector failures.
- Likely bundled secrets.
- Unsafe archive structures.
- Missing or multiple `SKILL.md` entrypoints.
- Official validator failures.
- Safety risks that could cause the agent to take unsafe, misleading, destructive, or unsupported actions.
Release readiness should mean something. Otherwise it is just ceremony wearing a badge.
Treat uploaded archives and bundled scripts as untrusted input.
The secret scan is heuristic and non-exhaustive. A clean scan does not prove that no secrets exist.
Do not run bundled scripts that appear destructive, credential-harvesting, network-dependent, or unrelated to validation.
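For context on what "unsafe archive paths" means here, the classic case is a zip-slip entry that escapes the extraction directory. A minimal sketch of that check, using only the standard library (an illustration, not the inspector's actual implementation):

```python
import zipfile
from pathlib import Path

def has_unsafe_paths(zip_path: str) -> bool:
    """Flag entries that are absolute or traverse out of the extraction root."""
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            entry = Path(name)
            if entry.is_absolute() or ".." in entry.parts:
                return True
    return False
```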
Keep SKILL.md compact. Move detailed rubrics, examples, schemas, and extended guidance into directly linked files under references/.
The skill should help the agent load the right amount of context at the right time — not bury it under a pile of instructions and then act surprised when behavior drifts.
A distributable Skill ZIP should contain the skill-forge/ directory as the archive root entry.
Keep the package under the target platform upload limit and exclude generated caches.
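A minimal packaging sketch under those constraints, again stdlib-only. The output filename and the cache filter are assumptions, not a fixed spec; the one grounded requirement is that entries start with `skill-forge/`.

```python
import zipfile
from pathlib import Path

EXCLUDE = {"__pycache__", ".git", ".pytest_cache"}  # assumed cache dirs

root = Path("skill-forge")
with zipfile.ZipFile("skill-forge.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    for path in sorted(root.rglob("*")):
        if path.is_file() and not EXCLUDE & set(path.parts):
            # Relative arcname keeps skill-forge/ as the archive root entry.
            zf.write(path, arcname=path.as_posix())
```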
This repository is distributed under the MIT License.
Update the copyright holder in LICENSE if your organization requires a different owner or license.