selftune logo

selftune

Self-improving skills for AI agents.

CI CodeQL OpenSSF Scorecard npm version License: MIT TypeScript Zero Dependencies Bun

Your agent skills learn how you work. Detect what's broken. Fix it automatically.

Install · Use Cases · How It Works · Commands · Platforms · Docs


Your skills don't understand how you talk. You say "make me a slide deck" and nothing happens — no error, no log, no signal. selftune watches your real sessions, learns how you actually speak, and rewrites skill descriptions to match. Automatically.

Works with Claude Code, Codex, OpenCode, and OpenClaw. Zero runtime dependencies.

Install

npx skills add selftune-dev/selftune

Then tell your agent: "initialize selftune"

Two minutes. No API keys. No external services. No configuration ceremony. Uses your existing agent subscription. Within minutes you'll see which skills are undertriggering.

CLI only (no skill, just the CLI):

npx selftune@latest doctor

Before / After

Before: 47% pass rate → After: 89% pass rate

selftune learned that real users say "slides", "deck", "presentation for Monday" — none of which matched the original skill description. It rewrote the description to match how people actually talk. Validated against the eval set. Deployed with a backup. Done.

Built for How You Actually Work

I write and use my own skills — You built skills for your workflow but your descriptions don't match how you actually talk. selftune learns your language from real sessions and evolves descriptions to match — no more manual tuning. selftune status · selftune evolve · selftune baseline

I publish skills others install — Your skill works for you, but every user talks differently. With selftune, the skills you publish get better for every user automatically — adapting descriptions to how each person actually works. selftune status · selftune evals · selftune badge

I manage an agent setup with many skills — You have 15+ skills installed. Some work. Some chain together. Some conflict. selftune shows which combinations repeat, which ones help, and where the friction is. selftune dashboard · selftune composability · selftune workflows

How It Works

Observe → Detect → Evolve → Watch

A continuous feedback loop that makes your skills learn and adapt. Automatically.

Observe — Hooks capture every user query and which skills fired. On Claude Code, hooks install automatically. Use selftune replay to backfill existing transcripts. This is how your skills start learning.

Detect — selftune finds the gap between how you talk and how your skills are described. You say "make me a slide deck" and your pptx skill stays silent — selftune catches that mismatch.
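
The "slide deck" mismatch can be pictured as a vocabulary-coverage problem. The sketch below is illustrative only, assuming a toy keyword-overlap score; selftune's real detection works from session telemetry, not this scoring:

```typescript
// Illustrative sketch only - not selftune's actual detection logic.
// Scores how much of a real query's vocabulary a skill description covers.
function coverageScore(query: string, description: string): number {
  const tokenize = (s: string) =>
    new Set(s.toLowerCase().match(/[a-z]+/g) ?? []);
  const queryTokens = tokenize(query);
  const descTokens = tokenize(description);
  if (queryTokens.size === 0) return 0;
  let hits = 0;
  for (const t of queryTokens) if (descTokens.has(t)) hits++;
  return hits / queryTokens.size;
}

// A stock description that never mentions "slide" or "deck" scores near zero
// against the queries users actually type:
const description = "Creates PowerPoint presentations from structured outlines";
coverageScore("make me a slide deck", description);                  // no overlap
coverageScore("create a presentation from my outline", description); // partial overlap
```

A low score on queries that should have fired the skill is the kind of mismatch the Detect step surfaces.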

Evolve — Rewrites skill descriptions — and full skill bodies — to match how you actually work. Batched validation with per-stage model control (--cheap-loop uses haiku for the loop, sonnet for the gate). Teacher-student body evolution with 3-gate validation. Baseline comparison gates on measurable lift. Automatic backup.

Watch — After deploying changes, selftune monitors skill trigger rates. If anything regresses, it rolls back automatically. Your skills keep improving without you touching them.
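
The rollback decision can be sketched as a simple threshold check on trigger rates before and after deploy. This is a conceptual model under assumed names, not selftune's implementation:

```typescript
// Illustrative sketch only - the real `selftune watch` logic is more involved.
interface WatchSample {
  before: number; // trigger rate prior to deploy, 0..1
  after: number;  // trigger rate observed since deploy, 0..1
}

// Roll back only on a regression beyond the tolerance band;
// small fluctuations are treated as noise.
function shouldRollback(sample: WatchSample, tolerance = 0.05): boolean {
  return sample.before - sample.after > tolerance;
}

shouldRollback({ before: 0.47, after: 0.89 }); // improved: keep the new description
shouldRollback({ before: 0.47, after: 0.30 }); // regressed: restore the backup
```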

What's New in v0.2.0

  • Full skill body evolution — Beyond descriptions: evolve routing tables and entire skill bodies using teacher-student model with structural, trigger, and quality gates
  • Synthetic eval generation — selftune evals --synthetic generates eval sets from SKILL.md via LLM, no session logs needed. Solves cold-start: new skills get evals immediately.
  • Cheap-loop evolution — selftune evolve --cheap-loop uses haiku for proposal generation and validation, sonnet only for the final deployment gate. ~80% cost reduction.
  • Batch trigger validation — Validation now batches 10 queries per LLM call instead of one-per-query. ~10x faster evolution loops.
  • Per-stage model control — --validation-model, --proposal-model, and --gate-model flags give fine-grained control over which model runs each evolution stage.
  • Auto-activation system — Hooks detect when selftune should run and suggest actions
  • Enforcement guardrails — Blocks SKILL.md edits on monitored skills unless selftune watch has been run
  • Live dashboard server — selftune dashboard --serve with SSE auto-refresh and action buttons
  • Evolution memory — Persists context, plans, and decisions across context resets
  • 4 specialized agents — Diagnosis analyst, pattern analyst, evolution reviewer, integration guide
  • Sandbox test harness — Comprehensive automated test coverage, including devcontainer-based LLM testing
  • Workflow discovery + codification — selftune workflows finds repeated multi-skill sequences from telemetry, and selftune workflows save <workflow-id|index> appends them to ## Workflows in SKILL.md
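
The batch trigger validation idea above can be sketched as follows. This is a conceptual illustration; validateBatch is a hypothetical stand-in for an LLM call, not part of selftune's API:

```typescript
// Illustrative sketch of batching only - not selftune's actual code.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

async function validateQueries(
  queries: string[],
  validateBatch: (batch: string[]) => Promise<boolean[]>, // hypothetical LLM call
): Promise<boolean[]> {
  const results: boolean[] = [];
  // One call per 10 queries instead of one per query: ~10x fewer round trips.
  for (const batch of chunk(queries, 10)) {
    results.push(...(await validateBatch(batch)));
  }
  return results;
}
```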

Commands

| Command | What it does |
| --- | --- |
| selftune status | See which skills are undertriggering and why |
| selftune evals --skill <name> | Generate eval sets from real session data (--synthetic for cold-start) |
| selftune evolve --skill <name> | Propose, validate, and deploy improved descriptions (--cheap-loop, --with-baseline) |
| selftune evolve-body --skill <name> | Evolve the full skill body or routing table (teacher-student, 3-gate validation) |
| selftune baseline --skill <name> | Measure skill value against a no-skill baseline |
| selftune unit-test --skill <name> | Run or generate skill-level unit tests |
| selftune composability --skill <name> | Measure synergy and conflicts between co-occurring skills, with workflow-candidate hints |
| selftune workflows | Discover repeated multi-skill workflows and save a discovered workflow into SKILL.md |
| selftune import-skillsbench | Import an external eval corpus from SkillsBench |
| selftune badge --skill <name> | Generate a skill health badge SVG |
| selftune watch --skill <name> | Monitor after deploy; auto-rollback on regression |
| selftune dashboard | Open the visual skill health dashboard |
| selftune replay | Backfill data from existing Claude Code transcripts |
| selftune doctor | Health check: logs, hooks, config, permissions |

Full command reference: selftune --help

Why Not Just Rewrite Skills Manually?

| Approach | Problem |
| --- | --- |
| Rewrite the description yourself | No data on how users actually talk. No validation. No regression detection. |
| Add "ALWAYS invoke when..." directives | Brittle. One agent rewrite away from breaking. |
| Force-load skills on every prompt | Doesn't fix the description. Expensive band-aid. |
| selftune | Learns from real usage, rewrites descriptions to match how you work, validates against eval sets, rolls back automatically on regressions. |

Different Layer, Different Problem

LLM observability tools trace API calls. Infrastructure tools monitor servers. Neither knows whether the right skill fired for the right person. selftune does — and fixes it automatically.

selftune is complementary to these tools, not competitive. They trace what happens inside the LLM. selftune makes sure the right skill is called in the first place.

| Dimension | selftune | Langfuse | LangSmith | OpenLIT |
| --- | --- | --- | --- | --- |
| Layer | Skill-specific | LLM call | Agent trace | Infrastructure |
| Detects | Missed triggers, false negatives, skill conflicts | Token usage, latency | Chain failures | System metrics |
| Improves | Descriptions, body, and routing automatically | | | |
| Setup | Zero deps, zero API keys | Self-host or cloud | Cloud required | Helm chart |
| Price | Free (MIT) | Freemium | Paid | Free |
| Unique | Self-improving skills + auto-rollback | Prompt management | Evaluations | Dashboards |

Platforms

Claude Code — Hooks install automatically. selftune replay backfills existing transcripts.

Codex — selftune wrap-codex -- <args> or selftune ingest-codex

OpenCode — selftune ingest-opencode

OpenClaw — selftune ingest-openclaw + selftune cron setup for autonomous evolution

Requires Bun or Node.js 18+. No extra API keys.


Architecture · Contributing · Security · Integration Guide · Sponsor

MIT licensed. Free forever. Works with Claude Code, Codex, OpenCode, and OpenClaw.