Agent Maturity Compass

Score your AI agent. Red-team it. Ship it with proof.
Open-source CLI for evidence-based trust scoring, adversarial testing, and compliance.
Works with any framework. 60 seconds to your first score.

Quick Start · Web Playground · Docs · Recipes · Community · Contribute

What is this?

AMC scores AI agents from what they actually do, not what their docs say they do.

npx agent-maturity-compass quickscore

One command. No account. No API key. You get:

A trust score — L0 (dangerous) to L5 (production-ready), based on execution evidence
A gap analysis — exactly what's weak, what's risky, and what's missing
Generated fixes — guardrails, config patches, CI gates, and compliance artifacts

Then you keep going: add adversarial testing, continuous monitoring, regulatory mapping, and fleet-wide governance — all from the same CLI.

Evaluation workflows — golden datasets, imported evals, lite scoring for non-agent apps
Business and compliance outputs — KPI correlation, leaderboards, audit binders

Works with LangChain, CrewAI, AutoGen, OpenAI Agents SDK, Claude Code, Gemini, OpenClaw, and more — with zero or near-zero integration friction.

Why should I care?

Today, many agents are evaluated by what they claim in docs, prompts, or self-reported checklists. That is structurally weak.

AMC focuses on execution-verified evidence.

How agents are evaluated today	How AMC evaluates
Agent says "I'm safe" → Score: 100 ✅	AMC tests the agent and inspects evidence → Real score may be 16 ❌
Self-reported documentation	Execution-verified evidence
Keyword matching	Weighted trust evidence
"Trust me, bro"	Cryptographic proof chains

That is the entire thesis: trust, but verify — with receipts.

⚡ 60 Seconds to Your First Score

# Install globally (or use npx below)
npm i -g agent-maturity-compass

# Score your agent
cd your-agent-project
amc quickscore

Or skip the install entirely:

npx agent-maturity-compass quickscore

Want it even faster?

amc quickscore --rapid           # skip optional questions, get a score in seconds

More install methods

curl (no Node required)

curl -fsSL https://raw.githubusercontent.com/thewisecrab/AgentMaturityCompass/main/install.sh | bash

Homebrew

brew tap thewisecrab/amc && brew install agent-maturity-compass

Docker

docker run -it --rm ghcr.io/thewisecrab/amc-quickstart amc quickscore

From source

git clone https://github.com/thewisecrab/AgentMaturityCompass.git
cd AgentMaturityCompass && npm ci && npm run build && npm link

🔍 How AMC Compares

	AMC	Observability platforms	Eval frameworks	Manual checklists
Evidence model	Execution-verified, cryptographic proofs	Logs and metrics, no trust scoring	Test pass/fail, no maturity model	Self-reported
Adversarial testing	147 attack simulations built in	Not a focus	Partial (prompt-level only)	None
Compliance mapping	EU AI Act, ISO 42001, NIST, SOC 2, OWASP	Not included	Not included	Manual, labor-intensive
Framework support	14 adapters, zero code changes	Framework-specific agents	Framework-specific	N/A
Cost	Free, open source (MIT)	Per-seat/month pricing	Free to paid	Free but manual
Time to first result	60 seconds	Hours to days	Minutes to hours	Days to weeks

AMC is not an observability tool and not an eval harness. It is a trust scorecard — it tells you whether your agent is safe to ship, with cryptographic evidence, and generates the compliance artifacts to prove it.

🧪 What AMC Tests

235 Diagnostic Questions × 5 Dimensions

Dimension	Questions	What It Measures
Strategic Agent Operations	16	Mission clarity, scope adherence, cost governance, operational intelligence
Agent Leadership	20	Governance structure, EU AI Act readiness, proactive risk management, business continuity
Agent Culture	94	Feedback loops, forecast legitimacy, persona governance, UX honesty, over-compliance detection, social alignment
Agent Resilience	52	Graceful degradation, circuit breakers, memory safety, threat resistance, fact/simulation boundaries
Agent Skills	53	Tool mastery, injection defense, DLP, scenario traceability, replay safety

147 Assurance Packs

Category	Examples
Prompt Injection	System tampering, role hijacking, jailbreaks
Exfiltration	Secret leakage, PII exposure, data boundary violations
Adversarial	TAP/PAIR, Crescendo, Skeleton Key, best-of-N
Context Leakage	EchoLeak, cross-session bleed, memory poisoning
Supply Chain	Dependency attacks, MCP server poisoning, SBOM integrity
Behavioral	Sycophancy, self-preservation, sabotage, over-compliance

40 Industry Domain Packs

Sector	Packs	Key Regulations
🏥 Health	9	HIPAA, FDA 21 CFR Part 11, EU MDR, ICH E6(R3)
💰 Wealth	5	MiFID II, PSD2, EU DORA, MiCA, FATF
🎓 Education	5	FERPA, COPPA, IDEA, EU AI Act Annex III
🚇 Mobility	5	UNECE WP.29, ETSI EN 303 645, EU NIS2
💡 Technology	5	EU AI Act Art. 13, EU Data Act, DSA Art. 34
🌿 Environment	6	EU Farm-to-Fork, REACH, IEC 61850
🏛️ Governance	5	EU eIDAS 2.0, UNCAC, UNGPs

🔮 Simulation & Forecast Evaluation Lane

Dedicated evaluation lane for simulation engines, forecast systems, and synthetic social environments. 5 scored dimensions:

Dimension	Weight	Questions	What it evaluates
Forecast Legitimacy	25%	AMC-6.1–6.10	Uncertainty expression, calibration, scenario vs prediction framing
Boundary Integrity	20%	AMC-6.11–6.17, 6.37–6.42	Fact/inference/simulation separation, writeback governance
Synthetic Identity	20%	AMC-6.18–6.25, 6.48–6.52	Persona governance, real-person representation controls
Simulation Validity	20%	AMC-6.30–6.36	Mode collapse detection, population diversity, historical calibration
Scenario Provenance	15%	AMC-6.26–6.29, 6.53–6.57	End-to-end traceability, replay capability, interaction safety

amc score simulation-lane --system-type simulation-engine              # interactive
amc score simulation-lane --system-type forecast-decision-support --json  # JSON output
amc score simulation-lane --system-type synthetic-social-environment --responses answers.json
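
The lane weights in the table above sum to 100%, so one plausible reading is that the lane score is a weighted average of the five dimension scores. The sketch below illustrates that arithmetic only. The weights come from the table; the 0-100 score range, the `lane_score` function, and the dictionary keys are assumptions made for illustration, not AMC's actual API or formula.

```python
# Illustrative sketch: weights are from the table above, everything else is assumed.
LANE_WEIGHTS = {
    "forecast_legitimacy": 0.25,
    "boundary_integrity": 0.20,
    "synthetic_identity": 0.20,
    "simulation_validity": 0.20,
    "scenario_provenance": 0.15,
}


def lane_score(dimension_scores: dict) -> float:
    """Weighted average of per-dimension scores (each assumed to be 0-100)."""
    missing = LANE_WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(LANE_WEIGHTS[name] * dimension_scores[name] for name in LANE_WEIGHTS)


if __name__ == "__main__":
    example = {
        "forecast_legitimacy": 80,
        "boundary_integrity": 60,
        "synthetic_identity": 70,
        "simulation_validity": 50,
        "scenario_provenance": 90,
    }
    print(round(lane_score(example), 1))
```

Under these assumptions a system that is strong on provenance but weak on simulation validity still loses ground, because the 20% weight on validity outweighs the 15% weight on provenance.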

79 Scoring Modules

Selected modules:

Calibration gap (confidence vs reality)
Evidence conflict detection
Gaming resistance (adversarial score inflation)
Sleeper agent detection (context-dependent behavior)
Policy consistency (pass^k reliability)
Factuality (parametric, retrieval, grounded)
Memory integrity & poisoning resistance
Alignment index (safety × honesty × helpfulness)
Over-compliance detection (H-Neurons, arXiv:2512.01797)
Monitor bypass resistance (arXiv:2503.09950)
Trust-authorization synchronization (arXiv:2512.06914)
MCP compliance scoring
Identity continuity tracking
Behavioral transparency index
Forecast legitimacy (epistemic honesty, calibration, uncertainty)
Fact/simulation boundary (provenance separation, writeback governance)
Synthetic identity governance (persona labeling, real-person controls)
Simulation validity (mode collapse, population diversity)
Scenario provenance (traceability, replay, interaction safety)
And 60+ more...
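
To make one of the listed modules concrete: "pass^k reliability" usually means the chance that an agent succeeds on all of k independent attempts at the same task, which punishes flaky behavior far more than a single-run pass rate. The sketch below shows a commonly used estimator for that quantity from n recorded trials with c successes. The function name is hypothetical, and AMC's policy consistency module may compute the metric differently.

```python
from math import comb


def pass_hat_k(n: int, c: int, k: int) -> float:
    """Estimate pass^k as C(c, k) / C(n, k).

    This is the probability that k trials drawn without replacement from
    n recorded trials (c of them successful) are all successes.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    return comb(c, k) / comb(n, k)


if __name__ == "__main__":
    # An agent that passes 8 of 10 runs looks fine at k=1 and much weaker at k=4.
    print(pass_hat_k(10, 8, 1))
    print(pass_hat_k(10, 8, 4))
```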

🏗️ Architecture

Agent (untrusted)
    │
    ▼
AMC Gateway ──── transparent proxy, agent doesn't know it's being watched
    │
    ▼
Evidence Ledger ──── Ed25519 signatures + Merkle tree proof chains
    │
    ▼
Scoring Engine ──── evidence-weighted diagnostics, 79 scoring modules, 147 assurance packs
    │
    ▼
AMC Studio ──── dashboard + API + CLI + reports
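
The Evidence Ledger stage mentions Merkle tree proof chains. As a rough illustration of why that makes evidence tamper-evident, the sketch below computes a Merkle root over a batch of records with SHA-256 from the Python standard library. The record format, the leaf and node prefixes, and the handling of odd-sized levels are assumptions, not AMC's ledger format. Signing the root with Ed25519 is left out because it needs a third-party library.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(records: list) -> bytes:
    """Return a root hash over byte-string records; editing any record changes it."""
    if not records:
        raise ValueError("need at least one record")
    # Prefix leaves with 0x00 and interior nodes with 0x01 so that a leaf
    # can never be mistaken for an interior node.
    level = [_sha256(b"\x00" + record) for record in records]
    while len(level) > 1:
        paired = [
            _sha256(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level) - 1, 2)
        ]
        if len(level) % 2 == 1:
            paired.append(level[-1])  # carry an unpaired node up unchanged
        level = paired
    return level[0]


if __name__ == "__main__":
    evidence = [b"tool_call:search", b"tool_call:write_file", b"response:ok"]
    print(merkle_root(evidence).hex())
```

A verifier who holds only the signed root can detect any later edit to a recorded event, because the recomputed root will no longer match.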

Evidence Trust Tiers

Tier	Weight	How
`OBSERVED_HARDENED`	1.1×	AMC-controlled adversarial scenarios
`OBSERVED`	1.0×	Captured via gateway proxy
`ATTESTED`	0.8×	Cryptographic attestation
`SELF_REPORTED`	0.4×	Agent's own claims (capped)
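
One way to read the tier weights is as multipliers in a weighted mean, so that observed behavior counts for more than claims. The sketch below shows that arithmetic. The weights come from the table above; the 0-1 score range, the `weighted_evidence_score` function, and the aggregation rule are assumptions for illustration. The cap on `SELF_REPORTED` evidence is not specified here, so it is not modeled.

```python
# Weights from the Evidence Trust Tiers table; the aggregation rule is assumed.
TIER_WEIGHTS = {
    "OBSERVED_HARDENED": 1.1,
    "OBSERVED": 1.0,
    "ATTESTED": 0.8,
    "SELF_REPORTED": 0.4,
}


def weighted_evidence_score(items: list) -> float:
    """Tier-weighted mean of (tier, score) pairs, with scores assumed in [0, 1]."""
    if not items:
        raise ValueError("no evidence items")
    total_weight = sum(TIER_WEIGHTS[tier] for tier, _ in items)
    weighted_sum = sum(TIER_WEIGHTS[tier] * score for tier, score in items)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # A perfect self-reported claim next to weak observed behavior.
    items = [("SELF_REPORTED", 1.0), ("OBSERVED", 0.2)]
    print(round(weighted_evidence_score(items), 3))
```

In this toy case the agent's own perfect claim cannot lift the result far above what was observed, which is the intent behind weighting tiers differently.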

Maturity Scale

Level	Name	Meaning
L0	Absent	No safety controls
L1	Initial	Some intent, nothing operational
L2	Developing	Works on happy path, breaks at edges
L3	Defined	Repeatable, measurable, auditable (EU AI Act minimum)
L4	Managed	Proactive, risk-calibrated, cryptographic proofs
L5	Optimizing	Self-correcting, continuously verified

Product Family

AMC is one trust stack with eight named product surfaces:

Product	What it does
Score	Evidence-weighted maturity diagnostics and trust scoring
Shield	Adversarial assurance packs and attack simulations
Enforce	Policy controls, approvals, and governance workflows
Vault	Signatures, keys, and tamper-evident proof infrastructure
Watch	Traces, anomalies, monitoring, and operational drift detection
Fleet	Multi-agent oversight, comparison, inventory, and governance
Passport	Portable identity and credential artifacts for agents
Comply	Compliance mappings, audit binders, and governance reporting

📋 Recipes — Copy-Paste Examples

Score any agent in one line

npx agent-maturity-compass quickscore                    # quick score
npx agent-maturity-compass quickscore --eu-ai-act        # + EU AI Act check
npx agent-maturity-compass quickscore --share            # shareable link

Wrap an existing agent (zero code changes)

# LangChain
amc wrap langchain -- python my_agent.py

# CrewAI
amc wrap crewai -- python crew.py

# AutoGen
amc wrap autogen -- python autogen_app.py

# OpenClaw
amc wrap openclaw-cli -- openclaw run

# Claude Code
amc wrap claude-code -- claude "analyze this code"

# Any CLI agent
amc wrap generic-cli -- python my_bot.py

Red-team your agent

amc assurance run --scope full                           # full assurance library
amc assurance run --pack prompt-injection                # specific attack
amc assurance run --pack adversarial-robustness          # TAP/PAIR/Crescendo
amc assurance run --format sarif                         # export for security tools

Inspect traces and operational drift

amc observe timeline                                     # score history + evidence volume
amc observe anomalies                                    # volatility / regressions / weirdness
amc trace list                                           # recent agent sessions
amc trace inspect <trace-id>                             # inspect tool calls and trust tiers

Build golden datasets and run evals

amc dataset create support-bot                           # create a reusable eval dataset
amc dataset add-case support-bot --prompt "..." --expected "..."
amc dataset run support-bot                              # run eval cases
amc eval import --format promptfoo --file results.json   # import external eval results
amc lite-score                                           # score a non-agent chatbot / LLM app

Business, inventory, and reporting

amc business kpi                                         # correlate maturity to outcomes
amc business report                                      # stakeholder-ready business summary
amc leaderboard show                                     # compare agents across a fleet
amc inventory scan --deep                                # discover agents, frameworks, model files
amc comms-check --text "Guaranteed 40% return" --domain wealth

Auto-fix everything

amc fix                          # generate guardrails + CI gate + governance docs
amc fix --target-level L4        # target a specific level
amc guide --go                   # detect framework → apply guardrails to config
amc guide --watch                # continuous monitoring + auto-update

Compliance in one command

amc audit binder create --framework eu-ai-act            # EU AI Act evidence binder
amc compliance report --framework iso-42001              # ISO 42001 report
amc domain assess --domain health                        # HIPAA assessment
amc domain assess --domain wealth                        # MiFID II / DORA

GitHub Actions — CI trust gate

# .github/workflows/amc.yml — copy this entire file
name: AMC Trust Gate
on:
  pull_request:
  push:
    branches: [main]

jobs:
  amc-score:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: thewisecrab/AgentMaturityCompass/amc-action@main
        with:
          agent-id: my-agent
          target-level: 3
          fail-on-drop: true
          comment: true
          upload-artifacts: true
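
The workflow inputs `target-level` and `fail-on-drop` suggest a two-part gate: a minimum maturity level, plus a check that the score has not regressed. The sketch below is one guess at that decision logic, written out to make the idea concrete. The `gate_passes` function and its parameters are hypothetical, and the action's real behavior may differ.

```python
from typing import Optional


def gate_passes(
    level: int,
    score: float,
    target_level: int,
    previous_score: Optional[float] = None,
    fail_on_drop: bool = True,
) -> bool:
    """Assumed gate: fail below the target level, or on a score regression."""
    if level < target_level:
        return False
    if fail_on_drop and previous_score is not None and score < previous_score:
        return False
    return True


if __name__ == "__main__":
    print(gate_passes(level=3, score=72.5, target_level=3, previous_score=70.0))
    print(gate_passes(level=3, score=68.0, target_level=3, previous_score=70.0))
```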

Badge for your README

<!-- Add this to your README -->
[![AMC Score](https://img.shields.io/badge/AMC-L3_(72.5)-green?logo=data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCAyNCAyNCI+PHBhdGggZmlsbD0iI2ZmZiIgZD0iTTEyIDJMMiA3bDEwIDUgMTAtNXptMCA5bC04LjUtNC4yNUwyIDEybDEwIDUgMTAtNXptMCA5bC04LjUtNC4yNUwyIDIxbDEwIDUgMTAtNXoiLz48L3N2Zz4=)](https://github.com/thewisecrab/AgentMaturityCompass)


🔌 14 Framework Adapters

Zero code changes. One environment variable.

amc wrap <adapter> -- <your command>

Adapter	Command
LangChain	`amc wrap langchain -- python app.py`
LangGraph	`amc wrap langgraph -- python graph.py`
CrewAI	`amc wrap crewai -- python crew.py`
AutoGen	`amc wrap autogen -- python autogen_app.py`
OpenAI Agents SDK	`amc wrap openai-agents -- python agent.py`
LlamaIndex	`amc wrap llamaindex -- python rag.py`
Semantic Kernel	`amc wrap semantic-kernel -- dotnet run`
Claude Code	`amc wrap claude-code -- claude "task"`
Gemini	`amc wrap gemini -- gemini chat`
OpenClaw	`amc wrap openclaw-cli -- openclaw run`
OpenHands	`amc wrap openhands -- openhands run`
Python SDK	`amc wrap python-amc-sdk -- python app.py`
Generic CLI	`amc wrap generic-cli -- python bot.py`
OpenAI-compatible	`amc wrap openai-compat -- node server.js`

Full adapter docs

📊 Compliance Mapping

Framework	Coverage
EU AI Act	12 article mappings + audit binder generation
ISO 42001	Clauses 4-10 mapped to AMC dimensions
NIST AI RMF	Risk management framework alignment
SOC 2	Trust service criteria mapping
OWASP LLM Top 10	Full coverage (10/10)

🚀 Install

npm (recommended)

npm i -g agent-maturity-compass

npx (no install)

npx agent-maturity-compass quickscore

Homebrew

brew tap thewisecrab/amc && brew install agent-maturity-compass

curl

curl -fsSL https://raw.githubusercontent.com/thewisecrab/AgentMaturityCompass/main/install.sh | bash

Docker

docker run -it --rm ghcr.io/thewisecrab/amc-quickstart amc quickscore

From source

git clone https://github.com/thewisecrab/AgentMaturityCompass.git
cd AgentMaturityCompass && npm ci && npm run build && npm link

☁️ Deploy (One-Click)

Platform	Deploy
Docker Compose	`cd docker && docker compose up`
Vercel
Railway

Pricing

The full trust stack is free and MIT licensed. The only paid surface is Industry Packs.

Tier	What you get
Free / Open Source	The full core trust stack: Score, Shield, Enforce, Vault, Watch, Fleet, Passport, Comply, all 14 adapters, 481 CLI commands, browser playground, CI gates
Pro	Everything in Free + selected Industry Packs for your regulated verticals
Enterprise	Everything in Pro + all 40 Industry Packs + priority support + custom pack development + deployment assistance

Industry Packs are 40 sector-specific domain packs (healthcare, finance, education, government, etc.) that require ongoing regulatory research and maintenance. The core trust stack stays free forever.

Choose Your Path

Path	Best for	Start here
Browser	First-touch evaluation, demos, understanding scoring	Web Playground
CLI	Real agent scoring, evidence capture, shareable outputs	`npx agent-maturity-compass quickscore`
CI/CD	Release gates, score thresholds, PR comments	CI Templates
Enterprise	Self-hosted, managed deployment	Deployment Options

Start by persona

Solo builder / OSS maintainer → docs/SOLO_DEV_PATH.md
Platform / engineering team → docs/PLATFORM_PATH.md
Security / compliance → docs/SECURITY_PATH.md

📚 Docs


Quickstart (5 min)	Agent Guide
Solo Dev Quickstart	Platform Engineer Quickstart
Security & Compliance Quickstart	Troubleshooting
CLI Reference (481 commands)	Architecture
Compatibility Matrix	Starter Blueprints
Install Packages	Support Policy
Release Cadence	CI Templates
Hardening Guide	Community
Assurance Lab	Domain Packs
EU AI Act Compliance	Multi-Agent Trust
Executive Overview	White Paper
Example Projects	Web Playground

More docs

docs/INDEX.md — full documentation index
docs/START_HERE.md — orientation guide
docs/WHY_AMC.md — the case for AMC
docs/USE_CASES.md — use case gallery
docs/PERSONAS.md — role-based guides
docs/AFTER_QUICKSCORE.md — what to do after your first score
docs/EXAMPLES_INDEX.md — example index
docs/RECIPES.md — extended recipes
docs/DEPLOYMENT_OPTIONS.md — deployment options
docs/PRODUCT_EDITIONS.md — product editions
docs/PRICING.md — pricing details
docs/BUYER_PACKAGES.md — buyer packages
docs/SERVICES_AND_SUPPORT.md — services and support
docs/COMMUNITY_SHOWCASE.md — community showcase
docs/RELEASE_HIGHLIGHTS.md — release highlights
docs/BENCHMARK_GALLERY.md — benchmark gallery
docs/SPONSORING.md — sponsorship
docs/COMMUNITY_SUPPORT.md — community and support

Single-binary install (experimental)

AMC now includes an experimental Node SEA (single executable application) packaging path for host-specific single-binary builds:

npm run build
npm run build:sea

The build path is wired in and produces SEA artifacts plus a manifest. Runtime verification is still experimental and host-sensitive. See docs/SINGLE_BINARY.md for the honest status and caveats.

Nightly compatibility matrix

AMC now includes a scheduled GitHub Actions workflow that validates packaged CLI installs across a small OS/Node matrix and uploads JSON artifacts for inspection:

workflow: .github/workflows/nightly-compatibility-matrix.yml
current matrix: ubuntu-latest + macos-latest, Node 20 + 22
checks: packed install, doctor --json, quickscore --json, lite-score --help, comms-check --help

Workspace config profiles (MVP)

AMC now supports lightweight workspace config presets for .amc/amc.config.yaml:

amc init --profile dev
amc quickstart --profile ci
amc config profile prod

Current MVP behavior:

dev → shared trust boundary, proxy env enabled
ci → isolated trust boundary, proxy env enabled
prod → isolated trust boundary, proxy env disabled
explicit --trust-boundary still overrides the profile when you need it
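
The profile rules above amount to a small lookup table with one override. The sketch below restates them in code so the precedence is unambiguous. The `resolve_profile` function and the setting names are hypothetical and do not reflect the actual schema of .amc/amc.config.yaml.

```python
from typing import Optional

# Profile behavior as described in the list above; key names are assumed.
PROFILES = {
    "dev": {"trust_boundary": "shared", "proxy_env": True},
    "ci": {"trust_boundary": "isolated", "proxy_env": True},
    "prod": {"trust_boundary": "isolated", "proxy_env": False},
}


def resolve_profile(profile: str, trust_boundary: Optional[str] = None) -> dict:
    """Return the profile's settings, letting an explicit trust boundary win."""
    if profile not in PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    settings = dict(PROFILES[profile])  # copy so the presets stay unchanged
    if trust_boundary is not None:
        settings["trust_boundary"] = trust_boundary
    return settings


if __name__ == "__main__":
    print(resolve_profile("prod"))
    print(resolve_profile("dev", trust_boundary="isolated"))
```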

🤝 Contributing

AMC is MIT licensed. We welcome contributions — especially new assurance packs, domain packs, framework adapters, and scoring modules.

git clone https://github.com/thewisecrab/AgentMaturityCompass.git
cd AgentMaturityCompass && npm ci && npm test   # 4,161 tests

→ CONTRIBUTING.md — includes guides for writing packs, mapping research papers, and adding adapters.

Good first contributions

New assurance pack — model a new attack scenario (guide)
New domain pack — add industry-specific questions (guide)
New adapter — support another agent framework (guide)
Research paper → module — turn arXiv findings into scoring logic (guide)

📄 License

MIT — public trust infrastructure for the age of AI agents.

235 diagnostic questions · 147 assurance packs · 40 domain packs · 14 adapters · 79 scoring modules · 4,161 tests
Stop trusting. Start verifying.
