runtime defense for CLI AI agents. intercepts tool calls before execution and enforces security policy.
there's a single HTML page at docs/target.html styled to look like normal "CloudSync" tool documentation. every section of that page is poisoned with a different prompt injection: HTML comments, white-on-white text, zero-width Unicode, display:none divs, HTML entity encoding, link title attributes, tiny-font spans, fake "agent instruction" blockquotes. 20+ attack payloads total.
docs/run-attacks.sh replays every injection against sentinel evaluate:
./target/release/sentinel install --enforce
SENTINEL=./target/release/sentinel ./docs/run-attacks.sh
20/20 attacks blocked at the hook layer, before any tool ran. full write-up and attack matrix at docs/index.html (or stresstestor.github.io/sentinel).
CLI agents like Claude Code and Codex have file system access, shell execution, and code modification capabilities. prompt injection can make them exfiltrate credentials, delete files, or modify production configs. the model-level safety layer is not sufficient on its own: DeepSeek R1, for example, scored 0/10 on harmful refusals in adversarial evaluation.
nobody is defending at the runtime layer. sentinel fixes that.
sentinel hooks into Claude Code's PreToolUse system. every tool call (Bash, Edit, Write, Read) passes through sentinel before execution. sentinel evaluates the call against your security policy and either allows, warns, or blocks it.
you type a prompt
│
claude code decides to run: cat ~/.aws/credentials
│
sentinel intercepts the tool call
│
policy says: ~/.aws/* → BLOCK (credential access)
│
tool call denied. that read never happens.
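a minimal sketch of the shape of that flow, in Python rather than sentinel's Rust. it assumes the hook receives the tool call as JSON with `tool_name` and `tool_input` fields and signals a denial with exit code 2, which is how Claude Code documents PreToolUse hooks. `BLOCKED_DIR` and `decide` are hypothetical names, and the substring check is deliberately naive; it is not sentinel's matching logic.

```python
import json
import os

# Hypothetical single-rule policy: anything under ~/.aws is off limits.
BLOCKED_DIR = os.path.expanduser("~/.aws")

ALLOW, BLOCK = 0, 2  # assumption: exit code 2 tells the agent to deny the call


def decide(raw_event: str) -> int:
    """Return the exit code a hook would use for one tool-call event."""
    try:
        tool_input = json.loads(raw_event)["tool_input"]
    except (ValueError, KeyError, TypeError):
        return BLOCK  # fail closed on input that cannot be inspected
    if not isinstance(tool_input, dict):
        return BLOCK

    # Read/Edit/Write carry a file path; Bash carries a command string.
    text = str(tool_input.get("file_path") or tool_input.get("command") or "")
    expanded = text.replace("~", os.path.expanduser("~"))
    return BLOCK if BLOCKED_DIR in expanded else ALLOW
```

a real hook would read the event from stdin and exit with the returned code, e.g. `sys.exit(decide(sys.stdin.read()))`.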
the deterministic path layer is the part you can lean on: a deny on ~/.aws/* holds no matter how the path is spelled (absolute, $HOME, symlink, case, glob). the command rules (exfil, rm -rf, fetch-exec) raise the cost of the obvious attacks, but a shell has infinite spellings and a PreToolUse hook never sees a child process. treat those as cost, not a wall. more in supply-chain hardening.
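to make "no matter how the path is spelled" concrete, here is a rough Python sketch of canonicalize-then-match. it covers only some of the spellings listed above (absolute, `$HOME`, `..`, symlink, case), the function names are made up for illustration, and sentinel's Rust implementation may work differently.

```python
import fnmatch
import os


def canonical(path: str) -> str:
    """Expand ~ and $VARS, resolve symlinks and '..', then case-fold."""
    expanded = os.path.expanduser(os.path.expandvars(path))
    return os.path.realpath(expanded).casefold()


def is_denied(path: str, deny_glob: str = "~/.aws/*") -> bool:
    """Compare the canonical path against the canonical deny pattern."""
    pattern = canonical(deny_glob)
    return fnmatch.fnmatchcase(canonical(path), pattern)
```

case-folding both sides over-blocks on case-sensitive filesystems, which is the conservative direction for a deny rule.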
cargo install sentinel-guard
sentinel install # audit mode (logs only, doesn't block)
sentinel install --enforce # enforcement mode (blocks violations)
(the crate name is sentinel-guard because sentinel was already taken on crates.io. the binary is still sentinel.)
that's it. sentinel writes a PreToolUse hook into ~/.claude/settings.json and a default policy with sane deny rules (credential paths, recursive deletion, pipe-to-shell, data-exfil over curl/wget, secret patterns, and self-protection of its own policy, binary, and hook entry).
sentinel starts in audit mode. it logs what WOULD be blocked but doesn't actually block anything. you see the log and think "wow, sentinel would have caught 3 dangerous actions today." when you're ready, switch to enforce mode.
before installing the defense layer, see how vulnerable your agent actually is:
sentinel audit --agent claude
this runs the PromptPressure attack corpus (220+ adversarial sequences across 8 behavioral dimensions) against your agent in a sandbox. the report shows exactly where your agent is vulnerable.
the default policy lives at ~/.sentinel/policy.toml:
[policy]
mode = "audit"
on_failure = "closed"
default = "warn"
[[deny.paths]]
pattern = "~/.ssh/*"
action = "block"
reason = "SSH key access"
[[deny.commands]]
pattern = 'rm\s+-rf\s+/.*'
action = "block"
reason = "recursive root deletion"
[[deny.secrets]]
pattern = 'AKIA[0-9A-Z]{16}'
action = "block"
reason = "AWS access key in command args"
deny rules evaluate first. glob patterns for paths, regex for commands and secrets.
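a small Python sketch of that evaluation order, using the three rules from the policy excerpt. the tuple layout and the `evaluate` / `enforce` functions are illustrative assumptions, not sentinel's API; the point is only that deny rules are checked first, the default applies otherwise, and audit mode logs without stopping anything.

```python
import fnmatch
import os
import re

# Rules copied from the policy excerpt; the data layout is illustrative.
DENY_PATHS = [("~/.ssh/*", "SSH key access")]
DENY_COMMANDS = [(r"rm\s+-rf\s+/.*", "recursive root deletion")]
DENY_SECRETS = [(r"AKIA[0-9A-Z]{16}", "AWS access key in command args")]
DEFAULT_ACTION = "warn"


def evaluate(command: str = "", path: str = "") -> tuple[str, str]:
    """Return (action, reason); deny rules are checked before the default."""
    if path:
        full = os.path.expanduser(path)
        for glob, reason in DENY_PATHS:  # globs for paths
            if fnmatch.fnmatch(full, os.path.expanduser(glob)):
                return "block", reason
    for pattern, reason in DENY_COMMANDS + DENY_SECRETS:  # regex for the rest
        if command and re.search(pattern, command):
            return "block", reason
    return DEFAULT_ACTION, "no deny rule matched"


def enforce(action: str, mode: str) -> bool:
    """True if the call is actually stopped; audit mode only logs."""
    return action == "block" and mode == "enforce"
```

one thing the sketch makes visible: as written, `rm\s+-rf\s+/.*` matches `rm -rf` on any absolute path, not only `/` itself, while `rm -rf ./build` falls through to the default.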
| tier | what | status |
|---|---|---|
| 1. policy | deterministic deny/allow rules — path canonicalization, shell-aware command matching, secret patterns, fail-closed on un-inspectable input | active — runs on every tool call |
| 2. heuristic | aho-corasick patterns from the attack corpus + multi-turn context | implemented, not yet wired into the hook path (see roadmap) |
| 3. LLM classifier | secondary model for ambiguous inputs | planned — interface only, not implemented |
Enforcement today is the Tier-1 policy engine. It's the deterministic, zero-false-positive layer and it's what blocks the attacks in the demo. Tiers 2 and 3 are scaffolding for defense-in-depth: the heuristic analyzer is written but isn't called on the evaluate hot path yet (wiring it needs a concurrency-safe context buffer and a false-positive budget), and the LLM classifier is an interface stub. Don't rely on 2 or 3 being active.
the self-propagating npm/pypi worms in the shai-hulud / Miasma family inject persistence and steal credentials through package lifecycle scripts. the default policy now covers the part of that an agent runtime can actually see:
- self-protect. the agent can't write `~/.sentinel/policy.toml`, can't overwrite the `sentinel` binary at the common install paths, and can't rewrite `~/.claude/settings.json` to drop sentinel's own hook entry (that last one is content-aware: ordinary settings edits stay warn, a write that removes the `sentinel evaluate` hook escalates to block). a guard that lets an injected agent flip itself to audit mode, delete the cop, or unhook itself is not a guard. all blocked at the tool-call layer (your own editor, and `sentinel install`, are unaffected since they don't go through the hook).
- credential coverage. more credential files and secret formats: `~/.npmrc`, `~/.kube/config`, `~/.config/gcloud/`, `~/.azure/` (block), plus GCP / Azure / Vault / kubeconfig secret content in writes.
- warn-level tripwires for the agent-driven version of the TTPs: writes to other agents' hook configs (`.claude/settings.json`, `~/.codex`, `~/.gemini`, `.vscode/tasks.json`), LaunchAgent / systemd-user persistence units, and `npm publish` / `npm token` / `gh repo create --public`. these are warn, not block, because developers do all of them legitimately.
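the content-aware settings rule can be pictured roughly like this. the Python below is a sketch, not sentinel's code: it assumes the documented Claude Code settings layout (`hooks` → `PreToolUse` → entries with a `hooks` list of `command` strings) and treats the literal string `sentinel evaluate` as the marker; the function names are hypothetical.

```python
import json

MARKER = "sentinel evaluate"  # hook command marker named in the text above


def hook_commands(settings_text: str) -> list[str]:
    """Collect every PreToolUse hook command string from a settings JSON."""
    try:
        entries = json.loads(settings_text)["hooks"]["PreToolUse"]
    except (ValueError, KeyError, TypeError):
        return []  # unparseable or hook-less settings contain no hook
    commands = []
    for entry in entries if isinstance(entries, list) else []:
        hooks = entry.get("hooks", []) if isinstance(entry, dict) else []
        for hook in hooks if isinstance(hooks, list) else []:
            if isinstance(hook, dict):
                commands.append(str(hook.get("command", "")))
    return commands


def classify_settings_write(old_text: str, new_text: str) -> str:
    """Escalate to block only when the write drops the sentinel hook."""
    had = any(MARKER in c for c in hook_commands(old_text))
    has = any(MARKER in c for c in hook_commands(new_text))
    return "block" if had and not has else "warn"
```

a marker check like this still returns warn for a rewrite that keeps the `sentinel evaluate` string but points it at a different binary, so it raises the cost of unhooking rather than preventing it.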
now the part nobody else says out loud:
sentinel hooks the agent's tool calls. it does not and cannot see npm lifecycle scripts. when the worm's payload runs, it runs inside a child process of npm install, not as an agent tool call. that never crosses the PreToolUse hook. so these rules catch the case where a prompt injection drives the agent itself into writing a LaunchAgent or exfiltrating a credential. they do not catch the worm propagating on its own. anything that claims a runtime hook stops a lifecycle-script worm is lying to you.
two more honest caveats:
- self-protect now covers the three obvious disarm vectors (policy file, binary, hook entry), but it is not tamper-proof. it only sees the agent's own tool calls: a write from a subprocess (`sed -i`, `python -c "open(...).write(...)"`), a path assembled from a shell variable, or a settings rewrite that keeps the `sentinel evaluate` marker while repointing it at a different binary all get through. the binary tamper-by-name rules (`rm "$(command -v sentinel)"`) are command-regex, so they're evadable by var-assembly; the literal-path binary rules are the stronger half. this raises the cost of disarming sentinel. it does not make it impossible.
- `sentinel install` does not overwrite an existing `~/.sentinel/policy.toml`, and the default is audit mode. if you already have a policy, these new rules won't appear until you regenerate it, and nothing blocks until you switch to `--enforce`.
sentinel audit              run attack corpus against your agent
sentinel install            install hooks + default policy (audit mode)
sentinel install --enforce  install with enforcement
sentinel uninstall          remove hooks
sentinel evaluate           evaluate a tool call (called by the hook)
sentinel check '<json>'     dry-run a tool call against the policy and explain the decision
sentinel verify             replay pinned attacks through the policy, assert each is caught
sentinel doctor [--strict]  validate the install chain + probe liveness. the canary spawns the hooked binary itself and asserts its own deny, so a no-op shim can't fake healthy
sentinel policy-diff        show which bundled-default rules your policy is missing (read-only)
sentinel policy-lint        static-check a policy for dead rules, bad regexes, broad allows
sentinel status             show config, hooks, policy summary
sentinel corpus-update      fetch latest attack corpus
sentinel verify is also wired into CI as a regression gate: a fixed bypass that silently reopens, or a new rule that starts false-blocking benign dev work, turns the build red.
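the idea behind that gate, sketched in Python under heavy assumptions: the pinned cases, the one-rule stand-in policy, and the function names are all invented for illustration, and none of this reflects how `sentinel verify` stores or replays its cases.

```python
import re

# Stand-in policy: one command rule taken from the policy excerpt.
DENY = [re.compile(r"rm\s+-rf\s+/.*")]

# Hypothetical pinned cases: (command, expected decision).
PINNED = [
    ("rm -rf /", "block"),               # known attack, must stay caught
    ("rm  -rf   /etc", "block"),         # spacing variant of the same attack
    ("cargo build --release", "allow"),  # benign dev work, must not be blocked
]


def decide(command: str) -> str:
    """Block if any deny rule matches, otherwise allow."""
    return "block" if any(p.search(command) for p in DENY) else "allow"


def regressions(cases=PINNED) -> list[str]:
    """Return one message per pinned case whose decision drifted."""
    return [
        f"{cmd!r}: expected {want}, got {decide(cmd)}"
        for cmd, want in cases
        if decide(cmd) != want
    ]
```

a CI step built on this would fail the build whenever `regressions()` is non-empty, which covers both directions: an attack that is no longer blocked and a benign command that starts being blocked.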
- PromptPressure attack corpus (220+ sequences, 8 behavioral dimensions)
- Rust for near-zero latency in the hook path
- Claude Code's PreToolUse hook system for structured interception
MIT OR Apache-2.0

