SentinelAudit is an AI-assisted smart contract security workflow for Ethereum and the broader EVM ecosystem.
This repository is the selective open-source surface of the project: reusable triage, analysis, and validation tooling that sits underneath the product.
Core public surfaces:
- LLM triage harnesses
- Slither runner helper modules
- benchmark corpus and scorecard scripts
- release and evaluation docs
Start here:
- OSS_MODULES.md
- CONTRIBUTING.md
- CODE_OF_CONDUCT.md
SentinelAudit is not just "run Slither and summarize it." The system is designed to:
- assemble a compilable workspace from selected repo files
- run static analysis against that workspace
- structure findings into report-ready objects
- carry bounty scope into triage, validation, and dossier generation
- separate first-party findings from dependency risk
- keep lower-signal output in research lanes instead of promoting everything
- support validation workflows instead of stopping at scanner output
The current product direction is captured in ROADMAP.md.
This repo is meant to expose reusable security workflow building blocks:
- structured triage harnesses
- repo-aware compile and scan helpers
- public benchmark corpus and scorecard generation
- validation runner patterns
- release and evaluation methodology
Private product layers such as billing, auth, customer history, and internal audit-intelligence operations are intentionally kept out of the public surface.
SentinelAudit is being prepared for a selective public release.
The near-term plan is:
- publish reusable tooling and method-heavy pieces first
- keep billing, auth, customer data paths, and internal intelligence flows private
- avoid breaking the current product control plane while opening modules intentionally
See:
- PUBLIC_RELEASE_BOUNDARY.md
- OSS_MODULES.md
- PUBLIC_RELEASE_CHECKLIST.md
- PUBLIC_REPO_SETUP.md
- CONTRIBUTING.md
- CODE_OF_CONDUCT.md
- SECURITY.md
- LICENSE
- NOTICE
Before publishing any slice of the repo, run:
bun run audit:public-surface
flowchart LR
U["User"] --> W["web<br/>Next.js UI"]
W --> B["backend<br/>Cloudflare Worker + Hono"]
B --> S["slither runner<br/>compile + scan"]
B --> L["llm-worker<br/>structuring + fixes + dossiers"]
B --> E["echidna runner<br/>optional fuzzing"]
B --> R["R2 + DB<br/>workspace, findings, jobs, events"]
S --> B
L --> B
E --> B
B --> W
flowchart TD
A["Selected repo files"] --> B["Repo-aware workspace expansion"]
B --> C["Compile context<br/>foundry/remappings/config/vendor roots"]
C --> D["Slither detectors"]
D --> E["Normalized findings"]
E --> F["Semantic fact extraction"]
F --> F2["Dimensional fact extraction<br/>selected accounting paths"]
F2 --> G["Structured report objects"]
G --> H["Promotion policy"]
H --> I["Report findings"]
H --> J["Needs review"]
H --> K["Research notes"]
G --> L["Dependency findings lane"]
I --> N["Validation lane<br/>fuzz / deterministic / manual PoC"]
N --> O["Bounty dossier + export pack"]
H --> M["Evaluation harnesses<br/>goldset + repo + auditor review set"]
Sentinel now distinguishes between:
- first-party findings: issues in the user's project code
- dependency findings: vendored or third-party code such as lib/, vendor/, node_modules/, and package imports
- research notes: lower-signal output kept for manual review
Dependencies are still analyzed when needed for compilation and context, but they should not dominate the main report verdict by default.
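The first-party versus dependency split can be illustrated with a small path classifier. This is a minimal sketch, not Sentinel's actual implementation: the `FindingLane` type and `classifyPath` function are hypothetical names, and treating `lib/` and `vendor/` as top-level roots only (while matching `node_modules` at any depth) is an assumption.

```typescript
// Hypothetical lane names; the README only describes the categories in prose.
type FindingLane = "first_party" | "dependency";

// Classify a repo-relative source path into a lane.
// Assumption: lib/ and vendor/ count only as top-level roots,
// while node_modules may appear at any depth (e.g. in a monorepo).
function classifyPath(repoRelativePath: string): FindingLane {
  // Normalize Windows separators and a leading "./".
  const normalized = repoRelativePath.replace(/\\/g, "/").replace(/^\.\//, "");
  const segments = normalized.split("/");
  const topLevel = segments[0];

  if (topLevel === "lib" || topLevel === "vendor") return "dependency";
  if (segments.indexOf("node_modules") !== -1) return "dependency";
  return "first_party";
}
```

With this rule, `src/lib/Math.sol` stays first-party because `lib` is not the top-level directory. A real classifier would also need to resolve package imports and remappings, which this sketch leaves out.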
Sentinel is now built around a stricter promotion rule:
- detector output is not enough
- semantic facts come first
- promotion policy decides whether something becomes:
  - report_finding
  - needs_review
  - research_note
The core questions Sentinel tries to answer automatically are:
- who can call this path
- what protection exists
- who controls the dangerous argument
- whether state is finalized before external interaction
- whether the affected code is first-party or dependency code
- whether arithmetic/accounting logic mixes units like assets, shares, prices, or fee scales in a suspicious way
If those answers are weak or incomplete, Sentinel should hold the finding back instead of pretending to have higher confidence than it does.
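The "hold back when facts are weak" rule can be sketched as a small decision function. Everything here apart from the three tier names is an assumption made for illustration: the `SemanticFacts` shape, the `promote` function, and the thresholds are hypothetical, and the sketch assumes dependency code has already been routed to its own lane.

```typescript
// The three promotion tiers named in the README.
type PromotionTier = "report_finding" | "needs_review" | "research_note";

// Hypothetical fact shape: one optional answer per question above.
// `undefined` means extraction could not answer the question.
interface SemanticFacts {
  callableByAnyone?: boolean;
  hasProtection?: boolean;
  attackerControlsArgument?: boolean;
  stateFinalizedBeforeExternalCall?: boolean;
}

function promote(facts: SemanticFacts): PromotionTier {
  const answers: Array<boolean | undefined> = [
    facts.callableByAnyone,
    facts.hasProtection,
    facts.attackerControlsArgument,
    facts.stateFinalizedBeforeExternalCall,
  ];
  let known = 0;
  for (let i = 0; i < answers.length; i++) {
    if (answers[i] !== undefined) known++;
  }

  // Weak or incomplete facts: hold the finding back rather than guess.
  // The cutoff of 2 is an illustrative threshold, not a documented one.
  if (known < answers.length) {
    return known >= 2 ? "needs_review" : "research_note";
  }

  const exposed = facts.callableByAnyone === true && facts.hasProtection === false;
  const dangerous =
    facts.attackerControlsArgument === true ||
    facts.stateFinalizedBeforeExternalCall === false;

  if (exposed && dangerous) return "report_finding";
  if (exposed || dangerous) return "needs_review";
  return "research_note";
}
```

The point of the design is that a detector hit alone never reaches `report_finding`: only a complete set of facts that are both exposed and dangerous does.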
For accounting-sensitive findings, Sentinel also uses a narrow dimensional reasoning layer. This is not a blanket pass over every detector; it is applied only on selected monetary paths where unit confusion can materially affect exploitability and promotion confidence.
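As a rough illustration of what unit confusion means here, the sketch below tags operands with a unit and flags additions or subtractions that mix units. The `Unit`, `Operand`, and `checkUnits` names are hypothetical, and the rule (only additive operations are checked, since multiplying shares by a price is a legitimate conversion) is a simplifying assumption rather than Sentinel's actual dimensional layer.

```typescript
// Units mentioned in the README; the identifiers themselves are hypothetical.
type Unit = "assets" | "shares" | "price" | "fee_scale";
type Op = "add" | "sub" | "mul" | "div";

interface Operand {
  name: string;
  unit: Unit;
}

// Returns a warning string, or null when the operation looks unit-consistent.
function checkUnits(op: Op, left: Operand, right: Operand): string | null {
  const additive = op === "add" || op === "sub";
  // Adding or subtracting different units is suspicious;
  // mul/div are treated as conversions and allowed in this sketch.
  if (additive && left.unit !== right.unit) {
    return (
      "unit mismatch: " + left.name + " (" + left.unit + ") " + op + " " +
      right.name + " (" + right.unit + ")"
    );
  }
  return null;
}
```

For example, `checkUnits("add", { name: "totalAssets", unit: "assets" }, { name: "userShares", unit: "shares" })` returns a warning, while the same operands under `"mul"` return `null`.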
Typical local services:
- web at http://localhost:3000
- backend at http://localhost:8787
- workers/llm-worker at http://localhost:8788
- api/slither at http://localhost:8080
- Install dependencies:
cd web && bun install
cd ../backend && bun install
cd ../workers/llm-worker && bun install
cd ../..
- Prepare env files:
  - web/.env.local: set NEXT_PUBLIC_API_URL=http://localhost:8787
  - backend/.dev.vars: local backend worker config
  - workers/llm-worker/.dev.vars: copy from workers/llm-worker/.dev.vars.example
- Make sure Docker Desktop is running for Slither.
From D:\projects\audit\apps:
bun run dev:slither
bun run dev:echidna
bun run dev:llm
bun run dev:backend
bun run dev:web
Run the local check:
bun run check:local
This currently verifies:
- web TypeScript compiles
- backend test suite passes
- llm-worker local-safe tests pass
Local notes:
- local report structuring happens after a job reaches READY, not when the report page refreshes
- local backend should use LLM_WORKER_URL=http://127.0.0.1:8788
- backend and llm-worker must share the same LLM_WORKER_TOKEN
- local backend sends inline findings to the local worker because Wrangler's simulated local R2 is not shared across services
- backend/.dev.vars should point SLITHER_RUNNER_URL at http://localhost:8080
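The local wiring described above can be checked mechanically. The sketch below parses dotenv-style `.dev.vars` text and reports mismatches against the values listed in this README. The `parseDevVars` and `checkLocalConfig` helpers are hypothetical and not part of the repo, and the parser handles only simple `KEY=VALUE` lines.

```typescript
type DevVars = { [key: string]: string };

// Parse simple dotenv-style KEY=VALUE lines, skipping blanks and # comments.
function parseDevVars(text: string): DevVars {
  const vars: DevVars = {};
  text.split(/\r?\n/).forEach((raw) => {
    const line = raw.trim();
    if (line === "" || line.charAt(0) === "#") return;
    const eq = line.indexOf("=");
    if (eq <= 0) return;
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    vars[key] = value;
  });
  return vars;
}

// Compare backend and llm-worker vars against the local defaults in this README.
function checkLocalConfig(backend: DevVars, worker: DevVars): string[] {
  const problems: string[] = [];
  if (backend["LLM_WORKER_URL"] !== "http://127.0.0.1:8788") {
    problems.push("backend LLM_WORKER_URL should be http://127.0.0.1:8788");
  }
  if (backend["SLITHER_RUNNER_URL"] !== "http://localhost:8080") {
    problems.push("backend SLITHER_RUNNER_URL should be http://localhost:8080");
  }
  const token = backend["LLM_WORKER_TOKEN"];
  if (!token || token !== worker["LLM_WORKER_TOKEN"]) {
    problems.push("backend and llm-worker must share the same LLM_WORKER_TOKEN");
  }
  return problems;
}
```

An empty result means the two files agree with the notes above. Reading the files from disk is left to the caller so the check stays independent of the runtime.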
AI is used as a constrained layer on top of deterministic inputs.
Good uses:
- structuring normalized findings
- exploitability-oriented triage
- fix generation
- bounty dossier generation
- Echidna harness planning/generation
- proof-plan generation for deterministic tests and manual PoCs
- deterministic test generation and execution with repo-aware Foundry/Hardhat selection
Bad uses:
- inventing vulnerabilities from raw detector output
- promoting dependency noise into first-party findings
- acting like a one-click replacement for human audit judgment
The next serious quality leap is not "more AI."
It is automatic repo-local semantic extraction and stronger promotion policy:
- who can call this
- what protects it
- what input is attacker-controlled
- what state changes before and after sensitive calls
- whether the code is first-party or vendored
That is the path from "AI-assisted scanner" to "real audit triage system."
Sentinel now treats validation as a first-class lane after findings are structured.
- fuzz_target: suitable for Echidna harness generation and counterexample hunting
- deterministic_test: better validated through a crisp transaction sequence, Foundry test, and state assertions
- manual_poc: needs an attacker walkthrough, evidence capture, and reviewer-facing proof
- review_only: useful context, but not worth automated proof generation
Validation evidence can strengthen a finding:
- successful Echidna counterexamples are merged back into report findings
- successful deterministic tests from Foundry or Hardhat are merged back into report findings
- validated findings rise in report ordering
- bounty dossiers and exports now carry validation posture explicitly
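The "validated findings rise in report ordering" behaviour can be sketched as a sort. The four lane names come from this README; the `ReportFinding` shape, the numeric `severity` field, and the tie-breaking rules are assumptions made for illustration.

```typescript
// Validation lanes named in the README.
type ValidationLane = "fuzz_target" | "deterministic_test" | "manual_poc" | "review_only";

// Hypothetical finding shape for this sketch.
interface ReportFinding {
  id: string;
  severity: number; // assumption: higher means more severe
  lane: ValidationLane;
  validated: boolean; // assumption: a counterexample or deterministic test succeeded
}

// Validated findings first, then higher severity, then original order.
function orderForReport(findings: ReportFinding[]): ReportFinding[] {
  return findings
    .map((finding, index) => ({ finding, index }))
    .sort((a, b) => {
      if (a.finding.validated !== b.finding.validated) {
        return a.finding.validated ? -1 : 1;
      }
      if (a.finding.severity !== b.finding.severity) {
        return b.finding.severity - a.finding.severity;
      }
      // Explicit index tie-break keeps the result deterministic.
      return a.index - b.index;
    })
    .map((entry) => entry.finding);
}
```

The function returns a new array and leaves its input untouched, so the same findings can feed both the report and the dossier export without interfering with each other.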
Sentinel's bounty mode is not just a themed report.
It now supports:
- scope setup on /bounty
- fresh audits with bounty scope attached from the start
- attaching bounty scope to existing completed audits
- scope-aware LLM triage during structuring
- bounty dossier generation
- bounty pack export
- proof planning for manual PoC and deterministic test work
Sentinel now has three complementary trust harnesses inside workers/llm-worker:
- snippet gold set: verdict quality on isolated cases
- repo benchmark set: repo-shaped regression fixtures
- auditor review set: whether a finding is strong enough to deserve the main report
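To make "verdict quality" concrete, the sketch below scores triage verdicts against a gold set. It is not the repo's eval script: `GoldCase`, `Scorecard`, and `scoreAgainstGoldSet` are hypothetical names, and plain accuracy is just one possible metric.

```typescript
type Verdict = "report_finding" | "needs_review" | "research_note";

// Hypothetical gold-set entry: a case id and the verdict a reviewer expects.
interface GoldCase {
  id: string;
  expected: Verdict;
}

interface Scorecard {
  total: number;
  matched: number;
  accuracy: number;
  mismatches: string[]; // ids whose verdict was wrong or missing
}

function scoreAgainstGoldSet(
  gold: GoldCase[],
  actual: { [id: string]: Verdict },
): Scorecard {
  const mismatches: string[] = [];
  gold.forEach((goldCase) => {
    // A missing verdict counts as a mismatch.
    if (actual[goldCase.id] !== goldCase.expected) mismatches.push(goldCase.id);
  });
  const matched = gold.length - mismatches.length;
  return {
    total: gold.length,
    matched: matched,
    accuracy: gold.length === 0 ? 0 : matched / gold.length,
    mismatches: mismatches,
  };
}
```

Keeping the mismatched ids in the scorecard makes a regression easy to trace back to the specific case that changed.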
Useful commands:
cd workers/llm-worker
bun run test:triage
bun run eval:triage
bun run eval:triage:repo
bun run eval:triage:auditor
Sentinel now has the start of a real production-learning loop.
For each completed audit, the backend stores a private audit intelligence artifact in R2 under:
results/internal/audit-intelligence/<jobId>.json
The goal is to make real audit runs useful for improving Sentinel over time without relying on a fixed target list. These artifacts are designed for:
- batch download and offline analysis
- finding detector families that still create noisy needs_review output
- spotting low semantic-fact coverage in real repos
- building new benchmark fixtures and auditor review cases from real usage
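The offline analysis described above might start with something like the sketch below. Only the key layout `results/internal/audit-intelligence/<jobId>.json` comes from this README. The job ID character check, the `AuditArtifact` shape, and the `noisyDetectors` helper are all assumptions, since the real artifact schema is private.

```typescript
// Build the R2 key for a job's artifact.
// Assumption: job IDs contain only letters, digits, "_" and "-".
function auditIntelligenceKey(jobId: string): string {
  if (!/^[A-Za-z0-9_-]+$/.test(jobId)) {
    throw new Error("unexpected jobId: " + jobId);
  }
  return "results/internal/audit-intelligence/" + jobId + ".json";
}

// Hypothetical artifact shape, for illustration only.
interface AuditArtifact {
  jobId: string;
  findings: Array<{
    detector: string;
    tier: "report_finding" | "needs_review" | "research_note";
  }>;
}

// Count needs_review output per detector across a batch of downloaded artifacts.
function noisyDetectors(
  artifacts: AuditArtifact[],
  minCount: number,
): Array<{ detector: string; needsReview: number }> {
  const counts: { [detector: string]: number } = {};
  artifacts.forEach((artifact) => {
    artifact.findings.forEach((finding) => {
      if (finding.tier === "needs_review") {
        counts[finding.detector] = (counts[finding.detector] || 0) + 1;
      }
    });
  });
  return Object.keys(counts)
    .map((detector) => ({ detector: detector, needsReview: counts[detector] || 0 }))
    .filter((row) => row.needsReview >= minCount)
    // Highest count first, then detector name for a stable order.
    .sort((a, b) => b.needsReview - a.needsReview || (a.detector < b.detector ? -1 : 1));
}
```

Detectors that keep appearing at the top of this list across many real audits are natural candidates for new benchmark fixtures or tighter promotion rules.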
See also:
- backend/README.md
- web/README.md
- workers/llm-worker/README.md
- ROADMAP.md