diff --git a/PROMPTS.md b/PROMPTS.md index 81f4db8..84fbe73 100644 --- a/PROMPTS.md +++ b/PROMPTS.md @@ -235,16 +235,108 @@ conversation that chose it. 6. **Mutation testing** — slither-mutate, 302/302 killable mutants caught (100%) +### Session 3 — adversarial review (a different Claude session, no shared state) + +This is the part of the timeline that wasn't supposed to exist. After session 2 ended with all six tiers green and the contract pushed to GitHub, the author opened a fresh Claude Code session in their home directory and asked it to attack the contract instead of verify it. The session had no access to the previous prompt history, no knowledge of the test suite design, and no priors about what the contract should do — only the README's claims. + +- **User:** *"review the git repo of mine (dreliq9) called headless with 2 other evaluator agents"* + - Claude invoked the `/qa-hard` skill (rigorous 3-evaluator adversarial review modeled on a specialist-loop pipeline). Pulled the contract and tests via the GitHub MCP, then dispatched three evaluator agents in parallel. Each evaluator was given the same spec and file list but worked in isolation — no cross-contamination. Each was instructed to read the entire contract (~480 lines), evaluate across seven dimensions (spec compliance, correctness, reasoning quality, missed concerns, false alarms, code quality, communication quality), severity-calibrate findings, and grade the contract A through F. + - **All three evaluators independently graded the contract `D`.** All three identified the same critical bug, traced the same exact attack path, and pointed at the same lines (`Headless.sol:148` mint, `:302-327` redeem). The bug: + > The deployer mints 3,000,000 HDLS at construction. `tokensSold` starts at 0. As soon as Alice calls `buy(100)`, `tokensSold` becomes 100. The deployer can then call `redeem(100)` against their own pre-mined allocation — both guards (`balanceOf(self) >= 100` and `100 <= tokensSold`) pass, the burn comes out of the founder's 3M, and the refund (≈ Alice's payment minus fees) flows to the founder. Alice still holds 100 HDLS but `tokensSold == 0`, so her own subsequent `redeem` reverts. The deployer has just front-run a buyer's exit using a token they minted for free. + - The README's "founder is mechanically last in line on exit" claim was structurally false. The `tokensSold` counter was a supply guard, not an ordering guard, and the test author had conflated the two. + - **The damning evidence**: the unit test `test_FounderCannotRedeemBelowBaseline` actually demonstrates the exploit succeeding while the test asserts only that the founder can't redeem *more* afterward. Reading the test name, you'd conclude the founder was locked out. The author tested the wrong property and confidently named the test for what they thought they had tested. Every tier of the verification stack greenlit the bug because every tier was anchored to the same test. + - Claude then ran phase 3 (isolated weighing of each evaluator's report quality), phase 4 (synthesis with confidence tiers — HIGH/MAJORITY/CONTESTED/UNIQUE), and phase 5 (presented a synthesis report with severity-calibrated findings to the user). Synthesis grade: `D`, high confidence. +- **User:** *"fix all"* + - Claude cloned the repo to `/tmp/headless`, ran `forge test` to confirm the 54-test baseline passed, then applied the founder-taint lock fix: + 1. Added a `FounderTaintLocked` error and `mapping(address => uint256) public founderTaint`. + 2. Initialized `founderTaint[msg.sender] = FOUNDER_ALLOCATION` in the constructor. + 3. Overrode `_update` (the unified OZ ERC20 hook) to propagate taint proportionally on every account-to-account transfer. Round UP for stickiness, capped at the sender's current taint, no movement on mint or burn. The reasoning: burns are gated by the redeem precondition below, so taint never needs to move on burn; transfers must move taint or the founder could escape via a fresh address; mints can't create taint because the only mint that ever does is the founder mint in the constructor. + 4. Added a precondition in `redeem`: `if (balanceOf(msg.sender) - amount < founderTaint[msg.sender]) revert FounderTaintLocked();`. The founder cannot burn into their tainted balance. + 5. Added a `tokensSold > 0` gate in `claimAuction` to close the deploy-block self-rebate (the auction otherwise opened in the deploy block with no holders to receive the rebased premium). + - Then rewrote the test file: + - **Removed** the misleading `test_FounderCannotRedeemBelowBaseline` (it celebrated the exploit as a passing test). + - Added four new tests: + - `test_FounderCannotDrainBuyerDirect` — founder calls `redeem` directly → reverts. + - `test_FounderCannotDrainBuyerViaTransfer` — founder transfers 3M to bob → bob's `redeem` reverts. Proves taint propagates. + - `test_FounderCanRedeemBoughtTokens` — founder buys 50 fresh tokens via the curve → can redeem exactly those 50, not one wei into the allocation. + - `test_FounderTaintPropagatesProportionally` — partial transfer splits taint with sticky rounding. + - Added a `_seedCurveForAuction()` helper + `seeded` modifier; applied it to the 10 auction tests that needed `tokensSold > 0` after the new gate. + - Then rewrote the invariant test file: + - The handler now constructs the token itself, making the founder == handler, and adds `address(this)` as **actor 0**. Every redeem call with `seed % 6 == 0` is the founder attempting to drain. + - The redeem handler is wrapped in `try/catch` so taint-locked attempts don't pollute ghost variables. + - **Replaced** the previously-stub `invariant_EveryActorCanExitToPoolRoom` (whose body just re-asserted the aggregate `balance ≥ curveBackingRequired` check, which holds vacuously after a drain) with a real per-holder solvency check: + ```solidity + for each actor: + assertLe(founderTaint[actor], balanceOf(actor)); + sumNonTainted += balanceOf(actor) - founderTaint[actor]; + assertEq(sumNonTainted, tokensSold); + assertGe(address(token).balance, curveBackingRequired()); + ``` + This is the property that catches a founder drain: Alice's non-tainted balance would remain positive while `tokensSold` dropped to zero — a contradiction the assertion would flag. + - Re-ran the full stack: **57/57 unit tests pass, 10/10 invariants pass × 1.28M handler calls, 4/4 Halmos symbolic proofs pass**, Slither unchanged. The taint lock and per-holder invariant work end-to-end. +- **User:** *"fix moderate issues as well"* + - Claude applied the four moderate findings flagged during the synthesis phase: + 1. **Half-open auction window.** Changed `claimAuction` and `auctionPriceAt` to `[openBlock, openBlock + WINDOW)` instead of inclusive. The previous inclusive close block had premium = 0, allowing a fee-free 1000-HDLS mint on the final block — broke the "every trade pays a spread" symmetry. New `test_AuctionExpiredAtCloseBlock` proves the exclusion; the renamed `test_AuctionClaimableAtLastBlockBeforeClose` covers the inclusive end. `test_AuctionPriceDecaysLinearly` updated to assert `priceLast > base`. + 2. **L2 block-time sensitivity.** Converted `AUCTION_INTERVAL` and `AUCTION_WINDOW` from `constant` to `immutable` constructor parameters with `InvalidAuctionConfig` validation (`> 0`, `window <= interval`). `Deploy.s.sol` now documents recommended values for L1 (25, 25), Base/OP (150, 150), and Arbitrum (1200, 1200), and passes 25/25 as defaults. All test setUps updated to pass `(25, 25)`. + 3. **`twapCurveBase` input validation.** Added a `TwapNotChronological` named error rejecting `priorBlock > block.number` and `priorCumulative > nowCumulative` — the function previously underflow-reverted on bad inputs. + 4. **`Donated` event in `receive()`.** Added `event Donated(address indexed from, uint256 amount)` and emit it from `receive()` (gated on `msg.value > 0` so the buy/claimAuction overpayment refund path doesn't double-emit). + - One moderate finding (`auctionPriceAt` baseCost drift mid-window) was documented via a strong NatSpec ⚠ block instead of fixed via snapshotting — a snapshot mechanism would require per-auction state for an agent-UX issue, not a correctness one, and the natural overpayment-refund path already protects callers. + - Re-ran the full stack: **58/58 unit tests pass** (added the close-block expiry test, removed the misleading test = +4 net since session start), **10/10 invariants pass × 1.28M calls**, **4/4 Halmos proofs pass**, Slither unchanged. +- **User:** *(approval to push the branch and open a PR)* + - Claude branched `fix/founder-taint-lock` off main, committed all changes with a detailed message naming the critical finding and the four moderate fixes, pushed to `dreliq9/headless`, and opened [PR #1](https://github.com/dreliq9/headless/pull/1) via the GitHub MCP. The PR body documents the attack path, the fix, and the verification stack results post-fix. + +### Session 4 — repositioning as research artifact + +After the fix landed, the author and Claude discussed what to actually do with the repository. The original framing — "an autonomous agent-native token, six-tier verified, non-rug, fair launch" — could not survive the historical critical finding without disclosure. The question was whether to (a) bring the contract to market as a token, (b) productize the verification methodology as a service, or (c) reposition the whole repo as a research artifact / case study. + +The analysis: bringing the contract to market would require a real audit (~$15k–$60k), a crypto lawyer (legal exposure on the founder allocation under any reasonable Howey reading), bootstrap liquidity, distribution, monitoring, incident response, and an answer to "why would anyone hold this." The analysis flagged the founder allocation as the legal hot potato regardless of technical fixes — the taint lock prevents an economic rug but does nothing for the legal claim that the founder received tokens expecting profit from others' efforts. The pragmatic recommendation was option (c): reposition. + +- **User:** *(approved the repositioning)* + - Claude rewrote `README.md` from "verified token" framing to "research artifact" framing: + - New top-of-file framing sentence: *"A one-week experiment in AI-authored smart contract development and the verification gaps of AI-driven verification."* + - New first content section ("What this repository actually is") positioning the bug discovery as the load-bearing finding, not the contract itself. + - New "Historical critical finding" section walking through the attack path, the per-tier failure analysis, and the fix. + - New "Bounds and disclosures the original README didn't make explicit" subsection acknowledging the Halmos ≤100 ether bound, single-actor symbolic proofs, mutation-testing-measures-coverage-not-correctness, and L1-block-time sensitivity. + - New "Load-bearing invariant" subsection making explicit that the original aggregate invariant was necessary but not sufficient, and that the per-holder version added in PR #1 is the actual safety property. + - New "What v2 should look like" section listing eight forward-looking design notes for any future iteration — sub-whole-token denomination, slippage-bounded primitives, commit-reveal auctions, simulate-without-mutate views, proof-of-reserves API, max-buy-for-slippage, per-actor solvency from day one, adversarial review as a verification tier. + - New "Reproduce the historical bug" instruction in the verify-yourself section so any reader can run `git checkout f1d793c~1 && forge test --match-test test_FounderCannotRedeemBelowBaseline -vv` and watch the test pass while the founder drains alice's deposit. + - Strengthened the safety section to explicitly note that the verification stack failed to catch the critical finding when run as positive verification, and that automated stacks catch what they're told to look for. + - Removed "non-rug" and "formally verified" as standalone adjectives. + - This file (PROMPTS.md) was extended with sessions 3 and 4 in the same per-prompt format as sessions 1 and 2, so the full provenance trail is unbroken. + ## What future sessions should preserve -If anyone edits this contract — human or AI — two things must stay true +If anyone edits this contract — human or AI — three things must stay true for the "headless" name to keep earning itself: 1. **No admin, no owner, no upgrade path, no privileged role.** Not even a pause. Not even an emergency withdraw. Not even a parameter knob. -2. **The invariant:** `address(this).balance >= curveIntegral(0, tokensSold)` +2. **The aggregate invariant:** `address(this).balance >= curveIntegral(0, tokensSold)` must be preserved by every externally-callable mutating function, including any new ones. The internal `_rebase()` tightens this to equality; any new code path must call it before returning. +3. **Per-holder solvency** (added after the session 3 finding): every + holder's non-tainted balance must be redeemable from the contract's + ETH backing. The aggregate invariant alone is necessary but not + sufficient — it holds vacuously after a drain that takes `tokensSold` + to zero. The taint-lock mechanism in `_update` plus the redeem + precondition `balance - amount >= founderTaint[caller]` is the + load-bearing enforcement; the per-holder invariant in + `HeadlessInvariant.t.sol` is the load-bearing test. **Do not loosen + either without a successor mechanism that preserves the property + that no actor can take ETH that backs another actor's tokens.** Everything else is negotiable. + +The deeper methodological lesson, kept here as a permanent footnote so +nobody who works on this file in the future has to relearn it: **the +contract author and the test author cannot be the same agent on the +same session.** Session 2 produced both the contract and the +verification stack, both of which were confidently wrong about the same +property. Session 3 was a different Claude session asked to argue +against the artifact rather than verify it, and found the bug on the +first pass. If you (a future AI session, or a future human) are about +to claim a contract is verified, schedule a separate session whose +only job is to attack the claims of the verifying session. The +adversarial pass is the cheapest tier in any verification stack and +was the only one that worked here. diff --git a/README.md b/README.md index 92e41a9..2726892 100644 --- a/README.md +++ b/README.md @@ -3,88 +3,163 @@ [![verify](https://github.com/dreliq9/headless/actions/workflows/verify.yml/badge.svg)](https://github.com/dreliq9/headless/actions/workflows/verify.yml) [![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0) -> An autonomous, agent-native token. Written end-to-end by **Claude** (Anthropic's AI) at the prompt of a non-coder. No human wrote a line of the contract, the tests, or the verification stack. +> A one-week experiment in AI-authored smart contract development and the verification gaps of AI-driven verification. Every line of the contract, the tests, and the verification stack was written by **Claude** (Anthropic's AI) at the prompt of a non-coder. The interesting artifact is not the token — it's the gap a six-tier verification stack didn't catch. -**Status:** pre-audit, unaudited, not deployed. This is a verified code artifact, not a live protocol. **Do not put real ETH into this contract.** See [Safety](#safety) below. +**Status:** unaudited, never deployed, never will be by me. This repository is a research artifact and a case study, not a live protocol. **Do not put real ETH into this contract.** See [Safety](#safety). --- -## What it is +## What this repository actually is -Headless is a fair-launch ERC-20 with four interlocking mechanisms, all fully on-chain, deterministic, and callable by anyone: +A bonding-curve ERC-20 was the *vehicle*. The thing worth your attention is everything around it: -1. **Bonding-curve AMM** — `buy()` and `redeem()` price along a linear curve `price(n) = curveBase + SLOPE · n`. Both sides use the same integral, so round-tripping is net-zero minus fees. No free arbitrage via the curve. -2. **Spread fee (50 bps each side)** — every trade leaves a fee behind as excess backing, which is immediately swept into `curveBase` by the continuous rebase. Holders earn yield passively from every trade that touches the contract. -3. **Continuous rebase** — after every state-changing call, `_rebase()` runs internally and tightens the invariant `balance == curveIntegral(0, tokensSold)` to equality. No keeper, no poke ritual. -4. **Dutch auction** — every `AUCTION_INTERVAL` blocks a fresh auction opens for `AUCTION_SIZE` HDLS. Price starts at `curveCost × 1.20` and decays linearly to `curveCost` over the window. First caller takes the lot. The premium flows into `curveBase` via rebase. +1. A non-coder directed an AI to design and implement a non-trivial Solidity contract end-to-end. ([`PROMPTS.md`](PROMPTS.md) records the entire prompt timeline that produced it.) +2. The same AI (in a separate session) built a six-tier verification stack against the contract: 54 unit tests, 2 fuzz properties, 10 stateful invariants × 1.28M handler calls, Slither static analysis, 4 Halmos symbolic proofs, 302/302 killable mutants caught. +3. **Every tier passed.** The contract was claimed to be "non-rug, formally verified." +4. A subsequent **3-evaluator adversarial review** (also AI, in a fresh session, with no shared state) found a critical bug that would have allowed the deployer to drain every buyer's deposit in a single transaction. +5. The same review session designed the fix, wrote the regression tests, identified the structural reason every prior tier missed the bug, and patched four moderate findings on top. -**Built for AI trading agents, not humans.** Every event is scheduled, deterministic, and computable from on-chain state alone. Single-call `curveState()`, `auctionState()`, `oracleState()` snapshots for agent planners. On-chain TWAP cumulative tracker for composability. +The bug fix is in [PR #1](https://github.com/dreliq9/headless/pull/1). The story arc — *six tiers of automated verification rubber-stamped a contract that was trivially exploitable, and a separate adversarial AI session caught it on the first pass* — is the actual finding of this experiment. -**Headless** = no admin, no owner, no founder team, no upgrade path, no oracle, no off-chain dependency. The contract is the entire organisation for its entire lifetime. A 3% founder allocation is minted at construction — these tokens sit *below* the curve (cannot be curve-redeemed until someone else has bought above them), giving the deployer mechanical last-in-line exit. +If you only have time to read one thing in this repo, read [PR #1](https://github.com/dreliq9/headless/pull/1). -## Why it exists +## The contract (briefly) -This repo is the output of a week-long experiment: **can someone with zero coding experience ship a non-rug, formally-verified smart contract by only prompting an AI?** See [`PROMPTS.md`](PROMPTS.md) for the full prompt timeline — every design decision, every pivot, every mistake, and every fix, recorded as it happened. +Headless is a fair-launch ERC-20 with four mechanisms, all deterministic and on-chain: -The verification stack below is the interesting part. The token itself is secondary. +1. **Bonding-curve AMM.** `buy()` and `redeem()` price along a linear curve `price(n) = curveBase + SLOPE · n`. Both sides use the same integral, so round-tripping is net-zero minus fees. +2. **Spread fee (50 bps each side).** Every trade leaves a fee behind as excess backing, immediately swept into `curveBase` by the continuous rebase. Holders earn passive yield from every trade. +3. **Continuous rebase.** After every state-changing call, `_rebase()` tightens the invariant `balance == curveIntegral(0, tokensSold)` to equality. No keeper. +4. **Dutch auction.** Every `AUCTION_INTERVAL` blocks a fresh auction opens for `AUCTION_SIZE` HDLS. Price decays linearly from a 20% premium to the curve floor. Premium flows into `curveBase`. -## Verification stack +No admin, no owner, no upgrade path, no oracle, no off-chain dependency. A 3% founder allocation is minted at construction; after the fix in [PR #1](https://github.com/dreliq9/headless/pull/1), it is taint-locked at the token level so the founder cannot redeem against it. -Six independent methodologies. All run locally, all free, all pass: +This is a deployable Solidity artifact. It is not a token I am offering for sale or use. + +--- + +## The historical critical finding + +> The README originally claimed: *"The founder is mechanically last in line on exit — a hard lock enforced by the `tokensSold` counter, not by a flag."* +> +> This was materially false until [PR #1](https://github.com/dreliq9/headless/pull/1). + +### The bug + +`redeem()` had only two guards: + +```solidity +if (balanceOf(msg.sender) < amount) revert InsufficientBalance(); +if (amount > tokensSold) revert InsufficientCurveSupply(); +``` + +Neither guard distinguished founder-pre-mined tokens from buyer tokens. Both balance types were fungible ERC-20. The attack: + +1. Deployer holds 3,000,000 HDLS from construction. `tokensSold = 0`. +2. Alice calls `buy(100 ether)`, deposits ≈ `_curveIntegral(0,100) + fee`. `tokensSold = 100`. +3. Founder calls `redeem(100)`. Both guards pass (founder has 3M ≥ 100; 100 ≤ tokensSold). +4. `_burn` decrements founder's 3M by 100. `tokensSold → 0`. Refund (≈ Alice's payment − fees) is sent to **the founder**. +5. Alice still holds 100 HDLS, but her next `redeem` reverts `InsufficientCurveSupply` because `tokensSold == 0`. Her tokens are stranded. + +The founder could repeat this against every buyer until the 3M allocation was exhausted, capturing roughly the entire curve-paid-in ETH at no risk to themselves. The "fair launch, no rug" framing was structurally false. + +### Why every tier of the verification stack missed it + +| Tier | Why it greenlit the bug | +|---|---| +| **Unit tests** | A test named `test_FounderCannotRedeemBelowBaseline` *demonstrated the exploit succeeding* — alice buys, founder transfers 3M to bob, bob calls `redeem` against alice's deposit, the call returns success. The test then asserted only that bob couldn't redeem *more*. Reading the name, you'd conclude the founder was locked out. The author had tested the wrong property and confidently named the test for what they thought they had tested. | +| **Stateful invariants (1.28M handler calls)** | The actor set was 5 keccak-derived addresses; the founder was `address(this)` and never participated in fuzzing. The one invariant that *should* have caught it (`invariant_EveryActorCanExitToPoolRoom`) had a stub body that re-asserted the aggregate `balance ≥ curveBackingRequired` check. After a successful drain, the aggregate check holds **vacuously** because `tokensSold = 0` and therefore `required = 0`. | +| **Halmos symbolic proofs** | All four `check_*` functions used a single symbolic actor. The founder-buyer ordering bug requires *two* actors with different taint state. Halmos cannot construct an attack it has no symbolic variable to express. | +| **Slither static analysis** | This is a control-flow / pattern detector. The bug is a higher-order property of the economic model, not a code-level antipattern. Slither was never going to find it. | +| **Mutation testing (302/302 killable)** | Mutation testing measures whether the test suite catches small code modifications. It cannot detect missing tests for properties no one specified. Every mutation was caught by the same flawed test that celebrated the exploit. | + +The structural failure is consistent across every tier: **the verification stack tested the properties someone wrote down, and nobody wrote down "no actor may take ETH that backs another actor's tokens."** The aggregate invariant is necessary but not sufficient — per-holder solvency is the load-bearing property and was never asserted. + +### How a separate AI session found it on the first pass + +The author ran an adversarial 3-evaluator review using a different AI session with no access to the build conversation. Each evaluator was given the contract and the README's claims about it, and asked to find places where the code did not match the prose. All three evaluators independently identified the same critical bug, traced the same exact attack path, and pointed at the same lines of code (`Headless.sol:148` mint, `:302-327` redeem). All three graded the contract `D`. None of the prior six tiers had flagged it. + +The methodological lesson is uncomfortable: **"the same AI built the contract and the tests, both confidently wrong about the same thing."** A second AI session, asked to argue *against* the artifact rather than verify it, was more useful than the entire automated stack. + +### The fix + +[PR #1](https://github.com/dreliq9/headless/pull/1) introduces: + +- A per-address `founderTaint` mapping initialized at construction. +- An `_update` override that propagates taint proportionally on every transfer (rounded up for stickiness, capped at sender's taint). Burns and mints leave taint alone. +- A new `redeem` precondition: `balanceOf(msg.sender) - amount >= founderTaint[msg.sender]`. The founder cannot burn any token that originated in their pre-mine, even if routed through fresh addresses. +- The previously-stub `invariant_EveryActorCanExitToPoolRoom` rewritten as a real per-holder check: for every actor, taint ≤ balance, and `Σ(balance − taint) == tokensSold`. The founder is now actor 0 of the handler. +- Four moderate findings also fixed: half-open auction window (closes the zero-fee floor block), `tokensSold > 0` gate on the first auction (closes the deploy-block self-rebate), constructor-immutable auction parameters (closes the L1-only block-time assumption), `twapCurveBase` input validation, `Donated` event in `receive()`. + +After the fix: + +| Tier | Result | +|---|---| +| Unit + fuzz | 58 / 58 pass (was 54: +4 founder regression tests, +1 close-block expiry test, −1 misleading test removed) | +| Stateful invariants | 10 / 10 pass × 256 runs × 500 handler calls = 1.28M (founder is actor 0; per-holder solvency invariant) | +| Halmos symbolic | 4 / 4 pass | +| Slither | unchanged | + +The bug was fixable. The methodological gap is more interesting than the bug. + +--- + +## Verification stack (pre-fix description, kept for the case study) + +Six methodologies, all run locally, all free, all originally green: | Tier | Tool | Scope | Result | |---|---|---|---| -| 1. **Unit tests** | `forge test` | 52 concrete behavioural scenarios | **52 / 52 pass** | -| 2. **Fuzz tests** | `forge test` | 2 property tests × 256 runs = 512 sampled inputs | **pass** | -| 3. **Stateful invariants** | `forge test` (handler + ghost state) | 10 invariants × 256 × 500 = **1,280,000 random handler calls** | **10 / 10 pass** | -| 4. **Static analysis** | [Slither](https://github.com/crytic/slither) | 25 contracts, 101 detectors | **0 material findings** | -| 5. **Symbolic verification** | [Halmos](https://github.com/a16z/halmos) | 4 `check_*` properties, 85 symbolic paths | **4 / 4 proved** | -| 6. **Mutation testing** | [slither-mutate](https://github.com/crytic/slither) + custom driver | 328 mutants × 8 operator categories (AOR/LOR/ROR/CR/MIA/MVIE/RR/SBR) | **302 / 302 killable mutants caught (100%)** | +| 1. **Unit tests** | `forge test` | concrete behavioural scenarios | 54 → 58 passing post-fix | +| 2. **Fuzz tests** | `forge test` | 2 property tests × 256 runs | pass | +| 3. **Stateful invariants** | `forge test` (handler + ghost state) | 10 invariants × 256 × 500 = 1.28M random handler calls | 10 / 10 pass | +| 4. **Static analysis** | [Slither](https://github.com/crytic/slither) | 25 contracts, 101 detectors | 0 material findings | +| 5. **Symbolic verification** | [Halmos](https://github.com/a16z/halmos) | 4 `check_*` properties, 86 symbolic paths | 4 / 4 proved within bounds | +| 6. **Mutation testing** | [slither-mutate](https://github.com/crytic/slither) + custom driver | 328 mutants × 8 operator categories | 302 / 302 killable mutants caught (100%) | + +### Bounds and disclosures the original README didn't make explicit -### Load-bearing invariants +- **Halmos proofs are bounded**: every `check_*` constrains `amount ≤ 100 ether` and operates on a single symbolic actor. The slope-dominated regime (large amounts) and the multi-actor regime (any property requiring two distinct callers) are unverified by Halmos. The founder-drain bug lives in the multi-actor regime — Halmos *cannot express it*, not "missed it." +- **Mutation testing measures coverage of the existing test suite, not coverage of the specification.** A test suite that asserts the wrong property will catch every mutation that contradicts the wrong property. 100% killable does not mean 100% correct; it means consistent. +- **The stateful fuzz handler is part of the test surface.** The actor set, the action selectors, and the ghost-variable definitions are all decisions the author makes. None of them caught the founder case until [PR #1](https://github.com/dreliq9/headless/pull/1) added the founder as actor 0 and rewrote the per-holder invariant. +- **Block-time sensitivity.** The original `AUCTION_INTERVAL = AUCTION_WINDOW = 25` constants assume Ethereum L1 timing (~5 minute cadence). On Base (~2 s blocks) the same constants give ~50 second auctions, which is a sniping market. PR #1 makes both constructor parameters with validation. -The invariant that must hold forever, on every block, after every mutating call: +### The load-bearing invariant ``` -address(this).balance ≥ curveIntegral(0, tokensSold) +address(this).balance ≥ curveIntegral(0, tokensSold) ``` -The continuous rebase tightens this to equality on every touch. If this ever breaks, redemptions fail and the "floor" promise is void. - -Plus nine others asserted during the 1.28M-call stateful fuzz: +True before [PR #1](https://github.com/dreliq9/headless/pull/1), still true after. **And insufficient.** It holds vacuously after a founder drain (`required = 0`). The per-holder version added in PR #1 is the actual safety property: -- `totalSupply == FOUNDER_ALLOCATION + tokensSold` (accounting consistency) -- `totalSupply ≤ MAX_SUPPLY` (hard cap) -- `curveBase ≥ INITIAL_CURVE_BASE` (monotonic non-decreasing) -- `tokensSold ≤ MAX_SUPPLY − FOUNDER_ALLOCATION` -- `totalRebased ≤ ghost_totalEthIn` (yield bounded by ETH inflows) -- `address(this).balance == ghost_totalEthIn − ghost_totalEthOut` (**conservation of ETH** — value is never created or destroyed) -- `FOUNDER_ALLOCATION == 3_000_000 ether` (immutable) -- `backing ≥ required` (per-actor exit sufficiency) -- Call summary (debugging aid) - -### Symbolic proofs (Halmos) +``` +∀ actor a: balanceOf(a) − founderTaint(a) ≥ 0 +Σ_a (balanceOf(a) − founderTaint(a)) == tokensSold +address(this).balance ≥ curveBackingRequired() +``` -For every whole-token amount in `[1, 100] ether`, Halmos has enumerated the full symbolic path space and proved: +If the global invariant alone had been the only safety property of this contract, the bug would have been undetectable. Per-holder solvency is the primitive that should have been written down on day one. -- `balance ≥ curveBackingRequired` after any `buy` -- `balance ≥ curveBackingRequired` after any `buy` + `redeem` sequence -- `balance_after == balance_before + total_paid` for any `buy` (exact conservation) -- `user.balance_after ≤ user.balance_before` for any `buy + redeem` round-trip (**no free money**) +--- -These are not samples — every feasible input is covered within the bounds. +## What v2 should look like (open design space) -### Mutation score +This section is forward-looking — design notes for anyone (you, me, someone else) who might pick this artifact up and build the next iteration. Ranked by how strongly I'd argue for each. -Raw score: **302 / 328 = 92.0 %**. The 26 surviving mutants are all equivalent mutants that no behavioural test can kill: +1. **Sub-whole-token denomination.** v1 enforces `amount % 1 ether == 0`. This is a convenience for the curve integral, not a property users want. It rules out micropayments — an agent that wants to pay 0.001 HDLS for a tool call literally cannot. For an "agent-native" token this is the largest UX gap. v2 should drop the restriction and accept the slightly messier integral. +2. **Slippage-bounded primitives.** Ship `buyWithMaxPayment(amount, maxETH)` and `redeemWithMinRefund(amount, minETH)` as first-class. Every agent that integrates v1 will re-implement these. +3. **Commit-reveal auction instead of first-caller-wins.** First-caller Dutch auctions are a gas war that selects for fastest-RPC infrastructure rather than economic signal. A commit-reveal scheme over two phases lets agents bid honestly without front-running. +4. **`simulate(action, params)` view.** Returns the post-trade state without mutating, so an agent can plan chains of actions in a single RPC call. +5. **Proof-of-reserves as an explicit API.** v1 exposes the invariant implicitly via `curveBackingRequired() ≤ address(this).balance`. v2 should expose `proofOfReserves() returns (uint256 backing, uint256 required, bool solvent, uint256 excessPerToken)` so agents do not have to derive solvency from primitives. +6. **`maxBuyForSlippage(bps)` view.** "How much can I buy in a single tx without moving the price by more than X bps?" Derivable from curve math today; should be a one-call view. +7. **Per-actor solvency as a load-bearing invariant from day one**, not bolted on after a critical finding. This is a methodology change, not a code change: write down what each holder is owed before writing the code that owes it to them. +8. **Adversarial review as a verification tier.** Every project that uses an automated verification stack should also schedule at least one separate AI session whose only job is to argue *against* the artifact's claims. The 3-evaluator pattern in PR #1's history is the cheapest tier in the stack and was the only one that found the critical bug. -- 20 **SBR** (storage type changes: `constant ↔ immutable`, `uint256 ↔ uint128` on constants — no runtime difference) -- 5 **ROR equivalents** (`<= 0` ≡ `== 0` for uint, `!= 0` ≡ `> 0` for uint, `<= req` vs `< req` differs only at equality where `delta == 0` returns anyway) -- 1 **MIA equivalent** (removed early-return guard takes the same `delta == 0` path) +These are not improvements I plan to make. They are notes for whoever does. -**Killable mutation score: 302 / 302 = 100 %.** Every mutation that *could* be detected by a behavioural test *is* detected. +--- -## How to verify it yourself +## How to verify what's here yourself ### Prerequisites @@ -103,20 +178,15 @@ pipx install halmos ### Clone and build ```bash -git clone --recurse-submodules https://github.com//headless.git +git clone --recurse-submodules https://github.com/dreliq9/headless.git cd headless forge build ``` -If you already cloned without `--recurse-submodules`: -```bash -git submodule update --init --recursive -``` - ### Run the tests ```bash -# Unit + fuzz tests (fast, ~1 second) +# Unit + fuzz tests (fast) forge test --no-match-contract 'Headless(Invariant|Halmos)' # Stateful invariants (slow — 1.28M calls, ~30 seconds) @@ -140,16 +210,18 @@ slither src/Headless.sol \ halmos --match-contract HeadlessHalmosTest ``` -### Run the mutation campaign (optional, slow) +### Reproduce the historical bug + +To see the original bug at the original commit, check out the parent of PR #1 and run: ```bash -slither-mutate src/Headless.sol \ - --test-cmd "forge test --no-match-contract 'Headless(Invariant|Halmos)' -q" \ - --output-dir mutation_campaign -bash mutation_run.sh +git checkout f1d793c~1 # parent of the fix commit +forge test --match-test test_FounderCannotRedeemBelowBaseline -vv ``` -The custom driver is necessary because `slither-mutate`'s own result reporter misinterprets forge exit codes; the shell script in `mutation_run.sh` re-runs each generated mutant and correctly classifies caught vs. survived. +You will see the test pass while the founder successfully drains alice's deposit. That's the methodological finding in one command. + +--- ## Repository layout @@ -161,50 +233,37 @@ headless/ ├── foundry.toml ├── mutation_run.sh # custom mutation testing driver ├── src/ -│ └── Headless.sol # the contract (~480 lines, every line AI-written) +│ └── Headless.sol # the contract ├── test/ -│ ├── Headless.t.sol # 52 unit + fuzz + mutation-killer tests -│ ├── HeadlessInvariant.t.sol # 10 stateful invariants via handler +│ ├── Headless.t.sol # unit + fuzz + mutation-killer + founder regression tests +│ ├── HeadlessInvariant.t.sol # 10 stateful invariants via handler (founder is actor 0) │ └── HeadlessHalmos.t.sol # 4 symbolic proofs └── script/ - └── Deploy.s.sol # no-arg deploy + └── Deploy.s.sol # deploy script with auction-cadence parameters ``` ## Safety -**This contract is not audited. It has not been deployed. Do not put real ETH into it.** - -The verification stack above is strong — probably stronger than a lot of audited-and-shipped DeFi contracts — but it is not a substitute for a human audit. Automated tools catch known bug patterns and specified invariants; auditors catch the unknown unknowns, misaligned incentives, unstated assumptions, and novel attack vectors. +**This contract is unaudited and not deployed. Do not put real ETH into it.** -If you want to play with this: +The verification stack documented above caught one critical bug only after a separate AI session was specifically asked to argue against the claims of the original session. The same stack failed to catch that bug when run end-to-end as a positive verification of the contract. Before this finding I would have said "the verification stack is probably stronger than most audited DeFi contracts." After this finding I would say "automated verification stacks of any depth catch the bugs you specify and miss the bugs you don't specify, and the gap between those two sets is exactly where you get exploited." -- **Deploy to a testnet.** Base Sepolia is free and has Foundry tooling. Zero real value at risk. -- **Open an Immunefi bounty** if you're curious whether there are bugs — you only pay on findings, and for a zero-TVL contract the cost is also zero. -- **Read the code.** All 480 lines are heavily commented and the invariants are spelled out at the top of `Headless.sol`. The provenance in `PROMPTS.md` shows every reason behind every decision. +If you want to use this contract for anything beyond reading: -Contributions, bug reports, and "this mechanism is wrong for reason X" feedback are all welcome. Open an issue. +- **Deploy to a testnet.** Base Sepolia is free. +- **Read [PR #1](https://github.com/dreliq9/headless/pull/1) before reading any other claim in this README.** The verification stack and the "fair launch" framing were both materially wrong until that PR; the README around them is now caveated, not deleted. +- **Treat this as a research artifact, not a portfolio piece you put money into.** -## Known limitations - -- **Single-chain, ETH-only backing.** No cross-chain, no multi-asset collateral. -- **No Uniswap v4 hook or ERC-4626 wrapper.** Deliberately scoped out — see `PROMPTS.md` for the reasoning. -- **Bounded Halmos proofs.** Symbolic verification is bounded to amounts ≤ 100 ether and single operations. The invariant tests cover larger sequences stochastically; only unbounded formal verification (Certora, K-framework) would give complete proof. -- **No audit.** As noted above. - -## What's interesting here (beyond the code) - -- **Provenance as a feature.** `PROMPTS.md` records every prompt that produced this contract, in order. The "AI wrote every line" claim is falsifiable — you can read the conversation. -- **A non-coder shipped this.** I don't know Solidity. I don't know Python. I prompted each component into existence via Claude Code, verified the output via the six-tier stack, and iterated until every tier was green. The AI wrote the code; I wrote the intent. -- **Six verification tiers with $0 in tooling cost.** Everything in this repo runs locally on a laptop for free. The only thing you cannot do without paying is a human audit — and for a zero-TVL portfolio piece, you don't need one. - -If this is interesting to you as a research artifact, reach out — I'd love to hear what other AI-authored verification stacks look like. +Bug reports, "this mechanism is wrong because X" feedback, and methodological critique of the case study are all welcome. ## License -**AGPL-3.0-or-later.** See [`LICENSE`](LICENSE) for the full text. +**AGPL-3.0-or-later.** See [`LICENSE`](LICENSE). + +## Contact -Short version: you can fork, modify, use, and even sell this code commercially — but any derivative work (including code you run as a network service) must be released under AGPL-3.0 too. In plain English: no closed-source copy-and-resell. If you improve it, the whole ecosystem gets to see the improvement. +Methodological critique, review requests, or questions about the case study: please open an issue on this repository. --- -*Built with [Claude Code](https://claude.com/claude-code). See [`PROMPTS.md`](PROMPTS.md) for the full story.* +*Built with [Claude Code](https://claude.com/claude-code). The build conversation is logged in [`PROMPTS.md`](PROMPTS.md). The adversarial review and fix conversation is the second half of the same file.*