diff --git a/.github/workflows/proverif.yml b/.github/workflows/proverif.yml new file mode 100644 index 0000000..c0ecf58 --- /dev/null +++ b/.github/workflows/proverif.yml @@ -0,0 +1,51 @@ +name: ProVerif + +on: + push: + pull_request: + +permissions: + contents: read + +# Cancel superseded runs on the same ref. +concurrency: + group: proverif-${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true + +jobs: + proverif: + name: proverif (Rung 4 SSH channel, informational) + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v6 + # ProVerif brings its own toolchain (installed via opam / OCaml), + # independent of rust-toolchain.toml, so this does not affect the gating + # Rust jobs. m4 expands the single .m4.pv template into the two + # configurations. + - uses: ocaml/setup-ocaml@v3 + with: + ocaml-compiler: '5' + # Refresh the apt index before opam pulls ProVerif's system deps (it + # depends on the GTK GUI libs). Without this the runner's mirror + # intermittently 404s on a superseded liblzma-dev version pin, which + # fails this otherwise non-gating job at install time (seen flapping on + # this branch -- same commit passing on one trigger, failing on another). + - name: Refresh apt package index + run: sudo apt-get update + - name: Install ProVerif + run: opam install -y proverif + # Informational only, like the kani / geiger jobs. Both configurations are + # run so the log shows both outcomes. The FIRST-CONTACT run is *expected* + # to report query (c) (payload secrecy) false with a key-substitution + # attack derivation -- that surfaced gap is the artifact's purpose + # (attack-tree leaf 4.2), so the job must not gate on ProVerif's exit code + # (continue-on-error + the `|| true` below). + - name: Check the SSH-channel model (pinned + first-contact) + continue-on-error: true + run: | + echo '================ PINNED (known_hosts populated) ================' + m4 -DPINNED proofs/proverif/ssh-channel.m4.pv > pinned.pv + opam exec -- proverif pinned.pv || true + echo '================ FIRST CONTACT (no pinning; MITM expected) =====' + m4 proofs/proverif/ssh-channel.m4.pv > unpinned.pv + opam exec -- proverif unpinned.pv || true diff --git a/.github/workflows/tlc.yml b/.github/workflows/tlc.yml new file mode 100644 index 0000000..794f7f0 --- /dev/null +++ b/.github/workflows/tlc.yml @@ -0,0 +1,59 @@ +name: TLC + +on: + push: + pull_request: + +permissions: + contents: read + +# Cancel superseded runs on the same ref. +concurrency: + group: tlc-${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: true + +jobs: + tlc: + name: tlc (Rung 4 TLA+ models, informational) + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v6 + - uses: actions/setup-java@v4 + with: + distribution: temurin + java-version: '21' + # TLA+ tools manage their own toolchain (a single jar), independent of + # rust-toolchain.toml, so this does not affect the gating Rust jobs. The + # committed .tla already contains the pcal.trans translation, so no + # PlusCal re-translation step is needed. + - name: Fetch TLA+ tools + run: | + curl -sSL -o tla2tools.jar \ + https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/tla2tools.jar + # Informational only, like the kani / geiger jobs: surfaces the Rung 4 + # model-check in the job log without gating merge while the property set + # is still growing. Promote to a required check once stable. + - name: Model-check the approval state machine + continue-on-error: true + run: | + java -cp tla2tools.jar tlc2.TLC -nowarning \ + -config proofs/tla/ApprovalStateMachine.cfg \ + -workers auto \ + proofs/tla/ApprovalStateMachine.tla + # The Extended Rung 4 freshness <-> replay-retention window-sizing model. + - name: Model-check the replay-window sizing + continue-on-error: true + run: | + java -cp tla2tools.jar tlc2.TLC -nowarning \ + -config proofs/tla/ReplayWindow.cfg \ + -workers auto \ + proofs/tla/ReplayWindow.tla + # The Extended Rung 4 concurrent-handler interleaving model (shared `seen` + # set + TTY lock); checks the atomic try_insert closes the dedup TOCTOU. + - name: Model-check the concurrent handlers + continue-on-error: true + run: | + java -cp tla2tools.jar tlc2.TLC -nowarning \ + -config proofs/tla/ConcurrentHandlers.cfg \ + -workers auto \ + proofs/tla/ConcurrentHandlers.tla diff --git a/README.md b/README.md index 88687f0..229ef7c 100644 --- a/README.md +++ b/README.md @@ -84,6 +84,7 @@ Each `execute` shows the exact command in the TUI; press `y` to approve, - [docs/usage.md](docs/usage.md) — CLI flags, non-privileged mode, SSH tunnels, agent forwarding - [docs/protocol.md](docs/protocol.md) — JSON-line wire protocol over the Unix socket - [docs/security.md](docs/security.md) — security model; [docs/security-audit.md](docs/security-audit.md) — point-in-time audit; [docs/threat-model.md](docs/threat-model.md) — STRIDE + attack tree; [docs/formalisation-roadmap.md](docs/formalisation-roadmap.md) — graduated-assurance plan; [docs/assurance-case.md](docs/assurance-case.md) — GSN argument +- [proofs/](proofs/) — machine-checked models: [TLA+/PlusCal approval state machine](proofs/tla/) (TLC, Rung 4) and [ProVerif SSH-channel model](proofs/proverif/) (Rung 4); Kani bounded proofs live in [src/proofs.rs](src/proofs.rs) (Rung 3) - [docs/pkexec.md](docs/pkexec.md) — `--pkexec` mode and polkit auth caching (not recommended) - [docs/architecture.md](docs/architecture.md) — source layout and implementation status - [docs/comparison.md](docs/comparison.md) — how sudo-proxy relates to other tools diff --git a/backlog.md b/backlog.md index 5c40d5b..636dec4 100644 --- a/backlog.md +++ b/backlog.md @@ -6,14 +6,20 @@ the assurance ladder defined in [docs/formalisation-roadmap.md](docs/formalisation-roadmap.md); each names its rung. -## Rung 4 — Protocol-level models - -TLA+/PlusCal or Alloy model of the approval + `confirm_unprivileged` policy state -machine: replay impossible, no exec without approval, policy flag transitions -only on an interactive keypress (pins down the F2 / attack-tree leaf 1.4/4.4 -residual). Tamarin/ProVerif model of the SSH path to make the channel -assumptions explicit and surface the first-contact MITM gap (attack-tree leaf -4.2 / assumption A4). +## Cap far-future request timestamps (replay-protection gap) + +Surfaced by the Extended Rung 4 `ReplayWindow` TLA+ model. `parse_age` +(`server.rs:146`) clamps a future-dated timestamp to age 0 with **no upper +bound**, so a request dated beyond `REPLAY_RETENTION` into the future passes the +freshness gate indefinitely while its id ages out of `SeenIds` — making it +replayable every retention window *regardless of how large retention is*. The +60s↔120s window relationship does not help here. Low severity (needs the request +onto the channel — SSH pinning guards that — and privileged commands still gate +the first run on a keypress), but for auto-approved unprivileged commands it is a +real repeatable replay. Fix: in `check_freshness` / `parse_age`, reject +timestamps more than a small clock-skew allowance into the future instead of +clamping to age 0; extend the Rung 2 freshness property and the `ReplayWindow` +model's `NoFutureReplay` invariant to cover it. ## Rung 5 — Deductive verification of the trusted core diff --git a/changelog.md b/changelog.md index 9462384..13dc57b 100644 --- a/changelog.md +++ b/changelog.md @@ -3,6 +3,261 @@ Most recent at top. See [docs/formalisation-roadmap.md](docs/formalisation-roadmap.md) for the assurance-ladder context behind the security-formalisation entries. +## 2026-06-11 — Extended Rung 4: SSH key-acceptance model, mutual auth (ProVerif) + +Reworked the Rung 4 ProVerif model (`proofs/proverif/ssh-channel.m4.pv`, +**non-gating** in CI) from a channel-toggle that *declared* the tunnel private +when pinned — assuming the MITM — to a **key-acceptance** encoding that +**derives** it. The model now carries the real key material and the `PINNED` +`m4` toggle flips only *which host key the daemon accepts*; the channel is the +public Dolev-Yao network in both configurations. Strengthens assurance-case leaf +Sn6.3 / G6.3 and makes assumption A4 genuinely load-bearing. + +- **Both-or-nothing design (mutual auth).** Modelling the host key alone while + ignoring client auth would misrepresent the system (it would paint the honest + remote as accepting forged commands even when pinned, which `authorized_keys` + prevents). So the model captures the real SSH mutual-authentication structure: + a **host keypair** (server→client: pinning + confidentiality, A4) and a + **client keypair** (client→server: `authorized_keys`, request integrity). The + crypto *primitives* stay a perfect black box — "under the hood" means the + authentication *structure*, not SSH's algorithms. + +- **The derived MITM.** Query **(c)** `not attacker(agentSecret)` is `true` when + pinned and **`false`** on first contact, with ProVerif's derivation being + exactly the key substitution: the attacker mints its own host key `pk(k)`, the + daemon accepts it and encrypts the payload to it, the attacker decrypts with + `k`. This is leaf 4.2 *derived* rather than assumed. + +- **The disentanglement (the payoff).** The four queries attribute each guarantee + to its assumption: **(a)** command authenticity (`RemoteAccept ⟹ RequestSent`) + is `true` in **both** configs — it rides on **client auth**, so a first-contact + eavesdropper still cannot forge or alter a command; **(c)** confidentiality + rides on **host-key pinning (A4)**; **(d)** separation + (`RemoteExec ⟹ Keypress`) is `true` in both — the private TTY (A1) is out of + the attacker's reach. The earlier model's "(a) holds iff pinned" was an + artifact of the assumed-private channel; the key-level model corrects it. + +- **Injective replay (b), now honest.** `false` in **both** configs with a real + public-channel replay trace — the earlier "cannot be proved when pinned" + private-channel over-approximation is **gone**. App-layer injective + replay-resistance stays discharged by the TLA+ `ReplayImpossible` invariant and + the Kani freshness-monotonicity proof. + +- **Teeth — three documented negative controls** (apply by hand, never + committed), each run and producing the expected result: **NC1** drops the + `checksign` client-auth check → **(a)** turns `false` even pinned (client auth + is load-bearing; confidentiality stays `true`, showing orthogonality); **NC2** + is the first-contact run itself → **(c)** `false`; **NC3** makes `ttyAck` + public → **(d)** no longer provable (the private TTY is load-bearing). + +- **Docs:** `proofs/proverif/README.md` rewritten (key-acceptance toggle, the + query table + "rides on" column, the disentanglement narrative, the three + controls, scope with the now-explicit client-auth assumption); `proverif.yml` + comment names **(c)** as the expected-`false` first-contact query; + `docs/threat-model.md` (leaf 4.2), `docs/assurance-case.md` (A4 + Sn6.3), and + `docs/formalisation-roadmap.md` updated for derived-not-assumed + the + per-assumption attribution. No Rust touched (`proofs/` is outside `src/`); the + three TLA+ sibling models are untouched. + +- **Verified:** ProVerif 2.05, both `m4` configs — pinned: (a) true, (b) false, + (c) true, (d) true; first contact: (a) true, (b) false, (c) **false** with the + key-substitution derivation, (d) true. All three negative controls fail as + required. + +## 2026-06-11 — Extended Rung 4: concurrent-handler interleaving (TLA+) + +Added a third TLA+/PlusCal model, `proofs/tla/ConcurrentHandlers.tla` (checked by +TLC, **non-gating** in CI like its two siblings), discharging the *canonical* +model-checking use case the two atomic, one-request-at-a-time models cannot +express by construction: **concurrent** daemon handler threads interleaving on +the two pieces of `Arc>`-shared state. It *verifies the atomicity fix is +sufficient* — strengthening the concurrent half of assurance-case leaf 1.1 / G2.1. + +- **The two theorems.** + - **`NoDoubleExec`** — no request id ever executes twice under **any** + interleaving. This is the closure of the dedup TOCTOU: `SeenIds::try_insert` + (`server.rs:213`) folds the contains-check and the insert into one critical + section under the `seen_ids` mutex; the model proves that single critical + section is *sufficient* to serialise two threads racing the same id. Witnessed + by a monitor flag (`vDoubleExec`) raised at an exec site if the id already ran. + - **`TtyMutualExclusion`** — at most one handler is in the interactive TTY + region (prompt or foreground exec) at a time, so `tty_lock` serialises + /dev/tty across all interleavings (the PR #22 background-pgrp / EIO hazard). + Witnessed by a bounded counter (`ttyActive ≤ 1`). + +- **Faithful to the dispatch's lock discipline.** PlusCal labels are the + atomicity boundaries, so TLC interleaves exactly where the threads can. The + privileged path holds `tty_lock` for the prompt, **releases before exec**, then + `ForegroundGuard` **re-takes** it for the foreground swap (two critical + sections); `exec_direct` takes no lock; the no-confirm banner is best-effort + `try_lock`. So another handler may legitimately prompt in the released gap — + never *simultaneously*. + +- **Result.** TLC reports no error: **5116 distinct states** at `Handlers = + {h1, h2}` (sub-second), and **238 590** at `{h1, h2, h3}`. Two handlers is the + minimal witness for the pairwise dedup TOCTOU and the TTY race; a third only + adds symmetric interleavings (a one-line `.cfg` change, kept out of CI for + speed). The same-id race is genuinely exercised (one handler wins `try_insert` + and execs, the other is rejected as a duplicate), so the pass is non-vacuous. + +- **Teeth — two documented negative controls** (apply by hand, never committed): + **NC1** splits the atomic `TryInsert` into check-release-insert, re-introducing + the TOCTOU → `NoDoubleExec` violated (the load-bearing control: it proves the + model *sees* the race the atomicity closes); **NC2** drops the `tty_lock` around + the privileged prompt → `TtyMutualExclusion` violated (reproduces the PR #22 + hazard). Both were run and produce the expected counterexample. + +- **Scope** (per the README faithfulness ledger): the shared `seen`/`tty_lock` + races and `try_insert`'s atomicity are modelled faithfully; the per-request gate + chain is assumed-passed (sequential, the `ApprovalStateMachine` model's job), + eviction is never-evict (the `ReplayWindow` model's job), and + `confirm_unprivileged` is read-only here (no `a` key). Handler count bounded to + 2 (3 also checked). + +- **CI / docs:** `.github/workflows/tlc.yml` gains a third `continue-on-error` + TLC step; `proofs/tla/README.md` adds the model's section and updates the two + siblings' "out of scope" notes to point at it; the roadmap's Rung 4 scope note + is corrected (concurrency/TOCTOU is no longer "stays elsewhere"). No Rust + touched (`proofs/` is outside `src/`); the two sibling models re-check unchanged. + +## 2026-06-11 — Extended Rung 4: freshness ↔ replay window-sizing (TLA+) + +Added a second TLA+/PlusCal model, `proofs/tla/ReplayWindow.tla` (checked by +TLC, **non-gating** in CI like its sibling), proving the temporal property the +approval-state-machine model deliberately hides: it models `seen` as +*never-evicting* ("conservative for `ReplayImpossible`"), which makes the +headline replay claim conditional. The new model restores a small abstract clock +and the **real TTL eviction** and discharges the genuinely temporal invariant — +tightening `ReplayImpossible` from conditional to real and strengthening +assurance-case leaf 1.1 / G2.1. + +- **The theorem — `PastReplayImpossible`.** No captured *past-dated* request is + ever re-executed. The window arithmetic: an id is stamped at acceptance + (`insertedAt ≥ firstTs`) and evicted only at `clock − insertedAt > Retention`; + a replay re-uses the fixed `firstTs` and stays fresh only while + `clock − firstTs ≤ Freshness`. With `insertedAt ≥ firstTs` and + `Retention ≥ Freshness`, eviction and freshness can never co-occur. TLC: no + error, 442 distinct states, sub-second. + - Constants abstract the two real windows (`MAX_REQUEST_AGE`=60s, + `REPLAY_RETENTION`=120s) to `Freshness=2`, `Retention=4` (=2F), `MaxTime=6`, + `Ids={r1}` — only the *relationship* matters. Bounded clock ⇒ finite, small + state space. + - **Why a monitor, not a `seen`-membership invariant:** eviction is *lazy* + (pruned only inside `try_insert`) and a replay-accept removes-then-reinserts + the id atomically, so a membership predicate can never observe the gap. Only + an event monitor (`vReplay`) is a robust witness — the same idiom the sibling + model uses. + +- **Finding 1 — the tight bound is `Retention ≥ Freshness`, not `2×`.** TLC + confirms the theorem holds at `Retention = Freshness` and fails at + `Freshness − 1`. The shipped `120s = 2×60s` is **conservative margin, not a + necessity** (max acceptance lag only pushes eviction later) — *not a bug*, and + the code is safer than its comment claims. + +- **Finding 2 — future-dated timestamps defeat replay protection after + eviction (new residual).** `parse_age` clamps future timestamps to age 0 + (`server.rs:146`) with no upper cap, so a request dated beyond `Retention` into + the future stays fresh forever while its id ages out — replayable every + `Retention` units regardless of size. The model exposes a concrete trace via + the `NoFutureReplay` invariant (e.g. `ts=3` accepted at `clock=0`, evicted at + `clock=5`, re-executed). Logged as a backlog item (cap far-future timestamps); + not fixed in this proofs-only session. + +- **Teeth:** documented negative controls — `Retention < Freshness` (window + relationship broken → `PastReplayImpossible` fails) and freshness-gate-disabled + (→ `AcceptWasFresh` + `PastReplayImpossible` fail, showing freshness is + load-bearing). The originally-planned "evict by the wrong field" control turned + out *safe* at `Retention ≥ Freshness` — a small result the model also settles, + recorded in the README. + +- **Scope (per the README faithfulness ledger):** the gate order at acceptance + (freshness → evict → dedup → insert), eviction keyed on insertion time, and the + `parse_age` future-clamp are modelled faithfully; the clock/windows are + abstracted to small relationship-preserving integers; one id (per-id property), + atomic handling (justified by the entry-time freshness check, `server.rs:499`), + and concurrency/keypress gating are left to the sibling model. + +- **CI:** `.github/workflows/tlc.yml` gains a second `continue-on-error` TLC step + for `ReplayWindow`; added `proofs/tla/.gitignore` for TLC/`pcal.trans` + artifacts. The sibling `ApprovalStateMachine` re-checks unchanged (9072 states); + no Rust touched (`proofs/` is outside `src/`). + +## 2026-06-11 — Rung 4: protocol-level formal models (TLA+ + ProVerif) + +> **Note (superseded in part):** the ProVerif description below reflects the +> original *channel-toggle* model (private tunnel when pinned, with the +> injective-replay over-approximation). It was reworked the same day into the +> *key-acceptance* mutual-auth model — see the **"Extended Rung 4: SSH +> key-acceptance model, mutual auth (ProVerif)"** entry at the top of this file. +> The TLA+ half of this entry stands unchanged. + +Discharged **Rung 4** of the assurance ladder — the temporal/relational +properties that are not per-function — with two machine-checked models under a +new top-level `proofs/` dir (kept out of `src/` so cargo/clippy never see them). +Both run **non-gating** in CI, mirroring the Rung 3 Kani job. Strengthens the +assurance-case leaves Sn5.1 / Sn5.3 (state machine) and Sn6.3 (SSH) from +`[partial] → planned` to machine-checked / formally-characterised evidence. + +- **Approval state machine — TLA+/PlusCal** (`proofs/tla/`, checked by TLC). The + PlusCal transcribes the `handle_connection` gate chain and `tui::classify_key`; + a nondeterministic environment process quantifies over all attacker + field-forgeries/replays *and* all operator keypresses. Four safety properties + hold: `NoExecWithoutApproval`, `ReplayImpossible`, `PolicyFlipsOnlyOnKeypress` + (+ a `FlagMonotone` action property), `PrivilegedGateIndependentOfPolicy` — + the machine-checked closure of the 1.4/4.4 policy-transition residual (F2). + - Properties are tracked by bounded **monitor variables** (a violation flag + raised at each bad-event site), so the reachable state space is finite and + small (~9k states, sub-second) with *no* history-length constraint — the + first cut with a full event log blew up past 180M states. No `MaxLog` hack. + - **Teeth:** four documented negative-control mutations (exec on timeout; flag + wired into the privileged gate; dedup gate removed; flag flipped on a + non-`a` keypress) each produce the expected TLC counterexample. + - **Decision: TLA+/PlusCal over Alloy** — the properties are temporal/safety + over an evolving state machine, TLC's home turf; Alloy's relational style + fits less well. Recorded in `docs/formalisation-roadmap.md`. + +- **SSH path — ProVerif** (`proofs/proverif/`). A symbolic Dolev-Yao model with a + compile-time `PINNED` toggle (via `m4`): the SSH tunnel is a private channel + when the host key is pinned, public when not. It proves channel authenticity + `(a)` and payload/agent secrecy `(c)` hold **iff** the key is pinned, exhibits + a concrete **first-contact MITM** trace when it is not (attack-tree leaf 4.2), + and proves a **separation theorem** `(d)` — a privileged exec at the honest + remote requires a human keypress even with the channel fully compromised. This + makes assumption A4 load-bearing and explicit. + - **Honest note on injective replay `(b)`:** ProVerif reports it "cannot be + proved" on the *private* (pinned) channel — a known over-approximation of + its non-linear private-channel semantics, with **no** attack trace; on the + public channel it is genuinely `false` with a replay trace (the surfaced + gap). App-layer injectivity is discharged instead by the TLA+ + `ReplayImpossible` invariant and the Kani freshness-monotonicity proof — a + coherent cross-rung split, and faithful to the threat model's "replay + protection rides on A4". + - **Decision: ProVerif over Tamarin** — one binary toggle + an attack + derivation, expressed with least ceremony and proved fully automatically; + Tamarin's unbounded-state/inductive strengths aren't needed. Recorded in the + roadmap. + +- **Scope, stated plainly** (per-model README "faithfulness ledger" / scope + notes): the TLA+ model abstracts the clock/env/decode to booleans, never-evicts + the replay set (conservative for replay), and leaves concurrency/TOCTOU and + per-field content scanning to their other rungs; the ProVerif model treats SSH + crypto as a perfect black box and abstracts time to nonces + ordering. The two + *accepted* residuals (by-design unprivileged auto-approve; A4 first-contact + dependency) are now formally **characterised**, not closed. + +- **CI:** `.github/workflows/tlc.yml` (fetches `tla2tools.jar`, runs TLC) and + `.github/workflows/proverif.yml` (installs ProVerif via opam, runs both `m4` + configurations) — both `continue-on-error`, like the Kani/geiger jobs. The + unpinned ProVerif run reporting `(a)/(c)` false is the *expected, desired* + output, so the job is not gated on ProVerif's exit code. + +- **Verified:** TLC reports no error (9072 distinct states) and each negative + control fails as required; ProVerif gives the pinned-vs-first-contact split + above. Core + `--features mcp` release build and + `cargo clippy --all-targets --all-features -- -D warnings` remain clean (no + Rust touched; `proofs/` is outside `src/` and not a workspace member, so cargo + ignores it). + ## 2026-06-10 — Rung 3: Kani bounded proofs + validation-boundary typestate Discharged **Rung 3** of the assurance ladder. The Kani half proves the diff --git a/docs/assurance-case.md b/docs/assurance-case.md index 8207ed7..0d6c00d 100644 --- a/docs/assurance-case.md +++ b/docs/assurance-case.md @@ -65,7 +65,7 @@ asserted, evidence not yet produced). | **A1** | Assumption | `/dev/tty` is trusted: what the daemon writes is what the human sees, and the keypress read is the human's. | | **A2** | Assumption | `sudo` (and `pkexec`) correctly enforce escalation and `env_reset`; the kernel enforces socket permissions and `SO_PEERCRED`. | | **A3** | Assumption | The host is effectively single-user *or* the operator accepts that any same-UID process is already inside the trust boundary for non-privileged actions (finding F2). | -| **A4** | Assumption | For the SSH path, remote hosts are in `known_hosts` before first use (or `StrictHostKeyChecking=accept-new` is configured); the SSH channel provides confidentiality and integrity. | +| **A4** | Assumption | For the SSH path, remote hosts are in `known_hosts` before first use (or `StrictHostKeyChecking=accept-new` is configured) — this gives payload confidentiality and lets the daemon reach the genuine remote; **and** the daemon authenticates to the remote by key (the remote's `authorized_keys`), which gives command authenticity at the remote. The ProVerif model ([`proofs/proverif/`](../proofs/proverif/)) makes both halves explicit and shows which guarantee rests on which. | | **J1** | Justification | The whole security model reduces to G1 (see [security-audit.md](security-audit.md#summary)); structuring the argument around the single invariant keeps every control traceable to it. | | **S1** | Strategy | Argue over the causal path from an inbound request to root execution: a request must be *authentic* (G2), what executes must be *the thing that was validated* (G3), the human must *see the real command* (G4), approval must be *necessary and binding* (G5), and no adversary may *bypass* the gate (G6). | @@ -121,11 +121,11 @@ keypress binds to the command shown.* | Node | Claim / Evidence | Rung | Status | |------|------------------|------|--------| | **G5.1** | `privileged:true` always reaches the Y/N gate regardless of any policy. | — | | -| **Sn5.1** | Verified in audit: only an interactive keypress sets policy; no request field, replay, or MCP tool flips it. `src/server.rs`. | 2 | [partial] → TLA+/Alloy state-machine model (Rung 4) | +| **Sn5.1** | TLC-checked: the `PolicyFlipsOnlyOnKeypress` invariant of [`proofs/tla/`](../proofs/tla/) proves the policy flag flips only via an interactive `a` keypress on an unprivileged request — never a request field, replay, MCP flag, or timeout — over all attacker forgeries/replays and operator choices. `src/server.rs`. | 4 | [discharged] (model-checked) | | **G5.2** | The prompt reads a single keypress in non-canonical mode and times out after 60 s (default **deny**). | — | | | **Sn5.2** | `src/tui.rs` prompt; timeout test. | 1 | [discharged] | | **G5.3** | The `confirm_unprivileged=false` policy relaxes only the **non-privileged** gate, never the privileged one, and only via an interactive `a` keypress. | — | | -| **Sn5.3** | Audit finding **F2**: by-design trade-off, residual risk documented; `display_banner` reliability improvement recommended. | 0 | [partial] → model-check transition (Rung 4) | +| **Sn5.3** | TLC-checked: `NoExecWithoutApproval` + `PrivilegedGateIndependentOfPolicy` ([`proofs/tla/`](../proofs/tla/)) prove the privileged gate requires a `y` keypress for *any* value the policy flag took, so `confirm_unprivileged` relaxes only the non-privileged gate. Audit finding **F2** by-design trade-off still documented; `display_banner` reliability is a separate backlog item. | 4 | [discharged] (model-checked) | ### G6 — Adversary cannot bypass the gate @@ -138,7 +138,7 @@ keypress binds to the command shown.* | **G6.2** | **A2** (same-UID direct socket): daemon-side validation does not depend on client-side `validate_host`; display fields validated at the daemon. | — | | | **Sn6.2** | F1 fix moves sanitization into `validate_request` (daemon side); `SO_PEERCRED` + 0600. | 2 | [discharged] | | **G6.3** | **A3** (network / remote): `validate_host` allowlist `[A-Za-z0-9._@:-]`, rejects leading `-` (ssh option injection); host is a trailing positional; remote UID digit-only + length-capped; ssh via argv. | — | | -| **Sn6.3** | `src/server.rs`, `src/bin/sudo-proxy.rs`, `src/hosts.rs`. **Open:** no `StrictHostKeyChecking` → first-contact MITM depends on user config (A4). | 1 | [partial] → Tamarin/ProVerif (Rung 4) | +| **Sn6.3** | `src/server.rs`, `src/bin/sudo-proxy.rs`, `src/hosts.rs`. ProVerif model ([`proofs/proverif/`](../proofs/proverif/)) makes **A4 explicit** by modelling the real host-key + client-key material: it **derives** the first-contact MITM from host-key substitution (payload secrecy `false` unpinned — leaf 4.2), and disentangles confidentiality (rides on host-key pinning) from command authenticity (rides on client auth, so it holds even on first contact); the separation theorem shows a channel compromise does not bypass the local keypress gate. Residual A4 (no `StrictHostKeyChecking`) now formally characterised, not closed. | 4 | [discharged] (residual A4 made explicit) | | **G6.4** | Resource exhaustion cannot force-open the gate: 1 MiB request cap, 64 in-flight, 16 MiB output cap. | — | | | **Sn6.4** | `src/server.rs`, `src/executor.rs`; cap test (flaky **S2**, control sound). | 1–2 | [partial] | @@ -147,13 +147,15 @@ keypress binds to the command shown.* These are the leaves where the argument is currently weakest, drawn from [security-audit.md](security-audit.md) and the rungs not yet climbed: -- **Sn2.2 / Sn5.1 / Sn5.3** — the freshness and approval-state-machine claims - rest on fuzzing + manual reasoning; the intended strengthening is **Kani** - (Rung 3) for `parse_age` and a **TLA+/Alloy** model (Rung 4) of the - approval + policy transitions. -- **Sn6.3** — the A3 channel guarantees rest on assumption **A4**; a - **Tamarin/ProVerif** model (Rung 4) would make the dependency explicit and - surface the first-contact MITM gap. +- **Sn2.2** — the freshness arithmetic is discharged by **Kani** (Rung 3); + `Sn5.1 / Sn5.3` (the approval + policy transitions) are now discharged by the + **TLC-checked** state machine ([`proofs/tla/`](../proofs/tla/), Rung 4). +- **Sn6.3** — the A3 channel guarantees rest on assumption **A4**; the + **ProVerif** model ([`proofs/proverif/`](../proofs/proverif/), Rung 4) now + makes that dependency explicit by *deriving* the first-contact MITM from + host-key substitution (a *characterised residual*, not a closed one — see leaf + 4.2), and attributes confidentiality to host-key pinning vs command + authenticity to client auth. - **Sn6.4 / S2** — stabilise the flaky resource-cap test so the cap stays covered in CI. - **Sn4.4 / Sn5.3** — accepted residual risks (look-alike `reason`, the diff --git a/docs/formalisation-roadmap.md b/docs/formalisation-roadmap.md index 52d4c56..7562e94 100644 --- a/docs/formalisation-roadmap.md +++ b/docs/formalisation-roadmap.md @@ -219,6 +219,63 @@ The most interesting properties are temporal and relational, not per-function: will surface the first-contact MITM gap the audit noted (the ssh invocation sets no `StrictHostKeyChecking`). +**Status: done.** Both halves are built and run non-gating in CI +(`.github/workflows/tlc.yml`, `proverif.yml`), mirroring the Rung 3 Kani job. + +- **Approval state machine — TLA+/PlusCal** ([`proofs/tla/`](../proofs/tla/), + checked by TLC). The PlusCal transcribes the `handle_connection` gate chain and + `tui::classify_key`; a nondeterministic environment process quantifies over all + attacker field-forgeries / replays *and* all operator keypresses. Four safety + properties hold: `NoExecWithoutApproval`, `ReplayImpossible`, + `PolicyFlipsOnlyOnKeypress` (+ a `FlagMonotone` action property), and + `PrivilegedGateIndependentOfPolicy` — discharging the 1.4/4.4 policy-transition + residual. Properties are tracked by bounded *monitor variables* so the state + space is finite without a history bound; four documented negative-control + mutations each produce the expected counterexample (the model has teeth). Full + faithfulness ledger in [`proofs/tla/README.md`](../proofs/tla/README.md). + - **Decision: TLA+/PlusCal over Alloy.** The properties are temporal/safety + over an evolving state machine, which TLC checks directly; Alloy's relational + style fits structural questions less well here. + +- **SSH path — ProVerif** ([`proofs/proverif/`](../proofs/proverif/)). A symbolic + Dolev-Yao model with a compile-time `PINNED` toggle (via `m4`). Rather than + *declaring* the tunnel private when pinned (which would assume the conclusion), + it models the real **key material** — a host keypair and a client keypair, the + mutual authentication of the SSH use case — and the toggle flips only **which + host key the daemon accepts** (the known-good one, or whatever the network + offers on first contact). ProVerif then **derives** the first-contact MITM from + the attacker substituting its own host key (payload secrecy `false` unpinned, + with a key-substitution trace — leaf 4.2). The model **disentangles** which + assumption protects what: payload **confidentiality** rides on host-key pinning + (A4), while **command authenticity** at the remote rides on **client auth** + (the remote's `authorized_keys`) and so holds even on first contact — correcting + the earlier channel-toggle model's conflation. It also proves a **separation + theorem** — a privileged exec at the honest remote requires a human keypress + even with the channel fully compromised — making assumption A4 load-bearing and + explicit. Scope, the per-assumption attribution, the three negative controls, + and the now-honest injective-replay result (a genuine public-channel replay, no + longer a private-channel over-approximation; app-layer dedup discharged by the + TLA+ `ReplayImpossible` + Kani freshness proofs) are in + [`proofs/proverif/README.md`](../proofs/proverif/README.md). + - **Decision: ProVerif over Tamarin.** The model's job is one binary toggle + plus producing an attack derivation, which ProVerif's applied-pi calculus + expresses with least ceremony and proves fully automatically; Tamarin's + unbounded-state / inductive strengths aren't needed. + +- **Scope / residuals carried past Rung 4:** the `ApprovalStateMachine` model + abstracts the clock, env contents and decode to booleans and never-evicts the + replay set (conservative for replay); per-field content scanning stays with + Rung 3. The two **Extended Rung 4** TLA+ models lift the abstractions that + conservatism hides: [`ReplayWindow`](../proofs/tla/README.md#window-sizing--freshness--replay-retention) + restores a real clock + TTL eviction and proves the freshness ↔ retention + window relationship, and [`ConcurrentHandlers`](../proofs/tla/README.md#concurrent-handlers--dedup-toctou--tty-lock-serialisation) + runs concurrent handler threads to prove the atomic `try_insert` closes the + dedup TOCTOU and `tty_lock` serialises /dev/tty under all interleavings. The + ProVerif model treats SSH crypto as a perfect black box and abstracts time to + nonces + ordering. The two *accepted* residuals (the by-design unprivileged + auto-approve, and the A4 first-contact dependency) are now formally + **characterised**, not closed. + ### Rung 5 — Deductive verification of the core (EAL5–7 territory) For the few functions that *are* the invariant, prove functional correctness @@ -263,11 +320,12 @@ prefix-matching escapes documented in the the existing `security.md` / `security-audit.md` prose into the formal model / threats / requirements / constraints artifact, with no code change. A full CC-structured **Security Target** document is the remaining paper step. -2. CI-ify Rung 1 and turn the existing fuzz tests into **named properties** - (Rung 2). -3. Add **Kani** on the parsers and a **TLA+/Alloy** model of the approval + - policy state machine (Rungs 3–4) — the highest assurance-per-effort steps, - and they directly target findings F1/F2 and the replay invariant. +2. **Done** — CI-ified Rung 1 and turned the existing fuzz tests into **named + properties** (Rung 2). +3. **Done** — **Kani** on the parsers (Rung 3) and a **TLA+/PlusCal** model of + the approval + policy state machine plus a **ProVerif** model of the SSH path + (Rung 4) — the highest assurance-per-effort steps, directly targeting findings + F1/F2, the replay invariant, and the A4 first-contact gap. 4. Treat **Creusot / Verus** (Rung 5) and **seL4-style refinement** (Rung 6) as a stated long-horizon goal in the assurance case, with the trust assumptions written down now. @@ -280,7 +338,7 @@ prefix-matching escapes documented in the | 1 | Review + static analysis | clippy, cargo-audit, cargo-deny, cargo-geiger, Semgrep/MIRAI | display sanitization, `unsafe` surface | EAL1–3 | | 2 | Property-based testing | proptest, quickcheck | `shell_escape`, `parse_age`, display fields, policy flag | EAL3–4 | | 3 | Automated formal | Kani, Flux | parsers (no panic/overflow), validation boundary | EAL4–5 | -| 4 | Protocol verification | TLA+/PlusCal, Alloy, Tamarin/ProVerif | approval state machine, replay, SSH path | EAL5–6 | +| 4 ✓ | Protocol verification | TLA+/PlusCal (TLC), ProVerif ([`proofs/`](../proofs/)) | approval state machine, replay, SSH path | EAL5–6 | | 5 | Deductive verification | Creusot, Verus, Prusti | `validate_request`, dispatch, env allowlist | EAL6–7 | | 6 | Machine-checked refinement | seL4-style (Isabelle/Coq, Iris) | daemon logic refines abstract approval spec | EAL7 | diff --git a/docs/threat-model.md b/docs/threat-model.md index a074822..d689b1c 100644 --- a/docs/threat-model.md +++ b/docs/threat-model.md @@ -169,7 +169,18 @@ is unaccounted for. The residuals — already on record and accepted — are: yet implemented" in [architecture.md](architecture.md)); the `-v` server log is the only record. -These residuals are the natural inputs to the higher -[roadmap rungs](formalisation-roadmap.md#the-rigor-ladder): the policy-transition -residual (1.4/4.4) is what a TLA+/Alloy model (Rung 4) would pin down, and the -SSH residual (4.2) is what a Tamarin/ProVerif model (Rung 4) would make explicit. +These residuals were the inputs to **Rung 4**, now discharged +([roadmap](formalisation-roadmap.md#the-rigor-ladder)): the policy-transition +residual (1.4/4.4) is pinned by the TLC-checked state machine in +[`proofs/tla/`](../proofs/tla/) (`PolicyFlipsOnlyOnKeypress`, +`PrivilegedGateIndependentOfPolicy`), and the SSH residual (4.2) is made explicit +by the ProVerif model in [`proofs/proverif/`](../proofs/proverif/), which models +the real host-key and client-key material and **derives** the first-contact MITM +from the attacker substituting its own host key. The model disentangles which +assumption protects what: payload **confidentiality** rides on host-key pinning +(A4) — broken on first contact, the derived MITM trace — while **command +authenticity** at the remote rides on **client authentication** (the remote's +`authorized_keys`), so even a first-contact eavesdropper cannot forge or alter a +command the honest remote will run. The 1.4/4.4 *by-design* trade-off +(unprivileged auto-approve) and the 4.2 A4 dependency remain accepted residuals — +now formally characterised rather than only argued in prose. diff --git a/proofs/proverif/README.md b/proofs/proverif/README.md new file mode 100644 index 0000000..740cb1a --- /dev/null +++ b/proofs/proverif/README.md @@ -0,0 +1,172 @@ +# Rung 4 — ProVerif SSH-channel model + +[← formalisation roadmap](../../docs/formalisation-roadmap.md) · [threat model](../../docs/threat-model.md) · [assurance case](../../docs/assurance-case.md) + +A ProVerif symbolic (Dolev-Yao) model of sudo-proxy's SSH-tunnel path. It +discharges **Rung 4** (SSH half) of the +[formalisation roadmap](../../docs/formalisation-roadmap.md): it makes assumption +**A4** load-bearing and *explicit* by **deriving** the **first-contact MITM** +(attack-tree leaf **4.2**) from the attacker substituting its own host key — +rather than *assuming* a secure channel under pinning. It also states the +**separation theorem**: a privileged exec at the honest remote requires a human +keypress *even when the channel is fully attacker-controlled*. + +ProVerif was chosen over Tamarin: the model's job is one binary toggle +(key-acceptance) plus producing an attack derivation, which ProVerif's +applied-pi calculus expresses with least ceremony and proves fully +automatically. Tamarin's strengths (unbounded mutable state, AC reasoning, +inductive lemmas) aren't needed here. + +## What changed from the earlier version + +The earlier model toggled the SSH tunnel between a **private** and a **public** +channel. That *declared* the channel secure when pinned — it assumed the +conclusion instead of showing *why* pinning produces it. This version models the +real **key material**: the remote has a host keypair, the local daemon **accepts +a host key** (the known-good one when pinned; whatever the network offers on +first contact), and encrypts the request to the accepted key. ProVerif then +**derives** the MITM from the attacker substituting its own key. + +Modelling only the host key while ignoring client authentication would +misrepresent the system — it would paint the honest remote as accepting forged +commands even when pinned, which the remote's `authorized_keys` check actually +prevents. So the model captures the **real mutual-authentication structure** of +the SSH use case: + +- **host key** (server → client): pinning + payload confidentiality — **A4**; +- **client key** (client → server): `authorized_keys`, request integrity. + +The payoff is that the model **disentangles which assumption protects what**: + +| Property | Rides on | +|----------|----------| +| Payload **confidentiality** (and you're-talking-to-the-real-remote) | **host-key pinning** (A4) | +| **Command authenticity** at the remote (only genuinely-issued commands run) | **client auth** (`authorized_keys`) — so it holds *even on first contact* | +| **Separation** (channel compromise can't fabricate approvals) | the private TTY (A1) | + +## The toggle + +`ssh-channel.m4.pv` is an `m4` template. The `PINNED` macro flips only **which +host key the daemon accepts** — the channel is the public Dolev-Yao network in +**both** configurations: + +- **`PINNED`** → `pkAccepted = pkR`, the known-good host key from `known_hosts`. +- **unpinned** → `in(pub, pkAccepted)`, accept whatever the network offers; the + attacker answers with its own key. + +```sh +# pinned (known_hosts populated): expect the channel guarantees to hold +m4 -DPINNED ssh-channel.m4.pv > /tmp/pinned.pv && proverif /tmp/pinned.pv + +# first contact (no pinning): expect the secrecy MITM to surface +m4 ssh-channel.m4.pv > /tmp/unpinned.pv && proverif /tmp/unpinned.pv +``` + +(ProVerif is installed with `opam install proverif`; run via `opam exec -- +proverif …`.) + +## The four queries and expected outcomes + +| # | Query | Pinned | First contact | Rides on | +|---|-------|--------|---------------|----------| +| **(a)** | Authenticity/integrity: the honest remote accepts only a request the local daemon actually sent. `RemoteAccept ⟹ RequestSent`. | `true` | `true` | **client auth** | +| **(b)** | Replay resistance (injective): each accept maps to a *distinct* send. | `false` | `false` | app-layer dedup¹ | +| **(c)** | Secrecy of the forwarded-agent secret / tunnelled payload. `not attacker(agentSecret)`. | `true` | **`false`** + key-substitution MITM derivation | **host-key pinning (A4)** | +| **(d)** | Separation: the remote execs a command only after a human keypress for it. `RemoteExec ⟹ Keypress`. | `true` | `true` | private TTY (A1) | + +The **first-contact `(c)` = `false`** is the *derived, surfaced gap* — the +mechanically-exhibited leaf 4.2. ProVerif's derivation is exactly the +key-substitution MITM: the attacker mints its own host key `pk(k)`, the daemon +(first contact) accepts it and encrypts the payload to it, and the attacker +decrypts with `k` to recover `agentSecret`. A non-`true` result in the unpinned +run is therefore expected, not a CI failure (the workflow is non-gating and not +gated on ProVerif's exit code). + +The honest, valuable result is that **`(a)` holds in *both* columns**: command +authenticity at the remote rides on **client authentication** (the signature the +remote checks against `authorized_keys`), so the first-contact MITM — which +breaks confidentiality — still **cannot forge or alter a command** the honest +remote will run. The earlier model's "authenticity holds *iff* pinned" was an +artifact of the assumed-private channel; the key-level model corrects it. + +The **`(d)`** row holding in *both* columns is the reassuring result: an SSH +channel compromise does not fabricate approvals — the local human gate +(assumption A1, the private TTY) is not reachable by a network attacker. + +¹ **On (b) — stated plainly.** The model has no application-layer dedup, so a +captured ciphertext can be replayed and the remote re-accepts it: injective +`(b)` is genuinely `false` in **both** configurations, with a replay trace. (The +earlier private-channel model reported (b) "cannot be proved" when pinned — a +known ProVerif over-approximation of private-channel linearity; with everything +now on the public channel that artifact is **gone**, and the result is a real, +honest replay.) Application-layer injective replay-resistance is discharged by +the other rungs: the daemon's UUID dedup is the TLA+ +[`ReplayImpossible`](../tla/) invariant (Rung 4), and freshness monotonicity is +the Kani `ymd_hms_to_epoch_is_monotone` proof (Rung 3). The threat model already +states that replay protection "rides on SSH transport integrity (A4)", so +locating app-layer dedup in those rungs rather than here is faithful to the +model. + +## Negative controls (teeth) + +Documented mutations that each turn a passing query red, confirming the model +*sees* the property it claims. Apply by hand to the expanded `.pv`; never +committed. + +- **NC1 — client auth is load-bearing for (a).** Delete the + `let (=id, =c) = checksign(sg, pkL) in` line in `RemoteDaemon` (accept without + verifying the client signature) ⇒ `(a)` `RemoteAccept ⟹ RequestSent` turns + **`false`** *even pinned*: the attacker, knowing the public host key `pkR`, + encrypts a forged request to the honest remote. (Confidentiality `(c)` stays + `true` — pinning still protects it — which shows the two guarantees are + orthogonal.) +- **NC2 — pinning is load-bearing for (c).** The **first-contact run is this + control**: dropping pinning turns `(c)` `false` with the key-substitution + derivation above. (Equivalently, forcing `pkAccepted` to an attacker key under + `PINNED` breaks `(c)` too.) +- **NC3 — the private TTY is load-bearing for (d).** Change + `free ttyAck:channel [private].` to a public `free ttyAck:channel.` ⇒ `(d)` + `RemoteExec ⟹ Keypress` can no longer be proved: the attacker forges the + acknowledgement and the remote execs without a keypress — reproducing + "channel compromise bypasses the human gate" were the TTY not private (A1). + +All three were run and produce the expected result. + +## Scope (what this model abstracts away) + +Stated plainly, as the Kani "Scope" note and the TLA+ faithfulness ledger do: + +- **Symbolic Dolev-Yao, not computational.** Cryptographic *primitives* are + assumed perfect (keys unguessable, `aenc`/`sign` unbreakable, no + algebraic/padding side channels). We prove what holds *given* sound crypto. +- **SSH is a black box.** We model the authentication *structure* SSH realises — + a confidential channel to whoever's host key was accepted, plus client + authentication by key — and the *condition* for it (pinning). We do not + re-verify SSH's transport, key-exchange, or authentication internals; that is + established work. +- **Two modelled assumptions.** Host-key pinning (**A4**) and client + authentication (the remote's `authorized_keys`, previously implicit). The + model makes the second explicit because command authenticity rides on it, not + on pinning. +- **Time is abstracted** to nonce uniqueness + event ordering. The 60 s + freshness and 120 s dedup windows are not modelled numerically (that + arithmetic is Rung 3 Kani's domain, `ymd_hms_to_epoch`); application-layer + replay dedup is the TLA+ `ReplayImpossible` invariant's domain. +- **The human keypress is a trusted local action** (assumption A1): the approval + channel is private and not attacker-injectable. This assumption is precisely + what yields the separation theorem (d). +- **DoS, traffic analysis, and side channels are out of scope** (DoS is covered + by the resource-cap controls, not here). + +What it *does* establish: A4 is now a mechanically-**derived** dependency — +confidentiality holds iff the host key is pinned, with a concrete first-contact +key-substitution MITM trace for leaf 4.2 — the disentanglement attributes +command authenticity to client auth instead, and the separation theorem (d) +bounds the blast radius of an SSH compromise: it does not reach the local +approval gate. + +## Why no committed generated `.pv` + +Unlike the TLA+ artifact (where the `pcal.trans` output is committed), the m4 +expansion is trivial and deterministic, so only the `.m4.pv` template is checked +in; CI regenerates both `.pv` files with `m4`. diff --git a/proofs/proverif/ssh-channel.m4.pv b/proofs/proverif/ssh-channel.m4.pv new file mode 100644 index 0000000..011b51d --- /dev/null +++ b/proofs/proverif/ssh-channel.m4.pv @@ -0,0 +1,177 @@ +(***************************************************************************) +(* Rung 4 of the sudo-proxy formalisation roadmap: a ProVerif symbolic *) +(* (Dolev-Yao) model of the SSH-tunnel path. *) +(* *) +(* Its job is to make assumption A4 load-bearing and EXPLICIT by DERIVING *) +(* the first-contact MITM (attack-tree leaf 4.2) rather than assuming it. *) +(* The earlier version of this model toggled the tunnel between a PRIVATE *) +(* and a public channel -- it therefore *declared* the channel secure when *) +(* pinned instead of showing *why* pinning makes it so. This version *) +(* instead models the real key material: the remote has a host keypair, *) +(* the local daemon ACCEPTS a host key (the known-good one when pinned; *) +(* whatever the network offers on first contact), and encrypts the request *) +(* to the accepted key. ProVerif then derives the MITM from the attacker *) +(* substituting its own key. *) +(* *) +(* Because modelling only the host key while ignoring client auth would *) +(* misrepresent the system (it would paint the honest remote as accepting *) +(* forged commands even when pinned -- which the remote's authorized_keys *) +(* check actually prevents), we model the REAL mutual-authentication *) +(* structure of the SSH use case: *) +(* - host key (server -> client): pinning + confidentiality (A4); *) +(* - client key (client -> server): authorized_keys, request integrity. *) +(* The model then DISENTANGLES which assumption protects what: *) +(* - confidentiality of the payload rides on HOST-KEY PINNING; *) +(* - command authenticity at the remote rides on CLIENT AUTH -- and so *) +(* holds even on first contact. *) +(* *) +(* TOGGLE (compile-time, via m4) -- flips only WHICH host key the daemon *) +(* accepts; the channel is the public network in BOTH configurations: *) +(* m4 -DPINNED ssh-channel.m4.pv | proverif /dev/stdin -- pinned *) +(* m4 ssh-channel.m4.pv | proverif /dev/stdin -- first contact*) +(* See README.md for the expected outcomes and the scope statement. *) +(* *) +(* SCOPE (stated plainly, as the Kani/TLA+ artifacts do): *) +(* - Symbolic Dolev-Yao model: crypto PRIMITIVES are perfect (keys *) +(* unguessable, aenc/sign unbreakable). We model the authentication *) +(* STRUCTURE SSH realises (a confidential channel to whoever's host key *) +(* was accepted, plus client authentication by key), not SSH's *) +(* transport/key-exchange internals -- those stay a black box. *) +(* - Time is abstracted to nonce uniqueness + event ordering; the 60s *) +(* freshness and 120s dedup windows are not modelled numerically (that *) +(* arithmetic is Rung 3 Kani's domain, ymd_hms_to_epoch). App-layer *) +(* replay dedup is the TLA+ ReplayImpossible invariant's domain. *) +(* - The human keypress is a trusted local action (assumption A1): the *) +(* approval channel is private and not attacker-injectable. This is *) +(* what yields the separation theorem (d). *) +(* - DoS, traffic analysis, and side channels are out of scope. *) +(***************************************************************************) + +(* ----------------------------- channels ------------------------------- *) +free pub:channel. (* the Dolev-Yao network *) + +(* The trusted /dev/tty approval handshake (assumption A1): the daemon + shows a command, the human approves exactly that command. Private, so a + network attacker can neither read nor forge an approval. *) +free ttyShow:channel [private]. +free ttyAck:channel [private]. + +(* --------------------------- cryptography ----------------------------- *) +(* Host keypair (server -> client authentication + confidentiality): SSH's + host key, the one pinned in known_hosts. Asymmetric encryption to it + gives a confidential channel to its holder. *) +type skey. +type pkey. +fun pk(skey):pkey. +fun aenc(bitstring, pkey):bitstring. +reduc forall m:bitstring, k:skey; adec(aenc(m, pk(k)), k) = m. + +(* Client keypair (client -> server authentication): the local daemon's key, + trusted by the remote's authorized_keys. A signature authenticates the + request and binds its integrity. *) +type sskey. +type spkey. +fun spk(sskey):spkey. +fun sign(bitstring, sskey):bitstring. +reduc forall m:bitstring, k:sskey; checksign(sign(m, k), spk(k)) = m. + +(* The forwarded-agent secret / a stand-in for confidential tunnelled + payload. Secret unless the channel leaks it. *) +free agentSecret:bitstring [private]. + +(* ------------------------------ events -------------------------------- *) +event RequestSent(bitstring, bitstring). (* (id, cmd) emitted by local daemon *) +event RemoteAccept(bitstring, bitstring). (* (id, cmd) accepted by honest remote *) +event RemoteExec(bitstring). (* cmd executed by honest remote *) +event Keypress(bitstring). (* human approved cmd at the TTY *) + +(* ------------------------------ queries ------------------------------- *) + +(* (a) Authenticity / integrity: the honest remote accepts only a request + the local daemon actually sent. Rides on CLIENT AUTH (the signature + the remote checks against authorized_keys), so it holds INDEPENDENT + of host-key pinning -- the attacker cannot forge the daemon's + signature on a command of its choosing in either configuration. + PINNED: true. FIRST CONTACT: true. *) +query id:bitstring, c:bitstring; + event(RemoteAccept(id, c)) ==> event(RequestSent(id, c)). + +(* (b) Replay resistance (injective): each acceptance corresponds to a + DISTINCT send. The model has no application-layer dedup, so a captured + ciphertext can be replayed and the remote re-accepts it -- false in + BOTH configurations (a genuine public-channel replay, not a + private-channel over-approximation). App-layer injectivity is the + TLA+ ReplayImpossible invariant's domain (UUID dedup), and freshness + monotonicity is the Kani ymd_hms_to_epoch_is_monotone proof. *) +query id:bitstring, c:bitstring; + inj-event(RemoteAccept(id, c)) ==> inj-event(RequestSent(id, c)). + +(* (c) Secrecy of the forwarded-agent secret / tunnelled payload. Rides on + HOST-KEY PINNING: the daemon encrypts to the ACCEPTED host key, so + confidentiality holds iff that key is the honest remote's. + PINNED: true (not derivable). + FIRST CONTACT: false -- the attacker substitutes its own host key, + the daemon encrypts to it, the attacker decrypts. THE DERIVED MITM. *) +query attacker(agentSecret). + +(* (d) Separation theorem: the honest remote execs a command only after a + human keypress approving exactly that command -- holds in BOTH + configurations, i.e. even when the channel is fully compromised the + SSH compromise does NOT bypass the local human gate (A1, private TTY). *) +query c:bitstring; + event(RemoteExec(c)) ==> event(Keypress(c)). + +(* ----------------------------- processes ------------------------------ *) + +(* Emit the request: signed by the client key (authorized_keys auth + + integrity) and encrypted to the accepted host key (confidentiality). + Factored out so the two toggle branches share one tail. *) +let SendReq(skL:sskey, pkAccepted:pkey, id:bitstring, c:bitstring) = + out(pub, aenc((id, c, agentSecret, sign((id, c), skL)), pkAccepted)). + +(* The local daemon (run_remote): the MCP caller (adversary A1) chooses the + command; the daemon stamps a fresh UUID nonce, then ACCEPTS a host key -- + the pinned known-good key, or (first contact) whatever the network offers, + which the attacker can answer with its own key. *) +let LocalDaemon(skL:sskey, pkR:pkey) = + in(pub, c:bitstring); + new id:bitstring; + event RequestSent(id, c); +ifdef(`PINNED', + (* pinned: use the known-good host key from known_hosts *) + SendReq(skL, pkR, id, c) +, + (* first contact: accept whatever host key the network offers *) + in(pub, pkAccepted:pkey); + SendReq(skL, pkAccepted, id, c) +). + +(* The honest remote daemon: decrypts with its host secret key, verifies the + client signature against the daemon's public key (authorized_keys) and + that it covers exactly (id, c), then runs the unconditional approval gate + -- shows the command on the trusted TTY and executes only once the human + approves exactly that command. *) +let RemoteDaemon(skR:skey, pkL:spkey) = + in(pub, ct:bitstring); + let (id:bitstring, c:bitstring, s:bitstring, sg:bitstring) = adec(ct, skR) in + let (=id, =c) = checksign(sg, pkL) in + event RemoteAccept(id, c); + out(ttyShow, c); + in(ttyAck, =c); + event RemoteExec(c). + +(* The human at the trusted TTY: sees a command and approves it (worst case + -- approves whatever is shown; we prove exec=>keypress, not refusal). *) +let Human = + in(ttyShow, c:bitstring); + event Keypress(c); + out(ttyAck, c). + +process + new skR:skey; (* honest remote host keypair *) + new skL:sskey; (* local daemon client keypair *) + let pkR = pk(skR) in + let pkL = spk(skL) in + out(pub, pkR); (* host public key is public knowledge *) + out(pub, pkL); (* client public key is public knowledge *) + ( !LocalDaemon(skL, pkR) | !RemoteDaemon(skR, pkL) | !Human ) diff --git a/proofs/tla/.gitignore b/proofs/tla/.gitignore new file mode 100644 index 0000000..1c5bd5d --- /dev/null +++ b/proofs/tla/.gitignore @@ -0,0 +1,6 @@ +# TLC / pcal.trans generated artifacts (the committed .tla carries the +# translation; everything below is regenerated by a local model-check run). +states/ +*_TTrace_*.tla +*_TTrace_*.bin +*.old diff --git a/proofs/tla/ApprovalStateMachine.cfg b/proofs/tla/ApprovalStateMachine.cfg new file mode 100644 index 0000000..7ec4bd3 --- /dev/null +++ b/proofs/tla/ApprovalStateMachine.cfg @@ -0,0 +1,20 @@ +\* TLC model for the Rung 4 approval state machine. See README.md. +\* Run: java -cp tla2tools.jar tlc2.TLC -config ApprovalStateMachine.cfg \ +\* -workers auto ApprovalStateMachine.tla + +CONSTANTS + \* Two ids: the minimum to express both "replay the same id" and + \* "two distinct ids". TLC model values. + Ids = {r1, r2} + +SPECIFICATION Spec + +INVARIANTS + TypeOK + NoExecWithoutApproval + ReplayImpossible + PolicyFlipsOnlyOnKeypress + PrivilegedGateIndependentOfPolicy + +PROPERTIES + FlagMonotone diff --git a/proofs/tla/ApprovalStateMachine.tla b/proofs/tla/ApprovalStateMachine.tla new file mode 100644 index 0000000..61184a4 --- /dev/null +++ b/proofs/tla/ApprovalStateMachine.tla @@ -0,0 +1,372 @@ +-------------------------- MODULE ApprovalStateMachine -------------------------- +(***************************************************************************) +(* Rung 4 of the sudo-proxy formalisation roadmap: a TLA+/PlusCal model of *) +(* the approval state machine, model-checked by TLC. *) +(* *) +(* It pins the two accepted residuals the Rung 0 threat model routed here *) +(* (attack-tree leaves 1.4 / 4.4, finding F2): the `confirm_unprivileged` *) +(* policy transition and the unconditional human gate on privileged exec. *) +(* *) +(* The PlusCal algorithm below is the source of truth; the TLC-checkable *) +(* TLA+ translation between BEGIN/END TRANSLATION is generated from it by *) +(* `pcal.trans` and committed alongside (see README.md). If you edit the *) +(* PlusCal, re-run `pcal.trans` and commit both. *) +(* *) +(* The properties are tracked by bounded *monitor* variables (a standard *) +(* TLC idiom): a violation flag is raised at the exact site a bad thing *) +(* would happen, and the invariant asserts the flag stays FALSE. This *) +(* keeps every variable bounded, so the reachable state space is finite *) +(* and small with no history-length constraint -- processing more requests *) +(* only revisits states (once `seen` is full, further requests are *) +(* rejected as replays). *) +(* *) +(* Faithfulness ledger and the negative-control recipe that demonstrates *) +(* the model has teeth live in proofs/tla/README.md. *) +(***************************************************************************) +EXTENDS Naturals, FiniteSets + +CONSTANTS Ids \* the bounded set of request ids, e.g. {r1, r2} + +\* The entire input domain of tui::classify_key, abstracted: a keypress 'y', +\* an 'a' (always-allow), any other key, or a poll timeout (no keypress). +KeyChoice == {"y", "a", "other", "timeout"} + +\* A request as seen by the daemon. Every gate is abstracted to the boolean +\* "does this request pass that gate"; the contents the gate inspects (clock, +\* env strings, dangerous chars) are out of scope here -- see the ledger. +Requests == [ id : Ids, + privileged : BOOLEAN, + forwardAgent : BOOLEAN, + wellFormed : BOOLEAN, \* passes ValidatedRequest::validate + fresh : BOOLEAN, \* passes the freshness gate (<=60s) + envOk : BOOLEAN ] \* passes the env allowlist + +\* A placeholder request for the initial state; never handled (busy = FALSE at +\* init), so its field values are irrelevant. +NoReq == [ id |-> CHOOSE i \in Ids : TRUE, + privileged |-> FALSE, + forwardAgent |-> FALSE, + wellFormed |-> FALSE, + fresh |-> FALSE, + envOk |-> FALSE ] + +(* --algorithm ApprovalStateMachine + +variables + \* The persisted/in-memory policy flag (Arc), daemon-lifetime. + \* Init is a free boolean so both the default and `--no-confirm-unprivileged` + \* startup are covered. + confirmUnpriv \in BOOLEAN, + + \* Replay dedup (SeenIds): ids the daemon has accepted (whether the dispatch + \* then executed, denied, or timed out). Modelled as a monotonically growing + \* set -- eviction is dropped, which is conservative for ReplayImpossible + \* (see the ledger). + seen = {}, + + \* Ids that have actually been executed -- the witness set for replay. + executed = {}, + + \* The request currently on the wire and the operator's keypress for it. + \* Read by the daemon only while `busy`. + req = NoReq, + key = "timeout", + busy = FALSE, + + \* ---- monitor (violation) flags; each invariant asserts its flag is FALSE ---- + \* A privileged exec happened on a keypress other than 'y'. + vNoExec = FALSE, + \* An exec happened for an id that was already executed (a replayed exec). + vReplay = FALSE, + \* The policy flag was flipped other than by an 'a' keypress on an + \* unprivileged request. + vFlip = FALSE; + +define + \* A direct transcription of tui::classify_key(key, privileged). + \* ApprovedAlways is emitted iff key = "a" AND the request is unprivileged. + Classify(k, priv) == + IF k = "timeout" THEN "Timeout" + ELSE IF k = "y" THEN "Approved" + ELSE IF k = "a" /\ ~priv THEN "ApprovedAlways" + ELSE "Denied" +end define; + +\* Bookkeeping at every execution site. `isPriv` says whether this is the +\* privileged (exec_sudo) path. Raises the replay flag if this id already ran, +\* and -- on the privileged path -- the no-approval flag if the executing +\* keypress was not 'y'. So if any mutation routes a privileged exec onto a +\* non-'y' key, or lets an id execute twice, the monitor catches it here. +macro NoteExec(isPriv) begin + if req.id \in executed then + vReplay := TRUE; + end if; + if isPriv /\ key # "y" then + vNoExec := TRUE; + end if; + executed := executed \union {req.id}; +end macro; + +\* The single primitive that writes the policy flag. Any flip must go through +\* it; it raises the flip-provenance flag unless the writer is an 'a' keypress +\* on an unprivileged request -- so wiring a flip into any other branch (a +\* request field, a timeout, the privileged path) trips the monitor. +macro FlipFlag() begin + confirmUnpriv := FALSE; + if req.privileged \/ key # "a" then + vFlip := TRUE; + end if; +end macro; + +\* The environment / attacker: submits an arbitrary request (every field free, +\* including an id already in `seen` -- a replay) and an arbitrary operator +\* keypress. TLC's exhaustive search universally quantifies the properties over +\* all attacker forgeries / replays AND all operator choices. +process Env = "env" +begin +EnvLoop: + while TRUE do + await ~busy; + with r \in Requests, k \in KeyChoice do + req := r; + key := k; + busy := TRUE; + end with; + end while; +end process; + +\* The daemon: runs the gate chain in the SAME order as handle_connection (so a +\* reordering mutation is observable), then dispatches. One request is handled +\* to completion atomically (concurrency / TTY-lock interleavings are out of +\* scope for these properties -- see the ledger). +process Daemon = "daemon" +begin +Handle: + while TRUE do + await busy; + if ~req.wellFormed then + skip; \* ValidatedRequest::validate -> reject + elsif req.forwardAgent /\ req.privileged then + skip; \* forward_agent + privileged -> reject + elsif ~req.fresh then + skip; \* freshness gate -> reject + elsif ~req.envOk then + skip; \* env allowlist -> reject + elsif req.id \in seen then + skip; \* replay dedup (try_insert -> false) + else + \* Accept (try_insert -> true), then dispatch. + seen := seen \union {req.id}; + if req.privileged then + \* PRIVILEGED PATH -- never consults confirmUnpriv. + \* ApprovedAlways is impossible here (Classify guards on ~priv) + \* and the real code defensively folds it to Denied anyway. + if Classify(key, TRUE) = "Approved" then + NoteExec(TRUE); \* exec_sudo + elsif Classify(key, TRUE) = "Timeout" then + skip; \* timeout, default deny + else + skip; \* denied + end if; + elsif confirmUnpriv then + \* UNPRIVILEGED + confirmation ON. + if Classify(key, FALSE) = "Approved" then + NoteExec(FALSE); \* exec_direct + elsif Classify(key, FALSE) = "ApprovedAlways" then + FlipFlag(); \* the ONLY flag writer + NoteExec(FALSE); \* exec_direct + elsif Classify(key, FALSE) = "Timeout" then + skip; + else + skip; \* denied + end if; + else + \* UNPRIVILEGED + confirmation OFF: no prompt, exec directly. + NoteExec(FALSE); \* exec_direct + end if; + end if; + busy := FALSE; + end while; +end process; + +end algorithm; *) +\* BEGIN TRANSLATION (chksum(pcal) = "5576622c" /\ chksum(tla) = "3c1dd0ad") +VARIABLES confirmUnpriv, seen, executed, req, key, busy, vNoExec, vReplay, + vFlip + +(* define statement *) +Classify(k, priv) == + IF k = "timeout" THEN "Timeout" + ELSE IF k = "y" THEN "Approved" + ELSE IF k = "a" /\ ~priv THEN "ApprovedAlways" + ELSE "Denied" + + +vars == << confirmUnpriv, seen, executed, req, key, busy, vNoExec, vReplay, + vFlip >> + +ProcSet == {"env"} \cup {"daemon"} + +Init == (* Global variables *) + /\ confirmUnpriv \in BOOLEAN + /\ seen = {} + /\ executed = {} + /\ req = NoReq + /\ key = "timeout" + /\ busy = FALSE + /\ vNoExec = FALSE + /\ vReplay = FALSE + /\ vFlip = FALSE + +Env == /\ ~busy + /\ \E r \in Requests: + \E k \in KeyChoice: + /\ req' = r + /\ key' = k + /\ busy' = TRUE + /\ UNCHANGED << confirmUnpriv, seen, executed, vNoExec, vReplay, vFlip >> + +Daemon == /\ busy + /\ IF ~req.wellFormed + THEN /\ TRUE + /\ UNCHANGED << confirmUnpriv, seen, executed, vNoExec, + vReplay, vFlip >> + ELSE /\ IF req.forwardAgent /\ req.privileged + THEN /\ TRUE + /\ UNCHANGED << confirmUnpriv, seen, executed, + vNoExec, vReplay, vFlip >> + ELSE /\ IF ~req.fresh + THEN /\ TRUE + /\ UNCHANGED << confirmUnpriv, seen, + executed, vNoExec, + vReplay, vFlip >> + ELSE /\ IF ~req.envOk + THEN /\ TRUE + /\ UNCHANGED << confirmUnpriv, + seen, + executed, + vNoExec, + vReplay, + vFlip >> + ELSE /\ IF req.id \in seen + THEN /\ TRUE + /\ UNCHANGED << confirmUnpriv, + seen, + executed, + vNoExec, + vReplay, + vFlip >> + ELSE /\ seen' = (seen \union {req.id}) + /\ IF req.privileged + THEN /\ IF Classify(key, TRUE) = "Approved" + THEN /\ IF req.id \in executed + THEN /\ vReplay' = TRUE + ELSE /\ TRUE + /\ UNCHANGED vReplay + /\ IF TRUE /\ key # "y" + THEN /\ vNoExec' = TRUE + ELSE /\ TRUE + /\ UNCHANGED vNoExec + /\ executed' = (executed \union {req.id}) + ELSE /\ IF Classify(key, TRUE) = "Timeout" + THEN /\ TRUE + ELSE /\ TRUE + /\ UNCHANGED << executed, + vNoExec, + vReplay >> + /\ UNCHANGED << confirmUnpriv, + vFlip >> + ELSE /\ IF confirmUnpriv + THEN /\ IF Classify(key, FALSE) = "Approved" + THEN /\ IF req.id \in executed + THEN /\ vReplay' = TRUE + ELSE /\ TRUE + /\ UNCHANGED vReplay + /\ IF FALSE /\ key # "y" + THEN /\ vNoExec' = TRUE + ELSE /\ TRUE + /\ UNCHANGED vNoExec + /\ executed' = (executed \union {req.id}) + /\ UNCHANGED << confirmUnpriv, + vFlip >> + ELSE /\ IF Classify(key, FALSE) = "ApprovedAlways" + THEN /\ confirmUnpriv' = FALSE + /\ IF req.privileged \/ key # "a" + THEN /\ vFlip' = TRUE + ELSE /\ TRUE + /\ vFlip' = vFlip + /\ IF req.id \in executed + THEN /\ vReplay' = TRUE + ELSE /\ TRUE + /\ UNCHANGED vReplay + /\ IF FALSE /\ key # "y" + THEN /\ vNoExec' = TRUE + ELSE /\ TRUE + /\ UNCHANGED vNoExec + /\ executed' = (executed \union {req.id}) + ELSE /\ IF Classify(key, FALSE) = "Timeout" + THEN /\ TRUE + ELSE /\ TRUE + /\ UNCHANGED << confirmUnpriv, + executed, + vNoExec, + vReplay, + vFlip >> + ELSE /\ IF req.id \in executed + THEN /\ vReplay' = TRUE + ELSE /\ TRUE + /\ UNCHANGED vReplay + /\ IF FALSE /\ key # "y" + THEN /\ vNoExec' = TRUE + ELSE /\ TRUE + /\ UNCHANGED vNoExec + /\ executed' = (executed \union {req.id}) + /\ UNCHANGED << confirmUnpriv, + vFlip >> + /\ busy' = FALSE + /\ UNCHANGED << req, key >> + +Next == Env \/ Daemon + +Spec == Init /\ [][Next]_vars + +\* END TRANSLATION + +\* ===================== invariants & properties ===================== + +\* Modeling-hygiene type invariant. +TypeOK == + /\ confirmUnpriv \in BOOLEAN + /\ seen \subseteq Ids + /\ executed \subseteq Ids + /\ req \in Requests + /\ key \in KeyChoice + /\ busy \in BOOLEAN + /\ vNoExec \in BOOLEAN + /\ vReplay \in BOOLEAN + /\ vFlip \in BOOLEAN + +\* P1: a privileged exec happens only on a 'y' keypress -- never on timeout, +\* denial, replay, or any policy state. (leaf 1.4 / G5) +NoExecWithoutApproval == ~vNoExec + +\* P2: the same request id never causes two executions. (leaf 1.1 / G2.1) +ReplayImpossible == ~vReplay + +\* P3: the policy flag flips only via an ApprovedAlways ('a') keypress on an +\* unprivileged request -- never a request field, replay, MCP flag, or timeout. +\* (leaf 1.4 / 4.4 / F2) +PolicyFlipsOnlyOnKeypress == ~vFlip + +\* P4: no privileged exec without a 'y' keypress, for ANY value the policy flag +\* took. The privileged branch structurally never reads confirmUnpriv, so this +\* is the "independent of policy" reading of the same witness as P1; stated +\* separately for assurance-case traceability (G5.1) and tripped by the NC2 +\* mutation that wires the flag into the privileged path. (leaf 4.4 / G5.1) +PrivilegedGateIndependentOfPolicy == ~vNoExec + +\* P3 (temporal half): once the flag is FALSE it stays FALSE -- it only ever +\* moves true->false, and only via the single writer above. +FlagMonotone == [][confirmUnpriv' => confirmUnpriv]_confirmUnpriv + +============================================================================= diff --git a/proofs/tla/ConcurrentHandlers.cfg b/proofs/tla/ConcurrentHandlers.cfg new file mode 100644 index 0000000..2231531 --- /dev/null +++ b/proofs/tla/ConcurrentHandlers.cfg @@ -0,0 +1,19 @@ +\* TLC model for the Extended Rung 4 concurrent-handler interleaving model. +\* See README.md. +\* Run: java -cp tla2tools.jar tlc2.TLC -config ConcurrentHandlers.cfg \ +\* -workers auto ConcurrentHandlers.tla + +CONSTANTS + \* Two concurrent handlers: the minimum that exposes the pairwise dedup + \* TOCTOU and the TTY-lock race. A third only adds symmetric interleavings. + Handlers = {h1, h2} + \* Two ids: lets two handlers race the SAME id (the TOCTOU) and also handle + \* two distinct ids. TLC model values. + Ids = {r1, r2} + +SPECIFICATION Spec + +INVARIANTS + TypeOK + NoDoubleExec + TtyMutualExclusion diff --git a/proofs/tla/ConcurrentHandlers.tla b/proofs/tla/ConcurrentHandlers.tla new file mode 100644 index 0000000..3b72dea --- /dev/null +++ b/proofs/tla/ConcurrentHandlers.tla @@ -0,0 +1,364 @@ +-------------------------- MODULE ConcurrentHandlers -------------------------- +(***************************************************************************) +(* Extended Rung 4 of the sudo-proxy formalisation roadmap: a TLA+/PlusCal *) +(* model of *concurrent* daemon connection handlers interleaving on the *) +(* two pieces of state they share, model-checked by TLC. *) +(* *) +(* The sibling `ApprovalStateMachine` and `ReplayWindow` models handle one *) +(* request to completion atomically, so they cannot see -- by construction *) +(* -- the two concurrency hazards that `server::handle_connection` guards *) +(* against when many threads run at once: *) +(* *) +(* 1. the dedup TOCTOU. `SeenIds::try_insert` (server.rs:213) folds the *) +(* contains-check and the insert into ONE critical section under the *) +(* `seen_ids` mutex precisely so two threads racing the same id cannot *) +(* both pass dedup. This model interleaves handlers on `seen` and *) +(* checks that the atomic critical section is *sufficient*: no id ever *) +(* executes twice under any interleaving. *) +(* *) +(* 2. the TTY-lock serialisation. The privileged prompt and the *) +(* foreground exec each take `tty_lock` (server.rs:555 / executor *) +(* ForegroundGuard) so two handlers never drive /dev/tty at once -- *) +(* without it the daemon would sit in a background pgrp and EIO *) +(* (PR #22). This model checks the lock keeps the interactive region *) +(* mutually exclusive across all interleavings. *) +(* *) +(* As in the sibling models, the PlusCal algorithm is the source of truth; *) +(* the TLC-checkable translation between BEGIN/END TRANSLATION is generated *) +(* by `pcal.trans` and committed alongside. If you edit the PlusCal, re-run *) +(* `pcal.trans` and commit both. Properties are tracked either by a bounded *) +(* witness counter (`ttyActive`) or a monitor flag (`vDoubleExec`) so every *) +(* variable stays bounded and the reachable state space is finite and small. *) +(* *) +(* Negative-control recipe and faithfulness ledger live in README.md. *) +(***************************************************************************) +EXTENDS Naturals, FiniteSets + +CONSTANTS Handlers, \* the set of concurrent handler threads, e.g. {h1, h2} + Ids \* the bounded set of request ids, e.g. {r1, r2} + +\* Sentinel for "lock is free"; distinct from every handler id. +NoHolder == "none" + +\* Keypress abstraction restricted to what matters for concurrency: "y" +\* approves (so the exec site is reached), "other" denies/times out (so the +\* TTY lock is released without executing). The full keypress decision table +\* and the confirm_unprivileged flip are the ApprovalStateMachine model's job; +\* here no "a" is offered, so confirmUnpriv is read-only. +Keys == {"y", "other"} + +(* --algorithm ConcurrentHandlers + +variables + \* SeenIds: ids the daemon has accepted. Eviction is out of scope here + \* (the ReplayWindow model owns the TTL); `seen` grows monotonically, + \* which is sound for the concurrency properties. The mutex guarding it. + seen = {}, + seenLock = NoHolder, + + \* The shared TTY lock (Arc>). Held across an interactive prompt + \* and re-taken by ForegroundGuard for a privileged exec. + ttyLock = NoHolder, + + \* The persisted policy flag (Arc). Free at init so both the + \* confirm-on and no-confirm dispatch paths are covered; never flipped here. + confirmUnpriv \in BOOLEAN, + + \* Ids that have actually executed -- the witness set for double-exec. + executed = {}, + + \* Witness counter: how many handlers are in the interactive TTY region + \* (prompt or foreground exec) at once. The lock must keep this <= 1. + ttyActive = 0, + + \* Monitor flag: raised at an exec site if this id already executed, i.e. + \* the dedup let the same id through twice (the concurrent TOCTOU). + vDoubleExec = FALSE; + +\* Record an execution of `id`; raise the double-exec monitor if it ran before. +macro NoteExec(id) begin + if id \in executed then + vDoubleExec := TRUE; + end if; + executed := executed \union {id}; +end macro; + +\* One connection handler thread. Picks its request (id + privileged + key) +\* nondeterministically -- the gate chain (validate/freshness/env) is the +\* sibling model's job, so every request here is assumed past those gates and +\* the focus is the shared-state races. Single-shot: handles one request then +\* terminates, which bounds the state space. +process Handler \in Handlers +variables + id = CHOOSE i \in Ids : TRUE, + priv = FALSE, + key = "y", + dup = FALSE; +begin +Pick: + with i \in Ids, p \in BOOLEAN, k \in Keys do + id := i; priv := p; key := k; + end with; + +\* ---- dedup critical section: the faithful ATOMIC try_insert ---- +SeenAcq: + await seenLock = NoHolder; \* lock_recover(&seen_ids) + seenLock := self; +TryInsert: + \* evict_stale() is a no-op here (eviction = ReplayWindow's job). The + \* contains-check and the insert are ONE atomic step under the lock -- + \* the single critical section that closes the TOCTOU. (To see the model + \* has teeth, the README NC1 splits this into check / release / insert.) + if id \in seen then + dup := TRUE; + else + seen := seen \union {id}; + dup := FALSE; + end if; + seenLock := NoHolder; + +Dispatch: + if dup then + goto Done; \* duplicate request id -> rejected, never execs + elsif priv then + goto PrivPrompt; + elsif confirmUnpriv then + goto ConfirmPrompt; + else + goto AutoExec; + end if; + +\* ---- privileged: TTY lock held across the prompt, released, re-taken to exec ---- +PrivPrompt: + await ttyLock = NoHolder; \* blocking lock_recover(&tty_lock) for the prompt + ttyLock := self; + ttyActive := ttyActive + 1; +PrivPromptEnd: + ttyActive := ttyActive - 1; + ttyLock := NoHolder; \* released before exec (server.rs:553) + if key # "y" then goto Done; end if; \* denied / timeout -> default deny +PrivExecAcq: + await ttyLock = NoHolder; \* ForegroundGuard::take re-acquires (blocking) + ttyLock := self; + ttyActive := ttyActive + 1; +PrivExec: + NoteExec(id); \* exec_sudo + ttyActive := ttyActive - 1; + ttyLock := NoHolder; + goto Done; + +\* ---- unprivileged + confirm ON: blocking prompt, then exec_direct (no TTY) ---- +ConfirmPrompt: + await ttyLock = NoHolder; \* blocking lock_recover(&tty_lock) + ttyLock := self; + ttyActive := ttyActive + 1; +ConfirmPromptEnd: + ttyActive := ttyActive - 1; + ttyLock := NoHolder; + if key # "y" then goto Done; end if; +ConfirmExec: + NoteExec(id); \* exec_direct -- takes NO tty_lock + goto Done; + +\* ---- unprivileged + confirm OFF: best-effort banner via try_lock, then exec ---- +AutoExec: + \* `if let Ok(_g) = tty_lock.try_lock() { display_banner }`: non-blocking + \* and released immediately. Modelled as atomic with no lasting hold, so it + \* never overlaps an interactive region -- skipped when the lock is busy. + if ttyLock = NoHolder then + skip; \* banner under a momentary try_lock + end if; + NoteExec(id); \* exec_direct + goto Done; \* "Done" is PlusCal's implicit terminal label +end process; + +end algorithm; *) + +\* BEGIN TRANSLATION +VARIABLES seen, seenLock, ttyLock, confirmUnpriv, executed, ttyActive, + vDoubleExec, pc, id, priv, key, dup + +vars == << seen, seenLock, ttyLock, confirmUnpriv, executed, ttyActive, + vDoubleExec, pc, id, priv, key, dup >> + +ProcSet == (Handlers) + +Init == (* Global variables *) + /\ seen = {} + /\ seenLock = NoHolder + /\ ttyLock = NoHolder + /\ confirmUnpriv \in BOOLEAN + /\ executed = {} + /\ ttyActive = 0 + /\ vDoubleExec = FALSE + (* Process Handler *) + /\ id = [self \in Handlers |-> CHOOSE i \in Ids : TRUE] + /\ priv = [self \in Handlers |-> FALSE] + /\ key = [self \in Handlers |-> "y"] + /\ dup = [self \in Handlers |-> FALSE] + /\ pc = [self \in ProcSet |-> "Pick"] + +Pick(self) == /\ pc[self] = "Pick" + /\ \E i \in Ids: + \E p \in BOOLEAN: + \E k \in Keys: + /\ id' = [id EXCEPT ![self] = i] + /\ priv' = [priv EXCEPT ![self] = p] + /\ key' = [key EXCEPT ![self] = k] + /\ pc' = [pc EXCEPT ![self] = "SeenAcq"] + /\ UNCHANGED << seen, seenLock, ttyLock, confirmUnpriv, executed, + ttyActive, vDoubleExec, dup >> + +SeenAcq(self) == /\ pc[self] = "SeenAcq" + /\ seenLock = NoHolder + /\ seenLock' = self + /\ pc' = [pc EXCEPT ![self] = "TryInsert"] + /\ UNCHANGED << seen, ttyLock, confirmUnpriv, executed, + ttyActive, vDoubleExec, id, priv, key, dup >> + +TryInsert(self) == /\ pc[self] = "TryInsert" + /\ IF id[self] \in seen + THEN /\ dup' = [dup EXCEPT ![self] = TRUE] + /\ seen' = seen + ELSE /\ seen' = (seen \union {id[self]}) + /\ dup' = [dup EXCEPT ![self] = FALSE] + /\ seenLock' = NoHolder + /\ pc' = [pc EXCEPT ![self] = "Dispatch"] + /\ UNCHANGED << ttyLock, confirmUnpriv, executed, ttyActive, + vDoubleExec, id, priv, key >> + +Dispatch(self) == /\ pc[self] = "Dispatch" + /\ IF dup[self] + THEN /\ pc' = [pc EXCEPT ![self] = "Done"] + ELSE /\ IF priv[self] + THEN /\ pc' = [pc EXCEPT ![self] = "PrivPrompt"] + ELSE /\ IF confirmUnpriv + THEN /\ pc' = [pc EXCEPT ![self] = "ConfirmPrompt"] + ELSE /\ pc' = [pc EXCEPT ![self] = "AutoExec"] + /\ UNCHANGED << seen, seenLock, ttyLock, confirmUnpriv, + executed, ttyActive, vDoubleExec, id, priv, + key, dup >> + +PrivPrompt(self) == /\ pc[self] = "PrivPrompt" + /\ ttyLock = NoHolder + /\ ttyLock' = self + /\ ttyActive' = ttyActive + 1 + /\ pc' = [pc EXCEPT ![self] = "PrivPromptEnd"] + /\ UNCHANGED << seen, seenLock, confirmUnpriv, executed, + vDoubleExec, id, priv, key, dup >> + +PrivPromptEnd(self) == /\ pc[self] = "PrivPromptEnd" + /\ ttyActive' = ttyActive - 1 + /\ ttyLock' = NoHolder + /\ IF key[self] # "y" + THEN /\ pc' = [pc EXCEPT ![self] = "Done"] + ELSE /\ pc' = [pc EXCEPT ![self] = "PrivExecAcq"] + /\ UNCHANGED << seen, seenLock, confirmUnpriv, executed, + vDoubleExec, id, priv, key, dup >> + +PrivExecAcq(self) == /\ pc[self] = "PrivExecAcq" + /\ ttyLock = NoHolder + /\ ttyLock' = self + /\ ttyActive' = ttyActive + 1 + /\ pc' = [pc EXCEPT ![self] = "PrivExec"] + /\ UNCHANGED << seen, seenLock, confirmUnpriv, executed, + vDoubleExec, id, priv, key, dup >> + +PrivExec(self) == /\ pc[self] = "PrivExec" + /\ IF id[self] \in executed + THEN /\ vDoubleExec' = TRUE + ELSE /\ TRUE + /\ UNCHANGED vDoubleExec + /\ executed' = (executed \union {id[self]}) + /\ ttyActive' = ttyActive - 1 + /\ ttyLock' = NoHolder + /\ pc' = [pc EXCEPT ![self] = "Done"] + /\ UNCHANGED << seen, seenLock, confirmUnpriv, id, priv, key, + dup >> + +ConfirmPrompt(self) == /\ pc[self] = "ConfirmPrompt" + /\ ttyLock = NoHolder + /\ ttyLock' = self + /\ ttyActive' = ttyActive + 1 + /\ pc' = [pc EXCEPT ![self] = "ConfirmPromptEnd"] + /\ UNCHANGED << seen, seenLock, confirmUnpriv, executed, + vDoubleExec, id, priv, key, dup >> + +ConfirmPromptEnd(self) == /\ pc[self] = "ConfirmPromptEnd" + /\ ttyActive' = ttyActive - 1 + /\ ttyLock' = NoHolder + /\ IF key[self] # "y" + THEN /\ pc' = [pc EXCEPT ![self] = "Done"] + ELSE /\ pc' = [pc EXCEPT ![self] = "ConfirmExec"] + /\ UNCHANGED << seen, seenLock, confirmUnpriv, + executed, vDoubleExec, id, priv, key, + dup >> + +ConfirmExec(self) == /\ pc[self] = "ConfirmExec" + /\ IF id[self] \in executed + THEN /\ vDoubleExec' = TRUE + ELSE /\ TRUE + /\ UNCHANGED vDoubleExec + /\ executed' = (executed \union {id[self]}) + /\ pc' = [pc EXCEPT ![self] = "Done"] + /\ UNCHANGED << seen, seenLock, ttyLock, confirmUnpriv, + ttyActive, id, priv, key, dup >> + +AutoExec(self) == /\ pc[self] = "AutoExec" + /\ IF ttyLock = NoHolder + THEN /\ TRUE + ELSE /\ TRUE + /\ IF id[self] \in executed + THEN /\ vDoubleExec' = TRUE + ELSE /\ TRUE + /\ UNCHANGED vDoubleExec + /\ executed' = (executed \union {id[self]}) + /\ pc' = [pc EXCEPT ![self] = "Done"] + /\ UNCHANGED << seen, seenLock, ttyLock, confirmUnpriv, + ttyActive, id, priv, key, dup >> + +Handler(self) == Pick(self) \/ SeenAcq(self) \/ TryInsert(self) + \/ Dispatch(self) \/ PrivPrompt(self) + \/ PrivPromptEnd(self) \/ PrivExecAcq(self) + \/ PrivExec(self) \/ ConfirmPrompt(self) + \/ ConfirmPromptEnd(self) \/ ConfirmExec(self) + \/ AutoExec(self) + +(* Allow infinite stuttering to prevent deadlock on termination. *) +Terminating == /\ \A self \in ProcSet: pc[self] = "Done" + /\ UNCHANGED vars + +Next == (\E self \in Handlers: Handler(self)) + \/ Terminating + +Spec == Init /\ [][Next]_vars + +Termination == <>(\A self \in ProcSet: pc[self] = "Done") + +\* END TRANSLATION + +\* ===================== invariants & properties ===================== + +\* Modeling-hygiene type invariant. +TypeOK == + /\ seen \subseteq Ids + /\ executed \subseteq Ids + /\ seenLock \in Handlers \cup {NoHolder} + /\ ttyLock \in Handlers \cup {NoHolder} + /\ confirmUnpriv \in BOOLEAN + /\ ttyActive \in 0..Cardinality(Handlers) + /\ vDoubleExec \in BOOLEAN + +\* C1: no request id ever executes twice, under ANY interleaving of the +\* concurrent handlers. This is the closure of the dedup TOCTOU: the atomic +\* `try_insert` critical section is *sufficient* to serialise the same-id race. +\* (leaf 1.1 / G2.1 -- the concurrent half of ReplayImpossible.) +NoDoubleExec == ~vDoubleExec + +\* C2: at most one handler is in the interactive TTY region (prompt or +\* foreground exec) at any time -- the `tty_lock` serialises /dev/tty across +\* all interleavings, so the daemon never drives the TTY from two handlers at +\* once (the PR #22 background-pgrp / EIO hazard). (leaf Sn5.x / G6) +TtyMutualExclusion == ttyActive <= 1 + +============================================================================= diff --git a/proofs/tla/README.md b/proofs/tla/README.md new file mode 100644 index 0000000..41c1aeb --- /dev/null +++ b/proofs/tla/README.md @@ -0,0 +1,385 @@ +# Rung 4 — TLA+/PlusCal models + +[← formalisation roadmap](../../docs/formalisation-roadmap.md) · [threat model](../../docs/threat-model.md) · [assurance case](../../docs/assurance-case.md) + +This directory holds **three** TLA+/PlusCal models, all model-checked by TLC: + +1. **`ApprovalStateMachine`** (below) — the approval state machine: the four + safety properties of the gate chain and `classify_key`. +2. **[`ReplayWindow`](#window-sizing--freshness--replay-retention) (Extended Rung 4)** — + the temporal freshness ↔ replay-retention **window-sizing** property that the + first model deliberately abstracts away (it never evicts). Restores a small + concrete clock + real TTL eviction and proves the genuinely temporal claim. +3. **[`ConcurrentHandlers`](#concurrent-handlers--dedup-toctou--tty-lock-serialisation) (Extended Rung 4)** — + *concurrent* handler threads interleaving on the shared `seen` set and the TTY + lock. Proves the atomic `try_insert` critical section is *sufficient* (no + double-exec under any interleaving) and that the TTY lock serialises /dev/tty — + the canonical model-checking case the two atomic, one-request-at-a-time models + above cannot express. + +--- + +## Approval state machine + +A TLA+/PlusCal model of sudo-proxy's approval state machine, model-checked by +TLC. It discharges **Rung 4** (state-machine half) of the +[formalisation roadmap](../../docs/formalisation-roadmap.md): the temporal / +relational properties that are not per-function and that the Rung 0 threat model +routed here — attack-tree leaves **1.4 / 4.4** (the `confirm_unprivileged` +policy transition, finding **F2**) and the unconditional human gate on +privileged execution. + +`ApprovalStateMachine.tla` carries the PlusCal algorithm in a comment block (the +source of truth) followed by the `pcal.trans`-generated TLA+ translation that TLC +checks. `ApprovalStateMachine.cfg` is the bounded model. The Rung 3 Kani sibling +lives in [`src/proofs.rs`](../../src/proofs.rs); this is the protocol-level +analogue. + +## The four properties + +Tracked by bounded **monitor variables** — a violation flag is raised at the +exact site the bad thing would happen, and the invariant asserts the flag stays +`FALSE`. (This keeps every variable bounded, so the reachable state space is +finite and small with *no* history-length constraint: once `seen` fills, further +requests are rejected as replays and only revisit states.) + +| # | Invariant | Claim | Maps to | +|---|-----------|-------|---------| +| **P1** | `NoExecWithoutApproval` | A privileged exec happens only on a `y` keypress — never on timeout, denial, replay, or any policy state. | leaf 1.4 · G5 | +| **P2** | `ReplayImpossible` | The same request id never causes two executions. | leaf 1.1 · G2.1 | +| **P3** | `PolicyFlipsOnlyOnKeypress` (+ action property `FlagMonotone`) | The policy flag flips only via an `a` keypress on an *unprivileged* request — never a request field, replay, MCP flag, or timeout — and once `FALSE` stays `FALSE`. | leaf 1.4 / 4.4 · F2 | +| **P4** | `PrivilegedGateIndependentOfPolicy` | No privileged exec without a `y` keypress, for *any* value the policy flag took. The privileged branch structurally never reads the flag; stated separately for traceability. | leaf 4.4 · G5.1 | + +`Classify(k, priv)` in the spec is a direct transcription of +[`tui::classify_key`](../../src/tui.rs); the daemon's gate chain mirrors the +ordering in [`server.rs` `handle_connection`](../../src/server.rs). + +## Running TLC + +Needs a JRE (21+) and `tla2tools.jar` (pin a release, e.g. `v1.8.0`): + +```sh +curl -sSL -o tla2tools.jar \ + https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/tla2tools.jar + +# (only if you edited the PlusCal) re-generate the translation, then commit both: +java -cp tla2tools.jar pcal.trans ApprovalStateMachine.tla + +java -cp tla2tools.jar tlc2.TLC -nowarning \ + -config ApprovalStateMachine.cfg -workers auto ApprovalStateMachine.tla +``` + +Expected: `Model checking completed. No error has been found.` (~9k distinct +states, well under a second). The committed `.tla` already contains the +translation, so CI does not re-run `pcal.trans`. + +## Negative controls — that the model has teeth + +The analogue of the Kani negative controls: documented mutations that **must** +each make TLC produce a counterexample. Apply by hand to the PlusCal, re-run +`pcal.trans`, re-check, then revert. **Do not commit the mutations.** Each was +verified to violate exactly the listed invariant. + +| # | Mutation (in the PlusCal) | Violates | +|---|---------------------------|----------| +| **NC1** | Privileged `Timeout` branch: replace `skip;` with `NoteExec(TRUE);` (exec on timeout). | `NoExecWithoutApproval` | +| **NC2** | Wire the flag into the privileged gate: prepend `if confirmUnpriv then NoteExec(TRUE); elsif …` to the privileged dispatch. | `NoExecWithoutApproval` (= `PrivilegedGateIndependentOfPolicy` — same witness) | +| **NC3** | Remove the dedup gate: change `elsif req.id \in seen then` to `elsif FALSE then`. | `ReplayImpossible` | +| **NC4** | Flip the flag illegitimately: replace the `skip;` in the unprivileged `Timeout` branch with `FlipFlag();`. | `PolicyFlipsOnlyOnKeypress` | + +Note NC4 leaves `FlagMonotone` *satisfied* (a `TRUE→FALSE` flip is monotone-OK): +it is the provenance monitor `PolicyFlipsOnlyOnKeypress`, not monotonicity, that +catches an illegitimate flip — which is the point of having both. + +## Faithfulness ledger + +What the model claims, and what it does not — the analogue of the Kani "Scope" +note in the [roadmap](../../docs/formalisation-roadmap.md). + +**Modelled faithfully** + +- The gate chain *ordering* (validate → forward_agent+privileged → freshness → + env allowlist → replay dedup → dispatch), so a reordering regression is + observable. +- `classify_key`'s exact decision table (`Classify`), including that + `ApprovedAlways` is emitted iff `a` ∧ unprivileged. +- Both dispatch branches, the defensive `ApprovedAlways → Denied` fold on the + privileged path, and that `confirm_unprivileged` is written by exactly one + primitive (`FlipFlag`). +- Replay rejection via a monotonically-growing `seen` set. +- The attacker and the operator are both fully nondeterministic, so the + properties are universally quantified over all field forgeries / replays and + all operator choices. + +**Abstracted (sound for these four properties)** + +- The clock / freshness check → a boolean `fresh`; env contents → `envOk`; + decode + peer-auth → `wellFormed`. The properties don't depend on the contents + these gates inspect, only on pass/reject. +- `SeenIds` retention/eviction → never evict. This is *conservative* for + `ReplayImpossible`: remembering ids forever can only add rejections within the + model's horizon, never remove one. A replay accepted *after* the 120 s + retention window is by-design and is not what leaves 1.4 / 4.4 are about. The + **window relationship** that makes this safe — that an id stays remembered for + as long as a replay of it could still pass the freshness gate — is exactly what + the [`ReplayWindow`](#window-sizing--freshness--replay-retention) model proves, + lifting this abstraction. + +**Out of scope (covered by other rungs)** + +- Concurrent TTY-lock interleavings and the `try_insert` TOCTOU — one request is + handled to completion here; covered by the + [`ConcurrentHandlers`](#concurrent-handlers--dedup-toctou--tty-lock-serialisation) + sibling model. +- Freshness arithmetic monotonicity (`ymd_hms_to_epoch`) — Rung 3 Kani. +- Per-field dangerous-char scanning (`has_dangerous_chars`) — Rung 3 Kani. +- The SSH first-contact MITM residual (leaf 4.2 / A4) — the separate Rung 4 + ProVerif model in [`../proverif/`](../proverif/). + +--- + +## Window-sizing — freshness ↔ replay retention + +[`ReplayWindow.tla`](ReplayWindow.tla) / [`ReplayWindow.cfg`](ReplayWindow.cfg) — +the **Extended Rung 4** model. The approval-state-machine model above models +`seen` as never evicting (conservative, by its own admission). That conservatism +*hides* the one property the daemon genuinely depends on: the relationship +between the two time windows in [`server.rs`](../../src/server.rs): + +| constant | value | role | +|----------|-------|------| +| `MAX_REQUEST_AGE` | 60 s | freshness gate (`check_freshness`) | +| `REPLAY_RETENTION` | 120 s | seen-id eviction TTL (`SeenIds::evict_stale`) | + +This model restores a **small abstract clock** and the **real TTL eviction**, and +proves the temporal claim: *a captured request cannot evade replay-dedup by +waiting for its id to be evicted while still passing the freshness gate.* Only the +window **relationship** matters, not the absolute seconds, so the constants are +`Freshness = 2`, `Retention = 4` (= 2·Freshness), `MaxTime = 6`, `Ids = {r1}`. + +### The theorem + +| Invariant | Claim | Maps to | +|-----------|-------|---------| +| **`PastReplayImpossible`** | No captured **past-dated** request is ever re-executed. | leaf 1.1 · G2.1 (temporal half) | +| `AcceptWasFresh` | The daemon only ever accepts a request that was fresh at acceptance. | modeling hygiene | +| `TypeOK` | Type invariant. | — | + +**The window arithmetic.** An id is inserted at `acceptedAt ≥ firstTs` (you accept +no earlier than the request was created) and evicted only once +`clock − insertedAt > Retention`. A replay re-uses the fixed `firstTs`, so it +passes freshness only while `clock − firstTs ≤ Freshness`. Since +`insertedAt = acceptedAt ≥ firstTs` and `Retention ≥ Freshness`, eviction +(`clock > firstTs + Retention ≥ firstTs + Freshness`) cannot co-occur with +freshness (`clock ≤ firstTs + Freshness`). So a captured replay can never win the +race — proved exhaustively by TLC (442 distinct states, sub-second). + +**Why a monitor, not a `seen`-membership invariant.** Eviction is **lazy** — an id +is pruned only inside `try_insert`, and a replay-accept removes then re-inserts the +id in one atomic step. A membership predicate (`∃ rec ∈ seen : …`) therefore can +never observe the gap; the re-insertion masks it. Only an event monitor (`vReplay` +raised at the re-exec site) is a robust witness — the same rationale behind the +approval-state-machine model's monitor variables. + +### Finding 1 — the tight bound is `Retention ≥ Freshness`, not `2×` + +The code comment says retention is "twice `MAX_REQUEST_AGE` so that any id that +could still pass the freshness check is also still in the set." The arithmetic +above shows `Retention ≥ Freshness` already suffices (max acceptance lag only +pushes eviction *later*, never earlier). TLC confirms it: the theorem **holds** at +`Retention = Freshness` and **fails** at `Retention = Freshness − 1`. So the +shipped `120 s = 2 × 60 s` is **conservative margin, not a necessity** — *not a +bug*; the daemon is, if anything, safer than its comment claims. + +### Finding 2 — future-dated timestamps defeat replay protection after eviction + +`parse_age` **clamps future-dated timestamps to age 0** (`server.rs:146`) with no +upper cap, so a request dated beyond `Retention` into the future keeps passing the +freshness gate *forever* while its id ages out of `seen`. It is then replayable +**every `Retention` units regardless of how large `Retention` is** — the +window-sizing relationship does not help here at all. The model exposes this: with +the past-dated guard dropped (`NoFutureReplay`), TLC returns a concrete trace — +e.g. a request dated `ts = 3` accepted at `clock = 0`, evicted at `clock = 5`, then +replayed and re-executed at `clock = 5` (still fresh: `5 − 3 = 2 ≤ Freshness`). + +How exploitable this is depends on the threat model (it needs the request onto the +channel — which SSH pinning guards — and for privileged commands the human still +gates the first run), but for auto-approved unprivileged commands it is a real +repeatable replay. The fix is cheap: also reject timestamps too far in the +*future* (a small clock-skew allowance, then reject rather than clamp). Logged as a +**separate backlog item** — not fixed in this (proofs-only) session. + +### Negative controls — that the model has teeth + +Apply by hand to the model / cfg, re-check, then revert. **Do not commit.** + +| # | Mutation | Violates | +|---|----------|----------| +| **NC1** | `Retention = Freshness − 1` (window relationship broken) | `PastReplayImpossible` | +| **Tight-bound** | run at `Retention = Freshness` (holds) vs `Freshness − 1` (fails) | confirms `Retention ≥ Freshness` is the tight threshold (finding 1) | +| **NC2** | freshness gate disabled (`FreshAt(ts) == TRUE`) | `AcceptWasFresh` **and** `PastReplayImpossible` (freshness is load-bearing) | +| **Finding 2** | add `NoFutureReplay` to `INVARIANTS` | future-dated counterexample (finding 2) — a real property of the code, not a model mutation | + +> Note NC2 is *disable freshness*, not *evict by the wrong field*: at +> `Retention ≥ Freshness`, evicting by the request timestamp instead of the +> insertion time is **also** safe for past-dated requests (it expires them up to +> `Freshness` earlier, but freshness has already lapsed), so it is not a valid +> negative control here — a small result the model also settles. + +### Faithfulness ledger + +**Modelled faithfully** + +- The gate order at acceptance: freshness → evict → dedup → insert, mirroring + `try_insert` (`evict_stale` runs first, then the membership check, then the + stamped insert). +- Eviction keyed on **insertion** time (`Live(rec) == ¬(clock − insertedAt > Retention)`). +- `parse_age`'s future-timestamp **clamp** (`FreshAt(ts) == IF ts > clock THEN TRUE …`). +- The attacker replays the **captured bytes** (same id, same timestamp); the honest + client submits the one genuine request. Both, plus the clock, are nondeterministic. + +**Abstracted (sound for the window property)** + +- The clock and the windows → small integers preserving `Retention = 2·Freshness`; + the property depends only on the *relationship*, exhaustively checked to `MaxTime`. +- One id (`Ids = {r1}`): the property is per-id; the two-distinct-ids interaction is + the approval-state-machine model's job. +- Atomic handling, no clock tick mid-request — faithful because `check_freshness` + runs at handler *entry*, before the TTY lock (`server.rs:499`), precisely so a + request cannot age past the gate while queued. + +**Out of scope** + +- Concurrent `try_insert` interleavings — the + [`ConcurrentHandlers`](#concurrent-handlers--dedup-toctou--tty-lock-serialisation) + sibling model. +- The approval/keypress gate (the approval-state-machine model). + +### Running TLC + +```sh +# (only if you edited the PlusCal) re-generate the translation, then commit both: +java -cp tla2tools.jar pcal.trans ReplayWindow.tla + +java -cp tla2tools.jar tlc2.TLC -nowarning \ + -config ReplayWindow.cfg -workers auto ReplayWindow.tla +``` + +Expected: `No error has been found` (442 distinct states, sub-second). + +--- + +## Concurrent handlers — dedup TOCTOU & TTY-lock serialisation + +[`ConcurrentHandlers.tla`](ConcurrentHandlers.tla) / +[`ConcurrentHandlers.cfg`](ConcurrentHandlers.cfg) — the second **Extended Rung 4** +model. The two models above each handle one request to completion *atomically*, so +by construction they cannot see the hazards that only exist when the daemon runs +many [`handle_connection`](../../src/server.rs) threads at once. This is the +canonical model-checking use case: a property over *interleavings*, not over one +sequential run. + +`server::run` spawns a thread per connection, all sharing two pieces of state via +`Arc>`: + +| shared state | guarded by | the hazard if mishandled | +|--------------|------------|--------------------------| +| `seen` (`SeenIds`) | `seen_ids` mutex; `try_insert` is one critical section | two threads race the same id, both pass dedup, both exec (double-exec TOCTOU) | +| `/dev/tty` | `tty_lock` (`Arc>`) | two threads drive the terminal at once → background-pgrp EIO (PR #22) | + +The model runs `Handlers = {h1, h2}` concurrent handlers, each picking its request +(`id` ∈ `Ids`, `privileged`, key) nondeterministically — so TLC explores every +interleaving, including both handlers racing the **same** id. PlusCal labels are +the atomicity boundaries: the mutex acquire/release and each critical section are +labelled steps, so TLC interleaves exactly where the real threads can. + +### The two properties + +| Invariant | Claim | Maps to | +|-----------|-------|---------| +| **`NoDoubleExec`** | No request id ever executes twice, under **any** interleaving — the atomic `try_insert` critical section is *sufficient* to serialise the same-id race. | leaf 1.1 · G2.1 (concurrent half of `ReplayImpossible`) | +| **`TtyMutualExclusion`** | At most one handler is in the interactive TTY region (prompt or foreground exec) at a time — `tty_lock` serialises /dev/tty across all interleavings. | Sn5.x · G6 | + +**How they are witnessed.** `NoDoubleExec` is a monitor flag (`vDoubleExec`), raised +at an exec site if the id already ran — the same idiom as the sibling models. +`TtyMutualExclusion` is a bounded witness counter (`ttyActive`, 0..|Handlers|), +incremented inside the lock on entering the interactive region and decremented on +leaving; the invariant asserts it never exceeds 1. Both keep every variable +bounded, so the state space is finite and small. + +**Faithful to the dispatch's lock discipline.** The privileged path takes +`tty_lock` for the prompt, **releases it before exec**, then `ForegroundGuard` +**re-takes** it for the foreground swap (two separate critical sections — +`server.rs:553` / `executor.rs`). `exec_direct` (unprivileged) takes **no** TTY +lock. The no-confirm banner uses a best-effort `try_lock` (modelled as a momentary, +non-overlapping hold). So `ttyActive` legitimately goes 1→0→1 across a privileged +request, and another handler may prompt in the released gap — never *simultaneously*. + +### Result — the atomic critical section is sufficient + +TLC checks both invariants exhaustively with **no error**: 5116 distinct states at +`Handlers = {h1, h2}` (sub-second), 238 590 at `{h1, h2, h3}`. Two handlers is the +minimal witness for the pairwise dedup TOCTOU and the TTY race; a third only adds +symmetric interleavings (the larger run is a one-line `.cfg` change, kept out of CI +for speed). The same-id race **is** exercised — both handlers can pick the same id, +one wins `try_insert` and execs, the other sees `dup` and is rejected — so the +property is non-vacuous, as NC1 below confirms by making it fail. + +### Negative controls — that the model has teeth + +Apply by hand to the PlusCal, re-run `pcal.trans`, re-check, then revert. **Do not +commit the mutations.** Each was verified to violate exactly the listed invariant. + +| # | Mutation (in the PlusCal) | Violates | +|---|---------------------------|----------| +| **NC1** | Split the atomic `TryInsert` into `CheckSeen` (contains, then **release** the lock) and `InsertAcq`/`InsertSeen` (re-acquire, insert) — re-introducing the TOCTOU `try_insert` closes. | `NoDoubleExec` | +| **NC2** | Drop the `tty_lock` acquire/release around `PrivPrompt` (prompt without the lock). | `TtyMutualExclusion` | + +NC1 is the load-bearing one: it shows the model genuinely *sees* the TOCTOU, so the +clean run's `NoDoubleExec` is a real verification of the atomic critical section, not +a vacuous pass. NC2 reproduces the PR #22 hazard the TTY lock exists to prevent. + +### Faithfulness ledger + +**Modelled faithfully** + +- Two pieces of `Arc>`-shared state (`seen`, `tty_lock`) and N handler + threads interleaving on them at mutex-acquire/release granularity. +- `try_insert` as **one** critical section under the `seen_ids` lock (the atomicity + that closes the TOCTOU — the whole point of the model). +- The dispatch's TTY-lock discipline: privileged prompt holds then **releases** + before exec, `ForegroundGuard` re-takes for the foreground swap, `exec_direct` + takes no lock, the no-confirm banner is best-effort `try_lock`. +- Both handlers may race the **same** id (the TOCTOU) or handle distinct ids. + +**Abstracted (sound for these two properties)** + +- The per-request gate chain (validate → freshness → env allowlist) → assumed + passed: it is per-handler and sequential, covered by the + [`ApprovalStateMachine`](#the-four-properties) model. The focus here is the + shared-state races. +- `SeenIds` eviction → never evict (the [`ReplayWindow`](#window-sizing--freshness--replay-retention) + model owns the TTL); irrelevant to a concurrency race within the model's horizon. +- `confirm_unprivileged` → a read-only init-free boolean (no `a` key), so its flip + semantics stay with the `ApprovalStateMachine` model; both dispatch paths are + still covered. +- Handler count bounded to 2 (3 also checked); the keypress set to `{y, other}`. + +**Out of scope** + +- The approval/keypress decision table and the flag-flip transition — the + [`ApprovalStateMachine`](#the-four-properties) model. +- The freshness ↔ retention window sizing — the + [`ReplayWindow`](#window-sizing--freshness--replay-retention) model. + +### Running TLC + +```sh +# (only if you edited the PlusCal) re-generate the translation, then commit both: +java -cp tla2tools.jar pcal.trans ConcurrentHandlers.tla + +java -cp tla2tools.jar tlc2.TLC -nowarning \ + -config ConcurrentHandlers.cfg -workers auto ConcurrentHandlers.tla +``` + +Expected: `No error has been found` (5116 distinct states, sub-second). diff --git a/proofs/tla/ReplayWindow.cfg b/proofs/tla/ReplayWindow.cfg new file mode 100644 index 0000000..8a6a2b5 --- /dev/null +++ b/proofs/tla/ReplayWindow.cfg @@ -0,0 +1,23 @@ +\* TLC model for the Extended Rung 4 freshness <-> replay window-sizing proof. +\* See README.md ("Window-sizing"). +\* Run: java -cp tla2tools.jar tlc2.TLC -nowarning \ +\* -config ReplayWindow.cfg -workers auto ReplayWindow.tla + +CONSTANTS + \* One id: the property is per-id; the two-distinct-ids interaction is + \* covered by the sibling ApprovalStateMachine model. A TLC model value. + Ids = {r1} + \* The window relationship is what matters, not the absolute seconds. + \* Shipped config: Retention = 2 * Freshness (real code: 120s = 2 * 60s). + Freshness = 2 + Retention = 4 + \* Clock horizon: must reach an insertion + Retention so eviction (and a + \* post-eviction replay) is exercised; 6 gives headroom over 2*Freshness. + MaxTime = 6 + +SPECIFICATION Spec + +INVARIANTS + TypeOK + PastReplayImpossible + AcceptWasFresh diff --git a/proofs/tla/ReplayWindow.tla b/proofs/tla/ReplayWindow.tla new file mode 100644 index 0000000..557194c --- /dev/null +++ b/proofs/tla/ReplayWindow.tla @@ -0,0 +1,273 @@ +---------------------------- MODULE ReplayWindow ---------------------------- +(***************************************************************************) +(* Extended Rung 4 of the sudo-proxy formalisation roadmap: the temporal *) +(* freshness <-> replay-retention WINDOW-SIZING property that the sibling *) +(* `ApprovalStateMachine` model deliberately hides. *) +(* *) +(* That model abstracts the clock to a boolean and models the replay set *) +(* (`seen`) as never evicting -- "conservative for ReplayImpossible". This *) +(* model restores the concrete (but small, abstract) clock and the real *) +(* TTL eviction, and proves the genuinely temporal claim the daemon relies *) +(* on: a captured request cannot evade replay-dedup by waiting for its id *) +(* to be evicted while STILL passing the freshness gate. *) +(* *) +(* The two real-code windows (src/server.rs): *) +(* MAX_REQUEST_AGE = 60s -- freshness gate (check_freshness) *) +(* REPLAY_RETENTION = 120s -- seen-id eviction TTL (SeenIds::evict_stale*) +(* are modelled by the small constants Freshness / Retention; only their *) +(* RELATIONSHIP matters, not the absolute seconds (see README). *) +(* *) +(* The PlusCal algorithm below is the source of truth; the TLC-checkable *) +(* translation between BEGIN/END TRANSLATION is generated by `pcal.trans` *) +(* and committed alongside. If you edit the PlusCal, re-run pcal.trans and *) +(* commit both. *) +(* *) +(* Faithfulness ledger, the window arithmetic, the negative-control recipe,*) +(* and the two findings this model surfaced live in proofs/tla/README.md. *) +(***************************************************************************) +EXTENDS Integers, FiniteSets + +CONSTANTS Ids, \* the bounded set of request ids; the cfg uses {r1} + Freshness, \* freshness window (abstract units); real code: 60s + Retention, \* replay-retention TTL; real code: 120s = 2*Freshness + MaxTime \* clock horizon; bounds the state space, keeps it finite + +(* --algorithm ReplayWindow + +variables + \* An abstract wall clock; advanced only by the Tick branch, bounded at + \* MaxTime so the reachable state space is finite. + clock = 0, + + \* Replay dedup (SeenIds), now with eviction MODELLED: each remembered id + \* carries its insertion time (and the request timestamp, used only by the + \* NC2 mutation). Faithful to the HashSet + VecDeque<(id, Instant)> pair. + seen = {}, + + \* The single captured request we track (Ids = {r1}: the per-id property; + \* the two-distinct-ids interaction is the sibling model's job). `origTs` is + \* the original timestamp -- the bytes an attacker captures and replays. + origExists = FALSE, + origId = CHOOSE i \in Ids : TRUE, + origTs = 0, + origAccAt = 0, \* the clock value when the original was first accepted + + \* ---- monitor (violation) flags ---- + \* A PAST-dated captured request re-executed (replay slipped dedup). The + \* window-sizing theorem says this is impossible once Retention >= Freshness. + vReplay = FALSE, + \* A FUTURE-dated captured request re-executed after eviction. This one IS + \* reachable -- the finding: parse_age clamps future timestamps to age 0 + \* (server.rs:146) with no upper cap, so a request dated beyond Retention + \* into the future is replayable regardless of how large Retention is. + vFutureReplay = FALSE, + + \* The request on the wire and the busy handshake with the daemon. + busy = FALSE, + inReq = [id |-> CHOOSE i \in Ids : TRUE, ts |-> 0]; + +define + \* Faithful to parse_age: a future-dated timestamp (ts > now) is clamped to + \* age 0 and so always passes; otherwise age = now - ts must be <= Freshness. + FreshAt(ts) == IF ts > clock THEN TRUE ELSE clock - ts <= Freshness + + \* evict_stale keyed on INSERTION time (the faithful rule). The NC2 mutation + \* changes `rec.insertedAt` to `rec.ts` -- evicting by the request timestamp, + \* the exact bug the REPLAY_RETENTION code comment warns against. + Live(rec) == ~(clock - rec.insertedAt > Retention) + Evicted == { rec \in seen : Live(rec) } +end define; + +\* The environment / attacker + the honest client, fully nondeterministic. +process Env = "env" +begin +EnvLoop: + while TRUE do + await ~busy; + either + \* Time advances (between handled requests; handling is atomic -- + \* faithful: check_freshness runs at handler ENTRY, server.rs:499). + await clock < MaxTime; + clock := clock + 1; + or + \* The one genuine request for this id. Its timestamp is free over + \* the whole horizon, so it covers both an honest fresh request and + \* a (clock-skewed / crafted) FUTURE-dated one. + await ~origExists; + with i \in Ids, t \in 0..MaxTime do + inReq := [id |-> i, ts |-> t]; + busy := TRUE; + end with; + or + \* The attacker replays the CAPTURED BYTES: same id, same timestamp. + \* (Minting a fresh timestamp under an existing nonce id is not + \* replay -- ids are random; that case is out of scope.) + await origExists; + inReq := [id |-> origId, ts |-> origTs]; + busy := TRUE; + end either; + end while; +end process; + +\* The daemon: freshness gate, then the eviction-aware dedup (try_insert), then +\* dispatch. One request handled to completion atomically. +process Daemon = "daemon" +begin +Handle: + while TRUE do + await busy; + if ~FreshAt(inReq.ts) then + skip; \* freshness gate -> reject; try_insert not called + elsif \E rec \in Evicted : rec.id = inReq.id then + \* try_insert: evict_stale ran, id still present -> duplicate, reject. + seen := Evicted; + else + \* Accept: evict, then insert stamped with the current clock. + seen := Evicted \union {[id |-> inReq.id, insertedAt |-> clock, ts |-> inReq.ts]}; + if origExists then + \* A replay slipped through dedup -- record which kind. + if origTs <= origAccAt then + vReplay := TRUE; \* PAST capture: the theorem forbids this + else + vFutureReplay := TRUE; \* FUTURE capture: the documented finding + end if; + else + origExists := TRUE; + origId := inReq.id; + origTs := inReq.ts; + origAccAt := clock; + end if; + end if; + busy := FALSE; + end while; +end process; + +end algorithm; *) +\* BEGIN TRANSLATION +VARIABLES clock, seen, origExists, origId, origTs, origAccAt, vReplay, + vFutureReplay, busy, inReq + +(* define statement *) +FreshAt(ts) == IF ts > clock THEN TRUE ELSE clock - ts <= Freshness + + + + +Live(rec) == ~(clock - rec.insertedAt > Retention) +Evicted == { rec \in seen : Live(rec) } + + +vars == << clock, seen, origExists, origId, origTs, origAccAt, vReplay, + vFutureReplay, busy, inReq >> + +ProcSet == {"env"} \cup {"daemon"} + +Init == (* Global variables *) + /\ clock = 0 + /\ seen = {} + /\ origExists = FALSE + /\ origId = (CHOOSE i \in Ids : TRUE) + /\ origTs = 0 + /\ origAccAt = 0 + /\ vReplay = FALSE + /\ vFutureReplay = FALSE + /\ busy = FALSE + /\ inReq = [id |-> CHOOSE i \in Ids : TRUE, ts |-> 0] + +Env == /\ ~busy + /\ \/ /\ clock < MaxTime + /\ clock' = clock + 1 + /\ UNCHANGED <> + \/ /\ ~origExists + /\ \E i \in Ids: + \E t \in 0..MaxTime: + /\ inReq' = [id |-> i, ts |-> t] + /\ busy' = TRUE + /\ clock' = clock + \/ /\ origExists + /\ inReq' = [id |-> origId, ts |-> origTs] + /\ busy' = TRUE + /\ clock' = clock + /\ UNCHANGED << seen, origExists, origId, origTs, origAccAt, vReplay, + vFutureReplay >> + +Daemon == /\ busy + /\ IF ~FreshAt(inReq.ts) + THEN /\ TRUE + /\ UNCHANGED << seen, origExists, origId, origTs, + origAccAt, vReplay, vFutureReplay >> + ELSE /\ IF \E rec \in Evicted : rec.id = inReq.id + THEN /\ seen' = Evicted + /\ UNCHANGED << origExists, origId, origTs, + origAccAt, vReplay, + vFutureReplay >> + ELSE /\ seen' = (Evicted \union {[id |-> inReq.id, insertedAt |-> clock, ts |-> inReq.ts]}) + /\ IF origExists + THEN /\ IF origTs <= origAccAt + THEN /\ vReplay' = TRUE + /\ UNCHANGED vFutureReplay + ELSE /\ vFutureReplay' = TRUE + /\ UNCHANGED vReplay + /\ UNCHANGED << origExists, origId, + origTs, origAccAt >> + ELSE /\ origExists' = TRUE + /\ origId' = inReq.id + /\ origTs' = inReq.ts + /\ origAccAt' = clock + /\ UNCHANGED << vReplay, + vFutureReplay >> + /\ busy' = FALSE + /\ UNCHANGED << clock, inReq >> + +Next == Env \/ Daemon + +Spec == Init /\ [][Next]_vars + +\* END TRANSLATION + +\* ===================== invariants & properties ===================== + +\* Modeling-hygiene type invariant. +TypeOK == + /\ clock \in 0..MaxTime + /\ seen \subseteq [id : Ids, insertedAt : 0..MaxTime, ts : 0..MaxTime] + /\ origExists \in BOOLEAN + /\ origId \in Ids + /\ origTs \in 0..MaxTime + /\ origAccAt \in 0..MaxTime + /\ vReplay \in BOOLEAN + /\ vFutureReplay \in BOOLEAN + /\ busy \in BOOLEAN + /\ inReq \in [id : Ids, ts : 0..MaxTime] + +\* THE WINDOW-SIZING THEOREM (holds for Retention >= Freshness; the shipped +\* config has Retention = 2*Freshness): no captured PAST-dated request is ever +\* re-executed. A captured replay cannot win the race between freshness expiry +\* and eviction -- by the time its id could be evicted, its fixed timestamp has +\* already aged past the freshness gate. +\* +\* Stated as a monitor (vReplay raised at the re-exec site, exactly like the +\* sibling model's ReplayImpossible) rather than as a `\E rec \in seen` set- +\* membership predicate -- ON PURPOSE. Eviction is LAZY (an id is pruned only +\* inside try_insert) and a replay-accept removes then re-inserts the id in one +\* atomic step, so a membership invariant can never observe the gap; only an +\* event monitor can. This is the same reason the sibling model uses monitors. +PastReplayImpossible == ~vReplay + +\* Sanity: the daemon only ever accepts a request that was fresh at acceptance +\* (either future-dated -- clamped to age 0 -- or within the freshness window). +AcceptWasFresh == + origExists => (origTs > origAccAt \/ origAccAt - origTs <= Freshness) + +\* ----- the future-clamp FINDING (NOT in the committed .cfg) ----- +\* This invariant FAILS by design: TLC returns a future-dated counterexample in +\* which a captured request dated beyond Retention into the future is re-executed +\* after its id is evicted. parse_age clamps future timestamps to age 0 +\* (server.rs:146) with no upper cap, so such a request keeps passing the +\* freshness gate forever while its id ages out of `seen` -- making it replayable +\* every Retention units regardless of how large Retention is. Add NoFutureReplay +\* to INVARIANTS by hand to reproduce the trace; see README "Finding: future- +\* dated replay". +NoFutureReplay == ~vFutureReplay + +=============================================================================