diff --git a/.github/workflows/proverif.yml b/.github/workflows/proverif.yml
new file mode 100644
index 0000000..c0ecf58
--- /dev/null
+++ b/.github/workflows/proverif.yml
@@ -0,0 +1,51 @@
+name: ProVerif
+
+on:
+  push:
+  pull_request:
+
+permissions:
+  contents: read
+
+# Cancel superseded runs on the same ref.
+concurrency:
+  group: proverif-${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  proverif:
+    name: proverif (Rung 4 SSH channel, informational)
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      # ProVerif brings its own toolchain (installed via opam / OCaml),
+      # independent of rust-toolchain.toml, so this does not affect the gating
+      # Rust jobs. m4 expands the single .m4.pv template into the two
+      # configurations.
+      - uses: ocaml/setup-ocaml@v3
+        with:
+          ocaml-compiler: '5'
+      # Refresh the apt index before opam pulls ProVerif's system deps (it
+      # depends on the GTK GUI libs). Without this the runner's mirror
+      # intermittently 404s on a superseded liblzma-dev version pin, which
+      # fails this otherwise non-gating job at install time (seen flapping on
+      # this branch -- same commit passing on one trigger, failing on another).
+      - name: Refresh apt package index
+        run: sudo apt-get update
+      - name: Install ProVerif
+        run: opam install -y proverif
+      # Informational only, like the kani / geiger jobs. Both configurations are
+      # run so the log shows both outcomes. The FIRST-CONTACT run is *expected*
+      # to report query (c) (payload secrecy) false with a key-substitution
+      # attack derivation -- that surfaced gap is the artifact's purpose
+      # (attack-tree leaf 4.2), so the job must not gate on ProVerif's exit code
+      # (continue-on-error + the `|| true` below).
+      - name: Check the SSH-channel model (pinned + first-contact)
+        continue-on-error: true
+        run: |
+          echo '================ PINNED (known_hosts populated) ================'
+          m4 -DPINNED proofs/proverif/ssh-channel.m4.pv > pinned.pv
+          opam exec -- proverif pinned.pv || true
+          echo '================ FIRST CONTACT (no pinning; MITM expected) ====='
+          m4 proofs/proverif/ssh-channel.m4.pv > unpinned.pv
+          opam exec -- proverif unpinned.pv || true
diff --git a/.github/workflows/tlc.yml b/.github/workflows/tlc.yml
new file mode 100644
index 0000000..794f7f0
--- /dev/null
+++ b/.github/workflows/tlc.yml
@@ -0,0 +1,59 @@
+name: TLC
+
+on:
+  push:
+  pull_request:
+
+permissions:
+  contents: read
+
+# Cancel superseded runs on the same ref.
+concurrency:
+  group: tlc-${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  tlc:
+    name: tlc (Rung 4 TLA+ models, informational)
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v6
+      - uses: actions/setup-java@v4
+        with:
+          distribution: temurin
+          java-version: '21'
+      # TLA+ tools manage their own toolchain (a single jar), independent of
+      # rust-toolchain.toml, so this does not affect the gating Rust jobs. The
+      # committed .tla already contains the pcal.trans translation, so no
+      # PlusCal re-translation step is needed.
+      - name: Fetch TLA+ tools
+        run: |
+          curl -sSL -o tla2tools.jar \
+            https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/tla2tools.jar
+      # Informational only, like the kani / geiger jobs: surfaces the Rung 4
+      # model-check in the job log without gating merge while the property set
+      # is still growing. Promote to a required check once stable.
+      - name: Model-check the approval state machine
+        continue-on-error: true
+        run: |
+          java -cp tla2tools.jar tlc2.TLC -nowarning \
+            -config proofs/tla/ApprovalStateMachine.cfg \
+            -workers auto \
+            proofs/tla/ApprovalStateMachine.tla
+      # The Extended Rung 4 freshness <-> replay-retention window-sizing model.
+      - name: Model-check the replay-window sizing
+        continue-on-error: true
+        run: |
+          java -cp tla2tools.jar tlc2.TLC -nowarning \
+            -config proofs/tla/ReplayWindow.cfg \
+            -workers auto \
+            proofs/tla/ReplayWindow.tla
+      # The Extended Rung 4 concurrent-handler interleaving model (shared `seen`
+      # set + TTY lock); checks the atomic try_insert closes the dedup TOCTOU.
+      - name: Model-check the concurrent handlers
+        continue-on-error: true
+        run: |
+          java -cp tla2tools.jar tlc2.TLC -nowarning \
+            -config proofs/tla/ConcurrentHandlers.cfg \
+            -workers auto \
+            proofs/tla/ConcurrentHandlers.tla
diff --git a/README.md b/README.md
index 88687f0..229ef7c 100644
--- a/README.md
+++ b/README.md
@@ -84,6 +84,7 @@ Each `execute` shows the exact command in the TUI; press `y` to approve,
 - [docs/usage.md](docs/usage.md) — CLI flags, non-privileged mode, SSH tunnels, agent forwarding
 - [docs/protocol.md](docs/protocol.md) — JSON-line wire protocol over the Unix socket
 - [docs/security.md](docs/security.md) — security model; [docs/security-audit.md](docs/security-audit.md) — point-in-time audit; [docs/threat-model.md](docs/threat-model.md) — STRIDE + attack tree; [docs/formalisation-roadmap.md](docs/formalisation-roadmap.md) — graduated-assurance plan; [docs/assurance-case.md](docs/assurance-case.md) — GSN argument
+- [proofs/](proofs/) — machine-checked models: [TLA+/PlusCal approval state machine](proofs/tla/) (TLC, Rung 4) and [ProVerif SSH-channel model](proofs/proverif/) (Rung 4); Kani bounded proofs live in [src/proofs.rs](src/proofs.rs) (Rung 3)
 - [docs/pkexec.md](docs/pkexec.md) — `--pkexec` mode and polkit auth caching (not recommended)
 - [docs/architecture.md](docs/architecture.md) — source layout and implementation status
 - [docs/comparison.md](docs/comparison.md) — how sudo-proxy relates to other tools
diff --git a/backlog.md b/backlog.md
index 5c40d5b..636dec4 100644
--- a/backlog.md
+++ b/backlog.md
@@ -6,14 +6,20 @@ the assurance ladder defined in
 [docs/formalisation-roadmap.md](docs/formalisation-roadmap.md); each names its
 rung.
 
-## Rung 4 — Protocol-level models
-
-TLA+/PlusCal or Alloy model of the approval + `confirm_unprivileged` policy state
-machine: replay impossible, no exec without approval, policy flag transitions
-only on an interactive keypress (pins down the F2 / attack-tree leaf 1.4/4.4
-residual). Tamarin/ProVerif model of the SSH path to make the channel
-assumptions explicit and surface the first-contact MITM gap (attack-tree leaf
-4.2 / assumption A4).
+## Cap far-future request timestamps (replay-protection gap)
+
+Surfaced by the Extended Rung 4 `ReplayWindow` TLA+ model. `parse_age`
+(`server.rs:146`) clamps a future-dated timestamp to age 0 with **no upper
+bound**, so a request dated beyond `REPLAY_RETENTION` into the future passes the
+freshness gate indefinitely while its id ages out of `SeenIds` — making it
+replayable every retention window *regardless of how large retention is*. The
+60s↔120s window relationship does not help here. Low severity (needs the request
+onto the channel — SSH pinning guards that — and privileged commands still gate
+the first run on a keypress), but for auto-approved unprivileged commands it is a
+real repeatable replay. Fix: in `check_freshness` / `parse_age`, reject
+timestamps more than a small clock-skew allowance into the future instead of
+clamping to age 0; extend the Rung 2 freshness property and the `ReplayWindow`
+model's `NoFutureReplay` invariant to cover it.
 
 ## Rung 5 — Deductive verification of the trusted core
 
diff --git a/changelog.md b/changelog.md
index 9462384..13dc57b 100644
--- a/changelog.md
+++ b/changelog.md
@@ -3,6 +3,261 @@
 Most recent at top. See [docs/formalisation-roadmap.md](docs/formalisation-roadmap.md)
 for the assurance-ladder context behind the security-formalisation entries.
 
+## 2026-06-11 — Extended Rung 4: SSH key-acceptance model, mutual auth (ProVerif)
+
+Reworked the Rung 4 ProVerif model (`proofs/proverif/ssh-channel.m4.pv`,
+**non-gating** in CI) from a channel-toggle that *declared* the tunnel private
+when pinned — assuming the MITM — to a **key-acceptance** encoding that
+**derives** it. The model now carries the real key material and the `PINNED`
+`m4` toggle flips only *which host key the daemon accepts*; the channel is the
+public Dolev-Yao network in both configurations. Strengthens assurance-case leaf
+Sn6.3 / G6.3 and makes assumption A4 genuinely load-bearing.
+
+- **Both-or-nothing design (mutual auth).** Modelling the host key alone while
+  ignoring client auth would misrepresent the system (it would paint the honest
+  remote as accepting forged commands even when pinned, which `authorized_keys`
+  prevents). So the model captures the real SSH mutual-authentication structure:
+  a **host keypair** (server→client: pinning + confidentiality, A4) and a
+  **client keypair** (client→server: `authorized_keys`, request integrity). The
+  crypto *primitives* stay a perfect black box — "under the hood" means the
+  authentication *structure*, not SSH's algorithms.
+
+- **The derived MITM.** Query **(c)** `not attacker(agentSecret)` is `true` when
+  pinned and **`false`** on first contact, with ProVerif's derivation being
+  exactly the key substitution: the attacker mints its own host key `pk(k)`, the
+  daemon accepts it and encrypts the payload to it, the attacker decrypts with
+  `k`. This is leaf 4.2 *derived* rather than assumed.
+
+- **The disentanglement (the payoff).** The four queries attribute each guarantee
+  to its assumption: **(a)** command authenticity (`RemoteAccept ⟹ RequestSent`)
+  is `true` in **both** configs — it rides on **client auth**, so a first-contact
+  eavesdropper still cannot forge or alter a command; **(c)** confidentiality
+  rides on **host-key pinning (A4)**; **(d)** separation
+  (`RemoteExec ⟹ Keypress`) is `true` in both — the private TTY (A1) is out of
+  the attacker's reach. The earlier model's "(a) holds iff pinned" was an
+  artifact of the assumed-private channel; the key-level model corrects it.
+
+- **Injective replay (b), now honest.** `false` in **both** configs with a real
+  public-channel replay trace — the earlier "cannot be proved when pinned"
+  private-channel over-approximation is **gone**. App-layer injective
+  replay-resistance stays discharged by the TLA+ `ReplayImpossible` invariant and
+  the Kani freshness-monotonicity proof.
+
+- **Teeth — three documented negative controls** (apply by hand, never
+  committed), each run and producing the expected result: **NC1** drops the
+  `checksign` client-auth check → **(a)** turns `false` even pinned (client auth
+  is load-bearing; confidentiality stays `true`, showing orthogonality); **NC2**
+  is the first-contact run itself → **(c)** `false`; **NC3** makes `ttyAck`
+  public → **(d)** no longer provable (the private TTY is load-bearing).
+
+- **Docs:** `proofs/proverif/README.md` rewritten (key-acceptance toggle, the
+  query table + "rides on" column, the disentanglement narrative, the three
+  controls, scope with the now-explicit client-auth assumption); `proverif.yml`
+  comment names **(c)** as the expected-`false` first-contact query;
+  `docs/threat-model.md` (leaf 4.2), `docs/assurance-case.md` (A4 + Sn6.3), and
+  `docs/formalisation-roadmap.md` updated for derived-not-assumed + the
+  per-assumption attribution. No Rust touched (`proofs/` is outside `src/`); the
+  three TLA+ sibling models are untouched.
+
+- **Verified:** ProVerif 2.05, both `m4` configs — pinned: (a) true, (b) false,
+  (c) true, (d) true; first contact: (a) true, (b) false, (c) **false** with the
+  key-substitution derivation, (d) true. All three negative controls fail as
+  required.
+
+## 2026-06-11 — Extended Rung 4: concurrent-handler interleaving (TLA+)
+
+Added a third TLA+/PlusCal model, `proofs/tla/ConcurrentHandlers.tla` (checked by
+TLC, **non-gating** in CI like its two siblings), discharging the *canonical*
+model-checking use case the two atomic, one-request-at-a-time models cannot
+express by construction: **concurrent** daemon handler threads interleaving on
+the two pieces of `Arc<Mutex<…>>`-shared state. It *verifies the atomicity fix is
+sufficient* — strengthening the concurrent half of assurance-case leaf 1.1 / G2.1.
+
+- **The two theorems.**
+  - **`NoDoubleExec`** — no request id ever executes twice under **any**
+    interleaving. This is the closure of the dedup TOCTOU: `SeenIds::try_insert`
+    (`server.rs:213`) folds the contains-check and the insert into one critical
+    section under the `seen_ids` mutex; the model proves that single critical
+    section is *sufficient* to serialise two threads racing the same id. Witnessed
+    by a monitor flag (`vDoubleExec`) raised at an exec site if the id already ran.
+  - **`TtyMutualExclusion`** — at most one handler is in the interactive TTY
+    region (prompt or foreground exec) at a time, so `tty_lock` serialises
+    /dev/tty across all interleavings (the PR #22 background-pgrp / EIO hazard).
+    Witnessed by a bounded counter (`ttyActive ≤ 1`).
+
+- **Faithful to the dispatch's lock discipline.** PlusCal labels are the
+  atomicity boundaries, so TLC interleaves exactly where the threads can. The
+  privileged path holds `tty_lock` for the prompt, **releases before exec**, then
+  `ForegroundGuard` **re-takes** it for the foreground swap (two critical
+  sections); `exec_direct` takes no lock; the no-confirm banner is best-effort
+  `try_lock`. So another handler may legitimately prompt in the released gap —
+  never *simultaneously*.
+
+- **Result.** TLC reports no error: **5116 distinct states** at `Handlers =
+  {h1, h2}` (sub-second), and **238 590** at `{h1, h2, h3}`. Two handlers is the
+  minimal witness for the pairwise dedup TOCTOU and the TTY race; a third only
+  adds symmetric interleavings (a one-line `.cfg` change, kept out of CI for
+  speed). The same-id race is genuinely exercised (one handler wins `try_insert`
+  and execs, the other is rejected as a duplicate), so the pass is non-vacuous.
+
+- **Teeth — two documented negative controls** (apply by hand, never committed):
+  **NC1** splits the atomic `TryInsert` into check-release-insert, re-introducing
+  the TOCTOU → `NoDoubleExec` violated (the load-bearing control: it proves the
+  model *sees* the race the atomicity closes); **NC2** drops the `tty_lock` around
+  the privileged prompt → `TtyMutualExclusion` violated (reproduces the PR #22
+  hazard). Both were run and produce the expected counterexample.
+
+- **Scope** (per the README faithfulness ledger): the shared `seen`/`tty_lock`
+  races and `try_insert`'s atomicity are modelled faithfully; the per-request gate
+  chain is assumed-passed (sequential, the `ApprovalStateMachine` model's job),
+  eviction is never-evict (the `ReplayWindow` model's job), and
+  `confirm_unprivileged` is read-only here (no `a` key). Handler count bounded to
+  2 (3 also checked).
+
+- **CI / docs:** `.github/workflows/tlc.yml` gains a third `continue-on-error`
+  TLC step; `proofs/tla/README.md` adds the model's section and updates the two
+  siblings' "out of scope" notes to point at it; the roadmap's Rung 4 scope note
+  is corrected (concurrency/TOCTOU is no longer "stays elsewhere"). No Rust
+  touched (`proofs/` is outside `src/`); the two sibling models re-check unchanged.
+
+## 2026-06-11 — Extended Rung 4: freshness ↔ replay window-sizing (TLA+)
+
+Added a second TLA+/PlusCal model, `proofs/tla/ReplayWindow.tla` (checked by
+TLC, **non-gating** in CI like its sibling), proving the temporal property the
+approval-state-machine model deliberately hides: it models `seen` as
+*never-evicting* ("conservative for `ReplayImpossible`"), which makes the
+headline replay claim conditional. The new model restores a small abstract clock
+and the **real TTL eviction** and discharges the genuinely temporal invariant —
+tightening `ReplayImpossible` from conditional to real and strengthening
+assurance-case leaf 1.1 / G2.1.
+
+- **The theorem — `PastReplayImpossible`.** No captured *past-dated* request is
+  ever re-executed. The window arithmetic: an id is stamped at acceptance
+  (`insertedAt ≥ firstTs`) and evicted only at `clock − insertedAt > Retention`;
+  a replay re-uses the fixed `firstTs` and stays fresh only while
+  `clock − firstTs ≤ Freshness`. With `insertedAt ≥ firstTs` and
+  `Retention ≥ Freshness`, eviction and freshness can never co-occur. TLC: no
+  error, 442 distinct states, sub-second.
+  - Constants abstract the two real windows (`MAX_REQUEST_AGE`=60s,
+    `REPLAY_RETENTION`=120s) to `Freshness=2`, `Retention=4` (=2F), `MaxTime=6`,
+    `Ids={r1}` — only the *relationship* matters. Bounded clock ⇒ finite, small
+    state space.
+  - **Why a monitor, not a `seen`-membership invariant:** eviction is *lazy*
+    (pruned only inside `try_insert`) and a replay-accept removes-then-reinserts
+    the id atomically, so a membership predicate can never observe the gap. Only
+    an event monitor (`vReplay`) is a robust witness — the same idiom the sibling
+    model uses.
+
+- **Finding 1 — the tight bound is `Retention ≥ Freshness`, not `2×`.** TLC
+  confirms the theorem holds at `Retention = Freshness` and fails at
+  `Freshness − 1`. The shipped `120s = 2×60s` is **conservative margin, not a
+  necessity** (max acceptance lag only pushes eviction later) — *not a bug*, and
+  the code is safer than its comment claims.
+
+- **Finding 2 — future-dated timestamps defeat replay protection after
+  eviction (new residual).** `parse_age` clamps future timestamps to age 0
+  (`server.rs:146`) with no upper cap, so a request dated beyond `Retention` into
+  the future stays fresh forever while its id ages out — replayable every
+  `Retention` units regardless of size. The model exposes a concrete trace via
+  the `NoFutureReplay` invariant (e.g. `ts=3` accepted at `clock=0`, evicted at
+  `clock=5`, re-executed). Logged as a backlog item (cap far-future timestamps);
+  not fixed in this proofs-only session.
+
+- **Teeth:** documented negative controls — `Retention < Freshness` (window
+  relationship broken → `PastReplayImpossible` fails) and freshness-gate-disabled
+  (→ `AcceptWasFresh` + `PastReplayImpossible` fail, showing freshness is
+  load-bearing). The originally-planned "evict by the wrong field" control turned
+  out *safe* at `Retention ≥ Freshness` — a small result the model also settles,
+  recorded in the README.
+
+- **Scope (per the README faithfulness ledger):** the gate order at acceptance
+  (freshness → evict → dedup → insert), eviction keyed on insertion time, and the
+  `parse_age` future-clamp are modelled faithfully; the clock/windows are
+  abstracted to small relationship-preserving integers; one id (per-id property),
+  atomic handling (justified by the entry-time freshness check, `server.rs:499`),
+  and concurrency/keypress gating are left to the sibling model.
+
+- **CI:** `.github/workflows/tlc.yml` gains a second `continue-on-error` TLC step
+  for `ReplayWindow`; added `proofs/tla/.gitignore` for TLC/`pcal.trans`
+  artifacts. The sibling `ApprovalStateMachine` re-checks unchanged (9072 states);
+  no Rust touched (`proofs/` is outside `src/`).
+
+## 2026-06-11 — Rung 4: protocol-level formal models (TLA+ + ProVerif)
+
+> **Note (superseded in part):** the ProVerif description below reflects the
+> original *channel-toggle* model (private tunnel when pinned, with the
+> injective-replay over-approximation). It was reworked the same day into the
+> *key-acceptance* mutual-auth model — see the **"Extended Rung 4: SSH
+> key-acceptance model, mutual auth (ProVerif)"** entry at the top of this file.
+> The TLA+ half of this entry stands unchanged.
+
+Discharged **Rung 4** of the assurance ladder — the temporal/relational
+properties that are not per-function — with two machine-checked models under a
+new top-level `proofs/` dir (kept out of `src/` so cargo/clippy never see them).
+Both run **non-gating** in CI, mirroring the Rung 3 Kani job. Strengthens the
+assurance-case leaves Sn5.1 / Sn5.3 (state machine) and Sn6.3 (SSH) from
+`[partial] → planned` to machine-checked / formally-characterised evidence.
+
+- **Approval state machine — TLA+/PlusCal** (`proofs/tla/`, checked by TLC). The
+  PlusCal transcribes the `handle_connection` gate chain and `tui::classify_key`;
+  a nondeterministic environment process quantifies over all attacker
+  field-forgeries/replays *and* all operator keypresses. Four safety properties
+  hold: `NoExecWithoutApproval`, `ReplayImpossible`, `PolicyFlipsOnlyOnKeypress`
+  (+ a `FlagMonotone` action property), `PrivilegedGateIndependentOfPolicy` —
+  the machine-checked closure of the 1.4/4.4 policy-transition residual (F2).
+  - Properties are tracked by bounded **monitor variables** (a violation flag
+    raised at each bad-event site), so the reachable state space is finite and
+    small (~9k states, sub-second) with *no* history-length constraint — the
+    first cut with a full event log blew up past 180M states. No `MaxLog` hack.
+  - **Teeth:** four documented negative-control mutations (exec on timeout; flag
+    wired into the privileged gate; dedup gate removed; flag flipped on a
+    non-`a` keypress) each produce the expected TLC counterexample.
+  - **Decision: TLA+/PlusCal over Alloy** — the properties are temporal/safety
+    over an evolving state machine, TLC's home turf; Alloy's relational style
+    fits less well. Recorded in `docs/formalisation-roadmap.md`.
+
+- **SSH path — ProVerif** (`proofs/proverif/`). A symbolic Dolev-Yao model with a
+  compile-time `PINNED` toggle (via `m4`): the SSH tunnel is a private channel
+  when the host key is pinned, public when not. It proves channel authenticity
+  `(a)` and payload/agent secrecy `(c)` hold **iff** the key is pinned, exhibits
+  a concrete **first-contact MITM** trace when it is not (attack-tree leaf 4.2),
+  and proves a **separation theorem** `(d)` — a privileged exec at the honest
+  remote requires a human keypress even with the channel fully compromised. This
+  makes assumption A4 load-bearing and explicit.
+  - **Honest note on injective replay `(b)`:** ProVerif reports it "cannot be
+    proved" on the *private* (pinned) channel — a known over-approximation of
+    its non-linear private-channel semantics, with **no** attack trace; on the
+    public channel it is genuinely `false` with a replay trace (the surfaced
+    gap). App-layer injectivity is discharged instead by the TLA+
+    `ReplayImpossible` invariant and the Kani freshness-monotonicity proof — a
+    coherent cross-rung split, and faithful to the threat model's "replay
+    protection rides on A4".
+  - **Decision: ProVerif over Tamarin** — one binary toggle + an attack
+    derivation, expressed with least ceremony and proved fully automatically;
+    Tamarin's unbounded-state/inductive strengths aren't needed. Recorded in the
+    roadmap.
+
+- **Scope, stated plainly** (per-model README "faithfulness ledger" / scope
+  notes): the TLA+ model abstracts the clock/env/decode to booleans, never-evicts
+  the replay set (conservative for replay), and leaves concurrency/TOCTOU and
+  per-field content scanning to their other rungs; the ProVerif model treats SSH
+  crypto as a perfect black box and abstracts time to nonces + ordering. The two
+  *accepted* residuals (by-design unprivileged auto-approve; A4 first-contact
+  dependency) are now formally **characterised**, not closed.
+
+- **CI:** `.github/workflows/tlc.yml` (fetches `tla2tools.jar`, runs TLC) and
+  `.github/workflows/proverif.yml` (installs ProVerif via opam, runs both `m4`
+  configurations) — both `continue-on-error`, like the Kani/geiger jobs. The
+  unpinned ProVerif run reporting `(a)/(c)` false is the *expected, desired*
+  output, so the job is not gated on ProVerif's exit code.
+
+- **Verified:** TLC reports no error (9072 distinct states) and each negative
+  control fails as required; ProVerif gives the pinned-vs-first-contact split
+  above. Core + `--features mcp` release build and
+  `cargo clippy --all-targets --all-features -- -D warnings` remain clean (no
+  Rust touched; `proofs/` is outside `src/` and not a workspace member, so cargo
+  ignores it).
+
 ## 2026-06-10 — Rung 3: Kani bounded proofs + validation-boundary typestate
 
 Discharged **Rung 3** of the assurance ladder. The Kani half proves the
diff --git a/docs/assurance-case.md b/docs/assurance-case.md
index 8207ed7..0d6c00d 100644
--- a/docs/assurance-case.md
+++ b/docs/assurance-case.md
@@ -65,7 +65,7 @@ asserted, evidence not yet produced).
 | **A1** | Assumption | `/dev/tty` is trusted: what the daemon writes is what the human sees, and the keypress read is the human's. |
 | **A2** | Assumption | `sudo` (and `pkexec`) correctly enforce escalation and `env_reset`; the kernel enforces socket permissions and `SO_PEERCRED`. |
 | **A3** | Assumption | The host is effectively single-user *or* the operator accepts that any same-UID process is already inside the trust boundary for non-privileged actions (finding F2). |
-| **A4** | Assumption | For the SSH path, remote hosts are in `known_hosts` before first use (or `StrictHostKeyChecking=accept-new` is configured); the SSH channel provides confidentiality and integrity. |
+| **A4** | Assumption | For the SSH path, remote hosts are in `known_hosts` before first use (or `StrictHostKeyChecking=accept-new` is configured) — this gives payload confidentiality and lets the daemon reach the genuine remote; **and** the daemon authenticates to the remote by key (the remote's `authorized_keys`), which gives command authenticity at the remote. The ProVerif model ([`proofs/proverif/`](../proofs/proverif/)) makes both halves explicit and shows which guarantee rests on which. |
 | **J1** | Justification | The whole security model reduces to G1 (see [security-audit.md](security-audit.md#summary)); structuring the argument around the single invariant keeps every control traceable to it. |
 | **S1** | Strategy | Argue over the causal path from an inbound request to root execution: a request must be *authentic* (G2), what executes must be *the thing that was validated* (G3), the human must *see the real command* (G4), approval must be *necessary and binding* (G5), and no adversary may *bypass* the gate (G6). |
 
@@ -121,11 +121,11 @@ keypress binds to the command shown.*
 | Node | Claim / Evidence | Rung | Status |
 |------|------------------|------|--------|
 | **G5.1** | `privileged:true` always reaches the Y/N gate regardless of any policy. | — | |
-| **Sn5.1** | Verified in audit: only an interactive keypress sets policy; no request field, replay, or MCP tool flips it. `src/server.rs`. | 2 | [partial] → TLA+/Alloy state-machine model (Rung 4) |
+| **Sn5.1** | TLC-checked: the `PolicyFlipsOnlyOnKeypress` invariant of [`proofs/tla/`](../proofs/tla/) proves the policy flag flips only via an interactive `a` keypress on an unprivileged request — never a request field, replay, MCP flag, or timeout — over all attacker forgeries/replays and operator choices. `src/server.rs`. | 4 | [discharged] (model-checked) |
 | **G5.2** | The prompt reads a single keypress in non-canonical mode and times out after 60 s (default **deny**). | — | |
 | **Sn5.2** | `src/tui.rs` prompt; timeout test. | 1 | [discharged] |
 | **G5.3** | The `confirm_unprivileged=false` policy relaxes only the **non-privileged** gate, never the privileged one, and only via an interactive `a` keypress. | — | |
-| **Sn5.3** | Audit finding **F2**: by-design trade-off, residual risk documented; `display_banner` reliability improvement recommended. | 0 | [partial] → model-check transition (Rung 4) |
+| **Sn5.3** | TLC-checked: `NoExecWithoutApproval` + `PrivilegedGateIndependentOfPolicy` ([`proofs/tla/`](../proofs/tla/)) prove the privileged gate requires a `y` keypress for *any* value the policy flag took, so `confirm_unprivileged` relaxes only the non-privileged gate. Audit finding **F2** by-design trade-off still documented; `display_banner` reliability is a separate backlog item. | 4 | [discharged] (model-checked) |
 
 ### G6 — Adversary cannot bypass the gate
 
@@ -138,7 +138,7 @@ keypress binds to the command shown.*
 | **G6.2** | **A2** (same-UID direct socket): daemon-side validation does not depend on client-side `validate_host`; display fields validated at the daemon. | — | |
 | **Sn6.2** | F1 fix moves sanitization into `validate_request` (daemon side); `SO_PEERCRED` + 0600. | 2 | [discharged] |
 | **G6.3** | **A3** (network / remote): `validate_host` allowlist `[A-Za-z0-9._@:-]`, rejects leading `-` (ssh option injection); host is a trailing positional; remote UID digit-only + length-capped; ssh via argv. | — | |
-| **Sn6.3** | `src/server.rs`, `src/bin/sudo-proxy.rs`, `src/hosts.rs`. **Open:** no `StrictHostKeyChecking` → first-contact MITM depends on user config (A4). | 1 | [partial] → Tamarin/ProVerif (Rung 4) |
+| **Sn6.3** | `src/server.rs`, `src/bin/sudo-proxy.rs`, `src/hosts.rs`. ProVerif model ([`proofs/proverif/`](../proofs/proverif/)) makes **A4 explicit** by modelling the real host-key + client-key material: it **derives** the first-contact MITM from host-key substitution (payload secrecy `false` unpinned — leaf 4.2), and disentangles confidentiality (rides on host-key pinning) from command authenticity (rides on client auth, so it holds even on first contact); the separation theorem shows a channel compromise does not bypass the local keypress gate. Residual A4 (no `StrictHostKeyChecking`) now formally characterised, not closed. | 4 | [discharged] (residual A4 made explicit) |
 | **G6.4** | Resource exhaustion cannot force-open the gate: 1 MiB request cap, 64 in-flight, 16 MiB output cap. | — | |
 | **Sn6.4** | `src/server.rs`, `src/executor.rs`; cap test (flaky **S2**, control sound). | 1–2 | [partial] |
 
@@ -147,13 +147,15 @@ keypress binds to the command shown.*
 These are the leaves where the argument is currently weakest, drawn from
 [security-audit.md](security-audit.md) and the rungs not yet climbed:
 
-- **Sn2.2 / Sn5.1 / Sn5.3** — the freshness and approval-state-machine claims
-  rest on fuzzing + manual reasoning; the intended strengthening is **Kani**
-  (Rung 3) for `parse_age` and a **TLA+/Alloy** model (Rung 4) of the
-  approval + policy transitions.
-- **Sn6.3** — the A3 channel guarantees rest on assumption **A4**; a
-  **Tamarin/ProVerif** model (Rung 4) would make the dependency explicit and
-  surface the first-contact MITM gap.
+- **Sn2.2** — the freshness arithmetic is discharged by **Kani** (Rung 3);
+  `Sn5.1 / Sn5.3` (the approval + policy transitions) are now discharged by the
+  **TLC-checked** state machine ([`proofs/tla/`](../proofs/tla/), Rung 4).
+- **Sn6.3** — the A3 channel guarantees rest on assumption **A4**; the
+  **ProVerif** model ([`proofs/proverif/`](../proofs/proverif/), Rung 4) now
+  makes that dependency explicit by *deriving* the first-contact MITM from
+  host-key substitution (a *characterised residual*, not a closed one — see leaf
+  4.2), and attributes confidentiality to host-key pinning vs command
+  authenticity to client auth.
 - **Sn6.4 / S2** — stabilise the flaky resource-cap test so the cap stays
   covered in CI.
 - **Sn4.4 / Sn5.3** — accepted residual risks (look-alike `reason`, the
diff --git a/docs/formalisation-roadmap.md b/docs/formalisation-roadmap.md
index 52d4c56..7562e94 100644
--- a/docs/formalisation-roadmap.md
+++ b/docs/formalisation-roadmap.md
@@ -219,6 +219,63 @@ The most interesting properties are temporal and relational, not per-function:
   will surface the first-contact MITM gap the audit noted (the ssh invocation
   sets no `StrictHostKeyChecking`).
 
+**Status: done.** Both halves are built and run non-gating in CI
+(`.github/workflows/tlc.yml`, `proverif.yml`), mirroring the Rung 3 Kani job.
+
+- **Approval state machine — TLA+/PlusCal** ([`proofs/tla/`](../proofs/tla/),
+  checked by TLC). The PlusCal transcribes the `handle_connection` gate chain and
+  `tui::classify_key`; a nondeterministic environment process quantifies over all
+  attacker field-forgeries / replays *and* all operator keypresses. Four safety
+  properties hold: `NoExecWithoutApproval`, `ReplayImpossible`,
+  `PolicyFlipsOnlyOnKeypress` (+ a `FlagMonotone` action property), and
+  `PrivilegedGateIndependentOfPolicy` — discharging the 1.4/4.4 policy-transition
+  residual. Properties are tracked by bounded *monitor variables* so the state
+  space is finite without a history bound; four documented negative-control
+  mutations each produce the expected counterexample (the model has teeth). Full
+  faithfulness ledger in [`proofs/tla/README.md`](../proofs/tla/README.md).
+  - **Decision: TLA+/PlusCal over Alloy.** The properties are temporal/safety
+    over an evolving state machine, which TLC checks directly; Alloy's relational
+    style fits structural questions less well here.
+
+- **SSH path — ProVerif** ([`proofs/proverif/`](../proofs/proverif/)). A symbolic
+  Dolev-Yao model with a compile-time `PINNED` toggle (via `m4`). Rather than
+  *declaring* the tunnel private when pinned (which would assume the conclusion),
+  it models the real **key material** — a host keypair and a client keypair, the
+  mutual authentication of the SSH use case — and the toggle flips only **which
+  host key the daemon accepts** (the known-good one, or whatever the network
+  offers on first contact). ProVerif then **derives** the first-contact MITM from
+  the attacker substituting its own host key (payload secrecy `false` unpinned,
+  with a key-substitution trace — leaf 4.2). The model **disentangles** which
+  assumption protects what: payload **confidentiality** rides on host-key pinning
+  (A4), while **command authenticity** at the remote rides on **client auth**
+  (the remote's `authorized_keys`) and so holds even on first contact — correcting
+  the earlier channel-toggle model's conflation. It also proves a **separation
+  theorem** — a privileged exec at the honest remote requires a human keypress
+  even with the channel fully compromised — making assumption A4 load-bearing and
+  explicit. Scope, the per-assumption attribution, the three negative controls,
+  and the now-honest injective-replay result (a genuine public-channel replay, no
+  longer a private-channel over-approximation; app-layer dedup discharged by the
+  TLA+ `ReplayImpossible` + Kani freshness proofs) are in
+  [`proofs/proverif/README.md`](../proofs/proverif/README.md).
+  - **Decision: ProVerif over Tamarin.** The model's job is one binary toggle
+    plus producing an attack derivation, which ProVerif's applied-pi calculus
+    expresses with least ceremony and proves fully automatically; Tamarin's
+    unbounded-state / inductive strengths aren't needed.
+
+- **Scope / residuals carried past Rung 4:** the `ApprovalStateMachine` model
+  abstracts the clock, env contents and decode to booleans and never-evicts the
+  replay set (conservative for replay); per-field content scanning stays with
+  Rung 3. The two **Extended Rung 4** TLA+ models lift the abstractions that
+  conservatism hides: [`ReplayWindow`](../proofs/tla/README.md#window-sizing--freshness--replay-retention)
+  restores a real clock + TTL eviction and proves the freshness ↔ retention
+  window relationship, and [`ConcurrentHandlers`](../proofs/tla/README.md#concurrent-handlers--dedup-toctou--tty-lock-serialisation)
+  runs concurrent handler threads to prove the atomic `try_insert` closes the
+  dedup TOCTOU and `tty_lock` serialises /dev/tty under all interleavings. The
+  ProVerif model treats SSH crypto as a perfect black box and abstracts time to
+  nonces + ordering. The two *accepted* residuals (the by-design unprivileged
+  auto-approve, and the A4 first-contact dependency) are now formally
+  **characterised**, not closed.
+
 ### Rung 5 — Deductive verification of the core (EAL5–7 territory)
 
 For the few functions that *are* the invariant, prove functional correctness
@@ -263,11 +320,12 @@ prefix-matching escapes documented in the
    the existing `security.md` / `security-audit.md` prose into the formal
    model / threats / requirements / constraints artifact, with no code change. A
    full CC-structured **Security Target** document is the remaining paper step.
-2. CI-ify Rung 1 and turn the existing fuzz tests into **named properties**
-   (Rung 2).
-3. Add **Kani** on the parsers and a **TLA+/Alloy** model of the approval +
-   policy state machine (Rungs 3–4) — the highest assurance-per-effort steps,
-   and they directly target findings F1/F2 and the replay invariant.
+2. **Done** — CI-ified Rung 1 and turned the existing fuzz tests into **named
+   properties** (Rung 2).
+3. **Done** — **Kani** on the parsers (Rung 3) and a **TLA+/PlusCal** model of
+   the approval + policy state machine plus a **ProVerif** model of the SSH path
+   (Rung 4) — the highest assurance-per-effort steps, directly targeting findings
+   F1/F2, the replay invariant, and the A4 first-contact gap.
 4. Treat **Creusot / Verus** (Rung 5) and **seL4-style refinement** (Rung 6) as
    a stated long-horizon goal in the assurance case, with the trust assumptions
    written down now.
@@ -280,7 +338,7 @@ prefix-matching escapes documented in the
 | 1 | Review + static analysis | clippy, cargo-audit, cargo-deny, cargo-geiger, Semgrep/MIRAI | display sanitization, `unsafe` surface | EAL1–3 |
 | 2 | Property-based testing | proptest, quickcheck | `shell_escape`, `parse_age`, display fields, policy flag | EAL3–4 |
 | 3 | Automated formal | Kani, Flux | parsers (no panic/overflow), validation boundary | EAL4–5 |
-| 4 | Protocol verification | TLA+/PlusCal, Alloy, Tamarin/ProVerif | approval state machine, replay, SSH path | EAL5–6 |
+| 4 ✓ | Protocol verification | TLA+/PlusCal (TLC), ProVerif ([`proofs/`](../proofs/)) | approval state machine, replay, SSH path | EAL5–6 |
 | 5 | Deductive verification | Creusot, Verus, Prusti | `validate_request`, dispatch, env allowlist | EAL6–7 |
 | 6 | Machine-checked refinement | seL4-style (Isabelle/Coq, Iris) | daemon logic refines abstract approval spec | EAL7 |
 
diff --git a/docs/threat-model.md b/docs/threat-model.md
index a074822..d689b1c 100644
--- a/docs/threat-model.md
+++ b/docs/threat-model.md
@@ -169,7 +169,18 @@ is unaccounted for. The residuals — already on record and accepted — are:
   yet implemented" in [architecture.md](architecture.md)); the `-v` server log
   is the only record.
 
-These residuals are the natural inputs to the higher
-[roadmap rungs](formalisation-roadmap.md#the-rigor-ladder): the policy-transition
-residual (1.4/4.4) is what a TLA+/Alloy model (Rung 4) would pin down, and the
-SSH residual (4.2) is what a Tamarin/ProVerif model (Rung 4) would make explicit.
+These residuals were the inputs to **Rung 4**, now discharged
+([roadmap](formalisation-roadmap.md#the-rigor-ladder)): the policy-transition
+residual (1.4/4.4) is pinned by the TLC-checked state machine in
+[`proofs/tla/`](../proofs/tla/) (`PolicyFlipsOnlyOnKeypress`,
+`PrivilegedGateIndependentOfPolicy`), and the SSH residual (4.2) is made explicit
+by the ProVerif model in [`proofs/proverif/`](../proofs/proverif/), which models
+the real host-key and client-key material and **derives** the first-contact MITM
+from the attacker substituting its own host key. The model disentangles which
+assumption protects what: payload **confidentiality** rides on host-key pinning
+(A4) — broken on first contact, the derived MITM trace — while **command
+authenticity** at the remote rides on **client authentication** (the remote's
+`authorized_keys`), so even a first-contact eavesdropper cannot forge or alter a
+command the honest remote will run. The 1.4/4.4 *by-design* trade-off
+(unprivileged auto-approve) and the 4.2 A4 dependency remain accepted residuals —
+now formally characterised rather than only argued in prose.
diff --git a/proofs/proverif/README.md b/proofs/proverif/README.md
new file mode 100644
index 0000000..740cb1a
--- /dev/null
+++ b/proofs/proverif/README.md
@@ -0,0 +1,172 @@
+# Rung 4 — ProVerif SSH-channel model
+
+[← formalisation roadmap](../../docs/formalisation-roadmap.md) · [threat model](../../docs/threat-model.md) · [assurance case](../../docs/assurance-case.md)
+
+A ProVerif symbolic (Dolev-Yao) model of sudo-proxy's SSH-tunnel path. It
+discharges **Rung 4** (SSH half) of the
+[formalisation roadmap](../../docs/formalisation-roadmap.md): it makes assumption
+**A4** load-bearing and *explicit* by **deriving** the **first-contact MITM**
+(attack-tree leaf **4.2**) from the attacker substituting its own host key —
+rather than *assuming* a secure channel under pinning. It also states the
+**separation theorem**: a privileged exec at the honest remote requires a human
+keypress *even when the channel is fully attacker-controlled*.
+
+ProVerif was chosen over Tamarin: the model's job is one binary toggle
+(key-acceptance) plus producing an attack derivation, which ProVerif's
+applied-pi calculus expresses with least ceremony and proves fully
+automatically. Tamarin's strengths (unbounded mutable state, AC reasoning,
+inductive lemmas) aren't needed here.
+
+## What changed from the earlier version
+
+The earlier model toggled the SSH tunnel between a **private** and a **public**
+channel. That *declared* the channel secure when pinned — it assumed the
+conclusion instead of showing *why* pinning produces it. This version models the
+real **key material**: the remote has a host keypair, the local daemon **accepts
+a host key** (the known-good one when pinned; whatever the network offers on
+first contact), and encrypts the request to the accepted key. ProVerif then
+**derives** the MITM from the attacker substituting its own key.
+
+Modelling only the host key while ignoring client authentication would
+misrepresent the system — it would paint the honest remote as accepting forged
+commands even when pinned, which the remote's `authorized_keys` check actually
+prevents. So the model captures the **real mutual-authentication structure** of
+the SSH use case:
+
+- **host key** (server → client): pinning + payload confidentiality — **A4**;
+- **client key** (client → server): `authorized_keys`, request integrity.
+
+The payoff is that the model **disentangles which assumption protects what**:
+
+| Property | Rides on |
+|----------|----------|
+| Payload **confidentiality** (and you're-talking-to-the-real-remote) | **host-key pinning** (A4) |
+| **Command authenticity** at the remote (only genuinely-issued commands run) | **client auth** (`authorized_keys`) — so it holds *even on first contact* |
+| **Separation** (channel compromise can't fabricate approvals) | the private TTY (A1) |
+
+## The toggle
+
+`ssh-channel.m4.pv` is an `m4` template. The `PINNED` macro flips only **which
+host key the daemon accepts** — the channel is the public Dolev-Yao network in
+**both** configurations:
+
+- **`PINNED`** → `pkAccepted = pkR`, the known-good host key from `known_hosts`.
+- **unpinned** → `in(pub, pkAccepted)`, accept whatever the network offers; the
+  attacker answers with its own key.
+
+```sh
+# pinned (known_hosts populated): expect the channel guarantees to hold
+m4 -DPINNED ssh-channel.m4.pv > /tmp/pinned.pv   && proverif /tmp/pinned.pv
+
+# first contact (no pinning): expect the secrecy MITM to surface
+m4          ssh-channel.m4.pv > /tmp/unpinned.pv && proverif /tmp/unpinned.pv
+```
+
+(ProVerif is installed with `opam install proverif`; run via `opam exec --
+proverif …`.)
+
+## The four queries and expected outcomes
+
+| # | Query | Pinned | First contact | Rides on |
+|---|-------|--------|---------------|----------|
+| **(a)** | Authenticity/integrity: the honest remote accepts only a request the local daemon actually sent. `RemoteAccept ⟹ RequestSent`. | `true` | `true` | **client auth** |
+| **(b)** | Replay resistance (injective): each accept maps to a *distinct* send. | `false` | `false` | app-layer dedup¹ |
+| **(c)** | Secrecy of the forwarded-agent secret / tunnelled payload. `not attacker(agentSecret)`. | `true` | **`false`** + key-substitution MITM derivation | **host-key pinning (A4)** |
+| **(d)** | Separation: the remote execs a command only after a human keypress for it. `RemoteExec ⟹ Keypress`. | `true` | `true` | private TTY (A1) |
+
+The **first-contact `(c)` = `false`** is the *derived, surfaced gap* — the
+mechanically-exhibited leaf 4.2. ProVerif's derivation is exactly the
+key-substitution MITM: the attacker mints its own host key `pk(k)`, the daemon
+(first contact) accepts it and encrypts the payload to it, and the attacker
+decrypts with `k` to recover `agentSecret`. A non-`true` result in the unpinned
+run is therefore expected, not a CI failure (the workflow is non-gating and not
+gated on ProVerif's exit code).
+
+The honest, valuable result is that **`(a)` holds in *both* columns**: command
+authenticity at the remote rides on **client authentication** (the signature the
+remote checks against `authorized_keys`), so the first-contact MITM — which
+breaks confidentiality — still **cannot forge or alter a command** the honest
+remote will run. The earlier model's "authenticity holds *iff* pinned" was an
+artifact of the assumed-private channel; the key-level model corrects it.
+
+The **`(d)`** row holding in *both* columns is the reassuring result: an SSH
+channel compromise does not fabricate approvals — the local human gate
+(assumption A1, the private TTY) is not reachable by a network attacker.
+
+¹ **On (b) — stated plainly.** The model has no application-layer dedup, so a
+captured ciphertext can be replayed and the remote re-accepts it: injective
+`(b)` is genuinely `false` in **both** configurations, with a replay trace. (The
+earlier private-channel model reported (b) "cannot be proved" when pinned — a
+known ProVerif over-approximation of private-channel linearity; with everything
+now on the public channel that artifact is **gone**, and the result is a real,
+honest replay.) Application-layer injective replay-resistance is discharged by
+the other rungs: the daemon's UUID dedup is the TLA+
+[`ReplayImpossible`](../tla/) invariant (Rung 4), and freshness monotonicity is
+the Kani `ymd_hms_to_epoch_is_monotone` proof (Rung 3). The threat model already
+states that replay protection "rides on SSH transport integrity (A4)", so
+locating app-layer dedup in those rungs rather than here is faithful to the
+model.
+
+## Negative controls (teeth)
+
+Documented mutations that each turn a passing query red, confirming the model
+*sees* the property it claims. Apply by hand to the expanded `.pv`; never
+committed.
+
+- **NC1 — client auth is load-bearing for (a).** Delete the
+  `let (=id, =c) = checksign(sg, pkL) in` line in `RemoteDaemon` (accept without
+  verifying the client signature) ⇒ `(a)` `RemoteAccept ⟹ RequestSent` turns
+  **`false`** *even pinned*: the attacker, knowing the public host key `pkR`,
+  encrypts a forged request to the honest remote. (Confidentiality `(c)` stays
+  `true` — pinning still protects it — which shows the two guarantees are
+  orthogonal.)
+- **NC2 — pinning is load-bearing for (c).** The **first-contact run is this
+  control**: dropping pinning turns `(c)` `false` with the key-substitution
+  derivation above. (Equivalently, forcing `pkAccepted` to an attacker key under
+  `PINNED` breaks `(c)` too.)
+- **NC3 — the private TTY is load-bearing for (d).** Change
+  `free ttyAck:channel [private].` to a public `free ttyAck:channel.` ⇒ `(d)`
+  `RemoteExec ⟹ Keypress` can no longer be proved: the attacker forges the
+  acknowledgement and the remote execs without a keypress — reproducing
+  "channel compromise bypasses the human gate" were the TTY not private (A1).
+
+All three were run and produce the expected result.
+
+## Scope (what this model abstracts away)
+
+Stated plainly, as the Kani "Scope" note and the TLA+ faithfulness ledger do:
+
+- **Symbolic Dolev-Yao, not computational.** Cryptographic *primitives* are
+  assumed perfect (keys unguessable, `aenc`/`sign` unbreakable, no
+  algebraic/padding side channels). We prove what holds *given* sound crypto.
+- **SSH is a black box.** We model the authentication *structure* SSH realises —
+  a confidential channel to whoever's host key was accepted, plus client
+  authentication by key — and the *condition* for it (pinning). We do not
+  re-verify SSH's transport, key-exchange, or authentication internals; that is
+  established work.
+- **Two modelled assumptions.** Host-key pinning (**A4**) and client
+  authentication (the remote's `authorized_keys`, previously implicit). The
+  model makes the second explicit because command authenticity rides on it, not
+  on pinning.
+- **Time is abstracted** to nonce uniqueness + event ordering. The 60 s
+  freshness and 120 s dedup windows are not modelled numerically (that
+  arithmetic is Rung 3 Kani's domain, `ymd_hms_to_epoch`); application-layer
+  replay dedup is the TLA+ `ReplayImpossible` invariant's domain.
+- **The human keypress is a trusted local action** (assumption A1): the approval
+  channel is private and not attacker-injectable. This assumption is precisely
+  what yields the separation theorem (d).
+- **DoS, traffic analysis, and side channels are out of scope** (DoS is covered
+  by the resource-cap controls, not here).
+
+What it *does* establish: A4 is now a mechanically-**derived** dependency —
+confidentiality holds iff the host key is pinned, with a concrete first-contact
+key-substitution MITM trace for leaf 4.2 — the disentanglement attributes
+command authenticity to client auth instead, and the separation theorem (d)
+bounds the blast radius of an SSH compromise: it does not reach the local
+approval gate.
+
+## Why no committed generated `.pv`
+
+Unlike the TLA+ artifact (where the `pcal.trans` output is committed), the m4
+expansion is trivial and deterministic, so only the `.m4.pv` template is checked
+in; CI regenerates both `.pv` files with `m4`.
diff --git a/proofs/proverif/ssh-channel.m4.pv b/proofs/proverif/ssh-channel.m4.pv
new file mode 100644
index 0000000..011b51d
--- /dev/null
+++ b/proofs/proverif/ssh-channel.m4.pv
@@ -0,0 +1,177 @@
+(***************************************************************************)
+(* Rung 4 of the sudo-proxy formalisation roadmap: a ProVerif symbolic     *)
+(* (Dolev-Yao) model of the SSH-tunnel path.                               *)
+(*                                                                         *)
+(* Its job is to make assumption A4 load-bearing and EXPLICIT by DERIVING  *)
+(* the first-contact MITM (attack-tree leaf 4.2) rather than assuming it.  *)
+(* The earlier version of this model toggled the tunnel between a PRIVATE  *)
+(* and a public channel -- it therefore *declared* the channel secure when *)
+(* pinned instead of showing *why* pinning makes it so. This version       *)
+(* instead models the real key material: the remote has a host keypair,    *)
+(* the local daemon ACCEPTS a host key (the known-good one when pinned;    *)
+(* whatever the network offers on first contact), and encrypts the request *)
+(* to the accepted key. ProVerif then derives the MITM from the attacker   *)
+(* substituting its own key.                                               *)
+(*                                                                         *)
+(* Because modelling only the host key while ignoring client auth would    *)
+(* misrepresent the system (it would paint the honest remote as accepting  *)
+(* forged commands even when pinned -- which the remote's authorized_keys  *)
+(* check actually prevents), we model the REAL mutual-authentication       *)
+(* structure of the SSH use case:                                          *)
+(*   - host key  (server -> client): pinning + confidentiality (A4);        *)
+(*   - client key (client -> server): authorized_keys, request integrity.   *)
+(* The model then DISENTANGLES which assumption protects what:             *)
+(*   - confidentiality of the payload rides on HOST-KEY PINNING;           *)
+(*   - command authenticity at the remote rides on CLIENT AUTH -- and so   *)
+(*     holds even on first contact.                                        *)
+(*                                                                         *)
+(* TOGGLE (compile-time, via m4) -- flips only WHICH host key the daemon   *)
+(* accepts; the channel is the public network in BOTH configurations:      *)
+(*   m4 -DPINNED ssh-channel.m4.pv | proverif /dev/stdin   -- pinned       *)
+(*   m4          ssh-channel.m4.pv | proverif /dev/stdin   -- first contact*)
+(* See README.md for the expected outcomes and the scope statement.        *)
+(*                                                                         *)
+(* SCOPE (stated plainly, as the Kani/TLA+ artifacts do):                  *)
+(*  - Symbolic Dolev-Yao model: crypto PRIMITIVES are perfect (keys        *)
+(*    unguessable, aenc/sign unbreakable). We model the authentication     *)
+(*    STRUCTURE SSH realises (a confidential channel to whoever's host key *)
+(*    was accepted, plus client authentication by key), not SSH's          *)
+(*    transport/key-exchange internals -- those stay a black box.          *)
+(*  - Time is abstracted to nonce uniqueness + event ordering; the 60s     *)
+(*    freshness and 120s dedup windows are not modelled numerically (that  *)
+(*    arithmetic is Rung 3 Kani's domain, ymd_hms_to_epoch). App-layer     *)
+(*    replay dedup is the TLA+ ReplayImpossible invariant's domain.        *)
+(*  - The human keypress is a trusted local action (assumption A1): the    *)
+(*    approval channel is private and not attacker-injectable. This is     *)
+(*    what yields the separation theorem (d).                              *)
+(*  - DoS, traffic analysis, and side channels are out of scope.           *)
+(***************************************************************************)
+
+(* ----------------------------- channels ------------------------------- *)
+free pub:channel.                       (* the Dolev-Yao network           *)
+
+(* The trusted /dev/tty approval handshake (assumption A1): the daemon
+   shows a command, the human approves exactly that command. Private, so a
+   network attacker can neither read nor forge an approval. *)
+free ttyShow:channel [private].
+free ttyAck:channel  [private].
+
+(* --------------------------- cryptography ----------------------------- *)
+(* Host keypair (server -> client authentication + confidentiality): SSH's
+   host key, the one pinned in known_hosts. Asymmetric encryption to it
+   gives a confidential channel to its holder. *)
+type skey.
+type pkey.
+fun pk(skey):pkey.
+fun aenc(bitstring, pkey):bitstring.
+reduc forall m:bitstring, k:skey; adec(aenc(m, pk(k)), k) = m.
+
+(* Client keypair (client -> server authentication): the local daemon's key,
+   trusted by the remote's authorized_keys. A signature authenticates the
+   request and binds its integrity. *)
+type sskey.
+type spkey.
+fun spk(sskey):spkey.
+fun sign(bitstring, sskey):bitstring.
+reduc forall m:bitstring, k:sskey; checksign(sign(m, k), spk(k)) = m.
+
+(* The forwarded-agent secret / a stand-in for confidential tunnelled
+   payload. Secret unless the channel leaks it. *)
+free agentSecret:bitstring [private].
+
+(* ------------------------------ events -------------------------------- *)
+event RequestSent(bitstring, bitstring).    (* (id, cmd) emitted by local daemon  *)
+event RemoteAccept(bitstring, bitstring).   (* (id, cmd) accepted by honest remote *)
+event RemoteExec(bitstring).                (* cmd executed by honest remote       *)
+event Keypress(bitstring).                  (* human approved cmd at the TTY       *)
+
+(* ------------------------------ queries ------------------------------- *)
+
+(* (a) Authenticity / integrity: the honest remote accepts only a request
+       the local daemon actually sent. Rides on CLIENT AUTH (the signature
+       the remote checks against authorized_keys), so it holds INDEPENDENT
+       of host-key pinning -- the attacker cannot forge the daemon's
+       signature on a command of its choosing in either configuration.
+       PINNED: true.  FIRST CONTACT: true. *)
+query id:bitstring, c:bitstring;
+  event(RemoteAccept(id, c)) ==> event(RequestSent(id, c)).
+
+(* (b) Replay resistance (injective): each acceptance corresponds to a
+       DISTINCT send. The model has no application-layer dedup, so a captured
+       ciphertext can be replayed and the remote re-accepts it -- false in
+       BOTH configurations (a genuine public-channel replay, not a
+       private-channel over-approximation). App-layer injectivity is the
+       TLA+ ReplayImpossible invariant's domain (UUID dedup), and freshness
+       monotonicity is the Kani ymd_hms_to_epoch_is_monotone proof. *)
+query id:bitstring, c:bitstring;
+  inj-event(RemoteAccept(id, c)) ==> inj-event(RequestSent(id, c)).
+
+(* (c) Secrecy of the forwarded-agent secret / tunnelled payload. Rides on
+       HOST-KEY PINNING: the daemon encrypts to the ACCEPTED host key, so
+       confidentiality holds iff that key is the honest remote's.
+       PINNED: true (not derivable).
+       FIRST CONTACT: false -- the attacker substitutes its own host key,
+       the daemon encrypts to it, the attacker decrypts. THE DERIVED MITM. *)
+query attacker(agentSecret).
+
+(* (d) Separation theorem: the honest remote execs a command only after a
+       human keypress approving exactly that command -- holds in BOTH
+       configurations, i.e. even when the channel is fully compromised the
+       SSH compromise does NOT bypass the local human gate (A1, private TTY). *)
+query c:bitstring;
+  event(RemoteExec(c)) ==> event(Keypress(c)).
+
+(* ----------------------------- processes ------------------------------ *)
+
+(* Emit the request: signed by the client key (authorized_keys auth +
+   integrity) and encrypted to the accepted host key (confidentiality).
+   Factored out so the two toggle branches share one tail. *)
+let SendReq(skL:sskey, pkAccepted:pkey, id:bitstring, c:bitstring) =
+    out(pub, aenc((id, c, agentSecret, sign((id, c), skL)), pkAccepted)).
+
+(* The local daemon (run_remote): the MCP caller (adversary A1) chooses the
+   command; the daemon stamps a fresh UUID nonce, then ACCEPTS a host key --
+   the pinned known-good key, or (first contact) whatever the network offers,
+   which the attacker can answer with its own key. *)
+let LocalDaemon(skL:sskey, pkR:pkey) =
+    in(pub, c:bitstring);
+    new id:bitstring;
+    event RequestSent(id, c);
+ifdef(`PINNED',
+    (* pinned: use the known-good host key from known_hosts *)
+    SendReq(skL, pkR, id, c)
+,
+    (* first contact: accept whatever host key the network offers *)
+    in(pub, pkAccepted:pkey);
+    SendReq(skL, pkAccepted, id, c)
+).
+
+(* The honest remote daemon: decrypts with its host secret key, verifies the
+   client signature against the daemon's public key (authorized_keys) and
+   that it covers exactly (id, c), then runs the unconditional approval gate
+   -- shows the command on the trusted TTY and executes only once the human
+   approves exactly that command. *)
+let RemoteDaemon(skR:skey, pkL:spkey) =
+    in(pub, ct:bitstring);
+    let (id:bitstring, c:bitstring, s:bitstring, sg:bitstring) = adec(ct, skR) in
+    let (=id, =c) = checksign(sg, pkL) in
+    event RemoteAccept(id, c);
+    out(ttyShow, c);
+    in(ttyAck, =c);
+    event RemoteExec(c).
+
+(* The human at the trusted TTY: sees a command and approves it (worst case
+   -- approves whatever is shown; we prove exec=>keypress, not refusal). *)
+let Human =
+    in(ttyShow, c:bitstring);
+    event Keypress(c);
+    out(ttyAck, c).
+
+process
+    new skR:skey;                 (* honest remote host keypair *)
+    new skL:sskey;                (* local daemon client keypair *)
+    let pkR = pk(skR) in
+    let pkL = spk(skL) in
+    out(pub, pkR);                (* host public key is public knowledge *)
+    out(pub, pkL);                (* client public key is public knowledge *)
+    ( !LocalDaemon(skL, pkR) | !RemoteDaemon(skR, pkL) | !Human )
diff --git a/proofs/tla/.gitignore b/proofs/tla/.gitignore
new file mode 100644
index 0000000..1c5bd5d
--- /dev/null
+++ b/proofs/tla/.gitignore
@@ -0,0 +1,6 @@
+# TLC / pcal.trans generated artifacts (the committed .tla carries the
+# translation; everything below is regenerated by a local model-check run).
+states/
+*_TTrace_*.tla
+*_TTrace_*.bin
+*.old
diff --git a/proofs/tla/ApprovalStateMachine.cfg b/proofs/tla/ApprovalStateMachine.cfg
new file mode 100644
index 0000000..7ec4bd3
--- /dev/null
+++ b/proofs/tla/ApprovalStateMachine.cfg
@@ -0,0 +1,20 @@
+\* TLC model for the Rung 4 approval state machine. See README.md.
+\* Run:  java -cp tla2tools.jar tlc2.TLC -config ApprovalStateMachine.cfg \
+\*            -workers auto ApprovalStateMachine.tla
+
+CONSTANTS
+    \* Two ids: the minimum to express both "replay the same id" and
+    \* "two distinct ids". TLC model values.
+    Ids = {r1, r2}
+
+SPECIFICATION Spec
+
+INVARIANTS
+    TypeOK
+    NoExecWithoutApproval
+    ReplayImpossible
+    PolicyFlipsOnlyOnKeypress
+    PrivilegedGateIndependentOfPolicy
+
+PROPERTIES
+    FlagMonotone
diff --git a/proofs/tla/ApprovalStateMachine.tla b/proofs/tla/ApprovalStateMachine.tla
new file mode 100644
index 0000000..61184a4
--- /dev/null
+++ b/proofs/tla/ApprovalStateMachine.tla
@@ -0,0 +1,372 @@
+-------------------------- MODULE ApprovalStateMachine --------------------------
+(***************************************************************************)
+(* Rung 4 of the sudo-proxy formalisation roadmap: a TLA+/PlusCal model of *)
+(* the approval state machine, model-checked by TLC.                       *)
+(*                                                                         *)
+(* It pins the two accepted residuals the Rung 0 threat model routed here  *)
+(* (attack-tree leaves 1.4 / 4.4, finding F2): the `confirm_unprivileged`  *)
+(* policy transition and the unconditional human gate on privileged exec.  *)
+(*                                                                         *)
+(* The PlusCal algorithm below is the source of truth; the TLC-checkable   *)
+(* TLA+ translation between BEGIN/END TRANSLATION is generated from it by   *)
+(* `pcal.trans` and committed alongside (see README.md). If you edit the   *)
+(* PlusCal, re-run `pcal.trans` and commit both.                           *)
+(*                                                                         *)
+(* The properties are tracked by bounded *monitor* variables (a standard   *)
+(* TLC idiom): a violation flag is raised at the exact site a bad thing    *)
+(* would happen, and the invariant asserts the flag stays FALSE. This      *)
+(* keeps every variable bounded, so the reachable state space is finite    *)
+(* and small with no history-length constraint -- processing more requests *)
+(* only revisits states (once `seen` is full, further requests are         *)
+(* rejected as replays).                                                   *)
+(*                                                                         *)
+(* Faithfulness ledger and the negative-control recipe that demonstrates   *)
+(* the model has teeth live in proofs/tla/README.md.                       *)
+(***************************************************************************)
+EXTENDS Naturals, FiniteSets
+
+CONSTANTS Ids          \* the bounded set of request ids, e.g. {r1, r2}
+
+\* The entire input domain of tui::classify_key, abstracted: a keypress 'y',
+\* an 'a' (always-allow), any other key, or a poll timeout (no keypress).
+KeyChoice == {"y", "a", "other", "timeout"}
+
+\* A request as seen by the daemon. Every gate is abstracted to the boolean
+\* "does this request pass that gate"; the contents the gate inspects (clock,
+\* env strings, dangerous chars) are out of scope here -- see the ledger.
+Requests == [ id           : Ids,
+              privileged    : BOOLEAN,
+              forwardAgent  : BOOLEAN,
+              wellFormed    : BOOLEAN,   \* passes ValidatedRequest::validate
+              fresh         : BOOLEAN,   \* passes the freshness gate (<=60s)
+              envOk         : BOOLEAN ]  \* passes the env allowlist
+
+\* A placeholder request for the initial state; never handled (busy = FALSE at
+\* init), so its field values are irrelevant.
+NoReq == [ id           |-> CHOOSE i \in Ids : TRUE,
+           privileged   |-> FALSE,
+           forwardAgent |-> FALSE,
+           wellFormed   |-> FALSE,
+           fresh        |-> FALSE,
+           envOk        |-> FALSE ]
+
+(* --algorithm ApprovalStateMachine
+
+variables
+    \* The persisted/in-memory policy flag (Arc<AtomicBool>), daemon-lifetime.
+    \* Init is a free boolean so both the default and `--no-confirm-unprivileged`
+    \* startup are covered.
+    confirmUnpriv \in BOOLEAN,
+
+    \* Replay dedup (SeenIds): ids the daemon has accepted (whether the dispatch
+    \* then executed, denied, or timed out). Modelled as a monotonically growing
+    \* set -- eviction is dropped, which is conservative for ReplayImpossible
+    \* (see the ledger).
+    seen = {},
+
+    \* Ids that have actually been executed -- the witness set for replay.
+    executed = {},
+
+    \* The request currently on the wire and the operator's keypress for it.
+    \* Read by the daemon only while `busy`.
+    req = NoReq,
+    key = "timeout",
+    busy = FALSE,
+
+    \* ---- monitor (violation) flags; each invariant asserts its flag is FALSE ----
+    \* A privileged exec happened on a keypress other than 'y'.
+    vNoExec = FALSE,
+    \* An exec happened for an id that was already executed (a replayed exec).
+    vReplay = FALSE,
+    \* The policy flag was flipped other than by an 'a' keypress on an
+    \* unprivileged request.
+    vFlip = FALSE;
+
+define
+    \* A direct transcription of tui::classify_key(key, privileged).
+    \* ApprovedAlways is emitted iff key = "a" AND the request is unprivileged.
+    Classify(k, priv) ==
+        IF   k = "timeout"        THEN "Timeout"
+        ELSE IF k = "y"           THEN "Approved"
+        ELSE IF k = "a" /\ ~priv  THEN "ApprovedAlways"
+        ELSE                           "Denied"
+end define;
+
+\* Bookkeeping at every execution site. `isPriv` says whether this is the
+\* privileged (exec_sudo) path. Raises the replay flag if this id already ran,
+\* and -- on the privileged path -- the no-approval flag if the executing
+\* keypress was not 'y'. So if any mutation routes a privileged exec onto a
+\* non-'y' key, or lets an id execute twice, the monitor catches it here.
+macro NoteExec(isPriv) begin
+    if req.id \in executed then
+        vReplay := TRUE;
+    end if;
+    if isPriv /\ key # "y" then
+        vNoExec := TRUE;
+    end if;
+    executed := executed \union {req.id};
+end macro;
+
+\* The single primitive that writes the policy flag. Any flip must go through
+\* it; it raises the flip-provenance flag unless the writer is an 'a' keypress
+\* on an unprivileged request -- so wiring a flip into any other branch (a
+\* request field, a timeout, the privileged path) trips the monitor.
+macro FlipFlag() begin
+    confirmUnpriv := FALSE;
+    if req.privileged \/ key # "a" then
+        vFlip := TRUE;
+    end if;
+end macro;
+
+\* The environment / attacker: submits an arbitrary request (every field free,
+\* including an id already in `seen` -- a replay) and an arbitrary operator
+\* keypress. TLC's exhaustive search universally quantifies the properties over
+\* all attacker forgeries / replays AND all operator choices.
+process Env = "env"
+begin
+EnvLoop:
+    while TRUE do
+        await ~busy;
+        with r \in Requests, k \in KeyChoice do
+            req  := r;
+            key  := k;
+            busy := TRUE;
+        end with;
+    end while;
+end process;
+
+\* The daemon: runs the gate chain in the SAME order as handle_connection (so a
+\* reordering mutation is observable), then dispatches. One request is handled
+\* to completion atomically (concurrency / TTY-lock interleavings are out of
+\* scope for these properties -- see the ledger).
+process Daemon = "daemon"
+begin
+Handle:
+    while TRUE do
+        await busy;
+        if ~req.wellFormed then
+            skip;                                       \* ValidatedRequest::validate -> reject
+        elsif req.forwardAgent /\ req.privileged then
+            skip;                                       \* forward_agent + privileged -> reject
+        elsif ~req.fresh then
+            skip;                                       \* freshness gate -> reject
+        elsif ~req.envOk then
+            skip;                                       \* env allowlist -> reject
+        elsif req.id \in seen then
+            skip;                                       \* replay dedup (try_insert -> false)
+        else
+            \* Accept (try_insert -> true), then dispatch.
+            seen := seen \union {req.id};
+            if req.privileged then
+                \* PRIVILEGED PATH -- never consults confirmUnpriv.
+                \* ApprovedAlways is impossible here (Classify guards on ~priv)
+                \* and the real code defensively folds it to Denied anyway.
+                if Classify(key, TRUE) = "Approved" then
+                    NoteExec(TRUE);                     \* exec_sudo
+                elsif Classify(key, TRUE) = "Timeout" then
+                    skip;                               \* timeout, default deny
+                else
+                    skip;                               \* denied
+                end if;
+            elsif confirmUnpriv then
+                \* UNPRIVILEGED + confirmation ON.
+                if Classify(key, FALSE) = "Approved" then
+                    NoteExec(FALSE);                    \* exec_direct
+                elsif Classify(key, FALSE) = "ApprovedAlways" then
+                    FlipFlag();                         \* the ONLY flag writer
+                    NoteExec(FALSE);                    \* exec_direct
+                elsif Classify(key, FALSE) = "Timeout" then
+                    skip;
+                else
+                    skip;                               \* denied
+                end if;
+            else
+                \* UNPRIVILEGED + confirmation OFF: no prompt, exec directly.
+                NoteExec(FALSE);                        \* exec_direct
+            end if;
+        end if;
+        busy := FALSE;
+    end while;
+end process;
+
+end algorithm; *)
+\* BEGIN TRANSLATION (chksum(pcal) = "5576622c" /\ chksum(tla) = "3c1dd0ad")
+VARIABLES confirmUnpriv, seen, executed, req, key, busy, vNoExec, vReplay, 
+          vFlip
+
+(* define statement *)
+Classify(k, priv) ==
+    IF   k = "timeout"        THEN "Timeout"
+    ELSE IF k = "y"           THEN "Approved"
+    ELSE IF k = "a" /\ ~priv  THEN "ApprovedAlways"
+    ELSE                           "Denied"
+
+
+vars == << confirmUnpriv, seen, executed, req, key, busy, vNoExec, vReplay, 
+           vFlip >>
+
+ProcSet == {"env"} \cup {"daemon"}
+
+Init == (* Global variables *)
+        /\ confirmUnpriv \in BOOLEAN
+        /\ seen = {}
+        /\ executed = {}
+        /\ req = NoReq
+        /\ key = "timeout"
+        /\ busy = FALSE
+        /\ vNoExec = FALSE
+        /\ vReplay = FALSE
+        /\ vFlip = FALSE
+
+Env == /\ ~busy
+       /\ \E r \in Requests:
+            \E k \in KeyChoice:
+              /\ req' = r
+              /\ key' = k
+              /\ busy' = TRUE
+       /\ UNCHANGED << confirmUnpriv, seen, executed, vNoExec, vReplay, vFlip >>
+
+Daemon == /\ busy
+          /\ IF ~req.wellFormed
+                THEN /\ TRUE
+                     /\ UNCHANGED << confirmUnpriv, seen, executed, vNoExec, 
+                                     vReplay, vFlip >>
+                ELSE /\ IF req.forwardAgent /\ req.privileged
+                           THEN /\ TRUE
+                                /\ UNCHANGED << confirmUnpriv, seen, executed, 
+                                                vNoExec, vReplay, vFlip >>
+                           ELSE /\ IF ~req.fresh
+                                      THEN /\ TRUE
+                                           /\ UNCHANGED << confirmUnpriv, seen, 
+                                                           executed, vNoExec, 
+                                                           vReplay, vFlip >>
+                                      ELSE /\ IF ~req.envOk
+                                                 THEN /\ TRUE
+                                                      /\ UNCHANGED << confirmUnpriv, 
+                                                                      seen, 
+                                                                      executed, 
+                                                                      vNoExec, 
+                                                                      vReplay, 
+                                                                      vFlip >>
+                                                 ELSE /\ IF req.id \in seen
+                                                            THEN /\ TRUE
+                                                                 /\ UNCHANGED << confirmUnpriv, 
+                                                                                 seen, 
+                                                                                 executed, 
+                                                                                 vNoExec, 
+                                                                                 vReplay, 
+                                                                                 vFlip >>
+                                                            ELSE /\ seen' = (seen \union {req.id})
+                                                                 /\ IF req.privileged
+                                                                       THEN /\ IF Classify(key, TRUE) = "Approved"
+                                                                                  THEN /\ IF req.id \in executed
+                                                                                             THEN /\ vReplay' = TRUE
+                                                                                             ELSE /\ TRUE
+                                                                                                  /\ UNCHANGED vReplay
+                                                                                       /\ IF TRUE /\ key # "y"
+                                                                                             THEN /\ vNoExec' = TRUE
+                                                                                             ELSE /\ TRUE
+                                                                                                  /\ UNCHANGED vNoExec
+                                                                                       /\ executed' = (executed \union {req.id})
+                                                                                  ELSE /\ IF Classify(key, TRUE) = "Timeout"
+                                                                                             THEN /\ TRUE
+                                                                                             ELSE /\ TRUE
+                                                                                       /\ UNCHANGED << executed, 
+                                                                                                       vNoExec, 
+                                                                                                       vReplay >>
+                                                                            /\ UNCHANGED << confirmUnpriv, 
+                                                                                            vFlip >>
+                                                                       ELSE /\ IF confirmUnpriv
+                                                                                  THEN /\ IF Classify(key, FALSE) = "Approved"
+                                                                                             THEN /\ IF req.id \in executed
+                                                                                                        THEN /\ vReplay' = TRUE
+                                                                                                        ELSE /\ TRUE
+                                                                                                             /\ UNCHANGED vReplay
+                                                                                                  /\ IF FALSE /\ key # "y"
+                                                                                                        THEN /\ vNoExec' = TRUE
+                                                                                                        ELSE /\ TRUE
+                                                                                                             /\ UNCHANGED vNoExec
+                                                                                                  /\ executed' = (executed \union {req.id})
+                                                                                                  /\ UNCHANGED << confirmUnpriv, 
+                                                                                                                  vFlip >>
+                                                                                             ELSE /\ IF Classify(key, FALSE) = "ApprovedAlways"
+                                                                                                        THEN /\ confirmUnpriv' = FALSE
+                                                                                                             /\ IF req.privileged \/ key # "a"
+                                                                                                                   THEN /\ vFlip' = TRUE
+                                                                                                                   ELSE /\ TRUE
+                                                                                                                        /\ vFlip' = vFlip
+                                                                                                             /\ IF req.id \in executed
+                                                                                                                   THEN /\ vReplay' = TRUE
+                                                                                                                   ELSE /\ TRUE
+                                                                                                                        /\ UNCHANGED vReplay
+                                                                                                             /\ IF FALSE /\ key # "y"
+                                                                                                                   THEN /\ vNoExec' = TRUE
+                                                                                                                   ELSE /\ TRUE
+                                                                                                                        /\ UNCHANGED vNoExec
+                                                                                                             /\ executed' = (executed \union {req.id})
+                                                                                                        ELSE /\ IF Classify(key, FALSE) = "Timeout"
+                                                                                                                   THEN /\ TRUE
+                                                                                                                   ELSE /\ TRUE
+                                                                                                             /\ UNCHANGED << confirmUnpriv, 
+                                                                                                                             executed, 
+                                                                                                                             vNoExec, 
+                                                                                                                             vReplay, 
+                                                                                                                             vFlip >>
+                                                                                  ELSE /\ IF req.id \in executed
+                                                                                             THEN /\ vReplay' = TRUE
+                                                                                             ELSE /\ TRUE
+                                                                                                  /\ UNCHANGED vReplay
+                                                                                       /\ IF FALSE /\ key # "y"
+                                                                                             THEN /\ vNoExec' = TRUE
+                                                                                             ELSE /\ TRUE
+                                                                                                  /\ UNCHANGED vNoExec
+                                                                                       /\ executed' = (executed \union {req.id})
+                                                                                       /\ UNCHANGED << confirmUnpriv, 
+                                                                                                       vFlip >>
+          /\ busy' = FALSE
+          /\ UNCHANGED << req, key >>
+
+Next == Env \/ Daemon
+
+Spec == Init /\ [][Next]_vars
+
+\* END TRANSLATION 
+
+\* ===================== invariants & properties =====================
+
+\* Modeling-hygiene type invariant.
+TypeOK ==
+    /\ confirmUnpriv \in BOOLEAN
+    /\ seen \subseteq Ids
+    /\ executed \subseteq Ids
+    /\ req \in Requests
+    /\ key \in KeyChoice
+    /\ busy \in BOOLEAN
+    /\ vNoExec \in BOOLEAN
+    /\ vReplay \in BOOLEAN
+    /\ vFlip \in BOOLEAN
+
+\* P1: a privileged exec happens only on a 'y' keypress -- never on timeout,
+\* denial, replay, or any policy state. (leaf 1.4 / G5)
+NoExecWithoutApproval == ~vNoExec
+
+\* P2: the same request id never causes two executions. (leaf 1.1 / G2.1)
+ReplayImpossible == ~vReplay
+
+\* P3: the policy flag flips only via an ApprovedAlways ('a') keypress on an
+\* unprivileged request -- never a request field, replay, MCP flag, or timeout.
+\* (leaf 1.4 / 4.4 / F2)
+PolicyFlipsOnlyOnKeypress == ~vFlip
+
+\* P4: no privileged exec without a 'y' keypress, for ANY value the policy flag
+\* took. The privileged branch structurally never reads confirmUnpriv, so this
+\* is the "independent of policy" reading of the same witness as P1; stated
+\* separately for assurance-case traceability (G5.1) and tripped by the NC2
+\* mutation that wires the flag into the privileged path. (leaf 4.4 / G5.1)
+PrivilegedGateIndependentOfPolicy == ~vNoExec
+
+\* P3 (temporal half): once the flag is FALSE it stays FALSE -- it only ever
+\* moves true->false, and only via the single writer above.
+FlagMonotone == [][confirmUnpriv' => confirmUnpriv]_confirmUnpriv
+
+=============================================================================
diff --git a/proofs/tla/ConcurrentHandlers.cfg b/proofs/tla/ConcurrentHandlers.cfg
new file mode 100644
index 0000000..2231531
--- /dev/null
+++ b/proofs/tla/ConcurrentHandlers.cfg
@@ -0,0 +1,19 @@
+\* TLC model for the Extended Rung 4 concurrent-handler interleaving model.
+\* See README.md.
+\* Run:  java -cp tla2tools.jar tlc2.TLC -config ConcurrentHandlers.cfg \
+\*            -workers auto ConcurrentHandlers.tla
+
+CONSTANTS
+    \* Two concurrent handlers: the minimum that exposes the pairwise dedup
+    \* TOCTOU and the TTY-lock race. A third only adds symmetric interleavings.
+    Handlers = {h1, h2}
+    \* Two ids: lets two handlers race the SAME id (the TOCTOU) and also handle
+    \* two distinct ids. TLC model values.
+    Ids = {r1, r2}
+
+SPECIFICATION Spec
+
+INVARIANTS
+    TypeOK
+    NoDoubleExec
+    TtyMutualExclusion
diff --git a/proofs/tla/ConcurrentHandlers.tla b/proofs/tla/ConcurrentHandlers.tla
new file mode 100644
index 0000000..3b72dea
--- /dev/null
+++ b/proofs/tla/ConcurrentHandlers.tla
@@ -0,0 +1,364 @@
+-------------------------- MODULE ConcurrentHandlers --------------------------
+(***************************************************************************)
+(* Extended Rung 4 of the sudo-proxy formalisation roadmap: a TLA+/PlusCal *)
+(* model of *concurrent* daemon connection handlers interleaving on the    *)
+(* two pieces of state they share, model-checked by TLC.                   *)
+(*                                                                         *)
+(* The sibling `ApprovalStateMachine` and `ReplayWindow` models handle one *)
+(* request to completion atomically, so they cannot see -- by construction *)
+(* -- the two concurrency hazards that `server::handle_connection` guards   *)
+(* against when many threads run at once:                                  *)
+(*                                                                         *)
+(*   1. the dedup TOCTOU. `SeenIds::try_insert` (server.rs:213) folds the   *)
+(*      contains-check and the insert into ONE critical section under the   *)
+(*      `seen_ids` mutex precisely so two threads racing the same id cannot  *)
+(*      both pass dedup. This model interleaves handlers on `seen` and       *)
+(*      checks that the atomic critical section is *sufficient*: no id ever  *)
+(*      executes twice under any interleaving.                              *)
+(*                                                                         *)
+(*   2. the TTY-lock serialisation. The privileged prompt and the           *)
+(*      foreground exec each take `tty_lock` (server.rs:555 / executor      *)
+(*      ForegroundGuard) so two handlers never drive /dev/tty at once --     *)
+(*      without it the daemon would sit in a background pgrp and EIO         *)
+(*      (PR #22). This model checks the lock keeps the interactive region    *)
+(*      mutually exclusive across all interleavings.                        *)
+(*                                                                         *)
+(* As in the sibling models, the PlusCal algorithm is the source of truth;  *)
+(* the TLC-checkable translation between BEGIN/END TRANSLATION is generated  *)
+(* by `pcal.trans` and committed alongside. If you edit the PlusCal, re-run  *)
+(* `pcal.trans` and commit both. Properties are tracked either by a bounded  *)
+(* witness counter (`ttyActive`) or a monitor flag (`vDoubleExec`) so every  *)
+(* variable stays bounded and the reachable state space is finite and small. *)
+(*                                                                         *)
+(* Negative-control recipe and faithfulness ledger live in README.md.       *)
+(***************************************************************************)
+EXTENDS Naturals, FiniteSets
+
+CONSTANTS Handlers,    \* the set of concurrent handler threads, e.g. {h1, h2}
+          Ids          \* the bounded set of request ids, e.g. {r1, r2}
+
+\* Sentinel for "lock is free"; distinct from every handler id.
+NoHolder == "none"
+
+\* Keypress abstraction restricted to what matters for concurrency: "y"
+\* approves (so the exec site is reached), "other" denies/times out (so the
+\* TTY lock is released without executing). The full keypress decision table
+\* and the confirm_unprivileged flip are the ApprovalStateMachine model's job;
+\* here no "a" is offered, so confirmUnpriv is read-only.
+Keys == {"y", "other"}
+
+(* --algorithm ConcurrentHandlers
+
+variables
+    \* SeenIds: ids the daemon has accepted. Eviction is out of scope here
+    \* (the ReplayWindow model owns the TTL); `seen` grows monotonically,
+    \* which is sound for the concurrency properties. The mutex guarding it.
+    seen = {},
+    seenLock = NoHolder,
+
+    \* The shared TTY lock (Arc<Mutex<()>>). Held across an interactive prompt
+    \* and re-taken by ForegroundGuard for a privileged exec.
+    ttyLock = NoHolder,
+
+    \* The persisted policy flag (Arc<AtomicBool>). Free at init so both the
+    \* confirm-on and no-confirm dispatch paths are covered; never flipped here.
+    confirmUnpriv \in BOOLEAN,
+
+    \* Ids that have actually executed -- the witness set for double-exec.
+    executed = {},
+
+    \* Witness counter: how many handlers are in the interactive TTY region
+    \* (prompt or foreground exec) at once. The lock must keep this <= 1.
+    ttyActive = 0,
+
+    \* Monitor flag: raised at an exec site if this id already executed, i.e.
+    \* the dedup let the same id through twice (the concurrent TOCTOU).
+    vDoubleExec = FALSE;
+
+\* Record an execution of `id`; raise the double-exec monitor if it ran before.
+macro NoteExec(id) begin
+    if id \in executed then
+        vDoubleExec := TRUE;
+    end if;
+    executed := executed \union {id};
+end macro;
+
+\* One connection handler thread. Picks its request (id + privileged + key)
+\* nondeterministically -- the gate chain (validate/freshness/env) is the
+\* sibling model's job, so every request here is assumed past those gates and
+\* the focus is the shared-state races. Single-shot: handles one request then
+\* terminates, which bounds the state space.
+process Handler \in Handlers
+variables
+    id   = CHOOSE i \in Ids : TRUE,
+    priv = FALSE,
+    key  = "y",
+    dup  = FALSE;
+begin
+Pick:
+    with i \in Ids, p \in BOOLEAN, k \in Keys do
+        id := i; priv := p; key := k;
+    end with;
+
+\* ---- dedup critical section: the faithful ATOMIC try_insert ----
+SeenAcq:
+    await seenLock = NoHolder;     \* lock_recover(&seen_ids)
+    seenLock := self;
+TryInsert:
+    \* evict_stale() is a no-op here (eviction = ReplayWindow's job). The
+    \* contains-check and the insert are ONE atomic step under the lock --
+    \* the single critical section that closes the TOCTOU. (To see the model
+    \* has teeth, the README NC1 splits this into check / release / insert.)
+    if id \in seen then
+        dup := TRUE;
+    else
+        seen := seen \union {id};
+        dup := FALSE;
+    end if;
+    seenLock := NoHolder;
+
+Dispatch:
+    if dup then
+        goto Done;                 \* duplicate request id -> rejected, never execs
+    elsif priv then
+        goto PrivPrompt;
+    elsif confirmUnpriv then
+        goto ConfirmPrompt;
+    else
+        goto AutoExec;
+    end if;
+
+\* ---- privileged: TTY lock held across the prompt, released, re-taken to exec ----
+PrivPrompt:
+    await ttyLock = NoHolder;      \* blocking lock_recover(&tty_lock) for the prompt
+    ttyLock := self;
+    ttyActive := ttyActive + 1;
+PrivPromptEnd:
+    ttyActive := ttyActive - 1;
+    ttyLock := NoHolder;           \* released before exec (server.rs:553)
+    if key # "y" then goto Done; end if;   \* denied / timeout -> default deny
+PrivExecAcq:
+    await ttyLock = NoHolder;      \* ForegroundGuard::take re-acquires (blocking)
+    ttyLock := self;
+    ttyActive := ttyActive + 1;
+PrivExec:
+    NoteExec(id);                  \* exec_sudo
+    ttyActive := ttyActive - 1;
+    ttyLock := NoHolder;
+    goto Done;
+
+\* ---- unprivileged + confirm ON: blocking prompt, then exec_direct (no TTY) ----
+ConfirmPrompt:
+    await ttyLock = NoHolder;      \* blocking lock_recover(&tty_lock)
+    ttyLock := self;
+    ttyActive := ttyActive + 1;
+ConfirmPromptEnd:
+    ttyActive := ttyActive - 1;
+    ttyLock := NoHolder;
+    if key # "y" then goto Done; end if;
+ConfirmExec:
+    NoteExec(id);                  \* exec_direct -- takes NO tty_lock
+    goto Done;
+
+\* ---- unprivileged + confirm OFF: best-effort banner via try_lock, then exec ----
+AutoExec:
+    \* `if let Ok(_g) = tty_lock.try_lock() { display_banner }`: non-blocking
+    \* and released immediately. Modelled as atomic with no lasting hold, so it
+    \* never overlaps an interactive region -- skipped when the lock is busy.
+    if ttyLock = NoHolder then
+        skip;                      \* banner under a momentary try_lock
+    end if;
+    NoteExec(id);                  \* exec_direct
+    goto Done;                     \* "Done" is PlusCal's implicit terminal label
+end process;
+
+end algorithm; *)
+
+\* BEGIN TRANSLATION
+VARIABLES seen, seenLock, ttyLock, confirmUnpriv, executed, ttyActive, 
+          vDoubleExec, pc, id, priv, key, dup
+
+vars == << seen, seenLock, ttyLock, confirmUnpriv, executed, ttyActive, 
+           vDoubleExec, pc, id, priv, key, dup >>
+
+ProcSet == (Handlers)
+
+Init == (* Global variables *)
+        /\ seen = {}
+        /\ seenLock = NoHolder
+        /\ ttyLock = NoHolder
+        /\ confirmUnpriv \in BOOLEAN
+        /\ executed = {}
+        /\ ttyActive = 0
+        /\ vDoubleExec = FALSE
+        (* Process Handler *)
+        /\ id = [self \in Handlers |-> CHOOSE i \in Ids : TRUE]
+        /\ priv = [self \in Handlers |-> FALSE]
+        /\ key = [self \in Handlers |-> "y"]
+        /\ dup = [self \in Handlers |-> FALSE]
+        /\ pc = [self \in ProcSet |-> "Pick"]
+
+Pick(self) == /\ pc[self] = "Pick"
+              /\ \E i \in Ids:
+                   \E p \in BOOLEAN:
+                     \E k \in Keys:
+                       /\ id' = [id EXCEPT ![self] = i]
+                       /\ priv' = [priv EXCEPT ![self] = p]
+                       /\ key' = [key EXCEPT ![self] = k]
+              /\ pc' = [pc EXCEPT ![self] = "SeenAcq"]
+              /\ UNCHANGED << seen, seenLock, ttyLock, confirmUnpriv, executed, 
+                              ttyActive, vDoubleExec, dup >>
+
+SeenAcq(self) == /\ pc[self] = "SeenAcq"
+                 /\ seenLock = NoHolder
+                 /\ seenLock' = self
+                 /\ pc' = [pc EXCEPT ![self] = "TryInsert"]
+                 /\ UNCHANGED << seen, ttyLock, confirmUnpriv, executed, 
+                                 ttyActive, vDoubleExec, id, priv, key, dup >>
+
+TryInsert(self) == /\ pc[self] = "TryInsert"
+                   /\ IF id[self] \in seen
+                         THEN /\ dup' = [dup EXCEPT ![self] = TRUE]
+                              /\ seen' = seen
+                         ELSE /\ seen' = (seen \union {id[self]})
+                              /\ dup' = [dup EXCEPT ![self] = FALSE]
+                   /\ seenLock' = NoHolder
+                   /\ pc' = [pc EXCEPT ![self] = "Dispatch"]
+                   /\ UNCHANGED << ttyLock, confirmUnpriv, executed, ttyActive, 
+                                   vDoubleExec, id, priv, key >>
+
+Dispatch(self) == /\ pc[self] = "Dispatch"
+                  /\ IF dup[self]
+                        THEN /\ pc' = [pc EXCEPT ![self] = "Done"]
+                        ELSE /\ IF priv[self]
+                                   THEN /\ pc' = [pc EXCEPT ![self] = "PrivPrompt"]
+                                   ELSE /\ IF confirmUnpriv
+                                              THEN /\ pc' = [pc EXCEPT ![self] = "ConfirmPrompt"]
+                                              ELSE /\ pc' = [pc EXCEPT ![self] = "AutoExec"]
+                  /\ UNCHANGED << seen, seenLock, ttyLock, confirmUnpriv, 
+                                  executed, ttyActive, vDoubleExec, id, priv, 
+                                  key, dup >>
+
+PrivPrompt(self) == /\ pc[self] = "PrivPrompt"
+                    /\ ttyLock = NoHolder
+                    /\ ttyLock' = self
+                    /\ ttyActive' = ttyActive + 1
+                    /\ pc' = [pc EXCEPT ![self] = "PrivPromptEnd"]
+                    /\ UNCHANGED << seen, seenLock, confirmUnpriv, executed, 
+                                    vDoubleExec, id, priv, key, dup >>
+
+PrivPromptEnd(self) == /\ pc[self] = "PrivPromptEnd"
+                       /\ ttyActive' = ttyActive - 1
+                       /\ ttyLock' = NoHolder
+                       /\ IF key[self] # "y"
+                             THEN /\ pc' = [pc EXCEPT ![self] = "Done"]
+                             ELSE /\ pc' = [pc EXCEPT ![self] = "PrivExecAcq"]
+                       /\ UNCHANGED << seen, seenLock, confirmUnpriv, executed, 
+                                       vDoubleExec, id, priv, key, dup >>
+
+PrivExecAcq(self) == /\ pc[self] = "PrivExecAcq"
+                     /\ ttyLock = NoHolder
+                     /\ ttyLock' = self
+                     /\ ttyActive' = ttyActive + 1
+                     /\ pc' = [pc EXCEPT ![self] = "PrivExec"]
+                     /\ UNCHANGED << seen, seenLock, confirmUnpriv, executed, 
+                                     vDoubleExec, id, priv, key, dup >>
+
+PrivExec(self) == /\ pc[self] = "PrivExec"
+                  /\ IF id[self] \in executed
+                        THEN /\ vDoubleExec' = TRUE
+                        ELSE /\ TRUE
+                             /\ UNCHANGED vDoubleExec
+                  /\ executed' = (executed \union {id[self]})
+                  /\ ttyActive' = ttyActive - 1
+                  /\ ttyLock' = NoHolder
+                  /\ pc' = [pc EXCEPT ![self] = "Done"]
+                  /\ UNCHANGED << seen, seenLock, confirmUnpriv, id, priv, key, 
+                                  dup >>
+
+ConfirmPrompt(self) == /\ pc[self] = "ConfirmPrompt"
+                       /\ ttyLock = NoHolder
+                       /\ ttyLock' = self
+                       /\ ttyActive' = ttyActive + 1
+                       /\ pc' = [pc EXCEPT ![self] = "ConfirmPromptEnd"]
+                       /\ UNCHANGED << seen, seenLock, confirmUnpriv, executed, 
+                                       vDoubleExec, id, priv, key, dup >>
+
+ConfirmPromptEnd(self) == /\ pc[self] = "ConfirmPromptEnd"
+                          /\ ttyActive' = ttyActive - 1
+                          /\ ttyLock' = NoHolder
+                          /\ IF key[self] # "y"
+                                THEN /\ pc' = [pc EXCEPT ![self] = "Done"]
+                                ELSE /\ pc' = [pc EXCEPT ![self] = "ConfirmExec"]
+                          /\ UNCHANGED << seen, seenLock, confirmUnpriv, 
+                                          executed, vDoubleExec, id, priv, key, 
+                                          dup >>
+
+ConfirmExec(self) == /\ pc[self] = "ConfirmExec"
+                     /\ IF id[self] \in executed
+                           THEN /\ vDoubleExec' = TRUE
+                           ELSE /\ TRUE
+                                /\ UNCHANGED vDoubleExec
+                     /\ executed' = (executed \union {id[self]})
+                     /\ pc' = [pc EXCEPT ![self] = "Done"]
+                     /\ UNCHANGED << seen, seenLock, ttyLock, confirmUnpriv, 
+                                     ttyActive, id, priv, key, dup >>
+
+AutoExec(self) == /\ pc[self] = "AutoExec"
+                  /\ IF ttyLock = NoHolder
+                        THEN /\ TRUE
+                        ELSE /\ TRUE
+                  /\ IF id[self] \in executed
+                        THEN /\ vDoubleExec' = TRUE
+                        ELSE /\ TRUE
+                             /\ UNCHANGED vDoubleExec
+                  /\ executed' = (executed \union {id[self]})
+                  /\ pc' = [pc EXCEPT ![self] = "Done"]
+                  /\ UNCHANGED << seen, seenLock, ttyLock, confirmUnpriv, 
+                                  ttyActive, id, priv, key, dup >>
+
+Handler(self) == Pick(self) \/ SeenAcq(self) \/ TryInsert(self)
+                    \/ Dispatch(self) \/ PrivPrompt(self)
+                    \/ PrivPromptEnd(self) \/ PrivExecAcq(self)
+                    \/ PrivExec(self) \/ ConfirmPrompt(self)
+                    \/ ConfirmPromptEnd(self) \/ ConfirmExec(self)
+                    \/ AutoExec(self)
+
+(* Allow infinite stuttering to prevent deadlock on termination. *)
+Terminating == /\ \A self \in ProcSet: pc[self] = "Done"
+               /\ UNCHANGED vars
+
+Next == (\E self \in Handlers: Handler(self))
+           \/ Terminating
+
+Spec == Init /\ [][Next]_vars
+
+Termination == <>(\A self \in ProcSet: pc[self] = "Done")
+
+\* END TRANSLATION
+
+\* ===================== invariants & properties =====================
+
+\* Modeling-hygiene type invariant.
+TypeOK ==
+    /\ seen \subseteq Ids
+    /\ executed \subseteq Ids
+    /\ seenLock \in Handlers \cup {NoHolder}
+    /\ ttyLock  \in Handlers \cup {NoHolder}
+    /\ confirmUnpriv \in BOOLEAN
+    /\ ttyActive \in 0..Cardinality(Handlers)
+    /\ vDoubleExec \in BOOLEAN
+
+\* C1: no request id ever executes twice, under ANY interleaving of the
+\* concurrent handlers. This is the closure of the dedup TOCTOU: the atomic
+\* `try_insert` critical section is *sufficient* to serialise the same-id race.
+\* (leaf 1.1 / G2.1 -- the concurrent half of ReplayImpossible.)
+NoDoubleExec == ~vDoubleExec
+
+\* C2: at most one handler is in the interactive TTY region (prompt or
+\* foreground exec) at any time -- the `tty_lock` serialises /dev/tty across
+\* all interleavings, so the daemon never drives the TTY from two handlers at
+\* once (the PR #22 background-pgrp / EIO hazard). (leaf Sn5.x / G6)
+TtyMutualExclusion == ttyActive <= 1
+
+=============================================================================
diff --git a/proofs/tla/README.md b/proofs/tla/README.md
new file mode 100644
index 0000000..41c1aeb
--- /dev/null
+++ b/proofs/tla/README.md
@@ -0,0 +1,385 @@
+# Rung 4 — TLA+/PlusCal models
+
+[← formalisation roadmap](../../docs/formalisation-roadmap.md) · [threat model](../../docs/threat-model.md) · [assurance case](../../docs/assurance-case.md)
+
+This directory holds **three** TLA+/PlusCal models, all model-checked by TLC:
+
+1. **`ApprovalStateMachine`** (below) — the approval state machine: the four
+   safety properties of the gate chain and `classify_key`.
+2. **[`ReplayWindow`](#window-sizing--freshness--replay-retention) (Extended Rung 4)** —
+   the temporal freshness ↔ replay-retention **window-sizing** property that the
+   first model deliberately abstracts away (it never evicts). Restores a small
+   concrete clock + real TTL eviction and proves the genuinely temporal claim.
+3. **[`ConcurrentHandlers`](#concurrent-handlers--dedup-toctou--tty-lock-serialisation) (Extended Rung 4)** —
+   *concurrent* handler threads interleaving on the shared `seen` set and the TTY
+   lock. Proves the atomic `try_insert` critical section is *sufficient* (no
+   double-exec under any interleaving) and that the TTY lock serialises /dev/tty —
+   the canonical model-checking case the two atomic, one-request-at-a-time models
+   above cannot express.
+
+---
+
+## Approval state machine
+
+A TLA+/PlusCal model of sudo-proxy's approval state machine, model-checked by
+TLC. It discharges **Rung 4** (state-machine half) of the
+[formalisation roadmap](../../docs/formalisation-roadmap.md): the temporal /
+relational properties that are not per-function and that the Rung 0 threat model
+routed here — attack-tree leaves **1.4 / 4.4** (the `confirm_unprivileged`
+policy transition, finding **F2**) and the unconditional human gate on
+privileged execution.
+
+`ApprovalStateMachine.tla` carries the PlusCal algorithm in a comment block (the
+source of truth) followed by the `pcal.trans`-generated TLA+ translation that TLC
+checks. `ApprovalStateMachine.cfg` is the bounded model. The Rung 3 Kani sibling
+lives in [`src/proofs.rs`](../../src/proofs.rs); this is the protocol-level
+analogue.
+
+## The four properties
+
+Tracked by bounded **monitor variables** — a violation flag is raised at the
+exact site the bad thing would happen, and the invariant asserts the flag stays
+`FALSE`. (This keeps every variable bounded, so the reachable state space is
+finite and small with *no* history-length constraint: once `seen` fills, further
+requests are rejected as replays and only revisit states.)
+
+| # | Invariant | Claim | Maps to |
+|---|-----------|-------|---------|
+| **P1** | `NoExecWithoutApproval` | A privileged exec happens only on a `y` keypress — never on timeout, denial, replay, or any policy state. | leaf 1.4 · G5 |
+| **P2** | `ReplayImpossible` | The same request id never causes two executions. | leaf 1.1 · G2.1 |
+| **P3** | `PolicyFlipsOnlyOnKeypress` (+ action property `FlagMonotone`) | The policy flag flips only via an `a` keypress on an *unprivileged* request — never a request field, replay, MCP flag, or timeout — and once `FALSE` stays `FALSE`. | leaf 1.4 / 4.4 · F2 |
+| **P4** | `PrivilegedGateIndependentOfPolicy` | No privileged exec without a `y` keypress, for *any* value the policy flag took. The privileged branch structurally never reads the flag; stated separately for traceability. | leaf 4.4 · G5.1 |
+
+`Classify(k, priv)` in the spec is a direct transcription of
+[`tui::classify_key`](../../src/tui.rs); the daemon's gate chain mirrors the
+ordering in [`server.rs` `handle_connection`](../../src/server.rs).
+
+## Running TLC
+
+Needs a JRE (21+) and `tla2tools.jar` (pin a release, e.g. `v1.8.0`):
+
+```sh
+curl -sSL -o tla2tools.jar \
+  https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/tla2tools.jar
+
+# (only if you edited the PlusCal) re-generate the translation, then commit both:
+java -cp tla2tools.jar pcal.trans ApprovalStateMachine.tla
+
+java -cp tla2tools.jar tlc2.TLC -nowarning \
+     -config ApprovalStateMachine.cfg -workers auto ApprovalStateMachine.tla
+```
+
+Expected: `Model checking completed. No error has been found.` (~9k distinct
+states, well under a second). The committed `.tla` already contains the
+translation, so CI does not re-run `pcal.trans`.
+
+## Negative controls — that the model has teeth
+
+The analogue of the Kani negative controls: documented mutations that **must**
+each make TLC produce a counterexample. Apply by hand to the PlusCal, re-run
+`pcal.trans`, re-check, then revert. **Do not commit the mutations.** Each was
+verified to violate exactly the listed invariant.
+
+| # | Mutation (in the PlusCal) | Violates |
+|---|---------------------------|----------|
+| **NC1** | Privileged `Timeout` branch: replace `skip;` with `NoteExec(TRUE);` (exec on timeout). | `NoExecWithoutApproval` |
+| **NC2** | Wire the flag into the privileged gate: prepend `if confirmUnpriv then NoteExec(TRUE); elsif …` to the privileged dispatch. | `NoExecWithoutApproval` (= `PrivilegedGateIndependentOfPolicy` — same witness) |
+| **NC3** | Remove the dedup gate: change `elsif req.id \in seen then` to `elsif FALSE then`. | `ReplayImpossible` |
+| **NC4** | Flip the flag illegitimately: replace the `skip;` in the unprivileged `Timeout` branch with `FlipFlag();`. | `PolicyFlipsOnlyOnKeypress` |
+
+Note NC4 leaves `FlagMonotone` *satisfied* (a `TRUE→FALSE` flip is monotone-OK):
+it is the provenance monitor `PolicyFlipsOnlyOnKeypress`, not monotonicity, that
+catches an illegitimate flip — which is the point of having both.
+
+## Faithfulness ledger
+
+What the model claims, and what it does not — the analogue of the Kani "Scope"
+note in the [roadmap](../../docs/formalisation-roadmap.md).
+
+**Modelled faithfully**
+
+- The gate chain *ordering* (validate → forward_agent+privileged → freshness →
+  env allowlist → replay dedup → dispatch), so a reordering regression is
+  observable.
+- `classify_key`'s exact decision table (`Classify`), including that
+  `ApprovedAlways` is emitted iff `a` ∧ unprivileged.
+- Both dispatch branches, the defensive `ApprovedAlways → Denied` fold on the
+  privileged path, and that `confirm_unprivileged` is written by exactly one
+  primitive (`FlipFlag`).
+- Replay rejection via a monotonically-growing `seen` set.
+- The attacker and the operator are both fully nondeterministic, so the
+  properties are universally quantified over all field forgeries / replays and
+  all operator choices.
+
+**Abstracted (sound for these four properties)**
+
+- The clock / freshness check → a boolean `fresh`; env contents → `envOk`;
+  decode + peer-auth → `wellFormed`. The properties don't depend on the contents
+  these gates inspect, only on pass/reject.
+- `SeenIds` retention/eviction → never evict. This is *conservative* for
+  `ReplayImpossible`: remembering ids forever can only add rejections within the
+  model's horizon, never remove one. A replay accepted *after* the 120 s
+  retention window is by-design and is not what leaves 1.4 / 4.4 are about. The
+  **window relationship** that makes this safe — that an id stays remembered for
+  as long as a replay of it could still pass the freshness gate — is exactly what
+  the [`ReplayWindow`](#window-sizing--freshness--replay-retention) model proves,
+  lifting this abstraction.
+
+**Out of scope (covered by other rungs)**
+
+- Concurrent TTY-lock interleavings and the `try_insert` TOCTOU — one request is
+  handled to completion here; covered by the
+  [`ConcurrentHandlers`](#concurrent-handlers--dedup-toctou--tty-lock-serialisation)
+  sibling model.
+- Freshness arithmetic monotonicity (`ymd_hms_to_epoch`) — Rung 3 Kani.
+- Per-field dangerous-char scanning (`has_dangerous_chars`) — Rung 3 Kani.
+- The SSH first-contact MITM residual (leaf 4.2 / A4) — the separate Rung 4
+  ProVerif model in [`../proverif/`](../proverif/).
+
+---
+
+## Window-sizing — freshness ↔ replay retention
+
+[`ReplayWindow.tla`](ReplayWindow.tla) / [`ReplayWindow.cfg`](ReplayWindow.cfg) —
+the **Extended Rung 4** model. The approval-state-machine model above models
+`seen` as never evicting (conservative, by its own admission). That conservatism
+*hides* the one property the daemon genuinely depends on: the relationship
+between the two time windows in [`server.rs`](../../src/server.rs):
+
+| constant | value | role |
+|----------|-------|------|
+| `MAX_REQUEST_AGE` | 60 s | freshness gate (`check_freshness`) |
+| `REPLAY_RETENTION` | 120 s | seen-id eviction TTL (`SeenIds::evict_stale`) |
+
+This model restores a **small abstract clock** and the **real TTL eviction**, and
+proves the temporal claim: *a captured request cannot evade replay-dedup by
+waiting for its id to be evicted while still passing the freshness gate.* Only the
+window **relationship** matters, not the absolute seconds, so the constants are
+`Freshness = 2`, `Retention = 4` (= 2·Freshness), `MaxTime = 6`, `Ids = {r1}`.
+
+### The theorem
+
+| Invariant | Claim | Maps to |
+|-----------|-------|---------|
+| **`PastReplayImpossible`** | No captured **past-dated** request is ever re-executed. | leaf 1.1 · G2.1 (temporal half) |
+| `AcceptWasFresh` | The daemon only ever accepts a request that was fresh at acceptance. | modeling hygiene |
+| `TypeOK` | Type invariant. | — |
+
+**The window arithmetic.** An id is inserted at `acceptedAt ≥ firstTs` (you accept
+no earlier than the request was created) and evicted only once
+`clock − insertedAt > Retention`. A replay re-uses the fixed `firstTs`, so it
+passes freshness only while `clock − firstTs ≤ Freshness`. Since
+`insertedAt = acceptedAt ≥ firstTs` and `Retention ≥ Freshness`, eviction
+(`clock > firstTs + Retention ≥ firstTs + Freshness`) cannot co-occur with
+freshness (`clock ≤ firstTs + Freshness`). So a captured replay can never win the
+race — proved exhaustively by TLC (442 distinct states, sub-second).
+
+**Why a monitor, not a `seen`-membership invariant.** Eviction is **lazy** — an id
+is pruned only inside `try_insert`, and a replay-accept removes then re-inserts the
+id in one atomic step. A membership predicate (`∃ rec ∈ seen : …`) therefore can
+never observe the gap; the re-insertion masks it. Only an event monitor (`vReplay`
+raised at the re-exec site) is a robust witness — the same rationale behind the
+approval-state-machine model's monitor variables.
+
+### Finding 1 — the tight bound is `Retention ≥ Freshness`, not `2×`
+
+The code comment says retention is "twice `MAX_REQUEST_AGE` so that any id that
+could still pass the freshness check is also still in the set." The arithmetic
+above shows `Retention ≥ Freshness` already suffices (max acceptance lag only
+pushes eviction *later*, never earlier). TLC confirms it: the theorem **holds** at
+`Retention = Freshness` and **fails** at `Retention = Freshness − 1`. So the
+shipped `120 s = 2 × 60 s` is **conservative margin, not a necessity** — *not a
+bug*; the daemon is, if anything, safer than its comment claims.
+
+### Finding 2 — future-dated timestamps defeat replay protection after eviction
+
+`parse_age` **clamps future-dated timestamps to age 0** (`server.rs:146`) with no
+upper cap, so a request dated beyond `Retention` into the future keeps passing the
+freshness gate *forever* while its id ages out of `seen`. It is then replayable
+**every `Retention` units regardless of how large `Retention` is** — the
+window-sizing relationship does not help here at all. The model exposes this: with
+the past-dated guard dropped (`NoFutureReplay`), TLC returns a concrete trace —
+e.g. a request dated `ts = 3` accepted at `clock = 0`, evicted at `clock = 5`, then
+replayed and re-executed at `clock = 5` (still fresh: `5 − 3 = 2 ≤ Freshness`).
+
+How exploitable this is depends on the threat model (it needs the request onto the
+channel — which SSH pinning guards — and for privileged commands the human still
+gates the first run), but for auto-approved unprivileged commands it is a real
+repeatable replay. The fix is cheap: also reject timestamps too far in the
+*future* (a small clock-skew allowance, then reject rather than clamp). Logged as a
+**separate backlog item** — not fixed in this (proofs-only) session.
+
+### Negative controls — that the model has teeth
+
+Apply by hand to the model / cfg, re-check, then revert. **Do not commit.**
+
+| # | Mutation | Violates |
+|---|----------|----------|
+| **NC1** | `Retention = Freshness − 1` (window relationship broken) | `PastReplayImpossible` |
+| **Tight-bound** | run at `Retention = Freshness` (holds) vs `Freshness − 1` (fails) | confirms `Retention ≥ Freshness` is the tight threshold (finding 1) |
+| **NC2** | freshness gate disabled (`FreshAt(ts) == TRUE`) | `AcceptWasFresh` **and** `PastReplayImpossible` (freshness is load-bearing) |
+| **Finding 2** | add `NoFutureReplay` to `INVARIANTS` | future-dated counterexample (finding 2) — a real property of the code, not a model mutation |
+
+> Note NC2 is *disable freshness*, not *evict by the wrong field*: at
+> `Retention ≥ Freshness`, evicting by the request timestamp instead of the
+> insertion time is **also** safe for past-dated requests (it expires them up to
+> `Freshness` earlier, but freshness has already lapsed), so it is not a valid
+> negative control here — a small result the model also settles.
+
+### Faithfulness ledger
+
+**Modelled faithfully**
+
+- The gate order at acceptance: freshness → evict → dedup → insert, mirroring
+  `try_insert` (`evict_stale` runs first, then the membership check, then the
+  stamped insert).
+- Eviction keyed on **insertion** time (`Live(rec) == ¬(clock − insertedAt > Retention)`).
+- `parse_age`'s future-timestamp **clamp** (`FreshAt(ts) == IF ts > clock THEN TRUE …`).
+- The attacker replays the **captured bytes** (same id, same timestamp); the honest
+  client submits the one genuine request. Both, plus the clock, are nondeterministic.
+
+**Abstracted (sound for the window property)**
+
+- The clock and the windows → small integers preserving `Retention = 2·Freshness`;
+  the property depends only on the *relationship*, exhaustively checked to `MaxTime`.
+- One id (`Ids = {r1}`): the property is per-id; the two-distinct-ids interaction is
+  the approval-state-machine model's job.
+- Atomic handling, no clock tick mid-request — faithful because `check_freshness`
+  runs at handler *entry*, before the TTY lock (`server.rs:499`), precisely so a
+  request cannot age past the gate while queued.
+
+**Out of scope**
+
+- Concurrent `try_insert` interleavings — the
+  [`ConcurrentHandlers`](#concurrent-handlers--dedup-toctou--tty-lock-serialisation)
+  sibling model.
+- The approval/keypress gate (the approval-state-machine model).
+
+### Running TLC
+
+```sh
+# (only if you edited the PlusCal) re-generate the translation, then commit both:
+java -cp tla2tools.jar pcal.trans ReplayWindow.tla
+
+java -cp tla2tools.jar tlc2.TLC -nowarning \
+     -config ReplayWindow.cfg -workers auto ReplayWindow.tla
+```
+
+Expected: `No error has been found` (442 distinct states, sub-second).
+
+---
+
+## Concurrent handlers — dedup TOCTOU & TTY-lock serialisation
+
+[`ConcurrentHandlers.tla`](ConcurrentHandlers.tla) /
+[`ConcurrentHandlers.cfg`](ConcurrentHandlers.cfg) — the second **Extended Rung 4**
+model. The two models above each handle one request to completion *atomically*, so
+by construction they cannot see the hazards that only exist when the daemon runs
+many [`handle_connection`](../../src/server.rs) threads at once. This is the
+canonical model-checking use case: a property over *interleavings*, not over one
+sequential run.
+
+`server::run` spawns a thread per connection, all sharing two pieces of state via
+`Arc<Mutex<…>>`:
+
+| shared state | guarded by | the hazard if mishandled |
+|--------------|------------|--------------------------|
+| `seen` (`SeenIds`) | `seen_ids` mutex; `try_insert` is one critical section | two threads race the same id, both pass dedup, both exec (double-exec TOCTOU) |
+| `/dev/tty` | `tty_lock` (`Arc<Mutex<()>>`) | two threads drive the terminal at once → background-pgrp EIO (PR #22) |
+
+The model runs `Handlers = {h1, h2}` concurrent handlers, each picking its request
+(`id` ∈ `Ids`, `privileged`, key) nondeterministically — so TLC explores every
+interleaving, including both handlers racing the **same** id. PlusCal labels are
+the atomicity boundaries: the mutex acquire/release and each critical section are
+labelled steps, so TLC interleaves exactly where the real threads can.
+
+### The two properties
+
+| Invariant | Claim | Maps to |
+|-----------|-------|---------|
+| **`NoDoubleExec`** | No request id ever executes twice, under **any** interleaving — the atomic `try_insert` critical section is *sufficient* to serialise the same-id race. | leaf 1.1 · G2.1 (concurrent half of `ReplayImpossible`) |
+| **`TtyMutualExclusion`** | At most one handler is in the interactive TTY region (prompt or foreground exec) at a time — `tty_lock` serialises /dev/tty across all interleavings. | Sn5.x · G6 |
+
+**How they are witnessed.** `NoDoubleExec` is a monitor flag (`vDoubleExec`), raised
+at an exec site if the id already ran — the same idiom as the sibling models.
+`TtyMutualExclusion` is a bounded witness counter (`ttyActive`, 0..|Handlers|),
+incremented inside the lock on entering the interactive region and decremented on
+leaving; the invariant asserts it never exceeds 1. Both keep every variable
+bounded, so the state space is finite and small.
+
+**Faithful to the dispatch's lock discipline.** The privileged path takes
+`tty_lock` for the prompt, **releases it before exec**, then `ForegroundGuard`
+**re-takes** it for the foreground swap (two separate critical sections —
+`server.rs:553` / `executor.rs`). `exec_direct` (unprivileged) takes **no** TTY
+lock. The no-confirm banner uses a best-effort `try_lock` (modelled as a momentary,
+non-overlapping hold). So `ttyActive` legitimately goes 1→0→1 across a privileged
+request, and another handler may prompt in the released gap — never *simultaneously*.
+
+### Result — the atomic critical section is sufficient
+
+TLC checks both invariants exhaustively with **no error**: 5116 distinct states at
+`Handlers = {h1, h2}` (sub-second), 238 590 at `{h1, h2, h3}`. Two handlers is the
+minimal witness for the pairwise dedup TOCTOU and the TTY race; a third only adds
+symmetric interleavings (the larger run is a one-line `.cfg` change, kept out of CI
+for speed). The same-id race **is** exercised — both handlers can pick the same id,
+one wins `try_insert` and execs, the other sees `dup` and is rejected — so the
+property is non-vacuous, as NC1 below confirms by making it fail.
+
+### Negative controls — that the model has teeth
+
+Apply by hand to the PlusCal, re-run `pcal.trans`, re-check, then revert. **Do not
+commit the mutations.** Each was verified to violate exactly the listed invariant.
+
+| # | Mutation (in the PlusCal) | Violates |
+|---|---------------------------|----------|
+| **NC1** | Split the atomic `TryInsert` into `CheckSeen` (contains, then **release** the lock) and `InsertAcq`/`InsertSeen` (re-acquire, insert) — re-introducing the TOCTOU `try_insert` closes. | `NoDoubleExec` |
+| **NC2** | Drop the `tty_lock` acquire/release around `PrivPrompt` (prompt without the lock). | `TtyMutualExclusion` |
+
+NC1 is the load-bearing one: it shows the model genuinely *sees* the TOCTOU, so the
+clean run's `NoDoubleExec` is a real verification of the atomic critical section, not
+a vacuous pass. NC2 reproduces the PR #22 hazard the TTY lock exists to prevent.
+
+### Faithfulness ledger
+
+**Modelled faithfully**
+
+- Two pieces of `Arc<Mutex<…>>`-shared state (`seen`, `tty_lock`) and N handler
+  threads interleaving on them at mutex-acquire/release granularity.
+- `try_insert` as **one** critical section under the `seen_ids` lock (the atomicity
+  that closes the TOCTOU — the whole point of the model).
+- The dispatch's TTY-lock discipline: privileged prompt holds then **releases**
+  before exec, `ForegroundGuard` re-takes for the foreground swap, `exec_direct`
+  takes no lock, the no-confirm banner is best-effort `try_lock`.
+- Both handlers may race the **same** id (the TOCTOU) or handle distinct ids.
+
+**Abstracted (sound for these two properties)**
+
+- The per-request gate chain (validate → freshness → env allowlist) → assumed
+  passed: it is per-handler and sequential, covered by the
+  [`ApprovalStateMachine`](#the-four-properties) model. The focus here is the
+  shared-state races.
+- `SeenIds` eviction → never evict (the [`ReplayWindow`](#window-sizing--freshness--replay-retention)
+  model owns the TTL); irrelevant to a concurrency race within the model's horizon.
+- `confirm_unprivileged` → a read-only init-free boolean (no `a` key), so its flip
+  semantics stay with the `ApprovalStateMachine` model; both dispatch paths are
+  still covered.
+- Handler count bounded to 2 (3 also checked); the keypress set to `{y, other}`.
+
+**Out of scope**
+
+- The approval/keypress decision table and the flag-flip transition — the
+  [`ApprovalStateMachine`](#the-four-properties) model.
+- The freshness ↔ retention window sizing — the
+  [`ReplayWindow`](#window-sizing--freshness--replay-retention) model.
+
+### Running TLC
+
+```sh
+# (only if you edited the PlusCal) re-generate the translation, then commit both:
+java -cp tla2tools.jar pcal.trans ConcurrentHandlers.tla
+
+java -cp tla2tools.jar tlc2.TLC -nowarning \
+     -config ConcurrentHandlers.cfg -workers auto ConcurrentHandlers.tla
+```
+
+Expected: `No error has been found` (5116 distinct states, sub-second).
diff --git a/proofs/tla/ReplayWindow.cfg b/proofs/tla/ReplayWindow.cfg
new file mode 100644
index 0000000..8a6a2b5
--- /dev/null
+++ b/proofs/tla/ReplayWindow.cfg
@@ -0,0 +1,23 @@
+\* TLC model for the Extended Rung 4 freshness <-> replay window-sizing proof.
+\* See README.md ("Window-sizing").
+\* Run:  java -cp tla2tools.jar tlc2.TLC -nowarning \
+\*            -config ReplayWindow.cfg -workers auto ReplayWindow.tla
+
+CONSTANTS
+    \* One id: the property is per-id; the two-distinct-ids interaction is
+    \* covered by the sibling ApprovalStateMachine model. A TLC model value.
+    Ids = {r1}
+    \* The window relationship is what matters, not the absolute seconds.
+    \* Shipped config: Retention = 2 * Freshness (real code: 120s = 2 * 60s).
+    Freshness = 2
+    Retention = 4
+    \* Clock horizon: must reach an insertion + Retention so eviction (and a
+    \* post-eviction replay) is exercised; 6 gives headroom over 2*Freshness.
+    MaxTime = 6
+
+SPECIFICATION Spec
+
+INVARIANTS
+    TypeOK
+    PastReplayImpossible
+    AcceptWasFresh
diff --git a/proofs/tla/ReplayWindow.tla b/proofs/tla/ReplayWindow.tla
new file mode 100644
index 0000000..557194c
--- /dev/null
+++ b/proofs/tla/ReplayWindow.tla
@@ -0,0 +1,273 @@
+---------------------------- MODULE ReplayWindow ----------------------------
+(***************************************************************************)
+(* Extended Rung 4 of the sudo-proxy formalisation roadmap: the temporal   *)
+(* freshness <-> replay-retention WINDOW-SIZING property that the sibling   *)
+(* `ApprovalStateMachine` model deliberately hides.                        *)
+(*                                                                         *)
+(* That model abstracts the clock to a boolean and models the replay set   *)
+(* (`seen`) as never evicting -- "conservative for ReplayImpossible". This *)
+(* model restores the concrete (but small, abstract) clock and the real    *)
+(* TTL eviction, and proves the genuinely temporal claim the daemon relies  *)
+(* on: a captured request cannot evade replay-dedup by waiting for its id   *)
+(* to be evicted while STILL passing the freshness gate.                   *)
+(*                                                                         *)
+(* The two real-code windows (src/server.rs):                              *)
+(*   MAX_REQUEST_AGE   = 60s   -- freshness gate (check_freshness)          *)
+(*   REPLAY_RETENTION  = 120s  -- seen-id eviction TTL (SeenIds::evict_stale*)
+(* are modelled by the small constants Freshness / Retention; only their   *)
+(* RELATIONSHIP matters, not the absolute seconds (see README).            *)
+(*                                                                         *)
+(* The PlusCal algorithm below is the source of truth; the TLC-checkable   *)
+(* translation between BEGIN/END TRANSLATION is generated by `pcal.trans`  *)
+(* and committed alongside. If you edit the PlusCal, re-run pcal.trans and  *)
+(* commit both.                                                            *)
+(*                                                                         *)
+(* Faithfulness ledger, the window arithmetic, the negative-control recipe,*)
+(* and the two findings this model surfaced live in proofs/tla/README.md.  *)
+(***************************************************************************)
+EXTENDS Integers, FiniteSets
+
+CONSTANTS Ids,        \* the bounded set of request ids; the cfg uses {r1}
+          Freshness,  \* freshness window (abstract units); real code: 60s
+          Retention,  \* replay-retention TTL; real code: 120s = 2*Freshness
+          MaxTime     \* clock horizon; bounds the state space, keeps it finite
+
+(* --algorithm ReplayWindow
+
+variables
+    \* An abstract wall clock; advanced only by the Tick branch, bounded at
+    \* MaxTime so the reachable state space is finite.
+    clock = 0,
+
+    \* Replay dedup (SeenIds), now with eviction MODELLED: each remembered id
+    \* carries its insertion time (and the request timestamp, used only by the
+    \* NC2 mutation). Faithful to the HashSet + VecDeque<(id, Instant)> pair.
+    seen = {},
+
+    \* The single captured request we track (Ids = {r1}: the per-id property;
+    \* the two-distinct-ids interaction is the sibling model's job). `origTs` is
+    \* the original timestamp -- the bytes an attacker captures and replays.
+    origExists = FALSE,
+    origId     = CHOOSE i \in Ids : TRUE,
+    origTs     = 0,
+    origAccAt  = 0,    \* the clock value when the original was first accepted
+
+    \* ---- monitor (violation) flags ----
+    \* A PAST-dated captured request re-executed (replay slipped dedup). The
+    \* window-sizing theorem says this is impossible once Retention >= Freshness.
+    vReplay = FALSE,
+    \* A FUTURE-dated captured request re-executed after eviction. This one IS
+    \* reachable -- the finding: parse_age clamps future timestamps to age 0
+    \* (server.rs:146) with no upper cap, so a request dated beyond Retention
+    \* into the future is replayable regardless of how large Retention is.
+    vFutureReplay = FALSE,
+
+    \* The request on the wire and the busy handshake with the daemon.
+    busy  = FALSE,
+    inReq = [id |-> CHOOSE i \in Ids : TRUE, ts |-> 0];
+
+define
+    \* Faithful to parse_age: a future-dated timestamp (ts > now) is clamped to
+    \* age 0 and so always passes; otherwise age = now - ts must be <= Freshness.
+    FreshAt(ts) == IF ts > clock THEN TRUE ELSE clock - ts <= Freshness
+
+    \* evict_stale keyed on INSERTION time (the faithful rule). The NC2 mutation
+    \* changes `rec.insertedAt` to `rec.ts` -- evicting by the request timestamp,
+    \* the exact bug the REPLAY_RETENTION code comment warns against.
+    Live(rec) == ~(clock - rec.insertedAt > Retention)
+    Evicted   == { rec \in seen : Live(rec) }
+end define;
+
+\* The environment / attacker + the honest client, fully nondeterministic.
+process Env = "env"
+begin
+EnvLoop:
+    while TRUE do
+        await ~busy;
+        either
+            \* Time advances (between handled requests; handling is atomic --
+            \* faithful: check_freshness runs at handler ENTRY, server.rs:499).
+            await clock < MaxTime;
+            clock := clock + 1;
+        or
+            \* The one genuine request for this id. Its timestamp is free over
+            \* the whole horizon, so it covers both an honest fresh request and
+            \* a (clock-skewed / crafted) FUTURE-dated one.
+            await ~origExists;
+            with i \in Ids, t \in 0..MaxTime do
+                inReq := [id |-> i, ts |-> t];
+                busy  := TRUE;
+            end with;
+        or
+            \* The attacker replays the CAPTURED BYTES: same id, same timestamp.
+            \* (Minting a fresh timestamp under an existing nonce id is not
+            \* replay -- ids are random; that case is out of scope.)
+            await origExists;
+            inReq := [id |-> origId, ts |-> origTs];
+            busy  := TRUE;
+        end either;
+    end while;
+end process;
+
+\* The daemon: freshness gate, then the eviction-aware dedup (try_insert), then
+\* dispatch. One request handled to completion atomically.
+process Daemon = "daemon"
+begin
+Handle:
+    while TRUE do
+        await busy;
+        if ~FreshAt(inReq.ts) then
+            skip;                          \* freshness gate -> reject; try_insert not called
+        elsif \E rec \in Evicted : rec.id = inReq.id then
+            \* try_insert: evict_stale ran, id still present -> duplicate, reject.
+            seen := Evicted;
+        else
+            \* Accept: evict, then insert stamped with the current clock.
+            seen := Evicted \union {[id |-> inReq.id, insertedAt |-> clock, ts |-> inReq.ts]};
+            if origExists then
+                \* A replay slipped through dedup -- record which kind.
+                if origTs <= origAccAt then
+                    vReplay := TRUE;       \* PAST capture: the theorem forbids this
+                else
+                    vFutureReplay := TRUE; \* FUTURE capture: the documented finding
+                end if;
+            else
+                origExists := TRUE;
+                origId     := inReq.id;
+                origTs     := inReq.ts;
+                origAccAt  := clock;
+            end if;
+        end if;
+        busy := FALSE;
+    end while;
+end process;
+
+end algorithm; *)
+\* BEGIN TRANSLATION
+VARIABLES clock, seen, origExists, origId, origTs, origAccAt, vReplay, 
+          vFutureReplay, busy, inReq
+
+(* define statement *)
+FreshAt(ts) == IF ts > clock THEN TRUE ELSE clock - ts <= Freshness
+
+
+
+
+Live(rec) == ~(clock - rec.insertedAt > Retention)
+Evicted   == { rec \in seen : Live(rec) }
+
+
+vars == << clock, seen, origExists, origId, origTs, origAccAt, vReplay, 
+           vFutureReplay, busy, inReq >>
+
+ProcSet == {"env"} \cup {"daemon"}
+
+Init == (* Global variables *)
+        /\ clock = 0
+        /\ seen = {}
+        /\ origExists = FALSE
+        /\ origId = (CHOOSE i \in Ids : TRUE)
+        /\ origTs = 0
+        /\ origAccAt = 0
+        /\ vReplay = FALSE
+        /\ vFutureReplay = FALSE
+        /\ busy = FALSE
+        /\ inReq = [id |-> CHOOSE i \in Ids : TRUE, ts |-> 0]
+
+Env == /\ ~busy
+       /\ \/ /\ clock < MaxTime
+             /\ clock' = clock + 1
+             /\ UNCHANGED <<busy, inReq>>
+          \/ /\ ~origExists
+             /\ \E i \in Ids:
+                  \E t \in 0..MaxTime:
+                    /\ inReq' = [id |-> i, ts |-> t]
+                    /\ busy' = TRUE
+             /\ clock' = clock
+          \/ /\ origExists
+             /\ inReq' = [id |-> origId, ts |-> origTs]
+             /\ busy' = TRUE
+             /\ clock' = clock
+       /\ UNCHANGED << seen, origExists, origId, origTs, origAccAt, vReplay, 
+                       vFutureReplay >>
+
+Daemon == /\ busy
+          /\ IF ~FreshAt(inReq.ts)
+                THEN /\ TRUE
+                     /\ UNCHANGED << seen, origExists, origId, origTs, 
+                                     origAccAt, vReplay, vFutureReplay >>
+                ELSE /\ IF \E rec \in Evicted : rec.id = inReq.id
+                           THEN /\ seen' = Evicted
+                                /\ UNCHANGED << origExists, origId, origTs, 
+                                                origAccAt, vReplay, 
+                                                vFutureReplay >>
+                           ELSE /\ seen' = (Evicted \union {[id |-> inReq.id, insertedAt |-> clock, ts |-> inReq.ts]})
+                                /\ IF origExists
+                                      THEN /\ IF origTs <= origAccAt
+                                                 THEN /\ vReplay' = TRUE
+                                                      /\ UNCHANGED vFutureReplay
+                                                 ELSE /\ vFutureReplay' = TRUE
+                                                      /\ UNCHANGED vReplay
+                                           /\ UNCHANGED << origExists, origId, 
+                                                           origTs, origAccAt >>
+                                      ELSE /\ origExists' = TRUE
+                                           /\ origId' = inReq.id
+                                           /\ origTs' = inReq.ts
+                                           /\ origAccAt' = clock
+                                           /\ UNCHANGED << vReplay, 
+                                                           vFutureReplay >>
+          /\ busy' = FALSE
+          /\ UNCHANGED << clock, inReq >>
+
+Next == Env \/ Daemon
+
+Spec == Init /\ [][Next]_vars
+
+\* END TRANSLATION
+
+\* ===================== invariants & properties =====================
+
+\* Modeling-hygiene type invariant.
+TypeOK ==
+    /\ clock \in 0..MaxTime
+    /\ seen \subseteq [id : Ids, insertedAt : 0..MaxTime, ts : 0..MaxTime]
+    /\ origExists \in BOOLEAN
+    /\ origId \in Ids
+    /\ origTs \in 0..MaxTime
+    /\ origAccAt \in 0..MaxTime
+    /\ vReplay \in BOOLEAN
+    /\ vFutureReplay \in BOOLEAN
+    /\ busy \in BOOLEAN
+    /\ inReq \in [id : Ids, ts : 0..MaxTime]
+
+\* THE WINDOW-SIZING THEOREM (holds for Retention >= Freshness; the shipped
+\* config has Retention = 2*Freshness): no captured PAST-dated request is ever
+\* re-executed. A captured replay cannot win the race between freshness expiry
+\* and eviction -- by the time its id could be evicted, its fixed timestamp has
+\* already aged past the freshness gate.
+\*
+\* Stated as a monitor (vReplay raised at the re-exec site, exactly like the
+\* sibling model's ReplayImpossible) rather than as a `\E rec \in seen` set-
+\* membership predicate -- ON PURPOSE. Eviction is LAZY (an id is pruned only
+\* inside try_insert) and a replay-accept removes then re-inserts the id in one
+\* atomic step, so a membership invariant can never observe the gap; only an
+\* event monitor can. This is the same reason the sibling model uses monitors.
+PastReplayImpossible == ~vReplay
+
+\* Sanity: the daemon only ever accepts a request that was fresh at acceptance
+\* (either future-dated -- clamped to age 0 -- or within the freshness window).
+AcceptWasFresh ==
+    origExists => (origTs > origAccAt \/ origAccAt - origTs <= Freshness)
+
+\* ----- the future-clamp FINDING (NOT in the committed .cfg) -----
+\* This invariant FAILS by design: TLC returns a future-dated counterexample in
+\* which a captured request dated beyond Retention into the future is re-executed
+\* after its id is evicted. parse_age clamps future timestamps to age 0
+\* (server.rs:146) with no upper cap, so such a request keeps passing the
+\* freshness gate forever while its id ages out of `seen` -- making it replayable
+\* every Retention units regardless of how large Retention is. Add NoFutureReplay
+\* to INVARIANTS by hand to reproduce the trace; see README "Finding: future-
+\* dated replay".
+NoFutureReplay == ~vFutureReplay
+
+=============================================================================