feat: Rung 4 + Extended Rung 4 — TLA+/ProVerif formal models by cuihtlauac · Pull Request #35 · tarides/sudo-proxy

cuihtlauac · 2026-06-11T08:35:01Z

Summary

This branch lands Rung 4 of the security-formalisation ladder and its
Extended Rung 4 follow-ups. main does not yet have proofs/tla/, so the PR
includes the Rung 4 base it builds on. Three TLA+/PlusCal models plus a ProVerif
model, all TLC/ProVerif-checked and non-gating in CI (like the Kani/geiger
jobs). No Rust touched — proofs/ is outside src/.

Rung 4 — protocol-level formal models. TLA+ approval state machine
(ApprovalStateMachine) + ProVerif SSH channel model. See the
2026-06-11 — Rung 4 changelog entry.
Extended Rung 4 — freshness ↔ replay window-sizing (ReplayWindow): the
temporal property the approval-state-machine model deliberately abstracts away
(it never evicts).
Extended Rung 4 — concurrent-handler interleaving (ConcurrentHandlers):
the concurrency races the two atomic, one-request-at-a-time models cannot
express by construction.

`ConcurrentHandlers` — concurrent interleaving (newest work)

The canonical model-checking use case: concurrent daemon handler threads
interleaving on the two pieces of Arc<Mutex<..>>-shared state. PlusCal labels
are the atomicity boundaries, so TLC explores every interleaving — including both
handlers racing the same request id. It verifies the atomicity fix is
sufficient.

NoDoubleExec — no id executes twice under any interleaving. Closure of the
dedup TOCTOU: SeenIds::try_insert (server.rs:213) folds the contains-check
and insert into one critical section; the model proves that is sufficient.
TtyMutualExclusion (ttyActive <= 1) — tty_lock serialises /dev/tty
across all interleavings (the PR Fix TUI prompt spurious-deny and SIGTTIN daemon hang #22 background-pgrp / EIO hazard).

Faithful to the dispatch lock discipline: privileged prompt holds then releases
before exec, ForegroundGuard re-takes for the foreground swap, exec_direct
takes no lock, the no-confirm banner is best-effort try_lock.

TLC: no error, 5116 distinct states at 2 handlers (sub-second), 238 590 at
3. Two negative controls verified (not committed): NC1 splits try_insert
(check/release/insert) → NoDoubleExec fails (proves the model sees the race the
atomicity closes); NC2 drops the prompt's tty_lock → TtyMutualExclusion
fails (reproduces the PR #22 hazard).

`ReplayWindow` — freshness ↔ replay window-sizing

Restores a small abstract clock + real TTL eviction and proves
PastReplayImpossible — no captured past-dated request is ever re-executed —
tightening ReplayImpossible from conditional to real. TLC: no error, 442
distinct states.

Two findings, both confirmed by TLC and documented:

Tight bound is Retention >= Freshness, not 2x. Holds at
Retention = Freshness, fails at Freshness - 1. The shipped 120s = 2x60s is
conservative margin, not a necessity — not a bug.
Future-dated timestamps defeat replay protection after eviction (new
residual). parse_age clamps future timestamps to age 0 (server.rs:146)
with no upper cap, so a request dated beyond Retention into the future is
replayable every retention window regardless of size. Logged as a backlog item
(cap far-future timestamps); not fixed here.

Verification

tlc2.TLC on all three models: no error
(ApprovalStateMachine 9072 / ReplayWindow 442 / ConcurrentHandlers 5116
distinct states).
Every documented negative control / finding reproduces the expected
counterexample.
ProVerif: pinned-vs-first-contact split as expected.
cargo build --release clean (no Rust changed).

🤖 Generated with Claude Code

Discharge Rung 4 of the assurance ladder with two machine-checked protocol-level models under a new top-level proofs/ dir (kept out of src/ so cargo/clippy never see them); both run non-gating in CI, mirroring the Rung 3 Kani job. - proofs/tla/ — TLA+/PlusCal model of the approval + confirm_unprivileged state machine, checked by TLC. Transcribes the handle_connection gate chain and tui::classify_key; a nondeterministic environment quantifies over all attacker forgeries/replays and operator keypresses. Proves NoExecWithoutApproval, ReplayImpossible, PolicyFlipsOnlyOnKeypress (+ FlagMonotone) and PrivilegedGateIndependentOfPolicy — the machine-checked closure of the 1.4/4.4 policy-transition residual (F2). Bounded monitor variables keep the state space finite (~9k states); four negative-control mutations each produce the expected counterexample. - proofs/proverif/ — symbolic Dolev-Yao model of the SSH path with a compile-time PINNED toggle (m4). Proves channel authenticity and secrecy hold iff the host key is pinned, exhibits the first-contact MITM trace when it is not (leaf 4.2), and proves the separation theorem: a privileged exec at the honest remote still needs a human keypress even under full channel compromise. Makes assumption A4 explicit. - Decisions recorded: TLA+/PlusCal over Alloy (temporal safety properties); ProVerif over Tamarin (one toggle + auto attack derivation). Scope and the honest note on injective replay (a ProVerif private-channel over-approximation, discharged instead by the TLA+/Kani rungs) live in the per-model READMEs. - Docs: flip assurance-case leaves Sn5.1/Sn5.3/Sn6.3 to cite the models, mark Rung 4 done in the roadmap, update threat-model residual notes, add proofs/ to the README. Move the Rung 4 item from backlog to changelog. Verified: TLC no-error + negative controls fail as required; ProVerif gives the pinned-vs-first-contact split. Release build (--features mcp) and clippy --all-targets --all-features -D warnings remain clean (no Rust touched). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Capture the should-close gaps identified while reviewing the shipped Rung 4 models, as follow-up items continuing the assurance ladder: - Freshness <-> replay window-sizing (TLA+): prove retention >= 2x freshness so a request can't evade dedup by waiting for eviction; tightens ReplayImpossible from the current never-evict abstraction. - Concurrent-handler interleaving (TLA+): verify the try_insert TOCTOU fix and TTY-lock serialisation under all interleavings. - SSH key-acceptance model (ProVerif): derive the first-contact MITM from host-key substitution instead of assuming a private channel. The model-to-code gap is deliberately left to Rungs 5/6 (EAL6-7+); the ProVerif injective-replay limitation is documented, not closed (it is redundant with the TLA+/Kani coverage). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add a second TLA+/PlusCal model, proofs/tla/ReplayWindow.tla, proving the temporal property the approval-state-machine model hides: it models the replay set as never-evicting, making ReplayImpossible only conditional. The new model restores a small abstract clock + real TTL eviction and proves PastReplayImpossible — no captured past-dated request is ever re-executed — checked exhaustively by TLC (442 states, sub-second), non-gating in CI. Two findings, both confirmed by TLC and documented: - The tight retention bound is Retention >= Freshness, not 2x. The shipped 120s = 2*60s is conservative margin, not a necessity (not a bug). - parse_age clamps future-dated timestamps to age 0 with no upper cap, so a request dated beyond retention into the future is replayable every retention window regardless of size. Logged as a backlog item; the model exposes it via the NoFutureReplay invariant. Window constants abstract the two real windows to small relationship- preserving integers. The property uses an event monitor rather than a seen-membership invariant because eviction is lazy and a replay-accept removes-then-reinserts atomically. README gains a Window-sizing section (theorem, arithmetic, both findings, negative-control table, faithfulness ledger); tlc.yml gains a second non-gating step; .gitignore covers TLC artifacts. Sibling ApprovalStateMachine re-checks unchanged (9072 states); no Rust touched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add proofs/tla/ConcurrentHandlers.tla, a third TLA+/PlusCal model (TLC-checked, non-gating in CI) covering the canonical model-checking case the two atomic, one-request-at-a-time siblings cannot express: concurrent daemon handler threads interleaving on the two pieces of Arc<Mutex<..>>-shared state. Verifies the atomicity fix is sufficient. - NoDoubleExec: no id executes twice under any interleaving — the atomic SeenIds::try_insert critical section closes the dedup TOCTOU. - TtyMutualExclusion: tty_lock serialises /dev/tty across interleavings (the PR #22 background-pgrp / EIO hazard), witnessed by ttyActive <= 1. Faithful to the dispatch lock discipline: privileged prompt holds then releases before exec, ForegroundGuard re-takes for the foreground swap, exec_direct takes no lock, no-confirm banner is best-effort try_lock. TLC: no error, 5116 distinct states at 2 handlers, 238590 at 3. Two negative controls verified (not committed): NC1 splits try_insert (check/release/insert) -> NoDoubleExec fails; NC2 drops the prompt's tty_lock -> TtyMutualExclusion fails. Wire the third TLC step into tlc.yml; extend proofs/tla/README.md and correct the roadmap's Rung 4 scope note (concurrency/TOCTOU is now covered, not "stays elsewhere"). No Rust touched (proofs/ is outside src/); the two sibling models re-check unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…nce (ProVerif) Rework the Rung 4 ProVerif model from a channel-toggle that *declared* the tunnel private when pinned (assuming the MITM) to a key-acceptance encoding that *derives* it. The PINNED m4 toggle now flips only which host key the daemon accepts; the channel is the public Dolev-Yao network in both configurations. Models the real SSH mutual-authentication structure — a host keypair (server->client: pinning + confidentiality, A4) and a client keypair (client->server: authorized_keys, integrity) — so the model disentangles which assumption protects what: - (a) command authenticity (RemoteAccept => RequestSent): true in both, rides on client auth (authorized_keys); - (c) payload confidentiality: true pinned / false first-contact, rides on host-key pinning (A4) — the derived key-substitution MITM (leaf 4.2); - (b) injective replay: false in both (honest public-channel replay; the old private-channel over-approximation is gone); - (d) separation (RemoteExec => Keypress): true in both (private TTY, A1). Three documented negative controls (NC1 drop checksign => (a) false even pinned; NC2 first-contact => (c) false; NC3 public ttyAck => (d) unprovable), each run and producing the expected result. Docs updated: proverif README (key-acceptance toggle, query table + rides-on column, controls, scope with the now-explicit client-auth assumption), proverif.yml comment, threat-model leaf 4.2, assurance-case A4 + Sn6.3, and the formalisation roadmap. No Rust touched (proofs/ is outside src/); the three TLA+ sibling models are untouched. Verified with ProVerif 2.05 on both m4 configs and all three negative controls. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The proverif job (non-gating) intermittently failed at install time: opam pulls ProVerif's GTK GUI system deps via depext, and the runner's apt mirror occasionally 404s on a superseded liblzma-dev version pin against a stale package index. The same commit passed on one trigger and failed on another, confirming flakiness rather than a model error. Add a sudo apt-get update step before the install so apt resolves against the current index. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add a forward-pointer note on the original 'Rung 4: protocol-level formal models' entry: its ProVerif description (channel-toggle, injective-replay over-approximation) was reworked the same day into the key-acceptance mutual-auth model documented in the top entry. History left intact; only an annotation added. The TLA+ half of the old entry is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…adder to Rung 4 (#36) The far-future request-timestamp replay residual surfaced during PR #35 (by the Extended Rung 4 ReplayWindow model's NoFutureReplay witness). Rewrite the top backlog item so it is explicit that the work is two-part: (1) the code fix in check_freshness / parse_age, and (2) re-discharging every rung that covers the touched code — Rung 1 regression test, Rung 2 freshness property, Rung 3 Kani monotonicity, Rung 4 flipping NoFutureReplay to a held invariant — so assurance returns to the Rung 4 level rather than leaving the proofs trailing the code. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cuihtlauac and others added 7 commits June 11, 2026 07:52

cuihtlauac merged commit b50a858 into main Jun 11, 2026
16 checks passed

cuihtlauac deleted the rung4-protocol-models branch June 11, 2026 15:15

cuihtlauac mentioned this pull request Jun 11, 2026

docs(backlog): far-future-timestamp fix + re-climb the ladder to Rung 4 #36

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Rung 4 + Extended Rung 4 — TLA+/ProVerif formal models#35

feat: Rung 4 + Extended Rung 4 — TLA+/ProVerif formal models#35
cuihtlauac merged 7 commits into
mainfrom
rung4-protocol-models

cuihtlauac commented Jun 11, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

cuihtlauac commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

ConcurrentHandlers — concurrent interleaving (newest work)

ReplayWindow — freshness ↔ replay window-sizing

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cuihtlauac commented Jun 11, 2026 •

edited

Loading

`ConcurrentHandlers` — concurrent interleaving (newest work)

`ReplayWindow` — freshness ↔ replay window-sizing