feat: Rung 4 + Extended Rung 4 — TLA+/ProVerif formal models#35
Merged
Conversation
Discharge Rung 4 of the assurance ladder with two machine-checked protocol-level models under a new top-level proofs/ dir (kept out of src/ so cargo/clippy never see them); both run non-gating in CI, mirroring the Rung 3 Kani job. - proofs/tla/ — TLA+/PlusCal model of the approval + confirm_unprivileged state machine, checked by TLC. Transcribes the handle_connection gate chain and tui::classify_key; a nondeterministic environment quantifies over all attacker forgeries/replays and operator keypresses. Proves NoExecWithoutApproval, ReplayImpossible, PolicyFlipsOnlyOnKeypress (+ FlagMonotone) and PrivilegedGateIndependentOfPolicy — the machine-checked closure of the 1.4/4.4 policy-transition residual (F2). Bounded monitor variables keep the state space finite (~9k states); four negative-control mutations each produce the expected counterexample. - proofs/proverif/ — symbolic Dolev-Yao model of the SSH path with a compile-time PINNED toggle (m4). Proves channel authenticity and secrecy hold iff the host key is pinned, exhibits the first-contact MITM trace when it is not (leaf 4.2), and proves the separation theorem: a privileged exec at the honest remote still needs a human keypress even under full channel compromise. Makes assumption A4 explicit. - Decisions recorded: TLA+/PlusCal over Alloy (temporal safety properties); ProVerif over Tamarin (one toggle + auto attack derivation). Scope and the honest note on injective replay (a ProVerif private-channel over-approximation, discharged instead by the TLA+/Kani rungs) live in the per-model READMEs. - Docs: flip assurance-case leaves Sn5.1/Sn5.3/Sn6.3 to cite the models, mark Rung 4 done in the roadmap, update threat-model residual notes, add proofs/ to the README. Move the Rung 4 item from backlog to changelog. Verified: TLC no-error + negative controls fail as required; ProVerif gives the pinned-vs-first-contact split. Release build (--features mcp) and clippy --all-targets --all-features -D warnings remain clean (no Rust touched). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Capture the should-close gaps identified while reviewing the shipped Rung 4 models, as follow-up items continuing the assurance ladder: - Freshness <-> replay window-sizing (TLA+): prove retention >= 2x freshness so a request can't evade dedup by waiting for eviction; tightens ReplayImpossible from the current never-evict abstraction. - Concurrent-handler interleaving (TLA+): verify the try_insert TOCTOU fix and TTY-lock serialisation under all interleavings. - SSH key-acceptance model (ProVerif): derive the first-contact MITM from host-key substitution instead of assuming a private channel. The model-to-code gap is deliberately left to Rungs 5/6 (EAL6-7+); the ProVerif injective-replay limitation is documented, not closed (it is redundant with the TLA+/Kani coverage). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a second TLA+/PlusCal model, proofs/tla/ReplayWindow.tla, proving the temporal property the approval-state-machine model hides: it models the replay set as never-evicting, making ReplayImpossible only conditional. The new model restores a small abstract clock + real TTL eviction and proves PastReplayImpossible — no captured past-dated request is ever re-executed — checked exhaustively by TLC (442 states, sub-second), non-gating in CI. Two findings, both confirmed by TLC and documented: - The tight retention bound is Retention >= Freshness, not 2x. The shipped 120s = 2*60s is conservative margin, not a necessity (not a bug). - parse_age clamps future-dated timestamps to age 0 with no upper cap, so a request dated beyond retention into the future is replayable every retention window regardless of size. Logged as a backlog item; the model exposes it via the NoFutureReplay invariant. Window constants abstract the two real windows to small relationship- preserving integers. The property uses an event monitor rather than a seen-membership invariant because eviction is lazy and a replay-accept removes-then-reinserts atomically. README gains a Window-sizing section (theorem, arithmetic, both findings, negative-control table, faithfulness ledger); tlc.yml gains a second non-gating step; .gitignore covers TLC artifacts. Sibling ApprovalStateMachine re-checks unchanged (9072 states); no Rust touched. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add proofs/tla/ConcurrentHandlers.tla, a third TLA+/PlusCal model (TLC-checked, non-gating in CI) covering the canonical model-checking case the two atomic, one-request-at-a-time siblings cannot express: concurrent daemon handler threads interleaving on the two pieces of Arc<Mutex<..>>-shared state. Verifies the atomicity fix is sufficient. - NoDoubleExec: no id executes twice under any interleaving — the atomic SeenIds::try_insert critical section closes the dedup TOCTOU. - TtyMutualExclusion: tty_lock serialises /dev/tty across interleavings (the PR #22 background-pgrp / EIO hazard), witnessed by ttyActive <= 1. Faithful to the dispatch lock discipline: privileged prompt holds then releases before exec, ForegroundGuard re-takes for the foreground swap, exec_direct takes no lock, no-confirm banner is best-effort try_lock. TLC: no error, 5116 distinct states at 2 handlers, 238590 at 3. Two negative controls verified (not committed): NC1 splits try_insert (check/release/insert) -> NoDoubleExec fails; NC2 drops the prompt's tty_lock -> TtyMutualExclusion fails. Wire the third TLC step into tlc.yml; extend proofs/tla/README.md and correct the roadmap's Rung 4 scope note (concurrency/TOCTOU is now covered, not "stays elsewhere"). No Rust touched (proofs/ is outside src/); the two sibling models re-check unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nce (ProVerif)
Rework the Rung 4 ProVerif model from a channel-toggle that *declared* the
tunnel private when pinned (assuming the MITM) to a key-acceptance encoding that
*derives* it. The PINNED m4 toggle now flips only which host key the daemon
accepts; the channel is the public Dolev-Yao network in both configurations.
Models the real SSH mutual-authentication structure — a host keypair
(server->client: pinning + confidentiality, A4) and a client keypair
(client->server: authorized_keys, integrity) — so the model disentangles which
assumption protects what:
- (a) command authenticity (RemoteAccept => RequestSent): true in both,
rides on client auth (authorized_keys);
- (c) payload confidentiality: true pinned / false first-contact, rides on
host-key pinning (A4) — the derived key-substitution MITM (leaf 4.2);
- (b) injective replay: false in both (honest public-channel replay; the
old private-channel over-approximation is gone);
- (d) separation (RemoteExec => Keypress): true in both (private TTY, A1).
Three documented negative controls (NC1 drop checksign => (a) false even pinned;
NC2 first-contact => (c) false; NC3 public ttyAck => (d) unprovable), each run
and producing the expected result.
Docs updated: proverif README (key-acceptance toggle, query table + rides-on
column, controls, scope with the now-explicit client-auth assumption),
proverif.yml comment, threat-model leaf 4.2, assurance-case A4 + Sn6.3, and the
formalisation roadmap. No Rust touched (proofs/ is outside src/); the three TLA+
sibling models are untouched.
Verified with ProVerif 2.05 on both m4 configs and all three negative controls.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The proverif job (non-gating) intermittently failed at install time: opam pulls ProVerif's GTK GUI system deps via depext, and the runner's apt mirror occasionally 404s on a superseded liblzma-dev version pin against a stale package index. The same commit passed on one trigger and failed on another, confirming flakiness rather than a model error. Add a sudo apt-get update step before the install so apt resolves against the current index. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a forward-pointer note on the original 'Rung 4: protocol-level formal models' entry: its ProVerif description (channel-toggle, injective-replay over-approximation) was reworked the same day into the key-acceptance mutual-auth model documented in the top entry. History left intact; only an annotation added. The TLA+ half of the old entry is unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cuihtlauac
added a commit
that referenced
this pull request
Jun 18, 2026
…adder to Rung 4 (#36) The far-future request-timestamp replay residual surfaced during PR #35 (by the Extended Rung 4 ReplayWindow model's NoFutureReplay witness). Rewrite the top backlog item so it is explicit that the work is two-part: (1) the code fix in check_freshness / parse_age, and (2) re-discharging every rung that covers the touched code — Rung 1 regression test, Rung 2 freshness property, Rung 3 Kani monotonicity, Rung 4 flipping NoFutureReplay to a held invariant — so assurance returns to the Rung 4 level rather than leaving the proofs trailing the code. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This branch lands Rung 4 of the security-formalisation ladder and its
Extended Rung 4 follow-ups.
maindoes not yet haveproofs/tla/, so the PRincludes the Rung 4 base it builds on. Three TLA+/PlusCal models plus a ProVerif
model, all TLC/ProVerif-checked and non-gating in CI (like the Kani/geiger
jobs). No Rust touched —
proofs/is outsidesrc/.(
ApprovalStateMachine) + ProVerif SSH channel model. See the2026-06-11 — Rung 4changelog entry.ReplayWindow): thetemporal property the approval-state-machine model deliberately abstracts away
(it never evicts).
ConcurrentHandlers):the concurrency races the two atomic, one-request-at-a-time models cannot
express by construction.
ConcurrentHandlers— concurrent interleaving (newest work)The canonical model-checking use case: concurrent daemon handler threads
interleaving on the two pieces of
Arc<Mutex<..>>-shared state. PlusCal labelsare the atomicity boundaries, so TLC explores every interleaving — including both
handlers racing the same request id. It verifies the atomicity fix is
sufficient.
NoDoubleExec— no id executes twice under any interleaving. Closure of thededup TOCTOU:
SeenIds::try_insert(server.rs:213) folds the contains-checkand insert into one critical section; the model proves that is sufficient.
TtyMutualExclusion(ttyActive <= 1) —tty_lockserialises /dev/ttyacross all interleavings (the PR Fix TUI prompt spurious-deny and SIGTTIN daemon hang #22 background-pgrp / EIO hazard).
Faithful to the dispatch lock discipline: privileged prompt holds then releases
before exec,
ForegroundGuardre-takes for the foreground swap,exec_directtakes no lock, the no-confirm banner is best-effort
try_lock.TLC: no error, 5116 distinct states at 2 handlers (sub-second), 238 590 at
3. Two negative controls verified (not committed): NC1 splits
try_insert(check/release/insert) →
NoDoubleExecfails (proves the model sees the race theatomicity closes); NC2 drops the prompt's
tty_lock→TtyMutualExclusionfails (reproduces the PR #22 hazard).
ReplayWindow— freshness ↔ replay window-sizingRestores a small abstract clock + real TTL eviction and proves
PastReplayImpossible— no captured past-dated request is ever re-executed —tightening
ReplayImpossiblefrom conditional to real. TLC: no error, 442distinct states.
Two findings, both confirmed by TLC and documented:
Retention >= Freshness, not2x. Holds atRetention = Freshness, fails atFreshness - 1. The shipped120s = 2x60sisconservative margin, not a necessity — not a bug.
residual).
parse_ageclamps future timestamps to age 0 (server.rs:146)with no upper cap, so a request dated beyond
Retentioninto the future isreplayable every retention window regardless of size. Logged as a backlog item
(cap far-future timestamps); not fixed here.
Verification
tlc2.TLCon all three models: no error(
ApprovalStateMachine9072 /ReplayWindow442 /ConcurrentHandlers5116distinct states).
counterexample.
cargo build --releaseclean (no Rust changed).🤖 Generated with Claude Code