Skip to content

feat: Rung 4 + Extended Rung 4 — TLA+/ProVerif formal models#35

Merged
cuihtlauac merged 7 commits into
mainfrom
rung4-protocol-models
Jun 11, 2026
Merged

feat: Rung 4 + Extended Rung 4 — TLA+/ProVerif formal models#35
cuihtlauac merged 7 commits into
mainfrom
rung4-protocol-models

Conversation

@cuihtlauac

@cuihtlauac cuihtlauac commented Jun 11, 2026

Copy link
Copy Markdown
Member

Summary

This branch lands Rung 4 of the security-formalisation ladder and its
Extended Rung 4 follow-ups. main does not yet have proofs/tla/, so the PR
includes the Rung 4 base it builds on. Three TLA+/PlusCal models plus a ProVerif
model, all TLC/ProVerif-checked and non-gating in CI (like the Kani/geiger
jobs). No Rust touched — proofs/ is outside src/.

  1. Rung 4 — protocol-level formal models. TLA+ approval state machine
    (ApprovalStateMachine) + ProVerif SSH channel model. See the
    2026-06-11 — Rung 4 changelog entry.
  2. Extended Rung 4 — freshness ↔ replay window-sizing (ReplayWindow): the
    temporal property the approval-state-machine model deliberately abstracts away
    (it never evicts).
  3. Extended Rung 4 — concurrent-handler interleaving (ConcurrentHandlers):
    the concurrency races the two atomic, one-request-at-a-time models cannot
    express by construction.

ConcurrentHandlers — concurrent interleaving (newest work)

The canonical model-checking use case: concurrent daemon handler threads
interleaving on the two pieces of Arc<Mutex<..>>-shared state. PlusCal labels
are the atomicity boundaries, so TLC explores every interleaving — including both
handlers racing the same request id. It verifies the atomicity fix is
sufficient
.

  • NoDoubleExec — no id executes twice under any interleaving. Closure of the
    dedup TOCTOU: SeenIds::try_insert (server.rs:213) folds the contains-check
    and insert into one critical section; the model proves that is sufficient.
  • TtyMutualExclusion (ttyActive <= 1) — tty_lock serialises /dev/tty
    across all interleavings (the PR Fix TUI prompt spurious-deny and SIGTTIN daemon hang #22 background-pgrp / EIO hazard).

Faithful to the dispatch lock discipline: privileged prompt holds then releases
before exec, ForegroundGuard re-takes for the foreground swap, exec_direct
takes no lock, the no-confirm banner is best-effort try_lock.

TLC: no error, 5116 distinct states at 2 handlers (sub-second), 238 590 at
3
. Two negative controls verified (not committed): NC1 splits try_insert
(check/release/insert) → NoDoubleExec fails (proves the model sees the race the
atomicity closes); NC2 drops the prompt's tty_lockTtyMutualExclusion
fails (reproduces the PR #22 hazard).

ReplayWindow — freshness ↔ replay window-sizing

Restores a small abstract clock + real TTL eviction and proves
PastReplayImpossible — no captured past-dated request is ever re-executed —
tightening ReplayImpossible from conditional to real. TLC: no error, 442
distinct states.

Two findings, both confirmed by TLC and documented:

  • Tight bound is Retention >= Freshness, not 2x. Holds at
    Retention = Freshness, fails at Freshness - 1. The shipped 120s = 2x60s is
    conservative margin, not a necessity — not a bug.
  • Future-dated timestamps defeat replay protection after eviction (new
    residual).
    parse_age clamps future timestamps to age 0 (server.rs:146)
    with no upper cap, so a request dated beyond Retention into the future is
    replayable every retention window regardless of size. Logged as a backlog item
    (cap far-future timestamps); not fixed here.

Verification

  • tlc2.TLC on all three models: no error
    (ApprovalStateMachine 9072 / ReplayWindow 442 / ConcurrentHandlers 5116
    distinct states).
  • Every documented negative control / finding reproduces the expected
    counterexample.
  • ProVerif: pinned-vs-first-contact split as expected.
  • cargo build --release clean (no Rust changed).

🤖 Generated with Claude Code

cuihtlauac and others added 7 commits June 11, 2026 07:52
Discharge Rung 4 of the assurance ladder with two machine-checked
protocol-level models under a new top-level proofs/ dir (kept out of
src/ so cargo/clippy never see them); both run non-gating in CI,
mirroring the Rung 3 Kani job.

- proofs/tla/ — TLA+/PlusCal model of the approval + confirm_unprivileged
  state machine, checked by TLC. Transcribes the handle_connection gate
  chain and tui::classify_key; a nondeterministic environment quantifies
  over all attacker forgeries/replays and operator keypresses. Proves
  NoExecWithoutApproval, ReplayImpossible, PolicyFlipsOnlyOnKeypress
  (+ FlagMonotone) and PrivilegedGateIndependentOfPolicy — the
  machine-checked closure of the 1.4/4.4 policy-transition residual (F2).
  Bounded monitor variables keep the state space finite (~9k states); four
  negative-control mutations each produce the expected counterexample.

- proofs/proverif/ — symbolic Dolev-Yao model of the SSH path with a
  compile-time PINNED toggle (m4). Proves channel authenticity and secrecy
  hold iff the host key is pinned, exhibits the first-contact MITM trace
  when it is not (leaf 4.2), and proves the separation theorem: a privileged
  exec at the honest remote still needs a human keypress even under full
  channel compromise. Makes assumption A4 explicit.

- Decisions recorded: TLA+/PlusCal over Alloy (temporal safety properties);
  ProVerif over Tamarin (one toggle + auto attack derivation). Scope and the
  honest note on injective replay (a ProVerif private-channel
  over-approximation, discharged instead by the TLA+/Kani rungs) live in the
  per-model READMEs.

- Docs: flip assurance-case leaves Sn5.1/Sn5.3/Sn6.3 to cite the models,
  mark Rung 4 done in the roadmap, update threat-model residual notes, add
  proofs/ to the README. Move the Rung 4 item from backlog to changelog.

Verified: TLC no-error + negative controls fail as required; ProVerif gives
the pinned-vs-first-contact split. Release build (--features mcp) and clippy
--all-targets --all-features -D warnings remain clean (no Rust touched).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Capture the should-close gaps identified while reviewing the shipped
Rung 4 models, as follow-up items continuing the assurance ladder:

- Freshness <-> replay window-sizing (TLA+): prove retention >= 2x
  freshness so a request can't evade dedup by waiting for eviction;
  tightens ReplayImpossible from the current never-evict abstraction.
- Concurrent-handler interleaving (TLA+): verify the try_insert TOCTOU
  fix and TTY-lock serialisation under all interleavings.
- SSH key-acceptance model (ProVerif): derive the first-contact MITM
  from host-key substitution instead of assuming a private channel.

The model-to-code gap is deliberately left to Rungs 5/6 (EAL6-7+); the
ProVerif injective-replay limitation is documented, not closed (it is
redundant with the TLA+/Kani coverage).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a second TLA+/PlusCal model, proofs/tla/ReplayWindow.tla, proving the
temporal property the approval-state-machine model hides: it models the
replay set as never-evicting, making ReplayImpossible only conditional. The
new model restores a small abstract clock + real TTL eviction and proves
PastReplayImpossible — no captured past-dated request is ever re-executed —
checked exhaustively by TLC (442 states, sub-second), non-gating in CI.

Two findings, both confirmed by TLC and documented:
- The tight retention bound is Retention >= Freshness, not 2x. The shipped
  120s = 2*60s is conservative margin, not a necessity (not a bug).
- parse_age clamps future-dated timestamps to age 0 with no upper cap, so a
  request dated beyond retention into the future is replayable every
  retention window regardless of size. Logged as a backlog item; the model
  exposes it via the NoFutureReplay invariant.

Window constants abstract the two real windows to small relationship-
preserving integers. The property uses an event monitor rather than a
seen-membership invariant because eviction is lazy and a replay-accept
removes-then-reinserts atomically. README gains a Window-sizing section
(theorem, arithmetic, both findings, negative-control table, faithfulness
ledger); tlc.yml gains a second non-gating step; .gitignore covers TLC
artifacts. Sibling ApprovalStateMachine re-checks unchanged (9072 states);
no Rust touched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add proofs/tla/ConcurrentHandlers.tla, a third TLA+/PlusCal model
(TLC-checked, non-gating in CI) covering the canonical model-checking
case the two atomic, one-request-at-a-time siblings cannot express:
concurrent daemon handler threads interleaving on the two pieces of
Arc<Mutex<..>>-shared state. Verifies the atomicity fix is sufficient.

- NoDoubleExec: no id executes twice under any interleaving — the atomic
  SeenIds::try_insert critical section closes the dedup TOCTOU.
- TtyMutualExclusion: tty_lock serialises /dev/tty across interleavings
  (the PR #22 background-pgrp / EIO hazard), witnessed by ttyActive <= 1.

Faithful to the dispatch lock discipline: privileged prompt holds then
releases before exec, ForegroundGuard re-takes for the foreground swap,
exec_direct takes no lock, no-confirm banner is best-effort try_lock.

TLC: no error, 5116 distinct states at 2 handlers, 238590 at 3.
Two negative controls verified (not committed): NC1 splits try_insert
(check/release/insert) -> NoDoubleExec fails; NC2 drops the prompt's
tty_lock -> TtyMutualExclusion fails.

Wire the third TLC step into tlc.yml; extend proofs/tla/README.md and
correct the roadmap's Rung 4 scope note (concurrency/TOCTOU is now
covered, not "stays elsewhere"). No Rust touched (proofs/ is outside
src/); the two sibling models re-check unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nce (ProVerif)

Rework the Rung 4 ProVerif model from a channel-toggle that *declared* the
tunnel private when pinned (assuming the MITM) to a key-acceptance encoding that
*derives* it. The PINNED m4 toggle now flips only which host key the daemon
accepts; the channel is the public Dolev-Yao network in both configurations.

Models the real SSH mutual-authentication structure — a host keypair
(server->client: pinning + confidentiality, A4) and a client keypair
(client->server: authorized_keys, integrity) — so the model disentangles which
assumption protects what:

  - (a) command authenticity (RemoteAccept => RequestSent): true in both,
        rides on client auth (authorized_keys);
  - (c) payload confidentiality: true pinned / false first-contact, rides on
        host-key pinning (A4) — the derived key-substitution MITM (leaf 4.2);
  - (b) injective replay: false in both (honest public-channel replay; the
        old private-channel over-approximation is gone);
  - (d) separation (RemoteExec => Keypress): true in both (private TTY, A1).

Three documented negative controls (NC1 drop checksign => (a) false even pinned;
NC2 first-contact => (c) false; NC3 public ttyAck => (d) unprovable), each run
and producing the expected result.

Docs updated: proverif README (key-acceptance toggle, query table + rides-on
column, controls, scope with the now-explicit client-auth assumption),
proverif.yml comment, threat-model leaf 4.2, assurance-case A4 + Sn6.3, and the
formalisation roadmap. No Rust touched (proofs/ is outside src/); the three TLA+
sibling models are untouched.

Verified with ProVerif 2.05 on both m4 configs and all three negative controls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The proverif job (non-gating) intermittently failed at install time: opam
pulls ProVerif's GTK GUI system deps via depext, and the runner's apt mirror
occasionally 404s on a superseded liblzma-dev version pin against a stale
package index. The same commit passed on one trigger and failed on another,
confirming flakiness rather than a model error. Add a sudo apt-get update step
before the install so apt resolves against the current index.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add a forward-pointer note on the original 'Rung 4: protocol-level formal
models' entry: its ProVerif description (channel-toggle, injective-replay
over-approximation) was reworked the same day into the key-acceptance
mutual-auth model documented in the top entry. History left intact; only an
annotation added. The TLA+ half of the old entry is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cuihtlauac cuihtlauac merged commit b50a858 into main Jun 11, 2026
16 checks passed
@cuihtlauac cuihtlauac deleted the rung4-protocol-models branch June 11, 2026 15:15
cuihtlauac added a commit that referenced this pull request Jun 18, 2026
…adder to Rung 4 (#36)

The far-future request-timestamp replay residual surfaced during PR #35 (by the
Extended Rung 4 ReplayWindow model's NoFutureReplay witness). Rewrite the
top backlog item so it is explicit that the work is two-part: (1) the code fix
in check_freshness / parse_age, and (2) re-discharging every rung that covers
the touched code — Rung 1 regression test, Rung 2 freshness property, Rung 3
Kani monotonicity, Rung 4 flipping NoFutureReplay to a held invariant — so
assurance returns to the Rung 4 level rather than leaving the proofs trailing
the code.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant