From 9394da9e5205bd805bed71692380a338449eb9fc Mon Sep 17 00:00:00 2001 From: Ralf Anton Beier Date: Fri, 24 Apr 2026 07:00:24 +0200 Subject: [PATCH] feat: SDD revisions + variant-pruning and witness draft posts - Revise content/blog/2026-04-23-spec-driven-development-is-half-the-loop: correct sigil claims (split shipping-today vs. in-the-pipe; drop overstated orchestration with spar/rivet); add variant-pruning paragraph to MBSE section; update diagrams to mark cross-layer attestation composition as emerging rather than shipped - Add content/blog/2026-04-24-variant-pruning-rust-mcdc.md (draft): the five-layer variant-pruning argument for why MC/DC-for-Rust is tractable at AI-velocity authorship, with Wasm-as-layer-5 extension - Add content/blog/2026-04-25-witness-wasm-mcdc.md (draft): announcement skeleton for witness, the MC/DC-on-Wasm coverage tool tracked in #29 Co-Authored-By: Claude Opus 4.7 (1M context) --- ...pec-driven-development-is-half-the-loop.md | 24 +-- .../2026-04-24-variant-pruning-rust-mcdc.md | 137 ++++++++++++++++++ content/blog/2026-04-25-witness-wasm-mcdc.md | 112 ++++++++++++++ 3 files changed, 264 insertions(+), 9 deletions(-) create mode 100644 content/blog/2026-04-24-variant-pruning-rust-mcdc.md create mode 100644 content/blog/2026-04-25-witness-wasm-mcdc.md diff --git a/content/blog/2026-04-23-spec-driven-development-is-half-the-loop.md b/content/blog/2026-04-23-spec-driven-development-is-half-the-loop.md index 11aab38..3776953 100644 --- a/content/blog/2026-04-23-spec-driven-development-is-half-the-loop.md +++ b/content/blog/2026-04-23-spec-driven-development-is-half-the-loop.md @@ -91,7 +91,7 @@ There is a third half worth naming but not dwelling on — the iteration phase w **[spar](https://github.com/pulseengine/spar)** — our AADL v2.3 architecture toolchain, shipping at v0.6.0. 27+ analysis passes (scheduling, latency, ARINC 653 partitioning, EMV2 fault trees, memory budgets), a deployment solver with ASIL decomposition and SIL/DAL integrity constraints, a declarative assertion engine, LSP support, and a WASM component build. In the pipe: `spar-codegen` emits Cargo.toml and BUILD.bazel directly from the AADL model (the moment the architecture stops being a parallel drawing and starts being what the build reads), a SysML v2 parser for the requirements side of the roundtrip, a JSON CLI adapter so rivet can consume spar analysis results, and an MCP server so agents call spar for architecture review the same way they call rivet for traceability. -**[sigil](https://github.com/pulseengine/sigil)** — in-module cryptographic signing. Sigstore keyless, SLSA L4 predicates, per-transformation attestations. This is what operationalizes Clay Nelson's *cannot attest to what you did not observe* end-to-end: each build transformation produces signed evidence or it does not. +**[sigil](https://github.com/pulseengine/sigil)** — embedded cryptographic signing for WebAssembly. Shipping today: signatures carried inside WASM custom sections with no external registry required, full Sigstore keyless (OIDC → Fulcio → Rekor) with offline verification, in-toto SLSA provenance at L2 and L3. In the pipe: transformation-attestation data structures that upstream tools (Bazel rules, Loom, WAC) can emit per pipeline step, post-quantum signatures (SLH-DSA / FIPS 205, design-stage today), and the reproducibility work needed to substantiate SLSA L4. sigil is the signing and verification layer; the orchestration that threads attestations across Bazel, `rivet validate`, Kani, and Verus lives in each of those tools — not in sigil itself. **Spec Kit / Kiro / Tessl** — fine front-ends for `/specify`. The minimum we expect them to emit into our loop is a draft rivet requirement: title, description, acceptance criteria structured enough that rivet's schema can validate them, and a parent architecture or requirement link. Anything less is a prose ticket in a different file. @@ -111,7 +111,7 @@ flowchart LR impl["implement
scripts/mythos
failing-test oracle"] unit["unit tests
test pass"] integ["integration
rivet validate
verified-by links"] - signed([sigil-signed
attestation bundle]) + signed([sigil-signed WASM
+ SLSA L2/L3 provenance]) issue --> req --> design --> impl --> unit --> integ --> signed @@ -141,7 +141,7 @@ The dotted lines are the V-model's symmetry made concrete: the requirement is *v 5. **Ascend the right side.** `rivet validate` gates again. Every requirement now needs a `verified-by` link to a passing test; every code change needs an `implements` link to the design; every test needs a `verifies` link back to a requirement. The agent either adds the link (because the artifact exists) or flags the gap (because it does not). The ascent is not a ritual; it is a graph-completion task with the validator as oracle. -6. **Sign and merge.** `sigil` attaches signed evidence to each transformation — source commit, Bazel build, test pass, `rivet validate` zero-errors, Kani harness result if present. The PR carries the attestation bundle. Clay Nelson's *"cannot attest to what you did not observe"* becomes "observed and signed at each step." Audit trail complete. +6. **Sign and merge.** `sigil` signs the produced WASM with in-toto SLSA provenance — Sigstore keyless or key-based, offline-verifiable. As each upstream tool (Bazel, `rivet validate`, Kani, Verus) starts emitting its own in-toto predicate per transformation, those predicates ride alongside sigil's signature in the bundle carried by the PR. Clay Nelson's *"cannot attest to what you did not observe"* resolves into *"each oracle observes, each emits a predicate, sigil signs the bundle"* — signing and observing are different jobs, and the post-build composition across observers is work still maturing across the stack, not something sigil orchestrates by itself. At no point does anyone tell the agent *"follow the V-model."* The agent responds to `rivet validate` errors. That is what *"the tools require V-model shape"* means in practice — and it is why the V-model survives AI velocity instead of being eroded by it. @@ -170,26 +170,32 @@ flowchart LR codegen["spar-codegen
Cargo.toml · BUILD.bazel
#[aadl] attributes"] build[compiled binary] validate[rivet validate] - attest([sigil attestation bundle]) + signed([sigil-signed WASM
+ SLSA L2/L3 provenance]) + bundle([full attestation bundle
in-toto predicates per oracle
· target state ·]) mbse -->|drives| codegen mbse -->|gates| validate codegen --> build - build --> attest - validate --> attest + build --> signed + signed -.-> bundle + validate -.->|in-toto predicate
emerging| bundle classDef phase fill:#13161f,stroke:#3d4258,color:#8b90a0; classDef model fill:#1a1d27,stroke:#fbbf24,color:#e1e4ed; classDef step fill:#1a1d27,stroke:#6c8cff,color:#e1e4ed; classDef good fill:#1a1d27,stroke:#4ade80,color:#e1e4ed; + classDef future fill:#13161f,stroke:#4a5068,color:#8b90a0,stroke-dasharray: 5 5; class mbse phase; class spar,rivet model; class codegen,build,validate step; - class attest good; + class signed good; + class bundle future; {% end %} -Two concrete beats from our own stack: [spar](https://github.com/pulseengine/spar) already ships deployment allocation with ASIL decomposition and ARINC 653 partitioning — the integrity-level constraints are first-class in the solver, not bolted-on checks. Next on the roadmap, `spar-codegen` emits Cargo.toml and BUILD.bazel directly from the AADL model, with `#[aadl(period = ...)]` attributes tying Rust functions to the architecture elements they implement. The moment that lands, you cannot produce a binary without the model, because the build files are the model's output. rivet's variant artifacts do the symmetric thing on the requirements side — selecting which requirement set applies at which integrity level, gated by `rivet validate`. `sigil` signs the bundle only when all three layers — architecture, requirements, build — agree. The model is no longer a parallel document. It is the source of truth the build depends on, and that dependency is what makes skipping it impossible rather than merely discouraged. +Two concrete beats from our own stack: [spar](https://github.com/pulseengine/spar) already ships deployment allocation with ASIL decomposition and ARINC 653 partitioning — the integrity-level constraints are first-class in the solver, not bolted-on checks. Next on the roadmap, `spar-codegen` emits Cargo.toml and BUILD.bazel directly from the AADL model, with `#[aadl(period = ...)]` attributes tying Rust functions to the architecture elements they implement. The moment that lands, you cannot produce a binary without the model, because the build files are the model's output. rivet's variant artifacts do the symmetric thing on the requirements side — selecting which requirement set applies at which integrity level, gated by `rivet validate`. `sigil` signs the WASM output and emits in-toto SLSA provenance; as the transformation-attestation types in sigil mature and the upstream tools learn to emit predicates of their own, the signed bundle will carry evidence from each layer alongside the signature. Today, what's real is the signing and the signed build metadata — the cross-layer composition is the roadmap, not the ship. The model is no longer a parallel document. It is the source of truth the build depends on, and that dependency is what makes skipping it impossible rather than merely discouraged. + +The picture extends one layer further if you pull the thread. A variant-managed Rust codebase has at least five layers of variant-pruning machinery — requirements variants at the rivet level, cargo features at the build level, `#[cfg]` at source level, the type system at the type-check level, and pattern-match arms at runtime. Each prunes what the next layer has to reason about. When the model at the top specifies *"ASIL-D variant,"* the compounding reduction across every layer below takes a coverage burden that would otherwise be impractical at AI-authorship scale and bounds it to what a single variant's audit actually has to discharge. This inverts the usual *"Rust MC/DC is harder than C"* reading, and it is strong enough to earn its own treatment in [MC/DC for AI-authored Rust is tractable — the variant-pruning argument](/blog/variant-pruning-rust-mcdc/) — currently in draft. Where *"mandatory"* cuts: not every internal CRUD app needs MBSE. The line is systems with safety regulation, systems facing external auditors, and high-blast-radius infrastructure — cryptographic components, OS kernels, signing tools, language-model infrastructure with reach into hundreds of downstream systems. The common denominator is *"if this fails, the failure is not locally contained."* For those systems, the "too heavy" argument against MBSE stops holding; the alternative is shipping AI-authored code with no answer to how-do-you-prove. @@ -203,7 +209,7 @@ The post argues for a pattern; here is what would stop me from over-pitching it. **The oracle can be wrong.** If a Verus contract encodes an incorrect postcondition, or a rivet schema rule fires for the wrong reason, the pipeline confidently enforces the wrong behaviour. Three countermeasures, all already in our stack: (1) multiple independent oracles on the same property — Verus + Kani + property tests discharge the same claim by different techniques, and a bug in any one is revealed by disagreement with the other two; (2) mutation testing (cargo-mutants) catches tautological oracles — if mutating the code does not fail the test, the test was not a real oracle; (3) counter-examples force re-examination of the oracle, not just the code. When Kani reports a failing execution, the right move is to read the trace and ask *"is my property wrong?"* before asking *"is my code wrong?"* -**Signed is not the same as safe.** This is the attestation-illiteracy risk — management or auditors reading a `sigil` bundle and hearing *"safe."* The bundle claims only what was observed: *this commit's Verus proof discharged, these Kani harnesses ran to completion, `rivet validate` reported zero errors against these schema versions*. It does not claim the deployed binary is correct, that the proof obligations were the right ones, or that the schemas captured the right properties. The way to avoid the confusion is to name what the attestation *does not* cover inside the attestation itself — versions of each oracle, scope of the proof, schema revisions, explicitly excluded failure modes. +**Signed is not the same as safe.** This is the attestation-illiteracy risk — management or auditors reading a `sigil` bundle and hearing *"safe."* A well-formed bundle should claim only what each oracle observed: *this commit's Verus proof discharged, these Kani harnesses ran to completion, `rivet validate` reported zero errors against these schema versions*. It should not claim the deployed binary is correct, that the proof obligations were the right ones, or that the schemas captured the right properties. The way to avoid the confusion is to name what the attestation *does not* cover inside the attestation itself — versions of each oracle, scope of the proof, schema revisions, explicitly excluded failure modes. Today sigil signs the WASM output with SLSA L2/L3 provenance; the per-oracle predicates described above ride in the bundle as each upstream tool learns to emit them, which is ongoing work across the stack rather than a sigil feature alone. **Brownfield does not mean stop-the-world.** If you already have Jira or DOORS requirements, rivet reads those (or you script the import once) and starts producing diagnostics the next day — no new authoring required beyond filling the schema gaps rivet surfaces. spar is a bigger ask, so earn its place: introduce it where an AADL or SysML model already exists, or where the architecture analysis pays for itself (scheduling feasibility, ARINC 653 isolation, deployment allocation). sigil is additive — it layers onto whatever CI you have. diff --git a/content/blog/2026-04-24-variant-pruning-rust-mcdc.md b/content/blog/2026-04-24-variant-pruning-rust-mcdc.md new file mode 100644 index 0000000..19d5469 --- /dev/null +++ b/content/blog/2026-04-24-variant-pruning-rust-mcdc.md @@ -0,0 +1,137 @@ ++++ +title = "MC/DC for AI-authored Rust is tractable — the variant-pruning argument" +description = "The received wisdom is that Rust's pattern matching makes MC/DC harder than C. Under variant-managed AI-authored code, the opposite is true. Five layers of variant pruning, one oracle per layer, and a certification burden proportional to the single variant you ship — not the combinatorial product." +date = 2026-04-24 +draft = true +[taxonomies] +tags = ["verification", "process", "deep-dive"] +authors = ["Ralf Anton Beier"] ++++ + +{% insight() %} +The received wisdom about MC/DC for Rust reads: *"pattern matching makes it harder than C."* That is only true if you look at the runtime-decision layer in isolation. Rust has at least five layers of variant-pruning machinery — requirements variants, cargo features, cfg attributes, the type system, and match arms. Each prunes what the next layer has to reason about. Composed end-to-end under rivet-style variant management, MC/DC-for-Rust is less burdensome at certification scale than MC/DC-for-C ever was. AI-velocity authorship tips the economics further in Rust's favour, not worse. +{% end %} + +## The framing this post inverts + +The Ferrous Systems ESA SW PA post[^ferrous-post] summarises the common framing in one paragraph: + +> *"Rust has novel language features (such as pattern-matching) that don't map well to the existing MC/DC specifications, however a recent paper proposes a mapping that will allow the development of MC/DC tools for Rust in the future."* + +The received reading is: MC/DC was designed for C's boolean decisions; Rust pattern matching does not decompose naturally into boolean decisions; Rust needs a new mapping before tools can even count coverage. Once the mapping lands, Rust's coverage burden will be "addressable" but presumably large. + +The argument below is that the mapping is the least interesting part of the problem. Under multi-layer variant management — the kind a mature safety-critical product line already runs — the coverage *scope* at the decision layer shrinks multiplicatively with every layer above it. At scale, the burden becomes proportional to the single variant you ship, not to the combinatorial product across your feature matrix. + +## Five layers of variant pruning + +A Rust codebase has at least five distinct mechanisms for pruning the space of reachable decisions, stacked from design to execution: + +| Layer | Mechanism | Prunes what | Timing | +|---|---|---|---| +| 1. Requirements | rivet variant artifact | Which requirement set applies to this build | Design | +| 2. Build | Cargo features (`[features]`) | Which crates and modules link | Configure | +| 3. Source | `#[cfg(feature = ...)]`, `#[cfg(target_os = ...)]`, `#[cfg(debug_assertions)]` | Which code is even parsed | Compile | +| 4. Type | Exhaustive match, `!` (never type), type-state, const generics, trait bounds | Which match arms and branches are reachable in principle | Type-check | +| 5. Runtime | Match arms, `if let`, `?` desugaring | Which decisions fire on a given input | Execute | + +Each layer takes the variant space its predecessor produced and makes it smaller. Layer 1 selects a variant from the product family. Layer 2 translates that selection into feature toggles. Layer 3 elides entire blocks of source. Layer 4 closes gaps the source would otherwise leave open — Rust's exhaustive-match requirement and the never type remove whole classes of "unreachable but still counted" branches that C's MC/DC has to discharge by testing. Only what survives all four pruning steps ever reaches layer 5, where the Ferrous/DLR MC/DC mapping operates. + +If the compilation target is WebAssembly — as it is for PulseEngine — layer 5 is even more tractable than the Ferrous/DLR framing makes it look. By the time the Wasm module exists, pattern matching has already lowered to `br_if` and `br_table`, the `?` operator has desugared, type-state has been resolved, and cfg branches have been elided from the emitted code. The decisions visible in the Wasm are already MC/DC-shaped — small, explicit, without the source-level syntax that makes pattern matching hard to map. An MC/DC tool at the Wasm level measures coverage against an instruction set with a machine-readable spec and a reference test suite, neither of which C or Rust has to the same degree; the tool-qualification argument moves in your favour. And the coverage report describes what actually ships, not a source-level abstraction that the compiler then rearranges. + +## The multiplicative reduction + +Call the fully-expanded MC/DC decision space of the union of all variants `N_max`. That is the number MC/DC-for-Rust would nominally have to discharge if you naively applied the Ferrous mapping to every possible cfg-expansion of every feature combination across every variant. It is large. It is also not what any single shipped artifact actually contains. + +What the shipped artifact contains is a space of size: + +> `N_shipped(V) = f_1(V) · f_2(V) · f_3(V) · f_4(V) · N_max` + +where each `f_i(V) ∈ (0, 1]` represents the pruning fraction at layer `i` when variant `V` is selected. For a well-decomposed product line targeting, say, ASIL-D with a specific board and feature profile, every `f_i` is aggressive. The product is small. + +**Certification is against the shipped artifact, not against the union.** You present the MC/DC report for variant V, not for every possible V. The combinatorial argument you sometimes hear — *"Rust MC/DC is impractical because the decision space is huge"* — is comparing `N_max` to what C MC/DC reports do per-configuration. That is not an apples-to-apples comparison. + +Under variant management done properly, the apples-to-apples comparison is `N_shipped(V)` for Rust vs. the equivalent-configuration MC/DC space for C. On that comparison, Rust comes out ahead — because C's preprocessor-based cfg is structurally the same as Rust's, but C's type system does not provide layer-4 pruning. + +## Why AI authorship makes this load-bearing + +Variant-aware certification was historically hard for a practical reason: authoring the variant graph, the feature flags, the cfg discipline, and the per-variant test suite was expensive human work. Product-line engineering research has been on this problem for thirty years and the adoption story outside a few hold-out industries is mixed[^ple-lit]. + +AI-velocity authorship changes that calculus. The agent that writes the code also writes the variant graph, the cfg, the feature declarations, and the per-variant tests. What stops the output from being unattestable is the same thing that stops any AI output from being unattestable: a mechanical oracle at each layer. I made this argument for the pattern generally in [Spec-driven development is half the loop](/blog/spec-driven-development-is-half-the-loop/). Applied to the five-layer variant stack, the oracles are: + +- **Layer 1** — `rivet validate` checks that each requirement has a variant, each variant has linked evidence, each evidence has a linked test. Today. +- **Layer 2** — `cargo metadata`, `cargo tree --edges features`, and per-variant CI matrices catch feature-combination rot. `cargo check --all-features` and its per-variant siblings run today. +- **Layer 3** — `rustc --cfg=... --emit=dep-info` resolves cfg-expansion at compile time. Tooling to report *which* cfg blocks were active in a build is simple to add if it is not already present. +- **Layer 4** — `rustc` itself is the oracle. `#![deny(warnings)]`, `clippy`, exhaustive-match enforcement, and type-state discipline are all mechanical and already run. +- **Layer 5** — the Ferrous/DLR MC/DC tooling will be the oracle once it lands. The mapping paper[^ferrous-paper] is the theoretical work; a qualified tool is the engineering work tracked in the 2026 Rust Project Goal[^rust-project-goal]. + +Each layer's failure is a diagnostic, not an opinion. Each diagnostic is something an agent can act on without human interpretation. The same oracle-gated scaffold discipline that works for bug hunting and for V-model gap closure works here. + +## With our stack, concretely + +- **rivet** — layer 1. Variant artifact, requirement-to-variant link, variant-to-feature-set link. Ships today; variant-schema refinements ongoing. +- **Cargo** — layer 2. Feature declarations, variant-to-feature mapping fed by rivet output. The `spar-codegen` → Cargo.toml plan extends this (see the MBSE section of the SDD post). +- **stable rustc + clippy + `#![deny(warnings)]`** — layers 3 and 4. Ships today. Under Ferrocene-qualified rustc, the oracle gains certification standing. +- **Ferrocene** — when qualified rustc ships against a given integrity target, layers 3 and 4 acquire certification weight. IEC 61508 SIL 2 already exists for the `core` subset; DO-178C qualification is the roadmap[^ferrocene-cert]. +- **Ferrous/DLR MC/DC tool** — layer 5, when it ships. + +None of these is complete as a composed stack today. The argument is that the composition, not any single piece, is what makes the economics work. + +## The inversion + +Pulling the threads together: + +- C code under preprocessor-based cfg has roughly layers 1 (if you count project-level variant management), 2 (via build flags), 3 (`#ifdef`), and 5 (MC/DC at runtime-decision level). Layer 4 is absent — C's type system does not prune reachable branches. +- Rust code under rivet / cargo / cfg / type-discipline / match-arm-MC/DC has all five layers. +- Each layer that prunes at all makes the product strictly smaller. Adding a layer cannot make the product larger. + +Therefore: + +> **For a variant-managed product shipped at a specific configuration, MC/DC-for-Rust requires strictly less coverage effort than MC/DC-for-equivalent-C, once all layers are counted.** + +This inverts the naive reading of the Ferrous paragraph. The novel language features do not make MC/DC harder in net; they provide an additional pruning layer C does not have. Ferrous's post is not wrong — layer 5 alone is harder for Rust than for C. The post is measuring at the wrong boundary. + +## What this does not solve + +Three honest caveats. + +**Layer independence is an assumption.** The multiplicative argument assumes the pruning at each layer is orthogonal. In practice, cfg-guarded code can interact with type-system choices (a feature-gated generic bound, for instance), so the product `f_1 · f_2 · f_3 · f_4` can be loose. The argument degrades to *"strictly less or equal to C"* under pathological interaction, which is still a net win but less dramatic. + +**Ferrocene MC/DC tooling is not yet shipping.** The layer-5 oracle is proposed in a paper, referenced in the Rust Project Goal for 2026, and worked on by the Safety-Critical Rust Consortium. Until the tool lands and is qualified, the argument is theoretical at layer 5. The other four layers are operational today, so partial value is already available. + +**Proofs are still strictly stronger than coverage.** This post argues MC/DC-for-Rust is tractable, not sufficient. For ASIL-D and DAL-A, the full dossier wants both. The earlier posts in this arc — [Formal verification just became practical](/blog/formal-verification-ai-agents/), [What comes after test suites](/blog/what-comes-after-test-suites/) — make the proof-first case. MC/DC is what you add to the dossier *alongside* proofs, not instead of them. + +## Academic prior art + +Variability-aware analysis that prunes verification scope is a mature research line. Starting points: + +- **Czarnecki & Eisenecker** — *Generative Programming: Methods, Tools, and Applications* (2000). The original MBE/PLE synthesis. +- **Classen, Heymans, Schobbens, Legay** — family-based model checking for software product lines (TSE 2013). Proves that SPL-wide verification can be done without enumerating all variants, which is the formal analogue of the MC/DC argument here. +- **Thüm et al.** — *A Classification and Survey of Analysis Strategies for Software Product Lines* (ACM Computing Surveys 2014). The reference survey. +- **Apel, Batory, Kästner, Saake** — *Feature-Oriented Software Product Lines* (2013). The textbook. + +What this post contributes, as far as I can tell, is the Rust + AI-authorship angle. The academic literature is C-and-Ada-heavy; Rust's type-system pruning (layer 4) is qualitatively different from what PLE literature analyses. And AI-authorship as the economic ingredient that makes maintaining the variant graph cheap is not an angle the 2010–2014 literature was in a position to make. + +## Take-away + +- The coverage burden you have to discharge at ship is proportional to the single variant you ship, not to the product of your feature matrix. This is the variant-pruning principle. +- Rust provides four pruning layers above MC/DC itself — requirements variants, cargo features, cfg, and the type system. Three of the four ship today; the fifth (MC/DC tooling) is in development. +- AI-velocity authorship makes authoring all five layers cheap. That is what makes the economics work at scale. +- Under variant management, MC/DC-for-Rust at certification scale is *less* burdensome than MC/DC-for-C, not more. The received wisdom is measuring at layer 5 in isolation. + +--- + +## Sources + +[^ferrous-post]: Ferrous Systems — *Rust: Who, What and Why for ESA SW PA Workshop*, September 2025. [ferrous-systems.com/blog/rust-who-what-why](https://ferrous-systems.com/blog/rust-who-what-why/). Primary (author publication). + +[^ferrous-paper]: *Toward Modified Condition/Decision Coverage of Rust*. German Aerospace Center (DLR) and Ferrous Systems. arXiv:2409.08708, 2024. [arxiv.org/abs/2409.08708](https://arxiv.org/abs/2409.08708). Primary (preprint). + +[^rust-project-goal]: Rust Project — *What does it take to ship Rust in safety-critical?* 2026-01-14. [blog.rust-lang.org](https://blog.rust-lang.org/2026/01/14/what-does-it-take-to-ship-rust-in-safety-critical/). Primary (Rust Project publication; MC/DC listed as a 2026 goal). + +[^ferrocene-cert]: Ferrous Systems — *IEC 61508 (SIL 2) Certification for Rust Core Library Subset*. [ferrous-systems.com/blog](https://ferrous-systems.com/blog/). Primary. + +[^ple-lit]: Representative starting points: Czarnecki & Eisenecker (*Generative Programming*, 2000); the Classen / Heymans family-based model checking papers (2010–2013); Apel et al. (*Feature-Oriented Software Product Lines*, 2013); the Thüm et al. survey (ACM Computing Surveys 2014). Secondary — referenced collectively rather than quoted. + +--- + +*This post is part of [PulseEngine](/) — a formally verified WebAssembly Component Model engine for safety-critical systems. Prior posts in the arc: [Formal verification just became practical](/blog/formal-verification-ai-agents/), [What comes after test suites](/blog/what-comes-after-test-suites/), [Spec-driven development is half the loop](/blog/spec-driven-development-is-half-the-loop/).* diff --git a/content/blog/2026-04-25-witness-wasm-mcdc.md b/content/blog/2026-04-25-witness-wasm-mcdc.md new file mode 100644 index 0000000..bade8c8 --- /dev/null +++ b/content/blog/2026-04-25-witness-wasm-mcdc.md @@ -0,0 +1,112 @@ ++++ +title = "witness — MC/DC for the WebAssembly component model" +description = "The variant-pruning post argued that MC/DC on AI-authored Rust is tractable at Wasm level. This post is the tool: witness instruments a Wasm module, runs a test harness, emits a branch-coverage report, and composes with rivet and sigil for the full evidence chain. v0.1 today, Check-It-pattern qualification on the roadmap." +date = 2026-04-25 +draft = true +[taxonomies] +tags = ["verification", "wasm", "deep-dive"] +authors = ["Ralf Anton Beier"] ++++ + +{% insight() %} +The variant-pruning post argued that MC/DC-for-Rust becomes tractable when you measure at the WebAssembly level rather than the source level — the pattern-matching problem dissolves once code is lowered to `br_if` and `br_table`, the instruction set is small and formally specified, and coverage describes what actually ships. *witness* is the instrument that makes the argument empirical. It instruments a Wasm module, runs the test harness you already have, and emits a branch-coverage report that rivet can read as requirement-to-test evidence and sigil can carry in its attestation bundle. When Rust-level MC/DC lands via the Ferrous / DLR work, witness composes with it rather than competing: two measurement levels, two sets of blind spots, one additive argument. +{% end %} + + + +## Why another coverage tool + +I argued the case across the prior two posts: + +- [Spec-driven development is half the loop](/blog/spec-driven-development-is-half-the-loop/) framed the pattern — oracle-gated agents downstream of the spec, mechanical instruments at every layer, MBSE driving the build. +- [MC/DC for AI-authored Rust is tractable — the variant-pruning argument](/blog/variant-pruning-rust-mcdc/) argued that five layers of variant pruning (requirements → cargo features → cfg → type system → match arms) collapse the MC/DC coverage burden to what a single shipped variant actually exposes, and that at the Wasm level the hardest part of the Ferrous/DLR mapping problem (Rust pattern matching → MC/DC decisions) has already resolved before measurement begins. + +Both posts asked a question the blog cannot answer: is there a tool? Until now, no. This post is the tool. + +## What witness does + +Point it at a Wasm module and a test-runner command: + +```sh +witness instrument app.wasm -o app.instrumented.wasm +witness run --harness "cargo test --target wasm32-wasi" --module app.instrumented.wasm +witness report +``` + +The instrumentation step rewrites the Wasm to emit a counter at every `br_if`, `br_table`, and `if` instruction. The run step executes the test harness against the instrumented module and collects the counter values. The report step produces a branch-coverage summary keyed to (module, function, offset), with source-level mapping when DWARF-in-Wasm or the name section is present. + +That is v0.1. It is not MC/DC in the strict condition-by-condition sense yet. It is branch coverage with an MC/DC-shaped roadmap. + +## v0.1 — v1.0 + +Honest incremental scope: + +| Version | What it does | Status | +|---|---|---| +| v0.1 | Branch coverage on Wasm at the decision level. Strict per-`br_if` / per-`br_table` counting. | *TODO: fill in actual status — shipping / in PR / planned* | +| v0.2 | MC/DC condition decomposition when DWARF is present. Groups related `br_if` sequences back into source-level decisions. | Planned | +| v0.3 | rivet integration. Coverage report emits as an in-toto predicate that rivet can link to requirements and sigil can carry. | Planned | +| v0.4 | Variant-aware scope. Post-cfg, post-meld, post-loom measurement points — each one a selectable instrumentation target. | Planned | +| v1.0 | Check-It qualification artifact. Coverage attestation that a small trusted checker can validate, collapsing DO-330 qualification from *"qualify witness"* to *"qualify the checker."* | Planned | + +Each step fills a specific gap, and each step can ship independently. Don't wait for v1.0 to get value — v0.1 already produces the branch-coverage evidence the variant-pruning argument depends on. + +## The hard problem — decision granularity at Wasm level + +Short-circuit evaluation at Rust source (`a && b && c`) compiles to three `br_if` instructions in the emitted Wasm. MC/DC says *"each condition independently affects the decision outcome."* Two honest interpretations: + +- **Strict**: each `br_if` is its own decision. Easy to measure, easy to qualify; loses the source-level condition grouping. +- **Reconstructed**: group the sequence of related `br_if`s back into the source-level decision and measure MC/DC over the reconstruction. Harder; needs DWARF-in-Wasm or explicit compiler hints. + +v0.1 reports strict coverage only. v0.2 adds DWARF-informed reconstruction with strict as the fallback when DWARF is absent. The definition — exactly how the reconstructed groups are formed, and what invariants the grouping preserves — is worth a short paper. The tool implementation forces you to pick; the paper explains why this pick is the right one. + +## Why this does not make Rust-level MC/DC obsolete + +Resistance is futile — we adopt both. The overdo principle from [Overdoing the verification chain](/blog/overdoing-the-verification-chain/) applies here at the coverage layer: + +- **Rust-level MC/DC** (Ferrous / DLR, when it ships under the 2026 Rust Project Goal) measures decisions at the *source*. Its blind spot is compiler rearrangement — what the compiler emits might have different branches than what the source expresses. +- **witness** measures decisions at the *post-compile Wasm*. Its blind spot is the source-level intent — a condition that expresses a requirement clearly at source might be split into multiple Wasm-level branches with no obvious unit. +- **Translation validation** (loom's Z3 TV on Wasm-IR transformations) bridges the two levels. A proof at the source holds at the Wasm if TV discharges the transformation, so coverage at either level stands in for the other when needed. + +Two measurement levels, two blind spots, one additive evidence chain. The same dossier discipline DO-178C has accepted since 1992 — source-level coverage *and* object-code coverage — applies directly. Wasm is the new post-preprocessor level; rustc → Wasm is the compilation step; post-synth machine code is the object code. Witness fills the middle. + +## How it composes with the rest of the stack + +- **rivet** reads witness reports as test-to-requirement coverage evidence. `rivet validate` can now report *"requirement REQ-N has no test exercising decision D at offset O"* — a new failure class that names uncovered branches specifically. +- **sigil** carries witness reports as in-toto coverage predicates in the attestation bundle (once sigil composes upstream predicates; see the SDD post's note on sigil's transformation-attestation types). +- **loom** emits the post-optimization Wasm that witness measures; loom's translation validation is what makes "coverage on optimized Wasm" a valid stand-in for "coverage on pre-optimization Wasm." +- **meld** fuses components; witness can measure coverage on the fused module or on individual components before fusion. +- **kiln** (or any other Wasm runtime) executes the instrumented module during the test run. +- **spar** — architecture-level. Not directly involved in coverage measurement, but the variant selected at the spar / rivet layer determines which Wasm is produced, which determines what witness measures. + +Coverage for AI-authored code is not a one-tool problem. It is a pipeline problem. Each tool owns a narrow mechanical check; the composition is what the audit trail holds. + +## Open questions + +- **DWARF-in-Wasm maturity.** The spec exists; tooling is uneven across compilers. rustc emits usable DWARF for Wasm targets; Go does partial; AssemblyScript does not. v0.2's reconstruction quality depends on how good the DWARF is. +- **Loom-TV interaction.** When loom optimizes the Wasm, the CFG rearranges. Coverage at pre-loom and post-loom Wasm may disagree. That's arguably correct — you want coverage on what ships — but it needs a careful story for the audit. +- **Decision-granularity formal definition.** Named above; the paper I owe. +- **Component model semantics.** Multi-component Wasm has cross-component calls (`call_indirect`, imports / exports). Whether coverage across components counts as a single argument or separate arguments-per-component is an audit question, not just a technical one. + +Honest v0.1 does not solve these. Honest v1.0 does. + +## What this unlocks + +With witness running against the PulseEngine codebase, the variant-pruning argument stops being prose and becomes a report. The claim *"MC/DC-for-Rust becomes tractable under multi-layer variant management"* becomes a concrete measurement: here is the `f_1 · f_2 · f_3 · f_4 · N_max` scope for variant V, here is the coverage report for V, here is the delta when V changes. The audit dossier has numbers, not just argument. + +That is the point of shipping the tool. Arguments without measurements age poorly in safety-critical contexts. Measurements outlive the argument that motivated them. + +--- + +## Sources + + + +--- + +*This post is part of [PulseEngine](/) — a formally verified WebAssembly Component Model engine for safety-critical systems. Prior posts in the arc: [Formal verification just became practical](/blog/formal-verification-ai-agents/), [What comes after test suites](/blog/what-comes-after-test-suites/), [Spec-driven development is half the loop](/blog/spec-driven-development-is-half-the-loop/), [MC/DC for AI-authored Rust is tractable — the variant-pruning argument](/blog/variant-pruning-rust-mcdc/).*