diff --git a/content/blog/2026-03-03-zero-cost-component-model.md b/content/blog/2026-03-03-zero-cost-component-model.md deleted file mode 100644 index 9da89a6..0000000 --- a/content/blog/2026-03-03-zero-cost-component-model.md +++ /dev/null @@ -1,97 +0,0 @@ -+++ -title = "The Component Model as a zero-cost abstraction for safety-critical systems" -description = "The WebAssembly Component Model gives you composition, portability, and isolation at development time. A build-time pipeline can erase that overhead before it reaches the target. This post introduces the approach — the deep dives follow." -date = 2026-03-03 -draft = true -[taxonomies] -tags = ["deep-dive", "architecture", "series"] -authors = ["Ralf Anton Beier"] -+++ - -*This is part 1 of a series on building a verified WebAssembly pipeline for safety-critical embedded systems.* - -{% insight() %} -Your development teams can use modern, language-neutral composition tools — typed interfaces, isolation, versioning — without paying a runtime penalty on the target device. The same component works from prototype through production-qualified firmware. The build pipeline erases the abstraction overhead; what ships is a flat binary indistinguishable from hand-written code. -{% end %} - -## What the Component Model gives you - -The WebAssembly Component Model solves a real problem: portable, language-neutral composition with strong isolation guarantees. You can build a component in Rust, another in C, compose them through typed interfaces, and the result is a self-describing binary that any compliant runtime can execute. - -[wasmtime](https://github.com/bytecodealliance/wasmtime) is the reference runtime — full Component Model support, JIT compilation via Cranelift, comprehensive WASI implementation. It is where the Component Model is developed and validated, and it works well for server and cloud workloads. - -The question we are exploring is whether the same composition model can work for a different class of targets: automotive ECUs, avionics, medical devices — systems where the runtime environment, binary size, and qualification requirements look very different. - -## Where embedded stands today - -Embedded Wasm runtimes exist and are maturing. [WAMR](https://github.com/bytecodealliance/wasm-micro-runtime), [wasm3](https://github.com/wasm3/wasm3), [wasmi](https://github.com/wasmi-labs/wasmi), [DLR's wasm-interpreter](https://github.com/DLR-FT/wasm-interpreter) — all consume core Wasm modules. None of them support the Component Model. This is not a criticism — it reflects a reasonable engineering tradeoff. The Component Model adds structural complexity that minimal runtimes are designed to avoid. - -The pattern across the space is consistent: use the Component Model for development-time composition, then strip it before deployment. The question is whether that stripping can be done systematically, verifiably, and without losing the benefits. - -## The pipeline - -Instead of asking the runtime to handle component structure, we resolve it at build time: - -{% mermaid() %} -graph LR - W(.wasm component) --> M(meld) - M --> L(loom) - L --> S(synth) - K(kiln) -->|host + builtin components| S - S --> T(firmware / ELF) - SG(sigil) -.->|attest| M - SG -.->|attest| L - SG -.->|attest| S -{% end %} - -**[meld](/blog/meld-v0-1-0/)** fuses the component structure into a flat core module. Internal imports become direct calls. The shim modules, indirect tables, and canonical ABI trampolines disappear. - -**loom** optimizes the fused module. After meld has erased component boundaries, loom sees the entire program and can optimize across what were cross-component call boundaries — opportunities that no linker or general-purpose optimizer would see. - -**synth** transcodes to native code. The flat, optimized Wasm module becomes an ELF binary or firmware image. **kiln** provides the runtime layer — the host and builtin component implementations that the native code links against. - -**sigil** attests every transformation. The final artifact carries cryptographic evidence of exactly which pipeline produced it. - -Each tool gets its own deep dive later in this series. What matters here is the overall argument. - -## What "zero cost" means - -Calling this a "zero-cost abstraction" invites scrutiny, and it should. - -**What the pipeline eliminates:** Component structure (module nesting, instantiation chains, import/export wiring), canonical ABI trampolines (`canon lift` / `canon lower` fused into direct calls), indirect dispatch (shim tables and fixup modules), and cross-component optimization barriers. - -**What it preserves by design:** Resource handle tables remain runtime state. String transcoding persists when encodings differ. Async scheduling, streams, and threads are not overhead to eliminate — they are essential. Freedom from interference is a core safety requirement (ISO 26262), and structured concurrency is how you achieve it. WASI P3 will expand this with language-integrated concurrency, zero-copy, and high-performance streaming. The pipeline resolves what can be determined statically and preserves what must remain dynamic. - -**What does not fit safety-critical:** GC introduces non-deterministic timing that conflicts with worst-case execution time analysis. For safety-critical components, GC is not applicable. - -The honest framing: the Component Model's *structural* overhead is zero-cost — fully eliminated at build time. The *behavioral* overhead persists where it must, and in the case of async and freedom from interference, it *should* persist because runtime scheduling is a safety requirement, not waste. - -### The precedent - -This pattern is not new. Rust's generics are monomorphized at compile time. C++ templates work the same way. GraalVM compiles Spring's dependency injection into direct calls. The Component Model follows the same principle: rich composition at development time, flat artifact at deployment time — but at the module boundary rather than the function boundary. - -Fusing modules across memory spaces has implications. Luke Wagner raised [isolation concerns](https://github.com/WebAssembly/component-model/issues/386) about shared-everything linking. For safety-critical systems targeting single-purpose firmware images, this tradeoff is well-understood and acceptable. - -## Where we are building toward - -A verified build pipeline would change the economics of safety-critical software integration. If each transformation can be proven to preserve semantics — and sigil attests the chain — then components are validated once, the pipeline is qualified once, and integration testing focuses on what actually changed: the system context. - -{% note(kind="warning") %} -This is the goal, not the current state. meld is at v0.1.0. Formal verification of the pipeline is future work. No Wasm runtime or toolchain has undergone DO-178C or ISO 26262 qualification today. We are building toward this, not claiming it. -{% end %} - -## The series - -This post sets up the argument. The rest of the series goes deep on each piece: - -1. **The Component Model as a zero-cost abstraction** — this post -2. **[meld v0.1.0: static component fusion](/blog/meld-v0-1-0/)** — what `wit-component` produces, what meld does to it, what the Wasm looks like after -3. **[meld: from intra-component fusion to cross-component composition](/blog/meld-component-fusion/)** — where meld is heading and what cross-component fusion changes -4. **[loom: post-fusion optimization](/blog/loom-post-fusion-optimization/)** — why optimizing after fusion is different from wasm-opt, and how we verify the result -5. **[synth + kiln: from Wasm to firmware](/blog/synth-kiln-wasm-to-firmware/)** — native transcoding, the runtime layer, and the landscape of Wasm-to-native tools -6. **[Proving the pipeline](/blog/proving-the-pipeline/)** — formal verification of each transformation, and what it takes to qualify for ISO 26262 and DO-178C -7. **[The toolchain: hermetic builds and supply chain attestation](/blog/hermetic-toolchain/)** — Bazel, Nix, sigil, and reproducible builds for a qualified pipeline -8. **[temper: automated governance](/blog/temper-governance/)** — enforcing signed commits, branch protection, and CI attestation across every repository -9. **[thrum: autonomous AI-driven development](/blog/thrum-orchestrator/)** — AI agents, formal verification gates, and the development loop that builds itself - -If you are working in this space — embedded Wasm, safety-critical systems, Component Model tooling — we would like to hear from you. Everything is at [github.com/pulseengine](https://github.com/pulseengine). diff --git a/content/blog/2026-03-04-loom-post-fusion-optimization.md b/content/blog/2026-03-04-loom-post-fusion-optimization.md deleted file mode 100644 index 9995cff..0000000 --- a/content/blog/2026-03-04-loom-post-fusion-optimization.md +++ /dev/null @@ -1,76 +0,0 @@ -+++ -title = "loom: why optimizing after fusion is not the same as wasm-opt" -description = "After meld fuses components into a flat module, loom optimizes across boundaries that no longer exist. A fused optimizer that understands what meld produces, with optional Z3 translation validation to verify the result." -date = 2026-03-09 -draft = true -[taxonomies] -tags = ["deep-dive", "loom", "series"] -authors = ["Ralf Anton Beier"] -+++ - -*This is part 4 of a series on building a verified WebAssembly pipeline for safety-critical embedded systems. [Part 1](/blog/zero-cost-component-model/) introduces the approach. [Part 3](/blog/meld-component-fusion/) covers where meld is heading with cross-component fusion.* - -{% insight() %} -Every optimization loom applies can be mathematically verified. When a certification authority asks "how do you know your optimizer did not introduce a defect?", the answer is concrete: Z3 proved equivalence, function by function. This shifts optimization from a trust-based activity to an evidence-based one. -{% end %} - -## What loom does - -After meld fuses a WebAssembly component into a flat core module, the result is structurally simple but not yet optimized. The adapter trampolines are gone, the indirect dispatch tables are collapsed, the import chains are resolved — but the code still carries artifacts of its multi-module origin. Dead functions from adapter generation, redundant memory operations from same-memory copies, trivial forwarding calls that used to cross component boundaries. - -loom is an optimization pipeline designed specifically for this fused output. - -## The fused optimizer - -This is what makes loom different from a general-purpose Wasm optimizer. The fused optimizer understands the patterns that meld produces: - -**Adapter devirtualization.** When meld fuses components, it generates adapter functions — thin wrappers that forward calls between what used to be separate modules. loom detects these trivial forwarders and rewrites callers to call the target directly. The adapter becomes dead code and gets eliminated. - -**Same-memory adapter collapse.** In the Component Model, cross-component calls copy data between linear memories. After meld fuses components that share the same memory (which is common — the adapter and main module typically share memory), the copy becomes a same-address copy. loom detects this pattern and eliminates the redundant allocation and memcpy entirely. - -**Canonical ABI cleanup.** meld resolves `canon lift` / `canon lower` pairs into concrete function calls, but the resulting code may still contain patterns like trivial `cabi_post_return` functions (empty cleanup stubs) or duplicate type definitions from the original multi-module structure. The fused optimizer cleans these up. - -These are patterns that [wasm-opt](https://github.com/WebAssembly/binaryen) — the standard Wasm optimizer — has no concept of. wasm-opt does not understand the Canonical ABI, does not know what a meld adapter looks like, and [does not support Component Model input at all](https://github.com/WebAssembly/binaryen/issues/6728). - -## What wasm-opt does well - -This is not a competition. wasm-opt is production-grade, battle-tested across Emscripten, Rust, and LLVM, with 10 years of development and over 150 optimization passes. For standard Wasm optimization — constant folding, dead code elimination, inlining, loop-invariant code motion, register allocation, GC type optimization, SIMD — wasm-opt is the ecosystem standard and categorically more comprehensive than loom's general-purpose passes. - -After meld fuses and loom applies its fused optimizer, running wasm-opt on the result is a reasonable and complementary step. loom handles the fusion-specific patterns; wasm-opt handles the deep general-purpose optimization. They work on different layers of the problem. - -## Where loom goes further: Z3 translation validation - -For safety-critical systems, "the optimizer probably didn't break anything" is not sufficient. loom includes optional translation validation via Z3 SMT solving: after optimizing a function, loom can encode both the original and optimized versions as logical formulas and ask Z3 to prove they are semantically equivalent. - -This is not full formal verification of the optimizer itself — it is a per-function post-hoc check that the specific optimization applied to the specific function preserved semantics. If Z3 finds a counterexample, the optimization is rejected for that function. - -{% note(kind="warning") %} -Translation validation is implemented but has limitations. It currently works on functions with straightforward control flow. Complex branching patterns, indirect calls, and memory aliasing require further work. This is an active area of development, not a production capability. -{% end %} - -The direction is clear: every optimization loom applies should be verifiable. Not just tested — proven, function by function, transformation by transformation. This connects to the broader verification story across the pipeline, covered in [part 5](/blog/proving-the-pipeline/). - -## The optimization pipeline - -Beyond the fused optimizer, loom runs a multi-phase pipeline on the fused module: - -1. **Fused optimizer** — adapter devirtualization, same-memory collapse, Canonical ABI cleanup -2. **Constant propagation** — replace immutable globals with their values -3. **Constant folding** — fold arithmetic on known values, algebraic identities, strength reduction -4. **Common subexpression elimination** — deduplicate repeated computations -5. **Function inlining** — inline small functions exposed by fusion -6. **Post-inline constant folding** — second pass to exploit inlining results -7. **Loop-invariant code motion** — hoist loop-invariant expressions -8. **Branch simplification and dead code elimination** — clean up control flow -9. **Block merging and local simplification** — final cleanup - -The first phase is what makes loom unique. Phases 2-9 are standard compiler optimizations — loom implements them, wasm-opt implements them better. The value of having them in loom is pipeline integration and the ability to validate each transformation with Z3. - -## What's next for loom - -- Expanding Z3 validation coverage to handle more control flow patterns. -- Validating the fused optimizer against meld's evolving output as meld gains cross-component fusion. -- Exploring whether loom's fused optimization passes could be contributed upstream as Binaryen passes for the broader ecosystem. -- ISLE rule compilation — loom uses Cranelift's ISLE pattern-matching infrastructure for its type representation, with the goal of expressing optimization rules declaratively. This is infrastructure that is set up but not yet active. - -*Next in the series: [part 5 — synth + kiln: from Wasm to firmware](/blog/synth-kiln-wasm-to-firmware/).* diff --git a/content/blog/2026-03-05-meld-component-fusion.md b/content/blog/2026-03-05-meld-component-fusion.md deleted file mode 100644 index 71d6569..0000000 --- a/content/blog/2026-03-05-meld-component-fusion.md +++ /dev/null @@ -1,73 +0,0 @@ -+++ -title = "meld: from intra-component fusion to cross-component composition" -description = "meld v0.1.0 fuses the internal structure of a single component. The next step is fusing multiple components that talk to each other — resolving cross-component calls, shared types, and resource lifecycles at build time." -date = 2026-03-06 -draft = true -[taxonomies] -tags = ["deep-dive", "meld", "series"] -authors = ["Ralf Anton Beier"] -+++ - -*This is part 3 of a series on building a verified WebAssembly pipeline for safety-critical embedded systems. [Part 1](/blog/zero-cost-component-model/) introduces the approach. [Part 2](/blog/meld-v0-1-0/) is a technical walkthrough of meld v0.1.0 — what `wit-component` produces and what meld does to it.* - -{% insight() %} -Build a sensor driver once, validate it once, deploy it across product lines. Cross-component fusion means reusable software components compose at development time and fuse into a single optimized binary at build time — no integration overhead, no per-project re-validation of component interactions. -{% end %} - -## What meld does today - -meld v0.1.0 handles intra-component fusion. A single WebAssembly component — the tree of core modules, shim modules, fixup modules, and canonical ABI operations that `wit-component` produces — gets flattened into one core module. The [v0.1.0 walkthrough](/blog/meld-v0-1-0/) shows this in detail: 4 core modules, 23 instances, and 23 canonical ABI operations become 1 module with 54 functions and direct calls. - -This already solves a real problem. Every embedded Wasm runtime consumes core modules, not components. meld bridges that gap at build time — you develop with the Component Model, deploy as a core module. - -## Where meld is heading - -Intra-component fusion is the foundation. The harder and more valuable problem is cross-component fusion: taking multiple components that communicate through WIT interfaces and fusing them into a single artifact. - -### Cross-component calls - -When component A calls component B through a WIT interface, the Canonical ABI inserts a `canon lower` on the caller side and a `canon lift` on the callee side. These operations handle type conversion, memory allocation in the callee's linear memory, and potentially string transcoding. - -After cross-component fusion, these become direct function calls within a single module. If both components share the same string encoding and the same memory (after fusion, they do), the entire lift/lower sequence reduces to a plain call with no marshalling overhead. - -### Shared types and deduplication - -Composed components often import the same WIT types and the same WASI interfaces. After fusion, these redundant definitions collapse. Duplicate type definitions, duplicate import declarations, duplicate resource handle tables — all resolved to single definitions. - -### Resource lifecycle - -Component Model `resource` types have handle tables that track ownership across component boundaries. When two components are fused, resources that were passed across the boundary become internal. The handle table management for those resources can be eliminated — the resource is now a direct reference within a single module. - -Resources that cross the boundary to the *host* still need handle tables. But component-to-component resource passing becomes zero-cost after fusion. - -### What this means for the pipeline - -Cross-component fusion is where meld's output becomes significantly more valuable for loom. After intra-component fusion, loom already sees across what were module boundaries. After cross-component fusion, loom sees across what were *component* boundaries — enabling optimization across the entire composed application. - -This is where the "zero-cost abstraction" claim gets its strongest support. The full Component Model — typed composition, isolation, versioned interfaces — is available during development. After meld, all of it is gone. What remains is a flat module that looks like it was written as a single program. - -{% note(kind="warning") %} -Cross-component fusion is not implemented yet. meld v0.1.0 handles intra-component fusion only. The design for cross-component fusion is in progress — the challenges around resource lifecycle, string transcoding across different encodings, and memory layout merging are real engineering problems, not just implementation work. -{% end %} - -## The role of meld in the pipeline - -meld is the first transformation in the pipeline, and it sets up everything that follows: - -- **For loom:** meld produces a flat module where former component boundaries are erased. loom's fused optimizer understands the specific patterns meld generates — adapter trampolines, same-memory copies, redundant imports — and eliminates them. The cleaner meld's output, the more loom can optimize. - -- **For synth:** meld reduces the number of modules from many to one. synth transcodes one module to native code, not a graph of interconnected modules. This simplifies the transcoding problem significantly. - -- **For kiln:** The host and builtin component implementations that kiln provides become the only remaining imports after meld has resolved everything internal. kiln's interface with the fused module is clean and minimal. - -- **For sigil:** meld is the first transformation that sigil attests. The proof that fusion preserves semantics — that the fused module behaves identically to the composed component — is the foundation of the entire attestation chain. - -## Who else works on composition - -The Bytecode Alliance's [WAC](https://github.com/bytecodealliance/wac) (WebAssembly Composition) is the standard tool for composing components. WAC resolves imports against exports and produces a composed component — but the output is still a Component Model component with multiple core modules and instantiation chains. WAC composes; meld fuses. - -[wasm-merge](https://github.com/WebAssembly/binaryen) (Binaryen) merges core modules by connecting imports to exports. It operates at the core module level, not the Component Model level — it has no concept of canonical ABI operations, WIT interfaces, or resource types. - -meld operates at the Component Model level and produces core module output. It understands what `wit-component`, WAC, and the Canonical ABI generate, and resolves that structure into something any core Wasm runtime can consume. - -*Next in the series: [part 4 — loom: post-fusion optimization](/blog/loom-post-fusion-optimization/).* diff --git a/content/blog/2026-03-06-synth-kiln-wasm-to-firmware.md b/content/blog/2026-03-06-synth-kiln-wasm-to-firmware.md deleted file mode 100644 index ed4175f..0000000 --- a/content/blog/2026-03-06-synth-kiln-wasm-to-firmware.md +++ /dev/null @@ -1,92 +0,0 @@ -+++ -title = "synth + kiln: from Wasm to firmware" -description = "synth transcodes optimized Wasm to native code. kiln provides the runtime layer — host and builtin component implementations that the native code links against. Together, they produce a firmware image with no Wasm runtime on the device." -date = 2026-03-12 -draft = true -[taxonomies] -tags = ["deep-dive", "synth", "kiln", "series"] -authors = ["Ralf Anton Beier"] -+++ - -*This is part 5 of a series on building a verified WebAssembly pipeline for safety-critical embedded systems. [Part 1](/blog/zero-cost-component-model/) introduces the approach. [Part 4](/blog/loom-post-fusion-optimization/) covers post-fusion optimization.* - -{% insight() %} -No Wasm runtime on the device. The firmware image is native code — same footprint, same performance as hand-written C — but built from portable, language-neutral components. Teams write in Rust, C, or any Wasm-targeting language; the target hardware sees only optimized native instructions. -{% end %} - -## The last mile - -After meld fuses and loom optimizes, the result is a flat, optimized core Wasm module. For cloud and server targets, this is where the pipeline ends — hand the module to wasmtime or another runtime. For embedded targets, there is one more step: the Wasm module must become native code that runs directly on the hardware. - -This is what synth and kiln do together. - -## synth: Wasm to native - -synth transcodes the fused Wasm module to native machine code — an ELF binary or firmware image for the target architecture. ARM Cortex-M is the first backend, with the architecture designed for pluggable backends so that additional targets (TriCore, RISC-V, or others) are separate implementations. - -synth aims for provably correct transcoding: a proof that the Wasm input and the native output are semantically equivalent. This is the hardest part of the pipeline to verify, and it is early work. - -### The landscape - -Wasm-to-native is not a new idea. Several tools exist: - -- **[wasm2c](https://github.com/WebAssembly/wabt)** (WABT) transpiles Wasm to C, then compiles with any standard toolchain. This is pragmatically powerful — you get GCC, Clang, or a qualified safety-critical compiler (IAR, Green Hills, Wind River Diab), and for safety-critical work, the qualification burden shifts to the C compiler — which may already be qualified for the target domain. Siemens and Stanford demonstrated a [3 KB ROM Wasm runtime](https://cs.stanford.edu/~keithw/) using this approach. - -- **[aWsm](https://github.com/gwsystems/aWsm)** (GW University + ARM Research) compiles Wasm to native via LLVM, targeting x86-64, aarch64, and ARM Cortex-M. Performance within 10% of native on microprocessors, within 40% on Cortex-M. Uses LLVM's full optimization pipeline. - -- **[Infineon's TriCore AOT](https://github.com/Infineon/aurix_webassembly_aot)** translates Wasm to native TriCore for AURIX — their ASIL-D automotive MCU. Direct instruction translation for a specific production target. - -- **WAMR AOT** (wamrc) compiles Wasm ahead of time, but the output still requires the WAMR runtime on the device for system calls and memory management. - -All of these produce native code at build time — not at runtime. DLR's DASC 2025 paper notes that JIT and AOT at *runtime* generate executable object code, "a behavior which is not foreseen by current regulations." Build-time transcoding avoids this entirely, and this applies to all the tools above, not just synth. - -### Where synth fits - -synth's differentiator is not that it produces native code — others do that. It is: - -1. **Pipeline integration.** synth receives meld+loom output, which is already fused and optimized. The other tools take raw Wasm modules. meld+loom as a preprocessing step would benefit any of them. - -2. **Proof correctness.** synth aims to prove that its transcoding preserves semantics. This is the same verification goal that runs through the entire pipeline — meld proves fusion correct, loom proves optimization correct, synth proves transcoding correct, sigil attests the chain. - -3. **Pluggable backends.** The architecture separates target-specific code generation from the common transcoding framework, so adding a TriCore or RISC-V backend does not require redesigning the pipeline. - -{% note(kind="warning") %} -synth is early. The ARM Cortex-M backend is the first target. Proof correctness is the goal, not the current state. The wasm2c path (Wasm to C, then a qualified C compiler) is a pragmatically viable alternative for teams that need a certified toolchain today. -{% end %} - -## kiln: the runtime layer - -When synth transcodes user component logic to native code, the result is not a standalone binary. The component has imports — WASI calls, host functions, resource lifecycle management. Something must provide those implementations on the target device. - -That is kiln. - -kiln is the backend for the builtin and host components that synth generates. When the native code calls `get-stdout`, `exit`, or any other WASI function, kiln provides the implementation. When a resource handle needs to be allocated or dropped, kiln manages the table. When the component's memory needs initialization, kiln sets it up. - -kiln is not just a WASI shim. It is a `no_std` Component Model runtime in Rust, designed for the same constrained targets that synth produces code for. It provides: - -- **WASI 0.2 implementations** for the host interfaces the component imports -- **Resource handle management** for component-level resources -- **Memory and table initialization** for the fused module -- **An interpreter mode** for targets where native transcoding is not needed or not yet available - -The firmware image that ships to the device contains synth's native code *and* kiln's runtime layer, linked together. kiln is not an alternative to synth — it is the other half. - -### kiln as interpreter - -For targets where interpretation is preferred — perhaps for flexibility, for hot-loading Wasm updates without reflashing, or because a synth backend does not exist for the architecture — kiln provides a `no_std` Component Model interpreter. This is the same runtime, the same WASI implementations, the same resource management — but executing Wasm bytecode instead of native transcoded code. - -## From component to firmware - -The full path: - -1. Developer writes components against WIT interfaces, in any language -2. **meld** fuses them into a flat core module -3. **loom** optimizes across the erased boundaries -4. **synth** transcodes to native for the target (or the module runs on kiln's interpreter) -5. **kiln** provides the host/builtin runtime layer -6. **sigil** attests every step -7. The result is a firmware image: native user code + kiln runtime, ready to flash - -No Wasm runtime on the device. No Component Model overhead at runtime. The full composition model was available during development and has been erased by the time the code reaches the target. - -*Next in the series: [part 6 — proving the pipeline](/blog/proving-the-pipeline/).* diff --git a/content/blog/2026-03-07-proving-the-pipeline.md b/content/blog/2026-03-07-proving-the-pipeline.md deleted file mode 100644 index 1039746..0000000 --- a/content/blog/2026-03-07-proving-the-pipeline.md +++ /dev/null @@ -1,94 +0,0 @@ -+++ -title = "Proving the pipeline: verification from component to binary" -description = "Each transformation in the pipeline — fusion, optimization, transcoding — should be provably correct. This is what formal verification of a WebAssembly toolchain looks like, and what it takes to qualify for ISO 26262 and DO-178C." -date = 2026-03-15 -draft = true -[taxonomies] -tags = ["deep-dive", "verification", "series"] -authors = ["Ralf Anton Beier"] -+++ - -*This is part 6 of a series on building a verified WebAssembly pipeline for safety-critical embedded systems. [Part 1](/blog/zero-cost-component-model/) introduces the approach. [Part 5](/blog/synth-kiln-wasm-to-firmware/) covers native transcoding and the runtime layer.* - -{% insight() %} -A formally verified pipeline is qualified once. Every project that uses it inherits that qualification — reducing per-project certification effort from months to weeks. CompCert proved this model works for C compilation. The same economics apply to a verified WebAssembly pipeline: invest once in proving the toolchain correct, then amortize that investment across every product line that uses it. -{% end %} - -## Why proof matters - -For most software, testing is sufficient. You run your test suite, you fuzz, you deploy. If something breaks, you patch and redeploy. - -Safety-critical systems do not work this way. In automotive (ISO 26262), aerospace (DO-178C), and medical devices (IEC 62304), you must demonstrate that your toolchain does not introduce errors — or that you have mitigated the risk of tool errors through additional verification. This is called tool qualification. - -A compiler that has been formally verified to preserve semantics — like [CompCert](https://compcert.org/) for C — changes the qualification equation. Airbus uses CompCert precisely because its verified compilation eliminates a layer of object-code verification that would otherwise be required. The tool is qualified once; every project that uses it benefits. - -The PulseEngine pipeline has the same aspiration: prove that each transformation preserves semantics, so the pipeline itself can be qualified as a tool, not re-verified per project. - -## What needs to be proven - -The pipeline has three transformations, each requiring its own correctness argument: - -### meld: fusion preserves semantics - -meld takes a Component Model component (a tree of core modules, canonical ABI operations, and instantiation chains) and produces a flat core module. The proof obligation: the fused module, when executed with the same host imports, produces the same observable behavior as the original component. - -This is structurally similar to a linking correctness proof. The challenges are specific to the Component Model: - -- **Index space merging** — functions, memories, tables, and globals from multiple modules are renumbered into a single space. Every reference must be correctly rewritten. -- **Canonical ABI resolution** — `canon lift` / `canon lower` pairs are replaced by direct calls with appropriate type conversions. The conversion must be semantically equivalent to what a compliant runtime would produce. -- **Import resolution** — internal imports (module A importing from module B) become direct references. Only host imports remain. - -The formal framework: [Iris-Wasm](https://dl.acm.org/doi/abs/10.1145/3591265) provides mechanized separation logic for Wasm 1.0 in Rocq. A [mechanized formalization of the Wasm spec](https://www.semanticscholar.org/paper/A-Mechanized-Formalization-of-the-WebAssembly-in-Huang/2fde569f52c37fe8e45ebf05268e1b4341b58cbf) proves type safety and memory safety. These provide the foundation, but a fusion-specific proof is new work. - -### loom: optimization preserves semantics - -loom applies optimization passes to the fused module. Each pass must preserve semantics: the optimized function produces the same outputs for the same inputs. - -loom takes two approaches: - -**Translation validation via [Z3](https://github.com/Z3Prover/z3).** After optimizing a function, loom encodes both the original and optimized versions as SMT formulas and asks Z3 to prove equivalence. This is not a proof that the optimizer is correct in general — it is a per-function, per-optimization check that this specific transformation preserved semantics. If Z3 finds a counterexample, the optimization is rejected for that function. - -**Declarative rules via ISLE.** Cranelift's [ISLE](https://github.com/bytecodealliance/wasmtime/tree/main/cranelift/isle) pattern-matching framework allows optimization rules to be expressed declaratively. [Crocus/VeriISLE](https://dl.acm.org/doi/10.1145/3617232.3624862) has demonstrated that ISLE instruction-lowering rules can be verified — they reproduced known bugs and found new ones in Cranelift. The same verification approach could apply to loom's optimization rules once they are expressed in ISLE. - -### synth: transcoding preserves semantics - -synth transcodes Wasm to native code. The proof obligation: the native output, when executed on the target architecture, produces the same behavior as the Wasm input when executed by a compliant interpreter. - -This is the hardest proof in the pipeline. It spans two instruction set architectures (Wasm and the target ISA) and must account for the differences in memory models, calling conventions, and instruction semantics. - -Precedent exists. [VeriWasm](https://github.com/PLSysSec/veriwasm) verifies that native x86-64 output from Wasm compilation preserves software fault isolation properties, with Rocq proofs for the verifier. CompCert proves that compilation from C to multiple target architectures preserves semantics. A synth-specific proof would draw on both approaches. - -The wasm2c path offers an alternative verification strategy: transpile Wasm to C, then rely on a qualified C compiler ([IAR](https://www.iar.com/), [Green Hills](https://www.ghs.com/), [Wind River Diab](https://www.windriver.com/products/diab-compiler)) for the last step. The qualification burden shifts to the C compiler, which may already be qualified for the target domain. - -## The verification tools - -- **[Verus](https://github.com/verus-lang/verus)** — SMT-based formal verification for Rust. Proofs are expressed in Rust syntax, alongside the implementation code. For verifying kiln's interpreter, meld's fusion logic, and loom's optimization passes. -- **[Rocq](https://rocq-prover.org/)** (formerly Coq) — the proof assistant for the mathematical foundations. The semantic model of Wasm, the correctness theorems for each transformation, the proofs that tie everything together. Iris-Wasm and the mechanized Wasm spec are both built in Rocq. -- **[Z3](https://github.com/Z3Prover/z3)** — the SMT solver behind loom's translation validation and Verus's proof obligations. Z3 checks satisfiability of logical formulas — when loom asks "are these two functions equivalent?", Z3 provides the answer. - -## What this means for qualification - -In ISO 26262, a software tool used in the development of safety-related systems must be qualified according to its Tool Confidence Level (TCL). A tool that could introduce errors without detection requires higher qualification effort. - -A formally verified pipeline reduces the TCL requirement: if the tool is proven correct, the confidence in its output is higher, and the qualification effort per project is lower. The tool is qualified once. Each project that uses it inherits that qualification — subject to impact analysis and integration testing, but without re-verifying the tool's internal correctness. - -In DO-178C, the DO-330 supplement provides a framework for tool qualification. A verified compiler (like CompCert) satisfies the highest qualification criteria (TQL-1) because it can be shown to not introduce errors. A verified build pipeline — meld + loom + synth, each proven correct — would target the same classification. - -{% note(kind="warning") %} -None of this exists today in qualified form. No Wasm runtime or toolchain has undergone ISO 26262 or DO-178C qualification. CompCert took years of research and engineering to reach production qualification. We are at the beginning of this path — the tools exist, the verification approaches are being developed, and the goal is clear. Claiming otherwise would be dishonest. -{% end %} - -## The long game - -Qualifying a toolchain for safety-critical use is measured in years, not months. The path: - -1. **Build the tools** — meld, loom, synth, kiln, sigil. Ship working software that people can use. *(In progress.)* -2. **Add verification** — Z3 translation validation in loom, Verus proofs for critical algorithms, Rocq formalization of the semantic model. *(Starting.)* -3. **Demonstrate the pipeline** — end-to-end, from Component Model source to verified firmware, on a real embedded target. *(Future.)* -4. **Pursue qualification** — work with a certification authority to qualify the pipeline for a specific standard and domain. *(Long-term.)* - -Each step builds on the previous one. The tools have to work before they can be verified. The verification has to exist before it can be qualified. - -*Next in the series: [part 7 — the toolchain: hermetic builds and supply chain attestation](/blog/hermetic-toolchain/).* - -If you are working on formal verification of WebAssembly or tool qualification for safety-critical systems — we would like to hear from you. Everything is at [github.com/pulseengine](https://github.com/pulseengine). diff --git a/content/blog/2026-03-12-hermetic-toolchain.md b/content/blog/2026-03-12-hermetic-toolchain.md deleted file mode 100644 index 5f047d0..0000000 --- a/content/blog/2026-03-12-hermetic-toolchain.md +++ /dev/null @@ -1,99 +0,0 @@ -+++ -title = "The toolchain: hermetic builds and supply chain attestation" -description = "A verified pipeline needs a reproducible build system. Bazel provides hermeticity, Nix provides toolchain reproducibility, and sigil attests every transformation. This is how the pieces fit together." -date = 2026-03-18 -draft = true -[taxonomies] -tags = ["deep-dive", "sigil", "bazel", "series"] -authors = ["Ralf Anton Beier"] -+++ - -*This is part 7 of a series on building a verified WebAssembly pipeline for safety-critical embedded systems. [Part 1](/blog/zero-cost-component-model/) introduces the approach. [Part 6](/blog/proving-the-pipeline/) covers formal verification of each transformation.* - -{% insight() %} -Any engineer, on any machine, can reproduce the exact build that produced any released artifact. This is not just good engineering practice — it is a regulatory requirement. ISO 26262 demands traceability from requirements to deployed artifact. DO-178C requires configuration management that can reproduce any released build. A hermetic, attested build pipeline satisfies both from a single infrastructure investment. -{% end %} - -## Why reproducibility matters - -Formal verification proves that a transformation *can* preserve semantics. But if you cannot reproduce the exact build that produced a specific artifact, the proof is disconnected from reality. You need both: correctness (the algorithm is right) and reproducibility (this specific binary was produced by this specific verified pipeline, with these specific inputs). - -For safety-critical qualification, this is not optional. ISO 26262 requires traceability from requirements through implementation to the deployed artifact. DO-178C requires configuration management that can reproduce any released build. If your toolchain produces different outputs on different machines or at different times, qualification becomes impractical. - -[Ferrocene](https://ferrocene.dev/) — the qualified Rust compiler for safety-critical systems — understood this early. Cargo is included in their distribution but explicitly not qualified — qualifying a tool that touches the internet and executes arbitrary build scripts is prohibitively complex. For safety-critical production builds, Ferrocene's [Safety Manual](https://public-docs.ferrocene.dev/main/safety-manual/rustc/constraints.html) requires invoking `rustc` directly, bypassing cargo entirely, with procedural constraints: clean build environments, build monitoring, controlled environment variables. [CriticalUp](https://github.com/ferrocene/criticalup) manages toolchain distribution with cryptographic signature verification. - -The PulseEngine pipeline automates what Ferrocene achieves through procedural controls. Bazel invokes compilers directly (not through cargo), enforces hermeticity through sandboxing, and tracks every input and output in the build graph. Nix provides reproducible toolchain provisioning. sigil attests the chain. - -## Bazel: the build system - -Every tool in the PulseEngine pipeline builds with [Bazel](https://bazel.build/). Bazel's hermetic build model — where every input is declared, every action is sandboxed, and outputs are cacheable and reproducible — aligns directly with the requirements of a qualified toolchain. - -More importantly, Bazel allows the pipeline tools themselves to be composed as build rules. Building a WebAssembly component, fusing it with meld, optimizing with loom, and transcoding with synth are all build actions that Bazel can track, cache, and reproduce. - -### rules_wasm_component - -[rules_wasm_component](https://github.com/pulseengine/rules_wasm_component) is the foundation — Bazel rules for WebAssembly Component Model development. At v1.0.0 with over 770 commits, it is the most mature piece of the build infrastructure. It supports multi-language component builds (Rust, C, C++, Go, MoonBit), dependency management, multi-profile builds, and integration with the broader Bazel ecosystem through the [Bazel Central Registry](https://registry.bazel.build/). - -This is where the Component Model enters the build graph. Components built with these rules flow into the pipeline — meld, loom, synth — as hermetically tracked artifacts. - -### rules_rocq_rust - -[rules_rocq_rust](https://github.com/pulseengine/rules_rocq_rust) provides Bazel rules for the [Rocq](https://rocq-prover.org/) theorem prover and [rocq-of-rust](https://github.com/formal-land/rocq-of-rust) integration. This is how formal proofs enter the build system: Rocq proofs are compiled alongside the Rust implementation, ensuring that proof artifacts stay in sync with the code they verify. - -rules_rocq_rust already uses [Nix](https://nixos.org/) for hermetic toolchain management — Rocq's dependency chain is complex, and Nix provides the reproducibility guarantee that Bazel's sandboxing alone cannot for toolchains with external dependencies. - -### rules_verus - -[rules_verus](https://github.com/pulseengine/rules_verus) brings [Verus](https://github.com/verus-lang/verus) — SMT-based Rust verification — into Bazel. Verus proofs are expressed in Rust syntax, making them natural to integrate with Rust build rules. This is early work — the rules exist but are not yet production-grade. - -### rules_moonbit - -[rules_moonbit](https://github.com/pulseengine/rules_moonbit) provides hermetic [MoonBit](https://www.moonbitlang.com/) toolchain support in Bazel. MoonBit compiles to WebAssembly and is one of the languages supported by rules_wasm_component for component development. - -## Nix: toolchain reproducibility - -Bazel provides hermetic *builds* — every action is sandboxed, inputs are declared, outputs are deterministic. But the *toolchain itself* — the compiler, the linker, the verification tools — must also be reproducible. If two developers have different versions of Rocq or Verus, the proofs may not match. - -[Nix](https://nixos.org/) solves this. A Nix flake pins every toolchain dependency to an exact revision, and Nix's content-addressed store ensures that the same inputs always produce the same outputs. Where Bazel sandboxes the build actions, Nix sandboxes the toolchain provisioning. - -Today, only rules_rocq_rust uses Nix for toolchain management. Extending Nix flakes across the entire pipeline — meld, loom, synth, kiln, sigil — is an active priority. The goal: any developer, on any machine, can reproduce the exact build environment that produced any released artifact. - -## sigil: attesting the chain - -Formal verification proves correctness. Reproducibility ensures consistency. [sigil](https://github.com/pulseengine/sigil) proves provenance: this specific artifact was produced by this specific pipeline. - -sigil uses [Sigstore](https://www.sigstore.dev/) keyless signing for each pipeline stage. The final artifact carries a chain of attestations — one per transformation — that links it back to the source component and the exact tool versions that produced it. - -- **Wasm module signing** — sigil signs Wasm modules using custom sections, following the approach from [wasmsign2](https://github.com/wasm-signatures/wasmsign2). -- **SLSA provenance** — attestations follow the [SLSA](https://slsa.dev/) framework, providing standardized supply chain metadata. -- **In-toto layouts** — pipeline steps are expressed as [in-toto](https://in-toto.io/) layouts, enabling third-party verification of the supply chain. - -### The attestation chain - -Verification says: "meld's fusion algorithm is correct." -Reproducibility says: "this build environment matches the qualified configuration." -sigil says: "this specific ELF was produced by meld v0.1.0 → loom v0.2.0 → synth v0.1.0, from this specific source component, at this time, with these configuration parameters." - -For audit and qualification purposes, this chain is the evidence trail. An assessor can verify not just that the tools are correct, but that the correct tools were actually used to produce the artifact under review. - -### Signing native artifacts - -Today, sigil signs Wasm modules. After synth transcodes to native, the artifact is an ELF binary, not a Wasm module. Signing native artifacts — ELF sections, MCUboot TLV headers for secure boot — is [in progress](https://github.com/pulseengine/sigil/issues/47) to complete the attestation chain from source to deployed firmware. - -{% note(kind="warning") %} -The build infrastructure is real and actively used, but not yet complete. Nix integration is partial — only rules_rocq_rust uses it today. sigil's attestation chain covers Wasm artifacts but not yet native binaries. rules_verus is early. We are building toward full hermetic reproducibility with end-to-end attestation, and we are honest about what is and is not done. -{% end %} - -## Tying it together - -The qualified pipeline vision: - -1. **Bazel** tracks every input, action, and output in the build graph -2. **Nix** ensures the toolchain is identical across all environments -3. **meld + loom + synth** transform components to firmware (each verified — [part 6](/blog/proving-the-pipeline/)) -4. **sigil** attests every step with cryptographic evidence -5. The result: a firmware image with a complete, verifiable chain from source to binary - -This is not a new idea. It is the combination that matters — hermetic builds, verified transformations, and cryptographic attestation, applied to the specific problem of WebAssembly-based safety-critical systems. - -If you are working on reproducible builds for safety-critical toolchains, Bazel for embedded, or supply chain security for WebAssembly — we would like to hear from you. Everything is at [github.com/pulseengine](https://github.com/pulseengine). diff --git a/content/blog/2026-03-21-temper-governance.md b/content/blog/2026-03-21-temper-governance.md deleted file mode 100644 index 04502fe..0000000 --- a/content/blog/2026-03-21-temper-governance.md +++ /dev/null @@ -1,106 +0,0 @@ -+++ -title = "temper: automated governance for a safety-critical toolchain" -description = "Every repository in the PulseEngine organization adheres to the same standards automatically. temper is the GitHub App that enforces signed commits, branch protection, CI attestation, and consistent configuration — no manual enforcement, no drift." -date = 2026-03-21 -draft = true -[taxonomies] -tags = ["deep-dive", "temper", "series"] -authors = ["Ralf Anton Beier"] -+++ - -*This is part 8 of a series on building a verified WebAssembly pipeline for safety-critical embedded systems. [Part 1](/blog/zero-cost-component-model/) introduces the approach. [Part 7](/blog/hermetic-toolchain/) covers the build system and supply chain attestation.* - -{% insight() %} -In safety-critical industries, every audit starts with the same question: can you prove your development process is consistent? temper eliminates configuration drift across the entire toolchain — signed commits, branch protection, CI attestation, dependency tracking — enforced automatically on every repository. The result: when an assessor asks about your development governance, the answer is a dashboard, not a spreadsheet. -{% end %} - -## Why governance matters - -PulseEngine is not one repository. It is over 25 repositories spanning a Wasm runtime, a fusion tool, an optimizer, a native transcoder, build rules, verification tools, and MCP servers. Each repository needs the same branch protection, the same signed-commit requirements, the same CI attestation, the same Dependabot configuration. - -In a safety-critical context, this consistency is not optional. ISO 26262 requires configuration management. DO-178C requires that the development environment is controlled. If repository A requires signed commits but repository B does not, the governance claim applies to neither. - -Manual enforcement does not scale. A developer forgets to enable branch protection on a new repo. A fork relaxes merge settings. Dependabot falls out of sync. These are not hypothetical — they are the default state of any growing organization without automation. - -## What temper does - -[temper](https://github.com/pulseengine/temper) is a [Probot](https://probot.github.io/) v14 GitHub App that automatically configures and hardens every repository in the PulseEngine organization. It reacts to webhook events and enforces compliance through a declarative configuration. - -### Repository hardening - -When a new repository is created — or on demand via ChatOps commands — temper applies: - -- **Merge settings** — squash-only merges (rebase disabled), with fork-aware overrides that allow all strategies for forked repos -- **Branch protection** — required status checks, signed commits, linear history, conversation resolution, admin enforcement, no force-pushes or deletions -- **Issue labels** — synchronizes a standard label set across all repos -- **PR and issue templates** — pushes `.github/PULL_REQUEST_TEMPLATE.md`, issue templates, and `CODEOWNERS` -- **Dependabot configuration** — auto-detects ecosystems (supports 14: npm, cargo, gomod, pip, maven, docker, terraform, and more) and generates appropriate `dependabot.yml` - -### CI attestation - -temper's own CI pipeline generates SPDX SBOMs and attests build provenance via [Sigstore](https://www.sigstore.dev/) — the same attestation framework that [sigil](/blog/hermetic-toolchain/) uses for the pipeline artifacts. This creates consistency: the governance tool and the build tool use the same supply chain security infrastructure. - -### Signed-commit handling - -For repositories that require signed commits (all PulseEngine repos do), temper manages the merge strategy: - -- Temporarily enables merge commits (which preserve GPG signatures) when a PR contains signed commits -- Auto-reverts to rebase-only after a timeout -- This solves the common problem of losing signature verification during squash merges - -### ChatOps - -Ten commands, triggered by commenting `/command` on any issue or PR: - -| Command | Action | -|---------|--------| -| `/configure-repo` | Apply full configuration to the current repo | -| `/sync-all-repos` | Bulk-apply configuration to every org repo | -| `/check-config` | Generate a compliance report | -| `/generate-dependabot` | Auto-detect ecosystems and generate config | -| `/analyze-org` | Full organization analysis report | -| `/review-pr` | Trigger an AI-powered code review | - -## The compliance dashboard - -temper includes an operations dashboard — an embedded HTMX-based web UI that provides: - -- **Organization-wide compliance score** with per-repo breakdown (merge settings, branch protection, signed commits, CI status, labels) -- **Active PR tracker** with check status, labels, and CI status -- **Signal feed** for real-time webhook events and configuration drift detection -- **AI review tracking** with review history and statistics - -This dashboard is not a separate service — it is built into temper and served directly. For audit purposes, it provides a live view of organizational compliance without requiring manual reports. - -## The thrum integration - -temper works in concert with [thrum](/blog/thrum-orchestrator/) — the pipeline orchestrator that dispatches AI coding agents across PulseEngine repositories. When thrum's agents create pull requests: - -1. temper recognizes the `thrum` bot user -2. After CI passes, temper auto-enables GitHub's auto-merge (squash method) -3. The PR merges without manual intervention - -This creates an autonomous development loop: thrum generates and verifies code through formal gates, temper enforces governance and handles the merge. The human-in-the-loop checkpoint lives in thrum (approval gate), not in temper. - -## Architecture - -temper is a Node.js application built on Probot v14: - -- **Configuration** — a single declarative `config.yml` controls all behavior, validated at startup -- **Persistence** — SQLite with WAL mode for task scheduling and idempotent webhook processing -- **Self-update** — a compiled Rust binary handles zero-downtime deployments when the temper repo itself is updated -- **Deployment** — Docker (multi-stage Alpine), PM2, or shared hosting - -### For safety-critical context - -temper is not itself a safety-critical tool — it does not produce artifacts that run on target devices. It is a development environment tool. But its role is essential for the qualification story: it provides *evidence* that the development process meets the governance requirements of ISO 26262 and DO-178C. - -A formally verified compiler that runs in an ungoverned development environment is still a compliance gap. temper closes that gap. - -{% note(kind="warning") %} -temper is at v1.0.0 and actively deployed against the PulseEngine organization. It is not a general-purpose governance tool — it is purpose-built for PulseEngine's specific requirements. The AI review feature uses a local endpoint and is not suitable for all environments. -{% end %} - -*Next in the series: [part 9 — thrum: autonomous AI-driven development with formal verification gates](/blog/thrum-orchestrator/).* - -If you are building governance automation for safety-critical development environments — we would like to hear from you. Everything is at [github.com/pulseengine](https://github.com/pulseengine). diff --git a/content/blog/2026-04-22-overdoing-the-verification-chain.md b/content/blog/2026-04-22-overdoing-the-verification-chain.md new file mode 100644 index 0000000..cd40875 --- /dev/null +++ b/content/blog/2026-04-22-overdoing-the-verification-chain.md @@ -0,0 +1,374 @@ ++++ +title = "Overdoing the verification chain — and mapping it to six safety domains" +description = "The prior posts argued for proofs and for traceability. This one shows the full chain, why I chose to overdo rather than undercommit, and where the stack earns credit across six safety domains — with an honest read on what still does not clear the bar." +date = 2026-04-29 +draft = true +[taxonomies] +tags = ["verification", "deep-dive"] +authors = ["Ralf Anton Beier"] ++++ + +{% insight() %} +Proofs cover all inputs. Tests cover realistic inputs. Concurrency checkers cover every interleaving. Mutation testing covers the test suite itself. Sanitizers catch what unsafe code actually does at runtime. Each technique answers a different question. When you do not yet know which question the next assessor cares about — and you will not, because you work across six standards written by six different committees — overdoing is the only honest default. +{% end %} + +## The arc so far + +[Formal verification just became practical](/blog/formal-verification-ai-agents/) made the case that AI agents collapse the cost of writing proof annotations. [What comes after test suites](/blog/what-comes-after-test-suites/) argued that proofs layer — Verus for functional contracts, Rocq for abstract models, Lean for scheduling theory, Kani for bounded state exploration, tests and sanitizers for everything else — and that rivet[^rivet-184] holds the traceability together. + +This post zooms out. What does the whole chain look like, why is overdoing the right default, and where does the same stack earn credit across more than avionics? + +## Why overdo is the default + +Regulated domains reward defense in depth, not minimum-viable verification. Each technique has a blind spot. Proofs miss the cases where your specification is wrong. Tests miss the inputs you did not think to try. Model checkers miss states beyond their bound. Sanitizers only see what ran. Human review misses what scrolled past. + +Combinations shrink the blind spot faster than any single technique improves. Adding a second, independent tool at the same layer is often cheaper than tightening the first one beyond diminishing returns. + +And AI-velocity code grows the verification surface. If an agent can implement a feature in minutes, the proof annotations, tests, benches, and traceability artifacts have to keep pace on the same clock. A pipeline that runs proofs, tests, sanitizers, and mutation testing on every push scales to meet that. One that runs fewer does not. + +The cost of overdoing is CI budget. The cost of undercommitting is a certification campaign that stalls because one technique the assessor expected is missing. I know which I would rather pay. + +## The chain, layer by layer + +Ten layers, each answering a different question. No project names in this table — this is the technique picture, not the adoption picture. + +{% mermaid() %} +flowchart TB + subgraph allIn["proves for all inputs"] + direction TB + lean[Lean · Rocq] + verus[Verus · Rocq-of-Rust] + kani["Kani (bounded)"] + ztv[Z3 translation validation] + end + subgraph manyIn["covers many realistic inputs"] + direction TB + prop[proptest] + loom["tokio-rs/loom
every interleaving"] + mut["cargo-mutants
tests the tests"] + crit["criterion
perf regression"] + end + subgraph runtimeObs["observes what actually runs"] + direction TB + miri[Miri] + san[ASAN · TSAN · LSAN · UBSAN] + end + rivet["rivet — traceability
across all three"] + + allIn -.-> rivet + manyIn -.-> rivet + runtimeObs -.-> rivet + + classDef grp fill:#13161f,stroke:#3d4258,color:#8b90a0; + classDef tool fill:#1a1d27,stroke:#6c8cff,color:#e1e4ed; + classDef trace fill:#1a1d27,stroke:#4ade80,color:#e1e4ed; + class allIn,manyIn,runtimeObs grp; + class lean,verus,kani,ztv,prop,loom,mut,crit,miri,san tool; + class rivet trace; +{% end %} + +| Layer | Answers | Techniques | +|---|---|---| +| Pure mathematical logic | Is the theory sound? | Lean 4 + Mathlib, Rocq | +| Functional contracts on Rust | Does the code satisfy its spec for all inputs? | Verus (SMT/Z3), Rocq-of-Rust, Aeneas (planned refinement into Lean) | +| Bounded state-space exhaustion | Does it hold up to a realistic bound? | Kani (CBMC) | +| IR-to-IR translation | Does the pipeline preserve what was proved at the source? | Bespoke Z3 translation validation on the WASM optimizer | +| Property-based sampling | Does it hold on randomised realistic distributions? | proptest | +| Concurrency exhaustion | Does it hold under every thread interleaving? | tokio-rs/loom | +| Runtime observation of unsafe code | Do the unsafe regions actually behave? | Miri, ASAN, TSAN, LSAN, UBSAN | +| Test-suite adequacy | Would my tests detect a real bug? | cargo-mutants | +| Performance regression | Did the latency budget hold? | criterion — before any hardware test | +| Traceability | Which requirement does each of the above satisfy? | rivet | + +Operationally, those layers run through **four gates** on every push: + +{% mermaid() %} +flowchart LR + push([developer push]) + pc["pre-commit
rivet 21-hook template"] + bz["bazel test //...
hermetic"] + gh["GitHub Actions
miri · sanitizers · proptest
fuzz · differential · mutation"] + kv["cargo-kiln verify-matrix
ASIL profile"] + hw([hardware]) + + push --> pc --> bz --> gh --> kv --> hw + + classDef endpoint fill:#13161f,stroke:#4a5068,color:#e1e4ed; + classDef gate fill:#1a1d27,stroke:#6c8cff,color:#e1e4ed; + class push,hw endpoint; + class pc,bz,gh,kv gate; +{% end %} + +1. **Pre-commit** — shift-left. rivet's 21-hook template is the reference: formatting, clippy, cargo-test, cargo-audit, cargo-deny, cargo-bench-check, cargo-mutants, rivet-validate. +2. **Bazel `test //...`** — the hermetic gate. Compiles and runs verus_test, kani_test, rocq_proof_test, lean_proof_test alongside rust_test. +3. **GitHub Actions** — the matrix gate. Miri, sanitizers, proptest, fuzz smoke, differential, mutation, integration. +4. **ASIL-profile verify** — the pre-release gate. ASIL-D config exercises the strictest path in CI before anything reaches a board. + +Hardware comes after criterion benchmarks have cleared their regression budget. Silent performance drift is just as disqualifying as a failed proof, and it is cheaper to catch it in minutes on a laptop than in hours on a bring-up rig. + +## Where we stand — honest assessment + +Traffic-light at the chain-layer level, not the project level. The project-level matrix lives in the tracking issue[^rivet-184]. + +| Layer | Status | +|---|---| +| Mathematical logic (Lean, Rocq) | ✅ shipping | +| Functional SMT contracts (Verus) | ✅ shipping at scale; Z3 proof certificates not yet independently checkable | +| Refinement from Rust to Lean (Aeneas) | ◐ scaffolded; hermetic Charon pending[^rules-lean-1] | +| Bounded model checking (Kani) | ✅ shipping | +| Translation validation (Z3 on WASM IR) | ✅ shipping, bespoke | +| Abstract interpretation | ❌ not yet — the one missing third DO-333 technique class | +| proptest / tokio-rs/loom / sanitizer | ✅ present but unevenly adopted across the estate | +| Mutation testing | ◐ shipping at pre-commit in one repo; generalizing via a canonical template | +| Traceability (rivet) | ✅ shipping, living artifact | + +## Six standards, one chain + +Most safety-critical standards are published behind per-copy paywalls. I cannot quote them verbatim. What follows is grounded in open-access primary sources where available (NASA-STD-8719.13[^nasa-std-8719], ECSS-Q-ST-80C Rev.2[^ecss-80c], NASA Langley's DO-333 case study materials[^nasa-langley-do333][^nasa-cr-2014], peer-reviewed papers[^mdpi-do333]) and in named secondary interpretations from qualified vendors and consultants where not[^ldra-do332][^ldra-26262][^ldra-61508][^adacore-en50128][^afuzion-do332][^afuzion-do330][^visure-do333][^tuv-61508][^milemb-do332][^rapita-do332]. Every claim below cites at least one source; paywall caveats are consolidated in the Sources section. + +| Standard | Domain | FM posture | Our fit | Biggest gap | +|---|---|---|---|---| +| DO-178C + DO-333 | Avionics (DAL A–C) | DO-333 permits formal analysis in place of testing under soundness + completeness conditions[^mdpi-do333][^visure-do333] | Two of three DO-333 technique classes; WASM-IR translation validation is a §FM.6.7(f)-style asset[^nasa-cr-2014] | DO-330 qualification dossier[^afuzion-do330] for Verus / Kani / Z3 TV | +| ISO 26262 (Part 6) | Automotive (ASIL A–D) | Formal verification "highly recommended" at ASIL D as summarised in vendor compliance guides[^ldra-26262][^parasoft-26262] | Strong fit; ASIL-profile gate already in the CLI | Tool Confidence Level qualification[^ldra-26262] | +| IEC 61508-3 | General FS (SIL 1–4) | Formal methods "highly recommended" at SIL 3/4; Annexes A/B list techniques including theorem proving[^ldra-61508][^tuv-61508] | Strong fit; HOL lineage welcomed | Proven-in-use or §7.4.4 qualification argument[^ldra-61508] | +| EN 50128 | Railway (SIL 0–4) | Formal proof explicitly "highly recommended" at SIL 3/4[^adacore-en50128][^en50128-academia] | Best cultural fit — theorem proving is the railway norm (B-method heritage) | Lean lacks certification precedent vs. B / Coq[^en50128-academia] | +| IEC 62304 + FDA CSA | Medical (Class A/B/C) | Silent on specific tools; accepts technique plus traceability | Traceability via rivet is the usable artifact | Mapping proofs into DHF + ISO 14971 risk controls | +| ECSS-Q-ST-80C Rev.2 | Space (Cat A/B) | Recommended; no rigid qualification regime[^ecss-80c] | Fits; artifacts are assessor-checkable | No established credit template for newer provers | + +Nuclear (IEC 60880) sits outside this table deliberately. Regulator acceptance of SMT-backed arguments is harder to establish than the other six domains, and the pedigree preference for tabular methods means an SMT-first stack walks in with a disadvantage. + +A one-line gloss for each technique in the matrix below — skip if these read like home: + +- **Theorem proving** — machine-checked mathematical proofs (Lean, Rocq) that a specification holds for all inputs. The proof term is independently re-checkable by a small kernel. +- **SMT contracts** — `requires` and `ensures` annotations on Rust functions, discharged automatically by an SMT solver (Z3, behind Verus). Fast in CI; limited to decidable theories. +- **Bounded MC** — *bounded model checking*: exhaustive exploration of every execution up to a finite bound on loop iterations and state depth. Kani for Rust. +- **Translation validation** — a per-pass proof that a compiler transformation preserves semantics, run on the actual input. Lets a source-level proof extend to the executable. Our WASM optimizer uses bespoke Z3 encoding here. +- **Abstract interpretation** — execute the program over mathematical sets (`non-negative integer`, `[0, 255]`) instead of concrete values, to compute a sound over-approximation of all possible behaviours without running it on concrete inputs. Astrée is the canonical industrial tool. +- **proptest** — property-based testing with random input generation and failure shrinking. +- **tokio-rs/loom** — permutation-checks every possible thread interleaving in a bounded concurrent program. +- **Sanitizer · Miri** — runtime instrumentation that detects undefined behaviour, memory errors, and data races (ASAN, TSAN, LSAN, UBSAN, Miri). +- **Mutation testing** — inject small plausible bugs into the source and check whether the test suite catches them; empirical test-suite adequacy. +- **Traceability** — requirement ↔ design ↔ code ↔ test ↔ proof chain, validated on every commit (rivet). + +At a glance, all seven domains against the core chain techniques: + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Theorem
proving
SMT
contracts
Bounded
MC
Translation
validation
Abstract
interp.
proptesttokio-rs
loom
Sanitizer
 · Miri
Mutation
testing
Trace-
ability
DO-178C + DO-333 (avionics)
ISO 26262 ASIL D (automotive)
IEC 61508 SIL 4 (general FS)
EN 50128 SIL 4 (railway)
IEC 62304 Class C (medical)
ECSS-Q-ST-80C Cat A (space)
IEC 60880 Cat A (nuclear)
+
+ +
+ strong fit + partial — needs qualification or conditions + gap today + standard silent +
+ +## What becomes useful per domain + +Concrete reads on where the same chain would actually cash in. + +- **Automotive (ASIL D).** Kani with ASIL profiles plus rivet traceability plus cargo-mutants answers exactly what Part 6 unit-verification tables and Part 8 tool-confidence guidance expect to see[^ldra-26262]. Fits without rework. +- **Avionics (DAL A–C).** The WASM-IR translation validation is the under-valued asset; it is the kind of translation-preservation argument DO-333's source-to-object rules were written for[^nasa-cr-2014][^mdpi-do333]. Qualification under DO-330 is the work. +- **Railway (SIL 4).** The cultural fit is strong; the specific move is to port critical Lean theorems to Rocq for dossier gravitas — Coq has industrial precedent[^en50128-academia] that Lean does not yet — and to show a worked Aeneas refinement to bridge Rust implementation to the proof[^aeneas-paper]. +- **Medical (Class C + FDA CSA).** Primarily a traceability and risk-control mapping play. rivet is the usable artifact. Proofs are a bonus the standard does not strictly require. +- **Space (Category A).** Assessor-checkable artifacts (proof terms, SMT certificates) are where realistic credit lives[^ecss-80c]. Dual-formalization of critical modules in Rocq strengthens this. +- **General functional safety (SIL 4).** The baseline. If this works, the four above largely follow[^ldra-61508][^tuv-61508]. +- **Nuclear (Category A).** Hardest. Defer unless mission-critical. If pursued, pair SMT with tabular specifications in the dossier; do not rely on SMT alone. + +## Where we go from here + +Three pieces visibly missing. + +**Abstract interpretation — the third DO-333 technique class.** This is the single layer of the chain we do not yet run. Abstract interpretation executes a program in a mathematical universe where concrete values (`42`) are replaced by sets (`non-negative integer`, `[0, 255]`, `may-alias set`) and computes a *sound over-approximation* of every possible behaviour without running the program on concrete inputs. It is particularly strong at integer overflow, division-by-zero, out-of-bounds accesses, and numerical precision loss — exactly the classes of runtime failure that SMT is awkward with and bounded model checking cannot guarantee outside its bound. + +Three candidate paths, in rough order of "credibility already established" vs "fits our toolchain": + +- **[Astrée](https://www.absint.com/astree/)** (AbsInt, commercial). The canonical AI tool. Airbus A380 and A350 used it for DO-178B/C DAL A credit, and that precedent is exactly what a certification authority wants to see. Supports C and Ada — not Rust. For PulseEngine it would only fit the C-shim boundary (WASI host intrinsics, FFI shims), and it drags in a separate toolchain we would otherwise avoid. Licence cost is in the tens of thousands of EUR per year. +- **[MIRAI](https://github.com/facebookexperimental/MIRAI)** (Meta, research-grade). An abstract interpreter for Rust MIR, designed for the language. Covers integer overflow, panics, and some memory properties out of the box. Recent activity is low (steady since 2022), but the code is open source and the property classes it targets are the ones that matter for ASIL-D and DAL-A credit stories. Fastest way to see whether AI adds signal on our codebase — weekend-scale prototype. +- **Charon-based value analysis.** The strategic fit. Our pending Aeneas integration[^rules-lean-1] exposes Rust MIR as LLBC through Charon. A modest abstract interpreter built directly on LLBC — interval, sign, and may-alias lattices to start — keeps a single Rust-MIR toolchain, reuses the Nix-hermetic build, and produces analysis artefacts that live alongside the Lean and Rocq proofs. Rough effort: 2–4 months of focused work to a first useful pass. + +The plan: prototype with MIRAI this quarter to confirm which property classes actually pay off on our code; plan Charon-based value analysis as the longer-term investment once Aeneas is running end-to-end. Astrée stays on the shelf for the C-shim boundary if and when a specific certification programme asks for it. + +**The Check-It pattern.** NASA's own tool-qualification research[^nasa-cr-2017] describes the shape: untrusted prover emits a checkable proof certificate, tiny trusted checker validates it, only the checker is qualified under DO-330. Rocq and Lean work this way by construction. Verus, Kani, and the WASM-IR translation validator do not yet. Building certificate emitters and independent checkers collapses the DO-330 problem from "qualify Z3" (infeasible) to "qualify a small checker" (tractable). This is likely the next post's topic. + +{% mermaid() %} +flowchart LR + untrusted["untrusted prover
Verus · Kani · Z3"] + cert["proof certificate
SMT unsat core
CBMC trace
checkable artifact"] + checker["trusted checker
small · auditable
qualified under DO-330"] + ok([qualified verification]) + + untrusted -->|emits| cert + cert -->|validated by| checker + checker --> ok + + classDef risky fill:#13161f,stroke:#f87171,color:#e1e4ed; + classDef transit fill:#1a1d27,stroke:#fbbf24,color:#e1e4ed; + classDef trusted fill:#1a1d27,stroke:#4ade80,color:#e1e4ed; + classDef endpoint fill:#13161f,stroke:#4ade80,color:#e1e4ed; + class untrusted risky; + class cert transit; + class checker trusted; + class ok endpoint; +{% end %} + +**Aeneas end-to-end.** The rules_lean Bazel wiring is scaffolded but not yet running; hermetic Charon is the missing piece. Tracked at rules_lean#1[^rules-lean-1]. The rules_verus sysroot-handling pattern transfers almost directly. + +Twenty-five coordinated issues in the tracker[^rivet-184] raise the floor across the estate — not to reach "done," but to raise the worst cell in the matrix until every project runs every technique its surface supports. Overdoing, applied. + +--- + +## Sources and caveats + +Most safety-critical software standards — DO-178C and its supplements, ISO 26262, IEC 61508, IEC 62304, IEC 60880, EN 50128 — are published by standards bodies (RTCA, EUROCAE, ISO, IEC, CENELEC) behind per-copy paywalls. I cannot quote them verbatim. Claims about those standards above are backed either by freely available primary sources (NASA standards, ECSS PDFs, NASA Langley materials, peer-reviewed papers) or by named secondary interpretations from qualified vendors and consultants (LDRA, AdaCore, AFuzion, Parasoft, TÜV SÜD, Visure, Rapita, Military Embedded Systems). Each footnote below flags its tier inline. + +[^rivet-184]: pulseengine/rivet issue #184 — *Tracking: pulseengine-wide V&V coverage initiative*. [github.com/pulseengine/rivet/issues/184](https://github.com/pulseengine/rivet/issues/184). Live roadmap for the 25 coordinated adoption issues. + +[^rules-lean-1]: pulseengine/rules_lean issue #1 — *Finish Aeneas end-to-end pipeline*. [github.com/pulseengine/rules_lean/issues/1](https://github.com/pulseengine/rules_lean/issues/1). Tracking issue for hermetic Charon wiring. + +[^nasa-std-8719]: NASA — *NASA-STD-8719.13 Software Safety Standard*. [standards.nasa.gov](https://standards.nasa.gov/standard/NASA/NASA-STD-871913). Primary (open access — "cleared for public accessibility on the internet"). + +[^ecss-80c]: European Cooperation for Space Standardization — *ECSS-Q-ST-80C Rev.2 Software Product Assurance*, 30 April 2025. [Direct PDF](https://ecss.nl/wp-content/uploads/2025/05/ECSS-Q-ST-80C-Rev.2(30April2025).pdf). Primary (open access; ECSS license-agreement registration required but free). + +[^nasa-langley-do333]: NASA Langley Research Center — *DO-333 Case Studies* overview page. [shemesh.larc.nasa.gov/fm/DO-333-case-studies.html](https://shemesh.larc.nasa.gov/fm/DO-333-case-studies.html). Primary (open access). NASA interpretation of DO-333, not DO-333 itself. + +[^nasa-cr-2014]: Darren Cofer and Steven Miller — *Formal Methods Case Studies for DO-333*. NASA/CR-2014-218244, April 2014. [NTRS](https://ntrs.nasa.gov/citations/20140004055). Primary (open access). The reference case-study report for DO-333 practice. + +[^mdpi-do333]: Yang et al. — *Formal Analysis and Verification of Airborne Software Based on DO-333*. MDPI Electronics 9(2):327, 2020. Peer-reviewed. [Open access](https://www.mdpi.com/2079-9292/9/2/327). + +[^ldra-do332]: LDRA — *DO-332 Supplement guide*. [ldra.com/do-332](https://ldra.com/do-332/). Secondary (interpretation of paywalled RTCA DO-332). + +[^ldra-26262]: LDRA — *ISO 26262 guide*. [ldra.com/iso-26262](https://ldra.com/iso-26262/). Secondary (interpretation of paywalled ISO 26262). + +[^ldra-61508]: LDRA — *IEC 61508 guide*. [ldra.com/iec-61508](https://ldra.com/iec-61508/). Secondary (interpretation of paywalled IEC 61508). + +[^adacore-en50128]: AdaCore — *EN 50128 compliance*. [adacore.com/industries/rail/en50128](https://www.adacore.com/industries/rail/en50128). Secondary (interpretation of paywalled CENELEC EN 50128). + +[^afuzion-do332]: AFuzion — *DO-332 Introduction*. [afuzion.com/do-332-introduction-object-oriented-technology](https://afuzion.com/do-332-introduction-object-oriented-technology/). Secondary. + +[^afuzion-do330]: AFuzion — *DO-330 Introduction — Tool Qualification*. [afuzion.com/do-330-introduction-tool-qualification](https://afuzion.com/do-330-introduction-tool-qualification/). Secondary. + +[^visure-do333]: Visure Solutions — *DO-333 guide*. [visuresolutions.com/aerospace-and-defense/do-333](https://visuresolutions.com/aerospace-and-defense/do-333/). Secondary. + +[^tuv-61508]: TÜV SÜD — *IEC 61508 Functional Safety*. [tuvsud.com/en-us/services/functional-safety/iec-61508](https://www.tuvsud.com/en-us/services/functional-safety/iec-61508). Secondary. + +[^milemb-do332]: Military Embedded Systems — *DO-332, the Liskov Substitution Principle, and local type consistency ramp up DO-178 certification*. [militaryembedded.com](https://militaryembedded.com/avionics/safety-certification/do-332-liskov-consistency-ramp-do-178-certification). Secondary. + +[^rapita-do332]: Rapita Systems — *DO-332 overview*. [rapitasystems.com/do-332](https://www.rapitasystems.com/do-332). Secondary. + +[^parasoft-26262]: Parasoft — *ISO 26262 and ASIL requirements*. [parasoft.com/learning-center/iso-26262](https://www.parasoft.com/learning-center/iso-26262/what-is/). Secondary. + +[^en50128-academia]: *The new CENELEC EN 50128 and the use of formal method* (academia.edu preprint). [Open access](https://www.academia.edu/143234284/The_new_CENELEC_EN_50128_and_the_usedof_formal_method). Peer-community preprint; supports the claim that EN 50128:2011 treats formal proof as highly recommended at SIL 3/4. + +[^nasa-cr-2017]: NASA — *Formal Methods Tool Qualification*. NASA/CR-2017-219371. [Open PDF](https://shemesh.larc.nasa.gov/fm/FMinCert/NASA-CR-2017-219371.pdf). Primary (open access). Source of the Check-It / Kind 2 pattern referenced in this post. + +[^aeneas-paper]: Son Ho and Jonathan Protzenko — *Aeneas: Rust Verification by Functional Translation*. ICFP 2022. [arXiv 2206.07185](https://arxiv.org/abs/2206.07185). Primary (open access). diff --git a/content/blog/2026-04-24-variant-pruning-rust-mcdc.md b/content/blog/2026-04-24-variant-pruning-rust-mcdc.md index 19d5469..724da71 100644 --- a/content/blog/2026-04-24-variant-pruning-rust-mcdc.md +++ b/content/blog/2026-04-24-variant-pruning-rust-mcdc.md @@ -1,7 +1,7 @@ +++ title = "MC/DC for AI-authored Rust is tractable — the variant-pruning argument" description = "The received wisdom is that Rust's pattern matching makes MC/DC harder than C. Under variant-managed AI-authored code, the opposite is true. Five layers of variant pruning, one oracle per layer, and a certification burden proportional to the single variant you ship — not the combinatorial product." -date = 2026-04-24 +date = 2026-04-30 draft = true [taxonomies] tags = ["verification", "process", "deep-dive"] diff --git a/content/blog/2026-04-25-witness-wasm-mcdc.md b/content/blog/2026-04-25-witness-wasm-mcdc.md index bade8c8..5dc182c 100644 --- a/content/blog/2026-04-25-witness-wasm-mcdc.md +++ b/content/blog/2026-04-25-witness-wasm-mcdc.md @@ -1,8 +1,9 @@ +++ title = "witness — MC/DC for the WebAssembly component model" description = "The variant-pruning post argued that MC/DC on AI-authored Rust is tractable at Wasm level. This post is the tool: witness instruments a Wasm module, runs a test harness, emits a branch-coverage report, and composes with rivet and sigil for the full evidence chain. v0.1 today, Check-It-pattern qualification on the roadmap." -date = 2026-04-25 +date = 2026-04-30 draft = true +hold = true [taxonomies] tags = ["verification", "wasm", "deep-dive"] authors = ["Ralf Anton Beier"]