refactor(cosim): CosimBackend seam + de-Metal run_cosim + module split + CpuBackend (#105 Phase 0 + Phase 1 steps 1-5)#118
Open
robtaylor wants to merge 31 commits into
Open
refactor(cosim): CosimBackend seam + de-Metal run_cosim + module split + CpuBackend (#105 Phase 0 + Phase 1 steps 1-5)#118robtaylor wants to merge 31 commits into
robtaylor wants to merge 31 commits into
Conversation
…e_ops_mut Phase 0 step 1 of the backend-portability seam (#105, ADR 0017 Amendment 2026-06-07). The three per-edge ops-patching closures (reset, model-driven inputs, model clock edges) and the --check-with-cpu replay each open-coded the same unsafe `from_raw_parts[_mut](buf.contents() as *mut/*const BitOp, len)` slice over a schedule edge's shared MTLBuffer. Collapse those into `ScheduleBuffers::edge_ops_mut`/`edge_ops` accessors. The accessor name and signature deliberately match the `edge_ops_mut` method on the forthcoming `CosimBackend` trait, so call sites need no renaming when `ScheduleBuffers` is later subsumed into `MetalBackend`. On Metal the slice is zero-copy over shared memory (the write is the upload); a CUDA/HIP backend will back the same accessor with a host mirror + dirty-flag + lazy upload. Behaviour-preserving: all 7 Metal cosim fixtures byte-identical to the pre-refactor golden; `cargo test --lib --features metal` 298 pass. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
Phase 0 of the cosim backend-portability seam (#105, ADR 0017 Amendment 2026-06-07). Factors `run_cosim`'s design execution + state ownership behind a batch-granular `CosimBackend` trait, implemented by a new `MetalBackend` that owns the `MetalSimulator`, the `[2 × state_size]` design-state buffer, the per-edge schedule storage, and every GPU IO buffer (flash / UART / bus-trace / SRAM / blocks / event). Key points: - The backend owns its schedule storage, materialised once via `init_schedule` from a backend-agnostic `Vec<Vec<BitOp>>` description; orchestration keeps only scalars (`edges_per_period`, `gcd_ps`). No parallel copy the backend re-materialises. - `run_edges` / `profile_kernels` / `wait` forward to the UNCHANGED `MetalSimulator::encode_and_commit_gpu_batch` / `profile_gpu_kernels` / `spin_wait` using the owned fields, so the GPU-encoding logic is byte-for-byte unchanged from the pre-seam call site. - The three per-edge ops-patching closures (reset / model-driven inputs / model clock edges) become free functions taking `&mut dyn CosimBackend` and patch through `edge_ops_mut` — the "closure-borrow" resolution noted in ADR 0017 (sequential calls, no long-lived backend borrow). - `edge_ops_mut` takes `&mut self` so the seam enforces exclusive access at compile time over the interior-mutable Metal shared memory. Trait scope is exactly what the Metal driver uses; `state()`/`state_mut()` (plan's output_state/input_state_mut) and a backend-agnostic VCD ring land in Phase 1 with `CpuBackend` (noted in-code). The module split to cosim/{mod,metal}.rs remains separable/cosmetic. Behaviour-preserving: all 7 Metal cosim fixtures byte-identical to the pre-refactor golden; jtag_minimal 4M-edge replay PASS (data0_obs=0xCAFEBABE); `cargo test --lib --features metal` 298 pass. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
Phase 0 (#105) landed on this branch: edge_ops_mut accessor + CosimBackend trait/MetalBackend extraction, Metal bit-identical (harness + 4M JTAG verified). Re-frame next-up as Phase 1 (CpuBackend + Linux CI) and record the two Phase-0 deferrals (state accessors, de-Metal the VCD ring) that feed into it. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
Concrete implementation plan for Phase 1 of #105, stacked on the Phase 0 seam (#118). Records the chosen interface approach (fat backend constructor / ADR Layer-1/2 split), the trait additions to de-Metal run_cosim's ~54 setup sites, the required cosim_metal.rs → cosim/{mod,metal} module split, the CPU peripheral-decode strategy (reuse CppSpiFlash + BusTraceDecoder; port UART TX decode to CPU), a 7-step bit-identical-gated sequencing, and the cross-backend-equivalence testing strategy. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…ddressed) Address the plan-reviewer's 4 critical issues + gaps (all evidenced against the real code): - Drop `flash_d_i` from the trait (Metal vs CppSpiFlash update d_i at different dispatch points); CpuBackend absorbs CppSpiFlash::step internally. - Specify the `vcd_snapshot` ring contract ([input|output] slot layout, CpuBackend fills slot 0, drain loop refactored off raw ring.contents()). - `--check-with-cpu` becomes a no-op-with-warning under CpuBackend (never compare the reference backend to itself). - Decompose step 2 (the ~54-site relocation) into 2a/2b/2c, each Metal bit-identical; add the module-split compile gate (zero metal:: in mod.rs, `cargo check --lib` no-feature) with front-loaded exit assertions in steps 1/3. - Note the de-Metaling of the ~100 lines of loop-body diagnostics (dff-dump/trace-signals/deep-diag). - New CpuBackend::new asserts: reject SRAM+xprop (simulate_block_v1 has no sram_xmask) and timing-arrivals (GPU-ring readback); size state Vec to effective_state_size*2. - Acknowledge WB-trace stub, input-only models / empty step_edge output, multi-stage (blocks × num_major_stages); fix the cmd_cosim hard-error ref (~1684, not the sim-path 507). Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…nification) Fill in the Tier-2 `GpuPeripheral` seam beyond the `encode_step` one-liner: specify the single peripheral contract that the CPU model (Tier 1), GPU kernel (Tier 2), and single source (Tier 3) all express, so they aren't two parallel interfaces. Records: (1) the observe→FSM→drive+emit shape is common to every peripheral and the CPU `PeripheralModel` trait is already this contract and already bidirectional (optional input/output halves cover GPIO/UART/bus/flash); (2) the GPU half is three bespoke kernels with one common skeleton + a hand-synced `#[repr(C)]` layout — the shared layout is the consistency anchor, the hand-sync is the tax Tier 3 removes; (3) the key decision — express ALL input drives as (position,value) ops applied through state_prep, normalising flash's bespoke direct-write so input application is uniform across peripherals and both substrates; (4) what deliberately stays substrate-specific (ring drain, FSM body). Adds the Phase-1 implication that the new CPU UART-TX decoder mirror `UartDecoderState` so Phase 2/3 fold into one definition. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
… (P1.1) Phase 1 step 1 of #105. Removes the `metal::Buffer` leak from the `CosimBackend` trait and moves the per-edge VCD snapshot ring into the backend, so the seam stops exposing Metal types: - `run_edges(batch, schedule_offset)` drops its `Option<&metal::Buffer>` parameter; the ring is now a `MetalBackend` field sourced internally. - `enable_vcd_ring()` / `vcd_snapshot(edge) -> &[u32]` trait methods replace the run_cosim-local `vcd_ring_buffer` + raw `ring.contents()` pointer-math drain (now a `vcd_snapshot(i)` loop over agnostic [input|output] slots). - `flash_set_in_reset(bool)` replaces the direct `flash_state_buffer.contents() as *mut FlashState` in_reset write. State/SRAM read accessors (`state`/`sram`) and the mutable `state_mut`, plus the routing of the ~15 diagnostic `.contents()` reads (entangled with flash-FlashState reads), are deferred to step 3 (make run_cosim generic) and step 5 (CpuBackend), where they are handled coherently — kept off the trait here to match exactly what the Metal driver calls today. Behaviour-preserving: 7 cosim fixtures byte-identical to the Phase 0 golden (incl. the VCD outputs that exercise the ring path); cargo test --lib --features metal 298 pass. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…ash_buffers (P1.2a) Phase 1 step 2a. Relocate the ~120-line GPU SPI-flash buffer allocation + init (FlashState / FlashDinParams / FlashModelParams / flash_data firmware load) out of run_cosim's inline body into an associated `MetalBackend::build_flash_buffers` in the Metal impl. Pure relocation; a permanent piece of the eventual MetalBackend::new (which composes build_flash + build_uart + build_bus + build_state). Bit-identical: 7 cosim fixtures byte-identical to the Phase 0 golden. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
… (P1.2b) Phase 1 step 2b. Relocate the ~110-line UART + Wishbone-trace + bus-trace buffer allocation + init (the gpu_io_step peripheral buffers) out of run_cosim into an associated `MetalBackend::build_io_buffers`, returning the seven buffers plus the CPU-side `bus_lanes` decoders. Pure relocation; companion to build_flash_buffers in the eventual MetalBackend::new. Bit-identical: 7 cosim fixtures byte-identical to the Phase 0 golden (incl. dual_uart + apb_trace, which exercise these buffers). Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
Flash + gpu_io_step buffer setup extracted to MetalBackend::build_flash_buffers / build_io_buffers (75c9ec0, a93812a), bit-identical. Record 2c (the trickier state/sram/event/blocks extraction + xprop seed) and the MetalBackend::new assembly as the next moves. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…next Sync the handoff header to the current branch tip + CI status (fully green across Phase 0 + P1.1 + P1.2a + P1.2b) and note the PR description should be rescoped to Phase 0 + Phase 1. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…d_state_buffers (P1.2c) Relocate the design-state, SRAM, blocks-program, and event buffer allocation + buffer-intrinsic init out of run_cosim into a new static MetalBackend::build_state_buffers, mirroring 2a/2b. Moved: states_buffer alloc + fill, the xprop X-mask seed of both slots, sram_data/sram_xmask alloc + fill, the SRAM ELF preload, the blocks_start/blocks_data no-copy wrappers, and the leaked-Box event buffer. The agnostic stimulus deposits (reg_init / reset / constant_ports / set_flash_din) and sram_dumper stay inline in run_cosim, operating on a states slice re-derived over the returned states_buffer. timing_constraints_buffer also stays inline (not in 2c scope). build_state_buffers returns event_buffer_ptr (the *mut EventBuffer from Box::into_raw) so run_cosim keeps ownership of the leaked box; the two existing drop(Box::from_raw(event_buffer_ptr)) sites are unchanged, and no Drop impl is added. Metal output bit-identical (7 fixtures match the golden; the only shasum delta is run_params.json::master_seed, which is rand::random() and flips between runs of the same binary). 298 lib tests pass. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
… 3 next Step 2c extracted state/sram/event/blocks buffer setup into MetalBackend::build_state_buffers; Metal bit-identical, 298 tests pass. Next up: step 3 (generic run_cosim<B>). Also notes the golden.sums run_params.json drop (per-run random master_seed, never bit-identical). Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…reads (P1.3a)
Add CosimBackend::{state,state_mut,sram}; MetalBackend gains sram_len.
Route the ~14 read-only states_buffer/sram_data_buffer loop-body reads
through state()/sram(). run_cosim stays concrete-typed; the generic flip
+ flash/drain seam are 3b. Metal bit-identical, 298 tests pass.
Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…ric flip) Step 3's one-liner under-specified the loop-body work. 3a (state/sram accessors) landed @ d5a029f. Document the decoded-records seam (ADR 0017 Layer 3) for 3b-i (flash diagnostics + uart/wb/bus drains off concrete fields) and 3b-ii (MetalBackend::new fat constructor + generic flip). Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…rete fields (P1.3b-i)
Decoded-records seam (ADR 0017 Layer 3): add CosimBackend::{flash_d_i,
flash_debug_snapshot, drain_uart_tx, drain_bus_beats, drain_wb_trace_debug,
uart_decoder_debug, debug_flash_raw_tick0}. Flash const-params become
agnostic locals; uart/wb/bus ring read-cursors move into MetalBackend.
run_cosim body now has zero backend.<field>.contents() reads; stays
concrete-typed (generic flip is 3b-ii). Metal bit-identical, 298 tests pass.
Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
3a (state/sram accessors @ d5a029f) + 3b-i (decoded-records seam @ 32d31b8) landed; run_cosim body now has zero backend.<field>.contents() reads. Records the 3b-ii design: MetalBackend::new fat constructor, bus_lanes → agnostic, event_buffer Drop impl, state_mut() deposit routing, generic flip. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…e_mut, Drop for event buffer (P1.3b-ii-a) Fat constructor MetalBackend::new (MetalSimulator + build_state/flash/io + timing buffer + struct literal) returning (Self, bus_lanes). Stimulus deposits (reg_init/reset/constant_ports/set_flash_din) now go through state_mut() after construction. event_buffer_ptr becomes a field freed by a Drop impl, removing the two manual drop sites. run_cosim stays concrete-typed (generic flip is 3b-ii-b). Metal bit-identical, 298 tests pass. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…Backend> (P1.3b-ii-b) Add CosimBackend::new (fat constructor) + profile_kernels (default no-op); move the write_params GPU-setup loop into MetalBackend::new. run_cosim_generic<B> drives everything through trait methods — zero concrete MetalBackend/metal:: tokens in its body. Public run_cosim is a thin run_cosim_generic::<MetalBackend> shim; jacquard.rs call site unchanged. Completes step 3. Metal bit-identical, 298 tests pass. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…; step 4 next 3b-ii-a (MetalBackend::new + deposits via state_mut + Drop @ 0bb150c) and 3b-ii-b (trait new/profile_kernels + run_cosim_generic<B> flip @ 14f6a27) landed. run_cosim body is fully backend-neutral. Next: step 4 module split (cosim/{mod,metal}.rs) gated by `cargo check --lib` no-feature compile. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
mod.rs (non-gated): CosimBackend trait, BitOp/FlashDebug, CosimOpts/Result, run_cosim_generic<B>, patchers, scheduler + agnostic glue. metal.rs (#[cfg(feature="metal")]): MetalBackend/MetalSimulator/ScheduleBuffers, GPU structs, build_*/encode_*, the run_cosim shim. src/sim/mod.rs now `pub mod cosim;` (non-gated); jacquard.rs paths updated. cargo check --lib --no-default-features now compiles the agnostic half. Metal bit-identical, 298 tests pass. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…327080); step 5 next Module split landed: cosim/mod.rs (non-gated agnostic) + cosim/metal.rs (metal-gated GPU). cargo check --lib --no-default-features now compiles the agnostic half. Next: step 5 CpuBackend (the trait is fully backend-neutral; debug/profile methods have no-op defaults). Notes the bus_lanes-helper extraction + dropping the temporary allow(dead_code) when CpuBackend lands. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
CpuBackend run_edges models the --check-with-cpu CPU stepper; UART-TX FSM must be ported from csrc/kernel_v1.metal (gpu_io_step), not Rust. CpuBackend can't run end-to-end until step 6 wiring — note the 5+6-together vs compile+unit-test verification options. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
CpuBackend (cosim/mod.rs, non-gated): Vec<u32> state/sram, run_edges via cpu_reference::simulate_block_v1[_xprop] mirroring the --check-with-cpu stepper, xprop X-mask state_prep ported from kernel_v1.metal. cmd_cosim's no-GPU path now runs cosim on CpuBackend via run_cosim_cpu. UART/bus decode + flash stubbed (drain_* empty) for 5b/5c. Logic fixtures (xprop/2state/ noreginit/reginit VCDs) byte-identical to Metal golden; Metal path unchanged (7 fixtures bit-identical, 298 tests pass). Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…; 5b next CpuBackend runs cosim with no GPU feature; 4 logic VCDs (incl. xprop) byte-identical to Metal golden. Next: 5b (port UART-TX FSM from kernel_v1.metal → drain_uart_tx, verify dual_uart), then 5c (bus-trace beat extraction → drain_bus_beats + agnostic bus-lanes helper, verify apb_trace). Notes the UnsafeCell interior-mutability review point. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…1.5b) Port the gpu_io_step UART FSM (kernel_v1.metal:1189-1249) to CPU: per-channel 4-state decoder advancing current_cycle per edge, run after each edge's simulate in run_edges, accumulating completed bytes drained by drain_uart_tx. Per-channel tx_out_pos/cycles_per_bit derived agnostically (mirrors build_io_buffers' UartParams). dual_uart_events.json byte-identical to Metal golden on a no-GPU build; Metal path unchanged (7/7), 298 tests pass. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
… next dual_uart byte-identical to golden on CPU. Next: 5c (port bus-trace beat extraction → drain_bus_beats + extract agnostic bus-lanes builder so CpuBackend::new returns real bus_lanes; verify apb_trace ±xprop). Then step 7 Linux cosim CI. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…-lanes builder (P1.5c) Port the gpu_io_step APB3 bus-trace extraction (kernel_v1.metal:1305-1352) to CPU: per-bus gate rising-edge detection emitting RawBeats with a per-edge current_tick, run after each edge's simulate. Extract the agnostic build_bus_trace (positions + BusTraceLane decoders) into cosim/mod.rs, shared by both backends; metal's build_bus_trace_params packs the positions into the GPU struct. CpuBackend::new now returns real bus_lanes. All 7 cosim fixtures byte-identical to the Metal golden on a no-GPU build; Metal path unchanged (7/7), 298 tests pass. Completes CpuBackend Phase-1 functional parity. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
…e (@ a525a25); step 7 next All 7 cosim fixtures byte-identical to the Metal golden on a no-GPU build. Only step 7 (Linux cosim CI) remains for Phase 1. Records the CI-job options (commit expected outputs vs extend compare_backend_vcds.py). Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
Commit the 7 cosim fixture expected outputs (tests/*/expected/) and a scripts/ci/cosim_cpu_check.sh that runs them via the no-GPU CpuBackend build and diffs against expected. New ubuntu-latest `cosim-cpu` CI job builds `cargo build -r --bin jacquard` (no features) and runs the check — locking in cross-backend equivalence on a free Linux runner. Completes Phase 1 of #105. Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Cosim backend portability (#105), Phase 0 + Phase 1 (in progress) — extract
the backend seam and de-Metal
run_cosimso cosim can run on a CPU referencebackend (and later CUDA/HIP), not just Metal. Implements the target architecture
from ADR 0017 Amendment 2026-06-07 (Layer 1/2/3 + the peripheral contract)
and the staging in
docs/plans/cosim-backend-portability.md/docs/plans/cosim-phase1-cpu-backend.md.Per the maintainer's call this is one PR, not a stack: it now carries Phase 0
and the completed Phase 1 steps below. Every commit is a Metal-only refactor
with zero behaviour change, gated by a byte-identical fixture harness.
Phase 0 — the seam
edge_ops_mutaccessor — factor the repeatedunsafe from_raw_partsovera schedule edge's shared
MTLBufferbehindScheduleBuffers::edge_ops_mut/edge_ops.CosimBackendtrait +MetalBackend—MetalBackendowns theMetalSimulator, the[2×state_size]design-state buffer, the per-edgeschedule, and all GPU IO buffers.
run_edges/profile_kernels/waitforwardto the unchanged
encode_and_commit_gpu_batch/profile_gpu_kernels/spin_wait.Phase 1 — de-Metal
run_cosim, module split, CpuBackend (steps 1–5 done)Step 1 — de-Metal the
run_edgesseam (dropmetal::Buffer); backend-ownedVCD ring (
enable_vcd_ring/vcd_snapshot);flash_set_in_reset.Step 2 (2a/2b/2c) — relocate buffer setup+init into builders:
build_flash_buffers,build_io_buffers(+ CPUbus_lanes),build_state_buffers(states/sram/xmask/blocks/event, incl. xprop seeding + SRAM preload).
Step 3 (3a/3b-i/3b-ii) — make the orchestration backend-neutral:
CosimBackend::{state,state_mut,sram}; route the design-state/SRAMloop-body reads.
flash_d_i,flash_debug_snapshot,drain_uart_tx,drain_bus_beats,drain_wb_trace_debug,uart_decoder_debug. Flash const-params becomeagnostic locals; peripheral ring cursors move into
MetalBackend. Therun_cosimbody now has zerobackend.<field>.contents()reads.MetalBackend::newfat constructor (-> (Self, Vec<BusTraceLane>));stimulus deposits via
state_mut();event_bufferfreed by aDropimpl;CosimBackend::new/profile_kernelson the trait;run_cosimflips torun_cosim_generic<B: CosimBackend>with a thin private-MetalBackendshim.Step 4 — physical module split
cosim_metal.rs→cosim/{mod,metal}.rs:mod.rs(non-gated) holds the trait,run_cosim_generic<B>, scheduler + agnostic glue;metal.rs(#[cfg(feature="metal")]) holdsMetalBackend, the GPU structs,build_*/encode_*, therun_cosimshim.cargo check --lib --no-default-featuresnow compiles the agnostic half.The
CosimBackendtrait surface is now complete and CPU-ready — debug/profilemethods have no-op defaults, so a
CpuBackendneed only implement the functionalcore.
Step 5 (5a/5b/5c) —
CpuBackend(in non-gatedcosim/mod.rs): a CPU reference backend that runsjacquard cosimwith no GPU feature. Design stepper viacpu_reference::simulate_block_v1[_xprop](xprop X-mask state_prep ported fromkernel_v1.metal); CPU ports of the UART-TX decoder and APB3 bus-trace beat-extraction FSMs (also from the shader);cmd_cosim's no-GPU path wired. All 7 cosim regression fixtures are byte-identical between CpuBackend (no-GPU build) and the Metal golden — cross-backend equivalence proven. This unlocks cosim regression on free Linux CI.Scope / next (this PR or follow-up)
ubuntu-latest(free runner): run the cosimfixtures via the no-GPU
CpuBackendbuild. The only remaining Phase-1 step.Verification (every commit)
(
dual_uart,apb_trace±xprop,xprop_cosim, 2state, reg-init variants).jtag_minimal4M-edge replay PASS (data0_obs=0xCAFEBABE) — exercises themodel-driven-clock per-edge path.
cargo test --lib --features metal→ 298 pass.Refs #105.
Co-developed-by: Claude Code v2.1.168 (claude-opus-4-8)