Skip to content

fix: issues found by Diligence#519

Open
felicityin wants to merge 28 commits into
pre-release-v1.2.7from
fix-issues
Open

fix: issues found by Diligence#519
felicityin wants to merge 28 commits into
pre-release-v1.2.7from
fix-issues

Conversation

@felicityin

Copy link
Copy Markdown
Contributor

@felicityin

felicityin commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Issue 009

op_a_value is unconstrained when op_a_0 = 1, enabling forged branch direction (control-flow hijack).

The bug is that Ziren keeps two versions of operand A:

  • op_a_access.value(): the value actually read from or written to the register file.
  • op_a_value: the logical operand sent to instruction chips like Branch and TEQ.

When op_a is register $zero, the CPU AIR forced only op_a_access.value() to be zero. It did not also force op_a_value to zero for read-source instructions.

That left op_a_value free whenever an instruction read $zero as operand A. Branch instructions and TEQ consume this free op_a_value, so a malicious prover could fake comparisons involving $zero.

Example: beq $0, $0, label should always branch. But the prover could keep the real register access value as 0 while setting the free op_a_value to 1, making the branch chip believe the operands are unequal and accept a fallthrough path.

So this is a soundness bug: the verifier could accept a proof of an execution path that does not match real MIPS semantics.

Fix

Add a CPU AIR constraint for the vulnerable case:

builder
    .when(local.instruction.op_a_0 * local.op_a_immutable)
    .assert_word_zero(local.op_a_value);

This binds the free instruction-tuple operand op_a_value to zero when op_a is $zero and the instruction treats A as an immutable read source, covering branches, stores, and TEQ. It deliberately does not constrain ordinary writes to $zero, where a nonzero computed result is allowed to be discarded.

@felicityin

felicityin commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Issue 213

Critical — Branch/Jump delay-slot target (next_next_pc) is not carried across shard boundaries, allowing control-flow forgery

MIPS branches and jumps use a delay slot, so Ziren tracks:

pc
next_pc
next_next_pc

For a jump at B to target T:

pc = B
next_pc = B + 4 // delay slot
next_next_pc = T // post-delay-slot target

Inside one shard, next_next_pc is correctly propagated to the delay-slot row. But across shard boundaries, public values only carry next_pc, not next_next_pc.

So a malicious prover could end a shard on a branch/jump row. The shard exports only B + 4, the delay-slot address. The next shard starts at B + 4 and re-derives next_pc = B + 8, silently dropping the true target T.

Result: a taken branch or unconditional jump can be forged as fallthrough, letting the prover create an execution trace that violates the real program control flow while still verifying.

Fix

Add verifier-side AIR constraints that forbid a shard from ending on a branch or jump row:

  builder
      .when_transition()
      .when(local.is_real - next.is_real)
      .assert_one(local.is_sequential + local.is_halt);

  builder
      .when_last_row()
      .when(local.is_real)
      .assert_one(local.is_sequential + local.is_halt);

Why this fixes it: branch/jump rows are real, non-halt, non-sequential rows. They carry the post-delay-slot target in next_next_pc, but shard public values only export next_pc. By requiring the last real CPU row to be either sequential or halt, the AIR now enforces the executor’s intended invariant that branch/jump and their delay slot cannot be split across a shard boundary. That prevents next_next_pc from being dropped.

@felicityin

felicityin commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Issue 016

High - syscall/precompiles/u256x2048_mul/air.rs - U256x2048Mul lo_ptr/hi_ptr constrained only via reduce()

The chip reads lo_ptr and hi_ptr from memory, then constrains them only by comparing against:

lo_ptr_memory.value().reduce()
hi_ptr_memory.value().reduce()

But reduce() maps a 32-bit word into the KoalaBear field, whose modulus is smaller than 2^32. So two different 32-bit words can reduce to the same field element.

Because lo_ptr and hi_ptr are later used as base addresses for output memory writes, a non-canonical pointer word could collide with another address after reduction. That lets the proof write the multiplication result to a different memory address than the raw pointer word represents.

The fix is to range-check the raw pointer words with KoalaBearWordRangeChecker before reducing them, making the reduction injective for those values.

Fix

Add local KoalaBearWordRangeChecker columns for both pointer reads:

pub lo_ptr_range_checker: KoalaBearWordRangeChecker<T>,
pub hi_ptr_range_checker: KoalaBearWordRangeChecker<T>,

Trace generation now populates them from the raw memory-read pointer words, and the AIR range-checks both values before the existing reduce() binding:

KoalaBearWordRangeChecker::<AB::F>::range_check(
  builder,
  *local.lo_ptr_memory.value(),
  local.lo_ptr_range_checker,
  local.is_real.into(),
);

Same for hi_ptr_memory. This makes the pointer word canonical before reduction, so lo_ptr_memory.value().reduce() and hi_ptr_memory.value().reduce() cannot collide with a different 32-bit word.

@felicityin

felicityin commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Issue 020

MULT/MULTU HI-register write is skippable, forging the value read by MFHI.

The MUL chip allowed real MULT/MULTU instructions to skip writing the MIPS HI register.

The HI write was gated by hi_record_is_real, but that column was not forced to be 1 for hardware MULT/MULTU rows. A malicious prover could set it to 0, causing the product high word to be computed but not written to register 33.

Then a later MFHI would read the old/stale HI value instead of the true multiply high word. This lets the prover forge computations that depend on 64-bit multiplication results, while still satisfying the memory argument because the skipped write simply never enters the memory multiset.

Fix

Add a MUL AIR constraint that forces hardware MULT/MULTU rows to write HI, while preserving dependency-only multiply rows used by DivRem and MADD/MSUB:

builder.when(local.is_mult + local.is_multu).assert_zero(
(local.pc - AB::Expr::from_canonical_u32(UNUSED_PC))
    * (AB::Expr::one() - local.hi_record_is_real),
);

Meaning:

  • If the row is MULT/MULTU
  • and pc != UNUSED_PC meaning it is a real program instruction row
  • then hi_record_is_real must be 1

Delegated multiply rows use pc = UNUSED_PC, so they can still keep hi_record_is_real = 0 and avoid writing register 33.

This closes the attack because a real MULT/MULTU row can no longer match a MUL-chip row that skips the HI memory write. Since hi_record_is_real is also the is_check_memory slot in the instruction tuple, this also forces the CPU side to send the real shard/clk for hardware multiplies.

@felicityin

felicityin commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Issue 036

Medium - Verifier trusts proof-supplied chip_ordering indices without bounds validation.

The shard proof contains a prover-supplied chip_ordering map from chip name to opened-values index.

The verifier used this map directly to index opened_values.chips and order the chip list, but did not validate that the indices were in bounds, unique, or a valid 0..n permutation.

A malicious proof could therefore provide malformed ordering metadata, causing verifier panics through out-of-bounds indexing, or making multiple chips reuse the same opened values. That can misalign chip constraints with the data being opened.

The fix is to validate chip_ordering before use and reject malformed proofs with a verification error instead of trusting the prover-supplied map.

Fix

Fixed the native shard verifier path.

  • Added InvalidChipOrdering(String) verification error.

  • Validates prover-supplied chip_ordering before use:

    • length matches opened chip values
    • all chip names are known/expected
    • every index is in bounds
    • indices are unique
    • indices form a contiguous 0..n range
    • required preprocessed chip names are present
  • Calls validation before shard_chips_ordered in StarkMachine::verify.

  • Also validates inside Verifier::verify_shard to protect direct callers.

@felicityin

felicityin commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Issue 037

INS instruction integer underflow when msb < lsb.

The execute_ins function in the executor computes msb - lsb + 1 without checking that msb >= lsb. When a malicious guest program encodes an INS instruction where msb < lsb (which is architecturally undefined in MIPS), the subtraction msb - lsb underflows as a u32, producing a very large value. This causes a subsequent 1u32 << (msb - lsb + 1) to either panic in debug mode or produce undefined behavior (shift by >= 32 bits is undefined in Rust's wrapping semantics for << with u32).

The AIR constraints for INS (in misc/others/air.rs) do include a range check lsb <= msb via ByteOpcode::LTU lookup (lsb < msb + 1), but this constraint only validates the trace during proof verification. An honest executor encountering such an instruction will panic before generating a valid trace, constituting a prover denial-of-service.

A malicious guest program can craft an INS instruction with msb < lsb to crash the prover.

Fix

execute_ins now validates msb >= lsb before computing the bit-width mask. If the encoding is invalid, it returns ExecutionError::ExceptionOrTrap() instead of underflowing msb - lsb and panicking. The INS dispatch path now propagates that Result with ?.

@felicityin

felicityin commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Issue 038

Reachable unimplemented!() panic in ecrecover hook on guest-controlled curve ID.

The hook_ecrecover function in hook.rs contains an unimplemented!() macro call at line 133 that is reachable through guest-controlled input. The curve ID is extracted from the first byte of the guest-supplied buffer (buf[0] & 0b0111_1111), and only values 1 (secp256k1) and 2 (secp256r1) are handled. Any other value (0, 3-127) triggers unimplemented!(), which panics and crashes the prover process.

Hooks are invoked during execution via the WRITE syscall to specific file descriptors. Since the guest program controls the data written, a malicious guest can write a buffer with any curve ID to the FD_ECRECOVER_HOOK file descriptor, triggering the panic.

The hook return values are used as "hints" and are not directly constrained by the AIR -- the precompile circuit re-verifies the computation. However, the panic occurs before any return value is produced, so it is a pure DoS vector against the prover.

Fix

I replaced the reachable unimplemented!() in crates/core/executor/src/hook.rs:118 with a proper error return:

  • unsupported curve_id now returns Err(ExecutionError::UnsupportedEcrecoverCurveId(curve_id))

I also added the new executor error variant in crates/core/executor/src/executor.rs:193 so the failure is explicit and non-panicking.

@felicityin

felicityin commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Issue 039

EXT instruction shift underflow when msbd + lsb > 31.

The execute_ext function in the executor and the corresponding emit_misc_dependencies function compute 31 - lsb - msbd to generate the SLL shift amount. When msbd + lsb > 31 (which is architecturally undefined per the MIPS spec), this subtraction underflows as u32, producing a huge shift amount. This causes incorrect trace generation or a panic.

The AIR constraints for EXT (misc/others/air.rs lines 402-416) verify lsb + msbd < 32 via a ByteOpcode::LTU byte lookup, but this validation occurs only during proof verification. The executor will panic during trace generation on architecturally undefined instruction encodings.

Both the executor (execute_ext at line 1778-1779) and the dependency emitter (emit_misc_dependencies at line 291) perform 31 - lsb - msbd without validating the precondition, making them both vulnerable.

Fix

Add an explicit check in execute_ext to return an ExecutionError when msbd + lsb >= 32, consistent with the AIR constraint.

  • executor.rs — execute_ext now returns Result<_, ExecutionError> and returns ExceptionOrTrap when msbd + lsb >= 32, mirroring the existing execute_ins guard. Call site updated with ?.
  • dependencies.rs — added a debug_assert!(lsb + msbd < 32) in the EXT branch documenting the invariant the executor now guarantees.
  • ext_executor_edge_cases.rs (new) — regression test confirming the undefined encoding traps instead of panicking.

Because the guard fires during execution (before event emission), it also protects the identical subtraction in emit_misc_dependencies and trace generation — the same pattern the existing INS guard uses.

@felicityin

felicityin commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Issue 040

Division-by-zero AIR constraint forces quotient=0xFFFFFFFF but executor rejects div-by-zero.

Executor/AIR divergence on division by zero. The executor traps on c == 0 for DIV/DIVU/MOD/MODU (execute_alu, executor.rs:1877-1881), but the DivRem AIR accepted div-by-zero rows by forcing quotient = 0xFFFFFFFF and skipping the remainder range check. A malicious prover could thus submit a valid div-by-zero trace that the honest executor would never produce — a path to inject controlled quotient/remainder values into the trace.

Fix

Make the AIR reject division by zero, matching the executor. The existing IsZeroWordOperation already constrains is_c_0.result == (c == 0); I replaced the quotient-forcing block with:

// c must be non-zero on every real divrem row.
builder.when(is_real.clone()).assert_zero(local.is_c_0.result);

This proves c != 0 on every real divrem row, so any trace containing a div-by-zero event is now rejected. Also updated the module doc comment.

@felicityin felicityin force-pushed the fix-issues branch 2 times, most recently from 4f79b32 to ca43f4c Compare June 25, 2026 07:39
@felicityin felicityin force-pushed the fix-issues branch 2 times, most recently from 793c72c to 449fd40 Compare June 25, 2026 09:58
@felicityin

felicityin commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Issue 041

No Independent Range Checking on Bridge Chip Columns.

The MemoryLocalChip is the critical bridge between within-shard memory operations (local Memory lookups from eval_memory_access) and cross-shard memory state (global bus connecting Init/Finalize chips). Despite being foundational to memory soundness, its AIR constraints consist exclusively of four lookup send/receive operations with zero structural constraints on any of its 21 witness columns per entry.

Specifically, the following columns are entirely unconstrained beyond lookup matching:

  • addr (memory address) - no range check, no KoalaBear modulus bound
  • initial_shard, final_shard - no range check against 24-bit bound
  • initial_clk, final_clk - no range check against 24-bit bound
  • initial_value[0..3], final_value[0..3] - no byte range check (values should be in [0, 255])

The system relies entirely on transitive constraint propagation: the MemoryGlobalInit chip range-checks values via 32 boolean bits, and this propagates through the lookup chain. However, this creates a fragile dependency where any implementation bug in the logUp/grand-product argument could allow arbitrary field elements to flow through the memory system undetected.

This stands in sharp contrast with the MemoryGlobalChip, which independently:

  • Range-checks addr via KoalaBearBitDecomposition
  • Constrains value via 32 boolean bit decompositions
  • Enforces address ordering via AssertLtColsBits
  • Constrains timestamp = 1 for Initialize mode

Fix

Add defense-in-depth constraints within the MemoryLocalChip AIR:

  1. Byte range check all 8 value fields (initial_value[0..3] and final_value[0..3]) via byte lookup table
  2. Range check addr to be a valid KoalaBear field element (< 0x7F000001)
  3. initial_shard, final_shard to be within 16 bits
  4. Range check initial_clk, final_clk to be within 24 bits, consistent with the eval_memory_access_timestamp assumption

@felicityin

felicityin commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Issue 042

Memory Value Bytes Never Range-Checked Within Memory Subsystem.

Throughout the entire memory subsystem (instruction chip, consistency columns, local chip, and eval_memory_access), the 4-byte Word values representing memory contents are never independently verified to be valid bytes (in the range [0, 255]). The memory subsystem trusts that all values entering through the lookup argument are correctly constrained by upstream chips.

The memory value flow works as follows:

  1. MemoryGlobalInit creates initial values from 32 boolean bits (range-checked to [0, 255]) and sends them to the global bus
  2. MemoryLocalChip bridges global and local bus without any range checking
  3. eval_memory_access in air/memory.rs sends/receives memory values through the local Memory lookup without any byte range checking (lines 40-61)
  4. The memory instruction chip constructs new values for stores (SB/SH/SW/SWL/SWR/SC) from op_a_value and prev_mem_val, neither of which are byte-range-checked within the memory instruction chip

The memory instruction chip DOES perform a KoalaBearWordRangeChecker on addr_word (ensuring the address is < p), and performs byte range checks on addr_word[1..3] via slice_range_check_u8. But no similar range checking exists for the memory VALUE words.

For reads (LB/LBU/LH/LHU/LW/LWL/LWR/LL), the memory instruction chip constrains mem_access.value == mem_access.prev_value (no modification). For stores, it constructs mem_val from existing bytes. But if any byte value in the chain is not in [0, 255], the field arithmetic in byte selection (e.g., a_val[0] * offset_is_zero + mem_val[1] * ls_bits_is_one) would operate on arbitrary field elements, potentially producing results that appear correct in the field but represent invalid memory states.

The soundness assumption is that the CPU chip and ALU chip properly range-check all register values to be valid 4-byte Words with each byte in [0, 255]. If this assumption is violated anywhere in the constraint system, the memory subsystem would silently propagate and store invalid values.

Fix

Add byte range checks on memory values at the boundary where they enter the memory subsystem. The most effective location is within eval_memory_access in air/memory.rs, which is the single entry point for all memory operations. Add slice_range_check_u8 on the 4 byte elements of both value() and prev_value():

// Within eval_memory_access, add:
self.slice_range_check_u8(&memory_access.value().0, do_check.clone());
self.slice_range_check_u8(&memory_access.prev_value().0, do_check.clone());

This would provide defense-in-depth against any upstream chip failing to properly constrain register/memory values.

@felicityin

Copy link
Copy Markdown
Contributor Author

Issue 214

Poseidon2-KoalaBear uses 13 partial rounds where the Plonky3 reference prescribes 20 for the degree-3 S-box.

Ziren instantiates Poseidon2 over KoalaBear (31-bit prime, width 16) with S-box degree α=3 (cube — the smallest exponent coprime to p−1 = 2^24·127) but only R_P=13 partial (internal) rounds (R_F=8 external). Plonky3's own KoalaBear permutation — the dependency Ziren uses — builds via new_from_rng_128, whose poseidon2_round_numbers_128 table returns (R_F, R_P) = (8, 20) for (width=16, d=3) on a 31-bit field, and (8, 13) only for d=7. R_P=13 is the correct count for SP1's BabyBear/α=7 configuration that Ziren forked from. When Ziren switched the field to KoalaBear and the S-box to α=3 (a lower-degree S-box, which needs more partial rounds for algebraic-attack resistance), it left R_P at 13 — 65% of the reference-recommended 20. The smoking gun is the commented-out sbox_deg_7 next to the live sbox_deg_3 in the recursion AIR: the code was retargeted from α=7 to α=3 but R_P was never raised.

Fix

Raised the partial-round count to 20 across all five KoalaBear/α=3 instantiations — the original recommendation listed four, but the core-machine precompile chip also had to change, since the Poseidon2Permute syscall's witness is generated by poseidon2_init() and its AIR must match:

Location Change
primitives/src/lib.rs (poseidon2_init) ROUNDS_P: 13 → 20 (Canonical reference)
stark/src/kb31_poseidon2.rs (my_perm) ROUNDS_P: 13 → 20 (Inner STARK config)
core/machine/.../poseidon2/mod.rs NUM_INTERNAL_ROUNDS: 13 → 20 (Precompile AIR)
recursion/.../poseidon2_skinny/mod.rs NUM_INTERNAL_ROUNDS: 13 → 20 (Wrap machine)
recursion/.../poseidon2_wide/mod.rs NUM_INTERNAL_ROUNDS: 13 → 20 (Compress machine)

All consume the existing 30-row RC_16_30 table (8+20=28 ≤ 30); no new constants. The BN254 outer perm (outer_perm, R_P=56) is a different field/width and was correctly left alone.

A structural bug the naive fix would have missed

The poseidon2_skinny chip (used by wrap_machine, the final KoalaBear stage) packs all internal-round constants into a single round_constants column sized [T; WIDTH] (16). With R_P=20 this panics (index out of bounds: len 16 index 16). I added NUM_ROUND_CONSTANTS and widened the column, the AIR signature, and split the trace-population loop — plus mirrored it in the C++ FFI (include/poseidon2_skinny.hpp) for the sys feature. poseidon2_wide needed no change.

I also added a regression test in primitives pinning R_F/R_P to Plonky3's reference, so a future field/S-box switch can't silently drift again.

@felicityin

felicityin commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Issue 033

Medium - BooleanCircuitGarbleChip - Untested Ziren-Specific Precompile

The boolean circuit garble precompile is a Ziren-specific addition (not present in SP1) that verifies garbled boolean circuit gates. Neither the executor syscall implementation (syscalls/precompiles/boolean_circuit/garble.rs) nor the constraint chip (machine/src/syscall/precompiles/boolean_circuit_garble/) have any unit tests. This precompile was not covered by the Veridise audit, which focused only on crates/core/machine/.

The precompile reads num_gates from guest memory, iterates over gate info, computes ciphertexts for AND/OR gates, and writes a boolean result. The constraint chip must enforce that the executor's computation is correctly witnessed. Without tests, there is no verification that:

  • The constraint AIR correctly models the executor's gate evaluation logic
  • Edge cases (zero gates, maximum gates, mixed gate types) are handled
  • The memory access pattern is correctly constrained

Fix

Added unit tests for the boolean circuit garble precompile on both sides.

In the executor, I covered:

  • zero-gate events
  • basic AND and OR gate cases
  • mixed gate types
  • an invalid ciphertext case that returns false

In the machine crate, I added trace-level and AIR-level tests that:

  • validate the generated row encoding
  • check valid traces for both passing and failing boolean results
  • confirm an inconsistent output is rejected

@felicityin

felicityin commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Issue 072

Low - wrap.rs - Wrap verifier omits is_complete assertion and assert_complete checks

Ziren's ZKMWrapVerifier::verify does not assert that is_complete == 1 and does not call assert_complete on the unwrapped proof's public values. In contrast, SP1's SP1WrapVerifier::verify explicitly performs both checks: it asserts input.is_complete == 1 and calls assert_complete(builder, &public_values.inner, input.is_complete).

The wrap verifier is the final recursion stage before the proof is handed to the Gnark prover for Groth16/PLONK wrapping. It verifies a root proof and re-commits its public values. Without the completeness assertions, a wrap proof could be generated from an incomplete root proof -- one that does not represent a fully verified program execution. This means:

  1. The program may not have halted (next_pc != 0)
  2. The start shard may not be shard 1
  3. The global cumulative sum may be non-zero
  4. The deferred proofs digest may not match
  5. No execution shard may be present

All of these boundary conditions, normally enforced by assert_complete, would be skipped.

While the root verifier (ZKMCompressRootVerifier) does assert is_complete == 1, the wrap verifier is an independent entry point that processes root proofs. If a wrap circuit is invoked directly with a proof that did not pass through the root verifier, completeness would not be enforced.

Fix

Added completeness enforcement to the wrap verifier.

ZKMWrapVerifier::verify now:

  • asserts public_values.inner.is_complete == 1
  • calls assert_complete(builder, &public_values.inner, public_values.inner.is_complete)

This matches the defense-in-depth behavior used by the root/compress verifiers and prevents a wrap proof from being built on an incomplete root proof.

@felicityin

felicityin commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Issue 076

Low - syscall/precompiles/keccak_sponge/air.rs - Keccak sponge block position flags are one-way selectors (Veridise Bug #1/9 cross-chip variant)

Extends ZR-021 (boolean concern) with Veridise Bug #1/9 cross-reference: these flags are not just missing boolean checks -- they are one-way selectors that are never derived from the actual block position, matching the same vulnerability pattern found in SysLinux. The is_first_input_block, is_final_input_block, and read_block columns are prover-chosen witness values used as gate selectors throughout the chip, but none of them are constrained to be boolean, and their relationship to the actual block position is not enforced.

The eval_flags function (lines 115-131) derives compound flags from these unconstrained columns:

builder.assert_eq(first_block * first_step * local.is_real, local.receive_syscall);
builder.assert_eq(final_block * final_step * local.is_real, local.write_output);
builder.assert_eq(local.is_absorbed, final_step * not_final_block * local.is_real);

These constraints link receive_syscall, write_output, and is_absorbed to the products of the block flags and step flags. But since first_block and final_block are not constrained to be boolean:

  1. A malicious prover could set is_first_input_block = 2, causing receive_syscall = 2 * first_step * is_real. The receive_syscall value is used as the multiplicity in receive_syscall (line 49-57). A non-boolean multiplicity would corrupt the lookup argument.
  2. The read_block flag gates memory reads (lines 78, 158) and XOR operations (line 78). It is used as the do_check parameter in eval_memory_access, which asserts do_check is boolean (line 31 of [air/memory.rs](https://github.com/ProjectZKM/Ziren/blob/88e0521b82d3ed4c28005fbe220066c8ec76b31a/air/memory.rs)). So read_block is indirectly boolean-checked by the memory subsystem. But is_first_input_block and is_final_input_block are NOT used as do_check parameters -- they only appear in arithmetic products.
  3. The is_first_input_block flag affects absorbed-byte tracking: when(first_block).assert_eq(already_absorbed_u32s, 0) (line 88). If first_block is not boolean, this constraint is unreliable.
  4. The is_final_input_block flag affects the absorbed-byte check: when(final_block).assert_eq(already_absorbed_u32s, input_len - RATE) (line 90-93). A non-boolean final_block value breaks this constraint's intended meaning.

Note: is_real IS asserted to be boolean at line 40. But is_first_input_block and is_final_input_block are NOT.

Fix

Added completeness-style block flag constraints for the Keccak sponge AIR.

In crates/core/machine/src/syscall/precompiles/keccak_sponge/air.rs, I:

  • asserted is_first_input_block, is_final_input_block, and read_block are boolean
  • constrained is_first_input_block to be true only on the first block and stable within a block
  • constrained is_final_input_block to stay consistent with the final block position
  • kept read_block tied to the first Keccak round as before, but now explicitly boolean

In crates/core/machine/src/syscall/precompiles/keccak_sponge/trace.rs, I added a regression test that builds a 2-block sponge event and checks the trace flags match the expected block and round positions.

@felicityin

Copy link
Copy Markdown
Contributor Author

Issue 081

Low - SysMmapSyscall - Integer Overflow in mmap Size Alignment

The sysmmap syscall implementation uses wrapping_add for aligning the requested memory size to page boundaries. When the input size (parameter a1, sourced from guest memory) is close to u32::MAX, the wrapping addition can produce an unexpectedly small aligned size, leading to a smaller-than-expected memory allocation while the guest program believes it received the requested amount.

Specifically, the computation size.wrapping_add(PAGE_SIZE as u32 - (size & PAGE_ADDR_MASK as u32)) wraps around when size > u32::MAX - PAGE_SIZE. For example, if size = 0xFFFFF001, the alignment adjustment would be 0xFFFFF001 + (0x1000 - 0x001) = 0xFFFFF001 + 0xFFF = 0x100000000, which wraps to 0x00000000. The guest would receive a zero-size allocation but believe it has ~4GB mapped.

This is a prover-side issue in the executor, not a soundness issue in the constraint system. It could cause incorrect execution traces that either fail proof generation or produce proofs of incorrect computations.

Fix

Fixed sysmmap so page-size alignment no longer uses wrapping_add.

The syscall now:

  • aligns the requested size with checked_add
  • returns InvalidSyscallArgs() if alignment would overflow u32
  • preserves the original size when it is already page-aligned

I also added unit tests covering:

  • aligned sizes
  • normal round-up behavior
  • the overflow case near u32::MAX

@felicityin

Copy link
Copy Markdown
Contributor Author

Issue 083

Low - Executor - todo!() Panic in Store Instruction Handler

The execute_store method in the Ziren executor contains a catch-all _ => todo!() at line 2057 that would panic the prover if reached. While the Opcode enum is exhaustive and the code path should only be entered for store instructions (SB, SH, SW, SWL, SWR, SC), a future code change that adds a new store opcode without updating this match, or a logic error in the instruction classification, would cause an unrecoverable panic.

This pattern also exists in events/byte.rs:126 where add_byte_lookup_events_from_maps for Vec<ByteLookupEvent> has todo!().

Fix

Replaced the executor-side todo!() fallbacks with real behavior and clearer failure diagnostics.

In crates/core/executor/src/executor.rs, the store-instruction catch-all now uses:

  • unreachable!("unexpected store opcode: {:?}", instruction.opcode)

In crates/core/executor/src/events/byte.rs, the Vec<ByteLookupEvent> implementation now:

  • expands map counts into repeated events instead of panicking
  • matches the executor-side aggregation behavior

@felicityin

Copy link
Copy Markdown
Contributor Author

Issue 085

Low - SysBrkSyscall - Unbounded Heap Growth Without Upper Bound

The sysbrk syscall implementation grows the program break (heap boundary) without any upper bound check. The new break value is computed as max(a0, current_brk), where a0 is a guest-controlled value. A malicious guest can set a0 to any value up to u32::MAX, potentially causing the prover to allocate memory for the entire 4GB address space when subsequent memory operations reference addresses in the newly allocated region.

While the BRK register value itself is just a u32, downstream memory allocation when the guest actually writes to high addresses could exhaust prover memory.

Fix

Added an upper bound to sysbrk in crates/core/executor/src/syscalls/precompiles/sys_linux/sysbrk.rs . The syscall now derives the initial BRK from the program image, caps growth with MAX_HEAP_SIZE, and returns ExecutionError::InvalidSyscallArgs() when a requested break would exceed the limit. I also added unit tests for normal growth, no-op behavior, overflow rejection, and limit rejection.

@felicityin

Copy link
Copy Markdown
Contributor Author

Issue 086

Low - recursion/circuit/machine/ - MaybeUninit::zeroed().assume_init() on Circuit Variables

The recursion circuit's core.rs and compress.rs files use unsafe { MaybeUninit::zeroed().assume_init() } to initialize Felt<_> circuit variables before they are assigned values from proof public values. This pattern is technically undefined behavior (UB) in Rust if the type has any invariant excluding zero-initialized memory. While Felt wraps a field element where zero is valid, the pattern bypasses Rust's safety guarantees.

In the circuit context, these variables are always overwritten in the first loop iteration (i == 0) before being used in constraints. However, if the code were refactored to allow empty proof batches past the assert!(!shard_proofs.is_empty()) guard, the uninitialized values would propagate into constraint assertions.

Fix

Replaced the recursion circuit’s unsafe zero-initialization in core.rs and compress.rs with DSL-managed builder.uninit() for scalar Felt values, and array::from_fn(|_| builder.uninit()) for array-backed locals. This removes the MaybeUninit::zeroed().assume_init() UB pattern while preserving the existing proof-flow semantics.

@felicityin felicityin force-pushed the fix-issues branch 2 times, most recently from 4186cf9 to 33ec840 Compare June 26, 2026 09:54
@felicityin

Copy link
Copy Markdown
Contributor Author

Issue 087

Low - recursion/compiler/ir/collections.rs - todo!() on Fixed Array Operations May Cause Panics

Five todo!() macros exist in the recursion compiler's array operations for the Array::Fixed variant. These are in shift(), truncate(), get_ptr(), set(), and set_value() methods. If any circuit code path attempts to perform these operations on a fixed-size array, the compiler will panic during circuit generation, causing a denial of service for the proving pipeline.

While the primary circuit code paths use Array::Dyn (dynamic arrays), the Array::Fixed variant is created via builder.vec() and used for constant-length arrays. Any future refactoring that routes a Fixed array through these methods would silently introduce a crash.

Fix

Fixed the recursion compiler’s Array::Fixed handling in crates/recursion/compiler/src/ir/collections.rs. The todo!() branches are gone: shift, truncate, and get_ptr now fail with explicit unreachable!() messages for unsupported fixed-array semantics, and set / set_value now update fixed vectors directly when given a constant index.

@felicityin

Copy link
Copy Markdown
Contributor Author

Issue 088

Low - crates/core/machine/src/utils/mod.rs - zeroed_f_vec relies on debug-only size assertion for transmute safety

The zeroed_f_vec<F: Field> function transmutes Vec<u32> to Vec<F> with only a debug_assert! to verify that size_of::<F>() == 4. This assertion is stripped in release builds, meaning that if this generic function is ever instantiated with a field type whose size is not 4 bytes, it would silently produce undefined behavior. The function is called in 30+ locations across the core machine crate for trace generation, trace padding, and preprocessed trace allocation.

The transmute of zero values is sound for KoalaBear (the production field type) because Montgomery-form zero is represented as 0u32. However, the safety invariant depends entirely on a runtime check that is disabled in production.

Fix

Hardened zeroed_f_vec in crates/core/machine/src/utils/mod.rs by changing the size check from debug_assert! to a normal assert!, so the transmute precondition is enforced in release builds too. I also added a regression test that verifies zeroed_f_vec::<KoalaBear> returns the expected zero-filled vector.

@felicityin

Copy link
Copy Markdown
Contributor Author

Issue 089

Low - deserialize.go - Deserialization performs no bounds checking on input data

The deserializeData function in the Go runtime directly indexes into the data byte slice without checking that sufficient bytes remain. A truncated or malformed input will cause a Go runtime panic (index out of range) rather than a graceful error. While this code runs inside the zkVM guest (MIPS target), meaning panics translate to execution failures, the lack of bounds validation means:

  1. Slice length as uint64 cast to int (line 84-88): The deserialized length is a uint64 cast to int. On the 32-bit MIPS target, int is 32 bits, so lengths > 2^31-1 silently truncate, potentially causing the slice data[index : index+int(length)] to access an unexpected memory region.
  2. No validation that index + size <= len(data): Every type accessor (Bool, Int8, Int16, Int32, Int64, Slice, String) directly reads bytes at computed offsets without checking bounds first.
  3. Pointer type deserialization (line 117): When the first byte is nonzero (indicating non-nil), the function recurses into v.Elem(), but for a nil reflect.Value element this will panic.

Fix

Added explicit bounds checks throughout the Go runtime deserializer in crates/goruntime/zkvm_runtime/deserialize.go. All fixed-width reads now go through a shared readBytes helper, slice and string lengths are validated before converting uint64 to int, and nil pointer fields are allocated before recursive decoding instead of panicking on v.Elem().

@felicityin

Copy link
Copy Markdown
Contributor Author

Issue 090

Low - runtime.go - RESERVED_INPUT_PTR advances without bounds checking

The Go runtime Read[T]() function allocates memory from a reserved input region by advancing a global pointer (RESERVED_INPUT_PTR). Each call increments the pointer by the padded length of the hint data, but there is no check that the pointer remains within the valid memory region [MAX_MEMORY - EMBEDDED_RESERVED_INPUT_REGION_SIZE, MAX_MEMORY). Successive large reads could advance the pointer beyond MAX_MEMORY (0x7F000000), causing the unsafe.Pointer(uintptr(addr)) to point to invalid memory.

Additionally, the RESERVED_INPUT_PTR is a Go int (32-bit on MIPS32LE), initialized to MAX_MEMORY - EMBEDDED_RESERVED_INPUT_REGION_SIZE = 0x7F000000 - 0x40000000 = 0x3F000000. While the 1GB region is large, there is no guard against reading more total hint data than fits in this region.

This is comparable to the Rust entrypoint (crates/zkvm/entrypoint/src/lib.rs:79-85) which has the same pattern of advancing EMBEDDED_RESERVED_INPUT_PTR without bounds checking.

Fix

Added an explicit bounds check to the Go runtime’s reserved input allocator in crates/go-runtime/zkvm_runtime/runtime.go. Read[T]() now routes pointer advancement through readReservedInput(), which panics with input region overflowed if the hint region would exceed MAX_MEMORY or if the computed capacity is invalid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant