feat: isolate and track forester worker concurrency #2343
sergeytimoshin wants to merge 15 commits into main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in settings. Use the following commands to manage reviews, or the checkboxes below for quick actions:
📝 Walkthrough
Arc-based concurrency and task orchestration added to EpochManager (NewTreeWorker lifecycle, FuturesUnordered tasks); refinalize now returns updated epoch/pda state; V2 processing switched to snapshot-driven batches; many prover client APIs and indexer/SDK types converted to fallible Result-based flows and richer data shapes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant EpochMgr as EpochManager (Arc)
    participant Scheduler as FuturesUnordered
    participant NewWorker as NewTreeWorker (thread)
    participant ProofPool as ProofWorkers (channel)
    participant Indexer as Indexer / RPC
    participant DB as State/Accounts
    EpochMgr->>Scheduler: process_queue(epoch_info, epoch_pda, schedule)
    Scheduler->>EpochMgr: spawn dispatch_tree_processing tasks
    EpochMgr->>NewWorker: register / spawn / cancel (oneshot)
    EpochMgr->>ProofPool: spawn_proof_workers() -> Sender (fallible)
    ProofPool->>Indexer: request proofs
    Indexer->>ProofPool: return proof data
    ProofPool->>DB: verify/store compressed proofs
    NewWorker->>EpochMgr: completion / cancellation
    Scheduler->>EpochMgr: collect ProcessingResult or panic_payload_message
    EpochMgr->>EpochMgr: maybe_refinalize(...) -> (updated_epoch_pda, updated_schedule)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 3 passed
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Force-pushed dc3b8b4 to fe08bcd (compare)
Force-pushed fe08bcd to ef9d7fe (compare)
Force-pushed aaaa67c to fbf48e5 (compare)
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
prover/client/src/helpers.rs (1)
38-48: ⚠️ Potential issue | 🟠 Major: Reject negative values in `bigint_to_u8_32`.
`to_bytes_be()` returns `(sign, magnitude)`. Right now `-1` serializes as `0x…01` instead of failing, which can turn invalid proof inputs into a different root/hash without any error.
Suggested fix
```diff
 pub fn bigint_to_u8_32(n: &BigInt) -> Result<[u8; 32], ProverClientError> {
-    let (_, bytes_be) = n.to_bytes_be();
+    let (sign, bytes_be) = n.to_bytes_be();
+    if sign == num_bigint::Sign::Minus {
+        return Err(ProverClientError::InvalidProofData(
+            "Negative values are not valid 32-byte field elements".to_string(),
+        ));
+    }
     if bytes_be.len() > 32 {
         return Err(ProverClientError::InvalidProofData(
             "Number too large to fit in [u8; 32]".to_string(),
         ));
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/helpers.rs` around lines 38 - 48, The function bigint_to_u8_32 currently ignores the sign from BigInt::to_bytes_be and therefore accepts negative values; update bigint_to_u8_32 to inspect the sign returned by to_bytes_be and return Err(ProverClientError::InvalidProofData(...)) when the sign is negative (i.e., not Sign::Plus), leaving the rest of the logic unchanged for non-negative values; reference the bigint_to_u8_32 function and the ProverClientError::InvalidProofData variant when implementing this check.
forester/tests/legacy/batched_state_async_indexer_test.rs (1)
79-90: ⚠️ Potential issue | 🟠 Major: Remove the second prover startup.
`init()` already starts the local validator with `enable_prover: true`. Spawning another prover here can race for port 3001 and fail the test before the worker-concurrency path is exercised.
Based on learnings: The light test validator must spawn three background processes: solana test validator on port 8899, prover server on port 3001, and photon indexer on port 8784.
Simplest fix
```diff
-    spawn_prover().await.unwrap();
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/tests/legacy/batched_state_async_indexer_test.rs` around lines 79 - 90, The test starts a prover twice causing a port race: remove the explicit spawn_prover().await.unwrap() call so the prover started by init(Some(LightValidatorConfig { enable_prover: true, ... })) is the only prover; ensure init() is called with enable_indexer: true and the LightValidatorConfig remains as shown so the test validator still spawns the Solana validator, prover (port 3001) and photon indexer (port 8784) in the background.
prover/client/src/proof_client.rs (1)
671-677: ⚠️ Potential issue | 🔴 Critical: Serialize `low_element_next_indices` in the address-append payload.
The `BatchAddressAppendInputs` struct carries `low_element_next_indices`, but the JSON serializer at `prover/client/src/proof_types/batch_address_append/json.rs` omits this field entirely. The prover circuit needs both the values and indices of boundary elements to validate indexed Merkle tree insertions (verifying that new addresses fit between `low_value` and `low_next_value` at the correct tree positions). Dropping the indices breaks this validation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof_client.rs` around lines 671 - 677, The JSON serializer for BatchAddressAppendInputs is omitting the low_element_next_indices field, breaking prover validation; update the batch_address_append JSON serialization (the module that produces the payload consumed by generate_batch_address_append_proof) to include low_element_next_indices alongside low_element_values/low_next_values, serialize it with the exact key/name the prover circuit expects, and ensure the values are converted to JSON (e.g., a numeric array of indices) so to_json(&inputs) used in generate_batch_address_append_proof contains both the indices and values required by the circuit.
forester/src/forester_status.rs (1)
674-685: ⚠️ Potential issue | 🟠 Major: Don't silently mask queue parse failures as "empty queue."
At Line 674 and Line 729, deserialization errors are collapsed into `(0, 0)`. That makes malformed/unreadable queue data indistinguishable from truly empty queues and can produce misleading status/metrics output.
Please either propagate the parse error (preferred) or at least emit a warning with tree/queue identifiers before fallback.
Also applies to: 729-740
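The preferred shape of that fix can be sketched with std-only stand-ins: `parse_queue_len_cap` stands in for `parse_hash_set_from_bytes::<QueueAccount>`, and a plain `String` error replaces the forester's real error type.

```rust
// Stand-in for parse_hash_set_from_bytes::<QueueAccount>: a parser that
// may fail on malformed account bytes.
fn parse_queue_len_cap(data: &[u8]) -> Result<(usize, usize), String> {
    if data.len() < 2 {
        return Err(format!("queue data too short: {} bytes", data.len()));
    }
    Ok((data[0] as usize, data[1] as usize))
}

// Preferred fix: propagate the parse error (annotated with the tree id)
// instead of collapsing it to (0, 0), so malformed queue data is
// distinguishable from a truly empty queue.
fn queue_fill(data: &[u8], tree_id: &str) -> Result<(usize, usize), String> {
    parse_queue_len_cap(data)
        .map_err(|e| format!("failed to parse queue account for tree {tree_id}: {e}"))
}
```

Callers then surface the error (or log it with the tree identifier) rather than reporting a zero-length queue.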
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/forester_status.rs` around lines 674 - 685, The code currently swallows deserialization errors by calling parse_hash_set_from_bytes::<QueueAccount>(&acc.data).ok().map(...).unwrap_or((0,0)), which makes malformed queue data indistinguishable from an empty queue; replace these silent fallbacks (the two occurrences using parse_hash_set_from_bytes::<QueueAccount>) with proper error handling: ideally change the enclosing function to return Result and propagate the parse error (use parse_hash_set_from_bytes::<QueueAccount>(&acc.data)? and compute (len, cap) on Ok), otherwise at minimum catch Err(e) and emit a warning including identifying info (e.g., the tree/queue identifiers available in scope such as acc.pubkey or tree_id) before returning the fallback (0,0); apply the same fix at both locations (the parse_hash_set_from_bytes::<QueueAccount> usages around the shown block and the 729–740 block).
prover/client/src/proof_types/batch_append/proof_inputs.rs (1)
115-130: ⚠️ Potential issue | 🟠 Major: Reject length mismatches before the zipped loop.
`zip()` stops at the shortest input. If `old_leaves`, `leaves`, or `merkle_proofs` is shorter than `batch_size`, this still returns `batch_size` unchanged but computes the root from only a partial batch. That will generate inconsistent public inputs and very opaque prover failures.
Suggested guard
```diff
 ) -> Result<(BatchAppendsCircuitInputs, Vec<ChangelogEntry<HEIGHT>>), ProverClientError> {
+    let expected_len = batch_size as usize;
+    if leaves.len() != expected_len
+        || old_leaves.len() != expected_len
+        || merkle_proofs.len() != expected_len
+    {
+        return Err(ProverClientError::GenericError(format!(
+            "batch append input length mismatch: leaves={}, old_leaves={}, merkle_proofs={}, expected={}",
+            leaves.len(),
+            old_leaves.len(),
+            merkle_proofs.len(),
+            expected_len,
+        )));
+    }
+
     let mut new_root = [0u8; 32];
```
Also applies to: 135-200
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof_types/batch_append/proof_inputs.rs` around lines 115 - 130, The function get_batch_append_inputs currently zips old_leaves, leaves, and merkle_proofs and can silently process a partial batch if any of those vectors are shorter than batch_size; add explicit length checks before the zipped loop: verify old_leaves.len() == batch_size as usize, leaves.len() == batch_size as usize, and merkle_proofs.len() == batch_size as usize (or otherwise reject mismatches), and return an appropriate ProverClientError when any mismatch is detected so the function fails fast and never computes roots/inputs from a partial batch.
sdk-libs/client/src/interface/initialize_config.rs (1)
20-27: ⚠️ Potential issue | 🔴 Critical: Consolidate to shared `InitializeLightConfigParams` payload and use `anchor_lang::Discriminator`.
The `InitializeRentFreeConfig::build()` method will produce an incompatible instruction payload. It serializes `InitializeCompressionConfigAnchorData` (which lacks `config_bump`), but the sibling `initialize_config()` function in `instructions.rs` correctly uses `InitializeLightConfigParams` with `config_bump` included. Both target the same on-chain instruction, so they must produce identical wire formats; presently they do not.
Additionally, line 122 hardcodes the discriminator bytes. Fix both issues by:
- Reusing `InitializeLightConfigParams` here instead of the local struct
- Using the `anchor_lang::Discriminator` trait to derive the discriminator (as required by coding guidelines for `sdk-libs/**/src/**/*.rs`) rather than hardcoding bytes
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/client/src/interface/initialize_config.rs` around lines 20 - 27, The build path is serializing the wrong payload and hardcodes the discriminator: replace the local struct InitializeCompressionConfigAnchorData with the shared InitializeLightConfigParams so InitializeRentFreeConfig::build() emits the same wire format as the existing initialize_config() function, and remove the manual discriminator bytes in favor of deriving and using anchor_lang::Discriminator (call the trait implementation to obtain the discriminator) rather than hardcoding values; update any serialization calls in InitializeRentFreeConfig::build() to use InitializeLightConfigParams and the Discriminator API so both instruction creators produce identical payloads.
sdk-libs/program-test/src/indexer/test_indexer.rs (1)
2502-2535: ⚠️ Potential issue | 🟠 Major: Handle prover HTTP failures explicitly instead of spinning or panicking.
This loop only decrements `retries` on transport errors. A prover that keeps returning 4xx/5xx will spin forever, and a 200 with an unreadable body still hits `unwrap()` on `text()`/`deserialize_gnark_proof_json()`. Please convert both HTTP-status failures and parse failures into `IndexerError` and consume a retry in those branches.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 2502 - 2535, The HTTP loop currently only decrements retries on transport errors and uses unwrap() on response.text() and deserialize_gnark_proof_json(), so server 4xx/5xx and parse failures can spin or panic; update the client.post handling (the block around client.post(...).send().await and the success branch that calls response.text(), deserialize_gnark_proof_json, proof_from_json_struct, and compress_proof) to: treat non-success status codes as IndexerError::CustomError (include status and body if available), convert any failures from response.text(), deserialize_gnark_proof_json, proof_from_json_struct, or compress_proof into IndexerError::CustomError rather than unwrap/panic, and ensure each of those error branches decrements retries and follows the same retry/sleep logic so the function eventually returns Err(IndexerError::CustomError("Failed to get proof from server")) after exhausting retries; reference functions/values: client.post, response.text(), deserialize_gnark_proof_json, proof_from_json_struct, compress_proof, IndexerError, and ValidityProofWithContext.
forester/src/epoch_manager.rs (2)
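The retry shape requested for `test_indexer.rs` above can be sketched std-only, with the HTTP client and JSON decoding replaced by a closure; `Outcome`, `fetch`, and this pared-down `IndexerError` are stand-ins, not the real types.

```rust
#[derive(Debug, PartialEq)]
enum IndexerError {
    CustomError(String),
}

// Everything an attempt can yield: transport failure, non-2xx status,
// unparseable body, or a proof (a u64 stands in for the proof payload).
enum Outcome {
    Transport(String),
    Status(u16),
    BadBody(String),
    Proof(u64),
}

fn get_proof_with_retries(
    mut fetch: impl FnMut() -> Outcome,
    mut retries: u32,
) -> Result<u64, IndexerError> {
    while retries > 0 {
        match fetch() {
            Outcome::Proof(p) => return Ok(p),
            // Every failure mode consumes a retry -- including non-success
            // statuses and parse failures, not just transport errors -- so a
            // prover stuck on 500s cannot spin forever.
            Outcome::Transport(_) | Outcome::Status(_) | Outcome::BadBody(_) => {
                retries -= 1;
            }
        }
    }
    Err(IndexerError::CustomError(
        "Failed to get proof from server".to_string(),
    ))
}
```

The real implementation would also sleep between attempts and record the status/body in the error, as the prompt describes.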
4775-4785: ⚠️ Potential issue | 🟠 Major: Shutdown path can bypass tracked new-tree worker teardown.
When shutdown wins the `tokio::select!` in Line 4775, `epoch_manager.run()` is dropped before reaching Line 680 cleanup, so tracked `std::thread` workers may outlive service shutdown and continue processing.
💡 Proposed fix
```diff
-    let result = tokio::select! {
-        result = epoch_manager.run() => result,
+    let epoch_manager_for_run = epoch_manager.clone();
+    let result = tokio::select! {
+        result = epoch_manager_for_run.run() => result,
         _ = shutdown => {
             info!(
                 event = "shutdown_received",
                 run_id = %run_id_for_logs,
                 phase = "service_run",
                 "Received shutdown signal. Stopping the service."
             );
+            epoch_manager
+                .shutdown_new_tree_workers(NEW_TREE_WORKER_SHUTDOWN_TIMEOUT)
+                .await;
             Ok(())
         }
     };
```
Also applies to: 680-681
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 4775 - 4785, The shutdown branch in the tokio::select! currently just returns Ok(()) which drops epoch_manager.run() and skips the cleanup that joins tracked std::thread workers; instead, on shutdown you must trigger a graceful stop on the EpochManager and await its run future to complete so its cleanup code (the tracked-worker teardown at/around epoch_manager.run() and the cleanup near lines ~680) runs: send the same shutdown signal or call the epoch_manager.shutdown()/stop() method (or set a shutdown flag) and then await epoch_manager.run() (or call a provided join/await method) before returning Ok(()) so all tracked std::thread workers are torn down and joined. Ensure you reference and use the epoch_manager.run() future and any explicit join/teardown methods on EpochManager to perform the worker joins instead of dropping the future.
2181-2192: ⚠️ Potential issue | 🔴 Critical: `RegistrationTracker` claim can get stuck on early error paths.
After `try_claim_refinalize()` succeeds, Line 2188 and Line 2191 use `?`. If either fails, `complete_refinalize()` is never called, leaving `refinalize_in_progress = true` and potentially blocking other workers forever in `wait_for_refinalize()`.
💡 Proposed fix
```diff
 if registration_tracker.try_claim_refinalize() {
-    // This task sends the finalize_registration tx
-    let ix = create_finalize_registration_instruction(
-        &self.config.payer_keypair.pubkey(),
-        &self.config.derivation_pubkey,
-        epoch_info.epoch,
-    );
-    let priority_fee = self
-        .resolve_epoch_priority_fee(&*rpc, epoch_info.epoch)
-        .await?;
-    let current_slot = rpc.get_slot().await?;
-    let Some(confirmation_deadline) = scheduled_confirmation_deadline(
-        epoch_info.phases.active.end.saturating_sub(current_slot),
-    ) else {
-        info!(...);
-        registration_tracker.complete_refinalize(cached_weight);
-        return Ok((forester_epoch_pda, tree_schedule));
-    };
-    ...
-    match send_smart_transaction(...).await.map_err(RpcError::from) {
-        Ok(_) => { ... registration_tracker.complete_refinalize(post_finalize_weight); }
-        Err(e) => {
-            registration_tracker.complete_refinalize(cached_weight);
-            return Err(e.into());
-        }
-    }
+    let mut completion_weight = cached_weight;
+    let claim_result: Result<()> = async {
+        let ix = create_finalize_registration_instruction(
+            &self.config.payer_keypair.pubkey(),
+            &self.config.derivation_pubkey,
+            epoch_info.epoch,
+        );
+        let priority_fee = self
+            .resolve_epoch_priority_fee(&*rpc, epoch_info.epoch)
+            .await?;
+        let current_slot = rpc.get_slot().await?;
+        let Some(confirmation_deadline) = scheduled_confirmation_deadline(
+            epoch_info.phases.active.end.saturating_sub(current_slot),
+        ) else {
+            info!(...);
+            return Ok(());
+        };
+        match send_smart_transaction(...).await.map_err(RpcError::from) {
+            Ok(_) => {
+                completion_weight = match rpc.get_anchor_account::<EpochPda>(&epoch_pda_address).await {
+                    Ok(Some(pda)) => pda.registered_weight,
+                    _ => on_chain_weight,
+                };
+                Ok(())
+            }
+            Err(e) => Err(e.into()),
+        }
+    }.await;
+
+    registration_tracker.complete_refinalize(completion_weight);
+    claim_result?;
 } else {
     registration_tracker.wait_for_refinalize().await;
 }
```
Also applies to: 2246-2250
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 2181 - 2192, After a successful RegistrationTracker::try_claim_refinalize() return ensure RegistrationTracker::complete_refinalize() is always called even if subsequent calls like resolve_epoch_priority_fee(), rpc.get_slot(), or scheduled_confirmation_deadline(...) fail; wrap the post-claim work in a scope that guarantees cleanup (e.g., a scope guard or an explicit match/let guard) so complete_refinalize() runs on all early-return/error paths, and apply the same pattern around the other similar block (lines ~2246-2250) to avoid leaving refinalize_in_progress = true.
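The scope-guard alternative mentioned in the prompt can be sketched std-only with a `Drop` guard. `Tracker` stands in for `RegistrationTracker` (reduced to the in-progress flag; completion-weight selection is omitted), and a `String` error replaces the real error type.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Stand-in for RegistrationTracker: only the flag that must be reset.
struct Tracker {
    refinalize_in_progress: AtomicBool,
}

impl Tracker {
    fn complete_refinalize(&self) {
        self.refinalize_in_progress.store(false, Ordering::SeqCst);
    }
}

// Scope guard: complete_refinalize() runs on drop, so every early `?`
// or return path still releases the claim.
struct RefinalizeGuard(Arc<Tracker>);

impl Drop for RefinalizeGuard {
    fn drop(&mut self) {
        self.0.complete_refinalize();
    }
}

fn claimed_work(tracker: Arc<Tracker>, fail_early: bool) -> Result<(), String> {
    tracker.refinalize_in_progress.store(true, Ordering::SeqCst);
    let _guard = RefinalizeGuard(tracker);
    if fail_early {
        // Mimics resolve_epoch_priority_fee()/get_slot() failing via `?`;
        // the guard still fires and clears the flag.
        return Err("priority fee lookup failed".to_string());
    }
    Ok(())
}
```

In the real code the guard would also carry the completion weight, which the async-block approach in the diff above handles explicitly.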
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@forester/src/config.rs`:
- Around line 341-347: The current use of partition() makes the Err branch
already returned, so the map_err fallback on valid.into_iter().collect(...) is
unreachable and gives a misleading generic error; change the logic in the tree
ID parsing flow (the variables/blocks using partition(), valid, invalid and the
ConfigError::InvalidArguments with field "tree_ids") to either directly
unwrap/collect the already-validated `valid` Vec<Pubkey> (since `invalid` was
handled) or perform a single-pass parse that collects invalid inputs and returns
ConfigError::InvalidArguments with the actual invalid_values; update the code
around the partition() usage to return the real invalid strings instead of the
generic "failed to parse tree_ids".
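The single-pass parse described above can be sketched std-only, with `u64` standing in for `Pubkey` and a plain `String` error in place of `ConfigError::InvalidArguments`.

```rust
// Single pass over the raw inputs: collect valid IDs and the actual
// invalid strings, then report the invalid values verbatim instead of a
// generic "failed to parse tree_ids" message.
fn parse_tree_ids(raw: &[&str]) -> Result<Vec<u64>, String> {
    let mut valid = Vec::new();
    let mut invalid = Vec::new();
    for s in raw {
        match s.parse::<u64>() {
            Ok(id) => valid.push(id),
            Err(_) => invalid.push((*s).to_string()),
        }
    }
    if invalid.is_empty() {
        Ok(valid)
    } else {
        Err(format!("invalid tree_ids: {}", invalid.join(", ")))
    }
}
```

This avoids the unreachable `map_err` fallback entirely: the error branch is the only place parse failures can surface.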
In `@forester/src/processor/v2/helpers.rs`:
- Around line 495-524: The function currently treats truncated batches as valid
by using actual_end = end.min(available) and proceeding; change the guard so
that if wait_for_batch(end) returns available < end the function returns
Ok(None) (i.e., do not proceed with partial batches). Specifically, in
get_batch_snapshot replace the current if start >= available { return Ok(None);
} / actual_end logic with a check like if available < end { return Ok(None); }
(still keep the start >= available check as needed), so code paths that call
lock_recover, read data.addresses[start..actual_end], or derive
leaves_hash_chain (data.leaves_hash_chains, create_hash_chain_from_slice) never
run for undersized batches.
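The guard shape can be sketched as follows; `batch_snapshot` and the `u32` addresses are stand-ins for `get_batch_snapshot`/`wait_for_batch` and the real batch data.

```rust
// A snapshot is only taken for full batches: anything short of `end`
// yields None instead of a truncated slice.
fn batch_snapshot(addresses: &[u32], start: usize, end: usize) -> Option<Vec<u32>> {
    let available = addresses.len();
    if start >= available {
        return None;
    }
    // Reject undersized batches outright rather than clamping to
    // `end.min(available)` and proceeding with a partial batch.
    if available < end {
        return None;
    }
    Some(addresses[start..end].to_vec())
}
```

With this guard in place, downstream consumers (hash-chain derivation, proof inputs) never see a slice shorter than the batch size they were promised.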
In `@forester/src/processor/v2/proof_worker.rs`:
- Around line 197-214: The serialization-failure path uses two different
timestamps (submitted_at = Instant::now() here and round_trip_start elsewhere)
and silently ignores closed result_tx; consolidate this by adding a helper
(e.g., send_error_result or report_serialization_failure) used by both the
job.inputs.to_json error branch and the other serialization-before-submit branch
that: captures one Instant::now() into a single variable (round_trip_start or
submitted_at) at the failure site, logs the error with error! including the
error details, constructs the ProofJobResult with that single timestamp and
consistent round_trip_ms/proof_duration_ms values, and attempts to send via
job.result_tx while handling a closed channel by logging the send failure;
update both locations (the error branch around job.inputs.to_json and the other
branch that currently reuses round_trip_start) to call this helper so latency
metrics and logging are consistent.
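The shared-helper idea can be sketched with std `mpsc` standing in for the job's result channel; `ProofJobResult` is reduced to two fields and `send_error_result` is the hypothetical helper name from the prompt, not the real API.

```rust
use std::sync::mpsc;
use std::time::Instant;

// Pared-down stand-in for the real ProofJobResult.
struct ProofJobResult {
    error: String,
    round_trip_ms: u128,
}

// One helper used by every serialization-failure branch: a single
// timestamp feeds the latency metric, and a closed channel is logged
// instead of silently ignored.
fn send_error_result(tx: &mpsc::Sender<ProofJobResult>, started: Instant, error: String) {
    let result = ProofJobResult {
        round_trip_ms: started.elapsed().as_millis(),
        error,
    };
    if tx.send(result).is_err() {
        eprintln!("result channel closed; dropping proof error result");
    }
}
```

Both failure sites then call the helper with the same `Instant` captured at the failure, keeping `round_trip_ms` consistent across branches.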
In `@forester/tests/test_batch_append_spent.rs`:
- Around line 332-346: The spawned closure passed to tokio::task::spawn_blocking
(bound to service_handle) can fail at runtime construction
(tokio::runtime::Builder::build), during
runtime.block_on(run_pipeline::<LightClient>(...)), and when joining the spawned
task; currently all results are ignored—update the test to explicitly handle
each error: unwrap or propagate the runtime build Result (ensure the build error
is surfaced), propagate/unwrap the Result returned by
run_pipeline::<LightClient> so pipeline failures fail the test, and await/join
the JoinHandle and check its JoinError (or unwrap) instead of discarding with
let _ = ..., failing the test on panic or join failure; reference
service_handle, runtime.block_on, run_pipeline::<LightClient>,
shutdown_receiver, shutdown_compressible_receiver, and work_report_sender when
making these changes.
In `@forester/tests/test_compressible_pda.rs`:
- Around line 312-313: Replace the bare .unwrap() calls on builder results with
.expect(...) that includes a descriptive message; specifically change the
.build().unwrap() invocations that call InitializeRentFreeConfig::build() (and
the two other similar setup builder calls in this test) to
.build().expect("InitializeRentFreeConfig::build() failed") or an equivalent
message that names the failing builder so failures in setup are clear. Ensure
each replaced call uses a distinct message if helpful to identify which builder
failed.
In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs`:
- Around line 190-195: The code currently slices
new_element_values[..zkp_batch_size] and then indexes parallel inputs
(low_element_values, low_element_next_values, low_element_indices,
low_element_next_indices, low_element_proofs) by that length, which can panic if
any of those slices are shorter; add explicit guards at the start of the
function that compute required_len = zkp_batch_size and verify each parallel
input's len() >= required_len (for low_element_proofs check len() >=
required_len and each inner-proof has HEIGHT elements) and return a
ProverClientError on failure instead of allowing a panic; reference the
parameter names low_element_values, low_element_next_values,
low_element_indices, low_element_next_indices, low_element_proofs,
new_element_values and zkp_batch_size when adding these checks.
In `@prover/client/src/proof_types/batch_append/json.rs`:
- Around line 76-79: Rename the inherent fallible method `to_string(&self) ->
Result<String, ProverClientError>` to a clearly-fallible name like
`try_to_json_string(&self) -> Result<String, ProverClientError>` (or
`to_json_string` if you prefer), update all call sites that invoke `to_string()`
to call the new name, keep the same return type and body
(`create_json_from_struct(&self)`), and remove the
`#[allow(clippy::inherent_to_string)]` attribute since the method no longer
shadows the conventional `to_string`. Ensure unit tests and any trait impls or
uses are updated to the new identifier.
In `@prover/client/src/proof_types/batch_update/proof_inputs.rs`:
- Around line 115-118: The public_inputs method currently indexes self.0[0],
which will panic on an empty batch and only returns the first entry; instead
iterate over the inner slice (self.0), call public_inputs_arr() on each item,
collect the results into the Vec<[u8; 32]>, and propagate any errors as
ProverClientError; update public_inputs to map/try_collect (or equivalent) over
self.0 so all batch entries are included and empty slices are handled without
panicking (or explicitly return an error if an empty batch is considered
invalid).
In `@prover/client/src/proof_types/non_inclusion/v2/json.rs`:
- Around line 54-62: The JSON serialization currently forces path_index to a u32
by converting input.index_hashed_indexed_element_leaf via to_u32(), which will
fail for trees deeper than 32 bits; update the contract to allow larger indices
by switching path_index to a wider representation (for example: use to_u64() and
a u64 field, or serialize the index as a string/BigInt) and change the
conversion call from to_u32() to the chosen conversion (e.g., to_u64() or
to_string()) and update any consumers of path_index accordingly so indices >
u32::MAX are supported; ensure error handling still returns
ProverClientError::IntegerConversion with an appropriate message if conversion
to the new representation fails.
In `@prover/client/src/proof.rs`:
- Around line 167-170: The negate_g1 function currently deserializes the
incoming G1 point with Validate::No allowing malformed/off-curve points through;
change the call to G1::deserialize_with_mode(g1_le.as_slice(), Compress::No,
Validate::Yes) so the point is validated at deserialization and any error still
maps to ProverClientError::InvalidProofData; this uses the existing
convert_endianness and error mapping, so no further changes to error handling
are needed.
In `@sdk-libs/client/src/indexer/types/proof.rs`:
- Around line 214-223: The loop that computes output tree indices (using
account.tree_info.next_tree_info and account.tree_info.pack_output_tree_index on
each Account) currently keeps the first index in output_tree_index and ignores
mismatches; update PackedStateTreeInfos construction to validate that every call
to pack_output_tree_index returns the same index and return an error if any
subsequent index differs. Specifically, in the code paths that set
output_tree_index (the blocks referencing output_tree_index,
account.tree_info.next_tree_info, and pack_output_tree_index), compare the newly
computed index to the previously stored Some(index) and propagate a descriptive
Err when they diverge instead of silently ignoring the difference; apply the
same validation to the other occurrence noted (around lines 243-248).
In `@sdk-libs/client/src/interface/load_accounts.rs`:
- Around line 120-124: The missing-PDA error loses the original position because
collect_pda_hashes restarts enumeration for each group; update the flow so the
original flat index from cold_pdas is preserved: modify
group_pda_specs/collect_pda_hashes usage so each group carries a base_index (or
have collect_pda_hashes return (hash, original_index) pairs) and use that
original index when constructing MissingPdaCompressed; apply the same change to
the other occurrence referenced (lines ~164-175) so every MissingPdaCompressed
contains the flattened index from cold_pdas rather than the per-group 0-based
index.
In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 127-136: The non-surfpool branch currently spawns the validator
with Command::new(...).spawn() and immediately drops the Child, so failures
during the sleep are ignored; change this to keep the Child (e.g., let mut child
= Command::new(...).spawn()?), await the
tokio::time::sleep(Duration::from_secs(config.wait_time)).await while retaining
the child, then call child.try_wait() (or child.wait() if blocking is
acceptable) and inspect the ExitStatus; if the child has exited with a
non-success status return an io::Error (or map the ExitStatus to Err) so callers
receive an error instead of Ok(()). Ensure you mirror the surfpool branch
pattern (spawn -> keep handle -> wait/try_wait -> check status) and propagate
any non-zero exit as an io::Error.
- Around line 69-115: The current code concatenates command pieces into a single
shell string and spawns it via Command::new("sh").arg("-c").arg(path), which
breaks on spaces/metacharacters (see path construction and uses of
config.enable_indexer, config.limit_ledger_size, config.sbf_programs,
config.upgradeable_programs, config.validator_args and config.use_surfpool).
Change to build a Vec<String> (or Vec<&str>) of argv: start with the executable
built from project_root + "/cli/test_bin/run", then push "test-validator" and
each flag/value as separate .arg(...) entries (one per sbf_program pair and per
upgradeable_program fields, and each validator_arg as its own arg) and call
Command::new(executable).args(&args).spawn() instead of using "sh -c". Also make
the non-surfpool branch behave like the surfpool branch: keep the child handle,
call child.wait(), check the exit Status and propagate or log failures instead
of dropping the child and sleeping unconditionally.
---
Outside diff comments:
In `@forester/src/epoch_manager.rs`:
- Around line 4775-4785: The shutdown branch in the tokio::select! currently
just returns Ok(()) which drops epoch_manager.run() and skips the cleanup that
joins tracked std::thread workers; instead, on shutdown you must trigger a
graceful stop on the EpochManager and await its run future to complete so its
cleanup code (the tracked-worker teardown at/around epoch_manager.run() and the
cleanup near lines ~680) runs: send the same shutdown signal or call the
epoch_manager.shutdown()/stop() method (or set a shutdown flag) and then await
epoch_manager.run() (or call a provided join/await method) before returning
Ok(()) so all tracked std::thread workers are torn down and joined. Ensure you
reference and use the epoch_manager.run() future and any explicit join/teardown
methods on EpochManager to perform the worker joins instead of dropping the
future.
- Around line 2181-2192: After a successful
RegistrationTracker::try_claim_refinalize() return ensure
RegistrationTracker::complete_refinalize() is always called even if subsequent
calls like resolve_epoch_priority_fee(), rpc.get_slot(), or
scheduled_confirmation_deadline(...) fail; wrap the post-claim work in a scope
that guarantees cleanup (e.g., a scope guard or an explicit match/let guard) so
complete_refinalize() runs on all early-return/error paths, and apply the same
pattern around the other similar block (lines ~2246-2250) to avoid leaving
refinalize_in_progress = true.
In `@forester/src/forester_status.rs`:
- Around line 674-685: The code currently swallows deserialization errors by
calling
parse_hash_set_from_bytes::<QueueAccount>(&acc.data).ok().map(...).unwrap_or((0,0)),
which makes malformed queue data indistinguishable from an empty queue; replace
these silent fallbacks (the two occurrences using
parse_hash_set_from_bytes::<QueueAccount>) with proper error handling: ideally
change the enclosing function to return Result and propagate the parse error
(use parse_hash_set_from_bytes::<QueueAccount>(&acc.data)? and compute (len,
cap) on Ok), otherwise at minimum catch Err(e) and emit a warning including
identifying info (e.g., the tree/queue identifiers available in scope such as
acc.pubkey or tree_id) before returning the fallback (0,0); apply the same fix
at both locations (the parse_hash_set_from_bytes::<QueueAccount> usages around
the shown block and the 729–740 block).
In `@forester/tests/legacy/batched_state_async_indexer_test.rs`:
- Around line 79-90: The test starts a prover twice causing a port race: remove
the explicit spawn_prover().await.unwrap() call so the prover started by
init(Some(LightValidatorConfig { enable_prover: true, ... })) is the only
prover; ensure init() is called with enable_indexer: true and the
LightValidatorConfig remains as shown so the test validator still spawns the
Solana validator, prover (port 3001) and photon indexer (port 8784) in the
background.
In `@prover/client/src/helpers.rs`:
- Around line 38-48: The function bigint_to_u8_32 currently ignores the sign
from BigInt::to_bytes_be and therefore accepts negative values; update
bigint_to_u8_32 to inspect the sign returned by to_bytes_be and return
Err(ProverClientError::InvalidProofData(...)) when the sign is negative (i.e.,
not Sign::Plus), leaving the rest of the logic unchanged for non-negative
values; reference the bigint_to_u8_32 function and the
ProverClientError::InvalidProofData variant when implementing this check.
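A minimal sketch of that sign check. To keep the snippet self-contained it mirrors `num_bigint::Sign` and the error variant with local enums; the real `bigint_to_u8_32` would destructure the `(Sign, Vec<u8>)` returned by `BigInt::to_bytes_be()`.

```rust
// Local mirrors of `num_bigint::Sign` and `ProverClientError` so the sketch is
// self-contained; the real function calls `BigInt::to_bytes_be()`, which
// returns `(Sign, Vec<u8>)`.
#[derive(Debug, PartialEq)]
pub enum Sign {
    Minus,
    NoSign,
    Plus,
}

#[derive(Debug, PartialEq)]
pub enum ProverClientError {
    InvalidProofData(String),
}

pub fn bigint_to_u8_32(sign: Sign, bytes: Vec<u8>) -> Result<[u8; 32], ProverClientError> {
    // Reject negative values instead of silently dropping the sign.
    if sign == Sign::Minus {
        return Err(ProverClientError::InvalidProofData(
            "negative BigInt cannot be encoded as [u8; 32]".to_string(),
        ));
    }
    if bytes.len() > 32 {
        return Err(ProverClientError::InvalidProofData(
            "BigInt does not fit in 32 bytes".to_string(),
        ));
    }
    let mut out = [0u8; 32];
    out[32 - bytes.len()..].copy_from_slice(&bytes); // big-endian, left-padded
    Ok(out)
}
```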
In `@prover/client/src/proof_client.rs`:
- Around line 671-677: The JSON serializer for BatchAddressAppendInputs is
omitting the low_element_next_indices field, breaking prover validation; update
the batch_address_append JSON serialization (the module that produces the
payload consumed by generate_batch_address_append_proof) to include
low_element_next_indices alongside low_element_values/low_next_values, serialize
it with the exact key/name the prover circuit expects, and ensure the values are
converted to JSON (e.g., a numeric array of indices) so to_json(&inputs) used in
generate_batch_address_append_proof contains both the indices and values
required by the circuit.
In `@prover/client/src/proof_types/batch_append/proof_inputs.rs`:
- Around line 115-130: The function get_batch_append_inputs currently zips
old_leaves, leaves, and merkle_proofs and can silently process a partial batch
if any of those vectors are shorter than batch_size; add explicit length checks
before the zipped loop: verify old_leaves.len() == batch_size as usize,
leaves.len() == batch_size as usize, and merkle_proofs.len() == batch_size as
usize (or otherwise reject mismatches), and return an appropriate
ProverClientError when any mismatch is detected so the function fails fast and
never computes roots/inputs from a partial batch.
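The fail-fast length validation described above can be sketched as a standalone helper. The name `check_batch_lengths` and the `String` error are illustrative; the real function would return a `ProverClientError`.

```rust
// Hypothetical validation helper mirroring the check the comment asks for in
// get_batch_append_inputs; the name and error type are illustrative.
pub fn check_batch_lengths(
    batch_size: usize,
    old_leaves: &[[u8; 32]],
    leaves: &[[u8; 32]],
    merkle_proofs: &[Vec<[u8; 32]>],
) -> Result<(), String> {
    for (name, len) in [
        ("old_leaves", old_leaves.len()),
        ("leaves", leaves.len()),
        ("merkle_proofs", merkle_proofs.len()),
    ] {
        if len != batch_size {
            // Fail fast instead of letting zip() silently truncate the batch.
            return Err(format!("{name}: expected {batch_size} items, got {len}"));
        }
    }
    Ok(())
}
```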
In `@sdk-libs/client/src/interface/initialize_config.rs`:
- Around line 20-27: The build path is serializing the wrong payload and
hardcodes the discriminator: replace the local struct
InitializeCompressionConfigAnchorData with the shared
InitializeLightConfigParams so InitializeRentFreeConfig::build() emits the same
wire format as the existing initialize_config() function, and remove the manual
discriminator bytes in favor of deriving and using anchor_lang::Discriminator
(call the trait implementation to obtain the discriminator) rather than
hardcoding values; update any serialization calls in
InitializeRentFreeConfig::build() to use InitializeLightConfigParams and the
Discriminator API so both instruction creators produce identical payloads.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 2502-2535: The HTTP loop currently only decrements retries on
transport errors and uses unwrap() on response.text() and
deserialize_gnark_proof_json(), so server 4xx/5xx and parse failures can spin or
panic; update the client.post handling (the block around
client.post(...).send().await and the success branch that calls response.text(),
deserialize_gnark_proof_json, proof_from_json_struct, and compress_proof) to:
treat non-success status codes as IndexerError::CustomError (include status and
body if available), convert any failures from response.text(),
deserialize_gnark_proof_json, proof_from_json_struct, or compress_proof into
IndexerError::CustomError rather than unwrap/panic, and ensure each of those
error branches decrements retries and follows the same retry/sleep logic so the
function eventually returns Err(IndexerError::CustomError("Failed to get proof
from server")) after exhausting retries; reference functions/values:
client.post, response.text(), deserialize_gnark_proof_json,
proof_from_json_struct, compress_proof, IndexerError, and
ValidityProofWithContext.
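The retry discipline asked for in the test-indexer comment can be sketched independently of reqwest. Here `fetch` is a hypothetical stand-in for the whole POST → status check → body parse pipeline, so every failure kind (transport error, non-2xx status, parse failure) flows through the same retry path:

```rust
// `fetch` stands in for client.post(...).send() + status check + body parsing;
// any failure (transport, non-2xx status, JSON/proof parse) is an Err(String).
pub fn get_proof_with_retries<F>(mut fetch: F, mut retries: u32) -> Result<String, String>
where
    F: FnMut() -> Result<String, String>,
{
    loop {
        match fetch() {
            Ok(proof_json) => return Ok(proof_json),
            Err(_cause) => {
                if retries == 0 {
                    // Mirrors IndexerError::CustomError after exhausting retries.
                    return Err("Failed to get proof from server".to_string());
                }
                retries -= 1;
                // Real code would log `_cause` and sleep before the next attempt.
            }
        }
    }
}
```

The key property is that no branch can spin forever or panic: every error decrements the same counter.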
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b8aeca30-1184-4e0f-84c8-f3cb3237bc1f
⛔ Files ignored due to path filters (63)
forester-utils/src/account_zero_copy.rs, forester-utils/src/address_merkle_tree_config.rs, forester-utils/src/address_staging_tree.rs, forester-utils/src/forester_epoch.rs, forester-utils/src/instructions/compress_and_close_mint.rs, forester-utils/src/rate_limiter.rs, forester-utils/src/registry.rs, program-tests/account-compression-test/tests/address_merkle_tree_tests.rs, program-tests/account-compression-test/tests/merkle_tree_tests.rs, program-tests/batched-merkle-tree-test/tests/merkle_tree.rs, program-tests/compressed-token-test/tests/freeze/functional.rs, program-tests/compressed-token-test/tests/v1.rs, program-tests/system-cpi-v2-test/tests/event.rs, program-tests/system-cpi-v2-test/tests/invoke_cpi_with_read_only.rs, program-tests/utils/src/account_zero_copy.rs, program-tests/utils/src/actions/legacy/instructions/transfer2.rs, program-tests/utils/src/address_tree_rollover.rs, program-tests/utils/src/assert_compressed_tx.rs, program-tests/utils/src/assert_merkle_tree.rs, program-tests/utils/src/assert_queue.rs, program-tests/utils/src/batched_address_tree.rs, program-tests/utils/src/e2e_test_env.rs, program-tests/utils/src/lib.rs, program-tests/utils/src/mock_batched_forester.rs, program-tests/utils/src/state_tree_rollover.rs, program-tests/utils/src/test_batch_forester.rs, program-tests/utils/src/test_forester.rs, sdk-tests/anchor-manual-test/tests/shared.rs, sdk-tests/anchor-semi-manual-test/tests/shared/mod.rs, sdk-tests/anchor-semi-manual-test/tests/stress_test.rs, sdk-tests/client-test/tests/light_client.rs, sdk-tests/csdk-anchor-full-derived-test/tests/amm_stress_test.rs, sdk-tests/csdk-anchor-full-derived-test/tests/amm_test.rs, sdk-tests/csdk-anchor-full-derived-test/tests/basic_test.rs, sdk-tests/csdk-anchor-full-derived-test/tests/d10_ata_idempotent_test.rs, sdk-tests/csdk-anchor-full-derived-test/tests/d10_token_accounts_test.rs, sdk-tests/csdk-anchor-full-derived-test/tests/d11_zero_copy_test.rs, sdk-tests/csdk-anchor-full-derived-test/tests/failing_tests.rs, sdk-tests/csdk-anchor-full-derived-test/tests/integration_tests.rs, sdk-tests/csdk-anchor-full-derived-test/tests/shared.rs, sdk-tests/pinocchio-light-program-test/tests/shared/mod.rs, sdk-tests/pinocchio-light-program-test/tests/stress_test.rs, sdk-tests/pinocchio-manual-test/tests/shared.rs, sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs, sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs, sdk-tests/sdk-native-test/tests/test.rs, sdk-tests/sdk-pinocchio-v1-test/tests/test.rs, sdk-tests/sdk-pinocchio-v2-test/tests/test.rs, sdk-tests/sdk-token-test/tests/ctoken_pda.rs, sdk-tests/sdk-token-test/tests/decompress_full_cpi.rs, sdk-tests/sdk-token-test/tests/pda_ctoken.rs, sdk-tests/sdk-token-test/tests/test.rs, sdk-tests/sdk-token-test/tests/test_4_invocations.rs, sdk-tests/sdk-token-test/tests/test_4_transfer2.rs, sdk-tests/sdk-token-test/tests/test_deposit.rs, sdk-tests/sdk-v1-native-test/tests/test.rs, sdk-tests/single-account-loader-test/tests/test.rs, sdk-tests/single-ata-test/tests/test.rs, sdk-tests/single-mint-test/tests/test.rs, sdk-tests/single-pda-test/tests/test.rs, sdk-tests/single-token-test/tests/test.rs, sparse-merkle-tree/src/indexed_changelog.rs, sparse-merkle-tree/tests/indexed_changelog.rs (each excluded by none and included by none)
📒 Files selected for processing (66)
forester/src/config.rs, forester/src/epoch_manager.rs, forester/src/forester_status.rs, forester/src/main.rs, forester/src/metrics.rs, forester/src/priority_fee.rs, forester/src/processor/v2/helpers.rs, forester/src/processor/v2/proof_worker.rs, forester/src/processor/v2/strategy/address.rs, forester/src/queue_helpers.rs, forester/tests/e2e_test.rs, forester/tests/legacy/batched_state_async_indexer_test.rs, forester/tests/legacy/test_utils.rs, forester/tests/test_batch_append_spent.rs, forester/tests/test_compressible_pda.rs, forester/tests/test_indexer_interface.rs, forester/tests/test_utils.rs, prover/client/src/errors.rs, prover/client/src/helpers.rs, prover/client/src/proof.rs, prover/client/src/proof_client.rs, prover/client/src/proof_types/batch_address_append/json.rs, prover/client/src/proof_types/batch_address_append/proof_inputs.rs, prover/client/src/proof_types/batch_append/json.rs, prover/client/src/proof_types/batch_append/proof_inputs.rs, prover/client/src/proof_types/batch_update/json.rs, prover/client/src/proof_types/batch_update/proof_inputs.rs, prover/client/src/proof_types/combined/v1/json.rs, prover/client/src/proof_types/combined/v2/json.rs, prover/client/src/proof_types/combined/v2/proof_inputs.rs, prover/client/src/proof_types/inclusion/v1/json.rs, prover/client/src/proof_types/inclusion/v1/proof_inputs.rs, prover/client/src/proof_types/inclusion/v2/json.rs, prover/client/src/proof_types/inclusion/v2/proof_inputs.rs, prover/client/src/proof_types/non_inclusion/v1/json.rs, prover/client/src/proof_types/non_inclusion/v1/proof_inputs.rs, prover/client/src/proof_types/non_inclusion/v2/json.rs, prover/client/src/proof_types/non_inclusion/v2/proof_inputs.rs, prover/client/src/prover.rs, prover/client/tests/batch_address_append.rs, prover/client/tests/batch_append.rs, prover/client/tests/batch_update.rs, prover/client/tests/combined.rs, prover/client/tests/inclusion.rs, prover/client/tests/init_merkle_tree.rs, prover/client/tests/non_inclusion.rs, sdk-libs/client/src/fee.rs, sdk-libs/client/src/indexer/error.rs, sdk-libs/client/src/indexer/options.rs, sdk-libs/client/src/indexer/photon_indexer.rs, sdk-libs/client/src/indexer/types/proof.rs, sdk-libs/client/src/indexer/types/queue.rs, sdk-libs/client/src/indexer/types/token.rs, sdk-libs/client/src/interface/initialize_config.rs, sdk-libs/client/src/interface/instructions.rs, sdk-libs/client/src/interface/load_accounts.rs, sdk-libs/client/src/interface/mod.rs, sdk-libs/client/src/interface/pack.rs, sdk-libs/client/src/interface/serialize.rs, sdk-libs/client/src/local_test_validator.rs, sdk-libs/client/src/rpc/client.rs, sdk-libs/client/src/utils.rs, sdk-libs/macros/src/light_pdas/seeds/extract.rs, sdk-libs/program-test/src/indexer/test_indexer.rs, sdk-libs/program-test/src/program_test/compressible_setup.rs, sdk-libs/program-test/src/program_test/light_program_test.rs
forester/src/config.rs
Outdated
```rust
valid
    .into_iter()
    .collect::<std::result::Result<Vec<_>, _>>()
    .map_err(|_| ConfigError::InvalidArguments {
        field: "tree_ids",
        invalid_values: vec!["failed to parse tree_ids".to_string()],
    })?
```
🧹 Nitpick | 🔵 Trivial
This fallback parse error is unreachable after the partition() above.
valid can only contain Ok(Pubkey) values once the invalid branch has returned, so the generic "failed to parse tree_ids" path never executes. I'd either unwrap the partitioned values directly or collapse the parse into a single pass so the error surface stays aligned with the real invalid inputs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@forester/src/config.rs` around lines 341 - 347, The current use of
partition() makes the Err branch already returned, so the map_err fallback on
valid.into_iter().collect(...) is unreachable and gives a misleading generic
error; change the logic in the tree ID parsing flow (the variables/blocks using
partition(), valid, invalid and the ConfigError::InvalidArguments with field
"tree_ids") to either directly unwrap/collect the already-validated `valid`
Vec<Pubkey> (since `invalid` was handled) or perform a single-pass parse that
collects invalid inputs and returns ConfigError::InvalidArguments with the
actual invalid_values; update the code around the partition() usage to return
the real invalid strings instead of the generic "failed to parse tree_ids".
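The single-pass option suggested above can be sketched with `u32::from_str` standing in for `Pubkey::from_str` so the snippet stays self-contained; `parse_tree_ids` is a hypothetical name.

```rust
// u32::from_str stands in for Pubkey::from_str; the Err payload mirrors the
// invalid_values field of ConfigError::InvalidArguments.
pub fn parse_tree_ids(raw: &[&str]) -> Result<Vec<u32>, Vec<String>> {
    let mut valid = Vec::new();
    let mut invalid = Vec::new();
    for s in raw {
        match s.parse::<u32>() {
            Ok(id) => valid.push(id),
            Err(_) => invalid.push((*s).to_string()), // keep the real bad input
        }
    }
    if invalid.is_empty() {
        Ok(valid)
    } else {
        Err(invalid) // caller builds InvalidArguments from the actual values
    }
}
```

One pass means there is no unreachable second error path, and the error always names the genuinely invalid inputs.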
```rust
let inputs_json = match job.inputs.to_json(&job.tree_id, job.seq) {
    Ok(inputs_json) => inputs_json,
    Err(error) => {
        let _ = job
            .result_tx
            .send(ProofJobResult {
                seq: job.seq,
                result: Err(format!("Failed to serialize proof inputs: {}", error)),
                old_root: [0u8; 32],
                new_root: [0u8; 32],
                proof_duration_ms: 0,
                round_trip_ms: 0,
                submitted_at: std::time::Instant::now(),
            })
            .await;
        return;
    }
};
```
Use one timestamp/logging path for serialization failures.
Line 209 stamps submitted_at with Instant::now(), while Line 306 reuses round_trip_start for the same “serialization failed before submit” outcome. That makes latency metrics path-dependent, and these branches also go silent if result_tx is closed. Factor this into one helper with a single timestamp convention and an error! before the send.
Suggested direction
```diff
+async fn send_serialization_failure(
+    result_tx: &mpsc::Sender<ProofJobResult>,
+    seq: u64,
+    error: &ProverClientError,
+    submitted_at: std::time::Instant,
+    round_trip_ms: u64,
+) {
+    error!("Failed to serialize proof inputs for seq={}: {}", seq, error);
+    let _ = result_tx
+        .send(ProofJobResult {
+            seq,
+            result: Err(format!("Failed to serialize proof inputs: {}", error)),
+            old_root: [0u8; 32],
+            new_root: [0u8; 32],
+            proof_duration_ms: 0,
+            round_trip_ms,
+            submitted_at,
+        })
+        .await;
+}
```

Also applies to: 296-311
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@forester/src/processor/v2/proof_worker.rs` around lines 197 - 214, The
serialization-failure path uses two different timestamps (submitted_at =
Instant::now() here and round_trip_start elsewhere) and silently ignores closed
result_tx; consolidate this by adding a helper (e.g., send_error_result or
report_serialization_failure) used by both the job.inputs.to_json error branch
and the other serialization-before-submit branch that: captures one
Instant::now() into a single variable (round_trip_start or submitted_at) at the
failure site, logs the error with error! including the error details, constructs
the ProofJobResult with that single timestamp and consistent
round_trip_ms/proof_duration_ms values, and attempts to send via job.result_tx
while handling a closed channel by logging the send failure; update both
locations (the error branch around job.inputs.to_json and the other branch that
currently reuses round_trip_start) to call this helper so latency metrics and
logging are consistent.
```rust
.build()
.unwrap();
```
🧹 Nitpick | 🔵 Trivial
Prefer expect(...) over bare unwrap() in these setup builders.
These are high-churn setup steps, and the rest of the file already uses descriptive expects. Keeping the message here makes it much faster to tell whether InitializeRentFreeConfig::build() failed or the later transaction path broke.
Minimal tweak
```diff
-    .build()
-    .unwrap();
+    .build()
+    .expect("InitializeRentFreeConfig::build should succeed");
```

Also applies to: 508-509, 750-751
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@forester/tests/test_compressible_pda.rs` around lines 312 - 313, Replace the
bare .unwrap() calls on builder results with .expect(...) that includes a
descriptive message; specifically change the .build().unwrap() invocations that
call InitializeRentFreeConfig::build() (and the two other similar setup builder
calls in this test) to .build().expect("InitializeRentFreeConfig::build()
failed") or an equivalent message that names the failing builder so failures in
setup are clear. Ensure each replaced call uses a distinct message if helpful to
identify which builder failed.
Force-pushed: fbf48e5 → 46b80e9 → 786422b → 2357022 → 4058e37 → 82dc761
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
forester/src/epoch_manager.rs (1)

576-625: ⚠️ Potential issue | 🔴 Critical

Don't fan out epochs concurrently while V2 processors are still cached per tree.

`epoch_tasks` now runs previous/current epochs at the same time, but `get_or_create_state_processor()` / `get_or_create_address_processor()` still reuse one mutable `QueueProcessor` per tree and retag it in place with `update_epoch()` on reuse. That lets epoch N+1 rewrite `context.epoch`, `current_root`, `seq`, `cached_state`, and the proof cache while epoch N is still using the same processor, so later V2 jobs can be built against the wrong epoch/root context. Either keep epoch processing serial here or key the processor caches by `(tree, epoch)`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 576 - 625, The current loop fans out epochs via epoch_tasks (FuturesUnordered) while get_or_create_state_processor() / get_or_create_address_processor() reuse and retag a single mutable QueueProcessor via update_epoch(), causing concurrent epochs to corrupt shared processor context; fix by either (A) making epoch processing serial: remove FuturesUnordered concurrency and await process_epoch(epoch) directly in the select branch so each epoch completes before the next starts (affecting epoch_tasks/epoch_tasks.push usage and the async move closure around process_epoch), or (B) make the processor caches safe for concurrent epochs by keying processors by (tree, epoch) instead of reusing a single instance: change get_or_create_state_processor() and get_or_create_address_processor() to accept epoch (or return a per-epoch cloned QueueProcessor) and stop calling update_epoch() to retag a shared processor; choose one approach and apply corresponding changes to process_epoch, update_epoch, and the processor cache logic (QueueProcessor, get_or_create_*, get_or_create_*) so no two concurrent epoch tasks mutate the same processor state.

prover/client/src/proof_types/batch_update/proof_inputs.rs (1)

183-190: ⚠️ Potential issue | 🟠 Major

Don't leave a panic in the `Result` path.

`merkle_proof.try_into().unwrap()` still aborts on malformed proofs even though this helper now returns `Result`. Please turn that conversion failure into a `ProverClientError` so bad proof data is reported instead of crashing the caller.

🔧 Suggested fix

```diff
-        let merkle_proof_array = merkle_proof.try_into().unwrap();
+        let proof_len = merkle_proof.len();
+        let merkle_proof_array: [[u8; 32]; HEIGHT] = merkle_proof.try_into().map_err(|_| {
+            ProverClientError::GenericError(format!(
+                "Invalid merkle proof length: got {}, expected {}",
+                proof_len, HEIGHT
+            ))
+        })?;
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof_types/batch_update/proof_inputs.rs` around lines 183 - 190, The code currently uses merkle_proof.try_into().unwrap(), which can panic on malformed proofs; change this to propagate a ProverClientError instead of unwrapping: attempt the conversion (merkle_proof.try_into()) and map the Err case into an appropriate ProverClientError (with context like "invalid merkle proof" and the original error) before assigning merkle_proof_array, so subsequent calls (e.g., compute_root_from_merkle_proof(nullifier, &merkle_proof_array, *index as usize)) run only when the conversion succeeded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@forester/src/epoch_manager.rs`:
- Around line 680-681: The cleanup call to
self.shutdown_new_tree_workers(NEW_TREE_WORKER_SHUTDOWN_TIMEOUT) in
EpochManager::run() is never executed when the outer run_service() path
cancels/drops epoch_manager.run(), leaving NewTreeWorker threads detached; fix
by ensuring shutdown_new_tree_workers is invoked on service shutdown as well:
either call EpochManager::shutdown_new_tree_workers(...) from the run_service()
shutdown branch before dropping epoch_manager, or refactor EpochManager::run()
so it owns and awaits shutdown completion (i.e., make run() return only after
shutdown_new_tree_workers completes) so NewTreeWorker handles are always joined;
locate use sites around run_service(), EpochManager::run(),
shutdown_new_tree_workers, NEW_TREE_WORKER_SHUTDOWN_TIMEOUT, and NewTreeWorker
to implement the chosen approach.
- Around line 914-994: The new-tree worker spawned in process_epoch is detached
and registered via register_new_tree_worker(NewTreeWorker { ... }) but not
awaited before the epoch's work reporting and cleanup, so ensure new-tree
processing is included in the epoch's join set or explicitly awaited: modify
process_epoch to (a) add the spawned thread_handle/completion future from the
NewTreeWorker into the same join/await collection used for other per-epoch tasks
(or convert the oneshot completion_rx into a future and push it onto the epoch
join set), or (b) before calling update_metrics_and_counts() / per-epoch cleanup
logic, await the NewTreeWorker completion (completion_rx or join handle) for any
registered new trees; reference the spawned closure that calls process_queue and
the register_new_tree_worker/NewTreeWorker entries to locate where to add the
join/await.
In `@forester/src/forester_status.rs`:
- Around line 672-689: Two identical V1 queue parsing blocks (around the uses of
queue_account) should be consolidated: extract a helper function (e.g.,
parse_v1_queue_stats or parse_queue_account_v1) that accepts the account bytes
(or &Account) and a log label/string, calls unsafe
parse_hash_set_from_bytes::<QueueAccount>(&acc.data), computes len by filtering
for cell.sequence_number.is_none(), gets cap via hs.get_capacity(), and returns
(Option<u64>, Option<u64>); on Err call warn!(?error, "{label}: Failed to parse
StateV1 queue hash set") or similar. Replace both duplicated match chains with
calls to that helper and unwrap_or((None,None)) as before.
In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs`:
- Around line 202-203: Add an explicit guard that rejects zkp_batch_size == 0
before using it (before the line that sets batch_len = zkp_batch_size) so we
don't produce a no-op payload with new_root left zeroed; e.g., check
zkp_batch_size == 0 and return an Err (or panic) with a clear message like
"zero-sized ZKP batch not allowed" to fail fast and ensure callers handle the
invalid batch size.
- Around line 403-404: The IndexedChangelogEntry's changelog_index is set
incorrectly using staged_indexed_changelog.len(); change both assignments (where
IndexedChangelogEntry is constructed) to refer to the raw changelog position by
using staged_changelog.len() - 1 immediately after pushing the new entry into
staged_changelog (not staged_indexed_changelog.len()); update the two
occurrences that currently reference staged_indexed_changelog.len() so
changelog_index = staged_changelog.len() - 1.
In `@prover/client/src/prover.rs`:
- Around line 175-202: The local binding spawn_result is redundant because the
async block returns unit; remove the spawn_result variable and directly await
the async block (or simply execute the async block without assigning) that
spawns the process (Command::new(&prover_path).arg("start-prover").spawn()),
calls wait_for_prover_health(..., &mut child).await, and then calls
monitor_prover_child(child) or kills/waits on the child on error; after that
keep the IS_LOADING.store(false, Ordering::Release) call as-is. Locate the block
around spawn_result, Command::new, wait_for_prover_health, monitor_prover_child,
and IS_LOADING to make the change.
In `@sdk-libs/client/src/indexer/photon_indexer.rs`:
- Around line 1145-1162: The second validation that compares proof_len to
expected_siblings is redundant because expected_total is defined as
STATE_MERKLE_TREE_CANOPY_DEPTH + expected_siblings and the prior check ensures
x.proof.len() == expected_total; remove the dead block (the if that returns
IndexerError::InvalidParameters comparing proof_len and expected_siblings) and
retain the initial length check using expected_total, then compute proof_len =
x.proof.len() - STATE_MERKLE_TREE_CANOPY_DEPTH without re-checking equality.
- Around line 1693-1694: tree_next_insertion_index is incorrectly initialized
from aq.start_index; change it to use the tree's actual next insertion index
instead. Locate the AddressQueueWithContext construction where
tree_next_insertion_index is set (currently using aq.start_index) and replace
that with the tree's rightmost/next index obtained from the tree account or
staging tree (call the tree's next_index() or query the tree account state like
the test_indexer does). If the Photon API does not return the tree's next_index,
update the code path to query the tree account state (or extend the API to
include next_index) and use that value when initializing
tree_next_insertion_index so staging trees reflect the real insertion point.
In `@sdk-libs/client/src/interface/load_accounts.rs`:
- Around line 163-165: The ATA enumeration resets per chunk causing
MissingAtaContext to report wrong indices; update build_ata_load to accept a
start_index: usize and use it when creating AtaContext (e.g., in
AtaContext::from_interface calls add start_index + i) so contexts reflect their
absolute position, then when iterating cold_atas.chunks(...) compute the chunk's
start_index (running offset or chunk_index * MAX_ATAS_PER_IX) and pass it into
build_ata_load; apply the same change to the other call site referenced around
lines 401-405 so all ATA load invocations supply the correct flattened index.
- Around line 73-75: The concurrent proof pipelines use join! which waits for
all branches even if one fails; replace the join! at the proof orchestration
site with try_join! so the first proof-fetch error cancels the other futures and
returns immediately, and ensure the streams that call
.buffered(PROOF_FETCH_CONCURRENCY) (the PDA, ATA and mint proof helpers) are
compatible with cancellation; alternatively, if you intended a global cap
instead of per-stream caps, implement a shared semaphore/permit budget used by
the helpers and keep the buffered(…) limits aligned with that shared budget.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 98-111: build_compressed_proof currently assumes body is raw gnark
proof JSON but needs to handle the timed envelope { "proof": ...,
"proof_duration_ms": ... } too; modify build_compressed_proof (or create a
shared helper used by _get_validity_proof_v1_implementation) to first check if
the incoming body is an envelope (e.g., parse into a minimal map/object and if
it contains a "proof" key extract that value), then pass the extracted inner
proof JSON to deserialize_gnark_proof_json; keep the existing error mapping to
IndexerError and then continue with proof_from_json_struct and compress_proof as
before so both raw and wrapped responses succeed.
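One mechanical piece of the comments above, the ATA chunk indexing fix for load_accounts.rs, can be sketched with plain slices. `MAX_ATAS_PER_IX = 4` and `absolute_ata_indices` are illustrative assumptions, not the crate's real names or values.

```rust
// Assumed chunk size; the real MAX_ATAS_PER_IX lives in load_accounts.rs.
pub const MAX_ATAS_PER_IX: usize = 4;

// Returns (absolute_index, ata) pairs so per-chunk contexts keep their
// position in the flattened cold_atas list instead of restarting at 0.
pub fn absolute_ata_indices(cold_atas: &[&str]) -> Vec<(usize, String)> {
    let mut out = Vec::new();
    for (chunk_idx, chunk) in cold_atas.chunks(MAX_ATAS_PER_IX).enumerate() {
        let start_index = chunk_idx * MAX_ATAS_PER_IX; // offset of this chunk
        for (i, ata) in chunk.iter().enumerate() {
            out.push((start_index + i, (*ata).to_string()));
        }
    }
    out
}
```

Passing `start_index` into the per-chunk builder is what keeps a `MissingAtaContext` error pointing at the right entry in the original list.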
---
Outside diff comments:
In `@forester/src/epoch_manager.rs`:
- Around line 576-625: The current loop fans out epochs via epoch_tasks
(FuturesUnordered) while get_or_create_state_processor() /
get_or_create_address_processor() reuse and retag a single mutable
QueueProcessor via update_epoch(), causing concurrent epochs to corrupt shared
processor context; fix by either (A) making epoch processing serial: remove
FuturesUnordered concurrency and await process_epoch(epoch) directly in the
select branch so each epoch completes before the next starts (affecting
epoch_tasks/epoch_tasks.push usage and the async move closure around
process_epoch), or (B) make the processor caches safe for concurrent epochs by
keying processors by (tree, epoch) instead of reusing a single instance: change
get_or_create_state_processor() and get_or_create_address_processor() to accept
epoch (or return a per-epoch cloned QueueProcessor) and stop calling
update_epoch() to retag a shared processor; choose one approach and apply
corresponding changes to process_epoch, update_epoch, and the processor cache
logic (QueueProcessor, get_or_create_*, get_or_create_*) so no two concurrent
epoch tasks mutate the same processor state.
In `@prover/client/src/proof_types/batch_update/proof_inputs.rs`:
- Around line 183-190: The code currently uses merkle_proof.try_into().unwrap(),
which can panic on malformed proofs; change this to propagate a
ProverClientError instead of unwrapping: attempt the conversion
(merkle_proof.try_into()) and map the Err case into an appropriate
ProverClientError (with context like "invalid merkle proof" and the original
error) before assigning merkle_proof_array, so subsequent calls (e.g.,
compute_root_from_merkle_proof(nullifier, &merkle_proof_array, *index as usize))
run only when the conversion succeeded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: eab8c888-7dc8-4fd1-a13f-aa56687cb167
⛔ Files ignored due to path filters (6)
Cargo.lock is excluded by !**/*.lock and included by none; program-tests/utils/src/actions/legacy/instructions/transfer2.rs, program-tests/utils/src/e2e_test_env.rs, program-tests/utils/src/mock_batched_forester.rs, program-tests/utils/src/test_batch_forester.rs, sdk-tests/csdk-anchor-full-derived-test/tests/basic_test.rs are each excluded by none and included by none
📒 Files selected for processing (38)
forester/src/epoch_manager.rs, forester/src/forester_status.rs, forester/src/metrics.rs, forester/src/priority_fee.rs, forester/src/processor/v2/helpers.rs, forester/src/processor/v2/processor.rs, forester/src/processor/v2/proof_worker.rs, forester/src/processor/v2/strategy/address.rs, forester/tests/e2e_test.rs, forester/tests/legacy/batched_state_async_indexer_test.rs, forester/tests/legacy/test_utils.rs, forester/tests/test_batch_append_spent.rs, forester/tests/test_indexer_interface.rs, forester/tests/test_utils.rs, prover/client/src/constants.rs, prover/client/src/errors.rs, prover/client/src/helpers.rs, prover/client/src/proof.rs, prover/client/src/proof_client.rs, prover/client/src/proof_types/batch_address_append/proof_inputs.rs, prover/client/src/proof_types/batch_append/proof_inputs.rs, prover/client/src/proof_types/batch_update/proof_inputs.rs, prover/client/src/prover.rs, prover/client/tests/batch_address_append.rs, sdk-libs/client/src/indexer/photon_indexer.rs, sdk-libs/client/src/indexer/types/proof.rs, sdk-libs/client/src/indexer/types/queue.rs, sdk-libs/client/src/interface/initialize_config.rs, sdk-libs/client/src/interface/instructions.rs, sdk-libs/client/src/interface/load_accounts.rs, sdk-libs/client/src/interface/pack.rs, sdk-libs/client/src/local_test_validator.rs, sdk-libs/client/src/utils.rs, sdk-libs/program-test/src/indexer/test_indexer.rs, sdk-libs/sdk-types/src/interface/account/token_seeds.rs, sdk-libs/sdk-types/src/interface/program/decompression/pda.rs, sdk-libs/sdk-types/src/interface/program/decompression/processor.rs, sdk-libs/sdk-types/src/interface/program/decompression/token.rs
```diff
 let (queue_len, queue_cap) = queue_account
     .map(|acc| {
-        unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) }
-            .ok()
-            .map(|hs| {
+        match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } {
+            Ok(hs) => {
                 let len = hs
                     .iter()
                     .filter(|(_, cell)| cell.sequence_number.is_none())
                     .count() as u64;
                 let cap = hs.get_capacity() as u64;
-                (len, cap)
-            })
-            .unwrap_or((0, 0))
+                (Some(len), Some(cap))
+            }
+            Err(error) => {
+                warn!(?error, "Failed to parse StateV1 queue hash set");
+                (None, None)
+            }
+        }
     })
-    .map(|(l, c)| (Some(l), Some(c)))
     .unwrap_or((None, None));
```
🧹 Nitpick | 🔵 Trivial
Extract duplicated V1 queue parsing logic into one helper.
The two blocks are functionally identical except the log label. Consolidating them will reduce drift risk and simplify future parsing changes.
♻️ Proposed refactor
+fn parse_v1_queue_stats(acc: &Account, tree_kind: &str) -> (Option<u64>, Option<u64>) {
+ match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } {
+ Ok(hs) => {
+ let len = hs
+ .iter()
+ .filter(|(_, cell)| cell.sequence_number.is_none())
+ .count() as u64;
+ let cap = hs.get_capacity() as u64;
+ (Some(len), Some(cap))
+ }
+ Err(error) => {
+ warn!(?error, %tree_kind, "Failed to parse V1 queue hash set");
+ (None, None)
+ }
+ }
+}
...
- let (queue_len, queue_cap) = queue_account
- .map(|acc| {
- match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } {
- Ok(hs) => {
- let len = hs
- .iter()
- .filter(|(_, cell)| cell.sequence_number.is_none())
- .count() as u64;
- let cap = hs.get_capacity() as u64;
- (Some(len), Some(cap))
- }
- Err(error) => {
- warn!(?error, "Failed to parse StateV1 queue hash set");
- (None, None)
- }
- }
- })
- .unwrap_or((None, None));
+ let (queue_len, queue_cap) = queue_account
+ .as_ref()
+ .map(|acc| parse_v1_queue_stats(acc, "StateV1"))
+ .unwrap_or((None, None));
...
- let (queue_len, queue_cap) = queue_account
- .map(|acc| {
- match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } {
- Ok(hs) => {
- let len = hs
- .iter()
- .filter(|(_, cell)| cell.sequence_number.is_none())
- .count() as u64;
- let cap = hs.get_capacity() as u64;
- (Some(len), Some(cap))
- }
- Err(error) => {
- warn!(?error, "Failed to parse AddressV1 queue hash set");
- (None, None)
- }
- }
- })
- .unwrap_or((None, None));
+ let (queue_len, queue_cap) = queue_account
+ .as_ref()
+ .map(|acc| parse_v1_queue_stats(acc, "AddressV1"))
+        .unwrap_or((None, None));

Also applies to: 729-746
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@forester/src/forester_status.rs` around lines 672 - 689, Two identical V1
queue parsing blocks (around the uses of queue_account) should be consolidated:
extract a helper function (e.g., parse_v1_queue_stats or parse_queue_account_v1)
that accepts the account bytes (or &Account) and a log label/string, calls
unsafe parse_hash_set_from_bytes::<QueueAccount>(&acc.data), computes len by
filtering for cell.sequence_number.is_none(), gets cap via hs.get_capacity(),
and returns (Option<u64>, Option<u64>); on Err call warn!(?error, "{label}:
Failed to parse StateV1 queue hash set") or similar. Replace both duplicated
match chains with calls to that helper and unwrap_or((None,None)) as before.
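The consolidation pattern can be shown in miniature. `parse_v1_queue_stats` below is a toy stand-in (a two-byte "parser" replaces the real `parse_hash_set_from_bytes::<QueueAccount>` call), but the shape, one helper with a label parameter for the log message, is the one the refactor proposes:

```rust
// Hypothetical sketch: one helper replaces two near-identical parse blocks
// that differed only in the log label. The "parser" here is a stand-in:
// first byte is the live-cell count, second byte the capacity.
fn parse_v1_queue_stats(data: &[u8], tree_kind: &str) -> (Option<u64>, Option<u64>) {
    match (data.first(), data.get(1)) {
        (Some(&len), Some(&cap)) => (Some(len as u64), Some(cap as u64)),
        _ => {
            // Stand-in for warn!(?error, %tree_kind, ...).
            eprintln!("Failed to parse {tree_kind} queue hash set");
            (None, None)
        }
    }
}

fn main() {
    // Both former call sites collapse to calls differing only in the label.
    assert_eq!(parse_v1_queue_stats(&[3, 8], "StateV1"), (Some(3), Some(8)));
    assert_eq!(parse_v1_queue_stats(&[], "AddressV1"), (None, None));
}
```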
```rust
let batch_len = zkp_batch_size;
for (name, len) in [
```
Reject zero-sized batches explicitly.
A zkp_batch_size of 0 currently passes guards and can produce an invalid no-op payload (new_root stays zeroed). Add a hard check up front.
Suggested fix
pub fn get_batch_address_append_circuit_inputs<const HEIGHT: usize>(
@@
) -> Result<BatchAddressAppendInputs, ProverClientError> {
+ if zkp_batch_size == 0 {
+ return Err(ProverClientError::GenericError(
+ "zkp_batch_size must be greater than 0".to_string(),
+ ));
+ }
     let batch_len = zkp_batch_size;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs` around
lines 202 - 203, Add an explicit guard that rejects zkp_batch_size == 0 before
using it (before the line that sets batch_len = zkp_batch_size) so we don't
produce a no-op payload with new_root left zeroed; e.g., check zkp_batch_size ==
0 and return an Err (or panic) with a clear message like "zero-sized ZKP batch
not allowed" to fail fast and ensure callers handle the invalid batch size.
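A minimal sketch of the fail-fast guard; `circuit_inputs_len` and the plain `String` error are hypothetical stand-ins for the real constructor and `ProverClientError`:

```rust
// Reject the degenerate batch before any work happens, so callers get a
// clear error instead of a no-op payload with a zeroed new_root.
fn circuit_inputs_len(zkp_batch_size: usize) -> Result<usize, String> {
    if zkp_batch_size == 0 {
        return Err("zkp_batch_size must be greater than 0".to_string());
    }
    let batch_len = zkp_batch_size;
    Ok(batch_len)
}

fn main() {
    assert!(circuit_inputs_len(0).is_err()); // rejected up front
    assert_eq!(circuit_inputs_len(10), Ok(10));
}
```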
```rust
changelog_index: staged_indexed_changelog.len(), //change_log_index,
};
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Find IndexedChangelogEntry definition and changelog_index docs/usages:"
rg -n -C4 'struct IndexedChangelogEntry|changelog_index' --type rust
echo
echo "Find patch_indexed_changelogs implementation/signature:"
rg -n -C6 'fn patch_indexed_changelogs|patch_indexed_changelogs\(' --type rust

Repository: Lightprotocol/light-protocol
Length of output: 50385
🏁 Script executed:
# Get context around lines 403-404 and 508-509 in the file
head -520 prover/client/src/proof_types/batch_address_append/proof_inputs.rs | tail -130

Repository: Lightprotocol/light-protocol
Length of output: 5417
🏁 Script executed:
# Look at lines around 260-280 to understand initialization
sed -n '260,280p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs
# Look at the loop structure around line 265 to understand initial_changelog_len
sed -n '260,275p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs

Repository: Lightprotocol/light-protocol
Length of output: 1688
🏁 Script executed:
# Look at the full loop iteration including where changelog_index variable is used
sed -n '265,310p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs

Repository: Lightprotocol/light-protocol
Length of output: 1867
🏁 Script executed:
# Check what happens with patch_indexed_changelogs - does it consume the changelog_index variable?
# Look at sparse-merkle-tree/src/indexed_changelog.rs to see what happens to changelog_index param
sed -n '25,35p' sparse-merkle-tree/src/indexed_changelog.rs
# Now let's look at how the patched changelog_index is used after patching
sed -n '295,410p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs

Repository: Lightprotocol/light-protocol
Length of output: 5652
🏁 Script executed:
# Let's verify the exact state of staged_changelog when the IndexedChangelogEntry is created
# by looking at what happens between the push at line 370 and the creation at line 403
sed -n '365,405p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs

Repository: Lightprotocol/light-protocol
Length of output: 1600
🏁 Script executed:
# Verify the second occurrence at line 508 has the same pattern
sed -n '430,515p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs

Repository: Lightprotocol/light-protocol
Length of output: 3783
🏁 Script executed:
# Let's verify the test expectations one more time to confirm the semantics
# Look at test_forester.rs line 625-626 where it validates changelog_index
sed -n '620,630p' program-tests/utils/src/test_forester.rs

Repository: Lightprotocol/light-protocol
Length of output: 429
🏁 Script executed:
# Confirm initial_changelog_len and understand the context better
sed -n '250,270p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs

Repository: Lightprotocol/light-protocol
Length of output: 796
Fix changelog_index to reference raw changelog position, not indexed changelog length.
The changelog_index field in IndexedChangelogEntry must point to the position in the raw ConcurrentMerkleTree changelog, not the indexed changelog. Both lines 403 and 508 incorrectly use staged_indexed_changelog.len(). After each entry is pushed to staged_changelog (lines 370 and 498), the changelog position is staged_changelog.len() - 1.
Apply to both locations
let low_element_changelog_entry = IndexedChangelogEntry {
element: new_low_element_raw,
proof: low_element_changelog_proof,
- changelog_index: staged_indexed_changelog.len(), //change_log_index,
+ changelog_index: staged_changelog.len() - 1,
};
let new_element_changelog_entry = IndexedChangelogEntry {
element: new_element_raw,
proof: merkle_proof_array,
- changelog_index: staged_indexed_changelog.len(),
+ changelog_index: staged_changelog.len() - 1,
 };

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-    changelog_index: staged_indexed_changelog.len(), //change_log_index,
-};
+let low_element_changelog_entry = IndexedChangelogEntry {
+    element: new_low_element_raw,
+    proof: low_element_changelog_proof,
+    changelog_index: staged_changelog.len() - 1,
+};
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs` around
lines 403 - 404, The IndexedChangelogEntry's changelog_index is set incorrectly
using staged_indexed_changelog.len(); change both assignments (where
IndexedChangelogEntry is constructed) to refer to the raw changelog position by
using staged_changelog.len() - 1 immediately after pushing the new entry into
staged_changelog (not staged_indexed_changelog.len()); update the two
occurrences that currently reference staged_indexed_changelog.len() so
changelog_index = staged_changelog.len() - 1.
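The indexing rule behind the fix can be restated in isolation (collection names are illustrative, not the repo's types): the entry just pushed lives at `len() - 1` of the vector it was pushed into, while the length of some other collection is coincidental at best.

```rust
// Index of the most recently pushed element of a slice.
fn last_pushed_index<T>(v: &[T]) -> usize {
    v.len() - 1
}

fn main() {
    let mut staged_changelog = vec!["existing"];
    let staged_indexed_changelog = vec!["a", "b", "c"]; // unrelated collection
    staged_changelog.push("low_element_entry");
    // The pushed entry's position in the raw changelog:
    assert_eq!(last_pushed_index(&staged_changelog), 1);
    // Using the other collection's length would have yielded 3, a wrong index.
    assert_ne!(last_pushed_index(&staged_changelog), staged_indexed_changelog.len());
}
```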
```diff
 let expected_siblings =
     STATE_MERKLE_TREE_HEIGHT - STATE_MERKLE_TREE_CANOPY_DEPTH;
 let expected_total = STATE_MERKLE_TREE_CANOPY_DEPTH + expected_siblings;
 if x.proof.len() != expected_total {
     return Err(IndexerError::InvalidParameters(format!(
-        "Merkle proof length ({}) is less than canopy depth ({})",
-        proof_vec.len(),
-        STATE_MERKLE_TREE_CANOPY_DEPTH,
+        "Merkle proof length ({}) does not match expected total proof length ({})",
+        x.proof.len(),
+        expected_total,
     )));
 }
 let proof_len = x.proof.len() - STATE_MERKLE_TREE_CANOPY_DEPTH;
 if proof_len != expected_siblings {
     return Err(IndexerError::InvalidParameters(format!(
         "Merkle proof sibling count ({}) does not match expected sibling count ({})",
         proof_len,
         expected_siblings,
     )));
 }
```
Redundant validation: second check can never fail after first check passes.
Let me walk through the math here:
- `expected_siblings = HEIGHT - CANOPY_DEPTH`
- `expected_total = CANOPY_DEPTH + expected_siblings = HEIGHT`
- First check ensures `proof.len() == expected_total`
- After passing: `proof_len = proof.len() - CANOPY_DEPTH = expected_total - CANOPY_DEPTH = expected_siblings`
The second check at lines 1156-1161 is dead code—it will always pass because the algebraic relationship guarantees proof_len == expected_siblings once the first check succeeds.
🧹 Suggested simplification
let expected_siblings =
STATE_MERKLE_TREE_HEIGHT - STATE_MERKLE_TREE_CANOPY_DEPTH;
-let expected_total = STATE_MERKLE_TREE_CANOPY_DEPTH + expected_siblings;
-if x.proof.len() != expected_total {
+if x.proof.len() != STATE_MERKLE_TREE_HEIGHT {
return Err(IndexerError::InvalidParameters(format!(
"Merkle proof length ({}) does not match expected total proof length ({})",
x.proof.len(),
- expected_total,
+ STATE_MERKLE_TREE_HEIGHT,
)));
}
-let proof_len = x.proof.len() - STATE_MERKLE_TREE_CANOPY_DEPTH;
-if proof_len != expected_siblings {
- return Err(IndexerError::InvalidParameters(format!(
- "Merkle proof sibling count ({}) does not match expected sibling count ({})",
- proof_len,
- expected_siblings,
- )));
-}
+let proof_len = expected_siblings;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/indexer/photon_indexer.rs` around lines 1145 - 1162, The
second validation that compares proof_len to expected_siblings is redundant
because expected_total is defined as STATE_MERKLE_TREE_CANOPY_DEPTH +
expected_siblings and the prior check ensures x.proof.len() == expected_total;
remove the dead block (the if that returns IndexerError::InvalidParameters
comparing proof_len and expected_siblings) and retain the initial length check
using expected_total, then compute proof_len = x.proof.len() -
STATE_MERKLE_TREE_CANOPY_DEPTH without re-checking equality.
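The algebra can be checked mechanically; `HEIGHT` and `CANOPY_DEPTH` below are placeholder values, not the repo's actual constants:

```rust
// Placeholder values; the real constants come from the tree configuration.
const HEIGHT: usize = 26;
const CANOPY_DEPTH: usize = 10;

// Sibling count implied by a proof length that already passed the
// `proof.len() == expected_total` check.
fn derived_sibling_count(proof_len_total: usize) -> usize {
    proof_len_total - CANOPY_DEPTH
}

fn main() {
    let expected_siblings = HEIGHT - CANOPY_DEPTH;
    let expected_total = CANOPY_DEPTH + expected_siblings; // algebraically == HEIGHT
    assert_eq!(expected_total, HEIGHT);
    // Once the first check forces proof.len() == expected_total, the second
    // comparison is fully determined and can never fail:
    assert_eq!(derived_sibling_count(expected_total), expected_siblings);
}
```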
```rust
start_index: aq.start_index,
tree_next_insertion_index: aq.start_index,
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check what fields the photon_api AddressQueue response type provides
rg -n "struct.*AddressQueue" --type rust -A 30 | head -80
# Also check if there's a tree_next_insertion_index or similar field in photon_api types
rg -n "next_insertion_index|tree_next_index|next_index" --type rust -g '**photon**' -C 2

Repository: Lightprotocol/light-protocol
Length of output: 8091
🏁 Script executed:
# Find NewAddressProofWithContext definition
rg -n "struct.*NewAddressProofWithContext" --type rust -A 25
# Check how tree_next_insertion_index is used
rg -n "tree_next_insertion_index" --type rust -B 2 -A 2

Repository: Lightprotocol/light-protocol
Length of output: 6134
🏁 Script executed:
# Search for documentation about address queue and tree insertion semantics
find . -name "*.md" -type f | xargs grep -l "tree_next_insertion_index\|start_index\|address.*queue" 2>/dev/null | head -5
# Look for comments or docs about how start_index relates to tree insertion
rg -n "start_index" --type rust -B 2 -A 2 | grep -i "tree\|insert\|next" | head -20

Repository: Lightprotocol/light-protocol
Length of output: 2337
🏁 Script executed:
# Look for AddressWithTree or similar type that might include tree state
rg -n "struct.*Address.*Tree|struct.*AddressTree" --type rust -A 20 | grep -E "next_index|insertion|rightmost" -B 5 -A 5
# Check if there's additional tree context returned by Photon API
rg -n "AddressWithTree\|address.*tree.*state" --type rust -A 15 | head -80

Repository: Lightprotocol/light-protocol
Length of output: 7403
🏁 Script executed:
# Check if merkle_tree account state is accessible in this context
rg -n "merkle_tree.*next_index\|tree_next_index\|right_most" --type rust -B 3 -A 3 | grep -E "photon|indexer|AddressQueueWithContext" -B 5 -A 5
# Look at the full context around line 1694 to see what other data is available
sed -n '1680,1710p' sdk-libs/client/src/indexer/photon_indexer.rs

Repository: Lightprotocol/light-protocol
Length of output: 1350
Fix tree_next_insertion_index to use the tree's actual insertion point, not the queue's start index.
tree_next_insertion_index initializes sparse staging trees and must represent the tree's current insertion point, not the queue batch's starting position. The field comment in AddressQueueWithContext clearly states: "Sparse tree insertion point / next index used to initialize staging trees."
Currently, both start_index and tree_next_insertion_index are set to aq.start_index, which is semantically incorrect. The test indexer correctly derives tree_next_insertion_index from the tree's rightmost index (see test_indexer.rs:731), and the forester uses the staging tree's next_index().
The Photon API response doesn't provide the tree's next insertion point. This requires either querying the tree account's state or enhancing the Photon API response to include the tree's next_index field.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/indexer/photon_indexer.rs` around lines 1693 - 1694,
tree_next_insertion_index is incorrectly initialized from aq.start_index; change
it to use the tree's actual next insertion index instead. Locate the
AddressQueueWithContext construction where tree_next_insertion_index is set
(currently using aq.start_index) and replace that with the tree's rightmost/next
index obtained from the tree account or staging tree (call the tree's
next_index() or query the tree account state like the test_indexer does). If the
Photon API does not return the tree's next_index, update the code path to query
the tree account state (or extend the API to include next_index) and use that
value when initializing tree_next_insertion_index so staging trees reflect the
real insertion point.
```rust
const MAX_ATAS_PER_IX: usize = 8;
const MAX_PDAS_PER_IX: usize = 8;
const PROOF_FETCH_CONCURRENCY: usize = 8;
```
🧩 Analysis chain
🏁 Script executed:
cat -n sdk-libs/client/src/interface/load_accounts.rs | head -100

Repository: Lightprotocol/light-protocol
Length of output: 4138
🏁 Script executed:
cat -n sdk-libs/client/src/interface/load_accounts.rs | sed -n '130,150p'

Repository: Lightprotocol/light-protocol
Length of output: 966
🏁 Script executed:
cat -n sdk-libs/client/src/interface/load_accounts.rs | sed -n '260,295p'

Repository: Lightprotocol/light-protocol
Length of output: 1267
🏁 Script executed:
rg "buffered\|join\!" sdk-libs/client/src/interface/load_accounts.rs -A 2 -B 2

Repository: Lightprotocol/light-protocol
Length of output: 168
🏁 Script executed:
rg "buffered|join" sdk-libs/client/src/interface/load_accounts.rs -A 2 -B 2

Repository: Lightprotocol/light-protocol
Length of output: 599
🏁 Script executed:
cat -n sdk-libs/client/src/interface/load_accounts.rs | sed -n '294,320p'

Repository: Lightprotocol/light-protocol
Length of output: 980
🏁 Script executed:
web_search "futures::join vs futures::try_join difference fail fast"

Repository: Lightprotocol/light-protocol
Length of output: 122
🏁 Script executed:
web_search "futures stream buffered concurrency semantic concurrent limits"

Repository: Lightprotocol/light-protocol
Length of output: 122
🌐 Web query:
futures::join vs futures::try_join difference fail fast behavior
💡 Result:
- `futures::join!`: polls all provided futures concurrently and waits for all of them to complete, returning a tuple of their outputs. It does not fail fast—even if one future's output is an `Err`, `join!` still waits for the others because it doesn't interpret the outputs. [1]
- `futures::try_join!`: like `join!`, but for futures that return `Result`. It returns `Ok((...))` if all succeed, and fails fast: as soon as any future resolves to `Err(e)`, `try_join!` returns immediately with that error and the remaining futures are canceled by being dropped (i.e., they stop being polled). [2] That "cancellation" is cooperative—any cleanup depends on the futures being cancellation-safe. [3]
Important nuance: fail-fast only stops the inlined futures being driven by the macro. If you previously spawned work into separate tasks, dropping the future/handle used by try_join! won’t necessarily stop the spawned tasks (they may keep running unless explicitly aborted/canceled). [3]
Sources:
[1] core::future::join docs/source (join polls multiple futures and returns a tuple) https://doc.rust-lang.org/src/core/future/join.rs.html
[2] futures::try_join! docs (returns early on first error) https://docs.rs/futures/latest/futures/macro.try_join.html
[3] “Cancelling async Rust” (try_join cancels the other futures; cancellation safety discussion) https://sunshowers.io/posts/cancelling-async-rust/
🌐 Web query:
futures stream buffered concurrency how many concurrent requests allowed
💡 Result:
In Rust’s futures::stream::StreamExt, the concurrency is explicitly bounded by the argument you pass:
- `stream.buffered(n)` / `stream.buffer_unordered(n)` will have at most `n` futures "in flight" (being polled concurrently) at a time; as one completes, the next is started. `buffered` preserves output order; `buffer_unordered` does not. [1]
- If you're using `for_each_concurrent(limit, ...)`, the `limit` is an `Option<usize>`: `Some(n)` bounds concurrency to `n`, while `None` means unbounded (limited only by resources / upstream). [1]
- So there is no fixed "how many concurrent requests are allowed" by the combinator itself—it's whatever `n` (or `limit`) you choose. Typical HTTP examples use these specifically to cap concurrent requests. [2]
Sources:
[1] futures StreamExt docs showing buffered(n), buffer_unordered(n), and for_each_concurrent(limit: Into<Option<usize>>) https://docs.rs/futures/latest/futures/stream/struct.Buffered.html
[2] Example discussion recommending buffer / buffer_unordered to limit concurrent HTTP requests https://stackoverflow.com/questions/51044467/how-can-i-perform-parallel-asynchronous-http-get-requests-with-reqwest
🏁 Script executed:
rg "Semaphore|Permit|Arc|Mutex" sdk-libs/client/src/interface/load_accounts.rs

Repository: Lightprotocol/light-protocol
Length of output: 54
🏁 Script executed:
cat -n sdk-libs/client/src/interface/load_accounts.rs | sed -n '1,90p' | grep -A5 -B5 "TODO\|FIXME\|NOTE"

Repository: Lightprotocol/light-protocol
Length of output: 556
Use try_join! for fail-fast proof orchestration.
Three buffered streams (PDA, ATA, mint proofs) run concurrently via join! at line 135. Each applies .buffered(PROOF_FETCH_CONCURRENCY) (lines 269, 289), so up to 24 proof requests can be in flight. Because join! waits for all branches to complete, an error in one stream doesn't cancel the others—they keep polling and consuming resources until completion.
Switching to try_join! returns immediately on the first proof-fetch error and cancels the remaining futures, which is the intended behavior when any critical account load fails.
Proposed fix
- let (pda_proofs, ata_proofs, mint_proofs) = futures::join!(
+ let (pda_proofs, ata_proofs, mint_proofs) = futures::try_join!(
fetch_proof_batches(&pda_hashes, indexer),
fetch_proofs_batched(&ata_hashes, MAX_ATAS_PER_IX, indexer),
fetch_individual_proofs(&mint_hashes, indexer),
- );
-
- let pda_proofs = pda_proofs?;
- let ata_proofs = ata_proofs?;
- let mint_proofs = mint_proofs?;
+    )?;

If the intended ceiling is 8 total proof requests rather than 8 per account type, add a shared semaphore/permit budget inside the helpers to enforce the global cap across all three pipelines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/interface/load_accounts.rs` around lines 73 - 75, The
concurrent proof pipelines use join! which waits for all branches even if one
fails; replace the join! at the proof orchestration site with try_join! so the
first proof-fetch error cancels the other futures and returns immediately, and
ensure the streams that call .buffered(PROOF_FETCH_CONCURRENCY) (the PDA, ATA
and mint proof helpers) are compatible with cancellation; alternatively, if you
intended a global cap instead of per-stream caps, implement a shared
semaphore/permit budget used by the helpers and keep the buffered(…) limits
aligned with that shared budget.
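A synchronous sketch of the fail-fast contract: with `futures::try_join!` the three fetches would run concurrently, but the early-return shape matches plain `?` chaining. Names are illustrative, not the SDK's API:

```rust
// Toy stand-in for a proof fetch that can fail.
fn fetch(label: &str, fail: bool) -> Result<String, String> {
    if fail {
        Err(format!("{label} fetch failed"))
    } else {
        Ok(format!("{label} ok"))
    }
}

// The first Err short-circuits and the remaining work is abandoned,
// mirroring try_join!'s behavior of dropping the sibling futures.
fn load_all() -> Result<(String, String, String), String> {
    let pda = fetch("pda", false)?;
    let ata = fetch("ata", true)?; // errors here...
    let mint = fetch("mint", false)?; // ...so this line is never reached
    Ok((pda, ata, mint))
}

fn main() {
    assert_eq!(load_all(), Err("ata fetch failed".to_string()));
}
```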
```diff
 // 3. ATA loads (CreateAssociatedTokenAccount + Transfer2) - requires mint to exist
-let ata_chunks: Vec<_> = cold_atas.chunks(MAX_ATAS_PER_IX).collect();
-for (chunk, proof) in ata_chunks.into_iter().zip(ata_proofs) {
+for (chunk, proof) in cold_atas.chunks(MAX_ATAS_PER_IX).zip(ata_proofs) {
     out.extend(build_ata_load(chunk, proof, fee_payer)?);
```
Preserve the flattened ATA index across chunks.
Line 164 chunks cold_atas, but build_ata_load still restarts enumeration at zero. A MissingAtaContext from the second or later chunk will therefore point at 0..7 again instead of the real position in cold_atas.
🛠️ Proposed fix
- for (chunk, proof) in cold_atas.chunks(MAX_ATAS_PER_IX).zip(ata_proofs) {
- out.extend(build_ata_load(chunk, proof, fee_payer)?);
+ let mut ata_offset = 0usize;
+ for (chunk, proof) in cold_atas.chunks(MAX_ATAS_PER_IX).zip(ata_proofs) {
+ out.extend(build_ata_load(chunk, proof, fee_payer, ata_offset)?);
+ ata_offset += chunk.len();
 }

fn build_ata_load(
ifaces: &[&TokenAccountInterface],
proof: ValidityProofWithContext,
fee_payer: Pubkey,
start_index: usize,
) -> Result<Vec<Instruction>, LoadAccountsError> {
let contexts: Vec<AtaContext> = ifaces
.iter()
.enumerate()
.map(|(i, a)| AtaContext::from_interface(a, start_index + i))
.collect::<Result<Vec<_>, _>>()?;
// ...
}Also applies to: 401-405
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/interface/load_accounts.rs` around lines 163 - 165, The
ATA enumeration resets per chunk causing MissingAtaContext to report wrong
indices; update build_ata_load to accept a start_index: usize and use it when
creating AtaContext (e.g., in AtaContext::from_interface calls add start_index +
i) so contexts reflect their absolute position, then when iterating
cold_atas.chunks(...) compute the chunk's start_index (running offset or
chunk_index * MAX_ATAS_PER_IX) and pass it into build_ata_load; apply the same
change to the other call site referenced around lines 401-405 so all ATA load
invocations supply the correct flattened index.
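The running-offset pattern in isolation (the names and chunk size are illustrative, not the SDK's):

```rust
const CHUNK: usize = 4; // stand-in for MAX_ATAS_PER_IX

// Enumerate chunk items with a running offset so each item carries its
// absolute position in the original slice, not its per-chunk index.
fn flatten_with_offsets(items: &[char]) -> Vec<(usize, char)> {
    let mut offset = 0usize;
    let mut flat = Vec::new();
    for chunk in items.chunks(CHUNK) {
        for (i, item) in chunk.iter().enumerate() {
            flat.push((offset + i, *item)); // absolute index into `items`
        }
        offset += chunk.len();
    }
    flat
}

fn main() {
    let cold_atas: Vec<char> = ('a'..='j').collect(); // 10 items -> chunks of 4, 4, 2
    let flat = flatten_with_offsets(&cold_atas);
    // 'f' sits in the second chunk at local index 1; the offset makes it 5.
    assert_eq!(flat[5], (5, 'f'));
    assert_eq!(flat.last(), Some(&(9, 'j')));
}
```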
```rust
fn build_compressed_proof(body: &str) -> Result<CompressedProof, IndexerError> {
    let proof_json = deserialize_gnark_proof_json(body)
        .map_err(|error| IndexerError::CustomError(error.to_string()))?;
    let (proof_a, proof_b, proof_c) = proof_from_json_struct(proof_json)
        .map_err(|error| IndexerError::CustomError(error.to_string()))?;
    let (proof_a, proof_b, proof_c) = compress_proof(&proof_a, &proof_b, &proof_c)
        .map_err(|error| IndexerError::CustomError(error.to_string()))?;

    Ok(CompressedProof {
        a: proof_a,
        b: proof_b,
        c: proof_c,
    })
}
```
Parse the timed proof envelope here too.
build_compressed_proof() only accepts raw gnark proof JSON, but the same /prove endpoint is now handled elsewhere as either raw proof JSON or { "proof": ..., "proof_duration_ms": ... }. When the wrapped form comes back here, _get_validity_proof_v1_implementation() will fail after a 200 OK. Please unwrap the proof field before deserializing, or move the shared response parsing into a common helper.
🔧 Suggested fix
+#[derive(serde::Deserialize)]
+struct ProofWithTimingJson {
+ proof: serde_json::Value,
+}
+
fn build_compressed_proof(body: &str) -> Result<CompressedProof, IndexerError> {
- let proof_json = deserialize_gnark_proof_json(body)
+ let proof_body = serde_json::from_str::<ProofWithTimingJson>(body)
+ .map(|wrapped| wrapped.proof.to_string())
+ .unwrap_or_else(|_| body.to_string());
+ let proof_json = deserialize_gnark_proof_json(&proof_body)
.map_err(|error| IndexerError::CustomError(error.to_string()))?;
let (proof_a, proof_b, proof_c) = proof_from_json_struct(proof_json)
.map_err(|error| IndexerError::CustomError(error.to_string()))?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 98 - 111,
build_compressed_proof currently assumes body is raw gnark proof JSON but needs
to handle the timed envelope { "proof": ..., "proof_duration_ms": ... } too;
modify build_compressed_proof (or create a shared helper used by
_get_validity_proof_v1_implementation) to first check if the incoming body is an
envelope (e.g., parse into a minimal map/object and if it contains a "proof" key
extract that value), then pass the extracted inner proof JSON to
deserialize_gnark_proof_json; keep the existing error mapping to IndexerError
and then continue with proof_from_json_struct and compress_proof as before so
both raw and wrapped responses succeed.
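A toy sketch of the "accept both shapes" fallback: try to unwrap an envelope that carries the proof under a `proof` key, otherwise treat the body as the raw proof JSON. String surgery stands in for the `serde_json` parse here and assumes the envelope starts with the `proof` key, which real JSON key ordering does not guarantee:

```rust
// Unwrap {"proof": ...} if present, otherwise pass the body through.
fn extract_proof_body(body: &str) -> String {
    body.trim()
        .strip_prefix("{\"proof\":")
        .and_then(|rest| rest.strip_suffix('}'))
        .map(|inner| inner.trim().to_string())
        .unwrap_or_else(|| body.to_string()) // fall back to raw proof JSON
}

fn main() {
    // Raw proof JSON passes through untouched.
    assert_eq!(extract_proof_body("{\"ar\":[1,2]}"), "{\"ar\":[1,2]}");
    // Wrapped form yields the inner proof object.
    assert_eq!(
        extract_proof_body("{\"proof\":{\"ar\":[1,2]}}"),
        "{\"ar\":[1,2]}"
    );
}
```

In real code the tolerant parse belongs in one shared helper (as the comment suggests) so both the v1 and v2 response paths stay in sync.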
8040138 to 30a3abb (Compare)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
forester/src/epoch_manager.rs (1)
570-585: ⚠️ Potential issue | 🟠 Major

Shutdown order allows new workers to be registered after drain starts.
`shutdown_new_tree_workers()` (Line 709) runs before `_guard` aborts producer tasks (Line 570). While draining, `discover_trees_periodically -> add_new_tree -> register_new_tree_worker` can still append workers, leaving them outside the shutdown join path.

Abort background producers first, then drain/join `new_tree_workers`.

Also applies to: 709-710
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 570 - 585, The shutdown currently calls shutdown_new_tree_workers() before the scopeguard that aborts background producers (the guard created as let _guard = scopeguard::guard(...)), which allows discover_trees_periodically -> add_new_tree -> register_new_tree_worker to append workers after drain starts; move the abort of producer tasks earlier so background producers are stopped before draining/joining new_tree_workers: ensure the scopeguard (the closure that aborts current_previous_handle, tree_discovery_handle, balance_check_handle) runs (or explicitly call abort on those handles) prior to invoking shutdown_new_tree_workers()/joining new_tree_workers, so no new workers can be registered after drain starts (adjust location around the scopeguard creation and the call to shutdown_new_tree_workers).sdk-libs/program-test/src/indexer/test_indexer.rs (1)
719-732: ⚠️ Potential issue | 🟠 Major

Keep `tree_next_insertion_index` aligned with the production indexer contract.

Line 731 now uses `address_tree_bundle.right_most_index()`, but `sdk-libs/client/src/indexer/photon_indexer.rs:1658-1698` still populates the same `AddressQueueData` field from `aq.start_index`. If those values differ, program-test will exercise a different staging offset than production and batch-address proofs can shift by the queue pagination delta. Please reconcile the field meaning in both constructors together.

🤖 Prompt for AI Agents
address_tree_bundle.right_most_index(), butsdk-libs/client/src/indexer/photon_indexer.rs:1658-1698still populates the sameAddressQueueDatafield fromaq.start_index. If those values differ, program-test will exercise a different staging offset than production and batch-address proofs can shift by the queue pagination delta. Please reconcile the field meaning in both constructors together.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 719 - 732, The test constructor sets AddressQueueData.tree_next_insertion_index from address_tree_bundle.right_most_index(), while the production constructor populates that same field from aq.start_index; make them consistent by changing the test to set tree_next_insertion_index to start_index (start as u64) so it matches the production behavior, or alternatively update the production constructor to use address_tree_bundle.right_most_index()—whichever semantic (start_index vs right_most_index) you choose, apply it consistently in both the AddressQueueData builder in this test and in the production constructor that currently reads aq.start_index to avoid pagination/offset mismatches.
♻️ Duplicate comments (3)
forester/src/forester_status.rs (1)
673-688: 🧹 Nitpick | 🔵 Trivial

Extract duplicated V1 queue parsing into one helper.

Line 673-688 and Line 730-745 duplicate the same `QueueAccount` parsing/count/capacity/error logic. Centralize this in a helper to prevent drift.

♻️ Refactor sketch
QueueAccountparsing/count/capacity/error logic. Centralize this in a helper to prevent drift.♻️ Refactor sketch
+fn parse_v1_queue_stats(acc: &Account, tree_kind: &str) -> (Option<u64>, Option<u64>) { + match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } { + Ok(hs) => { + let len = hs + .iter() + .filter(|(_, cell)| cell.sequence_number.is_none()) + .count() as u64; + let cap = hs.get_capacity() as u64; + (Some(len), Some(cap)) + } + Err(error) => { + warn!(?error, %tree_kind, "Failed to parse V1 queue hash set"); + (None, None) + } + } +} ... - let (queue_len, queue_cap) = queue_account - .map(|acc| match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } { ... }) - .unwrap_or((None, None)); + let (queue_len, queue_cap) = queue_account + .as_ref() + .map(|acc| parse_v1_queue_stats(acc, "StateV1")) + .unwrap_or((None, None)); ... - let (queue_len, queue_cap) = queue_account - .map(|acc| match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } { ... }) - .unwrap_or((None, None)); + let (queue_len, queue_cap) = queue_account + .as_ref() + .map(|acc| parse_v1_queue_stats(acc, "AddressV1")) + .unwrap_or((None, None));Also applies to: 730-745
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/forester_status.rs` around lines 673 - 688, Create a helper function (e.g., fn parse_queue_metrics<T>(acc: &Account) -> (Option<u64>, Option<u64>)) that encapsulates the unsafe parse_hash_set_from_bytes::<QueueAccount>(&acc.data) logic, the iteration to compute len (count of cells with sequence_number.is_none()) and cap (hs.get_capacity()), and the Err branch that calls warn!(?error, "Failed to parse StateV1 queue hash set") returning (None, None); then replace both duplicated map closures (the blocks using parse_hash_set_from_bytes::<QueueAccount>) with calls to this helper (passing the acc), ensuring the helper is used where the original closures lived.
forester/src/epoch_manager.rs (2)
943-1023: ⚠️ Potential issue | 🟠 Major
Epoch report/cleanup still races with in-flight new-tree workers.
Line 943 spawns per-epoch new-tree workers, but Line 1401 (report send) and Lines 1423-1432 (epoch state cleanup) run without awaiting completion of same-epoch workers. Late worker updates can under-report work and recreate cleaned per-epoch state.
Please await/join all `new_tree_workers` for `epoch` before reporting and cleanup.
Also applies to: 1401-1432
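The drain-before-cleanup pattern above can be sketched with stdlib threads (a plain thread plus an mpsc channel stand in for `NewTreeWorker`'s `thread_handle` and completion oneshot; all names here are illustrative, not the forester's real types):

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Stand-in for a NewTreeWorker entry: a completion channel plus a join handle.
struct Worker {
    completion: mpsc::Receiver<u64>,
    handle: thread::JoinHandle<()>,
}

fn spawn_worker(epoch: u64, work: u64, workers: &mut HashMap<u64, Vec<Worker>>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        tx.send(work).ok(); // worker reports its processed count on exit
    });
    workers.entry(epoch).or_default().push(Worker { completion: rx, handle });
}

// Wait for every worker registered under `epoch` before reporting/cleanup,
// so late updates cannot under-report work or recreate cleaned state.
fn drain_epoch(workers: &mut HashMap<u64, Vec<Worker>>, epoch: u64) -> u64 {
    let mut reported = 0;
    for w in workers.remove(&epoch).unwrap_or_default() {
        reported += w.completion.recv().unwrap_or(0); // final update from worker
        w.handle.join().expect("worker panicked");    // then join the thread
    }
    reported // only now is it safe to send the epoch report and drop state
}
```

The key ordering: receive the completion signal, join the handle, and only then touch the report or the per-epoch state.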
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 943 - 1023, The per-epoch report/send and cleanup run while per-epoch new-tree workers spawned by register_new_tree_worker may still be running; to fix, before the report send and before removing epoch state iterate the collection of NewTreeWorker entries for that epoch and wait for each worker to finish by awaiting its completion oneshot receiver (NewTreeWorker.completion) and/or joining its thread_handle (JoinHandle::join or await the completion_rx) and only then proceed with the report and epoch cleanup; ensure you also send cancellation (NewTreeWorker.cancel) where appropriate, remove entries after join, and update any places that access self.new_tree_workers so they await these completions for the same epoch.
1936-1957: ⚠️ Potential issue | 🔴 Critical
Critical: Shutdown can leave tree-processing work running detached in the background.
At line 600, shutdown breaks the main loop immediately without draining `epoch_tasks`. If an `epoch_tasks` future is mid-await on `join_all(spawn_blocking_tasks)` (line 1957), dropping that future does not stop the already-spawned blocking tasks: Tokio's `spawn_blocking` worker pool continues executing them independently. This means `process_queue` work (line 1948) continues running after `run()` returns, violating clean shutdown semantics.
Ensure in-flight work is either awaited to completion before shutdown or explicitly aborted. Consider storing `JoinHandle`s from `spawn_blocking` at a scope where they can be explicitly aborted on shutdown, or drain `epoch_tasks` before exiting the select loop.
Also applies to: Lines 700-710 (post-loop cleanup lacks epoch_tasks draining)
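A minimal stdlib sketch of the keep-handles-and-drain idea (a shared stop flag plays the role of the shutdown signal, plain threads play the role of the `spawn_blocking` pool; with Tokio you would additionally get `.abort()` on each `JoinHandle`):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Keep a handle for every in-flight blocking task so shutdown can drain them,
// instead of returning while orphaned workers keep running in the background.
fn run_until_shutdown(num_workers: usize) -> usize {
    let stop = Arc::new(AtomicBool::new(false));
    let mut handles = Vec::with_capacity(num_workers);
    for _ in 0..num_workers {
        let stop = Arc::clone(&stop);
        handles.push(thread::spawn(move || {
            let mut processed = 0usize;
            // Stand-in for a process_queue loop that checks for shutdown.
            while !stop.load(Ordering::Relaxed) && processed < 10 {
                processed += 1;
            }
            processed
        }));
    }
    stop.store(true, Ordering::Relaxed); // signal shutdown...
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked")) // ...then drain every handle
        .sum()
}
```

The point is the final drain: `run_until_shutdown` never returns while a worker it spawned is still executing, which is exactly the guarantee the review asks `run()` to provide for its `spawn_blocking` work.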
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 1936 - 1957, The code currently spawns blocking tasks inside the per-tree async task and immediately awaits them, allowing spawned blocking work to continue after run() exits; change this to capture and manage the Tokio JoinHandle so work can be awaited or aborted on shutdown. Specifically, modify the block that calls tokio::task::spawn_blocking in the task pushed into tasks (referenced symbols: spawn_blocking, process_queue, worker_slots, runtime_handle, tasks/epoch_tasks) to return or store the JoinHandle instead of awaiting it inline; then in run() maintain a Vec<JoinHandle> for in-flight spawn_blocking handles and on shutdown either .await them to completion or call .abort() on each (and optionally await the aborted handles) before returning. Ensure permits (from worker_slots.acquire_owned()) are held only for the lifetime of the spawned work and dropped appropriately when aborting or awaiting the JoinHandle so semaphores are released.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@forester/tests/e2e_test.rs`:
- Around line 802-817: The test currently spawns a nested runtime with
tokio::task::spawn_blocking and a new Tokio runtime to run
run_pipeline::<LightClient>, which is unnecessary; replace the spawn_blocking +
Builder::new_multi_thread block with a direct tokio::spawn that awaits
run_pipeline::<LightClient> on the existing test runtime, passing the same
captured values (config, shutdown_receiver, shutdown_compressible_receiver,
shutdown_bootstrap_receiver, work_report_sender) and assign its JoinHandle to
service_handle so the spawned task runs as a regular async task on the test
runtime.
In `@forester/tests/test_batch_append_spent.rs`:
- Around line 331-346: The test spawns a blocking thread that builds a nested
Tokio runtime which causes Handle::current() to bind to the wrong runtime;
replace the tokio::task::spawn_blocking+inner runtime with a direct tokio::spawn
of the async run_pipeline::<LightClient> future. Specifically, move the closure
logic to call tokio::spawn(run_pipeline::<LightClient>(config, None, None,
shutdown_receiver, Some(shutdown_compressible_receiver), None,
work_report_sender)) (await or join the JoinHandle as needed in the test) and
remove the runtime::Builder and runtime.block_on usage so the test uses the
existing test runtime.
In `@prover/client/src/proof_types/batch_append/proof_inputs.rs`:
- Around line 131-142: The loop that checks batch lengths doesn't validate
individual Merkle proof heights, so the later conversion try_into().unwrap() can
panic; update proof_inputs.rs to either (a) preflight-validate every
merkle_proofs[i].len() equals HEIGHT (alongside existing batch checks for
old_leaves, leaves, merkle_proofs) and return ProverClientError::GenericError on
mismatch, or (b) replace the unsafe conversion site where try_into().unwrap() is
used with a fallible conversion that maps the error to ProverClientError (e.g.,
try_into().map_err(|e| ProverClientError::GenericError(format!("invalid merkle
proof height: {e}"))) ) so malformed proof lengths produce a Result::Err instead
of a panic; reference the merkle_proofs array and the try_into().unwrap() call
when applying the change.
In `@prover/client/src/proof.rs`:
- Around line 72-83: In deserialize_hex_string_to_be_bytes, avoid parsing
arbitrarily long hex strings by adding a pre-parse length guard on the
normalized hex string (trimmed_str) before calling
num_bigint::BigUint::from_str_radix: if trimmed_str.len() (after removing
optional "0x"/"0X" and optional leading zeros) exceeds 64 hex characters (32
bytes) return ProverClientError::InvalidHexString early; otherwise proceed to
parse and convert to a 32-byte big-endian array as before. Ensure you also
handle odd-length hex by treating a leading '0' or rejecting consistently so the
length check reflects actual byte count.
In `@sdk-libs/client/src/interface/initialize_config.rs`:
- Line 7: The code imports and serializes InitializeLightConfigParams from the
sibling crate light_account, which violates the SDK-libs dependency rule;
replace that direct dependency by either (a) defining a local DTO (e.g.,
LocalInitializeLightConfigParams) in initialize_config.rs with the same fields
and serialization derives and use it throughout the file, or (b) accept the
payload via the allowed program-lib interface type and map/convert it into a
local type; remove the use light_account::InitializeLightConfigParams import and
any direct uses of InitializeLightConfigParams, updating functions that
reference it (e.g., the initializer/serializer code in this file) to work with
the new local DTO or conversion layer.
- Around line 11-12: Replace the hardcoded
INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR byte array with the Anchor
Discriminator constant/method: import anchor_lang::Discriminator and derive the
8-byte discriminator from the instruction type (e.g., use
<InitializeCompressionConfig as Discriminator>::DISCRIMINATOR or
InitializeCompressionConfig::discriminator()) instead of the literal [133, 228,
12, 169, 56, 76, 222, 61]; apply the same pattern to any other discriminator
constants so the client pulls the canonical discriminator from the instruction
type rather than a hardcoded array.
---
Outside diff comments:
In `@forester/src/epoch_manager.rs`:
- Around line 570-585: The shutdown currently calls shutdown_new_tree_workers()
before the scopeguard that aborts background producers (the guard created as let
_guard = scopeguard::guard(...)), which allows discover_trees_periodically ->
add_new_tree -> register_new_tree_worker to append workers after drain starts;
move the abort of producer tasks earlier so background producers are stopped
before draining/joining new_tree_workers: ensure the scopeguard (the closure
that aborts current_previous_handle, tree_discovery_handle,
balance_check_handle) runs (or explicitly call abort on those handles) prior to
invoking shutdown_new_tree_workers()/joining new_tree_workers, so no new workers
can be registered after drain starts (adjust location around the scopeguard
creation and the call to shutdown_new_tree_workers).
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 719-732: The test constructor sets
AddressQueueData.tree_next_insertion_index from
address_tree_bundle.right_most_index(), while the production constructor
populates that same field from aq.start_index; make them consistent by changing
the test to set tree_next_insertion_index to start_index (start as u64) so it
matches the production behavior, or alternatively update the production
constructor to use address_tree_bundle.right_most_index()—whichever semantic
(start_index vs right_most_index) you choose, apply it consistently in both the
AddressQueueData builder in this test and in the production constructor that
currently reads aq.start_index to avoid pagination/offset mismatches.
---
Duplicate comments:
In `@forester/src/epoch_manager.rs`:
- Around line 943-1023: The per-epoch report/send and cleanup run while
per-epoch new-tree workers spawned by register_new_tree_worker may still be
running; to fix, before the report send and before removing epoch state iterate
the collection of NewTreeWorker entries for that epoch and wait for each worker
to finish by awaiting its completion oneshot receiver (NewTreeWorker.completion)
and/or joining its thread_handle (JoinHandle::join or await the completion_rx)
and only then proceed with the report and epoch cleanup; ensure you also send
cancellation (NewTreeWorker.cancel) where appropriate, remove entries after
join, and update any places that access self.new_tree_workers so they await
these completions for the same epoch.
- Around line 1936-1957: The code currently spawns blocking tasks inside the
per-tree async task and immediately awaits them, allowing spawned blocking work
to continue after run() exits; change this to capture and manage the Tokio
JoinHandle so work can be awaited or aborted on shutdown. Specifically, modify
the block that calls tokio::task::spawn_blocking in the task pushed into tasks
(referenced symbols: spawn_blocking, process_queue, worker_slots,
runtime_handle, tasks/epoch_tasks) to return or store the JoinHandle instead of
awaiting it inline; then in run() maintain a Vec<JoinHandle> for in-flight
spawn_blocking handles and on shutdown either .await them to completion or call
.abort() on each (and optionally await the aborted handles) before returning.
Ensure permits (from worker_slots.acquire_owned()) are held only for the
lifetime of the spawned work and dropped appropriately when aborting or awaiting
the JoinHandle so semaphores are released.
In `@forester/src/forester_status.rs`:
- Around line 673-688: Create a helper function (e.g., fn
parse_queue_metrics<T>(acc: &Account) -> (Option<u64>, Option<u64>)) that
encapsulates the unsafe parse_hash_set_from_bytes::<QueueAccount>(&acc.data)
logic, the iteration to compute len (count of cells with
sequence_number.is_none()) and cap (hs.get_capacity()), and the Err branch that
calls warn!(?error, "Failed to parse StateV1 queue hash set") returning (None,
None); then replace both duplicated map closures (the blocks using
parse_hash_set_from_bytes::<QueueAccount>) with calls to this helper (passing
the acc), ensuring the helper is used where the original closures lived.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: cc63c1aa-5f87-4c5f-a00f-34561a82b45c
⛔ Files ignored due to path filters (3)
program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs is excluded by none and included by none
sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs is excluded by none and included by none
📒 Files selected for processing (19)
forester/src/epoch_manager.rs
forester/src/forester_status.rs
forester/src/metrics.rs
forester/src/priority_fee.rs
forester/src/processor/v2/proof_worker.rs
forester/tests/e2e_test.rs
forester/tests/legacy/batched_state_async_indexer_test.rs
forester/tests/test_batch_append_spent.rs
prover/client/src/helpers.rs
prover/client/src/proof.rs
prover/client/src/proof_types/batch_address_append/json.rs
prover/client/src/proof_types/batch_append/proof_inputs.rs
prover/client/src/proof_types/combined/v2/json.rs
prover/client/src/proof_types/non_inclusion/v2/json.rs
prover/client/tests/batch_address_append.rs
prover/client/tests/init_merkle_tree.rs
sdk-libs/client/src/interface/initialize_config.rs
sdk-libs/client/src/local_test_validator.rs
sdk-libs/program-test/src/indexer/test_indexer.rs
let config = Arc::new(config.clone());
let service_handle = tokio::task::spawn_blocking(move || {
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .enable_all()
        .build()?;
    runtime.block_on(run_pipeline::<LightClient>(
        config,
        None,
        None,
        shutdown_receiver,
        Some(shutdown_compressible_receiver),
        Some(shutdown_bootstrap_receiver),
        work_report_sender,
    ))
});
🧩 Analysis chain
Scripts run against the repository confirmed: `run_pipeline` is declared `pub async fn` in forester/src/lib.rs; the only `Handle::current()` match in forester/src lives in epoch_manager.rs, not in `run_pipeline`'s call path; and forester/tests/e2e_test.rs (`setup_forester_pipeline`) wraps the fully async `run_pipeline::<LightClient>` in `tokio::task::spawn_blocking` plus a nested `tokio::runtime::Builder::new_multi_thread` runtime.
Use tokio::spawn directly instead of nested runtime.
run_pipeline is fully async and doesn't require blocking execution. Replace spawn_blocking + new_multi_thread with a direct tokio::spawn call—the tasks spawned within run_pipeline will work correctly on the test's existing runtime. This eliminates unnecessary complexity and overhead from creating a separate runtime context.
Proposed fix
 let config = Arc::new(config.clone());
-let service_handle = tokio::task::spawn_blocking(move || {
-    let runtime = tokio::runtime::Builder::new_multi_thread()
-        .worker_threads(2)
-        .enable_all()
-        .build()?;
-    runtime.block_on(run_pipeline::<LightClient>(
-        config,
-        None,
-        None,
-        shutdown_receiver,
-        Some(shutdown_compressible_receiver),
-        Some(shutdown_bootstrap_receiver),
-        work_report_sender,
-    ))
-});
+let service_handle = tokio::spawn(run_pipeline::<LightClient>(
+    config,
+    None,
+    None,
+    shutdown_receiver,
+    Some(shutdown_compressible_receiver),
+    Some(shutdown_bootstrap_receiver),
+    work_report_sender,
+));
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
let config = Arc::new(config.clone());
let service_handle = tokio::task::spawn_blocking(move || {
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .enable_all()
        .build()?;
    runtime.block_on(run_pipeline::<LightClient>(
        config,
        None,
        None,
        shutdown_receiver,
        Some(shutdown_compressible_receiver),
        Some(shutdown_bootstrap_receiver),
        work_report_sender,
    ))
});
After:
let config = Arc::new(config.clone());
let service_handle = tokio::spawn(run_pipeline::<LightClient>(
    config,
    None,
    None,
    shutdown_receiver,
    Some(shutdown_compressible_receiver),
    Some(shutdown_bootstrap_receiver),
    work_report_sender,
));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@forester/tests/e2e_test.rs` around lines 802 - 817, The test currently spawns
a nested runtime with tokio::task::spawn_blocking and a new Tokio runtime to run
run_pipeline::<LightClient>, which is unnecessary; replace the spawn_blocking +
Builder::new_multi_thread block with a direct tokio::spawn that awaits
run_pipeline::<LightClient> on the existing test runtime, passing the same
captured values (config, shutdown_receiver, shutdown_compressible_receiver,
shutdown_bootstrap_receiver, work_report_sender) and assign its JoinHandle to
service_handle so the spawned task runs as a regular async task on the test
runtime.
let batch_len = batch_size as usize;
for (name, len) in [
    ("old_leaves", old_leaves.len()),
    ("leaves", leaves.len()),
    ("merkle_proofs", merkle_proofs.len()),
] {
    if len != batch_len {
        return Err(ProverClientError::GenericError(format!(
            "invalid batch append inputs: {name} len {len} != batch_size {batch_len}"
        )));
    }
}
Validate each proof height before the later unwrap().
The new batch-size checks on Lines 131-142 still leave a panic path on Line 193: a single Merkle proof whose length is not HEIGHT will hit try_into().unwrap() instead of returning ProverClientError. Since this function is now on the Result-based path, please reject malformed proof lengths during preflight or replace the conversion with map_err.
🛠️ Suggested guard
 let batch_len = batch_size as usize;
 for (name, len) in [
     ("old_leaves", old_leaves.len()),
     ("leaves", leaves.len()),
     ("merkle_proofs", merkle_proofs.len()),
 ] {
     if len != batch_len {
         return Err(ProverClientError::GenericError(format!(
             "invalid batch append inputs: {name} len {len} != batch_size {batch_len}"
         )));
     }
 }
+for (i, proof) in merkle_proofs.iter().enumerate() {
+    if proof.len() != HEIGHT {
+        return Err(ProverClientError::GenericError(format!(
+            "invalid batch append proof length at position {i}: got {}, expected {}",
+            proof.len(),
+            HEIGHT
+        )));
+    }
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@prover/client/src/proof_types/batch_append/proof_inputs.rs` around lines 131
- 142, The loop that checks batch lengths doesn't validate individual Merkle
proof heights, so the later conversion try_into().unwrap() can panic; update
proof_inputs.rs to either (a) preflight-validate every merkle_proofs[i].len()
equals HEIGHT (alongside existing batch checks for old_leaves, leaves,
merkle_proofs) and return ProverClientError::GenericError on mismatch, or (b)
replace the unsafe conversion site where try_into().unwrap() is used with a
fallible conversion that maps the error to ProverClientError (e.g.,
try_into().map_err(|e| ProverClientError::GenericError(format!("invalid merkle
proof height: {e}"))) ) so malformed proof lengths produce a Result::Err instead
of a panic; reference the merkle_proofs array and the try_into().unwrap() call
when applying the change.
pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
    let trimmed_str = hex_str
        .strip_prefix("0x")
        .or_else(|| hex_str.strip_prefix("0X"))
        .unwrap_or(hex_str);
    let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
        .map_err(|error| ProverClientError::InvalidHexString(format!("{hex_str}: {error}")))?;
    let big_uint_bytes = big_uint.to_bytes_be();
    if big_uint_bytes.len() > 32 {
        return Err(ProverClientError::InvalidHexString(format!(
            "{hex_str}: exceeds 32 bytes"
        )));
Add a pre-parse length guard to prevent expensive unbounded hex parsing.
Line 77 parses the full input before rejecting oversize values, so very large strings can cause unnecessary CPU/memory work. Bound the normalized hex length before from_str_radix.
Proposed fix
 pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
     let trimmed_str = hex_str
         .strip_prefix("0x")
         .or_else(|| hex_str.strip_prefix("0X"))
         .unwrap_or(hex_str);
-    let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
+    let normalized = trimmed_str.trim_start_matches('0');
+    if normalized.len() > 64 {
+        return Err(ProverClientError::InvalidHexString(format!(
+            "{hex_str}: exceeds 32 bytes"
+        )));
+    }
+    if normalized.is_empty() {
+        return Ok([0u8; 32]);
+    }
+    let big_uint = num_bigint::BigUint::from_str_radix(normalized, 16)
         .map_err(|error| ProverClientError::InvalidHexString(format!("{hex_str}: {error}")))?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@prover/client/src/proof.rs` around lines 72 - 83, In
deserialize_hex_string_to_be_bytes, avoid parsing arbitrarily long hex strings
by adding a pre-parse length guard on the normalized hex string (trimmed_str)
before calling num_bigint::BigUint::from_str_radix: if trimmed_str.len() (after
removing optional "0x"/"0X" and optional leading zeros) exceeds 64 hex
characters (32 bytes) return ProverClientError::InvalidHexString early;
otherwise proceed to parse and convert to a 32-byte big-endian array as before.
Ensure you also handle odd-length hex by treating a leading '0' or rejecting
consistently so the length check reflects actual byte count.
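For intuition, the guard the review asks for can be mocked up outside Rust. The sketch below (a hypothetical `hex_to_be_bytes` helper mirroring the proposed fix, not the crate's actual API) shows the length check firing before any big-integer parsing happens:

```python
def hex_to_be_bytes(hex_str: str) -> bytes:
    """Mock of the proposed guard: bound input size *before* parsing."""
    s = hex_str[2:] if hex_str[:2] in ("0x", "0X") else hex_str
    normalized = s.lstrip("0")
    if len(normalized) > 64:  # more than 32 bytes of significant hex digits
        raise ValueError(f"{hex_str}: exceeds 32 bytes")
    if not normalized:  # empty or all zeros after trimming
        return bytes(32)
    # Only now pay the cost of the big-integer conversion.
    return int(normalized, 16).to_bytes(32, "big")
```

Because the check runs on the normalized digit count, a multi-megabyte string of `f`s is rejected in O(n) scanning without ever allocating a big integer, while inputs padded with leading zeros still parse.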
#[cfg(not(feature = "anchor"))]
use borsh::{BorshDeserialize as AnchorDeserialize, BorshSerialize as AnchorSerialize};
use borsh::BorshSerialize as AnchorSerialize;
use light_account::InitializeLightConfigParams;
Keep the request DTO out of a sibling SDK crate.
This change deepens the existing light_account coupling by importing and serializing InitializeLightConfigParams from another SDK library. sdk-libs/client should take this payload from an allowed program-lib or define a local boundary type instead of reaching into a sibling SDK crate.
As per coding guidelines, sdk-libs/**/src/**/*.rs: SDK-libs must depend on program-libs, light-prover-client and external crates only.
Also applies to: 103-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/interface/initialize_config.rs` at line 7, The code
imports and serializes InitializeLightConfigParams from the sibling crate
light_account, which violates the SDK-libs dependency rule; replace that direct
dependency by either (a) defining a local DTO (e.g.,
LocalInitializeLightConfigParams) in initialize_config.rs with the same fields
and serialization derives and use it throughout the file, or (b) accept the
payload via the allowed program-lib interface type and map/convert it into a
local type; remove the use light_account::InitializeLightConfigParams import and
any direct uses of InitializeLightConfigParams, updating functions that
reference it (e.g., the initializer/serializer code in this file) to work with
the new local DTO or conversion layer.
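The boundary-type pattern the review recommends is language-agnostic. This Python sketch (hypothetical field names; the real params live in `light_account`) shows the idea: a local DTO with the same shape, plus a single conversion point, so the module's public surface no longer names the sibling crate's type:

```python
from dataclasses import dataclass


@dataclass
class UpstreamParams:
    """Stand-in for the sibling crate's DTO we do not want to leak."""
    update_authority: bytes
    rent_recipient: bytes


@dataclass
class LocalInitializeConfigParams:
    """Boundary DTO owned by this module; mirrors only the fields it needs."""
    update_authority: bytes
    rent_recipient: bytes

    @classmethod
    def from_upstream(cls, p: UpstreamParams) -> "LocalInitializeConfigParams":
        # The one place that knows about the upstream type.
        return cls(update_authority=p.update_authority,
                   rent_recipient=p.rent_recipient)
```

The conversion is boilerplate, but it keeps the dependency direction explicit: only the mapping function touches the upstream type, so swapping or versioning it later is a one-line change.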
use crate::interface::instructions::INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR;
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n -C3 'INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR|Discriminator::DISCRIMINATOR|initialize_compression_config' --type rust
Repository: Lightprotocol/light-protocol
Length of output: 6165
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find instruction enum definitions in the programs directory
rg -n "enum.*Instruction|InitializeCompressionConfig|#\[derive.*Discriminator" --type rust programs/ | head -50
Repository: Lightprotocol/light-protocol
Length of output: 212
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look for where the instructions are defined, likely in light_system_program or similar
fd "instruction" --type f --extension rs programs/ | head -20
Repository: Lightprotocol/light-protocol
Length of output: 409
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for the actual instruction definition with discriminator
rg -n "InitializeCompressionConfig" --type rust programs/ -A5 -B5 | head -100
Repository: Lightprotocol/light-protocol
Length of output: 54
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look at the InstructionDiscriminator enum
cat programs/system/src/lib.rs | head -100
Repository: Lightprotocol/light-protocol
Length of output: 3509
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for how discriminator values are generated or defined
rg -n "133.*228.*12|initialize_compression_config" --type rust -i
Repository: Lightprotocol/light-protocol
Length of output: 2055
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look for instruction definitions in the program
rg -n "#\[derive.*Instruction|enum Instruction|pub fn initialize" --type rust programs/system/src/ -B2 -A2 | head -80
Repository: Lightprotocol/light-protocol
Length of output: 362
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look at how the discriminator is defined in the macros - is it derived or hardcoded?
cat -n sdk-libs/macros/src/light_pdas/program/instructions.rs | sed -n '810,820p'
Repository: Lightprotocol/light-protocol
Length of output: 926
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for actual instruction enum in Anchor programs with the discriminator trait
rg -n "#\[derive.*Instruction|pub enum.*Instruction" --type rust sdk-tests/ -B5 -A5 | head -60
Repository: Lightprotocol/light-protocol
Length of output: 3840
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look for if there's an Anchor instruction that would generate this discriminator
rg -n "initialize_compression_config" --type rust sdk-tests/anchor-manual-test/ -B10 -A10 | head -80
Repository: Lightprotocol/light-protocol
Length of output: 1829
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check if there's an IDL or generated code that could export the discriminator
rg -n "anchor_lang::Discriminator|DISCRIMINATOR::" --type rust | grep -i initialize | head -20
Repository: Lightprotocol/light-protocol
Length of output: 54
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look at the actual Anchor instruction setup in anchor-manual-test
cat -n sdk-tests/anchor-manual-test/src/lib.rs | sed -n '1,100p'
Repository: Lightprotocol/light-protocol
Length of output: 4538
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for how other Anchor programs export their instruction discriminators
fd "lib.rs" --type f --path "*/programs/*" -x cat -n {} \; | head -150
Repository: Lightprotocol/light-protocol
Length of output: 746
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Let's check if there's documentation about the discriminator derivation
rg -n "discriminator|sha256" --type rust sdk-libs/macros/src/light_pdas/program/instructions.rs -B5 -A5 | head -100
Repository: Lightprotocol/light-protocol
Length of output: 1865
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check what anchor_lang discriminator would be for initialize_compression_config
# by looking for any Discriminator derive or implementation
rg -n "anchor_lang::Discriminator|impl.*Discriminator" --type rust | head -20
Repository: Lightprotocol/light-protocol
Length of output: 2178
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Let's verify the discriminator value using Python and Anchor's hash method
python3 << 'EOF'
import hashlib
# Anchor derives discriminators from the instruction name
instruction_name = "initialize_compression_config"
# Get the first 8 bytes of the SHA256 hash
hash_bytes = hashlib.sha256(b"global:" + instruction_name.encode()).digest()[:8]
discriminator = list(hash_bytes)
print(f"Instruction name: {instruction_name}")
print(f"Expected discriminator (from Anchor): {discriminator}")
print(f"Hardcoded discriminator in code: [133, 228, 12, 169, 56, 76, 222, 61]")
print(f"Match: {discriminator == [133, 228, 12, 169, 56, 76, 222, 61]}")
EOF
Repository: Lightprotocol/light-protocol
Length of output: 276
Use anchor_lang::Discriminator trait instead of hardcoded byte array for instruction discriminators.
The constant INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR is currently a hardcoded byte array [133, 228, 12, 169, 56, 76, 222, 61]. While this value is correct (verified against Anchor's sha256 derivation), the coding guideline for {programs,sdk-libs}/**/src/**/*.rs explicitly requires using the anchor_lang::Discriminator trait to obtain 8-byte instruction discriminators.
Hardcoded literals create maintenance risk: if the instruction name ever changes, the constant won't automatically update and can silently drift from the program's canonical discriminator. The proper pattern is deriving the discriminator from the instruction type's Discriminator::DISCRIMINATOR implementation, ensuring your client and program always stay in sync.
Apply this to lines 49-50 in sdk-libs/client/src/interface/instructions.rs and all affected discriminator constants.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/interface/initialize_config.rs` around lines 11 - 12,
Replace the hardcoded INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR byte array
with the Anchor Discriminator constant/method: import anchor_lang::Discriminator
and derive the 8-byte discriminator from the instruction type (e.g., use
<InitializeCompressionConfig as Discriminator>::DISCRIMINATOR or
InitializeCompressionConfig::discriminator()) instead of the literal [133, 228,
12, 169, 56, 76, 222, 61]; apply the same pattern to any other discriminator
constants so the client pulls the canonical discriminator from the instruction
type rather than a hardcoded array.
Force-pushed from 30a3abb to c97abbd (Compare)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
prover/client/src/proof_types/batch_append/proof_inputs.rs (1)
193-193: ⚠️ Potential issue | 🟠 Major
Panic path remains: replace unwrap() with fallible conversion.
This conversion will panic if any Merkle proof's length doesn't equal HEIGHT. Since this function returns Result<..., ProverClientError>, you should propagate the error instead. The implementation in BatchAppendsCircuitInputs::new() at Lines 64-72 shows exactly the right pattern.
🛠️ Suggested fix
- let merkle_proof_array = merkle_proof.try_into().unwrap();
+ let proof_len = merkle_proof.len();
+ let merkle_proof_array: [[u8; 32]; HEIGHT] = merkle_proof.try_into().map_err(|_| {
+     ProverClientError::GenericError(format!(
+         "invalid merkle proof length at position {}: got {}, expected {}",
+         i, proof_len, HEIGHT
+     ))
+ })?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof_types/batch_append/proof_inputs.rs` at line 193, The code currently uses merkle_proof.try_into().unwrap() which can panic on invalid length; change this to a fallible conversion that returns a ProverClientError instead (mirror the pattern used in BatchAppendsCircuitInputs::new()). Replace the unwrap with something like merkle_proof.try_into().map_err(|_| ProverClientError::...)? and propagate the Result so the enclosing function returns Err on conversion failure; reference the merkle_proof_array variable and the BatchAppendsCircuitInputs::new() implementation for the exact mapping/variant to use.
sdk-libs/program-test/src/indexer/test_indexer.rs (1)
719-733: ⚠️ Potential issue | 🟡 Minor
Fix the off-by-one error: tree_next_insertion_index must be rightmost index plus one.
Line 731 assigns address_tree_bundle.right_most_index() directly to tree_next_insertion_index, but right_most_index() returns the index of the last occupied element. Since tree_next_insertion_index is used downstream as the starting point for leaf insertion (via forester's staging trees and proof initialization), it must point to the next available index. Change line 731 to:
tree_next_insertion_index: address_tree_bundle.right_most_index() as u64 + 1,
This aligns with the semantics confirmed in test assertions (start_index >= 1 for initialized trees) and the address-tree initialization contract (sentinel at index 0, next insertion at 1).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 719 - 733, The AddressQueueData being constructed sets tree_next_insertion_index incorrectly to address_tree_bundle.right_most_index(); update the assignment so tree_next_insertion_index uses the next available slot by adding one: set tree_next_insertion_index to address_tree_bundle.right_most_index() as u64 + 1 (leave other fields like initial_root and subtrees unchanged) so subsequent leaf insertion/proof logic uses the correct insertion index.
forester/tests/legacy/batched_state_async_indexer_test.rs (2)
300-317: ⚠️ Potential issue | 🟠 Major
Fallback polling in wait_for_slot can hang indefinitely.
When warp_to_slot is unavailable, the loop has no timeout. If slot progression stalls, the test can block forever.
Proposed fix
async fn wait_for_slot(rpc: &mut LightClient, target_slot: u64) {
    match rpc.warp_to_slot(target_slot).await {
        Ok(_) => {
            println!("warped to slot {}", target_slot);
            return;
        }
        Err(e) => {
            println!("warp_to_slot unavailable ({}), falling back to polling", e);
        }
    }
-   while rpc.get_slot().await.unwrap() < target_slot {
+   let start = tokio::time::Instant::now();
+   let max_wait = Duration::from_secs(DEFAULT_TIMEOUT_SECONDS);
+   loop {
+       let current_slot = rpc.get_slot().await.unwrap();
+       if current_slot >= target_slot {
+           break;
+       }
+       assert!(
+           start.elapsed() < max_wait,
+           "Timed out waiting for slot {} (current slot {})",
+           target_slot,
+           current_slot
+       );
        println!(
            "waiting for active phase slot: {}, current slot: {}",
            target_slot,
-           rpc.get_slot().await.unwrap()
+           current_slot
        );
        sleep(Duration::from_millis(400)).await;
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/tests/legacy/batched_state_async_indexer_test.rs` around lines 300 - 317, The fallback polling in wait_for_slot can hang forever if warp_to_slot fails; update the async function wait_for_slot to enforce a maximum wait (e.g., via tokio::time::timeout or an elapsed timer with max_duration/max_attempts) when repeatedly calling rpc.get_slot(), and return an error or explicitly fail the test when the timeout is reached; keep references to rpc.warp_to_slot and rpc.get_slot so the polling logic is wrapped in the timeout and the function signature is adjusted to propagate or assert a failure instead of blocking indefinitely.
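The bounded-polling pattern the fix above applies is generic: poll, but race the poll loop against a deadline. A minimal language-agnostic sketch (hypothetical `wait_until` helper, not forester code):

```python
import time


def wait_until(predicate, timeout_s: float, poll_s: float = 0.01) -> bool:
    """Poll predicate until it returns True; give up once timeout_s elapses.

    Returns True if the condition was met, False on timeout, so the caller
    can fail the test with a clear message instead of hanging forever.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_s)
    return False
```

Using a monotonic clock keeps the deadline immune to wall-clock adjustments; the caller asserts on the boolean, which mirrors the `assert!(start.elapsed() < max_wait, ...)` in the Rust fix.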
78-89: ⚠️ Potential issue | 🟡 Minor
Add explicit prover readiness check after init().
While this test sets use_surfpool: true to gate on CLI startup completion, the error log in the test comments shows the prover still failed to connect at runtime ("error trying to connect: dns error: task 145 was cancelled"). This indicates the implicit wait is insufficient.
Compare this to forester/tests/e2e_test.rs (lines 266–280), which uses an explicit spawn_prover().await call after init, despite also calling init(). Add the same explicit readiness gate here to avoid flakiness under load:
init(Some(LightValidatorConfig {
    enable_indexer: true,
    enable_prover: true,
    wait_time: 30,
    // ...
}))
.await;
spawn_prover().await; // Explicitly wait for prover health
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/tests/legacy/batched_state_async_indexer_test.rs` around lines 78 - 89, The test currently relies on init(Some(LightValidatorConfig { enable_prover: true, use_surfpool: true, ... })).await but that implicit wait is flaky; after calling init(...) add an explicit spawn_prover().await call (the same readiness gate used in forester/tests/e2e_test.rs) immediately after init to wait for the prover health before proceeding, ensuring the prover has connected and avoiding DNS/connection cancellation errors.
sdk-libs/client/src/local_test_validator.rs (1)
40-41: ⚠️ Potential issue | 🟠 Major
Documentation example doesn't match implementation behavior.
The comment at line 40 shows "--account <ADDRESS> <FILEPATH>" as a single string, but the implementation at lines 94–95 passes each Vec<String> entry directly as a single CLI argument. Multi-part values in one string won't be parsed correctly. Users should provide pre-tokenized arguments instead: vec!["--account", "<ADDRESS>", "<FILEPATH>"].
Update the documentation to reflect correct usage:
Suggested doc clarification
- /// Additional arguments to pass to the validator (e.g., "--account <ADDRESS> <FILEPATH>")
+ /// Additional validator CLI arguments (already tokenized). E.g.: vec!["--account", "<ADDRESS>", "<FILEPATH>"]
  pub validator_args: Vec<String>,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/client/src/local_test_validator.rs` around lines 40 - 41, The doc comment for the validator_args field is misleading: it shows "--account <ADDRESS> <FILEPATH>" as one string but the code expects each Vec<String> entry to be a separate CLI token; update the documentation for the validator_args field (symbol: validator_args) to show pre-tokenized usage such as vec!["--account", "<ADDRESS>", "<FILEPATH>"] and explicitly state that each Vec element becomes one CLI argument so multi-part flags must be split into separate entries.
♻️ Duplicate comments (6)
sdk-libs/client/src/interface/initialize_config.rs (2)
11-11: ⚠️ Potential issue | 🟠 Major
Use Anchor Discriminator as the source of truth for instruction bytes.
Line 11 + Lines 120-124 still rely on a constant-backed discriminator pipeline. Pull the 8-byte discriminator from the instruction type via anchor_lang::Discriminator to prevent drift from canonical Anchor values.
#!/bin/bash
set -euo pipefail
# Verify hardcoded discriminator constants in client interface modules
rg -nP --type rust 'pub const .*DISCRIMINATOR.*=\s*\[[0-9,\s]+\]' sdk-libs/client/src/interface
# Verify whether Anchor Discriminator trait usage exists in these modules
rg -nP --type rust 'Discriminator::DISCRIMINATOR|::discriminator\(' sdk-libs/client/src/interface
Expected result:
- First command should return the literal constant definition(s).
- Second command should show no usage in this path today; after fix, it should include the canonical Discriminator lookup.
As per coding guidelines, {programs,sdk-libs}/**/src/**/*.rs: Use anchor_lang::Discriminator trait to get 8-byte instruction discriminators from Anchor programs.
Also applies to: 120-124
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/client/src/interface/initialize_config.rs` at line 11, Replace the hardcoded INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR constant usage with the Anchor Discriminator for the instruction type: import anchor_lang::Discriminator and call the instruction struct/type's discriminator (e.g., YourInstructionType::discriminator()) wherever INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR is referenced (including the block around lines 120-124) so the 8-byte discriminator is sourced from Anchor instead of a constant; remove the constant definition and update any comparisons/constructions to use the returned discriminator byte array from the Discriminator trait.
7-7: ⚠️ Potential issue | 🟠 Major
Remove cross-SDK DTO dependency in this instruction path.
Line 7 imports light_account::InitializeLightConfigParams, and Lines 103-114 serialize it directly. This keeps sdk-libs/client coupled to a sibling SDK crate and violates the SDK dependency boundary.
As per coding guidelines, sdk-libs/**/src/**/*.rs: SDK-libs must depend on program-libs, light-prover-client and external crates only.
Also applies to: 103-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/client/src/interface/initialize_config.rs` at line 7, Remove the direct dependency on light_account::InitializeLightConfigParams by deleting the import and stop serializing that external DTO; instead define and use a local DTO (e.g., InitializeLightConfigDto) in this crate that mirrors only the needed fields, map/copy values from incoming parameters into that local struct, and serialize the local DTO in place of InitializeLightConfigParams (replace the serialization at the block that currently uses InitializeLightConfigParams around lines 103-114). Ensure no other references to InitializeLightConfigParams remain in initialize_config.rs and update any function signatures or helper methods that referenced that type to accept primitives or the new local DTO so the sdk-libs crate no longer depends on the sibling light_account DTO.
sdk-libs/program-test/src/indexer/test_indexer.rs (1)
98-111: ⚠️ Potential issue | 🟠 Major
Parse wrapped prover responses in build_compressed_proof.
Line 98 still assumes body is raw gnark proof JSON. If prover returns { "proof": ..., "proof_duration_ms": ... }, this fails after HTTP 200 and breaks validity-proof generation.
🔧 Proposed fix
+#[derive(serde::Deserialize)]
+struct ProofWithTimingJson {
+    proof: serde_json::Value,
+}
+
 fn build_compressed_proof(body: &str) -> Result<CompressedProof, IndexerError> {
-    let proof_json = deserialize_gnark_proof_json(body)
+    let proof_body = serde_json::from_str::<ProofWithTimingJson>(body)
+        .map(|wrapped| wrapped.proof.to_string())
+        .unwrap_or_else(|_| body.to_string());
+    let proof_json = deserialize_gnark_proof_json(&proof_body)
         .map_err(|error| IndexerError::CustomError(error.to_string()))?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 98 - 111, The build_compressed_proof function currently assumes body is raw gnark proof JSON; update it to accept both raw proof JSON and a wrapped response object like { "proof": <proof>, "proof_duration_ms": ... } by detecting and extracting the inner "proof" field (handle cases where "proof" is a JSON object or a JSON string) before calling deserialize_gnark_proof_json, then continue with proof_from_json_struct and compress_proof as before; modify logic in build_compressed_proof to parse the wrapper, extract the inner proof string/object, and only pass that inner proof to deserialize_gnark_proof_json so existing helpers (deserialize_gnark_proof_json, proof_from_json_struct, compress_proof) can be reused.
prover/client/src/proof.rs (1)
72-83: ⚠️ Potential issue | 🟠 Major
Add a pre-parse size guard before bigint parsing.
Line 77 parses the whole string first, and only then checks byte length. Very large hex inputs can still trigger expensive parse work before rejection.
💡 Proposed fix
pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
    let trimmed_str = hex_str
        .strip_prefix("0x")
        .or_else(|| hex_str.strip_prefix("0X"))
        .unwrap_or(hex_str);
+   let normalized = trimmed_str.trim_start_matches('0');
+   if normalized.len() > 64 {
+       return Err(ProverClientError::InvalidHexString(format!(
+           "{hex_str}: exceeds 32 bytes"
+       )));
+   }
+   if normalized.is_empty() {
+       return Ok([0u8; 32]);
+   }
-   let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
+   let big_uint = num_bigint::BigUint::from_str_radix(normalized, 16)
        .map_err(|error| ProverClientError::InvalidHexString(format!("{hex_str}: {error}")))?;
Verification (confirm parse currently happens before length guard):
#!/bin/bash
set -euo pipefail
rg -n 'deserialize_hex_string_to_be_bytes|from_str_radix|exceeds 32 bytes' prover/client/src/proof.rs -A6 -B6
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof.rs` around lines 72 - 83, The function deserialize_hex_string_to_be_bytes currently parses the entire hex into a BigUint before checking size; add a pre-parse guard in deserialize_hex_string_to_be_bytes that inspects the trimmed hex string (after removing "0x"/"0X") and rejects any input whose hex digit length is greater than 64 (i.e., would exceed 32 bytes) with the same ProverClientError::InvalidHexString message, returning early before calling num_bigint::BigUint::from_str_radix; keep the existing flow for inputs <=64 hex chars so the subsequent parsing and byte-length check remain unchanged.
prover/client/src/proof_types/non_inclusion/v2/json.rs (1)
27-29: ⚠️ Potential issue | 🟠 Major
Client/server pathIndex width is still inconsistent.
Line 28 and Line 54+ now allow u64, but prover server v2 snippets still deserialize pathIndex into uint32 (prover/server/prover/v2/marshal_non_inclusion.go, prover/server/prover/v2/non_inclusion_proving_system.go).
That means indices above u32::MAX will fail at the service boundary.
💡 Compatibility guard (until server contract is widened)
let prof_input = NonInclusionJsonStruct {
    root: big_int_to_string(&input.root),
    value: big_int_to_string(&input.value),
-   path_index: input
+   path_index: input
        .index_hashed_indexed_element_leaf
        .to_u64()
        .ok_or_else(|| {
            ProverClientError::IntegerConversion(format!(
                "failed to convert path index {} to u64",
                input.index_hashed_indexed_element_leaf
            ))
-       })?,
+       })
+       .and_then(|v| {
+           if v > u32::MAX as u64 {
+               Err(ProverClientError::IntegerConversion(format!(
+                   "path index {} exceeds prover-server uint32 contract",
+                   v
+               )))
+           } else {
+               Ok(v)
+           }
+       })?,
Run this to verify the server-side contract type in this branch:
#!/bin/bash
set -euo pipefail
rg -nP 'type\s+NonInclusionInputs\s+struct|PathIndex\s+u?int(32|64)' \
  prover/server/prover/v2/non_inclusion_proving_system.go -C2
rg -nP 'type\s+NonInclusionProofInputsJSON\s+struct|PathIndex\s+u?int(32|64)' \
  prover/server/prover/v2/marshal_non_inclusion.go -C2
Also applies to: 54-62
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof_types/non_inclusion/v2/json.rs` around lines 27 - 29, The client JSON struct's path_index is typed as u64 but the prover v2 server expects a 32-bit index (PathIndex / NonInclusionProofInputsJSON use uint32), so change the JSON field type (path_index) in prover/client/src/proof_types/non_inclusion/v2/json.rs from u64 to u32 to match the server contract; if you need to accept larger values temporarily, add explicit validation in the deserialization/constructor for the containing struct to reject values > u32::MAX with a clear error instead of silently truncating.
forester/src/epoch_manager.rs (1)
1399-1432: ⚠️ Potential issue | 🟠 Major
Await epoch-matched new-tree workers before reporting/cleanup.
process_epoch() reports work (Line 1399+) and clears epoch maps (Line 1423+) without waiting for NewTreeWorker tasks spawned in add_new_tree() (Line 943+). A late worker can still call update_metrics_and_counts() after reporting/cleanup, causing under-reporting and re-created per-epoch state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 1399 - 1432, process_epoch() is clearing per-epoch state and reporting before NewTreeWorker tasks (spawned by add_new_tree()) finish, allowing late workers to call update_metrics_and_counts() after cleanup; fix this by ensuring process_epoch() awaits completion of all NewTreeWorker tasks for the same epoch before sending final reports and removing maps: locate where add_new_tree() spawns NewTreeWorker tasks and store their JoinHandles (or register them in a per-epoch collection like processing_epochs/new_tree_workers keyed by epoch), then in process_epoch() wait/join/await all handles whose key == epoch (or poll a per-epoch completion Notify/Future that NewTreeWorker pushes when done) prior to calling send_work_report(), report_work_onchain(), and removing registration_trackers/processing_epochs/processed_items_per_epoch_count/processing_metrics_per_epoch; ensure update_metrics_and_counts() no longer races by only cleaning state after those awaits complete.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@forester/src/epoch_manager.rs`:
- Around line 477-480: register_new_tree_worker currently always pushes a new
worker even after shutdown; change it to first check the epoch manager's
shutdown state and refuse/ignore the new worker if shutdown is requested
(instead of pushing). Specifically, inside register_new_tree_worker (which
already calls reap_finished_new_tree_workers and accesses new_tree_workers),
query the same shutdown indicator used by shutdown_new_tree_workers/run (e.g., a
shutdown_requested flag or similar Atomic/Mutex field) while holding appropriate
synchronization, and return early without pushing the worker if shutdown is set;
ensure the check happens before new_tree_workers.lock().await.push(...) so late
registrations cannot escape shutdown.
In `@prover/client/src/helpers.rs`:
- Around line 64-75: The code silently truncates path_index bits when iterating
over path_elements; before the loop validate that path_index fits within the
tree height (i.e., that no bits above path_elements.len() or the constant HEIGHT
are set) and return an Err if it does. Locate the variables/path used in this
function (path_index, path_elements, current_index, HEIGHT, changelog_entry) and
add a pre-loop check that fails when path_index >= (1 << path_elements.len()) or
when path_index has any 1-bits beyond HEIGHT, so the function returns an error
instead of computing an incorrect root/changelog path.
In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 125-132: The validator process check using child.try_wait() must
treat any early exit (Some(status)) as failure even when status.success() is
true; modify the logic around tokio::time::sleep(... config.wait_time) and the
child.try_wait() call so that if try_wait() returns Some(status) you always fail
the test (e.g., panic/assert) with a clear message including the status, rather
than only asserting on non-successful status; update the branch that currently
inspects status.success() to instead unconditionally fail on Some(status) and
include the status details in the error message to aid debugging.
---
Outside diff comments:
In `@forester/tests/legacy/batched_state_async_indexer_test.rs`:
- Around line 300-317: The fallback polling in wait_for_slot can hang forever if
warp_to_slot fails; update the async function wait_for_slot to enforce a maximum
wait (e.g., via tokio::time::timeout or an elapsed timer with
max_duration/max_attempts) when repeatedly calling rpc.get_slot(), and return an
error or explicitly fail the test when the timeout is reached; keep references
to rpc.warp_to_slot and rpc.get_slot so the polling logic is wrapped in the
timeout and the function signature is adjusted to propagate or assert a failure
instead of blocking indefinitely.
- Around line 78-89: The test currently relies on init(Some(LightValidatorConfig
{ enable_prover: true, use_surfpool: true, ... })).await but that implicit wait
is flaky; after calling init(...) add an explicit spawn_prover().await call (the
same readiness gate used in forester/tests/e2e_test.rs) immediately after init
to wait for the prover health before proceeding, ensuring the prover has
connected and avoiding DNS/connection cancellation errors.
In `@prover/client/src/proof_types/batch_append/proof_inputs.rs`:
- Line 193: The code currently uses merkle_proof.try_into().unwrap() which can
panic on invalid length; change this to a fallible conversion that returns a
ProverClientError instead (mirror the pattern used in
BatchAppendsCircuitInputs::new()). Replace the unwrap with something like
merkle_proof.try_into().map_err(|_| ProverClientError::...)? and propagate the
Result so the enclosing function returns Err on conversion failure; reference
the merkle_proof_array variable and the BatchAppendsCircuitInputs::new()
implementation for the exact mapping/variant to use.
In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 40-41: The doc comment for the validator_args field is misleading:
it shows "--account <ADDRESS> <FILEPATH>" as one string but the code expects
each Vec<String> entry to be a separate CLI token; update the documentation for
the validator_args field (symbol: validator_args) to show pre-tokenized usage
such as vec!["--account", "<ADDRESS>", "<FILEPATH>"] and explicitly state that
each Vec element becomes one CLI argument so multi-part flags must be split into
separate entries.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 719-733: The AddressQueueData being constructed sets
tree_next_insertion_index incorrectly to address_tree_bundle.right_most_index();
update the assignment so tree_next_insertion_index uses the next available slot
by adding one: set tree_next_insertion_index to
address_tree_bundle.right_most_index() as u64 + 1 (leave other fields like
initial_root and subtrees unchanged) so subsequent leaf insertion/proof logic
uses the correct insertion index.
---
Duplicate comments:
In `@forester/src/epoch_manager.rs`:
- Around line 1399-1432: process_epoch() is clearing per-epoch state and
reporting before NewTreeWorker tasks (spawned by add_new_tree()) finish,
allowing late workers to call update_metrics_and_counts() after cleanup; fix
this by ensuring process_epoch() awaits completion of all NewTreeWorker tasks
for the same epoch before sending final reports and removing maps: locate where
add_new_tree() spawns NewTreeWorker tasks and store their JoinHandles (or
register them in a per-epoch collection like processing_epochs/new_tree_workers
keyed by epoch), then in process_epoch() wait/join/await all handles whose key
== epoch (or poll a per-epoch completion Notify/Future that NewTreeWorker pushes
when done) prior to calling send_work_report(), report_work_onchain(), and
removing
registration_trackers/processing_epochs/processed_items_per_epoch_count/processing_metrics_per_epoch;
ensure update_metrics_and_counts() no longer races by only cleaning state after
those awaits complete.
In `@prover/client/src/proof_types/non_inclusion/v2/json.rs`:
- Around line 27-29: The client JSON struct's path_index is typed as u64 but the
prover v2 server expects a 32-bit index (PathIndex / NonInclusionProofInputsJSON
use uint32), so change the JSON field type (path_index) in
prover/client/src/proof_types/non_inclusion/v2/json.rs from u64 to u32 to match
the server contract; if you need to accept larger values temporarily, add
explicit validation in the deserialization/constructor for the containing struct
to reject values > u32::MAX with a clear error instead of silently truncating.
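The reject-instead-of-truncate behavior asked for above can be sketched with std's `u32::try_from`. The names `checked_path_index` and `PathIndexError` are illustrative only, not the crate's actual API:

```rust
// Hypothetical error carrying the rejected value.
#[derive(Debug, PartialEq)]
struct PathIndexError(u64);

// Narrow a u64 path index to the u32 the prover server expects,
// failing loudly instead of silently truncating.
fn checked_path_index(raw: u64) -> Result<u32, PathIndexError> {
    u32::try_from(raw).map_err(|_| PathIndexError(raw))
}

fn main() {
    assert_eq!(checked_path_index(42), Ok(42));
    // 2^32 does not fit in u32, so it is rejected rather than truncated to 0.
    assert_eq!(checked_path_index(1u64 << 32), Err(PathIndexError(1u64 << 32)));
}
```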
In `@prover/client/src/proof.rs`:
- Around line 72-83: The function deserialize_hex_string_to_be_bytes currently
parses the entire hex into a BigUint before checking size; add a pre-parse guard
in deserialize_hex_string_to_be_bytes that inspects the trimmed hex string
(after removing "0x"/"0X") and rejects any input whose hex digit length is
greater than 64 (i.e., would exceed 32 bytes) with the same
ProverClientError::InvalidHexString message, returning early before calling
num_bigint::BigUint::from_str_radix; keep the existing flow for inputs <=64 hex
chars so the subsequent parsing and byte-length check remain unchanged.
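A minimal sketch of the pre-parse guard the prompt describes, with a plain `String` error standing in for `ProverClientError::InvalidHexString`; `guard_hex_len` is a hypothetical helper name:

```rust
// Reject over-long hex before any big-integer parsing happens.
// 64 hex digits encode exactly 32 bytes; anything longer cannot fit.
fn guard_hex_len(hex_str: &str) -> Result<&str, String> {
    let trimmed = hex_str
        .strip_prefix("0x")
        .or_else(|| hex_str.strip_prefix("0X"))
        .unwrap_or(hex_str);
    if trimmed.len() > 64 {
        return Err(format!("invalid hex string: {hex_str} exceeds 32 bytes"));
    }
    Ok(trimmed)
}

fn main() {
    assert_eq!(guard_hex_len("0xdeadbeef"), Ok("deadbeef"));
    // 65 hex digits would need 33 bytes: rejected before parsing.
    assert!(guard_hex_len(&"f".repeat(65)).is_err());
}
```

Because the check is a simple length comparison, adversarial megabyte-long inputs are rejected in O(1) work before `BigUint::from_str_radix` ever runs.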
In `@sdk-libs/client/src/interface/initialize_config.rs`:
- Line 11: Replace the hardcoded INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR
constant usage with the Anchor Discriminator for the instruction type: import
anchor_lang::Discriminator and call the instruction struct/type's discriminator
(e.g., YourInstructionType::discriminator()) wherever
INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR is referenced (including the block
around lines 120-124) so the 8-byte discriminator is sourced from Anchor instead
of a constant; remove the constant definition and update any
comparisons/constructions to use the returned discriminator byte array from the
Discriminator trait.
- Line 7: Remove the direct dependency on
light_account::InitializeLightConfigParams by deleting the import and stop
serializing that external DTO; instead define and use a local DTO (e.g.,
InitializeLightConfigDto) in this crate that mirrors only the needed fields,
map/copy values from incoming parameters into that local struct, and serialize
the local DTO in place of InitializeLightConfigParams (replace the serialization
at the block that currently uses InitializeLightConfigParams around lines
103-114). Ensure no other references to InitializeLightConfigParams remain in
initialize_config.rs and update any function signatures or helper methods that
referenced that type to accept primitives or the new local DTO so the sdk-libs
crate no longer depends on the sibling light_account DTO.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 98-111: The build_compressed_proof function currently assumes body
is raw gnark proof JSON; update it to accept both raw proof JSON and a wrapped
responder object like { "proof": <proof>, "proof_duration_ms": ... } by
detecting and extracting the inner "proof" field (handle cases where "proof" is
a JSON object or a JSON string) before calling deserialize_gnark_proof_json,
then continue with proof_from_json_struct and compress_proof as before; modify
logic in build_compressed_proof to parse the wrapper, extract the inner proof
string/object, and only pass that inner proof to deserialize_gnark_proof_json so
existing helpers (deserialize_gnark_proof_json, proof_from_json_struct,
compress_proof) can be reused.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 408ea586-b76b-47c2-9158-adaad7ae4063
⛔ Files ignored due to path filters (4)
program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs is excluded by none and included by none
sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs is excluded by none and included by none
sdk-tests/sdk-native-test/tests/test.rs is excluded by none and included by none
📒 Files selected for processing (19)
forester/src/epoch_manager.rs
forester/src/forester_status.rs
forester/src/metrics.rs
forester/src/priority_fee.rs
forester/src/processor/v2/proof_worker.rs
forester/tests/e2e_test.rs
forester/tests/legacy/batched_state_async_indexer_test.rs
forester/tests/test_batch_append_spent.rs
prover/client/src/helpers.rs
prover/client/src/proof.rs
prover/client/src/proof_types/batch_address_append/json.rs
prover/client/src/proof_types/batch_append/proof_inputs.rs
prover/client/src/proof_types/combined/v2/json.rs
prover/client/src/proof_types/non_inclusion/v2/json.rs
prover/client/tests/batch_address_append.rs
prover/client/tests/init_merkle_tree.rs
sdk-libs/client/src/interface/initialize_config.rs
sdk-libs/client/src/local_test_validator.rs
sdk-libs/program-test/src/indexer/test_indexer.rs
async fn register_new_tree_worker(&self, worker: NewTreeWorker) {
    self.reap_finished_new_tree_workers().await;
    self.new_tree_workers.lock().await.push(worker);
}
Block post-shutdown worker registration after drain.
register_new_tree_worker() at Line 477 always appends, even after shutdown is requested. run() drains/cancels workers once in shutdown_new_tree_workers() (Line 482 onward), so a late registration can escape shutdown and outlive service exit.
Suggested fix
async fn register_new_tree_worker(&self, worker: NewTreeWorker) {
+ if self.shutdown_requested.load(Ordering::Acquire) {
+ let mut worker = worker;
+ if let Some(cancel) = worker.cancel.take() {
+ let _ = cancel.send(());
+ }
+ self.detach_new_tree_worker_join(worker);
+ return;
+ }
self.reap_finished_new_tree_workers().await;
self.new_tree_workers.lock().await.push(worker);
}

Also applies to: 482-537
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@forester/src/epoch_manager.rs` around lines 477 - 480,
register_new_tree_worker currently always pushes a new worker even after
shutdown; change it to first check the epoch manager's shutdown state and
refuse/ignore the new worker if shutdown is requested (instead of pushing).
Specifically, inside register_new_tree_worker (which already calls
reap_finished_new_tree_workers and accesses new_tree_workers), query the same
shutdown indicator used by shutdown_new_tree_workers/run (e.g., a
shutdown_requested flag or similar Atomic/Mutex field) while holding appropriate
synchronization, and return early without pushing the worker if shutdown is set;
ensure the check happens before new_tree_workers.lock().await.push(...) so late
registrations cannot escape shutdown.
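The shutdown gate described above can be sketched synchronously. `Registry` below is a simplified stand-in: the real `EpochManager` uses a tokio mutex and a oneshot cancel channel for the rejected worker, both omitted here.

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Mutex,
};

// Simplified registry: registration checks a shared shutdown flag before
// pushing, so late registrations cannot escape a one-shot drain.
struct Registry {
    shutdown_requested: AtomicBool,
    workers: Mutex<Vec<String>>, // stand-in for Vec<NewTreeWorker>
}

impl Registry {
    fn register(&self, worker: String) -> bool {
        if self.shutdown_requested.load(Ordering::Acquire) {
            // The real fix would also fire the worker's cancel channel here.
            return false;
        }
        self.workers.lock().unwrap().push(worker);
        true
    }

    fn shutdown(&self) {
        self.shutdown_requested.store(true, Ordering::Release);
        self.workers.lock().unwrap().clear(); // one-shot drain
    }
}

fn main() {
    let registry = Registry {
        shutdown_requested: AtomicBool::new(false),
        workers: Mutex::new(Vec::new()),
    };
    assert!(registry.register("tree-1".into()));
    registry.shutdown();
    // A late registration after shutdown is refused instead of leaking.
    assert!(!registry.register("tree-2".into()));
    assert!(registry.workers.lock().unwrap().is_empty());
}
```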
let mut current_index = path_index;
for (level, path_element) in path_elements.iter().enumerate() {
    changelog_entry.path[level] = Some(current_hash);
    if current_index.is_multiple_of(2) {
-       current_hash = Poseidon::hashv(&[&current_hash, path_element]).unwrap();
+       current_hash = Poseidon::hashv(&[&current_hash, path_element])?;
    } else {
-       current_hash = Poseidon::hashv(&[path_element, &current_hash]).unwrap();
+       current_hash = Poseidon::hashv(&[path_element, &current_hash])?;
    }
    current_index /= 2;
}

-   (current_hash, changelog_entry)
+   Ok((current_hash, changelog_entry))
Add explicit path-index bounds validation for tree height.
Line 64 accepts usize indices, but the loop only consumes HEIGHT bits. Indices with higher bits are silently truncated, which can compute the wrong root/changelog path without error.
💡 Proposed fix
pub fn compute_root_from_merkle_proof<const HEIGHT: usize>(
leaf: [u8; 32],
path_elements: &[[u8; 32]; HEIGHT],
path_index: usize,
) -> Result<([u8; 32], ChangelogEntry<HEIGHT>), ProverClientError> {
let mut changelog_entry = ChangelogEntry::default_with_index(path_index);
let mut current_hash = leaf;
let mut current_index = path_index;
for (level, path_element) in path_elements.iter().enumerate() {
changelog_entry.path[level] = Some(current_hash);
if current_index.is_multiple_of(2) {
        current_hash = Poseidon::hashv(&[&current_hash, path_element])?;
    } else {
        current_hash = Poseidon::hashv(&[path_element, &current_hash])?;
}
current_index /= 2;
}
+ if current_index != 0 {
+ return Err(ProverClientError::InvalidProofData(format!(
+ "path index {} exceeds tree height {}",
+ path_index, HEIGHT
+ )));
+ }
Ok((current_hash, changelog_entry))
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@prover/client/src/helpers.rs` around lines 64 - 75, The code silently
truncates path_index bits when iterating over path_elements; before the loop
validate that path_index fits within the tree height (i.e., that no bits above
path_elements.len() or the constant HEIGHT are set) and return an Err if it
does. Locate the variables/path used in this function (path_index,
path_elements, current_index, HEIGHT, changelog_entry) and add a pre-loop check
that fails when path_index >= (1 << path_elements.len()) or when path_index has
any 1-bits beyond HEIGHT, so the function returns an error instead of computing
an incorrect root/changelog path.
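The bounds check can be sketched by consuming one index bit per level, exactly as the prompt suggests; `check_path_index` is a hypothetical helper with a `String` error for brevity:

```rust
// After consuming HEIGHT bits of the index (one per tree level), any
// remaining nonzero bits mean the index was too large for the tree.
fn check_path_index<const HEIGHT: usize>(path_index: usize) -> Result<(), String> {
    let mut current_index = path_index;
    for _level in 0..HEIGHT {
        current_index /= 2; // one bit consumed per level, as in the real loop
    }
    if current_index != 0 {
        return Err(format!(
            "path index {} exceeds tree height {}",
            path_index, HEIGHT
        ));
    }
    Ok(())
}

fn main() {
    // A height-3 tree has 8 leaves, so indices 0..=7 are valid.
    assert!(check_path_index::<3>(7).is_ok());
    assert!(check_path_index::<3>(8).is_err());
}
```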
tokio::time::sleep(tokio::time::Duration::from_secs(config.wait_time)).await;
if let Some(status) = child.try_wait().expect("Failed to poll validator process") {
    assert!(
        status.success(),
        "Validator exited early with error: {}",
        status
    );
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify expected lifecycle of non-surfpool `test-validator`.
cat -n cli/test_bin/run | sed -n '1,260p'
rg -nP -C3 'test-validator|surfpool|solana-test-validator|nohup|disown|&' cli/test_bin/run

Repository: Lightprotocol/light-protocol
Length of output: 212
🏁 Script executed:
#!/bin/bash
# Find and examine the local_test_validator.rs file
fd -type f -name "local_test_validator.rs" sdk-libs/

Repository: Lightprotocol/light-protocol
Length of output: 242
🏁 Script executed:
#!/bin/bash
# Get the file size first
wc -l sdk-libs/client/src/local_test_validator.rs

Repository: Lightprotocol/light-protocol
Length of output: 119
🏁 Script executed:
#!/bin/bash
# Examine the critical section and surrounding context (lines 110-145)
sed -n '110,145p' sdk-libs/client/src/local_test_validator.rs | cat -n

Repository: Lightprotocol/light-protocol
Length of output: 1309
🏁 Script executed:
#!/bin/bash
# Search for spawn_validator function definition and surfpool-related logic
rg -n "fn spawn_validator|surfpool|use_surfpool" sdk-libs/client/src/local_test_validator.rs -B2 -A8

Repository: Lightprotocol/light-protocol
Length of output: 2026
🏁 Script executed:
#!/bin/bash
# Read the full spawn_validator function (lines 59-132)
sed -n '59,132p' sdk-libs/client/src/local_test_validator.rs | cat -n

Repository: Lightprotocol/light-protocol
Length of output: 3535
🏁 Script executed:
#!/bin/bash
# Search for spawn_validator usage in tests to understand expected behavior
rg -n "spawn_validator" sdk-libs/client/ -B3 -A10 --type rs

Repository: Lightprotocol/light-protocol
Length of output: 98
🏁 Script executed:
#!/bin/bash
# Check if there's any documentation or comments about validator lifecycle
rg -n "long-running|background|keep.*running|exit" sdk-libs/client/src/local_test_validator.rs

Repository: Lightprotocol/light-protocol
Length of output: 300
🏁 Script executed:
#!/bin/bash
# Look at the CLI implementation to understand what test-validator does
fd "run" cli/test_bin/ -type f

Repository: Lightprotocol/light-protocol
Length of output: 242
🏁 Script executed:
#!/bin/bash
# Search for spawn_validator usage in tests to understand expected behavior
rg -n "spawn_validator" sdk-libs/client/ -B3 -A10

Repository: Lightprotocol/light-protocol
Length of output: 4992
🏁 Script executed:
#!/bin/bash
# Check the CLI test-validator implementation
cat -n cli/test_bin/run | head -20

Repository: Lightprotocol/light-protocol
Length of output: 212
🏁 Script executed:
#!/bin/bash
# Search for test files that use spawn_validator
fd spawn_validator sdk-libs/ | head -20

Repository: Lightprotocol/light-protocol
Length of output: 54
🏁 Script executed:
#!/bin/bash
# Look for test files using spawn_validator in sdk-libs
find sdk-libs -type f -name "*test*" -o -name "*tests*" | head -20

Repository: Lightprotocol/light-protocol
Length of output: 1220
🏁 Script executed:
#!/bin/bash
# Search in sdk-tests for spawn_validator usage
rg -l "spawn_validator" sdk-tests/ 2>/dev/null | head -10

Repository: Lightprotocol/light-protocol
Length of output: 115
🏁 Script executed:
#!/bin/bash
# Look at examples or integration tests
find sdk-libs/client -type f \( -name "*.rs" \) -exec grep -l "spawn_validator" {} \;

Repository: Lightprotocol/light-protocol
Length of output: 142
🏁 Script executed:
#!/bin/bash
# Examine the actual test that uses spawn_validator
cat -n sdk-tests/client-test/tests/light_client.rs | head -80

Repository: Lightprotocol/light-protocol
Length of output: 3857
🏁 Script executed:
#!/bin/bash
# Look for how spawn_validator is called in the test
rg -n "spawn_validator" sdk-tests/client-test/tests/light_client.rs -B5 -A15

Repository: Lightprotocol/light-protocol
Length of output: 1961
🏁 Script executed:
#!/bin/bash
# Check if there are other tests or uses of non-surfpool mode
rg -n "use_surfpool.*false" sdk-libs/ sdk-tests/

Repository: Lightprotocol/light-protocol
Length of output: 54
🏁 Script executed:
#!/bin/bash
# Look for any issues or comments about validator exit behavior
rg -n "exit|long.running|background" sdk-libs/client/src/local_test_validator.rs -C2

Repository: Lightprotocol/light-protocol
Length of output: 839
🏁 Script executed:
#!/bin/bash
# Check if solana-test-validator or the CLI wrapper naturally exits
# Look for any solana-test-validator documentation or similar processes
rg -n "solana-test-validator" --no-heading | head -20

Repository: Lightprotocol/light-protocol
Length of output: 1714
🏁 Script executed:
#!/bin/bash
# Verify: is the validator meant to be a long-running process in non-surfpool mode?
# Check what happens after spawn_validator in actual usage
rg -B2 -A20 "spawn_validator.*await" sdk-tests/ sdk-libs/

Repository: Lightprotocol/light-protocol
Length of output: 4740
Early validator exit in non-surfpool mode should always fail, not just on non-zero exit code.
solana-test-validator is a long-running process that should persist throughout test execution. The current code at line 126–131 only asserts when exit is unsuccessful; if the validator exits with code 0, the test silently continues without a live validator. This allows tests to run against a dead validator, masking real issues.
💡 Proposed fix
- if let Some(status) = child.try_wait().expect("Failed to poll validator process") {
- assert!(
- status.success(),
- "Validator exited early with error: {}",
- status
- );
- }
+ if let Some(status) = child.try_wait().expect("Failed to poll validator process") {
+ panic!("Validator exited before readiness window completed: {}", status);
+ }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/local_test_validator.rs` around lines 125 - 132, The
validator process check using child.try_wait() must treat any early exit
(Some(status)) as failure even when status.success() is true; modify the logic
around tokio::time::sleep(... config.wait_time) and the child.try_wait() call so
that if try_wait() returns Some(status) you always fail the test (e.g.,
panic/assert) with a clear message including the status, rather than only
asserting on non-successful status; update the branch that currently inspects
status.success() to instead unconditionally fail on Some(status) and include the
status details in the error message to aid debugging.
Force-pushed from c97abbd to 0ae954a (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (7)
sdk-libs/client/src/local_test_validator.rs (1)
126-132: ⚠️ Potential issue | 🟠 Major
Fail on any early validator exit in non-surfpool mode.
At Line 126, `Some(status)` means the validator exited during the readiness window; that should fail regardless of `status.success()`, otherwise tests may continue without a live validator.

Proposed fix
- if let Some(status) = child.try_wait().expect("Failed to poll validator process") {
-     assert!(
-         status.success(),
-         "Validator exited early with error: {}",
-         status
-     );
- }
+ if let Some(status) = child.try_wait().expect("Failed to poll validator process") {
+     panic!(
+         "Validator exited before readiness window completed: {}",
+         status
+     );
+ }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/client/src/local_test_validator.rs` around lines 126 - 132, The current readiness check only treats an early exit as failing when status.success() is false; change the logic so any Some(status) returned from child.try_wait() is considered a failure during the readiness window. Locate the block using child.try_wait() in local_test_validator.rs (the if let Some(status) = child.try_wait() ... block) and replace the conditional so that encountering Some(status) always causes a test failure (include the status in the error message) rather than only failing when status.success() is false.

sdk-libs/client/src/interface/initialize_config.rs (1)
7-7: ⚠️ Potential issue | 🟠 Major
Keep `sdk-libs/client` decoupled from sibling SDK crates.

`InitializeLightConfigParams` is imported from `light_account` and used directly for instruction payload assembly. That keeps an SDK-to-SDK dependency in this module; use a local boundary DTO (or a program-lib interface type) and map into it here.

As per coding guidelines, `sdk-libs/**/src/**/*.rs`: SDK-libs must depend on program-libs, light-prover-client and external crates only.

Also applies to: 103-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/client/src/interface/initialize_config.rs` at line 7, This module currently imports and uses InitializeLightConfigParams from the sibling crate light_account; remove that direct SDK-to-SDK dependency by defining a local DTO (e.g., InitializeLightConfigDto) or using the program-lib interface type inside initialize_config.rs and map into the program-level type where the instruction payload is assembled (the code around the existing usage of InitializeLightConfigParams at the assembly site referenced in lines ~103-114). Replace the use declaration for InitializeLightConfigParams with the local DTO, update the code that builds the instruction payload to convert/map from your local DTO to the program-lib expected type, and ensure all references to InitializeLightConfigParams in this file are replaced by the new local type and mapping logic.

prover/client/src/proof.rs (1)
72-94:⚠️ Potential issue | 🟠 MajorPre-parse length guard is still missing.
The function parses the entire hex string via
from_str_radix(line 77) before checking the byte length (line 80). For adversarially large inputs, this can cause unnecessary CPU and memory consumption.Add a pre-flight check on the normalized hex string length before parsing.
🛠️ Proposed fix
pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
    let trimmed_str = hex_str
        .strip_prefix("0x")
        .or_else(|| hex_str.strip_prefix("0X"))
        .unwrap_or(hex_str);
+   // Pre-flight check: 64 hex chars = 32 bytes max
+   let normalized = trimmed_str.trim_start_matches('0');
+   if normalized.len() > 64 {
+       return Err(ProverClientError::InvalidHexString(format!(
+           "{hex_str}: exceeds 32 bytes"
+       )));
+   }
    let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof.rs` around lines 72 - 94, The function deserialize_hex_string_to_be_bytes currently calls num_bigint::BigUint::from_str_radix on trimmed_str before validating its length; add a pre-parse guard on the normalized trimmed_str (the value computed by strip_prefix) to reject inputs whose hex length exceeds 64 characters (32 bytes) and return ProverClientError::InvalidHexString immediately, and also normalize odd-length hex strings by left-padding with a '0' before parsing so from_str_radix isn't given malformed input; implement this check right after computing trimmed_str and before calling BigUint::from_str_radix in deserialize_hex_string_to_be_bytes.

prover/client/src/helpers.rs (1)
56-76: ⚠️ Potential issue | 🟠 Major
Path index bounds validation is still missing.

The function now returns `Result`, which is great, but it still silently truncates `path_index` values that exceed the tree height. When `path_index >= (1 << HEIGHT)`, the higher bits are consumed without error, computing an incorrect root.

After the loop completes, if `current_index != 0`, the path index was too large for the tree.

🛠️ Proposed fix
for (level, path_element) in path_elements.iter().enumerate() {
    changelog_entry.path[level] = Some(current_hash);
    if current_index.is_multiple_of(2) {
        current_hash = Poseidon::hashv(&[&current_hash, path_element])?;
    } else {
        current_hash = Poseidon::hashv(&[path_element, &current_hash])?;
    }
    current_index /= 2;
}
+ if current_index != 0 {
+     return Err(ProverClientError::InvalidProofData(format!(
+         "path index {} exceeds tree height {}",
+         path_index, HEIGHT
+     )));
+ }
Ok((current_hash, changelog_entry))

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/helpers.rs` around lines 56 - 76, compute_root_from_merkle_proof currently silently truncates oversized path_index by dividing current_index in the loop; detect this and return an Err when path_index is too large for the tree (i.e. after the loop if current_index != 0). Update compute_root_from_merkle_proof to validate that path_index < (1 << HEIGHT) (or equivalently check current_index == 0 after processing all HEIGHT levels) and return a ProverClientError (create or reuse an appropriate variant) instead of returning an incorrect root and ChangelogEntry; reference variables/function names: compute_root_from_merkle_proof, path_index, current_index, HEIGHT, and ChangelogEntry.

prover/client/src/proof_types/batch_append/proof_inputs.rs (1)
193-193: ⚠️ Potential issue | 🟠 Major
Line 193 still panics on malformed merkle proof lengths.

The preflight checks on lines 131-142 validate array counts, but individual proof heights are not validated. When `merkle_proof` has the wrong number of elements, `try_into().unwrap()` will panic instead of returning a `ProverClientError`.

Note that `BatchAppendsCircuitInputs::new` (lines 64-72) correctly handles this with `map_err`, so this function should follow the same pattern.

🛠️ Proposed fix

- let merkle_proof_array = merkle_proof.try_into().unwrap();
+ let merkle_proof_array: [[u8; 32]; HEIGHT] = merkle_proof.try_into().map_err(|_| {
+     ProverClientError::GenericError(format!(
+         "invalid merkle proof length at index {}: expected {}",
+         i, HEIGHT
+     ))
+ })?;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof_types/batch_append/proof_inputs.rs` at line 193, The code currently calls merkle_proof.try_into().unwrap() which will panic on malformed proof lengths; instead propagate the conversion error into a ProverClientError like BatchAppendsCircuitInputs::new does. Replace the unwrap with try_into().map_err(...) and convert the conversion error into the appropriate ProverClientError (e.g., an InvalidMerkleProofLength/BadInput variant) so merkle_proof_array creation returns a Result rather than panicking; reference the merkle_proof variable, merkle_proof_array binding, the ProverClientError type, and mirror the error-mapping pattern used in BatchAppendsCircuitInputs::new.

forester/src/epoch_manager.rs (1)
477-480: ⚠️ Potential issue | 🟠 Major
Block late worker registration after shutdown is requested.

Line 477 currently accepts new workers even after shutdown starts. Since `shutdown_new_tree_workers` drains only once, late workers can escape cancellation/join and outlive service exit.

💡 Suggested fix
async fn register_new_tree_worker(&self, worker: NewTreeWorker) {
+   if self.shutdown_requested.load(Ordering::Acquire) {
+       let mut worker = worker;
+       if let Some(cancel) = worker.cancel.take() {
+           let _ = cancel.send(());
+       }
+       self.detach_new_tree_worker_join(worker);
+       return;
+   }
    self.reap_finished_new_tree_workers().await;
    self.new_tree_workers.lock().await.push(worker);
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 477 - 480, register_new_tree_worker currently pushes new workers even after shutdown is initiated, letting late workers escape the single-run drain in shutdown_new_tree_workers; modify register_new_tree_worker to check the shutdown state before accepting a worker (e.g., inspect the same guard/flag or a shared AtomicBool used by shutdown_new_tree_workers) while holding the new_tree_workers lock, and refuse/skip pushing the worker (or return early) when shutdown has been requested so late workers are not added after shutdown_new_tree_workers runs.

sdk-libs/program-test/src/indexer/test_indexer.rs (1)
98-111: ⚠️ Potential issue | 🟠 Major
Unwrap timed prover envelopes before deserializing.

`build_compressed_proof()` still feeds the full 200 OK body into `deserialize_gnark_proof_json()`. If the prover responds with `{ "proof": ..., "proof_duration_ms": ... }`, this helper still fails on an otherwise successful proof request.

🔧 Suggested fix
+ #[derive(serde::Deserialize)]
+ struct ProofEnvelope {
+     proof: serde_json::Value,
+ }
+
fn build_compressed_proof(body: &str) -> Result<CompressedProof, IndexerError> {
-   let proof_json = deserialize_gnark_proof_json(body)
+   let proof_body = serde_json::from_str::<ProofEnvelope>(body)
+       .map(|wrapped| wrapped.proof.to_string())
+       .unwrap_or_else(|_| body.to_owned());
+   let proof_json = deserialize_gnark_proof_json(&proof_body)
        .map_err(|error| IndexerError::CustomError(error.to_string()))?;
    let (proof_a, proof_b, proof_c) = proof_from_json_struct(proof_json)
        .map_err(|error| IndexerError::CustomError(error.to_string()))?;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 98 - 111, build_compressed_proof currently passes the whole HTTP body (which may be an envelope like { "proof": ..., "proof_duration_ms": ... }) into deserialize_gnark_proof_json; first parse/unwrap that envelope to extract the inner proof payload (the "proof" field) before calling deserialize_gnark_proof_json, then continue with proof_from_json_struct and compress_proof unchanged; update build_compressed_proof to detect the envelope shape (e.g., try deserializing into a small struct with a proof field or check for a top-level "proof" key) and use the inner proof string/object for subsequent deserialization.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 115-116: The current unbounded await on child.wait().await can
hang tests; wrap the wait with tokio::time::timeout using the configured timeout
(e.g., config.wait_time or a surfpool-specific timeout like surfpool_timeout)
when awaiting the CLI child process (the child.wait().await call that produces
status). If the timeout elapses, ensure you kill/terminate the child process
(child.kill() or similar), await its termination, and then fail the test
asserting with a clear timeout error message instead of blocking; otherwise,
continue to assert on status.success() as before.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 2595-2608: The retry loop is being short-circuited by using `?` on
`response.text().await` and `build_compressed_proof(&body)?` so transient
failures never update `last_error` or retry; change the success-handling branch
that inspects `response_result`/`status` to explicitly attempt to read the body
and build the proof inside a match/if let and on any error (body read failure or
proof parsing failure) set `last_error` to an appropriate
`IndexerError::CustomError` with the error details and continue the loop instead
of returning early; only return `Ok(ValidityProofWithContext { accounts:
account_proof_inputs, addresses: address_proof_inputs, proof: ... })` after both
`response.text().await` and `build_compressed_proof(&body)` succeed without
using `?`.
---
Duplicate comments:
In `@forester/src/epoch_manager.rs`:
- Around line 477-480: register_new_tree_worker currently pushes new workers
even after shutdown is initiated, letting late workers escape the single-run
drain in shutdown_new_tree_workers; modify register_new_tree_worker to check the
shutdown state before accepting a worker (e.g., inspect the same guard/flag or a
shared AtomicBool used by shutdown_new_tree_workers) while holding the
new_tree_workers lock, and refuse/skip pushing the worker (or return early) when
shutdown has been requested so late workers are not added after
shutdown_new_tree_workers runs.
In `@prover/client/src/helpers.rs`:
- Around line 56-76: compute_root_from_merkle_proof currently silently truncates
oversized path_index by dividing current_index in the loop; detect this and
return an Err when path_index is too large for the tree (i.e. after the loop if
current_index != 0). Update compute_root_from_merkle_proof to validate that
path_index < (1 << HEIGHT) (or equivalently check current_index == 0 after
processing all HEIGHT levels) and return a ProverClientError (create or reuse an
appropriate variant) instead of returning an incorrect root and ChangelogEntry;
reference variables/function names: compute_root_from_merkle_proof, path_index,
current_index, HEIGHT, and ChangelogEntry.
In `@prover/client/src/proof_types/batch_append/proof_inputs.rs`:
- Line 193: The code currently calls merkle_proof.try_into().unwrap() which will
panic on malformed proof lengths; instead propagate the conversion error into a
ProverClientError like BatchAppendsCircuitInputs::new does. Replace the unwrap
with try_into().map_err(...) and convert the conversion error into the
appropriate ProverClientError (e.g., an InvalidMerkleProofLength/BadInput
variant) so merkle_proof_array creation returns a Result rather than panicking;
reference the merkle_proof variable, merkle_proof_array binding, the
ProverClientError type, and mirror the error-mapping pattern used in
BatchAppendsCircuitInputs::new.
In `@prover/client/src/proof.rs`:
- Around line 72-94: The function deserialize_hex_string_to_be_bytes currently
calls num_bigint::BigUint::from_str_radix on trimmed_str before validating its
length; add a pre-parse guard on the normalized trimmed_str (the value computed
by strip_prefix) to reject inputs whose hex length exceeds 64 characters (32
bytes) and return ProverClientError::InvalidHexString immediately, and also
normalize odd-length hex strings by left-padding with a '0' before parsing so
from_str_radix isn't given malformed input; implement this check right after
computing trimmed_str and before calling BigUint::from_str_radix in
deserialize_hex_string_to_be_bytes.
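A self-contained sketch of the suggested guard and padding, parsing byte pairs with the standard library instead of `BigUint` (the 32-byte output shape and error variant mirror the finding; the parsing strategy is an assumption for illustration):

```rust
#[derive(Debug)]
enum ProverClientError {
    InvalidHexString(String),
}

fn deserialize_hex_string_to_be_bytes(s: &str) -> Result<[u8; 32], ProverClientError> {
    let trimmed = s.strip_prefix("0x").unwrap_or(s);
    // Guard before parsing: more than 64 hex chars cannot fit in 32 bytes.
    if trimmed.len() > 64 {
        return Err(ProverClientError::InvalidHexString(s.to_string()));
    }
    // Left-pad odd-length input so every byte has two nibbles.
    let padded = if trimmed.len() % 2 == 1 {
        format!("0{trimmed}")
    } else {
        trimmed.to_string()
    };
    let mut out = [0u8; 32];
    let byte_len = padded.len() / 2;
    for i in 0..byte_len {
        let chunk = &padded[i * 2..i * 2 + 2];
        let byte = u8::from_str_radix(chunk, 16)
            .map_err(|_| ProverClientError::InvalidHexString(s.to_string()))?;
        // Big-endian: right-align the parsed bytes in the 32-byte output.
        out[32 - byte_len + i] = byte;
    }
    Ok(out)
}
```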
In `@sdk-libs/client/src/interface/initialize_config.rs`:
- Line 7: This module currently imports and uses InitializeLightConfigParams
from the sibling crate light_account; remove that direct SDK-to-SDK dependency
by defining a local DTO (e.g., InitializeLightConfigDto) or using the
program-lib interface type inside initialize_config.rs and map into the
program-level type where the instruction payload is assembled (the code around
the existing usage of InitializeLightConfigParams at the assembly site
referenced in lines ~103-114). Replace the use declaration for
InitializeLightConfigParams with the local DTO, update the code that builds the
instruction payload to convert/map from your local DTO to the program-lib
expected type, and ensure all references to InitializeLightConfigParams in this
file are replaced by the new local type and mapping logic.
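The decoupling pattern can be sketched as a local DTO plus a `From` impl at the payload-assembly site; both structs and their fields here are hypothetical stand-ins, not the real `InitializeLightConfigParams` shape:

```rust
// Hypothetical stand-in for the program-lib interface type.
struct InitializeLightConfigParams {
    authority: [u8; 32],
    fee: u64,
}

// Local DTO owned by initialize_config.rs: callers depend only on this
// type, removing the direct SDK-to-SDK import.
struct InitializeLightConfigDto {
    authority: [u8; 32],
    fee: u64,
}

// Mapping happens once, where the instruction payload is assembled.
impl From<InitializeLightConfigDto> for InitializeLightConfigParams {
    fn from(dto: InitializeLightConfigDto) -> Self {
        Self {
            authority: dto.authority,
            fee: dto.fee,
        }
    }
}
```

The cost is a small amount of duplication; the benefit is that the client crate's public surface no longer changes when the sibling crate's type does.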
In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 126-132: The current readiness check only treats an early exit as
failing when status.success() is false; change the logic so any Some(status)
returned from child.try_wait() is considered a failure during the readiness
window. Locate the block using child.try_wait() in local_test_validator.rs (the
if let Some(status) = child.try_wait() ... block) and replace the conditional so
that encountering Some(status) always causes a test failure (include the status
in the error message) rather than only failing when status.success() is false.
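The tightened check reduces to: during the readiness window, any `Some(status)` from `try_wait()` is fatal, success or not, because the validator is supposed to still be running. A sketch with `ExitKind` as a stand-in for `std::process::ExitStatus`:

```rust
#[derive(Debug, Clone, Copy)]
enum ExitKind {
    Success,
    Failure(i32),
}

// Before: only a non-success status failed the readiness check.
// After: any early exit fails it, with the status in the error message.
fn readiness_check(try_wait_result: Option<ExitKind>) -> Result<(), String> {
    match try_wait_result {
        Some(status) => Err(format!(
            "validator exited during readiness window: {status:?}"
        )),
        None => Ok(()), // still running, keep polling
    }
}
```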
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 98-111: build_compressed_proof currently passes the whole HTTP
body (which may be an envelope like { "proof": ..., "proof_duration_ms": ... })
into deserialize_gnark_proof_json; first parse/unwrap that envelope to extract
the inner proof payload (the "proof" field) before calling
deserialize_gnark_proof_json, then continue with proof_from_json_struct and
compress_proof unchanged; update build_compressed_proof to detect the envelope
shape (e.g., try deserializing into a small struct with a proof field or check
for a top-level "proof" key) and use the inner proof string/object for
subsequent deserialization.
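The envelope unwrap can be illustrated with a dependency-free scan; a real implementation would instead deserialize into a small `serde` struct with a `proof` field, so treat this brace-matching version purely as a sketch of the shape detection:

```rust
// If the body is an envelope like {"proof": {...}, "proof_duration_ms": ...},
// return the inner "proof" object; otherwise return the body unchanged.
fn extract_proof_payload(body: &str) -> &str {
    if let Some(key_pos) = body.find("\"proof\"") {
        let after = &body[key_pos + "\"proof\"".len()..];
        if let Some(colon) = after.find(':') {
            let value = after[colon + 1..].trim_start();
            if value.starts_with('{') {
                // Naive brace matching; a JSON parser handles braces inside
                // strings correctly, this sketch does not.
                let mut depth = 0usize;
                for (i, c) in value.char_indices() {
                    match c {
                        '{' => depth += 1,
                        '}' => {
                            depth -= 1;
                            if depth == 0 {
                                return &value[..=i];
                            }
                        }
                        _ => {}
                    }
                }
            }
        }
    }
    body
}
```

The extracted payload then flows into `deserialize_gnark_proof_json`, `proof_from_json_struct`, and `compress_proof` exactly as before.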
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 47ddc44c-796b-49dd-82de-36532cb1bdaf
⛔ Files ignored due to path filters (5)
- program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
- sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs is excluded by none and included by none
- sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs is excluded by none and included by none
- sdk-tests/sdk-native-test/tests/test.rs is excluded by none and included by none
- sdk-tests/sdk-pinocchio-v1-test/tests/test.rs is excluded by none and included by none
📒 Files selected for processing (19)
- forester/src/epoch_manager.rs
- forester/src/forester_status.rs
- forester/src/metrics.rs
- forester/src/priority_fee.rs
- forester/src/processor/v2/proof_worker.rs
- forester/tests/e2e_test.rs
- forester/tests/legacy/batched_state_async_indexer_test.rs
- forester/tests/test_batch_append_spent.rs
- prover/client/src/helpers.rs
- prover/client/src/proof.rs
- prover/client/src/proof_types/batch_address_append/json.rs
- prover/client/src/proof_types/batch_append/proof_inputs.rs
- prover/client/src/proof_types/combined/v2/json.rs
- prover/client/src/proof_types/non_inclusion/v2/json.rs
- prover/client/tests/batch_address_append.rs
- prover/client/tests/init_merkle_tree.rs
- sdk-libs/client/src/interface/initialize_config.rs
- sdk-libs/client/src/local_test_validator.rs
- sdk-libs/program-test/src/indexer/test_indexer.rs
let status = child.wait().await.expect("Failed to wait for CLI process");
assert!(status.success(), "CLI exited with error: {}", status);
Bound surfpool startup wait to prevent indefinite hangs.
At Line 115, child.wait().await is unbounded. If the CLI stalls, tests can hang indefinitely; wrap with tokio::time::timeout (ideally using config.wait_time or a dedicated surfpool timeout).
Proposed fix
- let status = child.wait().await.expect("Failed to wait for CLI process");
+ let status = tokio::time::timeout(
+ tokio::time::Duration::from_secs(config.wait_time),
+ child.wait(),
+ )
+ .await
+ .expect("Timed out waiting for CLI process")
+ .expect("Failed to wait for CLI process");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/local_test_validator.rs` around lines 115 - 116, The
current unbounded await on child.wait().await can hang tests; wrap the wait with
tokio::time::timeout using the configured timeout (e.g., config.wait_time or a
surfpool-specific timeout like surfpool_timeout) when awaiting the CLI child
process (the child.wait().await call that produces status). If the timeout
elapses, ensure you kill/terminate the child process (child.kill() or similar),
await its termination, and then fail the test asserting with a clear timeout
error message instead of blocking; otherwise, continue to assert on
status.success() as before.
Ok(response_result) => {
    let status = response_result.status();
    let body = response_result.text().await.map_err(|error| {
        IndexerError::CustomError(format!(
            "failed to read prover response body: {error}"
        ))
    })?;

    if status.is_success() {
        return Ok(ValidityProofWithContext {
            accounts: account_proof_inputs,
            addresses: address_proof_inputs,
            proof: build_compressed_proof(&body)?.into(),
        });
Don't short-circuit the retry loop on body/proof parsing failures.
Line 2597 and Line 2607 still use ?, so a transient response.text() failure or malformed success body exits on the first attempt instead of updating last_error and retrying. That leaves the new retry logic ineffective for two common flaky-prover cases.
🔧 Suggested fix
match response_result {
Ok(response_result) => {
let status = response_result.status();
- let body = response_result.text().await.map_err(|error| {
- IndexerError::CustomError(format!(
- "failed to read prover response body: {error}"
- ))
- })?;
+ let body = match response_result.text().await {
+ Ok(body) => body,
+ Err(error) => {
+ last_error =
+ format!("failed to read prover response body: {error}");
+ retries -= 1;
+ if retries > 0 {
+ tokio::time::sleep(Duration::from_secs(5)).await;
+ }
+ continue;
+ }
+ };
if status.is_success() {
+ let proof = match build_compressed_proof(&body) {
+ Ok(proof) => proof,
+ Err(error) => {
+ last_error =
+ format!("failed to parse prover response body: {error}");
+ retries -= 1;
+ if retries > 0 {
+ tokio::time::sleep(Duration::from_secs(5)).await;
+ }
+ continue;
+ }
+ };
return Ok(ValidityProofWithContext {
accounts: account_proof_inputs,
addresses: address_proof_inputs,
- proof: build_compressed_proof(&body)?.into(),
+ proof: proof.into(),
});
});
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Ok(response_result) => {
    let status = response_result.status();
    let body = match response_result.text().await {
        Ok(body) => body,
        Err(error) => {
            last_error =
                format!("failed to read prover response body: {error}");
            retries -= 1;
            if retries > 0 {
                tokio::time::sleep(Duration::from_secs(5)).await;
            }
            continue;
        }
    };
    if status.is_success() {
        let proof = match build_compressed_proof(&body) {
            Ok(proof) => proof,
            Err(error) => {
                last_error =
                    format!("failed to parse prover response body: {error}");
                retries -= 1;
                if retries > 0 {
                    tokio::time::sleep(Duration::from_secs(5)).await;
                }
                continue;
            }
        };
        return Ok(ValidityProofWithContext {
            accounts: account_proof_inputs,
            addresses: address_proof_inputs,
            proof: proof.into(),
        });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 2595 - 2608,
The retry loop is being short-circuited by using `?` on `response.text().await`
and `build_compressed_proof(&body)?` so transient failures never update
`last_error` or retry; change the success-handling branch that inspects
`response_result`/`status` to explicitly attempt to read the body and build the
proof inside a match/if let and on any error (body read failure or proof parsing
failure) set `last_error` to an appropriate `IndexerError::CustomError` with the
error details and continue the loop instead of returning early; only return
`Ok(ValidityProofWithContext { accounts: account_proof_inputs, addresses:
address_proof_inputs, proof: ... })` after both `response.text().await` and
`build_compressed_proof(&body)` succeed without using `?`.
Force-pushed 9e55024 to cb20052 (Compare)
Force-pushed 738ed6a to 8c74aa1 (Compare)
Force-pushed 8c74aa1 to c5e3fb5 (Compare)
Summary by CodeRabbit
New Features
Bug Fixes
Performance
Tests