feat: isolate and track forester worker concurrency #2343
sergeytimoshin wants to merge 15 commits into main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in settings. Use the following commands to manage reviews, or the checkboxes below for quick actions:
📝 Walkthrough
Arc-based concurrency and task orchestration added to EpochManager (NewTreeWorker lifecycle, FuturesUnordered tasks); refinalize now returns updated epoch/pda state; V2 processing switched to snapshot-driven batches; many prover client APIs and indexer/SDK types converted to fallible Result-based flows and richer data shapes.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant EpochMgr as EpochManager (Arc)
    participant Scheduler as FuturesUnordered
    participant NewWorker as NewTreeWorker (thread)
    participant ProofPool as ProofWorkers (channel)
    participant Indexer as Indexer / RPC
    participant DB as State/Accounts
    EpochMgr->>Scheduler: process_queue(epoch_info, epoch_pda, schedule)
    Scheduler->>EpochMgr: spawn dispatch_tree_processing tasks
    EpochMgr->>NewWorker: register / spawn / cancel (oneshot)
    EpochMgr->>ProofPool: spawn_proof_workers() -> Sender (fallible)
    ProofPool->>Indexer: request proofs
    Indexer->>ProofPool: return proof data
    ProofPool->>DB: verify/store compressed proofs
    NewWorker->>EpochMgr: completion / cancellation
    Scheduler->>EpochMgr: collect ProcessingResult or panic_payload_message
    EpochMgr->>EpochMgr: maybe_refinalize(...) -> (updated_epoch_pda, updated_schedule)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ✅ 3 passed
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Force-pushed dc3b8b4 to fe08bcd (compare)
Force-pushed fe08bcd to ef9d7fe (compare)
Force-pushed aaaa67c to fbf48e5 (compare)
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (9)
prover/client/src/helpers.rs (1)
38-48: ⚠️ Potential issue | 🟠 Major: Reject negative values in `bigint_to_u8_32`.
`to_bytes_be()` returns `(sign, magnitude)`. Right now `-1` serializes as `0x…01` instead of failing, which can turn invalid proof inputs into a different root/hash without any error.
Suggested fix
```diff
 pub fn bigint_to_u8_32(n: &BigInt) -> Result<[u8; 32], ProverClientError> {
-    let (_, bytes_be) = n.to_bytes_be();
+    let (sign, bytes_be) = n.to_bytes_be();
+    if sign == num_bigint::Sign::Minus {
+        return Err(ProverClientError::InvalidProofData(
+            "Negative values are not valid 32-byte field elements".to_string(),
+        ));
+    }
     if bytes_be.len() > 32 {
         return Err(ProverClientError::InvalidProofData(
             "Number too large to fit in [u8; 32]".to_string(),
         ));
     }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/helpers.rs` around lines 38 - 48, The function bigint_to_u8_32 currently ignores the sign from BigInt::to_bytes_be and therefore accepts negative values; update bigint_to_u8_32 to inspect the sign returned by to_bytes_be and return Err(ProverClientError::InvalidProofData(...)) when the sign is negative (i.e., not Sign::Plus), leaving the rest of the logic unchanged for non-negative values; reference the bigint_to_u8_32 function and the ProverClientError::InvalidProofData variant when implementing this check.
forester/tests/legacy/batched_state_async_indexer_test.rs (1)
79-90: ⚠️ Potential issue | 🟠 Major: Remove the second prover startup.
`init()` already starts the local validator with `enable_prover: true`. Spawning another prover here can race for port 3001 and fail the test before the worker-concurrency path is exercised.
Based on learnings: The light test validator must spawn three background processes: solana test validator on port 8899, prover server on port 3001, and photon indexer on port 8784.
Simplest fix
```diff
-    spawn_prover().await.unwrap();
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/tests/legacy/batched_state_async_indexer_test.rs` around lines 79 - 90, The test starts a prover twice causing a port race: remove the explicit spawn_prover().await.unwrap() call so the prover started by init(Some(LightValidatorConfig { enable_prover: true, ... })) is the only prover; ensure init() is called with enable_indexer: true and the LightValidatorConfig remains as shown so the test validator still spawns the Solana validator, prover (port 3001) and photon indexer (port 8784) in the background.
prover/client/src/proof_client.rs (1)
671-677: ⚠️ Potential issue | 🔴 Critical: Serialize `low_element_next_indices` in the address-append payload.
The `BatchAddressAppendInputs` struct carries `low_element_next_indices`, but the JSON serializer at `prover/client/src/proof_types/batch_address_append/json.rs` omits this field entirely. The prover circuit needs both the values and indices of boundary elements to validate indexed Merkle tree insertions (verifying that new addresses fit between `low_value` and `low_next_value` at the correct tree positions). Dropping the indices breaks this validation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof_client.rs` around lines 671 - 677, The JSON serializer for BatchAddressAppendInputs is omitting the low_element_next_indices field, breaking prover validation; update the batch_address_append JSON serialization (the module that produces the payload consumed by generate_batch_address_append_proof) to include low_element_next_indices alongside low_element_values/low_next_values, serialize it with the exact key/name the prover circuit expects, and ensure the values are converted to JSON (e.g., a numeric array of indices) so to_json(&inputs) used in generate_batch_address_append_proof contains both the indices and values required by the circuit.
forester/src/forester_status.rs (1)
674-685: ⚠️ Potential issue | 🟠 Major: Don't silently mask queue parse failures as "empty queue."
At Line 674 and Line 729, deserialization errors are collapsed into `(0, 0)`. That makes malformed/unreadable queue data indistinguishable from truly empty queues and can produce misleading status/metrics output.
Please either propagate the parse error (preferred) or at least emit a warning with tree/queue identifiers before fallback.
Also applies to: 729-740
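The preferred shape of that fix can be sketched with std-only stand-ins: `parse_queue_len_cap` stands in for `parse_hash_set_from_bytes::<QueueAccount>`, and a plain `String` error replaces the forester's real error type.

```rust
// Stand-in for parse_hash_set_from_bytes::<QueueAccount>: a parser that
// may fail on malformed account bytes.
fn parse_queue_len_cap(data: &[u8]) -> Result<(usize, usize), String> {
    if data.len() < 2 {
        return Err(format!("queue data too short: {} bytes", data.len()));
    }
    Ok((data[0] as usize, data[1] as usize))
}

// Preferred fix: propagate the parse error (annotated with the tree id)
// instead of collapsing it to (0, 0), so malformed queue data is
// distinguishable from a truly empty queue.
fn queue_fill(data: &[u8], tree_id: &str) -> Result<(usize, usize), String> {
    parse_queue_len_cap(data)
        .map_err(|e| format!("failed to parse queue account for tree {tree_id}: {e}"))
}
```

Callers then surface the error (or log it with the tree identifier) rather than reporting a zero-length queue.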
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/forester_status.rs` around lines 674 - 685, The code currently swallows deserialization errors by calling parse_hash_set_from_bytes::<QueueAccount>(&acc.data).ok().map(...).unwrap_or((0,0)), which makes malformed queue data indistinguishable from an empty queue; replace these silent fallbacks (the two occurrences using parse_hash_set_from_bytes::<QueueAccount>) with proper error handling: ideally change the enclosing function to return Result and propagate the parse error (use parse_hash_set_from_bytes::<QueueAccount>(&acc.data)? and compute (len, cap) on Ok), otherwise at minimum catch Err(e) and emit a warning including identifying info (e.g., the tree/queue identifiers available in scope such as acc.pubkey or tree_id) before returning the fallback (0,0); apply the same fix at both locations (the parse_hash_set_from_bytes::<QueueAccount> usages around the shown block and the 729–740 block).
prover/client/src/proof_types/batch_append/proof_inputs.rs (1)
115-130: ⚠️ Potential issue | 🟠 Major: Reject length mismatches before the zipped loop.
`zip()` stops at the shortest input. If `old_leaves`, `leaves`, or `merkle_proofs` is shorter than `batch_size`, this still returns `batch_size` unchanged but computes the root from only a partial batch. That will generate inconsistent public inputs and very opaque prover failures.
Suggested guard
```diff
 ) -> Result<(BatchAppendsCircuitInputs, Vec<ChangelogEntry<HEIGHT>>), ProverClientError> {
+    let expected_len = batch_size as usize;
+    if leaves.len() != expected_len
+        || old_leaves.len() != expected_len
+        || merkle_proofs.len() != expected_len
+    {
+        return Err(ProverClientError::GenericError(format!(
+            "batch append input length mismatch: leaves={}, old_leaves={}, merkle_proofs={}, expected={}",
+            leaves.len(),
+            old_leaves.len(),
+            merkle_proofs.len(),
+            expected_len,
+        )));
+    }
+
     let mut new_root = [0u8; 32];
```
Also applies to: 135-200
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof_types/batch_append/proof_inputs.rs` around lines 115 - 130, The function get_batch_append_inputs currently zips old_leaves, leaves, and merkle_proofs and can silently process a partial batch if any of those vectors are shorter than batch_size; add explicit length checks before the zipped loop: verify old_leaves.len() == batch_size as usize, leaves.len() == batch_size as usize, and merkle_proofs.len() == batch_size as usize (or otherwise reject mismatches), and return an appropriate ProverClientError when any mismatch is detected so the function fails fast and never computes roots/inputs from a partial batch.
sdk-libs/client/src/interface/initialize_config.rs (1)
20-27: ⚠️ Potential issue | 🔴 Critical: Consolidate to shared `InitializeLightConfigParams` payload and use `anchor_lang::Discriminator`.
The `InitializeRentFreeConfig::build()` method will produce an incompatible instruction payload. It serializes `InitializeCompressionConfigAnchorData` (which lacks `config_bump`), but the sibling `initialize_config()` function in `instructions.rs` correctly uses `InitializeLightConfigParams` with `config_bump` included. Both target the same on-chain instruction, so they must produce identical wire formats; presently they do not.
Additionally, line 122 hardcodes the discriminator bytes. Fix both issues by:
- Reusing `InitializeLightConfigParams` here instead of the local struct
- Using the `anchor_lang::Discriminator` trait to derive the discriminator (as required by coding guidelines for `sdk-libs/**/src/**/*.rs`) rather than hardcoding bytes
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/client/src/interface/initialize_config.rs` around lines 20 - 27, The build path is serializing the wrong payload and hardcodes the discriminator: replace the local struct InitializeCompressionConfigAnchorData with the shared InitializeLightConfigParams so InitializeRentFreeConfig::build() emits the same wire format as the existing initialize_config() function, and remove the manual discriminator bytes in favor of deriving and using anchor_lang::Discriminator (call the trait implementation to obtain the discriminator) rather than hardcoding values; update any serialization calls in InitializeRentFreeConfig::build() to use InitializeLightConfigParams and the Discriminator API so both instruction creators produce identical payloads.
sdk-libs/program-test/src/indexer/test_indexer.rs (1)
2502-2535: ⚠️ Potential issue | 🟠 Major: Handle prover HTTP failures explicitly instead of spinning or panicking.
This loop only decrements `retries` on transport errors. A prover that keeps returning 4xx/5xx will spin forever, and a 200 with an unreadable body still hits `unwrap()` on `text()`/`deserialize_gnark_proof_json()`. Please convert both HTTP-status failures and parse failures into `IndexerError` and consume a retry in those branches.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 2502 - 2535, The HTTP loop currently only decrements retries on transport errors and uses unwrap() on response.text() and deserialize_gnark_proof_json(), so server 4xx/5xx and parse failures can spin or panic; update the client.post handling (the block around client.post(...).send().await and the success branch that calls response.text(), deserialize_gnark_proof_json, proof_from_json_struct, and compress_proof) to: treat non-success status codes as IndexerError::CustomError (include status and body if available), convert any failures from response.text(), deserialize_gnark_proof_json, proof_from_json_struct, or compress_proof into IndexerError::CustomError rather than unwrap/panic, and ensure each of those error branches decrements retries and follows the same retry/sleep logic so the function eventually returns Err(IndexerError::CustomError("Failed to get proof from server")) after exhausting retries; reference functions/values: client.post, response.text(), deserialize_gnark_proof_json, proof_from_json_struct, compress_proof, IndexerError, and ValidityProofWithContext.
forester/src/epoch_manager.rs (2)
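The retry shape requested for `test_indexer.rs` above can be sketched std-only, with the HTTP client and JSON decoding replaced by a closure; `Outcome`, `fetch`, and this pared-down `IndexerError` are stand-ins, not the real types.

```rust
#[derive(Debug, PartialEq)]
enum IndexerError {
    CustomError(String),
}

// Everything an attempt can yield: transport failure, non-2xx status,
// unparseable body, or a proof (a u64 stands in for the proof payload).
enum Outcome {
    Transport(String),
    Status(u16),
    BadBody(String),
    Proof(u64),
}

fn get_proof_with_retries(
    mut fetch: impl FnMut() -> Outcome,
    mut retries: u32,
) -> Result<u64, IndexerError> {
    while retries > 0 {
        match fetch() {
            Outcome::Proof(p) => return Ok(p),
            // Every failure mode consumes a retry -- including non-success
            // statuses and parse failures, not just transport errors -- so a
            // prover stuck on 500s cannot spin forever.
            Outcome::Transport(_) | Outcome::Status(_) | Outcome::BadBody(_) => {
                retries -= 1;
            }
        }
    }
    Err(IndexerError::CustomError(
        "Failed to get proof from server".to_string(),
    ))
}
```

The real implementation would also sleep between attempts and record the status/body in the error, as the prompt describes.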
4775-4785: ⚠️ Potential issue | 🟠 Major: Shutdown path can bypass tracked new-tree worker teardown.
When shutdown wins the `tokio::select!` in Line 4775, `epoch_manager.run()` is dropped before reaching Line 680 cleanup, so tracked `std::thread` workers may outlive service shutdown and continue processing.
💡 Proposed fix
```diff
-    let result = tokio::select! {
-        result = epoch_manager.run() => result,
+    let epoch_manager_for_run = epoch_manager.clone();
+    let result = tokio::select! {
+        result = epoch_manager_for_run.run() => result,
         _ = shutdown => {
             info!(
                 event = "shutdown_received",
                 run_id = %run_id_for_logs,
                 phase = "service_run",
                 "Received shutdown signal. Stopping the service."
             );
+            epoch_manager
+                .shutdown_new_tree_workers(NEW_TREE_WORKER_SHUTDOWN_TIMEOUT)
+                .await;
             Ok(())
         }
     };
```
Also applies to: 680-681
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 4775 - 4785, The shutdown branch in the tokio::select! currently just returns Ok(()) which drops epoch_manager.run() and skips the cleanup that joins tracked std::thread workers; instead, on shutdown you must trigger a graceful stop on the EpochManager and await its run future to complete so its cleanup code (the tracked-worker teardown at/around epoch_manager.run() and the cleanup near lines ~680) runs: send the same shutdown signal or call the epoch_manager.shutdown()/stop() method (or set a shutdown flag) and then await epoch_manager.run() (or call a provided join/await method) before returning Ok(()) so all tracked std::thread workers are torn down and joined. Ensure you reference and use the epoch_manager.run() future and any explicit join/teardown methods on EpochManager to perform the worker joins instead of dropping the future.
2181-2192: ⚠️ Potential issue | 🔴 Critical: `RegistrationTracker` claim can get stuck on early error paths.
After `try_claim_refinalize()` succeeds, Line 2188 and Line 2191 use `?`. If either fails, `complete_refinalize()` is never called, leaving `refinalize_in_progress = true` and potentially blocking other workers forever in `wait_for_refinalize()`.
💡 Proposed fix
```diff
 if registration_tracker.try_claim_refinalize() {
-    // This task sends the finalize_registration tx
-    let ix = create_finalize_registration_instruction(
-        &self.config.payer_keypair.pubkey(),
-        &self.config.derivation_pubkey,
-        epoch_info.epoch,
-    );
-    let priority_fee = self
-        .resolve_epoch_priority_fee(&*rpc, epoch_info.epoch)
-        .await?;
-    let current_slot = rpc.get_slot().await?;
-    let Some(confirmation_deadline) = scheduled_confirmation_deadline(
-        epoch_info.phases.active.end.saturating_sub(current_slot),
-    ) else {
-        info!(...);
-        registration_tracker.complete_refinalize(cached_weight);
-        return Ok((forester_epoch_pda, tree_schedule));
-    };
-    ...
-    match send_smart_transaction(...).await.map_err(RpcError::from) {
-        Ok(_) => { ... registration_tracker.complete_refinalize(post_finalize_weight); }
-        Err(e) => {
-            registration_tracker.complete_refinalize(cached_weight);
-            return Err(e.into());
-        }
-    }
+    let mut completion_weight = cached_weight;
+    let claim_result: Result<()> = async {
+        let ix = create_finalize_registration_instruction(
+            &self.config.payer_keypair.pubkey(),
+            &self.config.derivation_pubkey,
+            epoch_info.epoch,
+        );
+        let priority_fee = self
+            .resolve_epoch_priority_fee(&*rpc, epoch_info.epoch)
+            .await?;
+        let current_slot = rpc.get_slot().await?;
+        let Some(confirmation_deadline) = scheduled_confirmation_deadline(
+            epoch_info.phases.active.end.saturating_sub(current_slot),
+        ) else {
+            info!(...);
+            return Ok(());
+        };
+        match send_smart_transaction(...).await.map_err(RpcError::from) {
+            Ok(_) => {
+                completion_weight = match rpc.get_anchor_account::<EpochPda>(&epoch_pda_address).await {
+                    Ok(Some(pda)) => pda.registered_weight,
+                    _ => on_chain_weight,
+                };
+                Ok(())
+            }
+            Err(e) => Err(e.into()),
+        }
+    }.await;
+
+    registration_tracker.complete_refinalize(completion_weight);
+    claim_result?;
 } else {
     registration_tracker.wait_for_refinalize().await;
 }
```
Also applies to: 2246-2250
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 2181 - 2192, After a successful RegistrationTracker::try_claim_refinalize() return ensure RegistrationTracker::complete_refinalize() is always called even if subsequent calls like resolve_epoch_priority_fee(), rpc.get_slot(), or scheduled_confirmation_deadline(...) fail; wrap the post-claim work in a scope that guarantees cleanup (e.g., a scope guard or an explicit match/let guard) so complete_refinalize() runs on all early-return/error paths, and apply the same pattern around the other similar block (lines ~2246-2250) to avoid leaving refinalize_in_progress = true.
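The scope-guard alternative mentioned in the prompt can be sketched std-only with a `Drop` guard. `Tracker` stands in for `RegistrationTracker` (reduced to the in-progress flag; completion-weight selection is omitted), and a `String` error replaces the real error type.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

// Stand-in for RegistrationTracker: only the flag that must be reset.
struct Tracker {
    refinalize_in_progress: AtomicBool,
}

impl Tracker {
    fn complete_refinalize(&self) {
        self.refinalize_in_progress.store(false, Ordering::SeqCst);
    }
}

// Scope guard: complete_refinalize() runs on drop, so every early `?`
// or return path still releases the claim.
struct RefinalizeGuard(Arc<Tracker>);

impl Drop for RefinalizeGuard {
    fn drop(&mut self) {
        self.0.complete_refinalize();
    }
}

fn claimed_work(tracker: Arc<Tracker>, fail_early: bool) -> Result<(), String> {
    tracker.refinalize_in_progress.store(true, Ordering::SeqCst);
    let _guard = RefinalizeGuard(tracker);
    if fail_early {
        // Mimics resolve_epoch_priority_fee()/get_slot() failing via `?`;
        // the guard still fires and clears the flag.
        return Err("priority fee lookup failed".to_string());
    }
    Ok(())
}
```

In the real code the guard would also carry the completion weight, which the async-block approach in the diff above handles explicitly.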
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@forester/src/config.rs`:
- Around line 341-347: The current use of partition() makes the Err branch
already returned, so the map_err fallback on valid.into_iter().collect(...) is
unreachable and gives a misleading generic error; change the logic in the tree
ID parsing flow (the variables/blocks using partition(), valid, invalid and the
ConfigError::InvalidArguments with field "tree_ids") to either directly
unwrap/collect the already-validated `valid` Vec<Pubkey> (since `invalid` was
handled) or perform a single-pass parse that collects invalid inputs and returns
ConfigError::InvalidArguments with the actual invalid_values; update the code
around the partition() usage to return the real invalid strings instead of the
generic "failed to parse tree_ids".
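The single-pass parse described above can be sketched std-only, with `u64` standing in for `Pubkey` and a plain `String` error in place of `ConfigError::InvalidArguments`.

```rust
// Single pass over the raw inputs: collect valid IDs and the actual
// invalid strings, then report the invalid values verbatim instead of a
// generic "failed to parse tree_ids" message.
fn parse_tree_ids(raw: &[&str]) -> Result<Vec<u64>, String> {
    let mut valid = Vec::new();
    let mut invalid = Vec::new();
    for s in raw {
        match s.parse::<u64>() {
            Ok(id) => valid.push(id),
            Err(_) => invalid.push((*s).to_string()),
        }
    }
    if invalid.is_empty() {
        Ok(valid)
    } else {
        Err(format!("invalid tree_ids: {}", invalid.join(", ")))
    }
}
```

This avoids the unreachable `map_err` fallback entirely: the error branch is the only place parse failures can surface.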
In `@forester/src/processor/v2/helpers.rs`:
- Around line 495-524: The function currently treats truncated batches as valid
by using actual_end = end.min(available) and proceeding; change the guard so
that if wait_for_batch(end) returns available < end the function returns
Ok(None) (i.e., do not proceed with partial batches). Specifically, in
get_batch_snapshot replace the current if start >= available { return Ok(None);
} / actual_end logic with a check like if available < end { return Ok(None); }
(still keep the start >= available check as needed), so code paths that call
lock_recover, read data.addresses[start..actual_end], or derive
leaves_hash_chain (data.leaves_hash_chains, create_hash_chain_from_slice) never
run for undersized batches.
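The guard shape can be sketched as follows; `batch_snapshot` and the `u32` addresses are stand-ins for `get_batch_snapshot`/`wait_for_batch` and the real batch data.

```rust
// A snapshot is only taken for full batches: anything short of `end`
// yields None instead of a truncated slice.
fn batch_snapshot(addresses: &[u32], start: usize, end: usize) -> Option<Vec<u32>> {
    let available = addresses.len();
    if start >= available {
        return None;
    }
    // Reject undersized batches outright rather than clamping to
    // `end.min(available)` and proceeding with a partial batch.
    if available < end {
        return None;
    }
    Some(addresses[start..end].to_vec())
}
```

With this guard in place, downstream consumers (hash-chain derivation, proof inputs) never see a slice shorter than the batch size they were promised.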
In `@forester/src/processor/v2/proof_worker.rs`:
- Around line 197-214: The serialization-failure path uses two different
timestamps (submitted_at = Instant::now() here and round_trip_start elsewhere)
and silently ignores closed result_tx; consolidate this by adding a helper
(e.g., send_error_result or report_serialization_failure) used by both the
job.inputs.to_json error branch and the other serialization-before-submit branch
that: captures one Instant::now() into a single variable (round_trip_start or
submitted_at) at the failure site, logs the error with error! including the
error details, constructs the ProofJobResult with that single timestamp and
consistent round_trip_ms/proof_duration_ms values, and attempts to send via
job.result_tx while handling a closed channel by logging the send failure;
update both locations (the error branch around job.inputs.to_json and the other
branch that currently reuses round_trip_start) to call this helper so latency
metrics and logging are consistent.
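The shared-helper idea can be sketched with std `mpsc` standing in for the job's result channel; `ProofJobResult` is reduced to two fields and `send_error_result` is the hypothetical helper name from the prompt, not the real API.

```rust
use std::sync::mpsc;
use std::time::Instant;

// Pared-down stand-in for the real ProofJobResult.
struct ProofJobResult {
    error: String,
    round_trip_ms: u128,
}

// One helper used by every serialization-failure branch: a single
// timestamp feeds the latency metric, and a closed channel is logged
// instead of silently ignored.
fn send_error_result(tx: &mpsc::Sender<ProofJobResult>, started: Instant, error: String) {
    let result = ProofJobResult {
        round_trip_ms: started.elapsed().as_millis(),
        error,
    };
    if tx.send(result).is_err() {
        eprintln!("result channel closed; dropping proof error result");
    }
}
```

Both failure sites then call the helper with the same `Instant` captured at the failure, keeping `round_trip_ms` consistent across branches.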
In `@forester/tests/test_batch_append_spent.rs`:
- Around line 332-346: The spawned closure passed to tokio::task::spawn_blocking
(bound to service_handle) can fail at runtime construction
(tokio::runtime::Builder::build), during
runtime.block_on(run_pipeline::<LightClient>(...)), and when joining the spawned
task; currently all results are ignored—update the test to explicitly handle
each error: unwrap or propagate the runtime build Result (ensure the build error
is surfaced), propagate/unwrap the Result returned by
run_pipeline::<LightClient> so pipeline failures fail the test, and await/join
the JoinHandle and check its JoinError (or unwrap) instead of discarding with
let _ = ..., failing the test on panic or join failure; reference
service_handle, runtime.block_on, run_pipeline::<LightClient>,
shutdown_receiver, shutdown_compressible_receiver, and work_report_sender when
making these changes.
In `@forester/tests/test_compressible_pda.rs`:
- Around line 312-313: Replace the bare .unwrap() calls on builder results with
.expect(...) that includes a descriptive message; specifically change the
.build().unwrap() invocations that call InitializeRentFreeConfig::build() (and
the two other similar setup builder calls in this test) to
.build().expect("InitializeRentFreeConfig::build() failed") or an equivalent
message that names the failing builder so failures in setup are clear. Ensure
each replaced call uses a distinct message if helpful to identify which builder
failed.
In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs`:
- Around line 190-195: The code currently slices
new_element_values[..zkp_batch_size] and then indexes parallel inputs
(low_element_values, low_element_next_values, low_element_indices,
low_element_next_indices, low_element_proofs) by that length, which can panic if
any of those slices are shorter; add explicit guards at the start of the
function that compute required_len = zkp_batch_size and verify each parallel
input's len() >= required_len (for low_element_proofs check len() >=
required_len and each inner-proof has HEIGHT elements) and return a
ProverClientError on failure instead of allowing a panic; reference the
parameter names low_element_values, low_element_next_values,
low_element_indices, low_element_next_indices, low_element_proofs,
new_element_values and zkp_batch_size when adding these checks.
In `@prover/client/src/proof_types/batch_append/json.rs`:
- Around line 76-79: Rename the inherent fallible method `to_string(&self) ->
Result<String, ProverClientError>` to a clearly-fallible name like
`try_to_json_string(&self) -> Result<String, ProverClientError>` (or
`to_json_string` if you prefer), update all call sites that invoke `to_string()`
to call the new name, keep the same return type and body
(`create_json_from_struct(&self)`), and remove the
`#[allow(clippy::inherent_to_string)]` attribute since the method no longer
shadows the conventional `to_string`. Ensure unit tests and any trait impls or
uses are updated to the new identifier.
In `@prover/client/src/proof_types/batch_update/proof_inputs.rs`:
- Around line 115-118: The public_inputs method currently indexes self.0[0],
which will panic on an empty batch and only returns the first entry; instead
iterate over the inner slice (self.0), call public_inputs_arr() on each item,
collect the results into the Vec<[u8; 32]>, and propagate any errors as
ProverClientError; update public_inputs to map/try_collect (or equivalent) over
self.0 so all batch entries are included and empty slices are handled without
panicking (or explicitly return an error if an empty batch is considered
invalid).
In `@prover/client/src/proof_types/non_inclusion/v2/json.rs`:
- Around line 54-62: The JSON serialization currently forces path_index to a u32
by converting input.index_hashed_indexed_element_leaf via to_u32(), which will
fail for trees deeper than 32 bits; update the contract to allow larger indices
by switching path_index to a wider representation (for example: use to_u64() and
a u64 field, or serialize the index as a string/BigInt) and change the
conversion call from to_u32() to the chosen conversion (e.g., to_u64() or
to_string()) and update any consumers of path_index accordingly so indices >
u32::MAX are supported; ensure error handling still returns
ProverClientError::IntegerConversion with an appropriate message if conversion
to the new representation fails.
In `@prover/client/src/proof.rs`:
- Around line 167-170: The negate_g1 function currently deserializes the
incoming G1 point with Validate::No allowing malformed/off-curve points through;
change the call to G1::deserialize_with_mode(g1_le.as_slice(), Compress::No,
Validate::Yes) so the point is validated at deserialization and any error still
maps to ProverClientError::InvalidProofData; this uses the existing
convert_endianness and error mapping, so no further changes to error handling
are needed.
In `@sdk-libs/client/src/indexer/types/proof.rs`:
- Around line 214-223: The loop that computes output tree indices (using
account.tree_info.next_tree_info and account.tree_info.pack_output_tree_index on
each Account) currently keeps the first index in output_tree_index and ignores
mismatches; update PackedStateTreeInfos construction to validate that every call
to pack_output_tree_index returns the same index and return an error if any
subsequent index differs. Specifically, in the code paths that set
output_tree_index (the blocks referencing output_tree_index,
account.tree_info.next_tree_info, and pack_output_tree_index), compare the newly
computed index to the previously stored Some(index) and propagate a descriptive
Err when they diverge instead of silently ignoring the difference; apply the
same validation to the other occurrence noted (around lines 243-248).
In `@sdk-libs/client/src/interface/load_accounts.rs`:
- Around line 120-124: The missing-PDA error loses the original position because
collect_pda_hashes restarts enumeration for each group; update the flow so the
original flat index from cold_pdas is preserved: modify
group_pda_specs/collect_pda_hashes usage so each group carries a base_index (or
have collect_pda_hashes return (hash, original_index) pairs) and use that
original index when constructing MissingPdaCompressed; apply the same change to
the other occurrence referenced (lines ~164-175) so every MissingPdaCompressed
contains the flattened index from cold_pdas rather than the per-group 0-based
index.
In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 127-136: The non-surfpool branch currently spawns the validator
with Command::new(...).spawn() and immediately drops the Child, so failures
during the sleep are ignored; change this to keep the Child (e.g., let mut child
= Command::new(...).spawn()?), await the
tokio::time::sleep(Duration::from_secs(config.wait_time)).await while retaining
the child, then call child.try_wait() (or child.wait() if blocking is
acceptable) and inspect the ExitStatus; if the child has exited with a
non-success status return an io::Error (or map the ExitStatus to Err) so callers
receive an error instead of Ok(()). Ensure you mirror the surfpool branch
pattern (spawn -> keep handle -> wait/try_wait -> check status) and propagate
any non-zero exit as an io::Error.
- Around line 69-115: The current code concatenates command pieces into a single
shell string and spawns it via Command::new("sh").arg("-c").arg(path), which
breaks on spaces/metacharacters (see path construction and uses of
config.enable_indexer, config.limit_ledger_size, config.sbf_programs,
config.upgradeable_programs, config.validator_args and config.use_surfpool).
Change to build a Vec<String> (or Vec<&str>) of argv: start with the executable
built from project_root + "/cli/test_bin/run", then push "test-validator" and
each flag/value as separate .arg(...) entries (one per sbf_program pair and per
upgradeable_program fields, and each validator_arg as its own arg) and call
Command::new(executable).args(&args).spawn() instead of using "sh -c". Also make
the non-surfpool branch behave like the surfpool branch: keep the child handle,
call child.wait(), check the exit Status and propagate or log failures instead
of dropping the child and sleeping unconditionally.
---
Outside diff comments:
In `@forester/src/epoch_manager.rs`:
- Around line 4775-4785: The shutdown branch in the tokio::select! currently
just returns Ok(()) which drops epoch_manager.run() and skips the cleanup that
joins tracked std::thread workers; instead, on shutdown you must trigger a
graceful stop on the EpochManager and await its run future to complete so its
cleanup code (the tracked-worker teardown at/around epoch_manager.run() and the
cleanup near lines ~680) runs: send the same shutdown signal or call the
epoch_manager.shutdown()/stop() method (or set a shutdown flag) and then await
epoch_manager.run() (or call a provided join/await method) before returning
Ok(()) so all tracked std::thread workers are torn down and joined. Ensure you
reference and use the epoch_manager.run() future and any explicit join/teardown
methods on EpochManager to perform the worker joins instead of dropping the
future.
- Around line 2181-2192: After a successful
RegistrationTracker::try_claim_refinalize() return ensure
RegistrationTracker::complete_refinalize() is always called even if subsequent
calls like resolve_epoch_priority_fee(), rpc.get_slot(), or
scheduled_confirmation_deadline(...) fail; wrap the post-claim work in a scope
that guarantees cleanup (e.g., a scope guard or an explicit match/let guard) so
complete_refinalize() runs on all early-return/error paths, and apply the same
pattern around the other similar block (lines ~2246-2250) to avoid leaving
refinalize_in_progress = true.
In `@forester/src/forester_status.rs`:
- Around line 674-685: The code currently swallows deserialization errors by
calling
parse_hash_set_from_bytes::<QueueAccount>(&acc.data).ok().map(...).unwrap_or((0,0)),
which makes malformed queue data indistinguishable from an empty queue; replace
these silent fallbacks (the two occurrences using
parse_hash_set_from_bytes::<QueueAccount>) with proper error handling: ideally
change the enclosing function to return Result and propagate the parse error
(use parse_hash_set_from_bytes::<QueueAccount>(&acc.data)? and compute (len,
cap) on Ok), otherwise at minimum catch Err(e) and emit a warning including
identifying info (e.g., the tree/queue identifiers available in scope such as
acc.pubkey or tree_id) before returning the fallback (0,0); apply the same fix
at both locations (the parse_hash_set_from_bytes::<QueueAccount> usages around
the shown block and the 729–740 block).
In `@forester/tests/legacy/batched_state_async_indexer_test.rs`:
- Around line 79-90: The test starts a prover twice causing a port race: remove
the explicit spawn_prover().await.unwrap() call so the prover started by
init(Some(LightValidatorConfig { enable_prover: true, ... })) is the only
prover; ensure init() is called with enable_indexer: true and the
LightValidatorConfig remains as shown so the test validator still spawns the
Solana validator, prover (port 3001) and photon indexer (port 8784) in the
background.
In `@prover/client/src/helpers.rs`:
- Around line 38-48: The function bigint_to_u8_32 currently ignores the sign
from BigInt::to_bytes_be and therefore accepts negative values; update
bigint_to_u8_32 to inspect the sign returned by to_bytes_be and return
Err(ProverClientError::InvalidProofData(...)) when the sign is negative (i.e.,
not Sign::Plus), leaving the rest of the logic unchanged for non-negative
values; reference the bigint_to_u8_32 function and the
ProverClientError::InvalidProofData variant when implementing this check.
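A minimal sketch of that sign check. To keep the snippet self-contained it mirrors `num_bigint::Sign` and the error variant with local enums; the real `bigint_to_u8_32` would destructure the `(Sign, Vec<u8>)` returned by `BigInt::to_bytes_be()`.

```rust
// Local mirrors of `num_bigint::Sign` and `ProverClientError` so the sketch is
// self-contained; the real function calls `BigInt::to_bytes_be()`, which
// returns `(Sign, Vec<u8>)`.
#[derive(Debug, PartialEq)]
pub enum Sign {
    Minus,
    NoSign,
    Plus,
}

#[derive(Debug, PartialEq)]
pub enum ProverClientError {
    InvalidProofData(String),
}

pub fn bigint_to_u8_32(sign: Sign, bytes: Vec<u8>) -> Result<[u8; 32], ProverClientError> {
    // Reject negative values instead of silently dropping the sign.
    if sign == Sign::Minus {
        return Err(ProverClientError::InvalidProofData(
            "negative BigInt cannot be encoded as [u8; 32]".to_string(),
        ));
    }
    if bytes.len() > 32 {
        return Err(ProverClientError::InvalidProofData(
            "BigInt does not fit in 32 bytes".to_string(),
        ));
    }
    let mut out = [0u8; 32];
    out[32 - bytes.len()..].copy_from_slice(&bytes); // big-endian, left-padded
    Ok(out)
}
```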
In `@prover/client/src/proof_client.rs`:
- Around line 671-677: The JSON serializer for BatchAddressAppendInputs is
omitting the low_element_next_indices field, breaking prover validation; update
the batch_address_append JSON serialization (the module that produces the
payload consumed by generate_batch_address_append_proof) to include
low_element_next_indices alongside low_element_values/low_next_values, serialize
it with the exact key/name the prover circuit expects, and ensure the values are
converted to JSON (e.g., a numeric array of indices) so to_json(&inputs) used in
generate_batch_address_append_proof contains both the indices and values
required by the circuit.
In `@prover/client/src/proof_types/batch_append/proof_inputs.rs`:
- Around line 115-130: The function get_batch_append_inputs currently zips
old_leaves, leaves, and merkle_proofs and can silently process a partial batch
if any of those vectors are shorter than batch_size; add explicit length checks
before the zipped loop: verify old_leaves.len() == batch_size as usize,
leaves.len() == batch_size as usize, and merkle_proofs.len() == batch_size as
usize (or otherwise reject mismatches), and return an appropriate
ProverClientError when any mismatch is detected so the function fails fast and
never computes roots/inputs from a partial batch.
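The fail-fast length validation described above can be sketched as a standalone helper. The name `check_batch_lengths` and the `String` error are illustrative; the real function would return a `ProverClientError`.

```rust
// Hypothetical validation helper mirroring the check the comment asks for in
// get_batch_append_inputs; the name and error type are illustrative.
pub fn check_batch_lengths(
    batch_size: usize,
    old_leaves: &[[u8; 32]],
    leaves: &[[u8; 32]],
    merkle_proofs: &[Vec<[u8; 32]>],
) -> Result<(), String> {
    for (name, len) in [
        ("old_leaves", old_leaves.len()),
        ("leaves", leaves.len()),
        ("merkle_proofs", merkle_proofs.len()),
    ] {
        if len != batch_size {
            // Fail fast instead of letting zip() silently truncate the batch.
            return Err(format!("{name}: expected {batch_size} items, got {len}"));
        }
    }
    Ok(())
}
```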
In `@sdk-libs/client/src/interface/initialize_config.rs`:
- Around line 20-27: The build path is serializing the wrong payload and
hardcodes the discriminator: replace the local struct
InitializeCompressionConfigAnchorData with the shared
InitializeLightConfigParams so InitializeRentFreeConfig::build() emits the same
wire format as the existing initialize_config() function, and remove the manual
discriminator bytes in favor of deriving and using anchor_lang::Discriminator
(call the trait implementation to obtain the discriminator) rather than
hardcoding values; update any serialization calls in
InitializeRentFreeConfig::build() to use InitializeLightConfigParams and the
Discriminator API so both instruction creators produce identical payloads.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 2502-2535: The HTTP loop currently only decrements retries on
transport errors and uses unwrap() on response.text() and
deserialize_gnark_proof_json(), so server 4xx/5xx and parse failures can spin or
panic; update the client.post handling (the block around
client.post(...).send().await and the success branch that calls response.text(),
deserialize_gnark_proof_json, proof_from_json_struct, and compress_proof) to:
treat non-success status codes as IndexerError::CustomError (include status and
body if available), convert any failures from response.text(),
deserialize_gnark_proof_json, proof_from_json_struct, or compress_proof into
IndexerError::CustomError rather than unwrap/panic, and ensure each of those
error branches decrements retries and follows the same retry/sleep logic so the
function eventually returns Err(IndexerError::CustomError("Failed to get proof
from server")) after exhausting retries; reference functions/values:
client.post, response.text(), deserialize_gnark_proof_json,
proof_from_json_struct, compress_proof, IndexerError, and
ValidityProofWithContext.
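The retry discipline asked for in the test-indexer comment can be sketched independently of reqwest. Here `fetch` is a hypothetical stand-in for the whole POST → status check → body parse pipeline, so every failure kind (transport error, non-2xx status, parse failure) flows through the same retry path:

```rust
// `fetch` stands in for client.post(...).send() + status check + body parsing;
// any failure (transport, non-2xx status, JSON/proof parse) is an Err(String).
pub fn get_proof_with_retries<F>(mut fetch: F, mut retries: u32) -> Result<String, String>
where
    F: FnMut() -> Result<String, String>,
{
    loop {
        match fetch() {
            Ok(proof_json) => return Ok(proof_json),
            Err(_cause) => {
                if retries == 0 {
                    // Mirrors IndexerError::CustomError after exhausting retries.
                    return Err("Failed to get proof from server".to_string());
                }
                retries -= 1;
                // Real code would log `_cause` and sleep before the next attempt.
            }
        }
    }
}
```

The key property is that no branch can spin forever or panic: every error decrements the same counter.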
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b8aeca30-1184-4e0f-84c8-f3cb3237bc1f
⛔ Files ignored due to path filters (63)
forester-utils/src/account_zero_copy.rs, forester-utils/src/address_merkle_tree_config.rs, forester-utils/src/address_staging_tree.rs, forester-utils/src/forester_epoch.rs, forester-utils/src/instructions/compress_and_close_mint.rs, forester-utils/src/rate_limiter.rs, forester-utils/src/registry.rs, program-tests/account-compression-test/tests/address_merkle_tree_tests.rs, program-tests/account-compression-test/tests/merkle_tree_tests.rs, program-tests/batched-merkle-tree-test/tests/merkle_tree.rs, program-tests/compressed-token-test/tests/freeze/functional.rs, program-tests/compressed-token-test/tests/v1.rs, program-tests/system-cpi-v2-test/tests/event.rs, program-tests/system-cpi-v2-test/tests/invoke_cpi_with_read_only.rs, program-tests/utils/src/account_zero_copy.rs, program-tests/utils/src/actions/legacy/instructions/transfer2.rs, program-tests/utils/src/address_tree_rollover.rs, program-tests/utils/src/assert_compressed_tx.rs, program-tests/utils/src/assert_merkle_tree.rs, program-tests/utils/src/assert_queue.rs, program-tests/utils/src/batched_address_tree.rs, program-tests/utils/src/e2e_test_env.rs, program-tests/utils/src/lib.rs, program-tests/utils/src/mock_batched_forester.rs, program-tests/utils/src/state_tree_rollover.rs, program-tests/utils/src/test_batch_forester.rs, program-tests/utils/src/test_forester.rs, sdk-tests/anchor-manual-test/tests/shared.rs, sdk-tests/anchor-semi-manual-test/tests/shared/mod.rs, sdk-tests/anchor-semi-manual-test/tests/stress_test.rs, sdk-tests/client-test/tests/light_client.rs, sdk-tests/csdk-anchor-full-derived-test/tests/amm_stress_test.rs, sdk-tests/csdk-anchor-full-derived-test/tests/amm_test.rs, sdk-tests/csdk-anchor-full-derived-test/tests/basic_test.rs, sdk-tests/csdk-anchor-full-derived-test/tests/d10_ata_idempotent_test.rs, sdk-tests/csdk-anchor-full-derived-test/tests/d10_token_accounts_test.rs, sdk-tests/csdk-anchor-full-derived-test/tests/d11_zero_copy_test.rs, sdk-tests/csdk-anchor-full-derived-test/tests/failing_tests.rs, sdk-tests/csdk-anchor-full-derived-test/tests/integration_tests.rs, sdk-tests/csdk-anchor-full-derived-test/tests/shared.rs, sdk-tests/pinocchio-light-program-test/tests/shared/mod.rs, sdk-tests/pinocchio-light-program-test/tests/stress_test.rs, sdk-tests/pinocchio-manual-test/tests/shared.rs, sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs, sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs, sdk-tests/sdk-native-test/tests/test.rs, sdk-tests/sdk-pinocchio-v1-test/tests/test.rs, sdk-tests/sdk-pinocchio-v2-test/tests/test.rs, sdk-tests/sdk-token-test/tests/ctoken_pda.rs, sdk-tests/sdk-token-test/tests/decompress_full_cpi.rs, sdk-tests/sdk-token-test/tests/pda_ctoken.rs, sdk-tests/sdk-token-test/tests/test.rs, sdk-tests/sdk-token-test/tests/test_4_invocations.rs, sdk-tests/sdk-token-test/tests/test_4_transfer2.rs, sdk-tests/sdk-token-test/tests/test_deposit.rs, sdk-tests/sdk-v1-native-test/tests/test.rs, sdk-tests/single-account-loader-test/tests/test.rs, sdk-tests/single-ata-test/tests/test.rs, sdk-tests/single-mint-test/tests/test.rs, sdk-tests/single-pda-test/tests/test.rs, sdk-tests/single-token-test/tests/test.rs, sparse-merkle-tree/src/indexed_changelog.rs, sparse-merkle-tree/tests/indexed_changelog.rs (each excluded by none and included by none)
📒 Files selected for processing (66)
forester/src/config.rs, forester/src/epoch_manager.rs, forester/src/forester_status.rs, forester/src/main.rs, forester/src/metrics.rs, forester/src/priority_fee.rs, forester/src/processor/v2/helpers.rs, forester/src/processor/v2/proof_worker.rs, forester/src/processor/v2/strategy/address.rs, forester/src/queue_helpers.rs, forester/tests/e2e_test.rs, forester/tests/legacy/batched_state_async_indexer_test.rs, forester/tests/legacy/test_utils.rs, forester/tests/test_batch_append_spent.rs, forester/tests/test_compressible_pda.rs, forester/tests/test_indexer_interface.rs, forester/tests/test_utils.rs, prover/client/src/errors.rs, prover/client/src/helpers.rs, prover/client/src/proof.rs, prover/client/src/proof_client.rs, prover/client/src/proof_types/batch_address_append/json.rs, prover/client/src/proof_types/batch_address_append/proof_inputs.rs, prover/client/src/proof_types/batch_append/json.rs, prover/client/src/proof_types/batch_append/proof_inputs.rs, prover/client/src/proof_types/batch_update/json.rs, prover/client/src/proof_types/batch_update/proof_inputs.rs, prover/client/src/proof_types/combined/v1/json.rs, prover/client/src/proof_types/combined/v2/json.rs, prover/client/src/proof_types/combined/v2/proof_inputs.rs, prover/client/src/proof_types/inclusion/v1/json.rs, prover/client/src/proof_types/inclusion/v1/proof_inputs.rs, prover/client/src/proof_types/inclusion/v2/json.rs, prover/client/src/proof_types/inclusion/v2/proof_inputs.rs, prover/client/src/proof_types/non_inclusion/v1/json.rs, prover/client/src/proof_types/non_inclusion/v1/proof_inputs.rs, prover/client/src/proof_types/non_inclusion/v2/json.rs, prover/client/src/proof_types/non_inclusion/v2/proof_inputs.rs, prover/client/src/prover.rs, prover/client/tests/batch_address_append.rs, prover/client/tests/batch_append.rs, prover/client/tests/batch_update.rs, prover/client/tests/combined.rs, prover/client/tests/inclusion.rs, prover/client/tests/init_merkle_tree.rs, prover/client/tests/non_inclusion.rs, sdk-libs/client/src/fee.rs, sdk-libs/client/src/indexer/error.rs, sdk-libs/client/src/indexer/options.rs, sdk-libs/client/src/indexer/photon_indexer.rs, sdk-libs/client/src/indexer/types/proof.rs, sdk-libs/client/src/indexer/types/queue.rs, sdk-libs/client/src/indexer/types/token.rs, sdk-libs/client/src/interface/initialize_config.rs, sdk-libs/client/src/interface/instructions.rs, sdk-libs/client/src/interface/load_accounts.rs, sdk-libs/client/src/interface/mod.rs, sdk-libs/client/src/interface/pack.rs, sdk-libs/client/src/interface/serialize.rs, sdk-libs/client/src/local_test_validator.rs, sdk-libs/client/src/rpc/client.rs, sdk-libs/client/src/utils.rs, sdk-libs/macros/src/light_pdas/seeds/extract.rs, sdk-libs/program-test/src/indexer/test_indexer.rs, sdk-libs/program-test/src/program_test/compressible_setup.rs, sdk-libs/program-test/src/program_test/light_program_test.rs
forester/src/config.rs
Outdated
```rust
valid
    .into_iter()
    .collect::<std::result::Result<Vec<_>, _>>()
    .map_err(|_| ConfigError::InvalidArguments {
        field: "tree_ids",
        invalid_values: vec!["failed to parse tree_ids".to_string()],
    })?
```
🧹 Nitpick | 🔵 Trivial
This fallback parse error is unreachable after the partition() above.
valid can only contain Ok(Pubkey) values once the invalid branch has returned, so the generic "failed to parse tree_ids" path never executes. I'd either unwrap the partitioned values directly or collapse the parse into a single pass so the error surface stays aligned with the real invalid inputs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@forester/src/config.rs` around lines 341 - 347, The current use of
partition() makes the Err branch already returned, so the map_err fallback on
valid.into_iter().collect(...) is unreachable and gives a misleading generic
error; change the logic in the tree ID parsing flow (the variables/blocks using
partition(), valid, invalid and the ConfigError::InvalidArguments with field
"tree_ids") to either directly unwrap/collect the already-validated `valid`
Vec<Pubkey> (since `invalid` was handled) or perform a single-pass parse that
collects invalid inputs and returns ConfigError::InvalidArguments with the
actual invalid_values; update the code around the partition() usage to return
the real invalid strings instead of the generic "failed to parse tree_ids".
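The single-pass option suggested above can be sketched with `u32::from_str` standing in for `Pubkey::from_str` so the snippet stays self-contained; `parse_tree_ids` is a hypothetical name.

```rust
// u32::from_str stands in for Pubkey::from_str; the Err payload mirrors the
// invalid_values field of ConfigError::InvalidArguments.
pub fn parse_tree_ids(raw: &[&str]) -> Result<Vec<u32>, Vec<String>> {
    let mut valid = Vec::new();
    let mut invalid = Vec::new();
    for s in raw {
        match s.parse::<u32>() {
            Ok(id) => valid.push(id),
            Err(_) => invalid.push((*s).to_string()), // keep the real bad input
        }
    }
    if invalid.is_empty() {
        Ok(valid)
    } else {
        Err(invalid) // caller builds InvalidArguments from the actual values
    }
}
```

One pass means there is no unreachable second error path, and the error always names the genuinely invalid inputs.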
```rust
let inputs_json = match job.inputs.to_json(&job.tree_id, job.seq) {
    Ok(inputs_json) => inputs_json,
    Err(error) => {
        let _ = job
            .result_tx
            .send(ProofJobResult {
                seq: job.seq,
                result: Err(format!("Failed to serialize proof inputs: {}", error)),
                old_root: [0u8; 32],
                new_root: [0u8; 32],
                proof_duration_ms: 0,
                round_trip_ms: 0,
                submitted_at: std::time::Instant::now(),
            })
            .await;
        return;
    }
};
```
Use one timestamp/logging path for serialization failures.
Line 209 stamps submitted_at with Instant::now(), while Line 306 reuses round_trip_start for the same “serialization failed before submit” outcome. That makes latency metrics path-dependent, and these branches also go silent if result_tx is closed. Factor this into one helper with a single timestamp convention and an error! before the send.
Suggested direction
```diff
+async fn send_serialization_failure(
+    result_tx: &mpsc::Sender<ProofJobResult>,
+    seq: u64,
+    error: &ProverClientError,
+    submitted_at: std::time::Instant,
+    round_trip_ms: u64,
+) {
+    error!("Failed to serialize proof inputs for seq={}: {}", seq, error);
+    let _ = result_tx
+        .send(ProofJobResult {
+            seq,
+            result: Err(format!("Failed to serialize proof inputs: {}", error)),
+            old_root: [0u8; 32],
+            new_root: [0u8; 32],
+            proof_duration_ms: 0,
+            round_trip_ms,
+            submitted_at,
+        })
+        .await;
+}
```

Also applies to: 296-311
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@forester/src/processor/v2/proof_worker.rs` around lines 197 - 214, The
serialization-failure path uses two different timestamps (submitted_at =
Instant::now() here and round_trip_start elsewhere) and silently ignores closed
result_tx; consolidate this by adding a helper (e.g., send_error_result or
report_serialization_failure) used by both the job.inputs.to_json error branch
and the other serialization-before-submit branch that: captures one
Instant::now() into a single variable (round_trip_start or submitted_at) at the
failure site, logs the error with error! including the error details, constructs
the ProofJobResult with that single timestamp and consistent
round_trip_ms/proof_duration_ms values, and attempts to send via job.result_tx
while handling a closed channel by logging the send failure; update both
locations (the error branch around job.inputs.to_json and the other branch that
currently reuses round_trip_start) to call this helper so latency metrics and
logging are consistent.
```rust
.build()
.unwrap();
```
🧹 Nitpick | 🔵 Trivial
Prefer expect(...) over bare unwrap() in these setup builders.
These are high-churn setup steps, and the rest of the file already uses descriptive expects. Keeping the message here makes it much faster to tell whether InitializeRentFreeConfig::build() failed or the later transaction path broke.
Minimal tweak
```diff
-    .build()
-    .unwrap();
+    .build()
+    .expect("InitializeRentFreeConfig::build should succeed");
```

Also applies to: 508-509, 750-751
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@forester/tests/test_compressible_pda.rs` around lines 312 - 313, Replace the
bare .unwrap() calls on builder results with .expect(...) that includes a
descriptive message; specifically change the .build().unwrap() invocations that
call InitializeRentFreeConfig::build() (and the two other similar setup builder
calls in this test) to .build().expect("InitializeRentFreeConfig::build()
failed") or an equivalent message that names the failing builder so failures in
setup are clear. Ensure each replaced call uses a distinct message if helpful to
identify which builder failed.
Force-pushed: fbf48e5 → 46b80e9 → 786422b → 2357022 → 4058e37 → 82dc761
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
forester/src/epoch_manager.rs (1)

576-625: ⚠️ Potential issue | 🔴 Critical

Don't fan out epochs concurrently while V2 processors are still cached per tree.

`epoch_tasks` now runs previous/current epochs at the same time, but `get_or_create_state_processor()` / `get_or_create_address_processor()` still reuse one mutable `QueueProcessor` per tree and retag it in place with `update_epoch()` on reuse. That lets epoch N+1 rewrite `context.epoch`, `current_root`, `seq`, `cached_state`, and the proof cache while epoch N is still using the same processor, so later V2 jobs can be built against the wrong epoch/root context. Either keep epoch processing serial here or key the processor caches by `(tree, epoch)`.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 576 - 625, The current loop fans out epochs via epoch_tasks (FuturesUnordered) while get_or_create_state_processor() / get_or_create_address_processor() reuse and retag a single mutable QueueProcessor via update_epoch(), causing concurrent epochs to corrupt shared processor context; fix by either (A) making epoch processing serial: remove FuturesUnordered concurrency and await process_epoch(epoch) directly in the select branch so each epoch completes before the next starts (affecting epoch_tasks/epoch_tasks.push usage and the async move closure around process_epoch), or (B) make the processor caches safe for concurrent epochs by keying processors by (tree, epoch) instead of reusing a single instance: change get_or_create_state_processor() and get_or_create_address_processor() to accept epoch (or return a per-epoch cloned QueueProcessor) and stop calling update_epoch() to retag a shared processor; choose one approach and apply corresponding changes to process_epoch, update_epoch, and the processor cache logic (QueueProcessor, get_or_create_*, get_or_create_*) so no two concurrent epoch tasks mutate the same processor state.

prover/client/src/proof_types/batch_update/proof_inputs.rs (1)

183-190: ⚠️ Potential issue | 🟠 Major

Don't leave a panic in the `Result` path.

`merkle_proof.try_into().unwrap()` still aborts on malformed proofs even though this helper now returns `Result`. Please turn that conversion failure into a `ProverClientError` so bad proof data is reported instead of crashing the caller.

🔧 Suggested fix

```diff
-        let merkle_proof_array = merkle_proof.try_into().unwrap();
+        let proof_len = merkle_proof.len();
+        let merkle_proof_array: [[u8; 32]; HEIGHT] = merkle_proof.try_into().map_err(|_| {
+            ProverClientError::GenericError(format!(
+                "Invalid merkle proof length: got {}, expected {}",
+                proof_len, HEIGHT
+            ))
+        })?;
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof_types/batch_update/proof_inputs.rs` around lines 183 - 190, The code currently uses merkle_proof.try_into().unwrap(), which can panic on malformed proofs; change this to propagate a ProverClientError instead of unwrapping: attempt the conversion (merkle_proof.try_into()) and map the Err case into an appropriate ProverClientError (with context like "invalid merkle proof" and the original error) before assigning merkle_proof_array, so subsequent calls (e.g., compute_root_from_merkle_proof(nullifier, &merkle_proof_array, *index as usize)) run only when the conversion succeeded.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@forester/src/epoch_manager.rs`:
- Around line 680-681: The cleanup call to
self.shutdown_new_tree_workers(NEW_TREE_WORKER_SHUTDOWN_TIMEOUT) in
EpochManager::run() is never executed when the outer run_service() path
cancels/drops epoch_manager.run(), leaving NewTreeWorker threads detached; fix
by ensuring shutdown_new_tree_workers is invoked on service shutdown as well:
either call EpochManager::shutdown_new_tree_workers(...) from the run_service()
shutdown branch before dropping epoch_manager, or refactor EpochManager::run()
so it owns and awaits shutdown completion (i.e., make run() return only after
shutdown_new_tree_workers completes) so NewTreeWorker handles are always joined;
locate use sites around run_service(), EpochManager::run(),
shutdown_new_tree_workers, NEW_TREE_WORKER_SHUTDOWN_TIMEOUT, and NewTreeWorker
to implement the chosen approach.
- Around line 914-994: The new-tree worker spawned in process_epoch is detached
and registered via register_new_tree_worker(NewTreeWorker { ... }) but not
awaited before the epoch's work reporting and cleanup, so ensure new-tree
processing is included in the epoch's join set or explicitly awaited: modify
process_epoch to (a) add the spawned thread_handle/completion future from the
NewTreeWorker into the same join/await collection used for other per-epoch tasks
(or convert the oneshot completion_rx into a future and push it onto the epoch
join set), or (b) before calling update_metrics_and_counts() / per-epoch cleanup
logic, await the NewTreeWorker completion (completion_rx or join handle) for any
registered new trees; reference the spawned closure that calls process_queue and
the register_new_tree_worker/NewTreeWorker entries to locate where to add the
join/await.
In `@forester/src/forester_status.rs`:
- Around line 672-689: Two identical V1 queue parsing blocks (around the uses of
queue_account) should be consolidated: extract a helper function (e.g.,
parse_v1_queue_stats or parse_queue_account_v1) that accepts the account bytes
(or &Account) and a log label/string, calls unsafe
parse_hash_set_from_bytes::<QueueAccount>(&acc.data), computes len by filtering
for cell.sequence_number.is_none(), gets cap via hs.get_capacity(), and returns
(Option<u64>, Option<u64>); on Err call warn!(?error, "{label}: Failed to parse
StateV1 queue hash set") or similar. Replace both duplicated match chains with
calls to that helper and unwrap_or((None,None)) as before.
In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs`:
- Around line 202-203: Add an explicit guard that rejects zkp_batch_size == 0
before using it (before the line that sets batch_len = zkp_batch_size) so we
don't produce a no-op payload with new_root left zeroed; e.g., check
zkp_batch_size == 0 and return an Err (or panic) with a clear message like
"zero-sized ZKP batch not allowed" to fail fast and ensure callers handle the
invalid batch size.
- Around line 403-404: The IndexedChangelogEntry's changelog_index is set
incorrectly using staged_indexed_changelog.len(); change both assignments (where
IndexedChangelogEntry is constructed) to refer to the raw changelog position by
using staged_changelog.len() - 1 immediately after pushing the new entry into
staged_changelog (not staged_indexed_changelog.len()); update the two
occurrences that currently reference staged_indexed_changelog.len() so
changelog_index = staged_changelog.len() - 1.
In `@prover/client/src/prover.rs`:
- Around line 175-202: The local binding spawn_result is redundant because the
async block returns unit; remove the spawn_result variable and directly await
the async block (or simply execute the async block without assigning) that
spawns the process (Command::new(&prover_path).arg("start-prover").spawn()),
calls wait_for_prover_health(..., &mut child).await, and then calls
monitor_prover_child(child) or kills/waits on the child on error; after that
keep the IS_LOADING.store(false, Ordering::Release) call as-is. Locate the block
around spawn_result, Command::new, wait_for_prover_health, monitor_prover_child,
and IS_LOADING to make the change.
In `@sdk-libs/client/src/indexer/photon_indexer.rs`:
- Around line 1145-1162: The second validation that compares proof_len to
expected_siblings is redundant because expected_total is defined as
STATE_MERKLE_TREE_CANOPY_DEPTH + expected_siblings and the prior check ensures
x.proof.len() == expected_total; remove the dead block (the if that returns
IndexerError::InvalidParameters comparing proof_len and expected_siblings) and
retain the initial length check using expected_total, then compute proof_len =
x.proof.len() - STATE_MERKLE_TREE_CANOPY_DEPTH without re-checking equality.
- Around line 1693-1694: tree_next_insertion_index is incorrectly initialized
from aq.start_index; change it to use the tree's actual next insertion index
instead. Locate the AddressQueueWithContext construction where
tree_next_insertion_index is set (currently using aq.start_index) and replace
that with the tree's rightmost/next index obtained from the tree account or
staging tree (call the tree's next_index() or query the tree account state like
the test_indexer does). If the Photon API does not return the tree's next_index,
update the code path to query the tree account state (or extend the API to
include next_index) and use that value when initializing
tree_next_insertion_index so staging trees reflect the real insertion point.
In `@sdk-libs/client/src/interface/load_accounts.rs`:
- Around line 163-165: The ATA enumeration resets per chunk causing
MissingAtaContext to report wrong indices; update build_ata_load to accept a
start_index: usize and use it when creating AtaContext (e.g., in
AtaContext::from_interface calls add start_index + i) so contexts reflect their
absolute position, then when iterating cold_atas.chunks(...) compute the chunk's
start_index (running offset or chunk_index * MAX_ATAS_PER_IX) and pass it into
build_ata_load; apply the same change to the other call site referenced around
lines 401-405 so all ATA load invocations supply the correct flattened index.
- Around line 73-75: The concurrent proof pipelines use join! which waits for
all branches even if one fails; replace the join! at the proof orchestration
site with try_join! so the first proof-fetch error cancels the other futures and
returns immediately, and ensure the streams that call
.buffered(PROOF_FETCH_CONCURRENCY) (the PDA, ATA and mint proof helpers) are
compatible with cancellation; alternatively, if you intended a global cap
instead of per-stream caps, implement a shared semaphore/permit budget used by
the helpers and keep the buffered(…) limits aligned with that shared budget.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 98-111: build_compressed_proof currently assumes body is raw gnark
proof JSON but needs to handle the timed envelope { "proof": ...,
"proof_duration_ms": ... } too; modify build_compressed_proof (or create a
shared helper used by _get_validity_proof_v1_implementation) to first check if
the incoming body is an envelope (e.g., parse into a minimal map/object and if
it contains a "proof" key extract that value), then pass the extracted inner
proof JSON to deserialize_gnark_proof_json; keep the existing error mapping to
IndexerError and then continue with proof_from_json_struct and compress_proof as
before so both raw and wrapped responses succeed.
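One mechanical piece of the comments above, the ATA chunk indexing fix for load_accounts.rs, can be sketched with plain slices. `MAX_ATAS_PER_IX = 4` and `absolute_ata_indices` are illustrative assumptions, not the crate's real names or values.

```rust
// Assumed chunk size; the real MAX_ATAS_PER_IX lives in load_accounts.rs.
pub const MAX_ATAS_PER_IX: usize = 4;

// Returns (absolute_index, ata) pairs so per-chunk contexts keep their
// position in the flattened cold_atas list instead of restarting at 0.
pub fn absolute_ata_indices(cold_atas: &[&str]) -> Vec<(usize, String)> {
    let mut out = Vec::new();
    for (chunk_idx, chunk) in cold_atas.chunks(MAX_ATAS_PER_IX).enumerate() {
        let start_index = chunk_idx * MAX_ATAS_PER_IX; // offset of this chunk
        for (i, ata) in chunk.iter().enumerate() {
            out.push((start_index + i, (*ata).to_string()));
        }
    }
    out
}
```

Passing `start_index` into the per-chunk builder is what keeps a `MissingAtaContext` error pointing at the right entry in the original list.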
---
Outside diff comments:
In `@forester/src/epoch_manager.rs`:
- Around line 576-625: The current loop fans out epochs via epoch_tasks
(FuturesUnordered) while get_or_create_state_processor() /
get_or_create_address_processor() reuse and retag a single mutable
QueueProcessor via update_epoch(), causing concurrent epochs to corrupt shared
processor context; fix by either (A) making epoch processing serial: remove
FuturesUnordered concurrency and await process_epoch(epoch) directly in the
select branch so each epoch completes before the next starts (affecting
epoch_tasks/epoch_tasks.push usage and the async move closure around
process_epoch), or (B) make the processor caches safe for concurrent epochs by
keying processors by (tree, epoch) instead of reusing a single instance: change
get_or_create_state_processor() and get_or_create_address_processor() to accept
epoch (or return a per-epoch cloned QueueProcessor) and stop calling
update_epoch() to retag a shared processor; choose one approach and apply
corresponding changes to process_epoch, update_epoch, and the processor cache
logic (QueueProcessor, get_or_create_*, get_or_create_*) so no two concurrent
epoch tasks mutate the same processor state.
In `@prover/client/src/proof_types/batch_update/proof_inputs.rs`:
- Around line 183-190: The code currently uses merkle_proof.try_into().unwrap(),
which can panic on malformed proofs; change this to propagate a
ProverClientError instead of unwrapping: attempt the conversion
(merkle_proof.try_into()) and map the Err case into an appropriate
ProverClientError (with context like "invalid merkle proof" and the original
error) before assigning merkle_proof_array, so subsequent calls (e.g.,
compute_root_from_merkle_proof(nullifier, &merkle_proof_array, *index as usize))
run only when the conversion succeeded.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: eab8c888-7dc8-4fd1-a13f-aa56687cb167
⛔ Files ignored due to path filters (6)
Cargo.lock is excluded by !**/*.lock and included by none; program-tests/utils/src/actions/legacy/instructions/transfer2.rs, program-tests/utils/src/e2e_test_env.rs, program-tests/utils/src/mock_batched_forester.rs, program-tests/utils/src/test_batch_forester.rs, sdk-tests/csdk-anchor-full-derived-test/tests/basic_test.rs are each excluded by none and included by none
📒 Files selected for processing (38)
forester/src/epoch_manager.rs, forester/src/forester_status.rs, forester/src/metrics.rs, forester/src/priority_fee.rs, forester/src/processor/v2/helpers.rs, forester/src/processor/v2/processor.rs, forester/src/processor/v2/proof_worker.rs, forester/src/processor/v2/strategy/address.rs, forester/tests/e2e_test.rs, forester/tests/legacy/batched_state_async_indexer_test.rs, forester/tests/legacy/test_utils.rs, forester/tests/test_batch_append_spent.rs, forester/tests/test_indexer_interface.rs, forester/tests/test_utils.rs, prover/client/src/constants.rs, prover/client/src/errors.rs, prover/client/src/helpers.rs, prover/client/src/proof.rs, prover/client/src/proof_client.rs, prover/client/src/proof_types/batch_address_append/proof_inputs.rs, prover/client/src/proof_types/batch_append/proof_inputs.rs, prover/client/src/proof_types/batch_update/proof_inputs.rs, prover/client/src/prover.rs, prover/client/tests/batch_address_append.rs, sdk-libs/client/src/indexer/photon_indexer.rs, sdk-libs/client/src/indexer/types/proof.rs, sdk-libs/client/src/indexer/types/queue.rs, sdk-libs/client/src/interface/initialize_config.rs, sdk-libs/client/src/interface/instructions.rs, sdk-libs/client/src/interface/load_accounts.rs, sdk-libs/client/src/interface/pack.rs, sdk-libs/client/src/local_test_validator.rs, sdk-libs/client/src/utils.rs, sdk-libs/program-test/src/indexer/test_indexer.rs, sdk-libs/sdk-types/src/interface/account/token_seeds.rs, sdk-libs/sdk-types/src/interface/program/decompression/pda.rs, sdk-libs/sdk-types/src/interface/program/decompression/processor.rs, sdk-libs/sdk-types/src/interface/program/decompression/token.rs
```diff
 let (queue_len, queue_cap) = queue_account
     .map(|acc| {
-        unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) }
-            .ok()
-            .map(|hs| {
+        match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } {
+            Ok(hs) => {
                 let len = hs
                     .iter()
                     .filter(|(_, cell)| cell.sequence_number.is_none())
                     .count() as u64;
                 let cap = hs.get_capacity() as u64;
-                (len, cap)
-            })
-            .unwrap_or((0, 0))
+                (Some(len), Some(cap))
+            }
+            Err(error) => {
+                warn!(?error, "Failed to parse StateV1 queue hash set");
+                (None, None)
+            }
+        }
     })
-    .map(|(l, c)| (Some(l), Some(c)))
     .unwrap_or((None, None));
```
🧹 Nitpick | 🔵 Trivial
Extract duplicated V1 queue parsing logic into one helper.
The two blocks are functionally identical except the log label. Consolidating them will reduce drift risk and simplify future parsing changes.
♻️ Proposed refactor
+fn parse_v1_queue_stats(acc: &Account, tree_kind: &str) -> (Option<u64>, Option<u64>) {
+ match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } {
+ Ok(hs) => {
+ let len = hs
+ .iter()
+ .filter(|(_, cell)| cell.sequence_number.is_none())
+ .count() as u64;
+ let cap = hs.get_capacity() as u64;
+ (Some(len), Some(cap))
+ }
+ Err(error) => {
+ warn!(?error, %tree_kind, "Failed to parse V1 queue hash set");
+ (None, None)
+ }
+ }
+}
...
- let (queue_len, queue_cap) = queue_account
- .map(|acc| {
- match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } {
- Ok(hs) => {
- let len = hs
- .iter()
- .filter(|(_, cell)| cell.sequence_number.is_none())
- .count() as u64;
- let cap = hs.get_capacity() as u64;
- (Some(len), Some(cap))
- }
- Err(error) => {
- warn!(?error, "Failed to parse StateV1 queue hash set");
- (None, None)
- }
- }
- })
- .unwrap_or((None, None));
+ let (queue_len, queue_cap) = queue_account
+ .as_ref()
+ .map(|acc| parse_v1_queue_stats(acc, "StateV1"))
+ .unwrap_or((None, None));
...
- let (queue_len, queue_cap) = queue_account
- .map(|acc| {
- match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } {
- Ok(hs) => {
- let len = hs
- .iter()
- .filter(|(_, cell)| cell.sequence_number.is_none())
- .count() as u64;
- let cap = hs.get_capacity() as u64;
- (Some(len), Some(cap))
- }
- Err(error) => {
- warn!(?error, "Failed to parse AddressV1 queue hash set");
- (None, None)
- }
- }
- })
- .unwrap_or((None, None));
+ let (queue_len, queue_cap) = queue_account
+ .as_ref()
+ .map(|acc| parse_v1_queue_stats(acc, "AddressV1"))
+        .unwrap_or((None, None));

Also applies to: 729-746
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@forester/src/forester_status.rs` around lines 672 - 689, Two identical V1
queue parsing blocks (around the uses of queue_account) should be consolidated:
extract a helper function (e.g., parse_v1_queue_stats or parse_queue_account_v1)
that accepts the account bytes (or &Account) and a log label/string, calls
unsafe parse_hash_set_from_bytes::<QueueAccount>(&acc.data), computes len by
filtering for cell.sequence_number.is_none(), gets cap via hs.get_capacity(),
and returns (Option<u64>, Option<u64>); on Err call warn!(?error, "{label}:
Failed to parse StateV1 queue hash set") or similar. Replace both duplicated
match chains with calls to that helper and unwrap_or((None,None)) as before.
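The consolidation pattern can be shown in miniature. `parse_v1_queue_stats` below is a toy stand-in (a two-byte "parser" replaces the real `parse_hash_set_from_bytes::<QueueAccount>` call), but the shape, one helper with a label parameter for the log message, is the one the refactor proposes:

```rust
// Hypothetical sketch: one helper replaces two near-identical parse blocks
// that differed only in the log label. The "parser" here is a stand-in:
// first byte is the live-cell count, second byte the capacity.
fn parse_v1_queue_stats(data: &[u8], tree_kind: &str) -> (Option<u64>, Option<u64>) {
    match (data.first(), data.get(1)) {
        (Some(&len), Some(&cap)) => (Some(len as u64), Some(cap as u64)),
        _ => {
            // Stand-in for warn!(?error, %tree_kind, ...).
            eprintln!("Failed to parse {tree_kind} queue hash set");
            (None, None)
        }
    }
}

fn main() {
    // Both former call sites collapse to calls differing only in the label.
    assert_eq!(parse_v1_queue_stats(&[3, 8], "StateV1"), (Some(3), Some(8)));
    assert_eq!(parse_v1_queue_stats(&[], "AddressV1"), (None, None));
}
```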
```rust
let batch_len = zkp_batch_size;
for (name, len) in [
```
Reject zero-sized batches explicitly.
A zkp_batch_size of 0 currently passes guards and can produce an invalid no-op payload (new_root stays zeroed). Add a hard check up front.
Suggested fix
pub fn get_batch_address_append_circuit_inputs<const HEIGHT: usize>(
@@
) -> Result<BatchAddressAppendInputs, ProverClientError> {
+ if zkp_batch_size == 0 {
+ return Err(ProverClientError::GenericError(
+ "zkp_batch_size must be greater than 0".to_string(),
+ ));
+ }
     let batch_len = zkp_batch_size;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs` around
lines 202 - 203, Add an explicit guard that rejects zkp_batch_size == 0 before
using it (before the line that sets batch_len = zkp_batch_size) so we don't
produce a no-op payload with new_root left zeroed; e.g., check zkp_batch_size ==
0 and return an Err (or panic) with a clear message like "zero-sized ZKP batch
not allowed" to fail fast and ensure callers handle the invalid batch size.
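A minimal sketch of the fail-fast guard; `circuit_inputs_len` and the plain `String` error are hypothetical stand-ins for the real constructor and `ProverClientError`:

```rust
// Reject the degenerate batch before any work happens, so callers get a
// clear error instead of a no-op payload with a zeroed new_root.
fn circuit_inputs_len(zkp_batch_size: usize) -> Result<usize, String> {
    if zkp_batch_size == 0 {
        return Err("zkp_batch_size must be greater than 0".to_string());
    }
    let batch_len = zkp_batch_size;
    Ok(batch_len)
}

fn main() {
    assert!(circuit_inputs_len(0).is_err()); // rejected up front
    assert_eq!(circuit_inputs_len(10), Ok(10));
}
```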
```rust
changelog_index: staged_indexed_changelog.len(), //change_log_index,
};
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Find IndexedChangelogEntry definition and changelog_index docs/usages:"
rg -n -C4 'struct IndexedChangelogEntry|changelog_index' --type rust
echo
echo "Find patch_indexed_changelogs implementation/signature:"
rg -n -C6 'fn patch_indexed_changelogs|patch_indexed_changelogs\(' --type rust

Repository: Lightprotocol/light-protocol
Length of output: 50385
🏁 Script executed:
# Get context around lines 403-404 and 508-509 in the file
head -520 prover/client/src/proof_types/batch_address_append/proof_inputs.rs | tail -130

Repository: Lightprotocol/light-protocol
Length of output: 5417
🏁 Script executed:
# Look at lines around 260-280 to understand initialization
sed -n '260,280p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs
# Look at the loop structure around line 265 to understand initial_changelog_len
sed -n '260,275p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs

Repository: Lightprotocol/light-protocol
Length of output: 1688
🏁 Script executed:
# Look at the full loop iteration including where changelog_index variable is used
sed -n '265,310p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs

Repository: Lightprotocol/light-protocol
Length of output: 1867
🏁 Script executed:
# Check what happens with patch_indexed_changelogs - does it consume the changelog_index variable?
# Look at sparse-merkle-tree/src/indexed_changelog.rs to see what happens to changelog_index param
sed -n '25,35p' sparse-merkle-tree/src/indexed_changelog.rs
# Now let's look at how the patched changelog_index is used after patching
sed -n '295,410p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs

Repository: Lightprotocol/light-protocol
Length of output: 5652
🏁 Script executed:
# Let's verify the exact state of staged_changelog when the IndexedChangelogEntry is created
# by looking at what happens between the push at line 370 and the creation at line 403
sed -n '365,405p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs

Repository: Lightprotocol/light-protocol
Length of output: 1600
🏁 Script executed:
# Verify the second occurrence at line 508 has the same pattern
sed -n '430,515p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs

Repository: Lightprotocol/light-protocol
Length of output: 3783
🏁 Script executed:
# Let's verify the test expectations one more time to confirm the semantics
# Look at test_forester.rs line 625-626 where it validates changelog_index
sed -n '620,630p' program-tests/utils/src/test_forester.rs

Repository: Lightprotocol/light-protocol
Length of output: 429
🏁 Script executed:
# Confirm initial_changelog_len and understand the context better
sed -n '250,270p' prover/client/src/proof_types/batch_address_append/proof_inputs.rs

Repository: Lightprotocol/light-protocol
Length of output: 796
Fix changelog_index to reference raw changelog position, not indexed changelog length.
The changelog_index field in IndexedChangelogEntry must point to the position in the raw ConcurrentMerkleTree changelog, not the indexed changelog. Both lines 403 and 508 incorrectly use staged_indexed_changelog.len(). After each entry is pushed to staged_changelog (lines 370 and 498), the changelog position is staged_changelog.len() - 1.
Apply to both locations
let low_element_changelog_entry = IndexedChangelogEntry {
element: new_low_element_raw,
proof: low_element_changelog_proof,
- changelog_index: staged_indexed_changelog.len(), //change_log_index,
+ changelog_index: staged_changelog.len() - 1,
};
let new_element_changelog_entry = IndexedChangelogEntry {
element: new_element_raw,
proof: merkle_proof_array,
- changelog_index: staged_indexed_changelog.len(),
+ changelog_index: staged_changelog.len() - 1,
 };

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-    changelog_index: staged_indexed_changelog.len(), //change_log_index,
-};
+let low_element_changelog_entry = IndexedChangelogEntry {
+    element: new_low_element_raw,
+    proof: low_element_changelog_proof,
+    changelog_index: staged_changelog.len() - 1,
+};
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@prover/client/src/proof_types/batch_address_append/proof_inputs.rs` around
lines 403 - 404, The IndexedChangelogEntry's changelog_index is set incorrectly
using staged_indexed_changelog.len(); change both assignments (where
IndexedChangelogEntry is constructed) to refer to the raw changelog position by
using staged_changelog.len() - 1 immediately after pushing the new entry into
staged_changelog (not staged_indexed_changelog.len()); update the two
occurrences that currently reference staged_indexed_changelog.len() so
changelog_index = staged_changelog.len() - 1.
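The indexing rule behind the fix can be restated in isolation (collection names are illustrative, not the repo's types): the entry just pushed lives at `len() - 1` of the vector it was pushed into, while the length of some other collection is coincidental at best.

```rust
// Index of the most recently pushed element of a slice.
fn last_pushed_index<T>(v: &[T]) -> usize {
    v.len() - 1
}

fn main() {
    let mut staged_changelog = vec!["existing"];
    let staged_indexed_changelog = vec!["a", "b", "c"]; // unrelated collection
    staged_changelog.push("low_element_entry");
    // The pushed entry's position in the raw changelog:
    assert_eq!(last_pushed_index(&staged_changelog), 1);
    // Using the other collection's length would have yielded 3, a wrong index.
    assert_ne!(last_pushed_index(&staged_changelog), staged_indexed_changelog.len());
}
```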
```diff
 let expected_siblings =
     STATE_MERKLE_TREE_HEIGHT - STATE_MERKLE_TREE_CANOPY_DEPTH;
 let expected_total = STATE_MERKLE_TREE_CANOPY_DEPTH + expected_siblings;
 if x.proof.len() != expected_total {
     return Err(IndexerError::InvalidParameters(format!(
-        "Merkle proof length ({}) is less than canopy depth ({})",
-        proof_vec.len(),
-        STATE_MERKLE_TREE_CANOPY_DEPTH,
+        "Merkle proof length ({}) does not match expected total proof length ({})",
+        x.proof.len(),
+        expected_total,
     )));
 }
 let proof_len = x.proof.len() - STATE_MERKLE_TREE_CANOPY_DEPTH;
 if proof_len != expected_siblings {
     return Err(IndexerError::InvalidParameters(format!(
         "Merkle proof sibling count ({}) does not match expected sibling count ({})",
         proof_len,
         expected_siblings,
     )));
 }
```
Redundant validation: second check can never fail after first check passes.
Let me walk through the math here:
- `expected_siblings = HEIGHT - CANOPY_DEPTH`
- `expected_total = CANOPY_DEPTH + expected_siblings = HEIGHT`
- First check ensures `proof.len() == expected_total`
- After passing: `proof_len = proof.len() - CANOPY_DEPTH = expected_total - CANOPY_DEPTH = expected_siblings`
The second check at lines 1156-1161 is dead code—it will always pass because the algebraic relationship guarantees proof_len == expected_siblings once the first check succeeds.
🧹 Suggested simplification
let expected_siblings =
STATE_MERKLE_TREE_HEIGHT - STATE_MERKLE_TREE_CANOPY_DEPTH;
-let expected_total = STATE_MERKLE_TREE_CANOPY_DEPTH + expected_siblings;
-if x.proof.len() != expected_total {
+if x.proof.len() != STATE_MERKLE_TREE_HEIGHT {
return Err(IndexerError::InvalidParameters(format!(
"Merkle proof length ({}) does not match expected total proof length ({})",
x.proof.len(),
- expected_total,
+ STATE_MERKLE_TREE_HEIGHT,
)));
}
-let proof_len = x.proof.len() - STATE_MERKLE_TREE_CANOPY_DEPTH;
-if proof_len != expected_siblings {
- return Err(IndexerError::InvalidParameters(format!(
- "Merkle proof sibling count ({}) does not match expected sibling count ({})",
- proof_len,
- expected_siblings,
- )));
-}
+let proof_len = expected_siblings;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/indexer/photon_indexer.rs` around lines 1145 - 1162, The
second validation that compares proof_len to expected_siblings is redundant
because expected_total is defined as STATE_MERKLE_TREE_CANOPY_DEPTH +
expected_siblings and the prior check ensures x.proof.len() == expected_total;
remove the dead block (the if that returns IndexerError::InvalidParameters
comparing proof_len and expected_siblings) and retain the initial length check
using expected_total, then compute proof_len = x.proof.len() -
STATE_MERKLE_TREE_CANOPY_DEPTH without re-checking equality.
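The algebra can be checked mechanically; `HEIGHT` and `CANOPY_DEPTH` below are placeholder values, not the repo's actual constants:

```rust
// Placeholder values; the real constants come from the tree configuration.
const HEIGHT: usize = 26;
const CANOPY_DEPTH: usize = 10;

// Sibling count implied by a proof length that already passed the
// `proof.len() == expected_total` check.
fn derived_sibling_count(proof_len_total: usize) -> usize {
    proof_len_total - CANOPY_DEPTH
}

fn main() {
    let expected_siblings = HEIGHT - CANOPY_DEPTH;
    let expected_total = CANOPY_DEPTH + expected_siblings; // algebraically == HEIGHT
    assert_eq!(expected_total, HEIGHT);
    // Once the first check forces proof.len() == expected_total, the second
    // comparison is fully determined and can never fail:
    assert_eq!(derived_sibling_count(expected_total), expected_siblings);
}
```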
```rust
start_index: aq.start_index,
tree_next_insertion_index: aq.start_index,
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check what fields the photon_api AddressQueue response type provides
rg -n "struct.*AddressQueue" --type rust -A 30 | head -80
# Also check if there's a tree_next_insertion_index or similar field in photon_api types
rg -n "next_insertion_index|tree_next_index|next_index" --type rust -g '**photon**' -C 2

Repository: Lightprotocol/light-protocol
Length of output: 8091
🏁 Script executed:
# Find NewAddressProofWithContext definition
rg -n "struct.*NewAddressProofWithContext" --type rust -A 25
# Check how tree_next_insertion_index is used
rg -n "tree_next_insertion_index" --type rust -B 2 -A 2

Repository: Lightprotocol/light-protocol
Length of output: 6134
🏁 Script executed:
# Search for documentation about address queue and tree insertion semantics
find . -name "*.md" -type f | xargs grep -l "tree_next_insertion_index\|start_index\|address.*queue" 2>/dev/null | head -5
# Look for comments or docs about how start_index relates to tree insertion
rg -n "start_index" --type rust -B 2 -A 2 | grep -i "tree\|insert\|next" | head -20

Repository: Lightprotocol/light-protocol
Length of output: 2337
🏁 Script executed:
# Look for AddressWithTree or similar type that might include tree state
rg -n "struct.*Address.*Tree|struct.*AddressTree" --type rust -A 20 | grep -E "next_index|insertion|rightmost" -B 5 -A 5
# Check if there's additional tree context returned by Photon API
rg -n "AddressWithTree\|address.*tree.*state" --type rust -A 15 | head -80

Repository: Lightprotocol/light-protocol
Length of output: 7403
🏁 Script executed:
# Check if merkle_tree account state is accessible in this context
rg -n "merkle_tree.*next_index\|tree_next_index\|right_most" --type rust -B 3 -A 3 | grep -E "photon|indexer|AddressQueueWithContext" -B 5 -A 5
# Look at the full context around line 1694 to see what other data is available
sed -n '1680,1710p' sdk-libs/client/src/indexer/photon_indexer.rs

Repository: Lightprotocol/light-protocol
Length of output: 1350
Fix tree_next_insertion_index to use the tree's actual insertion point, not the queue's start index.
tree_next_insertion_index initializes sparse staging trees and must represent the tree's current insertion point, not the queue batch's starting position. The field comment in AddressQueueWithContext clearly states: "Sparse tree insertion point / next index used to initialize staging trees."
Currently, both start_index and tree_next_insertion_index are set to aq.start_index, which is semantically incorrect. The test indexer correctly derives tree_next_insertion_index from the tree's rightmost index (see test_indexer.rs:731), and the forester uses the staging tree's next_index().
The Photon API response doesn't provide the tree's next insertion point. This requires either querying the tree account's state or enhancing the Photon API response to include the tree's next_index field.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/indexer/photon_indexer.rs` around lines 1693 - 1694,
tree_next_insertion_index is incorrectly initialized from aq.start_index; change
it to use the tree's actual next insertion index instead. Locate the
AddressQueueWithContext construction where tree_next_insertion_index is set
(currently using aq.start_index) and replace that with the tree's rightmost/next
index obtained from the tree account or staging tree (call the tree's
next_index() or query the tree account state like the test_indexer does). If the
Photon API does not return the tree's next_index, update the code path to query
the tree account state (or extend the API to include next_index) and use that
value when initializing tree_next_insertion_index so staging trees reflect the
real insertion point.
```rust
const MAX_ATAS_PER_IX: usize = 8;
const MAX_PDAS_PER_IX: usize = 8;
const PROOF_FETCH_CONCURRENCY: usize = 8;
```
🧩 Analysis chain
🏁 Script executed:
cat -n sdk-libs/client/src/interface/load_accounts.rs | head -100

Repository: Lightprotocol/light-protocol
Length of output: 4138
🏁 Script executed:
cat -n sdk-libs/client/src/interface/load_accounts.rs | sed -n '130,150p'

Repository: Lightprotocol/light-protocol
Length of output: 966
🏁 Script executed:
cat -n sdk-libs/client/src/interface/load_accounts.rs | sed -n '260,295p'

Repository: Lightprotocol/light-protocol
Length of output: 1267
🏁 Script executed:
rg "buffered\|join\!" sdk-libs/client/src/interface/load_accounts.rs -A 2 -B 2

Repository: Lightprotocol/light-protocol
Length of output: 168
🏁 Script executed:
rg "buffered|join" sdk-libs/client/src/interface/load_accounts.rs -A 2 -B 2

Repository: Lightprotocol/light-protocol
Length of output: 599
🏁 Script executed:
cat -n sdk-libs/client/src/interface/load_accounts.rs | sed -n '294,320p'

Repository: Lightprotocol/light-protocol
Length of output: 980
🏁 Script executed:
web_search "futures::join vs futures::try_join difference fail fast"

Repository: Lightprotocol/light-protocol
Length of output: 122
🏁 Script executed:
web_search "futures stream buffered concurrency semantic concurrent limits"

Repository: Lightprotocol/light-protocol
Length of output: 122
🌐 Web query:
futures::join vs futures::try_join difference fail fast behavior
💡 Result:
- `futures::join!`: polls all provided futures concurrently and waits for all of them to complete, returning a tuple of their outputs. It does not fail fast—even if one future's output is an `Err`, `join!` still waits for the others because it doesn't interpret the outputs. [1]
- `futures::try_join!`: like `join!`, but for futures that return `Result`. It returns `Ok((...))` if all succeed, and fails fast: as soon as any future resolves to `Err(e)`, `try_join!` returns immediately with that error and the remaining futures are canceled by being dropped (i.e., they stop being polled). [2] That "cancellation" is cooperative—any cleanup depends on the futures being cancellation-safe. [3]
Important nuance: fail-fast only stops the inlined futures being driven by the macro. If you previously spawned work into separate tasks, dropping the future/handle used by try_join! won’t necessarily stop the spawned tasks (they may keep running unless explicitly aborted/canceled). [3]
Sources:
[1] core::future::join docs/source (join polls multiple futures and returns a tuple) https://doc.rust-lang.org/src/core/future/join.rs.html
[2] futures::try_join! docs (returns early on first error) https://docs.rs/futures/latest/futures/macro.try_join.html
[3] “Cancelling async Rust” (try_join cancels the other futures; cancellation safety discussion) https://sunshowers.io/posts/cancelling-async-rust/
🌐 Web query:
futures stream buffered concurrency how many concurrent requests allowed
💡 Result:
In Rust’s futures::stream::StreamExt, the concurrency is explicitly bounded by the argument you pass:
- `stream.buffered(n)` / `stream.buffer_unordered(n)` will have at most `n` futures "in flight" (being polled concurrently) at a time; as one completes, the next is started. `buffered` preserves output order; `buffer_unordered` does not. [1]
- If you're using `for_each_concurrent(limit, ...)`, the `limit` is an `Option<usize>`: `Some(n)` bounds concurrency to `n`, while `None` means unbounded (limited only by resources / upstream). [1]
- So there is no fixed "how many concurrent requests are allowed" by the combinator itself—it's whatever `n` (or `limit`) you choose. Typical HTTP examples use these specifically to cap concurrent requests. [2]
Sources:
[1] futures StreamExt docs showing buffered(n), buffer_unordered(n), and for_each_concurrent(limit: Into<Option<usize>>) https://docs.rs/futures/latest/futures/stream/struct.Buffered.html
[2] Example discussion recommending buffer / buffer_unordered to limit concurrent HTTP requests https://stackoverflow.com/questions/51044467/how-can-i-perform-parallel-asynchronous-http-get-requests-with-reqwest
🏁 Script executed:
rg "Semaphore|Permit|Arc|Mutex" sdk-libs/client/src/interface/load_accounts.rs

Repository: Lightprotocol/light-protocol
Length of output: 54
🏁 Script executed:
cat -n sdk-libs/client/src/interface/load_accounts.rs | sed -n '1,90p' | grep -A5 -B5 "TODO\|FIXME\|NOTE"

Repository: Lightprotocol/light-protocol
Length of output: 556
Use try_join! for fail-fast proof orchestration.
Three buffered streams (PDA, ATA, mint proofs) run concurrently via join! at line 135. Each applies .buffered(PROOF_FETCH_CONCURRENCY) (lines 269, 289), so up to 24 proof requests can be in flight. Because join! waits for all branches to complete, an error in one stream doesn't cancel the others—they keep polling and consuming resources until completion.
Switching to try_join! returns immediately on the first proof-fetch error and cancels the remaining futures, which is the intended behavior when any critical account load fails.
Proposed fix
- let (pda_proofs, ata_proofs, mint_proofs) = futures::join!(
+ let (pda_proofs, ata_proofs, mint_proofs) = futures::try_join!(
fetch_proof_batches(&pda_hashes, indexer),
fetch_proofs_batched(&ata_hashes, MAX_ATAS_PER_IX, indexer),
fetch_individual_proofs(&mint_hashes, indexer),
- );
-
- let pda_proofs = pda_proofs?;
- let ata_proofs = ata_proofs?;
- let mint_proofs = mint_proofs?;
+    )?;

If the intended ceiling is 8 total proof requests rather than 8 per account type, add a shared semaphore/permit budget inside the helpers to enforce the global cap across all three pipelines.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/interface/load_accounts.rs` around lines 73 - 75, The
concurrent proof pipelines use join! which waits for all branches even if one
fails; replace the join! at the proof orchestration site with try_join! so the
first proof-fetch error cancels the other futures and returns immediately, and
ensure the streams that call .buffered(PROOF_FETCH_CONCURRENCY) (the PDA, ATA
and mint proof helpers) are compatible with cancellation; alternatively, if you
intended a global cap instead of per-stream caps, implement a shared
semaphore/permit budget used by the helpers and keep the buffered(…) limits
aligned with that shared budget.
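A synchronous sketch of the fail-fast contract: with `futures::try_join!` the three fetches would run concurrently, but the early-return shape matches plain `?` chaining. Names are illustrative, not the SDK's API:

```rust
// Toy stand-in for a proof fetch that can fail.
fn fetch(label: &str, fail: bool) -> Result<String, String> {
    if fail {
        Err(format!("{label} fetch failed"))
    } else {
        Ok(format!("{label} ok"))
    }
}

// The first Err short-circuits and the remaining work is abandoned,
// mirroring try_join!'s behavior of dropping the sibling futures.
fn load_all() -> Result<(String, String, String), String> {
    let pda = fetch("pda", false)?;
    let ata = fetch("ata", true)?; // errors here...
    let mint = fetch("mint", false)?; // ...so this line is never reached
    Ok((pda, ata, mint))
}

fn main() {
    assert_eq!(load_all(), Err("ata fetch failed".to_string()));
}
```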
```diff
 // 3. ATA loads (CreateAssociatedTokenAccount + Transfer2) - requires mint to exist
-let ata_chunks: Vec<_> = cold_atas.chunks(MAX_ATAS_PER_IX).collect();
-for (chunk, proof) in ata_chunks.into_iter().zip(ata_proofs) {
+for (chunk, proof) in cold_atas.chunks(MAX_ATAS_PER_IX).zip(ata_proofs) {
     out.extend(build_ata_load(chunk, proof, fee_payer)?);
```
Preserve the flattened ATA index across chunks.
Line 164 chunks cold_atas, but build_ata_load still restarts enumeration at zero. A MissingAtaContext from the second or later chunk will therefore point at 0..7 again instead of the real position in cold_atas.
🛠️ Proposed fix
- for (chunk, proof) in cold_atas.chunks(MAX_ATAS_PER_IX).zip(ata_proofs) {
- out.extend(build_ata_load(chunk, proof, fee_payer)?);
+ let mut ata_offset = 0usize;
+ for (chunk, proof) in cold_atas.chunks(MAX_ATAS_PER_IX).zip(ata_proofs) {
+ out.extend(build_ata_load(chunk, proof, fee_payer, ata_offset)?);
+ ata_offset += chunk.len();
 }

fn build_ata_load(
ifaces: &[&TokenAccountInterface],
proof: ValidityProofWithContext,
fee_payer: Pubkey,
start_index: usize,
) -> Result<Vec<Instruction>, LoadAccountsError> {
let contexts: Vec<AtaContext> = ifaces
.iter()
.enumerate()
.map(|(i, a)| AtaContext::from_interface(a, start_index + i))
.collect::<Result<Vec<_>, _>>()?;
// ...
}Also applies to: 401-405
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/interface/load_accounts.rs` around lines 163 - 165, The
ATA enumeration resets per chunk causing MissingAtaContext to report wrong
indices; update build_ata_load to accept a start_index: usize and use it when
creating AtaContext (e.g., in AtaContext::from_interface calls add start_index +
i) so contexts reflect their absolute position, then when iterating
cold_atas.chunks(...) compute the chunk's start_index (running offset or
chunk_index * MAX_ATAS_PER_IX) and pass it into build_ata_load; apply the same
change to the other call site referenced around lines 401-405 so all ATA load
invocations supply the correct flattened index.
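The running-offset pattern in isolation (the names and chunk size are illustrative, not the SDK's):

```rust
const CHUNK: usize = 4; // stand-in for MAX_ATAS_PER_IX

// Enumerate chunk items with a running offset so each item carries its
// absolute position in the original slice, not its per-chunk index.
fn flatten_with_offsets(items: &[char]) -> Vec<(usize, char)> {
    let mut offset = 0usize;
    let mut flat = Vec::new();
    for chunk in items.chunks(CHUNK) {
        for (i, item) in chunk.iter().enumerate() {
            flat.push((offset + i, *item)); // absolute index into `items`
        }
        offset += chunk.len();
    }
    flat
}

fn main() {
    let cold_atas: Vec<char> = ('a'..='j').collect(); // 10 items -> chunks of 4, 4, 2
    let flat = flatten_with_offsets(&cold_atas);
    // 'f' sits in the second chunk at local index 1; the offset makes it 5.
    assert_eq!(flat[5], (5, 'f'));
    assert_eq!(flat.last(), Some(&(9, 'j')));
}
```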
```rust
fn build_compressed_proof(body: &str) -> Result<CompressedProof, IndexerError> {
    let proof_json = deserialize_gnark_proof_json(body)
        .map_err(|error| IndexerError::CustomError(error.to_string()))?;
    let (proof_a, proof_b, proof_c) = proof_from_json_struct(proof_json)
        .map_err(|error| IndexerError::CustomError(error.to_string()))?;
    let (proof_a, proof_b, proof_c) = compress_proof(&proof_a, &proof_b, &proof_c)
        .map_err(|error| IndexerError::CustomError(error.to_string()))?;

    Ok(CompressedProof {
        a: proof_a,
        b: proof_b,
        c: proof_c,
    })
}
```
Parse the timed proof envelope here too.
build_compressed_proof() only accepts raw gnark proof JSON, but the same /prove endpoint is now handled elsewhere as either raw proof JSON or { "proof": ..., "proof_duration_ms": ... }. When the wrapped form comes back here, _get_validity_proof_v1_implementation() will fail after a 200 OK. Please unwrap the proof field before deserializing, or move the shared response parsing into a common helper.
🔧 Suggested fix
+#[derive(serde::Deserialize)]
+struct ProofWithTimingJson {
+ proof: serde_json::Value,
+}
+
fn build_compressed_proof(body: &str) -> Result<CompressedProof, IndexerError> {
- let proof_json = deserialize_gnark_proof_json(body)
+ let proof_body = serde_json::from_str::<ProofWithTimingJson>(body)
+ .map(|wrapped| wrapped.proof.to_string())
+ .unwrap_or_else(|_| body.to_string());
+ let proof_json = deserialize_gnark_proof_json(&proof_body)
.map_err(|error| IndexerError::CustomError(error.to_string()))?;
let (proof_a, proof_b, proof_c) = proof_from_json_struct(proof_json)
.map_err(|error| IndexerError::CustomError(error.to_string()))?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 98 - 111,
build_compressed_proof currently assumes body is raw gnark proof JSON but needs
to handle the timed envelope { "proof": ..., "proof_duration_ms": ... } too;
modify build_compressed_proof (or create a shared helper used by
_get_validity_proof_v1_implementation) to first check if the incoming body is an
envelope (e.g., parse into a minimal map/object and if it contains a "proof" key
extract that value), then pass the extracted inner proof JSON to
deserialize_gnark_proof_json; keep the existing error mapping to IndexerError
and then continue with proof_from_json_struct and compress_proof as before so
both raw and wrapped responses succeed.
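A toy sketch of the "accept both shapes" fallback: try to unwrap an envelope that carries the proof under a `proof` key, otherwise treat the body as the raw proof JSON. String surgery stands in for the `serde_json` parse here and assumes the envelope starts with the `proof` key, which real JSON key ordering does not guarantee:

```rust
// Unwrap {"proof": ...} if present, otherwise pass the body through.
fn extract_proof_body(body: &str) -> String {
    body.trim()
        .strip_prefix("{\"proof\":")
        .and_then(|rest| rest.strip_suffix('}'))
        .map(|inner| inner.trim().to_string())
        .unwrap_or_else(|| body.to_string()) // fall back to raw proof JSON
}

fn main() {
    // Raw proof JSON passes through untouched.
    assert_eq!(extract_proof_body("{\"ar\":[1,2]}"), "{\"ar\":[1,2]}");
    // Wrapped form yields the inner proof object.
    assert_eq!(
        extract_proof_body("{\"proof\":{\"ar\":[1,2]}}"),
        "{\"ar\":[1,2]}"
    );
}
```

In real code the tolerant parse belongs in one shared helper (as the comment suggests) so both the v1 and v2 response paths stay in sync.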
8040138 to 30a3abb (Compare)
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
forester/src/epoch_manager.rs (1)
570-585: ⚠️ Potential issue | 🟠 Major

Shutdown order allows new workers to be registered after drain starts.
`shutdown_new_tree_workers()` (Line 709) runs before `_guard` aborts producer tasks (Line 570). While draining, `discover_trees_periodically -> add_new_tree -> register_new_tree_worker` can still append workers, leaving them outside the shutdown join path.

Abort background producers first, then drain/join `new_tree_workers`.

Also applies to: 709-710
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 570 - 585, The shutdown currently calls shutdown_new_tree_workers() before the scopeguard that aborts background producers (the guard created as let _guard = scopeguard::guard(...)), which allows discover_trees_periodically -> add_new_tree -> register_new_tree_worker to append workers after drain starts; move the abort of producer tasks earlier so background producers are stopped before draining/joining new_tree_workers: ensure the scopeguard (the closure that aborts current_previous_handle, tree_discovery_handle, balance_check_handle) runs (or explicitly call abort on those handles) prior to invoking shutdown_new_tree_workers()/joining new_tree_workers, so no new workers can be registered after drain starts (adjust location around the scopeguard creation and the call to shutdown_new_tree_workers).sdk-libs/program-test/src/indexer/test_indexer.rs (1)
719-732: ⚠️ Potential issue | 🟠 Major

Keep `tree_next_insertion_index` aligned with the production indexer contract.

Line 731 now uses `address_tree_bundle.right_most_index()`, but `sdk-libs/client/src/indexer/photon_indexer.rs:1658-1698` still populates the same `AddressQueueData` field from `aq.start_index`. If those values differ, program-test will exercise a different staging offset than production and batch-address proofs can shift by the queue pagination delta. Please reconcile the field meaning in both constructors together.

🤖 Prompt for AI Agents
address_tree_bundle.right_most_index(), butsdk-libs/client/src/indexer/photon_indexer.rs:1658-1698still populates the sameAddressQueueDatafield fromaq.start_index. If those values differ, program-test will exercise a different staging offset than production and batch-address proofs can shift by the queue pagination delta. Please reconcile the field meaning in both constructors together.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 719 - 732, The test constructor sets AddressQueueData.tree_next_insertion_index from address_tree_bundle.right_most_index(), while the production constructor populates that same field from aq.start_index; make them consistent by changing the test to set tree_next_insertion_index to start_index (start as u64) so it matches the production behavior, or alternatively update the production constructor to use address_tree_bundle.right_most_index()—whichever semantic (start_index vs right_most_index) you choose, apply it consistently in both the AddressQueueData builder in this test and in the production constructor that currently reads aq.start_index to avoid pagination/offset mismatches.
♻️ Duplicate comments (3)
forester/src/forester_status.rs (1)
673-688: 🧹 Nitpick | 🔵 Trivial

Extract duplicated V1 queue parsing into one helper.

Line 673-688 and Line 730-745 duplicate the same `QueueAccount` parsing/count/capacity/error logic. Centralize this in a helper to prevent drift.

♻️ Refactor sketch
QueueAccountparsing/count/capacity/error logic. Centralize this in a helper to prevent drift.♻️ Refactor sketch
+fn parse_v1_queue_stats(acc: &Account, tree_kind: &str) -> (Option<u64>, Option<u64>) { + match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } { + Ok(hs) => { + let len = hs + .iter() + .filter(|(_, cell)| cell.sequence_number.is_none()) + .count() as u64; + let cap = hs.get_capacity() as u64; + (Some(len), Some(cap)) + } + Err(error) => { + warn!(?error, %tree_kind, "Failed to parse V1 queue hash set"); + (None, None) + } + } +} ... - let (queue_len, queue_cap) = queue_account - .map(|acc| match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } { ... }) - .unwrap_or((None, None)); + let (queue_len, queue_cap) = queue_account + .as_ref() + .map(|acc| parse_v1_queue_stats(acc, "StateV1")) + .unwrap_or((None, None)); ... - let (queue_len, queue_cap) = queue_account - .map(|acc| match unsafe { parse_hash_set_from_bytes::<QueueAccount>(&acc.data) } { ... }) - .unwrap_or((None, None)); + let (queue_len, queue_cap) = queue_account + .as_ref() + .map(|acc| parse_v1_queue_stats(acc, "AddressV1")) + .unwrap_or((None, None));Also applies to: 730-745
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/forester_status.rs` around lines 673 - 688, Create a helper function (e.g., fn parse_queue_metrics<T>(acc: &Account) -> (Option<u64>, Option<u64>)) that encapsulates the unsafe parse_hash_set_from_bytes::<QueueAccount>(&acc.data) logic, the iteration to compute len (count of cells with sequence_number.is_none()) and cap (hs.get_capacity()), and the Err branch that calls warn!(?error, "Failed to parse StateV1 queue hash set") returning (None, None); then replace both duplicated map closures (the blocks using parse_hash_set_from_bytes::<QueueAccount>) with calls to this helper (passing the acc), ensuring the helper is used where the original closures lived.
forester/src/epoch_manager.rs (2)
943-1023: ⚠️ Potential issue | 🟠 Major
Epoch report/cleanup still races with in-flight new-tree workers.
Line 943 spawns per-epoch new-tree workers, but Line 1401 (report send) and Lines 1423-1432 (epoch state cleanup) run without awaiting completion of same-epoch workers. Late worker updates can under-report work and recreate cleaned per-epoch state.
Please await/join all `new_tree_workers` for `epoch` before reporting and cleanup.
Also applies to: 1401-1432
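The drain-before-cleanup pattern above can be sketched with stdlib threads (a plain thread plus an mpsc channel stand in for `NewTreeWorker`'s `thread_handle` and completion oneshot; all names here are illustrative, not the forester's real types):

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// Stand-in for a NewTreeWorker entry: a completion channel plus a join handle.
struct Worker {
    completion: mpsc::Receiver<u64>,
    handle: thread::JoinHandle<()>,
}

fn spawn_worker(epoch: u64, work: u64, workers: &mut HashMap<u64, Vec<Worker>>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        tx.send(work).ok(); // worker reports its processed count on exit
    });
    workers.entry(epoch).or_default().push(Worker { completion: rx, handle });
}

// Wait for every worker registered under `epoch` before reporting/cleanup,
// so late updates cannot under-report work or recreate cleaned state.
fn drain_epoch(workers: &mut HashMap<u64, Vec<Worker>>, epoch: u64) -> u64 {
    let mut reported = 0;
    for w in workers.remove(&epoch).unwrap_or_default() {
        reported += w.completion.recv().unwrap_or(0); // final update from worker
        w.handle.join().expect("worker panicked");    // then join the thread
    }
    reported // only now is it safe to send the epoch report and drop state
}
```

The key ordering: receive the completion signal, join the handle, and only then touch the report or the per-epoch state.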
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 943 - 1023, The per-epoch report/send and cleanup run while per-epoch new-tree workers spawned by register_new_tree_worker may still be running; to fix, before the report send and before removing epoch state iterate the collection of NewTreeWorker entries for that epoch and wait for each worker to finish by awaiting its completion oneshot receiver (NewTreeWorker.completion) and/or joining its thread_handle (JoinHandle::join or await the completion_rx) and only then proceed with the report and epoch cleanup; ensure you also send cancellation (NewTreeWorker.cancel) where appropriate, remove entries after join, and update any places that access self.new_tree_workers so they await these completions for the same epoch.
1936-1957: ⚠️ Potential issue | 🔴 Critical
Critical: Shutdown can leave tree-processing work running detached in the background.
At line 600, shutdown breaks the main loop immediately without draining `epoch_tasks`. If an `epoch_tasks` future is mid-await on `join_all(spawn_blocking_tasks)` (line 1957), dropping that future does not stop the already-spawned blocking tasks: Tokio's `spawn_blocking` worker pool continues executing them independently. This means `process_queue` work (line 1948) continues running after `run()` returns, violating clean shutdown semantics.
Ensure in-flight work is either awaited to completion before shutdown or explicitly aborted. Consider storing `JoinHandle`s from `spawn_blocking` at a scope where they can be explicitly aborted on shutdown, or drain `epoch_tasks` before exiting the select loop.
Also applies to: Lines 700-710 (post-loop cleanup lacks epoch_tasks draining)
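A minimal stdlib sketch of the keep-handles-and-drain idea (a shared stop flag plays the role of the shutdown signal, plain threads play the role of the `spawn_blocking` pool; with Tokio you would additionally get `.abort()` on each `JoinHandle`):

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Keep a handle for every in-flight blocking task so shutdown can drain them,
// instead of returning while orphaned workers keep running in the background.
fn run_until_shutdown(num_workers: usize) -> usize {
    let stop = Arc::new(AtomicBool::new(false));
    let mut handles = Vec::with_capacity(num_workers);
    for _ in 0..num_workers {
        let stop = Arc::clone(&stop);
        handles.push(thread::spawn(move || {
            let mut processed = 0usize;
            // Stand-in for a process_queue loop that checks for shutdown.
            while !stop.load(Ordering::Relaxed) && processed < 10 {
                processed += 1;
            }
            processed
        }));
    }
    stop.store(true, Ordering::Relaxed); // signal shutdown...
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked")) // ...then drain every handle
        .sum()
}
```

The point is the final drain: `run_until_shutdown` never returns while a worker it spawned is still executing, which is exactly the guarantee the review asks `run()` to provide for its `spawn_blocking` work.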
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 1936 - 1957, The code currently spawns blocking tasks inside the per-tree async task and immediately awaits them, allowing spawned blocking work to continue after run() exits; change this to capture and manage the Tokio JoinHandle so work can be awaited or aborted on shutdown. Specifically, modify the block that calls tokio::task::spawn_blocking in the task pushed into tasks (referenced symbols: spawn_blocking, process_queue, worker_slots, runtime_handle, tasks/epoch_tasks) to return or store the JoinHandle instead of awaiting it inline; then in run() maintain a Vec<JoinHandle> for in-flight spawn_blocking handles and on shutdown either .await them to completion or call .abort() on each (and optionally await the aborted handles) before returning. Ensure permits (from worker_slots.acquire_owned()) are held only for the lifetime of the spawned work and dropped appropriately when aborting or awaiting the JoinHandle so semaphores are released.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@forester/tests/e2e_test.rs`:
- Around line 802-817: The test currently spawns a nested runtime with
tokio::task::spawn_blocking and a new Tokio runtime to run
run_pipeline::<LightClient>, which is unnecessary; replace the spawn_blocking +
Builder::new_multi_thread block with a direct tokio::spawn that awaits
run_pipeline::<LightClient> on the existing test runtime, passing the same
captured values (config, shutdown_receiver, shutdown_compressible_receiver,
shutdown_bootstrap_receiver, work_report_sender) and assign its JoinHandle to
service_handle so the spawned task runs as a regular async task on the test
runtime.
In `@forester/tests/test_batch_append_spent.rs`:
- Around line 331-346: The test spawns a blocking thread that builds a nested
Tokio runtime which causes Handle::current() to bind to the wrong runtime;
replace the tokio::task::spawn_blocking+inner runtime with a direct tokio::spawn
of the async run_pipeline::<LightClient> future. Specifically, move the closure
logic to call tokio::spawn(run_pipeline::<LightClient>(config, None, None,
shutdown_receiver, Some(shutdown_compressible_receiver), None,
work_report_sender)) (await or join the JoinHandle as needed in the test) and
remove the runtime::Builder and runtime.block_on usage so the test uses the
existing test runtime.
In `@prover/client/src/proof_types/batch_append/proof_inputs.rs`:
- Around line 131-142: The loop that checks batch lengths doesn't validate
individual Merkle proof heights, so the later conversion try_into().unwrap() can
panic; update proof_inputs.rs to either (a) preflight-validate every
merkle_proofs[i].len() equals HEIGHT (alongside existing batch checks for
old_leaves, leaves, merkle_proofs) and return ProverClientError::GenericError on
mismatch, or (b) replace the unsafe conversion site where try_into().unwrap() is
used with a fallible conversion that maps the error to ProverClientError (e.g.,
try_into().map_err(|e| ProverClientError::GenericError(format!("invalid merkle
proof height: {e}"))) ) so malformed proof lengths produce a Result::Err instead
of a panic; reference the merkle_proofs array and the try_into().unwrap() call
when applying the change.
In `@prover/client/src/proof.rs`:
- Around line 72-83: In deserialize_hex_string_to_be_bytes, avoid parsing
arbitrarily long hex strings by adding a pre-parse length guard on the
normalized hex string (trimmed_str) before calling
num_bigint::BigUint::from_str_radix: if trimmed_str.len() (after removing
optional "0x"/"0X" and optional leading zeros) exceeds 64 hex characters (32
bytes) return ProverClientError::InvalidHexString early; otherwise proceed to
parse and convert to a 32-byte big-endian array as before. Ensure you also
handle odd-length hex by treating a leading '0' or rejecting consistently so the
length check reflects actual byte count.
In `@sdk-libs/client/src/interface/initialize_config.rs`:
- Line 7: The code imports and serializes InitializeLightConfigParams from the
sibling crate light_account, which violates the SDK-libs dependency rule;
replace that direct dependency by either (a) defining a local DTO (e.g.,
LocalInitializeLightConfigParams) in initialize_config.rs with the same fields
and serialization derives and use it throughout the file, or (b) accept the
payload via the allowed program-lib interface type and map/convert it into a
local type; remove the use light_account::InitializeLightConfigParams import and
any direct uses of InitializeLightConfigParams, updating functions that
reference it (e.g., the initializer/serializer code in this file) to work with
the new local DTO or conversion layer.
- Around line 11-12: Replace the hardcoded
INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR byte array with the Anchor
Discriminator constant/method: import anchor_lang::Discriminator and derive the
8-byte discriminator from the instruction type (e.g., use
<InitializeCompressionConfig as Discriminator>::DISCRIMINATOR or
InitializeCompressionConfig::discriminator()) instead of the literal [133, 228,
12, 169, 56, 76, 222, 61]; apply the same pattern to any other discriminator
constants so the client pulls the canonical discriminator from the instruction
type rather than a hardcoded array.
---
Outside diff comments:
In `@forester/src/epoch_manager.rs`:
- Around line 570-585: The shutdown currently calls shutdown_new_tree_workers()
before the scopeguard that aborts background producers (the guard created as let
_guard = scopeguard::guard(...)), which allows discover_trees_periodically ->
add_new_tree -> register_new_tree_worker to append workers after drain starts;
move the abort of producer tasks earlier so background producers are stopped
before draining/joining new_tree_workers: ensure the scopeguard (the closure
that aborts current_previous_handle, tree_discovery_handle,
balance_check_handle) runs (or explicitly call abort on those handles) prior to
invoking shutdown_new_tree_workers()/joining new_tree_workers, so no new workers
can be registered after drain starts (adjust location around the scopeguard
creation and the call to shutdown_new_tree_workers).
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 719-732: The test constructor sets
AddressQueueData.tree_next_insertion_index from
address_tree_bundle.right_most_index(), while the production constructor
populates that same field from aq.start_index; make them consistent by changing
the test to set tree_next_insertion_index to start_index (start as u64) so it
matches the production behavior, or alternatively update the production
constructor to use address_tree_bundle.right_most_index()—whichever semantic
(start_index vs right_most_index) you choose, apply it consistently in both the
AddressQueueData builder in this test and in the production constructor that
currently reads aq.start_index to avoid pagination/offset mismatches.
---
Duplicate comments:
In `@forester/src/epoch_manager.rs`:
- Around line 943-1023: The per-epoch report/send and cleanup run while
per-epoch new-tree workers spawned by register_new_tree_worker may still be
running; to fix, before the report send and before removing epoch state iterate
the collection of NewTreeWorker entries for that epoch and wait for each worker
to finish by awaiting its completion oneshot receiver (NewTreeWorker.completion)
and/or joining its thread_handle (JoinHandle::join or await the completion_rx)
and only then proceed with the report and epoch cleanup; ensure you also send
cancellation (NewTreeWorker.cancel) where appropriate, remove entries after
join, and update any places that access self.new_tree_workers so they await
these completions for the same epoch.
- Around line 1936-1957: The code currently spawns blocking tasks inside the
per-tree async task and immediately awaits them, allowing spawned blocking work
to continue after run() exits; change this to capture and manage the Tokio
JoinHandle so work can be awaited or aborted on shutdown. Specifically, modify
the block that calls tokio::task::spawn_blocking in the task pushed into tasks
(referenced symbols: spawn_blocking, process_queue, worker_slots,
runtime_handle, tasks/epoch_tasks) to return or store the JoinHandle instead of
awaiting it inline; then in run() maintain a Vec<JoinHandle> for in-flight
spawn_blocking handles and on shutdown either .await them to completion or call
.abort() on each (and optionally await the aborted handles) before returning.
Ensure permits (from worker_slots.acquire_owned()) are held only for the
lifetime of the spawned work and dropped appropriately when aborting or awaiting
the JoinHandle so semaphores are released.
In `@forester/src/forester_status.rs`:
- Around line 673-688: Create a helper function (e.g., fn
parse_queue_metrics<T>(acc: &Account) -> (Option<u64>, Option<u64>)) that
encapsulates the unsafe parse_hash_set_from_bytes::<QueueAccount>(&acc.data)
logic, the iteration to compute len (count of cells with
sequence_number.is_none()) and cap (hs.get_capacity()), and the Err branch that
calls warn!(?error, "Failed to parse StateV1 queue hash set") returning (None,
None); then replace both duplicated map closures (the blocks using
parse_hash_set_from_bytes::<QueueAccount>) with calls to this helper (passing
the acc), ensuring the helper is used where the original closures lived.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: cc63c1aa-5f87-4c5f-a00f-34561a82b45c
⛔ Files ignored due to path filters (3)
program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs is excluded by none and included by none
sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs is excluded by none and included by none
📒 Files selected for processing (19)
forester/src/epoch_manager.rs
forester/src/forester_status.rs
forester/src/metrics.rs
forester/src/priority_fee.rs
forester/src/processor/v2/proof_worker.rs
forester/tests/e2e_test.rs
forester/tests/legacy/batched_state_async_indexer_test.rs
forester/tests/test_batch_append_spent.rs
prover/client/src/helpers.rs
prover/client/src/proof.rs
prover/client/src/proof_types/batch_address_append/json.rs
prover/client/src/proof_types/batch_append/proof_inputs.rs
prover/client/src/proof_types/combined/v2/json.rs
prover/client/src/proof_types/non_inclusion/v2/json.rs
prover/client/tests/batch_address_append.rs
prover/client/tests/init_merkle_tree.rs
sdk-libs/client/src/interface/initialize_config.rs
sdk-libs/client/src/local_test_validator.rs
sdk-libs/program-test/src/indexer/test_indexer.rs
let config = Arc::new(config.clone());
let service_handle = tokio::task::spawn_blocking(move || {
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .enable_all()
        .build()?;
    runtime.block_on(run_pipeline::<LightClient>(
        config,
        None,
        None,
        shutdown_receiver,
        Some(shutdown_compressible_receiver),
        Some(shutdown_bootstrap_receiver),
        work_report_sender,
    ))
});
🧩 Analysis chain
Scripts run against the repository confirmed: `run_pipeline` is declared `pub async fn` in forester/src/lib.rs; the only `Handle::current()` match in forester/src lives in epoch_manager.rs, not in `run_pipeline`'s call path; and forester/tests/e2e_test.rs (`setup_forester_pipeline`) wraps the fully async `run_pipeline::<LightClient>` in `tokio::task::spawn_blocking` plus a nested `tokio::runtime::Builder::new_multi_thread` runtime.
Use tokio::spawn directly instead of nested runtime.
run_pipeline is fully async and doesn't require blocking execution. Replace spawn_blocking + new_multi_thread with a direct tokio::spawn call—the tasks spawned within run_pipeline will work correctly on the test's existing runtime. This eliminates unnecessary complexity and overhead from creating a separate runtime context.
Proposed fix
 let config = Arc::new(config.clone());
-let service_handle = tokio::task::spawn_blocking(move || {
-    let runtime = tokio::runtime::Builder::new_multi_thread()
-        .worker_threads(2)
-        .enable_all()
-        .build()?;
-    runtime.block_on(run_pipeline::<LightClient>(
-        config,
-        None,
-        None,
-        shutdown_receiver,
-        Some(shutdown_compressible_receiver),
-        Some(shutdown_bootstrap_receiver),
-        work_report_sender,
-    ))
-});
+let service_handle = tokio::spawn(run_pipeline::<LightClient>(
+    config,
+    None,
+    None,
+    shutdown_receiver,
+    Some(shutdown_compressible_receiver),
+    Some(shutdown_bootstrap_receiver),
+    work_report_sender,
+));
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
let config = Arc::new(config.clone());
let service_handle = tokio::task::spawn_blocking(move || {
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(2)
        .enable_all()
        .build()?;
    runtime.block_on(run_pipeline::<LightClient>(
        config,
        None,
        None,
        shutdown_receiver,
        Some(shutdown_compressible_receiver),
        Some(shutdown_bootstrap_receiver),
        work_report_sender,
    ))
});
After:
let config = Arc::new(config.clone());
let service_handle = tokio::spawn(run_pipeline::<LightClient>(
    config,
    None,
    None,
    shutdown_receiver,
    Some(shutdown_compressible_receiver),
    Some(shutdown_bootstrap_receiver),
    work_report_sender,
));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@forester/tests/e2e_test.rs` around lines 802 - 817, The test currently spawns
a nested runtime with tokio::task::spawn_blocking and a new Tokio runtime to run
run_pipeline::<LightClient>, which is unnecessary; replace the spawn_blocking +
Builder::new_multi_thread block with a direct tokio::spawn that awaits
run_pipeline::<LightClient> on the existing test runtime, passing the same
captured values (config, shutdown_receiver, shutdown_compressible_receiver,
shutdown_bootstrap_receiver, work_report_sender) and assign its JoinHandle to
service_handle so the spawned task runs as a regular async task on the test
runtime.
let batch_len = batch_size as usize;
for (name, len) in [
    ("old_leaves", old_leaves.len()),
    ("leaves", leaves.len()),
    ("merkle_proofs", merkle_proofs.len()),
] {
    if len != batch_len {
        return Err(ProverClientError::GenericError(format!(
            "invalid batch append inputs: {name} len {len} != batch_size {batch_len}"
        )));
    }
}
Validate each proof height before the later unwrap().
The new batch-size checks on Lines 131-142 still leave a panic path on Line 193: a single Merkle proof whose length is not HEIGHT will hit try_into().unwrap() instead of returning ProverClientError. Since this function is now on the Result-based path, please reject malformed proof lengths during preflight or replace the conversion with map_err.
🛠️ Suggested guard
 let batch_len = batch_size as usize;
 for (name, len) in [
     ("old_leaves", old_leaves.len()),
     ("leaves", leaves.len()),
     ("merkle_proofs", merkle_proofs.len()),
 ] {
     if len != batch_len {
         return Err(ProverClientError::GenericError(format!(
             "invalid batch append inputs: {name} len {len} != batch_size {batch_len}"
         )));
     }
 }
+for (i, proof) in merkle_proofs.iter().enumerate() {
+    if proof.len() != HEIGHT {
+        return Err(ProverClientError::GenericError(format!(
+            "invalid batch append proof length at position {i}: got {}, expected {}",
+            proof.len(),
+            HEIGHT
+        )));
+    }
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@prover/client/src/proof_types/batch_append/proof_inputs.rs` around lines 131
- 142, The loop that checks batch lengths doesn't validate individual Merkle
proof heights, so the later conversion try_into().unwrap() can panic; update
proof_inputs.rs to either (a) preflight-validate every merkle_proofs[i].len()
equals HEIGHT (alongside existing batch checks for old_leaves, leaves,
merkle_proofs) and return ProverClientError::GenericError on mismatch, or (b)
replace the unsafe conversion site where try_into().unwrap() is used with a
fallible conversion that maps the error to ProverClientError (e.g.,
try_into().map_err(|e| ProverClientError::GenericError(format!("invalid merkle
proof height: {e}"))) ) so malformed proof lengths produce a Result::Err instead
of a panic; reference the merkle_proofs array and the try_into().unwrap() call
when applying the change.
pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
    let trimmed_str = hex_str
        .strip_prefix("0x")
        .or_else(|| hex_str.strip_prefix("0X"))
        .unwrap_or(hex_str);
    let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
        .map_err(|error| ProverClientError::InvalidHexString(format!("{hex_str}: {error}")))?;
    let big_uint_bytes = big_uint.to_bytes_be();
    if big_uint_bytes.len() > 32 {
        return Err(ProverClientError::InvalidHexString(format!(
            "{hex_str}: exceeds 32 bytes"
        )));
Add a pre-parse length guard to prevent expensive unbounded hex parsing.
Line 77 parses the full input before rejecting oversize values, so very large strings can cause unnecessary CPU/memory work. Bound the normalized hex length before from_str_radix.
Proposed fix
 pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
     let trimmed_str = hex_str
         .strip_prefix("0x")
         .or_else(|| hex_str.strip_prefix("0X"))
         .unwrap_or(hex_str);
-    let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
+    let normalized = trimmed_str.trim_start_matches('0');
+    if normalized.len() > 64 {
+        return Err(ProverClientError::InvalidHexString(format!(
+            "{hex_str}: exceeds 32 bytes"
+        )));
+    }
+    if normalized.is_empty() {
+        return Ok([0u8; 32]);
+    }
+    let big_uint = num_bigint::BigUint::from_str_radix(normalized, 16)
         .map_err(|error| ProverClientError::InvalidHexString(format!("{hex_str}: {error}")))?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@prover/client/src/proof.rs` around lines 72 - 83, In
deserialize_hex_string_to_be_bytes, avoid parsing arbitrarily long hex strings
by adding a pre-parse length guard on the normalized hex string (trimmed_str)
before calling num_bigint::BigUint::from_str_radix: if trimmed_str.len() (after
removing optional "0x"/"0X" and optional leading zeros) exceeds 64 hex
characters (32 bytes) return ProverClientError::InvalidHexString early;
otherwise proceed to parse and convert to a 32-byte big-endian array as before.
Ensure you also handle odd-length hex by treating a leading '0' or rejecting
consistently so the length check reflects actual byte count.
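For intuition, the guard the review asks for can be mocked up outside Rust. The sketch below (a hypothetical `hex_to_be_bytes` helper mirroring the proposed fix, not the crate's actual API) shows the length check firing before any big-integer parsing happens:

```python
def hex_to_be_bytes(hex_str: str) -> bytes:
    """Mock of the proposed guard: bound input size *before* parsing."""
    s = hex_str[2:] if hex_str[:2] in ("0x", "0X") else hex_str
    normalized = s.lstrip("0")
    if len(normalized) > 64:  # more than 32 bytes of significant hex digits
        raise ValueError(f"{hex_str}: exceeds 32 bytes")
    if not normalized:  # empty or all zeros after trimming
        return bytes(32)
    # Only now pay the cost of the big-integer conversion.
    return int(normalized, 16).to_bytes(32, "big")
```

Because the check runs on the normalized digit count, a multi-megabyte string of `f`s is rejected in O(n) scanning without ever allocating a big integer, while inputs padded with leading zeros still parse.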
#[cfg(not(feature = "anchor"))]
use borsh::{BorshDeserialize as AnchorDeserialize, BorshSerialize as AnchorSerialize};
use borsh::BorshSerialize as AnchorSerialize;
use light_account::InitializeLightConfigParams;
Keep the request DTO out of a sibling SDK crate.
This change deepens the existing light_account coupling by importing and serializing InitializeLightConfigParams from another SDK library. sdk-libs/client should take this payload from an allowed program-lib or define a local boundary type instead of reaching into a sibling SDK crate.
As per coding guidelines, sdk-libs/**/src/**/*.rs: SDK-libs must depend on program-libs, light-prover-client and external crates only.
Also applies to: 103-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/interface/initialize_config.rs` at line 7, The code
imports and serializes InitializeLightConfigParams from the sibling crate
light_account, which violates the SDK-libs dependency rule; replace that direct
dependency by either (a) defining a local DTO (e.g.,
LocalInitializeLightConfigParams) in initialize_config.rs with the same fields
and serialization derives and use it throughout the file, or (b) accept the
payload via the allowed program-lib interface type and map/convert it into a
local type; remove the use light_account::InitializeLightConfigParams import and
any direct uses of InitializeLightConfigParams, updating functions that
reference it (e.g., the initializer/serializer code in this file) to work with
the new local DTO or conversion layer.
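The boundary-type pattern the review recommends is language-agnostic. This Python sketch (hypothetical field names; the real params live in `light_account`) shows the idea: a local DTO with the same shape, plus a single conversion point, so the module's public surface no longer names the sibling crate's type:

```python
from dataclasses import dataclass


@dataclass
class UpstreamParams:
    """Stand-in for the sibling crate's DTO we do not want to leak."""
    update_authority: bytes
    rent_recipient: bytes


@dataclass
class LocalInitializeConfigParams:
    """Boundary DTO owned by this module; mirrors only the fields it needs."""
    update_authority: bytes
    rent_recipient: bytes

    @classmethod
    def from_upstream(cls, p: UpstreamParams) -> "LocalInitializeConfigParams":
        # The one place that knows about the upstream type.
        return cls(update_authority=p.update_authority,
                   rent_recipient=p.rent_recipient)
```

The conversion is boilerplate, but it keeps the dependency direction explicit: only the mapping function touches the upstream type, so swapping or versioning it later is a one-line change.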
use crate::interface::instructions::INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR;
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n -C3 'INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR|Discriminator::DISCRIMINATOR|initialize_compression_config' --type rust
Repository: Lightprotocol/light-protocol
Length of output: 6165
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Find instruction enum definitions in the programs directory
rg -n "enum.*Instruction|InitializeCompressionConfig|#\[derive.*Discriminator" --type rust programs/ | head -50
Repository: Lightprotocol/light-protocol
Length of output: 212
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look for where the instructions are defined, likely in light_system_program or similar
fd "instruction" --type f --extension rs programs/ | head -20
Repository: Lightprotocol/light-protocol
Length of output: 409
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for the actual instruction definition with discriminator
rg -n "InitializeCompressionConfig" --type rust programs/ -A5 -B5 | head -100
Repository: Lightprotocol/light-protocol
Length of output: 54
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look at the InstructionDiscriminator enum
cat programs/system/src/lib.rs | head -100
Repository: Lightprotocol/light-protocol
Length of output: 3509
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for how discriminator values are generated or defined
rg -n "133.*228.*12|initialize_compression_config" --type rust -i
Repository: Lightprotocol/light-protocol
Length of output: 2055
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look for instruction definitions in the program
rg -n "#\[derive.*Instruction|enum Instruction|pub fn initialize" --type rust programs/system/src/ -B2 -A2 | head -80
Repository: Lightprotocol/light-protocol
Length of output: 362
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look at how the discriminator is defined in the macros - is it derived or hardcoded?
cat -n sdk-libs/macros/src/light_pdas/program/instructions.rs | sed -n '810,820p'
Repository: Lightprotocol/light-protocol
Length of output: 926
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for actual instruction enum in Anchor programs with the discriminator trait
rg -n "#\[derive.*Instruction|pub enum.*Instruction" --type rust sdk-tests/ -B5 -A5 | head -60
Repository: Lightprotocol/light-protocol
Length of output: 3840
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look for if there's an Anchor instruction that would generate this discriminator
rg -n "initialize_compression_config" --type rust sdk-tests/anchor-manual-test/ -B10 -A10 | head -80
Repository: Lightprotocol/light-protocol
Length of output: 1829
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check if there's an IDL or generated code that could export the discriminator
rg -n "anchor_lang::Discriminator|DISCRIMINATOR::" --type rust | grep -i initialize | head -20
Repository: Lightprotocol/light-protocol
Length of output: 54
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Look at the actual Anchor instruction setup in anchor-manual-test
cat -n sdk-tests/anchor-manual-test/src/lib.rs | sed -n '1,100p'
Repository: Lightprotocol/light-protocol
Length of output: 4538
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Search for how other Anchor programs export their instruction discriminators
fd "lib.rs" --type f --path "*/programs/*" -x cat -n {} \; | head -150
Repository: Lightprotocol/light-protocol
Length of output: 746
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Let's check if there's documentation about the discriminator derivation
rg -n "discriminator|sha256" --type rust sdk-libs/macros/src/light_pdas/program/instructions.rs -B5 -A5 | head -100
Repository: Lightprotocol/light-protocol
Length of output: 1865
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check what anchor_lang discriminator would be for initialize_compression_config
# by looking for any Discriminator derive or implementation
rg -n "anchor_lang::Discriminator|impl.*Discriminator" --type rust | head -20
Repository: Lightprotocol/light-protocol
Length of output: 2178
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Let's verify the discriminator value using Python and Anchor's hash method
python3 << 'EOF'
import hashlib
# Anchor derives discriminators from the instruction name
instruction_name = "initialize_compression_config"
# Get the first 8 bytes of the SHA256 hash
hash_bytes = hashlib.sha256(b"global:" + instruction_name.encode()).digest()[:8]
discriminator = list(hash_bytes)
print(f"Instruction name: {instruction_name}")
print(f"Expected discriminator (from Anchor): {discriminator}")
print(f"Hardcoded discriminator in code: [133, 228, 12, 169, 56, 76, 222, 61]")
print(f"Match: {discriminator == [133, 228, 12, 169, 56, 76, 222, 61]}")
EOF
Repository: Lightprotocol/light-protocol
Length of output: 276
Use anchor_lang::Discriminator trait instead of hardcoded byte array for instruction discriminators.
The constant INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR is currently a hardcoded byte array [133, 228, 12, 169, 56, 76, 222, 61]. While this value is correct (verified against Anchor's sha256 derivation), the coding guideline for {programs,sdk-libs}/**/src/**/*.rs explicitly requires using the anchor_lang::Discriminator trait to obtain 8-byte instruction discriminators.
Hardcoded literals create maintenance risk: if the instruction name ever changes, the constant won't automatically update and can silently drift from the program's canonical discriminator. The proper pattern is deriving the discriminator from the instruction type's Discriminator::DISCRIMINATOR implementation, ensuring your client and program always stay in sync.
Apply this to lines 49-50 in sdk-libs/client/src/interface/instructions.rs and all affected discriminator constants.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/interface/initialize_config.rs` around lines 11 - 12,
Replace the hardcoded INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR byte array
with the Anchor Discriminator constant/method: import anchor_lang::Discriminator
and derive the 8-byte discriminator from the instruction type (e.g., use
<InitializeCompressionConfig as Discriminator>::DISCRIMINATOR or
InitializeCompressionConfig::discriminator()) instead of the literal [133, 228,
12, 169, 56, 76, 222, 61]; apply the same pattern to any other discriminator
constants so the client pulls the canonical discriminator from the instruction
type rather than a hardcoded array.
Force-pushed from 30a3abb to c97abbd (Compare)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
prover/client/src/proof_types/batch_append/proof_inputs.rs (1)
193-193: ⚠️ Potential issue | 🟠 Major
Panic path remains: replace unwrap() with fallible conversion.
This conversion will panic if any Merkle proof's length doesn't equal HEIGHT. Since this function returns Result<..., ProverClientError>, you should propagate the error instead. The implementation in BatchAppendsCircuitInputs::new() at Lines 64-72 shows exactly the right pattern.
🛠️ Suggested fix
- let merkle_proof_array = merkle_proof.try_into().unwrap();
+ let proof_len = merkle_proof.len();
+ let merkle_proof_array: [[u8; 32]; HEIGHT] = merkle_proof.try_into().map_err(|_| {
+     ProverClientError::GenericError(format!(
+         "invalid merkle proof length at position {}: got {}, expected {}",
+         i, proof_len, HEIGHT
+     ))
+ })?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof_types/batch_append/proof_inputs.rs` at line 193, The code currently uses merkle_proof.try_into().unwrap() which can panic on invalid length; change this to a fallible conversion that returns a ProverClientError instead (mirror the pattern used in BatchAppendsCircuitInputs::new()). Replace the unwrap with something like merkle_proof.try_into().map_err(|_| ProverClientError::...)? and propagate the Result so the enclosing function returns Err on conversion failure; reference the merkle_proof_array variable and the BatchAppendsCircuitInputs::new() implementation for the exact mapping/variant to use.
sdk-libs/program-test/src/indexer/test_indexer.rs (1)
719-733: ⚠️ Potential issue | 🟡 Minor
Fix the off-by-one error: tree_next_insertion_index must be rightmost index plus one.
Line 731 assigns address_tree_bundle.right_most_index() directly to tree_next_insertion_index, but right_most_index() returns the index of the last occupied element. Since tree_next_insertion_index is used downstream as the starting point for leaf insertion (via forester's staging trees and proof initialization), it must point to the next available index. Change line 731 to:
tree_next_insertion_index: address_tree_bundle.right_most_index() as u64 + 1,
This aligns with the semantics confirmed in test assertions (start_index >= 1 for initialized trees) and the address-tree initialization contract (sentinel at index 0, next insertion at 1).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 719 - 733, The AddressQueueData being constructed sets tree_next_insertion_index incorrectly to address_tree_bundle.right_most_index(); update the assignment so tree_next_insertion_index uses the next available slot by adding one: set tree_next_insertion_index to address_tree_bundle.right_most_index() as u64 + 1 (leave other fields like initial_root and subtrees unchanged) so subsequent leaf insertion/proof logic uses the correct insertion index.
forester/tests/legacy/batched_state_async_indexer_test.rs (2)
300-317: ⚠️ Potential issue | 🟠 Major
Fallback polling in wait_for_slot can hang indefinitely.
When warp_to_slot is unavailable, the loop has no timeout. If slot progression stalls, the test can block forever.
Proposed fix
async fn wait_for_slot(rpc: &mut LightClient, target_slot: u64) {
    match rpc.warp_to_slot(target_slot).await {
        Ok(_) => {
            println!("warped to slot {}", target_slot);
            return;
        }
        Err(e) => {
            println!("warp_to_slot unavailable ({}), falling back to polling", e);
        }
    }
-   while rpc.get_slot().await.unwrap() < target_slot {
+   let start = tokio::time::Instant::now();
+   let max_wait = Duration::from_secs(DEFAULT_TIMEOUT_SECONDS);
+   loop {
+       let current_slot = rpc.get_slot().await.unwrap();
+       if current_slot >= target_slot {
+           break;
+       }
+       assert!(
+           start.elapsed() < max_wait,
+           "Timed out waiting for slot {} (current slot {})",
+           target_slot,
+           current_slot
+       );
        println!(
            "waiting for active phase slot: {}, current slot: {}",
            target_slot,
-           rpc.get_slot().await.unwrap()
+           current_slot
        );
        sleep(Duration::from_millis(400)).await;
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/tests/legacy/batched_state_async_indexer_test.rs` around lines 300 - 317, The fallback polling in wait_for_slot can hang forever if warp_to_slot fails; update the async function wait_for_slot to enforce a maximum wait (e.g., via tokio::time::timeout or an elapsed timer with max_duration/max_attempts) when repeatedly calling rpc.get_slot(), and return an error or explicitly fail the test when the timeout is reached; keep references to rpc.warp_to_slot and rpc.get_slot so the polling logic is wrapped in the timeout and the function signature is adjusted to propagate or assert a failure instead of blocking indefinitely.
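The bounded-polling pattern the fix above applies is generic: poll, but race the poll loop against a deadline. A minimal language-agnostic sketch (hypothetical `wait_until` helper, not forester code):

```python
import time


def wait_until(predicate, timeout_s: float, poll_s: float = 0.01) -> bool:
    """Poll predicate until it returns True; give up once timeout_s elapses.

    Returns True if the condition was met, False on timeout, so the caller
    can fail the test with a clear message instead of hanging forever.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_s)
    return False
```

Using a monotonic clock keeps the deadline immune to wall-clock adjustments; the caller asserts on the boolean, which mirrors the `assert!(start.elapsed() < max_wait, ...)` in the Rust fix.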
78-89: ⚠️ Potential issue | 🟡 Minor
Add explicit prover readiness check after init().
While this test sets use_surfpool: true to gate on CLI startup completion, the error log in the test comments shows the prover still failed to connect at runtime ("error trying to connect: dns error: task 145 was cancelled"). This indicates the implicit wait is insufficient.
Compare this to forester/tests/e2e_test.rs (lines 266–280), which uses an explicit spawn_prover().await call after init, despite also calling init(). Add the same explicit readiness gate here to avoid flakiness under load:
init(Some(LightValidatorConfig {
    enable_indexer: true,
    enable_prover: true,
    wait_time: 30,
    // ...
}))
.await;
spawn_prover().await; // Explicitly wait for prover health
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/tests/legacy/batched_state_async_indexer_test.rs` around lines 78 - 89, The test currently relies on init(Some(LightValidatorConfig { enable_prover: true, use_surfpool: true, ... })).await but that implicit wait is flaky; after calling init(...) add an explicit spawn_prover().await call (the same readiness gate used in forester/tests/e2e_test.rs) immediately after init to wait for the prover health before proceeding, ensuring the prover has connected and avoiding DNS/connection cancellation errors.
sdk-libs/client/src/local_test_validator.rs (1)
40-41: ⚠️ Potential issue | 🟠 Major
Documentation example doesn't match implementation behavior.
The comment at line 40 shows "--account <ADDRESS> <FILEPATH>" as a single string, but the implementation at lines 94–95 passes each Vec<String> entry directly as a single CLI argument. Multi-part values in one string won't be parsed correctly. Users should provide pre-tokenized arguments instead: vec!["--account", "<ADDRESS>", "<FILEPATH>"].
Update the documentation to reflect correct usage:
Suggested doc clarification
- /// Additional arguments to pass to the validator (e.g., "--account <ADDRESS> <FILEPATH>")
+ /// Additional validator CLI arguments (already tokenized). E.g.: vec!["--account", "<ADDRESS>", "<FILEPATH>"]
  pub validator_args: Vec<String>,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/client/src/local_test_validator.rs` around lines 40 - 41, The doc comment for the validator_args field is misleading: it shows "--account <ADDRESS> <FILEPATH>" as one string but the code expects each Vec<String> entry to be a separate CLI token; update the documentation for the validator_args field (symbol: validator_args) to show pre-tokenized usage such as vec!["--account", "<ADDRESS>", "<FILEPATH>"] and explicitly state that each Vec element becomes one CLI argument so multi-part flags must be split into separate entries.
♻️ Duplicate comments (6)
sdk-libs/client/src/interface/initialize_config.rs (2)
11-11: ⚠️ Potential issue | 🟠 Major
Use Anchor Discriminator as the source of truth for instruction bytes.
Line 11 + Lines 120-124 still rely on a constant-backed discriminator pipeline. Pull the 8-byte discriminator from the instruction type via anchor_lang::Discriminator to prevent drift from canonical Anchor values.
#!/bin/bash
set -euo pipefail
# Verify hardcoded discriminator constants in client interface modules
rg -nP --type rust 'pub const .*DISCRIMINATOR.*=\s*\[[0-9,\s]+\]' sdk-libs/client/src/interface
# Verify whether Anchor Discriminator trait usage exists in these modules
rg -nP --type rust 'Discriminator::DISCRIMINATOR|::discriminator\(' sdk-libs/client/src/interface
Expected result:
- First command should return the literal constant definition(s).
- Second command should show no usage in this path today; after fix, it should include the canonical Discriminator lookup.
As per coding guidelines, {programs,sdk-libs}/**/src/**/*.rs: Use anchor_lang::Discriminator trait to get 8-byte instruction discriminators from Anchor programs.
Also applies to: 120-124
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/client/src/interface/initialize_config.rs` at line 11, Replace the hardcoded INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR constant usage with the Anchor Discriminator for the instruction type: import anchor_lang::Discriminator and call the instruction struct/type's discriminator (e.g., YourInstructionType::discriminator()) wherever INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR is referenced (including the block around lines 120-124) so the 8-byte discriminator is sourced from Anchor instead of a constant; remove the constant definition and update any comparisons/constructions to use the returned discriminator byte array from the Discriminator trait.
7-7: ⚠️ Potential issue | 🟠 Major
Remove cross-SDK DTO dependency in this instruction path.
Line 7 imports light_account::InitializeLightConfigParams, and Lines 103-114 serialize it directly. This keeps sdk-libs/client coupled to a sibling SDK crate and violates the SDK dependency boundary.
As per coding guidelines, sdk-libs/**/src/**/*.rs: SDK-libs must depend on program-libs, light-prover-client and external crates only.
Also applies to: 103-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/client/src/interface/initialize_config.rs` at line 7, Remove the direct dependency on light_account::InitializeLightConfigParams by deleting the import and stop serializing that external DTO; instead define and use a local DTO (e.g., InitializeLightConfigDto) in this crate that mirrors only the needed fields, map/copy values from incoming parameters into that local struct, and serialize the local DTO in place of InitializeLightConfigParams (replace the serialization at the block that currently uses InitializeLightConfigParams around lines 103-114). Ensure no other references to InitializeLightConfigParams remain in initialize_config.rs and update any function signatures or helper methods that referenced that type to accept primitives or the new local DTO so the sdk-libs crate no longer depends on the sibling light_account DTO.
sdk-libs/program-test/src/indexer/test_indexer.rs (1)
98-111: ⚠️ Potential issue | 🟠 Major
Parse wrapped prover responses in build_compressed_proof.
Line 98 still assumes body is raw gnark proof JSON. If prover returns { "proof": ..., "proof_duration_ms": ... }, this fails after HTTP 200 and breaks validity-proof generation.
🔧 Proposed fix
+#[derive(serde::Deserialize)]
+struct ProofWithTimingJson {
+    proof: serde_json::Value,
+}
+
 fn build_compressed_proof(body: &str) -> Result<CompressedProof, IndexerError> {
-    let proof_json = deserialize_gnark_proof_json(body)
+    let proof_body = serde_json::from_str::<ProofWithTimingJson>(body)
+        .map(|wrapped| wrapped.proof.to_string())
+        .unwrap_or_else(|_| body.to_string());
+    let proof_json = deserialize_gnark_proof_json(&proof_body)
         .map_err(|error| IndexerError::CustomError(error.to_string()))?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 98 - 111, The build_compressed_proof function currently assumes body is raw gnark proof JSON; update it to accept both raw proof JSON and a wrapped response object like { "proof": <proof>, "proof_duration_ms": ... } by detecting and extracting the inner "proof" field (handle cases where "proof" is a JSON object or a JSON string) before calling deserialize_gnark_proof_json, then continue with proof_from_json_struct and compress_proof as before; modify logic in build_compressed_proof to parse the wrapper, extract the inner proof string/object, and only pass that inner proof to deserialize_gnark_proof_json so existing helpers (deserialize_gnark_proof_json, proof_from_json_struct, compress_proof) can be reused.
prover/client/src/proof.rs (1)
72-83: ⚠️ Potential issue | 🟠 Major
Add a pre-parse size guard before bigint parsing.
Line 77 parses the whole string first, and only then checks byte length. Very large hex inputs can still trigger expensive parse work before rejection.
💡 Proposed fix
pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
    let trimmed_str = hex_str
        .strip_prefix("0x")
        .or_else(|| hex_str.strip_prefix("0X"))
        .unwrap_or(hex_str);
+   let normalized = trimmed_str.trim_start_matches('0');
+   if normalized.len() > 64 {
+       return Err(ProverClientError::InvalidHexString(format!(
+           "{hex_str}: exceeds 32 bytes"
+       )));
+   }
+   if normalized.is_empty() {
+       return Ok([0u8; 32]);
+   }
-   let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)
+   let big_uint = num_bigint::BigUint::from_str_radix(normalized, 16)
        .map_err(|error| ProverClientError::InvalidHexString(format!("{hex_str}: {error}")))?;
Verification (confirm parse currently happens before length guard):
#!/bin/bash
set -euo pipefail
rg -n 'deserialize_hex_string_to_be_bytes|from_str_radix|exceeds 32 bytes' prover/client/src/proof.rs -A6 -B6
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof.rs` around lines 72 - 83, The function deserialize_hex_string_to_be_bytes currently parses the entire hex into a BigUint before checking size; add a pre-parse guard in deserialize_hex_string_to_be_bytes that inspects the trimmed hex string (after removing "0x"/"0X") and rejects any input whose hex digit length is greater than 64 (i.e., would exceed 32 bytes) with the same ProverClientError::InvalidHexString message, returning early before calling num_bigint::BigUint::from_str_radix; keep the existing flow for inputs <=64 hex chars so the subsequent parsing and byte-length check remain unchanged.
prover/client/src/proof_types/non_inclusion/v2/json.rs (1)
27-29: ⚠️ Potential issue | 🟠 Major
Client/server pathIndex width is still inconsistent.
Line 28 and Line 54+ now allow u64, but prover server v2 snippets still deserialize pathIndex into uint32 (prover/server/prover/v2/marshal_non_inclusion.go, prover/server/prover/v2/non_inclusion_proving_system.go).
That means indices above u32::MAX will fail at the service boundary.
💡 Compatibility guard (until server contract is widened)
let prof_input = NonInclusionJsonStruct {
    root: big_int_to_string(&input.root),
    value: big_int_to_string(&input.value),
-   path_index: input
+   path_index: input
        .index_hashed_indexed_element_leaf
        .to_u64()
        .ok_or_else(|| {
            ProverClientError::IntegerConversion(format!(
                "failed to convert path index {} to u64",
                input.index_hashed_indexed_element_leaf
            ))
-       })?,
+       })
+       .and_then(|v| {
+           if v > u32::MAX as u64 {
+               Err(ProverClientError::IntegerConversion(format!(
+                   "path index {} exceeds prover-server uint32 contract",
+                   v
+               )))
+           } else {
+               Ok(v)
+           }
+       })?,
Run this to verify the server-side contract type in this branch:
#!/bin/bash
set -euo pipefail
rg -nP 'type\s+NonInclusionInputs\s+struct|PathIndex\s+u?int(32|64)' \
  prover/server/prover/v2/non_inclusion_proving_system.go -C2
rg -nP 'type\s+NonInclusionProofInputsJSON\s+struct|PathIndex\s+u?int(32|64)' \
  prover/server/prover/v2/marshal_non_inclusion.go -C2
Also applies to: 54-62
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof_types/non_inclusion/v2/json.rs` around lines 27 - 29, The client JSON struct's path_index is typed as u64 but the prover v2 server expects a 32-bit index (PathIndex / NonInclusionProofInputsJSON use uint32), so change the JSON field type (path_index) in prover/client/src/proof_types/non_inclusion/v2/json.rs from u64 to u32 to match the server contract; if you need to accept larger values temporarily, add explicit validation in the deserialization/constructor for the containing struct to reject values > u32::MAX with a clear error instead of silently truncating.
forester/src/epoch_manager.rs (1)
1399-1432: ⚠️ Potential issue | 🟠 Major
Await epoch-matched new-tree workers before reporting/cleanup.
process_epoch() reports work (Line 1399+) and clears epoch maps (Line 1423+) without waiting for NewTreeWorker tasks spawned in add_new_tree() (Line 943+). A late worker can still call update_metrics_and_counts() after reporting/cleanup, causing under-reporting and re-created per-epoch state.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 1399 - 1432, process_epoch() is clearing per-epoch state and reporting before NewTreeWorker tasks (spawned by add_new_tree()) finish, allowing late workers to call update_metrics_and_counts() after cleanup; fix this by ensuring process_epoch() awaits completion of all NewTreeWorker tasks for the same epoch before sending final reports and removing maps: locate where add_new_tree() spawns NewTreeWorker tasks and store their JoinHandles (or register them in a per-epoch collection like processing_epochs/new_tree_workers keyed by epoch), then in process_epoch() wait/join/await all handles whose key == epoch (or poll a per-epoch completion Notify/Future that NewTreeWorker pushes when done) prior to calling send_work_report(), report_work_onchain(), and removing registration_trackers/processing_epochs/processed_items_per_epoch_count/processing_metrics_per_epoch; ensure update_metrics_and_counts() no longer races by only cleaning state after those awaits complete.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@forester/src/epoch_manager.rs`:
- Around line 477-480: register_new_tree_worker currently always pushes a new
worker even after shutdown; change it to first check the epoch manager's
shutdown state and refuse/ignore the new worker if shutdown is requested
(instead of pushing). Specifically, inside register_new_tree_worker (which
already calls reap_finished_new_tree_workers and accesses new_tree_workers),
query the same shutdown indicator used by shutdown_new_tree_workers/run (e.g., a
shutdown_requested flag or similar Atomic/Mutex field) while holding appropriate
synchronization, and return early without pushing the worker if shutdown is set;
ensure the check happens before new_tree_workers.lock().await.push(...) so late
registrations cannot escape shutdown.
In `@prover/client/src/helpers.rs`:
- Around line 64-75: The code silently truncates path_index bits when iterating
over path_elements; before the loop validate that path_index fits within the
tree height (i.e., that no bits above path_elements.len() or the constant HEIGHT
are set) and return an Err if it does. Locate the variables/path used in this
function (path_index, path_elements, current_index, HEIGHT, changelog_entry) and
add a pre-loop check that fails when path_index >= (1 << path_elements.len()) or
when path_index has any 1-bits beyond HEIGHT, so the function returns an error
instead of computing an incorrect root/changelog path.
In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 125-132: The validator process check using child.try_wait() must
treat any early exit (Some(status)) as failure even when status.success() is
true; modify the logic around tokio::time::sleep(... config.wait_time) and the
child.try_wait() call so that if try_wait() returns Some(status) you always fail
the test (e.g., panic/assert) with a clear message including the status, rather
than only asserting on non-successful status; update the branch that currently
inspects status.success() to instead unconditionally fail on Some(status) and
include the status details in the error message to aid debugging.
---
Outside diff comments:
In `@forester/tests/legacy/batched_state_async_indexer_test.rs`:
- Around line 300-317: The fallback polling in wait_for_slot can hang forever if
warp_to_slot fails; update the async function wait_for_slot to enforce a maximum
wait (e.g., via tokio::time::timeout or an elapsed timer with
max_duration/max_attempts) when repeatedly calling rpc.get_slot(), and return an
error or explicitly fail the test when the timeout is reached; keep references
to rpc.warp_to_slot and rpc.get_slot so the polling logic is wrapped in the
timeout and the function signature is adjusted to propagate or assert a failure
instead of blocking indefinitely.
- Around line 78-89: The test currently relies on init(Some(LightValidatorConfig
{ enable_prover: true, use_surfpool: true, ... })).await but that implicit wait
is flaky; after calling init(...) add an explicit spawn_prover().await call (the
same readiness gate used in forester/tests/e2e_test.rs) immediately after init
to wait for the prover health before proceeding, ensuring the prover has
connected and avoiding DNS/connection cancellation errors.
In `@prover/client/src/proof_types/batch_append/proof_inputs.rs`:
- Line 193: The code currently uses merkle_proof.try_into().unwrap() which can
panic on invalid length; change this to a fallible conversion that returns a
ProverClientError instead (mirror the pattern used in
BatchAppendsCircuitInputs::new()). Replace the unwrap with something like
merkle_proof.try_into().map_err(|_| ProverClientError::...)? and propagate the
Result so the enclosing function returns Err on conversion failure; reference
the merkle_proof_array variable and the BatchAppendsCircuitInputs::new()
implementation for the exact mapping/variant to use.
In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 40-41: The doc comment for the validator_args field is misleading:
it shows "--account <ADDRESS> <FILEPATH>" as one string but the code expects
each Vec<String> entry to be a separate CLI token; update the documentation for
the validator_args field (symbol: validator_args) to show pre-tokenized usage
such as vec!["--account", "<ADDRESS>", "<FILEPATH>"] and explicitly state that
each Vec element becomes one CLI argument so multi-part flags must be split into
separate entries.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 719-733: The AddressQueueData being constructed sets
tree_next_insertion_index incorrectly to address_tree_bundle.right_most_index();
update the assignment so tree_next_insertion_index uses the next available slot
by adding one: set tree_next_insertion_index to
address_tree_bundle.right_most_index() as u64 + 1 (leave other fields like
initial_root and subtrees unchanged) so subsequent leaf insertion/proof logic
uses the correct insertion index.
---
Duplicate comments:
In `@forester/src/epoch_manager.rs`:
- Around line 1399-1432: process_epoch() is clearing per-epoch state and
reporting before NewTreeWorker tasks (spawned by add_new_tree()) finish,
allowing late workers to call update_metrics_and_counts() after cleanup; fix
this by ensuring process_epoch() awaits completion of all NewTreeWorker tasks
for the same epoch before sending final reports and removing maps: locate where
add_new_tree() spawns NewTreeWorker tasks and store their JoinHandles (or
register them in a per-epoch collection like processing_epochs/new_tree_workers
keyed by epoch), then in process_epoch() wait/join/await all handles whose key
== epoch (or poll a per-epoch completion Notify/Future that NewTreeWorker pushes
when done) prior to calling send_work_report(), report_work_onchain(), and
removing
registration_trackers/processing_epochs/processed_items_per_epoch_count/processing_metrics_per_epoch;
ensure update_metrics_and_counts() no longer races by only cleaning state after
those awaits complete.
In `@prover/client/src/proof_types/non_inclusion/v2/json.rs`:
- Around line 27-29: The client JSON struct's path_index is typed as u64 but the
prover v2 server expects a 32-bit index (PathIndex / NonInclusionProofInputsJSON
use uint32), so change the JSON field type (path_index) in
prover/client/src/proof_types/non_inclusion/v2/json.rs from u64 to u32 to match
the server contract; if you need to accept larger values temporarily, add
explicit validation in the deserialization/constructor for the containing struct
to reject values > u32::MAX with a clear error instead of silently truncating.
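The reject-instead-of-truncate behavior asked for above can be sketched with std's `u32::try_from`. The names `checked_path_index` and `PathIndexError` are illustrative only, not the crate's actual API:

```rust
// Hypothetical error carrying the rejected value.
#[derive(Debug, PartialEq)]
struct PathIndexError(u64);

// Narrow a u64 path index to the u32 the prover server expects,
// failing loudly instead of silently truncating.
fn checked_path_index(raw: u64) -> Result<u32, PathIndexError> {
    u32::try_from(raw).map_err(|_| PathIndexError(raw))
}

fn main() {
    assert_eq!(checked_path_index(42), Ok(42));
    // 2^32 does not fit in u32, so it is rejected rather than truncated to 0.
    assert_eq!(checked_path_index(1u64 << 32), Err(PathIndexError(1u64 << 32)));
}
```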
In `@prover/client/src/proof.rs`:
- Around line 72-83: The function deserialize_hex_string_to_be_bytes currently
parses the entire hex into a BigUint before checking size; add a pre-parse guard
in deserialize_hex_string_to_be_bytes that inspects the trimmed hex string
(after removing "0x"/"0X") and rejects any input whose hex digit length is
greater than 64 (i.e., would exceed 32 bytes) with the same
ProverClientError::InvalidHexString message, returning early before calling
num_bigint::BigUint::from_str_radix; keep the existing flow for inputs <=64 hex
chars so the subsequent parsing and byte-length check remain unchanged.
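A minimal sketch of the pre-parse guard the prompt describes, with a plain `String` error standing in for `ProverClientError::InvalidHexString`; `guard_hex_len` is a hypothetical helper name:

```rust
// Reject over-long hex before any big-integer parsing happens.
// 64 hex digits encode exactly 32 bytes; anything longer cannot fit.
fn guard_hex_len(hex_str: &str) -> Result<&str, String> {
    let trimmed = hex_str
        .strip_prefix("0x")
        .or_else(|| hex_str.strip_prefix("0X"))
        .unwrap_or(hex_str);
    if trimmed.len() > 64 {
        return Err(format!("invalid hex string: {hex_str} exceeds 32 bytes"));
    }
    Ok(trimmed)
}

fn main() {
    assert_eq!(guard_hex_len("0xdeadbeef"), Ok("deadbeef"));
    // 65 hex digits would need 33 bytes: rejected before parsing.
    assert!(guard_hex_len(&"f".repeat(65)).is_err());
}
```

Because the check is a simple length comparison, adversarial megabyte-long inputs are rejected in O(1) work before `BigUint::from_str_radix` ever runs.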
In `@sdk-libs/client/src/interface/initialize_config.rs`:
- Line 11: Replace the hardcoded INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR
constant usage with the Anchor Discriminator for the instruction type: import
anchor_lang::Discriminator and call the instruction struct/type's discriminator
(e.g., YourInstructionType::discriminator()) wherever
INITIALIZE_COMPRESSION_CONFIG_DISCRIMINATOR is referenced (including the block
around lines 120-124) so the 8-byte discriminator is sourced from Anchor instead
of a constant; remove the constant definition and update any
comparisons/constructions to use the returned discriminator byte array from the
Discriminator trait.
- Line 7: Remove the direct dependency on
light_account::InitializeLightConfigParams by deleting the import and stop
serializing that external DTO; instead define and use a local DTO (e.g.,
InitializeLightConfigDto) in this crate that mirrors only the needed fields,
map/copy values from incoming parameters into that local struct, and serialize
the local DTO in place of InitializeLightConfigParams (replace the serialization
at the block that currently uses InitializeLightConfigParams around lines
103-114). Ensure no other references to InitializeLightConfigParams remain in
initialize_config.rs and update any function signatures or helper methods that
referenced that type to accept primitives or the new local DTO so the sdk-libs
crate no longer depends on the sibling light_account DTO.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 98-111: The build_compressed_proof function currently assumes body
is raw gnark proof JSON; update it to accept both raw proof JSON and a wrapped
responder object like { "proof": <proof>, "proof_duration_ms": ... } by
detecting and extracting the inner "proof" field (handle cases where "proof" is
a JSON object or a JSON string) before calling deserialize_gnark_proof_json,
then continue with proof_from_json_struct and compress_proof as before; modify
logic in build_compressed_proof to parse the wrapper, extract the inner proof
string/object, and only pass that inner proof to deserialize_gnark_proof_json so
existing helpers (deserialize_gnark_proof_json, proof_from_json_struct,
compress_proof) can be reused.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 408ea586-b76b-47c2-9158-adaad7ae4063
⛔ Files ignored due to path filters (4)
program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs is excluded by none and included by none
sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs is excluded by none and included by none
sdk-tests/sdk-native-test/tests/test.rs is excluded by none and included by none
📒 Files selected for processing (19)
forester/src/epoch_manager.rs
forester/src/forester_status.rs
forester/src/metrics.rs
forester/src/priority_fee.rs
forester/src/processor/v2/proof_worker.rs
forester/tests/e2e_test.rs
forester/tests/legacy/batched_state_async_indexer_test.rs
forester/tests/test_batch_append_spent.rs
prover/client/src/helpers.rs
prover/client/src/proof.rs
prover/client/src/proof_types/batch_address_append/json.rs
prover/client/src/proof_types/batch_append/proof_inputs.rs
prover/client/src/proof_types/combined/v2/json.rs
prover/client/src/proof_types/non_inclusion/v2/json.rs
prover/client/tests/batch_address_append.rs
prover/client/tests/init_merkle_tree.rs
sdk-libs/client/src/interface/initialize_config.rs
sdk-libs/client/src/local_test_validator.rs
sdk-libs/program-test/src/indexer/test_indexer.rs
async fn register_new_tree_worker(&self, worker: NewTreeWorker) {
    self.reap_finished_new_tree_workers().await;
    self.new_tree_workers.lock().await.push(worker);
}
Block post-shutdown worker registration after drain.
register_new_tree_worker() at Line 477 always appends, even after shutdown is requested. run() drains/cancels workers once in shutdown_new_tree_workers() (Line 482 onward), so a late registration can escape shutdown and outlive service exit.
Suggested fix
async fn register_new_tree_worker(&self, worker: NewTreeWorker) {
+ if self.shutdown_requested.load(Ordering::Acquire) {
+ let mut worker = worker;
+ if let Some(cancel) = worker.cancel.take() {
+ let _ = cancel.send(());
+ }
+ self.detach_new_tree_worker_join(worker);
+ return;
+ }
self.reap_finished_new_tree_workers().await;
self.new_tree_workers.lock().await.push(worker);
}

Also applies to: 482-537
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@forester/src/epoch_manager.rs` around lines 477 - 480,
register_new_tree_worker currently always pushes a new worker even after
shutdown; change it to first check the epoch manager's shutdown state and
refuse/ignore the new worker if shutdown is requested (instead of pushing).
Specifically, inside register_new_tree_worker (which already calls
reap_finished_new_tree_workers and accesses new_tree_workers), query the same
shutdown indicator used by shutdown_new_tree_workers/run (e.g., a
shutdown_requested flag or similar Atomic/Mutex field) while holding appropriate
synchronization, and return early without pushing the worker if shutdown is set;
ensure the check happens before new_tree_workers.lock().await.push(...) so late
registrations cannot escape shutdown.
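The shutdown gate described above can be sketched synchronously. `Registry` below is a simplified stand-in: the real `EpochManager` uses a tokio mutex and a oneshot cancel channel for the rejected worker, both omitted here.

```rust
use std::sync::{
    atomic::{AtomicBool, Ordering},
    Mutex,
};

// Simplified registry: registration checks a shared shutdown flag before
// pushing, so late registrations cannot escape a one-shot drain.
struct Registry {
    shutdown_requested: AtomicBool,
    workers: Mutex<Vec<String>>, // stand-in for Vec<NewTreeWorker>
}

impl Registry {
    fn register(&self, worker: String) -> bool {
        if self.shutdown_requested.load(Ordering::Acquire) {
            // The real fix would also fire the worker's cancel channel here.
            return false;
        }
        self.workers.lock().unwrap().push(worker);
        true
    }

    fn shutdown(&self) {
        self.shutdown_requested.store(true, Ordering::Release);
        self.workers.lock().unwrap().clear(); // one-shot drain
    }
}

fn main() {
    let registry = Registry {
        shutdown_requested: AtomicBool::new(false),
        workers: Mutex::new(Vec::new()),
    };
    assert!(registry.register("tree-1".into()));
    registry.shutdown();
    // A late registration after shutdown is refused instead of leaking.
    assert!(!registry.register("tree-2".into()));
    assert!(registry.workers.lock().unwrap().is_empty());
}
```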
let mut current_index = path_index;
for (level, path_element) in path_elements.iter().enumerate() {
    changelog_entry.path[level] = Some(current_hash);
    if current_index.is_multiple_of(2) {
-       current_hash = Poseidon::hashv(&[&current_hash, path_element]).unwrap();
+       current_hash = Poseidon::hashv(&[&current_hash, path_element])?;
    } else {
-       current_hash = Poseidon::hashv(&[path_element, &current_hash]).unwrap();
+       current_hash = Poseidon::hashv(&[path_element, &current_hash])?;
    }
    current_index /= 2;
}

-   (current_hash, changelog_entry)
+   Ok((current_hash, changelog_entry))
Add explicit path-index bounds validation for tree height.
Line 64 accepts usize indices, but the loop only consumes HEIGHT bits. Indices with higher bits are silently truncated, which can compute the wrong root/changelog path without error.
💡 Proposed fix
pub fn compute_root_from_merkle_proof<const HEIGHT: usize>(
leaf: [u8; 32],
path_elements: &[[u8; 32]; HEIGHT],
path_index: usize,
) -> Result<([u8; 32], ChangelogEntry<HEIGHT>), ProverClientError> {
let mut changelog_entry = ChangelogEntry::default_with_index(path_index);
let mut current_hash = leaf;
let mut current_index = path_index;
for (level, path_element) in path_elements.iter().enumerate() {
changelog_entry.path[level] = Some(current_hash);
if current_index.is_multiple_of(2) {
        current_hash = Poseidon::hashv(&[&current_hash, path_element])?;
    } else {
        current_hash = Poseidon::hashv(&[path_element, &current_hash])?;
}
current_index /= 2;
}
+ if current_index != 0 {
+ return Err(ProverClientError::InvalidProofData(format!(
+ "path index {} exceeds tree height {}",
+ path_index, HEIGHT
+ )));
+ }
Ok((current_hash, changelog_entry))
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@prover/client/src/helpers.rs` around lines 64 - 75, The code silently
truncates path_index bits when iterating over path_elements; before the loop
validate that path_index fits within the tree height (i.e., that no bits above
path_elements.len() or the constant HEIGHT are set) and return an Err if it
does. Locate the variables/path used in this function (path_index,
path_elements, current_index, HEIGHT, changelog_entry) and add a pre-loop check
that fails when path_index >= (1 << path_elements.len()) or when path_index has
any 1-bits beyond HEIGHT, so the function returns an error instead of computing
an incorrect root/changelog path.
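The bounds check can be sketched by consuming one index bit per level, exactly as the prompt suggests; `check_path_index` is a hypothetical helper with a `String` error for brevity:

```rust
// After consuming HEIGHT bits of the index (one per tree level), any
// remaining nonzero bits mean the index was too large for the tree.
fn check_path_index<const HEIGHT: usize>(path_index: usize) -> Result<(), String> {
    let mut current_index = path_index;
    for _level in 0..HEIGHT {
        current_index /= 2; // one bit consumed per level, as in the real loop
    }
    if current_index != 0 {
        return Err(format!(
            "path index {} exceeds tree height {}",
            path_index, HEIGHT
        ));
    }
    Ok(())
}

fn main() {
    // A height-3 tree has 8 leaves, so indices 0..=7 are valid.
    assert!(check_path_index::<3>(7).is_ok());
    assert!(check_path_index::<3>(8).is_err());
}
```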
tokio::time::sleep(tokio::time::Duration::from_secs(config.wait_time)).await;
if let Some(status) = child.try_wait().expect("Failed to poll validator process") {
    assert!(
        status.success(),
        "Validator exited early with error: {}",
        status
    );
}
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify expected lifecycle of non-surfpool `test-validator`.
cat -n cli/test_bin/run | sed -n '1,260p'
rg -nP -C3 'test-validator|surfpool|solana-test-validator|nohup|disown|&' cli/test_bin/run

Repository: Lightprotocol/light-protocol
Length of output: 212
🏁 Script executed:
#!/bin/bash
# Find and examine the local_test_validator.rs file
fd -type f -name "local_test_validator.rs" sdk-libs/

Repository: Lightprotocol/light-protocol
Length of output: 242
🏁 Script executed:
#!/bin/bash
# Get the file size first
wc -l sdk-libs/client/src/local_test_validator.rs

Repository: Lightprotocol/light-protocol
Length of output: 119
🏁 Script executed:
#!/bin/bash
# Examine the critical section and surrounding context (lines 110-145)
sed -n '110,145p' sdk-libs/client/src/local_test_validator.rs | cat -n

Repository: Lightprotocol/light-protocol
Length of output: 1309
🏁 Script executed:
#!/bin/bash
# Search for spawn_validator function definition and surfpool-related logic
rg -n "fn spawn_validator|surfpool|use_surfpool" sdk-libs/client/src/local_test_validator.rs -B2 -A8

Repository: Lightprotocol/light-protocol
Length of output: 2026
🏁 Script executed:
#!/bin/bash
# Read the full spawn_validator function (lines 59-132)
sed -n '59,132p' sdk-libs/client/src/local_test_validator.rs | cat -n

Repository: Lightprotocol/light-protocol
Length of output: 3535
🏁 Script executed:
#!/bin/bash
# Search for spawn_validator usage in tests to understand expected behavior
rg -n "spawn_validator" sdk-libs/client/ -B3 -A10 --type rs

Repository: Lightprotocol/light-protocol
Length of output: 98
🏁 Script executed:
#!/bin/bash
# Check if there's any documentation or comments about validator lifecycle
rg -n "long-running|background|keep.*running|exit" sdk-libs/client/src/local_test_validator.rs

Repository: Lightprotocol/light-protocol
Length of output: 300
🏁 Script executed:
#!/bin/bash
# Look at the CLI implementation to understand what test-validator does
fd "run" cli/test_bin/ -type f

Repository: Lightprotocol/light-protocol
Length of output: 242
🏁 Script executed:
#!/bin/bash
# Search for spawn_validator usage in tests to understand expected behavior
rg -n "spawn_validator" sdk-libs/client/ -B3 -A10

Repository: Lightprotocol/light-protocol
Length of output: 4992
🏁 Script executed:
#!/bin/bash
# Check the CLI test-validator implementation
cat -n cli/test_bin/run | head -20

Repository: Lightprotocol/light-protocol
Length of output: 212
🏁 Script executed:
#!/bin/bash
# Search for test files that use spawn_validator
fd spawn_validator sdk-libs/ | head -20

Repository: Lightprotocol/light-protocol
Length of output: 54
🏁 Script executed:
#!/bin/bash
# Look for test files using spawn_validator in sdk-libs
find sdk-libs -type f -name "*test*" -o -name "*tests*" | head -20

Repository: Lightprotocol/light-protocol
Length of output: 1220
🏁 Script executed:
#!/bin/bash
# Search in sdk-tests for spawn_validator usage
rg -l "spawn_validator" sdk-tests/ 2>/dev/null | head -10

Repository: Lightprotocol/light-protocol
Length of output: 115
🏁 Script executed:
#!/bin/bash
# Look at examples or integration tests
find sdk-libs/client -type f \( -name "*.rs" \) -exec grep -l "spawn_validator" {} \;

Repository: Lightprotocol/light-protocol
Length of output: 142
🏁 Script executed:
#!/bin/bash
# Examine the actual test that uses spawn_validator
cat -n sdk-tests/client-test/tests/light_client.rs | head -80

Repository: Lightprotocol/light-protocol
Length of output: 3857
🏁 Script executed:
#!/bin/bash
# Look for how spawn_validator is called in the test
rg -n "spawn_validator" sdk-tests/client-test/tests/light_client.rs -B5 -A15

Repository: Lightprotocol/light-protocol
Length of output: 1961
🏁 Script executed:
#!/bin/bash
# Check if there are other tests or uses of non-surfpool mode
rg -n "use_surfpool.*false" sdk-libs/ sdk-tests/

Repository: Lightprotocol/light-protocol
Length of output: 54
🏁 Script executed:
#!/bin/bash
# Look for any issues or comments about validator exit behavior
rg -n "exit|long.running|background" sdk-libs/client/src/local_test_validator.rs -C2

Repository: Lightprotocol/light-protocol
Length of output: 839
🏁 Script executed:
#!/bin/bash
# Check if solana-test-validator or the CLI wrapper naturally exits
# Look for any solana-test-validator documentation or similar processes
rg -n "solana-test-validator" --no-heading | head -20

Repository: Lightprotocol/light-protocol
Length of output: 1714
🏁 Script executed:
#!/bin/bash
# Verify: is the validator meant to be a long-running process in non-surfpool mode?
# Check what happens after spawn_validator in actual usage
rg -B2 -A20 "spawn_validator.*await" sdk-tests/ sdk-libs/

Repository: Lightprotocol/light-protocol
Length of output: 4740
Early validator exit in non-surfpool mode should always fail, not just on non-zero exit code.
solana-test-validator is a long-running process that should persist throughout test execution. The current code at line 126–131 only asserts when exit is unsuccessful; if the validator exits with code 0, the test silently continues without a live validator. This allows tests to run against a dead validator, masking real issues.
💡 Proposed fix
- if let Some(status) = child.try_wait().expect("Failed to poll validator process") {
- assert!(
- status.success(),
- "Validator exited early with error: {}",
- status
- );
- }
+ if let Some(status) = child.try_wait().expect("Failed to poll validator process") {
+ panic!("Validator exited before readiness window completed: {}", status);
+ }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/local_test_validator.rs` around lines 125 - 132, The
validator process check using child.try_wait() must treat any early exit
(Some(status)) as failure even when status.success() is true; modify the logic
around tokio::time::sleep(... config.wait_time) and the child.try_wait() call so
that if try_wait() returns Some(status) you always fail the test (e.g.,
panic/assert) with a clear message including the status, rather than only
asserting on non-successful status; update the branch that currently inspects
status.success() to instead unconditionally fail on Some(status) and include the
status details in the error message to aid debugging.
Force-pushed from c97abbd to 0ae954a (Compare)
Actionable comments posted: 2
♻️ Duplicate comments (7)
sdk-libs/client/src/local_test_validator.rs (1)
126-132: ⚠️ Potential issue | 🟠 Major
Fail on any early validator exit in non-surfpool mode.
At Line 126, `Some(status)` means the validator exited during the readiness window; that should fail regardless of `status.success()`, otherwise tests may continue without a live validator.

Proposed fix
- if let Some(status) = child.try_wait().expect("Failed to poll validator process") {
-     assert!(
-         status.success(),
-         "Validator exited early with error: {}",
-         status
-     );
- }
+ if let Some(status) = child.try_wait().expect("Failed to poll validator process") {
+     panic!(
+         "Validator exited before readiness window completed: {}",
+         status
+     );
+ }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/client/src/local_test_validator.rs` around lines 126 - 132, The current readiness check only treats an early exit as failing when status.success() is false; change the logic so any Some(status) returned from child.try_wait() is considered a failure during the readiness window. Locate the block using child.try_wait() in local_test_validator.rs (the if let Some(status) = child.try_wait() ... block) and replace the conditional so that encountering Some(status) always causes a test failure (include the status in the error message) rather than only failing when status.success() is false.

sdk-libs/client/src/interface/initialize_config.rs (1)
7-7: ⚠️ Potential issue | 🟠 Major
Keep `sdk-libs/client` decoupled from sibling SDK crates.

`InitializeLightConfigParams` is imported from `light_account` and used directly for instruction payload assembly. That keeps an SDK-to-SDK dependency in this module; use a local boundary DTO (or a program-lib interface type) and map into it here.

As per coding guidelines, `sdk-libs/**/src/**/*.rs`: SDK-libs must depend on program-libs, light-prover-client and external crates only.

Also applies to: 103-114
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/client/src/interface/initialize_config.rs` at line 7, This module currently imports and uses InitializeLightConfigParams from the sibling crate light_account; remove that direct SDK-to-SDK dependency by defining a local DTO (e.g., InitializeLightConfigDto) or using the program-lib interface type inside initialize_config.rs and map into the program-level type where the instruction payload is assembled (the code around the existing usage of InitializeLightConfigParams at the assembly site referenced in lines ~103-114). Replace the use declaration for InitializeLightConfigParams with the local DTO, update the code that builds the instruction payload to convert/map from your local DTO to the program-lib expected type, and ensure all references to InitializeLightConfigParams in this file are replaced by the new local type and mapping logic.

prover/client/src/proof.rs (1)
72-94:⚠️ Potential issue | 🟠 MajorPre-parse length guard is still missing.
The function parses the entire hex string via
from_str_radix(line 77) before checking the byte length (line 80). For adversarially large inputs, this can cause unnecessary CPU and memory consumption.Add a pre-flight check on the normalized hex string length before parsing.
🛠️ Proposed fix
pub fn deserialize_hex_string_to_be_bytes(hex_str: &str) -> Result<[u8; 32], ProverClientError> {
    let trimmed_str = hex_str
        .strip_prefix("0x")
        .or_else(|| hex_str.strip_prefix("0X"))
        .unwrap_or(hex_str);
+   // Pre-flight check: 64 hex chars = 32 bytes max
+   let normalized = trimmed_str.trim_start_matches('0');
+   if normalized.len() > 64 {
+       return Err(ProverClientError::InvalidHexString(format!(
+           "{hex_str}: exceeds 32 bytes"
+       )));
+   }
    let big_uint = num_bigint::BigUint::from_str_radix(trimmed_str, 16)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof.rs` around lines 72 - 94, The function deserialize_hex_string_to_be_bytes currently calls num_bigint::BigUint::from_str_radix on trimmed_str before validating its length; add a pre-parse guard on the normalized trimmed_str (the value computed by strip_prefix) to reject inputs whose hex length exceeds 64 characters (32 bytes) and return ProverClientError::InvalidHexString immediately, and also normalize odd-length hex strings by left-padding with a '0' before parsing so from_str_radix isn't given malformed input; implement this check right after computing trimmed_str and before calling BigUint::from_str_radix in deserialize_hex_string_to_be_bytes.

prover/client/src/helpers.rs (1)
56-76: ⚠️ Potential issue | 🟠 Major
Path index bounds validation is still missing.

The function now returns `Result`, which is great, but it still silently truncates `path_index` values that exceed the tree height. When `path_index >= (1 << HEIGHT)`, the higher bits are consumed without error, computing an incorrect root.

After the loop completes, if `current_index != 0`, the path index was too large for the tree.

🛠️ Proposed fix
for (level, path_element) in path_elements.iter().enumerate() {
    changelog_entry.path[level] = Some(current_hash);
    if current_index.is_multiple_of(2) {
        current_hash = Poseidon::hashv(&[&current_hash, path_element])?;
    } else {
        current_hash = Poseidon::hashv(&[path_element, &current_hash])?;
    }
    current_index /= 2;
}
+ if current_index != 0 {
+     return Err(ProverClientError::InvalidProofData(format!(
+         "path index {} exceeds tree height {}",
+         path_index, HEIGHT
+     )));
+ }
Ok((current_hash, changelog_entry))

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/helpers.rs` around lines 56 - 76, compute_root_from_merkle_proof currently silently truncates oversized path_index by dividing current_index in the loop; detect this and return an Err when path_index is too large for the tree (i.e. after the loop if current_index != 0). Update compute_root_from_merkle_proof to validate that path_index < (1 << HEIGHT) (or equivalently check current_index == 0 after processing all HEIGHT levels) and return a ProverClientError (create or reuse an appropriate variant) instead of returning an incorrect root and ChangelogEntry; reference variables/function names: compute_root_from_merkle_proof, path_index, current_index, HEIGHT, and ChangelogEntry.

prover/client/src/proof_types/batch_append/proof_inputs.rs (1)
193-193: ⚠️ Potential issue | 🟠 Major
Line 193 still panics on malformed merkle proof lengths.

The preflight checks on lines 131-142 validate array counts, but individual proof heights are not validated. When `merkle_proof` has the wrong number of elements, `try_into().unwrap()` will panic instead of returning a `ProverClientError`.

Note that `BatchAppendsCircuitInputs::new` (lines 64-72) correctly handles this with `map_err`, so this function should follow the same pattern.

🛠️ Proposed fix

- let merkle_proof_array = merkle_proof.try_into().unwrap();
+ let merkle_proof_array: [[u8; 32]; HEIGHT] = merkle_proof.try_into().map_err(|_| {
+     ProverClientError::GenericError(format!(
+         "invalid merkle proof length at index {}: expected {}",
+         i, HEIGHT
+     ))
+ })?;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@prover/client/src/proof_types/batch_append/proof_inputs.rs` at line 193, The code currently calls merkle_proof.try_into().unwrap() which will panic on malformed proof lengths; instead propagate the conversion error into a ProverClientError like BatchAppendsCircuitInputs::new does. Replace the unwrap with try_into().map_err(...) and convert the conversion error into the appropriate ProverClientError (e.g., an InvalidMerkleProofLength/BadInput variant) so merkle_proof_array creation returns a Result rather than panicking; reference the merkle_proof variable, merkle_proof_array binding, the ProverClientError type, and mirror the error-mapping pattern used in BatchAppendsCircuitInputs::new.

forester/src/epoch_manager.rs (1)
477-480: ⚠️ Potential issue | 🟠 Major
Block late worker registration after shutdown is requested.

Line 477 currently accepts new workers even after shutdown starts. Since `shutdown_new_tree_workers` drains only once, late workers can escape cancellation/join and outlive service exit.

💡 Suggested fix
async fn register_new_tree_worker(&self, worker: NewTreeWorker) {
+   if self.shutdown_requested.load(Ordering::Acquire) {
+       let mut worker = worker;
+       if let Some(cancel) = worker.cancel.take() {
+           let _ = cancel.send(());
+       }
+       self.detach_new_tree_worker_join(worker);
+       return;
+   }
    self.reap_finished_new_tree_workers().await;
    self.new_tree_workers.lock().await.push(worker);
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@forester/src/epoch_manager.rs` around lines 477 - 480, register_new_tree_worker currently pushes new workers even after shutdown is initiated, letting late workers escape the single-run drain in shutdown_new_tree_workers; modify register_new_tree_worker to check the shutdown state before accepting a worker (e.g., inspect the same guard/flag or a shared AtomicBool used by shutdown_new_tree_workers) while holding the new_tree_workers lock, and refuse/skip pushing the worker (or return early) when shutdown has been requested so late workers are not added after shutdown_new_tree_workers runs.

sdk-libs/program-test/src/indexer/test_indexer.rs (1)
98-111: ⚠️ Potential issue | 🟠 Major
Unwrap timed prover envelopes before deserializing.

`build_compressed_proof()` still feeds the full 200 OK body into `deserialize_gnark_proof_json()`. If the prover responds with `{ "proof": ..., "proof_duration_ms": ... }`, this helper still fails on an otherwise successful proof request.

🔧 Suggested fix
+ #[derive(serde::Deserialize)]
+ struct ProofEnvelope {
+     proof: serde_json::Value,
+ }
+
fn build_compressed_proof(body: &str) -> Result<CompressedProof, IndexerError> {
-   let proof_json = deserialize_gnark_proof_json(body)
+   let proof_body = serde_json::from_str::<ProofEnvelope>(body)
+       .map(|wrapped| wrapped.proof.to_string())
+       .unwrap_or_else(|_| body.to_owned());
+   let proof_json = deserialize_gnark_proof_json(&proof_body)
        .map_err(|error| IndexerError::CustomError(error.to_string()))?;
    let (proof_a, proof_b, proof_c) = proof_from_json_struct(proof_json)
        .map_err(|error| IndexerError::CustomError(error.to_string()))?;

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 98 - 111, build_compressed_proof currently passes the whole HTTP body (which may be an envelope like { "proof": ..., "proof_duration_ms": ... }) into deserialize_gnark_proof_json; first parse/unwrap that envelope to extract the inner proof payload (the "proof" field) before calling deserialize_gnark_proof_json, then continue with proof_from_json_struct and compress_proof unchanged; update build_compressed_proof to detect the envelope shape (e.g., try deserializing into a small struct with a proof field or check for a top-level "proof" key) and use the inner proof string/object for subsequent deserialization.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 115-116: The current unbounded await on child.wait().await can
hang tests; wrap the wait with tokio::time::timeout using the configured timeout
(e.g., config.wait_time or a surfpool-specific timeout like surfpool_timeout)
when awaiting the CLI child process (the child.wait().await call that produces
status). If the timeout elapses, ensure you kill/terminate the child process
(child.kill() or similar), await its termination, and then fail the test
asserting with a clear timeout error message instead of blocking; otherwise,
continue to assert on status.success() as before.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 2595-2608: The retry loop is being short-circuited by using `?` on
`response.text().await` and `build_compressed_proof(&body)?` so transient
failures never update `last_error` or retry; change the success-handling branch
that inspects `response_result`/`status` to explicitly attempt to read the body
and build the proof inside a match/if let and on any error (body read failure or
proof parsing failure) set `last_error` to an appropriate
`IndexerError::CustomError` with the error details and continue the loop instead
of returning early; only return `Ok(ValidityProofWithContext { accounts:
account_proof_inputs, addresses: address_proof_inputs, proof: ... })` after both
`response.text().await` and `build_compressed_proof(&body)` succeed without
using `?`.
---
Duplicate comments:
In `@forester/src/epoch_manager.rs`:
- Around line 477-480: register_new_tree_worker currently pushes new workers
even after shutdown is initiated, letting late workers escape the single-run
drain in shutdown_new_tree_workers; modify register_new_tree_worker to check the
shutdown state before accepting a worker (e.g., inspect the same guard/flag or a
shared AtomicBool used by shutdown_new_tree_workers) while holding the
new_tree_workers lock, and refuse/skip pushing the worker (or return early) when
shutdown has been requested so late workers are not added after
shutdown_new_tree_workers runs.
In `@prover/client/src/helpers.rs`:
- Around line 56-76: compute_root_from_merkle_proof currently silently truncates
oversized path_index by dividing current_index in the loop; detect this and
return an Err when path_index is too large for the tree (i.e. after the loop if
current_index != 0). Update compute_root_from_merkle_proof to validate that
path_index < (1 << HEIGHT) (or equivalently check current_index == 0 after
processing all HEIGHT levels) and return a ProverClientError (create or reuse an
appropriate variant) instead of returning an incorrect root and ChangelogEntry;
reference variables/function names: compute_root_from_merkle_proof, path_index,
current_index, HEIGHT, and ChangelogEntry.
In `@prover/client/src/proof_types/batch_append/proof_inputs.rs`:
- Line 193: The code currently calls merkle_proof.try_into().unwrap() which will
panic on malformed proof lengths; instead propagate the conversion error into a
ProverClientError like BatchAppendsCircuitInputs::new does. Replace the unwrap
with try_into().map_err(...) and convert the conversion error into the
appropriate ProverClientError (e.g., an InvalidMerkleProofLength/BadInput
variant) so merkle_proof_array creation returns a Result rather than panicking;
reference the merkle_proof variable, merkle_proof_array binding, the
ProverClientError type, and mirror the error-mapping pattern used in
BatchAppendsCircuitInputs::new.
In `@prover/client/src/proof.rs`:
- Around line 72-94: The function deserialize_hex_string_to_be_bytes currently
calls num_bigint::BigUint::from_str_radix on trimmed_str before validating its
length; add a pre-parse guard on the normalized trimmed_str (the value computed
by strip_prefix) to reject inputs whose hex length exceeds 64 characters (32
bytes) and return ProverClientError::InvalidHexString immediately, and also
normalize odd-length hex strings by left-padding with a '0' before parsing so
from_str_radix isn't given malformed input; implement this check right after
computing trimmed_str and before calling BigUint::from_str_radix in
deserialize_hex_string_to_be_bytes.
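A self-contained sketch of the suggested guard and padding, parsing byte pairs with the standard library instead of `BigUint` (the 32-byte output shape and error variant mirror the finding; the parsing strategy is an assumption for illustration):

```rust
#[derive(Debug)]
enum ProverClientError {
    InvalidHexString(String),
}

fn deserialize_hex_string_to_be_bytes(s: &str) -> Result<[u8; 32], ProverClientError> {
    let trimmed = s.strip_prefix("0x").unwrap_or(s);
    // Guard before parsing: more than 64 hex chars cannot fit in 32 bytes.
    if trimmed.len() > 64 {
        return Err(ProverClientError::InvalidHexString(s.to_string()));
    }
    // Left-pad odd-length input so every byte has two nibbles.
    let padded = if trimmed.len() % 2 == 1 {
        format!("0{trimmed}")
    } else {
        trimmed.to_string()
    };
    let mut out = [0u8; 32];
    let byte_len = padded.len() / 2;
    for i in 0..byte_len {
        let chunk = &padded[i * 2..i * 2 + 2];
        let byte = u8::from_str_radix(chunk, 16)
            .map_err(|_| ProverClientError::InvalidHexString(s.to_string()))?;
        // Big-endian: right-align the parsed bytes in the 32-byte output.
        out[32 - byte_len + i] = byte;
    }
    Ok(out)
}
```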
In `@sdk-libs/client/src/interface/initialize_config.rs`:
- Line 7: This module currently imports and uses InitializeLightConfigParams
from the sibling crate light_account; remove that direct SDK-to-SDK dependency
by defining a local DTO (e.g., InitializeLightConfigDto) or using the
program-lib interface type inside initialize_config.rs and map into the
program-level type where the instruction payload is assembled (the code around
the existing usage of InitializeLightConfigParams at the assembly site
referenced in lines ~103-114). Replace the use declaration for
InitializeLightConfigParams with the local DTO, update the code that builds the
instruction payload to convert/map from your local DTO to the program-lib
expected type, and ensure all references to InitializeLightConfigParams in this
file are replaced by the new local type and mapping logic.
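The decoupling pattern can be sketched as a local DTO plus a `From` impl at the payload-assembly site; both structs and their fields here are hypothetical stand-ins, not the real `InitializeLightConfigParams` shape:

```rust
// Hypothetical stand-in for the program-lib interface type.
struct InitializeLightConfigParams {
    authority: [u8; 32],
    fee: u64,
}

// Local DTO owned by initialize_config.rs: callers depend only on this
// type, removing the direct SDK-to-SDK import.
struct InitializeLightConfigDto {
    authority: [u8; 32],
    fee: u64,
}

// Mapping happens once, where the instruction payload is assembled.
impl From<InitializeLightConfigDto> for InitializeLightConfigParams {
    fn from(dto: InitializeLightConfigDto) -> Self {
        Self {
            authority: dto.authority,
            fee: dto.fee,
        }
    }
}
```

The cost is a small amount of duplication; the benefit is that the client crate's public surface no longer changes when the sibling crate's type does.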
In `@sdk-libs/client/src/local_test_validator.rs`:
- Around line 126-132: The current readiness check only treats an early exit as
failing when status.success() is false; change the logic so any Some(status)
returned from child.try_wait() is considered a failure during the readiness
window. Locate the block using child.try_wait() in local_test_validator.rs (the
if let Some(status) = child.try_wait() ... block) and replace the conditional so
that encountering Some(status) always causes a test failure (include the status
in the error message) rather than only failing when status.success() is false.
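The tightened check reduces to: during the readiness window, any `Some(status)` from `try_wait()` is fatal, success or not, because the validator is supposed to still be running. A sketch with `ExitKind` as a stand-in for `std::process::ExitStatus`:

```rust
#[derive(Debug, Clone, Copy)]
enum ExitKind {
    Success,
    Failure(i32),
}

// Before: only a non-success status failed the readiness check.
// After: any early exit fails it, with the status in the error message.
fn readiness_check(try_wait_result: Option<ExitKind>) -> Result<(), String> {
    match try_wait_result {
        Some(status) => Err(format!(
            "validator exited during readiness window: {status:?}"
        )),
        None => Ok(()), // still running, keep polling
    }
}
```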
In `@sdk-libs/program-test/src/indexer/test_indexer.rs`:
- Around line 98-111: build_compressed_proof currently passes the whole HTTP
body (which may be an envelope like { "proof": ..., "proof_duration_ms": ... })
into deserialize_gnark_proof_json; first parse/unwrap that envelope to extract
the inner proof payload (the "proof" field) before calling
deserialize_gnark_proof_json, then continue with proof_from_json_struct and
compress_proof unchanged; update build_compressed_proof to detect the envelope
shape (e.g., try deserializing into a small struct with a proof field or check
for a top-level "proof" key) and use the inner proof string/object for
subsequent deserialization.
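The envelope unwrap can be illustrated with a dependency-free scan; a real implementation would instead deserialize into a small `serde` struct with a `proof` field, so treat this brace-matching version purely as a sketch of the shape detection:

```rust
// If the body is an envelope like {"proof": {...}, "proof_duration_ms": ...},
// return the inner "proof" object; otherwise return the body unchanged.
fn extract_proof_payload(body: &str) -> &str {
    if let Some(key_pos) = body.find("\"proof\"") {
        let after = &body[key_pos + "\"proof\"".len()..];
        if let Some(colon) = after.find(':') {
            let value = after[colon + 1..].trim_start();
            if value.starts_with('{') {
                // Naive brace matching; a JSON parser handles braces inside
                // strings correctly, this sketch does not.
                let mut depth = 0usize;
                for (i, c) in value.char_indices() {
                    match c {
                        '{' => depth += 1,
                        '}' => {
                            depth -= 1;
                            if depth == 0 {
                                return &value[..=i];
                            }
                        }
                        _ => {}
                    }
                }
            }
        }
    }
    body
}
```

The extracted payload then flows into `deserialize_gnark_proof_json`, `proof_from_json_struct`, and `compress_proof` exactly as before.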
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 47ddc44c-796b-49dd-82de-36532cb1bdaf
⛔ Files ignored due to path filters (5)
- program-tests/utils/src/mock_batched_forester.rs is excluded by none and included by none
- sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/read_only.rs is excluded by none and included by none
- sdk-tests/sdk-anchor-test/programs/sdk-anchor-test/tests/test.rs is excluded by none and included by none
- sdk-tests/sdk-native-test/tests/test.rs is excluded by none and included by none
- sdk-tests/sdk-pinocchio-v1-test/tests/test.rs is excluded by none and included by none
📒 Files selected for processing (19)
- forester/src/epoch_manager.rs
- forester/src/forester_status.rs
- forester/src/metrics.rs
- forester/src/priority_fee.rs
- forester/src/processor/v2/proof_worker.rs
- forester/tests/e2e_test.rs
- forester/tests/legacy/batched_state_async_indexer_test.rs
- forester/tests/test_batch_append_spent.rs
- prover/client/src/helpers.rs
- prover/client/src/proof.rs
- prover/client/src/proof_types/batch_address_append/json.rs
- prover/client/src/proof_types/batch_append/proof_inputs.rs
- prover/client/src/proof_types/combined/v2/json.rs
- prover/client/src/proof_types/non_inclusion/v2/json.rs
- prover/client/tests/batch_address_append.rs
- prover/client/tests/init_merkle_tree.rs
- sdk-libs/client/src/interface/initialize_config.rs
- sdk-libs/client/src/local_test_validator.rs
- sdk-libs/program-test/src/indexer/test_indexer.rs
let status = child.wait().await.expect("Failed to wait for CLI process");
assert!(status.success(), "CLI exited with error: {}", status);
Bound surfpool startup wait to prevent indefinite hangs.
At Line 115, child.wait().await is unbounded. If the CLI stalls, tests can hang indefinitely; wrap with tokio::time::timeout (ideally using config.wait_time or a dedicated surfpool timeout).
Proposed fix
- let status = child.wait().await.expect("Failed to wait for CLI process");
+ let status = tokio::time::timeout(
+ tokio::time::Duration::from_secs(config.wait_time),
+ child.wait(),
+ )
+ .await
+ .expect("Timed out waiting for CLI process")
+ .expect("Failed to wait for CLI process");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/client/src/local_test_validator.rs` around lines 115 - 116, The
current unbounded await on child.wait().await can hang tests; wrap the wait with
tokio::time::timeout using the configured timeout (e.g., config.wait_time or a
surfpool-specific timeout like surfpool_timeout) when awaiting the CLI child
process (the child.wait().await call that produces status). If the timeout
elapses, ensure you kill/terminate the child process (child.kill() or similar),
await its termination, and then fail the test asserting with a clear timeout
error message instead of blocking; otherwise, continue to assert on
status.success() as before.
Ok(response_result) => {
    let status = response_result.status();
    let body = response_result.text().await.map_err(|error| {
        IndexerError::CustomError(format!(
            "failed to read prover response body: {error}"
        ))
    })?;

    if status.is_success() {
        return Ok(ValidityProofWithContext {
            accounts: account_proof_inputs,
            addresses: address_proof_inputs,
            proof: build_compressed_proof(&body)?.into(),
        });
Don't short-circuit the retry loop on body/proof parsing failures.
Line 2597 and Line 2607 still use ?, so a transient response.text() failure or malformed success body exits on the first attempt instead of updating last_error and retrying. That leaves the new retry logic ineffective for two common flaky-prover cases.
🔧 Suggested fix
match response_result {
Ok(response_result) => {
let status = response_result.status();
- let body = response_result.text().await.map_err(|error| {
- IndexerError::CustomError(format!(
- "failed to read prover response body: {error}"
- ))
- })?;
+ let body = match response_result.text().await {
+ Ok(body) => body,
+ Err(error) => {
+ last_error =
+ format!("failed to read prover response body: {error}");
+ retries -= 1;
+ if retries > 0 {
+ tokio::time::sleep(Duration::from_secs(5)).await;
+ }
+ continue;
+ }
+ };
if status.is_success() {
+ let proof = match build_compressed_proof(&body) {
+ Ok(proof) => proof,
+ Err(error) => {
+ last_error =
+ format!("failed to parse prover response body: {error}");
+ retries -= 1;
+ if retries > 0 {
+ tokio::time::sleep(Duration::from_secs(5)).await;
+ }
+ continue;
+ }
+ };
return Ok(ValidityProofWithContext {
accounts: account_proof_inputs,
addresses: address_proof_inputs,
- proof: build_compressed_proof(&body)?.into(),
+ proof: proof.into(),
});
});
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Ok(response_result) => {
    let status = response_result.status();
    let body = match response_result.text().await {
        Ok(body) => body,
        Err(error) => {
            last_error =
                format!("failed to read prover response body: {error}");
            retries -= 1;
            if retries > 0 {
                tokio::time::sleep(Duration::from_secs(5)).await;
            }
            continue;
        }
    };
    if status.is_success() {
        let proof = match build_compressed_proof(&body) {
            Ok(proof) => proof,
            Err(error) => {
                last_error =
                    format!("failed to parse prover response body: {error}");
                retries -= 1;
                if retries > 0 {
                    tokio::time::sleep(Duration::from_secs(5)).await;
                }
                continue;
            }
        };
        return Ok(ValidityProofWithContext {
            accounts: account_proof_inputs,
            addresses: address_proof_inputs,
            proof: proof.into(),
        });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sdk-libs/program-test/src/indexer/test_indexer.rs` around lines 2595 - 2608,
The retry loop is being short-circuited by using `?` on `response.text().await`
and `build_compressed_proof(&body)?` so transient failures never update
`last_error` or retry; change the success-handling branch that inspects
`response_result`/`status` to explicitly attempt to read the body and build the
proof inside a match/if let and on any error (body read failure or proof parsing
failure) set `last_error` to an appropriate `IndexerError::CustomError` with the
error details and continue the loop instead of returning early; only return
`Ok(ValidityProofWithContext { accounts: account_proof_inputs, addresses:
address_proof_inputs, proof: ... })` after both `response.text().await` and
`build_compressed_proof(&body)` succeed without using `?`.
Force-pushed 9e55024 to cb20052 (Compare)
Force-pushed 738ed6a to 8c74aa1 (Compare)
Force-pushed 8c74aa1 to c5e3fb5 (Compare)
Summary by CodeRabbit
New Features
Bug Fixes
Performance
Tests