
[POC] feat: same slot composability and real time proving #804

Draft
AnshuJalan wants to merge 24 commits into master from surge-real-time-poc

Conversation


AnshuJalan (Collaborator) commented Jan 22, 2026

Idea

We want the user to be able to perform an L1 -> L2 -> L1 transaction in the same L1 slot. For example:

  • User bridges ETH to L2 (L1 -> L2 txn or L2Call)
  • ETH gets swapped to USDC on L2
  • USDC gets bridged back to L1 (L2 -> L1 txn or L1Call)

Technicalities

  1. [L1] The User sends a UserOp to a surge_sendUserOp endpoint on Catalyst. This user op contains the bridge message for the L2Call.
  2. [L2] Catalyst acts as a relayer: it pulls the operation from the user op and adds the L2Call transaction to the transaction list of the block it preconfirms.
  3. [L2] Catalyst inserts the signal slots generated from UserOp submission on L1 (Step 1) as a "fast signal" into the anchor txn of the L2 block it creates.
  4. [L2] Catalyst records any L1Calls generated from the L2Call execution of step 2.

Proposal submission

A "multicall bundle" of 3 transactions is submitted on L1 in this exact sequence:

  1. UserOp transaction
  2. Proposal in Inbox
  3. L1Call transaction

The POC is easiest to understand when we consider one block per batch and one user op per block.
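The bundle ordering described above can be sketched as follows. This is a minimal illustration of the invariant, not the actual Catalyst types; `Call` and `build_bundle` are hypothetical names, and the payloads are stand-in byte vectors.

```rust
// Hypothetical sketch of the three-call bundle ordering described above.
#[derive(Debug)]
enum Call {
    UserOp(Vec<u8>),  // 1. the user's signed operation
    Propose(Vec<u8>), // 2. the batch proposal for the Inbox
    L1Call(Vec<u8>),  // 3. the L2 -> L1 call produced by L2 execution
}

fn build_bundle(user_op: Vec<u8>, proposal: Vec<u8>, l1_call: Vec<u8>) -> Vec<Call> {
    // The sequence is the core invariant: the proposal must land between
    // the UserOp and the resulting L1Call within the same L1 slot.
    vec![
        Call::UserOp(user_op),
        Call::Propose(proposal),
        Call::L1Call(l1_call),
    ]
}

fn main() {
    let bundle = build_bundle(vec![1], vec![2], vec![3]);
    assert!(matches!(bundle[0], Call::UserOp(_)));
    assert!(matches!(bundle[1], Call::Propose(_)));
    assert!(matches!(bundle[2], Call::L1Call(_)));
}
```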

AnshuJalan and others added 5 commits January 22, 2026 15:31
* refac: add short commit hash as tag for docker build

* fix: image not found in staging when using commit hash

- uses: docker/login-action@v3
- name: Login to JFrog Artifactory
uses: docker/login-action@v3

Check warning

Code scanning / CodeQL

Unpinned tag for a non-immutable Action in workflow (Medium)

Unpinned 3rd-party Action: the 'Catalyst Node - Docker build and push' step uses 'docker/login-action' with ref 'v3', not a pinned commit hash.
AnshuJalan changed the title from "[POC] feat: add a poc for proposing with real time proofs" to "[POC] feat: same slot composability and real time proving" on Jan 28, 2026
AnshuJalan and others added 7 commits January 29, 2026 14:42
* feat: surge actual real time proving

* feat: nits

* fix: preconfing

* raiko interaction

* feat: connect with surge verifier

* fix: proof body

* feat: recovery

* feat: update catalyst

* fix: bind server to 0.0.0.0

* fix: bind server to 0.0.0.0

* feat: add faster polling

* feat: hop proving

* fix: temp push image to prod

* feat: resilience

* feat(realtime): L2 UserOps for bridge-out withdrawals (#922)

* feat(realtime): support L2 UserOps for bridge-out (L2→L1 withdrawals)

Add the ability for users to submit UserOps that execute on L2, enabling
bridge-out functionality. The catalyst now processes both L1→L2 deposits
and L2→L1 withdrawals in the same block.

Changes:
- New `surge_sendL2UserOp` RPC method for submitting L2-targeted UserOps
- L2 UserOp execution transactions are constructed and included in L2 blocks
- After block execution, existing `find_l1_call()` detects the resulting
  bridge MessageSent events and relays them to L1 via processMessage
- Block building handles mixed deposit + withdrawal transactions
- Remove `disable_bridging` gate that prevented bridge handler startup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove unused deposit watcher and plan doc

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(realtime): single surge_sendUserOp RPC with auto chain detection

Instead of separate RPC methods for L1 and L2 UserOps, the single
surge_sendUserOp endpoint now auto-detects the target chain by parsing
the EIP-712 signature in the executeBatch calldata.

The UserOpsSubmitter's EIP-712 domain includes chainId, so the signature
is only valid for one chain. We compute the EIP-712 digest for both L1
and L2 chain IDs, ecrecover each, and route accordingly:

- L1 signature → L1→L2 deposit flow (simulate on L1, processMessage on L2)
- L2 signature → L2 direct execution (UserOp tx in L2 block, L2→L1 relay via find_l1_call)

Both types can coexist in the same block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(realtime): use explicit chainId param instead of signature detection

The surge_sendUserOp RPC now accepts an optional chainId field in the
UserOp params. If chainId matches L2, the UserOp is executed directly
on L2. Otherwise defaults to L1 (backwards compatible).

Removes the EIP-712 signature parsing logic which was unreliable
(ecrecover always returns a non-zero address).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
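The chainId-based routing in the commit above, together with the later fix that rejects unknown chain IDs, can be sketched as below. All names and the example chain IDs are hypothetical, not the actual Catalyst API.

```rust
// Illustrative routing for the optional `chainId` param on surge_sendUserOp:
// None defaults to L1 (backwards compatible), the L2 chain ID selects direct
// L2 execution, and anything else is rejected.
#[derive(Debug, PartialEq)]
enum Route {
    L1Deposit, // L1 -> L2 deposit flow
    L2Direct,  // direct execution in an L2 block
}

fn route_user_op(chain_id: Option<u64>, l1_id: u64, l2_id: u64) -> Result<Route, String> {
    match chain_id {
        None => Ok(Route::L1Deposit), // backwards-compatible default
        Some(id) if id == l2_id => Ok(Route::L2Direct),
        Some(id) if id == l1_id => Ok(Route::L1Deposit),
        Some(id) => Err(format!("unknown chainId {id}")),
    }
}

fn main() {
    // Example chain IDs (hypothetical): L1 = 1, L2 = 167000.
    assert_eq!(route_user_op(None, 1, 167000), Ok(Route::L1Deposit));
    assert_eq!(route_user_op(Some(167000), 1, 167000), Ok(Route::L2Direct));
    assert_eq!(route_user_op(Some(1), 1, 167000), Ok(Route::L1Deposit));
    assert!(route_user_op(Some(42), 1, 167000).is_err());
}
```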

* fix(realtime): increase L2 tx gas limit to 3M

processMessage and L2 UserOp transactions need more gas for operations
that deploy contracts (e.g. CREATE2 smart wallet deployment via bridge
relay). 1M gas was insufficient — the bridge's post-call gas check
was failing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(realtime): only increase gas for bridge/UserOp txs, not anchor

The anchor tx has a required gas limit enforced by the L2 engine.
Revert anchor to 1M, keep processMessage and UserOp txs at 3M.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(realtime): track L2 UserOp status through proving and submission

L2 direct UserOps now get ProvingBlock and Executed/Rejected status
updates, same as L1 UserOps. Added l2_user_op_ids to Proposal struct
and included them in the async submitter's status tracking.

Also adds cleanup: status entries are removed from sled after 60s
to prevent unbounded disk growth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address code review feedback on L2 UserOps PR

- Fix misleading Processing(zero_hash) status: remove premature status
  update for L2Direct UserOps; status remains Pending until async_submitter
  sets ProvingBlock
- Propagate add_l2_user_op_id error instead of silently ignoring with let _,
  which could leave status entries orphaned forever
- Remove redundant target_chain variable, simplify to direct chain_id comparison
- Fix doc comment: routing is based on chainId field, not EIP-712 signature
- Add SAFETY comment for Recovered::new_unchecked explaining why it's correct

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address second round of review feedback

- Mark L2 UserOp IDs as Rejected on L1 multicall failure (were stuck at ProvingBlock)
- Track L2 UserOp ID before inserting tx into block (prevents executed-but-Rejected state)
- Reject UserOps with unknown chainId instead of silently treating as L1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: clarify debug log when L2Direct UserOp is handled

The "No pending UserOps" log was misleading when an L2Direct op was
processed, since add_pending_user_ops_to_draft_block returns None for
both "nothing queued" and "L2Direct handled". Updated to distinguish.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* logs

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: AnshuJalan <anshujalan206@gmail.com>

* fix: apply cargo fmt to resolve CI lint failures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Justin Chan <justin.chan@nethermind.io>
Co-authored-by: Ahmad Bitar <33181301+smartprogrammer93@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: smartprogrammer93 <smartprogrammer@windowslive.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…env var

Restores the match-based fork selection from master so Shasta,
Pacaya, and Permissionless modes continue to work. Adds Realtime
as a new Fork variant, only activated via FORK=realtime env var
override (never auto-selected by timestamps).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@smartprogrammer93
Contributor

@claude review


claude bot commented Mar 29, 2026

Claude finished @smartprogrammer93's task in 3m 5s


PR Review: [POC] feat: same slot composability and real time proving

Tasks

  • Gather PR context
  • Read new realtime/ crate core files
  • Read shasta/ modifications (execution layers, proposal manager, bridge handler)
  • Read common infrastructure changes
  • Analyze architecture and design
  • Check code quality (safety, error handling, Clippy rules)
  • Post review findings

Summary

This is a well-structured POC for same-slot L1↔L2↔L1 composability and real-time ZK proving. The architecture is clear: surge_sendUserOp → route by chainId → include in L2 block → submit multicall bundle (UserOp + Propose + L1Call). The realtime/ crate is a clean fork with good separation of concerns. Below are findings from most to least critical.


Critical

1. Hardcoded Foundry private key in two places

shasta/src/l1/execution_layer.rs:124-127 and shasta/src/node/proposal_manager/bridge_handler.rs:196-200 both hard-code Foundry anvil key 0xac0974bec... — for checkpoint signing and L1 signal slot proof signing respectively. Even in a POC branch, this is a security hazard if accidentally deployed. The realtime/ fork correctly avoids the signature-based proof approach and uses merkle proofs instead. The Shasta variant's comment ("must be a merkle proof in production") acknowledges this but doesn't prevent it from being used.

Recommended: Source the key from an environment variable at minimum, or remove the signature-based proof path from Shasta entirely.


Medium

2. Silent data loss for multiple UserOps and L1Calls [realtime/src/l1/proposal_tx_builder.rs:91-124]

// Only first UserOp is processed:
if !batch.user_ops.is_empty() && let Some(user_op) = batch.user_ops.first()

// Only first L1Call is processed:
if !batch.l1_calls.is_empty() && let Some(l1_call) = batch.l1_calls.first()

If multiple UserOps or L1Calls are queued in a batch, all but the first are silently dropped at submission time. The batch builder correctly accumulates them (user_ops: vec![], l1_calls: vec![]), but the TX builder ignores the extras. Should either enforce 1:1 at the builder level or iterate over all entries.
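The suggested fix is to iterate over every queued entry instead of taking only `.first()`. A minimal sketch, with `Batch` as a stand-in for the real struct and bytes standing in for the real entry types:

```rust
// Stand-in for the batch accumulated by the builder.
struct Batch {
    user_ops: Vec<u8>,
    l1_calls: Vec<u8>,
}

fn collect_calls(batch: &Batch) -> Vec<u8> {
    let mut calls = Vec::new();
    // Process *all* UserOps and L1Calls so none are silently dropped.
    for op in &batch.user_ops {
        calls.push(*op);
    }
    for call in &batch.l1_calls {
        calls.push(*call);
    }
    calls
}

fn main() {
    let batch = Batch {
        user_ops: vec![1, 2],
        l1_calls: vec![9],
    };
    assert_eq!(collect_calls(&batch), vec![1, 2, 9]); // nothing dropped
}
```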

3. expect() panic violates project Clippy rules [realtime/src/lib.rs:108]

config.l1_rpc_urls.first().expect("L1 RPC URL is required")

The workspace enforces unwrap_used = "deny". This will fail cargo clippy --all-features -- -D warnings. It should use .ok_or_else(|| anyhow!("L1 RPC URL is required"))? instead.
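The shape of the fix, sketched with a plain String error standing in for the crate's anyhow error type (anyhow is not available in a self-contained example):

```rust
// Fallible lookup instead of a panicking expect(): the empty-list case
// becomes an Err the caller can propagate with `?`.
fn first_l1_rpc_url(urls: &[String]) -> Result<&String, String> {
    urls.first().ok_or_else(|| "L1 RPC URL is required".to_string())
}

fn main() {
    let urls = vec!["http://localhost:8545".to_string()];
    assert!(first_l1_rpc_url(&urls).is_ok());
    assert!(first_l1_rpc_url(&[]).is_err()); // no panic, just an Err
}
```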

4. Duplicated blob sidecar construction [realtime/src/node/proposal_manager/async_submitter.rs:127-146]

The blob sidecar is built twice: once in async_submitter.rs to extract blob hashes for the Raiko request, and again in proposal_tx_builder.rs for the actual transaction. These two paths must produce identical blobs or the proof will be invalid. A refactor that builds the sidecar once and passes it through would eliminate this risk.

5. UserOp ID counter resets on restart, sled DB persists [realtime/src/node/proposal_manager/bridge_handler.rs:125-126]

next_id: Arc::new(AtomicU64::new(1)),

The in-memory counter starts at 1 on each restart, but sled persists the status DB on disk. After a restart, new UserOps get IDs that may collide with existing (now-stale) records. Clients polling with an old ID could get wrong status. The counter should be persisted or IDs should be timestamped/random.

6. Hardcoded bridge/signal service addresses via string formatting [realtime/src/l2/execution_layer.rs:55-62]

let bridge_address: Address =
    format!("0x{}{}01", chain_id_string, "0".repeat(zeros_needed)).parse()?;
let signal_service: Address =
    format!("0x{}{}05", chain_id_string, "0".repeat(zeros_needed)).parse()?;

This assumes a specific address derivation from chain ID with no documentation. If the contracts are deployed differently, or the chain ID format changes, this silently produces a wrong address and all bridge operations fail. Should be sourced from the L2 node's contract config or env vars.
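To make the fragility concrete, the derivation can be reproduced in a self-contained sketch (hypothetical helper and example chain ID, not the actual code): the address is the decimal chain ID string, zero padding, and a two-digit suffix, and it only works if those parts add up to exactly 40 hex characters.

```rust
// Reproduces the flagged derivation: bridge at ...01, signal service at
// ...05, with the chain ID string as the address prefix. The checked_sub
// makes the hidden assumption explicit: chain_id + suffix must fit in 40
// hex chars or the derivation is meaningless.
fn derive_address(chain_id: u64, suffix: &str) -> Result<String, String> {
    let chain = chain_id.to_string();
    let zeros = 40usize
        .checked_sub(chain.len() + suffix.len())
        .ok_or_else(|| "chain ID too long for address derivation".to_string())?;
    Ok(format!("0x{chain}{}{suffix}", "0".repeat(zeros)))
}

fn main() {
    // Hypothetical example chain ID.
    let bridge = derive_address(763374, "01").expect("fits in 40 hex chars");
    assert_eq!(bridge.len(), 42); // "0x" + 40 chars
    assert!(bridge.starts_with("0x763374"));
    assert!(bridge.ends_with("01"));
}
```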

7. Bridge RPC server binds to hardcoded address [realtime/src/node/proposal_manager/mod.rs:80]

let bridge_addr: SocketAddr = "0.0.0.0:4545".parse()?;

This should come from RealtimeConfig to allow operators to configure it.

8. RealtimeConfig read twice during init [realtime/src/l1/execution_layer.rs:103-106]

RealtimeConfig::read_env_variables() is called inside ExecutionLayer::new() independently from the outer create_realtime_node() call. Any inconsistency or env change between calls could create subtle differences. The config should be passed in.

9. Fixed gas limits on L2 transactions [realtime/src/l2/execution_layer.rs:263-265, 338-340]

gas_limit: 3_000_000,
max_fee_per_gas: 1_000_000_000,

These magic numbers are hardcoded for both construct_l2_user_op_tx and construct_l2_call_tx. At 1 gwei max fee, these transactions would fail if the L2 base fee is higher. Should estimate gas dynamically or make these configurable.

10. Unimplemented stub silently succeeds [realtime/src/l2/execution_layer.rs:142-151]

pub async fn transfer_eth_from_l2_to_l1(...) -> Result<(), Error> {
    warn!("Implement bridge transfer logic here");
    Ok(())
}

This returns Ok(()) while doing nothing. If any code path calls this expecting funds to be transferred, it will silently succeed without bridging anything.


Low Priority / Style

11. No tests for realtime/ crate

The entire new realtime/ crate (24 source files) has zero unit tests. At minimum the BatchBuilder, BridgeHandler::next_user_op_routed() routing logic, and the ProposalTxBuilder multicall construction should have unit tests.

12. All Raiko polling responses logged at warn [realtime/src/raiko/mod.rs:128-130]

warn!("Raiko response (attempt {}): HTTP {} | body: {}", ...);

Every single polling attempt — including normal "in_progress" responses — is logged as a warning. This will flood warning logs. Change to debug for polling, info for success, warn/error for actual problems.

13. preconf_only defaults to true [realtime/src/utils/config.rs:57-59]

let preconf_only = std::env::var("PRECONF_ONLY")
    .map(|v| v.to_lowercase() != "false" && v != "0")
    .unwrap_or(true);

The default is true (preconf-only mode, no proving/proposing). This is a safe default but unintuitive — not setting PRECONF_ONLY means the node never proposes. The env var semantics are also inverted from convention (any set value except "false"/"0" enables it). Consider defaulting to false and using standard boolean parsing.
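A conventional parse along the lines the review suggests might look like the sketch below: default to false and accept only explicit truthy values. It parses a string argument rather than reading the real PRECONF_ONLY env var, to keep the example self-contained; `parse_flag` is a hypothetical name.

```rust
// Standard boolean env-flag semantics: unset or unrecognized -> false,
// only "true"/"1" (case-insensitive) -> true.
fn parse_flag(value: Option<&str>) -> bool {
    matches!(
        value.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
        Some("true") | Some("1")
    )
}

fn main() {
    assert!(!parse_flag(None)); // unset: node proposes by default
    assert!(parse_flag(Some("true")));
    assert!(parse_flag(Some("1")));
    assert!(!parse_flag(Some("false")));
    assert!(!parse_flag(Some("banana"))); // unknown values are not truthy
}
```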

14. #[allow(clippy::too_many_arguments)] on constructors with 12-16 args

BatchManager::new (12 args), Node::new (16 args). These should be wrapped in config structs. The NodeConfig pattern already exists in the codebase — extend it for the real-time-specific parameters.

15. finalize_if_needed uses u16::MAX as sentinel [realtime/src/node/proposal_manager/batch_builder.rs:222-228]

u16::try_from(...).unwrap_or(u16::MAX)

If the block count overflows u16, u16::MAX is passed to is_within_block_limit. Depending on the limit, this may accidentally finalize or not finalize the batch correctly.
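One alternative to the sentinel is to surface the overflow instead of silently saturating. A sketch, with `is_within_block_limit` as a stand-in for the real check:

```rust
// Stand-in for the real limit check.
fn is_within_block_limit(count: u16, limit: u16) -> bool {
    count <= limit
}

// Propagate the conversion failure instead of feeding u16::MAX into the
// limit check, so an overflowing block count is an explicit error.
fn check_block_count(count: usize, limit: u16) -> Result<bool, String> {
    let count = u16::try_from(count)
        .map_err(|_| format!("block count {count} overflows u16"))?;
    Ok(is_within_block_limit(count, limit))
}

fn main() {
    assert_eq!(check_block_count(10, 64), Ok(true));
    assert_eq!(check_block_count(100, 64), Ok(false));
    assert!(check_block_count(70_000, 64).is_err()); // would have saturated to u16::MAX
}
```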

16. 60-second status cleanup window [realtime/src/node/proposal_manager/async_submitter.rs:363-371]

After a UserOp completes, its status entry is deleted after 60 seconds. Users polling after this window see UserOpNotFound rather than Executed. This should be longer (e.g., 10 minutes) for a production system, or the cleanup should be skipped for terminal states.


Architecture Notes

The overall design is sound for a POC. Key observations:

  • The realtime/ fork correctly uses merkle proofs for L1 signal verification (vs. Shasta's POC signatures).
  • The async submission pattern (AsyncSubmitter + oneshot channels) cleanly separates proving from the main loop.
  • The sled-backed status store is a pragmatic choice for a POC but a proper DB would be needed in production.
  • The multicall bundle ordering (UserOp → Propose → L1Call) is clearly documented and the construction in ProposalTxBuilder matches the documented sequence.

- Fix silent drop of multiple UserOps/L1Calls in proposal tx builder
- Replace expect() with ok_or_else() to comply with clippy rules
- Make bridge RPC address configurable via BRIDGE_RPC_ADDR env var
- Pass raiko config through EthereumL1Config instead of re-reading env
- Reduce Raiko polling log level from warn to debug

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@smartprogrammer93
Contributor

@claude review


claude bot commented Mar 29, 2026

Claude finished @smartprogrammer93's task in 2m 22s


PR Review: [POC] feat: same slot composability and real time proving

Tasks

  • Gather PR context
  • Read realtime/ crate core files
  • Read shasta/ modifications
  • Read common infrastructure changes
  • Analyze architecture and post review

Summary

This is a well-structured POC for same-slot L1↔L2↔L1 composability and real-time ZK proving. The architecture is clear: surge_sendUserOp → route by chainId → include in L2 block → submit multicall bundle (UserOp + Propose + L1Call). The realtime/ crate is cleanly separated with good isolation of concerns.

Several findings from the previous review pass have been addressed (.first() silent data loss, .expect() panic, Raiko log flooding, hardcoded bridge RPC addr, duplicate RealtimeConfig reads). Below are remaining issues, from most to least critical.


Critical

1. Hardcoded Foundry private key in two Shasta files

shasta/src/l1/execution_layer.rs:123-127 and shasta/src/node/proposal_manager/bridge_handler.rs:195-199 both hardcode the Foundry anvil key 0xac0974bec... for checkpoint signing and L1 signal slot proof signing. Even in a POC branch, this creates risk of accidental deployment.

The realtime/ fork correctly avoids signature-based proofs entirely and uses merkle proofs instead. The comments acknowledge this ("must be a merkle proof in production") but leave the unsafe path active.

Recommended: Source the key from an env var at minimum, or gate the Shasta bridge path behind a compile-time feature flag.


Medium

2. assert! panic in AsyncSubmitter::submit [realtime/src/node/proposal_manager/async_submitter.rs:76]

assert!(
    !self.is_busy(),
    "Cannot submit while another submission is in flight"
);

assert! will panic in production if the caller's guard (is_submission_in_progress()) is somehow bypassed. In async code where BatchManager can be called from multiple paths, this is fragile. Should return an Err instead.
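A minimal sketch of the fallible version, assuming a busy flag on the submitter (the real field and task-spawning logic are elided). Using compare_exchange also closes the check-then-set race that a separate is_busy() check leaves open.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Minimal stand-in for the real AsyncSubmitter.
struct AsyncSubmitter {
    busy: AtomicBool,
}

impl AsyncSubmitter {
    fn submit(&self) -> Result<(), String> {
        // Atomically claim the busy flag; a concurrent submission gets an
        // Err instead of tripping an assert! panic.
        self.busy
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .map_err(|_| "another submission is in flight".to_string())?;
        // ... spawn the actual submission task here ...
        Ok(())
    }
}

fn main() {
    let s = AsyncSubmitter {
        busy: AtomicBool::new(false),
    };
    assert!(s.submit().is_ok());
    assert!(s.submit().is_err()); // second call errors instead of panicking
}
```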

3. Duplicated blob sidecar construction [realtime/src/node/proposal_manager/async_submitter.rs:126-146 and realtime/src/l1/proposal_tx_builder.rs:144-173]

The blob sidecar is built twice from the same manifest data: once in async_submitter.rs to extract blob hashes for the Raiko request, and again in proposal_tx_builder.rs for the actual transaction. These two paths must produce identical blobs or the ZK proof will be invalid. A refactor that builds the sidecar once and stores it on the Proposal struct would eliminate this risk entirely.

4. UserOp ID counter resets on restart; sled DB persists [realtime/src/node/proposal_manager/bridge_handler.rs:125]

next_id: Arc::new(AtomicU64::new(1)),

On restart, new UserOps get IDs starting from 1 again, while the sled status DB retains old entries. Clients polling stale IDs could receive mismatched status responses. IDs should be seeded from the sled DB's current max key, or use a timestamp/UUID-based scheme.
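The seeding approach can be sketched with a BTreeMap standing in for the sled tree (sled keys sort lexicographically, so big-endian u64 keys preserve numeric order, which is what makes "max key" meaningful):

```rust
use std::collections::BTreeMap;

// Seed the next UserOp ID from the store's max key at startup instead of
// always restarting at 1. Big-endian encoding keeps byte order == numeric
// order, so the last key is the numeric max.
fn seed_next_id(store: &BTreeMap<[u8; 8], Vec<u8>>) -> u64 {
    store
        .keys()
        .next_back() // max key
        .map(|k| u64::from_be_bytes(*k) + 1)
        .unwrap_or(1) // empty store: start at 1 as before
}

fn main() {
    let mut store = BTreeMap::new();
    assert_eq!(seed_next_id(&store), 1);
    store.insert(3u64.to_be_bytes(), vec![]);
    store.insert(7u64.to_be_bytes(), vec![]);
    assert_eq!(seed_next_id(&store), 8); // resumes after the persisted max
}
```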

5. Bridge/signal service addresses derived from chain ID via string formatting [realtime/src/l2/execution_layer.rs:55-62]

let bridge_address: Address =
    format!("0x{}{}01", chain_id_string, "0".repeat(zeros_needed)).parse()?;
let signal_service: Address =
    format!("0x{}{}05", chain_id_string, "0".repeat(zeros_needed)).parse()?;

This derivation is undocumented and fragile. If chain ID format changes or contracts are deployed differently, silent misconfiguration occurs. Should be sourced from config or the L2 node's contract registry.

6. Fixed gas limits on L2 transactions [realtime/src/l2/execution_layer.rs:264-265 and 337-338]

gas_limit: 3_000_000,
max_fee_per_gas: 1_000_000_000,  // 1 gwei

Both construct_l2_user_op_tx and construct_l2_call_tx use hardcoded gas values. At 1 gwei max fee, transactions fail if L2 base fee is higher. These should be configurable or estimated dynamically.

7. transfer_eth_from_l2_to_l1 stub silently succeeds [realtime/src/l2/execution_layer.rs:142-151]

pub async fn transfer_eth_from_l2_to_l1(...) -> Result<(), Error> {
    warn!("Implement bridge transfer logic here");
    Ok(())
}

Returns Ok(()) while doing nothing. Any caller expecting funds to bridge will silently get a success response. Should return an explicit Err until implemented.
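The suggested fix is a one-liner: make the unimplemented path an explicit error, so a caller cannot mistake a no-op for a completed bridge transfer. A sketch with a simplified signature:

```rust
// Explicitly fail until the bridge transfer is implemented, instead of
// warn-and-Ok(()) which callers will read as success.
fn transfer_eth_from_l2_to_l1(_amount: u128) -> Result<(), String> {
    Err("transfer_eth_from_l2_to_l1 is not implemented yet".to_string())
}

fn main() {
    // A caller that checks the result now fails loudly instead of
    // silently believing funds were bridged.
    assert!(transfer_eth_from_l2_to_l1(1_000).is_err());
}
```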

8. Sled DB path hardcoded [realtime/src/node/proposal_manager/bridge_handler.rs:120]

let status_store = UserOpStatusStore::open("data/user_op_status")?;

The relative path data/user_op_status is hardcoded and depends on the working directory at runtime. Should be configurable via RealtimeConfig or an environment variable.


Low Priority / Style

9. No tests for realtime/ crate

The entire new realtime/ crate (24+ source files) has zero unit tests. At minimum the routing logic in BridgeHandler::next_user_op_routed(), the BatchBuilder capacity checks, and the ProposalTxBuilder multicall construction should have unit tests.

10. preconf_only defaults to true [realtime/src/utils/config.rs:61-63]

let preconf_only = std::env::var("PRECONF_ONLY")
    .map(|v| v.to_lowercase() != "false" && v != "0")
    .unwrap_or(true);

Default is true (preconf-only, no proposing). Not setting PRECONF_ONLY means the node never proposes — unintuitive for operators. Consider defaulting to false.

11. #[allow(clippy::too_many_arguments)] on large constructors

BatchManager::new has 13 arguments. These should be consolidated into a config struct, following the NodeConfig pattern already present in the codebase.

12. 60-second status cleanup window [realtime/src/node/proposal_manager/async_submitter.rs:367-372]

After a UserOp completes, its status entry is removed after 60 seconds. Clients polling after this window receive UserOpNotFound instead of Executed. Should be at least 10 minutes for a production system, or terminal states should never be cleaned up.


Architecture Notes

  • The multicall bundle ordering (UserOp → Propose → L1Call) is correctly documented and the ProposalTxBuilder construction matches precisely. This is the core invariant for same-slot composability.
  • The realtime/ merkle proof approach for L1 signal verification is the correct production path (vs. the Shasta POC signature approach). Good separation.
  • The async submission pattern (AsyncSubmitter + oneshot channels) cleanly decouples proving latency from the main loop.
  • The sled-backed status store is pragmatic for a POC but a proper DB with indexing would be needed in production.
  • BridgeHandler::next_user_op_routed() only dequeues one UserOp per block production cycle. For workloads with many concurrent UserOps this will be a throughput bottleneck — worth noting for future work.

- Collapse nested if statements (clippy::collapsible_if)
- Use u8::from() instead of as cast (clippy::cast_lossless)
- Allow async_fn_in_trait on internal Surge traits
- Ignore RUSTSEC-2024-0384 (instant via sled, no safe upgrade)
- Remove stale RUSTSEC-2026-0002 ignore

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
smartprogrammer93 and others added 2 commits March 30, 2026 02:21
- Fix unused imports in realtime crate
- Add #[allow(dead_code)] for POC code not yet wired up
- Replace unwrap() with ok_or_else() in shasta proposal tx builder
- Collapse nested if statements in shasta execution layer
- Fix bool-to-u8 cast, useless conversions, clone-on-Copy
- Add #[allow(clippy::large_enum_variant)] on UserOpRouting
- Ignore RUSTSEC-2025-0057 (fxhash via sled, no safe upgrade)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
smartprogrammer93 marked this pull request as ready for review March 29, 2026 23:42
smartprogrammer93 marked this pull request as draft March 30, 2026 02:15
gehlotanish and others added 3 commits March 30, 2026 21:59
* fix: gh action to promote tag on prod

* fix: gh action to promote tag on prod
#929)

* feat: auto-reorg L2 blocks on L1 transaction failure instead of shutting down

When an L1 multicall reverts, estimation fails, or the async submission
(e.g. Raiko proof fetch) fails, Catalyst now reorgs the preconfirmed L2
blocks back to the last finalized state and resumes the preconfirmation
loop immediately, instead of triggering a critical shutdown.

The recovery reuses the existing reorg_unproposed_blocks() machinery
which reads lastFinalizedBlockHash from L1 on-chain state and calls
the driver's reorgStaleBlock endpoint to remove orphaned L2 blocks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add surge_txStatus RPC method for tracking any L2 tx lifecycle

Adds a new `surge_txStatus` JSON-RPC method that accepts either a
`userOpId` (existing behavior) or a `txHash` (new: any L2 transaction).

For txHash lookups, the handler queries L2 via eth_getTransactionByHash
to find the block number, then compares against the last finalized block
number on L1 to determine status:
- block <= finalized → Executed
- block > finalized → ProvingBlock { block_id }
- tx pending → error (not yet in a block)
- tx not found → error

A shared AtomicU64 tracks last_finalized_block_number, updated by
BatchManager on successful submission and during reorg recovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
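The status decision in the commit above reduces to a comparison against the last finalized block number. A self-contained sketch, with `TxStatus` and the inputs as illustrative stand-ins for the handler's real types (the pending and not-found cases are collapsed into one `None` input here):

```rust
#[derive(Debug, PartialEq)]
enum TxStatus {
    Executed,
    ProvingBlock { block_id: u64 },
}

// tx_block: the L2 block containing the tx (None if pending/not found),
// last_finalized: the last finalized L2 block number read from L1.
fn tx_status(tx_block: Option<u64>, last_finalized: u64) -> Result<TxStatus, String> {
    match tx_block {
        Some(b) if b <= last_finalized => Ok(TxStatus::Executed),
        Some(b) => Ok(TxStatus::ProvingBlock { block_id: b }),
        None => Err("tx not yet in a block (pending or unknown)".to_string()),
    }
}

fn main() {
    assert_eq!(tx_status(Some(90), 100), Ok(TxStatus::Executed));
    assert_eq!(
        tx_status(Some(120), 100),
        Ok(TxStatus::ProvingBlock { block_id: 120 })
    );
    assert!(tx_status(None, 100).is_err());
}
```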

* fix: return early after reorg recovery to avoid stale head verification

After recover_from_failed_submission() reorgs L2 blocks, the l2_slot_info
captured at the top of the heartbeat is stale (references pre-reorg head).
The head verifier would detect this as an unexpected head and crash.

Fix: return Ok(()) immediately after recovery so the next heartbeat
picks up fresh L2 state. Also inline check_transaction_error_channel
to apply the same early-return pattern for transaction error recovery.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: mark user ops as Rejected when Raiko proof fetch fails

When the Raiko proof request fails (network error, prover down, etc.),
user ops were left stuck at ProvingBlock status with no transition to
Rejected. Now they are explicitly marked as Rejected with the failure
reason before returning the error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: recover from unexpected L2 head instead of crashing

When an L1 tx reverts after optimistic success reporting, the geth/driver
may externally reorg L2 before the TransactionReverted error arrives in
the channel. The head verifier would detect the unexpected head and crash
before recovery could run.

Now the head verifier triggers recover_from_failed_submission() instead
of cancel_on_critical_error(), allowing the node to resync with L1 state
and continue operating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: apply cargo fmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolve conflicts:
- shasta/src/l2/execution_layer.rs: Combined our anchorV4WithSignalSlotsCall
  rename with master's anchorV3 fallback logic
- Cargo.lock: Regenerated from master's base

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>