feat: consensus fixes and block sync #72
Merged
jorgeantonio21 merged 6 commits into main on Feb 19, 2026
Summary
Fixes multiple interconnected consensus bugs that made the `test_multi_node_happy_path` E2E test flaky (~70% pass rate → 100% pass rate across 40+ consecutive runs). The root causes were:

- Cascade nullification of intermediate views, which broke `SelectParent` consistency across nodes
- `InvalidNonce` block validation failures: cascade nullification didn't clean up `StateDiff` entries from intermediate views, leaving stale nonce increments in pending state

Changes by file

consensus/src/consensus_manager/state_machine.rs

- Cascade nullification no longer nullifies the intermediate views between `start_view` and `current_view`. Per the paper, intermediate views were already processed through the normal consensus flow; nullifying them corrupts their M-notarization state and causes `SelectParent` to return inconsistent results across nodes. Only the current view is now nullified.
- Cascade now calls `rollback_pending_diffs_in_range(start_view, current_view)` to remove stale `StateDiff` entries. All correct nodes eventually cascade from the same `start_view` (2f+1 guarantee), so this converges to a consistent pending state.
- Adds a `force` parameter to `nullify_view()` to distinguish cascade nullification (bypasses `has_voted`/evidence checks) from normal timeout/Byzantine nullification. New `create_nullify_for_cascade()` on `ViewContext` supports this.
- Adds a `MAX_FINALIZATIONS_PER_PASS = 5` loop in `progress_to_next_view()` that retries pending finalizations. The per-pass limit avoids starving message processing (each finalization involves storage writes).
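
The bounded retry loop can be pictured roughly as below. This is a minimal sketch, not the code from this PR: `drain_pending_finalizations` and the `try_finalize_next` closure are hypothetical names, and only the constant `MAX_FINALIZATIONS_PER_PASS = 5` is taken from the description.

```rust
/// Upper bound on finalizations per pass (value taken from the PR description).
const MAX_FINALIZATIONS_PER_PASS: usize = 5;

/// Hypothetical helper: calls `try_finalize_next` until it reports that nothing
/// is left to finalize or the per-pass cap is reached.
/// Returns how many views were finalized in this pass.
fn drain_pending_finalizations<F>(mut try_finalize_next: F) -> usize
where
    F: FnMut() -> bool,
{
    let mut finalized = 0;
    while finalized < MAX_FINALIZATIONS_PER_PASS {
        // `false` means no view is currently finalizable; stop early.
        if !try_finalize_next() {
            break;
        }
        finalized += 1;
    }
    finalized
}
```

Anything left over after the cap is picked up on a later pass, so message handling gets a turn between batches of storage writes.
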
- [...] `non_finalized_views` would panic on the next `tick()`.
- Adds `ShouldRequestBlock` and `ShouldRequestBlocks` event handlers that broadcast `BlockRecoveryRequest` messages.
- `ShouldNullifyRange`: Same fix as cascade (only nullify the current view, roll back pending diffs in range).

consensus/src/consensus_manager/view_chain.rs

- `finalize_with_l_notarization` now traces parent hashes backwards from the finalized view to build a `canonical_views` set. Views in the canonical chain are persisted as finalized; non-canonical forks are persisted as nullified metadata. Previously, all non-nullified views were assumed to be canonical, leading to incorrect persistence.
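
The backwards trace over parent hashes can be illustrated with a small, self-contained sketch. The `ViewEntry` struct, the `Hash` alias, and the free function `canonical_views` are hypothetical simplifications for illustration; the real `ViewChain` types are not shown in this description.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for a block hash.
type Hash = u64;

/// Hypothetical, simplified view record: just enough to walk parent links.
struct ViewEntry {
    block_hash: Hash,
    parent_hash: Hash,
}

/// Walks parent hashes backwards from `finalized_view` and returns the view
/// numbers on that chain. Views missing from the result are non-canonical forks.
fn canonical_views(views: &HashMap<u64, ViewEntry>, finalized_view: u64) -> HashSet<u64> {
    // Index views by block hash so a parent hash can be resolved to its view.
    let by_hash: HashMap<Hash, u64> = views.iter().map(|(v, e)| (e.block_hash, *v)).collect();

    let mut canonical = HashSet::new();
    let mut cursor = Some(finalized_view);
    while let Some(view) = cursor {
        let Some(entry) = views.get(&view) else {
            break; // walked past the oldest view we still hold
        };
        canonical.insert(view);
        // Only follow links to strictly older views, so the walk always terminates.
        cursor = by_hash
            .get(&entry.parent_hash)
            .copied()
            .filter(|parent| *parent < view);
    }
    canonical
}
```

With views 1 ← 2 ← 4 on one chain and view 3 forking off view 1, tracing from view 4 yields {1, 2, 4}; view 3 is the kind of view that would be persisted as nullified metadata.
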
- [...] `pending_canonical_recovery`. This is drained by `tick()` to trigger targeted recovery requests.
- `add_recovered_block()`: New method that accepts recovered blocks for any view (including nullified ones). Validates the block hash against the M-notarization. Returns whether L-notarization is available.
- `get_block_for_recovery()`: New method that searches all storage locations (non-finalized chain, `FINALIZED_BLOCKS`, `NULLIFIED_BLOCKS`, and `NON_FINALIZED_BLOCKS`) using both view height and block hash.
- `oldest_finalizable_view()`: New method that finds the oldest view with L-notarization (n-f votes), a valid block, and a valid parent chain. The parent nullification check is relaxed: M-notarization and nullification can coexist (n >= 5f+1).
- `progress_after_cascade()`: New method for view progression after cascade. Unlike `progress_with_nullification`, it does NOT require an aggregated nullification proof, only local nullification (`has_nullified`).
- `rollback_pending_diffs_in_range()`: Removes pending `StateDiff` entries for views in `[start_view, end_view]` to prevent `InvalidNonce` errors.
- `remove_pending_diff()`: Single-view variant called during `mark_nullified()`.
- [...] (`has_nullified`) in intermediate view checks, not just aggregated nullification.
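
The range rollback and its single-view variant boil down to deleting keys from a per-view map of pending diffs. The sketch below assumes a hypothetical `PendingState` container and a placeholder `StateDiff` with a single field; only the two method names come from this description, and their real signatures may differ.

```rust
use std::collections::BTreeMap;

/// Hypothetical placeholder for the per-view state changes.
#[derive(Debug, Clone, PartialEq)]
struct StateDiff {
    nonce_increments: u64,
}

/// Hypothetical pending-state container keyed by view number.
#[derive(Default)]
struct PendingState {
    diffs: BTreeMap<u64, StateDiff>,
}

impl PendingState {
    /// Removes every pending diff whose view lies in `[start_view, end_view]`
    /// (inclusive) and returns how many entries were dropped.
    fn rollback_pending_diffs_in_range(&mut self, start_view: u64, end_view: u64) -> usize {
        let before = self.diffs.len();
        self.diffs
            .retain(|view, _| *view < start_view || *view > end_view);
        before - self.diffs.len()
    }

    /// Single-view variant: removes and returns the diff for `view`, if any.
    fn remove_pending_diff(&mut self, view: u64) -> Option<StateDiff> {
        self.diffs.remove(&view)
    }
}
```

Dropping the diffs for the whole cascaded range is what keeps stale nonce increments from leaking into validation of later blocks.
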
- `non_finalized_view_numbers_range()`: Computes the range from the actual HashMap keys instead of arithmetic (handles gaps from retained recovery views).
- `persist_m_notarized_view()`: Changed to persist as `FINALIZED_BLOCKS` (was `NON_FINALIZED_BLOCKS`). M-notarized canonical views are committed transitively via the descendant's L-notarization.
- `persist_nullified_view_or_metadata()`: New method for non-canonical views without nullification artifacts.
- `route_vote()` now rejects votes for nullified views early (Lemma 5.3: no L-notarization possible).
- `on_m_notarization()`: Added an M-notarization guard. It only publishes the `StateDiff` to pending state if the M-notarization actually exists.
- `add_new_view_block()` now calls `on_m_notarization()` retroactively when a block arrives after votes already crossed the M-notarization threshold.
- `find_parent_view()`: Changed visibility from `fn` to `pub(crate)` for use in leader proposal gating.
- `finalize_with_l_notarization()`: Explicitly rejects finalization of nullified views.

consensus/src/consensus_manager/view_manager.rs

- `tick()`: Detects M-notarized views missing blocks and emits `ShouldRequestBlock`/`ShouldRequestBlocks` events. Uses a per-view 500ms cooldown to avoid flooding.
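
A per-view cooldown like the one `tick()` uses could be tracked along these lines. This is a sketch under assumptions: `RecoveryCooldown`, `should_request`, and `clear` are hypothetical names, and only the 500ms interval comes from the description.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Cooldown between recovery requests for the same view (500ms per the PR).
const RECOVERY_COOLDOWN: Duration = Duration::from_millis(500);

/// Hypothetical tracker: remembers when each view was last requested.
#[derive(Default)]
struct RecoveryCooldown {
    last_request: HashMap<u64, Instant>,
}

impl RecoveryCooldown {
    /// Returns true (and records `now`) if a request for `view` may be sent,
    /// false if the previous request is still within the cooldown window.
    fn should_request(&mut self, view: u64, now: Instant) -> bool {
        match self.last_request.get(&view) {
            Some(last) if now.saturating_duration_since(*last) < RECOVERY_COOLDOWN => false,
            _ => {
                self.last_request.insert(view, now);
                true
            }
        }
    }

    /// Forgets the cooldown for `view`, e.g. once its block has been recovered.
    fn clear(&mut self, view: u64) {
        self.last_request.remove(&view);
    }
}
```

Passing `now` in as a parameter keeps the logic deterministic under test; a caller would typically pass `Instant::now()` once per tick.
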
- [...] `MAX_BATCH_RECOVERY = 5` per tick.
- `handle_block_recovery_request()` allows the leader plus F backup responders (next in round-robin) to answer. With at most F faulty nodes, this guarantees at least one honest responder. Previously only the leader responded (single point of failure).
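
The responder rule (leader plus the next F replicas in round-robin order) can be expressed as a small predicate. The function name and the index-based replica ordering are assumptions made for illustration; how the PR actually derives the leader index is not shown here.

```rust
/// Returns true if the replica at index `node` may answer a recovery request
/// for a view whose leader sits at index `leader`, given `n` replicas ordered
/// round-robin and at most `f` faulty ones. Responders are the leader itself
/// plus the next `f` replicas after it, wrapping around the ring.
fn is_recovery_responder(node: usize, leader: usize, n: usize, f: usize) -> bool {
    if n == 0 || node >= n || leader >= n {
        return false;
    }
    // Forward distance from the leader to `node` around the ring.
    let offset = (node + n - leader) % n;
    offset <= f
}
```

Because f+1 distinct replicas are eligible and at most f of them can be faulty, at least one eligible responder is honest.
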
- `handle_block_recovery_response()`: Adds the recovered block, clears the cooldown, and triggers finalization if L-notarization is available; otherwise checks for deferred finalization via `oldest_finalizable_view()`.
- [...] replicas miss blocks.
- [...] `finalize_view()`, moves entries from `ViewChain::pending_canonical_recovery` to the persistent `canonical_recovery_pending` set. Cleaned up each tick when blocks arrive or views are GC'd.
- `mark_nullified()`: Now also removes the view's pending `StateDiff`.
- `rollback_pending_diffs_in_range()`: Delegates to `ViewChain`.
- `progress_after_cascade()`: Creates a new view context and advances the chain without requiring an aggregated nullification proof.
- [...] (`create_genesis_vote`), genesis M-notarization, and genesis `ViewContext` at view 0. `ViewChain` now starts directly at view [...]

consensus/src/consensus_manager/events.rs

- Adds `ShouldRequestBlock { view, block_hash }` and `ShouldRequestBlocks { requests }` variants.

consensus/src/consensus.rs

- Adds `BlockRecoveryRequest { view, block_hash }` and `BlockRecoveryResponse { view, block }` to `ConsensusMessage`.

consensus/src/consensus_manager/view_context.rs

- `create_nullify_for_cascade()`: Creates a nullify message for cascade without requiring Byzantine evidence or a timeout.

p2p/src/service.rs

- [...] `tokio::select!` branches to prioritize message reception over the sleep timeout during bootstrap. Prevents delayed peer discovery when both are ready simultaneously.

tests/src/e2e_consensus/scenarios.rs

Test plan

- `cargo build --release`: clean
- `cargo clippy --all-targets --all-features -- -D warnings`: clean
- `cargo test --package consensus`: 563/563 passed
- `test_multi_node_happy_path` E2E: 20/20 passed (×2 consecutive runs = 40/40)
- `InvalidNonce` block validation errors no longer occur