Skip to content

fix/hub subscription skips lastpartial#49

Merged
sduchesneau merged 5 commits into
developfrom
fix/hub-subscription-skips-lastpartial
Jun 10, 2026
Merged

fix/hub subscription skips lastpartial#49
sduchesneau merged 5 commits into
developfrom
fix/hub-subscription-skips-lastpartial

Conversation

@sduchesneau

Copy link
Copy Markdown
Contributor

This PR fixes a case where an optimisation of partial blocks would skip sending the
last partial of a block when there was accumulation. This resulted in the downstream process
having to handle UNDOs and sometimes generating spurious UNDO sequences.

  • fix: hub subscription must not skip past a StepNewPartial block
  • fix: ignore non-last partials once a block is settled by its lastPartial
  • add test case for lastPartial duplication

sduchesneau and others added 3 commits June 10, 2026 12:59
getLatestPendingVersionOfCandidateBlock skips intermediate partial
versions of a block when the subscriber lags, but StepNewPartial also
matches StepPartial, so the closing 'last partial' was discarded
whenever a stale partial version of the same block followed it in the
channel. Downstream consumers then never saw the block's closing,
breaking partial-block sequences for every block while lag persisted.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
After the 'last partial' of a block has been sent, a divergent partial
version with a higher PartialIndex (e.g. from a lagging reader on a
different flashblocks feed, or a post-seal republication) still passed
triggersNewLongestChain. That put a non-last partial behind the
StepNewPartial in subscription channels, which combined badly with the
catch-up skipping (see previous commit) and re-opened a settled block
for downstream consumers. The block is settled: such versions are noise
until a reorg replaces it (full block or undo).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@sduchesneau sduchesneau requested review from billettc and maoueh June 10, 2026 18:08

@GabrielCartier GabrielCartier left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the fix — logic is correct on both layers. Verified by building + running the new tests, and by reverting each change to confirm the test that guards it.

hub/subscription.goStepNewPartial = StepNew|StepPartial (steps.go:36), so !step.Matches(StepPartial) || step.Matches(StepNew) delivers the closing last-partial while still skipping plain partials. Minimal and precise: only StepNewPartial's behavior actually changes — every other StepNew-bearing step already returned via the first clause. ✅

forkable/forkable.go — the lastBlockSent.LastPartial && !blk.LastPartial guard is consistent with the existing last-partial semantics (forkable.go:442 and :774). Reverting it makes the ignore non-last partial after lastPartial case fail, so that test does guard the change. ✅

One thing to flag (not blocking): the subscription fix stops the last-partial from being eaten, but it does not drop the trailing stale post-seal partial — the repro delivers partial 0004p4(idx=4) right after 0004a. That suppression lives entirely in the forkable layer. Fine as long as forkable is always upstream of the hub subscription in prod (it is); a subscriber fed blocks directly would still observe a post-seal partial.

Nits inline.

Comment thread forkable/forkable_test.go
bTestBlock("00000003a", "00000002a"), // block{3}
partialBlock("00000004b", "00000003a", 2), // partialBlock{4, idx=2}
lastPartialBlock("00000004a", "00000003a", 3), // partialBlock{4, idx=3, LAST}: block 4 is settled
lastPartialBlock("00000004a", "00000003a", 4), // partialBlock{4, idx=4, LAST}: block 4 is settled again, from another reader that had to bump index again

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 nit: this case passes with the fix reverted — the duplicate 00000004a is dropped by same-ID dedup in AddLink (returns exists), not by the new LastPartial && !blk.LastPartial branch (that branch needs !blk.LastPartial, but this block IS a last-partial). To actually exercise the new logic, give the second close a different ID — but note a different-ID, higher-index last-partial then hits return blk.PartialIndex > p.lastBlockSent.PartialIndex = true and triggers, so assert the behavior you actually want.

Comment thread forkable/forkable_test.go
lastLIBSent: tinyBlk("00000002a"),
parentBlock: tinyBlk("00000003a"),
},
// ignored 4e: block 4 was settled by its last partial

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 nit: copy-paste comment — there is no 4e block in this case (that's case 1). The ignored block here is the duplicate 00000004a idx=4. Same at L2657.

push(bstream.StepPartial, reproBlock("0006p1", 6, 1, false))

go func() {
time.Sleep(500 * time.Millisecond)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 nit: fixed time.Sleep(500ms) adds 0.5s wall-clock and couples the test to timing — it only works because every block is pre-buffered before Run(). Prefer shutting down once len(received) reaches the expected count.

sub.Run()

for _, r := range received {
fmt.Println(" delivered:", r)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 nit: fmt.Println spams test output. Drop it or use t.Log.

@sduchesneau sduchesneau merged commit c1572d2 into develop Jun 10, 2026
3 checks passed
@sduchesneau sduchesneau deleted the fix/hub-subscription-skips-lastpartial branch June 10, 2026 18:21
@sduchesneau sduchesneau restored the fix/hub-subscription-skips-lastpartial branch June 10, 2026 18:31
@sduchesneau sduchesneau deleted the fix/hub-subscription-skips-lastpartial branch June 10, 2026 18:31
sduchesneau added a commit to streamingfast/substreams that referenced this pull request Jun 10, 2026
Brings streamingfast/bstream#49: the hub subscription no longer skips
past a StepNewPartial under subscriber lag, and the forkable ignores
non-last partial versions of a block already settled by its lastPartial.
This also guarantees a partial block is never delivered before its
parent, so the pipeline needs no parent-link checking of its own.
sduchesneau added a commit to streamingfast/substreams that referenced this pull request Jun 10, 2026
#799)

* handle partial-block reorgs with undo, not error

A partial block that does not extend the transactions already sent now
triggers a BlockUndoSignal (last valid = parent) instead of killing the
stream with 'first transactions does not match expected hash'. After
such an undo, further non-last partials of that block are muted until
its last partial or full block settles it, so churning versions produce
a single undo. Repeated undos to the same junction are no longer
suppressed once partial data has been sent.

* bump bstream: fix lost last-partials in live streams

Brings streamingfast/bstream#49: the hub subscription no longer skips
past a StepNewPartial under subscriber lag, and the forkable ignores
non-last partial versions of a block already settled by its lastPartial.
This also guarantees a partial block is never delivered before its
parent, so the pipeline needs no parent-link checking of its own.
@sduchesneau

Copy link
Copy Markdown
Contributor Author

FYI @maoueh

  • does anything rely on every StepIrreversible?
Area Verdict Why
fluxdb (your prime suspect) Not reachable Embeds the forkable hub in-process and resumes by block number with a nil cursor — it never reconnects a wire stream with a below-floor cursor, so the clamp path can't fire. It is internally fragile (write.go:331 isNextBlock + per-IRR-block accumulation in pipeline.go:369 hard-assume contiguous IRR), but the gap never reaches it as wired today.
substreams sinks (sql, files, noop, core) Safe by construction Finality is driven by the final_block_height field on every BlockScopedData (range prune <= LIB), not by per-block STEP_FINAL notifications. A LIB jump just prunes more in one pass. This is the key reason the change is safe over the wire.
firehose-core + substreams engine Safe (one LOW leak) All per-final-block persistence (execout cache, index/store writers) runs on Tier2/linear merged streams that are contiguous by construction; Tier1 builds those writers as nil. The only residual: reversibleBuffers/reversibleOutputs maps delete-on-exact-block instead of prune-<=-LIB, so each skipped final leaks one map entry on Tier1. Bounded, no data loss, no panic.
firehose-{ethereum,solana,near,cosmos} Not at risk No chain-specific forkable/StepIrreversible consumer; they delegate the whole pipeline to firehose-core.

Bottom line: the "best-effort StepIrreversible" caveat is safe for every consumer in the substreams ecosystem. Sinks don't use per-block IRR at all (they use final_block_height), and the stateful per-IRR consumers (fluxdb, firehose-core Tier2) only ever run on contiguous merged-block streams, never the reconnect path. The single nit worth a separate PR (not this repo): make firehose-core/substreams' two Tier1 reversible maps prune <= LIB instead of deleting the exact key, to close the bounded memory leak.

(leak is bound to a single request and minimal, not worth worrying about it)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants