fix/hub subscription skips lastpartial#49
Conversation
getLatestPendingVersionOfCandidateBlock skips intermediate partial versions of a block when the subscriber lags, but StepNewPartial also matches StepPartial, so the closing 'last partial' was discarded whenever a stale partial version of the same block followed it in the channel. Downstream consumers then never saw the block's closing, breaking partial-block sequences for every block while lag persisted. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
After the 'last partial' of a block has been sent, a divergent partial version with a higher PartialIndex (e.g. from a lagging reader on a different flashblocks feed, or a post-seal republication) still passed triggersNewLongestChain. That put a non-last partial behind the StepNewPartial in subscription channels, which combined badly with the catch-up skipping (see previous commit) and re-opened a settled block for downstream consumers. The block is settled: such versions are noise until a reorg replaces it (full block or undo). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
GabrielCartier
left a comment
There was a problem hiding this comment.
Reviewed the fix — logic is correct on both layers. Verified by building + running the new tests, and by reverting each change to confirm the test that guards it.
hub/subscription.go — StepNewPartial = StepNew|StepPartial (steps.go:36), so !step.Matches(StepPartial) || step.Matches(StepNew) delivers the closing last-partial while still skipping plain partials. Minimal and precise: only StepNewPartial's behavior actually changes — every other StepNew-bearing step already returned via the first clause. ✅
forkable/forkable.go — the lastBlockSent.LastPartial && !blk.LastPartial guard is consistent with the existing last-partial semantics (forkable.go:442 and :774). Reverting it makes the ignore non-last partial after lastPartial case fail, so that test does guard the change. ✅
One thing to flag (not blocking): the subscription fix stops the last-partial from being eaten, but it does not drop the trailing stale post-seal partial — the repro delivers partial 0004p4(idx=4) right after 0004a. That suppression lives entirely in the forkable layer. Fine as long as forkable is always upstream of the hub subscription in prod (it is); a subscriber fed blocks directly would still observe a post-seal partial.
Nits inline.
| bTestBlock("00000003a", "00000002a"), // block{3} | ||
| partialBlock("00000004b", "00000003a", 2), // partialBlock{4, idx=2} | ||
| lastPartialBlock("00000004a", "00000003a", 3), // partialBlock{4, idx=3, LAST}: block 4 is settled | ||
| lastPartialBlock("00000004a", "00000003a", 4), // partialBlock{4, idx=4, LAST}: block 4 is settled again, from another reader that had to bump index again |
There was a problem hiding this comment.
🔵 nit: this case passes with the fix reverted — the duplicate 00000004a is dropped by same-ID dedup in AddLink (returns exists), not by the new LastPartial && !blk.LastPartial branch (that branch needs !blk.LastPartial, but this block IS a last-partial). To actually exercise the new logic, give the second close a different ID — but note a different-ID, higher-index last-partial then hits return blk.PartialIndex > p.lastBlockSent.PartialIndex = true and triggers, so assert the behavior you actually want.
| lastLIBSent: tinyBlk("00000002a"), | ||
| parentBlock: tinyBlk("00000003a"), | ||
| }, | ||
| // ignored 4e: block 4 was settled by its last partial |
There was a problem hiding this comment.
🔵 nit: copy-paste comment — there is no 4e block in this case (that's case 1). The ignored block here is the duplicate 00000004a idx=4. Same at L2657.
| push(bstream.StepPartial, reproBlock("0006p1", 6, 1, false)) | ||
|
|
||
| go func() { | ||
| time.Sleep(500 * time.Millisecond) |
There was a problem hiding this comment.
🔵 nit: fixed time.Sleep(500ms) adds 0.5s wall-clock and couples the test to timing — it only works because every block is pre-buffered before Run(). Prefer shutting down once len(received) reaches the expected count.
| sub.Run() | ||
|
|
||
| for _, r := range received { | ||
| fmt.Println(" delivered:", r) |
There was a problem hiding this comment.
🔵 nit: fmt.Println spams test output. Drop it or use t.Log.
Brings streamingfast/bstream#49: the hub subscription no longer skips past a StepNewPartial under subscriber lag, and the forkable ignores non-last partial versions of a block already settled by its lastPartial. This also guarantees a partial block is never delivered before its parent, so the pipeline needs no parent-link checking of its own.
#799) * handle partial-block reorgs with undo, not error A partial block that does not extend the transactions already sent now triggers a BlockUndoSignal (last valid = parent) instead of killing the stream with 'first transactions does not match expected hash'. After such an undo, further non-last partials of that block are muted until its last partial or full block settles it, so churning versions produce a single undo. Repeated undos to the same junction are no longer suppressed once partial data has been sent. * bump bstream: fix lost last-partials in live streams Brings streamingfast/bstream#49: the hub subscription no longer skips past a StepNewPartial under subscriber lag, and the forkable ignores non-last partial versions of a block already settled by its lastPartial. This also guarantees a partial block is never delivered before its parent, so the pipeline needs no parent-link checking of its own.
|
FYI @maoueh
Bottom line: the "best-effort StepIrreversible" caveat is safe for every consumer in the substreams ecosystem. Sinks don't use per-block IRR at all (they use (leak is bound to a single request and minimal, not worth worrying about it) |
This PR fixes a case where an optimisation of partial blocks would skip sending the
last partial of a block when there was accumulation. This resulted in the downstream process
having to handle UNDOs and sometimes generating spurious UNDO sequences.