[AIT-285] Fix clearing buffered object operations during sync sequence#2150
[AIT-285] Fix clearing buffered object operations during sync sequence#2150
Conversation
WalkthroughRenames LiveObjects' attach handler to Changes
Sequence Diagram(s)sequenceDiagram
participant Server
participant Transport
participant Channel
participant RealtimeObject
Note over Server,Transport: Non-resumed ATTACHED (HAS_OBJECTS maybe set)
Server->>Transport: ATTACHED (flags, maybe HAS_OBJECTS)
Transport->>Channel: protocol message
Channel->>RealtimeObject: processMessage -> onNonResumeAttached(hasObjects)
RealtimeObject->>RealtimeObject: clear _bufferedObjectOperations
RealtimeObject->>RealtimeObject: start new sync (OBJECT_SYNC flow)
Server->>Transport: OBJECT_SYNC... (sync messages)
Transport->>Channel: sync messages
Channel->>RealtimeObject: handleObjectSyncMessages...
RealtimeObject->>RealtimeObject: after sync end -> flush buffered ops
sequenceDiagram
participant Server
participant Transport
participant Channel
participant RealtimeObject
Note over Server,Transport: Resumed ATTACHED (RESUMED flag set)
Server->>Transport: ATTACHED (RESUMED | flags)
Transport->>Channel: protocol message
Channel->>RealtimeObject: processMessage -> (resumed attach path)
RealtimeObject->>RealtimeObject: do not clear buffered ops on attach
Server->>Transport: OBJECT_SYNC... (if any)
Transport->>Channel: sync messages
Channel->>RealtimeObject: handleObjectSyncMessages...
RealtimeObject->>RealtimeObject: flush/apply buffered ops when appropriate
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro Disabled knowledge base sources:
📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
🔇 Additional comments (8)
✏️ Tip: You can disable this entire section by setting Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ast-grep (0.40.5)test/realtime/liveobjects.test.jsThanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Buffered object operations are now only cleared on non-resume ATTACHED, not when a new OBJECT_SYNC sequence starts. This is because the realtime server caches object state at the moment of channel attachment, so all operations received since attachment must be preserved and applied. For server-initiated resync, the server is expected to send an ATTACHED message with RESUMED=false before the new OBJECT_SYNC sequence. However, if an OBJECT_SYNC is received after a completed sync without preceding ATTACHED, the client handles this as best-effort by buffering operations from that point. Since the buffer is cleared at sync completion, this works implicitly. Resolves AIT-285
e95e32e to
06b746a
Compare
mschristensen
left a comment
There was a problem hiding this comment.
LGTM, just a small comment
| // The realtime server caches the object state at the moment of channel attachment, | ||
| // so we must start fresh and buffer all incoming OBJECT messages from this point. |
There was a problem hiding this comment.
I think a more accurate description would be something like: "The realtime server will deliver a sync sequence following a non-resume attached, and guarantees that the objects sent in that sequence will at least include all operations up to the point of attachment."
| // the Objects tree needs to be re-synced | ||
| if (this._object) { | ||
| this._object.onAttached(hasObjects); | ||
| this._object.onNonResumeAttached(hasObjects); |
There was a problem hiding this comment.
Why is all of the on-ATTACHED behaviour now gated behind the absence of the RESUMED flag? The RTO4 behaviours in general are not dependent on whether it's RESUMED or not.
There was a problem hiding this comment.
Hmm, looking at it again I see that it was always gated behind that flag. But it's not consistent with what the spec says, so we need to decide the correct behaviour and ensure spec and JS are consistent.
There was a problem hiding this comment.
Thought about this a bit more. Was trying to understand how we populate the local objects state when we resume after connection recovery (internal Slack thread). Simon said:
for presence, in protocol v2 we always do a full sync after resume. maybe we might change that at some point, but right now that's what we do. I assume liveobjects does similarly
I think that if this is the case for LiveObjects, then the JS behaviour of not becoming SYNCING when resumed == true is probably not correct, since you could have a scenario like the following (I'm no longer specifically thinking about connection recovery here; in this scenario the client is already SYNCED):
i.e. this appears to be a scenario in which the assumption of RTO5f in ably/specification#416 (specifically, that the preceding ATTACHED has resumed == false) is not true.
@mschristensen are you able to clarify the Realtime behaviour here please?
There was a problem hiding this comment.
The current behaviour in realtime is the same as presence, i.e. we send a sync sequence unconditionally after a resume https://github.com/ably/realtime/blob/5a7663451577fea09446ffec626aef4f0dfc65d5/go/realtime/lib/channel/attachment.go#L916
So it looks like we should change this in realtime so that the client can know whether or not to buffer ops post ATTACHED. Ticket: https://ably.atlassian.net/browse/AIT-326
There was a problem hiding this comment.
If we were to make that change, how would a client populate its local objects state after connection recovery (i.e. it resumes an attachment but without having any local objects data yet)?
There was a problem hiding this comment.
We discussed this in standup today; conclusions:
- currently, if the
HAS_OBJECTSflag is set onATTACHED, this indicates that Realtime will send a sync sequence; thus, the LiveObjects client's condition for enteringSYNCINGand buffering ops should be to do so whenHAS_OBJECTSis set, as opposed to checkingRESUMED- note that spec points RTO4 and RTO4a do actually already mention the unconditional nature of the state sync
- we'd like to consider introducing a mechanism for Realtime to conditionally send a sync sequence (for efficiency); have captured this in AIT-330
That is, do it when we get a discontinuity, not when a new sync sequence changes, per spec changes in [1]. Integration tests ported from JS in [2] at 06b746a. All written by Claude. [1] ably/specification#416 [2] ably/ably-js#2150
That is, do it when we get a discontinuity, not when a new sync sequence starts, per spec changes in [1]. Integration tests ported from JS in [2] at 06b746a. All written by Claude. [1] ably/specification#416 [2] ably/ably-js#2150
Spec for this at ably/specification#416
Buffered object operations are now only cleared on non-resume ATTACHED, not when a new OBJECT_SYNC sequence starts. This is because the realtime server caches object state at the moment of channel attachment, so all operations received since attachment must be preserved and applied.
For server-initiated resync, the server is expected to send an ATTACHED message with RESUMED=false before the new OBJECT_SYNC sequence. However, if an OBJECT_SYNC is received after a completed sync without preceding ATTACHED, the client handles this as best-effort by buffering operations from that point. Since the buffer is cleared at sync completion, this works implicitly.
Resolves AIT-285
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.