Description
Currently, the EventProducer uses the incoming triggerID (from the Pub/Sub message ID or Cloud Event ID) as the workerID when acquiring the SavedSearchState lock in Spanner.
The Problem:
This approach is unsafe for distributed locking. If a worker process stalls (e.g., GC pause) and the lock expires, a second worker might pick up the retry of the same message. Since the triggerID is identical, the second worker acquires the lock with the same ID. If the first "zombie" worker wakes up, the database cannot distinguish between them because they share the same ID. This defeats the fencing token protection and could lead to data corruption (Split Brain) if the zombie worker overwrites the state.
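The failure sequence can be replayed with a small in-memory model. This is only a sketch: `lockTable`, `tryAcquire`, `ownsLock`, and `zombiePassesFencing` are hypothetical names, and the rule that a holder may re-acquire under its own ID is an assumption about the lock semantics, not the project's actual Spanner code.

```go
package main

import (
	"sync"
	"time"
)

// lockRow is a hypothetical stand-in for the lock columns of a SavedSearchState row.
type lockRow struct {
	workerID  string
	expiresAt time.Time
}

// lockTable is a hypothetical in-memory stand-in for the Spanner table.
type lockTable struct {
	mu   sync.Mutex
	rows map[string]lockRow
}

func newLockTable() *lockTable {
	return &lockTable{rows: make(map[string]lockRow)}
}

// tryAcquire grants the lock if it is free, expired, or already held by workerID
// (the last case is an assumed re-entrancy rule).
func (t *lockTable) tryAcquire(searchID, workerID string, now time.Time, ttl time.Duration) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	row, held := t.rows[searchID]
	if held && row.workerID != workerID && now.Before(row.expiresAt) {
		return false
	}
	t.rows[searchID] = lockRow{workerID: workerID, expiresAt: now.Add(ttl)}
	return true
}

// ownsLock models the fencing check: the worker ID is all it can compare.
func (t *lockTable) ownsLock(searchID, workerID string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	row, held := t.rows[searchID]
	return held && row.workerID == workerID
}

// zombiePassesFencing replays the scenario: worker A acquires the lock and stalls
// past the TTL, worker B acquires it on the retry, then A wakes up and is checked.
// It reports whether the stale worker A still passes the ownership check.
func zombiePassesFencing(idA, idB string) bool {
	table := newLockTable()
	start := time.Unix(0, 0)
	ttl := time.Minute
	table.tryAcquire("search-1", idA, start, ttl)
	table.tryAcquire("search-1", idB, start.Add(2*ttl), ttl)
	return table.ownsLock("search-1", idA)
}
```

When both workers pass the same triggerID, `zombiePassesFencing` reports that the stale worker still looks like the owner. When each worker has its own ID, the check rejects it.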
The Solution:
The ProcessSearch method in the EventProducer must be updated to generate a new, random UUID (v4) at the start of every execution. This UUID should be used exclusively as the workerID for locking operations:
- TryAcquireSavedSearchStateWorkerLock
- PublishSavedSearchNotificationEvent (for the fencing check)
- ReleaseSavedSearchStateWorkerLock
Acceptance Criteria:
- ProcessSearch generates a unique workerUUID.
- The workerUUID is passed to the lock acquisition method instead of triggerID.
- The workerUUID is passed to the publish/release methods to verify ownership.
- The triggerID is maintained for tracing and as the EventID for the resulting notification, but it is not used for the lock identity.