[ENHANCEMENT] Use unique UUIDs for Worker Locking in Event Producer instead of Trigger ID #2123

@jcscottiii

Description

Currently, the EventProducer uses the incoming triggerID (from the Pub/Sub message ID or Cloud Event ID) as the workerID when acquiring the SavedSearchState lock in Spanner.

The Problem:
This approach is unsafe for distributed locking. If a worker process stalls (e.g., during a GC pause) and its lock expires, a second worker may pick up the retry of the same message. Because the retry carries the same triggerID, the second worker acquires the lock under the same ID. If the first ("zombie") worker then wakes up, the database cannot tell the two apart, so the zombie still passes the ownership check. This defeats the fencing protection and could lead to data corruption (split brain) if the zombie worker overwrites the state.
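The failure mode can be reproduced with a small in-memory model. This is only a sketch: `lockRow`, `tryAcquire`, `ownedBy`, and `zombiePassesCheck` are hypothetical names, and the acquire rule (free, expired, or same ID) is an assumption about how the Spanner lock behaves, not the actual implementation.

```go
package main

import (
	"fmt"
	"time"
)

// lockRow is a hypothetical in-memory stand-in for the lock columns
// on SavedSearchState; the real row lives in Spanner.
type lockRow struct {
	workerID  string
	expiresAt time.Time
}

// tryAcquire grants the lock when it is free, expired, or already held
// by the same workerID (assumed semantics).
func (l *lockRow) tryAcquire(workerID string, now time.Time, ttl time.Duration) bool {
	heldByOther := l.workerID != "" && l.workerID != workerID
	if heldByOther && now.Before(l.expiresAt) {
		return false
	}
	l.workerID = workerID
	l.expiresAt = now.Add(ttl)
	return true
}

// ownedBy is the ownership check a write would run before committing.
func (l *lockRow) ownedBy(workerID string) bool {
	return l.workerID == workerID
}

// zombiePassesCheck simulates worker A acquiring the lock, stalling past
// expiry, worker B acquiring it on the retry, and A waking up afterwards.
// It reports whether A's ownership check still passes.
func zombiePassesCheck(idA, idB string) bool {
	lock := &lockRow{}
	start := time.Unix(0, 0)
	ttl := time.Minute
	lock.tryAcquire(idA, start, ttl)            // worker A takes the lock
	lock.tryAcquire(idB, start.Add(2*ttl), ttl) // lock expired; worker B takes over
	return lock.ownedBy(idA)                    // zombie A wakes up and checks
}

func main() {
	// Both workers reuse the triggerID: the zombie is indistinguishable.
	fmt.Println(zombiePassesCheck("trigger-123", "trigger-123")) // true
	// Each execution has its own ID: the zombie is rejected.
	fmt.Println(zombiePassesCheck("uuid-a", "uuid-b")) // false
}
```

With a shared ID the stale worker is accepted; with distinct IDs the same sequence of events rejects it.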

The Solution:
The ProcessSearch method in the EventProducer must be updated to generate a new, random UUID (v4) at the start of every execution. This UUID should be used exclusively as the workerID for locking operations:

  1. TryAcquireSavedSearchStateWorkerLock
  2. PublishSavedSearchNotificationEvent (for the fencing check)
  3. ReleaseSavedSearchStateWorkerLock

Acceptance Criteria:

  • ProcessSearch generates a unique workerUUID.
  • The workerUUID is passed to the lock acquisition method instead of triggerID.
  • The workerUUID is passed to the publish/release methods to verify ownership.
  • The triggerID is maintained for tracing and as the EventID for the resulting notification, but it is not used for the lock identity.

Metadata

Labels

enhancement (New feature or request)