Skip to content

Making ThroughputViolationMap creation logic more robust #1006

Merged
harshOSS merged 1 commit into
linkedin:masterfrom
harshOSS:OSSBugFix
May 5, 2026
Merged

Making ThroughputViolationMap creation logic more robust #1006
harshOSS merged 1 commit into
linkedin:masterfrom
harshOSS:OSSBugFix

Conversation

@harshOSS
Copy link
Copy Markdown
Collaborator

@harshOSS harshOSS commented May 1, 2026

Fix NPE in onAssignmentChange and improve atomicity of populateThroughputViolatingTopicsMap

Problem

Two bugs in Coordinator.java:

  1. NPE in onAssignmentChange: getDatastreamTask(task) can return null for tasks not yet
    visible to the instance (e.g., during a partition rebalance or when a newly assigned task hasn't
    been written to ZK yet). The return value was passed directly into new DatastreamGroup(...)
    without a null check, causing a NullPointerException.

  2. Non-atomic map update in populateThroughputViolatingTopicsMap: The write lock was acquired
    after the map was already cleared but before iteration completed. If an exception was thrown
    mid-iteration, _throughputViolatingTopicsMap would be left empty — dropping all previously
    cached throughput violation data from concurrent readers holding the read lock.

Fix

NPE (onAssignmentChange): Added .filter(Objects::nonNull) after the getDatastreamTask
call to skip null tasks rather than propagating the NPE.

Atomicity (populateThroughputViolatingTopicsMap): Moved the iteration outside the lock into a
local next map. The write lock is now held only for the clear() + putAll(next) swap, making
the visible state transition atomic. A mid-iteration failure no longer corrupts the live map. Also
tightened topic string parsing: .map(String::trim) is now applied before
.filter(s -> !s.isEmpty()) to correctly handle whitespace-only entries.

Changes

  • datastream-server/src/main/java/com/linkedin/datastream/server/Coordinator.java

Testing

  • Existing unit tests pass.
  • The NPE scenario is covered by the onAssignmentChange path for instances receiving tasks not
    yet committed to ZK.

@harshOSS harshOSS changed the title Fixing npe in onassignmentchange and fixing throughputvoilationmapcre… Making ThroughputVoilationMap creation logic more robust May 5, 2026
@harshOSS harshOSS changed the title Making ThroughputVoilationMap creation logic more robust Making ThroughputViolationMap creation logic more robust May 5, 2026
Copy link
Copy Markdown
Collaborator

@khandelwal-ayush khandelwal-ayush left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@harshOSS harshOSS merged commit bcfbacf into linkedin:master May 5, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants