Making ThroughputViolationMap creation logic more robust #1006
Merged
Conversation
akshayrai
approved these changes
May 5, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix NPE in
onAssignmentChangeand improve atomicity ofpopulateThroughputViolatingTopicsMapProblem
Two bugs in
Coordinator.java:NPE in
onAssignmentChange:getDatastreamTask(task)can returnnullfor tasks not yetvisible to the instance (e.g., during a partition rebalance or when a newly assigned task hasn't
been written to ZK yet). The return value was passed directly into
new DatastreamGroup(...)without a null check, causing a
NullPointerException.Non-atomic map update in
populateThroughputViolatingTopicsMap: The write lock was acquiredafter the map was already cleared but before iteration completed. If an exception was thrown
mid-iteration,
_throughputViolatingTopicsMapwould be left empty — dropping all previouslycached throughput violation data from concurrent readers holding the read lock.
Fix
NPE (
onAssignmentChange): Added.filter(Objects::nonNull)after thegetDatastreamTaskcall to skip null tasks rather than propagating the NPE.
Atomicity (
populateThroughputViolatingTopicsMap): Moved the iteration outside the lock into alocal
nextmap. The write lock is now held only for theclear()+putAll(next)swap, makingthe visible state transition atomic. A mid-iteration failure no longer corrupts the live map. Also
tightened topic string parsing:
.map(String::trim)is now applied before.filter(s -> !s.isEmpty())to correctly handle whitespace-only entries.Changes
datastream-server/src/main/java/com/linkedin/datastream/server/Coordinator.javaTesting
onAssignmentChangepath for instances receiving tasks notyet committed to ZK.