Skip to content

Conversation

@krishna-st
Copy link
Contributor

Implements a drain mode for Pinot Minion instances, allowing operators to gracefully scale down minions by preventing new task assignments while allowing existing tasks to complete.

  • Extended PUT /instances/{instanceName}/state to support DRAIN state for minion instances
  • Draining changes minion tag from minion_untaggedminion_drained, preventing Helix from assigning new tasks
  • Leverages Helix's native tag-based task assignment mechanism (.setInstanceGroupTag())

@krishna-st krishna-st changed the title Kk/minion drain Add Support to tag a minion as Drained Dec 16, 2025
@codecov-commenter
Copy link

codecov-commenter commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 59.45946% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.30%. Comparing base (624b90b) to head (dc70362).
⚠️ Report is 64 commits behind head on master.

Files with missing lines Patch % Lines
...ntroller/helix/core/PinotHelixResourceManager.java 62.06% 6 Missing and 5 partials ⚠️
...er/api/resources/PinotInstanceRestletResource.java 42.85% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17375      +/-   ##
============================================
+ Coverage     63.24%   63.30%   +0.06%     
  Complexity     1474     1474              
============================================
  Files          3147     3162      +15     
  Lines        187490   188631    +1141     
  Branches      28703    28868     +165     
============================================
+ Hits         118575   119417     +842     
- Misses        59728    59958     +230     
- Partials       9187     9256      +69     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.24% <59.45%> (+0.06%) ⬆️
java-21 63.27% <59.45%> (+0.06%) ⬆️
temurin 63.30% <59.45%> (+0.06%) ⬆️
unittests 63.30% <59.45%> (+0.06%) ⬆️
unittests1 55.60% <ø> (-0.05%) ⬇️
unittests2 34.04% <59.45%> (+0.17%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a drain mode for Pinot Minion instances to enable graceful scale-down operations. The implementation allows operators to prevent new task assignments to specific minions while existing tasks complete, using Helix's tag-based assignment mechanism.

Key Changes:

  • Added DRAIN state support to the instance state API endpoint
  • Implemented tag switching mechanism (minion_untagged → minion_drained) to prevent new task assignments
  • Added comprehensive test coverage for drain operations including edge cases and concurrency scenarios

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
CommonConstants.java Added DRAINED_MINION_INSTANCE constant for the new drain tag
PinotHelixResourceManagerMinionDrainTest.java Comprehensive unit tests covering basic drain operations, edge cases (multiple tags, empty tags, already drained), and concurrent safety
PinotInstanceRestletResourceTest.java Integration tests verifying API behavior for drain operations including error cases
PinotHelixResourceManager.java Core drain logic with drainMinionInstance() method and helper for tag replacement
StateType.java Extended enum to include DRAIN state
PinotInstanceRestletResource.java API endpoint updates to handle DRAIN state with validation for minion instances only

@krishna-st krishna-st requested a review from Copilot December 16, 2025 16:45
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

* @param instanceName Name of the minion instance to drain
* @return Response indicating success or failure
*/
public synchronized PinotResourceManagerResponse drainMinionInstance(String instanceName) {
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The synchronized method-level lock could become a bottleneck under concurrent drain requests across different minion instances. Consider using instance-level locking or a more granular synchronization mechanism (e.g., ConcurrentHashMap with per-instance locks) to allow concurrent draining of different minions.

Copilot uses AI. Check for mistakes.
@ApiOperation(value = "Enable/disable an instance", notes = "Enable/disable an instance")
@ApiOperation(value = "Enable/disable/drain an instance", notes = "Enable/disable/drain an instance. "
+ "DRAIN state is only applicable to minion instances and retags them from minion_untagged to minion_drained, "
+ "preventing new task assignments while allowing existing tasks to complete.")
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation should mention that the drain operation is idempotent and preserves other tags on the instance. This is important operational context for users of the API.

Suggested change
+ "preventing new task assignments while allowing existing tasks to complete.")
+ "preventing new task assignments while allowing existing tasks to complete. "
+ "The drain operation is idempotent and preserves other tags on the instance.")

Copilot uses AI. Check for mistakes.
@Consumes(MediaType.TEXT_PLAIN)
@ApiOperation(value = "Enable/disable an instance", notes = "Enable/disable an instance")
@ApiOperation(value = "Enable/disable/drain an instance", notes = "Enable/disable/drain an instance. "
+ "DRAIN state is only applicable to minion instances and retags them from minion_untagged to minion_drained, "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it mean if a minion is tagged to something else, then we cannot drain this node?

Copy link
Contributor Author

@krishna-st krishna-st Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this scenario, yes, because ideally, if the minion is tagged with something else, then there are no tasks that are currently being assigned to it, unless the table config has minionInstanceTag present

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xiangfu0 noticed that we can custom-tag minions so that specific tasks get executed on those. So replacing all the tags with minion_drained

"DRAIN state only applies to minion instances. Instance '" + instanceName + "' is not a minion.",
Response.Status.BAD_REQUEST);
}
PinotResourceManagerResponse response = _pinotHelixResourceManager.drainMinionInstance(instanceName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if a minion is in drained mode, when the same minion come up again, the tag is still drained, then it won't be able to serve tasks right ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is correct. In case a minion is in drained mode, you would need to retag them with minion_untagged so that Helix can start assigning tasks. This is only when the minion is drained, but not removed from the controller

@krishna-st krishna-st requested a review from xiangfu0 December 22, 2025 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants