
Conversation

@tarun11Mavani
Contributor

@tarun11Mavani tarun11Mavani commented Dec 11, 2025

Problem

As discussed in #17337, segments are incorrectly deleted when any single replica reports zero valid documents, causing permanent data loss during server restarts and async state transitions where replicas have inconsistent validDocIds.

Solution

  • Added a consensus requirement: only delete/compact segments when ALL replicas agree on validDoc counts
  • Prevents race conditions during OFFLINE→ONLINE transitions and divergence in tie-breaking logic

Changes

  • New hasValidDocConsensus() method to verify replica agreement before selecting a segment for deletion or compaction

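The consensus check described above can be sketched as follows. This is an illustrative example, not the actual Pinot `MinionTaskUtils` implementation: the method name matches the PR, but the parameter types and log wording are simplified assumptions (the real method takes replica metadata objects rather than raw counts).

```java
import java.util.List;

public class ConsensusCheck {
  // Hypothetical sketch: returns true only when every replica reports the
  // same validDoc count for the segment. A segment with no consensus is
  // skipped rather than deleted or compacted.
  public static boolean hasValidDocConsensus(String segmentName, List<Long> validDocCounts) {
    if (validDocCounts == null || validDocCounts.isEmpty()) {
      return false; // no replica metadata -> no consensus, skip the segment
    }
    long expected = validDocCounts.get(0);
    for (long count : validDocCounts) {
      if (count != expected) {
        System.out.printf("Inconsistent validDoc counts across replicas for segment: %s. "
            + "Expected: %d, but found: %d%n", segmentName, expected, count);
        return false;
      }
    }
    return true;
  }
}
```

Treating "no metadata" the same as "no consensus" keeps the check conservative: when in doubt, the segment is left alone.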
Tests

Deployed this change in a test cluster and observed warning messages about skipping a few segments due to validDocId mismatches:


2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Inconsistent validDoc counts across replicas for segment: rta_tarunm_index__101__181__20251215T2107Z. Expected: 607695, but found: 607791
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Inconsistent validDoc counts across replicas for segment: rta_tarunm_index__53__142__20251215T0114Z. Expected: 423539, but found: 423645
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Skipping segment rta_tarunm_index__121__155__20251215T0859Z for table rta_tarunm_index_REALTIME - no consensus on validDoc counts across replicas
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Inconsistent validDoc counts across replicas for segment: rta_tarunm_index__75__184__20251215T2158Z. Expected: 595070, but found: 594974
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Inconsistent validDoc counts across replicas for segment: rta_tarunm_index__19__171__20251215T1626Z. Expected: 569283, but found: 569224
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] CRC mismatch for segment: rta_tarunm_index__12__188__20251215T2329Z, (segmentZKMetadata=450178614, validDocIdsMetadata=3330760987)
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Skipping segment rta_tarunm_index__27__148__20251215T0504Z for table rta_tarunm_index_REALTIME - no consensus on validDoc counts across replicas
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Segment rta_tarunm_index__26__164__20251215T1301Z marked for deletion - all replicas agree it has zero valid documents
2025-12-16T06:50:01+0530 [phx51] [phx51-2u] Skipping segment rta_tarunm_index__25__156__20251215T0922Z for table rta_tarunm_index_REALTIME - no consensus on validDoc counts across replicas
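The deletion decision reflected in the logs above (a segment is "marked for deletion" only when all replicas agree it has zero valid documents) can be sketched as below. This is a simplified stand-in for the task-generator logic, with an assumed signature taking raw per-replica counts:

```java
import java.util.List;

public class DeletionEligibility {
  // Illustrative sketch: a segment is selected for deletion only when every
  // replica reports the same validDoc count AND that agreed count is zero.
  // Under the previous behavior, a single replica reporting zero was enough,
  // which is the data-loss bug this PR fixes.
  public static boolean shouldDeleteSegment(List<Long> replicaValidDocCounts) {
    if (replicaValidDocCounts.isEmpty()) {
      return false;
    }
    long first = replicaValidDocCounts.get(0);
    boolean consensus = replicaValidDocCounts.stream().allMatch(c -> c == first);
    return consensus && first == 0;
  }
}
```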

@codecov-commenter

codecov-commenter commented Dec 11, 2025

Codecov Report

❌ Patch coverage is 35.71429% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.30%. Comparing base (042549d) to head (dcee854).
⚠️ Report is 3 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...che/pinot/plugin/minion/tasks/MinionTaskUtils.java 36.84% 9 Missing and 3 partials ⚠️
...tcompactmerge/UpsertCompactMergeTaskGenerator.java 14.28% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17352      +/-   ##
============================================
- Coverage     63.34%   63.30%   -0.04%     
- Complexity     1474     1476       +2     
============================================
  Files          3162     3162              
  Lines        188591   188611      +20     
  Branches      28857    28861       +4     
============================================
- Hits         119454   119408      -46     
- Misses        59891    59957      +66     
  Partials       9246     9246              
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.28% <35.71%> (-0.01%) ⬇️
java-21 63.24% <35.71%> (-0.03%) ⬇️
temurin 63.30% <35.71%> (-0.04%) ⬇️
unittests 63.30% <35.71%> (-0.04%) ⬇️
unittests1 55.59% <100.00%> (-0.02%) ⬇️
unittests2 34.05% <35.71%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.


@tarun11Mavani
Contributor Author

@tibrewalpratik17 @xiangfu0 can you please take a look at this?

@tarun11Mavani tarun11Mavani changed the title Add consensus check before selecting a segment for compaction or dele… Add consensus check before selecting a segment for compaction or deletion Dec 29, 2025
@tarun11Mavani
Contributor Author

@xiangfu0 gentle bump on this.

* @param replicaMetadataList list of metadata from all replicas of the segment
* @return true if all replicas have consensus on validDoc counts, false otherwise
*/
public static boolean hasValidDocConsensus(String segmentName,
Contributor
@tarun11Mavani : btw are we sure that there aren't some conditions where some deviation in valid counts across replicas are expected? (say due to live ingestion)

Contributor Author

I haven't seen this deviation due to live ingestion; I've mostly seen it after a restart/replace.
I have added a new metric, so we will be able to monitor whether there is large-scale deviation across multiple segments.

Contributor

AFAIK the only issue we are aiming to solve is when we accidentally delete a segment when valid doc counts are incorrectly reported as zero.

Therefore, should we just handle that case? I think by including deviations here we are increasing the scope, which can have some other unintended effects (e.g., if for some reason there's a minor discrepancy in reported valid doc counts, the upsert compact/merge task might progress very slowly).

Contributor Author

@tarun11Mavani tarun11Mavani Jan 6, 2026

As discussed offline, the reason for skipping a segment when a deviation is detected is that using a validDocId snapshot that (for whatever reason) differs from the one on another replica could potentially result in data loss. While I can't think of a concrete scenario where this would happen, the risk still exists, so it's safer to take a conservative approach.
We already have a metric to track skipped segments; I will monitor it for our production tables, and if we see too many segments being skipped and compaction throughput being impacted, we can revisit this check.
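A minimal sketch of the kind of skip-tracking metric mentioned above, assuming a simple per-table counter (the actual Pinot metric is emitted through its metrics framework; the class and method names here are hypothetical):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class SkippedSegmentMeter {
  // Hypothetical stand-in for the metric discussed above: counts, per table,
  // how many segments were skipped because replicas lacked validDoc consensus.
  private final Map<String, LongAdder> skippedByTable = new ConcurrentHashMap<>();

  public void recordSkip(String tableName) {
    skippedByTable.computeIfAbsent(tableName, t -> new LongAdder()).increment();
  }

  public long skippedCount(String tableName) {
    LongAdder adder = skippedByTable.get(tableName);
    return adder == null ? 0 : adder.sum();
  }
}
```

Watching this counter per table is what lets the conservative skip-on-deviation policy be revisited later if it turns out to throttle compaction throughput.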

@ankitsultana ankitsultana merged commit 7ca1984 into apache:master Jan 7, 2026
84 of 90 checks passed

6 participants