feat(compaction): materialize fully-deleted fragments when remapping indexes#7396
Draft
xuanyu-z wants to merge 1 commit into
Draft
feat(compaction): materialize fully-deleted fragments when remapping indexes#7396xuanyu-z wants to merge 1 commit into
xuanyu-z wants to merge 1 commit into
Conversation
…indexes When compaction remaps an index it drops the rows of deleted records that are materialized during the rewrite, but it does not drop fragments that were fully deleted before the compaction. Such a fragment is already gone from the manifest and was never part of a rewrite group, so its rows are absent from the remap map and survive the rebuild. The index keeps covering a fragment that no longer exists, forcing every search onto the masked slow path. Materialize those fragments during remap, in two coordinated places: - index::remap_index augments the remap map so every row of a still-covered but no-longer-present fragment maps to None, dropping it from the rebuilt index data. The rows are enumerated from the dataset version the index was built at, since the fragments are gone from the current manifest (skipped if that version is unavailable, leaving the rows to query-time masking as before). - Transaction::recalculate_fragment_bitmap intersects the recomputed coverage with the live fragments, dropping the deleted fragment from the bitmap too and restoring searches to the fast path. Kept in lockstep with the data drop so a trimmed bitmap never points at rows that are still present. Adds test_compaction_materializes_deleted_fragment.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
When compaction remaps an index it drops the rows of deleted records that are materialized during the rewrite. It does not drop fully-deleted fragments — a fragment whose rows were all deleted before the compaction. Such a fragment is already gone from the manifest, so it is never part of a rewrite group; its rows are absent from the remap map and survive the index rebuild. The index keeps covering a fragment that no longer exists, so every search has to mask it out at query time (the slow path) instead of ignoring it outright.
Fix
Materialize those fragments during remap, in two coordinated places:
index::remap_index): augment the remap map so every row of a still-covered-but-no-longer-present fragment maps toNone, dropping it from the rebuilt index. The fragments are gone from the current manifest, so their rows are enumerated from the dataset version the index was built at; if that version can't be loaded (e.g. already cleaned up) the drop is skipped and the rows fall back to query-time masking as before.Transaction::recalculate_fragment_bitmap): intersect the recomputed fragment bitmap with the live fragments, so the deleted fragment is dropped from coverage too. This is what restores searches to the fast path; it is kept in lockstep with the data drop so a trimmed bitmap never points at rows that are still present.Test
test_compaction_materializes_deleted_fragment: build 10 fragments + a scalar index, fully delete one fragment, run an inline compaction, and assert (a) the index no longer covers the deleted fragment and its coverage equals the live fragments, and (b) an indexed scan over the deleted range still returns correctly. Existing optimize (91) and transaction (51) suites stay green.Notes / possible follow-ups
remapwould avoid materializing every address, at the cost of threading a fragment-id set through the per-type remap interface.Draft — opening for discussion of the approach.