Skip to content

feat(compaction): materialize fully-deleted fragments when remapping indexes#7396

Draft
xuanyu-z wants to merge 1 commit into
lance-format:mainfrom
xuanyu-z:fix/materialize-deleted-fragments-on-remap
Draft

feat(compaction): materialize fully-deleted fragments when remapping indexes#7396
xuanyu-z wants to merge 1 commit into
lance-format:mainfrom
xuanyu-z:fix/materialize-deleted-fragments-on-remap

Conversation

@xuanyu-z

Copy link
Copy Markdown
Contributor

Problem

When compaction remaps an index it drops the rows of deleted records that are materialized during the rewrite. It does not drop fully-deleted fragments — a fragment whose rows were all deleted before the compaction. Such a fragment is already gone from the manifest, so it is never part of a rewrite group; its rows are absent from the remap map and survive the index rebuild. The index keeps covering a fragment that no longer exists, so every search has to mask it out at query time (the slow path) instead of ignoring it outright.

Fix

Materialize those fragments during remap, in two coordinated places:

  • Data (index::remap_index): augment the remap map so every row of a still-covered-but-no-longer-present fragment maps to None, dropping it from the rebuilt index. The fragments are gone from the current manifest, so their rows are enumerated from the dataset version the index was built at; if that version can't be loaded (e.g. already cleaned up) the drop is skipped and the rows fall back to query-time masking as before.
  • Coverage (Transaction::recalculate_fragment_bitmap): intersect the recomputed fragment bitmap with the live fragments, so the deleted fragment is dropped from coverage too. This is what restores searches to the fast path; it is kept in lockstep with the data drop so a trimmed bitmap never points at rows that are still present.

Test

test_compaction_materializes_deleted_fragment: build 10 fragments + a scalar index, fully delete one fragment, run an inline compaction, and assert (a) the index no longer covers the deleted fragment and its coverage equals the live fragments, and (b) an indexed scan over the deleted range still returns correctly. Existing optimize (91) and transaction (51) suites stay green.

Notes / possible follow-ups

  • The data drop enumerates the deleted fragment's row addresses into the remap map. For very large fragments that is memory-heavy; a fragment-id-keyed prune inside each index type's remap would avoid materializing every address, at the cost of threading a fragment-id set through the per-type remap interface.
  • For scalar indexes the coverage trim alone is sufficient for correctness (the scalar query layer already filters stale entries); the data drop additionally keeps the rebuilt index minimal and is the safe choice for vector indexes, whose search path resolves and takes row addresses directly.

Draft — opening for discussion of the approach.

…indexes

When compaction remaps an index it drops the rows of deleted records that are
materialized during the rewrite, but it does not drop fragments that were fully
deleted before the compaction. Such a fragment is already gone from the manifest
and was never part of a rewrite group, so its rows are absent from the remap map
and survive the rebuild. The index keeps covering a fragment that no longer
exists, forcing every search onto the masked slow path.

Materialize those fragments during remap, in two coordinated places:

- index::remap_index augments the remap map so every row of a still-covered but
  no-longer-present fragment maps to None, dropping it from the rebuilt index
  data. The rows are enumerated from the dataset version the index was built at,
  since the fragments are gone from the current manifest (skipped if that
  version is unavailable, leaving the rows to query-time masking as before).
- Transaction::recalculate_fragment_bitmap intersects the recomputed coverage
  with the live fragments, dropping the deleted fragment from the bitmap too and
  restoring searches to the fast path. Kept in lockstep with the data drop so a
  trimmed bitmap never points at rows that are still present.

Adds test_compaction_materializes_deleted_fragment.
@github-actions github-actions Bot added the enhancement New feature or request label Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant