fix(compaction): prune fully-deleted fragments from the fragment-reuse index#7378
fix(compaction): prune fully-deleted fragments from the fragment-reuse index#7378xuanyu-z wants to merge 1 commit into
Conversation
|
Important This PR touches the Lance format specification. Substantive changes to the format specification — the If this is a meaningful format change:
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
311eeef to
16c4c9a
Compare
abff2f1 to
d603bc4
Compare
d603bc4 to
536997b
Compare
…e index A fragment that has all its rows deleted is removed from the manifest at delete time, so it never enters a compaction task and is absent from every fragment-reuse-index (FRI) group's old_frags. With deferred compaction (defer_index_remap=true) the inline remap is skipped, so FragReuseIndex::remap_row_id passed those addresses through unchanged instead of pruning them. An index that still covered the removed fragment then kept dangling references to it, surfacing as 'take received reference to fragment that does not exist' errors / ghost results after a later physical remap. Record the orphaned fragment IDs (covered by an index but no longer present in the dataset, and not part of any rewrite group) on the FRI version and prune those addresses to None in remap_row_id, fixing both auto-remap at load and physical remap. Index coverage bitmaps are left intact (masked at query time by effective_fragment_bitmap); only the row data is pruned. The new proto field is additive and backward-compatible. Adds a regression test. Fixes lance-format#7374
536997b to
216112c
Compare
|
@claude review once |
westonpace
left a comment
There was a problem hiding this comment.
Good catch for this error. I have one minor question but this is good otherwise.
| /// Cache of `details.removed_frag_bitmap()`; rebuilt in `new`, so skipped. | ||
| #[serde(skip)] | ||
| removed_frags: RoaringBitmap, |
There was a problem hiding this comment.
This is probably unrelated to this PR but why do we store the FRI as a collection of versions in the first place? Why not merge them all together at write time?
There was a problem hiding this comment.
From the git history and code comments and my understandings, the main benefit is versioning allows the system to trim the FRI as different indices catch up so the size of the FRI is not unbounded before a full index job kicks in.
Then versioning also makes FRI ~append only which inherit all the consistency benefits
| /// Top-200 KNN projecting `i` (the projection forces a `take`, so a stale | ||
| /// row address would error). Asserts no result id falls in `ghost_range` | ||
| /// and at least one live row is returned. | ||
| async fn assert_knn_no_ghosts(dataset: &Dataset, ghost_range: std::ops::Range<i32>) { |
There was a problem hiding this comment.
Ghosts is a good word for the phenomenon 👻
Fixes #7374.
Problem
When compaction is deferred (
defer_index_remap: true), index entries are not physically remapped at compaction time; instead the fragment-reuse index (FRI) records old→new row-address maps and rewrites addresses on the fly viaremap_row_id.If a fragment is fully deleted before the compaction,
apply_deletionsremoves it from the manifest entirely (no deletion file). It therefore never enters a compaction task and is absent from every rewrite group'sold_fragments. An index that still covered that fragment keeps its stale entries, and because no group references the fragment,remap_row_idpasses those addresses straight through — resurfacing as dangling references to a fragment that no longer exists.The corruption is latent (
effective_fragment_bitmapmasks the dead fragment at query time) but surfaces once another rewrite touches the affected coverage.Fix
Track such orphaned fragments explicitly and prune their rows in the FRI:
FragmentReuseIndexDetails.Version.removed_fragments(field 4) andFragReuseVersion.removed_frags.commit_compaction(deferred branch), compute the orphaned set ascovered − live_fragments, wherecoveredis the union of every non-system index's coverage (read post-FRI-remap viaload_indices, so fragments rewritten by earlier deferred compactions are already excluded) andlive_fragmentsis the pre-commit fragment set. The difference is exactly the fragments an index still points at that have vanished from the manifest.remap_row_idmaps any row whose final address (after chaining through all versions) lands in aremoved_fragmentsfragment toNone.Why track by fragment ID rather than per-row address
A fully-deleted fragment is gone from the manifest, so its
physical_rows/address range can no longer be enumerated to build per-addressaddr → Noneentries. Since the entire fragment is gone, fragment-ID granularity is exact.Why the prune is applied after chaining, not before
A fragment can be removed at any generation. A row originally in fragment A may have been rewritten by an earlier deferred compaction into fragment B, which is then deleted. The orphaned fragment (B) shows up only at the final mapped address, not the input one. A removed fragment is always terminal (nothing compacts a manifest-removed fragment forward) and fragment IDs are never reused, so checking the final address never prunes a live row.
Coverage is intentionally left intact
remap_fragment_bitmapdoes not strip orphaned fragments from index coverage. Their rows are already pruned from index data byremap_row_id, and a nonexistent fragment is masked at query time byeffective_fragment_bitmap. Stripping coverage instead makes the planner treat the dead fragment's row range as an unindexed flat-scan fallback, which fails ontake.Backward compatibility
removed_fragmentsis an additive proto field. The FRI is persisted as protobuf; an older index decodes the absent field as an empty list, soremap_row_idshort-circuits and behaves exactly as before. No migration is required.Tests
The corruption is latent end-to-end (
create_restricted_deletion_maskblocks missing fragments andeffective_fragment_bitmapmasks dead ones), so the core cases assert at the FRI level (remap_row_idmust returnNone) via a sharedassert_fri_pruneshelper, and keep faithful end-to-end checks where a non-empty projection actually exercises thetake. Shared helpers (dataset_with_ivf_pq,dataset_with_btree,assert_fri_prunes,assert_knn_no_ghosts) keep the cases small; FRI-level-only cases use a cheap scalar BTREE index (no vector training).test_defer_index_remap_fully_deleted_fragment— the Vector index can become corrupted when compaction is deferred #7374 repro: a fully-deleted fragment, deferred-compacted. Asserts the FRI prunes it and KNN returns no ghosts during the FRI window, then again after the physical remap + FRI trim that closes the window.test_defer_index_remap_deleted_merged_fragment— a fragment merged by one deferred compaction then fully deleted before a second; pruned at its post-merge ID while survivors are kept. (Fails if the prune is applied to the input address instead of the final one.)test_defer_index_remap_multiple_fully_deleted_fragments— several whole-fragment deletions before one compaction; verifiesremoved_fragsis a union.test_defer_index_remap_survivor_deleted_across_versions— a fragment left untouched by an earlier deferred compaction and then deleted (pruned via pass-through, not chaining).test_defer_index_remap_then_delete_during_window— reverse ordering: compaction then delete during the FRI window; the new fragment's deletion vector is honored.