
feat(table): classify deletion vectors in scan planning #855

Merged
zeroshade merged 2 commits into apache:main from laskoviymishka:feat/dv-scan-planning on Apr 7, 2026

@laskoviymishka (Contributor)

Separate deletion vectors from regular positional delete files during scan planning. DVs are identified by having ReferencedDataFile set on a positional delete manifest entry (Iceberg v3 spec).

  • Add dvEntries bucket to manifestEntries with isDeletionVector() helper
  • In collectManifestEntries: route DV entries separately from pos deletes
  • Add DeletionVectorFiles field to FileScanTask
  • Index DVs by referenced data file path for O(1) matching in PlanFiles
  • Unit tests for classification, matching, and FileScanTask field

Part of #589 (v3 deletion vector support).
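The classification and indexing steps listed above can be sketched as follows. This is a simplified, self-contained illustration, not the iceberg-go code: `entry`, `contentType`, `buckets`, and `classify` are hypothetical stand-ins. Only the rule itself comes from this PR: a positional delete entry with `ReferencedDataFile` set is treated as a DV.

```go
package main

import "fmt"

// contentType is a hypothetical stand-in for a manifest entry's content kind;
// it is NOT the iceberg-go type.
type contentType int

const (
	contentData contentType = iota
	contentPosDeletes
	contentEqDeletes
)

// entry is a hypothetical, simplified manifest entry.
// ReferencedDataFile is nil when the field is unset.
type entry struct {
	Path               string
	Content            contentType
	ReferencedDataFile *string
}

// isDeletionVector applies the rule described in this PR: a positional
// delete entry with ReferencedDataFile set is classified as a DV.
func isDeletionVector(e entry) bool {
	return e.Content == contentPosDeletes && e.ReferencedDataFile != nil
}

// buckets holds entries routed by kind. DVs are indexed by the path of the
// data file they reference rather than kept in a flat slice.
type buckets struct {
	data, posDeletes, eqDeletes []entry
	dvsByDataFile               map[string][]entry
}

// classify routes each entry into exactly one bucket, checking the DV rule
// before the generic positional-delete case.
func classify(entries []entry) buckets {
	b := buckets{dvsByDataFile: make(map[string][]entry)}
	for _, e := range entries {
		switch {
		case isDeletionVector(e):
			ref := *e.ReferencedDataFile
			b.dvsByDataFile[ref] = append(b.dvsByDataFile[ref], e)
		case e.Content == contentPosDeletes:
			b.posDeletes = append(b.posDeletes, e)
		case e.Content == contentEqDeletes:
			b.eqDeletes = append(b.eqDeletes, e)
		default:
			b.data = append(b.data, e)
		}
	}
	return b
}

func main() {
	ref := "s3://bucket/data/a.parquet"
	b := classify([]entry{
		{Path: ref, Content: contentData},
		{Path: "s3://bucket/data/b.parquet", Content: contentData},
		{Path: "s3://bucket/del/pos.parquet", Content: contentPosDeletes},
		{Path: "s3://bucket/del/dv.puffin", Content: contentPosDeletes, ReferencedDataFile: &ref},
	})
	// Each data file finds its DVs with a single map lookup.
	for _, d := range b.data {
		fmt.Println(d.Path, len(b.dvsByDataFile[d.Path]))
	}
	fmt.Println("positional deletes:", len(b.posDeletes))
}
```

Keying the map by the referenced data file path lets planning find a data file's DVs with one average-case O(1) lookup, instead of scanning every DV entry for each data file.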

@laskoviymishka marked this pull request as ready for review April 6, 2026 05:28
@laskoviymishka force-pushed the feat/dv-scan-planning branch from 7a47cb5 to 139e283 on April 6, 2026 23:50
Review comment (Member) on the new `FileScanTask` fields:

```go
File                iceberg.DataFile
DeleteFiles         []iceberg.DataFile // positional delete files
EqualityDeleteFiles []iceberg.DataFile // equality delete files
DeletionVectorFiles []iceberg.DataFile // deletion vectors (puffin files)
```

Since the actual scanner will ignore these, we should probably also update the scanner to return an error if this field is non-empty, just so people don't think DV reads are working.

Separate deletion vectors from regular positional delete files during
scan planning. DVs are identified by having ReferencedDataFile set on
a positional delete manifest entry (Iceberg v3 spec).

- Add dvEntries bucket to manifestEntries with isDeletionVector() helper
- In collectManifestEntries: route DV entries separately from pos deletes
- Add DeletionVectorFiles field to FileScanTask
- Index DVs by referenced data file path for O(1) matching in PlanFiles
- Scanner errors if DeletionVectorFiles is non-empty (read not yet implemented)
- Unit tests for classification, matching, and FileScanTask field

Part of apache#589 (v3 deletion vector support).
@zeroshade merged commit 526492a into apache:main on Apr 7, 2026
13 checks passed