1 change: 1 addition & 0 deletions docs/src/format/table/.pages
@@ -6,4 +6,5 @@ nav:
- Layout: layout.md
- Branch & Tag: branch_tag.md
- Row ID & Lineage: row_id_lineage.md
- Data Overlay Files: data_overlay_file.md
- MemTable & WAL: mem_wal.md
390 changes: 390 additions & 0 deletions docs/src/format/table/data_overlay_file.md

Large diffs are not rendered by default.

29 changes: 29 additions & 0 deletions docs/src/format/table/index.md
@@ -168,6 +168,35 @@ However, this invalidates row addresses and requires rebuilding indices, which c

</details>

## Data Overlay Files

!!! note "Overlay files require reader feature flag 64 (data overlay files)"

Overlay files supply new values for a subset of `(row offset, field)` cells within
a fragment without rewriting the base data files. They make updates cheap when only
a small percentage of rows and/or columns change: a writer appends a small file
carrying just the changed cells instead of rewriting whole columns or moving rows
to a new fragment.

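On read, each cell is resolved by consulting the fragment's overlays from newest to
oldest: the first overlay covering that `(offset, field)` wins; if none covers it, the
value falls through to the base data file. Deletions take precedence, so an overlay
value for a deleted row is ignored. Existing indices keep covering the fragment and are
reconciled with overlays at query time through a field-aware exclusion set: cells covered
by an overlay committed after the index was built are excluded from the index's results
for the affected fields and re-evaluated against their current values.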
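A minimal Rust sketch of the per-cell resolution rule follows. It assumes a hypothetical in-memory `OverlayCells` type (a map from `(offset, field)` to value, with `None` meaning an explicit NULL override) in place of the real Roaring-bitmap coverage and Lance data file, and it assumes the overlay slice is already in effective order, oldest first. `resolve` and `Resolved` are illustrative names, not Lance APIs.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for one overlay: each covered (offset, field) cell
/// maps to its new value, where `None` means the cell is overridden to NULL.
struct OverlayCells {
    cells: HashMap<(u32, String), Option<i64>>,
}

/// Where a resolved cell's value came from.
#[derive(Debug, PartialEq)]
enum Resolved {
    Deleted,
    Overlay(Option<i64>),
    Base(Option<i64>),
}

/// Resolve one cell. `overlays` is assumed to be in effective order,
/// oldest first, so it is scanned in reverse (newest first).
fn resolve(
    offset: u32,
    field: &str,
    overlays: &[OverlayCells],
    deleted: &HashSet<u32>,
    base: impl Fn(u32, &str) -> Option<i64>,
) -> Resolved {
    // Deletions take precedence: an overlay value for a deleted offset is dead.
    if deleted.contains(&offset) {
        return Resolved::Deleted;
    }
    let key = (offset, field.to_string());
    // The first (newest) overlay that covers the cell wins.
    for overlay in overlays.iter().rev() {
        if let Some(value) = overlay.cells.get(&key) {
            return Resolved::Overlay(*value);
        }
    }
    // No overlay covers the cell: fall through to the base data file.
    Resolved::Base(base(offset, field))
}
```

With two overlays where the newer one overrides offset 1 of `price` to NULL, resolving that cell yields `Resolved::Overlay(None)` rather than the older overlay's value or the base value, and an offset that is also deleted resolves to `Resolved::Deleted` regardless of overlays.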

For the full specification — coverage and resolution rules, dense vs. sparse layout,
versioning, index integration, compaction, and a worked example — see the
[Data Overlay Files Specification](data_overlay_file.md).

<details>
<summary>DataOverlayFile protobuf message</summary>

```protobuf
%%% proto.message.DataOverlayFile %%%
```

</details>
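The version gate behind the field-aware exclusion set can be sketched as follows. This assumes a hypothetical `OverlaySummary` holding a dense overlay's `committed_version`, field ids, and covered offsets (a `BTreeSet<u32>` standing in for the Roaring bitmap); `exclusion_set` is an illustrative name. An index built at `index_version` already incorporates any overlay with `committed_version <= index_version`, so only newer overlays that touch the queried field contribute offsets to exclude and re-evaluate.

```rust
use std::collections::BTreeSet;

/// Hypothetical summary of one dense overlay, as needed for index reconciliation.
struct OverlaySummary {
    committed_version: u64,
    field_ids: Vec<i32>,
    covered_offsets: BTreeSet<u32>,
}

/// Offsets whose index results for `field_id` must be excluded and
/// re-evaluated, for an index built at dataset version `index_version`.
fn exclusion_set(overlays: &[OverlaySummary], field_id: i32, index_version: u64) -> BTreeSet<u32> {
    let mut excluded = BTreeSet::new();
    for overlay in overlays {
        // The index already incorporates overlays committed at or before its version.
        if index_version >= overlay.committed_version {
            continue;
        }
        // Field-aware: overlays that do not touch this field cannot invalidate it.
        if overlay.field_ids.contains(&field_id) {
            excluded.extend(overlay.covered_offsets.iter().copied());
        }
    }
    excluded
}
```

Keeping the set per field means that an overlay rewriting only one column does not force re-evaluation of index results for every other indexed column in the fragment.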


## Related Specifications

### Storage Layout
74 changes: 74 additions & 0 deletions protos/table.proto
@@ -113,6 +113,11 @@ message Manifest {
// * 2: row ids are stable and stored as part of the fragment metadata.
// * 4: use v2 format (deprecated)
// * 8: table config is present
// * 16: data files use multiple base paths (shallow clone / multi-base)
// * 32: the transaction file under _transactions is not written (inline only)
// * 64: data overlay files are present (see DataOverlayFile). Readers that do
// not understand overlays must refuse the dataset, since ignoring an overlay
// would silently return stale base values.
uint64 reader_feature_flags = 9;

// Feature flags for writers.
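The refusal rule for flag 64 amounts to a bitmask check: a reader rejects any manifest whose `reader_feature_flags` contains a bit it does not support. The sketch below illustrates that assumption only; the constant names and `check_reader_flags` are hypothetical and are not lance-table's actual feature-flag API.

```rust
/// Reader feature flag bits taken from the Manifest comment above.
const FLAG_STABLE_ROW_IDS: u64 = 2;
const FLAG_TABLE_CONFIG: u64 = 8;
const FLAG_DATA_OVERLAY_FILES: u64 = 64;

/// Hypothetical check: returns `Err(unknown_bits)` if the manifest sets any
/// reader flag outside `supported`, so the reader refuses the dataset instead
/// of silently returning stale base values.
fn check_reader_flags(manifest_flags: u64, supported: u64) -> Result<(), u64> {
    let unknown = manifest_flags & !supported;
    if unknown == 0 {
        Ok(())
    } else {
        Err(unknown)
    }
}
```

A reader that supports stable row ids and table config but not overlays accepts a manifest with only flag 2 set, and gets `Err(64)` for a manifest that also sets the overlay bit.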
@@ -311,6 +316,15 @@ message DataFragment {

repeated DataFile files = 2;

// Optional overlay files for this fragment, which supply new values for a
// subset of cells without rewriting the base data files. This MUST be empty
// if the data overlay files feature flag (64) is not set in the manifest.
//
// Order is significant: a later entry is newer than an earlier one. When two
// overlays cover the same (offset, field) and share a `committed_version`, the
// later entry wins. See DataOverlayFile for the full resolution rules.
repeated DataOverlayFile overlays = 11;

// File that indicates which rows, if any, should be considered deleted.
DeletionFile deletion_file = 3;
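The ordering rule in the `overlays` comment (higher `committed_version` wins; on a tie, the later list entry wins) can be expressed as a sort key. This is a minimal sketch, assuming a hypothetical `OverlayRef` carrying only a name and `committed_version`; `newest_first` returns the order in which a reader would consult the overlays.

```rust
/// Hypothetical minimal view of one entry in `DataFragment.overlays`.
#[derive(Debug)]
struct OverlayRef {
    name: &'static str,
    committed_version: u64,
}

/// Order overlays newest first: higher `committed_version` first and, among
/// equal versions, the entry later in the list first.
fn newest_first(overlays: &[OverlayRef]) -> Vec<&OverlayRef> {
    let mut indexed: Vec<(usize, &OverlayRef)> = overlays.iter().enumerate().collect();
    indexed.sort_by(|(ia, a), (ib, b)| {
        b.committed_version
            .cmp(&a.committed_version)
            .then(ib.cmp(ia))
    });
    indexed.into_iter().map(|(_, overlay)| overlay).collect()
}
```

For `[a@5, b@7, c@5]` in list order, the newest-first order is `b, c, a`: `b` has the highest version, and `c` beats `a` because it appears later with the same version.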

@@ -433,6 +447,66 @@ message DataFile {
optional uint32 base_id = 7;
} // DataFile

// An overlay file supplies new values for a subset of (row offset, field) cells
// within a fragment, without rewriting the fragment's base data files. It is
// used for efficient updates when only a small fraction of rows and/or columns
// change.
//
// On read, a cell is resolved by consulting the fragment's overlays from newest
// to oldest: the first overlay that covers that (offset, field) wins; if none
// cover it, the value falls through to the base data file. Because deletions
// take precedence over overlays, an overlay value for an offset that is also
// marked deleted is dead and is ignored.
//
// The overlay's data file does NOT store a row-offset key column. Within a value
// column, the position of a covered offset's value is the rank (0-based count of
// set bits below it) of that offset within the field's coverage bitmap. Because
// fields may cover different offset sets, the value columns of a single overlay
// data file may have different lengths (which the Lance file format permits).
message DataOverlayFile {
// The data file storing the overlay's new cell values, one value column per
// field in `data_file.fields`. No row-offset key column is stored.
DataFile data_file = 1;

// Which (offset, field) cells this overlay provides values for.
oneof coverage {
// A single 32-bit Roaring bitmap of physical row offsets that applies to
// every field in `data_file.fields` (a "dense" / rectangular overlay).
// Every covered offset has a value for every field. This is the common case
// for a plain UPDATE, where one SET list is applied to one set of rows.
bytes shared_offset_bitmap = 2;
// Per-field coverage for a "sparse" overlay, used when different fields cover
// different offset sets (e.g. a MERGE with multiple WHEN MATCHED branches).
FieldCoverage field_coverage = 4;
}

// The dataset version at which this overlay became effective: the version of
// the commit that introduced it, NOT the version it was read from. It is
// stamped at commit time and re-stamped if the commit is retried, in the same
// way as the created-at / last-updated-at version sequences.
//
// This drives two orderings:
// * Versus index builds: an index whose `dataset_version` >= this value
// already incorporates this overlay. Otherwise the overlay's covered cells
// are excluded from index results for the affected fields and re-evaluated
// against their current values (see the Data Overlay Files specification).
// * Versus other overlays: when two overlays cover the same (offset, field),
// the one with the higher `committed_version` wins. Overlays that share a
// `committed_version` are ordered by their position in
// `DataFragment.overlays`, where a later entry is newer and wins.
uint64 committed_version = 3;
}

// Per-field coverage for a sparse overlay.
message FieldCoverage {
// One entry per field in the overlay's `data_file.fields`, in the same order.
// Each is a 32-bit Roaring bitmap of the physical row offsets covered for that
// field. An offset present in a field's bitmap but mapped to a NULL value
// means the cell is overridden to NULL (distinct from an offset that is absent,
// which falls through to the base data file).
repeated bytes offset_bitmaps = 1;
}

// Deletion File
//
// The path of the deletion file is constructed as:
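Because no row-offset key column is stored, a reader locates a covered cell's value by its rank within that field's coverage set. The sketch below models the `coverage` oneof as a hypothetical `Coverage` enum and uses a sorted, deduplicated `Vec<u32>` in place of a serialized 32-bit Roaring bitmap, so it needs only the standard library. On such a vector, `binary_search` returns exactly the rank (the number of covered offsets below the target).

```rust
/// Hypothetical in-memory model of the `coverage` oneof. Each offset set is a
/// sorted, deduplicated Vec<u32> standing in for a 32-bit Roaring bitmap.
enum Coverage {
    /// Dense overlay: one offset set shared by every field.
    Shared(Vec<u32>),
    /// Sparse overlay: one offset set per field, in `data_file.fields` order.
    PerField(Vec<Vec<u32>>),
}

impl Coverage {
    /// Covered offsets for the field at position `field_pos`.
    fn offsets_for(&self, field_pos: usize) -> &[u32] {
        match self {
            Coverage::Shared(offsets) => offsets.as_slice(),
            Coverage::PerField(per_field) => per_field[field_pos].as_slice(),
        }
    }

    /// Index of `offset`'s value within field `field_pos`'s value column, or
    /// `None` if the cell is not covered and falls through to the base file.
    fn value_index(&self, field_pos: usize, offset: u32) -> Option<usize> {
        // On a sorted, deduplicated slice, Ok(i) means exactly i covered offsets
        // lie below `offset`, which is its 0-based rank.
        self.offsets_for(field_pos).binary_search(&offset).ok()
    }
}
```

For a dense overlay covering offsets `[3, 7, 42]`, offset 7's value sits at index 1 of every field's value column. In a sparse overlay whose second field covers `[7, 9, 11]`, offset 7 sits at index 0 of that field's column, which is why value columns in one overlay file can have different lengths.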
39 changes: 39 additions & 0 deletions protos/transaction.proto
@@ -315,6 +315,44 @@ message Transaction {
repeated DataReplacementGroup replacements = 1;
}

// Overlay files to append to a single fragment, in order (the last entry is
// newest). The overlays are appended to the fragment's existing `overlays`
// list; they do not replace it, so overlays written by concurrent commits are
// preserved.
message DataOverlayGroup {
uint64 fragment_id = 1;
// Each DataOverlayFile.committed_version is left 0 by the writer and stamped
// to the new dataset version at commit time (re-stamped on retry), in the
// same way as the created-at / last-updated-at version sequences. The fields
// touched are read from each overlay's `data_file.fields`.
repeated DataOverlayFile overlays = 2;
}

// Attach overlay files to fragments, supplying new values for a subset of
// (row offset, field) cells without rewriting the fragments' base data files.
// See the DataOverlayFile message in table.proto and the Data Overlay Files
// specification for resolution, coverage, and versioning rules.
//
// Conflict semantics (intentionally permissive, like DataReplacement). Against
// a concurrent operation that touches one of the same fragments:
// * Another DataOverlay (any fields): COMPATIBLE. Overlays stack; when two
// overlays cover the same (offset, field) the one with the higher
// `committed_version` wins, so independent backfills never conflict.
// * Append / new fragments: COMPATIBLE.
// * Delete: COMPATIBLE. A deletion takes precedence over an overlay, so an
// overlay value for a deleted offset is inert (no special handling needed).
// * DataReplacement or column-rewrite (Update with REWRITE_COLUMNS) of the
// same field: COMPATIBLE. Both preserve physical row addresses, so overlay
// offsets stay valid; the overlay is newer and wins its covered cells, and
// the version gate excludes those cells from any rebuilt index.
// * Row-rewrite, compaction, or an overlay->base fold of the fragment:
// CONFLICT. These change physical row addresses or consume the overlays, so
// the overlay's offsets are no longer valid. The writer must re-read the new
// fragment, recompute, and retry.
message DataOverlay {
repeated DataOverlayGroup groups = 1;
}
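The append-and-stamp behavior described for `DataOverlayGroup` can be sketched as below. This is a simplified illustration that assumes a hypothetical `PendingOverlay` record and an `apply_overlay_group` helper. The writer's overlays keep `committed_version = 0`; each commit attempt stamps copies with its own target version and appends them after the fragment's existing entries. A retry therefore re-stamps, and overlays from concurrent commits survive.

```rust
/// Hypothetical minimal overlay record.
#[derive(Debug, Clone, PartialEq)]
struct PendingOverlay {
    path: String,
    /// Left 0 by the writer; set at commit time.
    committed_version: u64,
}

/// Append a group's overlays to a fragment's existing list, stamping each copy
/// with the version being committed. Existing entries are never replaced.
fn apply_overlay_group(
    existing: &mut Vec<PendingOverlay>,
    group: &[PendingOverlay],
    new_version: u64,
) {
    for overlay in group {
        // Stamp a copy, so the writer's group stays unstamped and a retry
        // against a newer version simply stamps again.
        let mut stamped = overlay.clone();
        stamped.committed_version = new_version;
        existing.push(stamped);
    }
}
```

If the first attempt targeted version 6 and lost a race, the retry calls the same function with version 7 on the re-read fragment, and the appended entry carries 7 rather than a stale 6.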

// Update the merged generations in MemWAL index.
// This operation is used during merge-insert to atomically record which
// generations have been merged to the base table.
@@ -346,6 +384,7 @@ message Transaction {
UpdateMemWalState update_mem_wal_state = 112;
Clone clone = 113;
UpdateBases update_bases = 114;
DataOverlay data_overlay = 115;
}

// Fields 200/202 (`blob_append` / `blob_overwrite`) previously represented blob dataset ops.
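The conflict table in the `DataOverlay` comment reduces to one question: does the concurrent operation leave the fragment's physical row addresses and existing overlays in place? Below is a minimal sketch with a hypothetical `ConcurrentOp` enum whose variants are simplified labels for the operations listed in that comment. It is not Lance's `Operation` type.

```rust
/// Hypothetical labels for concurrent operations touching the same fragment.
#[derive(Debug, Clone, Copy)]
enum ConcurrentOp {
    DataOverlay,
    Append,
    Delete,
    DataReplacement,
    /// Update with REWRITE_COLUMNS.
    RewriteColumns,
    RewriteRows,
    Compaction,
    /// Folding overlays into the base data files.
    FoldOverlays,
}

/// Whether a DataOverlay commit can be rebased over `other`.
fn overlay_compatible_with(other: ConcurrentOp) -> bool {
    match other {
        // These leave physical row addresses intact and do not consume overlays,
        // so the overlay's offsets remain valid.
        ConcurrentOp::DataOverlay
        | ConcurrentOp::Append
        | ConcurrentOp::Delete
        | ConcurrentOp::DataReplacement
        | ConcurrentOp::RewriteColumns => true,
        // These move rows or consume overlays: re-read, recompute, and retry.
        ConcurrentOp::RewriteRows | ConcurrentOp::Compaction | ConcurrentOp::FoldOverlays => false,
    }
}
```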
2 changes: 2 additions & 0 deletions rust/lance-table/benches/manifest_intern.rs
@@ -59,6 +59,7 @@ fn make_uniform_pb_fragments(n: u64, num_fields: usize) -> Vec<pb::DataFragment>
file_size_bytes: 0,
base_id: None,
}],
overlays: vec![],
deletion_file: None,
row_id_sequence: None,
physical_rows: 1000,
@@ -135,6 +136,7 @@ fn make_diverse_pb_fragments(
file_size_bytes: 0,
base_id: None,
}],
overlays: vec![],
deletion_file: None,
row_id_sequence: None,
physical_rows: 1000,
4 changes: 4 additions & 0 deletions rust/lance-table/src/format/fragment.rs
@@ -716,6 +716,10 @@ impl From<&Fragment> for pb::DataFragment {
Self {
id: f.id,
files: f.files.iter().map(pb::DataFile::from).collect(),
// Overlay files are not produced by this version of the library; a
// dataset that uses them sets reader feature flag 64, which is
// rejected at the feature-flag layer (see lance-table feature_flags).
overlays: vec![],
deletion_file,
row_id_sequence,
physical_rows: f.physical_rows.unwrap_or_default() as u64,
28 changes: 28 additions & 0 deletions rust/lance/src/dataset/transaction.rs
@@ -3289,6 +3289,16 @@ impl TryFrom<pb::Transaction> for Transaction {
})) => Operation::UpdateBases {
new_bases: new_bases.into_iter().map(BasePath::from).collect(),
},
Some(pb::transaction::Operation::DataOverlay(_)) => {
// Overlay files are not supported by this version of the library.
// A dataset that uses them sets reader feature flag 64, which is
// already rejected at the feature-flag layer; reject here too so a
// transaction referencing the operation can never be applied.
return Err(Error::not_supported(
"data overlay files are not supported by this version of Lance \
(reader feature flag 64)",
));
}
None => {
return Err(Error::internal(
"Transaction message did not contain an operation".to_string(),
@@ -6127,4 +6137,22 @@ mod tests {
assert!(!left.modifies_same_metadata(&different_key));
assert!(left.modifies_same_metadata(&replace));
}

#[test]
fn test_data_overlay_operation_rejected() {
// Overlay files are not supported by this version of the library. A
// transaction carrying the DataOverlay operation must be rejected rather
// than silently ignored, mirroring the feature-flag-64 rejection.
let message = pb::Transaction {
read_version: 1,
uuid: Uuid::new_v4().to_string(),
operation: Some(pb::transaction::Operation::DataOverlay(
pb::transaction::DataOverlay { groups: vec![] },
)),
..Default::default()
};

let result = Transaction::try_from(message);
assert!(matches!(result, Err(Error::NotSupported { .. })));
}
}