fix: reject DataReplacement racing concurrent Update/Delete/Merge#7373
Conversation
A DataReplacement rewrites a column's data file positionally against the fragments it targets. The conflict resolver returned Ok unconditionally for a concurrent Update, Delete, or Merge, so a DataReplacement committed at a read version those operations had superseded was applied silently -- dropping or misaligning the rows the concurrent op moved or deleted, with no error raised. Merge was additionally asymmetric: check_merge_txn already treats a concurrent DataReplacement as a conflict, but not the reverse. For Update/Delete, conflict when the other transaction's updated/removed fragment ids overlap our replacement fragment ids (mirrors the existing Rewrite handling). For Merge, which rewrites the entire fragment list, conflict unconditionally (mirrors check_merge_txn). All are retryable, so the committer rebuilds against the new layout. Adds DataReplacement vs Update, Delete, and Merge cases (same and different fragment) to test_conflicts_data_replacement, and updates test_datafile_replacement_error to read at the current version (its pre-Merge read version now correctly conflicts with the Merge). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
0d4a340 to
b33113c
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
westonpace
left a comment
There was a problem hiding this comment.
Can you also update /home/pace/dev/lance/main/docs/src/format/table/transaction.md?
| // A concurrent Update/Delete that changed one of our target | ||
| // fragments makes our positional column file stale; conflict so | ||
| // the committer rebuilds (lance otherwise accepts it silently). | ||
| for replacement in replacements { | ||
| let touches_our_fragment = updated_fragments | ||
| .iter() | ||
| .map(|f| f.id) | ||
| .chain(removed_fragment_ids.iter().copied()) | ||
| .any(|id| id == replacement.0); | ||
| if touches_our_fragment { | ||
| return Err( | ||
| self.retryable_conflict_err(other_transaction, other_version) | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
Do we need to conflict if a delete removed a fragment that we updated? Should this be asymmetric?
DataReplacement followed by Delete -> Disallowed?
Delete followed by DataReplacement -> Allowed (but changes discarded silently)
This is a genuine question, I don't really know the answer.
There was a problem hiding this comment.
for the specific case of Delete followed by DataReplacement, will throw an error here if you hit it today:
https://github.com/lance-format/lance/blob/main/rust/lance/src/dataset/transaction.rs#L2229,L2231
I assume we could probably work around this in the DataReplacement operation but handling it in the conflict mechanism seems like a more certain approach to me.
For the other direction we definitely need to conflict or the wrong rows may get deleted.
The transaction spec listed Update and Merge under DataReplacement's retryable conflicts but omitted Delete. Now that the conflict resolver rejects a DataReplacement racing a concurrent Delete on an overlapping fragment, add Delete to the list so the spec matches the behavior (the Delete section already documented the reverse direction). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Important This PR touches the Lance format specification. Substantive changes to the format specification — the If this is a meaningful format change:
|
I think this gap in |
A DataReplacement rewrites a column's data file positionally against the fragments it targets. The conflict resolver returned Ok unconditionally for a concurrent Update, Delete, or Merge, so a DataReplacement committed at a read version those operations had superseded was applied silently -- dropping or misaligning the rows the concurrent op moved or deleted, with no error raised. Merge was additionally asymmetric: check_merge_txn already treats a concurrent DataReplacement as a conflict, but not the reverse.
For Update/Delete, conflict when the other transaction's updated/removed fragment ids overlap our replacement fragment ids (mirrors the existing Rewrite handling). For Merge, which rewrites the entire fragment list, conflict unconditionally (mirrors check_merge_txn). All are retryable, so the committer rebuilds against the new layout.
Adds DataReplacement vs Update, Delete, and Merge cases (same and different fragment) to test_conflicts_data_replacement.