Skip to content

fix: reject DataReplacement racing concurrent Update/Delete/Merge#7373

Merged
wkalt merged 2 commits into
lance-format:mainfrom
wkalt:ticket/gen-633/datareplacement-stale-version-conflict
Jun 22, 2026
Merged

fix: reject DataReplacement racing concurrent Update/Delete/Merge#7373
wkalt merged 2 commits into
lance-format:mainfrom
wkalt:ticket/gen-633/datareplacement-stale-version-conflict

Conversation

@wkalt

@wkalt wkalt commented Jun 19, 2026

Copy link
Copy Markdown
Contributor

A DataReplacement rewrites a column's data file positionally against the fragments it targets. The conflict resolver returned Ok unconditionally for a concurrent Update, Delete, or Merge, so a DataReplacement committed at a read version those operations had superseded was applied silently -- dropping or misaligning the rows the concurrent op moved or deleted, with no error raised. Merge was additionally asymmetric: check_merge_txn already treats a concurrent DataReplacement as a conflict, but not the reverse.

For Update/Delete, conflict when the other transaction's updated/removed fragment ids overlap our replacement fragment ids (mirrors the existing Rewrite handling). For Merge, which rewrites the entire fragment list, conflict unconditionally (mirrors check_merge_txn). All are retryable, so the committer rebuilds against the new layout.

Adds DataReplacement vs Update, Delete, and Merge cases (same and different fragment) to test_conflicts_data_replacement.

@github-actions github-actions Bot added the bug Something isn't working label Jun 19, 2026
A DataReplacement rewrites a column's data file positionally against the
fragments it targets. The conflict resolver returned Ok unconditionally for a
concurrent Update, Delete, or Merge, so a DataReplacement committed at a read
version those operations had superseded was applied silently -- dropping or
misaligning the rows the concurrent op moved or deleted, with no error raised.
Merge was additionally asymmetric: check_merge_txn already treats a concurrent
DataReplacement as a conflict, but not the reverse.

For Update/Delete, conflict when the other transaction's updated/removed fragment
ids overlap our replacement fragment ids (mirrors the existing Rewrite handling).
For Merge, which rewrites the entire fragment list, conflict unconditionally
(mirrors check_merge_txn). All are retryable, so the committer rebuilds against
the new layout.

Adds DataReplacement vs Update, Delete, and Merge cases (same and different
fragment) to test_conflicts_data_replacement, and updates
test_datafile_replacement_error to read at the current version (its pre-Merge
read version now correctly conflicts with the Merge).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@wkalt wkalt force-pushed the ticket/gen-633/datareplacement-stale-version-conflict branch from 0d4a340 to b33113c Compare June 19, 2026 14:56
@codecov

codecov Bot commented Jun 19, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@westonpace westonpace left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also update /home/pace/dev/lance/main/docs/src/format/table/transaction.md?

Comment on lines +926 to +940
// A concurrent Update/Delete that changed one of our target
// fragments makes our positional column file stale; conflict so
// the committer rebuilds (lance otherwise accepts it silently).
for replacement in replacements {
let touches_our_fragment = updated_fragments
.iter()
.map(|f| f.id)
.chain(removed_fragment_ids.iter().copied())
.any(|id| id == replacement.0);
if touches_our_fragment {
return Err(
self.retryable_conflict_err(other_transaction, other_version)
);
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to conflict if a delete removed a fragment that we updated? Should this be asymmetric?

DataReplacement followed by Delete -> Disallowed?
Delete followed by DataReplacement -> Allowed (but changes discarded silently)

This is a genuine question, I don't really know the answer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the specific case of Delete followed by DataReplacement, will throw an error here if you hit it today:
https://github.com/lance-format/lance/blob/main/rust/lance/src/dataset/transaction.rs#L2229,L2231

I assume we could probably work around this in the DataReplacement operation but handling it in the conflict mechanism seems like a more certain approach to me.

For the other direction we definitely need to conflict or the wrong rows may get deleted.

The transaction spec listed Update and Merge under DataReplacement's
retryable conflicts but omitted Delete. Now that the conflict resolver
rejects a DataReplacement racing a concurrent Delete on an overlapping
fragment, add Delete to the list so the spec matches the behavior (the
Delete section already documented the reverse direction).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

Important

This PR touches the Lance format specification.

Substantive changes to the format specification — the .proto definitions
and the spec docs under docs/src/format/ — require a PMC vote before merge.
Minor edits such as typo fixes, wording, or formatting are excluded; use your
judgment.

If this is a meaningful format change:

  • Start a vote following the Lance community voting process.
    Format specification modifications need 3 binding +1 votes (excluding the
    proposer), held on GitHub Discussions, with a minimum voting period of 1 week.
  • Once the vote passes, link the completed vote in this PR. It should not be
    merged until the vote is linked.

@github-actions github-actions Bot added the A-format On-disk format: protos and format spec docs label Jun 22, 2026
@westonpace

Copy link
Copy Markdown
Member

This PR touches the Lance format specification.

I think this gap in docs/src/format was a gap and does not represent a behavior change (as mentioned, we previously conflicted with delete in one direction anyways).

@wkalt wkalt merged commit f4d24ca into lance-format:main Jun 22, 2026
42 of 43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-format On-disk format: protos and format spec docs bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants