Spark: fix delete from branch for canDeleteWhere where it does not resolve to the correct branch#15512
Open
yingjianwu98 wants to merge 4 commits intoapache:mainfrom
added 3 commits
March 4, 2026 12:17
Contributor
From Netflix discussion, this sounds like a correctness bug and a release blocker for 1.11.
stevenzwu
reviewed
Apr 6, 2026
spark.conf().set(SparkSQLProperties.WAP_BRANCH, "dev1");
try {
  // all rows go into one file on the WAP branch; main stays empty
Contributor
Can we also insert some rows/files into the main branch first? Ideally including a row that matches the predicate id = 1.
  // resolve the WAP branch so they scan and commit to the same branch
  sql("DELETE FROM %s WHERE id = 1", tableName);

  assertEquals(
Contributor
Use AssertJ:
assertThat(sql("SELECT * FROM %s VERSION AS OF 'dev1' ORDER BY id", tableName))
.containsExactlyInAnyOrder(...)
Contributor
@yingjianwu98 can you resolve the merge conflict?
Problem
When WAP (Write-Audit-Publish) is enabled via spark.wap.branch, canDeleteWhere() and deleteWhere() resolve to different branches: canDeleteWhere() scans main, while deleteWhere() commits to the WAP branch.
As a result, canDeleteWhere() incorrectly returns true (a metadata-only delete looks possible) based on main's data, but deleteWhere() commits to the WAP branch, where the file only partially matches the filter, and fails with:
ValidationException: Cannot delete file where some, but not all, rows match filter
Example
-- WAP enabled, spark.wap.branch = dev1
INSERT INTO t VALUES (1, 'a'), (2, 'b'), (3, 'c'); -- goes to dev1, main is empty
DELETE FROM t WHERE id = 1;
-- canDeleteWhere scans main (empty) → true → metadata delete
-- deleteWhere commits to dev1 → partial match → ValidationException
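The mismatch in the example can be illustrated with a small toy model. This is only a sketch, not Iceberg's implementation: the class `WapDeleteSketch`, the in-memory `BRANCHES` map, and this simplified `canDeleteWhere` are hypothetical names made up for illustration. It assumes a metadata-only delete is allowed only when every file matches the filter for all of its rows or for none of them.

```java
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

final class WapDeleteSketch {
  // Toy model (hypothetical): each branch holds a list of data files,
  // and each file is just a list of row ids.
  static final Map<String, List<List<Integer>>> BRANCHES =
      Map.of(
          "main", List.of(),                   // main is empty
          "dev1", List.of(List.of(1, 2, 3)));  // one file on the WAP branch

  // Simplified check: a metadata-only delete is possible only if each file
  // matches the filter for all of its rows or for none of them.
  static boolean canDeleteWhere(String branch, IntPredicate filter) {
    for (List<Integer> file : BRANCHES.get(branch)) {
      long matching = file.stream().mapToInt(Integer::intValue).filter(filter).count();
      if (matching != 0 && matching != file.size()) {
        return false; // partial match: the file would have to be rewritten
      }
    }
    return true;
  }

  public static void main(String[] args) {
    IntPredicate idEqualsOne = id -> id == 1;
    // Checking against main says "metadata delete is fine" ...
    System.out.println("check on main: " + canDeleteWhere("main", idEqualsOne));
    // ... but the branch that is actually committed to disagrees.
    System.out.println("check on dev1: " + canDeleteWhere("dev1", idEqualsOne));
  }
}
```

In this model the check passes on main and fails on dev1, which is why the check and the commit need to look at the same branch.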
Fix
because the scan is a read operation, and determineReadBranch correctly handles the case where the WAP branch doesn't exist yet by falling back to
main.
reads and other operations that share the field.
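The fallback behavior described above (use the WAP branch for reads when it exists, otherwise main) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual determineReadBranch code: `ReadBranchSketch`, `resolveReadBranch`, and its parameters are hypothetical and stand in for whatever the real helper inspects.

```java
import java.util.Set;

final class ReadBranchSketch {
  static final String MAIN_BRANCH = "main";

  // Hypothetical helper: choose the branch a read (such as the canDeleteWhere
  // scan) should use. Falls back to main when no WAP branch is configured or
  // when the configured WAP branch has not been created yet.
  static String resolveReadBranch(Set<String> existingBranches, String wapBranch) {
    if (wapBranch != null && existingBranches.contains(wapBranch)) {
      return wapBranch;
    }
    return MAIN_BRANCH;
  }

  public static void main(String[] args) {
    // WAP branch already exists: reads should see it.
    System.out.println(resolveReadBranch(Set.of("main", "dev1"), "dev1"));
    // WAP branch not created yet: reads fall back to main.
    System.out.println(resolveReadBranch(Set.of("main"), "dev1"));
    // WAP not configured: reads use main.
    System.out.println(resolveReadBranch(Set.of("main"), null));
  }
}
```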
I will work on the backport to other Spark versions once there is consensus from the community.