[bug] Parquet Source: snapshot sync fails on multiple commits with partitions #806
Draft
the-other-tim-brown wants to merge 3 commits into apache:main from
return partitionConfigs.stream()
    .flatMap(
        partitionConfig ->
            Stream.of(SyncMode.FULL) // Incremental sync is not yet supported
Contributor
Author
This provides the setup for incremental sync mode, so the tests can be quickly updated to cover those cases once incremental sync is supported.
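As a rough illustration of that setup, the sketch below pairs each partition configuration with the sync modes that are currently enabled, so covering incremental sync later only means adding one value to the inner stream. The names `SyncModeSketch`, `SyncCase`, and `SyncCaseFactory.buildCases` are hypothetical stand-ins for illustration and are not the project's actual types.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the project's sync mode enum.
enum SyncModeSketch {
  FULL,
  INCREMENTAL
}

// Hypothetical holder for one test case: a partition config plus a sync mode.
final class SyncCase {
  final String partitionConfig;
  final SyncModeSketch syncMode;

  SyncCase(String partitionConfig, SyncModeSketch syncMode) {
    this.partitionConfig = partitionConfig;
    this.syncMode = syncMode;
  }
}

final class SyncCaseFactory {
  // Pairs every partition config with every sync mode listed in the inner stream.
  static List<SyncCase> buildCases(List<String> partitionConfigs) {
    return partitionConfigs.stream()
        .flatMap(
            partitionConfig ->
                Stream.of(SyncModeSketch.FULL) // add INCREMENTAL here once it is supported
                    .map(syncMode -> new SyncCase(partitionConfig, syncMode)))
        .collect(Collectors.toList());
  }
}
```

With only `FULL` in the inner stream, the factory yields one case per partition config; adding a second mode would double the case count without touching the test bodies.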
    .toArray(String[]::new);
// add partition columns to dataframe
for (String partitionCol : partitionCols) {
  if (partitionCol.equals("year")) {
Contributor
The if/else if chain only handles "year", "month", and "day". If the partition config ever contains a different column, no column gets added to the DataFrame but partitionBy still references it — causing a confusing AnalysisException. Add an else with throw new IllegalArgumentException("Unsupported partition column: " + partitionCol) to fail fast.
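A minimal sketch of the fail-fast branch suggested in this comment, written without Spark so it stands alone. The class `PartitionColumnSketch` and the method `partitionValue` are hypothetical; the actual test adds columns to a DataFrame rather than returning a value, but the shape of the if/else chain and the exception message follow the comment.

```java
import java.time.LocalDate;

final class PartitionColumnSketch {
  // Returns the value for a supported partition column, or fails fast with a
  // clear message instead of letting a later partitionBy call fail obscurely.
  static int partitionValue(String partitionCol, LocalDate date) {
    if (partitionCol.equals("year")) {
      return date.getYear();
    } else if (partitionCol.equals("month")) {
      return date.getMonthValue();
    } else if (partitionCol.equals("day")) {
      return date.getDayOfMonth();
    } else {
      throw new IllegalArgumentException("Unsupported partition column: " + partitionCol);
    }
  }
}
```

The point of the final branch is that an unexpected column name surfaces at the place where it is first handled, with the offending name in the message.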
  .formatName(formatName)
  // set the metadata path to the data path as the default (required by Hudi)
- .basePath(table.getDataPath())
+ .basePath(table.getBasePath())
Contributor
Nit: the comment on line 154 still says "set the metadata path to the data path" but the code now uses getBasePath(). Update the comment to match.
What is the purpose of the pull request
Fixes snapshot sync failures in the Parquet source when a partitioned table has multiple commits.
Closes #807
Brief change log
Verify this pull request
The integration test is updated to make two commits and verify the results for both partitioned and non-partitioned tables.