[bug] Parquet Source: snapshot sync fails on multiple commits with partitions by the-other-tim-brown · Pull Request #806 · apache/incubator-xtable

the-other-tim-brown · 2026-02-22T21:03:56Z

What is the purpose of the pull request

Fixes snapshot sync failures with the Parquet source when a partitioned table has multiple commits.

Closes #807

Brief change log

Updated integration test

Verify this pull request

The integration test is updated to make 2 commits and verify the results for both partitioned and non-partitioned tables.

the-other-tim-brown · 2026-02-22T21:08:42Z

xtable-core/src/test/java/org/apache/xtable/parquet/ITParquetConversionSource.java

+    return partitionConfigs.stream()
+        .flatMap(
+            partitionConfig ->
+                Stream.of(SyncMode.FULL) // Incremental sync is not yet supported


This sets up the incremental sync mode in the test parameters, so the tests can be quickly extended to cover those cases once incremental sync is supported.
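As a rough illustration of the pattern in the hunk above, the sketch below builds the cross product of partition configs and sync modes with flatMap. The class name, the local SyncMode enum, and the string labels are hypothetical stand-ins, not the types used in ITParquetConversionSource; the point is only that adding a second mode later is a one-argument change.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SyncModeMatrixSketch {

  // Hypothetical stand-in for the project's SyncMode enum.
  enum SyncMode { FULL, INCREMENTAL }

  // Pairs every partition config with every requested sync mode.
  static List<String> testCases(List<String> partitionConfigs, SyncMode... modes) {
    return partitionConfigs.stream()
        .flatMap(config -> Stream.of(modes).map(mode -> config + " / " + mode))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> configs = Arrays.asList("partitioned", "unpartitioned");

    // Today: only FULL is exercised.
    System.out.println(testCases(configs, SyncMode.FULL));

    // Later: enabling INCREMENTAL only requires passing one more mode.
    System.out.println(testCases(configs, SyncMode.FULL, SyncMode.INCREMENTAL));
  }
}
```

With only SyncMode.FULL passed in, the matrix has one case per partition config, which matches the current state of the test.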

vinishjail97 · 2026-02-23T05:54:08Z

xtable-core/src/test/java/org/apache/xtable/parquet/ITParquetConversionSource.java

+              .toArray(String[]::new);
+      // add partition columns to dataframe
+      for (String partitionCol : partitionCols) {
+        if (partitionCol.equals("year")) {


The if/else if chain only handles "year", "month", and "day". If the partition config ever contains a different column, no column gets added to the DataFrame but partitionBy still references it — causing a confusing AnalysisException. Add an else with throw new IllegalArgumentException("Unsupported partition column: " + partitionCol) to fail fast.
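A minimal sketch of the fail-fast suggestion, kept free of Spark so it runs on its own. The class and method names are hypothetical, and deriving values from a LocalDate is an assumption standing in for the DataFrame column logic in the test; only the exception message mirrors the review comment.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionColumnSketch {

  // Hypothetical helper: derives the value for one partition column from a date.
  static int partitionValue(String partitionCol, LocalDate date) {
    switch (partitionCol) {
      case "year":
        return date.getYear();
      case "month":
        return date.getMonthValue();
      case "day":
        return date.getDayOfMonth();
      default:
        // Fail fast instead of silently skipping the column.
        throw new IllegalArgumentException("Unsupported partition column: " + partitionCol);
    }
  }

  // Computes values for all requested partition columns, preserving their order.
  static Map<String, Integer> partitionValues(List<String> partitionCols, LocalDate date) {
    Map<String, Integer> values = new LinkedHashMap<>();
    for (String partitionCol : partitionCols) {
      values.put(partitionCol, partitionValue(partitionCol, date));
    }
    return values;
  }

  public static void main(String[] args) {
    LocalDate date = LocalDate.of(2026, 2, 22);
    System.out.println(partitionValues(Arrays.asList("year", "month", "day"), date));
    try {
      partitionValues(Arrays.asList("year", "hour"), date);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // the unsupported column is named in the message
    }
  }
}
```

The unsupported column surfaces at the point where it is first handled, with its name in the message, rather than later as an analysis error from a partitionBy call that references a column that was never added.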

vinishjail97 · 2026-02-23T05:54:08Z

xtable-core/src/test/java/org/apache/xtable/parquet/ITParquetConversionSource.java

                        .formatName(formatName)
                        // set the metadata path to the data path as the default (required by Hudi)
-                        .basePath(table.getDataPath())
+                        .basePath(table.getBasePath())


Nit: the comment on line 154 still says "set the metadata path to the data path" but the code now uses getBasePath(). Update the comment to match.

make 2 commits, cleanup testing

752f54a

the-other-tim-brown mentioned this pull request Feb 22, 2026

Snapshot sync with Parquet source does not work on partitioned tables #807

Open

4 tasks

the-other-tim-brown added 2 commits February 22, 2026 16:06

fix imports

cd2b553

reduce diff

c6940eb

the-other-tim-brown added the bug Something isn't working label Feb 22, 2026

vinishjail97 reviewed Feb 23, 2026

the-other-tim-brown wants to merge 3 commits into apache:main from the-other-tim-brown:parquet-source-snapshot-failure