
[VL] Refactor Gluten to use upstream Velox Iceberg connector#12219

Open
infvg wants to merge 1 commit into apache:main from infvg:newvelox


infvg (Contributor) commented Jun 2, 2026

This PR moves Gluten to the latest Velox branch with minimal changes and switches the Iceberg implementation to the new upstream Velox Iceberg connector. It also removes most traces of the enhanced branch.

github-actions Bot commented Jun 2, 2026

Run Gluten Clickhouse CI on x86

infvg (Contributor Author) commented Jun 2, 2026

Run Gluten Clickhouse CI on x86

philo-he (Member) commented Jun 3, 2026

Thanks for the PR. Will we let Gluten reference Velox's main branch directly? That seems feasible if Velox PRs are verified by Gluten CI; otherwise, code changes in Velox could break the Gluten build or tests.

infvg changed the title from "[VL] Refactor Gluten to use upstream Velox" to "[VL] Refactor Gluten to use upstream Velox Iceberg connector" on Jun 3, 2026

infvg (Contributor Author) commented Jun 3, 2026

@philo-he Not directly yet. We still need 5 commits, which I've included in this branch: IBM/velox#2047. But we can move off most of the other commits, such as the Iceberg code.

FelixYBW (Contributor) commented Jun 3, 2026

@philo-he No. This PR covers only the Iceberg connector refactor; we will still have to use IBM/velox.

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates Gluten’s Velox integration to a newer upstream Velox branch and switches Iceberg execution to use the upstream Velox Iceberg connector, removing/relaxing previous “enhanced features” gating so Iceberg support is available in the standard Velox backend.

Changes:

  • Add and register a dedicated Velox Iceberg connector ID, and route Iceberg scans/splits through it (planner + runtime + query context connector configs).
  • Update Iceberg write path to match the upstream connector APIs (IcebergWriter + JNI wiring) and make Spark-side reflection more tolerant of upstream Iceberg/SparkWrite changes.
  • Adjust Iceberg partition-data JSON parsing to support additional JSON shapes, and update Iceberg write metrics expectation in tests.
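The routing in the first bullet (Iceberg splits go to a dedicated connector ID, and that ID also needs its own session config in the query context) can be pictured with a small, self-contained C++ sketch. It is not Gluten's code: the ID string values, `SplitKind`, `connectorIdFor`, and `buildConnectorSessionConfigs` are hypothetical, and the assumption that both connectors start from the same session settings is mine, not the PR's.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical connector IDs. The PR introduces kIcebergConnectorId in
// VeloxConfig.h, but these string values are assumptions for illustration.
const std::string kHiveConnectorId = "hive-connector";
const std::string kIcebergConnectorId = "iceberg-connector";

// Hypothetical stand-in for the per-split metadata decoded from Substrait.
enum class SplitKind { kHive, kIceberg };

using ConfigMap = std::map<std::string, std::string>;

// Route a table scan or split to the connector that should read it.
const std::string& connectorIdFor(SplitKind kind) {
  return kind == SplitKind::kIceberg ? kIcebergConnectorId : kHiveConnectorId;
}

// Build the per-connector session configs handed to the query context.
// Assumption: both connectors start from the same Spark-derived settings;
// the key point is that an entry exists under the Iceberg connector ID.
std::map<std::string, ConfigMap> buildConnectorSessionConfigs(const ConfigMap& sessionConf) {
  std::map<std::string, ConfigMap> configs;
  configs[kHiveConnectorId] = sessionConf;
  configs[kIcebergConnectorId] = sessionConf;
  return configs;
}
```

If the Iceberg entry were missing, a split routed to the Iceberg connector would find no session config under its ID, which is why the routing and the config population change together.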

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| gluten-iceberg/src/main/scala/org/apache/iceberg/spark/source/IcebergWriteUtil.scala | Makes reflection access to SparkWrite.writeProperties optional for compatibility across Iceberg versions. |
| gluten-iceberg/src/main/java/org/apache/gluten/connector/write/PartitionDataJson.java | Extends partition JSON parsing to accept array/object forms and adds validation on counts. |
| ep/build-velox/src/get-velox.sh | Updates default Velox branch values used by the dependency fetch script. |
| cpp/velox/substrait/SubstraitToVeloxPlanValidator.h | Adds Iceberg connector ID to validator's connector set. |
| cpp/velox/substrait/SubstraitToVeloxPlan.cc | Routes table scans to Iceberg connector when the split info indicates Iceberg. |
| cpp/velox/jni/VeloxJniWrapper.cc | Removes enhanced-feature compile gating around Iceberg JNI and returns "enhanced enabled" unconditionally. |
| cpp/velox/config/VeloxConfig.h | Introduces kIcebergConnectorId. |
| cpp/velox/compute/WholeStageResultIterator.cc | Uses Iceberg connector ID for Iceberg splits and populates connector session config for it. |
| cpp/velox/compute/VeloxRuntime.h | Makes Iceberg writer APIs always available (no enhanced-feature gating). |
| cpp/velox/compute/VeloxRuntime.cc | Registers/unregisters a scoped Iceberg connector per runtime instance. |
| cpp/velox/compute/VeloxConnectorIds.h | Adds iceberg ID and icebergRegistered tracking. |
| cpp/velox/compute/VeloxBackend.h | Adds Iceberg connector factory API. |
| cpp/velox/compute/VeloxBackend.cc | Implements Iceberg connector creation using upstream Velox IcebergConnector. |
| cpp/velox/compute/iceberg/IcebergWriter.h | Updates writer state/types to align with upstream connector expectations. |
| cpp/velox/compute/iceberg/IcebergWriter.cc | Refactors Iceberg writer setup (QueryCtx/ConnectorQueryCtx/DataSink) and write slicing behavior. |
| cpp/velox/CMakeLists.txt | Always builds Iceberg writer sources (no enhanced-feature gating). |
| backends-velox/src-iceberg/test/scala/org/apache/gluten/execution/enhanced/VeloxIcebergSuite.scala | Updates expected Iceberg write metric numWrittenFiles. |
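The table describes VeloxRuntime.cc as registering and unregistering a scoped Iceberg connector per runtime instance, with icebergRegistered tracking in VeloxConnectorIds.h. The RAII pattern behind that description can be sketched in C++ as below. `Connector`, `ConnectorRegistry`, and `ScopedIcebergConnector` are hypothetical stand-ins; Velox's actual connector-registration API is not reproduced here.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical connector type; a real one would wrap a Velox connector.
struct Connector {
  std::string id;
};

// Hypothetical process-wide registry keyed by connector ID.
class ConnectorRegistry {
 public:
  static ConnectorRegistry& instance() {
    static ConnectorRegistry registry;
    return registry;
  }
  // Returns false if a connector with the same ID is already registered.
  bool add(std::shared_ptr<Connector> connector) {
    std::lock_guard<std::mutex> lock(mu_);
    const std::string id = connector->id;
    return connectors_.emplace(id, std::move(connector)).second;
  }
  bool remove(const std::string& id) {
    std::lock_guard<std::mutex> lock(mu_);
    return connectors_.erase(id) > 0;
  }
  bool contains(const std::string& id) {
    std::lock_guard<std::mutex> lock(mu_);
    return connectors_.count(id) > 0;
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::string, std::shared_ptr<Connector>> connectors_;
};

// Registers a connector for the lifetime of one runtime and removes it on
// destruction, but only if this instance actually performed the registration.
class ScopedIcebergConnector {
 public:
  explicit ScopedIcebergConnector(std::string id) : id_(std::move(id)) {
    registered_ = ConnectorRegistry::instance().add(std::make_shared<Connector>(Connector{id_}));
  }
  ~ScopedIcebergConnector() {
    if (registered_) {
      ConnectorRegistry::instance().remove(id_);
    }
  }
  ScopedIcebergConnector(const ScopedIcebergConnector&) = delete;
  ScopedIcebergConnector& operator=(const ScopedIcebergConnector&) = delete;

  bool registered() const { return registered_; }

 private:
  std::string id_;
  bool registered_ = false;  // plays the role of an "icebergRegistered" flag
};
```

Tracking whether this instance performed the registration means a runtime never removes a connector that another runtime registered under the same ID; in practice, per-runtime IDs would likely be made unique so that each runtime's registration succeeds.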

connectorConfig_);
connectorConfig_,
icebergConfig);
dataSink_.get();
Comment on lines +255 to +256
auto filteredRowVector =
    std::make_shared<RowVector>(pool_.get(), rowType_, nullptr, inputRowVector->size(), std::move(dataColumns));