[CORE] Optimize Iceberg schema field matching by wankunde · Pull Request #12233 · apache/gluten

wankunde · 2026-06-04T08:59:45Z

What changes are proposed in this pull request?

Why is this PR needed?

In IcebergScanTransformer.typesMatch(), the struct type matching logic creates temporary Iceberg Schema objects for every Spark field:

new Schema(currentType.fields()).findField(...)
new Schema(iceberg.fields()).findField(...)

This repeatedly rebuilds Iceberg schema indexes while checking historical schemas, which can become expensive for wide schemas or tables with many schema versions. In production thread dumps, this shows up in Schema / IndexByName / HashMap initialization during Iceberg scan planning.

Changes in this PR:

This change uses Types.StructType.field(name) and Types.StructType.field(id) directly when matching nested struct fields.

Types.StructType already provides field lookup by name and id, so this avoids constructing temporary Schema objects inside the field loop while preserving the existing matching behavior:

find the current field by Spark field name
find the old schema field by Iceberg field id
keep allowing added columns
keep detecting renamed columns by comparing field names

How was this patch tested?

Tested with the existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Codex GPT-5

github-actions · 2026-06-04T09:49:13Z

Run Gluten Clickhouse CI on x86

[CORE] Optimize Iceberg schema field matching

ba61ef5

github-actions Bot added the DATA_LAKE label Jun 4, 2026

wankunde wants to merge 1 commit into apache:main from wankunde:IcebergScan_schema_check
