[GLUTEN-11622][VL] Add basic TIMESTAMP_NTZ type support (#11939)#12229
Open
rui-mo wants to merge 3 commits into
Open
[GLUTEN-11622][VL] Add basic TIMESTAMP_NTZ type support (#11939)#12229rui-mo wants to merge 3 commits into
rui-mo wants to merge 3 commits into
Conversation
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
1 similar comment
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
Contributor
There was a problem hiding this comment.
Pull request overview
Adds initial Velox-backend support for Spark TIMESTAMP_NTZ by enabling native scan paths, wiring Substrait/Arrow type conversions, and adjusting validator/fallback behavior and test coverage across supported Spark versions.
Changes:
- Enable native Parquet scan for
TIMESTAMP_NTZand add validation/translation plumbing (Substrait + C++ parser/expr handling). - Extend Spark↔Arrow and Spark↔Substrait type mappings for
timestamp_ntz, plus related iterator/columnar-to-row handling. - Add/adjust unit tests and backend test settings/exclusions for
TIMESTAMP_NTZbehaviors.
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| gluten-ut/spark41/src/test/scala/org/apache/spark/sql/GlutenCachedTableSuite.scala | Adds cache/uncache coverage for TIMESTAMP_NTZ. |
| gluten-ut/spark41/src/test/scala/org/apache/gluten/utils/velox/VeloxTestSettings.scala | Updates Velox exclusions related to TIMESTAMP_NTZ and associated behaviors. |
| gluten-ut/spark40/src/test/scala/org/apache/spark/sql/GlutenCachedTableSuite.scala | Adds cache/uncache coverage for TIMESTAMP_NTZ. |
| gluten-ut/spark40/src/test/scala/org/apache/gluten/utils/velox/VeloxTestSettings.scala | Updates Velox exclusions related to TIMESTAMP_NTZ and associated behaviors. |
| gluten-ut/spark35/src/test/scala/org/apache/spark/sql/GlutenCachedTableSuite.scala | Adds cache/uncache coverage for TIMESTAMP_NTZ. |
| gluten-ut/spark35/src/test/scala/org/apache/gluten/utils/velox/VeloxTestSettings.scala | Updates Velox exclusions related to TIMESTAMP_NTZ and associated behaviors. |
| gluten-ut/spark35/src/test/scala/org/apache/gluten/utils/velox/VeloxSQLQueryTestSettings.scala | Adjusts supported Spark SQL query-test input lists (notably timestamp-related inputs). |
| gluten-ut/spark34/src/test/scala/org/apache/spark/sql/GlutenCachedTableSuite.scala | Adds cache/uncache coverage for TIMESTAMP_NTZ. |
| gluten-ut/spark34/src/test/scala/org/apache/gluten/utils/velox/VeloxTestSettings.scala | Updates Velox exclusions related to TIMESTAMP_NTZ and associated behaviors. |
| gluten-ut/spark33/src/test/scala/org/apache/spark/sql/GlutenCachedTableSuite.scala | Adds cache/uncache coverage for TIMESTAMP_NTZ. |
| gluten-ut/spark33/src/test/scala/org/apache/gluten/utils/velox/VeloxTestSettings.scala | Updates Velox exclusions related to TIMESTAMP_NTZ and associated behaviors. |
| gluten-ut/common/src/test/scala/org/apache/spark/sql/GlutenSQLTestsBaseTrait.scala | Adjusts Spark test conf to bind driver host/address to localhost. |
| gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/validator/Validators.scala | Refines validator/fallback rules around TIMESTAMP_NTZ (scan vs non-scan). |
| gluten-substrait/src/main/scala/org/apache/gluten/expression/ConverterUtils.scala | Adds TIMESTAMP_NTZ Spark/Substrait conversions and signatures. |
| gluten-substrait/src/main/java/org/apache/gluten/substrait/type/TypeBuilder.java | Adds TypeBuilder entry point for TimestampNTZTypeNode. |
| gluten-substrait/src/main/java/org/apache/gluten/substrait/type/TimestampNTZTypeNode.java | Introduces Substrait type node for TIMESTAMP_NTZ (mapped via Substrait TIMESTAMP). |
| gluten-substrait/src/main/java/org/apache/gluten/substrait/expression/TimestampNTZLiteralNode.java | Introduces literal node for TIMESTAMP_NTZ. |
| gluten-delta/src/test/scala/org/apache/gluten/execution/DeltaSuite.scala | Updates Delta test expectation for TIMESTAMP_NTZ table behavior. |
| gluten-arrow/src/main/scala/org/apache/spark/sql/utils/SparkArrowUtil.scala | Adds Arrow mapping for timestamp_ntz (no timezone). |
| cpp/velox/substrait/SubstraitToVeloxPlan.cc | Updates includes for Arrow/Velox utilities during plan conversion. |
| cpp/velox/substrait/SubstraitToVeloxExpr.cc | Extends literal type mapping for Substrait timestamp literals. |
| cpp/velox/substrait/SubstraitParser.cc | Extends Substrait type parsing and literal extraction for timestamp (no-tz). |
| backends-velox/src/test/scala/org/apache/gluten/functions/DateFunctionsValidateSuite.scala | Adds coverage for reading timestamp_ntz and ensuring unsupported functions fall back. |
| backends-velox/src/test/scala/org/apache/gluten/execution/VeloxParquetDataTypeValidationSuite.scala | Updates validation test to assert native scan for TIMESTAMP_NTZ. |
| backends-velox/src/test/scala/org/apache/gluten/execution/FallbackSuite.scala | Adjusts fallback behavior test to Spark 3.4+ and new expectations. |
| backends-velox/src/main/scala/org/apache/gluten/execution/VeloxColumnarToRowExec.scala | Allows timestamp_ntz through columnar-to-row validation. |
| backends-velox/src/main/scala/org/apache/gluten/config/VeloxConfig.scala | Changes default for enableTimestampNtzValidation. |
| backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxValidatorApi.scala | Updates schema validation allowance for timestamp_ntz when validation disabled. |
| backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxIteratorApi.scala | Adds partition value formatting support for timestamp_ntz. |
| backends-velox/src-delta33/test/scala/org/apache/spark/sql/delta/DeltaInsertIntoTableSuite.scala | Ignores a Delta timestamp_ntz round-trip test due to unsupported cast behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Run Gluten Clickhouse CI on x86 |
Member
|
Run Gluten Clickhouse CI on x86 |
|
Run Gluten Clickhouse CI on x86 |
Comment on lines
221
to
245
| private class FallbackByTimestampNTZ(enableValidation: Boolean) extends Validator { | ||
| override def validate(plan: SparkPlan): Validator.OutCome = { | ||
| if (!enableValidation) { | ||
| // Validation is disabled, allow TimestampNTZ | ||
| return pass() | ||
| } | ||
|
|
||
| def containsNTZ(dataType: DataType): Boolean = dataType match { | ||
| case dt if dt.catalogString == "timestamp_ntz" => true | ||
| case st: StructType => st.exists(f => containsNTZ(f.dataType)) | ||
| case at: ArrayType => containsNTZ(at.elementType) | ||
| case mt: MapType => containsNTZ(mt.keyType) || containsNTZ(mt.valueType) | ||
| case _ => false | ||
| } | ||
| val hasNTZ = plan.output.exists(a => containsNTZ(a.dataType)) || | ||
| plan.children.exists(_.output.exists(a => containsNTZ(a.dataType))) | ||
| if (hasNTZ) { | ||
| fail(s"${plan.nodeName} has TimestampNTZType in input/output schema") | ||
| } else { | ||
| pass() | ||
| // Validation is disabled, allow supported operators. | ||
| def containsNTZ(dataType: DataType): Boolean = dataType match { | ||
| case dt if dt.typeName == "timestamp_ntz" => true | ||
| case st: StructType => st.exists(f => containsNTZ(f.dataType)) | ||
| case at: ArrayType => containsNTZ(at.elementType) | ||
| case mt: MapType => containsNTZ(mt.keyType) || containsNTZ(mt.valueType) | ||
| case _ => false | ||
| } | ||
| val isScan = plan match { | ||
| case _: BatchScanExec => true | ||
| case _: FileSourceScanExec => true | ||
| case p if HiveTableScanExecTransformer.isHiveTableScan(p) => true | ||
| case _ => false | ||
| } | ||
| val hasNTZ = plan.output.exists(a => containsNTZ(a.dataType)) || | ||
| plan.children.exists(_.output.exists(a => containsNTZ(a.dataType))) | ||
| if (isScan || !hasNTZ) { | ||
| return pass() | ||
| } | ||
| } | ||
| fail(s"${plan.nodeName} has TimestampNTZType in input/output schema") | ||
| } |
|
Run Gluten Clickhouse CI on x86 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes are proposed in this pull request?
Enables TimestampNTZ scan by default. Supports Arrow conversion and fallback of unsupported function.
How was this patch tested?
Unit tests.
Was this patch authored or co-authored using generative AI tooling?
Related issue: #11622