[GLUTEN-11622][VL] Add basic TIMESTAMP_NTZ type support (#11939)#12229

Open: rui-mo wants to merge 3 commits into main from ts_ntz_dev

rui-mo (Contributor) commented Jun 4, 2026

What changes are proposed in this pull request?

Enables TimestampNTZ scan by default. Adds support for Arrow conversion and for falling back on unsupported functions.

How was this patch tested?

Unit tests.

Was this patch authored or co-authored using generative AI tooling?

Related issue: #11622

github-actions Bot added the CORE, VELOX, and DATA_LAKE labels Jun 4, 2026
github-actions Bot commented Jun 4, 2026

Run Gluten Clickhouse CI on x86

Copilot AI review requested due to automatic review settings June 9, 2026 01:31

Copilot AI left a comment

Pull request overview

Adds initial Velox-backend support for Spark TIMESTAMP_NTZ by enabling native scan paths, wiring Substrait/Arrow type conversions, and adjusting validator/fallback behavior and test coverage across supported Spark versions.

Changes:

  • Enable native Parquet scan for TIMESTAMP_NTZ and add validation/translation plumbing (Substrait + C++ parser/expr handling).
  • Extend Spark↔Arrow and Spark↔Substrait type mappings for timestamp_ntz, plus related iterator/columnar-to-row handling.
  • Add/adjust unit tests and backend test settings/exclusions for TIMESTAMP_NTZ behaviors.

Reviewed changes

Copilot reviewed 30 out of 30 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| gluten-ut/spark41/src/test/scala/org/apache/spark/sql/GlutenCachedTableSuite.scala | Adds cache/uncache coverage for TIMESTAMP_NTZ. |
| gluten-ut/spark41/src/test/scala/org/apache/gluten/utils/velox/VeloxTestSettings.scala | Updates Velox exclusions related to TIMESTAMP_NTZ and associated behaviors. |
| gluten-ut/spark40/src/test/scala/org/apache/spark/sql/GlutenCachedTableSuite.scala | Adds cache/uncache coverage for TIMESTAMP_NTZ. |
| gluten-ut/spark40/src/test/scala/org/apache/gluten/utils/velox/VeloxTestSettings.scala | Updates Velox exclusions related to TIMESTAMP_NTZ and associated behaviors. |
| gluten-ut/spark35/src/test/scala/org/apache/spark/sql/GlutenCachedTableSuite.scala | Adds cache/uncache coverage for TIMESTAMP_NTZ. |
| gluten-ut/spark35/src/test/scala/org/apache/gluten/utils/velox/VeloxTestSettings.scala | Updates Velox exclusions related to TIMESTAMP_NTZ and associated behaviors. |
| gluten-ut/spark35/src/test/scala/org/apache/gluten/utils/velox/VeloxSQLQueryTestSettings.scala | Adjusts supported Spark SQL query-test input lists (notably timestamp-related inputs). |
| gluten-ut/spark34/src/test/scala/org/apache/spark/sql/GlutenCachedTableSuite.scala | Adds cache/uncache coverage for TIMESTAMP_NTZ. |
| gluten-ut/spark34/src/test/scala/org/apache/gluten/utils/velox/VeloxTestSettings.scala | Updates Velox exclusions related to TIMESTAMP_NTZ and associated behaviors. |
| gluten-ut/spark33/src/test/scala/org/apache/spark/sql/GlutenCachedTableSuite.scala | Adds cache/uncache coverage for TIMESTAMP_NTZ. |
| gluten-ut/spark33/src/test/scala/org/apache/gluten/utils/velox/VeloxTestSettings.scala | Updates Velox exclusions related to TIMESTAMP_NTZ and associated behaviors. |
| gluten-ut/common/src/test/scala/org/apache/spark/sql/GlutenSQLTestsBaseTrait.scala | Adjusts Spark test conf to bind driver host/address to localhost. |
| gluten-substrait/src/main/scala/org/apache/gluten/extension/columnar/validator/Validators.scala | Refines validator/fallback rules around TIMESTAMP_NTZ (scan vs non-scan). |
| gluten-substrait/src/main/scala/org/apache/gluten/expression/ConverterUtils.scala | Adds TIMESTAMP_NTZ Spark/Substrait conversions and signatures. |
| gluten-substrait/src/main/java/org/apache/gluten/substrait/type/TypeBuilder.java | Adds TypeBuilder entry point for TimestampNTZTypeNode. |
| gluten-substrait/src/main/java/org/apache/gluten/substrait/type/TimestampNTZTypeNode.java | Introduces Substrait type node for TIMESTAMP_NTZ (mapped via Substrait TIMESTAMP). |
| gluten-substrait/src/main/java/org/apache/gluten/substrait/expression/TimestampNTZLiteralNode.java | Introduces literal node for TIMESTAMP_NTZ. |
| gluten-delta/src/test/scala/org/apache/gluten/execution/DeltaSuite.scala | Updates Delta test expectation for TIMESTAMP_NTZ table behavior. |
| gluten-arrow/src/main/scala/org/apache/spark/sql/utils/SparkArrowUtil.scala | Adds Arrow mapping for timestamp_ntz (no timezone). |
| cpp/velox/substrait/SubstraitToVeloxPlan.cc | Updates includes for Arrow/Velox utilities during plan conversion. |
| cpp/velox/substrait/SubstraitToVeloxExpr.cc | Extends literal type mapping for Substrait timestamp literals. |
| cpp/velox/substrait/SubstraitParser.cc | Extends Substrait type parsing and literal extraction for timestamp (no-tz). |
| backends-velox/src/test/scala/org/apache/gluten/functions/DateFunctionsValidateSuite.scala | Adds coverage for reading timestamp_ntz and ensuring unsupported functions fall back. |
| backends-velox/src/test/scala/org/apache/gluten/execution/VeloxParquetDataTypeValidationSuite.scala | Updates validation test to assert native scan for TIMESTAMP_NTZ. |
| backends-velox/src/test/scala/org/apache/gluten/execution/FallbackSuite.scala | Adjusts fallback behavior test to Spark 3.4+ and new expectations. |
| backends-velox/src/main/scala/org/apache/gluten/execution/VeloxColumnarToRowExec.scala | Allows timestamp_ntz through columnar-to-row validation. |
| backends-velox/src/main/scala/org/apache/gluten/config/VeloxConfig.scala | Changes default for enableTimestampNtzValidation. |
| backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxValidatorApi.scala | Updates schema validation allowance for timestamp_ntz when validation disabled. |
| backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxIteratorApi.scala | Adds partition value formatting support for timestamp_ntz. |
| backends-velox/src-delta33/test/scala/org/apache/spark/sql/delta/DeltaInsertIntoTableSuite.scala | Ignores a Delta timestamp_ntz round-trip test due to unsupported cast behavior. |

Comment thread cpp/velox/substrait/SubstraitParser.cc
Comment thread cpp/velox/substrait/SubstraitToVeloxExpr.cc

zhouyuan (Member) left a comment

👍

zhouyuan (Member) commented Jun 9, 2026

Run Gluten Clickhouse CI on x86

Copilot AI review requested due to automatic review settings June 10, 2026 03:10

Copilot AI left a comment

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.

Comment on lines 221 to 245
 private class FallbackByTimestampNTZ(enableValidation: Boolean) extends Validator {
   override def validate(plan: SparkPlan): Validator.OutCome = {
     if (!enableValidation) {
-      // Validation is disabled, allow TimestampNTZ
-      return pass()
-    }
-
-    def containsNTZ(dataType: DataType): Boolean = dataType match {
-      case dt if dt.catalogString == "timestamp_ntz" => true
-      case st: StructType => st.exists(f => containsNTZ(f.dataType))
-      case at: ArrayType => containsNTZ(at.elementType)
-      case mt: MapType => containsNTZ(mt.keyType) || containsNTZ(mt.valueType)
-      case _ => false
-    }
-    val hasNTZ = plan.output.exists(a => containsNTZ(a.dataType)) ||
-      plan.children.exists(_.output.exists(a => containsNTZ(a.dataType)))
-    if (hasNTZ) {
-      fail(s"${plan.nodeName} has TimestampNTZType in input/output schema")
-    } else {
-      pass()
+      // Validation is disabled, allow supported operators.
+      def containsNTZ(dataType: DataType): Boolean = dataType match {
+        case dt if dt.typeName == "timestamp_ntz" => true
+        case st: StructType => st.exists(f => containsNTZ(f.dataType))
+        case at: ArrayType => containsNTZ(at.elementType)
+        case mt: MapType => containsNTZ(mt.keyType) || containsNTZ(mt.valueType)
+        case _ => false
+      }
+      val isScan = plan match {
+        case _: BatchScanExec => true
+        case _: FileSourceScanExec => true
+        case p if HiveTableScanExecTransformer.isHiveTableScan(p) => true
+        case _ => false
+      }
+      val hasNTZ = plan.output.exists(a => containsNTZ(a.dataType)) ||
+        plan.children.exists(_.output.exists(a => containsNTZ(a.dataType)))
+      if (isScan || !hasNTZ) {
+        return pass()
+      }
     }
+    fail(s"${plan.nodeName} has TimestampNTZType in input/output schema")
   }
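
For anyone who wants to exercise the nested-type check without a Spark build, the validation-disabled branch of the quoted code can be approximated with a small stand-in model. This is only a sketch under stated assumptions: `DType`, `Plan`, and `NtzCheck` are hypothetical names introduced for illustration, not Gluten or Spark APIs, and the `isScan` flag stands in for the `BatchScanExec` / `FileSourceScanExec` / Hive table scan match in the real validator.

```scala
// Hypothetical stand-in for Spark's DataType hierarchy (not the real API).
sealed trait DType
case object TimestampNTZ extends DType
case object IntType extends DType
final case class StructT(fields: List[DType]) extends DType
final case class ArrayT(element: DType) extends DType
final case class MapT(key: DType, value: DType) extends DType

// Hypothetical stand-in for a plan node: a scan flag, its output types, and its children.
final case class Plan(isScan: Boolean, output: List[DType], children: List[Plan] = Nil)

object NtzCheck {
  // Recursively looks for TIMESTAMP_NTZ inside nested struct, array, and map types.
  def containsNTZ(dt: DType): Boolean = dt match {
    case TimestampNTZ     => true
    case StructT(fields)  => fields.exists(containsNTZ)
    case ArrayT(element)  => containsNTZ(element)
    case MapT(key, value) => containsNTZ(key) || containsNTZ(value)
    case _                => false
  }

  // Mirrors only the validation-disabled branch: scans always pass; other
  // operators pass only when neither their output nor their children's
  // output contains TIMESTAMP_NTZ.
  def passesWhenValidationDisabled(plan: Plan): Boolean = {
    val hasNTZ = plan.output.exists(containsNTZ) ||
      plan.children.exists(_.output.exists(containsNTZ))
    plan.isScan || !hasNTZ
  }
}
```

In this model, a scan whose output contains a nested TIMESTAMP_NTZ column passes, while a non-scan parent of that scan does not, because the check also inspects the children's output.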