Backend
VL (Velox)
Bug description
// Create a partitioned Delta table with column mapping in "name" mode.
spark.sql("""
  |create table delta_cm2 (id int, name string) using delta
  |partitioned by (id)
  |tblproperties ("delta.columnMapping.mode" = "name")
  |""".stripMargin)
// One row per partition value.
spark.sql("""
  |insert into delta_cm2 values (1, "v1"), (2, "v2"), (3, "v3")
  |""".stripMargin)
// Filter on the partition column.
spark.sql("select name from delta_cm2 where id > 2").show()
[Expected behavior]
returns ["v3"]
[Actual behavior]
returns ["v1", "v2", "v3"]
== Physical Plan ==
VeloxColumnarToRow
+- ^(32) ProjectExecTransformer [col-4aba4d55-65eb-461f-9a50-154cb6b1c6ec#6087 AS name#6087]
   +- ^(32) FileScanTransformer parquet spark_catalog.default.delta_cm2[col-4aba4d55-65eb-461f-9a50-154cb6b1c6ec#6087,col-9a0726cb-9edd-4f44-b7f7-a1ccfc5474c3#6086] Batched: true, DataFilters: [], Format: Parquet, Location: PreparedDeltaFileIndex(1 paths)[file:/root/gluten/backends-velox/spark-warehouse/org.apache.glute..., PartitionFilters: [isnotnull(col-9a0726cb-9edd-4f44-b7f7-a1ccfc5474c3#6086), (col-9a0726cb-9edd-4f44-b7f7-a1ccfc547..., PushedFilters: [], ReadSchema: struct<col-4aba4d55-65eb-461f-9a50-154cb6b1c6ec:string> NativeFilters: []
== Results ==
!== Correct Answer - 1 ==   == Gluten Answer - 3 ==
 struct<>                   struct<>
![v3]                       [v1]
!                           [v2]
!                           [v3] (GlutenQueryTest.scala:437)
This happens because Delta's file filtering expects logical column names in its filters:
https://github.com/delta-io/delta/blob/44c619e51846f3e98aa6605d13fa4517de049281/spark/src/main/scala/org/apache/spark/sql/delta/stats/PrepareDeltaScan.scala#L356
Gluten, however, has already rewritten the logical column names to physical column names, for both the Parquet reader and filter pushdown. The partition filter therefore carries physical names (see PartitionFilters in the plan above), and the filter on id is not applied.
In vanilla (JVM) Spark, Delta switches to physical column names only when it creates the Parquet reader:
https://github.com/delta-io/delta/blob/44c619e51846f3e98aa6605d13fa4517de049281/spark/src/main/scala/org/apache/spark/sql/delta/DeltaParquetFileFormat.scala#L179
We need to add a fallback for column mapping mode, covering both the id and name modes.
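As a rough illustration of the proposed fallback, the check could look something like the sketch below. This is only a minimal sketch: `ColumnMappingFallback` and `shouldFallback` are hypothetical names, not existing Gluten or Delta APIs, and it assumes the table properties are available as a plain `Map[String, String]`. The only thing taken from the report is the `delta.columnMapping.mode` property key; treating the value case-insensitively is an assumption.

```scala
// Hypothetical helper, not part of Gluten or Delta: decides whether a Delta
// scan should fall back to vanilla Spark based on the column mapping mode.
object ColumnMappingFallback {
  // Table property key, as used in the reproduction above.
  val ColumnMappingModeKey: String = "delta.columnMapping.mode"

  // Returns true when column mapping is enabled in "name" or "id" mode.
  // A missing property is treated as "none" (no column mapping).
  def shouldFallback(tableProperties: Map[String, String]): Boolean = {
    val mode = tableProperties
      .getOrElse(ColumnMappingModeKey, "none")
      .trim
      .toLowerCase(java.util.Locale.ROOT) // assumption: compare case-insensitively
    mode == "name" || mode == "id"
  }
}
```

For the table in this report, `ColumnMappingFallback.shouldFallback(Map("delta.columnMapping.mode" -> "name"))` evaluates to `true`, so the scan would be left to vanilla Spark instead of being offloaded.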
Gluten version
Gluten-1.3
Spark version
Spark-3.5.x
Spark configurations
Spark 3.5 and Delta 3.2
System information
No response
Relevant logs