[GLUTEN] Route bitmap_or_agg to native Velox execution#12242
Open
minni31 wants to merge 5 commits into
Register bitmap_or_agg aggregate function for native Velox execution:
- Add BITMAP_OR_AGG constant to ExpressionNames
- Add bitmap_or_agg to C++ plan validator supportedAggFuncs
- Register Sig[BitmapOrAgg] in Spark 3.5/4.0/4.1 shims
- Add DefaultValidator() to CH_AGGREGATE_FUNC_BLACKLIST (CH fallback)
- Add plan-shape assertion test (excluded until Velox function lands)
- Add ClickHouse test exclusions for native-only test

Note: The native Velox bitmap_or_agg function is pending upstream (facebookincubator/velox). The test is excluded in VeloxTestSettings until that PR is merged and Gluten's Velox dependency is updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Run Gluten Clickhouse CI on x86
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Adds support wiring for the bitmap_or_agg aggregate across Spark shims and backend validation, plus query-plan routing tests (with backend-specific exclusions where not yet supported).
Changes:
- Register bitmap_or_agg in Spark 3.5/4.0/4.1 shims and add the expression name constant.
- Add query suite coverage asserting bitmap_or_agg routes to native aggregation.
- Update backend allow/deny lists (Velox validator + CH function validation / test exclusions).
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| shims/spark41/.../Spark41Shims.scala | Registers bitmap_or_agg expression signature for Spark 4.1 shim. |
| shims/spark40/.../Spark40Shims.scala | Registers bitmap_or_agg expression signature for Spark 4.0 shim. |
| shims/spark35/.../Spark35Shims.scala | Registers bitmap_or_agg expression signature for Spark 3.5 shim. |
| shims/common/.../ExpressionNames.scala | Adds BITMAP_OR_AGG SQL function name constant. |
| gluten-ut/spark41/.../GlutenBitmapExpressionsQuerySuite.scala | Adds routing-to-native test for bitmap_or_agg. |
| gluten-ut/spark41/.../VeloxTestSettings.scala | Excludes new bitmap_or_agg test for Velox backend pending support. |
| gluten-ut/spark41/.../ClickHouseTestSettings.scala | Excludes new bitmap_or_agg test for CH backend. |
| gluten-ut/spark40/.../GlutenBitmapExpressionsQuerySuite.scala | Adds routing-to-native test for bitmap_or_agg. |
| gluten-ut/spark40/.../VeloxTestSettings.scala | Excludes new bitmap_or_agg test for Velox backend pending support. |
| gluten-ut/spark40/.../ClickHouseTestSettings.scala | Excludes new bitmap_or_agg test for CH backend. |
| gluten-ut/spark35/.../GlutenBitmapExpressionsQuerySuite.scala | Adds routing-to-native test for bitmap_or_agg. |
| gluten-ut/spark35/.../VeloxTestSettings.scala | Excludes new bitmap_or_agg test for Velox backend pending support. |
| gluten-ut/spark35/.../ClickHouseTestSettings.scala | Excludes new bitmap_or_agg test for CH backend. |
| cpp/velox/substrait/SubstraitToVeloxPlanValidator.cc | Whitelists bitmap_or_agg as a supported aggregate in Velox plan validation. |
| backends-clickhouse/.../CHExpressionUtil.scala | Adds bitmap_or_agg to CH expression validation map. |
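For readers unfamiliar with the validator change listed in the table, the following is a minimal sketch of the allowlist idea, not the actual Gluten code. The set name `kSupportedAggFuncs`, the helper `isSupportedAggregate`, and the other entries in the set are hypothetical and chosen for illustration only; the real set in SubstraitToVeloxPlanValidator.cc is larger and wired into Substrait plan validation.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical stand-in for the validator's set of supported aggregates.
// Only "bitmap_or_agg" comes from this PR; the other entries are illustrative.
static const std::unordered_set<std::string> kSupportedAggFuncs = {
    "sum", "count", "bitmap_or_agg"};

// Returns true when the aggregate name is on the allowlist, meaning the
// plan may be offloaded; false means the plan falls back to vanilla Spark.
bool isSupportedAggregate(const std::string& name) {
  return kSupportedAggFuncs.count(name) > 0;
}
```

Under this sketch, adding one string to the set is all it takes for validation to stop rejecting plans that contain the aggregate, which is why the test exclusions are needed until the native function itself is available.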
The native bitmap_or_agg function has been merged upstream in Velox. Remove the .exclude() entries from VeloxTestSettings so the plan-shape assertion test now runs on CI.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use a subquery to avoid nesting bitmap_construct_agg inside bitmap_or_agg at the same aggregation level.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
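To make the two aggregation levels concrete, here is a rough sketch of what the inner and outer aggregates compute, assuming a toy 8-byte bitmap. The names `constructAgg`, `orAgg`, and `kNumBytes` are hypothetical, and the bitmap width and bit ordering are simplifications for illustration rather than the engine's real layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative bitmap width only; not the size used by Spark or Velox.
constexpr std::size_t kNumBytes = 8;
using Bitmap = std::vector<std::uint8_t>;

// Inner level (stand-in for bitmap_construct_agg): set one bit per position.
Bitmap constructAgg(const std::vector<std::size_t>& bitPositions) {
  Bitmap bm(kNumBytes, 0);
  for (std::size_t pos : bitPositions) {
    pos %= kNumBytes * 8;  // keep the position inside the bitmap
    bm[pos / 8] |= static_cast<std::uint8_t>(1u << (pos % 8));
  }
  return bm;
}

// Outer level (stand-in for bitmap_or_agg): bitwise OR of the input bitmaps.
Bitmap orAgg(const std::vector<Bitmap>& bitmaps) {
  Bitmap result(kNumBytes, 0);
  for (const Bitmap& bm : bitmaps) {
    for (std::size_t i = 0; i < kNumBytes && i < bm.size(); ++i) {
      result[i] |= bm[i];
    }
  }
  return result;
}
```

The outer function consumes bitmaps that the inner function has already finished building, which mirrors why the test query places bitmap_construct_agg in a subquery and applies bitmap_or_agg to its output.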
Comment on lines +41 to +54

    test("bitmap_or_agg routes to native") {
      val df = spark.sql(
        "SELECT bitmap_or_agg(bm) FROM (" +
          "SELECT bitmap_construct_agg(bitmap_bit_position(col)) AS bm " +
          "FROM values (1L), (2L), (3L) AS t(col)" +
          ") sub")
      df.collect()
      assert(
        collectWithSubqueries(df.queryExecution.executedPlan) {
          case h: HashAggregateExecBaseTransformer => h
        }.nonEmpty,
        "Expected native HashAggregateExecBaseTransformer in plan"
      )
    }
Comment on lines 91 to +94

    enableSuite[GlutenBitmapExpressionsQuerySuite]
      // bitmap_construct_agg is not supported natively in CH backend.
      .excludeCH("bitmap_construct_agg routes to native")
      .excludeCH("bitmap_or_agg routes to native")