16 changes: 16 additions & 0 deletions benchmarks/queries/clickbench/README.md
@@ -228,6 +228,22 @@ Results look like
Elapsed 30.195 seconds.
```


### Q9-Q12: FIRST_VALUE Aggregation Performance

These queries test the performance of the `FIRST_VALUE` aggregation function with different data types and grouping cardinalities.

| Query | `FIRST_VALUE` Column | Column Type | Group By Column | Group By Type | Number of Groups |
|-------|----------------------|-------------|-----------------|---------------|------------------|
| Q9 | `URL` | `Utf8` | `UserID` | `Int64` | 17,630,976 |
| Q10 | `URL` | `Utf8` | `OS` | `Int16` | 91 |
| Q11 | `WatchID` | `Int64` | `UserID` | `Int64` | 17,630,976 |
| Q12 | `WatchID` | `Int64` | `OS` | `Int16` | 91 |
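
For readers unfamiliar with ordered `FIRST_VALUE`, the following is a minimal sketch of what each of these queries computes per group, not how DataFusion implements it. The `Row` tuple shape and the `first_value_by_group` helper are hypothetical names introduced only for illustration, and the tie-breaking rule (keep the row seen first) is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical row shape for illustration: (group key, ordering key, value).
type Row<'a> = (i64, i64, &'a str);

/// For each group, keep the value from the row with the smallest ordering key.
/// Assumption: ties keep the row seen first; a real engine may choose differently.
fn first_value_by_group<'a>(rows: &[Row<'a>]) -> HashMap<i64, &'a str> {
    // Per-group state: (smallest ordering key so far, value from that row).
    let mut best: HashMap<i64, (i64, &'a str)> = HashMap::new();
    for &(group, order_key, value) in rows {
        best.entry(group)
            .and_modify(|cur| {
                if order_key < cur.0 {
                    *cur = (order_key, value);
                }
            })
            .or_insert((order_key, value));
    }
    // Drop the ordering key and return only the chosen value per group.
    best.into_iter().map(|(g, (_, v))| (g, v)).collect()
}

fn main() {
    let rows: Vec<Row> = vec![
        (1, 30, "https://example.com/c"),
        (1, 10, "https://example.com/a"),
        (2, 20, "https://example.com/bb"),
    ];
    let firsts = first_value_by_group(&rows);
    // Outer query in the URL variants: MAX(LENGTH(first_value)) across groups.
    let max_len = firsts.values().map(|v| v.len()).max();
    println!("{:?}", max_len);
}
```

The per-group state is one entry per distinct key, which is why the table contrasts a high-cardinality key (`UserID`) with a low-cardinality one (`OS`), and a variable-width value (`URL`) with a fixed-width one (`WatchID`).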





## Data Notes

Here are some interesting statistics about the data used in the queries
8 changes: 8 additions & 0 deletions benchmarks/queries/clickbench/extended/q10.sql
@@ -0,0 +1,8 @@
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT MAX(len) FROM (
    SELECT LENGTH(FIRST_VALUE("URL" ORDER BY "EventTime")) as len
    FROM hits
    GROUP BY "OS"
);
8 changes: 8 additions & 0 deletions benchmarks/queries/clickbench/extended/q11.sql
@@ -0,0 +1,8 @@
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT MAX(fv) FROM (
    SELECT FIRST_VALUE("WatchID" ORDER BY "EventTime") as fv
    FROM hits
    GROUP BY "UserID"
);
8 changes: 8 additions & 0 deletions benchmarks/queries/clickbench/extended/q12.sql
@@ -0,0 +1,8 @@
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT MAX(fv) FROM (
    SELECT FIRST_VALUE("WatchID" ORDER BY "EventTime") as fv
    FROM hits
    GROUP BY "OS"
);
4 changes: 4 additions & 0 deletions benchmarks/queries/clickbench/extended/q8.sql
@@ -0,0 +1,4 @@
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT "RegionID", "UserAgent", "OS", AVG(to_timestamp("ResponseEndTiming")-to_timestamp("ResponseStartTiming")) as avg_response_time, AVG(to_timestamp("ResponseEndTiming")-to_timestamp("ConnectTiming")) as avg_latency FROM hits GROUP BY "RegionID", "UserAgent", "OS" ORDER BY avg_latency DESC limit 10;
8 changes: 8 additions & 0 deletions benchmarks/queries/clickbench/extended/q9.sql
@@ -0,0 +1,8 @@
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT MAX(len) FROM (
    SELECT LENGTH(FIRST_VALUE("URL" ORDER BY "EventTime")) as len
    FROM hits
    GROUP BY "UserID"
);
14 changes: 14 additions & 0 deletions datafusion/common/src/join_type.rs
@@ -142,6 +142,20 @@ impl JoinType {
                | JoinType::RightMark
        )
    }

    /// Returns true when an empty build side necessarily produces an empty
    /// result for this join type.
    pub fn empty_build_side_produces_empty_result(self) -> bool {
        matches!(
            self,
            JoinType::Inner
                | JoinType::Left
                | JoinType::LeftSemi
                | JoinType::LeftAnti
                | JoinType::LeftMark
                | JoinType::RightSemi
        )
    }
}

impl Display for JoinType {
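
A self-contained sketch of how a predicate like `empty_build_side_produces_empty_result` could be used to skip probe-side work. Everything here is illustrative: the enum is a local stand-in rather than the real `datafusion_common::JoinType`, `can_skip_probe` is a hypothetical helper that does not appear in this diff, and the sketch assumes the build side is the left input.

```rust
/// Local stand-in for the join type enum, for illustration only.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    RightSemi,
    LeftAnti,
    RightAnti,
    LeftMark,
    RightMark,
}

impl JoinType {
    /// Mirrors the predicate added in the diff: these join types can emit
    /// nothing when the build side (assumed to be the left input) has no rows.
    fn empty_build_side_produces_empty_result(self) -> bool {
        matches!(
            self,
            JoinType::Inner
                | JoinType::Left
                | JoinType::LeftSemi
                | JoinType::LeftAnti
                | JoinType::LeftMark
                | JoinType::RightSemi
        )
    }
}

/// Hypothetical helper: true when the probe side need not be scanned at all.
fn can_skip_probe(join_type: JoinType, build_row_count: usize) -> bool {
    build_row_count == 0 && join_type.empty_build_side_produces_empty_result()
}

fn main() {
    // An inner join with an empty build side has no possible matches.
    println!("{}", can_skip_probe(JoinType::Inner, 0));
    // A right join must still emit every probe-side row, padded with NULLs.
    println!("{}", can_skip_probe(JoinType::Right, 0));
}
```

Under the left-is-build assumption, `Right`, `Full`, `RightAnti`, and `RightMark` are excluded because each still emits one output row per probe-side row even when nothing matches.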
1 change: 1 addition & 0 deletions datafusion/expr-common/src/type_coercion/binary.rs
@@ -1576,6 +1576,7 @@ fn string_concat_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option<Da
        (Dictionary(_, lhs_value_type), Dictionary(_, rhs_value_type)) => {
            string_coercion(lhs_value_type, rhs_value_type).or(None)
        }
        (Binary, Binary) => Some(Utf8),
        _ => None,
    })
}