Fix Subtraction overflow in max_distinct_count when hash join has a pushed-down limit#20799
Conversation
jonathanc-n
left a comment
There was a problem hiding this comment.
Sorry didn't catch this would happen in my PR. Thanks for the fix!
|
cc @gabotechs |
| Ok(()) | ||
| } | ||
| } | ||
| #[test] |
There was a problem hiding this comment.
I created a test for this for sqllogictests @KARTIK64-rgb, can add:
statement ok
CREATE TABLE t1(a INT, b INT) AS VALUES
(NULL, 1), (NULL, 2), (NULL, 3), (NULL, 4), (NULL, 5);
statement ok
CREATE TABLE t2(c INT) AS VALUES (1), (2);
# This query panicked before the fix: the ORDER BY forces a SortExec,
# the LIMIT gets pushed into SortExec.fetch, and the HashJoinExec
# calls partition_statistics() on the SortExec child during execution.
query II
SELECT sub.a, sub.b FROM (
SELECT * FROM t1 ORDER BY b LIMIT 1
) sub
JOIN t2 ON sub.a = t2.c;
----
statement ok
DROP TABLE t1;
statement ok
DROP TABLE t2;
i verified it reproduces the bug
There was a problem hiding this comment.
Looks good, thanks @KARTIK64-rgb for quick fix and @jonathanc-n for the review!
As soon as the CI issues are addressed, this is good to merge.
| let null_count = *stats.null_count.get_value().unwrap_or(&0); | ||
| let non_null_count = count.checked_sub(null_count).unwrap_or(0); |
There was a problem hiding this comment.
👍 nice, even if this is a good safeguard, the fact that this can even happen makes me think that there is some further work to be done in the stats propagation mechanism.
Ideally, this would not even be possible by construction, but that's a topic for another PR.
|
@KARTIK64-rgb can you help fix the CI tests on this PR? |
max_distinct_count when hash join has a pushed-down limit
…m/KARTIK64-rgb/datafusion into fix/issue-20779-subtract-overflow
|
I pushed a commit to try and resolve the build error |
|
Should be good now, in it goes. Thanks @KARTIK64-rgb for the PR, @jonathanc-n for the review and @alamb for fixing the build error. |
|
woohoo -- progress every day |
|
Thanks @alamb for solving the ci-issues. |
Which issue does this PR close?
max_distinct_countwhen hash join has a pushed-down limit #20779.Rationale for this change
In
max_distinct_count(insidedatafusion/physical-plan/src/joins/utils.rs), thePrecision::Exactbranch computes the number of non-null rows by doing:Before #20228 this subtraction was always safe because
num_rowswas never smallerthan
null_count. But #20228 addedfetch(limit push-down) support toHashJoinExec, and when a limit is applied,partition_statistics()capsnum_rowstoExact(fetch_value)without also capping the per-columnnull_count. This meansnull_countcan legally exceednum_rows, causing apanic with "attempt to subtract with overflow".
What changes are included in this PR?
Bug fix in
max_distinct_count(utils.rs~line 725): replaced the baresubtraction with a saturating subtraction so that when
null_countexceedsnum_rowsthe result is clamped to0instead of panicking.Regression test added at the bottom of the
mod testsblock in the samefile. The test deliberately constructs a scenario where
null_count (5) > num_rows (2)and asserts thatmax_distinct_countreturnsExact(0)withoutpanicking.
Are these changes tested?
Yes. A new unit test
test_max_distinct_count_no_overflow_when_null_count_exceeds_num_rowsis addeddirectly in
datafusion/physical-plan/src/joins/utils.rs. It covers the exactedge-case from the bug report (null_count > num_rows after a fetch/limit
push-down) and would have caught the panic before the fix.
Are there any user-facing changes?
No user-facing or API changes. This is a purely internal arithmetic fix in the
statistics estimation logic. Queries that previously panicked when a limit was
pushed down into a
HashJoinExecwill now complete successfully.