Skip to content

[pull] main from apache:main#81

Merged
pull[bot] merged 7 commits intoburaksenn:mainfrom
apache:main
Apr 4, 2026
Merged

[pull] main from apache:main#81
pull[bot] merged 7 commits intoburaksenn:mainfrom
apache:main

Conversation

@pull
Copy link
Copy Markdown

@pull pull bot commented Apr 4, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

comphead and others added 7 commits April 3, 2026 19:42
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
## Which issue does this PR close?

- Closes #15161.

## Rationale for this change

In a comparison between a numeric column and a string literal (e.g.,
`WHERE int_col < '10'`), we previously coerced the numeric column to be
a string type. This resulted in doing a lexicographic comparison, which
results in incorrect query results.

Instead, we split type coercion into two situations: type coercion for
comparisons (including `IN` lists, `BETWEEN`, and `CASE WHEN`), where we
want string->numeric coercion, and type coercion for places like `UNION`
or `CASE ... THEN/ELSE`, where DataFusion's traditional behavior has
been to tolerate type mismatching by coercing values to strings.

Here is a (not necessarily exhaustive) summary of the behavioral changes
(old -> new):

```
  Comparisons (=, <, >, etc.):
    float_col = '5'      : string (wrong: '5'!='5.0')  -> numeric
    int_col > '100'      : string (wrong: '325'<'100') -> numeric
    int_col = 'hello'    : string, always false        -> cast error
    int_col = ''         : string, always false        -> cast error
    int_col = '99.99'    : string, always false        -> cast error
    Dict(Int) = '5'      : string                      -> numeric
    REE(Int) = '5'       : string                      -> numeric
    struct(int)=struct(str): int field to Utf8  -> str field to int

  IN lists:
    float_col IN ('1.0') : string (wrong: '1.0'!='1') -> numeric
    str_col IN ('a', 1)  : coerce to Utf8       -> coerce to Int64

  CASE:
    CASE str WHEN float  : coerce to Utf8       -> coerce to Float

  LIKE / regex:
    Dict(Int) LIKE '%5%' : coerce to Utf8       -> error (matches int)
    REE(Int) LIKE '%5%'  : coerce to Utf8       -> error (matches int)
    Dict(Int) ~ '5'      : coerce to Utf8       -> error (matches int)
    REE(Int) ~ '5'       : error (no REE)       -> error (REE added)
    REE(Utf8) ~ '5'      : error (no REE)       -> works (REE added)
```

## What changes are included in this PR?

* Update `comparison_coercion` to coerce strings to numerics
* Remove previous `comparison_coercion_numeric` function
* Add a new function, `type_union_coercion`, and use it when appropriate
* Add support for REE types with regexp operators (this was unsupported
for no good reason I can see)
* Add unit and SLT tests for new coercion behavior
* Update existing SLT tests for changes in coercion behavior
* Fix the ClickBench unparser tests to avoid comparing int fields with
non-numeric string literals

## Are these changes tested?

Yes. New tests added, existing tests pass.

## Are there any user-facing changes?

Yes, see table above. In most cases the new behavior should be more
sensible and less error-prone, but it will likely break some user code.

---------

Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## What changes are included in this PR?

Adds support for round spark function in data fusion.

## Are these changes tested?

yes, using UTs

## Are there any user-facing changes?

yes, adds new function.

---------

Co-authored-by: Subham Singhal <subhamsinghal@Subhams-MacBook-Air.local>
Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
## Which issue does this PR close?
 
Follow up of #20768.


## Rationale for this change

`Precision::min/max` allocates a lot of new `ScalarValues`, and it can
be done in place.
While running the `sql_planner` benchmark, it seems like for clickbench
`Statistics::try_merge_iter` is a significant part of the runtime, and
this PR improves that part by about 20-25% locally.

## What changes are included in this PR?

Introduces a couple of of new internal functions to calculate the
min/max of a `Precision` in-place.

## Are these changes tested?

Existing general tests, and a few new unit tests.

## Are there any user-facing changes?

None

---------

Signed-off-by: Adam Gutglick <adamgsal@gmail.com>
…spill files (#21293)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #21292.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

`InProgressSpillFile` will use the first batch schema when it has no
writer, which could have different nullability from the subsequent
batches (e.g. due to union-ing a literal projection and a table column
projection). This can then lead to a panic in `sort_batch`.

`InProgressSpillFile` already has access to the canonical schema that
the spill file should have (`self.spill_writer.schema()`).

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Single line fix: instead of using the first batch schema for the spill
file schema, use the spill_writer's schema instead.

The rest of the changes are two new tests.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Yes, two new tests are added.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
## Which issue does this PR close?

- part of #20529
- Broken out of #20820


## Rationale for this change

@Dandandan, @adriangb and I are in the process of significantly
reworking how FileStream works internally (morsels!)

As the FileStream will get more complicated, it needs a better structure
than one giant module to ensure we can work with it. I am trying to do
this as incremental PRs to keep review burden low

## What changes are included in this PR?

1. Move FileStreamMetrics to its own module

## Are these changes tested?

Yes by CI

## Are there any user-facing changes?

No, just internal code motion
@pull pull bot locked and limited conversation to collaborators Apr 4, 2026
@pull pull bot added the ⤵️ pull label Apr 4, 2026
@pull pull bot merged commit e1c5453 into buraksenn:main Apr 4, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants