Optimizer incorrectly rewrites window plans involving arrow scan inputs to self-joins with multiple refs to the scan #713

Description

@randalloveson

This regressed in 1.5.0.

This affects queries that use window functions and reference the Arrow stream table only once, e.g. SELECT g, v, sum(v) OVER (PARTITION BY g) AS s FROM %s (where %s is the name the stream is registered under).

1.4.4 always left this as a single ARROW_SCAN_DUMB in the plan. 1.5.0 rewrites it to a self-join with multiple ARROW_SCAN_DUMB nodes, which doesn't work because the stream is released after the first scan reads it. This manifests as:

java.sql.SQLException: Invalid Input Error: This stream has been released
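
To illustrate why the rewritten plan fails, here is a toy model in plain Java. It is only a sketch under stated assumptions, not DuckDB's planner or the Arrow bindings: `OneShotStream`, `windowSingleScan`, and `windowSelfJoin` are hypothetical names, and the stream is modeled as a list that is released after its first scan. The single-scan variant stands in for the 1.4.4 plan shape, the two-scan variant for the 1.5.0 self-join shape.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OneShotStreamDemo {
    // Hypothetical stand-in for a single-pass Arrow stream: readable exactly once.
    static final class OneShotStream {
        private List<int[]> rows; // each row is {g, v}

        OneShotStream(List<int[]> rows) {
            this.rows = rows;
        }

        List<int[]> scan() {
            if (rows == null) {
                throw new IllegalStateException("This stream has been released");
            }
            List<int[]> out = rows;
            rows = null; // release the stream after the first scan
            return out;
        }
    }

    // One scan: materialize the rows, then compute sum(v) per partition g.
    static List<int[]> windowSingleScan(OneShotStream stream) {
        List<int[]> rows = stream.scan();
        Map<Integer, Integer> sums = new HashMap<>();
        for (int[] r : rows) {
            sums.merge(r[0], r[1], Integer::sum);
        }
        List<int[]> out = new ArrayList<>();
        for (int[] r : rows) {
            out.add(new int[] {r[0], r[1], sums.get(r[0])});
        }
        return out;
    }

    // Self-join shape: one scan feeds the aggregate, a second scan feeds the join.
    static List<int[]> windowSelfJoin(OneShotStream stream) {
        Map<Integer, Integer> sums = new HashMap<>();
        for (int[] r : stream.scan()) {
            sums.merge(r[0], r[1], Integer::sum);
        }
        List<int[]> out = new ArrayList<>();
        for (int[] r : stream.scan()) { // second scan of a released stream throws
            out.add(new int[] {r[0], r[1], sums.get(r[0])});
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> data = Arrays.asList(
            new int[] {1, 10}, new int[] {1, 20}, new int[] {2, 5});

        for (int[] r : windowSingleScan(new OneShotStream(data))) {
            System.out.println(Arrays.toString(r));
        }

        try {
            windowSelfJoin(new OneShotStream(data));
        } catch (IllegalStateException e) {
            System.out.println("self-join plan failed: " + e.getMessage());
        }
    }
}
```

In this model both functions compute the same result in principle, but only the single-scan one can run against a one-shot input, which is why the plan shape matters even though the query text references the stream once.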

Related to: duckdb/duckdb-python#70

I'm filing a separate issue because that preexisting issue describes a query that references the arrow stream multiple times. If that were the only limitation, it would be a much clearer boundary: there would still be situations where you can use arrow stream inputs, and it would be easy to understand when you can and when you can't (multiple references no, one reference yes). This new regression breaks that contract.

(PR: duckdb/duckdb#23323)
