Skip to content

Support for aggregate functions on intervals #23085

Description

@Abdullahsab3

Is your feature request related to a problem or challenge?

In postgres you can have the following query:

with test_values as (
    select * from (values (interval '1 second'), (interval '1 year'), (interval '1 month')) as t(value)
)
select avg(value) from test_values;

>> 0 years 4 mons 10 days 0 hours 0 mins 0.333333 secs

with test_values as (
    select * from (values (interval '1 second'), (interval '1 year'), (interval '1 month')) as t(value)
)
select sum(value) from test_values;

>> 1 years 1 mons 0 days 0 hours 0 mins 1.0 secs

It would be nice if Datafusion also had support for it. Datafusion current output:

> with test_values as (
    select * from (values (interval '1 second'), (interval '1 year'), (interval '1 month')) as t(value)
)
select avg(value) from test_values;
Error during planning: Internal error: Function 'avg' failed to match any signature, errors: Error during planning: Function 'avg' requires Decimal, but received Interval(MonthDayNano) (DataType: Interval(MonthDayNano)).,Error during planning: Function 'avg' requires Duration, but received Interval(MonthDayNano) (DataType: Interval(MonthDayNano)).,Error during planning: Function 'avg' requires Float64, but received Interval(MonthDayNano) (DataType: Interval(MonthDayNano))..
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues. No function matches the given name and argument types 'avg(Interval(MonthDayNano))'. You might need to add explicit type casts.
	Candidate functions:
	avg(Decimal)
	avg(Duration)
	avg(Float64)

> with test_values as (
    select * from (values (interval '1 second'), (interval '1 year'), (interval '1 month')) as t(value)
)
select sum(value) from test_values;
Error during planning: Internal error: Function 'sum' failed to match any signature, errors: Error during planning: Function 'sum' requires Decimal, but received Interval(MonthDayNano) (DataType: Interval(MonthDayNano)).,Error during planning: Function 'sum' requires UInt64, but received Interval(MonthDayNano) (DataType: Interval(MonthDayNano)).,Error during planning: Function 'sum' requires Int64, but received Interval(MonthDayNano) (DataType: Interval(MonthDayNano)).,Error during planning: Function 'sum' requires Float64, but received Interval(MonthDayNano) (DataType: Interval(MonthDayNano)).,Error during planning: Function 'sum' requires Duration, but received Interval(MonthDayNano) (DataType: Interval(MonthDayNano))..
This issue was likely caused by a bug in DataFusion's code. Please help us to resolve this by filing a bug report in our issue tracker: https://github.com/apache/datafusion/issues. No function matches the given name and argument types 'sum(Interval(MonthDayNano))'. You might need to add explicit type casts.
	Candidate functions:
	sum(Decimal)
	sum(UInt64)
	sum(Int64)
	sum(Float64)
	sum(Duration)

Describe the solution you'd like

No response

Describe alternatives you've considered

aggregating epoch

with test_values as (
    select * from (values (interval '1 second'), (interval '1 year'), (interval '1 month')) as t(value)
)
select sum(extract(epoch from value)) from test_values;

it works but it wont be a duration/interval anymore

Additional context

Very nice for time series operations. for example if you would like to know the total duration of present gaps of a time series

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions