Is your feature request related to a problem or challenge?
Noticed while reviewing this PR from @2010YOUY01
GroupsAccumulator currently has two related APIs:
This optional capability adds branching and complexity in the group hash implementation (as the code has to handle the case when skip partial is not available)
It also means GroupsAccumulator implementations can silently miss the high-cardinality fast path.
I believe the original reason for this API was to allow incremental rollout of partial aggregate support (so we could introduce the API and then incrementally implement it for the accumulators)
Describe the solution you'd like
- Make
convert_to_state(...) a required part of the GroupsAccumulator
- remove
supports_convert_to_state()
- simplify hash aggregate so skip-partial aggregation no longer needs to check per-accumulator support.
Suggested Steps:
Acceptance criteria:
Describe alternatives you've considered
We could keep supports_convert_to_state() and continue enabling skip-partial aggregation only when all accumulators opt in. That preserves compatibility, but it keeps the hash aggregate code more complex and allows some accumulators to miss the high-cardinality fast path indefinitely.
Additional context
This is likely a breaking public API change for custom aggregate authors and FFI users, so it should be marked as an API change and documented in the upgrading guide.
Is your feature request related to a problem or challenge?
Noticed while reviewing this PR from @2010YOUY01
EmitToto output #23055GroupsAccumulatorcurrently has two related APIs:convert_to_state(...), which converts raw input rows directly into intermediate aggregate statesupports_convert_to_state(), which lets hash aggregate decide whether the skip-partial-aggregation optimization is availableThis optional capability adds branching and complexity in the group hash implementation (as the code has to handle the case when skip partial is not available)
It also means
GroupsAccumulatorimplementations can silently miss the high-cardinality fast path.I believe the original reason for this API was to allow incremental rollout of partial aggregate support (so we could introduce the API and then incrementally implement it for the accumulators)
Describe the solution you'd like
convert_to_state(...)a required part of theGroupsAccumulatorsupports_convert_to_state()Suggested Steps:
GroupsAccumulatorimplementations and ensure each has a correctconvert_to_state(...)not_implimplementation ofconvert_to_state(...)fromdatafusion/expr-common/src/groups_accumulator.rsGroupsAccumulator::supports_convert_to_staterow_hash.rsguardFFI_GroupsAccumulator::supports_convert_to_stateAcceptance criteria:
supports_convert_to_stateAPI remainsGroupsAccumulatorimplementorsDescribe alternatives you've considered
We could keep
supports_convert_to_state()and continue enabling skip-partial aggregation only when all accumulators opt in. That preserves compatibility, but it keeps the hash aggregate code more complex and allows some accumulators to miss the high-cardinality fast path indefinitely.Additional context
This is likely a breaking public API change for custom aggregate authors and FFI users, so it should be marked as an API change and documented in the upgrading guide.