-
Notifications
You must be signed in to change notification settings - Fork 2k
Labels
enhancementNew feature or requestNew feature or request
Description
Is your feature request related to a problem or challenge?
There has been notice that RepartitionExec is quite expensive in certain queries / scenarios recently:
- 20-30x slower on certain array types (internally at Datadog)
- weird behavior in distributed-datafusion on network shuffles depending on the number of output tasks (Improve shuffling performance datafusion-contrib/datafusion-distributed#385)
It has been difficult to investigate / isolate the reason for this due to lack of granularity of metrics provided in the RepartitionExec operator. As of now we are only provided:
send_time: time spent pulling the next batch from input stream (mixed spill, channel send, etc.)repartition_time: big bucket for repartition work (mixed routing and rebuilding batches from routed indices)fetch_time: per output partition, covered the whole public batch path
Describe the solution you'd like
I would like to introduce more granular metrics that will isolate where repartition is spending its time:
fetch_time: unchangedrepartition_time: now the end-to-end total repartition timeroute_time: the time to distribute row indices to output partitionsbatch_build_time: the time to build the record batcheschannel_wait_time: per output partition, the time waiting for channel capacity / send(...) to completespill_write_time: per output partition, the time writing spilled batchesspill_read_wait_time: per output partition, time the consumer side waits for a spilled batch to become readable
Describe alternatives you've considered
I have considered other metrics but want to leave hot-path / overhead as small as possible for collection while still gaining good insight into the operator
Additional context
No response
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
enhancementNew feature or requestNew feature or request