Search before asking
Linkis Component
Description
The existing partition statistics query logic in MdqServiceImpl is written in a functional style, using forEach and several chained stream operations. For tables with a large number of partitions, this approach creates many intermediate objects and can end in an OutOfMemoryError.
Steps to reproduce
- Query partition statistics for a table with many partitions (hundreds or thousands)
- Observe OutOfMemoryError in application logs
- Suspected cause: the forEach + stream().sorted() operations allocate unnecessary intermediate objects
Expected behavior
The query should complete successfully without OOM errors regardless of the number of partitions.
Your environment
- Linkis version used: 2.0.0
- Environment name and version:
  - hadoop-3.3.4
  - hive-2.3.3
  - spark-2.4.3 / 3.3.0
  - scala-2.11.12 / 2.12.17
  - jdk 1.8.0_xxx
Anything else
This PR optimizes the code by:
- Replacing forEach with a traditional for loop to avoid lambda overhead
- Creating the Comparator once, outside the loop, to avoid repeated instantiation
- Adding a null filter for subPartitions
- Adding an isEmpty check to avoid unnecessary recursive calls
- Using sort() instead of stream().sorted().collect() to sort in place rather than build a new list
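A minimal sketch of what these changes could look like together. The actual MdqServiceImpl code is not shown in this issue, so `PartitionNode`, `sortPartitions`, the `name` field, and the descending-by-name ordering are all hypothetical stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;

public class PartitionSortSketch {

    // Hypothetical stand-in for the partition node type used by MdqServiceImpl.
    static class PartitionNode {
        final String name;
        final List<PartitionNode> subPartitions; // may be null or contain null entries

        PartitionNode(String name, List<PartitionNode> subPartitions) {
            this.name = name;
            this.subPartitions = subPartitions;
        }
    }

    // Created once and reused at every recursion level (assumed ordering: name, descending).
    private static final Comparator<PartitionNode> BY_NAME_DESC =
            Comparator.comparing((PartitionNode p) -> p.name).reversed();

    // Sorts the given (mutable) list and all nested sub-partition lists in place.
    static void sortPartitions(List<PartitionNode> partitions) {
        if (partitions == null || partitions.isEmpty()) {
            return; // nothing to do, skip the recursive work
        }
        partitions.removeIf(Objects::isNull); // null filter for sub-partitions
        partitions.sort(BY_NAME_DESC);        // in-place sort, no stream or new list
        for (PartitionNode partition : partitions) { // plain loop instead of forEach
            List<PartitionNode> subs = partition.subPartitions;
            if (subs != null && !subs.isEmpty()) {   // recurse only when needed
                sortPartitions(subs);
            }
        }
    }

    public static void main(String[] args) {
        List<PartitionNode> subs = new ArrayList<>();
        subs.add(new PartitionNode("hour=01", null));
        subs.add(null);
        subs.add(new PartitionNode("hour=02", null));

        List<PartitionNode> top = new ArrayList<>();
        top.add(new PartitionNode("ds=2024-01-01", subs));
        top.add(new PartitionNode("ds=2024-01-02", new ArrayList<>()));

        sortPartitions(top);
        for (PartitionNode p : top) {
            System.out.println(p.name);
        }
    }
}
```

The sketch assumes the partition lists are mutable (for example, ArrayList); if the real code receives immutable lists, it would need to copy them once before sorting.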
Are you willing to submit a PR?