[fix][PES][datasource] fix oom error in partition statistic query by optimizing loop logic #1059

Description

@v-kkhuang

Search before asking

  • I have searched the issues and found no similar issues.

Linkis Component

  • linkis-commons
  • linkis-computation-governance
  • linkis-dist
  • linkis-engineconn-plugin
  • linkis-extensions
  • linkis-orchestrator
  • linkis-public-enhancements
  • linkis-spring-cloud-services
  • linkis-web

Description

The existing partition statistic query logic in MdqServiceImpl is written in a functional style, using forEach and several stream operations. For tables with a large number of partitions, this approach creates many intermediate objects and can end in an OutOfMemoryError.

Steps to reproduce

  1. Query partition statistics for a table with many partitions (hundreds or thousands)
  2. Observe OutOfMemoryError in application logs
  3. Note that the forEach + stream().sorted() operations allocate intermediate objects that an in-place sort would avoid

Expected behavior

The query should complete successfully without OOM errors regardless of the number of partitions.

Your environment

  • Linkis version used: 2.0.0
  • Environment name and version:
    • hadoop-3.3.4
    • hive-2.3.3
    • spark-2.4.3 / 3.3.0
    • scala-2.11.12 / 2.12.17
    • jdk 1.8.0_xxx

Anything else

This PR optimizes the code by:

  1. Replacing forEach with a traditional for loop to avoid lambda overhead
  2. Moving Comparator creation outside the loop to avoid repeated instantiation
  3. Adding a null filter for subPartitions
  4. Adding an isEmpty check to avoid unnecessary recursive calls
  5. Using sort() instead of stream().sorted().collect() so the list is sorted in place rather than copied
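
The five points above can be sketched roughly as follows. This is a minimal illustration and not the actual MdqServiceImpl code: `PartitionNode`, `PartitionSortSketch`, `sortRecursively`, and the sort-by-name ordering are all hypothetical names and assumptions. The null handling shown, skipping a null child list, is only one possible reading of point 3.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a partition tree node; the real Linkis class may differ.
class PartitionNode {
    final String name;
    final List<PartitionNode> subPartitions; // assumed to be null or empty for leaves

    PartitionNode(String name, List<PartitionNode> subPartitions) {
        this.name = name;
        this.subPartitions = subPartitions;
    }
}

class PartitionSortSketch {
    // Point 2: the Comparator is created once and reused at every recursion level.
    // Sorting by name is an assumption made for this example.
    private static final Comparator<PartitionNode> BY_NAME =
        Comparator.comparing((PartitionNode p) -> p.name);

    static void sortRecursively(List<PartitionNode> partitions) {
        if (partitions == null || partitions.isEmpty()) {
            return;
        }
        // Point 5: List.sort reorders the existing list instead of
        // collecting a sorted copy from a stream.
        partitions.sort(BY_NAME);
        // Point 1: a plain for loop replaces forEach with a lambda.
        for (PartitionNode partition : partitions) {
            List<PartitionNode> children = partition.subPartitions;
            // Points 3 and 4: skip null and empty child lists before recursing.
            if (children != null && !children.isEmpty()) {
                sortRecursively(children);
            }
        }
    }

    public static void main(String[] args) {
        List<PartitionNode> days = new ArrayList<>();
        days.add(new PartitionNode("day=02", null));
        days.add(new PartitionNode("day=01", new ArrayList<>()));
        List<PartitionNode> months = new ArrayList<>();
        months.add(new PartitionNode("month=02", null));
        months.add(new PartitionNode("month=01", days));

        sortRecursively(months);
        for (PartitionNode month : months) {
            System.out.println(month.name);
        }
    }
}
```

The sketch sorts each list in place, so it assumes the lists are mutable. If the real code receives immutable lists, it would need one copy per level before sorting.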

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Labels

  • bug (Something isn't working)