_SnapshotProducer._summary() unreasonably slow #2673

@Anton-Tarazi

Description

Apache Iceberg version

Latest

Please describe the bug 🐞

_SnapshotProducer.commit(), which is called whenever rows are added to or deleted from a table, is surprisingly slow. I traced this to _SnapshotProducer._summary(): for every added or deleted DataFile, _summary accesses the self._transaction.table_metadata property, which unnecessarily copies the metadata each time.

#1903 introduced this regression, and I don't believe the performance impact is as insignificant as stated there. The number of metadata copies grows linearly with the number of added or deleted data files, which is expensive for large writes.
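
To illustrate the pattern described above, here is a minimal, self-contained sketch. It does not use PyIceberg: FakeMetadata, FakeTransaction, summarize_per_file and summarize_hoisted are hypothetical names invented for this example, and the deep copy in the property is an assumption standing in for whatever copying the real table_metadata property does. It only shows why reading a copying property inside a per-file loop costs one copy per file, and how reading it once before the loop avoids that.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for table metadata (not a PyIceberg class)."""
    properties: dict = field(default_factory=dict)
    snapshots: list = field(default_factory=list)


class FakeTransaction:
    """Hypothetical transaction whose property copies the metadata on every access."""

    def __init__(self, metadata: FakeMetadata) -> None:
        self._metadata = metadata
        self.copies = 0  # how many times the property produced a copy

    @property
    def table_metadata(self) -> FakeMetadata:
        # Assumption: each access builds a fresh copy, as the issue describes.
        self.copies += 1
        return copy.deepcopy(self._metadata)


def summarize_per_file(txn: FakeTransaction, file_sizes: list[int]) -> int:
    """Reads the property inside the loop: one metadata copy per data file."""
    total = 0
    for size in file_sizes:
        bonus = txn.table_metadata.properties.get("bonus", 0)
        total += size + bonus
    return total


def summarize_hoisted(txn: FakeTransaction, file_sizes: list[int]) -> int:
    """Reads the property once before the loop: a single metadata copy."""
    metadata = txn.table_metadata
    bonus = metadata.properties.get("bonus", 0)
    return sum(size + bonus for size in file_sizes)


if __name__ == "__main__":
    sizes = [10, 20, 30]
    slow_txn = FakeTransaction(FakeMetadata(properties={"bonus": 1}))
    fast_txn = FakeTransaction(FakeMetadata(properties={"bonus": 1}))
    summarize_per_file(slow_txn, sizes)
    summarize_hoisted(fast_txn, sizes)
    print("copies (per-file access):", slow_txn.copies)
    print("copies (hoisted access):", fast_txn.copies)
```

Both functions compute the same result; only the number of copies differs. Whether hoisting is safe in the real _summary depends on whether the metadata can change between iterations, which this sketch does not model.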

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
