Issue in multiple appends in one transaction #1946

@dor-bernstein

Apache Iceberg version

None

Please describe the bug 🐞

Hey,
I have a large Arrow table that I want to append to a partitioned Iceberg table.
I'm working locally with Docker, using tabulario/iceberg-rest:1.6.0 as my REST catalog.
To avoid OOMs, I split the Arrow table into chunks. With separate regular appends, everything works as expected. However, I want to append all the data in a single transaction, and I have this code for that:

    with table.transaction() as tx:
        for offset in range(0, data.num_rows, MAX_APPEND_CHUNK_SIZE):
            data_slice = data.slice(offset, MAX_APPEND_CHUNK_SIZE)
            logger.info(f'Writing batch of {data_slice.num_rows} with offset {offset} to table {table.name()}')
            tx.append(data_slice)
        tx.commit_transaction()

The table is empty and was created in a separate task.
The commit fails with the following error: CommitFailedException: Requirement failed: branch main was created concurrently.
When I retry, I get a different error: pyiceberg.exceptions.CommitFailedException: CommitFailedException: Requirement failed: branch main has changed: expected id 4547037169132709864 != 132570956257248456.
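
One thing that may be worth ruling out (an assumption, not something confirmed in this report): if the transaction's context manager already commits when the `with` block exits cleanly, then the explicit `tx.commit_transaction()` call inside the block would lead to two commit attempts carrying the same staged changes and requirements. The second attempt would then find that branch main already exists, which would be consistent with the first error. The sketch below is a toy model only. `ToyTransaction` and `write_chunks` are hypothetical names, and the class is a stand-in that does not reproduce PyIceberg's implementation. It just counts commit attempts for the same chunked-append loop, with and without the explicit call.

```python
class ToyTransaction:
    """Hypothetical stand-in for a transaction; not PyIceberg's class."""

    def __init__(self):
        self.pending = []   # batches staged by append()
        self.commits = []   # one entry per commit attempt

    def append(self, batch):
        self.pending.append(batch)

    def commit_transaction(self):
        # Record what this attempt would send. Staged changes are not
        # cleared, so a second attempt resends the same changes.
        self.commits.append(list(self.pending))

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Assumed behavior: commit automatically on a clean exit.
        if exc_type is None:
            self.commit_transaction()
        return False


def write_chunks(rows, chunk_size, explicit_commit):
    """Stage rows in chunks, mirroring the loop in the report."""
    tx = ToyTransaction()
    with tx:
        for offset in range(0, len(rows), chunk_size):
            tx.append(rows[offset:offset + chunk_size])
        if explicit_commit:
            tx.commit_transaction()  # extra call inside the with block
    return tx.commits


rows = list(range(10))
double = write_chunks(rows, 4, explicit_commit=True)
single = write_chunks(rows, 4, explicit_commit=False)
print(len(double), len(single))  # 2 1
```

If that assumption holds for the PyIceberg version in use, dropping the explicit `tx.commit_transaction()` and letting the `with` block commit on exit would leave a single commit attempt.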

Any help would be appreciated,
Thanks!

Willingness to contribute

  • I can contribute a fix for this bug independently
  • I would be willing to contribute a fix for this bug with guidance from the Iceberg community
  • I cannot contribute a fix for this bug at this time
