@010Soham 010Soham commented Dec 19, 2025

What does this change do?

When `write.target-file-size-bytes` is smaller than the size of a single row, bin packing computed a chunk size of 0 rows and PyArrow raised a `ValueError`. This change clamps the chunk size to at least 1, so writes still succeed (one row per batch/file when needed).
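
For illustration only, the clamping idea can be sketched in plain Python. The helper name `rows_per_chunk` and its parameters are hypothetical and are not taken from the PyIceberg code base; the sketch assumes the chunk size is derived by dividing the target file size by the average row size.

```python
def rows_per_chunk(total_bytes: int, num_rows: int, target_file_size: int) -> int:
    """Hypothetical helper: how many rows to put in each chunk.

    Assumes the chunk size is target_file_size divided by the average
    row size, which floors to 0 when a single row exceeds the target.
    """
    if num_rows <= 0 or total_bytes <= 0:
        # Nothing meaningful to average; fall back to the smallest valid size.
        return 1
    avg_row_size_bytes = total_bytes / num_rows
    # Floor division yields 0 when one row is larger than the target.
    unclamped = int(target_file_size // avg_row_size_bytes)
    # Clamp to at least 1 so the chunk size is never 0.
    return max(1, unclamped)


# A 10-row table of ~1000-byte rows with a 100-byte target: one row per chunk.
small_target = rows_per_chunk(total_bytes=10_000, num_rows=10, target_file_size=100)
# The same table with a 2500-byte target: two rows per chunk.
larger_target = rows_per_chunk(total_bytes=10_000, num_rows=10, target_file_size=2_500)
```

With the clamp in place, an oversized row ends up in its own chunk instead of producing a zero chunk size.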

Why is this needed?

Fixes a crash when users set a small target file size and attempt to write large records.

How was this tested?

  • make lint
  • uv run python -m pytest tests/io/test_pyarrow.py -k "bin_pack_arrow_table" -v
  • make test (timed out at ~42%)

Closes #2795

@Fokko Fokko left a comment

Good catch @010Soham, thanks for fixing this 👍

@Fokko Fokko merged commit 75ef45d into apache:main Dec 22, 2025
8 checks passed
Development

Successfully merging this pull request may close these issues.

Tables with small write.target-file-size-bytes fail to ingest very large records
