fix(fsst): pick i32 vs i64 codes offsets per call#7836

Closed
connortsui20 wants to merge 5 commits into develop from claude/move-fsst-regression-test-7724L

connortsui20 (Contributor) commented May 7, 2026

Fixes #7833.

fsst_compress_iter previously hardcoded VarBinBuilder::<i32> for the
FSST output, panicking once cumulative compressed bytes crossed
i32::MAX.

Approach

fsst_compress now does an upfront pass over the input to compute the
total uncompressed byte count, then picks VarBinBuilder<i32> when the
FSST upper-bound compressed size (2 * total + 7 * n) fits in
i32::MAX, falling back to VarBinBuilder<i64> only when it would
actually overflow. The common case keeps the compact i32 layout; inputs
whose worst-case output exceeds i32::MAX get i64 offsets instead of panicking.
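
The dispatch described above can be sketched as follows. This is a minimal, std-only illustration and not the code in this PR: only the helper name upper_bound_fits_i32 and the bound 2 * total + 7 * n come from the description, while the signature, the OffsetWidth enum, and pick_offset_width are hypothetical names introduced here.

```rust
/// Returns true when the worst-case FSST output size, taken from the PR
/// description as `2 * total + 7 * n`, fits in `i32::MAX`.
/// The signature is an assumption; the PR only names the helper.
fn upper_bound_fits_i32(total_uncompressed_bytes: u64, n: u64) -> bool {
    // Checked arithmetic: an overflow of the bound itself means "does not fit".
    let bound = total_uncompressed_bytes
        .checked_mul(2)
        .zip(n.checked_mul(7))
        .and_then(|(a, b)| a.checked_add(b));
    matches!(bound, Some(b) if b <= i32::MAX as u64)
}

/// Hypothetical stand-in for choosing VarBinBuilder<i32> vs VarBinBuilder<i64>.
#[derive(Debug, PartialEq, Eq)]
enum OffsetWidth {
    I32,
    I64,
}

/// Upfront pass over the input lengths, then pick the offset width.
fn pick_offset_width(lens: &[u64]) -> OffsetWidth {
    // Sum the uncompressed lengths; treat an overflowing sum as too large.
    let total = lens.iter().try_fold(0u64, |acc, &l| acc.checked_add(l));
    match total {
        Some(t) if upper_bound_fits_i32(t, lens.len() as u64) => OffsetWidth::I32,
        _ => OffsetWidth::I64,
    }
}
```

With n = 1, the largest total that still selects i32 under this bound is 1_073_741_820 bytes, since 2 * 1_073_741_820 + 7 equals i32::MAX exactly.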

fsst_compress_iter (single-pass iterator API) keeps its public
signature and now always uses i64. It can't size-estimate without
consuming the iterator, and direct callers are test-only.

vortex-array/src/arrays/varbin/compute/compare.rs: latent bug surfaced
by the i64 path. With i64 offsets, Datum::try_new produces an Arrow
LargeBinary/LargeUtf8, but the RHS scalar was hardcoded to
Binary/Utf8. Arrow refuses LargeBinary == Binary. The RHS now
picks the matching Arrow scalar from lhs.data_type().
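
The type mismatch can be modeled without the arrow crate. The sketch below is a toy under stated assumptions: ArrowType is a local enum standing in for Arrow's DataType, and rhs_type_hardcoded, rhs_type_matching, and can_compare are hypothetical names, not functions from compare.rs or arrow-rs.

```rust
/// Local stand-in for the four Arrow variable-length types mentioned above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArrowType {
    Binary,
    LargeBinary,
    Utf8,
    LargeUtf8,
}

/// Old behaviour as described: the RHS scalar type ignores the offset width.
fn rhs_type_hardcoded(is_utf8: bool) -> ArrowType {
    if is_utf8 { ArrowType::Utf8 } else { ArrowType::Binary }
}

/// New behaviour as described: the RHS scalar type mirrors the LHS type.
fn rhs_type_matching(lhs: ArrowType) -> ArrowType {
    lhs
}

/// Models the rule that comparison operands must share the same type.
fn can_compare(lhs: ArrowType, rhs: ArrowType) -> bool {
    lhs == rhs
}
```

In this model, an i64-offset LHS (LargeBinary or LargeUtf8) fails can_compare against the hardcoded RHS and passes once the RHS is derived from the LHS type, which mirrors the failure and fix described above.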

Tests

The previous ~5 GiB #[ignore]d regression test is replaced with:

  • Three boundary unit tests of the upper_bound_fits_i32 helper.
  • A small round-trip that asserts a fresh FSST array keeps i32
    codes_offsets for typical inputs.

The i64 path is exercised only for inputs whose worst-case compressed
size exceeds i32::MAX, which is too expensive to test directly; the
boundary unit tests cover the dispatch.

Checks

  • cargo test -p vortex-fsst --lib — 83 passed (was 78 before this PR).
  • cargo test -p vortex-array --lib arrays::varbin — 93 passed.
  • cargo test -p vortex-btrblocks --lib — 35 passed.
  • cargo +nightly fmt -- --check, cargo clippy -p vortex-fsst -p vortex-array --all-targets --all-features — clean.
  • ./scripts/public-api.sh — no public-api.lock changes.

🤖 Generated with Claude Code

Labels

changelog/fix A bug fix

Development

Successfully merging this pull request may close these issues.

FSST: fsst_compress panics on cumulative output >2 GiB (i32 offset overflow)

4 participants