Add cast pushdown optimization for bit-packed integer widening by joseph-isaacs · Pull Request #8046 · vortex-data/vortex

joseph-isaacs · 2026-05-21T12:58:09Z

Summary

This PR implements a "cast pushdown" optimization for widening casts on bit-packed integer columns (e.g., u16 -> u32). Rather than canonicalizing to a full-length intermediate array and then casting it, the optimization unpacks each FastLanes chunk into a cache-resident scratch buffer and casts values directly into the output buffer during decompression.

Running locally, I get:

  ┌────────────────────┬─────────────────┬──────────────────────┬─────────┐
  │ Case               │ Old public path │ Current cast_execute │ Speedup │
  ├────────────────────┼─────────────────┼──────────────────────┼─────────┤
  │ (65536, 1, 0.0)    │        71.68 us │             6.895 us │   10.4x │
  │ (65536, 16, 0.01)  │        2.229 ms │             423.4 us │    5.3x │
  │ (1048576, 1, 0.01) │        2.387 ms │             337.6 us │    7.1x │
  └────────────────────┴─────────────────┴──────────────────────┴─────────┘
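As a quick arithmetic check, the Speedup column is the ratio of the old time to the new time once both are expressed in the same unit (1 ms = 1000 µs). The sketch below recomputes it from the table's own numbers. The `speedup` helper and `ROWS` constant are illustrative only and are not part of the PR.

```rust
/// Speedup of the new path over the old one; both timings must share a unit.
fn speedup(old_us: f64, new_us: f64) -> f64 {
    old_us / new_us
}

/// Rows from the benchmark table, with every timing converted to microseconds
/// (2.229 ms = 2229 µs, 2.387 ms = 2387 µs).
const ROWS: [(&str, f64, f64); 3] = [
    ("(65536, 1, 0.0)", 71.68, 6.895),
    ("(65536, 16, 0.01)", 2_229.0, 423.4),
    ("(1048576, 1, 0.01)", 2_387.0, 337.6),
];
```

Rounded to one decimal place, the three ratios come out to 10.4x, 5.3x and 7.1x, which matches the reported column.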

Widening a bit-packed narrow integer column to a wider type (e.g. u16 -> u32) currently has no cast pushdown: cast(bit_packed) canonicalizes to a full-length narrow PrimitiveArray and then casts it, allocating two full-length buffers and round-tripping the narrow intermediate through RAM. Add `BitUnpackedChunks::decode_cast_into`, which unpacks each 1024-element FastLanes chunk into the existing cache-resident scratch buffer and maps each value through a closure into a differently-typed output, plus `unpack_and_cast_into_builder`, which uses it to unpack straight into a wide PrimitiveBuilder (handling validity and patches). Add a divan benchmark (cast_bitpacked) comparing the current canonicalize-then-cast path against the pushdown, over single and chunked arrays, with and without patches.

Signed-off-by: "Joe Isaacs" <joe.isaacs@live.co.uk>
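The two-stage flow (unpack one chunk into a small scratch buffer, then map each value through a closure into the wide output) can be sketched as below. This is a minimal illustration under stated assumptions, not the Vortex implementation. The `pack`/`unpack_one` helpers use a simple sequential LSB-first bit layout rather than the transposed FastLanes layout. `decode_cast_into` here is a hypothetical free function that is only loosely modeled on `BitUnpackedChunks::decode_cast_into`, with the source type fixed to `u16`. Validity and patches are left out.

```rust
/// Values per FastLanes chunk, as described in the PR.
const CHUNK_LEN: usize = 1024;

/// Hypothetical helper: packs `values` at `width` bits each, LSB-first, into
/// u64 words. This is a simple sequential layout, NOT the FastLanes layout.
fn pack(values: &[u16], width: u32) -> Vec<u64> {
    assert!((1..=16).contains(&width));
    let w = width as usize;
    let mut words = vec![0u64; (values.len() * w).div_ceil(64)];
    for (i, &v) in values.iter().enumerate() {
        debug_assert!(width == 16 || u32::from(v) < (1u32 << width));
        let (word, off) = ((i * w) / 64, (i * w) % 64);
        words[word] |= u64::from(v) << off;
        if off + w > 64 {
            // The value straddles two words; spill its high bits into the next one.
            words[word + 1] |= u64::from(v) >> (64 - off);
        }
    }
    words
}

/// Hypothetical helper: reads value `i` back out of the packed words.
fn unpack_one(words: &[u64], i: usize, width: u32) -> u16 {
    let w = width as usize;
    let (word, off) = ((i * w) / 64, (i * w) % 64);
    let mut bits = words[word] >> off;
    if off + w > 64 {
        bits |= words[word + 1] << (64 - off);
    }
    (bits & ((1u64 << width) - 1)) as u16
}

/// Sketch of the pushdown: each chunk is unpacked into a scratch buffer small
/// enough to stay in cache, then every value is mapped through `cast` straight
/// into the wide output. No full-length narrow intermediate is allocated.
fn decode_cast_into<T>(
    words: &[u64],
    len: usize,
    width: u32,
    cast: impl Fn(u16) -> T,
    out: &mut Vec<T>,
) {
    let mut scratch = [0u16; CHUNK_LEN];
    out.reserve(len);
    for start in (0..len).step_by(CHUNK_LEN) {
        let n = CHUNK_LEN.min(len - start);
        // Stage 1: unpack this chunk into the scratch buffer.
        for (j, slot) in scratch[..n].iter_mut().enumerate() {
            *slot = unpack_one(words, start + j, width);
        }
        // Stage 2: cast from scratch directly into the output.
        out.extend(scratch[..n].iter().map(|&v| cast(v)));
    }
}
```

Taking the cast as a closure keeps the chunk loop independent of the target type. The same routine can widen to `u32`, `u64` or a signed type by passing a different conversion.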

Extend BitPacked's CastKernel so that widening integer casts (e.g. u16 -> u32) dispatch to the unpack-and-cast pushdown automatically, instead of falling back to canonicalize-then-cast. The cast is gated to strictly wider integer targets where every bit-packable value is representable (unsigned source, or signed-to-signed), so no per-value bounds check is needed. Update the cast_bitpacked benchmark to measure the real array.cast(u32).execute() path alongside an explicit canonicalize-then-cast baseline and the direct helper.

Signed-off-by: "Joe Isaacs" <joe.isaacs@live.co.uk>
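The gating rule fits in a one-line predicate. The sketch below is a hypothetical illustration: `IntType` and `is_lossless_widening` are not Vortex types, and the real kernel dispatches on Vortex's own type representation.

```rust
/// An integer type described by its width and signedness (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct IntType {
    bits: u32,
    signed: bool,
}

const U8: IntType = IntType { bits: 8, signed: false };
const U16: IntType = IntType { bits: 16, signed: false };
const U32: IntType = IntType { bits: 32, signed: false };
const I16: IntType = IntType { bits: 16, signed: true };
const I32: IntType = IntType { bits: 32, signed: true };
const I64: IntType = IntType { bits: 64, signed: true };

/// Hypothetical predicate for the gating rule: the target must be strictly
/// wider, and the source must be unsigned or the cast must be signed-to-signed.
/// Under both conditions every source value fits in the target, so the
/// pushdown needs no per-value bounds check.
fn is_lossless_widening(from: IntType, to: IntType) -> bool {
    to.bits > from.bits && (!from.signed || to.signed)
}
```

For example, u16 -> i32 qualifies because all 16 value bits fit below i32's sign bit. In contrast, i16 -> u32 is rejected because negative values have no u32 representation, and u16 -> i16 is rejected because the target is not strictly wider.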

codspeed-hq · 2026-05-21T13:08:50Z

Merging this PR will improve performance by 19.8%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 1 improved benchmark
✅ 1236 untouched benchmarks

Performance Changes

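  ┌────────────┬───────────────────────────────────────────────────┬──────────┬──────────┬────────────┐
  │ Mode       │ Benchmark                                         │ BASE     │ HEAD     │ Efficiency │
  ├────────────┼───────────────────────────────────────────────────┼──────────┼──────────┼────────────┤
  │ Simulation │ chunked_varbinview_opt_canonical_into[(1000, 10)] │ 225.1 µs │ 187.9 µs │ +19.8%     │
  └────────────┴───────────────────────────────────────────────────┴──────────┴──────────┴────────────┘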


Comparing claude/cast-bitpacked-pushdown-VNtVh (dde5949) with develop (f852d72)

Signed-off-by: "Joe Isaacs" <joe.isaacs@live.co.uk>

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

Generalize apply_patches_to_uninit_range_fn to a cross-type Fn(S) -> T so the cast pushdown reuses it instead of a near-identical copy, and drop the redundant identity wrapper. Behaviour and performance are unchanged.

Signed-off-by: "Joe Isaacs" <joe.isaacs@live.co.uk>
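A cross-type patch step might look like the sketch below. It assumes patches are stored as parallel index/value arrays and that the output range may still be uninitialized. The name `apply_patches_cast` and its signature are hypothetical and do not reproduce the real `apply_patches_to_uninit_range_fn`.

```rust
use std::mem::MaybeUninit;

/// Hypothetical sketch: exception values of source type `S` are mapped through
/// `f: Fn(S) -> T` and written into an output range of type `T` that covers
/// absolute positions `range_start..range_start + out.len()`.
/// Passing an identity closure recovers the same-type case.
fn apply_patches_cast<S: Copy, T>(
    out: &mut [MaybeUninit<T>],
    range_start: usize,
    patch_indices: &[usize],
    patch_values: &[S],
    f: impl Fn(S) -> T,
) {
    let range_end = range_start + out.len();
    for (&idx, &val) in patch_indices.iter().zip(patch_values) {
        // Only patches that fall inside this output range are applied.
        // `write` does not drop any previous value, which is fine for integers.
        if (range_start..range_end).contains(&idx) {
            out[idx - range_start].write(f(val));
        }
    }
}
```

Because the closure already covers the identity case (`|v| v`), a separate same-type wrapper adds nothing. That is consistent with the commit's note that the identity wrapper was redundant.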

Replace the direct-kernel and direct-helper cast tests with a single end-to-end test that drives array.cast(target).execute(), proving the public Vortex path dispatches to BitPacked's widening pushdown across all supported integer pairs, chunk-boundary lengths, and a sliced case.

Signed-off-by: "Joe Isaacs" <joe.isaacs@live.co.uk>
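The core of a chunk-boundary and slicing sweep can be sketched as a comparison against a reference "materialize, then cast" path. This sketch rests on three assumptions. A plain copy stands in for bit-unpacking. Chunks are aligned to the start of the underlying (unsliced) array, so a slice whose offset is not a multiple of the chunk length begins mid-chunk. `chunked_widen` and `check_sweep` are hypothetical names, not code from the PR.

```rust
const CHUNK_LEN: usize = 1024;

/// Hypothetical chunked widening of the slice `[offset, offset + len)`.
fn chunked_widen(src: &[u16], offset: usize, len: usize) -> Vec<u32> {
    assert!(offset + len <= src.len());
    let mut out = Vec::with_capacity(len);
    let mut scratch = [0u16; CHUNK_LEN];
    let end = offset + len;
    // Start from the chunk that contains `offset`, not from `offset` itself.
    let mut chunk_start = offset / CHUNK_LEN * CHUNK_LEN;
    while chunk_start < end {
        let chunk_end = (chunk_start + CHUNK_LEN).min(src.len());
        let n = chunk_end - chunk_start;
        // "Unpack" the whole chunk into scratch (a copy stands in for bit-unpacking).
        scratch[..n].copy_from_slice(&src[chunk_start..chunk_end]);
        // Keep only the part of this chunk that lies inside the slice.
        let lo = offset.max(chunk_start) - chunk_start;
        let hi = end.min(chunk_end) - chunk_start;
        out.extend(scratch[lo..hi].iter().map(|&v| u32::from(v)));
        chunk_start += CHUNK_LEN;
    }
    out
}

/// Compares the chunked path with the reference path for lengths and offsets
/// on either side of the chunk boundaries.
fn check_sweep(src: &[u16]) {
    for &len in &[0usize, 1, 1023, 1024, 1025, 2048, 2049] {
        for &offset in &[0usize, 1, 1023, 1024, 1500] {
            if offset + len > src.len() {
                continue;
            }
            let reference: Vec<u32> =
                src[offset..offset + len].iter().map(|&v| u32::from(v)).collect();
            assert_eq!(chunked_widen(src, offset, len), reference, "offset={offset} len={len}");
        }
    }
}
```

Lengths of 1023, 1024 and 1025 probe the last partial chunk. Offsets such as 1 and 1500 probe a slice that starts mid-chunk. Both are the kinds of cases that off-by-one errors in a chunk loop tend to break.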

claude added 2 commits May 21, 2026 11:30

claude and others added 2 commits May 21, 2026 13:12

Fix clippy lints and refresh public-api.lock for cast pushdown

2010fd3

Signed-off-by: "Joe Isaacs" <joe.isaacs@live.co.uk>

u

b0d1f54

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs requested review from a team and robert3005 May 21, 2026 14:39

joseph-isaacs and others added 4 commits May 21, 2026 16:35

u

921b04b

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

u

69d7270

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs added the action/benchmark Trigger full benchmarks to run on this PR label May 21, 2026

github-actions Bot removed the action/benchmark Trigger full benchmarks to run on this PR label May 21, 2026

joseph-isaacs wants to merge 8 commits into develop from claude/cast-bitpacked-pushdown-VNtVh
