perf: use set_virtual_refs_arr for faster icechunk writes #967

Draft
TomNicholas wants to merge 2 commits into zarr-developers:main from TomNicholas:use-set-virtual-refs-arr

@TomNicholas
Member

Summary

  • Replaces the Python loop that creates per-chunk VirtualChunkSpec objects with a single call to store.set_virtual_refs_arr(), passing the manifest's numpy arrays directly to Rust
  • ~2.8x faster for writing virtual refs against object storage (12.4s → 4.5s for 6 vars × 520k chunks each against Arraylake)

Depends on earth-mover/icechunk#2049.
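
The change described in the summary can be illustrated with a toy sketch. Everything here is a hypothetical stand-in: `ToyStore`, `ChunkRef`, `set_refs`, and `set_refs_arr` are invented for illustration and are not the icechunk API, whose actual `set_virtual_refs_arr` signature is defined in earth-mover/icechunk#2049. Plain Python lists stand in for the manifest's numpy arrays, and the chunk grid is assumed to be flattened in C order. The sketch only shows that the per-chunk-object path and the columnar path describe the same references; it does not reproduce the reported speedup.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical per-chunk record (a stand-in, not icechunk's VirtualChunkSpec)."""
    index: tuple
    location: str
    offset: int
    length: int


class ToyStore:
    """Hypothetical in-memory store used only for this illustration."""

    def __init__(self):
        self.refs = {}

    def set_refs(self, specs):
        # Old shape: the caller builds one Python object per chunk first.
        for spec in specs:
            self.refs[spec.index] = (spec.location, spec.offset, spec.length)

    def set_refs_arr(self, grid_shape, locations, offsets, lengths):
        # New shape: three flat columns plus the chunk grid shape, in one call.
        # Assumption: columns are in C order (last grid axis varies fastest),
        # which is the order itertools.product yields.
        indices = product(*(range(n) for n in grid_shape))
        for index, loc, off, ln in zip(indices, locations, offsets, lengths):
            self.refs[index] = (loc, off, ln)


grid_shape = (2, 3)
locations = ["s3://bucket/file.nc"] * 6   # made-up location
offsets = [100 * i for i in range(6)]     # made-up byte offsets
lengths = [100] * 6                       # made-up byte lengths

# Per-chunk path: a Python loop creates one object per chunk.
loop_store = ToyStore()
specs = [
    ChunkRef(idx, loc, off, ln)
    for idx, loc, off, ln in zip(
        product(range(2), range(3)), locations, offsets, lengths
    )
]
loop_store.set_refs(specs)

# Columnar path: the flat columns are handed over as they are.
bulk_store = ToyStore()
bulk_store.set_refs_arr(grid_shape, locations, offsets, lengths)

assert loop_store.refs == bulk_store.refs
```

In the real writer the saving would come from skipping the per-chunk object construction in Python and letting the Rust side iterate over the arrays, which this toy cannot show.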

Replaces the Python loop that creates per-chunk VirtualChunkSpec objects
with a single call to store.set_virtual_refs_arr(), passing the manifest's
numpy arrays directly to Rust. ~2.8x faster for writing virtual refs
against object storage (12.4s → 4.5s for 6 vars × 520k chunks each).

Requires icechunk with set_virtual_refs_arr (earth-mover/icechunk#2049).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@TomNicholas added the performance and Icechunk 🧊 (Relates to Icechunk library / spec) labels on Apr 10, 2026
@codecov

codecov bot commented Apr 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.06%. Comparing base (7a9e5fd) to head (9e13574).

❌ Your project check has failed because the head coverage (73.06%) is below the target coverage (75.00%). You can increase the head coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (7a9e5fd) and HEAD (9e13574).

HEAD has 11 uploads less than BASE

| Flag | BASE (7a9e5fd) | HEAD (9e13574) |
|---|---|---|
| unittests | 12 | 1 |

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #967       +/-   ##
===========================================
- Coverage   89.31%   73.06%   -16.26%     
===========================================
  Files          33       33               
  Lines        2031     2027        -4     
===========================================
- Hits         1814     1481      -333     
- Misses        217      546      +329     

| Files with missing lines | Coverage Δ |
|---|---|
| virtualizarr/writers/icechunk.py | 14.19% <ø> (-77.17%) ⬇️ |

... and 14 files with indirect coverage changes

Labels

  • Icechunk 🧊: Relates to Icechunk library / spec
  • performance
  • test-upstream: Run the upstream tests on this PR
