Add 1-d chunking #119
Conversation
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #119      +/-   ##
==========================================
- Coverage   97.32%   96.88%   -0.45%
==========================================
  Files           6        7       +1
  Lines         337      385      +48
  Branches       56       62       +6
==========================================
+ Hits          328      373      +45
- Misses          7        8       +1
- Partials        2        4       +2
Also switch to 32MiB chunks
Closes tskit-dev#69
Closes tskit-dev#118
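For illustration, here is a minimal sketch of what writing a 1-d column with roughly 32 MiB chunks can look like, assuming the zarr v2-style API. The store path, column name, and the `items_per_chunk` helper are hypothetical and not taken from this PR.

```python
import numcodecs
import numpy as np
import zarr


def items_per_chunk(dtype, target_bytes=32 * 2**20):
    # Choose a 1-d chunk length so each uncompressed chunk is ~32 MiB.
    return max(1, target_bytes // np.dtype(dtype).itemsize)


data = np.arange(50_000_000, dtype=np.int64)  # stand-in for a very long column

root = zarr.open("example.zarr", mode="w")    # hypothetical store
col = root.create_dataset(
    "nodes/time",                             # hypothetical column name
    shape=data.shape,
    dtype=data.dtype,
    chunks=(items_per_chunk(data.dtype),),    # 1-d chunking
    compressor=numcodecs.Blosc(cname="zstd", clevel=7),
)
col[:] = data
```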
I think this is ready to go in once we've validated that it actually fixes compression on very large files. I don't have an example lying around handy (raw sc2ts output triggers this, which is what motivated me to do it!). I checked compression performance on the final sc2ts output, and the files are very slightly larger; it's not worth worrying about.
I can confirm that this can store columns that are this long, but there are now nasty performance gotchas in the decoding stage, which depends on zarr's "iter" implementation over arrays (which is awful). I'm looking to simplify some of the logic in here while I'm at it, as it seems unnecessarily convoluted.
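A hedged sketch of one way around that gotcha: read whole chunks and iterate over the decoded numpy blocks, rather than iterating the zarr array element by element (which can re-decode a chunk per item). The function and array names below are illustrative, not code from this PR.

```python
import zarr


def iter_chunk_blocks(array):
    """Yield decoded numpy blocks, one per stored chunk along axis 0."""
    chunk_len = array.chunks[0]
    for start in range(0, array.shape[0], chunk_len):
        yield array[start : start + chunk_len]


# Usage (hypothetical store and column):
# col = zarr.open("example.zarr", mode="r")["nodes/time"]
# for block in iter_chunk_blocks(col):
#     for value in block:   # plain numpy iteration, no repeated decoding
#         ...
```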
WIP for #118