Skip to content

perf: use zarrs if available for full read/write#2326

Merged
ilan-gold merged 23 commits intomainfrom
ig/zarrs_default
Apr 9, 2026
Merged

perf: use zarrs if available for full read/write#2326
ilan-gold merged 23 commits intomainfrom
ig/zarrs_default

Conversation

@ilan-gold
Copy link
Copy Markdown
Contributor

@ilan-gold ilan-gold commented Feb 3, 2026

I'm not exactly sure how we want to test this (i.e., how do we deal with dependncies? include zarrs in test?) or what the conditions are for turning it on (at the moment, it's that you gave us a LocalStore or a path-like input but not a Group)..

  • Requested by @Zethson at some point, but also makes sense.
  • Tests added
  • Release note added (or unnecessary)

@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 3, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.50%. Comparing base (082ee74) to head (898a8e8).
⚠️ Report is 1 commits behind head on main.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/anndata/_io/zarr.py 83.33% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2326      +/-   ##
==========================================
- Coverage   87.45%   85.50%   -1.96%     
==========================================
  Files          49       49              
  Lines        7742     7753      +11     
==========================================
- Hits         6771     6629     -142     
- Misses        971     1124     +153     
Files with missing lines Coverage Δ
src/anndata/_io/zarr.py 82.10% <83.33%> (+2.34%) ⬆️

... and 8 files with indirect coverage changes

Copy link
Copy Markdown
Member

@flying-sheep flying-sheep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what the conditions are for turning it on

Maybe have zarrs-python export a function that checks if it can (probably) handle something, and use that function here if zarrs-python is importable?

Comment thread src/anndata/_io/zarr.py Outdated
Co-authored-by: Philipp A. <flying-sheep@web.de>
@ilan-gold
Copy link
Copy Markdown
Contributor Author

ilan-gold commented Feb 5, 2026

Maybe have zarrs-python export a function that checks if it can (probably) handle something, and use that function here if zarrs-python is importable?

It goes deeper than that. If you pass in an ObsStore for example or HTTPStore, zarrs can handle that but it will just sync-ify all of the ops. I would guess we don't want that since I'm not sure about the performance guarantees there. We could also just let it though.

@flying-sheep
Copy link
Copy Markdown
Member

Got it! But this still means that it’d be a good idea to have that check live inside of zarrs-python, because then updates to zarrs-python wouldn’t need an anndata update to go with it in order to start working.

@ilan-gold
Copy link
Copy Markdown
Contributor Author

ilan-gold commented Mar 13, 2026

But this still means that it’d be a good idea to have that check live inside of zarrs-python, because then updates to zarrs-python wouldn’t need an anndata update to go with it in order to start working.

But I would still want this to be custom. The check would be "what does zarrs not syncify from async but also supports directly" which seems a bit cumbersome to bake directly into the package.

@flying-sheep
Copy link
Copy Markdown
Member

flying-sheep commented Mar 13, 2026

My point is that the check should be capability based, not version based.

Unless I’m missing something, you’re encoding knowledge about what the current version of zarrs-python supports into a place that isn’t coupled to that zarrs-python version.

Which we sometimes have to do when we don’t control both sides (API producer and consumer), but since we do, we should take the opportunity and couple the code that handles things with the knowledge if it can handle a thing.

As said, I could be missing something, and there might be a good reason why this check will be correct forever, but I don’t know that reason.

@ilan-gold
Copy link
Copy Markdown
Contributor Author

Ok you're right, I see now - if zarrs starts supporting a new store, we could end up in a situation where we update things here to reflect that but zarrs is still the old version in someone's env.

So the question to me is what do we do? Note that this problem gets worse, it turns out that passing in {} would also be valid input and initialize a memory store, but for remote datasets, it seems like you need a credential string of some sort even anonymous: https://zarr.readthedocs.io/en/stable/user-guide/storage/#implicit-store-creation

With this in mind, I am tempted to just say "if you pass in a string, we accelerate to zarrs because it is a local path" and otherwise we don't. So we would only accelerate in the case when the input is a str. Would this work? Maybe we should ask zarr folks if this is a valid assumption

@flying-sheep
Copy link
Copy Markdown
Member

flying-sheep commented Mar 13, 2026

When you pass a string to zarr.open, it’s interpreted as “path or FSSpec URI” like this, so we should use the same logic if possible:

https://github.com/zarr-developers/zarr-python/blob/a02d9961618f548f8bc398c1124d707e7a9819f7/src/zarr/storage/_common.py#L442

could we just zarr.open and then check if it’s a FSSpecStore or LocalStore?

@ilan-gold
Copy link
Copy Markdown
Contributor Author

could we just zarr.open and then check if it’s a FSSpecStore or LocalStore?

Yes! Codec pipelines are not constructed until arrays are explicilty. This is wayyyy simpler, thanks!

@ilan-gold ilan-gold added this to the 0.13.0 milestone Mar 18, 2026
@ilan-gold
Copy link
Copy Markdown
Contributor Author

I think just "use zarrs" is a good idea - if people complain about slow network speeds or something, we can always just implement async in zarrs-python or provide an option to disable this

@flying-sheep
Copy link
Copy Markdown
Member

so it’ll fall back to the regular one if it can’t handle something?

@ilan-gold
Copy link
Copy Markdown
Contributor Author

so it’ll fall back to the regular one if it can’t handle something?

Exactly

@flying-sheep
Copy link
Copy Markdown
Member

great, no objections then

@ilan-gold ilan-gold marked this pull request as ready for review March 26, 2026 10:59
@ilan-gold
Copy link
Copy Markdown
Contributor Author

@flying-sheep Now for the last question: how do we test this? new CI job? Just trust that "this works"?

@flying-sheep
Copy link
Copy Markdown
Member

flying-sheep commented Mar 26, 2026

If zarrs-python falls back to the regular pipeline when it doesn’t support something, I think trusting it makes sense. Maybe test it on the zarrs-python side?

I think our CI matrix is very complicated already, I think we should simplify it before adding more.

@ilan-gold ilan-gold self-assigned this Apr 2, 2026
@ilan-gold
Copy link
Copy Markdown
Contributor Author

Nice so as of the last commit, this works for reading so we know that zarrs can handle the read side of anndata and we are catching any warnings. So I'll remove now from testing and we can hopefully proceed confidently.

@ilan-gold
Copy link
Copy Markdown
Contributor Author

And now it works for writing. So I'm going to remove the dependencies and merge.

@ilan-gold ilan-gold changed the title perf: use zarrs if available for full read perf: use zarrs if available for full read/write Apr 9, 2026
@ilan-gold ilan-gold merged commit 884cbc8 into main Apr 9, 2026
33 of 39 checks passed
@ilan-gold ilan-gold deleted the ig/zarrs_default branch April 9, 2026 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants