perf: use zarrs if available for full read/write#2326
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2326 +/- ##
==========================================
- Coverage 87.45% 85.50% -1.96%
==========================================
Files 49 49
Lines 7742 7753 +11
==========================================
- Hits 6771 6629 -142
- Misses 971 1124 +153
|
flying-sheep
left a comment
There was a problem hiding this comment.
what the conditions are for turning it on
Maybe have zarrs-python export a function that checks if it can (probably) handle something, and use that function here if zarrs-python is importable?
Co-authored-by: Philipp A. <flying-sheep@web.de>
It goes deeper than that. If you pass in an |
|
Got it! But this still means that it’d be a good idea to have that check live inside of zarrs-python, because then updates to zarrs-python wouldn’t need an anndata update to go with it in order to start working. |
But I would still want this to be custom. The check would be "what does zarrs not syncify from async but also supports directly" which seems a bit cumbersome to bake directly into the package. |
|
My point is that the check should be capability based, not version based. Unless I’m missing something, you’re encoding knowledge about what the current version of zarrs-python supports into a place that isn’t coupled to that zarrs-python version. Which we sometimes have to do when we don’t control both sides (API producer and consumer), but since we do, we should take the opportunity and couple the code that handles things with the knowledge if it can handle a thing. As said, I could be missing something, and there might be a good reason why this check will be correct forever, but I don’t know that reason. |
|
Ok you're right, I see now - if So the question to me is what do we do? Note that this problem gets worse, it turns out that passing in With this in mind, I am tempted to just say "if you pass in a |
|
When you pass a string to could we just |
Yes! Codec pipelines are not constructed until arrays are explicilty. This is wayyyy simpler, thanks! |
|
I think just "use zarrs" is a good idea - if people complain about slow network speeds or something, we can always just implement |
|
so it’ll fall back to the regular one if it can’t handle something? |
Exactly |
|
great, no objections then |
for more information, see https://pre-commit.ci
|
@flying-sheep Now for the last question: how do we test this? new CI job? Just trust that "this works"? |
|
If I think our CI matrix is very complicated already, I think we should simplify it before adding more. |
|
Nice so as of the last commit, this works for reading so we know that zarrs can handle the read side of anndata and we are catching any warnings. So I'll remove now from testing and we can hopefully proceed confidently. |
|
And now it works for writing. So I'm going to remove the dependencies and merge. |
zarrs if available for full readzarrs if available for full read/write
I'm not exactly sure how we want to test this (i.e., how do we deal with dependncies? include
zarrsintest?) or what the conditions are for turning it on (at the moment, it's that you gave us aLocalStoreor a path-like input but not a Group)..