feat: AnnData.can_write based on AnnData.reduce + refactored AnnData.__sizeof__#2372
feat: AnnData.can_write based on AnnData.reduce + refactored AnnData.__sizeof__#2372
AnnData.can_write based on AnnData.reduce + refactored AnnData.__sizeof__#2372Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2372 +/- ##
==========================================
- Coverage 87.45% 85.59% -1.87%
==========================================
Files 49 49
Lines 7742 7801 +59
==========================================
- Hits 6771 6677 -94
- Misses 971 1124 +153
|
AnnData.can_write based on AnnData.foldAnnData.can_write based on AnnData.fold + refactored AnnData.__sizeof__
|
Just looking for a first-pass vibe check on this! |
flying-sheep
left a comment
There was a problem hiding this comment.
Very exciting! I think this mainly needs some serious documenting and examples, not much else.
For examples, maybe first something that just handles one kind of accessor/reference first to show how one can short-circuit by returning the accumulator early.
Regarding naming: the canonical name in the stdlib is reduce, it even used to be in builtins, but maybe the scientific python world deviates for no good reason again. It’s also fair if you have other objections to reduce!
src/anndata/_core/anndata.py
Outdated
| (is_ad_ref := isinstance(ref_acc, AdRef)) | ||
| or ( | ||
| is_ref_acc := isinstance( | ||
| ref_acc, LayerAcc | MultiAcc | GraphAcc | ||
| ) | ||
| ) |
There was a problem hiding this comment.
this is
isinstance(ref_acc, AdRef | RefAcc) and not isinstance(ref_acc, MetaAcc)because you want to handle each homogeneous array.
I think there are some ways to make this more ergonomic, e.g.:
- we can add more marker superclasses that handle this use case, or
- we could change the implementation of fold/reduce so it descends into things and returning or raising some marker would prevent descent. that might help handling things as early as possible and therefore having less complicated logic.
|
Hi @ilan-gold, interesting direction! The HTML repr PR (#2236) intersects with HTML repr (#2236): traversal overlapThe SECTION_ORDER = ("X", "obs", "var", "uns", "obsm", "varm", "layers", "obsp", "varp", "raw")
for section in SECTION_ORDER:
parts.append(_render_section(adata, section, context))This is structurally very similar to
On the positive side, the per-column iteration of obs/var via The repr would naturally use DFS-pre order (render section header first, then its entries), so it's good that option exists. If Serialization checks:
|
Great catch! I think we currently don't have a path accessor for it, so I think that's the same thing as
Phil brought this up in a call on Friday as well, not sure (again) what is to be done
I'm not sure I follow this line of thinking fully - why can't you distinguish between these two now? If you are doing
That could make sense. Once you merge this into your PR we can refactor this if needed + work out more kinks |
|
|
||
| sizes = {} | ||
| attrs = ["X", "_obs", "_var"] | ||
| attrs_multi = ["_uns", "_obsm", "_varm", "varp", "_obsp", "_layers"] |
There was a problem hiding this comment.
FWIW this never handled raw anyway
|
See the
|
|
Hi @ilan-gold, thanks for the updates! The rename to I was curious how Before getting into findings: is What worked well: The shared IO registry utilities are a clear win. I extracted What I ran into:
Sounds like a good plan. Both Nested AnnData in Understood that this is still an open question. For reference, the HTML repr already handles this case with a simple depth counter: when it encounters an Net assessment: For Happy to walk through the code in more detail. The experiment branch has |
AnnData.can_write based on AnnData.fold + refactored AnnData.__sizeof__AnnData.can_write based on AnnData.reduce + refactored AnnData.__sizeof__



maybe TODOs:
fold?reduce?write_{zarr,h5ad}to prevent partial stores from being written?We introduce
AnnData.foldandAnnData.can_writefor "crawling" anAnnDataand accumulating, implementing acan_writeto tell if anAnnDatacan be written to a given format.I'm also open to
reduceas the name instead offold.For #2236, I think we could implement this more complexly by building on
foldand accumulating asetofAdRef(or similar) objects pointing at what can't be written. Open to implement this! I reimplemented__sizeof__to stress-test using reference-accessors and it went ok, but it's not amazingly smooth (likeunsand trying to figure out if you are in a "top-level" elem likeobsorvarm)If zarr-developers/zarr-python#3613 ever happens (I hope so) or we do #2238, this will be crucial to making things smoother with hdf5 for preventing incompatibilities.
backedinternal checking #2369 and moving towards something raised in feat: Add HTML representation #2236