Context
Lazy indexing (pydata/xarray#5081, zarr-developers/zarr-python#3906) is a great start. But people genuinely need to do work that is larger than the available memory. They need to be able to take advantage of on-disk chunks to divvy up the work.
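To make "divvying up work by chunk" concrete, here is a minimal sketch in plain Python. It is not how xarray, Zarr, or Cubed implement this; `chunk_slices`, `chunked_sum`, and `read_region` are hypothetical names, and a nested list stands in for an on-disk store. The point is only that a reduction can be computed while holding one chunk in memory at a time.

```python
from itertools import product
from math import ceil


def chunk_slices(shape, chunks):
    """Yield one tuple of slices per chunk of a regular chunk grid."""
    grid = [range(ceil(n / c)) for n, c in zip(shape, chunks)]
    for index in product(*grid):
        yield tuple(
            slice(i * c, min((i + 1) * c, n))  # clip the trailing partial chunk
            for i, c, n in zip(index, chunks, shape)
        )


def chunked_sum(read_region, shape, chunks):
    """Sum a 2-D array chunk by chunk, loading one chunk at a time."""
    total = 0
    for region in chunk_slices(shape, chunks):
        block = read_region(region)  # only this chunk is materialized
        total += sum(sum(row) for row in block)
    return total


# Stand-in for an on-disk store (assumption): a 4 x 6 nested list.
data = [[r * 6 + c for c in range(6)] for r in range(4)]


def read_region(region):
    """Hypothetical reader that returns the rows and columns of one chunk."""
    rows, cols = region
    return [row[cols] for row in data[rows]]


result = chunked_sum(read_region, shape=(4, 6), chunks=(2, 4))
```

Each chunk is independent here, so the loop body is the unit of work that a chunk-aware framework could schedule in parallel or within a fixed memory budget.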
Proposal: Invest in Cubed
Cubed has a simplified API that is focused exclusively on arrays. It benefited from being created after Dask: by then the array API had solidified, so Cubed could be designed around it (design docs).
Improve xarray integration
Xarray needs functionality from the array libraries that is not included in the strict array API. That's why the ChunkManager API exists.
It would be useful to have Hypothesis tests for duck arrays. Justus and Tom Nicholas have already worked on strategies for this, but they might not actually be in use.
Improve local use
Questions
- Does Cubed do anything for parallelization without a cluster? -> Yes.
- Does Cubed do any task culling or optimization? -> Yes, some, but perhaps more is possible.
- Does Cubed write everything to disk? -> I think so. This makes it even more important to optimize the graph.