Improve Cubed support in xarray #30

@jsignell

Context

Lazy indexing (pydata/xarray#5081, zarr-developers/zarr-python#3906) is a great start, but people genuinely need to do work on data that is larger than the available memory. They need to be able to take advantage of on-disk chunks to divvy up that work.
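To make "divvy up work by on-disk chunks" concrete, here is a minimal standard-library sketch of a reduction that only ever holds one chunk in memory. It is an illustration of the idea, not how Cubed, Zarr, or xarray implement it; `chunk_slices`, `chunked_sum`, and `read_chunk` are hypothetical names, and `read_chunk` stands in for a real store read.

```python
from itertools import product
import math


def chunk_slices(shape, chunks):
    """Yield one tuple of slices per chunk of a regular chunk grid."""
    per_axis = [
        [slice(start, min(start + size, extent)) for start in range(0, extent, size)]
        for extent, size in zip(shape, chunks)
    ]
    yield from product(*per_axis)


def chunked_sum(read_chunk, shape, chunks):
    """Sum an array by loading a single chunk at a time.

    `read_chunk` is a hypothetical callable that returns the values
    inside the given slices as a flat iterable.
    """
    total = 0.0
    for key in chunk_slices(shape, chunks):
        total += math.fsum(read_chunk(key))  # only this chunk is in memory
    return total


# Stand-in "store": a 4x6 grid whose value at (i, j) is i * 6 + j.
def read_chunk(key):
    rows, cols = key
    return [
        i * 6 + j
        for i in range(rows.start, rows.stop)
        for j in range(cols.start, cols.stop)
    ]


slices = list(chunk_slices((4, 6), (2, 4)))
total = chunked_sum(read_chunk, (4, 6), (2, 4))
```

Because each chunk is reduced independently, the per-chunk work could in principle be handed to separate workers, which is the kind of scheduling a chunked array library takes care of.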

Proposal: Invest in Cubed

Cubed has a simplified API that is focused exclusively on arrays. It benefited from being created after Dask: it came out after the array API had solidified (design docs).

Improve xarray integration

Xarray needs functionality from the array libraries that is not included in the strict array API. That's why the ChunkManager API exists.
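As a rough sketch of why a separate layer is needed: operations such as rechunking or triggering computation are about chunks, not array math, so they have no place in an array API namespace. The toy below is a hypothetical, heavily simplified stand-in; the class and method names are illustrative assumptions and are not xarray's actual interface.

```python
from abc import ABC, abstractmethod


class ToyChunkManager(ABC):
    """Hypothetical, simplified stand-in for a chunk-manager interface."""

    @abstractmethod
    def is_chunked_array(self, data) -> bool:
        """Return True if this manager knows how to handle `data`."""

    @abstractmethod
    def rechunk(self, data, chunks):
        """Return `data` split into blocks of at most `chunks` elements."""

    @abstractmethod
    def compute(self, data):
        """Materialize `data` in memory."""


class ListOfChunks:
    """Toy 1-D 'chunked array': a list of Python lists."""

    def __init__(self, blocks):
        self.blocks = [list(block) for block in blocks]

    @property
    def chunks(self):
        return tuple(len(block) for block in self.blocks)


class ToyListManager(ToyChunkManager):
    def is_chunked_array(self, data):
        return isinstance(data, ListOfChunks)

    def rechunk(self, data, chunks):
        flat = self.compute(data)
        return ListOfChunks(flat[i:i + chunks] for i in range(0, len(flat), chunks))

    def compute(self, data):
        return [value for block in data.blocks for value in block]


manager = ToyListManager()
arr = ListOfChunks([[1, 2, 3], [4, 5, 6], [7]])
rechunked = manager.rechunk(arr, 2)
```

The calling code only talks to the manager, so a different chunked backend could be swapped in by providing another subclass.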

It would be useful to have Hypothesis tests for duck arrays. There are already strategies that Justus and Tom Nicholas worked on, but they might not actually be in use.

Improve local use

Questions

  • Does Cubed do anything for parallelization without a cluster? -> Yes.
  • Does Cubed do any task culling or optimization? -> Yes, some, but perhaps more is available.
  • Does Cubed write everything to disk? -> I think so. This makes it even more important to optimize the graph.
