Review of first notebook 01_io_and_attributes.ipynb

Just a review of the first notebook content:

- First, in general, why rely on pip more than Conda for all the packages?
- Notebooks should probably be committed without cell outputs.
- `show(ds_rasterio)` leads to a memory error on Binder (with 2GB of memory). This also happened several other times when too many cells were executed and the data was loaded by different libraries.
- After `data = ds_rasterio.read()`, 1.7GB are used.
- > Here, we see that our chunks have the size of 121 MB (11264 x 11264), which is too big and can lead to memory overload

  This is not necessarily too big; about 100MB is good with big collections. In the case of a single EO product, this is probably too big, and not aligned with how the underlying arrays are laid out in their corresponding files.
- > This comes from the fact that the original rasters is not chunked on disk

  Not sure about that: even if the files were chunked, are you sure rioxarray would use this knowledge for the default chunk size?
- > (the ideal recommanded size is generally between 10~50 MB)

  I would say somewhere between 10 and 200 MB, depending on the dataset size.
- > rechunking can take a bit of time

  More than that: rechunking is a heavy operation that must be avoided.
- I've never used odc-geo, but I think there is some magic that should be explained: how can we get the odc accessor when loading data through rioxarray?
- GCP part: cannot be executed in binder, there is no "data" folder in the repository.
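
To make the chunk size discussion concrete, here is a minimal sketch of the arithmetic. It assumes 1 byte per pixel (e.g. `uint8`), which is what the quoted 121 MB for an 11264 x 11264 chunk implies. The helper names and the `multiple=256` alignment are hypothetical; 256 only stands in for whatever block size the files actually use on disk.

```python
import math

MIB = 1024 ** 2  # bytes per mebibyte


def chunk_mib(shape, itemsize):
    """Return the in-memory size of one chunk, in MiB."""
    return math.prod(shape) * itemsize / MIB


def square_chunk_edge(target_mib, itemsize, multiple=256):
    """Largest square chunk edge that is a multiple of `multiple`
    and whose chunk stays within `target_mib` (an integer)."""
    # Largest edge (in pixels) whose square fits in the byte budget.
    max_edge = math.isqrt(target_mib * MIB // itemsize)
    # Round down to the alignment, but never below one aligned block.
    return max(multiple, max_edge // multiple * multiple)


# The chunk reported in the notebook, assuming 1 byte per pixel.
print(chunk_mib((11264, 11264), 1))  # 121.0

# A square chunk of at most 50 MiB, aligned to the assumed 256-pixel blocks.
edge = square_chunk_edge(50, 1)
print(edge, chunk_mib((edge, edge), 1))  # 7168 49.0
```

Picking the chunk shape when the data is first opened, rather than rechunking afterwards, avoids the cost mentioned above.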
