Description
Just a review of the content of the first notebook:
- First, in general: why rely on pip rather than Conda for all the packages?
- Notebooks should probably be committed without cell outputs.
- `show(ds_rasterio)` leads to a memory error on Binder (which has 2 GB of memory). This has also happened several other times, when too many cells were executed and the data was loaded by different libraries.
- After `data = ds_rasterio.read()`, 1.7 GB of memory are in use.
- "Here, we see that our chunks have the size of 121 MB (11264 x 11264), which is too big and can lead to memory overload"
  This is not necessarily too big: about 100 MB is a good chunk size for large collections. For a single EO product, though, it is probably too big, and it is not aligned with how the underlying arrays are laid out in their files.
- "This comes from the fact that the original rasters is not chunked on disk"
  I'm not sure about that: are you sure that, even if the files were chunked, rioxarray would use this information for its default chunk size?
- "(the ideal recommanded size is generally between 10~50 MB)"
  I would say somewhere between 10 and 200 MB, depending on the dataset size.
- "rechunking can take a bit of time"
  It's more than that: rechunking is a heavy operation that must be avoided.
- I've never used odc-geo, but I think there is some magic here that should be explained: how do we get the `odc` accessor when loading data through rioxarray?
- GCP part: it cannot be executed on Binder, because there is no `data` folder in the repository.
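On committing notebooks without outputs: tools such as nbstripout (which can run as a git filter) handle this automatically, but the underlying operation is small. A minimal sketch using only the standard library, assuming the nbformat v4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); the function names are hypothetical:

```python
import json
from pathlib import Path


def strip_outputs(nb: dict) -> dict:
    """Clear outputs and execution counts of all code cells (nbformat v4 layout assumed)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def strip_notebook_file(path: str) -> None:
    """Rewrite an .ipynb file in place without any cell outputs (hypothetical helper)."""
    p = Path(path)
    nb = json.loads(p.read_text(encoding="utf-8"))
    # indent=1 mirrors the indentation Jupyter itself uses when saving notebooks.
    p.write_text(json.dumps(strip_outputs(nb), indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
```

Running `strip_notebook_file` on each notebook before committing (or from a pre-commit hook) keeps diffs limited to code and markdown.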
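On the chunk sizes (the 121 MB chunks and the 10 to 200 MB range): these numbers are easy to compute before loading anything. A minimal sketch, assuming a single uint8 band; under that assumption an 11264 x 11264 chunk is exactly 121 MiB (one byte per pixel), which matches the figure in the notebook if MB there means MiB. The 22528 x 22528 raster is hypothetical and only shows how the chunk count grows as chunks shrink:

```python
import math

import numpy as np

MIB = 2**20


def nbytes(shape, dtype):
    """Bytes needed to hold an array of this shape and dtype in memory."""
    return math.prod(shape) * np.dtype(dtype).itemsize


def n_chunks(shape, chunks):
    """Total number of chunks for a regular chunking (partial edge chunks included)."""
    return math.prod(math.ceil(n / c) for n, c in zip(shape, chunks))


# One 11264 x 11264 chunk of a single uint8 band (uint8 is an assumption).
print(nbytes((11264, 11264), "uint8") / MIB)  # 121.0

# Smaller chunks mean less memory per task but more tasks for the scheduler.
shape = (22528, 22528)  # hypothetical raster, 2 x 2 of the chunks above
for edge in (2048, 4096, 11264):
    size = nbytes((edge, edge), "uint8") / MIB
    print(f"{edge} px: {size:.0f} MiB per chunk, {n_chunks(shape, (edge, edge))} chunks")

# The same estimate works on rasterio metadata before calling read(), e.g.:
# nbytes((ds_rasterio.count, ds_rasterio.height, ds_rasterio.width), ds_rasterio.dtypes[0])
```

The last comment is the kind of check that would have flagged the 1.7 GB `read()` before it hit Binder's 2 GB limit.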
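On chunks not being aligned with the file layout: if a GeoTIFF is internally tiled, rasterio exposes the tile size through the dataset's `block_shapes` attribute, and rioxarray's `open_rasterio` accepts an explicit `chunks` argument. Whether rioxarray derives its default chunks from those blocks is the open question raised in the review, so here is a minimal sketch of choosing them by hand: a roughly square chunk made of whole blocks per axis that stays under a byte budget. The helper name, the square heuristic, and the 10980 x 10980 / 512 x 512 numbers are assumptions for illustration (dask's `normalize_chunks` has a `previous_chunks` argument aimed at a similar goal):

```python
import math

import numpy as np


def aligned_chunks(shape, block_shape, dtype, target_mib):
    """Return a 2-D chunk shape made of whole on-disk blocks per axis,
    roughly square, and no larger than target_mib (hypothetical helper)."""
    block_bytes = math.prod(block_shape) * np.dtype(dtype).itemsize
    # How many whole blocks fit in the budget (at least one).
    budget_blocks = max(1, int(target_mib * 2**20 // block_bytes))
    # Blocks per side of a square chunk; isqrt keeps the chunk under budget.
    side = max(1, math.isqrt(budget_blocks))
    # Clip to the array extent: a chunk never needs to exceed the full axis.
    return tuple(min(side * b, n) for b, n in zip(block_shape, shape))


# Hypothetical single-band uint8 raster with 512 x 512 internal tiles.
chunks = aligned_chunks((10980, 10980), (512, 512), "uint8", target_mib=50)
print(chunks)  # (7168, 7168)
```

The result could then be passed explicitly, e.g. `rioxarray.open_rasterio(path, chunks={"band": 1, "y": 7168, "x": 7168})`, where `path` is a placeholder.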
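On rechunking being a heavy operation: one way to see why is to count how many source chunks each target chunk has to read. A minimal sketch for regular chunkings, with hypothetical helper names; it only counts dependencies and does not model how dask actually plans a rechunk (dask may split it into intermediate steps). Going from row strips to column strips makes every output chunk depend on every input chunk, while merging neighbouring strips stays local:

```python
import math


def intervals(n, chunk):
    """[start, stop) intervals of a regular 1-D chunking of an axis of length n."""
    return [(s, min(s + chunk, n)) for s in range(0, n, chunk)]


def overlap_counts(old, new):
    """For each new interval, the number of old intervals it intersects."""
    return [sum(1 for a, b in old if a < hi and lo < b) for lo, hi in new]


def rechunk_reads(shape, old_chunks, new_chunks):
    """Total (old chunk, new chunk) pairs that overlap, i.e. chunk reads needed."""
    # Overlaps factorise per axis, so the N-D total is a product of per-axis sums.
    return math.prod(
        sum(overlap_counts(intervals(n, o), intervals(n, c)))
        for n, o, c in zip(shape, old_chunks, new_chunks)
    )


shape = (1024, 1024)
# Row strips -> column strips: 8 output chunks, each reading all 8 input chunks.
print(rechunk_reads(shape, (128, 1024), (1024, 128)))  # 64
# Row strips -> thicker row strips: 4 output chunks, each reading 2 input chunks.
print(rechunk_reads(shape, (128, 1024), (256, 1024)))  # 8
```

In the all-to-all case every input chunk must be read (and kept or re-read) before any output chunk is complete, which is what makes rechunking expensive in memory and I/O.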
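On the `odc` accessor: my understanding, to be checked against the odc-geo documentation, is that rioxarray is not involved at all. xarray lets packages attach a namespace to every DataArray with `xr.register_dataarray_accessor` (and `xr.register_dataset_accessor` for Datasets), and the registration runs as a side effect of importing the package, so the accessor is available on any object however it was created. rioxarray adds `.rio` this way, and odc-geo presumably does the same for `.odc` when `odc.geo.xr` is imported. A minimal sketch with a made-up `demo` accessor:

```python
import numpy as np
import xarray as xr


# Registering at import time: any module that runs this decorator adds `.demo`
# to every xr.DataArray, however the array was created (hypothetical accessor).
@xr.register_dataarray_accessor("demo")
class DemoAccessor:
    def __init__(self, xarray_obj: xr.DataArray):
        self._obj = xarray_obj

    def nbytes_mib(self) -> float:
        """In-memory size of the wrapped array, in MiB."""
        return self._obj.nbytes / 2**20


# An array built from scratch (stand-in for one returned by rioxarray.open_rasterio).
da = xr.DataArray(np.zeros((1024, 1024), dtype="uint8"), dims=("y", "x"))
print(da.demo.nbytes_mib())  # 1.0
```

If that is indeed how odc-geo works, a one-line explanation of the `import odc.geo.xr` in the notebook would remove the "magic".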