Suggestions for RF inference notebooks #17

Just a few questions/suggestions for improving the notebooks:

* When using Xarray objects, avoid `.values` as much as possible: it triggers immediate loading of the data (and any pending computations), which can cause memory usage peaks on the node running the Jupyter server and heavy communication between the workers and the client.
* Avoid re-chunking as much as possible: right now some of the variables are loaded directly as Dask arrays, while others are not and are only converted to Dask arrays later on by rechunking. This is bad because it results in a lot of communication (and rechunking tasks). I would suggest loading all variables as Dask arrays (i.e. adding the `chunks=...` argument to the `xr.open_dataset` calls). For large files, you would ideally use the same chunk sizes as those used in the .nc files; for the others you could use the same value consistently (maybe something like 50 in both x and y?).
* Some of the variables are read using the `netcdf4` library (I believe), while others are loaded with `rasterio` as the engine. Why? Note that the latter is deprecated: you should use `rioxarray.open_rasterio` instead, which loads the variable directly as a `DataArray` instead of a `Dataset`.
* Try to separate conversions and transformations of the data from the more implementation-specific tasks (loading/rechunking), so it is clearer where the "physical" operations are. Ideally, apply the transformations only after the data have been chunked in the most suitable way. Also, I would try to use descriptive variable names throughout (try to avoid names like `all`, which also shadows the Python built-in, and `result1`).
* If you want to index an array so that it matches the coordinates of another array in a robust way, you can use: `x = x.sel(longitude=y.longitude, latitude=y.latitude, method='nearest', tolerance=0.01)` - I see that you had to play with the values in the slices in order to match the coordinates of the different arrays.
* In the Europe notebook, is loading the ERA5 dataset, which you previously used as a template, still needed? I don't see it used anywhere.
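
To illustrate the first two points, here is a minimal sketch of how loading could be kept separate from the computations, with everything staying lazy until the very end. The helper `open_chunked`, the variable name `temperature`, the dimension names `x`/`y` and the chunk size of 50 are placeholders of mine, not taken from the notebooks; the synthetic array only stands in for a variable read from file.

```python
import dask.array as da
import xarray as xr


def open_chunked(path, chunks=None):
    """Hypothetical helper: open a file lazily, as Dask-backed arrays."""
    # `chunks` is the keyword accepted by xr.open_dataset; the sizes below
    # are placeholders and should follow the chunking of the .nc files.
    if chunks is None:
        chunks = {"x": 50, "y": 50}
    return xr.open_dataset(path, chunks=chunks)


# Synthetic stand-in for a variable obtained via open_chunked(...).
temperature = xr.DataArray(
    da.ones((200, 200), chunks=(50, 50)),
    dims=("y", "x"),
    name="temperature",
)

# "Physical" operations: these only build a task graph, nothing is loaded.
anomaly = temperature - temperature.mean()
fraction_non_negative = (anomaly >= 0).mean()

# Only the final, small result is transferred to the client.
fraction_non_negative_value = float(fraction_non_negative.compute())
```

Working on the `DataArray` objects directly (rather than on `.values`) keeps the chunking chosen at load time, so no rechunking step should be needed later on.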

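For the coordinate matching, something along these lines should work. This is only a sketch: the two grids are made up for illustration (a `source` array with a larger extent and a small floating-point offset, and a `target` array that defines the desired coordinates), and the tolerance of 0.01 is the value suggested above, to be adapted to the actual grid spacing.

```python
import numpy as np
import xarray as xr

# Hypothetical target grid that defines the coordinates we want to match.
target = xr.DataArray(
    np.zeros((3, 4)),
    dims=("latitude", "longitude"),
    coords={
        "latitude": [50.0, 50.5, 51.0],
        "longitude": [4.0, 4.5, 5.0, 5.5],
    },
)

# Hypothetical source grid: larger extent, coordinates slightly offset.
source_latitude = np.linspace(49.0, 52.0, 7) + 1e-4
source_longitude = np.linspace(3.0, 6.5, 8) - 1e-4
source = xr.DataArray(
    np.arange(7 * 8, dtype=float).reshape(7, 8),
    dims=("latitude", "longitude"),
    coords={"latitude": source_latitude, "longitude": source_longitude},
)

# Pick, for every target coordinate, the nearest source cell, accepting
# only matches that are closer than the tolerance.
matched = source.sel(
    longitude=target.longitude,
    latitude=target.latitude,
    method="nearest",
    tolerance=0.01,
)
```

With a tolerance set, a target coordinate that has no source cell close enough should lead to an error instead of silently picking a distant cell, which makes this safer than hand-tuned slices.
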