I seem to have found a few tricks to accomplish the conversion in reasonable time.

  1. chunk the dataset along the time dimension with the same size as one netCDF file contains. Here each file holds one year, i.e. 8760 hourly timesteps for a normal (non-leap) year.
  2. during the rechunking step, hold the data in memory with Dask's .persist() method.
  3. create a Zarr store of the final size, open the netCDF files successively, and write each file's data into its region of the Zarr store.

Here is sample code that achieves the operation I needed in reasonable time:

import xarray as xr
import pandas as pd
import dask

# Full hourly time axis spanning the four yearly files (2011-2014)
tvec = pd.date_range('2011-01-01 00:00:00', '2014-12-31 23:00:00', freq='1h', inclusive='both')


ds = xr.o…
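
Since the snippet above is truncated, here is a minimal, self-contained sketch of the full three-step workflow. It assumes one netCDF file per year of hourly data on a (time, lat, lon) grid; the file names, the variable name `t2m`, and the store path `out.zarr` are hypothetical placeholders:

```python
import dask.array as da
import pandas as pd
import xarray as xr

# Hypothetical inputs: one netCDF file per year, hourly data, dims (time, lat, lon)
files = [f'data_{year}.nc' for year in range(2011, 2015)]
tvec = pd.date_range('2011-01-01 00:00:00', '2014-12-31 23:00:00',
                     freq='1h', inclusive='both')

first = xr.open_dataset(files[0])
nlat, nlon = first.sizes['lat'], first.sizes['lon']
chunk_t = first.sizes['time']  # step 1: one file's worth of timesteps per chunk, e.g. 8760

# Step 3a: initialise a Zarr store of the final size. With compute=False,
# only the metadata and coordinates are written, not the (dummy) data.
skeleton = xr.Dataset(
    {'t2m': (('time', 'lat', 'lon'),
             da.zeros((len(tvec), nlat, nlon),
                      chunks=(chunk_t, nlat, nlon),
                      dtype=first['t2m'].dtype))},
    coords={'time': tvec, 'lat': first['lat'], 'lon': first['lon']},
)
skeleton.to_zarr('out.zarr', mode='w', compute=False)

# Steps 2 and 3b: open each file as a single time chunk, persist it in
# memory, then write it into its slice of the store.
start = 0
for f in files:
    ds = xr.open_dataset(f, chunks={'time': -1}).persist()  # step 2
    nt = ds.sizes['time']
    # Region writes require every written variable to span the region
    # dimension, so drop anything without a time axis (e.g. lat/lon).
    ds = ds.drop_vars([v for v in ds.variables if 'time' not in ds[v].dims])
    for v in ds.variables:
        ds[v].encoding = {}  # avoid stale netCDF encodings clashing with the store
    # Leap years (8784 steps) do not align with the 8760-step chunks; the
    # loop is strictly sequential, so relaxing the chunk-safety check is fine.
    ds.to_zarr('out.zarr',
               region={'time': slice(start, start + nt)},
               safe_chunks=False)
    start += nt
```

Initialising the store lazily and filling it region by region means at most one year of data is held in memory at a time, which is what keeps the conversion tractable.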
