Rewrite DataTree.to_netcdf and support netCDF4 in-memory #10624

Draft
wants to merge 1 commit into
base: main
from

shoyer
Member

@shoyer shoyer commented Aug 11, 2025

This PR includes a handful of significant changes:

  1. It refactors the internal structure of DataTree.to_netcdf() and DataTree.to_zarr() to use lower level interfaces, rather than calling Dataset methods. This allows for properly supporting compute=False (and likely various other improvements).
  2. Reading and writing in-memory data with netCDF4-python is now supported, including DataTree.
  3. I've added a new user-facing load_datatree function, for consistency with load_dataset and load_dataarray.
  4. The engine argument in DataTree.to_netcdf() is now set consistently with Dataset.to_netcdf(), preferring netcdf4 to h5netcdf.
  5. Calling Dataset.to_netcdf() without a target now always returns a memoryview object, including in the case where engine='scipy' is used (which currently returns bytes). This is a breaking change, rather than merely issuing a warning as is done in Support for DataTree.to_netcdf to write to a file-like object or bytes #10571. I believe it probably makes sense to make this a breaking change because (1) it offers significant performance benefits, (2) the default behavior without specifying an engine will already change (because netcdf4 is preferred to the scipy backend) and (3) restoring previous behavior is easy (by wrapping the memoryview with bytes()).
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
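
For anyone weighing point 5, here is a minimal standard-library sketch of the difference between the two return types. It does not call xarray: the `buffer` contents are a hypothetical stand-in for whatever a netCDF writer would produce in memory, and the variable names are illustrative only.

```python
# Hypothetical stand-in for the in-memory buffer a netCDF writer fills.
# (Assumption: a real call would obtain this from to_netcdf() with no target.)
buffer = bytearray(b"CDF\x01" + bytes(28))

# A memoryview exposes the same storage without copying it.
view = memoryview(buffer)

# Wrapping the view in bytes() makes one explicit copy and gives back
# the immutable bytes type that the scipy engine currently returns.
legacy = bytes(view)

print(type(view).__name__, view.nbytes)
print(type(legacy).__name__, len(legacy))
```

The copy only happens when `bytes()` is called, which is where the performance benefit of returning a `memoryview` by default would come from; callers who need the old type opt in to the copy themselves.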

@github-actions github-actions bot added topic-backends topic-zarr Related to zarr storage library topic-DataTree Related to the implementation of a DataTree class io labels Aug 11, 2025
@shoyer
Member Author

shoyer commented Aug 11, 2025

  • It refactors the internal structure of DataTree.to_netcdf() and DataTree.to_zarr() to use lower level interfaces, rather than calling Dataset methods. This allows for properly supporting compute=False (and likely various other improvements).

I am thinking I might try to split this into a separate PR, because it's unrelated to the netCDF4 in-memory changes.

This PR includes a handful of significant changes:

1. It refactors the internal structure of `DataTree.to_netcdf()` and
   `DataTree.to_zarr()` to use lower level interfaces, rather than
   calling `Dataset` methods. This allows for properly supporting
   `compute=False` (and likely various other improvements).
2. Reading and writing in-memory data with netCDF4-python is now
   supported, including DataTree.
3. The `engine` argument in `DataTree.to_netcdf()` is now set
   consistently with `Dataset.to_netcdf()`, preferring `netcdf4` to
   `h5netcdf`.
4. Calling `Dataset.to_netcdf()` without a target now always returns a
   `memoryview` object, *including* in the case where `engine='scipy'`
   is used (which currently returns `bytes`). This is a breaking change,
   rather than merely issuing a warning as is done in pydata#10571. I believe
   it probably makes sense to make this a breaking change because (1)
   it offers significant performance benefits, (2) the default behavior
   without specifying an engine will already change (because `netcdf4`
   is preferred to the `scipy` backend) and (3) restoring previous
   behavior is easy (by wrapping the memoryview with `bytes()`).

mypy
Labels
io topic-backends topic-DataTree Related to the implementation of a DataTree class topic-zarr Related to zarr storage library
1 participant