Improve consistency of default engine and return memoryview instead of bytes from to_netcdf() #10656

Open: wants to merge 5 commits into base: main

@shoyer shoyer (Member) commented Aug 19, 2025

This PR introduces two breaking changes:

  1. The default backend engine used by Dataset.to_netcdf and DataTree.to_netcdf is now chosen consistently with open_dataset and open_datatree, using whichever netCDF libraries are available and valid, and preferring netCDF4 to h5netcdf to scipy. Previously, Dataset.to_netcdf was hard-coded to use scipy for writing to file-like objects or bytes, and DataTree.to_netcdf was hard-coded to use h5netcdf.
  2. The return value of Dataset.to_netcdf without path is now a memoryview object instead of bytes. This removes an unnecessary memory copy and ensures consistency when using either engine="scipy" or engine="h5netcdf".
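The preference order in the first change can be illustrated with a rough, standard-library-only sketch. This is not the code in this PR: `pick_default_engine`, `ENGINE_PREFERENCE`, and `ENGINE_MODULES` are hypothetical names, and the sketch only checks whether a module is importable. It ignores the "valid" part, i.e. whether a given engine can actually write to the requested target.

```python
from importlib.util import find_spec

# Preference order as stated in the PR description: netCDF4, then h5netcdf, then scipy.
ENGINE_PREFERENCE = ("netcdf4", "h5netcdf", "scipy")

# Hypothetical mapping from engine name to the module that must be importable.
ENGINE_MODULES = {"netcdf4": "netCDF4", "h5netcdf": "h5netcdf", "scipy": "scipy"}


def pick_default_engine(is_available=None):
    """Return the first engine in preference order whose module is available.

    `is_available` is an injectable predicate (hypothetical, for testing);
    by default it checks importability with importlib.
    """
    if is_available is None:
        def is_available(module_name):
            return find_spec(module_name) is not None

    for engine in ENGINE_PREFERENCE:
        if is_available(ENGINE_MODULES[engine]):
            return engine
    raise ValueError("no netCDF backend library is installed")
```

Passing a custom predicate makes the ordering easy to check without installing any backend, for example `pick_default_engine(lambda name: name == "scipy")`.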

It also includes a minor bug fix: an error is now raised when a memoryview would be returned with compute=False.
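For downstream code affected by the second breaking change, here is a small sketch of the difference between the two return types, using only `io.BytesIO` as a stand-in for the in-memory netCDF buffer. The helper `to_bytes` is a hypothetical name for illustration and is not part of xarray.

```python
import io


def to_bytes(result):
    """Return `result` as bytes, copying only if it is a memoryview."""
    if isinstance(result, memoryview):
        return result.tobytes()
    return result


# Stand-in payload; a real call would serialize a dataset instead.
buffer = io.BytesIO(b"stand-in for serialized netCDF data")

copied = buffer.getvalue()   # a bytes object holding the contents
view = buffer.getbuffer()    # a memoryview over the same buffer, no copy made

assert isinstance(copied, bytes)
assert isinstance(view, memoryview)

# Callers that still need bytes can convert explicitly, at the cost of one copy.
restored = to_bytes(view)

# Release the view so the underlying BytesIO can be resized or closed again.
view.release()
```

Code that only passes the result to `file.write()` or other buffer-accepting APIs should work with a memoryview unchanged. An explicit conversion is needed only where a real `bytes` object is required.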

Fixes #10654

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst

This PR introduces a bug fix and two breaking changes:

1. The default backend ``engine`` used by `Dataset.to_netcdf`
   and `DataTree.to_netcdf` is now chosen consistently with
   `open_dataset` and `open_datatree`, using whichever netCDF
   libraries are available and preferring netCDF4 to h5netcdf to scipy.
   Previously, `DataTree.to_netcdf` was hard-coded to use h5netcdf.
2. The return value of `Dataset.to_netcdf` without ``path`` is
   now a ``memoryview`` object instead of ``bytes``. This removes an unnecessary
   memory copy and ensures consistency when using either ``engine="scipy"`` or
   ``engine="h5netcdf"``.

Fixes pydata#10654
@github-actions github-actions bot added the topic-backends, topic-DataTree, and io labels Aug 19, 2025
@dataclass
class BytesIOProxy(Generic[BytesOrMemory]):
"""Proxy object for a write that returns either bytes or a memoryview."""
class BytesIOProxy:
Member Author
Note: I'm keeping BytesIOProxy around because we'll need it for #10624

@shoyer shoyer changed the title from "Improve consistency and engine keyword argument for to_netcdf()" to "Improve consistency of default engine and return memoryview instead of bytes from to_netcdf()" Aug 19, 2025
Successfully merging this pull request may close these issues.

DataTree.to_netcdf has h5netcdf hardcoded as default