Support compute=False from DataTree.to_netcdf #10625

Merged
shoyer merged 11 commits into pydata:main from to_netcdf-internals
Aug 18, 2025

Conversation

Member

@shoyer shoyer commented Aug 12, 2025

Split out of #10624

This PR adds support for compute=False in DataTree.to_netcdf and DataTree.to_zarr. To do so, I refactored the internals of these methods to use Xarray's lower-level data store interface directly, rather than calling Dataset methods.

  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
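One way to picture why writing through a single lower-level writer makes compute=False possible: if every group's arrays are queued on one shared writer, the whole tree can be written eagerly or handed back as a single deferred action. The sketch below is a toy model in plain Python, not Xarray's implementation; `ToyArrayWriter`, `write_tree`, and the dict-based "store" are hypothetical names invented for illustration.

```python
from typing import Callable, Optional


class ToyArrayWriter:
    """Hypothetical stand-in for a shared writer that queues (source, target) pairs."""

    def __init__(self) -> None:
        self.pending = []

    def add(self, source: list, target: list) -> None:
        # Nothing is copied yet; only the intended write is recorded.
        self.pending.append((source, target))

    def sync(self, compute: bool = True) -> Optional[Callable[[], None]]:
        def run() -> None:
            # Copy every queued source into its target, then empty the queue.
            for source, target in self.pending:
                target[:] = source
            self.pending.clear()

        if compute:
            run()
            return None
        # Deferred mode: the caller decides when the data is actually written.
        return run


def write_tree(tree: dict, store: dict, compute: bool = True):
    """Hypothetical tree writer: one writer is shared by all groups."""
    writer = ToyArrayWriter()
    for path, variables in tree.items():
        group = store.setdefault(path, {})
        for name, values in variables.items():
            # Create the (empty) target up front, but queue the data write.
            target = group.setdefault(name, [])
            writer.add(values, target)
    return writer.sync(compute=compute)
```

With compute=False, this toy `write_tree` creates the group and variable layout immediately and returns a callable; the data lands in the store only when that callable is invoked. In this model, a writer created separately per group would instead yield one deferred action per group.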

@shoyer shoyer requested a review from TomNicholas August 12, 2025 01:19
@github-actions github-actions bot added the topic-backends, topic-zarr (Related to zarr storage library), topic-DataTree (Related to the implementation of a DataTree class), and io labels Aug 12, 2025
writer = ArrayWriter()

# TODO: figure out how to properly handle unlimited_dims
try:
Contributor

It'd be nice to refactor this to a common function used in both the netCDF and the Zarr writer. Do you see a way to do that? At first glance the "validate region / encoding" bit seems to make this hard.

If there is no easy way to do that, can you please add a comment to both functions to remind future contributors to keep the logic in sync?

Member Author

Yeah, I think this would be tricky.

I'm not sure a comment makes sense here -- there's no intrinsic reason why the implementations need to match, although hopefully this suggestion would be somewhat obvious? There are also unit tests, of course.

Member

I think this separate datatree_io.py file has come to the end of its usefulness. In a follow-up I can just merge it into the respective backends.

Member Author

Definitely agreed! In the long term, we might even implement Dataset IO in terms of DataTree IO. This would let us avoid redundant code paths, similar to how we currently implement many DataArray operations in terms of Dataset.
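As a rough illustration of that idea, a dataset writer could wrap its input in a one-node tree and delegate, so only the tree code path has to be maintained. This is a minimal sketch with hypothetical names (`save_tree`, `save_dataset`) and plain dicts standing in for datasets, trees, and stores; it does not reflect Xarray's actual internals.

```python
def save_tree(tree: dict, store: dict) -> None:
    """Hypothetical tree writer: copy each group's variables into the store."""
    for path, variables in tree.items():
        store[path] = {name: list(values) for name, values in variables.items()}


def save_dataset(dataset: dict, store: dict, group: str = "/") -> None:
    """Hypothetical dataset writer: a dataset is treated as a one-node tree."""
    # Delegating here means there is a single code path for all writes.
    save_tree({group: dataset}, store)
```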

Member Author

shoyer commented Aug 16, 2025

This is ready for a final review now that tests are passing.

Member Author

shoyer commented Aug 18, 2025

Just a heads up: I am going to submit this shortly so I can start iterating on follow-ups.

@shoyer shoyer merged commit 89c913a into pydata:main Aug 18, 2025
37 checks passed
@shoyer shoyer deleted the to_netcdf-internals branch August 18, 2025 21:17
Labels
io, topic-backends, topic-DataTree (Related to the implementation of a DataTree class), topic-zarr (Related to zarr storage library)
3 participants