-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: handle non-iterable hashables as dim names #10638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Core dimension parsing fixes in namedarray/core.py - Type definition updates in _typing.py - Index handling fixes in core/indexes.py - DataArray dimension parsing in core/dataarray.py - Variable docstring updates in core/variable.py
There's been a lot of struggles with this in the past, see #8210 #6142 #8199 for more reading. The idea with the previous typing is that xarray promises to normalize NamedArray handles this fine I think on Main, so I think this PR should try to solve this at the DataArray/Variable level: import numpy as np
import xarray as xr
some_id = uuid.uuid4()
some_id_2 = uuid.uuid4()
dims = (some_id, some_id_2)
xr.NamedArray(dims, np.asarray([[0.0, 1.0]]))
# <xarray.NamedArray (14a722b6-c795-4557-9255-e3a35b76f383: 1,
# d9ebaff1-47b9-4266-8f3f-cb4f2e66719e: 2)> Size: 16B
# array([[0., 1.]]) It probably would've been much easier if |
Tweaking types usually opens a can of worms.
DataArray is just missing |
|
||
Note: Tuples are treated as sequences, so ('a', 'b') means two | ||
dimensions named 'a' and 'b'. To use a tuple as a single dimension | ||
name, wrap it in a list: [('a', 'b')]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name, wrap it in a list: [('a', 'b')]. | |
name, wrap it in a another tuple: (('a', 'b'),) or list: [('a', 'b')]. |
Should recommend what xarray tries to use internally first, tuple[Hashable, ...]
.
|
||
Note: Tuples are treated as sequences, so ('a', 'b') means two | ||
dimensions named 'a' and 'b'. To use a tuple as a single dimension | ||
name, wrap it in a list: [('a', 'b')]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name, wrap it in a list: [('a', 'b')]. | |
name, wrap it in a tuple: (('a', 'b'),) or list: [('a', 'b')]. |
if isinstance(dims, str): | ||
dims = (dims,) | ||
elif isinstance(dims, Iterable): | ||
dims = tuple(dims) | ||
else: | ||
# Single non-string, non-iterable hashable (int, UUID, etc.) | ||
dims = (dims,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_parse_dimensions
is triggered every time NamedArray is initialized. Adding more if-cases will slow down for the people (and NamedArray internally) using the correct type dims: tuple[Hashable, ...]
.
Would be nice to invert the check to promote correct typing.
Fixes: #10634
This exposed that
('a', 'b')
is ambiguous as it is both hashable and a sequence of hashable, so I added an error message in a case where it is ambiguous.There's some error duplication logic, but that is matching the existing duplication of logic.
whats-new.rst
api.rst