fix: handle non-iterable hashables as dim names #10638

Status: Open. Wants to merge 1 commit into base: main.

ianhi (Contributor) commented Aug 13, 2025

Fixes: #10634

  • Core dimension parsing fixes in namedarray/core.py
  • Type definition updates in _typing.py
  • Index handling fixes in core/indexes.py
  • DataArray dimension parsing in core/dataarray.py
  • Variable docstring updates in core/variable.py

This exposed that ('a', 'b') is ambiguous, since it is both a hashable and a sequence of hashables, so I added an error message for the case where the ambiguity arises.
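
The ambiguity can be reproduced with the standard-library ABCs alone. The following is a minimal sketch, not code from this PR; `classify` is a hypothetical helper that exists only for illustration:

```python
import uuid
from collections.abc import Hashable, Iterable


def classify(dims: object) -> str:
    """Describe how a dims argument could be interpreted (illustrative only)."""
    hashable = isinstance(dims, Hashable)
    iterable = isinstance(dims, Iterable)
    if hashable and iterable:
        return "ambiguous"  # could be one name or a sequence of names
    if hashable:
        return "single name"
    if iterable:
        return "sequence of names"
    return "invalid"


print(classify(("a", "b")))    # ambiguous
print(classify(uuid.uuid4()))  # single name
print(classify(["a", "b"]))    # sequence of names
```

A str also lands in the "ambiguous" branch of this sketch, which is why dimension parsing has to special-case it before any iterable check.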

There is some duplicated error-handling logic, but it mirrors the duplication that already exists.

@github-actions github-actions bot added topic-indexing topic-NamedArray Lightweight version of Variable labels Aug 13, 2025
Illviljan (Contributor) commented Aug 13, 2025

There have been a lot of struggles with this in the past; see #8210, #6142, and #8199 for more reading.

The idea behind the previous typing is that xarray promises to normalize str and the outer Iterable to tuple[Hashable, ...]; anything more exotic should already follow tuple[Hashable, ...], for example (1, None, "z", ("a", "b")).
xarray's internals are not perfect either: many parts of xarray still expect dims: tuple[str, ...].
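
As a rough illustration of that contract (a sketch under stated assumptions, not xarray's actual implementation; `normalize_dims` is a hypothetical name), only str and the outer iterable are normalized, and nested tuples are left alone:

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable


def normalize_dims(dims: str | Iterable[Hashable]) -> tuple[Hashable, ...]:
    """Hypothetical helper: normalize str and the outer Iterable to a tuple."""
    if isinstance(dims, str):
        return (dims,)  # a str is one name, never split into characters
    return tuple(dims)  # only the outer iterable is unpacked


print(normalize_dims("time"))      # ('time',)
print(normalize_dims(["x", "y"]))  # ('x', 'y')
# The nested tuple ("a", "b") stays intact as a single dimension name.
print(normalize_dims((1, None, "z", ("a", "b"))))  # (1, None, 'z', ('a', 'b'))
```

Under this contract a bare UUID or int is not accepted, because it is neither a str nor an iterable.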

I think NamedArray already handles this fine on main, so this PR should try to solve it at the DataArray/Variable level:

import uuid

import numpy as np
import xarray as xr

some_id = uuid.uuid4()
some_id_2 = uuid.uuid4()

dims = (some_id, some_id_2)
xr.NamedArray(dims, np.asarray([[0.0, 1.0]]))
# <xarray.NamedArray (14a722b6-c795-4557-9255-e3a35b76f383: 1,
#                     d9ebaff1-47b9-4266-8f3f-cb4f2e66719e: 2)> Size: 16B
# array([[0., 1.]])

It probably would have been much easier if dims were typed as list[Hashable]: there would be no ambiguity then, although it would bring other issues.

Illviljan (Contributor)

Tweaking types usually opens a can of worms.

xarray/namedarray/utils.py: note: In function "infix_dims":
xarray/namedarray/utils.py:165: error: Item "Hashable" of "Hashable | Iterable[Hashable]" has no attribute "__iter__" (not iterable)  [union-attr]

drop_missing_dims, which is used in infix_dims, probably needs to be rewritten to return _Dims. The raise case looks risky.

xarray/tests/test_namedarray.py: note: In member "test_permute_dims" of class "TestNamedArray":
xarray/tests/test_namedarray.py:545: error: Expected iterable as variadic argument  [misc]

test_permute_dims uses _DimsLike; try Iterable[_Dim] instead.
I'm not a big fan of the Iterable[_Dim] style. It would have been nice if these non-public functions were simplified to use tuple[Hashable, ...] and the normalization was done somewhere else.

xarray/tests/test_hashable.py: note: In function "test_hashable_dims":
xarray/tests/test_hashable.py:60: error: Argument "dims" to "DataArray" has incompatible type "list[int | tuple[Any, ...] | DEnum | CustomHashable | UUID] | int | tuple[Any, ...] | DEnum | CustomHashable | UUID"; expected "str | Iterable[Hashable] | None"  [arg-type]
xarray/tests/test_hashable.py:66: error: Argument "dims" to "DataArray" has incompatible type "list[int | tuple[Any, ...] | DEnum | CustomHashable | UUID] | int | tuple[Any, ...] | DEnum | CustomHashable | UUID"; expected "str | Iterable[Hashable] | None"  [arg-type]

DataArray is just missing _DimsLike.


Note: Tuples are treated as sequences, so ('a', 'b') means two
dimensions named 'a' and 'b'. To use a tuple as a single dimension
name, wrap it in a list: [('a', 'b')].

Suggested change
name, wrap it in a list: [('a', 'b')].
name, wrap it in another tuple: (('a', 'b'),) or list: [('a', 'b')].

Should recommend what xarray tries to use internally first, tuple[Hashable, ...].


Note: Tuples are treated as sequences, so ('a', 'b') means two
dimensions named 'a' and 'b'. To use a tuple as a single dimension
name, wrap it in a list: [('a', 'b')].
Illviljan (Contributor) commented Aug 13, 2025

Suggested change
name, wrap it in a list: [('a', 'b')].
name, wrap it in a tuple: (('a', 'b'),) or list: [('a', 'b')].

Comment on lines +505 to +511
if isinstance(dims, str):
    dims = (dims,)
elif isinstance(dims, Iterable):
    dims = tuple(dims)
else:
    # Single non-string, non-iterable hashable (int, UUID, etc.)
    dims = (dims,)

_parse_dimensions is triggered every time a NamedArray is initialized. Adding more if-cases will slow things down for the people (and for NamedArray internally) who use the correct type, dims: tuple[Hashable, ...].

Would be nice to invert the check to promote correct typing.
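
One way the inverted check might look is sketched below. This is an assumption about the intent, not the PR's code: `parse_dims` is a hypothetical stand-in for the normalization step only, and it omits the validation of the number of dimensions against the array shape.

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable


def parse_dims(dims: object) -> tuple[Hashable, ...]:
    """Hypothetical normalization step with the correctly typed case first."""
    if isinstance(dims, tuple):
        return dims  # fast path: already a tuple, one check and no copy
    if isinstance(dims, str):
        return (dims,)  # a single string name
    if isinstance(dims, Iterable):
        return tuple(dims)  # any other iterable of names
    return (dims,)  # single non-iterable hashable (int, UUID, etc.)


already_typed = ("x", "y")
print(parse_dims(already_typed) is already_typed)  # True
print(parse_dims(5))                               # (5,)
```

With this ordering, callers that already pass tuple[Hashable, ...] pay for a single isinstance check, and the extra branches only cost the callers that rely on normalization.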


Successfully merging this pull request may close these issues.

Cannot build xarrays with hashable UUIDs as dims
2 participants