Alvaro-Kothe (Contributor) commented Sep 13, 2025


This pull request makes the `unstack` method handle NA differently when not sorting. The main problem was that, when not sorting, NA values were not assigned the code -1, which resulted in an `IndexError` on `mask.put`.

The main change is that it keeps track of where NA should be and assigns it before creating the MultiIndex in the `_repeater` method.
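To make the failure mode concrete, here is a simplified model of the position bookkeeping: each (row, column) pair becomes a flat position `row_code * n_cols + col_code` in a boolean mask, which is then filled with `ndarray.put`. This is a sketch under assumptions, not the code in `pandas/core/reshape/reshape.py`; the variable names are hypothetical, and only the public `pd.factorize` and NumPy's `ndarray.put` are used. `pd.factorize` marks NA with the sentinel -1 and leaves it out of the uniques, so the sketch remaps NA to an extra trailing slot before computing positions; any code that is not in `[0, n_cols)` would push a position past the end of the mask.

```python
import numpy as np
import pandas as pd

# Hypothetical model of unstack's bookkeeping (not pandas' internal code).
row_codes = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # level1: b, a (sort=False order)
col_values = pd.array([1, 2, 3, pd.NA] * 2, dtype="Int64")

# factorize gives NA the sentinel code -1 and does not count it as a unique
col_codes, col_uniques = pd.factorize(col_values, sort=False)
n_cols = len(col_uniques)  # 3: the observed non-NA values only

# Assumption for illustration: give NA its own trailing slot so that
# every column code lies in [0, n_cols).
has_na = bool((col_codes == -1).any())
col_codes = np.where(col_codes == -1, n_cols, col_codes)
n_cols += int(has_na)

n_rows = int(row_codes.max()) + 1
mask = np.zeros(n_rows * n_cols, dtype=bool)
selector = row_codes * n_cols + col_codes
# ndarray.put defaults to mode="raise", so an out-of-range position
# raises IndexError instead of being silently dropped.
mask.put(selector, True)
```

Always placing NA last is a simplification made here; per the description above, the patch tracks where NA should actually appear.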

Fix bug where an index containing `nan` would raise an `IndexError` or a `ValueError` when not sorting.
Use `np.unique` instead of `unique`.
mroeschke added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) label Sep 15, 2025
"levels2, expected_columns, expected_data",
[
(
Index([None, 1, 2, 3]),
Member:

Nit: could you move the `Index` call into the body of the test, since it's the same across all parameters?

Contributor Author:

So the change to the parametrized argument would be:

- Index([None, 1, 2, 3])
+ [None, 1, 2, 3]

Or do you expect something different?

Member:

Yes, correct.

Contributor Author:

After applying the suggested change, I had to alter the `expected_columns` values: the columns from `.unstack()` became integers instead of floats.

I also removed `expected_data` from the parametrization and put it into the test body, as it was the same for all tests.
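The parametrization discussed in this thread can be sketched as follows. This is hypothetical: the ids mirror the `nan=first` … `nan=last` names from the pytest output quoted in this thread, but the test name and body are illustrative, not the test added in the PR. Instead of comparing against `expected_columns` and `expected_data`, it only checks that `MultiIndex.from_product` keeps each missing value out of the level and stores it as the code -1, the sentinel the fix relies on. The plain lists go straight into `from_product` in the body, matching the final version that does not call `Index`.

```python
import numpy as np
import pandas as pd
import pytest


@pytest.mark.parametrize(
    "levels2",
    [
        [None, 1, 2, 3],
        [1, None, 2, 3],
        [1, 2, None, 3],
        [1, 2, 3, None],
    ],
    ids=["nan=first", "nan=second", "nan=third", "nan=last"],
)
def test_missing_value_gets_sentinel_code(levels2):
    # Hypothetical test: the raw list is turned into index data here, in the
    # test body, so the parametrization itself contains no Index(...) calls.
    index = pd.MultiIndex.from_product(
        [["b", "a"], levels2], names=["level1", "level2"]
    )
    codes = np.asarray(index.codes[1])
    # from_product repeats levels2 once per level1 value ("b", then "a")
    expected_na = np.array([value is None for value in levels2] * 2)
    # The missing value is not stored in the level itself ...
    assert len(index.levels[1]) == 3
    # ... and every occurrence of it is encoded as -1.
    assert np.array_equal(codes == -1, expected_na)
```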

Contributor Author:

Sorry, I ended up not calling `Index` in the test; that's why the columns are no longer floats. I can still reproduce the error from the issue with these new tests:

git restore --source=main pandas/core/reshape/reshape.py
pytest pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan
...
FAILED pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan[nan=first] - IndexError: index 8 is out of bounds for axis 0 with size 8
FAILED pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan[nan=second] - IndexError: index 8 is out of bounds for axis 0 with size 8
FAILED pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan[nan=third] - IndexError: index 8 is out of bounds for axis 0 with size 8
FAILED pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan[nan=last] - IndexError: index 8 is out of bounds for axis 0 with size 8

Alvaro-Kothe (Contributor Author):

For the sake of completeness, here is the result of the reproduction from #61221 with this patch:

import pandas as pd

levels1 = ['b', 'a']
levels2 = pd.Index([1, 2, 3, pd.NA], dtype=pd.Int64Dtype())
index = pd.MultiIndex.from_product([levels1, levels2], names=['level1', 'level2'])

df = pd.DataFrame(dict(value=range(len(index))), index=index)
print(df)
#                value
# level1 level2
# b      1           0
#        2           1
#        3           2
#        <NA>        3
# a      1           4
#        2           5
#        3           6
#        <NA>        7

print(df.unstack(level='level2'))
#        value
# level2  <NA>  1  2  3
# level1
# a          7  4  5  6
# b          3  0  1  2

print(df.unstack(level='level2', sort=False))
#        value
# level2     1  2  3 <NA>
# level1
# b          0  1  2    3
# a          4  5  6    7

mroeschke added this to the 3.0 milestone Sep 15, 2025
mroeschke merged commit 4f4b108 into pandas-dev:main Sep 15, 2025
40 of 41 checks passed
mroeschke (Member):

Great! Thanks @Alvaro-Kothe

Alvaro-Kothe deleted the fix/unstack-na branch September 15, 2025 19:30
Successfully merging this pull request may close these issues.

BUG: Exception with unstack(sort=False) and NA in index