-
-
Notifications
You must be signed in to change notification settings - Fork 19.1k
BUG: fix error when unstacking with sort=False
when indexes contains NA
#62334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fix bux when indexes contains `nan` and is not sorting would raise an `IndexError` or `ValueError`.
Use `np.unique` instead of `unique`.
63aeec0
to
55117be
Compare
55117be
to
6f98429
Compare
"levels2, expected_columns, expected_data", | ||
[ | ||
( | ||
Index([None, 1, 2, 3]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Could you make the Index
call in the body of the test since it's the same across all parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the parametrized argument change would be
- Index([None, 1, 2, 3])
+ [None, 1, 2, 3]
or do you expect something different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After applying the suggested change I had to alter the expected_columns
values, the columns from .unstack()
became integers instead of floats.
I also removed expected_data
from the parametrization and put it into the test body, as it was the same for all tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, I ended up not calling Index
in the test, that's why it is no longer float. But I can still reproduce the error that I was seeing in the issue with these new tests:
❯ git restore --source=main pandas/core/reshape/reshape.py
❯ pytest pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan
...
FAILED pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan[nan=first] - IndexError: index 8 is out of bounds for axis 0 with size 8
FAILED pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan[nan=second] - IndexError: index 8 is out of bounds for axis 0 with size 8
FAILED pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan[nan=third] - IndexError: index 8 is out of bounds for axis 0 with size 8
FAILED pandas/tests/frame/test_stack_unstack.py::test_unstack_sort_false_nan[nan=last] - IndexError: index 8 is out of bounds for axis 0 with size 8
For the sake of completeness, here is the result of the reproduction from #61221 with this patch: import pandas as pd
levels1 = ['b', 'a']
levels2 = pd.Index([1, 2, 3, pd.NA], dtype=pd.Int64Dtype())
index = pd.MultiIndex.from_product([levels1, levels2], names=['level1', 'level2'])
df = pd.DataFrame(dict(value=range(len(index))), index=index)
print(df)
# value
# level1 level2
# b 1 0
# 2 1
# 3 2
# <NA> 3
# a 1 4
# 2 5
# 3 6
# <NA> 7
print(df.unstack(level='level2'))
# value
# level2 <NA> 1 2 3
# level1
# a 7 4 5 6
# b 3 0 1 2
print(df.unstack(level='level2', sort=False))
# value
# level2 1 2 3 <NA>
# level1
# b 0 1 2 3
# a 4 5 6 7 |
Great! Thanks @Alvaro-Kothe |
unstack(sort=False)
and NA in index #61221 (Replace xxxx with the GitHub issue number)doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.This pull requests makes the
unstack
method differentiate how it managesNA
when not sorting. The main problem was thatNA
values when not sorting didn't have-1
value, resulting inIndexError
onmask.put
.The main change is that it keeps track on where
NA
should be and assign it before creating the multiindex in the_repeater
method.