pandas.MultiIndex roundtrip with extension-arrays #10581

@ilan-gold

Description

Is your feature request related to a problem?

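In the following test:

```python
# Imports assumed from the surrounding pandas test module
import pandas._testing as tm
from pandas import MultiIndex


def test_to_xarray_with_multiindex(self, df):
    # `df` is a pytest fixture (not shown) with eight columns
    from xarray import Dataset

    # MultiIndex
    df.index = MultiIndex.from_product([["a"], range(4)], names=["one", "two"])
    result = df.to_xarray()
    assert result.sizes["one"] == 1
    assert result.sizes["two"] == 4
    assert len(result.coords) == 2
    assert len(result.data_vars) == 8
    tm.assert_almost_equal(list(result.coords.keys()), ["one", "two"])
    assert isinstance(result, Dataset)

    # Roundtrip back to pandas: the extension-array column "f" is expected
    # to come back as object dtype, which is what this issue wants to relax
    result = result.to_dataframe()
    expected = df.copy()
    expected["f"] = expected["f"].astype(object)
    expected.columns.name = None
    tm.assert_frame_equal(result, expected)
```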

This test encodes the expectation that extension arrays in a DataFrame whose index is a MultiIndex are cast to NumPy when roundtripped through xarray (here, column "f" comes back as object dtype), because of broadcasting rules, if I remember correctly. This should be relaxed to allow a true roundtrip!
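A minimal reproduction of the roundtrip the test checks might look like the sketch below. The two-column frame is an assumption chosen for brevity (the test's own `df` fixture is not shown), and `DataFrame.to_xarray` needs xarray to be installed. It prints each column's dtype before and after the roundtrip rather than asserting a particular result, since the outcome depends on the installed pandas and xarray versions.

```python
import pandas as pd

# Small illustrative frame (an assumption, not the test fixture): one
# extension-array column (a pandas Categorical) and one plain NumPy column,
# indexed by a two-level MultiIndex as in the test above.
df = pd.DataFrame(
    {
        "f": pd.Categorical(list("abcd")),
        "x": [1.0, 2.0, 3.0, 4.0],
    },
    index=pd.MultiIndex.from_product([["a"], range(4)], names=["one", "two"]),
)

# DataFrame -> xarray.Dataset -> DataFrame (requires xarray).
roundtripped = df.to_xarray().to_dataframe()

# Under the behaviour the test encodes, "f" comes back as object dtype
# instead of category; "x" keeps its float64 dtype.
for name in df.columns:
    print(name, df[name].dtype, "->", roundtripped[name].dtype)
```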

See #10559 (comment) for the originating discussion

Describe the solution you'd like

A pandas.DataFrame with a MultiIndex and extension-array columns should roundtrip through xarray (`DataFrame.to_xarray()` followed by `Dataset.to_dataframe()`) without losing any column's dtype.
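One way to state "without losing any column's dtype" as a check is a column-by-column dtype comparison between the original frame and its roundtripped counterpart. `dtype_losses` is a hypothetical helper written for illustration, not a pandas or xarray API, and it only depends on pandas.

```python
from __future__ import annotations

import pandas as pd


def dtype_losses(before: pd.DataFrame, after: pd.DataFrame) -> dict[str, tuple[str, str]]:
    """Hypothetical helper: map each shared column whose dtype changed
    to a (dtype_before, dtype_after) pair of dtype names."""
    losses: dict[str, tuple[str, str]] = {}
    for name in before.columns:
        if name not in after.columns:
            continue
        # The original dtype goes on the left so that extension dtypes
        # (e.g. CategoricalDtype) drive the comparison.
        if before[name].dtype != after[name].dtype:
            losses[name] = (str(before[name].dtype), str(after[name].dtype))
    return losses
```

With xarray installed, `dtype_losses(df, df.to_xarray().to_dataframe())` returning an empty dict would express the requested behaviour; under the expectation the current test encodes, the result would instead include column "f".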

Describe alternatives you've considered

N/A

Additional context

No response
