Datetime field formatting breaks for datetimes before 1678-1-1 when reading from file #553

@apon97

Hello,

There seems to be an issue with how datetime columns are read when loading a dataframe from a file. If the file contains a date earlier than 1678-01-01 (I guess this has something to do with numpy/pandas using ns resolution for datetimes), the values in the column are formatted differently and are not interpreted as a datetime column.

from io import StringIO

# import geopandas as gpd


# for year in range(1670, 1680, 1):
#     age = f"{year}-01-01T00:00:00"
#     geojson_string = f"""
#     {{
#         "type": "FeatureCollection",
#         "features": [
#             {{
#                 "type": "Feature",
#                 "properties": {{
#                     "age": "{age}"
#                 }},
#                 "geometry": null
#             }}
#         ]
#     }}
#     """
#     geojson_io = StringIO(geojson_string)
#     gdf = gpd.read_file(geojson_io, engine="pyogrio")
#     print(gdf.age.iloc[0])

# print()

# test the same with pyogrio directly
import pyogrio

for year in range(1670, 1680, 1):
    age = f"{year}-01-01T00:00:00"
    geojson_string = f"""
    {{
        "type": "FeatureCollection",
        "features": [
            {{
                "type": "Feature",
                "properties": {{
                    "age": "{age}"
                }},
                "geometry": null
            }}
        ]
    }}
    """
    geojson_io = StringIO(geojson_string)
    gdf = pyogrio.read_dataframe(geojson_io)
    print(gdf["age"].iloc[0])

output:

1670/01/01 00:00:00
1671/01/01 00:00:00
1672/01/01 00:00:00
1673/01/01 00:00:00
1674/01/01 00:00:00
1675/01/01 00:00:00
1676/01/01 00:00:00
1677/01/01 00:00:00
1678-01-01 00:00:00
1679-01-01 00:00:00

In my case, I have automated pipelines that write datetimes as ISO-format strings and then read them back into a dataframe. But as soon as there is a datetime earlier than 1678 (and, I guess, later than 2262), the formatting breaks.
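For reference, the 1678 and 2262 cutoffs are consistent with a signed 64-bit count of nanoseconds since 1970-01-01, which is how nanosecond-resolution datetimes are stored. The sketch below reproduces the range using only the standard library. The helper name `in_ns_range` is hypothetical and not part of pandas or pyogrio, and the bounds are truncated to microseconds because that is the finest resolution `datetime` supports.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Largest offset a signed 64-bit nanosecond counter can hold, truncated to
# whole microseconds (the finest resolution datetime supports).
MAX_OFFSET = timedelta(microseconds=(2**63 - 1) // 1000)

NS_MIN = EPOCH - MAX_OFFSET  # falls on 1677-09-21
NS_MAX = EPOCH + MAX_OFFSET  # falls on 2262-04-11


def in_ns_range(dt):
    """Return True if dt fits in a nanosecond-resolution 64-bit timestamp.

    Hypothetical helper for illustration only.
    """
    return NS_MIN <= dt <= NS_MAX


if __name__ == "__main__":
    print("range:", NS_MIN, "to", NS_MAX)
    # Same years as the reproduction above.
    for year in range(1670, 1680):
        print(year, in_ns_range(datetime(year, 1, 1)))
```

The check flips from `False` to `True` at 1678-01-01, the same year at which the formatting in the output above changes.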

The formatting is consistent when using geopandas with fiona.

As a side note, I would be fine with the column being read as a string column, but that does not seem possible with geopandas. Is that the case, or am I missing something?
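As a possible stopgap on the reading side, the two string layouts could be normalized after loading. This is only a sketch: `parse_age` is a hypothetical helper, and it assumes the column contains nothing but the layouts seen in the output above plus the ISO `T` form that is written to the file.

```python
from datetime import datetime

# Layouts observed in the output above, plus the ISO form written to the file.
# Anything else (fractional seconds, time zones) is deliberately not handled.
KNOWN_FORMATS = (
    "%Y/%m/%d %H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
)


def parse_age(value):
    """Parse a value from the 'age' column into a datetime.datetime.

    Hypothetical helper: tries each known layout in turn and raises
    ValueError if none of them matches.
    """
    text = str(value)
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized datetime layout: {text!r}")


if __name__ == "__main__":
    for raw in ("1670/01/01 00:00:00", "1678-01-01 00:00:00"):
        print(parse_age(raw).isoformat())
```

The result is a plain Python `datetime`, which has no 1678 lower limit, so a column normalized this way would have to stay as an object column rather than be converted to a nanosecond datetime dtype.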

Thank you

versions: geopandas 0.14.4, pyogrio 0.11.0
