Description
Hello,
There seems to be an issue with how datetime columns are read when loading a dataframe from a file: if the file contains a date earlier than 1678-01-01 (I guess this has something to do with numpy/pandas using ns resolution for datetime), the values in the column are formatted differently and are not interpreted as a datetime column.
from io import StringIO
# import geopandas as gpd
# for year in range(1670, 1680, 1):
#     age = f"{year}-01-01T00:00:00"
#     geojson_string = f"""
#     {{
#         "type": "FeatureCollection",
#         "features": [
#             {{
#                 "type": "Feature",
#                 "properties": {{
#                     "age": "{age}"
#                 }},
#                 "geometry": null
#             }}
#         ]
#     }}
#     """
#     geojson_io = StringIO(geojson_string)
#     gdf = gpd.read_file(geojson_io, engine="pyogrio")
#     print(gdf.age.iloc[0])
# print()
# test the same with pyogrio directly
import pyogrio
for year in range(1670, 1680, 1):
    age = f"{year}-01-01T00:00:00"
    geojson_string = f"""
    {{
        "type": "FeatureCollection",
        "features": [
            {{
                "type": "Feature",
                "properties": {{
                    "age": "{age}"
                }},
                "geometry": null
            }}
        ]
    }}
    """
    geojson_io = StringIO(geojson_string)
    gdf = pyogrio.read_dataframe(geojson_io)
    print(gdf["age"].iloc[0])
output:
1670/01/01 00:00:00
1671/01/01 00:00:00
1672/01/01 00:00:00
1673/01/01 00:00:00
1674/01/01 00:00:00
1675/01/01 00:00:00
1676/01/01 00:00:00
1677/01/01 00:00:00
1678-01-01 00:00:00
1679-01-01 00:00:00
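The switch between 1677 and 1678 lines up with the range of a nanosecond-resolution 64-bit timestamp, which pandas documents as roughly 1677-09-21 to 2262-04-11. Below is a minimal standard-library sketch of that range check. It only illustrates the suspected cause and does not claim to reproduce what pyogrio does internally; the names `NS_MIN`, `NS_MAX` and `fits_ns` are made up for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest signed 64-bit value, read as nanoseconds since the epoch

# datetime only has microsecond precision, so the bounds are rounded
# inward to whole microseconds (hypothetical helper constants).
NS_MIN = EPOCH - timedelta(microseconds=INT64_MAX // 1000)
NS_MAX = EPOCH + timedelta(microseconds=INT64_MAX // 1000)


def fits_ns(dt: datetime) -> bool:
    """Return True if dt is representable as int64 nanoseconds since 1970."""
    return NS_MIN <= dt <= NS_MAX


if __name__ == "__main__":
    print("lower bound:", NS_MIN.isoformat())
    print("upper bound:", NS_MAX.isoformat())
    for year in (1677, 1678, 2262, 2263):
        print(year, fits_ns(datetime(year, 1, 1)))
```

If this is indeed the cause, 1677-01-01 falls just outside the range and 1678-01-01 just inside it, which matches the output above.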
In my case, I have some automated pipelines that write datetimes as ISO-format strings and then read them back into a dataframe. But as soon as there is a datetime earlier than 1678 (or later than 2262, I guess), the formatting breaks.
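As a stopgap on the pipeline side, something like the following sketch could normalize both observed formats after reading. It assumes the values are first converted with `str()` and that only the formats seen in the output above (plus the original ISO form with `T`) occur; `parse_mixed` and `KNOWN_FORMATS` are hypothetical names, not part of geopandas or pyogrio.

```python
from datetime import datetime

# Formats seen in the output above, plus the ISO form that was written.
KNOWN_FORMATS = (
    "%Y-%m-%d %H:%M:%S",
    "%Y/%m/%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
)


def parse_mixed(value: str) -> datetime:
    """Parse a datetime string in any of the known formats (hypothetical helper)."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized datetime format: {value!r}")


if __name__ == "__main__":
    for raw in ("1677/01/01 00:00:00", "1678-01-01 00:00:00"):
        # Both inputs come back in the same ISO form.
        print(parse_mixed(raw).isoformat())
```

This keeps the values as plain Python `datetime` objects, which are not limited to the ns range, but it does not address the inconsistent reading itself.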
The formatting is consistent when using geopandas with fiona.
As a side note, I would be fine with the column being read as a string column, but that does not seem to be possible with geopandas. Is that the case, or am I missing something?
Thank you
versions: geopandas 0.14.4, pyogrio 0.11.0