-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Description
See apache/arrow#37291 for context.
In the Timedelta constructor, we have:
pandas/pandas/_libs/tslibs/timedeltas.pyx
Lines 953 to 964 in 43691a2
# For millisecond and second resos, we cannot actually pass int(value) because | |
# many cases would fall outside of the pytimedelta implementation bounds. | |
# We pass 0 instead, and override seconds, microseconds, days. | |
# In principle we could pass 0 for ns and us too. | |
if reso == NPY_FR_ns: | |
td_base = _Timedelta.__new__(cls, microseconds=int(value) // 1000) | |
elif reso == NPY_DATETIMEUNIT.NPY_FR_us: | |
td_base = _Timedelta.__new__(cls, microseconds=int(value)) | |
elif reso == NPY_DATETIMEUNIT.NPY_FR_ms: | |
td_base = _Timedelta.__new__(cls, milliseconds=0) | |
elif reso == NPY_DATETIMEUNIT.NPY_FR_s: | |
td_base = _Timedelta.__new__(cls, seconds=0) |
and then the relevant note in the seconds attribute:
pandas/pandas/_libs/tslibs/timedeltas.pyx
Lines 1103 to 1104 in 43691a2
# NB: using the python C-API PyDateTime_DELTA_GET_SECONDS will fail | |
# (or be incorrect) |
So it is a known issue that those C API functions no longer will work with Timedeltas of second or millisecond resolution.
Pyarrow is exactly using those C APIs, which are now silently incorrect for second and millisecond resolution Timedeltas. While pyarrow should certainly adapt to properly support pandas.Timedelta (especially for cases outside the range of datetime.timedelta), I think it might be safer for pandas to ensure those C APIs keep working (I think pyarrow will certainly not be the only one using those).
It seems possible to me to at least set those for the cases where they are within the range of datetime.timedelta (which will be by far the large majority of cases, since it goes up to 999999999 days, which is more than 2 million years if I did the math correctly), at the cost of making the constructor a bit more costly.