
[SPARK-53330][SQL][PYTHON] Fix Arrow UDF with DayTimeIntervalType (bounds != start/end) #52077


Open · benrobby wants to merge 5 commits into master

Conversation


@benrobby commented on Aug 19, 2025

What changes were proposed in this pull request?

  • Makes the ArrowEvalPythonExec type check more lenient when comparing DayTimeIntervalTypes: two interval types are considered equal when the source type carries at least as much information as the target type (see the sketch below). Arrow serialization always sends the full interval either way, so this condition always holds, and we can then rely on the engine to interpret the data according to the node's output type.
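
A minimal sketch of the compatibility rule described above, written in Python rather than the actual Scala check in ArrowEvalPythonExec; the helper name is_compatible_interval is hypothetical:

from pyspark.sql.types import DayTimeIntervalType

def is_compatible_interval(actual: DayTimeIntervalType, expected: DayTimeIntervalType) -> bool:
    # The type coming back from Arrow always spans the full DAY TO SECOND range,
    # so it is accepted whenever it covers at least the fields of the declared
    # return type; the engine then reinterprets the data using the output type.
    return actual.startField <= expected.startField and actual.endField >= expected.endField

# Arrow hands back DayTimeIntervalType(0, 3); the user declared DayTimeIntervalType(1, 3).
assert is_compatible_interval(DayTimeIntervalType(0, 3), DayTimeIntervalType(1, 3))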

Why are the changes needed?

When a PySpark UDF (useArrow=True) returns interval data, it currently fails with the error below whenever the result type (e.g., DayTimeIntervalType) has start/end fields that do not span the maximum range.

org.apache.spark.SparkException: [ARROW_TYPE_MISMATCH] Invalid schema from pandas_udf(): expected DayTimeIntervalType(1,3), got DayTimeIntervalType(0,3). SQLSTATE: 42K0G

Repro:

from pyspark.sql.types import DayTimeIntervalType
from pyspark.sql.functions import udf
 
# this works
@udf(useArrow=True, returnType=DayTimeIntervalType(0, 3))
def return_interval1(x):
  return x

# this fails, although it matches the input type HOUR TO SECOND
@udf(useArrow=True, returnType=DayTimeIntervalType(1, 3)) 
def return_interval2(x):
  return x

spark.sql("SELECT INTERVAL '10:30:45.123' HOUR TO SECOND as value").select(return_interval2("value")).collect()

The cause is that when the worker sends data back, it always just sends a full Arrow duration, which does not carry the start or end field. In the above example the start field should be HOUR (1), which causes the node to throw the ARROW_TYPE_MISMATCH error shown above.
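
As an illustration (a small sketch assuming pyarrow is installed): Spark maps DayTimeIntervalType to Arrow's duration type with microsecond precision, which has no place to record the interval's start/end fields.

import pyarrow as pa

# Arrow's duration type is parameterized only by a time unit; it cannot record
# DayTimeIntervalType's start/end fields, so DayTimeIntervalType(0, 3) and
# DayTimeIntervalType(1, 3) both come back from the worker as the same Arrow type.
arrow_type = pa.duration("us")
print(arrow_type)  # duration[us]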

YearMonthIntervalType is not supported in Arrow UDFs, so it is not a concern here.

Does this PR introduce any user-facing change?

Yes, a bug fix that enables behavior that previously threw an error.

How was this patch tested?

  • Added Python tests.

Was this patch authored or co-authored using generative AI tooling?

No

@benrobby benrobby changed the title [SPARK-53330] Fix Arrow UDF support for DayTimeIntervalType with bounds != start-end [SPARK-53330] Fix Arrow UDF with DayTimeIntervalType with bounds != start-end Aug 19, 2025
@benrobby benrobby changed the title [SPARK-53330] Fix Arrow UDF with DayTimeIntervalType with bounds != start-end [SPARK-53330] Fix Arrow UDF with DayTimeIntervalType (bounds != start/end) Aug 19, 2025
@benrobby (Author) commented:

Hi @HyukjinKwon, I see that you authored the DayTimeInterval support. Could you take a look?

@HyukjinKwon HyukjinKwon changed the title [SPARK-53330] Fix Arrow UDF with DayTimeIntervalType (bounds != start/end) [SPARK-53330][SQL][PYTHON] Fix Arrow UDF with DayTimeIntervalType (bounds != start/end) Aug 21, 2025