-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix/spark next day nullability #19340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/spark next day nullability #19340
Conversation
89d6070 to
e3f2b2a
Compare
|
Cleaned up commit history to a single logical change. |
| }) | ||
| .unwrap(); | ||
|
|
||
| assert!(field.is_nullable()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also add tests for non-None scalar arguments and for invalid scalar arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added tests for non-None scalar arguments as well as invalid scalar arguments.
858cd40 to
cf214a4
Compare
| // returns NULL instead of an error for a malformed dayOfWeek. | ||
| None | ||
| } | ||
| let day_of_week = match day_of_week.trim().to_uppercase().as_str() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be optimized to use str::eq_ignore_ascii_case() as below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated implementation to use eq_ignore_ascii_case().
cf214a4 to
a001f77
Compare
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark seems to define NextDay as always nullable:
|
Looks like we already have |
Which issue does this PR close?
next_dayneed to have custom nullability #19156Rationale for this change
The Spark
next_dayUDF can return NULL for malformed day_of_week values even when all input arguments are non-null.However, the existing implementation inferred a non-nullable return type. This happened because the UDF implemented only
return_type(), which returns aDataTypebut does not propagate nullability information.DataFusion requires UDFs to implement
return_field_from_args()when return nullability depends on input fields or runtime validation.As a result:
next_day(non_nullable_date, non_nullable_string)could still produce NULL at runtimeday_of_weekvalues yield NULLThis PR corrects the nullability inference to accurately model Spark behavior.
What changes are included in this PR?
Implemented return_field_from_args() for the Spark next_day UDF
Date32Updated
return_type()to return an error, per DataFusion API guidelines when overriding nullabilityAdded unit tests verifying:
Are these changes tested?
Yes.
This PR includes new unit tests that validate:
Correct nullability inference
return_field_from_args()Are there any user-facing changes?
Yes, but they are correctness fixes, not breaking changes.
next_dayUDF now correctly reports nullable output schemas