@harshchawra

Which issue does this PR close?

Rationale for this change

The Spark next_day UDF can return NULL for malformed day_of_week values even when all input arguments are non-null.

However, the existing implementation inferred a non-nullable return type. This happened because the UDF implemented only return_type(), which returns a DataType but does not propagate nullability information.

DataFusion requires UDFs to implement return_field_from_args() when return nullability depends on input fields or runtime validation.

As a result:

  • next_day(non_nullable_date, non_nullable_string) could still produce NULL at runtime
  • Logical plans and schema inference incorrectly marked the output as non-nullable
  • This diverged from Spark semantics, where invalid day_of_week values yield NULL

This PR corrects the nullability inference to accurately model Spark behavior.
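The runtime behavior described above can be illustrated with a minimal, standalone sketch. It is not the DataFusion implementation: `parse_day_of_week` and `next_day` are hypothetical helpers, dates are modeled as plain `i32` days since 1970-01-01 (the Date32 representation), and the accepted day names are assumed to be Spark's two-letter, three-letter, and full English forms. The point is only that a non-null date and a non-null string can still yield `None`.

```rust
/// Hypothetical helper: map a day name to an index, 0 = Sunday .. 6 = Saturday.
/// Returns None for anything that is not a recognized day name.
fn parse_day_of_week(s: &str) -> Option<i32> {
    match s.trim().to_uppercase().as_str() {
        "SU" | "SUN" | "SUNDAY" => Some(0),
        "MO" | "MON" | "MONDAY" => Some(1),
        "TU" | "TUE" | "TUESDAY" => Some(2),
        "WE" | "WED" | "WEDNESDAY" => Some(3),
        "TH" | "THU" | "THURSDAY" => Some(4),
        "FR" | "FRI" | "FRIDAY" => Some(5),
        "SA" | "SAT" | "SATURDAY" => Some(6),
        _ => None,
    }
}

/// Hypothetical model of next_day over Date32-style values
/// (days since 1970-01-01). Returns the first date strictly after
/// `days_since_epoch` that falls on `day_of_week`, or None if the
/// day name is malformed, even though both inputs are non-null.
fn next_day(days_since_epoch: i32, day_of_week: &str) -> Option<i32> {
    let target = parse_day_of_week(day_of_week)?;
    // Day 0 (1970-01-01) was a Thursday, which is index 4 when Sunday = 0.
    let current = (i64::from(days_since_epoch) + 4).rem_euclid(7) as i32;
    let mut delta = (target - current).rem_euclid(7);
    if delta == 0 {
        // Same weekday: the result must be strictly later, so jump a full week.
        delta = 7;
    }
    days_since_epoch.checked_add(delta)
}
```

With this model, `next_day(0, "Mon")` gives day 4 (1970-01-05, a Monday), while `next_day(0, "not-a-day")` gives `None`. A schema that marks the output as non-nullable cannot represent the second case.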

What changes are included in this PR?

Implemented return_field_from_args() for the Spark next_day UDF

  • Output type: Date32
  • Output nullability: derived from input fields and possible runtime NULL outcomes

Changed return_type() to return an error, as DataFusion's API guidelines recommend when return_field_from_args() is implemented

Added unit tests verifying:

  • Non-nullable inputs → non-nullable output
  • Nullable inputs → nullable output
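The shape of this change can be sketched without depending on DataFusion. Everything below is a simplified stand-in: `DataType`, `Field`, `return_type`, and `return_field_from_args` are hypothetical local definitions that only mirror the names used in this PR, not the real `ScalarUDFImpl` signatures. The nullability rule shown (output is nullable if any input field is nullable) is an assumption taken from the two test cases listed above; it does not model the "possible runtime NULL outcomes" part of the change.

```rust
/// Simplified stand-in for an Arrow data type (not the real arrow type).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Date32,
    Utf8,
}

/// Simplified stand-in for an Arrow field: a type plus a nullability flag.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
}

impl Field {
    fn new(name: &str, data_type: DataType, nullable: bool) -> Self {
        Field { name: name.to_string(), data_type, nullable }
    }
}

/// Type-only entry point. It cannot express nullability, so once the
/// field-based entry point exists it simply reports that it should not be used.
fn return_type(_arg_types: &[DataType]) -> Result<DataType, String> {
    Err("return_field_from_args should be used instead".to_string())
}

/// Field-based entry point: returns both the output type and its nullability.
/// Assumed rule: the output is nullable if any input field is nullable.
fn return_field_from_args(arg_fields: &[Field]) -> Result<Field, String> {
    if arg_fields.len() != 2 {
        return Err(format!(
            "next_day expects 2 arguments, got {}",
            arg_fields.len()
        ));
    }
    let nullable = arg_fields.iter().any(|f| f.nullable);
    Ok(Field::new("next_day", DataType::Date32, nullable))
}
```

Passing two non-nullable fields yields a non-nullable Date32 field, and making either input nullable flips the output to nullable, which is what the two listed tests check.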

Are these changes tested?

Yes.

This PR includes new unit tests that validate:

  • Correct nullability inference
  • Proper enforcement of return_field_from_args()
  • No change in runtime evaluation semantics

Are there any user-facing changes?

Yes, but they are correctness fixes, not breaking changes.

  • The Spark next_day UDF now correctly reports nullable output schemas
  • No API changes
  • No runtime behavior changes — only planner metadata is corrected

@github-actions github-actions bot added the spark label Dec 15, 2025
@harshchawra harshchawra force-pushed the fix/spark-next_day-nullability branch from 89d6070 to e3f2b2a Compare December 15, 2025 18:08
@harshchawra
Author

Cleaned up commit history to a single logical change.
Rebased on latest upstream/main.

})
.unwrap();

assert!(field.is_nullable());
Member

Please also add tests for non-None scalar arguments and for invalid scalar arguments.

Author

I've added tests for non-None scalar arguments as well as invalid scalar arguments.

@harshchawra harshchawra force-pushed the fix/spark-next_day-nullability branch 2 times, most recently from 858cd40 to cf214a4 Compare December 16, 2025 18:18
// returns NULL instead of an error for a malformed dayOfWeek.
None
}
let day_of_week = match day_of_week.trim().to_uppercase().as_str() {
Member

This could also be optimized to use str::eq_ignore_ascii_case() as below

Author

Updated implementation to use eq_ignore_ascii_case().
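A minimal sketch of what matching with eq_ignore_ascii_case() might look like, for reference. This is not the code from the PR: `DAY_NAMES` and `parse_day_of_week` are hypothetical names, and the accepted spellings are assumed to be Spark's two-letter, three-letter, and full English forms. Compared with `trim().to_uppercase()`, this version does not allocate an intermediate uppercase String.

```rust
/// Hypothetical table of accepted spellings, indexed 0 = Sunday .. 6 = Saturday.
const DAY_NAMES: [[&str; 3]; 7] = [
    ["SU", "SUN", "SUNDAY"],
    ["MO", "MON", "MONDAY"],
    ["TU", "TUE", "TUESDAY"],
    ["WE", "WED", "WEDNESDAY"],
    ["TH", "THU", "THURSDAY"],
    ["FR", "FRI", "FRIDAY"],
    ["SA", "SAT", "SATURDAY"],
];

/// Returns the weekday index for a recognized name, or None for a
/// malformed value (which the caller would surface as NULL).
fn parse_day_of_week(input: &str) -> Option<usize> {
    let trimmed = input.trim();
    DAY_NAMES.iter().position(|names| {
        names
            .iter()
            .any(|name| trimmed.eq_ignore_ascii_case(name))
    })
}
```

For example, `parse_day_of_week("  SaTuRdAy ")` returns `Some(6)`, and `parse_day_of_week("Mond")` returns `None`.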

@harshchawra harshchawra force-pushed the fix/spark-next_day-nullability branch from cf214a4 to a001f77 Compare December 17, 2025 13:53
Contributor

@Jefffrey Jefffrey left a comment

@Jefffrey
Contributor

Looks like we already have

@Jefffrey Jefffrey closed this Dec 24, 2025

Development

Successfully merging this pull request may close these issues.

spark next_day need to have custom nullability
