feat: [datafusion-spark] Implement `next_day` function #16780
Conversation
```
@@ -23,5 +23,17 @@
## Original Query: SELECT next_day('2015-01-14', 'TU');
## PySpark 3.5.5 Result: {'next_day(2015-01-14, TU)': datetime.date(2015, 1, 20), 'typeof(next_day(2015-01-14, TU))': 'date', 'typeof(2015-01-14)': 'string', 'typeof(TU)': 'string'}
#query
#SELECT next_day('2015-01-14'::string, 'TU'::string);
query D
```
I recommend adding tests for invalid inputs:
- 0 or more than 2 arguments
- Each argument can be a valid input, an invalid input of the correct type (like `2015-13-32`), an invalid type, or NULL. We want to test different combinations to ensure that invalid inputs produce the expected (and easy-to-understand) errors instead of panicking.

Also, here we only checked `ScalarValue` inputs; let's add the same tests for `Array` inputs.
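In the .slt register already used by this PR, such cases might look like the following sketch. The exact error classes/messages and the NULL-propagation results are assumptions and would need to be adjusted to the function's actual output:

```
# hypothetical additions to next_day.slt -- wrong arity, invalid value, NULL
query error
SELECT next_day('2015-01-14'::string);

query error
SELECT next_day('2015-01-14'::string, 'TU'::string, 'extra');

query D
SELECT next_day(NULL, 'TU');
----
NULL
```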
Added all of these cases
Pull Request Overview
Add support for Spark's `next_day` function in DataFusion by implementing the UDF and its tests, registering it in the datetime module, and adding `chrono` as a dependency.
- Introduced SQLLogicTest cases for `next_day`
- Implemented `SparkNextDay` UDF (scalar + array)
- Registered the UDF in `mod.rs` and updated `Cargo.toml`
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| next_day.slt | Added functional tests for next_day with various inputs |
| next_day.rs | Full implementation of next_day UDF logic |
| mod.rs | Registered and exported next_day in datetime module |
| Cargo.toml | Added chrono as a workspace dependency |
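Behind the scalar path of a `next_day`-style UDF is simple modular weekday arithmetic. A minimal dependency-free sketch of that core step (the function names and the prefix-matching rule for day abbreviations are illustrative assumptions, not the PR's actual API):

```rust
/// Days to add so the result is strictly later than the start date.
/// Weekdays are numbered 0 = Monday .. 6 = Sunday.
fn days_until_next(current: u32, target: u32) -> u32 {
    // "+ 7" keeps the subtraction non-negative; the "- 1 ... + 1" pair
    // maps a same-day match to a full week ahead, so the result is 1..=7.
    ((target + 7 - current - 1) % 7) + 1
}

/// Parse day-of-week names and abbreviations ("MO", "TUE", "Monday", ...).
/// Case-insensitive prefix matching here is a sketch, not Spark's exact rule.
fn parse_day_of_week(s: &str) -> Option<u32> {
    const DAYS: [&str; 7] = [
        "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", "SUNDAY",
    ];
    let s = s.trim().to_ascii_uppercase();
    if s.len() < 2 {
        return None; // a single letter like "T" would be ambiguous
    }
    DAYS.iter().position(|d| d.starts_with(s.as_str())).map(|i| i as u32)
}

fn main() {
    // 2015-01-14 is a Wednesday (2); the next Tuesday (1) is 6 days later,
    // matching the PySpark result 2015-01-20 quoted above.
    let target = parse_day_of_week("TU").expect("valid day");
    assert_eq!(days_until_next(2, target), 6);
    println!("ok");
}
```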
Comments suppressed due to low confidence (3)
datafusion/spark/src/function/datetime/next_day.rs:77
- The code only handles `Date32` inputs for the date argument, but the tests pass string dates. You need to add a branch to parse `ScalarValue::Utf8`/`LargeUtf8` as ISO-8601 dates and convert them to `Date32` before computing the next day.
(ColumnarValue::Scalar(date), ColumnarValue::Scalar(day_of_week)) => {
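For such a branch, the string has to become days since the Unix epoch (the `Date32` representation). The PR pulls in chrono for this, but as a sketch of what the conversion involves, here is a dependency-free version using the standard civil-calendar formula; names are illustrative, and real code should fully validate ranges, which this sketch only partially does:

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (Howard Hinnant's days_from_civil algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;
    let mp = if m > 2 { m - 3 } else { m + 9 };
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146097 + doe - 719468
}

/// Parse "YYYY-MM-DD" into epoch days; returns None for malformed input.
/// (Exact per-month day limits, e.g. rejecting Feb 30, are elided here.)
fn parse_iso_date(s: &str) -> Option<i64> {
    let mut parts = s.split('-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: i64 = parts.next()?.parse().ok()?;
    let d: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    Some(days_from_civil(y, m, d))
}

fn main() {
    // Day 0 (1970-01-01) was a Thursday, so with 0 = Monday the weekday is
    // (epoch_days + 3).rem_euclid(7).
    assert_eq!(parse_iso_date("1970-01-01"), Some(0));
    let days = parse_iso_date("2015-01-14").unwrap();
    assert_eq!((days + 3).rem_euclid(7), 2); // Wednesday
    println!("ok");
}
```

With the epoch-day value in hand, the weekday offset computation above applies unchanged before wrapping the result back into a `Date32` scalar.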
datafusion/sqllogictest/test_files/spark/datetime/next_day.slt:32
- Consider adding tests for edge cases such as NULL inputs and invalid weekday strings to verify null propagation and error handling behavior.
SELECT next_day('2015-07-27'::string, 'Sun'::string);
datafusion/spark/Cargo.toml:40
- The syntax for adding a workspace dependency is incorrect. Change to `chrono = { workspace = true }` to match the other entries.

```toml
chrono.workspace = true
```
```rust
export_functions!((
    next_day,
    "Returns the first date which is later than start_date and named as indicated. The function returns NULL if at least one of the input parameters is NULL. When both of the input parameters are not NULL and day_of_week is an invalid input, the function throws SparkIllegalArgumentException if spark.sql.ansi.enabled is set to true, otherwise NULL.",
```
I think this needs to be adjusted. Rust does not have exceptions, and ANSI mode is not hooked up yet (might need something like #16661 for that to happen).
I've removed that part of the doc and moved it to a comment for now.
Thanks! I think it's ready to go after the merge conflict is resolved.
Nice, I resolved the conflicts.
```
## Original Query: SELECT next_day('2015-01-14', 'TU');
## PySpark 3.5.5 Result: {'next_day(2015-01-14, TU)': datetime.date(2015, 1, 20), 'typeof(next_day(2015-01-14, TU))': 'date', 'typeof(2015-01-14)': 'string', 'typeof(TU)': 'string'}
```
You can safely remove the commented-out tests, since their functionality is already covered below.
Additionally, there appear to be some duplicate test cases in the lower section.
Done, thanks
Thanks @petern48, LGTM.
Which issue does this PR close?
- Implement Spark `date` function `next_day` #16775

Rationale for this change
See #16775
What changes are included in this PR?
Implement the Spark-compatible `next_day` function.

Are these changes tested?
Yes, I added tests from all of the links in the Spark Test Files README.md
Are there any user-facing changes?
Yes, new function.