Skip to content

feat: [datafusion-spark] Implement next_day function #16780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jul 29, 2025

Conversation

petern48
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

See #16775

What changes are included in this PR?

Implement spark-compatible next_day function

Are these changes tested?

Yes, I added tests from all of the links in the Spark Test Files README.md

Are there any user-facing changes?

Yes, new function.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) spark labels Jul 15, 2025
@petern48 petern48 changed the title feat: Implement next_day feat: [datafusion-spark] Implement next_day function Jul 15, 2025
@petern48 petern48 marked this pull request as ready for review July 15, 2025 04:55
@@ -23,5 +23,17 @@

## Original Query: SELECT next_day('2015-01-14', 'TU');
## PySpark 3.5.5 Result: {'next_day(2015-01-14, TU)': datetime.date(2015, 1, 20), 'typeof(next_day(2015-01-14, TU))': 'date', 'typeof(2015-01-14)': 'string', 'typeof(TU)': 'string'}
#query
#SELECT next_day('2015-01-14'::string, 'TU'::string);
query D
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend to add tests for invalid inputs:

  1. 0 or >2 inputs
  2. Each element can be either valid input, invalid input of correct type like 2015-13-32, or invalid types, and finally nulls. We want to test different combinations, to ensure for invalid inputs, the expected (and easy-to-understand) errors are returned, instead of panicking.

Also here we only checked ScalarValue() input, let's also do the tests for Array inputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added all of these cases

@2010YOUY01 2010YOUY01 requested a review from Copilot July 15, 2025 06:43
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Add support for Spark’s next_day function in DataFusion by implementing the UDF and its tests, registering it in the datetime module, and adding chrono as a dependency.

  • Introduced SQLLogicTest cases for next_day
  • Implemented SparkNextDay UDF (scalar + array)
  • Registered the UDF in mod.rs and updated Cargo.toml

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

File Description
next_day.slt Added functional tests for next_day with various inputs
next_day.rs Full implementation of next_day UDF logic
mod.rs Registered and exported next_day in datetime module
Cargo.toml Added chrono as a workspace dependency
Comments suppressed due to low confidence (3)

datafusion/spark/src/function/datetime/next_day.rs:77

  • The code only handles Date32 inputs for the date argument but the tests pass string dates. You need to add a branch to parse ScalarValue::Utf8/LargeUtf8 as ISO-8601 dates and convert them to Date32 before computing the next day.
            (ColumnarValue::Scalar(date), ColumnarValue::Scalar(day_of_week)) => {

datafusion/sqllogictest/test_files/spark/datetime/next_day.slt:32

  • Consider adding tests for edge cases such as NULL inputs and invalid weekday strings to verify null propagation and error handling behavior.
SELECT next_day('2015-07-27'::string, 'Sun'::string);

datafusion/spark/Cargo.toml:40

  • The syntax for adding a workspace dependency is incorrect. Change to chrono = { workspace = true } to match the other entries.
chrono.workspace = true


export_functions!((
next_day,
"Returns the first date which is later than start_date and named as indicated. The function returns NULL if at least one of the input parameters is NULL. When both of the input parameters are not NULL and day_of_week is an invalid input, the function throws SparkIllegalArgumentException if spark.sql.ansi.enabled is set to true, otherwise NULL.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be adjusted. Rust does not have exceptions and ansi mode is not hooked up yet (might need something like #16661 for that to happen)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed that part of the doc and moved it to a comment for now.

@petern48 petern48 marked this pull request as draft July 16, 2025 05:09
@petern48 petern48 marked this pull request as ready for review July 25, 2025 16:25
@petern48 petern48 requested review from Omega359 and 2010YOUY01 July 25, 2025 16:26
@2010YOUY01
Copy link
Contributor

Thanks! I think it's ready to go after the merge conflict is resolved.

@petern48
Copy link
Contributor Author

Nice, I resolved the conflicts

Comment on lines 24 to 25
## Original Query: SELECT next_day('2015-01-14', 'TU');
## PySpark 3.5.5 Result: {'next_day(2015-01-14, TU)': datetime.date(2015, 1, 20), 'typeof(next_day(2015-01-14, TU))': 'date', 'typeof(2015-01-14)': 'string', 'typeof(TU)': 'string'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can safely remove the commented-out tests, since their functionality is already covered below.
Additionally, there appear to be some duplicate test cases in the lower section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks

Copy link
Contributor

@Standing-Man Standing-Man left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @petern48, it is LGTM

@2010YOUY01 2010YOUY01 merged commit aab44fd into apache:main Jul 29, 2025
27 checks passed
@petern48 petern48 deleted the next_day branch July 29, 2025 03:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spark sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[datafusion-spark] Implement Spark date function next_day
4 participants