
Conversation

davidlghellin
Contributor

Which issue does this PR close?

Part of #15914.

Rationale for this change

Migrate Spark functions from https://github.com/lakehq/sail/ to the DataFusion engine to unify the codebase.

What changes are included in this PR?

Implement the Spark UDF make_interval:
https://spark.apache.org/docs/latest/api/sql/index.html#make_interval

Are these changes tested?

Yes, unit tests and sqllogictests were added.

Are there any user-facing changes?

Yes, make_interval can now be called in queries.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Sep 4, 2025
@davidlghellin davidlghellin marked this pull request as draft September 4, 2025 19:29
@davidlghellin davidlghellin changed the title feat(spark): implement Spark make_interval function feat(spark): implement Spark make_interval function Sep 6, 2025
@davidlghellin
Contributor Author

Behavior in Spark 3.5 when the years argument overflows:
[image]

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 6, 2025
@davidlghellin
Contributor Author

davidlghellin commented Sep 6, 2025

As of commit f812157, the sqllogictests always return a blank line when make_interval is called with no parameters or with all parameters equal to 0.

We need to check whether all parameters are 0, as in this value:

IntervalMonthDayNano::new(0, 0, 0)

which is returned as a blank line.

[image]

[image: example]

@davidlghellin davidlghellin marked this pull request as ready for review September 8, 2025 21:28
@davidlghellin davidlghellin marked this pull request as draft September 8, 2025 21:40
@davidlghellin davidlghellin marked this pull request as ready for review September 9, 2025 05:49
match length {
x if x > 7 => {
exec_err!(
"make_interval expects between 1 and 7, got {}",
Contributor

Suggested change
"make_interval expects between 1 and 7, got {}",
"make_interval expects between 0 and 7 arguments, got {}",

use arrow::array::AsArray;
use arrow::datatypes::{Float64Type, Int32Type};

// 0 args is in invoke_with_args
Contributor

Is there a reason that special case isn't handled in here instead?

Comment on lines +167 to +174
for (i, a) in args.iter().enumerate().skip(1) {
if a.len() != n_rows {
return exec_err!(
"make_dt_interval: argument {i} has length {}, expected {n_rows}",
a.len()
);
}
}
Contributor

Do functions usually need to do this check themselves? Is it possible to reach this point where functions are called with arrays of uneven length?

use datafusion_common::DataFusionError;

if !sec.is_finite() {
return Err(DataFusionError::Execution("seconds is NaN/Inf".into()));
Contributor

Suggested change
return Err(DataFusionError::Execution("seconds is NaN/Inf".into()));
return Err(DataFusionError::Execution("seconds cannot be NaN or Inf".into()));
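As a rough illustration of the suggestion above, the finiteness check could also report the offending value and sit next to the split into whole seconds and fractional nanoseconds. This is only a sketch under assumptions: `split_seconds` is a hypothetical helper, not the PR's code, and it returns a plain `String` error instead of `DataFusionError` so that it runs with the standard library alone.

```rust
/// Hypothetical helper: splits a seconds value into whole seconds and
/// fractional nanoseconds, rejecting NaN, infinities and out-of-range values.
fn split_seconds(sec: f64) -> Result<(i64, i64), String> {
    if !sec.is_finite() {
        return Err(format!("seconds cannot be NaN or Inf, got {sec}"));
    }

    // Truncate toward zero so the fraction keeps the sign of the input.
    let whole = sec.trunc();

    // `i64::MAX as f64` rounds up to 2^63, so the upper bound must be exclusive.
    if whole < i64::MIN as f64 || whole >= i64::MAX as f64 {
        return Err(format!("seconds value {sec} does not fit in a 64-bit integer"));
    }

    // The fraction lies in (-1, 1); after rounding, its magnitude is at most 1e9.
    let frac_nanos = ((sec - whole) * 1e9).round() as i64;
    Ok((whole as i64, frac_nanos))
}
```

For example, `split_seconds(1.5)` yields `Ok((1, 500_000_000))`, while `split_seconds(f64::NAN)` yields an error that names the value.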


// 0 args is in invoke_with_args
if args.is_empty() || args.len() > 7 {
return exec_err!("make_interval expects between 0 and 7, got {}", args.len());
Contributor

Suggested change
return exec_err!("make_interval expects between 0 and 7, got {}", args.len());
return exec_err!("make_interval expects between 0 and 7 arguments, got {}", args.len());

Though ideally, I believe, type coercion should already handle this check.
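One way the count check and the zero-argument case could live in the same place is sketched below. It assumes that omitted trailing arguments default to 0, as the Spark documentation for make_interval describes. `pad_args` and `MAX_ARGS` are hypothetical names, and all arguments are modelled as `f64` for brevity, which is a simplification of the real signature.

```rust
/// Maximum number of arguments: years, months, weeks, days, hours, mins, secs.
const MAX_ARGS: usize = 7;

/// Hypothetical helper: validates the argument count and fills omitted
/// trailing arguments with 0, so a call with zero arguments needs no
/// separate code path.
fn pad_args(args: &[f64]) -> Result<[f64; MAX_ARGS], String> {
    if args.len() > MAX_ARGS {
        return Err(format!(
            "make_interval expects between 0 and {} arguments, got {}",
            MAX_ARGS,
            args.len()
        ));
    }

    // Start from all zeros and overwrite the prefix that was supplied.
    let mut padded = [0.0; MAX_ARGS];
    padded[..args.len()].copy_from_slice(args);
    Ok(padded)
}
```

With this shape, `pad_args(&[])` returns seven zeros, and the error message matches the range that is actually accepted.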

Comment on lines +319 to +327
let secs_nanos = sec_int
.checked_mul(1_000_000_000)
.ok_or_else(|| DataFusionError::Execution("seconds to nanos overflow".into()))?;

let total_nanos = hours_nanos
.checked_add(mins_nanos)
.and_then(|v| v.checked_add(secs_nanos))
.and_then(|v| v.checked_add(frac_nanos))
.ok_or_else(|| DataFusionError::Execution("sum nanos overflow".into()))?;
Contributor

I think these error messages could have some more detail to help the user understand exactly what happened
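To make that concrete, here is a sketch of overflow errors that name the component and the value involved. The helper names (`to_nanos`, `total_nanos`), the constants and the `String` error type are assumptions chosen so the example runs with the standard library alone; the PR itself uses `DataFusionError::Execution`.

```rust
const NANOS_PER_SEC: i64 = 1_000_000_000;
const NANOS_PER_MIN: i64 = 60 * NANOS_PER_SEC;
const NANOS_PER_HOUR: i64 = 60 * NANOS_PER_MIN;

/// Hypothetical helper: converts one component to nanoseconds and, on
/// overflow, reports which component and which value caused it.
fn to_nanos(name: &str, value: i64, factor: i64) -> Result<i64, String> {
    value.checked_mul(factor).ok_or_else(|| {
        format!("make_interval: {name} value {value} overflows i64 when converted to nanoseconds")
    })
}

/// Hypothetical helper: sums the time components as nanoseconds, using
/// checked arithmetic at every step.
fn total_nanos(hours: i32, mins: i32, sec_int: i64, frac_nanos: i64) -> Result<i64, String> {
    let hours_nanos = to_nanos("hours", i64::from(hours), NANOS_PER_HOUR)?;
    let mins_nanos = to_nanos("mins", i64::from(mins), NANOS_PER_MIN)?;
    let secs_nanos = to_nanos("secs", sec_int, NANOS_PER_SEC)?;

    hours_nanos
        .checked_add(mins_nanos)
        .and_then(|v| v.checked_add(secs_nanos))
        .and_then(|v| v.checked_add(frac_nanos))
        .ok_or_else(|| {
            format!(
                "make_interval: sum of hours ({hours}), mins ({mins}) and secs ({sec_int}) overflows i64 nanoseconds"
            )
        })
}
```

A caller that hits an overflow then sees, for instance, that the hours value was too large, rather than a generic "sum nanos overflow".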

Comment on lines +93 to +102
# seconds is now blank line
#query ?
#SELECT make_interval(0, 0, 0, 0, 0, 0, 0.0);
#----
#0.000000000 secs

#query ?
#SELECT make_interval();
#----
#0.000000000 secs
Contributor

Is there a plan to uncomment these tests?

@davidlghellin davidlghellin marked this pull request as draft September 11, 2025 05:33