-
Notifications
You must be signed in to change notification settings - Fork 1.6k
feat(spark): implement Spark make_interval
function
#17424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
make_interval
function
match length { | ||
x if x > 7 => { | ||
exec_err!( | ||
"make_interval expects between 1 and 7, got {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"make_interval expects between 1 and 7, got {}", | |
"make_interval expects between 0 and 7 arguments, got {}", |
use arrow::array::AsArray; | ||
use arrow::datatypes::{Float64Type, Int32Type}; | ||
|
||
// 0 args is in invoke_with_args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason that special case isn't handled in here instead?
for (i, a) in args.iter().enumerate().skip(1) { | ||
if a.len() != n_rows { | ||
return exec_err!( | ||
"make_dt_interval: argument {i} has length {}, expected {n_rows}", | ||
a.len() | ||
); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do functions usually need to do this check themselves? Is it possible to reach this point where functions are called with arrays of uneven length?
use datafusion_common::DataFusionError; | ||
|
||
if !sec.is_finite() { | ||
return Err(DataFusionError::Execution("seconds is NaN/Inf".into())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return Err(DataFusionError::Execution("seconds is NaN/Inf".into())); | |
return Err(DataFusionError::Execution("seconds cannot be NaN or Inf".into())); |
|
||
// 0 args is in invoke_with_args | ||
if args.is_empty() || args.len() > 7 { | ||
return exec_err!("make_interval expects between 0 and 7, got {}", args.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return exec_err!("make_interval expects between 0 and 7, got {}", args.len()); | |
return exec_err!("make_interval expects between 0 and 7 arguments, got {}", args.len()); |
Though ideally coerce types should already handle this check, I believe
let secs_nanos = sec_int | ||
.checked_mul(1_000_000_000) | ||
.ok_or_else(|| DataFusionError::Execution("seconds to nanos overflow".into()))?; | ||
|
||
let total_nanos = hours_nanos | ||
.checked_add(mins_nanos) | ||
.and_then(|v| v.checked_add(secs_nanos)) | ||
.and_then(|v| v.checked_add(frac_nanos)) | ||
.ok_or_else(|| DataFusionError::Execution("sum nanos overflow".into()))?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these error messages could have some more detail to help the user understand exactly what happened
# seconds is now blank line | ||
#query ? | ||
#SELECT make_interval(0, 0, 0, 0, 0, 0, 0.0); | ||
#---- | ||
#0.000000000 secs | ||
|
||
#query ? | ||
#SELECT make_interval(); | ||
#---- | ||
#0.000000000 secs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a plan to uncomment these tests?
Which issue does this PR close?
part of #15914
Rationale for this change
Migrate spark functions from https://github.com/lakehq/sail/ to datafusion engine to unify codebase
What changes are included in this PR?
implement spark udf make_interval
https://spark.apache.org/docs/latest/api/sql/index.html#make_interval
Are these changes tested?
unit-tests and sqllogictests added
Are there any user-facing changes?
now can be called in queries