-
Notifications
You must be signed in to change notification settings - Fork 1.9k
feat: allow to skip named parameters and fill skipped with NULL #19308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: allow to skip named parameters and fill skipped with NULL #19308
Conversation
|
One thing I'm not understanding, is that this PR seems to imply all arguments are now optional and any can be skipped, filling in skipped arguments with scalar For example if we had a UDF like so: Technically here custom_udf(prefix => 'a', length => 1)We'd always have 3 arguments provided because
|
a2e6175 to
0b61542
Compare
|
Hey @Jefffrey, thanks for having a look. What actually happens: This PR allows skipping any parameter with named arguments by filling it with NULL. The UDF receives that NULL and can't distinguish between explicit NULL vs skipped parameter. All three are identical: custom_udf('a', NULL, 1)
custom_udf(prefix=>'a', suffix=>NULL, length=>1)
custom_udf(prefix=>'a', length=>1)All three pass 3 arguments with suffix=NULL to the function. Regarding your OneOf example: That wouldn't work well with parameter_names regardless of this PR - shorter signatures must be prefixes of longer ones (each position means the same thing across variants). Your example has position 1 meaning different things (length vs suffix). On NULL compatibility: NULL is universally accepted by all type signatures in DataFusion - the signature matching logic explicitly treats The function implementation decides how to handle that NULL (skip it, error, return NULL, etc.). The PR just sets all missing previous parameters NULL under the hood instead of requiring consecutive filling. Please have a look at the most recent commit where I stress test your concerns. Does that clarify things? I will update the title and description of the PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide to go ahead with this approach we need to ensure we have documentation explaining this behaviour.
Also, while having more tests is great, it would be also good if we can work on compacting it where possible as I'm not sure the amount of tests introduced here is strictly necessary (test code is also code that needs to be maintained 😅)
| } | ||
|
|
||
| #[test] | ||
| fn test_alternating_filled_and_skipped() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel this test is the same as test_sparse_parameters, can probably remove this
| @@ -269,3 +269,80 @@ SELECT row_number(value => 1) OVER (ORDER BY id) FROM window_test; | |||
| # Cleanup | |||
| statement ok | |||
| DROP TABLE window_test; | |||
|
|
|||
| ############# | |||
| ## Test UDF with Many Optional Parameters (test_optional_params - sums p1 through p7) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these SLT tests duplicate the same unit tests in arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there's an overlap between the SLTs and the unit tests. What kind is generally preferred?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer SLTs
| Self { | ||
| signature: Signature::one_of( | ||
| vec![ | ||
| // Support 1 to 7 parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we can reduce the number of parameters here, and for the invoke do something as simple as outputting a string of which parameters were provided, to make it more clear which parameters were actually successfully passed through.
Relates to:
Rationale for this change
Named arguments currently require all parameters from the first to the last provided argument to be explicitly specified. This forces verbose workarounds like
func(a => 1, b => NULL, c => NULL, d => 5)instead of the cleanerfunc(a => 1, d => 5).What changes are included in this PR?
Modified
datafusion/expr/src/arguments.rsto automatically fill skipped parameters with NULL when using named arguments. The algorithm finds the highest provided parameter index and fills any gaps with NULL expressions.This means:
func(100, NULL, 300)(positional NULL)func(p1 => 100, p2 => NULL, p3 => 300)(explicit NULL)func(p1 => 100, p3 => 300)(skipped parameter)All three are equivalent - they pass the same NULL value to the function at position 2.
Are these changes tested?
Yes. Added unit tests in
arguments.rsand SQL integration tests innamed_arguments.slt, including tests that verify explicit NULL and skipped parameters behave identically across different type signatures (Int64, String, etc.).Are there any user-facing changes?
Yes. Users can now skip middle parameters with named arguments - they will be filled with NULL. The function signature validation accepts NULL for any type, and it's up to the UDF implementation to handle NULL values appropriately.
This change is fully backward compatible - all existing queries continue to work unchanged.