@bubulalabu
Contributor

@bubulalabu bubulalabu commented Dec 13, 2025

Relates to:

Rationale for this change

Named arguments currently require all parameters from the first to the last provided argument to be explicitly specified. This forces verbose workarounds like func(a => 1, b => NULL, c => NULL, d => 5) instead of the cleaner func(a => 1, d => 5).

What changes are included in this PR?

Modified datafusion/expr/src/arguments.rs to automatically fill skipped parameters with NULL when using named arguments. The algorithm finds the highest provided parameter index and fills any gaps with NULL expressions.

This means:

  • func(100, NULL, 300) (positional NULL)
  • func(p1 => 100, p2 => NULL, p3 => 300) (explicit NULL)
  • func(p1 => 100, p3 => 300) (skipped parameter)

All three are equivalent - they pass the same NULL value to the function at position 2.
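The gap-filling step described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in datafusion/expr/src/arguments.rs: the `Arg` enum stands in for DataFusion expressions, the helper name `fill_skipped_with_null` is hypothetical, and mixing positional with named arguments is left out for brevity.

```rust
/// Toy stand-in for an expression; the real code operates on DataFusion expressions.
#[derive(Debug, Clone, PartialEq)]
enum Arg {
    Value(i64),
    Null,
}

/// Hypothetical helper: place named arguments by parameter position and
/// fill every skipped slot up to the highest provided index with NULL.
fn fill_skipped_with_null(
    param_names: &[&str],
    named: &[(&str, Arg)],
) -> Result<Vec<Arg>, String> {
    let mut slots: Vec<Option<Arg>> = vec![None; param_names.len()];
    let mut highest: Option<usize> = None;

    for (name, value) in named {
        let idx = param_names
            .iter()
            .position(|p| p == name)
            .ok_or_else(|| format!("unknown parameter '{name}'"))?;
        if slots[idx].is_some() {
            return Err(format!("parameter '{name}' specified more than once"));
        }
        slots[idx] = Some(value.clone());
        highest = Some(highest.map_or(idx, |h| h.max(idx)));
    }

    // Keep slots 0..=highest; parameters after the last provided one are not filled.
    let len = highest.map_or(0, |h| h + 1);
    Ok(slots
        .into_iter()
        .take(len)
        .map(|slot| slot.unwrap_or(Arg::Null))
        .collect())
}
```

Under these assumptions, calling the helper with p1 and p3 only yields the same three-element argument list as passing p2 => NULL explicitly.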

Are these changes tested?

Yes. Added unit tests in arguments.rs and SQL integration tests in named_arguments.slt, including tests that verify explicit NULL and skipped parameters behave identically across different type signatures (Int64, String, etc.).

Are there any user-facing changes?

Yes. Users can now skip middle parameters with named arguments - they will be filled with NULL. The function signature validation accepts NULL for any type, and it's up to the UDF implementation to handle NULL values appropriately.

This change is fully backward compatible - all existing queries continue to work unchanged.

@github-actions bot added the logical-expr (Logical plan and expressions) and sqllogictest (SQL Logic Tests (.slt)) labels on Dec 13, 2025
@bubulalabu changed the title from "Fix allow to skip middle optional named parameters" to "fix: allow to skip middle optional named parameters" on Dec 13, 2025
@Jefffrey
Contributor

One thing I'm not understanding is that this PR seems to imply all arguments are now optional and any can be skipped, with skipped arguments filled in with scalar NULL. This means it is up to the UDFs themselves to check which arguments were provided (and define which are required), and to assume that any scalar NULL means the argument was missing (even if the caller passed in a scalar NULL themselves), correct?

For example if we had a UDF like so:

name: custom_udf
signatures:
  - prefix: string, length: i64
  - prefix: string, suffix: string, length: i64

Technically suffix is optional here. If we call the function with 2 arguments (string & i64), then we know that only 2 arguments were provided, and the implementation of the UDF can branch based on that. However, if we call with named arguments for prefix and length but still omit suffix:

custom_udf(prefix => 'a', length => 1)

We'd always have 3 arguments provided because suffix gets filled with null.

  • Technically I think this code wouldn't work on current main anyway, now that I think of it; though it would just cause an error

@bubulalabu force-pushed the fix-allow-to-skip-middle-optional-named-parameters branch from a2e6175 to 0b61542 on December 17, 2025 17:02
@bubulalabu
Contributor Author

Hey @Jefffrey, thanks for having a look.

What actually happens: This PR allows skipping any parameter with named arguments by filling it with NULL. The UDF receives that NULL and can't distinguish between explicit NULL vs skipped parameter. All three are identical:

custom_udf('a', NULL, 1)
custom_udf(prefix=>'a', suffix=>NULL, length=>1)
custom_udf(prefix=>'a', length=>1)

All three pass 3 arguments with suffix=NULL to the function.

Regarding your OneOf example: That wouldn't work well with parameter_names regardless of this PR - shorter signatures must be prefixes of longer ones (each position means the same thing across variants). Your example has position 1 meaning different things (length vs suffix).
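The prefix rule mentioned above can be checked mechanically. The following is a rough sketch with a hypothetical helper name (`positions_are_consistent`); it is not part of DataFusion and only compares parameter names position by position.

```rust
/// Returns true when every pair of signatures agrees on the parameter name
/// at each shared position, i.e. shorter signatures are prefixes of longer ones.
fn positions_are_consistent(signatures: &[Vec<&str>]) -> bool {
    signatures.iter().all(|a| {
        signatures
            .iter()
            // `zip` stops at the shorter signature, so only shared positions are compared.
            .all(|b| a.iter().zip(b.iter()).all(|(x, y)| x == y))
    })
}
```

With the two signatures from the example, (prefix, length) and (prefix, suffix, length), the check fails because position 1 is length in one variant and suffix in the other.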

On NULL compatibility: NULL is universally accepted by all type signatures in DataFusion - the signature matching logic explicitly treats DataType::Null as compatible with any type. So signature validation will always pass with the NULL-filled arguments. All functions are expected to handle NULLs.
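As a toy model of that compatibility rule (an assumption-laden sketch, not DataFusion's actual coercion code; the three-variant `DataType` enum and both function names are made up for illustration):

```rust
/// Simplified stand-in for a data type enum; the real one has many more variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Null,
    Int64,
    Utf8,
}

/// A NULL-typed argument is accepted wherever any type is expected.
fn arg_type_accepted(expected: DataType, actual: DataType) -> bool {
    actual == DataType::Null || actual == expected
}

/// A signature matches when the arity agrees and every argument is accepted.
fn signature_accepts(expected: &[DataType], actual: &[DataType]) -> bool {
    expected.len() == actual.len()
        && expected
            .iter()
            .zip(actual)
            .all(|(e, a)| arg_type_accepted(*e, *a))
}
```

In this model, a (Utf8, Utf8, Int64) signature accepts (Utf8, Null, Int64), which is why a NULL-filled suffix would pass validation.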

The function implementation decides how to handle that NULL (skip it, error, return NULL, etc.). The PR just sets all skipped preceding parameters to NULL under the hood instead of requiring consecutive filling.
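To make the three handling options concrete, here is a small sketch over plain `Option<i64>` values, where `None` plays the role of NULL. The function names are hypothetical and the code does not use the DataFusion UDF API.

```rust
/// Skip NULLs: sum only the values that are present.
fn sum_skipping_nulls(args: &[Option<i64>]) -> i64 {
    args.iter().flatten().sum()
}

/// Return NULL: any NULL input makes the whole result NULL.
fn sum_propagating_nulls(args: &[Option<i64>]) -> Option<i64> {
    args.iter().copied().sum()
}

/// Error: reject the call as soon as a NULL argument is seen.
fn sum_rejecting_nulls(args: &[Option<i64>]) -> Result<i64, String> {
    args.iter()
        .copied()
        .enumerate()
        .map(|(i, a)| a.ok_or_else(|| format!("argument {i} must not be NULL")))
        .sum()
}
```

None of the three variants can tell whether a `None` came from an explicit NULL or from a skipped parameter, which matches the behaviour described above.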

Please have a look at the most recent commit where I stress test your concerns.

Does that clarify things?

I will update the title and description of the PR.

@bubulalabu changed the title from "fix: allow to skip middle optional named parameters" to "fix: allow to skip named parameters and fill skipped with NULL" on Dec 17, 2025
@bubulalabu changed the title from "fix: allow to skip named parameters and fill skipped with NULL" to "feat: allow to skip named parameters and fill skipped with NULL" on Dec 17, 2025
Contributor

@Jefffrey Jefffrey left a comment

If we decide to go ahead with this approach we need to ensure we have documentation explaining this behaviour.

Also, while having more tests is great, it would also be good if we can work on compacting them where possible, as I'm not sure the number of tests introduced here is strictly necessary (test code is also code that needs to be maintained 😅)

}

#[test]
fn test_alternating_filled_and_skipped() {
Contributor

I feel this test is the same as test_sparse_parameters, can probably remove this

@@ -269,3 +269,80 @@ SELECT row_number(value => 1) OVER (ORDER BY id) FROM window_test;
# Cleanup
statement ok
DROP TABLE window_test;

#############
## Test UDF with Many Optional Parameters (test_optional_params - sums p1 through p7)
Contributor

Do these SLT tests duplicate the same unit tests in arguments?

Contributor Author

Yes, there's an overlap between the SLTs and the unit tests. What kind is generally preferred?

Contributor

We prefer SLTs

Self {
signature: Signature::one_of(
vec![
// Support 1 to 7 parameters
Contributor

I feel we can reduce the number of parameters here, and for the invoke do something as simple as outputting a string of which parameters were provided, to make it more clear which parameters were actually successfully passed through.
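One possible shape for such an invoke body, sketched over plain `Option<i64>` values rather than the real UDF API (the helper name `describe_provided` and the output format are assumptions made for illustration):

```rust
/// Build a string listing only the parameters that carry a non-NULL value.
fn describe_provided(param_names: &[&str], args: &[Option<i64>]) -> String {
    param_names
        .iter()
        .zip(args.iter().copied())
        // `None` models NULL, so explicit NULLs and skipped parameters are both omitted.
        .filter_map(|(name, arg)| arg.map(|v| format!("{name}={v}")))
        .collect::<Vec<_>>()
        .join(", ")
}
```

A test UDF built along these lines would make each SLT result show directly which parameters reached the function with a value.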
