fix: derive Spark sha2 nullability and add tests #19323
Pull request overview
This PR aligns the nullability handling of the Spark `sha2` function with Spark semantics by deriving the output field's nullability from the input fields and any scalar arguments. To track that nullability information, the implementation switches from `return_type` to `return_field_from_args`.
Key Changes:
- Replaced `return_type` with `return_field_from_args` for nullability-aware return type derivation
- Added logic to compute output nullability based on input field nullability and null scalar values
- Added comprehensive unit tests for various nullability scenarios
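The derivation rule summarized above can be sketched in a few lines of Rust. This is a minimal sketch under stated assumptions, not the PR's code. `ArgInfo`, `ScalarArg` and `output_nullable` are hypothetical stand-ins for the field and scalar information DataFusion hands to `return_field_from_args`, reduced to plain values. On its own, this rule treats a non-nullable column of bit lengths as producing a non-nullable result, which is the point the review discussion questions.

```rust
/// Hypothetical summary of an argument's literal status at planning time.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ScalarArg {
    /// The argument is not a literal (e.g. a column reference).
    NotScalar,
    /// The argument is a NULL literal.
    Null,
    /// The argument is a non-NULL literal.
    NonNull,
}

/// Hypothetical, simplified view of one `sha2` argument.
#[derive(Debug, Clone, Copy)]
pub struct ArgInfo {
    /// Whether the argument's field is declared nullable.
    pub field_nullable: bool,
    /// Whether the argument is a literal, and if so whether it is NULL.
    pub scalar: ScalarArg,
}

/// Rule as described in the overview: the output is nullable if any input
/// field is nullable or any scalar argument is a NULL literal.
pub fn output_nullable(args: &[ArgInfo]) -> bool {
    args.iter()
        .any(|a| a.field_nullable || a.scalar == ScalarArg::Null)
}
```

For example, a non-nullable `expr` column combined with a non-NULL literal bit length yields `false`, while a NULL literal in either position yields `true`.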
@rluvaton, are we sure about this requirement? I took a look at the Spark code and it seems to be nullable, unless I'm reading it wrong or looking in the wrong place:

Also, we need to consider the case where the bit length is an array: if a bit length is invalid, the result should be NULL. See this example test:

```sql
statement ok
CREATE TABLE test_table (
  expr STRING NOT NULL,
  bit_length INT NOT NULL
) as VALUES
  ('foo', 0),
  ('foo', 999)
;

query T
select sha2(arrow_cast(expr, 'Utf8'), bit_length) from test_table;
----
2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
NULL
```

So deriving `nullable = false` would only work if `bit_length` is a constant scalar.
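The reviewer's point can be modeled both per row and at planning time. The sketch below is hypothetical: `BitLengthArg`, `digest_bits`, `sha2_output_nullable` and `per_row_digest_bits` are illustrative names, not DataFusion APIs. It assumes Spark's documented `sha2` behavior: 224, 256, 384 and 512 are supported, 0 is treated as 256, and any other bit length yields NULL.

```rust
/// Hypothetical model of the `sha2` bit-length argument at planning time.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BitLengthArg {
    /// A column (or other non-constant expression): the value varies per row.
    Column,
    /// A constant literal; `None` models a NULL literal.
    Scalar(Option<i32>),
}

/// Digest size selected by a Spark-style `sha2` bit length.
/// Assumes Spark semantics: 0 means 256, and unsupported values give NULL.
pub fn digest_bits(bit_length: i32) -> Option<u32> {
    match bit_length {
        0 | 256 => Some(256),
        224 => Some(224),
        384 => Some(384),
        512 => Some(512),
        _ => None,
    }
}

/// Planning-time rule implied by the review: the output can only be declared
/// non-nullable when `expr` is non-nullable AND `bit_length` is a non-NULL
/// constant that selects a supported digest.
pub fn sha2_output_nullable(expr_nullable: bool, bit_length: BitLengthArg) -> bool {
    match bit_length {
        BitLengthArg::Scalar(Some(bits)) => expr_nullable || digest_bits(bits).is_none(),
        // A NULL literal, or a column whose rows may hold invalid lengths.
        BitLengthArg::Scalar(None) | BitLengthArg::Column => true,
    }
}

/// Per-row view of the example table: each row's bit length is checked
/// independently, so an invalid value yields `None` (NULL) for that row only.
pub fn per_row_digest_bits(bit_lengths: &[i32]) -> Vec<Option<u32>> {
    bit_lengths.iter().map(|&b| digest_bits(b)).collect()
}
```

Applied to the table above, `per_row_digest_bits(&[0, 999])` gives `[Some(256), None]`, matching the SHA-256 digest and the `NULL` in the expected output. Because `bit_length` is a column, `sha2_output_nullable` reports `true` even though both columns are declared `NOT NULL`.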
You are right; I was wrong about some of the expressions. Would you mind closing the issue with this comment?
Thanks @ShashidharM0118 & @martin-g, but it seems these changes aren't required.
Which issue does this PR close?
Closes #19159
Rationale for this change
Align Spark `sha2` nullability with Spark semantics; the output reflects nullable inputs/scalars.
What changes are included in this PR?
- `return_field_from_args`, using input types/nullability (including null scalars).
- `return_type` to point to `return_field_from_args`.
- `Field`, `FieldRef`, `ReturnFieldArgs`, `internal_err`.
- `test_sha2_nullability`.
Are these changes tested?
Yes,
Are there any user-facing changes?
No