
Conversation

@shashidhar-bm
Contributor

Which issue does this PR close?

Closes #19159

Rationale for this change

Align the Spark sha2 function's output nullability with Spark semantics: the output field is nullable when any input field is nullable or any scalar argument is null.

What changes are included in this PR?

  • Derive return field via return_field_from_args, using input types/nullability (including null scalars).
  • Stub return_type to return an internal error directing callers to return_field_from_args.
  • Update imports for Field, FieldRef, ReturnFieldArgs, internal_err.
  • Add nullability unit test test_sha2_nullability.
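The derivation described above can be sketched as follows. This is a simplified standalone model with hypothetical types (`ArgInfo`, `sha2_output_nullable`), not the actual DataFusion `ReturnFieldArgs`/`FieldRef` API; it only illustrates the rule that the output is nullable when any input field is nullable or any scalar argument is a literal NULL.

```rust
// Hypothetical stand-in for the per-argument info that
// return_field_from_args would inspect in DataFusion.
#[derive(Debug, Clone)]
struct ArgInfo {
    nullable: bool,
    // Some(None)    = a literal NULL scalar argument,
    // Some(Some(v)) = a known constant value,
    // None          = not a constant (e.g. a column reference).
    constant: Option<Option<i64>>,
}

// Output is nullable if any input field is nullable or any
// scalar argument is a literal NULL.
fn sha2_output_nullable(args: &[ArgInfo]) -> bool {
    args.iter()
        .any(|a| a.nullable || matches!(a.constant, Some(None)))
}

fn main() {
    let non_null_col = ArgInfo { nullable: false, constant: None };
    let nullable_col = ArgInfo { nullable: true, constant: None };
    let null_scalar = ArgInfo { nullable: false, constant: Some(None) };

    assert!(!sha2_output_nullable(&[non_null_col.clone(), non_null_col.clone()]));
    assert!(sha2_output_nullable(&[nullable_col, non_null_col.clone()]));
    assert!(sha2_output_nullable(&[non_null_col, null_scalar]));
    println!("ok");
}
```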

Are these changes tested?

Yes; the new test_sha2_nullability unit test covers the nullability scenarios.

Are there any user-facing changes?

No

Copilot AI review requested due to automatic review settings December 15, 2025 06:50
@github-actions github-actions bot added the spark label Dec 15, 2025
Contributor

Copilot AI left a comment

Pull request overview

This PR aligns the Spark sha2 function's nullability handling with Spark semantics by deriving the output field's nullability from input types and scalars. The implementation shifts from return_type to return_field_from_args to properly track nullability information.

Key Changes:

  • Replaced return_type with return_field_from_args for nullability-aware return type derivation
  • Added logic to compute output nullability based on input field nullability and null scalar values
  • Added comprehensive unit tests for various nullability scenarios


@shashidhar-bm shashidhar-bm force-pushed the fix/spark-sha2-nullability branch from 5456164 to 05c187b on December 15, 2025 14:56
Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Jefffrey
Contributor

@rluvaton are we sure about this requirement? I took a look at the Spark code and sha2 seems to be nullable, unless I'm reading it wrong or looking in the wrong place:

https://github.com/apache/spark/blob/20af8bdfb9073be5711cea5df0c5ce0ff168e92c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L107-L114

Also, we need to consider the case where the bit length is an array column; if it contains an invalid bit length, the result should be null. See this example test:

statement ok
CREATE TABLE test_table (
  expr STRING NOT NULL,
  bit_length INT NOT NULL
) as VALUES
  ('foo', 0),
  ('foo', 999)
;

query T
select sha2(arrow_cast(expr, 'Utf8'), bit_length) from test_table;
----
2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
NULL

So deriving nullable = false would only work if bit_length is a valid constant scalar.
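The decision above can be sketched as follows. This is a hypothetical standalone sketch (the helper names `is_valid_sha2_bit_length` and `can_be_non_nullable` are illustrative, not DataFusion API); it assumes Spark's sha2 accepts bit lengths 224, 256, 384, and 512, with 0 treated as 256, which matches the example test where bit length 0 produces a SHA-256 digest and 999 produces NULL.

```rust
// Assumed valid Spark sha2 bit lengths: 0 (treated as 256), 224, 256, 384, 512.
fn is_valid_sha2_bit_length(bits: i64) -> bool {
    matches!(bits, 0 | 224 | 256 | 384 | 512)
}

// The output can only be declared non-nullable when the expression
// itself is non-nullable AND the bit length is a known constant that
// is valid. A bit-length column (None) may hold invalid values that
// yield NULL at runtime, so it forces nullable = true.
fn can_be_non_nullable(expr_nullable: bool, bit_length_constant: Option<i64>) -> bool {
    !expr_nullable
        && matches!(bit_length_constant, Some(b) if is_valid_sha2_bit_length(b))
}

fn main() {
    assert!(can_be_non_nullable(false, Some(256)));  // valid constant
    assert!(!can_be_non_nullable(false, Some(999))); // invalid constant -> NULL row
    assert!(!can_be_non_nullable(false, None));      // column: may be invalid
    assert!(!can_be_non_nullable(true, Some(256)));  // nullable input
    println!("ok");
}
```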

@rluvaton
Member

rluvaton commented Dec 22, 2025

You are right; I was wrong about some of the expressions.

Would you mind closing the issue with this comment?

@Jefffrey
Contributor

Thanks @ShashidharM0118 & @martin-g, but it seems these changes aren't required.

@Jefffrey Jefffrey closed this Dec 22, 2025


Development

Successfully merging this pull request may close these issues.

spark sha2 need to have custom nullability