@jatin510
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

This commit adds support for the Arrow Float16 type in Substrait plans.

Are these changes tested?

Yes

Are there any user-facing changes?

Add support for Arrow Float16 type in Substrait plans

@github-actions github-actions bot added the substrait Changes to the substrait crate label Jul 15, 2025
@alamb
Contributor

alamb commented Jul 15, 2025

Thank you @jatin510

@gabotechs or @LiaCastaneda would you have time to review this PR?

Contributor

@gabotechs gabotechs left a comment

We probably should follow another approach based on UDTs for shipping support for F16s.

I asked whether it was fine to use a type variation reference for F16s in substrait-io/substrait#822, and this was the response:

No. Type variations are different encodings for a type (e.g. dictionary, string view) and they must be able to map 1:1 with the base type.

I don't think there's any precedent for using a UDT to represent an Arrow type in Substrait, but maybe this is the first use case.

@jatin510
Contributor Author

made some changes @gabotechs

Contributor

@LiaCastaneda LiaCastaneda left a comment

This makes sense to me, unless Gabriel thinks otherwise, since he’s leading the epic.

DataType::Float16 => Ok(substrait::proto::Type {
    kind: Some(r#type::Kind::UserDefined(r#type::UserDefined {
        type_reference: FLOAT16_TYPE_REF,
        type_variation_reference: 0,
Contributor

Should we use DEFAULT_TYPE_VARIATION_REF here instead of the literal 0?

Contributor

@gabotechs gabotechs left a comment

This is starting to look good! Unfortunately, I think we might be missing some important details about working with UDTs in Substrait.

When working with User-Defined Types and User-Defined Functions in Substrait, their references need to appear at the top-level node that represents the plan:

https://github.com/substrait-io/substrait/blob/main/proto/substrait/plan.proto#L32-L35
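To make the top-level requirement concrete, here is a minimal sketch of the invariant. All of the structs below are simplified stand-ins invented for illustration, not the prost-generated messages from the `substrait` crate, and the name `fp16` is only a placeholder for whatever name the extension file declares. The idea is that every user-defined type reference used inside the plan should resolve to a declaration held by the plan itself.

```rust
use std::collections::HashSet;

// Simplified, hypothetical stand-ins for the Substrait messages.
// These are NOT the real types generated from plan.proto.
#[derive(Debug, Clone, PartialEq)]
pub struct TypeDeclaration {
    pub type_anchor: u32,
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Type {
    Fp32,
    UserDefined { type_reference: u32 },
}

#[derive(Debug, Default)]
pub struct Plan {
    /// Declarations carried by the top-level plan node.
    pub extensions: Vec<TypeDeclaration>,
    /// Types that appear somewhere inside the plan's relations.
    pub used_types: Vec<Type>,
}

/// Returns every user-defined type reference used in the plan body
/// that has no matching declaration at the top level.
pub fn undeclared_type_refs(plan: &Plan) -> Vec<u32> {
    let declared: HashSet<u32> = plan.extensions.iter().map(|d| d.type_anchor).collect();
    plan.used_types
        .iter()
        .filter_map(|t| match t {
            Type::UserDefined { type_reference } if !declared.contains(type_reference) => {
                Some(*type_reference)
            }
            _ => None,
        })
        .collect()
}
```

With a hardcoded reference and no declaration, `undeclared_type_refs` reports the dangling anchor. Once a matching `TypeDeclaration` is pushed onto `extensions`, it returns an empty list.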

Luckily, DataFusion already provides tooling for registering User Defined Types in the Substrait plan:

pub fn register_type(&mut self, type_name: String) -> u32 {

As there is no precedent for generating UDTs out of DataFusion plans, several necessary pieces are still missing, and the pending work might not be trivial. For example, the SubstraitProducer trait has a register_function method:

fn register_function(&mut self, signature: String) -> u32;

but it has no register_type method. One probably needs to be added, and a mutable reference to the SubstraitProducer will need to be threaded to every place that can potentially produce a new UDT. The actual number identifying the UDT should probably come from this new register_type method rather than being hardcoded in a constant, the same as for UDFs.
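A rough sketch of what that threading could look like, under stated assumptions: `Producer`, `SimpleProducer`, `ArrowType`, `SubstraitType`, and `to_substrait_type` are all hypothetical names made up for this example, and the string `"fp16"` is a placeholder. Only the two method signatures mirror the ones quoted above. This is not the actual DataFusion code.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-in for the producer trait.
pub trait Producer {
    fn register_function(&mut self, signature: String) -> u32;
    fn register_type(&mut self, type_name: String) -> u32;
}

#[derive(Default)]
pub struct SimpleProducer {
    functions: HashMap<String, u32>,
    types: HashMap<String, u32>,
}

impl Producer for SimpleProducer {
    fn register_function(&mut self, signature: String) -> u32 {
        // Hand out the next free anchor, or reuse the existing one.
        let next = self.functions.len() as u32;
        *self.functions.entry(signature).or_insert(next)
    }

    fn register_type(&mut self, type_name: String) -> u32 {
        let next = self.types.len() as u32;
        *self.types.entry(type_name).or_insert(next)
    }
}

// Tiny stand-ins for the Arrow and Substrait type enums.
#[derive(Debug, PartialEq)]
pub enum ArrowType {
    Float16,
    Float32,
}

#[derive(Debug, PartialEq)]
pub enum SubstraitType {
    Fp32,
    UserDefined { type_reference: u32 },
}

/// The producer is passed in mutably so the type reference comes from
/// registration instead of a hardcoded constant.
pub fn to_substrait_type(producer: &mut impl Producer, dt: &ArrowType) -> SubstraitType {
    match dt {
        ArrowType::Float32 => SubstraitType::Fp32,
        ArrowType::Float16 => SubstraitType::UserDefined {
            type_reference: producer.register_type("fp16".to_string()),
        },
    }
}
```

Converting `ArrowType::Float16` twice with the same producer yields the same `type_reference`, because the registry reuses the anchor for a name it has already seen.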

@alamb
Contributor

alamb commented Jul 25, 2025

I am not quite sure what the next steps for this PR are. @gabotechs do you think there need to be changes, or is #16793 (review) suggesting changes for a future PR?

@gabotechs
Contributor

or is #16793 (review) suggesting changes for a future PR?

I think those changes need to happen either in this PR or in one that lands before it. IMO they can be done in this PR, but I understand that might fall outside the scope @jatin510 originally intended.

@github-actions

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Sep 27, 2025
@github-actions github-actions bot closed this Oct 7, 2025
github-merge-queue bot pushed a commit that referenced this pull request Oct 17, 2025
## Which issue does this PR close?

- Closes #16298

## Rationale for this change

Float16 is an Arrow type. Substrait serialization for the type is
defined in
https://github.com/apache/arrow/blame/main/format/substrait/extension_types.yaml
as part of Arrow. We should support it.

This picks up where #16793
leaves off.

## What changes are included in this PR?

Support for converting DataType::Float16 to/from Substrait.
Support for converting ScalarValue::Float16 to/from Substrait.

## Are these changes tested?

Yes

## Are there any user-facing changes?

Yes.

The `SubstraitProducer` trait received a new method (`register_type`)
which downstream implementors will need to provide an implementation
for. The example custom producer has been updated with a default
implementation.
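For downstream implementors, the new method can likely be satisfied by forwarding to whatever extension registry the producer already holds. The sketch below assumes a producer that owns such a registry. `Extensions`, `Producer`, and `CustomProducer` here are simplified stand-ins written for this example, not the real `datafusion_substrait` items, and only the `register_type` signature is taken from the discussion.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for an extension registry keyed by anchor.
#[derive(Default)]
pub struct Extensions {
    types: HashMap<u32, String>,
}

impl Extensions {
    pub fn register_type(&mut self, type_name: String) -> u32 {
        // Reuse the anchor if this name was registered before.
        if let Some((anchor, _)) = self.types.iter().find(|(_, n)| **n == type_name) {
            return *anchor;
        }
        let anchor = self.types.len() as u32;
        self.types.insert(anchor, type_name);
        anchor
    }
}

// Hypothetical, reduced version of the producer trait.
pub trait Producer {
    fn register_type(&mut self, type_name: String) -> u32;
}

#[derive(Default)]
pub struct CustomProducer {
    pub extensions: Extensions,
}

impl Producer for CustomProducer {
    // The custom producer simply delegates to its registry.
    fn register_type(&mut self, type_name: String) -> u32 {
        self.extensions.register_type(type_name)
    }
}
```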

One public method that changed is
[`datafusion_substrait::logical_plan::producer::from_empty_relation`](https://docs.rs/datafusion-substrait/50.2.0/datafusion_substrait/logical_plan/producer/fn.from_empty_relation.html).
I'm not sure if that is meant to be part of the public API (for one
thing, it is undocumented, though maybe this is because it serves an
obvious purpose; it also returns a `Rel`, which is a pretty internal
structure).
Development

Successfully merging this pull request may close these issues.

[substrait] [sqllogictest] Unsupported cast type: Float16

4 participants