Add support for Float16 type in substrait #16793
Thank you @jatin510. @gabotechs or @LiaCastaneda, would you have time to review this PR?
We probably should follow another approach based on UDTs for shipping support for F16s.
I asked whether it was fine to use a type variation reference for F16s in substrait-io/substrait#822, and this was the response:
"No. Type variations are different encodings for a type (e.g. dictionary, string view) and they must be able to map 1:1 with the base type."
I don't think there's any precedent for using a UDT to represent an Arrow type in Substrait, but maybe this is the first use case.
Made some changes, @gabotechs.
This makes sense to me, unless Gabriel thinks otherwise, since he's leading the epic.
DataType::Float16 => Ok(substrait::proto::Type {
    kind: Some(r#type::Kind::UserDefined(r#type::UserDefined {
        type_reference: FLOAT16_TYPE_REF,
        type_variation_reference: 0,
Should we use DEFAULT_TYPE_VARIATION_REF here instead of a literal 0?
This is starting to look good! Unfortunately, I think we might be missing some important details about working with UDTs in Substrait.
When working with User Defined Types and User Defined Functions in Substrait, their references need to appear at the top-level node that represents the plan:
https://github.com/substrait-io/substrait/blob/main/proto/substrait/plan.proto#L32-L35
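To make that requirement concrete, here is a minimal sketch of the idea, not the real Substrait protobuf types. `SketchPlan`, `TypeDeclaration`, and their fields are hypothetical names invented for illustration, and the real declaration message carries more fields than shown. The point is only that the plan body holds numeric references, and each of them must resolve against a declaration kept at the top level of the plan.

```rust
/// Hypothetical, simplified mirror of a plan-level type declaration:
/// an anchor number paired with the name it stands for.
#[derive(Debug, Clone, PartialEq)]
pub struct TypeDeclaration {
    pub type_anchor: u32,
    pub name: String,
}

/// Hypothetical, simplified plan: declarations live at the top level,
/// while the body only carries numeric references to them.
#[derive(Debug, Default)]
pub struct SketchPlan {
    pub type_declarations: Vec<TypeDeclaration>,
    pub type_references_in_body: Vec<u32>,
}

impl SketchPlan {
    /// Resolves a reference found in the plan body to its declared name.
    pub fn resolve(&self, type_reference: u32) -> Option<&str> {
        self.type_declarations
            .iter()
            .find(|decl| decl.type_anchor == type_reference)
            .map(|decl| decl.name.as_str())
    }

    /// Lists references used in the body that have no top-level declaration.
    pub fn dangling_references(&self) -> Vec<u32> {
        self.type_references_in_body
            .iter()
            .copied()
            .filter(|reference| self.resolve(*reference).is_none())
            .collect()
    }
}
```

A hardcoded reference with no matching declaration would show up in `dangling_references`, which is roughly the situation a consumer would face when reading such a plan.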
Luckily, DataFusion already provides tooling for registering User Defined Types in the Substrait plan:
pub fn register_type(&mut self, type_name: String) -> u32 {
As there is no precedent for generating UDTs out of DataFusion plans, several necessary pieces are still missing, and the pending work might not be trivial. For example, the SubstraitProducer trait has a register_function method but no register_type method:
fn register_function(&mut self, signature: String) -> u32;
SubstraitProducer will need to be threaded to every place that can potentially produce a new UDT. The actual number identifying the UDT should probably come from this new register_type method rather than being hardcoded in a constant, the same as for UDFs.
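A rough sketch of what that could look like, under stated assumptions: only the two method signatures (`register_function` and `register_type`) come from this thread. The `Producer` trait, `SketchProducer`, `SketchDataType`, `SketchType`, `to_sketch_type`, and the `"fp16"` type name are hypothetical stand-ins, not the DataFusion or Substrait APIs.

```rust
/// Simplified stand-in for the producer trait discussed above. Only the two
/// method signatures come from the thread; everything else is illustrative.
pub trait Producer {
    fn register_function(&mut self, signature: String) -> u32;
    fn register_type(&mut self, type_name: String) -> u32;
}

/// Hypothetical producer that hands out anchors in registration order.
#[derive(Debug, Default)]
pub struct SketchProducer {
    functions: Vec<String>,
    types: Vec<String>,
}

/// Returns the index of `name` in `names`, appending it on first use.
fn anchor_for(names: &mut Vec<String>, name: String) -> u32 {
    if let Some(pos) = names.iter().position(|existing| *existing == name) {
        return pos as u32;
    }
    names.push(name);
    (names.len() - 1) as u32
}

impl Producer for SketchProducer {
    fn register_function(&mut self, signature: String) -> u32 {
        anchor_for(&mut self.functions, signature)
    }

    fn register_type(&mut self, type_name: String) -> u32 {
        anchor_for(&mut self.types, type_name)
    }
}

impl SketchProducer {
    /// Type names registered so far; these would be declared at the plan level.
    pub fn registered_types(&self) -> &[String] {
        &self.types
    }
}

/// Hypothetical subset of input data types.
pub enum SketchDataType {
    Float16,
    Float32,
}

/// Hypothetical subset of output types.
#[derive(Debug, PartialEq)]
pub enum SketchType {
    Fp32,
    UserDefined { type_reference: u32, type_variation_reference: u32 },
}

/// Converts a type, asking the producer for the UDT anchor instead of
/// relying on a hardcoded constant.
pub fn to_sketch_type(producer: &mut impl Producer, data_type: &SketchDataType) -> SketchType {
    match data_type {
        SketchDataType::Float32 => SketchType::Fp32,
        SketchDataType::Float16 => SketchType::UserDefined {
            // "fp16" is an assumed placeholder name for the extension type.
            type_reference: producer.register_type("fp16".to_string()),
            type_variation_reference: 0,
        },
    }
}
```

Because the anchor comes from the producer, converting Float16 twice reuses the same reference, and the producer ends up holding the list of type names that the plan needs to declare.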
I am not quite sure what the next steps for this PR are. @gabotechs, do you think changes are needed here, or is #16793 (review) suggesting changes for a future PR?
I think those changes will need to happen either in this PR or in one before it. IMO they can be done in this PR, but I understand they might fall outside the scope of what @jatin510 initially intended.
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
## Which issue does this PR close?

- Closes #16298

## Rationale for this change

Float16 is an Arrow type. Substrait serialization for the type is defined in https://github.com/apache/arrow/blame/main/format/substrait/extension_types.yaml as part of Arrow. We should support it. This picks up where #16793 leaves off.

## What changes are included in this PR?

Support for converting DataType::Float16 to/from Substrait.
Support for converting ScalarValue::Float16 to/from Substrait.

## Are these changes tested?

Yes

## Are there any user-facing changes?

Yes. The `SubstraitProducer` trait received a new method (`register_type`) which downstream implementors will need to provide an implementation for. The example custom producer has been updated with a default implementation.

One public method that changed is [`datafusion_substrait::logical_plan::producer::from_empty_relation`](https://docs.rs/datafusion-substrait/50.2.0/datafusion_substrait/logical_plan/producer/fn.from_empty_relation.html). I'm not sure if that is meant to be part of the public API (for one thing, it is undocumented, though maybe this is because it serves an obvious purpose. It also returns a `Rel` which is a pretty internal structure).
Which issue does this PR close?
Rationale for this change
What changes are included in this PR?
This commit adds support for the Arrow Float16 type in Substrait plans.
Are these changes tested?
Yes
Are there any user-facing changes?
Add support for Arrow Float16 type in Substrait plans