-
Couldn't load subscription status.
- Fork 1.7k
Clean up display and comparison of types with metadata #18277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| pub fn check_metadata_with_storage_equal( | ||
| actual: ( | ||
| &DataType, | ||
| Option<&std::collections::HashMap<String, String>>, | ||
| ), | ||
| expected: ( | ||
| &DataType, | ||
| Option<&std::collections::HashMap<String, String>>, | ||
| ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I'm trying to remove
| pub fn format_type_and_metadata( | ||
| data_type: &DataType, | ||
| metadata: Option<&std::collections::HashMap<String, String>>, | ||
| ) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this
| false => write!(f, "{v:?} {:?}", metadata.as_ref().unwrap()), | ||
| true => write!(f, "{v:?}"), | ||
| } | ||
| write!(f, "{}", ScalarAndMetadata::new(v.clone(), metadata.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cloning the scalar here just to render it isn't ideal. I played with just making this Expr::Literal(ScalarAndMetadata) but that touches a lot.
| // TODO: Something about string equality or expression equality is causing these cases | ||
| // to fail | ||
| // TestCase { | ||
| // desc: r#"min(c2) --> "min(c2)" -- (column *named* "min(t.c2)"!)"#, | ||
| // input: sort(min(col("c2"))), | ||
| // expected: sort(col("min(t.c2)")), | ||
| // }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll look into this...something about equality or the string representation.
| let actual = SerializedTypeView::from(field); | ||
| let expected = SerializedTypeView::from(prev); | ||
| if actual != expected { | ||
| return plan_err!( | ||
| "Conflicting types for parameter '{id}': expected {expected} but got {actual}" | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The helper in action
| Prepare: "my_plan" [Utf8, FixedSizeBinary(16)<{"ARROW:extension:name": "arrow.uuid"}>] | ||
| Prepare: "my_plan" [Utf8, arrow.uuid<FixedSizeBinary(16)>] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also significantly improves the plan representation of an extension type (although this is only used in the prepare output)
| #[derive(Debug, Clone, Copy, Eq)] | ||
| pub struct SerializedTypeView<'a, 'b, 'c> { | ||
| arrow_type: &'a DataType, | ||
| extension_name: Option<&'b str>, | ||
| extension_metadata: Option<&'c str>, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The lifetimes are kind of annoying but we mix and match metadata in a lot of ways and this manages to not copy anything when we're peeking at the type.
| pub fn extension_metadata(&self) -> Option<&str> { | ||
| if let Some(metadata) = self.extension_metadata { | ||
| if !metadata.is_empty() { | ||
| return Some(metadata); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This normalizes comparing two extension types where the metadata is missing and/or empty. Arrow C++ always exports "empty" instead of "absent" (and used to crash on the "absent" case). The byte-for-byte comparison of the metadata is probably too strict for any actual parameterized extension types but is safer in lieu of pluggable type comparison.
| impl Display for SerializedTypeView<'_, '_, '_> { | ||
| fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
| match (self.extension_name(), self.extension_metadata()) { | ||
| (Some(name), None) => write!(f, "{}<{}>", name, self.arrow_type()), | ||
| (Some(name), Some(metadata)) => { | ||
| write!(f, "{}({})<{}>", name, metadata, self.arrow_type()) | ||
| } | ||
| _ => write!(f, "{}", self.arrow_type()), | ||
| } | ||
| } | ||
| } | ||
|
|
||
| impl PartialEq<SerializedTypeView<'_, '_, '_>> for SerializedTypeView<'_, '_, '_> { | ||
| fn eq(&self, other: &SerializedTypeView) -> bool { | ||
| self.arrow_type() == other.arrow_type() | ||
| && self.extension_name() == other.extension_name() | ||
| && self.extension_metadata() == other.extension_metadata() | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The replacement formatter and comparison.
This does have a subtle difference with the previous version, which is that non extension metadata is not considered with type equality. I think this is a good thing...extension types are formal (we error on mismatch), metadata is informal (we accumulate on mismatch, or at least existing code does).
| impl Display for ScalarAndMetadata { | ||
| fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
| let storage_type = self.value.data_type(); | ||
| let serialized_type = SerializedTypeView::from((&storage_type, self.metadata())); | ||
|
|
||
| let metadata_without_extension_info = self | ||
| .metadata | ||
| .as_ref() | ||
| .map(|metadata| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure all of this is a good idea but I was trying to improve the extension scalar outuput without dropping the display of extra metadata.
Which issue does this PR close?
Follow on PR for #17986 , cleaning a few verbose helpers for comparing equality and printing. This is done by introducing a non-owning SerializedTypeView (name up for debate) that handles comparison and display of extension types.
Rationale for this change
The two helpers designed to centralize the printing and comparison of DataType + Metadata were verbose and hard to understand.
What changes are included in this PR?
SerializedTypeView(==&DataType+Option(extension_name)+Option(extension_metadata)Are these changes tested?
I'll complete the test coverage if this seems like a reasonable approach.