[WIP] Support Shredded Lists/Array in variant_get
#8354
base: main
Left a couple comments that are hopefully helpful.
Also, we should (eventually) support nesting -- arrays and structs inside arrays.
Let's get simple lists of primitives working first, tho!
let main_struct = crate::variant_array::StructArrayBuilder::new()
    .with_field("metadata", Arc::new(metadata_array))
    .with_field("value", Arc::new(value_array))
    .with_field("typed_value", Arc::new(typed_value_array))
Check the variant shredding spec for arrays -- the typed_value for a shredded variant array is a non-nullable group called element, with child fields typed_value and value for shredded and unshredded list elements, respectively.
And then we'll need to build an appropriate GenericListArray out of this string array you built, which gives the offsets for each sub-list.
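To make the offsets idea concrete, here is a minimal std-only sketch of the layout a list array uses: one flat values buffer plus an offsets vector marking where each sub-list starts and ends. `FlatList`, `from_rows`, and `row` are hypothetical names for illustration, not the arrow-rs API; the real `GenericListArray` stores `i32` or `i64` offsets alongside a child array.

```rust
/// Hypothetical stand-in for a list array (NOT the arrow-rs type):
/// a flat values buffer plus offsets.
/// Sub-list `i` covers `values[offsets[i]..offsets[i + 1]]`.
#[derive(Debug)]
struct FlatList {
    offsets: Vec<usize>,
    values: Vec<String>,
}

impl FlatList {
    /// Flatten nested rows into one values buffer, recording where each row ends.
    fn from_rows(rows: &[Vec<&str>]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for row in rows {
            values.extend(row.iter().map(|s| s.to_string()));
            // The running length of `values` is the end offset of this row
            // and the start offset of the next one.
            offsets.push(values.len());
        }
        FlatList { offsets, values }
    }

    /// Borrow the elements of sub-list `i`.
    fn row(&self, i: usize) -> &[String] {
        &self.values[self.offsets[i]..self.offsets[i + 1]]
    }
}
```

An empty sub-list shows up as two equal consecutive offsets, so the flat string array needs no placeholder entries for it.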
Thanks for this too, I was under the wrong impression that the metadata encoding stores the offsets for the actual values. Reading your #8359 and rereading the Variant Encoding spec I see that the values offsets are within the value encoding itself.
So the outermost typed_value should be a GenericListArray of element - VariantObjects with {value and typed_value fields}?
yes, exactly! And element is non-nullable (**), while the two children are nullable.
(**) As always, in arrow, it can still have null entries, but only if its parent is already NULL for the same row (so nobody can ever observe a null element).
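The masking rule in the footnote can be written down as a small check. This is only a sketch over plain slices, assuming validity is modelled as `bool` per slot and list boundaries as `usize` offsets; the function name `element_nulls_are_masked` is hypothetical and nothing here is an arrow-rs call.

```rust
/// Returns true if every null `element` slot sits inside a list row whose
/// parent is itself null, so no reader can observe the null element.
///
/// `parent_valid[row]`   : is the list at `row` non-null?
/// `offsets`             : list offsets, `parent_valid.len() + 1` entries
/// `element_valid[slot]` : is the `element` struct at `slot` non-null?
fn element_nulls_are_masked(
    parent_valid: &[bool],
    offsets: &[usize],
    element_valid: &[bool],
) -> bool {
    parent_valid.iter().enumerate().all(|(row, &valid)| {
        let slots = &element_valid[offsets[row]..offsets[row + 1]];
        // A null parent hides whatever its slots contain; a valid parent
        // requires every one of its element slots to be valid.
        !valid || slots.iter().all(|&v| v)
    })
}
```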
.with_field("value", Arc::new(value_array))
.with_field("typed_value", Arc::new(typed_value_array))
.with_field("metadata", Arc::new(metadata_array), false)
// .with_field("value", Arc::new(value_array), true)
I think we don't need the value array since every value is a list in the array.
There are two different value fields:
array_col: {
  value: BINARY,  -- for non-array variant objects
  typed_value: {
    elements: {
      value: BINARY,  -- for wrong-type array elements
      typed_value: <ELEMENT TYPE>,
    }
  },
}
AFAIK, both those columns can be missing if every row contains a list with all-correct element types.
Note that variant arrays cannot contain (SQL) NULL tho -- so the list [Some(1), None, Some(3)] would produce a NULL elements.typed_value with Variant::Null in elements.value.
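As a rough illustration of that per-element split, the sketch below shreds a list as integers using a simplified, hypothetical `Value` enum in place of the real Variant type. In the actual format the fallback column holds variant-encoded binary, not an enum; `ShreddedElements` and `shred_as_int` are made-up names for this example only.

```rust
/// Hypothetical, simplified variant value; the real Variant has many more cases.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Str(String),
}

/// Per-element columns of a list shredded as integers (illustrative model).
#[derive(Debug, Default, PartialEq)]
struct ShreddedElements {
    /// Non-null when the element matched the shredded type.
    typed_value: Vec<Option<i64>>,
    /// Non-null when the element did not match, including variant null.
    value: Vec<Option<Value>>,
}

/// Route each element to exactly one of the two child columns.
fn shred_as_int(elements: &[Value]) -> ShreddedElements {
    let mut out = ShreddedElements::default();
    for e in elements {
        match e {
            Value::Int(i) => {
                out.typed_value.push(Some(*i));
                out.value.push(None);
            }
            // Variant null and wrong-type elements fall back to `value`.
            other => {
                out.typed_value.push(None);
                out.value.push(Some(other.clone()));
            }
        }
    }
    out
}
```

With input `[Int(1), Null, Int(3)]`, the middle slot has a null `typed_value` and `Value::Null` in `value`, matching the example in the comment above.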
…ed_list_support
I'm not sure I understand how these unit tests will translate to variant_get?
Could you elaborate please? I am currently trying to build just the Shredded
No worries -- the current iteration does look like it produces a correct shredded variant containing a list, so I should probably just be patient and let you finish!
Which issue does this PR close?
variant_get #8082.
Rationale for this change
We should be able to read lists using variant_get.
Are these changes tested?
I'm trying to start with some basic tests to do some TDD.
Are there any user-facing changes?