Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
- part of [EPIC] [Parquet] Implement Variant type support in Parquet #6736
- Requires [Variant] Support Shredded Objects in variant_get: typed path access (STEP 1) #8150
Note: this is likely one of the most complex parts of implementing Shredded Variants, so it is not a good first task.
We are trying to support the general case of the variant_get function, which allows runtime dynamic access to Variants (either shredded or unshredded).
- We found in [Variant] Support Shredded Objects in variant_get #8083 that supporting variant_get is quite complicated (see here), so we are proposing to break it down into multiple pieces.
This ticket tracks: Support variant_get for Some(DataType::VARIANT)
The idea here is that the user could reconstruct an unshredded Variant from any input Variant (either Shredded or Unshredded)
Implementing this functionality will likely require the basic representation for shredded Variant arrays along with path traversal in variant_get. However, it does NOT cover the following (which are / will be broken into separate tickets):
- Support for retrieving as a specific non-Struct data type (e.g. Some(DataType::Utf8))
- Retrieving any arbitrary path and returning what is there (no type specified)
- Retrieving an arbitrary path as a "Struct" (aka implementing shredding)
Describe the solution you'd like
@scovich sketched out a high level design for Shredded Objects (see Representing Variant In Arrow Proposal: "Shredding an Object" and Variant Shredding::Objects) in this PR
This likely requires reusing some of the logic in the cast_to_variant kernel to convert typed columns into Variants.
So roughly that means supporting
// get the named field of variant object as a typed field
variant_get(array, "$.field_name", Variant)
Where $.field_name represents some arbitrary VariantPath, such as a (for field "a") or a.b (for field "b" of field "a").
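To make the path notation concrete, here is a minimal sketch of splitting a dotted path into field names. Note that parse_path is a hypothetical helper written only for illustration; it is not the VariantPath API, and the real type may handle more cases (for example list indexes) that this sketch ignores.

```rust
/// Hypothetical helper (not the real VariantPath API): split a dotted path
/// such as "a.b" or "$.field_name" into the field names it traverses.
fn parse_path(path: &str) -> Result<Vec<String>, String> {
    let rest = match path {
        // A bare root marker (or an empty string) refers to the value itself.
        "$" | "" => return Ok(Vec::new()),
        // Otherwise drop an optional leading "$." root marker.
        p => p.strip_prefix("$.").unwrap_or(p),
    };
    rest.split('.')
        .map(|segment| {
            if segment.is_empty() {
                Err(format!("empty field name in path {path:?}"))
            } else {
                Ok(segment.to_string())
            }
        })
        .collect()
}
```

With this sketch, "a" yields one segment, "a.b" yields two, and a malformed path such as "a..b" is rejected instead of being silently accepted.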
This should work for:
- Variants where the field_name is in a typed_value
- Variants where the field_name is not in the typed_value
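The two cases above can be illustrated with a toy model of a single shredded row. Everything here is an assumption made for illustration: ToyVariant, ShreddedRow and get_field are hypothetical names, the shredded fields are restricted to i64, and the real implementation would operate on Arrow arrays rather than per-row maps.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a Variant value (not the real parquet-variant type).
#[derive(Debug, Clone, PartialEq)]
enum ToyVariant {
    Int(i64),
    Str(String),
}

/// Toy model of one row of a shredded object (hypothetical layout).
struct ShreddedRow {
    /// Fields that were shredded into a typed column (only i64 in this sketch).
    typed_value: BTreeMap<String, i64>,
    /// Remaining fields that stayed in the unshredded residual object.
    value: BTreeMap<String, ToyVariant>,
}

/// Return the named field as a variant, regardless of where it is stored.
fn get_field(row: &ShreddedRow, field_name: &str) -> Option<ToyVariant> {
    // Case 1: the field is in typed_value, so convert the typed value to a variant.
    if let Some(typed) = row.typed_value.get(field_name) {
        return Some(ToyVariant::Int(*typed));
    }
    // Case 2: the field is not in typed_value, so fall back to the residual value.
    row.value.get(field_name).cloned()
}
```

The conversion in case 1 is where logic similar to the cast_to_variant kernel would presumably be reused; case 2 needs no conversion because the field is already stored as a variant.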
Describe alternatives you've considered
- Add a test that manually constructs a shredded variant array (follow the example in the arrow proposal)
- Add a test that calls variant_get appropriately
- Implement the code
I suggest getting this working for non-nested objects first, and then working on nesting / pathing as a second PR.
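As a rough picture of the non-nested first step, the sketch below rebuilds a single unshredded object from a row's shredded fields plus its residual fields. All names (ToyVariant, unshred_object) are hypothetical, shredded fields are limited to i64, and treating a field present in both places as an error is a choice made for this sketch.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a Variant value (not the real parquet-variant type).
#[derive(Debug, Clone, PartialEq)]
enum ToyVariant {
    Int(i64),
    Str(String),
    Object(BTreeMap<String, ToyVariant>),
}

/// Rebuild one unshredded, non-nested object from its shredded fields
/// (`typed_value`) and the fields left in the residual object (`value`).
fn unshred_object(
    typed_value: &BTreeMap<String, i64>,
    value: &BTreeMap<String, ToyVariant>,
) -> Result<ToyVariant, String> {
    // Start from the residual fields, which are already variants.
    let mut fields = value.clone();
    for (name, typed) in typed_value {
        // Convert each typed field to a variant and add it to the object.
        if fields.insert(name.clone(), ToyVariant::Int(*typed)).is_some() {
            return Err(format!("field {name:?} appears in both value and typed_value"));
        }
    }
    Ok(ToyVariant::Object(fields))
}
```

A test along the lines of the first two steps above would build such a row by hand, call the function, and compare the result with the expected unshredded object.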
Additional context