Contributor

@sdf-jkl sdf-jkl commented Sep 16, 2025

Which issue does this PR close?

Rationale for this change

We should be able to read lists using variant_get

What changes are included in this PR?

Are these changes tested?

I'm trying to start with some basic tests to do some TDD.

Are there any user-facing changes?

Contributor

@scovich scovich left a comment

Left a couple comments that are hopefully helpful.

Also, we should (eventually) support nesting -- arrays and structs inside arrays.
Let's get simple lists of primitives working first, tho!

Comment on lines 1100 to 1103
let main_struct = crate::variant_array::StructArrayBuilder::new()
    .with_field("metadata", Arc::new(metadata_array))
    .with_field("value", Arc::new(value_array))
    .with_field("typed_value", Arc::new(typed_value_array))
Contributor

Check the variant shredding spec for arrays -- the typed_value for a shredded variant array is a non-nullable group called element, with child fields typed_value and value for shredded and unshredded list elements, respectively.

And then we'll need to build an appropriate GenericListArray out of this string array you built, which gives the offsets for each sub-list.
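As a rough illustration of what "offsets for each sub-list" means, here is a minimal std-only sketch of the list layout. It does not use the arrow crate or its GenericListArray API; `flatten_lists` and `row_slice` are hypothetical helpers invented for this example, and the `i32` offsets are an assumption mirroring a 32-bit list array.

```rust
/// Flatten per-row lists into one child buffer plus `rows.len() + 1` offsets.
/// `None` marks a NULL row; it contributes no child values.
/// Hypothetical helper for illustration only, not part of arrow-rs.
fn flatten_lists(rows: &[Option<Vec<&str>>]) -> (Vec<String>, Vec<i32>, Vec<bool>) {
    let mut values = Vec::new();
    let mut offsets = vec![0i32];
    let mut validity = Vec::with_capacity(rows.len());
    for row in rows {
        match row {
            Some(items) => {
                values.extend(items.iter().map(|s| s.to_string()));
                validity.push(true);
            }
            None => validity.push(false),
        }
        // Each row ends where the child buffer currently ends.
        offsets.push(values.len() as i32);
    }
    (values, offsets, validity)
}

/// Recover row `i` from the flat child buffer using only the offsets.
fn row_slice<'a>(values: &'a [String], offsets: &[i32], i: usize) -> &'a [String] {
    &values[offsets[i] as usize..offsets[i + 1] as usize]
}
```

In this model, the flat string array plays the role of the child values, and row `i` spans `offsets[i]..offsets[i + 1]`. A NULL row and an empty list both produce a zero-length span and differ only in the validity flag.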

Contributor Author

Thanks for this too. I was under the mistaken impression that the metadata encoding stores the offsets for the actual values. After reading your #8359 and rereading the Variant Encoding spec, I see that the value offsets live within the value encoding itself.

So the outermost typed_value should be a GenericListArray of element - VariantObjects with {value and typed_value fields}?

Contributor

yes, exactly! And element is non-nullable (**), while the two children are nullable.

(**) As always, in arrow, it can still have null entries, but only if its parent is already NULL for the same row (so nobody can ever observe a non-null element)

.with_field("value", Arc::new(value_array))
.with_field("typed_value", Arc::new(typed_value_array))
.with_field("metadata", Arc::new(metadata_array), false)
// .with_field("value", Arc::new(value_array), true)
Contributor Author

I think we don't need the value array since every value is a list in the array.

Contributor

There are two different value fields:

array_col: {
    value: BINARY, -- for non-array variant objects
    typed_value: {
        element: {
            value: BINARY, -- for wrong-type array elements
            typed_value: <ELEMENT TYPE>,
        }
    },
}

AFAIK, both those columns can be missing if every row contains a list with all-correct element types.

Note that variant arrays cannot contain (SQL) NULL tho -- so the list [Some(1), None, Some(3)] would produce a NULL element.typed_value with Variant::Null in element.value.
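To make the per-element split concrete, here is a small std-only sketch of how the inner value and typed_value columns might be populated for a list shredded as integers. `ToyVariant`, `ShreddedElement`, and `shred_elements` are hypothetical names made up for this example; they are not the parquet-variant types, and `i64` is an assumed element type.

```rust
/// Toy stand-in for a variant value; NOT the parquet-variant `Variant` type.
#[derive(Debug, Clone, PartialEq)]
enum ToyVariant {
    Null,
    Int(i64),
    Str(String),
}

/// One shredded list element. For primitive elements, exactly one of the
/// two children is populated. Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct ShreddedElement {
    /// Fallback for elements that do not match the shredded type.
    value: Option<ToyVariant>,
    /// Populated when the element matches the shredded type (i64 here).
    typed_value: Option<i64>,
}

/// Split each list element into its `value` / `typed_value` children.
fn shred_elements(list: &[Option<ToyVariant>]) -> Vec<ShreddedElement> {
    list.iter()
        .map(|item| match item {
            // Correctly typed element: goes to typed_value, value stays NULL.
            Some(ToyVariant::Int(n)) => ShreddedElement { value: None, typed_value: Some(*n) },
            // Wrong-type element: kept unshredded in value.
            Some(other) => ShreddedElement { value: Some(other.clone()), typed_value: None },
            // A missing element cannot be SQL NULL inside a variant array,
            // so it is recorded as a variant null in value.
            None => ShreddedElement { value: Some(ToyVariant::Null), typed_value: None },
        })
        .collect()
}
```

Under these assumptions, shredding `[Some(Int(1)), None, Some(Int(3))]` yields a NULL typed_value only for the middle element, whose value holds the variant null. If every element of every list matched the target type, the inner value column would be entirely NULL, which is the case where it could be omitted.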

Contributor

@scovich scovich left a comment

I'm not sure I understand how these unit tests will translate to variant_get?

Contributor Author

sdf-jkl commented Sep 19, 2025

I'm not sure I understand how these unit tests will translate to variant_get?

Could you elaborate please?

I am currently trying to build just the Shredded List VariantArray test case, and while doing so I am learning how we could build them in shred_variant later. Once I have a good way of building a simple Shredded List VariantArray, it will be easy to work on the rest of the unit tests for variant_get.

Contributor

scovich commented Sep 19, 2025

I'm not sure I understand how these unit tests will translate to variant_get?

Could you elaborate please?

I am currently trying to build just the Shredded List VariantArray test case, and while doing so I am learning how we could build them in shred_variant later. Once I have a good way of building a simple Shredded List VariantArray, it will be easy to work on the rest of the unit tests for variant_get.

No worries -- the current iteration does look like it produces a correct shredded variant containing a list, so I should probably just be patient and let you finish!

Labels
parquet-variant parquet-variant* crates