Open
Description
Re
My biggest comment / suggestion is to consider making the API vectorized (convert the entire Arrow Array) but I think we can do that as a follow on PR
And #8299 (comment) -- that run-end encoding could be handled more easily in a vectorized API.
And #8299 (comment) that suggests an `append_all_rows()` method.
And #8299 (comment) that also wonders about vectorization.
I'll try to give one response that covers them all:
I think it's reasonable to consider adding a bulk append type API, but we have to be cognizant of the limitations and challenges it will face:
- We will need a new trait that knows how to create (and finish!) variant builder instances.
- Variant building is inherently row-based, so any builder that ultimately needs to produce a variant array or variant object as its output will have a trivial `append_all_rows` that just calls `append_row` in a loop (like today), in order to recursively build up the fields/elements of the variant it creates.
- The API would be very nice for converting primitive arrays to variant, because they don't need to recurse on anything. Also nice because we could potentially define a specialized impl just for `VariantArrayBuilder`, so we don't have to deal with that new variant builder create+finish trait.
- Casting a list of primitive values is an interesting intermediate case, where one should be able to append all the elements of a given list in one shot. But that might require the new create+finish trait? Or maybe it just needs a second specialization for `ListBuilder`?
- Maybe instead of a no-arg `append_all_rows()`, we should consider a ranged `append_many_rows(start..end)`? One could always pass `..` to request encoding of all rows.
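To make the ranged idea concrete, here is a rough sketch using only the standard library. Everything in it is hypothetical: `RowBuilder`, `resolve_range`, `Int64Rows`, and `NestedRows` are illustrative stand-ins, not existing arrow-rs types, and plain `Vec`s stand in for the real variant builders. It shows a default `append_many_rows` that just loops over `append_row` (the row-based case) next to a primitive specialization that copies the whole range in one shot.

```rust
use std::ops::{Bound, RangeBounds};

/// Turn any range syntax (`a..b`, `a..=b`, `..`) into a half-open
/// `(start, end)` pair, checked against the number of input rows.
fn resolve_range<R: RangeBounds<usize>>(rows: R, len: usize) -> (usize, usize) {
    let start = match rows.start_bound() {
        Bound::Included(&s) => s,
        Bound::Excluded(&s) => s + 1,
        Bound::Unbounded => 0,
    };
    let end = match rows.end_bound() {
        Bound::Included(&e) => e + 1,
        Bound::Excluded(&e) => e,
        Bound::Unbounded => len,
    };
    assert!(start <= end && end <= len, "row range out of bounds");
    (start, end)
}

/// Hypothetical row-oriented builder trait (not the real arrow-rs API).
trait RowBuilder {
    fn num_rows(&self) -> usize;
    fn append_row(&mut self, index: usize);

    /// Default bulk append: the trivial loop over `append_row`.
    fn append_many_rows<R: RangeBounds<usize>>(&mut self, rows: R) {
        let (start, end) = resolve_range(rows, self.num_rows());
        for index in start..end {
            self.append_row(index);
        }
    }
}

/// Primitive case: nothing to recurse into, so the range is copied in bulk.
struct Int64Rows<'a> {
    input: &'a [i64],
    output: Vec<i64>, // stand-in for the variant output
}

impl RowBuilder for Int64Rows<'_> {
    fn num_rows(&self) -> usize {
        self.input.len()
    }
    fn append_row(&mut self, index: usize) {
        self.output.push(self.input[index]);
    }
    fn append_many_rows<R: RangeBounds<usize>>(&mut self, rows: R) {
        let (start, end) = resolve_range(rows, self.input.len());
        self.output.extend_from_slice(&self.input[start..end]);
    }
}

/// Nested case: keeps the default loop and only records which rows it visited.
struct NestedRows {
    len: usize,
    visited: Vec<usize>,
}

impl RowBuilder for NestedRows {
    fn num_rows(&self) -> usize {
        self.len
    }
    fn append_row(&mut self, index: usize) {
        self.visited.push(index);
    }
}
```

With this shape, `builder.append_many_rows(..)` covers the "all rows" case and `builder.append_many_rows(1..3)` covers a sub-range, so a separate no-arg `append_all_rows()` would not be needed. The sketch does not address the create+finish trait question at all.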
Originally posted by @scovich in #8299 (comment)
liamzwbao