Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This came up in conversations with @friendlymatthew and @zeroshade today.
Given this example:
```rust
use parquet_variant::VariantBuilder;

let mut builder = VariantBuilder::new();
// the sub builder allocates a new buffer
let mut obj = builder.new_object();
obj.insert("a", 1);
// finishes the builder and copies the data into the parent's buffer
obj.finish()?;
```
Here is the buffer used by the ObjectBuilder:
https://github.com/apache/arrow-rs/blob/34bb605a0ca5ce7f03de0116023fb2cac6b669b3/parquet-variant/src/builder.rs#L817-L816
Here is where it is copied to the parent builder: https://github.com/apache/arrow-rs/blob/34bb605a0ca5ce7f03de0116023fb2cac6b669b3/parquet-variant/src/builder.rs#L936-L935
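To make the cost concrete, here is a toy model of the pattern the example above exercises: the child builder owns a separate `Vec`, and `finish` copies every value byte from it into the parent's buffer a second time. The types `CopyingParent` and `CopyingObject`, the helper `fit_u8`, and the one-byte header layout are all hypothetical. They are not the `parquet-variant` API or the Variant encoding, only a sketch of the allocate-then-copy shape.

```rust
// Toy model of today's design (hypothetical types, not the parquet-variant API):
// the child object builder owns a separate Vec that is copied on finish.
struct CopyingParent {
    buffer: Vec<u8>,
}

struct CopyingObject<'a> {
    parent: &'a mut CopyingParent,
    // The extra heap allocation this issue wants to remove.
    buffer: Vec<u8>,
    // (field id, offset of the value within `buffer`)
    fields: Vec<(u8, u8)>,
}

/// The toy encoding uses 1-byte ids and offsets, so every value must fit in a u8.
fn fit_u8(n: usize) -> u8 {
    assert!(n <= u8::MAX as usize, "sketch uses 1-byte offsets");
    n as u8
}

impl CopyingParent {
    fn new_object<'a>(&'a mut self) -> CopyingObject<'a> {
        CopyingObject { parent: self, buffer: Vec::new(), fields: Vec::new() }
    }
}

impl<'a> CopyingObject<'a> {
    fn insert(&mut self, field_id: u8, value: &[u8]) {
        self.fields.push((field_id, fit_u8(self.buffer.len())));
        self.buffer.extend_from_slice(value);
    }

    fn finish(self) {
        let CopyingObject { parent, buffer, fields } = self;
        let out = &mut parent.buffer;
        // Toy header: [num_fields, field ids..., offsets..., data_len]
        out.push(fit_u8(fields.len()));
        out.extend(fields.iter().map(|&(id, _)| id));
        out.extend(fields.iter().map(|&(_, off)| off));
        out.push(fit_u8(buffer.len()));
        // Second copy of every value byte: child Vec -> parent Vec.
        out.extend_from_slice(&buffer);
    }
}

/// Builds {0: [1], 1: [2, 3]} after whatever `prefix` is already in the parent.
fn example_copying(prefix: &[u8]) -> Vec<u8> {
    let mut builder = CopyingParent { buffer: prefix.to_vec() };
    let mut obj = builder.new_object();
    obj.insert(0, &[1]);
    obj.insert(1, &[2, 3]);
    obj.finish();
    builder.buffer
}
```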
Describe the solution you'd like
I would like to avoid this extra allocation to improve performance.
Describe alternatives you've considered
Here is an approach that still has to copy the child object's bytes but does not need its own allocation. It is modeled after @zeroshade's description of how the Go implementation works:
- Change the `ObjectBuilder` so it remembers where the object should start in the parent's buffer
- Remove the `ObjectBuffer::buffer` field
- On append, the `ObjectBuilder` writes directly into the parent's buffer
- On `ObjectBuilder::finish`, compute how much space is needed for the header and offsets, and shift (by copy) the child object bytes down by that amount in the parent's buffer
- Fill in the object header + offsets for the child object
- Return
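The steps above can be sketched with a toy layout: the builder records its start position, appends values straight into the parent's `Vec`, and on `finish` grows the buffer, shifts the value bytes with `copy_within`, and writes the header into the gap. As before, `InPlaceParent`, `InPlaceObject`, `fit_u8`, and the one-byte header layout are hypothetical. This is not the `parquet-variant` API or the real Variant encoding, only an illustration of the technique under those assumptions.

```rust
// Sketch of the proposed design (hypothetical types, not the parquet-variant API):
// the object builder has no buffer of its own and writes into the parent's.
struct InPlaceParent {
    buffer: Vec<u8>,
}

struct InPlaceObject<'a> {
    parent: &'a mut InPlaceParent,
    // Where this object starts in the parent's buffer.
    start: usize,
    // (field id, offset of the value relative to the start of the value bytes)
    fields: Vec<(u8, u8)>,
}

/// The toy encoding uses 1-byte ids and offsets, so every value must fit in a u8.
fn fit_u8(n: usize) -> u8 {
    assert!(n <= u8::MAX as usize, "sketch uses 1-byte offsets");
    n as u8
}

impl InPlaceParent {
    fn new_object<'a>(&'a mut self) -> InPlaceObject<'a> {
        let start = self.buffer.len();
        InPlaceObject { parent: self, start, fields: Vec::new() }
    }
}

impl<'a> InPlaceObject<'a> {
    fn insert(&mut self, field_id: u8, value: &[u8]) {
        let offset = fit_u8(self.parent.buffer.len() - self.start);
        self.fields.push((field_id, offset));
        // Append directly to the parent's buffer: no separate child allocation.
        self.parent.buffer.extend_from_slice(value);
    }

    fn finish(self) {
        let InPlaceObject { parent, start, fields } = self;
        let buf = &mut parent.buffer;
        let data_len = buf.len() - start;
        // Toy header: [num_fields, field ids..., offsets..., data_len]
        let header_len = 1 + 2 * fields.len() + 1;
        // Make room at the end, then move the value bytes up by `header_len`.
        // `copy_within` handles the overlapping ranges (memmove semantics).
        let new_len = buf.len() + header_len;
        buf.resize(new_len, 0);
        buf.copy_within(start..start + data_len, start + header_len);
        // Write the header into the gap that was just opened.
        let mut pos = start;
        buf[pos] = fit_u8(fields.len());
        pos += 1;
        for &(id, _) in &fields {
            buf[pos] = id;
            pos += 1;
        }
        for &(_, off) in &fields {
            buf[pos] = off;
            pos += 1;
        }
        buf[pos] = fit_u8(data_len);
    }
}

/// Builds {0: [1], 1: [2, 3]} after whatever `prefix` is already in the parent.
fn example_in_place(prefix: &[u8]) -> Vec<u8> {
    let mut builder = InPlaceParent { buffer: prefix.to_vec() };
    let mut obj = builder.new_object();
    obj.insert(0, &[1]);
    obj.insert(1, &[2, 3]);
    obj.finish();
    builder.buffer
}
```

Here `example_in_place(&[])` returns `[2, 0, 1, 0, 1, 3, 1, 2, 3]`. The value bytes still move once (the shift), but no second buffer is allocated. The shift is needed because the header size depends on the number of fields (and, in a real encoding, presumably on the offset widths), so it is only known at `finish` time.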
Ideally we would see a measurable performance improvement in the benchmarks.
Additional context
If this works out, I think we can do a similar optimization for `ListBuilder`.