[Variant] Avoid extra allocation in ObjectBuilder #7899

@alamb

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

This came up in conversations with @friendlymatthew and @zeroshade today.

Given this example:

let mut builder = VariantBuilder::new();
// the sub builder allocates a new buffer
let mut obj = builder.new_object();
obj.insert("a", 1);
// finishes the builder, copies the data into the parent's builder
obj.finish()?;

Here is the buffer used by the ObjectBuilder:
https://github.com/apache/arrow-rs/blob/34bb605a0ca5ce7f03de0116023fb2cac6b669b3/parquet-variant/src/builder.rs#L817-L816

Here is where it is copied to the parent builder: https://github.com/apache/arrow-rs/blob/34bb605a0ca5ce7f03de0116023fb2cac6b669b3/parquet-variant/src/builder.rs#L936-L935

Describe the solution you'd like
I would like to avoid the extra allocation made by the sub-builder, to improve performance.

Describe alternatives you've considered
Here is an approach that must still copy the child object bytes but does not use its own allocation. It is modeled after a description of how the Go implementation works from @zeroshade.

  1. Change the ObjectBuilder so it remembers where the object should start in the parent's buffer
  2. Remove the ObjectBuilder::buffer field
  3. On append, the ObjectBuilder writes directly into the parent's buffer
  4. On ObjectBuilder::finish, compute how much space is needed for the object header and offsets, and shift (by copy) the child object bytes down by that amount in the parent's buffer
  5. Fill in the object header + offsets for the child object
  6. Return
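
The steps above can be sketched roughly as follows. This is only an illustration of the "write in place, then shift by copy" idea on a plain `Vec<u8>`, not the parquet-variant API: `InPlaceObjectBuilder`, `append_value`, and `build_example` are hypothetical names, and the header layout (a 1-byte field count followed by 4-byte little-endian offsets) is a simplification, not the real Variant object encoding.

```rust
/// Hypothetical child builder that writes straight into the parent's buffer.
/// The header layout is simplified and is NOT the Variant encoding.
struct InPlaceObjectBuilder<'a> {
    parent: &'a mut Vec<u8>,
    /// Step 1: where this object starts in the parent's buffer.
    start: usize,
    /// Offset of each value, relative to the first value byte.
    field_offsets: Vec<u32>,
}

impl<'a> InPlaceObjectBuilder<'a> {
    fn new(parent: &'a mut Vec<u8>) -> Self {
        let start = parent.len();
        Self { parent, start, field_offsets: Vec::new() }
    }

    /// Step 3: append value bytes directly to the parent's buffer
    /// (no separate child allocation, per step 2).
    fn append_value(&mut self, bytes: &[u8]) {
        let offset = (self.parent.len() - self.start) as u32;
        self.field_offsets.push(offset);
        self.parent.extend_from_slice(bytes);
    }

    fn finish(self) {
        let Self { parent, start, field_offsets } = self;

        // Step 4: compute the header size, grow the parent's buffer,
        // and shift the value bytes down by that amount.
        let header_len = 1 + 4 * field_offsets.len();
        let values_end = parent.len();
        parent.resize(values_end + header_len, 0);
        parent.copy_within(start..values_end, start + header_len);

        // Step 5: fill in the (simplified) header and offsets.
        let mut pos = start;
        parent[pos] = field_offsets.len() as u8;
        pos += 1;
        for off in &field_offsets {
            parent[pos..pos + 4].copy_from_slice(&off.to_le_bytes());
            pos += 4;
        }
    }
}

/// Builds one child object after a byte that already exists in the parent.
fn build_example() -> Vec<u8> {
    let mut parent = vec![0xAA];
    let mut obj = InPlaceObjectBuilder::new(&mut parent);
    obj.append_value(&[1, 2]);
    obj.append_value(&[3]);
    obj.finish();
    parent
}
```

The value bytes are still copied once in `finish`, but within the parent's buffer (`copy_within` handles the overlapping ranges), so the child never needs its own allocation.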

Ideally we would see some performance improvement in the benchmarks.

Additional context

If this works out, I think we can do a similar optimization for ListBuilder.

Labels

enhancement: Any new improvement worthy of a entry in the changelog
parquet: Changes to the parquet crate
