Commit a620957

[Variant] Support read-only metadata builders (#8208)
# Which issue does this PR close?

- Closes #8152

# Rationale for this change

When manipulating existing variant values (unshredding, removing fields, etc.), the metadata column is already defined and already contains all necessary field ids. In fact, defining new or different field ids would require rewriting the bytes of those already-encoded variant values. We need a way to build variant values that rely on an existing metadata dictionary.

# What changes are included in this PR?

* `MetadataBuilder` is now a trait, and most methods that work with metadata builders now take `&mut dyn MetadataBuilder` instead of `&mut MetadataBuilder`.
* The old `MetadataBuilder` struct is now `BasicMetadataBuilder`, which implements `MetadataBuilder`.
* Define a `ReadOnlyMetadataBuilder` that wraps a `VariantMetadata` and also implements `MetadataBuilder`.
* Update the `try_binary_search_range_by` helper method to be more general, so we can define an efficient `VariantMetadata::get_entry` that returns the field id for a given field name.

# Are these changes tested?

Existing tests cover the basic metadata builder. New tests were added to cover the read-only metadata builder.

# Are there any user-facing changes?

The renamed `BasicMetadataBuilder` (breaking), the new `MetadataBuilder` trait (breaking), and the new `ReadOnlyMetadataBuilder`.
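The trait split described in the commit message can be illustrated with a minimal, self-contained sketch. It is not the `parquet_variant` implementation: the method name `try_upsert_field_name`, the `field_id` helper, the `String` error type, and the use of a sorted `&[&str]` slice in place of `VariantMetadata` are all assumptions made for illustration. Only the type names and the overall design (one trait, a writable implementation, and a read-only implementation backed by a lookup over an existing dictionary) come from the description.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the `MetadataBuilder` trait from the description.
/// The method name and error type are assumptions, not the crate's API.
trait MetadataBuilder {
    /// Returns the field id for `name`, or an error if no id can be provided.
    fn try_upsert_field_name(&mut self, name: &str) -> Result<u32, String>;
}

/// Writable builder: assigns the next free id to every unseen field name.
#[derive(Default)]
struct BasicMetadataBuilder {
    ids: HashMap<String, u32>,
    names: Vec<String>,
}

impl MetadataBuilder for BasicMetadataBuilder {
    fn try_upsert_field_name(&mut self, name: &str) -> Result<u32, String> {
        if let Some(&id) = self.ids.get(name) {
            return Ok(id); // name already registered, reuse its id
        }
        let id = self.names.len() as u32;
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        Ok(id)
    }
}

/// Read-only builder: wraps an existing dictionary and never adds to it.
/// A sorted slice stands in for `VariantMetadata` here (assumption).
struct ReadOnlyMetadataBuilder<'a> {
    sorted_names: &'a [&'a str],
}

impl MetadataBuilder for ReadOnlyMetadataBuilder<'_> {
    fn try_upsert_field_name(&mut self, name: &str) -> Result<u32, String> {
        // Binary search over the sorted dictionary; a miss is an error
        // because the existing metadata must not change.
        self.sorted_names
            .binary_search(&name)
            .map(|idx| idx as u32)
            .map_err(|_| format!("field name '{}' not in existing metadata", name))
    }
}

/// Hypothetical caller: works with either builder through the trait object,
/// mirroring the `&mut dyn MetadataBuilder` parameters mentioned above.
fn field_id(builder: &mut dyn MetadataBuilder, name: &str) -> Result<u32, String> {
    builder.try_upsert_field_name(name)
}
```

In this sketch the writable builder hands out ids in insertion order, while the read-only builder can only resolve names that the wrapped dictionary already contains. That matches the stated motivation: values encoded against an existing dictionary keep their field ids, and an unknown name surfaces as an error instead of silently changing the metadata.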
1 parent c83c6b2 commit a620957

5 files changed: +236 -44 lines changed

parquet-variant-compute/src/variant_array_builder.rs

Lines changed: 3 additions & 3 deletions
@@ -21,7 +21,7 @@ use crate::VariantArray;
 use arrow::array::{ArrayRef, BinaryViewArray, BinaryViewBuilder, NullBufferBuilder, StructArray};
 use arrow_schema::{ArrowError, DataType, Field, Fields};
 use parquet_variant::{ListBuilder, ObjectBuilder, Variant, VariantBuilderExt};
-use parquet_variant::{MetadataBuilder, ParentState, ValueBuilder};
+use parquet_variant::{ParentState, ValueBuilder, WritableMetadataBuilder};
 use std::sync::Arc;
 
 /// A builder for [`VariantArray`]
@@ -74,7 +74,7 @@ pub struct VariantArrayBuilder {
     /// Nulls
     nulls: NullBufferBuilder,
     /// builder for all the metadata
-    metadata_builder: MetadataBuilder,
+    metadata_builder: WritableMetadataBuilder,
     /// ending offset for each serialized metadata dictionary in the buffer
     metadata_offsets: Vec<usize>,
     /// builder for values
@@ -96,7 +96,7 @@ impl VariantArrayBuilder {
 
         Self {
             nulls: NullBufferBuilder::new(row_capacity),
-            metadata_builder: MetadataBuilder::default(),
+            metadata_builder: WritableMetadataBuilder::default(),
             metadata_offsets: Vec::with_capacity(row_capacity),
             value_builder: ValueBuilder::new(),
             value_offsets: Vec::with_capacity(row_capacity),
