
Commit c561acb

alamb, Samyak2, and scovich authored
[Variant] Add variant_get and Shredded VariantArray (#8021)
# Which issue does this PR close?

- Part of #6736
- Closes #7941
- Closes #7965

# Rationale for this change

This is a proposal for how to structure shredded `VariantArray`s and the `variant_get` kernel.

If people like the basic idea, I will file some more tickets to track additional follow-on work.

It is based on ideas from @carpecodeum in #7946 and @scovich in #7915.

I basically took the tests from #7965 and the conversation with @scovich recorded in #7941 (comment), and I bashed out how this might look.

# What changes are included in this PR?

1. Update `VariantArray` to represent shredding
2. Add code to `variant_get` to support extracting paths as both variants and typed fields
3. A pattern that I think can represent shredding and extraction
4. Tests for same

Note there are many things that are NOT in this PR that I envision doing as follow-on PRs:

1. Support for and implementation of `Path`s
2. Support for shredded objects
3. Support for shredded lists
4. Support for nested objects / lists
5. Full casting support
6. Support for other output types: `StringArray`, `StringViewArray`, etc.
7. Many performance improvements

# Are these changes tested?

Yes

# Are there any user-facing changes?

New feature

---------

Co-authored-by: Samyak Sarnayak <[email protected]>
Co-authored-by: Ryan Johnson <[email protected]>
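To make the idea of "extracting paths as both variants and typed fields" from a shredded array more concrete, here is a minimal, standard-library-only sketch. It is not the code from this PR: `ToyVariant`, `ToyShredding`, `get_variant`, and `get_i32` are hypothetical names invented for illustration, and the three layouts are an assumption based on the Parquet shredding model, in which a row is read from the `typed_value` column when that is non-null and from the binary `value` column otherwise.

```rust
/// Hypothetical stand-in for a variant value (the real crate uses
/// `parquet_variant::Variant`; this toy type is an assumption).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum ToyVariant {
    Null,
    Int32(i32),
    Text(String),
}

/// Hypothetical model of the storage layouts a shredded array could take.
#[allow(dead_code)]
enum ToyShredding {
    /// Only the untyped `value` column is present.
    Unshredded { value: Vec<Option<ToyVariant>> },
    /// Only the strongly typed `typed_value` column is present.
    Typed { typed_value: Vec<Option<i32>> },
    /// Both columns are present; `typed_value` wins when it is non-null.
    PartiallyShredded {
        value: Vec<Option<ToyVariant>>,
        typed_value: Vec<Option<i32>>,
    },
}

impl ToyShredding {
    /// Resolve row `i` to a variant, preferring the typed column.
    fn get_variant(&self, i: usize) -> Option<ToyVariant> {
        match self {
            ToyShredding::Unshredded { value } => value[i].clone(),
            ToyShredding::Typed { typed_value } => typed_value[i].map(ToyVariant::Int32),
            ToyShredding::PartiallyShredded { value, typed_value } => typed_value[i]
                .map(ToyVariant::Int32)
                .or_else(|| value[i].clone()),
        }
    }

    /// Resolve row `i` as an `i32`; rows holding another type yield `None`.
    fn get_i32(&self, i: usize) -> Option<i32> {
        match self.get_variant(i)? {
            ToyVariant::Int32(v) => Some(v),
            _ => None,
        }
    }
}
```

In this sketch a caller asks for either the variant form (`get_variant`) or a typed form (`get_i32`) of the same row, which loosely mirrors the two output modes the description mentions for `variant_get`. How the real kernel handles values that do not match the requested type is not shown in this commit excerpt.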
1 parent 7a5f6d3 commit c561acb

10 files changed: +1104 -252 lines changed

parquet-variant-compute/src/from_json.rs

Lines changed: 3 additions & 3 deletions
@@ -52,7 +52,7 @@ pub fn batch_json_string_to_variant(input: &ArrayRef) -> Result<VariantArray, Ar
 #[cfg(test)]
 mod test {
     use crate::batch_json_string_to_variant;
-    use arrow::array::{Array, ArrayRef, AsArray, StringArray};
+    use arrow::array::{Array, ArrayRef, StringArray};
     use arrow_schema::ArrowError;
     use parquet_variant::{Variant, VariantBuilder};
     use std::sync::Arc;
@@ -69,8 +69,8 @@ mod test {
         let array_ref: ArrayRef = Arc::new(input);
         let variant_array = batch_json_string_to_variant(&array_ref).unwrap();
 
-        let metadata_array = variant_array.metadata_field().as_binary_view();
-        let value_array = variant_array.value_field().as_binary_view();
+        let metadata_array = variant_array.metadata_field();
+        let value_array = variant_array.value_field().expect("value field");
 
         // Compare row 0
         assert!(!variant_array.is_null(0));
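The test change above suggests that `value_field()` now returns something optional (hence the `.expect("value field")`), while `metadata_field()` returns an already-typed array. A plausible reason, given the PR description, is that a fully shredded array may have no untyped `value` column at all. The following standard-library-only sketch illustrates that accessor pattern. `ToyVariantArray` and its fields are hypothetical and only mimic the shape of the calls in the diff; they are not the crate's real types or signatures.

```rust
/// Hypothetical stand-in for the array type; NOT the crate's `VariantArray`.
struct ToyVariantArray {
    /// Per-row metadata bytes; assumed to be always present.
    metadata: Vec<Vec<u8>>,
    /// Per-row value bytes; assumed absent when every row is shredded.
    value: Option<Vec<Vec<u8>>>,
}

impl ToyVariantArray {
    /// Metadata is always available, so the accessor is infallible.
    fn metadata_field(&self) -> &[Vec<u8>] {
        &self.metadata
    }

    /// The value column may be missing, so callers must handle `None`.
    fn value_field(&self) -> Option<&[Vec<u8>]> {
        self.value.as_deref()
    }
}
```

A test that knows its input is unshredded can call `.expect("value field")`, as the updated test does, whereas general-purpose code would match on the `Option` and fall back to the typed column.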

parquet-variant-compute/src/lib.rs

Lines changed: 21 additions & 1 deletion
@@ -15,14 +15,34 @@
 // specific language governing permissions and limitations
 // under the License.
 
+//! [`VariantArray`] and compute kernels for the [Variant Binary Encoding] from [Apache Parquet].
+//!
+//! ## Main APIs
+//! - [`VariantArray`] : Represents an array of `Variant` values.
+//! - [`VariantArrayBuilder`]: For building [`VariantArray`]
+//! - [`batch_json_string_to_variant`]: Function to convert a batch of JSON strings to a `VariantArray`.
+//! - [`batch_variant_to_json_string`]: Function to convert a `VariantArray` to a batch of JSON strings.
+//! - [`cast_to_variant`]: Module to cast other Arrow arrays to `VariantArray`.
+//! - [`variant_get`]: Module to get values from a `VariantArray` using a specified [`VariantPath`]
+//!
+//! ## 🚧 Work In Progress
+//!
+//! This crate is under active development and is not yet ready for production use.
+//! If you are interested in helping, you can find more information on the GitHub [Variant issue]
+//!
+//! [Variant Binary Encoding]: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
+//! [Apache Parquet]: https://parquet.apache.org/
+//! [`VariantPath`]: parquet_variant::VariantPath
+//! [Variant issue]: https://github.com/apache/arrow-rs/issues/6736
+
 pub mod cast_to_variant;
 mod from_json;
 mod to_json;
 mod variant_array;
 mod variant_array_builder;
 pub mod variant_get;
 
-pub use variant_array::VariantArray;
+pub use variant_array::{ShreddingState, VariantArray};
 pub use variant_array_builder::{VariantArrayBuilder, VariantArrayVariantBuilder};
 
 pub use from_json::batch_json_string_to_variant;
