Open
Labels
enhancement (any new improvement worthy of an entry in the changelog), parquet (changes to the parquet crate)
Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Parquet recently adopted the Variant type from Spark: https://github.com/apache/parquet-format/blob/master/VariantEncoding.md
Details on
Describe the solution you'd like
I would like to implement variant support in parquet-rs
Additional context
I am not sure if any other parquet implementations have implemented this yet / if there are example parquet files. I will attempt to find out
Shredding Support
- [Variant] Add low level support for shredding and unshredding #7715
- [Variant] API to construct Shredded Variant Arrays #7895
- Retrieve array from RecordBatch for a leaf column #5699
- [Variant] Support Shredded Objects in `variant_get`: untyped path access #8151
- [Variant] extend shredded null handling for arrays #8400
- [Variant] Support Shredded Objects in `variant_get`: access as `Some(DataType::Struct)` (nested shredding) #8153
- [Variant] [Shredding] Support typed_access for `Boolean` #8329
- [Variant] [Shredding] Support typed_access for `Date32` #8330
- [Variant] [Shredding] Support typed_access for `Timestamp(Microsecond, _)` and `Timestamp(Nanosecond, _)` #8331
- [Variant] [Shredding] Support typed_access for `Decimal128` #8332
- [Variant] [Shredding] Support typed_access for `Utf8` and `BinaryView` #8333
- [Variant] [Shredding] Support typed_access for `Time64(Microsecond)` #8334
- [Variant] [Shredding] Support typed_access for `FixedSizeBinary` #8335
- [Variant] [Shredding] Support typed_access for `Struct` #8336
- [Variant] [Shredding] Support typed_access for `List` #8337
- [Variant] Implement `VariantArray::value` for shredded variants #8091
- [Variant] Support Shredded Objects in `variant_get`: typed path access (STEP 1) #8150
- [Variant] Implement `ShreddingState::AllNull` variant #8088
- [Variant] Support `variant_get` kernel for shredded variants #7941
- [Variant] Support Shredded Objects in `variant_get` #8083
- [Variant] Implement a `shred_variant` function #8361
Full `variant_get` Support
- [Variant] Support Shredded Objects in `variant_get`: Access as `VARIANT` (Unshredding) #8154
- [Variant] Support Shredded Lists/Array in `variant_get` #8082
- [Variant] Casting errors behavior support in `variant_get` #8086
- [Variant] Support typed access for other types in `variant_get` #8087
- [Variant] Allow appending raw object/list bytes to variant builders #8141
- [Variant] Support creating Variants with pre-existing Metadata #8152
Parquet File Integration
- [Variant] `VariantArray::data_type` returns `StructType`, causing `Array::as_struct` to panic #8319
- [Variant] Support `BinaryArray`/`LargeBinaryArray` in addition to `BinaryViewArray` for Variant #8387
- [Variant] Integration tests for reading parquet w/ Variants #8084
- [Variant] Support reading/writing Parquet Variant LogicalType #8370
- [Variant] Support mapping canonical extension types to Parquet LogicalTypes #7063
- [Variant] writing a VariantArray to parquet panics #8296
- [Variant] Add `variant` feature to `parquet` crate #8132
- Add example Variant data and parquet files parquet-testing#75
- [Variant] Rename `variant_experimental` flag to `variant` and remove warnings about being experimental #8297
Arrow --> Variant conversions
- [Variant] cast_to_variant will panic on certain `Date64` or Timestamp values #8155
- [Variant] Rename `batch_json_string_to_variant` and `batch_variant_to_json_string` json_to_variant #8144
- [Variant] Support `StringView` and `LargeString` in `batch_json_string_to_variant` #8145
- [Variant]: Implement `DataType::List/LargeList` support for `cast_to_variant` kernel #8060
- [Variant]: Implement `DataType::Dictionary` support for `cast_to_variant` kernel #8062
- [Variant]: Implement `DataType::Map` support for `cast_to_variant` kernel #8063
- [Variant]: Implement `DataType::RunEndEncoded` support for `cast_to_variant` kernel #8064
- [Variant] Implement `cast_to_variant` kernel #8043
- [Variant]: Implement `DataType::Utf8/LargeUtf8/Utf8View` support for `cast_to_variant` kernel #8049
- [Variant]: Implement `DataType::Binary/LargeBinary/BinaryView` support for `cast_to_variant` kernel #8050
- [Variant]: Implement `DataType::FixedSizeBinary` support for `cast_to_variant` kernel #8051
- [Variant]: Implement `DataType::Boolean` support for `cast_to_variant` kernel #8052
- [Variant]: Implement `DataType::Null` support for `cast_to_variant` kernel #8053
- [Variant]: Implement `DataType::Date32 / DataType::Date64` support for `cast_to_variant` kernel #8054
- [Variant]: Implement `DataType::Time32/Time64` support for `cast_to_variant` kernel #8055
- [Variant]: Implement `DataType::Interval` support for `cast_to_variant` kernel #8056
- [Variant]: Implement `DataType::Float16` support for `cast_to_variant` kernel #8057
- [Variant]: Implement `DataType::Timestamp(..)` support for `cast_to_variant` kernel #8058
- [Variant]: Implement `DataType::Decimal32/Decimal64/Decimal128/Decimal256` support for `cast_to_variant` kernel #8059
- [Variant]: Implement `DataType::Struct` support for `cast_to_variant` kernel #8061
`Variant` infrastructure
- [Variant] Optimize the object header generation logic in ObjectBuilder::finish #7978
- [Variant] Revisit validation cost of infallible iterators #7711
- [Variant] VariantBuilder (wrongly?) accepts 0+ variant values #7870
- [Variant] Avoiding extra splice in `ObjectBuilder::finish` if possible #7960
- [Variant] Add `Variant::as_f16` #8228
- `validated` and `is_fully_validated` flags don't need to be part of PartialEq #7952
- [Variant] Avoid extra allocation in `ObjectBuilder` #7899
- [Variant] Avoid extra allocation in list builder #7977
- [Variant] Convert JSON to Variant with fewer copies #7964
- [Variant] `impl FromIterator` for `VariantPath` #7955
- [Variant] remove VariantMetadata::dictionary_size #7947
- [Variant] Improve `VariantArray` performance by storing the index of the metadata and value arrays #7920
- [Variant] Test and implement efficient building for "large" Arrays #7699
- [Variant] Add `ListBuilder::with_value` for convenience #7951
- [Variant] Add `ObjectBuilder::with_field` for convenience #7949
- [Variant] Rust API to Read Variant Values #7423
- [Variant] Implement read support for remaining primitive types #7630
- Initial Builder API for Creating Variant Values #7653
- [Variant]: Rust API to Create Variant Values #7424
- [Variant] Improve API for iterating over values of a VariantList #7685
- [Variant] Add Variant::as_object and Variant::as_list #7755
- [Variant] Consider validating variants on creation (rather than read) #7684
- [Variant] Implement `VariantObject::field` and `VariantObject::fields` #7665
- [Variant] More efficient determination of String vs ShortString #7700
- [Variant] Panic when appending Object or List to VariantBuilder #7701
- [Variant] Introduce structs for Variant::Decimal types #7660
- [Variant] Support Nested Data in `VariantBuilder` #7696
- [Variant] Improved API for accessing Variant Objects and lists #7756
- [Variant] Add negative tests for reading invalid primitive variant values #7645
- Variant: Write Variant Values as JSON #7426
- [Variant] Add input validation in `VariantBuilder` #7697
- [Variant] Minor: make fields in `VariantDecimal*` private, add examples #7770
- [Variant] Add flag in `ObjectBuilder` to control validation behavior on duplicate field write #7777
- [Variant] Avoid second copy of field name in MetadataBuilder #7814
- [Variant] Add testing for invalid variants (fuzz testing??) #7842
- [Variant] Remove explicit ObjectBuilder::finish() and ListBuilder::finish and move to `Drop` impl #7780
- [Variant] Field lookup with out of bounds index causes unwanted behavior #7784
- [Variant] Make it harder to forget to finish a pending parent in ObjectBuilder #7798
- [Variant] Move JSON related functionality to different crate. #7800
- [Variant] make `serde_json` an optional dependency of `parquet-variant` #7775
- [Variant] Add tests for invalid variant values (aka verify invalid inputs) #7681
- Variant: Read/Parse JSON value as Variant #7425
- [Variant] If `ObjectBuilder::finalize` is not called, the resulting Variant object is malformed. #7863
- [Variant] Improve VariantBuilder when creating field name dictionaries / sorted dictionaries #7698
- [Variant] Tests for creating "large" `VariantList`s #7820
- [Variant] Impl PartialEq for VariantObject #7943 #7948
- [Variant] Panic when appending nested objects to VariantBuilder #7907
- [Variant] Tests for creating "large" `VariantObject`s #7821
- [Variant][Compute] Add batch processing for Variant-JSON String conversion #7883
- [Variant] Offer `simdutf8` as an optional dependency when validating metadata #7902
- [Variant] VariantMetadata, VariantList and VariantObject are too big for Copy #7831
- [Variant] Define basic convenience methods for variant pathing #7894
- [Variant] Converting variant to JSON string seems slow #7869
- [Variant] `test_json_to_variant_object_very_large` takes over 20s #7872
- [Variant] Support VariantBuilder to write to buffers owned by the caller #7805
Related PRs
**Related Community Resources**
- Parquet C/C++ variant implementation from @neilechao: GH-45937: [C++][Parquet] Variant logical type definition arrow#45375
- Rust Variant implementation from @jonhoo, @wjones127, and others: https://github.com/datafusion-contrib/datafusion-functions-variant