-
Notifications
You must be signed in to change notification settings - Fork 1k
[Variant] Implement VariantArray::value
for shredded variants
#8105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
}}; | ||
} | ||
// | ||
// /// Convert the input array to a `VariantArray` row by row, using `method` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment out this for the draft
version; I will remove it in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @klion26 -- this is very cool. I am sorry for the delay in reviewing. I left some comments -- let me know what you think
parquet-variant-compute/src/lib.rs
Outdated
|
||
/// Convert the input array to a `VariantArray` row by row, using `method` | ||
/// to downcast the generic array to a specific array type and `cast_fn` | ||
/// to transform each element to a type compatible with Variant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is an interesting idea -- to have two variations of each macro. It is probably a reasonable place to start
However, I think it would be nice to avoid having macros in the root lib.rs
file -- what do you think about making a new module like parquet-variant-compute/src/arrow_types.rs
or parquet-variant-compute/src/type_conversion.rs
to hold the code to convert back and forth between Arrow types and Variant types
@scovich added similar code in VariantToPrimitive
here: #8122
So I am thinking having a module to start collecting such code would be useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Place the macros in parquet-variant-compute/src/type_conversion.rs
is better, will check #8122 and fix the remainings
let typed_value = typed_value.as_primitive::<Int32Type>(); | ||
Variant::from(typed_value.value(index)) | ||
} | ||
DataType::Int16 => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good -- we can probably change the DataType::Int32
branch to use this macro too, right?
Also, I think we should add a test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, only Int16 is implemented to confirm the direction
Indeed, we need to add tests to cover this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb Thanks for the review; I'll refine the PR and contact you after I'm done.
parquet-variant-compute/src/lib.rs
Outdated
|
||
/// Convert the input array to a `VariantArray` row by row, using `method` | ||
/// to downcast the generic array to a specific array type and `cast_fn` | ||
/// to transform each element to a type compatible with Variant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Place the macros in parquet-variant-compute/src/type_conversion.rs
is better, will check #8122 and fix the remainings
let typed_value = typed_value.as_primitive::<Int32Type>(); | ||
Variant::from(typed_value.value(index)) | ||
} | ||
DataType::Int16 => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, only Int16 is implemented to confirm the direction
Indeed, we need to add tests to cover this logic.
marking as draft so it is clear this one isn't waiting on a review |
caa9420
to
d682bfd
Compare
19e0b2b
to
363f9ad
Compare
@alamb, please help review this when you have time. Thanks. This is ready to review now. |
|
||
/// Convert the value at a specific index in the given array into a `Variant`. | ||
#[macro_export] | ||
macro_rules! primitive_conversion_single_value { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to a separate macro other than a macro rules
in the same macro as before, I think the current implementation is better for the user to choose which one they need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I was thinking about if there is some way to avoid defining 2 macros for each type of array, but given the branches are different (array.is_null --> null
or Variant::Null
) I suspect the complexity of refactoring required would outweight any "don't repeat yourself" benefits
as_type, | ||
cast_options, | ||
))), | ||
DataType::Int16 => Ok(Box::new(PrimitiveOutputBuilder::<Int16Type>::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add the remaining types in a follow-up pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @klion26 -- I think this makes sense to me and moves us towards a nice set of "convert arrow --> variant" and back again
cc @carpecodeum @mprammer and @scovich
|
||
/// Convert the value at a specific index in the given array into a `Variant`. | ||
#[macro_export] | ||
macro_rules! primitive_conversion_single_value { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I was thinking about if there is some way to avoid defining 2 macros for each type of array, but given the branches are different (array.is_null --> null
or Variant::Null
) I suspect the complexity of refactoring required would outweight any "don't repeat yourself" benefits
@@ -0,0 +1,166 @@ | |||
// Licensed to the Apache Software Foundation (ASF) under one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This PR has a conflict. Can you resolve it @klion26 ? Thank you! |
FYI @scovich |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The approach generally looks good, but I'm not sure how it fits with e.g.
- #8166 ?
(I'm guessing the infrastructure for casting arrays remains useful, and the call/use site would just change if/when the other PR merges?)
continue; | ||
} | ||
$builder.append_variant(Variant::from(array.value(i))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: why not just:
continue; | |
} | |
$builder.append_variant(Variant::from(array.value(i))); | |
} else { | |
$builder.append_variant(Variant::from(array.value(i))); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the ruturn/break/continue as soon as possible way, but if we have a strong preference here, I can change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely not a strong preference, no need to change if you like it as-is.
(I personally prefer early return/continue/etc any time the else
block would be more than one line because it starts to save a lot of indentation. For one line, it's about the same either way)
macro_rules! non_generic_conversion_array { | ||
($method:ident, $cast_fn:expr, $input:expr, $builder:expr) => {{ | ||
let array = $input.$method(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this is the only "real" macro, and the other two are just special cases?
But for that to work, this macros shouldn't take $input
and $method
as separate arguments, so caller is e.g.
- non_generic_conversion_array!(as_boolean, |v| v, input, builder);
+ non_generic_conversion_array!(input.as_boolean(), |v| v, builder);
If that's acceptable then we can do:
macro_rules! non_generic_conversion_array {
($array:expr, $cast_fn:expr, $builder:expr) => {{
let array = $array;
...
}};
}
and then
macro_rules! generic_conversion_array {
($t:ty, $method:ident, $cast_fn:expr, $input:expr, $builder:expr) => {
non_generic_conversion_array!($input.$method::<$t>, $cast_fn, $builder)
};
}
and then
macro_rules! primitive_conversion {
($t:ty, $input:expr, $builder:expr) => {
generic_conversion_array!($t, as_primitive, |v| v, $input, $builder)
};
}
A similar factoring should work for the xxx_conversion_scalar
macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in a separate commit. The code looks cleaner than before. In the new version, there's something changed for the user. If we want to use the primitive_conversion
macro, we may need to import the three macros(encounter this in parquet-variant-compute/src/variant_get/output.variant.rs
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! I forgot about that. You just have to use fully qualified paths, starting with $crate
:
macro_rules! primitive_conversion {
($t:ty, $input:expr, $builder:expr) => {
$crate::type_conversion::generic_conversion_array!($t, as_primitive, |v| v, $input, $builder)
};
}
etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if you want the macro to be usable outside the current crate, then instead:
use self as parquet_variant_compute;
macro_rules! primitive_conversion {
($t:ty, $input:expr, $builder:expr) => {
parquet_variant_compute::type_conversion::generic_conversion_array!($t, as_primitive, |v| v, $input, $builder)
};
}
... which allows the same path to work both inside and outside the crate.
Tho I suspect it would still break if somebody brought in the crate with a different name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure those definitions should even be public tho? Why not ditch the #[macro_export]
and:
macro_rules! primitive_conversion {
($t:ty, $input:expr, $builder:expr) => {
$crate::type_conversion::generic_conversion_array!($t, as_primitive, |v| v, $input, $builder)
};
}
pub(crate) use primitive_conversion;
macro_rules! decimal_to_variant_decimal { | ||
($v:ident, $scale:expr, $value_type:ty, $variant_type:ty) => { | ||
if *$scale < 0 { | ||
// For negative scale, we need to multiply the value by 10^|scale| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variant doesn't support negative scale?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see -- you're manually applying the scale and producing a scale=0 decimal result.
($v:ident, $scale:expr, $value_type:ty, $variant_type:ty) => { | ||
if *$scale < 0 { | ||
// For negative scale, we need to multiply the value by 10^|scale| | ||
// For example: 123 with scale -2 becomes 12300 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// For example: 123 with scale -2 becomes 12300 | |
// For example: 123 with scale -2 becomes 12300 with scale 0 |
if *$scale < 0 { | ||
// For negative scale, we need to multiply the value by 10^|scale| | ||
// For example: 123 with scale -2 becomes 12300 | ||
let multiplier = (10 as $value_type).pow((-*$scale) as u32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let multiplier = (10 as $value_type).pow((-*$scale) as u32); | |
let multiplier = <$value_type>::pow(10, (-*$scale) as u32); |
if $v > 0 && $v > <$value_type>::MAX / multiplier { | ||
return Variant::Null; | ||
} | ||
if $v < 0 && $v < <$value_type>::MIN / multiplier { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these fully accurate boundary checks? I worry that the truncating integer division could have corner cases.
I think we can do:
v.checked_mul(multiplier)
.and_then(|v| <$variant_type>::try_new(v, 0).ok())
.map_or(Variant::Null, Into::into)
if *$scale < 0 { | ||
// For negative scale, we need to multiply the value by 10^|scale| | ||
// For example: 123 with scale -2 becomes 12300 | ||
let multiplier = (10 as $value_type).pow((-*$scale) as u32); | ||
// Check for overflow | ||
if $v > 0 && $v > <$value_type>::MAX / multiplier { | ||
return Variant::Null; | ||
} | ||
if $v < 0 && $v < <$value_type>::MIN / multiplier { | ||
return Variant::Null; | ||
} | ||
<$variant_type>::try_new($v * multiplier, 0) | ||
.map(|v| v.into()) | ||
.unwrap_or(Variant::Null) | ||
} else { | ||
<$variant_type>::try_new($v, *$scale as u8) | ||
.map(|v| v.into()) | ||
.unwrap_or(Variant::Null) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if *$scale < 0 { | |
// For negative scale, we need to multiply the value by 10^|scale| | |
// For example: 123 with scale -2 becomes 12300 | |
let multiplier = (10 as $value_type).pow((-*$scale) as u32); | |
// Check for overflow | |
if $v > 0 && $v > <$value_type>::MAX / multiplier { | |
return Variant::Null; | |
} | |
if $v < 0 && $v < <$value_type>::MIN / multiplier { | |
return Variant::Null; | |
} | |
<$variant_type>::try_new($v * multiplier, 0) | |
.map(|v| v.into()) | |
.unwrap_or(Variant::Null) | |
} else { | |
<$variant_type>::try_new($v, *$scale as u8) | |
.map(|v| v.into()) | |
.unwrap_or(Variant::Null) | |
} | |
let (v, scale) = if *$scale < 0 { | |
// For negative scale, we need to multiply the value by 10^-scale | |
// For example: 123 with scale -2 becomes 12300 with scale 0 | |
let multiplier = <$value_type>::pow(10, (-*$scale) as u32); | |
($v.checked_mul($v), 0u8) | |
} else { | |
(Some($v), *$scale as u8) | |
}; | |
$v.and_then(|v| <$variant_type>::try_new(v, scale).ok()) | |
.map_or(Variant::Null, Into::into) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
|
||
/// Convert arrays that don't need generic type parameters | ||
#[macro_export] | ||
macro_rules! cast_conversion_nongeneric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this just non_generic_conversion_array
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've removed the redundant macros. My bad for not figuring it out when copying it.
let array = $input.$method::<$offset_type>(); | ||
for i in 0..array.len() { | ||
if array.is_null(i) { | ||
$builder.append_null(); | ||
continue; | ||
} | ||
let cast_value = $cast_fn(array.value(i)); | ||
$builder.append_variant(Variant::from(cast_value)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking the advice above for non_generic_conversion_array
:
let array = $input.$method::<$offset_type>(); | |
for i in 0..array.len() { | |
if array.is_null(i) { | |
$builder.append_null(); | |
continue; | |
} | |
let cast_value = $cast_fn(array.value(i)); | |
$builder.append_variant(Variant::from(cast_value)); | |
} | |
non_generic_conversion_array!($input.$method::<$offset_type>(), $cast_fn, $builder) |
/// Convert the input array of a specific primitive type to a `VariantArray` | ||
/// row by row | ||
#[macro_export] | ||
macro_rules! primitive_conversion { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macro_rules! primitive_conversion { | |
macro_rules! primitive_conversion_array { |
(to match the others?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
783cb22
to
f813b08
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scovich Thanks for your review, I've updated the code. Please take another look
/// Convert the input array of a specific primitive type to a `VariantArray` | ||
/// row by row | ||
#[macro_export] | ||
macro_rules! primitive_conversion { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
continue; | ||
} | ||
$builder.append_variant(Variant::from(array.value(i))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with the ruturn/break/continue as soon as possible way, but if we have a strong preference here, I can change it.
|
||
/// Convert arrays that don't need generic type parameters | ||
#[macro_export] | ||
macro_rules! cast_conversion_nongeneric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've removed the redundant macros. My bad for not figuring it out when copying it.
if *$scale < 0 { | ||
// For negative scale, we need to multiply the value by 10^|scale| | ||
// For example: 123 with scale -2 becomes 12300 | ||
let multiplier = (10 as $value_type).pow((-*$scale) as u32); | ||
// Check for overflow | ||
if $v > 0 && $v > <$value_type>::MAX / multiplier { | ||
return Variant::Null; | ||
} | ||
if $v < 0 && $v < <$value_type>::MIN / multiplier { | ||
return Variant::Null; | ||
} | ||
<$variant_type>::try_new($v * multiplier, 0) | ||
.map(|v| v.into()) | ||
.unwrap_or(Variant::Null) | ||
} else { | ||
<$variant_type>::try_new($v, *$scale as u8) | ||
.map(|v| v.into()) | ||
.unwrap_or(Variant::Null) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
macro_rules! non_generic_conversion_array { | ||
($method:ident, $cast_fn:expr, $input:expr, $builder:expr) => {{ | ||
let array = $input.$method(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in a separate commit. The code looks cleaner than before. In the new version, there's something changed for the user. If we want to use the primitive_conversion
macro, we may need to import the three macros(encounter this in parquet-variant-compute/src/variant_get/output.variant.rs
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but please double check that return
in primitive_conversion_single_value
-- I'm pretty sure it's incorrect and only works by accident today.
generic_conversion_array!( | ||
Float16Type, | ||
as_primitive, | ||
|v: f16| -> f32 { v.into() }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside:
|v: f16| -> f32 { v.into() }, | |
f32::from, |
generic_conversion_array!( | ||
Decimal256Type, | ||
as_primitive, | ||
|v: i256| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside: L237-241 below could simplify to just:
v.to_i128().map_or(
Variant::Null,
decimal_to_variant_decimal!(v, scale, i128, VariantDecimal16,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't change this, because compile thinks that v
passed into the macro is i256, and seems we can't cast it when calling the macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I had a silly typo, sorry --
v.to_i128().map_or(
Variant::Null,
|v| decimal_to_variant_decimal!(v, scale, i128, VariantDecimal16),
)
(missing |v|
in the closure)
generic_conversion_array!( | ||
Time64NanosecondType, | ||
as_primitive, | ||
|v| NaiveTime::from_num_seconds_from_midnight_opt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside: @alamb -- I'm not sure CI is running fmt
against this file? At least, I've never seen it willing to omit trailing commas for non-macro invocations (L412), and it always formats multi-line lambdas with curly braces even tho I'd personally prefer it didn't:
|v| {
NaiveTime::foo(
a,
b,
)
}
($t:ty, $input:expr, $index:expr) => {{ | ||
let array = $input.as_primitive::<$t>(); | ||
if array.is_null($index) { | ||
return Variant::Null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return
from a macro seems dangerous/wrong? It would return from whatever function invoked the macro which is probably not what the caller expected? Is it even what the macro writer intended? To return Variant::Null
from the function on NULL, but the macro invocation produces a normal Variant
otherwise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, return
from a macro seems wrong, changed the implementation.
@scovich Thanks for the quick review. I've addressed the comments. Please take another look. |
Fixed in the latest commit
|
/// Convert the input array to a `VariantArray` row by row, using `method` | ||
/// requiring a generic type to downcast the generic array to a specific | ||
/// array type and `cast_fn` to transform each element to a type compatible with Variant | ||
#[macro_export] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: There's no point (publicly) exporting a macro that references $crate
-- compilation will fail for any use sites outside this crate. Can do pub(crate) use generic_conversion_array
to make it visible everywhere inside the crate.
Ah! The latter is in the |
macro_rules! generic_conversion_single_value { | ||
($t:ty, $method:ident, $cast_fn:expr, $input:expr, $index:expr) => {{ | ||
let array = $input.$method::<$t>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this not be expressed in terms of non_generic_conversion_single_value
?
macro_rules! primitive_conversion_single_value { | ||
($t:ty, $input:expr, $index:expr) => {{ | ||
let array = $input.as_primitive::<$t>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this not be expressed in terms of generic_conversion_single_value
?
Nothing but nits left at this point, can fix them before or after merge. |
@scovich Thank you very much for the detailed and patient review. Addressed the comments in the last commit. One more thing: I changed all the macros to |
Which issue does this PR close?
We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.
VariantArray::value
for shredded variants #8091 .Rationale for this change
Implement
VariantArray::value
for some more shredded variants(eg. primitive_conversion/generic_conversion/non_generic_conversion).What changes are included in this PR?
macroRules
to a separate moduletype_conversion.rs
variant value
Are these changes tested?
Covered by the existing test
Are there any user-facing changes?
No