scovich
Contributor

@scovich scovich commented Sep 16, 2025

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

  • Closes #NNN.

Rationale for this change

Historically, Variant::as_fXX methods don't even try to cast int values as floating point, which is counter-intuitive.

What changes are included in this PR?

Allow lossless casting of variant integer values to variant floating point values, by a naive determination of precision:

  • Every floating point number has some number of bits of precision
    • 53 (double)
    • 24 (single)
    • 11 (half)
  • Any integer that fits entirely inside the target floating point type's precision can be converted losslessly
    • This produces an intuitive result: "too big" numbers fail to convert, while "small enough" numbers do convert.
    • This is a sufficient but not a necessary condition.
    • Technically, wider integers can be represented losslessly as well, as long as they have enough trailing zeros
    • It's unclear whether allowing those wider values to cast is actually helpful in practice, because only 1 in 2**k values can cast (where k is the number of bits of excess precision); it would certainly make input testing more expensive.
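The difference between the naive rule and the exact rule can be sketched as follows. This is an illustrative sketch, not the merged code: the body of `fits_precision` and its strict bound (`|n| < 2^N`) are assumptions, and `significant_bits_fit` is a hypothetical helper that only looks at significand width and ignores exponent range (so it is meaningful for f32 and f64 targets, not f16).

```rust
/// Naive rule (assumed body): accept `n` only when its magnitude fits
/// entirely within `N` bits, i.e. |n| < 2^N. Requires N < 64.
fn fits_precision<const N: u32>(n: impl Into<i64>) -> bool {
    let n: i64 = n.into();
    n.unsigned_abs() < (1u64 << N)
}

/// Hypothetical exact rule: accept `n` when the span from its highest set
/// bit to its lowest set bit is at most `N` bits, so trailing zeros are free.
/// Exponent range is not checked here.
fn significant_bits_fit<const N: u32>(n: i64) -> bool {
    let m = n.unsigned_abs();
    if m == 0 {
        return true;
    }
    let span = u64::BITS - m.leading_zeros() - m.trailing_zeros();
    span <= N
}
```

For example, `(1i64 << 53) + 2` is rejected by the naive rule but accepted by the exact one, because it has one trailing zero and f64 values in that range are spaced 2 apart. `(1i64 << 53) + 1` is rejected by both.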

Are these changes tested?

New unit tests and doc tests.

Are there any user-facing changes?

Yes. Values that failed to cast before now succeed.

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Sep 16, 2025
@scovich
Contributor Author

scovich commented Sep 16, 2025

CC @klion26 @alamb -- should be a quickie

Contributor

@alamb alamb left a comment

Makes sense to me -- thanks @scovich

Member

@klion26 klion26 left a comment

LGTM, thanks for the improvement.

@@ -144,3 +144,20 @@ pub(crate) const fn expect_size_of<T>(expected: usize) {
let _ = [""; 0][size];
}
}

pub(crate) fn fits_precision<const N: u32>(n: impl Into<i64>) -> bool {
Member

N is a generic argument rather than a function parameter because that can yield better performance. Do I understand correctly?

Contributor Author

It was mostly out of habit for small integer manipulation utilities... But now that you mention it, it probably doesn't matter in the slightest -- the compiler will inline aggressively anyway, and the constant arg will be folded in regardless of whether it's a generic arg or a function arg.

That said, there's one potential advantage to keeping the generic arg: Otherwise, it could be ambiguous which of two integer args is the precision and which is the actual value. Any preferences?

Member

Thanks for the detailed reply. I'm fine with the current implementation.
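The readability point from this thread can be illustrated with a small side-by-side sketch. The generic signature matches the one in the diff, but both function bodies are assumptions, and `fits_precision_arg` is a hypothetical alternative that does not exist in the PR.

```rust
/// Precision as a const generic (signature as in the diff, body assumed).
/// The call site names the precision in the turbofish, so it cannot be
/// confused with the value being tested.
fn fits_precision<const N: u32>(n: impl Into<i64>) -> bool {
    let n: i64 = n.into();
    n.unsigned_abs() < (1u64 << N)
}

/// Hypothetical alternative with precision as a plain argument.
/// A call such as `fits_precision_arg(24, 11)` gives no hint about which
/// integer is the precision and which is the value.
/// Note: the shift would overflow for `precision >= 64`.
fn fits_precision_arg(precision: u32, n: impl Into<i64>) -> bool {
    let n: i64 = n.into();
    n.unsigned_abs() < (1u64 << precision)
}
```

Both forms compute the same answer for the same inputs; the difference is only in how the call site reads, for instance `fits_precision::<24>(value)` versus `fits_precision_arg(24, value)`.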

@@ -1096,13 +1096,21 @@ impl<'m, 'v> Variant<'m, 'v> {
/// let v2 = Variant::from(std::f64::consts::PI);
/// assert_eq!(v2.as_f16(), Some(f16::from_f64(std::f64::consts::PI)));
///
/// // and from integers with no more than 11 bits of precision
Member

Not sure if we need to add an overflow example here (e.g., Variant::from(2048) for f16)

Contributor Author

The unit test for fits_precision does test overflow, both positive and negative.
Do we need additional (indirect) test coverage here?

Member

Maybe we need to update the doc (the comment says "Returns Some(f16) for float and double variants, None for non-floating-point variants") or add an example for integer overflow (Variant::from(2048) for f16 will return None).

I don't have a strong preference here; it's just an idea that popped into my mind when I saw this.

Contributor Author

@scovich scovich Sep 17, 2025

Oh! Good catch on the doc comment. Updated all three.
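The overflow behavior discussed in this thread can be sketched without the crate itself. The following is a self-contained stand-in using f32 (24 bits of precision) rather than f16, since f16 is not in the standard library. `int_as_f32` is a hypothetical function, not part of the Variant API, and the strict bound `|n| < 2^24` is an assumption chosen to mirror the reviewer's f16 example, where 2048 (2^11) yields None.

```rust
/// f32 carries 24 bits of significand precision (23 stored plus 1 implicit).
const F32_PRECISION_BITS: u32 = 24;

/// Hypothetical stand-in for an `as_f32`-style accessor on an integer value.
/// Returns `Some` only when the magnitude fits within 24 bits (naive rule),
/// in which case the conversion to f32 is exact.
fn int_as_f32(n: i32) -> Option<f32> {
    if n.unsigned_abs() < (1u32 << F32_PRECISION_BITS) {
        Some(n as f32)
    } else {
        None
    }
}
```

Under this sketch, `int_as_f32(16_777_215)` returns `Some(16777215.0)` while `int_as_f32(16_777_216)` returns `None`, which is the f32 analogue of the 2047 versus 2048 boundary for f16.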

@alamb
Contributor

alamb commented Sep 17, 2025

🚀

@alamb alamb merged commit d6f40ce into apache:main Sep 17, 2025
12 checks passed
@alamb
Contributor

alamb commented Sep 17, 2025

Thanks @scovich and @klion26
