@alamb (Contributor) commented Sep 11, 2025

Which issue does this PR close?

We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax.

Rationale for this change

Now that the upstream parquet-variant tests have been merged, we can measure how far the Rust variant implementation is from handling all of the values.

This PR updates the test harness added in #8104 by @carpecodeum to use the final parquet files and the current APIs.

What changes are included in this PR?

  1. Update parquet-testing pin
  2. Update the test harness to use the standard Rust test runner (#[test]) rather than a custom main function
  3. Add links to follow-on tickets

You can run this test manually like this:

cargo test --all-features --test variant_integration

...
running 138 tests
test test_variant_integration_case_106 ... ok
test test_variant_integration_case_107 ... ok
test test_variant_integration_case_109 ... ok
test test_variant_integration_case_110 ... ok
..
test test_variant_integration_case_90 ... ok
test test_variant_integration_case_91 ... ok
test test_variant_integration_case_93 ... ok
test test_variant_integration_case_83 - should panic ... ok
test test_variant_integration_case_84 - should panic ... ok

Are these changes tested?

Yes, this PR consists entirely of tests.

Are there any user-facing changes?

No

@github-actions bot added the parquet (Changes to the parquet crate) and parquet-variant (parquet-variant* crates) labels Sep 11, 2025
@alamb changed the title from "Alamb/variant tests" to "Update variant_integration to use final series" Sep 11, 2025
@alamb alamb marked this pull request as ready for review September 12, 2025 11:18
/// Note this value may be smaller than what was passed to [`Self::new`] or
/// [`Self::try_new`] if the input was larger than necessary to encode the
/// metadata dictionary.
pub fn size(&self) -> usize {
Review comment by @alamb (Contributor, Author):

I needed to expose this information because the variant metadata / data are appended in one .bin file in the test cases
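
To illustrate why the encoded size matters when metadata and value share one buffer, here is a minimal std-only sketch. It assumes the layout described in the Parquet Variant encoding spec (a header byte, a dictionary size, `dictionary_size + 1` offsets, then the string bytes) and assumes the metadata comes first in the file. The helper names `read_le`, `metadata_size`, and `split_metadata_and_value` are hypothetical and are not the parquet-variant API; in the real harness the equivalent number would come from the `size()` method shown above.

```rust
/// Read `width` little-endian bytes starting at `pos` as an unsigned integer.
/// Returns `None` if the requested range is out of bounds.
fn read_le(buf: &[u8], pos: usize, width: usize) -> Option<usize> {
    let bytes = buf.get(pos..pos.checked_add(width)?)?;
    Some(bytes.iter().rev().fold(0usize, |acc, b| (acc << 8) | *b as usize))
}

/// Hypothetical helper: number of bytes occupied by the variant metadata at
/// the start of `buf`, computed from the header and the last dictionary offset.
fn metadata_size(buf: &[u8]) -> Option<usize> {
    let header = *buf.first()?;
    // The top two bits of the header store (offset size in bytes) - 1.
    let offset_size = ((header >> 6) & 0b11) as usize + 1;
    let dict_size = read_le(buf, 1, offset_size)?;
    // The offset list holds dict_size + 1 entries; the last entry is the
    // total length of the dictionary string bytes.
    let offsets_start = 1 + offset_size;
    let last_offset_pos = offsets_start + dict_size.checked_mul(offset_size)?;
    let string_bytes = read_le(buf, last_offset_pos, offset_size)?;
    let total = last_offset_pos + offset_size + string_bytes;
    (total <= buf.len()).then_some(total)
}

/// Split a combined buffer into (metadata, value), assuming the metadata is
/// stored first and the value occupies the remaining bytes.
fn split_metadata_and_value(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let size = metadata_size(buf)?;
    Some(buf.split_at(size))
}
```

For example, for the bytes `[0x01, 0x01, 0x00, 0x01, b'a', 0x0C, 42]` the computed metadata size is 5, so the last two bytes are treated as the value.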


/// Test case definition structure matching the format from cases.json
#[derive(Debug, Clone)]
// Generate test functions for each case
Review comment by @alamb (Contributor, Author):

I rewrote this file so the tests use the existing Rust test harness rather than our own. It does require some redundancy, but it means normal rust test execution tools work
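
A minimal sketch of this pattern, not the code from the PR: a `macro_rules!` macro emits one `#[test]` function per case and adds `#[should_panic(expected = ...)]` for cases that are known to fail. The `run_case` function is a hypothetical stand-in for the real harness logic, and this simplified macro takes the function name explicitly because `macro_rules!` alone cannot build identifiers such as `test_variant_integration_case_1` from a number.

```rust
/// Hypothetical stand-in for the real harness logic (load case `id`, read the
/// variant, compare with the expected value). Here it only models the outcome.
fn run_case(id: u32) -> Result<(), String> {
    match id {
        // Pretend case 1 hits a feature that is not supported yet.
        1 => Err("Unsupported typed_value type: List(...)".to_string()),
        _ => Ok(()),
    }
}

/// Generates one `#[test]` function per case so the standard test runner can
/// filter and report each case individually.
macro_rules! variant_test_case {
    // Case that is expected to pass.
    ($name:ident, $id:expr) => {
        #[test]
        fn $name() {
            run_case($id).unwrap();
        }
    };
    // Case that is expected to panic with a message containing `$expected`.
    ($name:ident, $id:expr, $expected:literal) => {
        #[test]
        #[should_panic(expected = $expected)]
        fn $name() {
            run_case($id).unwrap();
        }
    };
}

// One invocation per case: this is the redundancy the approach requires.
variant_test_case!(test_case_1, 1, "Unsupported typed_value type: List(");
variant_test_case!(test_case_2, 2);
```

With this layout the test runner lists each generated function separately, and a case such as `test_case_1` is reported as passing only while it still panics with the expected message, so the annotation has to be removed once the case starts working.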

// - cases 40, 42, 87, 127 and 128 are expected to fail always (they include invalid variants)
// - the remaining cases are expected to (eventually) pass

variant_test_case!(1, "Unsupported typed_value type: List(");
Review comment by @alamb (Contributor, Author):

I am quite pleased with how many cases pass, actually

@alamb changed the title from "Update variant_integration to use final series" to "Update variant_integration to use final approved parquet-testing data" Sep 12, 2025
@alamb (Contributor, Author) commented Sep 12, 2025

@carpecodeum or @mprammer do you have time to review this PR?

@alamb (Contributor, Author) commented Sep 12, 2025

Also FYI @scovich @codephage2020 @liamzwbao and @klion26

@alamb (Contributor, Author) commented Sep 12, 2025

@scovich one thing I have been thinking about is if there is some way to leverage this same test suite for variant_get

At the moment the tests read a shredded variant out as an unshredded variant and compare it with the expected value

I was thinking maybe I could extend this more to then re-shred the variant and compare it with the original shredded one 🤔

@alamb changed the title from "Update variant_integration to use final approved parquet-testing data" to "Update variant_integration test to use final approved parquet-testing data" Sep 12, 2025
@klion26 (Member) left a comment

LGTM

@alamb (Contributor, Author) commented Sep 15, 2025

Thanks for the review @klion26

I'll plan to merge this tomorrow unless anyone else would like additional time to review

@codephage2020 (Contributor) left a comment

Looks Good To Me! Thanks for the clean implementation.

@alamb (Contributor, Author) commented Sep 16, 2025

The test is running more and more cases before I can even merge it!

@alamb alamb closed this Sep 16, 2025
@alamb alamb reopened this Sep 16, 2025
@alamb alamb merged commit 2ec77b5 into apache:main Sep 16, 2025
39 checks passed
@alamb (Contributor, Author) commented Sep 16, 2025

😅 -- testing for the win!

@alamb alamb deleted the alamb/variant_tests branch September 16, 2025 19:13
Development

Successfully merging this pull request may close these issues.

[Variant] Integration tests for reading parquet w/ Variants
4 participants