-
Notifications
You must be signed in to change notification settings - Fork 1k
Adds Map & Enum support, round-trip & benchmark tests #8353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…additional types like Decimal, FixedSizeBinary, Utf8, List, Struct, and Map. Add round-trip validation for complex and logical types including Duration and UUID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nathaniel-d-ef LGTM!
Besides that nit, the only thing I'd recommend is making encode_map_entries
a method of MapEncoder
. Pretty minor though.
Co-authored-by: Connor Sanders <[email protected]>
…y code structure.
@jecsand838 Thanks for the quick review 🙏 |
@alamb This follow-up is ready for a once-over whenever you have a chance. 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @nathaniel-d-ef and @jecsand838 -- this looks great to me. I had some small cleanup suggestions but nothing that is needed
.into_owned() | ||
}; | ||
// Read original file into a single RecordBatch for comparison | ||
let f_in = File::open(&path).expect("open input avro"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is a non trivial amount of code duplication here and in the other tests for the mechanics of writing to a temporary file and reading the values back as an arrow RecordBatch -- among other things I think this obscures the intent of the test somewhat
Wny chance you are willing to consolidate some of the duplication in a follow on PR?
Like it would be great if these tests look like
let data = ...;
// function roundtrip writes data out to a file (or memory buffer) and reads it back as batches and verifies
round_trip(data);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion for sure. I'll add this as a note and we can tackle it along with the in-memory recommendation from earlier.
Co-authored-by: Andrew Lamb <[email protected]>
Sticking with the downcast_ref pattern after all for simplicity's sake - as_string_opt is generic requires additional type handling |
Thanks again @nathaniel-d-ef and @jecsand838 |
Which issue does this PR close?
Rationale for this change
This PR adds Map and Enum encoders to the arrow-avro crate writer, along with new benchmark tests for remaining types and round-trip tests.
What changes are included in this PR?
New encoders:
Map
Enum
Corresponding changes in support of these encoders in FieldEncoder and FieldPlan
Additional round trip tests in
mod.rs
New tests follow existing file read pattern
Additional benchmark tests for data types
Are these changes tested?
Yes, additional complex type unit tests have been added for Map and Enum. The rest of the PR beyond the new types are tests themselves. All tests, new and existing, pass.
Are there any user-facing changes?
n/a, arrow-avro crate is not yet public