nathaniel-d-ef (Contributor)

Which issue does this PR close?

Rationale for this change

This PR adds Map and Enum encoders to the arrow-avro crate's writer, along with new benchmarks for the remaining types and new round-trip tests.

What changes are included in this PR?

New encoders:
Map
Enum
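For reference, the Avro spec encodes an enum value as an int holding the zero-based position of the symbol in the schema's symbol list, and Avro ints are written as zig-zag varints. The std-only sketch below illustrates that wire format only; it is not the crate's encoder. EnumEncoder, write_long, and the lookup by symbol string are hypothetical (an Arrow-backed writer would more likely take the index straight from a dictionary array's keys).

```rust
/// Zig-zag + base-128 varint, as Avro uses for both int and long.
fn write_long(out: &mut Vec<u8>, value: i64) {
    // Zig-zag maps signed values to unsigned so small magnitudes stay short.
    let mut n = ((value << 1) ^ (value >> 63)) as u64;
    // Emit 7 bits at a time, setting the high bit while more bytes follow.
    while n >= 0x80 {
        out.push(((n & 0x7F) as u8) | 0x80);
        n >>= 7;
    }
    out.push(n as u8);
}

/// Hypothetical enum encoder holding the schema's symbol list.
struct EnumEncoder {
    symbols: Vec<String>,
}

impl EnumEncoder {
    fn new(symbols: &[&str]) -> Self {
        Self {
            symbols: symbols.iter().map(|s| s.to_string()).collect(),
        }
    }

    /// Appends the Avro encoding of `symbol` (its index), or errors if it is not in the schema.
    fn encode(&self, out: &mut Vec<u8>, symbol: &str) -> Result<(), String> {
        let index = self
            .symbols
            .iter()
            .position(|s| s == symbol)
            .ok_or_else(|| format!("unknown enum symbol: {symbol}"))?;
        write_long(out, index as i64);
        Ok(())
    }
}
```

With symbols ["A", "B", "C"], "C" is index 2, which zig-zags to the single byte 0x04.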

Corresponding changes to FieldEncoder and FieldPlan to support these encoders

Additional round-trip tests in mod.rs

New tests follow the existing file-read pattern:

  • simple_fixed
  • duration_uuid
  • nonnullable.impala.avro
  • decimals
  • enum
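As background for the duration_uuid case: per the Avro spec, duration is a fixed(12) holding three little-endian unsigned 32-bit integers (months, days, milliseconds), while Arrow's IntervalMonthDayNano carries months and days as i32 and nanoseconds as i64, so the writer has to convert. This is a hypothetical std-only sketch (encode_duration is not the crate's function) that assumes negative values and sub-millisecond remainders are rejected rather than truncated; the real encoder may handle them differently.

```rust
/// Encodes (months, days, nanoseconds) as an Avro duration:
/// 12 bytes = three little-endian u32 values (months, days, millis).
/// Assumption: negatives and sub-millisecond precision are errors, not truncated.
fn encode_duration(months: i32, days: i32, nanos: i64) -> Result<[u8; 12], String> {
    let months = u32::try_from(months).map_err(|_| "negative months".to_string())?;
    let days = u32::try_from(days).map_err(|_| "negative days".to_string())?;
    if nanos % 1_000_000 != 0 {
        return Err("sub-millisecond precision cannot be represented".to_string());
    }
    let millis = u32::try_from(nanos / 1_000_000)
        .map_err(|_| "milliseconds out of u32 range".to_string())?;
    let mut out = [0u8; 12];
    out[0..4].copy_from_slice(&months.to_le_bytes());
    out[4..8].copy_from_slice(&days.to_le_bytes());
    out[8..12].copy_from_slice(&millis.to_le_bytes());
    Ok(out)
}
```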

Additional benchmarks for the following data types:

  • Utf8
  • List
  • Struct
  • FixedSizeBinary16
  • UUID
  • IntervalMonthDayNanoDuration
  • Decimal32(bytes)
  • Decimal64(bytes)
  • Decimal128(bytes)
  • Decimal128(fixed16)
  • Decimal256(bytes)
  • Map
  • Enum
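The bytes and fixed16 decimal variants above differ only in layout: Avro's decimal logical type stores the unscaled integer as big-endian two's complement, with fixed(16) always using exactly 16 sign-extended bytes and bytes allowing a variable length (written after a long length prefix, omitted here). The helper names below are hypothetical, the input is assumed to be an i128 unscaled value, and trimming the bytes form to its shortest sign-preserving encoding is one possible choice, not necessarily what arrow-avro does.

```rust
/// fixed(16): the full big-endian two's-complement representation.
fn decimal128_to_fixed16(unscaled: i128) -> [u8; 16] {
    unscaled.to_be_bytes()
}

/// bytes: variable length; here trimmed to the shortest form that keeps the sign.
fn decimal128_to_bytes(unscaled: i128) -> Vec<u8> {
    let full = unscaled.to_be_bytes();
    let mut start = 0;
    // A leading 0x00 (or 0xFF) is redundant if the next byte's high bit carries the same sign.
    while start < full.len() - 1 {
        let (b, next) = (full[start], full[start + 1]);
        let redundant = (b == 0x00 && (next & 0x80) == 0) || (b == 0xFF && (next & 0x80) != 0);
        if !redundant {
            break;
        }
        start += 1;
    }
    full[start..].to_vec()
}
```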

Are these changes tested?

Yes. Additional complex-type unit tests have been added for Map and Enum; the rest of the PR, beyond the new types, consists of tests. All tests, new and existing, pass.

Are there any user-facing changes?

N/A; the arrow-avro crate is not yet public.

…additional types like Decimal, FixedSizeBinary, Utf8, List, Struct, and Map. Add round-trip validation for complex and logical types including Duration and UUID.
github-actions bot added the arrow (Changes to the arrow crate) and arrow-avro (arrow-avro crate) labels on Sep 16, 2025
jecsand838 (Contributor) left a comment

@nathaniel-d-ef LGTM!

Besides that nit, the only thing I'd recommend is making encode_map_entries a method of MapEncoder. Pretty minor, though.
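For context on what such a method writes: Avro encodes a map as a series of blocks, each a long item count followed by that many key/value pairs (keys are Avro strings), and a block with count zero ends the map. This is a minimal std-only sketch of the suggested shape, with entry writing as a method on the encoder; MapEncoder here is a hypothetical stand-in for a map of longs that writes all entries in one block, not the PR's actual type.

```rust
/// Avro long: zig-zag then base-128 varint.
fn write_long(out: &mut Vec<u8>, value: i64) {
    let mut n = ((value << 1) ^ (value >> 63)) as u64;
    while n >= 0x80 {
        out.push(((n & 0x7F) as u8) | 0x80);
        n >>= 7;
    }
    out.push(n as u8);
}

/// Avro string: byte length as a long, then the UTF-8 bytes.
fn write_string(out: &mut Vec<u8>, s: &str) {
    write_long(out, s.len() as i64);
    out.extend_from_slice(s.as_bytes());
}

/// Hypothetical encoder for a map whose values are Avro longs.
struct MapEncoder;

impl MapEncoder {
    /// Writes all entries as one block (count + pairs), then the zero-count terminator.
    fn encode_map_entries(&self, out: &mut Vec<u8>, entries: &[(&str, i64)]) {
        if !entries.is_empty() {
            write_long(out, entries.len() as i64);
            for (key, value) in entries {
                write_string(out, key);
                write_long(out, *value);
            }
        }
        // A zero count ends the map; an empty map is just this byte.
        write_long(out, 0);
    }
}
```

Keeping the entry logic on the encoder type keeps the block/terminator rules in one place next to the map's state, which seems to be the point of the suggestion.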

nathaniel-d-ef (Contributor, Author)

@jecsand838 Thanks for the quick review 🙏

nathaniel-d-ef (Contributor, Author)

@alamb This follow-up is ready for a once-over whenever you have a chance. 🙏

alamb (Contributor) left a comment

Thank you @nathaniel-d-ef and @jecsand838. This looks great to me. I had some small cleanup suggestions, but nothing that is needed.

.into_owned()
};
// Read original file into a single RecordBatch for comparison
let f_in = File::open(&path).expect("open input avro");
Contributor

There is a non-trivial amount of code duplication here and in the other tests for the mechanics of writing to a temporary file and reading the values back as an Arrow RecordBatch. Among other things, I think this obscures the intent of the test somewhat.

Any chance you'd be willing to consolidate some of the duplication in a follow-on PR?

For example, it would be great if these tests looked like:

let data = ...;
// round_trip writes data out to a file (or memory buffer), reads it back as batches, and verifies the result
round_trip(data);
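A minimal sketch of such a helper, assuming an in-memory buffer instead of a temporary file. round_trip and its write/read closures are hypothetical; in these tests T would be a RecordBatch and the closures would wrap the arrow-avro writer and reader, which are not shown here.

```rust
use std::fmt::Debug;

/// Hypothetical helper: encode `data` into memory, decode it back, and assert equality.
fn round_trip<T, W, R>(data: &T, write: W, read: R)
where
    T: PartialEq + Debug,
    W: Fn(&T, &mut Vec<u8>),
    R: Fn(&[u8]) -> T,
{
    // Write into an in-memory buffer rather than a temporary file.
    let mut buf = Vec::new();
    write(data, &mut buf);
    // Read the bytes back and compare with the original value.
    let decoded = read(&buf);
    assert_eq!(&decoded, data, "round-trip mismatch");
}
```

Each test then only states its input data and which writer/reader pair it exercises, which keeps the intent visible.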

Contributor Author

Good suggestion for sure. I'll add this as a note and we can tackle it along with the in-memory recommendation from earlier.

nathaniel-d-ef (Contributor, Author)

Sticking with the downcast_ref pattern after all, for simplicity's sake: as_string_opt is generic and requires additional type handling.
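For context, as I understand it, the arrow crate's as_string_opt is generic over the string offset type (i32 for Utf8, i64 for LargeUtf8), so each call site has to name it, whereas downcast_ref targets one concrete type. The std-only sketch below mimics the downcast_ref pattern with std::any::Any; the Array trait and the array structs are hypothetical stand-ins, not arrow's real types.

```rust
use std::any::Any;

/// Hypothetical stand-in for an Arrow-style array trait object.
trait Array {
    fn as_any(&self) -> &dyn Any;
}

/// Stand-in for a concrete string array.
#[derive(Debug, PartialEq)]
struct StringArray(Vec<String>);

impl Array for StringArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Stand-in for a different concrete array type.
struct Int32Array(Vec<i32>);

impl Array for Int32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// downcast_ref pattern: one concrete target type, no offset generic to choose.
fn as_strings(array: &dyn Array) -> Option<&StringArray> {
    array.as_any().downcast_ref::<StringArray>()
}
```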

alamb merged commit d74d9ba into apache:main on Sep 17, 2025
23 checks passed
alamb (Contributor) commented on Sep 17, 2025

Thanks again @nathaniel-d-ef and @jecsand838
