@ByteBaker
Which issue does this PR close?

Closes #13037

Rationale for this change

DataFusion previously maintained custom implementations of record_batch! and create_array! macros. These macros are now available upstream in arrow-rs (added in apache/arrow-rs#6588), so we should use those instead to reduce code duplication and align with the Arrow ecosystem.

What changes are included in this PR?

  • Removed custom record_batch! and create_array! macro definitions from datafusion/common/src/test_util.rs
  • Re-exported the macros from arrow::array instead
  • Updated all 67 usages across 24 files from vec![...] syntax to array literal [...] syntax to match arrow-rs macro expectations
  • Added arrow_schema module aliases in test modules for macro compatibility
  • Replaced macro usage with manual RecordBatch::try_new() construction in cases where variables are passed (macros only support literal values)
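The last bullet is easier to see with a toy example. The sketch below uses two hypothetical `macro_rules!` macros (`literal_only!` and `any_expr!`); they are not the arrow-rs or DataFusion definitions and build plain `Vec`s rather than Arrow arrays. It assumes the upstream macros match on the `[...]` tokens at the call site, which should be checked against the arrow-rs source. Under that assumption, the individual elements can be arbitrary expressions, but a variable holding the whole collection cannot be passed.

```rust
// Hypothetical macros for illustration only; these are NOT the arrow-rs or
// DataFusion definitions and build plain Vecs rather than Arrow arrays.

/// Matches only a bracketed list of expressions written at the call site.
macro_rules! literal_only {
    ([$($value:expr),* $(,)?]) => {
        vec![$($value),*]
    };
}

/// Matches any single expression convertible into a Vec, including a variable.
macro_rules! any_expr {
    ($values:expr) => {
        Vec::from($values)
    };
}

/// Works: the brackets appear literally in the macro invocation.
fn build_from_literal() -> Vec<i32> {
    literal_only!([1, 2, 3])
}

/// Also works: each element is an expression, so variables are fine inside `[...]`.
fn build_from_elements(x: i32) -> Vec<i32> {
    literal_only!([x, x + 1])
}

/// A pre-built collection needs the expression-based form.
fn build_from_variable(values: Vec<i32>) -> Vec<i32> {
    // `literal_only!(values)` would be rejected at compile time, because the
    // identifier `values` does not match the `[...]` pattern.
    any_expr!(values)
}
```

Call sites that already hold a `Vec` in a variable fall into the third case, which is why they need either an expression-based macro or manual construction.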

Are these changes tested?

  • All existing tests pass (no new test failures introduced)
  • Verified with cargo test --lib across all modified packages
  • cargo clippy and cargo fmt checks pass on modified code

Are there any user-facing changes?

No user-facing changes. The macros maintain the same public API, just sourced from arrow-rs instead of DataFusion.

Removes DataFusion's custom `record_batch!` and `create_array!` macro
implementations in favor of the upstream versions from arrow-rs added in
apache/arrow-rs#6588.

Changes:
- Replace custom macro definitions with re-exports from arrow::array
- Update syntax from vec![...] to array literal [...] across 67 usages
- Add arrow_schema aliases in test modules for macro compatibility
- Replace macro usage with manual RecordBatch construction where
  variables are used (macros only support literals)

Closes apache#13037
@github-actions bot added labels on Oct 23, 2025: core (Core DataFusion crate), common (Related to common crate), datasource (Changes to the datasource crate), ffi (Changes to the ffi crate)
Comment on lines -32 to +44
- record_batch!(("a", Int32, a_vals), ("b", Float64, b_vals)).unwrap()
+ let schema = Arc::new(Schema::new(vec![
+     Field::new("a", DataType::Int32, true),
+     Field::new("b", DataType::Float64, true),
+ ]));
+
+ RecordBatch::try_new(
+     schema,
+     vec![
+         Arc::new(Int32Array::from(a_vals)),
+         Arc::new(Float64Array::from(b_vals)),
+     ],
+ )
+ .unwrap()
Contributor

So the old macro supported variables but the new ones don't?

Author

@ByteBaker ByteBaker Oct 25, 2025

Unfortunately, yes. And since I'm the one who wrote the other macro, I must acknowledge that I didn't think of such use cases.

On the flip side, the purpose of this PR is to sync DataFusion with upstream. Once it is merged, fixing the macro can be taken up as a follow-up.

Contributor

It might be better to keep the old macros in this case and only migrate the uses which can be replaced with upstream version; that way we can track where in the codebase to replace the old macros

Contributor

Agreed with @Jefffrey

Contributor

@Jefffrey Jefffrey left a comment

Looks like some CI failures to address

Development

Successfully merging this pull request may close these issues.

Remove record_batch! macro once upstream updates

3 participants