-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Use upstream arrow-rs record_batch! and create_array! macros #18252
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Removes DataFusion's custom `record_batch!` and `create_array!` macro implementations in favor of the upstream versions from arrow-rs added in apache/arrow-rs#6588. Changes: - Replace custom macro definitions with re-exports from arrow::array - Update syntax from vec![...] to array literal [...] across 67 usages - Add arrow_schema aliases in test modules for macro compatibility - Replace macro usage with manual RecordBatch construction where variables are used (macros only support literals) Closes apache#13037
| record_batch!(("a", Int32, a_vals), ("b", Float64, b_vals)).unwrap() | ||
| let schema = Arc::new(Schema::new(vec![ | ||
| Field::new("a", DataType::Int32, true), | ||
| Field::new("b", DataType::Float64, true), | ||
| ])); | ||
|
|
||
| RecordBatch::try_new( | ||
| schema, | ||
| vec![ | ||
| Arc::new(Int32Array::from(a_vals)), | ||
| Arc::new(Float64Array::from(b_vals)), | ||
| ], | ||
| ) | ||
| .unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the old macro supported variables but the new ones don't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, yes. And since I'm the one who wrote the other macro, I must acknowledge that I didn't think of such use cases.
On the flip side, the purpose of this PR is to sync datafusion w/ upstream. As we merge it now and proceed further, the task of fixing the macro could be taken up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to keep the old macros in this case and only migrate the uses which can be replaced with upstream version; that way we can track where in the codebase to replace the old macros
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with @Jefffrey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like some CI failures to address
| record_batch!(("a", Int32, a_vals), ("b", Float64, b_vals)).unwrap() | ||
| let schema = Arc::new(Schema::new(vec![ | ||
| Field::new("a", DataType::Int32, true), | ||
| Field::new("b", DataType::Float64, true), | ||
| ])); | ||
|
|
||
| RecordBatch::try_new( | ||
| schema, | ||
| vec![ | ||
| Arc::new(Int32Array::from(a_vals)), | ||
| Arc::new(Float64Array::from(b_vals)), | ||
| ], | ||
| ) | ||
| .unwrap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to keep the old macros in this case and only migrate the uses which can be replaced with upstream version; that way we can track where in the codebase to replace the old macros
Which issue does this PR close?
Closes #13037
Rationale for this change
DataFusion previously maintained custom implementations of
record_batch!andcreate_array!macros. These macros are now available upstream in arrow-rs (added in apache/arrow-rs#6588), so we should use those instead to reduce code duplication and align with the Arrow ecosystem.What changes are included in this PR?
record_batch!andcreate_array!macro definitions fromdatafusion/common/src/test_util.rsarrow::arrayinsteadvec![...]syntax to array literal[...]syntax to match arrow-rs macro expectationsarrow_schemamodule aliases in test modules for macro compatibilityRecordBatch::try_new()construction in cases where variables are passed (macros only support literal values)Are these changes tested?
cargo test --libacross all modified packagescargo clippyandcargo fmtchecks pass on modified codeAre there any user-facing changes?
No user-facing changes. The macros maintain the same public API, just sourced from arrow-rs instead of DataFusion.