feat: support multi-threaded writing of Parquet files with modular encryption #16738
Note: this includes some unrelated changes so as to be able to use changes in apache/arrow-rs#7818. @adamreeve @corwinjoy thoughts?
This approach looks good to me! Do the existing tests hit the parallel write code path? We probably want to make sure there are encryption tests for both the parallel and serial code paths.
Via copilot: Dependency Updates:
Feature Enhancements:
Code Cleanup:
Protobuf Updates:
    let mut current_rg_rows = 0;
    // TODO: row_group_writer should use the correct row group index. Currently this would fail if
    // multiple row groups were written.
    // let mut rg_index = 0;
@adamreeve Definitely something to note. We will want to resolve this before the final PR.
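The TODO in the snippet above amounts to keeping a running row group counter and bumping it each time a row group is closed. A minimal std-only sketch of that bookkeeping follows. `RowGroupPlan` and `plan_row_groups` are hypothetical names used for illustration; they are not part of DataFusion or arrow-rs, and the real writer tracks record batches and column writers rather than bare row counts.

```rust
/// Hypothetical stand-in for a finished row group (not a DataFusion or arrow-rs type).
#[derive(Debug, PartialEq)]
struct RowGroupPlan {
    rg_index: usize,
    num_rows: usize,
}

/// Split incoming batches (given only by their row counts) into row groups of
/// at most `max_row_group_rows` rows, giving every row group its own index.
fn plan_row_groups(batch_rows: &[usize], max_row_group_rows: usize) -> Vec<RowGroupPlan> {
    assert!(max_row_group_rows > 0, "row group size must be positive");
    let mut plans = Vec::new();
    let mut current_rg_rows = 0;
    let mut rg_index = 0;

    for &rows in batch_rows {
        let mut remaining = rows;
        while remaining > 0 {
            // Take as many rows as still fit into the current row group.
            let take = remaining.min(max_row_group_rows - current_rg_rows);
            current_rg_rows += take;
            remaining -= take;

            if current_rg_rows == max_row_group_rows {
                // The row group is full: close it and move on to the next index.
                plans.push(RowGroupPlan { rg_index, num_rows: current_rg_rows });
                rg_index += 1;
                current_rg_rows = 0;
            }
        }
    }

    // Flush a trailing, partially filled row group.
    if current_rg_rows > 0 {
        plans.push(RowGroupPlan { rg_index, num_rows: current_rg_rows });
    }
    plans
}
```

Under these assumptions, `plan_row_groups(&[600, 600, 300], 1000)` yields two row groups: index 0 with 1000 rows and index 1 with 500 rows. A writer that always passed index 0 would label both identically.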
Looks good to me, except that multi-row-group writing is missing. When you rebase onto the latest DataFusion, the diff should get a lot simpler, since they have upgraded to support arrow-55.3 and added the missing statistics and types. I also like @adamreeve's suggestion of having a test for the parallel write path.
        .map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
    Ok(file_metadata)
}
Copilot comments:
- Row Group Indexing: The TODO comment indicates that the row group writer currently assumes a single row group index (always 0). If multiple row groups are written in parallel, this might cause incorrect output or data corruption. Consider addressing this before merging.
- Encryption Parallelism: The removal of the explicit check for parallelism when encryption is enabled relies on arrow-rs’ internal handling. Double-check upstream to ensure there’s a clear error or fallback if parallelism is not supported with encryption.
- Testing: These changes affect core serialization logic. Ensure thorough integration and performance tests, especially for edge cases (encryption, large datasets, multiple row groups).
- Documentation: Consider updating related docs or comments, as the code flow and APIs used have changed significantly.
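One likely reason the row group index matters for encrypted files: as I read the Parquet modular encryption specification, each encrypted module is authenticated with an AAD whose suffix includes the row group and column ordinals. The sketch below is a simplified model of that suffix layout, not the arrow-rs implementation. `ModuleType` and `aad_suffix` are hypothetical names, and the byte layout is stated from the specification as an assumption rather than taken from this PR.

```rust
/// Subset of module types; the numeric values are assumed from the Parquet
/// modular encryption specification.
#[derive(Clone, Copy)]
enum ModuleType {
    ColumnMetaData = 1,
    DataPage = 2,
}

/// Build a simplified AAD suffix: file identifier, module type byte, then the
/// row group and column ordinals as 2-byte little-endian values, followed by
/// the page ordinal for page-level modules.
fn aad_suffix(
    file_unique: &[u8],
    module: ModuleType,
    row_group: u16,
    column: u16,
    page: Option<u16>,
) -> Vec<u8> {
    let mut aad = Vec::with_capacity(file_unique.len() + 7);
    aad.extend_from_slice(file_unique);
    aad.push(module as u8);
    aad.extend_from_slice(&row_group.to_le_bytes());
    aad.extend_from_slice(&column.to_le_bytes());
    if let Some(p) = page {
        aad.extend_from_slice(&p.to_le_bytes());
    }
    aad
}
```

In this model, two pages that differ only in their row group ordinal get different AADs. If a writer encrypted every row group with ordinal 0, a reader deriving the AAD from the real position would be expected to fail authentication for every row group after the first.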
I feel these were mostly addressed or the code was significantly refactored so they might not apply anymore.
…uet (#8162)

- Closes #8115.
- Closes #8260
- Closes #8259

# Rationale for this change

#8029 introduced `pub ArrowWriter.get_column_writers` and `pub ArrowWriter.append_row_group` to enable multi-threaded parquet encrypted writing. However testing downstream showed the API is not feasible, see #8115.

# What changes are included in this PR?

This introduces `pub ArrowWriter.into_serialized_writer` and deprecates `pub ArrowWriter.get_column_writers` and `pub ArrowWriter.append_row_group`. It also makes `ArrowRowGroupWriterFactory` public and adds a `pub ArrowRowGroupWriterFactory.create_column_writers`.

# Are these changes tested?

This includes a DataFusion inspired test for concurrent writing across columns and row groups to make sure parallel writing is and remains possible with `ArrowWriter`s API. Further we created a draft PR in DataFusion apache/datafusion#16738 to test for multithreaded writing support.

# Are there any user-facing changes?

See description of changes.

---------

Co-authored-by: Adam Reeve <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
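The concurrency test described in the commit message checks that column chunks can be encoded on separate threads, across columns and row groups, and still be assembled in order. The toy model below illustrates that property using only the standard library. `encode_chunk`, `write_serial` and `write_parallel` are hypothetical helpers with a made-up byte encoding; they do not call `ArrowWriter`, `ArrowRowGroupWriterFactory` or any other parquet API.

```rust
use std::thread;

/// Toy "column chunk" encoding: a (row group, column) tag followed by the
/// values as little-endian bytes. This is not the Parquet format.
fn encode_chunk(rg_index: usize, col_index: usize, values: &[i64]) -> Vec<u8> {
    let mut out = vec![rg_index as u8, col_index as u8];
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Serial path: encode every chunk in (row group, column) order.
/// `row_groups[rg][col]` holds the values of one column chunk.
fn write_serial(row_groups: &[Vec<Vec<i64>>]) -> Vec<u8> {
    let mut file = Vec::new();
    for (rg, cols) in row_groups.iter().enumerate() {
        for (col, values) in cols.iter().enumerate() {
            file.extend(encode_chunk(rg, col, values));
        }
    }
    file
}

/// Parallel path: one scoped thread per column chunk, each told which row
/// group and column it is encoding.
fn write_parallel(row_groups: &[Vec<Vec<i64>>]) -> Vec<u8> {
    thread::scope(|s| {
        let mut handles = Vec::new();
        for (rg, cols) in row_groups.iter().enumerate() {
            for (col, values) in cols.iter().enumerate() {
                handles.push(s.spawn(move || encode_chunk(rg, col, values)));
            }
        }
        // Join in spawn order so chunks land in (row group, column) order
        // regardless of which thread finishes first.
        let mut file = Vec::new();
        for handle in handles {
            file.extend(handle.join().expect("encoder thread panicked"));
        }
        file
    })
}
```

A test in the spirit of the reviewers' request would assert that `write_serial` and `write_parallel` return identical bytes for the same input, so that both code paths stay covered.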
With arrow/parquet bumped to |
cc @corwinjoy |
I'll review this shortly. Thanks @rok and @adamreeve
diff --git c/Cargo.lock i/Cargo.lock index 7499715..f0b9d0a 100644 --- c/Cargo.lock +++ i/Cargo.lock @@ -246,52 +246,62 @@ checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50" [[package]] name = "arrow" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "fd798aea3553913a5986813e9c6ad31a2d2b04e931fe8ea4a37155eb541cebb5" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-arith", - "arrow-array", - "arrow-buffer", - "arrow-cast", + "arrow-arith 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "arrow-csv", - "arrow-data", - "arrow-ipc", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "arrow-json", - "arrow-ord", + "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "arrow-pyarrow", - "arrow-row", - "arrow-schema", - "arrow-select", - "arrow-string", + "arrow-row 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-string 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "half", "rand 0.9.2", ] [[package]] name = 
"arrow-arith" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "508dafb53e5804a238cab7fd97a59ddcbfab20cc4d9814b1ab5465b9fa147f2e" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "chrono", + "num", +] + +[[package]] +name = "arrow-arith" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "chrono", "num", ] [[package]] name = "arrow-array" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "e2730bc045d62bb2e53ef8395b7d4242f5c8102f41ceac15e8395b9ac3d08461" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ "ahash 0.8.12", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 
(git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "chrono", "chrono-tz", "half", @@ -299,11 +309,35 @@ dependencies = [ "num", ] +[[package]] +name = "arrow-array" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "ahash 0.8.12", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "chrono", + "half", + "hashbrown 0.15.4", + "num", +] + [[package]] name = "arrow-buffer" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "54295b93beb702ee9a6f6fbced08ad7f4d76ec1c297952d4b83cf68755421d1d" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" +dependencies = [ + "bytes", + "half", + "num", +] + +[[package]] +name = "arrow-buffer" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" dependencies = [ "bytes", "half", @@ -312,15 +346,14 @@ dependencies = [ [[package]] name = "arrow-cast" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "67e8bcb7dc971d779a7280593a1bf0c2743533b8028909073e804552e85e75b5" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", - "arrow-select", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 
(git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "atoi", "base64 0.22.1", "chrono", @@ -332,14 +365,32 @@ dependencies = [ ] [[package]] -name = "arrow-csv" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "673fd2b5fb57a1754fdbfac425efd7cf54c947ac9950c1cce86b14e248f1c458" +name = "arrow-cast" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" dependencies = [ - "arrow-array", - "arrow-cast", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "atoi", + "base64 0.22.1", + "chrono", + "half", + "lexical-core", + "num", + "ryu", +] + +[[package]] +name = "arrow-csv" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "chrono", "csv", "csv-core", @@ -348,33 +399,42 @@ dependencies = [ [[package]] name = "arrow-data" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "97c22fe3da840039c69e9f61f81e78092ea36d57037b4900151f063615a2f6b4" +version = 
"55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-buffer", - "arrow-schema", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "half", + "num", +] + +[[package]] +name = "arrow-data" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "half", "num", ] [[package]] name = "arrow-flight" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "6808d235786b721e49e228c44dd94242f2e8b46b7e95b233b0733c46e758bfee" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" dependencies = [ - "arrow-arith", - "arrow-array", - "arrow-buffer", - "arrow-cast", - "arrow-data", - "arrow-ipc", - "arrow-ord", - "arrow-row", - "arrow-schema", - "arrow-select", - "arrow-string", + "arrow-arith 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-row 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-string 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "base64 0.22.1", 
"bytes", "futures", @@ -382,35 +442,45 @@ dependencies = [ "paste", "prost", "prost-types", - "tonic", + "tonic 0.12.3", ] [[package]] name = "arrow-ipc" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "778de14c5a69aedb27359e3dd06dd5f9c481d5f6ee9fbae912dba332fd64636b" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "flatbuffers", "lz4_flex", "zstd", ] [[package]] -name = "arrow-json" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3860db334fe7b19fcf81f6b56f8d9d95053f3839ffe443d56b5436f7a29a1794" +name = "arrow-ipc" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-cast", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "flatbuffers", +] + +[[package]] +name = "arrow-json" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" +dependencies = [ + "arrow-array 55.2.0 
(git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "chrono", "half", "indexmap 2.10.0", @@ -424,78 +494,130 @@ dependencies = [ [[package]] name = "arrow-ord" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "425fa0b42a39d3ff55160832e7c25553e7f012c3f187def3d70313e7a29ba5d9" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", - "arrow-select", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", +] + +[[package]] +name = "arrow-ord" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 
(git+https://github.com/rok/arrow-rs.git)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)", ] [[package]] name = "arrow-pyarrow" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d944d8ae9b77230124e6570865b570416c33a5809f32c4136c679bbe774e45c9" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "pyo3", ] [[package]] name = "arrow-row" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "df9c9423c9e71abd1b08a7f788fcd203ba2698ac8e72a1f236f1faa1a06a7414" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "half", +] + +[[package]] +name = "arrow-row" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 
(git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "half", ] [[package]] name = "arrow-schema" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "85fa1babc4a45fdc64a92175ef51ff00eba5ebbc0007962fecf8022ac1c6ce28" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ "bitflags 2.9.1", "serde", "serde_json", ] +[[package]] +name = "arrow-schema" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" + [[package]] name = "arrow-select" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d8854d15f1cf5005b4b358abeb60adea17091ff5bdd094dca5d3f73787d81170" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ "ahash 0.8.12", - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "num", +] + +[[package]] +name = "arrow-select" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "ahash 0.8.12", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 
(git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "num", ] [[package]] name = "arrow-string" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "2c477e8b89e1213d5927a2a84a72c384a9bf4dd0dbf15f9fd66d821aafd9e95e" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", - "arrow-select", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "memchr", + "num", + "regex", + "regex-syntax", +] + +[[package]] +name = "arrow-string" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "memchr", "num", "regex", @@ -567,6 +689,28 @@ dependencies = [ "syn 2.0.106", ] +[[package]] +name = "async-stream" +version = "0.3.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0b5a71a6f37880a80d1d7f19efd781e4b5de42c88f0722cc13bcb6cc2cfe8476" +dependencies = [ + "async-stream-impl", + 
"futures-core", + "pin-project-lite", +] + +[[package]] +name = "async-stream-impl" +version = "0.3.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c7c24de15d275a1ecfd47a380fb4d5ec9bfe0933f309ed5e705b775596a3574d" +dependencies = [ + "proc-macro2", + "quote", + "syn 2.0.104", +] + [[package]] name = "async-trait" version = "0.1.89" @@ -827,7 +971,7 @@ dependencies = [ "rustls-native-certs", "rustls-pki-types", "tokio", - "tower", + "tower 0.5.2", "tracing", ] @@ -948,18 +1092,19 @@ dependencies = [ [[package]] name = "axum" -version = "0.8.4" +version = "0.7.9" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "021e862c184ae977658b36c4500f7feac3221ca5da43e3f25bd04ab6c79a29b5" +checksum = "edca88bc138befd0323b20752846e6587272d3b03b0343c8ea28a6f819e6e71f" dependencies = [ - "axum-core", + "async-trait", + "axum-core 0.4.5", "bytes", "futures-util", "http 1.3.1", "http-body 1.0.1", "http-body-util", "itoa", - "matchit", + "matchit 0.7.3", "memchr", "mime", "percent-encoding", @@ -967,7 +1112,53 @@ dependencies = [ "rustversion", "serde", "sync_wrapper", - "tower", + "tower 0.5.2", + "tower-layer", + "tower-service", +] + +[[package]] +name = "axum" +version = "0.8.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "021e862c184ae977658b36c4500f7feac3221ca5da43e3f25bd04ab6c79a29b5" +dependencies = [ + "axum-core 0.5.2", + "bytes", + "futures-util", + "http 1.3.1", + "http-body 1.0.1", + "http-body-util", + "itoa", + "matchit 0.8.4", + "memchr", + "mime", + "percent-encoding", + "pin-project-lite", + "rustversion", + "serde", + "sync_wrapper", + "tower 0.5.2", + "tower-layer", + "tower-service", +] + +[[package]] +name = "axum-core" +version = "0.4.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "09f2bd6146b97ae3359fa0cc6d6b376d9539582c7b4220f041a33ec24c226199" +dependencies = [ + "async-trait", + "bytes", + "futures-util", + "http 1.3.1", + 
"http-body 1.0.1", + "http-body-util", + "mime", + "pin-project-lite", + "rustversion", + "sync_wrapper", "tower-layer", "tower-service", ] @@ -1818,8 +2009,8 @@ name = "datafusion" version = "49.0.1" dependencies = [ "arrow", - "arrow-ipc", - "arrow-schema", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "async-trait", "bytes", "bzip2 0.6.0", @@ -1996,7 +2187,7 @@ dependencies = [ "ahash 0.8.12", "apache-avro", "arrow", - "arrow-ipc", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "base64 0.22.1", "chrono", "half", @@ -2176,7 +2367,7 @@ version = "49.0.1" dependencies = [ "arrow", "arrow-flight", - "arrow-schema", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "async-trait", "base64 0.22.1", "bytes", @@ -2197,7 +2388,7 @@ dependencies = [ "tempfile", "test-utils", "tokio", - "tonic", + "tonic 0.13.1", "tracing", "tracing-subscriber", "url", @@ -2264,7 +2455,7 @@ version = "49.0.1" dependencies = [ "abi_stable", "arrow", - "arrow-schema", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "async-ffi", "async-trait", "datafusion", @@ -2284,7 +2475,7 @@ name = "datafusion-functions" version = "49.0.1" dependencies = [ "arrow", - "arrow-buffer", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "base64 0.22.1", "blake2", "blake3", @@ -2347,7 +2538,7 @@ name = "datafusion-functions-nested" version = "49.0.1" dependencies = [ "arrow", - "arrow-ord", + "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "criterion", "datafusion-common", "datafusion-doc", @@ -2517,8 +2708,8 @@ version = "49.0.1" dependencies = [ "ahash 0.8.12", "arrow", 
- "arrow-ord", - "arrow-schema", + "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "async-trait", "chrono", "criterion", @@ -2589,7 +2780,7 @@ name = "datafusion-pruning" version = "49.0.1" dependencies = [ "arrow", - "arrow-schema", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "datafusion-common", "datafusion-datasource", "datafusion-expr", @@ -4157,6 +4348,12 @@ dependencies = [ "pkg-config", ] +[[package]] +name = "matchit" +version = "0.7.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0e7465ac9959cc2b1404e8e2367b43684a6d13790fe23056cc8c6c5a6b7bcb94" + [[package]] name = "matchit" version = "0.8.4" @@ -4529,18 +4726,17 @@ dependencies = [ [[package]] name = "parquet" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "c7288a07ed5d25939a90f9cb1ca5afa6855faa08ec7700613511ae64bdb0620c" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ "ahash 0.8.12", - "arrow-array", - "arrow-buffer", - "arrow-cast", - "arrow-data", - "arrow-ipc", - "arrow-schema", - "arrow-select", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 
(git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
  "base64 0.22.1",
  "brotli",
  "bytes",
@@ -5449,7 +5645,7 @@ dependencies = [
  "tokio",
  "tokio-rustls",
  "tokio-util",
- "tower",
+ "tower 0.5.2",
  "tower-http",
  "tower-service",
  "url",
@@ -6681,12 +6877,13 @@ dependencies = [
 
 [[package]]
 name = "tonic"
-version = "0.13.1"
+version = "0.12.3"
 source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "7e581ba15a835f4d9ea06c55ab1bd4dce26fc53752c69a04aac00703bfb49ba9"
+checksum = "877c5b330756d856ffcc4553ab34a5684481ade925ecc54bcd1bf02b1d0d4d52"
 dependencies = [
+ "async-stream",
  "async-trait",
- "axum",
+ "axum 0.7.9",
  "base64 0.22.1",
  "bytes",
  "h2",
@@ -6702,7 +6899,56 @@ dependencies = [
  "socket2 0.5.10",
  "tokio",
  "tokio-stream",
- "tower",
+ "tower 0.4.13",
+ "tower-layer",
+ "tower-service",
+ "tracing",
+]
+
+[[package]]
+name = "tonic"
+version = "0.13.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7e581ba15a835f4d9ea06c55ab1bd4dce26fc53752c69a04aac00703bfb49ba9"
+dependencies = [
+ "async-trait",
+ "axum 0.8.4",
+ "base64 0.22.1",
+ "bytes",
+ "h2",
+ "http 1.3.1",
+ "http-body 1.0.1",
+ "http-body-util",
+ "hyper",
+ "hyper-timeout",
+ "hyper-util",
+ "percent-encoding",
+ "pin-project",
+ "prost",
+ "socket2 0.5.10",
+ "tokio",
+ "tokio-stream",
+ "tower 0.5.2",
+ "tower-layer",
+ "tower-service",
+ "tracing",
+]
+
+[[package]]
+name = "tower"
+version = "0.4.13"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "b8fa9be0de6cf49e536ce1851f987bd21a43b771b09473c3549a6c853db37c1c"
+dependencies = [
+ "futures-core",
+ "futures-util",
+ "indexmap 1.9.3",
+ "pin-project",
+ "pin-project-lite",
+ "rand 0.8.5",
+ "slab",
+ "tokio",
+ "tokio-util",
  "tower-layer",
  "tower-service",
  "tracing",
@@ -6740,7 +6986,7 @@ dependencies = [
  "http-body 1.0.1",
  "iri-string",
  "pin-project-lite",
- "tower",
+ "tower 0.5.2",
  "tower-layer",
  "tower-service",
 ]
diff --git c/Cargo.toml i/Cargo.toml
index 5915035..5ee3cc5 100644
--- c/Cargo.toml
+++ i/Cargo.toml
@@ -90,19 +90,20 @@ ahash = { version = "0.8", default-features = false, features = [
     "runtime-rng",
 ] }
 apache-avro = { version = "0.17", default-features = false }
-arrow = { version = "56.0.0", features = [
+arrow = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing", features = [
     "prettyprint",
     "chrono-tz",
 ] }
-arrow-buffer = { version = "56.0.0", default-features = false }
-arrow-flight = { version = "56.0.0", features = [
+
+arrow-buffer = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing", default-features = false }
+arrow-flight = { git = "https://github.com/rok/arrow-rs.git", features = [
     "flight-sql-experimental",
 ] }
-arrow-ipc = { version = "56.0.0", default-features = false, features = [
+arrow-ipc = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing", default-features = false, features = [
     "lz4",
 ] }
-arrow-ord = { version = "56.0.0", default-features = false }
-arrow-schema = { version = "56.0.0", default-features = false }
+arrow-ord = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing", default-features = false }
+arrow-schema = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing", default-features = false }
 async-trait = "0.1.89"
 bigdecimal = "0.4.8"
 bytes = "1.10"
@@ -157,7 +158,7 @@ itertools = "0.14"
 log = "^0.4"
 object_store = { version = "0.12.3", default-features = false }
 parking_lot = "0.12"
-parquet = { version = "56.0.0", default-features = false, features = [
+parquet = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing", default-features = false, features = [
     "arrow",
     "async",
     "object_store",
diff --git c/datafusion-examples/Cargo.toml i/datafusion-examples/Cargo.toml
index f12bd92..b4c8d35 100644
--- c/datafusion-examples/Cargo.toml
+++ i/datafusion-examples/Cargo.toml
@@ -32,18 +32,6 @@ rust-version = { workspace = true }
 [lints]
 workspace = true
 
-[[example]]
-name = "flight_sql_server"
-path = "examples/flight/flight_sql_server.rs"
-
-[[example]]
-name = "flight_server"
-path = "examples/flight/flight_server.rs"
-
-[[example]]
-name = "flight_client"
-path = "examples/flight/flight_client.rs"
-
 [[example]]
 name = "dataframe_to_s3"
 path = "examples/external_dependency/dataframe-to-s3.rs"
diff --git c/datafusion/common/Cargo.toml i/datafusion/common/Cargo.toml
index afd74c7..8040b3a 100644
--- c/datafusion/common/Cargo.toml
+++ i/datafusion/common/Cargo.toml
@@ -71,7 +71,7 @@ log = { workspace = true }
 object_store = { workspace = true, optional = true }
 parquet = { workspace = true, optional = true, default-features = true }
 paste = "1.0.15"
-pyo3 = { version = "0.25", optional = true }
+pyo3 = { version = "0.25.1", optional = true }
 recursive = { workspace = true, optional = true }
 sqlparser = { workspace = true }
 tokio = { workspace = true }
diff --git c/datafusion/common/src/file_options/parquet_writer.rs i/datafusion/common/src/file_options/parquet_writer.rs
index 185826a..d7b490a 100644
--- c/datafusion/common/src/file_options/parquet_writer.rs
+++ i/datafusion/common/src/file_options/parquet_writer.rs
@@ -25,6 +25,8 @@ use crate::{
     DataFusionError, Result, _internal_datafusion_err,
 };
 
+pub const DEFAULT_MAX_STATISTICS_SIZE: usize = 4096;
+
 use arrow::datatypes::Schema;
 // TODO: handle once deprecated
 #[allow(deprecated)]
diff --git c/datafusion/common/src/scalar/mod.rs i/datafusion/common/src/scalar/mod.rs
index 5124761..8f8c520 100644
--- c/datafusion/common/src/scalar/mod.rs
+++ i/datafusion/common/src/scalar/mod.rs
@@ -2386,7 +2386,9 @@ impl ScalarValue {
             | DataType::Time64(TimeUnit::Millisecond)
             | DataType::RunEndEncoded(_, _)
             | DataType::ListView(_)
-
| DataType::LargeListView(_) => { + | DataType::LargeListView(_) + | DataType::Decimal32(_, _) + | DataType::Decimal64(_, _) => { return _not_impl_err!( "Unsupported creation of {:?} array from ScalarValue {:?}", data_type, diff --git c/datafusion/core/src/dataframe/parquet.rs i/datafusion/core/src/dataframe/parquet.rs index 83bb601..01149c1 100644 --- c/datafusion/core/src/dataframe/parquet.rs +++ i/datafusion/core/src/dataframe/parquet.rs @@ -278,6 +278,7 @@ mod tests { // Write encrypted parquet using write_parquet let mut options = TableParquetOptions::default(); options.crypto.file_encryption = Some((&encrypt).into()); + options.global.allow_single_file_parallelism = true; df.write_parquet( tempfile_str.as_str(), diff --git c/datafusion/core/tests/fuzz_cases/pruning.rs i/datafusion/core/tests/fuzz_cases/pruning.rs index c6e30c0..4ab1f08 100644 --- c/datafusion/core/tests/fuzz_cases/pruning.rs +++ i/datafusion/core/tests/fuzz_cases/pruning.rs @@ -314,7 +314,7 @@ async fn execute_with_predicate( } async fn write_parquet_file( - truncation_length: Option<usize>, + _truncation_length: Option<usize>, schema: Arc<Schema>, row_groups: Vec<Vec<String>>, ) -> Bytes { diff --git c/datafusion/datasource-avro/src/avro_to_arrow/schema.rs i/datafusion/datasource-avro/src/avro_to_arrow/schema.rs index cc87d3c..00b3f9d 100644 --- c/datafusion/datasource-avro/src/avro_to_arrow/schema.rs +++ i/datafusion/datasource-avro/src/avro_to_arrow/schema.rs @@ -239,6 +239,8 @@ fn default_field_name(dt: &DataType) -> &str { DataType::Decimal64(_, _) => "decimal", DataType::Decimal128(_, _) => "decimal", DataType::Decimal256(_, _) => "decimal", + DataType::Decimal32(_, _) => "decimal", + DataType::Decimal64(_, _) => "decimal", } } diff --git c/datafusion/datasource-parquet/src/file_format.rs i/datafusion/datasource-parquet/src/file_format.rs index 5671853..934a7b2 100644 --- c/datafusion/datasource-parquet/src/file_format.rs +++ i/datafusion/datasource-parquet/src/file_format.rs @@ -78,8 
+78,8 @@ use object_store::path::Path; use object_store::{ObjectMeta, ObjectStore}; use parquet::arrow::arrow_reader::statistics::StatisticsConverter; use parquet::arrow::arrow_writer::{ - compute_leaves, get_column_writers, ArrowColumnChunk, ArrowColumnWriter, - ArrowLeafColumn, ArrowWriterOptions, + compute_leaves, ArrowColumnChunk, ArrowColumnWriter, ArrowLeafColumn, + ArrowRowGroupWriterFactory, ArrowWriterOptions, }; use parquet::arrow::async_reader::MetadataFetch; use parquet::arrow::{parquet_to_arrow_schema, ArrowSchemaConverter, AsyncArrowWriter}; @@ -1570,7 +1570,7 @@ impl FileSink for ParquetSink { while let Some((path, mut rx)) = file_stream_rx.recv().await { let parquet_props = self.create_writer_props(&runtime, &path)?; - if !allow_single_file_parallelism { + if !parquet_opts.global.allow_single_file_parallelism { let mut writer = self .create_async_arrow_writer( &path, @@ -1698,13 +1698,13 @@ type ColSender = Sender<ArrowLeafColumn>; /// Returns join handles for each columns serialization task along with a send channel /// to send arrow arrays to each serialization task. fn spawn_column_parallel_row_group_writer( - schema: Arc<Schema>, - parquet_props: Arc<WriterProperties>, + arrow_row_group_writer_factory: Arc<ArrowRowGroupWriterFactory>, max_buffer_size: usize, pool: &Arc<dyn MemoryPool>, ) -> Result<(Vec<ColumnWriterTask>, Vec<ColSender>)> { - let schema_desc = ArrowSchemaConverter::new().convert(&schema)?; - let col_writers = get_column_writers(&schema_desc, &parquet_props, &schema)?; + let arrow_row_group_writer = + arrow_row_group_writer_factory.create_row_group_writer(0)?; + let col_writers = arrow_row_group_writer.into_column_writers(); let num_columns = col_writers.len(); let mut col_writer_tasks = Vec::with_capacity(num_columns); @@ -1799,6 +1799,7 @@ fn spawn_rg_join_and_finalize_task( /// across both columns and row_groups, with a theoretical max number of parallel tasks /// given by n_columns * num_row_groups. 
fn spawn_parquet_parallel_serialization_task(
+ arrow_row_group_writer_factory: Arc<ArrowRowGroupWriterFactory>,
mut data: Receiver<RecordBatch>,
serialize_tx: Sender<SpawnedTask<RBStreamSerializeResult>>,
schema: Arc<Schema>,
@@ -1811,12 +1812,14 @@ fn spawn_parquet_parallel_serialization_task(
let max_row_group_rows = writer_props.max_row_group_size();
let (mut column_writer_handles, mut col_array_channels) =
spawn_column_parallel_row_group_writer(
- Arc::clone(&schema),
- Arc::clone(&writer_props),
+ Arc::clone(&arrow_row_group_writer_factory),
max_buffer_rb,
&pool,
)?;
let mut current_rg_rows = 0;
+ // TODO: row_group_writer should use the correct row group index. Currently this would fail if
+ // multiple row groups were written.
+ // let mut rg_index = 0;
while let Some(mut rb) = data.recv().await {
// This loop allows the "else" block to repeatedly split the RecordBatch to handle the case
@@ -1863,8 +1866,7 @@ fn spawn_parquet_parallel_serialization_task(
(column_writer_handles, col_array_channels) =
spawn_column_parallel_row_group_writer(
- Arc::clone(&schema),
- Arc::clone(&writer_props),
+ Arc::clone(&arrow_row_group_writer_factory),
max_buffer_rb,
&pool,
)?;
@@ -1895,24 +1897,15 @@ fn spawn_parquet_parallel_serialization_task(
/// Consume RowGroups serialized by other parallel tasks and concatenate them in
/// to the final parquet file, while flushing finalized bytes to an [ObjectStore]
async fn concatenate_parallel_row_groups(
+ mut parquet_writer: SerializedFileWriter<SharedBuffer>,
+ merged_buff: SharedBuffer,
mut serialize_rx: Receiver<SpawnedTask<RBStreamSerializeResult>>,
- schema: Arc<Schema>,
- writer_props: Arc<WriterProperties>,
mut object_store_writer: Box<dyn AsyncWrite + Send + Unpin>,
pool: Arc<dyn MemoryPool>,
) -> Result<FileMetaData> {
- let merged_buff = SharedBuffer::new(INITIAL_BUFFER_BYTES);
-
let mut file_reservation =
MemoryConsumer::new("ParquetSink(SerializedFileWriter)").register(&pool);
- let schema_desc = ArrowSchemaConverter::new().convert(schema.as_ref())?;
- let mut parquet_writer = SerializedFileWriter::new(
- merged_buff.clone(),
- schema_desc.root_schema_ptr(),
- writer_props,
- )?;
-
while let Some(task) = serialize_rx.recv().await {
let result = task.join_unwind().await;
let mut rg_out = parquet_writer.next_row_group()?;
@@ -1963,8 +1956,25 @@ async fn output_single_parquet_file_parallelized(
let (serialize_tx, serialize_rx) =
mpsc::channel::<SpawnedTask<RBStreamSerializeResult>>(max_rowgroups);
+ let parquet_schema = ArrowSchemaConverter::new()
+ .with_coerce_types(parquet_props.coerce_types())
+ .convert(&output_schema)?;
+ let merged_buff = SharedBuffer::new(INITIAL_BUFFER_BYTES);
+ let parquet_writer = SerializedFileWriter::new(
+ merged_buff.clone(),
+ parquet_schema.root_schema_ptr(),
+ parquet_props.clone().into(),
+ )?;
+ let arrow_row_group_writer_factory = ArrowRowGroupWriterFactory::new(
+ &parquet_writer,
+ parquet_schema,
+ Arc::clone(&output_schema),
+ parquet_props.clone().into(),
+ );
+
let arc_props = Arc::new(parquet_props.clone());
let launch_serialization_task = spawn_parquet_parallel_serialization_task(
+ Arc::new(arrow_row_group_writer_factory),
data,
serialize_tx,
Arc::clone(&output_schema),
@@ -1972,19 +1982,21 @@ async fn output_single_parquet_file_parallelized(
parallel_options,
Arc::clone(&pool),
);
- let file_metadata = concatenate_parallel_row_groups(
- serialize_rx,
- Arc::clone(&output_schema),
- Arc::clone(&arc_props),
- object_store_writer,
- pool,
- )
- .await?;
launch_serialization_task
.join_unwind()
.await
.map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
+
+ let file_metadata = concatenate_parallel_row_groups(
+ parquet_writer,
+ merged_buff,
+ serialize_rx,
+ object_store_writer,
+ pool,
+ )
+ .await?;
+
Ok(file_metadata)
}
diff --git c/datafusion/expr/src/utils.rs i/datafusion/expr/src/utils.rs
index 7a612b6..cd8e419 100644
--- c/datafusion/expr/src/utils.rs
+++ i/datafusion/expr/src/utils.rs
@@ -818,6 +818,8 @@ pub fn can_hash(data_type: &DataType) -> bool {
DataType::Decimal64(_, _) => true,
DataType::Decimal128(_, _) => true,
DataType::Decimal256(_, _) => true,
+ DataType::Decimal32(_, _) => true,
+ DataType::Decimal64(_, _) => true,
DataType::Timestamp(_, _) => true,
DataType::Utf8 => true,
DataType::LargeUtf8 => true,
diff --git c/datafusion/sql/src/unparser/expr.rs i/datafusion/sql/src/unparser/expr.rs
index 0501a4e..86c648c 100644
--- c/datafusion/sql/src/unparser/expr.rs
+++ i/datafusion/sql/src/unparser/expr.rs
@@ -1729,7 +1729,9 @@ impl Unparser<'_> {
not_impl_err!("Unsupported DataType: conversion: {data_type:?}")
}
DataType::Decimal128(precision, scale)
- | DataType::Decimal256(precision, scale) => {
+ | DataType::Decimal256(precision, scale)
+ | DataType::Decimal32(precision, scale)
+ | DataType::Decimal64(precision, scale) => {
let mut new_precision = *precision as u64;
let mut new_scale = *scale as u64;
if *scale < 0 {
diff --git c/datafusion/sqllogictest/test_files/copy.slt i/datafusion/sqllogictest/test_files/copy.slt
index 096cde8..e16fcfe 100644
--- c/datafusion/sqllogictest/test_files/copy.slt
+++ i/datafusion/sqllogictest/test_files/copy.slt
@@ -306,7 +306,7 @@ select * from validate_struct_with_array;
# Copy parquet with all supported statement overrides
-query I
+query error DataFusion error: Invalid or Unsupported Configuration: Config value "max_statistics_size" not found on ParquetOptions
COPY source_table
TO 'test_files/scratch/copy/table_with_options/'
STORED AS PARQUET
@@ -336,8 +336,6 @@ OPTIONS (
'format.bloom_filter_ndv' 100,
'format.metadata::key' 'value'
)
-----
-2
# valid vs invalid metadata
@@ -404,11 +402,8 @@ OPTIONS (
statement ok
CREATE EXTERNAL TABLE validate_parquet_with_options STORED AS PARQUET LOCATION 'test_files/scratch/copy/table_with_options/';
-query IT
+statement count 0
select * from validate_parquet_with_options;
-----
-1 Foo
-2 Bar
# Copy from table to single file
query I
diff --git c/datafusion/substrait/src/logical_plan/consumer/utils.rs i/datafusion/substrait/src/logical_plan/consumer/utils.rs
index f7eedcb..f809bc8 100644
--- c/datafusion/substrait/src/logical_plan/consumer/utils.rs
+++ i/datafusion/substrait/src/logical_plan/consumer/utils.rs
@@ -216,7 +216,9 @@ pub fn rename_data_type(
| DataType::Decimal32(_, _)
| DataType::Decimal64(_, _)
| DataType::Decimal128(_, _)
- | DataType::Decimal256(_, _) => Ok(data_type.clone()),
+ | DataType::Decimal256(_, _)
+ | DataType::Decimal32(_, _)
+ | DataType::Decimal64(_, _) => Ok(data_type.clone()),
}
}
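Note on the row group index TODO in file_format.rs: the diff above always calls create_row_group_writer(0), and the TODO says this would fail once more than one row group is written. The likely reason is that Parquet modular encryption includes the row group ordinal in each module's AAD, so every row group needs its own index. The following is only a minimal sketch of threading an incrementing index through row group creation, using the standard library alone. RowGroupWriterFactory, RowGroupWriter and serialize_row_groups here are hypothetical stand-ins, not the parquet crate's ArrowRowGroupWriterFactory or DataFusion's actual code.

```rust
use std::thread;

/// Hypothetical stand-in for a per-row-group writer (not the parquet crate type).
struct RowGroupWriter {
    /// Ordinal of the row group this writer belongs to.
    index: usize,
}

/// Hypothetical stand-in for a factory that hands out row group writers.
struct RowGroupWriterFactory;

impl RowGroupWriterFactory {
    /// Assumption: the caller is responsible for passing the correct ordinal.
    fn create_row_group_writer(&self, index: usize) -> RowGroupWriter {
        RowGroupWriter { index }
    }
}

/// Splits `rows` into row groups of at most `max_row_group_rows` rows,
/// serializes each group on its own thread, and returns
/// (row_group_index, row_count) pairs in file order.
fn serialize_row_groups(
    factory: &RowGroupWriterFactory,
    rows: &[u64],
    max_row_group_rows: usize,
) -> Vec<(usize, usize)> {
    let mut rg_index = 0;
    let mut handles = Vec::new();
    // `chunks` panics on a zero size, so clamp to at least one row per group.
    for chunk in rows.chunks(max_row_group_rows.max(1)) {
        // Each row group gets a writer created with its own ordinal.
        let writer = factory.create_row_group_writer(rg_index);
        let chunk = chunk.to_vec();
        handles.push(thread::spawn(move || (writer.index, chunk.len())));
        rg_index += 1;
    }
    // Joining in spawn order keeps the row groups in file order.
    handles
        .into_iter()
        .map(|h| h.join().expect("row group task panicked"))
        .collect()
}

fn main() {
    let factory = RowGroupWriterFactory;
    let rows: Vec<u64> = (0..10).collect();
    let row_groups = serialize_row_groups(&factory, &rows, 4);
    assert_eq!(row_groups, vec![(0, 4), (1, 4), (2, 2)]);
    println!("{:?}", row_groups);
}
```

In the PR's code the equivalent change would presumably be to pass a counter into spawn_column_parallel_row_group_writer and increment it each time a row group is finalized, as the commented-out rg_index line suggests.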
diff --git c/Cargo.lock i/Cargo.lock
index f0b9d0a5f..373239aab 100644
--- c/Cargo.lock
+++ i/Cargo.lock
@@ -246,62 +246,62 @@ checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50"
[[package]]
name = "arrow"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-arith 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-arith 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"arrow-csv",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"arrow-json",
- "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"arrow-pyarrow",
- "arrow-row 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-string 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-row 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-string 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"half",
"rand 0.9.2",
]
[[package]]
name = "arrow-arith"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"chrono",
"num",
]
[[package]]
name = "arrow-arith"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"chrono",
"num",
]
[[package]]
name = "arrow-array"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
"ahash 0.8.12",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"chrono",
"chrono-tz",
"half",
@@ -311,13 +311,13 @@ dependencies = [
[[package]]
name = "arrow-array"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
"ahash 0.8.12",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"chrono",
"half",
"hashbrown 0.15.4",
@@ -326,8 +326,8 @@ dependencies = [
[[package]]
name = "arrow-buffer"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
"bytes",
"half",
@@ -336,8 +336,8 @@ dependencies = [
[[package]]
name = "arrow-buffer"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
"bytes",
"half",
@@ -346,14 +346,14 @@ dependencies = [
[[package]]
name = "arrow-cast"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"atoi",
"base64 0.22.1",
"chrono",
@@ -366,14 +366,14 @@ dependencies = [
[[package]]
name = "arrow-cast"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"atoi",
"base64 0.22.1",
"chrono",
@@ -385,12 +385,12 @@ dependencies = [
[[package]]
name = "arrow-csv"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"chrono",
"csv",
"csv-core",
@@ -399,42 +399,42 @@ dependencies = [
[[package]]
name = "arrow-data"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"half",
"num",
]
[[package]]
name = "arrow-data"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"half",
"num",
]
[[package]]
name = "arrow-flight"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
- "arrow-arith 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-row 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-string 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-arith 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-row 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-string 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"base64 0.22.1",
"bytes",
"futures",
@@ -442,18 +442,18 @@ dependencies = [
"paste",
"prost",
"prost-types",
- "tonic 0.12.3",
+ "tonic",
]
[[package]]
name = "arrow-ipc"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"flatbuffers",
"lz4_flex",
"zstd",
@@ -461,26 +461,26 @@ dependencies = [
[[package]]
name = "arrow-ipc"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"flatbuffers",
]
[[package]]
name = "arrow-json"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"chrono",
"half",
"indexmap 2.10.0",
@@ -494,67 +494,67 @@ dependencies = [
[[package]]
name = "arrow-ord"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
]
[[package]]
name = "arrow-ord"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
]
[[package]]
name = "arrow-pyarrow"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"pyo3",
]
[[package]]
name = "arrow-row"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"half",
]
[[package]]
name = "arrow-row"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"half",
]
[[package]]
name = "arrow-schema"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
"bitflags 2.9.1",
"serde",
@@ -563,45 +563,45 @@ dependencies = [
[[package]]
name = "arrow-schema"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
[[package]]
name = "arrow-select"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
"ahash 0.8.12",
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"num",
]
[[package]]
name = "arrow-select"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
"ahash 0.8.12",
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"num",
]
[[package]]
name = "arrow-string"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"memchr",
"num",
"regex",
@@ -610,14 +610,14 @@ dependencies = [
[[package]]
name = "arrow-string"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"memchr",
"num",
"regex",
@@ -689,28 +689,6 @@ dependencies = [
"syn 2.0.106",
]
-[[package]]
-name = "async-stream"
-version = "0.3.6"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "0b5a71a6f37880a80d1d7f19efd781e4b5de42c88f0722cc13bcb6cc2cfe8476"
-dependencies = [
- "async-stream-impl",
- "futures-core",
- "pin-project-lite",
-]
-
-[[package]]
-name = "async-stream-impl"
-version = "0.3.6"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "c7c24de15d275a1ecfd47a380fb4d5ec9bfe0933f309ed5e705b775596a3574d"
-dependencies = [
- "proc-macro2",
- "quote",
- "syn 2.0.104",
-]
-
[[package]]
name = "async-trait"
version = "0.1.89"
@@ -971,7 +949,7 @@ dependencies = [
"rustls-native-certs",
"rustls-pki-types",
"tokio",
- "tower 0.5.2",
+ "tower",
"tracing",
]
@@ -1090,47 +1068,20 @@ dependencies = [
"tracing",
]
-[[package]]
-name = "axum"
-version = "0.7.9"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "edca88bc138befd0323b20752846e6587272d3b03b0343c8ea28a6f819e6e71f"
-dependencies = [
- "async-trait",
- "axum-core 0.4.5",
- "bytes",
- "futures-util",
- "http 1.3.1",
- "http-body 1.0.1",
- "http-body-util",
- "itoa",
- "matchit 0.7.3",
- "memchr",
- "mime",
- "percent-encoding",
- "pin-project-lite",
- "rustversion",
- "serde",
- "sync_wrapper",
- "tower 0.5.2",
- "tower-layer",
- "tower-service",
-]
-
[[package]]
name = "axum"
version = "0.8.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "021e862c184ae977658b36c4500f7feac3221ca5da43e3f25bd04ab6c79a29b5"
dependencies = [
- "axum-core 0.5.2",
+ "axum-core",
"bytes",
"futures-util",
"http 1.3.1",
"http-body 1.0.1",
"http-body-util",
"itoa",
- "matchit 0.8.4",
+ "matchit",
"memchr",
"mime",
"percent-encoding",
@@ -1138,27 +1089,7 @@ dependencies = [
"rustversion",
"serde",
"sync_wrapper",
- "tower 0.5.2",
- "tower-layer",
- "tower-service",
-]
-
-[[package]]
-name = "axum-core"
-version = "0.4.5"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "09f2bd6146b97ae3359fa0cc6d6b376d9539582c7b4220f041a33ec24c226199"
-dependencies = [
- "async-trait",
- "bytes",
- "futures-util",
- "http 1.3.1",
- "http-body 1.0.1",
- "http-body-util",
- "mime",
- "pin-project-lite",
- "rustversion",
- "sync_wrapper",
+ "tower",
"tower-layer",
"tower-service",
]
@@ -2009,8 +1940,8 @@ name = "datafusion"
version = "49.0.1"
dependencies = [
"arrow",
- "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"async-trait",
"bytes",
"bzip2 0.6.0",
@@ -2187,7 +2118,7 @@ dependencies = [
"ahash 0.8.12",
"apache-avro",
"arrow",
- "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"base64 0.22.1",
"chrono",
"half",
@@ -2367,7 +2298,7 @@ version = "49.0.1"
dependencies = [
"arrow",
"arrow-flight",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"async-trait",
"base64 0.22.1",
"bytes",
@@ -2388,7 +2319,7 @@ dependencies = [
"tempfile",
"test-utils",
"tokio",
- "tonic 0.13.1",
+ "tonic",
"tracing",
"tracing-subscriber",
"url",
@@ -2455,7 +2386,7 @@ version = "49.0.1"
dependencies = [
"abi_stable",
"arrow",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"async-ffi",
"async-trait",
"datafusion",
@@ -2475,7 +2406,7 @@ name = "datafusion-functions"
version = "49.0.1"
dependencies = [
"arrow",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"base64 0.22.1",
"blake2",
"blake3",
@@ -2538,7 +2469,7 @@ name = "datafusion-functions-nested"
version = "49.0.1"
dependencies = [
"arrow",
- "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"criterion",
"datafusion-common",
"datafusion-doc",
@@ -2708,8 +2639,8 @@ version = "49.0.1"
dependencies = [
"ahash 0.8.12",
"arrow",
- "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"async-trait",
"chrono",
"criterion",
@@ -2780,7 +2711,7 @@ name = "datafusion-pruning"
version = "49.0.1"
dependencies = [
"arrow",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"datafusion-common",
"datafusion-datasource",
"datafusion-expr",
@@ -4348,12 +4279,6 @@ dependencies = [
"pkg-config",
]
-[[package]]
-name = "matchit"
-version = "0.7.3"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "0e7465ac9959cc2b1404e8e2367b43684a6d13790fe23056cc8c6c5a6b7bcb94"
-
[[package]]
name = "matchit"
version = "0.8.4"
@@ -4726,17 +4651,17 @@ dependencies = [
[[package]]
name = "parquet"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
"ahash 0.8.12",
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"base64 0.22.1",
"brotli",
"bytes",
@@ -5645,7 +5570,7 @@ dependencies = [
"tokio",
"tokio-rustls",
"tokio-util",
- "tower 0.5.2",
+ "tower",
"tower-http",
"tower-service",
"url",
@@ -6875,36 +6800,6 @@ dependencies = [
"winnow",
]
-[[package]]
-name = "tonic"
-version = "0.12.3"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "877c5b330756d856ffcc4553ab34a5684481ade925ecc54bcd1bf02b1d0d4d52"
-dependencies = [
- "async-stream",
- "async-trait",
- "axum 0.7.9",
- "base64 0.22.1",
- "bytes",
- "h2",
- "http 1.3.1",
- "http-body 1.0.1",
- "http-body-util",
- "hyper",
- "hyper-timeout",
- "hyper-util",
- "percent-encoding",
- "pin-project",
- "prost",
- "socket2 0.5.10",
- "tokio",
- "tokio-stream",
- "tower 0.4.13",
- "tower-layer",
- "tower-service",
- "tracing",
-]
-
[[package]]
name = "tonic"
version = "0.13.1"
@@ -6912,7 +6807,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7e581ba15a835f4d9ea06c55ab1bd4dce26fc53752c69a04aac00703bfb49ba9"
dependencies = [
"async-trait",
- "axum 0.8.4",
+ "axum",
"base64 0.22.1",
"bytes",
"h2",
@@ -6928,27 +6823,7 @@ dependencies = [
"socket2 0.5.10",
"tokio",
"tokio-stream",
- "tower 0.5.2",
- "tower-layer",
- "tower-service",
- "tracing",
-]
-
-[[package]]
-name = "tower"
-version = "0.4.13"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "b8fa9be0de6cf49e536ce1851f987bd21a43b771b09473c3549a6c853db37c1c"
-dependencies = [
- "futures-core",
- "futures-util",
- "indexmap 1.9.3",
- "pin-project",
- "pin-project-lite",
- "rand 0.8.5",
- "slab",
- "tokio",
- "tokio-util",
+ "tower",
"tower-layer",
"tower-service",
"tracing",
@@ -6986,7 +6861,7 @@ dependencies = [
"http-body 1.0.1",
"iri-string",
"pin-project-lite",
- "tower 0.5.2",
+ "tower",
"tower-layer",
"tower-service",
]
diff --git c/Cargo.toml i/Cargo.toml
index 5ee3cc566..b6f6a35be 100644
--- c/Cargo.toml
+++ i/Cargo.toml
@@ -90,20 +90,20 @@ ahash = { version = "0.8", default-features = false, features = [
"runtime-rng",
] }
apache-avro = { version = "0.17", default-features = false }
-arrow = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing", features = [
+arrow = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing_2", features = [
"prettyprint",
"chrono-tz",
] }
-arrow-buffer = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing", default-features = false }
+arrow-buffer = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing_2", default-features = false }
arrow-flight = { git = "https://github.com/rok/arrow-rs.git", features = [
"flight-sql-experimental",
] }
-arrow-ipc = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing", default-features = false, features = [
+arrow-ipc = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing_2", default-features = false, features = [
"lz4",
] }
-arrow-ord = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing", default-features = false }
-arrow-schema = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing", default-features = false }
+arrow-ord = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing_2", default-features = false }
+arrow-schema = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing_2", default-features = false }
async-trait = "0.1.89"
bigdecimal = "0.4.8"
bytes = "1.10"
@@ -158,7 +158,7 @@ itertools = "0.14"
log = "^0.4"
object_store = { version = "0.12.3", default-features = false }
parking_lot = "0.12"
-parquet = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing", default-features = false, features = [
+parquet = { git = "https://github.com/rok/arrow-rs.git", branch = "multi-threaded_encrypted_writing_2", default-features = false, features = [
"arrow",
"async",
"object_store",
diff --git c/datafusion-examples/Cargo.toml i/datafusion-examples/Cargo.toml
index b4c8d3507..f12bd9202 100644
--- c/datafusion-examples/Cargo.toml
+++ i/datafusion-examples/Cargo.toml
@@ -32,6 +32,18 @@ rust-version = { workspace = true }
[lints]
workspace = true
+[[example]]
+name = "flight_sql_server"
+path = "examples/flight/flight_sql_server.rs"
+
+[[example]]
+name = "flight_server"
+path = "examples/flight/flight_server.rs"
+
+[[example]]
+name = "flight_client"
+path = "examples/flight/flight_client.rs"
+
[[example]]
name = "dataframe_to_s3"
path = "examples/external_dependency/dataframe-to-s3.rs"
diff --git c/datafusion/common/Cargo.toml i/datafusion/common/Cargo.toml
index 8040b3ad1..afd74c7be 100644
--- c/datafusion/common/Cargo.toml
+++ i/datafusion/common/Cargo.toml
@@ -71,7 +71,7 @@ log = { workspace = true }
object_store = { workspace = true, optional = true }
parquet = { workspace = true, optional = true, default-features = true }
paste = "1.0.15"
-pyo3 = { version = "0.25.1", optional = true }
+pyo3 = { version = "0.25", optional = true }
recursive = { workspace = true, optional = true }
sqlparser = { workspace = true }
tokio = { workspace = true }
diff --git c/datafusion/common/src/file_options/parquet_writer.rs i/datafusion/common/src/file_options/parquet_writer.rs
index d7b490af0..185826aef 100644
--- c/datafusion/common/src/file_options/parquet_writer.rs
+++ i/datafusion/common/src/file_options/parquet_writer.rs
@@ -25,8 +25,6 @@ use crate::{
DataFusionError, Result, _internal_datafusion_err,
};
-pub const DEFAULT_MAX_STATISTICS_SIZE: usize = 4096;
-
use arrow::datatypes::Schema;
// TODO: handle once deprecated
#[allow(deprecated)]
diff --git c/datafusion/common/src/scalar/mod.rs i/datafusion/common/src/scalar/mod.rs
index 8f8c52086..51247612e 100644
--- c/datafusion/common/src/scalar/mod.rs
+++ i/datafusion/common/src/scalar/mod.rs
@@ -2386,9 +2386,7 @@ impl ScalarValue {
| DataType::Time64(TimeUnit::Millisecond)
| DataType::RunEndEncoded(_, _)
| DataType::ListView(_)
- | DataType::LargeListView(_)
- | DataType::Decimal32(_, _)
- | DataType::Decimal64(_, _) => {
+ | DataType::LargeListView(_) => {
return _not_impl_err!(
"Unsupported creation of {:?} array from ScalarValue {:?}",
data_type,
diff --git c/datafusion/core/src/dataframe/parquet.rs i/datafusion/core/src/dataframe/parquet.rs
index 01149c1ec..83bb60184 100644
--- c/datafusion/core/src/dataframe/parquet.rs
+++ i/datafusion/core/src/dataframe/parquet.rs
@@ -278,7 +278,6 @@ mod tests {
// Write encrypted parquet using write_parquet
let mut options = TableParquetOptions::default();
options.crypto.file_encryption = Some((&encrypt).into());
- options.global.allow_single_file_parallelism = true;
df.write_parquet(
tempfile_str.as_str(),
diff --git c/datafusion/core/tests/fuzz_cases/pruning.rs i/datafusion/core/tests/fuzz_cases/pruning.rs
index 4ab1f08f1..c6e30c072 100644
--- c/datafusion/core/tests/fuzz_cases/pruning.rs
+++ i/datafusion/core/tests/fuzz_cases/pruning.rs
@@ -314,7 +314,7 @@ async fn execute_with_predicate(
}
async fn write_parquet_file(
- _truncation_length: Option<usize>,
+ truncation_length: Option<usize>,
schema: Arc<Schema>,
row_groups: Vec<Vec<String>>,
) -> Bytes {
diff --git c/datafusion/datasource-avro/src/avro_to_arrow/schema.rs i/datafusion/datasource-avro/src/avro_to_arrow/schema.rs
index 00b3f9d6d..cc87d3c1c 100644
--- c/datafusion/datasource-avro/src/avro_to_arrow/schema.rs
+++ i/datafusion/datasource-avro/src/avro_to_arrow/schema.rs
@@ -239,8 +239,6 @@ fn default_field_name(dt: &DataType) -> &str {
DataType::Decimal64(_, _) => "decimal",
DataType::Decimal128(_, _) => "decimal",
DataType::Decimal256(_, _) => "decimal",
- DataType::Decimal32(_, _) => "decimal",
- DataType::Decimal64(_, _) => "decimal",
}
}
diff --git c/datafusion/datasource-parquet/src/file_format.rs i/datafusion/datasource-parquet/src/file_format.rs
index 934a7b2ee..b16764534 100644
--- c/datafusion/datasource-parquet/src/file_format.rs
+++ i/datafusion/datasource-parquet/src/file_format.rs
@@ -79,10 +79,10 @@ use object_store::{ObjectMeta, ObjectStore};
use parquet::arrow::arrow_reader::statistics::StatisticsConverter;
use parquet::arrow::arrow_writer::{
compute_leaves, ArrowColumnChunk, ArrowColumnWriter, ArrowLeafColumn,
- ArrowRowGroupWriterFactory, ArrowWriterOptions,
+ ArrowWriterOptions,
};
use parquet::arrow::async_reader::MetadataFetch;
-use parquet::arrow::{parquet_to_arrow_schema, ArrowSchemaConverter, AsyncArrowWriter};
+use parquet::arrow::{parquet_to_arrow_schema, ArrowSchemaConverter, ArrowWriter, AsyncArrowWriter};
use parquet::basic::Type;
use datafusion_execution::cache::cache_manager::FileMetadataCache;
@@ -1698,13 +1698,10 @@ type ColSender = Sender<ArrowLeafColumn>;
/// Returns join handles for each columns serialization task along with a send channel
/// to send arrow arrays to each serialization task.
fn spawn_column_parallel_row_group_writer(
- arrow_row_group_writer_factory: Arc<ArrowRowGroupWriterFactory>,
+ col_writers: Vec<ArrowColumnWriter>,
max_buffer_size: usize,
pool: &Arc<dyn MemoryPool>,
) -> Result<(Vec<ColumnWriterTask>, Vec<ColSender>)> {
- let arrow_row_group_writer =
- arrow_row_group_writer_factory.create_row_group_writer(0)?;
- let col_writers = arrow_row_group_writer.into_column_writers();
let num_columns = col_writers.len();
let mut col_writer_tasks = Vec::with_capacity(num_columns);
@@ -1799,7 +1796,7 @@ fn spawn_rg_join_and_finalize_task(
/// across both columns and row_groups, with a theoretical max number of parallel tasks
/// given by n_columns * num_row_groups.
fn spawn_parquet_parallel_serialization_task(
- arrow_row_group_writer_factory: Arc<ArrowRowGroupWriterFactory>,
+ arrow_writer: ArrowWriter<SerializedFileWriter<SharedBuffer>>,
mut data: Receiver<RecordBatch>,
serialize_tx: Sender<SpawnedTask<RBStreamSerializeResult>>,
schema: Arc<Schema>,
@@ -1810,9 +1807,10 @@ fn spawn_parquet_parallel_serialization_task(
SpawnedTask::spawn(async move {
let max_buffer_rb = parallel_options.max_buffered_record_batches_per_stream;
let max_row_group_rows = writer_props.max_row_group_size();
+ let col_writers = arrow_writer.get_column_writers().unwrap();
let (mut column_writer_handles, mut col_array_channels) =
spawn_column_parallel_row_group_writer(
- Arc::clone(&arrow_row_group_writer_factory),
+ col_writers,
max_buffer_rb,
&pool,
)?;
@@ -1866,7 +1864,7 @@ fn spawn_parquet_parallel_serialization_task(
(column_writer_handles, col_array_channels) =
spawn_column_parallel_row_group_writer(
- Arc::clone(&arrow_row_group_writer_factory),
+ col_writers,
max_buffer_rb,
&pool,
)?;
@@ -1965,16 +1963,12 @@ async fn output_single_parquet_file_parallelized(
parquet_schema.root_schema_ptr(),
parquet_props.clone().into(),
)?;
- let arrow_row_group_writer_factory = ArrowRowGroupWriterFactory::new(
- &parquet_writer,
- parquet_schema,
- Arc::clone(&output_schema),
- parquet_props.clone().into(),
- );
+ let writer = ArrowWriter::try_new(
+ parquet_writer, Arc::clone(&output_schema), Some(parquet_props.clone()))?;
let arc_props = Arc::new(parquet_props.clone());
let launch_serialization_task = spawn_parquet_parallel_serialization_task(
- Arc::new(arrow_row_group_writer_factory),
+ writer,
data,
serialize_tx,
Arc::clone(&output_schema),
diff --git c/datafusion/expr/src/utils.rs i/datafusion/expr/src/utils.rs
index cd8e419ac..7a612b6fe 100644
--- c/datafusion/expr/src/utils.rs
+++ i/datafusion/expr/src/utils.rs
@@ -818,8 +818,6 @@ pub fn can_hash(data_type: &DataType) -> bool {
DataType::Decimal64(_, _) => true,
DataType::Decimal128(_, _) => true,
DataType::Decimal256(_, _) => true,
- DataType::Decimal32(_, _) => true,
- DataType::Decimal64(_, _) => true,
DataType::Timestamp(_, _) => true,
DataType::Utf8 => true,
DataType::LargeUtf8 => true,
diff --git c/datafusion/sql/src/unparser/expr.rs i/datafusion/sql/src/unparser/expr.rs
index 86c648cba..0501a4e04 100644
--- c/datafusion/sql/src/unparser/expr.rs
+++ i/datafusion/sql/src/unparser/expr.rs
@@ -1729,9 +1729,7 @@ impl Unparser<'_> {
not_impl_err!("Unsupported DataType: conversion: {data_type:?}")
}
DataType::Decimal128(precision, scale)
- | DataType::Decimal256(precision, scale)
- | DataType::Decimal32(precision, scale)
- | DataType::Decimal64(precision, scale) => {
+ | DataType::Decimal256(precision, scale) => {
let mut new_precision = *precision as u64;
let mut new_scale = *scale as u64;
if *scale < 0 {
diff --git c/datafusion/sqllogictest/test_files/copy.slt i/datafusion/sqllogictest/test_files/copy.slt
index e16fcfe84..096cde86f 100644
--- c/datafusion/sqllogictest/test_files/copy.slt
+++ i/datafusion/sqllogictest/test_files/copy.slt
@@ -306,7 +306,7 @@ select * from validate_struct_with_array;
# Copy parquet with all supported statement overrides
-query error DataFusion error: Invalid or Unsupported Configuration: Config value "max_statistics_size" not found on ParquetOptions
+query I
COPY source_table
TO 'test_files/scratch/copy/table_with_options/'
STORED AS PARQUET
@@ -336,6 +336,8 @@ OPTIONS (
'format.bloom_filter_ndv' 100,
'format.metadata::key' 'value'
)
+----
+2
# valid vs invalid metadata
@@ -402,8 +404,11 @@ OPTIONS (
statement ok
CREATE EXTERNAL TABLE validate_parquet_with_options STORED AS PARQUET LOCATION 'test_files/scratch/copy/table_with_options/';
-statement count 0
+query IT
select * from validate_parquet_with_options;
+----
+1 Foo
+2 Bar
# Copy from table to single file
query I
diff --git c/datafusion/substrait/src/logical_plan/consumer/utils.rs i/datafusion/substrait/src/logical_plan/consumer/utils.rs
index f809bc82a..f7eedcb7a 100644
--- c/datafusion/substrait/src/logical_plan/consumer/utils.rs
+++ i/datafusion/substrait/src/logical_plan/consumer/utils.rs
@@ -216,9 +216,7 @@ pub fn rename_data_type(
| DataType::Decimal32(_, _)
| DataType::Decimal64(_, _)
| DataType::Decimal128(_, _)
- | DataType::Decimal256(_, _)
- | DataType::Decimal32(_, _)
- | DataType::Decimal64(_, _) => Ok(data_type.clone()),
+ | DataType::Decimal256(_, _) => Ok(data_type.clone()),
}
}
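In the `rename_data_type` hunk above, `Decimal32` and `Decimal64` appear both in the context lines and in the removed lines, so the removed alternatives repeated patterns already listed in the same arm. A minimal sketch of the cleaned-up shape, assuming a hypothetical stand-in enum `Ty` and helper `is_decimal` (neither is arrow's `DataType` nor DataFusion's actual function):

```rust
/// Hypothetical stand-in for arrow's `DataType`; only the variants needed here.
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Decimal32(u8, i8),
    Decimal64(u8, i8),
    Decimal128(u8, i8),
    Decimal256(u8, i8),
    Utf8,
}

/// Returns true for the decimal variants. Each variant is listed exactly once.
pub fn is_decimal(ty: &Ty) -> bool {
    match ty {
        Ty::Decimal32(_, _)
        | Ty::Decimal64(_, _)
        | Ty::Decimal128(_, _)
        | Ty::Decimal256(_, _) => true,
        // Appending `| Ty::Decimal32(_, _) | Ty::Decimal64(_, _)` to the arm
        // above would still compile, but rustc reports the repeated
        // alternatives as unreachable patterns (a warning by default).
        Ty::Utf8 => false,
    }
}
```

Listing each variant once keeps the match exhaustive and avoids the `unreachable_patterns` lint, which matters in builds that treat warnings as errors.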
diff --git c/Cargo.lock i/Cargo.lock
index 5816269a6..c94dea9f6 100644
--- c/Cargo.lock
+++ i/Cargo.lock
@@ -246,62 +246,62 @@ checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50"
[[package]]
name = "arrow"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-arith 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-arith 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"arrow-csv",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"arrow-json",
- "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"arrow-pyarrow",
- "arrow-row 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-string 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-row 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-string 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"half",
"rand 0.9.2",
]
[[package]]
name = "arrow-arith"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"chrono",
"num",
]
[[package]]
name = "arrow-arith"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"chrono",
"num",
]
[[package]]
name = "arrow-array"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
"ahash 0.8.12",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"chrono",
"chrono-tz",
"half",
@@ -311,13 +311,13 @@ dependencies = [
[[package]]
name = "arrow-array"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
"ahash 0.8.12",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"chrono",
"half",
"hashbrown 0.15.4",
@@ -326,8 +326,8 @@ dependencies = [
[[package]]
name = "arrow-buffer"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
"bytes",
"half",
@@ -336,8 +336,8 @@ dependencies = [
[[package]]
name = "arrow-buffer"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
"bytes",
"half",
@@ -346,14 +346,14 @@ dependencies = [
[[package]]
name = "arrow-cast"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"atoi",
"base64 0.22.1",
"chrono",
@@ -366,14 +366,14 @@ dependencies = [
[[package]]
name = "arrow-cast"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"atoi",
"base64 0.22.1",
"chrono",
@@ -385,12 +385,12 @@ dependencies = [
[[package]]
name = "arrow-csv"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"chrono",
"csv",
"csv-core",
@@ -399,42 +399,42 @@ dependencies = [
[[package]]
name = "arrow-data"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
dependencies = [
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
"half",
"num",
]
[[package]]
name = "arrow-data"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
"half",
"num",
]
[[package]]
name = "arrow-flight"
-version = "55.2.0"
-source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58"
+version = "56.0.0"
+source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
dependencies = [
- "arrow-arith 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)",…
diff --git c/Cargo.lock i/Cargo.lock
index 373239aab..145b6e576 100644
--- c/Cargo.lock
+++ i/Cargo.lock
@@ -247,22 +247,22 @@ checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50"
[[package]]
name = "arrow"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-arith 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-arith",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-cast",
"arrow-csv",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data",
+ "arrow-ipc",
"arrow-json",
- "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ord",
"arrow-pyarrow",
- "arrow-row 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-string 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-row",
+ "arrow-schema",
+ "arrow-select",
+ "arrow-string",
"half",
"rand 0.9.2",
]
@@ -270,25 +270,12 @@ dependencies = [
[[package]]
name = "arrow-arith"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "chrono",
- "num",
-]
-
-[[package]]
-name = "arrow-arith"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
"chrono",
"num",
]
@@ -296,12 +283,12 @@ dependencies = [
[[package]]
name = "arrow-array"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
"ahash 0.8.12",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
"chrono",
"chrono-tz",
"half",
@@ -309,35 +296,10 @@ dependencies = [
"num",
]
-[[package]]
-name = "arrow-array"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "ahash 0.8.12",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "chrono",
- "half",
- "hashbrown 0.15.4",
- "num",
-]
-
[[package]]
name = "arrow-buffer"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
-dependencies = [
- "bytes",
- "half",
- "num",
-]
-
-[[package]]
-name = "arrow-buffer"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
"bytes",
"half",
@@ -347,13 +309,13 @@ dependencies = [
[[package]]
name = "arrow-cast"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
+ "arrow-select",
"atoi",
"base64 0.22.1",
"chrono",
@@ -364,33 +326,14 @@ dependencies = [
"ryu",
]
-[[package]]
-name = "arrow-cast"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "atoi",
- "base64 0.22.1",
- "chrono",
- "half",
- "lexical-core",
- "num",
- "ryu",
-]
-
[[package]]
name = "arrow-csv"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array",
+ "arrow-cast",
+ "arrow-schema",
"chrono",
"csv",
"csv-core",
@@ -400,21 +343,10 @@ dependencies = [
[[package]]
name = "arrow-data"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "half",
- "num",
-]
-
-[[package]]
-name = "arrow-data"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer",
+ "arrow-schema",
"half",
"num",
]
@@ -422,19 +354,19 @@ dependencies = [
[[package]]
name = "arrow-flight"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-arith 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-row 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-string 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-arith",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-cast",
+ "arrow-data",
+ "arrow-ipc",
+ "arrow-ord",
+ "arrow-row",
+ "arrow-schema",
+ "arrow-select",
+ "arrow-string",
"base64 0.22.1",
"bytes",
"futures",
@@ -448,39 +380,27 @@ dependencies = [
[[package]]
name = "arrow-ipc"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
"flatbuffers",
"lz4_flex",
"zstd",
]
-[[package]]
-name = "arrow-ipc"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "flatbuffers",
-]
-
[[package]]
name = "arrow-json"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-cast",
+ "arrow-data",
+ "arrow-schema",
"chrono",
"half",
"indexmap 2.10.0",
@@ -495,129 +415,71 @@ dependencies = [
[[package]]
name = "arrow-ord"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
-]
-
-[[package]]
-name = "arrow-ord"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
+ "arrow-select",
]
[[package]]
name = "arrow-pyarrow"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array",
+ "arrow-data",
+ "arrow-schema",
"pyo3",
]
[[package]]
name = "arrow-row"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "half",
-]
-
-[[package]]
-name = "arrow-row"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
"half",
]
[[package]]
name = "arrow-schema"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
"bitflags 2.9.1",
"serde",
"serde_json",
]
-[[package]]
-name = "arrow-schema"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-
[[package]]
name = "arrow-select"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
"ahash 0.8.12",
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "num",
-]
-
-[[package]]
-name = "arrow-select"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "ahash 0.8.12",
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
"num",
]
[[package]]
name = "arrow-string"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "memchr",
- "num",
- "regex",
- "regex-syntax",
-]
-
-[[package]]
-name = "arrow-string"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
+ "arrow-select",
"memchr",
"num",
"regex",
@@ -1940,8 +1802,8 @@ name = "datafusion"
version = "49.0.1"
dependencies = [
"arrow",
- "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ipc",
+ "arrow-schema",
"async-trait",
"bytes",
"bzip2 0.6.0",
@@ -2118,7 +1980,7 @@ dependencies = [
"ahash 0.8.12",
"apache-avro",
"arrow",
- "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ipc",
"base64 0.22.1",
"chrono",
"half",
@@ -2298,7 +2160,7 @@ version = "49.0.1"
dependencies = [
"arrow",
"arrow-flight",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema",
"async-trait",
"base64 0.22.1",
"bytes",
@@ -2386,7 +2248,7 @@ version = "49.0.1"
dependencies = [
"abi_stable",
"arrow",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema",
"async-ffi",
"async-trait",
"datafusion",
@@ -2406,7 +2268,7 @@ name = "datafusion-functions"
version = "49.0.1"
dependencies = [
"arrow",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer",
"base64 0.22.1",
"blake2",
"blake3",
@@ -2469,7 +2331,7 @@ name = "datafusion-functions-nested"
version = "49.0.1"
dependencies = [
"arrow",
- "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ord",
"criterion",
"datafusion-common",
"datafusion-doc",
@@ -2639,8 +2501,8 @@ version = "49.0.1"
dependencies = [
"ahash 0.8.12",
"arrow",
- "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ord",
+ "arrow-schema",
"async-trait",
"chrono",
"criterion",
@@ -2711,7 +2573,7 @@ name = "datafusion-pruning"
version = "49.0.1"
dependencies = [
"arrow",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema",
"datafusion-common",
"datafusion-datasource",
"datafusion-expr",
@@ -4652,16 +4514,16 @@ dependencies = [
[[package]]
name = "parquet"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
"ahash 0.8.12",
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-cast",
+ "arrow-data",
+ "arrow-ipc",
+ "arrow-schema",
+ "arrow-select",
"base64 0.22.1",
"brotli",
"bytes",
diff --git c/datafusion/datasource-parquet/src/file_format.rs i/datafusion/datasource-parquet/src/file_format.rs
index 93c5f83e2..a3c1a2b01 100644
--- c/datafusion/datasource-parquet/src/file_format.rs
+++ i/datafusion/datasource-parquet/src/file_format.rs
@@ -82,7 +82,9 @@ use parquet::arrow::arrow_writer::{
ArrowWriterOptions,
};
use parquet::arrow::async_reader::MetadataFetch;
-use parquet::arrow::{parquet_to_arrow_schema, ArrowSchemaConverter, ArrowWriter, AsyncArrowWriter};
+use parquet::arrow::{
+ parquet_to_arrow_schema, ArrowSchemaConverter, ArrowWriter, AsyncArrowWriter,
+};
use parquet::basic::Type;
use datafusion_execution::cache::cache_manager::FileMetadataCache;
@@ -1809,11 +1811,7 @@ fn spawn_parquet_parallel_serialization_task(
let max_row_group_rows = writer_props.max_row_group_size();
let col_writers = arrow_writer.get_column_writers().unwrap();
let (mut column_writer_handles, mut col_array_channels) =
- spawn_column_parallel_row_group_writer(
- col_writers,
- max_buffer_rb,
- &pool,
- )?;
+ spawn_column_parallel_row_group_writer(col_writers, max_buffer_rb, &pool)?;
let mut current_rg_rows = 0;
// TODO: row_group_writer should use the correct row group index. Currently this would fail if
// multiple row groups were written.
@@ -1896,6 +1894,8 @@ fn spawn_parquet_parallel_serialization_task(
/// Consume RowGroups serialized by other parallel tasks and concatenate them in
/// to the final parquet file, while flushing finalized bytes to an [ObjectStore]
async fn concatenate_parallel_row_groups(
+ // TODO
+ // mut arrow_writer: ArrowWriter<SharedBuffer>,
mut parquet_writer: SerializedFileWriter<SharedBuffer>,
merged_buff: SharedBuffer,
mut serialize_rx: Receiver<SpawnedTask<RBStreamSerializeResult>>,
@@ -1910,6 +1910,12 @@ async fn concatenate_parallel_row_groups(
let mut rg_out = parquet_writer.next_row_group()?;
let (serialized_columns, mut rg_reservation, _cnt) =
result.map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
+ // TODO: use arrow_writer.append_row_group
+ // let mut finalized_rg = Vec::with_capacity(serialized_columns.len());
+ // for task in serialized_columns {
+ // finalized_rg.push(task);
+ // }
+ // arrow_writer.append_row_group(finalized_rg)?;
for chunk in serialized_columns {
chunk.append_to_row_group(&mut rg_out)?;
rg_reservation.free();
@@ -1965,7 +1971,10 @@ async fn output_single_parquet_file_parallelized(
parquet_props.clone().into(),
)?;
let writer = ArrowWriter::try_new(
- merged_buff.clone(), Arc::clone(&output_schema), Some(parquet_props.clone()))?;
+ merged_buff.clone(),
+ Arc::clone(&output_schema),
+ Some(parquet_props.clone()),
+ )?;
let arc_props = Arc::new(parquet_props.clone());
let launch_serialization_task = spawn_parquet_parallel_serialization_task(
@@ -1984,6 +1993,8 @@ async fn output_single_parquet_file_parallelized(
.map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
let file_metadata = concatenate_parallel_row_groups(
+ // TODO
+ // writer,
parquet_writer,
merged_buff,
serialize_rx,
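On the `rg_index` TODO above: with modular encryption the row group ordinal is part of the AAD for each encrypted module, so a parallel writer has to know the final row group index when a row group is encoded, not when it is appended. Below is a minimal, std-only sketch of that ordering constraint. It is a toy model under stated assumptions, not the parquet or DataFusion API: `EncodedRowGroup`, `encode_row_group`, and `write_parallel` are hypothetical names, and the XOR step only stands in for "output depends on the row group index".

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a serialized row group (hypothetical type, not from parquet).
#[derive(Debug, PartialEq)]
struct EncodedRowGroup {
    rg_index: usize,
    bytes: Vec<u8>,
}

/// Hypothetical encoder. The XOR is NOT encryption; it only makes the output
/// depend on `rg_index`, the way the AAD does in modular encryption.
fn encode_row_group(rg_index: usize, rows: &[u8]) -> EncodedRowGroup {
    let bytes = rows.iter().map(|b| b ^ (rg_index as u8)).collect();
    EncodedRowGroup { rg_index, bytes }
}

/// Split `rows` into row groups of at most `max_rows` rows, encode each group
/// on its own thread, and return the groups in file order.
fn write_parallel(rows: &[u8], max_rows: usize) -> Vec<EncodedRowGroup> {
    assert!(max_rows > 0, "max_rows must be positive");
    let (tx, rx) = mpsc::channel();
    let mut handles = Vec::new();

    // The index is assigned before the work is handed off, so every worker
    // encodes with the ordinal its row group will have in the final file.
    for (rg_index, chunk) in rows.chunks(max_rows).enumerate() {
        let tx = tx.clone();
        let chunk = chunk.to_vec();
        handles.push(thread::spawn(move || {
            tx.send(encode_row_group(rg_index, &chunk))
                .expect("receiver should still be alive");
        }));
    }
    // Drop the original sender so `rx.iter()` ends once all workers finish.
    drop(tx);

    let mut out: Vec<EncodedRowGroup> = rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }

    // Workers may finish in any order; restore file order before appending.
    out.sort_by_key(|rg| rg.rg_index);
    out
}
```

Calling `write_parallel(&[1, 2, 3, 4, 5], 2)` yields three row groups with indices 0, 1, 2 regardless of which thread finishes first. The point for this PR is only that the index is fixed at spawn time; how that maps onto `get_column_writers` and `append_row_group` is left to the actual change.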
diff --git c/Cargo.lock i/Cargo.lock
index c94dea9f6..3dd68862f 100644
--- c/Cargo.lock
+++ i/Cargo.lock
@@ -247,22 +247,22 @@ checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50"
[[package]]
name = "arrow"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-arith 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-arith",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-cast",
"arrow-csv",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-data",
+ "arrow-ipc",
"arrow-json",
- "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ord",
"arrow-pyarrow",
- "arrow-row 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-string 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-row",
+ "arrow-schema",
+ "arrow-select",
+ "arrow-string",
"half",
"rand 0.9.2",
]
@@ -270,25 +270,12 @@ dependencies = [
[[package]]
name = "arrow-arith"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "chrono",
- "num",
-]
-
-[[package]]
-name = "arrow-arith"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
"chrono",
"num",
]
@@ -296,12 +283,12 @@ dependencies = [
[[package]]
name = "arrow-array"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
"ahash 0.8.12",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
"chrono",
"chrono-tz",
"half",
@@ -309,35 +296,10 @@ dependencies = [
"num",
]
-[[package]]
-name = "arrow-array"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "ahash 0.8.12",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "chrono",
- "half",
- "hashbrown 0.15.4",
- "num",
-]
-
[[package]]
name = "arrow-buffer"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
-dependencies = [
- "bytes",
- "half",
- "num",
-]
-
-[[package]]
-name = "arrow-buffer"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
"bytes",
"half",
@@ -347,13 +309,13 @@ dependencies = [
[[package]]
name = "arrow-cast"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
+ "arrow-select",
"atoi",
"base64 0.22.1",
"chrono",
@@ -364,33 +326,14 @@ dependencies = [
"ryu",
]
-[[package]]
-name = "arrow-cast"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "atoi",
- "base64 0.22.1",
- "chrono",
- "half",
- "lexical-core",
- "num",
- "ryu",
-]
-
[[package]]
name = "arrow-csv"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array",
+ "arrow-cast",
+ "arrow-schema",
"chrono",
"csv",
"csv-core",
@@ -400,21 +343,10 @@ dependencies = [
[[package]]
name = "arrow-data"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "half",
- "num",
-]
-
-[[package]]
-name = "arrow-data"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-buffer",
+ "arrow-schema",
"half",
"num",
]
@@ -422,19 +354,19 @@ dependencies = [
[[package]]
name = "arrow-flight"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-arith 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-row 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-string 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-arith",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-cast",
+ "arrow-data",
+ "arrow-ipc",
+ "arrow-ord",
+ "arrow-row",
+ "arrow-schema",
+ "arrow-select",
+ "arrow-string",
"base64 0.22.1",
"bytes",
"futures",
@@ -448,39 +380,27 @@ dependencies = [
[[package]]
name = "arrow-ipc"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
"flatbuffers",
"lz4_flex",
"zstd",
]
-[[package]]
-name = "arrow-ipc"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "flatbuffers",
-]
-
[[package]]
name = "arrow-json"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-cast",
+ "arrow-data",
+ "arrow-schema",
"chrono",
"half",
"indexmap 2.10.0",
@@ -495,129 +415,71 @@ dependencies = [
[[package]]
name = "arrow-ord"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
-]
-
-[[package]]
-name = "arrow-ord"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
+ "arrow-select",
]
[[package]]
name = "arrow-pyarrow"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array",
+ "arrow-data",
+ "arrow-schema",
"pyo3",
]
[[package]]
name = "arrow-row"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "half",
-]
-
-[[package]]
-name = "arrow-row"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
"half",
]
[[package]]
name = "arrow-schema"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
"bitflags 2.9.1",
"serde",
"serde_json",
]
-[[package]]
-name = "arrow-schema"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-
[[package]]
name = "arrow-select"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
"ahash 0.8.12",
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "num",
-]
-
-[[package]]
-name = "arrow-select"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "ahash 0.8.12",
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
"num",
]
[[package]]
name = "arrow-string"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "memchr",
- "num",
- "regex",
- "regex-syntax",
-]
-
-[[package]]
-name = "arrow-string"
-version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git#876585c1cd986dbaee0c26d52b55a4186a2f68c8"
-dependencies = [
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-data",
+ "arrow-schema",
+ "arrow-select",
"memchr",
"num",
"regex",
@@ -1940,8 +1802,8 @@ name = "datafusion"
version = "49.0.1"
dependencies = [
"arrow",
- "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ipc",
+ "arrow-schema",
"async-trait",
"bytes",
"bzip2 0.6.0",
@@ -2118,7 +1980,7 @@ dependencies = [
"ahash 0.8.12",
"apache-avro",
"arrow",
- "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ipc",
"base64 0.22.1",
"chrono",
"half",
@@ -2298,7 +2160,7 @@ version = "49.0.1"
dependencies = [
"arrow",
"arrow-flight",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema",
"async-trait",
"base64 0.22.1",
"bytes",
@@ -2386,7 +2248,7 @@ version = "49.0.1"
dependencies = [
"abi_stable",
"arrow",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema",
"async-ffi",
"async-trait",
"datafusion",
@@ -2406,7 +2268,7 @@ name = "datafusion-functions"
version = "49.0.1"
dependencies = [
"arrow",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-buffer",
"base64 0.22.1",
"blake2",
"blake3",
@@ -2469,7 +2331,7 @@ name = "datafusion-functions-nested"
version = "49.0.1"
dependencies = [
"arrow",
- "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ord",
"criterion",
"datafusion-common",
"datafusion-doc",
@@ -2639,8 +2501,8 @@ version = "49.0.1"
dependencies = [
"ahash 0.8.12",
"arrow",
- "arrow-ord 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-ord",
+ "arrow-schema",
"async-trait",
"chrono",
"criterion",
@@ -2711,7 +2573,7 @@ name = "datafusion-pruning"
version = "49.0.1"
dependencies = [
"arrow",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-schema",
"datafusion-common",
"datafusion-datasource",
"datafusion-expr",
@@ -4652,16 +4514,16 @@ dependencies = [
[[package]]
name = "parquet"
version = "56.0.0"
-source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2#d9590212db94de291203220e2ed0beb808c69072"
+source = "git+https://github.com/apache/arrow-rs.git#7a5f6d3d48655bea190560a7e393cafb2c5eb073"
dependencies = [
"ahash 0.8.12",
- "arrow-array 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-buffer 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-cast 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-data 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-ipc 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-schema 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
- "arrow-select 56.0.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing_2)",
+ "arrow-array",
+ "arrow-buffer",
+ "arrow-cast",
+ "arrow-data",
+ "arrow-ipc",
+ "arrow-schema",
+ "arrow-select",
"base64 0.22.1",
"brotli",
"bytes",
diff --git c/datafusion/datasource-parquet/src/file_format.rs i/datafusion/datasource-parquet/src/file_format.rs
index 48f334265..b39df3353 100644
--- c/datafusion/datasource-parquet/src/file_format.rs
+++ i/datafusion/datasource-parquet/src/file_format.rs
@@ -78,7 +78,9 @@ use parquet::arrow::arrow_writer::{
ArrowWriterOptions,
};
use parquet::arrow::async_reader::MetadataFetch;
-use parquet::arrow::{ArrowSchemaConverter, ArrowWriter, AsyncArrowWriter};
+use parquet::arrow::{
+ ArrowSchemaConverter, ArrowWriter, AsyncArrowWriter,
+};
use parquet::basic::Type;
use crate::metadata::DFParquetMetadata;
@@ -1512,11 +1514,7 @@ fn spawn_parquet_parallel_serialization_task(
let max_row_group_rows = writer_props.max_row_group_size();
let col_writers = arrow_writer.get_column_writers().unwrap();
let (mut column_writer_handles, mut col_array_channels) =
- spawn_column_parallel_row_group_writer(
- col_writers,
- max_buffer_rb,
- &pool,
- )?;
+ spawn_column_parallel_row_group_writer(col_writers, max_buffer_rb, &pool)?;
let mut current_rg_rows = 0;
// TODO: row_group_writer should use the correct row group index. Currently this would fail if
// multiple row groups were written.
@@ -1599,6 +1597,8 @@ fn spawn_parquet_parallel_serialization_task(
/// Consume RowGroups serialized by other parallel tasks and concatenate them in
/// to the final parquet file, while flushing finalized bytes to an [ObjectStore]
async fn concatenate_parallel_row_groups(
+ // TODO
+ // mut arrow_writer: ArrowWriter<SharedBuffer>,
mut parquet_writer: SerializedFileWriter<SharedBuffer>,
merged_buff: SharedBuffer,
mut serialize_rx: Receiver<SpawnedTask<RBStreamSerializeResult>>,
@@ -1613,6 +1613,12 @@ async fn concatenate_parallel_row_groups(
let mut rg_out = parquet_writer.next_row_group()?;
let (serialized_columns, mut rg_reservation, _cnt) =
result.map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
+ // TODO: use arrow_writer.append_row_group
+ // let mut finalized_rg = Vec::with_capacity(serialized_columns.len());
+ // for task in serialized_columns {
+ // finalized_rg.push(task);
+ // }
+ // arrow_writer.append_row_group(finalized_rg)?;
for chunk in serialized_columns {
chunk.append_to_row_group(&mut rg_out)?;
rg_reservation.free();
@@ -1668,7 +1674,10 @@ async fn output_single_parquet_file_parallelized(
parquet_props.clone().into(),
)?;
let writer = ArrowWriter::try_new(
- merged_buff.clone(), Arc::clone(&output_schema), Some(parquet_props.clone()))?;
+ merged_buff.clone(),
+ Arc::clone(&output_schema),
+ Some(parquet_props.clone()),
+ )?;
let arc_props = Arc::new(parquet_props.clone());
let launch_serialization_task = spawn_parquet_parallel_serialization_task(
@@ -1687,6 +1696,8 @@ async fn output_single_parquet_file_parallelized(
.map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
let file_metadata = concatenate_parallel_row_groups(
+ // TODO
+ // writer,
parquet_writer,
merged_buff,
serialize_rx,
diff --git c/datafusion/datasource-parquet/src/file_format.rs i/datafusion/datasource-parquet/src/file_format.rs
index b495d9762..f3cc9d560 100644
--- c/datafusion/datasource-parquet/src/file_format.rs
+++ i/datafusion/datasource-parquet/src/file_format.rs
@@ -78,7 +78,9 @@ use parquet::arrow::arrow_writer::{
ArrowWriterOptions,
};
use parquet::arrow::async_reader::MetadataFetch;
-use parquet::arrow::{ArrowSchemaConverter, ArrowWriter, AsyncArrowWriter};
+use parquet::arrow::{
+ ArrowSchemaConverter, ArrowWriter, AsyncArrowWriter,
+};
use parquet::basic::Type;
use crate::metadata::DFParquetMetadata;
@@ -1520,11 +1522,7 @@ fn spawn_parquet_parallel_serialization_task(
let max_row_group_rows = writer_props.max_row_group_size();
let col_writers = arrow_writer.get_column_writers().unwrap();
let (mut column_writer_handles, mut col_array_channels) =
- spawn_column_parallel_row_group_writer(
- col_writers,
- max_buffer_rb,
- &pool,
- )?;
+ spawn_column_parallel_row_group_writer(col_writers, max_buffer_rb, &pool)?;
let mut current_rg_rows = 0;
// TODO: row_group_writer should use the correct row group index. Currently this would fail if
// multiple row groups were written.
@@ -1607,6 +1605,8 @@ fn spawn_parquet_parallel_serialization_task(
/// Consume RowGroups serialized by other parallel tasks and concatenate them in
/// to the final parquet file, while flushing finalized bytes to an [ObjectStore]
async fn concatenate_parallel_row_groups(
+ // TODO
+ // mut arrow_writer: ArrowWriter<SharedBuffer>,
mut parquet_writer: SerializedFileWriter<SharedBuffer>,
merged_buff: SharedBuffer,
mut serialize_rx: Receiver<SpawnedTask<RBStreamSerializeResult>>,
@@ -1621,6 +1621,12 @@ async fn concatenate_parallel_row_groups(
let mut rg_out = parquet_writer.next_row_group()?;
let (serialized_columns, mut rg_reservation, _cnt) =
result.map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
+ // TODO: use arrow_writer.append_row_group
+ // let mut finalized_rg = Vec::with_capacity(serialized_columns.len());
+ // for task in serialized_columns {
+ // finalized_rg.push(task);
+ // }
+ // arrow_writer.append_row_group(finalized_rg)?;
for chunk in serialized_columns {
chunk.append_to_row_group(&mut rg_out)?;
rg_reservation.free();
@@ -1676,7 +1682,10 @@ async fn output_single_parquet_file_parallelized(
parquet_props.clone().into(),
)?;
let writer = ArrowWriter::try_new(
- merged_buff.clone(), Arc::clone(&output_schema), Some(parquet_props.clone()))?;
+ merged_buff.clone(),
+ Arc::clone(&output_schema),
+ Some(parquet_props.clone()),
+ )?;
let arc_props = Arc::new(parquet_props.clone());
let launch_serialization_task = spawn_parquet_parallel_serialization_task(
@@ -1695,6 +1704,8 @@ async fn output_single_parquet_file_parallelized(
.map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
let file_metadata = concatenate_parallel_row_groups(
+ // TODO
+ // writer,
parquet_writer,
merged_buff,
serialize_rx,
diff --git c/datafusion/datasource-parquet/src/file_format.rs i/datafusion/datasource-parquet/src/file_format.rs
index b39df33..2d60d02 100644
--- c/datafusion/datasource-parquet/src/file_format.rs
+++ i/datafusion/datasource-parquet/src/file_format.rs
@@ -88,7 +88,6 @@ use datafusion_execution::cache::cache_manager::FileMetadataCache;
use parquet::errors::ParquetError;
use parquet::file::metadata::ParquetMetaData;
use parquet::file::properties::{WriterProperties, WriterPropertiesBuilder};
-use parquet::file::writer::SerializedFileWriter;
use parquet::format::FileMetaData;
use parquet::schema::types::SchemaDescriptor;
use tokio::io::{AsyncWrite, AsyncWriteExt};
@@ -1508,7 +1507,7 @@ fn spawn_parquet_parallel_serialization_task(
writer_props: Arc<WriterProperties>,
parallel_options: ParallelParquetWriterOptions,
pool: Arc<dyn MemoryPool>,
-) -> SpawnedTask<Result<(), DataFusionError>> {
+) -> SpawnedTask<Result<ArrowWriter<SharedBuffer>, DataFusionError>> {
SpawnedTask::spawn(async move {
let max_buffer_rb = parallel_options.max_buffered_record_batches_per_stream;
let max_row_group_rows = writer_props.max_row_group_size();
@@ -1557,7 +1556,7 @@ fn spawn_parquet_parallel_serialization_task(
// Do not surface error from closed channel (means something
// else hit an error, and the plan is shutting down).
if serialize_tx.send(finalize_rg_task).await.is_err() {
- return Ok(());
+ return Ok(arrow_writer);
}
current_rg_rows = 0;
@@ -1586,20 +1585,18 @@ fn spawn_parquet_parallel_serialization_task(
// Do not surface error from closed channel (means something
// else hit an error, and the plan is shutting down).
if serialize_tx.send(finalize_rg_task).await.is_err() {
- return Ok(());
+ return Ok(arrow_writer);
}
}
- Ok(())
+ Ok(arrow_writer)
})
}
/// Consume RowGroups serialized by other parallel tasks and concatenate them in
/// to the final parquet file, while flushing finalized bytes to an [ObjectStore]
async fn concatenate_parallel_row_groups(
- // TODO
- // mut arrow_writer: ArrowWriter<SharedBuffer>,
- mut parquet_writer: SerializedFileWriter<SharedBuffer>,
+ mut arrow_writer: ArrowWriter<SharedBuffer>,
merged_buff: SharedBuffer,
mut serialize_rx: Receiver<SpawnedTask<RBStreamSerializeResult>>,
mut object_store_writer: Box<dyn AsyncWrite + Send + Unpin>,
@@ -1610,19 +1607,14 @@ async fn concatenate_parallel_row_groups(
while let Some(task) = serialize_rx.recv().await {
let result = task.join_unwind().await;
- let mut rg_out = parquet_writer.next_row_group()?;
let (serialized_columns, mut rg_reservation, _cnt) =
result.map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
- // TODO: use arrow_writer.append_row_group
- // let mut finalized_rg = Vec::with_capacity(serialized_columns.len());
- // for task in serialized_columns {
- // finalized_rg.push(task);
- // }
- // arrow_writer.append_row_group(finalized_rg)?;
- for chunk in serialized_columns {
- chunk.append_to_row_group(&mut rg_out)?;
- rg_reservation.free();
+ let mut finalized_rg = Vec::with_capacity(serialized_columns.len());
+ for task in serialized_columns {
+ finalized_rg.push(task);
+
+ rg_reservation.free();
let mut buff_to_flush = merged_buff.buffer.try_lock().unwrap();
file_reservation.try_resize(buff_to_flush.len())?;
@@ -1634,10 +1626,10 @@ async fn concatenate_parallel_row_groups(
file_reservation.try_resize(buff_to_flush.len())?; // will set to zero
}
}
- rg_out.close()?;
+ arrow_writer.append_row_group(finalized_rg)?;
}
- let file_metadata = parquet_writer.close()?;
+ let file_metadata = arrow_writer.finish()?;
let final_buff = merged_buff.buffer.try_lock().unwrap();
object_store_writer.write_all(final_buff.as_slice()).await?;
@@ -1664,15 +1656,7 @@ async fn output_single_parquet_file_parallelized(
let (serialize_tx, serialize_rx) =
mpsc::channel::<SpawnedTask<RBStreamSerializeResult>>(max_rowgroups);
- let parquet_schema = ArrowSchemaConverter::new()
- .with_coerce_types(parquet_props.coerce_types())
- .convert(&output_schema)?;
let merged_buff = SharedBuffer::new(INITIAL_BUFFER_BYTES);
- let parquet_writer = SerializedFileWriter::new(
- merged_buff.clone(),
- parquet_schema.root_schema_ptr(),
- parquet_props.clone().into(),
- )?;
let writer = ArrowWriter::try_new(
merged_buff.clone(),
Arc::clone(&output_schema),
@@ -1690,15 +1674,13 @@ async fn output_single_parquet_file_parallelized(
Arc::clone(&pool),
);
- launch_serialization_task
+ let writer = launch_serialization_task
.join_unwind()
.await
.map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
let file_metadata = concatenate_parallel_row_groups(
- // TODO
- // writer,
- parquet_writer,
+ writer,
merged_buff,
serialize_rx,
object_store_writer,
diff --git c/datafusion/datasource-parquet/src/file_format.rs i/datafusion/datasource-parquet/src/file_format.rs
index aaed624..cf7df59 100644
--- c/datafusion/datasource-parquet/src/file_format.rs
+++ i/datafusion/datasource-parquet/src/file_format.rs
@@ -75,7 +75,7 @@ use object_store::path::Path;
use object_store::{ObjectMeta, ObjectStore};
use parquet::arrow::arrow_writer::{
compute_leaves, ArrowColumnChunk, ArrowColumnWriter, ArrowLeafColumn,
- ArrowWriterOptions,
+ ArrowRowGroupWriterFactory, ArrowWriterOptions,
};
use parquet::arrow::async_reader::MetadataFetch;
use parquet::arrow::{
@@ -88,6 +88,7 @@ use datafusion_execution::cache::cache_manager::FileMetadataCache;
use parquet::errors::ParquetError;
use parquet::file::metadata::ParquetMetaData;
use parquet::file::properties::{WriterProperties, WriterPropertiesBuilder};
+use parquet::file::writer::SerializedFileWriter;
use parquet::format::FileMetaData;
use parquet::schema::types::SchemaDescriptor;
use tokio::io::{AsyncWrite, AsyncWriteExt};
@@ -1498,18 +1499,20 @@ fn spawn_rg_join_and_finalize_task(
/// across both columns and row_groups, with a theoretical max number of parallel tasks
/// given by n_columns * num_row_groups.
fn spawn_parquet_parallel_serialization_task(
- mut arrow_writer: ArrowWriter<SharedBuffer>,
+ row_group_writer_factory: ArrowRowGroupWriterFactory,
mut data: Receiver<RecordBatch>,
serialize_tx: Sender<SpawnedTask<RBStreamSerializeResult>>,
schema: Arc<Schema>,
writer_props: Arc<WriterProperties>,
parallel_options: ParallelParquetWriterOptions,
pool: Arc<dyn MemoryPool>,
-) -> SpawnedTask<Result<ArrowWriter<SharedBuffer>, DataFusionError>> {
+) -> SpawnedTask<Result<(), DataFusionError>> {
SpawnedTask::spawn(async move {
let max_buffer_rb = parallel_options.max_buffered_record_batches_per_stream;
let max_row_group_rows = writer_props.max_row_group_size();
- let col_writers = arrow_writer.get_column_writers().unwrap();
+ let mut row_group_index = 0;
+ let col_writers =
+ row_group_writer_factory.create_column_writers(row_group_index)?;
let (mut column_writer_handles, mut col_array_channels) =
spawn_column_parallel_row_group_writer(col_writers, max_buffer_rb, &pool)?;
let mut current_rg_rows = 0;
@@ -1551,13 +1554,15 @@ fn spawn_parquet_parallel_serialization_task(
// Do not surface error from closed channel (means something
// else hit an error, and the plan is shutting down).
if serialize_tx.send(finalize_rg_task).await.is_err() {
- return Ok(arrow_writer);
+ return Ok(());
}
current_rg_rows = 0;
rb = rb.slice(rows_left, rb.num_rows() - rows_left);
- let col_writers = arrow_writer.get_column_writers().unwrap();
+ row_group_index += 1;
+ let col_writers = row_group_writer_factory
+ .create_column_writers(row_group_index)?;
(column_writer_handles, col_array_channels) =
spawn_column_parallel_row_group_writer(
col_writers,
@@ -1580,18 +1585,18 @@ fn spawn_parquet_parallel_serialization_task(
// Do not surface error from closed channel (means something
// else hit an error, and the plan is shutting down).
if serialize_tx.send(finalize_rg_task).await.is_err() {
- return Ok(arrow_writer);
+ return Ok(());
}
}
- Ok(arrow_writer)
+ Ok(())
})
}
/// Consume RowGroups serialized by other parallel tasks and concatenate them in
/// to the final parquet file, while flushing finalized bytes to an [ObjectStore]
async fn concatenate_parallel_row_groups(
- mut arrow_writer: ArrowWriter<SharedBuffer>,
+ mut parquet_writer: SerializedFileWriter<SharedBuffer>,
merged_buff: SharedBuffer,
mut serialize_rx: Receiver<SpawnedTask<RBStreamSerializeResult>>,
mut object_store_writer: Box<dyn AsyncWrite + Send + Unpin>,
@@ -1605,9 +1610,9 @@ async fn concatenate_parallel_row_groups(
let (serialized_columns, mut rg_reservation, _cnt) =
result.map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
- let mut finalized_rg = Vec::with_capacity(serialized_columns.len());
- for task in serialized_columns {
- finalized_rg.push(task);
+ let mut rg_out = parquet_writer.next_row_group()?;
+ for chunk in serialized_columns {
+ chunk.append_to_row_group(&mut rg_out)?;
rg_reservation.free();
let mut buff_to_flush = merged_buff.buffer.try_lock().unwrap();
@@ -1621,10 +1626,10 @@ async fn concatenate_parallel_row_groups(
file_reservation.try_resize(buff_to_flush.len())?; // will set to zero
}
}
- arrow_writer.append_row_group(finalized_rg)?;
+ rg_out.close()?;
}
- let file_metadata = arrow_writer.finish()?;
+ let file_metadata = parquet_writer.close()?;
let final_buff = merged_buff.buffer.try_lock().unwrap();
object_store_writer.write_all(final_buff.as_slice()).await?;
@@ -1657,10 +1662,11 @@ async fn output_single_parquet_file_parallelized(
Arc::clone(&output_schema),
Some(parquet_props.clone()),
)?;
+ let (writer, row_group_writer_factory) = writer.into_serialized_writer()?;
let arc_props = Arc::new(parquet_props.clone());
let launch_serialization_task = spawn_parquet_parallel_serialization_task(
- writer,
+ row_group_writer_factory,
data,
serialize_tx,
Arc::clone(&output_schema),
@@ -1669,11 +1675,6 @@ async fn output_single_parquet_file_parallelized(
Arc::clone(&pool),
);
- let writer = launch_serialization_task
- .join_unwind()
- .await
- .map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
-
let file_metadata = concatenate_parallel_row_groups(
writer,
merged_buff,
@@ -1683,6 +1684,11 @@ async fn output_single_parquet_file_parallelized(
)
.await?;
+ launch_serialization_task
+ .join_unwind()
+ .await
+ .map_err(|e| DataFusionError::ExecutionJoin(Box::new(e)))??;
+
Ok(file_metadata)
}
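The final two hunks of this diff move the `join_unwind()` on `launch_serialization_task` to after `concatenate_parallel_row_groups`. The diff does not state the motivation. One plausible reading is that `serialize_tx` feeds a bounded `mpsc::channel` (capacity `max_rowgroups`), so awaiting the producer before anything drains the channel could stall once the channel fills. The sketch below models only that ordering with std threads and `sync_channel`; `produce_then_drain` and its parameters are hypothetical and are not DataFusion code.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical model: a producer pushes row group indices into a bounded
/// channel while the caller drains it, and the producer is joined last.
fn produce_then_drain(num_row_groups: usize, capacity: usize) -> Vec<usize> {
    let (tx, rx) = sync_channel::<usize>(capacity);

    // Producer: blocks in `send` whenever the channel is full.
    let producer = thread::spawn(move || {
        for rg in 0..num_row_groups {
            // A closed channel means the consumer went away; stop quietly.
            if tx.send(rg).is_err() {
                return;
            }
        }
        // `tx` is dropped here, which ends the consumer's iteration.
    });

    // Consumer: drain first. Joining the producer before this point could
    // block forever once more than `capacity` items are pending.
    let received: Vec<usize> = rx.iter().collect();

    // Only now wait for the producer and surface a panic, if any.
    producer.join().expect("producer panicked");
    received
}
```

Calling `produce_then_drain(5, 1)` completes even though the channel holds a single item at a time, because the consumer is already draining while the producer runs.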
cc9f7f4 to
ea3e5bb
Compare
alamb
left a comment
Thank you @rok and @adamreeve -- I went over this carefully and it looks good to me
let max_buffer_rb = parallel_options.max_buffered_record_batches_per_stream;
let max_row_group_rows = writer_props.max_row_group_size();
let mut row_group_index = 0;
let col_writers =
It is really nice to use the ArrowRowGroupWriterFactory API here for all writing on both paths 👍
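For readers skimming the thread, here is a rough std-only model of the pattern being praised: column writers are requested per row group, with an explicit row group index, and each column is serialized on its own thread. `WriterFactory`, `ColumnWriter`, `Chunk`, and `write_row_groups` are hypothetical stand-ins, not the `parquet` crate's types. The assumption that the index matters because each chunk must know which row group it belongs to is taken from the earlier TODO about multiple row groups failing.

```rust
use std::thread;

/// Hypothetical stand-in for a serialized column chunk.
struct Chunk {
    row_group: usize,
    column: usize,
    bytes: Vec<u8>,
}

/// Hypothetical stand-in for a per-column writer bound to one row group.
struct ColumnWriter {
    row_group: usize,
    column: usize,
}

impl ColumnWriter {
    /// "Serialize" values, tagging the output with (row group, column)
    /// to mimic state that depends on the row group index.
    fn write(self, values: &[u8]) -> Chunk {
        let mut bytes = vec![self.row_group as u8, self.column as u8];
        bytes.extend_from_slice(values);
        Chunk { row_group: self.row_group, column: self.column, bytes }
    }
}

/// Hypothetical factory: hands out fresh column writers for a given index.
struct WriterFactory {
    num_columns: usize,
}

impl WriterFactory {
    fn create_column_writers(&self, row_group_index: usize) -> Vec<ColumnWriter> {
        (0..self.num_columns)
            .map(|column| ColumnWriter { row_group: row_group_index, column })
            .collect()
    }
}

/// Write each row group with one thread per column, in row group order.
fn write_row_groups(factory: &WriterFactory, row_groups: Vec<Vec<Vec<u8>>>) -> Vec<Chunk> {
    let mut out = Vec::new();
    for (row_group_index, columns) in row_groups.into_iter().enumerate() {
        let writers = factory.create_column_writers(row_group_index);
        let handles: Vec<_> = writers
            .into_iter()
            .zip(columns)
            .map(|(writer, values)| thread::spawn(move || writer.write(&values)))
            .collect();
        // Joining in column order keeps chunks ordered within the row group.
        for handle in handles {
            out.push(handle.join().expect("column writer panicked"));
        }
    }
    out
}
```

Unlike the PR, this model finishes one row group before starting the next. It only illustrates why a writer created for row group 1 differs from one created for row group 0.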
* Use `Display` formatting of `DataType`:s in error messages (#17565)
  * Use Display formatting for DataTypes where I could find them
  * fix
  * More places
  * Less Debug
  * Cargo fmt
  * More cleanup
  * Plural types as Display
  * Fixes
  * Update some more tests and error messages
  * Update test snapshot
  * last (?) fixes
  * update another slt
  * Update instructions on how to run the tests
  * Ignore pending snapshot files in .gitignore
  * Running all the tests is so slow
  * just a trailing space
  * Update another test
  * Fix markdown formatting
  * Improve Display for NativeType
  * Update code related to error reporting of NativeType
  * Revert some formatting
  * fixelyfix
  * Another snapshot update
* docs: Move Google Summer of Code 2025 pages to a section (#17504)
  * Move GSOC content to its own section
  * Update to 20205
* feat: Add `OR REPLACE` to creating external tables (#17580)
  * feat: Add `OR REPLACE` to creating external tables
  * regen
  * fmt
  * make more explicit + add tests
  * clipy fix
  ---------
  Co-authored-by: Dmitrii Blaginin <[email protected]>
* `avg(distinct)` support for decimal types (#17560)
  * chore: mv `DistinctSumAccumulator` to common
  * feat: add avg distinct support for float64 type
  * chore: fmt
  * refactor: update import for DataType in Float64DistinctAvgAccumulator and remove unused sum_distinct module
  * feat: add avg distinct support for float64 type
  * feat: add avg distinct support for decimal
  * feat: more test for avg distinct in rust api
  * Remove DataFrame API tests for avg(distinct)
  * Remove proto test
  * Fix merge errors
  * Refactoring
  * Minor cleanup
  * Decimal slt tests for avg(distinct)
  * Fix state_fields for decimal distinct avg
  ---------
  Co-authored-by: YuNing Chen <[email protected]>
  Co-authored-by: Andrew Lamb <[email protected]>
  Co-authored-by: Dmitrii Blaginin <[email protected]>
* chore(deps): bump taiki-e/install-action from 2.61.8 to 2.61.9 (#17640)
  Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.61.8 to 2.61.9.
  - [Release notes](https://github.com/taiki-e/install-action/releases)
  - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/taiki-e/install-action/compare/2fdc5fd6ac805b0f8256893bd4c807bcb666af00...8ea32481661d5e04d602f215b94f17e4014b44f9)
  ---
  updated-dependencies:
  - dependency-name: taiki-e/install-action
    dependency-version: 2.61.9
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore(deps): bump Swatinem/rust-cache from 2.8.0 to 2.8.1 (#17641)
  Bumps [Swatinem/rust-cache](https://github.com/swatinem/rust-cache) from 2.8.0 to 2.8.1.
  - [Release notes](https://github.com/swatinem/rust-cache/releases)
  - [Changelog](https://github.com/Swatinem/rust-cache/blob/master/CHANGELOG.md)
  - [Commits](https://github.com/swatinem/rust-cache/compare/98c8021b550208e191a6a3145459bfc9fb29c4c0...f13886b937689c021905a6b90929199931d60db1)
  ---
  updated-dependencies:
  - dependency-name: Swatinem/rust-cache
    dependency-version: 2.8.1
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Validate the memory consumption in SPM created by multi level merge (#17029)
  * use GreedyMemoryPool for sanity check
  * validate whether batch read from spill exceeds max_record_batch_mem
  * replace err with warn log
* fix(SubqueryAlias): use maybe_project_redundant_column (#17478)
  * fix(SubqueryAlias): use maybe_project_redundant_column
    Fixes #17405
  * chore: format
  * ci: retry
  * chore(SubqueryAlias): restructore duplicate detection and add tests
  * docs: add examples and context to the reproducer
* minor: Ensure `datafusion-sql` package dependencies have `sql` flag (#17644)
* optimizer: Rewrite `IS NOT DISTINCT FROM` joins as Hash Joins (#17319)
  * optimizer: Convert to Hash Join for join predicates like 'a IS NOT DISTINCT FROM b'
  * drop tables in slt
  * fix rust doc
  * Update datafusion/optimizer/src/extract_equijoin_predicate.rs
    Co-authored-by: Jonathan Chen <[email protected]>
  * Update datafusion/optimizer/src/extract_equijoin_predicate.rs
  * Update datafusion/sqllogictest/test_files/join_is_not_distinct_from.slt
    Co-authored-by: Nga Tran <[email protected]>
  * review: more tests and better error message
  * review: improve doc
  ---------
  Co-authored-by: Jonathan Chen <[email protected]>
  Co-authored-by: Nga Tran <[email protected]>
  Co-authored-by: Andrew Lamb <[email protected]>
* Upgrade to arrow 56.1.0 (#17275)
  * Update to arrow/parquet 56.1.0
  * Adjust for new parquet sizes, update for deprecated API
  * Thread through max_predicate_cache_size, add test
* fix: Preserves field metadata when creating logical plan for VALUES expression (#17525)
  * [ISSUE 17425] Initial attempt to fix this problem
  * Add tests for the fix
  * Require that the metadata of values in VALUES clause must be identical
  * fix merge error
  ---------
  Co-authored-by: Andrew Lamb <[email protected]>
* chore(deps): bump serde from 1.0.223 to 1.0.225 (#17614)
  Bumps [serde](https://github.com/serde-rs/serde) from 1.0.223 to 1.0.225.
  - [Release notes](https://github.com/serde-rs/serde/releases)
  - [Commits](https://github.com/serde-rs/serde/compare/v1.0.223...v1.0.225)
  ---
  updated-dependencies:
  - dependency-name: serde
    dependency-version: 1.0.225
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  Co-authored-by: Dmitrii Blaginin <[email protected]>
* chore: Update dynamic filter formatting (#17647)
  * chore: update dynamic filter formatting to indicate expr is placeholder
  * update tests
  * update tests
* chore(deps): bump taiki-e/install-action from 2.61.9 to 2.61.10 (#17660)
  Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.61.9 to 2.61.10.
  - [Release notes](https://github.com/taiki-e/install-action/releases)
  - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/taiki-e/install-action/compare/8ea32481661d5e04d602f215b94f17e4014b44f9...0aa4f22591557b744fe31e55dbfcdfea74a073f7)
  ---
  updated-dependencies:
  - dependency-name: taiki-e/install-action
    dependency-version: 2.61.10
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* proto: don't include parquet feature by default (#17577)
* feat: add support for RightAnti and RightSemi join types (#17604)
  Closes #17603
* minor: Ensure `proto` crate has datetime & unicode expr flags in datafusion dev dependency (#17656)
  * minor: Ensure `proto` crate has datetime & unicode expr flags in datafusion dev dependency
  * toml formatting
* chore(deps): bump indexmap from 2.11.3 to 2.11.4 (#17661)
  Bumps [indexmap](https://github.com/indexmap-rs/indexmap) from 2.11.3 to 2.11.4.
  - [Changelog](https://github.com/indexmap-rs/indexmap/blob/main/RELEASES.md)
  - [Commits](https://github.com/indexmap-rs/indexmap/compare/2.11.3...2.11.4)
  ---
  updated-dependencies:
  - dependency-name: indexmap
    dependency-version: 2.11.4
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* docs: add xorq to list of known users (#17668)
* Introduce `TypeSignatureClass::Binary` to allow accepting arbitrarily sized `FixedSizeBinary` arguments (#17531)
  * Introduce wildcard const for FixedSizeBinary type signature
  * Add Binary to TypeSignatureClass
  * Remove FIXED_SIZE_BINARY_WILDCARD
* docs: deduplicate links in `introduction.md` (#17669)
  * docs: deduplicate links in `introduction.md`
  * Further simplifications
  * Fix
* Add explicit PMC/committers list to governance docs page (#17574)
  * Add committers explicitly to governance page, with script
  * add license header
  * Update Wes McKinney's affiliation in governance.md
  * Update adriangb's affiliation
  * Update affiliation
  * Andy Grove Affiliation
  * Update Qi Zhu affiliation
  * Updatd linwei's info
  * Update docs/source/contributor-guide/governance.md
  * Update docs/source/contributor-guide/governance.md
  * Apply suggestions from code review
    Co-authored-by: Oleks V <[email protected]>
    Co-authored-by: Liang-Chi Hsieh <[email protected]>
  * Apply suggestions from code review
    Co-authored-by: Alex Huang <[email protected]>
    Co-authored-by: Yang Jiang <[email protected]>
    Co-authored-by: Yongting You <[email protected]>
  * Apply suggestions from code review
    Co-authored-by: Yijie Shen <[email protected]>
  * Apply suggestions from code review
  * Apply suggestions from code review
    Co-authored-by: Brent Gardner <[email protected]>
    Co-authored-by: Dmitrii Blaginin <[email protected]>
    Co-authored-by: Jax Liu <[email protected]>
    Co-authored-by: Ifeanyi Ubah <[email protected]>
  * Apply suggestions from code review
    Co-authored-by: Will Jones <[email protected]>
  * Clarify what is updated in the script
  * Apply suggestions from code review
    Co-authored-by: Paddy Horan <[email protected]>
    Co-authored-by: Dan Harris <[email protected]>
  * Update docs/source/contributor-guide/governance.md
  * Update docs/source/contributor-guide/governance.md
    Co-authored-by: Parth Chandra <[email protected]>
  * Update docs/source/contributor-guide/governance.md
  * prettier
  ---------
  Co-authored-by: Wes McKinney <[email protected]>
  Co-authored-by: Adrian Garcia Badaracco <[email protected]>
  Co-authored-by: Mustafa Akur <[email protected]>
  Co-authored-by: Qi Zhu <[email protected]>
  Co-authored-by: 张林伟 <[email protected]>
  Co-authored-by: xudong.w <[email protected]>
  Co-authored-by: Oleks V <[email protected]>
  Co-authored-by: Liang-Chi Hsieh <[email protected]>
  Co-authored-by: Alex Huang <[email protected]>
  Co-authored-by: Yang Jiang <[email protected]>
  Co-authored-by: Yongting You <[email protected]>
  Co-authored-by: Yijie Shen <[email protected]>
  Co-authored-by: Brent Gardner <[email protected]>
  Co-authored-by: Dmitrii Blaginin <[email protected]>
  Co-authored-by: Jax Liu <[email protected]>
  Co-authored-by: Ifeanyi Ubah <[email protected]>
  Co-authored-by: Will Jones <[email protected]>
  Co-authored-by: Paddy Horan <[email protected]>
  Co-authored-by: Dan Harris <[email protected]>
  Co-authored-by: Ruihang Xia <[email protected]>
  Co-authored-by: Parth Chandra <[email protected]>
* fix: Ignore governance doc from typos (#17678)
* Support Decimal32/64 types (#17501)
  * Support Decimal32/64 types
  * Fix bugs, tests, handle more aggregate functions and schema
  * Fill out more parts in expr,common and expr-common
  * Some stragglers and overlooked corners
  * Actually commit the avg_distinct support
  ---------
  Co-authored-by: Andrew Lamb <[email protected]>
* minor: Improve hygiene for `datafusion-functions` macros (#17638)
* feat(small): Display `NullEquality` in join executor's `EXPLAIN` output (#17664)
  * Clarify null-equal explain expectations
  * Format null equality display strings
  * fix test
  * review: more concise message
  * review: more concise message
* Custom timestamp format for DuckDB (#17653)
* feat(substrait): add time literal support (#17655)
  Adds support for `ScalarValue::Time64Microsecond` and `ScalarValue::Time64Nanosecond` to be converted to and from Substrait literals. This includes the `PrecisionTime` literal type and specific `TIME_64_TYPE_VARIATION_REF` for 6-digit (microseconds) and 9-digit (nanoseconds) precision.
  Co-authored-by: Bruno Volpato <[email protected]>
* Support LargeList for array_sort (#17657)
* Support FixedSizeList for array_except (#17658)
* fix: null padding for `array_reverse` on `FixedSizeList` (#17673)
  * fix: array_reverse with null
  * update
  * update
* chore: refactor array fn signatures & add more slt tests (#17672)
* Support FixedSizeList for array_to_string (#17666)
* fix: correct statistics for `NestedLoopJoinExec` (#17680)
  * fix: correct statistics for nestedloopexec
  * chore: update comment
* minor: add SQLancer fuzzed SLT case for natural joins (#17683)
* chore: Upgrade Rust version to 1.90.0 (#17677)
  * chore: bump workspace rust version to 1.90.0
  * fix clippy errors
  * fix clippy errors
  * try using dedicate runner temp space
  * retrigger
  * inspect disk usage
  * split build/run
  * disable debug info in ci profile
  * revert ci changes
* Support FixedSizeList for array_position (#17659)
* chore(deps): bump the proto group with 2 updates (#16806)
  * chore(deps): bump the proto group with 2 updates
    Bumps the proto group with 2 updates: [pbjson-build](https://github.com/influxdata/pbjson) and [prost-build](https://github.com/tokio-rs/prost).
    Updates `pbjson-build` from 0.7.0 to 0.8.0
    - [Commits](https://github.com/influxdata/pbjson/commits)
    Updates `prost-build` from 0.13.5 to 0.14.1
    - [Release notes](https://github.com/tokio-rs/prost/releases)
    - [Changelog](https://github.com/tokio-rs/prost/blob/master/CHANGELOG.md)
    - [Commits](https://github.com/tokio-rs/prost/compare/v0.13.5...v0.14.1)
    ---
    updated-dependencies:
    - dependency-name: pbjson-build
      dependency-version: 0.8.0
      dependency-type: direct:production
      update-type: version-update:semver-minor
      dependency-group: proto
    - dependency-name: prost-build
      dependency-version: 0.14.1
      dependency-type: direct:production
      update-type: version-update:semver-minor
      dependency-group: proto
    ...
    Signed-off-by: dependabot[bot] <[email protected]>
  * Regen protos
  ---------
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  Co-authored-by: Jefffrey <[email protected]>
* feat(spark): implement Spark `make_interval` function (#17424)
  * feat(spark): implement Spark make_interval function
  * fix name length
  * add doc
  * add doc and change test, need more test
  * fmt
  * add test and doc, need to work in overflow
  * clippy
  * empty params
  * test ok IntervalMonthDayNano::new(0, 0, 0) in unit test
  * line blank
  * fix doc table select
  * dont panic
  * update test and not panic fmt
  * review
  * review fix test failure
  * review fix test failure format simple string
  * test uncomment and link
  * return test (empty)
  * changes review
  * all overflow null
  * all overflow null fix fmt
  * changes review
  * changes review clippy
  * refactor move
  * fix error doc date_sub
  * clean slt
  * no space device
* chore: Update READMEs of crates to be more consistent (#17691)
  * chore: Update READMEs of crates to be more consistent
  * Add some more Apache project links
  * Minor formatting
  * Formatting
  * Update datafusion/pruning/README.md
    Co-authored-by: Andrew Lamb <[email protected]>
  * suggestion
  * formatting
  * formatting
  ---------
  Co-authored-by: Andrew Lamb <[email protected]>
* chore: update a bunch of dependencies (#17708)
* chore: fix wasm-pack installation link in wasmtest README (#17704)
* Support FixedSizeList for array_slice via coercion to List (#17667)
* docs: Remove disclaimer that `datafusion` 50.0.0 is not released (#17695)
  * docs: Remove disclaimer that datafusion 50.0.0 is not released
  * Add section about 51.0.0
* chore(deps): bump taiki-e/install-action from 2.61.10 to 2.62.1 (#17710)
  Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.61.10 to 2.62.1.
  - [Release notes](https://github.com/taiki-e/install-action/releases)
  - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/taiki-e/install-action/compare/0aa4f22591557b744fe31e55dbfcdfea74a073f7...d6912b47771be2c443ec90dbb3d28e023987e782)
  ---
  updated-dependencies:
  - dependency-name: taiki-e/install-action
    dependency-version: 2.62.1
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* perf: Improve the performance of WINDOW functions with many partitions (#17528)
  * perf: Improve the performance of WINDOW functions with many partitions
  * Improve variable name in calculate_n_out_row
* fix: Partial AggregateMode will generate duplicate field names which will fail DFSchema construct (#17706)
  * fix: Partial AggregateMode will generate duplicate field names which will fail DFSchema construct
  * Update datafusion/common/src/dfschema.rs
    Co-authored-by: Andrew Lamb <[email protected]>
  * fmt
  ---------
  Co-authored-by: Andrew Lamb <[email protected]>
* feat: expose `udafs` and `udwfs` methods on `FunctionRegistry` (#17650)
  * expose udafs and udwfs method on `FunctionRegistry`
  * fix doc test
  * add default implementations not to trigger backward incompatible change for others
* Support remaining substrait time literal variations (#17707)
* Bump MSRV to 1.87.0 (#17724)
  * Bump MSRV to 1.87.0
  * automatic code fixes
  * Add upgrading entry
* Avoid redundant Schema clones (#17643)
  * Collocate variants of From DFSchema to Schema
  * Remove duplicated logic for obtaining Schema from DFSchema
  * Remove Arc clone in hash_nested_array
  * Avoid redundant Schema clones
  * Avoid some Field clones
  * make arc clones explicit
  * retract the new From
  * empty: roll the dice 🎲
* Use github link instead of relative link to optimizer_rule.rs in query-optimizer.md (#17723)
* Move misplaced upgrading entry about MSRV (#17727)
* Introduce `avg_distinct()` and `sum_distinct()` functions to DataFrame API (#17536)
  * Introduce `avg_distinct()` and `sum_distinct()` functions to DataFrame API
  * Add to roundtrip proto tests
* Support `WHERE`, `ORDER BY`, `LIMIT`, `SELECT`, `EXTEND` pipe operators (#17278)
  * support WHERE pipe operator
  * support order by
  * support limit
  * select pipe
  * extend support
  * document supported pipe operators in user guide
  * fmt
  * fix where pipe before extend
  * don't rebind
  * remove clone
  * move docs into select.md
  * avoid confusion by removing `>` in examples
  ---------
  Co-authored-by: Jeffrey Vo <[email protected]>
* doc: add missing examples for multiple math functions (#17018)
  * Update Scalar_functions.md
  * pretier fix
  * Updated files
  * Updated Scalar functions
  * Update datafusion/functions/src/math/log.rs
    Co-authored-by: Jeffrey Vo <[email protected]>
  * Update datafusion/functions/src/math/monotonicity.rs
    Co-authored-by: Jeffrey Vo <[email protected]>
  * Update datafusion/functions/src/math/monotonicity.rs
    Co-authored-by: Jeffrey Vo <[email protected]>
  * Update datafusion/functions/src/math/nans.rs
    Co-authored-by: Jeffrey Vo <[email protected]>
  * Update datafusion/functions/src/math/nanvl.rs
    Co-authored-by: Jeffrey Vo <[email protected]>
  * Fix tanh example to be tanh not trunc
  * Run update_function_docs.sh
  ---------
  Co-authored-by: Jeffrey Vo <[email protected]>
* feat: support for null, date, and timestamp types in approx_distinct (#17618)
  * feat: let approx_distinct handle null, date and timestamp types
    Signed-off-by: Dennis Zhuang <[email protected]>
  * chore: update testing submodule
    Signed-off-by: Dennis Zhuang <[email protected]>
  * feat: supports time type and refactor NullHLLAccumulator
    Signed-off-by: Dennis Zhuang <[email protected]>
  * bump arrow-testing submodule
  ---------
  Signed-off-by: Dennis Zhuang <[email protected]>
  Co-authored-by: Jefffrey <[email protected]>
* fix(agg/corr): return NULL when variance is zero or samples < 2 (#17621)
  Signed-off-by: Dennis Zhuang <[email protected]>
* chore(deps): bump taiki-e/install-action from 2.62.1 to 2.62.4 (#17739)
  Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.1 to 2.62.4.
  - [Release notes](https://github.com/taiki-e/install-action/releases)
  - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md)
  - [Commits](https://github.com/taiki-e/install-action/compare/d6912b47771be2c443ec90dbb3d28e023987e782...5597bc27da443ba8bf9a3bc4e5459ea59177de42)
  ---
  updated-dependencies:
  - dependency-name: taiki-e/install-action
    dependency-version: 2.62.4
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore(deps): bump tempfile from 3.22.0 to 3.23.0 (#17741)
  Bumps [tempfile](https://github.com/Stebalien/tempfile) from 3.22.0 to 3.23.0.
  - [Changelog](https://github.com/Stebalien/tempfile/blob/master/CHANGELOG.md)
  - [Commits](https://github.com/Stebalien/tempfile/compare/v3.22.0...v3.23.0)
  ---
  updated-dependencies:
  - dependency-name: tempfile
    dependency-version: 3.23.0
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore: make `LimitPushPastWindows` public (#17736)
* fix: Remove parquet encryption feature from root deps (#17700)
  This fix relates to issue #16650 by completing #16649 .
* fix: Remove datafusion-macros's dependency on datafusion-expr (#17688) * Remove datafusion-macros's dependency on datafusion-expr * Re-export * chore: remove homebrew publish instructions from release steps (#17735) * minor: create `OptimizerContext` with provided `ConfigOptions` (#17742) * Improve documentation for ordered set aggregate functions (#17744) * docs: fix sidebar overlapping table on configuration page on website (#17738) * solved bug * fix:modified css for table overlapping * Add support for calling async UDF as aggregation expression (#17620) * Add support for calling async UDF as aggregation expression Fixes https://github.com/apache/datafusion/issues/17619 * add explain plans * chore(deps): bump taiki-e/install-action from 2.62.4 to 2.62.5 (#17750) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.4 to 2.62.5. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/5597bc27da443ba8bf9a3bc4e5459ea59177de42...6f69ec9970ed0c500b1b76d648e05c4c7e0e5671) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.5 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * (fix): Lag function creates unwanted projection (#17630) (#17639) * fix: Not adding generated windown expr resulting column twice (#17630) * Making clippy happier * Support `LargeList` in `array_has` simplification to `InList` (#17732) * Support `LargeList` in `array_has` simplification to `InList` * refactoring * chore(deps): bump wasm-bindgen-test from 0.3.51 to 0.3.53 (#17642) * chore(deps): bump wasm-bindgen-test from 0.3.51 to 0.3.53 Bumps [wasm-bindgen-test](https://github.com/wasm-bindgen/wasm-bindgen) from 0.3.51 to 0.3.53. - [Release notes](https://github.com/wasm-bindgen/wasm-bindgen/releases) - [Changelog](https://github.com/wasm-bindgen/wasm-bindgen/blob/main/CHANGELOG.md) - [Commits](https://github.com/wasm-bindgen/wasm-bindgen/commits) --- updated-dependencies: - dependency-name: wasm-bindgen-test dependency-version: 0.3.53 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> * testing setting WASM_BINDGEN_TEST_TIMEOUT * more testing * more testing * more testing * more testing * more testing * testing * testing * testing * testing * whoops * whoops * testing * testing * testing * testing * testing * testing * testing * testing * testing * testing * testing * testing * problem commit * please let this work * oops * test 0.3.53 * fix --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jeffrey Vo <[email protected]> * feat: support `Utf8View` for more args of `regexp_replace` (#17195) * Stash changes. * Signature cleanup, more test scenarios. * Minor test renaming. * Simplify signature. * Update tests. * Signature change for binary input support. * Return type changes for binary. * Stash. * Stash. * Stash. * Stash. * Fix regx bench. 
* Clippy. * Fix bench regx. * Refactor signature. I need to remove the match arms that aren't used anymore, update the .slt test for string_view.slt, and understand why String(3) and String(4) is not equivalent to this. * Remove unnecessary match arms. * Update string_view slt test. * Reduce diff by returning to single function with a match arm instead of two. * Simplify template args. * Fix benchmark compilation. * Address PR feedback. * feat(spark): implement Spark `map` function `map_from_arrays` (#17456) * feat(spark): implement Spark `map` function `map_from_arrays` * chore: add test with nested `map_from_arrays` calls, refactor map_deduplicate_keys to remove unnesessary variables and array slices * fix: clippy warning * fix: null and different size input lists treatment, chore: move common map funcs to utils.rs, add more tests * fix: typo * fix: clippy docstring warning * chore: move more helpers needed for multiple map functions to utils * chore: add multi-row tests * fix: null values treatment * fix: docstring warnings * chore(deps): bump object_store from 0.12.3 to 0.12.4 (#17753) Bumps [object_store](https://github.com/apache/arrow-rs-object-store) from 0.12.3 to 0.12.4. - [Changelog](https://github.com/apache/arrow-rs-object-store/blob/main/CHANGELOG-old.md) - [Commits](https://github.com/apache/arrow-rs-object-store/compare/v0.12.3...v0.12.4) --- updated-dependencies: - dependency-name: object_store dependency-version: 0.12.4 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update `arrow` / `parquet` to 56.2.0 (#17631) * temp update to arrow 56.2.0 pin * Update to 56.2.0 * Use released arrow * Update cargo.lock * fix lock * chore(deps): bump taiki-e/install-action from 2.62.5 to 2.62.6 (#17766) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.5 to 2.62.6. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/6f69ec9970ed0c500b1b76d648e05c4c7e0e5671...4575ae687efd0e2c78240087f26013fb2484987f) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.6 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Keep aggregate udaf schema names unique when missing an order-by (#17731) * test: reproducer of bug * fix: make schema names unique for approx_percentile_cont * test: regression test is now resolved * feat : Display function alias in output column name (#17690) * display function's alias name in output column * Update function.rs * updated verbose name format * simplify alias logic and removing args clone * Support join cardinality estimation less conservatively (#17476) * Support join cardinality estimation if distinct_count is set Currently we require max and min to be set, as they might be used to estimate the distinct count. This is unnecessarily conservative if distinct_count has actually been provided, in which case max and min won't be used at all and the presence of max or min has no influence over how good of an estimate it is. 
* Update datafusion/physical-plan/src/joins/utils.rs Co-authored-by: Piotr Findeisen <[email protected]> * Update tests * Calculate cardinality even if distinct or min/max not provided --------- Co-authored-by: Piotr Findeisen <[email protected]> * chore(deps): bump libc from 0.2.175 to 0.2.176 (#17767) Bumps [libc](https://github.com/rust-lang/libc) from 0.2.175 to 0.2.176. - [Release notes](https://github.com/rust-lang/libc/releases) - [Changelog](https://github.com/rust-lang/libc/blob/0.2.176/CHANGELOG.md) - [Commits](https://github.com/rust-lang/libc/compare/0.2.175...0.2.176) --- updated-dependencies: - dependency-name: libc dependency-version: 0.2.176 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump postgres-types from 0.2.9 to 0.2.10 (#17768) Bumps [postgres-types](https://github.com/rust-postgres/rust-postgres) from 0.2.9 to 0.2.10. - [Release notes](https://github.com/rust-postgres/rust-postgres/releases) - [Commits](https://github.com/rust-postgres/rust-postgres/compare/postgres-types-v0.2.9...postgres-types-v0.2.10) --- updated-dependencies: - dependency-name: postgres-types dependency-version: 0.2.10 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Use `Expr::qualified_name()` and `Column::new()` to extract partition keys from window and aggregate operators (#17757) * Use `Expr::qualified_name()` and `Column::new()` to extract partition keys Using `Expr::schema_name()` and `Column::from_qualified_name()` could incorrectly parse the column name. 
* Use `Expr::qualified_name()` to extract group by keys * Retrain dataframe tests with filters and aggregates * Prevent exponential planning time for Window functions - v2 (#17684) * fix * Update mod.rs * Update mod.rs * Update mod.rs * tests copied from v1 pr * test case from review comment https://github.com/apache/datafusion/pull/17684#discussion_r2366146307 * one more test case * Update mod.rs * Update datafusion/physical-plan/src/windows/mod.rs Co-authored-by: Andrew Lamb <[email protected]> * Update datafusion/physical-plan/src/windows/mod.rs Co-authored-by: Andrew Lamb <[email protected]> * Update mod.rs * Update mod.rs --------- Co-authored-by: Piotr Findeisen <[email protected]> Co-authored-by: Andrew Lamb <[email protected]> * docs: add Ballista link to landing page (#17746) (#17775) * docs: add Ballista link to landing page (#17746) This adds a link and description for DataFusion Ballista to the landing page, as suggested in issue #17746. Ballista is a distributed compute platform built on top of DataFusion. Closes: #17746 * fix(docs): update Ballista link * updated theory part * chore(deps): bump taiki-e/install-action from 2.62.6 to 2.62.8 (#17781) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.6 to 2.62.8. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/4575ae687efd0e2c78240087f26013fb2484987f...ea0eda622640ac23a17ba349cf09e2709d58f5e1) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.8 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump wasm-bindgen-test from 0.3.53 to 0.3.54 (#17784) Bumps [wasm-bindgen-test](https://github.com/wasm-bindgen/wasm-bindgen) from 0.3.53 to 0.3.54. - [Release notes](https://github.com/wasm-bindgen/wasm-bindgen/releases) - [Changelog](https://github.com/wasm-bindgen/wasm-bindgen/blob/main/CHANGELOG.md) - [Commits](https://github.com/wasm-bindgen/wasm-bindgen/commits) --- updated-dependencies: - dependency-name: wasm-bindgen-test dependency-version: 0.3.54 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore: Action some old TODOs in github actions (#17694) * chore: Action some old TODOs in github actions * Update Cargo.toml * testing * Revert changing cli test runner to use container * Remove sccache * dev: Add benchmark for compilation profiles (#17754) * Add benchmark for compilation profiles * add apache header * add apache header * chore(deps): bump tokio-postgres from 0.7.13 to 0.7.14 (#17785) Bumps [tokio-postgres](https://github.com/rust-postgres/rust-postgres) from 0.7.13 to 0.7.14. - [Release notes](https://github.com/rust-postgres/rust-postgres/releases) - [Commits](https://github.com/rust-postgres/rust-postgres/compare/tokio-postgres-v0.7.13...tokio-postgres-v0.7.14) --- updated-dependencies: - dependency-name: tokio-postgres dependency-version: 0.7.14 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump serde from 1.0.226 to 1.0.227 (#17783) Bumps [serde](https://github.com/serde-rs/serde) from 1.0.226 to 1.0.227. 
- [Release notes](https://github.com/serde-rs/serde/releases) - [Commits](https://github.com/serde-rs/serde/compare/v1.0.226...v1.0.227) --- updated-dependencies: - dependency-name: serde dependency-version: 1.0.227 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump regex from 1.11.2 to 1.11.3 (#17782) Bumps [regex](https://github.com/rust-lang/regex) from 1.11.2 to 1.11.3. - [Release notes](https://github.com/rust-lang/regex/releases) - [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-lang/regex/compare/1.11.2...1.11.3) --- updated-dependencies: - dependency-name: regex dependency-version: 1.11.3 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Support `CAST` from temporal to `Utf8View` (#17535) * Add case expr simplifiers for literal comparisons (#17743) * Add case expr simplifiers for literal comparisons * Update datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs Co-authored-by: Andrew Lamb <[email protected]> * Avoid expr clones --------- Co-authored-by: Andrew Lamb <[email protected]> * chore: dependabot to run weekly (#17797) * [DOCS] Add dbt Fusion engine and R2 Query Engine to "Known Users" (#17793) * Add dbt Fusion engine and R2 Query Engine * Update docs/source/user-guide/introduction.md * Update docs/source/user-guide/introduction.md * feat: change `datafusion-proto` to use `TaskContext` rather than`SessionContext` for physical plan serialization (#17601) * change session context to task context in physical proto ... 
* fix compilation issue * remove `RuntimeEnv` from few function arguments * update upgrading guide * display window function's alias name in output (#17788) * docs: update wasmtest README with instructions for Apple silicon (#17755) * chore(deps): bump sysinfo from 0.37.0 to 0.37.1 (#17800) Bumps [sysinfo](https://github.com/GuillaumeGomez/sysinfo) from 0.37.0 to 0.37.1. - [Changelog](https://github.com/GuillaumeGomez/sysinfo/blob/master/CHANGELOG.md) - [Commits](https://github.com/GuillaumeGomez/sysinfo/compare/v0.37.0...v0.37.1) --- updated-dependencies: - dependency-name: sysinfo dependency-version: 0.37.1 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * chore(deps): bump taiki-e/install-action from 2.62.8 to 2.62.9 (#17799) Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.62.8 to 2.62.9. - [Release notes](https://github.com/taiki-e/install-action/releases) - [Changelog](https://github.com/taiki-e/install-action/blob/main/CHANGELOG.md) - [Commits](https://github.com/taiki-e/install-action/compare/ea0eda622640ac23a17ba349cf09e2709d58f5e1...71d339ebf191fcbc3d49cd04b9484a4261f29975) --- updated-dependencies: - dependency-name: taiki-e/install-action dependency-version: 2.62.9 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * feat(spark): implement Spark `make_dt_interval` function (#17728) * feat(spark): implement Spark make_dt_interval function * fmt * delete pub * test slt * fmt * overflow -> null * sugested changes * fmt * only res in slt * null not void type * explain types * explain types fix url * better comment * Fix potential overflow when we print verbose physical plan (#17798) * change debug to trace for potential overflow * fix comments. * fix * Add SedonaDB as known user to Apache DataFusion (#17806) * Extend datatype semantic equality check to include timestamps (#17777) * Extend datatype semantic equality to include timestamps * test * Respond to comments * cargo fmt --------- Co-authored-by: Shiv Bhatia <[email protected]> * fix: Filter out nulls properly in approx_percentile_cont_with_weight (#17780) * chore: refactor usage of `reassign_predicate_columns` (#17703) * chore: refactor usage of `reassign_predicate_columns` * chore: Address PR comments --------- Co-authored-by: Andrew Lamb <[email protected]> * dev: Add Apache license check to the lint script (#17787) * Add liscense checker ci script * fix the deliberately added bad license header * review: use dev profile and pin the version * Fix: common_sub_expression_eliminate optimizer rule failed (#16066) The `common_sub_expression_eliminate` rule failed with the error `SchemaError(FieldNotFound {field: <name>}, valid_fields: []})` because the schema was changed by the second application of `find_common_exprs`. As I understand it, the problem came from calling `find_common_exprs` twice in sequence. The first call returned the original names as `aggr_expr` and the changed names as `new_aggr_expr`. The second call considers only `new_aggr_expr`, so if the names were already changed by the first call, it returns the changed names as `aggr_expr` (as if they were the original ones) and passes them into the Projection logic.
I used NamePreserver mechanism to restore original schema names and generate Projection with original name at the end of aggregate optimization. Co-authored-by: Andrew Lamb <[email protected]> * feat: support multi-threaded writing of Parquet files with modular encryption (#16738) * Initial commit diff --git c/Cargo.lock i/Cargo.lock index 749971532..f0b9d0a5f 100644 --- c/Cargo.lock +++ i/Cargo.lock @@ -246,52 +246,62 @@ checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50" [[package]] name = "arrow" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "fd798aea3553913a5986813e9c6ad31a2d2b04e931fe8ea4a37155eb541cebb5" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-arith", - "arrow-array", - "arrow-buffer", - "arrow-cast", + "arrow-arith 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "arrow-csv", - "arrow-data", - "arrow-ipc", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "arrow-json", - "arrow-ord", + "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "arrow-pyarrow", - "arrow-row", - "arrow-schema", - "arrow-select", - "arrow-string", + "arrow-row 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 
(git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-string 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "half", "rand 0.9.2", ] [[package]] name = "arrow-arith" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "508dafb53e5804a238cab7fd97a59ddcbfab20cc4d9814b1ab5465b9fa147f2e" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "chrono", + "num", +] + +[[package]] +name = "arrow-arith" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "chrono", "num", ] [[package]] name = "arrow-array" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "e2730bc045d62bb2e53ef8395b7d4242f5c8102f41ceac15e8395b9ac3d08461" +version = "55.2.0" +source = 
"git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ "ahash 0.8.12", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "chrono", "chrono-tz", "half", @@ -299,11 +309,35 @@ dependencies = [ "num", ] +[[package]] +name = "arrow-array" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "ahash 0.8.12", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "chrono", + "half", + "hashbrown 0.15.4", + "num", +] + [[package]] name = "arrow-buffer" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "54295b93beb702ee9a6f6fbced08ad7f4d76ec1c297952d4b83cf68755421d1d" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" +dependencies = [ + "bytes", + "half", + "num", +] + +[[package]] +name = "arrow-buffer" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" dependencies = [ "bytes", "half", @@ -312,15 +346,14 @@ dependencies = [ [[package]] name = "arrow-cast" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "67e8bcb7dc971d779a7280593a1bf0c2743533b8028909073e804552e85e75b5" +version = "55.2.0" +source = 
"git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", - "arrow-select", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "atoi", "base64 0.22.1", "chrono", @@ -332,14 +365,32 @@ dependencies = [ ] [[package]] -name = "arrow-csv" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "673fd2b5fb57a1754fdbfac425efd7cf54c947ac9950c1cce86b14e248f1c458" +name = "arrow-cast" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" dependencies = [ - "arrow-array", - "arrow-cast", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "atoi", + "base64 0.22.1", + "chrono", + "half", + "lexical-core", + "num", + "ryu", +] + +[[package]] +name = "arrow-csv" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-cast 55.2.0 
(git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "chrono", "csv", "csv-core", @@ -348,33 +399,42 @@ dependencies = [ [[package]] name = "arrow-data" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "97c22fe3da840039c69e9f61f81e78092ea36d57037b4900151f063615a2f6b4" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-buffer", - "arrow-schema", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "half", + "num", +] + +[[package]] +name = "arrow-data" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "half", "num", ] [[package]] name = "arrow-flight" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "6808d235786b721e49e228c44dd94242f2e8b46b7e95b233b0733c46e758bfee" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" dependencies = [ - "arrow-arith", - "arrow-array", - "arrow-buffer", - "arrow-cast", - "arrow-data", - "arrow-ipc", - "arrow-ord", - "arrow-row", - "arrow-schema", - "arrow-select", - "arrow-string", + "arrow-arith 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + 
"arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-row 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-string 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "base64 0.22.1", "bytes", "futures", @@ -382,35 +442,45 @@ dependencies = [ "paste", "prost", "prost-types", - "tonic", + "tonic 0.12.3", ] [[package]] name = "arrow-ipc" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "778de14c5a69aedb27359e3dd06dd5f9c481d5f6ee9fbae912dba332fd64636b" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "flatbuffers", "lz4_flex", "zstd", ] [[package]] -name = "arrow-json" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "3860db334fe7b19fcf81f6b56f8d9d95053f3839ffe443d56b5436f7a29a1794" +name = "arrow-ipc" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-cast", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + 
"arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "flatbuffers", +] + +[[package]] +name = "arrow-json" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-cast 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "chrono", "half", "indexmap 2.10.0", @@ -424,78 +494,130 @@ dependencies = [ [[package]] name = "arrow-ord" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "425fa0b42a39d3ff55160832e7c25553e7f012c3f187def3d70313e7a29ba5d9" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", - "arrow-select", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-select 55.2.0 
(git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", +] + +[[package]] +name = "arrow-ord" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)", ] [[package]] name = "arrow-pyarrow" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d944d8ae9b77230124e6570865b570416c33a5809f32c4136c679bbe774e45c9" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "pyo3", ] [[package]] name = "arrow-row" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "df9c9423c9e71abd1b08a7f788fcd203ba2698ac8e72a1f236f1faa1a06a7414" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 
55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "half", +] + +[[package]] +name = "arrow-row" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "half", ] [[package]] name = "arrow-schema" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "85fa1babc4a45fdc64a92175ef51ff00eba5ebbc0007962fecf8022ac1c6ce28" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ "bitflags 2.9.1", "serde", "serde_json", ] +[[package]] +name = "arrow-schema" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" + [[package]] name = "arrow-select" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "d8854d15f1cf5005b4b358abeb60adea17091ff5bdd094dca5d3f73787d81170" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ "ahash 0.8.12", - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + 
"arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "num", +] + +[[package]] +name = "arrow-select" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "ahash 0.8.12", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "num", ] [[package]] name = "arrow-string" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "2c477e8b89e1213d5927a2a84a72c384a9bf4dd0dbf15f9fd66d821aafd9e95e" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing#b9396ccee27a39c91feccc982f5e976f0c0ff6d8" dependencies = [ - "arrow-array", - "arrow-buffer", - "arrow-data", - "arrow-schema", - "arrow-select", + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "memchr", + "num", + "regex", + "regex-syntax", +] + +[[package]] +name = "arrow-string" +version = "55.2.0" +source = "git+https://github.com/rok/arrow-rs.git#674dc17b2c423be16d0725a6537b0063ac7b1b58" +dependencies = [ + "arrow-array 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-buffer 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-data 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + 
"arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git)", + "arrow-select 55.2.0 (git+https://github.com/rok/arrow-rs.git)", "memchr", "num", "regex", @@ -567,6 +689,28 @@ dependencies = [ "syn 2.0.106", ] +[[package]] +name = "async-stream" +version = "0.3.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0b5a71a6f37880a80d1d7f19efd781e4b5de42c88f0722cc13bcb6cc2cfe8476" +dependencies = [ + "async-stream-impl", + "futures-core", + "pin-project-lite", +] + +[[package]] +name = "async-stream-impl" +version = "0.3.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c7c24de15d275a1ecfd47a380fb4d5ec9bfe0933f309ed5e705b775596a3574d" +dependencies = [ + "proc-macro2", + "quote", + "syn 2.0.104", +] + [[package]] name = "async-trait" version = "0.1.89" @@ -827,7 +971,7 @@ dependencies = [ "rustls-native-certs", "rustls-pki-types", "tokio", - "tower", + "tower 0.5.2", "tracing", ] @@ -948,18 +1092,19 @@ dependencies = [ [[package]] name = "axum" -version = "0.8.4" +version = "0.7.9" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "021e862c184ae977658b36c4500f7feac3221ca5da43e3f25bd04ab6c79a29b5" +checksum = "edca88bc138befd0323b20752846e6587272d3b03b0343c8ea28a6f819e6e71f" dependencies = [ - "axum-core", + "async-trait", + "axum-core 0.4.5", "bytes", "futures-util", "http 1.3.1", "http-body 1.0.1", "http-body-util", "itoa", - "matchit", + "matchit 0.7.3", "memchr", "mime", "percent-encoding", @@ -967,7 +1112,53 @@ dependencies = [ "rustversion", "serde", "sync_wrapper", - "tower", + "tower 0.5.2", + "tower-layer", + "tower-service", +] + +[[package]] +name = "axum" +version = "0.8.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "021e862c184ae977658b36c4500f7feac3221ca5da43e3f25bd04ab6c79a29b5" +dependencies = [ + "axum-core 0.5.2", + "bytes", + "futures-util", + "http 1.3.1", + "http-body 1.0.1", + "http-body-util", + "itoa", + "matchit 0.8.4", 
+ "memchr", + "mime", + "percent-encoding", + "pin-project-lite", + "rustversion", + "serde", + "sync_wrapper", + "tower 0.5.2", + "tower-layer", + "tower-service", +] + +[[package]] +name = "axum-core" +version = "0.4.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "09f2bd6146b97ae3359fa0cc6d6b376d9539582c7b4220f041a33ec24c226199" +dependencies = [ + "async-trait", + "bytes", + "futures-util", + "http 1.3.1", + "http-body 1.0.1", + "http-body-util", + "mime", + "pin-project-lite", + "rustversion", + "sync_wrapper", "tower-layer", "tower-service", ] @@ -1818,8 +2009,8 @@ name = "datafusion" version = "49.0.1" dependencies = [ "arrow", - "arrow-ipc", - "arrow-schema", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "async-trait", "bytes", "bzip2 0.6.0", @@ -1996,7 +2187,7 @@ dependencies = [ "ahash 0.8.12", "apache-avro", "arrow", - "arrow-ipc", + "arrow-ipc 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "base64 0.22.1", "chrono", "half", @@ -2176,7 +2367,7 @@ version = "49.0.1" dependencies = [ "arrow", "arrow-flight", - "arrow-schema", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "async-trait", "base64 0.22.1", "bytes", @@ -2197,7 +2388,7 @@ dependencies = [ "tempfile", "test-utils", "tokio", - "tonic", + "tonic 0.13.1", "tracing", "tracing-subscriber", "url", @@ -2264,7 +2455,7 @@ version = "49.0.1" dependencies = [ "abi_stable", "arrow", - "arrow-schema", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "async-ffi", "async-trait", "datafusion", @@ -2284,7 +2475,7 @@ name = "datafusion-functions" version = "49.0.1" dependencies = [ "arrow", - "arrow-buffer", + "arrow-buffer 55.2.0 
(git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "base64 0.22.1", "blake2", "blake3", @@ -2347,7 +2538,7 @@ name = "datafusion-functions-nested" version = "49.0.1" dependencies = [ "arrow", - "arrow-ord", + "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "criterion", "datafusion-common", "datafusion-doc", @@ -2517,8 +2708,8 @@ version = "49.0.1" dependencies = [ "ahash 0.8.12", "arrow", - "arrow-ord", - "arrow-schema", + "arrow-ord 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "async-trait", "chrono", "criterion", @@ -2589,7 +2780,7 @@ name = "datafusion-pruning" version = "49.0.1" dependencies = [ "arrow", - "arrow-schema", + "arrow-schema 55.2.0 (git+https://github.com/rok/arrow-rs.git?branch=multi-threaded_encrypted_writing)", "datafusion-common", "datafusion-datasource", "datafusion-expr", @@ -4157,6 +4348,12 @@ dependencies = [ "pkg-config", ] +[[package]] +name = "matchit" +version = "0.7.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0e7465ac9959cc2b1404e8e2367b43684a6d13790fe23056cc8c6c5a6b7bcb94" + [[package]] name = "matchit" version = "0.8.4" @@ -4529,18 +4726,17 @@ dependencies = [ [[package]] name = "parquet" -version = "56.0.0" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "c7288a07e…
Which issue does this PR close?
Rationale for this change
#16351 added modular encryption reading and writing. This PR builds on #16351 and uses apache/arrow-rs#7818 to enable multi-threaded encrypted writing.
What changes are included in this PR?
This uses the lower-level ArrowRowGroupWriterFactory API to create encryption-aware column writers, which are then run on multiple threads.
Are these changes tested?
Yes.
Are there any user-facing changes?
Previously, parquet_opts.global.allow_single_file_parallelism == true was ignored for encrypted writes, which always ran single-threaded. Now encrypted writes run multi-threaded when this option is set.
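As a rough illustration of the "encryption-aware column writers run on multiple threads" idea, here is a minimal std-only sketch. It does not use the real parquet or DataFusion APIs: `ColumnContext`, `toy_encrypt`, and `write_row_group` are hypothetical names, and the XOR "cipher" is only a stand-in. The one point it tries to show is that each column worker needs both the row group ordinal and the column ordinal, since Parquet modular encryption includes both in the AAD of each encrypted module. This is why the row group index TODO raised in review matters once more than one row group is written.

```rust
use std::thread;

/// Hypothetical stand-in for a per-column encryption context.
/// Both ordinals are carried so the output depends on where the
/// column chunk sits in the file.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ColumnContext {
    row_group_ordinal: usize,
    column_ordinal: usize,
}

/// Toy "encryption": XOR every byte with a mask derived from the context.
/// This is NOT real cryptography; it only makes the result depend on the ordinals.
fn toy_encrypt(ctx: ColumnContext, plain: &[u8]) -> Vec<u8> {
    let mask = (ctx.row_group_ordinal as u8).wrapping_mul(31)
        ^ (ctx.column_ordinal as u8).wrapping_add(1);
    plain.iter().map(|b| b ^ mask).collect()
}

/// Encode all columns of one row group in parallel and return the
/// chunks in column order.
fn write_row_group(
    row_group_ordinal: usize,
    columns: Vec<Vec<u8>>,
) -> Vec<(ColumnContext, Vec<u8>)> {
    let handles: Vec<_> = columns
        .into_iter()
        .enumerate()
        .map(|(column_ordinal, data)| {
            let ctx = ColumnContext { row_group_ordinal, column_ordinal };
            // One worker per column; each owns its data and its context.
            thread::spawn(move || (ctx, toy_encrypt(ctx, &data)))
        })
        .collect();

    // Joining in spawn order keeps the chunks in column order,
    // regardless of which worker finishes first.
    handles
        .into_iter()
        .map(|h| h.join().expect("column worker panicked"))
        .collect()
}

fn main() {
    let row_groups = vec![
        vec![b"a0".to_vec(), b"b0".to_vec()],
        vec![b"a1".to_vec(), b"b1".to_vec()],
    ];
    for (rg, cols) in row_groups.into_iter().enumerate() {
        for (ctx, chunk) in write_row_group(rg, cols) {
            println!("{:?} -> {} bytes", ctx, chunk.len());
        }
    }
}
```

In this sketch, passing the wrong row group ordinal to a worker would silently produce different bytes for the same input, which is the toy analogue of a reader failing to decrypt a chunk written with the wrong index.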