Contributor

@ding-young ding-young commented Jul 12, 2025

Which issue does this PR close?

Rationale for this change

As described in the issue above, when constructing a StringViewArray from rows, we currently store inline strings twice: once through make_view, and again in the values buffer so that we can validate UTF-8 in one go. This wastes memory, so ideally we should avoid placing inline strings into the values buffer when UTF-8 validation is disabled.

What changes are included in this PR?

When UTF-8 validation is disabled, this PR modifies the string/bytes view array construction from rows as follows:

  1. The capacity of the values buffer is set to accommodate only long strings plus 12 bytes for a single inline string placeholder.

  2. All decoded strings are initially appended to the values buffer.

  3. If a string turns out to be an inline string, it is included via make_view, and then the corresponding inline portion is truncated from the values buffer, ensuring the inline string does not appear twice in the resulting array.
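
The three steps above can be sketched with plain standard-library types. This is a simplified model, not the code in this PR: `View`, `MAX_INLINE_LEN`, and `decode_without_validation` are hypothetical names, the "decoding" is reduced to a byte copy, and a real view is a packed 16-byte value rather than an enum.

```rust
/// Longest string that fits inline in a view (assumed: 12 data bytes).
const MAX_INLINE_LEN: usize = 12;

/// Simplified stand-in for a view: either the bytes themselves or an
/// (offset, len) reference into the values buffer.
#[derive(Debug, PartialEq)]
enum View {
    Inline(Vec<u8>),
    Ref { offset: usize, len: usize },
}

/// Hypothetical helper: build views plus a values buffer from `strings`,
/// keeping inline strings out of the final buffer.
fn decode_without_validation(strings: &[&[u8]]) -> (Vec<View>, Vec<u8>) {
    // Step 1: reserve room for the long strings plus one inline-sized scratch slot.
    let long_bytes: usize = strings
        .iter()
        .filter(|s| s.len() > MAX_INLINE_LEN)
        .map(|s| s.len())
        .sum();
    let mut values = Vec::with_capacity(long_bytes + MAX_INLINE_LEN);
    let mut views = Vec::with_capacity(strings.len());

    for s in strings {
        // Step 2: every decoded string is appended to the values buffer first.
        let start = values.len();
        values.extend_from_slice(s);

        if s.len() <= MAX_INLINE_LEN {
            // Step 3: the view owns the inline bytes, so drop the buffer copy again.
            views.push(View::Inline(values[start..].to_vec()));
            values.truncate(start);
        } else {
            views.push(View::Ref { offset: start, len: s.len() });
        }
    }
    (views, values)
}
```

Because at most one inline string sits in the buffer at any time, the scratch slot reserved in step 1 is enough to avoid reallocation in this model.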

Are these changes tested?

  1. Copied and modified the existing fuzz_test to disable UTF-8 validation.
  2. Ran the benchmarks and added a benchmark case for arrays that contain both inline and long strings.

Are there any user-facing changes?

No.

Considered alternatives

One idea was to support separate buffers for inline strings even when UTF-8 validation is enabled. However, since we need to call decoded_len() first to determine the target buffer, this approach can be tricky or inefficient:

  • For example, precomputing a boolean flag per string to determine which buffer to use would increase temporary memory usage.

  • Alternatively, appending to the values buffer first and then moving inline strings to a separate buffer would lead to frequent memcpy overhead.

Given that DataFusion disables UTF-8 validation when using RowConverter, this PR focuses on improving memory efficiency specifically when validation is turned off.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 12, 2025
@ding-young
Contributor Author

@alamb @XiangpengHao @2010YOUY01

When running the existing benchmarks (cargo bench --bench row_format "convert_rows 4096 string view"), I noticed there might be a slight regression, but it seems relatively minor given the normal level of fluctuation.
I’d love to hear your thoughts on this PR — if you think this direction is useful, I’ll run more benchmark experiments and polish it further.

@XiangpengHao
Contributor

I took a high level look and it looks good to me. Curious to see the perf diff. Does the benchmark report memory usage yet? @ding-young

@ding-young
Contributor Author

@XiangpengHao The current benchmark doesn’t report memory usage directly, but I’ve been printing stats manually using jemalloc. It seems like there might be an issue with my implementation, so I’ll double-check that and share the perf once I’ve confirmed.

@ding-young
Contributor Author

  • cargo bench result
Case (str_len, null prob)    main         issue-6057
-------------------------    ----         ----------
string view(10, 0)           51.23 µs     52.18 µs
string view(30, 0)           45.47 µs     46.63 µs
string view(100, 0)          64.18 µs     68.54 µs
string view(100, 0.5)        70.11 µs     74.06 µs
string view(1..100, 0)       100.72 µs    103.80 µs
string view(1..100, 0.5)     80.48 µs     86.02 µs
  • manual memory profiling result (*unit = B)

I added code to collect jemalloc stats (allocated, resident, active) before and after decoding the binary view. Memory usage improved, especially when short strings are mixed with long strings. When the rows consist of only long strings, memory usage was the same.

let before = jemalloc_stat();

let view = if !validate_utf8 {
    decode_binary_view_inner_utf8_unchecked(rows, options)
} else {
    decode_binary_view_inner(rows, options, validate_utf8)
};

let after = jemalloc_stat();
// print ( after - before ) 

(To reproduce, see https://github.com/ding-young/arrow-rs/tree/issue-6057-bench-mem )

Case                        main (alloc / active)    issue-6057 (alloc / active)
----                        ---------------------    ---------------------------
string view(10, 0)          102656 / 114688          65536 / 69632
string view(30, 0)          196608 / 204800          196608 / 204800
string view(100, 0)         524288 / 532480          524288 / 532480
string view(100, 0.5)       294912 / 303104          294912 / 303104
string view(1..100, 0)      294912 / 303104          294912 / 303104
string view(1..100, 0.5)    180224 / 188416          163840 / 172032
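
As a rough sanity check on the table above, the arithmetic can be written out. This sketch assumes the 4096-row batches used by the benchmarks and 16-byte views; `views_buffer_bytes` and `reduction_pct` are hypothetical helper names, not part of the benchmark code.

```rust
/// Size of one view entry in bytes (views are 128-bit values).
const VIEW_SIZE: usize = 16;

/// Bytes needed for the views buffer of `rows` entries.
fn views_buffer_bytes(rows: usize) -> usize {
    rows * VIEW_SIZE
}

/// Percentage reduction going from `before` bytes to `after` bytes.
fn reduction_pct(before: u64, after: u64) -> f64 {
    (before - after) as f64 / before as f64 * 100.0
}
```

Under those assumptions, `views_buffer_bytes(4096)` is 65536, which equals the allocated figure for string view(10, 0) on issue-6057. That is consistent with no inline data being kept in the values buffer, though this reading is an inference rather than something the profiling output states. `reduction_pct(102656, 65536)` is about 36.2% and `reduction_pct(180224, 163840)` is about 9.1%.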

@ding-young ding-young marked this pull request as ready for review July 16, 2025 07:44
Contributor

@alamb alamb left a comment

Thanks @ding-young -- the basic idea makes sense to me, but this PR contains a lot of duplicated code, which makes it hard to understand what is actually changing here.

If the goal of this PR is to reduce the size of the values buffer when we are not validating UTF-8, I think that should also be testable.

For example, decode the same rows with and without validation and show that the buffer without validation is smaller 🤔
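
The property such a test would check can be modeled without the arrow crates. This is only an illustration of the expected relationship between the two buffer lengths, assuming a 12-byte inline limit; `values_buffer_len` is a hypothetical function, not an API of arrow-row.

```rust
/// Longest string stored inline in a view (assumed: 12 bytes).
const MAX_INLINE_LEN: usize = 12;

/// Hypothetical model of the values-buffer length after decoding `strings`.
/// With validation, every string is copied into the buffer so it can be
/// checked in one pass; without it, only strings too long to inline are kept.
fn values_buffer_len(strings: &[&str], validate_utf8: bool) -> usize {
    strings
        .iter()
        .filter(|s| validate_utf8 || s.len() > MAX_INLINE_LEN)
        .map(|s| s.len())
        .sum()
}
```

For a mix of short and long strings, the length without validation should be strictly smaller, and the difference should equal the total size of the inline strings. For input made up only of long strings, both lengths should match.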

@@ -246,10 +246,76 @@ pub fn decode_binary<I: OffsetSizeTrait>(
unsafe { GenericBinaryArray::from(builder.build_unchecked()) }
}

-fn decode_binary_view_inner(
+fn decode_binary_view_inner_utf8_unchecked(
Contributor

I don't understand -- if it is a BinaryViewArray it can never have utf8 data. Maybe we can rename this function

Contributor Author

decode_string_view also calls decode_binary_view_inner(...), so this function can still be reached when decoding UTF-8 data. I'll think about whether there’s a clearer way to rename it.

-check_utf8: bool,
+validate_utf8: bool,
Contributor Author

Just to keep the naming consistent with

pub unsafe fn decode_string_view(
rows: &mut [&[u8]],
options: SortOptions,
validate_utf8: bool,

@ding-young
Contributor Author

@alamb Thank you for the review! I've added tests to compare the length of the values buffer, and moved the branching logic into decode_binary_view_inner.

Contributor

@alamb alamb left a comment

Thank you @ding-young -- I reviewed this code carefully and it makes sense to me. I have also started the benchmarks.

My scripts won't include your new benchmarks as they are in the same PR. If you can break the benchmarks into a separate PR that would be much easier for me to check

@alamb
Contributor

alamb commented Jul 28, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue-6057 (2df0426) to ec81db3 diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench row_format
BENCH_FILTER=
BENCH_BRANCH_NAME=issue-6057
Results will be posted here when complete

@alamb
Contributor

alamb commented Jul 28, 2025

🤖: Benchmark completed

Details

group                                                                                                                         issue-6057                             main
-----                                                                                                                         ----------                             ----
append_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                 1.00    378.6±1.14µs        ? ?/sec    1.02    384.4±2.38µs        ? ?/sec
append_rows 4096 bool(0, 0.5)                                                                                                 1.00      8.6±0.02µs        ? ?/sec    1.00      8.6±0.04µs        ? ?/sec
append_rows 4096 bool(0.3, 0.5)                                                                                               1.00     16.1±0.09µs        ? ?/sec    1.00     16.1±0.15µs        ? ?/sec
append_rows 4096 i64(0)                                                                                                       1.00      7.6±0.12µs        ? ?/sec    1.01      7.7±0.13µs        ? ?/sec
append_rows 4096 i64(0.3)                                                                                                     1.23     18.0±0.11µs        ? ?/sec    1.00     14.7±0.12µs        ? ?/sec
append_rows 4096 string view(1..100, 0)                                                                                       1.00    117.5±0.48µs        ? ?/sec  
append_rows 4096 string view(1..100, 0.5)                                                                                     1.00    101.9±0.38µs        ? ?/sec  
append_rows 4096 string view(10, 0)                                                                                           1.00     50.2±0.11µs        ? ?/sec    1.06     53.0±0.18µs        ? ?/sec
append_rows 4096 string view(100, 0)                                                                                          1.02     78.9±0.58µs        ? ?/sec    1.00     77.6±0.16µs        ? ?/sec
append_rows 4096 string view(100, 0.5)                                                                                        1.00     81.7±0.26µs        ? ?/sec    1.03     84.2±0.22µs        ? ?/sec
append_rows 4096 string view(30, 0)                                                                                           1.00     53.4±0.12µs        ? ?/sec    1.02     54.4±0.15µs        ? ?/sec
append_rows 4096 string(10, 0)                                                                                                1.00     46.8±0.28µs        ? ?/sec    1.07     49.9±0.34µs        ? ?/sec
append_rows 4096 string(100, 0)                                                                                               1.02     78.9±0.30µs        ? ?/sec    1.00     77.7±0.19µs        ? ?/sec
append_rows 4096 string(100, 0.5)                                                                                             1.00     85.9±0.31µs        ? ?/sec    1.00     85.8±0.17µs        ? ?/sec
append_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                       1.02    249.5±1.09µs        ? ?/sec    1.00    244.3±0.77µs        ? ?/sec
append_rows 4096 string(30, 0)                                                                                                1.00     49.6±0.12µs        ? ?/sec    1.04     51.4±0.28µs        ? ?/sec
append_rows 4096 string_dictionary(10, 0)                                                                                     1.00     76.6±0.25µs        ? ?/sec    1.03     79.1±0.53µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0)                                                                                    1.00    152.5±0.62µs        ? ?/sec    1.00    152.7±0.89µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0.5)                                                                                  1.00    117.7±0.45µs        ? ?/sec    1.01    119.2±0.24µs        ? ?/sec
append_rows 4096 string_dictionary(30, 0)                                                                                     1.00     79.4±0.28µs        ? ?/sec    1.02     80.6±0.71µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                     1.00     29.0±0.07µs        ? ?/sec    1.00     29.1±0.03µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                    1.00     47.5±0.12µs        ? ?/sec    1.01     47.8±0.07µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                     1.02     29.7±0.06µs        ? ?/sec    1.00     29.1±0.21µs        ? ?/sec
append_rows 4096 u64(0)                                                                                                       1.01      7.7±0.09µs        ? ?/sec    1.00      7.7±0.11µs        ? ?/sec
append_rows 4096 u64(0.3)                                                                                                     1.00     14.8±0.07µs        ? ?/sec    1.00     14.8±0.10µs        ? ?/sec
convert_columns 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)             1.00    386.3±4.48µs        ? ?/sec    1.00    386.0±1.66µs        ? ?/sec
convert_columns 4096 bool(0, 0.5)                                                                                             1.00      8.9±0.03µs        ? ?/sec    1.00      8.8±0.02µs        ? ?/sec
convert_columns 4096 bool(0.3, 0.5)                                                                                           1.01     16.4±0.11µs        ? ?/sec    1.00     16.3±0.10µs        ? ?/sec
convert_columns 4096 i64(0)                                                                                                   1.01      8.1±0.14µs        ? ?/sec    1.00      8.0±0.13µs        ? ?/sec
convert_columns 4096 i64(0.3)                                                                                                 1.23     18.3±0.07µs        ? ?/sec    1.00     14.8±0.09µs        ? ?/sec
convert_columns 4096 string view(1..100, 0)                                                                                   1.00    118.0±0.29µs        ? ?/sec  
convert_columns 4096 string view(1..100, 0.5)                                                                                 1.00    104.7±0.27µs        ? ?/sec  
convert_columns 4096 string view(10, 0)                                                                                       1.00     50.6±0.14µs        ? ?/sec    1.04     52.8±0.23µs        ? ?/sec
convert_columns 4096 string view(100, 0)                                                                                      1.01     78.9±0.24µs        ? ?/sec    1.00     78.2±0.66µs        ? ?/sec
convert_columns 4096 string view(100, 0.5)                                                                                    1.00     83.3±0.15µs        ? ?/sec    1.02     84.7±0.19µs        ? ?/sec
convert_columns 4096 string view(30, 0)                                                                                       1.00     53.5±0.11µs        ? ?/sec    1.03     55.1±0.58µs        ? ?/sec
convert_columns 4096 string(10, 0)                                                                                            1.00     46.7±0.11µs        ? ?/sec    1.05     49.1±0.41µs        ? ?/sec
convert_columns 4096 string(100, 0)                                                                                           1.01     78.6±0.28µs        ? ?/sec    1.00     77.7±0.46µs        ? ?/sec
convert_columns 4096 string(100, 0.5)                                                                                         1.00     86.6±0.30µs        ? ?/sec    1.00     86.1±0.21µs        ? ?/sec
convert_columns 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                   1.01    246.3±0.67µs        ? ?/sec    1.00    243.1±0.86µs        ? ?/sec
convert_columns 4096 string(30, 0)                                                                                            1.00     49.9±0.12µs        ? ?/sec    1.02     50.9±0.08µs        ? ?/sec
convert_columns 4096 string_dictionary(10, 0)                                                                                 1.00     77.5±0.39µs        ? ?/sec    1.02     78.9±0.32µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0)                                                                                1.01    156.5±1.14µs        ? ?/sec    1.00    154.3±0.89µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0.5)                                                                              1.00    120.2±0.35µs        ? ?/sec    1.00    119.9±0.35µs        ? ?/sec
convert_columns 4096 string_dictionary(30, 0)                                                                                 1.01     82.6±0.20µs        ? ?/sec    1.00     81.9±0.24µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(10, 0)                                                                 1.00     30.2±0.05µs        ? ?/sec    1.00     30.1±0.08µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(100, 0)                                                                1.00     48.7±0.11µs        ? ?/sec    1.01     49.3±0.09µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(30, 0)                                                                 1.00     30.3±0.12µs        ? ?/sec    1.00     30.4±0.14µs        ? ?/sec
convert_columns 4096 u64(0)                                                                                                   1.03      8.0±0.06µs        ? ?/sec    1.00      7.8±0.10µs        ? ?/sec
convert_columns 4096 u64(0.3)                                                                                                 1.00     15.0±0.08µs        ? ?/sec    1.00     15.0±0.10µs        ? ?/sec
convert_columns_prepared 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)    1.00    379.8±2.71µs        ? ?/sec    1.02    386.0±1.86µs        ? ?/sec
convert_columns_prepared 4096 bool(0, 0.5)                                                                                    1.00      8.8±0.05µs        ? ?/sec    1.00      8.7±0.02µs        ? ?/sec
convert_columns_prepared 4096 bool(0.3, 0.5)                                                                                  1.00     16.3±0.13µs        ? ?/sec    1.00     16.2±0.09µs        ? ?/sec
convert_columns_prepared 4096 i64(0)                                                                                          1.01      8.0±0.04µs        ? ?/sec    1.00      7.9±0.13µs        ? ?/sec
convert_columns_prepared 4096 i64(0.3)                                                                                        1.23     18.1±0.09µs        ? ?/sec    1.00     14.7±0.11µs        ? ?/sec
convert_columns_prepared 4096 string view(1..100, 0)                                                                          1.00    117.7±0.22µs        ? ?/sec  
convert_columns_prepared 4096 string view(1..100, 0.5)                                                                        1.00    102.8±0.37µs        ? ?/sec  
convert_columns_prepared 4096 string view(10, 0)                                                                              1.00     50.5±0.12µs        ? ?/sec    1.09     55.1±0.22µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0)                                                                             1.01     78.9±0.17µs        ? ?/sec    1.00     78.4±0.40µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0.5)                                                                           1.00     82.1±0.33µs        ? ?/sec    1.03     84.4±0.21µs        ? ?/sec
convert_columns_prepared 4096 string view(30, 0)                                                                              1.00     53.6±0.14µs        ? ?/sec    1.02     54.8±0.16µs        ? ?/sec
convert_columns_prepared 4096 string(10, 0)                                                                                   1.00     47.6±0.23µs        ? ?/sec    1.03     49.0±0.43µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0)                                                                                  1.00     78.7±0.17µs        ? ?/sec    1.00     78.4±0.39µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0.5)                                                                                1.00     86.2±0.14µs        ? ?/sec    1.00     86.1±0.12µs        ? ?/sec
convert_columns_prepared 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                          1.01    244.5±1.11µs        ? ?/sec    1.00    243.1±0.97µs        ? ?/sec
convert_columns_prepared 4096 string(30, 0)                                                                                   1.00     49.8±0.10µs        ? ?/sec    1.02     50.8±0.11µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(10, 0)                                                                        1.00     76.3±0.51µs        ? ?/sec    1.03     78.7±0.45µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0)                                                                       1.00    153.0±0.47µs        ? ?/sec    1.00    153.2±0.85µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0.5)                                                                     1.00    118.4±0.88µs        ? ?/sec    1.01    119.8±0.30µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(30, 0)                                                                        1.00     79.5±0.30µs        ? ?/sec    1.02     81.0±0.55µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(10, 0)                                                        1.00     29.2±0.06µs        ? ?/sec    1.01     29.5±0.24µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(100, 0)                                                       1.00     47.6±0.06µs        ? ?/sec    1.00     47.8±0.06µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(30, 0)                                                        1.02     29.9±0.07µs        ? ?/sec    1.00     29.3±0.04µs        ? ?/sec
convert_columns_prepared 4096 u64(0)                                                                                          1.00      7.6±0.04µs        ? ?/sec    1.02      7.8±0.10µs        ? ?/sec
convert_columns_prepared 4096 u64(0.3)                                                                                        1.00     15.0±0.08µs        ? ?/sec    1.00     14.9±0.08µs        ? ?/sec
convert_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                1.00    297.4±2.45µs        ? ?/sec    1.00    296.4±2.37µs        ? ?/sec
convert_rows 4096 bool(0, 0.5)                                                                                                1.00     16.4±0.06µs        ? ?/sec    1.00     16.4±0.05µs        ? ?/sec
convert_rows 4096 bool(0.3, 0.5)                                                                                              1.00     16.5±0.07µs        ? ?/sec    1.00     16.4±0.19µs        ? ?/sec
convert_rows 4096 i64(0)                                                                                                      1.00     33.4±0.07µs        ? ?/sec    1.00     33.4±0.22µs        ? ?/sec
convert_rows 4096 i64(0.3)                                                                                                    1.01     33.6±0.43µs        ? ?/sec    1.00     33.3±0.04µs        ? ?/sec
convert_rows 4096 string view(1..100, 0)                                                                                      1.00    170.1±0.97µs        ? ?/sec  
convert_rows 4096 string view(1..100, 0.5)                                                                                    1.00    134.3±0.60µs        ? ?/sec  
convert_rows 4096 string view(10, 0)                                                                                          1.00     73.2±0.36µs        ? ?/sec    1.01     73.6±0.14µs        ? ?/sec
convert_rows 4096 string view(100, 0)                                                                                         1.00    120.9±0.40µs        ? ?/sec    1.02    122.8±1.71µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)                                                                                       1.00    112.1±1.25µs        ? ?/sec    1.00    111.8±0.27µs        ? ?/sec
convert_rows 4096 string view(30, 0)                                                                                          1.00     79.4±0.17µs        ? ?/sec    1.04     82.3±0.18µs        ? ?/sec
convert_rows 4096 string(10, 0)                                                                                               1.00     61.6±0.23µs        ? ?/sec    1.00     61.8±0.11µs        ? ?/sec
convert_rows 4096 string(100, 0)                                                                                              1.01    108.3±0.48µs        ? ?/sec    1.00    107.3±0.26µs        ? ?/sec
convert_rows 4096 string(100, 0.5)                                                                                            1.00    103.7±0.27µs        ? ?/sec    1.01    104.3±6.39µs        ? ?/sec
convert_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                      1.00    297.1±2.07µs        ? ?/sec    1.01    299.0±1.71µs        ? ?/sec
convert_rows 4096 string(30, 0)                                                                                               1.00     73.9±0.23µs        ? ?/sec    1.00     74.1±0.23µs        ? ?/sec
convert_rows 4096 string_dictionary(10, 0)                                                                                    1.00     61.9±0.17µs        ? ?/sec    1.00     62.0±0.11µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0)                                                                                   1.00    108.2±0.31µs        ? ?/sec    1.00    108.3±0.44µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0.5)                                                                                 1.00    104.2±0.31µs        ? ?/sec    1.00    103.8±0.23µs        ? ?/sec
convert_rows 4096 string_dictionary(30, 0)                                                                                    1.00     73.9±0.27µs        ? ?/sec    1.00     74.2±0.44µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                    1.00     61.6±0.15µs        ? ?/sec    1.00     61.8±0.11µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                   1.00    109.1±0.74µs        ? ?/sec    1.00    109.2±1.14µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                    1.00     74.0±0.23µs        ? ?/sec    1.00     74.3±0.27µs        ? ?/sec
convert_rows 4096 u64(0)                                                                                                      1.00     30.5±0.06µs        ? ?/sec    1.00     30.6±0.10µs        ? ?/sec
convert_rows 4096 u64(0.3)                                                                                                    1.00     30.7±0.12µs        ? ?/sec    1.00     30.7±0.04µs        ? ?/sec
iterate rows                                                                                                                  1.00      2.6±0.01µs        ? ?/sec    1.00      2.6±0.00µs        ? ?/sec

@ding-young
Contributor Author

> My scripts won't include your new benchmarks as they are in the same PR. If you can break the benchmarks into a separate PR that would be much easier for me to check

I opened a separate PR, thanks :)

alamb pushed a commit that referenced this pull request Jul 29, 2025
…g strings (#8015)

### Description
Add benchmark case for performance comparison in #7917.
@alamb
Contributor

alamb commented Jul 29, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing issue-6057 (18d4853) to cbadec7 diff
BENCH_NAME=row_format
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench row_format
BENCH_FILTER=
BENCH_BRANCH_NAME=issue-6057
Results will be posted here when complete

@alamb
Contributor

alamb commented Jul 29, 2025

🤖: Benchmark completed

Details

group                                                                                                                         issue-6057                             main
-----                                                                                                                         ----------                             ----
append_rows 10 large_list(0) of u64(0)                                                                                        1.00    668.7±7.41ns        ? ?/sec    1.01    677.1±1.29ns        ? ?/sec
append_rows 10 list(0) of u64(0)                                                                                              1.00    701.2±1.19ns        ? ?/sec    1.01    706.4±2.12ns        ? ?/sec
append_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                 1.01    385.6±1.72µs        ? ?/sec    1.00    383.6±2.18µs        ? ?/sec
append_rows 4096 bool(0, 0.5)                                                                                                 1.00      8.6±0.01µs        ? ?/sec    1.00      8.6±0.02µs        ? ?/sec
append_rows 4096 bool(0.3, 0.5)                                                                                               1.00     16.1±0.10µs        ? ?/sec    1.00     16.1±0.09µs        ? ?/sec
append_rows 4096 i64(0)                                                                                                       1.01      7.8±0.09µs        ? ?/sec    1.00      7.7±0.14µs        ? ?/sec
append_rows 4096 i64(0.3)                                                                                                     1.01     18.1±0.11µs        ? ?/sec    1.00     18.0±0.06µs        ? ?/sec
append_rows 4096 large_list(0) of u64(0)                                                                                      1.00    165.5±0.34µs        ? ?/sec    1.01    166.6±0.96µs        ? ?/sec
append_rows 4096 large_list(0) sliced to 10 of u64(0)                                                                         1.00    955.7±5.22ns        ? ?/sec    1.04    993.7±2.20ns        ? ?/sec
append_rows 4096 list(0) of u64(0)                                                                                            1.00    167.9±1.35µs        ? ?/sec    1.00    167.3±0.49µs        ? ?/sec
append_rows 4096 list(0) sliced to 10 of u64(0)                                                                               1.00   1064.7±2.36ns        ? ?/sec    1.01   1071.2±2.15ns        ? ?/sec
append_rows 4096 string view(1..100, 0)                                                                                       1.00    117.5±0.37µs        ? ?/sec    1.00    117.5±0.22µs        ? ?/sec
append_rows 4096 string view(1..100, 0.5)                                                                                     1.00    102.4±0.23µs        ? ?/sec    1.02    104.3±0.40µs        ? ?/sec
append_rows 4096 string view(10, 0)                                                                                           1.00     50.2±0.09µs        ? ?/sec    1.00     50.4±0.20µs        ? ?/sec
append_rows 4096 string view(100, 0)                                                                                          1.00     77.9±0.23µs        ? ?/sec    1.00     78.2±0.34µs        ? ?/sec
append_rows 4096 string view(100, 0.5)                                                                                        1.00     81.4±0.21µs        ? ?/sec    1.01     82.0±0.25µs        ? ?/sec
append_rows 4096 string view(30, 0)                                                                                           1.00     53.7±0.22µs        ? ?/sec    1.00     53.5±0.16µs        ? ?/sec
append_rows 4096 string(10, 0)                                                                                                1.00     46.7±0.33µs        ? ?/sec    1.00     46.5±0.24µs        ? ?/sec
append_rows 4096 string(100, 0)                                                                                               1.00     77.8±0.16µs        ? ?/sec    1.00     77.9±0.35µs        ? ?/sec
append_rows 4096 string(100, 0.5)                                                                                             1.00     87.1±0.39µs        ? ?/sec    1.00     87.1±0.24µs        ? ?/sec
append_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                       1.00    243.5±1.00µs        ? ?/sec    1.00    244.1±1.16µs        ? ?/sec
append_rows 4096 string(30, 0)                                                                                                1.00     49.7±0.09µs        ? ?/sec    1.00     49.6±0.08µs        ? ?/sec
append_rows 4096 string_dictionary(10, 0)                                                                                     1.00     76.1±0.12µs        ? ?/sec    1.00     75.8±0.18µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0)                                                                                    1.01    153.7±1.21µs        ? ?/sec    1.00    152.3±0.57µs        ? ?/sec
append_rows 4096 string_dictionary(100, 0.5)                                                                                  1.00    119.2±0.38µs        ? ?/sec    1.00    119.7±0.26µs        ? ?/sec
append_rows 4096 string_dictionary(30, 0)                                                                                     1.00     80.4±0.38µs        ? ?/sec    1.00     80.3±0.17µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                     1.00     29.2±0.07µs        ? ?/sec    1.01     29.5±0.07µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                    1.00     47.3±0.10µs        ? ?/sec    1.01     47.9±0.11µs        ? ?/sec
append_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                     1.00     29.7±0.05µs        ? ?/sec    1.00     29.6±0.17µs        ? ?/sec
append_rows 4096 u64(0)                                                                                                       1.00      7.6±0.11µs        ? ?/sec    1.00      7.6±0.10µs        ? ?/sec
append_rows 4096 u64(0.3)                                                                                                     1.00     14.8±0.10µs        ? ?/sec    1.00     14.9±0.10µs        ? ?/sec
convert_columns 10 large_list(0) of u64(0)                                                                                    1.00    918.1±5.61ns        ? ?/sec    1.04    954.4±8.20ns        ? ?/sec
convert_columns 10 list(0) of u64(0)                                                                                          1.00    966.0±2.75ns        ? ?/sec    1.02    987.4±5.92ns        ? ?/sec
convert_columns 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)             1.00    385.6±1.95µs        ? ?/sec    1.01    388.5±2.20µs        ? ?/sec
convert_columns 4096 bool(0, 0.5)                                                                                             1.00      8.8±0.02µs        ? ?/sec    1.01      8.9±0.01µs        ? ?/sec
convert_columns 4096 bool(0.3, 0.5)                                                                                           1.00     16.3±0.06µs        ? ?/sec    1.01     16.4±0.07µs        ? ?/sec
convert_columns 4096 i64(0)                                                                                                   1.01      8.1±0.13µs        ? ?/sec    1.00      8.0±0.12µs        ? ?/sec
convert_columns 4096 i64(0.3)                                                                                                 1.00     18.3±0.12µs        ? ?/sec    1.00     18.3±0.12µs        ? ?/sec
convert_columns 4096 large_list(0) of u64(0)                                                                                  1.00    166.6±1.68µs        ? ?/sec    1.00    166.9±0.50µs        ? ?/sec
convert_columns 4096 large_list(0) sliced to 10 of u64(0)                                                                     1.00   1227.8±5.54ns        ? ?/sec    1.03   1262.3±5.91ns        ? ?/sec
convert_columns 4096 list(0) of u64(0)                                                                                        1.00    168.7±0.94µs        ? ?/sec    1.00    168.1±0.43µs        ? ?/sec
convert_columns 4096 list(0) sliced to 10 of u64(0)                                                                           1.00   1347.1±2.44ns        ? ?/sec    1.02   1369.1±5.07ns        ? ?/sec
convert_columns 4096 string view(1..100, 0)                                                                                   1.00    118.1±0.28µs        ? ?/sec    1.00    117.9±0.34µs        ? ?/sec
convert_columns 4096 string view(1..100, 0.5)                                                                                 1.00    103.2±0.42µs        ? ?/sec    1.02    105.4±0.81µs        ? ?/sec
convert_columns 4096 string view(10, 0)                                                                                       1.00     50.4±0.19µs        ? ?/sec    1.00     50.6±0.09µs        ? ?/sec
convert_columns 4096 string view(100, 0)                                                                                      1.00     77.6±0.26µs        ? ?/sec    1.00     77.7±0.37µs        ? ?/sec
convert_columns 4096 string view(100, 0.5)                                                                                    1.00     82.3±0.32µs        ? ?/sec    1.00     82.4±0.36µs        ? ?/sec
convert_columns 4096 string view(30, 0)                                                                                       1.00     53.7±0.15µs        ? ?/sec    1.00     53.7±0.14µs        ? ?/sec
convert_columns 4096 string(10, 0)                                                                                            1.00     46.6±0.10µs        ? ?/sec    1.00     46.6±0.09µs        ? ?/sec
convert_columns 4096 string(100, 0)                                                                                           1.00     77.3±0.24µs        ? ?/sec    1.00     77.3±0.26µs        ? ?/sec
convert_columns 4096 string(100, 0.5)                                                                                         1.00     87.1±0.24µs        ? ?/sec    1.00     87.0±0.14µs        ? ?/sec
convert_columns 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                   1.00    244.5±0.83µs        ? ?/sec    1.01    248.1±1.09µs        ? ?/sec
convert_columns 4096 string(30, 0)                                                                                            1.00     50.0±0.08µs        ? ?/sec    1.00     49.9±0.15µs        ? ?/sec
convert_columns 4096 string_dictionary(10, 0)                                                                                 1.01     77.9±1.79µs        ? ?/sec    1.00     76.8±0.16µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0)                                                                                1.01    154.8±1.47µs        ? ?/sec    1.00    152.8±1.08µs        ? ?/sec
convert_columns 4096 string_dictionary(100, 0.5)                                                                              1.01    120.6±0.51µs        ? ?/sec    1.00    119.6±0.48µs        ? ?/sec
convert_columns 4096 string_dictionary(30, 0)                                                                                 1.00     81.0±0.21µs        ? ?/sec    1.01     81.7±0.20µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(10, 0)                                                                 1.00     30.2±0.08µs        ? ?/sec    1.01     30.4±0.04µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(100, 0)                                                                1.00     49.1±0.19µs        ? ?/sec    1.00     49.1±0.12µs        ? ?/sec
convert_columns 4096 string_dictionary_low_cardinality(30, 0)                                                                 1.00     30.4±0.12µs        ? ?/sec    1.00     30.5±0.08µs        ? ?/sec
convert_columns 4096 u64(0)                                                                                                   1.01      7.9±0.12µs        ? ?/sec    1.00      7.8±0.11µs        ? ?/sec
convert_columns 4096 u64(0.3)                                                                                                 1.00     15.0±0.09µs        ? ?/sec    1.00     15.0±0.07µs        ? ?/sec
convert_columns_prepared 10 large_list(0) of u64(0)                                                                           1.00    709.0±4.02ns        ? ?/sec    1.05    744.1±7.35ns        ? ?/sec
convert_columns_prepared 10 list(0) of u64(0)                                                                                 1.00    754.7±1.77ns        ? ?/sec    1.01    762.4±2.47ns        ? ?/sec
convert_columns_prepared 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)    1.00    385.4±2.15µs        ? ?/sec    1.00    385.8±1.87µs        ? ?/sec
convert_columns_prepared 4096 bool(0, 0.5)                                                                                    1.00      8.8±0.01µs        ? ?/sec    1.00      8.8±0.04µs        ? ?/sec
convert_columns_prepared 4096 bool(0.3, 0.5)                                                                                  1.01     16.3±0.09µs        ? ?/sec    1.00     16.2±0.08µs        ? ?/sec
convert_columns_prepared 4096 i64(0)                                                                                          1.00      7.9±0.11µs        ? ?/sec    1.01      8.0±0.01µs        ? ?/sec
convert_columns_prepared 4096 i64(0.3)                                                                                        1.01     18.2±0.10µs        ? ?/sec    1.00     18.1±0.09µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) of u64(0)                                                                         1.00    166.2±1.50µs        ? ?/sec    1.00    166.6±0.49µs        ? ?/sec
convert_columns_prepared 4096 large_list(0) sliced to 10 of u64(0)                                                            1.00   1027.9±4.14ns        ? ?/sec    1.04   1067.5±2.50ns        ? ?/sec
convert_columns_prepared 4096 list(0) of u64(0)                                                                               1.01    168.5±1.52µs        ? ?/sec    1.00    167.4±0.37µs        ? ?/sec
convert_columns_prepared 4096 list(0) sliced to 10 of u64(0)                                                                  1.00   1139.6±3.10ns        ? ?/sec    1.02   1161.2±6.99ns        ? ?/sec
convert_columns_prepared 4096 string view(1..100, 0)                                                                          1.00    117.7±0.30µs        ? ?/sec    1.00    117.7±0.28µs        ? ?/sec
convert_columns_prepared 4096 string view(1..100, 0.5)                                                                        1.00    103.4±0.38µs        ? ?/sec    1.02    105.1±0.38µs        ? ?/sec
convert_columns_prepared 4096 string view(10, 0)                                                                              1.00     50.5±0.14µs        ? ?/sec    1.00     50.4±0.07µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0)                                                                             1.00     77.1±0.31µs        ? ?/sec    1.02     78.6±0.25µs        ? ?/sec
convert_columns_prepared 4096 string view(100, 0.5)                                                                           1.00     81.9±0.29µs        ? ?/sec    1.00     82.2±0.27µs        ? ?/sec
convert_columns_prepared 4096 string view(30, 0)                                                                              1.00     53.7±0.17µs        ? ?/sec    1.00     53.6±0.16µs        ? ?/sec
convert_columns_prepared 4096 string(10, 0)                                                                                   1.00     46.5±0.23µs        ? ?/sec    1.00     46.7±0.15µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0)                                                                                  1.00     77.9±0.27µs        ? ?/sec    1.00     78.0±0.34µs        ? ?/sec
convert_columns_prepared 4096 string(100, 0.5)                                                                                1.00     87.2±0.49µs        ? ?/sec    1.00     87.4±0.15µs        ? ?/sec
convert_columns_prepared 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                          1.00    245.9±1.29µs        ? ?/sec    1.00    245.8±0.87µs        ? ?/sec
convert_columns_prepared 4096 string(30, 0)                                                                                   1.00     49.8±0.09µs        ? ?/sec    1.00     49.8±0.12µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(10, 0)                                                                        1.00     76.3±0.17µs        ? ?/sec    1.00     76.5±0.20µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0)                                                                       1.00    153.9±0.77µs        ? ?/sec    1.00    153.4±1.27µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(100, 0.5)                                                                     1.00    119.8±0.68µs        ? ?/sec    1.00    119.7±0.49µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary(30, 0)                                                                        1.00     80.4±0.32µs        ? ?/sec    1.00     80.7±0.32µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(10, 0)                                                        1.00     29.4±0.10µs        ? ?/sec    1.01     29.7±0.05µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(100, 0)                                                       1.00     47.5±0.13µs        ? ?/sec    1.01     47.9±0.09µs        ? ?/sec
convert_columns_prepared 4096 string_dictionary_low_cardinality(30, 0)                                                        1.00     29.8±0.07µs        ? ?/sec    1.01     30.1±0.07µs        ? ?/sec
convert_columns_prepared 4096 u64(0)                                                                                          1.00      7.7±0.12µs        ? ?/sec    1.00      7.8±0.12µs        ? ?/sec
convert_columns_prepared 4096 u64(0.3)                                                                                        1.00     14.9±0.09µs        ? ?/sec    1.00     14.9±0.07µs        ? ?/sec
convert_rows 10 large_list(0) of u64(0)                                                                                       1.00   1580.2±5.72ns        ? ?/sec    1.05   1657.8±9.70ns        ? ?/sec
convert_rows 10 list(0) of u64(0)                                                                                             1.00   1769.6±4.63ns        ? ?/sec    1.00   1769.1±3.46ns        ? ?/sec
convert_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, 0), i64(0)                1.00    301.0±1.37µs        ? ?/sec    1.00    300.0±1.19µs        ? ?/sec
convert_rows 4096 bool(0, 0.5)                                                                                                1.00     16.0±0.07µs        ? ?/sec    1.00     16.0±0.03µs        ? ?/sec
convert_rows 4096 bool(0.3, 0.5)                                                                                              1.00     16.0±0.02µs        ? ?/sec    1.00     16.0±0.04µs        ? ?/sec
convert_rows 4096 i64(0)                                                                                                      1.00     32.9±0.12µs        ? ?/sec    1.00     33.0±0.08µs        ? ?/sec
convert_rows 4096 i64(0.3)                                                                                                    1.00     33.0±0.10µs        ? ?/sec    1.00     33.0±0.07µs        ? ?/sec
convert_rows 4096 large_list(0) of u64(0)                                                                                     1.00    260.2±2.14µs        ? ?/sec    1.00    260.2±0.97µs        ? ?/sec
convert_rows 4096 large_list(0) sliced to 10 of u64(0)                                                                        1.00      2.0±0.00µs        ? ?/sec    1.00      2.0±0.01µs        ? ?/sec
convert_rows 4096 list(0) of u64(0)                                                                                           1.00    264.5±2.58µs        ? ?/sec    1.00    265.6±0.60µs        ? ?/sec
convert_rows 4096 list(0) sliced to 10 of u64(0)                                                                              1.00      2.1±0.01µs        ? ?/sec    1.06      2.2±0.01µs        ? ?/sec
convert_rows 4096 string view(1..100, 0)                                                                                      1.00    169.0±0.27µs        ? ?/sec    1.02    171.8±0.42µs        ? ?/sec
convert_rows 4096 string view(1..100, 0.5)                                                                                    1.00    132.9±0.30µs        ? ?/sec    1.02    136.1±0.29µs        ? ?/sec
convert_rows 4096 string view(10, 0)                                                                                          1.00     72.6±0.15µs        ? ?/sec    1.01     73.0±0.12µs        ? ?/sec
convert_rows 4096 string view(100, 0)                                                                                         1.00    122.3±0.74µs        ? ?/sec    1.00    122.2±0.50µs        ? ?/sec
convert_rows 4096 string view(100, 0.5)                                                                                       1.00    110.7±0.31µs        ? ?/sec    1.01    111.8±0.27µs        ? ?/sec
convert_rows 4096 string view(30, 0)                                                                                          1.00     79.4±1.49µs        ? ?/sec    1.03     81.9±0.15µs        ? ?/sec
convert_rows 4096 string(10, 0)                                                                                               1.00     61.0±0.14µs        ? ?/sec    1.00     60.9±0.13µs        ? ?/sec
convert_rows 4096 string(100, 0)                                                                                              1.00    107.8±0.55µs        ? ?/sec    1.00    108.1±0.40µs        ? ?/sec
convert_rows 4096 string(100, 0.5)                                                                                            1.00    102.8±0.18µs        ? ?/sec    1.01    103.4±0.31µs        ? ?/sec
convert_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0)                                                      1.00    295.3±3.46µs        ? ?/sec    1.02    299.9±2.00µs        ? ?/sec
convert_rows 4096 string(30, 0)                                                                                               1.00     73.3±0.21µs        ? ?/sec    1.01     73.8±0.26µs        ? ?/sec
convert_rows 4096 string_dictionary(10, 0)                                                                                    1.00     61.2±0.16µs        ? ?/sec    1.00     61.1±0.12µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0)                                                                                   1.00    108.4±0.37µs        ? ?/sec    1.00    108.8±0.62µs        ? ?/sec
convert_rows 4096 string_dictionary(100, 0.5)                                                                                 1.00    103.2±0.33µs        ? ?/sec    1.00    103.4±0.25µs        ? ?/sec
convert_rows 4096 string_dictionary(30, 0)                                                                                    1.00     73.9±0.25µs        ? ?/sec    1.00     74.2±0.36µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(10, 0)                                                                    1.00     61.1±0.12µs        ? ?/sec    1.00     61.2±0.09µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(100, 0)                                                                   1.00    107.3±0.43µs        ? ?/sec    1.01    107.9±1.02µs        ? ?/sec
convert_rows 4096 string_dictionary_low_cardinality(30, 0)                                                                    1.00     74.0±0.21µs        ? ?/sec    1.01     74.5±0.33µs        ? ?/sec
convert_rows 4096 u64(0)                                                                                                      1.00     30.1±0.08µs        ? ?/sec    1.00     30.0±0.06µs        ? ?/sec
convert_rows 4096 u64(0.3)                                                                                                    1.00     30.2±0.06µs        ? ?/sec    1.00     30.2±0.04µs        ? ?/sec
iterate rows                                                                                                                  1.00      2.6±0.00µs        ? ?/sec    1.00      2.6±0.00µs        ? ?/sec

@alamb
Contributor

alamb commented Jul 29, 2025

benchmark results look good -- no regressions

@alamb alamb merged commit 079d4f2 into apache:main Jul 29, 2025
13 checks passed
@alamb
Contributor

alamb commented Jul 29, 2025

Thanks again @ding-young

Labels
arrow Changes to the arrow crate