@AlexGuteniev AlexGuteniev commented Aug 23, 2025

⚙️ Optimization

Resolves #3857. Divides by 100 instead of by 10, as proposed, so each division produces two digits that are looked up in a table.
The table already exists, as mentioned by @StephanTLavavej on Discord.
Adjusted the file containing the table to be C++14-compatible.
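
For readers unfamiliar with the technique, here is a minimal standalone sketch of the divide-by-100 loop. It is not the STL code: `digit_pairs`, `uint_to_buff_sketch`, and `to_string_sketch` are hypothetical names, and the table is built at runtime here only to keep the example self-contained, whereas the PR reuses the existing `__DIGIT_TABLE`.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in for the digit-pair table: "00", "01", ..., "99"
// laid out back to back (200 characters). Not the STL's __DIGIT_TABLE.
struct digit_pairs {
    char data[200];
    digit_pairs() {
        for (int i = 0; i < 100; ++i) {
            data[i * 2]     = static_cast<char>('0' + i / 10);
            data[i * 2 + 1] = static_cast<char>('0' + i % 10);
        }
    }
};

// Writes the decimal digits of `value` backwards, ending just before `rnext`.
// Returns a pointer to the first (most significant) digit written.
inline char* uint_to_buff_sketch(char* rnext, std::uint64_t value) {
    static const digit_pairs table;
    // One division by 100 yields two digits per iteration.
    while (value >= 100) {
        const auto pair = static_cast<unsigned>(value % 100);
        value /= 100;
        *--rnext = table.data[pair * 2 + 1];
        *--rnext = table.data[pair * 2];
    }
    // At most two digits remain.
    if (value >= 10) {
        *--rnext = table.data[value * 2 + 1];
        *--rnext = table.data[value * 2];
    } else {
        *--rnext = static_cast<char>('0' + value);
    }
    return rnext;
}

// Convenience wrapper: digits go into a local buffer, then into a string.
inline std::string to_string_sketch(std::uint64_t value) {
    char buff[21]; // uint64_t needs at most 20 decimal digits
    char* const end   = buff + sizeof(buff);
    char* const first = uint_to_buff_sketch(end, value);
    return std::string(first, end);
}
```

Compared with a divide-by-10 loop, this roughly halves the number of divisions for long numbers, at the cost of a table lookup per pair.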

There's a 32-bit architecture branch that deals with 64-bit types. It is left unchanged for now, since the focus is on x64 optimization.

There's a similar place in to_chars, which is also skipped for now.

🏁 Benchmark

Large and small numbers, like those naturally seen when counting things.
Generated via a log-normal distribution, as @statementreply suggested.
The parameters were picked somewhat arbitrarily, so that the values approximately fit the integer ranges.
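
A rough sketch of how such input could be generated is below. This is not the benchmark's actual code: `generate_lognormal` and the fixed seed are hypothetical, and it assumes that the two numbers in the benchmark names (for example `2.5, 1.5`) are the `m` and `s` parameters of `std::lognormal_distribution`. Values outside the target range are simply rejected here, which is one possible choice among several.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Hypothetical helper: produces `count` unsigned values drawn from a
// log-normal distribution with parameters m and s, rejecting out-of-range draws.
template <class T>
std::vector<T> generate_lognormal(std::size_t count, double m, double s, unsigned seed = 1729) {
    std::mt19937_64 gen(seed);
    std::lognormal_distribution<double> dist(m, s);

    // For 64-bit types this rounds up to 2^64, so the comparison below is
    // strict to keep the double-to-integer cast in range.
    const double limit = static_cast<double>(std::numeric_limits<T>::max());

    std::vector<T> result;
    result.reserve(count);
    while (result.size() < count) {
        const double x = std::floor(dist(gen));
        if (x < limit) {
            result.push_back(static_cast<T>(x));
        }
    }
    return result;
}
```

With a large `s`, most values stay small while a long tail reaches much higher, which mimics "counting things" better than a uniform distribution over the whole range would.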

Also benchmarked std::_UIntegral_to_buff separately, to see how much the optimization helps on its own, without the #1024 limitation.

⏱️ Benchmark results

i5-1235U P cores:

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| internal_integer_to_buff<uint8_t, 2.5, 1.5> | 3.36 ns | 2.63 ns | 1.28 |
| internal_integer_to_buff<uint16_t, 5.0, 3.0> | 3.64 ns | 2.86 ns | 1.27 |
| internal_integer_to_buff<uint32_t, 10.0, 6.0> | 4.84 ns | 3.57 ns | 1.36 |
| internal_integer_to_buff<uint64_t, 20.0, 12.0> | 9.78 ns | 6.57 ns | 1.49 |
| integer_to_string<uint8_t, 2.5, 1.5> | 6.77 ns | 7.61 ns | 0.89 |
| integer_to_string<uint16_t, 5.0, 3.0> | 8.86 ns | 7.32 ns | 1.21 |
| integer_to_string<uint32_t, 10.0, 6.0> | 8.91 ns | 7.62 ns | 1.17 |
| integer_to_string<uint64_t, 20.0, 12.0> | 16.6 ns | 12.8 ns | 1.30 |
| integer_to_string<int8_t, 2.5, 1.5> | 8.60 ns | 8.20 ns | 1.05 |
| integer_to_string<int16_t, 5.0, 3.0> | 7.73 ns | 7.91 ns | 0.98 |
| integer_to_string<int32_t, 10.0, 6.0> | 10.5 ns | 8.07 ns | 1.30 |
| integer_to_string<int64_t, 20.0, 12.0> | 16.6 ns | 13.6 ns | 1.22 |

i5-1235U E cores:

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| internal_integer_to_buff<uint8_t, 2.5, 1.5> | 4.40 ns | 6.57 ns | 0.67 |
| internal_integer_to_buff<uint16_t, 5.0, 3.0> | 7.01 ns | 5.45 ns | 1.29 |
| internal_integer_to_buff<uint32_t, 10.0, 6.0> | 11.8 ns | 6.85 ns | 1.72 |
| internal_integer_to_buff<uint64_t, 20.0, 12.0> | 25.4 ns | 14.8 ns | 1.72 |
| integer_to_string<uint8_t, 2.5, 1.5> | 19.8 ns | 22.6 ns | 0.88 |
| integer_to_string<uint16_t, 5.0, 3.0> | 17.7 ns | 18.0 ns | 0.98 |
| integer_to_string<uint32_t, 10.0, 6.0> | 20.1 ns | 19.1 ns | 1.05 |
| integer_to_string<uint64_t, 20.0, 12.0> | 40.1 ns | 32.5 ns | 1.23 |
| integer_to_string<int8_t, 2.5, 1.5> | 18.8 ns | 21.3 ns | 0.88 |
| integer_to_string<int16_t, 5.0, 3.0> | 20.8 ns | 20.3 ns | 1.02 |
| integer_to_string<int32_t, 10.0, 6.0> | 22.0 ns | 20.4 ns | 1.08 |
| integer_to_string<int64_t, 20.0, 12.0> | 40.9 ns | 36.1 ns | 1.13 |

🥉 Results interpretation

I'm not even sure if this is worth doing.

Allocating the string and copying the result into it takes roughly half of the time, so the effect of micro-optimizing digit generation is small.

However, the internal function does seem to show an improvement, which suggests that the #1024 improvement would help here. The performance may be limited by failed store-to-load forwarding, since individual character stores are followed by a bulk memcpy; in that case, the improvement may be partly negated by a longer stall.

@AlexGuteniev AlexGuteniev requested a review from a team as a code owner August 23, 2025 19:42
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews Aug 23, 2025
@StephanTLavavej StephanTLavavej added performance Must go faster decision needed We need to choose something before working on this labels Aug 24, 2025
@StephanTLavavej StephanTLavavej self-assigned this Aug 24, 2025
Comment on lines +2794 to +2795
*--_RNext = static_cast<_Elem>(__DIGIT_TABLE<_Elem>[_UVal_trunc_part * 2 + 1]);
*--_RNext = static_cast<_Elem>(__DIGIT_TABLE<_Elem>[_UVal_trunc_part * 2]);

This can be further optimized to

            _RNext -= 2;
            _CSTD memcpy(_RNext, __DIGIT_TABLE<_Elem> + _UVal_trunc_part * 2, 2 * sizeof(_Elem));

but that still doesn't give much better results in the benchmark.
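
To make the comparison concrete, here is a small standalone sketch of the two forms side by side, using plain `char` and a hypothetical runtime-built table (`pair_table`) in place of `__DIGIT_TABLE<_Elem>`. The function names are made up for illustration; both are expected to write the same two bytes.

```cpp
#include <cstring>

// Hypothetical stand-in for the digit-pair table: "00".."99", 200 characters.
inline const char* pair_table() {
    static char table[200];
    static bool ready = false;
    if (!ready) {
        for (int i = 0; i < 100; ++i) {
            table[i * 2]     = static_cast<char>('0' + i / 10);
            table[i * 2 + 1] = static_cast<char>('0' + i % 10);
        }
        ready = true;
    }
    return table;
}

// Form used in the PR: two single-character stores, written backwards.
inline char* put_pair_stores(char* rnext, unsigned pair) {
    const char* const table = pair_table();
    *--rnext = table[pair * 2 + 1];
    *--rnext = table[pair * 2];
    return rnext;
}

// Alternative form: one two-character memcpy.
inline char* put_pair_memcpy(char* rnext, unsigned pair) {
    rnext -= 2;
    std::memcpy(rnext, pair_table() + pair * 2, 2 * sizeof(char));
    return rnext;
}
```

A compiler can typically turn the fixed-size `memcpy` into a single two-byte load and store, which is why it looks attractive, even though the benchmark numbers above did not change much.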

@StephanTLavavej

This comment was marked as resolved.

@azure-pipelines

This comment was marked as resolved.

@AlexGuteniev

This comment was marked as outdated.

@AlexGuteniev AlexGuteniev force-pushed the integers branch 2 times, most recently from 672f1db to 7ea6121 Compare August 25, 2025 13:09

Labels

decision needed We need to choose something before working on this performance Must go faster

Projects

Status: Initial Review

Development

Successfully merging this pull request may close these issues.

<string>: to_string() for integers could be faster

2 participants