Use division by 100 in to_string for integers
#5691
+111
−12
⚙️ Optimization
Resolves #3857. Divides by 100 instead of by 10, as proposed.
The table is already there, as mentioned by @StephanTLavavej on Discord.
Adjusted the file containing the table to be C++14-compatible.
There's a 32-bit architecture branch that deals with 64-bit types; that one is left alone for now, as the focus is on x64 optimization.
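For illustration, here is a minimal sketch of the divide-by-100 idea, not the actual STL code. The names DigitPairs, uint_to_buff_by_100 and to_string_by_100 are hypothetical, and the table is built locally here, whereas the PR reuses a table that already exists in the STL. Each loop iteration emits two digits from the lookup table, which roughly halves the number of divisions compared with dividing by 10.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical lookup table holding "00", "01", ..., "99" back to back.
// (Assumption: built at first use for this sketch only.)
struct DigitPairs {
    char data[200];
    DigitPairs() {
        for (int i = 0; i < 100; ++i) {
            data[2 * i]     = static_cast<char>('0' + i / 10);
            data[2 * i + 1] = static_cast<char>('0' + i % 10);
        }
    }
};

// Writes the decimal digits of value backwards so that they end just before
// rnext; returns a pointer to the first digit written.
inline char* uint_to_buff_by_100(char* rnext, std::uint64_t value) {
    static const DigitPairs table;
    while (value >= 100) {
        const unsigned rem = static_cast<unsigned>(value % 100);
        value /= 100;
        rnext -= 2;
        std::memcpy(rnext, table.data + 2 * rem, 2); // two digits per division
    }
    if (value >= 10) {
        rnext -= 2;
        std::memcpy(rnext, table.data + 2 * value, 2);
    } else {
        *--rnext = static_cast<char>('0' + value); // single leading digit
    }
    return rnext;
}

// Convenience wrapper for trying the sketch out.
inline std::string to_string_by_100(std::uint64_t value) {
    char buff[21]; // a 64-bit unsigned value has at most 20 decimal digits
    char* const end   = buff + sizeof(buff);
    char* const first = uint_to_buff_by_100(end, value);
    return std::string(first, end);
}
```

The sketch takes a single 64-bit path, so it does not show the separate 32-bit architecture branch mentioned above.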
There's a similar place in to_chars; skipped for now.
🏁 Benchmark
Covers both large and small numbers, like the numbers naturally seen when counting things.
Generated via log-normal distribution, as @statementreply suggested.
Picked some arbitrary parameters, to approximately fit in the integer ranges.
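As a rough sketch of how such input could be generated (not the benchmark code from this PR), the following draws integers from std::lognormal_distribution and rejects samples that do not fit the target type. The function name generate_lognormal, the seed, and the parameters m and s are placeholders, not the values used in the benchmark.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Generates count integers of type T whose magnitudes follow a log-normal
// distribution. m and s are the mean and standard deviation of the underlying
// normal distribution; the values passed in are the caller's arbitrary choice.
template <class T>
std::vector<T> generate_lognormal(std::size_t count, double m, double s,
                                  std::uint64_t seed = 1729) {
    std::mt19937_64 gen(seed);
    std::lognormal_distribution<double> dist(m, s);

    std::vector<T> result;
    result.reserve(count);

    const double max_val = static_cast<double>(std::numeric_limits<T>::max());
    while (result.size() < count) {
        const double x = std::floor(dist(gen));
        if (x >= max_val) {
            continue; // reject samples that would not fit in T
        }
        result.push_back(static_cast<T>(x));
    }
    return result;
}
```

For example, generate_lognormal<std::uint32_t>(1000, 10.0, 3.0) yields mostly small values with an occasional large one, which is the kind of mix described above.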
Also benchmarked std::_UIntegral_to_buff separately to see how much the optimization helps on its own, avoiding the #1024 limitation.
⏱️ Benchmark results
i5-1235U P cores:
i5-1235U E cores:
🥉 Results interpretation
I'm not even sure if this is worth doing.
Allocating the string and copying the result there takes roughly half of the time, so the effect of micro-optimization in digits generation is small.
However, the internal function seems to show an improvement. This looks like an indication that the #1024 improvement would help here. It could be that performance is limited by failed store-to-load forwarding, as individual character stores are followed by a bulk memcpy; in this case, the improvement may be somewhat negated by a longer stall.