Skip to content

Conversation

jakobnissen
Copy link
Member

@jakobnissen jakobnissen commented Sep 19, 2025

This was orignially a larger PR, but I've shelved the larger refactoring for now.
This PR does two things:

  • Add a new method tryparse(::Type{T}, c::AbstractChar; base::Integer=10) where {T <: Integer} which was previously missing
  • Speed up parsing of chars to integer by ~4x

Benchmark:
```
julia> s = [rand('0':'9') for i in 1:100000];

julia> function foo(s)
           n = 0
           for i in s
               n += parse(Int, i)
           end
           n
       end
```
Before: 196 us
After: 39 us
@tecosaur
Copy link
Member

Oi, you've sniped me! (just kidding, improvement is nice to see)

@nsajko nsajko added the performance Must go faster label Sep 19, 2025
@jakobnissen jakobnissen changed the title WIP: Refactor parsing integers/floats Improve char to int parsing, and add tryparse method Sep 21, 2025
@jakobnissen jakobnissen marked this pull request as ready for review September 21, 2025 13:11
@jakobnissen
Copy link
Member Author

Test failures are unrelated

Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces a lot of unrelated and unmotivated style chagnes that make this function out of like with they style of the surrounding code. Other than that, this seems pretty reasonable.

@jakobnissen
Copy link
Member Author

  • I've aligned formatting closer with the original style (if only there was some automatic tool to format code so people didn't have to think about this!) 😉
  • I've changed the 0x0a literal to UInt8(10) as requested

@jakobnissen
Copy link
Member Author

jakobnissen commented Oct 2, 2025

Thanks for the feedback. I've made the following changes as you suggested:

  • Allow parse/tryparse to non-Integers, in the manner you suggested
  • Have tryparse(::T, ::AbstractChar) throw an error on invalid base instead of returning nothing, to mirror the string method
  • Have the other methods in the file also use _invalid_char to improve their out/inlining.
  • Use codepoint instead of doing UInt8(c). Efficiency is the same, but codepoint is a more direct, unambiguous way of achieving it.

It's fine for us to join forces. I'll keep this one open since it's the most recent and there is no point closing it, but whatever changes you have in the other PR, please feel free to point them out or suggest them here.

@cmcaine
Copy link
Contributor

cmcaine commented Oct 2, 2025

That looks good! Thanks for thinking about my comments :)

My PR includes these things that you might want to pull in:

  • documentation of the changes in the docstrings and in NEWS.md
  • more tests

Here's a patch that only contains those changes. You can apply it locally by copying it to a file and doing git apply filename. There will be some minor conflicts in the tests file.

diff --git a/NEWS.md b/NEWS.md
index e20f62d146..41f4065404 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -20,6 +20,7 @@ is considered a bug fix ([#47102])
   - The `hash` algorithm and its values have changed. Most `hash` specializations will remain correct and require no action. Types that reimplement the core hashing logic independently, such as some third-party string packages do, may require a migration to the new algorithm. ([#57509])
 
 * Indexless `getindex` and `setindex!` (i.e. `A[]`) on `ReinterpretArray` now correctly throw a `BoundsError` when there is more than one element. ([#58814])
+* Parsing characters is now faster and characters can be parsed to more number types; `tryparse` now accepts characters as well as strings ([#47167])
 
 Compiler/Runtime improvements
 -----------------------------
diff --git a/base/parse.jl b/base/parse.jl
index 4309094e9f..42e6d9350e 100644
--- a/base/parse.jl
+++ b/base/parse.jl
@@ -5,9 +5,10 @@ import Base.Checked: add_with_overflow, mul_with_overflow
 ## string to integer functions ##
 
 """
-    parse(type, str; base)
+    parse(type, str)
+    parse(<:Integer, str; base=10)
 
-Parse a string as a number. For `Integer` types, a base can be specified
+Parse a string (or character) as a number. For `Integer` types, a base can be specified
 (the default is 10). For floating-point types, the string is parsed as a decimal
 floating-point number.  `Complex` types are parsed from decimal strings
 of the form `"R±Iim"` as a `Complex(R,I)` of the requested type; `"i"` or `"j"` can also be
@@ -17,6 +18,9 @@ If the string does not contain a valid number, an error is raised.
 !!! compat "Julia 1.1"
     `parse(Bool, str)` requires at least Julia 1.1.
 
+!!! compat "Julia 1.13"
+    `parse(type, AbstractChar)` requires at least Julia 1.13 for non-integer types.
+
 # Examples
 ```jldoctest
 julia> parse(Int, "1234")
@@ -38,6 +42,18 @@ julia> parse(Complex{Float64}, "3.2e-1 + 4.5im")
 parse(T::Type, str; base = Int)
 parse(::Type{Union{}}, slurp...; kwargs...) = error("cannot parse a value as Union{}")
 
+"""
+    tryparse(type, str)
+    tryparse(<:Integer, str; base=10)
+
+Like [`parse`](@ref), but returns either a value of the requested type,
+or [`nothing`](@ref) if the string does not contain a valid number.
+
+!!! compat "Julia 1.13"
+    `tryparse(type, AbstractChar)` requires at least Julia 1.13.
+"""
+tryparse(T::Type, str; base = Int)
+
 function parse(::Type{T}, c::AbstractChar; base::Integer = 10) where T<:Integer
     a::Int = (base <= 36 ? 10 : 36)
     2 <= base <= 62 || throw(ArgumentError("invalid base: base must be 2 ≤ base ≤ 62, got $base"))
diff --git a/test/parse.jl b/test/parse.jl
index e2b94a45cc..a2e9c166e7 100644
--- a/test/parse.jl
+++ b/test/parse.jl
@@ -29,6 +29,18 @@
     @test parse(Int, 'a', base=16) == 10
     @test_throws ArgumentError parse(Int, 'a')
     @test_throws ArgumentError parse(Int,typemax(Char))
+
+    @test tryparse(Int, '8') === 8
+    @test tryparse(Int, 'a') === nothing
+    @test tryparse(Int, 'a'; base=11) === 10
+    @test tryparse(Int32, 'a'; base=11) === Int32(10)
+    @test tryparse(UInt8, 'f'; base=16) === 0x0f
+    @test tryparse(UInt8, 'f'; base=15) === nothing
+
+    @test_throws ArgumentError parse(Int, 'a'; base=63)
+    @test_throws ArgumentError tryparse(Int, 'a'; base=63)
+    @test_throws ArgumentError parse(Int, "a"; base=63)
+    @test_throws ArgumentError tryparse(Int, "a"; base=63)
 end
 
 # Issue 29451
@@ -265,6 +277,18 @@ end
 @test tryparse(Float32, "1.23") === 1.23f0
 @test tryparse(Float16, "1.23") === Float16(1.23)
 
+@testset "parse Chars to non-integer types" begin
+    for T in (Float64, Float32, Float16, Complex)
+        @test parse(T, '3') === T(3)
+        @test tryparse(T, '3') === T(3)
+        @test tryparse(T, 'a') === nothing
+
+        # for consistency with parse(T, str), `base` is not a valid argument
+        @test_throws MethodError parse(T, '3'; base=11)
+        @test_throws MethodError tryparse(T, '3'; base=11)
+    end
+end
+
 # parsing complex numbers (#22250)
 @testset "complex parsing" begin
     for sign in ('-','+'), Im in ("i","j","im"), s1 in (""," "), s2 in (""," "), s3 in (""," "), s4 in (""," ")

@PallHaraldsson

This comment was marked as off-topic.

Copy link
Member

@LilithHafner LilithHafner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. My comments are more FYI than requesting changes. If you want to change anything based on them, feel free; otherwise go ahead and add the merge me label and I'll merge this.

Thanks for the PR!

UInt8('a') cp UInt8('z') ? cp - UInt8('a') + a :
0xff
d < base || (throw ? _invalid_digit(base, c) : return nothing)
convert(T, d)::T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI This is equivalent to a ::T return type annotation.

end

function parse(::Type{T}, c::AbstractChar; base::Integer=10) where {T <: Integer}
@inline parse_char(T, c, base, true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I try to avoid second guessing the compiler on inlining heuristics unless I'm very familiar with the code and all its callsites and/or I've measured performance differences that confirm that I know better than the compiler.

This inlining choice seems accurate to me so I'm willing to proceed with the annotation if you think it helps performance.

@LilithHafner LilithHafner merged commit 92af0d8 into JuliaLang:master Oct 9, 2025
6 of 8 checks passed
@cmcaine
Copy link
Contributor

cmcaine commented Oct 9, 2025

The merged PR was still missing the documentation changes as detailed here

@LilithHafner
Copy link
Member

Ah, right. Sorry about that. Do you want me to make a followup PR with that diff or would you like to?

cmcaine added a commit to cmcaine/julia that referenced this pull request Oct 9, 2025
@cmcaine
Copy link
Contributor

cmcaine commented Oct 9, 2025

I've opened one here: #59789

@aviatesk
Copy link
Member

This introduced a new test failure.

aviatesk added a commit that referenced this pull request Oct 10, 2025
@jakobnissen jakobnissen deleted the parse branch October 10, 2025 07:58
@DilumAluthge DilumAluthge added the reverted This PR has since been reverted label Oct 11, 2025
jakobnissen pushed a commit to cmcaine/julia that referenced this pull request Oct 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Must go faster reverted This PR has since been reverted strings "Strings!"

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants