-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Improve char to int parsing, and add tryparse
method
#59603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Benchmark: ``` julia> s = [rand('0':'9') for i in 1:100000]; julia> function foo(s) n = 0 for i in s n += parse(Int, i) end n end ``` Before: 196 us After: 39 us
Oi, you've sniped me! (just kidding, improvement is nice to see) |
tryparse
method
Test failures are unrelated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR introduces a lot of unrelated and unmotivated style chagnes that make this function out of like with they style of the surrounding code. Other than that, this seems pretty reasonable.
|
Thanks for the feedback. I've made the following changes as you suggested:
It's fine for us to join forces. I'll keep this one open since it's the most recent and there is no point closing it, but whatever changes you have in the other PR, please feel free to point them out or suggest them here. |
That looks good! Thanks for thinking about my comments :) My PR includes these things that you might want to pull in:
Here's a patch that only contains those changes. You can apply it locally by copying it to a file and doing diff --git a/NEWS.md b/NEWS.md
index e20f62d146..41f4065404 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -20,6 +20,7 @@ is considered a bug fix ([#47102])
- The `hash` algorithm and its values have changed. Most `hash` specializations will remain correct and require no action. Types that reimplement the core hashing logic independently, such as some third-party string packages do, may require a migration to the new algorithm. ([#57509])
* Indexless `getindex` and `setindex!` (i.e. `A[]`) on `ReinterpretArray` now correctly throw a `BoundsError` when there is more than one element. ([#58814])
+* Parsing characters is now faster and characters can be parsed to more number types; `tryparse` now accepts characters as well as strings ([#47167])
Compiler/Runtime improvements
-----------------------------
diff --git a/base/parse.jl b/base/parse.jl
index 4309094e9f..42e6d9350e 100644
--- a/base/parse.jl
+++ b/base/parse.jl
@@ -5,9 +5,10 @@ import Base.Checked: add_with_overflow, mul_with_overflow
## string to integer functions ##
"""
- parse(type, str; base)
+ parse(type, str)
+ parse(<:Integer, str; base=10)
-Parse a string as a number. For `Integer` types, a base can be specified
+Parse a string (or character) as a number. For `Integer` types, a base can be specified
(the default is 10). For floating-point types, the string is parsed as a decimal
floating-point number. `Complex` types are parsed from decimal strings
of the form `"R±Iim"` as a `Complex(R,I)` of the requested type; `"i"` or `"j"` can also be
@@ -17,6 +18,9 @@ If the string does not contain a valid number, an error is raised.
!!! compat "Julia 1.1"
`parse(Bool, str)` requires at least Julia 1.1.
+!!! compat "Julia 1.13"
+ `parse(type, AbstractChar)` requires at least Julia 1.13 for non-integer types.
+
# Examples
```jldoctest
julia> parse(Int, "1234")
@@ -38,6 +42,18 @@ julia> parse(Complex{Float64}, "3.2e-1 + 4.5im")
parse(T::Type, str; base = Int)
parse(::Type{Union{}}, slurp...; kwargs...) = error("cannot parse a value as Union{}")
+"""
+ tryparse(type, str)
+ tryparse(<:Integer, str; base=10)
+
+Like [`parse`](@ref), but returns either a value of the requested type,
+or [`nothing`](@ref) if the string does not contain a valid number.
+
+!!! compat "Julia 1.13"
+ `tryparse(type, AbstractChar)` requires at least Julia 1.13.
+"""
+tryparse(T::Type, str; base = Int)
+
function parse(::Type{T}, c::AbstractChar; base::Integer = 10) where T<:Integer
a::Int = (base <= 36 ? 10 : 36)
2 <= base <= 62 || throw(ArgumentError("invalid base: base must be 2 ≤ base ≤ 62, got $base"))
diff --git a/test/parse.jl b/test/parse.jl
index e2b94a45cc..a2e9c166e7 100644
--- a/test/parse.jl
+++ b/test/parse.jl
@@ -29,6 +29,18 @@
@test parse(Int, 'a', base=16) == 10
@test_throws ArgumentError parse(Int, 'a')
@test_throws ArgumentError parse(Int,typemax(Char))
+
+ @test tryparse(Int, '8') === 8
+ @test tryparse(Int, 'a') === nothing
+ @test tryparse(Int, 'a'; base=11) === 10
+ @test tryparse(Int32, 'a'; base=11) === Int32(10)
+ @test tryparse(UInt8, 'f'; base=16) === 0x0f
+ @test tryparse(UInt8, 'f'; base=15) === nothing
+
+ @test_throws ArgumentError parse(Int, 'a'; base=63)
+ @test_throws ArgumentError tryparse(Int, 'a'; base=63)
+ @test_throws ArgumentError parse(Int, "a"; base=63)
+ @test_throws ArgumentError tryparse(Int, "a"; base=63)
end
# Issue 29451
@@ -265,6 +277,18 @@ end
@test tryparse(Float32, "1.23") === 1.23f0
@test tryparse(Float16, "1.23") === Float16(1.23)
+@testset "parse Chars to non-integer types" begin
+ for T in (Float64, Float32, Float16, Complex)
+ @test parse(T, '3') === T(3)
+ @test tryparse(T, '3') === T(3)
+ @test tryparse(T, 'a') === nothing
+
+ # for consistency with parse(T, str), `base` is not a valid argument
+ @test_throws MethodError parse(T, '3'; base=11)
+ @test_throws MethodError tryparse(T, '3'; base=11)
+ end
+end
+
# parsing complex numbers (#22250)
@testset "complex parsing" begin
for sign in ('-','+'), Im in ("i","j","im"), s1 in (""," "), s2 in (""," "), s3 in (""," "), s4 in (""," ") |
This comment was marked as off-topic.
This comment was marked as off-topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. My comments are more FYI than requesting changes. If you want to change anything based on them, feel free; otherwise go ahead and add the merge me label and I'll merge this.
Thanks for the PR!
UInt8('a') ≤ cp ≤ UInt8('z') ? cp - UInt8('a') + a : | ||
0xff | ||
d < base || (throw ? _invalid_digit(base, c) : return nothing) | ||
convert(T, d)::T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI This is equivalent to a ::T
return type annotation.
end | ||
|
||
function parse(::Type{T}, c::AbstractChar; base::Integer=10) where {T <: Integer} | ||
@inline parse_char(T, c, base, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I try to avoid second guessing the compiler on inlining heuristics unless I'm very familiar with the code and all its callsites and/or I've measured performance differences that confirm that I know better than the compiler.
This inlining choice seems accurate to me so I'm willing to proceed with the annotation if you think it helps performance.
The merged PR was still missing the documentation changes as detailed here |
Ah, right. Sorry about that. Do you want me to make a followup PR with that diff or would you like to? |
I've opened one here: #59789 |
This introduced a new test failure. |
This was orignially a larger PR, but I've shelved the larger refactoring for now.
This PR does two things:
tryparse(::Type{T}, c::AbstractChar; base::Integer=10) where {T <: Integer}
which was previously missing