
Conversation

LilithHafner
Member

Instead of computing the product with a pre-allocated linear product tree (i.e. foldl(mul!, arr)), use a balanced binary product tree with pre-allocated linear leaves of length 16. This is far from optimal but is already often one to two orders of magnitude faster than the current implementation. It can be sub-optimal by up to a factor of log2(n); that bound is reached when there is one big element and the rest are small and nonzero, in which case this implementation performs log2(n) large multiplications while the optimal strategy performs only 1. The current implementation is sub-optimal by a factor of n/log2(n) in cases where all the elements are of similar size.

Fixes #59052
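The core idea can be sketched without the GMP-internal preallocation. Below is a minimal, self-contained version of the balanced tree over leaf chunks of 16; `treeprod` is an illustrative name, not part of this PR or of Base, and it omits the zero short-circuit and the `MPZ.mul!` in-place accumulation used in the actual change:

```julia
# Minimal sketch of a balanced binary product tree with linear leaves of
# length 16. `treeprod`/`_treeprod` are hypothetical names for illustration.
function treeprod(arr::AbstractArray{BigInt})
    isempty(arr) && return one(BigInt)
    _treeprod(arr, firstindex(arr), lastindex(arr))
end

function _treeprod(arr, lo, hi)
    if hi - lo + 1 <= 16
        # Leaf: plain left-to-right product over at most 16 elements,
        # so the linear accumulation stays cheap.
        acc = one(BigInt)
        for i in lo:hi
            acc *= arr[i]
        end
        acc
    else
        # Internal node: split in half so the expensive multiplications
        # happen between similarly sized subproducts.
        mid = (lo + hi) ÷ 2
        _treeprod(arr, lo, mid) * _treeprod(arr, mid + 1, hi)
    end
end
```

For example, `treeprod(big.(1:100))` agrees with `factorial(big(100))`.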

Benchmarks:

[Figure: benchmark results, cycles per bit for prod, prod_default, and prod_proposed on each test case]
Methodology and test cases

function prod_default(arr::AbstractArray{BigInt})
    invoke(prod, Tuple{AbstractArray{<:Real}}, arr)
end

"Assumes `x != 0`. Computes `top_set_bit(abs(x))`."
@eval Base.GMP function _top_set_bit(x::BigInt)
    x.size * BITS_PER_LIMB - GC.@preserve x leading_zeros(unsafe_load(x.d, abs(x.size)))
end

@eval Base.GMP function prod_proposed(arr::AbstractArray{BigInt})
    any(iszero, arr) && return zero(BigInt)
    _prod_proposed(arr, firstindex(arr), lastindex(arr))
end
@eval Base.GMP function _prod_proposed(arr::AbstractArray{BigInt}, lo, hi)
    if hi - lo + 1 <= 16
        nbits = BITS_PER_LIMB
        for i in lo:hi
            nbits += _top_set_bit(arr[i])
        end
        init = BigInt(; nbits)
        MPZ.set_si!(init, 1)
        for i in lo:hi
            MPZ.mul!(init, arr[i])
        end
        init
    else
        mid = (lo + hi) ÷ 2
        MPZ.mul!(_prod_proposed(arr, lo, mid), _prod_proposed(arr, mid+1, hi))
    end
end
prod_proposed(x) = Base.GMP.prod_proposed(x)

using Chairmarks, Plots
test_cases = [
    vcat(big(rand(UInt128))^1000, fill(big(3), 10000)),
    vcat(big(rand(UInt128))^1000, fill(big(2), 10000)),
    vcat(big(rand(UInt128))^1000, fill(big(1), 10000)),
    vcat(big(rand(UInt128))^1000, fill(big(0), 10000)),
    vcat(fill(big(3), 10000), big(rand(UInt128))^1000),#5
    vcat(fill(big(2), 10000), big(rand(UInt128))^1000),
    vcat(fill(big(1), 10000), big(rand(UInt128))^1000),
    vcat(fill(big(0), 10000), big(rand(UInt128))^1000),
    [big(rand(UInt128))^8 for _ in 1:5000],
    [big(rand(UInt128))^8 for _ in 1:50],#10
    [big(rand(UInt128))^8 for _ in 1:5],
    [big(rand(UInt128))^4 for _ in 1:10000],
    [big(rand(UInt128))^2 for _ in 1:10000],
    [big(rand(UInt128)) for _ in 1:10000],
    [big(rand(UInt64)) for _ in 1:10000],#15
    [big(rand(UInt32)) for _ in 1:10000],
    [big(rand(UInt16)) for _ in 1:10000],
    [big(rand(1:3)) for _ in 1:100000],
    [big(rand(UInt16)) for _ in 1:1000],
    [big(rand(UInt16)) for _ in 1:10],#20
    collect(big.(1:10_000)),
]
bits() = sum.(Base.top_set_bit, test_cases)

function benchmark()
    [(@b x prod,prod_default,prod_proposed) for x in test_cases]
end

labels() = ["prod", "prod_default", "prod_proposed"]

x = @time benchmark()

M = 3.5e9 * max.(25e-9, transpose(stack([collect(getproperty.(t, :time)) for t in x]))) ./ bits()

yt = [.0005,.0006,.0007,.0008,.0009,.001,.002,.003,.004,.005,.006,.007,.008,.009,.01,.02,.03,.04,.05,.06,.07,.08,.09,.1,.2,.3,.4,.5,.6,.7,.8,.9,1,2,3,4,5,6,7,8,9,10,20,30,40,50,60,70,80,90,100,200,300,400,500,600,700,800,900,1000,2000]; 
scatter(M, yaxis=:log, labels=permutedims(labels()), alpha=.8, 
    yticks=(yt, [x / 10.0^floor(Int, log10(x)) > 1 ? "" : isinteger(x) ? string(Int(x)) : string(x) for x in yt]),
    ylabel="Cycles per bit", xlabel="Test case")

@LilithHafner LilithHafner added performance Must go faster bignums BigInt and BigFloat labels Sep 1, 2025
@adienes
Member

adienes commented Sep 2, 2025

will this break on AbstractArrays that don't support linear indexing?

@oscardssmith
Member

IIUC, all AbstractArrays support linear indexing (it just might be slower for IndexCartesian arrays since the linear index has to be converted to a Cartesian one). For a case like this, it's probably worth it regardless, since the indexing will be faster than the BigInt multiplication.
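(To illustrate the fallback: `PermutedDimsArray` is one example of an `IndexCartesian` array, and plain integer indexing still works on it because Base converts the linear index to a Cartesian one.)

```julia
A = reshape(collect(1:12), 3, 4)
P = PermutedDimsArray(A, (2, 1))  # a 4×3 IndexCartesian view of A

# P has no native linear indexing, but P[i::Int] still works via the
# generic IndexCartesian fallback in Base.
@assert IndexStyle(typeof(P)) === IndexCartesian()
@assert P[2] == P[2, 1] == A[1, 2]
```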

@LilithHafner
Copy link
Member Author

@oscardssmith, are you willing to give this a final review?


Successfully merging this pull request may close these issues.

prod(::AbstractVector{BigInt}) is O(n^2) when it could be O(n log n)