Skip to content

Reuse integrators in nlstep and NonlinearSolveAlg integrations #2760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

ChrisRackauckas
Copy link
Member

This is a pretty major performance boost

@@ -10,7 +10,7 @@ using Test
eqs = [D(y₁) ~ -k₁ * y₁ + k₃ * y₂ * y₃,
D(y₂) ~ k₁ * y₁ - k₂ * y₂^2 - k₃ * y₂ * y₃,
D(y₃) ~ k₂ * y₂^2]
@mtkbuild rober = ODESystem(eqs, t)
@mtkcompile rober = ODESystem(eqs, t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mtkcompile rober = System(eqs, t)

@ChrisRackauckas
Copy link
Member Author

With this and the nonlinearsolve.jl PR, this is now not allocating in the loop locally... except I found something in FBDF 😅

terk_tmp = @.. broadcast=false fd_weights[k - 2, 1]*u

Copy link
Contributor

github-actions bot commented Jul 18, 2025

Benchmark Results (Julia v1)

Time benchmarks
master afa99d6... master / afa99d6...
construction/linear_N50 29.9 ± 16 μs 0.0503 ± 0.023 ms 0.596 ± 0.42
construction/lotka_volterra 20 ± 11 μs 20.5 ± 13 μs 0.976 ± 0.84
construction/rober 0.0352 ± 0.016 ms 20.1 ± 0.96 μs 1.75 ± 0.82
nonstiff/fitzhugh_nagumo/BS3 0.153 ± 0.052 ms 0.232 ± 0.021 ms 0.662 ± 0.23
nonstiff/fitzhugh_nagumo/DP5 0.183 ± 0.031 ms 0.104 ± 0.064 ms 1.76 ± 1.1
nonstiff/fitzhugh_nagumo/Tsit5 0.118 ± 0.0082 ms 0.123 ± 0.07 ms 0.959 ± 0.55
nonstiff/fitzhugh_nagumo/Vern6 0.17 ± 0.0087 ms 0.285 ± 0.043 ms 0.598 ± 0.095
nonstiff/fitzhugh_nagumo/Vern7 0.119 ± 0.0094 ms 0.122 ± 0.0087 ms 0.975 ± 0.1
nonstiff/lotka_volterra/BS3 0.298 ± 0.017 ms 0.309 ± 0.028 ms 0.965 ± 0.11
nonstiff/lotka_volterra/DP5 0.0536 ± 0.027 ms 0.0864 ± 0.045 ms 0.62 ± 0.45
nonstiff/lotka_volterra/Tsit5 0.071 ± 0.044 ms 0.0644 ± 0.036 ms 1.1 ± 0.92
nonstiff/lotka_volterra/Vern6 0.109 ± 0.047 ms 0.153 ± 0.031 ms 0.71 ± 0.34
nonstiff/lotka_volterra/Vern7 0.0563 ± 0.034 ms 0.0619 ± 0.031 ms 0.91 ± 0.71
nonstiff/pleiades/BS3 0.121 ± 0.046 s 0.101 ± 0.032 s 1.2 ± 0.59
nonstiff/pleiades/DP5 1.25 ± 0.11 ms 1.62 ± 0.26 ms 0.777 ± 0.14
nonstiff/pleiades/Tsit5 16.5 ± 7.3 ms 21.2 ± 10 ms 0.779 ± 0.51
nonstiff/pleiades/Vern6 8.56 s 9.57 s 0.895
nonstiff/pleiades/Vern7 9.22 s 10.5 s 0.881
scaling/brusselator_2d/16x16 0.504 ± 0.24 s 0.592 ± 0.33 s 0.852 ± 0.63
scaling/brusselator_2d/32x32 25.2 s 59.2 s 0.425
scaling/brusselator_2d/8x8 12.1 ± 1.3 ms 13.5 ± 1.6 ms 0.896 ± 0.14
scaling/linear/N10 0.0447 ± 0.025 ms 0.0507 ± 0.024 ms 0.881 ± 0.64
scaling/linear/N100 1.27 ± 0.13 ms 0.867 ± 0.025 ms 1.47 ± 0.16
scaling/linear/N50 0.409 ± 0.061 ms 0.306 ± 0.17 ms 1.34 ± 0.78
stiff/pollution/FBDF 0.639 ± 0.053 ms 0.679 ± 0.031 ms 0.941 ± 0.089
stiff/pollution/KenCarp4 0.605 ± 0.042 ms 0.625 ± 0.059 ms 0.968 ± 0.11
stiff/pollution/Rodas4 0.908 ± 0.5 ms 0.833 ± 0.061 ms 1.09 ± 0.6
stiff/pollution/Rosenbrock23 1.46 ± 0.1 ms 1.49 ± 0.84 ms 0.981 ± 0.55
stiff/pollution/TRBDF2 0.551 ± 0.024 ms 0.591 ± 0.081 ms 0.932 ± 0.13
stiff/rober/FBDF 0.733 ± 0.11 ms 0.762 ± 0.22 ms 0.962 ± 0.31
stiff/rober/KenCarp4 1.08 ± 0.054 ms 1.7 ± 0.91 ms 0.633 ± 0.34
stiff/rober/Rodas4 0.507 ± 0.017 ms 1.04 ± 0.55 ms 0.487 ± 0.26
stiff/rober/Rosenbrock23 0.333 ± 0.018 ms 0.356 ± 0.023 ms 0.935 ± 0.078
stiff/rober/TRBDF2 2.46 ± 0.21 ms 2.69 ± 2.2 ms 0.915 ± 0.75
stiff/van_der_pol/FBDF 13.7 ± 9.7 ms 12.8 ± 3.5 ms 1.07 ± 0.82
stiff/van_der_pol/KenCarp4 7 ± 0.34 ms 7.68 ± 6.8 ms 0.911 ± 0.8
stiff/van_der_pol/Rodas4 12.2 ± 9.2 ms 22 ± 3.4 ms 0.555 ± 0.43
stiff/van_der_pol/Rosenbrock23 31.5 ± 14 ms 0.0363 ± 0.016 s 0.868 ± 0.54
stiff/van_der_pol/TRBDF2 4.29 ± 0.24 ms 4.33 ± 0.38 ms 0.99 ± 0.1
time_to_load 5.33 ± 0.65 s 4.91 ± 0.99 s 1.09 ± 0.26
Memory benchmarks
master afa99d6... master / afa99d6...
construction/linear_N50 0.071 k allocs: 0.0411 MB 0.071 k allocs: 0.0411 MB 1
construction/lotka_volterra 0.065 k allocs: 2.45 kB 0.065 k allocs: 2.45 kB 1
construction/rober 0.065 k allocs: 2.42 kB 0.065 k allocs: 2.42 kB 1
nonstiff/fitzhugh_nagumo/BS3 3.69 k allocs: 0.164 MB 3.69 k allocs: 0.164 MB 1
nonstiff/fitzhugh_nagumo/DP5 2.63 k allocs: 0.127 MB 2.63 k allocs: 0.127 MB 1
nonstiff/fitzhugh_nagumo/Tsit5 4 k allocs: 0.182 MB 4 k allocs: 0.182 MB 1
nonstiff/fitzhugh_nagumo/Vern6 4.54 k allocs: 0.207 MB 4.54 k allocs: 0.207 MB 1
nonstiff/fitzhugh_nagumo/Vern7 3.91 k allocs: 0.165 MB 3.91 k allocs: 0.165 MB 1
nonstiff/lotka_volterra/BS3 7.88 k allocs: 0.365 MB 7.88 k allocs: 0.365 MB 1
nonstiff/lotka_volterra/DP5 1.21 k allocs: 0.0543 MB 1.21 k allocs: 0.0543 MB 1
nonstiff/lotka_volterra/Tsit5 2.17 k allocs: 0.093 MB 2.17 k allocs: 0.093 MB 1
nonstiff/lotka_volterra/Vern6 2.24 k allocs: 0.0986 MB 2.24 k allocs: 0.0986 MB 1
nonstiff/lotka_volterra/Vern7 1.65 k allocs: 0.0736 MB 1.65 k allocs: 0.0736 MB 1
nonstiff/pleiades/BS3 0.685 M allocs: 0.0675 GB 0.685 M allocs: 0.0675 GB 1
nonstiff/pleiades/DP5 6.63 k allocs: 0.51 MB 6.63 k allocs: 0.51 MB 1
nonstiff/pleiades/Tsit5 0.186 M allocs: 20.4 MB 0.186 M allocs: 20.4 MB 1
nonstiff/pleiades/Vern6 0.038 G allocs: 4.05 GB 0.038 G allocs: 4.05 GB 1
nonstiff/pleiades/Vern7 0.044 G allocs: 4.63 GB 0.044 G allocs: 4.63 GB 1
scaling/brusselator_2d/16x16 3.58 k allocs: 0.152 GB 3.58 k allocs: 0.152 GB 1
scaling/brusselator_2d/32x32 3.4 k allocs: 2.22 GB 3.4 k allocs: 2.22 GB 1
scaling/brusselator_2d/8x8 2.62 k allocs: 8.76 MB 2.62 k allocs: 8.76 MB 1
scaling/linear/N10 0.752 k allocs: 0.0514 MB 0.752 k allocs: 0.0514 MB 1
scaling/linear/N100 2.27 k allocs: 0.901 MB 2.27 k allocs: 0.901 MB 1
scaling/linear/N50 1.66 k allocs: 0.348 MB 1.66 k allocs: 0.348 MB 1
stiff/pollution/FBDF 1.5 k allocs: 0.288 MB 1.5 k allocs: 0.288 MB 1
stiff/pollution/KenCarp4 0.516 k allocs: 0.134 MB 0.516 k allocs: 0.134 MB 1
stiff/pollution/Rodas4 1.28 k allocs: 0.363 MB 1.28 k allocs: 0.363 MB 1
stiff/pollution/Rosenbrock23 2.77 k allocs: 0.819 MB 2.77 k allocs: 0.819 MB 1
stiff/pollution/TRBDF2 0.783 k allocs: 0.168 MB 0.783 k allocs: 0.168 MB 1
stiff/rober/FBDF 2.89 k allocs: 0.137 MB 2.89 k allocs: 0.137 MB 1
stiff/rober/KenCarp4 1.3 k allocs: 0.0575 MB 1.3 k allocs: 0.0575 MB 1
stiff/rober/Rodas4 2.19 k allocs: 0.106 MB 2.19 k allocs: 0.106 MB 1
stiff/rober/Rosenbrock23 2.51 k allocs: 0.117 MB 2.51 k allocs: 0.117 MB 1
stiff/rober/TRBDF2 7.72 k allocs: 0.36 MB 7.72 k allocs: 0.36 MB 1
stiff/van_der_pol/FBDF 0.0444 M allocs: 2.2 MB 0.0444 M allocs: 2.2 MB 1
stiff/van_der_pol/KenCarp4 4.27 k allocs: 0.184 MB 4.27 k allocs: 0.184 MB 1
stiff/van_der_pol/Rodas4 0.0377 M allocs: 1.86 MB 0.0377 M allocs: 1.86 MB 1
stiff/van_der_pol/Rosenbrock23 0.213 M allocs: 9.09 MB 0.213 M allocs: 9.09 MB 1
stiff/van_der_pol/TRBDF2 5.02 k allocs: 0.212 MB 5.02 k allocs: 0.212 MB 1
time_to_load 0.155 k allocs: 10.9 kB 0.155 k allocs: 10.9 kB 1

@ChrisRackauckas
Copy link
Member Author

Spot benchmarks with this:

using ModelingToolkit
using ModelingToolkit: t_nounits as t, D_nounits as D
using NonlinearSolve, OrdinaryDiffEqBDF, OrdinaryDiffEqSDIRK, DiffEqDevTools
using OrdinaryDiffEqNonlinearSolve: NonlinearSolveAlg
using Test
using BenchmarkTools

@parameters k₁ k₂ k₃
@variables y₁(t) y₂(t) y₃(t)

eqs = [D(y₁) ~ -k₁ * y₁ + k₃ * y₂ * y₃,
    D(y₂) ~ k₁ * y₁ - k₂ * y₂^2 - k₃ * y₂ * y₃,
    D(y₃) ~ k₂ * y₂^2]
@mtkcompile rober = ODESystem(eqs, t)
prob = ODEProblem(rober, [[y₁, y₂, y₃] .=> [1.0; 0.0; 0.0]; [k₁, k₂, k₃] .=> (0.04, 3e7, 1e4)], (0.0, 1e5), jac = true)
prob2 = ODEProblem(rober, [[y₁, y₂, y₃] .=> [1.0; 0.0; 0.0]; [k₁, k₂, k₃] .=> (0.04, 3e7, 1e4)], (0.0, 1e5), jac = true, nlstep = true)

nlalg = NonlinearSolveAlg(NewtonRaphson(autodiff = AutoFiniteDiff()));
nlalgrobust = NonlinearSolveAlg(TrustRegion(autodiff = AutoFiniteDiff()));
sol2 = solve(prob2, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg));

@profview for i in 1:10000 solve(prob2, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); end

sol = solve(prob2, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg));
using Plots
plot(sol)

@btime solve(prob, FBDF(autodiff=AutoFiniteDiff()); save_everystep=false); # 132.333 μs (211 allocations: 16.17 KiB)
@btime solve(prob, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); # 148.459 μs (254 allocations: 21.77 KiB)
@btime solve(prob2, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); # 143.791 μs (250 allocations: 23.58 KiB)

@btime solve(prob, KenCarp4(autodiff=AutoFiniteDiff()); save_everystep=false); # 170.334 μs (193 allocations: 14.81 KiB)
@btime solve(prob, KenCarp4(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); # 168.083 μs (230 allocations: 20.12 KiB)
@btime solve(prob2, KenCarp4(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); # 135.625 μs (238 allocations: 23.08 KiB)

using Sundials
@btime solve(prob, CVODE_BDF(); save_everystep=false); # 72.958 μs (1289 allocations: 45.50 KiB)

using OrdinaryDiffEqRosenbrock
@btime solve(prob2, Rosenbrock23(autodiff=AutoFiniteDiff()); save_everystep=false); # 37.041 μs (179 allocations: 17.44 KiB)
@btime solve(prob2, Rodas5P(autodiff=AutoFiniteDiff()); save_everystep=false); # 84.708 μs (195 allocations: 19.72 KiB)

The vast majority of what's left vs CVODE here is just BDF overhead on small equations. We've never really worried about that too much because it's the domain of the Rosenbrock methods anyways, but with this new feature it might matter more now and we should actually improve the FBDF non-nonlinearsolver parts 😅

@oscardssmith
Copy link
Member

This is now working locally:

using ModelingToolkit
using ModelingToolkit: t_nounits as t, D_nounits as D
using LinearAlgebra, SparseArrays

const N = 8
const xyd_brusselator = range(0, stop = 1, length = N)
brusselator_f(x, y, t) = (((x - 0.3)^2 + (y - 0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.0
limit(a, N) = a == N + 1 ? 1 : a == 0 ? N : a
function brusselator_2d_loop(du, u, p, t)
    A, B, alpha, dx = p
    alpha = alpha / dx^2
    @inbounds for I in CartesianIndices((N, N))
        i, j = Tuple(I)
        x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
        ip1, im1, jp1, jm1 = limit(i + 1, N), limit(i - 1, N), limit(j + 1, N),
        limit(j - 1, N)
        du[i, j, 1] = alpha * (u[im1, j, 1] + u[ip1, j, 1] + u[i, jp1, 1] + u[i, jm1, 1] -
                       4u[i, j, 1]) +
                      B + u[i, j, 1]^2 * u[i, j, 2] - (A + 1) * u[i, j, 1] +
                      brusselator_f(x, y, t)
        du[i, j, 2] = alpha * (u[im1, j, 2] + u[ip1, j, 2] + u[i, jp1, 2] + u[i, jm1, 2] -
                       4u[i, j, 2]) +
                      A * u[i, j, 1] - u[i, j, 1]^2 * u[i, j, 2]
    end
end
p = (3.4, 1.0, 10.0, step(xyd_brusselator))

function init_brusselator_2d(xyd)
    N = length(xyd)
    u = zeros(N, N, 2)
    for I in CartesianIndices((N, N))
        x = xyd[I[1]]
        y = xyd[I[2]]
        u[I, 1] = 22 * (y * (1 - y))^(3 / 2)
        u[I, 2] = 27 * (x * (1 - x))^(3 / 2)
    end
    u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob = ODEProblem(brusselator_2d_loop, u0, (0.0, 11.5), p)

@mtkcompile sys = modelingtoolkitize(prob)
prob_mtk = ODEProblem(sys, [], (0.0, 11.5), nlstep=true, nlstep_compile=false)


nlalg = NonlinearSolveAlg(NewtonRaphson(autodiff = AutoFiniteDiff()))
using NonlinearSolve,OrdinaryDiffEqBDF, OrdinaryDiffEqNonlinearSolve
using OrdinaryDiffEqNonlinearSolve: NonlinearSolveAlg

nlalg = NonlinearSolveAlg(TrustRegion(autodiff = AutoFiniteDiff()))
@btime sol = solve(prob_mtk, FBDF(autodiff=false, nlsolve = nlalg)).stats

@ChrisRackauckas
Copy link
Member Author

nlstep tests are failing.

@ChrisRackauckas ChrisRackauckas mentioned this pull request Aug 10, 2025
10 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants