Reuse integrators in nlstep and NonlinearSolveAlg integrations #2760

ChrisRackauckas · 2025-07-18T18:56:31Z

This is a pretty major performance boost

oscardssmith · 2025-07-18T19:44:49Z

test/modelingtoolkit/nlstep_tests.jl

@@ -10,7 +10,7 @@ using Test
 eqs = [D(y₁) ~ -k₁ * y₁ + k₃ * y₂ * y₃,
    D(y₂) ~ k₁ * y₁ - k₂ * y₂^2 - k₃ * y₂ * y₃,
    D(y₃) ~ k₂ * y₂^2]
-@mtkbuild rober = ODESystem(eqs, t)
+@mtkcompile rober = ODESystem(eqs, t)


@mtkcompile rober = System(eqs, t)

lib/OrdinaryDiffEqNonlinearSolve/src/newton.jl

ChrisRackauckas · 2025-07-18T19:48:01Z

With this and the nonlinearsolve.jl PR, this is now not allocating in the loop locally... except I found something in FBDF 😅

terk_tmp = @.. broadcast=false fd_weights[k - 2, 1]*u

github-actions · 2025-07-18T19:49:55Z

Benchmark Results (Julia v1)

Time benchmarks

	master	`afa99d6`...	master / `afa99d6`...
construction/linear_N50	29.9 ± 16 μs	0.0503 ± 0.023 ms	0.596 ± 0.42
construction/lotka_volterra	20 ± 11 μs	20.5 ± 13 μs	0.976 ± 0.84
construction/rober	0.0352 ± 0.016 ms	20.1 ± 0.96 μs	1.75 ± 0.82
nonstiff/fitzhugh_nagumo/BS3	0.153 ± 0.052 ms	0.232 ± 0.021 ms	0.662 ± 0.23
nonstiff/fitzhugh_nagumo/DP5	0.183 ± 0.031 ms	0.104 ± 0.064 ms	1.76 ± 1.1
nonstiff/fitzhugh_nagumo/Tsit5	0.118 ± 0.0082 ms	0.123 ± 0.07 ms	0.959 ± 0.55
nonstiff/fitzhugh_nagumo/Vern6	0.17 ± 0.0087 ms	0.285 ± 0.043 ms	0.598 ± 0.095
nonstiff/fitzhugh_nagumo/Vern7	0.119 ± 0.0094 ms	0.122 ± 0.0087 ms	0.975 ± 0.1
nonstiff/lotka_volterra/BS3	0.298 ± 0.017 ms	0.309 ± 0.028 ms	0.965 ± 0.11
nonstiff/lotka_volterra/DP5	0.0536 ± 0.027 ms	0.0864 ± 0.045 ms	0.62 ± 0.45
nonstiff/lotka_volterra/Tsit5	0.071 ± 0.044 ms	0.0644 ± 0.036 ms	1.1 ± 0.92
nonstiff/lotka_volterra/Vern6	0.109 ± 0.047 ms	0.153 ± 0.031 ms	0.71 ± 0.34
nonstiff/lotka_volterra/Vern7	0.0563 ± 0.034 ms	0.0619 ± 0.031 ms	0.91 ± 0.71
nonstiff/pleiades/BS3	0.121 ± 0.046 s	0.101 ± 0.032 s	1.2 ± 0.59
nonstiff/pleiades/DP5	1.25 ± 0.11 ms	1.62 ± 0.26 ms	0.777 ± 0.14
nonstiff/pleiades/Tsit5	16.5 ± 7.3 ms	21.2 ± 10 ms	0.779 ± 0.51
nonstiff/pleiades/Vern6	8.56 s	9.57 s	0.895
nonstiff/pleiades/Vern7	9.22 s	10.5 s	0.881
scaling/brusselator_2d/16x16	0.504 ± 0.24 s	0.592 ± 0.33 s	0.852 ± 0.63
scaling/brusselator_2d/32x32	25.2 s	59.2 s	0.425
scaling/brusselator_2d/8x8	12.1 ± 1.3 ms	13.5 ± 1.6 ms	0.896 ± 0.14
scaling/linear/N10	0.0447 ± 0.025 ms	0.0507 ± 0.024 ms	0.881 ± 0.64
scaling/linear/N100	1.27 ± 0.13 ms	0.867 ± 0.025 ms	1.47 ± 0.16
scaling/linear/N50	0.409 ± 0.061 ms	0.306 ± 0.17 ms	1.34 ± 0.78
stiff/pollution/FBDF	0.639 ± 0.053 ms	0.679 ± 0.031 ms	0.941 ± 0.089
stiff/pollution/KenCarp4	0.605 ± 0.042 ms	0.625 ± 0.059 ms	0.968 ± 0.11
stiff/pollution/Rodas4	0.908 ± 0.5 ms	0.833 ± 0.061 ms	1.09 ± 0.6
stiff/pollution/Rosenbrock23	1.46 ± 0.1 ms	1.49 ± 0.84 ms	0.981 ± 0.55
stiff/pollution/TRBDF2	0.551 ± 0.024 ms	0.591 ± 0.081 ms	0.932 ± 0.13
stiff/rober/FBDF	0.733 ± 0.11 ms	0.762 ± 0.22 ms	0.962 ± 0.31
stiff/rober/KenCarp4	1.08 ± 0.054 ms	1.7 ± 0.91 ms	0.633 ± 0.34
stiff/rober/Rodas4	0.507 ± 0.017 ms	1.04 ± 0.55 ms	0.487 ± 0.26
stiff/rober/Rosenbrock23	0.333 ± 0.018 ms	0.356 ± 0.023 ms	0.935 ± 0.078
stiff/rober/TRBDF2	2.46 ± 0.21 ms	2.69 ± 2.2 ms	0.915 ± 0.75
stiff/van_der_pol/FBDF	13.7 ± 9.7 ms	12.8 ± 3.5 ms	1.07 ± 0.82
stiff/van_der_pol/KenCarp4	7 ± 0.34 ms	7.68 ± 6.8 ms	0.911 ± 0.8
stiff/van_der_pol/Rodas4	12.2 ± 9.2 ms	22 ± 3.4 ms	0.555 ± 0.43
stiff/van_der_pol/Rosenbrock23	31.5 ± 14 ms	0.0363 ± 0.016 s	0.868 ± 0.54
stiff/van_der_pol/TRBDF2	4.29 ± 0.24 ms	4.33 ± 0.38 ms	0.99 ± 0.1
time_to_load	5.33 ± 0.65 s	4.91 ± 0.99 s	1.09 ± 0.26

Memory benchmarks

	master	`afa99d6`...	master / `afa99d6`...
construction/linear_N50	0.071 k allocs: 0.0411 MB	0.071 k allocs: 0.0411 MB	1
construction/lotka_volterra	0.065 k allocs: 2.45 kB	0.065 k allocs: 2.45 kB	1
construction/rober	0.065 k allocs: 2.42 kB	0.065 k allocs: 2.42 kB	1
nonstiff/fitzhugh_nagumo/BS3	3.69 k allocs: 0.164 MB	3.69 k allocs: 0.164 MB	1
nonstiff/fitzhugh_nagumo/DP5	2.63 k allocs: 0.127 MB	2.63 k allocs: 0.127 MB	1
nonstiff/fitzhugh_nagumo/Tsit5	4 k allocs: 0.182 MB	4 k allocs: 0.182 MB	1
nonstiff/fitzhugh_nagumo/Vern6	4.54 k allocs: 0.207 MB	4.54 k allocs: 0.207 MB	1
nonstiff/fitzhugh_nagumo/Vern7	3.91 k allocs: 0.165 MB	3.91 k allocs: 0.165 MB	1
nonstiff/lotka_volterra/BS3	7.88 k allocs: 0.365 MB	7.88 k allocs: 0.365 MB	1
nonstiff/lotka_volterra/DP5	1.21 k allocs: 0.0543 MB	1.21 k allocs: 0.0543 MB	1
nonstiff/lotka_volterra/Tsit5	2.17 k allocs: 0.093 MB	2.17 k allocs: 0.093 MB	1
nonstiff/lotka_volterra/Vern6	2.24 k allocs: 0.0986 MB	2.24 k allocs: 0.0986 MB	1
nonstiff/lotka_volterra/Vern7	1.65 k allocs: 0.0736 MB	1.65 k allocs: 0.0736 MB	1
nonstiff/pleiades/BS3	0.685 M allocs: 0.0675 GB	0.685 M allocs: 0.0675 GB	1
nonstiff/pleiades/DP5	6.63 k allocs: 0.51 MB	6.63 k allocs: 0.51 MB	1
nonstiff/pleiades/Tsit5	0.186 M allocs: 20.4 MB	0.186 M allocs: 20.4 MB	1
nonstiff/pleiades/Vern6	0.038 G allocs: 4.05 GB	0.038 G allocs: 4.05 GB	1
nonstiff/pleiades/Vern7	0.044 G allocs: 4.63 GB	0.044 G allocs: 4.63 GB	1
scaling/brusselator_2d/16x16	3.58 k allocs: 0.152 GB	3.58 k allocs: 0.152 GB	1
scaling/brusselator_2d/32x32	3.4 k allocs: 2.22 GB	3.4 k allocs: 2.22 GB	1
scaling/brusselator_2d/8x8	2.62 k allocs: 8.76 MB	2.62 k allocs: 8.76 MB	1
scaling/linear/N10	0.752 k allocs: 0.0514 MB	0.752 k allocs: 0.0514 MB	1
scaling/linear/N100	2.27 k allocs: 0.901 MB	2.27 k allocs: 0.901 MB	1
scaling/linear/N50	1.66 k allocs: 0.348 MB	1.66 k allocs: 0.348 MB	1
stiff/pollution/FBDF	1.5 k allocs: 0.288 MB	1.5 k allocs: 0.288 MB	1
stiff/pollution/KenCarp4	0.516 k allocs: 0.134 MB	0.516 k allocs: 0.134 MB	1
stiff/pollution/Rodas4	1.28 k allocs: 0.363 MB	1.28 k allocs: 0.363 MB	1
stiff/pollution/Rosenbrock23	2.77 k allocs: 0.819 MB	2.77 k allocs: 0.819 MB	1
stiff/pollution/TRBDF2	0.783 k allocs: 0.168 MB	0.783 k allocs: 0.168 MB	1
stiff/rober/FBDF	2.89 k allocs: 0.137 MB	2.89 k allocs: 0.137 MB	1
stiff/rober/KenCarp4	1.3 k allocs: 0.0575 MB	1.3 k allocs: 0.0575 MB	1
stiff/rober/Rodas4	2.19 k allocs: 0.106 MB	2.19 k allocs: 0.106 MB	1
stiff/rober/Rosenbrock23	2.51 k allocs: 0.117 MB	2.51 k allocs: 0.117 MB	1
stiff/rober/TRBDF2	7.72 k allocs: 0.36 MB	7.72 k allocs: 0.36 MB	1
stiff/van_der_pol/FBDF	0.0444 M allocs: 2.2 MB	0.0444 M allocs: 2.2 MB	1
stiff/van_der_pol/KenCarp4	4.27 k allocs: 0.184 MB	4.27 k allocs: 0.184 MB	1
stiff/van_der_pol/Rodas4	0.0377 M allocs: 1.86 MB	0.0377 M allocs: 1.86 MB	1
stiff/van_der_pol/Rosenbrock23	0.213 M allocs: 9.09 MB	0.213 M allocs: 9.09 MB	1
stiff/van_der_pol/TRBDF2	5.02 k allocs: 0.212 MB	5.02 k allocs: 0.212 MB	1
time_to_load	0.155 k allocs: 10.9 kB	0.155 k allocs: 10.9 kB	1

ChrisRackauckas · 2025-07-18T21:24:42Z

Spot benchmarks with this:

using ModelingToolkit
using ModelingToolkit: t_nounits as t, D_nounits as D
using NonlinearSolve, OrdinaryDiffEqBDF, OrdinaryDiffEqSDIRK, DiffEqDevTools
using OrdinaryDiffEqNonlinearSolve: NonlinearSolveAlg
using Test
using BenchmarkTools

@parameters k₁ k₂ k₃
@variables y₁(t) y₂(t) y₃(t)

eqs = [D(y₁) ~ -k₁ * y₁ + k₃ * y₂ * y₃,
    D(y₂) ~ k₁ * y₁ - k₂ * y₂^2 - k₃ * y₂ * y₃,
    D(y₃) ~ k₂ * y₂^2]
@mtkcompile rober = ODESystem(eqs, t)
prob = ODEProblem(rober, [[y₁, y₂, y₃] .=> [1.0; 0.0; 0.0]; [k₁, k₂, k₃] .=> (0.04, 3e7, 1e4)], (0.0, 1e5), jac = true)
prob2 = ODEProblem(rober, [[y₁, y₂, y₃] .=> [1.0; 0.0; 0.0]; [k₁, k₂, k₃] .=> (0.04, 3e7, 1e4)], (0.0, 1e5), jac = true, nlstep = true)

nlalg = NonlinearSolveAlg(NewtonRaphson(autodiff = AutoFiniteDiff()));
nlalgrobust = NonlinearSolveAlg(TrustRegion(autodiff = AutoFiniteDiff()));
sol2 = solve(prob2, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg));

@profview for i in 1:10000 solve(prob2, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); end

sol = solve(prob2, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg));
using Plots
plot(sol)

@btime solve(prob, FBDF(autodiff=AutoFiniteDiff()); save_everystep=false); # 132.333 μs (211 allocations: 16.17 KiB)
@btime solve(prob, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); # 148.459 μs (254 allocations: 21.77 KiB)
@btime solve(prob2, FBDF(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); # 143.791 μs (250 allocations: 23.58 KiB)

@btime solve(prob, KenCarp4(autodiff=AutoFiniteDiff()); save_everystep=false); # 170.334 μs (193 allocations: 14.81 KiB)
@btime solve(prob, KenCarp4(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); # 168.083 μs (230 allocations: 20.12 KiB)
@btime solve(prob2, KenCarp4(autodiff=AutoFiniteDiff(), nlsolve = nlalg); save_everystep=false); # 135.625 μs (238 allocations: 23.08 KiB)

using Sundials
@btime solve(prob, CVODE_BDF(); save_everystep=false); # 72.958 μs (1289 allocations: 45.50 KiB)

using OrdinaryDiffEqRosenbrock
@btime solve(prob2, Rosenbrock23(autodiff=AutoFiniteDiff()); save_everystep=false); # 37.041 μs (179 allocations: 17.44 KiB)
@btime solve(prob2, Rodas5P(autodiff=AutoFiniteDiff()); save_everystep=false); # 84.708 μs (195 allocations: 19.72 KiB)

The vast majority of what's left vs CVODE here is just BDF overhead on small equations. We've never really worried about that too much because it's the domain of the Rosenbrock methods anyways, but with this new feature it might matter more now and we should actually improve the FBDF non-nonlinearsolver parts 😅

This is a pretty major performance boost

oscardssmith · 2025-08-08T04:48:49Z

This is now working locally:

using ModelingToolkit
using ModelingToolkit: t_nounits as t, D_nounits as D
using LinearAlgebra, SparseArrays

const N = 8
const xyd_brusselator = range(0, stop = 1, length = N)
brusselator_f(x, y, t) = (((x - 0.3)^2 + (y - 0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.0
limit(a, N) = a == N + 1 ? 1 : a == 0 ? N : a
function brusselator_2d_loop(du, u, p, t)
    A, B, alpha, dx = p
    alpha = alpha / dx^2
    @inbounds for I in CartesianIndices((N, N))
        i, j = Tuple(I)
        x, y = xyd_brusselator[I[1]], xyd_brusselator[I[2]]
        ip1, im1, jp1, jm1 = limit(i + 1, N), limit(i - 1, N), limit(j + 1, N),
        limit(j - 1, N)
        du[i, j, 1] = alpha * (u[im1, j, 1] + u[ip1, j, 1] + u[i, jp1, 1] + u[i, jm1, 1] -
                       4u[i, j, 1]) +
                      B + u[i, j, 1]^2 * u[i, j, 2] - (A + 1) * u[i, j, 1] +
                      brusselator_f(x, y, t)
        du[i, j, 2] = alpha * (u[im1, j, 2] + u[ip1, j, 2] + u[i, jp1, 2] + u[i, jm1, 2] -
                       4u[i, j, 2]) +
                      A * u[i, j, 1] - u[i, j, 1]^2 * u[i, j, 2]
    end
end
p = (3.4, 1.0, 10.0, step(xyd_brusselator))

function init_brusselator_2d(xyd)
    N = length(xyd)
    u = zeros(N, N, 2)
    for I in CartesianIndices((N, N))
        x = xyd[I[1]]
        y = xyd[I[2]]
        u[I, 1] = 22 * (y * (1 - y))^(3 / 2)
        u[I, 2] = 27 * (x * (1 - x))^(3 / 2)
    end
    u
end
u0 = init_brusselator_2d(xyd_brusselator)
prob = ODEProblem(brusselator_2d_loop, u0, (0.0, 11.5), p)

@mtkcompile sys = modelingtoolkitize(prob)
prob_mtk = ODEProblem(sys, [], (0.0, 11.5), nlstep=true, nlstep_compile=false)


nlalg = NonlinearSolveAlg(NewtonRaphson(autodiff = AutoFiniteDiff()))
using NonlinearSolve,OrdinaryDiffEqBDF, OrdinaryDiffEqNonlinearSolve
using OrdinaryDiffEqNonlinearSolve: NonlinearSolveAlg

nlalg = NonlinearSolveAlg(TrustRegion(autodiff = AutoFiniteDiff()))
@btime sol = solve(prob_mtk, FBDF(autodiff=false, nlsolve = nlalg)).stats

ChrisRackauckas · 2025-08-08T06:55:07Z

nlstep tests are failing.

ChrisRackauckas mentioned this pull request Jul 18, 2025

NonlinearSolveAlg performance testing and improvements #2757

Open

oscardssmith reviewed Jul 18, 2025

View reviewed changes

lib/OrdinaryDiffEqNonlinearSolve/src/newton.jl Outdated Show resolved Hide resolved

ChrisRackauckas added 2 commits August 6, 2025 14:05

Reuse integrators in nlstep and NonlinearSolveAlg integrations

acaca21

This is a pretty major performance boost

Reduce loop allocations

b9dd04e

oscardssmith mentioned this pull request Aug 6, 2025

don't reset aliasing on reinit! SciML/NonlinearSolve.jl#668

Merged

switch tmp and inplace probmap

e115f0d

oscardssmith force-pushed the reinit branch from 2228761 to e115f0d Compare August 7, 2025 17:07

it works

afa99d6

ChrisRackauckas mentioned this pull request Aug 10, 2025

OrdinaryDiffEq v7 #2310

Open

10 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Reuse integrators in nlstep and NonlinearSolveAlg integrations #2760

Reuse integrators in nlstep and NonlinearSolveAlg integrations #2760

ChrisRackauckas commented Jul 18, 2025

Uh oh!

oscardssmith Jul 18, 2025

Uh oh!

Uh oh!

ChrisRackauckas commented Jul 18, 2025

Uh oh!

github-actions bot commented Jul 18, 2025 •

edited

Loading

Uh oh!

ChrisRackauckas commented Jul 18, 2025

Uh oh!

oscardssmith commented Aug 8, 2025

Uh oh!

ChrisRackauckas commented Aug 8, 2025

Uh oh!

Uh oh!

Uh oh!

Reuse integrators in nlstep and NonlinearSolveAlg integrations #2760

Are you sure you want to change the base?

Reuse integrators in nlstep and NonlinearSolveAlg integrations #2760

Conversation

ChrisRackauckas commented Jul 18, 2025

Uh oh!

oscardssmith Jul 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ChrisRackauckas commented Jul 18, 2025

Uh oh!

github-actions bot commented Jul 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark Results (Julia v1)

Uh oh!

ChrisRackauckas commented Jul 18, 2025

Uh oh!

oscardssmith commented Aug 8, 2025

Uh oh!

ChrisRackauckas commented Aug 8, 2025

Uh oh!

Uh oh!

github-actions bot commented Jul 18, 2025 •

edited

Loading