
Conversation

@Zinoex (Contributor) commented Nov 13, 2025

Add `@inbounds` to `SparseArrays.nzrange(g::CuSparseDeviceMatrixCSC, col::Integer)` to avoid bounds-checking the `colPtr` accesses (which were causing significant register spilling into local memory for me), consistent with the new sparse-array functionality in GPUArrays.jl.
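
For context, a minimal sketch (not part of the PR; the kernel and its names are hypothetical) of the kind of device-side call site this affects, using the `nzrange`/`nonzeros` accessors defined in `lib/cusparse/device.jl`:

```julia
using CUDA, CUDA.CUSPARSE, SparseArrays

# Hypothetical kernel: each thread sums the nonzeros of one column.
# On the device, `A` is a CuSparseDeviceMatrixCSC, so the loop bounds come
# from SparseArrays.nzrange - the method this PR marks with @inbounds.
function colsum_kernel!(out, A)
    col = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if col <= size(A, 2)
        acc = zero(eltype(out))
        @inbounds for i in SparseArrays.nzrange(A, col)
            acc += SparseArrays.nonzeros(A)[i]
        end
        @inbounds out[col] = acc
    end
    return nothing
end

A = CuSparseMatrixCSC(sprand(Float32, 1_000, 1_000, 0.01))
out = CUDA.zeros(Float32, size(A, 2))
@cuda threads=256 blocks=cld(size(A, 2), 256) colsum_kernel!(out, A)
```

Without `@inbounds` inside `nzrange`, each `colPtr` lookup carries a bounds check on the device, which can add register pressure in larger kernels.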

@github-actions (bot) commented:

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (`git runic master`) to apply these changes.

Suggested changes:
```diff
diff --git a/lib/cusparse/device.jl b/lib/cusparse/device.jl
index 6fa563552..dea10c5f6 100644
--- a/lib/cusparse/device.jl
+++ b/lib/cusparse/device.jl
@@ -37,7 +37,7 @@ SparseArrays.nnz(g::CuSparseDeviceMatrixCSC) = g.nnz
 SparseArrays.rowvals(g::CuSparseDeviceMatrixCSC) = g.rowVal
 SparseArrays.getcolptr(g::CuSparseDeviceMatrixCSC) = g.colPtr
 SparseArrays.getnzval(g::CuSparseDeviceMatrixCSC) = g.nzVal
-SparseArrays.nzrange(g::CuSparseDeviceMatrixCSC, col::Integer) = @inbounds SparseArrays.getcolptr(g)[col]:(SparseArrays.getcolptr(g)[col+1]-1)
+SparseArrays.nzrange(g::CuSparseDeviceMatrixCSC, col::Integer) = @inbounds SparseArrays.getcolptr(g)[col]:(SparseArrays.getcolptr(g)[col + 1] - 1)
 SparseArrays.nonzeros(g::CuSparseDeviceMatrixCSC) = g.nzVal
 
 const CuSparseDeviceColumnView{Tv, Ti} = SubArray{Tv, 1, <:CuSparseDeviceMatrixCSC{Tv, Ti}, Tuple{Base.Slice{Base.OneTo{Int}}, Int}}
```

@kshyatt (Member) commented Nov 13, 2025

Just FYI, these types are probably going away soon -- I honestly don't remember if I had `@inbounds` on the GPUArrays.jl function 🙈

@Zinoex (Contributor, Author) commented Nov 13, 2025

I'll cross that bridge when I need to. The GPUArrays.jl sparse device arrays look to be a drop-in replacement, but I also have some quite complex kernels in IntervalMDP.jl operating on `CuSparseMatrixCSC`, so I expect to run into issues when transitioning, and I most certainly don't have the time to fix that at the moment.

@github-actions (bot) left a comment


CUDA.jl Benchmarks

| Benchmark suite | Current: 48ee0d2 | Previous: 2e983fe | Ratio |
|---|---|---|---|
| latency/precompile | 56920777390 ns | 56427085830.5 ns | 1.01 |
| latency/ttfp | 8317020437.5 ns | 8362501410 ns | 0.99 |
| latency/import | 4498206426 ns | 4521778039 ns | 0.99 |
| integration/volumerhs | 9624244.5 ns | 9624952.5 ns | 1.00 |
| integration/byval/slices=1 | 147221 ns | 146870 ns | 1.00 |
| integration/byval/slices=3 | 426220 ns | 425790 ns | 1.00 |
| integration/byval/reference | 145136 ns | 144866 ns | 1.00 |
| integration/byval/slices=2 | 286528.5 ns | 286021 ns | 1.00 |
| integration/cudadevrt | 103621 ns | 103323 ns | 1.00 |
| kernel/indexing | 14182 ns | 14090 ns | 1.01 |
| kernel/indexing_checked | 14943 ns | 14977.5 ns | 1.00 |
| kernel/occupancy | 691.2333333333333 ns | 670.5886075949367 ns | 1.03 |
| kernel/launch | 2199.3333333333335 ns | 2115.8 ns | 1.04 |
| kernel/rand | 18671 ns | 16842 ns | 1.11 |
| array/reverse/1d | 19950 ns | 19633 ns | 1.02 |
| array/reverse/2dL_inplace | 66907 ns | 66698 ns | 1.00 |
| array/reverse/1dL | 70183 ns | 69881 ns | 1.00 |
| array/reverse/2d | 21758 ns | 21367 ns | 1.02 |
| array/reverse/1d_inplace | 9638 ns | 9601 ns | 1.00 |
| array/reverse/2d_inplace | 13400 ns | 13220 ns | 1.01 |
| array/reverse/2dL | 73895 ns | 73483 ns | 1.01 |
| array/reverse/1dL_inplace | 66908 ns | 66751 ns | 1.00 |
| array/copy | 20830 ns | 20712 ns | 1.01 |
| array/iteration/findall/int | 157172.5 ns | 156846 ns | 1.00 |
| array/iteration/findall/bool | 139875 ns | 139935.5 ns | 1.00 |
| array/iteration/findfirst/int | 161160 ns | 160606 ns | 1.00 |
| array/iteration/findfirst/bool | 161959 ns | 161405 ns | 1.00 |
| array/iteration/scalar | 73837 ns | 72218 ns | 1.02 |
| array/iteration/logical | 216845.5 ns | 215761.5 ns | 1.01 |
| array/iteration/findmin/1d | 50519 ns | 49669 ns | 1.02 |
| array/iteration/findmin/2d | 96561.5 ns | 96275.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43137 ns | 43492 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1 | 44422 ns | 44664.5 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2 | 61610 ns | 61641 ns | 1.00 |
| array/reductions/reduce/Int64/dims=1L | 88803 ns | 88640 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 87915.5 ns | 87635.5 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 37370 ns | 36681 ns | 1.02 |
| array/reductions/reduce/Float32/dims=1 | 51991 ns | 48806 ns | 1.07 |
| array/reductions/reduce/Float32/dims=2 | 59879 ns | 59459 ns | 1.01 |
| array/reductions/reduce/Float32/dims=1L | 52434 ns | 52065 ns | 1.01 |
| array/reductions/reduce/Float32/dims=2L | 72183 ns | 71664 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 43399 ns | 43256 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1 | 44879.5 ns | 44863 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 61690 ns | 61500 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=1L | 89152 ns | 88638 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=2L | 88167 ns | 87897.5 ns | 1.00 |
| array/reductions/mapreduce/Float32/1d | 37366 ns | 36277.5 ns | 1.03 |
| array/reductions/mapreduce/Float32/dims=1 | 48730 ns | 41259 ns | 1.18 |
| array/reductions/mapreduce/Float32/dims=2 | 60130 ns | 59440 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1L | 52689 ns | 52331.5 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 72524.5 ns | 71656.5 ns | 1.01 |
| array/broadcast | 20031 ns | 19817 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 12986 ns | 11436 ns | 1.14 |
| array/copyto!/cpu_to_gpu | 214181 ns | 215179 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 282787 ns | 282618 ns | 1.00 |
| array/accumulate/Int64/1d | 124556 ns | 124273 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 83102 ns | 83182 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 157715 ns | 157485 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1709359 ns | 1709450 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966166 ns | 966304 ns | 1.00 |
| array/accumulate/Float32/1d | 109001 ns | 108932 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 80062 ns | 80065 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 147542.5 ns | 146929 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1618641 ns | 1618534.5 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698238 ns | 697506 ns | 1.00 |
| array/construct | 1282.2 ns | 1270.6 ns | 1.01 |
| array/random/randn/Float32 | 45167.5 ns | 47947 ns | 0.94 |
| array/random/randn!/Float32 | 24926 ns | 24918 ns | 1.00 |
| array/random/rand!/Int64 | 27311 ns | 27167 ns | 1.01 |
| array/random/rand!/Float32 | 8903.666666666666 ns | 8884.333333333334 ns | 1.00 |
| array/random/rand/Int64 | 29812 ns | 37695.5 ns | 0.79 |
| array/random/rand/Float32 | 13240.5 ns | 12943 ns | 1.02 |
| array/permutedims/4d | 55770.5 ns | 59797.5 ns | 0.93 |
| array/permutedims/2d | 54046 ns | 53660 ns | 1.01 |
| array/permutedims/3d | 54951 ns | 54666 ns | 1.01 |
| array/sorting/1d | 2757753 ns | 2757791.5 ns | 1.00 |
| array/sorting/by | 3344532 ns | 3344326 ns | 1.00 |
| array/sorting/2d | 1080947 ns | 1080588 ns | 1.00 |
| cuda/synchronization/stream/auto | 1044.8 ns | 1040 ns | 1.00 |
| cuda/synchronization/stream/nonblocking | 7844.4 ns | 6879.299999999999 ns | 1.14 |
| cuda/synchronization/stream/blocking | 857.6823529411764 ns | 805.0612244897959 ns | 1.07 |
| cuda/synchronization/context/auto | 1196 ns | 1175.2 ns | 1.02 |
| cuda/synchronization/context/nonblocking | 7480 ns | 7439.7 ns | 1.01 |
| cuda/synchronization/context/blocking | 932.9230769230769 ns | 896.560975609756 ns | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.
