After working on #584, I noticed the sparse implementation of SUNMatScaleAdd has a $O(M N)$ runtime for the same reason identified in #253. Ideally the runtime would be $O(nnz_A + nnz_B)$ (total # of nonzero entries in two matrices being added). This can be attained (or at least close), but this is complicated somewhat by indexptrs not necessarily being sorted for each column (assuming CSC). After reviewing the csparse implementation, on which I think the current algorithm is roughly based, I see two possible approaches to improve SUNMatScaleAdd performance that I can test.