Closed
Labels: broadcast (applying a function over a collection), performance (must go faster)
Description
Consider two implementations of the same function:
julia> function foo2(T, c, f)
           d = T.(c)
           return sum(f .* d)
       end
foo2 (generic function with 2 methods)
julia> bar2(T, c, f) = sum(f .* T.(c))
bar2 (generic function with 3 methods)
bar2 seems like it should simply be better: it avoids creating the intermediate variable, which allows the broadcast to fuse. And indeed, with a plain vector it is faster:
julia> @btime foo2(Float32, [0.1,0.2], [1.0,2.0]);
473.802 ns (9 allocations: 512 bytes)
julia> @btime bar2(Float32, [0.1,0.2], [1.0,2.0]);
369.920 ns (8 allocations: 416 bytes)
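As a rough sketch of what "fusing" means here: the dot syntax in the two definitions should lower to calls along the following lines, using `Base.Broadcast.broadcasted` and `Base.Broadcast.materialize`. The variable names `unfused` and `fused` are illustrative only, and this shows the shape of the lowered calls, not an explanation of the timings.

```julia
using Base.Broadcast: broadcasted, materialize

T, c, f = Float32, [0.1, 0.2], [1.0, 2.0]

# foo2: two separate broadcasts. The intermediate array `d` is
# materialized before the multiplication runs.
d = materialize(broadcasted(T, c))
unfused = sum(materialize(broadcasted(*, f, d)))

# bar2: one fused broadcast. The inner `broadcasted(T, c)` stays lazy
# and is only evaluated element by element inside the outer materialize.
fused = sum(materialize(broadcasted(*, f, broadcasted(T, c))))
```

Both forms compute the same value; the difference is only whether the intermediate array is allocated.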
But with a StaticArray, foo2 is much better:
julia> @btime foo2(Float32, @SVector[0.1,0.2], @SVector[1.0,2.0]);
0.031 ns (0 allocations: 0 bytes)
julia> @btime bar2(Float32, @SVector[0.1,0.2], @SVector[1.0,2.0]);
946.536 ns (22 allocations: 528 bytes)
I think what is happening is that something about foo2 is constant-foldable for StaticArrays, while bar2 isn't.
Those numbers were taken on 1.5, but I saw similar results on master.