Fused Broadcasting kills constant folding? #39151

@oxinabox

Consider two implementations of the same function:

julia> function foo2(T, c, f)
       d = T.(c)
       return sum(f .* d)
       end
foo2 (generic function with 2 methods)

julia> bar2(T, c, f) = sum(f .* T.(c))
bar2 (generic function with 3 methods)

bar2 seems like it should simply be better:
it avoids creating the intermediary variable, which allows the broadcast to fuse.
And indeed, with a plain vector it is faster:

julia> @btime foo2(Float32, [0.1,0.2], [1.0,2.0]);
  473.802 ns (9 allocations: 512 bytes)

julia> @btime bar2(Float32, [0.1,0.2], [1.0,2.0]);
  369.920 ns (8 allocations: 416 bytes)

But with a StaticArray, foo2 is much faster:

julia> @btime foo2(Float32, @SVector[0.1,0.2], @SVector[1.0,2.0]);
  0.031 ns (0 allocations: 0 bytes)

julia> @btime bar2(Float32, @SVector[0.1,0.2], @SVector[1.0,2.0]);
  946.536 ns (22 allocations: 528 bytes)

I think what is happening is that something about foo2 is constant-foldable for StaticArrays,
and that bar2 isn't.
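
One way to probe this hypothesis: the 0.031 ns timing is well under a single clock cycle, which usually means the whole call was folded away inside the benchmark rather than actually executed. The sketch below re-runs both benchmarks with the inputs hidden behind `$(Ref(x))[]`, the interpolation trick described in the BenchmarkTools manual for keeping the compiler from treating benchmark arguments as constants. The variable names `c` and `f` are introduced here for illustration only, and no timings are shown because they will vary by machine and Julia version.

```julia
using BenchmarkTools, StaticArrays

# Same two definitions as in the issue.
function foo2(T, c, f)
    d = T.(c)            # materialise the converted array first
    return sum(f .* d)
end

bar2(T, c, f) = sum(f .* T.(c))   # single fused broadcast

# Illustrative inputs (names are assumptions, not from the original report).
c = @SVector [0.1, 0.2]
f = @SVector [1.0, 2.0]

# Wrapping each input in a Ref and dereferencing it inside the benchmark
# makes the values opaque to the compiler, so the call cannot be folded
# to a constant at compile time.
@btime foo2(Float32, $(Ref(c))[], $(Ref(f))[]);
@btime bar2(Float32, $(Ref(c))[], $(Ref(f))[]);
```

If foo2 no longer reports a sub-nanosecond time here, the original number was measuring constant folding of foo2, and the remaining difference between the two lines is the cost that bar2 actually pays at runtime.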

Those numbers were taken on 1.5, but I saw similar results on master.

Labels: broadcast (Applying a function over a collection), performance (Must go faster)
