Branch prediction: yet another example

2017/10/04 julia microoptimization

Tomas Lycken linked a very nice discussion on StackOverflow about branch prediction as a comment on the previous post. It has an intuitive explanation (read it if you like good metaphors) and some Java benchmarks. I was curious about how it looks in Julia.

The exercise is to sum elements in a vector only if they are greater than or equal to 128.

function sumabove_if(x)
    s = zero(eltype(x))
    for elt in x
        if elt  128
            s += elt
        end
    end
    s
end

This calculation naturally has a branch in it, while the branchless version, using ifelse, does not:

function sumabove_ifelse(x)
    s = zero(eltype(x))
    for elt in x
        s += ifelse(elt  128, elt, zero(eltype(x)))
    end
    s
end

The actual example has something different: using tricky bit-twiddling to calculate the same value. I generally like to leave this sort of thing up to the compiler, because it is much, much better at it than I am, and I make mistakes all the time; worse, I don't know what I actually did when I reread the code 6 months later. But I included it here for comparison:

function sumabove_tricky(x::Vector{Int64})
    s = Int64(0)
    for elt in x
        s += ~((elt - 128) >> 63) & elt
    end
    s
end

Following the original example on StackOverflow, we sum 2^15 random integers in 1:256. For this, we don't need to worry about overflow. We also sum the sorted vector: this will facilitate branch predicion, since the various branches will be contiguous.

I also benchmark a simple version using generators:

sumabove_generator(x) = sum(y for y in x if y  128)
Benchmarks (\(μ\)s)
random sorted
if 139 28
ifelse 21 21
if & sort 96 n/a
tricky 27 27
generator 219 168

Benchmarks are in the table above. Note that

  1. for the version with if, working on sorted vectors is dramatically faster (about 5x).

  2. the non-branching ifelse version beats them hands down, and naturally it does not care about sorting.

  3. if you have to use if, then you are better off sorting, even if you take the time of that into account.

  4. generators are susprisingly bad.

  5. the tricky bit-twiddling version is actually worse than ifelse (which reinforces my aversion to it).

Self-contained code for everything is available below.

download code as code.jl

site not optimized for small screens, math may break