How to remove conditionals from your Shader Program

Introduction

Having conditionals in shaders can often have a serious performance impact on the code. Compilers can sometimes find clever ways to optimize them away, but if your shader code really does have to branch and perform comparisons the speed can drop off significantly.

How GPU handle branching of the code

Following part explains how GPU execute the if-else conditions –

“NVIDIA gpus use conditional execution to handle branch divergence within the SIMD group (“warp”). In your if…else example, both branches get executed by every thread in the diverging warp, but those threads which don’t follow a given branch are flagged and perform a null op instead. This is the classic branch divergence penalty – interwarp branch divergence takes two passes through the code section to retire for warp. This isn’t ideal, which is why performance oriented code tries to minimize this. One thing which often catches out people is making an assumption about which section of a divergent path gets executed “first”. The have been some very subtle bugs cause by second guessing the internal order of execution within a divergent warp."

For simpler conditionals, NVIDIA GPUs support conditional evaluation at the ALU, which causes no divergence, and for conditionals where the whole warp follows the same path, there is also obviously no penalty.”

To avoid these situations I designed a number of functions which allow you to do things often done using conditionals. They perform some comparison and then return 1.0 on true and 0.0 on false. This can be useful, for example, if you want to add a number to another number, but only when some condition is true.

if (x == 0) { y += 5; }

This can be transformed to the following.

y += 5 * when_eq(x, 0)

It doesn’t cover all cases but most of the time some clever thinking can transform the comparison into this kind of operation.

The full set of functions are defined here:

vec4 when_eq(vec4 x, vec4 y) 
{ return 1.0 – abs(sign(x – y)); }

vec4 when_neq(vec4 x, vec4 y) 
{ return abs(sign(x – y)); }

vec4 when_gt(vec4 x, vec4 y) 
{ return max(sign(x – y), 0.0); }

vec4 when_lt(vec4 x, vec4 y) 
{ return max(sign(y – x), 0.0); }

vec4 when_ge(vec4 x, vec4 y) 
{ return 1.0 – when_lt(x, y); }

vec4 when_le(vec4 x, vec4 y) 
{ return 1.0 – when_gt(x, y); }

These are defined in GLSL, and used on the vec4 types. Using vec4 the comparison is done component-wise, which means you can do four comparisons at once! Similar logic can also be used for any other shading language or vector/float type.

The logic behind the functions is fairly simple, and underlies the actual maths behind comparison which computers use. This is exactly the kind of transformation a smart compiler would do to optimize conditionals away from your code naturally.

As a bonus here are a set of logical operators you can use together with the outputs from these comparisons.

vec4 and(vec4 a, vec4 b) 
{ return a * b; }

vec4 or(vec4 a, vec4 b) 
{ return min(a + b, 1.0); }

vec4 xor(vec4 a, vec4 b) 
{ return (a + b) % 2.0; }

vec4 not(vec4 a) 
{ return 1.0 – a; }

Hopefully this has helped speed up your shaders by reducing unneeded conditionals. Happy shading!

References:

http://stackoverflow.com/questions/5897454/what-do-work-items-execute-when-conditionals-are-used-in-gpu-programming
https://www.shadertoy.com/view/XtG3zc
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter34.html