The magic of OpenGL’s fragment shader texture sampling
I’ve been learning OpenGL ES 2.0, the basis for WebGL. It’s neat how simple they’ve made OpenGL with this specification. Almost everything is done by writing two “shader” functions that run on the GPU: a vertex shader to position each of a shape’s vertices, and a fragment shader to color each pixel inside the resulting shape. Simple, yet powerful.
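For a flavor of what these look like, here’s a minimal sketch of a vertex shader in GLSL ES 2.0. The attribute and varying names are my own invention; it just passes each vertex position through unchanged and hands a texture coordinate along to the fragment shader:

attribute vec2 vertexPosition;     // per-vertex position, supplied by the host program
attribute vec2 textureCoordinate;  // per-vertex texture coordinate, also supplied
varying vec2 texturePosition;      // interpolated and handed to the fragment shader

void main() {
    texturePosition = textureCoordinate;
    gl_Position = vec4(vertexPosition, 0.0, 1.0);
}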
One thing a fragment (≈pixel) shader can do is look up a color from an input image to use for the output color. This is called texture sampling, and looks something like:
gl_FragColor = texture2D(inputImage, texturePosition);
This causes the color of the current output pixel to be the color found at some position in the texture image. The texture can be sampled at a position that falls between underlying texture pixels, in which case the nearby pixels might be blended together by linear interpolation (depending on the texture’s filtering settings).
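For context, here’s a minimal sketch of a complete fragment shader around that line. It matches the uniform and varying names used above, though the names themselves are my own, not anything standard:

precision mediump float;

uniform sampler2D inputImage;   // the texture image, bound by the host program
varying vec2 texturePosition;   // texture coordinates, interpolated across the shape

void main() {
    // Look up the texture's color at this fragment's coordinates
    gl_FragColor = texture2D(inputImage, texturePosition);
}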
Now, imagine if a fragment shader were using a square texture image that’s 256 pixels wide to get colors for a much smaller number (say 16×16) of output pixels. To make the blended output values better represent the overall source texture, the texture pixels might actually be averaged down into a series of smaller texture images (e.g. 128, 64, 32… pixels wide), called mipmaps, and the one closest to the size needed will be used to look up the resulting pixel value. In this example each output pixel covers a 256/16 = 16 pixel wide block of the original texture, so the 16-pixel-wide reduction is the best fit.
What’s strange about this is that the exact same code is used to do the lookup across all of these detail levels; the GPU will automatically pick the right size texture reduction to use. But how? The fragment shader just asks the texture sampling function about a single position in a texture set, but that doesn’t tell the sampler anything about how “big” a sample is needed! Yet somehow the sampler does its job, using a smaller version from the texture set when appropriate.
To accomplish this strange magic, the GPU uses a really nifty trick. You might also call this trick swell, or even neat-o. This trick, which is so superb, is explained as follows:
We are assuming that the expression is being evaluated in parallel on a SIMD array so that at any given point in time the value of the function is known at the grid points represented by the SIMD array. — some document I found on the Internet
Basically, the fragment shader function gets applied to a block of, say, 4 pixels simultaneously. Example: the GPU is ready to draw pixels at (0,0), (0,1), (1,0) and (1,1), so it calls the same fragment shader four times, once for each of those positions. The fragment shaders all do some mathy stuff to decide where they want to sample from, and perhaps they ask for texture coordinates (0.49, 0.49), (0.49, 0.51), (0.51, 0.49) and (0.51, 0.51) respectively. AT THE SAME TIME!
Voilà! Now the GPU isn’t being asked to look up a single position in the texture. It’s got four positions, which it can compare to see that adjacent output pixels are sampling texture locations only 0.02 units apart. That’s enough information to pick the correct texture level, based on the rate of change in texture coordinates across the neighboring fragment shader calls.
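You can even peek at this rate of change yourself. With the GL_OES_standard_derivatives extension enabled (the functions are built in to later GLSL versions), dFdx and dFdy report how a value differs between those neighboring shader calls. Here’s a sketch of roughly the calculation the sampler performs internally; it’s an approximation of the spec’s formula, and textureSizePixels is a made-up uniform the host program would have to supply, since an ES 2.0 shader doesn’t otherwise know the texture’s dimensions:

#extension GL_OES_standard_derivatives : enable
precision mediump float;

uniform sampler2D inputImage;
uniform vec2 textureSizePixels;  // e.g. vec2(256.0, 256.0); supplied by the host program
varying vec2 texturePosition;

void main() {
    // Convert the 0..1 texture coordinates into texture pixel units
    vec2 texelPosition = texturePosition * textureSizePixels;

    // How far apart (in texture pixels) are the samples taken by
    // horizontally and vertically adjacent fragments?
    vec2 changeAcrossX = dFdx(texelPosition);
    vec2 changeAcrossY = dFdy(texelPosition);

    // Roughly the spec's approximation: the level of detail is the log2 of
    // the larger rate of change. Level 0 is the full-size texture, level 1
    // is the half-size reduction, and so on.
    float rateOfChange = max(length(changeAcrossX), length(changeAcrossY));
    float levelOfDetail = log2(rateOfChange);

    // Visualize the chosen level as a shade of gray (scaled to stay visible)
    gl_FragColor = vec4(vec3(levelOfDetail / 8.0), 1.0);
}

For the 256-pixel texture squeezed onto 16 output pixels above, this works out to log2(16) = 4, pointing at the 16-pixel-wide reduction.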
But what if we’re feeling subversive and write a fragment shader that only samples the texture if the output pixel falls on a dark square of some checkerboard pattern? Documents on the Internet gotcha covered:
These implicit derivatives will be undefined for texture fetches occurring inside non-uniform control flow or for vertex shader texture fetches, resulting in undefined texels. — spoken by official-looking words
One of the first things a programmer should learn is that “undefined” is legal vocabulary for “do you feel lucky, punk?”. (More politely: “we recommend you not cause this situation”.) The OpenGL site has some tips for Avoiding This Subversion of Magic on the Shader language sampler wiki page.
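To make the hazard concrete, here’s a sketch of that subversive checkerboard shader, along with the usual workaround those tips boil down to: perform the fetch in uniform control flow (outside the branch), then decide afterwards whether to use the result. The checkerboard math and the names here are my own illustration, not from any official source:

precision mediump float;

uniform sampler2D inputImage;
varying vec2 texturePosition;

void main() {
    // Which square of an 8×8 checkerboard does this fragment land on?
    vec2 square = floor(texturePosition * 8.0);
    bool darkSquare = mod(square.x + square.y, 2.0) < 0.5;

    // RISKY: neighboring fragments in the same SIMD block may take different
    // branches here, leaving the implicit derivatives (and so the mipmap
    // level) undefined for a fetch inside the branch:
    //
    //     if (darkSquare) { gl_FragColor = texture2D(inputImage, texturePosition); }
    //     else { gl_FragColor = vec4(1.0); }

    // SAFER: sample unconditionally, then choose afterwards.
    vec4 sampled = texture2D(inputImage, texturePosition);
    gl_FragColor = darkSquare ? sampled : vec4(1.0);
}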