GPU Ray Tracing in an Afternoon

It seems to have become something of a rite of passage nowadays for those interested in graphics programming to work their way through Peter Shirley’s excellent Ray Tracing in One Weekend, along the way experiencing firsthand the “aha” moments and the “it’s that simple!?” realizations. With its companions Ray Tracing: The Next Week and Ray Tracing: The Rest of Your Life, the book walks the reader through building a straightforward ray tracing implementation from scratch.

While the book presents all of its code in C++, countless others have translated its content to other languages. Indeed, I am far from the first to attempt implementing the book on the GPU, but I enjoyed the undertaking and decided to share my experience in doing so.

The books are available for a very reasonable $2.99 each on Amazon, and are freely (and legally) available here.

Motivation

Enjoyable challenge aside, the main motivation for moving this task to the GPU is speed. Ray tracing happens to fall into a category of problems sometimes referred to as “embarrassingly parallel”: each individual pixel can be rendered separately, without knowledge of the other pixels being rendered alongside it. This is just the type of work a GPU is built for and will happily churn through with relative ease. A render that could take several minutes, if not hours, to create a passable image at low- to medium-resolution on a CPU can produce a higher-quality result in a fraction of the time on a GPU.

“In an Afternoon”

The goal of this post is to summarize the changes involved in moving the CPU-based implementation to run entirely on the GPU. It is assumed that the reader is familiar with Shirley’s book and has read through and followed along with the CPU-based ray tracer. The content here, then, should be largely review; it is not the goal to reintroduce concepts that are covered by the book.

It is recommended that the reader first work their way through the book’s implementation in order to gain a firm conceptual grasp before tackling a GPU-based implementation.

From start to finish, it took me only a few hours to implement the version of the book presented below, including searching for things like random number functions that aren’t immediately available GPU-side.

Setup

I used Visual Studio Code with the following plugins while implementing the ray tracer:

Shader Toy by Adam Stevenson

Shader languages support for VS Code by slevesque

The ray tracer here is written in GLSL in order to take advantage of the Shader Toy plugin and get immediate feedback while working through the individual chapter implementations.

Adaptations

Inheritance and Polymorphism

In the book’s implementation, Shirley takes advantage of C++’s concepts of inheritance and polymorphism to present a clean and simple interface for hittable objects and materials to implement. This serves to simplify much of the implementation, as the implementer can then rely on the correct method being executed based on the underlying type.

For example:

class hittable
{
public:
    virtual bool hit(const ray& r, float tmin, float tmax, hit_record& rec) const = 0;
};

class sphere : public hittable
{
public:
    bool hit(const ray& r, float tmin, float tmax, hit_record& rec) const override
    {
        // sphere-specific hit implementation
    }
};

Since these language facilities are not available on the GPU side, we can instead employ type identifiers when we need to perform an action in a polymorphic way.

// MT_* material type identifiers
#define MT_DIFFUSE 0
#define MT_METAL 1
#define MT_DIELECTRIC 2

struct Material
{
    int type;
    vec3 albedo;
    float roughness;    // controls roughness for metals
    float refIdx;       // index of refraction for dielectrics
};

bool scatter(Ray rIn, HitRecord rec, out vec3 atten, out Ray rScattered)
{
    if(rec.material.type == MT_DIFFUSE)
    {
        // ... diffuse scatter response
        return true;
    }
    if(rec.material.type == MT_METAL)
    {
        // ... metal scatter response
        return true;
    }
    if(rec.material.type == MT_DIELECTRIC)
    {
        // ... dielectric scatter response
        return true;
    }
    return false;
}
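
As a concrete example, the diffuse branch might look roughly like the following. This is a sketch rather than the repository’s exact code: it assumes HitRecord carries a hit position p and a normal n, that Ray is constructed from an origin and a direction, and that random_in_unit_sphere and the gSeed value come from the noise functions discussed in the next section.

// A sketch of the diffuse branch; assumes HitRecord fields p and n, a
// Ray(origin, direction) constructor, and the random_in_unit_sphere helper
// discussed in the next section.
if(rec.material.type == MT_DIFFUSE)
{
    vec3 target = rec.p + rec.n + random_in_unit_sphere(gSeed);
    rScattered = Ray(rec.p, normalize(target - rec.p));
    atten = rec.material.albedo;
    return true;
}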

Random Numbers

C’s and C++’s random number utilities make random number generation CPU-side rather straightforward (we won’t discuss the merits of how “good” the results are here).

C++ provides the <random> header, and the types and functions therein can be used to easily generate random numbers in a given range. For our purposes, we’re specifically interested in generating numbers between 0 and 1.

While searching for an easy and “good enough” implementation of GPU-based pseudo-random numbers, I stumbled upon the following ShaderToy implementation by Reinder Nijhoff, who adapted the hash functions by nimitz here. I also adopted the random_in_unit_disk and random_in_unit_sphere functions provided by Nijhoff. As mentioned in the introduction, I’m hardly the first to undertake this task.
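
For reference, the general shape of a hash-based GPU RNG looks something like the sketch below. This is a generic illustration rather than the exact functions from those sources; it assumes a float seed (gSeed in the shaders) that is advanced on each call.

// A generic sketch of a hash-based GPU RNG; not the exact functions from
// the sources above. Assumes a float seed advanced on every call.
float hash1(inout float seed)
{
    return fract(sin(seed += 0.1) * 43758.5453123);
}

vec2 hash2(inout float seed)
{
    return vec2(hash1(seed), hash1(seed));
}

// Rejection sampling: pick points in the cube [-1,1]^3 until one falls
// inside the unit sphere.
vec3 random_in_unit_sphere(inout float seed)
{
    for(int i = 0; i < 16; ++i)
    {
        vec3 p = vec3(hash1(seed), hash1(seed), hash1(seed)) * 2.0 - 1.0;
        if(dot(p, p) < 1.0)
            return p;
    }
    return vec3(0.0); // practically unreachable fallback
}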

Anti-aliasing

Chapter 7 of the book adds anti-aliasing to the ray tracer. At its most basic, this means taking multiple subsamples within a given pixel and averaging the results together. For my implementation of this chapter, I went with an absurdly simple box-filter-based implementation.

// reciprocal of the resolution converts pixel coordinates to UVs
vec2 rcpRes = vec2(1.0) / iResolution.xy;
vec3 col = vec3(0.0);
int numSamples = 4; // a 4x4 grid of subsamples (16 per pixel)
float rcpNumSamples = 1.0 / float(numSamples);
for(int x = 0; x < numSamples; ++x)
{
    for(int y = 0; y < numSamples; ++y)
    {
        // offset each subsample by a fixed fraction of the pixel footprint
        vec2 adj = vec2(float(x), float(y));
        vec2 uv = (gl_FragCoord.xy + adj * rcpNumSamples) * rcpRes;
        col += color(getRay(cam, uv));
    }
}
col /= float(numSamples * numSamples);

This works well enough to confirm that multiple subsamples averaged together make a cleaner image, but it’s limited in a number of ways, not the least of which is that the highly regular sampling pattern yields diminishing returns from additional samples.
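
One simple improvement, sketched below as an assumption rather than code from the repository, is to jitter each subsample within its grid cell using the hash functions from the previous section, turning the fixed grid into a stratified pattern.

// Hypothetical tweak: jitter each subsample within its grid cell using the
// hash2 noise function (stratified sampling instead of a fixed grid).
vec2 jitter = hash2(gSeed); // random offset in [0, 1)
vec2 uv = (gl_FragCoord.xy + (adj + jitter) * rcpNumSamples) * rcpRes;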

Progressive Path Tracing

Starting with chapter 8, which introduces diffuse materials and kicks off what most would deem the more visually interesting portion of the book, I decided that, instead of taking multiple samples per frame, I would create a feedback loop in which all previous results are fed into the current frame. Each frame offsets its ray randomly within the current pixel footprint by way of the noise functions mentioned earlier, and folds its result into a running average of all previous frames.

The new shader entry point looks similar to the following and is used in every subsequent chapter’s implementation.

// near the top of the shader - this is described in the Shader Toy plugin's
// documentation; it sets the previous frame's result as a texture input to
// the current frame
#iChannel0 "self"

// prev.xyz holds the gamma-encoded running average; prev.w holds the sample
// count. Scaling the linear average by the count reconstructs the running sum.
vec2 uv = gl_FragCoord.xy / iResolution.xy;
vec4 prev = texture(iChannel0, uv);
vec3 prevLinear = toLinear(prev.xyz);
prevLinear *= prev.w;

// jitter the new sample randomly within the pixel footprint
uv = (gl_FragCoord.xy + hash2(gSeed)) / iResolution.xy;
vec3 col = color(getRay(cam, uv));

// the camera is moving: restart the accumulation from this single sample
if(iMouseButton.x != 0.0 || iMouseButton.y != 0.0)
{
    col = toGamma(col);
    gl_FragColor = vec4(col, 1.0);
    return;
}
// cap the sample count to avoid floating-point issues (discussed below)
if(prev.w > 5000.0)
{
    gl_FragColor = prev;
    return;
}

// fold the new sample into the running average and store it gamma-encoded
col = (col + prevLinear);
float w = prev.w + 1.0;
col /= w;
col = toGamma(col);
gl_FragColor = vec4(col, w);

Since the number of samples is ever-increasing, this approach can run into floating-point issues once the running average incorporates a large number of samples. If you remove the if(prev.w > 5000.0) block and let the ray tracer run long enough, you’re likely to see little black dots show up in the image. These are caused by values in the reconstructed running sum growing so large that they are no longer representable as floating-point numbers and end up as nan or inf. Capping the number of samples allows for a high-quality render while avoiding these issues, and the cap can be adjusted up or down depending on the scene and preference. There are almost certainly more robust ways to solve this issue; one simple possibility is sketched below.
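
The sketch keeps the running average directly and blends each new sample in, so no ever-growing sum is formed. It is an illustration of the idea, not code from the repository, and would replace the accumulation lines above.

// A sketch of an incrementally updated average: new_avg = mix(old_avg, sample, 1/n).
// This replaces the sum reconstruction (prevLinear *= prev.w) and the
// (col + prevLinear) / w accumulation above.
float w = prev.w + 1.0;
vec3 avg = mix(toLinear(prev.xyz), col, 1.0 / w);
gl_FragColor = vec4(toGamma(avg), w);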

Because of the substantial speedup gained by moving the work to the GPU, I’ve added simple camera controls to the scenes. When the camera moves, the running average is reset to prevent successive views from smearing across each other. This is what the if(iMouseButton...) check above is doing.

Below are two images of the same scene, one taken moments after the render began, the other taken after a few seconds of accumulation.

[Image: short render time] [Image: longer render time]

Recursion

GLSL does not support recursive functions. This limitation is simple enough to overcome by instead using a loop with a capped number of steps.

The CPU implementation below, taken from my own initial follow-along with the book,

Vector3f color(const Ray& ray, const Hitable* world, const int32 depth)
{
    HitRecord rec;
    if(world->hit(ray, 0.001f, FLT_MAX, rec))
    {
        Ray scattered;
        Vector3f attenuation;

        if(depth < 50 && rec.material->scatter(ray, rec, attenuation, scattered))
        {
            return attenuation * color(scattered, world, depth + 1);
        }
        else
        {
            return Vector3f{0.0f, 0.0f, 0.0f};
        }
    }

    const Vector3f unitDirection = normalize(ray.direction());
    const float32 t = 0.5f * (unitDirection.y + 1.0f);
    return Vector3f{1.0f, 1.0f, 1.0f} * (1.0f - t) + Vector3f{0.5f, 0.7f, 1.0f} * t;
}

now becomes

vec3 color(Ray r)
{
    HitRecord rec;
    vec3 col = vec3(1.0);
    // iterate instead of recursing, folding each bounce's attenuation into col
    for(int i = 0; i < MAX_DEPTH; ++i)
    {
        if(hit_world(r, 0.001, 10000.0, rec))
        {
            Ray scatterRay;
            vec3 atten;
            if(scatter(r, rec, atten, scatterRay))
            {
                col *= atten;
                r = scatterRay;
            }
            else
            {
                // the ray was absorbed
                return vec3(0.0);
            }
        }
        else
        {
            // the ray escaped the scene: apply the sky gradient (r.d is unit length)
            float t = 0.5 * (r.d.y + 1.0);
            col *= mix(vec3(1.0), vec3(0.5, 0.7, 1.0), t);
            return col;
        }
    }
    return col;
}

The ray is simply overwritten with the ray produced by each scattering event before the next loop iteration.

Scene Representation

The use of polymorphic types allows certain niceties, as described in the section above. One of these niceties is a HittableList type that can itself contain any number of Hittable implementations and be traversed quite easily.

Instead of creating a pseudo-polymorphic hittable type like the one described above for materials, I’ve opted for building the scene on each invocation of a hit_world function, which can be implemented fresh in each shader depending on what content is desired. There is perhaps a trade-off here between memory usage and execution speed that could be worth exploring in the future. For a scene representation built on shader entry and used throughout, the memory requirement would increase, but the cost of re-creating those types and materials during traversal may decrease.
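
A minimal sketch of this approach is below. Sphere and sphereHit are hypothetical stand-ins for the corresponding pieces in common.glsl, and the two-sphere scene is only illustrative.

// A minimal sketch of a per-invocation scene; Sphere and sphereHit are
// hypothetical stand-ins for the corresponding helpers in common.glsl.
struct Sphere
{
    vec3 center;
    float radius;
    Material mat;
};

bool hit_world(Ray r, float tMin, float tMax, out HitRecord rec)
{
    // the scene is rebuilt on every call
    Sphere scene[2];
    scene[0] = Sphere(vec3(0.0, 0.0, -1.0), 0.5,
                      Material(MT_DIFFUSE, vec3(0.8, 0.3, 0.3), 0.0, 1.0));
    scene[1] = Sphere(vec3(0.0, -100.5, -1.0), 100.0,
                      Material(MT_DIFFUSE, vec3(0.8, 0.8, 0.0), 0.0, 1.0));

    bool hitAnything = false;
    float closest = tMax;
    for(int i = 0; i < 2; ++i)
    {
        // shrink the valid t range so only nearer hits are accepted
        HitRecord tmp;
        if(sphereHit(scene[i], r, tMin, closest, tmp))
        {
            hitAnything = true;
            closest = tmp.t;
            rec = tmp;
            rec.material = scene[i].mat;
        }
    }
    return hitAnything;
}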

Of course, there’s nothing stopping you from implementing a more flexible Hittable type, similar to how materials are handled, with a type identifier selecting which intersection function should be used. For the purposes of this exercise, I found the above approach to be plenty.

The Code

Obtaining

All the code associated with this post can be found in the GitLab repository below. The code is structured such that there is one .glsl file for each chapter of the book that produces output. The files are named with the convention b#_ch#.glsl, where the first number is the book the file comes from and the second is the chapter within that book. All of the first book in the series is represented, as well as the first chapter of the second book, since it took minimal effort to add a new MovingSphere type and implement motion blur.
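
For illustration, the core of the book’s motion blur is just interpolating a sphere’s center by the time carried on each ray; the sketch below uses illustrative names rather than the repository’s.

// A sketch of motion blur's core: interpolate the moving sphere's center by
// the ray's time. Names here are illustrative, not the repository's.
vec3 centerAtTime(vec3 center0, vec3 center1, float time0, float time1, float t)
{
    return center0 + ((t - time0) / (time1 - time0)) * (center1 - center0);
}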

The common.glsl file contains types, intersection functions, the scattering function, helper functions for creating types with default values, and the all-important noise functions.

GitLab Repository

Running

The simplest way to see the code in action is to follow these steps:

  • clone it from the repository
  • open a Visual Studio Code workspace at the root of the folder containing the code
  • open one of the chapter .glsl files
  • select Shader Toy: Show GLSL Preview from the VS Code command palette (ctrl+shift+p).

ShaderToy

With a little coercion, mostly around setting up the feedback loop through a buffer, detecting mouse movement, and renaming main to mainImage, the code can be updated to run on ShaderToy. I’ve created an example here.
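
For reference, the ShaderToy-side entry point has the signature below; on ShaderToy, a Buffer A pass with iChannel0 bound to itself replaces the plugin’s #iChannel0 "self" trick. The body is only a placeholder for the accumulation logic shown earlier.

// ShaderToy entry point; a Buffer A pass reading itself via iChannel0
// stands in for the plugin's `#iChannel0 "self"` feedback loop.
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    vec2 uv = fragCoord / iResolution.xy;
    vec4 prev = texture(iChannel0, uv);
    // ... the same accumulation logic as above, writing to fragColor
    fragColor = prev; // placeholder
}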

In order to see the results in action, be sure to press the play button on the embedded viewer. Click and move the mouse to change the camera location.

Results

[Image: results] [Image: motion blur]

References

Ray Tracing in One Weekend by Peter Shirley

RIOW 1.12: Where Next? by Reinder Nijhoff

Quality hashes collection WebGL2 by nimitz
