Screen Space Glossy Reflections Demo Available

It’s Been a While

Earlier this year I wrote a post discussing an implementation of real-time screen space glossy reflections.  The post has received a lot of positive feedback, and since it went up I’ve had some very interesting conversations with various individuals about theory, details, shortcomings, and everything in between.  The response has been great, and I appreciate the community’s interest.  One request I’ve received a few times is for a working demo that users could play with to get a better feel for the effect in action.  I had originally hoped to finish updating the engine to support DirectX 12 before releasing anything, and while that work is probably about 90% done, there are still some areas that need attention and my time lately has been limited.

Thankfully, it’s the year 2015 (for a little while longer) and we have this magical thing called source control.  I’ve decided to use a tag I created right before the DX 12 update began, and have modified it to provide a small demo for anyone that’s been waiting on it.  The good news is that it’s entirely DirectX 11-based, so the hardware support will be much broader than that of a DX 12 solution.  The downside is that a few improvements I’ve made since then, especially around blending missed ray hits with fallback solutions, won’t be present in the demo.  I should get a chance to release a demo with the new features once things settle down a bit, and all will be right with the world.

Demo Controls

Once the scene loads, anyone familiar with first person applications should feel more or less at home with the basics.  A, W, S, and D control movement, with W and S moving the camera forwards and backwards, and A and D strafing the camera left and right.  The mouse controls where the camera looks.  The user is not glued to the ground, and will move in whatever direction the camera is facing.  J and K control the floor roughness value, with J making the floor smoother, and K making it rougher.  A uniform roughness value is applied over the entire floor, but in a real-world application an artist-authored texture would be used to make the results much more convincing.  Q and E are used to change the time of day.

The Esc key is used to quit the application.  To restart the scene without exiting, use the left, right, or up arrow keys.

Some Ugliness

The fallback environment maps are set up exactly as they were in the original post.  Specifically, this means that a large area of the scene only has the global, undetailed environment map to fall back to.  This is quite noticeable in the beginning area of the scene underneath the characters.  If you move directly forward from the starting point of the scene, you’ll pass through a few walls and end up in an enclosed hallway-type structure.  This area does have localized environment maps to fall back to on ray misses, and the results are cleaner.  As stated in the first section, more work has been done to improve blending that is not present in the demo.

Besides the shortcomings of the screen space approach discussed in the original blog post, the stack of boxes in the scene still uses the engine’s old physics and collision system.  In the latest version, all of this has been updated to use the Bullet Physics implementation, but if you choose to knock the stack down (clicking the left mouse button throws a ball), be aware that you’re likely to see quite a bit of oddness.  That being said – go for it, it’s always fun to knock things over!

Also, ambient light is handled by sampling from environment maps placed throughout the scene.  To ensure maintaining these doesn’t become a bottleneck, only one is ever updated per frame, and they’re only updated when the lighting changes.  Namely, this means that as the time of day changes, the environment maps get rebuilt.  If the time of day changed slowly enough, as it would in a real-world application, these updates would be mostly unnoticeable.  However, since the user can control the time of day, the overall lighting situation can change faster than the environment maps can keep up.  If the user holds down one of the keys to change the time of day, they’ll see stale lighting data applied to most parts of the scene.  Once the key is released, the environment map renderer will catch up and the lighting will become coherent again.
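For anyone curious how such an amortized update scheme might be structured, here is a minimal sketch (the class and method names are illustrative, not from the engine): mark every probe stale when the lighting changes, then service at most one stale probe per frame.

```python
class ProbeUpdateScheduler:
    """Round-robin scheduler: at most one environment probe updates per frame,
    and probes are only queued for update when the lighting has changed."""

    def __init__(self, probe_count):
        self.probe_count = probe_count
        self.pending = []  # indices of probes with stale lighting

    def on_lighting_changed(self):
        # e.g. the time of day moved: every probe is now stale
        self.pending = list(range(self.probe_count))

    def probe_to_update(self):
        # called once per frame; returns the next stale probe index, or None
        return self.pending.pop(0) if self.pending else None
```

If the lighting changes faster than the queue drains, most probes hold stale data until the queue catches up, which is exactly the artifact described above when holding down the time-of-day keys.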

The Demo

Below is a link to download the demo.  Feel free and encouraged to continue commenting, asking questions, and offering constructive criticism.

Download the demo here.

Screen Space Glossy Reflections


Reflections are an important effect present in any routine attempting to approximate global illumination.  They give the user important spatial information about an object’s location, as well as provide an important visual indicator of the surface properties of certain materials.

For several years now, engineers and researchers in real-time graphics have worked towards improving reflections in their applications.  Simple implementations like cube maps used as reflection probes have been around for decades, while much newer techniques build upon their predecessors, such as parallax-corrected environment maps [4].

More recently, screen space ray tracing has become a widely used supplement to previously established methods of applying reflections to scenes.  The idea is simple enough – a view ray is reflected about a surface normal, then the resultant vector is stepped along until it intersects the depth buffer.  With that location discovered, the light buffer is sampled there and the value is added to the final lighting result.  The below image shows a comparison of a scene looking at a stack of boxes without and with screen space ray tracing enabled.


In practice, there are more than a few pitfalls to this approach that need special care to avoid.  The most obvious shortcoming of this and any other screen space effect is the limited information available.  If a ray doesn’t hit something before leaving the screen bounds, it will not return a result, even if its would-be collider is just barely off-screen.

This effect also tends to have a lot of trouble with rays facing back towards the viewer.  With a little thought, it makes sense that this would present an issue.  For one, if the ray reflects directly back at the viewer, it will never intersect the depth buffer, essentially degenerating into the off-screen case already discussed.  The other issue is similar, but maybe not as obvious.  If a ray travels back in the general direction of the viewer and does intersect the depth buffer, it’s likely to do so on a face of an object that faces away from the viewer.  This means that even if an intersection is reported, an incorrect result will be sampled from the light buffer at that position.  This can lead to ugly artifacts such as streaks across surfaces.  The figure below shows a top-down view of a ray being cast from the viewer, hitting a mirrored surface, and finally making contact with the back of a box.  Since from the viewer’s perspective the back of the box is not currently on-screen, erroneous results will be returned if that hit is used.


There are ways to mitigate many of these artifacts, including fallback methods and fading that will be addressed later on.

Glossy Ray Traced Reflections

One further challenge with the generic approach described above is that if the result is used directly, only perfectly mirror-like reflections can be generated.  In the real world, most surfaces do not reflect light perfectly, but instead scatter, absorb, and reflect it in varying proportions due to microfacets [9].  To account for this, the technique needs to not only consider where the ray intersects the depth buffer, but also the roughness of the reflecting surface and the distance the ray has traveled.  The following image shows a comparison of mirror-like and glossy reflections.  Notice on the right half of the image how the further the ray has to travel to make contact, the blurrier it becomes.


The rest of this post will re-touch on some of these issues as it discusses and provides a full implementation of ray tracing in screen space and creating glossy reflections via cone tracing.

Setting Up

The effects described in this post are implemented using DirectX 11 and HLSL.  That’s not at all to say those are mandatory for following along.  In fact, the ray tracing shader used below is a translation of one written in GLSL, which would use OpenGL as its graphics API.

This implementation was designed as part of a deferred shading pipeline.  The effect runs after geometry buffer generation and lighting have completed.  The ray tracing step needs access to the depth buffer and a buffer containing the normals of the geometry in view.  The blurring step needs access to the light buffer.  The cone tracing step needs access to all of the aforementioned buffers, plus the resultant ray traced buffer and blurred light buffer, as well as a buffer containing the specular values for materials in view.  It is also beneficial to include a fallback buffer containing indirect specular contributions derived from methods such as parallax-corrected cube maps.  These will each be addressed as they are used in the implementation.

Therefore, the final list of buffers needed before starting the effect becomes:

  • Depth buffer – the implementation uses a non-linear depth buffer due to its ready availability after the geometry buffer is generated.  McGuire’s initial implementation [1] uses a linear depth buffer and may be more efficient.
  • Normal buffer – the geometry buffer used in this implementation stores all values in view space.  If the implementer stores their values in world space, they will need to be cognizant of the differences and prepared to apply appropriate transforms when necessary.
  • Light buffer – this is a buffer containing all lighting to be applied to the scene.  The exact values stored in this buffer will be refined further during implementation discussion.
  • Specular buffer – stored linearly as Fresnel reflectance at normal incidence (F(0°)) [5].  Some engines, such as Unreal Engine 4, have different workflows where this value may be hard-coded for dielectrics to a value of around 0.04 and stored in base color for metals.  The engine in use for this project is custom and stores the value directly.
  • Roughness buffer – this engine stores the roughness value in the w-component of the specular buffer, and is thus readily available when the previous buffer is bound.
  • Fallback indirect specular buffer – this buffer contains specular lighting values calculated before the ray tracing step using less precise techniques such as parallax-corrected cube maps and environment probes to help alleviate jarring discontinuities between ray hits and misses.

The depth buffer used in this implementation has 32 bits for depth.  All buffers containing lighting data use 16 bit per channel floating point formats.

Also needed for this effect is a constant buffer containing values specific to the effect.  In the initial GLSL implementation these were passed as uniforms, but in HLSL we set up a constant buffer like so:

/*
 * The SSLRConstantBuffer.
 * Defines constants used to implement SSLR cone traced screen-space reflections.
 */
cbuffer cbSSLR : register(b0)
{
 float2 cb_depthBufferSize; // dimensions of the z-buffer
 float cb_zThickness; // thickness to ascribe to each pixel in the depth buffer
 float cb_nearPlaneZ; // the camera's near z plane

 float cb_stride; // Step in horizontal or vertical pixels between samples. This is a float
 // because integer math is slow on GPUs, but should be set to an integer >= 1.
 float cb_maxSteps; // Maximum number of iterations. Higher gives better images but may be slow.
 float cb_maxDistance; // Maximum camera-space distance to trace before returning a miss.
 float cb_strideZCutoff; // More distant pixels are smaller in screen space. This value tells at what point to
 // start relaxing the stride to give higher quality reflections for objects far from
 // the camera.

 float cb_numMips; // the number of mip levels in the convolved color buffer
 float cb_fadeStart; // determines where to start screen edge fading of effect
 float cb_fadeEnd; // determines where to end screen edge fading of effect
 float cb_sslr_padding0; // padding for alignment
};


This constant buffer is contained in its own .hlsli file and included in the various steps where needed.  Most of the values map directly to uniform values in the GLSL implementation, and a few others will be discussed as they become pertinent.

Ray Tracing in Screen Space

The ray tracing portion of this technique is directly derived from Morgan McGuire and Mike Mara’s open source implementation of using the Digital Differential Analyzer (DDA) line algorithm to evenly distribute ray traced samples in screen space [1].  Their method handles perspective-correct interpolation of a 3D ray projected to screen space, and helps avoid over- and under-sampling issues present in traditional ray marches.  This helps more evenly distribute the limited number of samples that can be afforded per frame across the ray instead of skipping large portions at the start of the ray and bunching up samples towards the end.
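To see why a dedicated screen-space DDA is needed, recall that camera-space values do not interpolate linearly across a projected segment, but value/w and 1/w do.  A minimal standalone sketch of that recovery (an illustration of the principle, not the engine's code) looks like this:

```python
def perspective_correct_lerp(q0, w0, q1, w1, t):
    """Interpolate a camera-space value q across a screen-space segment.

    q0 and q1 are the endpoint values; w0 and w1 are their clip-space w.
    q*k and k (with k = 1/w) vary linearly in screen space, so we lerp
    those and divide at the end, just as a DDA tracks Q and k per step.
    """
    k0, k1 = 1.0 / w0, 1.0 / w1
    qk = [(a * k0) * (1.0 - t) + (b * k1) * t for a, b in zip(q0, q1)]
    k = k0 * (1.0 - t) + k1 * t
    return [c / k for c in qk]
```

Interpolating q directly instead would drift off the true 3D segment whenever w0 != w1, which is exactly the over- and under-sampling problem the DDA formulation avoids.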

McGuire and Mara’s initial implementation was presented in GLSL and assumed negative one (-1) to be the far plane Z value.  Below, the implementation has been converted to HLSL and uses positive one for the far plane.  The initial implementation also uses a linear depth buffer, though their accompanying paper provides source code for running the effect with a non-linear depth buffer.  The provided implementation assumes non-linear depth, and reconstructs linear Z values as they are sampled from the depth buffer using the methods described in [6].
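As a standalone illustration of that reconstruction (a sketch assuming a conventional D3D-style [0, 1] depth mapping; the engine's actual version lives in DepthUtils.hlsli):

```python
def linearize_depth(d, near, far):
    """Recover view-space z from a [0, 1] perspective depth-buffer value.

    Assumes the standard D3D mapping d = far * (z - near) / (z * (far - near)),
    which inverts to z = near * far / (far - d * (far - near)).
    """
    return (near * far) / (far - d * (far - near))
```
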

// By Morgan McGuire and Michael Mara at Williams College 2014
// Released as open source under the BSD 2-Clause License
// Copyright (c) 2014, Morgan McGuire and Michael Mara
// All rights reserved.
// From McGuire and Mara, Efficient GPU Screen-Space Ray Tracing,
// Journal of Computer Graphics Techniques, 2014
// This software is open source under the "BSD 2-clause license":
// Redistribution and use in source and binary forms, with or
// without modification, are permitted provided that the following
// conditions are met:
// 1. Redistributions of source code must retain the above
// copyright notice, this list of conditions and the following
// disclaimer.
// 2. Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following
// disclaimer in the documentation and/or other materials provided
// with the distribution.
/*
 * The ray tracing step of the SSLR implementation.
 * Modified version of the work stated above.
 */
#include "SSLRConstantBuffer.hlsli"
#include "../../ConstantBuffers/PerFrame.hlsli"
#include "../../Utils/DepthUtils.hlsli"

Texture2D depthBuffer : register(t0);
Texture2D normalBuffer: register(t1);

struct VertexOut
{
 float4 posH : SV_POSITION;
 float3 viewRay : VIEWRAY;
 float2 tex : TEXCOORD;
};

float distanceSquared(float2 a, float2 b)
{
 a -= b;
 return dot(a, a);
}

bool intersectsDepthBuffer(float z, float minZ, float maxZ)
{
 /*
  * Based on how far away from the camera the depth is,
  * adding a bit of extra thickness can help improve some
  * artifacts. Driving this value up too high can cause
  * artifacts of its own.
  */
 float depthScale = min(1.0f, z * cb_strideZCutoff);
 z += cb_zThickness + lerp(0.0f, 2.0f, depthScale);
 return (maxZ >= z) && (minZ - cb_zThickness <= z);
}

void swap(inout float a, inout float b)
{
 float t = a;
 a = b;
 b = t;
}

float linearDepthTexelFetch(int2 hitPixel)
{
 // Load returns 0 for any value accessed out of bounds
 return linearizeDepth(depthBuffer.Load(int3(hitPixel, 0)).r);
}

// Returns true if the ray hit something
bool traceScreenSpaceRay(
 // Camera-space ray origin, which must be within the view volume
 float3 csOrig,
 // Unit length camera-space ray direction
 float3 csDir,
 // Number between 0 and 1 for how far to bump the ray in stride units
 // to conceal banding artifacts. Not needed if stride == 1.
 float jitter,
 // Pixel coordinates of the first intersection with the scene
 out float2 hitPixel,
 // Camera space location of the ray hit
 out float3 hitPoint)
{
 // Clip to the near plane
 float rayLength = ((csOrig.z + csDir.z * cb_maxDistance) < cb_nearPlaneZ) ?
  (cb_nearPlaneZ - csOrig.z) / csDir.z : cb_maxDistance;
 float3 csEndPoint = csOrig + csDir * rayLength;

 // Project into homogeneous clip space
 float4 H0 = mul(float4(csOrig, 1.0f), viewToTextureSpaceMatrix);
 H0.xy *= cb_depthBufferSize;
 float4 H1 = mul(float4(csEndPoint, 1.0f), viewToTextureSpaceMatrix);
 H1.xy *= cb_depthBufferSize;
 float k0 = 1.0f / H0.w;
 float k1 = 1.0f / H1.w;

 // The interpolated homogeneous version of the camera-space points
 float3 Q0 = csOrig * k0;
 float3 Q1 = csEndPoint * k1;

 // Screen-space endpoints
 float2 P0 = H0.xy * k0;
 float2 P1 = H1.xy * k1;

 // If the line is degenerate, make it cover at least one pixel
 // to avoid handling zero-pixel extent as a special case later
 P1 += (distanceSquared(P0, P1) < 0.0001f) ? float2(0.01f, 0.01f) : 0.0f;
 float2 delta = P1 - P0;

 // Permute so that the primary iteration is in x to collapse
 // all quadrant-specific DDA cases later
 bool permute = false;
 if(abs(delta.x) < abs(delta.y))
 {
  // This is a more-vertical line
  permute = true;
  delta = delta.yx;
  P0 = P0.yx;
  P1 = P1.yx;
 }

 float stepDir = sign(delta.x);
 float invdx = stepDir / delta.x;

 // Track the derivatives of Q and k
 float3 dQ = (Q1 - Q0) * invdx;
 float dk = (k1 - k0) * invdx;
 float2 dP = float2(stepDir, delta.y * invdx);

 // Scale derivatives by the desired pixel stride and then
 // offset the starting values by the jitter fraction
 float strideScale = 1.0f - min(1.0f, csOrig.z * cb_strideZCutoff);
 float stride = 1.0f + strideScale * cb_stride;
 dP *= stride;
 dQ *= stride;
 dk *= stride;

 P0 += dP * jitter;
 Q0 += dQ * jitter;
 k0 += dk * jitter;

 // Slide P from P0 to P1, (now-homogeneous) Q from Q0 to Q1, k from k0 to k1
 float4 PQk = float4(P0, Q0.z, k0);
 float4 dPQk = float4(dP, dQ.z, dk);
 float3 Q = Q0;

 // Adjust end condition for iteration direction
 float end = P1.x * stepDir;

 float stepCount = 0.0f;
 float prevZMaxEstimate = csOrig.z;
 float rayZMin = prevZMaxEstimate;
 float rayZMax = prevZMaxEstimate;
 float sceneZMax = rayZMax + 100.0f;
 for(;
  ((PQk.x * stepDir) <= end) && (stepCount < cb_maxSteps) &&
  !intersectsDepthBuffer(sceneZMax, rayZMin, rayZMax) &&
  (sceneZMax != 0.0f);
  ++stepCount)
 {
  rayZMin = prevZMaxEstimate;
  rayZMax = (dPQk.z * 0.5f + PQk.z) / (dPQk.w * 0.5f + PQk.w);
  prevZMaxEstimate = rayZMax;
  if(rayZMin > rayZMax)
  {
   swap(rayZMin, rayZMax);
  }

  hitPixel = permute ? PQk.yx : PQk.xy;
  // You may need hitPixel.y = depthBufferSize.y - hitPixel.y; here if your vertical axis
  // is different than ours in screen space
  sceneZMax = linearDepthTexelFetch(int2(hitPixel));

  PQk += dPQk;
 }

 // Advance Q based on the number of steps
 Q.xy += dQ.xy * stepCount;
 hitPoint = Q * (1.0f / PQk.w);
 return intersectsDepthBuffer(sceneZMax, rayZMin, rayZMax);
}

float4 main(VertexOut pIn) : SV_TARGET
{
 int3 loadIndices = int3(pIn.posH.xy, 0);
 float3 normalVS = normalBuffer.Load(loadIndices).xyz;
 if(!any(normalVS))
 {
  // no geometry was rendered at this pixel
  return 0.0f;
 }

 float depth = depthBuffer.Load(loadIndices).r;
 float3 rayOriginVS = pIn.viewRay * linearizeDepth(depth);

 /*
  * Since position is reconstructed in view space, just normalize it to get the
  * vector from the eye to the position and then reflect that around the normal to
  * get the ray direction to trace.
  */
 float3 toPositionVS = normalize(rayOriginVS);
 float3 rayDirectionVS = normalize(reflect(toPositionVS, normalVS));

 // output rDotV to the alpha channel for use in determining how much to fade the ray
 float rDotV = dot(rayDirectionVS, toPositionVS);

 // out parameters
 float2 hitPixel = float2(0.0f, 0.0f);
 float3 hitPoint = float3(0.0f, 0.0f, 0.0f);

 float jitter = cb_stride > 1.0f ? float(int(pIn.posH.x + pIn.posH.y) & 1) * 0.5f : 0.0f;

 // perform ray tracing - true if hit found, false otherwise
 bool intersection = traceScreenSpaceRay(rayOriginVS, rayDirectionVS, jitter, hitPixel, hitPoint);

 depth = depthBuffer.Load(int3(hitPixel, 0)).r;

 // move hit pixel from pixel position to UVs
 hitPixel *= float2(texelWidth, texelHeight);
 if(hitPixel.x > 1.0f || hitPixel.x < 0.0f || hitPixel.y > 1.0f || hitPixel.y < 0.0f)
 {
  intersection = false;
 }

 return float4(hitPixel, depth, rDotV) * (intersection ? 1.0f : 0.0f);
}

The DepthUtils.hlsli header contains the linearizeDepth function that’s used to convert a perspective-z depth into a linear value.  The PerFrame.hlsli header contains several values that are set at the start of a frame and remain constant throughout.  Of particular interest are texelWidth and texelHeight, which contain the texel size for the client (1 / dimension).  We use these values to convert pixel positions from the trace result into UV coordinates for easy lookup in subsequent steps.

An idea borrowed from Ben Hopkins (@kode80), who also open sourced his implementation of ray tracing based on McGuire’s initial work, is to use a cutoff value for the stride based on Z distance [2].  The idea is that as distance from the viewer grows, perspective projection makes objects smaller in screen space, so the stride can be shortened and the ray will still likely find its contact point.  This helps distant locations create higher quality reflections than if they were to use a stride as large as closer locations.  In the above implementation, this idea was extended by adding additional thickness to surfaces as their distance from the viewer increases.  This resulted in fewer artifacts at shallow angles, where the rayZMin and rayZMax values would grow such that the sampled sceneZMax would fail the intersection test by small margins.
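Both depth-based adjustments can be sketched outside the shader (a direct mirror of the strideScale and depthScale math in the listing above; function and parameter names are illustrative):

```python
def stride_for_depth(z, base_stride, stride_z_cutoff):
    # Distant pixels cover less screen area, so relax the stride toward 1 pixel
    # as view-space depth grows past the cutoff.
    scale = 1.0 - min(1.0, z * stride_z_cutoff)
    return 1.0 + scale * base_stride

def thickness_for_depth(z, base_thickness, stride_z_cutoff, extra=2.0):
    # Give distant surfaces extra thickness so shallow rays aren't rejected
    # by tiny margins between rayZMin/rayZMax and the sampled scene depth.
    depth_scale = min(1.0, z * stride_z_cutoff)
    return base_thickness + extra * depth_scale
```
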

Another interesting idea from Hopkins’ implementation was to store the values and the step derivatives in float4 types.  The goal is to encourage the HLSL compiler to take advantage of SIMD operations, since the values are updated by identical operations at the same time.  In practice, the output from the Visual Studio 2013 Graphics Debugger showed the bytecode was nearly identical between the McGuire implementation and Hopkins’ implementation, but it was left in for being a cool idea.

The image below shows the results of the ray tracing step.  The buffer values include the UV coordinates of the ray hit in the x and y components, the depth in the z component, and the dot product of the view ray and the reflection ray in the w component.  The value stored in the w-component is used in the cone tracing step to fade rays facing towards the camera.  Black pixels mark areas where no intersection occurred.


Blurring the Light Buffer

The next step to obtaining glossy reflections is to blur the light buffer.  Specifically, the light buffer is copied to the top-most mip level of a texture supporting a full mip chain, and from there the result is blurred into its lower mip levels.  A separable 1-dimensional Gaussian blur is used.  The below implementation uses a 7-tap kernel, but the implementer should experiment to find a kernel size appropriate for their particular needs.  First the blur is applied vertically to a temporary buffer, then the blur is applied horizontally to the next level down in the mip chain.  The following code listing shows a simple blur shader.  Notice that to use it, two additional shaders need to exist, each defining one of the pre-processor directives specifying directionality and including the file below.

/*
 * The Convolution shader body.
 * Requires one of the directional pre-processor macros (here HORIZONTAL or VERTICAL)
 * to be defined.
 */
#include "SSLRConstantBuffer.hlsli"

struct VertexOut
{
 float4 posH : SV_POSITION;
 float2 tex : TEXCOORD;
};

Texture2D colorBuffer : register(t1);

#if defined(HORIZONTAL)
static const int2 offsets[7] = {{-3, 0}, {-2, 0}, {-1, 0}, {0, 0}, {1, 0}, {2, 0}, {3, 0}};
#else // VERTICAL
static const int2 offsets[7] = {{0, -3}, {0, -2}, {0, -1}, {0, 0}, {0, 1}, {0, 2}, {0, 3}};
#endif
static const float weights[7] = {0.001f, 0.028f, 0.233f, 0.474f, 0.233f, 0.028f, 0.001f};

float4 main(VertexOut pIn): SV_Target0
{
 float2 uvs = pIn.tex * cb_depthBufferSize; // make sure to send in the SRV's dimensions for cb_depthBufferSize
 // sample level zero since only one mip level is available with the bound SRV
 int3 loadPos = int3(uvs, 0);

 float4 color = float4(0.0f, 0.0f, 0.0f, 1.0f);
 for(uint i = 0u; i < 7u; ++i)
 {
  color += colorBuffer.Load(loadPos, offsets[i]) * weights[i];
 }
 return float4(color.rgb, 1.0f);
}

During the blur passes the constant buffer values storing depth buffer size for the rest of the effect are re-purposed for recovering the load positions for fetches from the bound texture.  At the end of all blur passes these values should be reset to the correct dimensions before proceeding.
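For reference, the 7-tap weights in the listing are a normalized Gaussian; kernels of other sizes can be generated the same way (sigma here is a free parameter to experiment with, not a value taken from the engine):

```python
import math

def gaussian_weights(radius, sigma):
    """Normalized 1-D Gaussian kernel of size 2*radius + 1 for a separable blur."""
    raw = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(raw)
    return [w / total for w in raw]
```

Because the kernel is normalized, the blur preserves overall brightness, and because it is symmetric, the separable vertical-then-horizontal passes are order independent.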

Cone Tracing

At this point in the effect, the ray traced buffer is complete and the full mip chain of the light buffer has been generated.  The idea in this section comes from Yasin Uludag’s article in GPU Pro 5 [3].

It was mentioned earlier in the post that for glossy reflections to be represented, both the surface roughness and the distance traveled from the reflecting point to its point of contact needed to be accounted for.  Whereas a perfect mirror would cast a straight line outwards from the origin point, a rougher surface would cast a cone shape.  The figure below shows a representation of this phenomenon (albeit a bit crudely).


Given these observations, it can further be noted that in screen space a cone (3-dimensional) projects to an isosceles triangle (2-dimensional).  Knowing the locations of the starting point and the ray’s end point tells us how far in screen space the ray has traveled.  With the roughness value available for the current surface through sampling the appropriate texture, everything that’s needed to move forward is on hand.

The steps for cone tracing are as follows.

  1. The adjacent length of the isosceles triangle is found by finding the magnitude of the vector from the origin position to the ray hit position.
  2. The sampled roughness is converted into a specular power.
  3. The specular power is then used to calculate the cone angle (theta) for the isosceles triangle.
  4. The opposite length of the triangle is found by dividing the cone angle in half and finding the opposite side of a right triangle using basic trigonometry, specifically that tan(theta) = oppositeLength/adjacentLength, which is equivalently represented as oppositeLength = tan(theta) * adjacentLength.
  5. The result is then doubled to recover the full length.
  6. The radius of a circle inscribed in the triangle is found using the formula found at [7] for isosceles triangles.  This is used to determine the sample position and the mip level from which to sample.
  7. The color is sampled and weighted based on surface roughness.
  8. Steps 2-7 are repeated several times until the resulting alpha reaches 1, or the loop hits its iteration limit.  During each iteration, the triangle’s adjacent length is shortened by the previously calculated radius, then each value is recomputed for the new triangle.

Step 7 in particular differs from Uludag’s implementation where he builds out an entire visibility buffer that is used to help diminish contributions from sampled pixels that should not be included as part of the result.  For most cases, the results tend to be good enough with this simplified approach, and the cost saved from not creating the visibility buffer and the hierarchical z-buffer from Uludag’s article can be re-assigned to further refinements or other effects.

The formula for finding the incircle radius of an isosceles triangle is r = a(sqrt(a^2 + 4h^2) - a) / (4h), where a represents the opposite length of the triangle and h represents the adjacent length.  The formula was obtained from [7].
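The cone-tracing trigonometry from the numbered steps can be checked numerically with a small sketch (a standalone translation of the math described above, not the shader itself):

```python
import math

def specular_power_to_cone_angle(specular_power, xi=0.244):
    # Phong-based lobe aperture: the angle at which the lobe falls to xi (step 3).
    return math.acos(xi ** (1.0 / (specular_power + 1.0)))

def isosceles_opposite(adjacent, cone_theta):
    # opposite = tan(theta) * adjacent, doubled for the full base (steps 4-5)
    return 2.0 * math.tan(cone_theta) * adjacent

def isosceles_inradius(a, h):
    # incircle radius of an isosceles triangle with base a and height h (step 6)
    return (a * (math.sqrt(a * a + 4.0 * h * h) - a)) / (4.0 * h)
```

For example, an isosceles triangle with base 6 and height 4 has equal sides of length 5, so its inradius is area / semiperimeter = 12 / 8 = 1.5, which the formula reproduces.  Note also that higher specular powers (smoother surfaces) yield narrower cone angles, as expected.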

Once the cone traced color is found, it’s modulated by the calculated Fresnel term using the values from the specular buffer, a normalized vector pointing from the surface location back towards the viewer, and the surface normal.  Finally, several fading steps are applied to help diminish the pronouncement of areas where the ray tracing step failed to find an intersection.  The results of this step are added back to the original light buffer and the process is complete.

The below shader code demonstrates this process.

#include "SSLRConstantBuffer.hlsli"
#include "../../LightingModel/PBL/LightUtils.hlsli"
#include "../../ConstantBuffers/PerFrame.hlsli"
#include "../../Utils/DepthUtils.hlsli"
#include "../../ShaderConstants.hlsli"

struct VertexOut
{
 float4 posH : SV_POSITION;
 float3 viewRay : VIEWRAY;
 float2 tex : TEXCOORD;
};

SamplerState sampTrilinearClamp : register(s1);

Texture2D depthBuffer : register(t0); // scene depth buffer used in ray tracing step
Texture2D colorBuffer : register(t1); // convolved color buffer - all mip levels
Texture2D rayTracingBuffer : register(t2); // ray-tracing buffer
Texture2D normalBuffer : register(t3); // normal buffer - from g-buffer
Texture2D specularBuffer : register(t4); // specular buffer - from g-buffer (rgb = ior, a = roughness)
Texture2D indirectSpecularBuffer : register(t5); // indirect specular light buffer used for fallback

// Cone tracing methods

float specularPowerToConeAngle(float specularPower)
{
 // based on phong distribution model
 if(specularPower >= exp2(CNST_MAX_SPECULAR_EXP))
 {
  return 0.0f;
 }
 const float xi = 0.244f;
 float exponent = 1.0f / (specularPower + 1.0f);
 return acos(pow(xi, exponent));
}

float isoscelesTriangleOpposite(float adjacentLength, float coneTheta)
{
 // simple trig and algebra - soh, cah, toa - tan(theta) = opp/adj, opp = tan(theta) * adj, then multiply * 2.0f for isosceles triangle base
 return 2.0f * tan(coneTheta) * adjacentLength;
}

float isoscelesTriangleInRadius(float a, float h)
{
 float a2 = a * a;
 float fh2 = 4.0f * h * h;
 return (a * (sqrt(a2 + fh2) - a)) / (4.0f * h);
}

float4 coneSampleWeightedColor(float2 samplePos, float mipChannel, float gloss)
{
 float3 sampleColor = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).rgb;
 return float4(sampleColor * gloss, gloss);
}

float isoscelesTriangleNextAdjacent(float adjacentLength, float incircleRadius)
{
 // subtract the diameter of the incircle to get the adjacent side of the next level on the cone
 return adjacentLength - (incircleRadius * 2.0f);
}


float4 main(VertexOut pIn) : SV_TARGET
{
 int3 loadIndices = int3(pIn.posH.xy, 0);
 // get screen-space ray intersection point
 float4 raySS = rayTracingBuffer.Load(loadIndices).xyzw;
 float3 fallbackColor = indirectSpecularBuffer.Load(loadIndices).rgb;
 if(raySS.w <= 0.0f) // either means no hit or the ray faces back towards the camera
 {
  // no data for this point - a fallback like localized environment maps should be used
  return float4(fallbackColor, 1.0f);
 }

 float depth = depthBuffer.Load(loadIndices).r;
 float3 positionSS = float3(pIn.tex, depth);
 float linearDepth = linearizeDepth(depth);
 float3 positionVS = pIn.viewRay * linearDepth;
 // since calculations are in view-space, we can just normalize the position to point at it
 float3 toPositionVS = normalize(positionVS);
 float3 normalVS = normalBuffer.Load(loadIndices).rgb;

 // get specular power from roughness
 float4 specularAll = specularBuffer.Load(loadIndices);
 float gloss = 1.0f - specularAll.a;
 float specularPower = roughnessToSpecularPower(specularAll.a);

 // convert to cone angle (maximum extent of the specular lobe aperture)
 // only want half the full cone angle since we're slicing the isosceles triangle in half to get a right triangle
 float coneTheta = specularPowerToConeAngle(specularPower) * 0.5f;

 // P1 = positionSS, P2 = raySS, adjacent length = ||P2 - P1||
 float2 deltaP = raySS.xy - positionSS.xy;
 float adjacentLength = length(deltaP);
 float2 adjacentUnit = normalize(deltaP);

 float4 totalColor = float4(0.0f, 0.0f, 0.0f, 0.0f);
 float remainingAlpha = 1.0f;
 float maxMipLevel = (float)cb_numMips - 1.0f;
 float glossMult = gloss;
 // cone-tracing using an isosceles triangle to approximate a cone in screen space
 for(int i = 0; i < 14; ++i)
 // intersection length is the adjacent side, get the opposite side using trig
 float oppositeLength = isoscelesTriangleOpposite(adjacentLength, coneTheta);

 // calculate in-radius of the isosceles triangle
 float incircleSize = isoscelesTriangleInRadius(oppositeLength, adjacentLength);

 // get the sample position in screen space
 float2 samplePos = positionSS.xy + adjacentUnit * (adjacentLength - incircleSize);

 // convert the in-radius into screen size then check what power N to raise 2 to reach it - that power N becomes mip level to sample from
 float mipChannel = clamp(log2(incircleSize * max(cb_depthBufferSize.x, cb_depthBufferSize.y)), 0.0f, maxMipLevel);

 * Read color and accumulate it using trilinear filtering and weight it.
 * Uses pre-convolved image (color buffer) and glossiness to weigh color contributions.
 * Visibility is accumulated in the alpha channel. Break if visibility is 100% or greater (>= 1.0f).
 float4 newColor = coneSampleWeightedColor(samplePos, mipChannel, glossMult);

 remainingAlpha -= newColor.a;
 if(remainingAlpha < 0.0f)
 newColor.rgb *= (1.0f - abs(remainingAlpha));
 totalColor += newColor;

 if(totalColor.a >= 1.0f)

 adjacentLength = isoscelesTriangleNextAdjacent(adjacentLength, incircleSize);
 glossMult *= gloss;

 float3 toEye = -toPositionVS;
 float3 specular = calculateFresnelTerm(specularAll.rgb, abs(dot(normalVS, toEye))) * CNST_1DIVPI;

 // fade rays close to screen edge
 float2 boundary = abs(raySS.xy - float2(0.5f, 0.5f)) * 2.0f;
 const float fadeDiffRcp = 1.0f / (cb_fadeEnd - cb_fadeStart);
 float fadeOnBorder = 1.0f - saturate((boundary.x - cb_fadeStart) * fadeDiffRcp);
 fadeOnBorder *= 1.0f - saturate((boundary.y - cb_fadeStart) * fadeDiffRcp);
 fadeOnBorder = smoothstep(0.0f, 1.0f, fadeOnBorder);
 float3 rayHitPositionVS = viewSpacePositionFromDepth(raySS.xy, raySS.z);
 float fadeOnDistance = 1.0f - saturate(distance(rayHitPositionVS, positionVS) / cb_maxDistance);
 // ray tracing steps stores rdotv in w component - always > 0 due to check at start of this method
 float fadeOnPerpendicular = saturate(lerp(0.0f, 1.0f, saturate(raySS.w * 4.0f)));
 float fadeOnRoughness = saturate(lerp(0.0f, 1.0f, gloss * 4.0f));
 float totalFade = fadeOnBorder * fadeOnDistance * fadeOnPerpendicular * fadeOnRoughness * (1.0f - saturate(remainingAlpha));

 return float4(lerp(fallbackColor, totalColor.rgb * specular, totalFade), 1.0f);

The following image roughly illustrates the process.  From top to bottom, the floor of the image starts off perfectly mirror-like and gradually becomes rougher.  The red lines indicate the cones.  The circles inscribed in them show how the radii are used for mip selection (i.e., the larger the circle, the further down the mip chain), and the center of each circle is where the sample would be taken.  Notice that for a perfectly mirror-like surface, the cone diminishes to a straight line.
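The march can also be reproduced on the CPU for debugging. The Python sketch below is an illustrative re-implementation of the triangle helpers (not engine code); it marches the cone the same way the shader loop does, and lets us confirm the closed-form in-radius against the textbook identity r = area / semiperimeter [7].

```python
import math

def isosceles_triangle_opposite(adjacent_length, cone_theta):
    # tan(theta) = opposite / adjacent for the half-angle right triangle,
    # doubled to recover the full isosceles base
    return 2.0 * math.tan(cone_theta) * adjacent_length

def isosceles_triangle_in_radius(a, h):
    # closed form for the inscribed circle of an isosceles triangle
    # with base a and height h, matching the shader helper
    return (a * (math.sqrt(a * a + 4.0 * h * h) - a)) / (4.0 * h)

def in_radius_reference(a, h):
    # textbook identity: r = area / semiperimeter
    area = 0.5 * a * h
    semiperimeter = 0.5 * (a + 2.0 * math.sqrt(h * h + 0.25 * a * a))
    return area / semiperimeter

def march_cone(adjacent_length, cone_theta, steps=14):
    # returns (sample distance, incircle radius) pairs like the shader loop
    samples = []
    for _ in range(steps):
        opposite = isosceles_triangle_opposite(adjacent_length, cone_theta)
        incircle = isosceles_triangle_in_radius(opposite, adjacent_length)
        samples.append((adjacent_length - incircle, incircle))
        adjacent_length -= 2.0 * incircle  # next adjacent side on the cone
        if adjacent_length <= 0.0:
            break
    return samples
```

Each iteration's sample point sits at the incircle center, `adjacentLength - incircleSize` along the ray toward the reflecting point, and the shrinking radius is exactly what drives mip selection in the shader.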


Bringing It All Together

As mentioned earlier, a fallback technique is useful for any screen space reflection technique.  This implementation uses parallax-corrected cube maps based on Lagarde’s post [4]. These also include a fallback to generic, non-corrected cube maps as a last resort.  These values are all computed before the screen space reflections technique starts and are accessed above in the cone tracing step through the “indirectSpecularBuffer” resource.  While fallback methods won’t be as exact as ray traced results, properly set-up cube maps can certainly help alleviate jarring artifacts.  The image below shows a comparison of two sections of the same scene.  The left half of the image does not have good cube map placement and the missed reflection data is quite noticeable under the sphere.  The right half includes blended parallax-corrected cube maps and introduces a much less severe penalty for missed rays.


Another artifact of inadequate fallback techniques can also be seen in the left image above.  As the traced ray nears the edge of the screen, it starts to become faded.  The code for this is towards the bottom of the cone tracing shader.  Without a decent fallback technique in place, the differences between the center of the screen and the edges can be quite drastic.  The right half of the image shows such fading only to a very minor degree, most noticeably on the left edge of the picture.

Due to the numerous issues mentioned towards the start of the post, rays facing back towards the viewer are disallowed entirely.  This is an implementation choice and by no means a requirement.  Implementers should experiment with their own scenes and determine whether backwards-traversing rays provide acceptable results for use cases specific to the application.  In the implementation above, ray results start to fade as they become perpendicular to the view so as not to cause a sharp cutoff at any one point.

A final nicety that was added to this implementation is that the indirect specular buffer is actually a part of the light buffer during the initial convolution and is subtracted back out before applying the cone tracing pass.  What this allows for is metals to be reflected more appropriately in the cone traced step.  In the image below, the left half does not take these steps into consideration and the metal’s reflection is black.  The specular highlight shows up in the reflection since it is contributed from direct lighting, the sun in this case, but none of the indirect light is included.  In the right half of the image, these effects are enabled and the sky is observable in the reflected sphere.


The U-shape on the bottom of each sphere is due to not having good fallback techniques in this area of the scene, and can be alleviated as discussed previously.

Areas of Improvement

The biggest improvement needed for this technique in its current state is a better blur.  The current separable Gaussian blur, while fast, can lead to reflections being blurred onto parts of the scene where they don’t belong.  A feature-aware blur similar to a bilateral blur is likely a better candidate in this space and will be followed up on in a separate post once a better method is determined.  Specifically, the blur will likely need to account for large depth discrepancies and reject samples that do not fall within a specified threshold.  It should be noted that battling these types of artifacts is a potential strength of Uludag’s proposed visibility buffer.
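One plausible shape for such a feature-aware kernel is sketched below in Python.  This is a hypothetical illustration, not the engine’s blur: each Gaussian tap is rejected if its linear depth differs from the center by more than a threshold, and the surviving weights are renormalized so the kernel still sums to one.

```python
def depth_aware_weights(center_depth, tap_depths, gaussian_weights,
                        depth_threshold=0.05):
    # zero out taps across a large depth discontinuity, then renormalize
    # the remaining Gaussian weights so they still sum to one
    weights = []
    for depth, g in zip(tap_depths, gaussian_weights):
        keep = abs(depth - center_depth) <= depth_threshold
        weights.append(g if keep else 0.0)
    total = sum(weights)
    return [w / total for w in weights] if total > 0.0 else gaussian_weights
```

A real shader version would fetch the linear depths from the depth buffer alongside each color tap, but the rejection logic is the same.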

The blur can also be sped up while still obtaining the same results by using the approach found at [10].  This is slated as future work for the current effect, and will likely be included in the same follow up post that revisits a better overall blurring solution.

While testing storage for the blurred results, a Texture2DArray was also tried out.  While this means of storage improved the overall perceived smoothness of the blur over varying roughness values, the memory requirements and increased time to run the blur several times over the full textures were simply not worth the small improvements.  The mip-chained texture provides decent results and blends adequately with trilinear sampling.  While testing values for various kernel sizes and sigmas, the calculator at [11] was extremely helpful for quick iteration.

One further improvement that can be made to the blurred result using the current implementation is to sample several points within the inscribed circle instead of just the center and blend all the results together.  The trade-off for sampling multiple points in this fashion is between performance and quality.  This technique is demonstrated in [8] on page 3 of the conversation.
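A quick sketch of what generating those extra taps could look like, in Python.  The golden-angle spiral used here is just one reasonable distribution pattern (an assumption on my part, not what [8] prescribes); the point is that all taps stay inside the inscribed circle and the first tap remains the original center sample.

```python
import math

def circle_taps(center_x, center_y, radius, count=5):
    # distribute taps on a golden-angle spiral inside the inscribed circle;
    # the first tap stays at the center, matching the single-sample path
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    taps = [(center_x, center_y)]
    for i in range(1, count):
        r = radius * math.sqrt((i - 0.5) / (count - 1))
        a = i * golden_angle
        taps.append((center_x + r * math.cos(a), center_y + r * math.sin(a)))
    return taps
```

The shader would then run `coneSampleWeightedColor` once per tap and average the results, trading the extra texture fetches for a smoother blend.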

Another area of improvement for this technique would be to update the reflection model to better match the lighting model used in the rest of the engine’s rendering pipeline.  As mentioned previously, the above implementation for the cone-tracing step is based off Uludag’s explanation provided in [3].  In its current state, the effect uses an approximation of the Phong model, while the rest of the pipeline uses GGX for its specular distribution term.  Uludag does offer suggestions in his article on how to adapt to other reflection models, and this will likely be the topic of a future post once implemented.
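For reference, the Phong approximation referred to here derives the cone aperture from the specular exponent.  One common mapping, sketched below in Python, picks the angle whose cosine bounds a fixed fraction xi of the lobe (the cutoff constants here are illustrative and may differ from the engine’s actual `specularPowerToConeAngle`):

```python
import math

def specular_power_to_cone_angle(specular_power, xi=0.244):
    # full aperture of the cone bounding the fraction xi of a Phong lobe
    # with the given exponent; very high exponents are treated as mirrors
    if specular_power >= 8192.0:
        return 0.0
    cos_theta = math.pow(xi, 1.0 / (specular_power + 1.0))
    return 2.0 * math.acos(cos_theta)
```

Adapting the effect to GGX would mean replacing this lobe-to-aperture mapping (and the pre-convolution) with one derived from the GGX distribution, as Uludag suggests.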

Furthermore, using more efficiently packed buffers for lighting data could prove to be a performance improvement for this technique.  As mentioned above, all buffers containing lighting data are 64-bit floating point buffers with 16 bits of precision in each channel.  Future experimentation with a more efficient 32-bit floating point buffer such as DirectX’s DXGI_FORMAT_R11G11B10_FLOAT should be considered.


This section contains images generated using the techniques described above.  Each image is comprised of a few smaller images showing increasing roughness in the floor material.

The first image shows the effect working on a large scale in an area of the scene spanning over 100 meters.


The second image shows the effect working in a more localized setting at ground level, similar to how a user would perceive the world in a first-person game or application.  The area uses parallax-corrected cube maps as a fallback technique, and missed ray intersections, such as those that would likely occur around concave objects (the soldier in this case), are very well-blended.


The third image again shows the effect in a localized setting.  The later time of day creates a steeper contrast between shadowed and un-shadowed areas causing the effect to be more pronounced and better showing how a rougher surface will blur and even start to pull the reflection vertically.


The fourth image again uses a steeper lighting contrast to help demonstrate how the effect applies as the floor material changes from very smooth to very rough.


The following videos show the effect running in a real-time interactive application.  For best viewing, it is recommended to either run the videos in full-screen with high-definition enabled, or visit their respective YouTube pages by following these links:  Video 1  Video 2.


This post has presented a full implementation of a solution for glossy screen space reflections.  While the abundance of programmer art and MS Paint images may not be quite as fantastical as those rendered using a proper studio’s asset collection, the contributions of the effect to the final result should be clear.  Even with a basic reflection model, the technique serves to add more realism to a scene and provides a means for believable real-time reflections for rough surfaces.


I first came into contact with Bruce Wilkie about a year ago when he posted a forum topic while we were both working on implementing Yasin Uludag’s article from GPU Pro 5 [3].  We spoke a few times on the subject, and it became abundantly clear that he was much more knowledgeable on the matter than me.  He was critical in helping me understand and figure out Uludag’s use of the hierarchical Z-buffer for ray tracing and work the kinks out of my initial attempts at implementing it [8].  Bruce was kind enough to offer that we keep in touch and that I could ask him questions around issues I might have while implementing different features in my engine, which I work on as a hobby in my spare time.  I’ve certainly taken advantage of that offer over the course of the year, and he’s offered various ranges of advice on almost everything graphics-related that’s been posted to this blog to date.  He has shown a great deal of patience in helping clarify certain concepts to me, and has a knack for explaining how to arrive at a solution without simply giving the answer away – an extremely valuable teaching technique.  He also brought the idea of the more efficient blur using [10] to my attention as a solid alternative to the standard approach used above, as well as offered a few more suggestions for improvement over the first draft of this post.

Thank you, Bruce.

I would also like to thank Morgan McGuire (@morgan3d) and Mike Mara for open-sourcing and generously licensing their DDA-based ray tracing code.  A thank you goes to Ben Hopkins (@kode80) for doing the same with his implementation.


[1] Morgan McGuire and Mike Mara.

[2] Ben Hopkins.

[3] Yasin Uludag.  GPU Pro 5.  Hi-Z Screen-Space Cone-Traced Reflections.

[4] Sébastien Lagarde.

[5] Sébastien Lagarde.

[6] Matt Pettineo.

[7] Weisstein, Eric W. “Inradius.” From MathWorld–A Wolfram Web Resource.





Dealing with Shadow Map Artifacts

In a previous post on stack stabilization, the linked video showed a few major issues with shadow mapping.  These issues have plagued the technique since its inception, and while there are many methods that help alleviate them, it’s still very difficult to get rid of them completely.  Here we’ll review some common artifacts and discuss potential ways to squash them.

Perspective Aliasing

These types of artifacts are perhaps the simplest to alleviate.  Stair-like artifacts outlining the projected shadows are generally caused by the resolution of the shadow map being too low.  Compare the halves in the image below.  The top half shows a scene using a shadow map resolution of 256×256, while the bottom shows the same scene using a resolution of 2048×2048.


Unfortunately, increasing the resolution will only get us so far.  Even at high resolutions, if the viewer is close enough to the receiving surface, tiny stair-like artifacts will still be noticeable along the edges of projected shadows.  The solution to this is to use a technique called percentage closer filtering (PCF).  Instead of sampling at one location, this algorithm samples several points around the initial location, weighs the results that are shadowed versus non-shadowed, and creates soft edges for the result.  The image below shows an up-close view of a shadow map with 2048×2048 resolution without and then with PCF enabled.
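The idea behind the box-filter variant can be sketched in a few lines of Python.  This is a CPU-side illustration, not shader code: the shadow map is a plain 2D list of stored depths, and the function returns the fraction of neighborhood taps the receiver passes, i.e. a soft shadow factor in [0, 1] instead of a hard binary result.

```python
def pcf_box_filter(shadow_map, u, v, compare_depth, kernel_radius=1):
    # shadow_map: 2D list of stored light-space depths
    # returns the lit fraction over a (2r+1) x (2r+1) box of taps
    height, width = len(shadow_map), len(shadow_map[0])
    x, y = int(u * width), int(v * height)
    passed = taps = 0
    for dy in range(-kernel_radius, kernel_radius + 1):
        for dx in range(-kernel_radius, kernel_radius + 1):
            sx = min(max(x + dx, 0), width - 1)   # clamp addressing
            sy = min(max(y + dy, 0), height - 1)
            taps += 1
            if compare_depth <= shadow_map[sy][sx]:
                passed += 1
    return passed / taps
```

On GPUs this comparison is typically done with hardware comparison samplers (e.g. `SampleCmpLevelZero` in HLSL), which perform the per-tap depth test and bilinear weighting for free.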


There are several different sampling patterns that can be used for the PCF algorithm.  Currently, I’m using a simple box filter around the center location.  Other sampling patterns, such as a rotated Poisson disc, are also popular and produce varying results.

Shadow Acne

Another common artifact found in shadow mapping is shadow acne, or erroneous self-shadowing.  This generally occurs when the texel depth in light space and the texel depth in view space are so close that floating point errors incorrectly cause the depth test to fail.  The image below shows an example of these artifacts present (top) and addressed (bottom).


There are a few ways to address this issue.  It’s so prevalent that most graphics APIs provide a means to instantiate a rasterizer state that includes both a depth bias and a slope-scaled depth bias.  Essentially, during shadow map creation, these values are used in combination to offset the current depth value by a certain amount and push it out of the range where floating point inaccuracies would cause incorrect comparisons.  One must be careful when setting these bias values.  Too high a value can cause the next issue to be discussed, peter panning, while too low a value will still let acne artifacts creep back into the final image.
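For a UNORM depth buffer, Direct3D computes the applied bias as a constant term measured in units of the smallest representable depth increment plus a term that grows with the polygon’s maximum depth slope.  The Python sketch below just evaluates that formula (r = 2^-24 for a 24-bit depth buffer) to show why steep, grazing-angle polygons receive a larger offset:

```python
def d3d_depth_bias(depth_bias, slope_scaled_depth_bias, max_depth_slope,
                   r=2.0 ** -24):
    # constant bias in units of the smallest representable depth step r,
    # plus a slope-proportional term for polygons at grazing angles
    return depth_bias * r + slope_scaled_depth_bias * max_depth_slope
```

This is why acne tends to reappear first on surfaces nearly parallel to the light direction: their depth slope is large, and without the slope-scaled term the constant bias alone cannot cover them.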

Peter Panning

It’s frustrating when introducing a fix for one thing breaks something else.  That’s exactly what we can potentially end up with when we use depth biases for shadow maps.  Peter panning is caused by offsetting the depth values in light space too much.  The result is that the shadow becomes detached from the object casting it.  The image below displays this phenomenon.  In both halves of the image, the blocks are resting on the ground, but in the top half the depth bias is so large that it pushes the shadows away from the casters, making the blocks appear as though they could be floating.  The bottom half uses a more appropriate depth bias and the shadows appear properly attached.



Working in the Shader

Using hardware depth biasing in the rasterizer is nice in that it’s fast and easy enough to set up and get working.  Sometimes, however, we have different needs for our shadow maps and want to delay these types of correction steps until further in the pipeline.  Though I’ve since reverted to a more basic approach, when first implementing transmittance through thin materials I switched my shadow map vertex shaders to output linear values to make the implementation a bit more straightforward.  If I used the rasterizer state offsets as described above, I would have to somehow track and undo those offsets before I could use the values effectively in my transmittance calculations, or else have major artifacts from depth discrepancies.  Fortunately, there are several excellent resources that describe alternative methods for getting rid of shadow artifacts (see references), and with a combination of ideas borrowed from all of them, I’ve been able to get a fairly decent implementation working.  Below is some example code in HLSL.

Storing linear values to the shadow map:

// client code
Matrix4x4f linearProjectionMtx = createPerspectiveFOVLHMatrix4x4f(fovy, aspect, nearPlane, farPlane);
linearProjectionMtx.rc33 /= farPlane;
linearProjectionMtx.rc34 /= farPlane;

// shadow map vertex shader
float4 main(VertexIn vIn) : SV_POSITION
{
    // transform to homogeneous clip space
    float4 posH = mul(float4(vIn.posL, 1.0f), worldViewProjectionMatrix);

    // store linear depth to shadow map - there is no change to the value stored for orthographic projections since w == 1
    posH.z *= posH.w;

    return posH;
}
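It’s worth seeing why this produces a linear value.  The Python sketch below (assuming a standard left-handed D3D perspective matrix with row-vector math, matching `rc33`/`rc34` in the client snippet) reproduces the full pipeline: the client-side divide by the far plane, the shader’s `posH.z *= posH.w`, and the rasterizer’s divide by w.  The stored depth comes out as (z − near) / (far − near), i.e. linear from 0 at the near plane to 1 at the far plane.

```python
def linear_shadow_depth(z, near_plane, far_plane):
    # standard LH perspective third-column entries, divided by farPlane
    # as in the client code (row-vector convention: rc33, rc34)
    rc33 = (far_plane / (far_plane - near_plane)) / far_plane
    rc34 = (-near_plane * far_plane / (far_plane - near_plane)) / far_plane
    pos_z = rc33 * z + rc34   # projected z before the shader tweak
    pos_w = z                 # perspective w is the view-space depth
    pos_z *= pos_w            # the vertex-shader line: posH.z *= posH.w
    return pos_z / pos_w      # the rasterizer's divide by w
```

For orthographic projections w is 1, so both the multiply and the divide are no-ops, just as the shader comment notes.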

Using a scaled normal offset in the light shader before transforming a point in world space by the shadow transform matrix.  I use a deferred shading pipeline and store data in the G-Buffer in view space, hence having to transform the new position by the inverse of the camera view matrix first:

float3 toLightV = normalize(-light.direction);
// or, for point and spot lights:
// float3 toLightV = normalize(light.position - position);
float cosAngle = saturate(1.0f - dot(toLightV, normal));
float3 scaledNormalOffset = normal * (cb_normalOffset * cosAngle * smTexelDimensions);
float4 shadowPosW = mul(float4(position + scaledNormalOffset, 1.0f), inverseViewMatrix);
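The intuition behind the `cosAngle` term is that the offset should grow as the surface turns away from the light, where each shadow-map texel covers more surface area and acne is worst.  A small CPU-side sketch of just the offset magnitude (illustrative, mirroring the shader math):

```python
def scaled_normal_offset_length(n_dot_l, normal_offset, sm_texel_size):
    # offset along the normal grows as the surface turns away from the
    # light (small N.L) and scales with the shadow-map texel size
    cos_angle = min(max(1.0 - n_dot_l, 0.0), 1.0)  # saturate
    return normal_offset * cos_angle * sm_texel_size
```

A surface facing the light head-on receives no offset at all, so the technique only pays its cost where it is actually needed.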

Once the point has been transformed by the shadow matrix, finish projecting it and apply a depth offset:

// complete projection by doing division by w
shadowPosH.xyz /= shadowPosH.w;

shadowPosH.z -= cb_depthBias * smTexelDimensions;
float depth = shadowPosH.z; // depth to use for PCF comparison

And that’s it.  The values for depth bias and normal offset have to be adjusted per light and depend on various factors, such as the light range, the shadow projection matrix, and to some extent the resolution of the shadow map, but when properly set the results can be quite nice and artifacts are almost entirely mitigated.


Bachelor Thesis Acknowledgment

I recently received an acknowledgement in Lukas Hermanns’ bachelor’s thesis entitled Screen Space Cone Tracing for Glossy Reflections, which I thought was really cool of him.  He’s produced some great results, and I’m happy to have lent a hand in the excellent work he’s done.

The full thesis can be found here:

Separable Subsurface Scattering

I’ve recently implemented Screen-Space Separable Subsurface Scattering into my rendering engine.  This implementation is based on the incredible work that’s been done over the past several years, and documented here and here.  I’m quite pleased with the results I’m getting from the effect and so am posting a few screenshots of it in action.

The first screenshot shows the effect in daylight.  Hopefully it’s quite obvious which head in the picture has the new technique applied and which is being lit with the engine’s standard lighting model.


The second screenshot shows another part of the overall effect, which is the transmittance of light through very thin slabs of materials, such as ears.


The next screenshot better shows both subsurface scattering and transmittance working together.  In particular, notice how the light behaves along the ridge of the nose.


Finally, I cobbled together a quick setup showing how this technique could be used to create a nice effect for candles.  In truth, I cheated a little in that I have not yet incorporated a wax kernel for the subsurface scattering technique, so instead I aimed a bright spotlight straight down at a cylinder using the same skin kernel as used in the above screenshots.  Even with such a simple (and quite lazy) setup, the result is still decent looking, and it bodes well for creating a proper candle in an actual scene using a correct SSS kernel.


Update:  I didn’t like that I had left the post at “think about how nice a candle could look”, so I went ahead and brought in a more wax-like kernel.  The setup is basically the same in that it’s just a cylinder with a light shining down on it, but now it definitely exhibits light interactions much more like an actual candle.