Screen Space Glossy Reflections

Introduction

Reflections are an important effect present in any routine attempting to approximate global illumination.  They give the user valuable spatial information about an object’s location, and they provide a strong visual indicator of the surface properties of certain materials.

For several years now, engineers and researchers in real-time graphics have worked towards improving reflections in their applications.  Simple implementations like cube maps used as reflection probes have been around for decades, while much newer techniques build upon their predecessors, such as parallax-corrected environment maps [4].

More recently, screen space ray tracing has become a widely used supplement to previously established methods of applying reflections to scenes.  The idea is simple enough – a view ray is reflected about a surface normal, then the resultant vector is stepped along until it intersects the depth buffer.  Once that location is found, the light buffer is sampled there and the value is added to the final lighting result.  The below image shows a comparison of a scene looking at a stack of boxes without and with screen space ray tracing enabled.

sslr_off_on_comparison
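
To make the idea concrete before moving on to the more robust method used later in this post, a naive version of the ray march can be sketched as follows.  This is a simplified, purely illustrative sketch: projectToUV, sampleSceneDepthVS, and sampleLightBuffer are hypothetical helpers standing in for the view-to-texture projection and the buffer fetches, and a fixed view-space step size is assumed.

// Purely illustrative sketch of the naive screen space reflection march
// described above. The helper functions are placeholders and are not part of
// the implementation presented later in this post.
float3 naiveScreenSpaceReflection(float3 positionVS, float3 normalVS)
{
 // the camera sits at the origin in view space, so the normalized position
 // is the view direction; reflect it about the surface normal
 float3 viewDirVS = normalize(positionVS);
 float3 reflectDirVS = reflect(viewDirVS, normalVS);

 const int numSteps = 64;
 const float stepSize = 0.1f; // view-space units per step
 float3 samplePosVS = positionVS;
 for(int i = 0; i < numSteps; ++i)
 {
 samplePosVS += reflectDirVS * stepSize;
 float2 sampleUV = projectToUV(samplePosVS); // placeholder: view space -> texture space
 if(any(sampleUV < 0.0f) || any(sampleUV > 1.0f))
 {
 break; // the ray left the screen - no information available
 }
 if(sampleSceneDepthVS(sampleUV) < samplePosVS.z)
 {
 // the ray passed behind the surface stored at this pixel - treat as a hit
 return sampleLightBuffer(sampleUV); // placeholder: fetch from the light buffer
 }
 }
 return 0.0f; // miss - a fallback technique should be used instead
}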

In practice, there are more than a few pitfalls to this approach that need special care to avoid.  The most obvious shortcoming of this and any other screen space effect is the limited information available.  If a ray doesn’t hit something before leaving the screen bounds, it will not return a result, even if its would-be collider is just barely off-screen.

This effect also tends to have a lot of trouble with rays facing back towards the viewer.  With a little thought, it makes sense that this presents an issue.  For one, if the ray reflects directly back at the viewer, it will never intersect the depth buffer, essentially degenerating into the off-screen case already discussed.  The other issue is similar, but perhaps less obvious.  If a ray travels back in the general direction of the viewer and does intersect the depth buffer, it is likely to do so on a face of an object that is facing away from the viewer.  This means that even if an intersection is reported, an incorrect result will be sampled from the light buffer at that position, which can lead to ugly artifacts such as streaks across surfaces.  The figure below shows a top-down view of a ray being cast from the viewer, hitting a mirrored surface, and finally making contact with the back of a box.  Since the back of the box is not visible from the viewer’s perspective, erroneous results will be returned if that hit is used.

reflect_back_of_box

There are ways to mitigate many of these artifacts, including fallback methods and fading that will be addressed later on.

Glossy Ray Traced Reflections

One further challenge with the generic approach described above is that if the result is used directly, only perfectly mirror-like reflections can be generated.  In the real world, most surfaces do not reflect light perfectly, but instead scatter, absorb, and reflect it in varying proportions due to microfacets [9].  To account for this, the technique needs to consider not only where the ray intersects the depth buffer, but also the roughness of the reflecting surface and the distance the ray has traveled.  The following image shows a comparison of mirror-like and glossy reflections.  Notice on the right half of the image that the further the ray has to travel to make contact, the blurrier the reflection becomes.

sslr_mirror_glossy_comparison

The rest of this post will revisit some of these issues as it discusses and provides a full implementation of ray tracing in screen space and creating glossy reflections via cone tracing.

Setting Up

The effects described in this post are implemented using DirectX 11 and HLSL.  That’s not at all to say those are mandatory for following along.  In fact, the ray tracing shader used below is a translation of one written in GLSL, which would use OpenGL as its graphics API.

This implementation was designed as part of a deferred shading pipeline.  The effect runs after geometry buffer generation and lighting have completed.  The ray tracing step needs access to the depth buffer and a buffer containing the normals of the geometry in view.  The blurring step needs access to the light buffer.  The cone tracing step needs access to all of the aforementioned buffers, as well as the resulting ray traced buffer and blurred light buffer, plus a buffer containing the specular values for the materials in view.  It is also beneficial to include a fallback buffer containing indirect specular contributions derived from methods such as parallax-corrected cube maps.  These will each be addressed as they are used in the implementation.

Therefore, the final list of buffers needed before starting the effect becomes:

  • Depth buffer – the implementation uses a non-linear depth buffer due to its ready availability after the geometry buffer is generated.  McGuire’s initial implementation [1] uses a linear depth buffer and may be more efficient.
  • Normal buffer – the geometry buffer used in this implementation stores all values in view space.  If the implementer stores their values in world space, they will need to be cognizant of the differences and prepared to apply appropriate transforms when necessary.
  • Light buffer – this is a buffer containing all lighting to be applied to the scene.  The exact values stored in this buffer will be refined further during implementation discussion.
  • Specular buffer – stored linearly as Fresnel reflectance at normal incidence (F(0°)) [5].  Some engines, such as Unreal Engine 4, have different workflows where this value may be hard-coded to around 0.04 for dielectrics and stored in the base color for metals.  The engine in use for this project is custom and stores the value directly.
  • Roughness buffer – this engine stores the roughness value in the w-component of the specular buffer, so it is readily available when the previous buffer is bound.
  • Fallback indirect specular buffer – this buffer contains specular lighting values calculated before the ray tracing step using less precise techniques such as parallax-corrected cube maps and environment probes to help alleviate jarring discontinuities between ray hits and misses.

The depth buffer used in this implementation has 32 bits for depth.  All buffers containing lighting data use 16-bit-per-channel floating point formats.

Also needed for this effect is a constant buffer containing values specific to the effect.  In the initial GLSL implementation these were passed as uniforms, but in HLSL we set up a constant buffer like so:

/**
 * The SSLRConstantBuffer.
 * Defines constants used to implement SSLR cone traced screen-space reflections.
 */
#ifndef CBSSLR_HLSLI
#define CBSSLR_HLSLI

cbuffer cbSSLR : register(b0)
{
 float2 cb_depthBufferSize; // dimensions of the z-buffer
 float cb_zThickness; // thickness to ascribe to each pixel in the depth buffer
 float cb_nearPlaneZ; // the camera's near z plane

 float cb_stride; // Step in horizontal or vertical pixels between samples. This is a float
 // because integer math is slow on GPUs, but should be set to an integer >= 1.
 float cb_maxSteps; // Maximum number of iterations. Higher gives better images but may be slow.
 float cb_maxDistance; // Maximum camera-space distance to trace before returning a miss.
 float cb_strideZCutoff; // More distant pixels are smaller in screen space. This value tells at what point to
 // start relaxing the stride to give higher quality reflections for objects far from
 // the camera.

 float cb_numMips; // the number of mip levels in the convolved color buffer
 float cb_fadeStart; // determines where to start screen edge fading of effect
 float cb_fadeEnd; // determines where to end screen edge fading of effect
 float cb_sslr_padding0; // padding for alignment
};

#endif

This constant buffer is contained in its own .hlsli file and included in the various steps where needed.  Most of the values map directly to uniform values in the GLSL implementation, and a few others will be discussed as they become pertinent.

Ray Tracing in Screen Space

The ray tracing portion of this technique is directly derived from Morgan McGuire and Mike Mara’s open source implementation of using the Digital Differential Analyzer (DDA) line algorithm to evenly distribute ray traced samples in screen space [1].  Their method handles perspective-correct interpolation of a 3D ray projected to screen space, and helps avoid over- and under-sampling issues present in traditional ray marches.  This helps more evenly distribute the limited number of samples that can be afforded per frame across the ray instead of skipping large portions at the start of the ray and bunching up samples towards the end.

McGuire and Mara’s initial implementation was presented in GLSL and assumed negative one (-1) to be the far plane Z value.  Below, the implementation has been converted to HLSL and uses positive one for the far plane.  The initial implementation also uses a linear depth buffer, though their accompanying paper provides source code for running the effect with a non-linear depth buffer.  The provided implementation assumes non-linear depth, and reconstructs linear Z values as they are sampled from the depth buffer using the methods described in [6].

// By Morgan McGuire and Michael Mara at Williams College 2014
// Released as open source under the BSD 2-Clause License
// http://opensource.org/licenses/BSD-2-Clause
//
// Copyright (c) 2014, Morgan McGuire and Michael Mara
// All rights reserved.
//
// From McGuire and Mara, Efficient GPU Screen-Space Ray Tracing,
// Journal of Computer Graphics Techniques, 2014
//
// This software is open source under the "BSD 2-clause license":
//
// Redistribution and use in source and binary forms, with or
// without modification, are permitted provided that the following
// conditions are met:
//
// 1. Redistributions of source code must retain the above
// copyright notice, this list of conditions and the following
// disclaimer.
//
// 2. Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following
// disclaimer in the documentation and/or other materials provided
// with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
// CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES,
// INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
// MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
// DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
// USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
// AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
// LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
// IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
// THE POSSIBILITY OF SUCH DAMAGE.
/**
 * The ray tracing step of the SSLR implementation.
 * Modified version of the work stated above.
 */
#include "SSLRConstantBuffer.hlsli"
#include "../../ConstantBuffers/PerFrame.hlsli"
#include "../../Utils/DepthUtils.hlsli"

Texture2D depthBuffer : register(t0);
Texture2D normalBuffer: register(t1);

struct VertexOut
{
 float4 posH : SV_POSITION;
 float3 viewRay : VIEWRAY;
 float2 tex : TEXCOORD;
};

float distanceSquared(float2 a, float2 b)
{
 a -= b;
 return dot(a, a);
}

bool intersectsDepthBuffer(float z, float minZ, float maxZ)
{
 /*
 * Based on how far away from the camera the depth is,
 * adding a bit of extra thickness can help improve some
 * artifacts. Driving this value up too high can cause
 * artifacts of its own.
 */
 float depthScale = min(1.0f, z * cb_strideZCutoff);
 z += cb_zThickness + lerp(0.0f, 2.0f, depthScale);
 return (maxZ >= z) && (minZ - cb_zThickness <= z);
}

void swap(inout float a, inout float b)
{
 float t = a;
 a = b;
 b = t;
}

float linearDepthTexelFetch(int2 hitPixel)
{
 // Load returns 0 for any value accessed out of bounds
 return linearizeDepth(depthBuffer.Load(int3(hitPixel, 0)).r);
}

// Returns true if the ray hit something
bool traceScreenSpaceRay(
 // Camera-space ray origin, which must be within the view volume
 float3 csOrig,
 // Unit length camera-space ray direction
 float3 csDir,
 // Number between 0 and 1 for how far to bump the ray in stride units
 // to conceal banding artifacts. Not needed if stride == 1.
 float jitter,
 // Pixel coordinates of the first intersection with the scene
 out float2 hitPixel,
 // Camera space location of the ray hit
 out float3 hitPoint)
{
 // Clip to the near plane
 float rayLength = ((csOrig.z + csDir.z * cb_maxDistance) < cb_nearPlaneZ) ?
 (cb_nearPlaneZ - csOrig.z) / csDir.z : cb_maxDistance;
 float3 csEndPoint = csOrig + csDir * rayLength;

 // Project into homogeneous clip space
 float4 H0 = mul(float4(csOrig, 1.0f), viewToTextureSpaceMatrix);
 H0.xy *= cb_depthBufferSize;
 float4 H1 = mul(float4(csEndPoint, 1.0f), viewToTextureSpaceMatrix);
 H1.xy *= cb_depthBufferSize;
 float k0 = 1.0f / H0.w;
 float k1 = 1.0f / H1.w;

 // The interpolated homogeneous version of the camera-space points
 float3 Q0 = csOrig * k0;
 float3 Q1 = csEndPoint * k1;

 // Screen-space endpoints
 float2 P0 = H0.xy * k0;
 float2 P1 = H1.xy * k1;

 // If the line is degenerate, make it cover at least one pixel
 // to avoid handling zero-pixel extent as a special case later
 P1 += (distanceSquared(P0, P1) < 0.0001f) ? float2(0.01f, 0.01f) : 0.0f;
 float2 delta = P1 - P0;

 // Permute so that the primary iteration is in x to collapse
 // all quadrant-specific DDA cases later
 bool permute = false;
 if(abs(delta.x) < abs(delta.y))
 {
 // This is a more-vertical line
 permute = true;
 delta = delta.yx;
 P0 = P0.yx;
 P1 = P1.yx;
 }

 float stepDir = sign(delta.x);
 float invdx = stepDir / delta.x;

 // Track the derivatives of Q and k
 float3 dQ = (Q1 - Q0) * invdx;
 float dk = (k1 - k0) * invdx;
 float2 dP = float2(stepDir, delta.y * invdx);

 // Scale derivatives by the desired pixel stride and then
 // offset the starting values by the jitter fraction
 float strideScale = 1.0f - min(1.0f, csOrig.z * cb_strideZCutoff);
 float stride = 1.0f + strideScale * cb_stride;
 dP *= stride;
 dQ *= stride;
 dk *= stride;

 P0 += dP * jitter;
 Q0 += dQ * jitter;
 k0 += dk * jitter;

 // Slide P from P0 to P1, (now-homogeneous) Q from Q0 to Q1, k from k0 to k1
 float4 PQk = float4(P0, Q0.z, k0);
 float4 dPQk = float4(dP, dQ.z, dk);
 float3 Q = Q0; 

 // Adjust end condition for iteration direction
 float end = P1.x * stepDir;

 float stepCount = 0.0f;
 float prevZMaxEstimate = csOrig.z;
 float rayZMin = prevZMaxEstimate;
 float rayZMax = prevZMaxEstimate;
 float sceneZMax = rayZMax + 100.0f;
 for(;
 ((PQk.x * stepDir) <= end) && (stepCount < cb_maxSteps) &&
 !intersectsDepthBuffer(sceneZMax, rayZMin, rayZMax) &&
 (sceneZMax != 0.0f);
 ++stepCount)
 {
 rayZMin = prevZMaxEstimate;
 rayZMax = (dPQk.z * 0.5f + PQk.z) / (dPQk.w * 0.5f + PQk.w);
 prevZMaxEstimate = rayZMax;
 if(rayZMin > rayZMax)
 {
 swap(rayZMin, rayZMax);
 }

 hitPixel = permute ? PQk.yx : PQk.xy;
 // You may need hitPixel.y = depthBufferSize.y - hitPixel.y; here if your vertical axis
 // is different than ours in screen space
 sceneZMax = linearDepthTexelFetch(int2(hitPixel));

 PQk += dPQk;
 }

 // Advance Q based on the number of steps
 Q.xy += dQ.xy * stepCount;
 hitPoint = Q * (1.0f / PQk.w);
 return intersectsDepthBuffer(sceneZMax, rayZMin, rayZMax);
}

float4 main(VertexOut pIn) : SV_TARGET
{
 int3 loadIndices = int3(pIn.posH.xy, 0);
 float3 normalVS = normalBuffer.Load(loadIndices).xyz;
 if(!any(normalVS))
 {
 return 0.0f;
 }

 float depth = depthBuffer.Load(loadIndices).r;
 float3 rayOriginVS = pIn.viewRay * linearizeDepth(depth);

 /*
 * Since position is reconstructed in view space, just normalize it to get the
 * vector from the eye to the position and then reflect that around the normal to
 * get the ray direction to trace.
 */
 float3 toPositionVS = normalize(rayOriginVS);
 float3 rayDirectionVS = normalize(reflect(toPositionVS, normalVS));

 // output rDotV to the alpha channel for use in determining how much to fade the ray
 float rDotV = dot(rayDirectionVS, toPositionVS);

 // out parameters
 float2 hitPixel = float2(0.0f, 0.0f);
 float3 hitPoint = float3(0.0f, 0.0f, 0.0f);

 float jitter = cb_stride > 1.0f ? float(int(pIn.posH.x + pIn.posH.y) & 1) * 0.5f : 0.0f;

 // perform ray tracing - true if hit found, false otherwise
 bool intersection = traceScreenSpaceRay(rayOriginVS, rayDirectionVS, jitter, hitPixel, hitPoint);

 depth = depthBuffer.Load(int3(hitPixel, 0)).r;

 // move hit pixel from pixel position to UVs
 hitPixel *= float2(texelWidth, texelHeight);
 if(hitPixel.x > 1.0f || hitPixel.x < 0.0f || hitPixel.y > 1.0f || hitPixel.y < 0.0f)
 {
 intersection = false;
 }

 return float4(hitPixel, depth, rDotV) * (intersection ? 1.0f : 0.0f);
}

The DepthUtils.hlsli header contains the linearizeDepth function that’s used to convert a perspective-z depth into a linear value.  The PerFrame.hlsli header contains several values that are set at the start of a frame and remain constant throughout.  Of particular interest are texelWidth and texelHeight, which contain the texel size for the client (1 / dimension).  We use these values to convert pixel positions from the trace result into UV coordinates for easy lookup in subsequent steps.
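
For reference, a linearizeDepth for a standard left-handed perspective projection can be written along the following lines.  This is a sketch under the assumption that the near and far plane distances are available from a constant buffer; the engine’s actual DepthUtils.hlsli is not shown in this post and may differ.

// Sketch: convert a non-linear, post-projection depth value back into linear
// view-space Z for a standard left-handed perspective projection.
// nearPlane and farPlane are assumed to come from a per-frame constant buffer.
float linearizeDepth(float depth)
{
 return (nearPlane * farPlane) / (farPlane - depth * (farPlane - nearPlane));
}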

An idea borrowed from Ben Hopkins (@kode80), who also open sourced his implementation of ray tracing based on McGuire’s initial work, is to use a cutoff value for the stride based on Z distance [2].  The idea is that, because perspective projection makes objects smaller in screen space as they move away from the viewer, the stride can be shortened for distant pixels and the ray will still likely find its contact point.  This gives distant locations higher quality reflections than if they used the same large stride as closer locations.  In the above implementation, this idea was extended by also adding additional thickness to objects as their distance from the viewer increases.  This resulted in fewer artifacts at shallow angles, where the rayZMin and rayZMax values would otherwise grow such that the sampled sceneZMax would fail the intersection test by small margins.

Another interesting idea from Hopkins’ implementation was to store the interpolated values and their step derivatives in float4 types.  The goal is to encourage the HLSL compiler to take advantage of SIMD operations, since the components are all updated by identical operations at the same time.  In practice, the output from the Visual Studio 2013 Graphics Debugger showed the bytecode was nearly identical between McGuire’s implementation and Hopkins’, but it was left in for being a cool idea.

The image below shows the results of the ray tracing step.  The buffer values include the UV coordinates of the ray hit in the x and y components, the depth in the z component, and the dot product of the view ray and the reflection ray in the w component.  The value stored in the w-component is used in the cone tracing step to fade rays facing towards the camera.  Black pixels mark areas where no intersection occurred.

sslr_ray_traced_buffer

Blurring the Light Buffer

The next step toward glossy reflections is to blur the light buffer.  Specifically, the light buffer is copied into the top-most mip level of a texture supporting a full mip chain, and from there the result is blurred down into its lower mip levels.  A separable 1-dimensional Gaussian blur is used.  The below implementation uses a 7-tap kernel, but the implementer should experiment to find a value appropriate for their particular needs.  First the blur is applied vertically to a temporary buffer, then the blur is applied horizontally to the next level down in the mip chain.  The following code listing shows a simple blur shader.  Note that to use it, two additional shaders are needed, each defining one of the pre-processor directives that specify directionality and then including the file below; an example wrapper is sketched after the listing.

/**
 * The Convolution shader body.
 * Requires either CONVOLVE_VERTICAL or CONVOLVE_HORIZONTAL
 * to be defined.
 */
#ifndef CONVOLUTIONPS_HLSLI
#define CONVOLUTIONPS_HLSLI

#include "SSLRConstantBuffer.hlsli"

struct VertexOut
{
 float4 posH : SV_POSITION;
 float2 tex : TEXCOORD;
};

Texture2D colorBuffer : register(t1);

#if CONVOLVE_HORIZONTAL
static const int2 offsets[7] = {{-3, 0}, {-2, 0}, {-1, 0}, {0, 0}, {1, 0}, {2, 0}, {3, 0}};
#elif CONVOLVE_VERTICAL
static const int2 offsets[7] = {{0, -3}, {0, -2}, {0, -1}, {0, 0}, {0, 1}, {0, 2}, {0, 3}};
#endif
static const float weights[7] = {0.001f, 0.028f, 0.233f, 0.474f, 0.233f, 0.028f, 0.001f};

float4 main(VertexOut pIn): SV_Target0
{
 float2 uvs = pIn.tex * cb_depthBufferSize; // make sure to send in the SRV's dimensions for cb_depthBufferSize
 // sample level zero since only one mip level is available with the bound SRV
 int3 loadPos = int3(uvs, 0);

 float4 color = float4(0.0f, 0.0f, 0.0f, 1.0f);
 [unroll]
 for(uint i = 0u; i < 7u; ++i)
 {
 color += colorBuffer.Load(loadPos, offsets[i]) * weights[i];
 }
 return float4(color.rgb, 1.0f);
}

#endif
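
For completeness, one of the two wrapper shaders mentioned above might look like the following.  The file name ConvolutionPS.hlsli is an assumption based on the include guard; the vertical wrapper is identical except that it defines CONVOLVE_VERTICAL instead.

/**
 * Horizontal convolution pass.
 * Defines the blur direction, then pulls in the shared shader body above.
 */
#define CONVOLVE_HORIZONTAL 1
#include "ConvolutionPS.hlsli"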

During the blur passes, the constant buffer value that stores the depth buffer size for the rest of the effect is re-purposed to recover the load positions for fetches from the bound texture (i.e., it holds the dimensions of the mip level currently being read).  After all blur passes complete, it should be reset to the correct dimensions before proceeding.

Cone Tracing

At this point in the effect, the ray traced buffer is complete and the full mip chain of the light buffer has been generated.  The idea in this section comes from Yasin Uludag’s article in GPU Pro 5 [3].

It was mentioned earlier in the post that for glossy reflections to be represented, both the surface roughness and the distance traveled from the reflecting point to its point of contact needed to be accounted for.  Whereas a perfect mirror would cast a straight line outwards from the origin point, a rougher surface would cast a cone shape.  The figure below shows a representation of this phenomenon (albeit a bit crudely).

sslr_ray_vs_cone_comparison

With these observations made, it can further be noted that in screen space a 3-dimensional cone projects into a 2-dimensional isosceles triangle.  Knowing the locations of the starting point and the ray’s end point tells us how far the ray has traveled in screen space.  With the roughness value for the current surface available by sampling the appropriate texture, everything needed to move forward is on hand.

The steps for cone tracing are as follows.

  1. The adjacent length of the isosceles triangle is found by finding the magnitude of the vector from the origin position to the ray hit position.
  2. The sampled roughness is converted into a specular power.
  3. The specular power is then used to calculate the cone angle (theta) for the isosceles triangle.
  4. The opposite length of the triangle is found by dividing the cone angle in half and finding the opposite side of a right triangle using basic trigonometry, specifically that tan(theta) = oppositeLength/adjacentLength, which is equivalently represented as oppositeLength = tan(theta) * adjacentLength.
  5. The result is then doubled to recover the full length.
  6. The radius of a circle inscribed in the triangle is found using the formula found at [7] for isosceles triangles.  This is used to determine the sample position and the mip level from which to sample.
  7. The color is sampled and weighted based on surface roughness.
  8. Steps 2-7 are repeated several times until the resulting alpha reaches 1, or the loop hits its iteration limit.  During each iteration, the triangle’s adjacent length is shortened by the previously calculated radius, then each value is recomputed for the new triangle.

Step 7 in particular differs from Uludag’s implementation where he builds out an entire visibility buffer that is used to help diminish contributions from sampled pixels that should not be included as part of the result.  For most cases, the results tend to be good enough with this simplified approach, and the cost saved from not creating the visibility buffer and the hierarchical z-buffer from Uludag’s article can be re-assigned to further refinements or other effects.

The formula for finding the radius of the incircle of an isosceles triangle is given below, where a represents the opposite (base) length of the triangle and h represents the adjacent length (the triangle’s height).  It comes from [7] and matches the isoscelesTriangleInRadius function in the shader below:

r = a · (√(a² + 4h²) − a) / (4h)
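
As a purely illustrative example of how the numbers work out: suppose the half-angle (the coneTheta value in the shader below) comes to roughly 10° and the adjacent length is 100 pixels.  The opposite length is then 2 · tan(10°) · 100 ≈ 35.3 pixels, the incircle radius is about 14.8 pixels, the sample is taken 100 − 14.8 ≈ 85.2 pixels along the ray from the reflecting pixel, and log2(14.8) ≈ 3.9 selects roughly the fourth mip of the convolved color buffer.  In the shader the positions are actually kept in UV space, so the radius is scaled by the buffer dimensions before taking the logarithm, but the idea is the same.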

Once the cone traced color is found, it’s modulated by the calculated Fresnel term using the values from the specular buffer, a normalized vector pointing from the surface location back towards the viewer, and the surface normal.  Finally, several fading steps are applied to make areas where the ray tracing step failed to find an intersection less pronounced.  The results of this step are added back to the original light buffer and the process is complete.

The below shader code demonstrates this process.

#include "SSLRConstantBuffer.hlsli"
#include "../../LightingModel/PBL/LightUtils.hlsli"
#include "../../ConstantBuffers/PerFrame.hlsli"
#include "../../Utils/DepthUtils.hlsli"
#include "../../ShaderConstants.hlsli"

struct VertexOut
{
 float4 posH : SV_POSITION;
 float3 viewRay : VIEWRAY;
 float2 tex : TEXCOORD;
};

SamplerState sampTrilinearClamp : register(s1);

Texture2D depthBuffer : register(t0); // scene depth buffer used in ray tracing step
Texture2D colorBuffer : register(t1); // convolved color buffer - all mip levels
Texture2D rayTracingBuffer : register(t2); // ray-tracing buffer
Texture2D normalBuffer : register(t3); // normal buffer - from g-buffer
Texture2D specularBuffer : register(t4); // specular buffer - from g-buffer (rgb = ior, a = roughness)
Texture2D indirectSpecularBuffer : register(t5); // indirect specular light buffer used for fallback

///////////////////////////////////////////////////////////////////////////////////////
// Cone tracing methods
///////////////////////////////////////////////////////////////////////////////////////

float specularPowerToConeAngle(float specularPower)
{
 // based on phong distribution model
 if(specularPower >= exp2(CNST_MAX_SPECULAR_EXP))
 {
 return 0.0f;
 }
 const float xi = 0.244f;
 float exponent = 1.0f / (specularPower + 1.0f);
 return acos(pow(xi, exponent));
}

float isoscelesTriangleOpposite(float adjacentLength, float coneTheta)
{
 // simple trig and algebra - soh, cah, toa - tan(theta) = opp/adj, opp = tan(theta) * adj, then multiply * 2.0f for isosceles triangle base
 return 2.0f * tan(coneTheta) * adjacentLength;
}

float isoscelesTriangleInRadius(float a, float h)
{
 float a2 = a * a;
 float fh2 = 4.0f * h * h;
 return (a * (sqrt(a2 + fh2) - a)) / (4.0f * h);
}

float4 coneSampleWeightedColor(float2 samplePos, float mipChannel, float gloss)
{
 float3 sampleColor = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).rgb;
 return float4(sampleColor * gloss, gloss);
}

float isoscelesTriangleNextAdjacent(float adjacentLength, float incircleRadius)
{
 // subtract the diameter of the incircle to get the adjacent side of the next level on the cone
 return adjacentLength - (incircleRadius * 2.0f);
}

///////////////////////////////////////////////////////////////////////////////////////

float4 main(VertexOut pIn) : SV_TARGET
{
 int3 loadIndices = int3(pIn.posH.xy, 0);
 // get screen-space ray intersection point
 float4 raySS = rayTracingBuffer.Load(loadIndices).xyzw;
 float3 fallbackColor = indirectSpecularBuffer.Load(loadIndices).rgb;
 if(raySS.w <= 0.0f) // either means no hit or the ray faces back towards the camera
 {
 // no data for this point - a fallback like localized environment maps should be used
 return float4(fallbackColor, 1.0f);
 }
 float depth = depthBuffer.Load(loadIndices).r;
 float3 positionSS = float3(pIn.tex, depth);
 float linearDepth = linearizeDepth(depth);
 float3 positionVS = pIn.viewRay * linearDepth;
 // since calculations are in view-space, we can just normalize the position to point at it
 float3 toPositionVS = normalize(positionVS);
 float3 normalVS = normalBuffer.Load(loadIndices).rgb;

 // get specular power from roughness
 float4 specularAll = specularBuffer.Load(loadIndices);
 float gloss = 1.0f - specularAll.a;
 float specularPower = roughnessToSpecularPower(specularAll.a);

 // convert to cone angle (maximum extent of the specular lobe aperture)
 // only want half the full cone angle since we're slicing the isosceles triangle in half to get a right triangle
 float coneTheta = specularPowerToConeAngle(specularPower) * 0.5f;

 // P1 = positionSS, P2 = raySS, adjacent length = ||P2 - P1||
 float2 deltaP = raySS.xy - positionSS.xy;
 float adjacentLength = length(deltaP);
 float2 adjacentUnit = normalize(deltaP);

 float4 totalColor = float4(0.0f, 0.0f, 0.0f, 0.0f);
 float remainingAlpha = 1.0f;
 float maxMipLevel = (float)cb_numMips - 1.0f;
 float glossMult = gloss;
 // cone-tracing using an isosceles triangle to approximate a cone in screen space
 for(int i = 0; i < 14; ++i)
 {
 // intersection length is the adjacent side, get the opposite side using trig
 float oppositeLength = isoscelesTriangleOpposite(adjacentLength, coneTheta);

 // calculate in-radius of the isosceles triangle
 float incircleSize = isoscelesTriangleInRadius(oppositeLength, adjacentLength);

 // get the sample position in screen space
 float2 samplePos = positionSS.xy + adjacentUnit * (adjacentLength - incircleSize);

 // convert the in-radius into screen size then check what power N to raise 2 to reach it - that power N becomes mip level to sample from
 float mipChannel = clamp(log2(incircleSize * max(cb_depthBufferSize.x, cb_depthBufferSize.y)), 0.0f, maxMipLevel);

 /*
 * Read color and accumulate it using trilinear filtering and weight it.
 * Uses pre-convolved image (color buffer) and glossiness to weigh color contributions.
 * Visibility is accumulated in the alpha channel. Break if visibility is 100% or greater (>= 1.0f).
 */
 float4 newColor = coneSampleWeightedColor(samplePos, mipChannel, glossMult);

 remainingAlpha -= newColor.a;
 if(remainingAlpha < 0.0f)
 {
 newColor.rgb *= (1.0f - abs(remainingAlpha));
 }
 totalColor += newColor;

 if(totalColor.a >= 1.0f)
 {
 break;
 }

 adjacentLength = isoscelesTriangleNextAdjacent(adjacentLength, incircleSize);
 glossMult *= gloss;
 }

 float3 toEye = -toPositionVS;
 float3 specular = calculateFresnelTerm(specularAll.rgb, abs(dot(normalVS, toEye))) * CNST_1DIVPI;

 // fade rays close to screen edge
 float2 boundary = abs(raySS.xy - float2(0.5f, 0.5f)) * 2.0f;
 const float fadeDiffRcp = 1.0f / (cb_fadeEnd - cb_fadeStart);
 float fadeOnBorder = 1.0f - saturate((boundary.x - cb_fadeStart) * fadeDiffRcp);
 fadeOnBorder *= 1.0f - saturate((boundary.y - cb_fadeStart) * fadeDiffRcp);
 fadeOnBorder = smoothstep(0.0f, 1.0f, fadeOnBorder);
 float3 rayHitPositionVS = viewSpacePositionFromDepth(raySS.xy, raySS.z);
 float fadeOnDistance = 1.0f - saturate(distance(rayHitPositionVS, positionVS) / cb_maxDistance);
 // ray tracing steps stores rdotv in w component - always > 0 due to check at start of this method
 float fadeOnPerpendicular = saturate(lerp(0.0f, 1.0f, saturate(raySS.w * 4.0f)));
 float fadeOnRoughness = saturate(lerp(0.0f, 1.0f, gloss * 4.0f));
 float totalFade = fadeOnBorder * fadeOnDistance * fadeOnPerpendicular * fadeOnRoughness * (1.0f - saturate(remainingAlpha));

 return float4(lerp(fallbackColor, totalColor.rgb * specular, totalFade), 1.0f);
}

The following image roughly illustrates the process.  From top to bottom, the floor of the image starts off perfectly mirror-like and gradually becomes rougher.  The red lines indicate the cones.  The circles inscribed in them show how the radii are used for mip selection (i.e., the larger the circle, the further down the mip chain), and the center of each circle is where the sample would be taken.  Notice that for a perfectly mirror-like surface, the cone diminishes to a straight line.

sslr_cone_width_comparison

Bringing It All Together

It was mentioned earlier that a fallback technique is useful for any screen space reflection technique.  This implementation uses parallax-corrected cube maps based on Lagarde’s post [4], with a further fallback to generic, non-corrected cube maps as a last resort.  These values are all computed before the screen space reflection technique starts and are accessed above in the cone tracing step through the “indirectSpecularBuffer” resource.  While fallback methods won’t be as exact as ray traced results, properly set up cube maps can certainly help alleviate jarring artifacts.  The image below shows a comparison of two sections of the same scene.  The left half of the image does not have good cube map placement and the missing reflection data is quite noticeable under the sphere.  The right half includes blended parallax-corrected cube maps and incurs a much less severe penalty for missed rays.

sslr_fallback_bad_good_comparison
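
For readers unfamiliar with the parallax correction mentioned above, its core can be sketched as follows.  This is only a sketch of the general approach from [4]: the proxy box extents, the cube map capture position, and the world-space inputs are assumptions, and the engine’s actual fallback pass is not shown in this post.

// Sketch: intersect the reflection ray with the proxy box surrounding the
// environment, then build the lookup vector from the cube map's capture
// position to that intersection point instead of using the raw reflection
// direction. All parameters are illustrative.
float3 parallaxCorrectedLookup(float3 positionWS, float3 reflectDirWS,
 float3 boxMin, float3 boxMax, float3 cubeMapPositionWS)
{
 // distances along the ray to each pair of box planes
 float3 firstPlane = (boxMax - positionWS) / reflectDirWS;
 float3 secondPlane = (boxMin - positionWS) / reflectDirWS;
 float3 furthestPlane = max(firstPlane, secondPlane);

 // closest exit point of the ray from the box
 float dist = min(min(furthestPlane.x, furthestPlane.y), furthestPlane.z);
 float3 intersectionWS = positionWS + reflectDirWS * dist;

 // the corrected vector to use when sampling the cube map
 return intersectionWS - cubeMapPositionWS;
}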

Another artifact of inadequate fallback techniques can also be seen in the left image above.  As the traced ray nears the edge of the screen, its result is faded out; the code for this is towards the bottom of the cone tracing shader.  Without a decent fallback technique in place, the differences between the center of the screen and the edges can be quite drastic.  The right half of the image shows such fading only to a very minor degree, most noticeably on the left edge of the picture.

Due to the numerous issues mentioned towards the start of the post, rays facing back towards the viewer are disallowed entirely.  This is an implementation choice and by no means a requirement.  Implementers should experiment with their own scenes and determine whether backwards-traveling rays provide acceptable results for the use cases specific to their application.  In the implementation above, ray results fade as they become perpendicular to the view so as not to cause a sharp cutoff at any one point.

A final nicety added to this implementation is that the indirect specular buffer is actually part of the light buffer during the initial convolution and is subtracted back out before applying the cone tracing pass.  This allows metals to be reflected more appropriately in the cone traced step.  In the image below, the left half does not take these steps into consideration and the metal’s reflection is black.  The specular highlight shows up in the reflection since it is contributed by direct lighting, the sun in this case, but none of the indirect light is included.  In the right half of the image, these steps are enabled and the sky is observable in the reflected sphere.
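
A minimal sketch of that subtraction step might look like the following; the pass layout and resource names are assumptions, since the engine code that performs it is not shown in this post.

// Sketch: remove the indirect specular contribution from the light buffer
// before the cone tracing pass, since it was folded into the light buffer
// prior to convolution and would otherwise be counted twice.
Texture2D lightBuffer : register(t0);
Texture2D indirectSpecularBuffer : register(t1);

float4 main(float4 posH : SV_POSITION) : SV_TARGET
{
 int3 loadIndices = int3(posH.xy, 0);
 float3 light = lightBuffer.Load(loadIndices).rgb;
 float3 indirectSpecular = indirectSpecularBuffer.Load(loadIndices).rgb;
 // clamp to zero in case of small precision differences
 return float4(max(light - indirectSpecular, 0.0f), 1.0f);
}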

sslr_metal_reflections_comparison

The U-shape on the bottom of each sphere is due to not having good fallback techniques in this area of the scene, and can be alleviated as discussed previously.

Areas of Improvement

The biggest area of needed improvement with this technique in its current state is the blur.  The current separable Gaussian blur, while fast, can lead to reflections being blurred onto parts of the scene where they don’t belong.  A feature-aware blur similar to a bilateral blur is likely a better candidate here and will be followed up on in a separate post once a better method is determined.  Specifically, the blur will likely need to account for large depth discrepancies and reject samples that do not fall within a specified threshold.  It should be noted that battling these types of artifacts is a potential strength of Uludag’s proposed visibility buffer.
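
As a rough illustration of that direction, the accumulation in the convolution shader could be replaced with something along these lines.  This is only a sketch: the depth buffer binding and the rejection threshold are assumptions, and the mismatch between the full-resolution depth buffer and the smaller mip levels being blurred is exactly the kind of detail the follow-up post needs to resolve.

// Sketch: depth-aware accumulation that rejects samples across large depth
// discontinuities so reflections don't bleed onto unrelated geometry.
// Assumes a depth buffer is bound alongside the color buffer during the blur.
Texture2D blurDepthBuffer : register(t0);
static const float depthRejectionThreshold = 0.5f; // view-space units, illustrative only

float4 depthAwareBlur(int3 loadPos, int2 offsets[7], float weights[7])
{
 float centerDepth = linearizeDepth(blurDepthBuffer.Load(loadPos).r);
 float4 color = 0.0f;
 float totalWeight = 0.0f;
 [unroll]
 for(uint i = 0u; i < 7u; ++i)
 {
 float sampleDepth = linearizeDepth(blurDepthBuffer.Load(loadPos, offsets[i]).r);
 if(abs(sampleDepth - centerDepth) < depthRejectionThreshold)
 {
 color += colorBuffer.Load(loadPos, offsets[i]) * weights[i];
 totalWeight += weights[i];
 }
 }
 // renormalize so rejected taps don't darken the result
 return float4(color.rgb / max(totalWeight, 0.0001f), 1.0f);
}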

The blur can also be sped up while still obtaining the same results by using the approach found at [10].  This is slated as future work for the current effect, and will likely be included in the same follow up post that revisits a better overall blurring solution.

While testing storage for the blurred results, a Texture2DArray was also tried out.  While this means of storage improved the overall perceived smoothness of the blur over varying roughness values, the memory requirements and increased time to run the blur several times over the full textures were simply not worth the small improvements.  The mip-chained texture provides decent results and blends adequately with trilinear sampling.  While testing values for various kernel sizes and sigmas, the calculator at [11] was extremely helpful for quick iteration.

One further improvement that can be made to the blurred result using the current implementation is to sample several points within the inscribed circle instead of just the center and blend all the results together.  The trade-off for sampling multiple points in this fashion is between performance and quality.  This technique is demonstrated in [8] on page 3 of the conversation.
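
A rough sketch of that idea, reusing the colorBuffer and sampTrilinearClamp resources already bound in the cone tracing shader, is shown below.  The tap pattern and count are arbitrary choices for illustration; incircleSize is expected in UV units, matching the shader above.

// Sketch: average a few taps inside the inscribed circle instead of sampling
// only its center. This trades extra texture fetches for a smoother result.
static const float2 circleOffsets[4] =
{
 float2( 0.5f, 0.0f), float2(-0.5f, 0.0f),
 float2( 0.0f, 0.5f), float2( 0.0f, -0.5f)
};

float4 coneSampleWeightedColorMultiTap(float2 samplePos, float incircleSize, float mipChannel, float gloss)
{
 float3 sampleColor = colorBuffer.SampleLevel(sampTrilinearClamp, samplePos, mipChannel).rgb;
 [unroll]
 for(uint i = 0u; i < 4u; ++i)
 {
 float2 tapPos = samplePos + circleOffsets[i] * incircleSize;
 sampleColor += colorBuffer.SampleLevel(sampTrilinearClamp, tapPos, mipChannel).rgb;
 }
 return float4((sampleColor / 5.0f) * gloss, gloss);
}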

Another area of improvement for this technique would be to update the reflection model to better match the lighting model used in the rest of the engine’s rendering pipeline.  As mentioned previously, the above implementation for the cone-tracing step is based off Uludag’s explanation provided in [3].  In its current state, the effect uses an approximation of the Phong model, while the rest of the pipeline uses GGX for its specular distribution term.  Uludag does offer suggestions in his article on how to adapt to other reflection models, and this will likely be the topic of a future post once implemented.

Furthermore, using more efficiently packed buffers for lighting data could prove to be a performance improvement for this technique.  As mentioned above, all buffers containing lighting data are 64-bit floating point buffers with 16 bits of precision in each channel.  Future experimentation with a more efficient 32-bit floating point buffer such as DirectX’s DXGI_FORMAT_R11G11B10_FLOAT should be considered.

Results

This section contains images generated using the techniques described above.  Each image is comprised of a few smaller images showing increasing roughness in the floor material.

The first image shows the effect working on a large scale in an area of the scene spanning over 100 meters.

sslr_result_1

The second image shows the effect working in a more localized setting at ground level, similar to how a user would perceive the world in a first-person game or application.  The area uses parallax-corrected cube maps as a fallback technique, and missed ray intersections, such as those that would likely occur around concave objects (the soldier in this case), are very well-blended.

sslr_result_2

The third image again shows the effect in a localized setting.  The later time of day creates a steeper contrast between shadowed and un-shadowed areas causing the effect to be more pronounced and better showing how a rougher surface will blur and even start to pull the reflection vertically.

sslr_result_3

The fourth image again uses a steeper lighting contrast to help demonstrate how the effect applies as the floor material changes from very smooth to very rough.

sslr_result_4

The following videos show the effect running in a real-time interactive application.  For best viewing, it is recommended to either run the videos in full-screen with high-definition enabled, or visit their respective YouTube pages by following these links:  Video 1  Video 2.

Conclusion

This post has presented a full implementation of a solution for glossy screen space reflections.  While the abundance of programmer art and MS Paint images may not be quite as fantastical as those rendered using a proper studio’s asset collection, the contributions of the effect to the final result should be clear.  Even with a basic reflection model, the technique serves to add more realism to a scene and provides a means for believable real-time reflections for rough surfaces.

Acknowledgements

I first came into contact with Bruce Wilkie about a year ago when he posted a topic on gamedev.net.  We were both working on implementing Yasin Uludag’s article from GPU Pro 5 [3].  We spoke a few times on the subject, and it became abundantly clear that he was much more knowledgeable on the matter than me.  He was critical in helping me understand and figure out Uludag’s use of the hierarchical Z-buffer for ray tracing and work the kinks out of my initial attempts at implementing it [8].  Bruce was kind enough to offer that we keep in touch and that I could ask him questions around issues I might have while implementing different features in my engine, which I work on as a hobby in my spare time.  I’ve certainly taken advantage of that offer over the course of the year, and he’s offered various ranges of advice on almost everything graphics-related that’s been posted to this blog to date.  He has shown a great deal of patience in helping clarify certain concepts to me, and has a knack for explaining how to arrive at a solution without simply giving the answer away – an extremely valuable teaching technique.  He also brought the idea of the more efficient blur using [10] to my attention as a solid alternative to the standard approach used above, as well as offered a few more suggestions for improvement over the first draft of this post.

Thank you, Bruce.

I would also like to thank Morgan McGuire (@morgan3d) and Mike Mara for open-sourcing and generously licensing their DDA-based ray tracing code.  A thank you also goes to Ben Hopkins (@kode80) for doing the same with his implementation.

References

[1] Morgan McGuire and Mike Mara.  http://casual-effects.blogspot.com/2014/08/screen-space-ray-tracing.html

[2] Ben Hopkins.  http://www.kode80.com/blog/2015/03/11/screen-space-reflections-in-unity-5/

[3] Yasin Uludag.  GPU Pro 5.  Hi-Z Screen-Space Cone-Traced Reflections.

[4] Sébastien Lagarde.  https://seblagarde.wordpress.com/2012/09/29/image-based-lighting-approaches-and-parallax-corrected-cubemap/

[5] Sébastien Lagarde.  https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/

[6] Matt Pettineo.  https://mynameismjp.wordpress.com/2010/09/05/position-from-depth-3/

[7] Weisstein, Eric W. “Inradius.” From MathWorld–A Wolfram Web Resource. http://mathworld.wolfram.com/Inradius.html

[8] https://www.gamedev.net/topic/658702-help-with-gpu-pro-5-hi-z-screen-space-reflections/

[9] https://en.wikipedia.org/wiki/Specular_highlight

[10] http://rastergrid.com/blog/2010/09/efficient-gaussian-blur-with-linear-sampling/

[11] http://dev.theomader.com/gaussian-kernel-calculator/

16 thoughts on “Screen Space Glossy Reflections”

  1. Great work. My only complaint is that this method leaves hard edges at depth discontinuities. For instance, that checkerboard pattern gets blurred, but the edge of the wall is hard edged.

    That’s the main reason I have not switched to cone tracing. Instead I SSR the screen, grabbing environment probes where there is no hit, then blur that ‘reflection buffer’ with a variable tap filter depending on the roughness of the pixel where the ray started. Unfortunately this results in a consistent blur, disregarding distance to the reflected pixel, but at least that matches the environment probes.
    If I had a way to determine distance from environment probe lookups, I could scale the filter width by distance to simulate something similar, but it would not be correct like yours is.

  2. Hey Will,

    I recently ran into this technique and found your discussion with a couple of fellows over at GameDev.net essentially trying to decrypt the GPU Pro 5 author’s work since he couldn’t release the source code. I was wondering why you decided to go away from the HiZ Buffer tracing approach, as your results looked great. I read Lukas’ bachelor’s thesis and it looked like he developed a way for fallback cases of rays pointing towards the camera, as it essentially defaults to a regular ray marching approach as outlined here.

    Anyways, just curious about why you decided to nix that implementation and go with McGuire’s implementation with no accelerated data structure.

    Thanks!
    Matt

    1. Hi Matt,

      My engine actually supports both approaches, and lets the user decide through a config file or an in-app menu which one to use, or to disable SSLR completely. I chose to use the DDA approach for the article mainly because I find it to be a little more straightforward and easier to wrap one’s head around, and also since there’s already a pretty full implementation of the HiZ approach on the gamedev.net thread for anyone willing to sift through it. :)

      Both effects can provide really good results when they’re working well. The main advantage of the HiZ approach is that the ray trace tends to resolve faster since it takes larger strides as it descends mip levels. Rarely in debugging did I ever see more than about 16-25 iterations being used before a result was found. On the other hand, it also tends to miss small depth discrepancies and can leave artifacts or holes in the ray traced buffer. The main advantage with the DDA approach is that the ray traced result tends to be higher quality with fewer misses, as long as the stride isn’t set too high. The downside is that it can be slower than the HiZ approach, especially if the step count is high, which it may need to be if you intend to cover a full frame. Of course, none of that is to say that HiZ doesn’t create good ray trace results, or that DDA is slow – it’s just how they typically tend to compare to one another. Doing a little post-processing on the HiZ ray trace result to fill in tiny misses (push/pull, etc.) can help improve the results from that trace, just as choosing a proper step count and stride can, sometimes drastically, improve the performance of the DDA approach.

      Let me know if you have any questions.

      Thanks,

      Will

    1. Hi,

      I’m currently nearing the end of porting my engine to DirectX 12, and as such a few things are still in a bit of disarray. Hopefully in the coming weeks I get the last bits updated and reach feature parity with my DX 11 engine and at that point I’d be happy to upload a small executable for folks to try out. The shaders stay largely the same as above, but I understand why seeing it in action for yourself beats still images and the videos in the article. I’ll probably just create a new post once the demo is available, so if you have something like Feedly or any other blog tracker, feel free to add this one. I’ll try to not to keep you waiting too long :).

      Thanks,
      Will

  3. Hi, and thanks for the great post!

    I was wondering, if you could enlighten me about one detail about the McGuire’s ray tracing implementation. I have tried to get the SSR working for a while now, but this one detail seems to make it quite too hard for my limited mind :P

    The thing that is bugging me, is the computation of Q0.z, Q1.z and then dQ.z. If I have understood correctly, the projection of csOrigin and csEndPoint results in H0.w = csOrigin.z and H1.w = csEndPoint.z. Now when e.g. Q0 gets calculated, it is essentially set to (csOrig.x, csOrig.y, csOrig.z)/csOrig.z
    so the Q0.z is 1 and similarly the Q1.z is 1. This leads to dQ.z being zero which cannot be right.

    I am probably making just some silly mistake, but for some reason cannot see what it is. Any help would make me very happy.

    Thanks!
    Hank

    1. Hi,
      (csOrig.x, csOrign.y, csOrig.z) / csOrig.z doesn’t look right to me. You’re actually dividing them by the w-component of their respective projected coordinates. The only time the value should be 1 is if the camera space coordinate is on the far plane. The values of H0.w and H1.w will vary based on depth. Dividing the other components by the w-component is just finishing the projection, something akin to what you do with shadow mapping or projective texture mapping. Does that help at all?
      Thanks,
      Will

      1. Hi, and thanks for the quick reply.

        (Math notation: vectors are column vectors, matrices consist of row vectors)
        The thing is that when calculating v’ = P * v, where P is a perspective projection matrix, and v is vector (x, y, z, 1), this leads to v’ = (x’, y’, z’, z). Isn’t this correct? So when H0 is calculated as P * O, H0.w ends up being Oz. Then afterwards we divide O by Oz to get Q0.

        This surely doesn’t seem right, but that is how I see the situation. But maybe I just don’t know how the matrix P (viewToTextureSpace) should be derived and am making false assumptions here. So if you could clarify that, maybe I could understand the situation a bit better.

        Thanks,
        Hank

        1. Hey, okay I think I’m picking up what you’re putting down :). I misinterpreted a bit in your first post, and get what you’re saying now.
          Yes, that’s a good catch! In fact, I just now tried running the same code with PQk.z hard-coded to 1, dPQk.z hard-coded to 0, and numerator of the rayZMax term hard-coded to 1 in the loop and get the same results. I’m going to experiment with it a little more and try to figure out why McGuire included the term in his implementation since it doesn’t really seem necessary, but it’s definitely a potential optimization if it turns out to not be needed. If I find anything that refutes what you’re saying and requires the z-term to be calculated as shown, I’ll keep you updated. Nice find!
          Thanks,
          Will

          1. Hi,

            Thanks for confirming my suspicion about the Q.z-term. Great to know that I might not be losing my mind after all :D

            This means that the error I have in my implementation isn’t related to the Q.z. Still trying to figure out why I don’t have proper reflections. After your post I made some progress though (freed me to consider other places for errors in my implementation). Now I have some “reflections”, but I need to crank up the stride to have them show at all.

            Thanks,
            Hank

    1. Hi,
      The viewToTextureSpaceMatrix is just a matrix that transforms a coordinate from view space all the way through to texture space. It’s just a concatenation of the current projection matrix and the texture transform matrix:

      0.5f, 0.0f, 0.0f, 0.5f,
      0.0f, -0.5f, 0.0f, 0.5f,
      0.0f, 0.0f, 1.0f, 0.0f,
      0.0f, 0.0f, 0.0f, 1.0f

  4. I’m developing a Dirt 3 mod. I’ve already rewritten about 200 shaders and I’m very interested in adding your ssReflections, but I’m having troubles in making it to work. I have viewspacePosition, viewspaceReflections and clipspacePosition from a vertex shader, but I do not know how it relates to the input vectors of your shader. Would you shed a little light on the subject please. I could share my recent try in 3DMigoto shader hacker format If you would be interested in helping to make it the ssReflections possible in Dirt 3.

  5. If you don’t mind clarifying: so with the cone tracing, judging by the AdjacentUnit being just the normalized vector from reflecting fragment to ray-intersected fragment, the whole get-down is just a sort of multisampling along the ray until it reaches the intersection point, with an increasing miplevel? Of course, this is in accordance with the roughness/gloss and Fresnel as they correspond to the screen-space cone geometry.

    I see the 14-iteration loop, does that mean it can sample beyond the intersection point (up the screen further in the case of a reflective ground-plane)?

    I’m curious as to how the earlier samples along the cone don’t become the majority of the full sample produced by the cone trace, if they are weighted more with glossMult being at its maximal value in the first iterations while it multiplies with gloss further down the line as it approaches the actual ray intersection point.

    What am I missing?
