We needed to add screen space reflections to our project with the following requirements:
- Compatibility with WebGL
- Support for rough surfaces
- The execution time should not exceed 2ms at Full HD resolution on devices equivalent to RTX 2070.
We used AMD's implementation of Screen Space Reflections as the basis for ours [AMD-SSSR]. We recommend reading AMD's documentation, as well as the more detailed review of the algorithm by Kostas Anagnostou [Kostas Anagnostou, SSSR], to better understand the following sections. Since WebGL doesn't support compute shaders, we had to make some compromises for compatibility. Please refer to the implementation details section for further insights.
The following table enumerates all external inputs required by SSR.
| Name | Format | Notes |
|---|---|---|
| Color buffer | APPLICATION SPECIFIED | The HDR render target of the current frame containing the scene radiance |
| Depth buffer | APPLICATION SPECIFIED (1x FLOAT) | The depth buffer for the current frame provided by the application. The data should be provided as a single floating point value, the precision of which is under the application's control |
| Normal buffer | APPLICATION SPECIFIED (3x FLOAT) | The normal buffer for the current frame provided by the application in the [-1.0, +1.0] range. Normals should be in world space |
| Material parameters buffer | APPLICATION SPECIFIED (1x FLOAT) | The roughness buffer for the current frame provided by the application. By default, SSR expects the roughness to be the perceptual / artist-set roughness squared. If your G-buffer stores the artist-set roughness directly, set the `IsRoughnessPerceptual` field of the `ScreenSpaceReflectionAttribs` structure to `true`. The application is also expected to specify the channel to sample from the material parameters buffer through the `RoughnessChannel` field of the `ScreenSpaceReflectionAttribs` structure |
| Motion vectors | APPLICATION SPECIFIED (2x FLOAT) | The 2D motion vectors for the current frame provided by the application in NDC space |
The effect is controlled by the following parameters:

| Name | Notes |
|---|---|
| Depth buffer thickness | A bias for accepting hits. Larger values can cause streaks, lower values can cause holes |
| Roughness threshold | Regions with a roughness value greater than this threshold won't spawn rays |
| Most detailed mip | The most detailed MIP map level in the depth hierarchy. Perfect mirrors always use 0 as the most detailed level |
| Roughness perceptual | A boolean describing the space used to store roughness in the material parameters texture. If false, we assume roughness squared was stored in the G-buffer |
| Roughness channel | The channel to read the roughness from in the material parameters texture |
| Max traversal intersections | Caps the maximum number of lookups that are performed from the depth buffer hierarchy. Most rays should terminate after approximately 20 lookups |
| Importance sample bias | This parameter aims to reduce noise by modifying the sampling in the ray tracing stage. Increasing the value increases the deviation from the ground truth but reduces the noise |
| Spatial reconstruction radius | Controls the kernel size in the spatial reconstruction step. Increasing the value increases the deviation from the ground truth but reduces the noise |
| Temporal radiance stability factor | A factor controlling the accumulation of history values of the radiance buffer. Higher values reduce noise, but are more likely to exhibit ghosting artefacts |
| Temporal variance stability factor | A factor controlling the accumulation of history values of the variance buffer. Higher values reduce noise, but are more likely to exhibit ghosting artefacts |
| Bilateral cleanup spatial sigma factor | This parameter represents the standard deviation ($\sigma$) of the Gaussian spatial kernel used in the bilateral cleanup step |
The effect can be configured using the `ScreenSpaceReflection::FEATURE_FLAGS` enumeration. The following table lists the flags and their descriptions.
| Name | Notes |
|---|---|
| `FEATURE_FLAG_PREVIOUS_FRAME` | When this flag is used, you only need to pass the color buffer of the previous frame. We find the intersection using the depth buffer of the current frame and, when an intersection is found, offset the hit point by the velocity vector at the intersection before sampling from the color buffer |
| `FEATURE_FLAG_HALF_RESOLUTION` | When this flag is used, the ray tracing step is executed at half resolution |
To integrate SSR into your project, you need to include the following necessary header files:
#include "PostFXContext.hpp"
#include "ScreenSpaceReflection.hpp"
namespace HLSL
{
#include "Shaders/Common/public/BasicStructures.fxh"
#include "Shaders/PostProcess/ScreenSpaceReflection/public/ScreenSpaceReflectionStructures.fxh"
} // namespace HLSL
Now, create the necessary objects:
```cpp
m_PostFXContext = std::make_unique<PostFXContext>(m_pDevice);
m_SSR           = std::make_unique<ScreenSpaceReflection>(m_pDevice);
```
Next, call the methods to prepare resources for the `PostFXContext` and `ScreenSpaceReflection` objects. This needs to be done every frame before starting the rendering process.
```cpp
{
    PostFXContext::FrameDesc FrameDesc;
    FrameDesc.Index  = m_CurrentFrameNumber; // Current frame number.
    FrameDesc.Width  = SCDesc.Width;         // Current screen width.
    FrameDesc.Height = SCDesc.Height;        // Current screen height.
    m_PostFXContext->PrepareResources(m_pDevice, FrameDesc, PostFXContext::FEATURE_FLAG_NONE);

    ScreenSpaceReflection::FEATURE_FLAGS ActiveFeatures = ...;
    m_SSR->PrepareResources(m_pDevice, m_pImmediateContext, m_PostFXContext.get(), ActiveFeatures);
}
```
Now we invoke the `PostFXContext::Execute` method. At this stage, intermediate resources required by all post-processing objects that depend on `PostFXContext` are computed. This method can take a constant buffer that directly contains an array of the current and previous camera attributes (for this method, you can refer to this section of the code [0] and [1]). Alternatively, you can pass the corresponding pointers `const HLSL::CameraAttribs* pCurrCamera` and `const HLSL::CameraAttribs* pPrevCamera` for the current and previous cameras, respectively. You also need to pass the depth buffers of the current and previous frames (the depth buffers should not contain transparent objects) and a buffer with motion vectors in NDC space through the corresponding `ITextureView* pCurrDepthBufferSRV`, `ITextureView* pPrevDepthBufferSRV`, and `ITextureView* pMotionVectorsSRV` pointers.
```cpp
{
    PostFXContext::RenderAttributes PostFXAttibs;
    PostFXAttibs.pDevice             = m_pDevice;
    PostFXAttibs.pDeviceContext      = m_pImmediateContext;
    PostFXAttibs.pCameraAttribsCB    = m_FrameAttribsCB;  // m_Resources[RESOURCE_IDENTIFIER_CAMERA_CONSTANT_BUFFER].AsBuffer();
    PostFXAttibs.pCurrDepthBufferSRV = m_CurrDepthBuffer; // m_Resources[RESOURCE_IDENTIFIER_DEPTH0 + CurrFrameIdx].GetTextureSRV();
    PostFXAttibs.pPrevDepthBufferSRV = m_PrevDepthBuffer; // m_Resources[RESOURCE_IDENTIFIER_DEPTH0 + PrevFrameIdx].GetTextureSRV();
    PostFXAttibs.pMotionVectorsSRV   = m_MotionBuffer;    // m_GBuffer->GetBuffer(GBUFFER_RT_MOTION_VECTORS)->GetDefaultView(TEXTURE_VIEW_SHADER_RESOURCE);
    m_PostFXContext->Execute(PostFXAttibs);
}
```
Now we need to invoke the ray tracing stage itself. To do this, we call the `ScreenSpaceReflection::Execute` method. Before that, we need to fill the `ScreenSpaceReflectionAttribs` and `ScreenSpaceReflection::RenderAttributes` structures with the necessary data. Please read the input resources section for a more detailed description of each parameter.
```cpp
{
    HLSL::ScreenSpaceReflectionAttribs SSRAttribs{};
    SSRAttribs.RoughnessChannel      = 0;
    SSRAttribs.IsRoughnessPerceptual = true;

    ScreenSpaceReflection::RenderAttributes SSRRenderAttribs{};
    SSRRenderAttribs.pDevice            = m_pDevice;
    SSRRenderAttribs.pDeviceContext     = m_pImmediateContext;
    SSRRenderAttribs.pPostFXContext     = m_PostFXContext.get();
    SSRRenderAttribs.pColorBufferSRV    = m_GBuffer->GetBuffer(GBUFFER_RT_RADIANCE)->GetDefaultView(TEXTURE_VIEW_SHADER_RESOURCE);
    SSRRenderAttribs.pDepthBufferSRV    = m_GBuffer->GetBuffer(GBUFFER_RT_DEPTH)->GetDefaultView(TEXTURE_VIEW_SHADER_RESOURCE);
    SSRRenderAttribs.pNormalBufferSRV   = m_GBuffer->GetBuffer(GBUFFER_RT_NORMAL)->GetDefaultView(TEXTURE_VIEW_SHADER_RESOURCE);
    SSRRenderAttribs.pMaterialBufferSRV = m_GBuffer->GetBuffer(GBUFFER_RT_MATERIAL_DATA)->GetDefaultView(TEXTURE_VIEW_SHADER_RESOURCE);
    SSRRenderAttribs.pMotionVectorsSRV  = m_GBuffer->GetBuffer(GBUFFER_RT_MOTION_VECTORS)->GetDefaultView(TEXTURE_VIEW_SHADER_RESOURCE);
    SSRRenderAttribs.pSSRAttribs        = &SSRAttribs;
    m_SSR->Execute(SSRRenderAttribs);
}
```
Now you can obtain an `ITextureView` of the texture containing the SSR result using the `ScreenSpaceReflection::GetSSRRadianceSRV` method. After this, you can apply SSR in your rendering pipeline. The alpha channel of the SSR texture stores the confidence value (see the differences section below), which can be used to interpolate between the SSR result and the reflection from the environment map.
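For example, a minimal sketch of such a resolve, assuming the application already has its own prefiltered environment reflection (names here are illustrative, not part of the SSR API):

```hlsl
// Hypothetical resolve step: blend the application's prefiltered environment
// reflection with the SSR result using the confidence value stored in the
// alpha channel of the SSR texture.
float3 ResolveSpecular(float3 EnvironmentSpecular, float4 SSRRadianceAndConfidence)
{
    // Confidence is ~1 where a screen-space intersection was found and ~0 otherwise.
    return lerp(EnvironmentSpecular, SSRRadianceAndConfidence.rgb, SSRRadianceAndConfidence.a);
}
```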
The algorithm can be divided into three main parts: preparing resources for ray tracing, the ray tracing itself, and denoising.
In this part, we prepare the necessary resources for the ray tracing stage.
Blue noise aims to uniformly distribute sample points across the sampling domain. This is in direct contrast to white noise, which exhibits clumps and voids. Clumped samples result in redundant data, while voids represent missing data. Blue noise circumvents these issues by maintaining a roughly uniform distribution in space, thus avoiding clumps and voids.
*Comparison of sampling with white noise and blue noise*
AMD's implementation prepares a 128×128 texture with screen-space (animated) blue noise (PrepareBlueNoiseTexture), based on the work of Eric Heitz [Eric Heitz, Blue Noise]. This will be used later to drive the stochastic sampling of the specular lobe.
In general, we follow AMD's approach, except that we generate two blue noise textures simultaneously and use a pixel shader (ComputeBlueNoiseTexture.fx) instead of a compute shader. The second blue noise texture is required for SSAO, as our goal is to prevent potential correlation between the pixels obtained in the SSR and SSAO steps. AMD uses the `uint32_t` format to store static arrays such as `SobolBuffer` (256 * 256 * 4 bytes = 256 KiB) and `ScramblingTileBuffer` (128 * 128 * 4 * 8 bytes = 512 KiB), which together add 768 KiB to the executable file. We changed the format of the static arrays from `uint32_t` to `uint8_t`, and also noticed that `SobolBuffer` is used only along one dimension. These optimizations reduced the size added to the executable file to 128 KiB.
The hierarchical depth buffer is a mip chain where each pixel is the minimum (maximum for reversed depth) of the previous level's 2×2 area depths (mip 0 corresponds to the screen-sized, original depth buffer). It will be used later to speed up raymarching, but can also be used in many other techniques, like GPU occlusion culling.
*Depth mip chain*
We recommend reading this article [Mike Turitzin, Hi-Z], as computing a hierarchical buffer for resolutions not divisible by 2 is not so trivial. The original AMD algorithm uses SPD [AMD-SPD] to convolve the depth buffer (DepthDownsample). SPD allows us to compute it in a single Dispatch call, but since we can't use compute shaders we use a straightforward approach. We calculate each mip level using a pixel shader SSR_ComputeHierarchicalDepthBuffer.fx, using the previous mip level as an input.
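As an illustration, one level of such a chain can be produced with a pixel shader along these lines (a minimal sketch, not the actual SSR_ComputeHierarchicalDepthBuffer.fx; resource names are assumptions, and the odd-resolution corner cases discussed in [Mike Turitzin, Hi-Z] are ignored):

```hlsl
// Build mip N+1 as the minimum of the corresponding 2x2 footprint of mip N.
Texture2D<float> g_LastMip; // previously computed (finer) mip level

float ComputeHierarchicalDepthBufferPS(float4 Position : SV_Position) : SV_Target
{
    int2 SrcPixel = 2 * int2(Position.xy);

    float Depth00 = g_LastMip.Load(int3(SrcPixel + int2(0, 0), 0));
    float Depth10 = g_LastMip.Load(int3(SrcPixel + int2(1, 0), 0));
    float Depth01 = g_LastMip.Load(int3(SrcPixel + int2(0, 1), 0));
    float Depth11 = g_LastMip.Load(int3(SrcPixel + int2(1, 1), 0));

    // Use min for a regular depth buffer; use max for reversed depth.
    return min(min(Depth00, Depth10), min(Depth01, Depth11));
}
```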
The original algorithm starts with a classification pass (ClassifyTiles).
This step writes pixels that will participate in the ray tracing step and subsequent denoising stages to a global buffer
(our denoiser differs from AMD's implementation, but the underlying idea remains the same).
The decision of whether a pixel needs a ray or not is based on the roughness; very rough surfaces don't get any rays and instead rely on the prefiltered
environment map as an approximation.
Once this is done we are (almost) ready to ray march; the only problem is that, on the CPU, we don't know the size of the global array of pixels to trace in order to launch a `Dispatch`. For that reason, the technique fills a buffer with indirect arguments, using data already known to the GPU, and issues a `DispatchIndirect` instead. The indirect arguments buffer is populated during the `PrepareIndirectArgs` pass. Nothing particular to mention here, apart from that it adds 2 entries to the indirect buffer: one for the pixels to trace and one for the tiles to denoise later.
Since we cannot use compute shaders, we have made compromises. We use a stencil mask to mark the pixels that should participate in subsequent calculations. To do this, we first clear the stencil buffer to `0x0`, and then run SSR_ComputeStencilMaskAndExtractRoughness.fx with the stencil test enabled for writing and with the corresponding stencil buffer attached. If the roughness of the current pixel is less than `RoughnessThreshold`, we write the value `0xFF` to the stencil buffer; otherwise, the stencil buffer retains its previous value of `0x0`. In subsequent steps, we enable the stencil test for reading with the `COMPARISON_FUNC_EQUAL` function for the value `0xFF`. While writing to the stencil buffer, we also write the roughness to a separate render target. The separate texture simplifies roughness sampling in subsequent steps of the algorithm and improves performance. A sketch of this idea is shown below.
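The sketch below is hypothetical and only illustrates the idea (resource and constant names are illustrative, not those of the actual shader): the pixel shader outputs roughness to a render target and discards pixels that should not spawn rays, so the stencil reference value `0xFF` is only written where the fragment survives.

```hlsl
Texture2D g_MaterialParameters; // application-provided material parameters buffer

cbuffer cbSSRAttribs
{
    float g_RoughnessThreshold;
    uint  g_RoughnessChannel;
};

float ComputeStencilMaskAndExtractRoughnessPS(float4 Position : SV_Position) : SV_Target
{
    float Roughness = g_MaterialParameters.Load(int3(int2(Position.xy), 0))[g_RoughnessChannel];
    if (Roughness > g_RoughnessThreshold)
        discard; // rough pixels keep the cleared stencil value of 0x0
    return Roughness;
}
```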
*Stencil mask for SSR (left); final rendered image with SSR (right)*
We have now reached the most crucial part of the algorithm, for which all the previous preparations were made. We almost entirely repeat the original Intersect step, with some exceptions; refer to the differences section.
Our goal is to solve the rendering equation for the specular part (GGX microfacet BRDF). Unfortunately, it is impossible to calculate the rendering equation accurately in real time, so AMD's SSSR uses the Split-Sum Approximation. We strongly recommend reading the original article by [Brian Karis, PBR], as well as viewing this presentation [Thorsten Thormählen, IBL] starting from slide 29, if you are interested in the derivation of the resulting formula.
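In the notation of [Brian Karis, PBR], the importance-sampled estimate of the specular integral and its split-sum approximation can be written as:

$$
L_o(\mathbf{v}) = \int_{\Omega} L_i(\mathbf{l})\, f(\mathbf{l}, \mathbf{v})\, (\mathbf{n} \cdot \mathbf{l})\, d\mathbf{l}
\;\approx\; \frac{1}{N} \sum_{k=1}^{N} \frac{L_i(\mathbf{l}_k)\, f(\mathbf{l}_k, \mathbf{v})\, (\mathbf{n} \cdot \mathbf{l}_k)}{p(\mathbf{l}_k, \mathbf{v})}
\;\approx\; \left( \frac{1}{N} \sum_{k=1}^{N} L_i(\mathbf{l}_k) \right) \left( \frac{1}{N} \sum_{k=1}^{N} \frac{f(\mathbf{l}_k, \mathbf{v})\, (\mathbf{n} \cdot \mathbf{l}_k)}{p(\mathbf{l}_k, \mathbf{v})} \right)
$$

where $p(\mathbf{l}_k, \mathbf{v})$ is the PDF of the sampled direction $\mathbf{l}_k$.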
In the expression above we see two sums: the first accumulates the incoming radiance $L_i$ along the sampled directions, and the second is the pre-integrated BRDF term. The result of the pre-computation of the second sum is typically stored in a lookup texture and applied by the application outside of the SSR pass. Let's now turn to the first sum, the incoming radiance. For each generated ray we search for an intersection with the scene; if one is found, we sample the radiance from the ColorBuffer; otherwise, we take the value from the EnvironmentMap (our algorithm does not require an Environment Map as in the AMD implementation; instead we use the Confidence color channel to resolve the radiance; refer to the differences section). An attentive reader has likely noticed that this way of generating samples has a problem, illustrated in the figure below.
*Invalid ray*
As seen in the figure above, the previously described method of generating samples for the incoming radiance produces rays that fall below the horizon. To solve this problem, we use the method of [Eric Heitz, VNDF] to generate half-vectors by sampling only the visible normals of the microfacet distribution.
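For reference, a sketch of the visible-normal (VNDF) sampling routine from [Eric Heitz, VNDF]; the reflected direction is then obtained as `reflect(-Ve, H)`. Function and parameter names here are illustrative, not the ones used in SSR_ComputeIntersection.fx.

```hlsl
// Sample the GGX distribution of visible normals (Heitz 2018).
// Ve is the view direction in tangent space (+Z is the surface normal),
// Alpha is the anisotropic GGX roughness, U1/U2 are random numbers in [0, 1).
// Returns the sampled half-vector (microfacet normal) in tangent space.
float3 SampleGGXVisibleNormal(float3 Ve, float2 Alpha, float U1, float U2)
{
    // Transform the view direction to the hemisphere configuration.
    float3 Vh = normalize(float3(Alpha.x * Ve.x, Alpha.y * Ve.y, Ve.z));

    // Build an orthonormal basis around Vh.
    float  LenSq = Vh.x * Vh.x + Vh.y * Vh.y;
    float3 T1 = LenSq > 0.0 ? float3(-Vh.y, Vh.x, 0.0) * rsqrt(LenSq) : float3(1.0, 0.0, 0.0);
    float3 T2 = cross(Vh, T1);

    // Parameterize the projected area of the visible hemisphere.
    float R   = sqrt(U1);
    float Phi = 2.0 * 3.14159265 * U2;
    float P1  = R * cos(Phi);
    float P2  = R * sin(Phi);
    float S   = 0.5 * (1.0 + Vh.z);
    P2 = (1.0 - S) * sqrt(1.0 - P1 * P1) + S * P2;

    // Reproject onto the hemisphere.
    float3 Nh = P1 * T1 + P2 * T2 + sqrt(max(0.0, 1.0 - P1 * P1 - P2 * P2)) * Vh;

    // Transform the normal back to the ellipsoid configuration.
    return normalize(float3(Alpha.x * Nh.x, Alpha.y * Nh.y, max(0.0, Nh.z)));
}
```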
Let's directly move on to the stage of hierarchical ray marching. This method allows us to efficiently skip empty areas. Raymarching starts at mip 0 (highest resolution) of the depth buffer. If no collision is detected, we drop to a lower resolution mip and continue raymarching. Again, if no collision is detected we continue dropping to lower resolution mips until we detect one. If we do, we climb back up to a higher resolution mip and continue from there. This allows quickly skipping empty space in the depth buffer. If you're unclear about how hierarchical ray marching works, take a look at the animated slides below.
*Hierarchical depth buffer traversal*
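A condensed sketch of the traversal loop is shown below. The helper functions `GetDepthFromHierarchy` and `AdvanceToNextCellOrSurface`, as well as the texture-space ray representation, are illustrative assumptions and not the actual code from SSR_ComputeIntersection.fx.

```hlsl
// Hierarchical ray marching sketch: walk the depth mip chain, climbing to coarser
// mips while the ray skips empty cells and descending when a potential hit is found.
float3 HierarchicalRayMarch(float3 Origin, float3 Direction, uint MostDetailedMip, uint MaxIterations)
{
    float3 Position = Origin; // ray position in texture space (xy = UV, z = depth)
    int    Mip      = int(MostDetailedMip);
    uint   Iter     = 0;

    while (Iter < MaxIterations && Mip >= int(MostDetailedMip))
    {
        // Depth of the cell that currently contains the ray at this mip level.
        float SurfaceDepth = GetDepthFromHierarchy(Position.xy, Mip);

        // Advance the ray either to the boundary of the current cell or to the
        // depth plane of the surface, whichever is reached first.
        bool SkippedCell = false;
        Position = AdvanceToNextCellOrSurface(Position, Direction, Mip, SurfaceDepth, SkippedCell);

        // If the whole cell was safely skipped, go to a coarser mip; otherwise refine.
        Mip += SkippedCell ? 1 : -1;
        ++Iter;
    }

    // The loop exits below the most detailed mip when a hit candidate is found,
    // or when the iteration budget (Max traversal intersections) is exhausted.
    return Position;
}
```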
After finding the intersection point of the ray with the scene, we calculate the screen coordinate and sample incoming radiance from
ColorBuffer
and write it as the result for the current pixel (the original implementation samples from the Environment Map if no intersection occurs;
in our implementation, this is not the case). Also, we record the length of the ray, which will be needed during the denoising stages. You can view the
code implementing this step here:
SSR_ComputeIntersection.fx.
Key differences with the AMD implementation in the ray tracing step:
- Since the scene may contain multiple environment maps, each of which may interact with a different pixel, we decided not to pass the environment map to the SSR stage (although this means we lose grazing specular reflections in areas where ray does not intersect with scene). Instead, we write a confidence value (roughly speaking,
1
if an intersection occurred,0
if not) in the alpha channel of the resulting texture. This value will later be used by the user to interpolate between the value from the SSR and the Environment Map. - Since we do not use AMD's denoiser but our own, we needed to record the result
$p$ - PDF of the generated half vector$\mathbf{h}$ and light vector$\mathbf{l}$ . Read section spatial reconstruction - We added GGX Bias parameter that allows us to reduce the variance a bit. We recommend you watching this video to understand how it works [EA-SSRR]
*Specular radiance after ray tracing (left); confidence (right)*
As we can see, the image obtained from the ray tracing step is quite noisy. The goal of the next stage of the algorithm is to reduce that noise.
At this step of the denoising algorithm, we make the assumption that closely located surface points have the same visibility, so we attempt to accumulate the incoming radiance of each point from its nearby points.
*Same visibility in the red zone*
This assumption introduces bias into the final image, but despite this drawback, it significantly reduces noise on surfaces with high roughness. To accumulate samples, we use an approach from [EA-SSRR]. They suggest accumulating the samples with a weighted ratio estimator, which can be written schematically as follows.
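In our notation, with $\mathbf{x}_k$ the neighboring pixels inside the reconstruction kernel and $p_k$ the PDF recorded for them during the ray tracing stage:

$$
\hat{L}(\mathbf{x}) \;=\; \frac{\sum_{k} w_k\, L(\mathbf{x}_k)}{\sum_{k} w_k},
\qquad
w_k \;=\; \frac{f(\mathbf{l}_k, \mathbf{v})\,(\mathbf{n} \cdot \mathbf{l}_k)}{p_k}
$$

Here the BRDF in the weight is evaluated with the current pixel's normal and view vector, while the radiance $L(\mathbf{x}_k)$ and the PDF $p_k$ come from the neighbor's ray.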
As can be noted, to calculate the weights in these sums we need the PDF of each sample, which is why we record it during the ray tracing stage. At this step we also compute the variance of the accumulated samples and update the ray length (length(SurfaceHitWS - RayOriginWS)), and then we record the results in separate textures. The variance will be needed for the cross-bilateral filtering pass. The ray length will be required during the temporal accumulation stage for parallax correction.
*Specular radiance after spatial reconstruction (left); variance (right)*
At this step, we accumulate the image obtained after the spatial reconstruction with the image obtained at this step but from a previous moment in time. We rely on the temporal coherence of frames, as the information between frames does not change significantly. One might wonder why we don't simply use the TAA (Temporal Anti-Aliasing) algorithm. A typical reprojection method, commonly employed in TAA, is not adequately effective for reflections. This is because the reflected objects move according to their own depth, rather than the depth of the surfaces reflecting them, which is what's recorded in the depth buffer. Therefore, we need to ascertain the previous frame's location of these reflected objects. Overall, our implementation of temporal accumulation is similar to AMD's Reproject pass.
We use the approach from this presentation [EA-HYRTR], slide 45. At a high level, we can divide the current stage into four parts:
- Calculate the statistics of the current pixel (mean, standard deviation, variance) based on color buffer.
- Compute the intensity from the previous frame for two screen-space points relative to the current pixel's position. The first position is formed by subtracting the motion vector from the current pixel's position, while the second position is calculated based on the ray length, which we computed during the ray tracing stage and modified during the spatial reconstruction stage.
- Based on the statistics of the current pixel and the intensity values for two points, we select the point on which we will base the reprojection.
- If the reprojection is successful, we interpolate the values between the intensity from the selected point (which we calculated in the previous step) and the intensity value for the current pixel. If the reprojection is not successful, we record the intensity value from the current pixel.
For implementation details, look at the source code: SSR_ComputeTemporalAccumulation.fx.
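The final blend of the selected history with the current frame can be sketched as follows; this is a hypothetical, condensed version with illustrative variable names, not the actual logic of SSR_ComputeTemporalAccumulation.fx.

```hlsl
float4 TemporalAccumulate(float4 CurrRadiance,        // result of spatial reconstruction
                          float4 ReprojectedRadiance, // history sampled at the selected point
                          float4 NeighborhoodMean,    // statistics of the current pixel
                          float4 NeighborhoodStdDev,
                          float  StabilityFactor,     // e.g. the temporal radiance stability factor
                          bool   ReprojectionValid)
{
    // Fall back to the current frame if the history could not be reprojected.
    if (!ReprojectionValid)
        return CurrRadiance;

    // Clamp the history to the neighborhood of the current frame to limit ghosting.
    float4 ClampedHistory = clamp(ReprojectedRadiance,
                                  NeighborhoodMean - NeighborhoodStdDev,
                                  NeighborhoodMean + NeighborhoodStdDev);

    // Higher stability factor -> more history is kept (less noise, more ghosting risk).
    return lerp(CurrRadiance, ClampedHistory, StabilityFactor);
}
```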
*Specular radiance after temporal accumulation*
This stage is based on a standard bilateral filter [Wiki, Bilateral filter] with the following specifics:
- We use the variance calculated during the spatial reconstruction stage to determine the $\sigma$ for the spatial kernel $G_s$ of the bilateral filter.
- Since the image being processed is quite noisy, instead of using the pixel intensity of the processed image to create the range kernel $G_r$, we use the depth buffer and the normals buffer to form the kernel. We took the functions for generating the range kernel $G_r$ from the SVGF algorithm [Christoph Schied, SVGF], expressions $(3)$ and $(4)$; see the weights below.
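Schematically, these edge-stopping functions have the following form, with $p$ the center pixel and $q$ a neighbor, $z$ the depth, $\mathbf{n}$ the normal, and $\sigma_z$, $\sigma_n$, $\varepsilon$ tuning constants:

$$
w_z(p, q) = \exp\!\left(-\frac{|z_p - z_q|}{\sigma_z\, |\nabla z_p \cdot (p - q)| + \varepsilon}\right),
\qquad
w_n(p, q) = \max(0,\, \mathbf{n}_p \cdot \mathbf{n}_q)^{\sigma_n}
$$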
You can find the implementation here: SSR_ComputeBilateralCleanup.fx
*Specular radiance after bilateral filtering*
The final frame with reflections is shown below.
*Final image after tone mapping*
- Add support for reversed depth buffers
- Add support for compressed normal maps
- Add dynamic resolution for the ray tracing stage, which will improve performance on weaker GPUs
- The spatial reconstruction step uses screen space to accumulate samples. Try performing the accumulation in world coordinates; this should reduce bias
- We can also try calculating direct specular occlusion in the ray tracing step
- The bilateral filter is not separable, so it performs poorly with large kernel sizes. Consider replacing it with guided image filtering, since that algorithm has this property
- The current implementation of hierarchical ray marching has a known problem that we still need to fix
- [AMD-SSSR]: FidelityFX Stochastic Screen-Space Reflections 1.4 - https://gpuopen.com/manuals/fidelityfx_sdk/fidelityfx_sdk-page_techniques_stochastic-screen-space-reflections/
- [AMD-SPD]: FidelityFX Single Pass Downsampler - https://gpuopen.com/manuals/fidelityfx_sdk/fidelityfx_sdk-page_techniques_single-pass-downsampler/
- [EA-SSRR]: Frostbite presentations on Stochastic Screen Space Reflections - https://www.ea.com/frostbite/news/stochastic-screen-space-reflections
- [EA-HYRTR]: EA SEED presentation on Hybrid Real-Time Rendering - https://www.ea.com/seed/news/seed-dd18-presentation-slides-raytracing
- [Eric Heitz, VNDF]: Eric Heitz's paper on VNDF sampling - http://jcgt.org/published/0007/04/01/
- [Eric Heitz, Blue Noise]: Eric Heitz's paper on blue noise sampling - https://eheitzresearch.wordpress.com/762-2/
- [Kostas Anagnostou, SSSR]: Notes on Screen-Space Reflections with FidelityFX SSSR - https://interplayoflight.wordpress.com/2022/09/28/notes-on-screenspace-reflections-with-fidelityfx-sssr/
- [Thorsten Thormählen, IBL]: Graphics Programming: Image-Based Lighting - https://www.mathematik.uni-marburg.de/~thormae/lectures/graphics1/graphics_10_2_eng_web.html#1
- [Brian Karis, PBR]: Brian Karis, Real Shading in Unreal Engine 4, SIGGRAPH 2013 Course: Physically Based Shading in Theory and Practice - https://cdn2.unrealengine.com/Resources/files/2013SiggraphPresentationsNotes-26915738.pdf
- [Mike Turitzin, Hi-Z]: Hierarchical Depth Buffers - https://miketuritzin.com/post/hierarchical-depth-buffers/
- [Christoph Schied, SVGF]: Spatiotemporal Variance-Guided Filtering - https://cg.ivd.kit.edu/publications/2017/svgf/svgf_preprint.pdf
- [Wiki, Bilateral filter]: Bilateral filter - https://en.wikipedia.org/wiki/Bilateral_filter