Alex Tardif Graphics Engineer
Update
In many ways, this post is now a subset of a more recent and wonderfully visual tutorial on TAA by Emilio López, which
you can find here: https://www.elopezr.com/temporal-aa-and-the-quest-for-the-holy-trail/ If you are looking for a
walkthrough of TAA as a whole and not just implementation reference, I highly recommend starting there and using this
page as an additional resource afterwards.
Overview
Recently I found myself catching up on TAA, tangentially related to my other post about trying to avoid main-pass temporal
anti-aliasing. In browsing recent(ish) presentations on TAA, I'm under the impression that there aren't many approachable
posts (at least not that I was able to find) about implementing it for someone learning it for the first time, who may find it
intimidating or be confused about how to put the presentations and papers into practice. The reality is that a decent TAA
implementation is within reach of anyone who is interested in learning it, without a significant time investment.
I won't be going deep into TAA theory and background myself; instead, I'll link to the work of people far better informed
on that than I am before moving on to what I believe to be the essential parts of a serviceable general-purpose TAA
implementation. Nothing I'm doing here is in any way new or original, just an accumulation of various online resources in
one place.
Background
Here I'll provide what I consider to be essential background reading/presentations on the type of TAA we'll be implementing
below.
These are worth reading before going any further in this post:
-High Quality Temporal Supersampling by Brian Karis (2014)
-Temporal Reprojection Anti-Aliasing in INSIDE by Lasse Jon Fuglsang Pedersen (2016)
-An Excursion in Temporal Supersampling by Marco Salvi (2016)
-A Survey of Temporal Antialiasing Techniques by Lei Yang, Shiqiu Liu, and Marco Salvi (2020)
I recommend these if you're curious to go back to earlier days of TAA in the game industry and see the ideas at the time:
-TSSAA (Temporal Super-Sampling AA) by Timothy Lottes (2011). Thank you to P. A. Minerva for finding the archive.
-Anti-Aliasing Methods in CryEngine 3 by Tiago Sousa (2011). There were lots of other great AA presentations at
SIGGRAPH that year, some of them also covered temporal AA.
-Graphics Gems from CryEngine 3 by Tiago Sousa (2013).
And these last two detail a lot of practical issues you run into when working with TAA in a real production environment,
which I consider to be as valuable as the implementation itself.
-Temporal Supersampling and Antialiasing by Bart Wronski (2014).
-Temporal Antialiasing in Uncharted 4 by Ke Xu (2016).
See my post here for details on other AA methodologies, including links to alternative temporal antialiasing
implementations like Activision's Filmic SMAA.
Velocity Buffer
First things first: we need to fill a motion vector target, sometimes called a velocity buffer. For starting out, you'll want to
use RG16F as the format so that you have enough floating point precision and aren't fussing with encoding to some more
optimized format, and clear it to 0 every frame. If you're doing a full z-prepass you can do this there, otherwise you can do it
during your main pass (forward pass, gbuffer pass, etc). To fill the target, we're going to need some additional constants -
wherever you're passing in your camera's view + projection matrices, make sure you also store off the matrices from the
previous frame and then supply those to your shader as well. Likewise, for every object you're rendering, pass in the
previous world matrix along with the current world matrix. In the long run that last step is unnecessary for static objects (I'll
talk about this later on), but let's keep things simple and just do this for everything. For skinned objects, you'll either need to
pass in the previous frame's bone transformation matrices, or provide a buffer of previous-frame-transformed vertex
positions.
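As a rough vertex shader sketch (the constant and interpolant names here are assumptions, not required conventions), the idea is simply to compute and pass down both the current and previous clip-space positions:

    // worldViewProjection / previousWorldViewProjection are per-object constants built
    // from the current and previous world, view, and projection matrices
    output.position = mul(float4(input.position, 1.0f), worldViewProjection);
    output.currentPosition = output.position; // unmodified copy for the pixel shader
    output.previousPosition = mul(float4(input.position, 1.0f), previousWorldViewProjection);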
For skinned objects you'll need to take your skinning path here as well with the previous bone transforms, or alternatively
read in your previously transformed positions from some storage buffer if you already have that.
The pixel shader work is also simple, but this step also proves to be a common place for simple mistakes that can ruin your
resolve later, as we'll soon see.
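A sketch of that pixel shader work, using the assumed interpolants from above: perspective divide both positions, then output the difference in NDC.

    // Perspective divide, then current minus previous in NDC; written to the RG16F target
    float2 currentPositionNDC = input.currentPosition.xy / input.currentPosition.w;
    float2 previousPositionNDC = input.previousPosition.xy / input.previousPosition.w;
    return currentPositionNDC - previousPositionNDC;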
I don't know many people that actually output this as NDC. I prefer it for this post because I find it easier to not mess up
later steps. People familiar with TAA might be wondering where the jitter is at this point - I'm purposefully leaving it for the
end. We'll first make sure that the reprojection and sampling is correct before introducing the additional complication of
jittering.
Resolve
Now for the fun part! The TAA resolve is where we're going to put all the goods together that we learned from the links
above (and more). After your lighting pass, but before any post processing, we're going to insert the TAA resolve pass. This
means we'll be operating on the HDR targets pre-tonemapping as described in some of the links above. The inputs we need
for the resolve shader are: source color (the current frame's image so far), the history color (last frame's TAA result), the
motion vector target we just populated, and the depth buffer. Since the first frame has no accumulation, a decent default for
frame 0 is to simply copy the source into the history, skip the resolve for a frame, or otherwise ignore the history in the
resolve for the first frame. That accounted for, let's start the resolve. For simplicity's sake we'll do it with a pixel shader using
a full-screen triangle.
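To set the stage, here's a minimal sketch of the resolve pass inputs and entry point; every name here (SourceColor, HistoryColor, MotionVectors, Depth, LinearSampler, and so on) is an assumption used throughout the snippets below, not a required convention. The body shown is just the frame 0 fallback of passing the source through; the rest of this section fills it in.

    Texture2D<float4> SourceColor   : register(t0); // current frame's lit image
    Texture2D<float4> HistoryColor  : register(t1); // last frame's TAA result
    Texture2D<float2> MotionVectors : register(t2); // NDC-space motion, filled earlier
    Texture2D<float>  Depth         : register(t3);
    SamplerState LinearSampler      : register(s0); // linear filtering, clamp addressing

    float4 TAAResolvePS(float4 position : SV_Position, float2 texCoord : TEXCOORD0) : SV_Target
    {
        int2 currentPixelPosition = int2(position.xy);

        // Frame 0 fallback described above: with no history yet, just pass the source through.
        // The neighborhood loop, reprojection, clipping, and blend below replace this.
        return float4(SourceColor[currentPixelPosition].rgb, 1.0);
    }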
The first piece is the neighborhood sampling loop, which is going to accomplish a number of things for us, and we'll go
through each of those step by step.
m1 += neighbor;
m2 += neighbor * neighbor;
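Those two lines accumulate the first and second color moments; here's a sketch of the full loop they sit in, assuming the resources above plus a hypothetical FilterWeightMitchell helper that evaluates whatever reconstruction filter you prefer:

    float3 sourceSampleTotal = float3(0, 0, 0);
    float sourceSampleWeight = 0.0;
    float3 neighborhoodMin = float3(10000, 10000, 10000);
    float3 neighborhoodMax = float3(-10000, -10000, -10000);
    float3 m1 = float3(0, 0, 0);
    float3 m2 = float3(0, 0, 0);
    float closestDepth = 0.0;
    int2 closestDepthPixelPosition = int2(0, 0);

    for (int x = -1; x <= 1; x++)
    {
        for (int y = -1; y <= 1; y++)
        {
            int2 pixelPosition = currentPixelPosition + int2(x, y);
            pixelPosition = clamp(pixelPosition, 0, TextureDimensions - 1); // TextureDimensions passed as a constant

            float3 neighbor = max(0, SourceColor[pixelPosition].rgb);

            float subSampleDistance = length(float2(x, y));
            float subSampleWeight = FilterWeightMitchell(subSampleDistance); // hypothetical filter helper

            sourceSampleTotal += neighbor * subSampleWeight;
            sourceSampleWeight += subSampleWeight;

            neighborhoodMin = min(neighborhoodMin, neighbor);
            neighborhoodMax = max(neighborhoodMax, neighbor);

            m1 += neighbor;
            m2 += neighbor * neighbor;

            float currentDepth = Depth[pixelPosition];
            if (currentDepth > closestDepth) // greater-than: reverse depth buffer
            {
                closestDepth = currentDepth;
                closestDepthPixelPosition = pixelPosition;
            }
        }
    }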
The first two lines are setting up the sampling position, making sure we don't exceed the texture extents. Here I'm passing in
a constant; you can do that, or use GetDimensions() on the source color input. Next, we actually sample the source color from
this frame. The max(0, ...) here is to help make sure we don't propagate any garbage values from earlier in the frame, if
you've ever seen TAA "bleed" some bad pixel until it consumes the entire image, well, this is why we do this! The purpose
of the next two lines has to do with a method described in Karis's presentation, but was something I didn't quite understand
from the slides until I found this tweet from Tomasz Stachowiak where he spelled it out (thank you Tomasz). Here's what he
said in this and the preceding tweet:
"Btw, one thing that helps with noise/jitter a bit is un-jittering the image inside the TAA resolve shader. Instead of point-
sampling / fetching the new frame at current pixel location, do a small filter over a local neighborhood. Brian Karis details
that in his TAA talk; he uses Blackman-Harris, but I found that Mitchell-Netravali yields a bit more sharpness. You
effectively reconstruct the image at pixel center, treating the new frame as a set of sub-samples. This negates jitter, and
stabilizes the image."
And it does indeed do exactly as he (and Karis) describe. In my personal experience I would agree here with Tomasz's
preference for a Mitchell filter. If you haven't implemented filters like this before, I highly recommend checking out Matt
Pettineo's GitHub filtering project, which features a number of techniques from his The Order: 1886 SIGGRAPH talk (itself
worth a read).
So now we're accumulating filtered sample information for the current frame, but we're not done! Next we need to grab the
information for a neighborhood clamp and variance clip. For the former we collect neighborhoodMin and
neighborhoodMax, and for the latter we collect the first and second color moments exactly as described in Salvi's
presentation. We'll use these later on. Lastly, we find the sample position of the closest depth in the neighborhood, which we
will use for sampling the velocity buffer. Notice the greater-than sign here: I'm using a reverse depth buffer (and you should
too); if you are not, you will need to flip the sign and the default value of closestDepth. There are other choices people make
about where best to sample the velocity buffer for a given pixel, for example some use the highest velocity, but I prefer the
velocity at the closest depth (the links at the beginning cover this to some extent as well). Moving on!
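To sketch those steps, reusing the assumed names from the loop above (and assuming the velocity buffer holds current-minus-previous NDC motion as filled earlier), this feeds directly into the bounds check and history sample that follow:

    // Turn the NDC-space motion at the closest depth into a texture coordinate offset
    float2 motionVector = MotionVectors[closestDepthPixelPosition].xy * float2(0.5, -0.5);
    float2 historyTexCoord = texCoord - motionVector;

    // Finish the filtered source sample from the neighborhood loop
    float3 sourceSample = sourceSampleTotal / sourceSampleWeight;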
    if (any(historyTexCoord != saturate(historyTexCoord)))
    {
        return float4(sourceSample, 1);
    }

    float3 historySample = SampleTextureCatmullRom(HistoryColor, LinearSampler, historyTexCoord, float2(historyDimensions.xy)).rgb;
First we get our motion vector, which we do by sampling our velocity buffer and turning that NDC vector into a screen-
space texture coordinate offset. Then we take the current texture coordinate that we got from the full-screen triangle vertex
shader, and subtract that motion vector offset to arrive at the texture coordinate for the history sample. Note I am subtracting
here because of the order of the subtraction done when filling the velocity buffer. I personally find this easier when thinking
about what's being done - subtracting the motion to arrive at the texture coordinate for the history. If you don't like that, you
can add the motion vector here and swap the subtraction from that earlier step.
Next, we calculate our new (filtered) source sample and run a simple check of the history texture coordinate to see if it's
outside the bounds (0-1). If it is, we stop right here and return the filtered source sample. That's an imperfect solution, but
it's a decent starting point for accounting for cases where there is no history to pull from at all.
And now finally we sample the history color from last frame, using an optimized Catmull-Rom filter courtesy (again, thanks
Matt!) of Matt Pettineo, feeding it our history color texture, a linear clamp sampler, the history sample location that we
calculated, and the size of the history color texture either passed in through a constant or via GetDimensions().
We've arrived at the variance clipping calculations, lifted straight from Salvi's presentation. We additionally clamp the
history against the neighborhood min/max as described in the paper, and pass the variance bounds off to the clip function
provided by Playdead from the linked INSIDE presentation.
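A sketch of those calculations using the moments gathered in the loop; ClipAABB stands in for the clip function from the linked Playdead repository (its exact name and signature are up to you):

    // Salvi-style variance bounds from the first and second color moments (9 samples)
    float oneDividedBySampleCount = 1.0 / 9.0;
    float gamma = 1.0; // size of the clipping box, worth tuning
    float3 mu = m1 * oneDividedBySampleCount;
    float3 sigma = sqrt(abs(m2 * oneDividedBySampleCount - mu * mu));
    float3 minc = mu - gamma * sigma;
    float3 maxc = mu + gamma * sigma;

    // Clamp against the raw neighborhood, then clip against the variance bounds
    historySample = clamp(historySample, neighborhoodMin, neighborhoodMax);
    historySample = ClipAABB(minc, maxc, historySample); // Playdead-style clip against the variance box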
We've reached the end! We'll use a decent default for how much of the source sample to blend (0.05), and apply a little bit of
what's known as "anti-flicker" (described in the links) to reduce the possibility of encountering high frequency details that
flicker, especially due to jitter (which we'll be adding soon). The Luminance function here is just the simple dot with
float3(0.2126, 0.7152, 0.0722). This won't eliminate flicker; in fact, to do this better you likely want to be applying
luminance filtering to your source and history sampling. Even then, you will still encounter flickering, and that gets into
applying additional mitigations to other passes as well - specular AA, prefiltering, making things like bloom temporally
aware, etc. This step at least provides an example of one such kind of mitigation that can be done for the purpose of
illustration. Luminance filtering itself has plenty of imperfections, including that it does not (in this form) account for
differences in perceptual lightness of different colors, but in practice this is better than not doing anything.
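Put into a sketch, with Luminance being the dot product just mentioned, and the weighting below being one (not the only) way to apply the anti-flicker idea:

    float sourceWeight = 0.05;
    float historyWeight = 1.0 - sourceWeight;

    // Anti-flicker: de-weight bright samples so high frequency, high intensity details
    // don't dominate the blend and flicker under jitter
    float3 compressedSource = sourceSample * rcp(max(max(sourceSample.r, sourceSample.g), sourceSample.b) + 1.0);
    float3 compressedHistory = historySample * rcp(max(max(historySample.r, historySample.g), historySample.b) + 1.0);
    float luminanceSource = Luminance(compressedSource);   // dot with float3(0.2126, 0.7152, 0.0722)
    float luminanceHistory = Luminance(compressedHistory);

    sourceWeight *= 1.0 / (1.0 + luminanceSource);
    historyWeight *= 1.0 / (1.0 + luminanceHistory);

    float3 result = (sourceSample * sourceWeight + historySample * historyWeight)
                  / max(sourceWeight + historyWeight, 0.00001);
    return float4(result, 1.0);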
Alright, we've got our TAA resolve! When your camera is in motion, you should hopefully have something that is nicely
anti-aliased and low on ghosting, and it should also still appear fairly sharp especially compared to simpler TAA
implementations (those filtering choices are very important). But then when you let the camera sit still, the image looks
super aliased! What gives? Without any motion, we're just sampling the same locations over and over and there's nothing to
blend! Enter jittering.
Jitter
We fix the low/no camera motion aliasing by applying sub-pixel jittering through the projection matrix, the idea being that
with enough samples you will converge and stabilize on an antialiased image. In a game this will not truly converge because
of the limitations of being real-time, but it will do enough to give you something that looks good. Before going into the
implementation of jittering to what we've done above, I'd like to present this spicy food for thought from someone whose
perspective I very much appreciate on the subject:
"I sometimes like to get snarky about that and say that jittering is only useful for making glamour shots when the camera
and world is completely still. I mean you can jitter with whatever fancy pattern you want and apply a reconstruction filter
based on those offsets, but once the camera moves it's all out the window. It's not like everything in the image is perfectly
translating in pixel-sized increments along X/Y, in reality it will end up that your shading points are going to "slide" all over
all the geometry. This is of course why the choice of filter for sampling the previous frame texture after reprojection is so
important, since it's not like the exact shading point for the current frame is going to sit nicely at a pixel center in previous
frame. It will always be in-between, and so a sharper reconstruction will keep things from getting too blurry and smeary.
Depending on the game there are of course times where the effective movement of parts of the screen is basically 0 even if
the camera is moving, but then again if something isn't moving maybe you should just leave the sample point alone to get a
stable image instead of potentially introducing flickering from your jitter pattern."
Good thoughts to keep in mind; for now we will continue with a standard addition of jittering to what we've already
implemented. First up is generating our jitter offsets, the current popular option being a Halton sequence of (2, 3) for (x, y).
Here's a quick implementation from the pseudocode on the wiki page.
float Halton(uint32_t i, uint32_t b)
{
    // Radical inverse of index i in base b (needs <cstdint> and <cmath>)
    float f = 1.0f;
    float r = 0.0f;
    while (i > 0)
    {
        f /= static_cast<float>(b);
        r = r + f * static_cast<float>(i % b);
        i = static_cast<uint32_t>(floorf(static_cast<float>(i) / static_cast<float>(b)));
    }
    return r;
}
And now we'll make use of it to generate the jitter for each frame.
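A minimal CPU-side sketch of that, where jitterIndex, dimensions, and the matrix indexing are assumptions to adapt to your engine:

    // Advance the jitter index every frame, cycling through 8 samples
    jitterIndex = (jitterIndex + 1) % 8;

    // Halton (2, 3), offset by one so index 0 is never used, remapped to [-1, 1]
    float haltonX = 2.0f * Halton(jitterIndex + 1, 2) - 1.0f;
    float haltonY = 2.0f * Halton(jitterIndex + 1, 3) - 1.0f;

    // Scale to a sub-pixel offset in NDC units and push it into the projection matrix
    float jitterX = haltonX / static_cast<float>(dimensions.x);
    float jitterY = haltonY / static_cast<float>(dimensions.y);
    projectionMatrix[2][0] += jitterX; // or [0][2]/[1][2], depending on your convention
    projectionMatrix[2][1] += jitterY;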
Here the variable jitterIndex increases every frame up to the desired sample count, and dimensions is your render target
dimensions. A good default to start from is 8 (so jitterIndex++; jitterIndex = jitterIndex % 8;) but it's worth playing around
with it to see what works for your application. Note the "+ 1" input to the Halton function in order to avoid the first index
returning 0. To apply it, you can either add this jitterX/Y to the projection matrix's [2][0] and [2][1] or [0][2] and [1][2]
(depending on row vs column major), or more clearly you can construct a translation matrix with the jitter and then multiply
it with your projection matrix. You'll also want to store this jitter in a constant, and just like we did with the view and
projection matrices, we'll track the previous frame jitter and pass it as a constant as well. We need these for the next step
where we will modify the velocity buffer generation.
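In the velocity pixel shader, that means removing each frame's jitter from its respective position before taking the difference, with currentJitter and previousJitter being the constants just described:

    // Un-jitter both positions so a static scene with a static camera produces zero motion
    float2 currentPositionNDC = (input.currentPosition.xy / input.currentPosition.w) - currentJitter;
    float2 previousPositionNDC = (input.previousPosition.xy / input.previousPosition.w) - previousJitter;
    return currentPositionNDC - previousPositionNDC;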
This step is deceptively important when working with a static world/camera, because if we don't remove the jitter, we'll be
sampling outside of our intended reconstruction area which will create a blurry result that we don't want. Put more simply,
we want the motion vectors to be zero when there is no motion! That way, the jittered projection will be working as
intended. Less obviously important, but likely still beneficial is to also incorporate the current frame jitter into
subSampleDistance during the source color sample filtering.
There are lots of other options that people use for jittering. One that Scott Lembcke recommended is Martin Roberts' R2
sequence, sketched below.
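A minimal CPU-side sketch of generating jitter from the R2 sequence, under the same assumed jitterIndex and dimensions as the Halton version above (the constant g is the plastic number the sequence is built on):

    // R2 low-discrepancy sequence (Martin Roberts); g is the "plastic number"
    const float g = 1.32471795724474602596f;
    float r2X = fmodf(0.5f + static_cast<float>(jitterIndex + 1) / g, 1.0f);
    float r2Y = fmodf(0.5f + static_cast<float>(jitterIndex + 1) / (g * g), 1.0f);

    // Remap to a sub-pixel offset in NDC units, same as with Halton
    float jitterX = (2.0f * r2X - 1.0f) / static_cast<float>(dimensions.x);
    float jitterY = (2.0f * r2Y - 1.0f) / static_cast<float>(dimensions.y);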
With this taken care of, you should have a perfectly serviceable TAA implementation for general purpose usage.
Sweeteners
Beyond TAA components like the ones we implemented here, there are plenty of other tricks people use to improve their
TAA implementations. Most of those I tend to file under the category of "sweeteners," that is, features that
are more than likely geared to the engine you're working with, the type of content in your game, or a combination of both. I
think it can be important to distinguish these from things like reprojection/clipping/clamping/etc because oftentimes other
features don't necessarily translate well from one context to another, and so it can be confusing when people try them out
and they don't work as expected.
When you start to get into stuff like depth based and velocity based sample acceptance/rejection, or stencil masking your
TAA, you're likely getting into more game-specific stuff. I think using YCoCg for clipping probably falls under this
category as well. At least the way it's traditionally implemented, from what I've seen it can work a little better in some
situations but not so well in others. My limited exploration with perceptual lightness has made me somewhat skeptical of
dropping in YCoCg and calling it a day. Likely a lot more tweaking would be needed in practice, and even then I get the
feeling there are other color spaces you could be using. I do absolutely agree with the thought process that led to YCoCg in
TAA though, I think there is a lot of potential to expand upon it. I plan on doing that too, and I'll share my findings on that
whether it's successful or not. Don't take my word for it though, try it out for yourself and see what you think.
From Philip Hammer: "Handling disocclusions as in Uncharted 4 (masking objects and compare curr/prev masks) really
helps with ghosting for 3rd person characters or 1st person weapons."
From Kyle Hayward: "This [the above comment] and frame counting history greatly reduces ghosting."
From Alan Wolfe: "Jorge Jimenez's under appreciated "interleaved gradient noise" is a great choice for per pixel random
numbers when rendering under TAA. http://www.iryoku.com/next-generation-post-processing-in-call-of-duty-advanced-warfare"
See the tweet for more details, as well as Alan's own resource (thanks Alan!):
https://blog.demofox.org/2017/10/31/animating-noise-for-integration-over-time/
Optimization
My implementation above is unoptimized. Given the sampling involved, there is a lot of potential to improve performance
by converting it to a compute shader and making use of groupshared memory. Another important optimization to evolve
your implementation is to not export velocity/motion for static objects in the pass that fills these out. For a static object, the
motion vector (outside of camera motion) will be 0, so an obvious improvement is to skip writing it there and instead run a
compute shader after that pass that applies the camera motion by reprojecting from the values in the depth buffer, writing
the result to the velocity buffer.
Challenges
The initial difficulty of working with TAA is getting it up and running properly the first time. Tiny mistakes can easily ruin
your results; a missed multiplication or an incorrect conversion can mean the difference between tons of ghosting and a
relatively clean image. Worse, tiny mistakes can lead to subtle issues that go unnoticed in most cases, and these can be
difficult to track down. This is why my approach above focuses on a lot of the basics, taking them piece by piece.
Start simple, then slowly add features, test, verify, test again.
Once you're past the initial implementation, the rest of your time spent with TAA is likely to be dealing with not-so-edge-
cases like transparency, FX like particles, or anything with UV scrolling, for example. Check the two examples I linked
towards the top about challenges that you would run into in a production environment. No TAA implementation is immune
to these issues, every implementation requires care and feeding throughout the life of a project. This is maybe why we get so
attached to particular implementations, because much like a pet we spend years nurturing them :-)
"Fun" example from Don Williamson: "I had one client that placed objects in a scenegraph hierarchy, parenting to a root.
They'd then unparent to give to gfx but used the initial matrix and the unparented one at two parts of the gfx pipe, resulting
in a subtle reproj drift. Took a while to track that!"
Example Implementations
https://github.com/NVIDIAGameWorks/Falcor/blob/master/Source/RenderPasses/Antialiasing/TAA/TAA.ps.slang
https://github.com/playdeadgames/temporal
https://github.com/h3r2tic/rtoy-samples/blob/master/assets/shaders/taa.glsl
https://github.com/Unity-Technologies/Graphics/blob/master/com.unity.render-pipelines.high-definition/Runtime/PostProcessing/Shaders/TemporalAntialiasing.hlsl
https://github.com/TheRealMJP/MSAAFilter/blob/master/MSAAFilter/Resolve.hlsl
https://github.com/turanszkij/WickedEngine/blob/master/WickedEngine/shaders/temporalaaCS.hlsl
https://gist.github.com/Erkaman/f24ef6bd7499be363e6c99d116d8734d
https://github.com/GameTechDev/TAA/blob/main/MiniEngine/Core/Shaders/TAAResolve.hlsl
https://github.com/PanosK92/SpartanEngine/blob/master/Data/shaders/temporal_antialiasing.hlsl
https://github.com/NVIDIA/Q2RTX/blob/master/src/refresh/vkpt/shader/asvgf_taau.comp
https://ziyadbarakat.wordpress.com/2020/07/28/temporal-anti-aliasing-step-by-step/
Contact
alexdtardif@gmail.com