Rendering Assassin's Creed
Nicolas Lopez
Welcome to my talk.
Today is also the launch date of the game, and I’m really happy to be here on this special
day.
1
Acknowledgements
First, I want to take a minute to thank the whole Anvil rendering team for their
contribution.
2
Speaker
• Nicolas Lopez
@Nicolas_Lopez_
@nicolas-lopez.bsky.social
@NicolasLopez@mastodon.gamedev.place
And here are some of the games I’ve worked on in the past.
Now I’ll show a trailer of AC Shadows to give some context for this talk.
3
Trailer
4
Agenda
1. Anvil Engine
2. Assassin’s Creed Shadows
3. Rendering Assassin’s Creed
4. Scalability & Performance
5. Conclusion
I’ll then dive into Assassin’s Creed Shadows and some of its rendering pillars.
5
Anvil Engine
6
ANVIL
• Born with Assassin’s Creed (2007)
7
ANVIL
• Born with Assassin’s Creed (2007)
• Large dense open world games
8
ANVIL
• Born with Assassin’s Creed (2007)
• Large dense open world games
• Systemic gameplay [Lefebvre 18]
9
ANVIL
*click* Ghost Recon, *click* For Honor, *click* and Rainbow Six Siege.
If you wonder what a competitive FPS, a military shooter, and Assassin’s Creed have in
common… it’s complicated.
10
GENEALOGY OF ANVIL
BLACKSMITH
SCIMITAR
SILEX
*click* with Scimitar, the Assassin’s Creed engine, *click* but also Blacksmith, *click*
and Silex.
11
Paradigm shift
• Many engines at Ubisoft
• Anvil, Snowdrop, Dunia, Voyager, Ubiart, …
• Multiplication of efforts
• Need for tech convergence
• Anvil as a shared Engine
Eventually, we made several observations that would change the way we work:
Working on the same feature or system in so many engines didn’t make sense anymore
12
Context
• Shared Engine
• Used for multiple games, brands, and genres
• Mono-repo
• One single code base
• Multiple teams
• Multi-studio transverse team
• Multi-studio productions
*click* Today we work with Anvil as a shared engine, shared across multiple games,
brands, and genres.
13
Assassin’s Creed Shadows
14
From Valhalla
To Shadows
When we started working on AC Shadows,
AC Valhalla had just shipped and was our benchmark as an open-world AC RPG.
AC Shadows would be the first next-gen-only Assassin’s Creed game.
15
From Valhalla to Shadows
• AC Valhalla: cross-gen title, little scalability (resolution)
• Xbox Series X, Quality, 2160p, 21.4ms
The main driving factor for scalability was its render resolution. (We can see that the GPU
is underutilized at 4k Native 30fps).
16
A Large Scale Systemic Open World
• Anvil
• 16x16km
• Destruction
• Raytracing
[Screenshots – Scalability: Quality (30fps, Raytracing) vs Balanced (40fps, Raytracing); Weather: Cloudy vs Sunny]
*click*
It’s made with Anvil
It’s 16x16km open world with Time of Day
So far so good?
I swear, I don’t work for Kojima, but I probably made the same face
17
A Large Scale Systemic Open World
• Anvil
• 16x16km
• Time of day
• Destruction
• Raytracing
• Virtual Geometry
• Systemic Weather
• Seasons
18
Turns out it wasn’t SO crazy…
19
In more detail:
20
A Large Scale Systemic Open World
• Gen 5 only
• Baked or ray traced Global Illumination
                      Performance – 60fps   Balanced – 40fps   Quality – 30fps
PS5 / XBSX
  Target Resolution*  1080p                 1280p              1440p
  GI Technique        Baked GI              Raytraced GI       Raytraced GI
PS5 Pro
  Target Resolution*  1080p                 1280p              1440p
  GI Technique        Raytraced GI          Raytraced GI       Raytraced GI + Specular
XBSS
  Target Resolution*  X                     X                  900p
  GI Technique        X                     X                  Baked GI
*Dynamic Resolution
We have 3 modes on consoles: perf, balanced and quality with various target
resolutions.
We decided very early in the development we’d favor Raytracing in Quality modes.
21
Rendering Assassin’s Creed
22
World Structure
• 16x16km
• World partitioning
• Multi-user editing
• Long range rendering
• Streaming grid layers
• LOD Meshes
• Fake Entities
• Vistas
• Data Driven
*click* The world is partitioned in cells and the engine is built with multi-user editing in
mind.
23
Data Driven World Structure
• Short range grid
• Loading range: 96m. Cells: 32m
• Small objects that don’t need to be seen after 96m
• Main grid
• Loading range 128m. Cells: 32m
• Regular props, NPC spawners, … the heaviest
• Long range grid
• Loading range: 384m. Cells: 128m
• Large objects
• Fake entity grid:
• Loading range 1024m (near)
• Loading range 4096m (far)
• Cells: 512m
• “Point cloud”
• Mass impostor rendering
• Until 8192m
• Terrain Vista
In ACShadows, we have various data driven loading grids to represent various levels of
details.
(Describe)
Beyond those grids, we still render what we call “Point Clouds”, our mass impostor
renderer, and the Terrain Vista.
24
Terrain
25
Terrain
LOD Meshes
26
Terrain
LOD Meshes
Fake Entities
27
Fake Entities
28
Fake Entities
LOD Meshes
29
“Point Clouds”
I mentioned before our “point cloud” renderer. It’s a mass quad renderer that we use to
render our trees up to 8 km away.
30
“Point Clouds”
31
“Point Clouds”
32
GPU Driven Pipeline(s)
• Cornerstone of Anvil Engine since Assassin’s Creed Unity (2014)
• Shipped many games
• Multiple iterations
GPU Cluster Culling & Indirect Drawcalls | Mass Instancing and Batching | Continuous LODs and Pixel-Precise Geometry
GPU driven pipelines have been the cornerstone of Anvil since ACU in 2014.
They have shipped in many games since then and gone through multiple iterations.
33
GPU Driven Pipeline(s)
• Batch Renderer in Assassin’s Creed Unity [Haar and Aaltonen 15]
• DirectX 11 class APIs
• MultiDrawIndexedInstanceIndirect
• Discrete LODs of Clustered meshes
• Fine culling at the instance/cluster/triangle level
In ACUnity, the Batch Renderer, as we call it, was a DX11 class API GPU Driven pipeline.
The main goal of the Batch Renderer was to reduce the cost of expensive DX11 draw
calls.
34
GPU Driven Pipeline(s)
• Significant work still on the CPU
• No bindless
• per material batching
• large number of drawcalls
• No Async
• per pass culling before each pass
• delays actual rendering
A significant part of the Batch Renderer still ran on the CPU.
It didn’t support bindless resources, which limited the amount of batching it could
perform (per material batching).
35
GPU Driven Pipeline(s)
• GPU Instance Renderer since AC Valhalla/Mirage [Bussière and Lopez 24]
• DirectX 12 class APIs
• Execute Indirect
• More work on the GPU
• Batch on load
• Bindless
• Async Culling
The second version of our GPU Pipeline is called GPU Instance Renderer or GPUIR.
The GPUIR performs batching on load and is designed to cull millions of instances per
frame.
And as you can see, more steps were moved to the GPU.
36
GPU Driven Pipeline(s)
• Database
• container for data structures that can be shared between the CPU and the GPU
• DOD (Data Oriented Design), with the convenience of C++ OOP on the GPU
• relies on Shader Input Groups (SIG) to generate binding code for C++ and HLSL [Rodrigues 17]
• Share full scene description between the CPU and the GPU
• Database analogy
• each member in the structure would be a column
• each instance within the array of MyObject would be a row in the table
I know what you are thinking… but it’s not what you are thinking
*click*
Database is a container for data structures that can be shared between the CPU and
the GPU.
We rely on Shader Input Groups (SIG), our internal compiler for shader bindings, to generate binding code for C++ and HLSL.
*click*
In the context of our GPU Driven Pipeline, Database is used to share the full scene description between the CPU and the GPU.
*click*
We use database analogies to structure our data.
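To make the row/column analogy concrete, here is a minimal C++ sketch of the idea, with hypothetical names (Table, CullInstance); this is an illustration of the concept, not Anvil’s actual API. Each struct member plays the role of a column, each array entry is a row, and a dirty flag drives replication toward the GPU copy.

#include <cstdint>
#include <vector>

// One logical "row": each member plays the role of a column in the table.
struct CullInstance
{
    float    boundingSphere[4]; // xyz = center, w = radius
    uint32_t meshIndex;         // row index into a Mesh table
    uint32_t flags;
};

// CPU-side table; its contents would be replicated into a GPU buffer
// (e.g. a ByteAddressBuffer) following the chosen replication policy.
template <typename Row>
class Table
{
public:
    uint32_t AddRow(const Row& row)
    {
        m_rows.push_back(row);
        m_dirty = true; // mark for replication to the GPU instance
        return static_cast<uint32_t>(m_rows.size() - 1);
    }

    const Row& operator[](uint32_t i) const { return m_rows[i]; }
    bool       NeedsUpload() const          { return m_dirty; }
    void       MarkUploaded()               { m_dirty = false; }

private:
    std::vector<Row> m_rows;
    bool             m_dirty = false;
};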
37
Database
• SIG Compiler
38
Database Relations
• 1 to 1
• 1 to n
One of the key features of database is the support of relations. Relations are links
between different tables.
39
Database Replication
• Different table instances for CPU and GPU storage
• Copy
• Copy the whole table from CPU to GPU (ByteAddressBuffer)
We support different modes of data replication to ensure the data is propagated from one
instance to another (generally from the CPU to the GPU).
• Copy
The simplest replication mode available: a full copy to a ByteAddressBuffer.
40
Mesh Description
And this is what our scene description looks like on the GPU. It’s quite close to its CPU
representation.
In a nutshell:
Each LOD has 2 LODDistances, 1 main view and 1 for shadows to allow customization of
shadow LOD distances,
The association of a geometry with a PSO is made at the Submesh level with a
BatchHash.
We make sure we have only 1 batch per PSO and there are no duplicates.
*click*
And this is how we declare this with tables, in SIG. Each type is a database table, and they
are linked by row and range properties.
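As a rough illustration of those row/range links (the names are hypothetical and this is not the actual SIG declaration), the hierarchy described above could be mirrored like this, with each struct acting as one table:

#include <cstdint>

// Hypothetical mirror of the mesh description: each struct is a table,
// linked by row indices and [first, count] ranges.
struct SubmeshRow
{
    uint32_t batchHash;     // associates this submesh's geometry with a PSO batch
    uint32_t firstCluster;  // range into a Cluster table
    uint32_t clusterCount;
};

struct LodRow
{
    float    lodDistanceMain;   // LOD distance for the main view
    float    lodDistanceShadow; // separate LOD distance for shadows
    uint32_t firstSubmesh;      // range into the Submesh table
    uint32_t submeshCount;
};

struct MeshRow
{
    uint32_t firstLod; // range into the LOD table
    uint32_t lodCount;
};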
41
Scene Description
Mesh Description
Now if we zoom out a bit, we want to focus mainly on LeafNodes and CullInstances:
*click*
*click*
We gather instances that are spatially close to each other in the same leaf node to
minimize their bounding volume and make coarse culling more efficient.
42
Scene Description
CPU
Declare tables Update tables
GPU
Fetch data and use it
Add a mesh
*click*
First, we declare CPU and GPU tables separately.
*click*
Then we populate tables. This is how we add a mesh to the CullMesh table and set various
properties to this mesh entry.
*click*
Next, we update GPU tables, following the chosen update policy (either full copy, or
dirtied entries).
*click*
Finally, we fetch data on the GPU and use it as we like.
The interface is very CPP like, while maintaining maximum cache efficiency in data
access patterns.
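Tying this flow back to the Table sketch from earlier, a hypothetical add-then-fetch sequence could look like this; the GPU-side fetch is shown as a comment since the real binding code is generated by SIG:

// CPU side: declare and populate (hypothetical names).
Table<CullInstance> cullMeshTable;

CullInstance inst = {};
inst.meshIndex = 42; // illustrative mesh entry
const uint32_t row = cullMeshTable.AddRow(inst);

// Update step: replicate to the GPU following the chosen policy
// (full copy, or dirtied entries only).
if (cullMeshTable.NeedsUpload())
{
    // UploadToGpu(cullMeshTable); // e.g. copy into a ByteAddressBuffer
    cullMeshTable.MarkUploaded();
}

// GPU side (HLSL, generated by SIG) would then fetch the same row:
//   CullInstance inst = CullMeshTable.Load(row);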
43
GPU Instance Renderer (GPUIR)
Finally, this is what the entire GPU Instance Renderer pipeline looks like.
It has many steps, some of them being very specific to our engine. *click*
*click* On the CPU, there is a coarse Frustum culling to cull leaf nodes (basically groups
of instances)
On the GPU, we have 2 large steps: Frame Culling and Per Pass Culling.
*click* Frame Culling culls instances that passed CPU culling against all view frustums.
Then LOD selection and blending logic is also performed at this stage.
*click* Per Pass Culling performs pass specific Frustum + Occlusion culling. For example,
Sun Shadows perform anti frustum culling.
It also prepares instance data required by VS and PS to access geometry and material
data for a given instance.
Finally at render time, we support optional Cluster and Triangle culling before the final
ExecuteIndirect calls.
44
Micropolygon geometry (MPH)
• Based on Nanite [Karis 21]
• Evolution of our clustered mesh pipeline
• Now a cluster hierarchy
• Continuous LODs
• Per cluster geometry streaming
• Mesh Shader / Software Rasterization
• Integrates to GPUIR
*click*
It’s based on Nanite and is an evolution of our clustered mesh pipeline.
And polygons are either hardware or software rasterized based on their size.
45
Micropolygon geometry
• Used for Gbuffer and Shadow rasterization
• ½ memory load vs standard GPUIR geometry
• MPH rasterization similar to others
• METIS-based partitioning for Cluster DAG continuous LOD selection
• Mesh Shader / Software Rasterization into a 64bit Visibility Buffer
• 2 Pass Hierarchical Z Buffer occlusion system
• Deferred Material rendering from Visibility Buffer into Gbuffer
• GPU Feedback based cluster page streaming
• 128 triangles per cluster
Geometry takes roughly half the memory of the same GPUIR geometry
We use METIS [library] based partitioning (i.e. group clusters with the most shared edges).
It renders Deferred Material from a visibility buffer, output into the Gbuffer
Clusters are made of 128 triangles [clusters used to be 64 tris in ACU to match the nb of
lanes per wavefront],
and cluster streaming is based on a GPU Feedback loop [culling traversal – page
requests – async cpu readback]
46
Micropolygon geometry
• Simplygon for cluster group simplification
• Support of user custom shader code (manual API)
• Auto-analytic derivatives
• Lots of tweaking to reduce root page size for a high amount of “lego-kits”
Some differences:
And finally, we had to do a lot of tweaking to reduce root page size to support a high
number of instances in our game.
47
GPU Driven Pipeline(s)
• Micropolygon
• Instances
• 28k main view
• Triangles
• 34M rendered (all passes)
• 18M main view + 16M shadows & misc
• Rasterization
• 30M software + 4M hardware
Some stats
*click* In this game, Micropolygon is our main city architecture and static geometry
renderer
In Kyoto, we have around 28k micropoly instances, for 34M rendered triangles. As you
can see, in this case, 90% of the triangles are software rasterized.
The GPUIR, by comparison, has around 9k instances spread across all passes and renders 1.7M triangles.
48
GPU Driven Pipeline(s)
• Micropolygon
• Instances
• 1k main view
• Triangles
• 7.6M rendered (all passes)
• 1.6M main view + 6.0M shadows & misc
• Rasterization
• 7.3M software + 0.3M hardware
Now in a forest.
*click* Micropolygon does less work in this scene but still renders some buildings and
rocks.
*click* The GPUIR is the pipeline doing the heavy lifting here, compared to the previous
frame.
We cull 3M instances to finally render 30k of them. The scene has 1.5B triangles, and we
end up rendering close to 7M of them.
49
Global Illumination
• Large Scale Open World GI
• Data Oriented Design
• Multiple iterations
Open World Irradiance Volumes | Large Scale Sparse GI | Ray traced Diffuse and Specular
It’s always been a first-class citizen in Assassin’s Creed games, and in my heart
In ACOrigins in the middle, we made it sparse to adapt it to larger 16x16km open worlds.
Finally, in ACShadows, on the right, we added support for seasons, and developed a new
ray traced GI pipeline.
50
Global Illumination
• Scalable Global Illumination pipeline
• Mandatory ray tracing in the hideout
Baked GI: PS5/XBSX Performance modes, XBSS (memory constraints), Low End PCs
Ray traced diffuse: PS5/XBSX Quality modes, PS5 Pro Performance mode
Ray traced diffuse + specular: PS5 Pro Quality mode, High End PCs
Ray tracing is costly, and we didn’t want to sacrifice other parts of the game to ship it at
all cost, especially at 60fps on consoles.
Moreover, considering the hardware landscape, it made sense to give more choice to the
players.
Our GI pipeline is a spectrum, going from baked to ray traced diffuse and specular. Ray
tracing can be either rendered in hardware or software with compute shaders.
There is one special case in this game. We have a hideout that’s fully dynamic and requires
ray tracing or it’ll have no GI.
51
Baked Volumetric Diffuse Global Illumination
• Irradiance Volumes [Oat 05] in Assassin’s Creed Unity
• Baked on GPU using Ray Bundles [Tokuyoshi 11]
• [Hobson 19] for more details (based on Stephen Hill’s work on ACU)
• Multiple iterations
• Time of day support added in Assassin’s Creed Syndicate with key frame blending.
It was baked on the GPU using Ray Bundles. I’ll refer you to the great talk of Josh Hobson
about the Indirect Lighting Pipeline of God of War.
It’s based on the work done in ACU and gives a good view of what we did. The main
difference with ACU is the open world structure.
This system went through multiple iterations, it didn’t support time of day in ACU, but it
was added in the next game, Syndicate.
52
Assassin’s Creed Unity
• Irradiance Volumes
• Uniform probe distribution
• Mipmapped 3D Volumes
• Cover the whole open world
*click*
In ACUnity, the world was covered with uniform mipmapped irradiance volumes.
Mipmaps acted as level of details at farther distances.
*click*
It didn’t support dynamic time of day. It was faked with 4 fixed static ambiances.
Later, AC Syndicate added dynamic time of day by blending between 2 fixed GI key
frames.
53
Baked Volumetric Diffuse Global Illumination
• Uniform probe distribution in Assassin’s Creed Unity
• Wouldn’t scale to 16x16km
                             Area (km²)   Probe distance (m)   Times of day   Bake time (d)   Disk size (GB)
Assassin’s Creed Unity            4              0.5                 4               4               15
Assassin’s Creed Syndicate        6              1                   9               3                9
Assassin’s Creed Origins        256              1                  11             156              468
As you probably know, Assassin’s Creed pivoted to more rpg like experiences with larger
open worlds since AC Origins.
*click*
Naively extrapolating numbers from AC Unity or Syndicate, we see this baked GI technique
would never have scaled to a 16x16km open world.
*click*
If we push this even further, Shadows adds 4 seasons.
Now we’re close to 2 terabytes of GI data, and over 600 days of bake time.
Even if we ignore the bake time and imagine we had a very large computing farm, the disk size alone would still be prohibitive.
54
Baked Volumetric Diffuse Global Illumination
• CPU Path Tracing
• Easier to debug
• No memory cliff
• Easier to distribute
• diverse GPU / drivers farm
• faster to dispatch
• Embree SIMD
• standalone exe
• SN-DBS Distribution
• hundreds of machines
[Diagram – Host CPU: Export scene, Density Map, Sparse Probe Placement; Distributed: Surfelize, Surfel lighting, Path tracing, GI Data block Compression]
First thing we did was to shift to a CPU Path Tracing back-end. For several reasons.
• GPU baking requires a lot of VRam, and memory cliff was an issue especially at the
time of AC Origins.
• There are fewer variations due to drivers and diverse GPUs. Our inputs are
deterministic.
• It is faster to dispatch because less work is performed on the local machine
*click* We also introduced a density map, and sparse probes to our baked GI.
55
Baked Volumetric Diffuse Global Illumination
• Hand painted density map (“region layout”)
In baked GI solutions, texel or probe density is everything. But it doesn’t need to be uniform.
We use a density map to paint the GI resolution and to ensure we have the right amount
of details where it matters the most.
With this, only a fraction of the map is at the resolution of older games like AC Unity
(50cm).
But this area is still larger than the area of those games. So, it’s still not enough to
address the GI data size.
56
Baked Volumetric Diffuse Global Illumination
• Probes distributed in a sparse octree
• A node is subdivided if it contains surfels
And on average, we output 10% of the probes we’d get with a uniform distribution.
57
Baked Volumetric Diffuse Global Illumination
• 11 key frames (data driven)
• 0: local lights
• 1-10: sunlight for different times of day
58
Baked Volumetric Diffuse Global Illumination
• Sun, local lights, sky stored separately
• YCoCg
• Y (luma) directional stored as spherical harmonic SH4
• CoCg (chroma) as scalar values
• Seasons
• Minimal bake size overhead
• separate CoCg data per ambiance (season)
• intensity Y remains the same
• 3 bakes (Spring/Summer, Autumn, Winter)
• Calculate GI for each ambiance then combine results (CoCg) in a single output
• Possible mismatch between Y and CoCg
• But geometry mostly the same across seasons
• Mainly vegetation is affected (tree leaves, clutter, …)
• Problematic geometry can be skipped in bakes
We store sun, local lights and sky separately in the YCoCg format.
1) Spring and Summer would output the same diffuse GI (in fact, they are very similar in
terms of albedo).
2) Luma remains the same for all seasons, so we store it only once, but we store separate
chroma values per season.
There can be a mismatch between luma and chroma values if a geometry changes or
doesn’t exist in each season.
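A sketch of what one probe’s payload could look like under this scheme; the layout and names are guesses derived from the bullets above, not the shipped format:

// Directional luma as an SH4, chroma as scalars, with chroma duplicated per
// season while luma is stored once. Sun, local lights, and sky would each
// have their own block like this.
struct BakedProbeLuma
{
    float shY[4]; // Y (luma), 4 spherical-harmonic coefficients, shared by all seasons
};

struct BakedProbeChroma
{
    float co; // chroma, one pair per ambiance
    float cg;
};

struct BakedProbe
{
    BakedProbeLuma   luma;      // stored once
    BakedProbeChroma chroma[3]; // 3 bakes: Spring/Summer, Autumn, Winter
};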
[Diagram: a cell’s GI data blending in place from LOD1 to newly loaded LOD0 data]
We perform cascade blending, and in place GI Block LOD blending when a new block is
loaded.
60
Baked Volumetric Diffuse Global Illumination
• Moved from discrete GI blocks to cascaded GI volumes
On the left, we had a discrete grid of GI Blocks. Their mipmaps, or LODs, stream with the
view distance, exactly as we’d do with a distance based 2D texture streaming.
On the right, our sparse GI blocks are inserted into cascaded GI volumes to create a
much more continuous level of detail.
61
Predicted disk size (naïve Assassin’s Creed Unity approach)
• Uniform grid
• 1.9TB
Final disk size
• Sparse grid (10% avg vs Uniform Grid)
• Density map
• Dictionary compression
• 9GB
• 200x reduction
*click*
If we had followed the approach of Assassin’s Creed Unity, we would have ended up with
close to 2 TeraBytes of GI data in Shadows.
In the end, with everything I’ve just detailed, we end up with just 9 Gigs.
62
Spring/Summer Autumn Winter
Now I’ll show you some results. Even though we didn’t aim for perfect accuracy, the results are plausible.
63
Specular reflections
Screen Space Reflections
64
Specular reflections
• Static “Gbuffer” Local Cubemaps
• Baked, dynamically relit
• Albedo, Normal, Depth
• Shadows
• 8bits texture, 8 keyframes, 1 bit per key
• Select 2 “key frames” closest to current ToD
• Compute blend factor with ToD
• Support Time of Day and Seasons
• Variation States
Shadows are also baked. To do so we store 8 keyframes in an 8-bit texture: 1 bit per key.
Then at runtime, we select the 2 keyframes closest to current ToD, and blend them
accordingly
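A minimal sketch of that decode, assuming ToD is normalized to [0,1) and using a hypothetical helper name; the real shader would fetch the 8-bit texel from the LCM shadow texture:

#include <cstdint>

// Decode a baked LCM shadow value: 8 keyframes packed as 1 bit each in an
// 8-bit texel; pick the 2 keyframes around the current time of day and blend.
float SampleBakedLcmShadow(uint8_t packedKeys, float timeOfDay01)
{
    const int   keyCount = 8;
    const float t  = timeOfDay01 * keyCount;          // position in keyframe space
    const int   k0 = static_cast<int>(t) % keyCount;  // closest lower keyframe
    const int   k1 = (k0 + 1) % keyCount;             // next keyframe (wraps at midnight)
    const float a  = t - static_cast<float>(k0);      // blend factor from ToD

    const float s0 = ((packedKeys >> k0) & 1) ? 1.0f : 0.0f; // 1 bit per key
    const float s1 = ((packedKeys >> k1) & 1) ? 1.0f : 0.0f;
    return s0 + (s1 - s0) * a; // blended shadow term
}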
65
Specular reflections
• Variation States
• Spring & Summer
• Autumn
• Winter
• 16k LCMs! (+ Variations)
In total, we have 16k cube map entities in the world, without counting their variations.
66
Specular reflections
• LCM vs DCM
• Fake entities (lower LODs)
• Far shadow
• Low frequency updates
You can see the difference between the dynamic cube map on the left, and a gbuffer local
cube map at the same location.
67
Specular reflections
• LCM seasons (variations)
Finally, this is an LCM with its seasons, and various times of day, for a given location.
68
Ray tracing
• Initial technical choices
• Inspired by Snowdrop ray tracing pipeline [Kuenlin 24]
• Software (SWRT) [Koshlo 24] and Hardware (HWRT) ray tracing
• Inline for optimal traversal loop
• Uber shader approach to control performance
• But wanted to keep our options open
• Abstraction Inline/Non-inline/SWRT/HWRT
We decided to follow a similar design, favoring inline ray tracing, and supporting both
hardware and software ray tracing.
But we were lacking data to back this up and wanted to keep our options open. So, we
went for a complete abstraction of inline/non-inline/software/hardware.
69
Ray tracing stack
• Fusion
• Hardware abstraction layer of 3D APIs (CPP and HLSL)
• Shared across multiple engines
• Insourcing model
Anvil Engine
Our ray tracing stack is abstracted under a HAL abstraction we call Fusion.
Fusion is shared across multiple engines and developed following an insourcing model.
70
Ray tracing stack
• Unified API
• Cross platform
• Hardware/Software
• Inline/Non-inline
[Code captions: TraceRayInline | D3D12 Implementation | Interface implemented for each platform / API, and SWRT]
On the right, in green, you can see our ray tracing loop, with a callback interface.
71
Ray tracing stack
• Inline/Non-inline
When using inline raytracing, depending on the ray query state, it will invoke the
corresponding callback (hit, anyhit, miss, …).
72
Ray tracing stack
• Callback system
In this case, with non-inline raytracing, you can just call the right callback inside the
corresponding shader.
Of course, for hit shaders, you’ll probably want to use a shader table instead.
But it works and it makes it very easy to switch between one pipeline and another, and test
things.
73
Ray tracing stack
• CPP
• Trivial to switch between Software / Hardware Ray tracing
Inline raytracing
Non-inline raytracing
In terms of CPP code, switching between Software and Hardware raytracing is simply
done with an enum.
And the difference in setting up inline and non-inline ray tracing is minimal; it mainly comes down to the Shader Table setup.
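A hedged sketch of what such a CPP-side toggle can look like (the names here are illustrative, not the engine’s actual types): the backend and pipeline flavor are simple enums, and only the non-inline path needs a shader table.

// Selecting the ray tracing flavor per pass; only the non-inline pipeline
// needs a shader table to dispatch hit/miss shaders.
enum class RayTracingBackend { Software, Hardware };
enum class RayTracingMode    { Inline, NonInline };

struct RayTracingPassDesc
{
    RayTracingBackend backend     = RayTracingBackend::Hardware;
    RayTracingMode    mode        = RayTracingMode::Inline;
    const void*       shaderTable = nullptr; // required only for NonInline
};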
74
Ray tracing pipeline
• Hybrid RTGI
• No vertex animation/skinning in BVH
• Per pixel ray tracing combines
1. Secondary Rays
1. Screen space ray tracing
2. World space ray tracing
• Hardware or Software ray tracing
• Acceleration Structure or custom BVH
2. Irradiance from DDGI-like ray traced probes
• 5 cascades of 16x16x8 probes
• Every 2m in the 1st cascade, doubled with each cascade
• 10k probes, 1024 probes updated each frame
- Per Pixel Ray tracing, made of screen space rays and world space rays.
- And DDGI-like probe cascades. We have roughly 10k probes, and update 1k probes
each frame
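The probe counts check out: 5 cascades of 16x16x8 probes is 10,240 probes, matching the ~10k figure. A small sketch of the bookkeeping and the spacing rule (2m in the first cascade, doubled with each cascade):

// Probe cascade bookkeeping as described above.
constexpr int kCascades         = 5;
constexpr int kProbesPerCascade = 16 * 16 * 8;                   // 2048
constexpr int kTotalProbes      = kCascades * kProbesPerCascade; // 10240 (~10k)

// Probe spacing in meters: 2m in cascade 0, doubled with each cascade.
constexpr float CascadeSpacing(int cascade)
{
    return 2.0f * static_cast<float>(1 << cascade); // 2, 4, 8, 16, 32
}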
75
Ray tracing pipeline
• “Ray tracing Gbuffer”
• Ray tracing hits stored as a set of PBR properties
• Deferred Hit Lighting pass
Our ray tracing hits are stored as a set of PBR properties, in a “ray tracing gbuffer”.
76
Ray tracing pipeline
[Diagram: Secondary ray – screen-space ray tracing (SS hits) resumed as a world-space ray (WS hits); + RT Specular*]
We resume the screen-space ray in world space with a hardware ray to find a hit in the AS.
We store the hits into our ray tracing “gbuffer” at the ray origin location.
Then all these hits are lit in the hit lighting pass.
If there is no hit for a given location, we process a miss and sample the sky.
Finally, we denoise the result, optionally add some extra RT-AO, add RT Specular if
available, to get the final image.
77
RTGI
Raytracing
1. We sum 2 systems with different frequencies: per pixel rays, and ray traced probes.
2. We ray trace diffuse at quarter res on most platforms, so we lose some high
frequency details.
3. There can be some disparities between the raster world and the ray tracing world.
For example, our grass doesn’t exist in the acceleration structure for perf reasons.
*click* So, we reapply some RT-AO to our general AO terms. You can see how it helps
houses, large vegetation, …
*click* We also still evaluate a subtle SSAO term to catch details that might be missing
from low res BVH or low ray count (quarter res)
78
RTGI + RTGI-AO
79
RTGI + RTGI-AO + SSAO
80
RTGI + RTGI-AO + SSAO
81
Acceleration Structure
• Ray tracing materials
• Bindless textures or average color
• Albedo and alpha
• Texture mip bias
[Images: Primary Rays – no textures | Primary Rays – albedo textures]
• Average PBR attributes
• Overrides
• Cull meshes by size or type
• Clutter, small props, …
• “BVH Quality”
[Images: BVH Quality – Low | BVH Quality – High]
We support both bindless textures or average colors from our material textures.
We use a global LOD offset to determine mesh LODs used in the BVH.
82
[Images: Acceleration Structure – No Textures | Acceleration Structure – Albedo Textures]
- At the top right is what we shipped in shadows, the acceleration structure using
bindless albedo textures.
83
Acceleration Structure
• Alpha test
1. Blue noise approximation (Snowdrop)
• Encode a transparency factor
• opaque texel count / total pixel count
• Stochastically texkill the right amount of texels
[Images: Blue noise alpha – 10 hits | Blue noise alpha – 2 hits]
2. Bindless Alpha texture
• Costly AnyHit with many overlaps
3. Tracing Max Transparent Hit Count
• Force opaque after n hits
• Biased result
- If you want a precise result: AnyHit is very costly with many overlaps.
- And if you try to limit the number of hits, you’ll get way too much occlusion.
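A sketch of option 1, the blue-noise approximation, under the encoding described above (a single precomputed transparency factor per material; the function name is illustrative):

// Stochastic alpha test: accept the hit as opaque with probability equal to
// the material's opaque ratio (opaque texel count / total texel count), so
// rays statistically see the right amount of occlusion without an alpha fetch.
bool AcceptAlphaTestedHit(float opaqueRatio,  // precomputed per material
                          float blueNoise01)  // per-ray blue-noise value in [0,1)
{
    return blueNoise01 < opaqueRatio;
}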
84
Acceleration Structure
• Alpha test
• Scale triangles according to average opacity
• 30% faster to trace
• Diffuse GI very close to reference
• Close hits handled in screen space
[Image: AnyHit with alpha texture]
It works very well because closest hits are handled in screen space and are much more
accurate.
*click* And it’s also a very good and cheap approximation for specular reflections.
85
Acceleration Structure
• Alpha test
• Scale triangles according to average opacity
• 30% faster to trace
• Diffuse GI very close to reference
• Close hits handled in screen space
[Images: AnyHit with alpha texture | Alpha test approximation w/ ray traced specular | ClosestHit with scaled opaque triangles]
86
Acceleration Structure
• “RT Materials as a uber shader” approach (inline RT)
• Panicked after watching [Gong 21]
• (Capcom JSON shader mapping)
• But Ubi very strict about shader variety
• Low master shader count, owned by TD Arts
• GPU Driven pipeline reasons (per PSO batching)
• Relatively easy to write a "one fits them all" RT uber shader with Albedo textures
• Works for the vast majority, but epic fail for a few cases that require manual fixing/overrides
Because we use Inline RT, we rely on an uber shader approach for our ray traced
materials.
*click* Thankfully, Ubi is very strict with shader variety and management.
We have a low master shader count for GPU Driven Pipeline reasons.
In the end it was relatively easy to write a “one fits them all” uber shader
*click*
But there were still a few epic fails, as in these screenshots. They were easily mitigated
with overrides.
87
Acceleration Structure
• Material table
• Unified representation
• PBR materials
• Bindless albedo and alpha textures
• Rough approximation of other PBR values
• Averaged mip 0 of PBR textures
• Mapping InstanceID+GeometryIndex to Material Table
• Season
• Season specific material variations
• Account for season specific material logic
• Fallen leaves, LUTs, …
• Limitations
• Costly deferred weather
• fully evaluated for specular (but not for diffuse)
• static snow applied for diffuse and specular
• baked in the terrain vista
As I said before, we use a unified representation of our materials for ray tracing
They are stored inside a material table that we access using the hit geometry IDs.
Seasons are handled with material variations in this table, to account for fallen leaves,
or the use of look up tables for leaf colors.
Deferred weather is fully evaluated for specular ray tracing, but not for diffuse, because
it’d be quite costly to evaluate it multiple times.
Static snow is baked into the terrain vista in winter though, so it’s there even for diffuse.
88
Spring Summer Autumn Winter
This is what our acceleration structure looks like for each season.
89
Acceleration Structure
• Typical urban scene
• 2k+ BLASes
• 30k+ instances
• 300MB BLAS Data
• 20MB TLAS Data
[Image: Acceleration structure]
90
Software ray tracing
• Stack-based traversal [Koshlo 24]
• Compute shader based
• Targets low-end platforms
• Dynamic hideout
[Image: 3-level BVH partitions]
• 3-level BVH with partial updates
It relies on space partitioning to perform partial updates only on cells where something has
changed.
91
Ray traced Indirect Diffuse
• Indoor volumes
• Avoid indoor light leaks
• List of planes
• Same data as BakedGI
• Classification of probes
• Sampling
• Find 8 surrounding probes
[Image: Probe classification OFF]
• Compute probe weights
• Distance, visibility, …
• Sample weighted probes with same classification
[Image: Probe classification ON]
They are made of a list of planes. It’s the same data we use for baked GI, deferred
weather and other systems.
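A sketch of the classified sampling described in the bullets above: weight the 8 surrounding probes (distance, visibility, …) but only accept probes whose classification matches the shaded point’s, to avoid indoor light leaks. Struct and function names are illustrative.

#include <cstdint>

struct Probe
{
    float    irradiance[3];
    uint32_t classification; // derived from the indoor volume planes
};

void SampleClassifiedProbes(const Probe probes[8], const float weights[8],
                            uint32_t pointClassification, float outIrradiance[3])
{
    float totalWeight = 0.0f;
    outIrradiance[0] = outIrradiance[1] = outIrradiance[2] = 0.0f;

    for (int i = 0; i < 8; ++i)
    {
        if (probes[i].classification != pointClassification)
            continue; // reject probes on the other side of indoor planes
        totalWeight += weights[i];
        for (int c = 0; c < 3; ++c)
            outIrradiance[c] += probes[i].irradiance[c] * weights[i];
    }

    if (totalWeight > 0.0f)
        for (int c = 0; c < 3; ++c)
            outIrradiance[c] /= totalWeight; // renormalize over accepted probes
}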
92
Ray traced Indirect Diffuse
• Light Probes convolved into
• Irradiance Volume [McGuire 19]
• Cosine weighted irradiance
• 5x5 pixels (octahedral mapping)
• Radiance Cache
• Higher frequency version of IV
• 10x10 pixels (octahedral mapping)
• Use as a cache fallback
• Radiance Cache
• Short rays with probe fallback
• Temporally stable
• Less precise
• Trade-off between precision and stability
[Images: Hit Lighting | Radiance cache | Irradiance]
To stabilize the signal, we trace shorter rays, and fallback to the radiance cache, which
is temporally stable.
You can see in the video a challenging scene, and the result with and without the cache.
The output is a bit less precise, but it’s a trade off between precision and stability.
93
Translucency
• Vegetation, Shoji doors, etc.
• Stochastically flip rays on translucent surfaces
94
Ray traced Indirect Diffuse
• Omnidirectional Clustered Lights
• Based on deferred clustered lighting
• 2-level hierarchy
• Level 0: 260k Clusters
• Level 1: 4k “HighClusters” (16x16x16)
• A HighCluster is 4x4x4 Clusters (64)
• Uniform grid instead of “Froxels”
• Centered around camera
• “Just” a different spatial mapping
• 10x speed up in challenging scenes
Local lights are inserted in what we call an omnidirectional clustered lighting structure.
The main difference is that the clustered volume is mapped on a uniform grid around
the camera, instead of froxels.
We first perform a coarse cluster culling, and then a finer grain culling at the cluster
level.
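As a hedged sketch of that uniform, camera-centered mapping (the cell size is a made-up constant; the talk only gives the counts and layout): a 16x16x16 grid of HighClusters is ~4k entries, and at 4x4x4 clusters each that is 16³ × 64 = 262,144 ≈ 260k clusters, matching the slide.

#include <cmath>

struct Int3 { int x, y, z; };

constexpr int   kHighGridDim  = 16;   // 16x16x16 HighClusters = 4096 (~4k)
constexpr int   kSubdivision  = 4;    // 4x4x4 = 64 clusters per HighCluster
constexpr float kHighCellSize = 8.0f; // meters, hypothetical

Int3 WorldToHighCluster(const float p[3], const float cam[3])
{
    // Uniform grid centered on the camera: "just" a different spatial
    // mapping than view-space froxels.
    auto cellOf = [](float pos, float camPos) {
        return static_cast<int>(std::floor((pos - camPos) / kHighCellSize))
               + kHighGridDim / 2;
    };
    return { cellOf(p[0], cam[0]), cellOf(p[1], cam[1]), cellOf(p[2], cam[2]) };
}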
95
Raytraced Indirect Specular
• Last minute development (delayed launch)
• Hybrid ray tracing
• Roughly the same design as ray traced diffuse GI
• High-end platforms (PS5 Pro Quality, PC)
• Denoising more difficult
• Iron out BVH quality issues
I discussed mainly ray traced diffuse, but we also support specular ray tracing.
It was a last-minute development after the game was delayed. It follows roughly the
same design as ray traced diffuse GI.
It was a relatively easy development, leveraging most of the work done for diffuse.
96
Denoising
• A-Trous
• SVGF-like denoiser [Schied et al. 17]
• ReBLUR - NVidia [Zhdan 21]
• Modular Snowdrop Denoiser (MSD)
• Spatial and temporal filter
• Based on recurrent blur approach
• Material masking for characters, vegetation, dynamic objects
• SH Denoising (YSHCoCg)
And we added SH denoising.
97
Raw output
This is the RAW output we get from our ray traced diffuse GI
Final lighting
98
Raw output
Denoising
99
Raw output
Denoising
SH Denoising
100
Raw output
Denoising
SH Denoising
Lighting result
101
Raw output
Denoising
SH Denoising
Lighting result
Final image
102
Baked GI
103
RTGI
104
RTGI+Spec
105
RTGI
106
RTGI+Spec
107
Ray tracing performances
• Ray traced diffuse probes
In terms of performance, this is what our ray traced diffuse probes cost.
108
Ray tracing performances
• Per pixel ray traced diffuse
• Quarter resolution (W/2 x H/2)
              PS5 1440p   PS5 Pro 1440p   XBSX 1440p   XBSS 900p   RTX 4080 1440p
SS tracing    0.54ms      0.33ms          0.43ms       0.44ms      0.11ms
WS tracing    1.38ms      0.72ms          1.31ms       1.38ms      0.19ms
Lighting      1.17ms      0.76ms          1.02ms       1.06ms      0.36ms
Denoising     1.91ms      1.31ms          1.54ms       1.35ms      0.50ms
Total         5.00ms      3.12ms          4.30ms       4.23ms      1.16ms
So 6ms of GPU time, spread across the gfx queue and the async queue.
109
Ray tracing performances
• Per pixel ray traced specular
• Half resolution (W/2 x H), looks significantly better than Quarter resolution (W/2 x H/2)
Finally, some numbers for ray traced specular. We wanted to ship it at quarter res like
diffuse.
Half res looked significantly better, and we had the frame budget for it on the PS5 Pro.
110
Weather and Seasons
• Systemic approach to weather and seasons
• “Atmos” fluid simulation
• Advects humidity, temperature, …
• “Ambiance Graph” drives all the logic
• VFX, Fog, Rain, Wetness, Puddle level
• Deferred rain and snow
• “Deep” Snow for footsteps
• Multistate entities
• Build multistate entity templates
• Data-driven game-specific logic (seasons, pristine/destroyed, …)
• Multiple looks, nav-meshes, …
And an “Ambiance Graph” to drive all the weather and seasons logic
111
Weather and Seasons - Atmos
• Real-time fluid simulation [Hädrich et al. 20]
• Propagates various atmospheric factors
• Feeds dynamic clouds, wind, and rain systems
• Simulate cloud formation and dissipation, …
Atmos would deserve a whole talk. It simulates and propagates atmospheric factors with
a fluid simulation
such as vapor, temperatures, humidity, … and simulates the wind using low resolution
voxel data.
These quantities are then fed to various systems, such as volumetric clouds to drive
their formation, wind, and rain.
On the right are graphs of various quantities as a function of altitude: temperature,
vapor, vorticity, …
112
Weather and Seasons - Ambiance
• Ambiances
• Time of day, weather, grading,
seasons, post effects…
• Mostly curves
• Limited logic
• Ambiance graph
• Data driven
• Visual scripting (node graph)
• Outputs custom UI
*click* We used it to drive time of day, lighting features, weather, grading parameters,
post effects, …
*click* For Shadows, we added the concept of visual scripting with an Ambiance Graph.
It’s based on our node graph system, and fully data driven.
The ambiance graph will consume inputs from the engine, from atmos and drive the
whole weather and season stack, ToD and so on with tech art driven logic.
113
Deferred Weather - Rain
• Deferred Rain [Lagarde 12]
• Wetness and Puddle Level
• Albedo darkening
• Roughness
Our deferred rain rendering is based on Sébastien Lagarde’s great blog post series.
We use it to manipulate and render wetness and puddle level. We darken albedos and
decrease roughness based on wetness levels.
114
Deferred Weather - Rain
[Images: 0% Wetness / 0% Puddles | 100% Wetness / 0% Puddles | 100% Wetness / 100% Puddles]
• Final Render
• Darken Albedo
• Lower Roughness
• Wetness [0..2]
• R = no wetness on material
• G = wetness factor [0..1]
• B = puddle factor [1..2]
• Weather Albedo
The middle row shows our wetness mask, encoded as a single float in the Gbuffer.
A value between 0 and 1 encodes wetness, whereas a value between 1 and 2 encodes a
puddle factor.
Red objects, mainly characters, don’t fall into the deferred wetness shading. Characters
use a dynamic character layer system.
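A sketch of decoding that single-float encoding ([0..1] = wetness, (1..2] = puddle factor on a fully wet surface); the helper and struct names are illustrative:

// Decode the weather value from the Gbuffer.
struct WetnessSample
{
    float wetness; // 0 = dry, 1 = fully wet
    float puddle;  // 0 = no puddle, 1 = full puddle
};

WetnessSample DecodeWetness(float encoded) // Gbuffer value in [0..2]
{
    WetnessSample s;
    s.wetness = (encoded <= 1.0f) ? encoded : 1.0f; // fully wet under any puddle
    s.puddle  = (encoded >  1.0f) ? (encoded - 1.0f) : 0.0f;
    return s;
}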
115
This is a step-by-step result:
116
117
118
Deferred Weather - Snow
• “Deep” Snow
• Data driven stamper: capsules, textures
• Terrain deformation, Heightmap
[Image: Footsteps stamping]
• Deferred Snow
• Like deferred rain approach
• Snow modifies material inputs
• Dynamic Snow Accumulation
• Painted mask (warm zones)
• Static/Dynamic snow
• Pseudo random sparkles
• Specular scale
• Indoor Masking
[Image: Static/Dynamic snow map]
*click* First there is what we call “deep” snow. It relies on a data driven stamper that can
stamp capsules, or textures.
It’s like the deferred rain. Deferred Snow also modifies material inputs.
We have the concept of cold and warm zones, driven by a static/dynamic snow mask.
119
Deferred Weather - Snow
• Snow modifies material inputs
It lerps albedos toward snow albedo (white) according to the current snow level.
And it also lowers translucency as the snow level increases.
120
Deferred Weather - Snow
• Microvisibility masking
Microvisibility buffer
121
Deferred Weather - Snow
• Threshold snow based on normal orientation
122
Deferred Weather - Snow
The snow will slowly decay and turn into wetness and puddles until it melts and dries.
123
Deferred Weather - Occlusion
• Top-down indoor depth map
• Mid-range
• Raster – same indoor volumes used for GI
• Occludes wetness/snow accumulation
To exclude deferred snow and rain from interiors we use the same indoor volumes we
use for GI.
124
Deferred Weather - Occlusion
• Oriented depth map
• Occludes precipitation particles and ripples
• Close range
• Regular geometry
• Rain orientation
For rain particles and ripples, we render the surroundings in a regular depth map
oriented in the rain direction.
125
Scalability & Performance
126
Scalability
• Very different frames in quality and performance modes
• Xbox Series X, Quality, 1620p, 33.1ms
Mainly because we ship different GI systems in performance and quality modes, our
frames look very different in either mode.
If we add to that the number of platforms we had to ship (PC, PS5, PS5 Pro, XBSX, XBSS,
MacOS, …), the complexity was unlike anything we’d faced before.
127
Platform Manager
• Data driven performance settings
• Per platform and context settings
• Mostly graphics and engine systems
• Terrain, streaming, fog, water,
shadows,…
• Auto generated UI
• Live editing
This is how we implement data driven performance settings, per platform and context. It
is mainly a tool for Tech Art Directors.
128
Platform Manager
• Profiles
• Platforms, “modes”
• Triggered by users
• Profile settings
• Requires reloading the world
• System and features scalability
• Profile boot settings
• Requires restarting the game
Profiles are basically platform modes, such as perf mode, quality mode.
Profile settings require reloading the world, while profile boot settings require restarting the
game (there are very few of them).
You can see some examples on the right. GI technique, Hair Strands memory budgets,
etc…
129
Platform Manager
• Contexts
• Game states
• Triggered by gameplay
• Context settings
• Runtime specific features
• Change upon game state changes
It is a way to fine tune render features and performances for specific scenarios.
130
Platform Manager
• Modifiers
• Triggered by data
• Painted region layout
• 3D volumes
• Localized performance issue
• dense forest, …
• Specific needs for specific areas
• Forest vs caves, …
Modifiers let us address localized performance issues or area-specific needs.
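As a hedged sketch of how these three layers could combine (the talk describes profiles, contexts, and modifiers; the priority order, types, and names here are assumptions): the most specific layer wins.

#include <string>
#include <unordered_map>

// Layered, data-driven settings: profile (platform/mode), then context
// (game state), then modifiers (painted regions / 3D volumes).
using Settings = std::unordered_map<std::string, float>;

float ResolveSetting(const std::string& key,
                     const Settings& profile,   // e.g. "PS5 Quality"
                     const Settings& context,   // e.g. a gameplay state
                     const Settings& modifier,  // e.g. a dense forest volume
                     float defaultValue)
{
    // Most specific layer wins.
    if (auto it = modifier.find(key); it != modifier.end()) return it->second;
    if (auto it = context.find(key);  it != context.end())  return it->second;
    if (auto it = profile.find(key);  it != profile.end())  return it->second;
    return defaultValue;
}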
131
Platform Manager
• First-class citizen
• Mandatory support for new systems
• Fast iteration and profiling
• Useful to iterate with vendors
• “Simulate” consoles in the editor
132
Transient resource tracking
• High level
• Resource lifetime
• Transient memory peak
• Included in performance captures
• Low level
• Inspect allocators
• Debug and optimize aliasing
• Memory waste
*click* There is a high-level view available in our internal perf capture tool, that shows
resource lifetime and the allocation peak during the frame (the green curve).
It is more of a logical view. We can color the allocations depending on their lifetime, or
their allocation size.
*click* There is also another, more low-level view. We output an SVG with a very detailed
view of our allocators.
It is very useful to detect memory waste, and view memory aliasing patterns.
133
Performance Telemetry
• Teleport camera to each cell
• Loading gates
• Perf counters North/South/East/West
• Auto CPU/GPU captures if low fps
• Compare builds to find regressions
• At different dates
• Against another platform
• …
World Telemetry
It teleports the camera in each world cell and takes a snapshot in each cardinal
direction.
The results are stored in a DB so we can track the evolution of our builds and compare them.
134
CPU work distribution
Comparison of two PS5 builds (GPU time)
We can track many things from the engine: cpu time, gpu time, npc count, dynamic
resolution factors, any metric really.
A few examples:
At the top left, it’s the distribution of CPU work over a session (between graphics, physics,
gameplay, …)
At the top right, I compare two PS5 builds to identify GPU time regressions.
At the bottom left, I show a heatmap of the dynamic resolution factor, to identify the
most challenging areas for the GPU.
Anything can be tracked and integrated in our telemetry, and it really helps spot
undesired behaviors in such a large game.
135
Conclusion
136
Conclusion
• Shadows really is a “Large Scale Systemic Open World”
• First game shipped in a monorepo with a shared engine
• Largest Assassin’s Creed game in terms of scope
• Ray tracing and virtualized geometry in a 16x16km Open World
• Most scalable version of Anvil to date
It’s the largest Assassin’s Creed game ever made in terms of scope.
It includes state of the art ray tracing and virtualized geometry in a large open world.
137
Conclusion
138
Conclusion
• Lots of technology developed during production
• Micropolygon, Ray tracing, Atmos, Terrain, GPU Scatter, Hair, …
• Micropolygon
• Had to be conservative and adjust production guidelines
• Extrapolate polygon budgets early in production
• Any mistake would mean reworking assets in the entire 16x16km world
These were developed during production, and we had to anticipate and extrapolate
extensively.
With a game of this size, there’s little room for error: any mistake could mean
revisiting and fixing assets across the entire world map.
*click* We’ll invest more and more in Micropolygon and push to increase geometric
details.
139
Conclusion
• Baked and Ray traced Global Illumination
• Good for players
• A headache for developers!
• Performance vs Quality Modes = very different frames
• Underestimated workarounds and hacks artists use with our Baked GI
• System specific bugs
• Real-time cutscenes
It increased the complexity of the game, and we ended up with 2 very different frames in
perf and quality modes.
We also clearly underestimated the workarounds and hacks artists used with our
BakedGI, especially in cutscenes.
And support was more complex because we had system specific bugs.
*click* More ray tracing in future games will help us simplify our pipeline.
140
Conclusion
• Combinatory complexity
• Difficult to QA
• 16x16km w/ Time of Day
• Systemic Weather
• 4 Seasons with variations
• Many platforms (PS5, PS5 Pro, XBSS, XBSX, PC, MacOS, Steam Deck)
• Performance/Balanced/Quality modes on Consoles
• Many graphics options
• Many upscalers (TAA, DLSS, XeSS, FSR, PSSR)
• Very different results
• Code complexity + vendor specific frameworks
• Microsoft DirectSR nice on paper
• But lagging behind bleeding edge updates
• No FrameGen support
Upscalers are evolving fast with new versions every now and then. We had specific issues
with each upscaler. They often output quite different results.
I really like the promise of Microsoft DirectSR to abstract all that, but these 2 points are
a deal breaker. I really hope they are addressed in the future.
*click* Considering all this, we think it’s time to rethink how we approach this complexity
and how we present it to the players.
141
Conclusion
• Graph based season and weather
• Easy to prototype and iterate
• Difficult to QA and repro issues
• Idea requires more iteration
*click*
We also definitely want to push frame modularization and customization further.
142
BIBLIOGRAPHY
• [Bussière and Lopez 24] GPU-Driven Rendering in Assassin’s Creed Mirage, GPU Zen 3
• [Koshlo 24] Ray Tracing in Snowdrop: Scene Representation and Custom BVH, GDC 2024
• [Kuenlin 24] Raytracing in Snowdrop: An Optimized Lighting Pipeline for Consoles, GDC 2024
• [Gong 21] 'Resident Evil Village': Our Approach to Game Design, Art Direction, and Graphics, GDC 2021
• [Karis 21] Nanite: A Deep Dive. SIGGRAPH 2021
• [Zhdan 21] ReBLUR: A Hierarchical Recurrent Denoiser
• [Hädrich et al. 20] Stormscapes: Simulating Cloud Dynamics in the Now
• [Achard 19] Exploring Raytraced Future in Metro Exodus, GDC 2019
• [Hobson 19] The Indirect Lighting Pipeline of God of War, GDC 2019
• [McGuire 19] Dynamic Diffuse Global Illumination, GDC 2019
• [Lefebvre 18] Virtual Insanity: Meta AI on Assassin's Creed: Origins, GDC 2018
143
BIBLIOGRAPHY
• [Uchimura 18] Practical HDR and Wide Color Techniques in Gran Turismo SPORT, SIGGRAPH ASIA 2018
• [Rodrigues 17] Moving to DirectX 12: Lessons Learned, GDC 2017
• [Schied et al. 17] Spatiotemporal Variance-Guided Filtering, HPG 2017
• [Haar and Aaltonen 15] GPU Driven Rendering Pipelines, SIGGRAPH 2015
• [Jacobs 15] Simulating the Visual Experience of Very Bright and Very Dark Scenes
• [Lagarde 12] Water drop – Physically based wet surfaces, Blog post series
• [Tokuyoshi 11] Fast Global Illumination via Ray-Bundles
• [Cignoni et al. 05] Batched Multi Triangulation
• [Oat 05] Irradiance Volumes for Games, GDC 2005
• [Jensen 00] Night Rendering
• [Kumar et al. 96] Hierarchical Back-Face Culling
144
Questions? Nicolas Lopez
@Nicolas_Lopez_
@nicolas-lopez.bsky.social
@NicolasLopez@mastodon.gamedev.place
And we are done. I’ll be taking questions if we still have time for it.
145