Lopez Nicolas - Rendering Assassin's Creed

Nicolas Lopez discusses the launch of Assassin's Creed Shadows and the Anvil Engine, which is designed for large systemic worlds and has evolved since its inception with the first Assassin's Creed game in 2007. The presentation covers the game's rendering capabilities, scalability, and performance, highlighting the engine's shared structure and advancements in GPU-driven pipelines. AC Shadows is a next-gen title featuring a vast open world with complex systems like ray tracing, dynamic weather, and seasons, aiming to push the boundaries of game rendering technology.

RENDERING

Welcome to my talk.

I’m Nicolas Lopez. Today we are going to talk about ACShadows.

Today is also the launch date of the game, and I’m really happy to be here on this special
day.

1
Acknowledgements

First, I want to take a minute to thank the whole Anvil rendering team for their
contribution.

I also want to thank the Shadows production team.

Everything discussed today is a team effort.

2
Speaker
• Nicolas Lopez
@Nicolas_Lopez_
@nicolas-lopez.bsky.social
@NicolasLopez@mastodon.gamedev.place

• Technical Architect @ Ubisoft Montréal


• 15 years+ in games

If you want to find me online, I’m on TwitterX / Bluesky / Mastodon.

And here are some of the games I’ve worked on in the past.

Now I’ll show a trailer of AC Shadows to give some context to this talk.

3
Trailer

4
Agenda

1. Anvil Engine
2. Assassin’s Creed Shadows
3. Rendering Assassin’s Creed
4. Scalability & Performance
5. Conclusion

Today I’m going to introduce you to the Anvil Engine.

I’ll then dive into Assassin’s Creed Shadows, and some of its rendering pillars.

Finally, I’ll talk about how we approach scalability and performance.

5
Anvil Engine

Let’s talk about Anvil

6
ANVIL
• Born with Assassin’s Creed (2007)

*click* Anvil was born with Assassin’s Creed 1, released in 2007.

*click* Anvil is an engine designed for large systemic worlds

with a focus on long range rendering, massive instance count,

*click* and systemic gameplay.

7
ANVIL
• Born with Assassin’s Creed (2007)
• Large dense open world games

*click* Anvil was born with Assassin’s Creed 1, released in 2007.

*click* Anvil is an engine designed for large systemic worlds

with a focus on long range rendering, massive instance count,

*click* and systemic gameplay.

8
ANVIL
• Born with Assassin’s Creed (2007)
• Large dense open world games
• Systemic gameplay [Lefebvre 18]

*click* Anvil was born with Assassin’s Creed 1, released in 2007.

*click* Anvil is an engine designed for large systemic worlds

with a focus on long range rendering, massive instance count,

*click* and systemic gameplay.

9
ANVIL

*click* Anvil is mostly known for being used in Assassin’s Creed,

but it has also shipped other big franchises such as

*click* Ghost Recon, *click* For Honor *click* and Rainbow Six Siege

If you wonder what a competitive FPS, a military shooter, and Assassin’s Creed have in
common… it’s complicated 

10
GENEALOGY OF ANVIL

BLACKSMITH

SCIMITAR

SILEX

Over time, we developed many variations or forks of the same engine

*click* with Scimitar, the Assassin’s Creed engine, *click* but also Blacksmith, *click*
and Silex.

And it made sense while engines were still relatively small.

11
Paradigm shift
• Many engines at Ubisoft
• Anvil, Snowdrop, Dunia, Voyager, Ubiart, …

• Many forks of the same engine


• Scimitar/Blacksmith/Silex, Dunia/Disrupt/Babel, …

• Big developments have a long life


• Not so many opportunities to re-write systems
• More and more iterative developments

• Multiplication of efforts
• Need for tech convergence
• Anvil as a shared Engine

Eventually, we made several observations that would change the way we work:

*click* We had too many engines at Ubisoft

*click* We had too many forks of these engines!

It led to a multiplication of efforts and developments

*click* We also started to realize big developments have a long life

Some systems take years to build, and can survive a decade

*click* Ultimately, we believe the multiplication of efforts makes us less competitive

Working on the same feature or system in so many engines didn’t make sense anymore

12
Context
• Shared Engine
• Used for multiple games, brands, and genres
• Mono-repo
• One single code base
• Multiple teams
• Multi-studio transverse team
• Multi-studio productions

*click* Today we work with Anvil as a shared engine, shared across multiple games,
brands, and genres.

*click* We are organized as a mono-repo

*click* And our teams are scattered across the globe

13
Assassin’s Creed Shadows

Now let’s dive into Assassin’s Creed Shadows

14
From Valhalla

To Shadows
When we started working on AC Shadows,

ACValhalla had just shipped and was our benchmark as an open world AC RPG.

AC Shadows would be the first next gen only Assassin’s Creed game.

15
From Valhalla to Shadows
• ACValhalla: cross gen title, little scalability (resolution)
• Xbox Series X, Quality, 2160p, 21.4ms

• Xbox Series X, Performance, 1680p, 16ms

ACValhalla was a cross-gen title, with little scalability.

Its quality and performance modes had 2 very similar frames.

The main driving factor for scalability was its render resolution. (We can see that the GPU
is underutilized at 4k Native 30fps).

Looking at this when we started working on ACShadows, we wanted to push scalability to


the next level.

16
A Large Scale Systemic Open World
• Anvil
• 16x16km
• Time of day
• Destruction
• Raytracing
• Virtual Geometry
• Systemic Weather
• Seasons

(Diagram: scalability modes (Quality: 30fps, Raytracing; Balanced: 40fps, Raytracing; Performance: 60fps, Baked GI) crossed with weather states (Sunny, Cloudy, Storm), time of day (0-24), and seasons (Spring, Summer, Autumn, Winter).)

So, what is the scope of Shadows?

*click*
It’s made with Anvil
It’s 16x16km open world with Time of Day

So far so good?

*click* It has Destruction, *click* Raytracing and *click* Virtual Geometry

I’m not finished

*click* Systemic weather, *click* and even seasons! *click*

I swear, I don’t work for Kojima, but I probably made the same face

17
A Large Scale Systemic Open World
• Anvil
• 16x16km
• Time of day
• Destruction
• Raytracing
• Virtual Geometry
• Systemic Weather
• Seasons

So, what is the scope of Shadows?

*click*
It’s made with Anvil
It’s 16x16km open world with Time of Day

So far so good?

*click* It has Destruction, *click* Raytracing and *click* Virtual Geometry

I’m not finished

*click* Systemic weather, *click* and even seasons! *click*

I swear, I don’t work for Kojima, but I probably made the same face

18
Turns out it wasn’t SO crazy…

Here you can see various dimensions of this complexity,

with time of day, systemic weather and seasons

19
In more detail,

*click* We can see various time of day snapshots

*click* Some weather states

*click* And finally, the 4 seasons

20
A Large Scale Systemic Open World
• Gen 5 only
• Baked or ray traced Global Illumination
                    Performance – 60fps   Balanced – 40fps   Quality – 30fps
PS5 / XBSX
 Target Resolution* 1080p                 1280p              1440p
 GI Technique       Baked GI              Raytraced GI       Raytraced GI
PS5 Pro
 Target Resolution* 1080p                 1280p              1440p
 GI Technique       Raytraced GI          Raytraced GI       Raytraced GI + Specular
XBSS
 Target Resolution* X                     X                  900p
 GI Technique       X                     X                  Baked GI
*Dynamic Resolution

ACShadows is a gen 5 only title

We have 3 modes on consoles: perf, balanced and quality with various target
resolutions.

We decided very early in the development we’d favor Raytracing in Quality modes.

21
Rendering Assassin’s Creed

Now let’s dive into what we like the most, rendering

22
World Structure
• 16x16km
• World partitioning
• Multi-user editing
• Long range rendering
• Streaming grid layers
• LOD Meshes
• Fake Entities
• Vistas
• Data Driven

(Figure: the 16km x 16km world map.)

Anvil is built for long range rendering.

*click* The world is partitioned in cells and the engine is built with multi-user editing in
mind.

The data itself is organized in streaming grid layers.

We have several of these grids.

The definition of this world structure is fully data driven. *click*

23
Data Driven World Structure
• Short range grid
• Loading range: 96m. Cells: 32m
• Small objects that don’t need to be seen after 96m
• Main grid
• Loading range 128m. Cells: 32m
• Regular props, NPC spawners, … the heaviest
• Long range grid
• Loading range: 384m. Cells: 128m
• Large objects
• Fake entity grid:
• Loading range 1024m (near)
• Loading range 4096m (far)
• Cells: 512m
• “Point cloud”
• Mass impostor rendering
• Until 8192m
• Terrain Vista

In ACShadows, we have various data driven loading grids to represent various levels of
detail.

(Describe)

Loading grids can be specified at the entity level

Draw distances are driven by what we call LOD Selectors

And constrained by loading grids

Beyond those grids, we still render what we call “Point Clouds”, our mass impostor
renderer, and the Terrain Vista.

24
Terrain

Here we can see the naked terrain,

*click* Now with LOD meshes (short/main/long grids)

*click* And finally with fake entities and point clouds

25
Terrain
LOD Meshes

Here we can see the naked terrain,

*click* Now with LOD meshes (short/main/long grids)

*click* And finally with fake entities and point clouds

26
Terrain
LOD Meshes
Fake Entities

Here we can see the naked terrain,

*click* Now with LOD meshes (short/main/long grids)

*click* And finally with fake entities and point clouds

27
Fake Entities

Again, in reverse order,

Fake entities (that are not supposed to be seen that close)

*click* Then the corresponding LOD meshes (short/main/long)

28
Fake Entities
LOD Meshes

Again, in reverse order,

Fake entities (that are not supposed to be seen that close)

*click* Then the corresponding LOD meshes (short/main/long)

29
“Point Clouds”

I mentioned before our “point cloud” renderer. It’s a mass quad renderer that we use to
render our trees up to 8 km away.

This is without point clouds.

*click* with point clouds.

*click* and another view.

30
“Point Clouds”

I mentioned before our “point cloud” renderer. It’s a mass quad renderer that we use to
render our trees up to 8 km away.

This is without point clouds.

*click* with point clouds.

*click* and another view.

31
“Point Clouds”

I mentioned before our “point cloud” renderer. It’s a mass quad renderer that we use to
render our trees up to 8 km away.

This is without point clouds.

*click* with point clouds.

*click* and another view.

32
GPU Driven Pipeline(s)
• Cornerstone of Anvil Engine since Assassin’s Creed Unity (2014)
• Shipped many games
• Multiple iterations

GPU Cluster Culling, Indirect Drawcalls (ACUnity) → Mass instancing and batching (ACValhalla) → Continuous LODs and pixel-precise geometry (ACShadows)

GPU driven pipelines have been the cornerstone of Anvil since ACU in 2014.

It has shipped many games since then and gone through multiple iterations

ACUnity introduced GPU Cluster Culling and Indirect draws

ACValhalla mass instancing and batching

ACShadows now adds Virtualized Geometry.

33
GPU Driven Pipeline(s)
• Batch Renderer in Assassin’s Creed Unity [Haar and Aaltonen 15]
• DirectX 11 class APIs
• MultiDrawIndexedInstanceIndirect
• Discrete LODs of Clustered meshes
• Fine culling at the instance/cluster/triangle level

In ACUnity, the Batch Renderer, as we call it, was a DX11 class API GPU Driven pipeline.

The main goal of the Batch Renderer was to reduce the cost of expensive DX11 draw
calls.

It relied on MultiDrawIndirect to cull and render clustered meshes.

It introduced fine culling at the instance, cluster and triangle level.

34
GPU Driven Pipeline(s)
• Significant work still on the CPU
• No bindless
• per material batching
• large number of drawcalls

• No Async
• per pass culling before each pass
• delays actual rendering

(Diagram: frame capture showing per-pass CULLING immediately followed by GEOMETRY RENDERING.)

A significant part of the Batch Renderer still ran on the CPU.

It didn’t support bindless resources, which limited the amount of batching it could
perform (per material batching).

It also didn’t support async compute.

(Capture = ACU on a PS5)

35
GPU Driven Pipeline(s)
• GPU Instance Renderer since AC Valhalla/Mirage [Bussière and Lopez 24]
• DirectX 12 class APIs
• Execute Indirect
• More work on the GPU
• Batch on load
• Bindless
• Async Culling

(Diagram: Batch Renderer vs GPUIR pipeline steps, with more steps moved to the GPU.)

The second version of our GPU Pipeline is called GPU Instance Renderer or GPUIR.

It is designed for DX12 class APIs.

The main goal was to address the Batch Renderer drawbacks.

It is fully bindless, batching is now per shader instead of per material.

The GPUIR performs batching on load and is designed to cull millions of instances per
frame.

And as you can see, more steps were moved to the GPU.

To achieve that, the GPUIR relies heavily on something we call “Database”.

36
GPU Driven Pipeline(s)
• Database
• container for data structures that can be shared between the CPU and the GPU
• DOD (Data Oriented Design), with the convenience of C++ OOP on the GPU
• relies on Shader Input Groups (SIG) to generate binding code for C++ and HLSL [Rodrigues17]

• Share full scene description between the CPU and the GPU
• Database analogy
• each member in the structure would be a column
• each instance within the array of MyObject would be a row in the table

row   transform   type   flags     parent
0     {…}         2      0x00101   5
1     {…}         1      0x10001   5
2     {…}         2      0x00000   5
3     {…}         4      0x01011   1

(Diagram callouts mark a column, a row, and a single cell of the table.)

I know what you are thinking… but it’s not what you are thinking 

*click*
Database is a container for data structures that can be shared between the CPU and
the GPU.

It follows Data Oriented Design, with a referencing mechanism

We rely on Shader Input Groups (SIG), our internal compiler for shader bindings,

to generate all the helper functions and binding code.

*click*
In the context of our GPU Driven Pipeline, Database is used to share the

full scene description between the CPU and the GPU.

*click*
We use database analogies to structure our data.

*click* In this example, Row<MyObject> is just the equivalent of a pointer to another


MyObject.
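
To make this concrete, here is a minimal C++ sketch of the idea. All names here (Row, Float3x4, CullInstanceTable) are illustrative stand-ins, not Anvil's actual API:

```cpp
#include <cstdint>
#include <vector>

// Typed handle to a row in another table: the "database pointer".
template <typename T>
struct Row { uint32_t index = ~0u; };

struct Float3x4 { float m[12]; };
struct CullMesh; // element type of another table, declared elsewhere

// One table = one struct-of-arrays: each member is a column,
// and entry i across all columns is row i.
struct CullInstanceTable
{
    std::vector<Float3x4>      transform; // column
    std::vector<uint32_t>      flags;     // column
    std::vector<Row<CullMesh>> mesh;      // relation column (see next slides)

    uint32_t AddRow()
    {
        transform.push_back({});
        flags.push_back(0);
        mesh.push_back({});
        return uint32_t(flags.size() - 1);
    }
};
```

Each column stays a flat array, so iterating one member across all rows touches contiguous memory, which is the point of the layout.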

37
Database
• SIG Compiler

• Table/column/row access in C++

• Table/column/row access in HLSL

This is an example on how we declare a table in our SIG format.

And how we access the same members, in CPP, or in HLSL.

All the interfaces, getters, and so on are generated.

38
Database Relations
• 1 to 1

• 1 to n

One of the key features of database is the support of relations. Relations are links
between different tables.

*click* A 1 to 1 relation is the database version of a pointer.

*click* A 1 to n relation, is an extension of the previous concept. It simulates a pointer


with a size.
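
A minimal sketch of the two relation kinds, with assumed names:

```cpp
#include <cstdint>

// 1 to 1: the database equivalent of a pointer.
template <typename T>
struct Row { uint32_t index; };

// 1 to n: a pointer with a size, i.e. a contiguous run of rows.
template <typename T>
struct Range { uint32_t first; uint32_t count; };

// Example from the mesh description shown later: a CullMesh owns up to
// 5 LODs, stored as consecutive rows in the LOD table.
struct CullMeshLOD;
struct CullMesh { Range<CullMeshLOD> lods; };
```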

39
Database Replication
• Different table instances for CPU and GPU storage
• Copy
• Copy the whole table from CPU to GPU (ByteAddressBuffer)

• Dirty rows/pages update


• CPU data storage
• writing maintains a dirty bitfield
• copy dirty row/page ranges

• Dirty page copy


• no CPU storage required
• writing creates dirty page memory
• Compute shader writes dirty pages

Different table instances handle CPU and GPU storage.

We support different modes of data replication to ensure the data is propagated from one
instance to another (generally from the CPU to the GPU)

• Copy
the simplest replication available. It’s a full copy to a ByteAddressBuffer

• Dirty rows/pages updates


maintains a dirty mask at the row or page granularity and replicates dirty rows or
pages to the destination table.

• Dirty page copy


to avoid storing very large tables on the CPU, it only allocates and stores dirty rows
on the CPU instead, until they are flushed to the GPU.
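
As an illustration, the dirty-row variant could look like the following sketch (assumed names, not the real replication code):

```cpp
#include <algorithm>
#include <bit>
#include <cstdint>
#include <vector>

// Writing through the table marks one bit per row; Flush() pushes only
// the dirty rows to the GPU copy (e.g. a ByteAddressBuffer upload).
struct ReplicatedColumn
{
    std::vector<uint32_t> cpuData;
    std::vector<uint64_t> dirtyBits; // 1 bit per row

    void Write(uint32_t row, uint32_t value)
    {
        cpuData[row] = value;
        dirtyBits[row / 64] |= 1ull << (row % 64);
    }

    template <typename UploadFn>
    void Flush(UploadFn upload)
    {
        for (size_t w = 0; w < dirtyBits.size(); ++w)
        {
            uint64_t bits = dirtyBits[w];
            while (bits)
            {
                uint32_t row = uint32_t(w * 64 + std::countr_zero(bits));
                upload(row, cpuData[row]); // row-granular GPU copy
                bits &= bits - 1;          // clear lowest set bit
            }
        }
        std::fill(dirtyBits.begin(), dirtyBits.end(), 0ull);
    }
};
```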

40
Mesh Description

And this is what our scene description looks like on the GPU. It’s quite close to its CPU
representation.

In a nutshell:

A CullMesh has up to 5 LODs,

Each LOD has 2 LODDistances, 1 main view and 1 for shadows to allow customization of
shadow LOD distances,

The association of a geometry with a PSO is made at the Submesh level with a
BatchHash.

We make sure we have only 1 batch per PSO and there are no duplicates.

*click*

And this is how we declare this with tables, in SIG. Each type is a database table, and they
are linked by row and range properties.

41
Scene Description

Mesh Description

Now if we zoom out a bit, we want to focus mainly on LeafNodes and CullInstances:

*click*

We see the world is organized as a Quad-Tree.

*click*

To achieve good culling performance, it relies on a hierarchical structure made of


entity groups and leaf nodes.

An Entity Group is a group of objects of the same type in a loading cell.

A Leaf Node is a group of instances.

(Instances are split into leaf nodes.)

We gather instances that are spatially close to each other in the same leaf node to
minimize their bounding volume and make coarse culling more efficient.

42
Scene Description
CPU
Declare tables Update tables

GPU
Fetch data and use it
Add a mesh

In terms of code, this is roughly the lifecycle of these tables.

*click*
First, we declare CPU and GPU tables separately.

*click*
Then we populate tables. This is how we add a mesh to the CullMesh table and set various
properties to this mesh entry.

*click*
Eventually, we update GPU tables, following the chosen update policy (either full copy, or
dirtied entries).

*click*
Finally, we fetch data on the GPU and use it as we like.

The interface is very CPP like, while maintaining maximum cache efficiency in data
access patterns.

Database is a sort of super structured buffer.

43
GPU Instance Renderer (GPUIR)

(Pipeline diagram: CPU Coarse Frustum culling → GPU Frame Culling (Multi Frustum culling, LOD selection and blending) → GPU Per Pass Culling (Frustum + Occlusion culling, Indirect Draw Calls Buckets, Data required by VS+PS).)

Finally, this is what the entire GPU Instance Renderer pipeline looks like.

It has many steps, some of them being very specific to our engine. *click*

*click* On the CPU, there is a coarse Frustum culling to cull leaf nodes (basically groups
of instances)

On the GPU, we have 2 large steps: Frame Culling and Per Pass Culling.

*click* Frame Culling culls instances that passed CPU culling against all view frustums.
Then LOD selection and blending logic is also performed at this stage.

*click* Per Pass Culling performs pass specific Frustum + Occlusion culling. For example,
Sun Shadows perform anti-frustum culling.

It also prepares instance data required by VS and PS to access geometry and material
data for a given instance.

Finally at render time, we support optional Cluster and Triangle culling before the final
ExecuteIndirect calls.
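
To illustrate just the first step, the coarse CPU frustum test over leaf-node bounds might look like this sketch (types and names are assumptions):

```cpp
#include <cstdint>
#include <vector>

struct Aabb  { float min[3], max[3]; };
struct Plane { float nx, ny, nz, d; }; // inside if dot(n, p) + d >= 0

static bool OutsidePlane(const Aabb& b, const Plane& p)
{
    // Test the AABB corner most aligned with the plane normal.
    float x = p.nx >= 0 ? b.max[0] : b.min[0];
    float y = p.ny >= 0 ? b.max[1] : b.min[1];
    float z = p.nz >= 0 ? b.max[2] : b.min[2];
    return p.nx * x + p.ny * y + p.nz * z + p.d < 0;
}

// Coarse step: cull whole leaf nodes (groups of spatially close
// instances); survivors are handed to the GPU frame-culling pass.
void CoarseCullLeafNodes(const Plane frustum[6],
                         const std::vector<Aabb>& leafBounds,
                         std::vector<uint32_t>& visibleLeaves)
{
    for (size_t i = 0; i < leafBounds.size(); ++i)
    {
        bool outside = false;
        for (int p = 0; p < 6 && !outside; ++p)
            outside = OutsidePlane(leafBounds[i], frustum[p]);
        if (!outside)
            visibleLeaves.push_back(uint32_t(i));
    }
}
```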

44
Micropolygon geometry (MPH)
• Based on Nanite [Karis21]
• Evolution of our clustered mesh pipeline
• Now a cluster hierarchy
• Continuous LODs
• Per cluster geometry streaming
• Mesh Shader / Software Rasterization
• Integrates to GPUIR

In addition to the GPU Instance Renderer, we added support for a Micropolygon
geometry pipeline.

*click*
It’s based on Nanite and is an evolution of our clustered mesh pipeline.

In this pipeline, a mesh is made of a cluster hierarchy with continuous LODs

We stream clusters individually based on their visibility.

And polygons are either hardware or software rasterized based on their size.

45
Micropolygon geometry
• Used for Gbuffer and Shadow rasterization
• ½ memory load vs standard GPUIR geometry
• MPH Rasterization similar as others
• METIS-based partitioning for Cluster DAG continuous LOD selection
• Mesh Shader / Software Rasterization into a 64bit Visibility Buffer
• 2 Pass Hierarchical Z Buffer occlusion system
• Deferred Material rendering from Visibility Buffer into Gbuffer
• GPU Feedback based cluster page streaming
• 128 triangles per cluster

Micropoly is used for Gbuffer and Shadow rasterization

Geometry takes roughly half the memory of the same GPUIR geometry

It shares similarities with other existing techniques:

We use METIS [library] based partitioning (i.e. group clusters with the most shared edges).

Either Mesh Shader or SW Rasterization depending on the triangle size

It has 2 Pass HZB (Hierarchical Z Buffer) occlusion system

It renders Deferred Material from a visibility buffer, output into the Gbuffer

Clusters are made of 128 triangles [clusters used to be 64 tris in ACU to match the number of
lanes per wavefront],

and cluster streaming is based on a GPU Feedback loop [culling traversal – page
requests – async cpu readback]

46
Micropolygon geometry
• Simplygon for cluster group simplification
• Support of user custom shader code (manual API)
• Auto-analytic derivatives

• Bindless material support with draw call compaction


• Scanline SW Rasterization using XY swapping virtual coords
• Better branch coherency

• Lots of tweaking to reduce root page size for a high amount of “lego-kits”

Some differences:

We use Simplygon for cluster group simplification

We support user custom shader code with a manual API

Our pipeline is fully bindless as it is with the GPUIR

During SW Rasterization, we swap XY virtual coords depending on triangle


configurations to improve branch coherency.

And finally, we had to do a lot of tweaking to reduce root page size to support a high
number of instances for our game.
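
To illustrate the XY-swap idea, here is a sketch with a plain half-space rasterizer standing in for the real scanline code; the swap heuristic and names are my assumptions:

```cpp
#include <algorithm>

// Rasterize in a virtual coordinate space where the triangle's longer
// bounding-box axis is always the inner loop axis, so inner loop trip
// counts are more uniform across triangles (better branch coherency).
struct Tri { float x[3], y[3]; };

static float EdgeFn(float ax, float ay, float bx, float by,
                    float px, float py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

template <typename WritePixel>
void RasterizeSmallTri(Tri t, WritePixel write)
{
    float minX = std::min({ t.x[0], t.x[1], t.x[2] });
    float maxX = std::max({ t.x[0], t.x[1], t.x[2] });
    float minY = std::min({ t.y[0], t.y[1], t.y[2] });
    float maxY = std::max({ t.y[0], t.y[1], t.y[2] });

    const bool swapped = (maxY - minY) > (maxX - minX);
    if (swapped) // swap virtual coords so X is the longer extent
    {
        for (int i = 0; i < 3; ++i) std::swap(t.x[i], t.y[i]);
        std::swap(minX, minY);
        std::swap(maxX, maxY);
    }

    for (int y = int(minY); y <= int(maxY); ++y)     // short outer loop
        for (int x = int(minX); x <= int(maxX); ++x) // long inner loop
        {
            float px = x + 0.5f, py = y + 0.5f;
            float w0 = EdgeFn(t.x[1], t.y[1], t.x[2], t.y[2], px, py);
            float w1 = EdgeFn(t.x[2], t.y[2], t.x[0], t.y[0], px, py);
            float w2 = EdgeFn(t.x[0], t.y[0], t.x[1], t.y[1], px, py);
            // Swapping mirrors the winding, so accept either orientation.
            if ((w0 >= 0 && w1 >= 0 && w2 >= 0) ||
                (w0 <= 0 && w1 <= 0 && w2 <= 0))
                write(swapped ? y : x, swapped ? x : y); // unswap on output
        }
}
```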

47
GPU Driven Pipeline(s)
• Micropolygon
• Instances
• 28k main view
• Triangles
• 34M rendered (all passes)
• 18M main view + 16M shadows & misc
• Rasterization
• 30M software + 4M hardware

• GPU Instance Renderer


• Instances
• 1.3M pre-culling (all passes)
• 9k rendered (all passes)
• ~99% culling efficiency
• Triangles
• 410M pre-culling (all passes)
• 1.7M rendered (all passes)

Some stats

We currently only support micropoly for static opaque geometry.

*click* In this game, Micropolygon is our main city architecture and static geometry
renderer

while GPUIR is our main massive alpha tested vegetation renderer.

In Kyoto, we have around 28k micropoly instances, for 34M rendered triangles. As you
can see, in this case, 90% of the triangles are software rasterized.

*click* For the GPUIR, in this scene it’s mostly trees.

We have around 9k instances spread across all passes and render 1.7M triangles with it.

48
GPU Driven Pipeline(s)
• Micropolygon
• Instances
• 1k main view
• Triangles
• 7.6M rendered (all passes)
• 1.6M main view + 6.0M shadows & misc
• Rasterization
• 7.3M software + 0.3M hardware

• GPU Instance Renderer


• Instances
• 3M pre-culling (all passes)
• 31k rendered (all passes)
• ~99% culling efficiency
• Triangles
• 1.5B pre-culling (all passes)
• 6.8M rendered (all passes)

Now in a forest.

*click* Micropolygon is used less in this scene, but it still renders some buildings and
rocks.

*click* The GPUIR is the pipeline doing the heavy lifting here, compared to the previous
frame.

We cull 3M instances to finally render 30k of them. The scene has 1.5B triangles, and we
end up rendering close to 7M of them.

Our forests are very dense.

49
Global Illumination
• Large Scale Open World GI
• Data Oriented Design
• Multiple iterations

Open World Irradiance Volumes (ACUnity) → Large Scale Sparse GI (ACOrigins) → Ray traced Diffuse and Specular (ACShadows)

Now let’s dive into one of my favorite topics, Global Illumination.

It’s always been a first-class citizen in Assassin’s Creed games, and in my heart 

In ACUnity on the left, we developed a volumetric open world GI.

In ACOrigins in the middle, we made it sparse to adapt it to larger 16x16km open worlds.

Finally, in ACShadows, on the right, we added support for seasons, and developed a new
ray traced GI pipeline.

50
Global Illumination
• Scalable Global Illumination pipeline
• Mandatory ray tracing in the hideout

(Diagram: the GI spectrum. All tiers use Screen Space Reflections, a dynamic global cubemap, and relightable local cubemaps.
• Baked Diffuse: PS5/XBSX Performance modes, XBSS (memory constraints), Low End PCs
• Ray traced Diffuse (Software/Hardware): PS5/XBSX Quality modes, PS5 Pro Performance mode
• Ray traced Diffuse + Specular: PS5 Pro Quality mode, High End PCs)

For Shadows, we targeted a scalable GI pipeline from the beginning.

Ray tracing is costly, and we didn’t want to sacrifice other parts of the game to ship it at
all cost, especially at 60fps on consoles.

Moreover, considering the hardware landscape, it made sense to give more choice to the
players.

Our GI pipeline is a spectrum, going from baked to ray traced diffuse and specular. Ray
tracing can be either rendered in hardware or software with compute shaders.

There is one special case in this game. We have a hideout that’s fully dynamic and requires
ray tracing or it’ll have no GI.

51
Baked Volumetric Diffuse Global Illumination
• Irradiance Volumes [Oat 05] in Assassin’s Creed Unity
• Baked on GPU using Ray Bundles [Tokuyoshi11]
• [Hobson19] for more details (based on Stephen Hill’s work on ACU)

• Multiple iterations
• Time of day support added in Assassin’s Creed Syndicate with key frame blending.

The first version of our baked volumetric GI dates from AC Unity.

It was baked on the GPU using Ray Bundles. I’ll refer you to the great talk of Josh Hobson
about the Indirect Lighting Pipeline of God of War.

It’s based on the work done in ACU and gives a good view of what we did. The main
difference with ACU is the open world structure.

This system went through multiple iterations, it didn’t support time of day in ACU, but it
was added in the next game, Syndicate.

52
Assassin’s Creed Unity

• Irradiance Volumes
• Uniform probe distribution
• Mipmapped 3D Volumes
• Cover the whole open world

• 4 static times of day


• fixed ambiances
• no dynamic time of day
• no blending

*click*
In ACUnity, the world was covered with uniform mipmapped irradiance volumes.
Mipmaps acted as level of details at farther distances.

*click*
It didn’t support dynamic time of day. It was faked with 4 fixed static ambiances.

Later, AC Syndicate added dynamic time of day by blending between 2 fixed GI key
frames.

53
Baked Volumetric Diffuse Global Illumination
• Uniform probe distribution in Assassin’s Creed Unity
• Wouldn’t scale to 16x16km

                            Area (km^2)  Probe distance (m)  Times of day  Bake time (d)  Disc size (GB)
Assassin’s Creed Unity      4            0.5                 4             4              15
Assassin’s Creed Syndicate  6            1                   9             3              9
Assassin’s Creed Origins    256          1                   11            156            468

• Assassin’s Creed Shadows


• 256km^2, 4 seasons
• Disc size = 468GB x 4 ~= 1.9TB!!
• Bake time = 156 x 4 = 624 days!!

As you probably know, Assassin’s Creed pivoted to more RPG-like experiences with larger
open worlds since AC Origins.

*click*
Naively extrapolating numbers from AC Unity or Syndicate,

We see this baked GI technique would never have scaled to a 16x16km open world.

*click*
If we push this even further, Shadows adds 4 seasons.

Now we’re close to 2 TeraBytes of GI data, and over 600 days of bake time.

Even if we ignore the bake time and imagine we had a very large computing farm,

we clearly had to do something to reduce the required storage.

54
Baked Volumetric Diffuse Global Illumination
• CPU Path Tracing
• Easier to debug
• No memory cliff
• Easier to distribute
• diverse GPU / drivers farm
• faster to dispatch
• Embree SIMD
• standalone exe
• SN-DBS Distribution
• hundreds of machines

(Pipeline diagram: Host CPU (Export scene, Density Map, Sparse Probe Placement) → Distributed (Surfelize, Surfel lighting, Path tracing, GI Data block Compression).)

First thing we did was to shift to a CPU Path Tracing back-end. For several reasons.

• GPU baking requires a lot of VRam, and memory cliff was an issue especially at the
time of AC Origins.
• There are fewer variations due to drivers and diverse GPUs. Our inputs are
deterministic.
• It is faster to dispatch because less work is performed on the local machine

*click* We also introduced a density map, and sparse probes to our baked GI.

55
Baked Volumetric Diffuse Global Illumination
• Hand painted density map (“region layout”)

In baked GI solutions, texel or probe density is everything. But it doesn’t need to be uniform.

We use a density map to paint the GI resolution and to ensure we have the right amount
of details where it matters the most.

With this, only a fraction of the map is at the resolution of older games like ACUnity
(50cm).

But this area is still larger than the area of those games. So, it’s still not enough to
address the GI data size.

56
Baked Volumetric Diffuse Global Illumination
• Probes distributed in a sparse octree
• A node is subdivided if it contains surfels

• Discard probes inside geometry


• Avg. 10% probes vs uniform distributions

To further decrease the size of our Baked GI,

We distribute probes in a sparse octree. A node is subdivided only if it contains surfels.

Probes inside geometry are discarded.

And on average, we output 10% of the probes we’d get with a uniform distribution.
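
A minimal sketch of that subdivision rule (all types assumed; surfels reduced to points, and the inside-geometry rejection left as a comment):

```cpp
#include <vector>

struct Vec3 { float x, y, z; };
struct Aabb { Vec3 lo, hi; };

static bool Contains(const Aabb& b, const Vec3& p)
{
    return p.x >= b.lo.x && p.x <= b.hi.x &&
           p.y >= b.lo.y && p.y <= b.hi.y &&
           p.z >= b.lo.z && p.z <= b.hi.z;
}

static bool AnySurfelInside(const Aabb& b, const std::vector<Vec3>& surfels)
{
    for (const Vec3& s : surfels)
        if (Contains(b, s)) return true;
    return false;
}

static Aabb ChildBounds(const Aabb& b, int i)
{
    Vec3 c { (b.lo.x + b.hi.x) * 0.5f, (b.lo.y + b.hi.y) * 0.5f,
             (b.lo.z + b.hi.z) * 0.5f };
    Aabb r = b; // each bit of i picks the low or high half on one axis
    ((i & 1) ? r.lo.x : r.hi.x) = c.x;
    ((i & 2) ? r.lo.y : r.hi.y) = c.y;
    ((i & 4) ? r.lo.z : r.hi.z) = c.z;
    return r;
}

// Subdivide only where surfels exist; emit one probe per occupied leaf.
void PlaceProbes(const Aabb& node, int depth, int maxDepth,
                 const std::vector<Vec3>& surfels, std::vector<Vec3>& probes)
{
    if (!AnySurfelInside(node, surfels))
        return; // empty node: no subdivision, no probes
    if (depth == maxDepth)
    {
        // A real version would also discard probes inside geometry here.
        probes.push_back({ (node.lo.x + node.hi.x) * 0.5f,
                           (node.lo.y + node.hi.y) * 0.5f,
                           (node.lo.z + node.hi.z) * 0.5f });
        return;
    }
    for (int i = 0; i < 8; ++i)
        PlaceProbes(ChildBounds(node, i), depth + 1, maxDepth, surfels, probes);
}
```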

57
Baked Volumetric Diffuse Global Illumination
• 11 key frames (data driven)
• 0: local lights
• 1-10: sunlight for different times of day

To support time of day, we store 11 key frames

1 for local lights

And 10 for different positions of the sun
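
For illustration, selecting and blending the sun keyframes could look like this sketch, assuming uniform key spacing over the day (the actual layout is data driven):

```cpp
#include <cmath>

// Key 0 holds local lights; keys 1..10 hold sunlight for successive
// times of day. Pick the two sun keys bracketing the current time of
// day and a blend factor between them.
void SelectSunKeys(float timeOfDay24h, int& key0, int& key1, float& blend)
{
    const int   kSunKeys  = 10;
    const float kInterval = 24.0f / kSunKeys;   // assumed uniform spacing
    float slot = timeOfDay24h / kInterval;      // e.g. 13.2h -> slot 5.5
    int   k    = int(std::floor(slot));
    key0  = 1 + (k % kSunKeys);                 // sun keys start at index 1
    key1  = 1 + ((k + 1) % kSunKeys);           // wrap around midnight
    blend = slot - std::floor(slot);            // 0 at key0, 1 at key1
}
```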

58
Baked Volumetric Diffuse Global Illumination
• Sun, local lights, sky stored separately
• YCoCg
• Y (luma) directional stored as spherical harmonic SH4
• CoCg (chroma) as scalar values

• Seasons
• Minimal bake size overhead
• separate CoCg data per ambiance (season)
• intensity Y remains the same
• 3 bakes (Spring/Summer, Autumn, Winter)
• Calculate GI for each ambiance then combine results (CoCg) in a single output
• Possible mismatch between Y and CoCg
• But geometry mostly the same across seasons
• Mainly vegetation is affected (tree leaves, clutter, …)
• Problematic geometry can be skipped in bakes

We store sun, local lights and sky separately in the YCoCg format.

The luma value is directional and stored as spherical harmonics.

While chroma values are scalar.

For seasons, we aimed at minimal bake size overhead.

So, we made 2 assumptions that we know are fundamentally wrong:

1) Spring and Summer would output the same diffuse GI (in fact, they are very similar in
terms of albedo).

2) Luma remains the same for all seasons, so we store it only once, but we store separate
chroma values per season.

It’s clearly an approximation.

There can be a mismatch between luma and chroma values, if a geometry changes or
doesn’t exist in every season.

But in practice, mainly vegetation is heavily affected by seasons. And problematic


geometries can be skipped in the bakes.
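
Putting the pieces together, reconstructing a probe’s irradiance could look like this sketch (names and the exact SH/YCoCg conventions are assumptions):

```cpp
#include <algorithm>

struct Vec3 { float x, y, z; };
struct SH4  { float c[4]; }; // L0 + three L1 coefficients

// Evaluate a 4-coefficient SH in direction n (standard SH basis factors).
static float EvalSH4(const SH4& sh, const Vec3& n)
{
    return sh.c[0] * 0.282095f
         + sh.c[1] * 0.488603f * n.y
         + sh.c[2] * 0.488603f * n.z
         + sh.c[3] * 0.488603f * n.x;
}

// Directional luma shared by all seasons, scalar chroma picked per
// season bake (Spring/Summer share one), then YCoCg -> RGB.
Vec3 SampleProbe(const SH4& lumaSH, const float seasonCo[3],
                 const float seasonCg[3], int seasonBake, const Vec3& normal)
{
    float y  = std::max(EvalSH4(lumaSH, normal), 0.0f);
    float co = seasonCo[seasonBake];
    float cg = seasonCg[seasonBake];
    return { y + co - cg, y + cg, y - co - cg }; // YCoCg -> RGB
}
```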
59
Baked Volumetric Diffuse Global Illumination
• Voxels inserted in cascaded 3D volumes at runtime for render
• sparse fetch/interpolation too slow with live decompression and blending
• Cascade Blending
• In place cell LOD blending

Cell
Data
LOD1 New
Data
LOD0

At runtime, sparse voxels are inserted in cascaded 3D volumes.

We perform cascade blending, and in place GI Block LOD blending when a new block is
loaded.

60
Baked Volumetric Diffuse Global Illumination
• Moved from discrete GI blocks to cascaded GI volumes

Assassin’s Creed Unity Assassin’s Creed Origins

On the left, we had a discrete grid of GI Blocks. Their mipmaps, or LODs, stream with the
view distance, exactly as we’d do with a distance based 2D texture streaming.

On the right, our sparse GI blocks are inserted into cascaded GI volumes to create a
much more continuous level of detail.

61
Predicted disk size (naïve Assassin’s Creed Unity approach)
• Uniform grid
• 1.9TB
Final disk size
• Sparse grid (10% avg vs Uniform Grid)
• Density map
• Dictionary compression
• 9GB
• 200x reduction

And now some numbers.

*click*
If we had followed the approach of Assassin’s Creed Unity, we would have ended up with
close to 2 TeraBytes of GI data in Shadows.

In the end, with everything I’ve just detailed, we end up with just 9 Gigs.

62
Spring/Summer Autumn Winter

Now I’ll show you some results. Even though we didn’t go for full exactitude, it gives
plausible results.

63
Specular reflections
• Screen Space Reflections
• Static « Gbuffer Cubemaps » (LCMs)
• dynamically relit
• Dynamic Cubemap (DCM)
• at player position
• lower LODs (fake entities)
• acting as a « regional cubemap »

(Diagram: SSR, LCMs, and the DCM layered around the player.)

For Specular Reflections, we use in the following order:

*click* Screen Space Reflections

*click* Relightable gbuffer cubemaps

*click* A dynamic cubemap that acts as a regional cubemap.

64
Specular reflections
• Static « Gbuffer » Local Cubemaps
• Baked, dynamically relit
• Albedo, Normal, Depth
• Shadows
• 8-bit texture, 8 keyframes, 1 bit per key
• Select 2 “key frames” closest to current ToD
• Compute blend factor with ToD
• Support Time of Day and Seasons
• Variation States

Gbuffer cubemaps are not something new.

We prebake Albedo, Normal, Depth. Then dynamically relight it at runtime.

Shadows are also baked. To do so we store 8 keyframes in an 8-bit texture: 1 bit per key.

Then at runtime, we select the 2 keyframes closest to current ToD, and blend them
accordingly

For seasons, we use what we call “Variation States”
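
For illustration, decoding and blending the 1-bit shadow keyframes could look like this sketch (uniform key spacing is an assumption):

```cpp
#include <cmath>
#include <cstdint>

// 8 baked shadow keyframes packed as 1 bit each in an 8-bit texel.
// Decode the two keys bracketing the time of day and blend them.
float SampleBakedShadow(uint8_t texel, float timeOfDay24h)
{
    const int   kKeys     = 8;
    const float kInterval = 24.0f / kKeys;
    float slot = timeOfDay24h / kInterval;
    int   k0   = int(std::floor(slot)) % kKeys;
    int   k1   = (k0 + 1) % kKeys;          // wrap around midnight
    float t    = slot - std::floor(slot);

    float s0 = float((texel >> k0) & 1);    // lit or shadowed at key k0
    float s1 = float((texel >> k1) & 1);
    return s0 + (s1 - s0) * t;              // blend between the two keys
}
```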

65
Specular reflections
• Variation States
• Spring & Summer
• Autumn
• Winter
• 16k LCMs! (+ Variations)

Variation States allow for data-driven game-specific logic, such as seasons.

In practice, we store 3 variations per cubemap (remember spring/summer are considered


the same season in terms of indirect lighting).

In total, we have 16k cube map entities in the world, without counting their variations.

66
Specular reflections

• LCM vs DCM
• Fake entities (lower LODs)
• Far shadow
• Low frequency updates

Dynamic Cube Map (Left) – Local Cube Map (Right)

Our Dynamic Cubemap is written to be very cheap to render.

We render fake entities in it and use lower resolution far shadows.

You can see the difference between the dynamic cube map on the left, and a gbuffer local
cube map at the same location.

67
Specular reflections
• LCM seasons (variations)

• LCM time of day (relighting)

Finally, this is an LCM with its seasons, and various times of day, for a given location.

68
Ray tracing
• Initial technical choices
• Inspired by Snowdrop ray tracing pipeline [Kuenlin 24]
• Software (SWRT) [Koshlo 24] and Hardware (HWRT) ray tracing
• Inline for optimal traversal loop
• Uber shader approach to control performances
• But wanted to keep our options open
• Abstraction Inline/Non-inline/SWRT/HWRT

As I said at the beginning, we also support ray tracing.

We were particularly inspired by the work done by the Snowdrop team.

We decided to follow a similar design, favoring inline ray tracing, and supporting both
hardware and software ray tracing.

But we were lacking data to back this up and wanted to keep our options open. So, we
went for a complete abstraction of inline/non-inline/software/hardware.

69
Ray tracing stack
• Fusion
• Hardware abstraction layer of 3D APIs (CPP and HLSL)
• Shared across multiple engines
• Insourcing model
Anvil Engine

GraphicsCore • framegraph, render passes, …


Other Engines
GraphicHal • buffers, textures, renderstates, …

Fusion HAL • DX11, DX12, Vulkan, Metal, …

Our ray tracing stack is abstracted under a HAL abstraction we call Fusion.

Fusion is shared across multiple engines and developed following an insourcing model.

70
Ray tracing stack
• Unified API
• Cross platform
• Hardware/Software
• Inline/Non-inline

(Code captions: TraceRayInline D3D12 implementation; interface implemented for each platform / API, and SWRT.)

We developed a unified ray tracing API to abstract platform specific APIs,


Hardware/Software raytracing, and inline/non-inline

We wanted switching back and forth to require minimal effort.

On the right, in green, you can see our ray tracing loop, with a callback interface.

71
Ray tracing stack
• Inline/Non-inline

Ray tracing loop

Ray tracing callback system

In more detail, this callback interface serves as an inline/non-inline ray tracing


abstraction.

When using inline raytracing, depending on the ray query state, it will invoke the
corresponding callback (hit, anyhit, miss, …).
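
In C++-flavored pseudocode (the real code is HLSL, and the RayQuery/callback names here are stand-ins), the abstraction boils down to something like:

```cpp
// A client implements the callbacks once; the same client code runs
// under inline RT (this loop) or non-inline RT (hit/miss shaders that
// simply call the same callbacks).
struct Ray { float origin[3]; float dir[3]; float tMin, tMax; };

template <typename RayQuery, typename Client>
void TraceRayInlineLoop(RayQuery& query, const Ray& ray, Client& client)
{
    query.Begin(ray);                       // TraceRayInline equivalent
    while (query.Proceed())                 // walk candidate intersections
    {
        if (client.OnAnyHit(query.Candidate())) // e.g. alpha test
            query.CommitCandidate();
    }
    if (query.HasCommittedHit())
        client.OnClosestHit(query.CommittedHit());
    else
        client.OnMiss(ray);                 // e.g. sample the sky
}
```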

72
Ray tracing stack
• Callback system

Callbacks to implement by client shader


Example of Non-inline RT shaders

In this case, with non-inline ray tracing, you can just call the right callback inside the
corresponding shader.

Of course, for hit shaders, you’ll probably want to use a shader table instead.

But it works and it makes it very easy to switch between one pipeline and another, and test
things.

73
Ray tracing stack
• CPP
• Trivial to switch between Software / Hardware Ray tracing

• Little dev required to switch between Inline / Non-inline

Inline raytracing
Non-inline raytracing

In terms of CPP code, switching between Software and Hardware raytracing is simply
done with an enum.

And the difference in setting up inline and non-inline ray tracing is minimal; it mainly
comes down to the Shader Table setup.

74
Ray tracing pipeline
• Hybrid RTGI
• No vertex animation/skinning in BVH
• Per pixel ray tracing combines
1. Secondary Rays
1. Screen space ray tracing
2. World space ray tracing
• Hardware or Software ray tracing
• Acceleration Structure or custom BVH
2. Irradiance from DDGI-like ray traced probes
• 5 cascades of 16x16x8 probes
• Every 2m in the 1st cascade, doubled with each cascade
• 10k probes, 1024 probes updated each frame

Our hybrid ray traced GI combines 2 steps:

- Per Pixel Ray tracing, made of screen space rays and world space rays.

- And DDGI-like probe cascades. We have roughly 10k probes, and update 1k probes
each frame

75
Ray tracing pipeline
• “Ray tracing Gbuffer”
• Ray tracing hits stored as a set of PBR properties
• Deferred Hit Lighting pass

Our ray tracing hits are stored as a set of PBR properties, in a “ray tracing gbuffer”.

And then we relight the hits in a deferred lighting pass.

76
Ray tracing pipeline

(Pipeline diagram: Screen-space ray tracing → on a hit, SS Hits; on no hit or occlusion, a Secondary Ray produces WS Hits → Hit Lighting → plus Irradiance from RT Probe Cascades → Denoised Result → plus RT-AO term (artistic modulation) and RT Specular* → Final.)

This is what our ray traced diffuse pipeline looks like.

We trace per pixel rays in screen space

If no hit is found, because for example it is out of frustum, or occluded,

we resume the screen space ray in world space with a hardware ray to find a hit in the AS.

We store the hits into our ray tracing “gbuffer” at the ray origin location.

Then all these hits are lit in the hit lighting pass.

If there is no hit for a given location, we process a miss and sample the sky.

Now we add Irradiance from ray traced probes.

Finally, we denoise the result, optionally add some extra RT-AO, add RT Specular if
available, to get the final image.

77
RTGI
Raytracing

There are 3 reasons why we add some RT-AO at the end:

1. We sum 2 systems with different frequencies: per pixel rays, and ray traced probe.

2. We ray trace diffuse at quarter res on most platforms, so we lose some high
frequency details.

3. There can be some disparities between the raster world and the ray tracing world.
For example, our grass doesn’t exist in the acceleration structure for perf reasons.

*click* So, we reapply some RT-AO to our general AO terms. You can see how it helps
houses, large vegetation, …

*click* We also still evaluate a subtle SSAO term to catch details that might be missing
from low res BVH or low ray count (quarter res)

*click* To get this final render

78
RTGI + RTGI-AO

There are 3 reasons why we add some RT-AO at the end:

1. We sum 2 systems with different frequencies: per pixel rays, and ray traced probe.

2. We ray trace diffuse at quarter res on most platforms, so we lose some high
frequency details.

3. There can be some disparities between the raster world and the ray tracing world.
For example, our grass doesn’t exist in the acceleration structure for perf reasons.

*click* So, we reapply some RT-AO to our general AO terms. You can see how it helps
houses, large vegetation, …

*click* We also still evaluate a subtle SSAO term to catch details that might be missing
from low res BVH or low ray count (quarter res)

*click* To get this final render

79
RTGI + RTGI-AO + SSAO

There are 3 reasons why we add some RT-AO at the end:

1. We sum 2 systems with different frequencies: per pixel rays, and ray traced probe.

2. We ray trace diffuse at quarter res on most platforms, so we lose some high
frequency details.

3. There can be some disparities between the raster world and the ray tracing world.
For example, our grass doesn’t exist in the acceleration structure for perf reasons.

*click* So, we reapply some RT-AO to our general AO terms. You can see how it helps
houses, large vegetation, …

*click* We also still evaluate a subtle SSAO term to catch details that might be missing
from low res BVH or low ray count (quarter res)

*click* To get this final render

80
RTGI + RTGI-AO + SSAO

There are 3 reasons why we add some RT-AO at the end:

1. We sum 2 systems with different frequencies: per pixel rays, and ray traced probe.

2. We ray trace diffuse at quarter res on most platforms, so we lose some high
frequency details.

3. There can be some disparities between the raster world and the ray tracing world.
For example, our grass doesn’t exist in the acceleration structure for perf reasons.

*click* So, we reapply some RT-AO to our general AO terms. You can see how it helps
houses, large vegetation, …

*click* We also still evaluate a subtle SSAO term to catch details that might be missing
from low res BVH or low ray count (quarter res)

*click* To get this final render

81
Acceleration Structure
• Ray tracing materials
• Bindless textures or average color
• Albedo and alpha
• Texture mip bias
• Average PBR attributes
• Overrides
• Cull meshes by size or type
• Clutter, small props, …
• “BVH Quality”
• Culling distances from the camera
• Global LOD offset
• Override
• Micropolygon
• Prebake RTMesh for a specified LODError

(Image captions: Primary Rays - no textures / albedo textures; BVH Quality - Low / High; RT LOD = Last LOD / Last LOD - 2.)

Our Acceleration Structure works as follows.

We support both bindless textures or average colors from our material textures.

We cull meshes by size or type.

We use a global LOD offset to determine mesh LODs used in the BVH.

Most of these can be overridden per asset to correct specific issues.

82
Acceleration Structure - No Textures Acceleration Structure – Albedo Textures

Final Gbuffer - Albedo

This is an example of our Acceleration Structure:

- At the top right is what we shipped in shadows, the acceleration structure using
bindless albedo textures.

- Gbuffer is at the bottom right for comparison.

83
Acceleration Structure
• Alpha test
1. Blue noise approximation (Snowdrop)
• Encode a transparency factor
• opaque texel count / total texel count
• Stochastically texkill the right amount of texels
2. Bindless Alpha texture
• Costly AnyHit with many overlaps
3. Tracing Max Transparent Hit Count
• Force opaque after n hits
• Biased result

(Image captions: Blue noise alpha - 10 hits / 2 hits; Alpha texture - 10 hits / 2 hits; AnyHit callback.)

For Alpha test, we tried several things

We tried using blue noise to approximate a transparency factor, and stochastically


texkill the right amount of texels.

We added support for bindless alpha texture, with close to no overhead.

But we ended up with either one of these 2 issues:

- If you want a precise result: Anyhit is very costly with many overlaps.

- And if you try to limit the number of hits, you’ll get way too much occlusion.
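
For reference, the blue noise approximation from point 1 reduces to a one-line probabilistic test, sketched here with assumed names:

```cpp
// Each alpha-tested mesh stores a transparency factor
// (opaque texel count / total texel count). A per-ray blue noise value
// in [0,1) stochastically accepts candidate hits so that, on average,
// the right fraction survives, with no alpha texture fetch at all.
bool AcceptAlphaTestedHit(float transparencyFactor, float blueNoise01)
{
    return blueNoise01 < transparencyFactor;
}
```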

84
Acceleration Structure
• Alpha test
• Scale triangles according to average opacity
• 30% faster to trace
• Diffuse GI very close to reference
• Close hits handled in screen space

(Image captions: AnyHit with alpha texture vs ClosestHit with scaled opaque triangles; Indirect Diffuse - alpha textures vs scaled triangles.)

Finally, we settled for another way.

We scale triangles according to their average opacity.

It’s 30% faster to trace.

Diffuse GI looks close to our reference.

It works very well because closest hits are handled in screen space and are much more
accurate.

*click* And it’s also a very good and cheap approximation for specular reflections.
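
One plausible way to implement that scaling, sketched under my own assumption that edges shrink by the square root of the opacity so the covered area matches it:

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Shrink the triangle around its centroid. Since area scales with the
// square of the edge scale, sqrt(averageOpacity) makes the remaining
// area proportional to the opacity; the shrunk triangle is then traced
// as fully opaque, so no AnyHit callback is needed.
void ScaleTriangleByOpacity(Vec3 v[3], float averageOpacity)
{
    float s = std::sqrt(averageOpacity);
    Vec3 c { (v[0].x + v[1].x + v[2].x) / 3.0f,
             (v[0].y + v[1].y + v[2].y) / 3.0f,
             (v[0].z + v[1].z + v[2].z) / 3.0f };
    for (int i = 0; i < 3; ++i)
    {
        v[i].x = c.x + (v[i].x - c.x) * s;
        v[i].y = c.y + (v[i].y - c.y) * s;
        v[i].z = c.z + (v[i].z - c.z) * s;
    }
}
```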

85
Acceleration Structure
• Alpha test
• Scale triangles according to average opacity
• 30% faster to trace
• Diffuse GI very close to reference
• Close hits handled in screen space

(Image captions: Alpha test approximation w/ ray traced specular; AnyHit with alpha texture vs ClosestHit with scaled opaque triangles.)

Finally, we settled for another way.

We scale triangles according to their average opacity.

It’s 30% faster to trace.

Diffuse GI looks close to our reference.

It works very well because closest hits are handled in screen space and are much more
accurate.

*click* And it’s also a very good and cheap approximation for specular reflections.

86
Acceleration Structure
• “RT Materials as a uber shader” approach (inline RT)
• Panicked after watching [Gong 21]
• (Capcom JSON shader mapping)
• But Ubi very strict about shader variety
• Low master shader count, owned by TD Arts
• GPU Driven pipeline reasons (per PSO batching)
• Relatively easy to write a "one fits them all" RT uber shader with Albedo textures
• Works for the vast majority, but epic fail for a few cases that requires manual fixing/overrides

Because we use Inline RT, we rely on an uber shader approach for our ray traced
materials.

*click* Thankfully, Ubi is very strict with shader variety and management.

We have a low master shader count for GPU Driven Pipeline reasons.

In the end it was relatively easy to write a “one fits them all” uber shader

*click*
But there were still a few epic fails, as in these screenshots. They were easily mitigated
with overrides.

87
Acceleration Structure
• Material table
• Unified representation
• PBR materials
• Bindless albedo and alpha textures
• Rough approximation of other PBR values
• Averaged mip 0 of PBR textures
• Mapping InstanceID+GeometryIndex to Material Table
• Season
• Season specific material variations
• Account for season specific material logic
• Fallen leaves, LUTs, …
• Limitations
• Costly deferred weather
• fully evaluated for specular (but not for diffuse)
• static snow applied for diffuse and specular
• baked in the terrain vista

As I said before, we use a unified representation of our materials for ray tracing

They are stored inside a material table that we access using the hit geometry IDs.

Seasons are handled with material variations in this table, to account for fallen leaves,
or the use of look up tables for leaf colors.

Deferred weather is fully evaluated for specular ray tracing, but not for diffuse, because
it’d be quite costly to evaluate it multiple times.

Static snow is baked into the terrain vista in winter though, so it’s there even for diffuse.

88
Spring Summer Autumn Winter

This is what our acceleration structure looks like for each season.

89
Acceleration Structure
• Typical urban scene
• 2k+ BLASes
• 30k+ instances
• 300MB BLAS Data
• 20MB TLAS Data Acceleration structure

BLAS Intersection count


Same scene in game

In a typical urban scene, this is on Xbox Series X, we end up with

around 30k+ instances,

for 2k+ BLASes.

Our BVH data is around 320MB.

90
Software ray tracing
• Stack-based traversal [Koshlo 24]
• Compute shader based
• Targets low-end platforms
• Dynamic hideout
3-level BVH partitions
• 3-level BVH with partial updates

Software Ray Tracing Hardware Ray Tracing

Our software ray tracing stack is very similar to Snowdrop’s.

It’s part of our low-level abstraction layer, Fusion.

We use it mainly for lower end platforms.

It’s structured as a 3-level BVH,

relying on space partitioning to perform partial updates only on cells where something has
changed.

91
Ray traced Indirect Diffuse
• Indoor volumes
• Avoid indoor light leaks
• List of planes
• Same data as BakedGI
• Classification of probes
• Sampling
• Find 8 surrounding probes Probe classification OFF
• Compute probe weights
• Distance, visibility, …
• Sample weighted probes with same classification

Probe classification ON

We still rely on indoor volumes to avoid leaks in interiors.

They are made of a list of planes. It’s the same data we use for baked GI, deferred
weather and other systems.

They are used to classify probes.

At sampling time, we find the 8 surrounding probes,

and sample weighted probes with the same classification
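
A sketch of that classification-aware sampling (names assumed; distance and visibility terms are folded into the input weights here):

```cpp
// Of the 8 surrounding probes, only those sharing the shading point's
// classification contribute; weights are renormalized over survivors.
struct Probe { float irradiance[3]; int classification; };

void SampleProbes(const Probe probes[8], const float weights[8],
                  int pointClassification, float outIrradiance[3])
{
    float wSum = 0.0f;
    outIrradiance[0] = outIrradiance[1] = outIrradiance[2] = 0.0f;
    for (int i = 0; i < 8; ++i)
    {
        if (probes[i].classification != pointClassification)
            continue;                   // e.g. skip outdoor probes indoors
        for (int c = 0; c < 3; ++c)
            outIrradiance[c] += probes[i].irradiance[c] * weights[i];
        wSum += weights[i];
    }
    if (wSum > 0.0f)
        for (int c = 0; c < 3; ++c)
            outIrradiance[c] /= wSum;   // renormalize over valid probes
}
```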

92
Ray traced Indirect Diffuse
• Light Probes convolved into
• Irradiance Volume [McGuire 19]
• Cosine weighted irradiance
• 5x5 pixels (octahedral mapping)
• Radiance Cache
• Higher frequency version of IV
• 10x10 pixels (octahedral mapping)
• Use as a cache fallback
• Radiance Cache
• Short rays with probe fallback
• Temporally stable
• Less precise
• Trade-off between precision and stability
(Images: Hit Lighting / Radiance cache / Irradiance)

Our light probes are convolved into:

An irradiance cache, pretty much like DDGI.

A radiance cache, acting as a fallback.

To stabilize the signal, we trace shorter rays and fall back to the radiance cache, which
is temporally stable.

You can see in the video a challenging scene, and the result with and without the cache.

The output is a bit less precise, but it’s a trade-off between precision and stability.

93
Translucency
• Vegetation, Shoji doors, etc.
• Stochastically flip rays on translucent surfaces

Lighting – RT Translucency Off Lighting – RT Translucency On AS Translucency Mask

For translucency, we stochastically flip rays on translucent surfaces

Also, if a secondary ray hits a translucent surface such as a paper door,

we sample probes on both sides of the surface,

and blend accordingly using the translucency factor.

94
Ray traced Indirect Diffuse
• Omnidirectional Clustered Lights
• Based on deferred clustered lighting
• 2-level hierarchy
• Level 0: 260k Clusters
• Level 1: 4k “HighClusters” (16x16x16)
• A HighCluster is 4x4x4 Clusters (64)
• Uniform grid instead of “Froxels”
• Centered around camera
• “Just” a different spatial mapping
• 10x speed up in challenging scenes

Local lights are inserted in what we call an omnidirectional clustered lighting structure.

It’s based on a regular clustered lighting code.

The main difference is that the clustered volume is mapped on a uniform grid around
the camera, instead of froxels.

It’s made of a 2-level hierarchy: High Clusters and Clusters.

We have 4k HighClusters, each containing 4x4x4 clusters, for a total of 260k clusters.

We first perform a coarse cluster culling, and then a finer grain culling at the cluster
level.
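
The addressing itself is then trivial compared to froxels; a sketch with an assumed cluster size:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Uniform world-space grid recentered on the camera: a 16x16x16 grid of
// HighClusters, each 4x4x4 clusters, gives 64 clusters per axis
// (64^3 ~= 260k total). The cell size in meters is an assumption.
uint32_t WorldPosToClusterIndex(Vec3 p, Vec3 cameraPos, float clusterSize)
{
    const int kClustersPerAxis = 16 * 4; // 64
    auto axisToCell = [&](float v, float cam)
    {
        float local = (v - cam) / clusterSize + kClustersPerAxis * 0.5f;
        int cell = int(std::floor(local));
        return std::min(std::max(cell, 0), kClustersPerAxis - 1); // clamp
    };
    int cx = axisToCell(p.x, cameraPos.x);
    int cy = axisToCell(p.y, cameraPos.y);
    int cz = axisToCell(p.z, cameraPos.z);
    return uint32_t((cz * kClustersPerAxis + cy) * kClustersPerAxis + cx);
}
```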

95
Raytraced Indirect Specular
• Last minute development (delayed launch)
• Hybrid ray tracing
• Roughly the same design as ray traced diffuse GI
• High-end platforms (PS5 Pro Quality, PC)
• Denoising more difficult
• Iron out BVH quality issues

I discussed mainly ray traced diffuse, but we also support specular ray tracing.

It was a last-minute development after the game was delayed. It follows roughly the
same design as ray traced diffuse GI.

We mainly target high end platforms.

It was a relatively easy development, leveraging most of the work done for diffuse.

But we had to iron out BVH quality issues, and denoising.

96
Denoising
• A-Trous
• SVGF-like denoiser [Schied and al. 17]
• ReBLUR - NVidia [Zhdan 21]
• Modular Snowdrop Denoiser (MSD)
• Spatial and temporal filter
• Based on recurrent blur approach
• Material masking for characters, vegetation, dynamic objects
• SH Denoising (YSHCoCg)

We have 3 denoisers in our engine:

A-Trous, an SVGF-like denoiser

ReBlur from NVidia

And MSD from Snowdrop. This is what we shipped

with material masking for characters, vegetation, and dynamic objects

and SH denoising

97
Raw output

This is the RAW output we get from our ray traced diffuse GI

This is after denoising

This is if you use SH to bring more directionality

Final lighting

And the Final image

98
Raw output
Denoising

This is the RAW output we get from our ray traced diffuse GI

This is after denoising

This is if you use SH to bring more directionality

Final lighting

And the Final image

99
Raw output
Denoising
SH Denoising

This is the RAW output we get from our ray traced diffuse GI

This is after denoising

This is if you use SH to bring more directionality

Final lighting

And the Final image

100
Raw output
Denoising
SH Denoising
Lighting result

This is the RAW output we get from our ray traced diffuse GI

This is after denoising

This is if you use SH to bring more directionality

Final lighting

And the Final image

101
Raw output
Denoising
SH Denoising
Lighting result
Final image

This is the RAW output we get from our ray traced diffuse GI

This is after denoising

This is if you use SH to bring more directionality

Final lighting

And the Final image

102
Baked GI

103
RTGI

104
RTGI+Spec

105
RTGI

106
RTGI+Spec

107
Ray tracing performance
• Ray traced diffuse probes

                     PS5          PS5 Pro      XBSX         XBSS         RTX 4080
                     1024 probes  1024 probes  1024 probes  512 probes   512 probes
Gbuffer update       0.34ms       0.19ms       0.31ms       0.32ms       0.06ms
Sun shadows          0.19ms       0.12ms       0.17ms       0.18ms       0.03ms
Lighting update      0.36ms       0.23ms       0.34ms       0.46ms       0.08ms
Lighting convolution 0.05ms       0.04ms       0.05ms       0.07ms       0.01ms
Probe relocation     0.09ms       0.08ms       0.09ms       0.09ms       0.03ms
Total                1.03ms       0.66ms       0.96ms       1.12ms       0.21ms

In terms of performance, this is what our ray traced diffuse probes cost,

roughly 1ms on most SKUs

108
Ray tracing performance
• Per pixel ray traced diffuse
• Quarter resolution (W/2 x H/2)

PS5 1440p PS5 Pro 1440p XBSX 1440p XBSS 900p RTX 4080 1440p
SS tracing 0.54ms 0.33ms 0.43ms 0.44ms 0.11ms
WS tracing 1.38ms 0.72ms 1.31ms 1.38ms 0.19ms
Lighting 1.17ms 0.76ms 1.02ms 1.06ms 0.36ms
Denoising 1.91ms 1.31ms 1.54ms 1.35ms 0.50ms
Total 5.00ms 3.12ms 4.30ms 4.23ms 1.16ms

Here is the cost of our per pixel ray tracing.

If we add these numbers, on a base PS5 at 1440p,

it’s a total of roughly 5ms + 1ms for the probes.

So 6ms of GPU time, spread across the gfx queue and the async queue.

109
Ray tracing performance
• Per pixel ray traced specular
• Half resolution (W/2 x H), looks significantly better than Quarter resolution (W/2 x H/2)

            PS5 Pro 1440p   RTX 4080 1440p
Tracing     2.73ms          0.46ms
Lighting    1.51ms          0.51ms
Denoising   2.43ms          0.93ms
Total       6.67ms          1.9ms

Finally, some numbers for ray traced specular. We wanted to ship it at quarter res like
diffuse.

But we ended up tracing it at half resolution instead,

Half res looked significantly better, and we had the frame budget for it on the PS5 Pro.

110
Weather and Seasons
• Systemic approach to weather and seasons
• “Atmos” fluid simulation
• Advects humidity, temperature, …
• “Ambiance Graph” drives all the logic
• VFX, Fog, Rain, Wetness, Puddle level
• Deferred rain and snow
• “Deep” Snow for footsteps
• Multistate entities
• Build multistate entity templates
• Data-driven game-specific logic (seasons, pristine/destroyed, …)
• Multiple looks, nav-meshes, …

Now about weather and seasons

*click* We adopt a systemic approach to weather and seasons

With “Atmos”, a fluid simulation

And an “Ambiance Graph” to drive all the weather and seasons logic

*click* Rain and snow are rendered in deferred

*click* We introduced the concept of multistate entities.

To alter entities, their look, or anything, based on game specific logic.

111
Weather and Seasons - Atmos
• Real-time fluid simulation [Hädrich et al. 20]
• Propagates various atmospheric factors
• Feeds dynamic clouds, wind, and rain systems
• Simulate cloud formation and dissipation, …

Atmos would deserve a whole talk. It simulates and propagates atmospheric factors with
a fluid simulation

such as vapor, temperatures, humidity, … and simulates the wind using low resolution
voxel data.

These quantities are then fed to various systems, such as volumetric clouds to drive
their formation, wind, and rain.

On the right, some graphs of various quantities as a function of altitude: temperature,
vapor, vorticity, …

112
Weather and Seasons - Ambiance

• Ambiances
• Time of day, weather, grading,
seasons, post effects…
• Mostly curves
• Limited logic
• Ambiance graph
• Data driven
• Visual scripting (node graph)
• Outputs custom UI

In previous games, we relied exclusively on an Ambiance manager system.

*click* We used it to drive time of day, lighting features, weather, grading parameters,
post effects, …

It’s mostly curves and has limited logic.

*click* For Shadows, we added the concept of visual scripting with an Ambiance Graph.

It’s based on our node graph system, and fully data driven.

The ambiance graph consumes inputs from the engine and from Atmos, and drives the
whole weather and season stack, ToD and so on, with tech-art-driven logic.
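
As a minimal sketch of the kind of curve evaluation such an ambiance system builds on (the structure below is entirely hypothetical, not the actual Anvil data model): piecewise-linear keys over the 24h day, whose output a graph node can then combine with weather- or season-driven logic.

#include <cstddef>
#include <vector>

struct CurveKey { float hour; float value; };

// Evaluate a piecewise-linear time-of-day curve at a given hour.
float EvaluateAmbianceCurve(const std::vector<CurveKey>& keys, float hour)
{
    if (keys.empty()) return 0.0f;
    if (hour <= keys.front().hour) return keys.front().value;
    if (hour >= keys.back().hour) return keys.back().value;

    for (std::size_t i = 1; i < keys.size(); ++i)
    {
        if (hour < keys[i].hour)
        {
            const float t = (hour - keys[i - 1].hour) /
                            (keys[i].hour - keys[i - 1].hour);
            return keys[i - 1].value + t * (keys[i].value - keys[i - 1].value);
        }
    }
    return keys.back().value;
}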

113
Deferred Weather - Rain
• Deferred Rain [Lagarde 12]
• Wetness and Puddle Level
• Albedo darkening

• Roughness

• Ripple Normal map applied to puddles

Our deferred rain rendering is based on Sébastien Lagarde's great blog post series.

We use it to manipulate and render wetness and puddle level. We darken albedos and
decrease roughness based on wetness levels.
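
A sketch of that wetness response on material inputs, in the spirit of [Lagarde 12]; the exact darkening strength and wet roughness value below are assumptions, not the shipped constants.

struct Float3 { float r, g, b; };

// Darken albedo and pull roughness toward a glossy value as wetness rises.
void ApplyWetness(Float3& albedo, float& roughness, float wetness)
{
    const float darkening = 1.0f - 0.4f * wetness;  // assumed darkening strength
    albedo.r *= darkening;
    albedo.g *= darkening;
    albedo.b *= darkening;
    // Wet surfaces are glossier: lerp roughness toward a low value.
    const float wetRoughness = 0.05f;               // assumed
    roughness = roughness + (wetRoughness - roughness) * wetness;
}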

114
Deferred Weather - Rain
0% Wetness / 0% Puddles 100% Wetness / 0% Puddles 100% Wetness / 100% Puddles

• Final Render
• Darken Albedo
• Lower Roughness

• Wetness [0..2]
• R = no wetness on material
• G = wetness factor [0..1]
• B = puddle factor [1..2]

• Weather Albedo

This is an illustration of how it works.

The first row shows the final render

The middle row shows our wetness mask, encoded as a single float in the Gbuffer.

A value between 0 and 1 encodes wetness, whereas a value between 1 and 2 encodes a
puddle factor.

Red objects, mainly characters, are excluded from the deferred wetness shading; characters
use a dynamic character layer system instead.
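
A minimal sketch of that single-float [0..2] encoding (the function names are hypothetical): one Gbuffer channel carries both the wetness factor and the puddle factor.

#include <algorithm>

float PackWeatherMask(float wetness, float puddle)
{
    // A puddle implies a fully wet surface, so the puddle range [1..2] wins.
    return puddle > 0.0f ? 1.0f + std::clamp(puddle, 0.0f, 1.0f)
                         : std::clamp(wetness, 0.0f, 1.0f);
}

void UnpackWeatherMask(float packed, float& wetness, float& puddle)
{
    wetness = std::min(packed, 1.0f);         // saturates to 1 under puddles
    puddle  = std::max(packed - 1.0f, 0.0f);  // non-zero only in the [1..2] range
}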

115
This is a step-by-step result:

You can see a dry scene.

*click* Now the same scene, but wet.

*click* and now with puddles.

116
This is a step-by-step result:

You can see a dry scene.

*click* Now the same scene, but wet.

*click* and now with puddles.

117
This is a step-by-step result:

You can see a dry scene.

*click* Now the same scene, but wet.

*click* and now with puddles.

118
Deferred Weather - Snow
• “Deep” Snow
• Data driven stamper: capsules, textures
• Terrain deformation, Heightmap
Footsteps stamping
• Deferred Snow
• Like deferred rain approach
• Snow modifies material inputs
• Dynamic Snow Accumulation
• Painted mask (warm zones)
• Static/Dynamic snow
• Pseudo random sparkles
• Specular scale
• Indoor Masking
Static/Dynamic snow map

Snow is also an important part of the game.

*click* First there is what we call “deep” snow. It relies on a data-driven stamper that can
stamp capsules or textures (a minimal sketch of such a stamp follows these notes).

It can be used to deform the terrain, or any heightmap.

*click* Next is the deferred snow

It’s like the deferred rain. Deferred Snow also modifies material inputs.

We have the concept of cold and warm zones, driven by a static/dynamic snow mask.
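
Going back to the “deep” snow stamper: here is a minimal capsule stamp into a heightmap, the kind of operation that carves footsteps. The data layout and falloff are assumptions, not the actual Anvil implementation.

#include <algorithm>
#include <cmath>
#include <vector>

// Carve a soft-edged capsule depression (segment AB + radius) into a heightmap.
void StampCapsule(std::vector<float>& height, int w, int h, float texelSize,
                  float ax, float ay, float bx, float by, // capsule segment, world XZ
                  float radius, float depth)
{
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
        {
            const float px = x * texelSize;
            const float py = y * texelSize;
            // Distance from this texel to the capsule segment AB.
            const float abx = bx - ax, aby = by - ay;
            const float lenSq = std::max(abx * abx + aby * aby, 1e-6f);
            const float t = std::clamp(((px - ax) * abx + (py - ay) * aby) / lenSq,
                                       0.0f, 1.0f);
            const float dx = px - (ax + t * abx);
            const float dy = py - (ay + t * aby);
            const float d = std::sqrt(dx * dx + dy * dy);
            if (d < radius)
            {
                // Soft edge; a real stamper would also clamp accumulation.
                height[y * w + x] -= depth * (1.0f - d / radius);
            }
        }
}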

119
Deferred Weather - Snow
• Snow modifies material inputs

As I said, deferred snow modifies material inputs.

It lerps albedos toward the snow albedo (white) according to the current snow level.

It does the same with roughness.

And it also lowers translucency the higher the snow level is.
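
Putting those three modifications together, a minimal sketch of the blend just described; the snow albedo and roughness constants are assumptions, not the shipped values.

struct SnowFloat3 { float r, g, b; };

static float SnowLerp(float a, float b, float t) { return a + (b - a) * t; }

// Lerp albedo and roughness toward snow values; fade translucency out.
void ApplySnow(SnowFloat3& albedo, float& roughness, float& translucency,
               float snowLevel)
{
    const SnowFloat3 snowAlbedo = { 0.9f, 0.9f, 0.95f }; // assumed near-white
    const float snowRoughness = 0.7f;                    // assumed

    albedo.r = SnowLerp(albedo.r, snowAlbedo.r, snowLevel);
    albedo.g = SnowLerp(albedo.g, snowAlbedo.g, snowLevel);
    albedo.b = SnowLerp(albedo.b, snowAlbedo.b, snowLevel);
    roughness = SnowLerp(roughness, snowRoughness, snowLevel);
    translucency *= 1.0f - snowLevel;  // lower translucency with more snow
}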

120
Deferred Weather - Snow
• Microvisibility masking

Microvisibility buffer

Next, snow is masked out by microvisibility.

121
Deferred Weather - Snow
• Threshold snow based on normal orientation

Finally, we threshold the snow based on normal orientation.
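
A minimal sketch of that orientation threshold (the cutoff and transition width are assumptions): snow only sticks to surfaces facing up enough.

#include <algorithm>

// Returns a [0..1] snow mask from the world-space up component of the normal.
float SnowOrientationMask(float normalUp)
{
    const float threshold = 0.4f;  // assumed cutoff
    const float smoothing = 0.2f;  // assumed transition width
    return std::clamp((normalUp - threshold) / smoothing, 0.0f, 1.0f);
}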

122
Deferred Weather - Snow

This is a whole sequence of dynamic snow accumulation, accelerated.

I don’t show it here, but when the snowstorm stops,

the snow will slowly decay and turn into wetness and puddles until it melts and dries.

This logic is driven by the Ambiance Graph I showed earlier.

123
Deferred Weather - Occlusion
• Top-down indoor depth map
• Mid-range
• Raster – same indoor volumes used for GI
• Occludes wetness/snow accumulation

Indoor occlusion No indoor occlusion Indoor occlusion No indoor occlusion

To exclude deferred snow and rain from interiors, we use the same indoor volumes we
use for GI.

We rasterize them in an indoor depth map.

And use it to determine an occlusion factor when we apply the weather.
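
A minimal sketch of that test, assuming the top-down map stores, per texel, the height of the indoor volume above (the sentinel and the binary comparison are assumptions; a real implementation would likely soften the result).

// Returns 1 when exposed to the weather, 0 when under an indoor volume.
float IndoorWeatherOcclusion(float worldHeight, float indoorMapHeight)
{
    const float kNoIndoor = -1.0e6f;  // assumed "no indoor volume" sentinel
    if (indoorMapHeight == kNoIndoor)
        return 1.0f;                  // fully exposed
    return worldHeight > indoorMapHeight ? 1.0f : 0.0f;  // under a roof -> occluded
}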

124
Deferred Weather - Occlusion
• Oriented depth map
• Occludes precipitation particles and ripples
• Close range
• Regular geometry
• Rain orientation

For rain particles and ripples, we render the surroundings in a regular depth map
oriented in the rain direction.

And we use it to occlude rain particles and ripples.
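
This behaves like a shadow-map test along the rain direction. A minimal sketch, where the position is assumed to be already projected into the rain-oriented map's clip space and the bias value is an assumption:

// A rain particle or ripple survives only if nothing blocks the rain first.
bool RainReachesPoint(const float posRainSpace[3],
                      float (*sampleRainDepth)(float u, float v))
{
    const float bias = 0.002f;                      // assumed depth bias
    const float u = posRainSpace[0] * 0.5f + 0.5f;  // NDC -> UV
    const float v = posRainSpace[1] * 0.5f + 0.5f;
    return posRainSpace[2] <= sampleRainDepth(u, v) + bias;
}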

125
Scalability & Performance

Now I’ll talk about our approach to scalability and performance.

126
Scalability
• Very different frames in quality and performance modes
• Xbox Series X, Quality, 1620p, 33.1ms

• Xbox Series X, Performance, 1080p, 16.2ms

Mainly because we ship different GI systems in performance and quality modes, our frames
look very different in each mode.

Add to that the number of platforms we had to ship (PC, PS5, PS5 Pro, XBSX, XBSS,
macOS, …), and it was a level of complexity we’d never faced before.

127
Platform Manager
• Data driven performance settings
• Per platform and context settings
• Mostly graphics and engine systems
• Terrain, streaming, fog, water,
shadows,…
• Auto generated UI
• Live edition

Let’s talk about the platform manager.

It is how we implement data-driven performance settings, per platform and context. It
is mainly a tool for Tech Art Directors.

It mostly, but not exclusively, covers engine and graphics systems.

UI is auto generated, and it’s live editable.

128
Platform Manager
• Profiles
• Platforms, “modes”
• Triggered by users

• Profile settings
• Requires reloading the world
• System and features scalability
• Profile boot settings
• Requires restarting the game

Profiles are basically platform modes, such as perf mode or quality mode.

Modes are triggered by the user.

Profile settings require reloading the world, while profile boot settings require restarting the
game (there are very few of them).

You can see some examples on the right. GI technique, Hair Strands memory budgets,
etc…
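
As a hypothetical sketch of what such data-driven profile settings could look like (none of these names come from the talk): a profile is a keyed bag of values that systems read when the world is (re)loaded.

#include <string>
#include <unordered_map>

struct Profile
{
    std::unordered_map<std::string, float> settings;

    // Read a setting with a fallback when the profile doesn't override it.
    float Get(const std::string& key, float fallback) const
    {
        const auto it = settings.find(key);
        return it != settings.end() ? it->second : fallback;
    }
};

// Usage sketch: a performance profile might select the baked GI path and
// smaller budgets, while a quality profile enables ray tracing.
// Profile perf{{{"GI.UseRayTracing", 0.0f}, {"HairStrands.BudgetMB", 32.0f}}};
// const bool rtgi = perf.Get("GI.UseRayTracing", 0.0f) > 0.5f;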

129
Platform Manager
• Contexts
• Game states
• Triggered by gameplay

• Context settings
• Runtime specific features
• Change upon game state changes

A context is triggered by gameplay. It can be the Game itself, Menus, Cinematics, PhotoMode, …

It is a way to fine-tune render features and performance for specific scenarios.

130
Platform Manager

• Modifiers
• Triggered by data
• Painted region layout
• 3D volumes
• Localized performance issue
• dense forest, …
• Specific needs for specific areas
• Forest vs caves, …

Modifiers are triggered by data.

They can be painted or triggered by 3D volumes.

It’s mainly used to address localized performance issues

or specific needs.

131
Platform Manager
• First-class citizen
• Mandatory support for new systems
• Fast iteration and profiling
• Useful to iterate with vendors
• “Simulate” consoles in the editor

To summarize, the Platform Manager is a first-class citizen in Anvil.

It is great for fast iteration and profiling.

It is very useful to iterate with vendors (according to them, at least).

We can “simulate” console settings in the editor.

132
Transient resource tracking
• High level
• Resource lifetime
• Transient memory peak
• Included in performance captures

• Low level
• Inspect allocators
• Debug and optimize aliasing
• Memory waste

To track transient resource allocations, we use 2 internal tools

*click* There is a high-level view available in our internal perf capture tool that shows
resource lifetime and the allocation peak during the frame (the green curve).

It is more of a logical view. We can color the allocations depending on their lifetime, or
their allocation size.

*click* There is also another, more low-level view. We output an SVG with a very detailed
view of our allocators.

It is very useful to detect memory waste, and view memory aliasing patterns.

133
Performance Telemetry
• Teleport camera to each cell
• Loading gates
• Perf counters North/South/East/West
• Auto CPU/GPU captures if low fps
• Compare builds to find regressions
• At different dates
• Against another platform
• …

World Telemetry

We have a lot of telemetry to hunt performance regressions or anomalies.

It teleports the camera to each world cell and takes a snapshot in each cardinal
direction.

It gathers many counters: CPU, GPU, anything.

Everything is stored in a DB so we can track the evolution of our builds, compare them, …

134
CPU work distribution
Comparison of two PS5 builds (GPU time)

Dynamic resolution heatmap Meta-AI unit heatmap

We can track many things from the engine: CPU time, GPU time, NPC count, dynamic
resolution factors, any metric really.

A few examples:

At the top left, it’s the distribution of CPU work over a session (between graphics, physics,
gameplay, …)

At the top right, I compare 2 PS5 builds to identify GPU time regressions

At the bottom left, I show a heatmap of the dynamic resolution factor, to identify the
most challenging areas for the GPU.

At the bottom right, this is the number of spawned meta-AI units.

Anything can be tracked and integrated in our telemetry, and it really helps spot
undesired behaviors in such a large game.

135
Conclusion

So it’s time to conclude

136
Conclusion
• Shadows really is a “Large Scale Systemic Open World”
• First game shipped in a monorepo with a shared engine
• Largest Assassin’s Creed game in terms of scope
• Ray tracing and virtualized geometry in a 16x16km Open World
• Most scalable version of Anvil to date

ACShadows is the first game we shipped in a monorepo ecosystem with a shared engine.

It’s the largest Assassin’s Creed game ever made in terms of scope.

It includes state of the art ray tracing and virtualized geometry in a large open world.

And it’s the most scalable version of Anvil to date.

137
Conclusion

What would we improve?

What would we improve?

138
Conclusion
• Lot of technology developed during production
• Micropolygon, Ray tracing, Atmos, Terrain, GPU Scatter, Hair, …
• Micropolygon
• Had to be conservative and adjust production guidelines
• Extrapolate polygon budgets early in production
• Any mistake would mean reworking assets in the entire 16x16km world
• Extend the use of Micropolygon
• Increase geometric details

We developed a lot of technology for this game: micropolygon geometry, ray-tracing...

These were developed during production, and we had to anticipate and extrapolate
extensively to make sure they could perform at that scale.

With a game of this size, there’s little room for error: any mistake could mean
revisiting and fixing assets across the entire world map.

*click* We’ll invest more and more in Micropolygon and push to increase geometric
details.

139
Conclusion
• Baked and Ray traced Global Illumination
• Good for players
• A headache for developers!
• Performance vs Quality Modes = very different frames
• Underestimated workarounds and hacks artists use with our Baked GI
• System specific bugs
• Real-time cutscenes

• Make Ray tracing more prevalent to simplify our pipeline

About Baked and Raytraced GI

It’s more choice for players, but a headache for devs.

It increased the complexity of the game, and we ended up with 2 very different frames in
perf and quality modes.

We also clearly underestimated the workarounds and hacks artists used with our
Baked GI, especially in cutscenes.

And support was more complex because we had system specific bugs.

*click* More ray tracing in future games will help us simplify our pipeline.

140
Conclusion
• Combinatory complexity
• Difficult to QA
• 16x16km w/ Time of Day
• Systemic Weather
• 4 Seasons with variations
• Many platforms (PS5, PS5 Pro, XBSS, XBSX, PC, MacOS, Steam Deck)
• Performance/Balanced/Quality modes on Consoles
• Many graphics options
• Many upscalers (TAA, DLSS, XeSS, FSR, PSSR)
• Very different results
• Code complexity + vendor specific frameworks
• Microsoft DirectSR nice on paper
• But lagging behind bleeding edge updates
• No FrameGen support

• Time to rethink how we approach this complexity

This game was very difficult to QA.

It’s very large, and very dynamic.

We shipped on many platforms, with many modes, options and upscalers.

Upscalers are evolving fast with new versions every now and then. We had specific issues
with each upscaler. They often output quite different results.

Vendor-specific frameworks increased code complexity.

I really like the promise of Microsoft DirectSR to abstract all that, but

It’s still lagging behind in terms of upscaler versions.

And there is no Framegen support, afaik.

These 2 points are deal breakers, but I really hope they are addressed in the future.

*click* Considering all this, we think it’s time to rethink how we approach this complexity
and how we present it to the players.

141
Conclusion
• Graph based season and weather
• Easy to prototype and iterate
• Difficult to QA and repro issues
• Idea requires more iteration
• GPU frame complexity
• Platforms / Modes / Multiple games in the same codebase
• More data driven
• More customization needed in a shared engine (with a monorepo approach)

Graph based season and weather made it easy to prototype

But it was difficult to QA and repro issues.

It’s an idea that requires more iteration and maturity.

*click*
We also definitely want to push frame modularization and customization further,

especially in the context of a monorepo and shared engine.

142
BIBLIOGRAPHY
• [Bussière and Lopez 24] GPU-Driven Rendering in Assassin’s Creed Mirage, GPU Zen 3
• [Koshlo 24] Ray Tracing in Snowdrop: Scene Representation and Custom BVH, GDC 2024
• [Kuenlin 24] Raytracing in Snowdrop: An Optimized Lighting Pipeline for Consoles, GDC 2024
• [Gong 21] 'Resident Evil Village': Our Approach to Game Design, Art Direction, and Graphics, GDC 2021
• [Karis 21] Nanite: A Deep Dive. SIGGRAPH 2021
• [Zhdan 21] ReBLUR: A Hierarchical Recurrent Denoiser
• [Hädrich et al. 20] Stormscapes: Simulating Cloud Dynamics in the Now
• [Achard 19] Exploring Raytraced Future in Metro Exodus, GDC 2019
• [Hobson 19] The Indirect Lighting Pipeline of God of War, GDC 2019
• [McGuire 19] Dynamic Diffuse Global Illumination, GDC 2019
• [Lefebvre 18] Virtual Insanity: Meta AI on Assassin's Creed: Origins, GDC 2018

Some links related to this talk

143
BIBLIOGRAPHY
• [Uchimura 18] Practical HDR and Wide Color Techniques in Gran Turismo SPORT, SIGGRAPH ASIA 2018
• [Rodrigues 17] Moving to DirectX 12: Lessons Learned, GDC 2017
• [Schied et al. 17] Spatiotemporal Variance-Guided Filtering, HPG 2017
• [Haar and Aaltonen 15] GPU Driven Rendering Pipelines, SIGGRAPH 2015
• [Jacobs 15] Simulating the Visual Experience of Very Bright and Very Dark Scenes
• [Lagarde 12] Water drop – Physically based wet surfaces, Blog post series
• [Tokuyoshi 11] Fast Global Illumination via Ray-Bundles
• [Cignoni et al. 05] Batched Multi Triangulation
• [Oat 05] Irradiance Volumes for Games, GDC 2005
• [Jensen 00] Night Rendering
• [Kumar et al. 96] Hierarchical Back-Face Culling

144
Questions? Nicolas Lopez
@Nicolas_Lopez_
@nicolas-lopez.bsky.social
@NicolasLopez@mastodon.gamedev.place

And we are done. I’ll be taking questions if we still have time for it.

145
