3D Computer Graphics Alan Watt 3th Edition Selected Chapters
3D Computer Graphics Alan Watt 3th Edition Selected Chapters
This book will enable you to master the fundamentals of 3D computer graphics.
As well as incorporating recent advances across al.l of computer graphics it
contains new chapters on
Advanced radiosity
Animation
Pre-calculation techniques
and includes a CD containing a 400 image study.
Alan Watt, based at the University of Sheffield, is the author of several
successful books including Advanced
Animation and Rendering Techniques and
The Computer Image.
II II I m11~11
90 0 0
O>
, 780201 , , . , ,
Typeset by 42
Printed and bound in The United States of America
10 9 8 7
07 06 OS 04 03
Para Dionea
a garota de Copacabana
Contents
Preface
xvi
Acknowledgements
xxi
GD
2
8
1.2
Structure-deforming transformations
1.3
11
12
12
12
14
15
1.4
17
17
18
19
21
23
23
1.5
25
27
27
2.1
33
37
(~)
CONTENTS
CONTENTS
2.1 .2
2.1.3
2.1.4
2.1.5
46
51
53
55
56
2.4
56
64
66
66
69
75
3.2
B-spline representation
3.2.1 B-spline curves
3.2.2 Uniform B-splines
3.2.3 Non-uniform B-splines
3.2.4 Summary of B-spline curve properties
78
78
80
84
90
3.3
Rational curves
3.3.1 Rational Bezier curves
3.3.2 NURBS
90
91
93
94
98
100
101
106
107
110
115
121
123
123
124
125
125
128
128
135
138
140
141
142
142
143
143
143
143
147
147
147
149
152
156
157
159
159
162
164
Introduction
5.1
5.1.1
5.1 .2
5.1 .3
77
4.1
58
59
2.6 Summary
Introduction
38
38
39
44
2.5
6.1
6.2
6.2.1
6.2.2
6.2.3
6.3
6.3.1
6.3.2
6.3.3
6.3.4
6.4 Rasterization
6.4.1 Rasterizing edges
6.4.2 Rasterizing polygons
167
167
168
171
173
177
179
179
180
181
182
183
183
183
185
CONTENTS
CONTENTS
6.5
6.6
6.7
Order of rendering
187
8.5
8.6
6.6.1
6.6.2
6.6.3
6.6.4
6.6.5
6.6.6
6.6.7
6.6.8
6.6.9
6.6.1 0
189
189
190
191
192
193
193
194
196
198
199
202
Introduction
205
205
7.1
206
7.2
207
7.3
208
7.4
211
7.5
212
7.6
213
7.7
Pre-computing BRDFs
219
7.8
221
214
214
216
216
Mapping techniques
223
Introduction
223
228
229
230
8.1
234
8.3
Billboards
235
8.4
Bump mapping
8.4.1 A multi-pass technique for bump mapping
8.4.2 A pre-calculation technique for bump mapping
236
238
2 39
light maps
240
243
245
247
248
249
8.6.1
8.6.2
8 .6.3
8 .6.4
10
251
251
252
254
256
8.8
256
8.9
260
Geometric shadows
Introduction
263
263
9.1
265
9.2
265
9.3
Shadow algorithms
9.3.1 Shadow algorithms: projecting polygons/ scan line
9.3.2 Shadow algorithms: shadow volumes
9.3.3 Shadow algorithms: derivation of shadow polygons from
light source transformations
9.3.4 Shadow algorithms: shadow Z-buffer
267
267
268
Global Illumination
Introduction
10.1 Global
10.1 .1
10.1 .2
10.1.3
illumination models
The rendering equation
Radiance, irradiance and the radiance equation
Path notation
271
271
275
2 75
276
277
278
281
283
284
284
286
288
292
294
297
300
CONTENTS
CONTENTS
11
12
301
303
304
13
Volume rendering
Introduction
370
370
373
377
378
379
379
380
382
384
385
387
388
390
391
392
392
393
397
398
400
4 01
402
404
406
411
418
4 18
306
306
308
310
314
315
318
319
319
321
323
325
325
332
342
342
343
343
344
345
346
347
350
352
419
354
354
355
355
357
363
15.2 Colo ur
15.2.1
15.2.2
15.2.3
420
42 3
4 24
427
14
15
364
427
429
433
367
435
CONTENTS
CONTENTS
16
17
436
436
443
443
444
445
445
447
447
448
452
452
456
458
460
463
465
469
469
470
Computer animation
Introduction
473
473
476
437
439
440
477
477
493
500
479
483
484
488
492
18
504
505
506
507
510
511
515
517
518
519
522
524
526
529
531
17.9 Summary
534
536
536
537
538
539
18.4 Radiosity
541
18.5 RADIANCE
543
18.6 Summary
543
References
544
Index
553
PREFACE
.&ii,QW . .MS\iS@44*ii*AiiM .f
(~)
Preface
'
''
'''
:
!'
Processor
Model
databases
'
''
'
Applications
program
Proo<>re
Display hardware
'''
'''
:
l'
'
+''
'
<ore
r--
,~::,~~ ~(JEpl~
'
Interaction {
This is the third edition of a book that deals with the processes involved in
converting a mathematical or geometric description of an object - a computer
graphics model - into a visualization - a two-dimensional projection - that
simulates the appearance of a real object. The analogy of a synthetic camera is
often used and this is a good allusion provided we bear in mind certain important
limitations that are not us'ually available in a computer graphics camera (depth of
field and motion blur are two examples) and certain computer graphics facilities
that do not appear in a camera (near and far clipping planes).
Algorithms in computer graphics mostly function in a three-dimensional
domain and the creations in this space are then mapped into a two-dimensional
display or image plane at a late stage in the overall process. Traditionally computer graphics has created pictures by starting with a very detailed geometric
description, subjecting this to a series of transformations that orient a viewer
and objects in three-dimensional space, then imitating reality by making the
objects look solid and real - a process known as rendering. In the early 1980s
there was a coming together of research- carried out in the 1970s into reflection
models, hidden surface removal and the like- that resulted in the emergence of
a de facto approach to image synthesis of solid objects. But now this is proving
insufficient for the new demands of moving computer imagery and virtual reality and much research is being carried out into how to model complex objects,
where the nature and shape of the object changes dynamically and into capturing the richness of the world without having to explicitly model every detail.
Such efforts are resulting in diverse synthesis methods and modelling methods
but at the moment there has been no emergence of new image generation techniques that rival the pseudo-standard way of modelling and rendering solid
objects- a method that has been established since the mid-1970s.
So where did it all begin? Most of the development in computer graphics as
we know it today was motivated by hardware evolution and the availability of
new devices. Software rapidly developed to use the image producing hardware.
In this respect the most important development is the so-called raster display, a
device that proliferated in the mass market shortly after the development of the
PC. In this device the complete image is stored in a memory variously called a
Figure P.l
The main elements of a
graphics system.
CJ
frame store, a screen buffer or a refresh memory. This information - the discretized computer image - is continually converted by a video controller into a
set of horizontal scan lines (a raster) which is then fed to a TV-type monitor. The
image is generated by an application program which usually accesses a model or
geometric description of an object or objects. The main elements in such a system are shown in Figure P.l. The display hardware to the right of the dotted line
can be separate to the processor, but nowadays is usually integrated as in the case
of an enhanced PC or a graphics workstation. The raster graphics device overshadows all other hardware developments in the sense that it made possible the
display of shaded three-dimensional objects - the single most important theoretical development. The interaction of three-dimensional objects with a light
source could be calculated and the effect projected into two-dimensional space
and displayed by the device. Such shaded imagery is the foundation of modern
computer graphics.
The two early landmark achievements that made shaded imagery possible are
the algorithms developed by Gouraud in 1971 and Phong in 1975 enabling easy
and fast calculation of the intensities of pixels when shading an object. The
Phong technique is still in mainstream use and is undoubtedly responsible for
most of the shaded images in computer graphics.
(~)
PREFACE
PREFACE
models and the development of global models. Local or direct reflection models
only consider the interaction of an object with a light source as if the object and
light were floating in dark space. That is, only the first reflection of light from
the object is considered. Global reflection models consider how light reflects
from one object and travels onto another. In other words the light impinging on
a point on the surface can come either from a light source (direct light) or indirect light that has first hit another object. Global interaction is for the most part
an unsolved problem, although two partial solutions, ray tracing and radiosity,
are now widely implemented.
Computer graphics research has gone the way of much modern scientific
research - early major advances are created and consolidated into a practical
technology. Later significant advances seem to be more difficult to achieve. We
can say that most images are produced using the Phong local reflection model
(first reported in 1975), fewer using ray tracing (first popularized in 1980) and
fewer still using radiosity (first reported in 1984). Although there is still much
research being carried out in light-scene interaction methodologies much of the
current research in computer graphics is concerned more with applications, for
example, with such general applications as animation, visualization and virtual
reality. In the most important computer graphics publication (the annual SIGGRAPH conference proceedings) there was in 1985 a total of 22 papers concerned with the production techniques of images (rendering, modelling and
hardware) compared with 13 on what could loosely be called applications. A
decade later in 1995 there were 37 papers on applications and 19 on image production techniques.
PREFACE
Acknowledgements
Lightwork Design Ltd (Sheffield, UK) and Dave Cauldron for providing the
facilities to produce the front cover image (model of the Tate Gallery, St Ives,
UK) and the renderer, RadioRay.
Daniel Teece for the images on the back cover which he produced as part of
his PhD thesis and which comprise three-dimensional paint strokes
interactively applied to a standard polygon model.
Lee Cooper for producing Figures 6.12, 7.8, 8.7, 8.10, 10.4, 18.1, 18.3, 18.5,
18.6, 18.7, 18.8, 18.9, 18.10, 18.11, 18.12, 18.13, 18.14, 18.16, 18.17 and
18.19 together with the majority of images on the CD-ROM. These were
produced using Lightworks Application Development System kindly
supplied by Lightwork Design Ltd.
In addition the author would like to thank Keith Mansfield, the production staff at
Addison-Wesley, Robert Chaundry of Bookstyle for his care with the manuscript
and Dionea Watt for the cover design.
The publishers are grateful to the following for permission to reproduce copyright
material:
Figure 2.1 reproduced with the permission of Viewpoint Digital, Inc; Figure 2.4
from Tutorial: Computer Graphics, Ze (Beatty and Booth, 1982), 1982 IEEE, The
Institute of Electrical and Electronics Engineers, Inc., New York; Figures 2.7 and 2.8
from Generative Modelling for Computer Graphics and CAD (Snyder, 1992), Academic
ACKNOWLEDGEMENTS
Press, London; Figure 2.20 reproduced with the permission of Agata Opalach; Figure
13.3 from VOXEL-MAN, Part 1: Brain and Skull, CD-ROM for UNIX workstations and
LINUX PCs, Version 1.1 Karl-Heinz Hohne and Springer-Verlag GmbH & Co. KG
1996, reproduced with kind permission; Figure 16.14 reproduced with the permission of Steven Seitz; Figure 17.28 from ACMTransactions on Graphics, 15:3, July 1996
(Hubbard, 1996), ACM Publications, New York.
Mathematical fundamentals of
computer graphics
Whilst every effort has been made to trace the owners of copyright material, in a
few cases this has proved impossible and we take this opportunity to offer our
apologies to any copyright holders whose rights we may have unwittingly
infringed.
Trademark notice
1.1
1.2
Structure-deforming transformations
1.3
1.4
1.5
CD
Pa =
pb =
P s=
1
Yt- Yz
YI - Y4
1
Xb- Xa
+ pz (yt - y,)]
+ P b (Xs -
Xa)]
These would normally be implem ented using an incremental form, the fin al
equation , for exam ple, becomin g:
Ps := Ps + D.p
with the constant value D.p calculated once per scan line.
CJ
2.1
2.2
2.3
2.4
2.5
2.6
Summary
Introduction
Th e primary purpose of three-dimensional computer graphics is to produce a
two-dimensional image of a scene or an object from a description or model of
the ob ject. The object may be a real or existing object or it may exist only as a
computer description. A less common but extremely important usage is where
the act of creation of the object model and the visualization are intertwined.
This occurs in interactive CAD applications where a designer uses the visualization to assist the act of creating the object. Most object descriptions are approximate in the sense that th ey describe the geometry or shape of the object only
to the extent that inputting this description to a renderer produces an image of
acceptable quality. ln many CAD applications, however, the description h as to
be accurate because it is used to drive a manufacturing process. The final output
is not a two-dimensional image but a real three-dimensional object.
Modelling and representation is a general ph rase which can be applied to any
or all of the following aspects of objects:
The ways in which we can create computer graphics objects are almost as many
and varied as the objects themselves. For example, we might construct an architectural object through a CAD interface. We may take data directly from a device
such as a laser ranger or a three-dimensional digitizer. We may use some interface based on a sweeping technique where so-called ducted solids are created by
sweeping a cross-section along a spine curve. Creation methods have up to now
tended to be manual or semi-manual involving a designer working with an interface. As the demand for the representation of highly complex scenes increases from such applications as virtual reality (VR) - automatic methods are being
investigated. For VR applications of existing realities the creation of computer
graphics representations from photographs or video is an attractive proposition.
The representation of an object is very much an unsolved problem in computer graphics. We can distinguish between a representation that is required for
a machine or renderer and the representation that is required by a user or user
interface. Representing an object using polygonal facets - a polygon mesh representation - is the most popular machine representation. It is, h owever, an
inconvenient representation for a user or creator of an object. Despite this it is
used as both a user and a machine representation. Other methods have separate
user and machine representations. For example, bi-cubic parametric patches and
CSG methods, which constitute user or interface representations may be converted into polygon mesh es for rendering.
The polygon mesh form suffers from many disadvantages when the object is
complex and detailed. In mainstream computer graphics the number of polygons in an object representation can be anything from a few tens to hundreds of
thousands. This has serious ramifications in rendering time and object creation
cost and in the feasibility of using such objects in an animation or virtual reality environment. Other problems accrue in animation where a model h as both
to represent the shape of the object and be controlled by an animation system
which may require collision s to be calculated or the object to change shape as a
function of time. Despite this the polygon mesh is supreme in mainstream computer graphics. Its inertia is due in part to the development of efficient algorithms and hardware to render this description. This has resulted in a somewhat
strange situation where it is more efficient -as far as rendering is concerned - to
represent a shape with many simple elements (polygons) than to represent it
with far fewer (and more accurate) but more complicated elements such as
bi-cubic parametric patches (see Section 3.4.2).
The ability to manipulate the shape of an existing object depends strongly on
the representation. Polygon mesh es do not admit simple shape manipulation .
Moving mesh vertices immediately disrupts the 'polygonal resolution' where a
shape has been converted into polygons with some degree of accuracy that is
related to the local curvature of the surface being represented. For example,
imagine twisting a cube represented by six squares. The twisted object cannot be
INTRODUCTION
represented by retaining only six polygons. Another problem with shape manipulation is scale. Sometimes we want to alter a large part of an object which may
involve moving many elements at the same time; other times we may require a
detailed change.
Different representational methods have their advantages and disadvantages
but there is no universal solution to the many problems that still exist. Rather,
particular modelling methods have evolved for particular contexts. A good
example of this tendency is the development of constructive solid geometry
methods (CSG) popular in interactive CAD because they facilitate an intuitive
interface for the interactive design of complex industrial objects as well as a representation. CSG is a constrained representation in that we can only use it to
model shapes that are made up of allowed combinations of the primitive shapes
or elements that are included in the system.
How do we choose a representation? The answer is that it depends on the
nature of the object, the particular computer graphics technique that we are
going to use to bring the object to life and the application. All these factors are
interrelated. We can represent some three-dimensional objects exactly using a
mathematical formulation, for example, a cylinder or a sph ere; for others we use
an approximate representation. For objects that cannot be represented exactly
by mathematics there is a trade-off between the accuracy of the representation
and the bulk of information used. This is illustrated by the polygon mesh skeletons in Figure 2.1. You can only increase the veracity of the representation by
increasing the polygonal resolution which then has high cost implications in
rendering time.
The ultimate impossibility of this extrapolation h as led to h ybrid methods for
very complex and unique objects such as a human h ead. For example, in representing a particular human head we can use a combination of a polygon mesh
model and photographic texture maps. The solid form of the head is represented
by a generic polygon mesh which is pulled around to match the actual dimensions of the head to be modelled. The detailed likeness is obtained by mapping
a photographic texture onto this mesh. The idea here is that the detailed variations in the geometry are suggested by the texture map rather than by detailed
excursions in the geometry. Of course, its not perfect because the detail in the
photograph depends on the lighting conditions under which it was taken as well
as the real geometric detail, but it is a trick that is increasingly being used.
Whether we regard the texture mapping as part of the representation or as part
of the rendering process is perhaps a matter of opinion; but certainly the use of
photographic texture maps in this context enables us to represent a complex
object like a human head with a small number of polygons plus a photograph.
This compromise between polygonal resolution and a photographic texture
map can be taken to extremes. In the computer games industry the total number of polygons rendered to the screen must be within the limiting number that
can be rendered at, say, 15 frames per second on a PC. A recent football game
consists of players whose heads are modelled with just a cube onto which a
photographic texture is mapped.
INTRODUCTION
the patch in three-dimensional space and its shape. This formula enables us
to gen erate any or every point on the surface of the patch. We can change
the shape or curvature of the patch by editing the mathematical
specification. This results in powerful interactive possibilities. The problems
are, however, significant. lt is very expensive to render or visualize the
patches. When we change the sh ape of individual patches in a net of
patch es there are problems in maintaining 'smoothness' between the patch
and its neighbours. Bi-cubic parametric patches can be either an exact or an
approximate representation. They can only be an exact representation of
themselves, which means that any object, say, a car body panel, can only be
represented exactly if its shape corresponds exactly to the shape of the
patch. This somewhat torturous statement is n ecessary because when the
representation is used for real or existing objects, the shape modelled will
not necessarily correspond to the surface of the object.
An example of the same object represented by both bi-cubic parametric
patches and by polygonal facets is shown in Figure 3.28 (a) and (c). This clearly
shows the complexity/number of elements trade-off with the polygon mesh
representation requiring 2048 elements against the 32-patch representation.
Figure 2.1
The art of wireframe - an
illustration from Viewpoint
Digital's catalogue.
Source: '3 D models by
Viewpoint Digital, Inc.'
Anatomy, Viewpoin t's 3D
Dataset'"' Catalog, 2nd edn.
141788 polygons
35 305 polygons
8993 polygons
We now list, in order of approximate frequency of use, the mainstream models used in computer graphics.
(1) Polygonal Objects are approximated by a n et or mesh of planar
polygonal facets. With this form we can represent, to an accuracy that we
choose, an object of any shape. However, the accuracy is somewhat arbitrary
in this sense. Consider Figure 2.1 again: are 142 000 polygon s really
necessary, or can we reduce the polygonal resolution without degrading the
rendered image, and if so by how much? The shading algorithms are
designed to visually transform the faceted representation in such a way that
the piecewise linear representation is not visible in the shaded version
(except on the silhouette edge). Connected with the polygonal resolution is
the final projected size of the object on th e screen . Waste is incurred when
a complex object, represented by many thousands of polygon s, projects
onto a screen area th at is made up of only a few pixels.
(2) Bi-cubic parametric patches (see Chapter 3) These are 'curved
quadrilaterals'. Generally we can say that the representation is similar to the
polygon mesh except that the individual polygons are now curved surfaces.
Each patch is specified by a mathematical formula that gives the position of
x2+y2+ z2=r2
The relationship between a rendering method and the representation is critically important in the radiosity method and here, to avoid major defects in the
final image, there has to be some kind of interaction between the representation
a.nd the execution of the algorithm. As the algorithm progresses the representatiOn must adapt so that more accurate consideration is given to areas in the
emerging solution that need greater consideration. In other words, because of
the expense of the method, it is difficult to decide a priori what the level of detail
in the representation should be. The unwieldiness of the concept of having a
scene representation depend on the progress of th e rendering algorithm is at the
root of the difficulty of the radiosity method and is responsible for its (current)
lack of uptake as a mainstream tool.
which is the definition for a sphere. On their own these are of limited
usefulness in computer graphics because there is a limited number of objects
that can be represented in this way. Also, it is an inconvenient form as
far as rendering is concerned. However, we should mention that such
representations do appear quite frequently in three-dimensional computer
graphics - in particular in ray tracing where spheres are used frequently both as objects in their own right and as bounding objects for other polygon
mesh representations.
Implicit representations are extended into implicit functions which can
loosely be described as objects formed by mathematically defining a surface
that is influenced by a collection of underlying primitives such as spheres.
Implicit functions find their main use in shape-changing animation - they
are of limited usefulness for representing real objects.
We have arranged the categories in order of popularity; another useful
comparison is: with voxels and polygon meshes the number of representational
elements per object is likely to be high (if accuracy is to be achieved) but the
complexity of the representation is low. This contrasts wi th bi-cubic patches
where the number of elements is likely to be much lower in most contexts but
the complexity of the representation is higher.
We should not deduce from the above categorization that the choice of a representation is a free one. The representational form is decided by both the rendering technique and the application. Consider, for example, the continuous/
discrete representation distinction. A discrete representation - the polygon mesh
- is used to represent the arbitrary shapes of existing real world objects - it is
difficult to see how else we would deal with such objects. In medical imaging the
initial representation is discrete (voxels) because this is what the imaging technology produces. On the other hand in CAD work we need a continuous representation because eventually we are going to produce, say, a machine part from
the internal description. The representation has, therefore, to be exact.
The CSG representation does not fit easily into these comparisons. It is
both a discrete and a continuous representation, being a discrete combination
of interacting primitives, some of which can be described by a continuous
function.
Another important distinguishing factor is surface versus volume representation. The polygon mesh is an approximate representation of the surface of
an object and the rendering engine is concerned with providing a visualization
of that surface. With Gouraud shading the algorithm is only concerned with
using geometric properties associated with the surface representation. In
ray tracing, because the bulk of the cost is involved in tracking rays through
space and finding which objects they intersect, a surface representation implies
high rendering cost. Using a volume representation, where the object space
is labelled according to object occupancy, greatly reduces the overall cost of
rendering.
Figure 2.2
Approximating a curved
surface using polygonal
facets.
rapidly, more polygons are required per unit area of the surface. These factors
tend to be related to the method used for creating the polygons. If, for example,
a mesh is being built from an existing object, by using a three-dimensional digitizer to determine the spatial coordinates of polygon vertices, the digitizer operator will decide on the basis of experience how large each polygon should be.
Sometimes polygons are extracted algorithmically (as in, for example, the creation of an object as a solid of revolution or in a bi-cubic patch subdivision algorithm) and a more rigorous approach to the rate of polygons per unit area of the
surface is possible.
One of the most significant developments in three-dimensional graphics was
the emergence in the 1970s of shading algorithms that deal efficiently with
polygonal objects, and at the same time, through an interpolation scheme,
diminish the visual effect of the piecewise linearities in the representation. This
factor, together with recent developments in fixed program rendering hardware,
has secured the entrenchment of the polygon mesh structure.
In the simplest case a polygon mesh is a structure that consists of polygons
represented by a list of linked (x, y, z) coordinates that are the polygon vertices
(edges are represented either explicitly or implicitly as we shall see in a moment).
Thus the information we store to describe an object is finally a list of points or
vertices. We may also store, as part of the object representation, other geometric
information that is used in subsequent processing. These are usually polygon
normals and vertex normals. Calculated once only, it is convenient to store these
in the object data structure and have them undergo any linear transformations
that are applied to the object.
It is convenient to order polygons into a simple hierarchical structure. Figure
2.3 (a) shows a decomposition that we have called a conceptual hierarchy for reasons that should be apparent from the illustration_ Polygons are grouped into
surfaces and surfaces are grouped into objects. For example, a cylinder possesses
three surfaces: a planar top and bottom surface together with a curved surface.
The reason for this grouping is that we must distinguish between those edges
that are part of the approximation- edges between adjacent rectangles in the
curved surface approximation to the cylinder, for example - and edges that exist
in reality. The way in which these are subsequently treated by the rendering
process is different - real edges must remain visible whereas edges that form part
of the approximation to a curved surface must be made invisible. Figure 2.3(b)
shows a more formal representation of the topology in Figure 2.3(a).
An example of a practical data structure which implements these relationships is shown in Figure 2.3(c). This contains horizontal, as well as vertical, hierarchicallinks, necessary for programmer access to the next entity in a h orizontal
sequence. It also includes a vertex reference list which means that actual vertices
(referred to by each polygon that shares them) are stored only once. Another
difference between the practical structure and the topological diagram is that
access is allowed directly to lower-level entities. Wireframe visualizations of an
object are used extensively, and to produce a wireframe image requires direct
access to the edge level in the hierarchy. Vertical links between the edges' and the
QD
Vertex
.,. ,r .
!
j / Edge
l :/
I '
'' ''
l)
Object
(a)
Figure 2.3
Representation of an object
as a mesh of polygons.
(a) Conceptual hierarchy.
(b) Topological
representation.
Surfaces
Polygons
Edges/vertices
Figure 2 .3 continued
(c) A practical data
structure.
Polygons
Edges
Vertex
ref. no.
Vertex
(c)
model, the most common manifestation of which is a winged-edge data structure (Mantyla 1988). An edge-based model represents a face in terms of a closing
sequence of edges.
The data structure just described encapsulates the basic geometry associated
with a polygonal facets of an object. Information required by applications and
renderers is also usually contained in the scene/object database. The following
list details the most common attributes found in polygon mesh structures. Th ey
are either data structure pointers, real numbers or binary flags. It is unlikely that
all of these would appear in a practical application, but a subset is found in most
object representations.
Polygon attributes
(1) Triangular or not .
(2) Area.
(3) Normal to the plane containing the polygon.
(4) Coefficients (A, B, C, D) of the plane containing the polygon
where Ax + By + Cz + D = 0.
(5) Whether convex or not.
(6) Whether it contains holes or not.
Edge attributes
(1) Length.
(2) Whether an edge is between two polygons or between two surfaces.
(3) Polygons on each side of the edge.
Vertex attributes
(1) Polygons that contribute to the vertex.
(2) Shading or vertex normal - the average of the normals of the polygons
that contribute to the vertex.
(3) Texture coordinates (u , v) specifying a mapping into a two-dimensional
texture image.
All th ese are absolute propereties that exist when the object is created. Polygons
can aquire attributes as they are passed through the graphics pipeline. For example, an edge can be tagged as a silhouette edge if it is between two polygons with
normals facing towards and away from the viewer.
A significant problem that crops up in many guises in computer graphics is
the scale problem. With polygonal representation this means that, in many
application s, we cannot affo rd to render all the polygons in a model if the viewing distan ce and polygonal resolution are such that many polygons project onto
a single pixel. This problem bedevils flight simulators (and similarly computer
games) and virtual reality applications. An obvious solution is to have a hierarchy of models and use the one appropriate to projected screen area. There are
two problems with this; the first is that in animation (and it is animation applications where this problem is most critical) switching between models can cause
visual disturban ces in the animation sequence - the user can see the switch from
one resolution level to another. The other problem is how to generate the hierarchy and to decide h ow many levels it should contain. Clearly we can start with
the highest resolution model and subdivide, but this is not n ecessarily straightforward. We look at this problem in more detail in Section 2.5.
Although a polygon mesh is the most common representational form in computer graph ics, modelling, although straightforward, is somewhat tedious. The
popularity of this representation derives from the ease of modelling, the emergence of rendering strategies (both hardware and software) to process polygonal
objects and the importan t fact that there is no restriction whatever on the shape
or complexity of the object being modelled.
Interactive development of a model is possible by 'pulling' vertices around
with a three-dimensional locator device but in practice this is not a very useful
method. It is difficult to make other than simple shape changes. Once an object
h as b een created, any single polygon cannot be ch anged without also changing
its n eighbours. Thus most creation methods use either a device or a program; the
only method t hat admits user interaction is item 4 on the following list.
Four common examples of polygon modelling methods are:
(1) Using a three-dimensional digitizer or adopting an equivalent manual strategy.
(2) Using an automatic device such as a laser ranger.
(3) Generating an object from a mathematical description.
(4) Generating an object by sweeping.
The first two modelling methods convert real objects into polygon meshes, the
next two generate models from definitions. We distinguish between models generated by mathematical formulae and those generated by interacting with curves
which are defined mathematically.
( 2 .1.3 )
Figure 2.4
The Utah Beetle - an early
example of manual modelling.
Source: Beatty and Booth
Figure 2.5
A rendered poly gonal object
scanned by a laser ranger
and polygonized by a
simple skinning algorithm.
(a) A skinning algorithm
joins points on consecutive
contours to make a threedimensional polygonal
object from t he contours.
(b) A 400 000 polygonal
object produced by a
skinning algorithm.
(a)
(b)
projects onto about half the screen surface implies that each triangle projects
onto one pixel on average. This clearly illustrates the point mentioned earlier
that it is extremely wasteful of rendering resources to use a polygonal resolution
where the average screen area onto which a polygon projects approaches a
single pixel. For model creation, laser rangers suffer from the significant disadvantage that, in the framework described- fully automatic rotating table device
- they can only accurately model convex objects. Objects with concavities will
have surfaces which will not necessarily be hit by the incident beam.
Many polygonal objects are generated through an interface into which a user
puts a model description in the form of a set of curves that are a function of twodimensional or two-parameter space. This is particularly the case in CAD applications where the most popular paradigm is that of sweeping a cross-section in
a variety of different ways. There are tvvo benefits to this approach. The first is
fairly obvious. The user works with some notion of shape which is removed from
the low level activity of constructing an object from individual polygonal facets.
Instead, shape is specified in terms of notions that are connected with the form
of the object - something that Snyder (1992) calls 'the logic of shapes'. A program then takes the user description and transforms it into polygons. The transformation from the user description to a polygon mesh is straightforward. A
second advantage of this approach is that it can be used in conjunction with
either polygons as primitive elements or with bi-cubic parametric patches (see
Section 3.6).
The most familiar manifestation of this approach is a solid of revolution
where, say, a vertical cross-section is swept through 180 generating a solid with
a circular horizontal cross-section (Figure 2.6(a)). The obvious constraint of
solids of revolution is th at they can on ly represent objects possessing rotational
symmetry.
A more powerful generative model is arrived at by considering the same solid
generated by sweeping a circle, with radius controlled by a profile curve,
Figure 2.6
Straight spine objects solid of revolution vs
cross-sectional sweeping.
(a) A solid of revolution
generated by sweeping
a (vertical) cross-section.
(b) The same solid can be
generated by sweeping
a circle, whose radius is
contro lled by a profile
curve, up a straig ht vertical
spine. (c) Non-circular crosssection.
Figure 2.7
Snyder's rail cu rve
product surfaces. Source:
J.M. Snyder, Generotive
Modelling for Computer
Grophics ond CAD,
Academic Press, 1992.
Cross-section
(a)
Radius
(b)
Scale
::
vertically up a straight spine (Figure 2.6(b)). In the event that the profile curve is
a constant, we have the familiar notion of extrusion . This immediately removes
the constraint of a circular cross-section and we can have cross-sections of arbitrary shape (Figure 2.6(c)).
Now consider controlling the shape of the spine. We can incorporate the
notion of a curved spine and generate objects that are controlled by a crosssectional shape, a profile curve and a spine curve as Figure 2.9 demonstrates.
Other possibilities emerge. Figure 2. 7 shows an example of what Snyder calls
a rail product surface. Here a briefcase carrying handle is generated by sweeping
a cross-section along a path determined by the midpoints of two rail curves. The
long axis extent of the elliptical-like cross-section is controlled by the same two
curves - hence the name. A more complex example is the turbine blade shown
in Figure 2.8. Snyder calls this an affine transformation surface - because the
spine is now replaced by affine transformations, controlled by user specified
curves. Each blade is generated by extruding a rectangular cross-section along
the z axis. The cross-section is specified as a rectangle, and three shape controlling curves, functions of z, supply the values used in the transformations of the
cross-section as it is extruded. The cross-section is, for each step in z, scaled
separately in x and y, translated in x, rotated around, translated back in x, and
extruded along the z axis.
Figure 2.8
Snyder's affine
transformation surface.
The generating curves are
shown for a single turbine
blade. Source: J.M. Snyder,
Generative Modelling for
Computer Graphics and CAD,
Academic Press, 1992.
Q (u) = au 3 + bu2 + cu + d
Cross-section
y scale
x scale
z rotate
l:'
T = V / IVI
Figure 2.9
Three problems in crosssectional sweeping.
(a) Controlling the size of
the polygons can become
problematic. (b) How
should the cross-section be
oriented with resepct to the
spine curve? (c) Selfintersection of the crosssection path.
......./
..
Spine
Cross-section path
..
............
\.:::.:.::.:'::-::'..........
__
(b)
(c)
... ..............
Figure 2.10
The Frenet frame at sample
point P on a sweep curve.
A procedure recursively subdivides a line (t1 , (1), (t z, ( z) generating a scalar displacement of the midpoint of the line in a direction normal to the line (Figure
V = 3a u + 2b u + c
2
2.11 (a)).
To extend this procedure to, say, triangles or quadrilaterals in threedimensional space, we treat each edge in turn generating a displacement along
a midpoint vector that is normal to the plane of the original facet (Figure
2. 11(b)). Using this technique we can take a smooth pyramid, say, made of large
triangular faces and turn it into a rugged mountain.
Fournier categorizes two problems in this method -as internal and external
consistency. Internal consistency requires that the shape generated should be
the same whatever the orientation in which it is generated, and that coarser
N=K!IKI
where:
K = VxA x V/I VI 4
B =T xN
( 2 .1.5 )
(a)
\
Figure 2.11
An example of procedural
generation of polygon mesh
objects- fractal terrain.
{a) Line segment
subdivision. (b) Triangle
subdivision.
(b)
details should remain the same if the shape is replotted at greater resolution. To
satisfy the first requirement, the Gaussian randoms generated m ust n ot be a
function of the position of the points, but should be unique to the point itself.
An invariant point identifier needs to be associated with each point. Thi s problem can be solved in terrain generation by giving each point a key value used to
index a Gaussian random number. A hash function can be used to map the two
keys of the end points of a line to a key value for the midpoint. Scale requirements of internal consistency means that the same random numbers must
always be generated in the same order at a given level of subdivision.
External consistency is harder to maintain. Wit hin the mesh of triangles
every triangle shares each of its sides with another; thus the same random displacements must be generated for corresponding points of different connecting
triangles. This is already solved by using the key value of each point and the
hash function, but another problem still exists, that of the direction of the
displacement.
If the displacements are along the surface normal of the polygon under consideration, then adjacent polygons which have different n ormals (as is, by definition, always the case) will have their midpoints displaced into different
positions. This causes gaps to open up. A solution is to displace the midpoint
along the average of the normals to all the polygons that contain it but this
problem occurs at every level of recursion and is consequently very expensive to
implement. Also, this technique would create an unsatisfactory skyline because
the displacements are not constrained to one direction . A better skyline is
obtained by making all the displacements of points internal to the original polygon in a direction normal to the plane of the original polygon. This cheaper
technique solves all problems relating to different surface normals, an d the gaps
created by them. Now surface normals need not be created at each level of recursion and the algorithm is considerably cheaper because of this.
Another two points are worth mentioning. Firstly, note that polygons should
be constant shaded without calculating vertex n ormals - discontinuities
between polygons should not be smoothed out. Secondly, consider colour. The
usual global colour scheme uses a height-dependent mapping. In detail, the
colour assigned to a midpoint is one of its end point 's colours. The colour chosen is determined by a Boolean random which is indexed by the key value of the
midpoint. Once again this must be accessed in this way to maintain consistency,
which is just as important for colour as it is for position.
of how it was built up. The 'logic of the shape' in this representation is in how
the final shape can be made or represented as a combination of primitive shapes.
The designer builds up a shape by using the metaphor of three-dimensional
building blocks an d a selection of ways in which they can be combined. The
high-level nature of the represen tation imposes a certain burden on the designer.
Although with hindsight the logic of the parts in Figure 2.14 is apparent;
the design of complex machine parts using this methodology is a demanding
occupation.
Th e motivation for this type of representation is to facilitate an interactive
mode for solid modelling. The idea is that objects are usually parts that will
eventually be manufactured by casting, machining or extruding and they can be
built up in a CAD program by using the equivalent (abstract) operations combining simple elementary objects called geometric primitives. These primitives
are, for exam ple, spheres, cones, cylinders or rectangular solids and they are
combined using (th ree-dimensional) Boolean set operators and linear transformations. An object representation is stored as an attributed tree. The leaves contain simple primitives and the nodes store operators or linear transformations.
The representation defines not only the shape of the object but its modelling history - th e creation of the object and its representation become one and the same
thing. The object is built up by adding primitives and causing them to combine
with existing primitives. Shapes can be added to and subtracted from (to make
holes) the current shape. For example, increasing the diameter of a hole through
a rectangular solid means a trivial alteration - the radius of the cylinder primitive defining the hole is simply increased. This contrasts with the polygon mesh
representation where the same operation is distinctly non-trivial. Even although
the constituent polygons of the cylindrical surface are easily accessible in a hierarchical sch eme, to generate a new set of polygons means reactivating whatever
modelling procedure was used to create the original polygons. Also, account has
to be taken of the fact that to maintain the same accuracy more polygons will
have to be used.
Boolean set operators are used both as a representational form and as a user
interface technique. A user specifies primitive solids and combines these using
the Boolean set operators. The representation of the object is a reflection or
recording of the u ser interaction operations. Thus we can say that the modelling
information and representation are not separate - as they are in the case of deriving a representation from low-level information from an input device. The lowlevel information in the case of CSG is already in the form of volumetric primitives. The modelling activity becomes the represen tation. An example will
demon strate the idea.
Figure 2.12 shows th e Boolean operations possible between solids. Figure
2.12(a) shows the union of two solids. If we consider the objects as 'clouds' of
points the union operation encloses all points lying within the original two bodies. The second example (Figure 2.12(b)) shows the effect of a difference or subtraction operator. A subtract operator removes all those points in the second
body that are contained within the first. In th is case a cylinder is defined and
Figure 2.12
Figure 2.13
(a)
0
(b)
(c)
assembly. Thus the only information that has to be stored in the leaves of the
tree is the name of the primitive and its dimensions. A node has to contain the
name of the operator and the spatial relationship between the child nodes combined by the operator.
The power of Boolean operations is fur ther demonstrated in the following
examples. In the first example (Figure 2.14(a)) two parts developed separately are
combined to make the desired configuration by using the union operator followed by a difference operator. The second example (Figure 2.14(b)) shows a
complex object constructed only from the union of cylinders, which is then used
to produce, by subtraction, a complex housing.
Although there are substantial advantages in CSG representation, they do suffer from drawbacks. A practical problem is the computation time required to produce a rendered image of the model. A more serious drawback is that the method
imposes limitations on the operations available to create and modify a solid.
Boolean operations are global - they affect the whole solid. Local operations, say
a detailed modification on one face of a complex object cannot be easily implemented by using set operations. An important local modification required in
many objects that are to be designed is blending surfaces. For example, consider
the end face of a cylinder joined onto a flat base. Normally for practical manufacturing or aesthetic reasons, instead of the join being a righ t angle in cross-
Figure 2.14
Examples of geometrically
complex objects produced
from simple objects and
Boolean operations.
(a)
(b)
section a radius is desired. A radius swept around another curve cannot be represen ted in a simple CSG system. This fact h as led to many solid modellers using
an underlying boundary representation. Incidentally there is no reason why
Boolean operations cannot be incorporated in boundary representations systems. For example, many systems incorporate Boolean operations but use a
boundary representation to represent the object. The trade-off between these
two representations has resulted in a debate that has lasted for 15 years. Finally
note that a CSG representation is a volumetric representation . The space occupied by the object - its volume - is represented rather than the object surface.
Space subdivision techniques are methods that consider the whole of object space
and in some way label each point in the space according to object occupancy.
However, unlike CSG, which uses a variety of volumetric elements or geometric
primitives, space subdivision techniques are based on a single cubic element
known as a voxel. A voxel is a volumetric element or primitive and is the smallest
cube used in the representation. We could divide up all of world space into regular or cubic voxels and label each voxel according to whether it is in the object or
in empty space. Clearly this is very costly in terms of memory consumption.
Because of this voxel representation is not usually a preferred mainstream method
but is used either because the raw data are already in this form or it is easiest to
convert the data into this representation - the case, for example, in medical
imagery; or because of the demands of an algorithm. For example, ray tracing in
voxel space has significant advantages over conventional ray tracing. This is an
example of an algorithmic technique dictating the nature of the object representation. Here, instead of asking the question: 'does this ray intersect with any
objects in the scene?' which implies a very expensive intersection test to be carried
out on each object, we pose the question: 'what objects are encountered as we
track a ray through voxel space?' This requires no exhaustive search through the
primary data structure for possible intersections and is a much faster strategy.
Another example is rendering CSG models (Section 4.3) which is not straightforward if conventional techniques are used. A strategy is to convert the CSG
tree into an intermediate data con sisting of voxels and render from this. Voxels
can be considered as an intermediate representation, most commonly in medical imaging where their u se links two-dimensional raw data with the visualization of three-dimensional structures. Alternatively the raw data may themselves
be voxels. This is the case with many mathematical modelling schemes of threedimensional physical phenomena such as fluid dynamics.
The main problem with voxel labelling is the trade-off between the
consumption of vast storage costs and accuracy. Con sider, for example, labelling
square pixels to represent a circle in two-dimensional space. The pixel size
/accuracy trade-off is clear here. The same notion extends to using voxels to
represent a sphere except that now the cost depends on the accuracy and the
cube of the radius. Thus such sch emes are only used in contexts where their
advantages outweigh their cost. A way to reduce cost is to impose a structural
organization on the basic voxel labelling scheme.
The common way of organizing voxel data is to use an octree - a hierarchical
data structure that describes how the objects in a scene are distributed throughout the three-dimensional space occupied by the scene. The basic idea is shown
in Figure 2.15. In Figure 2.1S(a) a cubic space is subject to a recursive subdivision
which enables any cubic region of the space to be labelled with a number. This
subdivision can proceed to any desired level of accuracy. Figure 2.1S(b) shows an
object embedded in this space and Figure 2.1S (c) shows the subdivision and the
/
/1
3 /
23
/ !1
4
n
2<
(-:.:~-
''
:'
v
26
25
''
'~
2xv
---------------t'. . . . .
'
'objects' would be polygons or patches. In general, an occupied region represented by a terminal node would intersect with several polygons and would be
represented by a list of pointers into the object data structures. Thus unlike the
other techniques that we have described octrees are generally not self-contained
representational methods. They are instead usually part of a hybrid sch eme.
I/ ) /
'
(a)
(c)
Figure 2.15
Octree representation.
(a) Cubic space and
labelling scheme, and the
octree for the two levels of
subdivision. (b) Object
embedded in space.
(c) Representation of the
object to two levels of
subdivision.
related octree that labels cubic regions in the space according to whether they
are occupied or empty.
There are actually two ways in which the octree decomposition of a scene can
be used to represent the scene. Firstly, an octree as described above can be used
in itself as a complete representation of the objects in the scene. The set of cells
occupied by an object constitute the representation of the object. However, for
a complex scene, high resolution work would require the decomposition of
occupied space into an extremely large number of cells and this technique
requires enormous amounts of data storage. A common alternative is to use a
standard data structure representation of the objects and to use the octree as a
representation of the distribution of the objects in the scene. In this case, a terminal node of a tree representing an occupied region would be represented by a
pointer to the data structure for any object (or part of an object) contained
within that region. Figure 2.16 illustrates this possibility in the two-dimensional
case. Here the region subdivision has stopped as soon as a region is encountered
that intersects only one object. A region represented by a terminal node is not
necessarily completely occupied by the object associated with that region. The
shape of the object within the region would be described by its data stmcture
representation. In the case of a surface model representation of a scene, the
As we have already implied, the most common use of octrees in computer graphics
is not to impose a data structure, on voxel data, but to organize a scene containing
many objects (each of which is made up of many polygons) into a stmcture of spatial occupancy. We are not representing the objects using voxels, but considering
the rectangular space occupied as polygons as entities which are represented by
voxel space. As far as rendering is concerned we enclose parts of the scene, at some
level of detail, in rectangular regions in the sense of Figure 2.16. For example, we
may include groups of objects, single objects, parts of objects or even single polygons in an octree leaf node. This can greatly speed up many aspects of rendering
and many rendering methods, particularly ray tracing as we have already suggested.
We will now use ray tracing as a particular example. The high inherent cost
in naive ray tracing resides in intersection testing. As we follow a ray through the
scene we have to find out if it collides with any object in the scene (and what
the position of that point is). In the case that each ray is tested against all objects
in the scen e, where each object test implies testing against each polygon in the
object, the rendering time, for scen es of reasonable complexity, becomes unacceptably high. If the scene is decomposed into an octree representation, then
tracing a ray means tracking, using an incremental algorithm from voxel to
voxel. Each voxel contains pointers to polygons that it contains and the ray is
tested against these. Intersection candidates are reduced from n to 111, where:
n = )' polygon count for object
~ects
Figure 2.16
Quadtree representation of
a two-dimensional scene
down to the level of cells
containing at most a single
object. Terminal nodes for
cells containing objects
would be represented by a
poin ter to a data structure
representation of the object.
Figure 2.17
Ascene consisting of a few
objects of high polygon
count. The objects are small
compared w ith t he volume
of the room.
/~
1-1-
w
e =empty
r=rod
b = box
c = circle
BSP trees
e
bebcbccc
rrc r c r c e
(2) The maxim um octree depth . The greater th e depth the greater the
decomposition and the fewer the candidate polygons at a leaf node. Also,
because the size of a voxel decreases by a factor of 8 at every level, the fewer
the rays that will enter the voxel for any given rendering.
In general the degree of decomposition should not be so great that the savings
gained on intersection are wiped out by the higher costs of tracking a ray
through decomposed space. Experience has shown that a default value of 8 for
the above two factors gives good results in general for an object (or objects) distributed evenly throughout the space. Frequently scenes are rendered where this
condition does not hold. Figure 2 .1 7 shows an example where a few objects with
high polygon count are distributed around a room whose volume is large compared to the space occupied by the objects. In this case octree subdivision will
proceed to a high depth subdividing mostly empty space.
Any object on one side of a plane cannot intercept any object on the other
side.
Given a view point in the scene space, objects on the same side as the viewer
are nearer than any objects on the other side.
When a BSP tree is used to represent a subdivision of space into cubic cells, it
sh ows n o significant advantage over a direct data structure encoding of the
octree. It is the same information encoded in a different way. However, nothing
said above requires that the subdivision should be into cubic cells. In fact the
Figure 2.18
Quadtree and SSP tree
representations of a onelevel subdivision of a twodimensional region.
y = 1023f314l
y =t)
[ill]
'
'
X =
X =
2 3 4
1023
2
Quadtree
BSP tree
idea of a BSP tree was originally introduced in Fuchs (1980) where the p lanes
used to subdivide space could be at any orientation. We revisit BSP trees in the
context of hidden surface removal (Chapter 6).
( 2.3.3)
f{P) ::; (1 - R2
)2
ds R
where d is the distance of the point to the generator and R is its radius of
influence.
Figure 2.19
An isosurface of equal
temperature around two
heat sources (solid line).
A scalar field F(P ) which determines the combined effect of the individual
potential functions of the generators. This implies the existence of a
blending method which in the simplest case is addition - we evaluate a
sc~lar field by evaluating the individual con tributions of each generator at a
pomt P and adding their effects togeth er.
As we have discussed, polygon mesh models are well established as the de facto
standard representational form in computer graphics but they suffer from significant disadvantages, notably that the level of detail, or number of polygons,
required to synthesize the object for a high quality rendition of a complex object
is very large. If the object is to be rendered on screen at different viewing distances the pipeline has to process thousands of polygons that project onto a few
pixels on the screen. As the projected polygon size decreases, the polygon overheads become significant and in real time applications this situation is intolerable. High polygon counts per object occur either because of object complexity or
because of the nature of the modelling system. Laser scanners and the output
from programs like the marching cubes algorithm (which converts voxels into
polygons) are notorious for producing very large polygon counts. Using such
facilities almost always results in a model that, when rendered, is indistinguishable from a version rendered from a model with far fewer faces.
As early as 1976, one of the pioneers of 3D computer graphics, James H. Clark,
wrote:
An example (Figure 2.20 Colour Plate) illustrates the point. The Salvador Dali
imitation on the left is an isosurface formed by point generators disposed in
space as shown on the right. The radius of each sphere is proportional to the
radius of influence of each generator. The dark spheres represent negative gen.
erators which are used to 'carve' concavities in the model. (Although we can
form concavities by using only positive generators, it is more convenient to use
negative ones as we require far fewer spheres.) The example illustrates the paten.
tial of the method for modelling organic shapes.
Deformable object animation can be implemented by displaying or choreographing the points that generate the object. The problem with using implicit
functions in animation is that there is not a good intuitive link between moving
groups of generators and the deformation that ensues because of this. Of course,
this general problem is suffered by all modelling techniques where the geometry
definition and the deformation method are one and the same thing.
In addition to this general problem, unwanted blending and unwanted
separation can occur when the generators are moved with respect to each
other and the same blending method retained.
A significant advantage of implicit functions in an animation context is the
ease of collision detection that results from an easy inside-outside function.
Irrespective of the complexity of the modelled surface a single scalar value
defines the isosurface and a point P is inside the object volume or outside it
depending on whether F(P) is less than or greater than this value.
It makes no sense to use 500 polygons in describing an object if it covers only 20 raster
units of the display ... For example, when we view the human body from a very large
distance, we might need to present only specks for the eyes, or perhaps just a block for the
head, totally eliminating the eyes from consideration ... these issues have not been
addressed in a unified way.
Did Clark realize that not many years after he had written these words that
500 000 polygon objects would become fairly commonplace and that complex
scenes might contain millions of polygons?
Existing systems tend to address this problem in a somewhat ad hoc manner.
For example, many cheap virtual reality systems adopt a two- or three-level representation switching in surface detail, such as the numbers on the buttons of a
telephone as the viewer moves closer to it. This produces an annoying visual disturbance as the detail blinks on and off. More considered approaches are now
being proposed and lately there h as been a substantial increase in the number of
papers published in this area.
Thus mesh optimization seems necessary and the problem cannot be dismissed by relying on increased polygon throughput of the workstations of the
future. Th e position we are in at the moment is that mainstream virtual reality
platforms produce a visually inadequate result even from fairly simple scenes.
We have to look forward not only to dealing with the defects in the image synthesis of such scenes, but also to being able to handle scenes of real world complexity implying many millions of polygons. The much vaunted 'immersive'
applications of virtual reality will never become acceptable unless we can cope
with scenes of such complexity. Current hardware is very far away from being
able to deal with a complex scene in real time to the level of quality attainable
for single object scenes.
from applications like computer games and virtual reality, the issue of efficient
scene management has become increasingly important. This means that representational forms have to be extended to collections of objects; in other words
the scene has to be considered as an object itself. This has generally meant using
hierarchical or tree structures, such as BSP trees to represent the scene down to
object and sub-object level. As rendering has increasingly migrated into real time
applications, efficiency in culling and hidden surface removal has become as
important as efficient rendering for complex scenes. With the advent of 3D
graphics boards for the PC we are seeing a trend develop where the basic rendering of individual objects is handled by hardware and the evaluation of which
objects are potentially visible is computed by software. (We will look into culling
and hidden surface removal in Chapters 5 and 6). An equally important effi
ciency measure for objects in complex scenes has come to be known as Level of
Detail, or LOD, and it is this topic that we will now examine.
@)
Figure 2.21
Asimple vertex deletion
criterion. Delete V? Measure
d, the distance from V to
the (average) plane through
the triangles that share V.
Selective refinement - an LOD representation may be used in a contextdependent manner. Hoppe gives the example of a user flying over a terrain
where the terrain mesh need only be fully detailed near the viewer.
@)
Figure 2.22
Hoppe's (1996) progressive
mesh scheme based on
edge collapse
transformations.
then we can generate a continuum of geomorphs between the two levels by having the edge shrink under control of the blending param eter as:
Vn := Vn + ad
M,,_,
Finest
(a)
mesh
M,
and
Vcz := Vrz - ad
Texture coordinates can be interpolated in the same way as can scalar attributes associated with a vertex such as colour.
The remaining question is: h ow are the edges selected for collapse in the
reduction from M; to Mi-t? This can be done either by using a simple heuristic
approach or by a m ore rigorous method that measures the difference between a
particular approximation and a sample of the original mesh. A simple metric
that can be used to order the edges for collapse is:
Mo
Coarsest
mesh
IVn- Vczl
INfl"N21
that is, the length of the edge divided by the dot product of the vertex normals.
On its own this m etric will work quite well, but if it is continually applied the
mesh will suddenly begin to 'collapse' and a more considered approach to edge
selection is mandatory. Figure 2.23 is an example that uses this technique.
Hoppe casts this as an energy function minimzation problem. A mesh M is
optimized with respect to a set of points X which are the vertices of the m esh Mn
together (optio nally) with points randomly sampled from its faces. (Although
this is a length y process it is, of course, executed once only as an off-line preprocess.) The energy function to be minimized is:
(b)
where
Ectist
for a vertex split. Hoppe quotes an example of an object with 13 546 faces which
was simplified to an Mo of 150 faces using 6698 edge collapse transformations.
The original data are then stored as Mo together with the 6698 vertex split
records. The vertex split records themselves exhibit redundancy and can be compressed using classical data compression techniques.
Figure 2.22(b) shows a single edge collapse between two consecutive levels.
The notation is as follows: Vrt and Vrz are the two vertices in the finer mesh that
are collapsed into one vertex Vc in the coarser mesh, where
d = Vu + Vrz
2
d2 (x;, M)
3234
From the diagram it can be seen that this operation implies the collapse of the
two faces {t and f2 into new edges.
Hoppe defines a continuum between any two levels of detail by using a blending parameter a . If we define:
= _L
Figure 2.23
The result of applying the
simple edge elimination
criterion described in the
text- the model eventually
breaks up.
2004
1540
908
SUMMARY
REPRESENTATION AND MODELLING OF THREE-DIMENSIONAL OBJECTS (1)
an approximation but designing a car door panel using a single patch results
in an exact representation. CSG representations are exact but we need to make
two qualifications. They can only describe that subset of shapes that is
possible by combining the set of supplied primitives. The representation is
abstract in that it is just a formula for the composite object- the geometry h as
to be derived from the formula to enable a visualization of the object.
is the sum of the squared distances from the points X to the mesh - when a vertex is removed this term will tend to increase.
>pring
{M) =
:1 KIIVj - Vkll
Summary
Object representations have evolved under a variety of influences - ease of rendering, ease of shape editing, suitability for animation, dependence on the
attributes of raw data and so on. There is no general solution that is satisfactory
for all practical applications and the most popular solution that has served us fo r
so many years - the polygon mesh- has significant disadvantages as soon as we
leave the domain of static objects rendered off-line. We complete this chapter by
listing the defining attributes of any representation. These allow a (very) general
comparison between the methods. (For completeness we have included comments on bi-cubic patches which are dealt with in the next chapter.)
@)
polygons that cannot be seen from the view point, for example. Rendering processes
involve operations like shading and texture mapping and are considerably more
costly than the geometric operations most of which involve matrix m ultiplication.
A diagram representing the consecutive process in a graph ics pipeline is
shown in Figure 5.1. From this it can be seen that the overall process is a progression through various three-dimensional spaces - we transform the object
representation from space to space. In the final space, which we have called
screen space, the rendering operations take place. Th is space is also th ree-dimensional for reasons that we will shortly discover.
~r--------===~
c~o~o~rd
~
in~a~t~
e ~s=p~a~
ce
~s~
in~
th~e==g=r=
a~
ph~
ic=scp~i=
p=
e~
lin
~e
~~==~~~~====~
@I)
5.1
5.2
5.3
Introduction
The purpose of a graphics pipeline is to take a description of a scene in threedimensional space and map it into a two-dimensional projection on the view surface - the monitor screen. Although various three-dimensional display devices
exist, most computer graphics view surfaces are two-dimensional, the most common being the TV-type monitor. In most VR applications a pair of projections is
produced and displayed on small monitors encased in a helmet - a head-mounted
display, or HMD. The only difference in this case is that we now have a pair of
two-dimensional projections instead of one - the operations remain the same.
In what follows we will consider the case of polygon mesh models. We can
loosely classify the various processes involved in the graphics pipeline by putting
them into one of two categories - geometric (this chapter) and algorithmic
(Chapter 6). Geometric processes involve operations on the vertices of polygonstransforming the vertices from one coordinate space to another or discarding
Figure 5.1
A three-dimensional
rendering pipeline.
LocI
t
a coord"mae
space
space
Object
definition
View space
Compose scene
Define view
reference
Define lighting
1
Modelling
transformation
I
Cull
Clip to 3D
view volume
View
transformation
I
3D screen
space
Hidden surface
removal
Rasterization
Shading
Dis play
space
()
virtual camera is often used as the analogy in viewing systems, but if such an
allusion is made we must be careful to distinguish between external camera pa.
rameters- its position and the direction it is pointing in- and internal camera
parameters or those that affect the nature and size of the image on the film
plane. Most rendering systems imitate a camera which in practice would be a
perfect pinhole (or lensless) device with a film plane that can be positioned at
any distance with respect to the pinhole. However, there are other facilities in
computer graphics that cannot be imitated by a camera and because of this the
analogy is of limited utility.)
We will now deal with a basic view coordinate system and the transformation
from world space to view coordinate space. The reasons that this space exists,
after all we could go directly from world space to screen space, is that certain
operations (and specifications) are most conveniently carried out in view space.
Standard viewing systems like that defined in the PHIGS graphics standard are
more complicated in the sense that they allow the user to specify more facilities
and we will deal with these in Section 5.3.
We define a viewing system as being the combina tion of a view coordinate
system together with the specification of certain facilities such as a view volume.
The simplest or minimum system would consist of the following:
Figure 5.2
The minimum entities
required in a pract ical
viewing system. (a) View
point C and viewing
direction N. (b) A view
plane normal to the viewing
direction N positioned d
units from C. (c) A view
coordinate system w ith the
origin C and UV axes
embedded in plane parallel
to the view plane. (d) A
view volume defined by the
frustum formed by C and
the view plane window.
/\ V{)
These entities are shown in Figure 5.2. The view coordinate system, UVN, has N
coincident with the viewing direction and V and U lying in a plane parallel to
the view plane. We can consider the origin of the system to be the view point C.
The view plane containing U and V is of infinite extent and we specify a view
volume or frustum which defines a window in the view plane. It is the contents
of this window- the projection of that part of the scene that is contained within
the view volume - that finally appears on the screen.
Thus, using th e virtual camera analogue we h ave a camera that can be positioned anywhere in world coordinate space, pointed in any direction and
rotated about the viewing direction N .
To transform points in world coordinate space we invoke a change of coordinate system transformation and this splits into two components: a translational
one and a rotational on e (see Chapter 1). Thus:
Y
Xv l =
z,~
T vlew
[ Yw
Xw l
z,...
1
'
A view point which establishes the viewer's position in world space; this can
either be the origin of the view coordinate system or the centre of pro jection
together with a view direction N.
(~)
/'
'
,/
,'
'~
(c)
,'
"
View plane
window
(d )
where:
and:
0 0-Cxl
1
0 1 0 -Cr .
T= [ 0 0 1-C
0 0 0 1
Ux Ur U,
Vx Vr V,
R =[ Nx Nr N,
0~ ]
The only problem now is specifying a user interface for the system and mapping
whatever parameters are used by the interface into U, V and N. A user needs to
specify C, Nand V. C is easy enough. N , the viewing direction or view plane normal, can be entered say, using two angles in a spherical coordinate system - this
seems reasonably intuitive:
@)
where:
<P
sin 8
N , = cos <P
Vis more problematic. For example, a user may require 'up' to be the same sense
as 'up' in the world coordinate system. However, this cannot be achieved by
setting:
v = (0, 0, 1)
because V must be perpendicular to N . A sensible strategy is to allow a user to
specify an approximate orientation for V , say V' and h ave the system calculate
V. Figure 5.3 demonstrates this. V' is the user-specified up vector. This is projected onto the view plane:
visibility:= Nr N > 0
where:
This results in a left-hand coordinate system, which although somewhat inconsistent, conforms with our intuition of a practical viewing system, which has
increasing distances from the view point as increasing values along the view
direction axis. Having established the viewing transformation using UVN notation, we will in subsequent sections use (xv, yv, Zv) to specify points in the view
coordinate system.
V'
Figure 5.3
The up vector V can be
calculated from an
' indication' given by V'.
(a)
>-
(b )
Figure 5.5
Culling or back-face
elimination. (a) The desired
view of the object (back
faces shown as dotted lines).
(b) A view of the geometry
of the culling operation.
(c) The culled object.
Figure 5.7
Clipping against a view
volume - a rout ine polygon
operation in the pipeline.
(a) Polygons outside the
view volume are discarded.
(b) Polygons inside the view
volume are retained.
(c) Polygons intersecting
a boundary are clipped.
@
'''
''
'
'
;(}/ 4
./
(a)
(b )
'
<J
(c)
Invisible
(c)
(b)
(a)
Xv
'fy a far c11p plane and use depth modulated fog to diminish
game we may speo
f
r
the disturbance as objects 'switch-on' when they suddenly intersect the ar c lp
pla~:,e further simplify the geometry by specifying a square view plane window
of dimension 2h arranged symmetrically about the viewing direction, then the
four planes defining the sides of the view volume are given by:
View volume
y,. =
hzv
d
Clipping against the view volume (Figure 5. 7) can now be carried out using polygon plane intersection calculations given in Section 1.4.3. This illustrates the
principle of clipping, but the calculations involved are more efficiently carried
out in three-dimensional screen space as we shall see.
Figure 5.6
A practical view volume:
the near clip plane is made
coincident w ith the view
plane.
@)
Figure 5.8
Two points projected onto
a plane using parallel and
perspective projections.
Projection p lane
--------------
--~---~
Figure 5.10
Deriving a perspective
transform ation.
--------------x.
Parallel projection
x,
PI
Pl
P(x., y, z.)
---~----
- -------_ _ __ _
-------
_ __
,.Pi
Perspective p rojection
Zv
= Zv/d
X] [Xvl
[~ ~ T~" ~
where:
Figure 5.9
In a perspective projection
a distan t line is displayed
smaller than a nearer line
the same length.
T pers = [
View plane
HH]
0 1/d 0
= Xv
Ys =
Yv
Zv
=0
Expressed as a matrix:
Tort= [
( 5.2.4 )
~o~sidering the view volume in Figure 5.6, the full perspective transformation
IS
Ys = Y/w
Zs = Z/w
~ ! ~ ~]
0 0
0 0 0 1
given by:
x,
d~
hzv
Ys
= dL
hzv
Zs
z, =A+ B!zv
((1 - d/z.)
(f- d)
r
e square screen
op mg a stmt1ar manipulation to Section 5.2.3, we have:
.
d
X =- Xv
h
Y=
z=
We now consider extending the above simple transformations to include the
simplified view volume introduced in Figure 5.6. We discuss in more detail the
transformation of the third component of screen space, namely z,- ignored so
far because the derivation of this transformation is somewhat subtle. Now, the
bulk of the computation involved in rendering an image takes place in screen
space. In screen space polygons are clipped against scan lines and pixels, and
hidden surface calculations are performed on these clipped fragments. In order
to perform hidden surface calculations (in the Z-buffer algorithm) depth information has to be generated on arbitrary points within the polygon . In practical
terms this means, given a line and plane in screen space, being able to intersect
the line with the plane, and to interpolate the depth of this intersection point,
lying on the line, from the depth of the two end points. This is only a meaningful operation in screen space providing that in moving from eye space to
screen space, lines transform into Jines and planes transform into planes. It can
be shown (Newman and Sproull 1973) that these conditions are satisfied provided the transformation of z takes the form:
{zv
f- d
df
f-d
W= Zv
giving:
where:
0
d/h
0
0
0
0
fl(f- d)
[5. 1]
We c~n now express the overall transformation from world space to screen space
as a smgle transformation
obtained by concatenating the view and perspecti ve
.
. .
transforma t10n
gtvmg:
where A and B are constants. These constants are determined from the following constraints:
The transformation of a box with one side parallel to the image plane is shown
in Figure 5.1 2. Here rays from the vertices of the box to the view point become
parallel in three-dimensional screen space.
The overall precision required for the screen depth is a function of scene
complexity. For most scenes 8 bits is insufficient and 16 bits usually suffices. The
effects of insufficient precision is easily seen when, for example, a Z-buffer
algorithm is used in conjunction with two intersecting objects. If the objects
exhibit a curve where they intersect, this will produce aliasing artefacts of
increasing severity as the precision of the screen depth is reduced.
Now return to th e problem of clipping. It is easily seen from Figure 5.12 that
in the homogeneous coordinate representation of screen space the sides of the
view volume are parallel. This means tha t clipping calculations reduce to limit
comparisons - we no longer have to substitute points into plane equations. The
clipping operations must be performed on the homogeneous coordinates before
the perspective divide, and translating the definition of the viewing frustum into
homogeneous coordinates gives us the clipping limits:
Figure 5.11
Illustrating the distortion in
three-dimensional screen
space due to the Zv to z,
transformation.
Zs = I
Zs
xs w
-w s y s w
0s zs w
-w s
T pers
00
0 0 (/([-d) - d(/(( - d)
0 1
00
d!h
0
0
0 0
1
O = Tpers2T persl
0 1
This enables a useful visualization of the p rocess. The first matrix is a scaling
(d/h) in x and y. This effectively converts the view volume from a truncated pyramid with sides sloping at an angle determined by hid into a regular pyramid with
sides sloping at 45 (Figure 5.13). For example, point:
(0, lz, d, 1) transforms to (0, d, d, 1)
and point:
(0, -lz, d, 1) transforms to (0, - d, d, 1)
The second transformation maps the regular pyramid in to a box. The near plane
maps into the (x, y) plane and the far plane is mapped into z=l. For example,
point:
Ys
Figure 5.12
Transformation of box and
light rays from eye space to
screen space.
z,
-:X
Figure 5.13
Transformation of the view
volume into a canonical
view volume (a box) using
two matrix transformations.
Yv
View volume
Zv
I
I
I
Near plane
Far plane
(a)
We start this section by overviewing the extensions that PHlGS offers over the
minimal viewing system described in the previous section. These are:
(1) The notion of a view point or eye point that establishes the origin of the view
Zv
(b)
(0, d,d)
Yv
+tr------.------o~
--~~--~------------~
- Ir-----~------
Near
plane
Far
plane
(c)
coordinate system and the centre of projection is now discarded. The equivalent
of view space in PHIGS is the view reference coordinate system (VRC)
established by defining a view reference point (VRP). A centre of projection is
established separately by defining a projection reference point (PRP).
(2) Feature (1) means that a line from the centre of projection to the centre of
the view plane window need not be parallel to the view plane normal. The
implication of th is is that oblique projections are possible. This is
equivalent, in the virtual camera analogy, to allowing the film plane to tilt
with respect to the direction in which the camera is pointing. This effect is
used in certain camera designs to correct for perspective distortion in such
contexts as, for example, photographing a tall building from the ground.
(3) Near and far clipping planes are defined as well as a view plane. In the
previous viewing system we made the back clipping plane coincident with
the view plane.
(4) A view plane window is defined that can have any aspect ratio and can be
positioned anywhere in the view plane. In the previous viewing system we
defined a square window symmetrically disposed about the 'centre' of the
view plane.
(5) Multiple view reference coordinate systems can be defin ed or many views of
a scene can be set up.
Consider the notion of distance in viewing systems. We have previously used the
idea of a view point distance to reflect the dominant intuitive notion that the
further the view point is from the scen e the smaller will be the projection of that
scene on the view plan e. The problem arises from th e fact that in any real system, or a general computer graphics system, there is no such thing as a view
point. We can have a centre of projection and a view plane, and in a camera or
eye analogy th is is fine. In a camera the view plane or film plane is contained in
the camera. The scene projection is determined both by the distance of the camera from the subject and the focal length of the lens. However, in computer
graphics we are free to move the view plane at will with respect to the centre of
projection and the scene. There is no lens as such. How then do we categ?rize
distance? Do we use the distance of the view plane from the world coordmate
origin, the distance of the centre of projection from the wo~ld ~oordinate origin
or the distance of the view plane from the centre of proJeCtiOn? The general
systems such as PHIGS leaves the user to answer that question. It is perhaps this
attribute of the PHIGS viewing system that makes it appear cumbersome.
PHIGS categorizes a viewing system into t hree stag:s (Figure 5. ~4).
Establishing the position and orientation of the view plane IS known as VIew
orientation. This requires the user to supply:
(1) The view reference point (VRP) - a point in world coordinate space.
(2) The viewing direction or view plane normal (VPN) - a vector in world
coordinate space.
(3} The view up vector (VUV) - vector in world coordinate space.
The second stage in the process is known as view mapping and d etermines how
points are mapped onto t h e view plane. This requires:
(NPC).
This infor mation is used to map information in the VRC or VRCs into normalized projection coordinates (NPC). NPC space is a cube with coordinates in each
direction restricted to th e range 0 to 1. The rationale for this space is to allow different VRCs to be set up when more than one view of a scene is required (and
mapped subsequently into different view ports on the screen). Each view has its
own VRC associated with it, and different views are mapped into NPC space.
The final processing stage is the workstation transformation, that is, the
normal device dependent transformation.
We now d escribe these aspects in detail.
World coordinates
Itransformation
Workstation J
Fig ure 5.14
Establishing a viewing
system PHIGS.
Device coordinates
u = (VUV)
N = (VPN)
Xu
(VPN)
An interface to establish the VPN can easily be set up using the suggestion given
in Section 5.1.3. (Note tha t the VUV must not be parallel to the VPN.) The geom etric relationship between the orientation and mapping parameters is shown
in Figure 5.1 5. Thus the view orientation stage establishes the position and orientation of the VRC relative to the world coordinate origin and the v iew plane
specification is defined relative to the VRC.
Figure 5.15
PHIGS - view orientation
and view mapping
parameters. Note that this
is a right-handed coordinate
system.
Figure 5.16
Geometry of the view
volume for parallel and
perspective projections.
View orientation
View volume
V\0Umin
Vm on
I
I
I
I
VPN {?
Near plane
(a)
Far plane
View plane
View plane window
PRP
Near plane
(b)
Figure 5.18
A uv coordinate system is
established in the view
plane forming a threedimensional left-handed
(right-handed for GKS-3 0
and PHIGS) system w ith the
VPN.
VPN
PRP
View plane window
View plane
YPN
(a)
~vie':
VPN
PRP
Figure 5.17
(a) 'Standard' projection and
(b) oblique projection
obtained by moving the PRP
vertically down in a direction
parallel to the view plane.
@)
be~md,
(b)
(1) A line from the PRP to the view plane window centre is parallel to the VPN.
(2) Moving the PRP results in an oblique projection. The condition in (1) is no
longer true.
Figure 5.19
The VUV establishes the
direction of the v axis
allowing the view plane to
'twist' about the VPN.
I
I
r;)
Figure 5.21
(a) The situation after
making the PRP the origin.
(b) After transforming to a
symmetrical view volume.
Figure 5.20
Establishing a twodimensional w indow
in the view plane.
Near plane
y
Far plane
View vo lume
View plane
-- -- --centre line
View plane
( 5.3.5 )
----11 ---.-
~--------~ --------~
(a)
Table 5.1
-".. tuouoo- oH o'oouoo uoouo.ooo ooooooooo ._,,., , ,,,,,,,,, ,,,,,,,,,u o"'' ' '' ''' "''''''' ... '''''''
Interface values
(b )
VPD
Far plane distance
Near plane distance
Um3A, Umln
Xmu, Xmin
VmdX1 Vm;n
Ym""'
Ymtn
o oooo~ooO'OHOOooO ooooooooo~oooooooooooo o o o. oooooooooooo o ooooooooo~oooooooooooo oo ooooooo oooooo ouooooooooooooooooooooooonoo ro oooo O. oo ,. oooo>ooo -. ooo o ooooo,.ooo ~ 0
where T persib and T persz have the same effect as T ptrsi and Tpersz of the previous viewing system. These are obtained by modifying T pers1 and T pers2 to include the view
plane window parameters and the separation of the near plane from the view plane.
(~)
We first n eed to shear such that the view volume centre line becomes coincident with the zv axis. This means adjusting x and y values by an amount proportional to z. It is easily seen that:
1 0
+ Xmin 0
Xmax
2d
T persla
0 1 Ymax+ Ymin 0
2d
0
1
0 0
1
0 0
0
For example, the upper and lower view plane window edges transform as follows:
(0,
Ymax,
-d, 1)
transforms to
X max
+ X min
2
Ymx 2
-d
Ymin
I
1 )
and
(0,
Ym1n,
-d, 1)
transforms to
Xmax
+ Xmin
2
I
Ymax -
Ymln - d 1 )
I
transforming the original view volume into a symmetrical view volume (Figure
S.21(b)).
The second transform is:
2d
T perslb
6.2
Shading pixels
6.3
6.4
Rasterization
6.5
Order of rendering
0 0
6.6
2d
0 0
6.7
1 0
0 1
Introduction
Xmax- Xmin
6.1
Ymax- Ymin
0
0
0
0
This scaling is identical in effect to T persi of the simple viewing system converting the symetrical view volume to a 45 view volume. For example, consider the
effect of T verst = T perstbT pers t a on the line through the view plane window centre.
This transforms the point:
( Xmax ; Xmin
Ymax ; Ymin
1
-d, 1)
tO
(0, 0, -d, 1)
making the line from the origin to this point coincident with the -z axis - the
required result.
Finally we have:
fl(f- n) -fnl(f- n)
1
In this chapter we will describe the algorithmic operations that are required to
render a polygon mesh object. We will describe a particular, but common,
approach which uses a hidden surface algorithm called the Z-buffer algorithm
and which utilizes some form of interpolative sh ading. The advantage of this
strategy is that it en ables us to fetch individual polygons from the object database in any order. It also means that there is absolutely no upward limit on scene
complexity as far as the polygon count is concerned. The enduring success of
this approach is due n ot only to these factors but also to the visual success of
interpolative shading techniques in making the piecewise linear nature of the
object almost invisible. The disadvantage of this approach, which is not without
importance, is its inherent inefficien cy. Polygons may be rendered to the screen
which are subsequently overwritten by po lygons nearer to the viewer.
In this renderer the processes that we need to perform are, rasterization, or
finding the set of pixels onto which a polygon projects, hidden surface removal
and shading. To this we add clipping against the view volume, a process that we
(@)
dealt with briefly as a pure geometric operation in the previous chapter. In this
chapter we will develop an algorithmic structure that 'encloses' the geometric
operation.
As we remarked in the previous chapter, these processes are mostly carried out
in three-dimensional screen space, the innermost loop in the algorithm being a
'for each' pixel structure. In other words, the algorithms are known as screen
space algorithms. This is certainly not always the case - rendering by ray tracing
is mostly a view space algorithm and rendering using the radiosity method is a
world space algorithm.
~nd this can be pre-calculated and stored as part of the database. When an object
Is processed by the 3D viewing pipeline, its local coordinate system origin (the
bounding sphere we assume is centred on this origin) is also transformed by the
pipeline.
If th e object is completely outside the view volume it can be discarded; if it is
entirely within the view volume it does not need to be clipped. If neither of
these conditions applies then it may need to be clipped. We cannot be certain
because alth ough the sphere may intersect the view volume the object that it
contains may not. This problem with bounding objects affects their use throughout computer graphics and it is further examined in Chapter 12.
For a sphere the conditions are easily shown (Figure 6. 1) to be:
We have already considered the principle of clipping in the previous chapter and
now we will describe an efficient structure for the task. In that analysis we saw
how to determine whether a single point was within the view volume. This is an
inefficient approach -we n eed a fast method for evaluating whether an object is
wholly outside, wholly inside or straddles the view volume. Clipping has
become an extremely important operation with the growth of polygon counts
and the demand for real time rendering. In principle we want to discard as many
polygons as possible at an early stage in the rendering pipeline. The two com.
mon approaches to avoiding detailed low-level testing are scene management
techniques and bounding volumes. (Bounding volumes th emselves can be considered a form of scene management.) We will look at using a simple bounding
volume.
It is possible to perform a simple test that will reject objects wholly outside
the view volume and accept those wholly within the view volume. This can be
achieved by calculating a bounding sphere for an object and testing this against
the view volume. The radius of the bounding sphere of an object is a fixed value
Zc
Zc
>r+n
Zc
> -r + f
Zc
Zc
Zc
Zc > - yc - V2r
z,. > -r+ n
Zc >r + f
where:
Y<
Xcl _ [ d/h
0 0 0
Y<
0 d/h
Zc
0
0
0 1 0
0 0 1
l [X
Yvl
Zv
In other words, this operation takes place in the clipping space shown in Figure
5.13.
45
Zc
Figure 6.1
Showing one of the
conditions for a bounding
sphere to lie wholly within
the view volume.
Objects that need to be clipped can be dealt with by the SutherlandHodgman re-entrant polygon clipper (Sutherland and Hodgman 1974). This is
easily ~xtended to three dimensions. A polygon is tested against a clip boundary
by testmg each polygon edge against a single infinite clip boundary. This structure is shown in Figure 6.2.
We consider the innermost loop of the algorithm, where a single edge is being
tested against a single clip boundary. In this step the process outputs zero, one or
two vertices to add to the list of vertices defining the clipped polygon. Figure 6.3
Figure 6.2
Sutherland- Hodgman
clipper clips each polygon
against each edge of each
clip rectangle.
SHADI NG PIXELS
Figure 6.4
Dot product test to
determine whether a line
is inside or outside a clip
bOundary.
Clip rectangle
Polygon
Outside
Inside
Clip boundary C
X
s
Bottom clip
Left clip
Right clip
=:> F is inside
Top clip
shows the four possible cases. An edge is defined by vertices S and F. In the first case
the edge is inside the clip boundary and the existing vertex F is added to the output Jist. In the second case the edge crosses the clip boundary and a new vertex I is
calculated and output. The third case shows an edge that is completely outside the
clip boundary. This produces no output. (The intersection for the edge that caused
the excursion outside is calculated in the previous iteration and the intersection for
the edge that causes the incursion inside is calculated in the next iteration.) The
final case again produces a new vertex which is added to the output list.
To calculate whether a point or vertex is inside, outside or on the clip boundary we use a dot product test. Figure 6.4 shows a clip boundary C with an outward normal Nc and a line with end points S and F . We represent the line
parametrically as:
P (t)
= S + (F -
::;. S is outside
[6.1]
S)t
where:
0s ts 1
Nc'(F - X) < 0
and:
N c'(P (t) - X)= 0
Case I -output F
Case 2 - output /
defines the point of intersection of the line and the clip boundary. Solving
Equation 6.1 for t enables the intersecting vertex to be calculated and added to
the output list.
In practice the algorithm is written recursively. As soon as a vertex is output
th e procedure calls itself with that vertex and no intermediate storage is required
fo r the partially clipped polygon. This structure makes the algorithm emi nently
suitable for h ardware implementation.
Shading pixels
Figure 6.3
Sutherland- Hodgman
clipper - within the polygon
loop each edge of a
polygon is tested against
each clip boundary.
Case 3 - no output
(@)
of 'global' techniques, such as ray tracing and radiosity, Phong shading has
remained ubiquitous. This is because it enables reality to be mimicked to an
acceptable level at reasonable cost.
There are two separate considerations to shading the pixels onto which a
polygon projects. First we consider how to calculate the light reflected at any
point on the surface of an object. Given a theoretical framework that enables us
to do this, we can then calculate the light intensity at the pixels onto which the
polygon projects. The first consideration we call'local reflection models' and the
second 'shading algorithms'. The difference is illustrated conceptually in Figure
6.5. For example, one of the easiest approaches to shading- Gouraud shadingapplies a local reflection model at each of the vertices to calculate a vertex intensity, then derives a pixel intensity using the same interpolation equations as we
used in the previous section to interpolate depth values.
Basically there is a conflict here. We only want to calculate the shade for each
pixel onto which the polygon projects. But the reflected light intensity at every
point on the surface of a polygon is by definition a world space calculation. We
are basing the calculation on the orientation of the surface with respect to a light
source both of which are defined in world space. Thus we use a 2D projection of
the polygon as the basis of an interpolation scheme that controls the world
space calculations of intensity and this is incorrect. Linear interpolation, using
equal increments, in screen space does not correspond to how the reflected
intensity should vary across the face of the polygon in world space. One of the
reasons for this is that we have already performed a (non-affine) perspective
transformation to get into screen space. Like many algorithms in 3D computer
graphics it produces an acceptable visual result, even using incorrect mathematics. However, this approach does lead to visible artefacts in certain contexts. The
comparative study in Chapter 18 has an illustration of an artefact caused by this.
A local reflection model enables the calculation of the reflec ted light intensity
fro m a point on the surface of an object. The development of a variety of local
reflection models is dealt with in Chapter 7, here we will confine ourselves to
considering, from a practical view point, the mpst common model and see how
it fits into a renderer.
This model, introduced in 1975 evaluates the intensity of the reflected light
as a function of the orientation of the surface at the point of interest with respect
to the position of a point light source and surface properties. We refer to such a
model as a local reflection model because it only considers direct illumination.
It is as if the object under consideration was an isolated object floating in free
space. Interaction with other objects that result in shadows and inter-reflection
are not taken into account by local reflection models. This point is emphasized
in Figure 6.6; in Chapter 10 we deal with global illumination in detail.
The physical reflection phen omena that the model simulates are:
1
These are illustrated in Figure 6.7 fo r a point light source that is sending an infinitely thin beam of light to a point on a surface. Perfect specular reflection
occurs when incident light is reflected, with out diverging, in the 'mirror' direction. Imperfect specular reflection is th at which occurs when a thin beam of
light strikes an imperfect mirror, that is a surface whose reflecting properties are
that of a perfect mirror but only at a microscopic level - because the surface is
physically rough. Any area element of such a surface can be considered to be
made up of thousands of tiny perfect mirrors all at slightly different orientations.
Screen space
,.11
I
Figure 6.5
Illustrating the difference
between local reflection
models and shading
algorithms. (a) Local
reflection models calculate
light intensity at any point P
on the surface of an object.
(b) Shading algorithms
interpolate pixel values from
calculated light intensities at
the polygon vertices.
'
.,
' '>
'
Direct
'>
\ ,'
I''
Direct
I'
'>
..
' ',
''
',
''
'''
',:.
', '
)'-
(a)
(b)
Figure 6.6
(a) A local reflection model
calculates intensity at P.
and Pa considering direct
illumination only. (b) Any
indirect ref lected lig ht from
A t o B or from B to A is not
taken in to accoun t.
A
(a)
(b )
SHADING PIXELS
Figure 6.8
The 'computer graphics'
surface.
Figure 6.7
The three reflection
phenomena used in
computer g raphics.
(a) Perfect specular
reflection. (b) Imperfect
specular reflection.
(c) Perfect diffuse reflection.
(a)
,,,
-' 1'
Transparent layer
Perfect diffuse reflection
Diffuse surface
(c)
Perfect specular reflection does not occur in practice but we use it in ray tracing
models (see Chapter 12) simply because calculating interaction due to imperfect
specular reflection is too expensive. A perfect diffuse surface reflects the light
equally in all directions and such a surface is usually called matte.
The Phong reflection model considers the reflection from a surface to consist
of three components linearly combined:
reflected light= ambient Ught +diffuse component + specular component
The ambient term is a constant and simulates global or indirect illumination.
This term is necessary because parts of a surface that cannot 'see' the light
source, but which can be seen by the viewer, need to be lit. Otherwise they
would be rendered as black. In reality such lighting comes from global or indirect illumination and simply adding a constant side-steps the complexity of
indirect or global illumination calculations.
It is useful to consider what types of surface such a model simulates. Linear
combination of a diffuse and specular component occurs in polished surfaces
such as varnished wood. Specular reflection results from the transparent layer
and diffuse reflection from the underlying surface (Figure 6.8). Many different
physical types, although not physically the same as a varnished wood, can be
approximately simulated by the same model. The veracity of this can be demonstrated by considering looking at a sample of real varnish ed wood, shiny plastic
and gloss paint. If all contextual clues are removed and the reflected light from
each sample exhibited the same spectral distribution, an observer would find it
difficult to distinguish between the samples.
As well as possessing the limitation of being a local model, the Ph ong reflection model is completely empirical or imitative. One of its major defects is that
the value of reflected intensity calculated by the model is a function only of the
viewing direction and the orientation of the surface with respect to the light
source. In practice, reflected light intensity exhibits bi-directional behaviour. It
depends also on the direction of the incident light. This defect has led
to much research into physically based reflection models, where an attempt
is made to model reflected light by simulating real surface properties. However,
the subtle improvements possible by using such models- such as the ability to
make surfaces look metallic - have not resulted in the demise of the Phong
reflection model and the main thrust of current research into rendering
methods dea ls with the limitation of 'localness'. Global methods, such as radiosity, result in much more significant improvements to the apparent reality of a
scen e.
Leaving aside, for a moment, the issue of colour, the physical nature of a surface is simulated by controlling the proportion of the diffuse to specular reflection and we have the reflected light:
I
Where the proportions of the three components, ambient, diffuse and specular
are controlled by three constants, where:
k, + kct + ks = 1
@)
SHADI NG PIXELS
where:
[; is the intensity of the incident light
e is the angle between the surface normal at the point of interest and the
direction of the light source
--
Figure 6.10
The Phong specular
component.
, I/
/ 1 '-
In vector notation:
I"= Ii (LN)
! , =/, (R V )"
v
where:
The behaviour of this equation is illustrated in Figures 6.11 and 6.12 (Colour
Plate). Figure 6.11 shows the light intensity at a single point P as a function of
the orientation of the viewing vector V . The semicircle is the sum of the constant ambient term and the diffuse term - which is constant for a particular
value of N. Addition of the specular term gives the profile shown in the figure.
As the value of n is increased the specular bump is narrowed. Figure 6.12 (Colour
Plate) shows the equation applied to the same object using different values of
k, and kct and n.
When n = x the surface is a perfect mirror - all reflected light emerges along the
mirror direction. For other values of n an imperfect specular reflector is simulated (Figure 6.7(b)). The geometry of this is shown in Figure 6.10. In vector
notation we have:
I,= l; (R V)"
--
, I /
/1'-
A number of practical matters that deal with colour and the simplification of the
geometry now need to be explained.
--
L'
'.I /
/ 1'-
N
R
Surface
Figure 6.9
The Phong diffuse
component.
Figure 6.11
The light intensity at point
P as a function of t he
orientation of th e viewing
vector V.
II=
The expense of the above shading equation, which is applied a number of times
at every pixel, can be considerably reduced by making geometric simplifications that
reduce the calculation time, but which do not affect the quality of the shading. First
if the light source is considered as a point source located at infinity then Lis constant
over the domain of the scene. Second we can also place the view point at infinity
making V constant. Of course, for the view and perspective transformation , the view
point needs to be firmly located in world space so we end up using a finite view point
fo r the geometric transformations and an infinite one for the shading equation .
Next the vector R is expensive to calculate and it is easier to define a vector
H (halfway) which is the unit normal to a hypothetical surface that is oriented
in a direction halfway between the light direction vector L and the viewing vector V (Figure 6.13). It is easily seen that:
H = (L + V) /2
This is the orientation that a surface would require if it was to reflect light maximally along the V direction. Our shading equation now becomes:
I= Iaka + Ji(kd (LN) + (NH)")
because the term (NH) varies in the same manner as (R- V). These simplifications mean that I is now a function only of N.
For coloured objects we generate three components of th e intensity I ,, I 8 and
Ib controlling the colour of the objects by appropriate setting of the diffuse reflection coefficients k,, kb and k8 In effect the specular highlight is just the reflection
of the light source in the surface of the object and we set the proportions of the
k, to match the colour of the light. For a white light, k, is equal in all three equations. Thus we have:
Ir = lakar + Ji((kdr(L-N) + ks(NH)")
Figure 6 .14
Light source represen ted
as a specularly reflecting
surface.
Object surface
Now <1> is the angle between -L, the direction of the point on the surface th at we
are considering, and L ,, the orientation of th e light source (Figure 6.14). The
val ue of L that we use in the shading eq uation is then given by:
- '-. 1 /
/ 1'-
Figure 6 .13
H is the normal to a surface
orientation that would
reflect all the ligh t along V.
the visibility of the polygon edges in the final shaded image. Information is
interpolated from values at the vertices of a polygon and the situation is exactly
analogous to depth interpolation.
( 6.3.1 )
( 6.3.2 )
I s,n
- t:.x-
Xb - Xa
= Is. 1>-l
Here we interpolate vertex normals across the polygon interior and calculate fo r
each polygon pixel projection an interpolated normal. This interpolated normal
is then used in the shading equation which is applied for every pixel projection.
This has the geometric effect (Figure 6.16) of 'restoring' some curvature to polygonally faceted surfaces.
The price that we pay for this improved model is efficiency. Not only is
the vector interpolation three times the cost of intensity interpolation, but each
vector has to be normalized and a shading equation calculated for each pixel
projection.
Incremental computations can be employed as with intensity interpolation ,
and the interpolation would be implemented as:
N,.. is then used to calculate an intensity at vertex A that is common to all the
polygons that share vertex A.
For computational efficiency the interpolation equations are implemented. as
incremental calculations. This is particularly important in the case of the thud
equation, which is evaluated for every pixel. If we defi ne t:.x to be the in~re
mental distance along a scan line then M , the change in intensity from one p1xel
to the next, is:
Iii, =
Where Nsx, Nsr and N,, are the components of a general scan line normal vector
N, and:
(Ib - I.)
+ /:;.!,
!:;.N,x =
Because the intensity is only calculated at vertices the method cannot adequately
deal with highlights and this is its major disadvantage. The cause of this defect can
be understood by examining Figure 6.16(a). We have to bear in mind that the
t:.x
Xh - Xa
(Nbx - Nax)
I
--~
lA
IH
(a)
Figure 6.15
The vertex normal NAis the
average of the normals N,,
N2, NJ, and N., the normals
of the polygon that meet at
the vertex.
Npl
Figure 6.16
Illustrating the difference
between Gouraud and
Phong shading. (a) Gouraud
shading. (b) Phong shading.
'
'
(b)
RASTERIZATION
11Nsy=
11Nsz =
( 6.3.3 )
( 6.3.4 )
t.x
Xo - Xa
t.x
Xb -
Xa
Gouraud shading is effective for shading surfaces which reflect light diffusely.
Specular reflections can be modelled using Gouraud shading, but the shape of
the specular highlight produced is dependent on the relative positions of the
underlying polygons. The advantage of Gouraud sh ading is that it is computationally the less expensive of the two models, requiring only the evaluation of
the intensity equation at the polygon vertices, and then bilinear interpolation of
these values for each pixel.
Phong shading produces highlights which are much less dependent on the
underlying polygons. But, more calculations are required involving th e interpolation of the surface normal and the evaluation of the intensity func tion fo r each
pixel. These facts suggest a straightforward approach to speeding up Phong shading by combining Gouraud and Phong shading.
(Nbz- Naz)
Most renderers have a hierarchy of shading options where you trade wait time
against the quality of the shaded image. This approach also, of course, applies to
the addition of sh adows and texture. The normal hierarchy is:
Wireframe No rendering or shading at all. A wireframe display may be
used to position objects in a scene by interacting with the viewing
parameters. It is also commonly used in animation systems where an
animator may be creating movement of objects in a scene interactively. He
can adjust various aspects of the animation and generate a rea l time
animation sequence in wireframe display mode. In both these applications
a full shaded image is obviously not necessary. One practical problem is that
using the same overall renderer strategy for wireframe rendering as for
shading (that is, independently drawing each polygon) will result in each
edge being drawn twice - doubling the draw time for an object.
Flat shaded polygons Again a fast option. The single 'true' polygon
normal is used, one shade calculated using the Gouraud equation for each
polygon and the shading interpolative process is avoided.
(Nbr - Nar)
(~)
These options are compared in some detail in the comparative case study in
Chapter 18.
Rasterization
Having looked at how general points within a polygon can be assigned intensities that are determined from vertex values, we now look at how we determine
the actual pixels which we require intensity values for. The process is known as
ras terization or scan conversion. We consider this somewhat tricky problem in
two parts. First, how do we determine the pixels wh ich the edge of a polygon
straddles? Second, h ow do we organize this information to determine the interior points?
( 6.4.1 )
Rasterizing edges
There are two different ways of rasterizing an edge, based on whether line drawing or solid area filling is being used. Line drawing is not covered in this book,
since we are interested in solid objects. However, the main feature of line drawing algorithms (for example, Bresenham's algorithm (Bresenham 1965)) is that
they must generate a linear sequence of pixels with no gaps (Figure 6.17).
v
I
Figure 6.17
Pixel sequences required
for (a) line drawing and
(b) polygon filling.
_,..v
/
j_
I'
(a)
.,. . /
(b )
., r--
RASTERIZATION
For solid area filling, a less rigorous approach suffices. We can fill a polygon using
horizontal line segments; these can be thought of as the intersection of the poly.
gon with a particular scan line. Thus, for any given scan line, what is requi red
is the left- and right-hand limits of the segment, that is the intersections of
the scan line with the left- and right-hand polygon edges. This means that
for each edge, we need to generate a sequence of pixels corresponding to the
edge's intersections with the scan lines (Figure 6.17(b)) . This sequence may
have gaps, when interpreted as a line, as shown by the right-hand edge in the
diagram.
The conventional way of calculating these pixel coordinates is by use of what
is grandly referred to as a 'digital differential analyzer', or DDA for short. All this
really consists of is finding how much the x coordinate increases per scan line,
and then repeatedly adding this increment.
Let (x,, y,), (xc, y.) be the start and end points of the edge (we assume that
Y > y,) The simplest algorithm for rasterizing sufficient fo r polygon edges is:
X:= Xs
m := (Xe - Xs)/(yc - y,)
for y := y, to Y do
output(round(x), y)
X:= X+ m
xi :=Xs
xf := - (y. - y,)
mi := (Xe - x,) div (y. - y,)
mf := 2*[(Xe - x,) mod(y. - y,)]
for y := y, to Y do
output(xi, y)
xi:= x i + mi
xf:= xf+ mf
if xf> 0 then {xi:= xi+ 1; xf:= xf- 2(y. - y,))
Although this appears now to involve two divisions rather than one, they are
both integer rather than floating point. Also, given suitable hardware, they can both
be evaluated from the same division, since the second (mod) is simply the remainder from the first (div). Finally it only remains to point out that the 2(yc - y,) within
the loop is constant and would in practice be evaluated just once outside it.
( 6 .4.2 )
Rasterizing polygons
Now that we know how to find pixels along the polygon edges, it is necessary to
turn our attention to filling the polygons themselves. Since we are concerned
with shading, 'filling a polygon' means finding the pixel coordinates of interior
points and assigning to these a value calculated using one of the incremental
shading schemes described in Section 6.3. We need to generate pairs of segment
end points and fill in horizontally between them. This is usually achieved by
constructing an 'edge list' for each polygon.
In principle this is done using an array of linked lists, with an element for
each scan line. Initially all the elements are set to NIL. Then each edge of the
polygon is rasterized in turn, and the x coordinate of each pixel (x , y) thus generated is inserted into the linked list corresponding to that value of y. Each of the
linked lists is then sorted in order of increasing x. The result is something like
that shown in Figure 6.18. Filling-in of the polygon is then achieved by, for each
scan line, taking successive pairs of x values and filling in between them (because
for y := y, to Y do
output(xi,y)
xi:= x i + mi
xf := xf+ m f
if xf> 0.0 then {xi := xi + 1; xf:= xf-1.01
Because the fractional part is now independent of the integer part, it is possible to scale it throughout by 2(y. - y,), with the effect of converting everything
to integer arithmetic:
~~
Figure 6 .18
An example of a linked list
maintained in ploygon
rasterization.
lL L.
y
Figure 6. 19
Problems w ith polygon
boundaries - a 9-pixel
polygon fills 16 pixels.
ORDER OF RENDERING
Figure 6.20
Three polygons intersecting
a scan line.
Incidentally, in rules (2) and (3), whether the first or last element is ignored
is arbitrary, and the choice is based around programming convenience. The four
possible permutations of these two rules define the sample point as one of the
four corners of the pixel. The effect of these rules can be demonstrated in Figure
6.20. Here we have th ree ad jacent polygons A, Band C, with edges a, b, c and d.
The rounded x values produced by these edges for the scan shown are 2, 4 ,4 and
7 respectively. Rule 3 then gives pixels 2 and 3 for polygon A, none for polygon
B, and 4 to 6 for polygon C. Thus, overall, there are no gaps, and n o overlapping.
The reason why horizontal edges are discarded is because the edges adjacent to
them will have already contributed the x values to make up the segment (for
example, the base of the polygon in Figure 6.18; note also that, for the sake of
simplicity, the scan conversion of this polygon was not done strictly in accordance with the rasterization rules mentioned above).
Order of rendering
There are two basic ways of ordering the rendering of a scene. These are: on a
polygon-by-polygon basis, where each polygon is rendered in turn, in isolation
from all the rest; and in scan line order, where the segments of all polygons in
that scene which cross a given scan line are rendered, before moving on to the
next scan line. In some textbooks, this classification has the habit of becoming
~opelessly confused with the classification of hidden surface removal algonthms. In fact, the order of rendering a scene places restrictions upon which hidden surface algorithms can be used, but is of itself independent of the method
employed for hidden surface removal. These are the common hidden surface
removal algorithms that are compatible with the two methods:
By polygon: Z-buffer.
By scan line: Z-buffer; scan line Z-buffer, spanning scan line algorithm.
taneously to hold in memory rasterization, shading, and perhaps texture information for all polygons which cross a particular scan line. The main drawback
of by-polygon rendering is that it does not make use of possible efficiency measures such as sharing information between polygons (for example, most edges in
a scene are shared between two polygons). The method can only be used with
the Z-buffer hidden surface algorithm, which as we shall see, is rather expensive
in terms of memory usage. Also, scan-line-based algorithms possess the property
that the completed image is generated in scan line order, which has advantages
for hardware implementation and anti-aliasing operations.
An important difference between the two rendering methods is in the construction of the edge list. This has been described in terms of rendering on a
polygon-by-polygon basis. If, however, rendering is performed in scan line order,
two problems arise. One is that rasterizing all edges of all polygons in advance
would consume a vast quantity of memory, setting an even lower bound on the
maximum complexity of a scene. Instead, it is usual to maintain a list of 'active
edges'. When a new scan line is started, all edges which start on that scan line
are added to the list, whilst those which end on it are removed. For each edge in
the active list, current values for x, shading information etc. are stored, along
with the increments for these values. Each time a new edge is added, these
values are initialized; then the increments are added for each new scan line.
The other problem is in determining segments, since there are now multiple
polygons active on a given scan line. In general, some extra information will
need to be stored with each edge in the active edge list, indicating which polygon it belongs to. The exact details of this depend very much upon the h idden
surface removal algorithm in use. Usually an active polygon list is maintained
that indicates the polygons intersected by the current scan line, and those therefore that can generate active edges. This list is updated on every scan line, new
polygons being added and inactive ones deleted.
The outline of a polygon-by-polygon renderer is thus:
for each polygon do
(~
The major hidden surface removal algorithms are described in most computer
graphics textbooks and are classified in an early, but still highly relevant, paper by
Sutherland et al. (1974) entitled 'A characterization of ten hidden-surface algorithms'. In this paper algorithms are characterized as to whether they operate primarily in object space or image (screen) space and the different uses of 'coherence'
that the algorithms employ. Coherence is a term used to describe the process where
geometrical units, such as areas or scan line segments, instead of single points, are
operated on by the hidden surface removal algorithm.
There are two popular approaches to hidden surface removal. These are scanline-based systems and Z-buffe r-based systems. Other approaches to hidden surface removal such as area subdivision (Warnock 1969), or depth list schemes
(Newell et al. 1972) are not particularly popular or are reserved for specialpurpose applications such as flight simulation.
The Z-buffer algorithm, developed by Catmull (1975), is as ubiquitou s in computer graphics as the Phong reflection model and interpolator, and the combination of these represents the most popular rendering option . Using
Sutherland's classification scheme (Sutherland eta/. 1974), it is an algorithm that
operates in image, that is, screen space.
Pixels in the interior of a polygon are shaded, using an incremental shading
scheme, and their depth is evaluated by interpolation from the z values of the
polygon vertices after the viewing transformation has been applied. The equations in Section 1.5 are used to interpolate the depth values.
The Z-buffer algorithm is equivalent, for each point (xs, y,) to a search through
the associated z values of each interior polygon point, to find that point with the
minimum z value. This search is conveniently implemented by using a Z-buffer,
that holds for a current point (x, y) the smallest z value so far encountered.
During the processing of a polygon we either write the intensity of a point (x, y )
into the frame buffer, or n ot, depending on whether the depth z, of the current
point, is less than the depth so far encountered as recorded in the Z-buffer.
One of the major advantages of the Z-buffe r is that it is independent of object
representation form. Although we see it used most often in the context of poly-
gon mesh rendering, it can be used with any representation - all that is required
is the ability to calculate a z value for each point on the surface of an object. It
can be used with CSG objects and separately rendered objects can be merged
into a multiple object scene using Z-buffer information on each object. These
aspects are examined shortly.
The overwhelming advantage of the Z-buffer algorithm is its simplicity of
implementation. Its main disadvantage is the amount of memory required for
the Z-buffer. The size of the Z-buffer depends on the accuracy to which the depth
value of each point (x, y) is to be stored; which is a function of scene complexity. Between 20 and 32 bits is usually deemed sufficient and the scene has to be
scaled to this fixed range of z so that accuracy within the scene is maximized.
Recall in the previous chapter that we discussed the compression of z, values.
This means that a pair of distinct points with different Zv values can map into
identical z, values. Note that for frame buffers with less than 24 bits per pixel,
say, the Z-buffer will in fact be larger than the frame buffer. ln the past, Z-buffers
have tended to be part of the main memory of the host processor, but now
graphics terminals are available with dedicated Z-buffers and this represents the
best solution.
The memory problem can be alleviated by dividing the Z-buffer into strips or
partitions in screen space. The price paid for this is multiple passes through the
geometric part of the renderer. Polygons are fetched from the database and rendered if their projection falls within the Z-buffer partition in screen space.
An interesting use of the Z-buffer is suggested by Foley et al. (1989). This
involves rendering selected objects but leaving the Z-buffer contents unmodified
by such objects. The idea can be applied to interaction where a threedimensional cursor object can be moved about in a scene. The cursor is the
selected object, and when it is rendered in its current position, the Z-buffer is not
written to. Nevertheless the Z-buffer is used to perform hidden surface removal
on the object and will move about the scene obscuring some objects and being
obscured by others.
( 6.6.2 )
Requicha 1986). The Z-buffer algorithm is driven from object surfaces rather
than pixel-by-pixel rays. Consider the overall structure of both algorithms.
Ray tracing
for each pixel do
generate a ray and find all object surfaces that intersect the ray
evaluate the CSG tree to determine the boundary of the first surface along the
ray
apply Z-huffer algorithm and slzade or 11ot
Z-buffer
for each primitive object do
for primitive surface F do
for each point P in a sufficiently dense grid 011 F do
c =fop b
For example, consider the operator Zmin We may have two sub-images, say of
single objects, that have been rendered separately, the Z values of each pixel in
the final rendering contain ed in the Z channel. Compositing in this context
means effecting hidden surface removal between the objects and is defined as:
RGBc = (if Zc < Z" then RGBc else RGBb)
Zc
= min(Zc, Zb)
The a parameter (0 s a s 1) is the fraction of the pixel area that the object covers. It is used as a factor that controls the mixing of the colours in the two images.
The use of the a channel effectively extends area anti-aliasing across the emu.
positing of images. Of course, this parameter is not n ormally calculated by a basic
Z-buffer renderer and because of thjs, the method is only suitable when used in
conjunction with the A-buffer hidden surface removal meth od (Carpenter 1984),
an anti-aliased extension to the Z-buffer described in Section 14.6.
Th e operator over is defined as:
RGBc =
~(f over b)
+ (1 -
~)(b
over
( 6.6.4)
Thus we have three con current bilinear interpolation processes and a triple nested
l~op. The z values and intensities, I, are available at each vertex and the in terpolation scheme for z and I is distributed between the two inner loops of the algorithm.
An extended version of the by-polygon algorithm with Z-buffer h idden
surface removal is as follows:
for all x, y do
Z-Buffer[x, y] := maximwn_depth
for each polygon do
construct an edge list from the polygon edges (that is, for each edge, calculate the
values of x, z and I for each scan line by interpolation and store them in the edge
list)
for y := Ymin to Ymtl)( do
for each segment in EdgeListfy] do
get X~err,XrigiJt, Ziefi,Znsilr, lte{t,lrl:;l't
for X := Xiefl to X,;8;,, do
linearly interpolate z and I between ZJtofi,Znsllt and l;,rr,I,;3;,1 respectively
if z < Z_Buffer[x,yj then
Z_Buffer[x,y] := z
frame_buffer[x,yj := I
The structure of the algorithm reveals th e major inefficiency of th e m ethod in
that shading calculations are performed on hidden pixels which are then either
ignored or subseq uently overwritten.
If Phong interpolation is used then the final reflection model calculations
:Vhich are a function of the interpolated normal, should also appear within th~
mnermost loop; that is, interpolate N rather than I, and replace the last line with :
frame_buffer[x,y] := ShadingFunction(N)
Scan line Z-buffer
There is a variatio n of the Z-buffer algorith m for use with scan-line-based render~rs, .known (not suprisingly) as a scan line Z-buffer. This is simply a Z-buffer
whtch IS only on e pixel high, and is used to solve the hidden surface problem
for a given scan line. It is re-initialized for each new scan line. Its chief advantage lies in the small am ount of memory it requires relative to a full-blown z_
buffer; and .it is common to see a scan line Z-buffer-based program runnin g on
systems whtch do n ot have sufficient m emo ry to support a full Z-buffer.
Spanning hidden surface removal
A spanning hidden surface removal algorithm attempts, for each scan line, to
fin d 'spans' across wh ich shading can be performed. The hidden surface removal
@)
problem is thus solved by dividing the scan line into lengths over which a single
surface is dominant. This means that shading calculations are performed only
once per pixel, removing the basic inefficiency inherent in the Z-buffer method.
Set against this is the problem that spans do not necessarily correspond to
polygon segments, making it harder to perform incremental shading calculations (the start values must be calculated at an arbitary point along a polygon
segment, rather than being set to the values at the left-hand edge). The other
major drawback is in the increase in complexity of the algorithm itself, as will be
seen.
Jt is generally claimed that spanning algorithms are more efficient than Zbuffer-based algorithms, except for very large numbers of polygons (Foley eta/.
1989; Sutherland et al. 1974). However, since extremely complex scenes are now
becoming the norm, it is becoming clear that overall, the Z-buffer is more efficient, unless a very complex shading function is being used.
..
........
,'"' ..........
'
. . ....
,'
,
............
/
I
I
,-'
.,.,.-'
,._"'"
'
I
I
'
'vt\
I ',
.,."'<(
: ',
I
I
I
I
t(
Scan line
I
:
..,. . ,
)""'
\
\_
'
\
I
1\
:
I
: .,.-.
V
I
I, '
;
,' 1
1
..1'
t, " I
": :
'
I
I
I
I
I
',
\.
( 6.6.7 )
,'
, .... ,
I
I
I
I
I
I
I
I
I
:
I
Spans
Th e basic idea, as has been mentioned, is that rather than solving the hidden
surface problem on a pixel-by-pixel basis using incremental z calculation, the
spanning scan line algorithm uses spans along the scan line over which there is
no depth conflict. The hidden surface removal process uses coherence in x and
deals in units of many pixels. The processing implication is that a sort in x is
required for each scan line and the spans have to be evaluated.
The easiest way to see how a scan line algorithm works is to consider the situation in three-dimensional screen space (x,, y,, z,). A scan line algorithm effectively moves a scan line plane, that is a plane parallel to the (xs, z,) plane, down
the y, axis. This plane intersects objects in the scene and reduces th e hidden surface problem to two-dimensional space (x,, z,). Here the intersection of th e scan
line plane with object polygons become lines (Figure 6.21). These line segments
are then compared to solve the h idden surface problem by considering 'spans'.
A span is that part of a line segment that is contained between the edge intersections of all active polygons. A span can be considered a coherence unit,
within the extent of which the hidden surface removal problem is 'constant' and
can be solved by depth comparisons at either end of the span. Note that a more
complicated approach has to be taken if penetrating polygon s are allowed.
It can be seen from this geometric overview that the first stage in a spanning
scan line algorithm is to sort the polygon edges by y, vertex values. This results
in an active edge list which is updated as the scan line moves down they, axis.
If penetrating polygons are not allowed, then each edge intersection with the
current scan line specifies a point on the scan line where 'something is changing', and so these collectively define all the span boundary points.
By going through the active edge list in order, it is possible to generate a set
of line segments, each of which represents the intersection of the scan line plane
with a polygon. These are then sorted in order of increasing x,.
Figure 6 .21
A scan line plane is moved
down through the scene
producing line segments
and spans.
The innermost loop then processes each span in the current scan line. Active
line segments are clipped against span boundaries and are thus subdivided by
these boundaries. The depth of each subdivided segment is then evaluated at one
of the span boundaries and hidden surface removal is effected by searching within
a span for the closest subdivided segment. This process is shown in Figure 6.22.
In pseudo-code the algorithm is:
for each polygon do
Generate and bucket sort in y, the polygon edge information
for each scan line do
for each active polygon do
Determine the segment or intersection of the scan plane and polygon
Smt these active segments in x,
Update the rate of change per scan line of the shading parameters
Generate the span boundaries
for each span do
Clip active segments to span boundaries
Evaluate the depth for all clipped segments at one of the span boundaries
Solve the hidden surface problem for the span with minimum z,
Shade the visible clipped line segment
Update the shading parameters for all other line segments by the rate of
change per pixel times the span width
Note that integrating shading information is far more cumbersome than with
the Z-buffer. Records of values at the end of clipped line segments have to be
kept and updated. This places another scene complexity overhead (along with
the absolute number of polygons overhead) on the efficiency and memory
requirements of the process.
(~)
Figure 6.22
Processing spans.
z,
z,
'
!""'~~
--+--'
.Y---
I
I
I
I
':'-..o
1 1'
~,:
1
I
1
I
1
1
IT........-/1
I
I
1
;1
x,
o .......... :
1
I
I .;9
II lI
~
: l
+'~I
z,
1
1
I
z,
~ t
1
l--
+--9
~
:I
I
l ~
I
I
I
x,
space just covers the projection of the face. Then it is simply a matter of com.
paring the nearest vertex depth of the face against the value in the Z-pyramid.
Using the Z-pyramid to test a complete polygon for visibility is the same except
that the screen space bounding box of the polygon is used.
In this way the technique tries to make the best of both object and image
space coherence. Using spatial subdivision to accelerate hidden surface removal
is a old idea and seems to have been first mooted by Scl!umaker et al. (1969).
Here the application was flight simulation where the real time constraint was, in
1969, a formidable problem.
Temporal coherence is exploited by retaining the visible cubes from the previous frame. For the current frame the polygons within these cubes are rendered
first and the cubes marked as such. The algorithm then proceeds as normal. This
strategy plays on the usual event that most of the cubes from the previous frame
will still be visible; a few will become invisible in the current frame and a few
cubes, invisible in the previous frame, will become visible.
( 6.6.9 )
Z-buffer summary
From an ease of implementation point-of-view the Z-buffer is the best algorithm.
It has significant memory requirements particularly for high resolution frame
buffers. However, it places no upward limit on the complexity of scenes, an
advantage that is now becoming increasingly important. It renders scenes one
object at a time and for each object one polygon at a time. This is both a natural
and convenient order as far as database considerations are concerned.
An important restriction it places on the type of object that can be rendered
by a Z-buffer is that it cannot deal with transparent objects without costly modification. A partially transparent polygon may:
(1) Be completely covered by an opaque nearer polygon, in which case there is
no problem; or,
(2) Be the nearest polygon, in which case a list of all polygons behind it must
be maintained so that an appropriate combination of the transparent
polygon and the next nearest can be computed. (The next nearest polygon
is not, of course, known until all polygons are processed.)
Compared with scan line algorithms, anti-aliasing solutions, particularly hardware implementations, are more difficult.
Cook, Carpenter and Catmull (1987) point out that a Z-buffer has an
extremely important 'system' advantage. It provides a 'back door' in that it can
combine point samples with point samples from other algorithms that have
other capabilities such as radiosity or ray tracing.
If memory requirements are too prodigious then a scan line Z-buffer is the
next best solution. Unless a renderer is to work efficiently on simple scenes, it is
doubtful if it is worth contemplating the large increase in complexity that a
spanning scan line algorithm demands.
Historically there has been a sh ift in research away from hidden surface problems to realistic image synthesis. This has been motivated by the easy availability of high spatial and colour resolution terminals. All of the 'classic' hidden
surface removal algorithms were developed prior to the advent of shading
complexities and it looks as if the Z-buffer will be the most popular survivor for
conventional rendering.
Figure 6.23
BSP operations for
a four object scene.
(a) Constructing a BSP tree.
(b) Descending the t ree
with the view point
coordinates gives
the nearest object.
(c) Evaluating a visibility
order for all objects.
split into two constituents. The process continues recursively until all polygons
are contained by a plane. Obviously the procedure creates more polygons than
were originally in the scene but practice has shown that this is usually less than
a factor of two.
The process is shown for a simple example in Figure 6.24. The first plane chosen, plane A, containing a polygon fro m object 1, splits object 3 into two parts.
The tree builds up as before and we now use the convention IN/OUT to say
which side of a partition an entity lies since this now has meaning with respect
to the polygonal objects.
Far to near ordering was the original scheme used with BSP trees. Rendering
polygons into the frame buffer in this order results in the so-called painter's algorithm - near polygons are written 'on top of' farther ones. Near to far ordering
can also be used but in this case we have to mark in some way the fact that a
pixel has already been visited. Near to far ordering can be advantageous in
extremely complicated scenes if some strategy is adopted to avoid rendering
completely occluded surfaces, fo r example, by comparing their image plane
extents with the (already rendered) projections of nearer surfaces.
Thus to generate a visibility order for a scene we:
3
B
(a)
3
Nearest
>-
(b)
View point
(c)
l st
4th
This results in a back to fron t ordering fo r the polygons with respect to the current view point position and these are rendered into the frame buffer in this
order. If this procedure is used then the algorithm suffers from the same efficiency disadvantage as the Z-buffer - rendered polygons may be subsequently
obscured. However, one of the disadvantages of the Z-buffer is immediately overcome. Polygon ordering allows the unlimited use of transparency with no additional effort. Transparent polygons are simply composited according to their
transparency value .
~
2nd
3rd
(number of polygons per object) is much greater than scene complexity (number of objects per scene) and for the approach to be useful we have to deal with
polygons within objects rather than entire objects. Also there is the problem of
positioning the planes - itself a non-trivial problem. If the number of objects is
small then we can have a separating plane for every pair of objects - a total of
n 2 for an n object scene.
For polygon visibility ordering we can choose planes that contain the face
polygons. A polygon is selected and used as a root node. All other polygons are
tested against the plane containing this polygon and placed on the appropriate
descendant branch. Any polygon which crosses the plane of the root polygon is
Figure 6.24
A BSP tree for polygons.
(~)
figure 6.25
Multi-pass super-sampling.
(a) Aliased image
(1 sample/pixel). (b) A one
component/pass of the
anti-aliased image (four
samples/pixel or four
passes). For this pass the
view point is moved up
and to the left by 'lz pixel
dimension).
View point
(a)
-..
'....
(b)
....,
...;
Figure 6.16
Simulating depth of field by
shearing the view frustum
and translating the view
point.
(.
'
''
'
'
(4
View plane
Soft sh adows are easily created by accumulati ng n passes and ch anging the
position of a point ligh t source between passes to simulate sampling of an ~rea
source. Clearly this approach will also enable shadows from separate light
sources to be rendered.
7.1
7.2
7.3
7.4
7.5
7.6
7.7
Pre-computing BRDFs
7.8
Introduction
Local reflection models, and in particular the Phong model (introduced in Chapter
5), have been part of mainstream rendering since the mid-1970s. Combined with
interpolative shading of polygons, local reflection models are incorporated in
almost every conventional renderer. The obvious constraint of locality is the
strongest disadvantage of such models but despite the availability of ray tracers and
radiosity renderers the mainstream rendering approach is still some variation of the
strategy described earlier - in other words a local reflection model is at the heart of
the process. However, nowadays it would be difficult to find a renderer that did not
have ad hoc additions such as texture mapping and shadow calculation (see
Chapters 8 and 9). Texture mapping adds interest and variety, and geometrical
shadow calculations overcome the most significant drawback of local models.
Despite the understandable emphasis on the development of global models,
th ere has been some considerable research effort into improving local reflection
models. However, not too much attention h as been paid to these, and most
Figure 7.10
Reflection behaviour due to
Hanrahan and Kreuger's
model (after Hanrahan and
Kreuger (1 993)).
\~
\
..1J. .
Mapping techniques
Surface specular
reflection
Subsurface reflection
and transmission
of incidence of the light. For a plane surface the amount of light entering the surface depends on Fresnel's law - the more light that enters the surface, the higher
will be the contribution or influence from subsurface events to the total reflected
light L,. So the influence of Lrv depends on the angle of incidence. Subsurface scattering depends on the physical properties of the material. A material is modelled
by a suspension of scattering sites or particles and parametrized by absorption
and scattering cross-sections. These express the probability of occurrence per unit
path length of scattering or absorption. The relative size of these parameters
determines whether the scattering is forward, backward or isotropic.
The effect of these two factors is shown, for a simple case, in Figure 7.10. The
first row shows high/low specular reflection as a function of angle of incidence.
The behaviour of reflected light is dominated by surface scattering or specular
reflection when the angle of incidence is high and by subsurface scattering when
the angle of incidence is low. As we have seen, this behaviour is modelled, to
a certain exten t, by the Cook and Torrance model of Section 7.6. The second
row shows reflection lobes due to subsurface scattering and it can be seen that
materials can exhibit backward, isotropic or forward scattering behaviour. (The
bottom lobes do not, of course, contribute to Lr but are nevertheless important
when considering materials that are made up of multiple layers and thin translucent materials that are backlit.) The third row shows that th e combination of L,
and Lrv will generally result in non-isotropic behaviour which exhibits the following general attributes:
Such factors result in the subtle differences between the model and Lambert's law.
CJ
8.1
8.2
8.3
Billboards
8.4
Bump mapping
8.5
Light maps
8.6
8.7
8.8
8.9
Introduction
In this chapter we will look at techniques which store information (usually) in a
two-dimensional domain which is used during rendering to simulate textures.
The mainstream application is texture mapping but many other applications are
described such as reflection mapping to simulate ray tracing. With the advent of
texture mapping hardware the use of such facilities to implement real time rendering has seen the development of light maps. These use the texture facilities
to enable the pre-calculation of (view independent) lighting which then
' reduces' rendering to a texture mapping operation.
Texture mapping became a highly developed tool in the 1980s and was the
technique used to enhance Phong shaded scenes so that they were more visually
interesting, looked more realistic or esoteric. Objects that are rendered using
only Phong shading look plastic-like and texture mapping is the obvious way to
add interest with out m uch expense.
Texture mapping developed in parallel with research into global illumination
algorithms - ray tracing and radiosity (see Chapters 10, 11 and 12). It was a
device that could be used to enhance the visual interest of a scene, rather than
MAPPING TECHNIQUES
its photo-realism and its main attraction was cheapness - it could be grafted
onto a standard rendering method without adding too much to the processing
cost. This contrasted to the global illumination methods which used completely
different algorithms and were m uch more expensive than direct reflection
models.
Another use of texture mapping that became ubiquitous in the 1980s was to
add pseudo-realism to shiny animated objects by causing their surrounding
environment to be reflected in them. Thus tumbling logos and titles became
chromium and the texture reflected on them moved as the objects moved. This
technique - known as environment mapping- can also be used with a real photographed environment and can help to merge a computer animated object with
a real environment. Environment mappi ng does not accomplish anything that
could not be achieved by ray tracing - but it is much more efficient. A more
recent use of environment mapping techniques is in image-based rendering
which is discussed in Chapter 16.
As used in computer graphics, 'texture' is a somewhat confusing term and
generally does not mean controlling the small-scale geometry of the surface of a
computer graphics object- the normal meaning of the word. It is easy to modulate the colour of a Phong shaded object by controlling the value of the three
diffuse coefficients and this became the most common object parameter to be
controlled by texture mapping. (Colour variations in the physical world are not,
of course, generally regarded as texture.) Thus as the rendering proceeds at pixelby-pixel level, we pick up values for the Phong diffuse reflection coefficients and
the diffuse component (the colour) of the shading changes as a function of the
texture map(s). A better term is colour mapping and this appears to be coming
into common usage.
This simple pixel-level operation conceals many difficulties and the geometry
of texture mapping is not straightforward. As usual we make simplifications that
lead to a visually acceptable solution. There are three origins to the difficulties:
(1) We mostly want to use texture mapping with the most popular representation
Aliasing breaks this up and the resulting mess is usually high visible. This
effect occurs as the periodicity in the texture approaches the pixel
resolution.
We. now list the possible ways in which certain properties of a computer
graphiCs model can be modulated with variations under control of a texture
map. We have listed these in approximate order of their popularity (which also
tends to relate to their ease of use or implementation). These are:
(1) Colour As we h ave already pointed out, this is by far the most common
object property that is controlled by a texture map. We simply modulate the
diffuse reflection coefficients in the Phong reflection model with the
corresponding colour from the texture map. (We could also change the
specular coefficients across the surface of an object so that it appears shiny
and matte as a function of the texture map. But this is less common as
being able to perceive this effect on the rendered object depends 'on
producing specular h ighlights on the shiny parts if we are using the basic
Phong reflection model.)
(2) Specular 'colour' This technique- known as environment mapping or
chrome mapping - is a special case of ray tracing where we use texture map
techniques to avoid the expense of full ray tracing. The map is designed so
that it looks as if the (specular) object is reflecting the environment or
background in which it is placed.
(3) Normal vector perturbation This elegant technique applies a
perturbation to the surface normal according to the corresponding value in
the map. The technique is known as bump mapping and was developed by
a famous pioneer of three-dimensional computer graphic techniques ]. Blinn. The device works because the intensity that is returned by a Phong
shading equation reduces, if the appropriate simplifications are made, to a
function of the surface normal at the point currently being shaded. If the
surface normal is perturbed then the shading changes and the surface that
is rendered looks as if it is textured. We can therefore use a global or general
definition for the texture of a surface which is represented in the database as
a polygon mesh structure.
(4) Displacement mapping Related to the previous technique, this
mapping method uses a height field to perturb a surface point along the
direction of its surface normal. It is not a convenient technique to
implement since the map must perturb the geometry of the model rather
than modulate parameters in the shading equation.
(5) Transparency A map is used to control the opacity of a transparent
object. A good example is etched glass where a shiny surface is roughened
(to cause opacity) with some decorative pattern.
There are many ways to perform texture mapping. The choice of a particular
met~od depends mainly on time constraints and the quality of the image
requ1red. To start with we will restrict the discussion to two-dimensional texture
(~)
MAPPING TECHNIQUES
INTRODUCTION
maps- the most popular and common form- used in conjunction with polygon mesh objects. (Many of the insights detailed in this section are based on
descriptions in Heckbert's (1986) defining work in this area.)
Mapping a two-dimensional texture map onto the surface of an object then
projecting the object into screen space is a two-dimensional to two-dimensional
transformation and can thus be viewed as an image warping operation. The most
common way to do this is to inverse map - for each pixel we find its corresponding 'pre-image' in texture space (Figure 8.1(b)). However, for reasons that
will shortly become clear, specifying this overall transformation is not straightforward and we consider initially that texture mapping is a two-stage process
that takes us from the two-dimensional texture space into the three-dimensional
space of the object and then via the projective transform into two-dimensional
screen space (Figure 8.1(a)). The first transformation is known as parametrization
and the second stage is the normal computer graphics projective transformation.
The parametrization associates all points in texture space with points on the
object surface.
The use of an anti-aliasing method is mandatory with texture mapping. This
is easily seen by considering an object retreating away from a viewer so that its
projection in screen space covers fewer and fewer pixels. As the object size
decreases, the pre-image of a pixel in texture space will increase covering a larger
area. If we simply point sample at the centre of the pixel and take the value of
T(u, v) at the corresponding point in texture space, then grossly incorrect results
will follow (Figure 8.2(a), (b) and (c)). An example of this effect is shown in Figure
8.3. Here, as the chequerboard pattern recedes into the distance, it begins to break
up in a disturbing manner. These problems are highly visible and move when animated. Consider Figure 8.2(b) and (c). Say, for example, that an object projects
onto a single pixel and moves in such a way that the pre-image translates across
the T(u, v) . As the object moves it would switch colour from black to white.
Texture space
(11, v)
Figure 8.2
Pixels and pre-images in
T(u,v) space.
With
anti-aliasing
Without
anti-aliasing
D
Pixel
shade
With
anti-aliasing
Without
anti-aliasing
D
Pre-image of
pixel centre
(b)
D
Pixel
shade
With
anti-aliasing
Without
anti-aliasing
D
Pixel
shade
Screen s pace
Object space
(X" yw, Zw)
(x,. y,)
Air'
(d)
Surface
parametrization
(~)
Pre-pixel image
Inverse
mapping
D
Pixel
' - -- -- - - --..!
Projection
(a)
Figure 8.1
Two ways of viewing the
process of two-dimensional
texture mapping.
(a) Forward mapping.
(b) Inverse mapping.
Inverse mapping
'Pre-image' of pixel
(b)
Anti-aliasing in this context then means integrating the information over the
pixel pre-image and using this value in the shading calculation for the current
pixel (Figure 8.2(d)). At best we can only approximate this integral because
we have no knowledge of the sh ape of th e quadrilateral, only its four corner
points.
(~
MAPPING TECHNIQUES
(~
X=
Figure 8.3
Aliasing in texture mapping.
The pattern in (b) is a supersampled (anti-aliased)
version of that in (a). Aliases
still occur but appear at a
higher spatial frequency.
(a)
(b)
The most popular practical strategy for texture mapping is to associate, during
the modelling phase, texture space coordinates (u, v) with polygon vertices. The
task of the rendering engine then is to find the appropriate (11, v) coordinate for
pixels internal to each polygon. The main problem comes about because the
geometry of a polygon mesh is only defined at the vertices- in other words there
is no analytical parametrization possible. (If the object has an analytical defmition - a cylinder, for example - then we have a parametrization and the mapping of the texture onto the object surface is trivial.)
There are two main algorithm structures possible in texture mapping, inverse
mapping (the more common) and forward mapping. (Heckbert refers to these as
screen order and texture order algorithms respectively.) Inverse mapping (Figure
8.l(b)) is where the algorithm is driven from screen space and for every pixel we
find by an inverse mapping its 'pre-image' in texture space. For each pixel we
find its corresponding (u, v) coordinates. A filtering operation integrates the
information contained in the pre-image and assigns the resulting colour to the
pixel. This algorithm is advantageous if the texture mapping is to be incorporated into a Z-buffer algorithm where the polygon is rasterized and depth and
lighting interpolated on a scan line basis. The square pixel produces a curvilinear quadrilateral as a pre-image.
In forward mapping the algorithm is driven from texture space. This time a
square texel in texture space produces a curvilinear quadrilateral in screen space
and there is a potential problem due to holes and overlaps in the texture image
when it is mapped into screen space. Forward mapping is like considering the
texture map as a rubber sheet- stretching it in a way (determined by the parametrization) so that it sticks on the object surface thereby performing the normal
object space to screen space transform.
du + ev + f
+ hv + i
[8.1]
y = gu
gu + hu + i
where:
(x, y)
= (x'/w, y'/w)
and
(u, v)
= (11'/q, v'/q)
u'
l l[l
=
[ABC x'
~~~ ~
ei -
fh
ch - bi
fg-
di
ai - cg
bf- ce
cdaf
dh -eg
bg - ah
ae- bd
l[ l
x'
y'
Now recall that in most practical texture mapping applications we set up, during the modelling phase, an association between polygon mesh vertices and texture map coordinates. So, for example if we have the association for the four
vertices of a quadrilateral we can find the nine coefficients (a, b, c, d, e, f, g, h, i).
We thus have the required inverse transform for any point within the polygon.
This is done as follows. Return to the first half of Equation 8.1, the equation for
x. Note that we can multiply top and bottom by an arbitrary non-zero scalar constant without changing the value of y, in effect we only have five degrees of freedom - not six- and because of this we can, without loss of generality set i = 1.
Thus, in the overall transformation we only have 8 coefficients to determine and
our quadrilateral-to-quadrilateral association will give a set of 8 equations in 8
unknowns which can be solved by any standard algorithm for linear equations
-Gaussian elimination, for example. Full details of this procedure are given in
Heckbert (1986).
MAPPING TECHNIQUES
TWO-DIMENSIONAL TEXTURE MAPS TO POLYGON MESH OBJECTS
Figure 8.4
Two-stage mapping as a
toward process. (a) 5
mapping. (b) 0 mapping.
20 texture map
(u', v', q)
where:
@)
_____.
~
w
Intermediate
surface
(a)
(b)
(1) The first stage is a mapping from two-dimensional texture space to a simple
u = u'/q
v =v'/q
q = 1/z
(u, v)
u'/q, v'!q
Note that this costs two divides per pixel. For the standard incremental implementation of this interpolation process we need three gradients down each edge
(in the current edge-pair) and three gradients for the current scan line.
( 8.1.2)
The previous method for mapping two-dimensional textures is now undoubtedly the most popular approach. The method we now describe can be used
in applications where there is no texture coordinate-vertex coordinate correspondence. Alternatively it can be used as a pre-process to determine this
correspondence and the first method then used during rendering.
Two-part texture mapping is a technique that overcomes the surface parametrization problem in polygon mesh objects by using an 'easy' intermediate surface onto which the texture is initially projected. Introduced by Bier and Sloan
(1986), the method can also be used to implement environment mapping and is
thus a method that unifies texture mapping and environment mapping.
The process is known as two-part mapping because the texture is mapped
onto an intermediate surface before being mapped onto the object. The intermediate surface is, in general, non -planar but it possesses an analytic mapping
function and the two-dimensional texture is mapped onto this surface without
difficulty. Finding the correspondence between the obj ect point and the texture
point then becomes a three-dimensional to three-dimensional mapping.
The basis of the method is most easily described as a two-stage forward map
ping process (Figure 8.4):
(2) A second stage maps the three-dimensional texture pattern onto the object
surface.
T' (X;,
y;, Z;)
-+
-- ( cr (8 - Bo), d1 (h - ho) )
where c and dare scaling factors and 8o and ho position the texture on the cylinder of radius r.
Various possibilities occur for the 0 mapping where the texture val ues for
O(xw, Yw, Zw) are obtained from T '(x;, y;, Zi), and these are best considered fro m a
ray tracing point of view. The four 0 mappings are shown in Figure 8.5 and are:
(I ) The intersection of the reflected view ray with the intermediate surface, T'.
MAPPING TECHNIQUES
Figure 8.5
The tour possible 0
mappings that map the
intermediate surface texture
T' onto the object
( l ) Reflected ray
T ( u, v)
(8, h) -
(3)
N(x,y,
Z;)
(u, v)
T '(8,h)
(2) The intersection of the surface normal at (xw, yw, Zw) with T'.
(3) The intersection of a line through (xw, yw, zw) and the object centroid with T'.
(4) The intersection of the line from (xw, yw, Zw) to T' whose orientation is given
by the surface normal at (X;, )'1, z;). If the intermediate surface is simply a
plane then this is equivalent to considering the texture map to be a slide in
a slide projector. A bundle of parallel rays of light from the slide projector
impinges on the object surface. Alternatively it is also equivalent to threedimensional texture mapping (see Section 8.7) wh ere the field is defined by
'extruding' the two-dimensional texture map along an axis normal to the
plane of the pattern.
Inverse
mapping
0
U
(2)
T "(x;,y,z;)
(0. 0, Zw)
(1) Inverse map four pixel points to four points (xw, yw, Zw) on the surface of the
object.
(2) Apply the 0 mapping to find the point (8, h) on the surface of the cylinder.
In the shrinkwrap case we simply join the object point to the centre of the
cylinder and the intersection of this line with the surface of the cylinder
gives us (x;, y;, z;).
Xw,
Yw1 Zw ._ (8, h)
(I)
Screen
space
(x
y.)D
(8, h)
@)
@)
MAPPING TECHNIQUES
BILLBOARDS
Figure 8. 7 (Colour Plate) shows examples of mapping the same texture onto an
object using different intermediate surfaces. The intermediate objects are a plane
(equivalently no intermediate surface - the texture map is a plane), a cylinder
and a sphere. The simple shape of the vase was chosen to illustrate the different
distortions that each intermediate object produces. There are two points that can
be made from these illustrations. First, you can choose an intermediate mapping
that is appropriate to the shape of the object. A solid of revolution may be best
suited, for example, to a cylinder. Second, although the method does not place
any constraints on the shape of the object, the final visual effect may be deemed
unsatisfactory. Usually what we mean by texture does not involve the texture
pattern being subject to large geometric distortions. It is for this reason that
many practical methods are interactive and involve some strategy like predistorting the texture map in two-dimensional space until its produces a good
result wh en it is stuck onto the object.
(~)
Figure 8.9
Providing the viewing
direction is approximately
parallel to the ground plane,
objects like trees can be
represented as a billboard
and rotated a bout their
y axis so that they are
oriented normal to t he Los
vector.
MAPPING TECHNIQUES
BUMP MAPPING
8 = n - cos-1 (LosB n)
aP
au
aP
av
=Pu X P v
where:
where P ,. ~nd Pv are the partial derivatives lying in the tangent plane to the surface at. pom~ P . Wh~t we want to do is to have the same effect as displacing
the pomt P m the direction of the surface normal at that point by an amount
B(u, v) - a one-dimensional analogue is shown in Figure 8.11. That is:
is the viewing direction vector from the view point to the required position of the billboard in world coordinates
L os
Given 8 and the required translation we can then construct a modelling transformation for the geometry of the billboard and transform it. Of course, this simple example will only work if the viewing direction is parallel or approximately
parallel to the ground plane. When this is not true the two-dimensional nature
of the billboard will be apparent.
Billboards are a special case of impostors or sprites which are essentially precomputed texture maps used to by-pass normal rendering when the view point
is only changing slightly. These are described in detail in Chapter 14.
Locally the surface would n ot now be as smooth as it was before because of this
displacement and the normal vector N to the 'new' surface is given by differentiating this equation:
N = P',. + P ',.
P ',. = P ,. + B,.N + B(u, v)N ..
Bump mapping
Bump mapping, a technique developed by Blinn (1978), is an elegant device that
enables a surface to appear as if it were wrinkled or dimpled without the need to
model these depressions geometrically. Instead, the surface normal is angularly
perturbed according to information given in a two-dimensional bump map and
this 'tricks' a local reflection model, wherein intensity is a function mainly of the
surface normal, into producing (apparent) local geometric variations on a
smooth surface. The only problem with bump mapping is that because the pits
or depressions do not exist in the model, a silhouette edge that appears to pass
through a depression will n ot produce the expected cross-section. In other words
the silhouette edge will follow the original geometry of the model.
It is an important technique because it appears to texture a surface in the normal sense of the word rather than modulating the colour of a flat surface. Figure
8.10 (Colour Plate) sh ows examples of this technique.
Texturing the surface in the rendering phase, without perturbing the geometry, by-passes serious modelling problem s that would otherwise occur. If the
object is polygonal the mesh would have to be fine enough to receive the
perturbations from the texture map - a serious imposition on the original modelling phase, particularly if the texture is to be an option. Thus the technique
converts a two-dimensional height field B(u, v), called the bump map, and
which represents some desired surface displacement, into appropriate perturbations of the local surface normal. When. this surface normal is used in the
shading equation the reflected light calculations vary as if the surface had been
displaced.
Consider a point P(u, v) on a (parameterized) surface corresponding to B(u, v) .
We defin e the surface n ormal at the point to be:
@)
P( u)
Original Surface
B(u)
A bump m ap
P'(u)
Lenglhening or shonening
O(u) using B(u)
N'(u)
Figure 8 .11
A one-dimensional example
of the stages involved in
bump mapping (after Blinn
(1978)).
MAPPING TECHNIQUES
Figure 8.12
Geometric interpretation of
bump mapping.
/
//
Jt.l' ( . . .
I
The norm alized components of these vectors defines the matrix that transforms
points into tangent space:
I
I
I
I
I
Tx Tr Tz
I
I
- B,
B
L rs =
Bx Br Bz
[ Nx NY Nz
D is given by
D =B.,A-8,.8
(1) The object is rendered using a normal renderer with texture m apping
N= P,.xP11
(2) T and B are found at each vertex and the light vector transformed into tan-
If B is small we can ignore the final term in each equation and we have:
N = N + BuN X P v + BvP u X N
or
N = N + BuN X Pv- BvN
P 11
= N + (B"A - B.B)
=N+D
Then Dis a vector lying in the tangent plane th at 'pulls' N into the desired orientation and is calculated from the partial derivatives of th e bump m ap and the
two vectors in the tangent plane (Figure 8.12).
( 8-4.1 )
gent space.
(3) A second image is created in the same way but now the texture/vertex correspondence is shifted by small amounts in the direction of the X , Y components of L rs. We now have two image projections where the height field
or the bump map has been mapped o nto the object and shifted with respect
to the surface. If we subtract these two images we get the differential coefficient which is the required term D L . (Finding the differential coefficient of
an image by subtraction is a standard image processing technique- see, for
example, Watt and Policarpo (1998)).
(4) The object is rendered in the normal manner without any texture and this
component is added to the subtrahend calculated in step (3) to give the final
bump-mapped image.
Thus we replace the explicit bump mapping calculations with two texture
mapped rendering passes, an image subtract, a Gouraud shading pass then an
image added to get the final result.
= N-L + DL
The first component is the normal Gouraud component and the second component is found from the differential coefficient of two image projections fo rmed
by rendering the surface with the h eight field as a normal texture map. To do
this it is necessary to transform the light vector into tangent space at each vertex of the polygon. This space is defined by N , B and T, where:
(~ MAPPING TECHNIQUES
LIGHT MAPS
where:
a
equation. This is called surface caching because it stores the final value required
for the pixel onto which a surface point projects and because texture caching
hardware is used to implement it. If this strategy is employed then the texture
mapping transform and the transform that maps light samples on the surface of
the object into a light map should be the same.
Light maps were first used in two-pass ray tracing (see Section 10.7) and are
also used in Ward's (1994) RADIANCE renderer. Their motivation in these applications was to cache diffuse illumination and to enable the implementation of a
global illumination model that would work in a reasonable time. Their more
recent use in games engines has, of course, been to facilitate shading in real time.
The first problem with light maps is how do we sample and store, in a twodimensional array, the calculated reflected light across the face of a polygon in
three-dimensional space. In effect this is the reverse of texture mapping where
we need a mapping from two-dimensional space into three-dimensional object
space. Another problem concerns economy. For scenes of any complexity it
would clearly be uneconomical to construct a light map fo r each polygon rather we require many polygons to share a single light map.
Zhukov et a/. (1998) approach the three-dimensional sampling problem by
organizing polygon s into structures called 'polypacks' . Polygons are projected
into the world coordinate planes and collected into polypacks if their angle with
a coordinate plane does not exceed some threshold (so that the maximal projection plane is selected fo r a polygon) an d if their extent does not overlap in the
projection. The world space coordinate planes are subdivided into square cells
(the texels or 'lumels') and back projected onto the polygon. The image of a
square cell on a polygon is a parallelogram (whose larger angle s 102). These are
called patches and are the subdivided polygon elements for which the reflected
light is calculated. This scheme thus samples every polygon with almost square
elements storing the result in the light map (Figure 8.13).
These patches form a subdivision of the scene sufficient for the purpose
of generating light maps and a single light intensity for each patch can be
calculated using whatever algorithm the application demands (for example
Phong shading or radiosity). After this phase is complete there exists a set of
(parallelogram-shaped) samples for each polygon. These then have to be 'stuffed'
= - Bu(B.Pv)
b =- (Bv1Pu1 - B.,(T.Pv))
C
= IPu X P v1
For each point in the bump map these points can be pre-computed and a map
of perturbed normals is stored for use during rendering instead of the bump
map.
Light maps
Light maps are an obvious extension to texture maps that enable lighting to be
pre-calculated and stored as a two-dimensional texture map. We sample the
reflected light over a surface and store this in a two-dimensional map. Th us
shading reduces to indexing into a light map or a light modulated texture map.
An advantage of the technique is that there is no restriction on the complexity
of the rendering method used in the pre-calculation - we could, for example, use
radiosity or any view-independent global illumination method to generate the
light maps.
In principle light maps are similar to environment maps (see Section 8.6). In
environment mapping we cache, in a two-dimensional map, all the illumination
incident at a single point in the scene. With light maps we cache the reflected
light from every surface in the scene in a set of two-dimensional maps.
If an accurate pre-calculation method is used then we would expect the technique to produce better quality shading and be faster than Gouraud interpolation. This means that we can incorporate shadows in the final rendering. The
obvious disadvantage of the technique is that fo r moving objects we can only
invoke a very simple lighting environment (diffuse shading with the light source
at infinity). A compromise is to use dynamic shading for moving objects and
assume that they do not interact, as far as shading is concerned, with static
objects shaded with a light map.
Light maps can either be stored separately from texture maps, or the object's
texture map can be pre-modulated by the light map . If the light map is kept as
a separate entity then it can be stored at a lower resolution th an the texture map
because view-independent lighting, except at shadow edges, changes more
slowly than texture detail. It can also be high-pass filtered which will ameliorate
effects such as banding in the final image and also has the benefit of blurring
shadow edges (in the event that a h ard-edged shadow generation procedure has
been used).
If an object is to receive a texture then we can modulate the brightness of the
texture during the modelling phase so that it has the same effect as if the
(unmodulated) texture colours were injected into, say, a Phong sh ading
@)
Figure 8.13
Form ing a light map in the
'maximal' world coordinate
plane.
Back projection of
'lume l' onto polygon
forms a patch
(~)
MAPPING TECHNIQUES
where (x, y, z) is the point on the object corresponding to the texel (u, v). This
transform ation can be seen as a linear transformation in three-dimensional
space with the texture map embedded in the z = 1 plane. The coefficients are
fo und from the vertex/texture coordinate correspondence by inverting the U
m atrix in:
[ ;: ;: ; :] = [
~ ~
l[ l
g h i
we have:
A =XU-1
The inverse U" 1 is guaranteed to exist providing the three points are noncollinear. Note that in terms of our treatment in Section 8.1 this is a forward
mapping from texture space to object space. Examples of a scene lit using this
technique are shown in Figure 8. 14 (Colour Plate).
(~)
It is (geom~trically) correct only when the object becomes small with respect
to the envuonment that contains it. This effect is usually not noticeable in
the sense that we are not disturbed by 'wrong' reflections in the curved
sur~ace of a shiny object. The extent of the problem is shown in Figure 18.9
wh1ch sh ows the same object ray traced and environ ment mapped.
An object can only reflect the environment - not itself - and so the
technique is 'wrong' for concave objects. Again this can be seen in Figure
18.9 where the reflection of the spout is apparent in the ray traced image.
A separate map is required for each object in the scene that is to be
environment mapped.
In one common form of environment mapping (sphere mapping) a new
m ap is required whenever the view point changes.
R v = 2(N V)N - V
[8.2)
Figure 8.15(c) shows that, in practice, for a single pixel we should consider the
~eflection b~am, rather than a single vector, and the area subtended by the beam
m the map 1s then filtered for the pixel value. A reflection beam originates either
(~)
MAPPING TECHNIQUES
Figure 8 .15
Environment mapping
(a) The ray tracing model that part of the environment
reflected at point P is
determined by reflecting the
view ray R,. (b) We try to
achieve the same effect as
in (a) by using a function
of R, to index into a twodimensional map. (c) A pixel
subtends a reflection beam.
In real time polygon mesh rendering, we can calculate reflected view vectors
only at the vertices and use linear interpolation as we do in conventional texture
mapping. Because we expect to see fine detail in the resulting image, the quality
of this approach depends strongly on the polygon size.
In effect an environment map caches the incident illumination from all directions at a single point in the environment with the object that is to receive the
mapping removed from the scene. Reflected illumination at the surface of an
object is calculated from this incident illumination by employing the aforementioned geometric approximation - that the size of the object itself can be considered to approach the point and a simple BRDF which is a perfect specular term the reflected view vector. It is thus a view-independent pre-calculation technique.
R,
3D object
(a)
M(Rv)
Cubic mapping
2D environmenlal map
3D object
(b)
Area subtcnded in
(c)
View point
from four pixel corners if we are indexing the map for each ~ixel, or ~to~i~~~i~
gon vertices if we are using a fast (approximate) scheme. An lm~ortaf ~tion of
note here is that the area intersected in the environment map lS a Uf1 b
the curvature of the projected pixel area on the object surface. How~~er,. ;c:~~~
we are now using texture mapping techniques we can employ pre- erm
aliasing methods (see Section 8.8).
Figure 8.16
Cubic environment
mapping: the reflection
beam can range over
more than one map.
@)
(~)
MAPPING TECHNIQUES
f h
blems of a cubic map is that
ivalently
four walls and the floor and ceiling. One o t e pro .
if we are con sidering a reflection beam formed byt~IX~~~:~:~,i~~:;~nto more
by the reflected view vectors at a polygon vertex, e
bd' .ded so that
than one map (Figure 18.16). In that case the polygon can be su lVl
each piece is constrained to a single n:ap.
d
ine the mapping from the
With cubic maps we need an algonthm to eterm
.
1
(W' th
.
re two-dimenswna maps.
1
three-dimensional view vector mto one or roo
.
.
.
laced
te~hniques descr~bed i~
the
the
the
by a Simple calculatwn.) I we c .
be (the case if the view were
same coordinate frame as the envuonment map cu I
the world axes in both
constructed by pointing the (virtual or real) camera a ong
directions) , then the mapping is as follows.
For a single reflection vector:
b
This involves a simple
(1) Find the face it intersects - the map num er.
fl
d view vector
. n of the components of the normalized re ecte
companso
h
against the (unit) cube extent which is centred on t e ongm.
II
face2
(2) Map the components into (u, v) coordinates. For example, a point
(x, y, z) intersecting the face normal to the negative z axis is given by:
0.5
v = - z + 0.5
It= X+
Sphere mapping
The first use of environment mapping was by Blinn and Newell (1976) wherein
a sphere rather than a cube was the basis of the method used. The environment
map consisted of a latitude- longitude projection and the reflected view vector,
Rv, was mapped into (u, v) coordinates as:
11 =
..!. ( 1+ _!
2
1t
tan- 1
R,.r ))
Rvx
-1t
Rvz + 1
V = - -2
+zL
+
+Y
L +yL +yL +yL
+
I'
+X
y face 2
II
II
Figure 8.17
Cubic environment map
convention.
- x face 3
ll
-L
+x
-xface4
-X
+z
+X
x face 3
ll
+ X face 1
11
zfaceO
II
The m ain problem with this simple technique is the singularities at the poles. In
the polar area small changes in the direction of the reflection vector produce
large changes in (u, v) coordinates. As Rvz -+ 1, both R,., and R.1 _,. 0 and R,.,JR,.x
becomes ill-defined. Eq uivalently, as v -+ 1 or 0 the behaviour of u starts to break
down causing visual disturbances on the surface. This can be ameliorated by
modulating the horizontal resolution of the map with sin e (where e is the
elevation angle in polar coordinates).
An alternative sphere mapping form (Haeberli and Segal 1993; Miller et al.
1998) consists of a circular map which is the orthographic projection of the
reflection of the environment as seen in the surface of a perfect mirror sphere
(Figure 8.18) . Clearly such a map can be generated by ray tracing from the view
plane. (Alternat ively a photograph can be taken of a shiny sphere.) Although the
map caches the incident illumination at the reference point by using an orthographic projection it can be used to generate, to within the accuracy of the
process, a normal perspective projection.
To generate the m ap we proceed as follows. We trace a parallel ray bundle one ray for each texel (u, v) and reflect each ray from the sphere. The point on
the sphere at the point h it by the ray from (u, v) is P , where:
Environment
Figure 8.18
Constructing a spherical
map by ray tracing from the
map texels o nto a reflective
sphere.
Figure 8.19
Sampling the surface
of a sphere. (a) Cubic
perspective: under-sampling
at the centre of the map
(equator and meridia n)
compared to t he corners.
(b) Mercator or
latitude-longitude: severe
over-sa mpling at edges of
the map in t he v direction
(poles). (c) Orthographic:
severe under-sampling at
the edges of the map in the
u direction (equator).
( a)
Environment map
P x == U
Pr == V
P, == (1.0 - Pi - P /) 112
(b)
This is also the normal to the sph ere at the hit point and we can compute the
reflected vector using Equation 8.2.
To index into the map we reflect the view vector fro m the object (either for
each pixel or for each polygon vertex) and calculate the map coordinates as:
1
Rx
+ 2
m
U==-
V==
1
Rr
+ 2
m
where:
m == 2(Ri
(c)
+ R/ + (R, +
1)2) 112
Sphere mapping overcomes the main limitation of cubic maps which require, in
general, access to a number of the face maps, and is to be preferred when speed
is important. However, both types of sphere mapping suffer more from nonuniform sampling than cubic mapping. Refer to Figure 8.19 which attempts to
demonstrate this point. In all three cases we consider that the environment map
is sampling incoming illumination incident on the surface of the unit sphere.
The illustration shows the differen ce between the areas on the surface of the
sphere sampled by a texel in the environment map. Sam~ling only a~proacl:es
uniformity when the viewing direction during the rendenng phas_e aligns Wit~
the viewing direction from which th e map was computed. For this reason this
type of spherical mapping is considered to be view dependent and a new map
has to be computed when the view direction changes.
So far we have restricted the discussion to geometry and assumed that the object
which is environment mapped possesses a perfect mirror surface and the map is
indexed by a single reflected view ray. What if we want to use objects with
reflecting properties other than that of a perfect mirror. Using the n ormal Phong
local reflection model, we can consider two components- a diffuse component
plus a specular component - and construct two maps. The diffuse map is
indexed by the surface normal at the point of interest and the specular map is
indexed by the reflected view vector. The relative contribution from each map is
determined by diffuse and specular reflection coefficients just as in standard
Phong shading. This enables u s to render objects as if they were Phong shaded
but with the addition of reflected environment detail which can be blurred to
MAPPING TECHNIQUES
f J(L)
D(N) =
We have seen in preceding sections that there are many difficulties associated
with mapping a two-dimensional texture onto the surface of a three-dimensional object. Th e reasons for this are:
(1) Two-dimensional texture mapping based on a surface coordinate system can
Area(L) x fd (NL)
4rt
wh ere:
N is the surface normal at the point of interest
I(L) is the environment map as a function of L the incident direction to
which the entry I in the map corresponds
Area is the area on the surface of the unit sphere associated with L
{ct is the diffuse convolution function :
fd(x) = kdx for x > 0
and
{ct(x) = 0 for x s 0
kct is the diffuse reflection coefficient that weights the contribution of D(N) in
ll=X
Thus for each value of N we sum over all values of L the area-weighted dot
product or Lambertian term .
Th e specular map is defined as:
4rt
where:
R is the reflected view vector
and
W=Z
where:
.'fl(L) x Area(L) x (, (R L)
S(R)
V= y
{s(x) = 0 for x s 0
(Note that if {s is set to unity the surface is a perfect mirror and the environment
map is unaltered.)
The reflected intensity at a surface point is thus:
D(N) + S(R )
( 8.7.1 )
Three-dimensional noise
A popular class of procedural texturing techniques all have in common the fact
that they use a three-dimensional, or spatial, noise function as a basic modelling
(~
MAPPING TECHNIQUES
k
primitive. These techniques, the most notable of which is the simulation of turbulence, can produce a surprising variety of realistic, natural-looking texture
effects. In this section we will concern ourselves with the issues involved in the
algorithmic generation of the basic primitive - solid noise.
Perlin (1985) was the first to suggest this application of n oise, defining a function noiseO that takes a three-dimensional position as its input and returns a
single scalar value. This is called model-directed synthesis - we evaluate the
noise function only at the point of interest. Ideally the function should possess
the following three properties:
(1) Statistical invariance under rotation.
(2) Statistical invariance under translation.
(3) A narrow bandpass limit in frequency.
The first two conditions ensure that the noise function is controllable - that is,
no matter how we move or orientate the noise function in space, its general
appearance is guaranteed to stay the same. The third condition enables us to
sample the noise function without aliasing. Whilst an insufficiently sampled
noise function may not produce noticeable defects in static images, if used in
animation applications, incorrectly sampled noise will produce a shimmering or
bubbling effect.
Perlin's method of generating noise is to define an integer lattice, or a set of
points in space, situated at locations (i, j, k) where i, j and k are all integers. Each
point of the lattice has a random number associated with it. This c~n be done
either by using a simple look-up table or, as Perlin (1985) suggests, via a hashing function to save space. The value of the noise function, at a point in space
coincident with a lattice point, is just this random number. For points in space
not on the lattice - in general (u, v, w) - the noise value can be obtained by linear interpolation from the nearby lattice points. If, using this method, we generate a solid noise function T(u, v, w) then it will tend to exhibit directional (axis
aligned) coherences. These can be ameliorated by using cubic interpolation but
this is far more expensive and the coherences still tend to be visible. Alternative
noise generation methods that eliminate this problem are to be found in Lewis
(1989); however, it is worth bearing in mind that the entire solid noise function
is sampled by the surface and usually undergoes a transformation (it is modulated, for example, to simulate turbulence) and this in itself may be enough to
eliminate the coherences.
( 8.7.2 )
Simulating turbulence
turbulence(x) =
k+l
The ~uncation band limits the function ensuring proper anti-aliasing. Consider
the difference between the first two terms in the progression, noise (x) and noise
~2x)/2. T~e noise function in the latter term will vary twice as fast as the first _
1t has twtce the frequency- and will contain features that are half the size of the
first. Moreover, its contribution to the final value for the turbulence is also scaled
by one-half. At each scale of detail the amount of noise added into the series is
proportional to the scale of detail of the noise and inversely proportional to the
~eq.ue~cy of the noise. This is self-similarity and is analogous to the selfSimtlanty obtained through fractal subdivision, except that this time the subdivi~i~n drives not displacement, but octaves of noise, producing a function that
exhibits the same noisy behaviour over a range of scales. That this function
should prove so useful is best seen from the point of view of signal analysis
which tells us that the power spectrum of turbulence() obeys a 1/f power law'
thereby loosely approximating the 1/fz power law of Brownian motion.
'
The . turbulence function in isolation only represents half the story, however.
Rendenng the turb~lence function directly results in a homogeneous pattern that
co~ld not be descnbed as naturalistic. This is due to the fact that most textures
whtch occur naturally, contain some non-homogeneous structural features and so
~ann?t .be simulated by turbulence alone. Take marble, for example, which has eastly distmguished veins of colour running through it that were made turbulent
before the marble solidified during an earlier geological era. In the light of this fact
we can identify two distinct stages in the process of simulating turbulence, namely:
(1) Representation of the basic, first order, structural features of a texture
through some basic functional form. Typically the function is continuous
and contains significant variations in its first derivatives.
(2) Addition of second and higher order detail by using turbulence to perturb
the parameters of the function.
The classic example, as first described by Perlin, is the turbulation of a sine wave
to give the appearance . of marble. Unperturbed, the colour veins running
th~ough the mar~le are gtven by a sine wave passing through a colour map. For
a sme wave runnmg along the x axis we write:
marble(x) = marble_colour (sin(x))
T~e colour ~ap r:zarble_colour? maps a scalar input to an intensity. Visualizing
th1s expressiOn, Figure 8.20(a) IS a two-dimensional slice of marble rendered with
the colour spline given in Figure 8.20(b). Next we add turbulence:
MAPPING TECHNIQUES
(~)
on was simply that which after experimentation gave the best results. We shall
work in two space due to the expense of the three-dimensional volumetric
approach referred to at the end of the last section.
A flame region is defined in the xy plane by the rectangle with minimax coordinates (- b, 0), (b, h). Within this region the flame's colour is given by:
flame(x) = (I- y/h) flame_colour(abs(x/b))
(a)
Figure 8.20
Simulating marble.
(a) Unturbulat ed slice
obtained by using the spline
show n in (b). (b) Colour
spline used to produce (a).
(c) Marble section obtained
by turbulating the slice
shown in (a).
( 8.7.3 )
(b)
This is shown schematically in Figure 8.22 (Colour Plate). Flame_colour (x) consists
of three separate colour splines that map a scalar value x to a colour vector. Each of
the R, G, B splines have a maximum intensity at x = 0 which corresponds to the
centre of the flame and a fade-off to zero intensity at x = 1. The green and blue
splines go to zero faster than the red. The colour returned by flame_colour() is
weighted according to its height from the base of the flame to get an appropriate
variation along y. The flame is rendered by applying flame() to colour a rectangular
polygon that covers the region of the flames definition . The opacity of the polygon
is also textured by using a similar functional construction . Figure 8.22 also shows
the turbulated counterpart obtained by introducing the turbulence function thus:
tel
~ ~ ~ Flame shape
WJ.J
changes
The turbulence function can be defined over time as well as space simply by
adding an extra dimension representing time, to the noise integer lattice. So the
lattice points will now be specified by the indices (i, j, k, I) enabling us to extend
the parameter list to noise (x, t) and similarly for turbulence (x, t). In ternal
to these procedures the time axis is not treated any differently from the three
spatial axes.
For example, if we want to simulate fire, the first thing that we do is to try to
represent its basic form functionally, that is, a 'flame shape'. The completely ad
hoc nature of this function al sculpting is apparent here. The final form decided
,' I
,'
'I
'I
I
I
I
I
I
Figure 8.23
Animating turbulence f or a
two-dimensional object .
Space of the
turbulence
functions
@)
MAPPING TECHNIQUES
filtering process consequently becomes expensive. Refer again to Figure 8.2. This
shows that when we are considering a pixel its pre-image in texture space is, in
general, a curvilinear quadrilateral, because the net effect of the texture mapping
and perspective mapping is of a non-linear transformation. The figure also
shows, for the diagonal band, texture for which, unless this operation is performed or approximated, erroneous results will occur. In particular, if the texture
map is merely sampled at the inverse mapping of the pixel centre then the sampled intensity may be correct if the inverse image size of the pixel is sufficiently
small, but in general it will be wrong.
In the context of Figure 8.24(a), anti-aliasing means approximating the
integration shown in the figure. An approximate, but visually successful, method
ignores the shape but not the size or extent of the pre-image and pre-calculates
all the required filtering operations. This is mip-mapping invented by Williams
(1983) and probably the most common anti-aliasing method developed specifically for texture mapping. His method is based on pre-calculation and an assumption that the inverse pixel image is reasonably close to a square. Figure 8.24(b)
shows the pixel pre-image approximated by a square. It is this approximation that
enables the anti-aliasing or filtering operation to be pre-calculated. In fact there
are two problems. The first is more common and is known as compression or
minification. This occurs when an object becomes small in screen space and consequently a pixel has a large pre-i mage in texture space. Figure 8.24(c) shows this
situation. Many texture elements (sometimes called 'texels') need to be mapped
into a single pixel. The other problem is called magnification. Here an object
becomes very close to the viewer and only part of the object may occupy the
whole of screen space, resulting in pixel pre-images that have less area than one
texel (Figure 8.24(d)). Mip-mapping deals with compression and some elaboration to mip-mapping is usually required for the magnification problem.
( 8 .7.4 )
Texture space
Image
space
T(u,,)
Texture space
Image
space
T(u, 1')
tl
Figure 8.24
Mip-mapping
approximations. (a) The
pre-image of a pixel is a
curvilinear quadrilateral
in texture space.
(b) A pre-image can be
approximated by a square.
(c) Compression is required
when a pixel maps
onto many texels.
(d) Magnification is required
when a pixel maps onto less
than one texel.
(a)
(b)
(c)
(d)
~)
where au and av are the original dimensions of the pre-image in texture space
and ax = ay = 1 for a square pixel.
A 'correct' or accurate estimation of D is important. If Dis too large then the
image will look blurred, too small and aliasing artefacts will still be visible.
Detailed practical methods for determining depending on the mapping context
are given in Watt and Watt (1992).
In a theoretical sense the magnification problem does not exist. Ideally we
would like mip-maps th at can be used at any level of detail, but in practice, storage limitations restrict the highest resolution mask to, say, 512 x 512 texels. This
problem does not seem to have been addressed in the literature and the following two approaches are supplied by Silicon Graphics for their workstation family. Silicon Graphics suggest two solutions. First, to simply extrapolate beyond
the highest resolution mip-map, and a more elaborate procedure that extracts
separate texture information into low and high frequency components.
Extrapolation is defined as:
LOD(+l)
512
(~)
512
128
128m
D -
64
64!;[l
---+-
HighD
Pixel
Figure 8.25
Showing the princi ple of
mip-mapping.
1
10
(~
MAPPING TECHNIQUES
Figure 8.27
Interactive texture mapping
-painting in T(u,v) space.
(a) Texture is painted
using an interactive paint
program. (b) Using the
object's bounding box,
the texture map points are
projected onto the object.
All projectors are parallel to
each other and normal to
the bounding box face. (c)
The object is rendered, the
'distortion' visualized and
the artist repeats the cycle if
necessary.
T (tt,l)
@I)
T (u. I')
Bounding
box
Bounding box
Plane of symmetry
(b)
Rendered
object
(c)
that t~e artist can paint the texture on the object directly and the program,
reversmg th~ normal texture mapping procedure, can derive the texture map
from the Object. Once the process is complete, new views of the object can be
rendered and texture mapped in the normal way.
. T~is appr.oach requires a technique that identifies, from the screen pixel that
IS bemg pomted to, the corresponding point on the object surface. In the
method described by Hanrahan and Haeberli (1990) an auxiliary frame buffer
known as an item buffer, is used. Accessing this buffer with the coordinates of
the screen cursor gives a pointer to the position on the object surface and th
co~responding (u, v) coordinate values for the texture map. Clearly we need a~
object representation where the surface is everywhere parametrized and
H~nrahan and Haeberli (1990) divide the object surface into a large number of
m1cropolygons. The overall idea is illustrated in Figure 8.28.
MAPPING TECHNIQUES
Figure 8.28
Iterative texture mapping painting in object space.
Geometric shadows
Item buffer
Screen space
Object space
Texture space
9. 1
9.2
9.3
Shadow algorithms
Introduction
This chapter deals with the topic of 'geometric' shadows or algorithms that
calculate the shape of an area in shadow but only guess at its reflected light
intensity. This restriction has long been tolerated in mainstream rendering; the
rationale presumably being that it is better to have a shadow with a guessed
intensity than to have no shadow at all.
Shadows like texture mapping are commonly handled by using an empirical
add-on algorith m. They are pasted into the scen e like texture maps. The other
parallel with texture maps is that the easiest algorithm to use computes a map
for each light source in the scene, known as a sh adow map. The map is accessed
during rendering just as a texture map is referenced to find out if a pixel is in
shadow or not. Like the Z-bu ffer algorithm in hidden surface removal, this algorithm is easy to implement and has become a pseudo-standard. Also like the Zbuffer algorithm it trades simplicity against high memory cost.
Shadows are important in scenes. A scene without shadows looks artificial.
They give clues concerning the scene, consolidate spatial relationships between
objects and give information on the position of the light source. To compute
shadows completely we n eed knowledge both of their shape and the light intensity inside them. An area of the scene in shadow is not completely bereft of light.
ft is simply not subject to direct illumination, but receives indirect illumination
from another nearby object. Thus shadow intensity can only be calculated taking this into account and this means using a global illumination model such as
radiosity. In this algorithm (see Chapter 11) shadow areas are treated no differently from any other area in the scene and the shadow intensity is a light intensity, reflected from a surface, like any other.
Shadows are a function of the lighting environment. They can be h ard edged
or soft edged and contain both an umbra and a penumbra area. The relative size
Global illumination
Light
source
Figure 9.7
'Pre-image' of a p ixel in the
shadow Z-buffer.
(3) We use this fraction to give an appropriate attenuated intensity. The visual
effect of this is that the hard edge of the shadow will be softened for those
pixels that straddle a shadow boundary.
Full details of this approach are given in Reeves et al. (1987). The price paid for
this anti-aliasing is a considerable increase in processing time. Pre-filtering techniques (see Chapter 14) cannot be used and a stochastic sampling scheme fo r
integrating within the pixel pre-image in the shadow Z-buffer map is suggested
in Reeves et al. (1987).
10.1
10.2
10.3
10.4
10.5
Path tracing
10.6
10.7
10.8
10.9
Caching illumination
Introduction
@)
surface to the other. The value of such models is that they enable a comparison
between the multitude of global illumination algorithms most of which evaluate a less than complete solution. By their nature the algorithms consist of a
we~lth ~f h euristic detail and the global illumination models facilitate a companson m terms of which aspects are evaluated and which are not.
~he first ~ode! that we will look at was introduced into the computer graphics
l~terature m 1986 by Kajiya (Kajiya 1986) and is known as the rendering equation. It encapsul~tes global illumination by describing what happens at a point
x on a surf~ce. It_ Is a _complet:ly general mathematical statement of the problem
and global.~ll ummatwn algonthms can be categorized in terms of this equation.
In fact, Ka)lya states that its purpose:
is to provide a ~nifi.~d context for viewing them [rendering algorithms] as more or less
accurate approxi mations to the solution for a single equation.
where:
l(x, x'? is the t~~nsport intensity or the intensity of light passing from point x'
to pomt x. Ka)lya terms this the unoccluded two point transport intensity.
g(x, x') is the visibility function between x and x'. If x and x' cannot ' see' each
other then this is zero. If they are visible then g varies as the inverse square of
~(x, x~ x") is th e scattering term with respect to direction x' and x". It is the
mt~~sity of the ~nergy scattered towards x by a surface point located at x'
arr_1vmg from pomt or direction x". Kaj iya calls this the unoccluded threepomt transport reflectance. It is related to the BRDF (see Chapter 7) by:
p(x, x', x") = p(S'tn, cjl'tn, S',.r, cp',er) cos e cos e,.r
whe_re 8' and cjl' are the azimuth and elevation angles related to point x' (see
~ectwn 7.3) and 8 is the angle between the surface normal at point x and the
hne x'x.
The i~tegral is over s, ~ll points_on all surfaces in the scene, or equivalently over
all pomts _on th~ hemisphere Situated at point x'. The equation states that the
transport mtensity from point x' to point x is equal to (any) light emitted from
x' towards x plus the light scattered from x' towards x from all other surfaces in
the scene - that is, that originate from direction x".
GLOBAL IllUMINATION
Expressed in the above terms the rendering equation implies that we must
have:
We have already met all these factors; here the formulation gathers them into a
single equation. The important general points that come out of considering the
rendering equation are:
Figure 10.1
Radiance, irradiance and
irradiance distribution
function (after Greger et al.
(1998)).
t4
I '
)?SJ
',
,//,/ t2
'',,,'',',,.
(a)
(3) It is a recursive equation - to evaluate I(x, x') we n eed to evaluate I(x', x")
which itself will use the same equation. This gives rise to one of the most
popular practical methods for solving the problem which is to trace light
from the image plane, in the reverse direction of light propagation,
following a path that reflects from object to object. Algorithms that adopt
this approach are: path tracing, ray tracing and distributed ray tracing, all of
which will be described later.
Q /_
(a) A two-dimensional
radiance distribution for a
point in the centre of a
room where each wall
exhibits a different radiance.
(~)
- ~'''
(b) The field radiance for a
point on a surface element.
lrradiance Eis the cosine
weighted average of the
radiance - in this case 3.5 11:.
(b)
(c)
where:
( 10.1.2)
L ;n
f Ln cos e dw
II
= p/ rr.
(~)
GLOBAL ILLUMINATION
The rendering equation can be recast as the radiance equation which in its
simplest form is:
Lrcf =
cos SodA
droin = .,..,11-x---x-'l~lz
J pL1n
(l)jn
--7
Path notation
fl
where the symbols are defined in Figure 10.2(b). This can be modified so that the
integration is performed over all surfaces - usually m ore convenient in practical
algorithms - rather than all incoming angles and this gives the rendering equation in terms of radiance:
fs
p (X,
(J);n
--7
roou1)Ln
(X,
X, X
in
cos
llx _eoxdA
'll
which n ow includes the visibility function. This comes about by expressing the
solid angle dro1n in terms of the projected area of the differential surface region
visible in the direction of(!);., (Figure 10.2(c)):
Figure 10.2
The radiance equation.
(a)
\
(b) Symbols used to define
the directional dependence.
@)
(b)
Figure 10.3
The four ' mechanisms' of
light transport: (a) diffuse
to diffuse; (b) specular to
diffuse; (c) diffuse to
specular; (d) specular to
specular (after Wallace et a/.
(1987)).
(a)
(b)
(c)
(d)
GLOBAL ILLUMINATION
@)
Figure 10.4
A selection of global
illuminations paths in a
sim ple environment. See
also the Colour Plate version
of this figure.
We will now look at the development of popular or established global illumination algorithms using as a basis for our discussion the preceding concepts. The
order in which the algorithms are discussed is somewhat arbitrary; but goes from
incomplete solutions (ray tracing and radiosity) to general solutions. The idea of
this section is to give a view of the algorithms in terms of global interaction.
Return to consideration of the brute force solution to the problem. There we
considered the notion of starting at a light source and following every ray of
light that was emitted through the scene and stated that this was a computationally intractable problem. Approximations to a solution come from constraining the light-object interaction in some way and/or only considering a
(~)
GLOBAl IlLUMINATION
small subset of the rays that start at the light and bounce around the scene. The
main approximations which led to ray tracing and radiosity constrained the
scene to contain only specular reflectors or only (perfect) diffuse reflectors
respectively.
In what follows we give a review of ray tracing and radiosity sufficient for
comparison with the other methods we describe, leaving the implementation
details of these important methods for separate chapters.
@J
(10.3.1)
Figure 10.5
Whitted ray tracing.
Whitted ray tracing (visibility tracing, eye tracing) traces light rays in the reverse
direction of propagation from the eye back into the scene towards the light
source. To generate a two-dimensional image plane projection of a scene using
ray tracing we are only interested in these light rays that end at the sensor or eye
point and therefore it makes sense to start at the eye and trace rays out into the
scene. It is thus a view-dependent algorithm. A simple representation of the
algorithm is shown in Figure 10.5. The process is often visualized as a tree where
each node is a surface hit point. At each node we spawn a light ray and a
reflected ray or a transmitted (refracted) ray or both.
Whitted ray tracing is a hybrid - a global illumination model onto which is
added a local model. Consider the global interaction. The classic algorithm only
includes perfect specular interaction. Rays are shot into the scene and when they
hit a surface a reflected (and transmitted) ray is spawned at the point of intersection and they themselves are then followed recursively. The process stops when
Figure 10.6
Whitted ray tracing: the
relationship between light
paths and local and g lobal
contributions for one of the
cases shown in Figure 10.4 .
the energy of a ray drops below a predetermined minimum or if it leaves the scene
and travels out into empty space or if a ray hits a surface that is perfectly diffuse.
Thus the global part of ray tracing only accounts for pure specular-specular interaction. Theoretically there is nothing to stop us calculating diffuse global interaction, it is just that at every hit point an incoming ray would have to spawn
reflected rays in every direction into a hemispherical surface centred on the point.
To the global specular component is added a direct contribution calculated by
shooting a ray from the point to the light source which is always a point source
in this model. The visibility of the point from the light source and its direction
can be used to calculate a local or direct diffuse component - the ray is just L in
a local reflection model. Thus (direct) diffuse reflection (but not diffuse-diffuse)
interaction is considered. This is sometimes called the shadow ray or shadow
feeler because if it hits any object between the point under consideration and the
light source then we know that the point is shadow. However, a better term is
light ray to emphasize that it is used to calculate a direct contribution (using a
local reflection model) which is then passed up the tree. The main problem with
Whitted ray tracing is its restriction to specular interaction - most practical
scenes consist of predominantly diffuse surfaces.
Consider the LSSE + LDSE path in Figure 10.4, reproduced in Figure 10.6
together with th e ray tree. The initial ray from the eye hits the perfect mirror
Mirror sphere,
"' /
----------------------------------------------- - 0 / I"
- - - ----- -- - - - ~
no
refraction
Eye
Pixel
(~)
opaque
sphere
Initial ray
(c) Contributions from global and local components
Eye
Semi-transparent
object
Transmitted
(refracted)
2 _____.._
3 ----..-
L _,]oc,]
-----+-Light ray
Reflected
LSSE
4 -----.-
(a)
LDSE
contribution
(~)
GLOBAL ILLUMINATION
sphere. For this sphere there is no contribution from a local diffuse model. At the
next intersection we hit the opaque sphere and trace a global specular component which hits the ceiling, a perfect diffuse surface, and the recursion is terminated. Also at that point we have a contribution from the local diffuse model for
the sphere and the viewer sees in the pixel associated with that ray the colour of
the reflected image of the opaque sphere in the mirror sphere.
A little thought will reveal that the paths which can be simulated by Whitted
ray tracing are constrained to be LS*E and LDS*E. Ray traced images therefore
exhibit reflections in the surfaces of shiny objects of nearby objects. lf the
objects are transparent any objects that the viewer can see behind the transparent object are refracted. Also, as will be seen in Chapter 12, shadows are calculated as part of the model- but only 'perfect' or hard-edged shadows.
Considering Whitted ray tracing in terms of the rendering equation the
following holds. The scattering term pis reduced to the Jaw for perfect reflection
(and refraction). Thus the integral over all S- the entire scene - reduces to calculating (for reflection) a single outgoing ray plus the light ray which gives
the diffuse component and adding these two contributions together. Thus the
recursive structure of the rendering equation is reflected perfectly in the algorithm but the integral operation is reduced to a sum of three analytically calculated components- the contributions from the reflected, transmitted and light
rays.
( 10.3.2)
Rad io sity
Classic radiosity implements diffuse-diffuse interaction. Instead of following
individual rays 'interaction' between patches (or polygons) in the scene are considered. The solution is view independent and consists of a constant radiosity for
every patch in the scene. View independence means that a solution is calculated
for every point in the scene rather than just those points that can be seen from
the eye (view dependent). This implies that a radiosity solution has to be fol
lowed by another process or pass that computes a projection, but most work is
carried out in the radiosity pass. A problem or contradiction with classical radiosity is that the initial discretization of the scene has to be carried out before the
process is started but the best way of performing this depends on the solution.
In other words, we do not know the best way to divide up the scene until after
we have a solution or a partial solution. This is an outstanding problem with the
radiosity method and accounts for most of its difficulty of use.
A way of visualizing the radiosity process is to start by considering the light
source as an (array of) emitting patches. We shoot light into the scene from the
source(s) and consider the diffuse-diffuse interaction between a light patch and
all the receiving patches that are visible from the light patch - the first hit
patches. An amount of light is deposited or cached on these patches which are
then ordered according to the amount of energy that has fallen onto the patch
and has yet to be shot back into the scene. The one with the highest unshot
energy is selected and this is considered as the next shooting patch. The process
continues iteratively until a (high) percentage of the initial light energy is distributed around the scene. At any stage in the process some of the distributed
energy will arrive back on patches that have already been considered and this is
why the process is iterative. The process will eventually converge because the
reflectivity coefficient associated with each patch is, by definition, less than
unity and at each phase in the iteration more and more of the initial light is
absorbed. Figure 10.7 (Colour Plate) shows a solution in progress using this algorithm. The stage shown is the state of the solution after 20 iterations. The four
illustrations are:
(1) The radiosity solution as output from the iteration process. Each patch is
allocated a constant radiosity.
(2) The previous solution after it has been subject to an interpolation process.
(3) The same solution with the addition of an ambient term. The ambient 'lift'
is distributed evenly amongst all patches in the scene, to give an early well
lit solution (this enhancement is described in detail in Chapter 11).
(4) The difference between the previous two images. This gives a visual
indication of the energy that had to be added to account for the unshot
radiosity.
The transfer of light between any two patches - the diffuse-diffuse interaction is calculated by considering the geometric relationship between the patches
(expressed as the form factor). Compared to ray tracing we follow light from the
light source through the scene as patch-to-patch diffuse interaction, but instead
of following individual rays of light, the form factor between two patches averages the effect of the paths that join the patches together. This way of considering the radiosity method is, in fact, implemented as an algorithm structure. It is
called the progressive refinement method.
This simple concept has to be modified by a visibility process (not to be confused by the subsequent calculation of a projection which includes, in the normal way, hidden surface removal) that takes into account the fact that in general
a patch may be only partially visible to another because of some intervening
patch. The end result is the assignment of a constant radiosity to each patch in
the scene - a view-independent solution which is then injected into a Gouraudstyle renderer to produce a projection. In terms of path classification, conventional radiosity is LD*E.
The obvious problem with radiosity is that although man-made scenes usually consist mostly of diffuse surfaces, specular objects are not unusual and these
cannot be handled by a radiosity renderer. A more subtle problem is that the
scene has to be discretized into patches or polygons before the radiosities are
computed and difficulties occur if this polygonization is too coarse.
We now consider radiosity in terms of the rendering equation. Radiosity is the
energy per unit time per unit area and since we are only considering diffuse illumination we can rewrite the rendering equation as:
@)
GLOBAL ILLUMINATION
B(x')
which for a single sample we would expect to be high. In practice we would take
where now the only directional dependence is incorporated in the form factor F.
The equation now states that the radiosity of a surface element x is equal to the
emittance term plus the radiosity radiated by all other elements in the scene
onto x. The form factor F is a coefficient that is a function only of the spatial relationship between x and x' and this determines that fraction of B(x') arriving at x.
F also includes a visibility calculation.
This observation, that the error in the estimate is inversely proportional to the
square root of the number of samples, is extremely important in practice. To
~al~e the error, for example, we must take four times as m any samples.
Eqm valently we can say that each additional sample has less and less effect on
the result and this has to be set against the fact that computer graphics implementations tend to involve an equal, and generally high cost, per sample. Thus
the main goal in Monte Carlo methods is to get the best result possible with a
given number of samples N. This means strategies that result in variance reduction. The two common strategies for selecting samples are stratified sampling
and importance sampling.
Th e simplest form of stratified sampling divides the domain of the integration
into equal strata and estimates each partial integral by one or more random samples (Figure 10.8). ln this way each sub-domain is allocated the same number of
samples. Thus:
I =
Jf(x) dx
0
N
= L,
l= t
=N
f f(x)dx
Si
tt f(~)
N
I =
- -- - -- --1-- ""-~...,..
f f(x) dx
i'
f (x)
''
'
'''
'''
'
CY
p rim
=f (2 (x)dx - (2 (~)
0
Figure 10.8
Stratified sampling of f(x).
~I
(~)
+ + +
+
+
+
+
+
+
+
+
+
+
+ +
Figure 10.9
Stratified sampling in
computer graphics: a pixel
is divided into 16 sub-pixels
and 16 sample points are
generated by jittering the
centre point of each subpixel or stratum.
p(x) > 0
p (x)dx = 1
For example, we could choose p(x ) to be the normalized absolute value of f(x) or
alternatively a smoothed or approximate version of f(x) (Figure 10.10). Any function f(x) that satisfies the above conditions will not necessarily suffice. If we
choose an f(x) that is too far from the ideal then the efficien cy of this technique
will simply drop below t hat of a naive meth od that uses random samples .
Importance sampling is of critical importance in global illumination algorithms
that utilize Monte Carlo approaches fo r the simple and obvious reason that
although the rendering equation describes the global illumination at each and
every point in the scene we do not require a solution th at is equally accurate. We
require, for example, a more accurate result for a brightly illuminated specular
surface than for a dimly lit diffuse walL Importance sampling enables us to build
algorithms where the cost is distributed according to the final accuracy that we
require as a function of light level and surface type.
An important practical implication of Monte Carlo methods in computer graphics is that they produce stochastic noise. For example, consider Whitted ray tracing
and Monte Carlo approaches to ray tracing. In Whitted ray tracing the perfect specular direction is always chosen and in a sense the integration is reduced to a deterministic algorithm which produces a noiseless image. A crude Monte Carlo
approach that imitated Whitted ray tracing would produce an image where the
final pixels' estimates were, in general, slightly different from the Whitted solution.
These differences manifest themselves as noticeable noise. Also note that in
Whitted ray tracing if we ignore potential aliasing problems we need only initiate
one ray per pixel. With a Monte Carlo approach we are using samples of the rendering equation to compute an estimate of intensity of a pixel and we n eed to fire
many rays/pixels which bounce around the scene. In Ka jiya's pioneering algorithm
(Kajiya 1986), described in the next section, he used a total of 40 rays per pixel.
Global illumination algorithms that use a Monte Carlo approach are all based
on these simple ideas. Their inherent complexity derives from the fact that the
integration is now multi-dimensional.
As the name implies, importance sampling tends to select samples in important regions of the integrand. Importance sampling implies prior knowledge of
the function that we are going to estimate an integral for, which at first sight
appears to be a contradiction. However, most rendering problems involve an
integrand which is the product of two functions, one of which is known a priori
as in the rendering equation. For example, in a Monte Carlo approach to ray
tracing a specular surface we would choose reflected rays which tended to cluster around the specular reflection direction thus sampling the (known) BRDF in
regions where it is likely to return a high value. Thus, in general, we distribute
the samples so that their density is highest in the regions where the function has
a high value or where it varies significantly and quickly. Considering again our
simple one-dimensional example we can write:
I=
f p(x) f(x) dx
p(x)
where the first term p(x) is an importance weighting function . This func tion p(x)
is then the probability density function (PDF) of the samples. That is the samples need to be chosen such that they conform to p(x). To do this we define P(x)
to be the cumulative function of the PDF:
X
P(x)
=Jp(t)dt
0
= P-1(t).
Using this
az
Jo [p(X)
f(x) jz p(x)dx - J2
J F (x) dx - J2
unp -
j( x )
1
p(x)
The question is how do we choose p(x). This can be a function that satisfies the
following condition s:
Figure 10.10
Illustrating the idea of
importance sample.
~I
(~)
GLOBAL ILLUMINATION
PArH TRACING
Figure 10.11
Path tracing
In his classic paper that introduced the rendering equation, Kajiya (1986) was
the first to recognize that Whitted ray tracing is a deterministic solution to the
rendering equation. In the same paper he also suggested a non-deterministic
variation of Whitted ray tracing - a Monte Carlo method that he called path
tracing.
Kajiya gives a direct mathematical link between the rendering equation and
the path tracing algorithm by rewriting the equation as:
--- ~
Light
, ,/
where M is the linear operator given by the integral in the rendering equation.
This can then be written as an infinite series known as a Neuman series as:
where I is now the sum of a direct term, a once scattered term, a twice scattered
term, etc. This leads directly to path tracing, which is theoretically known as a
random walk. Light rays are traced backwards (as in Whitted ray tracing) from
pixels and bounce around the scene from the first hit point, to the second, to
the third, etc. The random walk has to terminate after a certain number of steps
- equivalent to truncating the above series at some point when we can be sure
that no further significant contributions will be encountered.
Like Whitted ray tracing, path tracing is a view-dependent solution .
Previously we have said that there is no theoretical bar to extending ray tracing
to handle all light-surface interactions including diffuse reflection and transmission from a hit point; just the impossibility of the computation. Path tracing
implements diffuse interaction by initiating a large number of rays at each pixel
(instead of, usually, one with Whitted ray tracing) and follows a single path for
each ray through the scene rather than allowing a ray to spawn multiple
reflected children at each hit point. The idea is shown in Figure 10.11 which can
be compared with Figure 10.5. All surfaces, whether diffuse or specular can
spawn a reflection /transmission ray and this contrasts with Whitted ray tracing
where the encounter with a diffuse surface terminates the recursion. The other
important difference is that a number of rays (40 in the original example) are initiated for each pixel enabling BRDFs to be sampled. Thus the m ethod simulates
full L(DIS)*E interaction.
A basic path tracing algorithm using a single path from source to termination
will be expensive. If the random walks do not terminate o n a light source then
they return zero contribution to the final estimate and unless the light sources are
large, paths will tend to terminate before they reach light sources. Kajiya addressed
this problem by introducing a light or shadow ray that is shot towards a point on
an (area) light source from each hit point in the random walk and accumulating
this contribution at each point in the path (if the reflection ray from the same
point directly hits the light source then the direct contribution is ignored).
___ ..,_
Eye
I= ge +gMI
Eye
Light
Eye
---
Ka jiya points out that Whitted ray tracing is wasteful in the sense that as the
algorithm goes deeper into the tree it does more and more work. At the same
time the con tribution to the pixel intensity from events deep in the tree
becomes less and less. In Kajiya's approach the tree has a branching ratio of one
and at eac~ hit point a ~andom variable, from a distribution based on the spec~
ular and d1ffuse BRDFs, IS used to shoot a single ray. Kajiya points out that this
process has to maintain the correct proportion of reflection, refraction and
shadow rays for each pixel.
In terms of Monte Carlo theory the original algorithm reduces the variance
for direct illumination but indirect illumination exhibits high variance. This is
particu larly true for LS*DS* E paths (see Section 10. 7 for fu rther consideration of
this type of path) where a diffuse surface is receiving light from an emitter via a
number of s~ec~lar paths. !hus the algorithm takes a very long time to produce
~ good ~uahty lffiage. (Ka)lya quotes a time of 20 hours for a 512 x 512 pixel
1mage With 40 path s per pixel.)
~mp~rtance sampling can be introduced into path/ray tracing algorithms by
basm g It on the BRDF and ensuring that more rays are sent in directions that will
return large con~ributions. However, this can only be done approximately
~ecause the asso:1ated PDF cannot be integrated and inverted. Another problem
IS that the BRDF 1s only one component of the integrand local to the current surf~ce p.oint - we have no knowledge of the light incident on this point from all
directw~s ov~r the. he~ispherical space- the field radiance (apart from the light
due to d1rect Illummatwn). In conventional path /ray tracing approaches all rays
are trac:d independently of each other, accumulated and averaged into a pixel.
No use IS ma~e of info rmation gained while the process proceeds. This important observatiOn has led to schemes that cache the information obtained during
the ray trace. The most familiar of these is described in Section 10.9.
(~)
GLOBAL ILLUMINATION
Figure 10.13
Distributed ray tracing for
reflection (see Figure 10.4
tor the complete geometry
of this case).
due to moving objects and effective anti-aliasing (see Chapter 14 for the anti-aliasing implications of this algorithm). Figure 10.14 (Colour Plate) is an image rendered with a distributed ray tracer that demonstrates the depth of field
phenomenon. The theoretical importance of this work is their realization that all
these phenomena could be incorporated into a single multi-dimensional integral
which was then evaluated using Monte Carlo techniques. A ray path in this algorithm is similar to a path in Kajiya's method with the addition of the camera lens.
The algorithm uses a combination of stratified and importance sampling. A pixel
is stratified into 16 sub-pixels and a ray is initiated from a point within a sub-pixel
by using uncorrelated jittering. The lens is also stratified and one stratum on the
pixel is associated with a single stratum on the lens (Figure 10.15). Reflection and
transmission lobes are importance sampled and the sample point similarly jittered.
Cook et al. (1984) pre-calculate these and store them in look-up tables associated
with a surface type. Each ray derives an index as a function of its position in the
I II
Figure 10.12
Perfect refraction through a
solid glass sphere is
indistinguishable from
texture mapping.
(~)
Light source
Figure 10.15
Distributed ray tracing:
four rays per pixel. The
pixel, lens and light source
are stratified; the reflection
lobe is importance sampled.
GLOBAL ILLUMINATION
pixel. The primary ray and all its descendants have the same index. This means
that a ray emerging from a first hit along a direction relative toR, will emerge from
all other hits in the same relative R direction for each object (Figure 10.16). This
ensures that each pixel intensity, which is finally determined from 16 samples, is
based on samples that are distributed, according to the importance sampling criterion, across the complete range of the specular reflection functions associated
with each object. Note that there is nothing to prevent a look-up table being twodimensional and indexed also by the incoming angle. This enables specular reflection functions that depend on angle of incidence to be implemented. Finally, note
that transmission is implemented in exactly the same way using specular transmission functions about the refraction direction.
In summary we have:
@'J ''''
5t
UGHTPASS
Pre-sampled reflection
function for object 2
(
Figure 10.16
Distributed ray tracing and
reflected rays.
Pre-sampled
reflection function
for object I
Pixel indices
(1) The process of distributing rays means that stochastic anti-aliasing becomes
AD&*
(~)
Figure 10.1 7
Two-pass ray tracing for the
LSSDE path in Figure 10.4.
EYE PASS
@)
GLOBAL ILLUMINATION
bins. In effect the first pass imposes a texture map or illumination map- the varying brightness of the caustic - on the diffuse surface. The resolution of the illumination map is critical. For a fixed number of shot light rays, too fine a map may
result in map elements receiving no rays and too coarse a map results in blurring.
The second pass is the eye trace - conventional Whitted ray tracing - which
terminates on the diffuse surface and uses the stored energy in the illumination
map as an approximation to the light energy that would be obtained if diffuse
reflection was followed in every possible direction from the hit point. In the
example shown, the second pass simulates a DE path (or ED path with res~ect to
the trace direction). The 'spreading' of the illumination from rays traced m the
first pass over the diffuse surface relies on the fact that the rate of change of diffuse illumination over a surface is slow. It is important to note that there can
only be one diffuse surface included in any path. Both the eye trace and the light
trace terminate on the diffuse surface - it is the 'meeting point' of both traces.
It is easy to see that we cannot simulate LS*D paths by eye tracing alone. Eye
rays do not necessarily hit the light and we have no way of finding out if a surface has received extra illumination due to specular to diffuse transfer. This is
illustrated for an easy case of an LSDE path in Figure 10.18.
The detailed process is illustrated in Figure 10.19. A light ray strikes a surface
at p after being refracted. It is indexed into the light map associated with the
Light
LSDE
Mirror
Figure 10.19
Two-pass ray tracing and
light maps. (a) First pass:
light is deposited in a ligh t
map using a standard
texture mapping T.
(b) Second pass : when
object 2 is conventionally
eye traced extra illumination
at pis obtained by indexing
the light map with T.
~bjwl
Light
map
Object 2
(b)
object using a standard texture mapping fun ction T. During the second pass an
eye ray hits P. The same mapping function is used to pick up any illumination
for the point P and this contribution weights the local intensity calculated for
that point.
An important point h ere is that the first pass is view independent- we construct a light map for each object which is analogous in this sense to a texture
map - it becomes part of the surface properties of the object. We can use the light
maps from any view point after they are completed and they need only be computed once for each scene.
Figure 10.20(a) and (b) (Colour Plate) shows the same scene rendered using a
Whitted and two-pass ray tracer. In this scene there are three LSD paths:
(1) Two caustics from the red sphere - on e directly from the light and one from
the light reflected from the curved mirror.
(2) One (cusp) reflected caus tic from the cylindrical mirror.
(3) Secondary illumination from the planar mirror (a non-caustic LSDE path).
Eye
Figure 10.18
An example of an LSDE path
(see also Figures 1 0.4 and
1 0.1 7 for examples of SDE
paths). An eye ray can
'discover' light ray Land
reflected ray R but cannot
find the LSDE path.
~)
GLOBAL ILLUMINATION
Figure 10.21
The virtual environment
rnethod for incorporating
DSD paths in the radiosity
rnethod.
@)
..
deposited. This accounts for the LS* paths. A radiosity solution is then invoked
using these values as em itting patches and the deposited energy is distributed
through the D* ch ain. Finally an eye pass is initiated and this provides the final
projection and the ES* or ES*D paths.
Comparing the string LS*(D*)S*E with th e complete global solution, we see
that th e central D* paths should be extended to (D*S*D*)* to make
LS*(D*S*D*)*S*E which is equivalent to the complete global solution L(SID)*E.
Conventional or classical radiosity does not include d iffuse-to-diffuse transfer
that takes place via an intermediate specular surface. In other words once we
invoke th e radiosity phase we need to include the possibility of transfer via an
interm ediate specula r path DSD.
The first an d perhaps the simplest approach to including a specular transfer
into the radiosity solution was based on modifying the classical radiosity algorithm for flat specular surfaces, such as mirrors, and is called the virtual window
approach. This idea is shown in Figure 10.21. Conventional radiosity calculates
the geometric relationship between the light source and the floor and the LDE
path is accounted for by the diffuse- diffuse in teraction between these two surfaces. (Note that since the light source is itself an emitting diffuse patch we can
term the path LDE or DDE). What is missing from this is the contribution of
light energy from th e LSD or DSD path that would deposit a bright area of light
on the floor. The DSD path from the light source via the m irror to the floor can
be accounted for by con structing a virtual environment 'seen ' through the mirror as a window. The virtual light source then acts as if it was the real light source
reflected fro m th e m irror. However, we still need to accoun t for the LSE path
which is the detailed reflected image formed in the m irror. This is view dependent and is determined during a second pass ray tracing phase. The fact that this
algorithm only deals with what is, in effect, a special case illustrates the inherent difficulty of extending radiosity to include other transfer mechanisms.
Caching illum ination is the term we have given to the scheme of storing threeo r five-dimensional values fo r illumination, in a data structure associated with
(~)
LIGHT VOLUMES
GLOBAL ILLUMINATION
the scene, as a solution progresses. Such a scheme usually relates to viewdependent algorithms. In other words the cached values are used to speed up or
increase the accuracy of a solution; they do not comprise a view-dependent solution in their own right. We can compare such an approach with a viewindependent solution such as radiosity where final illum ination values are effec.
tively cached on the (discretized) surfaces themselves. Th e difference between
such an approach and the caching methods described in this section is that the
storage m ethod is independent of the surface. This means that the m eshing
problems inherent in surface discretization methods (Chapter 11) are avoided.
Illumination values on surfaces are stored in a data structure like an octree which
represents the entire three-dimensional extent of the scene.
Consider again the simplified form of the radiance equation:
Figure 10.22
Adaptive importance
sampling in path tracing
(after Lafortune and
Williams (1995)).
(a) Incoming radiance at
a point Pis cached in a
SD tree and builds up
into a d istribution function.
(b) A fu ture reflected
directio n fro m P is selected
on the basis of both the
BRDF and the field radiance
distribution function.
L .urfocc
I pLn
The BRDF is known but Ln is not and this, as we pointed out in Section 10.4,
limits the efficacy of importance sampling. An estimate of Lin can be obtained as
the solution proceeds and this requires that the values are stored. The estimate
can be used to improve importance sampling and this is the approach taken by
Lafortune and Williams (1995) in a technique that they call adaptive importance
sampling. Their method is effectively a path tracing algorithm which uses previously calculated values of radiance to guide the current path. The idea is
shown in Figure 10.22 wh ere it is seen that a reflection direction during a path
trace is chosen according to both the BRDF for the point and th e current value
of the field radiance distribution function for that point which has been built up
p
from previous values. Lafortune and Williams (1995) cache radiance values in a
five-dimen sional tree - a two-dimensional extension of a (three-dimensional)
octree.
The RADIANCE renderer is probably the most well-known global illum ination renderer. Developed by Ward (1994) over a period of n ine years, it is a strategy, based o n path tracing, that solves a version of th e ren dering equation under
m ost conditions. The emphasis of this work is firmly on the accuracy required
for architectural simulatio ns under a variety of lighting conditions varying from
sunligh t to complex artificial lighting set-ups. The algorithm is effectively a combin ation of determ inistic and stochastic approaches and Ward (1994) describes
the underlying motivations as follows:
The key to fast convergen ce is in deciding what to sample by removing those parts of the
integral we can compute deterministically and gauging the importance of the rest so as to
maxim ise the payback from our ray calculations.
Specular calculations are made separately and the core algorithm deals with indirect diffuse interaction. Values resulting from (perfect) diffuse interaction are
cached in a (three-dimensional) octree and these cached values are used to interpolate a new value if a current hit point is sufficiently close to a cach ed point.
This basic approach is elaborated by determining the 'irradiance gradient' in the
region currently being examined which leads to the use of a higher-o rder (cubic)
interpolation procedure for the interpolation. The RA DIANCE renderer is a path
tracing algorithm that term in ates early if the cached values are 'close enough'.
Finally Ward expresses some strong opin ions about the practical efficacy of
the radiosity m ethod. It is unusual for such criticisms to appear in a computer
graphics paper and Ward is generally concerned that the radiosity method has
not migrated from the research laboratories. He says:
For example, most radiosity systems are not well automated, and do not permit general
reflectance models or curved surfaces . . .. Acceptance of physically based rendering is
bound to improve, but researchers must first demonstrate the real-life applicability of their
techniques. There have been few notable successes in applying radiosity to the needs of
practising design ers. While much research has been done on improving effici ency of the
basic radiosity method, problems associated with more realistic complicated geometries,
have only recently got the atten tion they deserve. For whatever reason it appears that
radiosity has yet to fulfil its promise, and it is time to re-examine th is technique in the
light of real-world applications and other alternatives for solving the rendering equation .
',
'..,
' ,
', '- I /
- o ~
/ I '-
An example of the use of the RADIANCE renderer is given in the com parative
image study in Chapter 18 (Figure 18.19).
(a)
(b)
Ligh t volume is the term given to schemes that cache a view-independent global
illum ination by sto ring radiance o r irradiance values at sample points over all
space (including empty space). Th us they differ from the previous sch emes
(~
GLOBAL ILLUMINATION
Mesh optimization The solu tion is view independent and the third
phase optimizes or decimates the mesh by progressively removing mesh es as
long as the resulting change due to the removal does not drop below a
(perceptually based) threshold. The output from this phase is an irregular
m esh whose detail relates to the variation of light over th e surface.
Walter et al. (199 7) point out th at a strong advantage of the technique is that its
m odularity enables optimization for different design goals. For example, the
light transpo rt phase can be optimized for the required accuracy of the BRDFs.
The density phase can vary its criteria according to perceptual accuracy and t he
decimation phase can achieve high com pression while m ain tainin g perceptual
quality.
A current disadvantage of the approach is that it is a three-dimensional viewindependent solution which implies that it can o nly display diffuse-diffuse
interaction. However, Walter et al. (1997) point out th at this restriction co mes
out of the density estimation ph ase. The particle tracing module can deal with
any type of BRDF.
INTRODUCTION
CJ
11 .1
Radiosity theory
11 .2
11 .3
11.4
11 .5
11 .6
11 .7
Meshing strategies
Introduction
Ray tracin g, the first computer graphics model to embrace global interaction, or
at least one aspect of it - suffers from an identifying visual signature: you can
usually tell if an image has been synthesized using ray tracing. It only. models
one aspect of the light interaction - that due to perfect specular reflectiOn and
transmission. The interaction between diffusely reflecting surfaces, which tends
to be the predominant light transport mechanism in interiors, is still modelled
using an ambient constant (in the local reflection component . of the model).
Con sider, fo r example, a room with walls and ceiling painted w1th a matte material and carpeted. If th ere are no specularly reflecting objects in the room, then
those parts of the room that cannot see a light source are lit ~y dif~use inter~c
tion. Such a room tends to exhibit slow and subtle changes of mtens1ty across 1ts
surfaces.
In 1984, using a method whose theory was based on the principles of
radiative heat transfer, researchers at Cornell University, developed the
radiosity method (Goral et al. 1984). This is now known as classical radiosity and
it simulates LD*E paths, that is, it can only be used, in its unextended
form, to render scenes that are made up in their entirety of (perfect) diffuse
surfaces.
To accomplish this, every surface in a scene is divided up into elements called
patches and a set of equations is set up based on the conservation of light energy.
A single patch in such an environment reflects light received from every other
patch in the environment. It may also emit light if it is a light source - light
sources are trea ted like any other patch except that they have non-zero selfemission. The interaction between patches depends on their geometric relationship. That is distance and relative orientation. Two parallel patches a short
distance apart will have a high interaction. An equilibrium solution is possible
if, for each patch in the environment, we calculate its interaction between it and
every other patch in the environment.
One of the major contributions of the Cornell group was to invent an efficient way- the hemicube algorithm - for evaluating the geometric relationship
between pairs of patches; in fact, in the 1980s most of the in novations in radiosity methods h ave come out of this group.
The cost of the algorithm is O(N2) where N is the number of patches into
which the environment is divided. To keep processing costs down, the patch es
are made large and the light intensity is assumed to be constant across a patch.
This immediately introduces a quality problem - if illumination discontinuities
do not coincide with patch edges artefacts occur. This size restriction is the practical reason why the algorithm can only calculate diffuse interaction, which by
its nature changes slowly across a surface. Adding specular interaction to the
radiosity method is expensive and is still the subject of much research. Thus
we have the strange situation that the two global interaction methods - ray
tracing and radiosity- are mutually exclusive as far as the phenomena that they
calculate are concerned. Ray tracing cannot calculate diffuse interaction and
radiosity cannot incorporate specular interaction. Despite this, the radiosity
method has produced some of the most realistic images to date in computer
graphics.
The radiosity method deals with shadows without further enhancement. As
we have already discussed, the geometry of shadows is more-or-less straightforward to calculate and can be part of a ray tracing algorithm or an algorithm
added onto a local reflection model renderer. However, the intensity within a
shadow is properly part of diffuse interaction and can only be arbitrarily approximated by other algorithms. The radiosity method takes shadows in its stride.
They drop out of the solution as intensities like any other. The only problem is
that the patch size may h ave to be reduced to delineate the shadow boundary to
some desired level of accuracy. Shadow boundaries are areas where the rate
of change of diffuse light intensity is high and the normal patch size may cause
visible aliasing at the sh adow edge.
The radiosity method is an object space algorithm, solving for the intensity
at discrete points or surface patches within an environment and not for pixels
in an image plane projection. The solution is thus independent of viewer
position. This complete solution is then injected into a renderer that computes
a particular view by removing hidden surfaces and forming a projection.
This phase of the method does not require much computation (intensities
are already calculated) and different views are easily obtained from the general
solution .
1- RtFn
-RzF21
Radiosity theory
[
Elsewhere in the text we have tried to maintain a separation between the alga.
rithm that implements a method and the underlying mathematics. It is the case,
however, that with the radiosity method, the algorithm is so intertwined with
the mathematics that it would be difficult to try to deal with this in a separate
way. The theory itself consists of nothing more than definitions - there is no
manipulation. Readers requiring further theoretical insight are referred to the
book by Siegel and Howell (1984).
The radiosity method is a conservation of energy or energy equilibrium
approach, providing a solution for the radiosity of all surfaces within an
enclosure. The energy input to the system is from those surfaces that act as emitters. In fact, a light source is treated like any other surface in the algorithm
except that it possesses an initial (non-zero) radiosity. The method is based
on the assumption that all surfaces are perfect diffusers or ideal Lambertian
R,.F,t
F;jA; = F1;A;
il
Such an equation exists for each surface patch in the enclosure and the complete
environment produces a set of n simultaneous equations of the form:
1-
~,p,,
l[ l [l
Bt
Bz
~~~
Et
Ez
~~~
patches j and i.
We can use a reciprocity relationship to give:
L B1F;;
-R,F,z
--RzFz,
RtFt,
(11.1}
Solving this equation is the radiosity method. Out of this solution comes Bt the
radiosity for each patch. However, there are two problems left. We need a way of
computing the form factors. And we need to compute a view and display the
patches. To do this we need a linear interpolation method - just like Gouraud
shading - otherwise the subdivision pattern - the patches themselves - will be
visible.
The E;s are non-zero only at those surfaces that provide illumination and
these terms represent the input illumination to the system. The R1s are known
and the Ft;S are a function of the geometry of the environment. The reflectivities
are wavelength-dependent terms and the above equation should be regarded as
a monochromatic solution; a complete solution being obta ined by solving for
however many colour bands are being considered. We can note at this stage that
F;; = 0 for a plane or convex surface - none of the radiation leaving the surface
will strike itself. Also from the definition of th e form factor the sum of any row
of form factors is unity.
Since the form factors are a function only of the geometry of the system they
are computed once only. The method is bound by the time taken to calculate the
form factors expressing the radiative exchange between two surface patches A t
and A;. This depends on their relative orientation and the distance between them
and is given by:
surfaces.
Radiosity, B, is defined as the energy per unit area leaving a surface patch per
unit time and is the sum of the emitted and the reflected energy:
B; = ; + R;
- RtFtz
1- RzFzz
(~)
Figure 11.1
Form factor geometry for
two patches i and j (after
Goral eta/. (1984)).
f f cos
Ai Ai
<j>; cos
4>; dA,dA;
nr2
@)
figure 11.2
The justification for using a
hemicube. Patches A Band
have the same form
factor.
where the geometric convention s are illustrated in Figure 11.1. In any practical
environment A; may be wholly or partially invisible from At and the integral
needs to be multiplied by an occluding factor which is a binary function that
depends on whether the differential area dA; can see dA; or not. This double integral is difficult to solve except for specific shapes.
intensities at this stage. The hemicube algorithm only facilitates the calculation
of the form factors that are subsequently used in calculating diffuse intensities
and a 'label buffer' is maintained indicating which patch is currently nearest to
the hemicube pixel.
TC
where we are now considering the form factor between the elemental area ciA;
and the finite area A;. dA; is positioned at the centre point of patch i. The veracity of this approximation depends on the area of the two patches compared with
the distance, r, between them. If r is large the inner integral does not change
much over the range of the outer integral and the effect of the outer integral is
simply multiplication by unity.
A theorem called th e Nusselt analogue tells us that we can consider the projection of a patch j onto the surface of a hemisphere surrounding the elemental
patch dAt and that this is equivalent in effect to considering the patch itself. Also
patches that produce the same projection on the hemisphere have the same
form factor. This is the justification for the hemicube method as illustrated in
Figure 11.2. Patches A, Band Call have the same form factor and we can evaluate the form factor of any patch j by considering not the patch itself, but its projection onto the faces of a hemicube.
A h emicube is used to approximate the h emisphere because flat projection
planes are computationally less expensive. The hemicube is constructed around
the centre of each patch with the hemicube Z axis and the patch normal coin
cident (Figure 11.3). The faces of th e hemicube are divided into pixels - a somewhat confusing use of the term since we are operating in object space. Every
other patch in the environment is projected onto this hemicube. Two patches
that project onto the same pixel can h ave their depths compared and the further
patch be rejected, since it cannot be seen from the receiving patch. This
approach is analogou s to a Z-buffer algorithm except that there is no interest in
Hemicube
(divided into pixel s)
/
Patch i
Figure 11.3
Evaluating the form factor F,,
Patch i
@)
This process is shown in Figure 11.6. Form factors are a function only of the
environment and are calculated once only and can be reused in stage (2) for different reflectivities and light source values. Thus a solution can be obtained for
the same environment with, for example, some light sources turned off. The
solution produced by stage (2) is a view-independent solution and if a different
view is required th en only stage (3) is repeated. This approach can be used, for
example, wh en generating an animated walk-through of a building interior.
Each frame in the animation is computed by changing the view point and calculating a new view from an unchanging radiosity solution. It is only if we
change the geometry of the scene that a re-calculation of the form factors is necessary. If the lighting is changed and the geometry is unaltered, then only the
equation n eeds resolving - we do not have to re-calculate the form factors.
Stage (2) implies the computation of a view-independent rendered version of
the solution to the radiosity equation which supplies a single value, a radiosity,
for each patch in the environment. From these values vertex radiosities are calculated and these vertex radiosities are used in the bilinear interpolation scheme
to provide a final image. A depth buffer algorithm is used at this stage to evaluate the visibility of each patch at each pixel on the screen. (This stage should not
be confused with the hemicube operation that has to evaluate inter-patch visibility during the computation of form factors.)
The time taken to complete the form factor calculation depends on the square
of the num ber of patches. A hemicube calculation is performed for every patch
(onto which all other patches are projected). The overall calculation time thus
depends on the complexity of the environment and the accuracy of the solution,
Each pixel on the hemicube can be considered as a small patch and a differential to &nite area form factor, known as a delta form factor, defined for each
pixel. The form factor of a pixel is a fraction of the differential to finite area form
factor for the patch and can be defined as:
- cos <jl; cos
-
p; t.A
rtfl
= AFq
Change the
geometryof
the scene
Change the
wavelength
dependent
properties
(colours or
lighting)
N,
Change view
Figure 11.4
F;; is obtained by summing
the form factors of the
pixels onto which patch i
projects.
Patch i
Figure 11 .6
Stages in a complete
radiosity solution. Also
shown are the points in
the process where various
modifications can be made
to the image.
Discretized
environment
t
Form factor
calculations
-+
Full matrix
solution
~ View-independent solution
'Standard'
renderer
t
Specific view
@)
Cohen and Greenberg (1985) point out that the Gauss- Siedel method is guaran.
teed to converge rapidly for equation sets such as Equation 11.1. The sum of any
row of form factors is by definition less than unity and each form factor is
multiplied by a reflectivity of less than one. The summation of the row terms in
Equation 11.1 (excluding the main diagonal term) is thus less than unity. The
mean diagonal term is always unity (F11 = 0 for all i) and these conditions guarantee fast convergence. The Gauss-Siedel method is an extension to the following iterative method. Given a system of linear equations:
A x=E
a1JXJ - . ..
X1,
E; -
- a;, x,<kl
a;;
[11.3]
Note that when i = 1 the right-hand side of the equation contains terms with
superscript k only, and Equation 11 .3 reduces to Equation 11.2. When i = n the
right-hand side contains terms with superscript (k+l) only.
Convergence of the Gauss-Siedel method can be improved by the following
method. Having produced a new value x;<kll, a better value is given by a weighted
average of the old and new values:
-at, x,
X I=
in general:
.<k+ll _ E;- anx1<kJ - .. . XI
-
- a,, x,<kt
au
[11.2]
E
au
= __;,
Using the radiosity method in a practical context, such as in the design of building interiors, means that the designer has to wait a long time to see a completed
image. This is disadvantageous since one of the raisons d'etre of computer-based
design is to allow the user free and fast experimentation with the design parameters. A long feedback time discourages experimentation and stultifies the
design process.
In 1988 the Cornell team developed an approach, called 'progressive refinement' that enabled a designer to see an early (but approximate) solution. At this
stage major errors can be seen and corrected, and another solution executed. As
the solution becomes more and more accurate, the designer may see more subtle changes that have to be made. We introduced this method in the previous
chapter, we will now look at the details.
The general goal of progressive or adaptive refinement can be taken up by any
slow image synthesis technique and it attempts to find a compromise between
the competing demands of interactivity and image quality. A synthesis method
that provides adaptive refinement would present an initial quickly rendered
image to the user. This image is then progressively refined in a 'graceful' way.
This is defined as a progression towards higher quality, greater realism etc., in a
way that is automatic, continuous and not distracting to the user. Early availability of an approximation can greatly assist in the development of techniques
and images, and reducing the feedback loop by approximation is a necessary
adjunct to the radiosity method.
@)
The two major cost factors in the radiosity method are the storage costs and
the calculation of the form factors. For an environment of SOx 103 patches, even
although the resulting square matrix of form factors may be 90% sparse (many
patches cannot see each other) this still requires 109 bytes of storage (at four
bytes per form factor).
Both the requirements of progressive refinement and the elimination of
pre-calculation and storage of the form factors are met by an ingenious restructuring of the basic radiosity algorithm. The stages in the progressive refinement
are obtained by displaying the results as the iterative solution progresses. The
solution is restructured and the form factor evaluation order is optimized so that
the convergence is 'visually graceful'. This restructuring enables the radiosity
of all patches to be updated at each step in the solution, rather than a step
providing the solution for a single patch. Maximum visual difference between
steps in the solution can be achieved by processing patches according
to their energy contribution to the environment. The radiosity method is
particularly suited to a progressive refinement approach because it computes
a view-independent solution. Viewing this solution (by rendering from a
particular view point) can proceed independently as the radiosity solution
progresses.
In the conventional evaluation of the radiosity matrix (using, for example,
the Gauss-Seidel method) a solution for one row provides the radiosity for a
single patch i:
B; = ; + Ri
L B;F;;
i=l
This is an estimate of the radiosity of patch i based on the current estimate of all
other patches. This is called 'gathering'. The equation means that (algorithmically) for patch i we visit every other patch in the scene and transfer the appropriate amount of light from each patch j to patch i according to the form factor.
The algorithm proceeds on a row-by-row basis and the entire solution is updated
for one step through the matrix (although the Gauss- Seidel method uses the
n ew values as soon as they are computed). If the process is viewed dynamically,
as the solution proceeds, each patch intensity is updated according to its row
position in the radiosity matrix. Light is gathered from every other patch in the
scene and used to update the single patch currently being considered.
The idea of the progressive refinement method is that the entire image of
all patches is updated at every iteration. This is termed 'shooting', where
the contribution from each patch i is distributed to all other patches. The differen ce between these two processes is illustrated diagramatically in Figures
11. 7(a) and (b). This re-ordering of the algorithm is accomplished in the following way.
A single term determines the contribution to the radiosity of patch j due to
that from patch i:
I'<
B,lk+ I I= E, + R,
L F,;BP)
j:: l
(b) Shooting
(a) Gathering
Figure 11.7
(a) Gathering and
(b) shooting in radiosity
solution strategies
(based on an illustration i n
Cohen
et a/. ( 1988)).
due to
Bi = R,B;F;; A;/A;
and this is true for all patch es j . This relationship can be used to determin e the
contribution to each patch j in the environment from the single patch i. A single radiosity (patch i) shoots light into the en vironment and the radiosities of all
patches j are updated simultaneously. The first complete update (of all the
radiosities in the environment) is obtained from 'on the fly' form factor computations. Thus an initial approximation to the complete scene can appear when
only the first row of form factors has been calculated. This eliminates high startup or pre-calculation costs.
This process is repeated until convergence is achieved . All radiosities are initially set either to zero or to their emission values. As this process is repeated for
each patch i the solution is displayed and at each step the radiosities fo r each
patch j are updated. As the solution progresses the estimate of the radiosity at a
patch i becomes more and more accurate. For an iteration the environment
already contains the contribution of the previous estimate of B; and the so-called
'unshot' radiosity - the difference between the current and previous estimatesis all that is injected into the environment.
If the output from the algorithm is displayed without further elaboration ,
then a scene, initially dark, gradually gets lighter as the incremental radiosities
are added to each patch. The 'visual convergence' of this process can be
@)
optimized by sorting the order in which the patches are processed according to
the amount of energy that they are likely to radiate. This means, for example,
that emitting patches, or light sources, should be treated first. This gives an early
well lit solution. The next patches to be processed are those that received most
light from the light sources and so on. By using this ordering sch eme, the solution proceeds in a way that approximates the propagation of light th rough an
environment. Although this produces a better visual sequence than an unsorted
process, the solution still progresses from a dark scene to a fully illuminated
scene. To overcome this effect an arbitrary ambient light term is added to the
intermediate radiosities. This term is used only to enhance the display and is not
part of the solution. The value of the ambient term is based on the current estimate of the radiosities of all patches in the environment, and as the solution
proceeds and becomes 'better lit' the ambient contribution is decreased.
Four main stages are completed for each iteration in the algorithm. These are:
(1) Find the patch with the greatest (unshot) radiosity or emitted energy.
(2) Evaluate a column of form factors, that is, the form factors from this patch
to every other patch in the environment.
(3) Update the radiosity of each of the receiving patches.
(4) Reduce the temporary ambient term as a function of the sum of the
differences between the current values calculated in step (3) and the
previous values.
An example of the progressive refinement during execution is shown in Figure
10.7 and Section 10.3.2 contains a full description of this figure.
poration of specular reflection are important, addressing the visual defects due
to meshing accounts for most research emphasis and it is with this aspect t hat
we will deal.
Hemicube artefacts
The serious problem of the hemicube method is aliasing caused by th e regular
division of the hemicube into uniform pixels. Errors occur as a function of the
size of the hemicube pixels due to th e assumption that patches will project
exactly onto an integer number of pixels, which in general, of course, they do
not. This is similar to aliasing in ray tracing. We attempt to gather information
from a three-dimension al environmen t by looking in a fixed number of directions. In ray tracing these directions are given initially by evenly spaced eye-topixel rays. In the radiosity method, by projecting the patches onto hemicubes
we are effectively sampling with projection rays from the hemicube origin.
Figure 11 .8 sh ows a two-dimension al analogue of the problem where a number
of identical polygons project onto either on e or two pixels depending on the
interference between the projection rays and the polygon grid. Th e polygons are
of equal size and equal orientation with respect to patch i. Their form factors
should be different - because th e number of pixels onto which each polygon
projects is differe nt for each polygon. However, as the example shows, neighbourin g polygons which sh ould h ave almost equal form factors will produce
values in the ratio 2:1.
The geometry of any practical scene can cause problems with the hemicube
meth od. Its accuracy depends on th e distance between the patches involved in
th e calculation . When distances become small th e method falls down . This
Figure 11.8
Interference between
hemicube sampling and a
set of equal polygons (after
Wallace et a!. (1989)).
Distance from
hemicube origin
increases
Number of pixels""-'<-'-that b, d, f, h, j
project onto
F;; = FdAMj
Distance from
hemicube origin
increases
"10 equal
polygons
Number ofptxe!s
that a, c, e. g. i
project onto
Patch i
situation occurs in practice, for example, when an object is placed on a supporting surface. The errors in form factors occur precisely in those regions from
which we expect the radiosity technique to excel and produce subtle phenomena such as colour bleeding and soft shadows. Baum et al. (1989) quantify the
error involved in form factor determination for proximal surfaces, and demonstrate the hemicube method is only accurate in contexts where the inter-patch
distance is at least five patch diameters.
Yet another hemicube problem occurs with light sources. In scenes which the
radiosity method is used to render, we are usually concerned with area sources
such as fluorescent lights. As with any other surface in the environment we divide
the light sources into patches and herein lies the problem. For a standard solution
an environment will be discretized into patches where the subdivision resolution
depends on the area of surface (and the accuracy of the solution required).
However, in the case of light sources the number of hemicubes required or the
num ber of patches required depends on the distan ce from the closest surface it
illuminates. A hemicube operation effectively reduces an emitting patch to a
point source. Errors will appear on a close surface as isolated areas of light if the
light source is insufficiently subdivided. With strip lights, where the length to
breadth ratio is great, insufficient subdivision can give rise to banding or aliasing
artefacts that run parallel with the long axis of the light source. An example of
the effect of insufficient light source subdivision is shown in Figure 11.14.
Hemicube aliasing can, of course, be ameliorated by increasing the resolution
of the hemicube, but this is inefficient, increasing the computation required for
all elements in the scene irrespective of whether they are aliased by the
hemicube or not; exactly the same situation which occurs with conventional
(context independent) anti-aliasing measures (Chapter 14).
Reconstruction artefacts
Reconstruction artefacts are so called because they originate from the nature of
the method used to reconstruct or approximate the continuous radiosity function from the constant radiosity solution. We recall that radiosity methods can
only function under the constant radiosity assumption which is that we divide
the environment up into patches and solve a system of equations on the basis
that the radiosity is constant across each patch.
The commonest approach - bilinear interpolation - is overviewed in Figure
11.10. Here we assume that the curved surface shown in Figure 11.10(a) will
exhibit a continuous variation in radiosity value along the dotted line as shown.
Patch)
Patch i
Figure 11 .9
All of patch j can be seen
from t he hemicube origin
and the FdNAi app roximation
falls down.
Area of patch i
shadowed by the
intervenin g patch
Intervening patch
Uniform meshing
Figure 11.10
Normal reconstruction
approach used in the
radiosity method.
(a) Compute a constant
radiosity solution.
(b) Calculate the vertex
radiosities.
(c) Reconstruction by linear
interpolation.
of Mach bands because by reducing the size of the elements they reduce the
difference between vertex radiosities.
More advanced strategies involve surface interpolation methods (Chapter 3).
Here the radiosity values are treated as samples of a continuous radiosity fu nction and quadratic or cubic Bezier/B-spline patch meshes are fitted to these. The
obvious difficulties with this approach - its inherent cost and the need to
prevent wanted discontinuities being smoothed out - has meant that the most
popular reconstruction method is still li near interpolation.
Meshing artefacts
(a)
(c)
The first step in the radiosity method is to compute a constant radiosity solution
which will result in a staircase approximation to the continuous function. The
radiosity values at a vertex are calculated by averaging the patch radiosities that
share the vertex (Figure ll.lO(b)). These are then injected into a bilinear interpolation scheme and the surface is effectively Gouraud shaded resulting in the
piecewise linear approximation (Figure ll.lO(c)).
The most noticeable defect arising out of this process is Mach bands which,
of course, we also experience in normal Gouraud shading, where the same interpolation method is used. The 'visual importance' of these can be reduced by
using texture mapping but they tend to be a problem in radiosity applications
because many of these exhibit large area textureless surfaces - interior walls in
buildings, for example. Subdivision m eshing strategies also reduce the visibility
One of th e most difficult aspects of the radiosity approach, and one that is still
a major research area, is the issue of meshing. In the discussions above we have
simply described patches as entities into which the scene is divided with the proviso that these should be large to enable a solution which is n ot prohibitively
expensive. However, the way in wh ich we do this has a strong bearing on the
quality of the final image. How should we do this so that the appearance of
artefacts is minimized? The reason this is difficult is that we can only do
this when we already h ave a solution, so that we can see where the problems
occur. Alternatively we have to predict where the problems will occur and subdivide accordingly. We begin by looking at the nature and origin of meshing
artefacts.
First some terminology:
Patches These are the entities in the initial representation of the scene.
In a standard radiosity solution, where subdivision occurs during the
solution, patches form the input to the program.
Elements
The simplest type of meshing artefact - a so-called D 0 discontinuity - is a discontinuity in th e value of the radiosity function. The common sources of such
a discontinuity are shadow boundaries caused by a point light source and objects
which are in contact. In the former case the light source suddenly becomes visible as we move across a surface and the reconstruction and meshing 'spreads'
the sh adow edge towards the mesh boundaries. Thus the shadow edge will tend
to take the shape of the mesh edges giving it a staircase appearance. However,
because we tend to use area light sources in radiosity applications the discontinuities that occur are higher than D 0 Nevertheless these still cause visible
(~)
Figure 11.11
Shadow and light leakage.
Light
side
Meshing strategies
Meshing strategies that attempt to overcome these defects can be categorized in
a number of ways. An important distinction can be made on the basis of when
the subdivision takes place:
Interpolated
intensity
-'. I /
/ 1'-
(1) A priori - meshing is completed before the radiosity solution is invoked; that
is we predict where disconti nuities are going to occur and mesh accordingly.
This is also called discontinuity meshin g.
(2) A posteriori- the solution is initiated with a 'start' mesh which is refined as
the solution progresses. Th is is also called adaptive meshing.
Dark side
Partition wall
Light side
-'. I /
/ 1'-
lLd1: ~nterpolated
'
mtens1ty
As we have seen, when two objects are in contact, we can eliminate shadow
and light leakage by ensuring that mesh element boundaries from each object
coincide, which is thus an a priori meshing.
Another distin ction can be made depending on the geom etric nature of the
meshing. We can, for example, simply subdivide square patches (non-uniformly)
reducing the error to an acceptable level. The com monest approach to date,
Cohen and Wallace (1993) term this h-refinement. Alternatively we could adopt
an approach where the discontinuities in the radiosity function are tracked
across a surface and the mesh boundaries placed along the discontinuity boundary. A form of this approach is called r-refinement by Cohen and Wallace (1993)
where the n odes of the initial mesh are moved in a way th at equalizes the error
in the elements that share the node. These approaches are illustrated conceptually in Figure 11.12.
Adaptive or
a posteriori meshing
Figure 11.12
Examples of refinement
strategies (a posterion).
Discontinuity in
the radiosity function
8
where:
Fu is the form factor from patch i to patch j
F(iq>i is the form factor fro m element q of patch ito patch j
A <iq) is the subdivided area of element q of patch i
R is the number of elements in patch i
Patch form factors obtained in this way are then used in a standard radiosity
solution.
This increases the number of form factors from N x N to M x N, where M is
th e total number of elements created, and naturally increases the time spent in
form factor calculation. Patches that n eed to be divided into elements are
revealed by examining the graduation of th e coarse patch solution. The previously calculated (coarse) patch solution is retained and the fine element radiosities are then obtained from this solution using:
N
B;q
= Eq + Rq
B jF(iq)j
[11.4]
i=l
where:
h-refinement - subdivides
initial patches
is th e radiosity of element q
Bi is the radiosity of patch j
F(iq>i is the element q to patch j form factor
B ;q
In other words, as far as the radiosity solution is concerned, the cum ulative
effect of elements of a subdivided patch is identical to that of the undivided
patch; or, subdividing a patch into elements does not affect the amount of light
that is reflected by the patch. So after determining a solution for patches, the
radiosity within a patch is solved indepen dently among patches. In doing this,
Equation 11.4 assumes that only the patch in question has been subdivided into
elements - all other patches are undivided. The process is applied iteratively
until the desired accuracy is obtained. At any step in the iteration we can identify three stages:
r refinement- moves
(1) Subdividing selected patches into elements and calculating element-topatch form factors.
The idea is to generate an accurate solution for the radiosity of a point from
the 'global' radiosities obtained from the initial 'coarse' patch computation.
Patches are subdivided into elements. Element-to-patch form factors are calcu
lated where the relationship between element-to-patch and patch-to-patch form
factors is given by:
Where stage (2) just occurs for the first iteration , the coarse patch radiosities are
calculated once only. The method is distinguished from simply subdividing the
environment into smaller patches. Th is strategy would result in M x M new form
factors (rather than M x N) an d an M x M system of equations.
Subdivision of patches into elements is carried out adaptively. The areas that
require subdivision are not known prior to a solution being obtained. These
areas are obtained from an initial solution and are then subject to a form factor
subdivision . The previous form factor matrix is still valid and the radiosity solution is not re-com puted.
(~
MESHING STRATEGIES
emitter. In this case we are subdividing a patch (the emitter) over whose surface
the light intensity will be considered uniform.
Adaptive subdivision can be incorporated in the progressive refin ement
method. A naive approach would be to compute the radiosity gradient and subdivide based on the contribution of the current shooting patch. However, this
approach can lead to unnecessary subdivisions. The sequence, shown in Figure
11.15 shows the difficulties encountered as subdivision, performed after every
iteration, proceeds around one of the wall lights. Originally two large patches
situated away from the wall provide general illumination of the object. This
immediately causes subdivision around the light- wall boundary because the
program detects a high difference between vertices belonging to the same
patches. These patches have vertices both under the light and on the wall.
However, this subdivision is n ot fine en ough and as we start to shoot energy
from the light source itself light leakage begins to occur. Light source patches
continue to sh oot energy in the order in which the model is stored in the data-
Only part of the form factor determination is further discretized and this is
then used in the third phase (determination of the element radiosities from the
coarse patch solution). This process is repeated until it converges to the desired
degree of accuracy. Thus image quality is improved in areas that require more
accurate treatment. An example of this approach is sh own in Figure 11.13. Note
the effect on the quality of the shadow boundary. Figure 11.14 sh ows the same
set-up but this time the light source is subdivided to a lower and high er resolu.
tion than in Figure 11.13. Although the effect, in this case, of insufficient sub.
division of emitting and non-emitting patches is visually similar, the reasons for
these discrepancies differ. In the case of non-emitting patches we have changes
in reflected light intensity that do not coincide with patch boundaries. We
increase the number of patches to capture the discontinuity. With emitting
patches the problem is due to the number of hemicube emplacements per light
source. Here we increase the number of patches that represent the emitter
because each hemicube emplacement reduces a light to a single source and we
need a sufficiently dense array of these to represent the spatial extent of the
Figure 11 .13
Figure 11 .13
continued
!<C7
I
I
jagged.
(~)
MESHING STRATEGIES
Figure 11.14
Figure 11 .15
base and we spiral up the sphere, shooting energy onto its inside and causing
more and more light leakage. Eventually the light emerges onto the wall and
brightens up the appropriate patches. As the fan of light rotates above the light
more and more inappropriate subdivision occurs. This is because the subdivision
is based on the current intensity gradients which move on as further patches are
shot. Note in the final frame this results in a large degree of subdivision in an
area of highlight saturation. These redundant patches slow th e solution down
more and more and we are inadvertently making things worse as far as execution time is concerned.
Possible alternative strategies are:
(1) Limit the subdivision by only initiating it after every
n patches instead of
(11.7.2)
A priori meshing
We will now look at two strategies for a priori meshing - processing a scene
before it is used in a radiosity solution.
Figure 11.16
Hierarchical radiosity: scene
subdivision is determ ined by
energy interchange.
Hierarchical radiosity
----
[J
are subdivided into four elements. At the next level of subdivision on ly two out
of 16 form factor estimates exceed the threshold and they are subdivided. It is
easily seen from the illustration that in this example the pattern of subdivision
'homes into' the common edge.
A hierarchical subdivision strategy starts with an (initial) large patch subdivision of n patches. This results in n (n-1 )/2 form factor calculations. Pairs of
patches that cannot be used at this initial level are then subdivided as suggested
by the previous figure, the process continuing recursively. Thu s each in itial
patch is represented by a hierarchy and links. The structure con tain s both the
geometric subdivision and links that tie an element to other elements in the
scene. A node in the hierarchy represents a group of elements and a leaf node a
single element. To make this process as fast as possible a crude estimate of the
form factor can be used. For example, the expression inside the integral definition of the form factor:
Figure 11.18
Interaction between two
patches can consist of
energy interchange between
any pair of nodes at any
level in their respective
Figure 11.17
Hierarchical radiosity:
the geometric effect
of subdivision of two
perpendicular patches (after
Hanrahan eta/. (1991 )).
hierarchy.
(2)
B
Node to
internal node
linkAC
rn
First s ubdivision :
twooutof1 6
form factors exceed
/
can be used but note that this does not take into account any occluding patches.
Thus the stages in establishing the hierarchy are:
(1) Start with an initial patch subdivision. This would normally use much larger
patches than those required for a conventional solution.
(2) Recursively apply the following:
(a) Use a quick estimate of the fo rm factor between pairs of linked surfaces.
(b) If this falls below a threshold or a subdivision limit is reached, record
their interaction at that level.
(c) Subdivide the surfaces.
It is important to realize that two patches can exhibit an interaction between
any pair of nodes at any level in their respective hierarchies. Thus in Figure 11.18
a link is shown between a leaf node in patch A and an internal node in patch C.
The tree shown in the figure for A represents the subdivisions necessary for its
interaction with patch B and that for patch C represents its interactions with
some other patch X. This means that energy transferred from A to the internal
node in C is inherited by all the child nodes below the destination in C.
Subdivision of A
AB interaction
Subdivision of C
ex interaction
Comparing this formulation with the classical full matrix solution we have
now replaced the form factor matrix with a hierarchical representation. This
implies that a 'gathering' solution proceeds from each node by following all the
links from that node and multiplying the radiosity fo un d at the end of the lin k
by the associated form factor. Because links have, in general, been established at
any level, the hierarchy needs to be followed in both directions from a node.
The iterative solution proceeds by executing two steps on each root node
until a solution has converged. The first step is to gather energy over each
incoming link. The second step, known as ' pushpull' pushes a n ode's reflected
radiosity down the tree and pulls it back up again. Pushing involves simply
adding the radiosity at each child node. (Note that sin ce radiosity has units of
power/unit area the value remains undiminished as the area is subdivided.) The
total energy received by an element is the sum of the energy received by it
directly plus the sum of the energy received by its parents. When the leaves are
reached the process is reversed and the energy is pulled up the tree, the current
energy deposited at each n ode is calculated by averaging the child node contributions.
In effect this is just an elaboration of the Gauss-Siedel relaxation method
described in Section 11.3. For a particular patch we are gath ering contributions
from all other patches in the scene to enable a new estimate of the current patch.
The difference now is that the gathering process involves following all links out
of the current hierarchy and the hierarchy has to be updated correctly with the
bi-directional traversal or pushpull process.
The above algorithm description implies that at each node in the q uadtree
data structure the following information is available:
MESHI NG STRATEGIES
GatherRad(p)
p.Bg
:= 0
p.Bs := Bup
return B,,P
The procedure is first called at the top of the hierarchy with th e gathered radios
ity at that level. The recursion has the effect of passing o r pushing down th is
radiosity onto the child nodes. At each internal node the gathered power is
added to the inherited power accumulated along the downwards path. When a
leaf n ode is reached any emission is added into the gathered radiosity for that
node and the result assigned to the sho oting radiosity for that node. The recursion then unwinds pulling the leaf node radiosity up the tree and perfo rming an
area weighting at each node.
Although hierarchical radiosity is an efficient method and one that can be
finely controlled (the accuracy of t he solution depends o n the form factor tolerance and the minimum subdivision area) it still suffers from shadow leaks and
jagged shadow boundaries because it subdivides the environment regularly
(albeit non-uniformly) without regard to the position of shadow boundaries.
Reducing t he value of the control parameters to give a m ore accurate solution
can still be prohibitively expensive. This is the motivation of the approach
described in the n ext section.
Finally we can do no better th an to quote from the original paper, in w hich
the auth ors give their inspiration fo r the approach:
The hierarchical subdivision algorithm proposed in th is paper is inspired by methods
recently developed for solving the N-body problem. In the N-body problem, each of the n
particles exerts a force on all the other n- 1 particles, implying n(rz-1 )/2 pairwise
interactions. The fast algorithm computes all the forces on a particle in less t han quadratic
time, building on two key ideas:
(1) Numerical calculations are subject to error, and therefore, the force ac ting on a particle
need only be calculated to within the given precision.
(2) The force due to a cluster of particles at some distant point can be approximated, within
the given precision, with a single term - cutting down on the total number of interactions.
Discontinuity meshing
The commonest, and simplest, type of a priori meshing is to take care of the special case of interpenetrating geo metry (D 0 ) as we suggested at the beginning o f
this section. This is mostly done semi-manually when the scene is constructed
and disposes of shadow and light leakage - the most visible radiosity artefact.
The more general approaches attend to higher-order d iscontinuities. D 1 and D 2
discontinuities occur when an object interacts with an area light source - the
characteristic penumbra- umbra transition within a shadow area - as described
in Chapter 9.
As we have seen, common a posteriori methods generally approach the problem by subdividing in th e region of discontinuities in the radiosity function and
can only eliminate errors by resorting to higher and higher meshing densities.
The idea behind discontinuity meshing is to p redict where the discontinuities
are going to occur and to align the mesh edges exactly with the path of the discontinuity. This approach is by definition an a priori m ethod. We p redict where
the discontinuities will occur and mesh, before invoking the solution phase so
t hat when the solution proceeds t here can be no artefacts present d ue to t he
non-alignment of discontinuities and mesh edges.
MESHING STRATEGIES
Figure 11.20
VE event causing a O'
discontinuity (after Lischinski
et at. (1992)).
Receiver
Penl umbra
Umbra
Pen-
oO
Source 0 occluder
visibility along xy
Source
Emitter
y
Pen-
Umbra
jumbra :
Pen
iumbra \
Figure 11.19
Figure 11.21
VEevent causing a Y
discontinuity (after Lischinski
eta/. (1992)).
:[;J
Vo
Source 0 occluder
visibility along xy
C )
Figure 11 .23
processing a VE wedge
(after Lischinski et a/.
(1 992)).
Processing continues
down the unclipped
pan of the wedge
discontinuity to that surface. The discontinuity is 'inserted' into the mesh of the
surface. As each surface is processed it clips out part of the wedge and the algorithm proceeds 'down' only the unclipped part of the wedge. When the wedge
is completely clipped the processing of that particular VE event is complete.
The insertion of the discontinuity into the mesh representing the surface is
accomplished by using a DM tree which itself consists of two components - a
two-dimensional BSP tree connecting into a winged edge data structure (Mantyla
1988) representing the interconnection of surface nodes. The way in which this
works is shown in Figure 11.24 for a single example of a vertex generating three
VE events which plant three discontinuity/critical curves on a receiving surface.
If the processing order is a, b, c then the line equation for a appears as the root
node and splitting it into two regions as shown. b, the next wedge to be
processed is checked against region R, which splits into R1 1 and R 12 and so on.
a
Figure 11.22
Umbra and penumbra from
shadow volumes formed by
VE events.
/"'-.
Figure 11.24
Constructing a DM tree for
a single VE event (after
Lischinski et a/. (1992)).
/b'-..._
/"'-.
Rn t
Rm
12.1
12.2
12.3
12.4
12.5
12.6
12.7
We have already seen that we trace infinitesimally thin light rays through the
scene, following each ray to discover perfect specular interactions. Tracing
implies testing the current ray against objects in the scen e - intersection testing
- to find if the ray hits any of them . And, of course, this is the source of the cost
in ray tracing - in a naive algorithm, for each ray we have to test it against all
objects in th e scene (and all polygons in each object). At each boundary,
between air and an object (or between an object and air) a ray will 'spawn' two
more rays. For example, a ray initially striking a partially transparent sphere will
generate at least four rays for the object - two emerging rays and two internal
rays (Figure 10.5). The fact that we appropriately bend the transmitted ray mean s
that geometric distortion due to refraction is taken into account. That is, when
we form a projected image, objects that are behind transparent objects are appropriately distort ed. If the sphere is hollow the situation is more complicated there are now four intersections encountered by a ray travelling through the
object.
To perform this tracing we follow light beams in the reverse direction of light
propagation - we trace light rays from the eye. We do this eye tracing because
tracing rays by starting at the light source(s) would be hopelessly expensive. This
is because we are only interested in that small subset of light rays which pass
th rough the image plane window.
At each hit point the same calculations have to be made and this implies that
the easiest way to implement a simple ray tracer is as a recursive procedure. The
recursion can terminate according to a number of criteria:
e
e
It usually 'contains' a local reflection model such as the Phong reflection model,
and the question arises: why not use ray tracing as the standard approach 10
rendering, rather than using a Phong approach with extra algorithms for hidden
CB
At each point p that a ray hits an object, we spawn in general, a reflected and a
transmitted ray. Also we evaluate a local reflection model by calculating L at that
point by shooting a ray to the light source which we consider as a point. Thus
at each point the intensity of the light consists of up to three components:
A local component.
A contribution from a global reflected ray that we follow.
Shadows
Shadows are easily included in the basic ray tracing algorithm. We simply calculate L, the light direction vector, and insert it into the intersection test part of the
algorithm. That is, Lis considered a ray like any other. If L intersects any objects,
then the point from which L emanates is in shadow and the intensity of direct
illumination at that poin t is consequently reduced (Figure 12.1). This generates
hard-edged shadows with arbitrary intensity. The approach can also lead to great
expense. If there are n light sources, then we have to generate 11 intersection tests.
We are already spawning two rays per hit point plus a shadow ray, and for n light
sources this becomes (n + 2) rays. We can see that as the number of light sources
increases shadow computations are quickly going to predominate since the major
cost at each hit point is the cost of the intersection testing.
In an approach by Haines and Greenberg (1986) a 'light buffer' was used as a
shadow testing accelerator. Shadow testing times were reduced, using this procedure, by a factor of between 4 and 30. The method pre-calculates for each light
source, a light buffer which is a set of cells or records, geometrically disposed as
two-dimen sional arrays on the six faces of a cube surrounding a point light
source (Figure 12.2). To set up this data structure all polygons in the scene are
cast or projected onto each face of the cube, using as a projection centre the position of the light source. Each cell in the light buffer then contains a list of polygons that can be seen from the light source. The depth of each polygon is
calculated in a local coordinate system based on the light source, and the records
are sorted in ascending order of depth. This means that for a particular ray from
the eye, there is immediately available a list of those object faces that may
occlude the intersection point under consideration.
Shadow testing reduces to finding the cell through which the shadow feeler
ray passes, accessing the list of sorted polygons, and testing the polygons in the
list until occlusion is found, or the depth of the potentially occluding polygon
is greater than that of the intersection point (which means that there is no occlusion because the polygons are sorted in depth order). Storage requirements are
prodigious and depend on the number of light sources and the resolution of the
(~)
Figure 12.1
Shadow shape is compu ted
by calculating L and
inserting it into the
Intersection tester.
(~)
Figure 12.2
Shadow testing accelerator
of Haines and Greenberg
Light buffer
Figure 12.3
A reflected ' hidden' surface.
(~
(1986).
Cell that shadow feeler intersects
Light source
coordinate
system
'VEye
Current intersection point
( 12.1.4)
We will now examine the working of a ray tracing algorithm using a particular
example. The example is based on a famous image, produced by Turner Whitted
in 1980, and it is generally acknowledged as the first ray traced image in computer graphics. An imitation is shown in Figure 12.4 (reproduced as a monochrome image here and colour image in the Colour Plate section).
First some symbolics. At every point P that we hit with a ray we consider two
major components a local and a global component:
J(P) ::: l tocat(P) + lglobai(P)
::: l tocot(P ) + k,s I(P,) + k,g](Pt)
where:
P is the hit point
P, is the hit point discovered by tracing the reflected ray from P
Pt is the hit point discovered by tracing the transmitted ray from P
k,8 is the global reflection coefficient
ktg
Figure 12.4
The Whitted scene (se e also
Colour Pla te section).
(~)
The basic control procedure for a ray tracer consists of a simple recursive procedure that reflects th e action at a node where, in general, two rays are spawned.
Thus the procedure will contain two calls to itself, one for the transmitted and
one for the reflected ray. We can summarize the action as:
ShootRay (ray structure)
intersection test
if ray intersects an object
get normal at intersection point
calculate local intensity (Jiocai)
decrement current depth of trace
if depth of trace > 0
calculate and shoot the reflected ray
calculate and shoot the refracted ray
where the last two lines imply a recursive call of ShootRay(). This is the basic
control procedure. Around the recursive calls there has to be some more detail
which is:
Calculate and shoot reflected ray elaborates as
if object is a reflecting object
\ I/
(~)
Its direction.
kt~
Thus the general structure is of a procedure calling itself twice for a reflected and
refracted ray. The first part of the procedure finds the object closest to the ray
start. Then we find the normal and apply the local shading model, attenuating
the light source intensity if there are any objects between the intersection point
P and the object. We then call the procedure recursively for the reflected and
transmitted ray.
The number of recursive invocations of ShootRay() is controlled by the depth
of trace parameter. If this is unity the scene is rendered just with a local reflection model. To discover any reflections of another object at a point P we need a
depth of at least two. To deal with transparent objects we need a depth of at least
three. (The initial ray, the ray that travels through the object and th e emergent
ray have to be followed. The emergent ray returns an intensity from any object
that it hits.)
Ray 1
This ray is along a direction where a specular highlight is seen on the highly
transparent sphere. Because the ray is near the mirror d irection of L , th e contri
bution from the specular component in liocai(P) is high and the contributions
0.1
0.8
0.1
0.9
0.9
0.2
0.8
0.4
0.0
k1g
0.9
(low)
(high)
(low)
(high)
k~g
o,o
0.2
0.8
0.4
0.0
Chequerboard
kct (local)
k, (local)
1.0
0.2
0.0
0.2
0.0/1 .0
0.2
0.4
ktg
0
0
Blue background
kct (local)
0.1
0.1
1.0
Ambient light
0.3
0.3
0.3
k"'
(white)
(white)
(white)
1.0
0.0
(high blue)
Ray 2
Almost the same as ray 1 except that the specular highlight appears on the inside
wall of the hollow sphere. This particular ray demonstrates another accepted
error in ray tracing. Effectively the ray from the light t ravels through the sphere
without refracting (that is, we simply compare L with the local value of N and
ignore t he fact tha t we are now inside a sph ere). This means that the specular
highlight is in the wrong position but we simply accept this because we have no
intuitive expectation of the correct position anyway. We simply accept it to be
correct.
Ray 3
Ray 3 also hits the thin-walled sphere. The local contribution at all hits with the
hollow sphere are zero and t he predominant contribution is the chequerboard.
This is subject to slight distortion due to the refractive effect of the sphere walls.
The red (or yellow) colour comes from the high kd in lJocaJ(P ) where Pis a point
on the chequerboard. k,8 and kts are zero for this surface. Note, however, that we
have a mix of two chequerboards. One is as described and the other is the super.
imposed reflection on the outside surface of the sphere.
Figure 12.5
A polygon that lies almost in
the x,.,yw plane will have a
high zw component. We
choose this plane in w hich
to perform interpolation of
vertex normals.
Ray4
Again this hits the thin-walled sphere, but this time in a direction where the distance travelled through the glass is significant (that is, it only travels through the
glass it does not hit the air inside) causing a high refractive effect and making
the ray terminate in the blue background.
Zw
Interpolation plane
the facet approximates. This entity is required for the local illumination component and to calculate reflection and refraction. Recall that in Phong interpolation (see Section 6.3.2) we used the two-dimensional component of screen space
to interpolate, pixel by pixel, scan line by scan line, the normal at each pixel
projection on the polygon. We interpolated three of th e vertex normals using
two-dimensional screen space as the interpolation basis. How do we interpolate
from the vertex normals in a ray tracing algorithm, bearing in mind that we are
operating in world space? One easy approach is to store the polygon
normal for each polygon as well as its vertex normals. We find the largest of its
three compon ents xw, y.... and Zw. The largest component identifies which of the
three world coordinate planes the polygon is closest to in orientation, and we
can use this plane in which to interpolate using the same interpolation sch eme
as we employed for Phong interpolation (see Section 1.5). This plane is equivalent to the use of the screen plane in Phong interpolation. The idea is shown in
Figure 12.5. This plane is used for the interpolation as follows. We consider
the polygon to be represented in a coordinate system where the hit point P is
the origin. We then have to search the polygon vertices to find the edges that
cross the 'medium' axis. This enables us to interpolate the appropriate vertex
normals to find N , and N b from which we find the required normal Nr (Figure
12.6). Having found the interpolated normal we can calculate the local illumination component and the reflected and the refracted rays. Note that because we
RayS
This ray hits the opaque sphere and returns a significant contribution from the
local component due to a white kd (local). At the first hit the global reflected ray
hits the chequerboard. Thus there is a mixture of:
white (from the sphere's diffuse component)
red/yellow (reflected from the chequerboard)
Ray 6
This ray hits the chequerboard initially and the colour comes completely from
the local component for that surface. However, the point is in shadow and this
is discovered by the intersection of the rayLand the opaque sphere.
Ray 7
The situation with this ray is exactly the same as for ray 6 except that it is the
thin walled sphere that intersects L. Thus the shadow area intensity is not
reduced by as much as the previous case. Again we do n ot consider the recursive
effect that L would in fact experience and so the shadow is in effect in the wrong
place.
Interpolation plane
y,
Figure 12.6
Finding an interpolated
normal at a hit point P.
V3
@})
(12.5.1)
The trace depth required in a ray tracing program depends upon the nature of
the scene. A scene containing highly reflective surfaces and transparent objects
will require a higher maximum depth than a scene that consists entirely of
poorly reflecting surfaces and opaque objects. (Note that if the depth is set equal
to uni ty then the ray tracer functions exactly as a conventional renderer, which
removes hidden surfaces and applies a local reflection model.)
It is pointed out in Hall and Greenberg (1983) that the percentage of a scene
that consists of highly transparent and reflective surfaces is, in general, small and
it is thus inefficient to trace every ray to a maximum depth. Hall and Greenberg
suggest using an adaptive depth control that depends on the properties of the
materials with which the rays are interacting. The context of the ray being traced
now determines the termination depth, which can be any value between unity
and the maximum pre-set depth.
Rays are attenuated in various ways as they pass through a scene. When a ray
is reflected at a surface, it is attenuated by the global specular reflection coeffi.
cient for the surface. When it is refracted at a surface, it is attenuated by the
global transmission coefficient for the surface. For the moment, we consider
only this attenuation at surface intersections. A ray that is being examined as a
result of backward tracing through several intersections will make a contribution
to the top level ray that is attenuated by several of these coefficients. Any contribution from a ray at depth n to the colour at the top level is attenuated by the
product of the global coefficients en countered at each node:
ktkz . .. ku-1
If this value is below some threshold, there will be no point in tracing further.
In general, of course, there will be three colour contributions (RGB) for each
ray and three components to each of the attenuation coefficients. Thus when
the recursive procedure is activated it is given a cumulative weight parameter
that indicates the final weight that will be given at the top level to the colour
returned for the ray represented by that procedure activation. The correct weight
for a new procedure activation is easily calculated by taking the cumulative
weight for the ray currently being traced and multiplyi ng it by the reflection or
(~)
transmission coefficient for the surface intersection at which the new ray is
being created.
Another way in which a ray can be attenuated is by passing for some distance
through an opaque material. This can be dealt with by associating a transmittance coefficient with the material composing an object. Colour values would
then be attenua ted by an amount determined by this coefficient and the distance
a ray travels through the material. A simple addition to the intersection calculation in the ray tracing procedure would allow this feature to be incorporated.
The use of adaptive depth control will prevent, for example, a ray that initially hits an almost opaque object spawning a transmitted ray that is then
traced through the object and into the scene. The intensity returned from the
scene may then be so attenuated by the initial object that this computation is
obviated. Thus, depending on the value to which the threshold is pre-set, the ray
will, in this case, be terminated at the first hit.
For a h ighly reflective scene with a maximum tree depth of 15, Hall and
Greenberg report (1983) that this method results in an average depth of 1. 71,
giving a large potential saving in image generation time. The actual saving
achieved will depend on the nature and distribution of the objects in the scene.
In the previous section it was pointed out that even for highly reflective scenes,
th e average depth to which rays were traced was between one and two. This fact
led Weghorst et al. (1984) to suggest a hybrid ray tracer, where the intersection
of the initial ray is evaluated during a preprocessing phase, using a hidden surface algori thm. The implication here is that the h idden surface algorithm will be
more efficient than the general ray tracer for the first hit. Weghorst eta/. (1984)
suggest executing a modified Z-buffer algorithm, using the same viewing parameters. Simple modifications to the Z-buffer algorithm will make it produce,
for each pixel in th e image plane, a pointer to the object visible at that pixel. Ray
tracing, incorporating adaptive depth control then proceeds from that point.
Thus the expensive intersection tests associated with the first hit are eliminated.
Two properties are required of a bounding volume. First, it should have a sizn.
pie intersection test - thus a sphere is an obvious candidate. Second, it should
efficiently enclose the object. In this aspect a sphere is deficient. If the object is
long and thin the sphere will contain a large void volume and many rays Will
pass the bounding volume test but will not intersect the object. A rectangular
solid, where the relative dimensions are adjustable, is possibly the best simple
bounding volume. (Details of intersection testing of both spheres and boxes are
given in Chapter 1.)
The dilemma of bounding volumes is that you cannot allow the complexity
of the bounding volume scheme to grow too much, or it obviates its own purpose. Usually for any scene, the cost of bounding volume calculations will be
related to their enclosing efficiency. This is easily shown conceptually. Figure
12.7 shows a two-dimensional scene containing two rods and a circle representing complex polygonal objects. Figure 12.7(a) shows circles (spheres) as bounding volumes with their low enclosing efficiency for the rods. Not only are the
spheres inefficient, but they intersect each other, and the space occupied by
other objects. Using boxes aligned with the scene axes (axis aligned bounding
boxes, or AABBs) is better (Figure 12.7(b)) but n ow the volume enclosing the
sloping rod is inefficient. For this scene the best bounding volumes are boxes
with any orientation (Figure 12.7(c)); the cost of testing the bounding volumes
increases from spheres to boxes with any orientation. These are known as OBBs.
Weghorst et al. (1984) define a 'void' area, of a bounding volume, to be the
difference in area between the orthogonal projections of the object and bounding volume onto a plane perpendicular to the ray and passing through the
origin of the ray (see Figure 12.8). They show that the void area is a function of
object, bounding volume and ray direction and define a cost function for an
intersection test:
T
= b*B + i*J
where:
T is the total cost function
b is the number of times that the bounding volume is tested for intersection
(a)
(b)
(c)
Figure 12.8
Void area
It is pointed out by the authors that the two products are generally interdependent. For example, reducin g B by reducing the complexity of the bounding
volume will almost certainly increase i. A quantitive approach to selecting the
optimum of a sphere, a rectangular parallelepiped and a cylinder as boun ding
volumes is given.
@.5.4)
Figure 12.9
A simple scene and the
associated bounding
cylinder tree structure.
those nodes where intersections occur. Thus a scene is grouped, wh ere possible,
into object clusters and each of those clusters may contain other groups of
objects that are spatially clustered. Ideally, high-level clusters are enclosed in
bounding volumes that contain lower-level clusters and bounding volumes.
Clusters can only be created if objects are sufficiently close to each other.
Creating clusters of widely separated objects obviates the process. The potential
clustering and the depth of the hierarchy wiU depend on the nature of the scene:
the deeper the hierarchy the more th e potential savings. The disadvantage
of this approach is that it depends critically on the nature of the scene. Also,
considerable user investment is required to set up a suitable hierarchy.
Bounding volume hierarchies used in collision detection are discussed in
Chapter 17. Although identical in principle, collision detection requires efficient
testing for intersection between pairs of bounding volumes, rather than ray/
volume testing. OBB hierarchies have proved useful in this and are described in
Section 17.5.2.
The use of spatial coherence
Currently, spatial coherence is the only approach that looks like making ray tracing a practical proposition for routine image synthesis. For this reason it is dis
cussed in some detail. Object coherence in ray tracing has generally been
ignored. The reason is obvious. By its nature a ray tracing algorithm spawns rays
of arbitrary direction anywhere in the scene. It is difficult to use such 'random'
rays to access the object data structure and efficiently extract those objects in the
path of a ray. Unlike an image space scan conversion algorithm where, for exam
ple, active polygons can be listed, there is no a priori information on the
sequence of rays that will be spawned by an initial or view ray. Naive ray tracing
algorithms execute an exhaustive search of all objects after each hit, perhaps
modified by a scheme such as bounding volumes, to constrain the search.
The idea behind spatial coherence schemes is simple. The space occupied by
the scene is subdivided into regions. Now, rather than check a ray against all
objects or sets of bounded objects, we attempt to answer the question: is the
(~)
region, through which the ray is currently travelling, occupied by any objects?
Either there is nothing in this region, or the region contains a small subset of the
objects. This group of objects is then tested for intersection with the ray. The size
of the subset and the accuracy to which the spatial occupancy of the objects is
determined varies, depending on the nature and number of the objects and the
method used for subdividing the space.
Th is approach, variously termed spatial coherence, spatial subdivision or
space tracing has been independently developed by several workers, notably
Glassner (1984), Kaplan (1985) and Fujimoto et a/. (1986). All of these
approaches involve pre-processing the space to set up an auxiliary data structure
that contains information about the object occupancy of the space. Rays are
then traced using this auxiliary data structure to enter the object data structure.
Note tha t this philosophy (of pre-processing the object environment to reduce
the computational work required to compute a view) was first employed by
Schumaker et nl. (1969) in a hidden surface removal algorithm developed for
flight simulators (see Section 6.6.10). In this algorithm, objects in the scene are
clustered into groups by subdividing the space with planes. The spatial subdivision is represented by a binary tree. Any view point is located in a region represented by a leaf in the tree. An on-line tree traversal for a particular view point
quickly yields a depth priority order for the group clusters. The important point
about this algorithm is that the spatial subdivision is computed off-line and an
auxiliary structure, the binary tree representing the subdivision, is used to determine an initial priority ordering for the object clusters. The motivation for this
work was to speed up the on-li ne hidden surface removal processing and enable
image generation to work in real time.
Dissatisfaction with the bounding volume or extent approach, to reducing
the number of ray object intersection tests, appears in part to have motivated the
development of spatial coherence methods (Kaplan 1985). One of the major
objections to bounding volumes has already been pointed out. Their 'efficiency'
is dependent on how well the object fills the space of the bounding volume. A
more fundamental objection is that such a scheme may increase the efficiency
of the ray-object intersection search, but it does nothing to reduce the dependence on the number of objects in the scene. Each ray must still be tested against
the bounding extent of every ob ject and the search time becomes a function of
scene complexity. Also, although major savings can be achieved by using a hierarchical structure of bounding volumes, considerable investment is requi red to
set up an appropriate hierarchy, and depending on the nature and disposition of
objects in the scene, a hierarchical description may be difficult or impossible.
The major innovation of methods described in this section is to make the rendering time constant (for a particular image space resolution) and eliminate its
dependence on scene complexity.
The various schemes that use the spatial coherence approach differ mainly in
the type of auxiliary data structure used. Kaplan (1985) lists six properties that a
practical ray tracing algorithm should exhibit if the technique is to be used in
routine rendering applications. Kaplan's requirements are:
then the number of intersection tests required for a region is small and does not
tend to increase with the complexity of the scene.
Tracking a ray using an octree
In order to use the space subdivision to determine which objects are close to a
ray, we must determine which subregion of space the ray passes through. This
involves tracking the ray into and out of each subregion in its path. The main
operation required during this process is that of finding the node in the octree,
and hence the region in space, that corresponds to a point (x, y, z).
The overall tracking process starts by detecting the region that corresponds to
the start point of the ray. The ray is tested for intersection with any objects that
lie in this region and if there are any intersections, then the first one encountered is the one required for the ray. lf there are no intersections in the initial
region, then the ray must be tracked into the next region through which it
passes. This is done by calculating the intersection of the ray with the boundaries of the region and thus calculating the point at which the ray leaves the
region. A point on the ray a short distance into the next region is then used to
find the node in the octree that corresponds to the next region. Any objects in
this region are then tested for intersections with the ray. The process is repeated
as the ray tracks from region to region until an intersection with an object is
found or until the ray leaves occupied space.
The simplest approach to finding the node in the octree that corresponds to
a point (x, y, z) is to use a data structure representation of the octree to guide the
search for the node. Starting at the top of the tree, a simple comparison of coordinates will determine which child node represents the subregion that contains
the point (x, y, z). The subregion, corresponding to the child node, may itself
have been subdivided and another coordinate comparison will determine which
of its children represents the smaller subregion that contains (x, y, z) . The search
proceeds down the tree until a terminal node is reached. The maximum number
of nodes traversed during this search will be equal to the maximum depth of the
tree. Even for a fairly fine subdivision of occupied space, the search length will
be short. For example, if the space is subdivided at a resolution of 1024 x 1024 x
1024, then the octree will have depth 10 (= logs(1024 x 1024 x 1024)).
So far we have described a simple approach to the use of an octree representation of space occupancy to speed up the process of tracking a ray. Two variations of this basic approach are described by Glassner (1984) and Fujimoto et al.
(1986). Glassner describes an alternative method for finding the node in the
octree corresponding to a point (x, y, z). In fact, he does not store th e structure
of the octree explicitly, but accesses information about the voxels via a hash
table that contains an entry for each voxel. The hash table is accessed using a
code number calculated from the (x, y, z) coordinates of a point. The overall ray
tracking process proceeds as described in our basic method.
In Fujimoto et al. (1986) another approach to tracking the ray through the
voxels in the octree is described. This method eliminates floating point multiplications and divisions. To understand the method it is convenient to start by
ignoring the octree representation. We first describe a simple data structure rep.
resentation of a space subdivision called SEADS (Spatially Enumerated Auxiliary
Data Structure). This involves dividing all of occupied space into equally sized
voxels regardless of occupancy by objects. The three-dimensional grid obtained
in this way is analogous to that obtained by the subdivision of a two.
dimensional graphics screen into pixels. Because regions are subdivided regard.
less of occupancy by objects, a SEADS subdivision generates many more voxels
than the octree subdivision described earlier. It thus involves 'unnecessary'
demands for storage space. However, the use of a SEADS enables very fast track.
ing of rays from region to region. The tracking algorithm used is an extension of
the DDA (Digital Differential Analyzer) algorithm used in two-dimensional
graphics for selecting the sequence of pixels that represent a straight line
between two given end points. The DDA algorithm used in two-dimensional
graphics selects a subset of the pixels passed through by a line, but the algorithm
can easily be modified to find all the pixels touching the line. Fujimoto et al.
(1986) describe how this algorithm can be extended into three-dimensional
space and used to track a ray through <l SEADS three-dimensional grid. The
advantage of the '3D-DDA' is that it does not involve floating point multiplication and division. The only operations involved are addition, subtraction and
comparison, the main operation being integer addition on voxel coordinates.
The heavy space overheads of the complete SEADS structure can be avoided
by returning to an octree representation of the space subdivision. The 3D-DDA
algorithm can be modified so that a ray is tracked through the voxels by traversing the octree. In the octree, a set of eight nodes with a common parent
node represents a block of eight adjacent cubic regions forming a 2 x 2 x 2 grid.
When a ray is tracked from one region to another within this set, the 3D-DDA
algorithm can be used without alteration. If a ray enters a region that is not represented by a terminal node in the tree, but is further subdivided, then the subregion that is entered is found by moving down the tree. The child node
required at each level of descent can be discovered by adj usting the control variables of the DDA from the level above. If the 3D-DDA algorith m tracks a ray out
of the 2 x 2 x 2 region currently being traversed, then the octree must be traversed upwards to the parent node representing th e complete region. The 3DDDA algorithm then continues at this level, tracking the ray within the set of
eight regions containing the parent region. The upward and downward traversals of the tree involve multiplication and division of the DDA control variables
by 2, but this is a cheap operation.
Finally, we summarize and compare the three spatial coherence meth ods by
listing their most important efficiency attributes:
Octrees: are good for scenes whose occupancy density varies widely regions of low density will be sparsely subdivided, high density regions will
be finely subdivided. However, it is possible to have small objects in large
regions. Stepping from region to region is slower than with the other two
methods because the trees tend to be unbalanced.
SEADS: stepping is faster than an octree but massive memory costs are
incurred by the secondary data structure.
BSP: the depth of the tree is smaller than an octree for most scenes because
the tree is balanced. Octree branches can be short, or very long for regions
of high spatial occupancy. The memory costs are generally lower than those
of an octree. Void areas will tend to be smaller.
In this unique sch eme, suggested by Arvo and Kirk (1987), instead of subdividing object space according to occupancy, ray space is subdivided into fivedimensional hypercubic regions. Each hypercube in five-dimensional space is
associated with a candidate list of ob jects for intersection. That stage in ob ject
space subdivision schemes where th ree-space calculations have to be invoked to
track a ray through object space is now eliminated. The hypercube that contains
the ray is found and this yields a complete list of all the objects that can intersect the ray. The cost of the intersection testing is now traded against higher
scene pre-processing complexity.
A ray can be considered as a single point in five-dimensional space. It is a line
with a three-dimensional origin together with a direction that can be specified
by two angles in a unit sphere. Instead of u sing a sphere to categorize direction,
Arvo and Kirk (1987) use a 'direction cube'. (This is exactly the same tool as the
light buffer used b y Haines an d Greenberg (1986) - see Section 12.1.3.) A ray is
thus specified by the 5-tuple (x, y, z, u, v), where x, y, z is the origin of the ray
and 11, v the direction coordinates; together with a cube face label that indicates
which face of the direction cube the ray passes through. Six copies of a fivedimensional hypercube (one for each direction cube face) thus specify a collection of rays h aving similar origins and similar directions.
This space is subdivided according to object occupancy and candidate lists are
constructed fo r the subdivided regions. A 'hyper-octree' - a five-dimensional
analog of an octree - is used for the subdivision.
To construct candidate lists as five-dimensional space is subdivided, the threedimensional equivalent of the hypercube must be used in three-space. This is a
'beam' or an unbounded three-dimensional volume that can be considered the
union of the volume of ray origins and the direction pyramid formed by a ray origin and its associated direction cell (Figure 12.1 0). Note that the beams in threespace will everywhere intersect each other, whereas their hypercube equivalents in
five-space do not intersect. This is the crux of the method - the five-space can be
subdivided and that subdivision can be acheived using binary partitioning.
However, th e construction of the candidate lists is now more difficult than with
object space subdivision schemes. The beams must be intersected with the bounding volumes of objects. Arvo and Kirk (1987) report that detecting polyhedral
intersections is too costly and suggest the approximation where beams are represented or bounded by cones interacting with spheres as object bounding volumes.
Figure 12.10
Figure 12.11
Reflection in beam tracing.
(~)
Incident beam
Renected beam
A single ray
specified by
(X, y, z. u. v)
Reflection (and refraction) are modelled by calling the beam tracer recursively. A n ew beam is generated for each beam- object intersection. The crosssection of an y refelected beam is defined by the area of the polygon clipped by
the incident beam and a virtual eye po int (Figure 12.11 ).
Apart from the restriction to polygonal objects the approach has other disadva ntages. Beam s that partially intersect objects change into beams with complex
cross-sections. A cross-section can become disconnected or may contain a hole
(Figure 12.12). An other disadvantage is that refraction is a non-linear phenomenon and the geometry of a refracted beam will not be preserved. Refraction
therefore, has to be approximated using a linear transformation.
Another approach to beam tracing is the pencil technique of Shinya et al.
(1987). In this method a pencil is formed from rays called 'paraxial rays'. These
are rays that are near to a reference ray called an axial ray. A paraxial ray is represented by a four-dimensional vector in a coordinate system associated with the
axial ray. Paraxial approximation theory, well known in optical design and electromagnetic analysis, is then used to trace the paraxial rays through th e environment. This means that for any rays that are n ear the axial ray, the pencil
transformations are linear and are 4 x 4 matrices. Error analysis in paraxial
theory supplies functions that estimate erro rs and provide a constraint fo r the
spread angle of the pencil.
Figure 12.12
(~)
The 4 x 4 system matrices are determined by tracing the axial ray. All the
paraxial rays in the pencil can then be traced using these matrices. The paraxial
approximation theory depends on surfaces being smooth so that a paraxial ray
does not suddenly diverge because a surface discontinuity has been encountered.
This is the main disadvantage of the method.
An approach to ray coherence that exploits the similarity between the inter.
section trees generated by successive rays is suggested by Speer et al. (1986). This
is a direct approach to beam tracing and its advantage is that it exploits ray
coherence without introducing a new geometrical entity to replace the ray. The
idea here is to try to use the path (or intersection tree) generated by the previous
ray, to construct the tree for the current ray (Figure 12.13). As the construction
of the current tree proceeds, information from the corresponding branch of the
previous tree can be used to predict the next object h it by the current ray. This
means that any 'new' intervening object must be detected as shown in Figure
12.14. To deal with this, cylindrical safety zones are constructed around each ray
in a ray set. A safety zone for ray,_z is shown in Figure 12.1 5. Now if the current
ray does not pierce the cylinder of the corresponding previous ray, and this ray
intersects the same object, then it cannot intersect any n ew intervening objects.
If a ray does not pierce a cylinder, then new intersection tests are required as in
standard ray tracing, and a new tree that is different from the previous tree, is
constructed.
In fact, Speer eta/. (1986) report that this method suffers from the usual computational cost paradox- the increase in complexity necessary to exploit the ray
coherence properties costs more than the standard ray tracing as a fu nction of
scene complexity. This is despite the fact that two-thirds of the rays behave
coherently. The reasons given for this are the cost of maintaining and piercechecking the safety cylinders, whose average radius and length decrease as a
function of scene complexity.
Figure 12.14
Ray,
Figure 12.15
Figure 12.13
Many people associate the term ' ray tracing' with a novel technique b ut, in fact,
it h as always been part of geometric optics. For example, an early use of ray tracing in geometric optics is found in Rene Descartes' treatise, published in 1637,
explaining the shape of the rainbow. From experimental observations involving
a spherical glass flask filled with water, Descartes used ray tracing as a theoretical framework to explain the phenomenon. Descartes used the already known
laws of reflection and refraction to trace rays through a spherical drop of water.
Rays entering a sph erical water drop are refracted at the first air- water interface, internally reflected at the water- air interface and finally refracted as they
emerge from th e drop. As shown in Figure 12.16, horizontal rays entering the
Figure 12.16
11
10~~~~~~~~~~~~=-87
9-
Figure 12.17
(~)
fOrmation of a rambow.
From sun
6 ----'-------;.,<.__
5-----...,.__
4--------,f.-_
3=~-------======~
2~--r---------------~~
From s un
drop above the horizontal diameter emerge at an increasing angle with respect
to the incident ray. Up to a certain maximum the angle of the exit ray is a function of the height of the incident ray above the horizontal diameter. This trend
continues up to a certain ray, when the behaviour reverses and the angle
between the incident and exit ray decreases. This ray is known as the Descartes
ray, and at this point the angle between the incident and exit ray is 42. Incident
rays close to the Descartes ray emerge close to it and Figure 12.16 shows a con
centration of rays around the exiting Descartes ray. It is this concentration of
rays that makes the rainbow visible.
Figure 12.1 7 demonstrates the formation of the rainbow. An observer looking
away from the sun sees a rainbow formed by '42' rays from the sun. The paths
of such rays form a 42 'hemicone' centred at th e observer's eye. (An interesting
consequence of this model is that each observer has his own personal
rainbow.)
Th is early, elegant use of ray tracing did not, however, explain that magical
attribute of the rainbow - colour. Thirty years would elapse before Newton discovered that white light contained light at all wavelengths. Along with the fact
that the refractive index of any material varies for light of different wavelengths,
Descartes' original model is easily extended. About 42 is the maximum angle for
red light, while violet rays emerge after being reflected and refracted through
40. The model can then be seen as a set of concentric hemicones, one fo r each
wavelength, centred on the observer's eye.
This simple model is also used to account for the fainter secondary rainbow.
This occurs at 51 o and is due to two internal reflections inside the water drops.
display will decrease the blue component, leaving the red and green components
unchanged.
Gamma correction leaves zero and maximum intensities unchanged and
alters the intensity in mid-range. A 'wrong' gamma that occurs either because
gamma correction has not been applied or because an inaccurate value of
gamma has been used in the correction will always result in a wrong image With
respect to the calculated colour.
16.1
16.2
16.3
16.4
View interpolation
16.5
16.6
Introduction
A new field with many diverse approaches, image-based rendering (IBR) is difficult to categorize. The motivation for the name is that most of the techniques
are based on two-dimensional imagery, but this is not always the case and the
way in which the imagery is used varies widely amongst methods. A more accurate common thread that runs through all the methods is pre-calculation. All
methods make cost gains by pre-calculating a representation of the scene from
which images are derived at run-time. IBR has mostly been studied for the common case of static scenes and a moving view point, but applications for dynamic
scenes have been developed.
There is, however, no debate concerning the goal of IBR which is to decouple
rendering time from scene complexity so that the quality of imagery, for a given
frame time constraint, in applications like computer games and virtual reality
can be improved over conventionally rendered scenes where all the geometry is
reinserted into the graphics pipeline whenever a change is made to the view
point. It has emerged, simultaneously with LOD approaches (see Chapter 2) and
scene management techniques, as an effective means of tackling the dependency
of rendering time on scene complexity.
()
20
20
Rendering
process
Figure 16.1
Planar imposters and image
warping.
Impostor is the name usually given to an image of an object that is used in the
form of a texture map - an entity we called a billboard in Chapter 8. In Chapter
8 the billboard was an object in its own right - it was a two-dimensional entity
inserted into the scene. Impostors are generalizations of this idea. The idea is
that because of the inherent coherence in consecutive frames in a moving view
point sequence, the same impostor can be reused over a number of frames until
an error measure exceeds some threshold. Such impostors are sometimes qualified by the adjective dynamic to distinguish them from pre-calculated object
images that are not updated. A planar sprite is used as a texture map in a normal
rendering engine. We use the adjective planar to indicate that no depth information is associated with the sprite- just as there is no depth associated with a
texture map (although we retain depth information at the vertices of the rectangle that contains the sprite). The normal (perspective) texture mapping in the
renderer takes care of warping the sprite as the view point changes.
There are many different possible ways in which sprites can be incorporated
into a rendering sequence. Schaufler's method (Schaufler and Sturzlinger 1996)
is typical and for generating an impostor from an object model it proceeds as follows. The object is enclosed in a bounding box which is projected onto the
image plane resulting in the determination of the object's rectangular extent in
screen space - for that particular view. The plane of the impostor is chosen to be
that which is normal to the view plane normal and passes through the centre of
the bounding box. The rectangular extent in screen space is initialized to transparent and the object rendered into it. This is then treated as a texture map and
placed in the texture memory. When the scene is rendered the object is treated
as a transparent polygon and texture mapped. Note that texture mapping takes
into account the current view transformation and thus the impostor is warped
slightly from frame to frame. Those pixels covered by the transparent pixels are
unaffected in value or z depth. For the opaque pixels the impostor is treated as
a normal polygon and the Z-buffer updated with its depth.
In Maciel and Shirley (1995) 'view-dependent impostors' are pre-calculatedone for each face of the object's bounding box. Space around the object is then
divided into view point regions by frustums formed by the bounding box faces
and its centre. If an impostor is elected as an appropriate representation then
whatever region the current view point is in determines the impostor used.
Calculating the validity of planar impostors
@)
The magnitude of the error depends on the depth variation in the region of
the scene represented by the impostor, the distance of the region from the View
point and the movement of the view point away from the reference position
from which the impostor was rendered. (The distance factor can be gainfully
exploited by using lower resolution impostors for distant objects and grouping
more than one object into clusters.) For changing view point applications the
validity has to be dynamically evaluated and new impostors generated as
required.
Shade eta/. (1996) use a simple metric based on angular discrepancy. Figure
16.2 shows a two-dimensional view of an object bounding box with the plane of
the impostor shown in bold. vo is the view point for the impostor rendering and
v 1 is the current view point. xis a point or object vertex which coincides with x'
in the impostor view. Whenever the view point changes from Vo, x and x' subtend an angle e and Shade eta/. calculate an error metric which is the maximum
angle over all points x.
Schaufler and Sturzlinger's (1996) error metric is based on angular discrepancy
related to pixel size and the consideration of two worst cases. First, consider the
angular discrepancy due to translation of the view point parallel to the impostor
plane (Figure 16.3(a)). This is at a maximum when the view point moves normal
to a diagonal of a cube enclosing the bounding box with the impostor plane
coincident with the other diagonal. When the view point moves to 111 the points
x', X1 and xz should be seen as separate points. The angular discrepancy due to
this component of view point movement is then given by the angle e,,..,
between the vectors l'JXJ and V1Xz. As long as this is less that the angle subtended
by a pixel at the view point this error can be tolerated. For a view point moving
towards the object we consider the construction in Figure 16.3(b). Here the worst
case is the corner of the front face of the cube. When the view point moves in
to 111 the points x 1 and xz should be seen as separate and the angular discrepancy
is given as BsJze- An impostor can then be used as:
use_imposter := (8trans <
8 screcn)
or
(8size
< 8screen)
Bounding box
Imposter plane
XI
Figure 16.3
Schaufler's worst case
angular discrepancy metric
(after Shaufler ( 1996)).
(a) Translation of view point
parallel to an impostor.
(b) Translation of view point
towards an impostor plane.
x'
Xt ~
\
Vi~ VO
(a)
(b)
ew,
~ r~
''
\i
vo
''
where:
field of view
Bscreen
= screen resoJution
VI
vo
Current
v iew point
Imposter
view point
An important technique that has been used in con junction with ZD imagery is
the allocation of different amounts of rendering resources to differen t parts of
the image. An influential (h ardware) approach is due to Regan and Pose (1994).
They allocated different frame rates to objects in the scene as a function of their
distance from the view point. This was called priority rendering because it co111_
bined the environment map approach with updating the scene at different rates.
They use a six-view cubic environment map as the basic pre-computed solution.
In addition, a multiple display memory is used for image composition and on
the fly alterations to the scene are combined with pre-rendered imagery.
The method is a hybrid of a conventional graphics pipeline approach with an
image-based approach. It depends on dividing the scene into a priority hierarchy. Objects are allocated a priority depending on their closeness to th e current
position of the viewer and their allocation of rendering resources and update
time are determined accordingly. The scene is pre-rendered as environment
maps and, if the viewer remains stationary, no changes are made to the enVj.
ronment map. As the viewer changes position the new environment map from
the new view point is rendered according to the priority scheme.
Regan and Pose (1994) utilize multiple display memories to implement priority rendering where each display memory is updated at a different rate according to the information it contains. If a memory contains part of the scene that
is being approached by a user then it has to be updated, whereas a memory that
contains information far away from the current user position can remain as it is.
Thus overall different parts of the scene are updated at different rates - hence
priority rendering. Regan and Pose (1994) use memories operating at 60, 30, 15,
7.5 and 3.75 frames per second. Rendering power is directed to those parts of the
scene that need it most. At any instant the objects in a scene would be organized
into display memories according to their current distance from the user.
Simplistically the occupancy of the memories might be arranged as concentric
circles emanating from the current position of the user. Dynamically assigning
each object to an appropriate display memory involves a calculation which is
carried out with respect to a bounding sphere. In the end this factor must impose
an upper bound on scene complexity and Regan and Pose (1994) report a test
experiment with a test scene of only 1000 objects. Alternatively objects have to
be grouped into a hierarchy and dealt with through a secondary data structure
as is done in some speed-up approaches to conventional ray tracing.
(16.2.2)
Figure 16.4
The layer approach of
Lengyel and Snyder (after
Lengyel and Snyder (1 99 7)).
Rendering resources are
allocated to perceptually
important parts of the scene
(layers). Slowly changing
layers are updated at a
lower frame rate and at
lower resolution.
Another key idea of Lengyel and Snyder's work is that any layer can itself be
decomposed into a number of components. The layer approach is taken into the
shading itself and different resources given to different components in the shading. A moving ob ject may consist of a diffuse layer plus a highlight layer plus a
shadow layer. Each com ponent produces an image stream and a stream of twodimensional transformations representing its translation and warping in image
space. Sprites may be represented at different resolutions to the screen resolution
and may be updated at different rates. Thus sprites have different resolution in
both space and time.
A sprite in the context of this work is now an 'independent' entity rather than
being a texture map tied to an object by the normal vertex/texture coordinate
association. It is also a pure two-dimensional object - not a two-dimensional
part (a texture map) of a three-dimensional object . Thus as a sprite moves the
appropriate warping has to be calculated.
In effect the traditional rendering pipeline is split into 'parallel' segments
each representing a different part of the image (Figure 16.4). Different quality
settings can be applied to each layer which manifests in different frame rates and
different resolutions for each layer. The layers are then combined in the compositor with transparency or alpha in depth order.
A sprite is created as a rectangular entity by establishing a sprite rendering
transform A such that the projection of the object in the sprite domain fits
tightly in a bounding box. This is so that points within the sprite do n ot sample
non-object space. The transform A is an affine transform that maps the sprite
onto the screen and is determined as follows. If we consider a point in screen
space p, then we have:
p,= Tp
where p is the equivalent object point in world space and T is the concatenation
of the modelling, viewing and projection transformations.
We require an A such that (Figure 16.5):
Ps = A- 1ATp = A q
Image layering
Lengyel and Snyder (1997) generalized the concept of impostors and variable application of rendering resources calling their technique 'coherent image layers'. Here
the idea is again to devote rendering resources to different parts of the image according to need expressed as different spatial and/or temporal sampling rates. The technique also deals with objects moving with respect to each other. This is done by
dividing the image into layers. (This is, of course, an old ideai since the 1930s cartoon
production has been optimized by dividing the image into layers which are worked
on independently and composed into a final film.) Thus fast-moving foreground
objects can be allocated more resources than slow-moving background objects.
Layers
Compositor
Figure 16.5
The sprite rendering
transform A.
..._
Object space
Sprite space
Image space
1~1~ [? H~I
q =A-'Tp
Figure 16.6
The effect of the rigid
motion of the points in the
bOunding polyhedron in
screen space is expressed
as a change in the affine
trilnsform A (after Lengy el
@)
Object's characteristic
bounding polyhedron
[_T__ I/
:' ~ ~
ps =Aq
Frame,
A= [ abtxl
c d ty
= maxi!P; - Ap';ll
i
where A p'; is a set of characteristic points in the layer in the current frame warped
into their position from the previous frame and p; the position the points actually
occupy. (Th ese are always transformed by T, the modelling, viewing and perspective transform in order to calculate the warp. This sounds like a circular argument
but finding A (previous section) involves a best fit procedure. Remember that the
warp is being used to approximate the transformation T.) Thus a threshold can be
set and the layer considered for re-rendering if this is exceeded.
For changes due to relative motion between the light source and the object
represented by the layer, the angular change in L , the light direction vector from
the object, can be computed.
Finally, a metric associated with the magnification/minifi cation of the layer
has to be computed. If the relative movement between a viewer and object is
such that layer samples are stretched or compressed then th e layer may need to
be re-rendered. This operation is similar to determining the depth parameter in
mip-mapping and in this case can be compu ted from the 2 x 2 sub-matrix of the
affine transform.
After a frame is complete a regulator considers resource allocation for the next
frame. This can be done either on a 'budget-fi lling' basis where the scene quality is maximized or on a threshold basis where the error th resholds are set to the
highest level the user can tolerate (freeing rendering resources for other tasks) .
The allocation is made by evaluating the error criteria and estimating th e
rendering cost per layer based on the fraction of the rendering budget consumed
by a particular layer. Layers can th en be sorted in a benefit/cost order and
re-rendered or warped by the regulator.
*!
...
Three-dimensional warping
f(x', y')
which implies a reference pixel will move to a new destination. (This is a simple
statement of the problem which ignores important practical problems that we
shall address later.) If we assume that the change in the view point is specified
by a rotation R == [r;;] followed by a translation T == (Lix, .G.y, .G.z )T of the view coordinate system (in world coordinate space) and that the internal parameters of
the viewing system/camera do not change - the focal length is set to unity- then
the warp is specified by:
x'
(16.1]
y' =
where:
Z(x, y) is the depth of the point P of which (x, y) is th e projection.
Xv
y' == Yr
Zv
Zv
where (xv, yv, Zv) are the coordinates of the point P in the n ew viewing system.
A visualization of this process is shown in Figure 16.7 .
We now consider the problems tha t occur with this process. The first is called
image folding or topological folding and occurs when more than one pixel in
the reference image maps into position (x', y') in the extrapolated image (Figure
16.8(a)). The straightforward way to resolve this problem is to calculate Z (x', y')
from Z(x, y) but this requires an additional rational expression and an extra Zbuffer to store the results.
McMillan (1995) has developed an algorithm that specifies a unique evaluation order fo r computing the warp fu nction such that surfaces are drawn in a
back-to-front order thus enabling a simple painter's algorithm to resolve this visibility problem. The intuitive justification for this algorithm can be seen by considering a simple special case shown in Figure 16.9. In this case the view point
has moved to the left so that its projection in the image plane of the reference
view coordinate system is outside and to the left of the reference view window.
This fact tells us that the order in which we need to access pixels in the reference
is from right to left. This then resolves the problem of the leftmost pixel in the
reference image overwriting th e right pixel in the warped image. McMillan
shows that the accessing or enumeration order of the reference image can be
reduced to nine cases depending on the position of the projection of the new
view point in the reference coordinate system. These are shown in Figure 16.10.
The general case, where the new view point stays within the reference view window divides the image into quadrants. An algorithm structure th at utilizes this
method to resolve depth problems in the many-to-one case is thus:
(~)
Figure 16.7
Athree-dimensio nal w arp is
calculated f rom rot ation R
and translat ion T applied to
the view coordinate system.
Figure 16.8
Problems in image warping .
(a) Image folding: more
than one pixel in the
reference view maps
into a single pixel in the
extrapolated view. (b) Holes:
information occluded in the
refere nce view is required
in t he extrapolated view.
(c) Holes: the projected area
of a surface increases in the
extrapolated view because
its normal rotates towards
the viewing directio n.
(d) See Colour Plate section.
(~)
Figure 16.9
The view point translates
to the left so that the
projection o~ t he new view
point in the 1mage p lane
of the reference view
coordinate system is to the
left of the reference view
window. The correct
processing order of the
reference pixels is from right
to left.
Warped
image
Reference image
(b)
Processing order
neighbouring pixels. The extent of the holes problem depends on the difference
between the reference and extrapolated view points and it can be ameliorated by
considering more than one reference image, calculating an extrapolated image
from each and compositing the result. Clearly if a sufficient number of reference
images are used then the hole problem will be eliminated and there is no need
for a local solution which may insert erroneous information.
A more subtle reason for unassigned pixels in the extrapolated image is apparent if we consider surfaces whose normal rotates towards the view direction in
Extrapolated
DDDDGJ
(1) Calculate the projection of the new view point in the reference coordinate
system.
(2) Determine th e enumeration order (one out of the nine cases shown in
Figure 16.10) depending on the projected point.
b;d ~~ Q
(3) Warp the reference image by applying Equation 16.1 and writing the result
into the frame buffer.
The second problem produced by image warping is caused wh en occluded areas
in the reference image 'need' to become visible in the extrapolated image (Figure
16.8(b)) producing holes in the extrapolated image. As the figure demonstrates,
holes and folds are in a sense the inverse of each other, but where a deterministic solution exists for folds no theoretical solution exists for holes and a heuristic n eeds to be adopted- we cannot recover informa tion that was not there in
the first place. However, it is easy to detect where holes occur. They are simply
unassigned pixels in the extrapolated image and this enables the problem to be
localized and the most common solution is to fill them in with colours from
Figure 16.10
Avisualization of McMillan's
priority algorithm indica ting
the correct processing order
as a function of view point
motion for nine cases (after
McMillan (1995)).
D DJ D c!J
[3
c:J EJ
Projection of new
view point in the
image pl ane of
reference view point
-motion in y
the new view system (Figure 16.8(c)). The projected area of such a surface into
the extrapolated image plane will be greater than its projection in the reference
image plane and for a one-to-one forward mapping holes will be produced. This
suggests that we must take a rigorous approach to reconstruction in the inter.
polated image. Mark et al. (1997) suggest calculating the appropriate dimension
of a reconstruction kernel, for each reference pixel as a fu nction of the view
point motion but they point out that this leads to a cost per pixel that is greater
than the warp cost. This metric is commonly known as splat size (Chapter 13)
and its calculation is not straightforward for a single reference image with z
depth only for visible pixels. (A method that stores multiple depth values for a
pixel is dealt with in the next section.)
The effects of these problems on an image are shown in Figure 16.8(d)
(Colour Plate). The first two images show a simple scene and the corresponding
Z-buffer image. The next image shows the artefacts due to translation (only). In
this case these are holes caused by missing information and image folding. The
next image shows artefacts due to rotation (only) -holes caused by increasing
the projected area of surfaces. Note how these form coherent patterns. The final
image shows artefacts caused by both rotation and translation.
Finally, we note that view-dependent illumination effects will n ot in general
be handled correctly with this simple approach. This, however, is a problem that
is more serious in image-based modelling methods (Section 16.6). As we have
already noted in image warping we must have reference images whose view
point is close to the required view point.
(1 6 .3.2)
Figure 16.11
A represen tation of a
with each source view are compared and enable the layers to be sorted in depth
order.
An alternative approach which facilitates a more rigorous sampling of th e
scene is to use a modified ray tracer. Th is can be done simplistically by initiating
a ray for each pixel from the LDI view point and allowing the rays to penetrate
th e object (rather than being reflected or refracted). Each hit is then recorded
as a new depth pixel in the LDI. All of the scene can be considered by precalculating six LOis each of which consists of a 90 frustum centred on the reference view point. Shade et al. (1998) point out that this sampling scheme is not
uniform with respect to a hemisphere of directions cen tred on the view point.
Neighbouring pixel rays project a smaller area on to the image plane as a function of the angle between the image plane normal and the ray direction and they
weight the ray direction by the cosine of that angle. Thus, each ray has four
coordinates: two pixel coordinates and two angles for the ray direction. The
algorithm structure to calculate the LOis is then:
(1) For each pixel, modify the direction and cast the ray in to the scen e.
(2) For each hit: if the intersected objects lies within the LDI frustum it is reprojected through the LDI view point.
(3) If the n ew hit is within a tolerance of an existing depth pixel the colour of
the new sample is averaged with the existing one; otherwise a new depth
pixel is created.
During the rendering phase, an incremental warp is applied to each layer in back
to front order and images are alpha blended into the frame buffer without the
need for Z sorting. McMillan 's algorithm (see Section 16.3. 1) is used to ensure
VIEW INTERPOLATION
the pixel motion between reference frames. This is a dense set of motion vectors
that relates a pixel in the source image to a pixel in the destination image. The
simplest example of a m otion field is that due to a camera translating parallel to
its image plane. In that case the motion field is a set of parallel vecto rs - one for
each pixel- with a direction opposite to the cam era motion and having a magnitude proportional to t he depth of the pixel. This pixel-by-pixel correspond ence can be determined for each pair of images since the three-dimensional
(image space) coordinates of each pixel is known, as is the camera or view point
m o tion. The determina tion of warp scripts is a pre-processing step and an interior is finally represented by a set of reference images together with a warp script
relating every adjacent pair. For a large scene that requires a number of varied
walkthroughs the total storage requirement m ay be very large; however, any
derived or interpolated view only requires the appropriate pair of reference
images and the warp script.
At run time a view or set of views between two reference images is then
reduced to linear interpolation. Each pixel in both the source and destination
images is moved along its motion vector by the amount given by linearly interpolating the image coordinates (Figure 16.13). This gives a pair of interpolated
images. These can be com posited and using a pair of images in this way reduces
the hole problem. Chen and Williams (1993) fill in remaining holes with a procedure t hat uses the colour local to the hole. Overlaps are resolved by using a Zbuffer to determine the nearest surface, the z values being linearly interpolated
along with the (x, y) coordinates. Finally, note that linear interpolation of the
motion vectors produces a warp which will not be exactly the same as that produced if the camera was moved into the desired position. The method is o nly
exact from the special case of a cam era translating parallel to its image plane.
Williams and Chen (1993) point out that a better approximation can be
obtained by quadratic or cubic interpolation in the image plane.
Surface
Figure 16.12
Parameters used in splat size
computation (after Shade et
a/. (1 998)).
S urface
nonnal
Output
camera
LDI
camera
that the pixels are selected for warping in the correct order according to the projection of the output camera in the LDI's system.
To enable splat size computation Shade et al. (1998) use the following formula
(Figure 16.12):
2
.
SlZe
2fOVt
z
fovz
dz cos St res t tan 2
where:
size is the dimension of a square kernel (in practice this is rounded to 1, 3, 5
or 7)
the angles e are approximated as the angles <j>, where <P is t he angle between
the surface normal and the z axis of the camera system
fov is the field of view of a camera
res = w*h (the width and height of the LDI)
View interpolation
View interpolation techniques can be regarded as a subset of 30 warping methods. Instead of extrapolating an image from a reference image, they interpolate
a pair of reference images. However, to do this three-dimensional calculations
are necessary. In the light of our earlier two-dimensional/three-dimensional categorization they could be considered a two-dimensional technique but we have
decided to emphasize the interpolation aspect and categorize them separately.
Williams and Chen (1993) were the first to implement view interpolation for
a walkthrough application. This was achieved by pre-computing a set of reference images representing an interior - in this case a virtual museum. Frames
required in a walkthrough were interpolated at run time from these reference
frames. The interpolation was achieved by storing a 'warp script' that specifies
(~)
Reference image I
Reference image 2
Interpolated pixel
@)
( 16.4.1)
View morphing
Up to now we have considered techniques that deal with a moving view point
and static scenes. In a development that they call view morphing Seitz and Dyer
(1996) address the problem of generating in-between images where non-rigid
transformations have occurred. They do this by addressing the approximation
implicit in the previous section and distinguish between 'valid' and 'non-valid'
in-between views.
View interpolation by warping a reference image into an extrapolated image
proceeds in two-dimensional image plane space. A warping operation is just that
- it changes the shape of the two-dimensional projection of objects. Clearly the
interpolation should proceed so that the projected shape of the objects in the reference projection is consistent with their real three-dimensional shape. In other
words, the interpolated view must be equivalent to a view that would be generated in the normal way (either using a camera or a conventional graphics
pipeline) by changing the view point from the reference view point to that of the
interpolated view. A 'non-valid' view means that the interpolated view does not
preserve the object shape. If this condition does not hold then the interpolated
views will correspond to an object whose shape is distorting in real threedimensional space. This is exactly what happens in conventional image morphing between two shapes. 'Impossible', non-existent or arbitrary shapes occur as
in-between images because the motivation here is to appear to change one object
into an entirely different one. The distinction between valid and invalid view
interpolation is shown in Figure 16.14.
An example where linear interpolation of images produces valid interpolated
views is the case where the image planes remain parallel (Figure 16.15).
Physically, this situation would occur if a camera was allowed to move parallel
to its image plane (and optionally zoom in and out). If we let the combined
viewing and perspective transformations (see Chapter 5) be Vo and Vt for the
two reference images then the transformation for an in-between image can be
obtained by linear interpolation:
V ; ::
(1 - s) V o + SV t
:: VP
In other words linear interpolation of pixels along a path determined by pixel
correspondence in two reference images is exactly equivalent to projecting
the scene point that resulted in these pixels through a viewing and projective
transformation given by an intermediate camera position, provided parallel
views are maintained, in other words using the transformation V1, which would
VIEW INTERPOLATION
Figure 16.14
Distinguishing between
valid and invalid view
interpolation. In (a), using
astandard (morphing)
approach of linear
interpolation produces
gross shape deformation
(this does not matter if we
are morphing betw een
tWO different objects - i t
beComes part of the effect).
(b) The interpolated (or
morphed view) is consistent
with object shape.
(Courtesy of Steven Seitz.)
(~)
(a)
-,,
I .'
... __
--
Virrual Cameras
..._
--
Morptt View
(b)
be obtained if Vo and Vt were linearly interpolated. Note also that we are interpolating views that would correspond to those obtained if we had moved the
camera in a straight line from Co to Ct. In other words the interpolated view
corresponds to the camera position:
C = (sCx, sCr, 0)
If we have reference views that are not related in this way th en the interpolation
has to be preceded (and followed) by an extra transformation. This is the general
situation where the image planes of the reference views and the image plane of
the required or interpolated view have no parallel relationship. The first transformation, which Seitz and Dyer call a 'prewarp', warps the reference images so
that they appear to have been taken by a camera moving in a plane parallel to
its image plane. The pixel interpolation, or morphing, can the be performed as
in the previous paragraph and the result of this is postwarped to form the final
interpolated view, which is the view required from the virtual camera position.
(~)
FOUR-DIMENSIONAL TECHNIQUES -THE LUM IGRAPH OR LIGHT FIELD REND ER ING APPROACH
Figure 16.15
Moving the camera from Co
to c, (and zooming) means
that the image planes
remain parallel and P, can
be linearly interpolated from
Po and p, (after Seitz and
Dyer (1996)).
@)
Reference image
Co
(1) Prewarp R o and R, using To"1 and Tt- 1 to produce R o' and R t'.
Reference image
Figure 16.16
Prewarping reference
images, interpolating and
postwarping in view
interpolation (after Seitz
and Dyer (1996)).
Ro(reference
image)
~
-'
~--
[j'J
,,/'
~..
R , (reference
',
tmage)
fiR .
/~
, "'// '
Cn '-....
R, '
, /'
-----------------
,
~
,,'' / /
--.;_
Three-step transformation
sequence
C, ---------~ -......
............ . ......
.......... .._
....................... . . .
-.... ~
c,
Up to now we have considered systems that have used a single image or a small
number of reference images from which a required image is generated. We have
looked at two-dimensional techniques and methods where depth information
has been used -three-dimensional warping. Some of these methods involve precalculation of a special form of rendered imagery (LDis) others post-process a
conventionally rendered image. We now come to a method that is an almost
total pre-calculation technique. It is an approach that bears some relationship to
environment mapping. An environment map caches all the light rays that arrive
at a single point in the scene - the source or reference point for the environment
map. By p lacing an object at that point we can (approximately) determine those
light rays th at arrive at the surface of the object by indexing into the map. This
scheme can be extended so that we store in effect an environment ma p for every
sampled point in the scene. That is, for each point in the scene we have knowledge of all light rays arriving at that point. We can now place an object at
any point in the scen e and calculate the reflected light. The advantage of
this approach is that we now minimize most of the problems related to threedimensional warping at the cost of storing a vast amount of data.
A light field is a similar approach. For each and every point of a region in the
scene in which we wish to reconstruct a view we pre-calculate and store or cache
the radian ce in every direction at that point. This representation is called a light
field or Lumigraph (Levay and Hanrahan 1996; Gortler et a!. 1996) and we construct for a region of free space by which is meant a region free of occluders. The
importance of free space is that it reduces the light field fro m a fivedimensional to a four-dimensional function. In general, for every point (x, y, z)
in scene space we have light rays travelling in every direction (parametrized by
two angles) giving a five-dimensional function . In occluder free space we can
assume (unless there is atmospheric interaction) that the radiance along a ray is
constant. The two 'free space scenes' of interest to us are: viewing an object from
@)
anywhere outside its convex hull and viewing an environment such as a room
from somewhere within its (empty) interior.
The set of rays in any region in space can be parametrized by their intersection with two parallel planes and this is the most convenient representation for
a light .field (Figure 16.17(a)). The planes can be position ed anywhere. For exam.
pie, we can position a pair of planes parallel to each face of a cube enclosing an
object and capture all the radiance information due to the object (Figure
16.7(b)). Reconstruction of any view of the object then consists of each pixel in
the view plane casting a ray through the plane pair and assigning L(s, t, u, v) to
that pixel (Figure 16.7(c)). The reconstruction is essentially a resampling process
and unlike the methods described in previous sections it is a linear operation.
Light fields are easily constructed from rendered imagery. A light field for a
single pair of parallel planes placed near an object can be created by moving the
camera in equal increments in the (s, t) plane to generate a series of sheared
perspective projections. Each camera point (s, t) then specifies a bundle of rays
arriving from every direction in the frustum bounded by the (u, v) extent. It
could be argued that we are simply pre-calculati ng every view of th e object that
we require at run time; however, two factors mitigate this brute-force approach.
First, the resolution in the (s, t) plane can be substantially lower than the resolution in the (u, v) plane. If we consider a point on the surface of the object coincidence, say, with the (11, v) plane, then the (s, t) plane contains the reflected
L (s , t, u, v)
Ll
(a)
Figure 16.17
Light field rendering using
parallel plane representation
for rays. (a) Parametrization
of a ray using parallel
planes . (b) Pairs of planes
positioned on the face
of a bounding cube can
represent all the radiance
information due to an
object. (c) Reconstruction
for a single pixel l(x,y).
(b)
(c)
light in every direction (constrained by the (s, t) plane extent). By definition, the
radiance at a single point on the surface of an object varies slowly with direction
and a low sampling frequency in the (s, t) plane will capture this variation. A
higher sampling frequen cy is required to calculate the variation as a function of
position on the surface of the object. Second, there is substantial coherence
exhibited by a light field. Levay and Hanrahan (1996) report a compression ratio
of 118:1 for a 402 Mb light field and conclude th at given this magnitude of
compression the simple (linear) re-sampling scheme together with simplicity
advantages over other IBR methods make light fields a viable proposition.
camera for each shot and a sufficient number of shots to capture the structure Of
the building, say, that is being modelled. Extracting the edges from the shots of
the building enables a wireframe model to be constructed. This is usually done
semi-automatically with an operator matching corresponding edges in the different projections. It is exactly equivalent to the shape fro m stereo problem
using feature correspondence except t hat now we use a human being instead of
a correspondence-establishing algorithm. We may end up performing a large
amount of manual work on the projections, as much work as would be entailed
in using a conventional CAD package to construct the building. The obvious
potential advantage is that photo-modelling offers the possibility of automatically extracting the rich visual detail of the scene, as well as the geometry.
It is interesting to note that in modelling from photographs approach es, the
computer graphics community has side-stepped the most difficult problems that
are researched in computer vision by embracing some degree of manual intervention. For example, the classical problem of correspondence between images
projected from different view points is solved by having an operator manually
establish a degree of correspondence between fra mes which can enable the success of algorithms that establish detailed pixel-by-pixel correspondence. In com.
puter vision such approaches do not seem to be considered. Perhaps this is due
to well-established traditional attitudes in computer vision which has tended to
see the imitation of human capabilities as an ultimate goal, as well as constraints
from applications.
Using photo-modelling to capture detail has some problems. One is that the
information we obtain may contain light source and view-dependent phenomena such as shadows and specular reflections. These would have to be removed
before the imagery could be used generate the simulated environment from any
view point. Another problem of significance is that we may need to warp detail
in a photograph to fit the geometric model. This may involve expanding a very
small area of an image. Consider, for example, a photograph - taken from the
ground - of high building with a detailed facade. Important detail information
near the top of the building may be mapped into a small area due to the projective distortion. In fact, this problem is identical to view interpolation .
Let us now consider the use of photo-modelling without attempting to
extract the geometry. We simply keep the collected images as two-dimensional
projections and use these to calculate n ew two-dimensional projections. We
never attempt to recover three-dimensional geometry of the scene (although it
is necessary to consider the three-dimensional information concerning the
projections). This is a form of image-based rendering and it has something of a
history.
Consider a virtual walk through an art gallery or m useum. The quality
requirements are obvious. The user needs to experience the subtle lighting con
ditions designed to best view the exhibits. These must be reproduced and suffi
cient detail must be visible in the paintings. A standard computer graphics
approach may result in using a (view-independent) radiosity solution for the
rendering together with (photographic) texture maps for the paintings. The
@)
radiosity approach, where the expensive ren dering calculations are performed
once only to give a view-independent solution may suffice in many contexts in
virtual reality, but it is not a general solution for scenes that contain complex
geometrical detail. As we know, a radiosity rendered scene has to be divided up
into as large elements as possible to facilitate a solution and th ere is always a
high cost fo r detailed scene geometry.
This kind of application -virtual tours around buildings and the like - has
already emerged with the bulk storage freedom offered by videodisk and CDROM. Th e inherent disadvantage of most approaches is that they do not offer
continuous movement or walkthrough but discrete views selected by a user's
position as h e (interactively) navigates around the building. They are akin to an
interactive catalogue and require the user to navigate in discrete steps fro m one
position to the other as determined by the points fro m which the photographic
images were taken. The user 'hops' from view point to view point.
An early example of a videodisk implementation is the 'Movie Map' developed
in 1980 (Lippman 1980). In this early example the streets of Aspen were filmed at
10-foot intervals. To invoke a walkthrough, a viewer retrieved selected views from
two videodisk players. To record the environment four cameras were used at every
view point- thus enabling the viewer to pan to the left and right. The example
demonstrates the trade-off implicit in this approach - because all reconstructed
views are pre-stored the recording is limited to discrete view points.
An obvious computer graphics approach is to use environment maps - origin ally developed in rendering to enable a surrounding environment to be
reflected in a shiny ob ject (see Chapter 8). In image-based rendering we simply
replace the shiny object with a virtual viewer. Consider a user positioned at a
point from which a six-view (cubic) environment map has been constructed
(either photographically or synthetically). If we make the approximation that
the user's eyes are always positioned exactly at the environment map's view
point then we can com pose any view direction-dependent projection demanded
by the user changing his direction of gaze by sampling the appropriate environment maps. This idea is shown schematically in Figure 16.18. Thus we have, for
a stationary viewer, coincidentally positioned at the environment map view
point, achieved our goal of a view-independent solution. We have decoupled the
viewing direction from the rendering pipeline. Composing a new view now consists of sampling environment maps and the scene complexity problem has been
bound by the resolution of the pre-computed or photographed maps.
The highest demand on an image generator used in immersive virtual reality
comes from head movements (we need to compute at 60 frames per second to
avoid the head latency effect) and if we can devise a method where the rendering cost is almost indepen dent of head movement this would be a great step forward. However, the environment map suggestion only works for a stationary
viewer. We would n eed a set of maps for each position that the viewer could be
in. Can we extend the environment map approach to cope with complete walkthroughs? Using the constraint that in a walkthrough the eyes of the user are
always at a constant height, we could construct a number of environment maps
PHOTO-MODELLING AN D IBR
(~)
Figure 16.18
Compositing a user
projection from an
environment map.
Environment map
view points
Reconstruction
Environment map
whose view points were situated at the lattice points of a coarse grid in a plane,
parallel to the ground plane and positioned at eye height. For any user position
could we compose, to some degree of accuracy, a user projection by using infor.
mation from the environment maps at the four adjacent lattice points? The
quality of the final projections are going to depend on the resolution of the
maps and the number of maps taken in a room - t~e resolut_ion of the ey~ pl~ne
lattice. The map resolution will determine the detatled quahty of the proJeCtion
and the number of maps its geometric accuracy.
To be able to emulate the flexibility of using a traditional graphics pipeline
approach, by using photographs (or pre-rendered environment maps), "!e eit~er
have to use a brute-force approach and collect sufficient views compattble wtth
the required 'resolution' of our walkthrough, or we have to try to obtain new
views from the existing ones.
Currently, viewing from cylindrical panoramas is being established as a P~P
ular facility on PC-based equipment (see Section 16.5.1). This involves collectmg
the component images by moving a camera in a semi-constrained mannerrotating it in a horizontal plane. The computer is used merely to 'stitch' the com
ponent images into a continuous panorama - no attempt is made to recover
depth information .
This system can be seen as the beginning of development that may even_tu
ally result in being able to capture all the information in a scene by walking
around with a video camera resulting in a three-dimensional photograph of the
scene. We could see such a development as merging the separate stages of mod
elling and rendering, there is now no distinction between them. The virtual
Developed in 1994, Apple's QuicklimeVR is a classic example of using a photographic panorama as a pre-stored virtual environment. A cylindrical panorama
is chosen for this system because it does not require any special equipment
beyond a standard camera and a tripod with some accessories. As for reprojection - a cylindrical map has the advantage that it only curves in one direction thus making the necessary warping to produce the desired p lanar projection
fast . The basic disadvantage of the cylindrical map - the restricted vertical field
of view - can be overcome by using an alternative cubic or spherical map but
both of these involve a more difficult photographic collection process and the
sphere is more difficult to warp. The inherent viewing disadvantage of the
cylinder depends on the application. For example, in architectural visualization
it may be a serious drawback.
Figure 16.19 (Colour Plate) is an illustration ofthe system. A user takes a series
of normal photographs, using a camera rotating on a tripod, which are then
'stitched' together to form a cylindrical panoramic image. A viewer positions
himself at the view point and looks at a portion of the cylindrical surface. Th e
re-projection of selected part of the cylinder onto a (planar) view surface
involves a simple image warping operation which, in conjunction with other
speed-up strategies, operates in real time on a standard PC. A viewer can continuously pan in the horizontal direction and the vertical direction to within the
vertical field of view limit.
Currently restricted to monocular imagery, it is interesting to note that one
of the most lauded aspects of virtual reality- three-dimensionality and immersion - h as been for the moment ignored. It may be that in the immediate future
monocular non-immersive imagery, which does n ot require expensive stereo
viewing facilities and which concen trates on reproducing a visually complex
environment, will predominate in the popularization of virtual reality facilities.
Compositing panoramas
e = tan- (x/z)
1
photographs. The above equations can still be used substituting the focal length
of the lens for z and calculating x and y from the coordinates in the photograph
plane and the lens parameters. This is equivalent to considering the scene as a
picture of itself - all objects in the scene are considered to be at the same depth.
Another inherent advantage of a cylindrical panorama is that after the overlap.
ping planar photographs are mapped into cylindrical coordinates (just as if we had
a cylindrical film plane in the camera) the construction of the complete panorama
can be achieved by translation only - implying that it is straightforward to auto.
mate the process. The separate images are moved over one another until a match
is achieved - a process sometimes called 'stitching'. As well as translating the
component images, the photographs may have to be processed to correct for expo.
sure differences that would otherwise leave a visible vertical boundary in the
panorama.
The overall process can now be seen as a warping of the scene onto a cylin.
drical viewing surface followed by the inverse warping to re-obtain a planar pro.
jection from the panorama. From the user's point of view the cylinder enables
both an easy image collection model and a natural model for viewing in the
sense that we normally view an environment from a fixed height - eye level look around and up and down.
( 16.6 .3 )
(~)
Figure 16.20
The pixel that corresponds
to point Pi n the virtual view
receives a weight ed average
of the corresponding pixels
In t he reference images.
The w eights are inversely
proportional to e, and 92.
Virtual
view point
Reference
view point 2
Computer animation
R eference
view point I
Model surface
model. The extent of this difference depends on the labour that the user has put
into the interactive modelling phase and the assumption is that the geometric
model will be missing such detail as window recesses and so on. For example, a
facade modelled as a plane may receive a texture that contains such depth information as shading differences and this can lead to images that do not look correct. The extent of this depends on the difference between the required viewing
angle and the angle of the view from which the texture map was selected.
Debevec eta/. (1996) go on to extend their method by using the geometric
model to facilitate a corresponden ce algorithm that enables a depth map to be
calculated and the geometric detail missing from the original model to be
extracted. Establishing correspondence also enables view interpolation.
This process is called ' model-based' stereo and it uses the geometric model as
a priori information which enables the algorithm to cope with views that have
been taken from relatively far apart - one of the practical motivations of the
work is that it operates with a sparse set of views. (The main problem with traditional stereo correspondence algorithms is that they try to operate without
prior knowledge of scene structure. Here the extent of the correspondence problem predominantly depends on how close the two views are to each other.)
17.1
17.2
17.3
17.4
17.5
Collision detection
17.6
Collision response
17.7
Particle animation
17.8
Behavioural animation
17.9
Summary
Introduction
Leaving aside some toys of the nineteenth century, it is interesting to consider that
we have only had the ability to create and disseminate moving imagery for a very
short period -since the advent of film. In this time it seems that animation has not
developed as a mainstream art form. Outside the world of Disney and his imitators
there is little film animation that reaches the eyes of the common man. It is curious
that we do not seem to be interested in art that represents movement and mostly
consign animation to the world of children's entertainment. Perhaps we can thank
Disney for raising film animation to an art form and at the same time condemning
it to a strange world of cute animals who are imbued with a set of human emotions.