Physics of Digital Photography
Series Editor
R Barry Johnson, a Senior Research Professor at Alabama A&M
University, has been involved for over 50 years in lens design,
optical systems design, electro-optical systems engineering, and
photonics. He has been a faculty member at three academic
institutions engaged in optics education and research, employed
by a number of companies, and provided consulting services.
Dr Johnson is an IOP Fellow, SPIE Fellow and Life Member, OSA Fellow, and was
the 1987 President of SPIE. He serves on the editorial board of Infrared Physics &
Technology and Advances in Optical Technologies. Dr Johnson has been awarded
many patents, has published numerous papers and several books and book chapters,
and was awarded the 2012 OSA/SPIE Joseph W Goodman Book Writing Award for
Lens Design Fundamentals, Second Edition. He is a perennial co-chair of the annual
SPIE Current Developments in Lens Design and Optical Engineering Conference.
Foreword
Until the 1960s, the field of optics was primarily concentrated in the classical areas of
photography, cameras, binoculars, telescopes, spectrometers, colorimeters, radio-
meters, etc. In the late 1960s, optics began to blossom with the advent of new types of
infrared detectors, liquid crystal displays (LCD), light emitting diodes (LED), charge
coupled devices (CCD), lasers, holography, fiber optics, new optical materials,
advances in optical and mechanical fabrication, new optical design programs, and
many more technologies. With the development of the LED, LCD, CCD and other
electro-optical devices, the term ‘photonics’ came into vogue in the 1980s to describe
the science of using light in development of new technologies and the performance of
a myriad of applications. Today, optics and photonics are truly pervasive throughout
society and new technologies are continuing to emerge. The objective of this series is
to provide students, researchers, and those who enjoy self-teaching with a wide-
ranging collection of books that each focus on a relevant topic in the technologies and
applications of optics and photonics. These books will provide knowledge to prepare
the reader to be better able to participate in these exciting areas now and in the future.
The title of this series is Emerging Technologies in Optics and Photonics where
‘emerging’ is taken to mean ‘coming into existence,’ ‘coming into maturity,’ and
‘coming into prominence.’ IOP Publishing and I hope that you find this Series of
significant value to you and your career.
Physics of Digital Photography
(Second Edition)
D A Rowlands
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system
or transmitted in any form or by any means, electronic, mechanical, photocopying, recording
or otherwise, without the prior permission of the publisher, or as expressly permitted by law or
under terms agreed with the appropriate rights organization. Multiple copying is permitted in
accordance with the terms of licences issued by the Copyright Licensing Agency, the Copyright
Clearance Centre and other reproduction rights organizations.
Permission to make use of IOP Publishing content other than as set out above may be sought
at permissions@ioppublishing.org.
D A Rowlands has asserted his right to be identified as the author of this work in accordance with
sections 77 and 78 of the Copyright, Designs and Patents Act 1988.
DOI 10.1088/978-0-7503-2558-5
Version: 20201001
IOP ebooks
British Library Cataloguing-in-Publication Data: A catalogue record for this book is available
from the British Library.
US Office: IOP Publishing, Inc., 190 North Independence Mall West, Suite 601, Philadelphia,
PA 19106, USA
For my parents, Ann and Gareth
Contents
Preface xv
Author biography xvi
Abbreviations xvii
Index 6-1
Preface
Author biography
D A Rowlands
Andy Rowlands gained a first-class degree in Mathematics and
Physics and a PhD in Theoretical Condensed Matter Physics from
the University of Warwick, UK.
He was subsequently awarded a Fellowship in Theoretical Physics
from the Engineering and Physical Sciences Research Council
(EPSRC), which he held at the University of Bristol, UK, and
this was followed by research positions at Lawrence Livermore
National Laboratory, USA, Tongji University in Shanghai, China, and the University
of Cambridge, UK.
Andyʼs combined interests in physics and photography inspired the writing of this
book. His photographic work, much of which features China, can be viewed at
http://www.andyrowlands.com.
Abbreviations
1D one dimension
2D two dimensions
ADC analog-to-digital converter
ADU analog-to-digital unit
AF autofocus
AFoV angular field of view
AHD Adaptive Homogeneity-Directed
APEX Additive System of Photographic Exposure
APS-C Advanced Photo System Type-C
AS aperture stop
ATF aberration transfer function
Av aperture value
AW adopted white
BSI backside illumination
Bv brightness value
CAT chromatic adaptation transform
CCD charge-coupled device
CCE charge collection efficiency
CCT correlated colour temperature
CDS correlated double sampling
CFA colour filter array
CIE Commission Internationale de l’Eclairage
CIPA Camera and Imaging Products Association
CMM colour-matching module
CMOS complementary metal-oxide semiconductor
CoC circle of confusion
CRT cathode ray tube
CSF contrast sensitivity function
CTF contrast transfer function
DCNU dark current non-uniformity
DN data number
DNG digital negative
DoF depth of field
DOL digital output level
DR dynamic range
DSC/SMI digital still camera sensitivity metameric index
D-SLR digital single-lens reflex
DSNU dark signal non-uniformity
EC exposure compensation
EP entrance pupil
ESF edge spread function
ETTR expose to the right
Ev exposure value
EW entrance window
EXIF Exchangeable Image File
FF fill factor
FFP front focal plane
Chapter 1
Photographic optics
A camera provides control over the light from the photographic scene as the light
flows through the lens to the sensor plane (SP), where an optical image is formed.
The nature of the optical image, along with the choice of exposure duration,
determines the exposure distribution at the SP. This stimulates a response from the
imaging sensor, and it is the magnitude of this response, along with subsequent
signal processing, that ultimately defines the appearance of the output digital image.
The photographer must balance a variety of technical and aesthetic factors that
determine the nature of the optical image formed at the SP. For example, the field of
view of a camera and lens combination is an important geometrical factor that
influences photographic composition. For a given sensor format, this is primarily
controlled by the choice of lens focal length. Another fundamental aesthetic aspect is
the choice of focus point and the depth of field (DoF), which is the depth of the
photographic scene that appears to be in focus. DoF is primarily controlled by the
object distance and the lens aperture diameter, which restricts the size of the light
bundle passing through the lens. Another aspect is the appearance of subject motion.
This depends on the exposure duration, which is controlled by the shutter. Short and
long exposure durations can, respectively, freeze or blur the appearance of subject
motion.
This chapter begins with the basics of optical image formation and works through
the optical principles that determine the fundamental nature of the optical image
and subsequent exposure distribution formed at the SP. Many of these principles can
be described using simple photographic formulae. Photometry is used to quantify
light, and Gaussian optics is used to describe light propagation in terms of rays.
Gaussian optics is a branch of geometrical optics that assumes ideal imaging by
neglecting lens aberrations and their associated image defects. Although aberrations
are an inherent property of spherical surfaces, modern photographic lenses are
generally well-corrected for aberrations through skilful lens design. Formulae based
on Gaussian optics predict the position and size of the optical image representative
of a well-corrected photographic lens.
In practice, an exposure distribution is required that stimulates a useful response
from the imaging sensor. This is a fundamental aim of a photographic exposure
strategy, which forms the subject of chapter 2. One of the key quantities involved is
the lens f-number. Although photography is typically carried out in air, the
derivation of the f-number presented in this chapter does not make any assumption
about the nature of the refractive media surrounding the lens. This helps to provide
deeper insight into its physical significance.
A variety of physical phenomena that are beyond the scope of geometrical optics,
such as diffraction, are introduced in chapter 3. Such phenomena profoundly affect
image quality, which is the subject of chapter 5.
Figure 1.1. Snell’s law of refraction illustrated using a converging surface with positive surface power. The
incident ray (blue) is refracted at the spherical surface (magenta). The dotted line is the surface normal. Here,
n′ > n , and R is positive since the surface is positioned to the left of C.
$$ \Phi_s = \frac{n' - n}{R}. \qquad (1.2) $$
A converging surface has a positive surface power. Given an incoming ray travelling
parallel to the OA, a converging surface will redirect the ray so that it intersects the
OA. On the other hand, a diverging surface has negative power and will bend such a
ray away from the OA. If R → ∞, then the spherical surface takes the form of a plane
positioned perpendicular to the OA. In this case, the surface has zero refractive
power and a ray travelling parallel to the OA cannot be redirected, irrespective of
the value of n′.
A simple lens or element is defined as a transparent refractive medium such as
glass with refractive index nlens bounded by two refracting surfaces. A positive
overall refractive power is required for image formation by a photographic lens.
Although refraction by a single converging spherical surface can form an optical
image of an object, an image formed by a lens can be located outside the lens
refractive medium.
Figure 1.2. SA due to a single spherical surface. Rays originating from the same object point on the OA
converge to different image points (1) and (2) on the OA after undergoing refraction.
The paraxial region is an infinitesimally narrow region surrounding the OA where all angles are infinitesimally small. In particular, the following relation
holds exactly:
sin θ = θ (when θ → 0).
This means that the effects of aberrations are infinitesimally small in the paraxial
region and so the image-forming properties are ideal. Paraxial quantities are
conventionally denoted using lower-case symbols:
S → s, S′ → s′, I → i, I′ → i′, U → u, U′ → u′.
Snell’s law becomes
$$ n' i' = n i. \qquad (1.3) $$
For simplicity, again consider a single spherical surface separating two refractive
media. Figure 1.3 shows a point object p positioned at the OA at a distance s from
the surface. The plane perpendicular to the OA at p is defined as the object plane
(OP).
Consider a paraxial ray making an infinitesimal angle u with the OA. After
refraction, this ray will intersect the OA again at point p′ a distance s′ from the
surface. The point p′ is defined as the image of p. The plane perpendicular to the OA
at p′ is defined as the image plane (IP).
Graphical consideration of figure 1.3 along with equation (1.3) can be used to
derive the following relationship [1]:
Figure 1.3. Snell’s law in the paraxial region where all angles are infinitesimally small. The object is positioned
at the OA, and the dotted line is the surface normal. Here, u is positive, u′ is negative and s, s′ are both positive.
$$ \frac{n}{s} + \frac{n'}{s'} = \Phi_s. \qquad (1.4) $$
Here Φs is the surface power defined by equation (1.2). Given a single spherical
surface and the OP distance s, the significance of equation (1.4) is that only
knowledge of the surface power is needed to locate the corresponding IP.
The photographic sign convention is such that the OP distance s is positive when
measured from right to left, and the IP distance s′ is positive when measured from
left to right.
Although equation (1.4) can be used to locate the positions of the OP and IP, the
angles involved are paraxial. Therefore, the object at the OP and image at the IP can
only be points positioned infinitesimally close to the OA. Further development is
needed in order to describe imaging of real objects of arbitrary size.
Figure 1.4. The point of ray intersection with the spherical surface (magenta curve) at height y is projected
onto a tangent plane (red line) at the same height y. Here the ray tangent slopes u and u′ are positive and
negative, respectively.
also present in the paraxial region, is retained when extending the paraxial region.
Substituting equation (1.6) into (1.4) yields the following result:
$$ n'u' = nu - y\Phi_s. \qquad (1.7) $$
This is the form of Snell′s law obtained by extending the paraxial region and
retaining the surface power. The significance of this equation is the linear relation-
ship between the ray tangent slopes and heights; all slopes and heights can be scaled
without affecting the positions of the OP and IP. In other words, Fermat′s principle
must be violated in order to exclude aberrations from consideration.
For a single surface, equation (1.4) is now formally valid for imaging of points
positioned at arbitrary height above the OA and therefore real objects of arbitrary
size. Figure 1.5 shows an object of height h positioned at the OP. The location of the
IP can be found by tracing rays from the axial (on-axis) position on the OP. Rays
from a non-axial position on the OP such as the top of the object will meet at a
unique point on the same IP since aberrations are absent. Therefore, the IP defines
the plane where the optical image of the OP can be said to be in sharp focus.
Figure 1.5. Imaging by a spherical surface after extending the paraxial region. Example rays for an object
point on the OA and at the top of the object of height h are shown. The ray slopes and intersection height have
been indicated for the grey ray. In this case, both u and u′ are negative.
Since Gaussian optics can describe imaging of real objects by a single surface,
Gaussian optics can also describe imaging of real objects by a general compound lens
that may consist of a large number of surfaces of various curvatures and spacings.
This is achieved by tracing paraxial rays through the lens by applying equation (1.7)
at each surface. This is known as a ynu raytrace [2, 3]. Only ray slopes u and u′
measured relative to the OA are needed along with the ray intersection height y at
each tangent plane. The IP will be located after the final surface.
As an example, figure 1.6 illustrates the transfer of a ray between two refracting
surfaces. Only the tangent planes to the surfaces are shown. Equation (1.7) can be
applied at the tangent plane to surface 1:
$$ n_1' u_1' = n_1 u_1 - y_1 \Phi_1, $$
where u1 and u1′ are the ray slopes of incidence and refraction and y1 is the ray height
at the tangent plane. Similarly, equation (1.7) can be applied at the tangent plane to
surface 2:
$$ n_2' u_2' = n_2 u_2 - y_2 \Phi_2. $$
The transfer of the ray between the surfaces is achieved by noting that
$$ n_2 u_2 = n_1' u_1', \qquad y_2 = y_1 + t_{12}\, u_1', $$
where t12 is the axial distance between the surfaces.
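To make the ynu raytrace concrete, the short Python sketch below applies equation (1.7) at each tangent plane together with the transfer equations above. The surface data (radii, spacings and refractive indices) are arbitrary illustrative values rather than a real lens prescription.

```python
# Illustrative ynu (paraxial) raytrace: n'u' = nu - y*Phi at each surface,
# with transfer y2 = y1 + t12*u1' between tangent planes.
# The surface data below are arbitrary example values, not a real lens design.

def surface_power(n_before, n_after, R):
    """Surface power Phi_s = (n' - n)/R, equation (1.2)."""
    return (n_after - n_before) / R

def ynu_raytrace(surfaces, y, u, n):
    """Trace a paraxial ray (height y, slope u) through a list of surfaces.

    Each surface is (R, n_after, t_to_next), where t_to_next is the axial
    distance to the next tangent plane (ignored for the final surface).
    Returns the final ray height, slope and image-space index.
    """
    for R, n_after, t_to_next in surfaces:
        phi = surface_power(n, n_after, R)
        u = (n * u - y * phi) / n_after   # refraction: n'u' = nu - y*Phi
        n = n_after
        y = y + t_to_next * u             # transfer to the next tangent plane
    return y, u, n

# Example: a single lens made of two surfaces in air (n = 1), 5 mm thick.
surfaces = [
    (+0.05, 1.5, 0.005),   # first surface: R1 = +50 mm, into glass of n = 1.5
    (-0.05, 1.0, 0.0),     # second surface: R2 = -50 mm, back into air
]

# A ray parallel to the OA at height y1 = 10 mm locates the rear focal point.
y1, u1 = 0.010, 0.0
y_final, u_final, n_image = ynu_raytrace(surfaces, y1, u1, 1.0)

# Rear effective focal length magnitude from the final slope
# (cf. f' = y1/u_k in the text; the sign depends on the slope convention).
print(abs(y1 / u_final))   # ~0.0508 m
```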
Details of standard layouts for various types of photographic lenses can be found
in the lens patent literature and a variety of textbooks [4–6].
Figure 1.7. Principal points and second focal point F’ for a compound lens. The ray illustrated is used to locate
the second principal plane H′.
Figure 1.8. Principal points and first focal point F for a compound lens. The ray illustrated is used to locate the
first principal plane H.
The principal planes may be located in any order and not necessarily inside the
lens itself. The second principal plane H′ is often located to the left of H, and so the
terms ‘front’ and ‘rear’ are usually avoided.
An important property of H and H′ is that they are planes of unit magnification
since rays travel parallel between them in this equivalent refracting scenario. It must
be emphasised that rays do not necessarily follow the path indicated by the dashed
lines in figures 1.7 and 1.8, and that these equivalent refracting scenarios must
generally be treated as hypothetical.
Figure 1.9. The Gaussian conjugate equation describes the relationship between the OP to H distance s and the
IP to H’ distance s′ for a general compound lens.
Thin lens
The thin lens illustrated in figure 1.10 has negligible thickness compared to its
diameter. The total refractive power Φ is simply the sum of the individual surface
powers Φ1 and Φ2 [1]:
$$ \Phi = \Phi_1 + \Phi_2, \qquad \Phi_1 = \frac{n_{\rm lens} - n}{R_1}, \qquad \Phi_2 = \frac{n' - n_{\rm lens}}{R_2}. \qquad (1.9) $$
Figure 1.10. The idealised thin lens with t → 0. R1 and R2 are the radii of curvature of the first and second
refracting surfaces, respectively. The corresponding centres of curvature are labelled C1 and C2. Here R2 is
negative because C2 lies to the left of the second surface.
If nlens > n, n′, then Φ1 and Φ2 are both positive. Equation (1.8) with Φ defined by
equation (1.9) is known as Descartes′ thin lens formula.
Since tlens → 0, the tangent planes for the two spherical surfaces shown in
figure 1.10 are coincident with the principal planes H and H′ ‘at the lens’.
Accordingly, the OP distance s and IP distance s′ are both measured ‘from the
lens’. Given s, the distance s′ can be found by solving equations (1.8) and (1.9).
Air has a refractive index equal to 1.000 277 at standard temperature and pressure
(0 °C and 1 atm). In comparison, water has a refractive index equal to 1.3330 at
standard temperature (20 °C). A value of 1 for the refractive index of air is often
assumed to simplify photographic formulae. For precise calculations, refractive
indices may be defined relative to air rather than to vacuum. In either case, equations
(1.8) and (1.9) simplify when the lens is surrounded by air:
$$ \frac{1}{s} + \frac{1}{s'} = (n_{\rm lens} - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right). $$
This is known as the lensmakers’ formula. In analogy with imaging by a single
refracting surface, the significance of Descartes’ and the lensmakers’ formulae is that
only the total refractive power is required to locate the IP.
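As a quick numerical illustration of the lensmakers' formula, the following sketch locates the IP for an assumed biconvex thin lens in air; the radii, refractive index and object distance are example values only.

```python
# Locate the image plane with the lensmakers' formula (thin lens in air):
# 1/s + 1/s' = (n_lens - 1) * (1/R1 - 1/R2). All values are example numbers.

def thin_lens_image_distance(s, R1, R2, n_lens=1.5):
    """Return s' (metres) for a thin lens in air focused on an OP at distance s."""
    power = (n_lens - 1.0) * (1.0 / R1 - 1.0 / R2)   # total refractive power (dioptres)
    return 1.0 / (power - 1.0 / s)                   # from 1/s + 1/s' = power

# Biconvex example: R1 = +0.1 m, R2 = -0.1 m, n_lens = 1.5 gives 10 D (f_E = 0.1 m).
s_prime = thin_lens_image_distance(s=2.0, R1=0.1, R2=-0.1)
print(s_prime)   # ~0.1053 m: the IP sits just behind the rear focal plane
```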
Thick lens
In practice, an element of a compound lens can be treated mathematically as a thin
lens for preliminary calculations or if neglecting its thickness does not affect the
accuracy of the calculation [3]. For a single thick lens, a third term Φ12 emerges:
Figure 1.11. Thick lens with thickness tlens.
$$ \Phi = \Phi_1 + \Phi_2 - \Phi_{12}, \qquad \Phi_1 = \frac{n_{\rm lens} - n}{R_1}, \qquad \Phi_2 = \frac{n' - n_{\rm lens}}{R_2}, \qquad \Phi_{12} = \frac{t_{\rm lens}}{n_{\rm lens}}\, \Phi_1 \Phi_2. \qquad (1.10) $$
As illustrated in figure 1.11, tlens is the lens thickness at the OA. Equation (1.8) with
Φ defined by equation (1.10) is known as Gullstrand′s equation.
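A minimal sketch of Gullstrand's equation is given below; the radii, thickness and refractive index are illustrative values, and the surrounding medium is assumed to be air on both sides.

```python
# Total refractive power of a single thick lens via Gullstrand's equation (1.10).
# Example values only: R1 = +0.1 m, R2 = -0.1 m, t_lens = 0.01 m, n_lens = 1.5, air outside.

def gullstrand_power(R1, R2, t_lens, n_lens, n=1.0, n_prime=1.0):
    phi1 = (n_lens - n) / R1
    phi2 = (n_prime - n_lens) / R2
    phi12 = (t_lens / n_lens) * phi1 * phi2
    return phi1 + phi2 - phi12

phi = gullstrand_power(R1=0.1, R2=-0.1, t_lens=0.01, n_lens=1.5)
print(phi, 1.0 / phi)   # ~9.83 D and f_E ~ 0.102 m; the thin-lens limit (t_lens -> 0) gives 10 D
```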
Evidently the total refractive power Φ can be obtained using the above equation
after performing a ynu raytrace through all surfaces and calculating f ′, where
f ′ = y1/uk .
Figure 1.12. Front and rear effective focal lengths for a compound lens. (a) Rear effective focal length f ′ and
rear focal point F′. (b) Front effective focal length f and front focal point F.
$$ \frac{n}{f} = \frac{n}{s} + \frac{n'}{s'} = \frac{n'}{f'}. \qquad (1.13) $$
This form of the Gaussian conjugate equation describes the behaviour illustrated in
figure 1.13. If the OP is brought forward from infinity so that s decreases, the
distance s′ increases and so the IP moves further away from the rear focal plane.
Figure 1.13. Imaging according to the Gaussian conjugate equation. Only rays from the axial position on the
OP are shown.
Nodal points
Photographers often refer to the nodal points of a lens rather than the principal
points. The nodal points can be visualised intuitively; a ray aimed at the first nodal
point in object space will emerge from the second nodal point with the same slope in
image space. The nodal points are therefore points of unit angular magnification.
The front nodal point is often assumed to be the no-parallax point in panoramic
photography; in fact, the no-parallax point is the lens entrance pupil [8].
The front and rear effective focal lengths are always defined from the principal
points [1, 7]. Nevertheless, they can be measured from the nodal points provided the
definition is reversed [4]; the distance from the first nodal point to the front focal
point is equal to the magnitude of the rear effective focal length f ′, and the distance
from the second nodal point to the rear focal point is equal to the magnitude of the
front effective focal length f . This is illustrated in figure 1.14.
If the object-space and image-space refractive indices n and n′ are equal, then the
first nodal and first principal points coincide, the second nodal and second principal
points coincide, and f = f ′. This is naturally the case in air, where a nodal slide can
be used to experimentally determine effective focal lengths [1].
Figure 1.14. Lens nodal points; a ray aimed at the first nodal point, N, emerges from the second nodal point,
N′, at the same tangent slope, u. In this example, image space has a higher refractive index than object space.
Table 1.1. Focal lengths and surface powers for a thin lens calculated using equation (1.9), assuming that R1 = 0.1 m, R2 = −0.1 m and nlens = 1.5. All powers and focal lengths are measured in diopters and metres, respectively.

n     1      1.33     1.33     1
n′    1      1.33     1        1.5
Φ1    5      1.7      1.7      5
Φ2    5      1.7      5        0
Φ     10     3.4      6.7      5
fE    0.1    0.294    0.149    0.2
f     0.1    0.391    0.199    0.2
f′    0.1    0.391    0.149    0.3
$$ f_E = \frac{1}{\Phi}. \qquad (1.14) $$
Combining equations (1.11), (1.12) and (1.14) reveals the following relationship:
$$ \frac{n}{f} = \frac{n'}{f'} = \frac{1}{f_E}. \qquad (1.15) $$
Unlike f and f ′, the effective focal length fE is not a physically representable length
in general. However, in photography the usual case is that the object-space and
image-space refractive media are both air so that n = n′ = 1. In this case, fE = f = f ′
and photographers simply refer to the ‘focal length’. The Gaussian conjugate
equation may then be written as follows,
$$ \frac{1}{s} + \frac{1}{s'} = \frac{1}{f_E} \quad \text{(in air)}. $$
In order to illustrate the general relationship between f, f ′ and fE , table 1.1 lists
example data for a thin lens calculated using Descartes’ formula. It can be seen that
if either of n or n′ are changed, then the total refractive power Φ and effective focal
length fE will change. Significantly, this affects the values of both f and f ′.
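The entries of table 1.1 can be reproduced directly from equations (1.9), (1.14) and (1.15), as the following short sketch shows.

```python
# Reproduce table 1.1: thin-lens surface powers and focal lengths for various
# object-space and image-space media, using equations (1.9), (1.14) and (1.15).
R1, R2, n_lens = 0.1, -0.1, 1.5

for n, n_prime in [(1.0, 1.0), (1.33, 1.33), (1.33, 1.0), (1.0, 1.5)]:
    phi1 = (n_lens - n) / R1            # equation (1.9)
    phi2 = (n_prime - n_lens) / R2
    phi = phi1 + phi2                   # thin lens: total power is the sum
    fE = 1.0 / phi                      # equation (1.14)
    f, f_prime = n * fE, n_prime * fE   # equation (1.15)
    print(f"n={n:<5} n'={n_prime:<5} Phi={phi:5.2f} fE={fE:.3f} f={f:.3f} f'={f_prime:.3f}")
```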
1.1.10 Magnification
Lateral or transverse magnification m is defined as the ratio of the image height h′ to
the object height h:
$$ m = \frac{h'}{h}. \qquad (1.16) $$
As shown in figure 1.15, the IP projected onto the imaging sensor by a photographic
lens is real and inverted. In optics, the usual sign convention is to take h and h′ as
positive when measured upwards from the OA, and so h′ and m are negative quantities.
Within Gaussian optics, m does not vary over the IP and it can be expressed in terms
of the object and image distances measured along the OA from the principal points,
$$ m = -\frac{n s'}{n' s}. $$
Substituting into the Gaussian conjugate equation defined by equation (1.13) yields
$$ m = \frac{f}{f - s}. \qquad (1.17) $$
The Lagrange theorem can be used to express m in terms of the initial and final ray
slopes:
$$ m = \frac{n u}{n' u'}. \qquad (1.18) $$
Although this book uses the optics sign convention for the object and image
heights, m is often taken to be a positive value in photographic formulae. In order to obtain a positive value, the magnitude of equation (1.17) is taken:
Figure 1.15. Magnification is defined by m = h′/h within Gaussian optics.
$$ |m| = \frac{f}{s - f}. \qquad (1.19) $$
This commonly encountered formula shows that the magnification reduces to zero
when focus is set at infinity.
A macro lens can achieve life-size (1:1) reproduction. This occurs when the OP is
positioned at s = 2f so that ∣m∣ = 1. Higher magnifications may be possible for
f < s < 2f via use of a bellows, lens extension tube, or close-up filter.
1.2 Focusing
In the previous section describing optical image formation, an OP was selected and
the Gaussian conjugate equation was used to determine the location of the
corresponding IP where the optical image of the OP appears in sharp focus. In a
digital camera, the IP needs to be positioned to coincide with the imaging sensor.
However, the location of the imaging sensor is fixed, and so a focusing operation is
required in general.
In a digital camera, the imaging sensor is fixed in position at the sensor plane (SP),
which is analogous to the film plane in a film camera. Recall the definition of the rear
focal plane and the concept of setting focus at infinity described in sections 1.1.6 and
1.1.9. Significantly, the SP is positioned to coincide with the rear focal plane when
focus is set at infinity.
When considering the Gaussian conjugate equation defined by equation (1.13), it
is clear that when the OP is brought forward from infinity, the IP will no longer
coincide with the rear focal plane and SP but will naturally move behind them. In
order to record an optical image in sharp focus at the SP, the IP must be brought
forward to coincide with the SP. This is achieved by moving the optical elements of
the lens in order to bring the rear focal plane forward until the IP and SP coincide.
The optical elements can be moved either manually by adjusting the lens barrel or
via an autofocus (AF) motor linked to the AF system. Since the SP and IP only
coincide with the rear focal plane when focus is set at infinity, these terms should not
be used interchangeably.
In a lens that utilizes traditional or unit focusing, the whole lens barrel is moved in
order to achieve sharp focus. Other focusing methods include front-cell focusing
and internal focusing. In the case of internal focusing, a floating optical element
or group is moved inside the lens in order to achieve sharp focus [5, 6, 9]. As
discussed in section 1.3, focusing can affect the framing of the scene. This is known
as focus breathing, and different focusing methods have different breathing
properties.
Figure 1.16. Geometry for focusing movement in a lens that utilizes unit focusing. Note that f ′ = fE for a lens
immersed in air.
Here l is the original OP distance before the focusing movement. Since e has been
defined relative to the infinity focus position, e → 0 when s → ∞.
The magnification can be expressed in terms of e by combining equations (1.19)
and (1.20),
$$ |m| = \frac{e}{f_E}. $$
As required, ∣m∣ → 0 when focus is set at infinity.
Since a lens is unable to focus on an OP positioned at the front focal point or
closer, equation (1.21) reveals that a unit-focusing lens is unable to focus on an OP
positioned closer than an original OP distance l = 3fE from the first principal point.
At this minimum focus distance, e = fE . After the focusing movement, the minimum
possible object-to-image distance is 4fE + Δ, where Δ is the separation between the
compound lens principal planes. Note that Δ can be positive or negative, depending
on the order of the principal planes. For an ideal thin lens, Δ → 0 and the minimum
possible object-to-image distance is 4fE .
Manual focusing scales on lenses are calibrated using the distance from the SP to
the OP. The location of the sensor/film plane is indicated by a o symbol marked on
the camera body. From figure 1.16(c) where d denotes the SP to OP distance, the
extension e is obtained by making the substitution d = l + fE + Δ in equation (1.21),
$$ e = \frac{1}{2}\left[ (d - 2f_E - \Delta) - \sqrt{(d - 2f_E - \Delta)^2 - 4f_E^2} \right]. $$
The above formulae derived within Gaussian optics are almost exact for a photo-
graphic lens that has been well-corrected for aberrations. In practice, defocus
aberration may be used to counterbalance other aberrations, and so the ideal IP
position may be intentionally shifted very slightly from the Gaussian image position.
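The extension formula above is easily evaluated numerically. The sketch below assumes an idealised unit-focusing lens with Δ = 0; the focal length and marked SP-to-OP distance are example values.

```python
import math

# Extension e needed by a unit-focusing lens to focus at a marked distance d (SP to OP):
# e = ((d - 2 fE - delta) - sqrt((d - 2 fE - delta)^2 - 4 fE^2)) / 2.
# delta (the principal-plane separation) is taken as zero, as for an ideal thin lens.

def unit_focus_extension(d, fE, delta=0.0):
    a = d - 2.0 * fE - delta
    return 0.5 * (a - math.sqrt(a * a - 4.0 * fE * fE))

fE = 0.05                                  # 50 mm lens
e = unit_focus_extension(d=1.0, fE=fE)     # focus at 1 m measured from the sensor plane
print(e, e / fE)                           # ~2.8 mm extension and |m| = e/fE ~ 0.056
```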
Figure 1.18. (a) Roof pentaprism used in the SLR viewfinder. Here the cone of light emerging from the lens
exit pupil (XP) corresponds to the central position on the SP. (b) The pentaprism corrects the orientation of the
optical image seen by the eye.
Figure 1.19. Principle of the PDAF module. (a) The OP is in sharp focus at the equivalent SP and so the
signals on each half of the CCD strip are identical. (b) and (c) The OP is out of focus as indicated by the signal
phase shift on each half of the CCD strip.
The AF module is positioned at a plane that is optically equivalent to the SP.
function of the AF module. Tiny microlenses direct the light arriving from the
camera lens onto a CCD strip.
Consider light originating from a small region on the OP indicated by an AF
point on the focusing screen. Photographic lenses contain an adjustable aperture
stop (AS) such as that shown in figure 1.21 that limits the diameter of the ray bundle
that can pass through the lens. It will be explained in section 1.3.1 that the exit pupil
(XP) is the image of the AS due to optical elements behind the AS. PDAF is based
upon the fact that when the OP is in sharp focus at the SP, symmetry dictates that
the light distribution over equivalent portions of the XP will appear identical when
viewed from the small region on the SP that corresponds to the AF point.
Figure 1.19(a) shows that in this case, an identical optical image and electronic
signal is obtained along each half of the CCD strip. In figure 1.19(b) and (c), the
optical image at the SP is not in sharp focus and is described as being out of focus. In
these cases, the optical images on each half of the CCD strip will shift either towards
or away from each other. The nature of the shift indicates the direction and amount
by which the lens needs to be adjusted by its AF motor.
A horizontal CCD strip is suited for analysing signals that arise when the AF
point contains horizontal detail. The optimum signal will arise from a change of
Figure 1.20. An example focusing screen with 13 AF points. The left and right groups correspond to horizontal CCD
strips contained in the AF module, whereas the AF points in the central group correspond to cross-type CCD strips.
contrast provided by a vertical edge. This means that the AF module may be unable
to achieve focus if the AF point covers an area that contains only a horizontal edge.
Conversely, a vertical CCD strip is suited for analysing signals that arise from detail
in the vertical direction such as a horizontal edge. Cross-type AF points such as
those shown in the centre of figure 1.20 utilize both a horizontal and a vertical CCD
strip so that scene detail in both directions can be utilized to achieve sharp focus.
1.3 Framing
The framing or field of view (FoV) is the area of the OP that can be imaged by the
camera. In this section, it is shown that the FoV is formed by a specific light cone
with the base at the OP and apex situated at the lens entrance pupil. In photography,
a more useful way to express FoV is via the angle of the FoV or angular field of view
(AFoV) [3]. The AFoV describes the angular extent of the light cone that defines the
FoV, and it can be measured along the horizontal, vertical or diagonal directions.
Equation (1.14) defined the effective focal length fE as the reciprocal of the total
refractive power. The shorter the effective focal length of a lens, the more strongly it
is able to bend the incoming light rays, and hence the wider the AFoV. Conversely,
the longer the effective focal length, the narrower the AFoV.
A number of important concepts will be introduced through the derivation of the
AFoV formula. These include entrance and exit pupils, field stops, chief rays, pupil
magnification, bellows factor and focus breathing. Finally, the concept of perspective will
be described. This depends upon the distance between the OP and the entrance pupil.
Figure 1.21. Adjustable 8-blade iris diaphragm that acts as an aperture stop. Iris diaphragms with an odd
number of blades are more commonly found in practice.
Of primary importance are the images of the AS in object space and image space rather than the AS itself. In a photographic
lens, these images are typically virtual and so cannot be displayed on a screen.
• Within Gaussian optics, the entrance pupil (EP) is a flat surface that coincides
with the Gaussian object-space image of the AS formed by the elements in
front of the AS.
• Within Gaussian optics, the exit pupil (XP) is a flat surface that coincides with
the image-space Gaussian image of the AS formed by the elements behind the
AS.
• For a symmetric lens design, the EP and XP will coincide with the first and
second principal planes, respectively. Accordingly, they may be located in any
order and not necessarily inside the lens.
• As discussed in section 1.5.10, the pupils are not flat surfaces when a lens is
described using real ray optics, but rather they are portions of spheres that
intersect the Gaussian images of the AS at the OA [14].
Figure 1.22 shows the location of the pupils for an example lens design in which the
images of the AS are real, and figure 1.23 shows a photographic lens design in which
the images of the AS are virtual. As evident from the ray bundles bounded by the
rays highlighted in blue in the figures, the pupil diameters control the amount of light
that can enter and exit the lens. This is fundamental for defining exposure, as
discussed in section 1.5. However, it is also evident from the ray bundles bounded by
the rays highlighted in green in the figures that the pupil positions are fundamental
for defining the AFoV indicated by α. This is because the object-space centre of
perspective is located at the centre of the EP, and the image-space centre of
perspective is located at the centre of the XP.
Figure 1.22. The chief rays (green) define the AFoV (α). The marginal rays (blue) define the maximum cone
acceptance angles u and u′ and therefore the amount of light passing through the lens, but are not relevant to a
discussion of framing or AFoV. Shown are the entrance window (EW) positioned on the OP, the entrance
pupil (EP), field stop (FS), aperture stop (AS), exit pupil (XP) and exit window (XW) positioned on the SP.
Figure 1.23. Virtual pupils for an example photographic lens design. The chief rays (green) define the AFoV.
The images of the limiting FS seen through the front and back of the lens are known as the
entrance window (EW) and exit window (XW), respectively. When a photographic
lens is focused on a specified OP, these windows are positioned on the OP and SP so
that the FoV area is defined by the EW.
A meridional plane is a plane containing the OA. As illustrated in figure 1.22, the
chief ray (highlighted in green) is the incoming meridional ray from the scene that
passes by the edge of the limiting FS. The chief ray passes through the centre of the
AS. Consequently, the chief ray also passes through the centre of the pupils, as
evident in figure 1.22 and 1.23. In the photographic lens with virtual pupils shown in
figure 1.23, the virtual extension of the entering chief ray passes through the centre of
the EP, and the virtual extension of the exiting chief ray passes through the centre of
the XP.
Formally, the chief ray defines the AFoV as the angle α subtended by the edges of
the EW from the centre of the EP. Therefore, the apex of the light cone forming the
AFoV is located at the centre of the EP. If a lens designer moves the position of the
AS, the pupil positions and pupil magnification (see below) will change. This will
alter the AFoV and perspective (see section 1.3.7), but the FoV area defined by the
EW will remain unchanged.
Although not needed for a discussion of framing, it is instructive to note that the
marginal ray (highlighted in blue in the figures) is the meridional ray from the axial
position on the OP that passes by the edge of the AS. Consequently, the marginal ray
also passes by the edges of the pupils. In other words, the EP diameter defines the
maximum cone of light that emerges from the axial position on the OP and is accepted
by the lens, and the XP diameter defines the maximum cone of light that subsequently
exits the lens and converges at the axial position on the SP. Adjusting the diameter of
the AS restricts the size of the ray bundle formed by the marginal rays. Evidently, this
will affect the amount of light reaching the SP but will not affect the AFoV.
Note that within Gaussian optics, the chief and marginal rays are treated
mathematically as ray tangent slopes by extending the paraxial region according
to equation (1.6).
Figure 1.24. Gaussian pupil and focal plane distances when focus is set at infinity. The pupils and principal
planes are not required to be in the order shown.
point. Since the principal planes are planes of unit magnification, the cone has
diameter D at the intersection with the second principal plane. Since the second
principal point lies a distance f ′ from the rear focal point, the XP will be positioned
at a distance m p f ′ from the SP. Therefore
$$ s'_{XP} = (1 - m_p) f'. \qquad (1.23) $$
The distance sEP can be found by solving the following Gaussian conjugate
equation:
$$ \frac{n}{s_{EP}} + \frac{n'}{s'_{XP}} = \frac{1}{f_E}. $$
By using the relationship between the focal lengths defined by equation (1.15), the
solution is found to be
$$ s_{EP} = \left(1 - \frac{1}{m_p}\right) f. \qquad (1.24) $$
Applying the Gaussian conjugate equation yields the analogous expression for s′:
$$ s' = (1 + |m|) f'. \qquad (1.26) $$
Combining equations (1.24) and (1.25) yields the required expression for the OP
distance measured from the EP:
$$ s - s_{EP} = \left(\frac{1}{|m|} + \frac{1}{m_p}\right) f. \qquad (1.27) $$
Similarly, combining equations (1.23) and (1.26) yields the required expression for
the IP distance measured from the XP:
$$ s' - s'_{XP} = (|m| + m_p) f'. \qquad (1.28) $$
Figure 1.25. Geometry defining the AFoV when the pupil magnification m p < 1, m p = 1 and m p > 1. The
pupils and principal planes are not required to be in the order shown.
The AFoV can be defined in the horizontal or diagonal direction simply by replacing
d with the length of the imaging sensor in the horizontal or diagonal direction.
Figure 1.26 illustrates the relative difference in the resulting FoV as a function of f
when focus is set at infinity.
Figure 1.26. Relative difference in FoV for a selection of focal lengths (mm) corresponding to a 35 mm full-
frame system with focus set at infinity.
The focal length f appearing in equation (1.30) is the front effective focal length.
The quantity b is the so-called bellows factor
$$ b = 1 + \frac{|m|}{m_p}. \qquad (1.31) $$
The bellows factor takes a value b = 1 when focus is set at infinity, but its value
increases as the magnification increases. Therefore, the presence of the bellows
factor reveals that the AFoV actually depends upon the magnification and therefore
the distance to the OP upon which the lens is focused.
For a unit-focusing lens, the AFoV becomes smaller when the OP is brought closer
to the lens. Consequently, the object appears to be larger than expected. For an
internally focusing lens, recall that the focal length f reduces when the OP is brought
closer to the lens. This can have an additional effect on the AFoV that opposes the
bellows factor. Any overall change in AFoV with OP position is referred to as focus
breathing. This phenomenon is discussed in more detail in the next section.
Focus breathing can be eliminated by designing the lens so that the product b fnew always remains equal to f when the OP
distance s changes. Expensive cinematography lenses are often designed in this way
since focus breathing is disadvantageous in cinematography. However, internal focusing
in photographic lenses often overcompensates for the reduction in AFoV that would
occur with a unit-focusing lens. For example, the AFoV of many 70–200 mm lenses on
the 35 mm full-frame format increases as s is reduced. Objects therefore appear
smaller than expected. This behaviour is opposite to that of unit-focusing lenses.
Focus breathing does not occur when focus is set at infinity. In this case, the
AFoV formula reduces to
$$ \alpha(s \to \infty) = 2 \tan^{-1}\left(\frac{d}{2f}\right). $$
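The infinity-focus AFoV is straightforward to tabulate. The sketch below assumes the general form α = 2 tan⁻¹(d/(2bf)), referred to above as equation (1.30), with b = 1, and uses the 35 mm full-frame diagonal d ≈ 43.3 mm; the focal lengths are example values.

```python
import math

# Diagonal angular field of view, alpha = 2*arctan(d / (2*b*f)), where d is the sensor
# diagonal and b = 1 + |m|/m_p is the bellows factor (b = 1 at infinity focus).
# d = 43.3 mm is the 35 mm full-frame diagonal; the focal lengths are illustrative.

def afov_degrees(d, f, b=1.0):
    return math.degrees(2.0 * math.atan(d / (2.0 * b * f)))

d_full_frame = 0.0433
for f_mm in (24, 35, 50, 85, 200):
    print(f_mm, "mm:", round(afov_degrees(d_full_frame, f_mm / 1000.0), 1), "degrees")
    # e.g. 50 mm gives ~46.8 degrees on the diagonal
```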
$$ f_{E,35} = m_f f_E. \qquad (1.32) $$
The quantity mf is the focal length multiplier. This formula is valid when focus is set
at infinity and when the object and image space media are both air so that f = fE .
By definition, mf = 1 for a 35 mm full-frame format camera. Approximate values
for other formats are listed in figure 1.27. These mf values are strictly valid only
when focus is set at infinity. At closer focus distances, equation (1.32) is no longer
exact because the bellows factor b for each format will be different when the cameras
being compared are focused on the same OP. A generalized equation is derived in
chapter 5 where cross-format comparisons are discussed in detail.
It should be noted that focal length fE is a property of the lens and its value does
not change when a lens designed for a given format is used on a different format. For
example, 35 mm full-frame lenses can be used on APS-C cameras. In this case, the
35 mm full-frame lens will project an optical image at the SP with a wider image
circle than the APS-C sensor format diagonal can accommodate. This is not optimal in
terms of image quality since the image will be cropped. Nevertheless, the lens will continue
to operate as a lens of focal length fE on the APS-C camera, with AFoV given by equation (1.30).
Again, a lens with full-frame equivalent focal length fE,35 defined by equation (1.32)
would be required to produce the same AFoV on a 35 mm full-frame camera. In this
scenario, the term crop factor is used for mf instead of focal length multiplier.
However, focal-length multiplier is the appropriate term when the lens projects an
image circle optimized for the sensor format.
1.3.7 Perspective
Figure 1.28 shows a simple photographic scene with two objects labelled A and B.
When the photographer views the scene from position 1, object B appears to be
almost as tall as object A. In contrast, object B only appears to be about half as tall
as object A when the photographer views the scene from position 2. Furthermore,
the horizontal space between objects A and B appears much more compressed when
the scene is viewed from position 2. In other words, positions 1 and 2 offer a different
perspective of the scene.
In the simple example above, perspective is found to depend upon the distance
between the pupil of the photographer’s eye and the object upon which the eye is
focused. When taking a photograph using a camera, perspective again depends only
upon the distance between the lens EP and the OP. This is the true perspective of the
photograph [5], and the location of the EP defines the centre of perspective in object
space. Mathematically, the OP is located at a distance s − sEP from the centre of
perspective, where s is the distance from H to the OP, and sEP is the distance from H
to the EP.
Certain photographic tasks require a particular perspective. For example, it is
preferable to carry out portrait photography at a perspective that provides flattering
compression of facial features. This requirement specifies a suitable working distance
between the photographer and the model. The working distance is independent of
focal length since focal length affects AFoV and magnification but not perspective.
Nevertheless, a portrait lens should provide an appropriate framing once the
Figure 1.28. A simple photographic scene viewed from two different perspectives.
perspective and therefore working distance has been established. On the 35 mm full-
frame format, it is generally considered that a focal length of around 85 mm provides
a suitable framing or AFoV when used at portrait working distances.
A photographic print or screen image should be viewed from a position that
portrays the true perspective. This again requires that the photograph be viewed
from the centre of perspective, but in image space rather than object space.
Therefore, if a photograph does not undergo any enlargement from the sensor
dimensions, the pupil of the photographer’s eye should be positioned at a distance
from the photograph equivalent to s′ − s′_XP. If the photograph is enlarged by a
factor X from the sensor dimensions as illustrated in figure 1.29, the distance s′ − s′_XP
also needs to be scaled by X. Utilizing equation (1.28), the required viewing distance
l′ can be written
$$ l' = (s' - s'_{XP}) X = (|m| + m_p) f' X. $$
Figure 1.29. True perspective is defined by the distance s − s_EP in object space and s′ − s′_XP in image space. When a photograph is enlarged by a factor X from the optical image recorded at the SP, the photograph needs to be viewed from a distance equivalent to l′ = (s′ − s′_XP)X in order to appear at the true perspective.
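As a numerical illustration of this condition, the sketch below evaluates l′ = (∣m∣ + m_p) f′ X for an assumed 50 mm lens in air with m_p = 1 and an X = 8 enlargement; all values are examples, not taken from a particular camera.

```python
# Viewing distance for true perspective: l' = (|m| + m_p) * f' * X.
# Example values: 50 mm lens in air (f' = 0.05 m), pupil magnification m_p = 1,
# a distant OP so that |m| ~ 0.01, and a print enlarged X = 8 from the sensor dimensions.

def true_perspective_viewing_distance(m_abs, m_p, f_prime, X):
    return (m_abs + m_p) * f_prime * X

l_prime = true_perspective_viewing_distance(m_abs=0.01, m_p=1.0, f_prime=0.05, X=8.0)
print(l_prime)   # ~0.40 m; for distant scenes this reduces to l' ~ m_p * f' * X
```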
Figure 1.30. Keystone distortion is caused by the magnification varying across the IP. The pupil magnification
m p = 1 in this example.
$$ u' = \frac{y}{s'}. $$
Therefore
$$ u = \left(\frac{s'}{s}\right) u' = m u', $$
where m is the Gaussian magnification at the OA. From the geometry it can be seen
that
$$ \tan\theta = \frac{s}{y} = \frac{1}{u}, \qquad \tan\theta' = \frac{s'}{y} = \frac{1}{u'}. $$
Combining these equations yields the Scheimpflug condition,
$$ \tan\theta' = m \tan\theta. $$
The human visual system (HVS) subconsciously interprets leaning vertical lines as
parallel when the tilt angle θ is small, and so keystone distortion is objectionable for
small tilt angles.
Tilting can be avoided by using an ultra-wide angle lens and cropping the output
digital image or preferably by using the shift function of a tilt-shift lens to shift the
lens relative to the SP in a vertical plane. Tilt-shift lenses can also tilt the lens relative
to the SP so as to tilt the plane of focus without affecting the AFoV. An example
application is DoF extension [5].
Figure 1.31. Geometry for the DoF equations. The pupil magnification m p > 1 in this example. The near DoF
and far DoF boundaries are defined by the distances where the blur spot diameter is equal to the
acceptable CoC diameter c.
In the lower figure, rays from a point object positioned behind the OP converge in
front of the SP. This point also appears as a blur spot at the SP with diameter c.
Provided the blur spot diameter does not exceed a prescribed value, objects
situated between sn and sf will remain acceptably sharp or in focus at the SP. The
blur spot with this prescribed diameter c is known as the acceptable circle of
confusion (CoC) [5, 6]. The criterion for defining the CoC diameter is based upon
the fact that the resolving power (RP) of the HVS is limited, and so blur spots on the
SP that are smaller than the CoC will not produce noticeable blur on the photo-
graph. Since the RP of the HVS depends upon the viewing conditions, the prescribed
CoC also depends upon the viewing conditions. Camera and lens manufacturers
typically assume the following set of standard viewing conditions when defining
DoF scales [6]:
• Enlargement factor X from the sensor dimensions to the dimensions of the
viewed output photograph: X = 8 for the 35 mm full frame format.
• Viewing distance: L = Dv, where Dv = 250 mm is the least distance of distinct
vision.
• Resolving power (RP) of the HVS at Dv: RP(Dv) = 5 lp/mm (line pairs per
mm) when viewing a pattern of alternating black and white lines. More than
5 lp/mm cannot be distinguished from uniform grey.
Table 1.2 lists examples of standard CoC diameters based upon the above set of
standard viewing conditions. Note that smaller sensor formats require a smaller
CoC since the enlargement factor X will be greater.
When the viewing conditions differ from those defined above, a custom CoC
diameter can be calculated using the following formula:
$$ c = \frac{1.22 \times L}{X \times D_v \times \mathrm{RP}(D_v)}. \qquad (1.33) $$
This formula will be derived in chapter 5 where the concept of the CoC will be
discussed in further detail.
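Equation (1.33) is simple to evaluate; under the standard viewing conditions listed above it returns the familiar full-frame value of approximately 0.030 mm. A minimal sketch:

```python
# Acceptable circle of confusion, c = 1.22 * L / (X * Dv * RP(Dv)), equation (1.33).
# Standard viewing conditions: L = Dv = 250 mm, RP(Dv) = 5 lp/mm, X = 8 for full frame.

def coc_diameter_mm(L=250.0, X=8.0, Dv=250.0, RP=5.0):
    return 1.22 * L / (X * Dv * RP)

print(coc_diameter_mm())        # ~0.0305 mm (35 mm full frame)
print(coc_diameter_mm(X=16.0))  # ~0.015 mm for a format needing twice the enlargement
```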
$$ \frac{c}{|m|} = \left(\frac{s - s_n}{s_n - s_{EP}}\right) D. \qquad (1.34) $$
Similarly, the geometry shown in the lower diagram of figure 1.31 reveals that
$$ \frac{c}{|m|} = \left(\frac{s_f - s}{s_f - s_{EP}}\right) D. \qquad (1.35) $$
The distance sEP from H to the EP has been derived in section 1.3.3 and is defined by
equation (1.24),
$$ s_{EP} = \left(1 - \frac{1}{m_p}\right) f. $$
$$ \text{near DoF} = \frac{c\,(s - s_{EP})}{|m| D + c}. \qquad (1.36) $$
An object positioned closer than the OP with a separation distance greater than
s − sn from the OP is considered to be out of focus.
$$ \text{far DoF} = \frac{c\,(s - s_{EP})}{|m| D - c}. \qquad (1.37) $$
$$ \text{total DoF} = \frac{2 |m| D c\,(s - s_{EP})}{m^2 D^2 - c^2}. $$
As discussed in the following section, the near and far contributions to the total DoF
are not equal in general.
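Equations (1.36) and (1.37) can be combined into a small DoF calculator. The sketch below writes the aperture (EP) diameter as D = f/N, anticipating the f-number N introduced in section 1.5, and uses equations (1.19) and (1.24) for ∣m∣ and s_EP; the camera settings are example values.

```python
# Near and far depth of field from equations (1.36) and (1.37):
#   near DoF = c (s - s_EP) / (|m| D + c),   far DoF = c (s - s_EP) / (|m| D - c),
# with |m| = f/(s - f), D = f/N and s_EP = (1 - 1/m_p) f. Example settings only.

def depth_of_field(f, N, s, c=0.030e-3, m_p=1.0):
    D = f / N                          # aperture (entrance pupil) diameter
    m_abs = f / (s - f)                # magnification magnitude, equation (1.19)
    s_EP = (1.0 - 1.0 / m_p) * f       # H-to-EP distance, equation (1.24)
    near = c * (s - s_EP) / (m_abs * D + c)
    far = c * (s - s_EP) / (m_abs * D - c) if m_abs * D > c else float("inf")
    return near, far

near, far = depth_of_field(f=0.05, N=8.0, s=3.0)
print(near, far)   # ~0.66 m in front of and ~1.19 m behind the OP (50 mm, f/8, 3 m)
```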
Figure 1.32. Near DoF divided by the far DoF as a function of distance s from the first principal plane for an
example set of f, N and c values. At the minimum focus distance past s = f, the near and far DoF are almost
equal and so the ratio is 1:1. At the hyperfocal distance H, the ratio is 1: ∞ and so the fraction reduces to zero.
Beyond a certain focus distance, the far DoF extends to infinity. The distance s − sEP = H measured from the lens EP
is known as the hyperfocal distance [5]. From equation (1.37), the far DoF extends to
infinity at the following magnification:
$$ |m| = \frac{c}{D} \quad (\text{when } s - s_{EP} = H). \qquad (1.38) $$
Substituting for ∣m∣ with s − sEP = H yields the following formula for H:
$$ H = \frac{f D}{c} + f - s_{EP}. \qquad (1.39) $$
Now substituting equation (1.38) into the near DoF formula defined by equation
(1.36) with s − sEP = H yields the following result:
$$ \text{near DoF} = \frac{H}{2}. $$
This reveals that when the OP is positioned at the hyperfocal distance, the far
DoF extends to infinity and the near DoF extends to half the hyperfocal distance
itself.
According to Gaussian optics, focusing at the hyperfocal distance yields the
maximum available DoF for a given combination of camera settings. This is useful
for landscape photography, although in practice it is advisable to set focus at an OP
positioned beyond the hyperfocal distance if the aim is to ensure the distant features
remain in focus. Fixed-focus cameras are set at the hyperfocal distance since these
rely on maximising the total DoF to produce in-focus images.
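Equation (1.39) gives the hyperfocal distance directly. A minimal sketch, again writing D = f/N and neglecting s_EP (valid for m_p = 1); the lens settings are example values.

```python
# Hyperfocal distance from equation (1.39): H = f*D/c + f - s_EP, with D = f/N.
# With s_EP neglected this is the familiar H ~ f^2/(N*c) + f.

def hyperfocal_distance(f, N, c=0.030e-3, s_EP=0.0):
    D = f / N
    return f * D / c + f - s_EP

print(hyperfocal_distance(f=0.05, N=8.0))    # ~10.5 m for a 50 mm lens at f/8 (full-frame CoC)
print(hyperfocal_distance(f=0.024, N=11.0))  # ~1.77 m for a 24 mm lens at f/11
```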
Practical calculations
Equation (1.39) can be written in the following way for practical calculations:
H = h + f − sEP
Figure 1.33. Blur spot as a function of object distance from the first principal point after focusing on the OP
positioned at distance s. The horizontal dashed line indicates an acceptable CoC with diameter c = 0.030 mm.
$$ \varphi = \cos^{-1}\left(\frac{s_n - s_{EP}}{s - s_{EP}}\right). $$
This angle is valid in any pivot direction. The distance sn − sEP can be found by re-
arranging the near DoF formula defined by equation (1.36),
Figure 1.34. Geometry of the focus and recompose technique.
$$ s_n - s_{EP} = \frac{|m| D\,(s - s_{EP})}{|m| D + c}. $$
The maximum pivot angle is found to be
$$ \varphi = \cos^{-1}\left(\frac{|m| D}{|m| D + c}\right). $$
For the special case that the lens is focused at the hyperfocal distance H, equation
(1.38) reveals that ∣m∣D = c and so the maximum allowed pivot angle is
φ = cos−1(0.5) = 60°.
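The maximum pivot angle is easily computed from the expression above. In the sketch below the lens and camera settings are assumed example values, with D = f/N and ∣m∣ from equation (1.19).

```python
import math

# Maximum pivot angle for the focus-and-recompose technique,
# phi = arccos(|m| D / (|m| D + c)), with |m| = f/(s - f) and D = f/N.
# Example: an 85 mm lens focused at 2 m, full-frame CoC c = 0.030 mm.

def max_pivot_angle_deg(f, N, s, c=0.030e-3):
    D = f / N
    m_abs = f / (s - f)
    return math.degrees(math.acos(m_abs * D / (m_abs * D + c)))

print(max_pivot_angle_deg(f=0.085, N=1.8, s=2.0))   # ~10 degrees at f/1.8 (shallow DoF)
print(max_pivot_angle_deg(f=0.085, N=8.0, s=2.0))   # ~20 degrees; stopping down relaxes the limit
```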
Generally, if the calculated φ is too small for the required recomposition, it is
advisable to first compose the scene and then set focus by using an alternative
viewfinder AF point that lies as close as possible to the location of the subject. This
method was used in single-AF mode to focus on the eye in the example shown in
figure 1.35.
1.4.6 Bokeh
As mentioned in the previous section, the photographic DoF formulae give
approximate results since they only describe uniform blur due to defocus. All other
blur contributions have been neglected, including those arising from the higher order
Figure 1.35. Accurate focus on the eye is critical when the DoF is shallow.
Figure 1.36. Blur spots for a lens with under-corrected SA. The upper diagram shows a blur spot at the SP
arising from a background point located just beyond the far DoF boundary, whereas the lower diagram shows
a blur spot at the same SP arising from a foreground point located just in front of the near DoF boundary.
(Blur spots reproduced from [15] with kind permission from ZEISS (Carl Zeiss AG).)
behind the same SP. At the SP, the point will appear as a smaller blur spot with blur
distribution that sharply increases at the edges. Overlapping blur spots of this type
can produce harsh bokeh. Since the rays originate from in front of the near DoF
boundary, under-corrected SA is found to lead to harsh foreground bokeh.
The situation reverses for a lens with residual over-corrected SA. In this case, the
background bokeh becomes harsher, and the foreground bokeh becomes smoother.
Since background bokeh is generally considered more important, slightly under-
corrected SA is preferred in practice [15].
Many other factors affect the appearance of a blur spot. For example, chromatic
aberration at the light–dark boundary may be visible on the rim of an isolated blur
spot due to variation of focus with wavelength. The number of blades used by the iris
diaphragm affects the shape of a blur spot. A higher blade count will produce a more
circular blur spot at the OA. Even if the blur spots are circular at the OA, their shape
may be altered at field positions away from the OA. For example, field curvature
and vignetting at large aperture diameters can produce a cat's-eye-shaped blur
towards the edges of the frame.
Note that factors 1–3 depend upon the relative aperture (RA) of the lens. Later in
this section, it will be shown how the familiar f-number emerges from the RA.
Subsequently, the camera equation will be derived. This describes the photometric
exposure distribution at the SP in terms of the above factors. In chapter 2, an
exposure strategy will be developed based on the camera equation.
1.5.1 Photometry
The ‘amount’ of light from the photographic scene that passes through the lens to
the SP can be quantified in terms of power or flux. Photometric or luminous flux
denoted by Φv is the rate of flow of electromagnetic energy emitted from or received
by a specified surface area, appropriately weighted to take into account the spectral
sensitivity of the HVS. The unit of measurement is the lumen (lm).
Table 1.3 lists various other photometric quantities that are useful for describing
the flow of flux from the photographic scene to the SP.
• Illuminance E v : This is the luminous flux per unit area received by a specified
surface from all directions. The unit of measurement is the lux (lx), which is
equivalent to lm m⁻². Integrating illuminance over a surface area yields the
total flux associated with that area. In photography, the illuminance at the SP
is of primary interest.
• Luminous intensity Iv : Consider the flux emerging from the scene being
photographed. When flux is emitted from an area, the quantity analogous to
illuminance is luminous exitance Mv. However, luminous exitance does not
depend upon a specific direction and so it is not a convenient quantity for
measuring the light emitted from the scene in the direction of the camera.
Luminous intensity Iv is a much more useful quantity as it defines the flux
emitted from a point source into a cone. As illustrated in figure 1.37, the cone
is defined per unit solid angle measured in steradians (sr). The unit of
measurement of luminous intensity is the candela (cd), equivalent to lm sr⁻¹.
In the present context, the cone subtended by the lens EP is of primary interest.
Table 1.3. Common photometric quantities with their symbols and SI units. The ‘v’ (for visual)
subscript is often dropped from photographic formulae.

Quantity               Symbol   SI unit         Equivalent unit
Luminous flux          Φv       lm
Luminous intensity     Iv       lm sr⁻¹         cd
Luminance              Lv       lm m⁻² sr⁻¹     cd m⁻²
Luminous exitance      Mv       lm m⁻²          lx
Illuminance            Ev       lm m⁻²          lx
Photometric exposure   Hv       lm s m⁻²        lx s
Figure 1.37. (a) An angle of 1 radian (rad) is defined by an arc length r of a circle with radius r. Since the
circumference of a circle is 2πr, the angle corresponding to a whole circle is 2π radians. (b) A solid angle ω
projects a point source onto the surface of a sphere, thus defining a cone. For a cone of radius r, a surface area
of r² defines a solid angle of 1 steradian (sr). Since the surface area of a whole sphere is 4πr², the solid angle
corresponding to a whole sphere is 4π steradians.
• Luminance L v : When the source is not an isolated point but is extended,
luminous intensity at a source point can be spread into an infinitesimal source
area. This defines luminance L v as the luminous flux per unit solid angle per
unit projected source area. Integrating luminance over the cone angular
subtense yields the flux received by the observer (such as the eye or lens EP)
from the source position. Integrating over the entire source area then yields
the total flux received by the observer.
Based on the above, the scene luminance distribution can now be defined as an array
of infinitesimal luminance patches representing the photographic scene, where the
solid angle defines a cone subtended by the scene position from the lens EP. The lens
transforms the flux defined by the scene luminance distribution into an array of
infinitesimal illuminance patches on the SP referred to as the sensor-plane illumi-
nance distribution.
Figure 1.38. The angle element dθ is integrated from 0 to θ. The OA is in the z-direction, and the angle ϕ
revolves around the OA.
Φ = ∫_{θ1=0}^{θ} ∫_{ϕ=0}^{2π} Iv(θ1) dω(θ1, ϕ).    (1.43)
Figure 1.39. Surface area element dS of a cone of flux. Here the z-axis representing the OA is in the vertical
direction.
Substituting equations (1.42) and (1.44) into (1.43) and performing the integration
over ϕ yields
Φ = 2π ∫_0^θ I cos θ1 sin θ1 dθ1 = πL dA ∫_0^θ (2 sin θ1 cos θ1) dθ1.    (1.45)
Here θ1 is the dummy variable. Performing this integration provides the following
result:
Φ = πL sin²θ dA    (1.46)
Here dA is the infinitesimal area element associated with the axial position on the
plane surface, L is the luminance associated with dA and θ is the real vertical angle
subtended by the cone with the OA.
Equation (1.46) is valid when tracing real rays through an optical system. Recall
from section 1.1.4 that Gaussian optics avoids describing lens aberrations by
considering only the paraxial region where sin θ → θ . Subsequently, a linear extension
of the paraxial region is performed by introducing the ray tangent slope u in place of θ.
Consequently, the form of equation (1.46) valid within Gaussian optics is
Φ = πL u² dA    (1.47)
Figure 1.40. Gaussian geometry defining the relative aperture (RA) and working f-number Nw .
The formal way to obtain this result is to note that the dummy variables appearing
in equation (1.45) become cos θ1 = 1, sin θ1 = u1, and dθ1 = du1 within Gaussian
optics.
Since u is a ray tangent slope, an important feature of the Gaussian description is
that the base of the cone becomes flat. This feature is illustrated in figure 1.40.
The aim is to obtain an expression for the resulting illuminance E associated with
dA′, the infinitesimal area at the axial position on the SP.
First note that dA and dA′ may be expressed as the product of infinitesimal
heights in the x and y directions,
dA = dhx dhy
dA′ = dh′x dh′y.
By utilizing equation (1.16), the ratio of dA′ to dA can now be expressed in terms of
the magnification,
dA′/dA = (dh′x dh′y)/(dhx dhy) = m².
Substituting for dA in equation (1.47) yields
Φ = πL (dA′/m²) u².
Further progress can be made by utilising the Lagrange theorem [1, 2]. This theorem
defines an optical invariant valid within Gaussian optics. Application of the
invariant provides the relationship defined by equation (1.18) between the magni-
fication and the marginal ray tangent slopes u and u′,
m = nu/(n′u′).
Now substituting for m yields the flux at the axial position on the SP in terms of the
ray tangent slopes,
Φ = πL dA′ {(n′/n) u′}².
The corresponding illuminance at the axial position is given by
E = T Φ/dA′,    (1.48)
where T ⩽ 1 is a lens transmittance factor that takes into account light loss due to the
lens material. It now follows that
E = (π/4) L T {(n′/n) 2u′}².    (1.49)
The quantity in braces defines the relative aperture (RA),
RA = (n′/n) 2u′.    (1.50)
1.5.4 f-number
When focus is set at infinity, the OP distance s → ∞, and the IP (and SP) distance
s′ → f ′, where f ′ is the rear effective focal length. The corresponding geometry is
shown in figure 1.41. The ray tangent slope u′ is seen to be
u′(s→∞) = D/(2f′),
where D is the diameter of the entrance pupil. Substituting into equation (1.49) yields
the following expression for the illuminance at the axial position on the SP:
E(s→∞) = (π/4) L T {(n′/n)(D/f′)}².
Figure 1.41. Gaussian geometry defining the f-number N for a lens focused at infinity. In this illustration the
pupil magnification m p > 1.
The refractive indices can be removed by utilising the relationship between the front
and rear effective focal lengths defined by equation (1.15),
f′/f = n′/n.
Therefore
E(s→∞) = (π/4) L T (D/f)².
Back in the nineteenth century, the quantity D/f was defined as the apertal ratio [17];
however, this term has not come into widespread use. The apertal ratio is the specific
case of the RA when the lens is focused at infinity. In photography it is numerically
more convenient to consider the reciprocal of the apertal ratio instead. This is
defined as the f-number N:
N = f/D.    (1.51)
RA and f-number therefore have a reciprocal relationship [14]. The f-number is
usually marked on lens barrels using the symbols f /n or 1:n, where n is the numerical
value of the f-number. (In order to avoid confusion with focal length, the symbol N
is used for f-number throughout this book.) The expression for the illuminance at the
axial position (axial infinitesimal area element) on the SP when focus is set at infinity
becomes
E(s→∞) = (π/4) L T (1/N²).    (1.52)
Note that according to equation (1.51), the f-number is defined as the front effective
focal length divided by the diameter of the EP [2, 18]. The f-number is commonly but
incorrectly defined as the effective focal length fE or rear effective focal length f ′
divided by the diameter of the EP. In the former case, the correct numerical value for
N would be obtained only if the object-space medium is air. In the latter case, the
correct numerical value for N would be obtained only if the object-space and image-
space media both have the same refractive index.
It is often assumed that the Gaussian expression for the f-number is valid only
within Gaussian optics. It will be shown in section 1.5.10 that equation (1.51) is in
fact exact for a lens that is free of SA and coma. It will also be shown that the
minimum value of the f-number in such a lens is limited to N = 0.5 in air.
1.5.5 Working f-number
When focus is instead set on an OP positioned closer than infinity, the ray tangent
slope becomes
u′ = (mp D/2)/(s′ − s′XP).
Here D is the diameter of the EP, mp is the pupil magnification, s′ is the image
distance measured from the second principal plane and s′XP is the distance from the
second principal plane to the XP. The distance s′ − s′XP was derived in section 1.3.3
and is defined by equation (1.28),
s′ − s′XP = (|m| + mp) f′.
Therefore
u′ = mp D/(2(|m| + mp) f′).
Substituting into equation (1.49) and utilising the relationship between the front and
rear effective focal lengths defined by equation (1.15) leads to the following
expression for the illuminance at the axial position (axial infinitesimal area element)
on the SP:
E = (π/4) L T (D/(bf))².    (1.53)
Here b is the bellows factor defined by equation (1.31), which emerged when deriving
the AFoV formula
b = 1 + |m|/mp.
In terms of the working f-number Nw, equation (1.53) takes the same form as equation (1.52),
E = (π/4) L T (1/Nw²),    (1.54)
where the working f-number is defined as
Nw = bN.    (1.55)
When focus is set at infinity, ∣m∣ → 0 and the bellows factor b → 1. The working
f-number then reduces to the f-number N .
Consider a lens that utilises unit focusing. When focus is set on an OP positioned
closer than infinity, ∣m∣ > 0 and so Nw > N . Comparison of equations (1.52) and
(1.54) reveals that the illuminance decreases relative to its value at infinity focus.
This is a consequence of focus breathing. The contribution from the bellows factor
can become significant at close focus distances in the same way that it affects the
AFoV. For example, when ∣m∣ = 1 and m p = 1, the bellows factor becomes b = 2 and
equation (1.54) reveals that the flux arriving at the axial position on the SP is
reduced to one quarter of the amount that would be expected according to the value
of N . For lenses that utilise internal focusing, the decrease in focal length that occurs
at closer focusing distances may compensate for the flux reduction, and so any
change in illuminance depends upon the specific focus breathing properties of the
lens design.
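A minimal sketch of the bellows-factor bookkeeping described above; the function names are illustrative, and a symmetric lens (mp = 1) is assumed in the example.

```python
import math

def working_f_number(N, m_abs, m_p=1.0):
    """Working f-number Nw = bN with bellows factor b = 1 + |m|/mp."""
    b = 1.0 + m_abs / m_p
    return b * N

def close_focus_light_loss_stops(m_abs, m_p=1.0):
    """Illuminance loss in stops relative to infinity focus, from E proportional to 1/Nw^2."""
    b = 1.0 + m_abs / m_p
    return 2.0 * math.log2(b)

# 1:1 macro reproduction (|m| = 1) with mp = 1: b = 2, Nw = 2N, and the axial
# illuminance drops by two stops (a factor of four) relative to infinity focus.
print(working_f_number(2.8, 1.0))          # 5.6
print(close_focus_light_loss_stops(1.0))   # 2.0
```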
Photographic exposure strategy will be discussed in chapter 2. Standard
exposure strategy is based upon N rather than Nw as it assumes that focus is set
at infinity. Nevertheless, knowledge of Nw will indicate exposure compensation
that may need to be applied at close focusing distances when a hand-held exposure
meter is used. In the case of in-camera through-the-lens (TTL) metering systems,
any illuminance change due to the OP distance will automatically be taken into
account [5].
1.5.6 f-stop
Recall that when focus is set at infinity, the illuminance at the axial area element dA′
on the SP is defined by
E(s→∞) = (π/4) L T (1/N²).
Here L is the scene luminance at the axial area element on the OP, T is the lens
transmittance factor and N = f /D is the f-number. The flux collected at dA′ is given
by
Φ = E dA′ .
Significantly, if the front effective focal length f and EP diameter D are changed but
their ratio is kept constant, N will remain constant and the flux or luminous power Φ
incident at dA′ will also remain constant provided L is time-independent.
Now consider adjusting the f-number itself. Since E is inversely proportional to
the square of N, the f-number must decrease by a factor √2 in order to double the
flux Φ. Analogously, the f-number must increase by a factor √2 in order to halve Φ.
Adjustable iris diaphragms are constructed so that successive increments will double
or halve the flux Φ. This leads to the following series of possible f-numbers when the
surrounding medium is air:
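The full stops are the familiar rounded powers of √2 (1, 1.4, 2, 2.8, 4, 5.6, 8, 11, 16, 22, ...), here generated from the N = 0.5 limit discussed in section 1.5.10; a minimal sketch:

```python
import math

# Successive full stops change the flux by a factor of two, so successive
# f-numbers differ by a factor of sqrt(2): N_k = 0.5 * (sqrt(2))**k.
full_stops = [0.5 * math.sqrt(2) ** k for k in range(13)]
print([f"{N:.2f}" for N in full_stops])
# ['0.50', '0.71', '1.00', '1.41', '2.00', '2.83', '4.00', '5.66',
#  '8.00', '11.31', '16.00', '22.63', '32.00']
# Lens barrels mark the conventionally rounded values
# 0.5, 0.7, 1, 1.4, 2, 2.8, 4, 5.6, 8, 11, 16, 22, 32.
```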
Figure 1.42. Geometry of the cosine fourth law which explains natural vignetting. Recall that the base of the
cone becomes flat in Gaussian optics.
Performing these modifications to the derivation given in section 1.5.2 leads to the
following Gaussian result:
Φ(φ) = πL dA u² cos⁴φ.
Using this in place of equation (1.47) yields the illuminance at any desired position
on the SP. For example, equation (1.54) generalises to
E(x, y) = (π/4) L(x/m, y/m) T (1/Nw²) cos⁴{φ(x/m, y/m)}.    (1.56)
The coordinates (x , y ) indicate the position on the SP, and the coordinates
(x /m, y /m ) indicate the corresponding position on the OP. These two sets of
Figure 1.43. Natural cosine-fourth falloff for a rectilinear lens focused at infinity in air. The upper and lower
diagrams show effective focal lengths 24 mm and 50 mm, respectively, on a camera with a 35 mm full-frame
sensor.
coordinates are related via the magnification. The coordinates are typically dropped
in photographic formulae,
E = (π/4) L T (1/Nw²) cos⁴φ.    (1.57)
Here φ is the object-space angle subtended from the centre of the EP by the off-axis
scene position under consideration. It is related to the corresponding image-space
angle φ′ by
φ = (n′/n) mp φ′.
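A minimal sketch of the cos⁴φ falloff of equation (1.57) at the corner of a 35 mm full-frame sensor, for the two focal lengths shown in figure 1.43. It assumes an ideal rectilinear lens focused at infinity in air with mp = 1, so that tan φ = r/f; all numbers are illustrative.

```python
import math

def cos4_falloff(focal_length_mm, radius_mm):
    """Relative illuminance cos^4(phi) for an ideal rectilinear lens focused
    at infinity in air, assuming tan(phi) = r / f."""
    phi = math.atan2(radius_mm, focal_length_mm)
    return math.cos(phi) ** 4

corner = math.hypot(18.0, 12.0)  # semi-diagonal of a 36 x 24 mm frame, ~21.6 mm
for f in (24.0, 50.0):
    falloff = cos4_falloff(f, corner)
    print(f"{f:.0f} mm: corner illuminance {falloff:.2f} of centre "
          f"({-math.log2(falloff):.1f} stops)")
# 24 mm: corner illuminance 0.30 of centre (1.7 stops)
# 50 mm: corner illuminance 0.71 of centre (0.5 stops)
```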
Again the ‘v’ subscripts have been dropped for clarity. The time integral can be
replaced by a product provided the illuminance does not change during the exposure
duration,
H = Et. (1.58)
The time t is the exposure duration, informally referred to as the shutter speed.
Since illuminance E is the power or flux received per unit area weighted by the
spectral response of the HVS, photometric exposure H is the electromagnetic energy
received per unit area weighted by the same response. Substituting equation (1.57)
into (1.58) yields the camera equation
H = (π/4) L T (t/Nw²) cos⁴φ.    (1.59)
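A minimal numerical sketch of the camera equation (1.59); all input values below are illustrative.

```python
import math

def photometric_exposure(L, t, N_w, T=1.0, phi_deg=0.0):
    """Camera equation (1.59): photometric exposure in lx s for scene luminance
    L (cd m^-2), exposure duration t (s), working f-number N_w, lens
    transmittance T and object-space field angle phi."""
    phi = math.radians(phi_deg)
    return (math.pi / 4.0) * L * T * (t / N_w ** 2) * math.cos(phi) ** 4

# Illustrative values: a 1000 cd/m^2 patch, 1/125 s at Nw = 8, T = 0.9.
print(photometric_exposure(1000.0, 1 / 125, 8.0, T=0.9, phi_deg=0.0))   # ~0.088 lx s
print(photometric_exposure(1000.0, 1 / 125, 8.0, T=0.9, phi_deg=20.0))  # ~0.069 lx s
```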
The camera equation describes the photometric exposure distribution at the SP. It
should be remembered that H is a function of position on the SP, where each
position is associated with an infinitesimal area element dA′. In analogy with
equation (1.57), the camera equation can be expressed more explicitly using a
coordinate representation,
H(x, y) = (π/4) L(x/m, y/m) T (t/Nw²) cos⁴{φ(x/m, y/m)}.
The coordinates (x , y ) indicate the position on the SP, and the coordinates
(x /m, y /m ) indicate the corresponding position on the OP. These two sets of
coordinates are related via the magnification. Again, the angle φ is the object-space
angle subtended from the centre of the EP by the off-axis scene position under
consideration. Even if the scene luminance distribution is uniform, H will vary over
the SP due to natural vignetting along with any mechanical vignetting.
For a given scene luminance distribution, the magnitude of the photometric
exposure distribution at the SP depends primarily upon the working f-number and
the exposure duration t. Photometric exposure is independent of the camera ISO
setting, although a higher ISO setting will reduce the maximum exposure that can be
tolerated. The choice of exposure settings depends upon the exposure strategy of the
photographer. Exposure strategy is discussed in chapter 2.
1.5.9 Shutters
A camera uses a shutter to control the exposure duration t. Three main types of
shutter are used in modern cameras.
Figure 1.45. (a) Traversal of each curtain when the shutter speed is slow. (b) Traversal of each curtain when
the shutter speed is faster than the curtain traversal time (sync speed).
For an FP shutter, the shutter traversal time is the time needed for a single curtain
to traverse the sensor. This value is typically of order 1/250 s and is the same for both
curtains. Two different scenarios are shown in figure 1.45 that illustrate the
significance of the curtain traversal time. Figure 1.45(a) shows the opening and
closing of the shutter for a slow shutter speed (long exposure duration) where
t > 1/250 s and figure 1.45(b) shows the opening and closing of the shutter for a fast
shutter speed (short exposure duration) where t < 1/250 s. Evidently, a shutter speed
faster than the shutter traversal time is obtained only when the second curtain starts
closing before the first curtain has fully opened. In this case, the imaging sensor is
exposed by a moving slit that is narrower than the height of the imaging sensor
(figure 1.46). This means that shutter speeds faster than the shutter traversal time
should not be used with conventional flash as the very brief flash duration would
freeze the appearance of the slit and produce dark bands in the photograph. For this
reason, the shutter traversal time is also known as the sync speed.
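An idealised model of the moving slit (not a formula given in the text): if both curtains travel at the same constant speed, the slit height is the sensor height scaled by t divided by the traversal time whenever t is shorter than the sync speed. A minimal sketch with illustrative values:

```python
def slit_height_mm(t, traversal_time=1 / 250, sensor_height_mm=24.0):
    """Idealised FP shutter: height of the moving slit exposing the sensor,
    assuming both curtains travel at a constant, equal speed."""
    if t >= traversal_time:
        # Second curtain starts only after the first has fully opened.
        return sensor_height_mm
    return sensor_height_mm * t / traversal_time

# At 1/2000 s with a 1/250 s traversal time, only a 3 mm slit is open at any
# instant, which is why conventional flash cannot be used above the sync speed.
print(slit_height_mm(1 / 2000))  # 3.0
```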
All types of rolling shutter can cause an artefact known as rolling shutter
distortion, particularly in photographs of scenes that include fast motion. Rolling
shutter distortion occurs due to the fact that different vertical positions on the SP are
not exposed at the exact same instant of time.
Figure 1.46. The sensor is exposed by a moving slit when the second curtain starts closing before the first
curtain has fully opened. This occurs when shutter speeds faster than the sync speed (shutter traversal time) are
used.
Leaf shutter
This type of shutter is positioned next to the AS of the lens. It is commonly used in
compact cameras. The blades open from the centre and so the shutter traversal time
is very quick, which is advantageous for flash photography. However, the fastest
achievable shutter speed is limited to about 1/2000 s since the same blades are used
for opening and closing the shutter.
Electronic shutter
Compared to a mechanical shutter, an electronic shutter allows much faster shutter
speeds to be used. At present, electronic shutter speeds of up to 1/32000 s are
available in consumer cameras. The fastest shutter speed depends on the row
readout speed, which is the time taken for the charge signals corresponding to a row
of pixels on the imaging sensor to be electronically read out. Advantages of
electronic shutters include silent operation, absence of mechanical wear and absence
of unwanted vibrations described as shutter shock.
In a CCD sensor, all rows can be read simultaneously. This enables an electronic
global shutter to be used, which has two main advantages. First, rolling shutter
distortion will be absent. Second, the shutter traversal speed is equivalent to the row
readout speed. Since the row readout speed is equivalent to the fastest possible
shutter speed, flash can be used at very fast shutter speeds.
At present, electronic global shutters have not been implemented in consumer
cameras with CMOS sensors, although this will likely change in the near future.
Instead, consumer cameras with CMOS sensors use an electronic rolling shutter that
requires each row to be read in sequence. Since each row needs to be exposed for the
same time duration, the exposure at a given row cannot be started until the readout
of the previous row has been completed. The shutter traversal time is therefore
limited by the frame readout speed, which is typically slower than the shutter
traversal time of a mechanical shutter. This has two main disadvantages. First,
rolling shutter distortion is more severe than that caused by mechanical FP shutters.
Second, the use of conventional flash is limited to shutter speeds slower than the
frame readout speed. In cameras that offer both a mechanical and an electronic
shutter, the use of flash with electronic rolling shutter is often disabled. Electronic
first curtain shutter is a compromise solution that offers some of the advantages of
an electronic shutter. The electronic rolling shutter is used only to start the exposure
at each row. This process is synchronized with a mechanical shutter that ends the
exposure.
Figure 1.47. When an optical system is described using real ray optics, the object-space and image-space
numerical apertures are defined in terms of the half-cone angles U and U ′ formed by the real marginal ray.
In the present context, the equivalent theorem is the sine theorem. This can be used
to obtain a relationship between the real marginal ray angles U , U ′ and the
magnification M defined by the real marginal ray IP,
M = NA/NA′ = (n sin U)/(n′ sin U′).    (1.61)
Here NA = n sin U is the object-space numerical aperture, and NA′ = n′ sin U′ is
the image-space numerical aperture.
By substituting equation (1.61) into (1.60) and utilising the fact that M 2 = dA′/dA,
the illuminance E = (T Φ)/dA′ at the axial position on the IP becomes
E = (π/4) L T × (RA)²
where RA is the relative aperture defined by the real marginal ray,
RA = 2NA′/n = 2n′ sin U′/n.
This is a generalisation of equation (1.50). Recall that the f-number is defined as the
reciprocal of the RA when focus is set at infinity, and therefore
N = n/(2NA′∞).    (1.62)
Here NA′∞ is the image-space numerical aperture when focus is set at infinity. It
should be noted that the above expression for the f-number defines the illuminance
at the axial position on the IP defined by the real marginal ray. In the presence of
aberrations, this IP may not correspond exactly with the plane of best focus defined
by the camera SP.
The image-space angle U ′ and image-space numerical aperture NA′ are maxi-
mised when focus is set at infinity. When a lens that utilises unit focusing is focused
on an OP positioned closer than infinity, the object-space angle U and object-space
numerical aperture NA both become larger and the magnification increases
according to equation (1.61). On the other hand, U ′ and NA′ both become smaller
and this reduces the illuminance at the axial position on the IP. The maximum
achievable magnification will in theory be higher if n > n′. This principle is utilised
in microscopes by immersing the objective in immersion oil that has n ≈ 1.5.
Conversely for a given scene luminance, the maximum achievable illuminance at
the axial position on the IP can in principle be increased by using an image-space
medium with n′ > n.
Now consider the case of an aplanatic lens. This is defined as a lens free from SA and
coma. According to Abbe’s sine condition [1, 2, 16], in an aplanatic lens the
magnification M defined by the real marginal ray is equal to the Gaussian magnifi-
cation m, and so the sine theorem defined by equation (1.61) takes the following form:
sin U′/u′ = sin U/u    (aplanatic lens).
Figure 1.48. When an aplanatic lens focused at infinity is described using real ray optics, the second equivalent
refracting surface or second principal surface (dashed blue curve) is part of a perfect hemisphere centred at the
rear focal point. The Gaussian principal planes are indicated by H and H’. The EP diameter is D. The XP is
not involved.
Figure 1.48 shows an aplanatic lens with focus set at infinity. In this special case, the
object-space angles sin U and u approach zero but the quotient sin U /u approaches
unity, and therefore sin U ′ = u′ [22]. In section 1.5.4 and figure 1.41, it was shown
that
u′(s→∞) = D/(2f′).
This means that the image-space numerical aperture for an aplanatic lens focused at
infinity can be written as follows:
NA′∞ = n′D/(2f′)    (aplanatic lens).
The final result is obtained by substituting NA′∞ into equation (1.62) and utilising
the fact that n′/f ′ = n/f,
N = n/(2NA′∞) = f/D    (aplanatic lens).    (1.63)
In other words, the Gaussian expression for the f-number is exact for an aplanatic
lens [1, 5, 14]. Equation (1.63) is important for two main reasons:
Figure 1.49. Remembering that u′ must be interpreted as a tangent when the paraxial region is extended, the
sine of the real angle U ′ must equal u′ when an aplanatic lens is focused at infinity.
1. It shows that the f-number cannot be made arbitrarily small. The maximum
value of the sine function is unity and so the minimum possible f-number in
air is seen to be N = 0.5 for an aplanatic lens. A similar limit can be expected
for non-aplanatic lenses that have been well-corrected. The limit can be
lowered by using an image-space medium with a higher refractive index than
object space so that n′ > n [2].
2. In order for equation (1.63) to hold, the real image-space ray bundle must be
associated with a second equivalent refracting surface or second principal
surface that takes the form of a perfect hemisphere of radius f ′ centred at the
rear focal point. The geometry is shown in figure 1.48 and 1.49. This is
consistent with the fact that the principal ‘planes’ obtained by extending the
paraxial region within Gaussian optics are actually curved surfaces centred
at the object and image points when described using real ray optics [1, 3, 5].
References
[1] Jenkins F A and White H E 1976 Fundamentals of Optics 4th edn (New York: McGraw-Hill)
[2] Kingslake R and Johnson R B 2010 Lens Design Fundamentals 2nd edn (New York:
Academic)
[3] Smith W J 2007 Modern Optical Engineering 4th edn (New York: McGraw-Hill)
[4] Kingslake R 1983 Optical System Design 1st edn (New York: Academic)
[5] Kingslake R 1992 Optics in Photography, SPIE Press Monograph vol PM06 (Bellingham,
WA: SPIE)
[6] Ray S F 2002 Applied Photographic Optics: Lenses and Optical Systems for Photography,
Film, Video, Electronic and Digital Imaging 3rd edn (Oxford: Focal Press)
[7] Greivenkamp J E 2004 SPIE Field Guides Field Guide to Geometrical Optics vol FG01
(Bellingham, WA: SPIE)
[8] Johnson R B 2008 Correctly making panoramic imagery and the meaning of optical center
Proc. SPIE 7060 70600F
[9] Goldberg N 1992 Camera Technology: The Dark Side of the Lens (New York: Academic)
[10] Kreitzer M H 1982 Internal focusing telephoto lens US Patent Specification 4359297
[11] Sato S 1994 Internal focusing telephoto lens US Patent Specification 5323270
[12] Mukai H, Karasaki T and Kawamura K 1985 Focus conditioning detecting device for
cameras US Patent Specification 4552445
[13] Stauffer N L 1975 Auto focus camera US Patent Specification 3860035
[14] Blahnik V 2014 About the Irradiance and Apertures of Camera Lenses (Oberkochen: Carl
Zeiss Camera Lens Division)
[15] Nasse H H 2010 Depth of Field and Bokeh (Oberkochen: Carl Zeiss Camera Lens Division)
[16] Born M and Wolf E 1999 Principles of Optics: Electromagnetic Theory of Propagation,
Interference and Diffraction of Light 7th edn (Cambridge: Cambridge University Press)
[17] Sutton T and Dawson G 1867 A Dictionary of Photography (London: Sampson Low, Son, &
Marston)
[18] Hatch M R and Stoltzmann D E 1980 The f-stops here Opt. Spectra 80
[19] Foote P D 1915 Illumination from a radiating disk Bull. Bureau Stand. 12 583
[20] Koyama T 2006 Optics in digital still cameras Image Sensors and Signal Processing for
Digital Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 2
[21] Kerr D 2007 Derivation of the ‘Cosine Fourth’ law for falloff of illuminance across a camera
image unpublished
[22] Sasian J 2012 Introduction to Aberrations in Optical Imaging Systems (Cambridge:
Cambridge University Press)
Chapter 2
Digital output and exposure strategy
Chapter 1 derived the optical formulae required to define the illuminance distribu-
tion formed by an optical image at the sensor plane (SP). The photometric exposure
distribution at the SP was defined as the illuminance distribution multiplied by the
exposure time or duration t, as described by the camera equation derived in section 1.5.
For a time-dependent illuminance distribution, the multiplication generalises to a time
integral.
Although the choice of photometric exposure distribution depends upon the
aesthetic and technical requirements of the individual photographer, the exposure
distribution must at least generate a useful response from the imaging sensor so that
a satisfactory digital output image can be obtained. The Camera and Imaging
Products Association of Japan (CIPA) DC-004 and International Organization for
Standardization (ISO) 12232 photographic standards [1, 2] provide a standard
exposure strategy based on the digital output JPEG image produced by the camera,
and not the raw data. For a typical photographic scene, standard exposure strategy
aims to map the metered average scene luminance to a standard mid-tone lightness
in the output JPEG image file. This can be used as a starting point for further
adjustments to accommodate non-typical scenes or aesthetic requirements.
Since the standard exposure strategy is based upon the digital output JPEG image
file, this chapter begins with an overview of the chain of processes that lead to the
production of a JPEG image from the photometric exposure distribution at the SP.
These processes are described in much greater detail in chapters 3 and 4, which cover
raw data and raw conversion, respectively.
Subsequently, the theory of the standard exposure strategy is developed. This is
followed by a description of the metering and exposure modes found on modern
digital cameras, and advanced topics related to a practical exposure strategy such as
photographic lighting, the use of photographic filters and high dynamic range
imaging.
Finally, it should be noted that the standard exposure strategy does not aim to
maximise image quality. Image quality in relation to photographic practice is
discussed in chapter 5.
An analog-to-digital converter (ADC) converts the voltage into a raw value, which
can be one of many possible discrete raw levels. Raw values can be expressed in
terms of digital numbers (DN) or identically in terms of analog-to-digital units
(ADU). The raw values associated with all photosites comprise the raw data. The
raw data together with camera metadata can be stored in a raw file.
Each raw level is specified by a string of binary digits or bits, each of which can be
either 0 or 1. The length of the string is known as the bit depth. For a bit depth equal
to M, the number of possible raw levels is given by 2M . A raw file with a bit depth
equal to M is referred to as an M-bit raw file. An M-bit raw file therefore provides
2M possible raw levels per photosite. Typically M = 10, 12 or 14 in consumer
cameras.
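For example, the number of raw levels per photosite follows directly from the bit depth:

```python
# Number of possible raw levels per photosite for common raw bit depths.
for M in (10, 12, 14):
    print(f"{M}-bit raw file: {2 ** M} raw levels per photosite")
# 10-bit: 1024, 12-bit: 4096, 14-bit: 16384
```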
Figure 2.1 shows an idealised sensor response curve. The proportionality between
scene luminance and electron count is evident. Modern imaging sensors respond
linearly over much of the sensor response curve. However, a useful response to
Figure 2.1. Model sensor response curve. A useful response is obtained between the noise floor and full-well
capacity.
photometric exposure is obtained only between lower and upper electron counts, ne,j ,
referred to as the noise floor and full-well capacity (FWC), respectively. The noise
floor is defined by the read noise, which is the signal noise due to the electronic
readout circuitry. Read noise will be present every time charge readout occurs, even
in the absence of photometric exposure. It therefore defines the minimum usable
output signal from an engineering perspective. FWC is the maximum number of
photoelectrons that can be stored at a photosite. When FWC is reached, the
photosite is described as saturated.
2.1.2 Colour
A colour can be specified by its luminance and chromaticity. Luminance only defines
the achromatic or greyscale component of a colour, and so full colour reproduction
requires a strategy for detecting chromaticity.
As described in chapter 4, the human eye uses three types of cone cells for
detecting colour. Each colour can be specified in terms of a unique set of three
tristimulus values that arise from the light entering the eye and the response of each
type of cone cell. The LMS colour space describes all possible colours in terms of all
valid sets of tristimulus values. A colour space that describes all possible colours is
referred to as a reference colour space.
A similar approach is used in consumer cameras. One of three different types of
colour filter are placed above each photosite on the imaging sensor. The pattern of
colour filters forms a colour filter array (CFA). For example, figure 2.2 illustrates a
Bayer CFA, which uses a pattern of red, green and blue filters [3]. When the voltage
signal from a given photosite is quantized by the ADC, a raw value R , G or B
associated with a given type of filter will be recorded. Since only one type of filter can
be placed over each photosite, full RGB information is not available at each
photosite location. This missing information can be obtained through interpolation
by carrying out a computational process known as colour demosaicing.
After the colour demosaic has been performed, the colour of the light recorded at
a given photosite is described by a set of R , G , B values, which can be referred to as
raw tristimulus values. The internal camera raw space describes all possible colours
Figure 2.2. Bayer CFA showing the red, green and blue mosaics.
that the camera can record. Although the raw tristimulus values specifying a given
colour in the camera raw space are not numerically the same as the tristimulus
values that specify the same colour in the LMS colour space, a linear transformation
should exist between the camera raw space and the LMS colour space if the camera
is to correctly record colour. Also notice that the camera raw space is sampled
discretely since the number of raw levels is restricted to 2M for each colour
component per photosite.
The camera raw space is device-dependent and may contain many colours that
cannot be displayed on a standard three-channel display monitor. Therefore, the
camera raw space is not a suitable colour space for viewing images. However, the
camera raw space can be transformed into a standard output-referred colour space
designed for viewing images on a standard display monitor. Familiar examples
include sRGB [4] and AdobeⓇ RGB [5].
The linear RGB components of the output-referred colour space are denoted by
RL , G L , BL . These can be referred to as relative tristimulus values of the output-
referred colour space. The set of raw tristimulus values, R , G , B obtained from a
given photosite can be transformed into RL , G L , BL values by applying the following
linear matrix transformation:
(RL, GL, BL)ᵀ = R̲ D̲ (R, G, B)ᵀ.
Here R̲ is a colour rotation matrix,
R̲ = [ R11  R12  R13
      R21  R22  R23
      R31  R32  R33 ].
The matrix entries are colour space and camera dependent; however, each row will
always sum to unity. The diagonal matrix D̲ takes care of white balance, which will
be discussed in chapter 4. Colour rotation matrices can be determined by character-
ising the colour response of a camera, which is also described in chapter 4.
When a colour is represented by RL , G L , BL , the luminance component of the
colour is obtained as a weighted sum. In the case of the sRGB colour space, the
weighting is defined as follows:
Y = 0.2126 RL + 0.7152 G L + 0.0722 BL. (2.2)
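A minimal sketch of this raw-to-output transformation. The rotation-matrix entries and white-balance multipliers below are invented illustrative numbers (real values are camera and colour-space dependent), chosen only so that each row of the rotation matrix sums to unity as required.

```python
# Hypothetical colour rotation matrix (each row sums to unity) and hypothetical
# white-balance multipliers for the R, G, B raw channels.
R_rot = [[ 1.70, -0.55, -0.15],
         [-0.20,  1.45, -0.25],
         [ 0.05, -0.50,  1.45]]
wb = [2.0, 1.0, 1.6]

def raw_to_linear_rgb(raw_rgb):
    """Apply the white-balance diagonal D, then the rotation matrix R."""
    balanced = [w * v for w, v in zip(wb, raw_rgb)]
    return [sum(R_rot[i][j] * balanced[j] for j in range(3)) for i in range(3)]

def srgb_relative_luminance(rgb_linear):
    """Equation (2.2): luminance as a weighted sum of linear sRGB components."""
    r, g, b = rgb_linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

rgb_l = raw_to_linear_rgb([0.10, 0.25, 0.12])
print(rgb_l, srgb_relative_luminance(rgb_l))
```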
Raw DR is limited by the noise floor from below, and by FWC or ADC
saturation from above. Ideally the scene DR will be less than the raw DR, otherwise
scene information will be clipped irrespective of the photographic exposure strategy.
As described in chapter 5, raw DR can be specified using electrons or raw levels.
In terms of raw levels, raw DR per photosite can be defined as follows:
raw DR (ratio) = nDN,clip/σDN,read : 1
raw DR (stops) = log2(nDN,clip/σDN,read).    (2.3)
Here nDN,clip is the raw clipping point expressed as a DN, which is the maximum
usable raw level in the raw data. The noise floor or read noise expressed using DN
has been denoted by σDN,read , and this should not drop below the quantization step
(1 DN) on average.
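A minimal sketch of equation (2.3); the clipping point and read noise used in the example are illustrative values only.

```python
import math

def raw_dr_stops(n_dn_clip, sigma_dn_read):
    """Equation (2.3): raw dynamic range per photosite in stops."""
    return math.log2(n_dn_clip / sigma_dn_read)

# Illustrative 14-bit sensor: clipping at 16383 DN with a read noise of 3 DN.
print(raw_dr_stops(16383, 3.0))  # ~12.4 stops
```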
Image DR
Image DR is the maximum DR that can be represented by the encoded output
image file such as a JPEG or TIFF file.
Since the DOLs used by an image file are not involved in the capture of raw data,
they can in principle represent an infinite amount of DR. However, the image DR
cannot be greater than the raw DR when the output image file is produced from the
raw file.
The entire raw DR can be transferred to the image file if an appropriate nonlinear
tone curve is applied, even if the image file bit depth is lower than the bit depth of the
raw data. In this case, the DOLs are nonlinearly related to the scene luminance,
which is discussed further in the next section. For accurate luminance reproduction,
any nonlinearity must be compensated for by an opposing display nonlinearity.
For reasons discussed in section 2.3, the image DR is typically much less than the
raw DR when the output image file is encoded using the default tone curve applied
by the camera manufacturer. In order to access more of the available raw DR, a
custom tone curve can be applied using an external raw converter.
Display DR
The display DR is the contrast ratio between the highest and lowest luminance
values that can be produced by the display medium.
If the display DR is less than the image DR, the image DR can be compressed
into the display DR through tone mapping, for example by reducing display
contrast. Display DR is discussed further in section 2.13.
However, RL , G L , BL are not directly encoded in the output JPEG image file.
Instead, bit depth reduction is performed in conjunction with gamma encoding, and it
is the resulting digital values, R ′DOL , G ′DOL , B ′DOL , known as digital output levels
(DOLs) that are encoded. Since the standard exposure strategy is based upon the
output JPEG image file, it is important to gain an understanding of the nature of the
DOLs.
Figure 2.3. The upper diagram shows a wide DR represented using few tonal levels. The lower diagram shows
a narrower DR represented using a greater number of tonal levels.
Although the 8-bit DOLs of an output image file can represent the entire raw DR
in principle, they need to be appropriately allocated. It turns out that DOLs cannot
be allocated in a linear manner as this would introduce visible banding or
posterisation artefacts that degrade image quality. This problem can be overcome
by allocating DOLs in a nonlinear manner so that they become nonlinearly related to
the raw data. This allocation procedure is known as gamma encoding. However, the
nonlinearity needs to be compensated for when displaying the image. This is known
as gamma decoding.
2.2.2 Posterisation
Banding or posterisation artefacts occur when a specified DR is displayed using an
insufficient number of tonal transitions. For example, the upper diagram of figure 2.4
shows a linear luminance gradient that appears smooth. In contrast, an insufficient
number of tonal levels have been used to display the same linear gradient in the
lower diagram, and so posterisation artefacts are visible.
When designing a digital camera, the ADC bit depth is chosen such that the
minimum noise level in the raw data exceeds the quantization step (1 DN or 1 ADU)
on average. Consequently, luminance gradients appear smooth since the tonal
transitions defined by the raw levels are smoothed or dithered by the noise, and so
raw data is never posterised [6]. However, the noise level would in general be too low
to dither the tonal transitions if the bit depth of the raw data were to be subsequently
reduced to 8 in a linear manner.
However, posterisation can be minimised by performing the bit depth reduction
in a nonlinear manner. The idea is to make efficient use of the 256 available tonal
levels per channel by allocating more levels to tonal transitions that the HVS can
more easily perceive, while allocating fewer levels to tonal transitions that are
Figure 2.4. The upper diagram shows a linear gradient of 8-bit gamma-encoded DOLs; this appears correctly
as a linear luminance gradient on a display with a compensating nonlinear display gamma. The middle and
lower diagrams show posterisation when the data is truncated to 6 bits or 64 tonal levels, and 5 bits or 32 tonal
levels, respectively.
invisible. In other words, it is necessary to take into account the way that the HVS
perceives luminance.
2.2.3 Lightness
The HVS does not perceive luminance linearly in terms of its physiological
brightness response. According to the Weber–Fechner law, the relationship is
approximately logarithmic. Physically this means that darker tones are perceived
as brighter than their luminance values would suggest, as illustrated by observing the
linear luminance gradient in the upper diagram of figure 2.4.
When using relative colourimetry, lightness can be thought of as brightness
defined relative to a reference white. Whereas brightness as a descriptor ranges from
‘dim’ to ‘bright’, lightness ranges from ‘dark’ to ‘light’. The lightness function L*
defined by the CIE (International Commission on Illumination) is specified by the
following formula:
L* = 116 f(Y) − 16,
where
f(Y) = Y^(1/3)           if Y > δ³
f(Y) = Y/(3δ²) + 4/29    otherwise,
and δ = 6/29. Here Y = L/Ln denotes relative luminance, with Ln being the
luminance of the reference white. Lightness is therefore a nonlinear function of
relative luminance and takes values between 0 and 100 or 100%, as illustrated in
figure 2.5. The lightness function is part of the colour model associated with the CIE
LAB perceptually uniform reference colour space defined in section 4.4.3 of
chapter 4. In particular, notice that 18.4% relative luminance lies midway on the
lightness scale as this value corresponds to L* = 50 when Y is normalised to the range
[0,100]. Accordingly, 18.4% relative luminance and L* = 50% are referred to as
middle grey.
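A direct transcription of the lightness function above, confirming that 18.4% relative luminance corresponds to middle grey:

```python
DELTA = 6 / 29

def cie_lightness(Y):
    """CIE L* as a function of relative luminance Y in [0, 1]."""
    f = Y ** (1 / 3) if Y > DELTA ** 3 else Y / (3 * DELTA ** 2) + 4 / 29
    return 116 * f - 16

print(cie_lightness(0.184))  # ~50 (middle grey)
print(cie_lightness(1.0))    # 100 (reference white)
```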
When reducing bit depth to 8, posterisation can be minimised by using an
encoding that linearly allocates DOLs according to lightness L* rather than relative
luminance Y. In this way, more DOLs will be allocated to luminance levels that the
HVS is more sensitive to, while fewer DOLs will be allocated to luminance levels
that the HVS cannot easily discern from neighbouring ones. A so-called encoding
gamma curve that is similar to L* is used in practice.
Figure 2.5. 18% relative luminance corresponds to 50% lightness (middle grey) on the CIE lightness curve. The
corresponding 8-bit DOL in the sRGB colour space is 118. This DOL is approximately 46% of the maximum
DOL since the sRGB gamma curve is not identical to the CIE lightness curve.
R′ ∝ (RL)^γE
G′ ∝ (GL)^γE    (2.4)
B′ ∝ (BL)^γE.
Figure 2.6. A γE = 1/2.2 curve applied to 12-bit linear raw data normalised to the range [0,1] and subsequently
quantised to 8-bit DOLs.
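A minimal sketch of the encoding shown in figure 2.6: 12-bit linear raw values are normalised to [0, 1], raised to γE = 1/2.2 and quantised to 8-bit DOLs. The helper name is illustrative.

```python
def encode_dol(raw_value, raw_bit_depth=12, gamma_e=1 / 2.2):
    """Gamma-encode a linear raw value to an 8-bit digital output level."""
    v = raw_value / (2 ** raw_bit_depth - 1)   # normalise to [0, 1]
    return round(255 * v ** gamma_e)           # apply encoding gamma, quantise

# A raw value at 18.4% of the normalised scale lands near DOL 118, whereas a
# linear 8-bit encoding would have placed it near DOL 47.
print(encode_dol(round(0.184 * 4095)))   # ~118
print(encode_dol(4095), encode_dol(0))   # 255 0
```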
V′ = V^γE
Ln = C{V′}^γD + B.    (2.6)
Here V ′ = R′, G′ or B′ are the nonlinear values defined by equation (2.4), L n
represents the normalised luminance output from the display, C is a gain and B is a
black-level offset. The gain and offset control the display DR, which is discussed in
section 2.13.1. Note that a single quantity derived from the set of DOLs known as
luma is used for V ′ in practice.
The significance of equation (2.6) is that the display gamma γD is the reciprocal of
the encoding gamma γE . Consequently, the overall gamma will be unity and L n will
be linearly related to V,
L n = CV + B.
However, there may be several nonlinearities that need to be compensated for in a
general imaging chain, each associated with their own gamma. When differing
gammas exist between successive stages in an imaging chain, a change of gamma is
referred to as gamma correction. In practice, an overall gamma slightly higher than
unity is preferred in order to account for environmental viewing factors such as
flare [7].
The term ‘gamma correction’ should be avoided when referring to the encoding
gamma as this may give the impression that the raw data is in some way being
corrected. It should be remembered that linear raw data would appear at an
appropriate lightness if viewed on a linear display. The primary purpose of gamma
encoding and decoding is to prevent posterisation artefacts from arising when
reducing bit depth to 8 by more efficiently utilizing the available tonal levels. The
image DR (stops) = log2( Y(max. DOL) / Y(min. non-clipped DOL) ).    (2.7b)
The image DR that can be represented by the standard encoding gamma curve of a
chosen output-referred colour space is discussed below. Subsequently, general tone
curves and their effect on image DR are described.
Substituting equations (2.8) and (2.10) into equation (2.7b) and utilising equation
(2.9) reveals that 8-bit DOLs encoded using the sRGB gamma curve can represent
the following image DR:
image DR (sRGB gamma) = log2( 1 / ((1/255)/12.92) ) ≈ 11.69 stops.
This is comparable with the raw DR provided by consumer cameras. In some cases a
custom tone curve may nevertheless be needed to transfer the full raw DR to the
output image file.
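The 11.69-stop figure can be checked directly: the smallest non-zero DOL (1/255) lies on the linear toe of the sRGB curve, where the slope is 12.92, so the corresponding relative luminance is (1/255)/12.92.

```python
import math

# Smallest non-zero 8-bit DOL, decoded through the linear toe of the sRGB curve.
y_min = (1 / 255) / 12.92
print(math.log2(1.0 / y_min))  # ~11.69 stops
```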
Figure 2.8. (Left) Example tone curve (blue) and gamma curve with γE = 1/2.2 (black) applied to linear raw
data normalised to the range [0,1]. (Right) Corresponding curves plotted using gamma-encoded axis values
with γE = 1/2.2.
1. An s-curve can reduce the raw DR transferred to the output image file (the
image DR) down to a value commensurate with the contrast ratio of a
typical display (the display DR) or the contrast ratio of a typical photo-
graphic scene.
2. An s-curve increases mid-tone contrast in the image, and this is considered to
be visually more pleasing to the HVS. The increased mid-tone contrast
occurs at the expense of contrast in the shadows and highlights, which
become compressed.
An s-curve reduces the raw DR transferred to the output image file by raising the
black clipping level and therefore lowers the image DR compared to that provided
by the encoding gamma curve of the chosen output-referred colour space.
The example s-curve illustrated in figure 2.8 can accommodate approximately 7.7
stops of raw DR after quantizing to 8-bit DOLs compared to the 17.59 stops
provided by the example encoding gamma curve with γE = 1/2.2. In this example, the
image DR is slightly less than 8 stops since the s-curve gradient near the black level is
slightly less than unity when plotted on linear axes.
In general, the image DR represented by an 8-bit JPEG file that has already had
an arbitrary tone curve applied cannot be deduced unless the details of the tone
curve are known. The tone curve can be determined experimentally by photograph-
ing a calibrated step wedge and calculating the opto-electronic conversion function
(OECF) [9]. The OECF defines the relationship between radiant flux (see chapter 3)
at the SP and the output image file DOLs.
Finally, it should be noted that unlike traditional DSLR cameras, modern
smartphone cameras do not necessarily apply the same fixed global tone curve to
all images. Instead, the tone curve may adapt to each individual image.
Furthermore, the tone curve may become dependent on pixel location, which is
known as local tone-mapping. This is discussed further in section 2.12.2.
Figure 2.9. DR contributions for a model sensor response curve. Here the image DR is defined by 8-bit DOLs
between DOL = 255 and the black clipping point, typically DOL = 1. Middle grey is positioned at DOL = 118
for a white-balanced output JPEG image encoded using the sRGB colour space. The total raw headroom is the
raw DR minus the image DR. The magnitude of the upper and lower contributions to the raw headroom
depends upon the image DR along with the camera manufacturer’s positioning of DOL = 118 on the sensor
response curve.
shadow DR = Y(DOL = 118) / Y(min. non-clipped DOL)
highlight DR = Y(DOL = 255) / Y(DOL = 118).
The magnitude of the highlight DR and shadow DR is dependent upon camera
model and the nature of the default tone curve chosen by the camera manufacturer.
Camera manufacturers are free to place middle grey in the encoded JPEG image
file at any desired position on the sensor response curve. As discussed in section 2.6,
the position chosen by the camera manufacturer contributes to the standard output
sensitivity (SOS) of the DOLs to incident photometric exposure. The SOS value is
used for the ISO setting as part of an exposure strategy.
The available raw headroom is dependent upon camera model. As evident in
figure 2.9, placing middle grey at a lower position on the sensor response curve will
increase the raw headroom contribution from above the JPEG white clipping point
and decrease the contribution from below the JPEG black clipping point.
Conversely, placing middle grey at a higher position on the sensor response curve
will decrease the raw headroom contribution from above the JPEG white clipping
point and increase the contribution from below the JPEG black clipping point.
It is beneficial in terms of SNR to place the image DR higher on the sensor
response curve, but this reduces the ability to recover clipped highlights when using a
custom tone curve. Camera manufacturers must balance these factors when design-
ing image-processing engines.
2.4 Histograms
Histograms provide a useful means of interpreting image data. This section discusses
luminance histograms and image histograms. An image histogram represents DOLs
and is the type of histogram shown on the viewfinder or rear liquid crystal display
(LCD) of a digital camera.
A raw histogram is a plot of photosite (sensor pixel) count as a function of raw value
for a specified raw channel. Since the raw data for each raw channel is approx-
imately linearly related to relative scene luminance, raw histograms are similar to
luminance histograms and will be heavily skewed to the left for typical photographic
scenes.
Figure 2.10. (a) Luminance histogram with an average relative luminance of 18.4%. (b) Corresponding image
histogram obtained by converting to 8-bit DOLs of the sRGB colour space.
For the sRGB colour space, the DOLs denoted by R ′DOL , G ′DOL , and B ′DOL can
be calculated using equations (4.26) and (4.27) of chapter 4 if the image is encoded
using the conventional sRGB encoding gamma curve. As described in section 2.3.2,
for preferred tone reproduction traditional DSLR in-camera image processing
engines may alter the luminance distribution before applying the encoding gamma
curve, and these steps may be combined by using a single LUT.
Image histograms corresponding to the JPEG output can be seen on the back of a
digital camera when reviewing an image. Image histograms can also be seen directly
through the viewfinder in real time on a mirrorless camera or camera with liveview
capability. Furthermore, histograms provided by commercial raw converters are
image histograms representing DOLs even if a ‘linear’ tone curve is selected.
Since DOLs are distributed closely in line with lightness (relative brightness) and
not luminance, image histograms are much easier to interpret than luminance
histograms. For example, the same data used for the luminance histogram in
figure 2.10(a) appears much more symmetrical when plotted as DOLs of the sRGB
color space in figure 2.10(b). This is particularly useful when making exposure
decisions.
Standard exposure strategy for digital cameras is based upon the output JPEG
image file from the camera and not the raw data. Japanese camera manufacturers
are required to determine ISO settings using either the SOS or recommended
exposure index (REI) methods introduced by CIPA in 2004.
Standard exposure strategy assumes that 〈L〉 for a typical scene corresponds with
middle grey on the lightness scale. The aim of the SOS method is to ensure that 〈L〉
for a typical scene is correctly reproduced as middle grey in the JPEG output,
irrespective of the camera JPEG tone curve.
〈H 〉S = P. (2.12)
The constant P is known as the photographic constant. The value was obtained from
statistical analysis of typical scenes and user preference as to the nature of a well-
exposed photograph [10]. ISO 12232 is the latest standard on exposure metering and
this uses P = 10.
Reflected-light meters measure 〈L〉, the arithmetic average scene luminance. In
order to estimate 〈H 〉 given 〈L〉, the camera equation is rewritten in the following
way:
〈H〉 = q 〈L〉 t/N².    (2.13)
The constant q = 0.65 ensures proportionality between 〈L〉 and 〈H 〉. It combines
various assumptions about a typical lens and includes an ‘effective’ cos⁴θ natural
vignetting value. A derivation is given in the section below.
The two equations above can be combined to give the reflected light meter
equation,
t/N² = K/(〈L〉S).
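A minimal sketch that solves the reflected-light meter equation for the exposure duration, using the hand-held meter calibration value K = 12.5 quoted below; the scene values are illustrative.

```python
def metered_shutter_time(N, L_avg, S, K=12.5):
    """Reflected-light meter equation t / N^2 = K / (<L> S), solved for t."""
    return K * N ** 2 / (L_avg * S)

# Illustrative values: average scene luminance 1000 cd/m^2, ISO 100, f/8.
t = metered_shutter_time(N=8.0, L_avg=1000.0, S=100)
print(t, 1 / t)  # 0.008 s, i.e. 1/125 s
```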
〈L〉/Lmax ≈ 18%.
In other words, the average scene luminance for a typical scene will always be
approximately 18% of the maximum scene luminance. Since 18% relative scene
luminance corresponds to middle grey, a typical scene is assumed to have a
luminance distribution that averages to middle grey.
meter readings according to the Zone System [16]. This effectively reduced the
recommended 〈H 〉 to a geometric average not consistent with the arithmetic average
used in ISO 2720. In fact, a geometric average is approximately a factor 8/10 smaller
for a typical scene than a true arithmetic average, and so photographs generally
appeared 1/3 of a photographic stop too bright [16]. The change in definition for
colour reversal film speed therefore enabled practitioners of the Zone System to
obtain correct exposure recommendations without the need for meter recalibration.
On the other hand, in-camera metering systems perform an arithmetic average and
must be consistent with the new film speed definition.
〈H 〉S = 10.
luminance ratio of 30:1 or scene DR of log2(90/3) = 4.9 stops, well below the 160:1
luminance ratio or 7.3 stops of scene DR often quoted for a typical scene. Higher
scene DR arises when multiple light sources are present and illuminate different
parts of the scene, and in particular when scene areas are shaded from a light source.
Recall that reflected-light meters measure average luminance and are calibrated in
terms of the metering constant K, which was derived from the photographic constant P.
These constants were determined through observer statistical analysis of photo-
graphs of real scenes and can therefore be associated with an average luminance 〈L〉
rather than an average reflectance.
An estimate for 〈L〉 expressed as a percentage of the maximum scene luminance
can be obtained by comparing the value of the hand-held reflected-light metering
constant K with the value of the incident-light metering constant C recommended for
calibration of hand-held incident-light meters [17],
〈L〉/Lmax ≈ π K/C.
For example, using the value C = 250 common for flat receptors and within the
range specified in ISO 2720 together with K = 12.5 yields 〈L〉/L max ≈ 16%. This
estimate is close to 18%, which has special significance in relation to the HVS since
18% relative luminance corresponds to middle grey on the lightness scale.
An interpretation of this result is that the scene luminance distribution for typical
scenes approximately averages to middle grey or an average luminance that is 18% of
the maximum. Equivalently, the average scene luminance is always log2(100/18) ≈
2.5 stops below the maximum scene luminance, irrespective of the scene DR.
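These two estimates can be checked with a couple of lines of Python; the values K = 12.5 and C = 250 used below are the ones quoted above.

import math

# Quick check of the estimates quoted above, using K = 12.5 and C = 250.
K, C = 12.5, 250.0
print(math.pi * K / C)      # ~0.157, i.e. roughly 16% of the maximum luminance
print(math.log2(100 / 18))  # ~2.47, i.e. roughly 2.5 stops below the maximum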
2.5.6 Exposure value
Recall that a reflected-light meter based on average photometry will
recommend combinations of N, t and S that satisfy the reflected-light meter equation,
t/N² = K/(〈L〉S).
This can be rewritten as the APEX (Additive System of Photographic Exposure)
equation, which was designed to simplify manual calculations,
Ev = Av + Tv = Sv + Bv. (2.18)
Here Ev is the exposure value. The aperture value (Av), time value (Tv), speed value
(Sv) and brightness value (Bv) are defined by
Av = log2 N²
Tv = −log2 t
Sv = log2 (S/3.125)
Bv = log2 (〈L〉/(0.3K)).
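As a rough numerical illustration of the APEX bookkeeping, the following Python sketch computes the recommended Ev from Sv + Bv and then distributes it between Av and Tv. The function names, the hand-held value K = 12.5 and the example luminance of 400 cd m−2 are illustrative assumptions rather than values taken from the text.

import math

K = 12.5  # assumed reflected-light metering constant

def exposure_value(L, S):
    """Ev recommended by average photometry for mean luminance L (cd/m^2) at ISO setting S."""
    Sv = math.log2(S / 3.125)
    Bv = math.log2(L / (0.3 * K))
    return Sv + Bv

def shutter_for_aperture(Ev, N):
    """Exposure duration t that realises Ev at f-number N, using Tv = Ev - Av."""
    Av = math.log2(N ** 2)
    return 2.0 ** (Av - Ev)

Ev = exposure_value(L=400.0, S=100)             # bright scene, ISO 100
print(round(Ev, 2))                             # ~11.74
print(round(shutter_for_aperture(Ev, 8.0), 4))  # ~0.019 s, roughly 1/50 s at f/8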
For sensors that use the same technology and have equal quantum efficiency and fill
factor, the sensitivity is independent of photosite area (sensor pixel size). This is
explained by figure 2.11.
Since 2004, Japanese camera manufacturers have been required to use either the
SOS or REI methods first introduced by CIPA [1]. The digital output is specified in
terms of DOLs in the output JPEG image and not in terms of raw levels.
Figure 2.11. Sensor sensitivity is independent of photosite area for sensors that use the same technology with
equal quantum efficiency and fill-factor. In this case, the photosites will fill with electrons at the same rate. An
analogy is commonly made to the fact that large and small buckets collect rainwater at the same rate.
speed is to ensure that the DR for a typical scene is placed on the sensor response
curve such that the output JPEG image file will not contain clipped highlights. In
this section, it will be assumed that the output image file is an 8-bit JPEG file.
For a typical scene with 18% average luminance, a scene object with 100%
relative luminance must be placed just below the JPEG clipping point in order to
prevent highlights from clipping. In fact, the definition includes a safety factor so
that a scene object with 100% relative luminance will be placed half a stop below the
JPEG clipping point. In other words, the highlights will not actually clip until the
average scene luminance drops to approximately 12.8% of the scene maximum.
The ISO speed Ssat can now be determined from Hsat by substituting equation (2.21)
into (2.20),
Ssat = 78/Hsat.
This measured value should be rounded to the nearest standard value. The base ISO
speed corresponds to the analog gain setting that allows full use of the sensor
response curve.
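As a hedged numerical sketch, the saturation-based speed follows directly from the measured Hsat; the example exposure value of 0.40 lx s below is purely illustrative.

def saturation_speed(H_sat):
    """Saturation-based ISO speed Ssat = 78 / Hsat, where Hsat is the sensor-plane
    exposure (lux seconds) that just saturates the output. The result would then be
    rounded to the nearest tabulated standard value."""
    return 78.0 / H_sat

print(saturation_speed(0.40))   # 195.0, which would be reported as ISO 200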
Figure 2.12 shows the 8-bit JPEG tone curves for two different camera models. Since
both curves clip at exactly the same position, the measured ISO speed will be the
same for both cameras [18]. However, each curve leads to a different mid-tone
lightness since the DOL corresponding to 18% relative luminance is not the same.
Figure 2.12. 100% relative luminance is tied to the JPEG highlight clipping point (DOL = 255 for 8-bit output)
assuming a 12.8% average scene luminance. The position of DOL = 255 determines the ISO speed. Both JPEG
tone curves here define the same ISO speed but lead to different mid-tone image lightness. The JPEG clipping
point may be placed below the raw clipping point.
This is a disadvantage for photographers who primarily use JPEG output since
images from different camera models produced using the same S, t, N may not turn
out to have the same mid-tone lightness.
SSOS = 10/〈H 〉. (2.22)
The aim of SOS is to produce an image with a standard mid-tone lightness rather
than ensure a minimum image quality level.
SOS relates the ISO setting to a mid-tone DOL instead of the DOL at JPEG
saturation used by saturation-based ISO speed. The particular mid-tone chosen is
defined by DOL = 118, the reason being that this corresponds with middle grey (18%
relative luminance) on the standard encoding gamma curve of the sRGB colour space,
as shown in figure 2.13. In other words, if a typical scene with an 18% average luminance
is metered using average photometry, the output image will turn out to be photo-
metrically correct when viewed on a display with a compensating display gamma.
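The correspondence between 18% relative luminance and DOL = 118 can be verified directly from the sRGB encoding function; the short Python check below uses the standard IEC 61966-2-1 curve.

def srgb_encode(Y):
    """Standard sRGB encoding of a linear relative luminance value Y in [0, 1]."""
    if Y <= 0.0031308:
        return 12.92 * Y
    return 1.055 * Y ** (1 / 2.4) - 0.055

print(round(255 * srgb_encode(0.18)))   # 118, i.e. middle grey in 8-bit output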
The camera manufacturer is free to place middle grey at any desired position on
the sensor response curve through use of both analog and digital gain, and the
measured SSOS value will adjust accordingly in order to ensure that the standard mid-
tone lightness is achieved. For example, the two JPEG tone curves shown in
figure 2.12 would lead to different measured SSOS values and hence a different
recommended Ev when metering the same scene. However, use of the recommended
Ev in both cases would lead to the same standard mid-tone lightness expected by the
photographer. Furthermore, the shadow and highlight clipping points do not affect
SSOS and can be freely adjusted. This is discussed further in section 2.6.4 in the
context of extended highlights.
Figure 2.13. (Upper) Standard sRGB gamma curve (magenta) with output quantised to 8 bits. Metered 〈L〉
corresponds to DOL = 118 assuming an 18% average scene luminance. (Lower) Same curve plotted using base
2 logarithmic units on the horizontal axis; 〈L〉 lies approximately 2.5 stops below Lsat and the curve clips to
black approximately 11.7 stops below Lsat .
Using q = 0.65 along with K = 15.4 as defined by equation (2.14) ensures that the
photographic constant P = 10,
P = Kq = 15.4 × 0.65 = 10.
Subsequently, SOS can be determined from equation (2.22). The calculated value
should be rounded to the nearest standard value tabulated in CIPA DC-004 or ISO
12232. This means that the quoted SSOS may differ from the calculated value to
within a tolerance of ± 1/3 stop.
Figure 2.14. The diagram on the left shows digital output level (DOL) as a function of stops above and below
middle grey (DOL = 118) for the OlympusⓇ E-620. A JPEG image taken at ISO 200 has an extra stop of
highlight headroom compared to an image taken at ISO 100. The diagram on the right shows the
corresponding raw levels as a function of scene brightness (logarithmic luminance). (Reproduced from [19]
courtesy of http://www.dpreview.com.)
Further information is provided by the light blue curve in the same diagram on
the left-hand side. This is the JPEG tone curve obtained at ISO 100 when
underexposing by one stop by using the t and N recommended by metering at
ISO 200. Naturally the image is darker, but notably the highlights have the same
clipping point as the ISO 200 curve.
The diagram on the right-hand side of figure 2.14 shows the corresponding raw
levels. These curves are nonlinear since the horizontal axis represents scene bright-
ness (logarithmic luminance) rather than relative scene luminance. Surprisingly, it
can be seen that ISO 100 and ISO 200 do not produce the same raw data.
Furthermore, the ISO 100 curve underexposed by one stop coincides exactly with
the ISO 200 curve.
The above investigation reveals that ISO 100 and ISO 200 are in fact using the
same analog gain setting. The ISO 200 setting is defined by metering at ISO 100 and
then increasing the metered Ev by one stop. This effectively lowers the position of
middle grey (18% relative luminance) on the sensor response curve by one stop.
Digital gain is then utilised by applying a different tone curve to the ‘underexposed’
raw data. This tone curve renders JPEG output with the standard mid-tone lightness
by relating middle grey with DOL = 118, but with an additional stop of highlight
headroom made accessible by the ‘underexposing’ of the scene luminance distribution.
For example, if the ISO 100 tone curve saturates at the same point as the standard
sRGB curve, then the highlights at ISO 200 will clip at 200% relative luminance.
Equivalently, the average scene luminance can drop to 9% before the highlights must
clip. This additional stop of highlight headroom is lost if the ISO 100 ‘low-ISO’
setting is selected. An equivalent viewpoint is that the ISO 100 setting is defined by
‘overexposing’ the raw data by one stop, which causes a loss of one stop of highlight
headroom in the JPEG output at ISO 100 compared to ISO 200. In reality, neither
viewpoint can be considered as underexposing or overexposing because SOS permits
the application of both analog and digital gain [19].
When reviewing cameras, reviewers may investigate and report the nature of the
JPEG tone curve. As described in section 2.3.4, the number of stops provided by the
tone curve above and below middle grey (DOL = 118) is referred to as the highlight
DR and shadow DR, respectively [19]. This information is useful for photographers
who primarily use JPEG output from the camera. Since the application of digital
gain adversely affects achievable SNR, camera manufacturers must balance various
image quality trade-offs when designing an in-camera JPEG processing engine, and
so it is important to consider all aspects of the JPEG output when comparing camera
image quality. Photographers who process the raw data themselves using raw
processing software are of course free to apply any desired tone curve to the raw
data, and so information regarding aspects of the raw data such as raw DR will be of
greater interest.
t/N² = C/(E S).
Here C is the incident light meter calibration constant [2, 12]. The above equation can
be expressed in terms of the APEX system described in section 2.5.6,
Ev = Av + Tv = Sv + Bv = Sv + Iv.
Here Iv is the incident light value,
Iv = log2 (E/(0.3C)).
Incident-light metering is useful when an important subject must be exposed
correctly. It is also used for studio flash photography. Incident-light metering is
more reliable than reflected-light spot metering as the measurement is independent
of the percentage relative luminance of the subject, which does not need to be middle
grey. Since the illuminance incident on the subject is known, the subject will be
exposed correctly according to its own percentage reflectance.
Figure 2.15. The subject can appear as a silhouette under strong back lighting.
Consequently, the shadows and highlights will have softer edges and so the
subject contrast and scene DR will be lower.
In general, the more indirect the light source, the softer the light. Although harsh
lighting may be desirable in certain situations, soft or diffuse lighting is generally
preferred and is considered to be light of higher quality. Natural light can be diffused
by three main processes:
1. Diffuse reflection from surfaces. Diffuse reflecting materials have an irregular
molecular structure. When incoming light is incident at the surface of such a
material, the molecules near the surface vibrate and cause the light to be
emitted in many different directions.
2. Scattering from small objects. For example, Mie scattering can describe the
scattering of light by cloud droplets and dust particles in the sky.
3. Rayleigh scattering from air molecules in the atmosphere. The mechanism of
Rayleigh scattering is different to ordinary scattering since air molecules are
much smaller than the wavelength of the incoming light. Consequently, the
waves will be diffracted and emerge as spherical waves.
Light is naturally softer in the early morning and evening because the sunlight travels a
greater distance through the atmosphere to reach the observer, and so it is scattered to a
greater extent.
Since light from a flash unit is direct and therefore harsh, dedicated diffusers are
often used for flash photography. Larger diffusers produce softer light.
Figure 2.16. The orange and red light seen at a sunrise or sunset is caused by increased Rayleigh scattering due
to the greater distance travelled by the sunlight.
waves. The main contributions are from blue and green since violet itself is partly
absorbed by ozone in the stratosphere. Furthermore, the photopic (daylight) eye
response falls off rapidly into the violet end of the visible spectrum, as evident from
the standard 1924 CIE luminosity function for photopic vision illustrated in
figure 3.1 of chapter 3.
In the early morning or late evening, the same observer will be at a position that
corresponds to B or C in figure 2.16. In these cases, the sunlight must travel a much
greater distance through the atmosphere to reach the observer. Furthermore, the
atmosphere closer to the Earth’s surface is denser. Consequently, the blue end of the
visible spectrum will have been scattered away by the time the sunlight reaches
the observer. The remaining mixture of wavelengths appears orange, and eventually red.
Rayleigh scattering can be enhanced by pollutants such as aerosols and sulfate
particles close to the Earth’s surface provided their size is less than λ/10. Mie
scattering from dust and a mild amount of cloud providing diffuse reflection can
both enhance the appearance of any coloured light that has already undergone
Rayleigh scattering, thus creating a sunrise or sunset with a more dramatic
appearance. An example is illustrated in figure 2.17.
Accordingly, the exposure can be maintained by lowering Tv, thus enabling a longer
exposure duration to be used than would originally have been possible. A variety of
ND filter labelling systems are in use. For example:
• Filter factor (transmittance).
An ‘NDn’ or ‘ND ×n ’ filter has fractional transmittance T = 1/n. This reduces
the Ev by log2n stops. For example, T = 1/32 for an ND32 filter, and so the
Ev is reduced by 5 stops.
• Optical density.
An ‘ND d’ filter has an optical density d = −log10 T. Since this is a base 10
logarithmic measure, the Ev is reduced by log2(10^d) ≈ 3.3d stops. For example, an
ND 1.5 filter reduces the Ev by 5 stops.
• ND 1 number (stops).
This is a direct way of labelling the Ev reduction in stops. For example, an
ND 105 filter reduces the Ev by 5 stops.
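The three labelling conventions above all encode the same quantity, the Ev reduction in stops, so they can be converted with a few lines of Python; the helper names below are hypothetical.

import math

def stops_from_factor(n):
    """Ev reduction of an 'NDn' filter with fractional transmittance 1/n."""
    return math.log2(n)

def stops_from_density(d):
    """Ev reduction of an 'ND d' filter with optical density d."""
    return d * math.log2(10)

print(stops_from_factor(32))               # 5.0 stops for an ND32 filter
print(round(stops_from_density(1.5), 1))   # 5.0 stops for an ND 1.5 filter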
There are several situations where ND filters are useful. For example:
1. Shallow DoF in bright daylight conditions. In this case, use of a low f-
number to achieve a shallow DoF may require an exposure duration t
shorter than the fastest shutter speed available on the camera. In this case,
the photograph would be overexposed. However, use of an ND filter can
prevent overexposure and the associated clipping of highlights by allowing a
longer exposure duration to be used.
2. Creative motion blur effects. In particular, landscape photographers use ND
filters in order to smooth the appearance of the flow of water.
Figure 2.18. Use of a soft-edge graduated 0.9 ND filter has prevented the foreground shadow detail from
clipping to black.
There are several types of GND filter available, each providing a different trans-
mittance profile. For example:
• Hard edge GND filter.
This provides a hard transition between the dark and clear areas, and is useful when
the scene contains a horizon. The transition line is typically located halfway. If the
scene contains objects projected above the horizon, the lightness of the objects may
need to be corrected digitally when processing the raw file or output image.
• Soft edge GND filter.
This provides a more gradual transition between the dark and clear areas.
The effective transition line is again typically halfway. A soft edge GND filter
is more useful than a hard edge filter when the scene contains objects
projected above the horizon.
• Attenuator/blender GND filter.
This provides a gradual transition over the entire filter and is useful when the scene
does not contain a horizon. This type of filter is also used in cinematography.
• Reverse GND filter.
Here the darkest region of the filter is located midway, and the lightest region is
at the top. This is useful for sunrise and sunset scenes that contain a horizon.
• Center GND filter.
Nowadays this refers to a reverse GND filter mirrored at the transition line,
which is useful when a sunrise or sunset is reflected onto water. The term can
Figure 2.21 illustrates an example scene taken with and without the use of a
polarizing filter.
Figure 2.19. (a) The electric field vector for unpolarized light can take any value at random. (b) Viewed along
the z-axis out of the page, the electric field vector E can be resolved into x and y components.
Figure 2.20. (a) Linear polarization. In this example, the wave is confined to lie parallel to the y-axis. (b) The
corresponding vector E is fixed in the direction of the y-axis.
Figure 2.21. (Left) Photograph taken with a polarizing filter fitted to the lens. (Right) Photograph taken
shortly afterwards with the polarizing filter removed.
Ee = 〈∣Ex∣²〉 = A²〈cos² θ〉 = A²/2.
An ideal polarizing filter therefore acts as a 1-stop ND filter when the incident
light is unpolarized.
On the other hand, completely plane polarized light has a fixed angle θ that
defines the plane of polarization. This angle may be defined relative to a convenient
choice of x and y coordinate axes which are perpendicular to the direction of
propagation. The polarizing filter itself transmits light only in a plane defined by the
angle of rotation of the filter. This plane is referred to as the plane of transmission.
When a beam of plane polarized light passes through a polarizing filter, the axes
defining the angle θ can be aligned with the plane of transmission. In this way, θ
defines the angle between the plane of polarization and the plane of transmission.
The filter eliminates the perpendicular Ey component, and only the Ex component
remains in the beam. Since the angle θ is fixed and does not fluctuate, the irradiance
is reduced to the following value:
Ee = ∣Ex∣2 = A2 cos2 θ .
This is known as Malus’ law [21]. When the plane of polarization and the plane of
transmission are aligned, θ = 0° and so 100% transmission is achieved in principle.
When θ = 90°, no light is transmitted. In practice, polarizing filters are not ideal and
so the transmission never quite reaches these extremes. When partially or fully plane
polarized light is mixed with unpolarized light, the 100% transmission figure will not
be achieved even for an ideal filter. As already noted, the utility of the polarizing
filter is that the ratio between the unpolarized light and partially or fully plane
polarized light entering the lens can be altered.
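A minimal sketch of these two limiting cases and of a mixed beam is given below; the function name and the 50/50 mixture in the last example are illustrative assumptions.

import math

def transmitted_fraction(unpolarized, polarized, theta_deg):
    """Fraction of incident irradiance passed by an ideal polarizer.
    'unpolarized' and 'polarized' are the incident fractions (summing to 1) and
    theta_deg is the angle between the plane of polarization and the plane of
    transmission (Malus' law applies to the polarized part only)."""
    theta = math.radians(theta_deg)
    return 0.5 * unpolarized + polarized * math.cos(theta) ** 2

print(transmitted_fraction(1.0, 0.0, 0))    # 0.5: one-stop loss for unpolarized light
print(transmitted_fraction(0.0, 1.0, 90))   # ~0: crossed plane-polarized light is blocked
print(transmitted_fraction(0.5, 0.5, 90))   # 0.25: only half of the unpolarized part survives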
Figure 2.22. (a) Reflection and refraction at a dielectric surface. (b) At Brewster’s angle ϕB , the reflected beam
is completely polarized perpendicular to the plane of incidence.
plane polarized by s-vibrations, and the reflected light is said to be s-polarized. This
happens when the angle of incidence ϕ takes a particular value known as Brewster’s
angle, ϕB. Significantly, ϕB depends only on the refractive indices of the materials, n
(usually air) and n′. Simple trigonometry shows that
tan ϕB = n′/n.
For a beam of light in air (n = 1) incident on a glass dielectric surface (n′ = 1.5),
Brewster’s angle ϕB ≈ 57° and approximately 15% of the incident s-polarized light is reflected. If
the beam is incident upon water (n′ = 1.33), then ϕB ≈ 53 °.
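These angles follow directly from the tangent relation above; a short Python check (with n = 1 assumed for air) is shown below.

import math

def brewster_angle_deg(n2, n1=1.0):
    """Brewster's angle for light travelling in medium n1 and reflecting off medium n2."""
    return math.degrees(math.atan(n2 / n1))

print(round(brewster_angle_deg(1.5), 1))    # ~56.3 degrees for glass (quoted above as ~57)
print(round(brewster_angle_deg(1.33), 1))   # ~53.1 degrees for water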
If the light beam is incident on a stack of a sufficient number of glass plates, all
s-polarized light will eventually be reflected from the incident beam, leaving only
p-polarized light in the refracted beam [21].
smaller than the wavelength of the incoming light [21]. Since shorter wavelengths
experience greater scattering but the violet light is partially absorbed by ozone, the
overall colour of the sky is pale blue during the daytime.
Light that scatters in the plane perpendicular to the direction of the incoming
light will be completely plane polarized. If the incoming light is propagating in the
z-direction, light that scatters at an angle θ with the z-axis will only be partially plane
polarized, and the light parallel to the z-axis will be unpolarized. This is illustrated in
figure 2.23. For an observer on the ground, light scattered from the strip of sky
positioned at a right angle to the direction of the Sun, as seen by the observer, will
exhibit the strongest polarization.
Since light from clouds will be unpolarized due to repeated diffuse reflection,
rotating the polarizing filter to reduce the polarized light will cause the blue sky to be
selectively darkened. Figure 2.24 shows how use of a polarizing filter at wide angles
can reveal the graduated darkening with respect to angle θ.
Figure 2.23. Scattering of incoming sunlight from an air molecule in the sky.
Figure 2.24. The sky polarization gradient is revealed by use of a strong polarizing filter at a wide angle.
the light is prevented from entering the autofocus and metering modules in a plane
polarized state.
The frames are combined to construct an HDR image, which describes a scene
luminance distribution of higher DR than that achievable using a single frame. The
distribution can be referred to as an HDR luminance map.
The basic theory behind the construction of an HDR luminance map is discussed
below, which is followed by a brief description of tone mapping. In common with
raw data, HDR images are approximately linear. Since true linear HDR displays are
not yet available on the consumer market, the HDR luminance map needs to be
compressed to fall within a luminance ratio commensurate with that of a typical low
dynamic range (LDR) display.
The third requirement above, which is known as the Luther–Ives condition, is rarely
satisfied in practice. This means that raw data is only approximately proportional to
illuminance at the SP, and this proportionality also varies with the spectral
composition of the illumination.
Nevertheless, it can be assumed for simplicity that the raw data expressed using
DNs is proportional to the average illuminance at the corresponding pixel on the SP,
n DN ∝ t 〈E 〉 = 〈H 〉.
For a given frame, dividing the raw levels by the frame exposure duration yields a
distribution of scaled or relative illuminance values
〈E 〉 ∝ nDN/t. (2.24)
Significantly, the range of relative illuminance values can be extended by taking
multiple frames, each using a different t. When equation (2.24) is applied to each
frame, the overall result will be a relative illuminance distribution of higher DR that
covers the scene DR captured by all frames. This distribution can be referred to as
an HDR relative illuminance map.
It should be noted that the individual relative illuminance distributions for the
frames will in general overlap, and so the relative illuminance at a given pixel could,
in principle, be deduced from any of the frames that cover the value. However, each
frame will have a different noise distribution since each frame corresponds to a
different Ev. This is discussed in section 3.8 of chapter 3, where it is shown that
photon shot noise typically increases as the square root of the electron count, and so
SNR generally improves at higher Ev. For a given pixel, a naive solution to this issue
would be to use the relative illuminance value from the frame with the lowest noise.
However, it will be shown in section 5.10.3 of chapter 5 that an overall gain in SNR
can be achieved by frame averaging, meaning that temporal noise can be reduced by
averaging over all valid frames. Since the exposure duration of each frame is
different, appropriate weighting factors can be included that maximize overall SNR
when the frame averaging is performed [24].
The optimum weighting depends upon all noise sources [24], which include fixed
pattern noise, read noise and photon shot noise. If only photon shot noise is
included, the optimum weighting for frame i turns out to be its exposure duration
t (i ) ,
Ê = ∑i t(i) 〈E (i)〉 / ∑i t(i).
Here Ê denotes the optimized HDR relative illuminance map. Note that all relative
illuminance values obtained from any clipped n DN will be incorrect. These will
ideally be omitted from the frame averaging by using techniques that take into
account the noise distribution [24].
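A minimal NumPy sketch of this exposure-time-weighted merge is given below. It assumes the frames are supplied as linear raw arrays in data numbers with known exposure durations, and it simply discards clipped photosites rather than applying the full noise-aware weighting of [24].

import numpy as np

def merge_hdr(frames, clip_dn):
    """Exposure-time-weighted HDR merge of linear raw frames.
    'frames' is a list of (raw_dn, t) pairs: raw_dn a linear array in data
    numbers and t the exposure duration in seconds."""
    num = np.zeros(frames[0][0].shape, dtype=float)
    den = np.zeros_like(num)
    for raw_dn, t in frames:
        valid = raw_dn < clip_dn          # exclude clipped values from the average
        rel_E = raw_dn / t                # relative illuminance, equation (2.24)
        num[valid] += t * rel_E[valid]    # weight each estimate by its exposure duration
        den[valid] += t
    return np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)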
Assuming that the lens has been corrected for natural vignetting, the HDR
relative illuminance map is proportional to the HDR relative luminance map that
describes the scene luminance distribution. If absolute measurements of the
maximum and the minimum luminance levels are taken using a luminance meter
at the time the frames are captured, an HDR luminance map that estimates the
absolute scene luminance distribution can also be determined.
In practice, the frames used to construct the HDR image can be obtained by
performing raw conversion without any tone curve or encoding gamma curve
applied, and saving the frames as linear 16-bit TIFF images. If the ‘dcraw’ freeware
raw converter is used, a suitable command is
dcraw -v -w -H 0 -o 1 -q 3 -4 -T filename,
where sRGB has been chosen as the output colour space. If only camera output
JPEG images are available, the encoding gamma curve (or preferably the overall
tone curve) needs to be reversed before the images can be combined.
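If only JPEG frames are available, a first approximation is to undo the sRGB encoding gamma before merging; the NumPy sketch below does exactly that and is only a stand-in for reversing the camera's actual tone curve.

import numpy as np

def srgb_decode(dol):
    """Approximately linearise 8-bit sRGB-encoded DOLs (0-255)."""
    V = np.asarray(dol, dtype=float) / 255.0
    return np.where(V <= 0.04045, V / 12.92, ((V + 0.055) / 1.055) ** 2.4)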
DR that is not transferred to the image. This can be recovered by applying a custom
tone curve to the raw file using an external raw converter.
The type of tone curve described above is an example of a global TMO as it
operates on all pixels identically. In the context of HDR imaging, alternative types
of TMOs have been developed within the computer science community that
generally aim to compress the HDR luminance map into a smaller luminance ratio,
while at the same time minimising the visual impact of the luminance compres-
sion. For example, local tone-mapping operators (local TMOs) have been
developed, which are pixel-dependent operators where the tone mapping depends
upon the pixel environment. Although global contrast must ultimately be reduced,
local TMOs preserve contrast locally. This technique takes advantage of the fact
that the HVS is more sensitive to the photometrically correct appearance of local
contrast in comparison to the global luminance levels throughout the tone-
mapped image, which are photometrically incorrect. A drawback of local TMOs
is that they can produce undesirable artefacts such as halos around high-contrast
edges.
As with conventional LDR images, tone-mapped images are encoded using
DOLs and stored using conventional formats such as JPEG or TIFF. Nevertheless,
the various types of TMO that have been developed are not necessarily implemented
in the same manner [25]. Strategies include the following:
1. The linear HDR pixel values are directly tone-mapped into DOLs. Gamma
encoding is not required as it has essentially been built into the TMO.
2. The linear HDR pixel values are tone-mapped in the scene-referred lumi-
nance domain. These LDR values result from luminance compression and
are therefore pseudo-linear values that subsequently need to be transformed
into DOLs by applying the encoding gamma curve of the chosen output-
referred colour space, γE .
3. The linear HDR pixel values are tone-mapped into LDR output display
luminance values. In this case, the required DOLs can be obtained by
inverting a display model for the display device that relates DOLs and display
luminance. An example display model is described in section 2.13.2.
As discussed in the next section, the brightness and contrast controls of a display
device provide an additional form of global tone mapping.
Figure 2.25 illustrates the use of a local TMO to produce a satisfactory output
image. Figure 2.25(a) shows the JPEG output from the camera without any EC
applied. Evidently, the highlights have clipped. Figure 2.25(b) shows the JPEG
output using −3 Ev where the highlight information has been preserved but the
shadows have clipped. Figure 2.25(c) shows the JPEG output obtained using +3 Ev.
Finally, figure 2.25(d) shows a locally tone mapped JPEG image. This was
constructed by reversing the tone curves applied to the JPEG images, combining
them into an HDR image, applying a local TMO, and reapplying the encoding
gamma curve.
Figure 2.25. (a) JPEG output as metered by camera. (b) JPEG output using −3 Ev. (c) JPEG output using
+3 Ev. (d) Locally tone-mapped and enhanced JPEG image.
2.13.1 Luma
The voltage signals used to drive the display channels are not directly derived from
the DOLs. Instead, JPEG encoding involves a conversion of the DOLs to the
Y′CbCr colour space, also written as Y′CBCR.
The Y′ component is defined as luma, and CB and CR are the blue-difference and
red-difference chroma components, respectively. The Y′CBCR representation is more
efficient since the chroma components can be subsampled. Chroma subsampling
takes advantage of the low sensitivity of the HVS to colour differences so that the
chroma components can be stored at a lower resolution.
Recall that relative luminance Y is a weighted sum of linear relative tristimulus
values. For example, Y for the sRGB colour space is calculated from equation (2.2),
Y = 0.2126 RL + 0.7152 G L + 0.0722 BL.
On the other hand, luma Y ′ is defined as a weighted sum of DOLs. For example, Y ′
for the sRGB colour space is defined as
Y ′ ≈ Y^γE. (2.25)
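A rough numerical check of this approximation is shown below: the sRGB luminance weights of equation (2.2) are applied to gamma-encoded component values and the result is compared with encoding the relative luminance itself. A pure power law of 1/2.4 is used as a stand-in for the full sRGB curve, and the example component values are arbitrary.

gamma_E = 1 / 2.4   # simple power-law stand-in for the sRGB encoding gamma

def weighted(R, G, B):
    return 0.2126 * R + 0.7152 * G + 0.0722 * B

RL, GL, BL = 0.40, 0.30, 0.10                             # example linear values
Y = weighted(RL, GL, BL)                                  # relative luminance
Y_luma = weighted(RL**gamma_E, GL**gamma_E, BL**gamma_E)  # luma-like sum of encoded values
print(round(Y**gamma_E, 3), round(Y_luma, 3))             # 0.611 vs 0.606: close but not identical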
C + B = 1. (2.28)
Although tone curves applied by in-camera image processing engines are designed to
restrict the image DR to a value commensurate with the contrast ratio of a typical
display, the image DR often exceeds the display DR, particularly if a custom tone
curve has been used to transfer the entire raw DR to the encoded output image.
Figure 2.26. Absolute display luminance L n for an LCD display as a function of (a) luma Y ′, and (b) relative
luminance Y. In both cases L refl has been set to zero, and C + B = 1 so that the image DR is compressed into
the contrast ratio of the display.
In this case, the image DR is compressed into the display DR through contrast
reduction provided equation (2.28) holds,
C + B = 1.
This is illustrated in figure 2.26. If C + B = 1 were to be violated, then clipping of
image DOLs would occur if the image DR were to exceed the display DR.
Recall that the raw DR cannot be larger than the scene DR, and the image DR
cannot be larger than the raw DR when the output image is produced from the raw
file. In other words, the image DR is always the maximum scene DR that can be
rendered on the display. If the display DR is larger than the image DR, the image
DR can be expanded into the display DR through contrast expansion. However,
contrast expansion cannot increase the represented scene DR.
References
[1] Camera & Imaging Products Association 2004 Sensitivity of digital cameras CIPA DC-004
[2] International Organization for Standardization 2006 Photography–Digital Still Cameras–
Determination of Exposure Index, ISO Speed Ratings, Standard Output Sensitivity, and
Recommended Exposure Index, ISO 12232:2006
[3] Bayer B E 1976 Color imaging array US Patent Specification 3971065
[4] International Electrotechnical Commission 1999 Multimedia Systems and Equipment–
Colour Measurement and Management–Part 2-1: Colour Management–Default RGB
Colour Space–sRGB, IEC 61966-2-1:1999
[5] Adobe Systems Incorporated 2005 AdobeⓇ RGB (1998) Color Image Encoding Version
2005-05
[6] Martinec E Noise, Dynamic Range and Bit Depth in Digital SLRs (unpublished)
[7] Poynton C 2003 Digital Video and HDTV: Algorithms and Interfaces (San Mateo, CA:
Morgan Kaufmann Publishers)
[8] Sato K 2006 Image-processing algorithms Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 8
[9] International Organization for Standardization 2009 Photography–Electronic Still-Picture
Cameras - Methods for Measuring Opto-Electronic Conversion Functions (OECFs), ISO
14524:2009
[10] Connelly D 1968 Calibration levels of films and exposure devices J. Photogr. Sci. 16 185
[11] American National Standards Institute 1971 General-Purpose Photographic Exposure
Meters (Photoelectric Type), ANSI PH3.49-1971
[12] International Organization for Standardization 1974 Photography–General Purpose
Photographic Exposure Meters (Photoelectric Type)–Guide to Product Specification, ISO
2720:1974
[13] International Organization for Standardization 1982 Photography—Cameras—Automatic
Controls of Exposure, ISO 2721:1982
[14] American National Standards Institute 1979 Method for Determining the Speed of Color
Reversal Films for Still Photography, ANSI PH2.21-1979
[15] International Organization for Standardization 2003 Photography–Colour Reversal Camera
Films–Determination of ISO Speed, ISO 2240:2003
[16] Holm J 2016 private communication
[17] Stimson A 1962 An interpretation of current exposure meter technology Photogr. Sci. Eng. 6 1
[18] Yoshida H 2006 Evaluation of image quality Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 10
[19] Butler R 2011 Behind the Scenes: Extended Highlights! (http://www.dpreview.com/articles/
2845734946)
[20] George C 2008 Mastering Digital Flash Photography (Lewes: Ilex Press)
[21] Jenkins F A and White H E 1976 Fundamentals of Optics 4th edn (New York: McGraw-Hill)
[22] Goldberg N 1992 Camera Technology The Dark Side of the Lens (New York: Academic)
[23] Debevec P E and Malik J 1997 Recovering high dynamic range radiance maps from
photographs SIGGRAPH '97: Proceedings of the 24th Annual Conference on
Computer Graphics and Interactive Techniques (New York: ACM Press/Addison-Wesley) pp
369–78
[24] Granados M, Adjin B, Wand M, Theobalt C, Seidel H-P and Lensch H P A 2010 Optimal
HDR reconstruction with linear digital cameras Proc. of the 2010 IEEE Computer Society
Conf. on Computer Vision and Pattern Recognition (San Francisco, CA) (Piscataway, NJ:
IEEE) 215–22
[25] Eilertsen G, Mantiuk R K and Unger J 2017 A comparative review of tone-mapping
algorithms for high dynamic range video Comput. Graph. Forum 36 565–92
[26] Mantiuk R K, Myszkowski K and Seidel H-P 2015 High dynamic range imaging Wiley
Encyclopedia of Electrical and Electronics Engineering (New York: Wiley)
Chapter 3
Raw data model
includes basic PSFs for the optics, OLPF, and sensor components. Subsequently, a
model for the charge signal generated by the blurred and sampled optical image at
the SP is developed, along with a model for the conversion of the charge signal into
digital raw data. A model for signal noise is also included.
The PSF and OTF are fundamental measures of image quality (IQ). In particular,
the modulation transfer function (MTF), which can be extracted from the OTF, is
widely used in photography to interpret lens performance and to define camera
system resolving power. The camera model derived in this chapter provides
mathematical formulae that will be used to discuss IQ in chapter 5.
3.1.1 Radiometry
Table 3.1 lists the spectral radiometric counterparts of all the photometric quantities
introduced in section 1.5.1 of chapter 1. Spectral radiometric quantities are denoted
with an ‘e,λ’ subscript to distinguish them from photometric quantities. The ‘e’
represents ‘energetic’.
In a spectral radiometric description, the lens transforms the scene spectral
radiance distribution at each λ into the sensor plane spectral irradiance distribution
at the same λ.
Full radiometric quantities are obtained from their spectral representations by
directly integrating over all wavelengths. For example, irradiance Ee is obtained
from spectral irradiance Ee,λ in the following manner:
Ee = ∫ Ee,λ dλ.
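Numerically, this integration is just a quadrature over sampled spectral data; a minimal NumPy sketch with an arbitrary flat spectrum is shown below.

import numpy as np

wavelengths_nm = np.arange(400.0, 701.0, 10.0)      # sample wavelengths across the visible band
E_spectral = np.full(wavelengths_nm.shape, 0.005)   # spectral irradiance in W m^-2 nm^-1 (illustrative)
d_lambda = np.diff(wavelengths_nm)
E_e = np.sum(0.5 * (E_spectral[:-1] + E_spectral[1:]) * d_lambda)   # trapezoidal rule
print(E_e)                                          # 1.5 W m^-2 for this flat spectrum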
Table 3.1. Common spectral radiometric, radiometric, and photometric quantities with their symbols and
SI units.

Radiometry
  Radiant flux           Φe   W (J s−1)
  Radiant intensity      Ie   W sr−1
  Radiance               Le   W sr−1 m−2
  Radiant exitance       Me   W m−2
  Irradiance             Ee   W m−2
  Radiant exposure       He   J m−2

Photometry
  Luminous flux           Φv   lm
  Luminous intensity      Iv   lm sr−1 (cd)
  Luminance               Lv   lm m−2 sr−1 (cd m−2)
  Luminous exitance       Mv   lm m−2 (lx)
  Illuminance             Ev   lm m−2 (lx)
  Photometric exposure    Hv   lm s m−2 (lx s)
However, camera components such as the optics affect the spectral irradiance
distribution at the SP in a wavelength-dependent manner. Such contributions can be
included as spectral weighting functions before integrating over the range of
wavelengths that produce a response from the camera.
Figure 3.1. Standard 1924 CIE luminosity function for photopic (daylight) vision V (λ ) (green curve). The
curve is normalised to a peak value of unity at 555 nm. The 1951 CIE luminosity function for scotopic (low
light) vision V ′(λ ) is also shown (blue curve).
Eλ,ideal(x, y) = (π/4) Le,λ(x/m, y/m) (T/Nw²) cos⁴{φ(x/m, y/m)}. (3.2)
Here Le,λ is the spectral scene radiance, Nw is the working f-number, T is the lens
transmittance factor and m is the Gaussian magnification. The magnification has
been used to express the coordinates on the SP denoted by (x , y ) in terms of the
coordinates on the object plane (OP).
The cosine fourth term can be replaced with an image-space term, R(x , y, λ ),
referred to as the relative illumination factor (RI). This models the combined effects
of the natural fall-off due to the cosine fourth law, along with the vignetting arising
from the specific real lens design [6]. The RI factor is normalised to unity at the
optical axis (OA). Vignetting arising from the lens design typically decreases as the
f-number N increases, and so the RI factor is a function of N.
Figure 3.2. Point spread functions obtained using a microscope. The white square represents the size of a
photosite with 8.5 μ m pixel pitch. Figure reproduced from [12] with kind permission from ZEISS (Carl Zeiss AG).
When the camera system is treated as LSI, the output function g (x , y ) can be
determined at an arbitrary position on the SP via the following convolution integral:
g(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x′, y′) h(x − x′, y − y′) dx′ dy′. (3.4)
g(x, y) = f(x, y) ∗ h(x, y). (3.5)
2. For illustration purposes, assume that the system is ideal apart from the
existence of an isolated PSF centred at position x0 on the SP and denoted by
h(x − x0 ). If the delta function at x0 is replaced by the PSF, which also
defines an area of unity, then weighting f (x ) by h(x − x0 ) and integrating
over all space will yield the output function value g (x0 ). In contrast to the
delta function, the PSF also has non-zero values at a range of positions x = xi
surrounding x0. Denoting the domain or kernel of the PSF by A, it follows
that
∫_{−∞}^{∞} f(x) h(x − x0) dx = { g(x0)  if x = x0
                                 g(xi)  if x = xi ∈ A
                                 0      if ∣x − x0∣ ∉ A }.
3. Now consider a real system where blur exists at all positions on the SP. In
this case, all the PSFs associated with every point on the SP will overlap. In
order to determine the output function g (x ) at an arbitrary position x0, all
point spread towards position x0 needs to be evaluated. In an LSI system, it
turns out that this can be achieved simply by making the replacement
h(x − x0 ) → h(x0 − x ). In 1D, this amounts to flipping the PSF,
∫_{−∞}^{∞} f(x) h(x0 − x) dx = g(x0).
To see this graphically, consider the example PSF shown by the black curve in
figure 3.3 centred at point x0. This example PSF is symmetric for simplicity, but
this need not be the case in general. Three example points x1, x2 and x3
contained within the domain of the PSF are shown. The contribution to
g (x0 ) from x0 itself is the product f (x0 )h(x0 − x0 ) or f (x0 )h(0). The contri-
bution from x1 is seen to be the product f (x1)h(x0 − x1). Similarly, the
contributions from x2 and x3 are seen to be f (x2 )h(x0 − x2 ) and
f (x3)h(x0 − x3), respectively. The PSF has insufficient width to contribute to
g (x0 ) when centred at positions beyond x3. In summary,
g(x0) = f (x0)h(x0 − x0) + f (x1)h(x0 − x1)
+ f (x2 )h(x0 − x2 ) + f (x3)h(x0 − x3).
4. Since x actually varies continuously between x0 and x3, there are many more
contributions that must be accounted for,
g(x0) = ∫_{−x3}^{x3} f(x′) h(x0 − x′) dx′.
Figure 3.3. Informal derivation of the convolution operation in 1D. The input function f (x ) is shown at
positions xi = x0, x1, x2 and x3. The PSF denoted by h(x ) is assumed to be shift-invariant.
The integration limits have been defined to take into account contributions
from both the left-hand side and right-hand side of x0. Moreover, the value x3
must be replaced by infinity to take into account a PSF of infinite extent,
g(x0) = ∫_{−∞}^{∞} f(x′) h(x0 − x′) dx′.
The end result is that the output function g (x0 ) is given by the product of the
overlap of the input function f (x ) with the PSF centred at x0. In other words,
the functional form of the spread around x0 that contributes to the output at
x0 is defined by the PSF itself.
5. In an LSI system, the complete output function g (x ) can be obtained by
performing the above calculation at every output position x,
g(x) = ∫_{−∞}^{∞} f(x′) h(x − x′) dx′.
In this case, the PSF becomes infinitesimally narrow and its height extends to
infinity so as to preserve an area of unity. Since there is no spread around x
contributing to the output at x, the output function remains unchanged
compared to the input function,
g(x) = ∫_{−∞}^{∞} f(x′) δ(x − x′) dx′ = f(x).
The next simplest PSF is a scaled delta function where B is a constant,
h(x − x′) = B δ(x − x′).
Again there is no spread around x contributing to the output at x and so the
result is a simple multiplication,
g(x) = B ∫_{−∞}^{∞} f(x′) δ(x − x′) dx′ = B f(x).
2. Consider the input function f (x ) defined by the rectangle function illustrated
in figure 3.4(b),
f (x ) = rect(x ).
Figure 3.4. (a) Delta function of unit area positioned at x0. (b) 1D rectangle function centred at x0. (c) 1D
triangle function centred at x0.
g(x ) = tri(x ).
Figure 3.5. Graphical solution of the convolution integral of example 2. The input function f (x′) is shown by
the blue rectangle function. The flipped PSF is shown in yellow. (a) For x = −1, the area of overlap is zero and
so g(−1) = 0. (b) For x = −0.5, the area of overlap shown in grey is equal to 0.5 and so g(−0.5) = 0.5. (c) The
complete output function g (x ) is a triangle function.
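The same result can be reproduced with a discrete convolution; the NumPy sketch below samples rect(x) on a fine grid, convolves it with an identical rectangular PSF and recovers the values quoted in the caption.

import numpy as np

dx = 0.01
x = np.arange(-3, 3, dx)
f = ((x >= -0.5) & (x < 0.5)).astype(float)    # input rect(x)
h = f.copy()                                   # PSF: an identical rectangle of unit area
g = np.convolve(f, h, mode='same') * dx        # discrete approximation to tri(x)
print(round(g[np.argmin(np.abs(x))], 2))       # ~1.0 at x = 0
print(round(g[np.argmin(np.abs(x + 0.5))], 2)) # ~0.5 at x = -0.5, as in figure 3.5(b)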
h(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} H(μx, μy) e^{i2π(μx x + μy y)} dμx dμy
H(μx, μy) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) e^{−i2π(μx x + μy y)} dx dy. (3.7)
A major advantage of the spatial frequency representation arises from the con-
volution theorem, which implies that the convolution operation in the real domain is
equivalent to a simple product in the Fourier domain. The above equation may
therefore be written as follows:
H(μx, μy) = G(μx, μy)/F(μx, μy).
Recall that the total volume under a PSF is normalised to unity [14],
∫ ∫ h(x, y ) dx dy = 1.
Therefore, OTFs are always normalised to unity at (0,0).
The OTF is seen to provide a simple relationship between the ideal input and real
output spectral irradiance distributions as a function of spatial frequency. Since the
OTF is a complex quantity in general, more specific information can be extracted by
expressing the OTF in terms of its modulus and phase,
H(μx, μy) = ∣H(μx, μy)∣ e^{iϕ(μx, μy)}.
The modulus ∣H (μx , μy )∣ and phase ϕ(μx , μy ) are defined as the modulation transfer
function (MTF) and phase transfer function (PTF), respectively,
MTF(μx, μy) = ∣H(μx, μy)∣
PTF(μx, μy) = ϕ(μx, μy).
The MTF is commonly used as a fidelity metric in imaging. In order to gain physical
insight into the significance of the MTF, consider imaging a sinusoidal target pattern
at the OP defined by a single spatial frequency. The ideal spectral irradiance
waveform f (x , y ) at the SP is shown by the blue curve in figure 3.6. At the SP, the
spatial frequency of f is an image-space spatial frequency, which depends upon the
magnification. This spatial frequency has been denoted by μx0, μy0 .
The modulation depth of f is defined as the ratio of the magnitude of the waveform
(ac value) to the mean value (dc bias),
Mf = ∣f∣/dc.
Notice that the modulation depth is independent of the spatial frequency μx0, μy0 . In
an LSI system, convolving f with a PSF will yield a real output spectral irradiance
waveform g (x , y ) at the same spatial frequency μx0, μy0 and with the same dc bias [1].
However, the magnitude and therefore the modulation depth will be attenuated
according to equation (3.8),
Mg(μx0, μy0) = ∣g∣/dc = (∣f∣/dc) ∣G(μx0, μy0)/F(μx0, μy0)∣.
In other words, the modulation depth of the output waveform depends upon the
spatial frequency μx0 , μy0 [1, 15]. Combining the above equations yields the
following:
Figure 3.6. The blue curve shows an ideal input irradiance waveform f (x, y ) defined by a single spatial
frequency μx0 , μy0 at the SP. The magenta curve shows an example reduction in magnitude after convolving
f (x, y ) with a PSF that partially fills in the troughs of the waveform. Phase differences have not been included.
∣H(μx0, μy0)∣ = Mg(μx0, μy0)/Mf.
If the whole process is repeated but the spatial frequency of the sinusoidal target
pattern is changed each time, the following general function of image-space spatial
frequency will be obtained:
∣H(μx, μy)∣ = Mg(μx, μy)/Mf.
This is precisely the MTF defined by equation (3.8).
Modulation depth can be interpreted by expressing the ac/dc ratio in the form of
the Michelson equation [15],
M = (Emax − Emin)/(Emax + Emin).
Here Emin and Emax are the minimum and maximum values of the spectral
irradiance waveform at a specified spatial frequency. Since neither Emax nor Emin
can be negative, it must follow that 0 ⩽ M ⩽ 1. This interpretation is synonymous
with contrast, although contrast as a term is applied to square waveforms rather
than sinusoidal waveforms. If the camera component under consideration introduces
point spread, then
0 ⩽ Mg(μx , μy ) ⩽ Mf ⩽ 1.
0 ⩽ MTF(μx , μy ) ⩽ 1 .
Since most of the point spread is typically concentrated over a small blur spot
surrounding the ideal image point, the MTF will only be reduced slightly from unity
at low spatial frequencies. However, the MTF will drop significantly at higher
spatial frequencies as the peak to peak separation approaches the size of the blur
spot.
The spatial frequency at which the MTF drops to zero for the camera component
under consideration is defined as its cut-off frequency. Image information cannot be
transmitted above the cut-off frequency. This will be important for defining
resolving power in chapter 5.
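Because the OTF is the Fourier transform of the PSF, an MTF curve can be estimated numerically from a sampled PSF. The NumPy sketch below uses a toy Gaussian blur spot purely for illustration.

import numpy as np

dx = 0.001                                  # sample spacing on the SP in mm
x = np.arange(-0.05, 0.05, dx)
psf = np.exp(-0.5 * (x / 0.005) ** 2)       # toy Gaussian blur spot (sigma = 5 microns)
psf /= psf.sum()                            # unit volume, so that MTF(0) = 1
mtf = np.abs(np.fft.rfft(psf))              # modulus of the transfer function
freqs = np.fft.rfftfreq(len(psf), d=dx)     # spatial frequencies in cycles/mm
print(round(mtf[0], 3))                     # 1.0 at zero frequency
print(round(float(np.interp(50, freqs, mtf)), 3))   # ~0.29 at 50 cycles/mm for this PSF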
As an example, a perfect aberration-free lens has an associated PSF that arises
due to lens aperture diffraction. For a circular lens aperture, the PSF is defined by
the well-known Airy pattern illustrated in figure 3.14 of section 3.2.4. The
corresponding MTF when the f-number is set at N = 4 is shown along one spatial
frequency direction in figure 3.7. Section 3.2 will discuss lens aperture diffraction in
further detail.
Figure 3.7. MTF due to lens aperture diffraction for monochromatic light with λ = 550 nm along one spatial
frequency direction for a perfect aberration-free lens set at N = 4. In this case the cut-off frequency is 455
cycles/mm.
Each of these component PSFs can be subdivided into further PSF contributions
that are similarly convolved together. The simple model described in this chapter
includes only the contribution to the lens PSF arising from lens aperture diffraction,
and the contribution to the sensor PSF arising from the detector aperture at each
photosite.
The aim is to determine the real spectral irradiance distribution at the SP denoted
by g (x , y ). This is obtained by convolving the camera system PSF with the ideal
spectral irradiance distribution denoted by f (x , y ):
g(x , y ) = hsystem(x , y ) ∗ f (x , y ),
where
f (x , y ) = Eλ,ideal(x , y )
g(x , y ) = Eλ(x , y ),
and Eλ,ideal(x , y ) is defined by equation (3.2),
Eλ,ideal(x, y) = (π/4) Le,λ(x/m, y/m) (T/Nw²) cos⁴{φ(x/m, y/m)}.
Physically it is useful to think of the camera system PSF as a blur filter analogous to
those available in image editing software. The convolution operation is analogous to
the sliding of a blur filter over the ideal optical image to produce the real optical
image. The shape, diameter, and strength of the blur filter determines the level of
blur present in the real image. Most of the blur strength is concentrated in the region
close to the ideal image point.
It will be shown in section 3.3 that a sampling of the real spectral irradiance
distribution must accompany the detector-aperture contribution to the sensor PSF.
Ultimately this arises from the fact that sensor photosites are not point objects, and
so the charge signal obtained from a photosite is not a continuous function over the
SP. Mathematically, the sampling operation can be modelled as a multiplication by
the so-called comb function. The comb function is a 2D array of delta functions that
restrict the output of the convolution operation to the appropriate grid of sampling
positions defined by the photosite or sensor pixel spacings px and py,
g̃(x, y) = (hsystem(x, y) ∗ f(x, y)) comb(x/px, y/py).
the FT of the camera system PSF is the camera system transfer function or camera
system OTF,
Hsystem(μx , μy ) = FT{hsystem(x , y )} .
From the convolution theorem introduced in section 3.1.7, the camera system OTF
is obtained as a multiplication of the component OTFs,
Hsystem(μx , μy ) = Hoptics(μx , μy ) HOLPF(μx , μy ) Hsensor(μx , μy ) .
The camera system PSF is the inverse FT of the camera system OTF.
The following sections derive expressions for the optics, OLPF, and imaging
sensor contributions to the model camera system PSF and OTF.
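A hedged sketch of this multiplicative combination is given below, using a toy Gaussian optics MTF together with the detector-aperture MTF of a 100% fill-factor square photosite (a sinc function); the pitch and blur width are illustrative assumptions.

import numpy as np

p = 0.005                                                  # photosite pitch in mm (5 microns)
mu = np.linspace(0.0, 200.0, 401)                          # spatial frequency in cycles/mm
mtf_optics = np.exp(-2 * np.pi**2 * (0.002 * mu) ** 2)     # toy Gaussian optics MTF (sigma = 2 microns)
mtf_sensor = np.abs(np.sinc(p * mu))                       # detector-aperture MTF; np.sinc(x) = sin(pi x)/(pi x)
mtf_system = mtf_optics * mtf_sensor                       # component OTFs multiply in the Fourier domain
print(round(float(np.interp(100.0, mu, mtf_system)), 2))   # combined response at 100 cycles/mm (~0.29)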
3.2 Optics
The Airy pattern is a famous pattern in optics that arises from the PSF due to
diffraction at a lens aperture. Since diffraction is of fundamental importance to IQ in
photography, this section traces the physical origins of the diffraction PSF, starting
from the wave equation. The mathematical expression for the diffraction PSF for a
circular lens aperture is arrived at in section 3.2.4.
This section also briefly shows how a description of lens aberrations can be
included within the framework of linear systems theory by introducing the wave-
front error.
Electromagnetic waves are described by Maxwell’s equations, which form the basis
of electromagnetic optics. However, the theory of electromagnetic optics greatly
simplifies when the propagation medium satisfies four important properties. If the
medium is linear, isotropic, uniform and nondispersive [13, 16], then Maxwell’s
equations reduce to the following two wave-type equations:
∇²E − (1/c²) ∂²E/∂t² = 0
∇²H − (1/c²) ∂²H/∂t² = 0.
Figure 3.8. Electromagnetic wave at a given instant in time propagating in the z-direction. The electric and
magnetic fields oscillate in phase and are perpendicular to each other and to the direction of propagation.
Here c is the speed of light in the medium. Significantly, these equations are also
satisfied by any of the individual vector field components:
∇²U(r, t) − (1/c²) ∂²U(r, t)/∂t² = 0. (3.9)
Here U (r, t ) can be any of Ex, Ey, Ez, Hx, Hy or Hz. In other words, it is not
necessary to solve for the full vector quantities when the medium satisfies the above
properties, and instead only one scalar wave equation needs to be solved. This is the
basis of the scalar theory of wave optics [13, 16].
At the interface between two dielectric media such as the lens aperture and the iris
diaphragm, equation (3.9) no longer holds for the individual components.
Nevertheless, the coupling at the interface can be neglected provided the aperture
area is large compared to the wavelength [13, 16]. In this case wave optics remains a
good approximation. On the other hand, the vector theory of electromagnetic optics
based upon Maxwell’s equations must be used when the medium does not satisfy one
of the properties of linearity, isotropy, uniformity or nondispersiveness. For
example, the vector theory is needed to describe polarization.
For monochromatic light, the complex solution of equation (3.9) can be written
U(r, t) = U(r) e^{i2πνt}.
• ν is the optical frequency.
• U (r) is the complex field or complex amplitude,
U(r) = A(r) e^{iϕ(r)}.
• A(r) is a complex constant.
• ∣U (r)∣ defines the real amplitude.
• ϕ(r) is the phase describing the position within an optical cycle or wavelength.
Figure 3.9. Cross section of a spherical wave with the source at the origin. The phase is ϕ = arg(A) − kr and
the wavefronts satisfy kr = 2πn + arg(A), where n is an integer. The spacing between consecutive wavefronts is
given by λ = 2π /k .
Figure 3.9 shows example wavefronts for a spherical wave. These are positions with
the same phase. At a point on a wavefront, the vector normal to the wavefront at
that point describes the direction of propagation. These vectors can be associated
with the geometrical rays used in chapter 1.
Substituting U (r, t ) into equation (3.9) reveals that the complex amplitude must
satisfy the Helmholtz equation,
(∇2 + k 2 )U (r) = 0.
The irradiance is given by the squared magnitude of the complex amplitude,
E(r) = |U(r)|².   (3.10)
This relation is valid provided the irradiance is stationary, meaning that fluctuations
will occur over very short time scales but the average value will be independent of
time. In this case the time dependence of the wave solutions does not need to be
considered.
In reality, light is generally partially coherent, which requires a much more
complicated analysis [16]. In this chapter, the light will be assumed to be
polychromatic and incoherent, which is a reasonable approximation under typical
photographic conditions. In this simplified approach, the light can be treated
mathematically as monochromatic but incoherent, and the polychromatic effects
can be taken into account simply by integrating the spectral irradiance at the SP over
the spectral passband [4, 6]. The spectral passband of the camera is the range of
wavelengths over which it responds, λ1 → λ2 . The advantage of this approach is that
the camera response functions can be included as weighting functions when
integrating over the spectral passband. This will become apparent when deriving
the charge signal in section 3.6.
In the theory of wave optics, power per unit area is commonly referred to as the
optical intensity or simply the intensity. These terms will not be used in this book so
as to avoid confusion with the radiant intensity, which is a directional quantity
involving solid angle [9].
According to the Huygens–Fresnel principle, the complex amplitude U(x2, y2) at a plane
can be expressed in terms of the complex amplitude U(x1, y1) at a preceding plane through
the Fresnel–Kirchhoff diffraction formula,
U(x2, y2) = (1/(iλ)) ∫∫ U(x1, y1) (e^(ikR)/|R|) dx1 dy1,
where the integration runs over the source plane and R is the distance between the source
point (x1, y1) and the field point (x2, y2).
Since R depends only on the distance perpendicular to the OA between the source
points and the resulting field points rather than on their absolute positions in the
plane, R ≡ R(x2 − x1, y2 − y1), the Fresnel–Kirchhoff equation is seen to be a
convolution of the source field with a spherical wave impulse response,
U(x2, y2) = U(x1, y1) ∗ ((1/(iλ)) e^(ikR)/|R|).
The impulse response is an example of a Green’s function [13].
The Huygens–Fresnel principle can be observed when a wave passes through a
narrow opening such as a lens aperture. Spherical waves are seen to propagate
radially outwards, but the effects of superposition are restricted at points nearer to
the edges of the aperture. Consequently the whole wavefront is distorted as shown in
figure 3.10, leading to a visible diffraction pattern of constructive and destructive
interference. The diffraction pattern leads to a blurring of the optical image at the
SP. This blur contribution can be modelled by the diffraction PSF.
Figure 3.10. Diffraction of incoming plane waves at an aperture. Spherical waves propagating radially
outward are shown from three example source points marked by crosses. Superposition of spherical waves is
limited closer to the edges of the aperture leading to a spreading out of the waves and a visible diffraction
pattern at the SP.
The derivation below outlines the basic underlying principles. The general result is defined by
equation (3.16), and the result for the specific case of a circular aperture is defined by equation (3.19).
The following derivation proceeds by first determining the diffraction PSF for
fully coherent illumination. This is the diffraction amplitude PSF as it is based on
complex amplitude. A delta function source point at the axial position on the OP is
considered, and the lens is treated as a thin lens with the aperture stop at the lens. An
expression is derived for the complex amplitude response at the SP arising from the
delta function input. The magnification is then used to project the input coordinates
onto the SP to form an LSI system valid for coherent illumination, and the thin lens
model is generalised to a compound lens model. For fully incoherent illumination,
the diffraction PSF is obtained in terms of irradiance as the squared magnitude of
the amplitude PSF.
In this section, the following notation is used to distinguish spatial coordinates on
the OP, aperture plane, exit pupil (XP) plane and SP: (xop, yop) on the OP, (xap, yap) on
the aperture plane, (xxp, yxp) on the XP plane, and (x, y) on the SP.
Figure 3.11. A spherical wave emerging from an axial source position is modified by the phase transmission
function of the lens and subsequently converges towards the ideal image position at the SP.
A spherical wave emerging from the axial source position on the OP arrives at the
aperture plane with complex amplitude
U(xap, yap) = (1/(iλ)) e^(ikR)/|R|.
Here R is the distance between the object position (xop, yop) and the position
on the aperture plane (xap, yap).
The geometry is shown in figure 3.12. The lens modifies the spherical
wavefront at the aperture plane in the following manner:
U ′(xap, yap) = U (xap, yap) a(xap, yap) tl (xap, yap).
Here tl is the phase transformation function [13, 16] for a converging thin lens
with focal length f,
tl(xap, yap) = exp{−i (k/(2f)) (xap² + yap²)}.
This form of tl neglects aberrations and is strictly valid only in the paraxial
region [13]. The aperture function a(xap, yap ) restricts the size of the aperture
by taking a value of unity where the aperture is clear, and zero where the
aperture is blocked.
Now the Fresnel–Kirchhoff equation can be applied to the set of source
points {(xap, yap )} lying in the aperture plane. The resulting field at the SP is
given by
U′(x, y) = (1/(iλ)) ∫∫ U′(xap, yap) (e^(ikR′)/|R′|) dxap dyap.
Figure 3.12. The set of coordinates {(xap, yap )} lie in the aperture plane for a thin lens with the aperture at the
lens. The coordinates (xop, yop ) and (x, y ) denote the source and image positions, respectively. The distances
R and R′ are indicated for an example coordinate (xap, yap ) .
The complex amplitude U ′(x , y ) is the amplitude response due to the delta
function input and can be denoted hamp,diff (xop, yop ; x , y ). This emphasizes
the dependence on the OP coordinates through R. Putting everything
together leads to the following expression:
hamp,diff(xop, yop; x, y) = (1/(iλ)²) ∫∫ tl(xap, yap) a(xap, yap) (e^(ikR)/|R|) (e^(ikR′)/|R′|) dxap dyap.   (3.11)
The response hamp,diff (xop, yop ; x , y ) depends upon both the OP and SP
coordinates. It is therefore shift variant and not yet in the form of a PSF.
Before proceeding further, various simplifying approximations need to be
made.
(2) Simplifying approximations
Close to the OP, the spherical wavefronts can be approximated as parabolic
wavefronts. This is achieved by performing a Taylor expansion of R in the
numerator of equation (3.11) and dropping higher-order terms. At the lens,
the parabolic phase term can be further expanded and terms dropped. This
leads to a plane wave approximation,
e^(ikR)/|R| ≈ e^(ik Rplane)/s,
where
Rplane = s + (xop² + yop²)/(2s) + (xap² + yap²)/(2s) − (xap xop + yap yop)/s.
After diffraction by the lens aperture, similar approximations can be made.
The parabolic approximation in the so-called near-field region leads to a
description of diffraction known as Fresnel diffraction. The plane-wave
approximation in the far-field region where the SP is located leads to a
description known as Fraunhofer diffraction,
e^(ikR′)/|R′| ≈ e^(ik R′plane)/s′,
where
R′plane = s′ + (x² + y²)/(2s′) + (xap² + yap²)/(2s′) − (xap x + yap y)/s′.
Substituting these approximations into equation (3.11) yields
hamp,diff(xop, yop; x, y) ≈ (e^(ik(s+s′))/((iλ)² s s′))
× ∫∫ a(xap, yap) exp{(ik/(2s′))(x² + y²)} exp{(ik/(2s))(xop² + yop²)}
× exp{(ik/2)(xap² + yap²)(1/s + 1/s′ − 1/f)}
× exp{(−ik/s)(xap xop + yap yop)} exp{(−ik/s′)(xap x + yap y)} dxap dyap.
Fortunately, this expression can be simplified. Arguments can be given for
dropping the final exponential term on the second line [13]. Furthermore,
the exponential term on the third line vanishes according to the Gaussian
lens conjugate equation in air,
1/s + 1/s′ − 1/f = 0.
This leads to
hamp,diff(xop, yop; x, y) ≈ (e^(ik(s+s′))/((iλ)² s s′)) exp{(ik/(2s′))(x² + y²)} ∫∫ a(xap, yap)
× exp{(−ik/s)(xap xop + yap yop)} exp{(−ik/s′)(xap x + yap y)} dxap dyap.   (3.12)
The magnification m = −s′/s can now be used to project the OP coordinates onto the SP,
so that the response depends only on the coordinate differences:
hamp,diff(x − mxop, y − myop) ≈ (e^(ik(s+s′))/((iλ)² s s′)) exp{(ik/(2s′))(x² + y²)}
× ∫∫ a(xap, yap) exp{(−ik/s′)(x − mxop) xap} exp{(−ik/s′)(y − myop) yap} dxap dyap.
With the source point at the axial position (xop = yop = 0), the amplitude response can be
written as a function of the SP coordinates alone:
hamp,diff(x, y, λ) ≈ (e^(ik(s+s′))/((iλ)² s s′)) exp{(ik/(2s′))(x² + y²)}
× ∫∫ a(xap, yap) e^(−i2π(x/(λs′)) xap) e^(−i2π(y/(λs′)) yap) dxap dyap.   (3.13)
In the second line, the wavenumber k has been replaced by making the
substitution k = 2π /λ .
If the light is fully incoherent, the system is not linear in complex amplitude
and so hamp,diff (x , y, λ ) does not define a PSF. Instead, the system is linear in
irradiance. The diffraction PSF for incoherent lighting can be straightfor-
wardly determined from hamp,diff (x , y, λ ), as shown in step (6) below.
(4) Fourier transformation
By comparing equation (3.13) with the FT defined by equation (3.7), the
second line should be recognisable as the FT of the aperture function,
A(μx , μy ) = FT{a(x , y )}.
This defines the amplitude transfer function. However, the spatial frequencies
μx and μy are to be substituted by the following real-space quantities defined
at the SP:
μx = x/(λs′),   μy = y/(λs′).
Now equation (3.13) may be written as follows:
hamp,diff(x, y, λ) ≈ (e^(ik(s+s′))/((iλ)² s s′)) exp{(ik/(2s′))(x² + y²)} A(μx = x/(λs′), μy = y/(λs′)).
When the illumination is coherent, this important result shows that the real
optical image at the SP is a convolution of the ideal image predicted by
geometrical optics with a PSF that is the Fraunhofer diffraction pattern of the
XP [13].
(5) Compound lens model
The derivation for the thin lens model can be generalised to the case of a
general compound lens by associating all diffraction effects with either the
entrance pupil (EP) or the XP [13].
Taking the viewpoint that all diffraction effects are associated with the XP,
the coordinates of the aperture function must now be considered on the XP
plane instead. The aperture function must take a value of unity where the
projected aperture is clear, and zero where the projected aperture is blocked:
hamp,diff(x, y, λ) ≈ (U0 e^(ikz′)/(iλz′)) exp{(ik/(2z′))(x² + y²)} A(μx = x/(λz′), μy = y/(λz′)).   (3.14)
Here U0 is a complex constant, and A(x /λz′, y /λz′) is the FT of the aperture
function evaluated at the spatial frequencies μx = x /λz′ and μy = y /λz′.
The distance s′ measured from the aperture plane to the SP for the thin
lens has been replaced by the distance z′ measured from the XP to the SP.
The Gaussian expression for z′ is
z′ = s′ − s′xp = (n′/n) Dxp Nw.   (3.15)
Here s′ and s′xp are the distances from the second principal plane to the SP and XP,
respectively, Dxp is the diameter of the XP, and Nw is the working f-number.
(6) Incoherent illumination
When the illumination is incoherent, the system is linear in irradiance rather
than complex amplitude. In this case, the diffraction PSF is obtained by
utilizing equation (3.10) and taking the squared magnitude of the amplitude
PSF defined by equation (3.14):
hdiff(x, y, λ) = (U0²/(λz′)²) |A(μx = x/(λz′), μy = y/(λz′))|².   (3.16)
The volume under the PSF should be normalised to unity, and this
expression is valid for source points on or very close to the OA.
This leads to the important result that for incoherent illumination, the real optical
image at the SP is a convolution of the ideal image predicted by geometrical optics
with a PSF that is proportional to the squared magnitude of the Fraunhofer diffraction
pattern of the XP [13].
Acirc(μx, μy) = jinc[Dxp √(μx² + μy²)] = jinc[Dxp μr] = 2J1(πDxp μr)/(πDxp μr).   (3.17)
Here μr is the radial spatial frequency, and J1 is a Bessel function of the first kind.
hdiff,circ(x, y, λ) = (U0²/(λz′)²) (π/4) Dxp² jinc²[Dxp √((x/(λz′))² + (y/(λz′))²)].   (3.18)
For the special case that focus is set at infinity, z′ → mp f′ = Dxp N, where mp is the pupil
magnification. Now substituting for z′, the first zero ring of the diffraction PSF occurs at
the radial distance
rAiry = 1.22 λN.
This defines the well-known Airy disk, and the diameter or spot size of the Airy disk
is 2.44 λN .
The spot size contains the major contribution to the point spread associated with
diffraction. Since the spot size becomes wider as the f-number N increases, increased
blurring of the image occurs with increasing N. This effect is known as diffraction
softening. Landscape photographers often need to maximise depth of field by using a
small aperture while simultaneously avoiding noticeable diffraction softening. On a
35 mm full-frame camera, around N = 11 is generally considered to achieve the
optimum balance when the output image is viewed under standard viewing
conditions.
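As a quick numerical illustration (not taken from the text), the following Python sketch evaluates the Airy spot size 2.44 λN for a range of f-numbers, assuming a single wavelength of 550 nm:

```python
# Illustrative only: Airy disk diameter (spot size) 2.44*lambda*N as a function of
# f-number, assuming a single wavelength of 550 nm.
LAMBDA_MM = 550e-6  # wavelength in mm

for N in (2.8, 5.6, 8, 11, 16, 22):
    spot_um = 2.44 * LAMBDA_MM * N * 1000.0  # spot size in micrometres
    print(f"N = {N:>4}: Airy disk diameter = {spot_um:5.1f} um")
```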
Figure 3.14. Cross-section of the incoherent diffraction PSF for a single wavelength with the lens focused at
infinity. The lens aperture is unobstructed and circular, and the lighting is incoherent. The horizontal axis
represents radial distance from the origin measured in units of λN . The first zero ring at a radial distance
rAiry = 1.22 λN defines the Airy disk. The volume under the PSF should be normalised to unity.
Figure 3.15. Incoherent diffraction PSF for a single wavelength λ with the lens focused at infinity. The lens
aperture is unobstructed and circular, and the lighting is incoherent. No contrast adjustments have been made.
The squared magnitude of the amplitude transfer function appearing in equation (3.16),
|A(μx, μy)|², is a real-space quantity as the spatial frequencies are substituted by real-
space coordinates, x/(λz′), y/(λz′). Therefore, the FT of |A(μx, μy)|² will yield a
Fourier-domain function related to a(x, y) since the real-space coordinates will be
substituted by the spatial frequencies (λz′μx, λz′μy). The following identity can be
used to take the FT:
FT{|A(x/(λz′), y/(λz′))|²} = FT{A(x/(λz′), y/(λz′)) A*(x/(λz′), y/(λz′))}
= (λz′)² a(λz′μx, λz′μy) ⊗ a*(λz′μx, λz′μy).
The self cross-correlation or auto correlation operation is denoted by the ⊗ symbol.
Correlation is defined as follows:
g(x) = ∫ f(x0) h(x0 − x) dx0.
Compared to a convolution, the correlation operation does not flip the function h
before the overlap is calculated. For auto correlation, f = h.
Substituting the above identity into equation (3.16) yields the following general
expression for the incoherent diffraction OTF:
Hdiff(μx, μy, λ) = U0² a(λz′μx, λz′μy) ⊗ a*(λz′μx, λz′μy).
After normalising to unity at (0,0), the OTF is seen to be the normalised auto-
correlation function of the amplitude transfer function [13]. At the SP, the Gaussian
expression for z′ is given by equation (3.15),
z′ = s′ − s′xp = (n′/n) Dxp Nw.
For the case of a circular lens aperture,
acirc(λz′μx, λz′μy) = circ[λz′ √(μx² + μy²)/Dxp] = circ[λz′μr/Dxp] = circ[μr/μc].
Here the radial spatial frequency μr and the quantity μc are defined as follows:
μr = √(μx² + μy²) = √(x² + y²)/(λz′) = r/(λz′)
μc = Dxp/(λz′).
The diffraction OTF becomes
⎡μ ⎤ ⎡μ ⎤
Hdiff,circ(μx , μy , λ) = U02 circ⎢ r ⎥ ⊗ circ⎢ r ⎥ .
⎣ μc ⎦ ⎣ μc ⎦
The auto correlation can be performed by graphically calculating the overlap [13].
This yields the following normalised result:
Hdiff,circ(μr, λ) = (2/π) { cos⁻¹(μr/μc) − (μr/μc) √(1 − (μr/μc)²) }   for μr/μc ⩽ 1
Hdiff,circ(μr, λ) = 0   for μr/μc > 1.   (3.20)
Figure 3.16. Optics OTF or MTF as a function of f-number for an ideal aberration-free lens with a circular
aperture. The wavelength has been taken as λ = 550 nm and so the cut-off frequency is μc = 1818/N.
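Equation (3.20) is straightforward to evaluate numerically. The following Python sketch (illustrative only, with the cut-off frequency taken as μc = 1/(λNw)) computes the aberration-free diffraction MTF at a few spatial frequencies:

```python
import numpy as np

def diffraction_mtf(mu, wavelength_mm, N):
    """Aberration-free diffraction MTF of equation (3.20) for a circular aperture.
    mu is the radial spatial frequency in cycles/mm; the cut-off is mu_c = 1/(wavelength*N)."""
    mu_c = 1.0 / (wavelength_mm * N)
    x = np.asarray(mu, dtype=float) / mu_c
    y = np.clip(x, 0.0, 1.0)
    mtf = (2.0 / np.pi) * (np.arccos(y) - y * np.sqrt(1.0 - y ** 2))
    return np.where(x <= 1.0, mtf, 0.0)

# Example: lambda = 550 nm and N = 8 give mu_c = 1818/8 ~ 227 cycles/mm.
print(diffraction_mtf([0.0, 50.0, 100.0, 200.0, 300.0], 550e-6, 8))
```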
For the aberrated case, the OTF of the specific lens design under consideration must instead
be obtained numerically using lens design software.
Figure 3.17 shows the geometry for defining the wavefront error [13]. A Gaussian
reference sphere that intersects the XP plane at the OA is constructed. This reference
sphere describes the shape of the ideal spherical wave as it emerges from the XP and
converges towards the ideal Gaussian image point defined by the paraxial chief ray.
For a real ray emerging from the XP at position (xxp, yxp ), the wavefront error
W (xxp, yxp ) describes the distance along the ray between the reference sphere and the
actual emerging wavefront (the real XP) that intersects the XP plane at the OA. The
wavefront error is measured in units of wavelength and is related to the optical path
difference (OPD) [14, 17],
W(xxp, yxp) = OPD(xxp, yxp)/λ.
Multiplying the wavefront error function by 2π gives the phase difference [14, 17].
For example, if the OPD = λ/4 at a given position (xxp, yxp ), then W = 1/4 at the
same position. The peak-to-peak wavefront error WPP is the maximum OPD value
over all points on the reference sphere. The Rayleigh quarter-wave limit [18] specifies
WPP = 1/4 as an aberration allowance for a lens to be sensibly classed as perfect or
diffraction limited [19].
A useful measure for the overall effect of aberrations is the root mean square
(RMS) wavefront error WRMS. This is defined in terms of the mean of the squared
Figure 3.17. Gaussian reference sphere (blue arc) and example real wavefront (green curve) at the XP. The
paraxial chief ray has been denoted in blue. The red line indicates the optical path difference OPD(xxp, yxp ) for
the example real wavefront.
wavefront error over the reference sphere coordinates minus the square of the mean
wavefront error,
WRMS = √(⟨W²⟩ − ⟨W⟩²).
An RMS wavefront error up to 0.07 corresponds to a small amount of aberration
content. Medium aberration content has an RMS wavefront error between 0.07 and
0.25, and a large aberration content has an RMS wavefront error above 0.25 [14].
Geometrical optics provides a good approximation to the MTF in the large
aberration content region, WRMS > 0.25.
The Strehl ratio is the ratio of the irradiance at the centre of the PSF for an
aberrated system compared to the irradiance at the centre of the PSF for a diffraction-
limited system. According to the Marechal criterion, a lens can be considered
essentially diffraction limited if the Strehl ratio is at least 80%. For defocus aberration,
where the plane of best focus no longer corresponds with the SP, the Marechal criterion
is identical to the Rayleigh quarter-wave limit as there is an exact relationship between
the peak-to-peak and RMS wavefront errors in this case, WPP = 3.5 WRMS.
Since the wavefront error generally becomes worse for light rays that are further
from the OA, the camera system PSF will no longer be isoplanatic over the entire
SP. In other words, the system can no longer be treated as LSI once a description of
aberrations is included. In practice, a way forward is to treat the system as a group of
subsystems that are LSI over specific regions of the SP [4, 6].
3.3 Sensor
Camera imaging sensors comprise a 2D array of sensor pixels or photosites. As
described in section 3.6.2, each photosite contains at least one photoelement that
converts light into stored charge.
Figure 3.18. Example (a) rectangular, and (b) notched rectangular photosensitive detection areas, Adet .
The fill factor FF is defined as the ratio of the photosensitive detection area Adet to the
photosite area Ap,
FF = Adet/Ap.   (3.22)
Since the spectral irradiance distribution over Adet contributes to generating the
charge signal for a photosite, mathematically the spectral irradiance must be
integrated over Adet to yield the spectral radiant flux Φe,λ responsible for generating
the charge signal. Significantly, integrating the spectral irradiance over Adet is
proportional to averaging over the same area,
∫_Adet f(x, y) dA = Adet ⟨f(x, y)⟩_Adet,
where the angled brackets denote a spatial average over Adet . This averaging blurs
the corresponding scene detail. As mentioned in the introduction to this section, the
blurring can be considered a form of point spread and can equivalently be expressed
in terms of a spatial detector-aperture PSF. The generation of the charge signal itself
will be discussed in section 3.6.
hdet−ap(x, y) = (1/Adet) rect[x/dx] rect[y/dy].   (3.23)
Figure 3.19. Two-dimensional detector-aperture PSF: a rectangular box function of height 1/(dx dy) extending over ±dx/2 in the x-direction and ±dy/2 in the y-direction.
The aperture area is defined by Adet , and the 1/Adet factor ensures that
the convolution operation performs an averaging rather than an integration. The
1/Adet factor will be cancelled out when the charge signal is derived in section 3.6. A
2D square detector-aperture PSF is illustrated in figure 3.19. Detector-aperture PSFs
for a variety of aperture shapes can be found in the literature [20, 21].
3.3.3 Sampling
Recall that the output function g (x , y ) defined by equation (3.21) is the input
function f (x , y ) convolved with the camera system PSF,
g(x , y ) = f (x , y ) ∗ hsystem(x , y ).
Along with the diffraction PSF, consider including the detector-aperture PSF
derived above as part of the camera system PSF,
hsystem(x , y ) = hdiff (x , y ) ∗ hdet−ap(x , y ).
Now at the centre of the detector aperture area, the convolution operation will
output the averaged value of f (x , y ) over the extent of the aperture as it slides over
f (x , y ). This will occur as the centre passes over every spatial coordinate on the SP,
and so the output function g (x , y ) is a continuous function of position.
However, only one output value for g (x , y ) associated with each photosite is
required. This means that the detector-aperture PSF must be accompanied by a
sampling operation that restricts the output of the convolution operation to the
appropriate grid of sampling positions.
Mathematically, the sampling of a function can be achieved by multiplying with
the Dirac comb function illustrated in figure 3.20,
comb[x/px, y/py] = Σ_{m=−∞}^{∞} Σ_{n=−∞}^{∞} δ(x − m px) δ(y − n py).
The Dirac comb function is a 2D array of delta functions. Points on the output grid
of sampling positions are separated by integer multiples of the pixel pitch in the
x and y directions, denoted as px and py. The pixel pitch in a given direction is equal
Figure 3.20. Detector sampling represented by a Dirac comb function. Each upward arrow represents a delta
function.
to the reciprocal of the number of photosites per unit length in that direction. For
square photosites, px = py = p.
Multiplying equation (3.21) by the Dirac comb yields the following sampled
output function denoted with a tilde symbol:
g̃(x, y) = (f(x, y) ∗ hsystem(x, y)) comb[x/px, y/py].   (3.24)
In general, the extent of dx and dy that define Adet will be less than the pixel pitches
px and py.
Figures 3.21 and 3.22 illustrate the MTF for a square detector with dx = 3.8 μm
and px = 4 μ m . In analogy with the lens aperture diffraction MTF, the detector-
aperture MTF also has a cut-off frequency. This is defined as the frequency where
the MTF first drops to zero. In the present example, the cut-off frequency is
(3.8 μ m)−1 = 263 cycles/mm.
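A minimal numerical check of these values, assuming the familiar |sinc| form for the MTF of a rectangular detection area (the normalised sinc convention, whose first zero lies at 1/dx):

```python
import numpy as np

def detector_mtf(mu, d_mm):
    # |sinc(d*mu)| with np.sinc(x) = sin(pi*x)/(pi*x); first zero at mu = 1/d
    return np.abs(np.sinc(d_mm * np.asarray(mu, dtype=float)))

d_x = 3.8e-3  # detection-area width in mm (3.8 micrometres)
p_x = 4.0e-3  # pixel pitch in mm (4.0 micrometres)
print("detector cut-off  :", 1.0 / d_x, "cycles/mm")         # ~263
print("sensor Nyquist    :", 1.0 / (2.0 * p_x), "cycles/mm")  # 125
print("MTF at Nyquist    :", detector_mtf(1.0 / (2.0 * p_x), d_x))
```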
The sensor Nyquist frequency μNyq discussed in the next section is the maximum
cut-off frequency that the camera system may have if aliasing is to be prevented, and it is related to the pixel pitch. If
dx = px, then μNyq will be half the detector-aperture cut-off frequency. In the present
example, μNyq = 125 cycles/mm. The detector cut-off frequency is slightly higher
than 2 × μNyq since the FF is less than 100%. Although a higher FF increases quantum
Figure 3.21. Detector-aperture MTF in the x-direction for a rectangular detector. In this example the detection
area width dx is 3.8 μ m for a 4 μ m pixel pitch px. The detector cut-off frequency is 263 cycles/mm.
efficiency (section 3.6), a higher FF also increases the width of the detector-aperture
PSF. This in turn increases the point spread and lowers the detector cut-off frequency.
The spatial frequency representation of equation (3.24) is discussed in the next
section.
3.4 Optical low-pass filter
The OLPF can reduce aliasing to an acceptable level by modifying the flow of light
before it reaches the SP.
By design, the OLPF is another source of blur that can be modelled by a PSF.
This section begins with a brief introduction to sampling theory and the causes of
aliasing. Subsequently, it is shown that the introduction of an OLPF can minimise
aliasing, and an expression for the PSF and MTF due to the OLPF is given. These
can be included as part of the camera system PSF and MTF.
In one dimension, the FT of a function f(x) is defined by
F(μx) = ∫ f(x) e^(−i2πμx x) dx.
A function with an FT that is zero everywhere outside some region or band is said to
be band limited [22, 23]. In the present context, the input function defined by
equation (3.26) is always band limited because the diffraction OTF always has a
finite-valued cut-off frequency μc .
In analogy with section 3.3.3, the sampling of a continuous input function f (x )
that is band limited to the region [−μx,max , μx,max ] can be achieved by multiplying
with a Dirac comb function:
f̃(x) = f(x) comb[x/Δx].   (3.27)
The tilde symbol indicates that the function is a sampled function. The spatial period
Δx is the spacing of the discrete sampling intervals. The Dirac comb in 1D is defined
by
comb[x/Δx] = Σ_{n=−∞}^{∞} δ(x − nΔx).
In analogy with equation (3.6), the value of an arbitrary sample at location xn can be
obtained by integration,
f(xn) = ∫ f(x) δ(x − nΔx) dx = f(nΔx).
Taking the FT of equation (3.27), and noting that the FT of the Dirac comb is itself a
Dirac comb with period Δμx = 1/Δx, the FT of the sampled function becomes
F̃(μx) = F(μx) ∗ (Δμx comb[μx/Δμx]).
Significantly, the convolution of the continuous function F (μx ) with a Dirac comb
function of infinite extent and period Δμx yields a periodic sequence of copies of
F (μx ) with period Δμx . This important result is illustrated in figure 3.23(a) and (b).
(a)
(b)
(c)
(d)
Figure 3.23. (a) Fourier transform F (μx ) shown by the triangle function (yellow) along with the Dirac
sampling comb (blue arrows). (b) The convolution of F (μx ) with the sampling comb yields copies of F (μx ) with
period Δμx . (c) Ideal reconstruction filter in the Fourier domain (red rectangle). (d) Aliasing caused by
undersampling.
3.4.3 Reconstruction
In principle, the original function f (x ) can be recovered from its samples by
multiplying F̃ (μx ) with an ideal reconstruction filter or ideal low-pass filter. The ideal
reconstruction filter in the Fourier domain is a rectangle function defined by
Hideal(μx) = 1/Δμx   for −μmax ⩽ μx ⩽ μmax
Hideal(μx) = 0   otherwise.
This isolates the so-called baseband F (μx ), as illustrated in figure 3.23(c).
Subsequently, f (x ) can be recovered by taking the inverse FT of the baseband.
It is also instructive to show how f (x ) can be recovered from its samples in the
real domain. The baseband can be written as follows:
F (μx ) = H (μx )F˜ (μx ).
Taking the inverse FT yields
f(x) = FT⁻¹{H(μx) F̃(μx)} = f̃(x) ∗ hideal(x).   (3.28)
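As an illustrative sketch (not from the text), the following Python fragment reconstructs a band-limited sinusoid from its samples using the sinc kernel of the ideal low-pass filter; the sampling interval and test function are hypothetical:

```python
import numpy as np

def reconstruct(samples, dx, x):
    """Shannon-Whittaker reconstruction: f(x) = sum_n f(n*dx) * sinc((x - n*dx)/dx),
    i.e. the sampled function convolved with the ideal low-pass kernel."""
    n = np.arange(len(samples))
    return np.sum(samples[:, None] * np.sinc((x[None, :] - n[:, None] * dx) / dx), axis=0)

# Hypothetical example: a 5 cycles/mm sinusoid sampled at 40 samples/mm.
dx = 1.0 / 40.0
xn = np.arange(0.0, 1.0, dx)
samples = np.sin(2.0 * np.pi * 5.0 * xn)
x = np.linspace(0.1, 0.9, 9)
error = np.max(np.abs(reconstruct(samples, dx, x) - np.sin(2.0 * np.pi * 5.0 * x)))
print("maximum reconstruction error:", error)  # small away from the window edges
```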
3.4.4 Aliasing
Up to this point in the discussion, it has been assumed that exact reconstruction is
always possible. However, this is not the case, and two conditions are required [22, 23].
• The signal must be band limited. This prevents the existence of replicated
spectra of infinite extent that are impossible to separate.
• The sampling rate must be greater than twice the maximum frequency present
in the signal. This ensures that the replicated spectra do not overlap and
corrupt the baseband.
Since the signal sampled at the SP will be band-limited by the diffraction PSF,
hdiff (x , y ), the first of these conditions is always satisfied for a camera system.
However, the second of these conditions depends upon a number of variables.
Notably, the sampling rate is fixed by the pixel pitch. Whether or not this sampling
Figure 3.24. Illustration of aliasing. In this example, the sampling indicated by the black circles is insufficient
for reconstruction of the continuous signal shown in green. Consequently, the signal shown in green incorrectly
appears as the signal shown in magenta upon reconstruction.
rate is greater than twice the maximum frequency present in the signal depends upon
the extent of the low-pass filtering by the camera components. The most important
of these is the cut-off frequency defined by the PSF due to diffraction, hdiff (x , y ), and
this depends upon the lens f-number used to take the photograph. It will be shown
later in this section that an OLPF can be fitted above the sensor to ensure that the
second condition is always satisfied, irrespective of the lens f-number selected.
The second condition above is a statement of the Shannon–Whittaker sampling
theorem [24]. If the sampling theorem is not satisfied, a perfect copy of F (μx ) cannot
be isolated and so the original function will not be correctly recovered. This issue is
known as aliasing because higher spatial frequencies will incorrectly appear as lower
spatial frequencies in the reconstructed signal. A simple example is shown in
figure 3.24. Expressed mathematically, aliasing can be avoided by sampling at a
rate μx,Nyq that must satisfy
μx,Nyq = 1/Δx > 2μx,max.   (3.29)
Here μx,max is the highest spatial frequency content of the function, and μx,Nyq is
known as the Nyquist rate. Figure 3.23(d) shows an example of aliasing by sampling
at a rate that fails to satisfy the above condition. This is known as undersampling.
The sampling theorem applied to a function of two spatial variables f (x , y ) yields
separate Nyquist rates in the x and y directions, μx,Nyq and μy,Nyq .
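The following short Python sketch (hypothetical numbers) illustrates the effect numerically: when sampled with a 4 μm pitch, a 160 cycles/mm sinusoid produces exactly the same samples as a 90 cycles/mm alias.

```python
import numpy as np

pitch = 4.0e-3                    # sampling interval in mm (Nyquist frequency 125 cycles/mm)
x = np.arange(0.0, 0.2, pitch)    # sampling positions
true_mu = 160.0                   # above the Nyquist frequency
alias_mu = 1.0 / pitch - true_mu  # 250 - 160 = 90 cycles/mm

s_true = np.cos(2.0 * np.pi * true_mu * x)
s_alias = np.cos(2.0 * np.pi * alias_mu * x)
print(np.allclose(s_true, s_alias))  # True: the two frequencies are indistinguishable
```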
The frequencies μx,sensor and μy,sensor are defined as the sensor Nyquist frequencies in
the x and y directions. Although the detection areas are usually rectangular or L-
shaped, the photosites are typically square so that px = py = p. In this case it is usual
practice to refer to a single sensor Nyquist frequency,
μsensor = 1/(2p).
3.4.6 Pre-filtering
The continuous function to be sampled at the SP is the real 2D spectral irradiance
distribution modelled by equation (3.26) and denoted as g (x , y ),
g(x , y ) = f (x , y ) ∗ hsystem(x , y ).
At present the camera system PSF includes the optics (diffraction) and sensor
(detector-aperture) contributions.
The FT of g (x , y ) is denoted as G(μx , μy ). In order to minimise aliasing, all spatial
frequency content present in G(μx , μy ) above the sensor Nyquist frequency μNyq
needs to be removed before g(x, y) is sampled. In other words, g(x, y) must be pre-filtered before the sampling occurs.
Figure 3.25. For a Bayer CFA, the effective pixel pitch for the red and blue mosaics is p = px = py = 2a. The green mosaic has p = √2 a rotated at 45°, which results in a higher Nyquist frequency.
hOLPF(x, y) = (1/4) [ δ(x − a0) δ(y − b0) + δ(x − a1) δ(y − b1)
+ δ(x − a2) δ(y − b2) + δ(x − a3) δ(y − b3) ].   (3.30)
Figure 3.26. Four-spot OLPF. The first plate splits the light in the horizontal direction, and the second plate
splits the light in the vertical direction.
The constants define the point separation and hence the strength of the filter. A
maximum strength filter is obtained by setting the point separation equal to the pixel
pitch:
(a0, b0) = (0, 0)
(a1, b1) = (px, 0)
(a2, b2) = (0, py)
(a3, b3) = (px, py).   (3.31)
In this example, the spots are not situated symmetrically about the origin and so
there will be a phase contribution to the OTF. It is preferable that OLPF filters be
designed so that the spots are symmetrical about the origin. The use of a birefringent
OLPF has other benefits such as reduction of colour interpolation error when
demosaicing the raw data.
Objects containing fine repeated patterns are most susceptible to aliasing artefacts
since these patterns are most likely to be associated with well-defined spatial
frequencies above μNyq . The disadvantage of using an OLPF is that its PSF
contributes to the camera system PSF at all times, even when other contributions
such as lens aperture diffraction are already sufficiently band-limiting the spatial
frequency content. As discussed in chapter 5, this reduces the camera system MTF at
spatial frequencies below μNyq and reduces perceived image sharpness.
Cameras with very high sensor pixel counts have a higher sensor Nyquist
frequency and are therefore less prone to aliasing. It is becoming more common
for camera manufacturers to use a very weak OLPF or even completely remove it in
such cameras. Although any aliased scene information corresponding to frequencies
above μNyq cannot be recovered from a single frame, the photographer can use
image-processing software to reduce the prominence of the aliasing artefacts.
HOLPF(μx, μy) = (1/4) [ exp(−i2π(a0 μx + b0 μy)) + exp(−i2π(a1 μx + b1 μy))
+ exp(−i2π(a2 μx + b2 μy)) + exp(−i2π(a3 μx + b3 μy)) ].   (3.32)
Figure 3.27. MTF for a maximum-strength four-spot OLPF in the x-direction (blue). The detector-aperture
MTF (magenta) corresponds to a pixel pitch px = 4.0 μm and detection area width dx = 3.8 μm. The product of
the OLPF and detector MTFs (black) drops to zero at μNyq = 125 cycles/mm.
MTFOLPF(μx) = |cos(π px μx)|
PTFOLPF(μx) = −π px μx.
The PTF arises from the fact that the four spots defined by equation (3.31) are not
symmetrical about the origin and so the overall image is shifted by half a pixel in the
x and y directions. The PTF will vanish if the spots can be arranged symmetrically
around the origin.
Figure 3.27 illustrates the maximum-strength four-spot MTF in the x-direction
along with the detector-aperture MTF for the same model parameters used in
section 3.3.4. The cut-off frequency for the detector-aperture MTF is 263 cycles/mm.
Since the cut-off frequency for the four-spot filter is defined by the sensor Nyquist
frequency μNyq = 125 cycles/mm, the combined MTF, obtained as the product of the four-spot and
detector-aperture MTFs at each spatial frequency, similarly drops to zero at 125 cycles/mm.
The detector-aperture MTF suppresses the combined MTF above μNyq . Stronger
suppression will occur when other contributions to the camera system MTF such as
the optics MTF are included in the model, and this will reduce aliasing to a minimal
level.
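A minimal numerical sketch of this combination, assuming the |cos| OLPF MTF given above together with a |sinc| detector-aperture MTF, and the same 4 μm pitch and 3.8 μm detection width:

```python
import numpy as np

p_x = 4.0e-3   # pixel pitch in mm
d_x = 3.8e-3   # detection-area width in mm
mu = np.array([0.0, 50.0, 100.0, 125.0, 150.0, 200.0, 250.0])  # cycles/mm

mtf_olpf = np.abs(np.cos(np.pi * p_x * mu))   # maximum-strength four-spot OLPF
mtf_det = np.abs(np.sinc(d_x * mu))           # detector aperture (sinc form assumed)
combined = mtf_olpf * mtf_det

for f, m in zip(mu, combined):
    print(f"{f:6.1f} cycles/mm : combined MTF = {m:.3f}")
# The OLPF factor is zero at the sensor Nyquist frequency 1/(2*p_x) = 125 cycles/mm.
```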
g̃(x, y, λ) = Ẽλ(x, y) = (f(x, y, λ) ∗ hsystem(x, y, λ)) comb[x/px, y/py].
Here the dependence on wavelength λ has been reintroduced for clarity. Since the
detector-aperture contribution to the camera system PSF averages the spectral
irradiance distribution over each photosite, the comb function restricts the output of
the convolution operation to the appropriate grid of sampling positions defined by
the pixel pitches px and py.
The input function denoted by f (x , y ) is the ideal spectral irradiance distribution
at the SP defined by equation (3.2),
f(x, y, λ) = Eλ,ideal(x, y) = (π/4) Le,λ(x/m, y/m) (T/Nw²) cos⁴{φ(x/m, y/m)}.
Here Le,λ is the spectral scene radiance, m is the magnification, T is the lens
transmittance factor, and Nw is the working f-number. The cosine fourth term can
be replaced by the RI, R(x , y, λ ), which includes vignetting arising from a specific
real lens design [6].
For the model camera system introduced in section 3.1.10, the contributions to
the camera system PSF derived in this chapter can now be summarised, along with
the corresponding contributions to the camera system MTF in the Fourier domain.
The PSF due to lens aperture diffraction for a circular aperture is defined by equation (3.18),
hdiff(x, y, λ) = (U0²/(λz′)²) (π/4) Dxp² jinc²[Dxp √((x/(λz′))² + (y/(λz′))²)].
Here z′ is the distance from the XP to the SP, and Dxp is the diameter of the XP.
The PSF for a four-spot OLPF is defined by equation (3.30),
hOLPF(x, y) = (1/4) [ δ(x − a0) δ(y − b0) + δ(x − a1) δ(y − b1)
+ δ(x − a2) δ(y − b2) + δ(x − a3) δ(y − b3) ].
For a full-strength filter, the constants take values (a 0, b0 ) = (0, 0), (a1, b1) = (px , 0),
(a2, b2 ) = (0, py ), and (a3, b3) = (px , py ).
The spatial detector-aperture PSF is defined by equation (3.23),
hdet−ap(x, y) = (1/Adet) rect[x/dx] rect[y/dy].
The detection area Adet = dx dy , where dx and dy are the horizontal and vertical
dimensions of the effective aperture area.
The MTF due to lens aperture diffraction for a circular aperture is defined by
equation (3.20),
MTFdiff,circ(μr, λ) = (2/π) { cos⁻¹(μr/μc) − (μr/μc) √(1 − (μr/μc)²) }   for μr/μc ⩽ 1
MTFdiff,circ(μr, λ) = 0   for μr/μc > 1.
Here μr = √(μx² + μy²) is the radial spatial frequency, and μc = 1/(λNw) is the cut-off
frequency due to diffraction.
The MTF due to a four-spot OLPF is defined by equation (3.32),
MTFOLPF(μx, μy) = (1/4) [ exp(−i2π(a0 μx + b0 μy)) + exp(−i2π(a1 μx + b1 μy))
+ exp(−i2π(a2 μx + b2 μy)) + exp(−i2π(a3 μx + b3 μy)) ].
For a full-strength filter, the constants are the same as those for the PSF given above.
Finally, the spatial detector-aperture MTF for a rectangular CCD detector is
defined by equation (3.25),
MTFdet−ap(μx, μy) = |sinc(dx μx)| |sinc(dy μy)|,
where sinc(u) = sin(πu)/(πu). Again, dx and dy are the horizontal and vertical dimensions
of the effective aperture area.
The detector-aperture contribution to the camera system PSF averages the spectral
irradiance distribution over the flux detection area of each photosite, and the comb
function restricts the output of the convolution operation to the appropriate grid of
sampling positions defined by the pixel pitches px and py. In other words, the (x , y )
coordinates correspond with the array of photosite centres.
Spectral exposure is generally defined as spectral irradiance integrated over the
exposure duration,
He,λ = ∫_0^t Ee,λ(t′) dt′.
3.6.2 Photoelements
Each photosite contains at least one photoelement, such as a photogate (MOS
capacitor) or a photodiode, that converts light into stored charge [27–31].
Figure 3.28 illustrates the basic structure of a photogate. A polysilicon electrode
situated above doped p-type silicon is held at a positive bias. This causes mobile
positive holes to flow towards the ground electrode while leaving behind the
immobile negative acceptor impurities, thus creating a depletion region. The spectral
flux Φe,λ incident at the photogate can be considered as a stream of photons. Each
photon has electromagnetic energy hc/λ, where c is the speed of light and h is Planck's
constant. Silicon has a band gap energy of 1.12 eV, which means that the depletion
region can absorb photons with wavelengths shorter than 1100 nm [32]. When an
incident photon is absorbed in the depletion region, an electron–hole pair is created.
The gate voltage prevents the electron–hole pairs from recombining. The holes flow
towards the ground electrode and leave behind the electrons as stored charge. The
well capacity is determined by factors such as gate electrode area, substrate doping,
gate voltage, and oxide layer thickness.
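As a quick numerical aside (not from the text), the absorption cut-off implied by the silicon band gap can be checked directly:

```python
# Photon energy hc/lambda compared with the 1.12 eV band gap of silicon.
h = 6.626e-34      # Planck's constant, J s
c = 2.998e8        # speed of light, m/s
eV = 1.602e-19     # joules per electronvolt
E_gap = 1.12 * eV  # silicon band gap energy

lambda_cutoff = h * c / E_gap
print(f"cut-off wavelength ~ {lambda_cutoff * 1e9:.0f} nm")       # ~1100 nm
print(f"photon energy at 550 nm ~ {h * c / 550e-9 / eV:.2f} eV")  # ~2.25 eV
```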
In a photodiode, the depletion region is created by a reverse bias applied to the
junction between n-type and p-type silicon. Overlaying electrodes are not required,
and the well capacity is limited by the junction width.
Figure 3.28. Basic structure of a photogate, showing the microlens, colour filter, polysilicon gate, SiO2 layer, and the depletion region formed in the p-type Si.
(a) (b)
Figure 3.29. Example CFAs. (a) Bayer CFA. (b) Fuji X-Trans® CFA.
Here H˜ e,λ is the sampled spectral exposure defined by equation (3.34), and the
sampling coordinates (x , y ) have been dropped for clarity.
Only a fraction of the photon count defined by equation (3.35) will be converted
into stored electric charge. When a CFA is fitted above the imaging sensor, the
average number of stored electrons generated per incident photon with wavelength λ
can be expressed in the following way:
ne,i (λ) = n ph,i (λ) QEi (λ). (3.36)
Here i is the mosaic label. The overall external quantum efficiency [32], or simply the
quantum efficiency (QE), is determined by the following factors:
• Fill factor FF: defined by equation (3.22) as the ratio of the detection area to the photosite area,
FF = Adet/Ap.
• Transmission function T (λ ): This function takes into account unwanted
surface absorption and reflectance effects. Reflection at the SiO2–Si interface
can be reduced by using anti-reflective films [32]. In front-illuminated devices
with photogates, the polysilicon electrodes reduce sensitivity in the blue
region below 600nm and become opaque below 400 nm [3]. Backside-
illuminated devices do not suffer from reduced sensitivity in the blue region
provided appropriate anti-reflective coating is applied and the wafer is
thinned to minimise recombination [3].
• CFA transmission function TCFA,i : This is dependent upon the mosaic label i.
The set of CFA transmission functions were introduced in the previous
section.
As discussed in the previous section, the spectral passband of the camera should
ideally correspond to that of the HVS. The above equation can alternatively be
expressed in the following way:
ne,i = (Ap/e) ∫_{λ1}^{λ2} Ri(λ) H̃e,λ,i dλ.   (3.39)
∫ Ri(λ) Φλ,i dλ = Ii .
Here Ii is the total generated photocurrent measured in amperes at a photosite
belonging to mosaic i,
Ii = Qi/t.
The charge signal Qi at a photosite belonging to mosaic i is defined by
Qi = ne,i e . (3.40)
The set of spectral responsivity functions are also known as the photoconversion
response functions or the camera response functions. Figure 3.30 illustrates an
example set [33]. There will be only one response function in the absence of a CFA.
The camera response functions should ideally be linear functions of spectral
exposure and therefore linear functions of radiant exposure.
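A minimal numerical sketch of equation (3.39), using hypothetical responsivity and spectral exposure curves purely for illustration (the shapes and magnitudes below are not taken from the text):

```python
import numpy as np

e = 1.602e-19                     # elementary charge, C
A_p = (4.0e-6) ** 2               # photosite area for a 4 micrometre pitch, m^2
wavelengths = np.linspace(400e-9, 700e-9, 301)   # spectral passband, m
dlam = wavelengths[1] - wavelengths[0]

# Hypothetical green-channel responsivity (A/W) and flat spectral exposure (J m^-2 m^-1).
R = 0.2 * np.exp(-((wavelengths - 530e-9) / 60e-9) ** 2)
H = np.full_like(wavelengths, 5.0e3)

n_e = A_p / e * np.sum(R * H) * dlam   # equation (3.39), rectangle-rule integration
print(f"electron count ~ {n_e:.0f} e-")  # of the order of 10^4 for these values
```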
Figure 3.30. Relative spectral responsivity Ri (λ ) for each colour channel of the Nikon® D700 as a function of
wavelength. The curves have been normalised by the peak response.
Here the Ri (λ ) are the camera response functions, and Le,λ is the spectral radiance of
the illumination source. More generally, Le,λ can take the form of a spectral power
distribution (SPD) to be defined in chapter 4. The magnitude of Le,λ is removed
through normalisation as only the spectral characteristics of Le,λ are relevant [4, 34].
The corresponding OTF is defined as the FT of this polychromatic PSF.
The electron count at a photosite belonging to mosaic i can then be written
ne,i = (t Ap/e) ∫_{λ1}^{λ2} Ri(λ) (Eλ,ideal(x, y) ∗ hpoly(x, y)) comb[x/px, y/py] dλ.
Figure 3.31. Charge readout mechanism as a function of time for a 2 × 2 block of photosites at the corner of a
CCD sensor.
Despite the above differences, the actual charge detection process is similar for
both CCD and CMOS sensors [32]. The charge signal at a photosite is converted by
a charge amplifier into a voltage Vp :
Vp = (Gp/C) Q.   (3.41)
The mosaic label i has been dropped for clarity. Here C is the sense node capacitance,
and Gp is the source follower amplifier gain of order unity [3]. The maximum voltage
Vp,FWC occurs at full-well capacity (FWC):
Vp,FWC = (Gp/C) QFWC.   (3.42)
The value QFWC is the maximum charge that the photosite can hold. The conversion
gain [32] describes the output voltage change per electron in μV/e− units,
GCG = (Gp/C) e.
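As an illustrative sketch (the capacitance, amplifier gain, and full-well capacity below are hypothetical), the conversion gain and the full-well voltage follow directly from these relations:

```python
e = 1.602e-19   # elementary charge, C
C = 2.0e-15     # hypothetical sense-node capacitance, F
G_p = 0.9       # hypothetical source-follower gain (of order unity)

G_CG = G_p * e / C            # conversion gain, volts per electron
Q_FWC = 20000 * e             # hypothetical full-well capacity of 20000 electrons
V_FWC = G_p * Q_FWC / C       # maximum photosite voltage, equation (3.42)

print(f"conversion gain ~ {G_CG * 1e6:.1f} uV/e-")
print(f"voltage at FWC  ~ {V_FWC * 1e3:.0f} mV")
```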
For a CCD sensor, the PGA will be off-chip. The programmable gain G is controlled
by the camera ISO setting S.
As described in chapter 2, the numerical value of S is determined from the JPEG
output and not the raw data. However, the actual numerical value of S is not
important in the present context. The important aspect is that doubling S doubles G.
In the simplest case, S controls a single analog PGA. Some cameras use a second-
stage PGA for intermediate ISO settings.
The maximum output voltage Vmax occurs at the maximum raw level. The ISO
setting that uses the least analog amplification to achieve the maximum raw value is
defined as the base ISO setting, Sbase. This is typically designed to occur when FWC
is utilised. Assuming that FWC is utilised, the corresponding gain G = Gbase satisfies
Gbase Vp,FWC = Vmax.
The relative ISO gain can then be defined as GISO = G/Gbase.
Since V cannot be larger than Vmax , the following relation always holds:
Vp G ISO ⩽ Vp,FWC .
When G ISO is raised above the base value (G ISO = 1), the detected voltage
Vp < Vp,FWC and so FWC is not utilised. For example, Vp ⩽ 0.5 Vp,FWC when
G ISO = 2, which might correspond to a doubling of S from say ISO 100
(G ISO = 1) to ISO 200 (G ISO = 2). There are disadvantages of not utilising FWC:
• Raw dynamic range (raw DR) is reduced, meaning that less scene contrast can
be reproduced in the raw data.
• Less radiant exposure is utilised at the maximum V, and this lowers signal-to-
noise ratio (SNR).
Signal noise will be discussed in section 3.8. SNR and raw DR are discussed in
chapter 5.
nDN = INT[k (V/Vmax)(2^M − 1)].   (3.48)
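A minimal sketch of equation (3.48), assuming that INT[] truncates towards zero and using hypothetical values for k, Vmax, and the bit depth M:

```python
M = 12         # ADC bit depth
k = 1.0        # design constant (hypothetical)
V_max = 1.0    # maximum output voltage in volts (hypothetical)

def raw_level(V):
    # Equation (3.48): quantise the amplified voltage into a raw level.
    return int(k * (V / V_max) * (2 ** M - 1))

for V in (0.0, 0.25, 0.5, 1.0):
    print(f"V = {V:4.2f} V -> n_DN = {raw_level(V)}")   # 0, 1023, 2047, 4095
```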
For generality, the mosaic index i should be included since g may not be
identical for different mosaics.
• If the ISO setting S and associated ISO gain G ISO are doubled, a given nDN
will result from half the radiant exposure He and half the electron count ne.
However, g will also be halved and so the raw value nDN will remain
unchanged. Consistent with the discussion in section 3.7.1, this can be useful
when needing to freeze the appearance of moving action or when an exposure
duration short enough to counteract camera-shake in low-light conditions is
needed.
• When S is adjusted so that G ISO = U, the conversion factor becomes g = 1. In
this case, there is a one-to-one correspondence between electron counts and
raw levels. Because U depends on factors such as the bit depth of the ADC,
FWC, and the constant k, the unity gain will differ between camera models.
One reason for adding the bias offset is that its presence can aid the analysis of signal
noise. However, the raw value nDN,bias must be subtracted from the raw data when
the raw data file undergoes conversion into a viewable output digital image. Some
camera manufacturers automatically subtract nDN,bias before the raw data file is
written.
Recall that equation (3.49) defined the maximum possible raw level as
n DN,max = INT[k (2M − 1)] ⩽ 2M − 1.
However, the following condition must also hold in the presence of a bias offset:
n DN,max + n DN,bias ⩽ 2M − 1.
This means that when Vbias is included and nDN,bias subtracted, the effective maximum
raw level or raw clipping point nDN,clip is lower than nDN,max ,
n DN,clip = n DN,max − n DN,bias .
Since nDN,bias is an offset rather than a factor, subtracting nDN,bias does not
necessarily reduce the maximum achievable raw DR.
3.8 Noise
Signal noise can be defined as unwanted variations in the voltage signal. Unless
eliminated, the noise will appear in the raw data. Signal noise can be broadly
classified into two main types. Temporal noise arises from the fact that the voltage
signal always fluctuates over short time scales. A major contribution to the temporal
noise arises from the quantum nature of light itself. Contributions to the total
temporal noise also arise from a variety of camera components. Fixed pattern noise
(FPN) is a type of noise that does not vary over time.
In digital cameras, the main contributions to the total temporal noise are photon
shot noise, read noise, and dark-current shot noise [32, 35].
Photon arrival at a photosite obeys Poisson statistics, and so the variance of the photon
shot noise expressed in electron units is equal to the mean electron count,
σ²e,ph = ne.   (3.52)
Although photon shot noise is a dominant source of noise at high exposure levels, it
increases as the square root of the electron count and so it becomes relatively less
significant as the exposure level increases. In other words, the SNR increases with
exposure level.
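A short numerical illustration of equation (3.52): the shot noise grows only as the square root of the electron count, so the SNR improves with exposure level.

```python
import math

for n_e in (100, 1000, 10000):
    sigma = math.sqrt(n_e)   # photon shot noise, equation (3.52)
    print(f"n_e = {n_e:6d} e-: shot noise = {sigma:6.1f} e-, SNR = {n_e / sigma:6.1f}")
```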
below) are generated inside a photoelement and are not included in the definition of
the read noise [32]. Forms of read noise include the following:
• Thermal (Johnson-Nyquist) noise: This occurs due to thermal agitation of
electrons.
• Flicker (1/f) noise: This is a circuit resistance fluctuation that appears at low
frequencies.
• Reset or kTC noise: This is a type of thermal noise due to reset of the photon-
well capacitance. In CMOS sensors it can be minimised using the correlated
double-sampling (CDS) technique [32].
Since read noise arises from voltage fluctuations in the readout circuitry rather than
fluctuations in the charge signal, it is measured as a standard deviation using DNs in
the raw data, σDN,read . However, read noise can be converted into electron counts by
using the conversion factor g defined in section 3.7.3. In this case the read noise is
denoted as σe,read . The conversion between DNs and electron counts is discussed
further in section 3.9.
A certain amount of read noise will already be present in the voltage signal before
it is amplified by the PGA. This part of the read noise is defined as the upstream read
noise and it will be amplified when the ISO gain G ISO is increased. On the other hand,
downstream read noise arises due to circuitry downstream from the PGA and is
therefore independent of G ISO.
Independent contributions to the total FPN are noise signals that are added directly.
This differs from independent contributions to the total temporal noise, which are
added in quadrature. Since FPN does not change from frame to frame, it can in
principle be removed. This is described in the next section.
There is another type of pattern noise that should be mentioned. Although read
noise should in principle be purely temporal and form a Gaussian distribution about
the expected signal, the read noise in some cameras is not purely Gaussian but has
some periodic pattern component arising from circuit interference [35]. Although the
pattern component is not fixed from frame to frame and therefore cannot be strictly
categorized as DSNU, any overall pattern can be detected by averaging over many
bias frames, a bias frame being a frame taken with zero integration time at a
specified ISO setting.
The intercept of the fitted straight line yields the square of the read noise, σ²DN,read,
measured in DN. The square of the photon shot noise, σ²DN,ph, is given by subtracting
σ²DN,read from the graph value.
Figure 3.32 shows a temporal noise measurement for the Olympus® E-M1 at ISO
1600. The fitted straight line is shown near the origin.
The value of the conversion factor g emerges from the above measurement. The first
step is to utilise equation (3.52), which states that the square of the photon shot noise is
equal to the mean signal itself when both are expressed using input-referred units,
σ²e,ph = ne.
Figure 3.32. Temporal noise measurement for the Olympus® E-M1 at ISO 1600.
This means that the value of g at the selected ISO setting is obtained as the inverse of
the gradient of the fitted straight line [32, 35, 36].
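The following Python sketch simulates such a photon-transfer measurement with hypothetical values of the conversion factor and read noise, and recovers g from the gradient of a straight-line fit of temporal variance against mean signal (both expressed in DN):

```python
import numpy as np

rng = np.random.default_rng(0)
g_true = 2.0     # hypothetical conversion factor, electrons/DN
read_dn = 3.0    # hypothetical read noise, DN

means, variances = [], []
for n_e in (200, 500, 1000, 2000, 5000, 10000):
    electrons = rng.poisson(n_e, size=100000)                        # photon shot noise
    dn = electrons / g_true + rng.normal(0.0, read_dn, size=100000)  # read noise in DN
    means.append(dn.mean())
    variances.append(dn.var())

slope, intercept = np.polyfit(means, variances, 1)
print("estimated g        :", 1.0 / slope)   # ~2.0 electrons/DN
print("read noise squared :", intercept)     # ~9 DN^2
```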
Figure 3.33. Gaussian read noise distribution measured in DN for the Olympus® E-M1 at a selection of high
ISO settings. The distribution is centred at the DN = 256 bias offset present in the raw data.
The ISO setting S is proportional to the PGA gain defined by equation (3.45) of
section 3.7. For a single PGA stage, the total read noise expressed in DN can be modelled as
σ²DN,read = (Sσ0)² + σ1².
The read noise term σ0 describes the contribution to the total read noise
arising from electronics upstream from the PGA, and the read noise term σ1
describes the contribution from electronics downstream from the PGA [35]. Since
these terms describe temporal noise, they must be added in quadrature. This model
can be fitted to the data of figure 3.34.
As a second example, figure 3.35 shows measured data for the Nikon® D700. The
behaviour of the conversion factor suggests that all S below ISO 200 are extended
low-ISO settings. Furthermore, the read noise data appears to indicate the use of a
second-stage PGA at certain S. A more sophisticated noise model is required for a
two-stage PGA [35],
σ²DN,read = M² {(Sσ0)² + σ1²} + σ2².
Figure 3.34. Read noise and conversion factor measurement for the Olympus® E-M1 plotted as a function of
ISO setting using base 2 logarithmic axes. (Upper) Output-referred read noise expressed using digital numbers
(DN). Data measured by the author. (Centre) Conversion factor measured in electrons/DN. Data courtesy of
W. J. Claff. (Lower) Calculated input-referred read noise measured in electrons.
Here σ0 is the read noise upstream from the PGA, S is the main (first-stage) ISO
setting, and σ1 is the noise contribution from the first-stage amplifier. For inter-
mediate S, the multiplier M for the second-stage amplifier takes a value of 1.25 or
1.6, and all read noise present is amplified. The final term σ2 is the read noise
Figure 3.35. Read noise and conversion factor measurement for the Nikon® D700 plotted as a function of ISO
setting using base 2 logarithmic axes. Data courtesy of W. J. Claff. (Upper) Output-referred read noise
expressed using digital numbers (DN). (Centre) Conversion factor expressed using electrons/DN. (Lower)
Input-referred read noise expressed using electrons.
contribution from the second-stage amplifier along with the read noise downstream
from the PGA.
Since FPN can in principle be removed, only a temporal noise model needs to be
included to complete the raw data model derived in this chapter. Read noise can be
included using models such as those given above, and photon shot noise expressed as
a standard deviation with zero mean varies as the square root of the signal expressed
using input-referred units (electrons) according to equation (3.52). Conversion
between input-referred and output-referred units (DN) is achieved using the
conversion factor defined by equation (3.51). The temporal noise model expressed
using input-referred units and output-referred units can be added to equations (3.39)
and (3.50), respectively.
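As a closing illustration (all parameter values hypothetical), the temporal noise model can be applied to a simulated raw signal as follows: the mean electron count is perturbed by Poisson photon shot noise, input-referred read noise is added, and the result is converted to DN with a bias offset and clipping.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_raw(n_e_mean, g=2.0, read_e=6.0, bias_dn=256, clip_dn=4095, size=10000):
    electrons = rng.poisson(n_e_mean, size=size).astype(float)  # photon shot noise
    electrons += rng.normal(0.0, read_e, size=size)             # read noise (input-referred)
    dn = np.round(electrons / g) + bias_dn                      # conversion factor and bias offset
    return np.clip(dn, 0, clip_dn)

raw = simulate_raw(4000)
print("mean raw level:", raw.mean(), " standard deviation:", raw.std())
```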
References
[1] Gaskill J D 1978 Linear Systems, Fourier Transforms, and Optics (New York: Wiley-
Interscience)
[2] Holst G C 1998 CCD Arrays, Cameras, and Displays 2nd edn (Winter Park, FL: JCD
Publishing, and Bellingham, WA: SPIE)
[3] Holst G C and Lomheim T S 2011 CMOS/CCD Sensors and Camera Systems 2nd edn
(Winter Park, FL: JCD Publishing, and Bellingham, WA: SPIE)
[4] Fiete R D 2010 Modeling the Imaging Chain of Digital Cameras, SPIE Tutorial Text vol
TT92 (Tutorial Texts in Optical Engineering) (Bellingham, WA: SPIE Press)
[5] Farrell J E, Xiao F, Catrysse P B and Wandell B A 2003 A simulation tool for evaluating
digital camera image quality Proc. SPIE 5294 124
[6] Maeda P, Catrysse P and Wandell B 2005 Integrating lens design with digital camera
simulation Proc. SPIE 5678 48
[7] Farrell J E, Catrysse P B and Wandell B A 2012 Digital camera simulation Appl. Opt. 51
A80
[8] Gonzalez R C and Woods R E 2008 Digital Image Processing 3rd edn (Englewood Cliffs,
NJ: Prentice Hall)
[9] Palmer J M and Grant B G 2009 The Art of Radiometry SPIE Press Monograph vol. 184
(Bellingham, WA: SPIE Press)
[10] Camera & Imaging Products Association 2004 Sensitivity of Digital Cameras CIPA DC-004
[11] International Organization for Standardization 2006 Photography—Digital Still Cameras—
Determination of Exposure Index, ISO Speed Ratings, Standard Output Sensitivity, and
Recommended Exposure Index, ISO 12232:2006
[12] Nasse H H 2008 How to Read MTF Curves (Carl Zeiss Camera Lens Division)
[13] Goodman J 2004 Introduction to Fourier Optics 3rd edn (Englewood, CO: Roberts and
Company)
[14] Shannon R R 1997 The Art and Science of Optical Design (Cambridge: Cambridge
University Press)
[15] Boreman G D 2001 Modulation Transfer Function in Optical and Electro-Optical Systems,
SPIE Tutorial Texts in Optical Engineering Vol. TT52 (Bellingham, WA: SPIE Publications)
[16] Saleh B E A and Teich M C 2007 Fundamentals of Photonics 2nd edn (New York: Wiley-
Interscience)
[17] Shannon R R 1994 Optical specifications Handbook of Optics (New York: McGraw-Hill)
ch 35
[18] Born M and Wolf E 1999 Principles of Optics: Electromagnetic Theory of Propagation,
Interference and Diffraction of Light 7th edn (Cambridge: Cambridge University Press)
[19] Smith W J 2007 Modern Optical Engineering 4th edn (New York: McGraw-Hill)
[20] Yadid-Pecht O 2000 Geometrical modulation transfer function for different pixel active area
shapes Opt. Eng. 39 859
[21] Fliegel K 2004 Modeling and measurement of image sensor characteristics Radioengineering
13 27
[22] Wolberg G 1990 Digital Image Warping 1st edn (Piscataway, NJ: IEEE Computer Society
Press)
[23] Gonzalez R C and Woods R E 2007 Digital Image Processing 3rd edn (Englewood Cliffs,
NJ: Prentice-Hall)
[24] Shannon C E 1949 Communication in the presence of noise Proc. Inst. Radio Eng. 37 10
[25] Greivenkamp J E 1990 Color dependent optical prefilter for the suppression of aliasing
artifacts Appl. Optics 29 676
[26] Palum R 2009 Optical antialiasing filters Single-Sensor Imaging: Methods and Applications
for Digital Cameras ed R Lukac (Boca Raton, FL: CRC Press) ch 4
[27] Theuwissen A J P 1995 Solid-State Imaging with Charge-Coupled Devices (Dordrecht:
Kluwer)
[28] Sze S M and Ng K K 2006 Physics of Semiconductor Devices 3rd edn (New York: Wiley-
Interscience)
[29] Sze S M and Lee M-K 2012 Semiconductor Devices: Physics and Technology 3rd edn (New
York: Wiley)
[30] Yamada T 2006 CCD image sensors Image Sensors and Signal Processing for Digital Still
Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 4
[31] Takayanagi I 2006 CMOS image sensors Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 5
[32] Nakamura J 2006 Basics of image sensors Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 3
[33] Jiang J, Liu D, Gu J and Susstrunk S 2013 What is the space of spectral sensitivity functions
for digital color cameras? IEEE Workshop on the Applications of Computer Vision (WACV)
(Clearwater Beach, FL) (Piscataway, NJ: IEEE Computer Society) 168–79
[34] Subbarao M 1990 Optical transfer function of a diffraction-limited system for polychromatic
illumination Appl. Optics 29 554
[35] Martinec E 2008 Noise, Dynamic Range and Bit Depth in Digital SLRs unpublished
[36] Mizoguchi T 2006 Evaluation of image sensors Image Sensors and Signal Processing for
Digital Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 6
[37] Sato K 2006 Image-processing algorithms Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 8
Chapter 4
Raw conversion
The objective of raw conversion is to convert a recorded raw file into a viewable
output image file. Although this objective was briefly summarised in section 2.1 of
chapter 2, numerous procedures are required in practice. In traditional digital
cameras, fundamental steps in the raw conversion process include the following:
1. Linearisation: The raw data may be stored in a compressed form that needs
to be decompressed for processing, for example, by using a look-up
table (LUT).
2. Dark signal subtraction: As mentioned in section 3.8 of chapter 3, cameras
can measure and subtract the dark signal before the raw data file is written by
utilising the average raw value of the optical black photosites that are
positioned at the edges of the imaging sensor and shielded from light. More
sophisticated methods will compensate for shading gradients with respect to
row or column as a function of the operating temperature.
3. White balance: In traditional digital cameras, this is achieved by direct
application of multipliers to the raw channels in order that equal raw values
correspond to a neutral subject.
4. Colour demosaic: Raw data obtained from a sensor with a colour filter array
(CFA) contains incomplete colour information. The missing data needs to be
determined through interpolation.
5. Colour-space transformation: The demosaiced raw data resides in the internal
camera raw space and needs to be transformed into a standard device-
independent output-referred colour space such as sRGB for viewing on a
display. In traditional digital cameras, this is achieved through application of
a colour rotation matrix that can be derived after characterising the camera
by mapping the internal camera raw space to a reference colour space such as
CIE XYZ.
6. Image processing: Various proprietary image-processing algorithms will be
applied by the in-camera image processing engine.
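The sequence above can be summarised by the following minimal Python sketch. The look-up table, black level, white-balance gain map, demosaicing routine and colour matrix are all hypothetical inputs standing in for camera-specific data, and the proprietary image-processing stage of step 6 is omitted.

    import numpy as np

    def convert_raw(raw_dn, lut, black_level, wb_gain_map, demosaic, colour_matrix):
        data = lut[raw_dn]                                   # 1. linearisation via a LUT
        data = np.clip(data - black_level, 0.0, None)        # 2. dark signal subtraction
        data = data * wb_gain_map                            # 3. white balance (per-photosite gains)
        rgb = demosaic(data)                                 # 4. colour demosaic -> H x W x 3 array
        rgb = np.einsum('ij,hwj->hwi', colour_matrix, rgb)   # 5. camera raw space -> output colour space
        return np.clip(rgb, 0.0, 1.0)                        # 6. further proprietary processing omitted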
colour space, an ideal camera raw space would contain all possible colours. In other
words, it would be a reference colour space associated with a non-standard colour
model. In practice, camera raw spaces are very large but are dependent upon camera
model and do not contain all possible colours. In fact, camera raw spaces cannot be
regarded as true colour spaces at all unless a condition known as the Luther–Ives
condition is satisfied. This condition is discussed in section 4.4.
In general, camera raw spaces can be regarded as large approximate colour
spaces. Before any processing of the raw data can occur, a camera must be
characterized by establishing a linear mapping between the camera raw space and
a reference colour space such as CIE XYZ. Unless the Luther–Ives condition is
satisfied exactly, this linear mapping can only be defined in an approximate way.
After a camera has been characterized, colours specified in the camera raw space can
be converted into colours specified in an appropriate output-referred colour space.
This colour space conversion will be performed as part of the conversion of the raw
data into a viewable output digital image.
The reference colour space specified by the International Color Consortium (ICC)
is known as the profile connection space (PCS) [1], and this can be based on either
CIE LAB or CIE XYZ.
Light incident upon the retina of the eye is generally composed of a mixture of
electromagnetic waves of various wavelengths. This polychromatic mixture can be
characterised by its spectral power distribution (SPD), P(λ ). This can be any function
of power contribution at each wavelength such as spectral radiance, Le,λ , which was
introduced in chapter 3.
For a given set of viewing conditions, there may be many different SPDs that
appear to have the same colour to a human observer. For example, different SPDs
may be found that appear to have the same colour as a pure monochromatic
spectrum colour. This concept is known as the principle of metamerism, and a group
of SPDs that yield the same visual colour response under identical viewing
conditions are known as metamers. The principle of metamerism is fundamental
to the theory introduced in the following sections.
Figure 4.2. Normalised eye response curves based on the Stiles and Burch 10° colour-matching functions [2]
adjusted to 2°. Arbitrary colours have been used for the curves.
Under the same viewing conditions, the same L, M , S triple will result from
metameric SPDs. The set of all distinct L, M , S triples defines the LMS colour
space.
Alternative reference colour spaces such as CIE XYZ have mathematical
properties that are more useful for digital imaging.
Here [E(λ )] is one unit of the monochromatic target colour, and [R], [G], [B] each
denote one unit of the primaries. The nature of these units is discussed in section
4.1.4 below. The colour-matching functions are shown in figure 4.3. They define the
CIE 2° standard colourimetric observer representative of normal human colour
vision and serve as mathematical functions that can be used to obtain metameric
colour matches. The associated reference colour space is known as CIE RGB, which
is discussed in section 4.1.6. Colour-matching functions obtained using a 10° viewing angle were later defined in 1964, and these define the CIE 10° standard colourimetric observer.
Figure 4.3. 1931 CIE colour-matching functions, r̄(λ ), ḡ(λ ), b̄(λ ), which define the 2° standard observer. The units are defined according to equation (4.2) so that the area under each curve is the same.
Notice that the colour-matching functions have negative values at certain wave-
lengths. In these cases it was found that a primary needed to be added to the target
colour to obtain a colour match rather than be mixed with the other primaries. This
is a consequence of the fact that real primaries are being used, and so the linear
combination must permit negative values. In fact, a set of three colour matching
functions that do not have any negative values can only be associated with
imaginary primaries such as the eye cone primaries. Ultimately, this is a conse-
quence of the fact that the eye cone primaries cannot be uniquely stimulated due to
the overlap between the eye cone response functions.
It should also be noted that any set of three linearly independent monochromatic
sources can be chosen as primaries. However, sources with wavelengths in each of
the red, green and blue regions of the visible spectrum are most useful in practice
since a large range of colours can be matched by such a choice when only positive
linear combinations of the primaries are allowed, as in the case of a display device.
Significantly, it follows from Grassmann’s laws that all valid sets of colour matching
functions are related to each other via a linear transformation.
4.1.4 Units
It is important to understand the units that were defined by the CIE when the
experimental data of Wright and Guild was analysed.
According to equation (4.1), one unit of the monochromatic target colour
denoted by [E(λ )] was defined such that
[E(λ)] ≡ r¯(λ)[R ] + g¯(λ)[G ] + b¯(λ)[B ],
where [R], [G], [B] each denote one unit of the associated primaries λR, λ G , λB.
Rather than use absolute power units, the CIE normalised the units so that the
following result is obtained when both sides of the above expression are integrated
over the visible spectrum:
[E] ≡ [R ] + [G ] + [B ] . (4.2)
This means that the units for the primaries are defined such that one unit of each will
match the colour of a hypothetical polychromatic equi-energy source that has
constant power at all wavelengths. This SPD is known as illuminant E.
Consequently, the area under each of the curves appearing in figure 4.3 is the
same. Since the [R], [G], [B] units each represent their own individual quantities of
power, they must be treated as dimensionless.
Since the primary units are dimensionless, it is important to keep track of the
luminance and radiance ratios implied by the units. The luminance ratio between
[E], [R], [G] and [B], respectively, is given by
L_{[E]} : L_{[R]} : L_{[G]} : L_{[B]} = 1 : 0.17697 : 0.81240 : 0.01063.   (4.3)
This implies that 0.17697 cd m−2 of the red primary, 0.81240 cd m−2 of the green
primary, and 0.01063 cd m−2 of the blue primary would be needed to match 1 cd/m2
of illuminant E.
The standard luminosity function V (λ ) that was introduced in section 3.1.1 of
chapter 3 can be used to obtain the corresponding radiance ratio [3]. This function is
discussed further in the next section. By utilising the value of V (λ ) at the primary
wavelengths, the radiance ratio between the units is found to be
c = 5.6508 . (4.5)
Substituting c into equation (4.4) reveals the relationship between the standard
luminosity function and the colour-matching functions,
V(\lambda) = c\,[0.17697\,\bar{r}(\lambda) + 0.81240\,\bar{g}(\lambda) + 0.01063\,\bar{b}(\lambda)].   (4.6)
Figure 4.4. The colour-matching functions r̄(λ ), ḡ(λ ) and b̄(λ ) each multiplied according to the luminance ratio
between their respective dimensionless units illustrates the relationship with the 1924 CIE standard luminosity
function V (λ ).
This relationship is illustrated in figure 4.4. The area under each of the colour-
matching functions shown in figure 4.3 is found to be 1/c of the area under V (λ ).
The colour of P itself can be matched by integrating over the visible spectrum,
P \equiv \int_{380}^{780} P(\lambda)\,[E(\lambda)]\,\mathrm{d}\lambda .
The same R, G , B triple will result from metameric SPDs when viewed under
identical conditions.
SPDs are commonly expressed using spectral radiance units, W sr−1 m−2 nm−1.
The numerical values of r̄(λ ), ḡ(λ ), b̄(λ ) are defined according to equation (4.6), and
so the CIE RGB tristimulus values can be negative. Tristimulus values can be
absolute or relative. Normalisation of the tristimulus values will be discussed in
sections 4.1.10 and 4.1.11.
The CIE RGB colour space contains all colours visible to the 2° standard
colourimetric observer. However, the CIE RGB colour space is rarely used as a
reference space in practice. Nevertheless, it provides the foundation for the widely
used CIE XYZ reference colour space to be introduced in section 4.1.8.
The following section discusses the rg chromaticity diagram, which provides a
straightforward way to visualise the CIE RGB colour space.
Figure 4.5. 1931 CIE rg chromaticity diagram defined by the grey horseshoe-shaped area. The red, green and
blue circles mark the primaries of the 1931 CIE RGB colour space, and the point corresponding to illuminant
E is shown in white. The black circles on the boundary indicate a selection of pure spectrum colours and their
associated wavelengths.
chromaticity coordinates of the primaries that define the CIE RGB colour space are
(0,0), (0,1) and (1,0) by definition, and illuminant E has coordinates (1/3,1/3) due to the
normalisation of the units. All visible chromaticities form a horseshoe shape rather
than a triangle due to the overlap of the eye cone response functions shown in
figure 4.2. Pure spectral colours or hues are located on the curved part of the horseshoe,
and saturation decreases with inward distance from the horseshoe boundary.
The horseshoe area has been shown in grey rather than colour because a standard
monitor cannot display all of these colours correctly. For reference, figure 4.14 in
section 4.5 displays the chromaticities of the smaller sRGB colour space as these can
be shown correctly on a standard monitor.
a new set of colour-matching functions x̄(λ ), ȳ(λ ), z̄(λ ) and associated primaries
λX , λY , λZ . These new colour matching functions are obtained as a linear trans-
formation from r̄(λ ), ḡ(λ ), b̄(λ ) as follows:
\begin{bmatrix} \bar{x}(\lambda) \\ \bar{y}(\lambda) \\ \bar{z}(\lambda) \end{bmatrix} = \underline{T} \begin{bmatrix} \bar{r}(\lambda) \\ \bar{g}(\lambda) \\ \bar{b}(\lambda) \end{bmatrix},
where \underline{T} is a 3 × 3 transformation matrix,
\underline{T} = c \begin{bmatrix} 0.49 & 0.31 & 0.20 \\ 0.17697 & 0.81240 & 0.01063 \\ 0.00 & 0.01 & 0.99 \end{bmatrix},
and c = 5.6508 is the normalisation constant defined by equation (4.5). The above
transformation matrix T was defined so that the CIE XYZ colour space would have
the following properties:
• One of the tristimulus values, Y, would be proportional to luminance.
• Negative tristimulus values would not occur.
The first property can be examined by writing the above matrix equation more
explicitly,
x¯(λ) = c(0.49 r¯(λ) + 0.31 g¯(λ) + 0.20 b¯(λ))
y¯ (λ) = c(0.17697 r¯(λ) + 0.81240 g¯(λ) + 0.01063 b¯(λ))
z¯(λ) = c(0.01 g¯(λ) + 0.99 b¯(λ)).
Figure 4.6. 1931 CIE colour-matching functions x̄(λ ), ȳ(λ ), z̄(λ ). Arbitrary colours have been used for the
curves.
Significantly, it can be seen that the ratio between the coefficients appearing on the
second line is precisely the luminance ratio between the [R ], [G ] and [B ] units defined
by equation (4.3). Furthermore, the CIE defined x̄(λ ) and z̄(λ ) such that they yield
zero luminance. Consequently, all luminance information about the colour is
described by the ȳ(λ ) colour-matching function only. Since the constant
c = 5.6508 is the same as that defined by equation (4.5), ȳ(λ ) is in fact equal to
the standard luminosity function V(λ) defined by equation (4.6),
\bar{y}(\lambda) = V(\lambda).
Figure 4.7. 1931 CIE rg chromaticity diagram defined by the grey shaded horseshoe-shaped area. The red,
green and blue points mark the primaries of the 1931 CIE RGB colour space. The black points mark the
primaries of the CIE XYZ colour space.
and blue circles, all visible chromaticities can be obtained as positive linear
combinations of the λX , λY , λZ primaries.
In analogy with section 4.1.6, the CIE XYZ tristimulus values representing the
colour of an SPD denoted by P are defined as follows:
X = k \int_{380}^{780} P(\lambda)\,\bar{x}(\lambda)\,\mathrm{d}\lambda, \quad Y = k \int_{380}^{780} P(\lambda)\,\bar{y}(\lambda)\,\mathrm{d}\lambda, \quad Z = k \int_{380}^{780} P(\lambda)\,\bar{z}(\lambda)\,\mathrm{d}\lambda .   (4.8)
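For a sampled SPD, equation (4.8) can be evaluated numerically. The sketch below assumes the colour-matching functions have already been sampled at the same wavelengths as the SPD; the tabulated CIE data itself is not reproduced here.

    import numpy as np

    def spd_to_xyz(wavelengths_nm, spd, xbar, ybar, zbar, k=1.0):
        # Trapezoidal approximation to the integrals of equation (4.8).
        X = k * np.trapz(spd * xbar, wavelengths_nm)
        Y = k * np.trapz(spd * ybar, wavelengths_nm)
        Z = k * np.trapz(spd * zbar, wavelengths_nm)
        return X, Y, Z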
Figure 4.8. 1931 CIE xy chromaticity diagram defined by the grey shaded horseshoe-shaped area. The black
circles on the boundary indicate a selection of pure spectrum colours and their associated wavelengths. The
point corresponding to illuminant E is shown in white. The red, green and blue circles mark the primaries of
the 1931 CIE RGB colour space.
Now the tristimulus values appearing in equation (4.8) are absolute, and Y specifies
absolute luminance Yabs or L v measured in cd/m2,
Y_{\mathrm{abs}} = K_m \int_{380}^{780} P(\lambda)\,\bar{y}(\lambda)\,\mathrm{d}\lambda .
4.2 Illumination
In the previous section, it was shown that the colour of an SPD can be specified by a set
of three tristimulus values belonging to a reference colour space, or alternatively by
luminance together with chromaticity coordinates belonging to a reference colour space.
This section describes the properties of SPDs that arise from everyday sources of
illumination such as heated objects, and in particular the concept of correlated
colour temperature.
L_{e,\lambda}(T) = \frac{2hc^2}{\lambda^5}\,\frac{1}{\exp\!\left(\frac{hc}{\lambda k_B T}\right) - 1}.
Here h is Planck’s constant, kB is the Boltzmann constant, and c is the speed of light
in the medium. The temperature T measured in Kelvin (K) is known as the colour
temperature of the black body as its colour ranges from red at low temperatures
through to blue at high temperatures.
The Planckian locus is the locus formed on a chromaticity diagram by a black-
body radiator as a function of temperature. For the xy chromaticity diagram, the
chromaticity coordinates of the Planckian locus can be obtained by using equation
(4.8) to calculate the tristimulus values for a given SPD specified as Le,λ(T ) for a
given colour temperature, and then substituting these into equation (4.9) to calculate
the chromaticity coordinates as a function of colour temperature (x(T ), y(T )). In
practice, it is convenient to use an approximate formula such as a cubic spline [7].
The Planckian locus is shown on the xy chromaticity diagram in figure 4.9.
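A direct numerical route to the locus, rather than the cubic spline approximation, is sketched below. The colour-matching functions are assumed to be available as sampled arrays, and only relative tristimulus values are needed because the constant k cancels in the chromaticity coordinates.

    import numpy as np

    h, c, k_B = 6.626e-34, 2.998e8, 1.381e-23   # SI values of Planck's constant, c and k_B

    def planck_radiance(wavelength_m, T):
        # Black-body spectral radiance L_e,lambda(T)
        return (2.0 * h * c**2 / wavelength_m**5) / np.expm1(h * c / (wavelength_m * k_B * T))

    def planckian_xy(T, wavelengths_nm, xbar, ybar, zbar):
        spd = planck_radiance(wavelengths_nm * 1e-9, T)
        X = np.trapz(spd * xbar, wavelengths_nm)
        Y = np.trapz(spd * ybar, wavelengths_nm)
        Z = np.trapz(spd * zbar, wavelengths_nm)
        return X / (X + Y + Z), Y / (X + Y + Z)   # (x(T), y(T))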
Figure 4.9. Planckian locus (black curve) on the 1931 CIE xy chromaticity diagram calculated using the cubic
spline approximation. A selection of colour temperatures are indicated. Only chromaticities contained within
the output-referred sRGB colour space have been shown in colour.
isotherm in uv space between (u, v ) and the Planckian locus. According to the CIE,
the concept of CCT is only valid for distances up to a value ±0.05 from the
Planckian locus in uv space. The colour tint will be magenta or red below the
Planckian locus and green or amber above the Planckian locus.
\begin{bmatrix} X(\mathrm{WP}) \\ Y(\mathrm{WP}) \\ Z(\mathrm{WP}) \end{bmatrix},
where Y (WP) = 1. The values for X (WP) and Z(WP) can be calculated using
equation (4.10).
The terms ‘white point’ and ‘reference white’ are often used interchangeably. In this
book, ‘white point’ will be used only to describe the above property of the illumination.
The term ‘reference white’ introduced in section 4.1.12 will be used to describe the
white reference of a colour space defined by the unit vector in that colour space.
Figure 4.10. SPDs for some example CIE standard illuminants. All curves are normalised to a value of 100 at
560 nm.
Table 4.1. XYZ colour space data for a selection of CIE standard illuminants. The D series all represent
natural daylight.
define some common standard illuminants. Table 4.1 gives the white point for a
selection of standard illuminants in terms of chromaticity coordinates and relative
tristimulus values, together with the corresponding CCTs.
If the illumination under consideration is a standard illuminant, this can be
indicated at the lower-right-hand corner of the vector. For example, the white point
of D65 illumination in the XYZ colour space can be indicated in the following way:
\begin{bmatrix} X(\mathrm{WP}) \\ Y(\mathrm{WP}) \\ Z(\mathrm{WP}) \end{bmatrix}_{\mathrm{D65}} = \begin{bmatrix} 0.9504 \\ 1.0000 \\ 1.0888 \end{bmatrix}.
By combining equations (3.39) and (3.50) from chapter 3, the raw channels can be
modelled as follows:
R = k \int_{\lambda_1}^{\lambda_2} R_1(\lambda)\,\tilde{E}_{e,\lambda}(x,y)\,\mathrm{d}\lambda
G = k \int_{\lambda_1}^{\lambda_2} R_2(\lambda)\,\tilde{E}_{e,\lambda}(x,y)\,\mathrm{d}\lambda
B = k \int_{\lambda_1}^{\lambda_2} R_3(\lambda)\,\tilde{E}_{e,\lambda}(x,y)\,\mathrm{d}\lambda .
The SPD is specified in terms of spectral irradiance at the sensor plane (SP) as
described below, and the integration is over the spectral passband of the camera. The
normalization constant k is defined by
k = \frac{A_p\, t}{g_i\, e}.
Here Ap is the photosite area, t is the exposure duration, e is the elementary charge,
and gi is the conversion factor between electron counts and DN for mosaic i, as
described in section 3.7.3 of chapter 3.
The actual R , G1, G2 , B values obtained in practice are quantized values modelled
by taking the integer part of the above equations. As described in section 3.7.2 of
chapter 3, the maximum raw value is defined by the raw clipping point and is limited
by the bit depth of the analog-to-digital converter (ADC).
The SPD used to obtain the tristimulus values of the camera raw space has been
specified in terms of E˜e,λ(x , y ). This is the convolved and sampled spectral irradiance
distribution at a given photosite defined by equation (3.33) of chapter 3,
\tilde{E}_{e,\lambda}(x,y) = \left(E_{e,\lambda,\mathrm{ideal}}(x,y) * h_{\mathrm{system}}(x,y,\lambda)\right)\,\mathrm{comb}\!\left[\frac{x}{p_x}, \frac{y}{p_y}\right].
Here Ee,λ,ideal(x , y ) is the ideal spectral irradiance distribution at the SP, px and py are
the pixel pitches in the horizontal and vertical directions, hsystem (x , y, λ ) is the system
point spread function (PSF), and (x , y ) are the sampling coordinates associated with
the photosite.
Figure 4.11. Bayer CFA neighbourhoods used for bilinear interpolation at a green site i (left diagram), a blue site j (centre diagram) and a red site k (right diagram). [Diagrams not reproduced.]
In the left diagram, only the green pixel component is known at site i. The red and blue pixel components Ri and Bi can be calculated as
Ri = (R1 + R2 )/2
Bi = (B1 + B2 )/2.
In the middle diagram, only the blue pixel component is known at site j. The red and
green pixel components Rj and Gj can be calculated as
Rj = (R1 + R2 + R3 + R 4)/4
Gj = (G1 + G 2 + G 3 + G4)/4.
Finally, only the red pixel component is known at site k in the diagram on the right.
The blue and green pixel components can be calculated as
Bk = (B1 + B2 + B3 + B4)/4
G k = (G1 + G 2 + G 3 + G4)/4.
Although conceptually simple, bilinear interpolation generates cyclic pattern noise
and zipper patterns along edges due to the cyclic change of direction of the
interpolation filter [11, 12].
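A minimal bilinear demosaic can be written as three convolutions of the masked colour planes, assuming an RGGB Bayer layout. This is only a sketch of the textbook method described above, not the implementation used by any particular raw converter.

    import numpy as np
    from scipy.ndimage import convolve

    def bilinear_demosaic(raw):
        # Assumes an RGGB layout: R at (even, even), B at (odd, odd), G elsewhere.
        H, W = raw.shape
        r_mask = np.zeros((H, W)); r_mask[0::2, 0::2] = 1.0
        b_mask = np.zeros((H, W)); b_mask[1::2, 1::2] = 1.0
        g_mask = 1.0 - r_mask - b_mask

        k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0   # red/blue interpolation kernel
        k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0   # green interpolation kernel

        R = convolve(raw * r_mask, k_rb, mode='mirror')
        G = convolve(raw * g_mask, k_g,  mode='mirror')
        B = convolve(raw * b_mask, k_rb, mode='mirror')
        return np.dstack([R, G, B])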
Although the demosaicing algorithms used by in-camera image processing
engines are proprietary, a variety of sophisticated demosaicing algorithms have
been published in the literature [13]. For example, the open-source raw converter
‘dcraw’ offers the following [12]:
• Halfsize: This method does not carry out an interpolation but instead
produces a smaller image by combining each 2 × 2 Bayer block into a single
raw pixel vector. Since the red and blue photosites are physically separated,
this method causes colour fringes to appear along diagonal edges [12].
• Bilinear Interpolation: This method is illustrated in figure 4.11. It is used
mainly as a first step in the VNG algorithm described below.
• Threshold-based Variable Number of Gradients (VNG) [14]: This method
measures the colour gradients in each of the eight directions around each
pixel. Only the gradients closest to zero are used to calculate the missing
colour components in order to avoid averaging over sharp edges. Although
this method is slow and produces zipper patterns at orthogonal edges,
it excels for shapes that do not have well-defined edges such as leaves and
feathers [12].
• Patterned Pixel Grouping (PPG) [15]: This method first fills in the green
mosaic using gradients and pattern matching before filling in the red and blue mosaics.
Of the above example methods, the AHD method yields the best quality output
overall. A weakness of the AHD method is that horizontal and vertical interpolation
can fail simultaneously at 45° edges. This is problematic for Fuji® raw files produced
by cameras that use the Fuji® X-Trans® CFA illustrated in figure 3.29 of chapter 3.
This non-Bayer type of CFA is designed to reduce the need for an optical low-pass
filter. Since the raw data contains many 45° edges, dcraw instead defaults to the PPG
method for Fuji® raw files [12].
Here TCFA,i is the CFA transmission function for mosaic i, η(λ ) is the charge
collection efficiency (CCE) of a photoelement, T (λ ) is the SiO2/Si interface trans-
mission function, and FF = Adet /Ap is the fill factor.
The camera response functions can be interpreted as specifying ‘amounts’ of the
camera raw space primaries at each wavelength. Since the camera response functions
are defined by physical filters, their values will always be non-negative.
Consequently, the primaries of the camera raw space must be imaginary, in analogy
with the eye cone primaries of the HVS. Recall that imaginary primaries are invisible
to the HVS as they are more saturated than pure spectrum colours.
where ‘reference’ refers to the scene illumination white point that yields the unit
vector.
Recall from section 4.1.4 that the reference white of the CIE RGB and CIE XYZ
colour spaces is illuminant E. This was achieved by introducing dimensionless units
for the primaries with each primary unit representing its own specific quantity of
power. In contrast, the units of the camera raw space primaries are not normalized
in such a manner. Due to the transmission properties of the CFAs used by camera
manufacturers, the reference white of a camera raw space is typically a magenta
colour that does not necessarily have an associated CCT.
Here \underline{T} is a 3 × 3 transformation matrix,
Figure 4.12. A typical colour chart that can be used for camera colour characterisation.
4. Measure average R , G , B values for each patch. The ISO 17321-1 standard
recommends that the block of pixels over which the average is taken should
be at least 64 × 64 pixels in size. Each patch can then be associated with an
appropriate average raw pixel vector.
5. Build a 3 × n matrix \underline{A} containing the XYZ vectors for each patch 1, ..., n as columns,
\underline{A} = \begin{bmatrix} X_1 & X_2 & \cdots & X_n \\ Y_1 & Y_2 & \cdots & Y_n \\ Z_1 & Z_2 & \cdots & Z_n \end{bmatrix}
Similarly, build a 3 × n matrix \underline{B} containing the corresponding raw pixel vectors as columns,
\underline{B} = \begin{bmatrix} R_1 & R_2 & \cdots & R_n \\ G_1 & G_2 & \cdots & G_n \\ B_1 & B_2 & \cdots & B_n \end{bmatrix}
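Given the matrices \underline{A} and \underline{B} above, one common way to obtain the characterisation matrix is an unconstrained linear least-squares fit, sketched below; the procedure described in this chapter may instead add constraints (for example white-point preservation) or minimise a perceptual error such as ΔE*ab.

    import numpy as np

    def estimate_T(A, B):
        # Find the 3x3 matrix T that minimises || T B - A ||^2, i.e. the linear map
        # from raw pixel vectors (columns of B) to XYZ vectors (columns of A).
        T_transposed, *_ = np.linalg.lstsq(B.T, A.T, rcond=None)
        return T_transposed.T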
\Delta E^*_{ab} = \sqrt{(L_1^* - L_2^*)^2 + (a_1^* - a_2^*)^2 + (b_1^* - b_2^*)^2}.
illumination white point, the specific CIE XYZ and raw pixel vectors under
consideration can be denoted using the notation introduced in section 4.2.3,
\begin{bmatrix} X(\mathrm{WP}) \\ Y(\mathrm{WP}) \\ Z(\mathrm{WP}) \end{bmatrix}_{\mathrm{D65}} \quad \text{and} \quad \begin{bmatrix} R(\mathrm{WP}) \\ G(\mathrm{WP}) \\ B(\mathrm{WP}) \end{bmatrix}_{\mathrm{D65}}.
2. Scale the average raw pixel vector so that all raw tristimulus values are
restricted to the range [0,1]. Since the green raw tristimulus value is typically
the first to saturate (reach its maximum value) under most types of
illumination, generally G(WP) = 1 whereas R(WP) and B(WP) are both
less than 1,
\begin{bmatrix} R(\mathrm{WP}) < 1 \\ G(\mathrm{WP}) = 1 \\ B(\mathrm{WP}) < 1 \end{bmatrix}_{\mathrm{D65}}.
3. Scale \underline{T}_{\mathrm{D65}} so that it maps the scaled average raw pixel vector to the scaled average CIE XYZ pixel vector for the white patch,
\begin{bmatrix} 0.9504 \\ 1.0000 \\ 1.0888 \end{bmatrix} = \underline{T}_{\mathrm{D65}} \begin{bmatrix} R(\mathrm{WP}) < 1 \\ G(\mathrm{WP}) = 1 \\ B(\mathrm{WP}) < 1 \end{bmatrix}_{\mathrm{D65}}.
Further details of normalisations used in digital cameras and raw converters are
described in sections 4.8.2 and 4.9.
Figure 4.13. Gamuts of the sRGB, Adobe® RGB and ProPhoto RGB output-referred colour spaces. The
reference white of sRGB and Adobe RGB is CIE illuminant D65, and the reference white of ProPhoto RGB is
CIE illuminant D50. The grey horseshoe-shaped region defines all (x, y ) chromaticities visible to the 2°
standard colourimetric observer.
relates to image quality and not colour conversion. Up until section 4.10, the present
chapter deals with the linear forms of the output-referred colour spaces.
Figure 4.14. The coloured region defined by the ITU-R BT.709 primaries shows the chromaticities represented
by the sRGB colour space. The grey horseshoe-shaped region defines all chromaticities visible to the 2°
standard colourimetric observer.
In this book, the tristimulus values of the linear form of the sRGB colour space
are denoted using an ‘L’ subscript in order to distinguish them from tristimulus
values of the final nonlinear form of the sRGB colour space discussed in section
4.10.
Using relative colourimetry with tristimulus values normalised to the range [0,1],
the primaries of the sRGB colour space are defined by
\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
The reference white obtained by adding a normalised unit of each primary is CIE
illuminant D65,
\begin{bmatrix} R_L(\mathrm{WP}) \\ G_L(\mathrm{WP}) \\ B_L(\mathrm{WP}) \end{bmatrix}_{\mathrm{D65}} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \underline{M}_{\mathrm{sRGB}}^{-1} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D65}},
where
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D65}} = \underline{M}_{\mathrm{sRGB}} \begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}},
with
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D65}} = \underline{T}_{\mathrm{D65}} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\mathrm{D65}}.
The transformation from CIE XYZ to the linear form of the sRGB colour space is
defined by
\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \underline{M}_{\mathrm{sRGB}}^{-1} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D65}}.
The subscripts in both of the above equations indicate that the scene illumination
has a D65 white point. Combining the equations yields the required transformation
from the camera raw space to sRGB,
\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \underline{M}_{\mathrm{sRGB}}^{-1}\, \underline{T}_{\mathrm{D65}} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\mathrm{D65}}.
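The composition of the two matrices can be sketched as follows. The linear sRGB to XYZ matrix quoted here is the standard IEC 61966-2-1 matrix, which may differ in the last decimal places from the \underline{M}_{\mathrm{sRGB}} used elsewhere in this chapter, while the camera matrix \underline{T}_{\mathrm{D65}} must come from a characterisation and is therefore passed in as an argument.

    import numpy as np

    M_SRGB = np.array([[0.4124, 0.3576, 0.1805],    # linear sRGB -> XYZ (D65), IEC 61966-2-1
                       [0.2126, 0.7152, 0.0722],
                       [0.0193, 0.1192, 0.9505]])

    def raw_to_linear_srgb(raw_rgb, T_D65):
        # Camera raw space -> XYZ -> linear sRGB for a scene with a D65 white point.
        M = np.linalg.inv(M_SRGB) @ T_D65
        return np.clip(M @ np.asarray(raw_rgb), 0.0, 1.0)   # clip out-of-gamut values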
Normalisation
By considering the raw pixel vector corresponding to the scene illumination white
point (i.e. a 100% neutral diffuse reflector photographed under the scene illumina-
tion), the T_ D65 matrix can be normalised according to section 4.4.4 so that the
maximum green raw tristimulus value maps to Y = 1,
\begin{bmatrix} 0.9504 \\ 1.0000 \\ 1.0888 \end{bmatrix} = \underline{T}_{\mathrm{D65}} \begin{bmatrix} R(\mathrm{WP}) < 1 \\ G(\mathrm{WP}) = 1 \\ B(\mathrm{WP}) < 1 \end{bmatrix}_{\mathrm{D65}}.
Since the reference white of the sRGB colour space is D65, the total transformation
is normalised as follows:
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \underline{M}_{\mathrm{sRGB}}^{-1}\, \underline{T}_{\mathrm{D65}} \begin{bmatrix} R(\mathrm{WP}) < 1 \\ G(\mathrm{WP}) = 1 \\ B(\mathrm{WP}) < 1 \end{bmatrix}_{\mathrm{D65}}.
can compute its own estimate by analysing the raw data using the automatic
WB function. In all cases, the camera estimate for the adapted white is
known as the camera neutral [30] or adopted white (AW) [29]. The AW can be
regarded as an estimate of the scene illumination white point, which for
simplicity is assumed to correspond to the adapted white.
2. Choose a standard illumination white point that will replace the AW when
the image is encoded. This is defined as the encoding white and is typically the
reference white of the selected output-referred colour space, which is
illuminant D65 in the case of the sRGB colour space.
3. Perform the AW replacement by applying a chromatic adaptation transform
(CAT) to the raw data in order to adapt the AW to the encoding white.
4. In combination with step 3, convert the camera raw space into the linear
form of the selected output-referred colour space. Subsequently, the image
can be encoded, as described in chapter 2.
5. View the image on a calibrated display monitor. In the case of an image
encoded using the sRGB colour space, a scene object that appeared to be
white at the time the photograph was taken will now be displayed using the
D65 reference white. Ideally, the ambient viewing conditions should match
those defined as appropriate for viewing the sRGB colour space.
Further details of the above approach, labelled ‘Strategy 1’, are given in section 4.7.
This is the type of approach used by various external raw converters and modern
smartphone cameras.
However, traditional digital camera manufacturers typically reformulate equa-
tion (4.15) in the following way:
\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \underline{R}\,\underline{D} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\mathrm{scene}}.   (4.16)
The above approach, labelled ‘Strategy 2’, has several advantages over strategy 1.
Further details are given in section 4.8. Practical examples of strategies 1 and 2 are
given in sections 4.8 and 4.9.
Following the description of how to convert from the camera raw space to the
CIE XYZ colour space given in section 4.4.2, the AW can be specified using XYZ
tristimulus values by applying the camera transformation matrix \underline{T},
\begin{bmatrix} X(\mathrm{AW}) \\ Y(\mathrm{AW}) \\ Z(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}} = \underline{T} \begin{bmatrix} R(\mathrm{AW}) \\ G(\mathrm{AW}) \\ B(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}}
• The ‘scene’ subscript denotes the true white point of the scene illumination.
The X(AW), Y(AW), Z(AW) are the camera estimates of the scene illumi-
nation white point, which is assumed to correspond to the adapted white.
• The transformation matrix T _ should ideally be obtained from a character-
isation performed using illumination with a white point that matches the AW.
• Since the green raw tristimulus value is typically the first to reach its
maximum value under most types of illumination, in general G(AW) = 1
while R(AW) < 1 and B(AW) < 1.
After X(AW), Y(AW), Z(AW) are known, equation (4.9) can be used to calculate
the (x , y ) chromaticity coordinates of the AW.
Using the methods described in section 4.2.2, the AW can alternatively be
specified in terms of a CCT and tint provided the chromaticity lies within a tolerated
distance from the Planckian locus.
WB will be incorrect if the AW (scene illumination CCT estimate) is not
sufficiently close to the true adapted white. An example of incorrect WB is shown
in figure 4.15. In each case, the same raw data has been white balanced for the sRGB
colour space using a different AW. Remembering that a blackbody appears red at
low colour temperatures and blue at high colour temperatures, the upper image
assumes a scene illumination CCT lower than the true value. Consequently, the
image appears to be bluer or colder than expected. Conversely, the lower image
assumes a scene illumination CCT higher than the true value, and so the image
appears to be redder or warmer than expected. The central image shows correct WB
obtained using the CCT calculated by the in-camera auto-WB function.
von-Kries CAT
Back in 1902, von-Kries postulated that the chromatic adaptation mechanism of the
HVS can be modelled as an independent scaling of each of the eye cone response
functions, or equivalently the L, M and S values in the LMS colour space introduced
in section 4.1.2.
Consider the raw data represented using the CIE XYZ colour space. In the
following example, the aim is to transform the raw data so that the scene
illumination white point is adapted to D65,
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D65}} = \mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{scene}}.
In this case the von-Kries CAT can be written
\mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}} = \underline{M}_{\mathrm{vK}}^{-1} \begin{bmatrix} \frac{L(\mathrm{D65})}{L(\mathrm{AW})} & 0 & 0 \\ 0 & \frac{M(\mathrm{D65})}{M(\mathrm{AW})} & 0 \\ 0 & 0 & \frac{S(\mathrm{D65})}{S(\mathrm{AW})} \end{bmatrix} \underline{M}_{\mathrm{vK}}.
The matrix \underline{M}_{\mathrm{vK}} transforms each pixel vector into the LMS colour space, where the diagonal scaling is applied. The modern form of the transformation matrix \underline{M}_{\mathrm{vK}} is the Hunt–Pointer–Estevez transformation matrix [31] defined as
\underline{M}_{\mathrm{vK}} = \begin{bmatrix} 0.38971 & 0.68898 & -0.07868 \\ -0.22981 & 1.18340 & 0.04641 \\ 0.00000 & 0.00000 & 1.00000 \end{bmatrix}.
After applying M_ vK , the L, M, S values are independently scaled according to the
von-Kries hypothesis. In the present example, the scaling factors arise from the ratio
between the AW and D65 white points. These can be obtained from the following
white point vectors:
\begin{bmatrix} L(\mathrm{AW}) \\ M(\mathrm{AW}) \\ S(\mathrm{AW}) \end{bmatrix} = \underline{M}_{\mathrm{vK}} \begin{bmatrix} X(\mathrm{AW}) \\ Y(\mathrm{AW}) \\ Z(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}}
\begin{bmatrix} L(\mathrm{D65}) \\ M(\mathrm{D65}) \\ S(\mathrm{D65}) \end{bmatrix} = \underline{M}_{\mathrm{vK}} \begin{bmatrix} X(\mathrm{WP}) = 0.9504 \\ Y(\mathrm{WP}) = 1.0000 \\ Z(\mathrm{WP}) = 1.0888 \end{bmatrix}_{\mathrm{D65}}.
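The construction above can be sketched in a few lines of Python. The adopted-white XYZ values used in the usage example are hypothetical, roughly those of a tungsten-like illuminant, rather than values produced by any particular camera.

    import numpy as np

    M_VK = np.array([[ 0.38971, 0.68898, -0.07868],   # Hunt-Pointer-Estevez matrix (XYZ -> LMS)
                     [-0.22981, 1.18340,  0.04641],
                     [ 0.00000, 0.00000,  1.00000]])

    def von_kries_cat(xyz_aw, xyz_d65=(0.9504, 1.0000, 1.0888)):
        # Build CAT_{AW->D65}: scale L, M, S independently by the white-point ratios.
        lms_aw  = M_VK @ np.asarray(xyz_aw)
        lms_d65 = M_VK @ np.asarray(xyz_d65)
        scale = np.diag(lms_d65 / lms_aw)
        return np.linalg.inv(M_VK) @ scale @ M_VK

    # Example: adapt one XYZ pixel vector from a tungsten-like AW to D65 (illustrative values only).
    cat = von_kries_cat(xyz_aw=(1.0985, 1.0000, 0.3558))
    xyz_adapted = cat @ np.array([0.5, 0.4, 0.2])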
Since the colour demosaic has not yet been performed, the above vectors represent
Bayer blocks rather than raw pixels. Equivalently, the scaling factors can be applied
directly to the raw channels.
The diagonal scaling factors are known as raw WB multipliers or raw channel
multipliers. They can be obtained directly from the AW calculated by the camera,
\begin{bmatrix} R(\mathrm{AW}) \\ G(\mathrm{AW}) \\ B(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}}.
The interpolation approach is used by the Adobe® DNG converter, and full details
are given in section 4.9.
• \underline{D} is a diagonal matrix containing raw channel multipliers appropriate for the AW.
• \underline{R} is the colour rotation matrix optimised for the scene illumination. It is algebraically defined as
\underline{R} = \underline{M}_{\mathrm{sRGB}}^{-1}\, \mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}}\, \underline{T}\, \underline{D}^{-1}.   (4.17)
Each row of this matrix sums to unity.
Since equations (4.15) and (4.16) are white balanced, the raw pixel vector
corresponding to the AW is mapped to the sRGB reference white,
\begin{bmatrix} R_L = 1 \\ G_L = 1 \\ B_L = 1 \end{bmatrix}_{\mathrm{D65}} = \underline{R}\,\underline{D} \begin{bmatrix} R(\mathrm{AW}) \\ G(\mathrm{AW}) \\ B(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}}.
In particular, the camera raw space reference white is mapped to the sRGB
reference white,
\begin{bmatrix} R_L = 1 \\ G_L = 1 \\ B_L = 1 \end{bmatrix}_{\mathrm{D65}} = \underline{R} \begin{bmatrix} R = 1 \\ G = 1 \\ B = 1 \end{bmatrix}_{\mathrm{reference}}.
Recall the algebraic definition of the colour rotation matrix optimised for the scene
illumination,
\underline{R} = \underline{M}_{\mathrm{sRGB}}^{-1}\, \mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}}\, \underline{T}\, \underline{D}^{-1}.
In principle, T_ should be obtained from a characterisation performed using an
illuminant with a white point that matches the AW. This means that a different
optimised colour rotation matrix can be defined for every possible scene illumination
white point.
In practice, sufficient accuracy is achieved by defining a small fixed set of colour
rotation matrices, each optimised for use with a particular camera WB preset. This is
due to the fact that the variation of colour rotation matrices with respect to CCT is
very small. The procedure is as follows:
1. The camera determines the AW and associated raw channel multipliers
appropriate for the scene illumination. The diagonal WB matrix D _ is then
applied to the raw channels. In particular, this adapts the AW to the camera
raw space reference white.
2. The camera chooses the preset rotation matrix optimised for illumination
with a white point that provides the closest match to the AW.
3. The camera applies the chosen preset colour rotation matrix to convert to the
output-referred colour space. In particular, the camera raw space reference
white will be mapped to the output-referred colour space reference white.
As an example, table 4.2 lists raw channel multipliers and colour rotation matrices
for the preset WB settings found in the raw metadata of the Olympus® E-M1 digital
camera. The listed values are 8-bit fixed point numbers used by the internal
processor and so need to be divided by 256.
Notice that the custom WB preset has been set to 6400 K. In this case the raw WB
multipliers are found to be
\underline{D}_{6400\,\mathrm{K}} = \begin{bmatrix} 2.1719 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1.4922 \end{bmatrix}_{6400\,\mathrm{K}}.
Table 4.2. Raw channel multipliers and raw-to-sRGB colour rotation matrices corresponding to in-camera
preset CCTs for the Olympus® E-M1 with 12-40 f/2.8 lens and v4.1 firmware. These raw metadata values can
be divided by 256 as they are 8-bit fixed-point numbers used by the internal processor. The third and fourth
columns of the ‘Raw WB multipliers’ column represent the two green mosaics.
CCT      Preset                 Raw WB multipliers (R, B, G1, G2)   Colour rotation matrix (rows)
3000 K   Tungsten               296, 760, 256, 256                  (324, −40, −28), (−68, 308, 16), (16, −248, 488)
4000 K   Cool fluorescent       492, 602, 256, 256                  (430, −168, −6), (−50, 300, 6), (12, −132, 376)
5300 K   Fine weather           504, 434, 256, 256                  (368, −92, −20), (−42, 340, −42), (10, −140, 386)
6000 K   Cloudy                 544, 396, 256, 256                  (380, −104, −20), (−40, 348, −52), (10, −128, 374)
7500 K   Fine weather, shade    588, 344, 256, 256                  (394, −116, −22), (−38, 360, −66), (8, −112, 360)
5500 K   Flash                  598, 384, 256, 256                  (368, −92, −20), (−42, 340, −42), (10, −140, 386)
6400 K   Custom                 556, 382, 256, 256                  (380, −104, −20), (−40, 348, −52), (10, −128, 374)
The camera has chosen to use the 6000 K preset rotation matrix in conjunction with the 6400 K raw channel multipliers,
\underline{R}_{6000\,\mathrm{K}} = \frac{1}{256} \begin{bmatrix} 380 & -104 & -20 \\ -40 & 348 & -52 \\ 10 & -128 & 374 \end{bmatrix} = \begin{bmatrix} 1.4844 & -0.4063 & -0.0781 \\ -0.1563 & 1.3594 & -0.2031 \\ 0.0391 & -0.5000 & 1.4609 \end{bmatrix}.   (4.18)
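The arithmetic in equation (4.18) and the preceding multipliers can be reproduced directly from the 6400 K row of table 4.2, as the short sketch below shows; the row-sum check confirms that each row of the rotation matrix sums to unity.

    import numpy as np

    # 6400 K custom preset from table 4.2: (R, B, G1, G2) multipliers and rotation matrix,
    # all stored as 8-bit fixed-point integers.
    wb_fixed = np.array([556, 382, 256, 256])
    rot_fixed = np.array([[380, -104, -20],
                          [-40,  348, -52],
                          [ 10, -128, 374]])

    wb_multipliers = wb_fixed / 256.0     # [2.1719, 1.4922, 1.0, 1.0]
    R_6000K = rot_fixed / 256.0           # matches equation (4.18)
    print(R_6000K.sum(axis=1))            # prints [1. 1. 1.]; each row sums to unity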
For a given camera model, the preset rotation matrices and raw channel multipliers
are dependent on factors such as
• The output-referred colour space selected by the photographer in the camera
settings, for example, sRGB or Adobe® RGB.
• The lens model used to take the photograph.
• The camera used to take the photograph. Due to sensor calibration differ-
ences between different examples of the same camera model, the listed matrix
may differ even if the same settings are selected on different examples and the
same firmware is installed.
4.8.2 dcraw
The widely used open-source raw converter ‘dcraw’ by default outputs directly to the
sRGB colour space with a D65 encoding illumination white point by utilising a
variation of Strategy 2.
Recall that the colour rotation matrix optimised for use with the scene
illumination is defined by equation (4.17),
\underline{R} = \underline{M}_{\mathrm{sRGB}}^{-1}\, \mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}}\, \underline{T}\, \underline{D}^{-1}.
Also recall from the previous section that digital cameras typically use a small set of
preset rotation matrices optimised for a selection of preset CCTs. Alternatively,
numerical methods can be applied to interpolate between two preset rotation
matrices that are optimised for use with either a low CCT or high CCT illuminant.
In contrast, dcraw takes a very computationally simple approach by using only a
single rotation matrix optimised for scene illumination with a D65 white point. In
other words, \underline{R} \approx \underline{R}_{\mathrm{D65}}, where
\underline{R}_{\mathrm{D65}} = \underline{M}_{\mathrm{sRGB}}^{-1}\, \underline{T}_{\mathrm{D65}}\, \underline{D}_{\mathrm{D65}}^{-1}.   (4.19)
\underline{D}_{\mathrm{D65}} = \begin{bmatrix} \frac{1}{R(\mathrm{D65})} & 0 & 0 \\ 0 & \frac{1}{G(\mathrm{D65})} & 0 \\ 0 & 0 & \frac{1}{B(\mathrm{D65})} \end{bmatrix}.   (4.20)
The overall transformation from the camera raw space to sRGB is defined by
\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} \approx \underline{R}_{\mathrm{D65}}\, \underline{D} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\mathrm{scene}}.
Notice that D_ appropriate for the AW is applied to the raw data and so the correct
raw channel multipliers appropriate for the scene illumination are applied. Although
the colour transformation matrix T_ D65 is optimised for scene illumination with a D65
white point only, the colour rotation matrix R _ D65 is perfectly valid for transforming
from the camera raw space to sRGB. In fact, R _ D65 becomes exact when the AW is
D65. However, the overall colour transformation loses accuracy when the scene
illumination CCT differs significantly from 6500 K.
By considering the unit vector in the sRGB colour space, the above matrix can be
applied to obtain the raw tristimulus values for illumination with a D65 white point,
\begin{bmatrix} R(\mathrm{WP}) = 0.4325 \\ G(\mathrm{WP}) = 1.0000 \\ B(\mathrm{WP}) = 0.7471 \end{bmatrix}_{\mathrm{D65}} = \underline{T}_{\mathrm{D65}}^{-1}\, \underline{M}_{\mathrm{sRGB}} \begin{bmatrix} R_L = 1 \\ G_L = 1 \\ B_L = 1 \end{bmatrix}_{\mathrm{D65}}
Now equation (4.20) can be used to extract the raw channel multipliers for scene
illumination with a D65 white point,
\underline{D}_{\mathrm{D65}} = \begin{bmatrix} 2.3117 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1.3385 \end{bmatrix}_{\mathrm{D65}}.
Finally, the colour rotation matrix can be calculated from equation (4.19).
Each row sums to unity as required. The form of the matrix is similar to the example
Olympus matrix defined by equation (4.18). However, the Olympus matrix
corresponds to a characterisation performed using a 6000 K illuminant rather
than 6500 K.
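The extraction of the multipliers is a simple element-wise reciprocal, sketched below. The rotation matrix itself cannot be reproduced here because \underline{T}_{\mathrm{D65}} and \underline{M}_{\mathrm{sRGB}} are camera- and standard-specific matrices, so they are left as arguments.

    import numpy as np

    raw_wp_d65 = np.array([0.4325, 1.0000, 0.7471])   # raw tristimulus values of the D65 white point
    D_D65 = np.diag(1.0 / raw_wp_d65)                 # equation (4.20): approx. diag(2.312, 1.0, 1.339)

    def rotation_matrix_d65(T_D65, M_sRGB):
        # Equation (4.19); each row of the result should sum to unity.
        return np.linalg.inv(M_sRGB) @ T_D65 @ np.linalg.inv(D_D65)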
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D50}} = \mathrm{CAT}_{\mathrm{AW}\to\mathrm{D50}}\, \underline{C}^{-1} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\mathrm{scene}}.   (4.22)
However, the Adobe transformation matrices are defined in the reverse direction to
the transformation matrices of section 4.4.2,
\underline{T}^{(1)} \propto \mathrm{ColorMatrix1}^{-1}
\underline{T}^{(2)} \propto \mathrm{ColorMatrix2}^{-1}.
Here (1) and (2) refer to the low-CCT and high-CCT characterisation illuminants,
respectively. The DNG specification uses linear interpolation based upon the inverse CCT.
Interpolation algorithm
The optimised transformation matrix \underline{C} is calculated by interpolating between ColorMatrix1 and ColorMatrix2 based on the scene illumination CCT estimate
denoted by CCT(AW), together with the CCTs associated with each of the two
characterisation illuminants denoted by CCT(1) and CCT(2), respectively, with
CCT(1) < CCT(2). If only one matrix is included, then C _ will be optimal only if
CCT(AW) matches CCT(1) or CCT(2). To facilitate the interpolation, note that
ColorMatrix1 and ColorMatrix2 are by default normalized so that the WP of D50
illumination, rather than the AW, maps to the maximum raw tristimulus value.
The interpolation itself is complicated by the fact that the AW is calculated by the
camera in terms of raw values,
\begin{bmatrix} R(\mathrm{AW}) \\ G(\mathrm{AW}) \\ B(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}}.
Finding the corresponding CCT(AW) requires converting to CIE XYZ via a matrix
transformation C _ that itself depends upon the unknown CCT(AW). This problem
can be solved using a self-consistent iteration procedure.
1. Make a guess for the AW chromaticity coordinates, (x(AW), y(AW)). For
example, the chromaticity coordinates corresponding to one of the character-
isation illuminants could be used.
2. Convert (x(AW), y(AW)) to the corresponding (u(AW), v(AW)) chromatic-
ity coordinates of the 1960 UCS colour space using equation (4.11) of section
4.2.2.
3. Use Robertson’s method [8] or an approximate formula [9, 10] as described
in section 4.2.2 to determine a guess for CCT(AW).
4. Calculate the interpolation weighting factor α according to
\alpha = \frac{(\mathrm{CCT(AW)})^{-1} - (\mathrm{CCT}(2))^{-1}}{(\mathrm{CCT}(1))^{-1} - (\mathrm{CCT}(2))^{-1}}.   (4.23)
5. Calculate the interpolated transformation matrix \underline{C},
\underline{C} = \alpha\,\mathrm{ColorMatrix1} + (1 - \alpha)\,\mathrm{ColorMatrix2}.
6. Convert the AW from the camera raw space to the CIE XYZ colour space,
\begin{bmatrix} X(\mathrm{AW}) \\ Y(\mathrm{AW}) \\ Z(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}} = \underline{C}^{-1} \begin{bmatrix} R(\mathrm{AW}) \\ G(\mathrm{AW}) \\ B(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}},
and calculate a new guess for (x(AW), y(AW)) by applying equation (4.9).
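The iteration can be sketched as follows. McCamy's approximation is used here as a stand-in for Robertson's method or the formulas of [9, 10], the starting guess is roughly the D50 chromaticity, and α is clamped to [0, 1] as a guard against extrapolation; none of these particular choices is prescribed by the procedure above.

    import numpy as np

    def mccamy_cct(x, y):
        # Approximate CCT from (x, y) chromaticity coordinates (McCamy's formula).
        n = (x - 0.3320) / (0.1858 - y)
        return 449.0 * n**3 + 3525.0 * n**2 + 6823.3 * n + 5520.33

    def interpolate_colour_matrix(raw_aw, cm1, cct1, cm2, cct2, iterations=10):
        # Self-consistent interpolation between ColorMatrix1 (low CCT) and ColorMatrix2 (high CCT).
        x, y = 0.3457, 0.3585                 # initial guess: roughly the D50 chromaticity
        for _ in range(iterations):
            alpha = (1.0 / mccamy_cct(x, y) - 1.0 / cct2) / (1.0 / cct1 - 1.0 / cct2)   # eq. (4.23)
            alpha = np.clip(alpha, 0.0, 1.0)  # guard against extrapolation
            C = alpha * cm1 + (1.0 - alpha) * cm2
            X, Y, Z = np.linalg.inv(C) @ np.asarray(raw_aw)   # AW in CIE XYZ (ColorMatrix maps XYZ -> raw)
            x, y = X / (X + Y + Z), Y / (X + Y + Z)           # eq. (4.9)
        return C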
\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D50}} = \underline{F}\,\underline{D} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\mathrm{scene}}.
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \underline{D} \begin{bmatrix} R(\mathrm{AW}) \\ G(\mathrm{AW}) \\ B(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}}.
• The forward matrix _F maps from the camera raw space to the PCS, i.e. the
CIE XYZ colour space with a D50 white point. This means that the reference
white of the camera raw space is adapted to the white point of D50
illumination,
\begin{bmatrix} X(\mathrm{WP}) = 0.9641 \\ Y(\mathrm{WP}) = 1.0000 \\ Z(\mathrm{WP}) = 0.8249 \end{bmatrix}_{\mathrm{D50}} = \underline{F} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
• The camera raw space reference white is adapted to the white point of D50
illumination.
\underline{F} = \alpha\,\mathrm{ForwardMatrix1} + (1 - \alpha)\,\mathrm{ForwardMatrix2}.
Figure 4.16. The blue line shows a power law gamma curve defined by γE = 1/2.2, which appears as a straight
line when plotted using logarithmic axes. In contrast, the piecewise sRGB gamma curve shown in green has
constant gamma only below 0.003 130 8. Here V = RL , G L or BL and V ′ = R′, G′ or B′.
R′, G′, B′ values are obtained by applying a piecewise gamma curve in the following
way:
For RL , G L , BL ⩽ 0.0031308,
R′ = 12.92 RL
G′ = 12.92 G L (4.25)
B′ = 12.92 BL.
For RL, GL, BL > 0.0031308,
R′ = 1.055 RL^(1/2.4) − 0.055
G′ = 1.055 GL^(1/2.4) − 0.055 (4.26)
B′ = 1.055 BL^(1/2.4) − 0.055.
Subsequently, image DOLs can be obtained by scaling R′, G′, B′ to the range [0, 2^M − 1], where M is the required bit depth, and then quantizing to the nearest integer:
R′_DOL = Round{(2^M − 1) R′}
G′_DOL = Round{(2^M − 1) G′} (4.27)
B′_DOL = Round{(2^M − 1) B′}.
In order to avoid numerical issues close to zero, notice that the sRGB piecewise
gamma curve has a linear portion with constant encoding gamma γE = 1 below
0.0031308.
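Both segments of the curve and the final quantisation of equation (4.27) are captured by the short sketch below; the power-law segment uses the standard sRGB constants 1.055, 1/2.4 and 0.055.

    import numpy as np

    def srgb_encode(V, bit_depth=8):
        # Piecewise sRGB gamma curve (equations (4.25)-(4.26)) followed by quantisation (4.27).
        # V is assumed to be linear sRGB data in the range [0, 1].
        V = np.asarray(V, dtype=float)
        V_prime = np.where(V <= 0.0031308,
                           12.92 * V,
                           1.055 * np.power(V, 1.0 / 2.4) - 0.055)
        return np.round((2**bit_depth - 1) * V_prime).astype(int)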
The differences between the sRGB gamma curve and a standard γE = 1/2.2 curve
can be seen more clearly by plotting both curves using logarithmic axes. As
illustrated in figure 4.16, data plotted on logarithmic axes is not altered numerically
but the spacing between axis values changes in a logarithmic manner. A power law
plotted on linear axes appears as a straight line when plotted on logarithmic axes
with the gradient of the line equal to the exponent of the power law. Unlike the
sRGB curve, the standard γE = 1/2.2 curve appears as a straight line with gradient
equal to 1/2.2.
4.10.2 sRGB colour cube
Recall that output-referred RGB colour spaces are additive and so their gamuts each
define a triangle on the xy chromaticity diagram. Furthermore, sRGB and Adobe®
RGB use real primaries. These properties imply that sRGB and Adobe® RGB can
be visualised as cubes when described by a colour model based on a 3D Cartesian
coordinate system.
In the case of sRGB, recall that the (x , y ) chromaticity coordinates of the red,
green, and blue primaries are defined as
(x, y) = (0.64, 0.33), (0.30, 0.60), and (0.15, 0.06), respectively.
The gamut defined by the primaries is illustrated in figure 4.14. Using tristimulus
values of the linear form of sRGB normalised to the range [0,1], each primary can
also be written as a vector,
\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.
In order to visualise the sRGB colour cube correctly on a display monitor, the linear
form of sRGB needs to be converted into its nonlinear form by applying the
encoding gamma curve and then quantizing to DOLs using equation (4.27). A cube
defined by all DOLs will appear correctly when viewed on a standard gamut
monitor since the display gamma will compensate for the encoding gamma.
An 8-bit sRGB colour cube is illustrated in figure 4.17. Using Cartesian
coordinates, black is defined by the origin,
(R′_DOL, G′_DOL, B′_DOL) = (0, 0, 0).
All greyscale colours lie along the diagonal between the origin and the reference
white. The red, green and blue primaries are defined by
(R′_DOL, G′_DOL, B′_DOL) = (255, 0, 0), (0, 255, 0), and (0, 0, 255).
Figure 4.17. sRGB colour cube defined by 8-bit DOLs.
Since the DOLs are nonlinearly related to scene luminance, they are specified by
coordinates rather than vectors.
Adobe® RGB can also be described using a colour cube, but can only be
visualised correctly on a wide-gamut monitor. On the other hand, ProPhoto RGB
cannot be visualised as a cube since part of its gamut lies outside the visible
chromaticities defined by the horseshoe-shaped region of the xy chromaticity
diagram.
D50 white point. In recent versions of Microsoft® Windows®, ICC colour profiles
are stored in the following directory:
C:\Windows\System32\spool\drivers\color.
Colour-managed applications such as raw converters and image-editing software
will automatically utilise the display profile created by the profiling software. The
translation of colours between ICC profiles is carried out by the colour matching
module (CMM).
When raw conversion is performed by the in-camera image processing engine, an
EXIF (exchangeable image file format) tag is added to the JPEG metadata
indicating the appropriate ICC profile for the chosen output-referred colour space.
Typically, sRGB [25] and Adobe® RGB [26] are the available in-camera options.
The ICC profile enables the encoded JPEG image DOLs to be correctly interpreted
and does not affect the raw data. If the option is given, it is good practice to embed
the ICC profile in the image file for the benefit of users who do not have the relevant
ICC profile installed on their viewing device.
• sRGB is the appropriate option for displaying images on standard-gamut
monitors and on the internet. It is also the most suitable option for displaying
images on devices that do not have colour management in place.
• Adobe® RGB is the appropriate option for displaying images on wide-gamut
monitors and for printing. Multi-ink printers can have very large gamuts.
Only colours contained in both the working space and the monitor/display gamut
can be seen when viewing or editing the image. Nevertheless, colours in the working
space that cannot be seen on the display will be visible on a print if they lie within the
printer gamut.
Since the output image file from step 4 has been encoded using a large colour
space, it can be archived and used as a source file for producing an appropriate
output JPEG image as required. Different applications require specific adjustments
to be made, such as resampling, sharpening or conversion to a smaller output-
referred colour space.
When converting the image file from step 4 into a different colour space, software
such as Adobe® Photoshop® (via the ‘Convert to Profile’ option) will perform gamut
mapping with an applied rendering intent. Perceptual intent is recommended when
converting to a smaller colour space such as sRGB. Although any out-of-gamut
colours will be clipped, perceptual intent will shift the in-gamut colours in order to
preserve colour gradations.
The advantage of using a working space larger than the final chosen output-
referred colour space is that a greater amount of colour information can be
preserved by minimizing the clipping of colours until the final gamut-mapping stage.
The extra tonal levels can be advantageous as part of a raw workflow. When an 8-bit
image file obtained from a raw converter undergoes further manipulation such as
extensive levels adjustment using image editing software, posterisation artefacts may
arise in areas of steep tonal gradation. However, a 16-bit image file provides a much
finer mesh for the calculations before conversion to an 8-bit image file or 8-bit DOLs
for an 8-bit display, and so these artefacts are avoided. For the same reason, 16-bit
image files are recommended for use with wide gamut colour spaces such as
ProPhoto RGB, which provide steeper colour gradations.
Information will be lost when a JPEG file is resaved, even if no adjustments are
made. In contrast, TIFF (Tagged Image File Format) files are lossless and can be
repeatedly saved without loss of information. 16-bit TIFF files can be archived and
an appropriate JPEG file produced as required.
Working space
As explained in step 3 of section 4.11.2, the working space is the colour space used
for editing an imported image and is unrelated to the display profile. Several options
are available, as discussed in the colour management policies section below.
Nevertheless, it is generally advisable to set the working space to be the colour
space associated with the ICC profile of the imported image. This is likely to be
either sRGB or Adobe® RGB for an imported image obtained directly from an in-
camera image-processing engine.
Note that the monitor/display ICC profile is available as ‘Monitor RGB - profile
name’ in the working space option list. However, it is inadvisable to select this as the
working space.
When following a maximal colour strategy, the ICC profile of the imported image
will be a large output-referred colour space such as Adobe® RGB, or preferably
WideGamut RGB or ProPhoto RGB. Recall from section 4.11.2 that a major
advantage of using a large working space is that a greater amount of colour information
can be preserved by minimizing the clipping of colours, even if the image will ultimately
be converted to a smaller colour space at the final gamut-mapping stage.
Figure 4.18. ‘Color Settings’ control box available in Adobe® Photoshop® CS5.
A drawback of using ProPhoto RGB as the working space is that its gamut
cannot be shown in its entirety on a wide-gamut display, and so some of the colours
that appear in print may not be visible on the monitor/display. This can be
understood by recalling from sections 4.1 and 4.3 that non-negative linear combi-
nations of three real primaries cannot reproduce all visible chromaticities. This is
due to the fact that the three types of eye cone cell cannot be stimulated
independently, and so the visible gamut appears as a horseshoe shape rather than
a triangle on the xy chromaticity diagram. Although the primaries associated with
response functions of capture devices can be imaginary, the primaries of a display
device must be physically realizable and therefore cannot lie outside of the visible
gamut. Since non-negative linear combinations of primaries lie within a triangle
defined by the primaries on the xy chromaticity diagram, a display device that uses
only three real primaries cannot reproduce all visible colours. Modern wide-gamut
displays can show many colours outside both the sRGB and Adobe® RGB gamuts,
although some regions of these gamuts may not be covered. ProPhoto RGB includes
many additional colours that can be shown on wide-gamut displays, but two of the
primaries are imaginary and so the entire colour space cannot be shown. The gamuts
of several output-referred colour spaces are illustrated in figure 4.13.
Figure 4.19. ‘Embedded Profile Mismatch’ dialogue box present upon opening an image using Adobe®
Photoshop® CS5.
Provided the working space is larger than the image ICC profile colour space, it is
also perfectly acceptable to choose option B and convert the colours to the working
space. Although the additional colours available in the working space will not be
utilised upon conversion, the entire gamut of the working space can subsequently be
utilised when editing the image. In some cases this can be undesirable. If the image
ICC profile colour space is larger than the working space, choosing option B may
lead to unnecessary clipping of the scene colours. In this case the rendering intent
default setting in the colour options will be applied. Rendering intent is discussed in
the conversion options section below.
If the embedded profile is missing, a colour profile can be assigned to the image.
In order to test different profiles, choose ‘Leave as is (don’t color manage)’ and then
assign a profile using the ‘Assign Profile’ menu option. The appearance of the image
will suggest the most suitable profile. Most likely, the RGB values will correspond to
the sRGB colour space.
Conversion options
The conversion options define the default settings for converting between ICC
profiles. Conversion may be required when there is an embedded profile mismatch
upon opening an image. It can also be performed at any time by using the ‘Convert
to Profile’ menu option. For example, if the image ICC profile is ProPhoto RGB and
this has been set as the working space for editing, the photographer may wish to
create a version converted to sRGB for displaying on the internet.
Since the source and destination colour spaces have different gamuts, gamut
mapping may result in a change to the image DOLs. The ICC have defined several
rendering intents that can be applied by the CMM. In photography, the two most
useful rendering intents are relative colorimetric and perceptual.
• Relative colorimetric intent clips out-of-gamut colours to the edge of the
destination gamut, and leaves in-gamut colours unchanged. It is generally
advisable to enable black point compensation when using relative colorimetric
intent, particularly if the source image colours all lie inside the gamut of the
destination colour space.
• Perceptual intent also clips out-of-gamut colours to the edge of the destination
gamut, but at the same time shifts in-gamut colours to preserve colour
gradations. This can be viewed as a compression of the source gamut into the
destination gamut and may be preferable when there are source image colours
that lie outside of the destination gamut.
If the preview option is selected, the most suitable rendering intent can be chosen on
a case-by-case basis.
The ‘Proof Colors’ menu option enables the effects of conversion to destination
profiles to be simulated. The destination profile can be selected via ‘Proof Setup’.
Note that the ‘Monitor RGB’ option effectively simulates the appearance of colours
on the monitor/display without the display profile applied, in other words an
uncalibrated monitor/display.
• The pixel count is the total number of image pixels,
$$n = n(h) \times n(v),$$
where n(h) is the number of pixels in the horizontal direction and n(v) is the number of pixels in the vertical direction.
• The image display resolution specifies the number of displayed image pixels
per unit distance, most commonly in terms of pixels per inch (ppi). This refers
to the manner in which an image is displayed and is not a property of the
image itself.
• The screen resolution is the number of monitor pixels per inch (ppi). The
screen resolution defines the image display resolution when the image is
displayed on a monitor.
• The print resolution is the image display resolution for a hardcopy print. For
high quality prints, 300 ppi is considered to be sufficient when a print is
viewed under standard viewing conditions. These conditions are described in
section 1.4.1 of chapter 1 and section 5.2.2 of chapter 5. Print resolution in
ppi should not be confused with the printer resolution in dpi (see below).
• The image display size is determined by the pixel count together with the
image display resolution,
$$\text{image display size} = \frac{\text{pixel count}}{\text{image display resolution}}.$$
The print size is the image display size for a hardcopy print.
• The printer resolution is a measure of the number of ink dots per unit distance
used by a printer to print an image, and is commonly measured in dots per
inch (dpi). A higher dpi generally results in better print quality. Unlike screen
resolution, printer resolution in dpi is independent from image display
resolution.
For example, a 720 × 480 pixel image will appear with an image display size equal to 10 by 6.67 inches on a computer monitor set at 72 ppi. The same image will appear with a print size equal to 2.4 by 1.6 inches when printed with the image display resolution set to 300 ppi, and this is independent of the printer resolution in dpi.
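These relationships reduce to simple division, as the minimal sketch below illustrates for the 720 × 480 pixel example; the numbers are taken from the text above.

```python
def display_size_inches(pixel_count, display_resolution_ppi):
    # image display size = pixel count / image display resolution
    return pixel_count / display_resolution_ppi

width_px, height_px = 720, 480
# Displayed on a monitor set at 72 ppi: 10 x 6.67 inches.
print(display_size_inches(width_px, 72), display_size_inches(height_px, 72))
# Printed at a 300 ppi image display resolution: 2.4 x 1.6 inches.
print(display_size_inches(width_px, 300), display_size_inches(height_px, 300))
```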
Commercial printers often request images to be ‘saved at 300 ppi’ or ‘saved at
300 dpi’. Such phrases are not meaningful since neither image display resolution nor printer resolution is a property of an image. Nevertheless, it is possible to add a ppi
resolution tag to an image. This does not alter the image pixels in any way and is
simply a number stored in the image metadata to be read by the printing software.
In any case, the printing software will allow this value to be overridden by providing
print size options.
It is likely that the commercial printer intends to print the image with a 300 ppi
print resolution. In the absence of any further information such as the final print size,
it is advisable to leave the pixel count unchanged, add a 300 ppi resolution tag to the
image, and rely on the printing software used by the client to take care of any
resizing. However, if the final print size is already known then quality can be
optimized by resizing the image in advance using more specialised software. This
will change the pixel count according to the following formula:
required pixel count = required print size × image display resolution.
For example, if the image print size will be 12 × 8 inches and the image display
resolution needs to be 300 ppi, then the required pixel count is 3600 × 2400. If the
image pixel count is currently 3000 × 2000, then the pixel count needs to be increased
through resampling. The mathematics of resampling is discussed in section 5.7 of
chapter 5.
Adobe® Photoshop® provides the ‘Image Size’ option shown in figure 4.20. For
the present example, one way of using this feature is to set the required width and
height to be 3600 pixels and 2400 pixels in ‘Pixel Dimensions’ with ‘Resolution’ set
to 300 ppi. In this case the ‘Document Size’ will automatically change to 12 × 8
inches. Alternatively, the width and height in ‘Document Size’ can be set to 12 inches
and 8 inches, respectively. In this case the ‘Pixel Dimensions’ will automatically
change to 3600 × 2400, and selecting ‘OK’ will perform the resampling.
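The same arithmetic can be sketched in a few lines of code, shown below for the 12 × 8 inch example; this only determines the target pixel dimensions, with the resampling itself left to dedicated software such as the 'Image Size' tool.

```python
def required_pixel_count(print_size_inches, display_resolution_ppi):
    # required pixel count = required print size x image display resolution
    return round(print_size_inches * display_resolution_ppi)

target_w = required_pixel_count(12, 300)  # 3600 pixels
target_h = required_pixel_count(8, 300)   # 2400 pixels

current_w, current_h = 3000, 2000
needs_upsampling = target_w > current_w or target_h > current_h
print(target_w, target_h, needs_upsampling)  # 3600 2400 True
```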
Figure 4.20. ‘Image Size’ control box available in Adobe® Photoshop® CS5.
References
[1] International Color Consortium 2010 Image Technology Colour Management—Architecture,
Profile Format, and Data Structure Specification ICC.1:2010 (profile version 4.3.0.0)
[2] Commission Internationale de l’Eclairage 2006 Fundamental Chromaticity Diagram with
Physiological Axes–Part 1 CIE 170-1:2006
[3] Ohta N and Robertson A R 2005 Colorimetry: Fundamentals and Applications (New York:
Wiley)
[4] Hunt R W G and Pointer M R 2011 Measuring Colour 4th edn (New York: Wiley)
[5] Hunt R W G 1997 The heights of the CIE colour-matching functions Color Res. Appl. 22 355
[6] Stewart S M and Johnson R B 2016 Blackbody Radiation: A History of Thermal Radiation
Computational Aids and Numerical Methods (Boca Raton, FL: CRC Press)
[7] Kim Y-S, Cho B-H, Kang B-S and Hong D-I 2006 Color temperature conversion system and
method using the same US Patent Specification 7024034
[8] Robertson A R 1968 Computation of correlated color temperature and distribution
temperature J. Opt. Soc. Am. 58 1528
[9] McCamy C S 1992 Correlated color temperature as an explicit function of chromaticity
coordinates Color Res. Appl. 17 142
[10] Hernández-Andrés J, Lee R L and Romero J 1999 Calculating correlated color temperatures
across the entire gamut of daylight and skylight chromaticities Appl. Opt. 38 5703
[11] Sato K 2006 Image-processing algorithms Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 8
[12] Coffin D 2015 private communication
[13] Li X, Gunturk B and Zhang L 2008 Image demosaicing: a systematic survey Visual
Communications and Image Processing Proc. SPIE 6822 68221J
[14] Chang E, Cheung S and Pan D Y 1999 Color filter array recovery using a threshold-based
variable number of gradients Sensors, Cameras, and Applications for Digital Photography
Proc. SPIE 3650 36–43
[15] Lin C-k 2004 Pixel grouping for color filter array demosaicing (Portland State University)
[16] Hirakawa K and Parks T W 2005 Adaptive homogeneity-directed demosaicing algorithm
IEEE Trans. Image Process. 14 360
[17] Luther R 1927 Aus dem Gebiet der Farbreizmetrik (On color stimulus metrics) Z. Tech.
Phys. 12 540
[18] Hung P-C 2006 Color theory and its application to digital still cameras Image Sensors and
Signal Processing for Digital Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/
Taylor & Francis) ch 7
[19] International Organization for Standardization 2012 Graphic Technology and Photography-
Colour Target and Procedures for the Colour Characterisation of Digital Still Cameras
(DSCs) ISO 17321-1:2012
[20] Holm J 2006 Capture color analysis gamuts Proc. 14th Color and Imaging Conference IS&T
(Scottsdale, AZ) (Springfield, VA: IS&T) pp 108–13
[21] International Organization for Standardization 2009 Photography–Electronic Still-Picture
Cameras–Methods for Measuring Opto-electronic Conversion Functions (OECFs), ISO
14524:2009
[22] Hung P-C 2002 Sensitivity metamerism index for digital still camera Color Science and
Imaging Technologies Proc. SPIE 4922
Chapter 5
Camera image quality
This chapter discusses the theory and interpretation of fundamental camera image
quality (IQ) metrics. Unlike conventional treatments, this chapter also demon-
strates how camera IQ metrics should be applied and interpreted when comparing
cameras with different pixel counts and when performing cross-format compar-
isons between camera systems based on different sensor formats [1]. Various
photographic techniques for utilizing the full IQ potential of a camera are also
discussed.
Many aspects of IQ are subjective in that different observers will value a given
aspect of IQ differently. For example, image noise deemed unacceptable by one
observer may be judged to be aesthetically pleasing by another. Furthermore,
observers may judge the relative importance of various IQ aspects differently.
Nevertheless, objective IQ metrics are useful when determining the suitability of a
given camera system for a given application. Objective IQ metrics that describe
camera system capability include the following:
• Camera system modulation transfer function (MTF) and lens MTF.
• Camera system resolving power (RP) and lens RP.
• Signal-to-noise ratio (SNR).
• Raw dynamic range (DR).
In order to clarify the distinction between the two different sets of metrics listed
above, consider the example of camera system RP, which is often confused with
perceived image sharpness. Camera system RP is defined as the highest spatial
frequency that the camera and lens combination can resolve. However, this spatial
frequency will not be observed in an output photograph viewed under standard
viewing conditions, which include a standard enlargement and viewing distance. In
this case, contrast reproduction described by the camera system MTF at much lower
spatial frequencies will have a much greater impact upon perceived image sharpness.
A camera with lower camera system RP could produce images that are perceived as
sharper provided it offers superior MTF performance over the relevant range of
spatial frequencies.
Furthermore, IQ metrics can be misleading when comparing different camera
models. A simple example is SNR per photosite or sensor pixel. If the cameras being
compared have sensors with different pixel counts but are otherwise identical, the
sensor with larger photosites would naturally yield a higher SNR per photosite since
SNR increases with light-collecting area. However, a similar gain in SNR could be
achieved for the camera with smaller photosites simply by downsampling the output
digital image so that the pixel counts match. Accordingly, it is more appropriate to
compare SNR at a fixed spatial scale when comparing camera models.
Finally, it is important to appreciate that identical exposure settings will not yield
photographs with the same appearance characteristics when using cameras based on
different sensor formats since the total light received by each camera will not be the
same. Cross-format IQ comparisons should be based on a framework where each
camera receives the same amount of light. This requires equivalent rather than
identical exposure settings on each format. When equivalent exposure settings are
used, real-world IQ differences arise from characteristics of the underlying camera
and lens technologies being compared rather than the total light received. It will
become apparent that a larger format only provides a theoretical IQ advantage
when equivalent exposure settings are unachievable on the smaller format.
This chapter begins by presenting the theory behind cross-format camera IQ
comparisons. Subsequently, observer RP and the circle of confusion (CoC) are
discussed. Since the CoC relates to the viewing conditions, it is important to consider
the CoC when interpreting the camera system IQ metrics discussed in the later
sections. The final section of the chapter discusses IQ in relation to photographic
practice.
The same AFoV could be achieved by using equivalent focal lengths on each format
according to the focal-length multiplier, as described in section 1.3.6 of chapter 1.
However, this will not lead to the same DoF, the reason being that DoF depends
upon the CoC diameter, and this is format dependent for the same image display
dimensions. Another explanation for the different DoF is that the larger sensor
format will collect a greater amount of light when identical exposure settings are
used on each format. It is also instructive to note that this will result in higher total
image noise for the smaller format, assuming the sensors are equally efficient in
terms of quantum efficiency (QE) and signal processing.
However, cross-format IQ comparisons should be based on the same total
amount of light, and this requirement leads to a definition of equivalent photographs.
These are defined as photographs taken on different formats that have the following
same-appearance characteristics:
(1) Perspective.
(2) Framing or AFoV.
(3) Display dimensions.
(4) DoF.
(5) Exposure duration or shutter speed.
(6) Mid-tone JPEG lightness.
Equivalent photographs are produced using equivalent exposure settings rather than
identical exposure settings. Along with the use of equivalent focal lengths to obtain
the same AFoV, equivalent exposure settings require that equivalent f-numbers and
equivalent ISO settings be used on each format. These are defined as follows:
$$f_2 = \frac{f_1}{R}, \qquad N_2 = \frac{N_1}{R}, \qquad S_2 = \frac{S_1}{R^2}. \quad (5.1)$$
The 1 and 2 subscripts, respectively, denote the larger and smaller formats. The
focal-length multiplier, crop factor or equivalence ratio R is defined as follows:
$$R = \frac{d_1}{d_2}. \quad (5.2)$$
d2
Here d1 is the length of the larger sensor diagonal and d2 is the length of the smaller
sensor diagonal. For example, if d1 = d then d2 = d/R.
Consider a lens with focal length f1 = 100 mm used on the 35 mm full-frame
format. For an example set of exposure settings on the 35 mm full frame format,
table 5.1 lists the equivalent focal lengths and equivalent exposure settings for a
selection of smaller sensor formats.
Finally, note that precise equivalence can only be obtained if the sensor formats
have the same aspect ratio. Furthermore, it will be shown in sections 5.1.2 and 5.1.3
that the above equivalence formulae are strictly valid only when focus is set at
infinity and that generalised equations are needed at closer focus distances.
Table 5.1. Example equivalent exposure settings when focus is set at infinity. Focal lengths and f-numbers
have been rounded to one and two decimal places, respectively.
Format R f (mm) N S
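Entries of the kind listed in table 5.1 follow directly from equations (5.1) and (5.2). The sketch below generates them for an assumed set of full-frame exposure settings (f1 = 100 mm, N1 = 8, S1 = 800, chosen for illustration only) and for approximate crop factors; exact R values depend on the precise sensor dimensions of each format.

```python
def equivalent_settings(f1_mm, n1, s1, crop_factor):
    # Equations (5.1): f2 = f1/R, N2 = N1/R, S2 = S1/R^2 (focus set at infinity).
    r = crop_factor
    return f1_mm / r, n1 / r, s1 / r**2

# Assumed full-frame settings and illustrative crop factors R = d1/d2.
f1, n1, s1 = 100.0, 8.0, 800
for fmt, r in [("APS-C", 1.5), ("Micro Four Thirds", 2.0), ("1 inch", 2.7)]:
    f2, n2, s2 = equivalent_settings(f1, n1, s1, r)
    print(f"{fmt:18s} R = {r:.1f}  f = {f2:5.1f} mm  N = {n2:.2f}  S = {s2:.0f}")
```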
It follows that a smaller format will in principle provide the same IQ as a larger
format when equivalent exposure settings are used. In this case:
• Real-world noise differences will arise from contrasting sensor technologies,
which are mainly characterised by the QE and read noise.
• Real-world resolution differences will arise from factors such as lens aberra-
tion content and sensor pixel count.
• Other real-world factors that affect IQ include the default JPEG tone curve
and image processing.
As an example of the first type, consider an action photography scenario where the
lens on the larger format is set at a low f-number to isolate the subject from the
background and provide an exposure duration short enough to freeze the appear-
ance of the moving subject. If an equivalent photograph is attempted using the
smaller format but an equivalent f-number does not exist, the smaller format will
produce a photograph with a deeper DoF and will be forced to underexpose in order
to match the exposure duration used on the larger format. In this case the extra
exposure utilised by the larger format will in principle lead to a higher overall SNR.
Furthermore, the cut-off frequency due to diffraction will be higher on the larger
format since the lens EP diameter will be larger. This potentially offers greater image
resolution.
As an example of the second type, consider a landscape photography scenario
where the larger format camera is set at the base ISO setting. If an equivalent
photograph is attempted using the smaller format but an equivalent ISO setting does
not exist, then the smaller format will be unable to produce a photograph with a
sufficiently long exposure duration without overexposing, in which case the mid-
tone lightness of the output JPEG image will be incorrect. In other words, the
smaller format will be unable to provide sufficient long-exposure motion blur. In this
case the extra exposure utilised by the larger format will again in principle lead to a
higher overall SNR.
Here b1 and b2 are the bellows factors for the larger and smaller formats,
respectively. These are defined by equations (1.31) and (1.19) of chapter 1,
$$b_1 = 1 + \frac{f_1}{m_p (s_1 - f_1)} \qquad \text{and} \qquad b_2 = 1 + \frac{f_2}{m_p (s_2 - f_2)}.$$
Significantly, b1 ≠ b2 whenever focus is set closer than infinity since f1 ≠ f2. The
bellows factors are equal only in the special case that focus is set at infinity. In this
limit, b1 = b2 = 1 and so equation (5.3) reduces to equation (5.1).
However, a major problem with the practical application of equation (5.3) arises
from the fact that working f-numbers and working focal lengths depend on the OP
distance, i.e. the distance to the plane upon which focus is set. Unless focus is set at
infinity, the working values are not equal to the values actually marked on the lenses
of the camera formats being compared.
References [1, 2] solve this problem by reformulating equation (5.3) in the
following way:
$$f_2 = \frac{f_1}{R_w}, \qquad N_2 = \frac{N_1}{R_w}, \qquad S_2 = \frac{S_1}{R^2}. \quad (5.4)$$
Here Rw is a new quantity known as the working equivalence ratio. This quantity
replaces the conventional equivalence ratio or ‘crop factor’, R. It is defined as
follows:
$$R_w = \left(\frac{b_2}{b_1}\right) R. \quad (5.5)$$
The example equivalent exposure settings for a 100 mm lens on the 35 mm full-
frame format listed in table 5.1 are repeated on the left-hand side (LHS) of table 5.2.
The right-hand side (RHS) of table 5.2 shows the required equivalent exposure
settings at 1:1 magnification ($|m| = 1$), and $m_p = 1$. From the formula $|m| = f/(s - f)$, this corresponds to an OP distance $s = 2 f_1 = 200$ mm.
The results shown in the RHS of table 5.2, which were calculated using equations
(5.4) and (5.5), are seen to be very different compared with those on the LHS, which
were obtained using the conventional equivalence equations (5.1) and (5.2) with
focus set at infinity. Much larger equivalent focal lengths and equivalent f-numbers
are required on the smaller formats than would be expected using the conventional equivalence ratio.
Figure 5.1. Rw /R plotted as a function of OP distance for a selection of sensor formats when format 1 is 35 mm
full frame. The OP distance has been expressed in units of the focal length used on format 1.
Table 5.2. The LHS shows example equivalent exposure settings (f, N, S) calculated using equations (5.1) and
(5.2) with focus set at infinity. The RHS uses equations (5.4) and (5.5) to show how the required settings change
when focus is set at an OP distance s = 200 mm, which corresponds to the macro regime (∣m∣ = 1 and mp = 1).
Since focus is set closer than infinity, these settings are related via Rw rather than R. Focal lengths and
f-numbers have been rounded to one and two decimal places, respectively.
• (4) The same DoF: This section derives the condition for producing images
with the same DoF (at the same perspective and framing) from different
formats, specifically a formula relating the equivalent f-numbers required. It
is shown that the equivalent f-numbers are related by Rw rather than R when
focus is set closer than infinity. It is also proven that the same EP diameter is
required on each format.
• (5) The same shutter speed: This section discusses the requirement that
equivalent photographs must be produced using the same exposure duration
or shutter speed.
• (6) The same mid-tone JPEG lightness: This section derives the condition for
producing output images from different formats with the same mid-tone
lightness, specifically a formula relating the equivalent ISO settings required
on each format.
Figure 5.2. For a general pupil magnification, the vector distance s1 − s EP,1 for the larger format (upper figure)
and vector distance s2 − s EP,2 for the smaller format (lower figure) must be equal in order to achieve the same
perspective and framing. In the diagram, OP is the object plane, H and H’ are the first and second principal
planes of the compound lens, EP is the entrance pupil, XP is the exit pupil, SP is the sensor plane, and f ′ is the
rear effective focal length. The principal planes and pupils are not required to be in the order shown.
The bellows factor depends upon both m p and the magnification ∣m∣ defined
by equation (1.19) of chapter 1,
$$|m| = \frac{f}{s - f}.$$
Recall from chapter 1 that when focus is set at infinity, s → ∞ and so ∣m∣ → 0
and b → 1. At closer focus distances (i.e. as the OP upon which focus is set is
brought forward from infinity) and assuming the use of a traditional-focusing lens, the value of b gradually increases from unity. For a fixed focal length, the AFoV therefore becomes smaller. Consequently, the object appears to be larger than expected, particularly at close-focusing distances. Different behaviour may occur for internally-focusing lenses; these change
their focal length at closer focus distances and so the ‘new’ focal length must
be used in the bellows factor and AFoV formulae. In all cases, s is the OP
distance measured from the first principal plane after focus has been set.
Consider format 1 with a sensor diagonal d and lens with front effective
focal length f1 focused at an OP distance s1 measured from the first principal
plane. The AFoV and bellows factor are
$$\alpha_1 = 2\tan^{-1}\!\left(\frac{d}{2\, b_1 f_1}\right), \qquad b_1 = 1 + \frac{|m_1|}{m_p}.$$
Now consider format 2 with a smaller sensor diagonal d/R, where R is the
equivalence ratio defined by equation (5.2),
$$R = \frac{d_1}{d_2}.$$
Assume the lens has front effective focal length f2 and is focused on the same
OP positioned a distance s2 from the first principal plane. In this case, the
AFoV and bellows factor are defined as follows:
$$\alpha_2 = 2\tan^{-1}\!\left(\frac{d}{2 R\, b_2 f_2}\right), \qquad b_2 = 1 + \frac{|m_2|}{m_p}. \quad (5.7)$$
The requirement that the two systems have the same AFoV demands that
$\alpha_1 = \alpha_2$ and therefore
$$b_1 f_1 = R\, b_2 f_2.$$
This can be rewritten as follows:
$$f_{2,w} = \frac{f_{1,w}}{R}, \quad (5.8)$$
where f1,w = b1 f1 and f2,w = b2 f2 are the equivalent working focal lengths.
As discussed in section 5.1.2, practical application of equivalence theory
requires use of the actual focal lengths marked on the lenses of the camera
formats being compared rather than the working values. The way forward is
to rearrange equation (5.8) in the following manner:
$$f_2 = \frac{f_1}{R_w}. \quad (5.9)$$
Here f1 and f2 are the equivalent focal lengths as marked on the lenses, and
Rw is the ‘working’ equivalence ratio defined by equation (5.5) as previously
introduced in section 5.1.2,
$$R_w = \left(\frac{b_2}{b_1}\right) R.$$
In terms of the marked focal length $f_1$ and the OP distance $s_1$ on the larger format, algebraic manipulation leads to the following result:
$$R_w = \left(1 - \frac{m_{c,1}}{p_{c,1}}\right) R. \quad (5.11)$$
The correction mc,1 arises due to the differing system magnifications, and the
correction pc,1 arises for a non-unity pupil magnification. These corrections
are defined by the following expressions:
$$m_{c,1} = \left(\frac{R - 1}{R}\right)\frac{f_1}{s_1}, \qquad p_{c,1} = m_p + (1 - m_p)\,\frac{f_1}{s_1}.$$
If f2 is known instead, which is the marked focal length used on the smaller
format, then s1 and f1 need to be eliminated from equation (5.11). Algebraic
manipulation leads to the following result:
$$R_w = \frac{R}{1 + \left(\dfrac{m_{c,2}}{p_{c,2}}\right) R}.$$
Again the correction mc,2 arises due to the differing system magnifications,
and the correction pc,2 arises for a non-unity pupil magnification. These are
defined by the following expressions:
$$m_{c,2} = \left(\frac{R - 1}{R}\right)\frac{f_2}{s_2}, \qquad p_{c,2} = m_p + (1 - m_p)\,\frac{f_2}{s_2}.$$
For the special case of a symmetric lens design with m p = 1, the separation
terms defined by equation (5.10) vanish and the terms pc,1 and pc,2 are both
unity. In this case, the OP distances measured from the first principal plane
will be identical for each format when equivalent photos are taken, and so
s1 = s2 = s.
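A short numerical sketch of this special case follows. It revisits the 1:1 macro example of table 5.2 ($m_p = 1$, $f_1 = 100$ mm, $s = 200$ mm) with illustrative crop factors, and checks that the expression for $R_w$ in terms of $f_1$ and $s_1$ agrees with the expression in terms of $f_2$ and $s_2$.

```python
def rw_from_format1(r, f1, s1):
    # Rw = (1 - m_c1/p_c1) R with m_c1 = ((R-1)/R) f1/s1 and p_c1 = 1 when m_p = 1.
    m_c1 = ((r - 1.0) / r) * (f1 / s1)
    return (1.0 - m_c1) * r

def rw_from_format2(r, f2, s2):
    # Rw = R / (1 + (m_c2/p_c2) R) with m_c2 = ((R-1)/R) f2/s2 and p_c2 = 1 when m_p = 1.
    m_c2 = ((r - 1.0) / r) * (f2 / s2)
    return r / (1.0 + m_c2 * r)

f1, s = 100.0, 200.0            # |m| = f1/(s - f1) = 1, i.e. the 1:1 macro regime
for r in (1.5, 2.0):            # illustrative crop factors
    rw = rw_from_format1(r, f1, s)
    f2 = f1 / rw                # equation (5.9)
    assert abs(rw - rw_from_format2(r, f2, s)) < 1e-12   # the two forms agree
    print(f"R = {r:.1f}: Rw = {rw:.3f}, f2 = {f2:.1f} mm (conventional f1/R = {f1 / r:.1f} mm)")
```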
(3) The same display dimensions
Since equivalent photographs must have the same display dimensions, the CoC diameter on each format scales inversely with the enlargement factor from that sensor to the displayed image:
$$c_1 \propto \frac{1}{X_1}, \qquad c_2 \propto \frac{1}{X_2}.$$
This is discussed in much greater detail in sections 5.2.3 and 5.2.4.
Since X2 = RX1 where R = d1/d2 is the equivalence ratio between the
sensor diagonals, it follows that the relationship between the CoC diameters
for the respective formats being compared is given by
$$c_2 = \frac{c_1}{R}. \quad (5.12)$$
This important result will be utilized below.
(4) The same depth of field
The total DoF can be expressed using equation (1.40) of chapter 1,
$$\text{total DoF} = \frac{2 h (s - f)(s - s_{EP})}{h^2 - (s - f)^2}, \qquad \text{where} \qquad h = \frac{D f}{c}.$$
Here D is the EP diameter and c is the CoC diameter. This equation is
strictly valid only for s ⩽ H where H = h + f − sEP is the hyperfocal distance
measured from the EP. When s = H, the rear DoF and therefore the total
DoF both extend to infinity.
Consider camera system 1 with sensor diagonal d, focal length f1, EP
diameter D1, CoC diameter c1, and consider an OP positioned at a distance
s1 from the first principal plane. The total DoF is given by
$$\mathrm{DoF}_1 = \frac{2 h_1 (s_1 - f_1)(s_1 - s_{EP,1})}{h_1^2 - (s_1 - f_1)^2}, \qquad \text{where} \qquad h_1 = \frac{D_1 f_1}{c_1}.$$
Also consider camera system 2 with a smaller sensor diagonal d/R, focal
length f2, EP diameter D2, CoC diameter c2, and consider the same OP
positioned at a distance s2 from the first principal plane. The total DoF is
given by
$$\mathrm{DoF}_2 = \frac{2 h_2 (s_2 - f_2)(s_2 - s_{EP,2})}{h_2^2 - (s_2 - f_2)^2}, \quad (5.13)$$
where
$$h_2 = \frac{D_2 f_2}{c_2}. \quad (5.14)$$
Since the same perspective and framing are required at an arbitrary focus
distance, the relationship between f2 and f1 must satisfy equation (5.9),
$$f_2 = \frac{f_1}{R_w},$$
where Rw is the working equivalence ratio. Furthermore, the fact that
equivalent photographs must have the same display dimensions requires
that the CoC diameters be related by equation (5.12),
$$c_2 = \frac{c_1}{R}.$$
Substituting the above expressions for c2 and f2 into equation (5.14) yields
$$h_2 = \frac{R D_2 f_1}{R_w c_1}.$$
Finally, substituting f2 and h2 into equation (5.13) and working through the
algebra leads to the following result:
$$\mathrm{DoF}_2 = \mathrm{DoF}_1$$
provided the following condition is satisfied:
$$D_2 = D_1. \quad (5.15)$$
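This result can be verified numerically. The sketch below assumes a symmetric lens ($m_p = 1$) with the EP at the first principal plane (so $s_{EP} = 0$), illustrative values $f_1 = 100$ mm, $D = 25$ mm, $c_1 = 0.030$ mm, an OP distance of 2 m and $R = 2$, and confirms that equal EP diameters give the same total DoF on both formats.

```python
def total_dof(D, f, c, s, s_ep=0.0):
    # Total DoF = 2 h (s - f)(s - s_EP) / (h^2 - (s - f)^2), with h = D f / c (valid for s <= H).
    h = D * f / c
    return 2.0 * h * (s - f) * (s - s_ep) / (h**2 - (s - f) ** 2)

# Illustrative values for the larger format (all lengths in mm).
f1, D, c1, s, R = 100.0, 25.0, 0.030, 2000.0, 2.0

# Same perspective and framing at close focus: f2 = f1/Rw with Rw = (b2/b1) R and
# b = s/(s - f) when m_p = 1, which rearranges to the closed form below.
f2 = f1 * s / (R * (s - f1) + f1)
c2 = c1 / R        # equation (5.12): same display dimensions
D2 = D             # equation (5.15): same EP diameter

print(total_dof(D, f1, c1, s), total_dof(D2, f2, c2, s))  # both roughly 91 mm, and equal
```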
(5) The same shutter speed
This requirement does not specify an appropriate shutter speed, but merely
states that it must be the same for each camera format. The required shutter
speed depends upon the nature of the scene luminance distribution and the
exposure strategy.
(6) The same mid-tone JPEG lightness
Recall from section 2.6.2 of chapter 2 that the ISO setting determines the
sensitivity of the JPEG output to incident photometric exposure, specif-
ically the mid-tone value with digital output level (DOL) = 118, which
corresponds to middle grey. Since different formats receive different
levels of photometric exposure when equivalent photographs are taken,
the resulting mid-tone lightness of the digital images will not be the same
unless equivalent ISO settings are used rather than the same ISO settings.
In order to derive the ISO equivalence relationship, first recall from equation
(5.15) that the EP diameters on each format are the same when equivalent
photographs are taken. Since the shutter speeds are also required to be the same,
it follows that equivalent photographs are produced using the same total amount of
light. More specifically, the total luminous energy Q incident at the SP of both
formats will be the same:
Q1 = Q2 . (5.17)
In order to rigorously derive this result within Gaussian optics, consider the
photometric exposure at an infinitesimal area element on the SP. This is defined
by the camera equation derived in section 1.5 of chapter 1:
$$H = \frac{\pi}{4}\, L\, \frac{t}{N_w^2}\, T \cos^4\phi. \quad (5.18)$$
Here L is the scene luminance at the corresponding scene area element. The cosine
fourth factor describes the natural reduction in illuminance at the SP that occurs
away from the optical axis (OA), and it depends upon the angle subtended by the EP
from the scene area element being considered. The factor T is the lens transmittance
factor, and t is the shutter speed. Recall that the working f-number Nw depends on
the OP distance and can be expressed in the form Nw = bN, where b is the bellows
factor. When focus is set at infinity, b → 1 and the working f-number reduces to the
f-number, Nw → N.
Now consider a larger format labelled format 1 and a smaller format labelled
format 2. From equation (5.18), the photometric exposure H1 at an infinitesimal area
element dA1 on the larger format sensor with focus set on a specified OP is given by
$$H_1 = \frac{\pi}{4}\, L\, \frac{t}{N_{w,1}^2}\, T \cos^4\phi. \quad (5.19)$$
As illustrated in figure 5.3, the area A2 of the smaller format sensor is a factor R2
smaller than A1:
$$A_2 = \frac{A_1}{R^2}, \qquad dA_2 = \frac{dA_1}{R^2}. \quad (5.24)$$
Substituting equations (5.22) and (5.24) into (5.23) proves that the total luminous
energies incident at the SP of both camera formats are equal when equivalent
photographs are taken, which is consistent with equation (5.17).
Now consider the arithmetic average photometric exposures 〈H1〉 and 〈H2〉 for
both formats. These are defined as
〈H1〉 = Q1/ A1, 〈H2〉 = Q2 / A2 .
Figure 5.3. The area of sensor format 1 is $d_x d_y$ and the area of sensor format 2 is $(d_x/R)(d_y/R)$. Format 2 is therefore a factor $R^2$ smaller than format 1.
As described in chapter 2, the product of the arithmetic average exposure with the
exposure index specified by the ISO setting S defines a photographic constant P that
is independent of sensor format. The ISO 12232 standard [4] uses P = 10, which
indirectly implies an average scene luminance of approximately 18% of the maximum for a typical
photographic scene. This means that
〈H1〉S1 = 〈H2〉S2 = P. (5.26)
Now combining equations (5.25) and (5.26) yields the required ISO equivalence
relationship:
$$S_2 = \frac{S_1}{R^2}. \quad (5.27)$$
The required ISO setting on the smaller format is therefore a factor R2 lower than
the required ISO setting on the larger format when equivalent photographs are
taken. Equation (5.27) holds when focus is set at any chosen OP distance provided
equivalent f-numbers and focal lengths are being used.
As described in sections 2.2.3 and 2.6.2 of chapter 2, Japanese camera manu-
facturers are required to use the standard output sensitivity (SOS) method to
determine camera ISO settings [4, 5]. The SOS method is based on a measurement
of the photometric exposure required to map 18% relative luminance to DOL = 118
in an output 8-bit JPEG file encoded using the sRGB colour space. This DOL
corresponds with middle grey (50% lightness) on the standard encoding gamma
curve of the sRGB colour space, and so 18% relative luminance will always map to
50% lightness in the output JPEG file, irrespective of the shape of the JPEG tone
curve used by the in-camera image-processing engine.
In section 2.5 of chapter 2 it was argued that the value of the photographic
constant P = 10 corresponds to assuming that the average scene luminance will be
approximately 18% of the maximum for a typical photographic scene metered using
average photometry. This means that the average scene luminance will map to
middle grey in the output JPEG file provided SOS is used to define the ISO setting.
When equivalent photographs are taken using different formats, the average
photometric exposures at the SP of the formats are related via equation (5.25).
Consequently, the mid-tone lightness of equivalent photographs will correspond to
the standard value of middle grey provided equivalent ISO settings are used on the
respective formats according to equation (5.27).
equal width. As the width of the lines decreases, the stripes eventually become
indistinguishable from a grey block.
• The least resolvable separation (LRS) in mm per line pair is the minimum
distance between the centres of neighbouring white stripes or neighbouring
black stripes when the pattern can still just be resolved by the eye.
• Observer resolving power (observer RP) is the reciprocal of the LRS [6] and is
measured in line pairs per mm (lp/mm),
$$\mathrm{RP} = \frac{1}{\mathrm{LRS}}.$$
The term ‘high resolution’ should be interpreted as referring to a high RP and a
small LRS.
Observer RP depends upon the viewing distance. The distance at which observer
RP is considered to be at its optimum is known as the least distance of distinct vision,
Dv , which is generally taken to be 250 mm or 10 inches [7, 8]. However, observer RP
also depends upon the visual acuity of the individual and the ambient conditions. At
Dv , camera and lens manufacturers typically assume a value of around 5 lp/mm
when defining DoF scales [8].
Note that an observer RP of 5 lp/mm corresponds to 127 lp/inch, or equivalently
254 ppi (pixels per inch) for a digital image. This is the reason that 300 ppi is
considered sufficient image display resolution for a high quality print viewed under
the standard viewing conditions described in the following section.
The standard value for the enlargement factor is based upon the 60° cone of vision that defines the limits of near peripheral vision. At the least distance of distinct vision $D_v = 250$ mm, the cone of vision roughly forms a circle of diameter $2 D_v \tan 30° \approx 288$ mm. If it is assumed that the width of the viewed image corresponds
with this diameter, then the enlargement factor from a 35 mm full-frame sensor will
be 8. This is shown in figure 5.4, where the full-frame sensor dimensions (36 × 24 mm)
have been enlarged by a factor of 8. The closest paper size that accommodates a print
of this size is A4.
Figure 5.4. A viewing circle of approximately 60° diameter defines the limits of near peripheral vision. This is
shown in relation to a 3 × 2 image (green) printed on A4 (210 × 297 mm) paper (black border) viewed at the
least distance of distinct vision Dv .
If the observer can resolve 5 lp/mm when viewing the output image at Dv and the
enlargement factor is 8, then the observer RP projected down to the sensor
dimensions becomes 40 lp/mm [8]. Mathematically,
RP(sensor dimensions) = RP(print viewed at Dv ) × X . (5.28)
Here ‘print’ refers to an image either printed or viewed on a display, and the value
RP(sensor dimensions) refers to the observer RP and not the camera system RP.
Nevertheless, using the 35 mm full-frame format as an example, it is clear that the
camera system does not need to resolve spatial frequencies higher than RP(sensor dimensions) = 40 lp/mm when the output image is viewed under standard viewing conditions. Such detail cannot be resolved by the observer when the image
has been enlarged by a factor of 8 and viewed at Dv = 250 mm.
$$c = \frac{1.22}{\text{RP(sensor dimensions)}}. \quad (5.29)$$
Figure 5.5. The required CoC diameter on the sensor is slightly wider than the least resolvable separation
(LRS) between neighbouring like stripes. A convolution of the CoC with the line pattern renders the stripes
unresolvable to an observer of the output photograph under the specified viewing conditions.
This relationship is illustrated graphically in figure 5.5, and a derivation of the above
expression is given in section 5.2.5. Details separated by a distance less than the CoC
diameter on the SP cannot be resolved by the observer of the output image.
Recall from the previous section that RP(sensor dimensions) = 40 lp/mm for a
35 mm full-frame camera under standard viewing conditions. In this case,
c = 0.030 mm according to equation (5.29). Smaller or larger sensors require a
smaller or larger diameter c, respectively, since the enlargement factor X in equation
(5.28) will change accordingly. Table 1.2 of chapter 1 lists standard CoC diameters
for various sensor formats. These values correspond to standard viewing conditions.
$$c = \frac{1.22 \times L}{\text{RP(print viewed at } D_v) \times X \times D_v}. \quad (5.30)$$
Here L is the known viewing distance and X is the known enlargement factor.
Evidently the CoC diameter, c, is proportional to the viewing distance and inversely
proportional to the enlargement factor. If these scale in the same manner, then c will
remain constant. If only the viewing distance increases, then c will increase and more
defocus blur can be tolerated.
If L is reduced and X is increased, the viewed image may no longer fit within a
comfortable viewing circle. For example, a poster sized print situated close to the
observer cannot be accommodated by the cone of vision. In this case, the CoC
diameter will be very small. The extreme case of an image viewed at 100% on a
computer display may be beyond the practical limit for a CoC [9].
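The chain of reasoning above amounts to a few lines of arithmetic, sketched below. The standard-condition numbers reproduce the full-frame values quoted in the text; the poster example uses an assumed viewing distance and enlargement factor purely for illustration.

```python
import math

# Standard viewing conditions.
D_v = 250.0                   # least distance of distinct vision (mm)
rp_print = 5.0                # observer RP on the viewed image at D_v (lp/mm)
viewing_circle = 2.0 * D_v * math.tan(math.radians(30.0))   # about 288 mm across
X = viewing_circle / 36.0     # enlargement factor from a 36 mm wide full-frame sensor (about 8)

rp_sensor = rp_print * X      # equation (5.28): about 40 lp/mm
c_standard = 1.22 / rp_sensor # equation (5.29): about 0.030 mm
print(round(rp_sensor, 1), round(c_standard, 4))

# General case, equation (5.30): a hypothetical poster enlarged 20x but viewed at only 500 mm.
L, X_big = 500.0, 20.0
c_general = (1.22 * L) / (rp_print * X_big * D_v)
print(round(c_general, 4))    # a much smaller CoC: less defocus blur can be tolerated
```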
$$c = \frac{1.22}{\text{RP(sensor dimensions)}}.$$
Here the CoC has been treated as a uniform blur circle within Gaussian optics. Given
a line pattern on the SP with spatial frequency equal to RP(sensor dimensions), a
convolution of the line pattern with the CoC will render the pattern unresolvable.
In order to derive the above equation, the CoC needs to be treated mathemati-
cally as a circle or ‘circ’ function. A circle function was used to describe the lens XP
in section 3.2.4 of chapter 3, and it was found that the optical transfer function
(OTF) corresponding to a circle function is a jinc function. The OTF for the CoC is
analogously given by the following expression:
$$\mathrm{OTF}_{\mathrm{CoC}}(\mu_r) = \mathrm{jinc}(\pi c \mu_r).$$
Here μr is the radial spatial frequency on the SP. The OTF for an example CoC is
plotted in figure 5.6. If the first zero of the OTF defines the CoC cut-off frequency
μc,CoC , then
$$\pi c\, \mu_{c,\mathrm{CoC}} = 1.22\,\pi,$$
and equation (5.29) then follows. Using the first zero of the OTF to define μc,CoC is
equivalent to assuming that an MTF value of 0% corresponds with the observer RP
projected down to the sensor dimensions, where MTF = ∣OTF∣.
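The statements above can be checked numerically. The sketch below evaluates the jinc OTF of a uniform CoC using the first-order Bessel function from SciPy and locates its first zero, which falls close to 1.22/c (about 40 lp/mm for c = 0.030 mm, as in figure 5.6).

```python
import numpy as np
from scipy.special import j1

def coc_otf(mu_r, c):
    # OTF of a uniform blur circle of diameter c: jinc(pi c mu_r) = 2 J1(x)/x with x = pi c mu_r.
    x = np.pi * c * np.asarray(mu_r, dtype=float)
    safe_x = np.where(x == 0.0, 1.0, x)
    return np.where(x == 0.0, 1.0, 2.0 * j1(safe_x) / safe_x)

c = 0.030                                    # CoC diameter in mm
mu = np.linspace(0.0, 80.0, 8001)            # spatial frequency axis in lp/mm
otf = coc_otf(mu, c)

cutoff = mu[np.argmax(otf <= 0.0)]           # first zero crossing of the OTF
print(round(cutoff, 1), round(1.22 / c, 1))  # both close to 40.7 lp/mm
```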
Figure 5.6. OTF for a uniform CoC with diameter c = 0.030 mm. The cut-off frequency is found to be 40 lp/mm.
According to the above analysis, the required CoC diameter is slightly wider than the narrowest line pair that needs to be resolved at the SP. This is illustrated graphically in figure 5.5. However, there is latitude in the value of the numerator of equation (5.29), depending on the criterion used for the cut-off frequency, $\mu_{c,\mathrm{CoC}}$. For example, the 1.22 factor in the numerator can be dropped if RP(sensor dimensions) is instead defined as the spatial frequency at which the MTF value drops to 20% rather than zero. In this case, $\mu_{c,\mathrm{CoC}}$ would become
$$\mu_{c,\mathrm{CoC}} = \frac{1}{c}.$$
In fact, both definitions are approximate since observer RP is measured using a solid
line pattern rather than a sinusoidal line pattern. The contrast transfer function
(CTF) describes the frequency response of a square wave pattern, and this can be
related to the MTF of a sinusoidal pattern through Coltman’s formula [10],
$$\mathrm{CTF}(\mu) = \frac{4}{\pi}\left(\mathrm{MTF}(\mu) - \frac{\mathrm{MTF}(3\mu)}{3} + \frac{\mathrm{MTF}(5\mu)}{5} + \cdots\right).$$
In this book, equation (5.29) is used to define the CoC diameter. In practice, the
criterion used for the cut-off frequency is much less significant than the value
assumed for the observer RP. When comparing different sensor formats, the relative
size of the CoC is the most important factor.
Denoting the depth of focus by W, combining the above equations leads to the
following result:
$$W = \left(\frac{n'}{n}\right) 2\, c\, N_w.$$
Figure 5.7. Geometry defining the depth of focus W. Here $D_{XP}$ is the XP diameter, $s'$ is the image distance measured from the second principal plane, $s'_{XP}$ is the distance from the second principal plane to the XP, m is the Gaussian magnification, $m_p$ is the pupil magnification, and $f'$ is the rear effective focal length.
Recall that a PSF can be thought of as a blur filter. The convolution operation
used in linear systems theory slides the PSF over the ideal (unblurred) image to
produce the real convolved (blurred) output image. The shape of a given PSF
determines the nature of the blur that it contributes, and most of the blur strength is
concentrated in the region close to the ideal image point. However, PSFs are difficult
to describe in simple numerical terms.
For incoherent lighting, the image at the SP can be interpreted as a linear
combination of sinusoidal irradiance waveforms of various spatial frequencies. This
suggests that aspects of IQ can be described by the imaging of sinusoidal target
objects such as stripe patterns. Significantly, the real image of a sinusoid will always
be another sinusoid, irrespective of the shape of the PSF. Furthermore, the direction
and spatial frequency of the real sinusoidal image will remain unchanged [7].
However, its contrast or modulation will be reduced compared to that of the ideal
image formed in the absence of the PSF. Since the MTF is the modulus of the FT of
the PSF, the MTF precisely describes how the modulation is attenuated as a
function of spatial frequency.
The MTF provides a useful quantitative description of IQ. As well as lenses,
various other components of the camera system can be described by their own MTF,
and the system MTF can be straightforwardly calculated by multiplying these
individual component MTFs together. The system MTF can be used to define
system RP, along with other metrics such as perceived image sharpness.
This section begins with a very brief description of lens aberrations, which
profoundly affect the nature of the lens PSF and various aspects of the correspond-
ing lens MTF. This is followed by an introduction to lens MTF plots. Based on the
lens MTF, the final section discusses lens RP. System MTF and system RP will be
discussed in sections 5.4 and 5.5, respectively.
Figure 5.8. (Left) Spatial frequency representation of lens MTF for three selected image heights. (Right) Image
height representation of lens MTF for three selected spatial frequencies. The example points indicated by
circles, squares and diamonds show identical data plotted using the two different representations. (Reproduced
from [9] with kind permission from ZEISS (Carl Zeiss AG).)
Figure 5.9. Image height is defined as the radially symmetric distance from the image centre and can be
measured out to the lens circle. The short and long edges of a full-frame 36 × 24 mm sensor occur at image
heights or radial positions 18 mm and 12 mm, respectively.
Spatial frequencies higher than 40 lp/mm on the 35 mm full-frame format are not relevant unless the image is viewed more critically.
Accordingly, for a lens circle covering a 35 mm full frame sensor, a typical set of
spatial frequencies may include 10 lp/mm, 20 lp/mm and 40 lp/mm.
Although lens PSFs are generally not circular, the rotational symmetry of the lens
dictates that the shortest or longest elongations of the PSF will always be parallel or
perpendicular to the radius of the lens circle [7]. Different MTF curves will be
obtained depending on whether the line pattern used to measure MTF is oriented
perpendicular or parallel to the radius of the lens circle. These are known as the
tangential (or meridional) and the sagittal (or radial) directions, respectively. These
directions are indicated in figure 5.9, and usually both of these types of curve are
shown in lens MTF plots. Typically the longest elongation of the PSF is parallel to
the sagittal direction, in which case the sagittal MTF curve will have the higher
MTF. Tangential and sagittal curves that are similar in appearance are indicative of
a circular lens PSF.
The ideal Gaussian image position for the lens may not coincide precisely with the
camera SP. For example, field curvature may be present along with focus shift to
counterbalance spherical aberration (SA). Lens MTF curves are sensitive to the
plane of focus chosen because different parts of the image field will be affected
differently as the image plane is shifted [7]. Moreover, lens MTF data is dependent
upon the nature of the light used for the calculation or used to take the measure-
ment. Data representing a single wavelength can be dramatically different from data
representing polychromatic light [7].
Ideal lens MTF curves will have high values that remain constant as the image
height or radial field position changes. Such ideal curves are rarely seen in practice,
particularly at the maximum aperture or lowest f-number due to the increased
severity of the residual aberrations. Experience is required to fully interpret the
information present in lens MTF curves. Figure 5.10, which has been reproduced
from [7], shows curves for the Zeiss Planar 1.4/50 ZF lens along with the type of
Figure 5.10. Lens MTF for the Zeiss Planar 1.4/50 ZF as a function of 35 mm full-frame image height for
polychromatic white light. The spatial frequencies shown are 10, 20 and 40 lp/mm. The dashed curves are the
tangential orientation and the lens is focused at infinity. (Figure reproduced from [7] with kind permission.)
information that can be extracted. References [7, 9] provide useful guidance for
interpreting MTF curves of real lenses.
$$\mathrm{RP} = \frac{1}{\mathrm{LRS}}.$$
Lens RP is formally defined as the spatial frequency at which the lens MTF first
drops to zero, and is again expressed using units such as lp/mm. As discussed below,
in practice a small percentage MTF value is used instead of zero.
Diffraction places a fundamental limit on achievable lens RP referred to as the
diffraction limit. Since the camera system MTF is the product of the individual
component MTFs, the RP of the camera system as a whole cannot exceed the
diffraction limit.
The diffraction limit can be defined in a precise mathematical way in the Fourier
domain as the spatial frequency at which the diffraction component of the optics or
lens MTF drops to zero for a sinusoidal target waveform. The expression for the
diffraction OTF was derived in section 3.2 of chapter 3:
$$H_{\mathrm{diff,circ}}(\mu_r, \lambda) = \begin{cases} \dfrac{2}{\pi}\left[\cos^{-1}\!\left(\dfrac{\mu_r}{\mu_c}\right) - \dfrac{\mu_r}{\mu_c}\sqrt{1 - \left(\dfrac{\mu_r}{\mu_c}\right)^{2}}\,\right] & \text{for } \dfrac{\mu_r}{\mu_c} \leqslant 1 \\[2ex] 0 & \text{for } \dfrac{\mu_r}{\mu_c} > 1 \end{cases} \quad (5.31)$$
This expression is valid for incoherent illumination so that the PSF and sinusoidal
target waveforms at the SP are linear in terms of irradiance, and the aperture has
been assumed to be circular. The diffraction MTF drops to zero at the Abbe cut-off
frequency μc,diff measured in cycles/mm. Spatial frequencies higher than μc,diff at the
SP cannot be resolved by the lens. The Gaussian expression for μc,diff is given by
$$\mu_{c,\mathrm{diff}} = \left(\frac{n}{n'}\right)\frac{1}{\lambda N_w}. \quad (5.32)$$
This reduces to the following well-known expression when n = n′ and focus is set at
infinity:
$$\mu_{c,\mathrm{diff}} = \frac{1}{\lambda N}.$$
In chapter 3, a lens free from aberrations was described as being diffraction limited. In
this case lens performance is limited only by diffraction and so its MTF obeys equation
(5.31). For a lens with residual aberrations, the MTF value at any given spatial
frequency cannot be greater than the diffraction-limited MTF at the same frequency,
and generally will be lower. Even though the RP of an aberrated lens remains that of a
diffraction-limited lens in principle, aberrations can reduce the MTF values at high
spatial frequencies to such an extent that the effective cut-off is much lower [11].
A useful way to demonstrate this graphically is via the anticipated root mean
square (RMS) wavefront error WRMS introduced in chapter 3. This is a way of
modelling the overall effect of aberrations as a statistical average of the wavefront
error over the entire wavefront converging from the XP to the image point. A small
aberration content is classed as an RMS wavefront error up to 0.07, medium
between 0.07 and 0.25, and large above 0.25 [12]. An empirical relationship for the
transfer function corresponding to the RMS wavefront error is provided by the
following formula [12, 13]:
$$H_{\mathrm{ATF}}(\mu_n) = 1 - \left(\frac{W_{\mathrm{RMS}}}{0.18}\right)^{2}\left[1 - 4\left(\mu_n - 0.5\right)^{2}\right].$$
Here μn = μr /μc,diff is the normalised spatial frequency, and μc,diff is the Abbe cut-off
frequency. This transfer function is referred to as the aberration transfer function
(ATF) [12] or optical quality factor (OQF) [14]. The approximate lens MTF is then
defined as the product of the diffraction-limited transfer function and the ATF,
$$\mathrm{MTF}_{\mathrm{lens}}(\mu_n) \approx H_{\mathrm{diff,circ}}(\mu_n)\, H_{\mathrm{ATF}}(\mu_n).$$
Figure 5.11. Approximate lens MTF as a function of normalised spatial frequency $\mu_n = \mu_r/\mu_{c,\mathrm{diff}}$ for a selection of RMS wavefront error values $W_{\mathrm{RMS}}$. Aberrations are absent when $W_{\mathrm{RMS}} = 0$.
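The curves of figure 5.11 can be reproduced, at least qualitatively, from the two expressions above. The sketch below implements the diffraction-limited MTF of equation (5.31) in normalised frequency together with the empirical ATF and multiplies them; the chosen $W_{\mathrm{RMS}}$ values are illustrative.

```python
import numpy as np

def diffraction_mtf(mu_n):
    # Equation (5.31) with mu_n = mu_r / mu_c,diff: circular aperture, incoherent illumination.
    u = np.clip(np.asarray(mu_n, dtype=float), 0.0, 1.0)
    mtf = (2.0 / np.pi) * (np.arccos(u) - u * np.sqrt(1.0 - u**2))
    return np.where(np.asarray(mu_n) <= 1.0, mtf, 0.0)

def atf(mu_n, w_rms):
    # Empirical aberration transfer function: 1 - (W_RMS/0.18)^2 [1 - 4 (mu_n - 0.5)^2].
    return 1.0 - (w_rms / 0.18) ** 2 * (1.0 - 4.0 * (np.asarray(mu_n) - 0.5) ** 2)

mu_n = np.linspace(0.0, 1.0, 6)
for w_rms in (0.0, 0.07, 0.18):           # illustrative RMS wavefront errors
    lens_mtf = diffraction_mtf(mu_n) * atf(mu_n, w_rms)
    print(w_rms, np.round(lens_mtf, 3))
```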
As illustrated in figure 5.12 in 1D and figure 5.13 in 2D, each Airy disk is centred at
the first zero ring of the other.
Now substituting a spatial frequency μr of order 1/dRayleigh into the diffraction
OTF defined by equation (5.31) above reveals that the Rayleigh criterion corre-
sponds to a diffraction-limited lens MTF value of approximately 9% [15]. In other
Figure 5.12. Rayleigh two-point resolution criterion in units of λN . The distance between the centres of the
Airy disks denoted by the pair of red vertical lines is the Airy disk radius itself, dRayleigh = 1.22 λN .
words, MTF values between 0% and 9% do not provide useful detail according to
the Rayleigh criterion. This suggests that an appropriate effective cut-off frequency
for defining the RP of a real aberrated lens at a given f-number, wavelength, and
field position is the spatial frequency at which the real aberrated lens MTF similarly
drops to 9%. When aberrations are present, this effective cut-off frequency will be
smaller than the Abbe cut-off frequency. Depending on the application, other
percentage criteria such as 5%, 10% or 20% may be more suitable.
At the expense of a lower lens MTF at lower spatial frequencies, an inverse
apodization filter can be used to increase lens RP [11, 15]. Nevertheless, it should be
remembered that the RP of the camera system as a whole is the relevant metric for
describing the smallest detail that can be reproduced in the output image. As discussed
in section 5.5, camera system RP may analogously be defined in terms of the spatial
frequency at which the camera system MTF drops to a small percentage value.
When using the spatial frequency representation for a selected image height
(radial field position), lens MTF can be more generally interpreted by using a spatial
frequency unit that is directly comparable between different sensor formats, namely
line pairs per picture height (lp/ph). This unit is related to lp/mm as follows:
lp/ph = lp/mm × ph.
In the present context, picture height (ph) measured in mm refers to the short edge of
the sensor [9]. Picture height can also be used to specify the print height, in which case
the lp/mm value will downscale accordingly. Since picture height is proportional to
CoC diameter, lp/ph is a very general unit that is directly comparable between
different sensor formats provided the aspect ratios are the same.
Finally, cross-format comparisons should be made using equivalent photos where
possible, as described in section 5.1. This means that in the present context, lens
MTF should be compared using equivalent focal lengths and equivalent f-numbers.
For example, lens MTF at f = 24 mm and N = 4 on 35 mm full frame should be
compared with lens MTF at f = 16 mm and N = 2.8 on APS-C. When equivalent f-
numbers are used, the diffraction cut-off frequency will be the same on each format
when expressed using lp/ph.
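A quick numeric illustration of the last point follows. The sketch converts the diffraction cut-off $1/(\lambda N)$ to lp/ph for a full-frame sensor and for an exactly scaled smaller format using the exact equivalent f-number $N/R$; the wavelength and f-number are assumed values.

```python
lam = 0.00055          # wavelength in mm (550 nm), illustrative
ph1, n1 = 24.0, 4.0    # full-frame picture height (mm) and f-number
R = 1.5                # equivalence ratio to the smaller format

def diffraction_cutoff_lp_per_ph(picture_height_mm, f_number):
    # Cut-off 1/(lambda N) in lp/mm, converted to lp/ph by multiplying by the picture height.
    return picture_height_mm / (lam * f_number)

print(diffraction_cutoff_lp_per_ph(ph1, n1))          # full frame at N = 4
print(diffraction_cutoff_lp_per_ph(ph1 / R, n1 / R))  # smaller format at exactly N/R: same value
```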
circular PSF near the SP can arise from over-corrected SA. As discussed in section
1.4.6 of chapter 1, this leads to annular defocused PSFs that produce a nervous or
restless blurred background [7]. Some of the best lenses are characterised by the
nature of the pleasing bokeh that they produce along with other special aesthetic
character that cannot be derived from MTF curves.
Various components of a camera system such as the lens, imaging sensor and optical
low-pass filter (OLPF) provide contributions to the total camera system MTF. Note
that the camera system MTF at a given spatial frequency cannot be larger than the
diffraction MTF at the same spatial frequency. This is due to the fact that the lens
MTF contribution is bounded from above by the diffraction MTF. An exception is
when MTF contributions from digital image sharpening filters such as unsharp mask
(USM) are included as part of the camera system MTF since these MTF
contributions can take values above 100%.
The camera system MTF generally describes micro contrast and edge definition in
the output image. It is important for two main reasons.
1. Camera system RP is determined by the nature of the camera system MTF at
the highest spatial frequencies.
2. Perceived sharpness of an output digital image is determined by the nature of
the camera system MTF at spatial frequencies related to the viewing
conditions, along with the contrast sensitivity of the HVS at those spatial
frequencies. These spatial frequencies are generally of relatively low value.
For example, under the standard viewing conditions described in sections
5.2.2 and 5.3.1, the relevant spatial frequencies range from 0 to 40 lp/mm on
the SP of the 35 mm full-frame format.
Camera system RP and perceived image sharpness are discussed further in sections
5.5 and 5.6.
Again picture height (ph) in mm refers to the short edge of the sensor [9]. The lp/ph
unit is consistent with equivalence theory provided the formats have the same aspect
ratio. In other words, the comparison should ideally be made using equivalent focal
lengths and f-numbers, as described in section 5.1. In this case, equivalent f-numbers
lead to the same diffraction cut-off frequency expressed using lp/ph.
Another advantage of the lp/ph unit relates to the sensor Nyquist frequency, μNyq ,
introduced in chapter 3. Recall that μNyq depends upon the pixel pitch or photosite
density when expressed using cycles/mm or lp/mm units. This is not directly
comparable between different formats due to the different enlargement factors
from the sensor dimensions to the dimensions of the viewed output photograph.
However, μNyq expressed using lp/ph units depends only on photosite count rather
than photosite density or pixel pitch. For example, μNyq for a 12.1 MP APS-C sensor
will be the same as μNyq for a 12.1 MP 35 mm full-frame sensor when expressed using
lp/ph units.
Another commonly encountered unit is cycles per pixel (cy/px),
$$\mathrm{cy/px} = \mathrm{lp/mm} \times \frac{p}{1000}.$$
Here p is the pixel pitch expressed in μm. When expressed using cy/px units, the
sensor Nyquist frequency becomes μNyq = 0.5 cy/px.
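A minimal Python sketch of these unit conversions, using an assumed 4 μm pixel pitch and a 24 mm picture height purely for illustration:

def lp_per_ph(lp_per_mm, picture_height_mm):
    return lp_per_mm * picture_height_mm            # lp/ph = lp/mm x ph

def cy_per_px(lp_per_mm, pitch_um):
    return lp_per_mm * pitch_um / 1000.0            # cy/px = lp/mm x p/1000

pitch_um = 4.0                                      # assumed pixel pitch
nyquist_lp_mm = 1.0 / (2.0 * pitch_um / 1000.0)     # sensor Nyquist frequency, 125 lp/mm
print(lp_per_ph(nyquist_lp_mm, 24.0))               # 3000.0 lp/ph on a 24 mm picture height
print(cy_per_px(nyquist_lp_mm, pitch_um))           # 0.5 cy/px, as expected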
The detection area width, $d_x$, can be varied up to the value of the pixel pitch, $p_x$.
3. Four-spot OLPF MTF,
$$\mathrm{MTF}_{\mathrm{OLPF}}(\mu_x) = |\cos(\pi s_x \mu_x)|.$$
The spot separation $s_x$ between the two spots on a single axis can vary from the pixel pitch, $p_x$, for a maximum-strength filter down to zero for a non-existent filter.
Consider a compact camera with a 1/1.7 inch sensor format and a 12 MP pixel
count. In this case the pixel pitch px ≈ 2 μm = 0.002 mm.
• The detector cut-off frequency is μc,det = 1/0.002 = 500 lp/mm assuming that
the photosite detection area width dx = px.
• The sensor Nyquist frequency is μNyq = 0.5 × (1/0.002) = 250 lp/mm.
• For simplicity, assume that the camera system cut-off frequency, μc , is the
spatial frequency at which the camera system MTF first drops to zero. This
defines the camera system RP.
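The sketch below assembles this model numerically in Python. It assumes the standard circular-aperture form for the diffraction MTF of equation (5.31), a wavelength of 550 nm, and the detector-aperture and four-spot OLPF MTFs listed above; the reported cut-off reproduces the ~250 lp/mm full-strength OLPF case discussed below.

import numpy as np

lam = 0.00055                 # assumed wavelength in mm (550 nm)
px = 0.002                    # pixel pitch in mm (2 um)
dx = px                       # detection area width taken equal to the pixel pitch

def mtf_diffraction(mu, N):
    # assumed circular-aperture diffraction MTF with cut-off 1/(lambda*N)
    s = np.clip(mu * lam * N, 0.0, 1.0)
    return (2.0 / np.pi) * (np.arccos(s) - s * np.sqrt(1.0 - s ** 2))

def mtf_detector(mu):
    return np.abs(np.sinc(dx * mu))                 # np.sinc(x) = sin(pi*x)/(pi*x)

def mtf_olpf(mu, sx):
    return np.abs(np.cos(np.pi * sx * mu))

mu = np.linspace(0.0, 900.0, 9001)                  # spatial frequency in lp/mm
system = mtf_diffraction(mu, 2.8) * mtf_detector(mu) * mtf_olpf(mu, px)
cutoff = mu[np.argmax(system < 1e-6)]               # first zero of the system MTF
print(cutoff)                                       # ~250 lp/mm: limited by the full-strength OLPF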
Figure 5.14(a) shows the model results at N = 2.8 in the absence of the OLPF. The
spatial frequencies have been expressed in lp/mm units. The camera system cut-off
frequency corresponds with the detector cut-off frequency at 500 lp/mm, and so the
camera system RP is limited by the detector aperture. Aliasing is present above the
sensor Nyquist frequency.
When the f-number increases to N = 8, figure 5.14(b) shows that the camera
system cut-off frequency has dropped to 228 lp/mm, and so the camera system RP is
limited by lens aperture diffraction. Furthermore, aliasing has been totally elimi-
nated since the camera system cut-off frequency has dropped below the sensor
Nyquist frequency at this f-number.
When a full-strength OLPF is included, aliasing can be minimised at any selected
f-number. For example, figure 5.14(c) shows the camera system MTF at N = 2.8
when a full-strength OLPF is included. In this case the camera system cut-off
frequency corresponds with the sensor Nyquist frequency at 250 lp/mm, and so the
Figure 5.14. Camera system RP illustrated using a model camera system MTF in 1D. The sensor component is
defined by the detector aperture MTF, the optics component by the diffraction MTF, and the OLPF
component by a four-spot filter MTF.
camera system RP is limited by the OLPF. The camera system MTF above the
sensor Nyquist frequency is suppressed by the detector-aperture MTF and diffrac-
tion MTF [19].
The above simple model illustrates that in order to increase camera system RP by
reducing the pixel pitch, diffraction needs to be accordingly reduced by lowering the
f-number so that the diffraction cut-off frequency remains higher than the sensor
Nyquist frequency. However, lenses at their lowest f-numbers or maximum
apertures generally suffer from residual aberrations and are rarely diffraction-
limited in practice. Aberrations lower the effective lens cut-off frequency as
described in section 5.3.2, particularly when measured at large radial field positions.
This prevents achievable camera system RP from reaching the diffraction limit
defined by the Abbe cut-off frequency. When diffraction dominates, sensor pixel
pitch has negligible impact upon the camera system RP. However, reducing the pixel
pitch will improve the detector-aperture component of the sensor MTF at low
spatial frequencies and this can improve edge definition and perceived sharpness.
Perceived sharpness is discussed in the following section.
Although perceived image sharpness is often regarded as synonymous with camera system RP, the two
are not necessarily correlated. A high camera system RP corresponds to a high
camera system cut-off frequency, whereas high perceived sharpness arises from a
high camera system MTF at spatial frequencies to which the HVS is most sensitive.
The important spatial frequencies depend upon factors such as the viewing distance
and the contrast sensitivity of the HVS.
This means that for the same photographic scene and output photograph viewing
conditions, one camera system could produce an image with high perceived sharp-
ness but low resolution, whereas a different camera system could produce an image
with low perceived sharpness but high resolution.
For a given camera system, the photographer plays an important role in
influencing the camera system MTF and therefore perceived image sharpness. An
important example is diffraction softening controlled by the lens f-number.
Diffraction softening was introduced in chapter 3, and is discussed further in section
5.10.2. Other examples include camera shake, which contributes a jitter component
to the camera system MTF, and the resizing of the output digital image, which
provides a resampling contribution to the camera system MTF [14].
Note that digital image sharpening techniques can actually increase the overall
camera system MTF, although care should be taken not to introduce sharpening
artefacts. For example, the USM technique involves blurring the image, typically by
convolving the image with a small blur filter. The blurred image is then subtracted
from the original image to produce a mask that contains high frequency detail. The
mask is subsequently added to the original image with a weighting applied. A high
weighting can boost the high frequencies such that the MTF contribution from the
USM filter is greater than unity. The fact that digital image sharpening can improve
perceived sharpness but cannot improve resolution is evidence that sharpness and
resolution are not necessarily correlated [7].
Various metrics for perceived image sharpness have been developed. Examples
include:
• Modulation transfer function area (MTFA) [20, 21].
• Subjective quality factor (SQF) [22].
• Square root integral (SQRI) [23].
• Heynacher number [7].
• MTF50 [24].
• Acutance [25].
5.6.1 MTF50
Camera MTF50 is a very simple metric defined as the spatial frequency at which the
camera system MTF drops to 50% of the zero frequency value. MTF50 is influenced
by all contributions to the camera system MTF including the optics, sensor, and
image processing. A high MTF50 is associated with a camera system MTF that
remains generally high over a wide range of important spatial frequencies and is
therefore considered to be a reasonable indicator of perceived image sharpness.
MTF50P is a similar metric defined as the spatial frequency at which the camera
system MTF drops to 50% of the peak camera system MTF. Since excessive
sharpening improves MTF50 even when this leads to visible image artifacts such as
halos at edges, MTF50P is designed to remove any influence that excessive digital
image sharpening may have on the result [24].
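The Python sketch below reads both metrics off a sampled MTF curve; the curve itself is an assumed illustrative shape (mildly sharpened so that the peak exceeds the zero-frequency value), not measured data.

import numpy as np

mu = np.linspace(0.0, 0.5, 501)                 # cycles/pixel
mtf = (1.0 + 8.0 * mu) * np.exp(-6.0 * mu)      # assumed curve; mild sharpening gives a peak above 1

def freq_at_fraction(mu, mtf, reference, fraction=0.5):
    # first spatial frequency at which the MTF falls to fraction * reference
    return mu[np.argmax(mtf <= fraction * reference)]

mtf50  = freq_at_fraction(mu, mtf, mtf[0])      # relative to the zero-frequency value
mtf50p = freq_at_fraction(mu, mtf, mtf.max())   # relative to the peak value
print(mtf50, mtf50p)                            # MTF50P is slightly lower for this sharpened curve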
When performing cross-format comparisons, MTF50 and MTF50P are directly
comparable between the camera systems when expressed using the line pairs per
picture height (lp/ph) unit described in section 5.3.3.
MTF50 can also be applied to lenses, in which case it should be referred to as lens
MTF50. However, section 5.3.1 showed that lens performance is best evaluated
using lens MTF at several important spatial frequencies as a function of radial field
position [7].
In order to relate MTF50 to an appropriate IQ level, MTF50 can be expressed
using units related to the print dimensions, such as line widths per inch on the print
[24]. However, this perceived sharpness rating does not take into account the
viewing distance or the relevant properties of the HVS.
Figure 5.15. Model camera system MTF50 as a function of f-number for two 35 mm full-frame cameras with
different sensor pixel counts. The horizontal axis uses a base 2 logarithmic scale. Lens aberrations have not
been included.
$$\mathrm{SQF} = k \int_{\mu_1}^{\mu_2} \mathrm{MTF}(\mu)\,\mathrm{CSF}(\mu)\,\mathrm{d}(\ln\mu), \qquad (5.33)$$
where MTF denotes the camera system MTF, and CSF denotes the contrast
sensitivity function of the HVS.
It was found experimentally that there is a linear correlation between subjective
IQ and just-noticeable differences. This suggests that the above integration should
be carried out on a logarithmic scale with respect to spatial frequency,
$$\mathrm{d}(\ln\mu) = \frac{\mathrm{d}\mu}{\mu}.$$
Furthermore, spatial frequency μ is expressed in cycles/degree units. The contrast
sensitivity function CSF(μ) describes the sensitivity of the HVS to contrast as a
function of μ. The CSF used in the original definition of SQF was taken to be unity
between 3 and 12 cycles/degree,
$$\mu_1 = 3, \qquad \mu_2 = 12.$$
In this case SQF is simply the area under the camera system MTF curve calculated
between 3 and 12 cycles/degree when the MTF is similarly expressed in cycles/degree
units and plotted on a logarithmic scale. The constant k is a normalisation constant
that ensures a constant MTF with value unity will yield SQF = 100,
$$100 = k \int_{\mu_1}^{\mu_2} \mathrm{CSF}(\mu)\,\mathrm{d}(\ln\mu).$$
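A minimal Python sketch of the original SQF definition, assuming unity CSF between 3 and 12 cycles/degree and an illustrative exponential camera system MTF:

import numpy as np

def _trapezoid(y, x):
    # simple trapezoidal rule, kept local so the sketch is self-contained
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def sqf(mtf_of_mu, mu1=3.0, mu2=12.0, samples=2001):
    ln_mu = np.linspace(np.log(mu1), np.log(mu2), samples)   # integrate on a logarithmic scale
    mu = np.exp(ln_mu)
    csf = np.ones_like(mu)                                   # unity CSF in the original definition
    k = 100.0 / _trapezoid(csf, ln_mu)                       # a flat unit MTF then gives SQF = 100
    return k * _trapezoid(mtf_of_mu(mu) * csf, ln_mu)

example_mtf = lambda mu: np.exp(-mu / 20.0)                  # assumed camera system MTF (cycles/degree)
print(round(sqf(example_mtf), 1))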
In order to take into account the viewing distance, the cycles/degree units need to be
related to the cycles/mm units used at the SP. First consider the viewed output print.
Denoting θ as the angle subtended by the eye with one spatial cycle p on the print,
the spatial frequency μ(cycles/degree) at the eye and spatial frequency μ(cycles/mm)
on the print are specified by 1/θ and 1/p, respectively. From figure 5.16,
$$p = 2l\tan(\theta/2) \approx l\theta \times \frac{\pi}{180},$$
where l is the distance from the observer to the print, and the factor on the right-
hand side converts radians into degrees. Therefore,
$$\mu(\text{cycles/mm on print}) = \mu(\text{cycles/degree}) \times \frac{180}{l\pi}.$$
Figure 5.16. Geometry used to relate cycles/degree at the eye to cycles/mm on the print.
This can now be projected onto the sensor dimensions by using the enlargement
factor,
$$\mu(\text{cycles/mm on sensor}) = \mu(\text{cycles/degree}) \times \frac{180\,\mathrm{ph}}{l\pi h}.$$
Here ph is the picture height (print or screen) in mm, and h is the height of the short
side of the sensor in mm. This formula can be used to relate the camera system MTF
as a function of μ(cycles/mm) to the μ(cycles/degree) appearing in equation (5.33).
An alternative spatial frequency unit used for camera system MTF calculation is
cycles/pixel, in which case the above formula can be expressed as follows [24]:
$$\mu(\text{cycles/pixel}) = \mu(\text{cycles/degree}) \times \frac{180\,\mathrm{ph}}{l\pi n_h}.$$
Here nh is the number of pixels on the short side of the sensor, and so the sensor
height drops out of the equation. In practice, resampling will take place to satisfy a
pixels per inch (ppi) display resolution required by the print. As described in section
5.7, this will introduce a resampling MTF that will affect perceived image sharpness.
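A minimal Python sketch of this viewing-geometry conversion, assuming an 8× enlargement of the 24 mm full-frame picture height viewed at 250 mm and 4000 pixels on the short sensor edge (all assumed values):

import math

def cyc_per_mm_on_sensor(cyc_per_degree, l_mm, ph_mm, h_mm):
    # mu(cycles/mm on sensor) = mu(cycles/degree) * 180 * ph / (l * pi * h)
    return cyc_per_degree * 180.0 * ph_mm / (l_mm * math.pi * h_mm)

def cyc_per_pixel(cyc_per_degree, l_mm, ph_mm, n_h):
    # mu(cycles/pixel) = mu(cycles/degree) * 180 * ph / (l * pi * n_h)
    return cyc_per_degree * 180.0 * ph_mm / (l_mm * math.pi * n_h)

print(round(cyc_per_mm_on_sensor(12.0, 250.0, 8 * 24.0, 24.0), 1))   # ~22 cycles/mm on the sensor
print(round(cyc_per_pixel(12.0, 250.0, 8 * 24.0, 4000), 3))          # ~0.13 cycles/pixel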
The original SQF calculation has been refined [24] by introducing the CSF
defined in [26] and extending the integration from zero up to the sensor Nyquist
frequency expressed in cycles/degree units. The revised CSF is shown in figure 5.17.
The revised integration limits [24] are given by
$$\mu_1 = 0, \qquad \mu_2 = \mu_{\mathrm{Nyq}}.$$
This CSF applies a refined weighting for how strongly the camera system MTF
influences perceived sharpness at each spatial frequency.
SQF is a far superior metric for evaluating perceived sharpness compared with
MTF50. However, a drawback of SQF is that contour definition is not taken into
account [9]. Contour definition is higher for flatter MTF curves, and this is
associated with higher perceived sharpness in practice. In other words, additional
assessment is required to recognise special cases [9].
5.7.1 Upsampling
It is useful to think of the digital image in terms of a sampled signal. As described in
chapter 3, the sampled signal arises from the sampling and digitisation of the optical
image projected at the SP. Aliasing may be present in the sampled signal, and the
amount of aliasing depends on the value of the camera system cut-off frequency at
the time of image capture, μc , in relation to the sensor Nyquist frequency, μNyq .
Nevertheless, the aim when upsampling is simply to increase the sampling rate
without affecting the signal and image appearance.
For clarity, the following discussion will be restricted to resampling in 1D.
Extension to 2D is straightforward. Upsampling in principle requires reconstruction
of the continuous signal f (x ) from known sample values so that f (x ) can be
resampled at a new and more densely spaced grid of positions [27].
In practice, the f (x ) values at the new sample positions can be directly calculated
through convolution. The value of $f(x)$ at an arbitrary new sample location x can be written as
$$f(x) = \tilde{f}(x) * h(x) = \sum_i \tilde{f}(x_i)\, h(x - x_i).$$
This amounts to centring the kernel at the new sample location x, and then summing
the products of the known discrete sample values and kernel values at those
positions. The sum over i runs over the number of known sample values within
the range of the reconstruction kernel. The example shown in figure 5.18 extends
over four samples [27].
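A minimal Python sketch of this direct-convolution resampling in 1D; the linear ('tent') kernel is used here purely for brevity, and edge handling is ignored.

import math

def kernel_linear(t):
    # simple 'tent' kernel with unit support, purely for brevity
    t = abs(t)
    return 1.0 - t if t < 1.0 else 0.0

def resample(samples, x, kernel=kernel_linear, support=1):
    # centre the kernel at the new location x and sum f~(x_i) * h(x - x_i)
    i0 = math.floor(x)
    total = 0.0
    for i in range(i0 - support + 1, i0 + support + 1):
        if 0 <= i < len(samples):                   # samples outside the image are simply skipped
            total += samples[i] * kernel(x - i)
    return total

f_known = [0.0, 1.0, 4.0, 9.0, 16.0]                # known samples on an integer grid
print(resample(f_known, 2.5))                       # 6.5 with linear interpolation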
It was shown in chapter 3 that the ideal reconstruction filter is a sinc function that
extends over all space,
$$h_{\mathrm{ideal}}(x) = \mathrm{sinc}(x) = \frac{\sin(\pi x)}{\pi x}.$$
Here x is normalised to the pixel sample spacing so that Δxi = 1. The MTF of a sinc
function is a rectangle function,
$$\mathrm{MTF}_{\mathrm{ideal}}(\mu_x) = \mathrm{rect}(\mu_x).$$
Figure 5.18. Image resampling in 1D. Here the non-ideal interpolation kernel $h(x)$ (green) centred at position x extends over four input samples located at $x_1$, $x_2$, $x_3$, and $x_4$. The output function value $f(x)$ at position x is shown in the lower diagram. The input image sample spacing is indicated by $\Delta x_i$.
Figure 5.19. Resampling MTF for several example filter kernels. The sensor Nyquist frequency is located at
0.5 cycles per pixel. When upsampling or downsampling, the pixel units are always defined by the image with
the larger pixel sample spacing.
This is plotted in figure 5.19. The horizontal axis in the figure represents cycles per
pixel, and the sensor Nyquist frequency is located at 0.5 cycles per pixel prior to any
resampling. The maximum frequency content of the digital image itself is always
0.5 cycles/pixel, which is often referred to as the Nyquist frequency of the image.
The region to the left of the Nyquist frequency in figure 5.19 is referred to as the
passband, and the region to the right is referred to as the stopband. The rectangle
function is equal to unity in the passband and zero in the stopband. This ensures that
all frequencies below the Nyquist frequency are perfectly reconstructed and that all
frequencies above the Nyquist frequency are completely suppressed. Note that the
passband may already contain aliased content if frequency content above the sensor Nyquist frequency was present at capture, since this high frequency content will have folded into the passband.
Since the sinc function extends over all space, approximations must be made in
practice. The simplest reconstruction kernel is defined by nearest-neighbour
interpolation,
$$h_{\mathrm{nn}}(x) = \begin{cases} 1, & 0 \leqslant |x| < \tfrac{1}{2} \\ 0, & \tfrac{1}{2} \leqslant |x| \end{cases}$$
Again x is normalised to the pixel sample spacing. The performance of this
reconstruction filter can be evaluated by comparing its transfer function with that of
the ideal interpolation function. The nearest-neighbour interpolation MTF is given by
$$\mathrm{MTF}_{\mathrm{nn}}(\mu_x) = |\mathrm{sinc}(\mu_x)|.$$
Figure 5.19 shows that the sharp transition between the passband and stopband is
attenuated. The decay in the passband blurs the image. Furthermore, there is
considerable frequency leakage into the stopband. Unless the replicated frequency
spectra of the original sampled image are sufficiently far apart, frequency leakage
due to non-ideal reconstruction will introduce spurious high frequency content
above the Nyquist frequency that can cause jagged edges to appear in the image.
Although jagged edges arise from non-ideal reconstruction, they can be interpreted as
aliasing artefacts in the present context since the spurious high frequency content
will become part of the passband when the image is resampled [27].
An improvement on nearest-neighbour interpolation is bilinear interpolation,
$$h_{\mathrm{bl}}(x) = \begin{cases} 1 - |x|, & 0 \leqslant |x| < 1 \\ 0, & 1 \leqslant |x| \end{cases}$$
Since x is normalised to the pixel sample spacing, the bilinear reconstruction kernel
extends over two samples in 1D and four samples in 2D. The MTF is given by
$$\mathrm{MTF}_{\mathrm{bl}}(\mu_x) = |\mathrm{sinc}^2(\mu_x)|.$$
Figure 5.19 shows that frequency leakage into the stopband is reduced when using
bilinear interpolation compared with nearest-neighbour interpolation.
An even better approximation to the sinc function is provided by bicubic
convolution [28],
$$h_{\mathrm{bc}}(x) = \begin{cases} (\alpha + 2)|x|^3 - (\alpha + 3)|x|^2 + 1, & 0 \leqslant |x| < 1 \\ \alpha|x|^3 - 5\alpha|x|^2 + 8\alpha|x| - 4\alpha, & 1 \leqslant |x| < 2 \\ 0, & 2 \leqslant |x| \end{cases}$$
A commonly used value for α is −0.5. The reconstruction kernel extends over four
samples in 1D and sixteen samples in 2D. The MTF is given by the following
expression [29]:
$$\mathrm{MTF}_{\mathrm{bc}}(\mu_x) = \frac{3}{(\pi\mu_x)^2}\left(\mathrm{sinc}^2(\mu_x) - \mathrm{sinc}(2\mu_x)\right) + \frac{2\alpha}{(\pi\mu_x)^2}\left(3\,\mathrm{sinc}^2(2\mu_x) - 2\,\mathrm{sinc}(2\mu_x) - \mathrm{sinc}(4\mu_x)\right).$$
Figure 5.19 shows that bicubic convolution with α = −0.5 exhibits reduced
attenuation in the passband and greatly reduced frequency leakage into the
stopband. Bicubic convolution is the standard interpolation kernel used by
Adobe® Photoshop. Many other interpolation methods exist. For example,
Lanczos resampling uses a kernel based on a windowed sinc function. Such methods
can give superior results compared to bicubic convolution but may introduce other
types of reconstruction artefact [27].
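The three kernels can be written down directly; the Python sketch below implements them with x normalised to the pixel sample spacing and α = −0.5 for bicubic convolution.

def h_nn(x):
    # nearest-neighbour kernel
    return 1.0 if abs(x) < 0.5 else 0.0

def h_bl(x):
    # bilinear (tent) kernel
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0

def h_bc(x, alpha=-0.5):
    # bicubic convolution kernel [28]
    x = abs(x)
    if x < 1.0:
        return (alpha + 2.0) * x ** 3 - (alpha + 3.0) * x ** 2 + 1.0
    if x < 2.0:
        return alpha * x ** 3 - 5.0 * alpha * x ** 2 + 8.0 * alpha * x - 4.0 * alpha
    return 0.0

for h in (h_nn, h_bl, h_bc):
    print(h.__name__, [round(h(t), 4) for t in (0.0, 0.5, 1.0, 1.5)])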
In summary, the aim when upsampling an image is to leave the continuous
representation of the signal intact, and simply increase the sampling rate and
associated pixel count. In the Fourier domain, this is equivalent to increasing the
spacing between the replicated spectra. Unfortunately, non-ideal reconstruction will
corrupt the signal and introduce a combination of image blur and jagged edges. The
image blur will result from attenuation in the passband, and jagged edges may
appear due to frequency leakage into the stopband. The latter will occur if the
replicated spectra of the original sampled image are sufficiently close together such
that spurious high frequencies are retained.
5.7.2 Downsampling
Recall that when upsampling an image, the signal needs to be reconstructed in order
to increase the sampling rate. In this case the interpolation filter acts as a
reconstruction filter. The size of the reconstruction kernel, h(x ), remains fixed as it
depends upon the pixel sample spacing of the original sampled image rather than the
upsampled image.
However, the interpolation filter plays a different role when downsampling an
image due to the fact that information will be discarded. In contrast to upsampling,
which will increase the separation between the replicated frequency spectra, down-
sampling will decrease the separation. When a digital image produced by a camera is
downsampled, the spectra will overlap and corrupt the passband. This is known as
undersampling, and the interpolation filter must construct the output by acting as an
anti-aliasing filter rather than a reconstruction filter. The interpolation filter should
ideally bandlimit the image spectrum to under half the new sampling rate. Although
the new signal will correspond to a smoothed lower-resolution representation of the
original photographic scene, bandlimiting to half the new sampling rate will
minimise aliasing.
A simple way to achieve the above goal is to increase the size of the interpolation
kernel in accordance with the scale reduction factor used for downsampling [27, 30],
$$f(x) = \tilde{f}(x) * a\,h(ax) = a \sum_i \tilde{f}(x_i)\, h(ax - ax_i).$$
Here a < 1 is the scale reduction factor. In contrast to upsampling, where Δxi = 1
corresponds to the input image sample spacing as indicated in figure 5.18, here
Δaxi = 1 corresponds to the output image sample spacing. In other words, the sum
over i covers a greater number of input pixels than the number used when
upsampling. For example, the bicubic convolution kernel may extend over many
input samples, but only four and sixteen output samples in 1D and 2D, respectively.
The filter frequency response will accordingly be narrower due to the reciprocal
relationship between the spatial and frequency domains. This relationship is
expressed by the following identity, where H is the FT of h,
$$H\!\left(\frac{\mu_x}{a}\right) = \mathrm{FT}\{a\,h(ax)\}.$$
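A minimal Python sketch of downsampling with the widened kernel; the linear kernel and the edge renormalisation below are illustrative choices rather than part of the formal definition.

import math

def tent(t):
    t = abs(t)
    return 1.0 - t if t < 1.0 else 0.0

def downsample_value(samples, x, a, kernel=tent, support=1):
    # widen the kernel by 1/a so that it bandlimits (anti-aliases) rather than reconstructs
    radius = support / a
    total, weight = 0.0, 0.0
    for i in range(math.floor(x - radius), math.ceil(x + radius) + 1):
        if 0 <= i < len(samples):
            w = a * kernel(a * (x - i))             # a * h(ax - a*x_i)
            total += samples[i] * w
            weight += w
    return total / weight if weight else 0.0        # renormalise near the image edges

f_known = [float(i) for i in range(16)]
a = 0.25                                            # downsample by a factor of 4
print([downsample_value(f_known, x, a) for x in range(0, 16, 4)])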
Here ne,noise is the total signal noise [31]. This may include fixed pattern noise (FPN)
along with temporal noise, which must be added in quadrature. Sometimes only
temporal noise is included in the definition [32]. In terms of output-referred units,
$$\mathrm{SNR\ (ratio)} = \frac{n_{\mathrm{DN}}}{n_{\mathrm{DN,noise}}} : 1$$
$$\mathrm{SNR\ (stops)} = \log_2\!\left(\frac{n_{\mathrm{DN}}}{n_{\mathrm{DN,noise}}}\right)$$
$$\mathrm{SNR\ (dB)} = 20\log_{10}\!\left(\frac{n_{\mathrm{DN}}}{n_{\mathrm{DN,noise}}}\right).$$
In practice, SNR can be calculated in terms of output-referred units using the raw
data and the methods for measuring noise described in chapter 3. By using equation
(3.50) of chapter 3, the conversion factor g can then be used to convert output-
referred units to input-referred units,
$$n_e = g\, n_{\mathrm{DN}}. \qquad (5.34)$$
The conversion factor g has units e−/DN and is defined by equation (3.51),
$$g = \frac{U}{G_{\mathrm{ISO}}}.$$
The mosaic label has been dropped for clarity. The unity gain U is the ISO gain
setting that results in one electron being converted into one DN. Since g depends
upon the ISO gain G ISO, the relationship between input and output-referred units is
dependent on the associated ISO setting, S. Noise contributions such as read noise
that were not part of the original detected charge can also be converted to input-
referred units.
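A minimal Python sketch of this conversion; the unity gain U and the raw level are assumed illustrative values.

def conversion_factor(U, G_iso):
    # g = U / G_ISO, in electrons per DN
    return U / G_iso

def to_electrons(n_dn, U, G_iso):
    # input-referred signal: n_e = g * n_DN, equation (5.34)
    return conversion_factor(U, G_iso) * n_dn

U = 8.0          # assumed unity-gain setting: at G_ISO = 8, one electron gives one DN
for G_iso in (1.0, 2.0, 4.0, 8.0):
    print(G_iso, to_electrons(1000.0, U, G_iso))
# the same raw level of 1000 DN corresponds to fewer electrons (less exposure) at higher gain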
SNR also varies as a function of ISO setting. However, ISO settings themselves are
characteristics of a camera and are not performance metrics. The metric of
significance is the SNR as a function of signal at a given ISO setting.
To understand this, recall from chapter 3 that the base ISO setting corresponds to
the gain setting, G ISO, that uses the least analog amplification and therefore allows
full use of the sensor response curve, either through utilisation of full-well capacity
(FWC) or saturation of the analog-to-digital converter (ADC). The base ISO
setting, Sbase , can be associated with an ISO gain G ISO = 1. A better QE and higher
FWC per unit sensor area are both favourable in terms of SNR; however, a better
QE can raise Sbase whereas a higher FWC per unit area can lower Sbase . In older
cameras, a very low Sbase is characteristic of poor QE. In newer cameras that have
good QE, a very low Sbase is characteristic of a very high FWC per unit sensor area.
This extra photographic capability can be utilised by taking advantage of the longer
exposure times available, for example, through the use of a tripod in landscape
photography.
It should also be remembered that ISO settings are independent of photosite area,
as explained in section 2.6 of chapter 2.
Figure 5.20. SNR per photosite plotted as a function of raw level (DN) at a variety of ISO settings for the
Olympus® E-M1 camera. Here SNR is lowered at a given raw level when the ISO setting is raised since less
exposure has been used to obtain the same raw data.
Figure 5.20 was obtained using the experimental methods described in section 3.9
of chapter 3. SNR and raw level have been expressed in stops by taking the base 2
logarithm. Only read noise and photon shot noise have been included as noise
sources.
First, notice that the maximum SNR at a given ISO setting is obtained at the
highest raw value. This is the premise behind the ETTR technique discussed in
section 5.10.4. Each curve rises linearly at low exposure with slope approximately
equal to unity. The slope drops to one half at higher exposure levels as the square
root relationship between signal and photon shot noise dominates. The main reason
that SNR increases with exposure H is that photon shot noise is proportional to $\sqrt{H}$, and so the signal to photon-shot-noise ratio is also proportional to $\sqrt{H}$,
$$\mathrm{SNR}_{\mathrm{ph}} = \frac{n_e}{\sigma_{\mathrm{ph}}} \propto \frac{H}{\sqrt{H}} = \sqrt{H}. \qquad (5.35)$$
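A minimal Python sketch of the per-photosite SNR model with only read noise and photon shot noise included; the read noise value and signal levels are assumed for illustration.

import numpy as np

def snr_stops(n_e, read_noise_e=10.0):
    # photon shot noise = sqrt(signal); read noise adds in quadrature
    total_noise = np.sqrt(n_e + read_noise_e ** 2)
    return np.log2(n_e / total_noise)

signal = np.array([4.0, 16.0, 256.0, 4096.0, 65536.0])   # electrons (2, 4, 8, 12, 16 stops)
print(np.round(snr_stops(signal), 2))
# the slope against log2(signal) is ~1 where read noise dominates (left end)
# and ~0.5 where photon shot noise dominates (right end)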
Second, notice that SNR is lowered as the ISO setting increases. It is commonly
assumed that higher ISO settings are inherently noisier, however this is not the case,
as discussed previously in section 3.9.3 of chapter 3. In fact, the curves in figure 5.20
indicate lower SNR as the ISO setting is raised since less photometric exposure H has
been used to obtain the same raw level at the higher ISO setting [33]. This is evident
from equation (5.34),
$$n_e = g\, n_{\mathrm{DN}}.$$
A higher ISO setting lowers the conversion factor g, and this lowers the signal ne that
corresponds to a given raw level. This means that FWC cannot be utilised when the
ISO setting is raised above the base value. For example, the ISO 200 curve has been
obtained using one stop higher H than the ISO 400 curve. The base ISO setting
offers the highest maximum SNR since a higher H can be tolerated before raw
saturation.
In order to see that higher ISO settings can in fact offer improved SNR, it is
necessary to compare curves obtained at the same fixed exposure level rather than
the same raw level [33]. This can be illustrated using input-referred units.
Figure 5.21. SNR per photosite plotted as a function of exposure at a variety of ISO settings for the Olympus®
E-M1 camera. Higher ISO settings provide a SNR advantage in the low-exposure regions of the image.
However, highlight headroom is lost and raw DR is reduced.
of chapter 3, the programmable gain amplifier (PGA) amplifies all of the voltage
signal but only part of the read noise, specifically the contribution arising from
readout circuitry upstream from the PGA [32, 33].
The penalty for using a higher G ISO is that the ADC will saturate before FWC can
be utilised. Above the base ISO gain, the available electron-well capacity is halved
every time G ISO is doubled. Consequently, the maximum achievable SNR along with
raw DR will be lowered. This is indicated on figure 5.21 by the arrows that drop to
zero.
For the same reasons, a camera manufacturer can improve SNR at low signal
levels by using a higher conversion factor g. However, the ADC may saturate before
FWC is utilised since the ADC power supply voltage is fixed. The camera
manufacturer must balance these trade-offs when choosing optimal values [32].
The above analysis reveals that if photometric exposure H is unrestricted by
photographic conditions, it makes sense to use the base ISO setting, G ISO = 1, which
is defined by S = 200 on the camera used to plot figure 5.21. This enables the full
sensor response curve to be utilised and therefore maximises the use of H in
producing the voltage signal that is converted into raw data. This maximises SNR
since SNR increases as $\sqrt{H}$.
On the other hand, if photographic conditions restrict H to a fixed maximum
value and there is still headroom available at the top of the sensor response curve,
then SNR may be improved at low signal levels by using a higher ISO setting [33].
The above strategies for improving SNR can be implemented using the ETTR
technique discussed in section 5.10.4.
editing software [33]. Unlike analog gain, digital gain applied using tone curves is a
simple multiplication of digital values and this cannot bring any SNR advantage.
Some recent cameras have such low levels of downstream read noise that the ISO-
less setting is very low and the camera can be described as being ISO invariant [34].
This means that if the camera is set to store raw files, the ISO setting can effectively
be left at its base value, and no SNR penalty will arise by applying digital gain
instead when processing the raw file.
to a small size for displaying on the internet. The noise reduction achieved when
downsampling can be explained by the pixel binning example given above. If the
downsampling algorithm is implemented correctly as described in section 5.7, the
interpolation filter will construct the output by acting as an anti-aliasing filter rather
than a reconstruction filter, and will bandlimit the image spectrum to the Nyquist
frequency of the new sampling grid. The filtering operation has a similar effect to
averaging, and so the signal will be smoothed when details and gradients are present
in the scene. However, the temporal noise will be reduced by the anti-aliasing filter
since temporal noise is uncorrelated. The greatest reduction in noise will occur in
uniform areas of the scene.
In conclusion, downsampling yields larger pixels for the same display area, and
each of the new larger pixels will have a greater SNR per pixel. This means that the
downsampled image will have less noise at the pixel level, although noise is not
necessarily reduced at the image level. The noise reduction has been achieved by
discarding resolution [33].
Since a gain in SNR per pixel can be obtained by downsampling, an SNR metric
that is directly comparable between camera sensors that have different photosite
areas will be area-based rather than photosite-based. This is discussed in the next
section.
Here $A_p$ is the photosite area in microns squared. Although the signal is proportional to $A_p$, the square root is required since photon shot noise is proportional to $\sqrt{A_p}$. Alternatively, SNR could be normalised according to a specified percentage frame height or percentage area of the sensor,
$$\text{SNR per \% area} = \text{SNR per photosite} \times \sqrt{\frac{A_{\%}}{A_p}}. \qquad (5.36)$$
$$\text{SNR per \% area} = \text{SNR per photosite} \times \sqrt{\frac{A_{\%}}{A_p}}.$$
Here A% is the specified percentage area and Ap is the photosite area. The percentage
area normalisation not only takes into account different sensor pixel counts, it also
takes into account the different enlargement factors from the sensor dimensions to
the dimensions of the viewed output photograph when comparing equivalent
photographs [1].
Furthermore, SNR per % area should be compared using equivalent ISO settings
on each format. For example, a photograph taken at f = 100 mm, N = 2.8, and
S = 800 on APS-C should be compared with an equivalent photo taken at
f = 150 mm, N = 4, and S = 1600 on 35 mm full frame. In this case, the level of
exposure at the SP of the APS-C sensor will be double that of the 35 mm full-frame
sensor and so the advantage of the larger full frame detection area will be
approximately neutralized. However, the larger format offers additional photo-
graphic capability and potentially higher SNR when the smaller format does not
offer an equivalent ISO setting [1].
when a scene that averages to middle grey (18% relative scene luminance) is metered
using average photometry. Alternatively, the headroom requirement could be
removed from the definition. In any case, no correspondence should be expected
between raw ISO values determined in this way and the ISO settings labelled on the
camera.
Nevertheless, photographers who wish to use the ETTR technique in conjunction
with raw output could in principle replace the labelled camera ISO settings with the
raw ISO speed values. According to the analysis given in chapter 2, an average scene
luminance that is 12.8% of the maximum would saturate the raw output if the scene
is metered using average photometry and raw ISO speeds are used for the ISO
settings.
Input-referred units
Raw DR per photosite describes the maximum DR that can be represented in the
raw data as a function of ISO setting. Using input-referred units, raw DR per
photosite is defined as follows:
$$\text{raw DR (ratio)} = \frac{n_{e,\mathrm{max}}}{\sigma_{e,\mathrm{read}}} : 1$$
$$\text{raw DR (stops)} = \log_2\!\left(\frac{n_{e,\mathrm{max}}}{\sigma_{e,\mathrm{read}}}\right)$$
$$\text{raw DR (decibels)} = 20\log_{10}\!\left(\frac{n_{e,\mathrm{max}}}{\sigma_{e,\mathrm{read}}}\right).$$
Here ne,max is the maximum number of electrons that can be collected before the
ADC saturates. This defines the raw DR upper limit. By using the read noise σe,read as
the noise floor, the raw DR lower limit is defined by the signal per photosite where
the SNR = 1. This is also known as noise equivalent exposure. When expressed in
decibels, the convention adopted here is such that 1 stop ≡ 20 log10 2 ≈ 6.02 decibels
(dB).
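A minimal Python sketch of the input-referred definition; the full-well capacity and read noise values are assumed for illustration.

import math

def raw_dr(n_e_max, read_noise_e):
    # raw DR per photosite in input-referred units
    ratio = n_e_max / read_noise_e
    return math.log2(ratio), 20.0 * math.log10(ratio)     # (stops, decibels)

stops, db = raw_dr(n_e_max=40000.0, read_noise_e=3.0)     # assumed FWC and read noise
print(round(stops, 2), round(db, 1))                      # ~13.7 stops, ~82.5 dB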
At the base ISO setting, the raw DR upper limit is defined by FWC, assuming
that the photoelectron well completely fills before the ADC saturates. It was shown
in section 3.9 of chapter 3 and in section 5.8.3 above that read noise expressed in
electrons decreases as the ISO gain is raised from its base value, and so the raw DR
lower limit is similarly lowered or improved. However, the raw DR upper limit is
lowered from FWC by one stop each time the corresponding ISO setting is doubled
from its base value. The overall result is that raw DR goes down with increasing ISO
gain.
Output-referred units
Using output-referred units, raw DR per photosite is defined as follows:
$$\text{raw DR (ratio)} = \frac{n_{\mathrm{DN,clip}}}{\sigma_{\mathrm{DN,read}}} : 1$$
$$\text{raw DR (stops)} = \log_2\!\left(\frac{n_{\mathrm{DN,clip}}}{\sigma_{\mathrm{DN,read}}}\right)$$
$$\text{raw DR (decibels)} = 20\log_{10}\!\left(\frac{n_{\mathrm{DN,clip}}}{\sigma_{\mathrm{DN,read}}}\right).$$
Here nDN,clip is the DN corresponding to the raw clipping point, and σDN,read is the
read noise in DN.
The ISO gain compensates for reduced exposure and electron count by main-
taining the raw level in output-referred units at the expense of SNR. In other words,
the read noise in DN goes up with increasing ISO gain while the raw clipping point
remains constant. Therefore DR goes down with increasing ISO gain. This is
consistent with the definition given above using input-referred units.
The sensor DR upper limit is defined by the signal at FWC, ne,FWC . The sensor DR
lower limit is defined by the contribution to the total read noise arising from the
sensor, σe,read,sensor . This is the contribution from components upstream from the
PGA. In terms of photoelectrons, the read noise contribution from electronics
downstream from the PGA is negligible at the ISO gain corresponding to the ISO-
less setting. Therefore σe,read,sensor can be defined as the read noise at the ISO-less
setting.
In practice, raw DR is less than the sensor DR only because the standard strategy
for quantizing the voltage signal does not overcome the limitations imposed by the
downstream electronics [33]. Camera manufacturers have recently begun to address
this issue by developing more sophisticated methods for quantizing the signal, such
as dual conversion gain [37].
Various attempts have been made to define alternative measures of raw DR.
Photographic dynamic range (PDR) [38] described in this section is a very simple
measure of perceivable raw DR that addresses the above issues.
In order to address issue (i) above, an appropriately normalised measure of SNR
needs to be introduced. One option is to use SNR per fixed percentage area of the
sensor, as already discussed in sections 5.8.5 and 5.8.7. PDR [38] uses the CoC as the
fixed percentage area, which offers several advantages.
It has already been shown in section 5.8.5 that signal, noise, and therefore
SNR are dependent on the spatial scale at which they are measured. Recall from
section 5.2 that detail on a smaller spatial scale than the CoC diameter cannot be
resolved by an observer of the output photograph under the viewing conditions
that specify the CoC. This suggests that the CoC is a suitable minimum spatial
scale at which to measure SNR as perceivable by an observer of the output
photograph.
As described in section 5.2, standard CoC diameters correspond with standard
viewing conditions. Standard viewing conditions assume that the photograph will be
displayed at A4 size and viewed at the least distance of distinct vision, Dv = 250 mm .
Alternatively, the photographer can calculate a custom CoC diameter based upon
the intended viewing conditions. Given the SNR per photosite, SNR per CoC can be
calculated as follows:
$$\text{SNR per CoC} = \text{SNR per photosite} \times \sqrt{\frac{\pi (c/2)^2}{A_p}}. \qquad (5.37)$$
Here Ap is the photosite area and c is the CoC diameter. As a metric, SNR per CoC
has two important properties.
• Photosite area and therefore sensor pixel count are accounted for since the
photosites are binned together up to the CoC area.
• Sensor format is accounted for since the CoC for a given format takes into
account the enlargement factor from the dimensions of the optical image
projected onto the SP to the dimensions of the viewed output photograph.
In order to address issue (ii) above, PDR uses a more appropriate raw DR lower
limit than the signal that provides an SNR per photosite = 1. An appropriate lower
limit is subjective since different observers have their own expectations in terms of
IQ. In [38], the PDR lower limit is defined as the signal that provides an SNR per
CoC = 20. This can be expressed by rearranging equation (5.37). Using output-
referred units,
PDR lower limit (DN) ≡ Raw level (DN) where
$$\text{SNR per photosite} = \frac{20}{\sqrt{\pi (c/2)^2 / A_p}}.$$
If raw level and SNR are both expressed in stops by taking the base 2 logarithm, the
definition becomes
PDR lower limit (stops) ≡ Raw level (stops) where
$$\text{SNR per photosite (stops)} = \log_2\!\left(\frac{20}{\sqrt{\pi (c/2)^2 / A_p}}\right).$$
Again using output-referred units, the PDR upper limit is defined by the raw
clipping point (RCP). The PDR upper limit may similarly be expressed in stops,
$$\text{PDR upper limit (stops)} = \log_2 \mathrm{RCP}.$$
Now PDR in stops can be defined as follows:
PDR (stops) = PDR upper limit (stops) − PDR lower limit (stops).
The PDR lower limit in output-referred units will increase as the ISO gain increases,
and so PDR will decrease upon raising the ISO setting, S.
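A minimal Python sketch of the PDR construction: find the raw level at which SNR per CoC reaches 20, then measure the distance in stops to the raw clipping point. The SNR model and all sensor parameters below are assumed purely for illustration and are not taken from [38].

import math

pitch_um = 4.0                            # assumed pixel pitch
A_p = pitch_um ** 2                       # photosite area in um^2
c_um = 30.0                               # CoC diameter for 35 mm full frame (0.030 mm)
binning = math.sqrt(math.pi * (c_um / 2) ** 2 / A_p)      # SNR gain from binning up to the CoC

def snr_per_photosite(n_dn, g=4.0, read_noise_e=3.0):
    n_e = g * n_dn                                        # input-referred signal
    return n_e / math.sqrt(n_e + read_noise_e ** 2)       # shot noise + read noise only

rcp = 16383.0                             # raw clipping point for an assumed 14-bit ADC
threshold = 20.0 / binning                # SNR per photosite giving SNR per CoC = 20
n_dn = 1.0
while snr_per_photosite(n_dn) < threshold:
    n_dn += 1.0                           # first raw level meeting the PDR lower limit
pdr_stops = math.log2(rcp) - math.log2(n_dn)
print(round(pdr_stops, 2))                # ~12 stops for these assumed values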
PDR curves for several example cameras are shown in figure 5.22. It should be
noted that the ISO values in the plot are the ISO settings labelled on the camera, and
these are defined according to the camera JPEG output rather than the raw data.
The comparison could be improved by plotting PDR according to the raw ISO
speed values discussed in section 5.8.8. Nevertheless, it is apparent from figure 5.22
Figure 5.22. PDR as a function of ISO setting for a selection of cameras [38].
that similar PDR values are obtained when PDR is compared using equivalent ISO
settings, in accordance with equivalence theory. The highest PDR values are
obtained on the larger formats where equivalent ISO settings do not exist on the
smaller formats.
Not all of the raw DR will be transferred to the output photograph in general.
Indeed, the image DR is dependent on the tone curve used when converting the raw
file into the output image file, as discussed in section 2.3 of chapter 2. Furthermore,
the image DR may be compressed or tonemapped into the DR of the display
medium, as described in section 2.13 of chapter 2. Nevertheless, PDR provides an
upper bound on the image DR that is in principle just perceivable when the output
photograph is viewed under the specified viewing conditions.
In section 5.2, it was shown that the conditions under which the output photo-
graph is viewed can be described by the CoC diameter, c. A noticeable drop in
perceived image sharpness may not be expected until the Airy disk diameter
approaches the CoC diameter:
$$2.44\,\lambda N \approx c. \qquad (5.40)$$
For example, consider an output photograph from a 35 mm full-frame camera
viewed under standard viewing conditions. In this case the viewing distance is
L = Dv , where Dv = 250 mm is the least distance of distinct vision, and the optical
image at the SP is enlarged by a factor X = 8, which corresponds with an A4 print.
According to equation (5.30), these conditions define a CoC diameter c = 0.030 mm.
Rearranging equation (5.40) and substituting c = 0.030 mm indicates that a
noticeable drop in perceived image sharpness is likely to occur at an f-number N ≈ 22.
Consequently, the optimum f-number is likely to be one or two stops below this value,
i.e. N = 16 or N = 11 on the 35 mm full-frame format. An accurate estimation would
require all contributions to the camera system MTF to be taken into account.
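A minimal Python sketch of this estimate, assuming λ = 550 nm:

lam_mm = 0.00055                          # assumed wavelength, 550 nm in mm

def f_number_at_coc(c_mm):
    # rearranged equation (5.40): N at which the Airy disk diameter reaches the CoC
    return c_mm / (2.44 * lam_mm)

print(round(f_number_at_coc(0.030), 1))   # ~22.4 for the full-frame CoC of 0.030 mm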
This means that averaging M frames decreases the temporal noise by a factor $\sqrt{M}$,
$$\langle\sigma_{\mathrm{DN}}\rangle_M = \frac{1}{M}\sqrt{M\sigma_{\mathrm{DN}}^2} = \frac{\sigma_{\mathrm{DN}}}{\sqrt{M}}.$$
Equivalently, averaging M frames increases the SNR by a factor $\sqrt{M}$ [39]. For example, averaging 16 frames will increase SNR by a factor of 4 (see the numerical sketch following this list).
• Dark-frame subtraction
Recall from chapter 3 that DSNU (dark-signal nonuniformity) is defined as
FPN that occurs in the absence of irradiance at the SP, and that the major
contribution arises from DCNU (dark-current nonuniformity). Nevertheless,
DSNU is still present when the imaging sensor is exposed to light. Although
cameras will estimate and subtract the average dark signal on a row or
column basis, this procedure will not eliminate the DCNU contribution since
this arises from variations in individual photosite responses over the SP.
$$\sqrt{\sigma_{\mathrm{DN}}^2 + (-\sigma_{\mathrm{DN}})^2} = \sqrt{2}\,\sigma_{\mathrm{DN}}.$$
2. The photographer has to wait for the dark frame to be taken after every
standard frame.
3. LENR may not remove any pattern component of non-Gaussian read
noise since this may vary from frame to frame.
For photographers who specialise in long-exposure photography, an alter-
native strategy is to construct a set of dark frame templates for various
exposure durations t and ISO settings S. Each template can be made by
averaging together many dark frames in order to minimize temporal noise
and isolate DSNU. Subtracting the template from the standard frame will
only increase temporal noise by a factor $\sqrt{1 + (1/M)}$, where M is the number
of dark frames used to make the template. If enough dark frames are used,
this process can also isolate any overall pattern that may arise from non-
Gaussian read noise.
• Flat-field correction
Recall from chapter 3 that PRNU (pixel response nonuniformity) increases in
proportion with exposure level and therefore electron count,
$$n_{e,\mathrm{PRNU}} = k\, n_e.$$
Since the total noise will be larger than the PRNU, the proportionality
constant k places a limit on the maximum achievable SNR [32, 33].
$$\mathrm{SNR} \leqslant \frac{n_e}{n_{e,\mathrm{PRNU}}} = \frac{1}{k}.$$
frames minimizes the temporal noise and isolates the PRNU. When any
standard frame is taken, it is divided by the flat-field template and then multiplied by the average raw value of the flat-field template.
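A minimal numerical Python sketch of the noise arithmetic quoted in this list, with an assumed per-frame temporal noise:

import math

def averaged_noise(sigma, M):
    # frame averaging: temporal noise falls as sqrt(M)
    return sigma / math.sqrt(M)

def noise_after_template_subtraction(sigma, M):
    # subtracting a template averaged from M dark frames adds its residual noise in quadrature
    return sigma * math.sqrt(1.0 + 1.0 / M)

sigma = 4.0                                                     # assumed per-frame temporal noise (DN)
print(averaged_noise(sigma, 16))                                # 1.0: SNR improves by a factor of 4
print(round(noise_after_template_subtraction(sigma, 1), 2))     # single dark frame: factor sqrt(2)
print(round(noise_after_template_subtraction(sigma, 64), 2))    # large template: almost unchanged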
The ETTR method requires that exposure decisions be made using the raw data
rather than the JPEG output image. However, camera LCD displays and electronic
viewfinders show the image histogram corresponding to the JPEG output even when
the camera is set to record raw files. Nevertheless, open-source firmware solutions
for displaying raw histograms are available. An alternative strategy is to exper-
imentally determine the difference between the JPEG clipping point and raw
clipping point under typical scene illumination. If the difference is known to be M
stops, then the photographer can safely overexpose by M stops beyond the right-
hand side of the JPEG image histogram when implementing the ETTR method.
On traditional DSLRs, the histogram needs to be checked after the photograph has
been taken. Cameras with liveview or mirrorless cameras with an electronic
viewfinder offer a ‘live histogram’ that can be monitored in real time.
exists for increasing SNR [33]. Specifically, read noise can be lowered by increasing
the ISO setting S. As explained in section 5.8.3, read noise decreases upon raising S
provided H is fixed, and so signal-to-read-noise ratio goes up. This is referred to as
shadow improvement. Since each doubling of S removes access to the uppermost stop
of the sensor response curve, the histogram is pushed to the right.
If the camera is in manual mode, the fixed-exposure ETTR method should be
implemented as follows:
1. Select the base ISO setting and maximise 〈H 〉 by using the lowest N and
longest t that the photographic conditions allow.
2. If headroom is still available, increase the ISO setting until the histogram is
pushed as far to the right as possible without clipping.
Step 1 utilizes the variable-exposure ETTR method described in the previous section
as far as the photographic conditions allow. This should always be applied first since
the greatest increase in total SNR is achieved by capturing more light. Step 2 then
utilizes the fixed-exposure ETTR method based on shadow improvement. In this
case, any further increase in total SNR is achieved in the low-exposure shadow
regions of the image.
Finally, the fixed-exposure ETTR method should only be applied up to the
camera-dependent ISO-less setting defined in section 5.8.4. Above the ISO-less
setting, no further shadow improvement can be achieved but available raw DR
decreases.
References
[1] Rowlands D A 2018 Equivalence theory for cross-format photographic image quality
comparisons Opt. Eng. 57 110801
[2] Rowlands D A 2017 Physics of Digital Photography 1st edn (Bristol: Institute of Physics
Publishing)
[3] Kingslake R and Johnson R B 2010 Lens Design Fundamentals 2nd edn (New York:
Academic)
[4] International Organization for Standardization 2006 Photography—Digital Still Cameras—
Determination of Exposure Index, ISO Speed Ratings, Standard Output Sensitivity, and
Recommended Exposure Index, ISO 12232:2006
[5] Camera and Imaging Products Association 2004 Sensitivity of Digital Cameras CIPA DC-
004
[6] Williams J B 1990 Image Clarity: High-Resolution Photography (Stoneham, MA: Focal
Press)
[7] Nasse H H 2008 How to Read MTF Curves (Carl Zeiss Camera Lens Division)
[8] Ray S F 2002 Applied Photographic Optics: Lenses and Optical Systems for Photography,
Film, Video, Electronic and Digital Imaging 3rd edn (Oxford: Focal Press)
[9] Nasse H H 2009 How to Read MTF Curves. Part II (Carl Zeiss Camera Lens Division)
[10] Smith W J 2007 Modern Optical Engineering 4th edn (New York: McGraw-Hill)
[11] Goodman J 2004 Introduction to Fourier Optics 3rd edn (Englewood, CO: Roberts)
[12] Shannon R R 1997 The Art and Science of Optical Design (Cambridge: Cambridge
University Press)
[13] Shannon R R 1994 Optical specifications Handbook of Optics (New York: McGraw-Hill)
ch 35
[14] Fiete R D 2010 Modelling the Imaging Chain of Digital Cameras, SPIE Tutorial Text vol
TT92 (Bellingham, WA: SPIE Press)
[15] Koyama T 2006 Optics in digital still cameras Image Sensors and Signal Processing for
Digital Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 2
[16] International Organization for Standardization 2000 Photography-Electronic Still-picture
Cameras-Resolution Measurements, ISO 12233:2000
[17] Burns P D 2000 Slanted-edge MTF for digital camera and scanner analysis PICS 2000:
Image Processing, Image Quality, Image Capture, Systems Conf. (Portland, OR) (IS&T)
vol 3 pp 135–38
[18] Burns P D and Williams D 2002 Refined slanted-edge measurement for practical camera and scanner testing PICS 2002: Image Processing, Image Quality, Image Capture, Systems Conf. (Portland, OR) (IS&T) vol 5 pp 191–95
[19] Palum R 2009 Optical antialiasing filters Single-Sensor Imaging: Methods and Applications
for Digital Cameras ed R Lukac (Boca Raton, FL: CRC Press) ch 4
[20] Charman W N and Olin A 1965 Image quality criteria for aerial camera systems Photogr.
Sci. Eng. 9 385
[21] Snyder H L 1973 Image quality and observer performance Perception of Displayed
Information (New York: Plenum Press) ch 3
[22] Granger E M and Cupery K N 1973 An optical merit function (SQF), which correlates with
subjective image judgments Photogr. Sci. Eng. 16 221
[23] Barten P G J 1990 Evaluation of subjective image quality with the square-root integral
method J. Opt. Soc. Am. A 7 2024
[24] Koren N 2009 Imatest Documentation (Imatest LLC)
[25] Baxter D, Cao F, Eliasson H and Phillips J 2012 Development of the I3A CPIQ spatial
metrics Image Quality and System Performance IX Proc. SPIE 8293 829302
[26] Mannos J and Sakrison D 1974 The effects of a visual fidelity criterion of the encoding of
images IEEE Trans. Inf. Theory 20 525
[27] Wolberg G 1990 Digital Image Warping 1st edn (Los Alamitos, CA: IEEE Computer Society
Press)
[28] Keys R G 1981 Cubic convolution interpolation for digital image processing IEEE Trans.
Acoust. Speech Signal Process 29 1153
[29] Mitchell D P and Netravali A N 1988 Reconstruction filters in computer graphics
Proceedings of the 15th Annual Conference on Computer Graphics and Interactive
Techniques, SIGGRAPH '88 (New York: Association for Computing Machinery) pp 221–8
[30] Wolberg G 2004 Sampling, reconstruction, and aliasing Computer Science Handbook ed
A B Tucker 2nd edn (Chapman and Hall/CRC) ch 39
[31] Holst G C and Lomheim T S 2011 CMOS/CCD Sensors and Camera Systems 2nd edn (JCD
Publishing/SPIE)
[32] Nakamura J 2006 Basics of image sensors Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 3
[33] Martinec E 2008 Noise, Dynamic Range and Bit Depth in Digital SLRs, unpublished
[34] Sanyal R 2015 Sony Alpha 7R II: Real-World ISO Invariance Study http://www.dpreview.
com/articles/7450523388
[35] Chen T, Catrysse P B, El Gamal A and Wandell B A 2000 How small should pixel size be?
Proceedings of SPIE 3965, Sensors and Camera Systems for Scientific, Industrial, and Digital
Photography Applications (Bellingham, WA: SPIE) pp 451–9
[36] Butler R 2011 Behind the Scenes: Extended Highlights! http://www.dpreview.com/articles/
2845734946
[37] Aptima Technology 2010 Leveraging Dynamic Response Pixel Technology to Optimize Inter-
scene Dynamic Range (Aptina Imaging Corporation) white paper
[38] Claff W J 2009 Sensor Analysis Primer–Engineering and Photographic Dynamic Range
unpublished
[39] Mizoguchi T 2006 Evaluation of image sensors Image Sensors and Signal Processing for
Digital Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 6
Index
6-2
Physics of Digital Photography (Second Edition)
colour space (CIE 1960 UCS) 4.2.2, 4.9.1 crop factor 1.3.6
colour space (CIE LAB) 2.2.3, 4.1, cross-format comparisons 1.3.6, 5, 5.1,
4.3.2, 4.4.2, 4.4.3 5.3.3, 5.4.1, 5.8.7
colour space (CIE RGB) 4.1.3, 4.1.6, CRT monitor see ‘cathode-ray tube
4.1.7 monitor’
colour space (CIE XYZ) 4.1.8, 4.4.2, current see ‘photocurrent’
4.4.3 cut-off frequency 3.1.8, 5.1.1
colour space (LMS) 2.1.2, 4.1.2, 4.6.2 cut-off frequency (detector) 3.3.4
colour space (output-referred) 2.1.2, 4.1, cut-off frequency (diffraction) 3.2.5,
4.1.12, 4.5 3.4.1, 3.5.2, 5.3.2, 5.10.1, 5.10.2
colour space (ProPhoto RGB) 4.5, cut-off frequency (effective) 5.3.2, 5.5.1
4.11.2, 4.11.3 cut-off frequency (object space) 5.10.1
colour space (reference) 4.1 cut-off frequency (sensor) see ‘cut-off
colour space (sRGB) 2.3.1, 4.1.12, 4.5, frequency (detector)’
4.5.1, 4.6, 4.7, 4.10, 4.11.1 cut-off frequency (system) 5.5, 5.10.1
colour temperature 4.2.1 cut-off frequency (vision) 3.4.3
colour tint 4.2.2 cycles per degree 5.6.3
Coltman’s formula 5.2.5 cycles per mm 3.1.7
coma 1.1.11, 1.5.10, 3.1.9, 5.3.4 cycles per pixel 5.4.1, 5.7.1
comb function 3.1.10, 3.3.3, 3.4.1, 3.4.2
complex amplitude 3.2.1 dark current 3.8.3, 3.9.2
compound lens see ‘lens (compound)’ dark-current shot noise see ‘noise (dark-
cone (light) 1.5.1, 1.5.2 current shot)’
cone cell (eye) 2.1.2 dark frame 3.9.2, 5.10.3
cone of vision 5.2.2 dark-frame subtraction 5.10.3
conversion factor 3.7.3, 3.9.1, 3.9.3, dark signal 3.8.2, 3.8.3, 4
4.3.1, 4.8.1, 5.8 dark-signal nonuniformity 3.8.5, 5.10.3
conversion gain 3.6.6, 5.9.2 data numbers see ‘digital numbers’
contour definition 5.6.3 dc bias 3.1.8
contrast 2.3.3, 2.12.2, 2.13.2 dcraw 2.12.1, 3.9, 4.3.2, 4.4.2, 4.8.2
contrast (mid-tone) 2.3.2, 2.3.3, 2.12.2 decibel 5.8, 5.9.1
contrast (waveform) 3.1.8 defocus aberration 1.2.1, 1.4.4, 1.4.6,
contrast ratio see ‘dynamic range 3.2.6, 5.2.3, 5.3.1, 5.3.4
(display)’ delta function 3.1.5, 3.1.6, 3.1.10, 3.3.3
contrast sensitivity function 5.6.3 demosaicing 2.1.2, 4.3.2
contrast transfer function 5.2.5 depletion region 3.6.2, 3.6.4, 3.6.6
convolution 3.1.4 - 3.1.6, 5.2.5, 5.7.1 depth of field 1.4, 2.8.1, 2.10, 3.2.4, 5.1.3
convolution theorem 3.1.7, 3.4.2 depth of focus 5.2.6
coordinate representation 1.5.7, 1.5.8, Descarte’s formula 1.1.8, 1.1.9
3.1.2, 3.2.3 detector aperture 3.3, 3.3.1, 3.3.2
correlated colour temperature 4.2.2, dielectric 2.11.2
4.2.4, 4.6.1, 4.7, 4.9.1 diffraction 3, 3.1.8, 3.2.2, 3.2.3, 3.2.5
correlated double sampling 3.8.2 diffraction limited 3.2.6, 5.3.2, 5.10.1
correlation 3.2.5 diffraction softening 3.2.4, 3.2.5, 5.6,
cosine fourth law 1.5.7 5.10.2
6-3
Physics of Digital Photography (Second Edition)
edge definition 5.4, 5.6 f-number 1.5, 1.5.3, 1.5.4, 1.5.10, 5.10.1
edge-spread function 5.5 f-number (working) 1.5.5, 5.1.3
effective focal length see ‘focal length’ f-stop 1.5.6, 2.1.3, 2.5.6, 2.10, 5.8
electric field 2.11.1, 3.2.1 Fermat’s principle 1.1.1, 1.1.4
electromagnetic energy 1.5.1, 3.1, 3.1.1 field curvature 1.1.11, 1.4.6, 5.3.1
electromagnetic optics 3.2.1 field of view 1.3, 1.3.2
electromagnetic wave 2.11.1, 3.2.1, 4.2.1 field stop 1.3.2
6-4
Physics of Digital Photography (Second Edition)
fill factor 2.6, 3.3.1, 3.3.4, 3.6.4, 4.3.4
film speed 2.5.3, 2.6
filter (lens) 2.10, 2.10.1, 2.11
fixed pattern noise see ‘noise (fixed pattern)’
fixed-point number 4.8.1
flare 2.2.5, 2.5.2
flash 2.9
flat-field correction 5.10.3
floating element 1.2, 1.2.2
flux 1.5.1, 1.5.2, 1.5.7
focal distance 1.1.9
focal length 1.1.9, 1.5.4
focal length (working) 5.1.2
focal length multiplier 1.3.6, 5.1, also see ‘equivalence ratio’
focal plane 1.1.9, 1.2
focal point 1.1.6, 1.1.9
focus 1.1.4, 1.2, 1.4, 5.2.3, 5.2.5, 5.3.4
focus and recompose 1.4.5
focus breathing 1.2, 1.2.2, 1.3.4, 1.3.5, 1.5.5
focus at infinity 1.1.6, 1.1.9, 1.1.10, 1.2.1, 1.3.3, 1.5.3, 1.5.4, 1.5.10, 5.1.2
focusing 1.2, 1.2.1, 1.3.5, 1.4.5
focusing screen 1.2.3
format see ‘sensor format’
forward matrix (Adobe) 4.9, 4.9.2
Fourier transform 3.1.7, 3.2.3, 3.4.1, 3.9.2, 5.7.2
frame 2.12, 3.8.1, 3.9.1, 5.10.3
frame averaging 2.12.1, 3.8.4, 5.10.3
frame readout speed 1.5.9
Fraunhofer diffraction 3.2.3
frequency (optical) see ‘optical frequency’
frequency (spatial) see ‘spatial frequency’
frequency leakage 5.7.1
Fresnel diffraction 3.2.3
Fresnel-Kirchhoff equation 3.2.2, 3.2.3
Fuji X-Trans 3.6.3, 4.3.2
full-frame format 1.3.6, 3.2.4, 5.1, 5.3.1, 5.10.2
full-well capacity 2.1.1, 2.1.3, 3.6.6, 3.7.1, 3.7.2, 5.8.2, 5.9.1
full-well capacity per unit area 5.8.1, 5.8.6
gain (analog) see ‘analog gain’
gain (conversion) see ‘conversion gain’
gain (conversion factor) see ‘conversion factor’
gain (digital) see ‘digital gain’
gain (display) 2.13.2
gamma correction 2.2.5
gamma decoding 2.2.1, 2.2.5, 2.13.2, 4.5
gamma encoding 2.2.1, 2.2.4, 2.3.1, 2.3.2, 4.5, 4.10.1
gamut 4.1.9, 4.4.1, 4.5, 4.11.1
gamut mapping 4.11.2, 4.11.4
Gaussian distribution 3.8.5, 3.9.2, 5.10.3
Gaussian equation 1.1.7, 1.1.9, 1.2.1, 1.3.3
Gaussian optics 1.1.2, 1.1.4, 1.1.5, 1.1.11, 3.2.6
Gaussian reference sphere 3.2.6
geometrical optics 1.1.1
graduated neutral density filter 1.2.2, 2.7, 2.9, 2.10.1
Grassmann’s laws 4.1.3, 4.1.8
greyscale 2.1.2, 2.12.1, 4.1.1
Gullstrand’s equation 1.1.8
halo 2.12.2
headroom (exposure) 2.6.1, 5.8.4, 5.10.5, 5.10.6
headroom (raw) see ‘raw headroom’
Helmholtz equation 3.2.1
high dynamic range 2.7, 2.9, 2.12
highlight dynamic range see ‘dynamic range (highlight)’
highlights 2.9
histogram 2.4, 2.4.1, 2.4.2, 5.10.3
hue 4.1.1, 4.1.7, 4.1.9
Hunt-Pointer-Estevez matrix 4.6.2
Huygens-Fresnel principle 3.2.2
hyperfocal distance 1.4.4
ICC colour profile see ‘colour profile’
illuminance 1.5.1, 1.5.3, 1.5.5, 1.5.7, 1.5.10, 2.12.1
illuminant A 4.2.4, 4.9.1
illuminant D50 4.2.4, 4.5, 4.8.2, 4.9, 4.9.2
illuminant D55 4.2.4
illuminant D65 4.1.12, 4.2.4, 4.4.4, 4.5.1, 4.6, 4.9.1, 4.10.2
illuminant D75 4.2.4
illuminant E 4.1.4, 4.1.7, 4.1.9, 4.1.12, 4.2.4, 4.3.5
illuminant estimation 4.6, 4.6.1
illumination 4.2, 4.2.1, 4.2.2, 4.6
image 1.1.1, 1.1.3
image circle 1.3.4
image distance 1.1.3, 1.1.7
image dynamic range see ‘dynamic range (image)’
image height 5.3.1, 5.3.3
image plane 1.1.3, 1.1.4, 1.1.10, 1.2.1, 5.10.1
image space 1.1.6, 1.5.7, 5.10.1
image stabilization 2.8.1, 2.8.2
imaginary primary 4.1.2, 4.1.3, 4.1.8, 4.1.9, 4.3.4, 4.5, 4.11.2, 4.11.4
impulse response 3.2.2
incandescence 4.2.2
incident-light metering see ‘metering (incident-light)’
incident-light value 2.7.3
incoherence 3.2.1, 3.2.3, 5.3
infinity focus see ‘focus at infinity’
infrared-blocking filter 3.6.3
input-referred units 3.9, 5.8.3, 5.9.1
integration time 3.8.3, 3.9.2
intensity (luminous) 1.5.1
intensity (optical) 3.2.1
intensity (radiant) 3.1.1, 3.2.1
interference 3.2.2
internal focusing see ‘focusing’
iris diaphragm 1.4.6, 1.5.6
irradiance 2.12.1, 3.2.1, 4.4.2, 5.10.1
ISO 2240 standard 2.5.3
ISO 2720 standard 2.5.1, 2.5.2, 2.5.3, 2.5.4
ISO 2721 standard 2.5.3
ISO 12232 standard 2, 2.5.1, 2.5.2, 2.5.4, 2.6.1 - 2.6.3, 3.1, 5.1.3, 5.8.8
ISO 12233 standard 5.5
ISO 17321 standard 4.4.2
ISO gain 3.7.1, 3.7.3, 5.9.1
ISO invariance 5.8.4
ISO-less setting 5.8.4, 5.9.2, 5.10.6
ISO setting 1.5.8, 2.5, 2.6, 3.7.1, 3.7.3, 3.9.3
ISO setting (base) see ‘base ISO setting’
ISO setting (equivalent) 5.1.3
ISO setting (raw) 5.8.8
ISO speed 2.6, 2.6.1, 5.8.8
isoplanatic see ‘linear shift-invariance’
isotherm 4.2.2
jagged edges 5.7.1
jinc function 3.2.4
jitter 5.6, 5.10.1
Kelvin 4.2.1
keystone distortion see ‘distortion (keystone)’
LAB colour space see ‘colour space (CIE LAB)’
Lagrange theorem 1.1.10, 1.5.3, 1.5.7, 1.5.10
Lambertian surface 1.5.2, 2.5.5
Lambert’s cosine law 1.5.2
LCD display 2.13.2
least distance of distinct vision 1.4.1, 5.2.1, 5.3.1, 5.9.3
least resolvable separation 5.1.2, 5.3.2
lens (compound) 1.1.2, 1.1.5, 1.1.6, 1.1.8, 3.2.3
lens (simple) 1.1.1
lens design 1.1.2, 1.1.11, 1.3.2, 3.2.6
lens MTF see ‘modulation transfer function (lens)’
lens PSF see ‘point spread function (lens)’
read noise see ‘noise (read)’
real rays 1.1.2, 1.1.11, 1.5.2, 1.5.10
reciprocal rule 2.8.1
recombination 3.6.2, 3.6.4
recommended exposure index 2.6.3
reconstruction 3.4.3, 5.7.1, 5.7.2
reconstruction kernel 5.7.1, 5.7.2
rectangle function 3.1.6, 3.3.4, 3.4.3, 5.7.1
reference sphere see ‘Gaussian reference sphere’
reference white 4.1.12, 4.2.3, 4.8, 4.10.2
reference white (camera) 4.3.5, 4.5.2, 4.6.3, 4.8
reflectance 2.5.5
reflected-light metering see ‘metering (reflected-light)’
reflection 2.9, 2.9.1, 2.11.2
refracting surface 1.1.1
refraction 1.1.1, 2.11.2, 3.4.7
refractive index 1.1.1, 1.1.6, 1.1.8, 1.1.9, 1.5, 5.10.1
refractive power see ‘power’
relative aperture 1.5, 1.5.3, 1.5.7
relative colourimetric intent 4.11.4
relative colourimetry 4.1.11
relative illumination factor 3.1.2, 3.5
relative luminance see ‘luminance (relative)’
rendering intent 4.11.2, 4.11.4
resampling 5.7, 5.8.5
resizing 4.11.5, 5.8.5
resolution (image display) 4.11.5
resolution (object) 5.10, 5.10.1
resolution (optical) 3, 5.3.2
resolution (perceived) 5.2
resolution (print) 4.11.5
resolution (screen) 4.11.5
resolving power 2.8.1, 5
resolving power (lens) 5.3.1, 5.3.2
resolving power (observer) 1.4.1, 5, 5.2, 5.2.1, 5.2.2
resolving power (system) 5.5, 5.6, 5.10, 5.10.1
response curve see ‘sensor response curve’
response function (camera) see ‘camera response functions’
response function (eye cone) see ‘eye cone response functions’
responsivity see ‘camera response functions’
RMS wavefront error see ‘wavefront error’
Robertson’s method 4.2.2, 4.9.1
rolling shutter 1.5.9
rotation matrix see ‘colour rotation matrix’
s-curve 2.3.2, 2.3.3, 2.12.2, 4.10
sagittal direction 5.3.1, 5.3.4
sampling 3, 3.1.10, 3.3.2, 3.3.3, 3.4.1, 5.7.1
sampling theorem 3.4.4
saturation 2.1.3, 2.6.1, 5.8.2, 5.8.3, 5.8.4
saturation (colour) 4.1.1, 4.1.7, 4.3.4
scattering 2.9
scene dynamic range see ‘dynamic range (scene)’
scene luminance ratio see ‘dynamic range (scene)’
Scheimpflug condition 1.3.8
sense node capacitance 3.6.6
sensitivity 2.6
sensor 3.3
sensor format 1.3.6, 1.4.1, 5.1, 5.10.2
sensor Nyquist frequency see ‘Nyquist frequency (sensor)’
sensor plane 1.2
sensor response curve 2.1.1, 2.1.3, 2.12.1
shadow dynamic range see ‘dynamic range (shadow)’
shadow improvement 5.8.3, 5.10.6
shadows 2.9
Shannon-Whittaker sampling theorem see ‘sampling theorem’
sharpness 5, 5.4, 5.6, 5.10.2
shutter 1.2.3, 1.5.9
shutter priority mode 2.8.2
tone mapping (local) 2.3.2, 2.3.3, 2.12.2, 4, 4.10
tone mapping operator 2.12.2, 4
transfer function see ‘optical transfer function’
transmission function (colour filter array) 3.6.4, 4.3.4
transmission function (sensor) 3.6.4, 4.3.4
transformation matrix 4.4.2, 4.4.4, 4.7
transmittance 1.5.3, 1.5.8, 2.5.2, 2.10
triangle function 3.1.6
tripod 5.8.1, 5.10.1
tristimulus values 2.1.2, 2.2.4, 4.1.2, 4.1.6, 4.1.8
tristimulus values (raw) see ‘raw tristimulus values’
ultraviolet light 5.10.1
undersampling 3.4.4
uniform chromaticity scale see ‘colour space (CIE 1960 UCS)’
unit focusing see ‘focusing’
unity gain 3.7.3, 5.8
unsharp mask 5.4, 5.6
upsampling 5.7, 5.7.1
variance 3.8.1, 3.8.4
vector (colour) 4.1.11, 4.1.12, 4.3.3
veiling glare 5.3.4
viewfinder 1.2.3, 1.4.5, 5.10.3
viewing distance 1.4.1, 5.2.1, 5.2.2, 5.2.4
vignetting 1.4.6, 1.5.7, 1.5.8, 2.5.2, 3.1.2
virtual image 1.3.1, 1.3.2
visible range see ‘wavelength (visible range)’
voltage 2.1.1, 2.1.3, 2.2.5, 3.6.6, 3.7.1, 5.8.4
von Kries CAT 4.6.2
wave see ‘sinusoidal waveform’
wave optics 3.2.1
wavefront 3.2.1, 3.2.3
wavefront error 3.2.6, 5.3.2
wavelength 2.9.1, 2.11.3, 3.2.1, 3.2.6, 5.10.1
wavelength (visible range) 3.1.1, 4.1.1
wavenumber 3.2.1
Weber-Fechner law 2.2.3
well capacity 3.6.2
white (reference) see ‘reference white’
white balance 2.1.2, 4.6, 4.8.1
white balance matrix 2.12, 4.8, 4.8.1, 4.8.2, 4.9.2
white balance multipliers see ‘raw channel multipliers’
white point 4.2.3, 4.3.5, 4.4.2, 4.6.1, 4.7
working equivalence ratio see ‘equivalence ratio (working)’
working f-number see ‘f-number (working)’
working focal length see ‘focal length (working)’
working space 4.11.1, 4.11.2, 4.11.4
XYZ colour space see ‘colour space (CIE XYZ)’
ynu raytrace 1.1.5, 1.1.6, 1.1.8
Young-Helmholtz theory 4.1.2
Zone System 2.5.3
zoom lens 1.3.5