Physics of Digital Photography
(Second Edition)
IOP Series in Emerging Technologies in Optics and Photonics

Series Editor
R Barry Johnson, a Senior Research Professor at Alabama A&M
University, has been involved for over 50 years in lens design,
optical systems design, electro-optical systems engineering, and
photonics. He has been a faculty member at three academic
institutions engaged in optics education and research, employed
by a number of companies, and provided consulting services.

Dr Johnson is an IOP Fellow, SPIE Fellow and Life Member, OSA Fellow, and was
the 1987 President of SPIE. He serves on the editorial board of Infrared Physics &
Technology and Advances in Optical Technologies. Dr Johnson has been awarded
many patents, has published numerous papers and several books and book chapters,
and was awarded the 2012 OSA/SPIE Joseph W Goodman Book Writing Award for
Lens Design Fundamentals, Second Edition. He is a perennial co-chair of the annual
SPIE Current Developments in Lens Design and Optical Engineering Conference.

Foreword
Until the 1960s, the field of optics was primarily concentrated in the classical areas of
photography, cameras, binoculars, telescopes, spectrometers, colorimeters,
radiometers, etc. In the late 1960s, optics began to blossom with the advent of new types of
infrared detectors, liquid crystal displays (LCD), light emitting diodes (LED), charge
coupled devices (CCD), lasers, holography, fiber optics, new optical materials,
advances in optical and mechanical fabrication, new optical design programs, and
many more technologies. With the development of the LED, LCD, CCD and other
electro-optical devices, the term ‘photonics’ came into vogue in the 1980s to describe
the science of using light in development of new technologies and the performance of
a myriad of applications. Today, optics and photonics are truly pervasive throughout
society and new technologies are continuing to emerge. The objective of this series is
to provide students, researchers, and those who enjoy self-teaching with a
wide-ranging collection of books that each focus on a relevant topic in technologies and
applications of optics and photonics. These books will provide knowledge to prepare
the reader to be better able to participate in these exciting areas now and in the future.
The title of this series is Emerging Technologies in Optics and Photonics where
‘emerging’ is taken to mean ‘coming into existence,’ ‘coming into maturity,’ and
‘coming into prominence.’ IOP Publishing and I hope that you find this Series of
significant value to you and your career.
Physics of Digital Photography
(Second Edition)
D A Rowlands

IOP Publishing, Bristol, UK


© IOP Publishing Ltd 2020

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system
or transmitted in any form or by any means, electronic, mechanical, photocopying, recording
or otherwise, without the prior permission of the publisher, or as expressly permitted by law or
under terms agreed with the appropriate rights organization. Multiple copying is permitted in
accordance with the terms of licences issued by the Copyright Licensing Agency, the Copyright
Clearance Centre and other reproduction rights organizations.

Permission to make use of IOP Publishing content other than as set out above may be sought
at permissions@ioppublishing.org.

D A Rowlands has asserted his right to be identified as the author of this work in accordance with
sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

ISBN 978-0-7503-2558-5 (ebook)


ISBN 978-0-7503-2559-2 (print)
ISBN 978-0-7503-2560-8 (myPrint)
ISBN 978-0-7503-2561-5 (mobi)

DOI 10.1088/978-0-7503-2558-5

Version: 20201001

IOP ebooks

British Library Cataloguing-in-Publication Data: A catalogue record for this book is available
from the British Library.

Published by IOP Publishing, wholly owned by The Institute of Physics, London

IOP Publishing, Temple Circus, Temple Way, Bristol, BS1 6HG, UK

US Office: IOP Publishing, Inc., 190 North Independence Mall West, Suite 601, Philadelphia,
PA 19106, USA
For my parents, Ann and Gareth
Contents

Preface xv
Author biography xvi
Abbreviations xvii

1 Photographic optics 1-1


1.1 Optical image formation 1-2
1.1.1 Refraction 1-2
1.1.2 Lens design 1-3
1.1.3 Paraxial imaging 1-4
1.1.4 Gaussian optics 1-6
1.1.5 Compound lenses: ynu raytrace 1-7
1.1.6 Principal planes 1-9
1.1.7 Gaussian conjugate equation 1-11
1.1.8 Thick and thin lenses 1-12
1.1.9 Focal length 1-14
1.1.10 Magnification 1-18
1.1.11 Lens aberrations 1-19
1.2 Focusing 1-20
1.2.1 Unit focusing 1-21
1.2.2 Internal focusing 1-23
1.2.3 Single lens reflex cameras 1-24
1.2.4 Phase-detect autofocus 1-25
1.3 Framing 1-27
1.3.1 Entrance and exit pupils 1-27
1.3.2 Chief rays 1-28
1.3.3 Pupil magnification 1-30
1.3.4 Angular field of view formula 1-32
1.3.5 Focus breathing 1-35
1.3.6 Focal length multiplier 1-36
1.3.7 Perspective 1-37
1.3.8 Keystone distortion 1-39
1.4 Depth of field 1-40
1.4.1 Circle of confusion 1-40
1.4.2 Depth of field formulae 1-42


1.4.3 Depth of field control 1-44


1.4.4 Hyperfocal distance 1-44
1.4.5 Focus and recompose limits 1-46
1.4.6 Bokeh 1-48
1.5 Photometric exposure 1-51
1.5.1 Photometry 1-51
1.5.2 Flux emitted into a cone 1-53
1.5.3 Relative aperture 1-56
1.5.4 f-number 1-58
1.5.5 Working f-number 1-59
1.5.6 f-stop 1-61
1.5.7 Natural vignetting 1-62
1.5.8 Camera equation 1-64
1.5.9 Shutters 1-65
1.5.10 f-number for aplanatic lenses 1-68
References 1-71

2 Digital output and exposure strategy 2-1


2.1 Raw data 2-2
2.1.1 Sensor response 2-2
2.1.2 Colour 2-3
2.1.3 Dynamic range transfer 2-5
2.2 Digital output levels 2-6
2.2.1 Bit depth reduction 2-7
2.2.2 Posterisation 2-8
2.2.3 Lightness 2-9
2.2.4 Gamma encoding 2-9
2.2.5 Gamma decoding 2-11
2.3 Image dynamic range 2-13
2.3.1 Gamma curves 2-14
2.3.2 Tone curves 2-15
2.3.3 Raw headroom 2-17
2.3.4 Shadow and highlight dynamic range 2-17
2.4 Histograms 2-18
2.4.1 Luminance histograms 2-19
2.4.2 Image histograms 2-19


2.5 Average photometry 2-20


2.5.1 Reflected light meter equation 2-21
2.5.2 Proportionality constant 2-22
2.5.3 Photographic constant 2-23
2.5.4 Hand-held meter calibration constant 2-24
2.5.5 Average scene luminance 2-24
2.5.6 Exposure value 2-25
2.6 Exposure index 2-26
2.6.1 ISO speed 2-27
2.6.2 Standard output sensitivity 2-30
2.6.3 Recommended exposure index 2-32
2.6.4 Extended highlights 2-32
2.7 Advanced metering 2-34
2.7.1 Exposure compensation 2-35
2.7.2 In-camera metering modes 2-35
2.7.3 Incident light metering 2-35
2.8 Exposure modes 2-36
2.8.1 Aperture priority 2-36
2.8.2 Shutter priority 2-37
2.8.3 Program mode 2-37
2.8.4 Manual mode 2-38
2.9 Photographic lighting 2-38
2.9.1 Sunrise and sunset 2-40
2.10 Neutral density filters 2-41
2.10.1 Graduated neutral density filters 2-42
2.11 Polarizing filters 2-44
2.11.1 Malus’ law 2-45
2.11.2 Surface reflections 2-46
2.11.3 Blue skies 2-47
2.11.4 Circular polarizing filters 2-48
2.12 High dynamic range 2-49
2.12.1 High dynamic range imaging 2-50
2.12.2 Tone mapping 2-52
2.13 Image display 2-54
2.13.1 Luma 2-54
2.13.2 Display luminance 2-55
2.13.3 Display dynamic range 2-56
References 2-57


3 Raw data model 3-1


3.1 Linear systems theory 3-2
3.1.1 Radiometry 3-3
3.1.2 Ideal optical image 3-5
3.1.3 Point spread function (PSF) 3-5
3.1.4 Linear shift invariance 3-6
3.1.5 Convolution: derivation 3-7
3.1.6 Convolution: examples 3-9
3.1.7 Optical transfer function 3-11
3.1.8 Modulation transfer function (MTF) 3-13
3.1.9 Phase transfer function 3-15
3.1.10 Model camera system 3-15
3.2 Optics 3-17
3.2.1 Wave optics 3-17
3.2.2 Huygens–Fresnel principle 3-20
3.2.3 Aperture diffraction PSF 3-21
3.2.4 Circular aperture: airy disk 3-27
3.2.5 Aperture diffraction MTF 3-29
3.2.6 Aberrations: wavefront error 3-31
3.3 Sensor 3-33
3.3.1 Spatial averaging 3-34
3.3.2 Detector-aperture PSF 3-35
3.3.3 Sampling 3-36
3.3.4 Detector-aperture MTF 3-37
3.4 Optical low-pass filter 3-39
3.4.1 Function sampling 3-39
3.4.2 Replicated spectra 3-40
3.4.3 Reconstruction 3-41
3.4.4 Aliasing 3-42
3.4.5 Sensor Nyquist frequency 3-43
3.4.6 Pre-filtering 3-44
3.4.7 Four-spot filter PSF 3-45
3.4.8 Four-spot filter MTF 3-46
3.5 Sampled convolved image 3-47
3.5.1 Model camera system PSF 3-48
3.5.2 Model camera system MTF 3-49


3.6 Charge signal 3-50


3.6.1 Sampled spectral exposure 3-50
3.6.2 Photoelements 3-51
3.6.3 Colour filter array 3-52
3.6.4 Camera response functions 3-53
3.6.5 Polychromatic PSF and MTF 3-56
3.6.6 Charge detection 3-57
3.7 Analog-to-digital conversion 3-58
3.7.1 Programmable ISO gain 3-58
3.7.2 Digital numbers 3-59
3.7.3 Conversion factor 3-60
3.7.4 Bias offset 3-61
3.8 Noise 3-61
3.8.1 Photon shot noise 3-62
3.8.2 Read noise 3-62
3.8.3 Dark current shot noise 3-63
3.8.4 Noise power 3-64
3.8.5 Fixed pattern noise 3-64
3.9 Noise measurement 3-65
3.9.1 Conversion factor measurement 3-65
3.9.2 Read noise measurement 3-67
3.9.3 Noise models 3-68
References 3-71

4 Raw conversion 4-1


4.1 Reference colour spaces 4-2
4.1.1 Theory of colour 4-3
4.1.2 Eye cone response functions 4-4
4.1.3 Colour-matching functions 4-5
4.1.4 Units 4-7
4.1.5 Standard luminosity function 4-8
4.1.6 CIE RGB colour space 4-9
4.1.7 rg chromaticity diagram 4-10
4.1.8 CIE XYZ colour space 4-11
4.1.9 xy chromaticity diagram 4-14
4.1.10 Absolute colourimetry 4-16
4.1.11 Relative colourimetry 4-16
4.1.12 Reference white 4-17


4.2 Illumination 4-18


4.2.1 Colour temperature 4-18
4.2.2 Correlated colour temperature 4-18
4.2.3 White point 4-20
4.2.4 Standard illuminants 4-20
4.3 Camera raw space 4-21
4.3.1 Raw channels 4-21
4.3.2 Colour demosaicing 4-22
4.3.3 Raw pixel vectors 4-24
4.3.4 Camera raw space primaries 4-24
4.3.5 Camera raw space reference white 4-25
4.4 Camera colour characterisation 4-25
4.4.1 Luther–Ives condition 4-26
4.4.2 Raw to CIE XYZ 4-26
4.4.3 Colour difference: CIE LAB 4-29
4.4.4 Transformation matrix normalisation 4-29
4.5 Output-referred colour spaces 4-30
4.5.1 sRGB colour space: linear form 4-32
4.5.2 CIE XYZ D65 to sRGB D65 4-33
4.5.3 Raw D65 to sRGB D65 4-34
4.6 White balance 4-35
4.6.1 Adopted white 4-37
4.6.2 Chromatic adaptation transforms 4-38
4.6.3 Raw channel multipliers 4-41
4.7 Strategy 1: transformation matrices + CAT 4-42
4.8 Strategy 2: raw channel multipliers + rotation matrix 4-43
4.8.1 Traditional digital cameras 4-44
4.8.2 dcraw 4-46
4.9 Adobe DNG 4-49
4.9.1 Method 1: transformation matrix + CAT 4-49
4.9.2 Method 2: raw channel multipliers + forward matrix 4-52
4.10 sRGB colour space: nonlinear form 4-53
4.10.1 sRGB digital output levels 4-53
4.10.2 sRGB colour cube 4-55
4.11 Raw processing workflow 4-56
4.11.1 Colour management 4-56
4.11.2 Maximal colour strategy 4-57


4.11.3 16-bit TIFF files 4-58


4.11.4 Adobe Photoshop colour settings 4-59
4.11.5 Image resizing 4-63
References 4-65

5 Camera image quality 5-1


5.1 Cross-format comparisons 5-2
5.1.1 Equivalence and image quality 5-4
5.1.2 Generalised equivalence theory 5-6
5.1.3 Proof of equivalence theory 5-9
5.2 Perceived resolution 5-20
5.2.1 Observer resolving power 5-20
5.2.2 Standard viewing conditions 5-21
5.2.3 Circle of confusion: standard value 5-22
5.2.4 Circle of confusion: custom value 5-23
5.2.5 Circle of confusion: derivation 5-23
5.2.6 Depth of focus 5-25
5.3 Lens MTF 5-25
5.3.1 Lens MTF: standard viewing conditions 5-27
5.3.2 Lens MTF: lens resolving power 5-30
5.3.3 Cross-format comparisons 5-33
5.3.4 Limitations of lens MTF 5-34
5.4 Camera system MTF 5-35
5.4.1 Cross-format comparisons 5-35
5.5 Camera system resolving power 5-36
5.5.1 Model camera system 5-36
5.6 Perceived image sharpness 5-39
5.6.1 MTF50 5-40
5.6.2 Example: pixel count 5-40
5.6.3 Subjective quality factor 5-41
5.7 Image resampling 5-44
5.7.1 Upsampling 5-44
5.7.2 Downsampling 5-48
5.8 Signal-to-noise ratio (SNR) 5-49
5.8.1 SNR and ISO setting 5-50
5.8.2 SNR: output-referred units 5-50
5.8.3 SNR: input-referred units 5-52


5.8.4 ISO invariance 5-53


5.8.5 SNR and pixel count 5-54
5.8.6 SNR per unit area 5-55
5.8.7 SNR: cross-format comparisons 5-56
5.8.8 Raw ISO values 5-57
5.9 Raw dynamic range 5-58
5.9.1 Raw dynamic range per photosite 5-58
5.9.2 Sensor dynamic range 5-59
5.9.3 Perceivable dynamic range 5-60
5.10 Practical strategies 5-62
5.10.1 Object resolution 5-63
5.10.2 Diffraction softening 5-65
5.10.3 Non-destructive noise reduction 5-67
5.10.4 Exposing to the right (ETTR) 5-69
5.10.5 ETTR: variable exposure 5-69
5.10.6 ETTR: fixed exposure 5-70
References 5-71

Index 6-1

Preface

The aim of this book is to provide a theoretical overview of the photographic
imaging chain. It is intended for use by both graduate students and established
researchers as a link between imaging science and photographic practice. It should
also be useful for photographers who have a graduate-level technical background.
Chapter 1 titled ‘Photographic Optics’ describes the formation of an optical
image by a compound photographic lens. Topics discussed include focusing,
framing, perspective and depth of field. The final section derives the photometric
exposure distribution formed at the sensor plane of the camera.
Chapter 2 titled ‘Digital Output and Exposure Strategy’ discusses the strategy for
generating useful digital output in response to the photometric exposure distribution
at the sensor plane derived in chapter 1. In modern digital photography, the
standard exposure strategy defined by the CIPA DC-004 and ISO 12232 standards is
based upon the output JPEG image obtained from the camera. Consequently, this
chapter begins by discussing the nature of the digital output levels of a digital image.
The theory of the standard exposure strategy is subsequently developed in detail.
For a typical scene, the aim is to produce a JPEG image that has the correct
mid-tone lightness. The later sections discuss practical exposure strategy and cover topics
such as photographic lighting, exposure modes and advanced metering, photographic
filters and high dynamic range imaging.
Chapter 3 titled ‘Raw Data Model’ uses linear systems theory to develop a model
of the raw data produced by a camera. The aim is to illustrate how the nature and
quality of the raw data is affected by phenomena such as aliasing and noise, along
with various blurring phenomena such as diffraction. A derivation of the charge
signal is included, along with a model of the analog-to-digital conversion process.
Chapter 4 titled ‘Raw Conversion’ describes the main steps involved in converting
the raw data into a viewable output colour image. The majority of the chapter is
devoted to colour conversion of the raw data. Unlike conventional academic
treatments, this chapter describes colour conversion and white-balancing strategies
that are used by digital cameras in practice.
Chapter 5 titled ‘Camera Image Quality’ discusses the theory behind camera and
lens image quality metrics. Unlike conventional treatments, this chapter demonstrates
how such metrics should be applied and interpreted when comparing camera
systems that have different sensor pixel counts or are based on different sensor
formats. Practical strategies for maximising the full image quality potential of a
camera system are also discussed.
In this second edition, the chapter structure of the first edition has been preserved
but the material has been reorganised and extensively rewritten. New material has
been added, typographical errors have been corrected, the figures have been
improved and a detailed index has been included.
I would like to thank Prof. R Barry Johnson for useful discussions.
Andy Rowlands, June 2020

Author biography

D A Rowlands
Andy Rowlands gained a first-class degree in Mathematics and
Physics and a PhD in Theoretical Condensed Matter Physics from
the University of Warwick, UK.
He was subsequently awarded a Fellowship in Theoretical Physics
from the Engineering and Physical Sciences Research Council
(EPSRC), which he held at the University of Bristol, UK, and
this was followed by research positions at Lawrence Livermore
National Laboratory, USA, Tongji University in Shanghai, China, and the University
of Cambridge, UK.
Andyʼs combined interests in physics and photography inspired the writing of this
book. His photographic work, much of which features China, can be viewed at
http://www.andyrowlands.com.

Abbreviations

1D one dimension
2D two dimensions
ADC analog-to-digital converter
ADU analog-to-digital unit
AF autofocus
AFoV angular field of view
AHD Adaptive Homogeneity-Directed
APEX Additive System of Photographic Exposure
APS-C Advanced Photo System Type-C
AS aperture stop
ATF aberration transfer function
Av aperture value
AW adopted white
BSI backside illumination
Bv brightness value
CAT chromatic adaptation transform
CCD charge-coupled device
CCE charge collection efficiency
CCT correlated colour temperature
CDS correlated double sampling
CFA colour filter array
CIE Commission Internationale de l’Eclairage
CIPA Camera and Imaging Products Association
CMM colour-matching module
CMOS complementary metal-oxide semiconductor
CoC circle of confusion
CRT cathode ray tube
CSF contrast sensitivity function
CTF contrast transfer function
DCNU dark current non-uniformity
DN data number
DNG digital negative
DoF depth of field
DOL digital output level
DR dynamic range
DSC/SMI digital still camera sensitivity metameric index
D-SLR digital single-lens reflex
DSNU dark signal non-uniformity
EC exposure compensation
EP entrance pupil
ESF edge spread function
ETTR expose to the right
Ev exposure value
EW entrance window
EXIF Exchangeable Image File
FF fill factor
FFP front focal plane


FoV field of view


FP focal plane
FPN fixed pattern noise
FS field stop
FT Fourier transform
FWC full-well capacity
GND graduated neutral density
HDR high dynamic range
HVS human visual system
ICC International Color Consortium
IP image plane
IQ image quality
ISO International Organization for Standardization
Iv incident light value
JPEG Joint Photographic Experts Group
LCD liquid crystal display
LDR low dynamic range
LENR long-exposure noise reduction
LRS least resolvable separation
LSI linear shift-invariant
LUT look-up table
MOS metal oxide semiconductor
MTF modulation transfer function
NA numerical aperture
ND neutral density
OA optical axis
OECF opto-electronic conversion function
OLPF optical low-pass filter
OP object plane
OPD optical path difference
OTF optical transfer function
PCS profile connection space
PDAF phase-detect autofocus
PDR photographic dynamic range
PGA programmable gain amplifier
PPG Patterned Pixel Grouping
PRNU pixel response non-uniformity
PSF point spread function
PTF phase transfer function
REI recommended exposure index
RMS root mean square
QE quantum efficiency
RA relative aperture
RFP rear focal plane
RI relative illumination
RP resolving power
SA spherical aberration
SLR single-lens reflex


SNR signal-to-noise ratio


SOS standard output sensitivity
SP sensor plane
SPD spectral power distribution
SQF subjective quality factor
Sv speed value
TIFF Tagged Image File Format
TMO tone-mapping operator
TTL through-the-lens
Tv time value
UCS uniform chromaticity scale
USM unsharp mask
VNG Variable Number of Gradients
WB white balance
XP exit pupil
XW exit window

IOP Publishing

Physics of Digital Photography (Second Edition)


D A Rowlands

Chapter 1
Photographic optics

A camera provides control over the light from the photographic scene as the light
flows through the lens to the sensor plane (SP), where an optical image is formed.
The nature of the optical image, along with the choice of exposure duration,
determines the exposure distribution at the SP. This stimulates a response from the
imaging sensor, and it is the magnitude of this response, along with subsequent
signal processing, that ultimately defines the appearance of the output digital image.
The photographer must balance a variety of technical and aesthetic factors that
determine the nature of the optical image formed at the SP. For example, the field of
view of a camera and lens combination is an important geometrical factor that
influences photographic composition. For a given sensor format, this is primarily
controlled by the choice of lens focal length. Another fundamental aesthetic aspect is
the choice of focus point and the depth of field (DoF), which is the depth of the
photographic scene that appears to be in focus. DoF is primarily controlled by the
object distance and the lens aperture diameter, which restricts the size of the light
bundle passing through the lens. Another aspect is the appearance of subject motion.
This depends on the exposure duration, which is controlled by the shutter. Short and
long exposure durations can, respectively, freeze or blur the appearance of subject
motion.
This chapter begins with the basics of optical image formation and works through
the optical principles that determine the fundamental nature of the optical image
and subsequent exposure distribution formed at the SP. Many of these principles can
be described using simple photographic formulae. Photometry is used to quantify
light, and Gaussian optics is used to describe light propagation in terms of rays.
Gaussian optics is a branch of geometrical optics that assumes ideal imaging by
neglecting lens aberrations and their associated image defects. Although aberrations
are an inherent property of spherical surfaces, modern photographic lenses are
generally well-corrected for aberrations through skilful lens design. Formulae based
on Gaussian optics predict the position and size of the optical image representative
of a well-corrected photographic lens.
In practice, an exposure distribution is required that stimulates a useful response
from the imaging sensor. This is a fundamental aim of a photographic exposure
strategy, which forms the subject of chapter 2. One of the key quantities involved is
the lens f-number. Although photography is typically carried out in air, the
derivation of the f-number presented in this chapter does not make any assumption
about the nature of the refractive media surrounding the lens. This helps to provide
deeper insight into its physical significance.
A variety of physical phenomena that are beyond the scope of geometrical optics,
such as diffraction, are introduced in chapter 3. Such phenomena profoundly affect
image quality, which is the subject of chapter 5.

1.1 Optical image formation


1.1.1 Refraction
Geometrical optics, also known as ray optics, uses rays to represent the propagation
of light as it traces paths through the optical system. The ability of a lens to bend
rays is explained by Fermat’s principle, which infers that light rays will choose a
path between any two points that is stationary with respect to variations in the path
length [1]. Usually, the path is a time minimum. If a light ray travels more slowly in
one medium compared to another, the quickest path may result in a change of
direction at the interface between the media. This is known as refraction. It follows
that the shape and material of a refractive medium such as glass can be used to
control the direction of light rays. The equation of central importance is Snell’s law
of refraction:
n′ sin I ′ = n sin I . (1.1)
Here, the first medium is described by its refractive index
n = c0/c,
where c0 is the speed of light in vacuum and c is the speed of light in the medium. The
second medium is similarly described by its refractive index n′, and a refracting
surface separates the media. The angles I and I ′ are the angles of incidence and
refraction, respectively. These angles are measured from the normal to the refracting
surface at the point of ray intersection, as shown in figure 1.1 for a single spherical
surface.
Only rotationally symmetric systems will be considered in this chapter. The axis
of symmetry is known as the optical axis (OA) and is conventionally taken along the
z direction. The refracting surface is defined by its radius of curvature R centred at
position C on the OA. Other angles of fundamental importance are U and U ′, which
are measured relative to the OA.
Surface power Φs is the optical or refractive power of the surface measured in
diopters (inverse metres):

Φs = (n′ − n)/R.     (1.2)

Figure 1.1. Snell's law of refraction illustrated using a converging surface with positive surface power. The incident ray (blue) is refracted at the spherical surface (magenta). The dotted line is the surface normal. Here, n′ > n, and R is positive since the surface is positioned to the left of C.

A converging surface has a positive surface power. Given an incoming ray travelling
parallel to the OA, a converging surface will redirect the ray so that it intersects the
OA. On the other hand, a diverging surface has negative power and will bend such a
ray away from the OA. If R → ∞, then the spherical surface takes the form of a plane
positioned perpendicular to the OA. In this case, the surface has zero refractive
power and a ray travelling parallel to the OA cannot be redirected, irrespective of
the value of n′.
A simple lens or element is defined as a transparent refractive medium such as
glass with refractive index nlens bounded by two refracting surfaces. A positive
overall refractive power is required for image formation by a photographic lens.
Although refraction by a single converging spherical surface can form an optical
image of an object, an image formed by a lens can be located outside the lens
refractive medium.
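
As a quick numerical illustration of equations (1.1) and (1.2), the short Python sketch below refracts a ray at a single spherical surface and evaluates the surface power. The refractive indices, the 30° incidence and the 0.1 m radius are assumed example values, not values taken from the text.

```python
import math

def refract_angle(n, n_prime, incidence_deg):
    """Snell's law, equation (1.1): n' sin I' = n sin I."""
    sin_i_prime = n * math.sin(math.radians(incidence_deg)) / n_prime
    return math.degrees(math.asin(sin_i_prime))

def surface_power(n, n_prime, R):
    """Surface power, equation (1.2): Phi_s = (n' - n)/R, in diopters for R in metres."""
    return (n_prime - n) / R

# Assumed example: air-to-glass surface with a 100 mm convex radius
n, n_prime, R = 1.0, 1.5, 0.1
print(refract_angle(n, n_prime, 30.0))   # ~19.47 degrees, bent towards the normal
print(surface_power(n, n_prime, R))      # 5.0 diopters (converging)
```

For the assumed air-to-glass surface, the ray is bent towards the normal and the surface converges with a power of 5 diopters.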

1.1.2 Lens design


Unfortunately, spherical surfaces inherently cause image defects known as aberrations.
These arise from the higher order terms in the expansion of the sine function
appearing in Snell's law:

sin θ = θ − θ³/3! + θ⁵/5! − θ⁷/7! + ⋯.


Figure 1.2. SA due to a single spherical surface. Rays originating from the same object point on the OA
converge to different image points (1) and (2) on the OA after undergoing refraction.

The simplest example conceptually is that of spherical aberration (SA). Consider an
object point on the OA positioned very far away from the spherical refracting
surface. Rays originating from this point that travel towards the surface will be
parallel to the OA when they reach the surface. When SA is present, these rays do
not converge at a unique image point on the OA after undergoing refraction. As
illustrated in figure 1.2, rays further from the OA converge closer to the surface.
However, aberrations can be minimized through the skilful design of a suitable
compound lens composed of an assembly of various types of simple elements. A
positive amount of a given aberration introduced by one element can be counterbalanced
by a negative amount of the same aberration introduced by another. Real
rays must be traced through the surfaces of the compound lens in order to determine
the nature of the resulting optical image. This complex computational task is
performed by lens designers using specialist software [2].
A perfectly designed lens will be free of aberrations. Although tradeoffs must be
made in practice and there will inevitably be residual aberrations, modern
photographic lenses are very well-corrected for the primary aberrations. Accordingly,
photographic formulae are based upon Gaussian optics, which describes image
formation in the absence of aberrations. The basic theory of Gaussian optics is
developed below.
It should be remembered that Gaussian optics describes a perfectly designed lens
and should not be viewed as an approximation. As discussed further in section
1.1.11, the Gaussian properties of a compound lens are calculated as a first step in
lens design in order to provide an ideal reference system when counterbalancing
aberrations.

1.1.3 Paraxial imaging


Fundamental information such as the position of the ideal optical image relative to
the object can be obtained by considering the so-called paraxial region around the

1-4
Physics of Digital Photography (Second Edition)

OA where all angles are infinitesimally small. In particular, the following relation
holds exactly:
sin θ = θ (when θ → 0).
This means that the effects of aberrations are infinitesimally small in the paraxial
region and so the image-forming properties are ideal. Paraxial quantities are
conventionally denoted using lower-case symbols:
S→s
S ′ → s′
I →i
I ′ → i′
U →u
U ′ → u′ .
Snell’s law becomes
n′ i′ = n i . (1.3)

For simplicity, again consider a single spherical surface separating two refractive
media. Figure 1.3 shows a point object p positioned at the OA at a distance s from
the surface. The plane perpendicular to the OA at p is defined as the object plane
(OP).
Consider a paraxial ray making an infinitesimal angle u with the OA. After
refraction, this ray will intersect the OA again at point p′ a distance s′ from the
surface. The point p′ is defined as the image of p. The plane perpendicular to the OA
at p′ is defined as the image plane (IP).
Graphical consideration of figure 1.3 along with equation (1.3) can be used to
derive the following relationship [1]:

n/s + n′/s′ = Φs.     (1.4)

Figure 1.3. Snell's law in the paraxial region where all angles are infinitesimally small. The object is positioned at the OA, and the dotted line is the surface normal. Here, u is positive, u′ is negative and s, s′ are both positive.

Here Φs is the surface power defined by equation (1.2). Given a single spherical
surface and the OP distance s, the significance of equation (1.4) is that only
knowledge of the surface power is needed to locate the corresponding IP.
The photographic sign convention is such that the OP distance s is positive when
measured from right to left, and the IP distance s′ is positive when measured from
left to right.
Although equation (1.4) can be used to locate the positions of the OP and IP, the
angles involved are paraxial. Therefore, the object at the OP and image at the IP can
only be points positioned infinitesimally close to the OA. Further development is
needed in order to describe imaging of real objects of arbitrary size.
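
Given the surface power, equation (1.4) can be rearranged for the IP distance, s′ = n′/(Φs − n/s). A minimal Python sketch follows, assuming a 5-diopter air-to-glass surface and an OP 1 m away; these numbers are illustrative assumptions.

```python
def image_distance(n, n_prime, s, phi_s):
    """Solve equation (1.4), n/s + n'/s' = Phi_s, for the IP distance s'."""
    return n_prime / (phi_s - n / s)

# Assumed example: air-to-glass surface with Phi_s = 5 diopters, OP 1 m away
print(image_distance(1.0, 1.5, 1.0, 5.0))   # s' = 1.5/(5 - 1) = 0.375 m
```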

1.1.4 Gaussian optics


In order to utilize the ideal paraxial properties of a surface for imaging real objects
of arbitrary size, a way forward is suggested by the following relation that holds
exactly in the paraxial region:
tan θ = θ (when θ → 0).
The paraxial angles measured relative to the OA satisfy
u = tan u,
u′ = tan u′.     (1.5)
When considering arbitrary real heights above the OA, Gaussian optics maintains the
above relationships by performing a fictitious extension of the paraxial region [3].
The extension proceeds by projecting each refracting surface onto a tangent plane, and
allowing equation (1.5) to be used at arbitrary ray intersection heights y. As illustrated
in figure 1.4, the quantities u and u′ become
u = tan⁻¹(y/s) = y/s,
u′ = tan⁻¹(−y/s′) = −y/s′.     (1.6)
This means that outside of the paraxial region where y can be made arbitrarily large,
the quantities u and u′ are no longer paraxial angles, but instead must be interpreted
as ray tangent slopes [2, 3].
Similarly, u and u′ are equivalent to i and i′ since the normal to the tangent plane
is parallel to the OA, and so i and i′ must be interpreted as tangent slopes of
incidence and refraction rather than angles. Nevertheless, a ray described by u, u′ or
i, i′ is often referred to as a paraxial ray.

Figure 1.4. The point of ray intersection with the spherical surface (magenta curve) at height y is projected onto a tangent plane (red line) at the same height y. Here the ray tangent slopes u and u′ are positive and negative, respectively.

Recall from section 1.1.1 that a plane surface has no refractive power. In
Gaussian optics, the refractive power of the original spherical surface, which is
also present in the paraxial region, is retained when extending the paraxial region.
Substituting equation (1.6) into (1.4) yields the following result:

n′u′ = nu − yΦs.     (1.7)

This is the form of Snell's law obtained by extending the paraxial region and
retaining the surface power. The significance of this equation is the linear relationship
between the ray tangent slopes and heights; all slopes and heights can be scaled
without affecting the positions of the OP and IP. In other words, Fermat's principle
must be violated in order to exclude aberrations from consideration.
For a single surface, equation (1.4) is now formally valid for imaging of points
positioned at arbitrary height above the OA and therefore real objects of arbitrary
size. Figure 1.5 shows an object of height h positioned at the OP. The location of the
IP can be found by tracing rays from the axial (on-axis) position on the OP. Rays
from a non-axial position on the OP such as the top of the object will meet at a
unique point on the same IP since aberrations are absent. Therefore, the IP defines
the plane where the optical image of the OP can be said to be in sharp focus.
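
A single application of equation (1.7) at a tangent plane can be sketched as follows; the 5-diopter air-to-glass surface and the 10 mm ray height are assumed example values.

```python
def gaussian_refract(n, n_prime, u, y, phi_s):
    """Equation (1.7): n'u' = nu - y*Phi_s, returning the refracted tangent slope u'."""
    return (n * u - y * phi_s) / n_prime

# Assumed example: collimated ray (u = 0) at height 10 mm meeting a 5-diopter
# air-to-glass surface (n = 1, n' = 1.5)
u_prime = gaussian_refract(1.0, 1.5, 0.0, 0.01, 5.0)
print(u_prime)            # ~-0.0333, converging towards the OA
print(-0.01 / u_prime)    # crosses the OA 0.3 m (= n'/Phi_s) beyond the surface, per eq. (1.6)
```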

1.1.5 Compound lenses: ynu raytrace


For imaging by a single surface, recall the following:
• Aberrations can be left out of consideration by taking the paraxial limit.
However, real objects of arbitrary size cannot be imaged in the paraxial
region.
• Gaussian optics linearly extends the paraxial region to arbitrary heights
above the OA while retaining the surface power. This enables real objects of
arbitrary size to be imaged.

Figure 1.5. Imaging by a spherical surface after extending the paraxial region. Example rays for an object
point on the OA and at the top of the object of height h are shown. The ray slopes and intersection height have
been indicated for the grey ray. In this case, both u and u′ are negative.

Since Gaussian optics can describe imaging of real objects by a single surface,
Gaussian optics can also describe imaging of real objects by a general compound lens
that may consist of a large number of surfaces of various curvatures and spacings.
This is achieved by tracing paraxial rays through the lens by applying equation (1.7)
at each surface. This is known as a ynu raytrace [2, 3]. Only ray slopes u and u′
measured relative to the OA are needed along with the ray intersection height y at
each tangent plane. The IP will be located after the final surface.
As an example, figure 1.6 illustrates the transfer of a ray between two refracting
surfaces. Only the tangent planes to the surfaces are shown. Equation (1.7) can be
applied at the tangent plane to surface 1:
n1′u1′ = n1u1 − y1Φ1,
where u1 and u1′ are the ray slopes of incidence and refraction and y1 is the ray height
at the tangent plane. Similarly, equation (1.7) can be applied at the tangent plane to
surface 2:
n2′u2′ = n2u2 − y2Φ2.
The transfer of the ray between the surfaces is achieved by noting that
n2u2 = n1′u1′
y2 = y1 + t12 u1′,
where t12 is the axial distance between the surfaces.


Figure 1.6. Transfer of a paraxial ray between two surfaces.

Details of standard layouts for various types of photographic lenses can be found
in the lens patent literature and a variety of textbooks [4–6].
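
The ynu raytrace described above amounts to alternating applications of the refraction equation (1.7) and the transfer relations. A minimal Python sketch is given below; the biconvex singlet prescription (nlens = 1.5, R1 = 0.1 m, R2 = −0.1 m, 5 mm thickness) is an assumed example, and the slope signs follow equation (1.6).

```python
def ynu_raytrace(surfaces, spacings, y1, u1, n_object=1.0):
    """Trace a paraxial ray through a sequence of tangent planes using the
    refraction equation (1.7), n'u' = nu - y*Phi_s, together with the transfer
    relations of section 1.1.5 (y2 = y1 + t12*u1').

    surfaces : list of (surface_power, image_side_refractive_index) tuples
    spacings : axial separations between consecutive surfaces (one fewer than surfaces)
    Returns the ray height and tangent slope after the final surface.
    """
    y, u, n = y1, u1, n_object
    for k, (phi, n_prime) in enumerate(surfaces):
        u = (n * u - y * phi) / n_prime      # refraction at the tangent plane
        n = n_prime
        if k < len(spacings):
            y = y + spacings[k] * u          # transfer to the next tangent plane
    return y, u

# Assumed example: a biconvex singlet in air with n_lens = 1.5, R1 = 0.1 m,
# R2 = -0.1 m and a 5 mm axial thickness. Trace a collimated ray (u1 = 0)
# entering at a height of 10 mm.
n_lens = 1.5
surfaces = [((n_lens - 1.0) / 0.1, n_lens),    # first surface: +5 diopters
            ((1.0 - n_lens) / -0.1, 1.0)]      # second surface: +5 diopters
yk, uk = ynu_raytrace(surfaces, [0.005], y1=0.01, u1=0.0)
print(yk, uk)        # exit height and exit tangent slope
print(-yk / uk)      # back focal distance from the final surface (~0.0992 m)
print(-0.01 / uk)    # rear effective focal length f' (~0.1008 m) with the slope sign of eq. (1.6)
```

For this assumed element the implied total power is about 9.92 diopters, which agrees with Gullstrand's equation for the same prescription (section 1.1.8).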

1.1.6 Principal planes


Within Gaussian optics, a compound lens can be replaced by a set of cardinal points
that provide an equivalent description of its refractive properties. The cardinal
points include the principal points and focal points.
Recall from the previous section that for a specified OP, the corresponding IP is
located by performing a ynu raytrace through all surfaces. The principal and focal
points can be found by performing a ynu raytrace for an OP positioned at an infinite
distance away from the lens. Selecting the OP at an infinite distance away from the
lens is known as setting focus at infinity.
Figure 1.7 shows the first and final surfaces of a compound lens. As the OP
distance approaches infinity, rays from the OP that enter the lens will be collimated,
which means parallel to each other. Furthermore, rays from the axial (on-axis)
position on the OP will be parallel to the OA when entering the lens, which means
that u1 = 0. To see this, consider rays leaving the axial position on the OP in
figure 1.5 and observe the effect on the rays entering the lens as the OP is moved
towards infinity.
The ray shown in figure 1.7 intersects the first surface at height y1 and emerges
from the final surface with tangent slope uk. The ray subsequently intersects the OA
at the rear focal point F′ beyond the final surface. The second principal plane H′ is a
flat surface located by extending this ray backwards to height y1. The second
principal point P′ is located where H′ intersects the OA.
Figure 1.8 shows a ray that emerges parallel to the OA with uk = 0 from the final
surface at a height yk. This ray can be traced backwards to locate the front focal
point F positioned on the OA in front of the first surface. The first principal plane H
and associated first principal point P are located by extending the ray forwards to
height yk.

Figure 1.7. Principal points and second focal point F′ for a compound lens. The ray illustrated is used to locate the second principal plane H′.

Figure 1.8. Principal points and first focal point F for a compound lens. The ray illustrated is used to locate the
first principal plane H.

The principal planes may be located in any order and not necessarily inside the
lens itself. The second principal plane H′ is often located to the left of H, and so the
terms ‘front’ and ‘rear’ are usually avoided.
An important property of H and H′ is that they are planes of unit magnification
since rays travel parallel between them in this equivalent refracting scenario. It must
be emphasised that rays do not necessarily follow the path indicated by the dashed
lines in figures 1.7 and 1.8, and that these equivalent refracting scenarios must
generally be treated as hypothetical.


Object space and image space


When a lens is represented by its cardinal points, the region of space to the left of the
first surface is extended up to H. This is defined as object space. Object space has the
same refractive index n as the region of space to the left of the first surface, which is
typically air.
Analogously, the region of space to the right of the final surface is extended up to
H′. This is defined as image space. Image space has the same refractive index n′ as
the region of space to the right of the final surface, which again is typically air.
The same overall refractive properties will result from replacing the lens and
surrounding refractive media by object and image space together with the pair of
principal planes H and H′. Object space defines the region of space before rays undergo
refraction, and image space defines the region of space after all refraction has taken
place. Image-space quantities are conventionally denoted using a prime symbol.

1.1.7 Gaussian conjugate equation


When a compound lens is replaced by the principal planes along with object space
and image space, equation (1.4) for refraction at a single surface can be generalised
to describe the compound lens:
n/s + n′/s′ = Φ.     (1.8)

This is known as the Gaussian conjugate equation.

Figure 1.9. The Gaussian conjugate equation describes the relationship between the OP to H distance s and the IP to H′ distance s′ for a general compound lens.

• Φ is the total refractive power of the compound lens. In general, Φ depends


not only on the individual surface powers, but also upon all surface and
element spacings. The value will emerge from the ynu raytrace described in
the previous section.
• n and n′ are the refractive indices of object space and image space, respectively.
• s is the OP distance measured along the OA from the first principal point P.
• s′ is the IP distance measured along the OA from the second principal point P′.
• The photographic sign convention is to take s as positive when measured
from right to left, and s′ as positive when measured from left to right. These
distances are illustrated in figure 1.9.
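
For the common photographic case n = n′ = 1, a short worked example of equation (1.8) is sketched below. The 50 mm focal length (Φ = 20 diopters) and the 2 m OP distance measured from P are assumed example values.

```python
def conjugate_image_distance(phi, s, n=1.0, n_prime=1.0):
    """Solve the Gaussian conjugate equation (1.8), n/s + n'/s' = Phi, for s'."""
    return n_prime / (phi - n / s)

# Assumed example: a 50 mm lens in air (Phi = 1/0.05 = 20 D) focused on an OP 2 m from P
s_prime = conjugate_image_distance(phi=20.0, s=2.0)
print(s_prime)           # ~0.05128 m
print(s_prime - 0.05)    # IP sits ~1.3 mm behind the rear focal plane (f' = n'/Phi = 0.05 m)
```

The IP falls roughly 1.3 mm behind the position it occupies when focus is set at infinity, which anticipates the focusing discussion of section 1.2.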

1.1.8 Thick and thin lenses


For a general compound lens, the total refractive power Φ appearing in equation
(1.8) can be found by performing a ynu raytrace through all surfaces with the OP at
infinity. However, very simple formulae for Φ can be found for the special cases of a
single thin or thick lens.

Thin lens
The thin lens illustrated in figure 1.10 has negligible thickness compared to its
diameter. The total refractive power Φ is simply the sum of the individual surface
powers Φ1 and Φ2 [1]:
Φ = Φ1 + Φ2,
Φ1 = (nlens − n)/R1,     (1.9)
Φ2 = (n′ − nlens)/R2.

Figure 1.10. The idealised thin lens with tlens → 0. R1 and R2 are the radii of curvature of the first and second refracting surfaces, respectively. The corresponding centres of curvature are labelled C1 and C2. Here R2 is negative because C2 lies to the left of the second surface.

If nlens > n, n′, then Φ1 and Φ2 are both positive. Equation (1.8) with Φ defined by
equation (1.9) is known as Descartes' thin lens formula.
Since tlens → 0, the tangent planes for the two spherical surfaces shown in
figure 1.10 are coincident with the principal planes H and H′ ‘at the lens’.
Accordingly, the OP distance s and IP distance s′ are both measured ‘from the
lens’. Given s, the distance s′ can be found by solving equations (1.8) and (1.9).
Air has a refractive index equal to 1.000 277 at standard temperature and pressure
(0 °C and 1 atm). In comparison, water has a refractive index equal to 1.3330 at
standard temperature (20 °C). A value of 1 for the refractive index of air is often
assumed to simplify photographic formulae. For precise calculations, refractive
indices may be defined relative to air rather than to vacuum. In either case, equations
(1.8) and (1.9) simplify when the lens is surrounded by air:
1/s + 1/s′ = (nlens − 1)(1/R1 − 1/R2).
This is known as the lensmakers’ formula. In analogy with imaging by a single
refracting surface, the significance of Descartes’ and the lensmakers’ formulae is that
only the total refractive power is required to locate the IP.

Thick lens
In practice, an element of a compound lens can be treated mathematically as a thin
lens for preliminary calculations or if neglecting its thickness does not affect the
accuracy of the calculation [3]. For a single thick lens, a third term Φ12 emerges:

Φ = Φ1 + Φ2 − Φ12,
Φ1 = (nlens − n)/R1,
Φ2 = (n′ − nlens)/R2,     (1.10)
Φ12 = (tlens/nlens) Φ1Φ2.

Figure 1.11. Thick lens with thickness tlens.

As illustrated in figure 1.11, tlens is the lens thickness at the OA. Equation (1.8) with Φ defined by equation (1.10) is known as Gullstrand's equation.
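
Gullstrand's equation is straightforward to evaluate numerically. In the sketch below the biconvex element (nlens = 1.5, R1 = 0.1 m, R2 = −0.1 m) is an assumed example; setting tlens = 0 recovers the thin-lens result of equation (1.9).

```python
def gullstrand_power(n, n_prime, n_lens, R1, R2, t_lens):
    """Total power of a single thick lens, equation (1.10)."""
    phi1 = (n_lens - n) / R1
    phi2 = (n_prime - n_lens) / R2
    phi12 = (t_lens / n_lens) * phi1 * phi2
    return phi1 + phi2 - phi12

# Assumed example: biconvex element in air, n_lens = 1.5, R1 = 0.1 m, R2 = -0.1 m
print(gullstrand_power(1.0, 1.0, 1.5, 0.1, -0.1, t_lens=0.0))      # 10.0 D (thin-lens limit)
print(gullstrand_power(1.0, 1.0, 1.5, 0.1, -0.1, t_lens=0.005))    # ~9.92 D for a 5 mm thick element
```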

1.1.9 Focal length


A physically more useful way of representing the ability of a lens to bend light rays is
in terms of focal length rather than total refractive power Φ. This yields an
alternative form of the Gaussian conjugate equation.

Rear effective focal length


Recall from section 1.1.6 that when the OP distance s → ∞, rays originating from
the axial position on the OP will converge to the second focal point F′ upon exiting
the lens. This is shown in figure 1.12 (a). The plane perpendicular to the OA at F′ is
known as the rear focal plane. When s → ∞, the physical distance s′ along the OA
from P′ to F′ is defined as the second effective focal length [3], posterior focal length [2]
or rear effective focal length [7] denoted by fR′ or f ′. This is the minimum possible
distance over which the lens can bring rays to a focus. In this situation the IP will
coincide with the rear focal plane. This important concept is known as setting focus
at infinity. From equation (1.8),
n′/f′ = Φ.     (1.11)

Evidently the total refractive power Φ can be obtained using the above equation
after performing a ynu raytrace through all surfaces and calculating f ′, where
f ′ = y1/uk .

Front effective focal length


Recall from section 1.1.6 that rays from the front focal point F will emerge parallel
to the OA so that the IP distance s′ → ∞. The plane perpendicular to the OA at F is
known as the front focal plane. The physical distance along the OA from P to F is
defined as the first effective focal length [3], anterior focal length [2] or front effective
focal length [7] denoted by fF or f . A lens is unable to focus on an object placed at F
or closer. From equation (1.8),

n/f = Φ.     (1.12)
Figure 1.12. Front and rear effective focal lengths for a compound lens. (a) Rear effective focal length f ′ and
rear focal point F′. (b) Front effective focal length f and front focal point F.

Gaussian conjugate equation


The front and rear effective focal lengths f and f ′ are equal only if n = n′. The term
‘effective’ is used because f and f ′ are the physical distances from the principal
points to the respective focal points, the principal point locations being determined
by hypothetical equivalent refracting planes.
Combining equations (1.8), (1.11) and (1.12) yields an alternative form of the
Gaussian conjugate equation,

n/f = n/s + n′/s′ = n′/f′.     (1.13)

This form of the Gaussian conjugate equation describes the behaviour illustrated in
figure 1.13. If the OP is brought forward from infinity so that s decreases, the
distance s′ increases and so the IP moves further away from the rear focal plane.

Figure 1.13. Imaging according to the Gaussian conjugate equation. Only rays from the axial position on the
OP are shown.

Front and back focal distances


The front focal distance is measured from the point where the first surface of the lens
intersects the OA to the front focal point. The rear or back focal distance is measured
from the point where the final surface of the lens intersects the OA to the rear focal
point [3, 7]. These distances are commonly used in lens design. Care should be taken
not to confuse them with f and f ′ which are ‘effective’ quantities measured from the
principal planes.

Nodal points
Photographers often refer to the nodal points of a lens rather than the principal
points. The nodal points can be visualised intuitively; a ray aimed at the first nodal
point in object space will emerge from the second nodal point with the same slope in
image space. The nodal points are therefore points of unit angular magnification.
The front nodal point is often assumed to be the no-parallax point in panoramic
photography; in fact, the no-parallax point is the lens entrance pupil [8].
The front and rear effective focal lengths are always defined from the principal
points [1, 7]. Nevertheless, they can be measured from the nodal points provided the
definition is reversed [4]; the distance from the first nodal point to the front focal
point is equal to the magnitude of the rear effective focal length f ′, and the distance
from the second nodal point to the rear focal point is equal to the magnitude of the
front effective focal length f . This is illustrated in figure 1.14.
If the object-space and image-space refractive indices n and n′ are equal, then the
first nodal and first principal points coincide, the second nodal and second principal
points coincide, and f = f ′. This is naturally the case in air, where a nodal slide can
be used to experimentally determine effective focal lengths [1].

Effective focal length


The effective focal length fE is defined as the reciprocal of the total refractive power [7]:

fE = 1/Φ.     (1.14)

Figure 1.14. Lens nodal points; a ray aimed at the first nodal point, N, emerges from the second nodal point, N′, at the same tangent slope, u. In this example, image space has a higher refractive index than object space.

Table 1.1. Focal lengths and surface powers for a thin lens calculated using equation (1.9) assuming that R1 = 0.1 m, R2 = −0.1 m and nlens = 1.5. All powers and focal lengths are measured in diopters and metres, respectively.

        Air–lens–air   Water–lens–water   Water–lens–air   Air–lens–oil
n       1              1.33               1.33             1
n′      1              1.33               1                1.5
Φ1      5              1.7                1.7              5
Φ2      5              1.7                5                0
Φ       10             3.4                6.7              5
fE      0.1            0.294              0.149            0.2
f       0.1            0.391              0.199            0.2
f′      0.1            0.391              0.149            0.3
Combining equations (1.11), (1.12) and (1.14) reveals the following relationship:
n/f = n′/f′ = 1/fE.     (1.15)
Unlike f and f ′, the effective focal length fE is not a physically representable length
in general. However, in photography the usual case is that the object-space and
image-space refractive media are both air so that n = n′ = 1. In this case, fE = f = f ′
and photographers simply refer to the ‘focal length’. The Gaussian conjugate
equation may then be written as follows,

1/s + 1/s′ = 1/fE     (in air).

In order to illustrate the general relationship between f, f ′ and fE , table 1.1 lists
example data for a thin lens calculated using Descartes’ formula. It can be seen that
if either of n or n′ is changed, then the total refractive power Φ and effective focal
length fE will change. Significantly, this affects the values of both f and f ′.
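
The entries of table 1.1 follow directly from equations (1.9), (1.14) and (1.15); the Python sketch below reproduces them using the R1, R2 and nlens values stated in the table caption.

```python
def thin_lens_focal_lengths(n, n_prime, n_lens=1.5, R1=0.1, R2=-0.1):
    """Thin-lens surface powers and focal lengths via equations (1.9), (1.14) and (1.15)."""
    phi1 = (n_lens - n) / R1
    phi2 = (n_prime - n_lens) / R2
    phi = phi1 + phi2
    return phi, 1.0 / phi, n / phi, n_prime / phi   # Phi, fE, f, f'

media = {"air-lens-air": (1.0, 1.0), "water-lens-water": (1.33, 1.33),
         "water-lens-air": (1.33, 1.0), "air-lens-oil": (1.0, 1.5)}
for name, (n, n_prime) in media.items():
    phi, fE, f, f_prime = thin_lens_focal_lengths(n, n_prime)
    print(f"{name:17s} Phi = {phi:5.2f} D  fE = {fE:.3f} m  f = {f:.3f} m  f' = {f_prime:.3f} m")
```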

1.1.10 Magnification
Lateral or transverse magnification m is defined as the ratio of the image height h′ to
the object height h:
m = h′/h.     (1.16)
As shown in figure 1.15, the IP projected onto the imaging sensor by a photographic
lens is real and inverted. In optics, the usual sign convention is to take h and h′ as
positive when measured upwards from the OA, and so h′ and m are negative quantities.
Within Gaussian optics, m does not vary over the IP and it can be expressed in terms
of the object and image distances measured along the OA from the principal points,
m = −ns′/(n′s).
Substituting into the Gaussian conjugate equation defined by equation (1.13) yields
m = f/(f − s).     (1.17)
The Lagrange theorem can be used to express m in terms of the initial and final ray
slopes:
m = nu/(n′u′).     (1.18)
Figure 1.15. Magnification is defined by m = h′/h within Gaussian optics.

Although this book uses the optics sign convention for the object and image
heights, m is often taken to be a positive value in photographic formulae. In order to
be consistent with such formulae, −m is denoted by ∣m∣ where appropriate in this
book. For example, equation (1.17) can be expressed as follows:

∣m∣ = f/(s − f).     (1.19)

This commonly encountered formula shows that the magnification reduces to zero
when focus is set at infinity.
A macro lens can achieve life-size (1:1) reproduction. This occurs when the OP is
positioned at s = 2f so that ∣m∣ = 1. Higher magnifications may be possible for
f < s < 2f via use of a bellows, lens extension tube, or close-up filter.
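
Equation (1.19) is easily evaluated for typical shooting distances; in the sketch below the 50 mm focal length and the OP distances are assumed example values.

```python
def magnification(f, s):
    """Magnitude of the lateral magnification, equation (1.19): |m| = f/(s - f)."""
    return f / (s - f)

f = 0.05  # assumed 50 mm lens in air
for s in (2.0, 0.15, 0.10):                 # OP distances measured from H
    print(f"s = {s:5.2f} m  ->  |m| = {magnification(f, s):.3f}")
# 2 m gives ~0.026; s = 0.15 m gives 0.5; s = 2f = 0.10 m gives life-size (|m| = 1)
```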

1.1.11 Lens aberrations


Gaussian optics describes a perfectly designed lens that is free of aberrations, and a ynu
raytrace (using so-called ‘paraxial rays’, which are actually ray tangent slopes) is
typically performed as a first step in lens design [2, 3]. The cardinal points and position
of the ideal IP obtained from the ynu raytrace define an ideal reference system.
Subsequently, real rays are traced through the lens to determine the actual
aberration content of the lens, and the aberrations are appropriately minimised
through counterbalancing by adjusting the lens design. A real raytrace involves the
tracing of real rays through the lens using exact trigonometrical equations based on
Snell’s law defined by equation (1.1):
n′ sin I ′ = n sin I .
Aberrations arise from the higher order terms in the expansion of the sine function
sin θ = θ − θ³/3! + θ⁵/5! − θ⁷/7! + ⋯.
Since Gaussian optics is based on a linear extension of the paraxial region, the
imaging properties of the lens at larger radial distances outwards from the OA are
represented by the ideal imaging properties of the paraxial region. In other words,
the higher order terms in the above expansion are neglected at each surface, and so
the overall design will necessarily be free of aberrations.
When real rays are traced through the compound lens, aberrations due to the
higher order terms will arise from each surface. The primary or Seidel aberrations
arise from the third order term.
• Spherical aberration (SA): As discussed in section 1.3.1, lenses contain an
aperture that restricts the size of the ray bundle that can pass through the lens.
SA arises due to variation of focus with ray height in the aperture. In other
words, the rear effective focal length f ′ differs from the Gaussian value for rays
that are higher in the aperture, as illustrated in figure 1.2. For elements with
spherical surfaces, a converging element contributes under-corrected SA so that
rays come to a focus closer to the lens as the ray height increases. Conversely, a
diverging element contributes over-corrected SA. Expensive modern photographic
lenses may include aspherical elements that do not contribute any SA.

• Coma: Comatic aberration or coma is caused by variation of magnification
with ray height in the lens aperture so that rays from a given object point will
be brought to a focus at different heights on the IP. The image of a point is
spread into a non-symmetric shape that resembles a comet. Coma is absent at
the OA but increases with radial distance outwards.
• Astigmatism: This arises because rays from an off-axis object point are
presented with a tilted rather than symmetric aperture. A point is imaged
as two small perpendicular lines. Astigmatism improves as the aperture is
reduced in size.
• Petzval field curvature: This describes the fact that the IP corresponding to a
flat OP is itself not naturally flat. Positive and negative elements introduce
inward and outward curvature, respectively. Field curvature increases with
radial distance outwards.
• Distortion: This describes variation of magnification with radial height on the
IP. The amount of distortion varies as the cube of the radial height. This
means that positive or pincushion distortion will pull out the corners of a
rectangular image crop to a greater extent than the sides, whereas negative or
barrel distortion will push them inwards to a greater extent than the sides.
Distortion does not introduce blur and is unaffected by aperture size.

Two further Seidel aberrations appear in the presence of polychromatic light.


• Chromatic aberration: This arises from variation of focus with wavelength λ.
• Lateral colour: This arises due to off-axis variation of magnification with λ.

Aberrations can be counterbalanced by utilising the degrees of freedom provided by the surfaces and spacings in the compound lens. In a perfectly designed lens, all
residual deviations away from the ideal Gaussian reference system will be elimi-
nated. It is therefore inappropriate to describe Gaussian optics or the paraxial limit
as an approximation.
In practice, modern photographic lenses are typically very well-corrected for the
Seidel aberrations. Residual higher order aberrations affect the nature and quality of
the final image. Depending on the final aberration content of the lens design, the
optimized image location may be fractionally shifted from the ideal Gaussian location.

1.2 Focusing
In the previous section describing optical image formation, an OP was selected and
the Gaussian conjugate equation was used to determine the location of the
corresponding IP where the optical image of the OP appears in sharp focus. In a
digital camera, the IP needs to be positioned to coincide with the imaging sensor.
However, the location of the imaging sensor is fixed, and so a focusing operation is
required in general.
In a digital camera, the imaging sensor is fixed in position at the sensor plane (SP),
which is analogous to the film plane in a film camera. Recall the definition of the rear
focal plane and the concept of setting focus at infinity described in sections 1.1.6 and


1.1.9. Significantly, the SP is positioned to coincide with the rear focal plane when
focus is set at infinity.
When considering the Gaussian conjugate equation defined by equation (1.13), it
is clear that when the OP is brought forward from infinity, the IP will no longer
coincide with the rear focal plane and SP but will naturally move behind them. In
order to record an optical image in sharp focus at the SP, the IP must be brought
forward to coincide with the SP. This is achieved by moving the optical elements of
the lens in order to bring the rear focal plane forward until the IP and SP coincide.
The optical elements can be moved either manually by adjusting the lens barrel or
via an autofocus (AF) motor linked to the AF system. Since the SP and IP only
coincide with the rear focal plane when focus is set at infinity, these terms should not
be used interchangeably.
In a lens that utilizes traditional or unit focusing, the whole lens barrel is moved in
order to achieve sharp focus. Other focusing methods include front-cell focusing
and internal focusing. In the case of internal focusing, a floating optical element
or group is moved inside the lens in order to achieve sharp focus [5, 6, 9]. As
discussed in section 1.3, focusing can affect the framing of the scene. This is known
as focus breathing, and different focusing methods have different breathing
properties.

1.2.1 Unit focusing


Provided the OP and IP distances are measured along the OA from the compound
lens principal points, the Gaussian conjugate equation can be used to calculate
the required focusing movement for a unit focusing lens. For simplicity, it will
be assumed that the lens is immersed in air so that n = n′ = 1 and therefore
f = f ′ = fE .
As illustrated in figure 1.16(a), the rear focal plane, IP, and SP all coincide when
focus is set at infinity. If the OP is then brought forward, the IP will move behind the
SP by a distance or extension e. This is illustrated in figure 1.16(b), where the OP and
IP distances have been labelled by l and l ′. These distances satisfy the usual Gaussian
conjugate equation,
1/l + 1/l′ = 1/fE.
The aim is to eliminate e by moving the lens barrel forward. However, this
movement will also reduce the original OP distance l by e. The final geometry is
illustrated in figure 1.16(c). The final OP and IP distances s and s′ satisfy the
following Gaussian conjugate equation:
1/(l − e) + 1/(fE + e) = 1/fE.    (1.20)
This can be rearranged as a quadratic equation,
e² − (l − fE)e + fE² = 0.



Figure 1.16. Geometry for focusing movement in a lens that utilizes unit focusing. Note that f ′ = fE for a lens
immersed in air.


The required solution is

e = ½ [(l − fE) − √((l − fE)² − 4fE²)].    (1.21)

Here l is the original OP distance before the focusing movement. Since e has been
defined relative to the infinity focus position, e → 0 when s → ∞.
The magnification can be expressed in terms of e by combining equations (1.19)
and (1.20),
∣m∣ = e/fE.
As required, ∣m∣ → 0 when focus is set at infinity.
Since a lens is unable to focus on an OP positioned at the front focal point or
closer, equation (1.21) reveals that a unit-focusing lens is unable to focus on an OP
positioned closer than an original OP distance l = 3fE from the first principal point.
At this minimum focus distance, e = fE . After the focusing movement, the minimum
possible object-to-image distance is 4fE + Δ, where Δ is the separation between the
compound lens principal planes. Note that Δ can be positive or negative, depending
on the order of the principal planes. For an ideal thin lens, Δ → 0 and the minimum
possible object-to-image distance is 4fE .
Manual focusing scales on lenses are calibrated using the distance from the SP to
the OP. The location of the sensor/film plane is indicated by a Φ symbol marked on
the camera body. From figure 1.16(c) where d denotes the SP to OP distance, the
extension e is obtained by making the substitution d = l + fE + Δ in equation (1.21),
e = ½ [(d − 2fE − Δ) − √((d − 2fE − Δ)² − 4fE²)].
The above formulae derived within Gaussian optics are almost exact for a photo-
graphic lens that has been well-corrected for aberrations. In practice, defocus
aberration may be used to counterbalance other aberrations, and so the ideal IP
position may be intentionally shifted very slightly from the Gaussian image position.
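The focusing movement can be checked numerically. The Python sketch below implements equation (1.21) together with the SP-to-OP form given above; the 50 mm focal length, the zero principal-plane separation Δ and the example distances are assumptions chosen purely for illustration.

```python
import math

def extension_from_op(l_mm, f_mm):
    """Lens extension e from equation (1.21); l is the original OP distance from the first principal point."""
    disc = (l_mm - f_mm) ** 2 - 4.0 * f_mm ** 2
    if disc < 0:
        raise ValueError("OP is closer than the minimum focus distance l = 3f.")
    return 0.5 * ((l_mm - f_mm) - math.sqrt(disc))

def extension_from_sp(d_mm, f_mm, delta_mm=0.0):
    """Extension e in terms of the SP-to-OP distance d, using the substitution d = l + f + Delta."""
    return extension_from_op(d_mm - f_mm - delta_mm, f_mm)

f = 50.0                            # assumed focal length (mm), lens in air
print(extension_from_op(3 * f, f))  # 50.0 -> e = f at the minimum focus distance l = 3f
print(extension_from_op(5000.0, f)) # ~0.505 mm of extension for an OP about 5 m away
print(extension_from_sp(4 * f, f))  # 50.0 -> the minimum object-to-image distance is 4f when Delta = 0
```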

1.2.2 Internal focusing


Lenses that utilize internal focusing [5, 6, 9] achieve sharp focus by movement of a
floating element or group along the OA in order to change the total refractive power
of the lens.
Figure 1.17 shows an example telephoto lens design that utilizes internal focusing
[10, 11]. The front group has positive overall refractive power, the floating group has
negative overall refractive power, and the rear group has positive overall refractive
power. When the OP distance s is reduced, the floating group moves towards the rear
group. The new layout has a higher total refractive power and therefore a reduced
effective focal length. The change in focal length is such that the Gaussian conjugate
equation is satisfied for the OP distance s. Advantages of internal focusing include
the following:


Figure 1.17. Internal focusing for an example telephoto lens design.

• The lens does not physically extend.


• The lens can be fully weather sealed.
• The lens barrel does not rotate upon focusing, and so polarizing filters and
graduated neutral density filters can be used without issue.
• Only a small movement of the floating element is needed. This is highly
advantageous for applications that require fast AF.
• Focus breathing can in principle be eliminated. This is particularly advanta-
geous for cinematography lenses. As discussed in section 1.3.5, focus breath-
ing is a change in framing or angular field of view that occurs upon setting
focus at a different OP distance.

1.2.3 Single lens reflex cameras


Single lens reflex (SLR) cameras [6, 9] use a type of viewfinder in which the camera
lens is an integral part. This has the advantage that the photographer can directly
look through the lens to frame the scene and set focus.
As illustrated in figure 1.18(a), the SLR viewfinder uses a 45° reflex mirror that reflects
the cone of light emerging from the camera lens onto a ground glass focusing screen. The
focusing screen is positioned in a plane that is optically equivalent to the real SP but
rotated 90° vertically. A roof pentaprism that incorporates an ocular lens (eyepiece) is used
to form a virtual image of the focusing screen that is seen when the photographer looks
through the eyepiece. As illustrated in figure 1.18(b), the roof pentaprism corrects the
reversed horizontal orientation of the optical image at the focusing screen while
simultaneously turning the image through 90° for viewing at eye level.
When focus is set at infinity, the overall scene magnification observed through the
viewfinder is defined by the camera lens focal length divided by the ocular lens focal
length. For example, a 50 mm lens on the 35 mm full frame format gives an overall
visual magnification of 0.8× when used with a 62.5 mm ocular.
As described in section 1.5.9, a camera uses a shutter to control the time that the
imaging sensor is exposed to light. On cameras that use an SLR viewfinder, the reflex
mirror swings by 45° into the horizontal position when the shutter is activated. The
mirror returns to its starting position when the shutter closes.


Figure 1.18. (a) Roof pentaprism used in the SLR viewfinder. Here the cone of light emerging from the lens
exit pupil (XP) corresponds to the central position on the SP. (b) The pentaprism corrects the orientation of the
optical image seen by the eye.

1.2.4 Phase-detect autofocus


Modern digital cameras with an SLR viewfinder (D-SLR cameras) use phase-detect
autofocus (PDAF) systems [6, 9]. These systems have evolved from the design introduced
in the Minolta Maxxum film camera in 1985, which was the first SLR camera to have
built-in AF functionality. The AF module [12] was based upon an idea by Honeywell [13].
Figure 1.18(a) shows that the reflex mirror has a zone of partial transmission. The
transmitted light is directed by a secondary mirror down to an AF module located at


Figure 1.19. Principle of the PDAF module. (a) The OP is in sharp focus at the equivalent SP and so the
signals on each half of the CCD strip are identical. (b) and (c) The OP is out of focus as indicated by the signal
phase shift on each half of the CCD strip.

a plane that is optically equivalent to the SP. Figure 1.19 illustrates the basic
function of the AF module. Tiny microlenses direct the light arriving from the
camera lens onto a CCD strip.
Consider light originating from a small region on the OP indicated by an AF
point on the focusing screen. Photographic lenses contain an adjustable aperture
stop (AS) such as that shown in figure 1.21 that limits the diameter of the ray bundle
that can pass through the lens. It will be explained in section 1.3.1 that the exit pupil
(XP) is the image of the AS due to optical elements behind the AS. PDAF is based
upon the fact that when the OP is in sharp focus at the SP, symmetry dictates that
the light distribution over equivalent portions of the XP will appear identical when
viewed from the small region on the SP that corresponds to the AF point.
Figure 1.19(a) shows that in this case, an identical optical image and electronic
signal is obtained along each half of the CCD strip. In figure 1.19(b) and (c), the
optical image at the SP is not in sharp focus and is described as being out of focus. In
these cases, the optical images on each half of the CCD strip will shift either towards
or away from each other. The nature of the shift indicates the direction and amount
by which the lens needs to be adjusted by its AF motor.
A horizontal CCD strip is suited for analysing signals that arise when the AF
point contains horizontal detail. The optimum signal will arise from a change of


Figure 1.20. An example focusing screen with 13 AF points. The left and right groups correspond to horizontal CCD
strips contained in the AF module, whereas the AF points in the central group correspond to cross-type CCD strips.

contrast provided by a vertical edge. This means that the AF module may be unable
to achieve focus if the AF point covers an area that contains only a horizontal edge.
Conversely, a vertical CCD strip is suited for analysing signals that arise from detail
in the vertical direction such as a horizontal edge. Cross-type AF points such as
those shown in the centre of figure 1.20 utilize both a horizontal and a vertical CCD
strip so that scene detail in both directions can be utilized to achieve sharp focus.

1.3 Framing
The framing or field of view (FoV) is the area of the OP that can be imaged by the
camera. In this section, it is shown that the FoV is formed by a specific light cone
with the base at the OP and apex situated at the lens entrance pupil. In photography,
a more useful way to express FoV is via the angle of the FoV or angular field of view
(AFoV) [3]. The AFoV describes the angular extent of the light cone that defines the
FoV, and it can be measured along the horizontal, vertical or diagonal directions.
Equation (1.14) defined the effective focal length fE as the reciprocal of the total
refractive power. The shorter the effective focal length of a lens, the more strongly it
is able to bend the incoming light rays, and hence the wider the AFoV. Conversely,
the longer the effective focal length, the narrower the AFoV.
A number of important concepts will be introduced through the derivation of the
AFoV formula. These include entrance and exit pupils, field stops, chief rays, pupil
magnification, bellows factor and focus breathing. Finally, the concept of perspective will
be described. This depends upon the distance between the OP and the entrance pupil.

1.3.1 Entrance and exit pupils


Photographic lenses include a physical aperture stop (AS) that limits the amount of
light passing through the lens. This is typically an adjustable iris diaphragm [9] such
as that shown in figure 1.21. Since there will be optical elements in front of and
behind the AS in a compound lens, it is more appropriate to consider the images of


Figure 1.21. Adjustable 8-blade iris diaphragm that acts as an aperture stop. Iris diaphragms with an odd
number of blades are more commonly found in practice.

the AS in object space and image space rather than the AS itself. In a photographic
lens, these images are typically virtual and so cannot be displayed on a screen.
• Within Gaussian optics, the entrance pupil (EP) is a flat surface that coincides
with the Gaussian object-space image of the AS formed by the elements in
front of the AS.
• Within Gaussian optics, the exit pupil (XP) is a flat surface that coincides with
the image-space Gaussian image of the AS formed by the elements behind the
AS.
• For a symmetric lens design, the EP and XP will coincide with the first and
second principal planes, respectively. Accordingly, they may be located in any
order and not necessarily inside the lens.
• As discussed in section 1.5.10, the pupils are not flat surfaces when a lens is
described using real ray optics, but rather they are portions of spheres that
intersect the Gaussian images of the AS at the OA [14].

Figure 1.22 shows the location of the pupils for an example lens design in which the
images of the AS are real, and figure 1.23 shows a photographic lens design in which
the images of the AS are virtual. As evident from the ray bundles bounded by the
rays highlighted in blue in the figures, the pupil diameters control the amount of light
that can enter and exit the lens. This is fundamental for defining exposure, as
discussed in section 1.5. However, it is also evident from the ray bundles bounded by
the rays highlighted in green in the figures that the pupil positions are fundamental
for defining the AFoV indicated by α. This is because the object-space centre of
perspective is located at the centre of the EP, and the image-space centre of
perspective is located at the centre of the XP.

1.3.2 Chief rays


One or more field stops may be present in the lens. A field stop (FS) blocks the
passage of rays. One of these field stops will be the limiting FS. The Gaussian images


Figure 1.22. The chief rays (green) define the AFoV (α). The marginal rays (blue) define the maximum cone
acceptance angles u and u′ and therefore the amount of light passing through the lens, but are not relevant to a
discussion of framing or AFoV. Shown are the entrance window (EW) positioned on the OP, the entrance
pupil (EP), field stop (FS), aperture stop (AS), exit pupil (XP) and exit window (XW) positioned on the SP.

Figure 1.23. Virtual pupils for an example photographic lens design. The chief rays (green) define the AFoV.

of the limiting FS seen through the front and back of the lens are known as the
entrance window (EW) and exit window (XW), respectively. When a photographic
lens is focused on a specified OP, these windows are positioned on the OP and SP so
that the FoV area is defined by the EW.
A meridional plane is a plane containing the OA. As illustrated in figure 1.22, the
chief ray (highlighted in green) is the incoming meridional ray from the scene that
passes by the edge of the limiting FS. The chief ray passes through the centre of the


AS. Consequently, the chief ray also passes through the centre of the pupils, as
evident in figure 1.22 and 1.23. In the photographic lens with virtual pupils shown in
figure 1.23, the virtual extension of the entering chief ray passes through the centre of
the EP, and the virtual extension of the exiting chief ray passes through the centre of
the XP.
Formally, the chief ray defines the AFoV as the angle α subtended by the edges of
the EW from the centre of the EP. Therefore, the apex of the light cone forming the
AFoV is located at the centre of the EP. If a lens designer moves the position of the
AS, the pupil positions and pupil magnification (see below) will change. This will
alter the AFoV and perspective (see section 1.3.7), but the FoV area defined by the
EW will remain unchanged.
Although not needed for a discussion of framing, it is instructive to note that the
marginal ray (highlighted in blue in the figures) is the meridional ray from the axial
position on the OP that passes by the edge of the AS. Consequently, the marginal ray
also passes by the edges of the pupils. In other words, the EP diameter defines the
maximum cone of light that emerges from the axial position on the OP and is accepted
by the lens, and the XP diameter defines the maximum cone of light that subsequently
exits the lens and converges at the axial position on the SP. Adjusting the diameter of
the AS restricts the size of the ray bundle formed by the marginal rays. Evidently, this
will affect the amount of light reaching the SP but will not affect the AFoV.
Note that within Gaussian optics, the chief and marginal rays are treated
mathematically as ray tangent slopes by extending the paraxial region according
to equation (1.6).

1.3.3 Pupil magnification


The pupil factor or pupil magnification [4–6] denoted by m p that often appears in
photographic formulae is defined as
mp = DXP/D.    (1.22)
Here D XP is the diameter of the XP, and D is the diameter of the EP. If m p = 1, the
centre of the EP is coincident with the first principal point P and the centre of the XP
is coincident with the second principal point P’. When m p ≠ 1, the pupils will be
displaced from the principal planes. Typically, m p < 1 for telephoto lenses, m p > 1
for wide-angle lenses, and for zoom lenses, m p may vary with focal length.
The AFoV and perspective will be affected by any displacement of the pupils.
Since object and image distances are conventionally measured from the principal
points, it is important to find formulae for the distances between the pupils and
principal planes in terms of the pupil magnification. These distances can be found by
considering figure 1.24, where focus has been set at infinity. This eliminates the
magnification variable since ∣m∣ → 0 when s → ∞.
Let sEP be the distance from the first principal point along the OA to the EP, and let s′XP be the distance from the second principal point along the OA to the XP. In
figure 1.24, rays emerge from the XP and form a cone by converging at the rear focal



Figure 1.24. Gaussian pupil and focal plane distances when focus is set at infinity. The pupils and principal
planes are not required to be in the order shown.

point. Since the principal planes are planes of unit magnification, the cone has
diameter D at the intersection with the second principal plane. Since the second
principal point lies a distance f ′ from the rear focal point, the XP will be positioned
at a distance m p f ′ from the SP. Therefore

1-31
Physics of Digital Photography (Second Edition)

s′XP = (1 − mp) f′.    (1.23)

The distance sEP can be found by solving the following Gaussian conjugate
equation:
n/sEP + n′/s′XP = 1/fE.
By using the relationship between the focal lengths defined by equation (1.15), the
solution is found to be
sEP = (1 − 1/mp) f.    (1.24)

Both sEP and s′XP reduce to zero when mp = 1.


Recall that s and s′ are measured along the OA from P and P’ (or H and H’),
respectively. In order to derive expressions for the OP and IP distances s − sEP and s′ − s′XP measured from the EP and XP, respectively, the first step is to rearrange equation (1.19) to obtain s in terms of the magnification:

s = (1 + 1/∣m∣) f.    (1.25)

Applying the Gaussian conjugate equation yields the analogous expression for s′:
s′ = (1 + ∣m∣)f ′ . (1.26)

Combining equations (1.24) and (1.25) yields the required expression for the OP
distance measured from the EP:

s − sEP = (1/∣m∣ + 1/mp) f.    (1.27)

Similarly, combining equations (1.23) and (1.26) yields the required expression for
the IP distance measured from the XP:

s′ − s′XP = (∣m∣ + mp) f′.    (1.28)
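The pupil-referenced distances are straightforward to tabulate. The following Python sketch evaluates equations (1.23), (1.24), (1.27) and (1.28) for a lens in air (f = f′ = fE); the 100 mm focal length, the pupil magnification of 0.5 and the magnification value are arbitrary assumptions used only for illustration.

```python
def pupil_offsets(f_mm, m_p):
    """(sEP, s'XP): pupil distances from the principal points, equations (1.23) and (1.24)."""
    return (1.0 - 1.0 / m_p) * f_mm, (1.0 - m_p) * f_mm

def pupil_referenced_distances(f_mm, m_abs, m_p):
    """(s - sEP, s' - s'XP): OP distance from the EP and IP distance from the XP, equations (1.27) and (1.28)."""
    return (1.0 / m_abs + 1.0 / m_p) * f_mm, (m_abs + m_p) * f_mm

# Assumed example: a 100 mm lens with pupil magnification 0.5 (typical of a telephoto design).
print(pupil_offsets(100.0, 0.5))                    # (-100.0, 50.0): the pupils are displaced from H and H'
print(pupil_referenced_distances(100.0, 0.1, 0.5))  # (1200.0, 60.0) when |m| = 0.1
```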

1.3.4 Angular field of view formula


Recall that the AFoV is defined as the angle subtended by the edges of the EW from
the centre of the EP. Consider a photographic lens with an image circle optimized
for a given sensor format diagonal and an XW coincident with the imaging sensor at
the SP. The geometry is shown in figure 1.25. The length of the imaging sensor in the
vertical direction is d = 2h′. When focused at the OP, the corresponding AFoV in the



Figure 1.25. Geometry defining the AFoV when the pupil magnification m p < 1, m p = 1 and m p > 1. The
pupils and principal planes are not required to be in the order shown.


vertical direction has been denoted as α. An expression for α can be obtained in terms of the object height and the OP distance measured from the EP:

tan(α/2) = h/(s − sEP).    (1.29)
Here s is the distance along the OA from the first principal point to the OP, and sEP is
the distance along the OA from the first principal point to the EP. The EP to OP
distance s − sEP has already been derived and is given by equation (1.27). Since
d = 2h′, applying equation (1.16) for the magnification yields the object height,
h = d/(2∣m∣).
Substituting these results into equation (1.29) and re-arranging leads to the following
formula for the AFoV:
α = 2 tan−1 [d/(2bf )].    (1.30)

The AFoV can be defined in the horizontal or diagonal direction simply by replacing
d with the length of the imaging sensor in the horizontal or diagonal direction.
Figure 1.26 illustrates the relative difference in the resulting FoV as a function of f
when focus is set at infinity.

Figure 1.26. Relative difference in FoV for a selection of focal lengths (mm) corresponding to a 35 mm full-
frame system with focus set at infinity.


The focal length f appearing in equation (1.30) is the front effective focal length.
The quantity b is the so-called bellows factor

b = 1 + ∣m∣/mp.    (1.31)

The bellows factor takes a value b = 1 when focus is set at infinity, but its value
increases as the magnification increases. Therefore, the presence of the bellows
factor reveals that the AFoV actually depends upon the magnification and therefore
the distance to the OP upon which the lens is focused.
For a unit-focusing lens, the AFoV becomes smaller when the OP is brought closer
to the lens. Consequently, the object appears to be larger than expected. For an
internally focusing lens, recall that the focal length f reduces when the OP is brought
closer to the lens. This can have an additional effect on the AFoV that opposes the
bellows factor. Any overall change in AFoV with OP position is referred to as focus
breathing. This phenomenon is discussed in more detail in the next section.

1.3.5 Focus breathing


Recall the AFoV formula defined by equations (1.30) and (1.31),
α = 2 tan−1 [d/(2bf )],  where  b = 1 + ∣m∣/mp.
When focus is set closer than infinity, the presence of the bellows factor b causes the
value α to become dependent upon the OP distance s. This phenomenon is known as
focus breathing. Although the AFoV formula is valid for any focusing method, the
nature of the focus breathing does depend upon the focusing method utilized.
For lenses that utilize traditional or unit focusing, f does not change from its
infinity focus value when focus is set at any other OP distance. However, ∣m∣ and b
both gradually increase from zero as the OP distance s is reduced from infinity. This
causes α to decrease from its infinity focus value. Objects therefore appear larger than
expected. The AFoV reduction is appreciable at macro object distances. For example,
b = 2 when ∣m∣ = 1 (1:1 magnification) and m p = 1. For the 35 mm full-frame format,
the diagonal AFoV for a 50 mm macro lens is then seen to decrease from 46.8 ° to 24.4 °.
Note that the OP distance changes very slightly after setting focus using a unit-focusing
lens due to the movement of the lens barrel. In the AFoV formula, the OP distance s
that appears via ∣m∣ is the H to OP distance after focus has been set.
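The 50 mm macro example quoted above can be reproduced directly from equations (1.30) and (1.31). The Python sketch below computes the diagonal AFoV from the nominal 36 × 24 mm full-frame dimensions; it is an illustrative calculation only and assumes a pupil magnification of unity.

```python
import math

def afov_deg(d_mm, f_mm, m_abs=0.0, m_p=1.0):
    """AFoV in degrees from equation (1.30), with the bellows factor b of equation (1.31)."""
    b = 1.0 + m_abs / m_p
    return 2.0 * math.degrees(math.atan(d_mm / (2.0 * b * f_mm)))

diagonal = math.hypot(36.0, 24.0)                     # ~43.27 mm for the 35 mm full-frame format
print(round(afov_deg(diagonal, 50.0), 1))             # 46.8 -> diagonal AFoV at infinity focus
print(round(afov_deg(diagonal, 50.0, m_abs=1.0), 1))  # 24.4 -> diagonal AFoV at 1:1 magnification (b = 2)
```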
For lenses that utilize internal focusing, recall from section 1.2.2 that f decreases
as the OP distance is reduced from infinity. The ∣m∣ and b values at a given OP
distance s depend upon the ‘new’ front focal length fnew after setting focus on the OP.
These new values depend upon the details of the lens design and must be used in the
AFoV formula in place of the infinity focus values. For zoom lenses, the values may
change over the zoom range.
Internal focusing can in principle be used to eliminate focus breathing. If the lens has
a focal length f when focus is set at infinity, focus breathing can be eliminated by


designing the lens so that the product bfnew always remains equal to f when the OP
distance s changes. Expensive cinematography lenses are often designed in this way
since focus breathing is disadvantageous in cinematography. However, internal focusing
in photographic lenses often overcompensates for the reduction in AFoV that would
occur with a unit-focusing lens. For example, the AFoV of many 70–200 mm lenses on
the 35 mm full-frame format increases as s is reduced. Objects therefore appear
smaller than expected. This behaviour is opposite to that of unit-focusing lenses.
Focus breathing does not occur when focus is set at infinity. In this case, the
AFoV formula reduces to
α(s→∞) = 2 tan−1 [d/(2f )].

1.3.6 Focal length multiplier


According to equation (1.30), use of the same lens focal length on two different
cameras will not yield the same AFoV if the cameras are based on different sensor
formats. Given the focal length used on one camera, an equivalent focal length
would be required on the other.
Consider a lens with effective focal length fE mounted on a camera based on a
specified sensor format. For historical reasons, the 35 mm full-frame format is
commonly taken as a reference system for comparison purposes. In order to achieve
the same AFoV on a full-frame camera, a lens would be required with equivalent
focal length fE,35 defined by

fE,35 = mf fE . (1.32)

The quantity mf is the focal length multiplier. This formula is valid when focus is set
at infinity and when the object and image space media are both air so that f = fE .
By definition, mf = 1 for a 35 mm full-frame format camera. Approximate values
for other formats are listed in figure 1.27. These mf values are strictly valid only
when focus is set at infinity. At closer focus distances, equation (1.32) is no longer
exact because the bellows factor b for each format will be different when the cameras

Medium format 0.70
35 mm full frame 1
APS-C 1.53
APS-C (Canon) 1.61
Micro Four Thirds 2.00
1 inch 2.73
2/3 inch 3.93
1/1.7 inch 4.55
1/2.5 inch 6.03
Figure 1.27. Relative sizes of various common sensor formats (not to scale). The corresponding approximate
value of the infinity-focus focal-length multiplier mf is indicated.


being compared are focused on the same OP. A generalized equation is derived in
chapter 5 where cross-format comparisons are discussed in detail.
It should be noted that focal length fE is a property of the lens and its value does
not change when a lens designed for a given format is used on a different format. For
example, 35 mm full-frame lenses can be used on APS-C cameras. In this case, the
35 mm full-frame lens will project an optical image at the SP with a wider image
circle than the APS-C sensor format diagonal can accommodate. This is not optimal in
terms of image quality since the image will be cropped. Nevertheless, the lens will continue
to operate as a fE lens on the APS-C camera with AFoV given by equation (1.30).
Again, a lens with full-frame equivalent focal length fE,35 defined by equation (1.32)
would be required to produce the same AFoV on a 35 mm full-frame camera. In this
scenario, the term crop factor is used for mf instead of focal length multiplier.
However, focal-length multiplier is the appropriate term when the lens projects an
image circle optimized for the sensor format.
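As a simple illustration of equation (1.32), the Python sketch below converts a lens focal length into its 35 mm full-frame equivalent using the approximate infinity-focus multipliers quoted in figure 1.27; the example focal lengths are arbitrary assumptions.

```python
# Approximate infinity-focus focal-length multipliers taken from figure 1.27.
FOCAL_LENGTH_MULTIPLIER = {
    "35 mm full frame": 1.0,
    "APS-C": 1.53,
    "Micro Four Thirds": 2.00,
    "1 inch": 2.73,
}

def equivalent_focal_length_35(f_mm, sensor_format):
    """35 mm full-frame equivalent focal length, equation (1.32): f_E,35 = m_f * f_E."""
    return FOCAL_LENGTH_MULTIPLIER[sensor_format] * f_mm

print(equivalent_focal_length_35(25.0, "Micro Four Thirds"))      # 50.0 -> same infinity-focus AFoV as a 50 mm full-frame lens
print(round(equivalent_focal_length_35(35.0, "APS-C"), 1))        # 53.6
```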

1.3.7 Perspective
Figure 1.28 shows a simple photographic scene with two objects labelled A and B.
When the photographer views the scene from position 1, object B appears to be
almost as tall as object A. In contrast, object B only appears to be about half as tall
as object A when the photographer views the scene from position 2. Furthermore,
the horizontal space between objects A and B appears much more compressed when
the scene is viewed from position 2. In other words, positions 1 and 2 offer a different
perspective of the scene.
In the simple example above, perspective is found to depend upon the distance
between the pupil of the photographer’s eye and the object upon which the eye is
focused. When taking a photograph using a camera, perspective again depends only
upon the distance between the lens EP and the OP. This is the true perspective of the
photograph [5], and the location of the EP defines the centre of perspective in object
space. Mathematically, the OP is located at a distance s − sEP from the centre of
perspective, where s is the distance from H to the OP, and sEP is the distance from H
to the EP.
Certain photographic tasks require a particular perspective. For example, it is
preferable to carry out portrait photography at a perspective that provides flattering
compression of facial features. This requirement specifies a suitable working distance
between the photographer and the model. The working distance is independent of
focal length since focal length affects AFoV and magnification but not perspective.
Nevertheless, a portrait lens should provide an appropriate framing once the

Figure 1.28. A simple photographic scene viewed from two different perspectives.


perspective and therefore working distance has been established. On the 35 mm full-
frame format, it is generally considered that a focal length of around 85 mm provides
a suitable framing or AFoV when used at portrait working distances.
A photographic print or screen image should be viewed from a position that
portrays the true perspective. This again requires that the photograph be viewed
from the centre of perspective, but in image space rather than object space.
Therefore, if a photograph does not undergo any enlargement from the sensor
dimensions, the pupil of the photographer’s eye should be positioned at a distance
from the photograph equivalent to s′ − s′XP. If the photograph is enlarged by a factor X from the sensor dimensions as illustrated in figure 1.29, the distance s′ − s′XP also needs to be scaled by X. Utilizing equation (1.28), the required viewing distance l′ can be written

l′ = (s′ − s′XP) X = (∣m∣ + mp) f′ X.

Here X is the enlargement factor. For the 35 mm full-frame format, X ≈ 8 when an optical image at the SP is digitized and printed at A4 paper size. According to the
above equation, the appropriate viewing distance for a camera used in air is l ′ = fE X
when the pupil magnification m p = 1 and focus is set at infinity so that ∣m∣ = 0. This is
a useful approximation when the OP distance is unknown.
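A brief numerical sketch of the viewing-distance formula is given below. The 50 mm focal length and the A4-style enlargement factor X ≈ 8 follow the example above; infinity focus and unit pupil magnification are assumed.

```python
def true_perspective_viewing_distance(f_mm, X, m_abs=0.0, m_p=1.0):
    """Viewing distance l' = (|m| + m_p) f' X that reproduces the true perspective (lens in air)."""
    return (m_abs + m_p) * f_mm * X

# A 50 mm full-frame lens focused at infinity, enlarged roughly to A4 size (X ~ 8).
print(true_perspective_viewing_distance(50.0, 8.0))  # 400.0 mm -> view the print from about 40 cm
```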
If the photograph is viewed from a distance that does not match l ′, it will appear
with an apparent perspective that does not match the true perspective [5]. If there is a
significant difference between the apparent and true perspectives, perspective
distortion will occur.

Figure 1.29. True perspective is defined by the distance s − sEP in object space and s′ − s′XP in image space. When a photograph is enlarged by a factor X from the optical image recorded at the SP, the photograph needs to be viewed from a distance equivalent to l′ = (s′ − s′XP)X in order to appear at the true perspective.


1.3.8 Keystone distortion


The OP has been perpendicular to the OA throughout the theory discussed so far.
Figure 1.30 shows a situation where the intended OP has been tilted by an angle θ.
The new OP visible within the AFoV is shown by the thick black line in object space.
The corresponding IP shown by the thick black line in image space is tilted relative
to the SP. Significantly, the ratio s′/s changes with height from the OA so that the
magnification varies across the IP. When the image is projected onto the SP, this
leads to a type of perspective distortion known as keystone distortion [2, 3].
The relationship between θ and θ′ is known as the Scheimpflug condition. An
expression can be straightforwardly derived within Gaussian optics [2, 3]. One
argument proceeds by using the linearity of the paraxial region to perform a
hypothetical extension of the principal planes, as shown in figure 1.30. A paraxial
ray leaving the OA with tangent slope u will intersect the first principal plane at
height y, and therefore
u = y/s.
Since the principal planes are planes of unit magnification, the ray must emerge from
the second principal plane at the same height, y. The ray will intersect the OA at a
distance s′ from the second principal plane, and so the tangent slope u′ is given by


Figure 1.30. Keystone distortion is caused by the magnification varying across the IP. The pupil magnification
m p = 1 in this example.


u′ = y/s′.
Therefore
u = (s′/s) u′ = mu′,
where m is the Gaussian magnification at the OA. From the geometry it can be seen
that
tan θ = s/y = 1/u
tan θ′ = s′/y = 1/u′.
Combining these equations yields the Scheimpflug condition,
tan θ′ = m tan θ .
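The Scheimpflug relation is easily evaluated numerically. The Python sketch below uses assumed values of m and θ purely to illustrate how weakly the IP tilts at small magnifications.

```python
import math

def image_plane_tilt_deg(theta_deg, m):
    """Image-plane tilt from the Scheimpflug condition tan(theta') = m tan(theta)."""
    return math.degrees(math.atan(m * math.tan(math.radians(theta_deg))))

print(round(image_plane_tilt_deg(10.0, 0.05), 2))  # ~0.51 degrees when |m| = 0.05
print(round(image_plane_tilt_deg(10.0, 1.0), 2))   # 10.0 degrees at unit magnification
```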
The human visual system (HVS) subconsciously interprets leaning vertical lines as
parallel when the tilt angle θ is small, and so keystone distortion is objectionable for
small tilt angles.
Tilting can be avoided by using an ultra-wide angle lens and cropping the output
digital image or preferably by using the shift function of a tilt-shift lens to shift the
lens relative to the SP in a vertical plane. Tilt-shift lenses can also tilt the lens relative
to the SP so as to tilt the plane of focus without affecting the AFoV. An example
application is DoF extension [5].

1.4 Depth of field


When focus is set on a given object plane labelled OP, the corresponding IP defines
the plane at which the optical image is in sharp focus. In principle, objects lying in
front of or behind the OP will not appear in focus on the photograph. In practice,
such objects can still appear acceptably sharp at the IP provided they lie within a
distance from the OP known as the depth of field (DoF). More precisely, an object
behind the OP must lie within the far DoF, and an object in front of the OP must lie
within the near DoF. It is possible to derive simple equations for the near DoF, far
DoF and total DoF within Gaussian optics.

1.4.1 Circle of confusion


Figure 1.31 shows a lens with focus set on an OP positioned at a distance s from the
first principal point P. The corresponding IP coincides with the SP at a distance s′
from the second principal point P’.
In the upper diagram, a point object has been placed at the OA in front of the OP
at a distance sn from P. The rays from this point converge behind the SP at a distance
sn′ from P’. At the SP, the image of the point is a blur spot with diameter labelled c.



Figure 1.31. Geometry for the DoF equations. The pupil magnification m p > 1 in this example. The near DoF
and far DoF boundaries are defined by the distances where the blur spot diameter is equal to the
acceptable CoC diameter c.

In the lower figure, rays from a point object positioned behind the OP converge in
front of the SP. This point also appears as a blur spot at the SP with diameter c.
Provided the blur spot diameter does not exceed a prescribed value, objects
situated between sn and sf will remain acceptably sharp or in focus at the SP. The
blur spot with this prescribed diameter c is known as the acceptable circle of


Table 1.2. Standard CoC diameters for various sensor formats.

Sensor format Sensor dimensions (mm) CoC (mm)

35 mm full frame 36 × 24 0.030
APS-C 23.6 × 15.7 0.020
APS-C (Canon) 22.2 × 14.8 0.019
Micro Four ThirdsTM 17.3 × 13.0 0.015
1 inch 13.2 × 8.80 0.011
2/3 inch 8.60 × 6.60 0.008
1/1.7 inch 7.60 × 5.70 0.006
1/2.5 inch 5.76 × 4.29 0.005

confusion (CoC) [5, 6]. The criterion for defining the CoC diameter is based upon
the fact that the resolving power (RP) of the HVS is limited, and so blur spots on the
SP that are smaller than the CoC will not produce noticeable blur on the photo-
graph. Since the RP of the HVS depends upon the viewing conditions, the prescribed
CoC also depends upon the viewing conditions. Camera and lens manufacturers
typically assume the following set of standard viewing conditions when defining
DoF scales [6]:
• Enlargement factor X from the sensor dimensions to the dimensions of the
viewed output photograph: X = 8 for the 35 mm full frame format.
• Viewing distance: L = Dv, where Dv = 250 mm is the least distance of distinct
vision.
• Resolving power (RP) of the HVS at Dv: RP(Dv) = 5 lp/mm (line pairs per
mm) when viewing a pattern of alternating black and white lines. More than
5 lp/mm cannot be distinguished from uniform grey.

Table 1.2 lists examples of standard CoC diameters based upon the above set of
standard viewing conditions. Note that smaller sensor formats require a smaller
CoC since the enlargement factor X will be greater.
When the viewing conditions differ from those defined above, a custom CoC
diameter can be calculated using the following formula:
c = (1.22 × L)/(X × Dv × RP(Dv)).    (1.33)
This formula will be derived in chapter 5 where the concept of the CoC will be
discussed in further detail.
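Equation (1.33) reproduces the standard full-frame CoC when the standard viewing conditions listed above are substituted. The Python sketch below is an illustrative calculation using those values; the second call simply shows the effect of halving the viewing distance L.

```python
def circle_of_confusion_mm(L_mm, X, Dv_mm=250.0, rp_lp_per_mm=5.0):
    """Acceptable CoC diameter from equation (1.33): c = 1.22 L / (X Dv RP(Dv))."""
    return 1.22 * L_mm / (X * Dv_mm * rp_lp_per_mm)

# Standard viewing conditions for the 35 mm full-frame format: L = Dv = 250 mm, X = 8, RP(Dv) = 5 lp/mm.
print(round(circle_of_confusion_mm(250.0, 8.0), 4))  # 0.0305 -> the familiar 0.030 mm value
print(round(circle_of_confusion_mm(125.0, 8.0), 4))  # 0.0152 -> a closer observer demands a smaller CoC
```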

1.4.2 Depth of field formulae


The CoC diameter c is an image space quantity defined at the SP, but c can be
expressed in object space by using the Gaussian magnification to project the CoC
onto the OP [5]. By using similar triangles, the geometry shown in the upper diagram
of figure 1.31 reveals that


c/∣m∣ = [(s − sn)/(sn − sEP)] D.    (1.34)

Similarly, the geometry shown in the lower diagram of figure 1.31 reveals that
c/∣m∣ = [(sf − s)/(sf − sEP)] D.    (1.35)

The distance sEP from H to the EP has been derived in section 1.3.3 and is defined by
equation (1.24),
sEP = (1 − 1/mp) f.

The above equations can be re-arranged to obtain the DoF formulae.

Near depth of field


The distance sn to the near DoF boundary can be found by rearranging equation
(1.34),
sn = (∣m∣Ds + c sEP)/(∣m∣D + c).
An object positioned closer than the distance sn from H is considered to be out of
focus. The near DoF is the distance s − sn . This is found to be

near DoF = c(s − sEP)/(∣m∣D + c).    (1.36)

An object positioned closer than the OP with a separation distance greater than
s − sn from the OP is considered to be out of focus.

Far depth of field


The distance sf to the far DoF boundary can be found by rearranging equation
(1.35),
sf = (∣m∣Ds − c sEP)/(∣m∣D − c).
An object positioned further than the distance sf from H is considered to be out of
focus. The far DoF is the distance sf − s . This is found to be

far DoF = c(s − sEP)/(∣m∣D − c).    (1.37)

An object positioned beyond the OP with a separation distance greater than sf − s from the OP is considered to be out of focus.


Total depth of field


The total DoF is the distance sf − sn between the near and far boundaries,

total DoF = 2∣m∣Dc(s − sEP)/(∣m∣²D² − c²).
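The three DoF expressions can be evaluated together. The Python sketch below is illustrative only: the 50 mm focal length, the f/2.8 aperture, the 0.030 mm CoC and the 3 m focus distance are assumed values, the lens is taken to be in air, and the pupil magnification is set to unity so that sEP = 0. The EP diameter is obtained from D = f/N (see section 1.5).

```python
def dof_mm(f, N, c, s, m_p=1.0):
    """Near, far and total DoF (mm) from equations (1.36) and (1.37) for a lens in air."""
    m_abs = f / (s - f)            # equation (1.19)
    D = f / N                      # EP diameter from the f-number
    s_ep = (1.0 - 1.0 / m_p) * f   # equation (1.24)
    near = c * (s - s_ep) / (m_abs * D + c)
    far = c * (s - s_ep) / (m_abs * D - c) if m_abs * D > c else float("inf")
    return near, far, near + far

near, far, total = dof_mm(f=50.0, N=2.8, c=0.030, s=3000.0)
print(round(near), round(far), round(total))  # approximately 271, 330, 601 mm
```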

1.4.3 Depth of field control


A large or small total DoF is described as deep or shallow, respectively. According
to the DoF formulae, a shallower total DoF can be achieved by the following
methods:
1. Increase the magnification. According to the formula ∣m∣ = f /(s − f ) defined
by equation (1.19), a higher magnification is achieved by either using a
longer focal length f or by moving the camera closer to the OP in order to
reduce s.
2. Use a larger EP diameter. As discussed in section 1.5, the EP diameter D can
be expressed in terms of focal length f and lens f-number N by the formula
D = f /N . Therefore, lowering N increases D, which in turn provides a
shallower total DoF.
3. Reduce the pupil magnification. In a lens with a smaller pupil magnification
m p , the EP is positioned closer to the OP and so the distance s − sEP is
reduced. The pupil magnification is important in macro photography but can
be neglected at typical photographic working distances where the EP to OP
distance can be approximated as s − sEP ≈ s .
4. Reduce the circle of confusion. A smaller CoC reduces the amount of defocus blur that goes undetected on an output photograph and therefore reveals a
shallower DoF. The CoC diameter depends upon the photograph viewing
conditions according to equation (1.33). For example, a shallower DoF is
seen when the observer is positioned closer to the photograph.

As discussed in the following section, the near and far contributions to the total DoF
are not equal in general.

1.4.4 Hyperfocal distance


The near DoF and far DoF are only formally equal when s = f so that ∣m∣ → ∞. In
fact, a lens is unable to achieve focus at this OP distance. When the OP is moved
further from the lens, the magnification reduces. According to equations (1.36) and
(1.37), the near DoF becomes smaller than the far DoF. This is illustrated in
figure 1.32, which shows the ratio of the near DoF to the far DoF as a function of
OP distance s for an example set of f, N and c values.
When s = 2f , the magnification ∣m∣ = 1 and the near DoF to far DoF ratio
becomes (D − c )/(D + c ). As the OP distance increases further, a distance is
eventually reached where the near DoF to far DoF ratio reduces to zero because


Figure 1.32. Near DoF divided by the far DoF as a function of distance s from the first principal plane for an
example set of f, N and c values. At the minimum focus distance past s = f, the near and far DoF are almost
equal and so the ratio is 1:1. At the hyperfocal distance H, the ratio is 1: ∞ and so the fraction reduces to zero.

the far DoF extends to infinity. The distance s − sEP = H measured from the lens EP
is known as the hyperfocal distance [5]. From equation (1.37), the far DoF extends to
infinity at the following magnification:
∣m∣ = c/D  (when s − sEP = H).    (1.38)
Substituting for ∣m∣ with s − sEP = H yields the following formula for H:
H = fD/c + f − sEP.    (1.39)
Now substituting equation (1.38) into the near DoF formula defined by equation
(1.36) with s − sEP = H yields the following result:
near DoF = H/2.
This reveals that when the OP is positioned at the hyperfocal distance, the far
DoF extends to infinity and the near DoF extends to half the hyperfocal distance
itself.
According to Gaussian optics, focusing at the hyperfocal distance yields the
maximum available DoF for a given combination of camera settings. This is useful
for landscape photography, although in practice it is advisable to set focus at an OP
positioned beyond the hyperfocal distance if the aim is to ensure the distant features
remain in focus. Fixed-focus cameras are set at the hyperfocal distance since these
rely on maximising the total DoF to produce in-focus images.


Practical calculations
Equation (1.39) can be written in the following way for practical calculations:

H = h + f − sEP

The quantity h is defined by


h = fD/c = f²/(cN).
Here N is the lens f-number, and h serves as a practical approximation to the
hyperfocal distance,
H ≈ h.
In terms of h, the DoF formulae can be written as follows:
near DoF = (s − f)(s − sEP)/[h + (s − f)]
far DoF = (s − f)(s − sEP)/[h − (s − f)]    (1.40)
total DoF = 2h(s − f)(s − sEP)/[h² − (s − f)²]
Figure 1.33 illustrates the DoF formulae for the 35 mm full-frame format with CoC
diameter c = 0.030 mm and an example set of s, f and N values. The green horizontal
dashed line indicates the CoC diameter, and the intersection of the graph with this
line marks the near DoF and far DoF boundaries. When s = H, the blur spot
diameter never reaches the value 0.030 mm at distances beyond s, but does become
0.030 mm at a distance H/2 in front of s.
In practice, the DoF formulae are approximate as only uniform defocus blur has
been included as a source of blur. Accurate formulae would require all other blur
contributions to be taken into account, including blur due to diffraction.
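For practical work it is usually h = f²/(cN) that is computed first. The Python sketch below evaluates h and equation (1.40) for an assumed 35 mm lens at f/8 on the full-frame format with c = 0.030 mm; sEP is neglected, which is reasonable at ordinary working distances, and all blur sources other than defocus are ignored as noted above.

```python
def hyperfocal_mm(f, N, c):
    """Practical hyperfocal approximation h = f^2 / (c N)."""
    return f * f / (c * N)

def dof_from_h_mm(f, N, c, s):
    """Near, far and total DoF (mm) from equation (1.40), neglecting sEP so that s - sEP ~ s."""
    h = hyperfocal_mm(f, N, c)
    near = (s - f) * s / (h + (s - f))
    far = (s - f) * s / (h - (s - f)) if h > (s - f) else float("inf")
    return near, far, near + far

print(round(hyperfocal_mm(35.0, 8.0, 0.030)))                       # ~5104 mm
print([round(x) for x in dof_from_h_mm(35.0, 8.0, 0.030, 2000.0)])  # ~[556, 1252, 1808] mm with focus at 2 m
print(dof_from_h_mm(35.0, 8.0, 0.030, 6000.0)[1])                   # inf -> the far DoF reaches infinity once s exceeds h + f
```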

1.4.5 Focus and recompose limits


Focusing and recomposing is a widely used technique for setting focus on static
subjects. Typically the centre AF point in the viewfinder is used to focus on the
subject, the focus is locked by half-pressing the shutter button, and then the camera
is pivoted to recompose the scene before the shutter is released. Although this
method is adequate when the DoF is deep, it can lead to error when the DoF is
shallow since the object will become out of focus if it pivots outside the near DoF
after the scene is recomposed. This can be particularly problematic in portrait
photography, which typically requires precise focus on the eye nearest to the camera.
Consider the geometry illustrated in figure 1.34. First the focus is set on the object
at point p using the centre AF point. This defines the OP and the near DoF.
Assuming the camera is pivoted about the lens EP when the scene is recomposed, the
maximum pivot angle φ allowed before p falls outside the near DoF limit is given by


Figure 1.33. Blur spot as a function of object distance from the first principal point after focusing on the OP
positioned at distance s. The horizontal dashed line indicates an acceptable CoC with diameter c = 0.030 mm.

φ = cos−1 [(sn − sEP)/(s − sEP)].
This angle is valid in any pivot direction. The distance sn − sEP can be found by re-
arranging the near DoF formula defined by equation (1.36),


Figure 1.34. Geometry of the focus and recompose technique.

sn − sEP = ∣m∣D(s − sEP)/(∣m∣D + c).
The maximum pivot angle is found to be
φ = cos−1 [∣m∣D/(∣m∣D + c)].

For the special case that the lens is focused at the hyperfocal distance H, equation
(1.38) reveals that ∣m∣D = c and so the maximum allowed pivot angle is
φ = cos−1(0.5) = 60°.
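The allowed recomposition angle shrinks rapidly as the DoF becomes shallow. The Python sketch below is an illustrative calculation with assumed portrait-style values (an 85 mm lens at f/1.8, c = 0.030 mm and a subject at 2 m); the pupil magnification is taken as unity.

```python
import math

def max_pivot_angle_deg(f, N, c, s):
    """Maximum focus-and-recompose pivot angle, phi = arccos[|m|D / (|m|D + c)]."""
    m_abs = f / (s - f)   # equation (1.19)
    D = f / N             # EP diameter
    return math.degrees(math.acos(m_abs * D / (m_abs * D + c)))

print(round(max_pivot_angle_deg(85.0, 1.8, 0.030, 2000.0), 1))   # ~9.6 degrees: little room to recompose
print(round(max_pivot_angle_deg(85.0, 1.8, 0.030, 10000.0), 1))  # ~21.4 degrees at a 10 m subject distance
```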
Generally, if the calculated φ is too small for the required recomposition, it is
advisable to first compose the scene and then set focus by using an alternative
viewfinder AF point that lies as close as possible to the location of the subject. This
method was used in single-AF mode to focus on the eye in the example shown in
figure 1.35.

1.4.6 Bokeh
As mentioned in the previous section, the photographic DoF formulae give
approximate results since they only describe uniform blur due to defocus. All other
blur contributions have been neglected, including those arising from the higher order


Figure 1.35. Accurate focus on the eye is critical when the DoF is shallow.

properties of a lens not described by Gaussian optics. In practice, the various residual lens aberrations will profoundly affect the nature of the blur regions seen on
a photograph. Lens bokeh is a term used to describe the aesthetic character of the
blur, and certain lenses are highly regarded for the special bokeh that they produce.
SA introduced in sections 1.1.2 and 1.1.11 plays an important role in determining
bokeh. Recall that SA is caused by variation of focus with ray height in the aperture.
A converging element is said to contribute under-corrected SA since the rays come to
focus at the OA over a shorter distance as the ray height increases. Conversely, a
diverging element is said to contribute over-corrected SA since the rays come to focus
at the OA over a longer distance as the ray height increases.
Figure 1.36 shows a compound lens with residual under-corrected SA. In this
case, the IP corresponding to rays from a point object positioned on the OA just
beyond the far DoF boundary will be located some distance in front of the SP. As
shown on the diagram, the IP has been chosen as the plane that minimizes the ray
spread. At the SP, the point will appear as a blur spot larger than that produced in
the absence of SA, and with a blur distribution that decreases towards the edges
where the ray density is seen to be lower. Overlapping blur spots of this type produce
smooth bokeh. Since the rays originate from beyond the far DoF boundary, under-
corrected SA is found to lead to smooth background bokeh.
For the same lens, the IP corresponding to rays originating from a point on the
OA positioned just in front of the near DoF boundary will be located some distance


Figure 1.36. Blur spots for a lens with under-corrected SA. The upper diagram shows a blur spot at the SP
arising from a background point located just beyond the far DoF boundary, whereas the lower diagram shows
a blur spot at the same SP arising from a foreground point located just in front of the near DoF boundary.
(Blur spots reproduced from [15] with kind permission from ZEISS (Carl Zeiss AG).)

behind the same SP. At the SP, the point will appear as a smaller blur spot with blur
distribution that sharply increases at the edges. Overlapping blur spots of this type
can produce harsh bokeh. Since the rays originate from in front of the near DoF
boundary, under-corrected SA is found to lead to harsh foreground bokeh.
The situation reverses for a lens with residual over-corrected SA. In this case, the
background bokeh becomes harsher, and the foreground bokeh becomes smoother.
Since background bokeh is generally considered more important, slightly under-
corrected SA is preferred in practice [15].
Many other factors affect the appearance of a blur spot. For example, chromatic
aberration at the light–dark boundary may be visible on the rim of an isolated blur
spot due to variation of focus with wavelength. The number of blades used by the iris
diaphragm affects the shape of a blur spot. A higher blade count will produce a more
circular blur spot at the OA. Even if the blur spots are circular at the OA, their shape
may be altered at field positions away from the OA. For example, field curvature


and vignetting at large aperture diameters can produce a cat's-eye shaped blur
towards the edges of the frame.

1.5 Photometric exposure


Exposure is a measure of the amount of light per unit area that reaches the SP while
the camera shutter is open. The amount of light is quantified in terms of its energy
using radiometry (see chapter 2). The radiometric exposure at a given photosite
(sensor pixel) stimulates an electronic response from the camera imaging sensor that
will ideally be proportional to the amount of light received at that photosite.
A radiometric measure of exposure is necessary for modelling the electronic
response of the camera imaging sensor. However, photometry and a photometric
measure of exposure is instead conventionally used to quantify light in photography.
Photometry includes a weighting that takes into account the spectral sensitivity of
the HVS, and this can be helpful when making exposure decisions based upon the
visual properties of the scene.
Note that for fixed lighting, cameras are designed so that the radiometric response
will ideally be proportional to the photometric response. This is discussed further in
chapters 2 and 3.
Since exposure is a measure of the amount of light per unit area that reaches the
SP while the camera shutter is open, several factors influence its magnitude:
1. The diameter of the lens EP since this restricts the size of the ray bundle that
can enter the lens (see section 1.3.1.)
2. The ability of the lens to bend light rays. This depends upon the total
refractive power of the lens, which can be described in terms of the effective
focal length fE introduced in section 1.1.9.
3. The nature of the object-space and image-space refractive media with
refractive indices n and n′.
4. The light transmittance properties of the lens and the camera components
situated in front of the SP.
5. The time duration the imaging sensor is exposed to the light that emerges
from the XP.

Note that factors 1–3 depend upon the relative aperture (RA) of the lens. Later in
this section, it will be shown how the familiar f-number emerges from the RA.
Subsequently, the camera equation will be derived. This describes the photometric
exposure distribution at the SP in terms of the above factors. In chapter 2, an
exposure strategy will be developed based on the camera equation.

1.5.1 Photometry
The ‘amount’ of light from the photographic scene that passes through the lens to
the SP can be quantified in terms of power or flux. Photometric or luminous flux
denoted by Φv is the rate of flow of electromagnetic energy emitted from or received
by a specified surface area, appropriately weighted to take into account the spectral
sensitivity of the HVS. The unit of measurement is the lumen (lm).


Table 1.3 lists various other photometric quantities that are useful for describing
the flow of flux from the photographic scene to the SP.
• Illuminance Ev: This is the luminous flux per unit area received by a specified
surface from all directions. The unit of measurement is the lux (lx), which is
equivalent to lm m−2. Integrating illuminance over a surface area yields the
total flux associated with that area. In photography, the illuminance at the SP
is of primary interest.
• Luminous intensity Iv : Consider the flux emerging from the scene being
photographed. When flux is emitted from an area, the quantity analogous to
illuminance is luminous exitance Mv. However, luminous exitance does not
depend upon a specific direction and so it is not a convenient quantity for
measuring the light emitted from the scene in the direction of the camera.
Luminous intensity Iv is a much more useful quantity as it defines the flux
emitted from a point source into a cone. As illustrated in figure 1.37, the cone
is defined per unit solid angle measured in steradians (sr). The unit of
measurement of luminous intensity is the candela (cd), equivalent to lm sr−1.
In the present context, the cone subtended by the lens EP is of primary interest.

Table 1.3. Common photometric quantities with their symbols and SI units. The ‘v’ (for visual)
subscript is often dropped from photographic formulae.

Quantity               Symbol    Unit            Equivalent unit
Luminous flux          Φv        lm
Luminous intensity     Iv        lm sr−1         cd
Luminance              Lv        lm m−2 sr−1     cd m−2
Luminous exitance      Mv        lm m−2          lx
Illuminance            Ev        lm m−2          lx
Photometric exposure   Hv        lm s m−2        lx s

Figure 1.37. (a) An angle of 1 radian (rad) is defined by an arc length r of a circle with radius r. Since the
circumference of a circle is 2πr, the angle corresponding to a whole circle is 2π radians. (b) A solid angle ω
projects a point source onto the surface of a sphere, thus defining a cone. For a cone of radius r, a surface area
of r² defines a solid angle of 1 steradian (sr). Since the surface area of a whole sphere is 4πr², the solid angle
corresponding to a whole sphere is 4π steradians.
• Luminance L v : When the source is not an isolated point but is extended,
luminous intensity at a source point can be spread into an infinitesimal source
area. This defines luminance L v as the luminous flux per unit solid angle per
unit projected source area. Integrating luminance over the cone angular
subtense yields the flux received by the observer (such as the eye or lens EP)
from the source position. Integrating over the entire source area then yields
the total flux received by the observer.

Based on the above, the scene luminance distribution can now be defined as an array
of infinitesimal luminance patches representing the photographic scene, where the
solid angle defines a cone subtended by the scene position from the lens EP. The lens
transforms the flux defined by the scene luminance distribution into an array of
infinitesimal illuminance patches on the SP referred to as the sensor-plane illumi-
nance distribution.

1.5.2 Flux emitted into a cone


The expression for the flux emitted into a cone from a point on an extended plane
source is an important optical result needed for deriving the RA of a lens. In this
section, the derivation is performed using real angles [4, 16]. Subsequently, the
paraxial limit will be taken and then extended linearly. The final result defined by
equation (1.47) is expressed in terms of ray tangent slopes consistent with Gaussian
optics.
First recall from the previous section that the flux emitted into a cone from a point
source is defined by
Φ = Iω . (1.41)
The ‘v’ subscripts have been dropped; here I is the luminous intensity, and ω is the
solid angle defining the cone. An isolated point source of flux has the same luminous
intensity when viewed from any angle.
Now consider a typical photographic scene where a source of flux such as the Sun
is illuminating an extended plane surface. An ideal diffuse or Lambertian surface will
scatter the flux into many different directions. For a point on such a surface,
Lambert’s cosine law of intensity states that the luminous intensity will fall off as the
cosine of the angle θ between the scattered direction and the surface normal. Unlike
the case of the isolated point source, the luminous intensity is now dependent on θ,
I (θ ) = I cos θ . (1.42)
Figure 1.38 shows such a scenario for a point at the axial position on the plane
surface. The angle θ is the real vertical angle subtended by the cone with the OA.
Since luminous intensity now depends upon θ, integration is required in order to
determine the flux emitted into the cone. Equation (1.41) must be modified to

Φ = ∫₀^θ ∫₀^2π I(θ₁) dω(θ₁, ϕ).    (1.43)

Here θ₁ is a dummy integration variable, and ϕ is the horizontal angle revolving
around the OA.

Figure 1.38. The angle element dθ is integrated from 0 to θ. The OA is in the z-direction, and the angle ϕ
revolves around the OA.
Before proceeding to evaluate equation (1.43), it is instructive to note that the
luminance of a Lambertian surface is in fact independent of θ and therefore the same
when viewed from any direction. This can be seen by associating an infinitesimal
area element dA with the point at the axial position on the surface. The projected
surface area seen by an observer at an angle θ from the surface normal is reduced by
the cosine of the angle θ,
dA(θ) = dA cos θ.

Therefore

L = I(θ)/dA(θ) = I/dA.
Now in order to perform the integration defined by equation (1.43), an expression
for an infinitesimal change in solid angle dω is required. Consider the geometry
shown in figure 1.39. The spherical surface area element dS for an infinitesimal
change in vertical angle θ and infinitesimal change in horizontal angle ϕ is given by
dS = r dθ₁ × r sin θ₁ dϕ.

Therefore

dω = (r dθ₁ × r sin θ₁ dϕ)/r² = sin θ₁ dθ₁ dϕ.    (1.44)


Figure 1.39. Surface area element dS of a cone of flux. Here the z-axis representing the OA is in the vertical
direction.

Substituting equations (1.42) and (1.44) into (1.43) and performing the integration
over ϕ yields
θ θ
Φ = 2π ∫0 I cos θ1 sin θ1 dθ1 = πL dA ∫0 (2 sin θ1 cos θ1) dθ1. (1.45)

Here θ1 is the dummy variable. Performing this integration provides the following
result:

Φ = πL sin²θ dA    (1.46)

Here dA is the infinitesimal area element associated with the axial position on the
plane surface, L is the luminance associated with dA and θ is the real vertical angle
subtended by the cone with the OA.
Equation (1.46) is valid when tracing real rays through an optical system. Recall
from section 1.1.4 that Gaussian optics avoids describing lens aberrations by
considering only the paraxial region where sin θ → θ . Subsequently, a linear extension
of the paraxial region is performed by introducing the ray tangent slope u in place of θ.
Consequently, the form of equation (1.46) valid within Gaussian optics is

Φ = πL u² dA    (1.47)


Figure 1.40. Gaussian geometry defining the relative aperture (RA) and working f-number Nw .

The formal way to obtain this result is to note that the dummy variables appearing
in equation (1.45) become cos θ1 = 1, sin θ1 = u1, and dθ1 = du1 within Gaussian
optics.
Since u is a ray tangent slope, an important feature of the Gaussian description is
that the base of the cone becomes flat. This feature is illustrated in figure 1.40.
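As a quick numerical check of equation (1.46), the integral in equation (1.45) can be evaluated directly and compared against πL sin²θ dA. The short Python sketch below does this for an arbitrary choice of L, dA and cone half-angle; the values are purely illustrative and are not taken from the text.

import numpy as np

L = 1000.0                      # luminance associated with dA (arbitrary units)
dA = 1e-6                       # area element, treated as small but finite
theta = np.radians(30.0)        # cone half-angle

# Trapezoidal evaluation of equation (1.45): Phi = pi*L*dA * integral of 2 sin(t) cos(t) dt, 0..theta
t = np.linspace(0.0, theta, 100001)
integrand = 2.0 * np.sin(t) * np.cos(t)
phi_numeric = np.pi * L * dA * np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t))

# Closed-form result of equation (1.46)
phi_closed = np.pi * L * np.sin(theta) ** 2 * dA

print(phi_numeric, phi_closed)  # the two values agree to high precision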

1.5.3 Relative aperture


Consider a photographic lens focused on an OP so that the corresponding Gaussian
IP coincides with the SP. Figure 1.40 shows a ray bundle entering and exiting the
lens from the axial position on the OP. Recall from section 1.3.2 that this ray bundle
is bounded by the marginal rays since these pass by the edges of both the AS and the
pupils.
According to equation (1.47) derived in the previous section, the flux Φ at the lens
EP is given by
Φ = πL u² dA.
In the present context,
• u is the marginal ray tangent slope.
• dA is the infinitesimal area at the axial position on the OP.
• L is the luminance associated with dA.

The aim is to obtain an expression for the resulting illuminance E associated with
dA′, the infinitesimal area at the axial position on the SP.


First note that dA and dA′ may be expressed as the product of infinitesimal
heights in the x and y directions,
dA = dhx dhy
dA′ = dhx′ dhy′.

By utilizing equation (1.16), the ratio of dA′ to dA can now be expressed in terms of
the magnification,

dA′/dA = (dhx′ dhy′)/(dhx dhy) = m².

Substituting for dA in equation (1.47) yields

Φ = πL (dA′/m²) u².

Further progress can be made by utilising the Lagrange theorem [1, 2]. This theorem
defines an optical invariant valid within Gaussian optics. Application of the
invariant provides the relationship defined by equation (1.18) between the magni-
fication and the marginal ray tangent slopes u and u′,

m = nu/(n′u′).

Now substituting for m yields the flux at the axial position on the SP in terms of the
ray tangent slopes,

Φ = πL dA′ {(n′/n) u′}².

The corresponding illuminance at the axial position is given by

E = T Φ/dA′,    (1.48)

where T ⩽ 1 is a lens transmittance factor that takes into account light loss due to the
lens material. It now follows that

E = (π/4) L T {(n′/n) 2u′}².    (1.49)

This equation can be written in the following way:

E = (π/4) L T × (RA)²,

where RA defines the relative aperture within Gaussian optics,

RA = (n′/n) 2u′.    (1.50)


In photography, it is convenient to express u′ and the RA in an alternative way. Two
cases can be considered:
1. Focus set at infinity: In this case the RA can be expressed in terms of the
f-number.
2. Focus set closer than infinity: In this more general case, the RA can be
expressed in terms of the working f-number. The working f-number can in
turn be expressed in terms of the f-number and the bellows factor introduced
in section 1.3.4.

1.5.4 f-number
When focus is set at infinity, the OP distance s → ∞, and the IP (and SP) distance
s′ → f ′, where f ′ is the rear effective focal length. The corresponding geometry is
shown in figure 1.41. The ray tangent slope u′ is seen to be
u′(s→∞) = D/(2f ′),

where D is the diameter of the entrance pupil. Substituting into equation (1.49) yields
the following expression for the illuminance at the axial position on the SP:

E(s→∞) = (π/4) L T {(n′/n)(D/f ′)}².


Figure 1.41. Gaussian geometry defining the f-number N for a lens focused at infinity. In this illustration the
pupil magnification m p > 1.


The refractive indices can be removed by utilising the relationship between the front
and rear effective focal lengths defined by equation (1.15),
f ′/f = n′/n.

Therefore

E(s→∞) = (π/4) L T (D/f)².
Back in the nineteenth century, the quantity D/f was defined as the apertal ratio [17];
however, this term has not come into widespread use. The apertal ratio is the specific
case of the RA when the lens is focused at infinity. In photography it is numerically
more convenient to consider the reciprocal of the apertal ratio instead. This is
defined as the f-number N:

N = f/D.    (1.51)
RA and f-number therefore have a reciprocal relationship [14]. The f-number is
usually marked on lens barrels using the symbols f /n or 1:n, where n is the numerical
value of the f-number. (In order to avoid confusion with focal length, the symbol N
is used for f-number throughout this book.) The expression for the illuminance at the
axial position (axial infinitesimal area element) on the SP when focus is set at infinity
becomes

E(s→∞) = (π/4) L T (1/N²).    (1.52)

Note that according to equation (1.51), the f-number is defined as the front effective
focal length divided by the diameter of the EP [2, 18]. The f-number is commonly but
incorrectly defined as the effective focal length fE or rear effective focal length f ′
divided by the diameter of the EP. In the former case, the correct numerical value for
N would be obtained only if the object-space medium is air. In the latter case, the
correct numerical value for N would be obtained only if the object-space and image-
space media both have the same refractive index.
It is often assumed that the Gaussian expression for the f-number is valid only
within Gaussian optics. It will be shown in section 1.5.10 that equation (1.51) is in
fact exact for a lens that is free of SA and coma. It will also be shown that the
minimum value of the f-number in such a lens is limited to N = 0.5 in air.
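Equation (1.52) is easy to evaluate numerically. The minimal Python sketch below estimates the on-axis SP illuminance for a representative scene luminance over a range of f-numbers; the luminance and transmittance values are assumed purely for illustration.

import math

def sensor_illuminance(L, N, T=0.9):
    """On-axis SP illuminance (lux) from equation (1.52), focus set at infinity.

    L : scene luminance (cd/m^2) at the axial object point
    N : f-number
    T : lens transmittance factor (T <= 1); 0.9 is an assumed value
    """
    return (math.pi / 4.0) * L * T / N ** 2

# Example: a scene luminance of ~1000 cd/m^2 (assumed value)
for N in (1.4, 2.8, 5.6, 11.0):
    print(N, round(sensor_illuminance(1000.0, N), 1))
# Each doubling of N reduces the illuminance by a factor of four (two stops).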

1.5.5 Working f-number


When focus is set on an OP positioned closer than infinity, the lens focusing
movement ensures that the flux emitted from the OP is brought to a sharp focus at
the SP. Consequently, s′ ≠ f ′ since the rear focal point will no longer lie on the SP.
The relevant geometry is shown in figure 1.40. Simple trigonometry reveals that


u′ = (mpD/2)/(s′ − sXP′).

Here D is the diameter of the EP, mp is the pupil magnification, s′ is the image
distance measured from the second principal plane and sXP′ is the distance from the
second principal plane to the XP. The distance s′ − sXP′ was derived in section 1.3.3
and is defined by equation (1.28),

s′ − sXP′ = (∣m∣ + mp) f ′.

Therefore

u′ = mpD/(2(∣m∣ + mp) f ′).

Substituting into equation (1.49) and utilising the relationship between the front and
rear effective focal lengths defined by equation (1.15) leads to the following
expression for the illuminance at the axial position (axial infinitesimal area element)
on the SP:

E = (π/4) L T (D/(bf))².    (1.53)

Here b is the bellows factor defined by equation (1.31), which emerged when deriving
the AFoV formula,

b = 1 + ∣m∣/mp.

It is convenient to rewrite equation (1.53) in the following way:

E = (π/4) L T (1/Nw²).    (1.54)

The working f-number Nw is defined by

Nw = bN . (1.55)

When focus is set at infinity, ∣m∣ → 0 and the bellows factor b → 1. The working
f-number then reduces to the f-number N .
Consider a lens that utilises unit focusing. When focus is set on an OP positioned
closer than infinity, ∣m∣ > 0 and so Nw > N . Comparison of equations (1.52) and
(1.54) reveals that the illuminance decreases relative to its value at infinity focus.
This is a consequence of focus breathing. The contribution from the bellows factor
can become significant at close focus distances in the same way that it affects the


AFoV. For example, when ∣m∣ = 1 and m p = 1, the bellows factor becomes b = 2 and
equation (1.54) reveals that the flux arriving at the axial position on the SP is
reduced to one quarter of the amount that would be expected according to the value
of N . For lenses that utilise internal focusing, the decrease in focal length that occurs
at closer focusing distances may compensate for the flux reduction, and so any
change in illuminance depends upon the specific focus breathing properties of the
lens design.
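The drop in SP illuminance at close focus follows directly from equations (1.54) and (1.55). A short sketch for a unit-focusing lens, with arbitrary example values of the magnification and pupil magnification:

def working_f_number(N, m_abs, m_p=1.0):
    """Working f-number Nw = b*N with bellows factor b = 1 + |m|/m_p (unit focusing)."""
    b = 1.0 + m_abs / m_p
    return b * N

N = 2.8
for m_abs in (0.0, 0.1, 0.5, 1.0):          # |m| = 0 corresponds to infinity focus
    Nw = working_f_number(N, m_abs)
    rel_E = (N / Nw) ** 2                   # relative illuminance, since E is proportional to 1/Nw^2
    print(m_abs, round(Nw, 2), round(rel_E, 3))
# At |m| = 1 with m_p = 1, Nw = 2N and the illuminance falls to one quarter of its infinity-focus value.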
Photographic exposure strategy will be discussed in chapter 2. Standard
exposure strategy is based upon N rather than Nw as it assumes that focus is set
at infinity. Nevertheless, knowledge of Nw will indicate exposure compensation
that may need to be applied at close focusing distances when a hand-held exposure
meter is used. In the case of in-camera through-the-lens (TTL) metering systems,
any illuminance change due to the OP distance will automatically be taken into
account [5].

1.5.6 f-stop
Recall that when focus is set at infinity, the illuminance at the axial area element dA′
on the SP is defined by
E(s→∞) = (π/4) L T (1/N²).
Here L is the scene luminance at the axial area element on the OP, T is the lens
transmittance factor and N = f /D is the f-number. The flux collected at dA′ is given
by
Φ = E dA′ .
Significantly, if the front effective focal length f and EP diameter D are changed but
their ratio is kept constant, N will remain constant and the flux or luminous power Φ
incident at dA′ will also remain constant provided L is time-independent.
Now consider adjusting the f-number itself. Since E is inversely proportional to
the square of N, the f-number must decrease by a factor of √2 in order to double the
flux Φ. Analogously, the f-number must increase by a factor of √2 in order to halve Φ.
Adjustable iris diaphragms are constructed so that successive increments will double
or halve the flux Φ. This leads to the following series of possible f-numbers when the
surrounding medium is air:

N: 0.5 0.7 1 1.4 2 2.8 4 5.6 8 11 16 22 32 etc

Changing the f-number by a single increment in this series is referred to as an
increase or decrease by one f-stop. Modern iris diaphragms also allow fractional
f-stops.
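Since each full stop scales N by √2, the series quoted above can be generated programmatically. A minimal sketch; the rounding of the marked values such as 5.6, 11 and 22 is conventional:

import math

# Full-stop f-numbers: N_k = 0.5 * (sqrt(2))**k, starting from N = 0.5
stops = [0.5 * math.sqrt(2) ** k for k in range(13)]
print([round(N, 1) for N in stops])
# -> approximately [0.5, 0.7, 1.0, 1.4, 2.0, 2.8, 4.0, 5.7, 8.0, 11.3, 16.0, 22.6, 32.0]
# The marked values 5.6, 11 and 22 are rounded forms of 5.7, 11.3 and 22.6.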


1.5.7 Natural vignetting


Equations (1.50), (1.52) and (1.54) are written in terms of the RA, f-number and
working f-number, respectively. It is important to understand that these equations
define the illuminance only at infinitesimal area element dA′ associated with the
axial (on-axis) position on the SP.
For an arbitrary off-axis source position on the OP, geometrical considerations
dictate that the cone of flux subtended by the lens EP will be distorted (figure 1.42).
Even if the OP is an extended Lambertian surface with a uniform scene luminance
distribution, the illuminance at the corresponding off-axis area element on the SP
will in fact be reduced compared to the axial value.
In order to obtain a valid expression for the illuminance at an arbitrary position
on the SP, equations (1.52) and (1.54) need to be modified to describe the cone of
flux emitted from an arbitrary off-axis source position on the OP. This can be
achieved through a straightforward modification of the derivation for the flux
emitted into a cone given in section 1.5.2. Referring to the geometry of figure 1.42,
the following modifications are required:
• The OP area element normal to the centre of the EP is reduced by a factor
cos φ so that dA(φ ) = dA cos φ.
• The radial distance r is extended by a factor 1/cos φ. This reduces the
differential solid angle element subtended by the EP by a factor cos² φ.
• The differential area element on the surface of the cone at the EP is
approximately reduced by a factor cos φ. This approximation is valid provided
the source point is far from the EP. The precise relation is derived in [19].


Figure 1.42. Geometry of the cosine fourth law which explains natural vignetting. Recall that the base of the
cone becomes flat in Gaussian optics.


Performing these modifications to the derivation given in section 1.5.2 leads to the
following Gaussian result:
Φ(φ) = πL dA u² cos⁴ φ.
Using this in place of equation (1.47) yields the illuminance at any desired position
on the SP. For example, equation (1.54) generalises to
E(x, y) = (π/4) L(x/m, y/m) T (1/Nw²) cos⁴{φ(x/m, y/m)}.    (1.56)

The coordinates (x, y) indicate the position on the SP, and the coordinates
(x/m, y/m) indicate the corresponding position on the OP. These two sets of
coordinates are related via the magnification. The coordinates are typically dropped
in photographic formulae,
E = (π/4) L T (1/Nw²) cos⁴ φ.    (1.57)

If focus is set at infinity, Nw can be replaced by N. Although the result is approximate,
the above equation is known as the cosine fourth law. The cos⁴ φ factor takes into
account the reduction in flux at the lens EP as the source position on the OP moves
away from the OA. For a rotationally symmetric system, this leads to natural
darkening with radial distance from the centre of the resulting image and is referred
to as fall-off, roll-off or natural vignetting. Examples are illustrated in figure 1.43.

Figure 1.43. Natural cosine-fourth falloff for a rectilinear lens focused at infinity in air. The upper and lower
diagrams show effective focal lengths 24 mm and 50 mm, respectively, on a camera with a 35 mm full-frame
sensor.
Note that the angle φ is an object-space angle that is subtended by an off-axis
position on the OP from the centre of the EP [20, 21]. However, the cosine fourth law
is often incorrectly quoted with φ replaced by the image-space angle φ′ subtended
from the XP by the corresponding off-axis position on the IP. In fact, φ = φ′ only
under the conditions that the pupil magnification m p = 1 and the refractive indices of
the object-space and image-space media are equal, n = n′.
The relationship between the angles φ and φ′ can be found by applying the
Lagrange invariant to the pupils. Since the ratio of φ to φ′ is equal to the ratio between
the chief rays in object space and image space, the relationship is found to be

φ = (n′/n) mp φ′.
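The strength of natural vignetting is easy to estimate from the cosine fourth law. The sketch below evaluates the corner falloff for a lens focused at infinity, assuming mp = 1 and n = n′ so that the object-space and image-space angles coincide, and approximating the field angle by φ ≈ arctan(r/f), where r is the radial image distance from the OA; the focal lengths and the full-frame corner radius (≈21.6 mm) are illustrative values.

import math

def cos4_falloff(r_mm, f_mm):
    """Relative illuminance cos^4(phi) at radial image distance r for focal length f,
    using phi = arctan(r/f) as an approximate field angle (rectilinear lens, infinity focus)."""
    phi = math.atan(r_mm / f_mm)
    return math.cos(phi) ** 4

corner = math.hypot(18.0, 12.0)   # half-diagonal of a 36 x 24 mm frame, ~21.6 mm
for f in (24.0, 50.0):
    loss = cos4_falloff(corner, f)
    print(f, round(loss, 2), round(-math.log2(loss), 2), "stops")
# The 24 mm lens loses roughly 1.7 stops in the extreme corners; the 50 mm lens only about 0.5 stops.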

1.5.8 Camera equation


Photometric exposure Hv at a given position on the SP is defined as the time integral
of the illuminance E v at that position,
H = ∫₀^t E(t′) dt′.

Again the ‘v’ subscripts have been dropped for clarity. The time integral can be
replaced by a product provided the illuminance does not change during the exposure
duration,
H = Et. (1.58)

The time t is the exposure duration, informally referred to as the shutter speed.
Since illuminance E is the power or flux received per unit area weighted by the
spectral response of the HVS, photometric exposure H is the electromagnetic energy
received per unit area weighted by the same response. Substituting equation (1.57)
into (1.58) yields the camera equation

H = (π/4) L (t/Nw²) T cos⁴ φ.    (1.59)


The T-number T# ⩾ N combines the f-number and lens transmittance factor,

T# = N/√T.

The T-number provides a more accurate indication of the expected photometric
exposure than N, and for that reason is commonly specified on cinematography
lenses. In terms of the T-number, the camera equation may be written

H = (π/4) L (t/(b²T#²)) cos⁴ φ.

The camera equation describes the photometric exposure distribution at the SP. It
should be remembered that H is a function of position on the SP, where each
position is associated with an infinitesimal area element dA′. In analogy with
equation (1.57), the camera equation can be expressed more explicitly using a
coordinate representation,
H(x, y) = (π/4) L(x/m, y/m) (t/Nw²) T cos⁴{φ(x/m, y/m)}.
The coordinates (x , y ) indicate the position on the SP, and the coordinates
(x /m, y /m ) indicate the corresponding position on the OP. These two sets of
coordinates are related via the magnification. Again, the angle φ is the object-space
angle subtended from the centre of the EP by the off-axis scene position under
consideration. Even if the scene luminance distribution is uniform, H will vary over
the SP due to natural vignetting along with any mechanical vignetting.
For a given scene luminance distribution, the magnitude of the photometric
exposure distribution at the SP depends primarily upon the working f-number and
the exposure duration t. Photometric exposure is independent of the camera ISO
setting, although a higher ISO setting will reduce the maximum exposure that can be
tolerated. The choice of exposure settings depends upon the exposure strategy of the
photographer. Exposure strategy is discussed in chapter 2.
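Putting the pieces together, the on-axis photometric exposure of equation (1.59) can be evaluated directly. The sketch below uses assumed values for the scene luminance, transmittance and camera settings purely as an illustration.

import math

def photometric_exposure(L, t, N, T=0.9, m_abs=0.0, m_p=1.0, phi_deg=0.0):
    """Photometric exposure H (lux seconds) from the camera equation (1.59).

    L        scene luminance (cd/m^2)
    t        exposure duration (s)
    N        f-number; the working f-number is Nw = (1 + |m|/m_p) * N
    T        lens transmittance factor (assumed value)
    phi_deg  object-space field angle in degrees (0 on the OA)
    """
    Nw = (1.0 + m_abs / m_p) * N
    return (math.pi / 4.0) * L * (t / Nw ** 2) * T * math.cos(math.radians(phi_deg)) ** 4

# Example: L = 1000 cd/m^2, 1/250 s at f/8, on-axis, focus at infinity
print(round(photometric_exposure(1000.0, 1.0 / 250.0, 8.0), 4), "lx s")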

1.5.9 Shutters
A camera uses a shutter to control the exposure duration t. Three main types of
shutter are used in modern cameras.

Focal plane shutter


The modern focal plane (FP) shutter is a type of rolling shutter. It is capable of fast
shutter speeds of order 1/8000 s. It typically comprises two curtains situated in front of
the SP, and each curtain is composed of three horizontal blades. When the shutter is
activated, the first curtain opens vertically to expose the imaging sensor. As illustrated
in figure 1.44, the second curtain then closes in the same vertical direction to end the
exposure. The shutter speed or exposure duration t is defined as the measured time
from which the shutter is half open to when the shutter is half closed [9].




Figure 1.44. Operation of a focal plane shutter.

Figure 1.45. (a) Traversal of each curtain when the shutter speed is slow. (b) Traversal of each curtain when
the shutter speed is faster than the curtain traversal time (sync speed).

For an FP shutter, the shutter traversal time is the time needed for a single curtain
to traverse the sensor. This value is typically of order 1/250 s and is the same for both
curtains. Two different scenarios are shown in figure 1.45 that illustrate the
significance of the curtain traversal time. Figure 1.45(a) shows the opening and
closing of the shutter for a slow shutter speed (long exposure duration) where
t > 1/250 s and figure 1.45(b) shows the opening and closing of the shutter for a fast
shutter speed (short exposure duration) where t < 1/250 s. Evidently, a shutter speed
faster than the shutter traversal time is obtained only when the second curtain starts
closing before the first curtain has fully opened. In this case, the imaging sensor is
exposed by a moving slit that is narrower than the height of the imaging sensor
(figure 1.46). This means that shutter speeds faster than the shutter traversal time
should not be used with conventional flash as the very brief flash duration would
freeze the appearance of the slit and produce dark bands in the photograph. For this
reason, the shutter traversal time is also known as the sync speed.
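A simple model makes the sync-speed limit concrete: once the exposure duration is shorter than the curtain traversal time, the exposed slit height is roughly the sensor height scaled by t divided by the traversal time. The sketch below is an idealised approximation assuming constant curtain speed; the 1/250 s traversal time is the representative value quoted above and 24 mm is the height of a full-frame sensor.

def slit_height_mm(t, traversal_time=1.0 / 250.0, sensor_height_mm=24.0):
    """Approximate height of the exposing slit for an idealised focal plane shutter.

    For t >= traversal_time the first curtain fully opens before the second starts,
    so the whole sensor height is exposed at once.
    """
    if t >= traversal_time:
        return sensor_height_mm
    return sensor_height_mm * (t / traversal_time)

for t in (1.0 / 60.0, 1.0 / 250.0, 1.0 / 1000.0, 1.0 / 8000.0):
    print(f"1/{round(1 / t)} s -> slit ~{slit_height_mm(t):.2f} mm")
# At 1/8000 s with a 1/250 s traversal time, the slit is only ~0.75 mm tall.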
All types of rolling shutter can cause an artefact known as rolling shutter
distortion, particularly in photographs of scenes that include fast motion. Rolling
shutter distortion occurs due to the fact that different vertical positions on the SP are
not exposed at the exact same instant of time.


Figure 1.46. The sensor is exposed by a moving slit when the second curtain starts closing before the first
curtain has fully opened. This occurs when shutter speeds faster than the sync speed (shutter traversal time) are
used.

Leaf shutter
This type of shutter is positioned next to the AS of the lens. It is commonly used in
compact cameras. The blades open from the centre and so the shutter traversal time
is very quick, which is advantageous for flash photography. However, the fastest
achievable shutter speed is limited to about 1/2000 s since the same blades are used
for opening and closing the shutter.

Electronic shutter
Compared to a mechanical shutter, an electronic shutter allows much faster shutter
speeds to be used. At present, electronic shutter speeds of up to 1/32000 s are
available in consumer cameras. The fastest shutter speed depends on the row
readout speed, which is the time taken for the charge signals corresponding to a row
of pixels on the imaging sensor to be electronically read out. Advantages of
electronic shutters include silent operation, absence of mechanical wear and absence
of unwanted vibrations described as shutter shock.
In a CCD sensor, all rows can be read simultaneously. This enables an electronic
global shutter to be used, which has two main advantages. First, rolling shutter
distortion will be absent. Second, the shutter traversal speed is equivalent to the row
readout speed. Since the row readout speed is equivalent to the fastest possible
shutter speed, flash can be used at very fast shutter speeds.
At present, electronic global shutters have not been implemented in consumer
cameras with CMOS sensors, although this will likely change in the near future.
Instead, consumer cameras with CMOS sensors use an electronic rolling shutter that
requires each row to be read in sequence. Since each row needs to be exposed for the
same time duration, the exposure at a given row cannot be started until the readout
of the previous row has been completed. The shutter traversal time is therefore
limited by the frame readout speed, which is typically slower than the shutter
traversal time of a mechanical shutter. This has two main disadvantages. First,
rolling shutter distortion is more severe than that caused by mechanical FP shutters.
Second, the use of conventional flash is limited to shutter speeds slower than the
frame readout speed. In cameras that offer both a mechanical and an electronic
shutter, the use of flash with electronic rolling shutter is often disabled. Electronic
first curtain shutter is a compromise solution that offers some of the advantages of


an electronic shutter. The electronic rolling shutter is used only to start the exposure
at each row. This process is synchronized with a mechanical shutter that ends the
exposure.

1.5.10 f-number for aplanatic lenses


Further insight into the physical significance of the f-number can be achieved by
describing a lens using real ray optics rather than Gaussian optics. In place of the ray
tangent slopes used in Gaussian optics, real ray optics uses exact trigonometrical
relations based on equation (1.1) to trace real rays through an optical system [2, 3].
Figure 1.47 shows a real ray bundle entering and exiting the lens from the axial
position on the OP. The ray angles U and U ′ are the real angles defined by the
marginal ray in object space and image space, respectively. The IP shown is the one
that corresponds to the marginal ray; this IP may not coincide exactly with the
Gaussian IP when aberrations are present.
Recall from sections 1.3.1 and 1.3.2 that the marginal rays pass by the edges of
the pupils. Within Gaussian optics, the pupils are flat surfaces that coincide with the
object-space and image-space images of the AS. When described using real ray
optics, the pupils are not flat surfaces as this would violate Fermat’s principle [14].
(The same reasoning applies to the principal ‘planes’). For a well-corrected lens such
as an aplanatic lens, the pupils will be portions of spheres centred at the object and
image points. These pupils intersect the Gaussian pupils at the OA.
The flux passing through the EP shown in figure 1.47 is defined by equation (1.46)
with θ = U ,
Φ = πL sin²U dA.    (1.60)
Recall that the expression for the RA within Gaussian optics was derived in section
1.5.3 by utilising the Lagrange theorem to obtain a relationship between the ray
tangent slopes u, u′ and the Gaussian magnification,
m = nu/(n′u′).


Figure 1.47. When an optical system is described using real ray optics, the object-space and image-space
numerical apertures are defined in terms of the half-cone angles U and U ′ formed by the real marginal ray.


In the present context, the equivalent theorem is the sine theorem. This can be used
to obtain a relationship between the real marginal ray angles U , U ′ and the
magnification M defined by the real marginal ray IP,
M = NA/NA′ = (n sin U)/(n′ sin U′).    (1.61)
Here NA = n sin U is the object-space numerical aperture, and NA′ = n′ sin U′ is
the image-space numerical aperture.
By substituting equation (1.61) into (1.60) and utilising the fact that M² = dA′/dA,
the illuminance E = (T Φ)/dA′ at the axial position on the IP becomes

E = (π/4) L T × (RA)²,

where RA is the relative aperture defined by the real marginal ray,

RA = 2NA′/n = (2n′ sin U′)/n.
This is a generalisation of equation (1.50). Recall that the f-number is defined as the
reciprocal of the RA when focus is set at infinity, and therefore
N = n/(2NA′∞).    (1.62)

Here NA′∞ is the image-space numerical aperture when focus is set at infinity. It
should be noted that the above expression for the f-number defines the illuminance
at the axial position on the IP defined by the real marginal ray. In the presence of
aberrations, this IP may not correspond exactly with the plane of best focus defined
by the camera SP.
The image-space angle U ′ and image-space numerical aperture NA′ are maxi-
mised when focus is set at infinity. When a lens that utilises unit focusing is focused
on an OP positioned closer than infinity, the object-space angle U and object-space
numerical aperture NA both become larger and the magnification increases
according to equation (1.61). On the other hand, U ′ and NA′ both become smaller
and this reduces the illuminance at the axial position on the IP. The maximum
achievable magnification will in theory be higher if n > n′. This principle is utilised
in microscopes by immersing the objective in immersion oil that has n ≈ 1.5.
Conversely for a given scene luminance, the maximum achievable illuminance at
the axial position on the IP can in principle be increased by using an image-space
medium with n′ > n.
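Equation (1.62) links the f-number to the image-space numerical aperture at infinity focus. A small sketch, with illustrative refractive indices and marginal ray angles:

import math

def f_number_from_na(n_object, n_image, U_image_deg):
    """f-number N = n / (2 NA'_inf) with NA'_inf = n' sin U' (equation (1.62))."""
    na_image = n_image * math.sin(math.radians(U_image_deg))
    return n_object / (2.0 * na_image)

# In air (n = n' = 1), a marginal ray angle of 30 degrees gives N = 1.0
print(round(f_number_from_na(1.0, 1.0, 30.0), 2))
# The limiting case U' -> 90 degrees gives the minimum N = 0.5 in air
print(round(f_number_from_na(1.0, 1.0, 90.0), 2))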
Now consider the case of an aplanatic lens. This is defined as a lens free from SA and
coma. According to Abbe’s sine condition [1, 2, 16], in an aplanatic lens the
magnification M defined by the real marginal ray is equal to the Gaussian magnifi-
cation m, and so the sine theorem defined by equation (1.61) takes the following form:
sin U′/u′ = sin U/u    (aplanatic lens).


Figure 1.48. When an aplanatic lens focused at infinity is described using real ray optics, the second equivalent
refracting surface or second principal surface (dashed blue curve) is part of a perfect hemisphere centred at the
rear focal point. The Gaussian principal planes are indicated by H and H’. The EP diameter is D. The XP is
not involved.

Figure 1.48 shows an aplanatic lens with focus set at infinity. In this special case, the
object-space angles sin U and u approach zero but the quotient sin U /u approaches
unity, and therefore sin U ′ = u′ [22]. In section 1.5.4 and figure 1.41, it was shown
that
u′(s→∞) = D/(2f ′).
This means that the image-space numerical aperture for an aplanatic lens focused at
infinity can be written as follows:
NA′∞ = n′D/(2f ′)    (aplanatic lens).
The final result is obtained by substituting NA′∞ into equation (1.62) and utilising
the fact that n′/f ′ = n/f,

N = n/(2NA′∞) = f/D    (aplanatic lens).    (1.63)

In other words, the Gaussian expression for the f-number is exact for an aplanatic
lens [1, 5, 14]. Equation (1.63) is important for two main reasons:


Figure 1.49. Remembering that u′ must be interpreted as a tangent when the paraxial region is extended, the
sine of the real angle U ′ must equal u′ when an aplanatic lens is focused at infinity.

1. It shows that the f-number cannot be made arbitrarily small. The maximum
value of the sine function is unity and so the minimum possible f-number in
air is seen to be N = 0.5 for an aplanatic lens. A similar limit can be expected
for non-aplanatic lenses that have been well-corrected. The limit can be
lowered by using an image-space medium with a higher refractive index than
object space so that n′ > n [2].
2. In order for equation (1.63) to hold, the real image-space ray bundle must be
associated with a second equivalent refracting surface or second principal
surface that takes the form of a perfect hemisphere of radius f ′ centred at the
rear focal point. The geometry is shown in figures 1.48 and 1.49. This is
consistent with the fact that the principal ‘planes’ obtained by extending the
paraxial region within Gaussian optics are actually curved surfaces centred
at the object and image points when described using real ray optics [1, 3, 5].

References
[1] Jenkins F A and White H E 1976 Fundamentals of Optics 4th edn (New York: McGraw-Hill)
[2] Kingslake R and Johnson R B 2010 Lens Design Fundamentals 2nd edn (New York:
Academic)
[3] Smith W J 2007 Modern Optical Engineering 4th edn (New York: McGraw-Hill)
[4] Kingslake R 1983 Optical System Design 1st edn (New York: Academic)
[5] Kingslake R 1992 Optics in Photography, SPIE Press Monograph vol PM06 (Bellingham,
WA: SPIE)
[6] Ray S F 2002 Applied Photographic Optics: Lenses and Optical Systems for Photography,
Film, Video, Electronic and Digital Imaging 3rd edn (Oxford: Focal Press)
[7] Greivenkamp J E 2004 SPIE Field Guides Field Guide to Geometrical Optics vol FG01
(Bellingham, WA: SPIE)
[8] Johnson R B 2008 Correctly making panoramic imagery and the meaning of optical center
Proc. SPIE 7060 70600F
[9] Goldberg N 1992 Camera Technology: The Dark Side of the Lens (New York: Academic)
[10] Kreitzer M H 1982 Internal focusing telephoto lens US Patent Specification 4359297
[11] Sato S 1994 Internal focusing telephoto lens US Patent Specification 5323270


[12] Mukai H, Karasaki T and Kawamura K 1985 Focus conditioning detecting device for
cameras US Patent Specification 4552445
[13] Stauffer N L 1975 Auto focus camera US Patent Specification 3860035
[14] Blahnik V 2014 About the Irradiance and Apertures of Camera Lenses (Oberkochen: Carl
Zeiss Camera Lens Division)
[15] Nasse H H 2010 Depth of Field and Bokeh (Oberkochen: Carl Zeiss Camera Lens Division)
[16] Born M and Wolf E 1999 Principles of Optics: Electromagnetic Theory of Propagation,
Interference and Diffraction of Light 7th edn (Cambridge: Cambridge University Press)
[17] Sutton T and Dawson G 1867 A Dictionary of Photography (London: Sampson Low, Son, &
Marston)
[18] Hatch M R and Stoltzmann D E 1980 The f-stops here Opt. Spectra 80
[19] Foote P D 1915 Illumination from a radiating disk Bull. Bureau Stand. 12 583
[20] Koyama T 2006 Optics in digital still cameras Image Sensors and Signal Processing for
Digital Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 2
[21] Kerr D 2007 Derivation of the ‘Cosine Fourth’ law for falloff of illuminance across a camera
image unpublished
[22] Sasian J 2012 Introduction to Aberrations in Optical Imaging Systems (Cambridge:
Cambridge University Press)



D A Rowlands

Chapter 2
Digital output and exposure strategy

Chapter 1 derived the optical formulae required to define the illuminance distribu-
tion formed by an optical image at the sensor plane (SP). The photometric exposure
distribution at the SP was defined as the illuminance distribution multiplied by the
exposure time or duration t, as described by the camera equation derived in section 1.5.
For a time-dependent illuminance distribution, the multiplication generalises to a time
integral.
Although the choice of photometric exposure distribution depends upon the
aesthetic and technical requirements of the individual photographer, the exposure
distribution must at least generate a useful response from the imaging sensor so that
a satisfactory digital output image can be obtained. The Camera and Imaging
Products Association of Japan (CIPA) DC-004 and International Organization for
Standardization (ISO) 12232 photographic standards [1, 2] provide a standard
exposure strategy based on the digital output JPEG image produced by the camera,
and not the raw data. For a typical photographic scene, standard exposure strategy
aims to map the metered average scene luminance to a standard mid-tone lightness
in the output JPEG image file. This can be used as a starting point for further
adjustments to accommodate non-typical scenes or aesthetic requirements.
Since the standard exposure strategy is based upon the digital output JPEG image
file, this chapter begins with an overview of the chain of processes that lead to the
production of a JPEG image from the photometric exposure distribution at the SP.
These processes are described in much greater detail in chapters 3 and 4, which cover
raw data and raw conversion, respectively.
Subsequently, the theory of the standard exposure strategy is developed. This is
followed by a description of the metering and exposure modes found on modern
digital cameras, and advanced topics related to a practical exposure strategy such as
photographic lighting, the use of photographic filters and high dynamic range
imaging.




Finally, it should be noted that the standard exposure strategy does not aim to
maximise image quality. Image quality in relation to photographic practice is
discussed in chapter 5.

2.1 Raw data


Ideally, the raw data produced by a camera will be an array of digital values that
provide a normalised representation of the scene luminance distribution within the
dynamic range (DR) constraints imposed by the electronics. For colour cameras, the
raw data should also provide colour information.

2.1.1 Sensor response


The photometric exposure distribution at the SP is defined by the camera equation
derived in section 1.5 of chapter 1. This distribution generates photoelectrons at the
sensor pixels or photosites. The total number of photoelectrons, ne,j , generated at a
given photosite j for the exposure duration is proportional to the average photo-
metric exposure per photosite 〈Hj 〉,
ne,j ∝ 〈Hj〉.
This relation is valid only for exposure levels that produce a useful response from the
imaging sensor. Furthermore, this relation assumes that the sensor response is
perfectly linear and that the range of light wavelengths over which the sensor
responds is identical to that of the human visual system (HVS). The proportionality
constant is fixed only if the Luther–Ives condition is satisfied, which is discussed
further in section 2.12.1 and chapter 4. The precise equation will be derived using
radiometry in chapter 3.
For a given photosite, the photoelectrons generate a voltage signal proportional
to ne,j :
Vj ∝ ne,j.    (2.1)

An analog-to-digital converter (ADC) converts the voltage into a raw value, which
can be one of many possible discrete raw levels. Raw values can be expressed in
terms of digital numbers (DN) or identically in terms of analog-to-digital units
(ADU). The raw values associated with all photosites comprise the raw data. The
raw data together with camera metadata can be stored in a raw file.
Each raw level is specified by a string of binary digits or bits, each of which can be
either 0 or 1. The length of the string is known as the bit depth. For a bit depth equal
to M, the number of possible raw levels is given by 2^M. A raw file with a bit depth
equal to M is referred to as an M-bit raw file. An M-bit raw file therefore provides
2^M possible raw levels per photosite. Typically M = 10, 12 or 14 in consumer
cameras.
Figure 2.1 shows an idealised sensor response curve. The proportionality between
scene luminance and electron count is evident. Modern imaging sensors respond
linearly over much of the sensor response curve. However, a useful response to


Figure 2.1. Model sensor response curve. A useful response is obtained between the noise floor and full-well
capacity.

photometric exposure is obtained only between lower and upper electron counts, ne,j ,
referred to as the noise floor and full-well capacity (FWC), respectively. The noise
floor is defined by the read noise, which is the signal noise due to the electronic
readout circuitry. Read noise will be present every time charge readout occurs, even
in the absence of photometric exposure. It therefore defines the minimum usable
output signal from an engineering perspective. FWC is the maximum number of
photoelectrons that can be stored at a photosite. When FWC is reached, the
photosite is described as saturated.

2.1.2 Colour
A colour can be specified by its luminance and chromaticity. Luminance only defines
the achromatic or greyscale component of a colour, and so full colour reproduction
requires a strategy for detecting chromaticity.
As described in chapter 4, the human eye uses three types of cone cells for
detecting colour. Each colour can be specified in terms of a unique set of three
tristimulus values that arise from the light entering the eye and the response of each
type of cone cell. The LMS colour space describes all possible colours in terms of all
valid sets of tristimulus values. A colour space that describes all possible colours is
referred to as a reference colour space.
A similar approach is used in consumer cameras. One of three different types of
colour filter are placed above each photosite on the imaging sensor. The pattern of
colour filters forms a colour filter array (CFA). For example, figure 2.2 illustrates a
Bayer CFA, which uses a pattern of red, green and blue filters [3]. When the voltage
signal from a given photosite is quantized by the ADC, a raw value R , G or B
associated with a given type of filter will be recorded. Since only one type of filter can
be placed over each photosite, full RGB information is not available at each
photosite location. This missing information can be obtained through interpolation
by carrying out a computational process known as colour demosaicing.
After the colour demosaic has been performed, the colour of the light recorded at
a given photosite is described by a set of R , G , B values, which can be referred to as
raw tristimulus values. The internal camera raw space describes all possible colours


Figure 2.2. Bayer CFA showing the red, green and blue mosaics.

that the camera can record. Although the raw tristimulus values specifying a given
colour in the camera raw space are not numerically the same as the tristimulus
values that specify the same colour in the LMS colour space, a linear transformation
should exist between the camera raw space and the LMS colour space if the camera
is to correctly record colour. Also notice that the camera raw space is sampled
discretely since the number of raw levels is restricted to 2^M for each colour
component per photosite.
The camera raw space is device-dependent and may contain many colours that
cannot be displayed on a standard three-channel display monitor. Therefore, the
camera raw space is not a suitable colour space for viewing images. However, the
camera raw space can be transformed into a standard output-referred colour space
designed for viewing images on a standard display monitor. Familiar examples
include sRGB [4] and AdobeⓇ RGB [5].
The linear RGB components of the output-referred colour space are denoted by
RL , G L , BL . These can be referred to as relative tristimulus values of the output-
referred colour space. The set of raw tristimulus values, R , G , B obtained from a
given photosite can be transformed into RL , G L , BL values by applying the following
linear matrix transformation:
(RL, GL, BL)ᵀ = R̲ D̲ (R, G, B)ᵀ.

Here R̲ is a colour rotation matrix,

      ⎡ R11 R12 R13 ⎤
R̲ =  ⎢ R21 R22 R23 ⎥ .
      ⎣ R31 R32 R33 ⎦
The matrix entries are colour space and camera dependent, however each row will
always sum to unity. The diagonal matrix D̲ takes care of white balance, which will
be discussed in chapter 4. Colour rotation matrices can be determined by character-
ising the colour response of a camera, which is also described in chapter 4.
When a colour is represented by RL , G L , BL , the luminance component of the
colour is obtained as a weighted sum. In the case of the sRGB colour space, the
weighting is defined as follows:
Y = 0.2126 RL + 0.7152 G L + 0.0722 BL. (2.2)


Here Y is referred to as relative luminance as it is a normalised representation of the
scene luminance, Lv, that corresponds to the photosite or sensor pixel location.
Before describing how RL , G L , BL are transformed into the digital output levels
(DOLs) of an output image file, the important concept of DR transfer is discussed
below.
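The raw-to-output colour transformation described above is just a 3 × 3 matrix product followed by the luminance weighting of equation (2.2). The sketch below applies a hypothetical rotation matrix (rows summing to unity) and an illustrative white-balance diagonal to a single demosaiced raw triplet; real matrices are camera-specific and are obtained by characterising the camera, as noted in the text.

import numpy as np

# Hypothetical colour rotation matrix (each row sums to unity) and white-balance diagonal
R_rot = np.array([[ 1.60, -0.45, -0.15],
                  [-0.20,  1.45, -0.25],
                  [ 0.05, -0.55,  1.50]])
D_wb = np.diag([2.0, 1.0, 1.6])           # illustrative white-balance multipliers

raw = np.array([0.20, 0.35, 0.15])        # demosaiced raw triplet (R, G, B), normalised

RGB_linear = R_rot @ D_wb @ raw           # linear output-referred values (RL, GL, BL)
Y = np.dot([0.2126, 0.7152, 0.0722], RGB_linear)   # relative luminance, equation (2.2)
print(RGB_linear, round(Y, 4))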

2.1.3 Dynamic range transfer


Dynamic range (DR) is an important concept that is fundamental to the develop-
ment of a photographic exposure strategy. In particular, it is important to under-
stand how DR is transferred through the various stages of the photographic imaging
chain.
Scene DR
Scene DR is often specified as the scene luminance ratio between the highest and
lowest scene luminance values. For example, a typical daylight scene may have a
scene DR of 128:1.
Luminance ratios can also be specified in terms of the photographic stop, which is
a generalisation of the f-stop concept introduced in section 1.5.6 of chapter 1, and is
discussed further in section 2.5.6. Since each stop corresponds to a doubling or
halving of luminance, a 128:1 scene luminance ratio is equivalent to a scene DR of
log₂ 128 = 7 stops since the lowest luminance value needs to be doubled 7 times in
order to equal the highest luminance value.
The aim of a photographic exposure strategy is to appropriately position the
scene DR on the sensor response curve.
ADC DR
The ADC DR describes the maximum scene DR that can be quantized by the ADC.
The ADC quantizes the analog voltage described by equation (2.1) into a raw
level specified as a DN. A linear ADC uses raw levels that are proportional to the
voltage and scene luminance up to the quantization step. An M-bit linear ADC
provides 2^M raw levels and can therefore quantize log₂(2^M) = M stops of DR in
principle, assuming that the sensor response curve is perfectly linear. In this case, the
DR provided by a linear ADC is directly specified by its bit depth.
For example, DN = 4096 represents a scene luminance level 4096 times greater
than DN = 1. In this case DN = 1 needs to be doubled 12 times in order to equal
DN = 4096, and so the raw levels from DN = 1 to DN = 4096 represent 12 stops of
DR. ADC saturation occurs when the input voltage is quantized to the highest
available DN.
Raw DR
Raw DR is the maximum scene DR that can be represented by the raw data. If the
sensor response curve and ADC are both perfectly linear, the raw DR cannot be
greater than the ADC DR.


Raw DR is limited by the noise floor from below, and by FWC or ADC
saturation from above. Ideally the scene DR will be less than the raw DR, otherwise
scene information will be clipped irrespective of the photographic exposure strategy.
As described in chapter 5, raw DR can be specified using electrons or raw levels.
In terms of raw levels, raw DR per photosite can be defined as follows:
raw DR (ratio) = nDN,clip/σDN,read : 1

raw DR (stops) = log₂(nDN,clip/σDN,read).    (2.3)
Here nDN,clip is the raw clipping point expressed as a DN, which is the maximum
usable raw level in the raw data. The noise floor or read noise expressed using DN
has been denoted by σDN,read , and this should not drop below the quantization step
(1 DN) on average.
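Equation (2.3) is straightforward to evaluate for representative numbers. The clipping point and read noise below are illustrative values rather than measurements from any particular camera:

import math

def raw_dr_stops(dn_clip, dn_read_noise):
    """Raw dynamic range per photosite in stops, equation (2.3)."""
    return math.log2(dn_clip / dn_read_noise)

# Example: a 14-bit raw file clipping at 16383 DN with ~3 DN of read noise (assumed values)
print(round(raw_dr_stops(16383, 3.0), 1), "stops")   # about 12.4 stops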
Image DR
Image DR is the maximum DR that can be represented by the encoded output
image file such as a JPEG or TIFF file.
Since the DOLs used by an image file are not involved in the capture of raw data,
they can in principle represent an infinite amount of DR. However, the image DR
cannot be greater than the raw DR when the output image file is produced from the
raw file.
The entire raw DR can be transferred to the image file if an appropriate nonlinear
tone curve is applied, even if the image file bit depth is lower than the bit depth of the
raw data. In this case, the DOLs are nonlinearly related to the scene luminance,
which is discussed further in the next section. For accurate luminance reproduction,
any nonlinearity must be compensated for by an opposing display nonlinearity.
For reasons discussed in section 2.3, the image DR is typically much less than the
raw DR when the output image file is encoded using the default tone curve applied
by the camera manufacturer. In order to access more of the available raw DR, a
custom tone curve can be applied using an external raw converter.
Display DR
The display DR is the contrast ratio between the highest and lowest luminance
values that can be produced by the display medium.
If the display DR is less than the image DR, the image DR can be compressed
into the display DR through tone mapping, for example by reducing display
contrast. Display DR is discussed further in section 2.13.

2.2 Digital output levels


Section 2.1 very briefly described how the raw data is generated and transformed
from the camera raw space into an output-referred colour space such as sRGB.
When represented using an output-referred colour space, the raw data for a given
sensor pixel location is specified in terms of relative tristimulus values, RL , G L , BL , of
the output-referred colour space.


However, RL , G L , BL are not directly encoded in the output JPEG image file.
Instead, bit depth reduction is performed in conjunction with gamma encoding, and it
is the resulting digital values, R ′DOL , G ′DOL , B ′DOL , known as digital output levels
(DOLs) that are encoded. Since the standard exposure strategy is based upon the
output JPEG image file, it is important to gain an understanding of the nature of the
DOLs.

2.2.1 Bit depth reduction


Digital images are typically viewed on 8-bit displays, which provide 2^8 = 256 digital
levels per colour channel. Since there are three colour channels, a total of (2^8)^3 =
16 777 216 possible colours can be shown. Accordingly, in-camera image processing
engines produce 8-bit output JPEG image files. In this case, the colour of an image
pixel is specified by three colour components in the output-referred colour space,
each taking one of 256 possible normalised values ranging from 0 to 255. These
define the DOLs.
Recall that the scene DR that can be represented by the raw data defines the raw
DR. If the imaging sensor response and ADC are both perfectly linear, the raw DR
expressed in stops cannot be greater than the bit depth of the ADC, which is
typically 10, 12 or 14 in consumer cameras.
Evidently, bit depth needs to be reduced when converting the raw data into an
8-bit output image file. An immediate concern is that doing so could lead to a loss of
DR. However, this concern is erroneous, the reason being that the DOLs of an
output image file are not involved in the capture of the raw data. Accordingly, they
can be assigned to represent any desired amount of DR in principle. Nevertheless,
the represented scene DR cannot be greater than the raw DR when the output image
file is produced from the raw file.
Figure 2.3 provides an illustration of the fact that after the raw data has been
captured, the number of tonal levels used by an image encoding is independent from
the DR.

Figure 2.3. The upper diagram shows a wide DR represented using few tonal levels. The lower diagram shows
a narrower DR represented using a greater number of tonal levels.


Although the 8-bit DOLs of an output image file can represent the entire raw DR
in principle, they need to be appropriately allocated. It turns out that DOLs cannot
be allocated in a linear manner as this would introduce visible banding or
posterisation artefacts that degrade image quality. This problem can be overcome
by allocating DOLs in a nonlinear manner so that they become nonlinearly related to
the raw data. This allocation procedure is known as gamma encoding. However, the
nonlinearity needs to be compensated for when displaying the image. This is known
as gamma decoding.

2.2.2 Posterisation
Banding or posterisation artefacts occur when a specified DR is displayed using an
insufficient number of tonal transitions. For example, the upper diagram of figure 2.4
shows a linear luminance gradient that appears smooth. In contrast, an insufficient
number of tonal levels have been used to display the same linear gradient in the
middle and lower diagrams, and so posterisation artefacts are visible.
When designing a digital camera, the ADC bit depth is chosen such that the
minimum noise level in the raw data exceeds the quantization step (1 DN or 1 ADU)
on average. Consequently, luminance gradients appear smooth since the tonal
transitions defined by the raw levels are smoothed or dithered by the noise, and so
raw data is never posterised [6]. However, the noise level would in general be too low
to dither the tonal transitions if the bit depth of the raw data were to be subsequently
reduced to 8 in a linear manner.
However, posterisation can be minimised by performing the bit depth reduction
in a nonlinear manner. The idea is to make efficient use of the 256 available tonal
levels per channel by allocating more levels to tonal transitions that the HVS can
more easily perceive, while allocating fewer levels to tonal transitions that are

Figure 2.4. The upper diagram shows a linear gradient of 8-bit gamma-encoded DOLs; this appears correctly
as a linear luminance gradient on a display with a compensating nonlinear display gamma. The middle and
lower diagrams show posterisation when the data is truncated to 6 bits or 64 tonal levels, and 5 bits or 32 tonal
levels, respectively.


invisible. In other words, it is necessary to take into account the way that the HVS
perceives luminance.

2.2.3 Lightness
The HVS does not perceive luminance linearly in terms of its physiological
brightness response. According to the Weber–Fechner law, the relationship is
approximately logarithmic. Physically this means that darker tones are perceived
as brighter than their luminance values would suggest, as illustrated by observing the
linear luminance gradient in the upper diagram of figure 2.4.
When using relative colourimetry, lightness can be thought of as brightness
defined relative to a reference white. Whereas brightness as a descriptor ranges from
‘dim’ to ‘bright’, lightness ranges from ‘dark’ to ‘light’. The lightness function L*
defined by the CIE (International Commission on Illumination) is specified by the
following formula:
$$L^* = 116\, f(Y) - 16,$$
where
$$f(Y) = \begin{cases} Y^{1/3}, & Y > \delta^3 \\[4pt] \dfrac{Y}{3\delta^2} + \dfrac{4}{29}, & \text{otherwise} \end{cases}$$
and δ = 6/29. Here Y = L/L_n denotes relative luminance, with L_n being the
luminance of the reference white. Lightness is therefore a nonlinear function of
relative luminance and takes values between 0 and 100 or 100%, as illustrated in
figure 2.5. The lightness function is part of the colour model associated with the CIE
LAB perceptually uniform reference colour space defined in section 4.4.3 of
chapter 4. In particular, notice that 18.4% relative luminance lies midway on the
lightness scale as this value corresponds to L* = 50 when Y is normalised to the range
[0,100]. Accordingly, 18.4% relative luminance and L* = 50% are referred to as
middle grey.
When reducing bit depth to 8, posterisation can be minimised by using an
encoding that linearly allocates DOLs according to lightness L* rather than relative
luminance Y. In this way, more DOLs will be allocated to luminance levels that the
HVS is more sensitive to, while fewer DOLs will be allocated to luminance levels
that the HVS cannot easily discern from neighbouring ones. A so-called encoding
gamma curve that is similar to L* is used in practice.

2.2.4 Gamma encoding


After the raw data has been appropriately converted from the camera raw space into
an output-referred colour space such as sRGB, the image DOLs are allocated by
first applying a nonlinear encoding gamma curve, γE , that takes the form of a power
law [7, 8],


Figure 2.5. 18% relative luminance corresponds to 50% lightness (middle grey) on the CIE lightness curve. The
corresponding 8-bit DOL in the sRGB colour space is 118. This DOL is approximately 46% of the maximum
DOL since the sRGB gamma curve is not identical to the CIE lightness curve.

$$R' \propto (R_L)^{\gamma_E}, \qquad G' \propto (G_L)^{\gamma_E}, \qquad B' \propto (B_L)^{\gamma_E}. \qquad (2.4)$$

• RL , G L , BL denote relative tristimulus values in the linear form of the chosen


output-referred colour space. Here the values are normalised to the range
[0,1], and the number of possible values depends upon the bit depth of the raw
data along with the raw clipping point.
• R′, G′, B′ are nonlinearly related to the relative tristimulus values.
• Encoding gamma values used in practice typically range from around 1/3 to
1/2.2, depending on the choice of output-referred colour space [7, 8]. An
example with γE = 1/2.2 is shown in relation to the CIE lightness curve in
figure 2.5.

Subsequently, the image DOLs are determined by appropriately normalising and


quantising R′, G′, B′. For example, 8-bit DOLs are obtained by normalising R′, G′,
B′ to the range [0,255] and then quantizing to the nearest integer:
$$R'_{\mathrm{DOL}} = \mathrm{Round}(255 \times R'), \qquad G'_{\mathrm{DOL}} = \mathrm{Round}(255 \times G'), \qquad B'_{\mathrm{DOL}} = \mathrm{Round}(255 \times B'). \qquad (2.5)$$
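As a minimal sketch of equations (2.4) and (2.5), the following Python example applies a pure power-law encoding gamma to linear values and quantises the result to 8-bit DOLs. A real in-camera pipeline uses the piecewise curve of the chosen output-referred colour space rather than this simple power law.

```python
import numpy as np

def gamma_encode_8bit(rgb_linear, gamma_e=1.0 / 2.2):
    """Apply a power-law encoding gamma to linear values in [0, 1] and
    quantise to 8-bit digital output levels (DOLs)."""
    rgb_linear = np.clip(np.asarray(rgb_linear, dtype=float), 0.0, 1.0)
    nonlinear = rgb_linear ** gamma_e                       # equation (2.4)
    return np.round(255.0 * nonlinear).astype(np.uint8)     # equation (2.5)

# 18% relative luminance encodes to DOL 117 with a pure 1/2.2 power law;
# the piecewise sRGB curve discussed later gives DOL 118.
print(gamma_encode_8bit([0.18, 0.5, 1.0]))   # [117 186 255]
```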


Figure 2.6. A γE = 1/2.2 curve applied to 12-bit linear raw data normalised to the range [0,1] and subsequently
quantised to 8-bit DOLs.

A gamma curve is incorporated into the definition of standard output-referred


colour spaces such as sRGB [4] and AdobeⓇ RGB [5]. This nonlinearity is included
as a final step after linearly transforming from the camera raw space into the linear
form of the chosen output-referred colour space.
Figure 2.6 shows an example gamma curve defined by γE = 1/2.2 applied to linear
raw data normalised to the range [0,1] and subsequently quantized to 8-bit output.
Two example regions are shown. The first region with normalised raw level between
0.0 and 0.1 has been allocated 90 DOLs on account of the relatively high sensitivity
of the HVS to this luminance range. On the other hand, the second region between
0.9 and 1.0 has only been allocated 12 levels on account of the lower sensitivity of the
HVS to this luminance range.

2.2.5 Gamma decoding


For photometrically correct tone reproduction, the luminance distribution presented
to the observer of the output image will ideally be linearly related to the scene
luminance distribution over the represented DR. This requires that the luminance
distribution emitted by the display device be linearly related to the scene luminance
distribution and raw data.
Accordingly, the nonlinearity introduced by gamma encoding must be cancelled
out through gamma expansion or gamma decoding when the image is viewed. The
gamma decoding curve, γD, is typically applied by the display device. For example,
a display gamma defined by γD = 2.2 will compensate for an encoding gamma
γE = 1/2.2. This is shown graphically in figure 2.7. Mathematically, the relationship
can be modelled in the following way:


Figure 2.7. Gamma curves: γE = 1/2.2 (blue), γ = 1 (black), γD = 2.2 (green).

$$V' = V^{\gamma_E}, \qquad L_n = C\,(V')^{\gamma_D} + B. \qquad (2.6)$$
Here V′ = R′, G′ or B′ are the nonlinear values defined by equation (2.4), L_n
represents the normalised luminance output from the display, C is a gain and B is a
black-level offset. The gain and offset control the display DR, which is discussed in
section 2.13.1. Note that a single quantity derived from the set of DOLs known as
luma is used for V ′ in practice.
The significance of equation (2.6) is that the display gamma γD is the reciprocal of
the encoding gamma γE . Consequently, the overall gamma will be unity and L n will
be linearly related to V,
$$L_n = CV + B.$$
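As a numerical check of this relationship, the following sketch models the display stage of equation (2.6) in the ideal case C = 1 and B = 0, using the hypothetical gamma_encode_8bit function given earlier; the round trip recovers the original linear value to within quantisation error.

```python
def display_luminance(dol, gamma_d=2.2, gain=1.0, offset=0.0):
    """Model the display stage of equation (2.6): normalise an 8-bit DOL
    and apply the decoding gamma."""
    v_prime = dol / 255.0
    return gain * v_prime ** gamma_d + offset

# Round trip: linear value 0.18 -> DOL 117 (1/2.2 encoding gamma) -> ~0.18
print(round(display_luminance(117), 3))   # 0.18
```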
However, there may be several nonlinearities that need to be compensated for in a
general imaging chain, each associated with their own gamma. When differing
gammas exist between successive stages in an imaging chain, a change of gamma is
referred to as gamma correction. In practice, an overall gamma slightly higher than
unity is preferred in order to account for environmental viewing factors such as
flare [7].
The term ‘gamma correction’ should be avoided when referring to the encoding
gamma as this may give the impression that the raw data is in some way being
corrected. It should be remembered that linear raw data would appear at an
appropriate lightness if viewed on a linear display. The primary purpose of gamma
encoding and decoding is to prevent posterisation artefacts from arising when
reducing bit depth to 8 by more efficiently utilizing the available tonal levels. The


approximately logarithmic brightness response of the HVS to luminance is only used


to appropriately allocate levels. This nonlinearity is compensated for by introducing
a display gamma so that the overall input–output luminance relationship remains
linear.
Finally, it is instructive to note that gamma encoding was originally required to
compensate for the natural nonlinear response of cathode-ray tube (CRT) monitors.
In other words, gamma encoding historically played a different role since many
modern types of display do not naturally respond in this way. Instead, gamma
encoding and decoding is required in modern digital imaging in order to minimise
posterisation artefacts when reducing bit depth to 8, as described above.

2.3 Image dynamic range


Recall that the raw DR is the maximum scene DR that can be represented by the
raw data. The raw DR is restricted from above by either FWC or the ADC bit depth
M, and from below by the read noise. If the sensor response curve and ADC are
both perfectly linear, the raw DR cannot be greater than M provided the noise stays
above the quantization step on average. Clipping will occur if the scene DR is
greater than the raw DR.
The image dynamic range (image DR) is the maximum scene DR that can be
represented by the DOLs of an encoded output image file such as a JPEG or TIFF
file. In principle, the image DOLs can encode any desired amount of DR since they
are not involved with the capture of raw data. However, the image DR cannot be
greater than the raw DR when the image file is produced from the raw file since the
raw DR defines the maximum scene DR available for producing an image.
The actual image DR depends primarily upon the tone curve used to encode the
DOLs. An encoding gamma curve can be regarded as a default tone curve designed
to render photometrically correct images on a display with a compensating decoding
gamma. However, in-camera image processing engines typically apply tone curves
that differ from the encoding gamma curve, and these can alter the image DR. In
general the image DR can be defined in the following way:
$$\text{image DR (ratio)} = \frac{Y(\text{max. DOL})}{Y(\text{min. non-clipped DOL})} : 1 \qquad (2.7a)$$

$$\text{image DR (stops)} = \log_2\!\left(\frac{Y(\text{max. DOL})}{Y(\text{min. non-clipped DOL})}\right). \qquad (2.7b)$$

• Y (max. DOL) is the relative luminance represented by the maximum DOL.


This corresponds to the maximum representable relative scene luminance.
The maximum DOL is 255 for 8-bit output.
• The minimum non-clipped DOL is the minimum non-zero DOL that results
from a non-zero relative scene luminance Y. This DOL is known as the black
clipping level as it corresponds to the minimum representable non-clipped
relative scene luminance denoted by Y (min. non-clipped DOL).


The image DR that can be represented by the standard encoding gamma curve of a
chosen output-referred colour space is discussed below. Subsequently, general tone
curves and their effect on image DR are described.

2.3.1 Gamma curves


For a given bit depth, gamma encoding can compress more of the raw DR into a
representation by DOLs compared to a linear encoding. Since the gamma decoding
by the display device occurs after the quantization to DOLs, the represented scene
DR can in principle be preserved when the image is displayed.
In general, 8-bit DOLs encoded using the gamma curve of an output-referred
colour space can accommodate the following image DR:
$$\text{image DR (stops)} = \log_2\!\left(\frac{(255/255)^{1/\gamma_E}}{(1/255)^{1/\gamma_E}}\right).$$
The above expression follows from equation (2.7b) by dividing the DOLs by 255 and
then reversing the encoding gamma curve.
As an example, the AdobeⓇ RGB colour space uses an encoding gamma curve
with γE = 1/2.2. By using the above formula, the image DR that can be
accommodated by the gamma curve is found to be 17.59 stops. This is much larger
than the raw DR provided by consumer cameras, and so 8-bit images encoded using
the Adobe RGB colour space can in principle represent the entire raw DR.
The calculation for 8-bit DOLs encoded using the sRGB colour space is slightly
more complicated since sRGB uses a piecewise encoding gamma curve. The details
of the sRGB gamma curve are given in section 4.10.1 of chapter 4, and the image
DR calculation is given below.
sRGB gamma curve
The maximum relative luminance Y that can be represented by the sRGB encoding
gamma curve corresponds to the following pixel value that defines the white clipping
point:
$$(R'_{\mathrm{DOL}}, G'_{\mathrm{DOL}}, B'_{\mathrm{DOL}}) = (255, 255, 255).$$
The relative tristimulus values that correspond to the above pixel are given by
dividing by 255 and then reversing the piece of the encoding gamma curve defined by
equation (4.26) of chapter 4:
$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix} = \begin{bmatrix} \left((255/255 + 0.055)/1.055\right)^{2.4} \\ \left((255/255 + 0.055)/1.055\right)^{2.4} \\ \left((255/255 + 0.055)/1.055\right)^{2.4} \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}. \qquad (2.8)$$

The corresponding relative luminance Y is given by converting from sRGB to the


CIE XYZ colour space using equation (2.2), which is derived from equation (4.14) of
chapter 4:


$$Y = 0.2126\, R_L + 0.7152\, G_L + 0.0722\, B_L. \qquad (2.9)$$


Evidently the weighted sum of the relative tristimulus values is unity, and Y = 1
is the maximum relative luminance that can be represented by the sRGB encoding
gamma curve.
The minimum relative luminance Y that can be represented by the sRGB
encoding gamma curve corresponds to the following non-clipped pixel value:
$$(R'_{\mathrm{DOL}}, G'_{\mathrm{DOL}}, B'_{\mathrm{DOL}}) = (1, 1, 1).$$
The relative tristimulus values that correspond to this pixel are given by dividing by
255 for 8-bit DOLs and then reversing the piece of the encoding gamma curve
defined by equation (4.25) of chapter 4:
$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix} = \begin{bmatrix} (1/255)/12.92 \\ (1/255)/12.92 \\ (1/255)/12.92 \end{bmatrix}. \qquad (2.10)$$

Substituting equations (2.8) and (2.10) into equation (2.7b) and utilising equation
(2.9) reveals that 8-bit DOLs encoded using the sRGB gamma curve can represent
the following image DR:
$$\text{image DR (sRGB gamma)} = \log_2\!\left(\frac{1}{(1/255)/12.92}\right) \approx 11.69 \text{ stops}.$$
This is comparable with the raw DR provided by consumer cameras. If required, it
may be necessary in some cases to use a custom tone curve to transfer the full raw
DR to the output image file.
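These two results can be reproduced with a short calculation. The sketch below assumes 8-bit DOLs, a pure power-law curve for Adobe RGB, and the linear segment of the sRGB curve for DOL = 1.

```python
import math

def image_dr_power_law(gamma_e, max_dol=255):
    """Image DR in stops for a pure power-law encoding gamma (e.g. Adobe RGB)."""
    y_max = (max_dol / max_dol) ** (1.0 / gamma_e)
    y_min = (1.0 / max_dol) ** (1.0 / gamma_e)
    return math.log2(y_max / y_min)

def image_dr_srgb(max_dol=255):
    """Image DR in stops for the piecewise sRGB curve: DOL = 1 lies on the
    linear segment, so Y(min) = (1/255)/12.92."""
    return math.log2(1.0 / ((1.0 / max_dol) / 12.92))

print(round(image_dr_power_law(1.0 / 2.2), 2))   # 17.59 stops (Adobe RGB)
print(round(image_dr_srgb(), 2))                 # 11.69 stops (sRGB)
```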

2.3.2 Tone curves


An encoding gamma curve can be considered as a special type of tone curve designed
to render photometrically correct output images on a display with a compensating
decoding gamma. When plotted on axes representing gamma-encoded values, a
gamma curve appears as a diagonal straight line. This is illustrated for γE = 1/2.2 by
the black curve in figure 2.8.
The application of any tone curve beyond an encoding gamma curve will produce
an image that is no longer accurate or photometrically correct. This is known as
preferred tone reproduction. In-camera image processing engines used by traditional
DSLR cameras apply a type of tone curve known as an s-curve to the raw data
before applying the encoding gamma curve. In practice, the two curves are typically
combined into a single tone curve that is applied using a look-up table (LUT). As
illustrated by the blue curve in figure 2.8, this type of combined tone curve can also
be interpreted as an s-curve as it has a characteristic ‘s’-shape when plotted using
axes representing gamma-encoded values. The lower part of the s-curve is known as
the toe, and the upper part as the knee or shoulder. Camera manufacturers apply
s-curves for two main reasons.


Figure 2.8. (Left) Example tone curve (blue) and gamma curve with γE = 1/2.2 (black) applied to linear raw
data normalised to the range [0,1]. (Right) Corresponding curves plotted using gamma-encoded axis values
with γE = 1/2.2.

1. An s-curve can reduce the raw DR transferred to the output image file (the
image DR) down to a value commensurate with the contrast ratio of a
typical display (the display DR) or the contrast ratio of a typical photo-
graphic scene.
2. An s-curve increases mid-tone contrast in the image, and this is considered to
be visually more pleasing to the HVS. The increased mid-tone contrast
occurs at the expense of contrast in the shadows and highlights, which
become compressed.

An s-curve reduces the raw DR transferred to the output image file by raising the
black clipping level and therefore lowers the image DR compared to that provided
by the encoding gamma curve of the chosen output-referred colour space.
The example s-curve illustrated in figure 2.8 can accommodate approximately 7.7
stops of raw DR after quantizing to 8-bit DOLs compared to the 17.59 stops
provided by the example encoding gamma curve with γE = 1/2.2. In this example, the
image DR is slightly less than 8 stops since the s-curve gradient near the black level is
slightly less than unity when plotted on linear axes.
In general, the image DR represented by an 8-bit JPEG file that has already had
an arbitrary tone curve applied cannot be deduced unless the details of the tone
curve are known. The tone curve can be determined experimentally by photograph-
ing a calibrated step wedge and calculating the opto-electronic conversion function
(OECF) [9]. The OECF defines the relationship between radiant flux (see chapter 3)
at the SP and the output image file DOLs.
Finally, it should be noted that unlike traditional DSLR cameras, modern
smartphone cameras do not necessarily apply the same fixed global tone curve to
all images. Instead, the tone curve may adapt to each individual image.


Furthermore, the tone curve may become dependent on pixel location, which is
known as local tone-mapping. This is discussed further in section 2.12.2.

2.3.3 Raw headroom


Although the default tone curve applied by an in-camera image processing engine of
a traditional DSLR camera is designed to render a pleasing output image for a
typical scene, clipping of scene data will occur if the image DR is less than the scene
DR.
Since the image DR provided by the default tone curve is typically much less than
the raw DR, extra scene information will in general be present in the raw file.
Formally, this extra scene information is defined as the raw headroom:
raw headroom (stops) = raw DR (stops) − image DR (stops). (2.11)
Provided the camera is set to store raw files, a photographer can use external raw
conversion software to utilize the raw headroom when the scene DR exceeds the
image DR. This requires the application of a custom tone curve designed to recover
the clipped scene data.
A simple example of a tone curve that increases the raw DR transferred to the
output image file is an inverted s-curve. Contrary to an s-curve, an inverted s-curve
lowers the black clipping level. The effect is to increase contrast in the shadows and
highlights at the expense of mid-tone contrast. This can produce a muted
appearance. A custom tone curve will in general combine the characteristics of s-
curves and inverted s-curves in order to render a satisfactory image.
Tone curves are examples of global tone-mapping operators since their functional
form does not depend upon pixel location. When utilizing the raw headroom to
recover clipped shadows or highlights, it may be necessary to apply tone curves
locally to different parts of the image. This type of raw conversion combines both
global and local tone-mapping. Tone-mapping operators are discussed further in
section 2.12.2.

2.3.4 Shadow and highlight dynamic range


Although the raw DR cannot be subdivided, the image DR can be divided into
shadow and highlight contributions known as the shadow dynamic range (shadow
DR) and highlight dynamic range (highlight DR),
image DR (stops) = shadow DR (stops) + highlight DR (stops).
Highlight DR arises from DOLs of the encoded output image file located above
middle grey, and shadow DR arises from DOLs located below middle grey.
As an example, consider a white-balanced 8-bit JPEG image file encoded using the
sRGB colour space. As illustrated in figure 2.5 of section 2.2.3, middle grey is
located at DOL = 118 in this case,


Figure 2.9. DR contributions for a model sensor response curve. Here the image DR is defined by 8-bit DOLs
between DOL = 255 and the black clipping point, typically DOL = 1. Middle grey is positioned at DOL = 118
for a white-balanced output JPEG image encoded using the sRGB colour space. The total raw headroom is the
raw DR minus the image DR. The magnitude of the upper and lower contributions to the raw headroom
depends upon the image DR along with the camera manufacturer’s positioning of DOL = 118 on the sensor
response curve.

$$\text{shadow DR (stops)} = \log_2\!\left(\frac{Y(\text{DOL} = 118)}{Y(\text{min. non-clipped DOL})}\right), \qquad \text{highlight DR (stops)} = \log_2\!\left(\frac{Y(\text{DOL} = 255)}{Y(\text{DOL} = 118)}\right).$$
The magnitude of the highlight DR and shadow DR is dependent upon camera
model and the nature of the default tone curve chosen by the camera manufacturer.
Camera manufacturers are free to place middle grey in the encoded JPEG image
file at any desired position on the sensor response curve. As discussed in section 2.6,
the position chosen by the camera manufacturer contributes to the standard output
sensitivity (SOS) of the DOLs to incident photometric exposure. The SOS value is
used for the ISO setting as part of an exposure strategy.
The available raw headroom is dependent upon camera model. As evident in
figure 2.9, placing middle grey at a lower position on the sensor response curve will
increase the raw headroom contribution from above the JPEG white clipping point
and decrease the contribution from below the JPEG black clipping point.
Conversely, placing middle grey at a higher position on the sensor response curve
will decrease the raw headroom contribution from above the JPEG white clipping
point and increase the contribution from below the JPEG black clipping point.
It is beneficial in terms of SNR to place the image DR higher on the sensor
response curve, but this reduces the ability to recover clipped highlights when using a
custom tone curve. Camera manufacturers must balance these factors when design-
ing image-processing engines.
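For a concrete example, the sketch below evaluates the shadow and highlight DR for an 8-bit image encoded with the standard sRGB curve alone (no additional s-curve), assuming middle grey at DOL = 118 and a black clipping level of DOL = 1 as in figure 2.9.

```python
import math

def srgb_dol_to_linear(dol):
    """Invert the piecewise sRGB encoding curve for an 8-bit DOL."""
    v = dol / 255.0
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4

y_max = srgb_dol_to_linear(255)   # white clipping point
y_mid = srgb_dol_to_linear(118)   # middle grey
y_min = srgb_dol_to_linear(1)     # black clipping point

print(round(math.log2(y_max / y_mid), 2))   # highlight DR ~2.46 stops
print(round(math.log2(y_mid / y_min), 2))   # shadow DR ~9.22 stops
```

The two contributions sum to the ≈11.7 stop sRGB image DR obtained in section 2.3.1.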

2.4 Histograms
Histograms provide a useful means of interpreting image data. This section discusses
luminance histograms and image histograms. An image histogram represents DOLs
and is the type of histogram shown on the viewfinder or rear liquid crystal display
(LCD) of a digital camera.


2.4.1 Luminance histograms


A luminance histogram is a plot of the number of pixels as a function of relative
luminance Y. For the sRGB colour space, Y can be calculated from the relative
tristimulus values via equation (2.9).
In section 2.5 it will become apparent that a typical photographic scene is
assumed to have a relative luminance distribution that approximately averages to
middle grey or 18% relative luminance. Figure 2.10(a) shows such a luminance
histogram for a model photographic scene. Since the histogram is skewed, the peak
is located very slightly to the left of 18%. Luminance histograms are rarely used in
practice for two main reasons.
1. Luminance histograms for typical photographic scenes are heavily skewed to
the left, as evident in figure 2.10(a). This means that luminance histograms
are difficult to interpret.
2. Luminance data represented by the histogram would appear correctly if
shown on a linear display, but would appear too dark if shown on a
conventional display with a nonlinear display gamma, γD > 1.

A raw histogram is a plot of photosite (sensor pixel) count as a function of raw value
for a specified raw channel. Since the raw data for each raw channel is approx-
imately linearly related to relative scene luminance, raw histograms are similar to
luminance histograms and will be heavily skewed to the left for typical photographic
scenes.

2.4.2 Image histograms


An image histogram is a plot of the number of pixels as a function of DOL. An
image histogram can be plotted for each of the colour channels of the chosen output-
referred colour space.

Figure 2.10. (a) Luminance histogram with an average relative luminance of 18.4%. (b) Corresponding image
histogram obtained by converting to 8-bit DOLs of the sRGB colour space.


For the sRGB colour space, the DOLs denoted by R′_DOL, G′_DOL, and B′_DOL can
be calculated using equations (4.26) and (4.27) of chapter 4 if the image is encoded
using the conventional sRGB encoding gamma curve. As described in section 2.3.2,
for preferred tone reproduction traditional DSLR in-camera image processing
engines may alter the luminance distribution before applying the encoding gamma
curve, and these steps may be combined by using a single LUT.
Image histograms corresponding to the JPEG output can be seen on the back of a
digital camera when reviewing an image. Image histograms can also be seen directly
through the viewfinder in real time on a mirrorless camera or camera with liveview
capability. Furthermore, histograms provided by commercial raw converters are
image histograms representing DOLs even if a ‘linear’ tone curve is selected.
Since DOLs are distributed closely in line with lightness (relative brightness) and
not luminance, image histograms are much easier to interpret than luminance
histograms. For example, the same data used for the luminance histogram in
figure 2.10(a) appears much more symmetrical when plotted as DOLs of the sRGB
colour space in figure 2.10(b). This is particularly useful when making exposure
decisions.
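The difference between the two types of histogram can be illustrated with synthetic data. The sketch below generates a hypothetical skewed scene luminance distribution averaging roughly to middle grey, encodes it with the standard sRGB curve, and bins both representations; plotting the two arrays (for example with matplotlib) reproduces the qualitative behaviour of figure 2.10.

```python
import numpy as np

def linear_to_srgb_dol(y):
    """Encode relative luminance values in [0, 1] to 8-bit sRGB DOLs."""
    y = np.clip(np.asarray(y, dtype=float), 0.0, 1.0)
    v = np.where(y <= 0.0031308, 12.92 * y, 1.055 * y ** (1.0 / 2.4) - 0.055)
    return np.round(255.0 * v).astype(np.uint8)

# Hypothetical scene: a skewed (log-normal) luminance distribution
rng = np.random.default_rng(0)
scene_y = np.clip(rng.lognormal(mean=np.log(0.18), sigma=0.6, size=100_000), 0, 1)

luminance_hist, _ = np.histogram(scene_y, bins=64, range=(0, 1))
image_hist, _ = np.histogram(linear_to_srgb_dol(scene_y), bins=64, range=(0, 255))
```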

2.5 Average photometry


The aim of a photographic exposure strategy is to appropriately place the image DR
on the sensor response curve. Although an appropriate placement of the image DR
depends upon the exposure strategy of the photographer, standard exposure strategy
aims to provide the photographer with an appropriate starting point for further
adjustments. Standard exposure strategy is based upon average photometry as it aims
to estimate 〈H 〉, the average photometric exposure at the SP. Average photometry
requires the following:
1. A reflected-light meter for measuring the average scene luminance, 〈L〉. This
is the arithmetic average of the scene luminance distribution within the field
of view (FoV) of the meter. The meter can be either hand held or in-camera.
If the in-camera meter is used, the camera should be set to a simple metering
mode that functions using average photometry.
2. A measure of the sensitivity of the DOLs of the camera JPEG output to
photometric exposure at the SP. This is provided by the ISO setting, S.
3. Knowledge of either the lens f-number N or the required exposure duration, t.

Standard exposure strategy for digital cameras is based upon the output JPEG
image file from the camera and not the raw data. Japanese camera manufacturers
are required to determine ISO settings using either the SOS or recommended
exposure index (REI) methods introduced by CIPA in 2004.
Standard exposure strategy assumes that 〈L〉 for a typical scene corresponds with
middle grey on the lightness scale. The aim of the SOS method is to ensure that 〈L〉
for a typical scene is correctly reproduced as middle grey in the JPEG output,
irrespective of the camera JPEG tone curve.


This section discusses reflected-light metering. The meter will recommend t, N


and S combinations that satisfy the reflected-light meter equation, which also
depends upon 〈L〉. As described in section 2.5.6, valid combinations will have the
same exposure value. Methods for specifying camera ISO settings are discussed in
section 2.6.

2.5.1 Reflected light meter equation


The average photometric exposure considered to yield a well-exposed photograph
for a typical scene satisfies the following equation:

〈H 〉S = P. (2.12)

The constant P is known as the photographic constant. The value was obtained from
statistical analysis of typical scenes and user preference as to the nature of a well-
exposed photograph [10]. ISO 12232 is the latest standard on exposure metering and
this uses P = 10.
Reflected-light meters measure 〈L〉, the arithmetic average scene luminance. In
order to estimate 〈H 〉 given 〈L〉, the camera equation is rewritten in the following
way:
$$\langle H \rangle = q\, \langle L \rangle\, \frac{t}{N^2}. \qquad (2.13)$$
The constant q = 0.65 is the proportionality constant between 〈L〉 and 〈H 〉. It combines
various assumptions about a typical lens and includes an ‘effective’ cos⁴φ natural
vignetting value. A derivation is given in the section below.
The two equations above can be combined to give the reflected light meter
equation,
$$\frac{t}{N^2} = \frac{K}{\langle L \rangle S}.$$

Here K is the hand-held reflected-light meter calibration constant,


$$K = \frac{P}{q} = \frac{10}{0.65} = 15.4. \qquad (2.14)$$
A tolerance is applicable and K = 14 is often used in practice. Note that the ISO 2720
standard uses 10.6 ⩽ K ⩽ 13.4 based upon an obsolete definition of film speed, and
this range should be multiplied by 10/8. In-camera through-the-lens (TTL) metering
systems can use P directly.
All combinations of t, N, and S that satisfy the reflected-light meter equation are
regarded as suitable exposure settings for a typical scene. For a given metered scene,
all such combinations will have the same exposure value. This will be defined in
section 2.5.6.
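As an illustration of the reflected-light meter equation, the sketch below solves it for the exposure duration. The average luminance of 4000 cd m⁻² is an assumed representative value for a sunlit scene and is not taken from the text.

```python
def exposure_time(f_number, iso, avg_luminance, K=15.4):
    """Exposure duration t (in seconds) from the reflected-light meter
    equation t / N^2 = K / (<L> S)."""
    return K * f_number ** 2 / (avg_luminance * iso)

# Assumed sunlit scene: <L> ~ 4000 cd/m^2, ISO 100, f/16 -> t ~ 1/100 s,
# consistent with the sunny-16 rule discussed below.
print(exposure_time(16, 100, 4000))   # ~0.0099 s
```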
As discussed in section 2.5.5, it can be argued that P = 10 corresponds to the
following result:


$$\frac{\langle L \rangle}{L_{\max}} \approx 18\%.$$
In other words, the average scene luminance for a typical scene will always be
approximately 18% of the maximum scene luminance. Since 18% relative scene
luminance corresponds to middle grey, a typical scene is assumed to have a
luminance distribution that averages to middle grey.

2.5.2 Proportionality constant


This section provides a derivation of equation (2.13) and explains the origins of the
constant q.
The camera equation defined by equation (1.59) of chapter 1 defines the photo-
metric exposure distribution at the SP. Replacing L by the metered arithmetic
average luminance 〈L〉 yields the following:
$$H = \frac{\pi}{4}\, \langle L \rangle\, T\, \frac{t}{N_w^2} \cos^4 \varphi. \qquad (2.15)$$
The problem with the use of this expression as part of an exposure strategy is that H
varies over the SP even though 〈L〉 is a constant. This is due to natural vignetting
described by the cos4 φ term. A way forward is to define a single value for an
arithmetic average photometric exposure 〈H 〉 that corresponds with 〈L〉. This can be
achieved by choosing a fixed ‘effective’ value for cos4 φ deemed most representative
of the variation due to natural vignetting.
The ANSI PH3.49-1971 standard [11], later replaced by ISO 2720 [12], selected a
value cos4 φ = 0.916 with φ = 12°. Furthermore, various additional assumptions
were made about light loss through a typical lens. Equation (2.15) could then be
written in the form of equation (2.13),
$$\langle H \rangle = q\, \langle L \rangle\, \frac{t}{N^2}.$$
The constant q groups together the various assumptions,
$$q = \frac{\pi F}{4 b^2}\, T \cos^4(12°) = 0.65.$$
Here F = 1.03 is a lens flare correction factor, T = 0.9 is the lens transmittance factor,
and b = 80/79 is the bellows factor.
The most recent photographic standard on exposure, ISO 12232 [2], uses the same
value for q but the assumptions made are slightly different,
$$q = \frac{\pi}{4}\, v\, T_{\mathrm{lens}} \cos^4(10°) = 0.65.$$
Here v = 0.98 is a vignetting factor, the lens transmittance factor is T = 0.9, the
cosine fourth factor is cos4 φ = 0.94 with φ = 10°, and focus is set at infinity so that
b = 1 and the working f-number reduces to the f-number. The significance of
equation (2.13) is the proportionality between 〈L〉 and 〈H 〉.


2.5.3 Photographic constant


Recall from equation (2.12) that the product of 〈H 〉 and the ISO setting S yields the
photographic constant P,
〈H 〉S = P.
Through long-term statistical analysis of photographs of a large number of scenes,
the most appropriate value for the photographic constant was settled upon [10]
P ≈ 8. (2.16)
Therefore, exposure meters used up to the end of the 1970s were calibrated to aim
for P = 8. This value could be expected to yield the most suitable average
photometric exposure 〈H 〉 for a typical scene. The appropriate 〈H 〉 for a typical
scene can be referred to as the required photographic exposure [10].
The value chosen for P explains the sunny-16 rule. This states that on a clear sunny
day at a solar altitude of 40°, a satisfactory photograph is obtained with the camera
set at N = 16 and an exposure duration approximately equal to the reciprocal of the
film speed [10] (see below). In fact, the sunny-16 rule corresponds with P = 8.11, the
difference being negligible in practice.
In 1982, the ISO 2721 standard was issued for calibration of simple in-camera
metering systems [13]. The standard describes procedures for calibrating 〈H 〉 at the
film plane in order to be consistent with average photometry. The following
definition was used by ISO 2721:
〈H 〉S = 10.
This 10/8 increase from the value used for P in equation (2.16) can be accounted for
by the 1979 change in definition for colour reversal film speed. In film photography,
the film speed S is a sensitometric measure of a specified response to photometric
exposure from photographic film. Around the time that the ANSI PH3.49-1971
photographic standard was issued in 1971 for calibration of hand-held exposure
meters, film speed for colour-reversal (R) film was defined as
$$S = \frac{8}{H_R}.$$
Here H_R is the geometric mean of the photometric exposures required to produce
specified densities at two specified points on the film response curve. In 1979, the
ANSI PH2.21-1979 standard [14], now replaced by ISO
2240-2003 [15], changed the definition of film speed S for colour reversal film to
$$S = \frac{10}{H_R}. \qquad (2.17)$$
This increase in the value of S by 10/8 accounts for the 10/8 increase in the value
of P.
The change in definition for colour reversal film speed was made according to user
preference since the general practice at the time was to average multiple hand-held


meter readings according to the Zone System [16]. This effectively reduced the
recommended 〈H 〉 to a geometric average not consistent with the arithmetic average
used in ISO 2720. In fact, a geometric average is approximately a factor 8/10 smaller
for a typical scene than a true arithmetic average, and so photographs generally
appeared 1/3 of a photographic stop too bright [16]. The change in definition for
colour reversal film speed therefore enabled practitioners of the Zone System to
obtain correct exposure recommendations without the need for meter recalibration.
On the other hand, in-camera metering systems perform an arithmetic average and
must be consistent with the new film speed definition.

2.5.4 Hand-held meter calibration constant


The calibration constant K for hand-held reflected-light meters can be obtained by
substituting the values for P and q into equation (2.14),
$$K = \frac{P}{q}.$$

The ANSI PH3.49-1971 photographic standard published in 1971 used P = 8 and


q = 0.65 [10], which yields K = 12.3. The sunny-16 rule corresponds with K = 12.5. The
ANSI PH3.49-1971 photographic standard was replaced in 1974 by ISO 2720:1974,
which included a tolerance on the allowed values of K from 10.6 to 13.4 [12].
However, a consequence of the new film speed definition expressed by equation
(2.17) that was introduced by the ANSI PH2.21-1979 standard [14] in 1979 is that
users of reflected-light meters should increase the meter constants specified in ISO
2720 by a factor 10/8 when metering for 〈L〉. For example, K = 12.3 becomes
K = 15.4, and the allowed tolerance now covers 13.25 to 16.75 [16].
For consistency between film and digital photography, the ISO 12232 [2] photo-
graphic standard on exposure adopted P = 10 for use in digital photography,

〈H 〉S = 10.

2.5.5 Average scene luminance


It is often stated that reflected-light meters are calibrated according to an ‘average
reflectance’ expressed as a percentage, for example 18%. In fact, average reflectance
is meaningful only when the scene illumination is uniform. Under uniform
illumination, luminance is proportional to reflectance provided the scene objects
within the FoV are perfect Lambertian reflectors. For non-Lambertian reflectors,
luminance is proportional to luminance factor.
However, illumination in general scenes is non-uniform. Natural objects have a
reflectance ranging between approximately 3% and 90%. Assuming the scene objects
are perfect Lambertian reflectors, uniform illumination would result in a scene


luminance ratio of 30:1 or scene DR of log2(90/3) = 4.9 stops, well below the 160:1
luminance ratio or 7.3 stops of scene DR often quoted for a typical scene. Higher
scene DR arises when multiple light sources are present and illuminate different
parts of the scene, and in particular when scene areas are shaded from a light source.
Recall that reflected-light meters measure average luminance and are calibrated in
terms of the metering constant K, which was derived from the photographic constant P.
These constants were determined through observer statistical analysis of photo-
graphs of real scenes and can therefore be associated with an average luminance 〈L〉
rather than an average reflectance.
An estimate for 〈L〉 expressed as a percentage of the maximum scene luminance
can be obtained by comparing the value of the hand-held reflected-light metering
constant K with the value of the incident-light metering constant C recommended for
calibration of hand-held incident-light meters [17],
$$\frac{\langle L \rangle}{L_{\max}} \approx \pi \frac{K}{C}.$$
For example, using the value C = 250 common for flat receptors and within the
range specified in ISO 2720 together with K = 12.5 yields 〈L〉/L max ≈ 16%. This
estimate is close to 18%, which has special significance in relation to the HVS since
18% relative luminance corresponds to middle grey on the lightness scale.
An interpretation of this result is that the scene luminance distribution for typical
scenes approximately averages to middle grey or an average luminance that is 18% of
the maximum. Equivalently, the average scene luminance is always log2(100/18) ≈
2.5 stops below the maximum scene luminance, irrespective of the scene DR.
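This estimate follows directly from the two metering constants. A minimal sketch, using the values K = 12.5 and C = 250 quoted above:

```python
import math

# <L> / L_max ~ pi K / C with K = 12.5 and C = 250
ratio = math.pi * 12.5 / 250
print(round(100 * ratio, 1))            # ~15.7 per cent
print(round(math.log2(1 / ratio), 1))   # ~2.7 stops below the maximum
```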
2.5.6 Exposure value
Recall that a reflected light meter functioning using average photometry will
recommend combinations of N, t and S that satisfy the reflected-light meter equation,
$$\frac{t}{N^2} = \frac{K}{\langle L \rangle S}.$$
This can be rewritten as the APEX (Additive System of Photographic Exposure)
equation, designed to simplify manual calculations:
Ev = Av + Tv = Sv + Bv. (2.18)
Here Ev is the exposure value. The aperture value (Av), time value (Tv), speed value
(Sv) and brightness value (Bv) are defined by
$$Av = \log_2 N^2, \qquad Tv = -\log_2 t, \qquad Sv = \log_2\!\left(\frac{S}{3.125}\right), \qquad Bv = \log_2\!\left(\frac{\langle L \rangle}{0.3K}\right).$$


The following table lists example values of N and the corresponding Av:

    N:   0.5   0.7   1   1.4   2   2.8   4   5.6   etc.
    Av:  −2    −1    0   1     2   3     4   5

The following table lists example values of t and the corresponding Tv:

    t:   8   4   2   1   1/2   1/4   1/8   etc.
    Tv:  −3  −2  −1  0   1     2     3

Significantly, all combinations of N, t and S that satisfy the reflected-light meter


equation have the same Ev.
Although Av and Tv are specific numbers to be associated with specific values of
N and t, respectively, a difference of 1 Ev defines a photographic stop. An f-stop
specifically relates to a change of Av.
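The APEX relations can be used to recommend exposure settings directly. The sketch below computes Ev from the metered Bv and the chosen Sv, then solves for the exposure duration at a chosen Av; the luminance of 4000 cd m⁻² is again an assumed value for a sunlit scene.

```python
import math

def apex_exposure(S, avg_luminance, N, K=15.4):
    """Recommend an exposure duration using equation (2.18):
    Ev = Sv + Bv, then Tv = Ev - Av and t = 2**(-Tv)."""
    Sv = math.log2(S / 3.125)
    Bv = math.log2(avg_luminance / (0.3 * K))
    Ev = Sv + Bv
    Av = math.log2(N ** 2)
    Tv = Ev - Av
    return Ev, 2.0 ** (-Tv)

Ev, t = apex_exposure(S=100, avg_luminance=4000, N=16)
print(round(Ev, 1), round(1 / t))   # Ev ~14.8, t ~1/108 s
```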

2.6 Exposure index


The exposure index S_EI is a quantity that is inversely proportional to the average
photometric exposure at the SP [2],

$$\langle H \rangle S_{\mathrm{EI}} = P, \qquad (2.19)$$
where P = 10 is the photographic constant. In film photography, film speed is
typically used as the exposure index, although film can be ‘pushed’ or ‘pulled’ by
using an exposure index that differs from the ISO film speed rating. In digital
photography, the exposure index is commonly referred to as the ISO setting. This
determines the sensitivity of the camera JPEG output to incident photometric
exposure. In this chapter the ISO setting is denoted as S.
Although meter calibration ensures that the SP receives the correct 〈H 〉, digital
gain along with analog gain and exposure is used to produce the output JPEG file.
Digital gain is a multiplication of digital values. Camera JPEG engines apply digital
gain via the encoding tone curve, and these tone curves differ between camera
models. This means that methods for determining camera ISO settings must
additionally specify the nature of the corresponding JPEG digital output required.
ISO speed, SOS and REI are types of ISO settings that differ in the way that the
required digital output is specified.
At a fundamental level, the sensitivity of the digital output to photometric
exposure depends upon several factors.
• The rate at which the photosite electron wells fill with electrons in response to
photometric exposure. This defines the sensitivity of the sensor itself and
depends primarily upon its quantum efficiency and fill factor to be defined in


chapter 3. The sensitivity of a given sensor is therefore fixed and cannot be


adjusted.
• Analog gain applied to amplify the signal before it is quantized by the ADC.
Doubling the gain enables the same digital image to be obtained after halving
〈H 〉, and so the ISO setting doubles according to equation (2.12).
• Digital gain via a gamma curve or tone curve. This alters the DOLs.

For sensors that use the same technology and have equal quantum efficiency and fill
factor, the sensitivity is independent of photosite area (sensor pixel size). This is
explained by figure 2.11.
Since 2004, Japanese camera manufacturers have been required to use either the
SOS or REI methods first introduced by CIPA [1]. The digital output is specified in
terms of DOLs in the output JPEG image and not in terms of raw levels.

2.6.1 ISO speed


Although no longer used by Japanese camera manufacturers, it is instructive to first
describe ISO speed, which remains part of the ISO 12232 standard [2].
ISO speed specifies an ISO setting designed to ensure a minimum image quality
level. Two methods for determining ISO speed are available, which relate to
different aspects of image quality [18]. The noise-based method aims to ensure a
minimum signal-to-noise ratio (SNR). However, this method cannot be used with
lossy image formats such as JPEG and so this method will not be discussed here. The
alternative is the saturation-based method, which aims to avoid the clipping of
highlights. Saturation-based ISO speed Ssat must be defined in accordance with
equation (2.12)
$$S_{\mathrm{sat}} = \frac{10}{\langle H \rangle}. \qquad (2.20)$$
Recall from section 2.5.5 that the photographic constant P = 10 contained in this
equation indirectly implies an assumed 18% average luminance 〈L〉/L max for a
typical scene, where 〈L〉 is the average scene luminance and L max is the maximum
scene luminance. Consistent with this assumption, the aim of saturation-based ISO

Figure 2.11. Sensor sensitivity is independent of photosite area for sensors that use the same technology with
equal quantum efficiency and fill-factor. In this case, the photosites will fill with electrons at the same rate. An
analogy is commonly made to the fact that large and small buckets collect rainwater at the same rate.


speed is to ensure that the DR for a typical scene is placed on the sensor response
curve such that the output JPEG image file will not contain clipped highlights. In
this section, it will be assumed that the output image file is an 8-bit JPEG file.
For a typical scene with 18% average luminance, a scene object with 100%
relative luminance must be placed just below the JPEG clipping point in order to
prevent highlights from clipping. In fact, the definition includes a safety factor so
that a scene object with 100% relative luminance will be placed half a stop below the
JPEG clipping point. In other words, the highlights will not actually clip until the
average scene luminance drops to approximately 12.8% of the scene maximum.

ISO speed measurement


In order to measure Ssat , the analog gain must be fixed in a known setting. A
uniform grey card of any reflectance must be uniformly illuminated. For a fixed
combination of N and t, the maximum luminance Lsat that does not lead to JPEG
saturation (highlight clipping) can then be measured.
The measurement should be carried out according to the procedures and ambient
conditions detailed in the ISO 12232 standard [2]. In particular, the white balance
appropriate for the scene illumination should be used so that the DOLs for each
RGB colour component will be equal. An 8-bit JPEG file clips when the DOL
reaches 255. It is possible that the camera manufacturer could place the JPEG
clipping point below the raw clipping point on the sensor response curve, and so this
does not necessarily imply that FWC has been utilised.
Since the luminance is uniform, the photometric exposure Hsat corresponding to
the measured Lsat can be calculated by using equation (2.13),
$$H_{\mathrm{sat}} = q\, L_{\mathrm{sat}}\, \frac{t}{N^2}.$$
In order to determine the value of the ISO speed, the 18% value for the average scene
luminance is amended in the following way:
$$\frac{\langle L \rangle}{L_{\mathrm{sat}}} = \frac{18}{100\sqrt{2}} \approx \frac{10}{78}.$$
The extra √2 factor has been inserted to provide log2(√2) = 0.5 stops of exposure
headroom for use in situations where the metered 〈L〉 falls below 18%. The JPEG
output will not saturate until the average scene luminance drops to 10/78 ≈ 12.8%, as
mentioned above. Equivalently, 〈L〉 is assumed to always be log2(78/10) ≈ 3 stops
below the maximum scene luminance, irrespective of the scene DR. If the scene DR
is larger than the DR that can be accommodated by the JPEG tone curve (the image
DR), then the shadows will clip at the DOL corresponding to the base of the tone
curve.
Since 〈H 〉 is proportional to 〈L〉 according to equation (2.13), the relationship
between 〈H 〉 and Hsat is found to be
$$\frac{\langle L \rangle}{L_{\mathrm{sat}}} = \frac{\langle H \rangle}{H_{\mathrm{sat}}} \approx \frac{10}{78}. \qquad (2.21)$$


The ISO speed Ssat can now be determined from Hsat by substituting equation (2.21)
into (2.20),
$$S_{\mathrm{sat}} = \frac{78}{H_{\mathrm{sat}}}.$$

This measured value should be rounded to the nearest standard value. The base ISO
speed corresponds to the analog gain setting that allows full use of the sensor
response curve.
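The calculation can be summarised in a few lines. The luminance, f-number and exposure duration below are hypothetical measurement values used only to illustrate the arithmetic.

```python
def saturation_iso_speed(L_sat, f_number, t, q=0.65):
    """Saturation-based ISO speed: H_sat = q L_sat t / N^2 and S_sat = 78 / H_sat."""
    H_sat = q * L_sat * t / f_number ** 2
    return 78.0 / H_sat

# Hypothetical measurement: saturation first occurs at 2000 cd/m^2 with f/8, 1/100 s
print(round(saturation_iso_speed(2000, 8, 1 / 100)))   # ~384 -> nearest standard value 400
```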

ISO speed characteristics


There are several drawbacks associated with the use of saturation-based ISO speed
as the exposure index.
• ISO speed is primarily affected by the position of the JPEG highlight clipping
point on the sensor response curve.
• ISO speed is not necessarily affected by the shape of the tone curve below the
JPEG highlight clipping point.
• The half stop of exposure headroom is built into the measurement.

Figure 2.12 shows the 8-bit JPEG tone curves for two different camera models. Since
both curves clip at exactly the same position, the measured ISO speed will be the
same for both cameras [18]. However, each curve leads to a different mid-tone
lightness since the DOL corresponding to 18% relative luminance is not the same.

Figure 2.12. 100% relative luminance is tied to the JPEG highlight clipping point (DOL = 255 for 8-bit output)
assuming a 12.8% average scene luminance. The position of DOL = 255 determines the ISO speed. Both JPEG
tone curves here define the same ISO speed but lead to different mid-tone image lightness. The JPEG clipping
point may be placed below the raw clipping point.


This is a disadvantage for photographers who primarily use JPEG output since
images from different camera models produced using the same S, t, N may not turn
out to have the same mid-tone lightness.

2.6.2 Standard output sensitivity


In 2004, CIPA introduced standard output sensitivity (SOS). This is again defined in
accordance with equation (2.12):

$$S_{\mathrm{SOS}} = \frac{10}{\langle H \rangle}. \qquad (2.22)$$

The aim of SOS is to produce an image with a standard mid-tone lightness rather
than ensure a minimum image quality level.
SOS relates the ISO setting to a mid-tone DOL instead of the DOL at JPEG
saturation used by saturation-based ISO speed. The particular mid-tone chosen is
defined by DOL = 118, the reason being that this corresponds with middle grey (18%
relative luminance) on the standard encoding gamma curve of the sRGB colour space,
as shown in figure 2.13. In other words, if a typical scene with an 18% average luminance
is metered using average photometry, the output image will turn out to be photo-
metrically correct when viewed on a display with a compensating display gamma.
The camera manufacturer is free to place middle grey at any desired position on
the sensor response curve through use of both analog and digital gain, and the
measured SSOS value will adjust accordingly in order to ensure that the standard mid-
tone lightness is achieved. For example, the two JPEG tone curves shown in
figure 2.12 would lead to different measured SSOS values and hence a different
recommended Ev when metering the same scene. However, use of the recommended
Ev in both cases would lead to the same standard mid-tone lightness expected by the
photographer. Furthermore, the shadow and highlight clipping points do not affect
SSOS and can be freely adjusted. This is discussed further in section 2.6.4, which
discusses extended highlights.

Standard output sensitivity measurement


In order to calculate SSOS, a uniform grey card of any reflectance must be uniformly
illuminated according to the conditions specified by CIPA DC-004 or ISO 12232.
The measured luminance can be taken as the average scene luminance 〈L〉. For a
fixed combination of N and t, the value 〈L〉 that produces DOL = 118 in the JPEG
output should be obtained, again under the procedures specified by CIPA DC-004 or
ISO 12232. In particular, the camera white balance should be correctly set.
The corresponding average photometric exposure 〈H 〉 must be consistent with
equation (2.13),
$$\langle H \rangle = q\, \langle L \rangle\, \frac{t}{N^2}.$$


Figure 2.13. (Upper) Standard sRGB gamma curve (magenta) with output quantised to 8 bits. Metered 〈L〉
corresponds to DOL = 118 assuming an 18% average scene luminance. (Lower) Same curve plotted using base
2 logarithmic units on the horizontal axis; 〈L〉 lies approximately 2.5 stops below Lsat and the curve clips to
black approximately 11.7 stops below Lsat .

Using q = 0.65 along with K = 15.4 as defined by equation (2.14) ensures that the
photographic constant P = 10,
P = Kq = 15.4 × 0.65 = 10.
Subsequently, SOS can be determined from equation (2.22). The calculated value
should be rounded to the nearest standard value tabulated in CIPA DC-004 or ISO
12232. This means that the quoted SSOS may differ from the calculated value to
within a tolerance of ± 1/3 stop.
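A sketch of the SOS arithmetic is given below; the measured luminance, f-number and exposure duration are hypothetical values chosen only to illustrate the calculation.

```python
def standard_output_sensitivity(L_118, f_number, t, q=0.65):
    """SOS from equation (2.22): <H> = q <L> t / N^2 with <L> the luminance
    that produces DOL = 118, and S_SOS = 10 / <H>."""
    H_avg = q * L_118 * t / f_number ** 2
    return 10.0 / H_avg

# Hypothetical measurement: DOL = 118 produced by 290 cd/m^2 at f/5.6, 1/60 s
print(round(standard_output_sensitivity(290, 5.6, 1 / 60)))   # ~100
```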


2.6.3 Recommended exposure index


Recommended exposure index (REI) allows camera manufacturers to use an
arbitrary specification for the required digital output according to the manufac-
turer’s own objective, for example a specification that the manufacturer considers
produces a pleasing image. Although digital gain and other image processing can be
applied arbitrarily, the REI value must nevertheless be consistent with equation
(2.12), which defines the average photometric exposure provided at the SP,
$$\langle H \rangle S_{\mathrm{REI}} = 10.$$
Since REI must be defined using average photometry, its value can be determined
experimentally [1] using the Ev equation defined in section 2.5.6. A uniformly
illuminated grey card can be photographed using a known Av and Tv determined
from the selected N and t, and the Bv can be obtained by measuring the luminance
using a hand-held reflected-light meter with known calibration constant K. The Sv
can then be determined by solving the Ev equation,
Sv = Av + Tv − Bv.
Subsequently, the equation for Sv can be rearranged to yield the REI,
$$S_{\mathrm{REI}} = 3.125 \times 2^{Sv}.$$
The REI is provided by the camera manufacturer so that users of hand-held
exposure meters and other accessories such as strobe lights can correctly use the Ev
equation. According to ISO 12232, if the camera does not provide a simple metering
mode that functions using average photometry, then the REI value is not useful and
should not be reported [2].
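The REI arithmetic follows the same pattern; the grey-card values below are hypothetical and serve only to illustrate the procedure.

```python
import math

def recommended_exposure_index(N, t, avg_luminance, K=15.4):
    """REI via the APEX relation Sv = Av + Tv - Bv and S_REI = 3.125 * 2**Sv."""
    Av = math.log2(N ** 2)
    Tv = -math.log2(t)
    Bv = math.log2(avg_luminance / (0.3 * K))
    Sv = Av + Tv - Bv
    return 3.125 * 2.0 ** Sv

# Hypothetical grey-card test: f/4, 1/60 s, metered luminance 140 cd/m^2
print(round(recommended_exposure_index(4, 1 / 60, 140)))   # ~99, reported as ISO 100
```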

2.6.4 Extended highlights


The introduction of SOS has given camera manufacturers the freedom to use the
sensor response curve in any desired fashion when rendering an output JPEG image
since the SOS value will always produce an image with the correct mid-tone
lightness. Digital cameras now routinely appear with ‘extended’ low-ISO settings
available below the base ISO setting. However, the JPEG output at the base ISO
setting is found to have additional highlight headroom compared to the extended
low-ISO output.
The diagram on the left-hand side of figure 2.14 was obtained by photographing a
step wedge using the OlympusⓇ E-620 [19]. The dark blue curve and the red curve
are the JPEG tone curves at ISO 100 and ISO 200, respectively. The horizontal axis
uses a base 2 logarithmic scale with respect to scene luminance and therefore
represents stops. This representation is similar to the lower diagram of figure 2.13,
except that the stop values have been normalised so that zero corresponds to middle
grey or DOL = 118. It can be seen that the highlights clip at approximately 2.5 stops
above middle grey on the ISO 100 curve. However, the highlights clip at
approximately 3.5 stops above middle grey on the ISO 200 curve, and so an extra
stop of highlight headroom is available before the JPEG output saturates.


Figure 2.14. The diagram on the left shows digital output level (DOL) as a function of stops above and below
middle grey (DOL = 118) for the OlympusⓇ E-620. A JPEG image taken at ISO 200 has an extra stop of
highlight headroom compared to an image taken at ISO 100. The diagram on the right shows the
corresponding raw levels as a function of scene brightness (logarithmic luminance). (Reproduced from [19]
courtesy of http://www.dpreview.com.)

Further information is provided by the light blue curve in the same diagram on
the left-hand side. This is the JPEG tone curve obtained at ISO 100 when
underexposing by one stop by using the t and N recommended by metering at
ISO 200. Naturally the image is darker, but notably the highlights have the same
clipping point as the ISO 200 curve.
The diagram on the right-hand side of figure 2.14 shows the corresponding raw
levels. These curves are nonlinear since the horizontal axis represents scene bright-
ness (logarithmic luminance) rather than relative scene luminance. Surprisingly, it
can be seen that ISO 100 and ISO 200 do not produce the same raw data.
Furthermore, the ISO 100 curve underexposed by one stop coincides exactly with
the ISO 200 curve.
The above investigation reveals that ISO 100 and ISO 200 are in fact using the
same analog gain setting. The ISO 200 setting is defined by metering at ISO 100 and
then reducing the metered Ev by one stop. This effectively lowers the position of
middle grey (18% relative luminance) on the sensor response curve by one stop.
Digital gain is then utilised by applying a different tone curve to the ‘underexposed’
raw data. This tone curve renders JPEG output with the standard mid-tone lightness
by relating middle grey with DOL = 118, but with an additional stop of highlight
headroom made accessible by the ‘underexposing’ of the scene luminance distribution.
For example, if the ISO 100 tone curve saturates at the same point as the standard
sRGB curve, then the highlights at ISO 200 will clip at 200% relative luminance.
Equivalently, the average scene luminance can drop to 9% before the highlights must
clip. This additional stop of highlight headroom is lost if the ISO 100 ‘low-ISO’
setting is selected. An equivalent viewpoint is that the ISO 100 setting is defined by
‘overexposing’ the raw data by one stop, which causes a loss of one stop of highlight
headroom in the JPEG output at ISO 100 compared to ISO 200. In reality, neither
viewpoint can be considered as underexposing or overexposing because SOS permits
the application of both analog and digital gain [19].
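
As a rough numerical check of the figures quoted above, the highlight headroom in stops can be computed from the relative scene luminance at which the JPEG output saturates. The short Python sketch below assumes that middle grey corresponds to 18% relative luminance and uses illustrative saturation points of 100% and 200%; these are not measured values for the OlympusⓇ E-620.

    import math

    def highlight_headroom_stops(clip_relative_luminance, middle_grey=0.18):
        # Stops between middle grey and the relative scene luminance at which the JPEG clips
        return math.log2(clip_relative_luminance / middle_grey)

    # Illustrative clipping points: 100% relative luminance (ISO 100 curve) and
    # 200% relative luminance (ISO 200 curve, one extra stop of headroom)
    print(highlight_headroom_stops(1.0))   # ~2.5 stops above middle grey
    print(highlight_headroom_stops(2.0))   # ~3.5 stops above middle grey
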
When reviewing cameras, reviewers may investigate and report the nature of the
JPEG tone curve. As described in section 2.3.4, the number of stops provided by the
tone curve above and below middle grey (DOL = 118) is referred to as the highlight
DR and shadow DR, respectively [19]. This information is useful for photographers
who primarily use JPEG output from the camera. Since the application of digital
gain adversely affects achievable SNR, camera manufacturers must balance various
image quality trade-offs when designing an in-camera JPEG processing engine, and
so it is important to consider all aspects of the JPEG output when comparing camera
image quality. Photographers who process the raw data themselves using raw
processing software are of course free to apply any desired tone curve to the raw
data, and so information regarding aspects of the raw data such as raw DR will be of
greater interest.

2.7 Advanced metering


Many scenes are non-typical, meaning that the scene luminance distribution does
not average to 18% relative luminance or middle grey. For such scenes, standard
exposure strategy will produce an output JPEG image with an incorrect mid-tone
lightness and so the image histogram will be shifted towards the left or right.
Although this may be desirable for aesthetic reasons, an incorrect mid-tone lightness
can be corrected by applying exposure compensation (EC) and/or by using a form of
advanced metering. The latter include in-camera matrix metering modes, spot
metering, and incident-light metering.
Another issue that may occur, particularly when the scene is non-typical, is the
scene DR exceeding the image DR. In this case, clipping of the output JPEG image
will occur. If the corresponding raw file is available, a custom tone curve can be
applied using an external raw converter to utilize a portion of the raw headroom.
The entire raw headroom can be utilized provided the photograph was taken by
exposing to the right, in which case the entire raw DR can be transferred to the
output JPEG image. This technique is discussed in section 5.10.4 of chapter 5.
If the scene DR exceeds the raw DR, neither EC nor advanced metering can
prevent clipping from occurring. In this case, highlight or shadow detail will be lost,
even if the scene is typical. The following strategies can be followed:
1. Use EC and/or advanced metering to preserve the luminance range consid-
ered to be more important in portraying the scene, for example the highlights
or shadows. The lightness of the overexposed or underexposed regions of the
image can subsequently be corrected using an external raw converter or
editing software.
2. For scenes with a bright sky, use a graduated neutral density filter to
compress the scene DR. These are discussed in section 2.10.
3. Capture the entire scene DR by using high dynamic range imaging. This is
discussed in section 2.12.


2.7.1 Exposure compensation


Exposure compensation (EC) can be applied to adjust the target Ev away from that
recommended by the meter.
When 〈L〉/L max < 18% , standard exposure strategy will lead to overexposure.
Negative EC will compensate by pushing the image histogram to the left. Conversely
when 〈L〉/L max > 18% , standard exposure strategy will lead to underexposure.
Positive EC will compensate by pushing the image histogram to the right.
EC can be measured in stops as a modification to the metered Bv defined in
section 2.5.6. This modification can be incorporated into the Ev equation,
Ev = Av + Tv = Sv + Bv − EC.
Since a lower Ev corresponds to a greater average photometric exposure, positive
EC will lower the metered Ev, and negative EC will raise the metered Ev.
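
As a minimal sketch of this bookkeeping, the Python fragment below evaluates the Ev equation with EC included. It uses the usual APEX definitions Av = log2(N²) and Tv = log2(1/t); the numerical Sv and Bv values are placeholders rather than readings from a real meter, so only the direction of the shift is meaningful.

    import math

    def av(N):
        # Aperture value from the f-number: Av = log2(N^2)
        return math.log2(N ** 2)

    def tv(t):
        # Time value from the exposure duration in seconds: Tv = log2(1/t)
        return math.log2(1.0 / t)

    def target_ev(sv, bv, ec=0.0):
        # Ev = Sv + Bv - EC: positive EC lowers the target Ev (greater exposure)
        return sv + bv - ec

    sv, bv = 5.0, 7.0                    # placeholder metered values
    print(target_ev(sv, bv))             # 12.0 with no EC
    print(target_ev(sv, bv, ec=1.0))     # 11.0 with +1 EC applied
    print(av(8.0) + tv(1 / 60))          # f/8 at 1/60 s gives Av + Tv of roughly 11.9
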

2.7.2 In-camera metering modes


Although traditional arithmetic average metering remains an option on some
modern cameras, it has largely given way to more advanced forms of metering
such as centre weighted, spot and matrix metering.
Centre-weighted metering and spot metering can be classed as simple metering
modes that function using average photometry; however, they aim to give more
reliable results for scenes that are not typical. Centre-weighted metering applies a
greater weighting to the centre of the scene when calculating 〈L〉, and spot metering
uses only a small selected area in the scene to calculate 〈L〉. Hand-held light meters
can also be used for spot metering.
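
A rough sketch of how these simple modes differ is given below: 〈L〉 is computed from a relative luminance map using an arithmetic average, a centre-weighted average with a hypothetical Gaussian weighting, and a small central spot. The weighting profile and spot size are illustrative assumptions rather than those of any particular camera.

    import numpy as np

    def average_L(L):
        return L.mean()

    def centre_weighted_L(L, sigma_frac=0.25):
        # Hypothetical Gaussian weighting centred on the frame
        h, w = L.shape
        y, x = np.mgrid[0:h, 0:w]
        cy, cx = (h - 1) / 2, (w - 1) / 2
        sigma = sigma_frac * min(h, w)
        wgt = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
        return (wgt * L).sum() / wgt.sum()

    def spot_L(L, spot_frac=0.03):
        # Small central region, roughly a few percent of the frame dimensions
        h, w = L.shape
        r = max(1, int(spot_frac * min(h, w)))
        cy, cx = h // 2, w // 2
        return L[cy - r:cy + r, cx - r:cx + r].mean()

    L = np.random.rand(600, 900)   # stand-in relative luminance map
    print(average_L(L), centre_weighted_L(L), spot_L(L))
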
Matrix metering, also known as pattern metering or evaluative metering, does not
function using average photometry. Instead, matrix metering mode generally bases
its Ev recommendation on its own analysis of multiple parts of the scene. The
analysis may include comparison with a database of known scene characteristics and
will not in general be based upon an assumption that the luminance distribution
averages to middle grey.
Although ISO settings determined using the SOS and REI methods remain
perfectly valid in matrix metering mode, the recommended t and N combinations or
Ev will not in general correspond with those calculated using average photometry.
As such, no correspondence can be expected with the Ev recommended by a simple
in-camera metering mode or hand-held reflected-light meter.

2.7.3 Incident light metering


Incident light metering takes a different approach to reflected light metering. The
photographer must be positioned at the subject location when taking an incident
light meter reading. At the subject location, the photographer points the handheld
incident light meter in the direction of the camera. The meter records the illuminance
E incident on the subject. Suitable combinations of t, N and S satisfy the incident-
light meter equation,


t / N² = C / (E S).
Here C is the incident light meter calibration constant [2, 12]. The above equation can
be expressed in terms of the APEX system described in section 2.5.6,
Ev = Av + Tv = Sv + Bv = Sv + Iv.
Here Iv is the incident light value,
Iv = log2 [E / (0.3 C)].
Incident-light metering is useful when an important subject must be exposed
correctly. It is also used for studio flash photography. Incident-light metering is
more reliable than reflected-light spot metering as the measurement is independent
of the percentage relative luminance of the subject, which does not need to be middle
grey. Since the illuminance incident on the subject is known, the subject will be
exposed correctly according to its own percentage reflectance.
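
A minimal sketch of solving the incident-light meter equation for the exposure duration is given below. The calibration constant C is meter specific; the value used here is merely a typical order of magnitude permitted by ISO 2720 and is not that of any particular meter.

    def incident_metered_t(N, S, E, C=330.0):
        # Exposure duration t satisfying t/N^2 = C/(E S), with
        # N the f-number, S the ISO setting, E the subject illuminance in lux,
        # and C the (assumed) incident-light meter calibration constant.
        return (N ** 2) * C / (E * S)

    # Example: f/8 at ISO 100 with 5000 lux falling on the subject
    print(incident_metered_t(8.0, 100, 5000.0))   # ~0.042 s, i.e. about 1/24 s
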

2.8 Exposure modes


Each of the several different exposure modes available on modern digital cameras
can be useful, depending on the type of photographic situation encountered. The
descriptions given below are meaningful when the exposure is based upon the JPEG
image produced by the camera. As described in section 2.6, standard exposure
strategy does not aim to provide the best image quality. It will be shown in chapter 5
that image quality can be optimised by using raw output in conjunction with the
expose-to-the-right (ETTR) methodology and external raw conversion.

2.8.1 Aperture priority


In aperture priority mode, the photographer controls the f-number N and ISO setting
S. The camera meter provides the exposure duration (or ‘shutter speed’) t. Aperture
priority mode is useful for general purpose photography as it is designed to prioritise
control of depth of field (DoF).
After adjusting N according to DoF requirements, the photographer can increase
S if the metered t is too slow to prevent unwanted subject motion blur or unwanted
blur due to camera shake. Camera shake can occur when the ambient lighting
conditions are dark and a tripod is not available. For a photographer with steady
hands, the usual rule of thumb is that camera shake can be prevented by using an
exposure duration shorter than tmax , where
tmax = 2^M / (mf fE). (2.23)
Here fE is the effective focal length and mf is the focal length multiplier. The 2^M term
accounts for M stops of image stabilization, if available. The above rule of thumb is
known as the reciprocal rule. The reciprocal rule is useful even though it emerged
before the advent of digital photography. However, it is advisable to use an even
shorter exposure duration on a camera that has a high photosite (sensor pixel) count
in order to take advantage of the high resolving power on offer. Alternatively a
tripod can be used. Resolving power will be discussed in chapter 5.
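
Equation (2.23) is straightforward to evaluate; the short sketch below is an illustration using assumed values for the effective focal length, focal length multiplier and stabilisation benefit.

    def reciprocal_rule_tmax(fE_mm, mf=1.0, M=0):
        # Longest hand-holdable exposure duration tmax = 2^M / (mf fE), in seconds,
        # with fE the effective focal length in mm, mf the focal length multiplier,
        # and M the number of stops of image stabilisation (if available).
        return (2 ** M) / (mf * fE_mm)

    # Example: 50 mm lens on a body with mf = 1.5, with and without 2 stops of stabilisation
    print(reciprocal_rule_tmax(50, mf=1.5))        # ~1/75 s
    print(reciprocal_rule_tmax(50, mf=1.5, M=2))   # ~1/19 s
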
When shortening the exposure duration either by lowering N or raising S, in
terms of overall SNR it is generally preferable to lower N. However, this will result
in a shallower DoF which may not be suitable for certain photographic scenes such
as landscapes. The photographer must judge the optimum balance between N and S.
Some cameras feature an auto-ISO mode. This automatically raises S once t
increases beyond a selected threshold value. When using a zoom lens, it is
particularly useful if the threshold value can be tied to the lens focal length
according to the reciprocal rule. Shutter priority mode and manual mode can be
more convenient than aperture priority mode when photographing moving
subjects.

2.8.2 Shutter priority


In time priority or shutter priority mode, the photographer controls the exposure
duration (shutter speed) t and ISO setting S. The camera meter provides the f-
number N. Shutter priority mode is useful when the photographic conditions require
a maximum exposure duration (minimum shutter speed) to be maintained, partic-
ularly when needing to freeze the appearance of moving action. However, there is
less direct control over DoF.
An experienced photographer can judge the required exposure duration by
considering factors such as the speed of the subject, the distance between the camera
and the subject, and the lens focal length. Camera shake is not typically a major
issue in shutter priority mode since the selected exposure duration will likely be
much shorter than the tmax defined by equation (2.23). However, camera shake can
be an issue if a very long focal length is used. Image stabilisation can help prevent
camera shake but will not help to freeze the appearance of moving action.
When a shorter exposure duration is selected, available light decreases, or positive
EC is applied, the camera will open up the aperture by lowering N. If N has been
lowered to the minimum available value, the camera meter will indicate if the target
exposure cannot be achieved by using a mark on the Ev scale. In this case the ISO
setting S will need to be raised accordingly.
If the camera has an auto-ISO mode, the camera will automatically raise S to
achieve the target exposure once N has been lowered to its minimum available value.

2.8.3 Program mode


The program exposure mode is a semi-automatic mode. For the selected S, the
camera will meter the scene and select a combination of N and t. The particular
combination presented will be based on analysis of the scene content. However,
this combination can be overridden by the user. All combinations of N and t that
match the metered Ev can be cycled through and an alternative combination
selected.

If a simple metering mode is used based on average photometry, all available
combinations of N and t will satisfy the Ev equation,
Ev = Av + Tv = Bv + Sv − EC.
Note that in aperture priority mode the f-number N will remain fixed when EC is
applied, and in shutter priority mode the exposure duration t will remain fixed when
EC is applied. If EC is applied in program mode, neither N nor t will remain fixed,
and so all available combinations of N and t can be selected.
If auto-ISO is available in program mode, all valid combinations of N and t can
be cycled through, and the camera will automatically adjust S according to the
chosen combination.

2.8.4 Manual mode


Manual mode allows t, N and S to be independently adjusted. The combination
selected defines a user Ev. By using a mark on the Ev scale, the camera meter will
indicate the difference between the user Ev and its own metered Ev.
If auto-ISO is available in manual mode, the camera will automatically raise S so
as to eliminate the difference between the user Ev and its own metered Ev. Some
cameras additionally allow EC to be applied in manual mode so that a new target Ev
can replace the metered Ev.

2.9 Photographic lighting


The direction from which the photographic subject or scene is illuminated can have a
profound effect on its appearance. The direction can be broadly categorized as from
the front, back or side. A combination of these directions may occur, particularly if
there are multiple light sources present.
• Front lighting
Shadows fall behind the photographic subject since the light source is either behind
or above the camera. This leads to a lower scene DR and accurate colours.
However, there is limited information about 3D form and surface texture. If a
portrait is being taken, undesirable shadows may fall directly under facial features.
• Side lighting
Certain areas of the scene are illuminated while other areas fall under
shadow. The shadows are directed sideways and appear elongated if the
Sun is low in the sky. Side lighting reveals information about 3D form and
surface texture, but is best suited for simple compositions [20]. The overall
appearance can be dramatic, but the scene DR is typically higher. Side
lighting is often used for product and still-life photography.
• Back lighting
The main subject is cast under shadow and so very limited information is
revealed about its colour, 3D form, and surface texture. Under strong back
lighting, the subject may appear as a silhouette. An example is illustrated in
figure 2.15. The overall scene appearance can be very dramatic, but the scene

DR is typically very high. When used for product or still-life photography, a
backlight appears through translucent objects and can reveal object edges.

Figure 2.15. The subject can appear as a silhouette under strong back lighting.

Recall from section 2.5 that standard exposure strategy assumes that the
luminance distribution of a ‘typical’ photographic scene approximately averages to middle
grey. In practice, this is most likely to correspond to a scene illuminated by front
lighting. In other words, photographs of scenes illuminated by front lighting are the
least likely to require EC or advanced metering modes. Since front lighting leads to a
lower scene DR, clipping is also unlikely to occur.
However, scenes illuminated by back lighting are unlikely to be typical. The
abundance of shadows often requires the application of negative EC, particularly if
the highlights need to be preserved. This may lead to clipping of the shadows since
backlit scenes have a very high scene DR. If detail in the shadows must be preserved,
an appropriate graduated neutral density filter can be used or high dynamic range
imaging techniques can be employed. These are discussed in sections 2.10.1 and 2.12.
Along with its direction, light can be categorized as being either diffuse or direct.
• Direct lighting
This describes light that reaches the subject directly. Direct lighting is also
known as hard or harsh lighting as it leads to strong and distinct shadows and
highlights. Consequently, subject contrast and scene DR will be higher.
• Diffuse lighting
This describes light that reaches the subject indirectly. Diffuse lighting is also
known as soft lighting as the subject will be illuminated more evenly.


Consequently, the shadows and highlights will have softer edges and so the
subject contrast and scene DR will be lower.

In general, the more indirect the light source, the softer the light. Although harsh
lighting may be desirable in certain situations, soft or diffuse lighting is generally
preferred and is considered to be light of higher quality. Natural light can be diffused
by three main processes:
1. Diffuse reflection from surfaces. Diffuse reflecting materials have an irregular
molecular structure. When incoming light is incident at the surface of such a
material, the molecules near the surface vibrate and cause the light to be
emitted in many different directions.
2. Scattering from small objects. For example, Mie scattering can describe the
scattering of light by cloud droplets and dust particles in the sky.
3. Rayleigh scattering from air molecules in the atmosphere. The mechanism of
Rayleigh scattering is different to ordinary scattering since air molecules are
much smaller than the wavelength of the incoming light. Consequently, the
waves will be diffracted and emerge as spherical waves.

Light is naturally softer in the early morning and evening as the sunlight must travel a
greater distance through the atmosphere to reach the observer. Therefore the light will be scattered to a greater extent.
Since light from a flash unit is direct and therefore harsh, dedicated diffusers are
often used for flash photography. Larger diffusers produce softer light.

2.9.1 Sunrise and sunset


Apart from direct sunlight, light reaches an observer at ground level on the Earth through
Rayleigh scattering by air molecules in the sky. Unlike ordinary scattering, Rayleigh
scattering is found to be proportional to 1/λ⁴, where λ is the wavelength [21].
means that sunlight at the violet end of the visible spectrum is scattered to a much
greater extent due to its shorter wavelength.
During the daytime when an observer is at a position that corresponds to A in
figure 2.16, the sky appears pale blue due to the overall mixture of scattered light
waves. The main contributions are from blue and green since violet itself is partly
absorbed by ozone in the stratosphere. Furthermore, the photopic (daylight) eye
response falls off rapidly into the violet end of the visible spectrum, as evident from
the standard 1924 CIE luminosity function for photopic vision illustrated in
figure 3.1 of chapter 3.
In the early morning or late evening, the same observer will be at a position that
corresponds to B or C in figure 2.16. In these cases, the sunlight must travel a much
greater distance through the atmosphere to reach the observer. Furthermore, the
atmosphere closer to the Earth’s surface is denser. Consequently, the blue end of the
visible spectrum will have been scattered away by the time the sunlight reaches
the observer. The remaining mixture of wavelengths appears orange, and eventually red.

Figure 2.16. The orange and red light seen at a sunrise or sunset is caused by increased Rayleigh scattering due to the greater distance travelled by the sunlight.

Rayleigh scattering can be enhanced by pollutants such as aerosols and sulfate
particles close to the Earth’s surface provided their size is less than λ/10. Mie
scattering from dust and a mild amount of cloud providing diffuse reflection can
both enhance the appearance of any coloured light that has already undergone
Rayleigh scattering, thus creating a sunrise or sunset with a more dramatic
appearance. An example is illustrated in figure 2.17.

Figure 2.17. Light cloud can enhance the appearance of a sunset.

2.10 Neutral density filters


A neutral density (ND) filter reduces the light transmitted by the lens. This reduces
the Ev defined by equation (2.18),
Ev = Av + Tv = Sv + Bv.


Accordingly, for a fixed Av, the lower Ev is accommodated by lowering Tv, thus enabling a longer
exposure duration to be used than would originally have been possible. A variety of
ND filter labelling systems are in use. For example:
• Filter factor (transmittance).
An ‘NDn’ or ‘ND ×n ’ filter has fractional transmittance T = 1/n. This reduces
the Ev by log2n stops. For example, T = 1/32 for an ND32 filter, and so the
Ev is reduced by 5 stops.
• Optical density.
An ‘ND d’ filter has an optical density d = −log10T . Since this is a base 10
logarithmic measure, the Ev is reduced by log2(10^d) stops. For example, an
ND 1.5 filter reduces the Ev by 5 stops.
• ND 1 number (stops).
This is a direct way of labelling the Ev reduction in stops. For example, an
ND 105 filter reduces the Ev by 5 stops.
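
Since the three labelling systems describe the same Ev reduction, it can be useful to convert between them. The Python sketch below implements the conversions described above and simply reproduces the 5-stop examples quoted in the list.

    import math

    def stops_from_filter_factor(n):
        # An 'NDn' filter with transmittance T = 1/n reduces the Ev by log2(n) stops
        return math.log2(n)

    def stops_from_optical_density(d):
        # An 'ND d' filter with optical density d = -log10(T) reduces the Ev by d*log2(10) stops
        return d * math.log2(10)

    print(stops_from_filter_factor(32))     # ND32   -> 5.0 stops
    print(stops_from_optical_density(1.5))  # ND 1.5 -> ~5 stops
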
There are several situations where ND filters are useful. For example:
1. Shallow DoF in bright daylight conditions. Use of a low f-number to achieve
a shallow DoF may require an exposure duration t shorter than the fastest
shutter speed available on the camera, in which case the photograph would be
overexposed. However, use of an ND filter can
prevent overexposure and the associated clipping of highlights by allowing a
longer exposure duration to be used.
2. Creative motion blur effects. In particular, landscape photographers use ND
filters in order to smooth the appearance of the flow of water.

2.10.1 Graduated neutral density filters


A graduated ND filter (GND) has a graduated transmittance profile. For a
given exposure duration, a GND filter will reduce the exposure distribution corre-
sponding to the upper half of the filter relative to the lower half. This has two main uses.
1. The appearance of the sky can be darkened relative to the foreground, or
equivalently the appearance of the foreground can be lightened relative to the sky.
Note that when the scene DR can be accommodated by the raw DR, this
effect could also be created using a digital filter when processing the raw file
or output image. However, the advantage of a physical GND filter is that it
can give improved SNR in the region with increased exposure. This could
reveal detail in the foreground that would otherwise not be captured.
2. Clipping can be prevented when the scene DR exceeds the raw DR.
If used appropriately, a GND filter can compress the scene DR into the raw
DR. This is a form of tone mapping. As discussed in section 2.12, tone
mapping is ordinarily applied to compress the DR of a high dynamic range
image into the display DR. However, in the present context the tone
mapping occurs before the raw data is recorded, and so this effect cannot
be recreated using a digital filter. An example is illustrated in figure 2.18.


Figure 2.18. Use of a soft-edge graduated 0.9 ND filter has prevented the foreground shadow detail from
clipping to black.

There are several types of GND filter available, each providing a different trans-
mittance profile. For example:
• Hard edge GND filter.
This provides a hard transition between the dark and clear areas, and is useful when
the scene contains a horizon. The transition line is typically located halfway. If the
scene contains objects projected above the horizon, the lightness of the objects may
need to be corrected digitally when processing the raw file or output image.
• Soft edge GND filter.
This provides a more gradual transition between the dark and clear areas.
The effective transition line is again typically halfway. A soft edge GND filter
is more useful than a hard edge filter when the scene contains objects
projected above the horizon.
• Attenuator/blender GND filter.
This provides a gradual transition over the entire filter and is useful when the scene
does not contain a horizon. This type of filter is also used in cinematography.
• Reverse GND filter.
Here the darkest region of the filter is located midway, and the lightest region is
at the top. This is useful for sunrise and sunset scenes that contain a horizon.
• Center GND filter.
Nowadays this refers to a reverse GND filter mirrored at the transition line,
which is useful when a sunrise or sunset is reflected onto water. The term can
also refer to a different type of filter designed to eliminate natural vignetting
on older lenses.

ND and GND filters are available as conventional circular screw-on filters.


However, landscape photographers often prefer to use rectangular filters, which
require a holder system and lens adapter. The holder system offers greater flexibility.
For example, composition may be restricted when using a screw-on GND filter since
the transition line is fixed, however the transition line of a rectangular GND filter
can be shifted vertically by adjusting the filter in the holder.

2.11 Polarizing filters


Light consists of transverse waves with oscillations perpendicular to the direction of
propagation. If an incoming beam of light is visualised travelling along the z-axis
towards an observer, the oscillations at any instant in time will appear to the
observer to lie in a plane oriented at an angle θ with the z-axis. When light is
unpolarized, θ fluctuates randomly as a function of time and so any oriented plane is
equally likely. This is illustrated in figure 2.19(a). Direct natural light is unpolarized.
Linearly polarized or plane polarized light arises when θ takes a fixed value that
does not fluctuate. In other words, the oscillations are always confined to one plane.
An example is illustrated in figure 2.20. The oscillations in partially plane polarized
light have a preferred θ that occurs with greater probability.
A polarizing filter only allows light to be transmitted in one plane defined by the
angle of rotation of the filter. The utility of a polarizing filter is that the ratio between
unpolarized light and the partially or fully plane polarized light entering the lens can
be altered by appropriately rotating the filter. This can be used to enhance image
appearance, for example:
• Unwanted surface reflections and glare can be eliminated.
• Blue skies can be selectively darkened.

Figure 2.21 illustrates an example scene taken with and without the use of a
polarizing filter.
Figure 2.19. (a) The electric field vector for unpolarized light can take any value at random. (b) Viewed along
the z-axis out of the page, the electric field vector E can be resolved into x and y components.


Figure 2.20. (a) Linear polarization. In this example, the wave is confined to lie parallel to the y-axis. (b) The
corresponding vector E is fixed in the direction of the y-axis.

Figure 2.21. (Left) Photograph taken with a polarizing filter fitted to the lens. (Right) Photograph taken
shortly afterwards with the polarizing filter removed.

2.11.1 Malus’ law


Mathematically, the electric field vector E defines the orientation of the oscillations.
As illustrated in figure 2.19(b), it is useful to resolve E into the x and y components
where A is the amplitude,
Ex = A cos θ
Ey = A sin θ .
In the case of unpolarized light, θ fluctuates randomly. Since all values are equally
probable, symmetry is restored on the average. Equivalently, each component has
the same amplitude but there is no phase coherence between these components as a
function of time. This description is a simplified picture of the physical reality,
nevertheless it is mathematically equivalent [21].
When unpolarized light passes through a polarizing filter, the appearance of the
resulting image will be unaffected. However, the power per unit area or irradiance
(see chapter 3) will be reduced by half. For example, if the Ey component is
eliminated then only the Ex = A cos θ component remains in the beam. Since θ
continues to fluctuate randomly, the time averaged irradiance Ee is reduced from A2
to the following value:


Ee = 〈∣Ex∣²〉 = A²〈cos²θ〉 = A²/2.
An ideal polarizing filter therefore acts as a 1-stop ND filter when the transmitted
light is unpolarized.
On the other hand, completely plane polarized light has a fixed angle θ that
defines the plane of polarization. This angle may be defined relative to a convenient
choice of x and y coordinate axes which are perpendicular to the direction of
propagation. The polarizing filter itself transmits light only in a plane defined by the
angle of rotation of the filter. This plane is referred to as the plane of transmission.
When a beam of plane polarized light passes through a polarizing filter, the axes
defining the angle θ can be aligned with the plane of transmission. In this way, θ
defines the angle between the plane of polarization and the plane of transmission.
The filter eliminates the perpendicular Ey component, and only the Ex component
remains in the beam. Since the angle θ is fixed and does not fluctuate, the irradiance
is reduced to the following value:
Ee = ∣Ex∣² = A² cos²θ.
This is known as Malus’ law [21]. When the plane of polarization and the plane of
transmission are aligned, θ = 0° and so 100% transmission is achieved in principle.
When θ = 90°, no light is transmitted. In practice, polarizing filters are not ideal and
so the transmission never quite reaches these extremes. When partially or fully plane
polarized light is mixed with unpolarized light, the 100% transmission figure will not
be achieved even for an ideal filter. As already noted, the utility of the polarizing
filter is that the ratio between the unpolarized light and partially or fully plane
polarized light entering the lens can be altered.
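
The effect of rotating the filter on a mixed beam follows directly from the two results above. The sketch below assumes an ideal polarizer and a beam that is a simple sum of an unpolarized component and a plane-polarized component; real filters and real scenes depart from this idealisation.

    import math

    def transmitted_irradiance(E_unpolarized, E_polarized, theta_deg):
        # Ideal polarizer: unpolarized light is halved, plane-polarized light obeys Malus' law
        theta = math.radians(theta_deg)
        return 0.5 * E_unpolarized + E_polarized * math.cos(theta) ** 2

    # Beam containing equal unpolarized and plane-polarized irradiance (arbitrary units)
    for angle in (0, 45, 90):
        print(angle, transmitted_irradiance(1.0, 1.0, angle))   # 1.5, 1.0, 0.5
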

2.11.2 Surface reflections


A dielectric material is an electrical insulator that can be polarized by an applied
electric field. When an unpolarized light beam is incident upon a dielectric surface
such as glass, refraction and reflection will take place. The incident ray, the normal
to the surface, and the reflected ray all lie in the same plane referred to as the plane of
incidence. This is shown in figure 2.22(a).
The reflected beam arises from re-radiation due to the vibration of atoms in the
dielectric material. Vibrations due to the refracted beam will contribute to generat-
ing the reflected beam, and these vibrations have components both in the plane of
incidence (p-vibrations) and perpendicular to the plane of incidence (s-vibrations).
Generally, a greater proportion of s-vibrations will contribute, and so the reflected
beam will be partially plane polarized. This means light entering the camera lens that
has been directly reflected from dielectric scene objects such as glass, leaves, wood,
paint, and water will be partially plane polarized.

Figure 2.22. (a) Reflection and refraction at a dielectric surface. (b) At Brewster’s angle ϕB, the reflected beam is completely polarized perpendicular to the plane of incidence.

There is in fact a situation where the reflected light will be completely plane
polarized, as illustrated in figure 2.22(b). If it so happens that the angle between the
refracted and reflected beams is 90 °, then only the s-vibrations can contribute to
generating the reflected beam [21]. In this case, the reflected beam will be completely
plane polarized by s-vibrations, and the reflected light is said to be s-polarized. This
happens when the angle of incidence ϕ takes a particular value known as Brewster’s
angle, ϕB. Significantly, ϕB depends only on the refractive indices of the materials, n
(usually air) and n′. Simple trigonometry shows that
tan ϕB = n′ / n.
For a beam of light in air (n = 1) incident on a glass dielectric surface (n′ = 1.5),
Brewster’s angle ϕB ≈ 57 ° and approximately 15% of the incident light is reflected. If
the beam is incident upon water (n′ = 1.33), then ϕB ≈ 53 °.
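
A minimal sketch of the Brewster-angle relation tan ϕB = n′/n is given below; it simply reproduces the glass and water examples quoted above.

    import math

    def brewster_angle_deg(n_incident, n_transmitted):
        # Angle of incidence at which the reflected beam is completely s-polarized
        return math.degrees(math.atan2(n_transmitted, n_incident))

    print(brewster_angle_deg(1.0, 1.5))    # glass (n' = 1.5):  ~56 degrees
    print(brewster_angle_deg(1.0, 1.33))   # water (n' = 1.33): ~53 degrees
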
If the light beam is incident on a stack of a sufficient number of glass plates, all
s-polarized light will eventually be reflected from the incident beam, leaving only
p-polarized light in the refracted beam [21].

2.11.3 Blue skies


As described in section 2.9.1, non-direct sunlight reaches an observer on the ground
by Rayleigh scattering from air molecules in the sky. According to the Rayleigh
theory, the scattering is proportional to 1/λ⁴ since the size of the molecules is much
smaller than the wavelength of the incoming light [21]. Since shorter wavelengths
experience greater scattering but the violet light is partially absorbed by ozone, the
overall colour of the sky is pale blue during the daytime.
Light that scatters in the plane perpendicular to the direction of the incoming
light will be completely plane polarized. If the incoming light is propagating in the
z-direction, light that scatters at an angle θ with the z-axis will only be partially plane
polarized, and the light parallel to the z-axis will be unpolarized. This is illustrated in
figure 2.23. For an observer on the ground, light scattered from the vertical strip of
sky positioned at a right angle between the Sun and the observer will exhibit the
strongest polarization.

Figure 2.23. Scattering of incoming sunlight from an air molecule in the sky.

Since light from clouds will be unpolarized due to repeated diffuse reflection,
rotating the polarizing filter to reduce the polarized light will cause the blue sky to be
selectively darkened. Figure 2.24 shows how use of a polarizing filter at wide angles
can reveal the graduated darkening with respect to angle θ.

Figure 2.24. The sky polarization gradient is revealed by use of a strong polarizing filter at a wide angle.

2.11.4 Circular polarizing filters


Linear (plane) polarizing filters should not be used with cameras such as SLR
cameras that utilise a beamsplitter for autofocus and exposure metering. Beamsplitters
themselves function using polarization, and so the use of a linear polarizing filter can
prevent these systems from functioning correctly [22].
A solution is to use a circular polarizing filter (CPL). These function in exactly the
same manner as linear polarizing filters except that a quarter-wave retarding plate is
added. The retarding plate induces a phase difference between the electric field
components resolved along the ‘fast’ and ‘slow’ axes of the plate. This causes the
light that has already passed through the filter to become circularly polarized. The
electric field vector rotates as a function of time and traces out a helix, and so
the light is prevented from entering the autofocus and metering modules in a plane
polarized state.

2.12 High dynamic range


As described in sections 2.5 and 2.6, the aim of modern standard exposure strategy is
to correctly reproduce middle grey in the output JPEG image file. However, clipping
of the represented scene luminance distribution will occur if the scene DR exceeds
the image DR, even if the image is correctly exposed. In the case that a custom tone
curve is used to transfer all of the raw DR to the image, clipping will occur if the
scene DR exceeds the raw DR.
One way to reduce the extent of the clipping is to use high dynamic range (HDR)
imaging to capture a greater amount of the scene DR. This can be achieved by
combining multiple frames of the same scene, each taken at a different Ev [23]. The
best way to change the Ev between frames is to use a different exposure duration and
therefore a different Tv for each frame,
Ev = Av + Tv = Sv + Bv.
For example, five frames could be taken in total, with two frames taken at 1 Ev and
2 Ev below the metered Ev, and two frames taken at 1 Ev and 2 Ev above the
metered Ev,
Ev − 2, Ev − 1, Ev, Ev + 1, Ev + 2.


The frames are combined to construct an HDR image, which describes a scene
luminance distribution of higher DR than that achievable using a single frame. The
distribution can be referred to as an HDR luminance map.
The basic theory behind the construction of an HDR luminance map is discussed
below, which is followed by a brief description of tone mapping. In common with
raw data, HDR images are approximately linear. Since true linear HDR displays are
not yet available on the consumer market, the HDR luminance map needs to be
compressed to fall within a luminance ratio commensurate with that of a typical low
dynamic range (LDR) display.

2.12.1 High dynamic range imaging


Ignoring offsets and quantization, the raw data is proportional to the average
irradiance incident at the corresponding sensor pixel on the SP. Irradiance is a
radiometric measure of electromagnetic energy that will be introduced in section
3.1.1 of chapter 3. Unlike photometry, which quantifies luminous energy, radio-
metry directly quantifies electromagnetic energy without including the spectral
sensitivity of the HVS as a weighting function.
Irradiance is globally proportional to illuminance, E, provided several conditions
hold.
1. The sensor response curve must be perfectly linear.
2. The camera must respond over the same range of wavelengths as the HVS.
3. For a greyscale camera, the camera response function must be proportional
to the standard luminosity function for photopic vision denoted by V (λ ) or
ȳ(λ ), which is introduced in section 3.1.1 of chapter 3. For colour cameras
with a CFA, the set of camera response functions, which are defined in
section 3.6.4 of chapter 3, must be a linear combination of the eye cone
response functions. These are V (λ ) decomposed into contributions from each
type of eye cone.

The third requirement above, which is known as the Luther–Ives condition, is rarely
satisfied in practice. This means that raw data is only approximately proportional to
illuminance at the SP, and this proportionality also varies with the spectral
composition of the illumination.
Nevertheless, it can be assumed for simplicity that the raw data expressed using
DNs is proportional to the average illuminance at the corresponding pixel on the SP,
n DN ∝ t 〈E 〉 = 〈H 〉.
For a given frame, dividing the raw levels by the frame exposure duration yields a
distribution of scaled or relative illuminance values
〈E〉 ∝ nDN / t. (2.24)
Significantly, the range of relative illuminance values can be extended by taking
multiple frames, each using a different t. When equation (2.24) is applied to each
frame, the overall result will be a relative illuminance distribution of higher DR that
covers the scene DR captured by all frames. This distribution can be referred to as
an HDR relative illuminance map.
It should be noted that the individual relative illuminance distributions for the
frames will in general overlap, and so the relative illuminance at a given pixel could,
in principle, be deduced from any of the frames that cover the value. However, each
frame will have a different noise distribution since each frame corresponds to a
different Ev. This is discussed in section 3.8 of chapter 3, where it is shown that
photon shot noise typically increases as the square root of the electron count, and so
SNR generally improves at higher Ev. For a given pixel, a naive solution to this issue
would be to use the relative illuminance value from the frame with the lowest noise.
However, it will be shown in section 5.10.3 of chapter 5 that an overall gain in SNR
can be achieved by frame averaging, meaning that temporal noise can be reduced by
averaging over all valid frames. Since the exposure duration of each frame is
different, appropriate weighting factors can be included that maximize overall SNR
when the frame averaging is performed [24].
The optimum weighting depends upon all noise sources [24], which include fixed
pattern noise, read noise and photon shot noise. If only photon shot noise is
included, the optimum weighting for frame i turns out to be its exposure duration
t(i),

Ê = ∑i t(i)〈E(i)〉 / ∑i t(i).

Here Ê denotes the optimized HDR relative illuminance map. Note that all relative
illuminance values obtained from any clipped n DN will be incorrect. These will
ideally be omitted from the frame averaging by using techniques that take into
account the noise distribution [24].
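
A minimal sketch of exposure-time-weighted frame averaging is given below. It assumes linear raw values normalised to [0, 1] and simply excludes (near-)clipped values from the average; the treatment in [24] is more sophisticated and accounts for the full noise model.

    import numpy as np

    def hdr_relative_illuminance(frames, exposure_times, clip_level=0.98):
        # Exposure-time-weighted average of the per-frame relative illuminance n/t,
        # i.e. sum_i t(i) (n(i)/t(i)) / sum_i t(i), with clipped values excluded.
        frames = np.asarray(frames, dtype=float)               # shape (num_frames, H, W)
        t = np.asarray(exposure_times, dtype=float)[:, None, None]
        valid = frames < clip_level                            # mask out clipped raw values
        numerator = np.where(valid, frames, 0.0).sum(axis=0)   # t(i) * (n(i)/t(i)) = n(i)
        denominator = np.where(valid, t, 0.0).sum(axis=0)
        return numerator / np.maximum(denominator, 1e-12)

    # Toy example: three linear frames of the same scene at different exposure durations
    rng = np.random.default_rng(0)
    E_true = rng.uniform(0.0, 5.0, size=(4, 4))                # stand-in relative illuminance
    times = [1 / 100, 1 / 25, 1 / 6]
    frames = [np.clip(E_true * t * 10, 0.0, 1.0) for t in times]
    print(hdr_relative_illuminance(frames, times) / 10)        # recovers E_true where possible
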
Assuming that the lens has been corrected for natural vignetting, the HDR
relative illuminance map is proportional to the HDR relative luminance map that
describes the scene luminance distribution. If absolute measurements of the
maximum and the minimum luminance levels are taken using a luminance meter
at the time the frames are captured, an HDR luminance map that estimates the
absolute scene luminance distribution can also be determined.
In practice, the frames used to construct the HDR image can be obtained by
performing raw conversion without any tone curve or encoding gamma curve
applied, and saving the frames as linear 16-bit TIFF images. If the ‘dcraw’ freeware
raw converter is used, the command is
dcraw -v -w -H 0 -o 1 -q 3 -4 -T filename,
where sRGB has been chosen as the output colour space. If only camera output
JPEG images are available, the encoding gamma curve (or preferably the overall
tone curve) needs to be reversed before the images can be combined.
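
If only JPEG frames are available and no additional tone curve was applied, the sRGB encoding curve can be reversed as sketched below; this is only the standard sRGB piecewise decoding, not the actual in-camera tone curve.

    import numpy as np

    def srgb_decode(dol, bit_depth=8):
        # Invert the sRGB piecewise encoding curve: integer DOLs -> linear relative values
        v = np.asarray(dol, dtype=float) / (2 ** bit_depth - 1)
        return np.where(v <= 0.04045, v / 12.92, ((v + 0.055) / 1.055) ** 2.4)

    print(srgb_decode([0, 1, 118, 255]))   # DOL 118 decodes to ~0.18 (middle grey)
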


A variety of software, both commercial and non-commercial, can be used to
combine the frames into an HDR image. Since HDR images are linear, they need to
be efficiently encoded. Popular file formats include Radiance, which uses the ‘.hdr’
extension, and OpenEXR, which uses the ‘.exr’ extension. However, HDR images
cannot be directly viewed on a conventional LDR display. HDR images first need to
be tone-mapped and converted into a standard file format.

2.12.2 Tone mapping


In conventional photography, the raw DR defines the maximum scene DR that can
be represented by the raw file. In HDR photography, the HDR image encoding
ultimately places a limit on the scene DR that can be represented by the HDR
luminance map, and the maximum achieved depends on the chosen number and Ev
separation of the frames.
Although tone mapping generally refers to a mapping of tone values from one
domain to another, in the present context it refers to the use of a tone-mapping
operator (TMO) to compress the raw DR or HDR luminance map into a luminance
ratio commensurate with that of a typical display.
Recall from section 2.2 that when a displayable output image file such as a JPEG
or TIFF file is produced, gamma encoding and decoding is required to minimise
posterisation when the bit depth is reduced to 8. The gamma encoding curve is not
designed to reduce or compress the captured scene DR, but rather it is designed to
prevent posterisation by encoding the DOLs in a perceptual manner. Consequently,
the introduced nonlinearity must be compensated for by the display device so that
the overall representation of the scene luminance distribution is linear. This is
referred to as photometrically correct tone reproduction.
Nevertheless, it should be noted that reducing the bit depth to 8 increases the
quantization step, and this does limit the maximum raw DR that can be transferred
to the output image. As shown in section 2.3, 8-bit DOLs encoded using the AdobeⓇ
RGB colour space, which uses γE = 1/2.2, can encode 17.59 stops of DR, while 8-bit
DOLs encoded using the sRGB colour space, which uses a piecewise gamma curve,
can encode 11.69 stops of DR.
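
The two figures quoted above can be reproduced by decoding the smallest nonzero 8-bit DOL back to linear relative luminance and expressing its ratio to the maximum in stops; the sketch below is only a numerical check of the quoted values.

    import math

    def adobe_rgb_8bit_dr_stops(gamma=2.2):
        # Smallest nonzero DOL decodes to (1/255)^gamma, so DR = gamma * log2(255)
        return gamma * math.log2(255)

    def srgb_8bit_dr_stops():
        # Smallest nonzero DOL lies on the linear toe of the sRGB piecewise curve
        smallest_linear = (1 / 255) / 12.92
        return math.log2(1.0 / smallest_linear)

    print(adobe_rgb_8bit_dr_stops())   # ~17.59 stops
    print(srgb_8bit_dr_stops())        # ~11.69 stops
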
If an additional tone curve is applied so that the overall tone curve differs from
the encoding gamma curve of the chosen output-referred colour space, γE , tone
reproduction will no longer be photometrically correct. Indeed, as described in
section 2.3.2 and illustrated in figure 2.8, traditional D-SLR in-camera image
processing engines typically use a LUT to apply a tone curve in place of γE that
has a characteristic s-shape relative to γE . The tone curve reduces the raw DR to a
value commensurate with that of a typical display by raising the black clipping level.
At the same time, the characteristic s-shape boosts mid-tone contrast in order to
compensate for the ‘flat’ appearance of an image rendered on a conventional display
that has a relatively low contrast ratio. This type of tone curve both compresses and
clips the raw DR. The raw headroom defined by equation (2.11) describes the raw
DR that is not transferred to the image. This can be recovered by applying a custom
tone curve to the raw file using an external raw converter.
The type of tone curve described above is an example of a global TMO as it
operates on all pixels identically. In the context of HDR imaging, alternative types
of TMOs have been developed within the computer science community that
generally aim to compress the HDR luminance map into a smaller luminance ratio,
while at the same time minimising the visual impact of the luminance compres-
sion. For example, local tone-mapping operators (local TMOs) have been
developed, which are pixel-dependent operators where the tone mapping depends
upon the pixel environment. Although global contrast must ultimately be reduced,
local TMOs preserve contrast locally. This technique takes advantage of the fact
that the HVS is more sensitive to the photometrically correct appearance of local
contrast in comparison to the global luminance levels throughout the tone-
mapped image, which are photometrically incorrect. A drawback of local TMOs
is that they can produce undesirable artefacts such as halos around high-contrast
edges.
As with conventional LDR images, tone-mapped images are encoded using
DOLs and stored using conventional formats such as JPEG or TIFF. Nevertheless,
the various types of TMO that have been developed are not necessarily implemented
in the same manner [25]. Strategies include the following:
1. The linear HDR pixel values are directly tone-mapped into DOLs. Gamma
encoding is not required as it has essentially been built into the TMO.
2. The linear HDR pixel values are tone-mapped in the scene-referred lumi-
nance domain. These LDR values result from luminance compression and
are therefore pseudo-linear values that subsequently need to be transformed
into DOLs by applying the encoding gamma curve of the chosen output-
referred colour space, γE .
3. The linear HDR pixel values are tone-mapped into LDR output display
luminance values. In this case, the required DOLs can be obtained by
inverting a display model for the display device that relates DOLs and display
luminance. An example display model is described in section 2.13.2.
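
As an illustration of the first two strategies, the sketch below applies a simple global TMO of the Reinhard type, L/(1 + L), to linear HDR relative luminance values and then gamma-encodes the result into 8-bit DOLs. The operator and its parameters are generic examples and are not taken from any particular software.

    import numpy as np

    def reinhard_global_tmo(L, key=0.18):
        # Simple global TMO: scale the log-average luminance to a chosen key,
        # then compress with L/(1 + L) so that all values fall in [0, 1)
        L = np.asarray(L, dtype=float)
        L_avg = np.exp(np.mean(np.log(L + 1e-8)))
        Lm = key * L / L_avg
        return Lm / (1.0 + Lm)

    def srgb_encode(linear):
        # Standard sRGB piecewise encoding, returning 8-bit DOLs
        v = np.clip(np.asarray(linear, dtype=float), 0.0, 1.0)
        enc = np.where(v <= 0.0031308, 12.92 * v, 1.055 * v ** (1 / 2.4) - 0.055)
        return np.round(255 * enc).astype(np.uint8)

    hdr = np.array([0.001, 0.05, 0.18, 1.0, 20.0, 500.0])   # synthetic HDR relative luminance
    print(srgb_encode(reinhard_global_tmo(hdr)))
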

As discussed in the next section, the brightness and contrast controls of a display
device provide an additional form of global tone mapping.
Figure 2.25 illustrates the use of a local TMO to produce a satisfactory output
image. Figure 2.25(a) shows the JPEG output from the camera without any EC
applied. Evidently, the highlights have clipped. Figure 2.25(b) shows the JPEG
output using −3 Ev where the highlight information has been preserved but the
shadows have clipped. Figure 2.25(c) shows the JPEG output obtained using +3 Ev.
Finally, figure 2.25(d) shows a locally tone mapped JPEG image. This was
constructed by reversing the tone curves applied to the JPEG images, combining
them into an HDR image, applying a local TMO, and reapplying the encoding
gamma curve.


Figure 2.25. (a) JPEG output as metered by camera. (b) JPEG output using −3 Ev. (c) JPEG output using
+3 Ev. (d) Locally tone-mapped and enhanced JPEG image.

2.13 Image display


It was shown in section 2.2 that when an encoded digital image represented by DOLs
is displayed on a monitor, the display gamma needs to invert the encoding gamma so
that the overall input–output luminance relationship is linear. This section describes
a simple model for the absolute luminance emitted by a display. Subsequently, the
relationship between image DR and display DR is discussed.

2.13.1 Luma
The voltage signals used to drive the display channels are not directly derived from
the DOLs. Instead, JPEG encoding involves a conversion of the DOLs to the
Y′CbCr colour space, also written as Y′CBCR.
The Y′ component is defined as luma, and CB and CR are the blue-difference and
red-difference chroma components, respectively. The Y′CBCR representation is more
efficient since the chroma components can be subsampled. Chroma subsampling
takes advantage of the low sensitivity of the HVS to colour differences so that the
chroma components can be stored at a lower resolution.
Recall that relative luminance Y is a weighted sum of linear relative tristimulus
values. For example, Y for the sRGB colour space is calculated from equation (2.2),
Y = 0.2126 RL + 0.7152 G L + 0.0722 BL.


On the other hand, luma Y ′ is defined as a weighted sum of DOLs. For example, Y ′
for the sRGB colour space is defined as

Y ′ = 0.299 R ′DOL + 0.587 G ′DOL + 0.114 B ′DOL .


Since these DOLs have been gamma encoded using the sRGB encoding gamma
curve or encoded using a tone curve such as an s-curve, luma is seen to be a weighted
sum of nonlinear values. Therefore, luma is not identical to gamma-encoded relative
luminance. Instead, luma can be regarded as a quantity that is of similar magnitude
to gamma-encoded relative luminance

Y′ ≈ Y^γE. (2.25)
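
The distinction between luma and gamma-encoded relative luminance can be seen numerically. The Python sketch below computes both quantities using the sRGB weights quoted above, with the encoding approximated by a pure power law for simplicity; the size of the discrepancy depends on how saturated the colour is.

    def luma(R_dol, G_dol, B_dol):
        # Luma: weighted sum of gamma-encoded (nonlinear) values normalised to [0, 1]
        return 0.299 * R_dol + 0.587 * G_dol + 0.114 * B_dol

    def encoded_relative_luminance(R_lin, G_lin, B_lin, gamma_E=1 / 2.2):
        # Gamma-encoded relative luminance: weight the linear values, then encode
        Y = 0.2126 * R_lin + 0.7152 * G_lin + 0.0722 * B_lin
        return Y ** gamma_E

    # Neutral mid grey: the two quantities coincide (~0.5 in both cases)
    print(luma(0.5, 0.5, 0.5), encoded_relative_luminance(0.5 ** 2.2, 0.5 ** 2.2, 0.5 ** 2.2))
    # Saturated red: luma (~0.30) underestimates the encoded relative luminance (~0.49)
    print(luma(1.0, 0.0, 0.0), encoded_relative_luminance(1.0, 0.0, 0.0))
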

2.13.2 Display luminance


The luminance output from a conventional CRT monitor or LCD can be expressed
using the following simple model [26]:
L = (Lpeak − Lblack){Y′}^γD + Lblack + Lrefl. (2.26)

• L is the absolute luminance output measured in cd/m2 as a function of luma Y′.
• L peak is the peak luminance output in a completely dark room, which is up to
500 cd/m2 for typical LCD displays.
• L refl is the luminance due to ambient light reflected from the surface of the
display.
• Lblack is the luminance emitted by a black pixel. This cannot be reduced to
zero for an LCD display and is typically in the range 0.1 to 1 cd/m2 .
• γD is the display gamma. This is not necessarily the same as the inherent
display nonlinearity. Although the luminance output from CRT monitors is
naturally related to input electron-gun voltage via a power law, the same is
not true of modern LCD monitors. The required nonlinearity {Y ′}γD can be
accommodated through use of a LUT.
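
A direct transcription of equation (2.26) is sketched below; the peak, black and reflected luminance values are illustrative, and γD = 2.2 is assumed only as a common default.

    def display_luminance(luma, L_peak=400.0, L_black=0.5, L_refl=1.0, gamma_D=2.2):
        # Equation (2.26): absolute output luminance in cd/m2 as a function of luma in [0, 1]
        return (L_peak - L_black) * luma ** gamma_D + L_black + L_refl

    for y_prime in (0.0, 0.25, 0.5, 1.0):
        print(y_prime, display_luminance(y_prime))
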

Dividing equation (2.26) throughout by Lpeak defines the normalised display luminance,

Ln = L / Lpeak.

The resulting equation can be expressed in the form of equation (2.5)


Ln = C{Y′}^γD + B. (2.27)


The gain C and offset B are defined as follows [7]:


C = (Lpeak − Lblack) / Lpeak
B = (Lblack + Lrefl) / Lpeak.

• The gain C is controlled by the ‘contrast’ setting.


• The offset B is controlled by the ‘brightness’ setting.

The following relation holds when L refl = 0:

C + B = 1. (2.28)

Since {Y′}^γD ≈ Y according to equation (2.25), the approximately linear relationship
between relative scene luminance and the normalised luminance output from the
display is confirmed by equation (2.27),
L n ≈ CY + B.

2.13.3 Display dynamic range


The contrast ratio of a display is the ratio between the maximum and the minimum
possible luminance output with L refl = 0,
contrast ratio = Lpeak / Lblack : 1.
This occurs when the gain C is set to its maximum value and the offset B is set to its
minimum value. The contrast ratio expressed in terms of stops is known as the
display DR,
display DR (stops) = log2(Lpeak / Lblack).
If Lblack = 0 then the contrast ratio and display DR are calculated using the first
controllable level above zero instead [26]. In practice, the display DR is limited by
the ambient light that is reflected from the surface of the display. This becomes
apparent when the L refl term is included,
ambient display DR (stops) = log2[(Lpeak + Lrefl) / (Lblack + Lrefl)].
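
A minimal numerical sketch of these definitions is given below, using illustrative LCD values for Lpeak, Lblack and Lrefl rather than measurements of any particular display.

    import math

    def display_dr_stops(L_peak, L_black, L_refl=0.0):
        # Display DR in stops, optionally including reflected ambient light
        return math.log2((L_peak + L_refl) / (L_black + L_refl))

    # Illustrative LCD: 500 cd/m2 peak and 0.5 cd/m2 black level
    print(display_dr_stops(500, 0.5))             # ~10 stops in a completely dark room
    print(display_dr_stops(500, 0.5, L_refl=5))   # ~6.5 stops with ambient reflections
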

Although tone curves applied by in-camera image processing engines are designed to
restrict the image DR to a value commensurate with the contrast ratio of a typical
display, the image DR often exceeds the display DR, particularly if a custom tone
curve has been used to transfer the entire raw DR to the encoded output image.


Figure 2.26. Absolute display luminance L n for an LCD display as a function of (a) luma Y ′, and (b) relative
luminance Y. In both cases L refl has been set to zero, and C + B = 1 so that the image DR is compressed into
the contrast ratio of the display.

In this case, the image DR is compressed into the display DR through contrast
reduction provided equation (2.28) holds,
C + B = 1.
This is illustrated in figure 2.26. If C + B = 1 were to be violated, then clipping of
image DOLs would occur if the image DR were to exceed the display DR.
Recall that the raw DR cannot be larger than the scene DR, and the image DR
cannot be larger than the raw DR when the output image is produced from the raw
file. In other words, the image DR is always the maximum scene DR that can be
rendered on the display. If the display DR is larger than the image DR, the image
DR can be expanded into the display DR through contrast expansion. However,
contrast expansion cannot increase the represented scene DR.

References
[1] Camera & Imaging Products Association 2004 Sensitivity of digital cameras CIPA DC-004
[2] International Organization for Standardization 2006 Photography–Digital Still Cameras–
Determination of Exposure Index, ISO Speed Ratings, Standard Output Sensitivity, and
Recommended Exposure Index, ISO 12232:2006
[3] Bayer B E 1976 Color imaging array US Patent Specification 3971065
[4] International Electrotechnical Commission 1999 Multimedia Systems and Equipment–
Colour Measurement and Management–Part 2-1: Colour Management–Default RGB
Colour Space–sRGB, IEC 61966-2-1:1999
[5] Adobe Systems Incorporated 2005 AdobeⓇ RGB (1998) Color Image Encoding Version
2005-05
[6] Martinec E Noise, Dynamic Range and Bit Depth in Digital SLRs (unpublished)
[7] Poynton C 2003 Digital Video and HDTV algorithms and Interfaces (San Mateo, CA:
Morgan Kaufmann Publishers)


[8] Sato K 2006 Image-processing algorithms Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 8
[9] International Organization for Standardization 2009 Photography–Electronic Still-Picture
Cameras - Methods for Measuring Opto-Electronic Conversion Functions (OECFs), ISO
14524:2009
[10] Connelly D 1968 Calibration levels of films and exposure devices J. Photogr. Sci. 16 185
[11] American National Standards Institute 1971 General-Purpose Photographic Exposure
Meters (Photoelectric Type), ANSI PH3.49-1971
[12] International Organization for Standardization 1974 Photography–General Purpose
Photographic Exposure Meters (Photoelectric Type)–Guide to Product Specification, ISO
2720:1974
[13] International Organization for Standardization 1982 Photography—Cameras—Automatic
Controls of Exposure, ISO 2721:1982
[14] American National Standards Institute 1979 Method for Determining the Speed of Color
Reversal Films for Still Photography, ANSI PH2.21-1979
[15] International Organization for Standardization 2003 Photography–Colour Reversal Camera
Films–Determination of ISO Speed, ISO 2240:2003
[16] Holm J 2016 private communication
[17] Stimson A 1962 An interpretation of current exposure meter technology Photogr. Sci. Eng. 6 1
[18] Yoshida H 2006 Evaluation of image quality Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 10
[19] Butler R 2011 Behind the Scenes: Extended Highlights! (http://www.dpreview.com/articles/
2845734946)
[20] George C 2008 Mastering Digital Flash Photography (Lewes: Ilex Press)
[21] Jenkins F A and White H E 1976 Fundamentals of Optics 4th edn (New York: McGraw-Hill)
[22] Goldberg N 1992 Camera Technology The Dark Side of the Lens (New York: Academic)
[23] Debevec P E and Malik J 1997 Recovering high dynamic range radiance maps from
photographs Proceedings: SIGGRAPH '97: Proceedings of the 24th Annual Conference on
Computer Graphics and Interactive Techniques (New York: ACM Press/Addison-Wesley) pp
369–78
[24] Granados M, Adjin B, Wand M, Theobalt C, Seidel H-P and Lensch H P A 2010 Optimal
HDR reconstruction with linear digital cameras Proc. of the 2010 IEEE Computer Society
Conf. on Computer Vision and Pattern Recognition (San Francisco, CA) (Piscataway, NJ:
IEEE) 215–22
[25] Eilertsen G, Mantiuk R K and Unger J 2017 A comparative review of tone-mapping
algorithms for high dynamic range video Comput. Graph. Forum 36 565–92
[26] Mantiuk R K, Myszkowski K and Seidel H-P 2015 High dynamic range imaging Wiley
Encyclopedia of Electrical and Electronics Engineering (New York: Wiley)

D A Rowlands

Chapter 3
Raw data model

Chapter 1 used photometry and Gaussian optics to describe the photometric


exposure distribution formed at the camera sensor plane (SP). Chapter 2 described
the resulting digital output and the development of an exposure strategy based upon
the output JPEG image file and average photometry.
The aim of this chapter is to give insight into the nature of the raw data produced
by a camera through the development of a model based on linear systems theory.
The characteristics and quality of the raw data are affected by a variety of physical
phenomena. One of the most important is diffraction. This is a wave phenomenon
that causes light passing through the lens aperture to spread out. Diffraction imposes
a fundamental limit on resolution because a point in the scene is prevented from
being imaged as a point on the SP. Diffraction requires a wave description of light,
which is beyond the scope of geometrical optics. Other physical phenomena that
affect the raw data arise from the imaging sensor itself. The sensor photosites are not
point objects and must sample the optical image at a non-zero finite resolution. This
leads to further blurring of scene detail since each photosite mixes together light
collected from a corresponding area in the photographic scene. Furthermore, the
sampling can lead to false detail appearing in the output image. This unwanted effect
known as aliasing can be described using Fourier theory. One way to reduce aliasing
to a minimal level is to pre-filter the light through an optical low-pass filter (OLPF)
before the light reaches the SP. However, such filtering causes further blurring of
scene detail.
Mathematically, the above phenomena can be modelled using linear systems
theory as a blurring and sampling of the ideal optical image formed at the SP. For
photographic applications, linear systems theory can be applied as part of the design
process when building an imaging system to a required specification. The central
quantities of interest are the point spread function (PSF) in the real domain, and the
optical transfer function (OTF) in the Fourier domain. In the present chapter, a
simple camera model based on linear systems theory is developed. The model



Physics of Digital Photography (Second Edition)

includes basic PSFs for the optics, OLPF, and sensor components. Subsequently, a
model for the charge signal generated by the blurred and sampled optical image at
the SP is developed, along with a model for the conversion of the charge signal into
digital raw data. A model for signal noise is also included.
The PSF and OTF are fundamental measures of image quality (IQ). In particular,
the modulation transfer function (MTF), which can be extracted from the OTF, is
widely used in photography to interpret lens performance and to define camera
system resolving power. The camera model derived in this chapter provides
mathematical formulae that will be used to discuss IQ in chapter 5.

3.1 Linear systems theory


Linear systems theory is a mathematical framework that uses a linear operator to
model a system [1]. It is widely used in the design [2, 3] and simulation [4–7] of
optical systems and cameras and plays a central role in the field of digital image
processing [8].
Recall that photometry was used in chapter 1 to describe the energy flow of light
as perceived by the human visual system (HVS). Photometry derives from radio-
metry [9]; however, photometry includes a spectral weighting that takes into account
the spectral sensitivity of the HVS. As described in chapter 2, a photometric
description of light is helpful for making exposure decisions based upon the visual
properties of the scene. For example, luminance as measured by a reflected-light
meter correlates logarithmically with the brightness response of the HVS, and the
sensitivity of the camera JPEG output to photometric exposure is used to define the
camera ISO settings [10, 11].
Although a photometric description of light is useful for developing a photo-
graphic exposure strategy, a radiometric description of light is more appropriate for
describing the characteristics of camera components such as the optics. For example,
the blur arising from lens aperture diffraction is a function of wavelength.
A radiometric description of light is also required to model the charge signal and
precise nature of the raw data produced by a camera. This requires knowledge of the
spectral composition of the illumination, which may differ from that assumed by the
ISO 12232 standard, the spectral transmission properties of camera components
such as the colour filter array (CFA), and finally the spectral response of the sensor
photoelements that generate electronic charge.
This chapter begins with an introduction to radiometry and lists the spectral
radiometric counterparts of the photometric quantities introduced in chapter 1. After
deriving expressions for the component PSFs and OTFs for a model camera system,
a model for the charge signal is developed in which the spectral response of the
camera system is included as a weighting function in place of the spectral response of
the HVS. Although the response of the camera is not the same as that of the HVS,
these responses will ideally be proportional to each other, as discussed in section 2.12.1
of chapter 2, and this enables the camera raw data to represent the visual properties of
the photographic scene.


3.1.1 Radiometry
Table 3.1 lists the spectral radiometric counterparts of all the photometric quantities
introduced in section 1.5.1 of chapter 1. Spectral radiometric quantities are denoted
with an ‘e,λ’ subscript to distinguish them from photometric quantities. The ‘e’
represents ‘energetic’.
In a spectral radiometric description, the lens transforms the scene spectral
radiance distribution at each λ into the sensor plane spectral irradiance distribution
at the same λ.
Full radiometric quantities are obtained from their spectral representations by
directly integrating over all wavelengths. For example, irradiance Ee is obtained
from spectral irradiance Ee,λ in the following manner:

Ee = ∫ Ee,λ dλ.

Table 3.1. Common spectral radiometric, radiometric, and photometric quantities with their symbols and SI units.

Radiometry

Quantity              Symbol   Unit                 Equivalent unit
Spectral flux         Φe,λ     W nm⁻¹
Spectral intensity    Ie,λ     W sr⁻¹ nm⁻¹
Spectral radiance     Le,λ     W sr⁻¹ m⁻² nm⁻¹
Spectral exitance     Me,λ     W m⁻² nm⁻¹
Spectral irradiance   Ee,λ     W m⁻² nm⁻¹
Spectral exposure     He,λ     J m⁻² nm⁻¹

Radiant flux          Φe       W                    J s⁻¹
Radiant intensity     Ie       W sr⁻¹
Radiance              Le       W sr⁻¹ m⁻²
Radiant exitance      Me       W m⁻²
Irradiance            Ee       W m⁻²
Radiant exposure      He       J m⁻²

Photometry

Quantity              Symbol   Unit                 Equivalent unit
Luminous flux         Φv       lm
Luminous intensity    Iv       lm sr⁻¹              cd
Luminance             Lv       lm m⁻² sr⁻¹          cd m⁻²
Luminous exitance     Mv       lm m⁻²               lx
Illuminance           Ev       lm m⁻²               lx
Photometric exposure  Hv       lm s m⁻²             lx s


However, camera components such as the optics affect the spectral irradiance
distribution at the SP in a wavelength-dependent manner. Such contributions can be
included as spectral weighting functions before integrating over the range of
wavelengths that produce a response from the camera.

Photometry from radiometry


An important example of a weighting function is the sensitivity of the HVS to light.
In daylight, the relative sensitivity is described by the standard luminosity function
for photopic vision denoted by V (λ ) or ȳ(λ ). This function is shown in figure 3.1. In
chapter 4, it will be shown that V (λ ) can be further decomposed into contributions
from three different types of eye cone.
Photometric quantities are obtained from their spectral radiometric counterparts
by including V (λ ) as a weighting factor, and then integrating over the visible range of
wavelengths from λ1 → λ2 , where
λ1 ≈ 380 nm,
λ2 ≈ 780 nm.
For example, luminance is obtained from spectral radiance Le,λ in the following
manner:
L_v = K_m \int_{\lambda_1}^{\lambda_2} L_{e,\lambda}\, V(\lambda)\, d\lambda.   (3.1)

Figure 3.1. Standard 1924 CIE luminosity function for photopic (daylight) vision V (λ ) (green curve). The
curve is normalised to a peak value of unity at 555 nm. The 1951 CIE luminosity function for scotopic (low
light) vision V ′(λ ) is also shown (blue curve).


The maximum luminous efficacy Km is a normalisation constant,


K_m = 683 lm/W.
A source of electromagnetic radiation with a high radiometric value will have zero
photometric value if wavelengths in the visible range are absent.
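Equation (3.1) is straightforward to evaluate numerically. The short Python sketch below integrates an example spectral radiance against a weighting function using the trapezoidal rule; the Gaussian curve standing in for V(λ) and the flat spectral radiance are illustrative assumptions only (a real calculation would use the tabulated CIE data).

```python
import numpy as np

# Wavelength grid spanning the visible range (nm)
wavelengths = np.linspace(380, 780, 401)

# Stand-in for the photopic luminosity function V(lambda): a Gaussian peaked
# at 555 nm, used purely for illustration in place of the tabulated CIE data
V = np.exp(-0.5 * ((wavelengths - 555.0) / 42.0) ** 2)

# Example spectral radiance L_{e,lambda}: flat 0.01 W sr^-1 m^-2 nm^-1
L_e = np.full_like(wavelengths, 0.01)

# Equation (3.1): L_v = K_m * integral of L_{e,lambda} V(lambda) dlambda
K_m = 683.0  # lm/W
L_v = K_m * np.trapz(L_e * V, wavelengths)
print(f"Luminance ~ {L_v:.1f} cd m^-2")
```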

3.1.2 Ideal optical image


The ideal geometrical spectral irradiance distribution at the SP is the spectral
radiometric counterpart of equation (1.56), which is the illuminance formula derived
previously using photometry in section 1.5 of chapter 1:

E_{\lambda,\mathrm{ideal}}(x, y) = \frac{\pi}{4}\, L_{e,\lambda}\!\left(\frac{x}{m}, \frac{y}{m}\right) \frac{T}{N_w^{2}}\, \cos^4\!\left\{\varphi\!\left(\frac{x}{m}, \frac{y}{m}\right)\right\}.   (3.2)

Here Le,λ is the spectral scene radiance, Nw is the working f-number, T is the lens
transmittance factor and m is the Gaussian magnification. The magnification has
been used to express the coordinates on the SP denoted by (x , y ) in terms of the
coordinates on the object plane (OP).
The cosine fourth term can be replaced with an image-space term, R(x , y, λ ),
referred to as the relative illumination factor (RI). This models the combined effects
of the natural fall-off due to the cosine fourth law, along with the vignetting arising
from the specific real lens design [6]. The RI factor is normalised to unity at the
optical axis (OA). Vignetting arising from the lens design typically decreases as the
f-number N increases, and so the RI factor is a function of N.
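As a simple numerical check of equation (3.2), the sketch below compares the on-axis irradiance with the irradiance at an off-axis field angle; all numbers are illustrative assumptions, and the cosine-fourth term could equally be replaced by a measured RI factor.

```python
import numpy as np

def ideal_irradiance(L_e_lambda, N_w, T, field_angle_rad):
    """Equation (3.2): ideal spectral irradiance at the SP from the scene
    spectral radiance, working f-number, lens transmittance and the
    cosine-fourth fall-off (an RI factor would replace cos^4 in practice)."""
    return (np.pi / 4.0) * L_e_lambda * (T / N_w ** 2) * np.cos(field_angle_rad) ** 4

# Illustrative values: L_e,lambda = 0.01 W sr^-1 m^-2 nm^-1, N_w = 4, T = 0.9
on_axis = ideal_irradiance(0.01, 4.0, 0.9, 0.0)
off_axis = ideal_irradiance(0.01, 4.0, 0.9, np.deg2rad(20.0))
print(off_axis / on_axis)   # cos^4(20 deg) ~ 0.78: natural fall-off away from the OA
```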

3.1.3 Point spread function (PSF)


In reality, a point on the OP will not be imaged as a point on the SP due to the
limited fidelity of the camera system. Light originating from a point on the OP will
always be distributed over an area surrounding the ideal image point. This leads to
blurring of the ideal point image, which can be described mathematically by a point
spread function (PSF) [1]. A variety of components in a camera system will
contribute to the blurring effect. Each contribution can be modelled by its own
PSF, and these can all be combined together to define an overall camera system
PSF.
Figure 3.2 shows a selection of real lens PSFs obtained using a microscope [12].
The first six images represent moderate IQ and are typical point spreads for fast
(large aperture) lenses used at maximum aperture or for wide-angle lenses at the
edge of the frame, while image 7 shows a PSF representative of a lens with
outstanding performance [12]. Image 8 is for the same lens that produced image 7
but an OLPF has been included. As described in section 3.4, OLPFs split light in
order to reduce an unwanted effect known as aliasing.


Figure 3.2. Point spread functions obtained using a microscope. The white square represents the size of a
photosite with 8.5 μ m pixel pitch. Figure reproduced from [12] with kind permission from ZEISS (Carl Zeiss AG).

3.1.4 Linear shift invariance


When light originating from the entire field of view is projected by the lens onto the
SP, every ideal point image will be blurred. In turn, the entire image will be blurred
in comparison with the ideal optical image described by equation (3.2).
It is useful to represent the ideal and real spectral irradiance distributions at the
SP by the input and output functions f(x, y) and g(x, y):
f(x, y) = E_{\lambda,\mathrm{ideal}}(x, y)
g(x, y) = E_{\lambda}(x, y).   (3.3)
Both f (x , y ) and g (x , y ) depend on wavelength λ, but for clarity this dependence
will not be explicitly indicated in this chapter.
Since the PSFs associated with each position on the SP will all overlap, evaluating
the real blurred image g (x , y ) may seem an intractable task. However, it turns out
that g (x , y ) can be straightforwardly determined provided two conditions are
satisfied.
• The PSF denoted by h(x , y ) must not change functional form when it is
shifted over the SP. This condition is referred to as shift invariance.
• A linear mapping must exist between f (x , y ) and g (x , y ).

A system that is both linear and shift-invariant is referred to as isoplanatic or linear


shift-invariant (LSI) [1, 4]. In practice, a camera system will only be approximately
LSI. For example, the first condition does not hold in the presence of residual lens
aberrations. However, in practice, the camera system can be treated as a group of
subsystems that are approximately LSI over specific regions of the SP. The second
condition holds in terms of spectral irradiance provided the lighting is incoherent,
which is a reasonable approximation under typical photographic conditions.


When the camera system is treated as LSI, the output function g (x , y ) can be
determined at an arbitrary position on the SP via the following convolution integral:
g(x, y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x', y')\, h(x - x', y - y')\, dx'\, dy'.   (3.4)

This can be written more compactly by introducing the convolution operator ∗:

g ( x , y ) = f ( x , y ) ∗ h( x , y ) . (3.5)

An informal derivation of the convolution integral is given below. Subsequently, the


convolution operation will be illustrated with some simple examples.

3.1.5 Convolution: derivation


This section gives an informal derivation of the convolution integral in 1D.
1. Mathematically, a point on the SP in 1D can be represented by a Dirac delta
function δ (x − x0 ). A delta function has the following properties:
\delta(x - x_0) = 0 \quad \text{when } x \neq x_0
\int_{-\infty}^{\infty} \delta(x - x_0)\, dx = 1.
The first property states that δ (x − x0 ) is zero everywhere except at the
location x0. The second property ensures that δ (x − x0 ) defines an area of
unity. This means that the height of a delta function approaches infinity as its
width becomes infinitesimally small. Nevertheless, the important character-
istic is the area, and so a delta function is represented graphically by an
upward arrow with height equal to unity, as shown in figure 3.4(a).
Weighting a function by δ (x − x0 ) and integrating over all space yields the
function value at x0:
\int_{-\infty}^{\infty} f(x)\, \delta(x - x_0)\, dx = \begin{cases} f(x_0), & x = x_0 \\ 0, & x \neq x_0 \end{cases}   (3.6)

2. For illustration purposes, assume that the system is ideal apart from the
existence of an isolated PSF centred at position x0 on the SP and denoted by
h(x − x0 ). If the delta function at x0 is replaced by the PSF, which also
defines an area of unity, then weighting f (x ) by h(x − x0 ) and integrating
over all space will yield the output function value g (x0 ). In contrast to the
delta function, the PSF also has non-zero values at a range of positions x = xi
surrounding x0. Denoting the domain or kernel of the PSF by A, it follows
that
\int_{-\infty}^{\infty} f(x)\, h(x - x_0)\, dx = \begin{cases} g(x_0), & x = x_0 \\ g(x_i), & x = x_i \in A \\ 0, & |x - x_0| \notin A \end{cases}


3. Now consider a real system where blur exists at all positions on the SP. In
this case, all the PSFs associated with every point on the SP will overlap. In
order to determine the output function g (x ) at an arbitrary position x0, all
point spread towards position x0 needs to be evaluated. In an LSI system, it
turns out that this can be achieved simply by making the replacement
h(x − x0 ) → h(x0 − x ). In 1D, this amounts to flipping the PSF,

\int_{-\infty}^{\infty} f(x)\, h(x_0 - x)\, dx = g(x_0).
To see this graphically, consider the example PSF shown by the black curve in
figure 3.3 centred at point x0. This example PSF is symmetric for simplicity, but
this need not be the case in general. Three example points x1, x2 and x3
contained within the domain of the PSF are shown. The contribution to
g (x0 ) from x0 itself is the product f (x0 )h(x0 − x0 ) or f (x0 )h(0). The contri-
bution from x1 is seen to be the product f (x1)h(x0 − x1). Similarly, the
contributions from x2 and x3 are seen to be f (x2 )h(x0 − x2 ) and
f (x3)h(x0 − x3), respectively. The PSF has insufficient width to contribute to
g (x0 ) when centred at positions beyond x3. In summary,
g(x0) = f (x0)h(x0 − x0) + f (x1)h(x0 − x1)
+ f (x2 )h(x0 − x2 ) + f (x3)h(x0 − x3).

4. Since x actually varies continuously between x0 and x3, there are many more
contributions that must be accounted for,
g(x_0) = \int_{-x_3}^{x_3} f(x')\, h(x_0 - x')\, dx'.

Figure 3.3. Informal derivation of the convolution operation in 1D. The input function f (x ) is shown at
positions xi = x0, x1, x2 and x3. The PSF denoted by h(x ) is assumed to be shift-invariant.


The integration limits have been defined to take into account contributions
from both the left-hand side and right-hand side of x0. Moreover, the value x3
must be replaced by infinity to take into account a PSF of infinite extent,

g(x_0) = \int_{-\infty}^{\infty} f(x')\, h(x_0 - x')\, dx'.
The end result is that the output function g (x0 ) is given by the product of the
overlap of the input function f (x ) with the PSF centred at x0. In other words,
the functional form of the spread around x0 that contributes to the output at
x0 is defined by the PSF itself.
5. In an LSI system, the complete output function g (x ) can be obtained by
performing the above calculation at every output position x,

g(x) = \int_{-\infty}^{\infty} f(x')\, h(x - x')\, dx'.

This result generalises to the 2D expression defined by equation (3.4).
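The flip-and-overlap procedure above translates directly into a brute-force discrete sum. The following sketch (with an arbitrary Gaussian kernel standing in for a PSF) evaluates it on a uniform grid and checks the result against NumPy's built-in convolution; grid size and kernel width are illustrative choices.

```python
import numpy as np

def convolve_1d(f, h, dx):
    """Direct evaluation of g(x) = integral of f(x') h(x - x') dx' on a
    uniform grid of spacing dx, following the flip-and-overlap argument."""
    n = len(f)
    g = np.zeros(n)
    for i in range(n):          # output position x_i
        for j in range(n):      # dummy variable x'_j
            k = i - j           # index corresponding to h(x_i - x'_j)
            if 0 <= k < n:
                g[i] += f[j] * h[k] * dx
    return g

x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
f = np.sin(2.0 * np.pi * 3.0 * x)            # example input function
h = np.exp(-0.5 * ((x - 0.5) / 0.02) ** 2)   # example PSF (Gaussian blur)

g = convolve_1d(f, h, dx)
print(np.allclose(g, np.convolve(f, h, mode="full")[:len(f)] * dx))   # True
```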

3.1.6 Convolution: examples


1. Although not achievable in a camera system, the ideal PSF in 1D is the delta
function itself,
h(x − x′) = δ(x − x′).

In this case, the PSF becomes infinitesimally narrow and its height extends to
infinity so as to preserve an area of unity. Since there is no spread around x
contributing to the output at x, the output function remains unchanged
compared to the input function,

g(x) = \int_{-\infty}^{\infty} f(x')\, \delta(x - x')\, dx' = f(x).
The next simplest PSF is a scaled delta function where B is a constant,
h(x − x′) = B δ(x − x′).
Again there is no spread around x contributing to the output at x and so the
result is a simple multiplication,

g(x) = B \int_{-\infty}^{\infty} f(x')\, \delta(x - x')\, dx' = B f(x).
2. Consider the input function f (x ) defined by the rectangle function illustrated
in figure 3.4(b),
f (x ) = rect(x ).



Figure 3.4. (a) Delta function of unit area positioned at x0. (b) 1D rectangle function centred at x0. (c) 1D
triangle function centred at x0.

The rectangle function for general x0 is defined by


⎧ x − x0 1
⎪ 1, <
⎪ wx 2
⎛ x − x0 ⎞ ⎪⎪1 x − x0 1
rect⎜ ⎟=⎨ , = .
⎝ wx ⎠ ⎪ 2 wx 2

⎪ 0, x − x0 1
⎪ >
⎩ wx 2
In the present example, the rectangle function is centred at x0 = 0 and the
width wx = 1. Let the PSF similarly be a rectangle function centred at x0 = 0,
h(x ) = rect(x ).
The output function is given by the following convolution integral:

g(x) = \int_{-\infty}^{\infty} \mathrm{rect}(x')\, \mathrm{rect}(x - x')\, dx'.
The solution can be determined graphically according to figure 3.5. Starting
at x = −∞, the function f (x′) = rect(x′) along with the flipped PSF defined by
h(x − x′) = rect(x − x′) are plotted using x′ as the dummy variable on the
horizontal axis [1]. The output function g( −∞) is equal to the area of overlap
of rect(x′) and rect(x − x′). A new value for x is chosen and the procedure is
repeated. The flipped PSF shifts along the axis as x increases, and the overlap
must be calculated at each value of x.
In the present example, the area of overlap is easily seen to be zero for
−∞ < x ⩽ −1 and 1 ⩽ x < ∞, but is positive for values of x between −1 and
1. For example, figure 3.5(a) shows the overlap area is 0 for x = −1, and
figure 3.5(b) shows that the overlap area is 0.5 for x = −0.5. The overall
result turns out to be the triangle function centred at x0 = 0 shown in
figure 3.5(c),

g(x ) = tri(x ).


Figure 3.5. Graphical solution of the convolution integral of example 2. The input function f (x′) is shown by
the blue rectangle function. The flipped PSF is shown in yellow. (a) For x = −1, the area of overlap is zero and
so g(−1) = 0. (b) For x = −0.5, the area of overlap shown in grey is equal to 0.5 and so g(−0.5) = 0.5. (c) The
complete output function g (x ) is a triangle function.

The triangle function for general x0 is illustrated in figure 3.4(c). It is defined


as follows:
\mathrm{tri}\!\left(\frac{x - x_0}{w_x}\right) = \begin{cases} 0, & \left|\dfrac{x - x_0}{w_x}\right| \geqslant 1 \\[4pt] 1 - \left|\dfrac{x - x_0}{w_x}\right|, & \left|\dfrac{x - x_0}{w_x}\right| < 1 \end{cases}
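The graphical result of example 2 can be confirmed numerically; the sketch below (with an arbitrary grid spacing) convolves a sampled unit rectangle with itself and compares the outcome with the triangle function.

```python
import numpy as np

dx = 0.001
x = np.arange(-3.0, 3.0, dx)

rect = np.where(np.abs(x) < 0.5, 1.0, 0.0)   # unit-width rectangle centred at x0 = 0

# Convolve rect with itself; the factor dx turns the discrete sum into an integral
g = np.convolve(rect, rect, mode="same") * dx

tri = np.clip(1.0 - np.abs(x), 0.0, None)    # unit triangle centred at x0 = 0
print(np.max(np.abs(g - tri)))               # small (of order dx): g(x) ~ tri(x)
```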

3.1.7 Optical transfer function


PSFs give qualitative insight into IQ. Although some quantitative IQ measures are
based on the PSF, the shape of a PSF can be very complicated and difficult to
describe in simple numerical terms, as evident from figure 3.2. A much more useful
quantitative description of IQ is provided by the transfer function, which is the
Fourier transform (FT) of the PSF. For an optical system such as a camera, a
component transfer function is referred to as an optical transfer function (OTF) [13].
Consider the ideal and real spectral irradiance distributions at the SP denoted by
f (x , y ) and g (x , y ), respectively. These have been defined by equation (3.3),
f (x , y ) = Eλ,ideal(x , y )
g(x , y ) = Eλ(x , y ).
The camera system PSF denoted by h(x , y ) and its corresponding OTF denoted by
H (μx , μy ) are related by the following FT pair:


h(x, y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} H(\mu_x, \mu_y)\, e^{i 2\pi(\mu_x x + \mu_y y)}\, d\mu_x\, d\mu_y
H(\mu_x, \mu_y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x, y)\, e^{-i 2\pi(\mu_x x + \mu_y y)}\, dx\, dy.   (3.7)

Analogous expressions can be written for the FT pairs f (x , y ) ↔ F (μx , μy ) and


g (x , y ) ↔ G(μx , μy ).
The FT gives the prescription for constructing an arbitrary function as a
superposition of sinusoidal waveforms at various spatial frequencies μx and μy ,
phases, and amplitudes. The spatial frequencies are expressed in units of cycles per
unit distance, for example, cycles mm−1. These spatial frequencies, phases, and
amplitudes correspond to spectral irradiance waveforms in a 2D plane and are
characterised by the optical waves that travel from the OP to the SP.
Consider the real space representation of the convolution integral defined by
equation (3.5),
g(x , y ) = f (x , y ) ∗ h(x , y ).

A major advantage of the spatial frequency representation arises from the con-
volution theorem, which infers that the convolution operation in the real domain is
equivalent to a simple product in the Fourier domain. The above equation may
therefore be written as follows:
H(\mu_x, \mu_y) = \frac{G(\mu_x, \mu_y)}{F(\mu_x, \mu_y)}.

Recall that the total volume under a PSF is normalised to unity [14],

∫ ∫ h(x, y ) dx dy = 1.
Therefore, OTFs are always normalised to unity at (0,0).
The OTF is seen to provide a simple relationship between the ideal input and real
output spectral irradiance distributions as a function of spatial frequency. Since the
OTF is a complex quantity in general, more specific information can be extracted by
expressing the OTF in terms of its modulus and phase,
H(\mu_x, \mu_y) = |H(\mu_x, \mu_y)|\, e^{i\phi(\mu_x, \mu_y)}.

The modulus ∣H (μx , μy )∣ and phase ϕ(μx , μy ) are defined as the modulation transfer
function (MTF) and phase transfer function (PTF), respectively,

\mathrm{MTF}(\mu_x, \mu_y) = |H(\mu_x, \mu_y)|
\mathrm{PTF}(\mu_x, \mu_y) = \phi(\mu_x, \mu_y).
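In practice the OTF of a sampled PSF is obtained with a discrete FT. The sketch below does this in 1D for an assumed Gaussian PSF; the grid spacing and PSF width are illustrative choices. Because the PSF peak sits away from the first sample of the FFT grid, the PTF shows the linear phase ramp associated with a spatial shift.

```python
import numpy as np

dx = 0.001                                # sample spacing (mm)
x = np.arange(-0.5, 0.5, dx)
psf = np.exp(-0.5 * (x / 0.005) ** 2)     # illustrative Gaussian PSF
psf /= psf.sum() * dx                     # normalise the area under the PSF to unity

otf = np.fft.fft(psf) * dx                # discrete approximation to the FT
otf /= otf[0]                             # normalise the OTF to unity at zero frequency
mu = np.fft.fftfreq(len(x), d=dx)         # spatial frequencies (cycles/mm)

mtf = np.abs(otf)                         # modulation transfer function
ptf = np.angle(otf)                       # phase transfer function (linear ramp here)
print(mtf[:4], ptf[:4])
```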


3.1.8 Modulation transfer function (MTF)


The MTF has been defined above as the modulus of the OTF:
\mathrm{MTF}(\mu_x, \mu_y) = |H(\mu_x, \mu_y)| = \left|\frac{G(\mu_x, \mu_y)}{F(\mu_x, \mu_y)}\right|.   (3.8)

The MTF is commonly used as a fidelity metric in imaging. In order to gain physical
insight into the significance of the MTF, consider imaging a sinusoidal target pattern
at the OP defined by a single spatial frequency. The ideal spectral irradiance
waveform f (x , y ) at the SP is shown by the blue curve in figure 3.6. At the SP, the
spatial frequency of f is an image-space spatial frequency, which depends upon the
magnification. This spatial frequency has been denoted by μx0, μy0 .
The modulation depth of f is defined as the ratio of the magnitude of the waveform
(ac value) to the mean value (dc bias),
M_f = \frac{|f|}{\mathrm{dc}}.
Notice that the modulation depth is independent of the spatial frequency μx0, μy0 . In
an LSI system, convolving f with a PSF will yield a real output spectral irradiance
waveform g (x , y ) at the same spatial frequency μx0, μy0 and with the same dc bias [1].
However, the magnitude and therefore the modulation depth will be attenuated
according to equation (3.8),

M_g(\mu_{x0}, \mu_{y0}) = \frac{|g|}{\mathrm{dc}} = \frac{1}{\mathrm{dc}}\,|f|\left|\frac{G(\mu_{x0}, \mu_{y0})}{F(\mu_{x0}, \mu_{y0})}\right|.

In other words, the modulation depth of the output waveform depends upon the
spatial frequency μx0 , μy0 [1, 15]. Combining the above equations yields the
following:

Figure 3.6. The blue curve shows an ideal input irradiance waveform f (x, y ) defined by a single spatial
frequency μx0 , μy0 at the SP. The magenta curve shows an example reduction in magnitude after convolving
f (x, y ) with a PSF that partially fills in the troughs of the waveform. Phase differences have not been included.


|H(\mu_{x0}, \mu_{y0})| = \frac{M_g(\mu_{x0}, \mu_{y0})}{M_f}.
If the whole process is repeated but the spatial frequency of the sinusoidal target
pattern is changed each time, the following general function of image-space spatial
frequency will be obtained:
|H(\mu_x, \mu_y)| = \frac{M_g(\mu_x, \mu_y)}{M_f}.
This is precisely the MTF defined by equation (3.8).
Modulation depth can be interpreted by expressing the ac/dc ratio in the form of
the Michelson equation [15],
M = \frac{E_{\max} - E_{\min}}{E_{\max} + E_{\min}}.
Here Emin and Emax are the minimum and maximum values of the spectral
irradiance waveform at a specified spatial frequency. Since neither Emax nor Emin
can be negative, it must follow that 0 ⩽ M ⩽ 1. This interpretation is synonymous
with contrast, although contrast as a term is applied to square waveforms rather
than sinusoidal waveforms. If the camera component under consideration introduces
point spread, then
0 ⩽ Mg(μx , μy ) ⩽ Mf ⩽ 1.

Therefore the MTF is bounded by zero and unity,

0 ⩽ MTF(μx , μy ) ⩽ 1 .

Since most of the point spread is typically concentrated over a small blur spot
surrounding the ideal image point, the MTF will only be reduced slightly from unity
at low spatial frequencies. However, the MTF will drop significantly at higher
spatial frequencies as the peak to peak separation approaches the size of the blur
spot.
The spatial frequency at which the MTF drops to zero for the camera component
under consideration is defined as its cut-off frequency. Image information cannot be
transmitted above the cut-off frequency. This will be important for defining
resolving power in chapter 5.
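The sinusoidal-target interpretation can be reproduced numerically. In the sketch below a sinusoid of known spatial frequency is blurred by an assumed Gaussian kernel standing in for a camera component, and the ratio of output to input Michelson modulation gives the MTF at that frequency; all dimensions are illustrative.

```python
import numpy as np

dx = 0.001                                     # mm per sample
x = np.arange(0.0, 10.0, dx)
mu0 = 20.0                                     # target spatial frequency (cycles/mm)
f = 1.0 + 0.5 * np.sin(2.0 * np.pi * mu0 * x)  # dc bias 1.0, ac magnitude 0.5

xk = np.arange(-0.05, 0.05, dx)
h = np.exp(-0.5 * (xk / 0.005) ** 2)           # assumed Gaussian blur PSF
h /= h.sum()                                   # unit area (discrete)
g = np.convolve(f, h, mode="same")             # blurred output waveform

def michelson(w):
    return (w.max() - w.min()) / (w.max() + w.min())

inner = slice(1000, -1000)                     # ignore convolution edge effects
print(michelson(g[inner]) / michelson(f[inner]))   # MTF at 20 cycles/mm (< 1)
```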
As an example, a perfect aberration-free lens has an associated PSF that arises
due to lens aperture diffraction. For a circular lens aperture, the PSF is defined by
the well-known Airy pattern illustrated in figure 3.14 of section 3.2.4. The
corresponding MTF when the f-number is set at N = 4 is shown along one spatial
frequency direction in figure 3.7. Section 3.2 will discuss lens aperture diffraction in
further detail.


Figure 3.7. MTF due to lens aperture diffraction for monochromatic light with λ = 550 nm along one spatial
frequency direction for a perfect aberration-free lens set at N = 4. In this case the cut-off frequency is 455
cycles/mm.

3.1.9 Phase transfer function


As the modulus of the OTF, the MTF leaves phase information out of consid-
eration. The PTF describes the change in phase ϕ(μx , μy ) of the output sinusoidal
waveforms G(μx , μy ) relative to the input F (μx , μy ) as a function of spatial frequency.
Since these waveforms combine to form the ideal and real spectral irradiance
distributions at the SP represented by f (x , y ) and g (x , y ), respectively, any change
in phase can have a significant effect on the nature of the output image.
The PTF takes values between −π and +π . A PTF that is linear will shift the
image [4]. For example, the sinusoidal waveforms shown in figure 3.6 have spatial
frequency μx0 , μy0. A linear PTF at μx0, μy0 will shift the attenuated waveform shown
in magenta relative to the input waveform shown in blue.
If the PSF is symmetric, the PTF will be either zero or +π . These values
correspond to spatial frequency ranges where the OTF is positive or negative,
respectively [15]. A phase reversal will reverse the contrast at the specified spatial
frequency, meaning that peaks and troughs will switch places.
Certain residual lens aberrations such as coma are associated with real but
asymmetric PSFs. In this case the PTF will be a nonlinear function. Nonlinearities
cause phase distortions that can severely degrade the image.
Although not achievable in a camera system, a perfect imaging system would in
principle have a system MTF(μx , μy ) = 1 and PTF(μx , μy ) = 0 for all spatial
frequencies.

3.1.10 Model camera system


A simple model camera system consists of a lens, OLPF, and imaging sensor, all of
which provide contributions to the total camera system PSF. The camera system
PSF is a convolution of each component PSF,


hsystem(x , y ) = hoptics(x , y ) ∗ h OLPF(x , y ) ∗ hsensor(x , y ) .

Each of these component PSFs can be subdivided into further PSF contributions
that are similarly convolved together. The simple model described in this chapter
includes only the contribution to the lens PSF arising from lens aperture diffraction,
and the contribution to the sensor PSF arising from the detector aperture at each
photosite.
The aim is to determine the real spectral irradiance distribution at the SP denoted
by g (x , y ). This is obtained by convolving the camera system PSF with the ideal
spectral irradiance distribution denoted by f (x , y ):
g(x , y ) = hsystem(x , y ) ∗ f (x , y ),
where
f (x , y ) = Eλ,ideal(x , y )
g(x , y ) = Eλ(x , y ),
and Eλ,ideal(x , y ) is defined by equation (3.2),
E_{\lambda,\mathrm{ideal}}(x, y) = \frac{\pi}{4}\, L_{e,\lambda}\!\left(\frac{x}{m}, \frac{y}{m}\right) \frac{T}{N_w^{2}}\, \cos^4\!\left\{\varphi\!\left(\frac{x}{m}, \frac{y}{m}\right)\right\}.
Physically it is useful to think of the camera system PSF as a blur filter analogous to
those available in image editing software. The convolution operation is analogous to
the sliding of a blur filter over the ideal optical image to produce the real optical
image. The shape, diameter, and strength of the blur filter determines the level of
blur present in the real image. Most of the blur strength is concentrated in the region
close to the ideal image point.
It will be shown in section 3.3 that a sampling of the real spectral irradiance
distribution must accompany the detector-aperture contribution to the sensor PSF.
Ultimately this arises from the fact that sensor photosites are not point objects, and
so the charge signal obtained from a photosite is not a continuous function over the
SP. Mathematically, the sampling operation can be modelled as a multiplication by
the so-called comb function. The comb function is a 2D array of delta functions that
restrict the output of the convolution operation to the appropriate grid of sampling
positions defined by the photosite or sensor pixel spacings px and py,
\tilde{g}(x, y) = \left(h_{\mathrm{system}}(x, y) * f(x, y)\right)\, \mathrm{comb}\!\left[\frac{x}{p_x}, \frac{y}{p_y}\right].

The sampled real spectral irradiance distribution g̃ (x , y ) = E˜λ(x , y ) can be used to


model the charge signal and subsequent digital raw value obtained from each
photosite.
In practice it is more convenient to work in the Fourier domain. Since the FT of
each component PSF yields the component transfer function or component OTF,


the FT of the camera system PSF is the camera system transfer function or camera
system OTF,
Hsystem(μx , μy ) = FT{hsystem(x , y )} .

From the convolution theorem introduced in section 3.1.7, the camera system OTF
is obtained as a multiplication of the component OTFs,
Hsystem(μx , μy ) = Hoptics(μx , μy ) HOLPF(μx , μy ) Hsensor(μx , μy ) .

The camera system PSF is the inverse FT of the camera system OTF.
The following sections derive expressions for the optics, OLPF, and imaging
sensor contributions to the model camera system PSF and OTF.
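The Fourier-domain route is easy to sketch numerically: multiply the component OTFs sample by sample and inverse transform to recover the system PSF. The component curves below are simple placeholders chosen only to make the script self-contained; they are not the diffraction, OLPF or sensor transfer functions derived in the remainder of the chapter.

```python
import numpy as np

n, dx = 1024, 0.001                      # samples and sample spacing (mm)
mu = np.fft.fftfreq(n, d=dx)             # spatial frequency axis (cycles/mm)

# Placeholder component OTFs (illustrative shapes only)
H_optics = np.exp(-np.abs(mu) / 400.0)
H_olpf = np.cos(np.pi * mu * 0.005) ** 2
H_sensor = np.sinc(mu * 0.005)

# System OTF is the product of the component OTFs
H_system = H_optics * H_olpf * H_sensor

# System PSF is the inverse FT of the system OTF (sample values, up to grid scaling)
h_system = np.real(np.fft.ifft(H_system))
print(H_system[:3], h_system[:3])
```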

3.2 Optics
The Airy pattern is a famous pattern in optics that arises from the PSF due to
diffraction at a lens aperture. Since diffraction is of fundamental importance to IQ in
photography, this section traces the physical origins of the diffraction PSF, starting
from the wave equation. The mathematical expression for the diffraction PSF for a
circular lens aperture is arrived at in section 3.2.4.
This section also briefly shows how a description of lens aberrations can be
included within the framework of linear systems theory by introducing the wave-
front error.

3.2.1 Wave optics


Electromagnetic waves are transverse waves with oscillating electric field E and
magnetic field H components oriented perpendicular to each other and to the
direction of wave propagation, as illustrated in figure 3.8. The fields themselves
depend on position r = (x , y, z ) and time t. Being vector fields, they have orthogonal
components in the x, y and z directions,
E(r , t ) = (Ex(r , t ), Ey(r , t ), Ez(r , t ))
H(r , t ) = (Hx(r , t ), Hy(r , t ), Hz(r , t )).

Electromagnetic waves are described by Maxwell’s equations, which form the basis
of electromagnetic optics. However, the theory of electromagnetic optics greatly
simplifies when the propagation medium satisfies four important properties. If the
medium is linear, isotropic, uniform and nondispersive [13, 16], then Maxwell’s
equations reduce to the following two wave-type equations:
\nabla^2 \mathbf{E} - \frac{1}{c^2}\frac{\partial^2 \mathbf{E}}{\partial t^2} = 0
\nabla^2 \mathbf{H} - \frac{1}{c^2}\frac{\partial^2 \mathbf{H}}{\partial t^2} = 0.



Figure 3.8. Electromagnetic wave at a given instant in time propagating in the z-direction. The electric and
magnetic fields oscillate in phase and are perpendicular to each other and to the direction of propagation.

Here c is the speed of light in the medium. Significantly, these equations are also
satisfied by any of the individual vector field components:
\nabla^2 U(\mathbf{r}, t) - \frac{1}{c^2}\frac{\partial^2 U(\mathbf{r}, t)}{\partial t^2} = 0.   (3.9)
Here U (r, t ) can be any of Ex, Ey, Ez, Hx, Hy or Hz. In other words, it is not
necessary to solve for the full vector quantities when the medium satisfies the above
properties, and instead only one scalar wave equation needs to be solved. This is the
basis of the scalar theory of wave optics [13, 16].
At the interface between two dielectric media such as the lens aperture and the iris
diaphragm, equation (3.9) no longer holds for the individual components.
Nevertheless, the coupling at the interface can be neglected provided the aperture
area is large compared to the wavelength [13, 16]. In this case wave optics remains a
good approximation. On the other hand, the vector theory of electromagnetic optics
based upon Maxwell’s equations must be used when the medium does not satisfy one
of the properties of linearity, isotropy, uniformity or nondispersiveness. For
example, the vector theory is needed to describe polarization.
For monochromatic light, the complex solution of equation (3.9) can be written
U(r, t) = U(r)\, e^{i 2\pi\nu t}.
• ν is the optical frequency.
• U (r) is the complex field or complex amplitude,
U(r) = A(r)\, e^{i\phi(r)}.
• A(r) is a complex constant.
• ∣U (r)∣ defines the real amplitude.
• ϕ(r) is the phase describing the position within an optical cycle or wavelength.


Figure 3.9. Cross section of a spherical wave with the source at the origin. The phase is ϕ = arg(A) − kr and
the wavefronts satisfy kr = 2πn + arg(A), where n is an integer. The spacing between consecutive wavefronts is
given by λ = 2π /k .

Figure 3.9 shows example wavefronts for a spherical wave. These are positions with
the same phase. At a point on a wavefront, the vector normal to the wavefront at
that point describes the direction of propagation. These vectors can be associated
with the geometrical rays used in chapter 1.
Substituting U (r, t ) into equation (3.9) reveals that the complex amplitude must
satisfy the Helmholtz equation,
(\nabla^2 + k^2)\, U(r) = 0.

The wavenumber k corresponding to λ is defined by

k = \frac{2\pi\nu}{c} = \frac{2\pi}{\lambda}.
Since the Helmholtz equation is a linear equation, a linear combination of any two
solutions is also a solution. This means that the principle of superposition applies
and the sum of the individual complex amplitudes at any given position gives the
total complex amplitude at that position [13, 16].
When imaging using polychromatic light, the camera system PSF would be based
on the complex amplitude if the light were to be perfectly correlated or fully
coherent. Point sources such as lasers can produce spatially coherent light, which will
also have high temporal coherence provided the range of wavelengths defines a
narrow band. However, extended sources such as the Sun give rise to light that is
both temporally and spatially random or incoherent. In this case, the response of the
camera system is linear in irradiance, which was defined in section 3.1.1 as the power
per unit area received by a specified surface measured in W/m2, and so the camera
system PSF should be based on irradiance. For incoherent illumination, irradiance is
simply the squared magnitude of the complex amplitude:

E_e(r) = |U(r)|^2.   (3.10)


This relation is valid provided the irradiance is stationary, meaning that fluctuations
will occur over very short time scales but the average value will be independent of
time. In this case the time dependence of the wave solutions does not need to be
considered.
In reality, light is generally partially coherent, which requires a much more
complicated analysis [16]. In this chapter, the light will be assumed to be
polychromatic and incoherent, which is a reasonable approximation under typical
photographic conditions. In this simplified approach, the light can be treated
mathematically as monochromatic but incoherent, and the polychromatic effects
can be taken into account simply by integrating the spectral irradiance at the SP over
the spectral passband [4, 6]. The spectral passband of the camera is the range of
wavelengths over which it responds, λ1 → λ2 . The advantage of this approach is that
the camera response functions can be included as weighting functions when
integrating over the spectral passband. This will become apparent when deriving
the charge signal in section 3.6.
In the theory of wave optics, power per unit area is commonly referred to as the
optical intensity or simply the intensity. These terms will not be used in this book so
as to avoid confusion with the radiant intensity, which is a directional quantity
involving solid angle [9].

3.2.2 Huygens–Fresnel principle


An important solution of the Helmholtz equation written in spherical coordinates is
the spherical wave,
U_{\mathrm{sphere}}(r) = \frac{A}{r}\, e^{-ikr}.
Here A is a complex constant and r is the radial distance. As shown in figure 3.9, the
wavefronts are spheres centred at the origin and so the wave propagates radially.
The spherical wave is important because the Huygens–Fresnel principle states that
all points on any wavefront actually spread out as spherical waves. In other words,
the points on a wavefront are themselves sources of spherical waves, the super-
position of which maintains the wavefront as it propagates.
The Fresnel–Kirchhoff equation is a mathematical expression of the Huygens–
Fresnel principle within wave optics. Denoting the complex amplitude or field for a
distribution of source points {x1, y1} on a plane perpendicular to the OA by U (x1, y1),
the resulting field at points {x2, y2 } on a plane separated by a distance z from the
source is given by

U(x_2, y_2) = \frac{1}{i\lambda} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} U(x_1, y_1)\, \frac{e^{ikR}}{|R|}\, dx_1\, dy_1.


Here R is the distance between (x1, y1) and (x2, y2 ),

|R| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + z^2}.

Since R depends only on the distance perpendicular to the OA between the source
points and the resulting field points rather than on their absolute positions in the
plane, R ≡ R(x2 − x1, y2 − y1), the Fresnel–Kirchhoff equation is seen to be a
convolution of the source field with a spherical wave impulse response,
U(x_2, y_2) = U(x_1, y_1) * \left(\frac{1}{i\lambda}\frac{e^{ikR}}{|R|}\right).
The impulse response is an example of a Green’s function [13].
The Huygens–Fresnel principle can be observed when a wave passes through a
narrow opening such as a lens aperture. Spherical waves are seen to propagate
radially outwards, but the effects of superposition are restricted at points nearer to
the edges of the aperture. Consequently the whole wavefront is distorted as shown in
figure 3.10, leading to a visible diffraction pattern of constructive and destructive
interference. The diffraction pattern leads to a blurring of the optical image at the
SP. This blur contribution can be modelled by the diffraction PSF.

3.2.3 Aperture diffraction PSF


Due to the importance of diffraction in relation to IQ in photography, a simplified
derivation of the diffraction PSF for incoherent light is given below in order to illustrate

Figure 3.10. Diffraction of incoming plane waves at an aperture. Spherical waves propagating radially
outward are shown from three example source points marked by crosses. Superposition of spherical waves is
limited closer to the edges of the aperture leading to a spreading out of the waves and a visible diffraction
pattern at the SP.


the basic underlying principles. The general result is defined by equation (3.16), and the
result for the specific case of a circular aperture is defined by equation (3.19).
The following derivation proceeds by first determining the diffraction PSF for
fully coherent illumination. This is the diffraction amplitude PSF as it is based on
complex amplitude. A delta function source point at the axial position on the OP is
considered, and the lens is treated as a thin lens with the aperture stop at the lens. An
expression is derived for the complex amplitude response at the SP arising from the
delta function input. The magnification is then used to project the input coordinates
onto the SP to form an LSI system valid for coherent illumination, and the thin lens
model is generalised to a compound lens model. For fully incoherent illumination,
the diffraction PSF is obtained in terms of irradiance as the squared magnitude of
the amplitude PSF.
In this section, the following notation is used to distinguish spatial coordinates on
the OP, aperture plane, exit pupil (XP) plane and SP:

(xop, yop) : coordinates on the OP


(xap, yap) : coordinates on the aperture plane
(xxp, yxp) : coordinates on the XP plane
(x , y ) : coordinates on the SP

(1) Thin lens model


Figure 3.11 shows the wavefronts entering a lens from an object point
positioned on the OA. The wavefronts are modified by the lens and
subsequently converge towards the SP. As a result of diffraction at the
lens aperture, the corresponding field at the SP will be a distribution
described by a PSF rather than a point.
Let the field U (xop, yop ) at the object position (xop, yop ) be described by a
delta function. When the Fresnel–Kirchhoff equation is applied to this field,
the integration vanishes and the field at the lens becomes


Figure 3.11. A spherical wave emerging from an axial source position is modified by the phase transmission
function of the lens and subsequently converges towards the ideal image position at the SP.


U(x_{ap}, y_{ap}) = \frac{1}{i\lambda}\frac{e^{ikR}}{|R|}.
Here R is the distance between the object position (xop, yop ) and the position
on the aperture plane (xap, yap ),

R = \sqrt{(x_{ap} - x_{op})^2 + (y_{ap} - y_{op})^2 + s^2}.

The geometry is shown in figure 3.12. The lens modifies the spherical
wavefront at the aperture plane in the following manner:
U ′(xap, yap) = U (xap, yap) a(xap, yap) tl (xap, yap).

Here tl is the phase transformation function [13, 16] for a converging thin lens
with focal length f,
t_l(x_{ap}, y_{ap}) = \exp\left(-i\,\frac{k}{2f}\left(x_{ap}^2 + y_{ap}^2\right)\right).
This form of tl neglects aberrations and is strictly valid only in the paraxial
region [13]. The aperture function a(xap, yap ) restricts the size of the aperture
by taking a value of unity where the aperture is clear, and zero where the
aperture is blocked.
Now the Fresnel–Kirchhoff equation can be applied to the set of source
points {(xap, yap )} lying in the aperture plane. The resulting field at the SP is
given by

U'(x, y) = \frac{1}{i\lambda} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} U'(x_{ap}, y_{ap})\, \frac{e^{ikR'}}{|R'|}\, dx_{ap}\, dy_{ap}.

Figure 3.12. The set of coordinates {(xap, yap )} lie in the aperture plane for a thin lens with the aperture at the
lens. The coordinates (xop, yop ) and (x, y ) denote the source and image positions, respectively. The distances
R and R′ are indicated for an example coordinate (xap, yap ) .


Here R′ is the distance between (xap, yap ) and (x , y ),

R' = \sqrt{(x - x_{ap})^2 + (y - y_{ap})^2 + s'^2}.

The complex amplitude U ′(x , y ) is the amplitude response due to the delta
function input and can be denoted hamp,diff (xop, yop ; x , y ). This emphasizes
the dependence on the OP coordinates through R. Putting everything
together leads to the following expression:
h_{amp,diff}(x_{op}, y_{op}; x, y) = \frac{1}{(i\lambda)^2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} t_l(x_{ap}, y_{ap})\, a(x_{ap}, y_{ap})\, \frac{e^{ikR}}{|R|}\, \frac{e^{ikR'}}{|R'|}\, dx_{ap}\, dy_{ap}.   (3.11)
The response hamp,diff (xop, yop ; x , y ) depends upon both the OP and SP
coordinates. It is therefore shift variant and not yet in the form of a PSF.
Before proceeding further, various simplifying approximations need to be
made.
(2) Simplifying approximations
Close to the OP, the spherical wavefronts can be approximated as parabolic
wavefronts. This is achieved by performing a Taylor expansion of R in the
numerator of equation (3.11) and dropping higher-order terms. At the lens,
the parabolic phase term can be further expanded and terms dropped. This
leads to a plane wave approximation,
\frac{e^{ikR}}{|R|} \approx \frac{e^{ikR_{plane}}}{s},
where
R_{plane} = s + \frac{x_{op}^2 + y_{op}^2}{2s} + \frac{x_{ap}^2 + y_{ap}^2}{2s} - \frac{x_{ap} x_{op} + y_{ap} y_{op}}{s}.
After diffraction by the lens aperture, similar approximations can be made.
The parabolic approximation in the so-called near-field region leads to a
description of diffraction known as Fresnel diffraction. The plane-wave
approximation in the far-field region where the SP is located leads to a
description known as Fraunhofer diffraction,

\frac{e^{ikR'}}{|R'|} \approx \frac{e^{ikR'_{plane}}}{s'},
where
R'_{plane} = s' + \frac{x^2 + y^2}{2s'} + \frac{x_{ap}^2 + y_{ap}^2}{2s'} - \frac{x_{ap} x + y_{ap} y}{s'}.


Substituting these expressions together with the phase transformation function t_l into equation (3.11) leads to an equation containing many terms,

h_{amp,diff}(x_{op}, y_{op}; x, y) \approx \frac{e^{ik(s+s')}}{(i\lambda)^2 ss'}
\times \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} a(x_{ap}, y_{ap}) \exp\left\{\frac{ik}{2s'}(x^2 + y^2)\right\} \exp\left\{\frac{ik}{2s}\left(x_{op}^2 + y_{op}^2\right)\right\}
\times \exp\left\{\frac{ik}{2}\left(x_{ap}^2 + y_{ap}^2\right)\left(\frac{1}{s} + \frac{1}{s'} - \frac{1}{f}\right)\right\}
\times \exp\left\{\frac{-ik}{s}\left(x_{ap} x_{op} + y_{ap} y_{op}\right)\right\} \exp\left\{\frac{-ik}{s'}\left(x_{ap} x + y_{ap} y\right)\right\} dx_{ap}\, dy_{ap}.
Fortunately, this expression can be simplified. Arguments can be given for
dropping the final exponential term on the second line [13]. Furthermore,
the exponential term on the third line vanishes according to the Gaussian
lens conjugate equation in air,
\frac{1}{s} + \frac{1}{s'} - \frac{1}{f} = 0.

The simplified expression becomes

h_{amp,diff}(x_{op}, y_{op}; x, y) \approx \frac{e^{ik(s+s')}}{(i\lambda)^2 ss'} \exp\left\{\frac{ik}{2s'}(x^2 + y^2)\right\} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} a(x_{ap}, y_{ap})
\times \exp\left\{\frac{-ik}{s}\left(x_{ap} x_{op} + y_{ap} y_{op}\right)\right\} \exp\left\{\frac{-ik}{s'}\left(x_{ap} x + y_{ap} y\right)\right\} dx_{ap}\, dy_{ap}.   (3.12)

(3) Shift invariance


The final line of equation (3.12) can be rewritten by substituting the
magnification m = −s′/s according to the optical rather than photographic
sign convention,

h_{amp,diff}(x - mx_{op}, y - my_{op}) \approx \frac{e^{ik(s+s')}}{(i\lambda)^2 ss'} \exp\left\{\frac{ik}{2s'}(x^2 + y^2)\right\}
\times \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} a(x_{ap}, y_{ap}) \exp\left\{\frac{-ik}{s'}(x - mx_{op})\, x_{ap}\right\} \exp\left\{\frac{-ik}{s'}(y - my_{op})\, y_{ap}\right\} dx_{ap}\, dy_{ap}.

The magnification projects the object-space coordinates onto the SP and so


(mxop, myop ) are image-space coordinates.
For fully coherent lighting, the system is now in a form that is LSI. In this
case, the diffraction amplitude PSF is given by


h_{amp,diff}(x, y, \lambda) \approx \frac{e^{ik(s+s')}}{(i\lambda)^2 ss'} \exp\left\{\frac{ik}{2s'}(x^2 + y^2)\right\}
\times \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} a(x_{ap}, y_{ap})\, e^{-i 2\pi\left(\frac{x}{\lambda s'}\right) x_{ap}}\, e^{-i 2\pi\left(\frac{y}{\lambda s'}\right) y_{ap}}\, dx_{ap}\, dy_{ap}.   (3.13)

In the second line, the wavenumber k has been replaced by making the
substitution k = 2π /λ .
If the light is fully incoherent, the system is not linear in complex amplitude
and so hamp,diff (x , y, λ ) does not define a PSF. Instead, the system is linear in
irradiance. The diffraction PSF for incoherent lighting can be straightfor-
wardly determined from hamp,diff (x , y, λ ), as shown in step (6) below.
(4) Fourier transformation
By comparing equation (3.13) with the FT defined by equation (3.7), the
second line should be recognisable as the FT of the aperture function,
A(μx , μy ) = FT{a(x , y )}.

This defines the amplitude transfer function. However, the spatial frequencies
μx and μy are to be substituted by the following real-space quantities defined
at the SP:
\mu_x = \frac{x}{\lambda s'}, \qquad \mu_y = \frac{y}{\lambda s'}.
Now equation (3.13) may be written as follows:
h_{amp,diff}(x, y, \lambda) \approx \frac{e^{ik(s+s')}}{(i\lambda)^2 ss'} \exp\left\{\frac{ik}{2s'}(x^2 + y^2)\right\} A\!\left(\mu_x = \frac{x}{\lambda s'}, \mu_y = \frac{y}{\lambda s'}\right).
When the illumination is coherent, this important result shows that the real
optical image at the SP is a convolution of the ideal image predicted by
geometrical optics with a PSF that is the Fraunhofer diffraction pattern of the
XP [13].
(5) Compound lens model
The derivation for the thin lens model can be generalised to the case of a
general compound lens by associating all diffraction effects with either the
entrance pupil (EP) or the XP [13].
Taking the viewpoint that all diffraction effects are associated with the XP,
the coordinates of the aperture function must now be considered on the XP
plane instead. The aperture function must take a value of unity where the
projected aperture is clear, and zero where the projected aperture is blocked:

h_{amp,diff}(x, y, \lambda) \approx \frac{U_0\, e^{ikz'}}{i\lambda z'} \exp\left\{\frac{ik}{2z'}(x^2 + y^2)\right\} A\!\left(\mu_x = \frac{x}{\lambda z'}, \mu_y = \frac{y}{\lambda z'}\right).   (3.14)

Here U0 is a complex constant, and A(x /λz′, y /λz′) is the FT of the aperture
function evaluated at the spatial frequencies μx = x /λz′ and μy = y /λz′.


The distance s′ measured from the aperture plane to the SP for the thin
lens has been replaced by the distance z′ measured from the XP to the SP.
The Gaussian expression for z′ is
z' = s' - s'_{xp} = \frac{n'}{n} D_{xp} N_w.   (3.15)

Here s' and s'_{xp} are the distances from the second principal plane to the SP and XP, respectively, D_{xp} is the diameter of the XP, and N_w is the working f-number.
(6) Incoherent illumination
When the illumination is incoherent, the system is linear in irradiance rather
than complex amplitude. In this case, the diffraction PSF is obtained by
utilizing equation (3.10) and taking the squared magnitude of the amplitude
PSF defined by equation (3.14):

h_{diff}(x, y, \lambda) = \frac{U_0^2}{(\lambda z')^2}\left| A\!\left(\mu_x = \frac{x}{\lambda z'}, \mu_y = \frac{y}{\lambda z'}\right)\right|^2.   (3.16)

The volume under the PSF should be normalised to unity, and this
expression is valid for source points on or very close to the OA.

This leads to the important result that for incoherent illumination, the real optical
image at the SP is a convolution of the ideal image predicted by geometrical optics
with a PSF that is proportional to the squared magnitude of the Fraunhofer diffraction
pattern of the XP [13].
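This result can be demonstrated directly: sample an aperture function on a grid, take its discrete FT and square the magnitude. The grid size and pupil radius below are arbitrary illustrative choices; with a circular pupil the familiar Airy-like pattern of section 3.2.4 emerges.

```python
import numpy as np

n = 512
yy, xx = np.mgrid[-n // 2:n // 2, -n // 2:n // 2]
aperture = (np.sqrt(xx ** 2 + yy ** 2) < 60.0).astype(float)   # clear circular pupil

# Incoherent diffraction PSF ~ squared magnitude of the FT of the aperture function
amplitude = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(aperture)))
psf = np.abs(amplitude) ** 2
psf /= psf.sum()                      # normalise the volume under the PSF to unity

print(psf[n // 2, n // 2])            # bright central peak of the Airy-like pattern
```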

3.2.4 Circular aperture: Airy disk


For a circular lens aperture, the aperture function describing the XP plane can be
modelled by a radially symmetric circle function (figure 3.13) of diameter
D_{xp} = m_p D, where m_p is the pupil magnification and D is the diameter of the EP,

a_{circ}(x_{xp}, y_{xp}) = \mathrm{circ}\!\left[\frac{\sqrt{x_{xp}^2 + y_{xp}^2}}{D_{xp}}\right] = \mathrm{circ}\!\left[\frac{r_{xp}}{D_{xp}}\right] = \begin{cases} 1, & r_{xp} < D_{xp}/2 \\ \tfrac{1}{2}, & r_{xp} = D_{xp}/2 \\ 0, & r_{xp} > D_{xp}/2 \end{cases}

The FT of an aperture function modelled by a circle function is the so-called jinc
function:

A_{circ}(\mu_x, \mu_y) = \mathrm{jinc}\!\left[D_{xp}\sqrt{\mu_x^2 + \mu_y^2}\right] = \mathrm{jinc}\!\left[D_{xp}\mu_r\right] = \frac{2 J_1\!\left[\pi D_{xp}\mu_r\right]}{\pi D_{xp}\mu_r}.   (3.17)

Here μr is the radial spatial frequency, and J1 is a Bessel function of the first kind.


Figure 3.13. Circle function used to model the aperture function.

For incoherent illumination, the diffraction PSF for a circular aperture is


obtained by substituting equation (3.17) into (3.16),

h_{diff,circ}(x, y, \lambda) = \frac{U_0^2}{(\lambda z')^2}\left(\frac{\pi D_{xp}^2}{4}\right)^{2} \mathrm{jinc}^2\!\left(D_{xp}\sqrt{\left(\frac{x}{\lambda z'}\right)^2 + \left(\frac{y}{\lambda z'}\right)^2}\right).   (3.18)

For the special case that focus is set at infinity, z' → m_p f' = D_{xp} N. Now substituting for z' yields the following expression:

h_{diff,circ}(x, y, \lambda) = \left(\frac{U_0\, \pi D_{xp}}{4\lambda N}\right)^{2} \left[\frac{2 J_1[\alpha(x, y, \lambda)]}{\alpha(x, y, \lambda)}\right]^{2}.   (3.19)

The α variable is defined by


\alpha(x, y, \lambda) = \frac{\pi\sqrt{x^2 + y^2}}{\lambda N} = \frac{\pi r}{\lambda N}.
A 1D illustration of the PSF is given in figure 3.14. Illustrations in 2D and 3D are
given in figure 3.15. A sequence of diffraction rings where the PSF is zero are
apparent. If r, the radial distance on the SP, is measured in units of λN , these rings
occur when J1[πr ] = 0. The first zero ring has the following solution:

r = 1.22 λN .
This defines the well-known Airy disk, and the diameter or spot size of the Airy disk
is 2.44 λN .
The spot size contains the major contribution to the point spread associated with
diffraction. Since the spot size becomes wider as the f-number N increases, increased
blurring of the image occurs with increasing N. This effect is known as diffraction
softening. Landscape photographers often need to maximise depth of field by using a
small aperture while simultaneously avoiding noticeable diffraction softening. On a
35 mm full-frame camera, around N = 11 is generally considered to achieve the
optimum balance when the output image is viewed under standard viewing
conditions.


Figure 3.14. Cross-section of the incoherent diffraction PSF for a single wavelength with the lens focused at
infinity. The lens aperture is unobstructed and circular, and the lighting is incoherent. The horizontal axis
represents radial distance from the origin measured in units of λN . The first zero ring at a radial distance
rAiry = 1.22 λN defines the Airy disk. The volume under the PSF should be normalised to unity.

Figure 3.15. Incoherent diffraction PSF for a single wavelength λ with the lens focused at infinity. The lens
aperture is unobstructed and circular, and the lighting is incoherent. No contrast adjustments have been made.

3.2.5 Aperture diffraction MTF


Recall the expression for the diffraction PSF in the presence of incoherent
illumination defined by equation (3.16),
h_{diff}(x, y, \lambda) = \frac{U_0^2}{(\lambda z')^2}\left| A\!\left(\mu_x = \frac{x}{\lambda z'}, \mu_y = \frac{y}{\lambda z'}\right)\right|^2.
The diffraction OTF for incoherent illumination is obtained by taking the FT of the
diffraction PSF. Recall that the square modulus of the FT of the aperture function,


|A(μx, μy)|², is a real-space quantity as the spatial frequencies are substituted by real-space coordinates, x/(λz′) and y/(λz′). Therefore, the FT of |A(μx, μy)|² will yield a Fourier-domain function related to a(x, y) since the real-space coordinates will be substituted by the spatial frequencies (λz′μx, λz′μy). The following identity can be used to take the FT:
FT{|A(x/(λz′), y/(λz′))|²} = FT{A(x/(λz′), y/(λz′)) A*(x/(λz′), y/(λz′))}
= (λz′)² p(λz′μx, λz′μy) ⊗ p*(λz′μx, λz′μy).
The self cross-correlation or auto correlation operation is denoted by the ⊗ symbol.
Correlation is defined as follows:

g(x) = ∫_{−∞}^{∞} f(x0) h(x0 − x) dx0.
Compared to a convolution, the correlation operation does not flip the function h
before the overlap is calculated. For auto correlation, f = h.
Substituting the above identity into equation (3.16) yields the following general
expression for the incoherent diffraction OTF:

Hdiff(μx, μy, λ) = U0² [a(λz′μx, λz′μy) ⊗ p*(λz′μx, λz′μy)].

After normalising to unity at (0,0), the OTF is seen to be the normalised auto-
correlation function of the amplitude transfer function [13]. At the SP, the Gaussian
expression for z′ is given by equation (3.15),
z′ = s′ − s′xp = (n′/n) Dxp Nw.
For the case of a circular lens aperture,
acirc(λz′μx, λz′μy) = circ[λz′√(μx² + μy²)/Dxp] = circ[λz′μr/Dxp] = circ[μr/μc].

Here the radial spatial frequency μr and the quantity μc are defined as follows:

μr = √(x² + y²)/(λz′) = r/(λz′)
μc = Dxp/(λz′).
The diffraction OTF becomes
Hdiff,circ(μx, μy, λ) = U0² circ[μr/μc] ⊗ circ[μr/μc].


The auto correlation can be performed by graphically calculating the overlap [13].
This yields the following normalised result:

Hdiff,circ(μr, λ) = (2/π)[cos⁻¹(μr/μc) − (μr/μc)√(1 − (μr/μc)²)]   for μr/μc ⩽ 1,
                 = 0                                              for μr/μc > 1.   (3.20)

This expression is equivalent to the FT of equation (3.18) normalised to unity at


(0,0). Taking the modulus yields the MTF, which is usually expressed as a
percentage.
It is evident that the incoherent diffraction OTF for a circular aperture drops to
zero at the spatial frequency defined by μc . This value can be interpreted as the cut-
off frequency due to diffraction in the presence of incoherent illumination. Any scene
information requiring spatial frequencies above μc on the SP will be lost due to the
effects of diffraction.
If the surrounding medium is air and focus is set at infinity, z′ → DxpN and the
cut-off frequency becomes
μc = 1/(λN).
Figure 3.16 illustrates how the cut-off frequency is lowered as N increases. This
behaviour is consistent with the discussion of diffraction softening given in section
3.2.4.
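The cut-off behaviour is easy to sketch numerically. The short example below, assuming λ = 550 nm, evaluates equation (3.20) and reproduces the μc = 1818/N cycles/mm value quoted in the caption of figure 3.16; the chosen f-numbers and the 50 cycles/mm probe frequency are arbitrary.

```python
import numpy as np

def diffraction_mtf(mu, wavelength, N):
    """Incoherent diffraction MTF of equation (3.20) for a circular aperture.
    mu in cycles/mm, wavelength in mm, N is the f-number."""
    mu_c = 1.0 / (wavelength * N)                       # cut-off, cycles/mm
    s = np.minimum(np.abs(np.asarray(mu, dtype=float)) / mu_c, 1.0)
    return (2.0 / np.pi) * (np.arccos(s) - s * np.sqrt(1.0 - s * s)), mu_c

wavelength = 550e-6   # 550 nm expressed in mm
for N in (4, 8, 11, 16):
    mtf50, mu_c = diffraction_mtf(50.0, wavelength, N)  # MTF at 50 cycles/mm
    print(f"N = {N:2d}: cut-off = {mu_c:7.1f} cycles/mm, "
          f"MTF(50 cycles/mm) = {mtf50:.2f}")
```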

3.2.6 Aberrations: wavefront error


Recall from sections 1.3.1 and 1.5.10 of chapter 1 that the XP is a flat plane when
defined using Gaussian optics and is referred to as the XP plane. When a lens is
described using real optics, the real XP is actually a spherical wavefront that
intersects the XP plane at the OA.
A lens that produces an image differing from a perfect image projected onto the
SP only due to the effects of diffraction is said to be diffraction limited. In such a lens,
the wavefront emerging from the XP plane will be a perfect spherical wave
converging towards the ideal Gaussian image point [13]. Departures from this ideal
spherical wave are caused by residual lens aberrations, which Gaussian optics leaves
out of consideration.
One way of modelling aberrations is to replace the aperture function with the
pupil function p(x , y ),
p(xxp, yxp) = a(xxp, yxp) exp[i2πW(xxp, yxp)].

A description of the wavefront error as a function of position on the XP plane is


specified by the aberration function or wavefront error function W (xxp, yxp ). For the


Figure 3.16. Optics OTF or MTF as a function of f-number for an ideal aberration-free lens with a circular
aperture. The wavelength has been taken as λ = 550 nm and so the cut-off frequency is μc = 1818/N.

lens design under consideration, this must be obtained numerically using lens design
software.
Figure 3.17 shows the geometry for defining the wavefront error [13]. A Gaussian
reference sphere that intersects the XP plane at the OA is constructed. This reference
sphere describes the shape of the ideal spherical wave as it emerges from the XP and
converges towards the ideal Gaussian image point defined by the paraxial chief ray.
For a real ray emerging from the XP at position (xxp, yxp ), the wavefront error
W (xxp, yxp ) describes the distance along the ray between the reference sphere and the
actual emerging wavefront (the real XP) that intersects the XP plane at the OA. The
wavefront error is measured in units of wavelength and is related to the optical path
difference (OPD) [14, 17],
W(xxp, yxp) = OPD(xxp, yxp)/λ.
Multiplying the wavefront error function by 2π gives the phase difference [14, 17].
For example, if the OPD = λ/4 at a given position (xxp, yxp ), then W = 1/4 at the
same position. The peak-to-peak wavefront error WPP is the maximum OPD value
over all points on the reference sphere. The Rayleigh quarter-wave limit [18] specifies
WPP = 1/4 as an aberration allowance for a lens to be sensibly classed as perfect or
diffraction limited [19].
A useful measure for the overall effect of aberrations is the root mean square
(RMS) wavefront error WRMS. This is defined in terms of the mean of the squared


Figure 3.17. Gaussian reference sphere (blue arc) and example real wavefront (green curve) at the XP. The paraxial chief ray has been denoted in blue. The red line indicates the optical path difference OPD(xxp, yxp) for the example real wavefront.

wavefront error over the reference sphere coordinates minus the square of the mean
wavefront error,
WRMS = √(〈W²〉 − 〈W〉²).
An RMS wavefront error up to 0.07 corresponds to a small amount of aberration
content. Medium aberration content has an RMS wavefront error between 0.07 and
0.25, and a large aberration content has an RMS wavefront error above 0.25 [14].
Geometrical optics provides a good approximation to the MTF in the large
aberration content region, WRMS > 0.25.
The Strehl ratio is the ratio of the irradiance at the centre of the PSF for an
aberrated system compared to the irradiance at the centre of the PSF for a diffraction-
limited system. According to the Marechal criterion, a lens can be considered
essentially diffraction limited if the Strehl ratio is at least 80%. For defocus aberration,
where the plane of best focus no longer corresponds with the SP, the Marechal criterion
is identical to the Rayleigh quarter-wave limit as there is an exact relationship between
the peak-to-peak and RMS wavefront errors in this case, WPP = 3.5 WRMS.
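The following minimal sketch computes WRMS = √(〈W²〉 − 〈W〉²) from a sampled wavefront error map and checks the defocus relation quoted above; the synthetic error map is a made-up placeholder rather than output from lens design software.

```python
import numpy as np

def rms_wavefront_error(W):
    """RMS wavefront error W_RMS = sqrt(<W^2> - <W>^2) from samples of the
    wavefront error W(x_xp, y_xp) over the reference sphere (values in waves)."""
    W = np.asarray(W, dtype=float)
    return np.sqrt(np.mean(W**2) - np.mean(W)**2)

# Check for pure defocus, where W_PP = 3.5 W_RMS: the Rayleigh quarter-wave
# limit W_PP = 1/4 corresponds to W_RMS = 0.25/3.5 ~ 0.071, which sits
# essentially at the 0.07 boundary quoted for 'small' aberration content.
print(f"W_RMS at the Rayleigh limit for defocus: {0.25 / 3.5:.3f} waves")

# Example: RMS error of a synthetic, uniformly distributed wavefront error map.
rng = np.random.default_rng(0)
W_samples = rng.uniform(0.0, 0.1, size=(64, 64))   # hypothetical values in waves
print(f"W_RMS of the synthetic map: {rms_wavefront_error(W_samples):.3f} waves")
```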
Since the wavefront error generally becomes worse for light rays that are further
from the OA, the camera system PSF will no longer be isoplanatic over the entire
SP. In other words, the system can no longer be treated as LSI once a description of
aberrations is included. In practice, a way forward is to treat the system as a group of
subsystems that are LSI over specific regions of the SP [4, 6].

3.3 Sensor
Camera imaging sensors comprise a 2D array of sensor pixels or photosites. As
described in section 3.6.2, each photosite contains at least one photoelement that
converts light into stored charge.


The PSF due to diffraction derived in the previous section is straightforward to


visualise. Light from a point object spreads out over an area surrounding the ideal
optical image point. For a circular aperture, the central part of this area forms the
Airy disk. Significantly, the imaging sensor also causes light to spread out from the
ideal image point when the optical image is recorded, the reason being that
photosites are not point objects. A photosensitive detection area Adet exists at
each photosite, and the spectral radiant flux Φe,λ arising from the spectral irradiance
distribution at Adet is mixed together when generating the charge signal. This
blurring effect can be described by a spatial detector-aperture PSF, also referred to as
the detector PSF or pixel PSF.
Although many contributions to the overall sensor PSF can be defined [2, 3], in
this chapter only the spatial detector-aperture PSF will be considered.

3.3.1 Spatial averaging


Consider the output function defined as the ideal spectral irradiance distribution
f (x , y ) = Eλ,ideal(x , y ) at the SP convolved with the camera system PSF:
g(x , y ) = f (x , y ) ∗ hsystem(x , y ). (3.21)
At this stage in the development, only the contribution from the diffraction
component of the optics PSF has been included in the camera system PSF,
hsystem(x , y ) = hdiff (x , y ).
Sensor photosites have an area of order microns (μm) squared. However, not all of
this area is available for receiving light. The effective photosensitive area or flux
detection area Adet depends upon the sensor architecture and the nature and number
of photoelements used to generate charge contained within each photosite.
Furthermore, a microlens is typically fitted above each photosite in order to enlarge
Adet . Photosites in CCD sensors typically have rectangular photosensitive areas,
whereas photosites in CMOS sensors typically have notched rectangular or
L-shaped photosensitive areas due to the presence of circuitry obstructing the light
path [3]. An example of these types of areas viewed from above is shown in
figure 3.18.
The ratio of the flux detection area to the photosite area Ap specifies the fill factor (FF):

Figure 3.18. Example (a) rectangular, and (b) notched rectangular photosensitive detection areas, Adet.


FF = Adet/Ap. (3.22)

Since the spectral irradiance distribution over Adet contributes to generating the
charge signal for a photosite, mathematically the spectral irradiance must be
integrated over Adet to yield the spectral radiant flux Φe,λ responsible for generating
the charge signal. Significantly, integrating the spectral irradiance over Adet is
proportional to averaging over the same area,

∫_{Adet} f(x, y) dA = Adet 〈f(x, y)〉_{Adet},

where the angled brackets denote a spatial average over Adet . This averaging blurs
the corresponding scene detail. As mentioned in the introduction to this section, the
blurring can be considered a form of point spread and can equivalently be expressed
in terms of a spatial detector-aperture PSF. The generation of the charge signal itself
will be discussed in section 3.6.

3.3.2 Detector-aperture PSF


Recall from sections 3.1.5 and 3.1.6 that the convolution operation slides the flipped
PSF over the input function, and integrates all point spread towards the point at
which the PSF is centred. This has direct analogy with the spectral irradiance spatial
integration over Adet described above. It will be shown below that the point at which
the integration is centred is the associated sampling point.
A rectangular detection area Adet = dx dy associated with a CCD sensor can be
represented by a 2D rectangle function,
rect[x/dx, y/dy] = rect[x/dx] rect[y/dy].
The 1D rectangle function was defined in section 3.1.6 and illustrated in
figure 3.4(b),
rect[x/d] = 1 for ∣x∣ < d/2,  1/2 for ∣x∣ = d/2,  0 for ∣x∣ > d/2.
The PSF that represents the averaging over Adet is referred to as the detector-
aperture PSF [2–4]:

hdet−ap(x, y) = (1/Adet) rect[x/dx] rect[y/dy]. (3.23)


Figure 3.19. Spatial detector-aperture PSF.

The aperture area is defined by Adet , and the 1/Adet factor ensures that
the convolution operation performs an averaging rather than an integration. The
1/Adet factor will be cancelled out when the charge signal is derived in section 3.6. A
2D square detector-aperture PSF is illustrated in figure 3.19. Detector-aperture PSFs
for a variety of aperture shapes can be found in the literature [20, 21].

3.3.3 Sampling
Recall that the output function g (x , y ) defined by equation (3.21) is the input
function f (x , y ) convolved with the camera system PSF,
g(x , y ) = f (x , y ) ∗ hsystem(x , y ).
Along with the diffraction PSF, consider including the detector-aperture PSF
derived above as part of the camera system PSF,
hsystem(x , y ) = hdiff (x , y ) ∗ hdet−ap(x , y ).
Now at the centre of the detector aperture area, the convolution operation will
output the averaged value of f (x , y ) over the extent of the aperture as it slides over
f (x , y ). This will occur as the centre passes over every spatial coordinate on the SP,
and so the output function g (x , y ) is a continuous function of position.
However, only one output value for g (x , y ) associated with each photosite is
required. This means that the detector-aperture PSF must be accompanied by a
sampling operation that restricts the output of the convolution operation to the
appropriate grid of sampling positions.
Mathematically, the sampling of a function can be achieved by multiplying with
the Dirac comb function illustrated in figure 3.20,
comb[x/px, y/py] = Σ_{m=−∞}^{∞} Σ_{n=−∞}^{∞} δ(x − mpx) δ(y − npy).

The Dirac comb function is a 2D array of delta functions. Points on the output grid
of sampling positions are separated by integer multiples of the pixel pitch in the
x and y directions, denoted as px and py. The pixel pitch in a given direction is equal


Figure 3.20. Detector sampling represented by a Dirac comb function. Each upward arrow represents a delta function.

to the reciprocal of the number of photosites per unit area in that direction. For
square photosites, px = py = p.
Multiplying equation (3.21) by the Dirac comb yields the following sampled
output function denoted with a tilde symbol:
g̃(x, y) = (f(x, y) ∗ hsystem(x, y)) comb[x/px, y/py]. (3.24)

Sampling can introduce an unwanted effect known as aliasing, which can be


minimised by fitting an OLPF above the SP. Sampling and aliasing will be discussed
in section 3.4.
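A toy one-dimensional sketch of this averaging-plus-sampling chain is given below: the ideal distribution is convolved with a normalised box of width d (the 1/Adet factor) and then sampled every pixel pitch p. The grid step, pitch, detection width and test pattern are all illustrative assumptions.

```python
import numpy as np

# Toy 1-D version of equation (3.24): box-average the ideal irradiance over the
# detection width d, then sample every pixel pitch p (illustrative values).
dx = 0.1          # spatial grid step, um
p = 4.0           # pixel pitch, um
d = 3.8           # detection width, um

x = np.arange(0.0, 200.0, dx)
f = 1.0 + 0.5 * np.sin(2 * np.pi * x / 25.0)     # ideal irradiance pattern

box = np.ones(int(round(d / dx)))
box /= box.size                                   # 1/A_det factor -> averaging
g = np.convolve(f, box, mode="same")              # detector-aperture blur

step = int(round(p / dx))
samples = g[::step]                               # comb sampling at the pitch
print(f"{samples.size} samples taken from {x.size} grid points")
```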

3.3.4 Detector-aperture MTF


For a rectangular detector aperture, the detector-aperture MTF is defined by the
modulus of the FT of equation (3.23). The FT of the 1D rectangle function is given
by
FT{rect[x/d]} = ∣d∣ sinc(dμx).

Since ∣dxdy∣ = Adet , it follows that

MTFdet−ap(μx, μy) = sinc(dxμx, dyμy) = [sin(πdxμx)/(πdxμx)] [sin(πdyμy)/(πdyμy)]. (3.25)


In general, the extent of dx and dy that define Adet will be less than the pixel pitches
px and py.
Figures 3.21 and 3.22 illustrate the MTF for a square detector with dx = 3.8 μm and px = 4 μm. In analogy with the lens aperture diffraction MTF, the detector-aperture MTF also has a cut-off frequency. This is defined as the frequency where the MTF first drops to zero. In the present example, the cut-off frequency is (3.8 μm)⁻¹ = 263 cycles/mm.
The sensor Nyquist frequency μNyq discussed in the next section is the maximum
cut-off frequency required to prevent aliasing and is related to the pixel pitch. If
dx = px, then μNyq will be half the detector-aperture cut-off frequency. In the present
example, μNyq = 125 cycles/mm. The detector cut-off frequency is slightly higher
than 2 × μNyq since the FF is less than 100%. Although a higher FF increases quantum

Figure 3.21. Detector-aperture MTF in the x-direction for a rectangular detector. In this example the detection
area width dx is 3.8 μ m for a 4 μ m pixel pitch px. The detector cut-off frequency is 263 cycles/mm.

Figure 3.22. Detector-aperture MTF for a square detector.


efficiency (section 3.6), a higher FF also increases the width of the detector-aperture
PSF. This in turn increases the point spread and lowers the detector cut-off frequency.
The spatial frequency representation of equation (3.24) is discussed in the next
section.
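Equation (3.25) is straightforward to evaluate numerically. The sketch below, using numpy's normalised sinc, reproduces the 263 cycles/mm detector cut-off for dx = 3.8 μm and the 125 cycles/mm Nyquist frequency for px = 4 μm used in this example.

```python
import numpy as np

def detector_mtf(mu, d_mm):
    """Detector-aperture MTF of equation (3.25) along one axis.
    mu in cycles/mm, d_mm is the detection width in mm.
    np.sinc(x) = sin(pi x)/(pi x), matching the sinc used in the text."""
    return np.abs(np.sinc(d_mm * np.asarray(mu, dtype=float)))

dx = 3.8e-3            # 3.8 um detection width, in mm
px = 4.0e-3            # 4.0 um pixel pitch, in mm
print(f"detector cut-off: {1.0 / dx:.0f} cycles/mm")        # ~263 cycles/mm
print(f"sensor Nyquist  : {1.0 / (2 * px):.0f} cycles/mm")  # 125 cycles/mm
print(f"MTF at Nyquist  : {detector_mtf(1.0 / (2 * px), dx):.2f}")
```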

3.4 Optical low-pass filter


An optical low-pass filter (OLPF) is commonly fitted above the imaging sensor in
order to reduce an unwanted effect known as aliasing. For 2D images, aliasing
manifests itself as moiré patterns. Aliasing is a consequence of the sampling of the
spectral irradiance distribution at the SP modelled by equation (3.24),
g̃(x, y) = (f(x, y) ∗ hsystem(x, y)) comb[x/px, y/py].

The OLPF can reduce aliasing to an acceptable level by modifying the flow of light
before it reaches the SP.
By design, the OLPF is another source of blur that can be modelled by a PSF.
This section begins with a brief introduction to sampling theory and the causes of
aliasing. Subsequently, it is shown that the introduction of an OLPF can minimise
aliasing, and an expression for the PSF and MTF due to the OLPF is given. These
can be included as part of the camera system PSF and MTF.

3.4.1 Function sampling


The function to be sampled is the output function g (x , y ) denoting the real spectral
irradiance distribution at the SP. This is modelled by a convolution of the system
PSF with the input function f (x , y ) denoting the ideal spectral irradiance distribu-
tion at the SP:
g(x , y ) = f (x , y ) ∗ hsystem(x , y ). (3.26)
At this stage of the development, the model camera system PSF includes the
diffraction PSF and detector-aperture PSF contributions,
hsystem(x , y ) = hdiff (x , y ) ∗ hdet−ap(x , y ).
In order to discuss the consequences of sampling a function, for clarity it is
convenient to consider the sampling of a 1D function denoted by f (x ). This 1D
function is related to its FT via the following FT pair:

f(x) = ∫_{−∞}^{∞} F(μx) exp(i2πμx x) dμx
F(μx) = ∫_{−∞}^{∞} f(x) exp(−i2πμx x) dx.

A function with an FT that is zero everywhere outside some region or band is said to
be band limited [22, 23]. In the present context, the input function defined by


equation (3.26) is always band limited because the diffraction OTF always has a
finite-valued cut-off frequency μc .
In analogy with section 3.3.3, the sampling of a continuous input function f (x )
that is band limited to the region [−μx,max , μx,max ] can be achieved by multiplying
with a Dirac comb function:
f̃(x) = f(x) comb[x/Δx]. (3.27)

The tilde symbol indicates that the function is a sampled function. The spatial period
Δx is the spacing of the discrete sampling intervals. The Dirac comb in 1D is defined
by

comb[x/Δx] = Σ_{n=−∞}^{∞} δ(x − nΔx).

In analogy with equation (3.6), the value of an arbitrary sample at location xn can be
obtained by integration,

f(xn) = ∫_{−∞}^{∞} f(x) δ(x − nΔx) dx = f(nΔx).

3.4.2 Replicated spectra


Apart from a scaling factor, the FT of the Dirac comb function is also a Dirac comb
function,
FT{comb[x/Δx]} = Δμx comb[μx/Δμx].
The period in the Fourier domain is given by Δμx = 1/Δx.
Recall from the convolution theorem introduced in section 3.1.7 that the convolution
operation in the real domain is equivalent to a simple multiplication in the Fourier
domain. Conversely, the multiplication operation in the real domain is equivalent to
a convolution operation in the Fourier domain. The FT of the sampled function
f˜ (x ) defined by equation (3.27) is therefore given by

F̃(μx) = F(μx) ∗ (Δμx comb[μx/Δμx]).

Significantly, the convolution of the continuous function F (μx ) with a Dirac comb
function of infinite extent and period Δμx yields a periodic sequence of copies of
F (μx ) with period Δμx . This important result is illustrated in figure 3.23(a) and (b).


Figure 3.23. (a) Fourier transform F(μx) shown by the triangle function (yellow) along with the Dirac sampling comb (blue arrows). (b) The convolution of F(μx) with the sampling comb yields copies of F(μx) with period Δμx. (c) Ideal reconstruction filter in the Fourier domain (red rectangle). (d) Aliasing caused by undersampling.

Unlike the sampled function f˜ (x ), the Fourier transform F̃ (μx ) is a continuous


function.
The high-frequency harmonic copies of F (μx ) are referred to as replicated spectra
and are denoted here by Fhigh(μx ),

F˜ (μx ) = F (μx ) + Fhigh(μx ).


The replicated spectra can be considered as undesirable high-frequency components
that corrupt the original signal and are responsible for the discrete appearance of a
sampled image. However, if Fhigh(μx ) can be removed, then only the original
continuous signal, F (μx ), will remain [22, 23].

3.4.3 Reconstruction
In principle, the original function f (x ) can be recovered from its samples by
multiplying F̃ (μx ) with an ideal reconstruction filter or ideal low-pass filter. The ideal
reconstruction filter in the Fourier domain is a rectangle function defined by


Hideal(μx) = 1/Δμx for −μmax ⩽ μx ⩽ μmax,  and 0 otherwise.
This isolates the so-called baseband F (μx ), as illustrated in figure 3.23(c).
Subsequently, f (x ) can be recovered by taking the inverse FT of the baseband.
It is also instructive to show how f (x ) can be recovered from its samples in the
real domain. The baseband can be written as follows:
F (μx ) = H (μx )F˜ (μx ).
Taking the inverse FT yields
f(x) = FT⁻¹{H(μx)F̃(μx)} = f̃(x) ∗ hideal(x). (3.28)

The function h ideal(x ) is the inverse FT of the rectangle function Hideal(μx ),


hideal(x) = sinc(x) = sin(πx)/(πx).
This reveals that the ideal reconstruction filter in the real domain is the sinc function.
Equation (3.28) may be rewritten in the following way:

f(x) = Σ_{n=−∞}^{∞} f(nΔx) sinc[(x − nΔx)/Δx].

The reconstructed function f (x ) is equal to the sample values at the sample


locations. Between samples, f (x ) is given by an exact interpolation defined by an
infinite sum of sinc functions.
In practice, a signal can only be reconstructed in an approximate way because the
infinite sum of sinc functions is not physically realisable. Digital image display relies
on low-pass filtering by the display medium and the HVS [2, 3]. At standard viewing
distances, the cut-off frequency of the HVS will eliminate the replicated spectra and
so the digital image will appear to be continuous.
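A truncated version of this interpolation sum can be evaluated directly, as in the sketch below; the sample count, test frequency and evaluation points are arbitrary, and the truncation means the reconstruction is only approximate, as noted above.

```python
import numpy as np

def sinc_reconstruct(samples, delta_x, x):
    """Approximate f(x) from samples f(n*delta_x) using the truncated
    interpolation sum f(x) = sum_n f(n dx) sinc((x - n dx)/dx)."""
    n = np.arange(samples.size)
    # np.sinc is the normalised sinc, sin(pi u)/(pi u)
    return np.array([np.sum(samples * np.sinc((xi - n * delta_x) / delta_x))
                     for xi in np.atleast_1d(x)])

delta_x = 1.0
n = np.arange(64)
f_samples = np.sin(2 * np.pi * 0.1 * n * delta_x)   # band-limited test signal
x_fine = np.linspace(10, 50, 5)                      # stay away from the edges
exact = np.sin(2 * np.pi * 0.1 * x_fine)
approx = sinc_reconstruct(f_samples, delta_x, x_fine)
print(np.max(np.abs(approx - exact)))                # small truncation error
```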

3.4.4 Aliasing
Up to this point in the discussion, it has been assumed that exact reconstruction is
always possible. However, this is not the case, and two conditions are required [22, 23].
• The signal must be band limited. This prevents the existence of replicated
spectra of infinite extent that are impossible to separate.
• The sampling rate must be greater than twice the maximum frequency present
in the signal. This ensures that the replicated spectra do not overlap and
corrupt the baseband.

Since the signal sampled at the SP will be band-limited by the diffraction PSF,
hdiff (x , y ), the first of these conditions is always satisfied for a camera system.
However, the second of these conditions depends upon a number of variables.
Notably, the sampling rate is fixed by the pixel pitch. Whether or not this sampling


Figure 3.24. Illustration of aliasing. In this example, the sampling indicated by the black circles is insufficient
for reconstruction of the continuous signal shown in green. Consequently, the signal shown in green incorrectly
appears as the signal shown in magenta upon reconstruction.

rate is greater than twice the maximum frequency present in the signal depends upon
the extent of the low-pass filtering by the camera components. The most important
of these is the cut-off frequency defined by the PSF due to diffraction, hdiff (x , y ), and
this depends upon the lens f-number used to take the photograph. It will be shown
later in this section that an OLPF can be fitted above the sensor to ensure that the
second condition is always satisfied, irrespective of the lens f-number selected.
The second condition above is a statement of the Shannon–Whittaker sampling
theorem [24]. If the sampling theorem is not satisfied, a perfect copy of F (μx ) cannot
be isolated and so the original function will not be correctly recovered. This issue is
known as aliasing because higher spatial frequencies will incorrectly appear as lower
spatial frequencies in the reconstructed signal. A simple example is shown in
figure 3.24. Expressed mathematically, aliasing can be avoided by sampling at a
rate μx,Nyq that must satisfy

μx,Nyq = 1/Δx > 2μx,max. (3.29)

Here μx,max is the highest spatial frequency content of the function, and μx,Nyq is
known as the Nyquist rate. Figure 3.23(d) shows an example of aliasing by sampling
at a rate that fails to satisfy the above condition. This is known as undersampling.
The sampling theorem applied to a function of two spatial variables f (x , y ) yields
separate Nyquist rates in the x and y directions, μx,Nyq and μy,Nyq .
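The effect is easy to demonstrate numerically: in the sketch below, a sinusoid at 70 cycles per unit length sampled at 100 samples per unit length (Nyquist rate 50) produces exactly the same samples as a 30 cycle alias. The particular numbers are illustrative.

```python
import numpy as np

fs = 100.0                      # sampling rate, samples per unit length
f_true = 70.0                   # signal frequency, above fs/2 = 50
f_alias = abs(f_true - fs)      # expected alias frequency: 30

n = np.arange(32)
x = n / fs
s_true = np.cos(2 * np.pi * f_true * x)
s_alias = np.cos(2 * np.pi * f_alias * x)
print(np.allclose(s_true, s_alias))   # True: the samples are indistinguishable
```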

3.4.5 Sensor Nyquist frequency


The imaging sensor in a camera has fixed photosites and so the sampling period is
fixed in both the x and y directions. The sampling periods are Δx = px and Δy = py,
where px is the pixel pitch in the x direction, and py is the pixel pitch in the y
direction. The sampling rates are therefore
1/Δx = 1/px
1/Δy = 1/py.


According to the sampling theorem expressed by equation (3.29), the highest


frequency content that can be correctly reproduced in the x and y directions without
aliasing is defined by
μx,sensor = 1/(2px)
μy,sensor = 1/(2py).

The frequencies μx,sensor and μy,sensor are defined as the sensor Nyquist frequencies in
the x and y directions. Although the detection areas are usually rectangular or L-
shaped, the photosites are typically square so that px = py = p. In this case it is usual
practice to refer to a single sensor Nyquist frequency,
μsensor = 1/(2p).

Although uncommon, some cameras have non-square photosites. An interesting


example is the Nikon® D1X, which has a sensor with py = 2px .
For imaging sensors with a CFA, sensor Nyquist frequencies can be defined for
each mosaic separately. This is illustrated in figure 3.25.
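A small helper for these quantities is sketched below; the 4 μm photosite pitch is an assumed example, and the per-mosaic values follow the effective pitches quoted in the caption of figure 3.25 (red/blue pitch 2a, green pitch √2 a).

```python
import math

def nyquist(pitch_mm):
    """Sensor Nyquist frequency 1/(2p) in cycles/mm for pixel pitch p in mm."""
    return 1.0 / (2.0 * pitch_mm)

a = 4.0e-3                         # assumed photosite pitch a = 4 um, in mm
print(f"full sensor grid : {nyquist(a):6.1f} cycles/mm")
# Bayer mosaics (figure 3.25): red/blue effective pitch 2a, green sqrt(2)*a
print(f"red/blue mosaic  : {nyquist(2 * a):6.1f} cycles/mm")
print(f"green mosaic     : {nyquist(math.sqrt(2) * a):6.1f} cycles/mm")
```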

3.4.6 Pre-filtering
The continuous function to be sampled at the SP is the real 2D spectral irradiance
distribution modelled by equation (3.26) and denoted as g (x , y ),
g(x , y ) = f (x , y ) ∗ hsystem(x , y ).
At present the camera system PSF includes the optics (diffraction) and sensor
(detector-aperture) contributions.
The FT of g (x , y ) is denoted as G(μx , μy ). In order to minimise aliasing, all spatial
frequency content present in G(μx , μy ) above the sensor Nyquist frequency μNyq
needs to be removed before g (x , y ) is sampled. In other words, g (x , y ) must be
Figure 3.25. For a Bayer CFA, the effective pixel pitch for the red and blue mosaics is p = px = py = 2a. The green mosaic has p = √2 a rotated at 45°, which results in a higher Nyquist frequency.


appropriately band limited to the sensor Nyquist frequency by pre-filtering before


sampling at the SP.
One solution is to fit an OLPF above the sensor. This strategy can reduce aliasing
to a minimal level, irrespective of the photographic conditions or exposure settings.

3.4.7 Four-spot filter PSF


Many cameras have a dedicated OLPF fitted above the sensor that acts as an anti-
aliasing filter [25, 26]. An OLPF is made from a birefringent material such as quartz
or lithium niobate that is capable of causing double refraction. The refractive index
of birefringent materials depends upon the polarisation and direction of the light
passing though, and so a ray of light entering the material can be split into two rays
taking separated paths. A four-spot birefringent OLPF is made of two plates so that
light from each point in the scene is spread over four points, the spot separation
value being determined by the thickness of the plates (figure 3.26).
The spectral irradiance distribution that would have resulted in the absence of the
OLPF is split into four distributions displaced slightly from each other. If the spot
separation is chosen to be the pixel pitch, splitting the light in this way will reduce
the modulation depth to zero at the sensor Nyquist frequency μNyq . Spectral
irradiance contributions from spatial frequencies above μNyq will be suppressed by
the remaining contributions to the camera system PSF [26]. The combination of
OLPF and other camera system PSF components effectively reduces aliasing to a
minimal level. This will become clear when discussing the MTF contributed by the
OLPF.
The PSF for an ideal four-spot birefringent OLPF can be modelled in terms of
delta functions representing the four spots [26]:

hOLPF(x, y) = (1/4)[δ(x − a0)δ(y − b0) + δ(x − a1)δ(y − b1) + δ(x − a2)δ(y − b2) + δ(x − a3)δ(y − b3)]. (3.30)

Figure 3.26. Four-spot OLPF. The first plate splits the light in the horizontal direction, and the second plate
splits the light in the vertical direction.


The constants define the point separation and hence the strength of the filter. A
maximum strength filter is obtained by setting the point separation equal to the pixel
pitch:

(a0, b0) = (0, 0)
(a1, b1) = (px, 0)
(a2, b2) = (0, py)
(a3, b3) = (px, py). (3.31)

In this example, the spots are not situated symmetrically about the origin and so
there will be a phase contribution to the OTF. It is preferable that OLPF filters be
designed so that the spots are symmetrical about the origin. The use of a birefringent
OLPF has other benefits such as reduction of colour interpolation error when
demosaicing the raw data.
Objects containing fine repeated patterns are most susceptible to aliasing artefacts
since these patterns are most likely to be associated with well-defined spatial
frequencies above μNyq . The disadvantage of using an OLPF is that its PSF
contributes to the camera system PSF at all times, even when other contributions
such as lens aperture diffraction are already sufficiently band-limiting the spatial
frequency content. As discussed in chapter 5, this reduces the camera system MTF at
spatial frequencies below μNyq and reduces perceived image sharpness.
Cameras with very high sensor pixel counts have a lower sensor Nyquist
frequency and are therefore less prone to aliasing. It is becoming more common
for camera manufacturers to use a very weak OLPF or even completely remove it in
such cameras. Although any aliased scene information corresponding to frequencies
above μNyq cannot be recovered from a single frame, the photographer can use
image-processing software to reduce the prominence of the aliasing artefacts.

3.4.8 Four-spot filter MTF


The four-spot filter OTF is straightforwardly given by the FT of the PSF defined by
equation (3.30),

HOLPF(μx, μy) = (1/4)[exp(−i2π(a0μx + b0μy)) + exp(−i2π(a1μx + b1μy)) + exp(−i2π(a2μx + b2μy)) + exp(−i2π(a3μx + b3μy))]. (3.32)

Selecting the constants corresponding to a maximum-strength filter, the OTF in the


x-direction may be written in terms of its modulus and phase,
HOLPF(μx) = MTF(μx) exp[i PTF(μx)].


Figure 3.27. MTF for a maximum-strength four-spot OLPF in the x-direction (blue). The detector-aperture MTF (magenta) corresponds to a pixel pitch px = 4.0 μm and detection area width dx = 3.8 μm. The product of the OLPF and detector MTFs (black) drops to zero at μNyq = 125 cycles/mm.

The MTF and PTF are defined by

MTFOLPF(μx) = ∣cos(πpxμx)∣
PTFOLPF(μx) = −πpxμx.

The PTF arises from the fact that the four spots defined by equation (3.31) are not
symmetrical about the origin and so the overall image is shifted by half a pixel in the
x and y directions. The PTF will vanish if the spots can be arranged symmetrically
around the origin.
Figure 3.27 illustrates the maximum-strength four-spot MTF in the x-direction
along with the detector-aperture MTF for the same model parameters used in
section 3.3.4. The cut-off frequency for the detector-aperture MTF is 263 cycles/mm.
Since the cut-off frequency for the four-spot filter is defined by the sensor Nyquist frequency μNyq = 125 cycles/mm, the combined MTF, formed by taking the product of the four-spot and detector-aperture MTFs at each spatial frequency, similarly drops to zero at 125 cycles/mm.
The detector-aperture MTF suppresses the combined MTF above μNyq . Stronger
suppression will occur when other contributions to the camera system MTF such as
the optics MTF are included in the model, and this will reduce aliasing to a minimal
level.
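The sketch below evaluates MTFOLPF = |cos(πpxμx)| together with the detector-aperture MTF for the same px = 4.0 μm, dx = 3.8 μm example, and confirms that their product vanishes at μNyq = 125 cycles/mm, as in figure 3.27.

```python
import numpy as np

px = 4.0e-3    # pixel pitch in mm
dx = 3.8e-3    # detection width in mm

def mtf_olpf(mu):
    """Maximum-strength four-spot OLPF MTF, |cos(pi * px * mu)|."""
    return np.abs(np.cos(np.pi * px * np.asarray(mu, dtype=float)))

def mtf_detector(mu):
    """Detector-aperture MTF, |sinc(dx * mu)| with the normalised sinc."""
    return np.abs(np.sinc(dx * np.asarray(mu, dtype=float)))

mu = np.array([0.0, 62.5, 125.0, 200.0, 263.0])    # cycles/mm
combined = mtf_olpf(mu) * mtf_detector(mu)
for f, m in zip(mu, combined):
    print(f"{f:6.1f} cycles/mm -> combined MTF = {m:.3f}")
```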

3.5 Sampled convolved image


As a consequence of the sampling defined by equation (3.24), the real or convolved
sampled spectral irradiance distribution at the SP is defined by


g̃(x, y, λ) = Ẽλ(x, y) = (f(x, y, λ) ∗ hsystem(x, y, λ)) comb[x/px, y/py].

Here the dependence on wavelength λ has been reintroduced for clarity. Since the
detector-aperture contribution to the camera system PSF averages the spectral
irradiance distribution over each photosite, the comb function restricts the output of
the convolution operation to the appropriate grid of sampling positions defined by
the pixel pitches px and py.
The input function denoted by f (x , y ) is the ideal spectral irradiance distribution
at the SP defined by equation (3.2),
f(x, y, λ) = Eλ,ideal(x, y) = (π/4) Le,λ(x/m, y/m) (1/Nw²) T cos⁴{φ(x/m, y/m)}.
Here Lλ is the spectral scene radiance, m is the magnification, T is the lens
transmittance factor, and Nw is the working f-number. The cosine fourth term can
be replaced by the RI, R(x , y, λ ), which includes vignetting arising from a specific
real lens design [6].
For the model camera system introduced in section 3.1.10, the contributions to
the camera system PSF derived in this chapter can now be summarised, along with
the corresponding contributions to the camera system MTF in the Fourier domain.

3.5.1 Model camera system PSF


For the model camera system introduced in section 3.1.10, the camera system PSF
includes the diffraction contribution to the optics PSF, the detector-aperture
contribution to the sensor PSF, and the PSF due to a four-spot OLPF,
hsystem(x , y , λ) = hdiff (x , y , λ) ∗ h OLPF(x , y ) ∗ hdet−ap(x , y ).
The diffraction PSF for a circular aperture is defined by equation (3.18),

hdiff(x, y, λ) = (U0²/(λz′)²) [(π/4) Dxp² jinc(Dxp √((x/(λz′))² + (y/(λz′))²))]².

Here z′ is the distance from the XP to the SP, and Dxp is the diameter of the XP.
The PSF for a four-spot OLPF is defined by equation (3.30),
hOLPF(x, y) = (1/4)[δ(x − a0)δ(y − b0) + δ(x − a1)δ(y − b1) + δ(x − a2)δ(y − b2) + δ(x − a3)δ(y − b3)].
For a full-strength filter, the constants take values (a 0, b0 ) = (0, 0), (a1, b1) = (px , 0),
(a2, b2 ) = (0, py ), and (a3, b3) = (px , py ).


Finally, the spatial detector-aperture PSF for a rectangular CCD detector is


defined by equation (3.23),

hdet−ap(x, y) = (1/Adet) rect[x/dx] rect[y/dy].

The detection area Adet = dx dy , where dx and dy are the horizontal and vertical
dimensions of the effective aperture area.

3.5.2 Model camera system MTF


The corresponding model camera system MTF is defined by
MTFsystem(μx , μy , λ) = MTFdiff (μx , μy , λ) MTFOLPF(μx , μy ) MTFdet−ap(μx , μy ).

The MTF due to lens aperture diffraction for a circular aperture is defined by
equation (3.20),
MTFdiff,circ(μr, λ) = (2/π)[cos⁻¹(μr/μc) − (μr/μc)√(1 − (μr/μc)²)]   for μr/μc ⩽ 1,
                    = 0                                              for μr/μc > 1.

Here μr = √(μx² + μy²) is the radial spatial frequency, and μc = 1/(λNw) is the cut-off frequency due to diffraction.
The MTF due to a four-spot OLPF is defined by equation (3.32),

MTFOLPF(μx, μy) = (1/4)[exp(−i2π(a0μx + b0μy)) + exp(−i2π(a1μx + b1μy)) + exp(−i2π(a2μx + b2μy)) + exp(−i2π(a3μx + b3μy))].

For a full-strength filter, the constants are the same as those for the PSF given above.
Finally, the spatial detector-aperture MTF for a rectangular CCD detector is
defined by equation (3.25),

MTFdet−ap(μx, μy) = sinc(dxμx, dyμy) = [sin(πdxμx)/(πdxμx)] [sin(πdyμy)/(πdyμy)].

Again, dx and dy are the horizontal and vertical dimensions of the effective aperture
area.
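The three contributions can be multiplied together numerically, as in the sketch below; the wavelength, f-number, pixel pitch and detection width are illustrative assumptions, and the OLPF phase term is ignored since only the MTF product is formed.

```python
import numpy as np

wavelength = 550e-6       # mm
N = 8.0                   # working f-number (illustrative)
px, dx = 4.0e-3, 3.8e-3   # pixel pitch and detection width, mm

def mtf_diff(mu):
    """Diffraction MTF of equation (3.20); mu in cycles/mm."""
    s = np.minimum(np.asarray(mu, float) * wavelength * N, 1.0)   # mu / mu_c
    return (2 / np.pi) * (np.arccos(s) - s * np.sqrt(1 - s * s))

def mtf_olpf(mu):
    return np.abs(np.cos(np.pi * px * np.asarray(mu, float)))

def mtf_det(mu):
    return np.abs(np.sinc(dx * np.asarray(mu, float)))

mu = np.array([10.0, 50.0, 100.0, 125.0])          # cycles/mm
system = mtf_diff(mu) * mtf_olpf(mu) * mtf_det(mu)
print(dict(zip(mu.tolist(), np.round(system, 3).tolist())))
```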


3.6 Charge signal


As described in the introduction to section 3.1, cameras do not respond to light in
the same manner as the HVS. This means that a spectral radiometric measure of
exposure is required for modelling the electronic response of the camera imaging
sensor in terms of the charge signal generated at a given photosite (sensor pixel). In
contrast to the photometric weighting that describes the response of the HVS, in the
present context the camera response functions provide the appropriate spectral
weighting when integrating over the spectral passband of the camera.
Recall from section 2.12.1 of chapter 2 that although the response of a camera is
not the same as the photometric response of the HVS, these responses will ideally be
proportional to each other, and this important property enables camera raw data to
represent the visual properties of the scene.
In the present section, the charge signal obtained at each photosite on the SP is
derived. The derivation requires the following information as inputs:
• The expression for the sampled convolved optical image at the SP summar-
ised in the previous section. This can be used to define the sampled spectral
exposure incident at each photosite.
• The camera response functions. These depend upon the nature of the photo-
elements used to convert the spectral exposure into electronic charge. For
colour cameras, the camera response functions also depend upon the proper-
ties of the CFA fitted above the imaging sensor.

3.6.1 Sampled spectral exposure


Recall the expression for the real or convolved sampled spectral irradiance
distribution at the SP defined by equation (3.24),
Ẽe,λ(x, y) = (Ee,λ,ideal(x, y) ∗ hsystem(x, y, λ)) comb[x/px, y/py]. (3.33)

The detector-aperture contribution to the camera system PSF averages the spectral
irradiance distribution over the flux detection area of each photosite, and the comb
function restricts the output of the convolution operation to the appropriate grid of
sampling positions defined by the pixel pitches px and py. In other words, the (x , y )
coordinates correspond with the array of photosite centres.
Spectral exposure is generally defined as spectral irradiance integrated over the
exposure duration,
He,λ = ∫_{0}^{t} Ee,λ(t′) dt′.

The integration can be replaced by the product if Ee,λ(t′) is time-independent,


He,λ = t × Ee,λ.


If the scene radiance distribution is time-independent, then E˜e,λ(x , y ) defined by


equation (3.33) will also be time-independent. In this case, the real or convolved
sampled spectral exposure distribution at the SP can be defined as follows:
H̃e,λ(x, y) = t (Ee,λ,ideal(x, y) ∗ hsystem(x, y, λ)) comb[x/px, y/py]. (3.34)

3.6.2 Photoelements
Each photosite contains at least one photoelement such as a photogate (MOS capacitor) or a photodiode that converts light into stored charge [27–31].
Figure 3.28 illustrates the basic structure of a photogate. A polysilicon electrode
situated above doped p-type silicon is held at a positive bias. This causes mobile
positive holes to flow towards the ground electrode while leaving behind the
immobile negative acceptor impurities, thus creating a depletion region. The spectral
flux Φe,λ incident at the photogate can be considered as a stream of photons. Each
photon has electromagnetic energy hc/λ, where c is the speed of light and h is Planck's
constant. Silicon has a band gap energy of 1.12 eV, which means that the depletion
region can absorb photons with wavelengths shorter than 1100 nm [32]. When an
incident photon is absorbed in the depletion region, an electron–hole pair is created.
The gate voltage prevents the electron–hole pairs from recombining. The holes flow
towards the ground electrode and leave behind the electrons as stored charge. The
well capacity is determined by factors such as gate electrode area, substrate doping,
gate voltage, and oxide layer thickness.
In a photodiode, the depletion region is created by a reverse bias applied to the
junction between n-type and p-type silicon. Overlaying electrodes are not required,
and the well capacity is limited by the junction width.

Figure 3.28. Simplified illustration of a photogate or MOS capacitor.


Imaging sensors based on CCD architecture traditionally use photogates, whereas


sensors based on CMOS architecture traditionally use photodiodes [3]. However,
varieties of photoelements are increasingly being used with either architecture. As
discussed in section 3.6.6, the main difference between CCD and CMOS remains the
strategy for reading out the generated charge.
An important characteristic of a photoelement is its charge collection efficiency
(CCE) denoted by η(λ ), also referred to as the internal quantum efficiency. This
function of wavelength expresses the charge successfully stored as a fraction of the
charge generated [32].
• η(λ ) = 1 for photons absorbed in the depletion region. In other words, all
electrons generated in the depletion region contribute to the collected charge.
• η(λ ) < 1 for photons absorbed in the bulk Si region. This is due to the fact
that only a fraction of the electron–hole pairs created in the bulk will avoid
recombination by successfully diffusing to the depletion region. This tends to
occur at longer wavelengths as the photons can penetrate more deeply into
the bulk.

3.6.3 Colour filter array


Recall that the HVS is sensitive to wavelengths between approximately 380 nm and
780 nm. The human eye contains three types of cone cells with photon absorption
properties that can be described by a set of eye cone response functions l¯(λ ), m̄(λ ),
s̄(λ ). These different responses ultimately lead to the visual physiological sensation
of colour.
A camera requires an analogous set of response functions to detect colour. In
order for colour to be correctly recorded by the camera, a linear transformation
should in principle exist between the camera response functions and the eye cone
response functions. This topic will be discussed further in chapter 4.
In consumer cameras, the appropriate response functions are obtained by fitting a
colour filter array (CFA) above the imaging sensor in order to alter the spectral
composition of the flux reaching the detection areas. The Bayer CFA illustrated in
figure 3.29(a) uses a 2 × 2 block pattern of red, green and blue filters in order to
generate three or four types of mosaic. The filters have different spectral

Figure 3.29. Example CFAs. (a) Bayer CFA. (b) Fuji® X-Trans® CFA.


transmission properties described by a set of spectral transmission functions


TCFA,i (λ ), where i denotes the mosaic label. It will be shown in the next section
that the overall camera response is determined largely by the product of TCFA,i (λ ) and
the CCE of the photoelement.
The Bayer CFA uses a greater number of green filters since the HVS is more
sensitive to wavelengths in the green region of the visible spectrum. This is evident
from figure 3.1, which shows that the peak of the standard 1924 CIE luminosity
function for photopic vision V (λ ) lies in the green region. Although it is beneficial to
use a greater number of green filters, there is nothing significant about the 2:1 ratio
used by the Bayer CFA. Other types of CFA have also been developed. For
example, the Fuji® X-Trans® CFA illustrated in figure 3.29(b) uses a 6 × 6 block
pattern. This type of CFA is more expensive to manufacture and the arrangement
requires greater processing power, however it can give improved IQ.
Recall that the camera spectral passband describes the range of wavelengths over
which it responds. Ideally the camera will not respond to non-visible light. In
practice, an infrared blocking filter is combined with the CFA to limit the camera
response outside of the visible spectrum.
After the imaging sensor has been exposed to light and the camera has generated
the digital raw data, only the digital value of a single colour component will be
known at each photosite, for example, red, green or blue. A computational
procedure described in chapter 4 known as colour demosaicing can estimate the
missing values. After the colour demosaic has been performed, all colour compo-
nents will be known at every photosite.

3.6.4 Camera response functions


The spectral flux incident at a given photosite can be written as follows:
Φe,λ = Adet E˜e,λ(x , y ).
Here E˜e,λ(x , y ) is the sampled convolved spectral irradiance at the photosite defined
by equation (3.33), and Adet is the flux detection area associated with the photosite.
Since each photon has energy hc/λ, the number of photons with wavelength λ
incident at Adet during the exposure duration t is given by
nph(λ) = (λ/hc) Φe,λ t.
Substituting for the spectral flux yields
nph(λ) = (λ/hc) t Adet Ẽe,λ(x, y).
This equation can be written in the following way:
nph(λ) = (λ/hc) Adet H̃e,λ. (3.35)


Here H˜ e,λ is the sampled spectral exposure defined by equation (3.34), and the
sampling coordinates (x , y ) have been dropped for clarity.
Only a fraction of the photon count defined by equation (3.35) will be converted
into stored electric charge. When a CFA is fitted above the imaging sensor, the
average number of stored electrons generated per incident photon with wavelength λ
can be expressed in the following way:
ne,i (λ) = n ph,i (λ) QEi (λ). (3.36)
Here i is the mosaic label. The overall external quantum efficiency [32], or simply the
quantum efficiency (QE), is defined by

QEi (λ) = η(λ) FF T (λ) TCFA,i (λ) . (3.37)

In the absence of a CFA, the QE becomes independent of mosaic label i. The QE


depends greatly upon the photoelement configuration and sensor architecture. The
major contributions can be defined as follows:
• Charge collection efficiency η(λ ): The CCE denoted by η(λ ) was introduced in
section 3.6.2. It describes the charge that is successfully stored as a fraction of
the charge generated through photoconversion, and is also referred to as the
internal QE. Although η(λ ) = 1 when electrons are generated in the depletion
region, electrons generated deep in the bulk neutral region may fail to
successfully diffuse to the depletion region. In this case, η(λ ) < 1. The CCE
reduces at longer wavelengths since the photons can penetrate deeper into the
bulk. Ideally the CCE will remain high in the visible region below 780 nm.
This is achieved by tuning the material thickness and applied voltage to
control the diffusion length. The diffusion length can also be increased by
controlled doping.
• Fill factor (FF): The FF defined by equation (3.22) describes the ratio of the
flux detection area to the photosite area Ap,

FF = Adet/Ap.
• Transmission function T (λ ): This function takes into account unwanted
surface absorption and reflectance effects. Reflection at the SiO2–Si interface
can be reduced by using anti-reflective films [32]. In front-illuminated devices
with photogates, the polysilicon electrodes reduce sensitivity in the blue
region below 600nm and become opaque below 400 nm [3]. Backside-
illuminated devices do not suffer from reduced sensitivity in the blue region
provided appropriate anti-reflective coating is applied and the wafer is
thinned to minimise recombination [3].
• CFA transmission function TCFA,i : This is dependent upon the mosaic label i.
The set of CFA transmission functions were introduced in the previous
section.


Equations (3.35), (3.36) and (3.37) can be combined,


λ ˜
ne,i (λ) = Ap QEi (λ) He,λ,i .
hc
Since the FF has been included in the definition of the QE, the photosite area Ap
appears in the above expression rather than the flux detection area Adet .
The electron count is the total number of electrons stored at a given photosite
during the exposure. This is obtained by integrating ne,i (λ ) over the spectral passband
of the camera, λ1 → λ2 ,
ne,i = Ap ∫_{λ1}^{λ2} (λ/hc) QEi(λ) H̃e,λ,i dλ. (3.38)

As discussed in the previous section, the spectral passband of the camera should
ideally correspond to that of the HVS. The above equation can alternatively be
expressed in the following way:
ne,i = (Ap/e) ∫_{λ1}^{λ2} Ri(λ) H̃e,λ,i dλ. (3.39)

Here e is the elementary charge of an electron, and Ri (λ ) is the spectral responsivity


for mosaic i,

Ri(λ) = QEi(λ) eλ/(hc).
The set of spectral responsivity functions measured in amperes per watt (A/W) are
the set of weighting functions such that

∫ Ri(λ) Φλ,i dλ = Ii .
Here Ii is the total generated photocurrent measured in amperes at a photosite
belonging to mosaic i,
Ii = Qi/t.
The charge signal Qi at a photosite belonging to mosaic i is defined by

Qi = ne,i e . (3.40)

The set of spectral responsivity functions are also known as the photoconversion
response functions or the camera response functions. Figure 3.30 illustrates an
example set [33]. There will be only one response function in the absence of a CFA.
The camera response functions should ideally be linear functions of spectral
exposure and therefore linear functions of radiant exposure.
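The sketch below integrates equation (3.39) numerically with a simple Riemann sum. The responsivity curve and the sampled spectral exposure are made-up placeholders (they are not the Nikon® D700 data of figure 3.30), so only the bookkeeping, not the numbers, should be taken seriously.

```python
import numpy as np

e = 1.602e-19                     # elementary charge, C
A_p = (4.0e-6) ** 2               # photosite area for an assumed 4 um pitch, m^2

# Wavelength grid over an assumed passband of 380-780 nm
lam_nm = np.linspace(380.0, 780.0, 401)
dlam = lam_nm[1] - lam_nm[0]      # nm

# Placeholder responsivity R_i(lambda) in A/W and a flat sampled spectral
# exposure H~_e,lambda,i in J m^-2 nm^-1 (illustrative values only).
R = 0.15 * np.exp(-0.5 * ((lam_nm - 550.0) / 80.0) ** 2)
H = np.full_like(lam_nm, 1e-5)

# Equation (3.39): n_e,i = (A_p/e) * integral of R_i(lambda) H~_e,lambda,i dlambda
n_e = (A_p / e) * np.sum(R * H) * dlam
print(f"electron count ~ {n_e:.0f} e-")
```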


Figure 3.30. Relative spectral responsivity Ri (λ ) for each colour channel of the Nikon® D700 as a function of
wavelength. The curves have been normalised by the peak response.

3.6.5 Polychromatic PSF and MTF


When deriving the charge signal in the previous section, the polychromatic effects of
incoherent illumination were taken into account by integrating all relevant wave-
length-dependent quantities over the spectral passband of the camera.
It is also possible to directly define polychromatic PSFs. However, these are valid
only for the spectral conditions under which they were calculated. In other words,
the camera system PSF must be weighted at each wavelength with respect to the
camera response functions and illumination source,
hpoly(x, y) = [∫_{λ1}^{λ2} Ri(λ) Le,λ hsystem(x, y, λ) dλ] / [∫_{λ1}^{λ2} Ri(λ) Le,λ dλ].

Here the Ri (λ ) are the camera response functions, and Le,λ is the spectral radiance of
the illumination source. More generally, Le,λ can take the form of a spectral power
distribution (SPD) to be defined in chapter 4. The magnitude of Le,λ is removed
through normalisation as only the spectral characteristics of Le,λ are relevant [4, 34].
The corresponding OTF is defined by


Hpoly(μx, μy) = [∫_{λ1}^{λ2} Ri(λ) Le,λ Hsystem(μx, μy, λ) dλ] / [∫_{λ1}^{λ2} Ri(λ) Le,λ dλ].

Analogous expressions can be written for individual system components. For


example, the polychromatic optics MTF can be extracted after replacing the camera
system OTF with the optics OTF in the above expression.
If the polychromatic camera system PSF is known for a given set of spectral
conditions, the electron count under the same conditions can be modelled in the
following way:

ne,i = (tAp/e) ∫_{λ1}^{λ2} Ri(λ) (Eλ,ideal(x, y) ∗ hpoly(x, y)) comb[x/px, y/py] dλ.
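For a diffraction-limited circular aperture the OTF is real and non-negative, so the polychromatic MTF can be formed by weighting the single-wavelength MTF directly, as in the sketch below; the f-number and the Gaussian weighting standing in for Ri(λ)Le,λ are illustrative assumptions.

```python
import numpy as np

N = 8.0                                     # f-number (illustrative)
lam = np.linspace(420e-6, 680e-6, 261)      # wavelengths in mm

def mtf_diff(mu, wavelength):
    """Diffraction MTF of equation (3.20); mu in cycles/mm, wavelength in mm."""
    s = np.minimum(mu * wavelength * N, 1.0)   # mu/mu_c with mu_c = 1/(lambda N)
    return (2.0 / np.pi) * (np.arccos(s) - s * np.sqrt(1.0 - s * s))

# Placeholder spectral weighting R_i(lambda) * L_e,lambda for a green-ish channel
weight = np.exp(-0.5 * ((lam - 550e-6) / 60e-6) ** 2)

mu = 100.0                                  # probe frequency, cycles/mm
poly = np.sum(weight * mtf_diff(mu, lam)) / np.sum(weight)
print(f"polychromatic diffraction MTF at {mu:.0f} cycles/mm ~ {poly:.3f}")
```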

3.6.6 Charge detection


A major difference between CCD and CMOS architecture lies in the strategy used
for reading out the charge signal Qi.
CCD photosites typically contain up to four photogates with overlapping
depletion regions. By adjusting the gate voltages through systematic clocking,
charge can be transferred very quickly from one photogate to another. A parallel-
to-serial horizontal shift register situated at the end of the columns transports the
charge packets from each column to an output charge amplifier, one row at a time, in
a serial manner [3]. The charge readout mechanism as a function of time for a 2 × 2
group of photosites at the corner of a CCD sensor is illustrated in figure 3.31. The
charge amplifier converts the detected charge into a voltage signal.
In contrast, charge detection occurs inside each photosite in active pixel CMOS
sensors, and each active pixel contains connection circuitry for address and readout.

Figure 3.31. Charge readout mechanism as a function of time for a 2 × 2 block of photosites at the corner of a CCD sensor.


Despite the above differences, the actual charge detection process is similar for
both CCD and CMOS sensors [32]. The charge signal at a photosite is converted by
a charge amplifier into a voltage Vp :
Vp = (Gp/C) Q. (3.41)
The mosaic label i has been dropped for clarity. Here C is the sense node capacitance,
and Gp is the source follower amplifier gain of order unity [3]. The maximum voltage
Vp,FWC occurs at full-well capacity (FWC):
Vp,FWC = (Gp/C) QFWC. (3.42)
The value QFWC is the maximum charge that the photosite can hold. The conversion
gain [32] describes the output voltage change per electron in μV/e− units,
GCG = (Gp/C) e.
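A minimal numerical sketch of equations (3.40)–(3.42) and the conversion gain is given below; the sense node capacitance, source follower gain and full-well capacity are assumed example values rather than data for any particular sensor.

```python
e = 1.602e-19        # elementary charge, C
C = 5.0e-15          # assumed sense node capacitance, 5 fF
G_p = 0.9            # assumed source-follower gain (of order unity)
n_FWC = 40000        # assumed full-well capacity in electrons

Q_FWC = n_FWC * e                      # equation (3.40) at full well
V_FWC = (G_p / C) * Q_FWC              # equation (3.42)
G_CG = (G_p / C) * e                   # conversion gain, V per electron

print(f"V_p,FWC         = {V_FWC:.3f} V")
print(f"conversion gain = {G_CG * 1e6:.1f} uV/e-")
```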

3.7 Analog-to-digital conversion


After the charge signal at a given photosite has been detected, the voltage Vp is
amplified by a programmable gain amplifier (PGA) to within the input range of an
analog-to-digital converter (ADC). The ADC converts the amplified analog voltage
into a digital raw value, which can be one of a number of discrete raw levels. The raw
values for all photosites are stored as raw data. A simple model for the analog-to-
digital conversion process is developed in this section.

3.7.1 Programmable ISO gain


The PGA amplifies the voltage Vp defined by equation (3.41) to a useful level V that
lies within the input range of the ADC,
V = GVp. (3.43)

For a CCD sensor, the PGA will be off-chip. The programmable gain G is controlled
by the camera ISO setting S.
As described in chapter 2, the numerical value of S is determined from the JPEG
output and not the raw data. However, the actual numerical value of S is not
important in the present context. The important aspect is that doubling S doubles G.
In the simplest case, S controls a single analog PGA. Some cameras use a second-
stage PGA for intermediate ISO settings.
The maximum output voltage Vmax occurs at the maximum raw level. The ISO
setting that uses the least analog amplification to achieve the maximum raw value is
defined as the base ISO setting, Sbase . This is typically designed to occur when FWC
is utilised. Assuming that FWC is utilised, the corresponding gain G = G base satisfies


Vmax = G base Vp,FWC, (3.44)


where Vp,FWC is defined by equation (3.42). It is useful to express the programmable
gain G as the product of the base value G base and an ISO gain G ISO,
G = G base G ISO. (3.45)
At the base ISO setting Sbase , the ISO gain takes a value of unity, G ISO = 1.
Substituting equation (3.45) into (3.43) yields
V = G base G ISO Vp. (3.46)
Each time S is doubled from the base value, G ISO is doubled and therefore V is
doubled. For a specified output voltage V obtained at a given G ISO, the advantage of
doubling G ISO is that the same V can be obtained by halving Vp . Halving Vp
corresponds to halving the electron count ne and therefore halving the required
radiant exposure at the photosite. This is useful when a short exposure duration is
required at low scene luminance levels.
However, combining equations (3.44) and (3.46) produces the following result:
V/Vmax = Vp GISO/Vp,FWC. (3.47)

Since V cannot be larger than Vmax , the following relation always holds:
Vp G ISO ⩽ Vp,FWC .

When G ISO is raised above the base value (G ISO = 1), the detected voltage
Vp < Vp,FWC and so FWC is not utilised. For example, Vp ⩽ 0.5 Vp,FWC when
G ISO = 2, which might correspond to a doubling of S from say ISO 100
(G ISO = 1) to ISO 200 (G ISO = 2). There are disadvantages of not utilising FWC:
• Raw dynamic range (raw DR) is reduced, meaning that less scene contrast can
be reproduced in the raw data.
• Less radiant exposure is utilised at the maximum V, and this lowers signal-to-
noise ratio (SNR).

Signal noise will be discussed in section 3.8. SNR and raw DR are discussed in
chapter 5.
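The trade-off can be made concrete with a short calculation; the full-well capacity used below is an assumed round number rather than a measured value.

    # Maximum number of electrons that can be utilised before raw clipping,
    # obtained by rearranging equation (3.47) as V_p <= V_p,FWC / G_ISO.
    # The FWC value is an illustrative assumption.
    FWC = 40_000   # electrons (assumed)
    for G_ISO in (1, 2, 4, 8):
        print(f"G_ISO = {G_ISO}: up to {FWC / G_ISO:.0f} electrons "
              f"({100 / G_ISO:.1f}% of FWC) can be utilised")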

3.7.2 Digital numbers


A perfectly linear ADC can be modelled as quantization of the fraction V/Vmax into a raw level by taking the integer part [3]:

$n_{\mathrm{DN}} = \mathrm{INT}\!\left[\frac{V}{V_{\max}}\, k\, (2^M - 1)\right].$  (3.48)


• nDN is a raw level expressed either as a digital number (DN) or, identically, as an analog-to-digital unit (ADU).
• M is the bit depth of the ADC. For example, a 12-bit ADC provides 2^12 = 4096 possible raw levels ranging from DN = 0 to DN = 4095. A 10-bit ADC provides 1024 raw levels, and a 14-bit ADC provides 16 384 levels. For a given Vmax, a larger bit depth defines a smaller quantization step between the raw levels.
• The maximum possible raw level is obtained when V = Vmax ,
n_DN,max = INT[k (2^M − 1)]. (3.49)
The constant k ⩽ 1 has been included to take into account situations where
the maximum raw level is less than 2M − 1. Although it is advantageous to
utilise all possible raw levels, in some cameras the maximum is obtained
without utilising all levels provided by the ADC. Conversely, a value k = 1
does not necessarily imply that FWC is utilised since an ADC with insufficient
bit depth may be employed in rare cases. Note that the effective maximum
possible raw level may be reduced further by the presence of a bias offset. This
is discussed in section 3.7.4.
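A minimal sketch of this quantization model follows; the bit depth, the constant k and the test voltage ratios are illustrative assumptions.

    # Quantization model of equations (3.48) and (3.49).
    # M, k and the test voltage ratios are illustrative assumptions.
    def raw_level(V, V_max, M=12, k=1.0):
        """Raw level n_DN produced by a perfectly linear M-bit ADC."""
        return int((V / V_max) * k * (2**M - 1))

    V_max = 1.0   # arbitrary units
    for ratio in (0.0, 0.25, 0.5, 1.0):
        print(ratio, raw_level(ratio * V_max, V_max))   # 0, 1023, 2047, 4095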

3.7.3 Conversion factor


It is possible to substitute the voltages appearing in equation (3.48) by electron
counts. Substituting equations (3.41), (3.42) and (3.47) into equation (3.48) and
utilising the fact that Q = ne e yields the following result:
$n_{\mathrm{DN}} = \mathrm{INT}\!\left[\frac{n_e}{g}\right].$  (3.50)

The conversion factor g is defined by


$g = \frac{U}{G_{\mathrm{ISO}}},$  (3.51)
where U is the unity gain,
$U = \frac{n_{e,\mathrm{FWC}}}{k\,(2^M - 1)}.$
Although g is conventionally referred to as the gain in scientific photography, it is
simply a conversion factor between electron counts and raw values, and has units
e−/DN [35, 36]. Note the following:
• The conversion factor g is inversely proportional to G ISO. Under typical
photographic conditions at the base ISO setting, the electron count is larger
than 2^M − 1 and so g ⩾ 1. For example, 10 000 electron counts converted to
1000 DN implies that g = 10 e−/DN. In this respect, the behaviour of g is
more characteristic of an inverse gain. Indeed it is sometimes defined in DN/e−
units, in which case g should be replaced by g^−1 in the above equations.


For generality, the mosaic index i should be included since g may not be
identical for different mosaics.
• If the ISO setting S and associated ISO gain G ISO are doubled, a given nDN
will result from half the radiant exposure He and half the electron count ne.
However, g will also be halved and so the raw value nDN will remain
unchanged. Consistent with the discussion in section 3.7.1, this can be useful when needing to freeze the appearance of moving action, or when an exposure duration short enough to counteract camera shake is required in low-light conditions.
• When S is adjusted so that G ISO = U, the conversion factor becomes g = 1. In
this case, there is a one-to-one correspondence between electron counts and
raw levels. Because U depends on factors such as the bit depth of the ADC,
FWC, and the constant k, the unity gain will differ between camera models.
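For illustration, the unity gain and the resulting conversion factor can be computed for a few ISO gains; the full-well capacity, bit depth and k below are assumed values rather than data for any camera model.

    # Unity gain U and conversion factor g = U / G_ISO (equation (3.51)).
    # FWC, M and k are illustrative assumptions.
    FWC = 40_000   # electrons at full-well capacity (assumed)
    M, k = 12, 1.0
    U = FWC / (k * (2**M - 1))        # unity gain, about 9.8 e-/DN here
    for G_ISO in (1, 2, 4, 8):
        print(f"G_ISO = {G_ISO}: g = {U / G_ISO:.2f} e-/DN")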

3.7.4 Bias offset


Camera manufacturers may add a bias offset voltage Vbias to the signal voltage before
it is quantized by the ADC. In this case, equations (3.48) and (3.50) become

$n_{\mathrm{DN}} + n_{\mathrm{DN,bias}} = \mathrm{INT}\!\left[\frac{k\,V + V_{\mathrm{bias}}}{V_{\max}}\,(2^M - 1)\right] = \mathrm{INT}\!\left[\frac{n_e}{g}\right] + \mathrm{INT}\!\left[\frac{V_{\mathrm{bias}}}{V_{\max}}\,(2^M - 1)\right].$

One reason for adding the bias offset is that its presence can aid the analysis of signal
noise. However, the raw value nDN,bias must be subtracted from the raw data when
the raw data file undergoes conversion into a viewable output digital image. Some
camera manufacturers automatically subtract nDN,bias before the raw data file is
written.
Recall that equation (3.49) defined the maximum possible raw level as
n_DN,max = INT[k (2^M − 1)] ⩽ 2^M − 1.
However, the following condition must also hold in the presence of a bias offset:
n_DN,max + n_DN,bias ⩽ 2^M − 1.
This means that when Vbias is included and nDN,bias subtracted, the effective maximum
raw level or raw clipping point nDN,clip is lower than nDN,max ,
n DN,clip = n DN,max − n DN,bias .
Since nDN,bias is an offset rather than a factor, subtracting nDN,bias does not
necessarily reduce the maximum achievable raw DR.
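A short numerical example of the raw clipping point, using an assumed 12-bit ADC with k = 1 and an assumed bias offset of 256 DN:

    # Effective raw clipping point when a bias offset is present (section 3.7.4).
    # The bit depth and bias offset are illustrative assumptions.
    M, k = 12, 1.0
    n_DN_bias = 256                     # assumed bias offset (DN)
    n_DN_max = int(k * (2**M - 1))      # 4095
    n_DN_clip = n_DN_max - n_DN_bias    # 3839
    print(n_DN_max, n_DN_clip)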

3.8 Noise
Signal noise can be defined as unwanted variations in the voltage signal. Unless
eliminated, the noise will appear in the raw data. Signal noise can be broadly


classified into two main types. Temporal noise arises from the fact that the voltage
signal always fluctuates over short time scales. A major contribution to the temporal
noise arises from the quantum nature of light itself. Contributions to the total
temporal noise also arise from a variety of camera components. Fixed pattern noise
(FPN) is a type of noise that does not vary over time.
In digital cameras, the main contributions to the total temporal noise are photon
shot noise, read noise, and dark-current shot noise [32, 35].

3.8.1 Photon shot noise


When multiple photographs are taken under identical conditions, each raw data
file can be referred to as a frame. Significantly, even if the scene radiance
distribution is time-independent, the charge signal distribution over the SP
obtained from an exposure duration t will not be identical each time the exposure
is repeated. In other words, the frames will not be identical. This is because the
number of photons arriving at any given photosite is not a constant function of
time, but rather it fluctuates randomly over very short time scales. This is an
unavoidable consequence of the quantum nature of light, and the fluctuations
result in photon shot noise.
Although fluctuations occur over very short time scales, the mean signal at a
given photosite is considered to be stationary, meaning that the time average over an
exposure duration t that is long compared to the time scale over which the
fluctuations occur can be replaced by the ensemble average over all possible
fluctuation configurations. Photon shot noise can then be measured by the standard
deviation from the mean value of its statistical distribution. In other words, the
charge signal defined by equations (3.38) and (3.40) describes the mean or expected
charge signal at a given photosite for a specified exposure duration t, and the
associated photon shot noise measured in electron counts is denoted by σe,ph .
Since photons obey Poisson statistics, an important characteristic of the Poisson
distribution is that its variance, or standard deviation squared, is equal to the mean
value itself [16]:

$\sigma_{e,\mathrm{ph}}^2 = n_e.$  (3.52)

Although photon shot noise is a dominant source of noise at high exposure levels, it
increases as the square root of the electron count and so it becomes relatively less
significant as the exposure level increases. In other words, the SNR increases with
exposure level.
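Equation (3.52) can be checked numerically with a Poisson random number generator; the mean electron count below is an assumed value.

    # Photon shot noise obeys Poisson statistics: the variance equals the mean
    # (equation (3.52)). The mean electron count is an illustrative assumption.
    import numpy as np

    rng = np.random.default_rng(0)
    n_e_mean = 10_000
    samples = rng.poisson(n_e_mean, size=100_000)
    print(samples.mean(), samples.var())   # both close to 10 000
    print(n_e_mean**0.5)                   # shot noise of roughly 100 electrons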

3.8.2 Read noise


Read noise mainly arises from voltage fluctuations in the readout circuitry. It will be
present whenever charge readout occurs, even in the absence of irradiance at the SP.
Read noise defines the noise floor. The readout circuitry includes the charge
detection circuitry and PGA. Dark signal and dark-current shot noise (described


below) are generated inside a photoelement and are not included in the definition of
the read noise [32]. Forms of read noise include the following:
• Thermal (Johnson-Nyquist) noise: This occurs due to thermal agitation of
electrons.
• Flicker (1/f) noise: This is a circuit resistance fluctuation that appears at low
frequencies.
• Reset or kTC noise: This is a type of thermal noise due to reset of the photon-
well capacitance. In CMOS sensors it can be minimized using the correlated
double-sampling (CDS) technique [32].

Since read noise arises from voltage fluctuations in the readout circuitry rather than
fluctuations in the charge signal, it is measured as a standard deviation using DNs in
the raw data, σDN,read . However, read noise can be converted into electron counts by
using the conversion factor g defined in section 3.7.3. In this case the read noise is
denoted as σe,read . The conversion between DNs and electron counts is discussed
further in section 3.9.
A certain amount of read noise will already be present in the voltage signal before
it is amplified by the PGA. This part of the read noise is defined as the upstream read
noise and it will be amplified when the ISO gain G ISO is increased. On the other hand,
downstream read noise arises due to circuitry downstream from the PGA and is
therefore independent of G ISO.

3.8.3 Dark current shot noise


Even in the absence of irradiance at the SP, a charge signal Qdark = ne,dark e referred
to as the dark signal will be generated due to thermal agitation of electrons inside the
photoelements [32]. The corresponding dark current generated at a given photosite is
defined as
$I_{\mathrm{dark}} = \frac{n_{e,\mathrm{dark}}\, e}{t}.$
Here the exposure duration t is referred to as the integration time. This term applies
both in the presence and absence of irradiance at the SP.
Dark current shot noise σe,dcsn arises from fluctuations of the dark current over
very short time scales. It occurs both in the presence and absence of irradiance at
the SP. Although negligible at short integration times, σe,dcsn is dependent on
temperature and can become a dominant noise source at long integration times such
as those used in long-exposure photography.
By utilising the average raw value of so-called optical black photosites that are
positioned at the edges of the imaging sensor and shielded from light, cameras can
measure and subtract the dark signal before the raw data file is written. More
sophisticated methods will compensate for shading gradients with respect to row or
column as a function of the operating temperature [32]. However, subtraction of the
dark signal will not eliminate σe,dcsn .


3.8.4 Noise power


The noise power of a given temporal noise source is defined by the variance of its
statistical distribution, or equivalently the square of the standard deviation, σ².
Significantly, it is the noise powers that are summed when multiple independent
sources of temporal noise are present,
$\sigma_{\mathrm{total}}^2 = \sigma_1^2 + \sigma_2^2 + \sigma_3^2 + \cdots$
This means that if there are M independent sources of temporal noise present, the
total temporal noise is given by
$\sigma_{\mathrm{total}} = \sqrt{\sum_{m=1}^{M} \sigma_m^2}\,.$  (3.53)

This is known as summing in quadrature. A characteristic of temporal noise is that it can be reduced by frame averaging. This technique is described in chapter 5.
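Summing in quadrature is a one-line calculation; the individual noise values below are assumed numbers in electrons.

    # Independent temporal noise sources add in quadrature (equation (3.53)).
    # The individual noise values are illustrative assumptions (electrons).
    import math

    sigma_read, sigma_shot, sigma_dcsn = 5.0, 100.0, 2.0
    sigma_total = math.sqrt(sigma_read**2 + sigma_shot**2 + sigma_dcsn**2)
    print(sigma_total)   # about 100.1 e-: the largest source dominates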

3.8.5 Fixed pattern noise


Unlike temporal noise, FPN has a fixed pattern associated with it that does not
change from frame to frame. Due to the ability of the HVS to detect patterns, FPN is often the most visually unpleasant type of noise [37]. It can be categorized into two main types:
• Dark-signal nonuniformity (DSNU): Denoted by nDN,dsnu or ne,dsnu , this refers
to FPN that occurs in the absence of irradiance at the SP. The major
contribution to DSNU for long integration times is dark-current nonuniform-
ity (DCNU), which arises from variations in photosite response over the SP
to the thermally induced dark current generated at each photosite. Another
contribution is from amplifier glow. Scientific CCD sensors used in astropho-
tography are often cooled in order to reduce DSNU.
• Pixel response nonuniformity (PRNU): Denoted by nDN,prnu or ne,prnu , this
refers to FPN that occurs in the presence of irradiance at the SP. Although
PRNU again arises from variations in photosite response over the SP, its
magnitude increases in proportion with the exposure level.

Independent contributions to the total FPN are noise signals that are added directly.
This differs from independent contributions to the total temporal noise, which are
added in quadrature. Since FPN does not change from frame to frame, it can in
principle be removed. This is described in the next section.
There is another type of pattern noise that should be mentioned. Although read
noise should in principle be purely temporal and form a Gaussian distribution about
the expected signal, the read noise in some cameras is not purely Gaussian but has
some periodic pattern component arising from circuit interference [35]. Although the
pattern component is not fixed from frame to frame and therefore cannot be strictly
categorized as DSNU, any overall pattern can be detected by averaging over many


bias frames, a bias frame being a frame taken with zero integration time at a
specified ISO setting.

3.9 Noise measurement


When measuring noise, it is important to analyse raw frames that have not
undergone any further processing. For example, the freeware raw converter dcraw
can be used to decode the raw data without performing a colour demosaic.
Noise measured using the raw data will be in terms of DN (or ADU). These units
are described as output-referred units. However, noise can subsequently be expressed
using electron count by applying the conversion factor g defined in section 3.7.3. The
conversion factor is dependent upon ISO setting S. Electron counts are described as
input-referred units. Significantly, noise contributions such as read noise that were
not present in the original charge signal can also be expressed in terms of electron
counts.

3.9.1 Conversion factor measurement


A useful technique for measuring the conversion factor g is to carry out a temporal
noise measurement according to the procedure detailed below. Effectively, the
photon shot noise measured in DN is used as a signal that provides the required
information [32, 35, 36]. Ideally, all colour channels should be measured individ-
ually. Only the green channel is considered here.
1. Take two successive frames of a uniformly illuminated surface such as a
neutral grey card. The two frames must be of equal duration so that they will
be identical apart from the statistical variation in temporal noise.
2. Add the two frames together and measure the mean green-channel raw value
in a fixed area at the centre of the frame. Dividing by two gives the mean
green-channel raw value in the fixed area for the two frames.
3. Subtract one frame from the other and divide the result by 2. This removes
the FPN through cancellation and averages the temporal noise. Now
measure the standard deviation in the fixed area. Since the temporal noise
adds in quadrature, the measured noise is given by $\frac{1}{2}\sqrt{\sigma^2 + (-\sigma)^2} = \sigma/\sqrt{2}$,
where σ is the noise (per pixel in the green channel) of either of the original
frames. The temporal noise will now have been reduced by a factor of $\sqrt{2}$.
Multiplying by $\sqrt{2}$ yields the temporal noise σ for either of the original
frames in the absence of FPN.
4. Repeat the procedure a number of times for a given ISO setting, each for a
different integration time, starting from the minimum available in manual
mode such as 1/8000 s.
5. Plot a graph of $\sigma_{\mathrm{DN}}^2$ along the vertical axis versus mean green-channel raw
value nDN along the horizontal axis and fit the result to a straight line. Since
the dark-current shot noise will be negligible at the minimum available
integration time, the intercept with the y-axis will be the square of the read


noise, $\sigma_{\mathrm{DN,read}}^2$, measured in DN. The square of the photon shot noise, $\sigma_{\mathrm{DN,ph}}^2$, is given by subtracting $\sigma_{\mathrm{DN,read}}^2$ from the graph value.

Figure 3.32 shows a temporal noise measurement for the Olympus® E-M1 at ISO
1600. The fitted straight line is shown near the origin.
The value of the conversion factor g emerges from the above measurement. The first
step is to utilise equation (3.52), which states that the square of the photon shot noise is
equal to the mean signal itself when both are expressed using input-referred units,
$\sigma_{e,\mathrm{ph}}^2 = n_e.$

Dividing both sides by the conversion factor yields


$\frac{\sigma_{e,\mathrm{ph}}^2}{g} = \frac{n_e}{g}.$  (3.54)
From equation (3.50) it is known that
$\frac{n_e}{g} = n_{\mathrm{DN}},$
and
$\left(\frac{\sigma_{e,\mathrm{ph}}}{g}\right)^2 = \sigma_{\mathrm{DN,ph}}^2.$
Substituting both of these into equation (3.54) yields the final result,
$\sigma_{\mathrm{DN,ph}}^2 = \frac{n_{\mathrm{DN}}}{g}.$

Figure 3.32. Temporal noise measurement for the Olympus® E-M1 at ISO 1600.


This means that the value of g at the selected ISO setting is obtained as the inverse of
the gradient of the fitted straight line [32, 35, 36].
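The frame arithmetic of steps 2 and 3 can be sketched as follows. The sketch uses synthetic frames so that it is self-contained; the signal level, fixed pattern and conversion factor used to build them are illustrative assumptions, not measured data.

    # Sketch of the two-frame temporal noise measurement (section 3.9.1),
    # demonstrated on synthetic frames. The signal level, FPN and conversion
    # factor used to synthesise the frames are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(1)
    g_true = 4.0                                  # assumed conversion factor (e-/DN)
    n_e_mean = 8_000                              # assumed mean electron count
    fpn = rng.normal(0.0, 20.0, size=(200, 200))  # fixed pattern, identical in both frames

    def synth_frame():
        return rng.poisson(n_e_mean, size=(200, 200)) / g_true + fpn

    frame_a, frame_b = synth_frame(), synth_frame()
    mean_dn = (frame_a + frame_b).mean() / 2.0                     # step 2
    sigma_dn = ((frame_a - frame_b) / 2.0).std() * np.sqrt(2.0)    # step 3: FPN cancels
    print(mean_dn, sigma_dn**2, mean_dn / sigma_dn**2)             # last value estimates g

Repeating this over several integration times and fitting the variance against the mean raw value (for example with np.polyfit) gives a straight line whose gradient is 1/g, in line with the result above.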

3.9.2 Read noise measurement


Although a read noise estimate will emerge when performing the temporal noise
measurement described above, alternative methods can be used to determine the
read noise more accurately.
Recall from section 3.8.5 that a bias frame is a frame taken with zero integration
time. Dark current will be absent when the integration time is zero, and so a bias
frame contains only read noise. A bias frame can be approximated by a dark frame
taken using the shortest exposure duration available in manual mode, for example,
1/8000 s. In order to ensure the frame is dark, it should be taken in a dark room with
the lens cap covering the lens.
Recall from section 3.7.4 that some camera manufacturers leave a bias offset in
the raw data. In this case, the mean of the read noise distribution in a bias frame will
be the bias offset, nDN,bias , and the standard deviation of the distribution can be
measured accurately. Example read noise distributions for the Olympus® E-M1 are
shown in figure 3.33 for several ISO settings. This camera includes a bias offset
centred at DN = 256, and the read noise appears to have ideal Gaussian character.
Any non-Gaussian character that may arise, such as periodic pattern components
due to circuit interference, can be analysed by taking the FT of the bias frame.
The mean of the read noise distribution will be centred at DN = 0 if the camera
manufacturer does not leave a bias offset in the raw data, and this makes analysis
more difficult since half of the distribution will be clipped to zero [35].
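A minimal sketch of the bias-frame analysis is given below, again using a synthetic frame so that it runs standalone; the bias offset and read noise used to synthesise it are illustrative assumptions.

    # Read noise estimate from a bias frame (section 3.9.2), demonstrated on a
    # synthetic frame. The bias offset and read noise used to build it are
    # illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(2)
    bias_frame = rng.normal(loc=256.0, scale=3.5, size=(200, 200))  # Gaussian read noise

    n_DN_bias = bias_frame.mean()     # mean of the distribution = bias offset
    sigma_read = bias_frame.std()     # standard deviation = read noise (DN)
    print(n_DN_bias, sigma_read)      # approximately 256 and 3.5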

Figure 3.33. Gaussian read noise distribution measured in DN for the Olympus® E-M1 at a selection of high
ISO settings. The distribution is centred at the DN = 256 bias offset present in the raw data.


3.9.3 Noise models


Once the conversion factor g is known at each ISO setting, noise measured using
output-referred units (DN) can be converted into input-referred units (electron
count). Significantly, this includes the read noise even though this was not part of the
original detected charge.
The upper plot in figure 3.34 shows read noise measured using DN for the
Olympus® E-M1 as a function of ISO setting, S. As expected, the read noise
increases as S is raised from its base value at ISO 200 through to ISO 25600. The
middle plot shows the conversion factor g as a function of S. The data indicates the
presence of a single-stage PGA for all S. Furthermore, the same analog gain is used
for ISO 100 and ISO 200, indicating that the ISO 100 setting is an extended low-ISO
setting achieved using overexposure and JPEG tone curve manipulation.
It is often assumed that higher ISO settings are noisier, and this assertion appears
to be suggested by the above data. In fact, the opposite is true. This can be illustrated
by using g to express the read noise in terms of electron count. As illustrated by the
lower plot in figure 3.34, the read noise expressed using electron count decreases as S
is raised from ISO 200 through to ISO 25600. In other words, for a fixed exposure at
the SP, which results in a fixed charge signal, the higher ISO settings are seen to be
less noisy. The explanation for this lies in the fact that part of the read noise arises
from circuitry downstream from the PGA, and this contribution is not amplified by
raising S.
The reason that higher ISO settings often produce noisier raw data in practice
is that a high S is typically used to compensate when insufficient exposure is
available at the SP, and it is the reduced exposure that is responsible for the noisier
raw data rather than the higher S. SNR as a function of S is discussed in detail in
chapter 5.
It is straightforward to develop temporal noise models for cameras that use a
single-stage PGA. The read noise component can be modelled in the following way:
$\sigma_{\mathrm{DN,read}}^2 = (S\sigma_0)^2 + \sigma_1^2.$

The ISO setting S is proportional to the PGA gain defined by equation (3.45) of
section 3.7. The read noise term σ0 describes the contribution to the total read noise
arising from electronics upstream from the PGA, and the read noise term σ1
describes the contribution from electronics downstream from the PGA [35]. Since
these terms describe temporal noise, they must be added in quadrature. This model
can be fitted to the data of figure 3.34.
As a second example, figure 3.35 shows measured data for the Nikon® D700. The
behaviour of the conversion factor suggests that all S below ISO 200 are extended
low-ISO settings. Furthermore, the read noise data appears to indicate the use of a
second-stage PGA at certain S. A more sophisticated noise model is required for a
two-stage PGA [35],
$\sigma_{\mathrm{DN,read}}^2 = M^2\{(S\sigma_0)^2 + \sigma_1^2\} + \sigma_2^2.$


Figure 3.34. Read noise and conversion factor measurement for the Olympus® E-M1 plotted as a function of
ISO setting using base 2 logarithmic axes. (Upper) Output-referred read noise expressed using digital numbers
(DN). Data measured by the author. (Centre) Conversion factor measured in electrons/DN. Data courtesy of
W. J. Claff. (Lower) Calculated input-referred read noise measured in electrons.

Here σ0 is the read noise upstream from the PGA, S is the main (first-stage) ISO
setting, and σ1 is the noise contribution from the first-stage amplifier. For inter-
mediate S, the multiplier M for the second-stage amplifier takes a value of 1.25 or
1.6, and all read noise present is amplified. The final term σ2 is the read noise


Figure 3.35. Read noise and conversion factor measurement for the Nikon® D700 plotted as a function of ISO
setting using base 2 logarithmic axes. Data courtesy of W. J. Claff. (Upper) Output-referred read noise
expressed using digital numbers (DN). (Centre) Conversion factor expressed using electrons/DN. (Lower)
Input-referred read noise expressed using electrons.

contribution from the second-stage amplifier along with the read noise downstream
from the PGA.
Since FPN can in principle be removed, only a temporal noise model needs to be
included to complete the raw data model derived in this chapter. Read noise can be


included using models such as those given above, and photon shot noise expressed as
a standard deviation with zero mean varies as the square root of the signal expressed
using input-referred units (electrons) according to equation (3.52). Conversion
between input-referred and output-referred units (DN) is achieved using the
conversion factor defined by equation (3.51). The temporal noise model expressed
using input-referred units and output-referred units can be added to equations (3.39)
and (3.50), respectively.
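As an illustration of how these pieces fit together, the sketch below evaluates the total temporal noise in output-referred units for a single-stage PGA. The parameter values (unity gain, base ISO setting and the two read noise terms) are assumed for illustration only, and dark-current shot noise is neglected.

    # Sketch of a temporal noise model in output-referred units (DN) for a
    # single-stage PGA. U, S_base, sigma_0 and sigma_1 are illustrative
    # assumptions; dark-current shot noise is neglected.
    import math

    def total_noise_dn(n_e, S, S_base=200, U=10.0, sigma_0=0.02, sigma_1=2.0):
        """Total temporal noise (DN) for n_e detected electrons at ISO setting S."""
        g = U / (S / S_base)                    # conversion factor, e-/DN
        sigma_shot_dn = math.sqrt(n_e) / g      # photon shot noise converted to DN
        sigma_read_dn = math.sqrt((S * sigma_0)**2 + sigma_1**2)
        return math.sqrt(sigma_shot_dn**2 + sigma_read_dn**2)

    print(total_noise_dn(n_e=5_000, S=800))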

References
[1] Gaskill J D 1978 Linear Systems, Fourier Transforms, and Optics (New York: Wiley-
Interscience)
[2] Holst G C 1998 CCD Arrays, Cameras, and Displays 2nd edn (Winter Park, FL: JCD
Publishing, and Bellingham, WA: SPIE)
[3] Holst G C and Lomheim T S 2011 CMOS/CCD Sensors and Camera Systems 2nd edn
(Winter Park, FL: JCD Publishing, and Bellingham, WA: SPIE)
[4] Fiete R D 2010 Modeling the Imaging Chain of Digital Cameras, SPIE Tutorial Text vol
TT92 (Tutorial Texts in Optical Engineering) (Bellingham, WA: SPIE Press)
[5] Farrell J E, Xiao F, Catrysse P B and Wandell B A 2003 A simulation tool for evaluating
digital camera image quality Proc. SPIE 5294 124
[6] Maeda P, Catrysse P and Wandell B 2005 Integrating lens design with digital camera
simulation Proc. SPIE 5678 48
[7] Farrell J E, Catrysse P B and Wandell B A 2012 Digital camera simulation Appl. Opt. 51
A80
[8] Gonzalez R C and Woods R E 2008 Digital Image Processing 3rd edn (Englewood Cliffs,
NJ: Prentice Hall)
[9] Palmer J M and Grant B G 2009 The Art of Radiometry SPIE Press Monograph vol. 184
(Bellingham, WA: SPIE Press)
[10] Camera & Imaging Products Association 2004 Sensitivity of Digital Cameras CIPA DC-004
[11] International Organization for Standardization 2006 Photography—Digital Still Cameras—
Determination of Exposure Index, ISO Speed Ratings, Standard Output Sensitivity, and
Recommended Exposure Index, ISO 12232:2006
[12] Nasse H H 2008 How to Read MTF Curves (Carl Zeiss Camera Lens Division)
[13] Goodman J 2004 Introduction to Fourier Optics 3rd edn (Englewood, CO: Roberts and
Company)
[14] Shannon R R 1997 The Art and Science of Optical Design (Cambridge: Cambridge
University Press)
[15] Boreman G D 2001 Modulation Transfer Function in Optical and Electro-Optical Systems,
SPIE Tutorial Texts in Optical Engineering Vol. TT52 (Bellingham, WA: SPIE Publications)
[16] Saleh B E A and Teich M C 2007 Fundamentals of Photonics 2nd edn (New York: Wiley-
Interscience)
[17] Shannon R R 1994 Optical specifications Handbook of Optics (New York: Mc-Graw-Hill)
ch 35
[18] Born M and Wolf E 1999 Principles of Optics: Electromagnetic Theory of Propagation,
Interference and Diffraction of Light 7th edn (Cambridge: Cambridge University Press)
[19] Smith W J 2007 Modern Optical Engineering 4th edn (New York: McGraw-Hill)


[20] Yadid-Pecht O 2000 Geometrical modulation transfer function for different pixel active area
shapes Opt. Eng. 39 859
[21] Fliegel K 2004 Modeling and measurement of image sensor characteristics Radioengineering
13 27
[22] Wolberg G 1990 Digital Image Warping 1st edn (Piscataway, NJ: IEEE Computer Society
Press)
[23] Gonzalez R C and Woods R E 2007 Digital Image Processing 3rd edn (Englewood Cliffs,
NJ: Prentice-Hall)
[24] Shannon C E 1949 Communication in the presence of noise Proc. Inst. Radio Eng. 37 10
[25] Greivenkamp J E 1990 Color dependent optical prefilter for the suppression of aliasing
artifacts Appl. Optics 29 676
[26] Palum R 2009 Optical antialiasing filters Single-Sensor Imaging: Methods and Applications
for Digital Cameras ed R Lukac (Boca Raton, FL: CRC Press) ch 4
[27] Theuwissen A J P 1995 Solid-State Imaging with Charge-Coupled Devices (Dordrecht:
Kluwer)
[28] Sze S M and Ng K K 2006 Physics of Semiconductor Devices 3rd edn (New York: Wiley-
Interscience)
[29] Sze S M and Lee M-K 2012 Semiconductor Devices: Physics and Technology 3rd edn (New
York: Wiley)
[30] Yamada T 2006 CCD image sensors Image Sensors and Signal Processing for Digital Still
Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 4
[31] Takayanagi I 2006 CMOS image sensors Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 5
[32] Nakamura J 2006 Basics of image sensors Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 3
[33] Jiang J, Liu D, Gu J and Susstrunk S 2013 What is the space of spectral sensitivity functions
for digital color cameras? IEEE Workshop on the Applications of Computer Vision (WACV)
(Clearwater Beach, FL) (Piscataway, NJ: IEEE Computer Society) 168–79
[34] Subbarao M 1990 Optical transfer function of a diffraction-limited system for polychromatic
illumination Appl. Optics 29 554
[35] Martinec E 2008 Noise, Dynamic Range and Bit Depth in Digital SLRs unpublished
[36] Mizoguchi T 2006 Evaluation of image sensors Image Sensors and Signal Processing for
Digital Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 6
[37] Sato K 2006 Image-processing algorithms Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 8


Chapter 4
Raw conversion

The objective of raw conversion is to convert a recorded raw file into a viewable
output image file. Although this objective was briefly summarised in section 2.1 of
chapter 2, numerous procedures are required in practice. In traditional digital
cameras, fundamental steps in the raw conversion process include the following:
1. Linearisation: The raw data may be stored in a compressed form that needs
to be decompressed for processing, for example, by using a look-up
table (LUT).
2. Dark signal subtraction: As mentioned in section 3.8 of chapter 3, cameras
can measure and subtract the dark signal before the raw data file is written by
utilising the average raw value of the optical black photosites that are
positioned at the edges of the imaging sensor and shielded from light. More
sophisticated methods will compensate for shading gradients with respect to
row or column as a function of the operating temperature.
3. White balance: In traditional digital cameras, this is achieved by direct
application of multipliers to the raw channels in order that equal raw values
correspond to a neutral subject.
4. Colour demosaic: Raw data obtained from a sensor with a colour filter array
(CFA) contains incomplete colour information. The missing data needs to be
determined through interpolation.
5. Colour-space transformation: The demosaiced raw data resides in the internal
camera raw space and needs to be transformed into a standard device-
independent output-referred colour space such as sRGB for viewing on a
display. In traditional digital cameras, this is achieved through application of
a colour rotation matrix that can be derived after characterising the camera
by mapping the internal camera raw space to a reference colour space such as
CIE XYZ.
6. Image processing: Various proprietary image-processing algorithms will be
applied by the in-camera image processing engine.


7. Tone mapping: For preferred tone reproduction, luminance compression is performed through application of a tone-mapping operator such as the s-shaped global tone curve described in section 2.3.2 of chapter 2.
8. Bit-depth reduction: As described in section 2.2 of chapter 2, bit-depth
reduction to 8 bits is performed in conjunction with gamma encoding in order
to minimize visible banding or posterisation artefacts. The nonlinear gamma
encoding curve is applied before the occupied raw levels are converted to
digital output levels (DOLs). The nonlinearity is compensated for through
gamma decoding by the display device. Gamma encoding is typically
combined with the tone curve of the previous step through application of
a LUT.
9. JPEG encoding: As described in section 2.13.1 of chapter 2, the DOLs are
transformed into the Y ′CRCB representation, where Y ′ denotes luma. The CR
and CB chroma components are subsampled for storage efficiency.

The steps may be implemented slightly differently in smartphone cameras. For example, white balance and colour conversion may be performed by using a
chromatic adaptation transform and camera characterisation matrix rather than
raw channel multipliers and a colour rotation matrix. These different strategies are
described in section 4.6. Furthermore, modern smartphone cameras typically use
local tone-mapping operators rather than global tone curves.
Since steps 7–9 have already been discussed in chapter 2, the present chapter
mainly focuses on the details of the colour conversion steps 3–5 that were briefly
summarised in section 2.1.2 of chapter 2. Additionally, section 4.11 discusses several
image management issues that arise when raw conversion software is used to
perform the raw conversion.

4.1 Reference colour spaces


A colour space is a specific range of colours that can be specified by an associated
colour model. Reference colour spaces contain all possible colours, and examples
include LMS, CIE RGB, CIE XYZ and CIE LAB. These reference colour spaces
differ only by their associated colour model.
Reference colour spaces are vital for theoretical work as they can be used to
translate colours from one colour space to another but are not suitable for viewing
images. Indeed, a conventional monitor is only capable of displaying a much smaller
range of colours. A wide variety of smaller colour spaces have been defined that
contain fewer colours than reference colour spaces. These smaller colour spaces have
various applications and may differ by the colours that they contain or by their
associated colour model. A commonly encountered colour model is the RGB colour
model, which uses a Cartesian coordinate system. Output-referred colour spaces are
standard device-independent colour spaces that are suitable for viewing images on
conventional display monitors, and examples include sRGB and Adobe® RGB.
A given camera model has its own internal camera raw space. As it is beneficial to
preserve as much scene information as possible before converting to an output-referred


colour space, an ideal camera raw space would contain all possible colours. In other
words, it would be a reference colour space associated with a non-standard colour
model. In practice, camera raw spaces are very large but are dependent upon camera
model and do not contain all possible colours. In fact, camera raw spaces cannot be
regarded as true colour spaces at all unless a condition known as the Luther–Ives
condition is satisfied. This condition is discussed in section 4.4.
In general, camera raw spaces can be regarded as large approximate colour
spaces. Before any processing of the raw data can occur, a camera must be
characterized by establishing a linear mapping between the camera raw space and
a reference colour space such as CIE XYZ. Unless the Luther–Ives condition is
satisfied exactly, this linear mapping can only be defined in an approximate way.
After a camera has been characterized, colours specified in the camera raw space can
be converted into colours specified in an appropriate output-referred colour space.
This colour space conversion will be performed as part of the conversion of the raw
data into a viewable output digital image.
The reference colour space specified by the International Color Consortium (ICC)
is known as the profile connection space (PCS) [1], and this can be based on either
CIE LAB or CIE XYZ.

4.1.1 Theory of colour


Colour is a perceived human physiological sensation to electromagnetic waves with
wavelengths in the visible region, which ranges from approximately 380 nm to 780
nm. A common way to categorize colour is in terms of luminance and chromaticity
descriptors. Luminance has already been defined in section 1.5.1 of chapter 1. Note
the following:
• Chromaticity can be further divided into hue and saturation components.
• A hue or pure spectrum colour is a colour that can be seen in a rainbow. These
can be divided into six main colour regions: red, orange, yellow, green, blue,
and violet. Figure 4.1 shows that there is a smooth transition from one colour
region to the next. This means that the term ‘red’ does not define any specific
colour, and any of the hues corresponding to the first colour region may be
described as being red. The same reasoning applies to the other colour
regions.
• Colours that are not pure spectrum colours have been diluted with white
light, which is a mixture of all possible hues. Adding white light to a hue
decreases the saturation of the colour. Pure spectrum colours are fully



Figure 4.1. Pure spectrum colours as a function of wavelength λ in nanometres.


saturated. An example of a colour that has reduced saturation is pink, which is obtained from mixing a red hue with white light.
• Fully desaturated or achromatic colours have no saturation component and
can only be described using a grey scale.
• The term monochrome is used to describe an image containing colours that
may vary in luminance but not in chromaticity. A monochrome image is not
necessarily achromatic.

Light incident upon the retina of the eye is generally composed of a mixture of
electromagnetic waves of various wavelengths. This polychromatic mixture can be
characterised by its spectral power distribution (SPD), P(λ ). This can be any function
of power contribution at each wavelength such as spectral radiance, Le,λ , which was
introduced in chapter 3.
For a given set of viewing conditions, there may be many different SPDs that
appear to have the same colour to a human observer. For example, different SPDs
may be found that appear to have the same colour as a pure monochromatic
spectrum colour. This concept is known as the principle of metamerism, and a group
of SPDs that yield the same visual colour response under identical viewing
conditions are known as metamers. The principle of metamerism is fundamental
to the theory introduced in the following sections.

4.1.2 Eye cone response functions


Most humans are trichromats, which means that the human eye has three types of
cone cells that act as light receptors. The three types of cone cells are known as long,
medium, and short since their responses peak in the longer, medium, and shorter
wavelength regions of the visible spectrum, respectively. According to the Young–
Helmholtz theory, the physiological sensation of colour arises from differences in
these cone responses in terms of photon absorption as a function of wavelength.
Mathematically, the responses can be described by eye cone response functions l¯(λ ),
m̄(λ ), and s̄(λ ).
Trichromacy is important because it permits the eye to make metameric colour
matches using a set of three eye cone primary colours or eye cone primaries λL , λM,
and λS in a way that resembles linear algebra. The eye cone response functions l¯(λ ),
m̄(λ ), and s̄(λ ) can be interpreted as the amounts of the eye cone primaries that the
eye uses at a given λ to sense colour. By definition, all colours spanned by the eye
cone primaries define a reference colour space. This is known as the LMS colour
space.
As illustrated in figure 4.2, there is in fact considerable overlap between l¯(λ ), m̄(λ ),
s̄(λ ), and so the eye cones cannot be uniquely stimulated. This means that the eye
cone primaries λL , λM, λS cannot be observed as individual colours. Although they
belong to the visible spectrum of wavelengths, their saturations are greater than pure
spectrum colours and so they are invisible. Accordingly, the eye cone primaries are
described as imaginary primaries.


Figure 4.2. Normalised eye response curves based on the Stiles and Burch 10° colour-matching functions [2]
adjusted to 2°. Arbitrary colours have been used for the curves.

The colour of an SPD denoted by P can be associated with a set of three
tristimulus values L, M , S obtained by integrating the product of P and the eye cone
response functions over the visible spectrum,
$L = \int_{380}^{780} P(\lambda)\,\bar{l}(\lambda)\, d\lambda, \quad M = \int_{380}^{780} P(\lambda)\,\bar{m}(\lambda)\, d\lambda, \quad S = \int_{380}^{780} P(\lambda)\,\bar{s}(\lambda)\, d\lambda.$

Under the same viewing conditions, the same L, M , S triple will result from
metameric SPDs. The set of all distinct L, M , S triples defines the LMS colour
space.
Alternative reference colour spaces such as CIE XYZ have mathematical
properties that are more useful for digital imaging.
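Once the SPD and the response functions are tabulated on a common wavelength grid, the tristimulus integrals can be evaluated numerically. In the sketch below, the SPD and the cone response curves are crude Gaussian stand-ins chosen purely for illustration; real work would use tabulated measured data.

    # Numerical evaluation of the L, M, S tristimulus integrals over 380-780 nm.
    # The SPD and cone response curves are crude Gaussian stand-ins, purely for
    # illustration; real work would use tabulated measured data.
    import numpy as np

    wl = np.arange(380.0, 781.0, 1.0)   # wavelength grid, 1 nm spacing

    def bump(peak, width):
        return np.exp(-0.5 * ((wl - peak) / width) ** 2)

    l_bar, m_bar, s_bar = bump(565, 50), bump(540, 45), bump(445, 30)
    P = np.ones_like(wl)                # equi-energy SPD for simplicity

    # With 1 nm spacing, a plain sum approximates each integral:
    L, M, S = (P * l_bar).sum(), (P * m_bar).sum(), (P * s_bar).sum()
    print(L, M, S)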

4.1.3 Colour-matching functions


In the early twentieth century it was not possible to directly measure the eye cone
response functions accurately. Consequently, an alternative set of functions known
as colour-matching functions were derived from visual colour-matching experiments.
The colour-matching functions were used instead of the eye cone response functions
to define a reference colour space that contains all possible colours.
In the previous section, it was shown that trichromacy permits the human eye to
make metameric colour matches using a set of three eye cone primaries. Grassman’s
laws are a generalisation of this concept. They state that any set of three linearly
independent monochromatic light sources can be chosen as primaries, and any
target colour defined by an SPD can be matched by a linear combination of the
chosen primaries. However, the linear combination must permit ‘negative’ amounts,
as discussed below.


In 1931, the Commission Internationale de l’Eclairage (CIE) (translated as ‘International Commission on Illumination’) considered a set of data that had
been obtained experimentally by Wright (1928–29) and Guild (1931) in which the
following set of real primaries were chosen:
λR = 700 nm,
λ G = 546.1 nm,
λB = 435.8 nm.
Human observers had been asked to visually match the colour of a target
monochromatic light source of wavelength λ within a viewing angle of 2° by mixing
any amounts of the three primaries. The aim of the experiments was to obtain a set
of colour-matching functions r̄(λ ), ḡ(λ ), b̄(λ ) that would define the amounts of the
primaries λR , λ G , λB needed for a metameric colour match to be made with each hue
or pure spectrum colour.
The results of the colour-matching experiments can be summarised by the
following expression:
[E(λ)] ≡ r¯(λ)[R ] + g¯(λ)[G ] + b¯(λ)[B ]. (4.1)

Here [E(λ )] is one unit of the monochromatic target colour, and [R], [G], [B] each
denote one unit of the primaries. The nature of these units is discussed in section
4.1.4 below. The colour-matching functions are shown in figure 4.3. They define the
CIE 2° standard colourimetric observer representative of normal human colour
vision and serve as mathematical functions that can be used to obtain metameric
colour matches. The associated reference colour space is known as CIE RGB, which
is discussed in section 4.1.6. Colour-matching functions obtained using a 10° viewing

Figure 4.3. 1931 CIE colour-matching functions, r̄(λ ), ḡ(λ ), b̄(λ ), which define the 2° standard observer. The
units are defined according to equation (4.2) so that the area under each curve is the same.


angle were later defined in 1964, and these define the CIE 10° standard colourimetric
observer.
Notice that the colour-matching functions have negative values at certain wave-
lengths. In these cases it was found that a primary needed to be added to the target
colour to obtain a colour match rather than be mixed with the other primaries. This
is a consequence of the fact that real primaries are being used, and so the linear
combination must permit negative values. In fact, a set of three colour matching
functions that do not have any negative values can only be associated with
imaginary primaries such as the eye cone primaries. Ultimately, this is a conse-
quence of the fact that the eye cone primaries cannot be uniquely stimulated due to
the overlap between the eye cone response functions.
It should also be noted that any set of three linearly independent monochromatic
sources can be chosen as primaries. However, sources with wavelengths in each of
the red, green and blue regions of the visible spectrum are most useful in practice
since a large range of colours can be matched by such a choice when only positive
linear combinations of the primaries are allowed, as in the case of a display device.
Significantly, it follows from Grassman’s laws that all valid sets of colour matching
functions are related to each other via a linear transformation.

4.1.4 Units
It is important to understand the units that were defined by the CIE when the
experimental data of Wright and Guild was analysed.
According to equation (4.1), one unit of the monochromatic target colour
denoted by [E(λ )] was defined such that
[E(λ)] ≡ r¯(λ)[R ] + g¯(λ)[G ] + b¯(λ)[B ],
where [R], [G], [B] each denote one unit of the associated primaries λR, λ G , λB.
Rather than use absolute power units, the CIE normalised the units so that the
following result is obtained when both sides of the above expression are integrated
over the visible spectrum:

[E] ≡ [R ] + [G ] + [B ] . (4.2)

Here [E] is defined by


$1\,[\mathrm{E}] = \int_{380}^{780} 1\,[E(\lambda)]\, d\lambda.$

This means that the units for the primaries are defined such that one unit of each will
match the colour of a hypothetical polychromatic equi-energy source that has
constant power at all wavelengths. This SPD is known as illuminant E.
Consequently, the area under each of the curves appearing in figure 4.3 is the
same. Since the [R], [G], [B] units each represent their own individual quantities of
power, they must be treated as dimensionless.


Since the primary units are dimensionless, it is important to keep track of the
luminance and radiance ratios implied by the units. The luminance ratio between
[E], [R ], [G ] and [B ], respectively, is given by

1 : 0.17697 : 0.81240 : 0.01063 . (4.3)

This implies that 0.17697 cd m⁻² of the red primary, 0.81240 cd m⁻² of the green primary, and 0.01063 cd m⁻² of the blue primary would be needed to match 1 cd m⁻² of illuminant E.
The standard luminosity function V (λ ) that was introduced in section 3.1.1 of
chapter 3 can be used to obtain the corresponding radiance ratio [3]. This function is
discussed further in the next section. By utilising the value of V (λ ) at the primary
wavelengths, the radiance ratio between the units is found to be

1 : 0.96805 : 0.01852 : 0.01343 .

4.1.5 Standard luminosity function


Recall from section 3.1.1 of chapter 3 that the sensitivity of the human visual system
(HVS) to daylight is described by the standard luminosity function for photopic
vision denoted by V (λ ) or ȳ(λ ). This function is needed to convert radiometric
quantities into photometric (visible) quantities.
When the colour-matching functions are used to obtain a colour match at a given
λ according to equation (4.1), the total luminance on both sides of the equation must
be equal. This follows from the fact that luminance is one of the defining
characteristics of colour, along with chromaticity. Since luminance is obtained
from radiance at a given wavelength via V (λ ), the relationship between the colour-
matching functions and V (λ ) must satisfy the luminance ratio between the units
defined by equation (4.3),

V (λ) = c(0.17697 r¯(λ) + 0.81240 g¯(λ) + 0.01063 b¯(λ)). (4.4)

Here c is a normalisation constant. At λ = 700 nm where λ = λR, figure 4.3 shows that the functions ḡ(λ) and b̄(λ) are zero. However, r̄(λ) remains non-zero,
V (λR) = c × 0.17697 × r¯(λR).

This reveals the value for the normalisation constant

c = 5.6508 . (4.5)

Substituting c into equation (4.4) reveals the relationship between the standard
luminosity function and the colour-matching functions,

V (λ) = 1.0000 r¯(λ) + 4.5907 g¯(λ) + 0.0601 b¯(λ) (4.6)


Figure 4.4. The colour-matching functions r̄(λ ), ḡ(λ ) and b̄(λ ) each multiplied according to the luminance ratio
between their respective dimensionless units illustrates the relationship with the 1924 CIE standard luminosity
function V (λ ).

This relationship is illustrated in figure 4.4. The area under each of the colour-
matching functions shown in figure 4.3 is found to be 1/c of the area under V (λ ).

4.1.6 CIE RGB colour space


Recall from section 4.1.2 that an SPD denoted by P can be associated with a set of
three tristimulus values L, M , S that define its colour, and these tristimulus values
are obtained by integrating the product of P(λ ) and the eye cone response functions
over the visible spectrum. The LMS reference colour space is defined by the set of all
valid L, M , S triples.
If the colour-matching functions are used in place of the eye cone response
functions to specify a given colour, a different set of three tristimulus values denoted
by R, G, B are obtained. The set of all valid R, G , B triples defines the CIE RGB
colour space. This is again a reference colour space by definition as it contains all
possible colours. However, the colours are specified differently in CIE RGB
compared to LMS.
Mathematically, both sides of equation (4.1) can be multiplied by P(λ ) in order to
match the colour of P at a single wavelength λ,
P(λ)[E(λ)] ≡ P(λ) r¯(λ)[R ] + P(λ) g¯(λ)[G ] + P(λ) b¯(λ)[B ].

The colour of P itself can be matched by integrating over the visible spectrum,
$P \equiv \int_{380}^{780} P(\lambda)[E(\lambda)]\, d\lambda.$


This means that


$P \equiv \int_{380}^{780} P(\lambda)\,\bar{r}(\lambda)[R]\, d\lambda + \int_{380}^{780} P(\lambda)\,\bar{g}(\lambda)[G]\, d\lambda + \int_{380}^{780} P(\lambda)\,\bar{b}(\lambda)[B]\, d\lambda.$

This expression can be written more compactly in the form


P ≡ R [R ] + G [G ] + B [B ].
Here R, G, B are the CIE RGB tristimulus values measured in [R ], [G ], [B ] units,
$R = \int_{380}^{780} P(\lambda)\,\bar{r}(\lambda)\, d\lambda, \quad G = \int_{380}^{780} P(\lambda)\,\bar{g}(\lambda)\, d\lambda, \quad B = \int_{380}^{780} P(\lambda)\,\bar{b}(\lambda)\, d\lambda.$

The same R, G , B triple will result from metameric SPDs when viewed under
identical conditions.
SPDs are commonly expressed using spectral radiance units, W sr⁻¹ m⁻² nm⁻¹.
The numerical values of r̄(λ ), ḡ(λ ), b̄(λ ) are defined according to equation (4.6), and
so the CIE RGB tristimulus values can be negative. Tristimulus values can be
absolute or relative. Normalisation of the tristimulus values will be discussed in
sections 4.1.10 and 4.1.11.
The CIE RGB colour space contains all colours visible to the 2° standard
colourimetric observer. However, the CIE RGB colour space is rarely used as a
reference space in practice. Nevertheless, it provides the foundation for the widely
used CIE XYZ reference colour space to be introduced in section 4.1.8.
The following section discusses the rg chromaticity diagram, which provides a
straightforward way to visualise the CIE RGB colour space.

4.1.7 rg chromaticity diagram


Colour spaces have an associated colour model that can be used to specify the
tristimulus values. The CIE RGB colour space can be represented by a 3D
coordinate system that treats the primaries as vector components. However, unlike
output-referred RGB colour spaces such as sRGB and Adobe® RGB, the CIE RGB
colour space is difficult to visualise in 3D as it does not form the shape of a cube.
A more useful representation of the CIE RGB colour space is obtained by
separating the chromaticity information from the luminance information. The rg
chromaticity diagram shown in figure 4.5 is a 2D diagram that describes the relative
proportions of the R, G, B tristimulus values that define each colour [4]. Luminance
information is not included. The rg chromaticity coordinates are defined as
$r = \frac{R}{R+G+B}, \quad g = \frac{G}{R+G+B}, \quad b = \frac{B}{R+G+B}.$
The 2D representation using only (r, g ) chromaticity coordinates is possible since
b = 1 − r − g . The utility of chromaticity diagrams will become apparent in section
4.1.9 when discussing the xy chromaticity diagram of the CIE XYZ colour space.
In figure 4.5, the grey horseshoe-shaped area represents all chromaticity coor-
dinates of all colours visible to the 2° standard colourimetric observer. The rg


Figure 4.5. 1931 CIE rg chromaticity diagram defined by the grey horseshoe-shaped area. The red, green and
blue circles mark the primaries of the 1931 CIE RGB colour space, and the point corresponding to illuminant
E is shown in white. The black circles on the boundary indicate a selection of pure spectrum colours and their
associated wavelengths.

chromaticity coordinates of the primaries that define the CIE RGB colour space are
(0,0), (0,1) and (1,0) by definition, and illuminant E has coordinates (1/3,1/3) due to the
normalisation of the units. All visible chromaticities form a horseshoe shape rather
than a triangle due to the overlap of the eye cone response functions shown in
figure 4.2. Pure spectral colours or hues are located on the curved part of the horseshoe,
and saturation decreases with inward distance from the horseshoe boundary.
The horseshoe area has been shown in grey rather than colour because a standard
monitor cannot display all of these colours correctly. For reference, figure 4.14 in
section 4.5 displays the chromaticities of the smaller sRGB colour space as these can
be shown correctly on a standard monitor.

4.1.8 CIE XYZ colour space


A deficiency of the CIE RGB colour space is that negative tristimulus values occur.
Consequently, in 1931, the CIE also introduced an alternative reference colour space
with advantageous mathematical properties known as the CIE XYZ colour space.
Recall from section 4.1.3 that any set of three linearly independent monochro-
matic light sources can be chosen as primaries when colour-matching experiments
are performed. According to Grassman’s laws, it is possible to define alternative sets
of colour-matching functions as linear transformations from those used in the
colour-matching experiments of Wright and Guild. Accordingly, the CIE introduced


a new set of colour-matching functions x̄(λ ), ȳ(λ ), z̄(λ ) and associated primaries
λX , λY , λZ . These new colour matching functions are obtained as a linear trans-
formation from r̄(λ ), ḡ(λ ), b̄(λ ) as follows:
$\begin{bmatrix} \bar{x}(\lambda) \\ \bar{y}(\lambda) \\ \bar{z}(\lambda) \end{bmatrix} = \underline{T} \begin{bmatrix} \bar{r}(\lambda) \\ \bar{g}(\lambda) \\ \bar{b}(\lambda) \end{bmatrix},$

where $\underline{T}$ is a 3 × 3 transformation matrix,

$\underline{T} = c \begin{bmatrix} 0.49 & 0.31 & 0.20 \\ 0.17697 & 0.81240 & 0.01063 \\ 0.00 & 0.01 & 0.99 \end{bmatrix},$
and c = 5.6508 is the normalisation constant defined by equation (4.5). The above
transformation matrix T was defined so that the CIE XYZ colour space would have
the following properties:
• One of the tristimulus values, Y, would be proportional to luminance.
• Negative tristimulus values would not occur.

The first property can be examined by writing the above matrix equation more
explicitly,
x¯(λ) = c(0.49 r¯(λ) + 0.31 g¯(λ) + 0.20 b¯(λ))
y¯ (λ) = c(0.17697 r¯(λ) + 0.81240 g¯(λ) + 0.01063 b¯(λ))
z¯(λ) = c(0.01 g¯(λ) + 0.99 b¯(λ)).

Figure 4.6. 1931 CIE colour-matching functions x̄(λ ), ȳ(λ ), z̄(λ ). Arbitrary colours have been used for the
curves.


Significantly, it can be seen that the ratio between the coefficients appearing on the
second line is precisely the luminance ratio between the [R ], [G ] and [B ] units defined
by equation (4.3). Furthermore, the CIE defined x̄(λ ) and z̄(λ ) such that they yield
zero luminance. Consequently, all luminance information about the colour is
described by the ȳ(λ ) colour-matching function only. Since the constant
c = 5.6508 is the same as that defined by equation (4.5), ȳ(λ ) is in fact equal to
the standard luminosity function V (λ ) defined by equation (4.6),

y¯ (λ) = V (λ) . (4.7)

The presence of the normalisation constant c explains the differences in height between the r̄(λ), ḡ(λ), b̄(λ) and x̄(λ), ȳ(λ), z̄(λ) colour-matching functions [5].
The second property is illustrated in figure 4.6, which shows that the colour-
matching functions x̄(λ ), ȳ(λ ), z̄(λ ) remain non-negative over the visible spectrum.
This is possible because the CIE defined the λX , λY , λZ primaries to be imaginary
primaries that do not correspond to visible chromaticities, in analogy with the eye
cone primaries described in section 4.1.2. The location of the λX , λY , λZ primaries on
the rg chromaticity diagram is shown in figure 4.7. Unlike the real r̄(λ ), ḡ(λ ), b̄(λ )
primaries used in the experiments of Wright and Guild indicated by the red, green,

Figure 4.7. 1931 CIE rg chromaticity diagram defined by the grey shaded horseshoe-shaped area. The red,
green and blue points mark the primaries of the 1931 CIE RGB colour space. The black points mark the
primaries of the CIE XYZ colour space.


and blue circles, all visible chromaticities can be obtained as positive linear
combinations of the λX , λY , λZ primaries.
In analogy with section 4.1.6, the CIE XYZ tristimulus values representing the
colour of an SPD denoted by P are defined as follows:
$X = k \int_{380}^{780} P(\lambda)\,\bar{x}(\lambda)\, d\lambda, \quad Y = k \int_{380}^{780} P(\lambda)\,\bar{y}(\lambda)\, d\lambda, \quad Z = k \int_{380}^{780} P(\lambda)\,\bar{z}(\lambda)\, d\lambda.$  (4.8)

Since the units [X], [Y], [Z] are derived from [R], [G], [B], they are dimensionless units that each represent their own quantity of power,
P ≡ X [X ] + Y [Y ] + Z [Z ].
Again, illuminant E is obtained by adding a unit of each primary.
Equation (4.8) includes a normalization factor k that determines whether the
tristimulus values are absolute or relative. This will be discussed in sections 4.1.10
and 4.1.11.
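As an illustration, the integrals of equation (4.8) can be evaluated numerically for a sampled SPD. The following is a minimal sketch which assumes that the 1931 colour-matching functions have already been loaded as arrays (`xbar`, `ybar`, `zbar`) sampled at the same wavelengths as the SPD; the two normalisation choices anticipate sections 4.1.10 and 4.1.11.

```python
import numpy as np

def spd_to_xyz(wavelengths, spd, xbar, ybar, zbar, relative=True):
    """Numerically evaluate equation (4.8) for a sampled SPD.

    wavelengths      : sample positions in nm spanning 380-780 nm
    spd              : SPD samples (spectral radiance for absolute colourimetry)
    xbar, ybar, zbar : 1931 CIE colour-matching functions sampled at the
                       same wavelengths (assumed loaded from tabulated data)
    """
    if relative:
        # Relative colourimetry: choose k so that Y is normalised to [0, 1].
        k = 1.0 / np.trapz(spd * ybar, wavelengths)
    else:
        # Absolute colourimetry: k = Km = 683 lm/W, in which case Y is
        # luminance in cd/m^2 provided the SPD is spectral radiance.
        k = 683.0
    X = k * np.trapz(spd * xbar, wavelengths)
    Y = k * np.trapz(spd * ybar, wavelengths)
    Z = k * np.trapz(spd * zbar, wavelengths)
    return np.array([X, Y, Z])
```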

4.1.9 xy chromaticity diagram


The xy chromaticity diagram can be defined in analogy with the rg chromaticity
diagram introduced in section 4.1.7. The xy chromaticity coordinates are defined as
$$x = \frac{X}{X+Y+Z}, \qquad y = \frac{Y}{X+Y+Z}, \qquad z = \frac{Z}{X+Y+Z} = 1 - x - y. \tag{4.9}$$
The xyz chromaticity coordinates represent relative proportions of the XYZ
tristimulus values [4]. The xy chromaticity diagram is a projection of the rg
chromaticity diagram shown in figure 4.7 onto the indicated x and y chromaticity
axes shown in figure 4.8. The (x , y ) chromaticity coordinates of the primaries are
(0,0), (0,1) and (1,0) by definition, and illuminant E again has coordinates (1/3,1/3).
Again, the visible chromaticities form a horseshoe shape rather than a triangle due
to the overlap of the eye cone response functions shown in figure 4.2. The curved
portion of the horseshoe boundary represents pure spectrum colours or hues, and
saturation decreases with inward distance from the boundary.
Chromaticity diagrams have useful properties. Consider adding two colours with
chromaticities (x1, y1) and (x2, y2 ). The chromaticity coordinates of the resulting
colour will lie on the straight line connecting (x1, y1) and (x2, y2 ). Therefore the line at
the base of the horseshoe cannot represent pure spectrum colours; this line defines
the non-spectral purples. It can also be deduced that the area enclosed by a triangle
connecting a fixed set of primaries reveals all (x , y ) that can be obtained by additive
mixtures of those primaries. Since the colour-matching functions x̄(λ ), ȳ(λ ), z̄(λ ) and
tristimulus values X , Y , Z are always non-negative, the λX , λY , λZ primaries must lie
outside of the horseshoe as this is the only way that a triangle connecting the primaries can enclose all visible chromaticities. Therefore, the λX, λY, λZ primaries are invisible as they are more saturated than pure spectrum colours.

Figure 4.8. 1931 CIE xy chromaticity diagram defined by the grey shaded horseshoe-shaped area. The black circles on the boundary indicate a selection of pure spectrum colours and their associated wavelengths. The point corresponding to illuminant E is shown in white. The red, green and blue circles mark the primaries of the 1931 CIE RGB colour space.
The primaries of the CIE RGB colour space are indicated by the red, green, and
blue circles in figure 4.8. This triangular area does not enclose the entire horseshoe
since negative (r, g ) chromaticity coordinates and R, G , B tristimulus values cannot
be obtained from additive mixtures. By comparing figure 4.7 and 4.8, it can be seen
that the entire horseshoe is filled when the appropriate negative (r, g ) chromaticity
coordinates are included.
The horseshoe area has again been shown in grey rather than colour because a
standard monitor cannot display all of these colours correctly. However, figure 4.14
in section 4.5 displays the chromaticities of the smaller sRGB colour space as these
can be shown correctly on a standard monitor. Output-referred colour spaces such
as sRGB that are used to display images on standard monitors must be defined using
real primaries, and so they will form triangles on the xy chromaticity diagram.
However, these colour spaces are additive and so visible chromaticities defined by
areas outside the triangle are not included. These chromaticities are said to lie
outside the gamut of the colour space.


4.1.10 Absolute colourimetry


The (x , y ) chromaticity coordinates describe the relative amounts of the X , Y , Z
tristimulus values that specify a given colour, and so knowledge of the actual
X , Y , Z values is lost. However, X and Z can be recovered provided the magnitude
of Y is known:
$$X = \frac{xY}{y}, \qquad Z = \frac{(1 - x - y)\,Y}{y}. \tag{4.10}$$

In section 4.1.8 it was shown that Y is proportional to luminance. In fact, Y can specify absolute luminance Lv measured in cd/m² provided the normalisation constant k appearing in equation (4.8) is appropriately set.
Recall that luminance is obtained from spectral radiance Le,λ according to equation (3.1) of chapter 3,

$$L_v = K_m \int_{380}^{780} L_{e,\lambda}(\lambda)\, V(\lambda)\, d\lambda,$$

where Km = 683 lm/W is the maximum luminous efficacy. Since ȳ(λ) = V(λ) according to equation (4.7), it follows that Y will be absolute luminance in cd/m² provided two conditions are met:
• The normalisation constant k appearing in equation (4.8) must be set equal to Km = 683 lm/W.
• The SPD denoted by P must be specified in spectral radiance units, W/(sr m² nm).

Now the tristimulus values appearing in equation (4.8) are absolute, and Y specifies absolute luminance Yabs or Lv measured in cd/m²,

$$Y_{\mathrm{abs}} = K_m \int_{380}^{780} P(\lambda)\, \bar{y}(\lambda)\, d\lambda.$$

4.1.11 Relative colourimetry


When X , Y , Z tristimulus values are specified using relative colourimetry, Y
remains proportional to luminance but its absolute value in cd/m2 is not relevant.
Instead, Y is normalized to the range [0,100], where Y = 100 corresponds to a
reference luminance. Formally, this is achieved by defining the normalisation
constant k appearing in equation (4.8) as follows:
$$k = \frac{100}{\int_{380}^{780} P(\lambda)\, \bar{y}(\lambda)\, d\lambda}.$$


When tristimulus values are specified using relative colourimetry, Y is known as relative luminance. Since visible colours do not form a cube in a 3D representation of the XYZ colour space, only Y ⩽ 100.
When dealing with colour space conversion in digital cameras, Y is typically normalized to the range [0,1] instead,

$$k = \frac{1}{\int_{380}^{780} P(\lambda)\, \bar{y}(\lambda)\, d\lambda}.$$

Relative tristimulus values X, Y, Z for a given pixel can then be regarded as components of a vector,

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}.$$

4.1.12 Reference white


The native white or reference white of a colour space is the colour stimulus to which
the tristimulus values are normalised. Using relative colourimetry with Y normalised
to the range [0,1], the reference white of a colour space is defined by the unit vector in
that colour space. For example, the reference white of the CIE XYZ colour space is
illuminant E,
$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{E} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$

The reference white is indicated by the subscript following the vector. The reference white of the CIE RGB colour space is also illuminant E,

$$\begin{bmatrix} R \\ G \\ B \end{bmatrix}_{E} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$
This is a consequence of the normalization of the units used in the colour-matching
experiments, as described in section 4.1.4. Output-referred colour spaces such as the
RGB colour spaces to be introduced in section 4.5 are subsets of CIE RGB and CIE
XYZ. The reference white depends upon the choice of primaries. For example, the
reference white of the sRGB colour space is the D65 standard illuminant introduced
in the following section.
Even though the vector specifying the reference white has equal components, it
should be remembered that those components are not equal in a photometric or
radiometric sense. In the case of the CIE RGB colour space, the photometric ratio
between the components of the unit vector can be obtained from equation (4.3),
0.17697 : 0.81240 : 0.01063,
and the radiometric ratio between the components is given by
0.96805 : 0.01852 : 0.01343.


4.2 Illumination
In the previous section, it was shown that the colour of an SPD can be specified by a set
of three tristimulus values belonging to a reference colour space, or alternatively by
luminance together with chromaticity coordinates belonging to a reference colour space.
This section describes the properties of SPDs that arise from everyday sources of
illumination such as heated objects, and in particular the concept of correlated
colour temperature.

4.2.1 Colour temperature


A black body is an ideal object that absorbs all incident electromagnetic radiation
[6]. A black body in thermal equilibrium with its surroundings at constant temper-
ature T emits electromagnetic radiation specified by an SPD that is a function of
temperature only. This is known as Planck’s law. If the SPD is specified in terms of
spectral radiance Le,λ , Planck’s law can be expressed by the following formula:

$$L_{e,\lambda}(T) = \frac{2hc^2}{\lambda^5}\, \frac{1}{\exp\!\left(\dfrac{hc}{\lambda k_B T}\right) - 1}.$$

Here h is Planck’s constant, kB is the Boltzmann constant, and c is the speed of light
in the medium. The temperature T measured in Kelvin (K) is known as the colour
temperature of the black body as its colour ranges from red at low temperatures
through to blue at high temperatures.
The Planckian locus is the locus formed on a chromaticity diagram by a black-
body radiator as a function of temperature. For the xy chromaticity diagram, the
chromaticity coordinates of the Planckian locus can be obtained by using equation
(4.8) to calculate the tristimulus values for a given SPD specified as Le,λ(T) for a
given colour temperature, and then substituting these into equation (4.9) to calculate
the chromaticity coordinates as a function of colour temperature (x(T ), y(T )). In
practice, it is convenient to use an approximate formula such as a cubic spline [7].
The Planckian locus is shown on the xy chromaticity diagram in figure 4.9.
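For orientation, the chromaticity of a black body at a given temperature can also be computed directly from Planck's law and equations (4.8) and (4.9), rather than from the spline approximation. The sketch below assumes the colour-matching function arrays from the earlier example are available; the normalisation constant cancels in the chromaticity ratios.

```python
import numpy as np

# Physical constants (SI units)
h = 6.62607015e-34   # Planck constant, J s
c = 2.99792458e8     # speed of light, m/s
kB = 1.380649e-23    # Boltzmann constant, J/K

def planck_spectral_radiance(wavelengths_nm, T):
    """Black-body spectral radiance Le,lambda(T) at the given wavelengths (nm)."""
    lam = wavelengths_nm * 1e-9  # convert to metres
    return (2.0 * h * c**2 / lam**5) / np.expm1(h * c / (lam * kB * T))

def planckian_xy(T, wavelengths, xbar, ybar, zbar):
    """Chromaticity (x, y) of a black body at temperature T via equations (4.8) and (4.9)."""
    spd = planck_spectral_radiance(wavelengths, T)
    X = np.trapz(spd * xbar, wavelengths)
    Y = np.trapz(spd * ybar, wavelengths)
    Z = np.trapz(spd * zbar, wavelengths)
    s = X + Y + Z
    return X / s, Y / s
```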

4.2.2 Correlated colour temperature


Sources of everyday illumination are typically incandescent, meaning the emitted
electromagnetic radiation derives from a heated object. Although such sources are
not true black body radiators, the colour of the illumination can often be closely
matched to the colour of the SPD emitted by a black body at a given temperature T.
Provided this colour match is within a specified perceptual difference, the source of
illumination can be associated with a correlated colour temperature (CCT), which is
the colour temperature of the closest-matching black-body radiator.
The xy chromaticity diagram is unsuitable for determining the CCT associated
with given chromaticity coordinates (x, y) as the XYZ colour space is not perceptually uniform, meaning distances do not correlate uniformly with colour perception.

Figure 4.9. Planckian locus (black curve) on the 1931 CIE xy chromaticity diagram calculated using the cubic spline approximation. A selection of colour temperatures are indicated. Only chromaticities contained within the output-referred sRGB colour space have been shown in colour.

Instead, the (x, y) chromaticity coordinates can be transformed into the
(u, v ) chromaticity coordinates of the CIE 1960 UCS (uniform chromaticity scale)
colour space:
$$u = \frac{4x}{12y - 2x + 3}, \qquad v = \frac{6y}{12y - 2x + 3}. \tag{4.11}$$
The uv chromaticity diagram is known as MacAdam’s diagram. Although now
superseded by the CIE 1976 UCS chromaticity diagram, an important feature of
MacAdam’s diagram is that lines with the same CCT or isotherms are normal to the
Planckian locus. This enables the CCT associated with given chromaticity coor-
dinates to be straightforwardly computed using numerical schemes such as
Robertson’s method [8]. Approximate formulae have also been developed [9, 10].
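Equation (4.11) translates directly into a small helper (the function name is purely illustrative):

```python
def xy_to_uv(x, y):
    """Convert CIE 1931 (x, y) chromaticities to CIE 1960 UCS (u, v), equation (4.11)."""
    d = 12.0 * y - 2.0 * x + 3.0
    return 4.0 * x / d, 6.0 * y / d
```

Distances from the Planckian locus measured in this (u, v) space are what Robertson-style CCT algorithms and the tint definition below rely on.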
There are many possible chromaticity coordinates that map to the same CCT. If
only the CCT of the illuminant is known, the corresponding chromaticity coor-
dinates can be distinguished by defining a colour tint value. For a given pair of
chromaticity coordinates (u, v), the colour tint describes the distance along the isotherm in uv space between (u, v) and the Planckian locus. According to the CIE,
the concept of CCT is only valid for distances up to a value ±0.05 from the
Planckian locus in uv space. The colour tint will be magenta or red below the
Planckian locus and green or amber above the Planckian locus.

4.2.3 White point


The white point (WP) of a source SPD is defined by the chromaticity coordinates (x, y) of a
100% diffuse neutral reflector illuminated by that SPD.
It is convenient to express the WP in terms of relative tristimulus values with Y
normalised to the range [0,1]. In this book, the following notation will be used to
indicate the specific vector corresponding to the white point:

$$\begin{bmatrix} X(\mathrm{WP}) \\ Y(\mathrm{WP}) \\ Z(\mathrm{WP}) \end{bmatrix},$$

where Y (WP) = 1. The values for X (WP) and Z(WP) can be calculated using
equation (4.10).
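For instance, given only a measured white point chromaticity, equation (4.10) recovers the remaining tristimulus values (a hypothetical helper):

```python
def xyY_to_XYZ(x, y, Y=1.0):
    """Recover X and Z from chromaticity (x, y) and Y using equation (4.10)."""
    X = x * Y / y
    Z = (1.0 - x - y) * Y / y
    return X, Y, Z

# Example: the D65 white point (x, y) = (0.3127, 0.3291) with Y = 1 gives
# approximately X = 0.950 and Z = 1.088, matching table 4.1 to rounding.
```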
The terms ‘white point’ and ‘reference white’ are often used interchangeably. In this
book, ‘white point’ will be used only to describe the above property of the illumination.
The term ‘reference white’ introduced in section 4.1.12 will be used to describe the
white reference of a colour space defined by the unit vector in that colour space.

4.2.4 Standard illuminants


The CIE have defined a set of standard illuminants, which are useful for theoretical
work and camera colour characterisation [3]. Figure 4.10 illustrates the SPDs that define some common standard illuminants. Table 4.1 gives the white point for a selection of standard illuminants in terms of chromaticity coordinates and relative tristimulus values, together with the corresponding CCTs.

Figure 4.10. SPDs for some example CIE standard illuminants. All curves are normalised to a value of 100 at 560 nm.

Table 4.1. XYZ colour space data for a selection of CIE standard illuminants. The D series all represent natural daylight.

CIE Standard Illuminants

Illuminant   White point (x, y)   White point (Y = 1)        CCT (K)   Description
A            (0.4476, 0.4074)     X = 1.0985, Z = 0.3558     2856      Incandescent bulb
D50          (0.3457, 0.3586)     X = 0.9642, Z = 0.8252     5003      Horizon
D55          (0.3324, 0.3475)     X = 0.9568, Z = 0.9214     5503      Mid-morning
D65          (0.3127, 0.3291)     X = 0.9504, Z = 1.0888     6504      Noon
D75          (0.2990, 0.3150)     X = 0.9497, Z = 1.2261     7504      North sky
E            (1/3, 1/3)           X = 1, Z = 1               5454      Equi-energy
If the illumination under consideration is a standard illuminant, this can be
indicated at the lower-right-hand corner of the vector. For example, the white point
of D65 illumination in the XYZ colour space can be indicated in the following way:

$$\begin{bmatrix} X(\mathrm{WP}) \\ Y(\mathrm{WP}) \\ Z(\mathrm{WP}) \end{bmatrix}_{\mathrm{D65}} = \begin{bmatrix} 0.9504 \\ 1 \\ 1.0888 \end{bmatrix}.$$

4.3 Camera raw space


The camera response functions R1(λ ), R2(λ ), R3(λ ) introduced in section 3.6.4 of
chapter 3 define the camera raw space. The camera response functions play the role
analogous to the eye cone response functions l¯(λ ), m̄(λ ), s̄(λ ) that define the LMS
reference colour space, and the colour matching functions r̄(λ ), ḡ(λ ), b̄(λ ) that define
the CIE RGB reference colour space.
A mathematical model of a camera raw space is developed in this section.

4.3.1 Raw channels


For a CFA that uses three types of colour filter such as a Bayer CFA, the raw levels
expressed using output-referred units (DN or ADU) defined in section 3.7.2 of
chapter 3 belong to a set of raw channels,
n DN,i = R, G1, G2, B ,
where i is the mosaic label. Since a Bayer CFA uses twice as many green filters as red
or blue, two values G1 and G2 associated with different positions in each Bayer block
will be obtained in general. Nevertheless, there are fundamentally only three camera
response functions, R1(λ ), R2(λ ), R3(λ ).


By combining equations (3.39) and (3.50) from chapter 3, the raw channels can be
modelled as follows:
$$\mathcal{R} = k \int_{\lambda_1}^{\lambda_2} R_1(\lambda)\, \tilde{E}_{e,\lambda}(x, y)\, d\lambda, \qquad \mathcal{G} = k \int_{\lambda_1}^{\lambda_2} R_2(\lambda)\, \tilde{E}_{e,\lambda}(x, y)\, d\lambda, \qquad \mathcal{B} = k \int_{\lambda_1}^{\lambda_2} R_3(\lambda)\, \tilde{E}_{e,\lambda}(x, y)\, d\lambda.$$

The SPD is specified in terms of spectral irradiance at the sensor plane (SP) as described below, and the integration is over the spectral passband of the camera. The normalization constant k is defined by

$$k = \frac{A_p\, t}{g_i\, e}.$$
Here Ap is the photosite area, t is the exposure duration, e is the elementary charge,
and gi is the conversion factor between electron counts and DN for mosaic i, as
described in section 3.7.3 of chapter 3.
The actual R , G1, G2 , B values obtained in practice are quantized values modelled
by taking the integer part of the above equations. As described in section 3.7.2 of
chapter 3, the maximum raw value is defined by the raw clipping point and is limited
by the bit depth of the analog-to-digital converter (ADC).
The SPD used to obtain the tristimulus values of the camera raw space has been specified in terms of Ẽe,λ(x, y). This is the convolved and sampled spectral irradiance distribution at a given photosite defined by equation (3.33) of chapter 3,

$$\tilde{E}_{e,\lambda}(x, y) = \left( E_{e,\lambda,\mathrm{ideal}}(x, y) * h_{\mathrm{system}}(x, y, \lambda) \right) \operatorname{comb}\!\left[ \frac{x}{p_x}, \frac{y}{p_y} \right].$$

Here Ee,λ,ideal(x , y ) is the ideal spectral irradiance distribution at the SP, px and py are
the pixel pitches in the horizontal and vertical directions, hsystem (x , y, λ ) is the system
point spread function (PSF), and (x , y ) are the sampling coordinates associated with
the photosite.

4.3.2 Colour demosaicing


As mentioned previously in section 2.1.2 of chapter 2, a computational interpolation
process known as colour interpolation or colour demosaicing needs to be carried out
on the raw channels in order to obtain three different raw values R , G , and B
associated with each photosite.
Various colour demosaicing algorithms have been developed. The basic idea is
shown in figure 4.11, which illustrates the concept using simple bilinear interpolation.
In the diagram on the left, only the green pixel component is known at photosite i.
The red and blue pixel components Ri and Bi can be calculated in the following way:


Figure 4.11. Colour demosaicing using bilinear interpolation.

Ri = (R1 + R2 )/2
Bi = (B1 + B2 )/2.
In the middle diagram, only the blue pixel component is known at site j. The red and
green pixel components Rj and Gj can be calculated as
Rj = (R1 + R2 + R3 + R 4)/4
Gj = (G1 + G 2 + G 3 + G4)/4.

Finally, only the red pixel component is known at site k in the diagram on the right.
The blue and green pixel components can be calculated as
Bk = (B1 + B2 + B3 + B4)/4
G k = (G1 + G 2 + G 3 + G4)/4.
Although conceptually simple, bilinear interpolation generates cyclic pattern noise
and zipper patterns along edges due to the cyclic change of direction of the
interpolation filter [11, 12].
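The following is a minimal numpy/scipy sketch of bilinear demosaicing for an assumed RGGB Bayer layout. It simply applies the averaging of figure 4.11 via convolution kernels and is intended only to make the idea concrete; it does not represent any camera's actual algorithm.

```python
import numpy as np
from scipy.ndimage import convolve

def bilinear_demosaic(raw):
    """Minimal bilinear demosaic of a Bayer mosaic (RGGB layout assumed)."""
    H, W = raw.shape
    raw = raw.astype(float)
    masks = {c: np.zeros((H, W), dtype=bool) for c in "RGB"}
    offsets = {"R": [(0, 0)], "G": [(0, 1), (1, 0)], "B": [(1, 1)]}  # RGGB block
    for colour, positions in offsets.items():
        for dy, dx in positions:
            masks[colour][dy::2, dx::2] = True

    # Bilinear interpolation kernels; known samples are left unchanged because
    # the missing neighbours contribute zero to the weighted sum.
    k_g = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0

    out = np.zeros((H, W, 3))
    for i, (colour, kernel) in enumerate(zip("RGB", (k_rb, k_g, k_rb))):
        sparse = np.where(masks[colour], raw, 0.0)
        out[..., i] = convolve(sparse, kernel, mode="mirror")
    return out
```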
Although the demosaicing algorithms used by in-camera image processing
engines are proprietary, a variety of sophisticated demosaicing algorithms have
been published in the literature [13]. For example, the open-source raw converter
‘dcraw’ offers the following [12]:
• Halfsize: This method does not carry out an interpolation but instead
produces a smaller image by combining each 2 × 2 Bayer block into a single
raw pixel vector. Since the red and blue photosites are physically separated,
this method causes colour fringes to appear along diagonal edges [12].
• Bilinear Interpolation: This method is illustrated in figure 4.11. It is used
mainly as a first step in the VNG algorithm described below.
• Threshold-based Variable Number of Gradients (VNG) [14]: This method
measures the colour gradients in each of the eight directions around each
pixel. Only the gradients closest to zero are used to calculate the missing
colour components in order to avoid averaging over sharp edges. Although
this method is slow and produces zipper patterns at orthogonal edges,
it excels for shapes that do not have well-defined edges such as leaves and
feathers [12].
• Patterned Pixel Grouping (PPG) [15]: This method first fills in the green
mosaic using gradients and pattern matching before filling in the red and blue mosaics based on the green. Whenever two interpolation directions are possible, the preferred direction is calculated based on the gradients [12].
• Adaptive Homogeneity-Directed (AHD) [16]: This method generates two
separate interpolated images that are combined together. The first image is
generated by interpolating the green channel in the horizontal direction and
then interpolating the red and blue based on the green, resulting in perfect
horizontal edges but ragged vertical edges. Conversely, the second image is
generated by interpolating the green channel in the vertical direction and then
interpolating the red and blue based on the green, resulting in perfect vertical
edges but ragged horizontal edges. The combined image is generated by
choosing the most homogeneous image at each pixel location. The homoge-
neity is measured by converting to the perceptually uniform CIE LAB colour
space [12].

Of the above example methods, the AHD method yields the best quality output
overall. A weakness of the AHD method is that horizontal and vertical interpolation
can fail simultaneously at 45° edges. This is problematic for Fuji® raw files produced
by cameras that use the Fuji® X-Trans® CFA illustrated in figure 3.29 of chapter 3.
This non-Bayer type of CFA is designed to reduce the need for an optical-low pass
filter. Since the raw data contains many 45° edges, dcraw instead defaults to the PPG
method for Fuji® raw files [12].

4.3.3 Raw pixel vectors


After the colour demosaic has been performed, there will be three different raw
values R , G , and B associated with each photosite. These can be interpreted as raw
tristimulus values that define a ‘colour’ in the camera raw space.
The raw tristimulus values corresponding to a given photosite can be expressed in
the form of a raw pixel vector,
⎡R⎤
⎢ G ⎥.
⎢ ⎥
⎣B ⎦
The calligraphic-style notation is used in this book to distinguish raw pixel vectors
from pixel vectors of standard RGB colour spaces.
The raw tristimulus values are relative tristimulus values with maximum value
defined by the raw clipping point, which is limited by the bit depth of the ADC. It is
useful to further normalize R , G , B to the range [0,1] when transforming between the
camera raw space and other colour spaces.

4.3.4 Camera raw space primaries


Recall the definition of the camera response functions from section 3.6.4 of chapter 3,

$$R_i(\lambda) = QE_i(\lambda)\, \frac{\lambda}{hc}.$$


The external quantum efficiency for mosaic i is defined by


QEi (λ) = TCFA,i (λ) η(λ) T (λ) FF.

Here TCFA,i is the CFA transmission function for mosaic i, η(λ ) is the charge
collection efficiency (CCE) of a photoelement, T (λ ) is the SiO2/Si interface trans-
mission function, and FF = Adet /Ap is the fill factor.
The camera response functions can be interpreted as specifying ‘amounts’ of the
camera raw space primaries at each wavelength. Since the camera response functions
are defined by physical filters, their values will always be non-negative.
Consequently, the primaries of the camera raw space must be imaginary, in analogy
with the eye cone primaries of the HVS. Recall that imaginary primaries are invisible
to the HVS as they are more saturated than pure spectrum colours.

4.3.5 Camera raw space reference white


The camera raw space for a given camera model has its own reference white. Using
relative colourimetry, this is the white point of the illumination that produces
maximum equal raw tristimulus values for a neutral diffuse subject.
If the raw tristimulus values are normalised to the range [0,1], the reference white
can be labelled using the following vector notation:
⎡R⎤ ⎡1⎤
⎢G ⎥ = ⎢1⎥ ,
⎢ ⎥ ⎢ ⎥
⎣ B ⎦reference ⎣1⎦

where ‘reference’ refers to the scene illumination white point that yields the unit
vector.
Recall from section 4.1.4 that the reference white of the CIE RGB and CIE XYZ
colour spaces is illuminant E. This was achieved by introducing dimensionless units
for the primaries with each primary unit representing its own specific quantity of
power. In contrast, the units of the camera raw space primaries are not normalized
in such a manner. Due to the transmission properties of the CFAs used by camera
manufacturers, the reference white of a camera raw space is typically a magenta
colour that does not necessarily have an associated CCT.

4.4 Camera colour characterisation


Before any processing of the raw data can occur, a camera must be characterized by
establishing a linear mapping between the camera raw space and a reference colour
space such as CIE XYZ. Unless the Luther–Ives condition discussed below is
satisfied exactly, this linear mapping can only be defined in an approximate way.
This section describes the procedure for characterizing a camera by determining a
linear mapping described by a transformation matrix. Normalisation and validity
issues related to the transformation matrix are also discussed.


4.4.1 Luther–Ives condition


Recall that the colour-matching functions that define the CIE RGB and CIE XYZ
reference colour spaces are related to each other via a linear transformation, and
these in turn are related to the eye cone response functions via linear
transformations.
In order for a camera to record colour correctly, it can be inferred from the
Luther–Ives condition [17] that a linear transformation must also exist between the
camera response functions and the eye cone response functions or colour-matching
functions.
In practice, camera imaging sensors do not obey the Luther–Ives condition
exactly. This means that only an approximate linear transformation can be defined
between the camera response functions and the eye cone response functions or
colour-matching functions [18]. If the Luther–Ives condition is not satisfied exactly,
the camera raw space cannot be interpreted as a true colour space as there will be a
degree of camera metameric error present.
Recall from section 4.1.1 that metamers are different SPDs that appear to the
HVS to be the same colour when viewed under identical viewing conditions. Camera
metameric error occurs when the camera violates the principle of metamerism so that
metameric SPDs that should be recorded as the same colour instead lead to different
colour responses from the camera. The degree of metameric error can be specified in
terms of the digital still camera sensitivity metameric index (DSC/SMI) described in
the ISO 17321-1 standard [19]. The DSC/SMI metric can be interpreted as
describing the extent to which the linear transformation between the camera
response functions and eye cone response functions or colour-matching functions
is approximate.
Even though the camera raw space is not a true colour space for a camera that
exhibits metameric error, it can be interpreted as an approximate colour space with a
warped spectral locus [20] that does not form a triangle on the xy chromaticity
diagram. Camera raw spaces are very large in general, although parts of their
gamuts may extend outside the range of visible chromaticities [20].

4.4.2 Raw to CIE XYZ


It is convenient to use CIE XYZ as the reference colour space when characterising
the colour response of a camera. The approximate linear transformation from the
camera raw space to CIE XYZ can be specified as follows:
$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \approx \underline{T} \begin{bmatrix} \mathcal{R} \\ \mathcal{G} \\ \mathcal{B} \end{bmatrix}. \tag{4.12}$$

Here $\underline{T}$ is a 3 × 3 transformation matrix,

$$\underline{T} = \begin{bmatrix} T_{11} & T_{12} & T_{13} \\ T_{21} & T_{22} & T_{23} \\ T_{31} & T_{32} & T_{33} \end{bmatrix}.$$

The above approximate transformation can be optimised for scene illumination with a specified white point, and so the optimum matrix T is dependent upon the scene illumination white point.
Method B of the ISO 17321-1 standard [19] describes a procedure for determining T based on the use of a standard target such as the colour chart shown in figure 4.12.
However, method B uses processed images output by the camera rather than raw
data, and consequently requires experimental determination and inversion of the
opto-electronic conversion function (OECF) [21]. The OECF defines the nonlinear
relationship between irradiance at the SP and the digital output levels (DOLs) of an
image produced by the internal image processing engine of the camera.
Instead, an alternative method based on photographing a colour chart is
described below. This method uses the raw data directly and does not require
determination and inversion of the OECF.
1. Take a photograph of a colour chart illuminated by a standard illuminant.
Since the raw values scale linearly, only their relative values are important.
However, the f-number N and exposure duration t should be chosen so as to
avoid clipping.
2. Calculate relative XYZ tristimulus values for each patch of the colour chart
by using equation (4.8) and a spectrometer to measure the SPD, which needs
to be multiplied by the spectral reflectance of each patch. Alternatively, a
tristimulus colorimeter can be used. The data can be normalized so that Y is
in the range [0,1] by using the white patch as a white reference.
3. Obtain a linear demosaiced output image directly in the camera raw space
without converting to any other colour space. Gamma encoding, tone curves
and white balance (WB) must all be disabled. Software capable of providing
such output includes the open-source raw converter ‘dcraw’. It is crucial to
disable WB, otherwise scaling factors known as raw WB multipliers may be
applied to the raw channels, and these will affect the calculated T _ . An
appropriate dcraw command is ‘dcraw -v -r 1 1 1 1 -o 0 -4 -T filename’.

Figure 4.12. A typical colour chart that can be used for camera colour characterisation.


4. Measure average R , G , B values for each patch. The ISO 17321-1 standard
recommends that the block of pixels over which the average is taken should
be at least 64 × 64 pixels in size. Each patch can then be associated with an
appropriate average raw pixel vector.
5. Build a 3 × n matrix A containing the XYZ vectors for each patch 1, ..., n as columns,

$$\underline{A} = \begin{bmatrix} X_1 & X_2 & \cdots & X_n \\ Y_1 & Y_2 & \cdots & Y_n \\ Z_1 & Z_2 & \cdots & Z_n \end{bmatrix}.$$

Similarly build a 3 × n matrix B containing the corresponding raw pixel vectors as columns,

$$\underline{B} = \begin{bmatrix} \mathcal{R}_1 & \mathcal{R}_2 & \cdots & \mathcal{R}_n \\ \mathcal{G}_1 & \mathcal{G}_2 & \cdots & \mathcal{G}_n \\ \mathcal{B}_1 & \mathcal{B}_2 & \cdots & \mathcal{B}_n \end{bmatrix}.$$

6. Calculate the 3 × 3 colour transformation matrix T that transforms B to A. Unless the Luther–Ives condition is satisfied exactly, the relationship will only be approximate,

$$\underline{A} \approx \underline{T}\, \underline{B}.$$

The simplest method for minimising the error is to use linear least-squares minimisation [22, 23]. This yields the following solution (a numerical sketch is given after this procedure):

$$\underline{T} = \underline{A}\, \underline{B}^{\mathsf{T}} \left( \underline{B}\, \underline{B}^{\mathsf{T}} \right)^{-1}.$$

Here the T superscript denotes the transpose operator.
7. Optionally, transform into the perceptually uniform CIE LAB reference
colour space introduced in section 4.4.3 below and calculate the colour
difference ΔEi between the estimated tristimulus values Xi , Yi , Zi obtained
via T_ and real tristimulus values for each patch, as described in the ISO
17321-1 standard [19, 22]. The DSC/SMI can be calculated from the set of
{ΔEi}. A score of 100 would be obtained if the Luther–Ives condition were to
be satisfied exactly. Starting with T _ from the previous step, {ΔEi} can be
minimised using a nonlinear optimization technique. The potential colour
error for the final improved T _ is defined by the final DSC/SMI.
8. Scale T_ according to the normalisation required for its practical implemen-
tation, as discussed in section 4.4.4 below.

Provided WB was disabled in step 3, T can be used with arbitrary scene illumination. However, optimum results will be obtained for scene illumination with a white point that closely matches that of the characterisation illuminant.
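A minimal sketch of the least-squares step, assuming the patch measurements have already been assembled into the 3 × n arrays A (XYZ) and B (raw):

```python
import numpy as np

def characterise(A, B):
    """Estimate the raw -> CIE XYZ matrix T from colour chart data (step 6).

    A : (3, n) array of measured XYZ values, one column per patch
    B : (3, n) array of averaged raw pixel vectors, one column per patch
    Returns the 3x3 matrix T minimising the squared error ||A - T B||.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Normal-equation solution T = A B^T (B B^T)^-1 of the least-squares problem.
    return A @ B.T @ np.linalg.inv(B @ B.T)
```

In practice the resulting matrix would subsequently be refined against a perceptual error metric (step 7) and rescaled as described in section 4.4.4.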


4.4.3 Colour difference: CIE LAB


Reference colour spaces such as CIE XYZ do not provide a perceptually uniform
representation of colour. In order to calculate the colour difference ΔE used in step 7
above, colours specified using CIE XYZ can be straightforwardly converted into the
CIE LAB colour space, which is an alternative reference colour space that was
defined by the CIE in 1976. CIE LAB is associated with a colour model that specifies
colours in a more perceptually uniform manner. The transformation from CIE XYZ
to CIE LAB is defined as follows:
L* = 116 f (Yn ) − 16
a* = 500(f (Xn ) − f (Yn ))
b* = 200(f (Yn ) − f (Zn )),
where
$$f(\alpha) = \begin{cases} \alpha^{1/3} & \text{if } \alpha > \delta^3 \\[4pt] \dfrac{\alpha}{3\delta^2} + \dfrac{4}{29} & \text{otherwise,} \end{cases}$$

and δ = 6/29. Here Xn, Yn and Zn are normalised values defined relative to the tristimulus values of the illumination white point,

$$X_n = \frac{X}{X(\mathrm{WP})}, \qquad Y_n = \frac{Y}{Y(\mathrm{WP})}, \qquad Z_n = \frac{Z}{Z(\mathrm{WP})}.$$

This means that Xn, Yn and Zn are all normalised to the range [0,1].
L* is the lightness function illustrated in figure 2.5 that was introduced in section 2.2.3 of chapter 2. It takes values between 0 and 100 and is designed to correlate with the brightness perception of relative luminance. The a* and b* components represent the position between magenta and green and between yellow and blue, respectively.
The 1976 formula for colour difference within CIE LAB is given by

$$\Delta E^{*}_{ab} = \sqrt{ (L_1^{*} - L_2^{*})^2 + (a_1^{*} - a_2^{*})^2 + (b_1^{*} - b_2^{*})^2 }.$$

A just noticeable difference corresponds to a ΔE*ab value of around 2.3 [24]. More accurate colour difference formulae have since been developed.
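A compact numerical transcription of the CIE LAB conversion and the 1976 colour difference, given here as a sketch with the white point supplied as a relative XYZ vector:

```python
import numpy as np

def xyz_to_lab(xyz, white):
    """Convert relative XYZ values to CIE LAB given the illumination white point."""
    delta = 6.0 / 29.0

    def f(t):
        t = np.asarray(t, dtype=float)
        return np.where(t > delta**3, np.cbrt(t), t / (3.0 * delta**2) + 4.0 / 29.0)

    xn, yn, zn = np.asarray(xyz, dtype=float) / np.asarray(white, dtype=float)
    L = 116.0 * f(yn) - 16.0
    a = 500.0 * (f(xn) - f(yn))
    b = 200.0 * (f(yn) - f(zn))
    return np.array([L, a, b])

def delta_e_1976(lab1, lab2):
    """CIE 1976 colour difference between two LAB vectors."""
    return float(np.linalg.norm(np.asarray(lab1) - np.asarray(lab2)))
```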

4.4.4 Transformation matrix normalisation


The raw to CIE XYZ transformation defined by equation (4.12) needs to be
normalised. As an example, consider a colour transformation matrix T _D65 deter-
mined from a characterisation performed using the D65 daylight standard illumi-
nant. The illuminant is indicated by the subscript.
In order to normalise the transformation, consider the average data correspond-
ing to the white patch on the colour chart, which should ideally be a 100% diffuse
reflector. Since the average white patch data ideally corresponds to the D65 illumination white point, the specific CIE XYZ and raw pixel vectors under
consideration can be denoted using the notation introduced in section 4.2.3,
$$\begin{bmatrix} X(\mathrm{WP}) \\ Y(\mathrm{WP}) \\ Z(\mathrm{WP}) \end{bmatrix}_{\mathrm{D65}} \quad \text{and} \quad \begin{bmatrix} \mathcal{R}(\mathrm{WP}) \\ \mathcal{G}(\mathrm{WP}) \\ \mathcal{B}(\mathrm{WP}) \end{bmatrix}_{\mathrm{D65}}.$$

These vectors can be normalized according to the following procedure:


1. Scale the average CIE XYZ pixel vector corresponding to the white patch so that Y is restricted to the range [0,1],

$$\begin{bmatrix} X(\mathrm{WP}) \\ Y(\mathrm{WP}) \\ Z(\mathrm{WP}) \end{bmatrix}_{\mathrm{D65}} = \begin{bmatrix} 0.9504 \\ 1.0000 \\ 1.0888 \end{bmatrix}.$$

2. Scale the average raw pixel vector so that all raw tristimulus values are restricted to the range [0,1]. Since the green raw tristimulus value is typically the first to saturate (reach its maximum value) under most types of illumination, generally G(WP) = 1 whereas R(WP) and B(WP) are both less than 1,

$$\begin{bmatrix} \mathcal{R}(\mathrm{WP}) < 1 \\ \mathcal{G}(\mathrm{WP}) = 1 \\ \mathcal{B}(\mathrm{WP}) < 1 \end{bmatrix}_{\mathrm{D65}}.$$

3. Scale T_D65 so that it maps the scaled average raw pixel vector to the scaled average CIE XYZ pixel vector for the white patch,

$$\begin{bmatrix} 0.9504 \\ 1.0000 \\ 1.0888 \end{bmatrix} = \underline{T}_{\mathrm{D65}} \begin{bmatrix} \mathcal{R}(\mathrm{WP}) < 1 \\ \mathcal{G}(\mathrm{WP}) = 1 \\ \mathcal{B}(\mathrm{WP}) < 1 \end{bmatrix}_{\mathrm{D65}}.$$

Further details of normalisations used in digital cameras and raw converters are
described in sections 4.8.2 and 4.9.

4.5 Output-referred colour spaces


Now that a linear transformation has been defined between the camera raw space
and the CIE XYZ reference colour space, a further transformation is needed from
CIE XYZ to an output-referred colour space designed for viewing images on
standard displays.
Recall that the colours contained within a colour space define its gamut. Output-
referred RGB colour spaces are additive. This means that the tristimulus values are
non-negative and the gamut is defined by all chromaticities contained within a
triangle connecting the primaries. Accordingly, an output-referred RGB colour space can be represented by a cube using a 3D Cartesian coordinate system. The sRGB colour cube will be described in section 4.10.2.
Figure 4.13 uses the xy chromaticity diagram to show the gamuts of three well-
known output-referred RGB colour spaces. The sRGB colour space [25] is widely
used as its gamut is comparable with that of a standard display monitor.
Furthermore, sRGB is the colour space used by the internet. Adobe® RGB [26] is
suited for displaying images on wide-gamut display monitors, and ProPhoto RGB
[27] (also known as ROMM RGB) is suited for high-quality printing. Notice that
ProPhoto RGB uses imaginary primaries and so its gamut cannot be shown in its
entirety on a conventional three-channel display monitor.
Unlike reference colour spaces, output-referred colour spaces include a nonlinear
gamma curve applied to the colour channels. Since luminance must be presented to
the viewer in a linear manner as it is in nature, this encoding nonlinearity is cancelled
out by a compensating decoding nonlinearity provided by the display. As discussed
in chapter 2, the purpose of gamma encoding and decoding is to more efficiently
space the DOLs when reducing bit depth down to 8 so as to avoid visible banding
artefacts on 8-bit displays. In other words, gamma encoding and decoding only

Figure 4.13. Gamuts of the sRGB, Adobe® RGB and ProPhoto RGB output-referred colour spaces. The
reference white of sRGB and Adobe RGB is CIE illuminant D65, and the reference white of ProPhoto RGB is
CIE illuminant D50. The grey horseshoe-shaped region defines all (x, y ) chromaticities visible to the 2°
standard colourimetric observer.

4-31
Physics of Digital Photography (Second Edition)

relates to image quality and not colour conversion. Up until section 4.10, the present
chapter deals with the linear forms of the output-referred colour spaces.

4.5.1 sRGB colour space: linear form


The sRGB colour space standardised by IEC 61966-2-1:1999 [25] is defined by the
ITU-R BT.709 primaries, which are also used by high-definition television. The
sRGB colour space uses primaries corresponding to chromaticities in each of the
red, green and blue regions of the visible spectrum. The (x , y ) chromaticity
coordinates of the red, green, and blue primaries are, respectively, defined as
(0.64, 0.33), (0.30, 0.60), (0.15, 0.06).
These primaries are less saturated than pure spectrum colours, and figure 4.14 shows
their location on the xy chromaticity diagram. Since the sRGB colour space is
additive with non-negative tristimulus values, it contains all chromaticities con-
tained within the triangular region connecting the primaries. These chromaticities
have been shown in colour on figure 4.14 as most of the sRGB gamut can be shown
correctly on a standard monitor.

Figure 4.14. The coloured region defined by the ITU-R BT.709 primaries shows the chromaticities represented
by the sRGB colour space. The grey horseshoe-shaped region defines all chromaticities visible to the 2°
standard colourimetric observer.


In this book, the tristimulus values of the linear form of the sRGB colour space
are denoted using an ‘L’ subscript in order to distinguish them from tristimulus
values of the final nonlinear form of the sRGB colour space discussed in section
4.10.
Using relative colourimetry with tristimulus values normalised to the range [0,1],
the primaries of the sRGB colour space are defined by

$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \quad \text{and} \quad \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

The reference white obtained by adding a normalised unit of each primary is CIE illuminant D65,

$$\begin{bmatrix} R_L(\mathrm{WP}) \\ G_L(\mathrm{WP}) \\ B_L(\mathrm{WP}) \end{bmatrix}_{\mathrm{D65}} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$

4.5.2 CIE XYZ D65 to sRGB D65


For scene illumination with a D65 white point, the transformation from CIE XYZ
to linear sRGB is defined by

$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \underline{M}_{\mathrm{sRGB}}^{-1} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D65}},$$

where

$$\underline{M}_{\mathrm{sRGB}}^{-1} = \begin{bmatrix} 3.2406 & -1.5372 & -0.4986 \\ -0.9689 & 1.8758 & 0.0415 \\ 0.0557 & -0.2040 & 1.0570 \end{bmatrix}. \tag{4.13}$$

Relative luminance Y is normalised to the range [0,1], and any resulting $R_L$, $G_L$, $B_L$ values lying outside the range [0,1] are clipped to 0 or 1. The reference white is obtained as

$$\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \underline{M}_{\mathrm{sRGB}}^{-1} \begin{bmatrix} X(\mathrm{WP}) = 0.9504 \\ Y(\mathrm{WP}) = 1.0000 \\ Z(\mathrm{WP}) = 1.0888 \end{bmatrix}_{\mathrm{D65}}.$$

The reverse transformation from linear sRGB to CIE XYZ is defined by

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D65}} = \underline{M}_{\mathrm{sRGB}} \begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}},$$

with

$$\underline{M}_{\mathrm{sRGB}} = \begin{bmatrix} 0.4124 & 0.3576 & 0.1805 \\ 0.2126 & 0.7152 & 0.0722 \\ 0.0193 & 0.1192 & 0.9505 \end{bmatrix}. \tag{4.14}$$

For reasons discussed in section 4.6, the above transformations need to be appropriately white balanced when dealing with other scene illumination white points.
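A small sketch of these linear conversions using the matrix values of equations (4.13) and (4.14):

```python
import numpy as np

# Equation (4.14): linear sRGB -> CIE XYZ for D65-adapted data.
M_SRGB = np.array([[0.4124, 0.3576, 0.1805],
                   [0.2126, 0.7152, 0.0722],
                   [0.0193, 0.1192, 0.9505]])

# Equation (4.13): CIE XYZ -> linear sRGB.
M_SRGB_INV = np.array([[ 3.2406, -1.5372, -0.4986],
                       [-0.9689,  1.8758,  0.0415],
                       [ 0.0557, -0.2040,  1.0570]])

def xyz_to_linear_srgb(xyz):
    """Apply equation (4.13) and clip out-of-gamut values to [0, 1]."""
    return np.clip(M_SRGB_INV @ np.asarray(xyz, dtype=float), 0.0, 1.0)

# Sanity check: the D65 white point maps (approximately) to the sRGB reference white,
# xyz_to_linear_srgb([0.9504, 1.0000, 1.0888]) ~ [1, 1, 1].
```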

4.5.3 Raw D65 to sRGB D65


It is possible to directly transform from the camera raw space to sRGB by
combining the raw to CIE XYZ and CIE XYZ to sRGB transformations.
For scene illumination with a D65 white point, the optimum matrix that
transforms from the camera raw space to CIE XYZ is obtained from the camera
colour characterisation procedure described in section 4.4 performed using the D65
standard illuminant,

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D65}} = \underline{T}_{\mathrm{D65}} \begin{bmatrix} \mathcal{R} \\ \mathcal{G} \\ \mathcal{B} \end{bmatrix}_{\mathrm{D65}}.$$

The transformation from CIE XYZ to the linear form of the sRGB colour space is defined by

$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \underline{M}_{\mathrm{sRGB}}^{-1} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D65}}.$$

The subscripts in both of the above equations indicate that the scene illumination has a D65 white point. Combining the equations yields the required transformation from the camera raw space to sRGB,

$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \underline{M}_{\mathrm{sRGB}}^{-1}\, \underline{T}_{\mathrm{D65}} \begin{bmatrix} \mathcal{R} \\ \mathcal{G} \\ \mathcal{B} \end{bmatrix}_{\mathrm{D65}}.$$


Normalisation
By considering the raw pixel vector corresponding to the scene illumination white
point (i.e. a 100% neutral diffuse reflector photographed under the scene illumina-
tion), the T_ D65 matrix can be normalised according to section 4.4.4 so that the
maximum green raw tristimulus value maps to Y = 1,
$$\begin{bmatrix} 0.9504 \\ 1.0000 \\ 1.0888 \end{bmatrix} = \underline{T}_{\mathrm{D65}} \begin{bmatrix} \mathcal{R}(\mathrm{WP}) < 1 \\ \mathcal{G}(\mathrm{WP}) = 1 \\ \mathcal{B}(\mathrm{WP}) < 1 \end{bmatrix}_{\mathrm{D65}}.$$

Since the reference white of the sRGB colour space is D65, the total transformation is normalised as follows:

$$\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \underline{M}_{\mathrm{sRGB}}^{-1}\, \underline{T}_{\mathrm{D65}} \begin{bmatrix} \mathcal{R}(\mathrm{WP}) < 1 \\ \mathcal{G}(\mathrm{WP}) = 1 \\ \mathcal{B}(\mathrm{WP}) < 1 \end{bmatrix}_{\mathrm{D65}}.$$
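The chain of section 4.5.3 and its normalisation can be sketched as follows. Here `T_d65` is a hypothetical characterisation matrix, `m_srgb_inv` is the matrix of equation (4.13), and only the Y normalisation is enforced, which is the intent of step 3 in section 4.4.4.

```python
import numpy as np

def raw_to_linear_srgb(raw_rgb, T_d65, raw_white, m_srgb_inv):
    """Sketch of the D65 raw -> linear sRGB chain of section 4.5.3.

    raw_rgb    : demosaiced raw pixel vector, normalised to [0, 1]
    T_d65      : raw -> XYZ matrix from a D65 characterisation (hypothetical values)
    raw_white  : raw pixel vector of the D65 white point (green component = 1)
    m_srgb_inv : CIE XYZ -> linear sRGB matrix of equation (4.13)
    """
    # Rescale T_d65 so that the white-point raw vector maps to Y = 1.
    T = np.asarray(T_d65, dtype=float)
    T = T / (T @ np.asarray(raw_white, dtype=float))[1]
    xyz = T @ np.asarray(raw_rgb, dtype=float)
    return np.clip(np.asarray(m_srgb_inv, dtype=float) @ xyz, 0.0, 1.0)
```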

4.6 White balance


A remarkable property of the HVS is its ability to naturally adjust to the ambient
lighting conditions. For example, consider a 100% neutral diffuse reflector placed in
a photographic scene. In daylight conditions, the reflector appears to be neutral
white. Later in the day when there is a change in the chromaticity and CCT of the
scene illumination, the colour of the reflector would be expected to change
accordingly. However, the reflector will continue to appear neutral white. In other
words, the perceived colour of objects remains relatively constant under varying
types of scene illumination. This is known as colour constancy.
The mechanism by which the HVS achieves colour constancy is known as
chromatic adaptation. Although adaptation mechanisms of the HVS are complex,
a simplistic way of thinking about chromatic adaptation is that the HVS aims to
discount the chromaticity of the illuminant [28]. Back in 1902, von-Kries postulated
that this is achieved by an independent scaling of each of the eye cone response
functions. The colour stimulus that an observer adapted to the ambient conditions
considers to be neutral white (perfectly achromatic with 100% relative luminance) is
defined as the adapted white [29].
Camera response functions do not naturally emulate the HVS by discounting the
chromaticity of the scene illumination. Consequently, an output image will appear
too warm or too cold if it is displayed using illumination with a white point that does
not match the adapted white for the photographic scene at the time the photograph
was taken. This is known as incorrect white balance (WB). A solution to the problem
of achieving correct WB can be summarised as follows:
1. Perform illuminant estimation by informing the camera about the adapted
white before taking a photograph. For example, a WB preset corresponding
to the scene illumination can be manually selected. Alternatively, the camera can compute its own estimate by analysing the raw data using the automatic
WB function. In all cases, the camera estimate for the adapted white is
known as the camera neutral [30] or adopted white (AW) [29]. The AW can be
regarded as an estimate of the scene illumination white point, which for
simplicity is assumed to correspond to the adapted white.
2. Choose a standard illumination white point that will replace the AW when
the image is encoded. This is defined as the encoding white and is typically the
reference white of the selected output-referred colour space, which is
illuminant D65 in the case of the sRGB colour space.
3. Perform the AW replacement by applying a chromatic adaptation transform
(CAT) to the raw data in order to adapt the AW to the encoding white.
4. In combination with step 3, convert the camera raw space into the linear
form of the selected output-referred colour space. Subsequently, the image
can be encoded, as described in chapter 2.
5. View the image on a calibrated display monitor. In the case of an image
encoded using the sRGB colour space, a scene object that appeared to be
white at the time the photograph was taken will now be displayed using the
D65 reference white. Ideally, the ambient viewing conditions should match
those defined as appropriate for viewing the sRGB colour space.

In summary, correct WB can be achieved by identifying the scene illumination white point and adapting it to the reference white of the output-referred colour space that will be used to display the image. In the case of sRGB, the required transformation that must be applied to every raw pixel vector can in general be written in the following manner:

$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \underline{M}_{\mathrm{sRGB}}^{-1}\, \mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}}\, \underline{T} \begin{bmatrix} \mathcal{R} \\ \mathcal{G} \\ \mathcal{B} \end{bmatrix}_{\mathrm{scene}}. \tag{4.15}$$

• The matrix T is a transformation matrix optimised for the AW. It converts from the camera raw space to CIE XYZ.
• The matrix CAT AW→D65 is a CAT that adapts the AW to the reference white of the output-referred colour space, which is D65 for sRGB.
• Finally, the M_sRGB inverse is the CIE XYZ to sRGB matrix defined by equation (4.13).

Further details of the above approach, labelled ‘Strategy 1’, are given in section 4.7.
This is the type of approach used by various external raw converters and modern
smartphone cameras.
However, traditional digital camera manufacturers typically reformulate equa-
tion (4.15) in the following way:


$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \underline{R}\,\underline{D} \begin{bmatrix} \mathcal{R} \\ \mathcal{G} \\ \mathcal{B} \end{bmatrix}_{\mathrm{scene}}. \tag{4.16}$$

• The matrix D is a diagonal matrix containing raw channel multipliers appropriate for the AW. These multipliers are applied to the raw channels before the colour demosaic and they serve to adapt the AW to the reference white of the camera raw space defined by the unit vector in the camera raw space.
• The matrix R is a colour rotation matrix that can be derived from the matrices of equation (4.15). It transforms directly from the camera raw space to the chosen output-referred colour space. Unlike the conventional transformation matrix T, a colour rotation matrix includes a built-in CAT and has the property that each of its rows sum to unity. Consequently, the camera raw space reference white is adapted to the encoding white, which is typically the reference white of the chosen output-referred colour space. This is D65 in the case of sRGB.

The above approach, labelled ‘Strategy 2’, has several advantages over strategy 1.
Further details are given in section 4.8. Practical examples of strategies 1 and 2 are
given in the sections 4.8 and 4.9.

4.6.1 Adopted white


As mentioned above, the camera estimate for the adapted white is known as the
camera neutral or AW. Rather than use a scene illumination preset to specify the
AW, an alternative approach is to photograph a neutral card under the scene
illumination. The camera can then determine the scene illumination white point
accurately by analysing the raw data, and this is assumed to be the adapted white.
If the AW is instead calculated using the automatic WB function, the camera
needs to use its own algorithms to determine the AW, typically by analysing the raw
data. A very simple approach is the ‘grey world’ method, which assumes that the
average of all scene colours will turn out to be achromatic. Another very simple
approach is the ‘brightest white’ method, which assumes that the brightest white is
likely to correspond to the scene illumination white point, which is assumed to be the
adapted white. However, practical algorithms are much more sophisticated. Note
that calculating the AW is known as illuminant estimation. This should not be
confused with WB, which is the adaptation of the AW to the encoding white.
The outcome from all of the above approaches will be a set of three raw values
that define the AW. These can be labelled as R(AW), G(AW), B(AW). Since the
colour demosaic has not yet been performed, these values can be written as a vector
representing a Bayer block rather than individual raw pixels.


Following the description of how to convert from the camera raw space to the
CIE XYZ colour space given in section 4.4.2, the AW can be specified using XYZ
tristimulus values by applying the camera transformation matrix T,

$$\begin{bmatrix} X(\mathrm{AW}) \\ Y(\mathrm{AW}) \\ Z(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}} = \underline{T} \begin{bmatrix} \mathcal{R}(\mathrm{AW}) \\ \mathcal{G}(\mathrm{AW}) \\ \mathcal{B}(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}}.$$

• The ‘scene’ subscript denotes the true white point of the scene illumination.
The X(AW), Y(AW), Z(AW) are the camera estimates of the scene illumi-
nation white point, which is assumed to correspond to the adapted white.
• The transformation matrix T _ should ideally be obtained from a character-
isation performed using illumination with a white point that matches the AW.
• Since the green raw tristimulus value is typically the first to reach its
maximum value under most types of illumination, in general G(AW) = 1
while R(AW) < 1 and B(AW) < 1.

After X(AW), Y(AW), Z(AW) are known, equation (4.9) can be used to calculate
the (x , y ) chromaticity coordinates of the AW.
Using the methods described in section 4.2.2, the AW can alternatively be
specified in terms of a CCT and tint provided the chromaticity lies within a tolerated
distance from the Planckian locus.
WB will be incorrect if the AW (scene illumination CCT estimate) is not
sufficiently close to the true adapted white. An example of incorrect WB is shown
in figure 4.15. In each case, the same raw data has been white balanced for the sRGB
colour space using a different AW. Remembering that a blackbody appears red at
low colour temperatures and blue at high colour temperatures, the upper image
assumes a scene illumination CCT lower than the true value. Consequently, the
image appears to be bluer or colder than expected. Conversely, the lower image
assumes a scene illumination CCT higher than the true value, and so the image
appears to be redder or warmer than expected. The central image shows correct WB
obtained using the CCT calculated by the in-camera auto-WB function.

4.6.2 Chromatic adaptation transforms


A chromatic adaptation transform (CAT) is a computational technique for adjusting
the white point of a given SPD. It achieves this goal by attempting to mimic the
chromatic adaptation mechanism of the HVS.
In the context of digital cameras, the two most important CATs are raw channel
scaling and the Bradford CAT. Although not used in digital cameras, it is instructive
to first describe the von-Kries CAT.

von-Kries CAT
Back in 1902, von-Kries postulated that the chromatic adaptation mechanism of the
HVS can be modelled as an independent scaling of each of the eye cone response

4-38
Physics of Digital Photography (Second Edition)

Figure 4.15. Illustration of white balance.

4-39
Physics of Digital Photography (Second Edition)

functions, or equivalently the L, M and S values in the LMS colour space introduced
in section 4.1.2.
Consider the raw data represented using the CIE XYZ colour space. In the
following example, the aim is to transform the raw data so that the scene
illumination white point is adapted to D65,
$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D65}} = \mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{scene}}.$$

In this case the von-Kries CAT can be written

$$\mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}} = \underline{M}_{\mathrm{vK}}^{-1} \begin{bmatrix} \dfrac{L(\mathrm{D65})}{L(\mathrm{AW})} & 0 & 0 \\ 0 & \dfrac{M(\mathrm{D65})}{M(\mathrm{AW})} & 0 \\ 0 & 0 & \dfrac{S(\mathrm{D65})}{S(\mathrm{AW})} \end{bmatrix} \underline{M}_{\mathrm{vK}}.$$

The matrix M_vK transforms each pixel vector from the CIE XYZ colour space into the LMS colour space, where the diagonal scaling is applied. The modern form of the transformation matrix M_vK is the Hunt–Pointer–Estevez transformation matrix [31] defined as

$$\underline{M}_{\mathrm{vK}} = \begin{bmatrix} 0.38971 & 0.68898 & -0.07868 \\ -0.22981 & 1.18340 & 0.04641 \\ 0.00000 & 0.00000 & 1.00000 \end{bmatrix}.$$

After applying M_vK, the L, M, S values are independently scaled according to the von-Kries hypothesis. In the present example, the scaling factors arise from the ratio between the AW and D65 white points. These can be obtained from the following white point vectors:

$$\begin{bmatrix} L(\mathrm{AW}) \\ M(\mathrm{AW}) \\ S(\mathrm{AW}) \end{bmatrix} = \underline{M}_{\mathrm{vK}} \begin{bmatrix} X(\mathrm{AW}) \\ Y(\mathrm{AW}) \\ Z(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}}, \qquad \begin{bmatrix} L(\mathrm{D65}) \\ M(\mathrm{D65}) \\ S(\mathrm{D65}) \end{bmatrix} = \underline{M}_{\mathrm{vK}} \begin{bmatrix} X(\mathrm{WP}) = 0.9504 \\ Y(\mathrm{WP}) = 1.0000 \\ Z(\mathrm{WP}) = 1.0888 \end{bmatrix}_{\mathrm{D65}}.$$

Finally, the inverse of the transformation matrix M_vK is applied in order to convert each raw pixel vector back into the CIE XYZ colour space.

Linear Bradford CAT


The Bradford CAT [32] can be regarded as an improved version of the von-Kries
CAT. A simplified and linearised version is recommended by the ICC for use in
digital imaging [1].


The linear Bradford CAT can be implemented in an analogous fashion to the von-Kries CAT, the difference being that the L, M, S tristimulus values are replaced by ρ, γ, β, which correspond to a 'sharpened' artificial eye cone space. The transformation matrix is defined by

$$\underline{M}_{\mathrm{BFD}} = \begin{bmatrix} 0.8951 & 0.2664 & -0.1614 \\ -0.7502 & 1.7135 & 0.0367 \\ 0.0389 & -0.0685 & 1.0296 \end{bmatrix}.$$
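A minimal sketch of the linear Bradford CAT, constructing the adaptation matrix that maps an adopted white to the D65 reference white:

```python
import numpy as np

M_BFD = np.array([[ 0.8951,  0.2664, -0.1614],
                  [-0.7502,  1.7135,  0.0367],
                  [ 0.0389, -0.0685,  1.0296]])

def bradford_cat(xyz_src_white, xyz_dst_white):
    """Linear Bradford CAT adapting xyz_src_white to xyz_dst_white.

    Both white points are XYZ vectors with Y normalised to 1, e.g. the AW
    and the D65 reference white."""
    rgb_src = M_BFD @ np.asarray(xyz_src_white, dtype=float)
    rgb_dst = M_BFD @ np.asarray(xyz_dst_white, dtype=float)
    scale = np.diag(rgb_dst / rgb_src)
    return np.linalg.inv(M_BFD) @ scale @ M_BFD

# Example usage: adapt the adopted white to D65 and apply to a pixel vector.
# cat = bradford_cat(xyz_aw, [0.9504, 1.0000, 1.0888])
# xyz_d65_adapted = cat @ xyz_scene
```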

4.6.3 Raw channel multipliers


In analogy with the independent scaling of the eye cone response functions (or LMS
values) hypothesised by von-Kries, a type of CAT can be defined by directly scaling
the camera response functions (or raw channels) in the camera raw space.
Consider a Bayer block for the AW obtained by photographing a 100% neutral
diffuse reflector under the scene illumination. The following operation will adapt the
AW to the reference white of the camera raw space defined by the unit vector:
$$\begin{bmatrix} \mathcal{R} = 1 \\ \mathcal{G} = 1 \\ \mathcal{B} = 1 \end{bmatrix}_{\mathrm{reference}} = \begin{bmatrix} \dfrac{1}{\mathcal{R}(\mathrm{AW})} & 0 & 0 \\ 0 & \dfrac{1}{\mathcal{G}(\mathrm{AW})} & 0 \\ 0 & 0 & \dfrac{1}{\mathcal{B}(\mathrm{AW})} \end{bmatrix} \begin{bmatrix} \mathcal{R}(\mathrm{AW}) \\ \mathcal{G}(\mathrm{AW}) \\ \mathcal{B}(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}}.$$

The same scaling must be applied to all Bayer blocks,

$$\begin{bmatrix} \mathcal{R} \\ \mathcal{G} \\ \mathcal{B} \end{bmatrix}_{\mathrm{reference}} = \begin{bmatrix} \dfrac{1}{\mathcal{R}(\mathrm{AW})} & 0 & 0 \\ 0 & \dfrac{1}{\mathcal{G}(\mathrm{AW})} & 0 \\ 0 & 0 & \dfrac{1}{\mathcal{B}(\mathrm{AW})} \end{bmatrix} \begin{bmatrix} \mathcal{R} \\ \mathcal{G} \\ \mathcal{B} \end{bmatrix}_{\mathrm{scene}}.$$

Since the colour demosaic has not yet been performed, the above vectors represent Bayer blocks rather than raw pixels. Equivalently, the scaling factors can be applied directly to the raw channels.
The diagonal scaling factors are known as raw WB multipliers or raw channel multipliers. They can be obtained directly from the AW calculated by the camera,

$$\begin{bmatrix} \mathcal{R}(\mathrm{AW}) \\ \mathcal{G}(\mathrm{AW}) \\ \mathcal{B}(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}}.$$


4.7 Strategy 1: transformation matrices + CAT


Strategy 1 is based upon the type of CATs used in colour science. This is the type of
approach to colour conversion typically used by commercial raw converters and
smartphone camera image processing engines.
Consider the white-balanced transformation from the camera raw space to an
output-referred RGB colour space. In the case of sRGB, the transformation that
must be applied to every raw pixel vector is defined by equation (4.15),
$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \underline{M}_{\mathrm{sRGB}}^{-1}\, \mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}}\, \underline{T} \begin{bmatrix} \mathcal{R} \\ \mathcal{G} \\ \mathcal{B} \end{bmatrix}_{\mathrm{scene}}.$$

In particular, the AW is adapted to the encoding white, which is the output-referred colour space reference white,

$$\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \underline{M}_{\mathrm{sRGB}}^{-1}\, \mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}}\, \underline{T} \begin{bmatrix} \mathcal{R}(\mathrm{AW}) \\ \mathcal{G}(\mathrm{AW}) \\ \mathcal{B}(\mathrm{AW}) \end{bmatrix}_{\mathrm{scene}}.$$

Recall the following:


• The matrix T _ is a transformation matrix optimised for the AW that converts
from the camera raw space to CIE XYZ.
• The matrix CAT AW→D65 is a CAT that adapts the AW to the encoding white,
which is D65 for sRGB. The ICC recommends using the Bradford CAT
defined in section 4.6.2.
• The M _ sRGB inverse is the CIE XYZ to sRGB matrix defined by equation
(4.13).

Although the above matrix transformation appears to be straightforward, the transformation matrix T should in principle be optimised for scene illumination
with a white point matching the AW. However, it is impractical to perform multiple
camera colour characterisations using every possible illumination white point. There
are two main solutions to this issue.
1. Determine a small set of transformation matrices covering a range of CCTs,
with each matrix optimised for a particular CCT. When the CCT of the AW
is calculated, the transformation matrix optimised for the closest-matching
CCT can be selected.
2. Interpolate between two preset transformation matrices that are optimised
for use with either a low CCT or high CCT illuminant. For a given AW, a
numerical algorithm can determine an interpolated matrix optimised for a
CCT that matches that of the AW.

The interpolation approach is used by the Adobe® DNG converter, and full details
are given in section 4.9.
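As a sketch of Strategy 1, the three matrices can be combined into a single raw-to-sRGB conversion once the AW-optimised $\mathbf{T}$ has been selected or interpolated. Only $\mathbf{M}_{\mathrm{sRGB}}$ below is a standard quantity (the linear sRGB to CIE XYZ matrix); the characterisation matrix and CAT are placeholders standing in for camera-specific data.

import numpy as np

# Standard linear sRGB -> CIE XYZ (D65) matrix; its inverse maps XYZ -> linear sRGB
M_SRGB = np.array([[0.4124, 0.3576, 0.1805],
                   [0.2126, 0.7152, 0.0722],
                   [0.0193, 0.1192, 0.9505]])

def strategy1_matrix(T, cat_aw_to_d65):
    """Equation (4.15): combined matrix taking raw (scene) to linear sRGB (D65).
    T is the AW-optimised camera raw -> XYZ matrix and cat_aw_to_d65 is the
    chromatic adaptation matrix from the AW to D65."""
    return np.linalg.inv(M_SRGB) @ cat_aw_to_d65 @ T

# Placeholder inputs: a made-up characterisation matrix and an identity CAT
T = np.array([[0.60, 0.25, 0.10],
              [0.25, 0.65, 0.10],
              [0.05, 0.15, 0.85]])
raw_to_srgb = strategy1_matrix(T, np.eye(3))
pixel_srgb = raw_to_srgb @ np.array([0.3, 0.5, 0.2])   # one raw pixel vector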


4.8 Strategy 2: raw channel multipliers + rotation matrix


Consider the white-balanced transformation from the camera raw space to an
output-referred RGB colour space. In the case of sRGB, the transformation is
defined by equation (4.15),
$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \mathbf{M}_{\mathrm{sRGB}}^{-1}\,\mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}}\,\mathbf{T} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\text{scene}}.$$
As described in section 4.6, this equation can be re-expressed in the form of equation
(4.16),
$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \mathbf{R}\,\mathbf{D} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\text{scene}}.$$

• $\mathbf{D}$ is a diagonal matrix containing raw channel multipliers appropriate for the AW.
• $\mathbf{R}$ is the colour rotation matrix optimised for the scene illumination. It is algebraically defined as
$$\mathbf{R} = \mathbf{M}_{\mathrm{sRGB}}^{-1}\,\mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}}\,\mathbf{T}\,\mathbf{D}^{-1}. \tag{4.17}$$
Each row of this matrix sums to unity.

Since equations (4.15) and (4.16) are white balanced, the raw pixel vector
corresponding to the AW is mapped to the sRGB reference white,
$$\begin{bmatrix} R_L = 1 \\ G_L = 1 \\ B_L = 1 \end{bmatrix}_{\mathrm{D65}} = \mathbf{R}\,\mathbf{D} \begin{bmatrix} R(\mathrm{AW}) \\ G(\mathrm{AW}) \\ B(\mathrm{AW}) \end{bmatrix}_{\text{scene}}.$$

The WB can be understood by decomposing the transformation into two steps:


1. The diagonal matrix $\mathbf{D}$ containing raw channel multipliers appropriate for the scene illumination is applied to the raw channels,
$$\begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\text{reference}} = \mathbf{D} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\text{scene}}$$
where
$$\mathbf{D} = \begin{bmatrix} \dfrac{1}{R(\mathrm{AW})} & 0 & 0 \\ 0 & \dfrac{1}{G(\mathrm{AW})} & 0 \\ 0 & 0 & \dfrac{1}{B(\mathrm{AW})} \end{bmatrix}.$$


In particular, these serve to adapt the AW to the reference white of the camera raw space,
$$\begin{bmatrix} R = 1 \\ G = 1 \\ B = 1 \end{bmatrix}_{\text{reference}} = \mathbf{D} \begin{bmatrix} R(\mathrm{AW}) \\ G(\mathrm{AW}) \\ B(\mathrm{AW}) \end{bmatrix}_{\text{scene}}.$$

2. After performing the colour demosaic, the colour rotation matrix $\mathbf{R}$ is applied to the demosaiced raw data in order to transform to sRGB,
$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} = \mathbf{R} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\text{reference}}.$$

In particular, the camera raw space reference white is mapped to the sRGB reference white,
$$\begin{bmatrix} R_L = 1 \\ G_L = 1 \\ B_L = 1 \end{bmatrix}_{\mathrm{D65}} = \mathbf{R} \begin{bmatrix} R = 1 \\ G = 1 \\ B = 1 \end{bmatrix}_{\text{reference}}.$$
This proves that each row of $\mathbf{R}$ sums to unity.
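The two-step structure can be sketched as follows. The rotation matrix and adapted white used here are illustrative values only, with the rows of the rotation matrix chosen to sum to unity (compare equation (4.18) below).

import numpy as np

def strategy2(raw_scene, aw, R):
    """Step 1: white balance with raw channel multipliers; step 2: rotate to sRGB."""
    D = np.diag(1.0 / aw)             # diagonal WB matrix built from the AW
    raw_reference = D @ raw_scene     # the AW maps to the reference white [1, 1, 1]
    return R @ raw_reference          # rows of R sum to unity

R = np.array([[ 1.48, -0.40, -0.08],
              [-0.16,  1.36, -0.20],
              [ 0.04, -0.50,  1.46]])
aw = np.array([0.46, 1.00, 0.67])
print(strategy2(aw, aw, R))           # the AW itself maps to the sRGB white [1, 1, 1]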

4.8.1 Traditional digital cameras


Manufacturers of traditional digital cameras typically utilise Strategy 2 for several
reasons. These include:
• The raw channel multipliers can be applied to the raw channels before the colour demosaic. This results in a demosaic of better quality [12].
• Performing chromatic adaptation by scaling the raw channels has been found
to work better in practice for extreme cases [30].
• The raw channel multipliers are stored in the proprietary raw file metadata
and do not affect the raw data. They are applied by the internal JPEG engine
and can also be utilised by external raw conversion software provided by the
camera manufacturer.
• If desired, part of the raw channel scaling can be carried out in the analog
domain using analog amplification, which is beneficial for image quality if the
ADC does not have a high bit depth. This will affect the input to output-
referred unit conversion factors gi defined in chapter 3.
• It is computationally advantageous to decouple the raw channel scaling and
colour space conversion as this enables WB and colour space conversion to be
straightforwardly implemented on fixed-point number architecture using the
strategy discussed below.

Recall the algebraic definition of the colour rotation matrix optimised for the scene
illumination,


$$\mathbf{R} = \mathbf{M}_{\mathrm{sRGB}}^{-1}\,\mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}}\,\mathbf{T}\,\mathbf{D}^{-1}.$$
In principle, $\mathbf{T}$ should be obtained from a characterisation performed using an illuminant with a white point that matches the AW. This means that a different optimised colour rotation matrix can be defined for every possible scene illumination white point.
In practice, sufficient accuracy is achieved by defining a small fixed set of colour
rotation matrices, each optimised for use with a particular camera WB preset. This is
due to the fact that the variation of colour rotation matrices with respect to CCT is
very small. The procedure is as follows:
1. The camera determines the AW and associated raw channel multipliers
appropriate for the scene illumination. The diagonal WB matrix $\mathbf{D}$ is then
applied to the raw channels. In particular, this adapts the AW to the camera
raw space reference white.
2. The camera chooses the preset rotation matrix optimised for illumination
with a white point that provides the closest match to the AW.
3. The camera applies the chosen preset colour rotation matrix to convert to the
output-referred colour space. In particular, the camera raw space reference
white will be mapped to the output-referred colour space reference white.

As an example, table 4.2 lists raw channel multipliers and colour rotation matrices
for the preset WB settings found in the raw metadata of the Olympus® E-M1 digital
camera. The listed values are 8-bit fixed point numbers used by the internal
processor and so need to be divided by 256.
Notice that the custom WB preset has been set to 6400 K. In this case the raw WB
multipliers are found to be
$$\mathbf{D}_{6400\,\mathrm{K}} = \begin{bmatrix} 2.1719 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1.4922 \end{bmatrix}_{6400\,\mathrm{K}}.$$

Table 4.2. Raw channel multipliers and raw-to-sRGB colour rotation matrices corresponding to in-camera preset CCTs for the Olympus® E-M1 with 12-40 f/2.8 lens and v4.1 firmware. These raw metadata values are 8-bit fixed-point numbers used by the internal processor and need to be divided by 256. The third and fourth values in the ‘Raw WB multipliers’ column correspond to the two green mosaics, and the colour rotation matrix values are listed row by row.

CCT     Description           Raw WB multipliers   Colour rotation matrix (rows)
3000 K  Tungsten              296 760 256 256      (324, −40, −28) (−68, 308, 16) (16, −248, 488)
4000 K  Cool fluorescent      492 602 256 256      (430, −168, −6) (−50, 300, 6) (12, −132, 376)
5300 K  Fine weather          504 434 256 256      (368, −92, −20) (−42, 340, −42) (10, −140, 386)
6000 K  Cloudy                544 396 256 256      (380, −104, −20) (−40, 348, −52) (10, −128, 374)
7500 K  Fine weather, shade   588 344 256 256      (394, −116, −22) (−38, 360, −66) (8, −112, 360)
5500 K  Flash                 598 384 256 256      (368, −92, −20) (−42, 340, −42) (10, −140, 386)
6400 K  Custom                556 382 256 256      (380, −104, −20) (−40, 348, −52) (10, −128, 374)


The camera has chosen to use the 6000 K preset rotation matrix in conjunction with the 6400 K raw channel multipliers,
$$\mathbf{R}_{6000\,\mathrm{K}} = \frac{1}{256}\begin{bmatrix} 380 & -104 & -20 \\ -40 & 348 & -52 \\ 10 & -128 & 374 \end{bmatrix} = \begin{bmatrix} 1.4844 & -0.4063 & -0.0781 \\ -0.1563 & 1.3594 & -0.2031 \\ 0.0391 & -0.5000 & 1.4609 \end{bmatrix}. \tag{4.18}$$
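The conversion of the stored fixed-point metadata into working matrices can be sketched as follows, using the numbers from table 4.2 (the variable names are illustrative).

import numpy as np

# 6400 K custom preset: raw WB multipliers stored as R, B, G1, G2 (table 4.2)
wb_6400K = np.array([556, 382, 256, 256]) / 256.0
D_6400K = np.diag([wb_6400K[0], 1.0, wb_6400K[1]])   # R, G, B multipliers

# 6000 K preset colour rotation matrix, stored row by row (table 4.2)
R_6000K = np.array([[380, -104, -20],
                    [-40,  348, -52],
                    [ 10, -128, 374]]) / 256.0

print(D_6400K.diagonal())    # approximately [2.1719, 1, 1.4922], as in the text
print(R_6000K.sum(axis=1))   # each row sums to unity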

For a given camera model, the preset rotation matrices and raw channel multipliers
are dependent on factors such as
• The output-referred colour space selected by the photographer in the camera
settings, for example, sRGB or Adobe® RGB.
• The lens model used to take the photograph.
• The camera used to take the photograph. Due to sensor calibration differ-
ences between different examples of the same camera model, the listed matrix
may differ even if the same settings are selected on different examples and the
same firmware is installed.

4.8.2 dcraw
The widely used open-source raw converter ‘dcraw’ by default outputs directly to the
sRGB colour space with a D65 encoding illumination white point by utilising a
variation of Strategy 2.
Recall that the colour rotation matrix optimised for use with the scene
illumination is defined by equation (4.17),
$$\mathbf{R} = \mathbf{M}_{\mathrm{sRGB}}^{-1}\,\mathrm{CAT}_{\mathrm{AW}\to\mathrm{D65}}\,\mathbf{T}\,\mathbf{D}^{-1}.$$

Also recall from the previous section that digital cameras typically use a small set of
preset rotation matrices optimised for a selection of preset CCTs. Alternatively,
numerical methods can be applied to interpolate between two preset rotation
matrices that are optimised for use with either a low CCT or high CCT illuminant.
In contrast, dcraw takes a very computationally simple approach by using only a single rotation matrix optimised for scene illumination with a D65 white point. In other words, $\mathbf{R} \approx \mathbf{R}_{\mathrm{D65}}$, where
$$\mathbf{R}_{\mathrm{D65}} = \mathbf{M}_{\mathrm{sRGB}}^{-1}\,\mathbf{T}_{\mathrm{D65}}\,\mathbf{D}_{\mathrm{D65}}^{-1}. \tag{4.19}$$

• The matrix $\mathbf{T}_{\mathrm{D65}}$ corresponds to a characterisation performed using the D65 standard illuminant, as described in section 4.4.2.
• The diagonal WB matrix $\mathbf{D}_{\mathrm{D65}}$ contains raw channel multipliers appropriate for illumination with a D65 white point:


$$\mathbf{D}_{\mathrm{D65}} = \begin{bmatrix} \dfrac{1}{R(\mathrm{D65})} & 0 & 0 \\ 0 & \dfrac{1}{G(\mathrm{D65})} & 0 \\ 0 & 0 & \dfrac{1}{B(\mathrm{D65})} \end{bmatrix}. \tag{4.20}$$

The overall transformation from the camera raw space to sRGB is defined by
$$\begin{bmatrix} R_L \\ G_L \\ B_L \end{bmatrix}_{\mathrm{D65}} \approx \mathbf{R}_{\mathrm{D65}}\,\mathbf{D} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\text{scene}}.$$

Notice that $\mathbf{D}$ appropriate for the AW is applied to the raw data, so the correct raw channel multipliers for the scene illumination are used. Although the colour transformation matrix $\mathbf{T}_{\mathrm{D65}}$ is optimised for scene illumination with a D65 white point only, the colour rotation matrix $\mathbf{R}_{\mathrm{D65}}$ is perfectly valid for transforming from the camera raw space to sRGB. In fact, $\mathbf{R}_{\mathrm{D65}}$ becomes exact when the AW is D65. However, the overall colour transformation loses accuracy when the scene illumination CCT differs significantly from 6500 K.

Example: Olympus E-M1


Equation (4.19) requires a $\mathbf{T}_{\mathrm{D65}}$ transformation matrix for each camera model obtained using a camera colour characterisation with WB disabled, and dcraw uses the Adobe® ‘ColorMatrix2’ matrices provided by the Adobe® DNG converter. The Adobe® matrices map in the opposite direction to the transformation matrices defined in section 4.4.2, and therefore
$$\mathbf{T}_{\mathrm{D65}} \propto \mathrm{ColorMatrix2}^{-1}.$$
As an example, dcraw stores the ColorMatrix2 entries for the Olympus® E-M1
digital camera in the following manner:
7687, −1984, −606, −4327, 11928, 2721, −1381, 2339, 6452.
Dividing by 10 000 and rearranging in matrix form yields
$$\mathbf{T}_{\mathrm{D65}} \propto \begin{bmatrix} 0.7687 & -0.1984 & -0.0606 \\ -0.4327 & 1.1928 & 0.2721 \\ -0.1381 & 0.2339 & 0.6452 \end{bmatrix}^{-1}. \tag{4.21}$$

Although the Adobe® ColorMatrix2 matrices are obtained from a characterisation


performed using the D65 standard illuminant, they are normalised so that the
maximum green raw tristimulus value for a 100% neutral diffuse reflector under D50
illumination maps to the D50 white point in the CIE XYZ colour space,


$$\begin{bmatrix} R < 1 \\ G = 1 \\ B < 1 \end{bmatrix}_{\mathrm{D50}} = \mathbf{T}_{\mathrm{D65}}^{-1} \begin{bmatrix} X(\mathrm{WP}) = 0.9641 \\ Y(\mathrm{WP}) = 1.0000 \\ Z(\mathrm{WP}) = 0.8249 \end{bmatrix}_{\mathrm{D50}}.$$

Accordingly, they need to be rescaled to match the normalisation described in


section 4.4.4,

$$\begin{bmatrix} R < 1 \\ G = 1 \\ B < 1 \end{bmatrix}_{\mathrm{D65}} = \mathbf{T}_{\mathrm{D65}}^{-1} \begin{bmatrix} X(\mathrm{WP}) = 0.9504 \\ Y(\mathrm{WP}) = 1.0000 \\ Z(\mathrm{WP}) = 1.0888 \end{bmatrix}_{\mathrm{D65}}.$$

In the present example, the inverse of the $\mathbf{T}_{\mathrm{D65}}$ matrix defined by equation (4.21) needs to be divided by 1.0778,
$$\mathbf{T}_{\mathrm{D65}} = \begin{bmatrix} 0.7133 & -0.1841 & -0.0562 \\ -0.4015 & 1.1068 & 0.2525 \\ -0.1281 & 0.2170 & 0.5987 \end{bmatrix}^{-1}.$$

By considering the unit vector in the sRGB colour space, the above matrix can be
applied to obtain the raw tristimulus values for illumination with a D65 white point,
$$\begin{bmatrix} R(\mathrm{WP}) = 0.4325 \\ G(\mathrm{WP}) = 1.0000 \\ B(\mathrm{WP}) = 0.7471 \end{bmatrix}_{\mathrm{D65}} = \mathbf{T}_{\mathrm{D65}}^{-1}\,\mathbf{M}_{\mathrm{sRGB}} \begin{bmatrix} R_L = 1 \\ G_L = 1 \\ B_L = 1 \end{bmatrix}_{\mathrm{D65}}.$$

Now equation (4.20) can be used to extract the raw channel multipliers for scene
illumination with a D65 white point,

$$\mathbf{D}_{\mathrm{D65}} = \begin{bmatrix} 2.3117 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1.3385 \end{bmatrix}_{\mathrm{D65}}.$$

Finally, the colour rotation matrix can be calculated from equation (4.19),

$$\mathbf{R}_{\mathrm{D65}} = \begin{bmatrix} 1.7901 & -0.6689 & -0.1212 \\ -0.2167 & 1.7521 & -0.5354 \\ 0.0543 & -0.5582 & 1.5039 \end{bmatrix}.$$

Each row sums to unity as required. The form of the matrix is similar to the example
Olympus matrix defined by equation (4.18). However, the Olympus matrix
corresponds to a characterisation performed using a 6000 K illuminant rather
than 6500 K.
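The numbers in this example can be reproduced with a few lines of linear algebra. The sketch below assumes the standard linear sRGB to CIE XYZ matrix [25] and the D65 white point quoted above; small differences in the final decimal place arise from rounding.

import numpy as np

# Adobe ColorMatrix2 for the Olympus E-M1 (XYZ -> raw), stored x 10000 by dcraw
CM2 = np.array([[ 7687, -1984,  -606],
                [-4327, 11928,  2721],
                [-1381,  2339,  6452]]) / 10000.0

M_SRGB = np.array([[0.4124, 0.3576, 0.1805],     # linear sRGB -> XYZ (D65)
                   [0.2126, 0.7152, 0.0722],
                   [0.0193, 0.1192, 0.9505]])
xyz_d65 = np.array([0.9504, 1.0000, 1.0888])     # D65 white point

# Rescale so that the green raw value for the D65 white point equals unity
scale = (CM2 @ xyz_d65)[1]                       # approximately 1.0778
T_D65 = np.linalg.inv(CM2 / scale)               # camera raw -> CIE XYZ

raw_wp = np.linalg.inv(T_D65) @ M_SRGB @ np.ones(3)  # ~ [0.4325, 1.0000, 0.7471]
D_D65 = np.diag(1.0 / raw_wp)                        # raw channel multipliers
R_D65 = np.linalg.inv(M_SRGB) @ T_D65 @ np.linalg.inv(D_D65)
print(R_D65.sum(axis=1))                             # each row sums to unity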


4.9 Adobe DNG


The Adobe® digital negative (DNG) file format is an open source raw format
developed by Adobe [30]. The freeware DNG Converter can be used to convert any
given raw file into the DNG format [33].
Although the DNG converter does not aim to produce a viewable output image,
it does perform a colour conversion from the camera raw space into the PCS based
on the CIE XYZ colour space with a D50 illumination white point [1].
Consequently, the colour processing model used by the DNG converter must
provide appropriate transformation matrices along with a strategy for achieving
correct WB.
The DNG specification provides two different methods referred to here as
Method 1 and Method 2. Method 1 is a variation of Strategy 1; it uses a
transformation matrix in conjunction with a CAT, but the raw data is stored in
the CIE XYZ colour space without converting to an output-referred colour space
such as sRGB. Method 2 is a variation of Strategy 2; raw channel multipliers are
applied to the raw channels but a so-called forward matrix is used instead of a colour
rotation matrix. The forward matrix replaces the rotation matrix as the mapping is
to the CIE XYZ colour space with a D50 illumination white point rather than an
output-referred RGB colour space. This means that, unlike a rotation matrix, the
rows of a forward matrix do not each sum to unity.
Derivations of Method 1 and Method 2 are given below. All linear RGB values in
the camera raw space are scaled to the range [0,1] after subtracting the dark level on
a per-pixel basis. Similarly, the CIE XYZ vectors are based on relative colourimetry
with Y normalised to the range [0,1].

4.9.1 Method 1: transformation matrix + CAT


The transformation from the camera raw space to the PCS is defined as follows:

⎡X ⎤ ⎡R⎤
⎢Y ⎥ _ −1⎢ G ⎥
= CAT AW→D65 C . (4.22)
⎢⎣ ⎥⎦ ⎢ ⎥
Z D50 ⎣ B ⎦scene

• Here $\mathbf{C}$ is a transformation matrix optimised for the AW. The mapping is in the direction from the CIE XYZ colour space to the camera raw space,
$$\begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\text{scene}} = \mathbf{C} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\text{scene}}.$$
This is the reverse direction to the transformation matrix $\mathbf{T}$ defined in section 4.4.2 so that
$$\mathbf{T} \propto \mathbf{C}^{-1}.$$
The reverse direction is necessary for highlight recovery logic [30].


• The matrix $\mathbf{C}$ is normalised so that the maximum raw tristimulus value (typically green) for a 100% neutral diffuse reflector under the scene illumination maps to the D50 white point in the CIE XYZ colour space:
$$\begin{bmatrix} R(\mathrm{AW}) < 1 \\ G(\mathrm{AW}) = 1 \\ B(\mathrm{AW}) < 1 \end{bmatrix}_{\text{scene}} = \mathbf{C} \begin{bmatrix} X(\mathrm{WP}) = 0.9641 \\ Y(\mathrm{WP}) = 1.0000 \\ Z(\mathrm{WP}) = 0.8249 \end{bmatrix}_{\mathrm{D50}}.$$

Accordingly, equation (4.22) is correctly normalised.


• The CAT adapts the illumination white point from the AW to CIE illuminant
D50. The linear Bradford CAT described in section 4.6.2 is recommended by
the ICC.

As described in section 4.7, the implementation of equation (4.22) is complicated by the fact that $\mathbf{C}$ should be optimised for the AW, i.e. the camera estimate for the adapted white or scene illumination white point. In the DNG specification, the optimised $\mathbf{C}$ matrix is determined by interpolating between two transformation matrices.
The DNG specification provides tags for two transformation matrices,
ColorMatrix1 and ColorMatrix2. The ColorMatrix1 matrix should be obtained
from a camera colour characterisation performed using a low-CCT illuminant such
as CIE illuminant A, and ColorMatrix2 from a characterisation performed using a
high-CCT illuminant such as CIE illuminant D65. Consistent with section 4.4.2,
these matrices are obtained from characterisations performed with the camera
digital WB disabled,
$$\begin{bmatrix} R \\ G \\ B \end{bmatrix}_{(1)} = \mathrm{ColorMatrix1} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{(1)}, \qquad \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{(2)} = \mathrm{ColorMatrix2} \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{(2)}.$$
However, the Adobe transformation matrices are defined in the reverse direction to the transformation matrices of section 4.4.2,
$$\mathbf{T}_{(1)} \propto \mathrm{ColorMatrix1}^{-1}, \qquad \mathbf{T}_{(2)} \propto \mathrm{ColorMatrix2}^{-1}.$$

Here (1) and (2) refer to the low-CCT and high-CCT characterisation illuminants,
respectively. The DNG specification uses linear interpolation based upon the inverse CCT.

Interpolation algorithm
The optimised transformation matrix $\mathbf{C}$ is calculated by interpolating between ColorMatrix1 and ColorMatrix2 based on the scene illumination CCT estimate denoted by CCT(AW), together with the CCTs associated with each of the two characterisation illuminants denoted by CCT(1) and CCT(2), respectively, with CCT(1) < CCT(2). If only one matrix is included, then $\mathbf{C}$ will be optimal only if CCT(AW) matches CCT(1) or CCT(2). To facilitate the interpolation, note that ColorMatrix1 and ColorMatrix2 are by default normalised so that the WP of D50 illumination, rather than the AW, maps to the maximum raw tristimulus value.
The interpolation itself is complicated by the fact that the AW is calculated by the camera in terms of raw values,
$$\begin{bmatrix} R(\mathrm{AW}) \\ G(\mathrm{AW}) \\ B(\mathrm{AW}) \end{bmatrix}_{\text{scene}}.$$

Finding the corresponding CCT(AW) requires converting to CIE XYZ via a matrix transformation $\mathbf{C}$ that itself depends upon the unknown CCT(AW). This problem can be solved using a self-consistent iteration procedure.
1. Make a guess for the AW chromaticity coordinates, (x(AW), y(AW)). For
example, the chromaticity coordinates corresponding to one of the character-
isation illuminants could be used.
2. Convert (x(AW), y(AW)) to the corresponding (u(AW), v(AW)) chromatic-
ity coordinates of the 1960 UCS colour space using equation (4.11) of section
4.2.2.
3. Use Robertson’s method [8] or an approximate formula [9, 10] as described
in section 4.2.2 to determine a guess for CCT(AW).
4. Calculate the interpolation weighting factor α according to
$$\alpha = \frac{(\mathrm{CCT(AW)})^{-1} - (\mathrm{CCT}(2))^{-1}}{(\mathrm{CCT}(1))^{-1} - (\mathrm{CCT}(2))^{-1}}. \tag{4.23}$$
5. Calculate the interpolated transformation matrix $\mathbf{C}$,
$$\mathbf{C} = \alpha\,\mathrm{ColorMatrix1} + (1 - \alpha)\,\mathrm{ColorMatrix2}.$$
6. Convert the AW from the camera raw space to the CIE XYZ colour space,
$$\begin{bmatrix} X(\mathrm{AW}) \\ Y(\mathrm{AW}) \\ Z(\mathrm{AW}) \end{bmatrix}_{\text{scene}} = \mathbf{C}^{-1} \begin{bmatrix} R(\mathrm{AW}) \\ G(\mathrm{AW}) \\ B(\mathrm{AW}) \end{bmatrix}_{\text{scene}},$$
and calculate a new guess for (x(AW), y(AW)) by applying equation (4.9).
7. Repeat the procedure until (x(AW), y(AW)), CCT(AW), and $\mathbf{C}$ all converge to a stable solution.


If CCT(AW) < CCT(1) or if only ColorMatrix1 is provided, then $\mathbf{C}$ is set equal to ColorMatrix1. Similarly, if CCT(AW) > CCT(2) or if only ColorMatrix2 is provided, then $\mathbf{C}$ is set equal to ColorMatrix2.
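A sketch of the self-consistent loop is given below. It assumes a caller-supplied helper that converts (x, y) chromaticities to a CCT (internally this would convert to 1960 UCS (u, v) coordinates and apply Robertson's method [8] or an approximate formula [9, 10]); the function and variable names are illustrative rather than part of the DNG specification. Clamping α to [0, 1] reproduces the ColorMatrix1/ColorMatrix2 limits described above.

import numpy as np

def xyz_to_xy(xyz):
    """CIE XYZ to (x, y) chromaticity coordinates."""
    return xyz[:2] / xyz.sum()

def interpolate_colormatrix(raw_aw, colormatrix1, colormatrix2, cct1, cct2,
                            xy_to_cct, xy_guess, max_iter=20, tol=1.0):
    """Self-consistently interpolate the XYZ -> raw matrix C for the AW.
    xy_to_cct is a caller-supplied function returning a CCT for given (x, y)."""
    cct_aw = xy_to_cct(xy_guess)                 # steps 1-3: initial guess
    for _ in range(max_iter):
        # Step 4: weighting factor from inverse CCTs, clamped to [0, 1]
        alpha = (1.0 / cct_aw - 1.0 / cct2) / (1.0 / cct1 - 1.0 / cct2)
        alpha = min(max(alpha, 0.0), 1.0)
        # Step 5: interpolated matrix
        C = alpha * colormatrix1 + (1.0 - alpha) * colormatrix2
        # Step 6: map the AW to XYZ and recompute its chromaticity and CCT
        xyz_aw = np.linalg.inv(C) @ raw_aw
        new_cct = xy_to_cct(xyz_to_xy(xyz_aw))
        if abs(new_cct - cct_aw) < tol:          # step 7: convergence check
            break
        cct_aw = new_cct
    return C, cct_aw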

4.9.2 Method 2: raw channel multipliers + forward matrix


Recall the transformation used by Method 1 from the camera raw space to CIE
XYZ with a D50 illumination white point defined by equation (4.22),
$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D50}} = \mathrm{CAT}_{\mathrm{AW}\to\mathrm{D50}}\,\mathbf{C}^{-1} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\text{scene}}.$$

Method 2 re-expresses the above transformation in the following way:

$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix}_{\mathrm{D50}} = \mathbf{F}\,\mathbf{D} \begin{bmatrix} R \\ G \\ B \end{bmatrix}_{\text{scene}}.$$

• The diagonal WB matrix $\mathbf{D}$ contains raw channel multipliers appropriate for the AW. These adapt the AW to the reference white of the camera raw space,
$$\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \mathbf{D} \begin{bmatrix} R(\mathrm{AW}) \\ G(\mathrm{AW}) \\ B(\mathrm{AW}) \end{bmatrix}_{\text{scene}}.$$

• The forward matrix $\mathbf{F}$ maps from the camera raw space to the PCS, i.e. the CIE XYZ colour space with a D50 white point. This means that the reference white of the camera raw space is adapted to the white point of D50 illumination,
$$\begin{bmatrix} X(\mathrm{WP}) = 0.9641 \\ Y(\mathrm{WP}) = 1.0000 \\ Z(\mathrm{WP}) = 0.8249 \end{bmatrix}_{\mathrm{D50}} = \mathbf{F} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.$$

The forward matrix $\mathbf{F}$ is obtained by interpolating between ForwardMatrix1 and ForwardMatrix2, where
$$\mathrm{ForwardMatrix1} = \mathrm{CAT}_{\mathrm{AW}\to\mathrm{D50}}\,\mathrm{ColorMatrix1}^{-1}\,\mathbf{D}^{-1}, \qquad \mathrm{ForwardMatrix2} = \mathrm{CAT}_{\mathrm{AW}\to\mathrm{D50}}\,\mathrm{ColorMatrix2}^{-1}\,\mathbf{D}^{-1}. \tag{4.24}$$
• The camera raw space reference white is adapted to the white point of D50 illumination.


• In order to determine $\mathbf{F}$, the linear interpolation method based on inverse CCT previously described in section 4.9.1 should be used to determine the CCT for the AW denoted by CCT(AW). Note that CCT(AW) is determined using ColorMatrix1 and ColorMatrix2. Subsequently, $\mathbf{F}$ is defined by
$$\mathbf{F} = \alpha\,\mathrm{ForwardMatrix1} + (1 - \alpha)\,\mathrm{ForwardMatrix2}.$$
Again α is defined by equation (4.23),
$$\alpha = \frac{(\mathrm{CCT(AW)})^{-1} - (\mathrm{CCT}(2))^{-1}}{(\mathrm{CCT}(1))^{-1} - (\mathrm{CCT}(2))^{-1}}.$$
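A brief sketch of Method 2, assuming the forward matrices, raw channel multipliers and interpolation weight have already been obtained (the function and variable names are illustrative):

import numpy as np

def method2_to_pcs(raw_scene, raw_aw, forwardmatrix1, forwardmatrix2, alpha):
    """Map a raw pixel vector to the D50 PCS: white balance with the raw
    channel multipliers, then apply the interpolated forward matrix."""
    D = np.diag(1.0 / raw_aw)                            # adapts the AW to [1, 1, 1]
    F = alpha * forwardmatrix1 + (1.0 - alpha) * forwardmatrix2
    return F @ D @ raw_scene                             # CIE XYZ relative to D50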

4.10 sRGB colour space: nonlinear form


After the colour demosaic and white-balanced transformation from the camera raw
space to an output-referred colour space have been performed, further processing is
required to produce a viewable output digital image.
In-camera image-processing engines typically produce 8-bit JPEG output images.
As already discussed in section 2.2 of chapter 2, the occupied raw levels in the
output-referred colour space are converted into DOLs, and the bit-depth reduction
to 8 is performed in conjunction with gamma encoding and decoding in order to
minimize posterisation artefacts.
For preferred tone reproduction, traditional DSLR in-camera image-processing
engines apply a global s-shaped tone curve to the raw data before applying the
gamma encoding curve of the chosen output-referred colour space. These operations
are typically combined using a LUT so that a single tone curve is applied that is s-
shaped relative to the gamma encoding curve. Modern smartphone cameras
typically apply custom tone curves on an image-by-image basis and may use local
tone-mapping operators.
A global s-shaped tone curve reduces the raw DR transferred to the output image
to a value commensurate with a typical display, as described in section 2.3.2 of
chapter 2. Finally, the DOLs are transformed into the Y′CbCr representation for
encoding as a JPEG image, as described in section 2.13.1 of chapter 2.
For completeness, this section provides details of the standard encoding gamma
curve for the sRGB colour space.

4.10.1 sRGB digital output levels


The linear form of the sRGB colour space has been defined in section 4.5.1. When plotted on standard axes, the sRGB encoding gamma curve appears almost identical to the $\gamma_E = 1/2.2$ power-law curve used by the Adobe RGB colour space, shown in figure 4.16. In fact, there are subtle differences since the gamma value is not constant.


Figure 4.16. The blue line shows a power law gamma curve defined by $\gamma_E = 1/2.2$, which appears as a straight line when plotted using logarithmic axes. In contrast, the piecewise sRGB gamma curve shown in green has constant gamma only below 0.0031308. Here $V = R_L$, $G_L$ or $B_L$ and $V' = R'$, $G'$ or $B'$.

For sRGB relative tristimulus values normalised to the range [0,1], nonlinear $R'$, $G'$, $B'$ values are obtained by applying a piecewise gamma curve in the following way. For $R_L, G_L, B_L \leqslant 0.0031308$,
$$R' = 12.92\,R_L, \qquad G' = 12.92\,G_L, \qquad B' = 12.92\,B_L. \tag{4.25}$$
For $R_L, G_L, B_L > 0.0031308$,
$$R' = 1.055\,R_L^{1/2.4} - 0.055, \qquad G' = 1.055\,G_L^{1/2.4} - 0.055, \qquad B' = 1.055\,B_L^{1/2.4} - 0.055. \tag{4.26}$$

Subsequently, image DOLs can be obtained by scaling $R'$, $G'$, $B'$ to the range $[0, 2^M - 1]$, where M is the required bit depth, and then quantizing to the nearest integer:
$$R'_{\mathrm{DOL}} = \mathrm{Round}\{(2^M - 1)R'\}, \qquad G'_{\mathrm{DOL}} = \mathrm{Round}\{(2^M - 1)G'\}, \qquad B'_{\mathrm{DOL}} = \mathrm{Round}\{(2^M - 1)B'\}. \tag{4.27}$$

In order to avoid numerical issues close to zero, notice that the sRGB piecewise gamma curve has a linear portion with constant encoding gamma $\gamma_E = 1$ below 0.0031308.
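The piecewise encoding and quantization can be sketched as a direct transcription of equations (4.25)-(4.27); the function name is illustrative.

import numpy as np

def srgb_encode(V, bit_depth=8):
    """Apply the piecewise sRGB gamma curve to linear values in [0, 1] and
    quantize to digital output levels (equations (4.25)-(4.27))."""
    V = np.asarray(V, dtype=float)
    V_prime = np.where(V <= 0.0031308,
                       12.92 * V,
                       1.055 * np.power(V, 1.0 / 2.4) - 0.055)
    return np.round((2**bit_depth - 1) * V_prime).astype(int)

print(srgb_encode([0.0, 0.0031308, 0.18, 1.0]))   # [0, 10, 118, 255]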


The differences between the sRGB gamma curve and a standard γE = 1/2.2 curve
can be seen more clearly by plotting both curves using logarithmic axes. As
illustrated in figure 4.16, data plotted on logarithmic axes is not altered numerically
but the spacing between axis values changes in a logarithmic manner. A power law
plotted on linear axes appears as a straight line when plotted on logarithmic axes
with the gradient of the line equal to the exponent of the power law. Unlike the
sRGB curve, the standard γE = 1/2.2 curve appears as a straight line with gradient
equal to 1/2.2.
4.10.2 sRGB colour cube
Recall that output-referred RGB colour spaces are additive and so their gamuts each
define a triangle on the xy chromaticity diagram. Furthermore, sRGB and Adobe®
RGB use real primaries. These properties imply that sRGB and Adobe® RGB can
be visualised as cubes when described by a colour model based on a 3D Cartesian
coordinate system.
In the case of sRGB, recall that the (x , y ) chromaticity coordinates of the red,
green, and blue primaries are defined as

(0.64, 0.33), (0.30, 0.60), (0.15, 0.06).

The gamut defined by the primaries is illustrated in figure 4.14. Using tristimulus
values of the linear form of sRGB normalised to the range [0,1], each primary can
also be written as a vector,
⎡ RL ⎤ ⎡1 ⎤ ⎡0⎤ ⎡0⎤
⎢ ⎥ ⎢ ⎥ ⎢1 ⎥ , and ⎢ 0 ⎥ .
⎢G L ⎥ = ⎢ 0 ⎥ , ⎢⎣ ⎥⎦ ⎢⎣ ⎥⎦
⎣ BL ⎦ ⎣ 0 ⎦ 0 1

In order to visualise the sRGB colour cube correctly on a display monitor, the linear
form of sRGB needs to be converted into its nonlinear form by applying the
encoding gamma curve and then quantizing to DOLs using equation (4.27). A cube
defined by all DOLs will appear correctly when viewed on a standard gamut
monitor since the display gamma will compensate for the encoding gamma.
An 8-bit sRGB colour cube is illustrated in figure 4.17. Using Cartesian coordinates, black is defined by the origin,
$$(R'_{\mathrm{DOL}}, G'_{\mathrm{DOL}}, B'_{\mathrm{DOL}}) = (0, 0, 0).$$
The D65 reference white is defined by
$$(R'_{\mathrm{DOL}}, G'_{\mathrm{DOL}}, B'_{\mathrm{DOL}}) = (255, 255, 255).$$
All greyscale colours lie along the diagonal between the origin and the reference white. The red, green and blue primaries are defined by
$$(R'_{\mathrm{DOL}}, G'_{\mathrm{DOL}}, B'_{\mathrm{DOL}}) = (255, 0, 0), \ (0, 255, 0), \ \text{and} \ (0, 0, 255).$$


Figure 4.17. sRGB colour cube defined by 8-bit DOLs. The origin (0,0,0) is black, the opposite corner (255,255,255) is the D65 reference white, and the remaining corners are the red, green and blue primaries together with cyan, magenta and yellow.

Since the DOLs are nonlinearly related to scene luminance, they are specified by
coordinates rather than vectors.
Adobe® RGB can also be described using a colour cube, but can only be
visualised correctly on a wide-gamut monitor. On the other hand, ProPhoto RGB
cannot be visualised as a cube since part of its gamut lies outside the visible
chromaticities defined by the horseshoe-shaped region of the xy chromaticity
diagram.

4.11 Raw processing workflow


This section discusses several important issues related to image management that
arise as part of the workflow when raw conversion software is used to convert a raw
file into a viewable output image.

4.11.1 Colour management


When a digital image is transferred between different devices, the aim of colour
management is to preserve the real-world colour appearance as closely as possible.
A prerequisite for successful colour management is the use of an external
hardware profiling device to create an ICC display profile that describes the colour
response of the monitor or display device. The colours that can be displayed by a
hardware device define its gamut. After calibration, which sets the display at its
optimum state, profiling will characterise the colour response of the calibrated
display so that a mapping can be determined between the display gamut of the
monitor/display and the PCS, which can be based on CIE XYZ or CIE LAB with a


D50 white point. In recent versions of Microsoft® Windows®, ICC colour profiles
are stored in the following directory:
C:\Windows\System32\spool\drivers\color.
Colour-managed applications such as raw converters and image-editing software
will automatically utilise the display profile created by the profiling software. The
translation of colours between ICC profiles is carried out by the colour matching
module (CMM).
When raw conversion is performed by the in-camera image processing engine, an
EXIF (exchangeable image file format) tag is added to the JPEG metadata
indicating the appropriate ICC profile for the chosen output-referred colour space.
Typically, sRGB [25] and Adobe® RGB [26] are the available in-camera options.
The ICC profile enables the encoded JPEG image DOLs to be correctly interpreted
and does not affect the raw data. If the option is given, it is good practice to embed
the ICC profile in the image file for the benefit of users who do not have the relevant
ICC profile installed on their viewing device.
• sRGB is the appropriate option for displaying images on standard-gamut
monitors and on the internet. It is also the most suitable option for displaying
images on devices that do not have colour management in place.
• Adobe® RGB is the appropriate option for displaying images on wide-gamut
monitors and for printing. Multi-ink printers can have very large gamuts.

If further editing of the image is required, it is important to use colour-managed


editing software and an appropriate internal working space. The colour settings for
Adobe® Photoshop® are discussed in section 4.11.4.
An external raw converter offers much greater flexibility. For example, a custom
tone curve can be applied to transfer the entire raw DR to the output image.
Furthermore, an external raw converter can be used for a maximal colour strategy
that aims to preserve as many of the captured scene colours as possible.

4.11.2 Maximal colour strategy


A maximal colour strategy aims to preserve as much of the colour information
captured by the camera as possible at each stage of the processing chain before the
final output image encoding is chosen. This requires use of an external raw converter
and colour-managed image editing software.
Raw converters use an internal working space for processing, which is usually a
camera raw space ICC profile or a reference colour space. An output-referred colour
space needs to be selected for displaying and saving the output image, and only
colours contained within the selected colour space that are also contained within the
display gamut of the monitor/display can be seen on the monitor/display. The stages
of a maximal colour strategy can be summarised as follows:
1. Use a wide-gamut monitor/display and associated ICC profile.
2. Select a large output-referred colour space for saving the output image from
the raw converter, ideally as a 16-bit TIFF file. 16-bit TIFF files are


discussed in section 4.11.3. The WideGamut RGB and ProPhoto RGB


output-referred colour spaces are larger than Adobe RGB and are
suitable for editing and displaying images on wide-gamut monitors and for
high-quality printing. Although the extra colours may fall within the gamut
of multi-ink printers, ProPhoto RGB cannot be seen in its entirety on any
conventional three-channel monitor/display since two of its primaries are
imaginary. This is evident from figure 4.13.
3. Use colour-managed editing software if further processing is required. Such
software also has an internal working space that defines the accessible
colours for editing an image. Ideally, this should be set to be the colour
space chosen in step 2.
4. Select the same colour space chosen in step 3 when saving the edited image,
ideally as a 16-bit TIFF file.

Only colours contained in both the working space and the monitor/display gamut
can be seen when viewing or editing the image. Nevertheless, colours in the working
space that cannot be seen on the display will be visible on a print if they lie within the
printer gamut.
Since the output image file from step 4 has been encoded using a large colour
space, it can be archived and used as a source file for producing an appropriate
output JPEG image as required. Different applications require specific adjustments
to be made, such as resampling, sharpening or conversion to a smaller output-
referred colour space.
When converting the image file from step 4 into a different colour space, software
such as Adobe® Photoshop® (via the ‘Convert to Profile’ option) will perform gamut
mapping with an applied rendering intent. Perceptual intent is recommended when
converting to a smaller colour space such as sRGB. Although any out-of-gamut
colours will be clipped, perceptual intent will shift the in-gamut colours in order to
preserve colour gradations.
The advantage of using a working space larger than the final chosen output-
referred colour space is that a greater amount of colour information can be
preserved by minimizing the clipping of colours until the final gamut-mapping stage.

4.11.3 16-bit TIFF files


When a raw file is processed using an external raw converter, there is an option to
output 16-bit rather than 8-bit image files.
Recall from section 2.2.1 of chapter 2 that 8-bit image files can encode any desired
amount of DR as they are not involved with the capture of raw data. Nevertheless, an
8-bit image file cannot represent a greater amount of scene DR than the raw DR when
the image file is produced from the raw file. The same applies to 16-bit image files.
The main difference between 8-bit and 16-bit image files is the number of DOLs.
• A 16-bit image file provides 216 = 65 536 DOLs rather than the 28 = 256
DOLs provided by an 8-bit image file. These extra tonal levels provide a finer
discretization of the scene DR transferred from the raw file. However, these


extra tonal transitions cannot be observed on a conventional 8-bit display


monitor.
• A 16-bit image file occupies a much larger space on a storage medium such as
a hard drive in comparison with an 8-bit image file.

The extra tonal levels can be advantageous as part of a raw workflow. When an 8-bit
image file obtained from a raw converter undergoes further manipulation such as
extensive levels adjustment using image editing software, posterisation artefacts may
arise in areas of steep tonal gradation. However, a 16-bit image file provides a much
finer mesh for the calculations before conversion to an 8-bit image file or 8-bit DOLs
for an 8-bit display, and so these artefacts are avoided. For the same reason, 16-bit
image files are recommended for use with wide gamut colour spaces such as
ProPhoto RGB, which provide steeper colour gradations.
Information will be lost when a JPEG file is resaved, even if no adjustments are
made. In contrast, TIFF (Tagged Image File Format) files are lossless and can be
repeatedly saved without loss of information. 16-bit TIFF files can be archived and
an appropriate JPEG file produced as required.

4.11.4 Adobe Photoshop colour settings


The ‘Color Settings’ box available in Adobe® Photoshop® CS5 is shown in
figure 4.18. The settings are grouped into three main sections:
1. Working Spaces.
2. Colour Management Policies.
3. Conversion Options.

Each of these is discussed below.

Working space
As explained in step 3 of section 4.11.2, the working space is the colour space used
for editing an imported image and is unrelated to the display profile. Several options
are available, as discussed in the colour management policies section below.
Nevertheless, it is generally advisable to set the working space to be the colour
space associated with the ICC profile of the imported image. This is likely to be
either sRGB or Adobe® RGB for an imported image obtained directly from an in-
camera image-processing engine.
Note that the monitor/display ICC profile is available as ‘Monitor RGB - profile
name’ in the working space option list. However, it is inadvisable to select this as the
working space.
When following a maximal colour strategy, the ICC profile of the imported image
will be a large output-referred colour space such as Adobe® RGB, or preferably
WideGamut RGB or ProPhoto RGB. Recall from section 4.11.2 that a major
advantage of using a large working space is that a greater amount of colour information
can be preserved by minimizing the clipping of colours, even if the image will ultimately
be converted to a smaller colour space at the final gamut-mapping stage.


Figure 4.18. ‘Color Settings’ control box available in Adobe® Photoshop® CS5.

A drawback of using ProPhoto RGB as the working space is that its gamut
cannot be shown in its entirety on a wide-gamut display, and so some of the colours
that appear in print may not be visible on the monitor/display. This can be
understood by recalling from sections 4.1 and 4.3 that non-negative linear combi-
nations of three real primaries cannot reproduce all visible chromaticities. This is
due to the fact that the three types of eye cone cell cannot be stimulated
independently, and so the visible gamut appears as a horseshoe shape rather than
a triangle on the xy chromaticity diagram. Although the primaries associated with
response functions of capture devices can be imaginary, the primaries of a display
device must be physically realizable and therefore cannot lie outside of the visible
gamut. Since non-negative linear combinations of primaries lie within a triangle
defined by the primaries on the xy chromaticity diagram, a display device that uses
only three real primaries cannot reproduce all visible colours. Modern wide-gamut
displays can show many colours outside both the sRGB and Adobe® RGB gamuts,


although some regions of these gamuts may not be covered. ProPhoto RGB includes
many additional colours that can be shown on wide-gamut displays, but two of the
primaries are imaginary and so the entire colour space cannot be shown. The gamuts
of several output-referred colour spaces are illustrated in figure 4.13.

Colour management policies


The ICC colour-space profile associated with an image file determines how the
DOLs are interpreted. For example, JPEG output image files produced by in-
camera image-processing engines will have sRGB or Adobe® RGB embedded ICC
profiles. Tags may be used instead of embedded profiles since sRGB and Adobe®
RGB are standard profiles commonly installed on devices.
An embedded profile mismatch will occur if an image is opened and the colour
space associated with the embedded ICC profile differs from the Adobe®
Photoshop® working space, or if the embedded ICC profile or tag is missing. In
these cases, the photographer is provided with control over handling of the profile
mismatch via the dialogue box shown in figure 4.19. The options are as follows:
(A) Use the embedded profile (instead of the working space).
(B) Convert document’s colors to the working space.
(C) Discard the embedded profile (do not color manage).

It is generally advisable to choose option A. The working space will temporarily be


set to the colour space associated with the image ICC profile instead of the default
working space chosen in ‘Color Settings’. For example, the working space might be
ProPhoto RGB and the embedded profile might be sRGB or Adobe® RGB. When
opening multiple image files simultaneously, the embedded profile associated with
each image will be used as the temporary working space for the appropriate image
window.

Figure 4.19. ‘Embedded Profile Mismatch’ dialogue box present upon opening an image using Adobe®
Photoshop® CS5.


Provided the working space is larger than the image ICC profile colour space, it is
also perfectly acceptable to choose option B and convert the colours to the working
space. Although the additional colours available in the working space will not be
utilised upon conversion, the entire gamut of the working space can subsequently be
utilised when editing the image. In some cases this can be undesirable. If the image
ICC profile colour space is larger than the working space, choosing option B may
lead to unnecessary clipping of the scene colours. In this case the rendering intent
default setting in the colour options will be applied. Rendering intent is discussed in
the conversion options section below.
If the embedded profile is missing, a colour profile can be assigned to the image.
In order to test different profiles, choose ‘Leave as is (don’t color manage)’ and then
assign a profile using the ‘Assign Profile’ menu option. The appearance of the image
will suggest the most suitable profile. Most likely, the RGB values will correspond to
the sRGB colour space.

Conversion options
The conversion options define the default settings for converting between ICC
profiles. Conversion may be required when there is an embedded profile mismatch
upon opening an image. It can also be performed at any time by using the ‘Convert
to Profile’ menu option. For example, if the image ICC profile is ProPhoto RGB and
this has been set as the working space for editing, the photographer may wish to
create a version converted to sRGB for displaying on the internet.
Since the source and destination colour spaces have different gamuts, gamut
mapping may result in a change to the image DOLs. The ICC have defined several
rendering intents that can be applied by the CMM. In photography, the two most
useful rendering intents are relative colorimetric and perceptual.
• Relative colorimetric intent clips out-of-gamut colours to the edge of the
destination gamut, and leaves in-gamut colours unchanged. It is generally
advisable to enable black point compression when using relative colorimetric
intent, particularly if the source image colours all lie inside the gamut of the
destination colour space.
• Perceptual intent also clips out-of-gamut colours to the edge of the destination
gamut, but at the same time shifts in-gamut colours to preserve colour
gradations. This can be viewed as a compression of the source gamut into the
destination gamut and may be preferable when there are source image colours
that lie outside of the destination gamut.

If the preview option is selected, the most suitable rendering intent can be chosen on
a case-by-case basis.
The ‘Proof Colors’ menu option enables the effects of conversion to destination
profiles to be simulated. The destination profile can be selected via ‘Proof Setup’.
Note that the ‘Monitor RGB’ option effectively simulates the appearance of colours
on the monitor/display without the display profile applied, in other words an
uncalibrated monitor/display.


4.11.5 Image resizing


Digital images often need to be resized for display. It is useful to define the following
terminology:
• The pixel count of a digital image is the total number of pixels. Pixel count n is
commonly specified in the following way:

n = n(h) × n(v),

where n(h ) is the number of pixels in the horizontal direction and n(v ) is the
number of pixels in the vertical direction.
• The image display resolution specifies the number of displayed image pixels
per unit distance, most commonly in terms of pixels per inch (ppi). This refers
to the manner in which an image is displayed and is not a property of the
image itself.
• The screen resolution is the number of monitor pixels per inch (ppi). The
screen resolution defines the image display resolution when the image is
displayed on a monitor.
• The print resolution is the image display resolution for a hardcopy print. For
high quality prints, 300 ppi is considered to be sufficient when a print is
viewed under standard viewing conditions. These conditions are described in
section 1.4.1 of chapter 1 and section 5.2.2 of chapter 5. Print resolution in
ppi should not be confused with the printer resolution in dpi (see below).
• The image display size is determined by the pixel count together with the
image display resolution,

$$\text{image display size} = \frac{\text{pixel count}}{\text{image display resolution}}.$$

The print size is the image display size for a hardcopy print.
• The printer resolution is a measure of the number of ink dots per unit distance
used by a printer to print an image, and is commonly measured in dots per
inch (dpi). A higher dpi generally results in better print quality. Unlike screen
resolution, printer resolution in dpi is independent from image display
resolution.

For example, a 720 × 480 pixel image will appear with an image display size equal to 10 by 6.67 inches on a computer monitor set at 72 ppi. The same image will appear with a print size equal to 2.4 by 1.6 inches when printed with the image display resolution set to 300 ppi; the print size is independent of the printer resolution in dpi.
Commercial printers often request images to be ‘saved at 300 ppi’ or ‘saved at
300 dpi’. Such phrases are not meaningful since neither image display resolution nor
printer resolution are properties of an image. Nevertheless, it is possible to add a ppi
resolution tag to an image. This does not alter the image pixels in any way and is
simply a number stored in the image metadata to be read by the printing software.


In any case, the printing software will allow this value to be overridden by providing
print size options.
It is likely that the commercial printer intends to print the image with a 300 ppi
print resolution. In the absence of any further information such as the final print size,
it is advisable to leave the pixel count unchanged, add a 300 ppi resolution tag to the
image, and rely on the printing software used by the client to take care of any
resizing. However, if the final print size is already known then quality can be
optimized by resizing the image in advance using more specialised software. This
will change the pixel count according to the following formula:
required pixel count = required print size × image display resolution.
For example, if the image print size will be 12 × 8 inches and the image display
resolution needs to be 300 ppi, then the required pixel count is 3600 × 2400. If the
image pixel count is currently 3000 × 2000, then the pixel count needs to be increased
through resampling. The mathematics of resampling is discussed in section 5.7 of
chapter 5.
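For instance, the required pixel count can be computed with a trivial helper (the function name is illustrative):

def required_pixel_count(print_size_inches, ppi=300):
    """Pixel dimensions needed to print at the requested display resolution."""
    width_in, height_in = print_size_inches
    return round(width_in * ppi), round(height_in * ppi)

print(required_pixel_count((12, 8)))   # (3600, 2400), as in the example above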
Adobe® Photoshop® provides the ‘Image Size’ option shown in figure 4.20. For
the present example, one way of using this feature is to set the required width and
height to be 3600 pixels and 2400 pixels in ‘Pixel Dimensions’ with ‘Resolution’ set
to 300 ppi. In this case the ‘Document Size’ will automatically change to 12 × 8
inches. Alternatively, the width and height in ‘Document Size’ can be set to 12 inches
and 8 inches, respectively. In this case the ‘Pixel Dimensions’ will automatically
change to 3600 × 2400, and selecting ‘OK’ will perform the resampling.

Figure 4.20. ‘Image Size’ control box available in Adobe® Photoshop® CS5.


References
[1] International Color Consortium 2010 Image Technology Colour Management—Architecture,
Profile Format, and Data Structure Specification ICC.1:2010 (profile version 4.3.0.0)
[2] Commission Internationale de l’Eclairage 2006 Fundamental Chromaticity Diagram with
Physiological Axes–Part 1 CIE 170-1:2006
[3] Ohta N and Robertson A R 2005 Colorimetry: Fundamentals and Applications (New York:
Wiley)
[4] Hunt R W G and Pointer M R 2011 Measuring Colour 4th edn (New York: Wiley)
[5] Hunt R W G 1997 The heights of the CIE colour-matching functions Color Res. Appl. 22 355
[6] Stewart S M and Johnson R B 2016 Blackbody Radiation: A History of Thermal Radiation
Computational Aids and Numerical Methods (Boca Raton, FL: CRC Press)
[7] Kim Y-S, Cho B-H, Kang B-S and Hong D-I 2006 Color temperature conversion system and
method using the same US Patent Specification 7024034
[8] Robertson A R 1968 Computation of correlated color temperature and distribution
temperature J. Opt. Soc. Am. 58 1528
[9] McCamy C S 1992 Correlated color temperature as an explicit function of chromaticity
coordinates Color Res. Appl. 17 142
[10] Hernández-Andrés J, Lee R L and Romero J 1999 Calculating correlated color temperatures
across the entire gamut of daylight and skylight chromaticities Appl. Opt. 38 5703
[11] Sato K 2006 Image-processing algorithms Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 8
[12] Coffin D 2015 private communication
[13] Li X, Gunturk B and Zhang L 2008 Image demosaicing: a systematic survey Visual
Communications and Image Processing Proc. SPIE 6822 68221J
[14] Chang E, Cheung S and Pan D Y 1999 Color filter array recovery using a threshold-based
variable number of gradients Sensors, Cameras, and Applications for Digital Photography
Proc. SPIE 3650 36–43
[15] Lin C-k 2004 Pixel grouping for color filter array demosaicing (Portland State University)
[16] Hirakawa K and Parks T W 2005 Adaptive homogeneity-directed demosaicing algorithm
IEEE Trans. Image Process. 14 360
[17] Luther R 1927 Aus dem Gebiet der Farbreizmetrik (On color stimulus metrics) Z. Tech.
Phys. 12 540
[18] Hung P-C 2006 Color theory and its application to digital still cameras Image Sensors and
Signal Processing for Digital Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/
Taylor & Francis) ch 7
[19] International Organization for Standardization 2012 Graphic Technology and Photography-
Colour Target and Procedures for the Colour Characterisation of Digital Still Cameras
(DCSs) ISO 17321-1:2012
[20] Holm J 2006 Capture color analysis gamuts Proc. 14th Color and Imaging Conference IS&T
(Scottsdale, AZ) (Springfield, VA: IS&T) pp 108–13
[21] International Organization for Standardization 2009 Photography–Electronic Still-Picture
Cameras–Methods for Measuring Opto-electronic Conversion Functions (OECFs), ISO
14524:2009
[22] Hung P-C 2002 Sensitivity metamerism index for digital still camera Color Science and
Imaging Technologies Proc. SPIE 4922


[23] Balasubramanian R 2003 Device characterization Digital Colour Imaging Handbook


ed G Sharma (Boca Raton, FL: CRC Press) ch 5
[24] Mahy M, Van Eycken L and Oosterlinck A 1994 Evaluation of uniform color spaces
developed after the adoption of CIELAB and CIELUV Color Res. Appl. 19 105–21
[25] International Electrotechnical Commission 1999 Multimedia Systems and Equipment–Colour
Measurement and Management–Part 2-1: Colour Management–Default RGB Colour Space–
sRGB, IEC 61966-2-1:1999
[26] Adobe Systems Incorporated 2005 Adobe® RGB (1998) Color Image Encoding Version
2005-05
[27] International Organization for Standardization 2013 Photography and Graphic
Technology–Extended Colour Encodings for Digital Image Storage, Manipulation and
Interchange–Part 2: Reference Output Medium Metric RGB Colour Image Encoding
(ROMM RGB), ISO 22028-2:2013
[28] McCann J 2005 Do humans discount the illuminant? Human Vision and Electronic Imaging
X ed B E Rogowitz, T N Pappas and S J Daly (Bellingham, WA: SPIE) https://doi.org/
10.1117/12.594383
[29] International Organization for Standardization 2012 Photography–Electronic Still Picture
Imaging –Vocabulary ISO 12231:2012
[30] Adobe Systems Incorporated 2012 Adobe Digital Negative (DNG) Specification Version 1.4
[31] Fairchild M D 2013 Color Appearance Models 3rd edn (New York: Wiley)
[32] Lam K M 1985 Metamerism and colour constancy PhD Thesis University of Bradford
[33] Adobe Systems Incorporated 2004 Introducing the Digital Negative Specification:
Information for Manufacturers


Chapter 5
Camera image quality

This chapter discusses the theory and interpretation of fundamental camera image
quality (IQ) metrics. Unlike conventional treatments, this chapter also demon-
strates how camera IQ metrics should be applied and interpreted when comparing
cameras with different pixel counts and when performing cross-format compar-
isons between camera systems based on different sensor formats [1]. Various
photographic techniques for utilizing the full IQ potential of a camera are also
discussed.
Many aspects of IQ are subjective in that different observers will value a given
aspect of IQ differently. For example, image noise deemed unacceptable by one
observer may be judged to be aesthetically pleasing by another. Furthermore,
observers may judge the relative importance of various IQ aspects differently.
Nevertheless, objective IQ metrics are useful when determining the suitability of a
given camera system for a given application. Objective IQ metrics that describe
camera system capability include the following:
• Camera system modulation transfer function (MTF) and lens MTF.
• Camera system resolving power (RP) and lens RP.
• Signal-to-noise ratio (SNR).
• Raw dynamic range (DR).

However, it should be remembered that camera system capability metrics do not


take into account the conditions under which the output photograph is viewed.
Objective IQ metrics that take into account the viewing conditions include the
following:
• Camera system MTF and lens MTF at spatial frequencies relevant to the
viewing conditions.
• Perceived image sharpness.
• Perceivable raw DR.


In order to clarify the distinction between the two different sets of metrics listed
above, consider the example of camera system RP, which is often confused with
perceived image sharpness. Camera system RP is defined as the highest spatial
frequency that the camera and lens combination can resolve. However, this spatial
frequency will not be observed in an output photograph viewed under standard
viewing conditions, which include a standard enlargement and viewing distance. In
this case, contrast reproduction described by the camera system MTF at much lower
spatial frequencies will have a much greater impact upon perceived image sharpness.
A camera with lower camera system RP could produce images that are perceived as
sharper provided it offers superior MTF performance over the relevant range of
spatial frequencies.
Furthermore, IQ metrics can be misleading when comparing different camera
models. A simple example is SNR per photosite or sensor pixel. If the cameras being
compared have sensors with different pixel counts but are otherwise identical, the
sensor with larger photosites would naturally yield a higher SNR per photosite since
SNR increases with light-collecting area. However, a similar gain in SNR could be
achieved for the camera with smaller photosites simply by downsampling the output
digital image so that the pixel counts match. Accordingly, it is more appropriate to
compare SNR at a fixed spatial scale when comparing camera models.
Finally, it is important to appreciate that identical exposure settings will not yield
photographs with the same appearance characteristics when using cameras based on
different sensor formats since the total light received by each camera will not be the
same. Cross-format IQ comparisons should be based on a framework where each
camera receives the same amount of light. This requires equivalent rather than
identical exposure settings on each format. When equivalent exposure settings are
used, real-world IQ differences arise from characteristics of the underlying camera
and lens technologies being compared rather than the total light received. It will
become apparent that a larger format only provides a theoretical IQ advantage
when equivalent exposure settings are unachievable on the smaller format.
This chapter begins by presenting the theory behind cross-format camera IQ
comparisons. Subsequently, observer RP and the circle of confusion (CoC) are
discussed. Since the CoC relates to the viewing conditions, it is important to consider
the CoC when interpreting the camera system IQ metrics discussed in the later
sections. The final section of the chapter discusses IQ in relation to photographic
practice.

5.1 Cross-format comparisons


Since all modern digital cameras produce satisfactory IQ, factors such as ergo-
nomics, controls, operation, autofocus speed and autofocus accuracy are important
considerations when choosing a camera system. Another important consideration is
sensor format. Larger formats offer additional photographic capability such as a
shallower achievable depth of field and longer achievable exposure duration, and
this extra capability potentially offers higher IQ. However, the extra photographic
capability comes at the cost of a higher camera system size and weight.


When performing cross-format camera IQ comparisons, an appropriate framework needs to be established. Recall that the standard exposure strategy discussed in
chapter 2 aims to produce an output JPEG image at a standard mid-tone lightness
when a typical photographic scene is metered using average photometry. Although
the standard exposure strategy is independent of sensor format, it is important to
appreciate that there are various aspects of image appearance that are format
dependent.
For a given object distance s and a typical photographic scene with 18% average
luminance metered using average photometry, an identical combination of focal
length f, f-number N, and ISO setting S used on different formats will lead to the
following:
• The same exposure duration (shutter speed), t.
• The same average exposure 〈H 〉 at the sensor plane (SP).
• The same perspective.
• The same magnification, ∣m∣ = f /(s − f ).
• Middle grey as the mid-tone JPEG lightness.

However, the following aspects of image appearance will be different:


• Angular field of view (AFoV).
• Depth of field (DoF).

The same AFoV could be achieved by using equivalent focal lengths on each format
according to the focal-length multiplier, as described in section 1.3.6 of chapter 1.
However, this will not lead to the same DoF, the reason being that DoF depends
upon the CoC diameter, and this is format dependent for the same image display
dimensions. Another explanation for the different DoF is that the larger sensor
format will collect a greater amount of light when identical exposure settings are
used on each format. It is also instructive to note that this will result in higher total
image noise for the smaller format, assuming the sensors are equally efficient in
terms of quantum efficiency (QE) and signal processing.
However, cross-format IQ comparisons should be based on the same total
amount of light, and this requirement leads to a definition of equivalent photographs.
These are defined as photographs taken on different formats that have the following
same-appearance characteristics:
(1) Perspective.
(2) Framing or AFoV.
(3) Display dimensions.
(4) DoF.
(5) Exposure duration or shutter speed.
(6) Mid-tone JPEG lightness.

Equivalent photographs are produced using equivalent exposure settings rather than
identical exposure settings. Along with the use of equivalent focal lengths to obtain
the same AFoV, equivalent exposure settings require that equivalent f-numbers and
equivalent ISO settings be used on each format. These are defined as follows:


f2 = f1/R,   N2 = N1/R,   S2 = S1/R².   (5.1)

The 1 and 2 subscripts, respectively, denote the larger and smaller formats. The
focal-length multiplier, crop factor or equivalence ratio R is defined as follows:

R = d1/d2.   (5.2)

Here d1 is the length of the larger sensor diagonal and d2 is the length of the smaller
sensor diagonal. For example, if d1 = d then d2 = d/R.
Consider a lens with focal length f1 = 100 mm used on the 35 mm full-frame format. For an example set of exposure settings on this format,
table 5.1 lists the equivalent focal lengths and equivalent exposure settings for a
selection of smaller sensor formats.
Finally, note that precise equivalence can only be obtained if the sensor formats
have the same aspect ratio. Furthermore, it will be shown in sections 5.1.2 and 5.1.3
that the above equivalence formulae are strictly valid only when focus is set at
infinity and that generalised equations are needed at closer focus distances.
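As a simple numerical companion to equations (5.1) and (5.2), the following Python sketch maps a set of 35 mm full-frame settings to a smaller format; the sensor dimensions used to form the diagonals are nominal values assumed here for illustration.

# Minimal sketch of the infinity-focus equivalence relations (5.1) and (5.2).
# The sensor dimensions below are nominal values assumed for illustration.
import math

def equivalent_settings(f1, N1, S1, d1, d2):
    """Map focal length, f-number and ISO from format 1 to format 2."""
    R = d1 / d2                      # equivalence ratio (crop factor), eq. (5.2)
    return f1 / R, N1 / R, S1 / R**2 # eq. (5.1)

d_ff  = math.hypot(36.0, 24.0)       # 35 mm full-frame diagonal, ~43.3 mm
d_mft = math.hypot(17.3, 13.0)       # Micro Four Thirds diagonal, ~21.6 mm (nominal)

f2, N2, S2 = equivalent_settings(100.0, 4.0, 400.0, d_ff, d_mft)
print(round(f2, 1), round(N2, 2), round(S2))   # ~50.0 mm, ~2.00, ~100

The printed values reproduce the Micro Four Thirds row of table 5.1.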

5.1.1 Equivalence and image quality


The appearance characteristics of equivalent photographs listed above are inde-
pendent of the underlying camera and lens technologies. Equivalent photographs
have the following properties due to the fact that they are produced using the same
amount of light:
1. Image noise of the same order of magnitude.
2. The same cut-off frequency due to diffraction, and therefore the same
theoretical resolution limit.

Table 5.1. Example equivalent exposure settings when focus is set at infinity. Focal lengths and f-numbers
have been rounded to one and two decimal places, respectively.

Format                R      f (mm)   N      S

35 mm full frame      1      100      4      400
APS-C                 1.53   65.4     2.62   171
APS-C (Canon)         1.61   62.0     2.48   154
Micro Four Thirds     2.00   50.0     2.00   100
1 inch                2.73   36.7     1.47   54
2/3 inch              3.93   25.4     1.02   26
1/1.7 inch            4.55   22.0     0.88   19
1/2.5 inch            6.03   16.6     0.66   11


It follows that a smaller format will in principle provide the same IQ as a larger
format when equivalent exposure settings are used. In this case:
• Real-world noise differences will arise from contrasting sensor technologies,
which are mainly characterised by the QE and read noise.
• Real-world resolution differences will arise from factors such as lens aberra-
tion content and sensor pixel count.
• Other real-world factors that affect IQ include the default JPEG tone curve
and image processing.

In other words, cross-format IQ comparisons should be performed using equivalent exposure settings when available. In this case, IQ of the same order of magnitude
will be obtained from different formats. Real-world IQ differences will be due to
technology differences rather than sensor size.
However, it is important to appreciate that a larger format offers additional
photographic capability when equivalent exposure settings do not exist on a smaller
format. When the extra photographic capability of a larger format is utilised
through appropriate photographic practice, the resulting photograph will be
produced using a greater amount of light than that achievable on a smaller format.
This offers theoretical IQ advantages in terms of noise and resolution. The situations
where the extra photographic capability can be utilised generally fall into two
categories [1]:
1. The entrance pupil (EP) diameter is set larger than the value achievable using
the smaller format.
2. The exposure duration is set longer than the value achievable using the
smaller format.

As an example of the first type, consider an action photography scenario where the
lens on the larger format is set at a low f-number to isolate the subject from the
background and provide an exposure duration short enough to freeze the appear-
ance of the moving subject. If an equivalent photograph is attempted using the
smaller format but an equivalent f-number does not exist, the smaller format will
produce a photograph with a deeper DoF and will be forced to underexpose in order
to match the exposure duration used on the larger format. In this case the extra
exposure utilised by the larger format will in principle lead to a higher overall SNR.
Furthermore, the cut-off frequency due to diffraction will be higher on the larger
format since the lens EP diameter will be larger. This potentially offers greater image
resolution.
As an example of the second type, consider a landscape photography scenario
where the larger format camera is set at the base ISO setting. If an equivalent
photograph is attempted using the smaller format but an equivalent ISO setting does
not exist, then the smaller format will be unable to produce a photograph with a
sufficiently long exposure duration without overexposing, in which case the mid-
tone lightness of the output JPEG image will be incorrect. In other words, the
smaller format will be unable to provide sufficient long-exposure motion blur. In this


case the extra exposure utilised by the larger format will again in principle lead to a
higher overall SNR.

5.1.2 Generalised equivalence theory


A formal proof of equivalence theory has been given in references [1, 2]. The proof is
summarised in section 5.1.3.
The proof reveals that the conventional equivalence formulae defined by
equations (5.1) and (5.2) are in fact strictly valid only when focus is set at infinity.
When focus is set on an OP positioned closer than infinity, focal lengths and
f-numbers should in principle be replaced by their working values so that equation
(5.1) becomes
fw,2 = fw,1/R,   Nw,2 = Nw,1/R,   S2 = S1/R².   (5.3)
The working f-numbers are defined according to equation (1.55) of chapter 1,
Nw,1 = b1N1 and Nw,2 = b2N2.
Analogously, the working focal lengths are defined as
fw,1 = b1 f1 and fw,2 = b2 f2 .

Here b1 and b2 are the bellows factors for the larger and smaller formats,
respectively. These are defined by equations (1.31) and (1.19) of chapter 1,
b1 = 1 + f1/(mp(s1 − f1))   and   b2 = 1 + f2/(mp(s2 − f2)).

Significantly, b1 ≠ b2 whenever focus is set closer than infinity since f1 ≠ f2. The
bellows factors are equal only in the special case that focus is set at infinity. In this
limit, b1 = b2 = 1 and so equation (5.3) reduces to equation (5.1).
However, a major problem with the practical application of equation (5.3) arises
from the fact that working f-numbers and working focal lengths depend on the OP
distance, i.e. the distance to the plane upon which focus is set. Unless focus is set at
infinity, the working values are not equal to the values actually marked on the lenses
of the camera formats being compared.
References [1, 2] solve this problem by reformulating equation (5.3) in the
following way:

f2 = f1/Rw,   N2 = N1/Rw,   S2 = S1/R².   (5.4)

Here Rw is a new quantity known as the working equivalence ratio. This quantity
replaces the conventional equivalence ratio or ‘crop factor’, R. It is defined as
follows:


Rw = (b2/b1) R.   (5.5)

Significantly, Rw is a function of the OP distance. By using Rw in place of R when focus is set closer than infinity, the focal lengths and f-numbers marked on the lens
can be used instead of their working values when determining equivalent exposure
settings. Note that the replacement is not required when relating equivalent ISO
settings, S. Practical expressions for Rw are derived in section 5.1.3. In summary,
• If the focal length of the larger format is known, then
Rw = (1 − mc,1/pc,1) R.

The correction terms are defined as follows:

mc,1 = ((R − 1)/R)(f1/s1)
pc,1 = mp + (1 − mp)(f1/s1).
• If the focal length of the smaller format is known, then

Rw = R / (1 + (mc,2/pc,2) R).

The correction terms are defined as follows:

mc,2 = ((R − 1)/R)(f2/s2)
pc,2 = mp + (1 − mp)(f2/s2).
• Precise equivalence can only be achieved if the pupil magnification m p is the
same on each format.
• s1 and s2 are the OP distances measured from the first principal planes of the
larger and smaller formats, respectively. Since perspective is defined by the
EP to OP distance rather than the H to OP distance, s1 and s2 are not exactly
the same in general. However, s1 = s2 = s in the special case that m p = 1.

The working equivalence ratio Rw is numerically identical to R when focus is set at infinity, but the numerical difference between Rw and R increases as the OP distance
is reduced from infinity. As illustrated in figure 5.1, the value of Rw is significantly
reduced compared to R at macro OP distances where the magnification is large.


The example equivalent exposure settings for a 100 mm lens on the 35 mm full-
frame format listed in table 5.1 are repeated on the left-hand side (LHS) of table 5.2.
The right-hand side (RHS) of table 5.2 shows the required equivalent exposure
settings at 1:1 magnification (∣m∣ = 1), and m p = 1. From the formula
∣m∣ = f /(s − f ), this corresponds to an OP distance s = 2f1 = 200 mm.
The results shown in the RHS of table 5.2, which were calculated using equations
(5.4) and (5.5), are seen to be very different compared with those on the LHS, which
were obtained using the conventional equivalence equations (5.1) and (5.2) with
focus set at infinity. Much larger equivalent focal lengths and equivalent f-numbers
are required on the smaller formats than would be expected using the conventional equivalence formulae, which involve R rather than Rw. Although the pupil magnification mp has been set to unity in the above example, a non-unity pupil magnification will also have a large effect in the macro regime and this should be included in real-world calculations.

Figure 5.1. Rw/R plotted as a function of OP distance for a selection of sensor formats when format 1 is 35 mm full frame. The OP distance has been expressed in units of the focal length used on format 1.

Table 5.2. The LHS shows example equivalent exposure settings (f, N, S) calculated using equations (5.1) and (5.2) with focus set at infinity. The RHS uses equations (5.4) and (5.5) to show how the required settings change when focus is set at an OP distance s = 200 mm, which corresponds to the macro regime (∣m∣ = 1 and mp = 1). Since focus is set closer than infinity, these settings are related via Rw rather than R. Focal lengths and f-numbers have been rounded to one and two decimal places, respectively.

Format                R      f (mm)   N      S      Rw     f (mm)   N      S

35 mm full frame      1      100      4      400    1      100      4      400
APS-C                 1.53   65.4     2.62   171    1.26   79.1     3.16   171
APS-C (Canon)         1.61   62.0     2.48   154    1.31   76.5     3.06   154
Micro Four Thirds     2.00   50.0     2.00   100    1.50   66.7     2.67   100
1 inch                2.73   36.7     1.47   54     1.86   53.7     2.15   54
2/3 inch              3.93   25.4     1.02   26     2.47   40.5     1.62   26
1/1.7 inch            4.55   22.0     0.88   19     2.78   36.0     1.44   19
1/2.5 inch            6.03   16.6     0.66   11     3.51   28.5     1.14   11
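A minimal Python sketch of the calculation behind the RHS of table 5.2 is given below, assuming mp = 1 and using the practical expression for Rw in terms of f1 and s1 quoted in the bullet list above.

# Minimal sketch of the working equivalence ratio, equations (5.4) and (5.5),
# using the practical expression for Rw in terms of f1 and s1 (mp = 1 assumed).
def working_ratio(R, f1, s1, mp=1.0):
    m_c1 = ((R - 1.0) / R) * (f1 / s1)      # magnification correction term
    p_c1 = mp + (1.0 - mp) * (f1 / s1)      # pupil-magnification correction term
    return (1.0 - m_c1 / p_c1) * R

# 100 mm lens on full frame at 1:1 magnification (s1 = 2*f1 = 200 mm),
# compared against Micro Four Thirds (R = 2).
R, f1, N1, S1, s1 = 2.0, 100.0, 4.0, 400.0, 200.0
Rw = working_ratio(R, f1, s1)
print(Rw)                            # 1.5 rather than 2.0
print(f1 / Rw, N1 / Rw, S1 / R**2)   # ~66.7 mm, ~2.67, 100 (cf. table 5.2)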
Although the working equivalence ratio Rw is important for macro photography,
in general photographic situations where the magnification is smaller, Rw is well-
approximated by R, as evident from figure 5.1, in which case the conventional
equivalence equations (5.1) and (5.2) can be used. Indeed, for cross-format IQ
comparisons, it can be assumed that focus is set at infinity.
Several other equations of interest arise from the proof of equivalence. These
equations are valid when focus is set at any chosen OP distance, including infinity.
D2 = D1
Q2 = Q1
〈H2〉 = R²〈H1〉
c2 = c1/R
∣m2∣ = ∣m1∣/R.
Here D is the EP diameter, Q is the total luminous energy at the SP, 〈H 〉 is the
average photometric exposure at the SP, c is the CoC diameter, and ∣m∣ is the
absolute value of the magnification.

5.1.3 Proof of equivalence theory


As described in the previous section, the formal proof of equivalence theory given in
references [1, 2] reveals that a new quantity denoted Rw known as the working
equivalence ratio needs to be introduced in place of the conventional equivalence
ratio R. Unlike R, the working equivalence ratio Rw is a function of the OP distance,
and Rw = R only when focus is set at infinity. Consequently, the proof is valid for
compound photographic lenses with focus set at any chosen OP distance and with
any chosen pupil magnification. The proof given below is divided into seven main
sections:
• (1) The same perspective: This section explains the condition for producing
an image with the same perspective from different formats, specifically the
requirement that the OP distance be the same when measured from the EP of
each format.
• (2) The same framing: This section derives the condition for producing an
image with the same framing or AFoV from different formats, specifically a
formula relating the equivalent focal lengths required. It is shown that when
focus is set closer than infinity, Rw formally replaces R.
• Working equivalence ratio: This section derives practical expressions for Rw .
• (3) The same display dimensions: This section derives the relationship
between the CoC diameters that arises when equivalent photographs are
viewed at the same display dimensions.


• (4) The same DoF: This section derives the condition for producing images
with the same DoF (at the same perspective and framing) from different
formats, specifically a formula relating the equivalent f-numbers required. It
is shown that the equivalent f-numbers are related by Rw rather than R when
focus is set closer than infinity. It is also proven that the same EP diameter is
required on each format.
• (5) The same shutter speed: This section discusses the requirement that
equivalent photographs must be produced using the same exposure duration
or shutter speed.
• (6) The same mid-tone JPEG lightness: This section derives the condition for
producing output images from different formats with the same mid-tone
lightness, specifically a formula relating the equivalent ISO settings required
on each format.

(1) The same perspective


For cameras based on different formats focused on a specified OP, the same
perspective requires the same OP distance measured from the lens EP of
each camera [3].
The same perspective requires the same OP distance s. Recall that s is
measured from the first principal plane of a compound lens, which will
generally be located away from the aperture stop. The first and second
principal planes will both coincide with the aperture stop only for the special
case of a thin lens with the aperture stop at the lens.
For generality, consider a compound photographic lens with any valid
pupil magnification m p . The pupil magnification is defined by equation
(1.22) of chapter 1,
mp = DXP/D.
Here D XP and D are the diameters of the exit pupil (XP) and EP,
respectively. When m p differs from unity, the pupils will be located away
from the principal planes of the compound lens. Precise equivalence between
different formats is possible with focus set at any chosen OP distance only if
the lens designs have the same symmetry and therefore the same pupil
magnification. This scenario is illustrated in figure 5.2.
Let s1 denote the distance from the first principal plane H to the OP for
format 1, and let sEP,1 denote the distance from H to the EP of format 1.
Analogously, let s2 denote the distance from H to the OP for format 2, and
let sEP,2 denote the distance from H to the EP of format 2. Since the total
distance from the EP to the OP must be the same for both cameras in order
that the perspective (and framing) be the same, the following condition that
will be utilised later in the proof must hold:
s1 − s EP,1 = s2 − s EP,2 . (5.6)


Figure 5.2. For a general pupil magnification, the vector distance s1 − sEP,1 for the larger format (upper figure) and the vector distance s2 − sEP,2 for the smaller format (lower figure) must be equal in order to achieve the same perspective and framing. In the diagram, OP is the object plane, H and H′ are the first and second principal planes of the compound lens, EP is the entrance pupil, XP is the exit pupil, SP is the sensor plane, and f′ is the rear effective focal length. The principal planes and pupils are not required to be in the order shown.

(2) The same framing


Recall the general expression for the AFoV defined by equation (1.30) of
chapter 1,
α = 2 tan⁻¹(d/(2bf)).
As illustrated in section 1.3.1 of chapter 1 along with figure 5.2, the apex
of the AFoV is situated at the lens EP. Here f is the front (anterior)
effective focal length and d is the length of the sensor measured in
either the horizontal, vertical or diagonal direction, yielding the corre-
sponding AFoV α in that direction. Typically the diagonal AFoV is
chosen. The quantity b is the bellows factor defined by equation (1.31) of
chapter 1,
b = 1 + ∣m∣/mp.


The bellows factor depends upon both m p and the magnification ∣m∣ defined
by equation (1.19) of chapter 1,
∣m∣ = f/(s − f).
Recall from chapter 1 that when focus is set at infinity, s → ∞ and so ∣m∣ → 0
and b → 1. At closer focus distances (i.e. as the OP upon which focus is set is
brought forward from infinity) and assuming the use of a traditional-focusing lens, the value of b gradually increases from unity. For a fixed focal length, the AFoV therefore becomes smaller. Consequently, the object appears to be larger than expected, particularly at close-focusing distances. Different behaviour may occur for internally focusing lenses; these change their focal length at closer focus distances and so the ‘new’ focal length must
be used in the bellows factor and AFoV formulae. In all cases, s is the OP
distance measured from the first principal plane after focus has been set.
Consider format 1 with a sensor diagonal d and lens with front effective
focal length f1 focused at an OP distance s1 measured from the first principal
plane. The AFoV and bellows factor are
α1 = 2 tan⁻¹(d/(2 b1 f1)),   b1 = 1 + ∣m1∣/mp.
Now consider format 2 with a smaller sensor diagonal d/R, where R is the
equivalence ratio defined by equation (5.2),

R = d1/d2.
Assume the lens has front effective focal length f2 and is focused on the same
OP positioned a distance s2 from the first principal plane. In this case, the
AFoV and bellows factor are defined as follows:
α2 = 2 tan⁻¹(d/(2R b2 f2)),   b2 = 1 + ∣m2∣/mp.   (5.7)

The requirement that the two systems have the same AFoV demands that
α1 = α2 and therefore
b1 f1 = R b2 f2 .
This can be rewritten as follows:
fw,2 = fw,1/R,   (5.8)

where fw,1 = b1 f1 and fw,2 = b2 f2 are the equivalent working focal lengths.
As discussed in section 5.1.2, practical application of equivalence theory
requires use of the actual focal lengths marked on the lenses of the camera


formats being compared rather than the working values. The way forward is
to rearrange equation (5.8) in the following manner:
f2 = f1/Rw.   (5.9)
Here f1 and f2 are the equivalent focal lengths as marked on the lenses, and
Rw is the ‘working’ equivalence ratio defined by equation (5.5) as previously
introduced in section 5.1.2,
Rw = (b2/b1) R.

Note that b1 ≠ b2 when focus is set on an OP positioned closer than infinity since f1 ≠ f2. When focus is set at infinity, b1 → 1 and b2 → 1, and so Rw → R.
Practical expressions for Rw are derived below.

Working equivalence ratio


In order to obtain the same AFoV and perspective on each format, the
working equivalence ratio Rw provides a correction to R when focus is set
on an OP positioned closer than infinity. The correction vanishes when
R = 1 or when s1, s2 → ∞, in which case Rw reduces to R.
In order to derive practical expressions for Rw , first recall that the
distances sEP,1 and sEP,2 between the first principal plane and EP of both
systems are defined by equation (1.24) of chapter 1:
sEP,1 = (1 − 1/mp) f1,   sEP,2 = (1 − 1/mp) f2.   (5.10)
Utilizing these expressions along with the magnification for both systems
yields the following formulae for the bellows factors:
b1 = (s1 − sEP,1)/(s1 − f1),   b2 = (s2 − sEP,2)/(s2 − f2).
Since equivalent photographs must be taken with the same perspective,
equation (5.6) can be utilised, s1 − sEP,1 = s2 − sEP,2 . Substituting into
equation (5.5) yields a more useful expression for Rw :
Rw = ((s1 − f1)/(s2 − f2)) R.   (5.11)
Since ∣m∣ = f /(s − f ), this expression reveals that the system magnifications
are always related via R when focus is set at any OP distance,
∣m2∣ = ∣m1∣/R.
Equation (5.11) can be used to obtain practical expressions for Rw . If f1 is
known, which is the marked focal length used on the larger format, then


s2 and f2 need to be eliminated from equation (5.11). Algebraic manipulation leads to the following result:

Rw = (1 − mc,1/pc,1) R.

The correction mc,1 arises due to the differing system magnifications, and the
correction pc,1 arises for a non-unity pupil magnification. These corrections
are defined by the following expressions:
mc,1 = ((R − 1)/R)(f1/s1)
pc,1 = mp + (1 − mp)(f1/s1).

If f2 is known instead, which is the marked focal length used on the smaller
format, then s1 and f1 need to be eliminated from equation (5.11). Algebraic
manipulation leads to the following result:

Rw = R / (1 + (mc,2/pc,2) R).

Again the correction mc,2 arises due to the differing system magnifications,
and the correction pc,2 arises for a non-unity pupil magnification. These are
defined by the following expressions:
mc,2 = ((R − 1)/R)(f2/s2)
pc,2 = mp + (1 − mp)(f2/s2).

For the special case of a symmetric lens design with m p = 1, the separation
terms defined by equation (5.10) vanish and the terms pc,1 and pc,2 are both
unity. In this case, the OP distances measured from the first principal plane
will be identical for each format when equivalent photos are taken, and so
s1 = s2 = s.
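The closed-form expressions above can be checked numerically. The following sketch (not part of the formal proof) solves equation (5.11) as a fixed point, using equations (5.6) and (5.10) to locate the OP for format 2, and compares the result with the closed-form expression; the focal length, OP distance and pupil magnification are illustrative values.

# Consistency check (a sketch, not from the text): solve equation (5.11)
# numerically as a fixed point, Rw = ((s1 - f1)/(s2 - f2)) R with f2 = f1/Rw
# and s2 fixed by the equal-perspective condition s1 - sEP,1 = s2 - sEP,2,
# then compare with the closed-form practical expression for Rw.
def rw_closed_form(R, f1, s1, mp):
    m_c1 = ((R - 1.0) / R) * (f1 / s1)
    p_c1 = mp + (1.0 - mp) * (f1 / s1)
    return (1.0 - m_c1 / p_c1) * R

def rw_fixed_point(R, f1, s1, mp, iterations=100):
    Rw = R                                    # initial guess (infinity focus)
    for _ in range(iterations):
        f2 = f1 / Rw
        s_ep1 = (1.0 - 1.0 / mp) * f1         # equation (5.10)
        s_ep2 = (1.0 - 1.0 / mp) * f2
        s2 = s1 - s_ep1 + s_ep2               # equation (5.6)
        Rw = ((s1 - f1) / (s2 - f2)) * R      # equation (5.11)
    return Rw

R, f1, s1, mp = 1.53, 100.0, 500.0, 0.8       # illustrative close-focus values
print(rw_closed_form(R, f1, s1, mp), rw_fixed_point(R, f1, s1, mp))

Both routes give the same value (approximately 1.40 for these inputs), as required.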

(3) The same display dimensions


Equivalent photographs must be viewed at the same display dimensions and
from the same viewing distance. In this case, it is evident from equation
(1.33) of chapter 1 that the corresponding CoC diameters for the formats
being compared are inversely proportional to the enlargement factors from
the sensor dimensions to the output photograph dimensions,


c1 ∝ 1/X1,   c2 ∝ 1/X2.
This is discussed in much greater detail in sections 5.2.3 and 5.2.4.
Since X2 = RX1 where R = d1/d2 is the equivalence ratio between the
sensor diagonals, it follows that the relationship between the CoC diameters
for the respective formats being compared is given by
c2 = c1/R.   (5.12)
This important result will be utilized below.
(4) The same depth of field
The total DoF can be expressed using equation (1.40) of chapter 1,
total DoF = 2h(s − f)(s − sEP) / (h² − (s − f)²),

where

h = Df/c.
Here D is the EP diameter and c is the CoC diameter. This equation is
strictly valid only for s ⩽ H where H = h + f − sEP is the hyperfocal distance
measured from the EP. When s = H, the rear DoF and therefore the total
DoF both extend to infinity.
Consider camera system 1 with sensor diagonal d, focal length f1, EP
diameter D1, CoC diameter c1, and consider an OP positioned at a distance
s1 from the first principal plane. The total DoF is given by
DoF1 = 2h1(s1 − f1)(s1 − sEP,1) / (h1² − (s1 − f1)²),

where

h1 = D1 f1/c1.
Also consider camera system 2 with a smaller sensor diagonal d/R, focal
length f2, EP diameter D2, CoC diameter c2, and consider the same OP
positioned at a distance s2 from the first principal plane. The total DoF is
given by
DoF2 = 2h2(s2 − f2)(s2 − sEP,2) / (h2² − (s2 − f2)²),   (5.13)

where

h2 = D2 f2/c2.   (5.14)


Since the same perspective and framing are required at an arbitrary focus
distance, the relationship between f2 and f1 must satisfy equation (5.9),
f2 = f1/Rw,
where Rw is the working equivalence ratio. Furthermore, the fact that
equivalent photographs must have the same display dimensions requires
that the CoC diameters be related by equation (5.12),
c2 = c1/R.
Substituting the above expressions for c2 and f2 into equation (5.14) yields
h2 = R D2 f1 / (Rw c1).
Finally, substituting f2 and h2 into equation (5.13) and working through the
algebra leads to the following result:
DoF2 = DoF1
provided the following condition is satisfied:
D2 = D1 . (5.15)

In other words, a necessary condition for producing equivalent photographs is the use of the same EP diameter on each format. Since N = f /D, it follows
from equation (5.9) that equivalent f-numbers are related according to the
following equation:
N2 = N1/Rw.   (5.16)
Here N1 and N2 are the f-numbers marked on the lenses of the camera formats
being compared rather than the working values. When focus is set on an OP
positioned closer than infinity, the above result reveals that the equivalence
ratio R must formally be replaced by the working equivalence ratio Rw when
relating equivalent f-numbers as well as equivalent focal lengths.
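The condition D2 = D1 can also be verified numerically. The sketch below evaluates the total DoF for both formats using the expressions above, with mp = 1 assumed so that sEP = 0 and s2 = s1; all input values are illustrative.

# Sketch of the DoF equality check: with the same EP diameter, equivalent focal
# length (via Rw) and CoC scaled by 1/R, both formats give the same total DoF.
# Pupil magnification mp = 1 is assumed, so sEP = 0 and s2 = s1.
def total_dof(D, f, c, s, s_ep=0.0):
    h = D * f / c
    return 2.0 * h * (s - f) * (s - s_ep) / (h**2 - (s - f)**2)

R, f1, N1, c1, s = 2.0, 100.0, 4.0, 0.030, 3000.0     # illustrative values (mm)
D1 = f1 / N1                                          # EP diameter, 25 mm

Rw = (1.0 - ((R - 1.0) / R) * (f1 / s)) * R           # working ratio for mp = 1
f2, c2, D2 = f1 / Rw, c1 / R, D1                      # equivalent smaller format

print(total_dof(D1, f1, c1, s), total_dof(D2, f2, c2, s))   # ~209 mm each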
(5) The same shutter speed
Equivalent photos taken by cameras based on different formats must be
produced in the presence of the same amount of subject motion blur. This is
defined as blur that occurs due to objects moving in the scene during the
exposure. Since equivalent photographs have the same perspective and
framing, it follows that equivalent photographs must be taken using the
same exposure duration or shutter speed,
t1 = t2 .


This requirement does not specify an appropriate shutter speed, but merely
states that it must be the same for each camera format. The required shutter
speed depends upon the nature of the scene luminance distribution and the
exposure strategy.
(6) The same mid-tone JPEG lightness
Recall from section 2.6.2 of chapter 2 that the ISO setting determines the
sensitivity of the JPEG output to incident photometric exposure, specif-
ically the mid-tone value with digital output level (DOL) = 118, which
corresponds to middle grey. Since different formats receive different
levels of photometric exposure when equivalent photographs are taken,
the resulting mid-tone lightness of the digital images will not be the same
unless equivalent ISO settings are used rather than the same ISO settings.

In order to derive the ISO equivalence relationship, first recall from equation
(5.15) that the EP diameters on each format are the same when equivalent
photographs are taken. Since the shutter speeds are also required to be the same,
it follows that equivalent photographs are produced using the same total amount of
light. More specifically, the total luminous energy Q incident at the SP of both
formats will be the same:

Q1 = Q2 . (5.17)

In order to rigorously derive this result within Gaussian optics, consider the
photometric exposure at an infinitesimal area element on the SP. This is defined
by the camera equation derived in section 1.5 of chapter 1:
H = (π/4) L (t/Nw²) T cos⁴φ.   (5.18)

Here L is the scene luminance at the corresponding scene area element. The cosine
fourth factor describes the natural reduction in illuminance at the SP that occurs
away from the optical axis (OA), and it depends upon the angle subtended by the EP
from the scene area element being considered. The factor T is the lens transmittance
factor, and t is the shutter speed. Recall that the working f-number Nw depends on
the OP distance and can be expressed in the form Nw = bN, where b is the bellows
factor. When focus is set at infinity, b → 1 and the working f-number reduces to the
f-number, Nw → N.
Now consider a larger format labelled format 1 and a smaller format labelled
format 2. From equation (5.18), the photometric exposure H1 at an infinitesimal area
element dA1 on the larger format sensor with focus set on a specified OP is given by
H1 = (π/4) L (t/Nw,1²) T cos⁴φ.   (5.19)


Analogously, the exposure H2 at an infinitesimal area element dA2 on the smaller format sensor with focus set on the same OP is given by

H2 = (π/4) L (t/Nw,2²) T cos⁴φ.   (5.20)
The working f-numbers for these systems are defined by
Nw,1 = b1N1, Nw,2 = b2N2.
It is instructive to note that substituting these into equation (5.16) and then utilising
equation (5.5) yields the following result:
Nw,2 = Nw,1/R.   (5.21)
Unlike the marked f-numbers, the working f-numbers are seen to be directly related
through the equivalence ratio R when equivalent photographs are taken with focus
set at any chosen OP distance.
As illustrated in figure 5.2, the luminance and cosine fourth terms appearing in
equations (5.19) and (5.20) will be the same for both formats at the same perspective
and framing or AFoV. It must also be assumed that the lens transmittance factors
are the same. As described in section 2.5.2 of chapter 2, the ISO 12232 standard
assumes a standard value T = 0.9 when ISO settings are measured, along with
φ = 10° and an infinite object distance. Since the shutter speeds are also the same,
combining equations (5.19), (5.20) and (5.21) yields the following relationship:
H2 = R²H1.   (5.22)
The exposure at an infinitesimal area element on the smaller sensor is therefore a
factor R2 greater than the exposure at an infinitesimal area element on the larger
sensor when equivalent photographs are taken. The total luminous energy Q1 and
Q2 incident at the SP of the larger format and smaller format, respectively, during
the exposure are given by integrating the photometric exposure over the corre-
sponding sensor areas:

Q1 = ∫ H1 dA1, Q2 = ∫ H2 dA2 . (5.23)

As illustrated in figure 5.3, the area A2 of the smaller format sensor is a factor R2
smaller than A1:
A2 = A1/R²,   dA2 = dA1/R².   (5.24)
Substituting equations (5.22) and (5.24) into (5.23) proves that the total luminous
energies incident at the SP of both camera formats are equal when equivalent
photographs are taken, which is consistent with equation (5.17).
Now consider the arithmetic average photometric exposures 〈H1〉 and 〈H2〉 for
both formats. These are defined as
〈H1〉 = Q1/ A1, 〈H2〉 = Q2 / A2 .


Figure 5.3. The area of sensor format 1 is dx dy and the area of sensor format 2 is (dx/R)(dy/R). Format 2 is therefore a factor R² smaller than format 1.

Utilising equations (5.17) and (5.24) yields

〈H2〉 = R²〈H1〉.   (5.25)

As described in chapter 2, the product of the arithmetic average exposure with the
exposure index specified by the ISO setting S defines a photographic constant P that
is independent of sensor format. The ISO 12232 standard [4] uses P = 10, which
indirectly implies an average scene luminance of approximately 18% for a typical
photographic scene. This means that
〈H1〉S1 = 〈H2〉S2 = P. (5.26)

Now combining equations (5.25) and (5.26) yields the required ISO equivalence
relationship:

S2 = S1/R².   (5.27)

The required ISO setting on the smaller format is therefore a factor R2 lower than
the required ISO setting on the larger format when equivalent photographs are
taken. Equation (5.27) holds when focus is set at any chosen OP distance provided
equivalent f-numbers and focal lengths are being used.
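The chain of results (5.21), (5.22), (5.17) and (5.27) can be illustrated with a short numerical sketch. A uniform scene patch is assumed so that the integrals in equation (5.23) reduce to products of exposure and sensor area; the luminance, shutter speed and other inputs are arbitrary illustrative values.

# Sketch verifying the exposure relations (5.22), (5.17) and (5.27) numerically.
# A uniform scene patch is assumed so the exposure integral reduces to H * area.
import math

def exposure(L, t, N_w, T=0.9, phi_deg=10.0):
    """Camera equation (5.18): photometric exposure at the sensor plane."""
    return (math.pi / 4.0) * L * t * T * math.cos(math.radians(phi_deg))**4 / N_w**2

R, L, t = 2.0, 1000.0, 1.0 / 125.0
Nw1 = 4.0
Nw2 = Nw1 / R                       # working f-numbers related by R, eq. (5.21)

H1, H2 = exposure(L, t, Nw1), exposure(L, t, Nw2)
print(H2 / H1)                      # R**2 = 4, eq. (5.22)

A1 = 36.0 * 24.0                    # full-frame sensor area (mm**2)
A2 = A1 / R**2                      # eq. (5.24)
print(H1 * A1, H2 * A2)             # equal total luminous energy, eq. (5.17)

S1 = 400.0
print(S1 / R**2)                    # equivalent ISO setting on the smaller format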
As described in sections 2.2.3 and 2.6.2 of chapter 2, Japanese camera manu-
facturers are required to use the standard output sensitivity (SOS) method to
determine camera ISO settings [4, 5]. The SOS method is based on a measurement
of the photometric exposure required to map 18% relative luminance to DOL = 118
in an output 8-bit JPEG file encoded using the sRGB colour space. This DOL
corresponds with middle grey (50% lightness) on the standard encoding gamma
curve of the sRGB colour space, and so 18% relative luminance will always map to
50% lightness in the output JPEG file, irrespective of the shape of the JPEG tone
curve used by the in-camera image-processing engine.
In section 2.5 of chapter 2 it was argued that the value of the photographic
constant P = 10 corresponds to assuming that the average scene luminance will be
approximately 18% of the maximum for a typical photographic scene metered using
average photometry. This means that the average scene luminance will map to


middle grey in the output JPEG file provided SOS is used to define the ISO setting.
When equivalent photographs are taken using different formats, the average
photometric exposures at the SP of the formats are related via equation (5.25).
Consequently, the mid-tone lightness of equivalent photographs will correspond to
the standard value of middle grey provided equivalent ISO settings are used on the
respective formats according to equation (5.27).

5.2 Perceived resolution


Camera system RP is an objective metric that places an upper bound on the level of
scene detail that can be captured by the camera system. As discussed in section 5.5,
camera system RP depends upon the cut-off frequency of the camera system MTF.
Given a photographic scene with a high level of detail, consider camera systems A
and B, where system B has a greater camera system RP than system A. If an
observer moves closer to a print obtained from system A or if the print is enlarged, a
limiting situation will eventually be reached where the observer is unable to resolve
any further detail in the print. However, the observer may continue to resolve detail
in the print from system B upon moving closer or upon further enlargement. In other
words, the main advantage of a higher camera system RP is that the output
photograph can be viewed under stricter conditions before perceived quality begins
to deteriorate.
In other words, camera system RP only serves as an upper bound on the potential
detail that can be resolved by a human observer when viewing the output photo-
graph. Perceived resolution is a measure of the actual detail that could be resolved by
an observer, and this depends upon factors such as those listed in section 1.4.1 of
chapter 1:
• Viewing distance.
• Observer RP. This is the RP of the HVS at the specified viewing distance.
• Enlargement factor from the sensor dimensions to the dimensions of the
viewed output photograph. Note that for cross-format comparisons, photo-
graphs must be viewed at the same display dimensions and so the enlargement
factor is format dependent.

It is important to take perceived resolution into account when evaluating camera system IQ. For example, lens MTF curves are typically plotted at spatial frequencies
much lower than the camera system cut-off frequency since the lower spatial
frequencies are much more relevant to typical viewing conditions.

5.2.1 Observer resolving power


The retina of the human eye has a granular structure. Various criteria exist for
measuring the detail resolved by the eye. Examples include point discrimination,
point separation, isolated line discrimination, and line pattern separation [6].
In photography, the ability of the eye to resolve detail is defined using a pattern of
line pairs. Each line pair consists of a vertical black stripe and vertical white stripe of


equal width. As the width of the lines decreases, the stripes eventually become
indistinguishable from a grey block.
• The least resolvable separation (LRS) in mm per line pair is the minimum
distance between the centres of neighbouring white stripes or neighbouring
black stripes when the pattern can still just be resolved by the eye.
• Observer resolving power (observer RP) is the reciprocal of the LRS [6] and is
measured in line pairs per mm (lp/mm),
RP = 1/LRS.
The term ‘high resolution’ should be interpreted as referring to a high RP and a
small LRS.
Observer RP depends upon the viewing distance. The distance at which observer
RP is considered to be at its optimum is known as the least distance of distinct vision,
Dv , which is generally taken to be 250 mm or 10 inches [7, 8]. However, observer RP
also depends upon the visual acuity of the individual and the ambient conditions. At
Dv , camera and lens manufacturers typically assume a value of around 5 lp/mm
when defining DoF scales [8].
Note that an observer RP of 5 lp/mm corresponds to 127 lp/inch, or equivalently
254 ppi (pixels per inch) for a digital image. This is the reason that 300 ppi is
considered sufficient image display resolution for a high quality print viewed under
the standard viewing conditions described in the following section.

5.2.2 Standard viewing conditions


Camera and lens manufacturers typically assume the following set of standard
viewing conditions [8]:
• Viewing distance: L = Dv , where Dv = 250 mm is the least distance of distinct
vision, as described in the previous section.
• Observer RP at Dv : RP(Dv ) = 5 lp/mm when viewing a pattern of alternating
black and white lines, as described in the previous section.
• Enlargement factor X from the sensor dimensions to the dimensions of the
viewed output photograph: X = 8 for the 35 mm full frame format. For other
formats, X scales in direct proportion with the equivalence ratio R.

The standard value for the enlargement factor is based upon the 60° cone of vision that defines the limits of near peripheral vision. At the least distance of distinct vision Dv = 250 mm, the cone of vision roughly forms a circle of diameter 2Dv tan 30° = 288 mm. If it is assumed that the width of the viewed image corresponds
with this diameter, then the enlargement factor from a 35 mm full-frame sensor will
be 8. This is shown in figure 5.4, where the full-frame sensor dimensions (36 × 24 mm)
have been enlarged by a factor of 8. The closest paper size that accommodates a print
of this size is A4.


Figure 5.4. A viewing circle of approximately 60° diameter defines the limits of near peripheral vision. This is
shown in relation to a 3 × 2 image (green) printed on A4 (210 × 297 mm) paper (black border) viewed at the
least distance of distinct vision Dv .

If the observer can resolve 5 lp/mm when viewing the output image at Dv and the
enlargement factor is 8, then the observer RP projected down to the sensor
dimensions becomes 40 lp/mm [8]. Mathematically,
RP(sensor dimensions) = RP(print viewed at Dv ) × X . (5.28)
Here ‘print’ refers to an image either printed or viewed on a display, and the value
RP(sensor dimensions) refers to the observer RP and not the camera system RP.
Nevertheless, using the 35 mm full-frame format as an example, it is clear that the camera system does not need to resolve spatial frequencies higher than RP(sensor dimensions) = 40 lp/mm when the output image is viewed under standard viewing conditions. Such detail cannot be resolved by the observer when the image
has been enlarged by a factor of 8 and viewed at Dv = 250 mm.

5.2.3 Circle of confusion: standard value


An equivalent viewpoint is that the value RP(sensor dimensions) derived in the
previous section affords a certain amount of defocus blur at the SP that remains
undetectable to the observer of the output image on a screen or print. The allowed
defocus blur can be treated rigorously within wave optics by calculating the camera
system point spread function (PSF) in the presence of defocus aberration. However,
this is not convenient for simple photographic calculations. Instead, photographic
calculations are based on Gaussian optics. Defocus blur is assumed to be the only
source of blur, and this blur is assumed to be uniform over a circle that approximates
the shape of the lens aperture. This is known as the circle of confusion (CoC). The
CoC describes the amount of defocus blur that can be tolerated on the SP before the
output image begins to appear ‘out of focus’ to the observer.
The relationship between the value RP(sensor dimensions) and the corresponding
CoC diameter c is given by

c = 1.22 / RP(sensor dimensions).   (5.29)


Figure 5.5. The required CoC diameter on the sensor is slightly wider than the least resolvable separation
(LRS) between neighbouring like stripes. A convolution of the CoC with the line pattern renders the stripes
unresolvable to an observer of the output photograph under the specified viewing conditions.

This relationship is illustrated graphically in figure 5.5, and a derivation of the above
expression is given in section 5.2.5. Details separated by a distance less than the CoC
diameter on the SP cannot be resolved by the observer of the output image.
Recall from the previous section that RP(sensor dimensions) = 40 lp/mm for a
35 mm full-frame camera under standard viewing conditions. In this case,
c = 0.030 mm according to equation (5.29). Smaller or larger sensors require a
smaller or larger diameter c, respectively, since the enlargement factor X in equation
(5.28) will change accordingly. Table 1.2 of chapter 1 lists standard CoC diameters
for various sensor formats. These values correspond to standard viewing conditions.
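A short sketch of the calculation is given below; it combines equations (5.28) and (5.29) with the standard viewing conditions of section 5.2.2, and the equivalence ratios are the nominal values used elsewhere in this chapter.

# Sketch of the standard-viewing-condition CoC calculation, equations (5.28)
# and (5.29). The equivalence ratios R are nominal values for illustration.
def standard_coc(R, observer_rp=5.0, X_full_frame=8.0):
    X = X_full_frame * R                       # enlargement scales with R
    rp_sensor = observer_rp * X                # eq. (5.28), in lp/mm on the sensor
    return 1.22 / rp_sensor                    # eq. (5.29)

for name, R in [("35 mm full frame", 1.0), ("APS-C", 1.53), ("Micro Four Thirds", 2.0)]:
    print(name, round(standard_coc(R), 4), "mm")
# full frame: ~0.0305 mm (quoted as 0.030 mm); smaller formats scale as c/R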

5.2.4 Circle of confusion: custom value


If an image is viewed under known conditions that differ from the standard viewing
conditions assumed by the camera or lens manufacturer, the CoC diameter
appropriate to these conditions can be calculated using the following formula:

c = (1.22 × L) / (RP(print viewed at Dv) × X × Dv).   (5.30)

Here L is the known viewing distance and X is the known enlargement factor.
Evidently the CoC diameter, c, is proportional to the viewing distance and inversely
proportional to the enlargement factor. If these scale in the same manner, then c will
remain constant. If only the viewing distance increases, then c will increase and more
defocus blur can be tolerated.
If L is reduced and X is increased, the viewed image may no longer fit within a
comfortable viewing circle. For example, a poster sized print situated close to the
observer cannot be accommodated by the cone of vision. In this case, the CoC
diameter will be very small. The extreme case of an image viewed at 100% on a
computer display may be beyond the practical limit for a CoC [9].
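The behaviour described above can be made concrete with a brief sketch of equation (5.30); the viewing distances and enlargement factors used here are illustrative assumptions.

# Sketch of the custom CoC formula (5.30) for non-standard viewing conditions.
# The print sizes and viewing distances below are illustrative assumptions.
def custom_coc(L_view, X, observer_rp=5.0, D_v=250.0):
    return 1.22 * L_view / (observer_rp * X * D_v)

print(custom_coc(L_view=250.0, X=8.0))     # standard conditions: ~0.0305 mm
print(custom_coc(L_view=500.0, X=16.0))    # larger print viewed from further away: unchanged
print(custom_coc(L_view=250.0, X=16.0))    # same large print examined from 250 mm: c halves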

5.2.5 Circle of confusion: derivation


Recall that the relationship between the CoC diameter and the observer RP
projected down onto the SP is defined by equation (5.29),


c = 1.22 / RP(sensor dimensions).
Here the CoC has been treated as a uniform blur circle within Gaussian optics. Given
a line pattern on the SP with spatial frequency equal to RP(sensor dimensions), a
convolution of the line pattern with the CoC will render the pattern unresolvable.
In order to derive the above equation, the CoC needs to be treated mathemati-
cally as a circle or ‘circ’ function. A circle function was used to describe the lens XP
in section 3.2.4 of chapter 3, and it was found that the optical transfer function
(OTF) corresponding to a circle function is a jinc function. The OTF for the CoC is
analogously given by the following expression:
OTFCoC(πcμr ) = jinc(πcμr ).
Here μr is the radial spatial frequency on the SP. The OTF for an example CoC is
plotted in figure 5.6. If the first zero of the OTF defines the CoC cut-off frequency
μc,CoC , then
πc μc,CoC = 1.22 π ,

and equation (5.29) then follows. Using the first zero of the OTF to define μc,CoC is
equivalent to assuming that an MTF value of 0% corresponds with the observer RP
projected down to the sensor dimensions, where MTF = ∣OTF∣.
According to the above analysis, the required CoC diameter is slightly wider than
the narrowest line pair that needs to be resolved at the SP. This is illustrated
graphically in figure 5.5. However, there is latitude in the value of the numerator of
equation (5.29), depending on the criterion used for the cut-off frequency, μc,CoC.

Figure 5.6. OTF for a uniform CoC with diameter c = 0.030 mm. The cut-off frequency is found to be 40 lp/mm.

For example, the 1.22 factor in the numerator can be dropped if RP(sensor dimensions) is instead defined as the spatial frequency at which the MTF value drops to 20%
rather than zero. In this case, μc,CoC would become
μc,CoC = 1/c.
In fact, both definitions are approximate since observer RP is measured using a solid
line pattern rather than a sinusoidal line pattern. The contrast transfer function
(CTF) describes the frequency response of a square wave pattern, and this can be
related to the MTF of a sinusoidal pattern through Coltman’s formula [10],
CTF(μ) = (4/π)(MTF(μ) − MTF(3μ)/3 + MTF(5μ)/5 + ⋯).
In this book, equation (5.29) is used to define the CoC diameter. In practice, the
criterion used for the cut-off frequency is much less significant than the value
assumed for the observer RP. When comparing different sensor formats, the relative
size of the CoC is the most important factor.
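The two cut-off criteria discussed above can be compared numerically. The sketch below locates the first zero of the jinc-based OTF and the frequency at which its modulus falls to 20%, for the c = 0.030 mm example of figure 5.6; it relies on SciPy for the Bessel function and root finding.

# Sketch comparing the two cut-off criteria for a uniform CoC using the
# jinc-based OTF. Requires SciPy; c = 0.030 mm as in figure 5.6.
import numpy as np
from scipy.special import j1
from scipy.optimize import brentq

c = 0.030                                   # CoC diameter in mm

def otf_coc(mu):
    x = np.pi * c * mu
    return 2.0 * j1(x) / x                  # jinc(pi * c * mu)

# First zero of the OTF: pi*c*mu = 1.22*pi, i.e. mu = 1.22/c (~40 lp/mm).
mu_zero = brentq(otf_coc, 1.0, 60.0)
print(mu_zero, 1.22 / c)

# Alternative criterion: frequency where |OTF| drops to 20% (roughly 1/c).
mu_20 = brentq(lambda mu: abs(otf_coc(mu)) - 0.2, 1.0, mu_zero)
print(mu_20, 1.0 / c)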

5.2.6 Depth of focus


The defocus blur at the SP afforded by the observer of the output image can be
quantified by the allowable depth of focus at the SP.
It is straightforward to obtain the relationship between the prescribed CoC and
the tolerable depth of focus within geometrical optics. From figure 5.7, the CoC
diameter c is related to the allowed depth of focus W by
c = (W / (2(s′ − s′XP))) DXP.

The XP diameter DXP is related to the EP diameter D by the pupil magnification,

DXP = mp D.

The distances s′ and s′XP were derived in section 1.3.3 of chapter 1:

s′ = (1 + ∣m∣)f′
s′XP = (1 − mp)f′.

Denoting the depth of focus by W, combining the above equations leads to the
following result:
W = (n′/n) 2cNw.
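As a brief worked example (values assumed for illustration), the sketch below evaluates this formula for the standard full-frame CoC at f/4, with n′ = n and focus at infinity so that Nw = N.

# Worked example (a sketch): allowable depth of focus from the formula above,
# assuming n' = n and focus at infinity so that Nw = N.
def depth_of_focus(c, N_w, n_ratio=1.0):
    return n_ratio * 2.0 * c * N_w

print(depth_of_focus(c=0.030, N_w=4.0))   # 0.24 mm for the full-frame CoC at f/4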

5.3 Lens MTF


The PSF and its Fourier transform (FT), the OTF, were defined in chapter 3. The
MTF and phase transfer function (PTF) are respectively defined by the modulus and
phase of the OTF.


Figure 5.7. Geometry defining the depth of focus W. Here DXP is the XP diameter, s′ is the image distance measured from the second principal plane, s′XP is the distance from the second principal plane to the XP, m is the Gaussian magnification, mp is the pupil magnification, and f′ is the rear effective focal length.

Recall that a PSF can be thought of as a blur filter. The convolution operation
used in linear systems theory slides the PSF over the ideal (unblurred) image to
produce the real convolved (blurred) output image. The shape of a given PSF
determines the nature of the blur that it contributes, and most of the blur strength is
concentrated in the region close to the ideal image point. However, PSFs are difficult
to describe in simple numerical terms.
For incoherent lighting, the image at the SP can be interpreted as a linear
combination of sinusoidal irradiance waveforms of various spatial frequencies. This
suggests that aspects of IQ can be described by the imaging of sinusoidal target
objects such as stripe patterns. Significantly, the real image of a sinusoid will always
be another sinusoid, irrespective of the shape of the PSF. Furthermore, the direction
and spatial frequency of the real sinusoidal image will remain unchanged [7].
However, its contrast or modulation will be reduced compared to that of the ideal
image formed in the absence of the PSF. Since the MTF is the modulus of the FT of
the PSF, the MTF precisely describes how the modulation is attenuated as a
function of spatial frequency.
The MTF provides a useful quantitative description of IQ. As well as lenses,
various other components of the camera system can be described by their own MTF,
and the system MTF can be straightforwardly calculated by multiplying these
individual component MTFs together. The system MTF can be used to define
system RP, along with other metrics such as perceived image sharpness.
This section begins with a very brief description of lens aberrations, which
profoundly affect the nature of the lens PSF and various aspects of the correspond-
ing lens MTF. This is followed by an introduction to lens MTF plots. Based on the
lens MTF, the final section discusses lens RP. System MTF and system RP will be
discussed in sections 5.4 and 5.5, respectively.


5.3.1 Lens MTF: standard viewing conditions


Lens MTF data published by lens manufacturers will originate either from
calculations using lens design software, or preferably from direct measurements
using specialist devices. Since lens MTF varies with spatial position over the optical
image field, there are two main methods for plotting the data. These are illustrated in
figure 5.8.
1. Choose a set of three image heights, and plot lens MTF as a function of
spatial frequency for each of these image heights. As discussed in the next
section, this representation is suitable for illustrating lens RP.
2. Choose a set of three important spatial frequencies, and then plot lens MTF
as a function of image height for each of these spatial frequencies. As
illustrated in figure 5.9 for the 35 mm full frame format, image height refers
to the radially symmetric field position on the SP between the OA and the
lens circle [7, 9]. Significantly, the important spatial frequencies chosen are
those most relevant to lens performance when a human observer views the
output photograph under standard viewing conditions. This is the represen-
tation discussed in the present section.

Standard viewing conditions have been described in section 5.2. In summary,


• The image viewing distance is taken to be the least distance of distinct vision,
Dv = 250 mm.
• The enlargement factor from the sensor dimensions to the viewed output
image dimensions is such that the image fits within a comfortable viewing
circle. The enlargement factor is 8 on the 35 mm full-frame format.
• The observer RP is taken to be 5 lp/mm at Dv .

Figure 5.8. (Left) Spatial frequency representation of lens MTF for three selected image heights. (Right) Image height representation of lens MTF for three selected spatial frequencies. The example points indicated by circles, squares and diamonds show identical data plotted using the two different representations. (Reproduced from [9] with kind permission from ZEISS (Carl Zeiss AG).)

Figure 5.9. Image height is defined as the radially symmetric distance from the image centre and can be measured out to the lens circle. The short and long edges of a full-frame 36 × 24 mm sensor occur at image heights or radial positions 18 mm and 12 mm, respectively.

When a photograph is viewed under standard viewing conditions, it was shown in section 5.2.2 that spatial frequencies higher than 40 lp/mm on the SP of the 35 mm full-frame format are not relevant unless the image is viewed more critically.
Accordingly, for a lens circle covering a 35 mm full frame sensor, a typical set of
spatial frequencies may include 10 lp/mm, 20 lp/mm and 40 lp/mm.
Although lens PSFs are generally not circular, the rotational symmetry of the lens
dictates that the shortest or longest elongations of the PSF will always be parallel or
perpendicular to the radius of the lens circle [7]. Different MTF curves will be
obtained depending on whether the line pattern used to measure MTF is oriented
perpendicular or parallel to the radius of the lens circle. These are known as the
tangential (or meridional) and the sagittal (or radial) directions, respectively. These
directions are indicated in figure 5.9, and usually both of these types of curve are
shown in lens MTF plots. Typically the longest elongation of the PSF is parallel to
the sagittal direction, in which case the sagittal MTF curve will have the higher
MTF. Tangential and sagittal curves that are similar in appearance are indicative of
a circular lens PSF.
The ideal Gaussian image position for the lens may not coincide precisely with the
camera SP. For example, field curvature may be present along with focus shift to
counterbalance spherical aberration (SA). Lens MTF curves are sensitive to the
plane of focus chosen because different parts of the image field will be affected
differently as the image plane is shifted [7]. Moreover, lens MTF data is dependent
upon the nature of the light used for the calculation or used to take the measure-
ment. Data representing a single wavelength can be dramatically different from data
representing polychromatic light [7].
Ideal lens MTF curves will have high values that remain constant as the image
height or radial field position changes. Such ideal curves are rarely seen in practice,
particularly at the maximum aperture or lowest f-number due to the increased
severity of the residual aberrations. Experience is required to fully interpret the
information present in lens MTF curves. Figure 5.10, which has been reproduced
from [7], shows curves for the Zeiss Planar 1.4/50 ZF lens along with the type of information that can be extracted. References [7, 9] provide useful guidance for interpreting MTF curves of real lenses.

Figure 5.10. Lens MTF for the Zeiss Planar 1.4/50 ZF as a function of 35 mm full-frame image height for polychromatic white light. The spatial frequencies shown are 10, 20 and 40 lp/mm. The dashed curves are the tangential orientation and the lens is focused at infinity. (Figure reproduced from [7] with kind permission.)

5.3.2 Lens MTF: lens resolving power


Lens resolving power (lens RP) can be determined by plotting lens MTF data using
the second method described in the previous section. Typically a set of three image
heights are chosen, and lens MTF is plotted as a function of spatial frequency for
each of these image heights.
RP is generally expressed in spatial frequency units. For example, recall from
section 5.2 that observer RP is defined as the reciprocal of the LRS for an alternating
black and white striped pattern and is expressed using line pairs per mm [6],

RP = 1/LRS.

Lens RP is formally defined as the spatial frequency at which the lens MTF first
drops to zero, and is again expressed using units such as lp/mm. As discussed below,
in practice a small percentage MTF value is used instead of zero.
Diffraction places a fundamental limit on achievable lens RP referred to as the
diffraction limit. Since the camera system MTF is the product of the individual
component MTFs, the RP of the camera system as a whole cannot exceed the
diffraction limit.
The diffraction limit can be defined in a precise mathematical way in the Fourier
domain as the spatial frequency at which the diffraction component of the optics or
lens MTF drops to zero for a sinusoidal target waveform. The expression for the
diffraction OTF was derived in section 3.2 of chapter 3:

Hdiff,circ(μr, λ) = (2/π)[cos⁻¹(μr/μc) − (μr/μc)√(1 − (μr/μc)²)]  for μr/μc ⩽ 1,
Hdiff,circ(μr, λ) = 0  for μr/μc > 1. (5.31)

This expression is valid for incoherent illumination so that the PSF and sinusoidal
target waveforms at the SP are linear in terms of irradiance, and the aperture has
been assumed to be circular. The diffraction MTF drops to zero at the Abbe cut-off
frequency μc,diff measured in cycles/mm. Spatial frequencies higher than μc,diff at the
SP cannot be resolved by the lens. The Gaussian expression for μc,diff is given by

μc,diff = (n/n′) × 1/(λNw). (5.32)


This reduces to the following well-known expression when n = n′ and focus is set at
infinity:
μc,diff = 1/(λN).
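As a quick numerical illustration (not part of the referenced derivation), the short Python sketch below evaluates this cut-off frequency at a representative green wavelength of 550 nm for a few f-numbers; the wavelength is an assumption chosen purely for the example.

# Abbe (diffraction) cut-off frequency at the sensor plane, assuming n = n'
# and focus set at infinity: mu_c,diff = 1/(lambda * N), with lambda in mm.
wavelength_mm = 550e-6   # 550 nm expressed in mm (assumed example wavelength)

for N in (2.8, 5.6, 8, 16):
    mu_c = 1.0 / (wavelength_mm * N)   # cycles/mm (lp/mm) at the sensor plane
    print(f"N = {N:>4}: diffraction cut-off = {mu_c:.0f} lp/mm")

At N = 8, for example, the sketch returns a cut-off of approximately 227 lp/mm.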
In chapter 3, a lens free from aberrations was described as being diffraction limited. In
this case lens performance is limited only by diffraction and so its MTF obeys equation
(5.31). For a lens with residual aberrations, the MTF value at any given spatial
frequency cannot be greater than the diffraction-limited MTF at the same frequency,
and generally will be lower. Even though the RP of an aberrated lens remains that of a
diffraction-limited lens in principle, aberrations can reduce the MTF values at high
spatial frequencies to such an extent that the effective cut-off is much lower [11].
A useful way to demonstrate this graphically is via the anticipated root mean
square (RMS) wavefront error WRMS introduced in chapter 3. This is a way of
modelling the overall effect of aberrations as a statistical average of the wavefront
error over the entire wavefront converging from the XP to the image point. A small
aberration content is classed as an RMS wavefront error up to 0.07, medium
between 0.07 and 0.25, and large above 0.25 [12]. An empirical relationship for the
transfer function corresponding to the RMS wavefront error is provided by the
following formula [12, 13]:
HATF(μn ) = 1 − {(WRMS/0.18)2 [1 − 4(μn − 0.5)2 ]}.
Here μn = μr /μc,diff is the normalised spatial frequency, and μc,diff is the Abbe cut-off
frequency. This transfer function is referred to as the aberration transfer function
(ATF) [12] or optical quality factor (OQF) [14]. The approximate lens MTF is then
defined as the product of the diffraction-limited transfer function and the ATF,

MTFlens ≈ Hdiff,circ(μr) HATF(μr).

Figure 5.11. Approximate lens MTF as a function of normalised spatial frequency μn = μr/μc,diff for a selection of RMS wavefront error values WRMS. Aberrations are absent when WRMS = 0.
This is plotted for several values of RMS wavefront error in figure 5.11. Since it is an empirical model, negative values are ignored [12], and the model is considered accurate up to WRMS = 0.18. Although the aberrated MTF only falls to zero at the Abbe cut-off frequency in principle, the curves show that a larger aberration content can suppress the MTF to very low values well before that frequency is reached. A practical value for an effective cut-off frequency therefore requires a less-stringent resolution criterion. For example, an effective cut-off frequency could be defined as the spatial frequency where the real aberrated lens MTF drops to a small percentage value instead of zero.
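A minimal Python sketch of this model is given below. It multiplies the diffraction-limited MTF for a circular aperture by the ATF defined above and reports the normalised spatial frequency at which the product first falls below 9%; this threshold anticipates the Rayleigh-based criterion discussed in the next paragraph and is only one possible choice.

import numpy as np

def mtf_diff(mu_n):
    # Diffraction-limited MTF for a circular aperture; mu_n = mu_r / mu_c,diff
    mu_n = np.clip(mu_n, 0.0, 1.0)
    return (2.0 / np.pi) * (np.arccos(mu_n) - mu_n * np.sqrt(1.0 - mu_n**2))

def atf(mu_n, w_rms):
    # Empirical aberration transfer function; negative values are ignored
    value = 1.0 - (w_rms / 0.18)**2 * (1.0 - 4.0 * (mu_n - 0.5)**2)
    return np.clip(value, 0.0, None)

mu_n = np.linspace(0.0, 1.0, 2001)
for w_rms in (0.0, 0.07, 0.14, 0.18):
    mtf_lens = mtf_diff(mu_n) * atf(mu_n, w_rms)
    cutoff = mu_n[np.argmax(mtf_lens < 0.09)]     # first point below 9%
    print(f"W_RMS = {w_rms:.2f}: effective cut-off = {cutoff:.2f} x Abbe cut-off")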
In order to establish an appropriate percentage value, it is useful to consider other
resolution criteria. For example, the Rayleigh two-point criterion is based upon light
from two isolated point sources in real space projected by the lens onto the SP. As
the points are brought closer together, a separation on the SP is reached such that
the points cannot be distinguished as separate objects due to the overlap of the
diffraction PSF associated with each point. The Rayleigh criterion for the LRS
(denoted by dRayleigh ) on the SP is defined as follows:
dRayleigh = 1.22 nλNw/n′.
When n = n′ and focus is set at infinity, this becomes
dRayleigh = 1.22 λN .

As illustrated in figure 5.12 in 1D and figure 5.13 in 2D, each Airy disk is centred at
the first zero ring of the other.
Now substituting a spatial frequency μr of order 1/dRayleigh into the diffraction
OTF defined by equation (5.31) above reveals that the Rayleigh criterion corre-
sponds to a diffraction-limited lens MTF value of approximately 9% [15]. In other

Figure 5.12. Rayleigh two-point resolution criterion in units of λN . The distance between the centres of the
Airy disks denoted by the pair of red vertical lines is the Airy disk radius itself, dRayleigh = 1.22 λN .


Figure 5.13. Rayleigh two-point resolution criterion illustrated in 2D.

words, MTF values between 0% and 9% do not provide useful detail according to
the Rayleigh criterion. This suggests that an appropriate effective cut-off frequency
for defining the RP of a real aberrated lens at a given f-number, wavelength, and
field position is the spatial frequency at which the real aberrated lens MTF similarly
drops to 9%. When aberrations are present, this effective cut-off frequency will be
smaller than the Abbe cut-off frequency. Depending on the application, other
percentage criteria such as 5%, 10% or 20% may be more suitable.
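The 9% figure can be verified directly by evaluating the diffraction MTF of equation (5.31) at the spatial frequency μr = 1/dRayleigh = μc,diff/1.22; the snippet below is an illustrative check only.

import numpy as np

x = 1.0 / 1.22      # mu_r / mu_c at the Rayleigh separation
mtf_rayleigh = (2.0 / np.pi) * (np.arccos(x) - x * np.sqrt(1.0 - x**2))
print(f"Diffraction MTF at the Rayleigh frequency = {mtf_rayleigh:.3f}")   # approximately 0.089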
At the expense of a lower lens MTF at lower spatial frequencies, an inverse
apodization filter can be used to increase lens RP [11, 15]. Nevertheless, it should be
remembered that the RP of the camera system as a whole is the relevant metric for
describing the smallest detail that can be reproduced in the output image. As discussed
in section 5.5, camera system RP may analogously be defined in terms of the spatial
frequency at which the camera system MTF drops to a small percentage value.

5.3.3 Cross-format comparisons


As described in section 5.2.3, an image projected onto a smaller sensor format needs
a greater enlargement factor, X, to match the dimensions of the viewed output
image. This means that the spatial frequencies on the SP that are relevant to
standard viewing conditions scale in accordance with the equivalence ratio R
between the formats,

(μx(2), μy(2)) = R (μx(1), μy(1)).

Here the superscripts (1) and (2), respectively, denote the larger and smaller formats.


Recall that under standard viewing conditions, a typical set of spatial frequencies
used to plot lens MTF for the 35 mm full-frame format are 10 lp/mm, 20 lp/mm and
40 lp/mm. According to the above equation, these would increase to 15 lp/mm,
30 lp/mm and 60 lp/mm for a lens circle covering an APS-C sensor with R = 1.5.


When using the spatial frequency representation for a selected image height
(radial field position), lens MTF can be more generally interpreted by using a spatial
frequency unit that is directly comparable between different sensor formats, namely
line pairs per picture height (lp/ph). This unit is related to lp/mm as follows:
lp/ph = lp/mm × ph.
In the present context, picture height (ph) measured in mm refers to the short edge of
the sensor [9]. Picture height can also be used to specify the print height, in which case
the lp/mm value will downscale accordingly. Since picture height is proportional to
CoC diameter, lp/ph is a very general unit that is directly comparable between
different sensor formats provided the aspect ratios are the same.
Finally, cross-format comparisons should be made using equivalent photos where
possible, as described in section 5.1. This means that in the present context, lens
MTF should be compared using equivalent focal lengths and equivalent f-numbers.
For example, lens MTF at f = 24 mm and N = 4 on 35 mm full frame should be
compared with lens MTF at f = 16 mm and N = 2.8 on APS-C. When equivalent f-
numbers are used, the diffraction cut-off frequency will be the same on each format
when expressed using lp/ph.
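The scaling can be made concrete with a short sketch. The equivalence ratio R = 1.5 and the APS-C picture height of 16 mm are assumed values used only for illustration.

# Scale the standard full-frame test frequencies to an APS-C format (R = 1.5)
# and express them in line pairs per picture height (lp/ph).
R = 1.5                            # assumed equivalence ratio
ph_ff, ph_apsc = 24.0, 24.0 / R    # picture heights (sensor short edges) in mm

for mu_ff in (10, 20, 40):         # lp/mm on the full-frame sensor
    mu_apsc = R * mu_ff            # corresponding lp/mm on the APS-C sensor
    print(f"{mu_ff} lp/mm (FF) -> {mu_apsc:.0f} lp/mm (APS-C); "
          f"both correspond to {mu_ff * ph_ff:.0f} lp/ph")

Expressed in lp/ph, the corresponding full-frame and APS-C frequencies are identical, which is why the unit is preferred for cross-format comparisons.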

5.3.4 Limitations of lens MTF


Certain information about the imaging properties of lenses cannot be represented by
MTF curves. For example, MTF curves cannot describe veiling glare. This arises
from unwanted reflections between the optical surfaces and from light scattered
from the interior barrel components, all of which cause light rays to reach the SP a
great distance from their intended target. Such phenomena are described by the
macro contrast of the lens, rather than the micro contrast described by lens MTF [7].
Another example is the information contained in skewed PSFs. Although the
separate tangential and sagittal MTF curves describe the elongation of the PSF in
the tangential and sagittal directions as shown in figure 5.9, the MTF cannot
describe any asymmetry in these directions. Such asymmetry arises from aberra-
tions such as coma. Coma produces a PSF with a comet-shaped tail in the sagittal
(radial) direction, and such information is contained in the PTF. As described in
section 3.1.9 of chapter 3, the PTF describes the changes in phase that cause a shift
of the real sinusoids relative to the ideal sinusoids as a function of spatial
frequency. The PTF can have a systematic effect on the nature of the image. In
particular, a skewed PSF will affect an edge in the image differently depending on
its relative orientation [7].
A final example is that of bokeh introduced in section 1.4.6 of chapter 1. Bokeh
describes the aesthetic quality of the out-of-focus blur produced by a lens. It is
desirable that the PSF be of circular shape near the SP, and this is indicated by the
presence of tangential and sagittal MTF curves that are similar. However, this is no
guarantee that the lens will produce bokeh of high aesthetic quality. Out-of-focus
blur arises from the defocused PSF defined far from the SP, whereas MTF curves are
derived from the PSF present in the immediate vicinity of the SP. For example, a


circular PSF near the SP can arise from over-corrected SA. As discussed in section
1.4.6 of chapter 1, this leads to annular defocused PSFs that produce a nervous or
restless blurred background [7]. Some of the best lenses are characterised by the
nature of the pleasing bokeh that they produce along with other special aesthetic
character that cannot be derived from MTF curves.

5.4 Camera system MTF


Recall from chapter 3 that the camera system MTF at a given spatial frequency is
defined by the product or cascade of the individual component MTFs at the same
spatial frequency,
MTFsys(μx , μy ) = MTF1(μx , μy ) MTF2(μx , μy ) MTF3(μx , μy ) ⋯ MTFn(μx , μy ).

Various components of a camera system such as the lens, imaging sensor and optical
low-pass filter (OLPF) provide contributions to the total camera system MTF. Note
that the camera system MTF at a given spatial frequency cannot be larger than the
diffraction MTF at the same spatial frequency. This is due to the fact that the lens
MTF contribution is bounded from above by the diffraction MTF. An exception is
when MTF contributions from digital image sharpening filters such as unsharp mask
(USM) are included as part of the camera system MTF since these MTF
contributions can take values above 100%.
The camera system MTF generally describes micro contrast and edge definition in
the output image. It is important for two main reasons.
1. Camera system RP is determined by the nature of the camera system MTF at
the highest spatial frequencies.
2. Perceived sharpness of an output digital image is determined by the nature of
the camera system MTF at spatial frequencies related to the viewing
conditions, along with the contrast sensitivity of the HVS at those spatial
frequencies. These spatial frequencies are generally of relatively low value.
For example, under the standard viewing conditions described in sections
5.2.2 and 5.3.1, the relevant spatial frequencies range from 0 to 40 lp/mm on
the SP of the 35 mm full-frame format.

Camera system RP and perceived image sharpness are discussed further in sections
5.5 and 5.6.

5.4.1 Cross-format comparisons


As already discussed in section 5.3.3 for lens MTF, the important spatial frequencies
relevant to the viewing conditions scale in direct proportion with the equivalence
ratio R when performing cross-format comparisons.
Accordingly, it is preferable to express camera system MTF using line pairs per
picture height when performing cross-format comparisons,
lp/ph = lp/mm × ph.


Again picture height (ph) in mm refers to the short edge of the sensor [9]. The lp/ph
unit is consistent with equivalence theory provided the formats have the same aspect
ratio. In other words, the comparison should ideally be made using equivalent focal
lengths and f-numbers, as described in section 5.1. In this case, equivalent f-numbers
lead to the same diffraction cut-off frequency expressed using lp/ph.
Another advantage of the lp/ph unit relates to the sensor Nyquist frequency, μNyq ,
introduced in chapter 3. Recall that μNyq depends upon the pixel pitch or photosite
density when expressed using cycles/mm or lp/mm units. This is not directly
comparable between different formats due to the different enlargement factors
from the sensor dimensions to the dimensions of the viewed output photograph.
However, μNyq expressed using lp/ph units depends only on photosite count rather
than photosite density or pixel pitch. For example, μNyq for a 12.1 MP APS-C sensor
will be the same as μNyq for a 12.1 MP 35 mm full-frame sensor when expressed using
lp/ph units.
Another commonly encountered unit is cycles per pixel (cy/px),
cy/px = lp/mm × p/1000.
Here p is the pixel pitch expressed in μm. When expressed using cy/px units, the
sensor Nyquist frequency becomes μNyq = 0.5 cy/px.
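The sketch below converts between these units for a hypothetical 24 MP full-frame sensor (6000 × 4000 pixels, 6 μm pixel pitch, 24 mm picture height); the sensor specification is an assumption made only for illustration.

# Sensor Nyquist frequency of a hypothetical 24 MP full-frame sensor in three units
p_um, ph_mm = 6.0, 24.0                 # pixel pitch (um) and picture height (mm)

nyq_lp_mm = 1000.0 / (2.0 * p_um)       # lp/mm: depends on pixel pitch
nyq_lp_ph = nyq_lp_mm * ph_mm           # lp/ph: depends only on pixel count (4000/2)
nyq_cy_px = nyq_lp_mm * p_um / 1000.0   # cy/px: always 0.5 by definition

print(nyq_lp_mm, nyq_lp_ph, nyq_cy_px)  # 83.3 lp/mm, 2000 lp/ph, 0.5 cy/px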

5.5 Camera system resolving power


Recall from section 5.3.2 that lens RP is defined by the effective lens cut-off
frequency, which is the spatial frequency at which the lens MTF drops to a small
specified percentage such as 9%. Lens RP primarily depends upon the selected
f-number along with the aberration content of the lens.
Similarly, camera system resolving power (camera system RP) is defined by the
camera system cut-off frequency μc , which is the spatial frequency at which the
camera system MTF drops to a small specified percentage value.
The ISO 12233 photographic standard provides a technique for measuring
average camera system MTF based upon photographing a target edge of high
contrast rather than target sinusoidal waveforms [16]. Since the edge is slanted, the
so-called average edge-spread function (ESF) can be determined from the output
image file at a much greater resolution than the sampling pixel pitch. The camera
system MTF can be derived from the ESF [17, 18].

5.5.1 Model camera system


Camera system RP will ultimately be limited by the lowest component cut-off
frequency. This can be illustrated using the simple model camera system derived in
chapter 3. The equations for the component MTFs were summarised in section 3.5,
but here the spatial frequency will be considered along one axis only. In this case the
model camera system MTF can be written as follows:
MTFsystem(μx , λ) = MTFdiff,circ(μx , λ) MTFdet−ap(μx ) MTFOLPF(μx ) .


The component MTFs are:


1. Lens MTF that assumes the lens is diffraction limited. This is defined by
MTFdiff,circ(μx, λ) = (2/π)[cos⁻¹(μx/μc) − (μx/μc)√(1 − (μx/μc)²)].

This expression is valid when μx ⩽ μc . When μx > μc , MTFdiff,circ(μx , λ ) = 0.


The model calculation can be performed at a single representative wave-
length such as λ = 550 nm.
2. Sensor MTF that includes only the spatial detector-aperture contribution,
MTFdet−ap(μx) = sin(π dx μx)/(π dx μx).

The detection area width, dx, can be varied up to the value of the pixel pitch,
px.
3. Four-spot OLPF MTF,
MTFOLPF(μx ) = ∣cos(π sxμx )∣ .
The spot separation sx between the two spots on a single axis can vary from
the pixel pitch, px, for a maximum strength filter down to zero for a non-
existent filter.

Consider a compact camera with a 1/1.7 inch sensor format and a 12 MP pixel
count. In this case the pixel pitch px ≈ 2 μm = 0.002 mm.
• The detector cut-off frequency is μc,det = 1/0.002 = 500 lp/mm assuming that
the photosite detection area width dx = px.
• The sensor Nyquist frequency is μNyq = 0.5 × (1/0.002) = 250 lp/mm.
• For simplicity, assume that the camera system cut-off frequency, μc , is the
spatial frequency at which the camera system MTF first drops to zero. This
defines the camera system RP.

Figure 5.14(a) shows the model results at N = 2.8 in the absence of the OLPF. The
spatial frequencies have been expressed in lp/mm units. The camera system cut-off
frequency corresponds with the detector cut-off frequency at 500 lp/mm, and so the
camera system RP is limited by the detector aperture. Aliasing is present above the
sensor Nyquist frequency.
When the f-number increases to N = 8, figure 5.14(b) shows that the camera
system cut-off frequency has dropped to 228 lp/mm, and so the camera system RP is
limited by lens aperture diffraction. Furthermore, aliasing has been totally elimi-
nated since the camera system cut-off frequency has dropped below the sensor
Nyquist frequency at this f-number.
When a full-strength OLPF is included, aliasing can be minimised at any selected
f-number. For example, figure 5.14(c) shows the camera system MTF at N = 2.8
when a full-strength OLPF is included. In this case the camera system cut-off
frequency corresponds with the sensor Nyquist frequency at 250 lp/mm, and so the


Figure 5.14. Camera system RP illustrated using a model camera system MTF in 1D. The sensor component is
defined by the detector aperture MTF, the optics component by the diffraction MTF, and the OLPF
component by a four-spot filter MTF.

camera system RP is limited by the OLPF. The camera system MTF above the
sensor Nyquist frequency is suppressed by the detector-aperture MTF and diffrac-
tion MTF [19].
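The three situations shown in figure 5.14 can be explored numerically with the minimal sketch below, which cascades the three model component MTFs and searches for the first effective zero of the product. A wavelength of 550 nm is assumed, and the detection width and maximum OLPF spot separation are set equal to the 2 μm pixel pitch; the printed cut-off values are close to those quoted above, with small differences due to rounding and the assumed wavelength.

import numpy as np

def system_mtf(mu, N, lam_mm=550e-6, dx_mm=0.002, sx_mm=0.0):
    # 1D model camera system MTF: diffraction x detector aperture x OLPF
    mu_c = 1.0 / (lam_mm * N)                      # diffraction cut-off (lp/mm)
    x = np.clip(mu / mu_c, 0.0, 1.0)
    diff = (2.0 / np.pi) * (np.arccos(x) - x * np.sqrt(1.0 - x**2))
    det = np.abs(np.sinc(dx_mm * mu))              # np.sinc(x) = sin(pi x)/(pi x)
    olpf = np.abs(np.cos(np.pi * sx_mm * mu)) if sx_mm > 0 else 1.0
    return diff * det * olpf

mu = np.linspace(0.0, 600.0, 60001)                # lp/mm
for N, sx in ((2.8, 0.0), (8.0, 0.0), (2.8, 0.002)):
    mtf = system_mtf(mu, N, sx_mm=sx)
    cutoff = mu[np.argmax(mtf < 1e-6)]             # first effective zero of the product
    label = "no OLPF" if sx == 0 else f"full-strength OLPF ({sx*1000:.0f} um spot separation)"
    print(f"N = {N}, {label}: system cut-off = {cutoff:.0f} lp/mm")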
The above simple model illustrates that in order to increase camera system RP by
reducing the pixel pitch, diffraction needs to be accordingly reduced by lowering the


f-number so that the diffraction cut-off frequency remains higher than the sensor
Nyquist frequency. However, lenses at their lowest f-numbers or maximum
apertures generally suffer from residual aberrations and are rarely diffraction-
limited in practice. Aberrations lower the effective lens cut-off frequency as
described in section 5.3.2, particularly when measured at large radial field positions.
This prevents achievable camera system RP from reaching the diffraction limit
defined by the Abbe cut-off frequency. When diffraction dominates, sensor pixel
pitch has negligible impact upon the camera system RP. However, reducing the pixel
pitch will improve the detector-aperture component of the sensor MTF at low
spatial frequencies and this can improve edge definition and perceived sharpness.
Perceived sharpness is discussed in the following section.

5.6 Perceived image sharpness


Sharpness is a perceived physiological phenomenon that is primarily influenced by
two main factors.
1. Edge definition in the recorded image described by the camera system MTF
at spatial frequencies relevant to the viewing conditions.
2. The contrast sensitivity of the HVS at the spatial frequencies relevant to the
viewing conditions.

Although perceived image sharpness is often treated as synonymous with camera system RP, the two are not necessarily correlated. A high camera system RP corresponds to a high
camera system cut-off frequency, whereas high perceived sharpness arises from a
high camera system MTF at spatial frequencies to which the HVS is most sensitive.
The important spatial frequencies depend upon factors such as the viewing distance
and the contrast sensitivity of the HVS.
This means that for the same photographic scene and output photograph viewing
conditions, one camera system could produce an image with high perceived sharp-
ness but low resolution, whereas a different camera system could produce an image
with low perceived sharpness but high resolution.
For a given camera system, the photographer plays an important role in
influencing the camera system MTF and therefore perceived image sharpness. An
important example is diffraction softening controlled by the lens f-number.
Diffraction softening was introduced in chapter 3, and is discussed further in section
5.10.2. Other examples include camera shake, which contributes a jitter component
to the camera system MTF, and the resizing of the output digital image, which
provides a resampling contribution to the camera system MTF [14].
Note that digital image sharpening techniques can actually increase the overall
camera system MTF, although care should be taken not to introduce sharpening
artefacts. For example, the USM technique involves blurring the image, typically by
convolving the image with a small blur filter. The blurred image is then subtracted
from the original image to produce a mask that contains high frequency detail. The
mask is subsequently added to the original image with a weighting applied. A high
weighting can boost the high frequencies such that the MTF contribution from the


USM filter is greater than unity. The fact that digital image sharpening can improve
perceived sharpness but cannot improve resolution is evidence that sharpness and
resolution are not necessarily correlated [7].
Various metrics for perceived image sharpness have been developed. Examples
include:
• Modulation transfer function area (MTFA) [20, 21].
• Subjective quality factor (SQF) [22].
• Square root integral (SQRI) [23].
• Heynacher number [7].
• MTF50 [24].
• Acutance [25].

The MTF50 and SQF metrics are discussed below.

5.6.1 MTF50
Camera MTF50 is a very simple metric defined as the spatial frequency at which the
camera system MTF drops to 50% of the zero frequency value. MTF50 is influenced
by all contributions to the camera system MTF including the optics, sensor, and
image processing. A high MTF50 is associated with a camera system MTF that
remains generally high over a wide range of important spatial frequencies and is
therefore considered to be a reasonable indicator of perceived image sharpness.
MTF50P is a similar metric defined as the spatial frequency at which the camera system MTF drops to 50% of the peak camera system MTF value. Since excessive sharpening improves MTF50 even when it leads to visible image artefacts such as halos at edges, MTF50P is designed to remove any influence that excessive digital image sharpening may have on the result [24].
When performing cross-format comparisons, MTF50 and MTF50P are directly
comparable between the camera systems when expressed using the line pairs per
picture height (lp/ph) unit described in section 5.3.3.
MTF50 can also be applied to lenses, in which case it should be referred to as lens
MTF50. However, section 5.3.1 showed that lens performance is best evaluated
using lens MTF at several important spatial frequencies as a function of radial field
position [7].
In order to relate MTF50 to an appropriate IQ level, MTF50 can be expressed
using units related to the print dimensions, such as line widths per inch on the print
[24]. However, this perceived sharpness rating does not take into account the
viewing distance or the relevant properties of the HVS.
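A minimal sketch showing how MTF50 and MTF50P can be extracted from a sampled camera system MTF curve is given below. The curve itself is a synthetic example; for a monotonically falling, unsharpened curve such as this one, the two metrics coincide.

import numpy as np

def mtf50(freq, mtf, reference="zero"):
    # Frequency at which the MTF falls to 50% of the zero-frequency value
    # (MTF50) or of the peak value (MTF50P), using linear interpolation.
    ref = mtf[0] if reference == "zero" else np.max(mtf)
    target = 0.5 * ref
    i = np.argmax(mtf < target)                 # first sample below the target
    f1, f2, m1, m2 = freq[i-1], freq[i], mtf[i-1], mtf[i]
    return f1 + (target - m1) * (f2 - f1) / (m2 - m1)

freq = np.linspace(0.0, 2000.0, 201)            # lp/ph (synthetic example)
mtf = np.exp(-freq / 900.0)                     # synthetic falling system MTF
print(f"MTF50  = {mtf50(freq, mtf):.0f} lp/ph")
print(f"MTF50P = {mtf50(freq, mtf, reference='peak'):.0f} lp/ph")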

5.6.2 Example: pixel count


Recall that a higher sensor pixel count raises the sensor Nyquist frequency, and this
raises the maximum achievable camera system RP. However, the maximum value
will not be achieved in practice if the optics cut-off frequency drops below the sensor
Nyquist frequency.


Figure 5.15. Model camera system MTF50 as a function of f-number for two 35 mm full-frame cameras with
different sensor pixel counts. The horizontal axis uses a base 2 logarithmic scale. Lens aberrations have not
been included.

Nevertheless, a higher sensor pixel count provides a higher detector-aperture component MTF at low spatial frequencies, and this increases the camera system
MTF at low spatial frequencies. In turn, this can increase perceived image sharpness
even if a higher camera system RP is not realized. The increased perceived image
sharpness can be illustrated using camera system MTF50.
Consider the model camera system described in section 5.5.1. This model
system includes contributions to the total camera system MTF from lens aperture
diffraction, the detector aperture, and an OLPF. Figure 5.15 shows the model
camera system MTF50 plotted as a function of f-number for two 35 mm full-
frame cameras. The first camera has a 12.1 MP sensor corresponding to an 8.45 μm pixel pitch, and the second has a 36.3 MP sensor corresponding to a 4.88 μm pixel pitch.
It can be seen that the 36.3 MP camera yields a much greater system MTF50 at
low f-numbers. It is also evident that the advantage of the higher pixel count is
gradually lost as the f-number increases from its minimum value. At N = 22, the
difference in MTF50 between the two cameras is relatively small since perceived
image sharpness is dominated by diffraction softening at high f-numbers. As
discussed in section 5.10.2, diffraction softening occurs due to the reduced
diffraction MTF contribution to the camera system MTF at low spatial
frequencies.

5.6.3 Subjective quality factor


Subjective quality factor (SQF) [22] takes into account the viewed output image size,
the viewing distance, and the contrast sensitivity of the HVS when evaluating
perceived image sharpness. It is defined as follows:


SQF = k ∫_{μ1}^{μ2} MTF(μ) CSF(μ) d(ln μ), (5.33)

where MTF denotes the camera system MTF, and CSF denotes the contrast
sensitivity function of the HVS.
It was found experimentally that there is a linear correlation between subjective
IQ and just-noticeable differences. This suggests that the above integration should
be carried out on a logarithmic scale with respect to spatial frequency,

d(ln μ) = dμ/μ.
Furthermore, spatial frequency μ is expressed in cycles/degree units. The contrast
sensitivity function CSF(μ) describes the sensitivity of the HVS to contrast as a
function of μ. The CSF used in the original definition of SQF was taken to be unity
between 3 and 12 cycles/degree,
μ1 = 3
μ2 = 12.

In this case SQF is simply the area under the camera system MTF curve calculated
between 3 and 12 cycles/degree when the MTF is similarly expressed in cycles/degree
units and plotted on a logarithmic scale. The constant k is a normalisation constant
that ensures a constant MTF with value unity will yield SQF = 100,
100 = k ∫_{μ1}^{μ2} CSF(μ) d(ln μ).

In order to take into account the viewing distance, the cycles/degree units need to be
related to the cycles/mm units used at the SP. First consider the viewed output print.
Denoting θ as the angle subtended by the eye with one spatial cycle p on the print,
the spatial frequency μ(cycles/degree) at the eye and spatial frequency μ(cycles/mm)
on the print are specified by 1/θ and 1/p, respectively. From figure 5.16,
p = 2l tan(θ/2) ≈ lθ × π/180,

where l is the distance from the observer to the print, and the factor on the right-hand side converts radians into degrees. Therefore,

μ(cycles/mm on print) = μ(cycles/degree) × 180/(lπ).

Figure 5.16. Geometry used to relate cycles/degree at the eye to cycles/mm on the print.


This can now be projected onto the sensor dimensions by using the enlargement
factor,
μ(cycles/mm on sensor) = μ(cycles/degree) × 180 ph/(lπh).
Here ph is the picture height (print or screen) in mm, and h is the height of the short
side of the sensor in mm. This formula can be used to relate the camera system MTF
as a function of μ(cycles/mm) to the μ(cycles/degree) appearing in equation (5.33).
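The sketch below evaluates SQF numerically using the original definition, i.e. a unity CSF between 3 and 12 cycles/degree integrated on a logarithmic frequency axis, together with the conversion formula above. The Gaussian-shaped system MTF, the 300 mm print height, the 500 mm viewing distance and the 24 mm sensor height are all assumed values chosen purely for illustration.

import numpy as np

def sqf_original(mtf_sensor, l_mm, ph_mm, h_mm, mu1=3.0, mu2=12.0, n=512):
    mu_deg = np.geomspace(mu1, mu2, n)                 # cycles/degree at the eye
    # convert to cycles/mm on the sensor: mu = mu_deg * 180 * ph / (l * pi * h)
    mu_mm = mu_deg * 180.0 * ph_mm / (l_mm * np.pi * h_mm)
    ln_mu = np.log(mu_deg)
    k = 100.0 / (ln_mu[-1] - ln_mu[0])                 # so that MTF = 1 yields SQF = 100
    integrand = mtf_sensor(mu_mm)                      # unity CSF inside [mu1, mu2]
    # trapezoidal integration over ln(mu)
    return k * np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(ln_mu))

mtf = lambda mu_mm: np.exp(-(mu_mm / 60.0)**2)         # assumed system MTF (cycles/mm)
print(f"SQF = {sqf_original(mtf, l_mm=500.0, ph_mm=300.0, h_mm=24.0):.1f}")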
An alternative spatial frequency unit used for camera system MTF calculation is
cycles/pixel, in which case the above formula can be expressed as follows [24]:
μ(cycles/pixel) = μ(cycles/degree) × 180 ph/(lπnh).
Here nh is the number of pixels on the short side of the sensor, and so the sensor
height drops out of the equation. In practice, resampling will take place to satisfy a
pixels per inch (ppi) display resolution required by the print. As described in section
5.7, this will introduce a resampling MTF that will affect perceived image sharpness.
The original SQF calculation has been refined [24] by introducing the CSF
defined in [26] and extending the integration from zero up to the sensor Nyquist
frequency expressed in cycles/degree units. The revised CSF is shown in figure 5.17.
The revised integration limits [24] are given by
μ1 = 0
μ2 = μNyq .

This CSF applies a refined weighting for how strongly the camera system MTF
influences perceived sharpness at each spatial frequency.

Figure 5.17. Contrast sensitivity function of the human eye [26].


SQF is a far superior metric for evaluating perceived sharpness compared with
MTF50. However, a drawback of SQF is that contour definition is not taken into
account [9]. Contour definition is higher for flatter MTF curves, and this is
associated with higher perceived sharpness in practice. In other words, additional
assessment is required to recognise special cases [9].

5.7 Image resampling


Many common digital image processing operations require resampling. Of partic-
ular relevance is the resizing of an image to satisfy a pixels per inch requirement on a
monitor or print. This was described in section 4.11.5 of chapter 4.
The number of pixels in a digital image can be increased by upsampling. Although
conventional upsampling cannot add new information, the aim is to preserve the
appearance of the original image as closely as possible. However, practical
interpolation filters are not exact. Undesirable artefacts may be introduced, and
interpolation filters also contribute a resampling MTF that reduces perceived image
sharpness.
The number of pixels can be reduced by downsampling. In this case, the signal
representing the image will be smoothed and so detail will be lost, and this will again
reduce perceived sharpness. The aim is to lose as little information as possible while
minimising aliasing artefacts.
It will be shown in sections 5.8 and 5.9 that image resampling also affects noise
and SNR.
The optimum practical solution for image resampling varies depending upon the
application and the nature of the image itself. Nevertheless, it is useful to have a
mathematical understanding of the basic idea of resampling. The following sections
briefly discuss upsampling and downsampling based on the introduction to sampling
theory given in chapter 3. In particular, it will be shown that the performance of
resampling methods can be evaluated from the resampling MTF, which can in
principle be included as part of the camera system MTF.

5.7.1 Upsampling
It is useful to think of the digital image in terms of a sampled signal. As described in
chapter 3, the sampled signal arises from the sampling and digitisation of the optical
image projected at the SP. Aliasing may be present in the sampled signal, and the
amount of aliasing depends on the value of the camera system cut-off frequency at
the time of image capture, μc , in relation to the sensor Nyquist frequency, μNyq .
Nevertheless, the aim when upsampling is simply to increase the sampling rate
without affecting the signal and image appearance.
For clarity, the following discussion will be restricted to resampling in 1D.
Extension to 2D is straightforward. Upsampling in principle requires reconstruction
of the continuous signal f (x ) from known sample values so that f (x ) can be
resampled at a new and more densely spaced grid of positions [27].
In practice, the f (x ) values at the new sample positions can be directly calculated
through convolution. The value of f (x ) at an arbitrary new sample location x can be


reconstructed from the discrete input image samples f̃(x) by convolving with a reconstruction kernel h(x),

f(x) = f̃(x) ∗ h(x) = Σi f̃(xi) h(x − xi).

This amounts to centring the kernel at the new sample location x, and then summing
the products of the known discrete sample values and kernel values at those
positions. The sum over i runs over the number of known sample values within
the range of the reconstruction kernel. The example shown in figure 5.18 extends
over four samples [27].
It was shown in chapter 3 that the ideal reconstruction filter is a sinc function that
extends over all space,
hideal(x) = sinc(x) = sin(πx)/(πx).
Here x is normalised to the pixel sample spacing so that Δxi = 1. The MTF of a sinc
function is a rectangle function,
MTFideal(μx) = rect(μx).

Figure 5.18. Image resampling in 1D. Here the non-ideal interpolation kernel h(x ) (green) centred at position
x extends over four input samples located at x1, x2, x3, and x4. The output function value f (x ) at position x is
shown in the lower diagram. The input image sample spacing is indicated by Δxi .


Figure 5.19. Resampling MTF for several example filter kernels. The sensor Nyquist frequency is located at
0.5 cycles per pixel. When upsampling or downsampling, the pixel units are always defined by the image with
the larger pixel sample spacing.

This is plotted in figure 5.19. The horizontal axis in the figure represents cycles per
pixel, and the sensor Nyquist frequency is located at 0.5 cycles per pixel prior to any
resampling. The maximum frequency content of the digital image itself is always
0.5 cycles/pixel, which is often referred to as the Nyquist frequency of the image.
The region to the left of the Nyquist frequency in figure 5.19 is referred to as the
passband, and the region to the right is referred to as the stopband. The rectangle
function is equal to unity in the passband and zero in the stopband. This ensures that
all frequencies below the Nyquist frequency are perfectly reconstructed and that all
frequencies above the Nyquist frequency are completely suppressed. Note that the
passband may already contain aliased content if the digital image was captured with
frequency content above the sensor Nyquist frequency included since this high
frequency content will have folded into the passband.
Since the sinc function extends over all space, approximations must be made in
practice. The simplest reconstruction kernel is defined by nearest-neighbour
interpolation,
hnn(x) = 1 for 0 ⩽ ∣x∣ < 1/2, and hnn(x) = 0 for ∣x∣ ⩾ 1/2.
Again x is normalised to the pixel sample spacing. The performance of this
reconstruction filter can be evaluated by comparing its transfer function with that of
the ideal interpolation function. The nearest-neighbour interpolation MTF is given by
MTFnn(μx ) = sinc(μx ) .


Figure 5.19 shows that the sharp transition between the passband and stopband is
attenuated. The decay in the passband blurs the image. Furthermore, there is
considerable frequency leakage into the stopband. Unless the replicated frequency
spectra of the original sampled image are sufficiently far apart, frequency leakage
due to non-ideal reconstruction will introduce spurious high frequency content
above the Nyquist frequency that can cause jagged edges to appear in the image.
Although jagged edges arise from non-ideal reconstruction, they can be interpreted as
aliasing artefacts in the present context since the spurious high frequency content
will become part of the passband when the image is resampled [27].
An improvement on nearest-neighbour interpolation is bilinear interpolation,
hbl(x) = 1 − ∣x∣ for 0 ⩽ ∣x∣ < 1, and hbl(x) = 0 for ∣x∣ ⩾ 1.

Since x is normalised to the pixel sample spacing, the bilinear reconstruction kernel
extends over two samples in 1D and four samples in 2D. The MTF is given by
MTFbl(μx ) = ∣sinc 2(μx )∣ .
Figure 5.19 shows that frequency leakage into the stopband is reduced when using
bilinear interpolation compared with nearest-neighbour interpolation.
An even better approximation to the sinc function is provided by bicubic
convolution [28],
hbc(x) = (α + 2)∣x∣³ − (α + 3)∣x∣² + 1 for 0 ⩽ ∣x∣ < 1,
hbc(x) = α∣x∣³ − 5α∣x∣² + 8α∣x∣ − 4α for 1 ⩽ ∣x∣ < 2,
hbc(x) = 0 for 2 ⩽ ∣x∣.

A commonly used value for α is −0.5. The reconstruction kernel extends over four
samples in 1D and sixteen samples in 2D. The MTF is given by the following
expression [29]:

MTFbc(μx) = (3/(πμx)²)[sinc²(μx) − sinc(2μx)] + (2α/(πμx)²)[3 sinc²(2μx) − 2 sinc(2μx) − sinc(4μx)].
Figure 5.19 shows that bicubic convolution with α = −0.5 exhibits reduced
attenuation in the passband and greatly reduced frequency leakage into the
stopband. Bicubic convolution is the standard interpolation kernel used by
Adobe® Photoshop. Many other interpolation methods exist. For example,
Lanczos resampling uses a kernel based on a windowed sinc function. Such methods
can give superior results compared to bicubic convolution but may introduce other
types of reconstruction artefact [27].
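The reconstruction-by-convolution described above can be sketched in a few lines of Python. The bicubic (Keys) kernel with α = −0.5 is used, boundary handling is ignored, and the test signal is arbitrary.

import numpy as np

def keys_kernel(x, a=-0.5):
    # Bicubic convolution kernel with parameter alpha (written here as a)
    x = np.abs(x)
    inner = (a + 2)*x**3 - (a + 3)*x**2 + 1        # branch for 0 <= |x| < 1
    outer = a*x**3 - 5*a*x**2 + 8*a*x - 4*a        # branch for 1 <= |x| < 2
    return np.where(x < 1, inner, np.where(x < 2, outer, 0.0))

def upsample(samples, new_x):
    # f(x) = sum_i f(x_i) h(x - x_i), with samples on the integer grid x_i = 0, 1, 2, ...
    xi = np.arange(len(samples))
    return np.array([np.sum(samples * keys_kernel(x - xi)) for x in new_x])

signal = np.sin(2*np.pi*0.08*np.arange(32))            # arbitrary slowly varying test signal
dense = upsample(signal, np.arange(2.0, 29.0, 0.25))   # 4x denser grid, away from the edges
print(np.round(dense[:8], 3))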
In summary, the aim when upsampling an image is to leave the continuous
representation of the signal intact, and simply increase the sampling rate and


associated pixel count. In the Fourier domain, this is equivalent to increasing the
spacing between the replicated spectra. Unfortunately, non-ideal reconstruction will
corrupt the signal and introduce a combination of image blur and jagged edges. The
image blur will result from attenuation in the passband, and jagged edges may
appear due to frequency leakage into the stopband. The latter will occur if the
replicated spectra of the original sampled image are sufficiently close together such
that spurious high frequencies are retained.

5.7.2 Downsampling
Recall that when upsampling an image, the signal needs to be reconstructed in order
to increase the sampling rate. In this case the interpolation filter acts as a
reconstruction filter. The size of the reconstruction kernel, h(x ), remains fixed as it
depends upon the pixel sample spacing of the original sampled image rather than the
upsampled image.
However, the interpolation filter plays a different role when downsampling an
image due to the fact that information will be discarded. In contrast to upsampling,
which will increase the separation between the replicated frequency spectra, down-
sampling will decrease the separation. When a digital image produced by a camera is
downsampled, the spectra will overlap and corrupt the passband. This is known as
undersampling, and the interpolation filter must construct the output by acting as an
anti-aliasing filter rather than a reconstruction filter. The interpolation filter should
ideally bandlimit the image spectrum to under half the new sampling rate. Although
the new signal will correspond to a smoothed lower-resolution representation of the
original photographic scene, bandlimiting to half the new sampling rate will
minimise aliasing.
A simple way to achieve the above goal is to increase the size of the interpolation
kernel in accordance with the scale reduction factor used for downsampling [27, 30],
f(x) = f̃(x) ∗ a h(ax) = a Σi f̃(xi) h(ax − axi).

Here a < 1 is the scale reduction factor. In contrast to upsampling, where Δxi = 1
corresponds to the input image sample spacing as indicated in figure 5.18, here
Δaxi = 1 corresponds to the output image sample spacing. In other words, the sum
over i covers a greater number of input pixels than the number used when
upsampling. For example, the bicubic convolution kernel may extend over many
input samples, but only four and sixteen output samples in 1D and 2D, respectively.
The filter frequency response will accordingly be narrower due to the reciprocal
relationship between the spatial and frequency domains. This relationship is
expressed by the following identity, where H is the FT of h,
H(μx/a) = FT{a h(ax)}.


The cut-off frequency where H → 0 will be reached at a lower spatial frequency.


Poorly implemented downsampling algorithms are sometimes encountered that
introduce aliasing by failing to include the scale reduction factor, a, in the
interpolation filter kernel. In such cases, a blur filter can be applied before down-
sampling in order to reduce aliasing.
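The widened-kernel form of the convolution can be sketched as follows; the bilinear (triangle) kernel is used here for brevity, and any of the kernels from section 5.7.1 could be substituted.

import numpy as np

def linear_kernel(x):
    # Bilinear (triangle) reconstruction kernel
    x = np.abs(x)
    return np.where(x < 1.0, 1.0 - x, 0.0)

def downsample(samples, a=0.5, kernel=linear_kernel):
    # f(x) = a * sum_i f(x_i) h(a(x - x_i)): widening the kernel by 1/a makes it
    # act as an anti-aliasing filter as well as an interpolator.
    xi = np.arange(len(samples))
    new_x = np.arange(0.0, len(samples), 1.0 / a)  # output samples 1/a input pixels apart
    return np.array([a * np.sum(samples * kernel(a * (x - xi))) for x in new_x])

signal = np.sin(2*np.pi*0.05*np.arange(64))        # arbitrary test signal
print(np.round(downsample(signal, a=0.5)[:8], 3))  # half the original pixel count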

5.8 Signal-to-noise ratio (SNR)


Signal-to-noise ratio (SNR) is a fundamental measure of camera IQ. Photographers can
follow the expose to the right (ETTR) exposure strategy in order to utilise the maximum
available SNR for a given camera system. ETTR is discussed in section 5.10.4.
Input-referred and output-referred units were introduced in section 3.9 of chapter 3.
Input-referred units express the signal and noise in terms of electron count ne, and
output-referred units express the signal and noise in terms of raw values. The raw
values can be expressed using digital numbers (DN) or identically using analog-to-
digital units (ADU).
SNR per photosite can be expressed as a ratio or alternatively can be measured in
stops or decibels. In terms of input-referred units,
SNR (ratio) = ne/ne,noise : 1
SNR (stops) = log2(ne/ne,noise)
SNR (dB) = 20 log10(ne/ne,noise).

Here ne,noise is the total signal noise [31]. This may include fixed pattern noise (FPN)
along with temporal noise, which must be added in quadrature. Sometimes only
temporal noise is included in the definition [32]. In terms of output-referred units,
SNR (ratio) = nDN/nDN,noise : 1
SNR (stops) = log2(nDN/nDN,noise)
SNR (dB) = 20 log10(nDN/nDN,noise).

In practice, SNR can be calculated in terms of output-referred units using the raw
data and the methods for measuring noise described in chapter 3. By using equation
(3.50) of chapter 3, the conversion factor g can then be used to convert output-
referred units to input-referred units,
ne = g n DN . (5.34)


The conversion factor g has units e−/DN and is defined by equation (3.51),
g = U/GISO.
The mosaic label has been dropped for clarity. The unity gain U is the ISO gain
setting that results in one electron being converted into one DN. Since g depends
upon the ISO gain G ISO, the relationship between input and output-referred units is
dependent on the associated ISO setting, S. Noise contributions such as read noise
that were not part of the original detected charge can also be converted to input-
referred units.
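A small worked example of these conversions is given below; the unity gain, ISO gain and measured raw values are hypothetical numbers chosen only to illustrate the arithmetic.

import math

U = 2.2                        # hypothetical unity gain (e-/DN)
G_ISO = 4.0                    # hypothetical ISO gain relative to the base setting
g = U / G_ISO                  # conversion factor in e-/DN at this ISO setting

n_DN, n_DN_noise = 1800.0, 42.0            # hypothetical mean raw signal and noise (DN)
n_e, n_e_noise = g * n_DN, g * n_DN_noise  # the same quantities input-referred (e-)

snr = n_e / n_e_noise                      # identical to n_DN / n_DN_noise
print(f"SNR = {snr:.1f}:1 = {math.log2(snr):.2f} stops = {20*math.log10(snr):.1f} dB")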

5.8.1 SNR and ISO setting


SNR per photosite varies as a function of signal. In order to increase SNR per
photosite for a given pixel pitch and sensor format, camera manufacturers can adopt
the following strategies:
• Increase the QE. This increases the rate at which incident photons are
converted to electrons for a given exposure, H. This in turn increases the
conversion factor, g, so that a given DN corresponds to a higher ne.
• Lower the read noise. This is particularly important at low signal levels.
• Increase FWC per unit sensor area. This increases the maximum achievable
signal and therefore the maximum achievable SNR.

SNR also varies as a function of ISO setting. However, ISO settings themselves are
characteristics of a camera and are not performance metrics. The metric of
significance is the SNR as a function of signal at a given ISO setting.
To understand this, recall from chapter 3 that the base ISO setting corresponds to
the gain setting, G ISO, that uses the least analog amplification and therefore allows
full use of the sensor response curve, either through utilisation of full-well capacity
(FWC) or saturation of the analog-to-digital converter (ADC). The base ISO
setting, Sbase , can be associated with an ISO gain G ISO = 1. A better QE and higher
FWC per unit sensor area are both favourable in terms of SNR; however, a better
QE can raise Sbase whereas a higher FWC per unit area can lower Sbase . In older
cameras, a very low Sbase is characteristic of poor QE. In newer cameras that have
good QE, a very low Sbase is characteristic of a very high FWC per unit sensor area.
This extra photographic capability can be utilised by taking advantage of the longer
exposure times available, for example, through the use of a tripod in landscape
photography.
It should also be remembered that ISO settings are independent of photosite area,
as explained in section 2.6 of chapter 2.

5.8.2 SNR: output-referred units


Figure 5.20 shows SNR per photosite plotted using output-referred units (raw values
expressed in DN) for the Olympus® E-M1 camera at a range of ISO settings.


Figure 5.20. SNR per photosite plotted as a function of raw level (DN) at a variety of ISO settings for the
Olympus® E-M1 camera. Here SNR is lowered at a given raw level when the ISO setting is raised since less
exposure has been used to obtain the same raw data.

Figure 5.20 was obtained using the experimental methods described in section 3.9
of chapter 3. SNR and raw level have been expressed in stops by taking the base 2
logarithm. Only read noise and photon shot noise have been included as noise
sources.
First, notice that the maximum SNR at a given ISO setting is obtained at the
highest raw value. This is the premise behind the ETTR technique discussed in
section 5.10.4. Each curve rises linearly at low exposure with slope approximately
equal to unity. The slope drops to one half at higher exposure levels as the square
root relationship between signal and photon shot noise dominates. The main reason
that SNR increases with exposure H is that photon shot noise is proportional to √H, and so the signal to photon-shot-noise ratio is also proportional to √H,

SNRph = ne/σph ∝ H/√H = √H. (5.35)
Second, notice that SNR is lowered as the ISO setting increases. It is commonly
assumed that higher ISO settings are inherently noisier, however this is not the case,
as discussed previously in section 3.9.3 of chapter 3. In fact, the curves in figure 5.20
indicate lower SNR as the ISO setting is raised since less photometric exposure H has
been used to obtain the same raw level at the higher ISO setting [33]. This is evident
from equation (5.34),
ne = g n DN .
A higher ISO setting lowers the conversion factor g, and this lowers the signal ne that
corresponds to a given raw level. This means that FWC cannot be utilised when the
ISO setting is raised above the base value. For example, the ISO 200 curve has been


obtained using one stop higher H than the ISO 400 curve. The base ISO setting
offers the highest maximum SNR since a higher H can be tolerated before raw
saturation.
In order to see that higher ISO settings can in fact offer improved SNR, it is
necessary to compared curves obtained at the same fixed exposure level rather than
the same raw level [33]. This can be illustrated using input-referred units.

5.8.3 SNR: input-referred units


Figure 5.21 shows the same data as figure 5.20 but has instead been plotted as a
function of photometric exposure H expressed in stops. This graph was obtained by
converting from output-referred units (DN) to input-referred units (electron count)
using the appropriate conversion factor g at each ISO setting. Subsequently, the base
2 logarithm of the photoelectron count ne was taken. The maximum resulting value
was then normalised to 12 since the camera uses a 12-bit ADC. In other words, the
electron count obtained from 12 stops of photometric exposure corresponds to the
maximum raw value.
An equivalent method for obtaining this graph from figure 5.20 is to replace raw
levels in stops on the horizontal axis by exposure in stops, and then to shift each
curve to the left according to the difference in the maximum number of stops of
exposure used relative to the base ISO setting. When SNR is plotted as a function of
electron count or exposure, the graph is referred to as a photon transfer curve.
Notice that higher ISO settings are seen to provide a higher SNR at low signal
levels, corresponding to the low-exposure or shadow regions in the output photo-
graph. This is known as shadow improvement. The advantage arises from better
signal-to-read-noise ratio as the ISO gain G ISO increases. As explained in section 3.9

Figure 5.21. SNR per photosite plotted as a function of exposure at a variety of ISO settings for the Olympus®
E-M1 camera. Higher ISO settings provide a SNR advantage in the low-exposure regions of the image.
However, highlight headroom is lost and raw DR is reduced.


of chapter 3, the programmable gain amplifier (PGA) amplifies all of the voltage
signal but only part of the read noise, specifically the contribution arising from
readout circuitry upstream from the PGA [32, 33].
The penalty for using a higher G ISO is that the ADC will saturate before FWC can
be utilised. Above the base ISO gain, the available electron-well capacity is halved
every time G ISO is doubled. Consequently, the maximum achievable SNR along with
raw DR will be lowered. This is indicated on figure 5.21 by the arrows that drop to
zero.
For the same reasons, a camera manufacturer can improve SNR at low signal
levels by using a higher conversion factor g. However, the ADC may saturate before
FWC is utilised since the ADC power supply voltage is fixed. The camera
manufacturer must balance these trade-offs when choosing optimal values [32].
The above analysis reveals that if photometric exposure H is unrestricted by
photographic conditions, it makes sense to use the base ISO setting, G ISO = 1, which
is defined by S = 200 on the camera used to plot figure 5.21. This enables the full
sensor response curve to be utilised and therefore maximises the use of H in
producing the voltage signal that is converted into raw data. This maximises SNR
since SNR increases as √H.
On the other hand, if photographic conditions restrict H to a fixed maximum
value and there is still headroom available at the top of the sensor response curve,
then SNR may be improved at low signal levels by using a higher ISO setting [33].
The above strategies for improving SNR can be implemented using the ETTR
technique discussed in section 5.10.4.

5.8.4 ISO invariance


Although figure 5.21 shows that higher ISO settings offer improved SNR at low
signal levels, the advantage gradually lessens each time the ISO setting is raised.
Eventually a value is reached where the upstream read noise dominates and a higher
ISO setting would bring no further advantage. For the Olympus® E-M1 used to
produce figure 5.21, all curves above S = 800 lie almost on top of each other. The
precise S above which the corresponding ISO gain would bring no SNR advantage is
referred to as the ISO-less setting.
The Olympus® E-M1 camera used to produce figure 5.21 has a 12-bit ADC and a
base ISO setting Sbase = 200. This means that 12 stops of exposure can be tolerated
before raw saturation occurs at S = 200, assuming that the sensor response curve and
ADC are both perfectly linear and that FWC corresponds to ADC saturation. At
higher ISO settings, a higher signal voltage is quantized for a given exposure level. If
the ISO-less setting is S = 800, then 10 stops of exposure can be tolerated before
saturation at S = 800, but only 9 stops can be tolerated at ISO 1600. In other words,
ISO settings above the ISO-less value bring no further SNR advantage and yet lead
to further loss of exposure headroom and raw DR.
It is therefore not advisable to raise S above the ISO-less value. In practice, it is
preferable to underexpose at the ISO-less setting if necessary in order to preserve raw
DR and then increase the output image lightness digitally using a tone curve in


editing software [33]. Unlike analog gain, digital gain applied using tone curves is a
simple multiplication of digital values and this cannot bring any SNR advantage.
Some recent cameras have such low levels of downstream read noise that the ISO-
less setting is very low and the camera can be described as being ISO invariant [34].
This means that if the camera is set to store raw files, the ISO setting can effectively
be left at its base value, and no SNR penalty will arise by applying digital gain
instead when processing the raw file.

5.8.5 SNR and pixel count


The discussion so far has centred on SNR per photosite (sensor pixel). However, this
metric is only directly comparable between camera sensors that have the same area
per photosite (sensor pixel size). Furthermore, SNR per photosite is only directly
comparable between cameras that use the same sensor format or size. This is due to
the fact that SNR depends on the size of the area over which it is measured [33, 35].
For example, consider a single photosite at the SP with signal n1 and associated
temporal noise σ1. For simplicity, assume that FPN has been eliminated. The SNR
for this photosite can be expressed as follows:
SNR1 = n1/σ1.
If the irradiance at the SP is uniform, the total signal n and total temporal noise σ for
a 2 × 2 block of photosites will be
n = n1 + n2 + n3 + n4 = 4n1
σ = √(σ1² + σ2² + σ3² + σ4²) = √(4σ1²) = 2σ1.

The total SNR will be


SNR = n/σ = 2 SNR1.
Therefore increasing the charge detection area by a factor of 4 through pixel binning
has increased the SNR by a factor of 2. For a uniform exposure at the SP, SNR is
seen to increase as the square root of the charge detection area. For a non-uniform
exposure, the increase in SNR will fall short of this value [33].
The explanation for the above lies in the fact that pixel binning is, up to a constant factor, equivalent to averaging over photosites. Since temporal noise is uncorrelated and is measured by
the standard deviation σ from the mean value of its statistical distribution, averaging
reduces temporal noise. On the other hand, the charge signal at a photosite is
completely correlated with its neighbours when the irradiance at the SP is uniform,
and so the signal is unchanged by averaging. In areas of the photographic scene that
contain details and gradients, the exposure at the corresponding areas of the SP will
not be uniform. In this case, averaging will smooth the signal and the reduction in
temporal noise will not be as large.
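The 2 × 2 binning argument can be checked with a short Monte Carlo sketch in which only photon shot noise is simulated and the illumination is uniform; the mean signal level and array size are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
mean_e = 1000.0                                        # arbitrary mean electrons per photosite
pixels = rng.poisson(mean_e, size=(1000, 1000)).astype(float)

snr_photosite = pixels.mean() / pixels.std()
binned = pixels.reshape(500, 2, 500, 2).sum(axis=(1, 3))   # 2 x 2 pixel binning
snr_binned = binned.mean() / binned.std()

print(f"SNR per photosite = {snr_photosite:.1f}, after 2 x 2 binning = {snr_binned:.1f}")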
Digital images appear to be less noisy after having been downsampled. This is
particularly noticeable when downsampling and subsequently downscaling an image


to a small size for displaying on the internet. The noise reduction achieved when
downsampling can be explained by the pixel binning example given above. If the
downsampling algorithm is implemented correctly as described in section 5.7, the
interpolation filter will construct the output by acting as an anti-aliasing filter rather
than a reconstruction filter, and will bandlimit the image spectrum to the Nyquist
frequency of the new sampling grid. The filtering operation has a similar effect to
averaging, and so the signal will be smoothed when details and gradients are present
in the scene. However, the temporal noise will be reduced by the anti-aliasing filter
since temporal noise is uncorrelated. The greatest reduction in noise will occur in
uniform areas of the scene.
In conclusion, downsampling yields larger pixels for the same display area, and
each of the new larger pixels will have a greater SNR per pixel. This means that the
downsampled image will have less noise at the pixel level, although noise is not
necessarily reduced at the image level. The noise reduction has been achieved by
discarding resolution [33].
Since a gain in SNR per pixel can be obtained by downsampling, an SNR metric
that is directly comparable between camera sensors that have different photosite
areas will be area-based rather than photosite-based. This is discussed in the next
section.

5.8.6 SNR per unit area


For a given sensor format, SNR for cameras that have different sensor pixel counts
should be compared at a fixed spatial scale [33]. For example, an appropriate metric
is SNR per unit area of the sensor,
$$\text{SNR per unit area} = \frac{\text{SNR per photosite}}{\sqrt{A_p}}.$$

Here Ap is the photosite area in microns squared. Although the signal is proportional to Ap, the square root is required since photon shot noise is proportional to √Ap. Alternatively, SNR could be normalised according to a specified percentage frame height or percentage area of the sensor,

$$\text{SNR per \% area} = \text{SNR per photosite} \times \sqrt{\frac{A_{\%}}{A_p}}. \qquad (5.36)$$
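As an illustration of how this normalisation works in practice, the helper below applies equation (5.36) to two hypothetical sensors. The function name and the numerical values are assumptions chosen purely for illustration.

```python
import math

def snr_per_percent_area(snr_per_photosite, photosite_area_um2, reference_area_um2):
    """Normalise a per-photosite SNR to a fixed reference area, as in eq. (5.36).

    SNR scales with the square root of the charge detection area, so the
    per-photosite value is multiplied by sqrt(A_% / A_p).
    """
    return snr_per_photosite * math.sqrt(reference_area_um2 / photosite_area_um2)

# Two hypothetical sensors with equal SNR per unit area but different
# photosite sizes yield the same area-normalised SNR:
print(snr_per_percent_area(40.0, photosite_area_um2=16.0, reference_area_um2=1000.0))
print(snr_per_percent_area(20.0, photosite_area_um2=4.0, reference_area_um2=1000.0))
```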

SNR per unit area depends upon factors such as


• QE per unit area.
• FWC per unit area.
• Read noise per unit area.

In practice, it is found that camera manufacturers are generally able to achieve
very similar QE per unit area over a wide range of photosite areas (sensor pixel
sizes). In CCD sensors, this naturally follows from the fact that the photosensitive
area at a given photosite decreases in proportion with the photosite area itself.

However, in CMOS sensors the area occupied by the transistor cannot be
proportionally reduced and so the QE per unit area becomes a function of
photosite area. Nevertheless, the QE of small CMOS photosites can be restored
through backside illumination (BSI) [31].
On the other hand, read noise does depend upon photosite area. In order that
SNR per unit area be independent of photosite area, read noise must scale in
proportion with the photosite spacing, i.e. the square root of the photosite area in
analogy with the ideal pixel-binning illustration given in the previous section. The
deviation from this square root relationship arises in practice because read noise for
a larger photosite is generally less than the read noise for the corresponding
aggregate of smaller photosites. Since the read noise introduced downstream from
the PGA is independent of photosite area, the read noise advantage of the larger
photosites only becomes apparent at higher ISO settings where the read noise
contribution upstream from the PGA dominates [33].

5.8.7 SNR: cross-format comparisons


Consider two cameras based on different sensor formats subject to the same uniform
level of exposure H. Assume that the sensors have the same QE and read noise per
unit sensor area. Since SNR is seen to increase as the square root of the charge
detection area, the larger sensor format will have a higher total SNR when measured
over the entire frame. In other words, there will be less noise at the image level.
This SNR relationship with sensor size is commonly observed in camera reviews
since image noise is typically compared using the same ISO settings on each camera.
However, it is important to realise that in such cases, the photographs being
compared will not have the same appearance characteristics. In order to place cross-
format comparisons on an equal footing, they should be based on equivalent
exposure settings that produce equivalent photographs, as described in section 5.1.
In particular, a different level of exposure is required at the SP when equivalent
photographs are taken.
When performing cross-format comparisons, SNR should ideally be normalised
according to a specified percentage height or percentage area of the sensor in
accordance with equation (5.36),

$$\text{SNR per \% area} = \text{SNR per photosite} \times \sqrt{\frac{A_{\%}}{A_p}}.$$

Here A% is the specified percentage area and Ap is the photosite area. The percentage
area normalisation not only takes into account different sensor pixel counts, it also
takes into account the different enlargement factors from the sensor dimensions to
the dimensions of the viewed output photograph when comparing equivalent
photographs [1].
Furthermore, SNR per % area should be compared using equivalent ISO settings
on each format. For example, a photograph taken at f = 100 mm, N = 2.8, and
S = 800 on APS-C should be compared with an equivalent photo taken at
f = 150 mm, N = 4, and S = 1600 on 35 mm full frame. In this case, the level of
exposure at the SP of the APS-C sensor will be double that of the 35 mm full-frame
sensor and so the advantage of the larger full frame detection area will be
approximately neutralized. However, the larger format offers additional photo-
graphic capability and potentially higher SNR when the smaller format does not
offer an equivalent ISO setting [1].
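The scaling between equivalent exposure settings on two formats can be sketched as a simple helper function. The sketch below is an illustration only; the function name is hypothetical, an equivalence ratio R = 1.5 is assumed between APS-C and 35 mm full frame, focal length and f-number are scaled by R, and the ISO setting is scaled by R² so that the total light projected onto the sensor is matched.

```python
def equivalent_settings(focal_length_mm, f_number, iso, ratio):
    """Map settings on a smaller format to equivalent settings on a larger format.

    `ratio` is the equivalence ratio R between the sensor diagonals. Focal
    length and f-number scale by R; the ISO setting scales by R**2 so that
    equivalent photographs receive the same total amount of light.
    """
    return {
        "focal_length_mm": focal_length_mm * ratio,
        "f_number": f_number * ratio,
        "iso": iso * ratio ** 2,
    }

# f = 100 mm, N = 2.8, S = 800 on APS-C (R ~ 1.5) maps to roughly
# f = 150 mm, N = 4.2, S = 1800, i.e. the f = 150 mm, N = 4, S = 1600
# values quoted above once rounded to standard photographic increments.
print(equivalent_settings(100, 2.8, 800, ratio=1.5))
```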

5.8.8 Raw ISO values


As described in chapter 2, Japanese camera manufacturers are now required to use
either SOS or recommended exposure index (REI) to specify ISO settings [4, 5].
Significantly, SOS and REI are based on the JPEG output image from the camera
and not the raw data.
The aim of SOS is to ensure that the output JPEG image is produced with a
standard mid-tone lightness, specifically middle grey, when a typical photographic
scene with a luminance distribution that averages to middle grey is metered using
average photometry. Camera manufacturers are free to use the sensor response
curve in any desired fashion provided the output JPEG image satisfies the standard
mid-tone lightness requirement [36].
Significantly, the measured SOS value takes into account both analog gain and
digital gain applied using the JPEG encoding tone curve. This means that the ISO
setting on a given camera model may have been defined using a different level of
analog gain compared to the ISO setting on a different camera model.
Comparing camera models using the same ISO settings (or equivalent ISO
settings when performing cross-format comparisons) is appropriate when comparing
the JPEG output from the cameras. Such comparisons are useful for photographers
who primarily rely on JPEG output since the nature of the JPEG tone curve and the
performance of the in-camera image-processing engine are fully taken into account.
When designing an image-processing engine, there are various trade-offs to be made.
For example, there is a trade-off between SNR at the mid-tone DOL and highlight
headroom above the JPEG clipping point. Camera manufacturers all correctly
adhere to the CIPA and ISO standards, but the trade-offs can be optimised
differently by placing middle grey at a different position on the sensor response
curve.
However, raw data should be used when comparing the full capability of a
camera in terms of IQ. SNR and raw DR comparisons should be carried out using
the same analog gain (or equivalent analog gain when performing cross-format
comparisons), in which case the ISO settings labelled on the camera are not strictly
valid.
In order to place raw comparisons on an equal footing, some sources of camera
IQ data define their own ISO settings based on the raw data rather than the JPEG
output. The only available method that can in principle be used with raw data is the
saturation-based ISO speed method described in chapter 2, which remains part of
the ISO 12232 standard [4]. When applied to the raw data, raw-based ISO speed
ensures that half a stop of raw headroom is retained below the raw clipping point
when a scene that averages to middle grey (18% relative scene luminance) is metered
using average photometry. Alternatively, the headroom requirement could be
removed from the definition. In any case, no correspondence should be expected
between raw ISO values determined in this way and the ISO settings labelled on the
camera.
Nevertheless, photographers who wish to use the ETTR technique in conjunction
with raw output could in principle replace the labelled camera ISO settings with the
raw ISO speed values. According to the analysis given in chapter 2, an average scene
luminance that is 12.8% of the maximum would saturate the raw output if the scene
is metered using average photometry and raw ISO speeds are used for the ISO
settings.

5.9 Raw dynamic range


The engineering definition of raw dynamic range (raw DR) was introduced briefly in
chapter 2 and is expanded upon below. This section also discusses the concept of
perceivable raw DR.

5.9.1 Raw dynamic range per photosite


Raw DR is conventionally defined per photosite (sensor pixel) using either input-
referred units (electron count) or output-referred units (raw DN or ADU).

Input-referred units
Raw DR per photosite describes the maximum DR that can be represented in the
raw data as a function of ISO setting. Using input-referred units, raw DR per
photosite is defined as follows:
$$\text{raw DR (ratio)} = \frac{n_{e,\mathrm{max}}}{\sigma_{e,\mathrm{read}}} : 1$$
$$\text{raw DR (stops)} = \log_2\!\left(\frac{n_{e,\mathrm{max}}}{\sigma_{e,\mathrm{read}}}\right)$$
$$\text{raw DR (decibels)} = 20 \log_{10}\!\left(\frac{n_{e,\mathrm{max}}}{\sigma_{e,\mathrm{read}}}\right).$$

Here ne,max is the maximum number of electrons that can be collected before the
ADC saturates. This defines the raw DR upper limit. By using the read noise σe,read as
the noise floor, the raw DR lower limit is defined by the signal per photosite where
the SNR = 1. This is also known as noise equivalent exposure. When expressed in
decibels, the convention adopted here is such that 1 stop ≡ 20 log10 2 ≈ 6.02 decibels
(dB).
At the base ISO setting, the raw DR upper limit is defined by FWC, assuming
that the photoelectron well completely fills before the ADC saturates. It was shown
in section 3.9 of chapter 3 and in section 5.8.3 above that read noise expressed in
electrons decreases as the ISO gain is raised from its base value, and so the raw DR
lower limit is similarly lowered or improved. However, the raw DR upper limit is
lowered from FWC by one stop each time the corresponding ISO setting is doubled
from its base value. The overall result is that raw DR goes down with increasing ISO
gain.
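The opposing trends described above can be illustrated with a toy model. The full-well capacity, read-noise values, and ISO settings below are invented for illustration and do not correspond to any particular camera.

```python
import math

full_well = 60000  # electrons at the base ISO setting (assumed)
# Assumed input-referred read noise (electrons) at each ISO setting; it
# typically falls as the analog gain is raised.
read_noise_e = {100: 6.0, 200: 4.0, 400: 3.0, 800: 2.5, 1600: 2.3}

for iso, sigma_read in read_noise_e.items():
    stops_of_gain = math.log2(iso / 100)
    n_max = full_well / 2 ** stops_of_gain  # upper limit drops one stop per doubling
    raw_dr_stops = math.log2(n_max / sigma_read)
    print(f"ISO {iso:5d}: raw DR = {raw_dr_stops:.2f} stops")
# The lower limit improves slowly (read noise falls), but the upper limit is
# lowered by a full stop per doubling, so raw DR decreases with ISO gain.
```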

Output-referred units
Using output-referred units, raw DR per photosite is defined as follows:
$$\text{raw DR (ratio)} = \frac{n_{\mathrm{DN,clip}}}{\sigma_{\mathrm{DN,read}}} : 1$$
$$\text{raw DR (stops)} = \log_2\!\left(\frac{n_{\mathrm{DN,clip}}}{\sigma_{\mathrm{DN,read}}}\right)$$
$$\text{raw DR (decibels)} = 20 \log_{10}\!\left(\frac{n_{\mathrm{DN,clip}}}{\sigma_{\mathrm{DN,read}}}\right).$$

Here nDN,clip is the DN corresponding to the raw clipping point, and σDN,read is the
read noise in DN.
The ISO gain compensates for reduced exposure and electron count by main-
taining the raw level in output-referred units at the expense of SNR. In other words,
the read noise in DN goes up with increasing ISO gain while the raw clipping point
remains constant. Therefore DR goes down with increasing ISO gain. This is
consistent with the definition given above using input-referred units.

5.9.2 Sensor dynamic range


Although raw DR per photosite describes the actual DR per pixel available in the
raw file as a function of ISO setting, this metric does not describe the DR that the
sensor is actually capable of delivering. Sensor DR per photosite can be defined using
input-referred units as follows:
$$\text{sensor DR (ratio)} = \frac{n_{e,\mathrm{FWC}}}{\sigma_{e,\mathrm{read,sensor}}} : 1$$
$$\text{sensor DR (stops)} = \log_2\!\left(\frac{n_{e,\mathrm{FWC}}}{\sigma_{e,\mathrm{read,sensor}}}\right)$$
$$\text{sensor DR (decibels)} = 20 \log_{10}\!\left(\frac{n_{e,\mathrm{FWC}}}{\sigma_{e,\mathrm{read,sensor}}}\right).$$

The sensor DR upper limit is defined by the signal at FWC, ne,FWC . The sensor DR
lower limit is defined by the contribution to the total read noise arising from the
sensor, σe,read,sensor . This is the contribution from components upstream from the
PGA. In terms of photoelectrons, the read noise contribution from electronics
downstream from the PGA is negligible at the ISO gain corresponding to the ISO-
less setting. Therefore σe,read,sensor can be defined as the read noise at the ISO-less
setting.

In practice, raw DR is less than the sensor DR only because the standard strategy
for quantizing the voltage signal does not overcome the limitations imposed by the
downstream electronics [33]. Camera manufacturers have recently begun to address
this issue by developing more sophisticated methods for quantizing the signal, such
as dual conversion gain [37].

5.9.3 Perceivable dynamic range


As a metric, raw DR has several drawbacks in relation to photography.
(i) Raw DR per photosite is not comparable between camera models that
have different photosite areas (sensor pixel counts), the reason being that
SNR depends upon the area over which it is measured, as described in
section 5.8.5. Consider two sensors that differ only by pixel count. The
sensor with the larger photosites offers higher SNR per photosite, however
a similar gain in SNR could be achieved by downsampling the digital
image obtained from the sensor with smaller photosites.
(ii) Raw DR per photosite is not comparable between camera models that
are based on different sensor formats. This is due to the different
enlargement factors required to produce output photographs at the
same display size.
(iii) The engineering raw DR lower limit is defined as the signal where the SNR
is unity. This signal is unlikely to be of sufficient quality when the output
photograph is viewed under standard viewing conditions.

Various attempts have been made to define alternative measures of raw DR.
Photographic dynamic range (PDR) [38] described in this section is a very simple
measure of perceivable raw DR that addresses the above issues.
In order to address issue (i) above, an appropriately normalised measure of SNR
needs to be introduced. One option is to use SNR per fixed percentage area of the
sensor, as already discussed in sections 5.8.5 and 5.8.7. PDR [38] uses the CoC as the
fixed percentage area, which offers several advantages.
It has already been shown in section 5.8.5 that signal, noise, and therefore
SNR are dependent on the spatial scale at which they are measured. Recall from
section 5.2 that detail on a smaller spatial scale than the CoC diameter cannot be
resolved by an observer of the output photograph under the viewing conditions
that specify the CoC. This suggests that the CoC is a suitable minimum spatial
scale at which to measure SNR as perceivable by an observer of the output
photograph.
As described in section 5.2, standard CoC diameters correspond with standard
viewing conditions. Standard viewing conditions assume that the photograph will be
displayed at A4 size and viewed at the least distance of distinct vision, Dv = 250 mm .
Alternatively, the photographer can calculate a custom CoC diameter based upon
the intended viewing conditions. Given the SNR per photosite, SNR per CoC can be
calculated as follows:

$$\text{SNR per CoC} = \text{SNR per photosite} \times \sqrt{\frac{\pi (c/2)^2}{A_p}}. \qquad (5.37)$$

Here Ap is the photosite area and c is the CoC diameter. As a metric, SNR per CoC
has two important properties.
• Photosite area and therefore sensor pixel count are accounted for since the
photosites are binned together up to the CoC area.
• Sensor format is accounted for since the CoC for a given format takes into
account the enlargement factor from the dimensions of the optical image
projected onto the SP to the dimensions of the viewed output photograph.

In order to address issue (iii) above, PDR uses a more appropriate raw DR lower
limit than the signal that provides an SNR per photosite = 1. An appropriate lower
limit is subjective since different observers have their own expectations in terms of
IQ. In [38], the PDR lower limit is defined as the signal that provides an SNR per
CoC = 20. This can be expressed by rearranging equation (5.37). Using output-
referred units,
PDR lower limit (DN) ≡ Raw level (DN) where
$$\text{SNR per photosite} = \frac{20}{\sqrt{\pi (c/2)^2 / A_p}}.$$

If raw level and SNR are both expressed in stops by taking the base 2 logarithm, the
definition becomes
PDR lower limit (stops) ≡ Raw level (stops) where
$$\text{SNR per photosite} = \log_2\!\left(\frac{20}{\sqrt{\pi (c/2)^2 / A_p}}\right).$$

Again using output-referred units, the PDR upper limit is defined by the raw
clipping point (RCP). The PDR upper limit may similarly be expressed in stops,
$$\text{PDR upper limit (stops)} = \log_2 \mathrm{RCP}.$$
Now PDR in stops can be defined as follows:
$$\text{PDR (stops)} = \text{PDR upper limit (stops)} - \text{PDR lower limit (stops)}.$$
The PDR lower limit in output-referred units will increase as the ISO gain increases,
and so PDR will decrease upon raising the ISO setting, S.
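A rough numerical sketch of a PDR calculation for a single hypothetical camera is given below. The sensor parameters, the simple shot-plus-read noise model, and the bisection search are all assumptions made for illustration; they are not taken from [38].

```python
import math

# Hypothetical full-frame sensor (assumed values, for illustration only)
photosite_area = 35.0        # A_p in um^2 (~5.9 um pitch)
coc_diameter = 30.0          # CoC c in um (standard viewing conditions)
full_well = 60000            # electrons
read_noise_e = 3.0           # input-referred read noise in electrons
gain_dn_per_e = 16383 / full_well   # 14-bit raw clipping point assumed

area_ratio = math.pi * (coc_diameter / 2) ** 2 / photosite_area

def snr_per_photosite(n_e):
    """Simple photon shot-noise plus read-noise model."""
    return n_e / math.sqrt(n_e + read_noise_e ** 2)

# Find the electron count where SNR per CoC = 20, i.e. where
# SNR per photosite = 20 / sqrt(area_ratio), by bisection.
target = 20 / math.sqrt(area_ratio)
lo, hi = 1e-3, float(full_well)
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if snr_per_photosite(mid) < target else (lo, mid)

pdr_lower_dn = mid * gain_dn_per_e
pdr_stops = math.log2(16383) - math.log2(pdr_lower_dn)
print(f"PDR lower limit ~ {pdr_lower_dn:.1f} DN, PDR ~ {pdr_stops:.1f} stops")
```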
PDR curves for several example cameras are shown in figure 5.22. It should be
noted that the ISO values in the plot are the ISO settings labelled on the camera, and
these are defined according to the camera JPEG output rather than the raw data.
The comparison could be improved by plotting PDR according to the raw ISO
speed values discussed in section 5.8.8. Nevertheless, it is apparent from figure 5.22

Figure 5.22. PDR as a function of ISO setting for a selection of cameras [38].

that similar PDR values are obtained when PDR is compared using equivalent ISO
settings, in accordance with equivalence theory. The highest PDR values are
obtained on the larger formats where equivalent ISO settings do not exist on the
smaller formats.
Not all of the raw DR will be transferred to the output photograph in general.
Indeed, the image DR is dependent on the tone curve used when converting the raw
file into the output image file, as discussed in section 2.3 of chapter 2. Furthermore,
the image DR may be compressed or tonemapped into the DR of the display
medium, as described in section 2.13 of chapter 2. Nevertheless, PDR provides an
upper bound on the image DR that is in principle just perceivable when the output
photograph is viewed under the specified viewing conditions.

5.10 Practical strategies


Acquiring a camera and lens with the highest technical IQ scores does not
automatically guarantee higher IQ in practice. For a given choice of camera and
lens, various photographic techniques can be employed in order to utilise the full IQ
potential of the camera system.
Two fundamental aspects of IQ that can be optimised through photographic
practice are resolution and noise. This section discusses the following techniques:
• Object resolution maximisation. This can be achieved by maximising the
camera system RP.

• Diffraction softening. In landscape photography, a large DoF is required
without noticeable diffraction softening. An optimum f-number can be found
that balances these requirements.
• Non-destructive noise reduction. These methods increase SNR without affect-
ing resolution.
• Exposing to the right (ETTR). This technique maximises overall SNR by
pushing the raw histogram as far to the right as possible without clipping the
highlights.

5.10.1 Object resolution


In order to maximise the detail captured of an object positioned at the OP, the
camera system RP should be as high as possible. This requires a camera with a high
sensor pixel count and a lens with high RP. Furthermore, a low f-number should be
selected that maximises the lens RP.
It is also advisable to use a tripod to eliminate image blur caused by camera shake,
which can be described by a jitter MTF [14]. Camera system RP will be reduced if
the jitter cut-off frequency drops below that of the camera system at the selected f-
number.
For a given camera system, the following techniques can be used to take
advantage of the available RP:
• Use light with smaller wavelengths.
• Lower the f-number.

If the magnification is allowed to vary, then the following can additionally be
utilized:
• Move closer to the OP.
• Increase the lens focal length.
• Increase the refractive index of object space relative to image space.

These methods are discussed in more detail below; a short numerical sketch follows the list.


1. Use light with smaller wavelengths
Equation (5.32) reveals that the diffraction cut-off frequency, μc,diff , is
inversely related to λ,
$$\mu_{c,\mathrm{diff}} = \left(\frac{n}{n'}\right) \frac{1}{\lambda N_w}.$$
The maximum achievable resolution can therefore be increased by using light
with smaller wavelengths. For this reason, near ultra-violet light is often used
in microscopy.
2. Lower the f-number
For a fixed magnification, the bellows factor b remains fixed,
$$b = 1 + \frac{|m|}{m_p}.$$

Equation (5.32) can be written in the form


$$\mu_{c,\mathrm{diff}} = \left(\frac{n}{n'}\right) \frac{1}{\lambda b N}. \qquad (5.38)$$
Evidently the cut-off frequency due to diffraction, μc,diff , can be raised by
lowering the f-number, N. This is equivalent to increasing the lens EP diameter, D.
In practice, lens aberrations are more severe at the lowest f-numbers since
the EP diameter will be larger. Although aberrations do not in principle
change the lens cut-off frequency, the lens MTF can be lowered dramatically
in the region close to μc,diff [11]. If the camera system cut-off frequency, μc , is
defined to be the frequency at which the camera system MTF drops to a
small percentage value such as 10% rather than zero, then aberrations can
reduce μc . An optimum f-number or ‘sweet spot’ exists for a given lens. This
is typically higher than the lowest f-number available, particularly on larger
formats where aberration correction can be more difficult due to the larger
possible maximum EP diameter. On the other hand, lenses used in smart-
phone cameras are essentially diffraction limited.
3. Move closer to the object plane
If the magnification m is allowed to vary, then m and the bellows factor b
both increase when the OP is brought closer to the lens, and so μc,diff
decreases according to equation (5.38). However, μc,diff is the image-space
cut-off frequency due to diffraction and is measured at the SP. When m
changes, the scene content similarly changes and the spatial frequencies no
longer correspond to those at the previous m.
To observe the effect on object resolution, the quantity of interest is the
object-space cut-off frequency, μc,diff (OP), measured at the OP. This is
directly related to μc,diff via the magnification,
$$\mu_{c,\mathrm{diff}}(\mathrm{OP}) = |m|\, \mu_{c,\mathrm{diff}}.$$

Substituting into equation (5.38) and utilising equation (1.27) of chapter 1
yields the following result:
$$\mu_{c,\mathrm{diff}}(\mathrm{OP}) = \left(\frac{n}{n'}\right) \frac{D}{\lambda (s - s_{\mathrm{EP}})}. \qquad (5.39)$$

Here D = f /N is the EP diameter. The object distance s is measured from the
first principal point, and sEP is the distance along the OA from the first
principal point to the EP. Evidently μc,diff (OP) increases when s is reduced.
4. Increase the focal length (for fixed object distance and f-number)
For a given object distance s and f-number N, increasing the focal length
increases the diffraction cut-off frequency at the OP, μc,diff (OP). This follows
from equation (5.39) and is another means of increasing the lens EP diameter,
$$D = \frac{f}{N}.$$

5. Increase the ratio n: n′
Object resolution can be improved by increasing the refractive index of
object space relative to image space. Although not practical in everyday
photography, this principle is used in microscopy through the use of object-
space immersion oil with a high refractive index, typically of order n ≈ 1.5. In
this case, equation (5.39) is usually expressed in the following more general
form:
$$\mu_{c,\mathrm{diff}}(\mathrm{OP}) = \left(\frac{n}{n'}\right) \frac{2 \sin U}{\lambda} = \frac{2\,\mathrm{NA}}{n' \lambda}.$$
Here NA = n sin U is the object-space numerical aperture defined by the real
marginal ray angle U, and typically n′ = 1 (air). According to section 1.5.10
of chapter 1, the increase in maximum achievable object-space resolution
occurs at the expense of the image-space numerical aperture and irradiance at
the image plane.
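The numerical sketch below illustrates items 2–4 using equation (5.39). The object distance, focal lengths, and f-numbers are arbitrary example values; the entrance pupil is assumed to coincide with the first principal point (s_EP = 0) and n = n′ = 1.

```python
def object_space_cutoff(wavelength_mm, focal_length_mm, f_number,
                        object_distance_mm, s_ep_mm=0.0, n_ratio=1.0):
    """Object-space diffraction cut-off frequency, equation (5.39), in cycles/mm.

    Assumes the entrance pupil distance s_EP is measured from the first
    principal point; s_EP = 0 and n/n' = 1 are simplifying assumptions.
    """
    pupil_diameter = focal_length_mm / f_number          # D = f / N
    return n_ratio * pupil_diameter / (wavelength_mm * (object_distance_mm - s_ep_mm))

wl = 550e-6   # green light, 550 nm expressed in mm

# Item 2: lowering the f-number raises the object-space cut-off frequency.
print(object_space_cutoff(wl, 50, 8.0, 2000))   # ~ 5.7 cycles/mm at the OP
print(object_space_cutoff(wl, 50, 2.8, 2000))   # ~16.2 cycles/mm

# Item 3: moving closer to the object plane also raises it.
print(object_space_cutoff(wl, 50, 2.8, 1000))   # ~32.5 cycles/mm

# Item 4: so does increasing the focal length at fixed s and N.
print(object_space_cutoff(wl, 100, 2.8, 2000))  # ~32.5 cycles/mm
```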

5.10.2 Diffraction softening


As discussed in the previous section, a low f-number is required in order to raise the
diffraction cut-off frequency and maximise object resolution. In the presence of
residual aberrations, an f-number slightly higher than the minimum available may
be the optimum required.
Although lowering the f-number can increase object resolution, the DoF will
become shallower. However, a deep DoF is required in landscape photography, and
a deep DoF requires a large f-number. Although increasing the f-number can lower
the camera system RP and therefore lower object resolution, this will not be
noticeable when the output photograph is viewed under standard viewing con-
ditions. The main effect of increasing the f-number is that diffraction softening will
become more noticeable.
Recall that increasing the f-number not only reduces the diffraction cut-off
frequency; the diffraction MTF at low spatial frequencies is also reduced. In turn,
this reduces the camera system MTF at the low spatial frequencies relevant to
standard viewing conditions. Diffraction softening is a term used to describe the
resulting decrease in perceived image sharpness that occurs.
In landscape photography, perceived image sharpness may initially increase as
the f-number is raised from its minimum value due to the lessening effect of lens
aberrations on reducing the camera system MTF at low spatial frequencies, but
diffraction softening will eventually begin to dominate. In practice, an optimum f-
number can be found that provides both sufficient DoF and sufficient perceived
image sharpness when the output photograph is viewed under standard viewing
conditions. The optimum f-number on smartphone cameras will likely be the lowest
available; such lenses are essentially diffraction-limited due to the small maximum
achievable EP diameter.

In section 5.2, it was shown that the conditions under which the output photo-
graph is viewed can be described by the CoC diameter, c. A noticeable drop in
perceived image sharpness may not be expected until the Airy disk diameter
approaches the CoC diameter:
$$2.44\, \lambda N \approx c. \qquad (5.40)$$
For example, consider an output photograph from a 35 mm full-frame camera
viewed under standard viewing conditions. In this case the viewing distance is
L = Dv , where Dv = 250 mm is the least distance of distinct vision, and the optical
image at the SP is enlarged by a factor X = 8, which corresponds with an A4 print.
According to equation (5.30), these conditions define a CoC diameter c = 0.030 mm.
Rearranging equation (5.40) and substituting c = 0.030 mm indicates that a
noticeable drop in perceived image sharpness is likely to occur at an f-number N ≈ 22.
Consequently, the optimum f-number is likely to be one or two stops below this value,
i.e. N = 16 or N = 11 on the 35 mm full-frame format. An accurate estimation would
require all contributions to the camera system MTF to be taken into account.
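The arithmetic behind this estimate is easily reproduced. The short sketch below assumes green light with λ = 550 nm and uses the CoC diameter quoted above.

```python
wavelength_mm = 550e-6  # green light assumed

def onset_f_number(coc_mm):
    """f-number at which diffraction softening becomes noticeable, eq. (5.40)."""
    return coc_mm / (2.44 * wavelength_mm)

print(f"35 mm full frame (c = 0.030 mm): N ~ {onset_f_number(0.030):.0f}")  # ~22
# Working one or two stops below this value gives the N = 16 or N = 11 estimate.
```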

Diffraction softening and sensor format


Diffraction softening is independent of sensor pixel count. Although all contribu-
tions to the camera system MTF affect perceived image sharpness, the major factor
in determining a noticeable onset of the diffraction softening contribution is the size
of the Airy disk in relation to the CoC. For a specified set of viewing conditions, the
prescribed CoC diameter c depends only upon sensor format. For smaller sensor
formats, the required enlargement is greater and so the CoC diameter will be
proportionally smaller,
$$c_2 = \frac{c_1}{R}.$$
Here 1 and 2, respectively, denote the larger and smaller formats, and R is the
equivalence ratio between the sensor diagonals defined in section 5.1.
Using the above formula, the estimate for the optimum f-number required on the
APS-C and Micro Four ThirdsTM formats is N = 11 and N = 8, respectively, rather
than the N = 16 value on 35 mm full frame. These are estimates for the optimum
f-number that provides both sufficient DoF and sufficient perceived image sharpness
when the output photograph is viewed under standard viewing conditions.
Recall from section 5.1 that equivalent photographs obtained from different
formats require equivalent f-numbers,
$$N_2 = \frac{N_1}{R}.$$
Here N1 and N2 are the respective f-numbers corresponding to the larger and smaller
formats. Consequently, the above analysis is entirely consistent with equivalence
theory. For example, a focal length f1 = 150 mm and f-number N1 = 11 on a 35 mm full-frame camera will produce a photograph with the same diffraction softening as f2 = 100 mm and N2 = 8 on an APS-C camera.

5.10.3 Non-destructive noise reduction


Various image processing filters can be applied to the raw data or output digital
image in order to reduce noise. Unfortunately, noise filters affect resolution or image detail, and for this reason their application is described as noise filtering rather than noise reduction. However, there are various noise reduction procedures that can be carried
out to reduce noise in the raw data without affecting resolution. Examples include
frame averaging, dark frame subtraction (DSNU compensation), and flat field
correction (PRNU compensation). These procedures are non-destructive as they
do not filter the data.
• Frame averaging
Temporal noise can be reduced to very low levels by frame averaging. To use
this technique, a number of frames must be taken in quick succession using
identical camera settings. The scene radiance distribution must remain
constant from frame to frame, and so the technique cannot be used with
moving subjects.
Since each frame will be identical apart from the statistical variation in
temporal noise, averaging M frames leaves the mean raw value nDN
corresponding to a given photosite unchanged,
$$\langle n_{\mathrm{DN}} \rangle_M = \frac{1}{M} \sum_{m=1}^{M} n_{\mathrm{DN}} = \frac{M\, n_{\mathrm{DN}}}{M} = n_{\mathrm{DN}}.$$

On the other hand, temporal noise adds in quadrature according to equation
(3.53) of chapter 3,
$$\sigma_{\mathrm{total}} = \sqrt{\sum_{m=1}^{M} \sigma_m^2}.$$

This means that averaging M frames decreases the temporal noise by a factor √M,
$$\langle \sigma_{\mathrm{DN}} \rangle_M = \frac{1}{M} \sqrt{M \sigma_{\mathrm{DN}}^2} = \frac{\sigma_{\mathrm{DN}}}{\sqrt{M}}.$$
Equivalently, averaging M frames increases the SNR by a factor √M [39]. For example, averaging 16 frames will increase SNR by a factor of 4. (A numerical sketch of frame averaging is given after this list.)
• Dark-frame subtraction
Recall from chapter 3 that DSNU (dark-signal nonuniformity) is defined as
FPN that occurs in the absence of irradiance at the SP, and that the major
contribution arises from DCNU (dark-current nonuniformity). Nevertheless,
DSNU is still present when the imaging sensor is exposed to light. Although
cameras will estimate and subtract the average dark signal on a row or
column basis, this procedure will not eliminate the DCNU contribution since
this arises from variations in individual photosite responses over the SP.

A way to reduce DSNU is to enable the in-camera long-exposure noise
reduction (LENR) function. A frame taken in the absence of irradiance at the
SP is defined as a dark frame. A dark frame will only contain DSNU. If a
photographer is using a long exposure duration, enabling the LENR function
instructs the camera to take a dark frame of equal duration immediately after
the standard frame. This dark frame is then subtracted from the standard
frame before the raw data file is written. However, LENR has several
drawbacks:
1. The standard frame and dark frame both contain temporal noise [33].
Since temporal noise adds in quadrature, subtracting the dark frame
increases temporal noise in the raw data by a factor √2,
$$\sqrt{\sigma_{\mathrm{DN}}^2 + (-\sigma_{\mathrm{DN}})^2} = \sqrt{2}\, \sigma_{\mathrm{DN}}.$$

2. The photographer has to wait for the dark frame to be taken after every
standard frame.
3. LENR may not remove any pattern component of non-Gaussian read
noise since this may vary from frame to frame.
For photographers who specialise in long-exposure photography, an alter-
native strategy is to construct a set of dark frame templates for various
exposure durations t and ISO settings S. Each template can be made by
averaging together many dark frames in order to minimize temporal noise
and isolate DSNU. Subtracting the template from the standard frame will
only increase temporal noise by a factor √(1 + 1/M), where M is the number
of dark frames used to make the template. If enough dark frames are used,
this process can also isolate any overall pattern that may arise from non-
Gaussian read noise.
• Flat-field correction
Recall from chapter 3 that PRNU (pixel response nonuniformity) increases in
proportion with exposure level and therefore electron count,
$$n_{e,\mathrm{PRNU}} = k\, n_e.$$

Since the total noise will be larger than the PRNU, the proportionality
constant k places a limit on the maximum achievable SNR [32, 33].

$$\mathrm{SNR} \leqslant \frac{n_e}{n_{e,\mathrm{PRNU}}} = \frac{1}{k}.$$

Attempts to remove PRNU are rarely made in normal photographic
situations. However, PRNU is often compensated for in astrophotography
through the use of a flat-field template. This is constructed by averaging over a
set of flat-field frames, where each flat-field frame is an image of a uniform
surface taken under uniform illumination. Frame averaging the flat-field
frames minimizes the temporal noise and isolates the PRNU. When any standard frame is taken, the standard frame is divided by the flat-field template and then multiplied by the average raw value of the flat-field template.
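The frame-averaging result quoted in the first bullet point can be verified numerically. The sketch below is illustrative only; the signal level, read noise, and number of frames are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

mean_electrons = 200    # assumed uniform signal per photosite
read_noise = 5.0        # assumed temporal read noise (electrons, Gaussian)
M = 16                  # number of frames to average

def capture_frame(shape=(512, 512)):
    """One simulated frame: photon shot noise plus Gaussian read noise."""
    return rng.poisson(mean_electrons, shape) + rng.normal(0, read_noise, shape)

single = capture_frame()
stack = np.mean([capture_frame() for _ in range(M)], axis=0)

snr_single = single.mean() / single.std()
snr_stack = stack.mean() / stack.std()
print(f"SNR of a single frame    : {snr_single:.1f}")
print(f"SNR of {M}-frame average  : {snr_stack:.1f}")
print(f"Improvement factor       : {snr_stack / snr_single:.2f} (sqrt({M}) = {np.sqrt(M):.1f})")
```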

5.10.4 Exposing to the right (ETTR)


Exposing to the right (ETTR) is an exposure strategy used with raw files rather than
JPEG images. The aim is to maximise IQ in terms of SNR and raw DR rather than
produce a JPEG image at the correct mid-tone lightness. After the raw data has been
captured, the correct JPEG mid-tone lightness can be restored by applying an
appropriate tone curve to the raw file using an external raw converter. This
maintains the higher SNR, and also allows the full raw DR to be utilised.
The ETTR technique takes advantage of the analysis given in section 5.8. The
general concept involves ensuring that the raw histogram is always pushed to the
right as far as possible without clipping the highlights. Raw histograms were
discussed in section 2.4.1 of chapter 2.
The ETTR concept takes a different form depending upon the type of photo-
graphic situation encountered. These photographic situations can be broadly
classified into two types according to any restrictions on the level of photometric
exposure H required at the SP.
• The photographer is free to vary 〈H 〉.
• The photographic conditions dictate that 〈H 〉 must be kept fixed.

The ETTR method requires that exposure decisions be made using the raw data
rather than the JPEG output image. However, camera LCD displays and electronic
viewfinders show the image histogram corresponding to the JPEG output even when
the camera is set to record raw files. Nevertheless, open-source firmware solutions
for displaying raw histograms are available. An alternative strategy is to exper-
imentally determine the difference between the JPEG clipping point and raw
clipping point under typical scene illumination. If the difference is known to be M
stops, then the photographer can safely overexpose by M stops beyond the right-
hand side of the JPEG image histogram when implementing the ETTR method.
On traditional DSLRs, the histogram needs to be checked after the photograph has
been taken. Cameras with liveview or mirrorless cameras with an electronic
viewfinder offer a ‘live histogram’ that can be monitored in real time.

5.10.5 ETTR: variable exposure


At a given ISO setting, the maximum achievable SNR is obtained when the
photographer is free to increase the exposure at the SP. This follows from the fact
that photon shot noise is proportional to √H, and so the signal-to-photon-noise ratio is also proportional to √H according to equation (5.35),
$$\mathrm{SNR}_{\mathrm{ph}} = \frac{n_e}{n_{\mathrm{ph}}} \propto \frac{H}{\sqrt{H}} = \sqrt{H}.$$

The average exposure at the SP can typically be increased when photographing
static subjects in good lighting conditions, and in low light provided a tripod can be
used to remove restrictions imposed by camera shake. In other words, 〈H 〉 can be
increased by lowering the f-number for a fixed exposure duration t or by increasing t
for a fixed f-number. In both cases the histogram will be pushed to the right. The
ETTR method requires that the histogram be pushed as far to the right as possible
without clipping. If the camera is in aperture priority mode, positive exposure
compensation can be applied to push the histogram to the right.
As described in section 3.7 of chapter 3, section 5.8, and section 5.9, available raw
DR increases upon lowering the ISO setting, and available raw DR is maximised at
the base ISO setting. Therefore, raw DR can be maximised by implementing the
ETTR technique at the base ISO setting.
In order to give further insight into the above statements, assume that the camera
is in manual mode and a moderate ISO setting has been selected along with a
combination of N and t that avoids highlight clipping. Increasing t or opening up the
aperture by lowering N will both increase 〈H 〉 and shift the histogram to the right. If
the photographic conditions allow a lower N or longer t to be selected even after
ETTR has been achieved at the selected ISO setting, a potentially higher SNR can
be achieved by lowering the ISO setting. If the ISO setting is lowered by one stop by
halving its numerical value, the right edge of the histogram will shift to the left of the
ETTR limit by one stop. The SNR will typically decrease slightly, however lowering
the ISO setting increases the exposure headroom available for ETTR. In the present
example, the extra stop available at the highlight end of the histogram could be
utilised by lowering N or lengthening t to further increase 〈H 〉, leading to an
improvement in SNR beyond that achievable at the higher ISO setting. If the
photographic conditions continue to allow a lower N or longer t, it logically follows
that the highest possible SNR and therefore the maximum raw DR will eventually
be achieved by lowering the ISO setting to the base value and applying the ETTR
method. This ensures the maximum possible amount of light from the scene is
utilized.
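The reasoning above can be made concrete with a toy shot-plus-read noise model. The electron counts, read-noise values, and ISO settings below are invented for illustration and do not describe a specific camera.

```python
import math

def snr(n_e, read_noise_e):
    """SNR per photosite for a simple shot-noise plus read-noise model."""
    return n_e / math.sqrt(n_e + read_noise_e ** 2)

# Suppose ETTR at ISO 400 collects 10000 electrons in the brightest photosites.
# Lowering the ISO setting to the base value of 100 frees up two stops of raw
# headroom; if the conditions allow N to be lowered or t lengthened by two
# stops, four times as much light is collected. Read-noise values are assumed.
print(snr(10000, read_noise_e=3.0))   # ETTR at ISO 400          -> SNR ~ 100
print(snr(40000, read_noise_e=5.0))   # ETTR at the base setting -> SNR ~ 200
# Capturing more light at the base ISO setting wins despite the higher
# input-referred read noise there.
```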

5.10.6 ETTR: fixed exposure


In certain photographic scenarios, 〈H 〉 is limited to a fixed maximum value. Typical
situations include hand-held photography in low light and action photography.
For example, consider an action photographer using the camera in manual mode.
Assume that the lowest available f-number N has been selected in order to open up
the aperture and achieve the shallowest possible DoF, and an exposure duration t
has been selected that is just short enough to freeze the appearance of the moving
action. If the scene lighting conditions remain fixed, 〈H 〉 cannot be increased any
further because doing so would require t to be lengthened, which in turn could cause
subject motion blur.
In the above scenario, the photographic conditions prevent additional light from
being utilized to increase SNR, and so the histogram will not in general be
positioned as far to the right as possible. In this case, an alternative ETTR method
exists for increasing SNR [33]. Specifically, read noise can be lowered by increasing
the ISO setting S. As explained in section 5.8.3, read noise decreases upon raising S
provided H is fixed, and so signal-to-read-noise ratio goes up. This is referred to as
shadow improvement. Since each doubling of S removes access to the uppermost stop
of the sensor response curve, the histogram is pushed to the right.
If the camera is in manual mode, the fixed-exposure ETTR method should be
implemented as follows:
1. Select the base ISO setting and maximise 〈H 〉 by using the lowest N and
longest t that the photographic conditions allow.
2. If headroom is still available, increase the ISO setting until the histogram is
pushed as far to the right as possible without clipping.

Step 1 utilizes the variable-exposure ETTR method described in the previous section
as far as the photographic conditions allow. This should always be applied first since
the greatest increase in total SNR is achieved by capturing more light. Step 2 then
utilizes the fixed-exposure ETTR method based on shadow improvement. In this
case, any further increase in total SNR is achieved in the low-exposure shadow
regions of the image.
Finally, the fixed-exposure ETTR method should only be applied up to the
camera-dependent ISO-less setting defined in section 5.8.4. Above the ISO-less
setting, no further shadow improvement can be achieved but available raw DR
decreases.
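A toy model of shadow improvement is sketched below. It assumes a fixed exposure H giving a fixed electron count in a shadow region, together with an input-referred read noise that falls as the ISO setting is raised up to an assumed ISO-less setting; all values are invented for illustration.

```python
import math

# Assumed input-referred read noise (electrons) versus ISO setting for a
# hypothetical camera; the values flatten out at the 'ISO-less' setting.
read_noise_e = {100: 6.0, 200: 4.5, 400: 3.2, 800: 2.6, 1600: 2.4, 3200: 2.4}

n_e_shadow = 50   # electrons collected in a deep shadow region (H is fixed)

for iso, sigma_read in read_noise_e.items():
    snr = n_e_shadow / math.sqrt(n_e_shadow + sigma_read ** 2)
    print(f"ISO {iso:5d}: shadow SNR = {snr:.2f}")
# Shadow SNR improves as the ISO setting is raised at fixed exposure, but the
# improvement stops at the ISO-less setting, above which only raw DR is lost.
```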

References
[1] Rowlands D A 2018 Equivalence theory for cross-format photographic image quality
comparisons Opt. Eng. 57 110801
[2] Rowlands D A 2017 Physics of Digital Photography 1st edn (Bristol: Institute of Physics
Publishing)
[3] Kingslake R and Johnson R B 2010 Lens Design Fundamentals 2nd edn (New York:
Academic)
[4] International Organization for Standardization 2006 Photography—Digital Still Cameras—
Determination of Exposure Index, ISO Speed Ratings, Standard Output Sensitivity, and
Recommended Exposure Index, ISO 12232:2006
[5] Camera and Imaging Products Association 2004 Sensitivity of Digital Cameras CIPA DC-
004
[6] Williams J B 1990 Image Clarity: High-Resolution Photography (Stoneham, MA: Focal
Press)
[7] Nasse H H 2008 How to Read MTF Curves (Carl Zeiss Camera Lens Division)
[8] Ray S F 2002 Applied Photographic Optics: Lenses and Optical Systems for Photography,
Film, Video, Electronic and Digital Imaging 3rd edn (Oxford: Focal Press)
[9] Nasse H H 2009 How to Read MTF Curves. Part II (Carl Zeiss Camera Lens Division)
[10] Smith W J 2007 Modern Optical Engineering 4th edn (New York: McGraw-Hill)
[11] Goodman J 2004 Introduction to Fourier Optics 3rd edn (Englewood, CO: Roberts)
[12] Shannon R R 1997 The Art and Science of Optical Design (Cambridge: Cambridge
University Press)

[13] Shannon R R 1994 Optical specifications Handbook of Optics (New York: McGraw-Hill)
ch 35
[14] Fiete R D 2010 Modelling the Imaging Chain of Digital Cameras, SPIE Tutorial Text vol
TT92 (Bellingham, WA: SPIE Press)
[15] Koyama T 2006 Optics in digital still cameras Image Sensors and Signal Processing for
Digital Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 2
[16] International Organization for Standardization 2000 Photography-Electronic Still-picture
Cameras-Resolution Measurements, ISO 12233:2000
[17] Burns P D 2000 Slanted-edge MTF for digital camera and scanner analysis PICS 2000:
Image Processing, Image Quality, Image Capture, Systems Conf. (Portland, OR) (IS&T)
vol 3 pp 135–38
[18] Burns P D and Williams D 2002 Refined slanted-edge measurement for practical camera and scanner testing PICS 2002: Image Processing, Image Quality, Image Capture, Systems Conf. (Portland, OR) (IS&T) vol 5 pp 191–95
[19] Palum R 2009 Optical antialiasing filters Single-Sensor Imaging: Methods and Applications
for Digital Cameras ed R Lukac (Boca Raton, FL: CRC Press) ch 4
[20] Charman W N and Olin A 1965 Image quality criteria for aerial camera systems Photogr.
Sci. Eng. 9 385
[21] Snyder H L 1973 Image quality and observer performance Perception of Displayed
Information (New York: Plenum Press) ch 3
[22] Granger E M and Cupery K N 1973 An optical merit function (SQF), which correlates with
subjective image judgments Photogr. Sci. Eng. 16 221
[23] Barten P G J 1990 Evaluation of subjective image quality with the square-root integral
method J. Opt. Soc. Am. A 7 2024
[24] Koren N 2009 Imatest Documentation (Imatest LLC)
[25] Baxter D, Cao F, Eliasson H and Phillips J 2012 Development of the I3A CPIQ spatial
metrics Image Quality and System Performance IX Proc. SPIE 8293 829302
[26] Mannos J and Sakrison D 1974 The effects of a visual fidelity criterion of the encoding of
images IEEE Trans. Inf. Theory 20 525
[27] Wolberg G 1990 Digital Image Warping 1st edn (Los Alamitos, CA: IEEE Computer Society
Press)
[28] Keys R G 1981 Cubic convolution interpolation for digital image processing IEEE Trans.
Acoust. Speech Signal Process 29 1153
[29] Mitchell D P and Netravali A N 1988 Reconstruction filters in computer graphics
Proceedings of the 15th Annual Conference on Computer Graphics and Interactive
Techniques, SIGGRAPH '88 (New York: Association for Computing Machinery) pp 221–8
[30] Wolberg G 2004 Sampling, reconstruction, and aliasing Computer Science Handbook ed
A B Tucker 2nd edn (Chapman and Hall/CRC) ch 39
[31] Holst G C and Lomheim T S 2011 CMOS/CCD Sensors and Camera Systems 2nd edn (JCD
Publishing/SPIE)
[32] Nakamura J 2006 Basics of image sensors Image Sensors and Signal Processing for Digital
Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 3
[33] Martinec E 2008 Noise, Dynamic Range and Bit Depth in Digital SLRs, unpublished
[34] Sanyal R 2015 Sony Alpha 7R II: Real-World ISO Invariance Study http://www.dpreview.com/articles/7450523388

[35] Chen T, Catrysse P B, El Gamal A and Wandell B A 2000 How small should pixel size be?
Proceedings of SPIE 3965, Sensors and Camera Systems for Scientific, Industrial, and Digital
Photography Applications (Bellingham, WA: SPIE) pp 451–9
[36] Butler R 2011 Behind the Scenes: Extended Highlights! http://www.dpreview.com/articles/2845734946
[37] Aptina Imaging Corporation 2010 Leveraging Dynamic Response Pixel Technology to Optimize Inter-scene Dynamic Range, white paper
[38] Claff W J 2009 Sensor Analysis Primer–Engineering and Photographic Dynamic Range
unpublished
[39] Mizoguchi T 2006 Evaluation of image sensors Image Sensors and Signal Processing for
Digital Still Cameras ed J Nakamura (Boca Raton, FL: CRC Press/Taylor & Francis) ch 6

Index

Abbe cut-off frequency see ‘cut-off aperture see ‘aperture stop’


frequency (diffraction)’ aperture (detector) see ‘detector
Abbe’s sine condition 1.5.10 aperture’
aberration function 3.2.6 aperture function 3.2.3
aberration transfer function 5.3.2 aperture priority mode 2.8.1
aberrations 1.1.2, 1.1.5, 1.1.11, 3.1.9, aperture stop 1.1.11, 1.2.4, 1.3.1, 1.3.2,
3.2.6, 5.3.1, 5.5.1 3.2.3
absolute colourimetry 4.1.10 aperture value 2.5.6
ac value 3.1.8 APEX system 2.5.6
achromatic see ‘greyscale’ aplanatic lens 1.5.10
acutance 5.6 apodization 5.3.2
adapted white 4.6, 4.6.1 APS-C format 1.3.6, 5.4.1, 5.10.2
Adobe DNG converter 4.7, 4.8.2, 4.9 aspherical element 1.1.11
Adobe Photoshop 4.11.2, 4.11.4, astigmatism 1.1.11
4.11.5, 5.7.1 astrophotography 5.10.3
Adobe RGB colour space see ‘colour asymmetric PSF see ‘point-spread
space (Adobe RGB)’ function (asymmetry)’
adopted white 4.6, 4.6.1, 4.7, autofocus 1.2, 1.2.2, 1.2.4, 1.4.5
4.8.1, 4.9.1 auto-ISO mode 2.8.1 - 2.8.4
Airy disk 3.2.4, 3.3, 5.3.2, 5.10.2 average photometry 2.5, 2.5.3, 2.6.3,
Airy pattern 3.1.8, 3.2, 3.2.4 5.1.3
aliasing 3, 3.1.3, 3.4, 3.4.4, 5.5.1, 5.7.1, average scene luminance 2.5, 2.5.5
5.7.2
amplifier (charge) see ‘charge amplifier’ backlight 2.9
amplifier (programmable gain) see backside illumination 3.6.4, 5.8.6
‘programmable gain amplifier’ band limited 3.4.1, 3.4.4, 5.7.2
amplifier glow 3.8.5 banding see ‘posterisation’
amplitude 3.1.7, 3.2.1 baseband 3.4.3
amplitude transfer function 3.2.3, 3.2.5 base ISO setting 2.6.1, 2.6.4, 3.7.1,
analog gain 2.6, 2.6.4 5.8.1, 5.8.3, 5.10.6
analog-to-digital converter 2.1.1, 2.1.3, Bayer colour filter array see ‘colour
3.7, 3.7.1, 3.7.2, 5.8.1, 5.8.3 filter array’
analog-to-digital units see ‘digital beamsplitter 2.11.4
numbers’ bellows factor 1.3.4, 1.5.5, 5.1.2, 5.1.3
angular field of view 1.3, 1.3.1, 1.3.2, bias frame 3.8.5, 3.9.2
5.1.3 bias offset 3.7.2, 3.7.4, 3.9.2
angular field of view formula 1.3.4, bicubic convolution 5.7.1
1.3.5, 5.1.3 bilinear interpolation 4.3.2, 5.7.1
anti-aliasing filter 5.7.2, 5.8.5, also see binning see ‘pixel binning’
‘optical low-pass filter’ birefringence 3.4.7
apertal ratio 1.5.4 bit depth 2.1.1, 2.1.3, 2.2.1, 3.7.2, 4


black body 4.2.1, 4.2.2, 4.6.1 chromaticity 2.1.2, 4.1.1


black clipping level 2.3 chromaticity coordinates (rg) 4.1.7
blur (motion) see ‘motion blur’ chromaticity coordinates (uv) 4.2.2, 4.9.1
blur spot 1.4.1, 1.4.6, 3.1.8, 3.1.10 chromaticity coordinates (xy) 4.1.9,
bokeh 1.4.6, 5.3.4 4.1.10, 4.2.1, 4.9.1
Boltzmann constant 4.2.1 chromaticity diagram (rg) 4.1.7
Bradford CAT 4.6.2, 4.7, 4.9.2 chromaticity diagram (uv) 4.2.2
Brewster’s angle 2.11.2 chromaticity diagram (xy) 4.1.9, 4.2.1
brightness 2.2.3, 2.4.2, 2.13.2 CIE colour space see ‘colour space..’
brightness value 2.5.6 CIE illuminant see ‘illuminant..’
blue sky 2.11.3 circle function 3.2.4
circle of confusion 1.4.1, 1.4.3, 1.4.4, 5,
camera characterization see ‘character- 5.1.3, 5.2.3 - 5.2.5, 5.9.3, 5.10.2
ization (camera)’ cinematography 1.3.5
camera equation 1.5.8, 5.1.3 circular polarizing filter 2.11.4
camera neutral 4.6 clipping 2.3, 2.3.3, 2.3.4, 2.6.1, 2.7, 2.9,
camera raw space 2.1.2, 4.1, 4.3, 4.4.2 2.10.1, 2.12, 4.11.4
camera raw space primaries 4.3.4 cloud 2.9.1
camera response functions 2.12.1, 3.2.1, CMOS sensor 1.5.9, 3.3.1, 3.6.2, 3.6.6,
3.6, 3.6.4, 4.3 5.8.6
camera shake 2.8.1, 5.6, 5.10.1 coherence 3.2.1, 3.2.3
candela 1.5.1, 4.1.10 collimation 1.1.6
cardinal point 1.1.6 colour 2.1.2, 3.6.3, 4.1.1
CAT see ‘chromatic adaptation colour constancy 4.6
transform’ colour demosaic see ‘demosaicing’
cathode-ray-tube monitor 2.2.5, 2.13.2 colour difference 4.4.3
CCD sensor 1.5.9, 3.3.1, 3.6.2, 3.6.6, colour filter array 2.1.2, 2.12.1, 3.4.5,
5.8.6 3.6.3, 4.3.1
CCT see ‘correlated colour temperature’ colour management 4.11.1, 4.11.2
centre of perspective 1.3.7 colour-matching functions (CIE RGB)
centre-weighted metering see ‘metering 4.1.3, 4.1.4, 4.1.5, 4.4.1
(centre-weighted)’ colour-matching functions (CIE XYZ)
characterization (camera) 4.4 4.1.8, 4.4.1
charge amplifier 3.6.6 colour-matching module 4.11.1
charge collection efficiency 3.6.2, 3.6.4 colour matrix see ‘transformation
charge detection 3.6.6 matrix’
charge readout 3.6.6 colour matrix (Adobe) 4.9
charge signal 3.3.1, 3.6, 3.6.4 colour profile (ICC) 4.11.1, 4.11.2, 4.11.4
chief ray 1.3.2 colour rotation matrix 2.1.2, 4.6, 4.8,
chroma 2.13.1 4.8.1, 4.8.2
chromatic aberration 1.1.11 colour space (Adobe RGB) 2.3.1, 4.5,
chromatic adaptation 4.6, 4.6.2 4.10.2, 4.11.1
chromatic adaptation transform 4.6, colour space (camera) see ‘camera raw
4.6.2 space’


colour space (CIE 1960 UCS) 4.2.2, 4.9.1 crop factor 1.3.6
colour space (CIE LAB) 2.2.3, 4.1, cross-format comparisons 1.3.6, 5, 5.1,
4.3.2, 4.4.2, 4.4.3 5.3.3, 5.4.1, 5.8.7
colour space (CIE RGB) 4.1.3, 4.1.6, CRT monitor see ‘cathode-ray tube
4.1.7 monitor’
colour space (CIE XYZ) 4.1.8, 4.4.2, current see ‘photocurrent’
4.4.3 cut-off frequency 3.1.8, 5.1.1
colour space (LMS) 2.1.2, 4.1.2, 4.6.2 cut-off frequency (detector) 3.3.4
colour space (output-referred) 2.1.2, 4.1, cut-off frequency (diffraction) 3.2.5,
4.1.12, 4.5 3.4.1, 3.5.2, 5.3.2, 5.10.1, 5.10.2
colour space (ProPhoto RGB) 4.5, cut-off frequency (effective) 5.3.2, 5.5.1
4.11.2, 4.11.3 cut-off frequency (object space) 5.10.1
colour space (reference) 4.1 cut-off frequency (sensor) see ‘cut-off
colour space (sRGB) 2.3.1, 4.1.12, 4.5, frequency (detector)’
4.5.1, 4.6, 4.7, 4.10, 4.11.1 cut-off frequency (system) 5.5, 5.10.1
colour temperature 4.2.1 cut-off frequency (vision) 3.4.3
colour tint 4.2.2 cycles per degree 5.6.3
Coltman’s formula 5.2.5 cycles per mm 3.1.7
coma 1.1.11, 1.5.10, 3.1.9, 5.3.4 cycles per pixel 5.4.1, 5.7.1
comb function 3.1.10, 3.3.3, 3.4.1, 3.4.2
complex amplitude 3.2.1 dark current 3.8.3, 3.9.2
compound lens see ‘lens (compound)’ dark-current shot noise see ‘noise (dark-
cone (light) 1.5.1, 1.5.2 current shot)’
cone cell (eye) 2.1.2 dark frame 3.9.2, 5.10.3
cone of vision 5.2.2 dark-frame subtraction 5.10.3
conversion factor 3.7.3, 3.9.1, 3.9.3, dark signal 3.8.2, 3.8.3, 4
4.3.1, 4.8.1, 5.8 dark-signal nonuniformity 3.8.5, 5.10.3
conversion gain 3.6.6, 5.9.2 data numbers see ‘digital numbers’
contour definition 5.6.3 dc bias 3.1.8
contrast 2.3.3, 2.12.2, 2.13.2 dcraw 2.12.1, 3.9, 4.3.2, 4.4.2, 4.8.2
contrast (mid-tone) 2.3.2, 2.3.3, 2.12.2 decibel 5.8, 5.9.1
contrast (waveform) 3.1.8 defocus aberration 1.2.1, 1.4.4, 1.4.6,
contrast ratio see ‘dynamic range 3.2.6, 5.2.3, 5.3.1, 5.3.4
(display)’ delta function 3.1.5, 3.1.6, 3.1.10, 3.3.3
contrast sensitivity function 5.6.3 demosaicing 2.1.2, 4.3.2
contrast transfer function 5.2.5 depletion region 3.6.2, 3.6.4, 3.6.6
convolution 3.1.4 - 3.1.6, 5.2.5, 5.7.1 depth of field 1.4, 2.8.1, 2.10, 3.2.4, 5.1.3
convolution theorem 3.1.7, 3.4.2 depth of focus 5.2.6
coordinate representation 1.5.7, 1.5.8, Descarte’s formula 1.1.8, 1.1.9
3.1.2, 3.2.3 detector aperture 3.3, 3.3.1, 3.3.2
correlated colour temperature 4.2.2, dielectric 2.11.2
4.2.4, 4.6.1, 4.7, 4.9.1 diffraction 3, 3.1.8, 3.2.2, 3.2.3, 3.2.5
correlated double sampling 3.8.2 diffraction limited 3.2.6, 5.3.2, 5.10.1
correlation 3.2.5 diffraction softening 3.2.4, 3.2.5, 5.6,
cosine fourth law 1.5.7 5.10.2


diffuse lighting see ‘lighting’ electron count 2.1.1, 3.6.4, 3.7.3


digital gain 2.6, 2.6.2 - 2.6.4, 5.8.4, 5.8.8 electronic shutter see ‘shutter’
digital negative see ‘Adobe DNG element (lens) see ‘lens (simple)’
converter’ encoding white 4.6, 4.6.1
digital number 2.1.1, 3.7.2, 3.9, 3.9.3, enlargement factor 1.3.7, 1.4.1, 5.1.3,
5.8, 5.8.2 5.2, 5.2.2, 5.2.4, 5.3.3, 5.9.3
digital output level 2.1.3, 2.2, 2.2.4, entrance pupil 1.1.9, 1.3, 1.3.1, 1.3.3,
2.3.1, 2.4.2, 4.4.2, 4.10.1 1.5, 1.5.10, 3.2.3, 5.1.1, 5.10.2
diopters 1.1.1 entrance window 1.3.2, 1.3.4
Dirac comb function see ‘comb equivalence ratio 5.1, 5.3.3
function’ equivalence ratio (working) 5.1.2, 5.1.3
Dirac delta function see ‘delta function’ equivalence theory see ‘cross-format
direct lighting see ‘lighting’ comparisons’
display dynamic range see ‘dynamic ETTR see ‘exposing to the right’
range (display)’ exit pupil 1.2.4, 1.3.1, 1.3.3, 1.5.10, 3.2.3
display gamma see ‘gamma decoding’ exit window 1.3.2, 1.3.4
display luminance see ‘luminance exposing to the right 2.7, 5.8, 5.8.2,
(display)’ 5.8.3, 5.8.8, 5.10.4
display profile 4.11.1, 4.11.4 exposure (photometric) 1.3.1, 1.5, 1.5.8,
distortion 1.1.11 2.5, 5.1.3, 5.8.2
distortion (keystone) 1.3.8 exposure (radiometric) 1.5
dithering 2.2.2 exposure (spectral) 3.6.1, 3.6.4
downsampling 5.7, 5.7.2, 5.8.5 exposure compensation 2.7, 2.7.1
dynamic range 2.1 exposure duration 1.5.8, 1.5.9, 2.8.1,
dynamic range (ADC) 2.1.3 2.8.2, 3.6.1, 3.7.1, 3.7.3, 4.3.1,
dynamic range (display) 2.1.3, 2.2.5, 5.1.1, 5.10.5, 5.10.6
2.3.2, 2.12.2, 2.13.2 exposure index 2.6
dynamic range (highlight) 2.3.4, 2.6.4, exposure level 5.8.2, 5.8.3, 5.8.4, 5.10.4
5.8.8 exposure mode 2.8
dynamic range (image) 2.1.3, 2.3, 2.13.3 exposure strategy 2.5
dynamic range (perceivable) 5.9.3 exposure value 2.5.1, 2.5.6, 2.7.1, 2.7.3,
dynamic range (photographic) 5.9.3 2.8.3, 2.12
dynamic range (raw) 2.1.3, 3.7.1, 5.8.2 - external quantum efficiency see ‘quan-
5.8.4, 5.9, 5.9.3, 5.10.6 tum efficiency’
dynamic range (scene) 2.1.3, 2.9, 2.12 eye cone primaries 4.1.2, 4.1.3
dynamic range (sensor) 5.9.2 eye cone response functions 3.6.3, 4.1.2,
dynamic range (shadow) 2.3.4 4.1.9, 4.4.1

edge definition 5.4, 5.6 f-number 1.5, 1.5.3, 1.5.4, 1.5.10, 5.10.1
edge-spread function 5.5 f-number (working) 1.5.5, 5.1.3
effective focal length see ‘focal length’ f-stop 1.5.6, 2.1.3, 2.5.6, 2.10, 5.8
electric field 2.11.1, 3.2.1 Fermat’s principle 1.1.1, 1.1.4
electromagnetic energy 1.5.1, 3.1, 3.1.1 field curvature 1.1.11, 1.4.6, 5.3.1
electromagnetic optics 3.2.1 field of view 1.3, 1.3.2
electromagnetic wave 2.11.1, 3.2.1, 4.2.1 field stop 1.3.2

6-4
Physics of Digital Photography (Second Edition)

fill factor 2.6, 3.3.1, 3.3.4, 3.6.4, 4.3.4
film speed 2.5.3, 2.6
filter (lens) 2.10, 2.10.1, 2.11
fixed pattern noise see 'noise (fixed pattern)'
fixed-point number 4.8.1
flare 2.2.5, 2.5.2
flash 2.9
flat-field correction 5.10.3
floating element 1.2, 1.2.2
flux 1.5.1, 1.5.2, 1.5.7
focal distance 1.1.9
focal length 1.1.9, 1.5.4
focal length (working) 5.1.2
focal length multiplier 1.3.6, 5.1, also see 'equivalence ratio'
focal plane 1.1.9, 1.2
focal point 1.1.6, 1.1.9
focus 1.1.4, 1.2, 1.4, 5.2.3, 5.2.5, 5.3.4
focus and recompose 1.4.5
focus breathing 1.2, 1.2.2, 1.3.4, 1.3.5, 1.5.5
focus at infinity 1.1.6, 1.1.9, 1.1.10, 1.2.1, 1.3.3, 1.5.3, 1.5.4, 1.5.10, 5.1.2
focusing 1.2, 1.2.1, 1.3.5, 1.4.5
focusing screen 1.2.3
format see 'sensor format'
forward matrix (Adobe) 4.9, 4.9.2
Fourier transform 3.1.7, 3.2.3, 3.4.1, 3.9.2, 5.7.2
frame 2.12, 3.8.1, 3.9.1, 5.10.3
frame averaging 2.12.1, 3.8.4, 5.10.3
frame readout speed 1.5.9
Fraunhofer diffraction 3.2.3
frequency (optical) see 'optical frequency'
frequency (spatial) see 'spatial frequency'
frequency leakage 5.7.1
Fresnel diffraction 3.2.3
Fresnel-Kirchhoff equation 3.2.2, 3.2.3
Fuji X-Trans 3.6.3, 4.3.2
full-frame format 1.3.6, 3.2.4, 5.1, 5.3.1, 5.10.2
full-well capacity 2.1.1, 2.1.3, 3.6.6, 3.7.1, 3.7.2, 5.8.2, 5.9.1
full-well capacity per unit area 5.8.1, 5.8.6

gain (analog) see 'analog gain'
gain (conversion) see 'conversion gain'
gain (conversion factor) see 'conversion factor'
gain (digital) see 'digital gain'
gain (display) 2.13.2
gamma correction 2.2.5
gamma decoding 2.2.1, 2.2.5, 2.13.2, 4.5
gamma encoding 2.2.1, 2.2.4, 2.3.1, 2.3.2, 4.5, 4.10.1
gamut 4.1.9, 4.4.1, 4.5, 4.11.1
gamut mapping 4.11.2, 4.11.4
Gaussian distribution 3.8.5, 3.9.2, 5.10.3
Gaussian equation 1.1.7, 1.1.9, 1.2.1, 1.3.3
Gaussian optics 1.1.2, 1.1.4, 1.1.5, 1.1.11, 3.2.6
Gaussian reference sphere 3.2.6
geometrical optics 1.1.1
graduated neutral density filter 1.2.2, 2.7, 2.9, 2.10.1
Grassman's laws 4.1.3, 4.1.8
greyscale 2.1.2, 2.12.1, 4.1.1
Gullstrand's equation 1.1.8

halo 2.12.2
headroom (exposure) 2.6.1, 5.8.4, 5.10.5, 5.10.6
headroom (raw) see 'raw headroom'
Helmholtz equation 3.2.1
high dynamic range 2.7, 2.9, 2.12
highlight dynamic range see 'dynamic range (highlight)'
highlights 2.9
histogram 2.4, 2.4.1, 2.4.2, 5.10.3
hue 4.1.1, 4.1.7, 4.1.9
Hunt-Pointer-Estevez matrix 4.6.2
Huygens-Fresnel principle 3.2.2
hyperfocal distance 1.4.4

ICC colour profile see 'colour profile'
illuminance 1.5.1, 1.5.3, 1.5.5, 1.5.7, 1.5.10, 2.12.1
illuminant A 4.2.4, 4.9.1
illuminant D50 4.2.4, 4.5, 4.8.2, 4.9, 4.9.2
illuminant D55 4.2.4
illuminant D65 4.1.12, 4.2.4, 4.4.4, 4.5.1, 4.6, 4.9.1, 4.10.2
illuminant D75 4.2.4
illuminant E 4.1.4, 4.1.7, 4.1.9, 4.1.12, 4.2.4, 4.3.5
illuminant estimation 4.6, 4.6.1
illumination 4.2, 4.2.1, 4.2.2, 4.6
image 1.1.1, 1.1.3
image circle 1.3.4
image distance 1.1.3, 1.1.7
image dynamic range see 'dynamic range (image)'
image height 5.3.1, 5.3.3
image plane 1.1.3, 1.1.4, 1.1.10, 1.2.1, 5.10.1
image space 1.1.6, 1.5.7, 5.10.1
image stabilization 2.8.1, 2.8.2
imaginary primary 4.1.2, 4.1.3, 4.1.8, 4.1.9, 4.3.4, 4.5, 4.11.2, 4.11.4
impulse response 3.2.2
incandescence 4.2.2
incident-light metering see 'metering (incident-light)'
incident-light value 2.7.3
incoherence 3.2.1, 3.2.3, 5.3
infinity focus see 'focus at infinity'
infrared-blocking filter 3.6.3
input-referred units 3.9, 5.8.3, 5.9.1
integration time 3.8.3, 3.9.2
intensity (luminous) 1.5.1
intensity (optical) 3.2.1
intensity (radiant) 3.1.1, 3.2.1
interference 3.2.2
internal focusing see 'focusing'
iris diaphragm 1.4.6, 1.5.6
irradiance 2.12.1, 3.2.1, 4.4.2, 5.10.1
ISO 2240 standard 2.5.3
ISO 2720 standard 2.5.1, 2.5.2, 2.5.3, 2.5.4
ISO 2721 standard 2.5.3
ISO 12232 standard 2, 2.5.1, 2.5.2, 2.5.4, 2.6.1 - 2.6.3, 3.1, 5.1.3, 5.8.8
ISO 12233 standard 5.5
ISO 17321 standard 4.4.2
ISO gain 3.7.1, 3.7.3, 5.9.1
ISO invariance 5.8.4
ISO-less setting 5.8.4, 5.9.2, 5.10.6
ISO setting 1.5.8, 2.5, 2.6, 3.7.1, 3.7.3, 3.9.3
ISO setting (base) see 'base ISO setting'
ISO setting (equivalent) 5.1.3
ISO setting (raw) 5.8.8
ISO speed 2.6, 2.6.1, 5.8.8
isoplanatic see 'linear shift-invariance'
isotherm 4.2.2

jagged edges 5.7.1
jinc function 3.2.4
jitter 5.6, 5.10.1

Kelvin 4.2.1
keystone distortion see 'distortion (keystone)'

LAB colour space see 'colour space (CIE LAB)'
Lagrange theorem 1.1.10, 1.5.3, 1.5.7, 1.5.10
Lambertian surface 1.5.2, 2.5.5
Lambert's cosine law 1.5.2
LCD display 2.13.2
least distance of distinct vision 1.4.1, 5.2.1, 5.3.1, 5.9.3
least resolvable separation 5.1.2, 5.3.2
lens (compound) 1.1.2, 1.1.5, 1.1.6, 1.1.8, 3.2.3
lens (simple) 1.1.1
lens design 1.1.2, 1.1.11, 1.3.2, 3.2.6
lens MTF see 'modulation transfer function (lens)'
lens PSF see 'point spread function (lens)'
lensmakers’ formula 1.1.8 matrix metering see ‘metering (matrix)’


lighting 2.9 Maxwell’s equations 3.2.1
lightness 2.2.3, 2.4.2, 4.4.3, 5.1.3 meridional direction see ‘tangential
line pair 1.4.1, 3.1.7, 5.2.1 direction’
line pairs per mm 5.2.1, 5.3.2 meridional plane 1.3.2
line pairs per picture height 5.3.3, 5.4.1, metameric error 4.4.1
5.6.1 metamerism 4.1.1, 4.1.2, 4.1.3
linear shift-invariance 3.1.4, 3.2.6 meter calibration constant 2.5.1, 2.5.4,
linear systems theory 3.1 2.5.5, 2.7.3
linearization 4 metering (centre-weighted) 2.7.2
LMS colour space see ‘colour space metering (incident light) 2.5.5, 2.7.3
(LMS)’ metering (matrix) 2.7.2
low-pass filter 3.4.3 metering (reflected light) 1.5.5, 2.5,
luma 2.2.5, 2.13.1, 4 2.5.1, 2.7, 2.7.2
lumen 1.5.1 metering (spot) 2.7.2
luminance 1.5.1, 2.1.2, 3.1.1, 4.1.1, Michelson equation 3.1.8
4.1.8, 4.1.10 micro-contrast 5.3.4, 5.4
luminance (average scene) see ‘average microlens 3.3.1
scene luminance’ micron 3.3.1
luminance (display) 2.13.2 microscopy 5.10.1
luminance (relative) 2.1.2, 2.2.3, 2.13.1, middle grey 2.2.3, 2.3.4, 2.5, 2.5.1, 2.5.5,
4.1.11 2.6.4, 5.1, 5.1.3
luminosity see ‘standard luminosity Mie scattering 2.9
function’ model camera system 3.1.10, 3.5.1,
luminous efficacy 3.1.1, 4.1.10 3.5.2, 5.5.1
luminous energy 5.1.3 modulation depth 3.1.8, 3.4.7, 5.3
luminous flux 1.5.1, 1.5.2 modulation transfer function 3, 3.1.7,
luminous intensity 1.5.1 3.1.8
Luther-Ives condition 2.12.1, 4.1, 4.4, modulation transfer function (detector
4.4.1, 4.4.2 aperture) 3.3.4, 3.5.2, 5.5.1
lux 1.5.1 modulation transfer function (diffrac-
tion) 3.2.5, 3.5.2
MacAdam’s diagram see ‘chromaticity modulation transfer function (lens) 5.3,
diagram (uv)’ 5.3.1, 5.3.2, 5.3.4, 5.5.1
macro lens 1.1.10, 1.3.5 modulation transfer function (optical
magnetic field 3.2.1 low-pass filter) 3.4.8, 3.5.2, 5.5.1
magnification 1.1.10, 1.2.1, 1.3.4, 1.5.10, modulation transfer function (poly-
3.1.2, 5.10.1 chromatic) 3.6.5
Malus’ law 2.11.1 modulation transfer function (system)
manual focus 1.2.1 5.4
manual mode 2.8.4, 5.10.6 moire 3.4
Marechal criterion 3.2.6 monochromatic light 3.2.1
marginal ray 1.3.2, 1.5.3, 1.5.10 monochrome 4.1.1
matrix (transformation) see ‘transfor- mosaic 3.4.5, 3.6.3, 3.7.3, 4.3.1, 4.3.2
mation matrix’ motion blur 2.8.1, 2.8.2, 2.10, 5.1.3

MTF see 'modulation transfer function'
MTF50 5.6, 5.6.1

nearest-neighbour interpolation 5.7.1
neutral density filter 2.10
nodal point 1.1.9
nodal slide 1.1.9
noise 2.2.2, 3.8, 5.1.1
noise (cyclic pattern) 4.3.2
noise (dark-current shot) 3.8.2, 3.8.3
noise (fixed pattern) 3.8, 3.8.5, 3.9.3, 5.8, 5.10.3
noise (measurement) 3.9
noise (photon shot) 3.8.1, 5.8.2
noise (read) 2.1.1, 2.1.3, 3.8.2, 3.8.5, 3.9.2, 3.9.3, 5.8.1, 5.8.6
noise (temporal) 3.8, 3.8.1, 5.8, 5.8.5, 5.10.3
noise-equivalent exposure 5.9.1
noise filtering 5.10.3
noise floor see 'noise (read)'
noise model 3.9.3
noise power 3.8.4, 5.10.3
noise reduction 5.8.5, 5.10.3
non-spectral purples 4.1.9
numerical aperture 1.5.10, 5.10.1
Nyquist frequency (detector) see 'Nyquist frequency (sensor)'
Nyquist frequency (sensor) 3.3.4, 3.4.7, 5.4.1, 5.5.1, 5.6.2, 5.7.1
Nyquist rate 3.4.4

object distance 1.1.3, 1.1.7
object plane 1.1.3, 1.1.4, 1.2.1, 1.3.8, 5.10.1
object space 1.1.6, 1.5.7, 5.10.1
ocular see 'viewfinder'
Olympus E-M1 camera 3.9.1 - 3.9.3, 4.8.1, 4.8.2, 5.8.2 - 5.8.4
optical axis 1.1.1, 1.1.2, 1.3.2
optical black photosite 3.8.3
optical low-pass filter 3, 3.1.3, 3.4, 3.4.6, 3.4.7, 4.3.2, 5.5.1
optical path difference 3.2.6
optical quality factor see 'aberration transfer function'
optical transfer function 3, 3.1.7
optical transfer function (diffraction) 3.2.5, 5.3.2
optical transfer function (system) 3.1.10
opto-electronic conversion function 2.3.2, 4.4.2
output-referred colour space see 'colour space (output-referred)'
output-referred units 3.9, 5.8.2, 5.9.1

panoramic 1.1.9
paraxial ray 1.1.3, 1.1.4, 1.1.5
paraxial region 1.1.3, 1.1.4, 1.1.11
passband 5.7.1
passband (spectral) see 'spectral passband'
pentaprism 1.2.3
perceptual intent 4.11.2, 4.11.4
perspective 1.3.3, 1.3.7, 5.1, 5.1.3
phase 3.1.7, 3.2.1
phase-detect autofocus 1.2.4
phase transfer function 3.1.7, 3.1.9, 3.4.7, 3.4.8
phase transformation function 3.2.3
photocurrent 3.6.4
photodiode see 'photoelement'
photoelectron 2.1.1, 3.7.1, 5.8.3
photoelement 3.3, 3.6.2
photogate see 'photoelement'
photographic constant 2.5.1, 2.5.3
photometry 1.5, 1.5.1, 3.1, 3.1.1, 4.1.5
photometry (average) see 'average photometry'
photon shot noise see 'noise (photon shot)'
photon transfer curve 5.8.3
photosite 3.3, 3.3.1, 3.3.3
phototopic vision see 'standard luminosity function'
picture height 5.3.3, 5.4.1
pixel binning 5.8.5, 5.8.6, 5.9.3
pixel count 4.11.5, 5, 5.6.2, 5.8.5
pixel pitch 3.1.10, 3.3.3, 3.3.4, 3.4.5
pixel response nonuniformity 3.8.5, 5.10.3
pixel size 5
pixels per inch 4.11.5, 5.2.1, 5.6.3
Planckian locus 4.2.1, 4.2.2
Planck's law 4.2.1
plane wave 3.2.3
point source 1.5.1, 1.5.2
point spread function 3, 3.1.3, 3.1.5, 3.1.9, 3.2.6, 5.3
point spread function (asymmetry) 5.3.4
point spread function (detector aperture) 3.3, 3.3.2, 3.5.1
point spread function (diffraction) 3.2.3, 3.5.1
point spread function (lens) 5.3.1, 5.6.2
point spread function (optical low-pass filter) 3.4.7, 3.5.1
point spread function (polychromatic) 3.6.5
point spread function (system) 3.1.10, 3.5, 5.2.3
Poisson distribution 3.8.1
polarization 2.11, 3.2.1, 4.11.3
polarizing filter 1.2.2, 2.11
polychromatic light 3.2.1, 3.6.5, 4.1.1, 5.3.1
portrait lens 1.3.7
posterization 2.2.1, 2.2.2, 2.2.5
power (spectral distribution) see 'spectral power distribution'
power (total refractive) 1.1.7, 1.1.8, 1.1.9, 1.2.2, 1.5
power (surface) 1.1.1, 1.1.3, 1.1.4, 1.1.8
pre-filtering 3.4.6
primaries (camera) see 'camera raw space primaries'
primaries (CIE RGB) 4.1.3, 4.1.4
primaries (CIE XYZ) 4.1.8
primaries (eye cone) see 'eye cone primaries'
primary (imaginary) see 'imaginary primary'
principal plane 1.1.6, 1.2.1, 1.5.10, 5.1.3
principal point 1.1.6, 1.1.9, 1.3.3
principal surface 1.5.10
printing 4.11.1, 4.11.2, 4.11.5
profile (display) see 'display profile'
profile connection space 4.1, 4.9, 4.9.1, 4.9.2
program mode 2.8.3
programmable gain amplifier 3.7, 3.7.1, 3.9.3
PSF see 'point spread function'
pupil function 3.2.6
pupil magnification 1.3.2, 1.3.3, 1.4.3, 5.1.2, 5.1.3
pure spectral colour see 'hue'

quadrature 3.8.4, 3.9.3, 5.10.3
quantization step 2.1.3, 2.2.2, 3.7.2
quantum efficiency 2.6, 3.3.4, 3.6.4, 5.1, 5.8.1, 5.8.6
quantum efficiency (internal) see 'charge collection efficiency'

radial direction see 'sagittal direction'
radian 1.5.1
radiometry 1.5, 2.12.1, 3.1, 3.1.1
radius of curvature 1.1.1, 1.1.9
raw channel 4.3.1
raw channel multipliers 4.4.2, 4.6, 4.6.3, 4.8, 4.8.1, 4.8.2
raw channel scaling 4.6.2, 4.6.3
raw clipping point 2.1.3, 3.7.4, 5.9.1, 5.9.3
raw data 2.1, 3.1, 3.7
raw dynamic range see 'dynamic range (raw)'
raw headroom 2.3.3, 2.12.2, 5.8.8
raw level 2.1.1, 3.7, 3.7.1
raw tristimulus values 4.3.3
raw value 2.1.1
ray height 1.1.4, 1.1.5
ray optics see 'geometrical optics'
ray tangent slope 1.1.4, 1.1.5, 1.5.3
Rayleigh quarter-wave limit 3.2.6
Rayleigh scattering 2.9, 2.9.1, 2.11.3
Rayleigh two-point criterion 5.3.2
raytrace see 'ynu raytrace'
read noise see 'noise (read)'
real rays 1.1.2, 1.1.11, 1.5.2, 1.5.10
reciprocal rule 2.8.1
recombination 3.6.2, 3.6.4
recommended exposure index 2.6.3
reconstruction 3.4.3, 5.7.1, 5.7.2
reconstruction kernel 5.7.1, 5.7.2
rectangle function 3.1.6, 3.3.4, 3.4.3, 5.7.1
reference sphere see 'Gaussian reference sphere'
reference white 4.1.12, 4.2.3, 4.8, 4.10.2
reference white (camera) 4.3.5, 4.5.2, 4.6.3, 4.8
reflectance 2.5.5
reflected-light metering see 'metering (reflected-light)'
reflection 2.9, 2.9.1, 2.11.2
refracting surface 1.1.1
refraction 1.1.1, 2.11.2, 3.4.7
refractive index 1.1.1, 1.1.6, 1.1.8, 1.1.9, 1.5, 5.10.1
refractive power see 'power'
relative aperture 1.5, 1.5.3, 1.5.7
relative colourimetric intent 4.11.4
relative colourimetry 4.1.11
relative illumination factor 3.1.2, 3.5
relative luminance see 'luminance (relative)'
rendering intent 4.11.2, 4.11.4
resampling 5.7, 5.8.5
resizing 4.11.5, 5.8.5
resolution (image display) 4.11.5
resolution (object) 5.10, 5.10.1
resolution (optical) 3, 5.3.2
resolution (perceived) 5.2
resolution (print) 4.11.5
resolution (screen) 4.11.5
resolving power 2.8.1, 5
resolving power (lens) 5.3.1, 5.3.2
resolving power (observer) 1.4.1, 5, 5.2, 5.2.1, 5.2.2
resolving power (system) 5.5, 5.6, 5.10, 5.10.1
response curve see 'sensor response curve'
response function (camera) see 'camera response functions'
response function (eye cone) see 'eye cone response functions'
responsivity see 'camera response functions'
RMS wavefront error see 'wavefront error'
Robertson's method 4.2.2, 4.9.1
rolling shutter 1.5.9
rotation matrix see 'colour rotation matrix'

s-curve 2.3.2, 2.3.3, 2.12.2, 4.10
sagittal direction 5.3.1, 5.3.4
sampling 3, 3.1.10, 3.3.2, 3.3.3, 3.4.1, 5.7.1
sampling theorem 3.4.4
saturation 2.1.3, 2.6.1, 5.8.2, 5.8.3, 5.8.4
saturation (colour) 4.1.1, 4.1.7, 4.3.4
scattering 2.9
scene dynamic range see 'dynamic range (scene)'
scene luminance ratio see 'dynamic range (scene)'
Scheimpflug condition 1.3.8
sense node capacitance 3.6.6
sensitivity 2.6
sensor 3.3
sensor format 1.3.6, 1.4.1, 5.1, 5.10.2
sensor Nyquist frequency see 'Nyquist frequency (sensor)'
sensor plane 1.2
sensor response curve 2.1.1, 2.1.3, 2.12.1
shadow dynamic range see 'dynamic range (shadow)'
shadow improvement 5.8.3, 5.10.6
shadows 2.9
Shannon-Whittaker sampling theorem see 'sampling theorem'
sharpness 5, 5.4, 5.6, 5.10.2
shutter 1.2.3, 1.5.9
shutter priority mode 2.8.2
shutter shock 1.5.9
shutter speed see 'exposure duration'
sign convention 1.1.3, 1.1.7, 1.1.10
signal (charge) see 'charge signal'
signal (voltage) see 'voltage'
signal-to-noise ratio 2.6.1, 2.6.4, 3.7.1, 3.8.1, 3.9.3, 5, 5.1.1, 5.8, 5.8.2 - 5.8.7, 5.10.5, 5.10.6
signal-to-noise ratio per percentage area 5.8.6, 5.8.7, 5.9.3
signal-to-noise ratio per pixel 5.8.5
signal-to-noise ratio per unit area 5.8.6
silicon 3.6.2
sinc function 3.4.3, 5.7.1
sine function 1.1.1 - 1.1.3, 1.1.11
sine theorem 1.5.10
single lens reflex 1.2.3
sinusoidal waveform 3.1.7, 3.1.8, 5.2.5
sky see 'blue sky'
smartphone 4, 4.7, 4.10, 5.10.1, 5.10.2
Snell's law 1.1.1 - 1.1.4, 1.1.11
SNR see 'signal-to-noise ratio'
solid angle 1.5.1
spatial frequency 3.1.7, 3.1.8, 3.2.3, 3.2.5
spatial period 3.4.1
spectral decomposition 3.1, 3.1.1
spectral exposure see 'exposure (spectral)'
spectral locus 4.1.7, 4.1.9
spectral irradiance 3.1.1, 3.1.2, 3.3.1, 3.5, 4.1.10, 4.3.1
spectral passband 3.2.1, 3.6.3
spectral power distribution 3.6.5, 4.1.1, 4.1.10, 4.1.11, 4.2.1, 4.2.4
spectral radiance 3.1.1, 3.6.5, 4.1.6
spectral responsivity see 'camera response functions'
speed value 2.5.6
spherical aberration 1.1.2, 1.1.11, 1.4.6, 1.5.10, 5.3.1
spherical surface 1.1.1
spherical wave 3.2.2, 3.2.6
spot metering see 'metering (spot)'
sRGB colour cube 4.10.2
sRGB colour space see 'colour space (sRGB)'
stabilization (image) see 'image stabilization'
standard colourimetric observer 4.1.3, 4.1.7
standard deviation 3.8.1, 3.8.2, 3.8.4
standard illuminant 4.2.4
standard luminosity function 3.1.1, 3.6.3, 4.1.5, 4.1.8, 4.1.10
standard output sensitivity 2.6.2, 5.1.3, 5.8.8
standard viewing conditions 5.2.2, 5.2.4, 5.3.1, 5.9.3
steradian 1.5.1
stop (photographic) see 'f-stop'
stopband 5.7.1
Strehl ratio 3.2.6
subjective quality factor 5.6, 5.6.3
sunny-16 rule 2.5.3
sunrise and sunset 2.9.1
surface power see 'power (refractive)'
sweet spot (lens) 5.10.1
symmetric lens 1.3.1
sync speed 1.5.9

T-number 1.5.8
tangent function 1.1.4
tangent slope 1.1.4
tangential direction 5.3.1, 5.3.4
telephoto lens 1.2.2, 1.3.3
temperature (colour) see 'colour temperature'
temporal noise see 'noise (temporal)'
thick lens 1.1.8
thin lens 1.1.8, 3.2.3
TIFF file 4.11.2, 4.11.3
tilt-shift lens 1.3.8
time priority mode see 'shutter priority mode'
time value 2.5.6
tint (colour) see 'colour tint'
tonal levels 2.2.1, 4.11.3
tone curve 2.1.3, 2.3, 2.3.2, 2.12.2
tone mapping (global) see 'tone curve'
tone mapping (local) 2.3.2, 2.3.3, 2.12.2, 4, 4.10
tone mapping operator 2.12.2, 4
transfer function see 'optical transfer function'
transmission function (colour filter array) 3.6.4, 4.3.4
transmission function (sensor) 3.6.4, 4.3.4
transformation matrix 4.4.2, 4.4.4, 4.7
transmittance 1.5.3, 1.5.8, 2.5.2, 2.10
triangle function 3.1.6
tripod 5.8.1, 5.10.1
tristimulus values 2.1.2, 2.2.4, 4.1.2, 4.1.6, 4.1.8
tristimulus values (raw) see 'raw tristimulus values'

ultraviolet light 5.10.1
undersampling 3.4.4
uniform chromaticity scale see 'colour space (CIE 1960 UCS)'
unit focusing see 'focusing'
unity gain 3.7.3, 5.8
unsharp mask 5.4, 5.6
upsampling 5.7, 5.7.1

variance 3.8.1, 3.8.4
vector (colour) 4.1.11, 4.1.12, 4.3.3
veiling glare 5.3.4
viewfinder 1.2.3, 1.4.5, 5.10.3
viewing distance 1.4.1, 5.2.1, 5.2.2, 5.2.4
vignetting 1.4.6, 1.5.7, 1.5.8, 2.5.2, 3.1.2
virtual image 1.3.1, 1.3.2
visible range see 'wavelengths (visible range)'
voltage 2.1.1, 2.1.3, 2.2.5, 3.6.6, 3.7.1, 5.8.4
von-Kries CAT 4.6.2

wave see 'sinusoidal waveform'
wave optics 3.2.1
wavefront 3.2.1, 3.2.3
wavefront error 3.2.6, 5.3.2
wavelength 2.9.1, 2.11.3, 3.2.1, 3.2.6, 5.10.1
wavelength (visible range) 3.1.1, 4.1.1
wavenumber 3.2.1
Weber-Fechner law 2.2.3
well capacity 3.6.2
white (reference) see 'reference white'
white balance 2.1.2, 4.6, 4.8.1
white balance matrix 2.12, 4.8, 4.8.1, 4.8.2, 4.9.2
white balance multipliers see 'raw channel multipliers'
white point 4.2.3, 4.3.5, 4.4.2, 4.6.1, 4.7
working equivalence ratio see 'equivalence ratio (working)'
working f-number see 'f-number (working)'
working focal length see 'focal length (working)'
working space 4.11.1, 4.11.2, 4.11.4

XYZ colour space see 'colour space (CIE XYZ)'

ynu raytrace 1.1.5, 1.1.6, 1.1.8
Young-Helmholtz theory 4.1.2

Zone System 2.5.3
zoom lens 1.3.5