
SPIE PRESS | Field Guide

Field Guide to

Image
Processing

Khan M. Iftekharuddin
Abdul A. Awwal



Field Guide to

Image
Processing
Khan M. Iftekharuddin
Abdul A. Awwal

SPIE Field Guides


Volume FG25

John E. Greivenkamp, Series Editor

Bellingham, Washington USA


Library of Congress Cataloging-in-Publication Data

Iftekharuddin, Khan M. (Khan Mohammad), 1966-


Field guide to image processing / Khan M. Iftekharuddin,
Abdul A. Awwal.
p. cm. – (SPIE field guides ; FG25)
Includes bibliographical references and index.
ISBN 978-0-8194-9021-6
1. Image processing. 2. Image compression. I. Awwal,
Abdul A. S. II. Title.
TA1637.I34 2012
621.367–dc23
2012001596

Published by

SPIE
P.O. Box 10
Bellingham, Washington 98227-0010 USA
Phone: +1.360.676.3290
Fax: +1.360.647.1445
Email: books@spie.org
Web: http://spie.org

Copyright © 2012 Society of Photo-Optical Instrumentation Engineers (SPIE)

All rights reserved. No part of this publication may be reproduced or distributed in any form or by any means without written permission of the publisher.
The content of this book reflects the work and thought of
the author. Every effort has been made to publish reliable
and accurate information herein, but the publisher is not
responsible for the validity of the information or for any
outcomes resulting from reliance thereon. For the latest
updates about this title, please visit the book’s page on our
website.
Printed in the United States of America.
First Printing

Introduction to the Series
Welcome to the SPIE Field Guides—a series of publications written directly for the practicing engineer or scientist. Many textbooks and professional reference books cover optical principles and techniques in depth. The aim of the SPIE Field Guides is to distill this information, providing readers with a handy desk or briefcase reference that provides basic, essential information about optical principles, techniques, or phenomena, including definitions and descriptions, key equations, illustrations, application examples, design considerations, and additional resources. A significant effort will be made to provide a consistent notation and style between volumes in the series.
Each SPIE Field Guide addresses a major field of optical
science and technology. The concept of these Field Guides
is a format-intensive presentation based on figures and
equations supplemented by concise explanations. In most
cases, this modular approach places a single topic on a
page, and provides full coverage of that topic on that page.
Highlights, insights, and rules of thumb are displayed in
sidebars to the main text. The appendices at the end of
each Field Guide provide additional information such as
related material outside the main scope of the volume,
key mathematical relationships, and alternative methods.
While complete in their coverage, the concise presentation
may not be appropriate for those new to the field.
The SPIE Field Guides are intended to be living
documents. The modular page-based presentation format
allows them to be easily updated and expanded. We are
interested in your suggestions for new Field Guide topics
as well as what material should be added to an individual
volume to make these Field Guides more useful to you.
Please contact us at fieldguides@SPIE.org.

John E. Greivenkamp, Series Editor


College of Optical Sciences
The University of Arizona



The Field Guide Series

Keep information at your fingertips with all of the titles in the Field Guide Series:
Adaptive Optics, Robert Tyson & Benjamin Frazier
Atmospheric Optics, Larry Andrews
Binoculars and Scopes, Paul Yoder, Jr. & Daniel
Vukobratovich
Diffractive Optics, Yakov Soskind
Geometrical Optics, John Greivenkamp
Illumination, Angelo Arecchi, Tahar Messadi, & John
Koshel
Image Processing, Khan Iftekharuddin & Abdul Awwal
Infrared Systems, Detectors, and FPAs, Second Edition,
Arnold Daniels
Interferometric Optical Testing, Eric Goodwin & Jim
Wyant
Laser Pulse Generation, Rüdiger Paschotta
Lasers, Rüdiger Paschotta
Microscopy, Tomasz Tkaczyk
Optical Fabrication, Ray Williamson
Optical Fiber Technology, Rüdiger Paschotta
Optical Lithography, Chris Mack
Optical Thin Films, Ronald Willey
Polarization, Edward Collett
Probability, Random Processes, and Random Data
Analysis, Larry Andrews & Ronald Phillips
Radiometry, Barbara Grant
Special Functions for Engineers, Larry Andrews
Spectroscopy, David Ball
Visual and Ophthalmic Optics, Jim Schwiegerling
Wave Optics, Dan Smith



Field Guide to Image Processing
Image-processing specialists use concepts and tools to
solve practical problems. Some of these tools are linear, while others are nonlinear. The specialist develops a recipe for solving a particular problem by combining various tools in different sequences. To solve a given problem, one recipe
may call for image preprocessing followed by feature
extraction and finally object recognition. Another recipe
may skip the preprocessing and feature extraction, and
instead perform the recognition directly using a matched
filter on the raw image data. Once a recipe is selected, it may require a number of parameters that, depending on the practical constraints, may need to be optimized to
obtain the best result given the image quality, dimension,
or content.
In this Field Guide, we introduce a set of basic image-
processing concepts and tools: image transforms and
spatial domain filtering; point processing techniques; the
Fourier transform and its properties and applications;
image morphology; the wavelet transform; and image
compression and data redundancy techniques. From these
discussions, readers can gain an understanding of how to
apply these various tools to image-processing problems.
However, true mastery is only gained when one has an
opportunity to work with some of these tools.
We acknowledge our gratitude to our family members and
parents for giving us the opportunity to work on this
book. In particular, Dr. Iftekharuddin would like to thank
Tasnim and Labib for their constant support, and parents
Muhammad Azharuddin and Khaleda Khanam for their
encouragement; Dr. Awwal would like to thank Syeda,
Ibrahim, and Maryam for their constant support, and
parents Mohammad Awwal and Saleha Khatoon for their
encouragement.
Khan Iftekharuddin
Old Dominion University

Abdul Awwal
Lawrence Livermore National Laboratory


Table of Contents

Glossary of Symbols and Notation ix


Image-Processing Basics 1
Image Processing Overview 1
Random Signals 2
General Image-Processing System 3
Simple Image Model 4
Sampling and Quantization 5

Spatial-Domain Filtering 6
Image Transforms 6
Image Scaling and Rotation 7
Point Processing 8
Spatial-Domain Convolution Filters 9
Convolution 10
Linear Filters 11
Gradient Filters 13
Histogram Processing 15

Frequency-Domain Filtering 17
The Fourier Transform 17
Discrete Fourier Transform 18
Properties of the Fourier Transform 19
Convolution and Correlation in the Fourier
Domain 20
More Properties of the Fourier Transform 21
Spectral Density 22
Properties of the Discrete Fourier Transform 23
Discrete Correlation and Convolution 26
Circular Convolution and Zero Padding 27
Matched Filtering 28
Filtering with the Fourier Transform 29
Low-Pass and High-Pass Filtering 30
Sampling 31
Spectrum of a Sampled Signal 32
More Sampling 33
Spectrum of a Finite Periodic Signal 34

Image Restoration 36
Image Restoration 36
Linear Space-Invariant Degradation 37
Discrete Formulation 38
Algebraic Restoration 39


Table of Contents

Motion Blur 40
Inverse Filtering 42
Wiener Least-Squares Filtering 43

Segmentation and Clustering 44


Image Segmentation and Clustering 44
Hough Transform 46
Clustering 48

Image Morphology 50
Erosion and Dilation 50
Opening and Closing 52
Hit-or-Miss Transform 53
Thinning 54
Skeletonization 55
Gray-Level Morphology 56
Training a Structuring Element 57

Time-Frequency-Domain Processing 58
Wavelet Transform 58
Types of Fourier Transforms 59
Wavelet Basis 60
Continuous Wavelet Transform 61
Wavelet Series Expansion 62
Discrete Wavelet Transform 63
Subband Coding 64
Mirror Filter and Scaling Vector 65
Wavelet Vector and 1D DWT Computation 66
2D Discrete Wavelet Transform 67

Image Compression 69
Data Redundancy 69
Error-Free Compression 70
Spatial Redundancy 71
Differential Encoding Example 72
Block Truncation Coding: Lossy Compression 73
Discrete Cosine Transform 74
JPEG Compression 75
Equation Summary 76
Bibliography 81
Index 82


Glossary of Symbols and Notation

CΨ Admissibility condition
e p,q ( t) Windowed Fourier transform
f ⊖ s Erosion
f ⊕s Dilation
F Fourier transform operator
f ( k) 1D discrete signal
f ( k) ⊗ g( k) Correlation operation
Fn Fourier series expansion
F ( n) Discrete Fourier transform of 1D signal
F ( u, v) Fourier-transformed image
f ( x) ∗ g ( x) Convolution operation
f ( x, y) Image
fˆ( x, y) Restored (approximate) image
G ( n, m) Discrete Fourier transform of 2D signal
H Degradation model
H −1 Inverse filter
H ( u, v) 2D filter in the frequency domain
h( x, y) 2D filter (transfer function) in the
spatial domain
I Intensity
I ∗ E Hit-or-miss transform
i ( x, y) Illumination
L Grayscale
m Degraded image
n( x, y) 2D noise in spatial domain
p x,y ( x, y) Probability density function (PDF)
R Regions
rect( x/a) rect function
R f f ( x, y) Autocorrelation
R f g ( x, y) Cross-correlation
r ( x, y) Reflectance
sinc(au) sinc function
T Transformation matrix
W f (a, b) Wavelet transform
δm,n 2D discrete Kronecker delta
δ( t) Delta function
µ Mean

Glossary of Symbols and Notation

σ Standard deviation
Φ( t ) 1D scaling vector
Ψa,b ( x) Wavelet basis function
Ψ( t) 1D wavelet vector



Image-Processing Basics 1

Image Processing Overview

The purpose of image processing is to extract some information from an image or prepare it for a certain task,
such as viewing, printing, or transmitting. Image quality
may be poor, so the image may need to be preprocessed.
Preprocessing can be in done in spatial or frequency
domain using a variety of techniques. Frequency-domain
processing can be implemented through a Fourier
transform. For example, if the camera was moved while
taking the image, a restoration technique may be needed
to eliminate motion blur or defocus.
After noise has been reduced or the image has been
enhanced in some way, one can look closely at an object
of interest in the image. Thresholding is a simple way to
isolate (segment) objects in an image if the background
has very low intensity. On the other hand, one may be
interested in finding points, lines, or other features in
the image, such as intensity, color, shape, or texture of
an object. The Hough transform will help find lines in an
image that intersect at different points.
In some noise-removal cases, the noise has a specific shape, e.g., it never spans more than four consecutive image pixels in an area. Erosion
and dilation offer a powerful spatial-domain technique to
remove noise from an image. Often, it may be desirable
to zoom in to some detailed parts of an image to
observe something interesting. The more sophisticated
multiresolution-analysis power of the wavelet transform
is needed in these cases.
With the advent of the Internet and the widespread use
of pictures and movies, the need for image compression
is greater than ever. Volumes of data must be stored,
retrieved, and transmitted quickly, and clever compression
methods save time and space.



2 Image-Processing Basics

Random Signals

Stationary process: A random sequence u(n) can be either
• strict-sense stationary, if {u(l), 1 ≤ l ≤ k} has the same joint distribution as the shifted sequence {u(l + m), 1 ≤ l ≤ k} for any integer m and any length k; or
• wide-sense stationary, if E[u(n)] = µ = constant and E[u(n)u(n′)] = r(n − n′), i.e., the covariance r(n, n′) depends only on the difference n − n′.

Wide-sense stationarity is the form usually assumed for a random process. A Gaussian process is completely specified by its mean and covariance; for a Gaussian process, therefore, strict-sense and wide-sense stationarity are the same.
Independence: Two random variables x and y are
independent if and only if their joint probability
density function (PDF) is the product of their marginal
densities, p x,y ( x, y) = p x ( x) p y ( y). Furthermore, two random
sequences x(n) and y(n) are independent if and only if for
every n and n0 , the random variables x(n) and y( n0 ) are
independent.
Orthogonality: Random variables x and y are orthogonal if E[xy*] = 0, and uncorrelated if E[xy*] = (E[x])(E[y*]), i.e., E[(x − µx)(y − µy)*] = 0. Thus, zero-mean uncorrelated random variables are also orthogonal. Gaussian random variables that are uncorrelated are also independent.
Information: Consider a source (image) that generates a discrete set of independent messages (gray levels) r_k with probabilities p_k, k = 1, ..., L. The information associated with r_k is defined as I_k = −log₂ p_k bits. Since Σ_k p_k = 1, each p_k ≤ 1, and I_k is nonnegative.
Entropy: The average information generated by the source, the entropy, is H = −Σ_k p_k log₂ p_k bits/message. For a digital image, the pixels can be considered independent under certain circumstances; hence, the entropy can be estimated from the image histogram.
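As an illustration (not from the original text), the entropy of an 8-bit grayscale image can be estimated from its histogram with a few lines of NumPy; the random test array and the 256-level assumption are only for this sketch.

import numpy as np

img = np.random.randint(0, 256, size=(64, 64))         # stand-in 8-bit image
hist, _ = np.histogram(img, bins=256, range=(0, 256))   # n_k for each gray level
p = hist / img.size                                     # p_k = n_k / n
p = p[p > 0]                                            # drop empty bins so log2 is defined
H = -np.sum(p * np.log2(p))                             # entropy in bits/pixel
print(H)                                                # close to 8 for uniform random data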



Image-Processing Basics 3

General Image-Processing System

Image acquisition occurs when a sensitive device (sensor) responds to electromagnetic energy (ultraviolet,
visible, IR), and the analog image is then digitized and
stored. Image preprocessing improves upon the quality
of a digitized image through such steps as contrast
enhancement, noise minimization, and thresholding.
These steps are primarily performed in algorithms and
software. Hardware implementation of algorithms is also
possible, primarily for real-time implementation of time-
critical applications. Image segmentation is the process
of extracting image features and segmenting image
contents into meaningful parts. Images can be segmented
along different features, such as intensity, texture, color,
and shape.
Representation and description selects the features
that best represent the intended object using image
segments. Recognition and interpretation assigns
labels to known data and interprets unknown data using
the labeled set. Knowledge data is the repository that
contains known information about image objects. This
data can be used to interactively improve all of the basic
image-processing steps discussed above.



4 Image-Processing Basics

Simple Image Model

An image f(x, y) is a spatial intensity of light and can be shown as

0 ≤ f ( x, y) < ∞

and

f ( x, y) = i ( x, y) r ( x, y) = L ⇒ gray level

where i(x, y) is illumination (amount of source light incident on the scene), and r(x, y) is reflectance (amount
of light reflected by the objects). It follows that

0 < i ( x, y) < ∞

and

0 ≤ r ( x, y) < 1

where r(x, y) = 0 is the total absorption of the object, while r(x, y) = 1 is the total reflectance of the object.

A grayscale image is composed of shades of gray varying continuously from black to white, given as

L min ≤ L (gray level) ≤ L max

where L is the grayscale. We also know that

i min r min ≤ L ≤ i max r max

Usually, the grayscale varies over [L_min, L_max], i.e., between 0 and 255 discrete levels. Numerically, this can be represented as [0, L − 1], where 0 is black and L − 1 is white.



Image-Processing Basics 5

Sampling and Quantization

Image sampling is the digitization of spatial coordinates (x, y). Image resolution depends on sampling.

Aliasing is the image artifact produced by undersampling an image. To avoid aliasing, one needs to sample the image at twice the Nyquist frequency.

Consider an image as follows:

f(x, y) ≈ [ f(0, 0)        f(0, 1)        ···   f(0, M − 1)
            f(1, 0)        f(1, 1)        ···   f(1, M − 1)
            ···
            f(N − 1, 0)    f(N − 1, 1)    ···   f(N − 1, M − 1) ]   (N × M)

where f(x, y) is a continuous image with equally spaced samples. For simplicity, consider that the numbers of samples and gray levels are powers of 2; hence, N = 2^n, M = 2^m, and the number of gray levels G = 2^k. Then, the number of bits needed to store a digitized image is

b = N × M × k = N^2 k (if N = M)

Image quantization is the gray-level digitization of amplitude f(x, y). Image intensity variation (grayscale)
depends on quantization.

Binary image is the quantization of an image intensity in just two levels, such as 1 (white) and 0 (black).

Bit-plane image is the quantization of an image intensity at different quantized levels, such as G = 2^m, where m is
the number of bit levels. Each bit-plane image contains
information contained in the corresponding bit level.



6 Spatial-Domain Filtering

Image Transforms

Spatial image transforms are basic spatial-domain operations that are useful for contrast enhancement, noise
smoothing, and in-plane geometric manipulation of image
contents.

While spatial-domain image transforms include different types of spatial operations on the image pixels,
a subset of these transforms is also known as image
transformations.
Spatial-domain image transformations are simple geomet-
ric manipulations of an image, such as translation, scal-
ing, and rotation operations. These operations belong to
a set of useful operators for rigid-body image registration
known as affine transforms.
An image can be translated by ( X 0 , Y0 , Z0 ) such that X ∗ =
X + X 0 , Y ∗ = Y + Y0 , and Z ∗ = Z + Z0 . Here, ( X , Y , Z ) is the
old image coordinate, while ( X ∗ , Y ∗ , Z ∗ ) is the new image
coordinate. These may be formally expressed in matrix
format:
[X*]   [ 1  0  0  X0 ] [X]
[Y*] = [ 0  1  0  Y0 ] [Y]
[Z*]   [ 0  0  1  Z0 ] [Z]
                       [1]

where X*, Y*, and Z* are the new coordinates. To make it a square matrix, the above can be rewritten as follows:

[X*]   [ 1  0  0  X0 ] [X]
[Y*]   [ 0  1  0  Y0 ] [Y]
[Z*] = [ 0  0  1  Z0 ] [Z]
[1 ]   [ 0  0  0  1  ] [1]
Thus, it is written as v∗ = Tv, where T is the translation
matrix.



Spatial-Domain Filtering 7

Image Scaling and Rotation

An image can be scaled by factors S_x, S_y, and S_z. The corresponding image scaling expression is given as

[X*]   [ Sx  0   0   0 ] [X]
[Y*]   [ 0   Sy  0   0 ] [Y]
[Z*] = [ 0   0   Sz  0 ] [Z]
[1 ]   [ 0   0   0   1 ] [1]
Similarly, an image can be rotated about the coordinate axes as shown. Rotation of an image may involve:

Image rotation about the Z axis by an angle θ (affects the X and Y coordinates):

     [  cos θ   sin θ   0   0 ]
Rθ = [ −sin θ   cos θ   0   0 ]
     [   0       0      1   0 ]
     [   0       0      0   1 ]

Image rotation about the X axis by an angle α:

     [ 1    0        0       0 ]
Rα = [ 0    cos α    sin α   0 ]
     [ 0   −sin α    cos α   0 ]
     [ 0    0        0       1 ]

Image rotation about the Y axis by an angle β:

     [  cos β   0   −sin β   0 ]
Rβ = [   0      1     0      0 ]
     [  sin β   0    cos β   0 ]
     [   0      0     0      1 ]
It is possible to have translation, scaling, and rotation together as v* = R[s(Tv)], where the order of the operations is important. Each of the previous 4 × 4 matrices also has an inverse; for example, the inverse translation matrix T⁻¹ is

      [ 1  0  0  −X0 ]
T⁻¹ = [ 0  1  0  −Y0 ]
      [ 0  0  1  −Z0 ]
      [ 0  0  0   1  ]
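A minimal NumPy sketch of these homogeneous-coordinate transforms follows; it is not from the book, and the function names are illustrative. It composes v* = R[s(Tv)] and checks that the inverse translation simply negates the offsets.

import numpy as np

def translation(x0, y0, z0):
    T = np.eye(4)
    T[:3, 3] = [x0, y0, z0]                 # offsets go in the last column
    return T

def scaling(sx, sy, sz):
    return np.diag([sx, sy, sz, 1.0])

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c,  s, 0, 0],
                     [-s,  c, 0, 0],
                     [ 0,  0, 1, 0],
                     [ 0,  0, 0, 1]])

v = np.array([2.0, 3.0, 0.0, 1.0])          # point (X, Y, Z) in homogeneous form
T, S, R = translation(1, -2, 0), scaling(2, 2, 1), rotation_z(np.pi / 4)
v_new = R @ (S @ (T @ v))                   # v* = R[s(Tv)]; order matters
print(np.allclose(np.linalg.inv(T), translation(-1, 2, 0)))   # True: T inverse negates offsets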



8 Spatial-Domain Filtering

Point Processing

Point processing refers to a point-wise operation on each pixel-intensity value of an image. Point-processing
operations can be applied on binary and grayscale im-
ages. Consequently, spatial-domain image transforms
can be of two forms based on image type:
• Binary image transforms involve operations on
binary-pixel intensity values.
• An image negative obtains pixel-wise negatives
of an image. It is usually suitable for binary-
image processing.
• Logical operations apply logical operators
such as AND, OR, XOR, and XNOR on binary-
image pixels. They are often suitable for binary-
image processing.
• Grayscale transforms involve point-wise opera-
tions on grayscale pixel-intensity values.
• Log transformation applies a pixel-wise logarithm transform on an image using the following: s = k log(1 + I), where k is a constant, and I is the intensity of a pixel in an image with I ≥ 0.
• Power law (gamma) transformation applies a pixel-wise gamma transform on an image using the following: s = k I^γ, where k and γ are positive constants, and I is the intensity of a pixel in an image with I ≥ 0.
• Bit-plane slicing and transformation opera-
tion works on individual bit-planes and applies
transformation on each plane separately.
• Point-wise image averaging obtains average
pixel values for a group of pixels within a mask
and replaces all the values within that mask
with the average pixel intensity. The resulting
image is a smoothed version of the original
image.
In general, spatial transforms provide a way to
access image information according to spatial frequency,
size, shape, and contrast. Spatial transforms operate
on different scales, such as local pixel neighborhood
(convolution) and global image (filters implemented in the
Fourier domain).



Spatial-Domain Filtering 9

Spatial-Domain Convolution Filters

Consider a linear space-invariant (LSI) system as shown:

The two separate inputs to the LSI system, x1(m) and x2(m), and their corresponding outputs are given as
x1 ( m) → y1 ( m) and x2 ( m) → y2 ( m)
Thus, an LSI system has the following properties:
• Superposition property
A linear system follows linear superposition.
Consider the following LSI system with two inputs
x1 ( m, n) and x2 ( m, n), and their corresponding outputs
y1 ( m, n) and y2 ( m, n).

The linear superposition is given as


x1 ( m, n) + x2 ( m, n) → y1 ( m, n) + y2 ( m, n)
• Space invariance property
Consider that if the input x(m) to a linear system is
shifted by M , then the corresponding output is also
shifted by the same amount of space, as follows:
x( m − M ) → y( m − M ).
Furthermore, h(m, n; m′, n′) ≅ T_r[δ(m − m′, n − n′)] = h(m − m′, n − n′; 0, 0), where T_r is the transform due to the linear system, as shown in the above figure. Hence, h(m, n; m′, n′) = h(m − m′, n − n′); the system is defined as LSI or as linear time invariant (LTI).
• Impulse response property
A linear space-invariant system is completely
specified by its impulse response. Since any input
function can be decomposed into a sum of time-
delayed weighted impulses, the output of a linear
system can be calculated by superposing the sum of
the impulse responses. For an impulse at the origin, the output is h(m, n; 0, 0) ≅ T_r[δ(m − 0, n − 0)].



10 Spatial-Domain Filtering

Convolution

As a consequence of the LSI properties, the output of a linear shift-invariant system can be calculated by a convolution integral, since the superposition sum simplifies to a convolution sum due to the shift-invariance property. The output of a linear system is given as

y(m) = ∫_{−∞}^{∞} f(m, z) x(z) dz

Following the shift invariance of the LSI system, the convolution is obtained, given as

y(m) = ∫_{−∞}^{∞} f(m − z) x(z) dz

Convolution describes the processing of an image within a moving window. Processing within the window always
happens on the original pixels, not on the previously
calculated values. The result of the calculation is the
output value at the center pixel of the moving window. The
following steps are taken to obtain the convolution:
1. Flip the window in x and y.
2. Shift the window.
3. Multiply the window weights by the corresponding
image pixels.
4. Add the weighted pixels and write to the output
pixel.
5. Repeat steps 2–4 until finished.
A problem with the moving window occurs when it
runs out of pixels near the image border. Several ‘trick’
solutions for the border region exist:
• Repeat the nearest valid output pixel.
• Reflect the input pixels outside the border and
calculate the convolution.
• Reduce the window size.
• Set the border pixels to zero or mean image.
• Wrap the window around to the opposite side of the image (the same effect produced by filters implemented in the Fourier domain), i.e., the circular boundary condition.
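The windowed convolution steps and one of the border strategies above can be sketched in NumPy as follows (an illustration only; the 'reflect' padding corresponds to reflecting the input pixels outside the border).

import numpy as np

def convolve2d(image, kernel):
    kernel = np.flipud(np.fliplr(kernel))                  # step 1: flip the window in x and y
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode='reflect')   # border handling
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):                        # step 2: shift the window
        for j in range(image.shape[1]):
            window = padded[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)            # steps 3-4: weight, sum, write output
    return out

lpf = np.ones((3, 3)) / 9.0                                # simple smoothing (low-pass) window
smoothed = convolve2d(np.random.rand(8, 8), lpf)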



Spatial-Domain Filtering 11

Linear Filters

If the filtering satisfies the superposition property, then the filter is termed linear. This property of LTI/LSI states
that the output of the sum of two inputs is equal to the
sum of their respective outputs. Any digital image can be
written as the sum of two images:
• The low-pass (LP) image contains the large-area
variation in the full image; the variation determines
the global contrast.
• The high-pass (HP) image contains the small-area
variation in the full image; the variation determines
the sharpness and detail content.
• The LP- and HP-image components can be calculated
by convolution or by filters implemented in the
Fourier domain: image ( x, y) = LP ( x, y) + HP ( x, y).

A simple low-pass filter (LPF) is a three-pixel window with equal weights: [1/3, 1/3, 1/3]. A sample LPF is provided.

A complementary high-pass filter (HPF) is given as

[−1/3, 2/3, −1/3] = [0, 1, 0] − [1/3, 1/3, 1/3]
Some of the properties of LPF and HPF include:
• The sum of the LPF weights is one, and the sum of
the HPF weights is zero.
• The LPF preserves the signal mean, and the HPF
removes the signal mean.
• LP-filtered images obtain the same mean but a
lower standard deviation than the original image
histogram.



12 Spatial-Domain Filtering

Linear Filters (cont.)

• HP-filtered images obtain zero mean and a much smaller standard deviation than the original image (near-symmetric histogram).

Statistical filters are similar to convolution filters with a moving window, but the calculation within the window generates a statistical measurement. Examples
include:
• morphological filters
• mean filters
• median filters

For a median filter, the output at each pixel is the median of the pixel values within the window. An example with a
five-pixel moving window for a 1D image is provided:
• Input = 10, 12, 9, 11, 21, 12, 10, 10
• Median filtered output = . . . , 11, 12, 11, 11, . . .
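The five-pixel median-filter example above can be reproduced with a short NumPy sketch (an illustration, not the book's code):

import numpy as np

x = np.array([10, 12, 9, 11, 21, 12, 10, 10])
w = 5
out = [np.median(x[i:i + w]) for i in range(len(x) - w + 1)]
print(out)    # [11.0, 12.0, 11.0, 11.0], matching the filtered output listed above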

This filter is insensitive to outliers and can therefore suppress impulse (single-pixel or line) noise. A median
filter is especially effective at removing impulse (salt-and-
pepper) noise in an image.
Another useful statistical filter is a mean filter. Similar
to the median filter, for a mean filter, output at each pixel
is the mean of the pixel values within the window.
Finally, morphological filters are based on mathemati-
cal morphology. Morphological filters are a set of tools that
are useful for extracting image components such as bound-
aries, skeletons, and convex hull. Convex hull H of an ar-
bitrary set S is the smallest convex set containing S .



Spatial-Domain Filtering 13

Gradient Filters
Gradient filters provide a way to find the image gradient
(derivative) in any direction. These operators are also
called edge detectors since they are good at picking up
edge information in an image.


In general, a gradient filter combines gradients in the x and y directions in a vector calculation. The filter is obtained using local derivatives in the x and y directions for line (edge) detection. The gradient operator obtained on image f(x, y) is given as

∇f = [G_x, G_y]^T

The angle is given as

a(x, y) = tan⁻¹(G_x / G_y)

The gradient magnitude,

|∇f| = √(G_x² + G_y²)

and its approximation,

|∇f| ≈ |G_x| + |G_y|

are the basis for the digital implementation of the derivative operation.
Examples of a few commonly known first-order (1st
derivative) gradient filters include:
• Roberts filters
• Sobel filters
• Prewitt filters



14 Spatial-Domain Filtering

Gradient Filters (cont.)

Consider a 3 × 3 image as follows:

Z1 Z2 Z3
Z4 Z5 Z6
Z7 Z8 Z9

The digital implementation of a sample Sobel edge operator is
G x = ( Z7 + 2 Z8 + Z9 ) − ( Z1 + 2 Z2 + Z3 )
G y = ( Z3 + 2 Z6 + Z9 ) − ( Z1 + 2 Z4 + Z7 )

An example of an image and the corresponding Sobel-edge-extracted image is provided.
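A hedged Python sketch of the Sobel operator follows (assuming NumPy and SciPy are available; the random test image is only a placeholder). The masks implement the G_x and G_y expressions above, and the edge map uses the |G_x| + |G_y| approximation.

import numpy as np
from scipy.ndimage import correlate

sobel_x = np.array([[-1, -2, -1],
                    [ 0,  0,  0],
                    [ 1,  2,  1]])    # G_x = (Z7 + 2*Z8 + Z9) - (Z1 + 2*Z2 + Z3)
sobel_y = np.array([[-1,  0,  1],
                    [-2,  0,  2],
                    [-1,  0,  1]])    # G_y = (Z3 + 2*Z6 + Z9) - (Z1 + 2*Z4 + Z7)

img = np.random.rand(64, 64)
gx = correlate(img, sobel_x, mode='reflect')
gy = correlate(img, sobel_y, mode='reflect')
edges = np.abs(gx) + np.abs(gy)       # gradient-magnitude approximation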

Examples of a second-order (2nd derivative) gradient filter include:
• Laplacian filter
• Laplacian of Gaussian (LoG)

The Laplacian edge operator is given as

∇²f = ∂²f/∂x² + ∂²f/∂y²

The corresponding digital implementation is given as ∇²f = 4Z5 − (Z2 + Z4 + Z6 + Z8).
An LoG edge operator involves Gaussian smoothing of
noise and a subsequent second-derivative operation. Since
the second derivative is more sensitive to noise, low-pass
Gaussian smoothing is very helpful for noisy image edge
extraction.



Spatial-Domain Filtering 15

Histogram Processing

A histogram obtains the PDF of image pixels (intensity, color, etc.). A histogram is usually obtained by plotting the number of pixels with intensity levels at different range intervals.
Let the gray levels in an image be 0 to L − 1; the histogram is given by a discrete function as follows:

p(I_k) = n_k / n,  for k = 0, 1, ..., L − 1

where I_k is the kth gray level, n_k is the number of pixels with gray level I_k, and n is the total number of pixels; a plot of p(I_k) over all k describes the global appearance of the image.
A point operation is defined by a grayscale transfor-
mation function f (G ). Histogram equalization is the
process of applying a point operation on a known im-
age histogram such that the output image has a flat his-
togram (equally many pixels at every gray level). Histogram equalization is useful when putting two images on the same footing for comparison.
A point operation f (G ) is applied on input image A ( x, y)
to transform A ( x, y) into B( x, y). It turns out that the
point operation needed for such histogram equalization is
simply a cumulative distribution function (CDF).
Let us assume that given the histogram of A ( x, y) by
H A (G ), we want to find the histogram of B( x, y) expressed
as HB (G ). HB (G ) is the equalized histogram. An
example of histogram equalization is shown below.



16 Spatial-Domain Filtering

Histogram Processing (cont.)

Assume that the area of an image is A_m, and the maximum gray level is G_m. Then, for a flat histogram, the number of pixels at each gray level is A_m/G_m. The PDF of the image is

p(G) = (1/A_m) H(G)

and the corresponding CDF is given as

P(G) = ∫_0^G p(s) ds = (1/A_m) ∫_0^G H(s) ds

so the target point-operation equation can be written as

f(G) = G_m P(G)

Therefore, the CDF is the point operation needed for histogram equalization.
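A minimal sketch of histogram equalization via the CDF, following the point-operation idea above (assumptions: NumPy and an 8-bit grayscale image; the input array is a placeholder):

import numpy as np

def equalize(image, levels=256):
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))   # H(G)
    pdf = hist / image.size                                         # p(G) = H(G)/A_m
    cdf = np.cumsum(pdf)                                            # P(G)
    lut = np.round((levels - 1) * cdf).astype(np.uint8)             # f(G) = G_m P(G)
    return lut[image]                                               # apply the point operation

img = np.random.randint(0, 64, size=(128, 128), dtype=np.uint8)     # low-contrast test image
flat = equalize(img)                                                # roughly flat histogram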
Adaptive histogram equalization, however, refers to
a histogram equalization technique based on the local
statistics of an image. An image may have variable
intensity distribution, and the image quality may vary
considerably at different locations in an image. Therefore,
a global histogram equalization may not always improve
quality across the image at the same rate. In such cases,
divide the image into smaller subimages and obtain local
statistics such as standard deviation first. Using these
local statistics, variable histogram equalization can be
applied across the subimages. Finally, all of the equalized
subimages can be brought together into a single image
using an interpolation method to remove discontinuity
across the borderlines.
Histogram matching involves matching the histogram
of an image to a target histogram. Once the PDF of
an image is known, observation of this PDF may define
the target histogram to match. The design of the target
histogram requires careful observation of the problem at
hand to ensure that the resulting equalized histogram
actually improves the overall quality of the image.



Frequency-Domain Filtering 17

The Fourier Transform

Transforms are orthogonal expansions of a function or an image. The advantages of transforms are
• simplified image-processing operations and
• powerful computer processing.

To apply this tool intelligently, one must know transforms and their properties. A 1D Fourier transform (FT) of a spatial function f(x) is described mathematically as

F(u) = ∫_{−∞}^{∞} f(x) e^{−j2πux} dx

where f ( x) must be
• continuous in the finite region and
• absolutely integrable.

An FT is also known as a spectrum or Fourier spectrum. To return to the spatial (or time) domain, one must perform an inverse Fourier transform:

f(x) = ∫_{−∞}^{∞} F(u) e^{j2πux} du

An image is a 2D function; image processing must therefore use a 2D Fourier transform, defined by

G(u, v) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) e^{−j2π(ux+vy)} dx dy

Here, (x, y) denote the spatial domain, and (u, v) are the continuous-frequency-domain variables.
Image processing is rarely done using equations,
but rather by using a computer. It is, therefore,
natural to define these functions in a discrete (or
digital) domain. However, these analytical forms lend
themselves to properties of Fourier transforms that lead
to various practical applications ultimately implemented
in computers.



18 Frequency-Domain Filtering

Discrete Fourier Transform

To calculate the Fourier transform using a computer, the integration operation is converted to summation. Discrete approximation of a continuous-domain Fourier transform is referred to as a discrete Fourier transform (DFT):

F(n) = Σ_{k=0}^{N−1} f_k e^{−j(2π/N)nk},  n = 0, 1, ..., N − 1

Here, k and f_k represent the time or space domain, while n and F(n) represent the frequency domain. The inverse DFT can be calculated from the DFT as follows:

f(k) = (1/N) Σ_{n=0}^{N−1} F(n) e^{j(2π/N)kn},  k = 0, 1, ..., N − 1

For images, the 2D form of the DFT is used:

G(n, m) = Σ_{k=0}^{N−1} Σ_{l=0}^{N−1} g_{k,l} e^{−j(2π/N)(kn+ml)},
    m = 0, 1, ..., M − 1,  n = 0, 1, ..., N − 1

        = Σ_{k=0}^{N−1} { Σ_{l=0}^{N−1} g_{k,l} e^{−j(2π/N)ml} } e^{−j(2π/N)kn},
    m = 0, 1, ..., M − 1,  n = 0, 1, ..., N − 1

The above double sum can be evaluated by


• two 1D transformations along rows and columns
• faster and more efficient computer implementation
using the fast Fourier transform (FFT) developed
by Cooley and Tukey

Most image-processing languages have a built-in function to implement the FFT.
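As a quick illustration (not from the book), the direct DFT summation above can be checked against NumPy's built-in FFT:

import numpy as np

N = 64
f = np.random.rand(N)
n = np.arange(N)
W = np.exp(-2j * np.pi * np.outer(n, n) / N)    # matrix of e^{-j(2*pi/N)nk} terms
F_direct = W @ f                                # O(N^2) direct summation
F_fft = np.fft.fft(f)                           # O(N log N) fast Fourier transform
print(np.allclose(F_direct, F_fft))             # True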



Frequency-Domain Filtering 19

Properties of the Fourier Transform

The properties of the Fourier transform give rise to a number of practical applications, including
• speeding up convolution and correlation
• analyzing optical systems
• understanding the effect of scaling, translation, and
rotation
• matched filtering and object identification

Linearity: The Fourier transform of a linear combination of any number of functions is the sum of the Fourier transforms of the individual functions:
F { c 1 f 1 ( x, y) + c 2 f 2 ( x, y) + · · ·} = c 1 F1 ( u, v) + c 2 F2 ( u, v) + · · ·

where c_1 and c_2 are arbitrary constants, and F denotes the Fourier transform operation.
Scaling: When the size of an object increases, its spectrum becomes smaller. The scaling property states that an object scaled by factors a and b will lead to an inverse scaling of the spectrum, namely

F{f(ax, by)} = (1/|ab|) F(u/a, v/b)
The scaling property explains why pattern recognition
based on the Fourier transform is not scale invariant.
Correlation: Pattern recognition is based on the
correlation operation, which can be performed efficiently
in the Fourier domain. Correlation of two functions in the
Fourier domain is the product of the Fourier transform of
the first function and the complex conjugate of the FT of
the other:
F { f ( x) ⊗ g( x)} = F ( u)G ∗ ( u)

The case when the two functions are the same is termed
autocorrelation; when they are different, it is called
cross-correlation.



20 Frequency-Domain Filtering

Convolution and Correlation in the Fourier Domain

Correlation is defined in the spatial domain, and the inverse Fourier transform must be performed to return to the spatial domain. In the case of autocorrelation,

F{f(x) ⊗ g(x)} = F(u)G*(u)

reduces to

F{f(x) ⊗ f(x)} = |F(u)|²

The inverse of this intensity spectrum generates the autocorrelation output:

f(x) ⊗ f(x) = F⁻¹{|F(u)|²}

The process of multiplying a Fourier transform by F* and then taking the inverse transform is called matched
filtering. This Fourier transform property of correlation
forms the basis of performing matched filtering in a
computer using FFTs.
Transform of a transform: Taking the transform of a
transform reproduces the original function with its axes
reversed:
F {F ( u, v)} = f (− x, − y)

Optically, a lens performs a Fourier transform on its input at the Fourier plane. Interestingly, it actually performs
a forward transform. This property explains why an image
is flipped at the output when two lenses are used to image
an input. It is because two lenses perform two forward
transforms.
Convolution: The most famous property of the Fourier
transform is the transform of convolution property, which
states that the Fourier transform of a convolution of two
functions is obtained by simply multiplying the individual
Fourier transforms:
F { f ( x) ∗ g( x)} = F ( u)G ( u)



Frequency-Domain Filtering 21

More Properties of the Fourier Transform

The output of a linear space-invariant system is the convolution of the input with the impulse response (or Green's function, as used by physicists) for 1D signals, the PSF for 2D signals, or the point-impulse response for 3D signals incorporating both space and time. Using
the convolution property, the output is obtained by
• multiplying the input (signal or image) transform by
the system transfer function
• taking an inverse transform

The convolution property avoids tedious integration to evaluate a convolution output.
Translation: Correlation or matched filtering (a
technique to detect signal in the presence of noise) is
famous for its convenience of shift invariance, which
means that if the input object is shifted, the correlation
output will appear at the corresponding location. This
is afforded by the translation property of the Fourier
transform, which states that when an object is translated,
its Fourier transform undergoes a phase shift in the
frequency domain:
F { f ( x − a, y − b)} = e− j2π(au+bv) F ( u, v)

The amplitude and location of the Fourier transform remain unchanged. A dual property of this is the
frequency-domain translation, which is obtained when
the input is multiplied by a space-dependent phase factor:
F { f ( x, y) e− i2π(ax+b y) } = F ( u + a, v + b)

Product of two functions: This property allows one to derive the spectrum of a sampled signal as the transform
of a product. The Fourier transform of a product in the
spatial domain is convolution in the transform domain:
F { f ( x ) g ( x )} = F ( u ) ∗ G ( u )

This property explains why the spectrum of a sampled signal is represented by a periodic spectrum.



22 Frequency-Domain Filtering

Spectral Density

The spectral density or power spectrum of a function f(x, y) is defined as the Fourier transform of its autocorrelation function:

S_ff(u, v) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} R_ff(x, y) e^{−j2π(ux+vy)} dx dy

The autocorrelation is thus related to the spectral density by an inverse transform relationship:

R_ff(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} S_ff(u, v) e^{j2π(ux+vy)} du dv

The cross-spectral density between two signals is

S_fg(u, v) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} R_fg(x, y) e^{−j2π(ux+vy)} dx dy

If g is related to f(x, y) through convolution,

g(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x − a, y − b) h(a, b) da db

then one can show that, in the Fourier domain,

S_gg(u, v) = S_ff(u, v) |H(u, v)|²

When g(x, y) involves an additive noise term,

g(x, y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x − a, y − b) h(a, b) da db + n(x, y)

and f and n have zero mean and are uncorrelated, then

S_gg(u, v) = S_ff(u, v) |H(u, v)|² + S_nn(u, v)



Frequency-Domain Filtering 23

Properties of the Discrete Fourier Transform

The discrete Fourier transform shares many of the same properties as the continuous Fourier transform, but because of the finite nature of the sampled signal, it has additional properties.
Linearity: The discrete Fourier transform of the weighted
sum of any number of functions is the sum of the DFT of
the individual functions:

DFT{ c 1 f 1 (k, l ) + c2 f 2 (k, l ) + · · ·} = c1 F1 (m, n) + c2 F2 (m, n) + · · ·

where c_1 and c_2 are arbitrary constants, and F_1 = DFT{f_1(k, l)}, etc.
Separability: This property provides a faster method of calculating the 2D DFT than the direct method provides. Mathematically,

F(n, m) = Σ_{k=0}^{N−1} Σ_{l=0}^{N−1} f_{k,l} e^{−j(2π/N)(kn+ml)},
    n = 0, 1, ..., N − 1;  m = 0, 1, ..., M − 1

F(n, m) = Σ_{l=0}^{N−1} { Σ_{k=0}^{N−1} f_{k,l} e^{−j(2π/N)kn} } e^{−j(2π/N)ml},
    n = 0, 1, ..., N − 1;  m = 0, 1, ..., M − 1

F(n, m) = Σ_{l=0}^{N−1} F(n, l) e^{−j(2π/N)ml}

where

F(n, l) = Σ_{k=0}^{N−1} f_{k,l} e^{−j(2π/N)kn}

Thus, a 2D FFT can be calculated by a row transform of the original image followed by a column transform of the resulting row transform.
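A short NumPy check of the separability property (an illustration only): transforming the rows and then the columns reproduces the full 2D FFT.

import numpy as np

img = np.random.rand(32, 32)
rows = np.fft.fft(img, axis=1)                  # 1D transform along each row
row_col = np.fft.fft(rows, axis=0)              # then along each column
print(np.allclose(row_col, np.fft.fft2(img)))   # True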



24 Frequency-Domain Filtering

Properties of the Discrete Fourier Transform (cont.)

Periodicity: Both the DFT and its inverse (or the original signal) are periodic with period N:

F ( n) = F ( n + N )

for all n, and

f_k = f_{k+N}

for all k. In 2D,

F ( n, m) = F ( n + N, m + N )
= F ( n + N, m)
= F ( n, m + N )

The periodicity evolves from the periodic property of complex exponentials. Additionally,
• the spectrum of a discrete signal is periodic, a fact
that becomes evident when the DFT of a signal is
plotted
• the inverse of the DFT, the space- (time) domain
signal, consequently becomes a periodic extension of
the original signal

Real signals: For a real signal, the DFT has additional symmetry. The real part of a DFT is evenly symmetric, and
the imaginary part is oddly symmetric. As a consequence,
the magnitude of the spectrum is evenly symmetric and
the phase is oddly symmetric. For a real 2D image, the
magnitude spectrum is symmetric in both the x- and y-
axes when the origin is assumed to be the center of the
image. Thus, for a real 2D image, specifying only a quarter
of the spectrum is enough to describe the image. However,
when computing a 2D FFT using a computer, the origin
is assumed to be at the lower left corner. To align the
spectrum origin to the center of the image, MATLAB®
functions like fftshift swap the four quadrants of the computed 2D DFT using this symmetry.



Frequency-Domain Filtering 25

Properties of the Discrete Fourier Transform (cont.)

Translation: When a signal is translated, the corresponding DFT undergoes a linear phase shift:

f(k − k_0, l − l_0) ⇔ e^{−j(2π/N)(k_0 m + l_0 n)} F(m, n)

This property also shows that shifting the image does not change or move the magnitude spectrum but only affects the phase spectrum.
Rotation: Rotating the image in the spatial domain
rotates the Fourier spectrum by the same amount.
Scaling: Scaling an image has an opposite effect. If the image is scaled so that it is expanded, its spectrum will shrink:

f(ak, bl) ⇔ (1/|ab|) F(m/a, n/b)

Modulation: When a signal is multiplied by a complex exponential e^{j(2π/N)(m_0 k)}, its DFT is shifted by m_0 units:

f(k, l) e^{j(2π/N)(m_0 k + n_0 l)} ↔ F(m − m_0, n − n_0)

One such application involves multiplying the 2D signal by a phase function so that the 2D DFT is shifted and the origin is at the center of the Fourier plane:

f(k, l)(−1)^{k+l} ↔ F(m − M/2, n − N/2)

where m = 0, 1, ..., M − 1 and n = 0, 1, ..., N − 1.
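A hedged NumPy sketch of this modulation property: multiplying by (−1)^(k+l) before the transform centers the spectrum, which matches what fftshift does to the computed DFT (for even image dimensions).

import numpy as np

M = N = 16
img = np.random.rand(M, N)
k, l = np.meshgrid(np.arange(M), np.arange(N), indexing='ij')
centered = np.fft.fft2(img * (-1.0) ** (k + l))      # spatial-domain phase factor
shifted = np.fft.fftshift(np.fft.fft2(img))          # rearranging quadrants after the fact
print(np.allclose(centered, shifted))                # True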
Time/space reversal: When the signal is reversed in the
time/space domain, the DFT of the reversed signal is given
by

f (− k, − l ) ↔ F (− m, − n) = F ( M − m, N − n)



26 Frequency-Domain Filtering

Discrete Correlation and Convolution

For a linear space-invariant system, the output is the convolution of the input with the system response (impulse response or PSF). The Fourier transform of the
system function is the transfer function. According to the
Fourier theory of convolution, the output transform is
simply the product of the input transform and the system
transfer function. To find the output image, an inverse
transform of the product must be performed:
f(x) ∗ g(x) = F⁻¹{F(u)G(u)}

Implementation of the above operation in the discrete domain results in a circular convolution, not a linear convolution:

f(k) ∗ g(k) = DFT⁻¹{DFT(f) DFT(g)}

To extend this to discrete correlation,


f(k) ⊗ g(k) = DFT⁻¹{DFT[f] conj[DFT(g)]}

Discrete convolution (or correlation) in the time (space) domain of two signals of N samples each produces a convolved signal that is 2N − 1 samples long. In convolution using the DFT, N
samples in the time domain will produce a DFT that
is N samples long. The product is therefore N samples.
However, the inverse of the product of the DFTs is also
N samples long instead of 2 N − 1 samples. This result is
therefore not the linear convolution of two signals, but is
known as circular convolution.



Frequency-Domain Filtering 27

Circular Convolution and Zero Padding

Assume two signals, f = {1, 2, 3, 4} and g = {5, 6, 7, 8}. Circular convolution can be demonstrated by displaying the samples on a circle, rotating the reverse-ordered sample in the forward direction, and summing the overlapped signal. The result of the circular convolution operation is

h(0) = f(0)g(0) + f(1)g(3) + f(2)g(2) + f(3)g(1) = 1×5 + 2×8 + 3×7 + 4×6 = 66
h(1) = f(0)g(1) + f(1)g(0) + f(2)g(3) + f(3)g(2) = 1×6 + 2×5 + 3×8 + 4×7 = 68, etc.

The same result can be obtained by taking the DFT of each signal, multiplying them, and then taking an inverse transform.
In order to obtain linear convolution from a circular
convolution (or DFT), each signal must be made 2 N − 1
samples long. The method of extending signals by adding
zeros is known as zero padding. If three zeros are added
to each of the signals and then a circular convolution
is performed, the result is the same as that of a linear
convolution.
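A small NumPy sketch (not from the book) of the f and g example above: the product of the DFTs gives the circular convolution, and zero padding to 2N − 1 samples recovers the linear convolution.

import numpy as np

f = np.array([1, 2, 3, 4])
g = np.array([5, 6, 7, 8])

circular = np.real(np.fft.ifft(np.fft.fft(f) * np.fft.fft(g)))
print(np.round(circular))                        # [66. 68. 66. 60.]

L = len(f) + len(g) - 1                          # 2N - 1 = 7 samples
linear = np.real(np.fft.ifft(np.fft.fft(f, L) * np.fft.fft(g, L)))
print(np.allclose(linear, np.convolve(f, g)))    # True: zero padding gives linear convolution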
The rule to obtain convolution or correlation using DFTs
involves extending the signals by adding zeros at the end.
Practically, zero padding is done by doubling the length
of each signal. For a signal that is 2D, this actually means
making an image 4× its original size, since the convolution
output will become 2 N − 1 in both x and y directions, i.e.,
(2 N − 1) × (2 N − 1).



28 Frequency-Domain Filtering

Matched Filtering

Matched filtering is a technique to detect a signal in the presence of noise while maximizing the signal-to-
noise ratio (SNR). It is accomplished by correlating the
signal plus noise with a template of the signal itself. When
performing matched filtering or any kind of filtering
between an image and a template, the template and the
image should be appropriately zero padded. If this is not
done, circular correlation will result, causing the tails of
the correlation plane to wrap into the peak region of the
output plane.
• When two images are of the same size, zero padding
means making an image 4× its original size by simply
copying zeros equal to the number of pixels in the
image plane to its three sides.
• When the template is of smaller size ( M ), then both
images need to be made to the M + N − 1 size.

Before performing a DFT, both images must be of the same size so that their DFTs can be multiplied correctly.
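A minimal sketch (assumptions: NumPy; the scene, template, and noise level are invented for illustration) of matched filtering by zero-padded, FFT-based correlation; the F · conj(G) product is the matched-filter operation described earlier.

import numpy as np

scene = np.zeros((64, 64))
template = np.random.rand(16, 16)
scene[20:36, 30:46] = template                       # place the target at (20, 30)
scene += 0.1 * np.random.rand(64, 64)                # additive noise

P = Q = 64 + 16 - 1                                  # pad both images to M + N - 1
S = np.fft.fft2(scene, (P, Q))
T = np.fft.fft2(template, (P, Q))
corr = np.real(np.fft.ifft2(S * np.conj(T)))         # correlation plane
print(np.unravel_index(np.argmax(corr), corr.shape)) # (20, 30): the correlation peak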
This method also applies to joint transform correla-
tion, where the correlation between two images is done
by displaying them side-by-side, taking a Fourier trans-
form with a lens of the sum of two images and squaring
it with a photodetector, and subsequently taking a second
Fourier transform. This squaring trick generates an FG ∗
type of term, the inverse transform of which results in cor-
relation terms. In this setup, there must be sufficient zero
padding between the two images and the outside so that
the cross-correlation terms do not overlap with the auto-
correlation terms.



Frequency-Domain Filtering 29

Filtering with the Fourier Transform

The filtering operation refers to the method of modifying the frequency content of an image. The theoretical basis
of Fourier-domain filtering is the convolution property
of the Fourier transform. Thus, the image Fourier
transform is multiplied by the filter transfer function,
and then an inverse transform of the product is taken to
produce the frequency-filtered image.

G(u, v) = F(u, v) H(u, v)   (convolution property)
g(x, y) = F⁻¹{F(u, v) H(u, v)}

For a specific application, the linear filter H(u, v) is designed to perform the desired operation. Two specific examples follow:
• When an image has random- or high-frequency noise,
a low-pass filter can enhance the image by reducing
the high-frequency noise.
• When the high-frequency edges or components need
to be enhanced, a high-pass filter can be employed to
suppress the low-frequency information and enhance
the high-frequency edges.




30 Frequency-Domain Filtering

Low-Pass and High-Pass Filtering

A typical low-pass filter will have a nonzero value over a range of frequencies and zero outside that range. For example,

H(u, v) = 1 for u² + v² ≤ r_0²
        = 0 otherwise

A high-pass filter, therefore, can be just the opposite:

H(u, v) = 0 for u² + v² ≤ r_0²
        = 1 otherwise

One can imagine a filter with a smoother transition. A key characteristic of a low- or high-pass filter is its cutoff frequency—the frequency at which the filter exhibits the high-to-low (low-to-high) transition. It allows us to compare various filters. When the DFT is used in a computer, implementing a low- or high-pass filter is relatively simple: set H(i, j) = 1 when i² + j² ≤ r_c², otherwise set it to 0. Then, an inverse transform will produce the low-pass filtered image.
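A hedged NumPy sketch of this ideal low-pass filter in the DFT domain follows (the cutoff radius r_c = 20 and the test image are placeholders):

import numpy as np

img = np.random.rand(128, 128)
F = np.fft.fftshift(np.fft.fft2(img))              # spectrum with the origin at the center
u = np.arange(128) - 64
U, V = np.meshgrid(u, u, indexing='ij')
H = (U**2 + V**2 <= 20**2).astype(float)           # H = 1 inside the cutoff radius, 0 outside
low_passed = np.real(np.fft.ifft2(np.fft.ifftshift(F * H)))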
Another well-known case where a low-pass filter is useful
occurs when retrieving a signal from its sampled form.
The spectrum of a sampled signal is the original signal
repeated at the interval of the sampling frequency, as
shown in (a). In order to retrieve the original signal, as
shown in (b), the spectrum centered at the origin must
be selected using a low-pass filter. The retrieving filter
is a simple mathematical expression, and it is simpler to
realize this sharp filter in the digital frequency domain
than in the analog frequency domain.



Frequency-Domain Filtering 31

Sampling

Digital processing of signals requires that an analog signal or image be sampled first before converting it to digital
form. The Nyquist criterion governs how many samples
must be taken in order to faithfully represent an image
or signal. For a sinusoidal wave, at least two samples
per period are needed to crudely approximate a signal. In
other words, the Nyquist criterion states that for perfect
reconstruction of a band-limited signal (which means
there is a finite maximum frequency present in the signal),
the sampling frequency must be at least two times the
highest frequency. The effects of sampling an image can
be readily observed by the human eye as the sampling rate
is reduced. For example, take a high-resolution image and
digitize it. When the image size is reduced from 512 × 512 to 32 × 32, the loss of information content and the deterioration in quality are readily evident. Reducing the number of samples leads to false contours, among other artifacts.
Another aspect of sampling involves the number of
quantization levels, the reduction of which can reduce
the information content of an image. In a digital camera,
the most advertised specification is the size of the image,
which relates to the spatial sampling (a 1-megapixel
camera has 1000×1000 pixels). The quantization levels are
not readily noticeable. However, the combination of both
quantization levels and number of samples determines
the amount of memory required to store an image. For
example, an 8-bit black-and-white image of 1024 × 1024 size will be 1024 × 1024 × 8 bits = 2^10 × 2^10 bytes = 1 MB. If it were in color, there would be three times as many bits per pixel, so the size would jump to 3 MB. Note that 2^10 = 1 K, 2^20 = 1 M, etc. The spatial resolution of a digital
image thus depends on these two factors: the number of
samples and the number of bits (tonal resolution) used to
represent each sample. Resolution indicates the quality
of a digital image. Spectral resolution may indicate the
number of colors, and the temporal resolution of a video
designates frames/sec.
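A two-line check of the storage arithmetic above (assuming 8 bits per sample and 1 MB = 2^20 bytes):

pixels = 1024 * 1024
print(pixels * 8 // 8 / 2**20)        # 1.0 MB for an 8-bit grayscale image
print(pixels * 3 * 8 // 8 / 2**20)    # 3.0 MB for a three-channel color image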



32 Frequency-Domain Filtering

Spectrum of a Sampled Signal

In real life, an image or signal may not be band-limited.


The effect of sampling or undersampling can be best
understood when sampling is analyzed in the frequency
domain. Graphically, the sampled signal and its spectrum
give us an intuitive understanding of
• the role of the Nyquist theorem
• the method of recovering the original signal
• the effect of sampling frequency

Spectrum of a sampled signal: Consider a continuous signal as shown in (a). Assume that its spectrum is band-
limited, as shown in (b). Ideal sampling is equivalent
to multiplication by a series of impulse functions, as
shown in (c). Multiplication in the time (space) domain
is equivalent to convolution in the frequency domain.
If T is the sampling period, then the spectrum of the
sampling function is another impulse function with period
1/T (not shown). Convolving with this spectrum, the band-
limited spectrum of the continuous-domain signal becomes
a series of spectra spaced at an interval of 1/T = f for the sampled signal, as shown in (d).



Frequency-Domain Filtering 33

More Sampling

The Nyquist theorem and aliasing can be understood by a simple analysis of the spectrum of the sampled signal.
Assume that the bandwidth of the signal is 2B, where
the spectrum extends from −B to +B. In order for the
two spectra to not overlap, f ≥ 2B, which means that the
sampling frequency must be greater than or equal to two
times the bandwidth.
In order to recreate the original signal from the sampled
signal, the signal must be passed through a low-pass filter
with bandwidth equal to or greater than bandwidth B. As
long as the Nyquist criterion is maintained, recovery of
the original signal is guaranteed.
When this criterion is not satisfied, the high-frequency
component superimposes with the spectrum of the original
signal so that the original signal gets distorted, making it
impossible to retrieve the original signal using low-pass
filtering. This condition is known as aliasing.
When calculating the DFT of a sampled signal, it is
implied that the time domain signal is periodically
extended, as shown in (a). This is equivalent to convolving
by a series of impulse functions. As a result, the periodic
spectrum of the sampled signal gets multiplied by another
series of impulse functions in the frequency domain. The
spacing of the impulse function is related to the periodicity
of the time-domain samples. Thus, the DFT is nothing
but samples of the continuous-domain spectrum repeated
periodically, as shown in (b).



34 Frequency-Domain Filtering

Spectrum of a Finite Periodic Signal

Single pulse: Assume a rectangular pulse as shown in (a). Its spectrum is a sinc function [sin(x)/x], as shown in (b).

Infinite periodic series: If a periodic signal, as shown in (c), is created by repeating the pulse, this case is equivalent to convolving with a series of impulse functions
(remember that in sampling, it was a multiplication). The
frequency domain will be modified by a multiplication with
the spectrum of this impulse series.
Spectrum: The spectrum of the periodic function
will be a series of impulses that have an envelope
like the sinc function, as shown in (d). As a result,
the spectrum of this periodic function is discrete.
Truncated periodic series: Assume that instead of an
infinite series, the periodic function is truncated at a width
A , as shown in (e). This is equivalent to multiplying by a
huge rectangular function.
Spectrum: The spectrum will be modified by con-
volution with the spectrum of this huge rectangular
function, which is a narrow sinc function of width
1/ A . Imagine erecting a narrow sinc function at the
location of the impulse function in the Fourier do-
main, as shown in (f).

Field Guide to Image Processing


Frequency-Domain Filtering 35

Spectrum of a Finite Periodic Signal (cont.)

Mathematically, the single rectangular pulse is given as


rect( x/a) ↔ sinc(au)
The periodic rect function is expressed as a convolution
with a series of impulse functions:

rect(x/a) ⊗ Σ_n δ(x − nb) ↔ sinc(au) · Σ_n δ(u − n/b)

The Fourier transform is equivalent to a series of impulses
the magnitude of which is enveloped by a sinc function,
sinc(au):

[rect(x/a) ⊗ Σ_n δ(x − nb)] · rect(x/A) ↔ [sinc(au) · Σ_n δ(u − n/b)] ⊗ sinc(Au)
Limiting the number of the rect series by a huge rect with
width A produces a very narrow sinc function sinc( Au)
convolving the series of delta functions that are under
another sinc envelope, sinc(au). Thus, the delta functions
acquire a certain width due to this narrow sinc convolving
with it.
Assume that the rectangular series is replaced by a
triangular function:

tri(x/a) ↔ sinc²(au)

This will thus change the envelope of the spectrum
from sinc to sinc². The spectrum shown in (d) on the
previous page will be modified by the envelope. However,
a finite width on the series will have the same effect of
erecting a sinc at the bottom of the impulse functions.
Mathematically, changing the envelope sinc(au) to sinc²(au)
in the above equation gives the spectrum for the finite
triangular series.
When such a pulse train is passed through a communi-
cation channel with finite bandwidth (or a lens with fi-
nite aperture), the spectrum will be truncated. The result-
ing rectangular pulse train will appear smoothed at the
edge as the high-frequency content is minimized by the
bandwidth. This effect explains why imaging with a finite-
aperture lens causes degradation in the fine details of an
image because the aperture acts as a low-pass filter.

Field Guide to Image Processing


36 Image Restoration

Image Restoration

A real-world image-acquisition system may suffer from


degradations caused by
• detectors
• relative velocity
• defocus
• noise
• illumination, such as stray light
• vibration

The goal of image restoration is to reverse the
effect of degradation and noise. A typical degradation
model shows image f(x, y) being degraded by a degradation
function h(x, y) and additive noise n(x, y), producing the
observed image m(x, y).

The goal of restoration is to find f ( x, y) given m( x, y). A


restoration involves three steps:
1. modeling degradation
2. removing noise
3. removing the system-induced effect

The first step of restoration is to model the degradation,


which can be either linear or nonlinear. The noise
added to the output can be additive or multiplicative.
Restoration can be performed in either the frequency or
spatial domain.
While additive noise can be removed by linear filtering,
multiplicative noise may require a nonlinear process, such
as homomorphic filtering. In this process, the multiplicative
noise is converted to additive noise by taking the logarithm
of the image intensity.
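A minimal homomorphic-filtering sketch follows (an illustration of one common implementation, assuming NumPy and a synthetic illumination ramp): the logarithm turns the multiplicative illumination–reflectance model into a sum, a Gaussian-shaped high-frequency emphasis is applied in the frequency domain, and the exponential restores the intensity scale. The filter parameters d0, gamma_low, and gamma_high are illustrative choices, not values from the text.

import numpy as np

def homomorphic_filter(image, d0=30.0, gamma_low=0.5, gamma_high=2.0):
    """Suppress slowly varying illumination, boost reflectance detail."""
    rows, cols = image.shape
    log_img = np.log1p(image.astype(np.float64))          # multiplicative -> additive
    spectrum = np.fft.fftshift(np.fft.fft2(log_img))

    # Gaussian-shaped high-frequency emphasis filter, centered on DC.
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    d2 = u[:, None] ** 2 + v[None, :] ** 2
    h = (gamma_high - gamma_low) * (1 - np.exp(-d2 / (2 * d0 ** 2))) + gamma_low

    filtered = np.real(np.fft.ifft2(np.fft.ifftshift(h * spectrum)))
    return np.expm1(filtered)                             # back to the intensity scale

# Example: a bright-to-dark illumination ramp multiplying a checkerboard pattern.
x, y = np.meshgrid(np.arange(128), np.arange(128))
reflectance = ((x // 8 + y // 8) % 2).astype(float) + 0.5
illumination = np.linspace(0.2, 2.0, 128)[None, :]
corrected = homomorphic_filter(reflectance * illumination)
print(corrected.shape, corrected.min(), corrected.max())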

Field Guide to Image Processing


Image Restoration 37

Linear Space-Invariant Degradation

When a system is linear, it obeys superposition, which


implies that the output can be calculated as the
superposition sum of elementary responses (impulse
response) weighted by the input. Space or time
invariance implies that the output is independent of the
position or time in which it is applied. Combining the two
conditions of linearity and position (or time) invariance
allows one to express the output of a linear space-
invariant system in terms of a convolution integral.
For a time-domain system, impulse response h( t) is the
response to an impulse function δ( t). Equivalently, for an
optical system, the PSF is the response to a point source
of light. The PSF is generally nonnegative, whereas the
impulse response can have negative values.
For a spatial-domain, linear, space-invariant continuous
system, the output can be expressed in terms of its PSF
through a convolution integral:
g(x, y) = ∫∫_{−∞}^{∞} f(α, β) h(x − α, y − β) dα dβ

When noise is added,

m(x, y) = ∫∫_{−∞}^{∞} f(α, β) h(x − α, y − β) dα dβ + n(x, y)
An extension of the PSF includes the variable of time and
is called the point impulse response. Thus h( x, y, t) has
the variable of time and space and can be defined as the
response to a point source of light that is valid for an
infinitesimal duration of time.
Linear space-invariant system approximation facilitates
solving image restoration problems using FT and DFT-
based computational tools. In the Fourier domain, the
output will be
M(u, v) = H(u, v)F(u, v) + N(u, v)

where H(u, v) and F(u, v) represent the Fourier transforms
of h and f, respectively, and the output m can be evaluated
using the DFT.

Field Guide to Image Processing


38 Image Restoration

Discrete Formulation

The discrete form of the degradation model is expressed


as m = H f + n, where m is the given output image, H is the
degradation model, n is noise, and f is the input image
to be restored. For the moment, consider everything to be
in one dimension. The discrete convolution g = H f can be
expressed in matrix notation in the form of a circulant
matrix:
H =
h(0)   0     0
h(1)  h(0)   0
 0    h(1)  h(0)
 0     0    h(1)
In a circulant matrix, rows are the reverse of the impulse
response, and each row is a shifted version of the row
above.
If f has two elements, then one element can be worked
out by hand to see how the above matrix–vector product
leads to a convolution. When f is 2D, it becomes a huge
matrix that is complicated to visualize. A 2D extension of
a circulant matrix is known as a block circulant matrix.
To avoid circular convolution, the matrix (image) needs to
be zero padded.
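The matrix form can be exercised directly in 1D. The short sketch below (an illustration assuming NumPy, not the book's code) builds the convolution matrix H column by column and checks the product H f against np.convolve; the fully circulant, wraparound form is obtained in practice by zero padding both sequences to a common length.

import numpy as np

def convolution_matrix(h, n_input):
    """Matrix H such that H @ f equals np.convolve(h, f) for a length-n_input f."""
    n_output = len(h) + n_input - 1
    H = np.zeros((n_output, n_input))
    for col in range(n_input):
        H[col:col + len(h), col] = h      # each column is a shifted copy of h
    return H

h = np.array([1.0, 2.0])                  # impulse response h(0) = 1, h(1) = 2
f = np.array([3.0, 4.0, 5.0])             # short input signal
H = convolution_matrix(h, len(f))
print(H)                                   # banded form, as in the matrix above
print(H @ f)                               # [ 3. 10. 13. 10.]
print(np.convolve(h, f))                   # identical result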
The least-squares filter in the Fourier domain assumes a
transformation matrix A such that M = [A]m, and the least-
squares estimate f̂ is obtained from a restoration filter P
such that

f̂ = [A]⁻¹[P][A]m

where P is given by

[P] = [S_ff][H]*^T / ([H][S_ff][H]*^T + [S_nn])

which is similar to the Wiener filter for the continuous
domain.
Here, the correlation matrices [R_xy] = E[XY^T], and the
spectral density [S_xy] = [A][R_xy][A]⁻¹.

Field Guide to Image Processing


Image Restoration 39

Algebraic Restoration

Unconstrained restoration: The unconstrained least-


squares approach is an algebraic technique to find the
original image f given a degraded image m and some
information about H and n. This approach is aimed at
minimizing the difference

‖n‖² = ‖m − H f̂‖²

which can be shown to be

f̂ = H⁻¹m

Since f̂ is chosen only to minimize ‖m − H f̂‖² without
any other constraint, it is known as unconstrained
minimization. Unconstrained restoration leads to the
inverse-filter formulation.
inverse-filter formulation.
Constrained restoration: The process of minimizing a
function ‖Q f̂‖², where Q is a linear operator on f, given
the constraint

‖m − H f̂‖² = ‖n‖²

The Lagrange-multiplier-based technique adds this
quantity and calls it

J(f̂) = ‖m − H f̂‖² − ‖n‖² + γ‖Q f̂‖²

where γ is the Lagrange multiplier. Differentiating with
respect to f̂ and setting the expression to zero yields

f̂ = (H′H + γQ′Q)⁻¹ H′m

where H′ (Q′) denotes the transpose of H (Q). Depending on


the transformation matrix Q , least-squares filters such as
the Wiener filter can be derived from the above equation.
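The constrained solution can be tested directly in matrix form. The sketch below is an illustration only (it assumes NumPy, a circular 5-tap blur for H, and a discrete Laplacian for Q, none of which come from the text): a short 1D signal is blurred, noise is added, and the signal is restored with f̂ = (H′H + γQ′Q)⁻¹H′m.

import numpy as np

rng = np.random.default_rng(0)
n = 64
f_true = np.zeros(n); f_true[20:40] = 1.0            # simple box signal

# Blur matrix H: 5-tap moving average, applied circularly.
kernel = np.ones(5) / 5.0
H = np.zeros((n, n))
for i in range(n):
    for k, hk in enumerate(kernel):
        H[i, (i + k - 2) % n] += hk

m = H @ f_true + 0.01 * rng.standard_normal(n)       # degraded observation

# Q: discrete Laplacian (smoothness constraint), circulant for simplicity.
Q = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
Q[0, -1] = Q[-1, 0] = 1

gamma = 0.05                                          # Lagrange-multiplier weight
f_hat = np.linalg.solve(H.T @ H + gamma * Q.T @ Q, H.T @ m)
print(np.round(f_hat[18:42], 2))                      # close to the original box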

Field Guide to Image Processing


40 Image Restoration

Motion Blur

Motion blur occurs when there is relative motion


between an object and the recording device or the
transmitting medium (atmosphere) and can be
• uniform
• non-uniform
• vibratory

The recording medium can be


• real time, such as night-vision goggles, camera, etc.
• permanent, such as photographic film

When a point source of light is recorded as a point, the


system exhibits no distortion. If an imaging system has
a defocus, then the point source will be recorded as the
PSF of defocus. If the point of light is going through a
turbulent atmosphere, there will be a phase distortion.
Some of these distortions can be mitigated in hardware,
such as in an adaptive optics system, or in software,
such as by modeling the distortion and performing image
restoration. The output of a space-invariant display
system with relative motion can be expressed by the
following system model:
G ( u, v) = D ( u, v)S ( u, v)F ( u, v)
where G (u, v) is the output spectrum, D ( u, v) is the
dynamic transfer function, S (u, v) is the static transfer
function, and F (u, v) is the input spectrum.
D ( u, v) can be estimated from the response of a point
source to the recording medium. When recorded on a
moving film, the point will be recorded as a line. If
recorded on a phosphor display such as night-vision
goggles, then the recording will be a gradually decaying
line. If the camera is on a moving platform, it may be
subject to additional vibration, and thus have a blurring
function due to vibration.
For a uniformly moving camera, a delta function will be
recorded as a rect function:

δ(x) ⇒ rect(x/a)

Field Guide to Image Processing


Image Restoration 41

Motion Blur (cont.)

A Fourier transform of the rect function generates a


model of the distortion:
D(u, v) = sinc(au) e^{−jπua}

Because the sinc function has zeroes, an inverse filter


cannot completely restore the image.
The other approach to restoration is deconvolution,
where the blurring function has performed a convolution.
A blind deconvolution is an attempt to reconstruct the
original image by starting with an estimate of the blur,
such as from the Fourier-domain zeroes.
The decay of a phosphor when excited by an input δ(t) is
e^{−t/τ}. When an input δ(x, y, t) moves with a uniform velocity
k, the transfer function of the blurring function is denoted
by

e^{−t/τ} → 1/(1 + j2πf τ)
If the motion is an arbitrary angle Θ, then f is set to
um + vn, where u = k cos Θ and v = k sin Θ, and where m
and n are the unit vectors in the u and v directions,
respectively.

The image on the left is distorted by a vertical speed of


15 cm/sec, as shown on the right. Since the D ( u, v) does
not have a zero, it can be restored by an inverse filter.
Note that if the recording device is a CCD camera, then
the degradation will be modified.

Field Guide to Image Processing


42 Image Restoration

Inverse Filtering

Ignoring the noise term,

G(u, v) = H(u, v)F(u, v)

to obtain

f(x, y) = F⁻¹[G(u, v)/H(u, v)]

1/H(u, v) is known as the inverse filter, since it is expressed
as the inverse of the degradation function.

The picture on the left has been degraded by a velocity of


15 cm/sec; on the right, it is restored by an inverse filter
corresponding to 12 cm/sec.
Since

H(u, v) = 1/(1 + j2πf τ)

where f = um + vn is the speed vector, the inverse filter is
simply 1 + j2πf τ. The restoration filter can be estimated
by substituting appropriate speeds along the x(u) and y(v)
axes, where τ is in the range of milliseconds.
The magnitude of the atmospheric turbulence can be
modeled as
H(u, v) = exp[−c(u² + v²)^{5/6}]

The phase distortion caused by the atmosphere can be


corrected by adaptive-optics techniques.

Field Guide to Image Processing


Image Restoration 43

Wiener Least-Squares Filtering

To avoid the problems of an inverse filter, a restored


image fˆ( x, y) is sought that minimizes some form of
difference between the restored image and the original,
undegraded image f ( x, y). Assume system function p( x, y)
such that it generates fˆ( x, y) from m( x, y):

fˆ( x, y) = m( x, y)∗ p( x, y)

while minimizing E {[ f ( x, y) − fˆ( x, y)]2 }. The restoration


filter, called a Wiener filter, is given by

P(u, v) = H*(u, v) S_ff(u, v) / [S_ff(u, v)|H(u, v)|² + S_nn(u, v)]
        = [1/H(u, v)] · |H(u, v)|² / [|H(u, v)|² + S_nn(u, v)/S_ff(u, v)]

The Wiener filter expresses the restoration filter in terms


of spectral density, which can be obtained from the Fourier
transform of the correlation matrices. Note that
• When S_nn(u, v) = 0, the above becomes an inverse
filter.
• If noise is white, S_nn(u, v) = a = S_nn(0, 0).
• When S_nn(u, v)/S_ff(u, v) → small, this expression leads
to an inverse filter.
• When S_nn(u, v)/S_ff(u, v) → large, P(u, v) → 0.
• When no characteristics of the noise are known—
assuming the noise-to-signal power density ratio is
frequency independent—the ratio can be set to a
constant, S_nn(u, v)/S_ff(u, v) = Γ, where Γ can be
modified to find the best restored image:

P(u, v) = [1/H(u, v)] · |H(u, v)|² / [|H(u, v)|² + Γ]
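The constant-Γ form above lends itself to a compact frequency-domain sketch (an illustration assuming NumPy and a synthetic 1D signal, not the book's example): the signal is blurred by a known kernel, noise is added, and the restoration filter P is applied; setting Γ = 0 falls back to the plain inverse filter, which amplifies the noise.

import numpy as np

rng = np.random.default_rng(1)
n = 256
f_true = np.zeros(n); f_true[100:140] = 1.0

h = np.zeros(n); h[:9] = 1.0 / 9.0                   # 9-tap uniform (motion-like) blur
H = np.fft.fft(h)
m = np.real(np.fft.ifft(H * np.fft.fft(f_true)))     # blurred signal
m += 0.02 * rng.standard_normal(n)                   # additive noise

def wiener_restore(m, H, gamma):
    """P(u) = (1/H) * |H|^2 / (|H|^2 + gamma), applied in the Fourier domain."""
    H_safe = np.where(np.abs(H) < 1e-12, 1e-12, H)   # guard the division
    P = np.conj(H_safe) / (np.abs(H_safe) ** 2 + gamma)
    return np.real(np.fft.ifft(P * np.fft.fft(m)))

f_wiener = wiener_restore(m, H, gamma=0.01)          # stable estimate
f_inverse = wiener_restore(m, H, gamma=0.0)          # inverse filter: noisy result
print(np.std(f_wiener - f_true), np.std(f_inverse - f_true))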

Field Guide to Image Processing


44 Segmentation and Clustering

Image Segmentation and Clustering

The objective of image segmentation is to obtain a


compact representation from an image, motion sequence,
or set of features. Segmentation usually supports specific
applications.
Clustering, on the other hand, is the grouping of features
that belong together. Currently, there is no broad theory
available for segmentation and clustering.
Some of the most commonly used image-segmentation
techniques include:
• Thresholding
• Region growing
• Hough transform

Thresholding involves setting a specific threshold and


keeping or removing features above and below this
threshold value T . These features include
• intensity
• reflectance
• luminance
• texture
• color or
• other appropriate image characteristic variables

For example, a simple model of an image f ( x, y) is given as


f(x, y) = r(x, y) i(x, y)

where r(x, y) is the reflectance feature, and i(x, y) is the
luminance feature.
Image thresholding usually requires hard threshold or
soft (optimal) threshold.
If the PDF of any of the features listed above is bimodal,
then a simple hard threshold can remove unwanted
components.
The following figure shows a segmented image where
threshold T = 0.7 for the intensity feature.

Field Guide to Image Processing


Segmentation and Clustering 45

Image Segmentation and Clustering (cont.)

If the PDF for a feature—e.g., intensity—does not show


a bimodal histogram, the thresholding operation becomes
nontrivial. If the image contains bright (foreground)
and dark (background) regions with approximately Gaussian PDFs, a
soft-threshold value can be obtained by taking the average
of the mean values of the two PDFs. Such a threshold is
known as the optimal threshold: the error is minimized
for separating the background from the foreground. Soft-
and hard-thresholding processes yield binary images.
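Both ideas can be sketched in a few lines (an illustration assuming NumPy and a synthetic two-Gaussian image, not the book's figure): a hard threshold is applied at a fixed value, and a soft (optimal) threshold is taken as the average of the two class means, refined iteratively until it stops moving.

import numpy as np

rng = np.random.default_rng(2)
background = rng.normal(0.3, 0.05, size=(128, 128))        # dark Gaussian mode
foreground = rng.normal(0.8, 0.05, size=(128, 128))        # bright Gaussian mode
mask = np.zeros((128, 128), dtype=bool); mask[32:96, 32:96] = True
image = np.where(mask, foreground, background)

# Hard threshold at a fixed value.
binary_hard = image > 0.7

# Soft ("optimal") threshold: average of the two class means, iterated to a fixed point.
T = image.mean()
for _ in range(20):
    T_new = 0.5 * (image[image > T].mean() + image[image <= T].mean())
    if abs(T_new - T) < 1e-6:
        break
    T = T_new
binary_soft = image > T
print(f"iterative threshold T = {T:.3f}")                   # close to (0.3 + 0.8) / 2
print(binary_hard.mean(), binary_soft.mean())                # fraction of foreground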
Region growing involves identification of a “seed” point
or set of points and growing regions surrounding these
seeds based on common properties of features such as
intensity, texture, and color. Regions can be grown using
a set of criteria. Let R represent the entire region;
segmentation of R into n subregions R1 , R2 , . . . , R n is
obtained using the following criteria:
• ∪_{i=1}^{n} R_i = R (i.e., segmentation must be complete)
• R_i, i = 1, 2, . . . , n, is a connected region
• R_i ∩ R_j = ∅ for all i ≠ j (i.e., the regions are
disjoint)
• P(R_i) = TRUE for i = 1, . . . , n (e.g., all pixels in R_i have
the same intensity)
• P(R_i ∪ R_j) = FALSE for i ≠ j (i.e., R_i and R_j are
different)

These criteria can be developed based on the application.


Some of the challenges for implementing robust region-
growing techniques include devising appropriate criteria
and selecting appropriate seed points.

Field Guide to Image Processing


46 Segmentation and Clustering

Hough Transform

Given a set of collinear edge points, there are an infinite
number of lines passing through each of them. These lines can be
obtained by varying the slope m and the intercept n. All
of these lines can then be represented as a single line in
parameter space. The parameter space is represented
by (m, n) in the following figure. The Hough transform
helps convert lines in image space to points in parameter
space.

Here are the steps to obtain the Hough transform:


1. In the image space, fix (m, n) and vary (x, y). This
generates the set of points lying on the single line
y = mx + n.
2. In the parameter space, fix ( x, y) and vary ( m, n). This
will generate multiple lines through a point.
3. At each point of the (discrete) parameter space, count
how many lines pass through the point. The higher
the count, the more edges are collinear in the image
space.
4. For simultaneous detection of multiple lines, search
for all local maxima of the counter array.
5. Find a peak in the counter array by thresholding.
This is a “bright” point in the parameter space.
6. The intersecting points in the parameter space show
up as a cluster of points due to the discrete-point
nature of the lines in the image space.

Field Guide to Image Processing


Segmentation and Clustering 47

Hough Transform (cont.)

For digital implementation, all of the lines and points are


quantized and displayed, as shown in the figure below.

The simple (m, n) definition of the Hough transform runs
into trouble because the slope can take any value in
−∞ < m < ∞ (a vertical line has infinite slope), so the
parameter space is unbounded. (The value for n has a
similar problem.)
To keep the parameter space finite, a polar representa-
tion is adopted as follows:
ρ = x cos θ + y sin θ
where ρ is the distance between the image origin and the
line, and θ is the line orientation. Notice that an image
point is now represented by a sinusoid, not a line, in
parameter space.
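A compact accumulator-array sketch of the polar form follows (an illustration assuming NumPy; the line, image size, and bin counts are arbitrary choices): each edge point votes along its sinusoid in (ρ, θ) space, and the peak of the counter array recovers the line.

import numpy as np

def hough_lines(points, shape, n_theta=180, n_rho=200):
    """Vote in (rho, theta) space; return the accumulator and its bin centers."""
    diag = np.hypot(*shape)
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rhos = np.linspace(-diag, diag, n_rho)
    acc = np.zeros((n_rho, n_theta), dtype=int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)        # one sinusoid per point
        idx = np.round((rho + diag) / (2 * diag) * (n_rho - 1)).astype(int)
        acc[idx, np.arange(n_theta)] += 1
    return acc, rhos, thetas

# Collinear edge points on the line y = x + 10 inside a 100 x 100 image.
pts = [(x, x + 10) for x in range(0, 80, 2)]
acc, rhos, thetas = hough_lines(pts, (100, 100))
r_i, t_i = np.unravel_index(np.argmax(acc), acc.shape)       # peak = detected line
print(f"rho ~ {rhos[r_i]:.1f}, theta ~ {np.degrees(thetas[t_i]):.1f} deg")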

The next figure shows


the polar representation
of the Hough transform.
The resulting polar pa-
rameter space is pro-
cessed using steps sim-
ilar to those discussed
above.

Field Guide to Image Processing


48 Segmentation and Clustering

Clustering

Clustering is the collecting of features that “belong


together.” Given the set of features detected in an image,
the task is to decide which groups of features are likely to
be part of same object, without knowing what objects we
are looking at. The feature set may include, among others:
• intensity
• color
• Fourier coefficients
• wavelet filter coefficients

Clustering yields segmentation of image content.


Clustering can be primarily of two types:
• Agglomerative clustering attaches each feature
to the closest cluster, iteratively yielding larger
cluster(s). The intercluster distance can be used to
fuse nearby clusters.
• Divisive clustering iteratively splits cluster(s)
along the best boundary. The intercluster distance
can also be used here to fuse nearby clusters.
Some of the options for point (feature) cluster distance
measures are as follows:
• Single-link clustering is obtained by computing the
closest (Euclidean) distance between elements. This
measure yields “extended” clusters.
• Complete-link clustering is obtained by computing
the maximum distance between an element of the
first cluster and one of the second. This measure
yields “rounded” clusters.
• Group-average clustering is obtained by comput-
ing the average of distances between elements in the
clusters. This measure yields “rounded” clusters.
The selection of an appropriate number of clusters is
an open research issue. There are different model-based
techniques to generate the number of clusters. It is
customary to start with heuristics and gradually refine the
number of clusters.

Field Guide to Image Processing


Segmentation and Clustering 49

Clustering (cont.)

The hierarchy in clustering is usually shown using


dendrograms.

K-means clustering is one of the most basic partitional
clustering techniques. In this technique,
1. Choose a fixed number of clusters
2. Obtain the cluster centers
3. Allocate each point (feature) to the nearest cluster by
minimizing the point–cluster distance as follows:
Σ_{i ∈ clusters} Σ_{j ∈ elements of i'th cluster} ‖x_j − µ_i‖²

where x is the new point to be associated to a cluster


with mean µ.
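A plain k-means sketch follows (an illustration assuming NumPy and synthetic two-cluster feature data; it has no empty-cluster handling): points are assigned to the nearest center, centers are recomputed, and the loop stops when they no longer move.

import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain k-means: assign each point to the nearest center, then recompute centers."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(d, axis=1)                       # nearest-cluster rule
        new_centers = np.array([points[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two feature clusters (e.g., intensity plus one texture measure per pixel).
rng = np.random.default_rng(3)
pts = np.vstack([rng.normal(0.2, 0.05, (100, 2)), rng.normal(0.8, 0.05, (100, 2))])
labels, centers = kmeans(pts, k=2)
print(np.round(centers, 2))          # approximately [[0.2, 0.2], [0.8, 0.8]]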
The graph theoretic clustering approach is another
useful technique that has been used to segment natural
images. This clustering involves the following elements:
• Features are represented using a weighted graph.
The weights of the graph are obtained by measuring
affinity among the features.
• There are different affinity measures for different
types of features used for clustering. Affinity
measures using the intensity of the image may
involve computing Euclidean distances between
features.
• The affinity matrix can be formed using the
weighted affinity measures for an entire image.
• The affinity matrix can be cut up to obtain subgraphs
with strong interior links that represent different
clusters.

Field Guide to Image Processing


50 Image Morphology

Erosion and Dilation

Morphological operations are powerful image-processing


tools that are
• nonlinear
• composed of a series of elementary 2D spatial logic
operations
• capable of very complex operations

Erosion is the basic pattern-identification mechanism,


such that at every place where the pattern is present in
an image, an output is substituted in its location.
The pattern used to interrogate the image is known as the
structuring element. Every structuring element has an
origin that defines the place of occurrence of the structuring
element in the image. The origin can be the center or any
other pixel of the structuring element.
Assume that the following image on the left is to be
eroded by the structuring element above. Every place that
the structuring element fits the image is replaced by the
origin. As a result of erosion, the image on the right is
produced.

Effect: Erosion is seen as a process of shrinking the bright
regions of an image.
Application: Intuitively, erosion can be used to clean an
image of spot noise whose size is smaller than the
structuring element. However, in the process, the image
will be shrunk.

Field Guide to Image Processing


Image Morphology 51

Erosion and Dilation (cont.)

Example: Imagine an image composed of tens of connected


pixels that is corrupted by pepper noise that is mostly
1, 2, or 3 pixels. Erosion in this case will remove the
connected pixel noise, leaving the image intact. Note that
salt noise cannot be removed by erosion alone. Salt-and-
pepper noise appears as white and black noise caused by
transmission errors.
Dilation is the opposite of erosion. In dilation, every place
in the image where a single bit is present is replaced by the
structuring element.

Effect: As a result of dilation, an image expands near its


internal or external boundary.
Application: Dilation can be used to fill holes in an image.

Field Guide to Image Processing


52 Image Morphology

Opening and Closing

Erosion and dilation are elementary binary operations


that can be combined to build secondary and more-
complex morphological operations. The first two of such
operations are known as opening and closing.
Opening is erosion followed by dilation. Opening keeps the
part of the image that contains the structuring element.
Whereas erosion keeps only the origin of the structuring
element that fits in the resulting image, opening keeps
the whole of the structuring element. Thus, opening is the
logical sum of all of the structuring elements that fit an
image. In the following figure, the image on the right is
obtained by eroding the image on the left and then dilating
the image in the middle.

One application of opening is the removal of pepper noise


from an image. It can be argued that erosion could achieve
the same thing; however, erosion alone will also shrink
the image. Opening helps restore the image, to a certain
extent.
Closing is dilation followed by erosion. One can imagine
an image with salt noise or with holes small enough
to be closed by a dilation operation. However, dilation
will increase the size of the object. A subsequent erosion
operation will help restore the shape of the image while
leaving the holes in the image closed.
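A brief sketch using SciPy's binary morphology routines follows (an illustration; the 3 × 3 square structuring element and the synthetic test image are arbitrary choices): erosion removes isolated bright specks but shrinks the object, opening removes them while largely restoring the object's size, and closing fills small holes inside the object.

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(4)
image = np.zeros((64, 64), dtype=bool)
image[16:48, 16:48] = True                                  # bright square object
image[rng.integers(0, 64, 40), rng.integers(0, 64, 40)] = True   # isolated bright specks
image[30, 30] = image[31, 41] = False                       # small holes inside the object

selem = np.ones((3, 3), dtype=bool)                         # structuring element
eroded  = ndimage.binary_erosion(image, structure=selem)
dilated = ndimage.binary_dilation(image, structure=selem)
opened  = ndimage.binary_opening(image, structure=selem)    # erosion then dilation
closed  = ndimage.binary_closing(image, structure=selem)    # dilation then erosion
print(image.sum(), eroded.sum(), dilated.sum(), opened.sum(), closed.sum())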

Field Guide to Image Processing


Image Morphology 53

Hit-or-Miss Transform

Although erosion is a basic detection mechanism, some


detection applications require one to identify the object
alone without any additional parts attached to it.
To recognize an object in this fashion, a hit-or-miss
transform may be used. A structuring element M,
slightly bigger than the object to be searched, is used to
fit (detect) the outer markings of an object; H exactly fits
the object inside the outer marking. In other words, M
misses the object, and H hits the object; thus both the
inner and outer structures of an object are detected in a
single operation. See the following figure as an example.
By using a structuring element, as shown in the the left-
most figure, only the right-most object has been detected.
X within H indicates the origin of H.

Since the hit-or-miss transform is disjoint, both parts can


be represented using a single matrix using 1 for “hit,” 0
for “miss,” and x for “don’t care,” where both hit and miss
masks are zero. To detect a vertical line of exactly two
pixels, the mask
0 0 0
0 1 0
0 1 0
0 0 0

may be used. To obtain all of a 1-, 2-, or 3-pixel line, the
mask will be modified to

0 0 0
0 x 0
0 1 0
0 x 0
0 0 0
Eroding with this operator will detect these three lines.
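The idea can be tried with SciPy's hit-or-miss routine (an illustration; the split of the combined mask into explicit hit and miss parts below is one possible decomposition, not taken verbatim from the text): the hit mask covers the two foreground pixels of an isolated two-pixel vertical line, and the miss mask covers the surrounding background ring.

import numpy as np
from scipy import ndimage

image = np.zeros((10, 10), dtype=bool)
image[2:4, 3] = True            # an isolated vertical line exactly two pixels long
image[1:7, 7] = True            # a longer vertical line (should not be detected)

# Hit mask: the two foreground pixels; miss mask: the surrounding background ring.
hit = np.array([[0, 0, 0],
                [0, 1, 0],
                [0, 1, 0],
                [0, 0, 0]], dtype=bool)
miss = np.array([[1, 1, 1],
                 [1, 0, 1],
                 [1, 0, 1],
                 [1, 1, 1]], dtype=bool)

detected = ndimage.binary_hit_or_miss(image, structure1=hit, structure2=miss)
print(np.argwhere(detected))     # marks only the isolated two-pixel line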

Field Guide to Image Processing


54 Image Morphology

Thinning

One of the applications of the hit-or-miss transform


is thinning. Thinning generates the topology of an
object, where the object is eroded without losing its
connectedness. The idea is to sequentially erode (literally)
the object until a thin skeleton of the object remains.
With thinning, the eroded pixel is removed from the image,
whereas with erosion, the detection pixel is preserved
while everything else is eroded. The thinning of image I
by structuring pair E is expressed as

I i = I − (I ∗ E)

where I i is the set theoretic difference between I and


( I ∗ E ), and I ∗ E is the hit-or-miss transform.
Assume a hit-or-miss transform of the form

1 1
1 1
0 0

where the 1 in the center pixel is the origin. Note that this
is equivalent to

• a hit transform
1 1
1 1
where the origin is the lower left pixel; and

• a miss transform
0 0
1 1
where the origin is the upper left pixel.

Apply this to an image as follows:

Note that there are three matches


in the hit-or-miss transform, and
the pixels become 0; the next four
origin pixels become 0, and so on.

Field Guide to Image Processing


Image Morphology 55

Skeletonization

In many recognition algorithms, such as handwritten


character recognition, it is desirable to find a thinned
representation of the characters. Skeletonization is
a process of thinning (literally) an object based on a
maximum disk. In reality, an erosion type of operation
is performed. A maximum disk is the largest disk that
fits within the boundary of the image; the line joining the
centers of such a disk is called the skeleton of the image.

To extend this to the digital domain, one may consider a


maximal square, where skeletonization is accomplished by
keeping the center of the square.

Field Guide to Image Processing


56 Image Morphology

Gray-Level Morphology

Dilation: Assume a digital image f (m, n) and a


structuring element s(m, n), where f and s are discrete real
images; the dilation is thus defined by
(f ⊕ s)(k, l) = max{f(k − m, l − n) + s(m, n) | (k − m, l − n) ∈ D_f ; (m, n) ∈ D_s}

where D f and D s are domains of f and s.


Apply this to a 1D signal:
¡ ¢
f = ∗ ∗ 1 0 3 2 5 4 0
¡ ¢
s = 3 2 5
Translating by two units,
f = (∗ ∗ 1 0 3 2 5 4 0)
s= 3 2 5
3 2 5
3 2 5
3 2 5
( f ⊕ s)(2) = max(1 + 3, 2 + 0, 3 + 5) = 8
( f ⊕ s)(3) = max(0 + 3, 3 + 2, 2 + 5) = 7
( f ⊕ s)(4) = max(6, 4, 10) = 10
( f ⊕ s)(5) = max(5, 7, 9) = 9
Grayscale erosion is defined as

(f ⊖ s)(k, l) = min{f(k + m, l + n) − s(m, n) | (k + m, l + n) ∈ D_f ; (m, n) ∈ D_s}

where D_f and D_s are the domains of f and s. The condition
is that the structuring element should be contained by the
image being eroded.
The morphological gradient is found by calculating the
difference between the dilation and the erosion:
g = (f ⊕ s) − (f ⊖ s)
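The 1D worked example above can be reproduced in a few lines (a sketch assuming NumPy and the same sliding-overlap convention used in the example; positions where the structuring element runs past the signal are simply skipped).

import numpy as np

f = np.array([1, 0, 3, 2, 5, 4, 0], dtype=float)   # valid samples of the signal
s = np.array([3, 2, 5], dtype=float)                # structuring element

def grey_dilate(f, s):
    # max of (signal + structuring element) over every full-overlap position
    return np.array([np.max(f[i:i + len(s)] + s) for i in range(len(f) - len(s) + 1)])

def grey_erode(f, s):
    # min of (signal - structuring element) over every full-overlap position
    return np.array([np.min(f[i:i + len(s)] - s) for i in range(len(f) - len(s) + 1)])

dil = grey_dilate(f, s)
ero = grey_erode(f, s)
print(dil)            # [ 8.  7. 10.  9.  8.] -- first four values match the worked example
print(dil - ero)      # morphological gradient over the same support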

Field Guide to Image Processing


Image Morphology 57

Training a Structuring Element

Assume that the hit-or-miss transform must be applied


to a practical situation to recognize a certain handwritten
pattern. Handwritten characters have a large variation,
and the hit-or-miss transform needs to define proper hit-
or-miss structuring elements so that various characters
can be recognized. Thus, these patterns must be trained
so that they can work on real-life images.
One of the methods that have been used for training
structuring elements is Hebbian learning. A point is
added to the training pattern if it helps in recognizing a
pattern. A miss structuring element is trained by adding
background points with the objective of reducing false
recognition.
Rank-order filtering: In many real-life applications,
binary morphological operations may be less useful
because the artifact they create can introduce unintended
noise. Thus, erosion to remove noise may erode an object
completely, or dilation may fill the background. A rank-
order filter counts the number of elements that overlap
with the structuring element. An output is marked when
the number of elements exceeds a certain percentage.
The convolution operation can be used as a way to count
the number of overlaps, as it naturally performs a sum
of overlapped pixels. By then applying a threshold on the
sum, one can implement rank-order filtering.
Binary morphological operators can be used to perform
boundary detection: an erosion operator can erode the
boundary pixels, and then a dilation operation can expand
the boundary. The difference between the eroded and
dilated image will produce the boundaries of each object.
Because the erosion is intended to delete the boundary
pixels, using the four-connected-pixel structuring element
described earlier will achieve this objective because it will
always be missing one of the four pixels. Different dilation
operators can be experimented with, such as four or eight
connected pixels.
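The convolution-based counting idea can be sketched as follows (an illustration assuming SciPy; the structuring element, noise level, and 60% vote fraction are arbitrary choices): convolving the binary image with the structuring element counts the overlapping 1s, and thresholding that count at a chosen fraction of the element size gives the rank-order output. At 100% the filter behaves like an erosion; at a single pixel it behaves like a dilation.

import numpy as np
from scipy import ndimage

def rank_order_filter(image, selem, fraction):
    """Keep a pixel if at least `fraction` of the structuring element overlaps 1s."""
    counts = ndimage.convolve(image.astype(int), selem.astype(int), mode='constant')
    return counts >= int(np.ceil(fraction * selem.sum()))

rng = np.random.default_rng(5)
image = np.zeros((64, 64), dtype=bool)
image[20:44, 20:44] = True
noise = rng.random((64, 64)) < 0.05
image ^= noise                                     # sprinkle salt-and-pepper errors

selem = np.ones((3, 3), dtype=bool)
cleaned = rank_order_filter(image, selem, fraction=0.6)    # majority-style vote
erosion_like = rank_order_filter(image, selem, fraction=1.0)
dilation_like = rank_order_filter(image, selem, fraction=1.0 / selem.sum())
print(image.sum(), cleaned.sum(), erosion_like.sum(), dilation_like.sum())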

Field Guide to Image Processing


58 Time-Frequency-Domain Processing

Wavelet Transform

The three types of wavelet transforms are


• continuous wavelet transform
• wavelet series expansion
• discrete wavelet transform (DWT)

Signal analysis tools such as the Fourier transform


usually offer frequency-domain content information in
a signal. However, there is a need to view signals in
both time and frequency domains simultaneously. To
enable such time–frequency joint analysis, the basis
function in a transform can be modified in two different
ways:
• Fixed-resolution time–frequency analysis involves
a continuous fixed-length window function for
localization, given as:
e_{p,q}(t) = g(t − p) e^{2πjqt}

where p is the time–location parameter, q is the


frequency parameter, and the p− q plane is the
frequency–time domain. The resulting continuous
Fourier transform is
f̃(p, q) = ∫_{−∞}^{∞} f(t) e_{p,q}(t) dt = ∫_{−∞}^{∞} f(t) g(t − p) e^{2πjqt} dt

Though this windowed Fourier transform offers the


time–frequency analysis capability to process signals,
the resolution is still fixed due to the fixed window
size of ( t − p).
• Variable-resolution time–frequency analysis involves
obtaining the variable-length window function, as
in wavelet transform. The wavelet transform offers
variable resolution for analyzing varying frequencies
in real signals.

Field Guide to Image Processing


Time-Frequency-Domain Processing 59

Types of Fourier Transforms

To understand the development of the wavelet trans-


form, it is useful to review different forms of the Fourier
transform.
The 1D Fourier integral of a spatial function f ( x) is
described mathematically as
F(p) = ∫_{−∞}^{∞} f(x) e^{−j2πxp} dx
and the inverse Fourier integral is defined as
f(x) = ∫_{−∞}^{∞} F(p) e^{j2πxp} dp
The Fourier integral is useful for continuous-domain
signal and image analysis.
The Fourier series expansion is given as
F_n = F(n∇p) = ∫_0^L f(x) e^{−j2π(n∇p x)} dx
where n is the number of samples, and ∇ p is the discrete
frequency quanta. The signal can be recovered using the
inverse Fourier series expansion, given as

f(x) = ∇p Σ_{n=0}^{∞} F_n e^{j2π(n∇p x)}
The Fourier series expansion is useful for representing
signals in dual continuous and discrete domains.
The discrete Fourier transform for a band-limited and
sampled signal l(x) is given as

L_k = (1/√N) Σ_{n=0}^{N−1} l_n e^{−j(2π/N)nk},  k = 0, 1, . . . , N − 1

where n indexes the time-domain samples l_n, and k is the
frequency-domain variable. The inverse DFT can be obtained
from the DFT as follows:

l_n = (1/√N) Σ_{k=0}^{N−1} L_k e^{j(2π/N)nk},  n = 0, 1, . . . , N − 1
Similar to the Fourier transform, the wavelet transform
can have complementary representations.

Field Guide to Image Processing


60 Time-Frequency-Domain Processing

Wavelet Basis

A continuous, real-valued function Ψ( x) is called the


wavelet basis if it possesses the following two properties:
• The Fourier spectrum, Ψ(s), of Ψ(x) satisfies the
following admissibility condition:

C_Ψ = ∫_{−∞}^{∞} |Ψ(s)|²/|s| ds < ∞

which states that the normalized power spectral
density of the function is bounded; and
• ∫_{−∞}^{∞} Ψ(x) dx = 0; and Ψ(s = 0) = 0.

A set of wavelet basis functions {Ψa,b ( x)} can be


generated by translating and scaling the basis wavelet
Ψ( x) as
Ψ_{a,b}(x) = (1/√a) Ψ((x − b)/a)
where a > 0 and b are real numbers, and a and b represent
the scaling and translation parameters, respectively.
Usually, the wavelet basis function Ψ( x) is centered at the
origin; therefore, Ψa,b ( x) is centered at x = b.
There are many wavelet basis functions that obey the
admissibility condition given above. Examples of wavelet
basis function include
• Mexican hat
• Daubechies
• Haar

The Haar basis functions are the simplest and have been
in use for more than a hundred years. Haar wavelet
functions have been included as one subset of Daubechies
basis functions. Haar basis functions contain abrupt
changes in signal.

Field Guide to Image Processing


Time-Frequency-Domain Processing 61

Continuous Wavelet Transform

Similar to the continuous FT, the continuous forward


wavelet transform of f ( x) with respect to the wavelet
basis function Ψ( x) is given as
W_f(a, b) = 〈f, Ψ_{a,b}(x)〉 = ∫_{−∞}^{∞} f(x) Ψ_{a,b}(x) dx
where 〈, 〉 represents the inner product operation, and
Ψa,b ( x) is the basis function as shown above.
The inverse continuous wavelet transform is obtained as
f(x) = (1/K_Ψ) ∫_0^∞ ∫_{−∞}^{∞} W_f(a, b) Ψ_{a,b}(x) db (da/a²)
where a and b are the scaling and translation variables,
respectively, and K Ψ is a constant. The above two
equations can be extended to 2D by following the
separability property of linear transforms as discussed
in the chapter on spatial domain.
Filter bank representation
An alternate expression of the continuous wavelet basis
function discussed above is given as

Ψ_a(x) = (1/√a) Ψ(x/a)

The above function is the scaled and normalized wavelet
basis function. A reflected complex conjugate version of the
scaled wavelet is as follows:

Ψ̃_a(x) = Ψ*_a(−x) = (1/√a) Ψ*(−x/a)

The continuous wavelet transform expression can then be
rewritten as

W_f(a, b) = ∫_{−∞}^{∞} f(x) Ψ̃_a(b − x) dx = 〈f, Ψ̃_a〉

The above equation implies that signal f ( x) is filtered by
a series of bandpass filters, i.e., filter banks, as shown in
the figure.

Field Guide to Image Processing


62 Time-Frequency-Domain Processing

Wavelet Series Expansion

A function Ψ( x) is called an orthogonal function if the


set of functions {Ψ j,k ( x)} meet the following two conditions:
• The function is defined as follows:

Ψ j,k ( x) = 2 j/2 Ψ (2 j x − k), for − ∞ < j , k < ∞

where j and k are the dilation and translation factors,


respectively, and the function forms an orthonormal
basis of L2 (R ).
• The wavelet set forms an orthogonal basis if it
satisfies the following condition:
〈Ψ j,k , Ψl,m 〉 = δ j,l δk,m

where l and m are integers, and δ j,k is the Kronecker


delta function.

The function f(x) is then obtained as follows:

f(x) = Σ_{j=−∞}^{∞} Σ_{k=−∞}^{∞} c_{j,k} Ψ_{j,k}(x),  where

c_{j,k} = 〈f(x), Ψ_{j,k}(x)〉 = 2^{j/2} ∫_{−∞}^{∞} f(x) Ψ(2^j x − k) dx

If the region of support (ROS) of f ( x) and the wavelet


basis are restricted between [0, 1] (i.e., they are ‘0’ outside
the ROS), then the orthonormal basis function can be
expressed using just one index n as follows:
Ψ_n(x) = 2^{j/2} Ψ(2^j x − k)

where j and k are functions of n and are given as follows:

n = 2^j + k;  j = 0, 1, . . . ;  k = 0, 1, . . . ;  2^j ≤ n

The wavelet basis function with this restricted ROS is


called the compact dyadic wavelet.

Field Guide to Image Processing


Time-Frequency-Domain Processing 63

Discrete Wavelet Transform

The formulations of discrete wavelet transforms


(DWTs) use the concepts of filter-bank theory,
multiresolution/time-scale analysis, and subband
coding.
Filter-bank theory: Consider a set of ideal bandpass filters
with transfer functions H_i(s) and filter outputs G_i(s). The
function f(x) is given as

f(x) = Σ_{i=1}^{∞} g_i(x);  with  Σ_{i=1}^{∞} H_i(s) = 1

Each g_i(x) is formed by the usual convolution operation
as follows:

g_i(x) = 〈f(t), h_i(t − x)〉 = ∫_{−∞}^{∞} f(t) h_i(t − x) dt
Note that the bandpass filters h i ( x) can be represented as
a bank of filters, as shown in the figure.
Multiresolution filtering: Consider that the 2D impulse
response of a low-pass filter is given as g_i(i, j), and that
the low-pass output is subsampled by half at each stage,
producing reduced-resolution images. Note that the sign ↓2
represents downsampling of the images by a factor of 2.
Therefore, at each stage, the resulting low-pass-filtered
image has half the resolution of the previous stage, as
shown in the figure.
The resulting high-pass filters are shown as h1 ( i, j ), h2 ( i, j ),
. . . , h n ( i, j ), and the reduced resolution images are shown as
f 1 ( i, j ), f 2 ( i, j ), . . . , f n ( i, j ), respectively. When these reduced
resolution images are organized in a stack with f ( i, j )
in the lowest level and f n ( i, j ) in the highest level,
a multiresolution pyramid representation of image
frequencies are formed.

Field Guide to Image Processing


64 Time-Frequency-Domain Processing

Subband Coding

Each of the reduced resolution images is obtained using


a combination of high-pass filtering and downsampling
of the images in stages to obtain different bands of
frequencies. This process is called subband coding.
A few of the desirable properties of subband coding
include:
• decomposition of image information into narrow
bandpass filters
• an absence of redundancy in information
• perfect reconstruction of the original image (the
perfect reconstruction property states that the original image can
be reconstructed without error if one starts at any
stage and proceeds backwards)

The forward subband coding and backward reconstruction


are shown in the figure above. The sign ↑2 represents
upsampling of the images by a factor of 2.
Subband coding assumes availability of both low- and
high-pass filters. This coding scheme has been used in
time-frequency domain analysis for some time because it
allows for the reconstruction of image data without loss. It
is also used in multiresolution wavelet design.
The basis functions are time shifted to generate filter
banks. The subband coding is closely related to pyramid
decomposition such that both methods obtain successive
approximation and addition of detail coefficients, and thus
yield perfect image reconstruction. Both of these methods
offer multiresolution versions of an image. The subband
coding can be used to quickly generate overcomplete
versions.

Field Guide to Image Processing


Time-Frequency-Domain Processing 65

Mirror Filter and Scaling Vector

The translation property of Fourier transform F


suggests that the translation of an image f(x, y) by
amounts a and b is given as follows:

F{f(x − a, y − b)} = e^{−j2π(au/M + bv/N)} F(u, v)

where the image size is M × N. For a = M/2 and b = N/2,
the above reduces to

F{f(x − M/2, y − N/2)} = (−1)^{u+v} F(u, v)
The translation property of the Fourier transform can be
used to obtain the high-pass filter, given the low-pass filter
is available in the subband coding scheme. This mirror
filter formulation reduces the complexity of designing the
necessary filters for DWT implementation. A 1D discrete
high-pass filter can be obtained using a 1D low-pass filter
as follows:
g 1 ( k) = (−1)k h 1 (− k + 1)
where g 1 (k) and h1 ( k) are the 1D discrete high- and low-
pass filters, respectively.
Consider the scaling vector sequence, given as

φ(t) = Σ_k h_1(k) φ(2t − k)

where

Σ_k h_1(k) = √2  and  Σ_k h_1(k) h_1(k + 2l) = δ(l)
The scaling vector can also be computed by repeated
convolution of h1 ( t) with scaled versions of the
rectangular pulse function. The scaling vector must be
orthogonal under unit shifts as follows:
〈ϕ( t − m), ϕ( t − n)〉 = δm,n
where m and n are integers. The low-pass-filter coefficients
can be obtained as follows:
h_1(k) = 〈φ_{1,0}(t), φ_{0,k}(t)〉

where

φ_n(x) = 2^{j/2} φ(2^j x − k),  j = 0, 1, . . . ;  k = 0, 1, . . . ;  2^j ≤ n
Examples of scaling vectors include Haar, Daubechies,
and Mallat, among others.
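As a concrete sketch (assuming NumPy and the Haar scaling vector h_1 = [1/√2, 1/√2]; the signal values are arbitrary), the mirror relation g_1(k) = (−1)^k h_1(−k + 1) yields the Haar wavelet vector, and one analysis stage is filtering by each vector followed by downsampling by 2.

import numpy as np

h1 = np.array([1.0, 1.0]) / np.sqrt(2)                 # Haar scaling (low-pass) vector
k = np.arange(len(h1))
g1 = (-1.0) ** k * h1[::-1]                            # mirror filter: [1, -1]/sqrt(2)

def haar_dwt_step(x):
    """One analysis stage: filter with h1/g1, then keep every other sample."""
    pairs = x.reshape(-1, 2)                           # assumes even length
    approx = pairs @ h1                                # low-pass + downsample by 2
    detail = pairs @ g1                                # high-pass + downsample by 2
    return approx, detail

def haar_idwt_step(approx, detail):
    """Perfect reconstruction from one stage."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a, d = haar_dwt_step(x)
print(a, d)
print(np.allclose(haar_idwt_step(a, d), x))            # True: no information is lost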

Field Guide to Image Processing


66 Time-Frequency-Domain Processing

Wavelet Vector and 1D DWT Computation

A discrete high-pass impulse response, also known as a


wavelet vector, can be computed using the mirror filter
property as follows:
g 1 ( k) = (−1)k h 1 (− k + 1)

while a basic wavelet can be computed as


Ψ(t) = Σ_k g_1(k) φ(2t − k)

An orthogonal wavelet set using the basic wavelet is


obtained as
Ψ j,k ( x) = 2 j/2 Ψ (2 j x − k)

The following steps make up 1D DWT computation:

1. Compute scaling vector (function) φ( t) such that


it is orthonormal under unit shift. Therefore, the
scaling function satisfies the following:
〈φ( t − m), φ( t − n)〉 = δm,n

2. Obtain low-pass filter coefficients as follows:


h 1 ( k) = 〈ϕ1,0 ( t), ϕ0,k ( t)〉

where
ϕn ( x) = 2 j/2 ϕ(2 j x − k), j = 0, 1, . . . ; k = 0, 1, . . . ; 2 j ≤ n

3. Obtain high-pass filter coefficients g 1 ( k) from low-


pass filter h1 (k) using a mirror filter given as
g 1 ( k) = (−1)k h 1 (− k + 1)

4. Form the basic wavelet function, given as


Ψ(t) = Σ_k g_1(k) φ(2t − k)

5. Obtain the orthonormal DWT set as follows:


Ψ j,k ( x) = 2 j/2 Ψ(2 j x − k)

Field Guide to Image Processing


Time-Frequency-Domain Processing 67

2D Discrete Wavelet Transform

The 1D DWT can be extended to a 2D DWT using


the separability property of linear transforms as
discussed in the chapter on spatial domain. Here are the
steps to obtain a 2D DWT:

• Consider a separable 2D scaling vector given as

φ( x, y) = φ( x)φ( y)

• Let Ψ( x) be the companion wavelet vector. The


three 2D basic wavelets are as follows:

Ψ1 ( x, y) = φ( x)Ψ( y)
Ψ2 ( x, y) = Ψ( x)φ( y)
Ψ3 ( x, y) = Ψ( x)Ψ( y)

• Use these three basic wavelets to obtain an


orthonormal set of 2D wavelet transforms in L2 (R 2 ),
given as
{Ψ^p_{j,m,n}(x, y)} = {2^j Ψ^p(x − 2^j m, y − 2^j n)};  j ≥ 0

where p = 1, 2 and 3, and i, j, m, and n are integers as


described above.

The wavelet-transformed images can be obtained by


taking the inner product with one of the wavelet basis
functions and the image at a specific stage. For example,
for the first stage ( j = 1), the four transformed subimages
are as follows:

f_1^0(m, n) = 〈f(x, y), φ(x − 2m, y − 2n)〉
f_1^1(m, n) = 〈f(x, y), Ψ^1(x − 2m, y − 2n)〉
f_1^2(m, n) = 〈f(x, y), Ψ^2(x − 2m, y − 2n)〉
f_1^3(m, n) = 〈f(x, y), Ψ^3(x − 2m, y − 2n)〉

where f(x, y) is the original image. For subsequent stages
(j > 1), f^0_{2^j}(x, y) is decomposed to form four subimages at
scale 2^{j+1}.

Field Guide to Image Processing


68 Time-Frequency-Domain Processing

2D Discrete Wavelet Transform (cont.)

The four resulting 2D-DWT-transformed subimages are


known as follows:
• The f^0_{2^{j+1}}(i, j) is the low–low-filtered version of the
original, known as the approximate (a) subimage.
• The f^1_{2^{j+1}}(i, j) is the low–high-filtered version, known
as the vertical (v) subimage.
• The f^2_{2^{j+1}}(i, j) is the high–low-filtered version, known
as the horizontal (h) subimage.
• Finally, the f^3_{2^{j+1}}(i, j) is the high–high-filtered version,
known as the diagonal (d) subimage.
These four subimages are shown in the following figure:

The one-step 2D DWT decomposition of an image is shown


in the following figure. The four parts of the decomposed
image correspond to the a, v, h, and d components.

Field Guide to Image Processing


Image Compression 69

Data Redundancy

Redundancy enables image compression. Assume that


an image requires n1 bits to be represented; when
compressed, it requires n2 bits, such that n2 < n1 . The ratio
n 1 / n 2 is known as compression ratio. The image can be
compressed by using (a) coding or statistical, (b) interpixel,
and (c) psychophysical redundancy.
Coding redundancy: A grey-level image has a histogram
that is not naturally uniform. A binary coding results in
redundancy because the average number of bits required
to represent each pixel is not optimized to represent the
number of gray levels present in the image. If a 256-level
gray image is encoded using 8 bits, then the average bit
length is 8. In general, if a gray level g k has probability of
occurrence p( g k ), and it is encoded using length l , then the
average length is given by
L_avg = Σ_{k=0}^{L−1} l(g_k) p(g_k)
k=0

Optimization is achieved by assigning fewer bits to the


most-probable gray levels at the expense of assigning more
bits to the least-frequent symbol.
General method: The frequency of gray-level occurrences
is given by the image histogram. Based on the histogram,
a higher frequency of occurrence will be assigned fewer
bits, resulting in a lowering of the average. This principle
is demonstrated in Huffman coding. The number of
bits assigned to each level is roughly −log₂ p(g_k). Interpixel
redundancy exploits spatial correlation to predict a pixel
value from its neighbors. Interlacing consecutive frames
of video achieves compression of 2:1, taking advantage of
psychovisual redundancy.

As an extreme case, assume that a gray-level image has


only two levels: level 0 and level 200. Instead of using all
8 bits to represent this image, it can be represented by
just a 0 and 1—only 1 bit—resulting in L avg = 1, which is
a compression ratio of 8.

Field Guide to Image Processing


70 Image Compression

Error-Free Compression

In many satellite or medical imaging applications,


lossless compression is the only option. Variable-
length coding, such as Huffman coding, is the most
popular and optimal technique of achieving error-free
compression, exploiting coding redundancy.
The core idea of Huffman coding is to assign the most-
frequently occurring codes with the least size of code.
Assume that the symbols ( q1 – q6 ) occur with the following
probabilities: q1 = 0.2, q2 = 0.5, q3 = 0.13, q4 = 0.06, q5 =
0.07, and q6 = 0.04.
q2       q1       q3       q5       q4        q6
0.5      0.2      0.13     0.07     0.06      0.04
                                    (01110)   (01111)
0.5      0.2      0.13     0.07     0.1
                           (0110)   (0111)
0.5      0.2      0.13     0.17
                  (010)    (011)
0.5      0.2      0.3
         (00)     (01)
0.5      0.5
(1)      (0)

Method:
• The probabilities are arranged according to their
magnitudes.
• The least-probable two are combined.
• The process is continued until there are two
probabilities remaining.
• The most-probable code is assigned a single bit each.
• Working backward from the combined probabilities,
an extra bit is assigned to each of the component
probable codes until all of the codes have been
assigned.
The average length of the Huffman code is L = 0.5 × 1 +
0.2 × 2 + 0.13 × 3 + 0.07 × 4 + 0.06 × 5 + 0.04 × 5 = 2.07 bits.
Without compression, these six levels (q1–q6) will require
3 bits each in binary.
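The reduction procedure maps directly onto a priority queue. The sketch below (standard-library Python, not the book's table layout) builds a Huffman code for the six symbols above and reports the average length, which comes out at about 2.07 bits.

import heapq
from itertools import count

def huffman_code(probabilities):
    """Build a prefix code by repeatedly merging the two least-probable nodes."""
    tiebreak = count()
    heap = [(p, next(tiebreak), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)            # least probable
        p2, _, codes2 = heapq.heappop(heap)            # next least probable
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"q1": 0.2, "q2": 0.5, "q3": 0.13, "q4": 0.06, "q5": 0.07, "q6": 0.04}
codes = huffman_code(probs)
avg = sum(probs[s] * len(codes[s]) for s in probs)
print(codes)
print(f"average length = {avg:.2f} bits")              # about 2.07 bits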

Field Guide to Image Processing


Image Compression 71

Spatial Redundancy

Spatial or interpixel redundancy exploits correlation


across different pixels of an image. This allows a pixel
value to be estimated from its neighboring pixels.
A typical method known as run-length encoding can
represent the beginning value of a line and the length
of a line that maintains the same value. When an image
is converted to a difference image, run-length encoding
exploits the similarity between adjacent pixels to reduce
the information content.
Imagine a moving object on a static background. The
object moves, but the background does not. Therefore,
the observer only needs to have the new position of the
object updated; the old frame containing the background
can be reused. Similarly, differences between neighboring
frames can be used to reduce the information that must be
transmitted.
In run-length encoding, pictures are first decomposed into
scan lines. Compression is achieved by recording the gray
level at the beginning of each scan line, as well as the
length during which the gray level remains constant.
Assume a scan binary line:
1111000011000000000
The sequence is recoded as
(1, 4)(0, 4)(1, 2)(0, 9)
In each pair ( x, y), x is the binary level, and y is the run
length. The number of runs necessary to encode a line
can be 1 at the least and N at the most ( N runs are
needed only if each pixel on the line has a different gray
level). This method is very effective for binary images in
a fax machine; it may also be used to store coefficients
of compressed images. The concept of run-length encoding
can be extended to multilevel symbols, wherein the symbol
and its run are coded.
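A minimal run-length encoder and decoder for one scan line is sketched below (plain Python, using the binary line from the example above); the same pair-of-values idea carries over directly to multilevel symbols.

def run_length_encode(line):
    """Encode a scan line as (value, run-length) pairs."""
    runs = []
    current, length = line[0], 1
    for value in line[1:]:
        if value == current:
            length += 1
        else:
            runs.append((current, length))
            current, length = value, 1
    runs.append((current, length))
    return runs

def run_length_decode(runs):
    return [value for value, length in runs for _ in range(length)]

line = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
runs = run_length_encode(line)
print(runs)                                   # [(1, 4), (0, 4), (1, 2), (0, 9)]
print(run_length_decode(runs) == line)        # True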

Field Guide to Image Processing


72 Image Compression

Differential Encoding Example

Assume a 7-bit image with gray levels between 0 and 127.
Assume a sequence of gray levels 4, 6, 10, 8, 15. This will be
encoded as 4, 2, 4, −2, 7. Note that all of the gray levels
except the first are encoded as i_n − i_{n−1}. If 0 ≤ i_n ≤ 127,
then −127 ≤ i_n − i_{n−1} ≤ 127. The original image requires
7 bits, and the differential image requires 8 bits.
Assume that most of the difference falls between −a
and +b. If a histogram of the original image versus the
differential image is plotted, the differential image shows
a more-compact histogram. For example, assume that a = 5
and b = 8. Let these codes be represented by 14 normal
binary sequences, and let the remaining two codes be used
for shift up or down so that those codes that are less
than −5 and greater than +8 can be coded using these
additional codes. If −5 ≤ i n − i n−1 ≤ 8, use c1 , c2 , . . . , c14 . If
i n − i n−1 > 8, use c 15 , and if i n − i n−1 < −5, use c 0 .

A 10 will be coded as c15 c2 and a 9 will be coded as


c 15 c 1 . Note that when the positive number 8 is crossed,
it is indicated by c 15 , and an additional code denotes its
relative position on the folded scale. Thus, a 23 is coded as
c 15 c 15 c 1 . A negative number smaller than −5 is denoted
by c 0 , followed by a code indicative of its relative position.
Differential coding can be encoded using an equal or an
unequal number of bits for each of the codes c0 to c 15 . In
order to use unequal-bit coding, such as Huffman coding,
the frequency of occurrence of different gray levels needs
to be estimated using a histogram. To reduce the size
of the code, assign the smallest code size to the highest
occurrence.

Field Guide to Image Processing


Image Compression 73

Block Truncation Coding: Lossy Compression

An image is divided into blocks of M × M pixels, where
M ≪ N. In each of these blocks, the pixel values are
thresholded at the block mean: a pixel greater than the
mean is set to B, and to A otherwise, such that the first
two moments are preserved. With these conditions, the
values of A and B can be calculated as

A = f̄ − σ √[q/(m − q)]
B = f̄ + σ √[(m − q)/q]

where

σ = √[(1/m) Σ_{i=1}^{m} f_i² − (f̄)²],  f̄ = (1/m) Σ_{i=1}^{m} f_i

and q is the number of pixels set to B (those above the mean),
f̄ is the mean, m = M × M, and σ is the standard deviation.

Assume the following:

Here, q = 2, m = 4, A = 51, and B = 109. The information
transmitted is the thresholded binary representation along
with the two values f̄ and σ. For a 2 × 2 block, assuming
8 bits are used for f̄ and σ and 1 bit for each pixel, a
total of 16 + 4 = 20 bits is required for 4 pixels; hence,
compression is 5 bits/pixel. If the block size is increased
to 4 × 4, 16 + 16 = 32 bits are required for 16 pixels, for a
compression of 2 bits/pixel.
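A block-level sketch of the encoding and decoding steps follows (an illustration assuming NumPy; the 2 × 2 block values are arbitrary because the original figure's numbers are not reproduced here). The decoded block has the same mean as the original, as the moment-preserving choice of A and B requires.

import numpy as np

def btc_block(block):
    """Encode one M x M block: a bit plane plus the two reconstruction levels."""
    m = block.size
    mean = block.mean()
    second_moment = (block ** 2).mean()
    sigma = np.sqrt(second_moment - mean ** 2)
    bits = block > mean                          # pixels above the mean get level B
    q = bits.sum()
    if q in (0, m):                              # flat block: nothing to split
        return bits, mean, mean
    A = mean - sigma * np.sqrt(q / (m - q))
    B = mean + sigma * np.sqrt((m - q) / q)
    return bits, A, B

def btc_decode(bits, A, B):
    return np.where(bits, B, A)

block = np.array([[65.0, 75.0],
                  [80.0, 100.0]])
bits, A, B = btc_block(block)
decoded = btc_decode(bits, A, B)
print(bits.astype(int), round(A, 1), round(B, 1))
print(decoded.mean(), block.mean())              # the block mean is preserved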

Field Guide to Image Processing


74 Image Compression

Discrete Cosine Transform

The 2D discrete cosine transform (DCT) is given by

G(u, v) = [4c(u, v)/N²] Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} g_{m,n} cos[(2m + 1)uπ/(2N)] cos[(2n + 1)vπ/(2N)],
u, v = 0, 1, . . . , N − 1

where

c(u, v) = 1/2 for u = v = 0
c(u, v) = 1 otherwise

Inverse DCT:

g(m, n) = (1/N) G(0, 0)
        + [1/(2N³)] Σ_{u=1}^{N−1} Σ_{v=1}^{N−1} G(u, v) cos[(2m + 1)uπ/(2N)] cos[(2n + 1)vπ/(2N)],
m, n = 0, 1, . . . , N − 1

Practical method: The image is divided into blocks of N × N


subimages. The 2D DCT can be computed by calculating
the 1D DCT of each row and then performing a 1D
DCT on the columns. Compression becomes lossy when a
percentage of the DCT coefficients are discarded and the
image is reconstructed from the remaining coefficients.
The advantage of the DCT over the FFT is that the
frequency-domain FFT assumes that the subimages are
periodic; as a result, the boundaries of each block may
show a discontinuity. The DCT, on the other hand,
assumes that functions are evenly symmetric; as a
result, blocks do not show discontinuity when recreated
using inverse transform. Practical methods for still-frame
continuous-image compression, such as JPEG, use the
DCT as a baseline coding system.

Field Guide to Image Processing


Image Compression 75

JPEG Compression

The JPEG (Joint Photographic Expert Group) format was


developed to enable transmitting video, still images, and
color and gray images through facsimile applications and
other communications channels. The following general
steps are used to create a JPEG image, each of which can
be altered to suit a particular application.
1. Transform the color image into a luminance–chrominance
color space. The eye is most sensitive to fine luminance
(grayscale) detail; therefore, most high-frequency color
(chrominance) information can be discarded.
2. Reduce chroma resolution by downsampling. Keeping
6 values per 2 × 2 block instead of 12 results in a 50%
reduction with little visible effect on quality.
3. Divide the image into 8 × 8 sub-images and perform
DCT on the block. The DCT coefficients are
quantized to integers using weighting functions,
where higher-order terms are quantized more
than the lower-order terms. Chrominance data is
quantized more than luminance data.
4. Encode quantized coefficients with lossless Huffman
coding, reducing data redundancy even further.

The decoding process reverses the above steps except for


the quantization step, which results in loss of information
and is therefore irreversible.
In step 3, the pixel levels of n-bit data are first level
shifted by subtracting 2^{n−1}. The quantized DCT
coefficients are then reordered in a zigzag pattern, from
low to high frequency, which groups the high-frequency
components (often long runs of zeros) at the end. The DC
coefficient is difference coded with respect to that of the
previous block, further reducing the number of bits of the
Huffman code in step 4.
The normalized DCT codes are created from the Huffman
code during decompression and then denormalized
using the weighting function. The inverse DCT of the
denormalized image is taken and then level shifted. The
DCT coefficients are quantized; thus, information is lost
(compare the subimage at this stage with its original).

Field Guide to Image Processing


76

Equation Summary

Entropy:

H = −Σ_k p_k log₂ p_k bits/message

Image model:
f ( x, y) = i ( x, y) r ( x, y) = L ⇒ gray level

Spatial-Domain Image Processing


Translation, scaling, and rotation:
v∗ = R [ s(Tv)]

LSI system output:


y(m) = ∫_{−∞}^{∞} f(m − z) x(z) dz

Gradient operator on image f (x, y):


∇f = [G_x, G_y]^T

Laplacian edge operator:

∇²f = ∂²f/∂x² + ∂²f/∂y²
Frequency-Domain Image Processing
1D Fourier transform of spatial function f (x):
F(u) = ∫_{−∞}^{∞} f(x) e^{−j2πux} dx

Inverse Fourier transform:


f(x) = \int_{−∞}^{∞} F(u) e^{j2πux} du


2D Fourier transform:
G(u, v) = \int_{−∞}^{∞} \int_{−∞}^{∞} g(x, y) e^{−j2π(ux + vy)} dx dy

2D discrete Fourier transform:


G(n, m) = \sum_{k=0}^{N−1} \sum_{l=0}^{N−1} g_{k,l} e^{−j\frac{2π}{N}(kn + ml)};
m = 0, 1, . . . , M − 1,  n = 0, 1, . . . , N − 1

Convolution:
F { f ( x)∗ g( x)} = F ( u)G ( u)
G ( u, v) = F ( u, v) H ( u, v)
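A quick numerical check of the convolution property, assuming
two short real 1D signals; zero padding to length
len(f) + len(g) − 1 makes the circular FFT convolution equal
to the linear convolution:

import numpy as np

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.5, -1.0, 0.25, 2.0])

n = len(f) + len(g) - 1
product = np.fft.fft(f, n) * np.fft.fft(g, n)      # F(u) G(u)
via_fft = np.real(np.fft.ifft(product))

print(np.allclose(via_fft, np.convolve(f, g)))     # -> True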

Image Restoration
Convolution integral with noise:
m(x, y) = \iint_{−∞}^{∞} f(α, β) h(x − α, y − β) dα dβ + n(x, y)

LSI system approximation in the Fourier domain:


M ( u, v) = H ( u, v)F ( u, v) + N ( u, v)

Discrete form of the degradation model:


m= Hf +n

Unconstrained algebraic restoration:


\hat{f} = H^{−1} m

Constrained algebraic restoration:


\hat{f} = (H′H + γQ′Q)^{−1} H′g
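A small 1D sketch of the constrained solution, assuming an
illustrative 5-tap blur matrix H, a second-difference
(Laplacian) operator for Q, and an arbitrary regularization
weight γ:

import numpy as np

rng = np.random.default_rng(2)
n = 64
f = np.zeros(n)
f[20:40] = 1.0                                          # original signal

H = sum(np.eye(n, k=k) for k in range(-2, 3)) / 5.0     # 5-tap blur matrix
g = H @ f + 0.01 * rng.standard_normal(n)               # degraded observation

Q = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)   # smoothness constraint
gamma = 0.1

f_hat = np.linalg.solve(H.T @ H + gamma * Q.T @ Q, H.T @ g)
print("restoration error:", np.linalg.norm(f_hat - f))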


Wiener filter:
P(u, v) = \frac{H^*(u, v) S_{ff}(u, v)}{S_{ff}(u, v) |H(u, v)|^2 + S_{nn}(u, v)}
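A sketch of the filter applied in the frequency domain; the
unknown ratio S_nn/S_ff is replaced here by an assumed
constant noise-to-signal ratio, and as that ratio goes to
zero the expression reduces to the inverse filter 1/H.

import numpy as np

def wiener_deconvolve(m, h, nsr=0.01):
    # m: degraded image; h: blur kernel padded to the same shape as m;
    # nsr: assumed constant noise-to-signal power ratio.
    M = np.fft.fft2(m)
    H = np.fft.fft2(h)
    P = np.conj(H) / (np.abs(H) ** 2 + nsr)   # Wiener transfer function
    return np.real(np.fft.ifft2(P * M))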

Image Segmentation and Clustering


Polar representation:
ρ = x cos θ + y sin θ
Minimizing point–cluster distance:
\sum_{i ∈ clusters} \left\{ \sum_{j ∈ elements of i’th cluster} ‖x_j − μ_i‖^2 \right\}
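A minimal K-means sketch that alternates the two steps
implied by this objective (assign each point to the nearest
cluster mean, then recompute the means) on illustrative 2D
data:

import numpy as np

def kmeans(x, k=2, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    means = x[rng.choice(len(x), size=k, replace=False)]   # initial centers
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - means[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)                  # assignment step
        means = np.array([x[labels == j].mean(axis=0)      # update step
                          for j in range(k)])
    return labels, means

rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
labels, centers = kmeans(pts, k=2)
print(centers)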

Image Morphology
Thinning:
I_i = I − (I ∗ E)

Dilation:
(f ⊕ s)(k, l) = max{f(k − m, l − n) + s(m, n) | (k − m, l − n) ∈ D_f; (m, n) ∈ D_s}

Grayscale erosion:
(f ⊖ s)(k, l) = min{f(k + m, l + n) − s(m, n) | (k + m, l + n) ∈ D_f; (m, n) ∈ D_s}

Morphological gradient:
g = (f ⊕ s) − (f ⊖ s)
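A sketch of the grayscale morphological gradient using
SciPy's flat 3 × 3 structuring element on an illustrative
image:

import numpy as np
from scipy import ndimage

rng = np.random.default_rng(3)
image = rng.integers(0, 256, size=(64, 64)).astype(float)

dilated = ndimage.grey_dilation(image, size=(3, 3))   # f ⊕ s
eroded = ndimage.grey_erosion(image, size=(3, 3))     # f ⊖ s
gradient = dilated - eroded                           # emphasizes intensity edges
print(gradient.min(), gradient.max())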

Wavelet Transform
Continuous Fourier transform:
f̃(p, q) = \int_{−∞}^{∞} f(t) g(t − p) e^{2πjqt} dt


Fourier series expansion:


F_n = F(nΔp_x) = \int_{0}^{L} f(x) e^{−j2π(nΔp_x)x} dx

Fourier spectrum:
C_Ψ = \int_{−∞}^{∞} \frac{|Ψ(s)|^2}{s} ds < ∞
Wavelet basis functions:
Ψ_{a,b}(x) = \frac{1}{\sqrt{a}} Ψ\left(\frac{x − b}{a}\right)
Continuous wavelet transform:
W_f(a, b) = 〈f, Ψ_{a,b}(x)〉 = \int_{−∞}^{∞} f(x) Ψ_{a,b}(x) dx

Inverse continuous wavelet transform:


f(x) = \frac{1}{K_Ψ} \int_{0}^{∞} \int_{−∞}^{∞} W_f(a, b) Ψ_{a,b}(x) db \frac{da}{a^2}

Orthogonal function:
Ψ_{j,k}(x) = 2^{j/2} Ψ(2^j x − k),  for −∞ < j, k < ∞
〈Ψ_{j,k}, Ψ_{l,m}〉 = δ_{j,l} δ_{k,m}
f(x) = \sum_{j=−∞}^{∞} \sum_{k=−∞}^{∞} c_{j,k} Ψ_{j,k}(x),  where
c_{j,k} = 〈f(x), Ψ_{j,k}(x)〉 = 2^{j/2} \int_{−∞}^{∞} f(x) Ψ(2^j x − k) dx

1D discrete high-pass filter:


g_1(k) = (−1)^k h_1(−k + 1)

Scaling vector sequence:


ϕ(t) = \sum_k h_1(k) ϕ(2t − k),
where \sum_k h_1(k) = \sqrt{2} and \sum_k h_1(k) h_1(k + 2l) = δ(l)


1D discrete low-pass filter:


h_1(k) = 〈ϕ_{1,0}(t), ϕ_{0,k}(t)〉,

where
ϕ_n(x) = 2^{j/2} ϕ(2^j x − k);  j = 0, 1, . . . ;  k = 0, 1, . . . ;  2^j ≤ n

Basic wavelet:
Ψ(t) = \sum_k g_1(k) ϕ(2t − k)
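For the Haar case, h_1 = (1, 1)/√2 satisfies the
scaling-vector conditions above, and g_1 = (1, −1)/√2 follows
from the high-pass relation; a one-level 1D DWT on an
illustrative signal can then be sketched as filtering
followed by downsampling by two:

import numpy as np

h1 = np.array([1.0, 1.0]) / np.sqrt(2)     # low-pass (scaling) filter
g1 = np.array([1.0, -1.0]) / np.sqrt(2)    # high-pass (wavelet) filter

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])

approx = np.convolve(x, h1)[1::2]   # approximation (low-pass, downsampled)
detail = np.convolve(x, g1)[1::2]   # detail (high-pass, downsampled)
print(approx)
print(detail)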

Orthonormal set of 2D wavelet transforms:


{Ψ_{j,m,n}(x, y)} = {\sqrt{2^j} Ψ(x − 2^j m, y − 2^j n)};  j ≥ 0

Image Compression
Average bit length:
L_{avg} = \sum_{k=0}^{L−1} l(g_k) p(g_k)
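For example, with the illustrative probabilities and Huffman
code lengths below, L_avg is 1.75 bits, matching the entropy
example earlier in this summary:

import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])   # p(g_k)
l = np.array([1, 2, 3, 3])                 # l(g_k), code length in bits
print(np.sum(l * p))                       # -> 1.75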

2D discrete cosine transform:


G(u, v) = \frac{4c(u, v)}{N^2} \sum_{m=0}^{N−1} \sum_{n=0}^{N−1} g_{m,n} \cos\frac{(2m + 1)uπ}{2N} \cos\frac{(2n + 1)vπ}{2N};
m, n = 0, 1, . . . , N − 1

Inverse discrete cosine transform:


g(m, n) = \frac{1}{N} G(0, 0) + \frac{1}{2N^3} \sum_{u=1}^{N−1} \sum_{v=1}^{N−1} G(u, v) \cos\frac{(2m + 1)uπ}{2N} \cos\frac{(2n + 1)vπ}{2N};
m, n = 0, 1, . . . , N − 1


Bibliography

Castleman, K. R., Digital Image Processing, Prentice Hall,
New York (1996).
Dougherty, E. R. and R. A. Lotufo, Hands-On Morphologi-
cal Image Processing, SPIE Press, Bellingham, WA (2003)
[doi:10.1117/3.501104].
Forsyth, D. and J. Ponce, Computer Vision: A Modern
Approach, Prentice Hall, New York (2001).
Gaskill, J. D., Linear Systems, Fourier Transforms, and
Optics, John Wiley & Sons, New York (1978).
Gonzalez, R. C. and R. E. Woods, Digital Image Processing,
3rd Ed., Prentice Hall, New York (2008).
Gonzalez, R. C., R. E. Woods, and S. L. Eddins, Digital
Image Processing Using MATLAB® , Prentice Hall, New
York (2004).
Gumley, L. E., Practical IDL Programming, Morgan
Kaufmann, New York (2002).
Jähne, B., Digital Image Processing, Springer, New
York (2002).
Jain, A. K., Fundamentals of Digital Image Processing,
Prentice Hall, New York (1989).
Jong, M. T., Methods of Discrete Signal and System
Analysis, McGraw-Hill, New York (1982).
Karim, M. A. and A. A. S. Awwal, Introduction to Optical
Computing, John Wiley & Sons, New York (1992).
Trucco, E. and A. Verri, Introductory Techniques for 3D
Computer Vision, Prentice Hall, New York (1998).


Index

1D DWT computation, 66 convolution filter, 12


2D DWT, 67, 68 correlation, 19, 20, 26
2D Fourier transform cross-correlation, 19
(FT), 17 cumulative distribution
adaptive histogram function (CDF), 15, 16
equalization, 16 cutoff frequency, 30
additive, 36 deconvolution, 41
Affine transform, 6 defocus, 40
affinity, 49 degradation, 36, 41
agglomerative clustering, degradation model, 36
48, 49 dendrogram, 49
aliasing, 5, 33 description, 3
autocorrelation, 19, 20 differential coding, 72
bands of frequencies, 64 dilation, 51, 52, 56, 57
basic wavelet, 66, 67 discrete convolution, 26
basis function, 58 discrete cosine transform
binary image, 5, 8, 45 (DCT), 74
binary image transform, 8 discrete form, 38
bit-plane image, 5 discrete Fourier
bit-plane slicing and transform (DFT), 18,
transformation, 8 23, 33, 37, 59
blind deconvolution, 41 discrete wavelet
block circulant matrix, 38 transform (DWT), 58,
circulant matrix, 38 63, 65
circular convolution, 26, disjoint, 53
27 distance measures, 48
closing, 52 edge detector, 13
clustering, 44, 48 equalized histogram, 15
coding redundancy, 69 erosion, 50, 52–55, 57
compact dyadic wavelet, error-free compression, 70
62 fast Fourier transform
compression ratio, 69 (FFT), 18
constrained restoration, fftshift, 24
39 filter-bank
continuous wavelet representation, 61
transform, 58 filter-bank theory, 63
convolution, 10, 11, 20, filtering, 28, 29
21, 26, 29, 35, 37, 57, first-order (1st derivative)
63, 65 gradient filter, 13


fixed-length window, 58 image negative, 8


Fourier domain, 11 image preprocessing, 3
Fourier integral, 59 image quantization, 5
Fourier plane, 20 image restoration, 36, 37
Fourier series expansion, image rotation, 7
59 image sampling, 5
Fourier spectrum, 17, 60 image segmentation, 3,
Fourier transform (FT), 44, 45, 48
17, 19, 20, 29, 37, 41, image space, 46
58, 59, 61, 65 image transformation, 6
frequency-domain impulse function, 37
translation, 21 impulse response, 9, 21,
frequency–time domain, 37
58 independent variables, 2
global appearance, 15
intensity, 20, 45
gradient filter, 13
interpixel redundancy, 69,
graph theoretic
71
clustering, 49
interpretation, 3
grayscale erosion, 56
inverse discrete cosine
grayscale image, 4, 8
transform (DCT), 74
grayscale transform, 8
inverse discrete Fourier
Green’s function, 21
hard threshold, 44 transform (DFT), 18
Hebbian learning, 57 inverse filter, 41–43
high-pass (HP), 11 inverse Fourier transform
high-pass filter (HPF), 11, (FT), 17
29, 30 Joint Photographic
histogram, 15 Expert Group (JPEG),
histogram equalization, 75
15, 16 joint probability density
histogram matching, 16 function (PDF), 2
hit-or-miss transform, 53, joint transform
54, 57 correlation, 28
Hough transform, 44, 46, K-means clustering, 49
47 knowledge data, 3
Huffman coding, 69, 70 Laplacian edge operator,
image, 4 14
image acquisition, 3 linear, 11, 19, 23, 36, 37
image edge extraction, 14 linear convolution, 26


linear space invariant orthogonal, 2, 65


(LSI) system, 9, 10 orthogonal function, 62
linear superposition, 9 orthogonal wavelet, 66
linear system, 9 orthonormal basis, 62
linear time-invariant orthonormal DWT, 66
(LTI) system, 9 parameter space, 46, 47
linear transform, 67 periodic signal, 34
LoG edge operator, 14 periodicity, 24
log transformation, 8 phase shift, 21
logical operations, 8 phase spectrum, 25
lossless compression, 70 point impulse response,
low-pass (LP), 11 37
low-pass filter (LPF), 11, point operation, 15, 16
29, 30 point processing, 8
magnitude spectrum, 25 point source, 37, 40
matched filtering, 20, 21, point spread function
28 (PSF), 21, 37
matrix format, 6 point-wise image
maximum disk, 55 averaging, 8
mean filter, 12 polar parameter space, 47
median, 12 polar representation, 47
median filter, 12 power law (gamma)
mirror filter, 65, 66 transformation, 8
modulation, 25 probability distribution
morphological filter, 12 function (PDF), 15,
morphological gradient, 16, 44, 45
56 psychovisual redundancy,
motion blur, 40 69
moving window, 10, 12 rank-order filtering, 57
multiplicative, 36 real signal, 24
multiresolution pyramid, real time, 40
63 recognition, 3
multiresolution/time-scale rect function, 35
analysis, 63 region growing, 44, 45
noise, 36, 43 region of support (ROS),
nonlinear, 36 62
Nyquist criterion, 31, 33 relative motion, 40
opening, 52 representation, 3
origin, 50 resolution, 31, 63, 64


rotation, 7, 25 strict-sense stationary, 2


run-length encoding, 71 structuring element, 50
salt-and-pepper noise, 51 subband coding, 63–65
sampling, 31–33 superposition, 11
scaling, 7, 19, 25 superposition sum, 37
scaling expression, 7 system response, 26
scaling function, 66 system transfer function,
scaling vector, 65–67 21
scan lines, 71 thinning, 54
second-order (2nd thresholding, 44, 46
derivative) gradient time invariance, 37
filter, 14 time–frequency joint
seed, 45 analysis, 58
separability, 23, 61, 67 training, 57
shift invariance, 10 transform of a product, 21
signal-to-noise ratio transform of a transform,
(SNR), 28 20
sinc function, 34 translation, 6, 7, 21, 25
sinusoid, 47 translation matrix, 7
skeleton, 54, 55 translation property, 65
skeletonization, 55 tri function, 35
Sobel edge operator, 14 turbulent, 40, 42
soft (optimal) threshold, unconstrained
44, 45 restoration, 39
space invariance, 37 variable-length coding, 70
space-invariant system, variable-length window,
37 58
spatial domain, 20 wavelet basis, 60, 67
spatial image transform, wavelet basis function,
6 60–62
spatial redundancy, 71 wavelet series expansion,
spatial transform, 8 58
spatial-domain image wavelet transform, 58, 59,
transform, 6, 8 61
spectral density, 22 wavelet vector, 66, 67
spectrum, 17, 32, 34, 35 Wiener filter, 43
stationary process, 2 wide-sense stationary, 2
statistical filter, 12 zero padding, 27



Khan M. Iftekharuddin is a pro-
fessor of electrical and computer
engineering, and director of the
Vision Lab at Old Dominion Uni-
versity; he holds a joint appoint-
ment with the biomedical engineer-
ing program. Previously, he worked
in the Department of Electrical and
Computer Engineering at the Uni-
versity of Memphis, where he re-
ceived the Herff Outstanding Re-
searcher Award in 2011. He is the
principal author of more than 120 refereed journal
and conference proceedings papers, and multiple book
chapters on biomedical image processing, image post-
processing and distribution, and optical interconnection
networks. Dr. Iftekharuddin currently serves as an as-
sociate editor for several journals, including Optical
Engineering, International Journal of Imaging, The Open
Cybernetics and Systemics Journal, and International
Journal of Tomography and Statistics. He is a fellow of
SPIE, a senior member of IEEE, and a member of IEEE
CIS and OSA.

Abdul A. Awwal is a technical staff
member with the Laser Science En-
gineering and Operations division
of the Lawrence Livermore National
Laboratory (LLNL), working in the
National Ignition Facility. He re-
ceived the R&D 100 Award in 2003
for the adaptive optics phoropter
and in 2008 for automatic align-
ment for laser fusion. Before joining
LLNL in 2002, he taught at Wright
State University, where he received the Excellence in
Teaching Award in 1996. His research interests are pat-
tern recognition, adaptive optics, optoelectronic comput-
ing, and optical/digital image processing. Dr. Awwal is the
author of 188 published articles, including 74 articles in
refereed journals and a text book on optical computing;
he also edited the book Adaptive Optics for Vision Science
(Wiley 2006). Currently, he is a topical editor (information
processing area) for Applied Optics. He is a fellow of SPIE
and OSA.
Image Processing
Khan M. Iftekharuddin and Abdul A. Awwal
Digital imaging is essential to many industries, such
as remote sensing, entertainment, defense, and
biotechnology, and many processing techniques have
been developed over time. This Field Guide serves
as a resource for commonly used image-processing
concepts and tools; with this foundation, readers
will better understand how to apply these tools to
various problems encountered in the field. Topics
include filtering, time-frequency-domain processing,
and image compression, morphology, and restoration.

SPIE Field Guides


The aim of each SPIE Field Guide is to distill a major field of
optical science or technology into a handy desk or briefcase
reference that provides basic, essential information about
optical principles, techniques, or phenomena.
Written for you—the practicing engineer or scientist—
each field guide includes the key definitions, equations,
illustrations, application examples, design considerations,
methods, and tips that you need in the lab and in the field.
John E. Greivenkamp
Series Editor

P.O. Box 10
Bellingham, WA 98227-0010
ISBN: 9780819490216
SPIE Vol. No.: FG25

www.spie.org/press/fieldguides

