3-Discrete Cosine Transform PDF
2 Prof P P Ghadekar
Transform Characteristics
Transforms are useful because they offer the following characteristics:
Data decorrelation
Data-independent basis functions
Fast implementation
Linearity
Orthogonality
Data decorrelation
The ideal transform completely decorrelates the data in a
sequence/block; i.e., it packs the most energy into the
fewest coefficients.
In this way, many coefficients can be discarded after quantization
and prior to encoding.
It is important to note that the transform operation itself does not
achieve any compression.
It aims at decorrelating the original data and compacting a large
fraction of the signal energy into relatively few transform
coefficients.
Fast Implementation
The number of operations required for an n-point transform
is generally of the order O(n²).
Some transforms have fast implementations, which reduce
the number of operations to O(n log n).
For a separable n × n 2-D transform, performing the row
and column 1-D transforms successively reduces the
number of operations from O(n⁴) to O(2n² log n).
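As a rough illustration of these orders of growth, the sketch below evaluates n⁴ against 2n² log n for a few block sizes. The function names are hypothetical, constant factors are ignored, and the base-2 logarithm is an assumption, so the numbers indicate scale only.

```python
import math

def direct_2d_ops(n):
    """Order-of-magnitude cost of a direct n x n 2-D transform: n^4."""
    return n ** 4

def separable_fast_2d_ops(n):
    """Order-of-magnitude cost with fast row and column 1-D transforms:
    2 * n^2 * log2(n) (base-2 logarithm assumed)."""
    return 2 * n ** 2 * math.log2(n)

for n in (8, 16, 64):
    print(n, direct_2d_ops(n), round(separable_fast_2d_ops(n)))
```

For an 8 × 8 block this gives 4096 against 384, roughly a tenfold reduction.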
Linearity and Orthogonality
Linearity
The linearity principle allows a one-to-one mapping between
pixel values and transform coefficients.
Orthogonality
Orthogonal transforms have the feature of eliminating
redundancy in the transformed image.
Transform based Image Coding Scheme
[Block diagram: Original Image → Transformation → Quantiser → Entropy Encoder (Source Encoder) → Channel Encoder → Transmission Channel → Source Decoder → Reconstructed Image]
Quantisation
It is the process of reducing the number of possible values of a
quantity, thereby reducing the number of bits needed to represent it.
It is an irreversible process.
Entropy Encoder
The purpose of an entropy encoder is to reduce the number of
bits required to represent each symbol at the quantiser output.
Commonly used entropy coding techniques are Huffman coding,
Arithmetic coding, run length coding etc.
It is a lossless coding scheme.
Introduction
A Discrete cosine transform (DCT) expresses a sequence
of finitely many data points in terms of a sum of cosine
functions oscillating at different frequencies.
The discrete cosine transform (DCT) and discrete sine
transform (DST) are members of a family of sinusoidal
unitary transforms.
They are real, orthogonal, and separable, with fast algorithms
for their computation.
They have a great relevance to data compression.
The discrete cosine transform (DCT) is a Fourier-related
transform similar to the discrete Fourier transform (DFT),
but using only real numbers.
It is equivalent to a DFT of roughly twice the length,
operating on real data with even symmetry.
There are eight standard variants, of which four are
common.
WHY TO USE DCT?
We use DCT to decorrelate the input signal efficiently.
DFT Basis Images
DCT Basis Images
Types of DCT
The family of discrete trigonometric transforms consists of 8 versions of DCT.
Each transform is identified as EVEN or ODD and of type I, II, III, and IV.
All present digital signal and image processing applications (mainly transform
coding and digital filtering of signals) involve only even types of the DCT and
DST.
Therefore, we consider these four even types of DCT.
Inverse transforms:
The inverse of DCT-I is DCT-I multiplied by 2/(n-1).
The inverse of DCT-IV is DCT-IV multiplied by 2/n. The
inverse of DCT-II is DCT-III multiplied by 2/n (and vice
versa).
Like for the DFT, the normalization factor in front of
these transform definitions is merely a convention and
differs between treatments.
For example, some authors multiply the transforms by
√(2/n) so that the inverse does not require any additional
multiplicative factor.
Why DCT-II is Common?
The DCT, and in particular the DCT-II, is often used in
signal and image processing, especially for lossy data
compression, because it has a strong "energy compaction" property.
Most of the signal information tends to be concentrated in
a few low-frequency components of the DCT, approaching
the Karhunen-Loève transform.
Performance of DCT-II is closest to the statistically optimal KLT
based on a number of performance criteria.
o Variance distribution,
o Energy packing efficiency,
o Residual correlation,
o Rate distortion,
o maximum reducible bits …
Exhibition of desirable characteristics for data compression
namely,
o Data decorrelation
o Data-independent basis functions
o Fast implementation
The importance of the DCT-II is further highlighted by its:
o Superiority in bandwidth compression (redundancy
reduction) of a wide range of signals.
o Powerful performance in bit-rate reduction.
o Existence of fast algorithms for its implementation.
The DCT-II and its inverse, the DCT-III, have been employed in
international image/video coding standards, e.g. JPEG,
MPEG, H.261, H.263, H.264…
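1-D DCT (DCT-II)
1-D DCT:
\[ X(k) = \sqrt{\frac{2}{N}}\, C(k) \sum_{i=0}^{N-1} x(i) \cos\left[\frac{(2i+1)k\pi}{2N}\right] \]
1-D IDCT:
\[ x(i) = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} C(k)\, X(k) \cos\left[\frac{(2i+1)k\pi}{2N}\right] \]
where k = 0, 1, 2, …, N-1 and i = 0, 1, 2, …, N-1,
and C(k) = 1/√2 for k = 0, C(k) = 1 otherwise.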
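The 1-D DCT and IDCT definitions above can be evaluated directly as sums. The following is a minimal sketch, assuming the orthonormal scaling √(2/N) with C(0) = 1/√2 and C(k) = 1 otherwise; the function names are hypothetical and the O(N²) loops are for clarity, not speed.

```python
import math

def c(k):
    """Scale factor C(k): 1/sqrt(2) for k = 0, otherwise 1 (assumed convention)."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct(x):
    """Forward 1-D DCT-II of a list x, evaluated term by term."""
    n = len(x)
    return [
        math.sqrt(2 / n) * c(k)
        * sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n))
        for k in range(n)
    ]

def idct(X):
    """Inverse 1-D DCT (DCT-III with matching scaling)."""
    n = len(X)
    return [
        math.sqrt(2 / n)
        * sum(c(k) * X[k] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for k in range(n))
        for i in range(n)
    ]

x = [128, 127, 130, 128, 134, 130, 128, 128]  # one row of eight pixel values
X = dct(x)
x_back = idct(X)  # recovers x up to floating-point error
```

With this scaling the DC term X[0] equals the sum of the samples divided by √N, about 365.2 for the row used here.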
Orthogonality
C is orthogonal, which implies C⁻¹ = Cᵀ, and that
entails:
CᵀC = CCᵀ = I
This property is used to solve matrix equations easily.
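The orthogonality claim can be checked numerically. The sketch below builds an n × n DCT matrix, assuming the orthonormal scaling √(2/n) with a 1/√2 factor on the first row, and measures how far C·Cᵀ is from the identity; the helper names are hypothetical.

```python
import math

def dct_matrix(n):
    """Rows are the DCT-II basis vectors with orthonormal scaling (assumed)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(2 / n) * (1 / math.sqrt(2) if k == 0 else 1.0)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

C = dct_matrix(8)
product = matmul(C, transpose(C))
# Largest deviation of C * C^T from the identity matrix.
max_error = max(abs(product[i][j] - (1.0 if i == j else 0.0))
                for i in range(8) for j in range(8))
print(max_error)
```

Because the inverse is just the transpose, inverting the transform needs no matrix inversion routine.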
1-D DCT
The discrete cosine transform, C, has one basic characteristic: it
is a real orthogonal matrix.
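\[ C = \sqrt{\frac{2}{n}} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & \cdots & \frac{1}{\sqrt{2}} \\ \cos\frac{\pi}{2n} & \cos\frac{3\pi}{2n} & \cdots & \cos\frac{(2n-1)\pi}{2n} \\ \vdots & \vdots & & \vdots \\ \cos\frac{(n-1)\pi}{2n} & \cos\frac{(n-1)3\pi}{2n} & \cdots & \cos\frac{(n-1)(2n-1)\pi}{2n} \end{bmatrix} \]
\[ C^{-1} = C^{T} = \sqrt{\frac{2}{n}} \begin{bmatrix} \frac{1}{\sqrt{2}} & \cos\frac{\pi}{2n} & \cdots & \cos\frac{(n-1)\pi}{2n} \\ \frac{1}{\sqrt{2}} & \cos\frac{3\pi}{2n} & \cdots & \cos\frac{(n-1)3\pi}{2n} \\ \vdots & \vdots & & \vdots \\ \frac{1}{\sqrt{2}} & \cos\frac{(2n-1)\pi}{2n} & \cdots & \cos\frac{(n-1)(2n-1)\pi}{2n} \end{bmatrix} \]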
Interpolation with DCT
What is Interpolation?
Interpolation is a method of constructing new data points
within the range of a discrete set of known data points.
Discrete Cosine Transform (DCT)
The most important tool for overcoming drawbacks of
the 2D DFT is the 2D DCT.
There are several definitions of this transform in practice
and here we will use the following (for the 1D DCT):
Inverse DCT
Inverse DCT is defined as:
2D DCT
There is no unique form of the 2D DCT. The simplest
realization technique is to calculate the 1D DCT
along the rows and then along the columns of the newly
obtained matrix.
However, there are alternative techniques for direct
evaluation of the 2D DCT. Again there are several
definitions of the 2D DCT that can be used in practice
but here we adopted:
Inverse 2D DCT
Mathematical Properties-DCT
DCT Matrices are real and orthogonal.
Linearity Property
M(αg + βf) = α Mg + βMf
For a matrix M, constants α and β, and vectors g and f, all DCTs are linear
transforms.
The Convolution-Multiplication Property
Convolution in the spatial domain is equivalent to taking an inverse
transform of the product of forward transforms of two data sequences.
The convolution-multiplication property is a powerful tool for
performing digital filtering in the transform domain.
Mathematical Properties-DCT
All DCTs are separable transforms: a multidimensional transform
can be decomposed into successive applications of one-
dimensional (1-D) transforms in the appropriate directions.
The 2-D DCT is separable, meaning that it can be separated into a pair of 1-D
DCTs.
To obtain the 2-D DCT of a block a 1-D DCT is first performed
on the rows of the block then a 1-D DCT is performed on the
columns of the resulting block.
The same applies to the IDCT.
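The row-then-column procedure can be sketched as follows, assuming the orthonormal DCT-II scaling; the function names are hypothetical. A constant 4 × 4 block is used because its result is easy to predict: all energy lands in the (0,0) coefficient.

```python
import math

def dct_1d(x):
    """Orthonormal 1-D DCT-II of a list of numbers (assumed scaling)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(2 / n) * (1 / math.sqrt(2) if k == 0 else 1.0)
        out.append(scale * sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                               for i in range(n)))
    return out

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct_2d(block):
    """1-D DCT on every row, then 1-D DCT on every column of the result."""
    rows_done = [dct_1d(row) for row in block]
    cols_done = [dct_1d(col) for col in transpose(rows_done)]
    return transpose(cols_done)

flat = [[10] * 4 for _ in range(4)]  # constant block
coeffs = dct_2d(flat)                # only coeffs[0][0] is non-zero (value 40)
```

Transforming the columns first and the rows second gives the same coefficients, which is what separability means in practice.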
This process is illustrated on the following slide.
Columns (256 pixels)
Raw Block of Pixels
128 127 130 128 134 130 128 128
pixel[n+1]
Part of a picture
-18 -1 1 0 2 -2 2 4 365 -1 -4 1 1 -3 1 3
11 -1 -5 2 0 0 1 -1 362 -3 -4 0 -3 -1 2 1
0 2 2 -3 3 -2-1 -2 3 359 -3 -3 0 -3 -3 0 -1
0 1 1 1 0 -1 -1 -2 368 1 1 -1 1 -3 -2 1
-4 -3 -3 -1 -3 -1 0 -2 370 0 -3 -3 -1 -1 -1 2
-2 0 -1 -2 -1 1 1 0 378 -3 -6 0 -3 -2 -1 -2
2 0 1 0 0 -1 -1 -1 383 -1 -6 2 -3 0 0 -3
DCT
For example, the DCT is used in JPEG image
compression, MJPEG video compression, and MPEG
video compression.
There, the two-dimensional DCT-II of 8x8 blocks is
computed and the results are quantized and entropy coded.
In this case, n is typically 8 and the DCT-II formula is
applied to each row and column of the block.
The result is an 8x8 transform coefficient array in which
the (0,0) element is the DC (zero-frequency) component
and entries with increasing vertical and horizontal index
values represent higher vertical and horizontal spatial
frequencies.
DCT– Matrix Form
DCT-Matrix Form
IDCT-Matrix Form
DCT Result
Example of a 4x4 DCT Matrix
An N = 4 point DCT matrix can be generated by
c[n,m] = a[n] cos((2m+1)nπ/8), with n, m = 0, 1, 2, 3,
where a[0] = 1/2 and a[n] = 1/√2 for n > 0.
X=
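The entries produced by this formula can be computed numerically. The sketch below assumes a[0] = √(1/N) and a[n] = √(2/N) for n > 0, the choice that makes the matrix orthogonal; the slide's own scale factors are not shown, so treat these as an assumption.

```python
import math

N = 4

def a(n):
    """Row scale factor (assumed): sqrt(1/N) for n = 0, sqrt(2/N) otherwise."""
    return math.sqrt(1 / N) if n == 0 else math.sqrt(2 / N)

# c[n][m] = a[n] * cos((2m + 1) * n * pi / (2N)); for N = 4 the divisor is 8.
c = [[a(n) * math.cos((2 * m + 1) * n * math.pi / (2 * N)) for m in range(N)]
     for n in range(N)]

for row in c:
    print(" ".join(f"{v:7.4f}" for v in row))
```

Under that assumption the first row is 0.5 throughout and the second row starts at about 0.6533.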
Joint Photographic Expert Group
JPEG Coding
To perform JPEG coding, an image (colour or
grayscale) is first subdivided into blocks of 8x8
pixels.
The Discrete Cosine Transform (DCT) is then
performed on each block.
This generates 64 coefficients which are then
quantized to reduce their magnitude.
The coefficients are then reordered into a one-
dimensional array in a zigzag manner before further
entropy encoding.
The compression is achieved in two stages; the first is
during quantization and the second during the entropy
coding process.
JPEG decoding is the reverse process of coding.
JPEG Coding Block Diagram
JPEG-Block Preparation
Reasons for Block Processing
Taking DCT for an entire image requires large memory.
Taking DCT for an entire image is not a good idea for
compression due to spatially varying statistics within an image.
Luminance (Brightness)
Y=0.30R + 0.59G + 0.11B
Chrominance (Colour) PAL
U= -0.15R - 0.29G + 0.44B
V=0.62R - 0.52G - 0.10B
JPEG-DCT
Each block of 64 pixels goes through a transformation called DCT.
To understand the nature of this transformation, let us consider the result of the
transformations for three different cases-
1)Uniform Grayscale
2)Two sections
3)Gradient Grayscale.
Case-II Two Section
Quantization
After the T table (the table of transform coefficients) is created, the
values are quantized to reduce the number of bits needed for encoding.
Quantization divides each value by a constant and then
drops the fraction. This reduces the required number of bits even
more.
In most implementations, a quantizing table (8 by 8) defines how
to quantize each value.
The divisor depends on the position of the value in the T table.
This is done to optimize the number of bits and the number of 0s
for each particular application.
JPEG-Quantization
Quantization
Compression
After quantization the values are read from the table, and
redundant 0s are removed.
However, to cluster the 0s together, the process reads the table
diagonally in a zigzag fashion rather than row by row or column
by column.
The reason is that if the picture does not have fine changes, the
bottom right corner of the T table is all 0s.
JPEG usually uses run-length encoding at the compression phase
to compress the bit pattern resulting from the zigzag linearization.
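A simplified picture of run-length encoding on a zigzag-ordered sequence is sketched below. The input values are hypothetical, and the plain (value, count) pairs are a teaching simplification: baseline JPEG actually codes zero runs together with the size of the next non-zero coefficient and then applies Huffman coding.

```python
def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1      # extend the current run
        else:
            runs.append([v, 1])   # start a new run
    return [tuple(r) for r in runs]

# Hypothetical zigzag output: six non-zero coefficients, then 58 zeros.
zigzag = [-38, 13, -20, 4, -11, 4] + [0] * 58
encoded = run_length_encode(zigzag)
print(encoded)
```

The 64 input values shrink to seven pairs, the last of which stands for the whole run of 58 zeros.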
Run Length Coding Result-20 1515 12 17 12 A 0 58
JPEG-Run length Encoding
A zigzag scanning pattern is used to concentrate the 0’s together. Each run of
0’s can then be replaced by a single count (say, 38 0’s).
Zigzag scanning
Zonal coding
It is based on the fact that the transform coefficients of
maximum variance carry the most picture information.
The locations of the coefficients with the largest variances are
indicated by means of a zonal mask, which is the same for all
blocks.
All transform coefficients in the zone are retained, while all
coefficients outside the zone are set to zero.
To design a zonal mask, the variance of each coefficient can be
calculated based on a global image model, such as the Gauss-Markov
model.
Bit Allocation in Zonal Coding
A simple bit allocation strategy is to choose the number of bits
proportional to the variance of each coefficient over the blocks.
If the number of retained coefficients is M, with variances
σ_i², i = 1, 2, 3, …, M, then the number of bits allocated for each of
these coefficients is given by
Zonal coding
Information theory says coefficients with maximum variance carry
the most information.
15 out of 64 transform coefficients, those with the largest variance, are kept.
Zonal mask (1 = keep, 0 = discard):
1 1 1 1 1 0 0 0
1 1 1 1 0 0 0 0
1 1 1 0 0 0 0 0
1 1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
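Applying a zonal mask is an element-wise multiplication. The sketch below generates the triangular mask with the rule row + column < 5, which is an assumption read off the mask shown here; in practice the zone would come from measured or modelled coefficient variances. Function names are hypothetical.

```python
def zonal_mask(n=8, zone=5):
    """1 where row + col < zone (keep), 0 elsewhere (discard). Assumed rule."""
    return [[1 if i + j < zone else 0 for j in range(n)] for i in range(n)]

def apply_mask(coeffs, mask):
    """Zero every coefficient that lies outside the zone."""
    return [[c * m for c, m in zip(c_row, m_row)]
            for c_row, m_row in zip(coeffs, mask)]

mask = zonal_mask()
kept = sum(sum(row) for row in mask)  # number of retained coefficients
print(kept)
```

The same mask is reused for every block, so nothing about coefficient positions needs to be transmitted.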
Quantization
Retained coefficients are quantized and coded.
Coefficients are normalized by their standard
deviation and uniformly quantized.
Bit allocation
How many bits should be assigned to each coefficient?
Information theory tells us that a Gaussian random variable with variance σ²,
subject to distortion D, cannot be represented by fewer than
\[ \frac{1}{2}\log_2\frac{\sigma^2}{D} \text{ bits.} \]
Large-variance coefficients need more bits. Small-variance
coefficients need fewer bits.
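The bound ½·log₂(σ²/D) is easy to evaluate. The sketch below adds one assumption not stated on the slide: when the variance does not exceed the allowed distortion, the result is clamped to zero bits. The function name is hypothetical.

```python
import math

def min_bits(variance, distortion):
    """Lower bound on bits per sample for a Gaussian source:
    0.5 * log2(variance / distortion), clamped at zero (assumed)."""
    if variance <= distortion:
        return 0.0
    return 0.5 * math.log2(variance / distortion)

for variance in (64, 16, 4, 1):
    print(variance, min_bits(variance, distortion=1))
```

Each fourfold increase in variance costs one extra bit at the same distortion.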
Bit allocation table
8 7 6 4 3 2 1 0
7 6 5 4 3 2 1 0
6 5 4 3 2 1 1 0
4
3
2
1
0
Problem with zonal coding
If we use a fixed zone, we run the risk of cutting off
occasional transform coefficients that are large enough to
be included.
Threshold coding
One solution is to threshold coefficients and possibly use
different thresholds for different subimages.
This way we will end up keeping those coefficients that
exceed threshold, wherever they might be.
Threshold Coding
In most images, different blocks have different spectral and
statistical characteristics.
So it is necessary to use adaptive bit allocation methods.
In threshold coding, each transform coefficient is compared
with a threshold.
If it is smaller than the threshold, it is set to zero.
If it is larger than the threshold, it is retained for
quantization and encoding.
The thresholding method is adaptive: only
those coefficients whose magnitudes are above a threshold are
retained within each block.
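The thresholding rule can be sketched in a few lines. The block values and the threshold are hypothetical, and treating a coefficient exactly equal to the threshold as discarded is an assumption, since the text does not cover that case.

```python
def threshold_block(coeffs, threshold):
    """Keep coefficients whose magnitude exceeds the threshold; zero the rest."""
    return [[c if abs(c) > threshold else 0 for c in row] for row in coeffs]

# Hypothetical 3 x 3 corner of a transform block.
block = [[-304, 104, 10],
         [-3, 25, -7],
         [4, -2, 1]]
kept = threshold_block(block, threshold=8)
print(kept)
```

Unlike a zonal mask, the set of surviving positions differs from block to block, so the positions must be coded along with the values.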
Quantiser
The purpose of quantisation is to remove the components of the
transformed data that are unimportant to the visual appearance of
the image and to retain the visually important components.
The quantiser utilizes the fact that the human eye is unable to
perceive some visual information in an image.
Such information is deemed redundant and can be discarded
without noticeable visual artefacts.
Such redundancy is referred to as psychovisual redundancy.
The process of assigning a particular sample to a particular level is
called quantisation.
It is non-linear and irreversible, which makes it a lossy compression scheme.
Rounding each number to the nearest integer allows it to be represented
with a minimum number of bits.
Picking the N-largest transform coefficients
In a 2D array, how do we pick the N largest coefficients?
Answer: zig-zag scan. The coefficients are rearranged into a 1D
sequence in the order shown, and the first N are kept.
0 1 5 6 14 15 27 28
2 4 7 13 16 26 29
3 8 12 17 25 30
9 11 18 24 31
10 19 23 32
20 22 33
21 34
35
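The zig-zag order in the table can be generated by walking the anti-diagonals of the block and alternating direction. The sketch below is one way to do it; the function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):              # s = row + col indexes a diagonal
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diagonal.reverse()              # even diagonals run bottom-left to top-right
        order.extend(diagonal)
    return order

def zigzag_scan(block):
    """Flatten a square block into a 1-D list in zig-zag order."""
    return [block[i][j] for i, j in zigzag_order(len(block))]

# Table of scan positions, comparable with the one above.
index = [[0] * 8 for _ in range(8)]
for position, (i, j) in enumerate(zigzag_order(8)):
    index[i][j] = position
print(index[0])
```

Keeping the first N entries of `zigzag_scan(block)` retains the low-frequency coefficients, which are usually the largest.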
Threshold details
How do we threshold the transform coefficients?
There are 3 ways
Single global threshold for all subimages
Different thresholds for different subimages
Different threshold for different locations in the subimage
Image Compression
Image compression is a method that reduces the amount of
memory it takes to store an image.
We will exploit the fact that the DCT matrix is based on our
visual system for the purpose of image compression.
Now we have found the matrix Y = C(CXᵀ)ᵀ = CXCᵀ
8 x 8 Pixels Image
Gray-Scale Example
Value Range 0 (black) --- 255 (white)
63 33 36 28 63 81 86 98
27 18 17 11 22 48 104 108
72 52 28 15 17 16 47 77
132 100 56 19 10 9 21 55
187 186 166 88 13 34 43 51
184 203 199 177 82 44 97 73
211 214 208 198 134 52 78 83
211 210 203 191 133 79 74 86
X
2D-DCT of matrix
Numbers are coefficients of
polynomial
Y
Cut the least significant components
As you can see, we save a little over half the original memory.
Reconstructing the Image
New Matrix and Compressed Image
55 41 27 39 56 69 92 106
35 22 7 16 35 59 88 101
65 49 21 5 6 28 62 73
130 114 75 28 -7 -1 33 46
180 175 148 95 33 16 45 59
200 206 203 165 92 55 71 82
205 207 214 193 121 70 75 83
214 205 209 196 129 75 78 85
Can You Tell the Difference?
Original Compressed
Original Compressed
Linear Quantization
We will not zero the bottom half of the matrix.
The idea is to assign fewer bits of memory to store
information in the lower right corner of the DCT matrix.
Q = p ×
8 16 24 32 40 48 56 64
16 24 32 40 48 56 64 72
24 32 40 48 56 64 72 80
32 40 48 56 64 72 80 88
40 48 56 64 72 80 88 96
48 56 64 72 80 88 96 104
56 64 72 80 88 96 104 112
64 72 80 88 96 104 112 120
p is called the loss parameter
We divide each entry in the DCT matrix by the corresponding
entry of the quantization matrix.
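Linear quantization can be sketched as below, assuming the table follows Q[i][j] = 8p(i + j + 1) and that each quotient is rounded to the nearest integer; the slides do not state the rounding rule, so that part is an assumption. Function names and sample coefficients are hypothetical.

```python
def quantization_matrix(p, n=8):
    """Q[i][j] = p * 8 * (i + j + 1): step sizes grow toward the lower right."""
    return [[p * 8 * (i + j + 1) for j in range(n)] for i in range(n)]

def quantize(dct_block, q):
    """Divide each coefficient by its step size and round (assumed rule)."""
    return [[round(y / step) for y, step in zip(y_row, q_row)]
            for y_row, q_row in zip(dct_block, q)]

def dequantize(levels, q):
    """Approximate reconstruction: multiply each level by its step size."""
    return [[level * step for level, step in zip(l_row, q_row)]
            for l_row, q_row in zip(levels, q)]

Q1 = quantization_matrix(1)
Q4 = quantization_matrix(4)
print(Q1[0], Q4[0])
```

A larger loss parameter p widens every step, so more coefficients round to zero and fewer bits are needed, at the cost of a coarser reconstruction.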
p = 1:
-38 13 4 -2 0 0 0 0
-20 -11 2 2 0 0 0 0
4 -3 -2 0 0 0 0 0
3 1 0 0 0 0 0 0
0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
p = 4:
-9 3 1 -1 0 0 0 0
-5 -3 1 0 0 0 0 0
1 -1 0 0 0 0 0 0
1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
Linear Quantization
p=1 p=4
Linear Quantization
p=1 p=4
Properties of DCT
Decorrelation - The principal advantage of image
transformation is the removal of redundancy between
neighboring pixels. This leads to uncorrelated transform
coefficients, which can be encoded independently.
Energy Compaction - DCT exhibits excellent energy
compaction for highly correlated images. The uncorrelated
image has its energy spread out, whereas the energy of the
correlated image is packed into the low frequency region.
Orthogonality - The IDCT basis functions are orthogonal.
Thus, the inverse of the transformation matrix A is equal to its
transpose, i.e. inv(A) = A'.
Separability - The DCT can be applied along either
direction first and then along the second direction; the
resulting coefficients do not change.
Applications
The DCT is used in JPEG image compression, MJPEG
video compression, and MPEG video compression.
A related transform, the modified discrete cosine
transform, or MDCT, is used in MP3 audio compression.
DCTs are also widely employed in solving partial
differential equations by spectral methods, where the
different variants of the DCT correspond to slightly
different even/odd boundary conditions at the two ends of
the array.
Advantages of DCT
The DCT does a better job of concentrating energy into
lower order coefficients than does the DFT for image data.
The DCT is purely real, the DFT is complex.
Assuming a periodic input, the magnitude of the DFT
coefficients is spatially invariant. This is not true for the
DCT.
Compared with the DFT, the DCT has two main advantages:
It is a real transform with better computational efficiency
than the DFT, which by definition is a complex transform.
It does not introduce discontinuity while imposing
periodicity in the time signal.
In the DFT, as the time signal is truncated and assumed
periodic, discontinuity is introduced in the time domain and
corresponding artifacts are introduced in the frequency
domain.
But as even symmetry is assumed while truncating the time
signal, no discontinuity and related artifacts are introduced
in DCT.
Drawbacks of DCT
Blocking artefacts
Graininess
Blurring
Blocking artefacts arise from variations in overall
characteristics from one block to another, i.e. neighbouring
blocks are reconstructed with different DC levels.
This gives rise to a checkerboard effect.
Graininess is caused by truncation of low spectral
coefficients or lack of detail.
Blurring is due to truncation of high-frequency
components.
JPEG 2000-Block Diagram
Thanks