Unit-III: Source Coding: Image and Video
IMAGE COMPRESSION
SIF
Source Input Format (SIF), defined
in MPEG-1, is a video format that was
developed to allow the storage and
transmission of digital video.
SIF gives a picture quality comparable with
that obtained from a VCR.
It uses half the spatial resolution of the 4:2:0
format in both directions (subsampling).
It also uses half the refresh rate (temporal resolution).
The frame refresh rate is 30 Hz for a 525-line system
and 25 Hz for a 625-line system.
QCIF
A 2-D matrix is required to store the set of 8-bit grey-level values that represent the image.
For a colour image, if a CLUT (colour look-up table) is used then a single
matrix of values is required.
If the image is represented in R, G, B format then
three matrices are required.
If the Y, Cr, Cb format is used then the matrix size for
the chrominance components is smaller than the Y
matrix (reduced representation).
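As a rough illustration of the storage these representations imply, the sketch below counts the bytes needed for one image at 8 bits per sample. The 352x288 picture size, the assumption that each chrominance matrix has half the resolution of Y in each direction, and the function name `matrix_bytes` are all chosen for the example and do not come from the slides.

```python
def matrix_bytes(width, height, representation):
    """Bytes needed to store one image at 8 bits per sample."""
    luma = width * height
    if representation in ("grey", "clut"):
        # One matrix of 8-bit values (grey levels, or indices into the CLUT;
        # the table itself is not counted here).
        return luma
    if representation == "rgb":
        # Three full-size matrices: R, G and B.
        return 3 * luma
    if representation == "ycbcr":
        # Full-size Y matrix plus two smaller chrominance matrices
        # (assumed here to be half size in each direction).
        chroma = (width // 2) * (height // 2)
        return luma + 2 * chroma
    raise ValueError("unknown representation: " + representation)


for name in ("grey", "rgb", "ycbcr"):
    print(name, matrix_bytes(352, 288, name))
```

Under these assumptions the Y, Cr, Cb form needs half the storage of the R, G, B form.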
Forward DCT
Each pixel value is quantized using 8 bits, which
produces a value in the range 0 to 255 for R, G, B
or Y and a value in the range -128 to +127 for the two
chrominance values Cb and Cr.
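If the input matrix is P[x,y] and the transformed
matrix is F[i,j] then the DCT for the 8x8 block is
computed using the expression:
$$F[i,j] = \frac{1}{4}\, C(i)\, C(j) \sum_{x=0}^{7} \sum_{y=0}^{7} P[x,y] \cos\frac{(2x+1)\,i\,\pi}{16} \cos\frac{(2y+1)\,j\,\pi}{16}$$
where C(i) and C(j) are $1/\sqrt{2}$ for i, j = 0 and 1 for all other values of i and j.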
All 64 values in the input matrix P[x,y] contribute to
each entry in the transformed matrix F[i,j]
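The forward DCT of an 8x8 block can be sketched directly from its standard definition. This is a deliberately slow, literal version meant to show that every output entry sums over all 64 inputs; real codecs use fast factorizations. The function names are chosen for the example.

```python
import math

N = 8  # block size


def c(k):
    """Scaling factor: 1/sqrt(2) for index 0, otherwise 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def forward_dct(p):
    """Return F[i][j] for an 8x8 input block p[x][y]."""
    f = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            total = 0.0
            # All 64 input values contribute to each F[i][j].
            for x in range(N):
                for y in range(N):
                    total += (p[x][y]
                              * math.cos((2 * x + 1) * i * math.pi / 16)
                              * math.cos((2 * y + 1) * j * math.pi / 16))
            f[i][j] = 0.25 * c(i) * c(j) * total
    return f


# A flat block has only a DC term: F[0][0] is 8 times the pixel value.
flat = [[100] * N for _ in range(N)]
coeffs = forward_dct(flat)
print(round(coeffs[0][0], 6))
```

For the flat block all AC coefficients come out as zero (up to floating-point error), which is why smooth image regions compress well.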
Image Compression
Quantization
In addition to classifying the spatial frequency
components, the quantization process aims to reduce
the size of the DC and AC coefficients so that less
bandwidth is required for their transmission. This is done
by dividing each coefficient by a divisor.
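The division step can be sketched as follows. The divisor values in `table` are made up for illustration and are not taken from any standard quantization table; the small 2x2 matrices stand in for a full 8x8 block, and the function names are hypothetical.

```python
def quantize(coeffs, table):
    """Divide each coefficient by its divisor and round to an integer."""
    # Note: Python's round() rounds exact halves to the nearest even integer.
    return [[round(c / d) for c, d in zip(crow, drow)]
            for crow, drow in zip(coeffs, table)]


def dequantize(levels, table):
    """Decoder side: multiply each level back by its divisor."""
    return [[q * d for q, d in zip(qrow, drow)]
            for qrow, drow in zip(levels, table)]


coeffs = [[800.0, 33.0], [-21.0, 4.0]]   # example DCT coefficients
table = [[16, 11], [12, 14]]             # illustrative divisors only

levels = quantize(coeffs, table)
print(levels)
print(dequantize(levels, table))
```

The reconstructed values differ from the originals wherever rounding discarded information, which is the lossy part of the scheme; small coefficients with large divisors become zero.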
Huffman encoding
Significant levels of compression can be
obtained by replacing long strings of binary
digits by a string of much shorter codewords
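A minimal sketch of how Huffman codewords can be built, assuming the symbols are the quantized values to be sent. It is a generic textbook construction using a priority queue, not the specific tables used by JPEG; the function name is chosen for the example.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return {symbol: bit string}; frequent symbols get shorter codewords."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial code}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, n, {s: ""}) for n, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


data = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
codes = huffman_code(data)
total_bits = sum(len(codes[s]) for s in data)
print(codes, total_bits)
```

With these frequencies the 16 symbols need 28 bits, against 32 bits for a fixed 2-bit code; the saving grows as the distribution becomes more skewed.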
Progressive mode: first the DC and low-frequency coefficients of each block are sent,
and then the high-frequency coefficients.
Video Compression
I frames
I-frames (intracoded frames) are encoded without reference to
any other frames. Each frame is treated as a separate picture
and the Y, Cr and Cb matrices are encoded separately using
JPEG.
For I-frames the level of compression is small.
They are good for the first frame relating to a new scene in a
movie.
I-frames must be repeated at regular intervals to avoid losing
the whole picture, since a frame can be corrupted during
transmission and hence lost.
The number of frames/pictures between successive I-frames is
known as a group of pictures (GOP). Typical values of GOP are 3
to 12.
P frames
The encoding of a P-frame is relative to the
contents of either a preceding I-frame or a
preceding P-frame.
P-frames are encoded using a combination of
motion estimation and motion compensation.
The accuracy of the prediction operation is
determined by how well any movement between
successive frames is estimated. This is known as
motion estimation.
Since the estimation is not exact, additional
information must also be sent to indicate any
small differences between the predicted and
actual positions of the moving segments involved.
This is known as motion compensation.
The number of P-frames between I-frames is limited to
avoid error propagation.
PB-Frames
Motion estimation involves comparing small segments of
two consecutive frames for differences. Should a
difference be detected, a search is carried out to determine
to which neighbouring segment the original segment has
moved.
To limit the search time, the comparison is limited to a few
segments.
This works well in slow-moving applications like video telephony.
For fast-moving video it will not work effectively. Hence B-frames (bi-directional) are used. Their contents are
predicted using the past and the future frames.
B-frames provide the highest level of compression and,
because they are not involved in the coding of other frames,
they do not propagate errors.
P-frame encoding
Each macroblock in the example here consists of 4 DCT blocks
for the luminance signal and 1 each for the two chrominance
signals.
To encode a P-frame, the contents of each
macroblock in the frame, known as the target
frame, are compared on a pixel-by-pixel basis with
the contents of the corresponding macroblock in the
preceding I- or P-frame (the reference frame).
If a close match is found then only the address of
the macroblock is encoded.
If a match is not found, the search is extended to
cover an area around the macroblock in the
reference frame.
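The search described above can be sketched as a simple block-matching routine. Several details here are assumptions made for the example and are not stated in the slides: the match measure is the sum of absolute differences (SAD), a match counts as "close" when the SAD is at or below `threshold`, the block is 4x4 rather than a full macroblock, and the search covers every offset within `search` pixels.

```python
def sad(target, reference, tx, ty, rx, ry, size):
    """Sum of absolute pixel differences between two size x size blocks."""
    return sum(abs(target[ty + r][tx + c] - reference[ry + r][rx + c])
               for r in range(size) for c in range(size))


def find_motion_vector(target, reference, tx, ty, size=4, search=2, threshold=0):
    """Return (dx, dy, cost) for the target block whose top-left is (tx, ty)."""
    height, width = len(reference), len(reference[0])
    # Step 1: compare with the block at the same position.
    best = (0, 0, sad(target, reference, tx, ty, tx, ty, size))
    if best[2] <= threshold:
        return best
    # Step 2: no close match, so extend the search to the surrounding area.
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = tx + dx, ty + dy
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue  # candidate block falls outside the frame
            cost = sad(target, reference, tx, ty, rx, ry, size)
            if cost < best[2]:
                best = (dx, dy, cost)
    return best


# Reference frame with a distinct value per pixel; the target frame is the
# same picture moved one pixel to the right.
reference = [[r * 10 + c for c in range(8)] for r in range(8)]
target = [[0] + row[:-1] for row in reference]
print(find_motion_vector(target, reference, 2, 2))
```

The returned offset points from the target block to its best match in the reference frame; a non-zero cost at the best offset is what the difference information then has to cover.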
B-frame encoding
To encode a B-frame, any motion is estimated with
reference to both the preceding I- or P-frame and
the succeeding P- or I-frame.
The motion vector and difference matrices are
computed using first the preceding frame as the
reference frame and then the succeeding frame
as the reference.
A third motion vector and set of difference
matrices are then computed using the target and
the mean of the two other predicted sets of
values.
The set with the lowest set of difference matrices
is chosen and encoded.
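The final selection step can be sketched as below. To keep the example short, the motion search is left out: `preceding` and `succeeding` are assumed to be the already motion-compensated predictions of the target block from each reference frame. The size of a difference matrix is measured here as the sum of absolute values, and the labels "forward", "backward" and "interpolated" are names chosen for the example.

```python
def difference(target, prediction):
    """Difference matrix: target minus prediction, element by element."""
    return [[t - p for t, p in zip(trow, prow)]
            for trow, prow in zip(target, prediction)]


def magnitude(diff):
    """Size of a difference matrix, taken as the sum of absolute values."""
    return sum(abs(v) for row in diff for v in row)


def choose_b_prediction(target, preceding, succeeding):
    """Pick the prediction whose difference matrix is smallest."""
    # Third candidate: the mean of the two other predictions.
    mean = [[(a + b) / 2 for a, b in zip(ra, rb)]
            for ra, rb in zip(preceding, succeeding)]
    candidates = {"forward": preceding,
                  "backward": succeeding,
                  "interpolated": mean}
    diffs = {name: difference(target, pred)
             for name, pred in candidates.items()}
    best = min(diffs, key=lambda name: magnitude(diffs[name]))
    return best, diffs[best]


target = [[10, 20], [30, 40]]
preceding = [[8, 18], [28, 38]]     # prediction from the earlier frame
succeeding = [[12, 22], [32, 42]]   # prediction from the later frame
print(choose_b_prediction(target, preceding, succeeding))
```

In this example the target lies exactly halfway between the two predictions, so the interpolated candidate gives an all-zero difference matrix and is the one selected.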
Decoding of I, P, and B frames
MPEG
MPEG-1 (ISO Recommendation 11172) uses a resolution
of 352x288 pixels and is used for VHS-quality audio
and video on CD-ROM at a bit rate of 1.5 Mbps.
MPEG-2 (ISO Recommendation 13818) is
used in the recording and transmission of studio-quality
audio and video. Different levels of video resolution
are possible:
Low: 352x288 pixels, comparable with MPEG-1
Main: 720x576 pixels, studio-quality video and
audio, bit rate up to 15 Mbps
High: 1920x1152 pixels, used in wide-screen HDTV;
bit rates of up to 80 Mbps are possible
MPEG-4: used for interactive multimedia
applications over the Internet and over various
entertainment networks.
The MPEG-4 standard contains features that enable a user
not only to passively access a video sequence
using, for example, start/stop controls, but also to
manipulate the individual elements that
make up a scene within a video.
In MPEG-4 each video frame is segmented into a
number of video object planes (VOPs), each of
which corresponds to an AVO (audio-visual
object) of interest.
Each audio and video object has a separate object
descriptor associated with it which allows the
object, provided the creator of the audio and/or
video has provided the facility, to be manipulated
by the viewer prior to it being decoded and played
out.