
Unit-III: Source Coding: Image and Video

The document discusses image and video compression formats. It focuses on explaining the GIF, TIFF, SIF, CIF and QCIF image formats. It then provides details on JPEG and MPEG video compression standards. Specifically, it describes the stages of JPEG image compression including image preparation, forward discrete cosine transform (DCT), quantization, entropy encoding, and frame building. It also includes a schematic of the JPEG encoder and mentions that the JPEG decoder consists of corresponding decoding stages.

Uploaded by

Ashmi Shaji
Copyright
© All Rights Reserved

UNIT-III

SOURCE CODING: IMAGE AND VIDEO


Image and video formats: GIF, TIFF, SIF, CIF, QCIF
Image compression: READ, JPEG
Video compression: principles (I, B, P frames), motion estimation, motion compensation, H.261, MPEG standard

IMAGE COMPRESSION

Image Compression: GIF

Although colour images comprising 24-bit pixels are supported, GIF reduces the number of possible colours by choosing the 256 entries from the original set of 2^24 colours that most closely match those used in the original image.
Hence, instead of sending each pixel as a 24-bit colour value, only an 8-bit index to the table entry that contains the closest match to the original is sent. This results in a 3:1 compression ratio.
The contents of the table are sent in addition to the screen size and aspect ratio information.
The image can also be transferred over the network using the interlaced mode.
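The table lookup described above can be sketched roughly as follows in Python. The function names, the four-entry table (a real GIF colour table holds up to 256 entries) and the squared-distance matching rule are assumptions made for illustration; they are not taken from the GIF specification.

```python
def nearest_index(pixel, table):
    """Return the index of the table entry closest to an (R, G, B) pixel."""
    def dist2(a, b):
        # Squared Euclidean distance in RGB space (an assumed matching rule).
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(table)), key=lambda k: dist2(pixel, table[k]))


def index_image(pixels, table):
    """Replace every 24-bit pixel by the index of its closest table entry."""
    return [nearest_index(p, table) for p in pixels]


# Hypothetical colour table and pixels, for illustration only.
table = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (255, 255, 255)]
pixels = [(250, 10, 5), (3, 2, 1), (240, 250, 245)]

indices = index_image(pixels, table)

# 24 bits per pixel are replaced by an 8-bit index (table overhead ignored).
ratio = 24 / 8
```

Each pixel is now represented by a small integer index, and the receiver recovers an approximate colour by looking the index up in the transmitted table.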

Image Compression: GIF

Figure: GIF compression in dynamic mode using LZW coding

LZW coding can be used to obtain further levels of compression.
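To show how LZW builds its dictionary as it goes, here is a minimal textbook-style sketch in Python. It is not the exact GIF variant, which, as far as the format is usually described, also uses variable-width codes and special clear and end codes; the function names are illustrative.

```python
def lzw_compress(data):
    """Encode a bytes object as a list of integer codes (textbook LZW)."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in dictionary:
            current = candidate            # keep extending the current string
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new string
            next_code += 1
            current = bytes([value])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes):
    """Rebuild the original bytes from the list of codes."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Code refers to the string that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

Repeated runs of index values, which are common in images with few colours, collapse into single codes, which is where the additional compression comes from.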

Image Compression: GIF interlaced mode

Figure: GIF interlaced mode, first 1/8 and second 1/8 of the total compressed image

GIF also allows an image to be stored and subsequently transferred over the network in an interlaced mode. This is useful over either low bit rate channels or the Internet, which provides a variable transmission rate.

Image Compression: GIF interlaced mode

Figure: GIF interlaced mode, further and remaining parts of the image

The compressed image data is organized so that the decompressed image is built up in a progressive way as the data arrives.
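The progressive build-up can be illustrated with the row order used by GIF interlacing. The sketch below assumes the four-pass scheme as it is commonly documented for GIF89a (rows 0, 8, 16, ...; then 4, 12, ...; then 2, 6, 10, ...; then all odd rows); the function name is illustrative.

```python
def gif_interlace_order(height):
    """Return the rows sent in each of the four interlace passes."""
    # (first row, step) for each pass, per the commonly documented scheme.
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]
    return [list(range(start, height, step)) for start, step in passes]


rows_per_pass = gif_interlace_order(16)
fractions = [len(rows) / 16 for rows in rows_per_pass]
```

For a 16-row image the passes carry 1/8, 1/8, 1/4 and 1/2 of the rows, so a coarse version of the whole picture is available after the first pass.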

Tagged Image File Format (TIFF)

Used for the transfer of images and digitized documents
Supports pixel resolutions of up to 48 bits
16 bits are used for each of the R, G and B colours
A code number indicates the particular format used:
Code 1: uncompressed format
Codes 2, 3, 4: formats for digitized documents
Code 5: LZW-compressed format

SIF
Source Intermediate Format (SIF), defined in MPEG-1, is a video format that was developed to allow the storage and transmission of digital video.
SIF gives a picture quality comparable with that obtained from a VCR.
It uses half the spatial resolution of the 4:2:0 format (subsampling).
It also uses half the refresh rate (temporal resolution).
Frame refresh rate: 30 Hz for the 525-line system and 25 Hz for the 625-line system.

CIF (Common Intermediate Format)

CIF (Common Intermediate Format), also known as FCIF (Full Common Intermediate Format), is a format used to standardize the horizontal and vertical resolutions in pixels of YCbCr sequences in video signals.
It is commonly used in video teleconferencing systems. It was first proposed in the H.261 standard.

QCIF

Lower resolution than QCIF: Sub-QCIF
Y = 128 x 96
Cb = Cr = 64 x 48

Image Compression: JPEG encoder schematic

The standard defined by the Joint Photographic Experts Group (JPEG) forms the basis of most video compression algorithms.

Image Compression: Image Preparation

Block preparation is necessary since computing the transformed value for each position in a matrix requires the values in all the locations to be processed.

Image Compression: Image/block preparation
The source image is made up of one or more 2-D matrices of values.

For a grey-level image, a single 2-D matrix is required to store the set of 8-bit grey-level values that represent the image.
For a colour image, if a CLUT is used then a single matrix of values is required.
If the image is represented in R, G, B format then three matrices are required.
If the Y, Cr, Cb format is used then the matrix size for the chrominance components is smaller than the Y matrix (reduced representation).

JPEG: Image/block preparation

Once the image format is selected, the values in each matrix are compressed separately using the DCT.
In order to make the transformation more efficient, a second step known as block preparation is carried out before the DCT.
In block preparation each global matrix is divided into a set of smaller 8 x 8 submatrices (blocks), which are fed sequentially to the DCT.
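Block preparation can be sketched as below. This is a minimal illustration that assumes the matrix dimensions are exact multiples of 8 (padding of partial blocks is not covered here) and that blocks are produced left to right, top to bottom; the function name is illustrative.

```python
def split_into_blocks(matrix, size=8):
    """Divide a 2-D matrix (list of rows) into size x size blocks."""
    rows, cols = len(matrix), len(matrix[0])
    if rows % size or cols % size:
        raise ValueError("matrix dimensions must be multiples of the block size")
    blocks = []
    for top in range(0, rows, size):
        for left in range(0, cols, size):
            # Slice `size` rows, then `size` columns from each of those rows.
            blocks.append([row[left:left + size]
                           for row in matrix[top:top + size]])
    return blocks


# A made-up 16 x 24 matrix whose entries encode their own position.
matrix = [[r * 24 + c for c in range(24)] for r in range(16)]
blocks = split_into_blocks(matrix)
```

Each element of `blocks` is an 8 x 8 list of lists that can be passed on to the transform stage one at a time.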

Image Compression: Image Preparation

Once the source image format has been selected and prepared (there are four alternative forms of representation), the set of values in each matrix is compressed separately using the DCT.

Forward DCT
Each pixel value is quantized using 8 bits, which produces a value in the range 0 to 255 for the R, G, B or Y components and a value in the range -128 to 127 for the two chrominance values Cb and Cr.
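If the input matrix is P[x,y] and the transformed matrix is F[i,j], then the DCT for the 8 x 8 block is computed using the expression:

$$F[i,j] = \frac{1}{4}\,C(i)\,C(j)\sum_{x=0}^{7}\sum_{y=0}^{7} P[x,y]\,\cos\frac{(2x+1)\,i\,\pi}{16}\,\cos\frac{(2y+1)\,j\,\pi}{16}$$

where C(i) and C(j) equal $1/\sqrt{2}$ for i, j = 0 and 1 for all other values of i and j.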
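A direct, unoptimized Python sketch of the forward DCT for one 8 x 8 block is given below. It evaluates the double sum literally and uses the usual scaling C(0) = 1/sqrt(2), C(k) = 1 otherwise; practical encoders use fast factorizations instead. The function name is illustrative.

```python
import math


def forward_dct(block):
    """Compute F[i][j] for an 8 x 8 block P[x][y] by direct summation."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * 8 for _ in range(8)]
    for i in range(8):
        for j in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * i * math.pi / 16)
                              * math.cos((2 * y + 1) * j * math.pi / 16))
            out[i][j] = 0.25 * c(i) * c(j) * total
    return out


# A flat block: every sample has the same (made-up) value.
flat = [[50] * 8 for _ in range(8)]
coeffs = forward_dct(flat)
```

For a flat block only `coeffs[0][0]` is non-zero; with this scaling it equals 8 times the sample value, and every other coefficient is zero up to floating-point error.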

Image Compression: Image Preparation

The values are first centred around zero by subtracting 128 from each intensity/luminance value.

Forward DCT
All 64 values in the input matrix P[x,y] contribute to each entry in the transformed matrix F[i,j].

For i = j = 0 the arguments of the two cosine terms are both 0, so both terms equal 1, and hence the value in location F[0,0] of the transformed matrix is simply a function of the summation of all the values in the input matrix.
This value is proportional to the mean of all 64 values in the matrix and is known as the DC coefficient.
Since the values in all the other locations of the transformed matrix have a frequency coefficient associated with them, they are known as AC coefficients.

Image Compression: Forward DCT

For j = 0 only the horizontal frequency coefficients are present.
For i = 0 only the vertical frequency coefficients are present.
For all the other locations both the horizontal and vertical frequency coefficients are present.

Image Compression: Quantization
In addition to classifying the spatial frequency components, the quantization process aims to reduce the size of the DC and AC coefficients so that less bandwidth is required for their transmission. This is done by dividing each coefficient by a divisor.

The sensitivity of the eye varies with spatial frequency, and hence the amplitude threshold below which the eye cannot detect a particular frequency also varies.
The threshold values vary for each of the 64 DCT coefficients and are held in a 2-D matrix known as the quantization table.
Figure: Example computation of a set of quantized DCT coefficients

Image Compression: Quantization

From the quantization table and the DCT and quantized coefficients, a number of observations can be made:
- The computation of the quantized coefficients involves rounding the quotients to the nearest integer value
- The threshold values used increase in magnitude with increasing spatial frequency
- The DC coefficient in the transformed matrix is the largest
- Many of the higher frequency coefficients are zero
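The observations above can be reproduced with a small sketch. Both the quantization table and the DCT coefficients below are made-up values chosen only to show the effect (thresholds growing with frequency, coefficients shrinking with frequency); they are not the tables defined in the JPEG standard, and the function names are illustrative.

```python
import math


def round_half_away(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    sign = 1 if x >= 0 else -1
    return sign * int(math.floor(abs(x) + 0.5))


def quantize(coeffs, table):
    """Divide each DCT coefficient by its threshold and round the quotient."""
    return [[round_half_away(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, table)]


# Hypothetical table: thresholds increase with spatial frequency (i + j).
table = [[10 + 5 * (i + j) for j in range(8)] for i in range(8)]
# Hypothetical coefficients: large at DC, decaying with frequency.
coeffs = [[120.0 / (1 + i + j) for j in range(8)] for i in range(8)]

quantized = quantize(coeffs, table)
```

With these inputs the DC term stays the largest value and the entries towards the high-frequency corner become zero, which is what makes the later run-length stage effective.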

Image Compression: Entropy Encoding
Entropy encoding consists of four stages: vectoring, differential encoding, run-length encoding and Huffman encoding.

Vectoring
The entropy encoder operates on a one-dimensional string of values (a vector). However, the output of the quantization stage is a 2-D matrix, and hence this has to be represented in a 1-D form. This is known as vectoring.
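The text does not say in which order the matrix is read out. JPEG is normally described as using a zig-zag scan, so that the DC coefficient comes first and low-frequency coefficients precede high-frequency ones; the sketch below assumes that order. The function names are illustrative.

```python
def zigzag_order(n=8):
    """Return (row, column) positions of an n x n matrix in zig-zag order."""
    order = []
    for s in range(2 * n - 1):              # s = row + column of a diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)           # even diagonals run upwards
        for r in rows:
            order.append((r, s - r))
    return order


def vectorize(matrix):
    """Flatten a square matrix into a 1-D list using the zig-zag scan."""
    return [matrix[r][c] for r, c in zigzag_order(len(matrix))]


order = zigzag_order()
```

Because the quantized high-frequency coefficients are mostly zero, this ordering tends to gather the zeros into one long run at the end of the vector.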
Differential encoding
In this stage only the difference in magnitude of the DC coefficient in a quantized block relative to the value in the preceding block is encoded. This reduces the number of bits required to encode the relatively large magnitude of the DC coefficient.
The difference values are then encoded in the form (SSS, value), where SSS indicates the number of bits needed to encode the value and value is the actual bits that represent it.
E.g. if the sequence of DC coefficients in consecutive quantized blocks was 12, 13, 11, 11, 10, ... the difference values will be 12, 1, -2, 0, -1, ...
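The example above can be worked through in a few lines of Python. The sketch assumes that the first block is differenced against 0 and that negative values are sent as the bitwise complement of their magnitude, which is the convention usually described for JPEG; the function names are illustrative.

```python
def dc_differences(dc_values):
    """Difference of each DC coefficient from the one in the previous block."""
    previous = 0                      # assumed predictor for the first block
    diffs = []
    for dc in dc_values:
        diffs.append(dc - previous)
        previous = dc
    return diffs


def sss(value):
    """Number of bits needed to represent the magnitude of value."""
    return abs(value).bit_length()


def encode_value(value):
    """Bit string for value: binary if positive, complemented if negative."""
    n = sss(value)
    if n == 0:
        return ""                     # a zero difference needs no value bits
    if value < 0:
        value = ~(-value) & ((1 << n) - 1)   # complement within n bits
    return format(value, "0{}b".format(n))


diffs = dc_differences([12, 13, 11, 11, 10])
encoded = [(sss(d), encode_value(d)) for d in diffs]
```

Here `diffs` reproduces the sequence 12, 1, -2, 0, -1 from the example, and `encoded` pairs each difference with its SSS field and value bits.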

Run-length encoding

The remaining 63 values in the vector are the AC coefficients.
Because of the large number of 0s in the AC coefficients, they are encoded as a string of pairs of values.
Each pair is made up of (skip, value), where skip is the number of zeros in the run and value is the next non-zero coefficient.
For example, the AC coefficients 6, 7, 3, 3, 3, 2, 2, 2, 2 followed only by zeros are encoded as:
(0,6) (0,7) (0,3) (0,3) (0,3) (0,2) (0,2) (0,2) (0,2) (0,0)
The final pair (0,0) indicates the end of the string for this block.
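A minimal sketch of the (skip, value) encoding follows. It ignores details of the real bitstream, such as any upper limit on the length of a run, and the function name is illustrative.

```python
def run_length_encode(ac):
    """Encode AC coefficients as (skip, value) pairs ending with (0, 0)."""
    # Find the last non-zero coefficient; everything after it is dropped.
    last_nonzero = -1
    for index, value in enumerate(ac):
        if value != 0:
            last_nonzero = index

    pairs = []
    skip = 0
    for value in ac[:last_nonzero + 1]:
        if value == 0:
            skip += 1                 # extend the current run of zeros
        else:
            pairs.append((skip, value))
            skip = 0
    pairs.append((0, 0))              # end-of-block marker
    return pairs


ac = [6, 7, 3, 3, 3, 2, 2, 2, 2] + [0] * 54   # 63 AC coefficients
pairs = run_length_encode(ac)
```

The 54 trailing zeros cost nothing beyond the single end-of-block pair, so the 63 coefficients are reduced to ten pairs.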

Huffman encoding
Significant levels of compression can be obtained by replacing long strings of binary digits by a string of much shorter codewords.

The length of each codeword is a function of its relative frequency of occurrence.
Normally, a table of codewords is used, with the set of codewords precomputed using the Huffman coding algorithm.
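To show how such a table of codewords can be derived, here is a generic Huffman construction in Python. It builds the table from the symbol counts of a sample string, whereas the text notes that tables are normally precomputed; the function name and the sample input are illustrative.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to its Huffman codeword (a bit string)."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {symbol: "0" for symbol in freq}
    # Heap items: (count, tie-breaker, partial code table for that subtree).
    heap = [(count, idx, {symbol: ""})
            for idx, (symbol, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count1, _, table1 = heapq.heappop(heap)   # two least frequent subtrees
        count2, _, table2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in table1.items()}
        merged.update({s: "1" + code for s, code in table2.items()})
        heapq.heappush(heap, (count1 + count2, next_id, merged))
        next_id += 1
    return heap[0][2]


codes = huffman_code("aaaabbc")
lengths = {symbol: len(code) for symbol, code in codes.items()}
```

The most frequent symbol receives the shortest codeword, and no codeword is a prefix of another, so the bitstream can be decoded without separators.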

Image Compression: Frame Building
In order for the remote computer to interpret all the different fields and tables that make up the bitstream, it is necessary to delimit each field and set of table values in a defined way.

The JPEG standard includes a definition of the structure of the total bitstream relating to a particular image/picture. This is known as a frame.
The role of the frame builder is to encapsulate all the information relating to an encoded image/picture.

Image Compression: Frame Building
At the top level the complete frame-plus-header is encapsulated between a start-of-frame and an end-of-frame delimiter, which allows the receiver to determine the start and end of all the information relating to a complete image.

The frame header contains a number of fields:
- the overall width and height of the image in pixels
- the number and type of components (CLUT, R/G/B, Y/Cb/Cr)
- the digitization format used (4:2:2, 4:2:0 etc.)

Image Compression: Frame Building
At the next level a frame consists of a number of components, each of which is known as a scan.

The level two header contains fields that include:
- the identity of the components
- the number of bits used to digitize each component
- the quantization table of values that have been used to encode each component
Each scan comprises one or more segments, each of which can contain a group of (8 x 8) blocks preceded by a header.
This header contains the set of Huffman codewords for each block.

Image Compression: JPEG decoder

A JPEG decoder is made up of a number of stages, which are simply the corresponding decoder sections of those used in the encoder.

The frame decoder first identifies the encoded bitstream and its associated control information and tables within the various headers.
It then loads the contents of each table into the corresponding decoder table and passes the control information to the image builder.
Then the Huffman decoder carries out the decompression operation using either the preloaded or the default tables of codewords.

The two decompressed streams containing the DC and AC coefficients of each block are then passed to the differential and run-length decoders.

The resulting matrix of values is then dequantized using either the default or the preloaded values in the quantization table.
Each resulting block of 8 x 8 spatial frequency coefficients is passed in turn to the inverse DCT, which transforms it back to its spatial form.
The image builder then reconstructs the image from these blocks using the control information passed to it by the frame decoder.
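The dequantization and inverse DCT steps can be sketched as follows. The inverse transform is written as the direct counterpart of the forward 8 x 8 DCT with the same 1/4 and C(0) = 1/sqrt(2) scaling; the quantization table and the coefficient values are hypothetical, and the function names are illustrative.

```python
import math


def dequantize(quantized, table):
    """Multiply each quantized coefficient by its quantization table entry."""
    return [[q * t for q, t in zip(q_row, t_row)]
            for q_row, t_row in zip(quantized, table)]


def inverse_dct(coeffs):
    """Rebuild the 8 x 8 spatial block P[x][y] from coefficients F[i][j]."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    block = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            total = 0.0
            for i in range(8):
                for j in range(8):
                    total += (c(i) * c(j) * coeffs[i][j]
                              * math.cos((2 * x + 1) * i * math.pi / 16)
                              * math.cos((2 * y + 1) * j * math.pi / 16))
            block[x][y] = 0.25 * total
    return block


# Hypothetical data: a block that contains only a DC coefficient.
table = [[10] * 8 for _ in range(8)]
quantized = [[0] * 8 for _ in range(8)]
quantized[0][0] = 40

pixels = inverse_dct(dequantize(quantized, table))
```

A block with only a DC coefficient decodes to a flat patch, here with every sample equal to 400 / 8 = 50 up to floating-point error. Any rounding lost during quantization is not recovered, which is why the scheme is lossy.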

Although complex, JPEG can achieve compression ratios of 20:1 while still retaining a good quality image.

This level (20:1) applies to images with few colour transitions.
For more complicated images, compression ratios of 10:1 are more common.
As with GIF images, it is possible to encode and rebuild the image in a progressive manner. This can be achieved using two different modes: progressive mode and hierarchical mode.

Progressive mode: first the DC and low-frequency coefficients of each block are sent, and then the high-frequency coefficients.

Hierarchical mode: in this mode, the total image is first sent at a low resolution, e.g. 320 x 240, and then at a higher resolution, e.g. 640 x 480.

Video Compression

One approach to compressing a video source is to apply the JPEG algorithm to each frame independently. This is known as moving JPEG or MJPEG.
There are two types of compressed frames:
- Those that are compressed independently (I-frames)
- Those that are predicted (P-frames and B-frames)

Video Compression: Example frame sequences, I- and P-frames

In the context of compression, since video is simply a sequence of digitized pictures, video is also referred to as moving pictures, and the terms frame and picture are used interchangeably.

I-frames
I-frames (intracoded frames) are encoded without reference to any other frames. Each frame is treated as a separate picture and the Y, Cr and Cb matrices are encoded separately using JPEG.
For I-frames the level of compression obtained is relatively small.
They are good for the first frame relating to a new scene in a movie.
I-frames must be repeated at regular intervals so that, if a frame is corrupted during transmission and lost, the whole picture sequence is not lost with it.
The number of frames/pictures between successive I-frames is known as a group of pictures (GOP). Typical values of GOP are 3 to 12.

P-frames
The encoding of a P-frame is relative to the contents of either a preceding I-frame or a preceding P-frame.
P-frames are encoded using a combination of motion estimation and motion compensation.
The accuracy of the prediction operation is determined by how well any movement between successive frames is estimated. This is known as motion estimation.
Since the estimation is not exact, additional information must also be sent to indicate any small differences between the predicted and actual positions of the moving segments involved. This is known as motion compensation.
The number of P-frames between successive I-frames is limited to avoid error propagation.

Frame Sequences: I-, P- and B-frames

Each I-frame is treated as a separate (digitized) picture and the Y, Cb and Cr matrices are encoded independently using the JPEG algorithm (DCT, quantization, entropy encoding), except that the quantization threshold values that are used are the same for all DCT coefficients.

PB-Frames

A fourth type of frame known as a PB-frame has also been defined. It does not refer to a new frame type as such but rather to the way two neighbouring P- and B-frames are encoded as if they were a single frame.

Video Compression
Motion estimation involves comparing small segments of two consecutive frames for differences. Should a difference be detected, a search is carried out to determine to which neighbouring segment the original segment has moved.
To limit the search time, the comparison is limited to a few segments.
This works well in slow-moving applications such as video telephony.
For fast-moving video it does not work effectively. Hence B-frames (bidirectional frames) are used. Their contents are predicted using both past and future frames.
B-frames provide the highest level of compression and, because they are not involved in the coding of other frames, they do not propagate errors.

P-frame encoding

The digitized contents of the Y matrix associated with each frame are first divided into a two-dimensional matrix of 16 x 16 pixel blocks, each known as a macroblock.

P-frame encoding
In the example here, each macroblock comprises 4 DCT blocks for the luminance signal and 1 each for the two chrominance signals.
To encode a P-frame, the contents of each macroblock in the frame (known as the target frame) are compared on a pixel-by-pixel basis with the contents of the I- or P-frame used as the reference frame.
If a close match is found, then only the address of the macroblock is encoded.
If a match is not found, the search is extended to cover an area around the macroblock in the reference frame.
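The search for a matching macroblock can be sketched as a simple exhaustive block-matching routine. The sum of absolute differences used as the matching cost, the size of the search window and the sign convention of the returned vector (offset from the macroblock position in the target frame to the matching position in the reference frame) are assumptions made for illustration; the function names are illustrative too.

```python
def sad(target, reference, ty, tx, ry, rx, size):
    """Sum of absolute pixel differences between two size x size blocks."""
    return sum(abs(target[ty + r][tx + c] - reference[ry + r][rx + c])
               for r in range(size) for c in range(size))


def find_motion_vector(target, reference, top, left, size=16, search=4):
    """Search the reference frame around (top, left) for the best match.

    Returns ((dy, dx), cost) for the lowest-cost position; ties are broken
    in favour of the smaller displacement.
    """
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue              # candidate block falls outside the frame
            cost = sad(target, reference, top, left, ry, rx, size)
            key = (cost, abs(dy) + abs(dx))
            if best is None or key < best[0]:
                best = (key, (dy, dx))
    (cost, _), vector = best
    return vector, cost


# Made-up 12 x 12 frames: the target is the reference moved down 1, right 2.
reference = [[3 * r * r + c * c for c in range(12)] for r in range(12)]
target = [[reference[r - 1][c - 2] if r >= 1 and c >= 2 else 0
           for c in range(12)] for r in range(12)]

vector, cost = find_motion_vector(target, reference, top=4, left=4,
                                  size=4, search=3)
```

A cost of zero corresponds to an exact match; otherwise the remaining pixel differences would still have to be encoded alongside the vector.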

P-frame encoding

To encode a P-frame, the contents of each macroblock in the frame (target frame) are compared on a pixel-by-pixel basis with the contents of the corresponding macroblock in the preceding I- or P-frame.

B-frame encoding
To encode a B-frame, any motion is estimated with reference to both the preceding I- or P-frame and the succeeding P- or I-frame.
The motion vector and difference matrices are computed using first the preceding frame as the reference frame and then the succeeding frame as the reference.
A third motion vector and set of difference matrices are then computed using the target and the mean of the two other predicted sets of values.
The set with the lowest set of difference matrices is chosen and encoded.
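The choice between the three candidate predictions can be sketched as below. The sketch assumes that the best-matching blocks in the preceding and succeeding frames have already been located, and it measures "lowest set of difference matrices" by the sum of absolute differences, which is an assumption; the function names and sample values are illustrative.

```python
def difference(target, prediction):
    """Element-wise difference matrix between target and prediction."""
    return [[t - p for t, p in zip(t_row, p_row)]
            for t_row, p_row in zip(target, prediction)]


def cost(diff):
    """Sum of absolute values in a difference matrix (assumed measure)."""
    return sum(abs(v) for row in diff for v in row)


def choose_b_prediction(target, preceding, succeeding):
    """Pick the prediction (preceding, succeeding or their mean) with lowest cost."""
    interpolated = [[(a + b) / 2 for a, b in zip(a_row, b_row)]
                    for a_row, b_row in zip(preceding, succeeding)]
    candidates = {
        "preceding": preceding,
        "succeeding": succeeding,
        "interpolated": interpolated,
    }
    best = min(candidates,
               key=lambda name: cost(difference(target, candidates[name])))
    return best, difference(target, candidates[best])


# Made-up 2 x 2 blocks standing in for macroblocks.
target = [[10, 20], [30, 40]]
preceding = [[8, 18], [28, 38]]
succeeding = [[12, 22], [32, 42]]

choice, diff = choose_b_prediction(target, preceding, succeeding)
```

In this example the mean of the two predictions matches the target exactly, so the interpolated candidate is chosen and its difference matrix is all zeros.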

Decoding of I-, P- and B-frames

I-frames: decoded immediately to recreate the original frame.
P-frames: the received information is decoded and the resulting information is used with the decoded contents of the preceding I- or P-frame (two buffers are used).
B-frames: the received information is decoded and the resulting information is used with the decoded contents of the preceding and succeeding P- or I-frame (three buffers are used).
PB-frame
Not a new frame type as such, but the way two neighbouring P- and B-frames are encoded as if they were a single frame.

MPEG
MPEG-1 (ISO Recommendation 11172): uses a resolution of 352 x 288 pixels and is used for VHS-quality audio and video on CD-ROM at a bit rate of 1.5 Mbps.
MPEG-2 (ISO Recommendation 13818): used in the recording and transmission of studio-quality audio and video. Different levels of video resolution are possible:
Low: 352 x 288 pixels, comparable with MPEG-1
Main: 720 x 576 pixels, studio-quality video and audio, bit rate up to 15 Mbps
High: 1920 x 1152 pixels, used in wide-screen HDTV, bit rates of up to 80 Mbps are possible

MPEG
MPEG-4: used for interactive multimedia applications over the Internet and over various entertainment networks.
The MPEG-4 standard contains features that enable a user not only to passively access a video sequence using, for example, start/stop commands, but also to manipulate the individual elements that make up a scene within a video.
In MPEG-4 each video frame is segmented into a number of video object planes (VOPs), each of which corresponds to an AVO (audio-visual object) of interest.
Each audio and video object has a separate object descriptor associated with it. Provided the creator of the audio and/or video has provided the facility, this allows the object to be manipulated by the viewer prior to it being decoded and played out.

Video Compression: MPEG-1 video bitstream structure (composition)

The compressed bitstream produced by the video encoder is hierarchical: at the top level is the complete compressed video (sequence), which consists of a string of groups of pictures.

Video Compression: MPEG-1 video bitstream structure (format)

In order for the decoder to decompress the received bitstream, each data structure must be clearly identified within the bitstream.

Video Compression: MPEG-4 coding principles

Figure: Content-based video coding principles, showing how a frame/scene is defined in the form of multiple video object planes

Video Compression: MPEG-4 encoder/decoder schematic

Before being compressed, each scene is defined in the form of a background and one or more foreground audio-visual objects (AVOs).

Video Compression: MPEG VOP encoder

The audio associated with an AVO is compressed using one of the algorithms described before, the choice depending on the available bit rate of the transmission channel and the sound quality required.
