Chapter 5: Data Compression
Multimedia System

Outline
Storage Space
Coding Requirements - basic knowledge about coding requirements
Source, Entropy and Hybrid Coding
Run Length Encoding
Huffman Encoding
Arithmetic Encoding (introduction only)
Lossy Sequential DCT-based Mode - basic knowledge
Steps of Lossy Sequential DCT-based Mode
Expanded Lossy DCT-based Mode
Steps of Expanded Lossy DCT-based Mode
JPEG and MPEG compression process
Need for Data Compression
Data compression is the process of reducing the size of a data file or a dataset by re-encoding its contents using fewer bits than
the original representation.
Storage Efficiency
Reduces the amount of storage space needed for data.
Transmission Speed
Decreases the time required to transmit data over networks.
Cost Savings
Lowers costs associated with data storage and transmission.
Performance Enhancement
Improves system performance by reducing load times.
Coding Requirements
Images need more storage space than text.
Audio and video files need even more storage and also require higher data transfer
rates.
Multimedia files, such as images, audio, and video, can consume large amounts of
storage space.
Efficient coding and compression techniques are essential to reduce the file sizes,
making it easier to store more files on devices with limited storage capacity.
Compressing multimedia files is subject to quality constraints: after decompression, the
result should be as close to the original as possible.
Multimedia coding requirements focus on balancing storage, transfer rates, and quality,
with specific considerations for different application modes.
Hybrid Coding
Hybrid Coding combines two or more coding techniques to improve the efficiency and
effectiveness of data compression.
The goal is to exploit the strengths of different coding methods to achieve better compression
ratios and performance.
Examples of Hybrid Coding:
JPEG Compression
Combines discrete cosine transform (DCT) for spatial domain transformation with Huffman coding for
entropy coding.
MP3 Compression
Uses a combination of perceptual coding, MDCT (Modified Discrete Cosine Transform), and Huffman
coding.
Lossless Compression
A method of data compression where the original data can be perfectly reconstructed
from the compressed data.
Examples: ZIP files, PNG images, FLAC audio.
The most common models based on the lossless technique are:
RLE (Run Length Encoding)
Dictionary Coder (LZ77, LZ78, LZR, LZW, LZSS, LZMA, LZMA2)
Prediction by Partial Matching (PPM)
Deflate
Content Mixing
Huffman Encoding
Adaptive Huffman Coding
Shannon Fano Encoding
Arithmetic Encoding
Lempel Ziv Welch Encoding
ZStandard
Bzip2 (Burrows and Wheeler)
Run Length Encoding (RLE)
Initialization:
Start with the first character and set its count to 1.
Processing:
Move to the next character; if it matches the previous one, increment the count.
If it is different, write down the previous character and its count, then reset the count to 1.
Completion:
After reaching the end of the string, write down the last character and its count.
Example: RLE
Syntax: X!Y
where:
X - the repeating character
! - the marking flag
Y - the total count
(i.e., character, flag, count)
Suppose a string: AAAAABBBCCDAA
RLE: A!5B!3C!2D!1A!2
RLE
Advantages
Simple and efficient for data with long consecutive sequences; lossless
compression.
Disadvantages
Ineffective for data lacking long runs; limited applicability beyond specific
repetitive patterns.
That is, if the input contains non-repeating sequences, the output becomes larger than
the input, so RLE is ineffective for short or non-repetitive sequences.
Huffman Encoding
1.Frequency Calculation:
Count the occurrences of each character in your input data.
2.Priority Queue Initialization:
Create a priority queue (min-heap) based on the frequencies of the characters.
3.Construct Huffman Tree:
Combine nodes until only one node remains in the priority queue, using the two nodes with the lowest
frequencies each time.
4.Assign Codes:
Traverse the Huffman tree to assign binary codes (0 and 1) to each character. Characters that appear more
frequently will get shorter codes.
5.Encoding:
Replace each character in the original data with its corresponding Huffman code to create the
compressed data and print output.
Huffman Coding
Example code assignment from the resulting Huffman tree (tree figure omitted):
C= 111
F= 110
B= 10
A=01
E=00
Huffman Coding
Class work
1. Perform Huffman coding on the string: AAAABBBCCDDDEEFFGGGHHHH.
2. Perform Huffman coding on the string: XXYYYZZZZZZZZZZWWWWWWWW.
3. Perform Huffman coding on the string:
MMMNNNOOOOPPPPPPQQQQQQQQ.
4. Perform Huffman coding on the string: RRRRRSSSSSTTTTTUUUUUVVVV.
5. Perform Huffman coding on the string:
KKKKLLLLMMMMMMNNNNNNNOOOO.
Arithmetic Encoding
Unlike Huffman coding, which uses a separate codeword for each
character, arithmetic coding yields a single codeword for each encoded
string of characters.
The first step is to divide the numeric range from 0 to 1 into segments, one for
each character present in the message to be sent (including the
termination character), with the size of each segment determined by the
probability of the related character.
Example: Arithmetic Encoding
Encode the message NEPAL using arithmetic encoding.
Suppose probability of each character N=E=P=A=L=0.2.
Initialization: start with the interval [0, 1).
Encoding 'n':
Divide the interval [0, 1) into sub-intervals based on the probabilities of each character.
Interval for 'n': [0, 0.2)
Interval for 'e': [0.2, 0.4)
Interval for 'p': [0.4, 0.6)
Interval for 'a': [0.6, 0.8)
Interval for 'l': [0.8, 1)
The interval for 'n' is [0, 0.2).
(Contd.)
Encoding 'e' in the interval [0, 0.2) gives the sub-interval [0.04, 0.08).
Encoding 'p' in the interval [0.04, 0.08):
Divide the interval [0.04, 0.08) into sub-intervals based on the probabilities.
Interval for 'n': [0.04, 0.048)
Interval for 'e': [0.048, 0.056)
Interval for 'p': [0.056, 0.064)
Interval for 'a': [0.064, 0.072)
Interval for 'l': [0.072, 0.08)
The interval for 'p' is [0.056, 0.064).
Encoding 'a' in the interval [0.056, 0.064):
Divide the interval [0.056, 0.064) into sub-intervals based on the probabilities.
Interval for 'n': [0.056, 0.0576)
Interval for 'e': [0.0576, 0.0592)
Interval for 'p': [0.0592, 0.0608)
Interval for 'a': [0.0608, 0.0624)
Interval for 'l': [0.0624, 0.064)
The interval for 'a' is [0.0608, 0.0624).
Lossless vs. Lossy Compression (Contd.)
Data Integrity: lossless compression retains all original data, allowing for perfect
reconstruction; lossy compression sacrifices some data for size reduction, resulting
in a close approximation.
The Discrete Cosine Transformation (DCT) is applied to
each block separately.
The output of each DCT is an
8*8 matrix of DCT coefficients.
DCT element (0,0) is the
average value of the block.
The other elements tell how
much spectral power is
present at each spatial
frequency.
3. Quantization
Each DCT coefficient is divided by the corresponding entry of a quantization table and rounded; this is the lossy step of JPEG.
Entropy Encoding:
Quantization is followed by Entropy Encoding (using Huffman Encoding only).
DC coefficients are encoded by subtracting the DC coefficients of the previous unit.
Huffman coding is preferred as it is free of patent restrictions (unlike arithmetic coding).
Coding tables for each DC and AC coefficient must be provided.
AC coefficients are processed using a zigzag sequence.
DC Coefficients
These represent the average brightness or intensity level of a block of pixels after applying
the Discrete Cosine Transform (DCT).
AC Coefficients
These represent the variations or high-frequency components in the block of pixels after
DCT.
(Contd.)
2. Processing step: the first step that makes use of actual compression algorithms.
Interframe coding: Determining motion vectors for each 8x8 pixel block.
3. Quantization in Picture Processing
This can also be considered equivalent to the µ-law and A-law, which are used for audio data [JN84].
Results can be treated differently based on their importance (e.g., quantized with different numbers of bits).
Major Steps of Data Compression (Contd.)
4. Entropy Coding
For example, long sequences of zeroes can be compressed by specifying the number of occurrences followed by
the zero.
After four compression steps, digital data is converted into a formatted data stream, which may include the image starting
point and interpretation type, and an error correction code may be added.
Decompression is the inverse process of compression. Specific coders and decoders can be implemented very differently.
JPEG Compression and Decompression Process
(Encoding/decoding pipeline figure omitted.)
Video compression – MPEG
The Moving Picture Experts Group (MPEG) method is used to compress video.
In principle, a motion picture is a rapid sequence of a set of frames in which each frame is a
picture.
In other words, a frame is a spatial combination of pixels, and a video is a temporal
combination of frames that are sent one after another.
Compressing video, then, means spatially compressing each frame and temporally
compressing a set of frames.
Video compression – MPEG Encoding
MPEG
MPEG (which stands for Moving Picture Experts Group) is a joint ISO and CCITT working group that develops standards for
compressing video and its associated audio.
MPEG uses technology defined in other standards, such as JPEG and H.261.
MPEG was formed by the ISO to formulate a set of standards relating to a range of multimedia applications that involve the
use of video with sound.
MPEG can deliver a data rate of at most 1,856,000 bits/second, which should not be exceeded.
Data rates for audio are between 32 and 448 Kbits/second; this data rate enables video and audio compression of
acceptable quality.
The MPEG standard builds on two other standards, JPEG and H.261:
JPEG: used as the basis for coding the individual pictures (intra-frame coding).
H.261:
H.261 video compression standard has been defined by the ITU-T for the provision of video telephony and
videoconferencing services over an Integrated Services Digital Network (ISDN).
It specifies two resolution formats with an aspect ratio of 4:3. The Common Intermediate Format
(CIF) defines a luminance component of 288 lines, each with 352 pixels. The chrominance components have
a resolution of 144 lines and 176 pixels per line.
Quarter CIF (QCIF) has exactly half the CIF resolution, i.e., 176*144 pixels for the luminance and 88*72 pixels
for the other components.
Video compression – MPEG Encoding
Basics of Video Compression:
Images contain highly redundant data, i.e., adjacent pixels are highly correlated.
Temporal correlation: adjacent frames are similar, and changes are due to object motion.
Predict a new frame from the previous frame and specify the prediction error.
The prediction error can be coded using image coding methods (e.g., JPEG).
Prediction from past frames is known as forward prediction.
The prediction error can be coded with fewer bits; regions that cannot be predicted well are coded directly.
Video compression – MPEG Encoding
Basics of Video Compression:
The previous or next frame is known as the reference image.
Video compression makes use of motion compensation to predict a frame from the previous and/or next
frame.
Macroblocks: compression works on blocks of 16*16 pixels called macroblocks.
Video compression – MPEG Encoding
MPEG video uses video compression algorithms called Motion-Compensated Discrete Cosine Transform algorithms.
Frequency Domain Decomposition: It uses DCT to decompose spatial blocks of image data to exploit statistical and
perceptual spatial redundancy.
Variable-length Coding: It exploits the statistical redundancy in the symbol sequence resulting from quantization as well
as in various types of side information.
As far as audio compression is concerned, the time-varying audio input signal is first sampled and quantized using PCM, with the
sampling rate and number of bits per sample determined by the specific application.
The bandwidth that is available for transmission is divided into a number of frequency sub-bands using a bank of analysis
filters which, because of their role, are also known as critical-band filters.
Video compression – MPEG Encoding
Video Encoding
Video is simply a sequence of digitized pictures. The video that MPEG expects to process is composed of a sequence of
frames or fields of luma and chroma.
Image Preparation
An image must consist of three components: the luminance Y and the two chrominance components Cb and Cr (similar to the YUV format).
The luminance component has twice as many samples in the horizontal and vertical axes as the other components, i.e.
there is color subsampling.
The resolution of the luminance component should not exceed 768×576 pixels. The pixel depth is eight bits in each
component.
MPEG provides 14 different aspect ratios, which are coded in the data stream. Some of them are: a square pixel (1:1), 16:9 for 625-
line images (European HDTV), and 4:3 for 702 X 575 pixel images.
The image refresh frequency is also encoded in the data stream. So far, eight frequencies have been defined (23.976 Hz,
24 Hz, 25 Hz, 29.97 Hz, 30 Hz, 50 Hz, 59.94 Hz, and 60 Hz), so no low image refresh frequencies are permitted.
Video compression – MPEG Encoding
Temporal Prediction
The technique that is used to exploit the high correlation between successive frames is to predict the content of many of
the frames.
Instead of sending the source video as a set of individually-compressed frames, just a selection is sent in this form and, for
the remaining frames, only the differences between the actual frame contents and the predicted frame contents are sent.
This operation is known as motion estimation, and since the moving segments may have shifted slightly, additional information
is sent to indicate any small differences between the predicted and actual positions of the moving segments involved. The latter
is known as motion compensation.
Video compression – MPEG Encoding
Types of Image Frames in MPEG
There are two basic types of compressed frame: those that are encoded independently and those that are predicted. The
first are known as intra-coded frames or I-frames.
In practice, there are two types of predicted frames: predictive (P-) frames and bidirectional (B-) frames; because of
the way they are derived, the latter are also known as inter-coded or interpolation frames.
Each frame is treated as a separate (digitized) picture and the Y, Cb , and Cr matrices are encoded independently using
the JPEG algorithm.
I - frames must be present in the output stream at regular intervals in order to allow for the possibility of the contents of an
encoded I - frame being corrupted during transmission.
The number of frames/pictures between successive I-frames is known as a group of pictures or GOP.
Video compression – MPEG Encoding
P-frames (Predictive-Coded Frames):
Not Independent
In practice, the number of P-frames between each successive pair of I-frames is limited since any errors present in the first P-
frame will be propagated to the next.
The number of frames between a P-frame and the immediately preceding I- or P-frame is called the prediction span. It is
given the symbol M and typical values range from 1 through 3.
P-frames consist of I-frame macroblocks and six predictive macroblocks.
The coder must determine whether a macroblock should be coded predictively or as a macroblock of an I-frame and,
furthermore, whether a motion vector must be encoded.
A P-frame can contain macro blocks that are encoded using the same technique as I-frames.
Video compression – MPEG Encoding
B-frames (Bidirectionally Predictive-Coded Frames)
Due to unexpected movement in real scenes, a region may not have a good match in the previous frame.
So B-frames are coded with reference to both previous and future reference frames (either I or P).
Motion estimation involves comparing small segments of two consecutive frames for differences; a search is carried out
to determine to which neighboring segment the original segment has moved.
To minimize the time for each search, the search region is limited to just a few neighboring segments.
It is possible for a segment to have moved outside the search region. To allow for this possibility, in applications such as
movies, a second type of frame, the B-frame, is used in addition to P-frames.
A B-frame is defined as the difference from a prediction based on a previous and a following I- or P-frame.
It cannot, however, ever serve as a reference for the predictive coding of other pictures.
Video compression – MPEG Encoding
B-frames (Bidirectionally Predictive-Coded Pictures)
The contents of B-frames are predicted using search regions in both past and future frames.
In addition to allowing for occasional fast-moving objects, this also provides better motion estimation when, for example, an
object moves in front of or behind another object.
B-frames provide the highest level of compression and, because they are not involved in the coding of other frames, they
do not propagate errors.
D-frames are inserted at regular intervals throughout the stream. These are highly compressed frames and are ignored
during the decoding of P- and B-frames.
As in the JPEG compression algorithm, the DC coefficient associated with each 8*8 block of pixels (both for the luminance
and the two chrominance signals) is the mean of all the values in the related block.
Video compression – MPEG Encoding
D-frames (DC-Coded Frames):
Hence, by using only the encoded DC coefficients of each block of pixels in the periodically inserted D-frames, a low-resolution
sequence of frames is provided, each of which can be decoded at the higher speeds expected with rewind
and fast-forward operations.
Quantization
Concerning quantization, it should be noted that the AC coefficients of B- and P-frames are usually very large values, whereas
those of I-frames are very small.
Thus, MPEG quantization adjusts itself accordingly: if the data rate increases too much, quantization becomes
coarser; if the data rate falls, quantization is performed with finer granularity.
Video compression – MPEG Encoding
MPEG Compression Process Overview
Sampling:
Video and audio signals are sampled at regular intervals and converted into digital form, dividing the video into frames and then
into smaller blocks of pixels.
Transformation:
The Discrete Cosine Transform (DCT) is applied for video, and the Modified Discrete Cosine Transform (MDCT) for audio.
Quantization:
Video and audio signals are quantized to reduce data precision and size.
Entropy Coding:
Video and audio data are compressed using entropy coding techniques.
Bitstream Formatting:
The compressed video and audio streams are multiplexed into a single MPEG bitstream.
MPEG Audio Coding
MPEG audio coding is compatible with Compact Disc Digital Audio (CD-DA) and Digital Audio Tape (DAT) audio data
coding.
An important criterion is the choice of sample rate: 44.1 kHz or 48 kHz (additionally 32 kHz), at 16 bits per sample value.
Three quality levels (layers) are defined, with varying encoding and decoding complexity.
A higher-layer implementation must be able to decode MPEG audio signals of lower layers.
A Fast Fourier Transform (FFT) is applied to transform the audio into the frequency domain.
The noise level in each sub-band is determined using a psychoacoustic model and used to control quantization.
Audio Compression
Audio compression can be used for speech or music. For speech, we need
to compress a 64-kbps digitized signal, while for music, we need to compress
a 1.411-Mbps signal.
Two categories of techniques are used for audio compression:
predictive encoding and perceptual encoding.
Thank you!