
DATA COMPRESSION (RCS 087)

Unit-I
Introduction
• Data Compression
• Compression Techniques
• Lossless compression
• Lossy Compression
• Measures of performance
• Modeling and coding
• Mathematical Preliminaries for Lossless compression
• A brief introduction to information theory
• Models: Physical models
• Probability models
• Markov models
• Composite source model
• Coding: uniquely decodable codes
• Prefix codes
Data Compression

• Data compression is also referred to as bit-rate reduction or source coding. This technique is used to reduce the size of large files.
• Data compression (DC) is a digital signal processing technique in which the data to be transmitted is compressed to reduce the number of bits it occupies in storage. In other words, data takes up less storage space than usual after compression is applied. Data compression greatly reduces the required storage space and transmission capacity. Database management systems, backup utilities, and similar software use data compression widely. There are many file compression formats, but ZIP and ARC are among the best known.
Data Compression
• Data compression is the process of modifying, encoding, or converting the bit structure of data in such a way that it consumes less space on disk.
• It reduces the storage size of one or more data instances or elements. Data compression is also known as source coding or bit-rate reduction.
• The advantage of data compression is that it saves disk space and reduces data transmission time.
Data Compression
Compression Techniques

There are mainly two types of data compression techniques –


1.Lossless Data Compression
2.Lossy Data Compression
Compression Techniques
Lossless data compression

• Lossless data compression is used to compress files without losing the original file's quality or data. Simply put, in lossless data compression the file size is reduced, but the quality of the data remains the same.
• The main advantage of lossless data compression is that we can restore the original data in its original form after decompression.
• Lossless data compression is mainly used for sensitive documents and confidential information, and in file formats such as PNG, RAW, GIF, and BMP:
• GIF (Graphics Interchange Format)
• Bitmap 
Lossless data compression
Some of the most important lossless data compression techniques are:
1.Run Length Encoding (RLE)
2.Lempel Ziv - Welch (LZW)
3.Huffman Coding
4.Arithmetic Coding
Lossy data compression

• Lossy data compression is used to compress larger files into smaller ones. In this compression technique, a certain amount of data and quality is removed (lost) from the original file. The compressed file takes up less memory than the original because some of the original data and quality is discarded. This technique is generally useful when the quality of the data is not our first priority.
• Lossy data compression is most widely used in JPEG images, MPEG video, and MP3 audio formats.
Lossy data compression
Some important lossy data compression techniques are:
1.Transform coding
2.Discrete Cosine Transform (DCT)
3.Discrete Wavelet Transform (DWT)
Difference between lossless and lossy data compression

• As we know, both lossless and lossy data compression techniques are used to reduce data from its original size. The main difference is that losslessly compressed data can be restored to its original form after decompression, whereas lossy-compressed data cannot.
• The table below shows the differences between lossless and lossy data compression:
Difference between lossless and lossy data compression

S.No | Lossless data compression | Lossy data compression
1. | There is no loss of any data or quality. | There is a loss of quality and data, which cannot be recovered.
2. | The file is restored in its original form. | The file is not restored in its original form.
3. | Algorithms: Run Length Encoding, Huffman encoding, Shannon-Fano encoding, Arithmetic encoding, Lempel-Ziv-Welch encoding, etc. | Algorithms: Transform coding, Discrete Cosine Transform, Discrete Wavelet Transform, fractal compression, etc.
4. | Mainly used to compress text, sound, and images. | Mainly used to compress audio, video, and images.
5. | Compared with lossy compression, it retains more of the data. | Compared with lossless compression, it retains less of the data.
6. | File quality is high. | File quality is lower.
7. | Mainly supports RAW, BMP, PNG, WAV, FLAC, and ALAC file types. | Mainly supports JPEG, GIF, MP3, MP4, MKV, and OGG file types.
Run Length Encoding (RLE)

• Run-length encoding (RLE) is a form of lossless data compression in which runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. It is most efficient on data that contains many such runs, for example simple graphic images such as icons, line drawings, Conway's Game of Life, and animations. For files that do not have many runs, RLE can actually increase the file size.
Example

• Consider a screen containing plain black text on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the text. A hypothetical scan line, with B representing a black pixel and W representing white, might read as follows:
• WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW
Example
• With a run-length encoding (RLE) data compression algorithm applied
to the above hypothetical scan line, it can be rendered as follows:
• 12W1B12W3B24W1B14W
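
For illustration, here is a minimal Python sketch of a run-length encoder in the count-then-symbol style used above (the function name rle_encode is ours, not from the slides); it reproduces the encoded scan line exactly:

    def rle_encode(data: str) -> str:
        # Encode runs as <count><symbol>, e.g. "WWWB" -> "3W1B".
        if not data:
            return ""
        out = []
        run_char, run_len = data[0], 1
        for ch in data[1:]:
            if ch == run_char:
                run_len += 1                        # extend the current run
            else:
                out.append(f"{run_len}{run_char}")  # close the finished run
                run_char, run_len = ch, 1
        out.append(f"{run_len}{run_char}")          # close the last run
        return "".join(out)

    line = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
    print(rle_encode(line))  # prints: 12W1B12W3B24W1B14W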
Huffman Coding
• Huffman coding is a lossless data compression algorithm. The idea is
to assign variable-length codes to input characters, lengths of the
assigned codes are based on the frequencies of corresponding
characters. The most frequent character gets the smallest code and the
least frequent character gets the largest code.
The variable-length codes assigned to input characters are Prefix Codes, meaning the codes (bit sequences) are assigned in such a way that the code assigned to one character is never a prefix of the code assigned to any other character. This is how Huffman Coding makes sure that there is no ambiguity when decoding the generated bitstream.
Huffman Coding
• Let us understand prefix codes with a counterexample. Let there be
four characters a, b, c and d, and their corresponding variable length
codes be 00, 01, 0 and 1. This coding leads to ambiguity because code
assigned to c is the prefix of codes assigned to a and b. If the
compressed bit stream is 0001, the de-compressed output may be
“cccd” or “ccb” or “acd” or “ab”.
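
To make the ambiguity concrete, here is a tiny brute-force sketch (ours, not from the slides) that enumerates every possible parse of the bitstream 0001 under these codes; it finds the decodings listed above plus a fifth one ("cad"):

    def parses(bits, codes, prefix=()):
        # Yield every way to split `bits` into a sequence of code words.
        if not bits:
            yield prefix
            return
        for sym, code in codes.items():
            if bits.startswith(code):
                yield from parses(bits[len(code):], codes, prefix + (sym,))

    codes = {"a": "00", "b": "01", "c": "0", "d": "1"}
    print(sorted("".join(p) for p in parses("0001", codes)))
    # ['ab', 'acd', 'cad', 'ccb', 'cccd'] -- five decodings, so the code is ambiguous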
There are mainly two major parts in Huffman Coding
1.Build a Huffman Tree from input characters.
2.Traverse the Huffman Tree and assign codes to characters.
Steps to build Huffman Tree
• Input is an array of unique characters along with their frequencies of occurrence; output is the corresponding Huffman Tree.
1.Create a leaf node for each unique character and build a min heap of all leaf nodes. (A min heap is used as a priority queue; the value of the frequency field is used to compare two nodes. Initially, the least frequent character is at the root.)
2.Extract the two nodes with the minimum frequency from the min heap.
3.Create a new internal node with a frequency equal to the sum of the two nodes' frequencies. Make the first extracted node its left child and the other extracted node its right child. Add this new node to the min heap.
4.Repeat steps #2 and #3 until the heap contains only one node. The remaining node is the root node and the tree is complete.
Huffman Coding
• Let us understand the algorithm with an example:
• character Frequency
• a 5
• b 9
• c 12
• d 13
• e 16
• f 45
Huffman Coding
• Step 1: Build a min heap that contains 6 nodes, where each node represents the root of a tree with a single node.
• Step 2: Extract the two minimum-frequency nodes (a and b) from the min heap. Add a new internal node with frequency 5 + 9 = 14.
Huffman Coding
• Now the min heap contains 5 nodes, where 4 nodes are roots of trees with a single element each, and one heap node is the root of a tree with 3 elements.

• character Frequency
• c 12
• d 13
• Internal Node 14
• e 16
• f 45
• Step 3: Extract the two minimum-frequency nodes from the heap. Add a new internal node with frequency 12 + 13 = 25.
• Now the min heap contains 4 nodes, where 2 nodes are roots of trees with a single element each, and two heap nodes are roots of trees with more than one node.
• character Frequency
• Internal Node 14
• e 16
• Internal Node 25
• f 45
• Step 4: Extract the two minimum-frequency nodes. Add a new internal node with frequency 14 + 16 = 30.
• Now the min heap contains 3 nodes.
• character Frequency
• Internal Node 25
• Internal Node 30
• f 45
• Step 5: Extract the two minimum-frequency nodes. Add a new internal node with frequency 25 + 30 = 55.
• Now the min heap contains 2 nodes.
• character Frequency
• f 45
• Internal Node 55
• Step 6: Extract the two minimum-frequency nodes. Add a new internal node with frequency 45 + 55 = 100.
• Now the min heap contains only one node.
• character Frequency
• Internal Node 100
• Since the heap contains only one node, the algorithm stops here.
Steps to print codes from Huffman Tree:
Traverse the tree starting from the root, maintaining an auxiliary array. While moving to the left child, write 0 to the array; while moving to the right child, write 1. Print the array whenever a leaf node is encountered.
• The codes are as follows:
• character code-word
• f 0
• c 100
• d 101
• a 1100
• b 1101
• e 111
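
As an illustration, here is a minimal Python sketch (ours, not part of the original slides) of the whole procedure: build the tree with a min heap, then walk it to print the codes. With this particular tie-breaking it reproduces the code table above; other valid tie-breaking choices yield different but equally optimal codes.

    import heapq

    def huffman_codes(freqs):
        # Min heap of (frequency, tiebreak, tree) entries; a tree is either
        # a bare symbol (leaf) or a (left, right) pair (internal node).
        heap = [(f, i, sym) for i, (sym, f) in enumerate(sorted(freqs.items()))]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)    # minimum-frequency node
            f2, _, right = heapq.heappop(heap)   # next minimum-frequency node
            heapq.heappush(heap, (f1 + f2, count, (left, right)))
            count += 1                           # new internal node, freq f1 + f2
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):
                walk(node[0], prefix + "0")      # left edge: write 0
                walk(node[1], prefix + "1")      # right edge: write 1
            else:
                codes[node] = prefix or "0"      # leaf: record the code word
        walk(heap[0][2], "")
        return codes

    print(huffman_codes({"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}))
    # {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}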
LZW (Lempel–Ziv–Welch)
• There are two categories of compression techniques, lossy and
lossless. Whilst each uses different techniques to compress files, both
have the same aim: To look for duplicate data in the graphic (GIF for
LZW) and use a much more compact data representation. Lossless
compression reduces bits by identifying and eliminating statistical
redundancy. No information is lost in lossless compression. On the
other hand, Lossy compression reduces bits by removing unnecessary
or less important information. So we need Data Compression mainly
because:
LZW (Lempel–Ziv–Welch)
• Uncompressed data can take up a lot of space, which is not good for limited hard drive space and internet download speeds.
• While hardware gets better and cheaper, algorithms that reduce data size also help technology evolve.
• Example: one minute of uncompressed HD video can be over 1 GB. How can we fit a two-hour film on a 25 GB Blu-ray disc?
• Lossy compression methods include DCT (Discrete Cosine Transform), Vector Quantisation, and Transform Coding, while lossless compression methods include RLE (Run Length Encoding), string-table compression, LZW (Lempel–Ziv–Welch), and zlib. Several compression algorithms exist, but here we are concentrating on LZW.
LZW (Lempel–Ziv–Welch)
• The LZW algorithm is a very common compression technique. It is typically used in GIF, optionally in PDF and TIFF, and in Unix's 'compress' command, among other uses. It is lossless, meaning no data is lost when compressing. The algorithm is simple to implement and has the potential for very high throughput in hardware implementations. The idea relies on recurring patterns to save data space. LZW is a foremost technique for general-purpose data compression due to its simplicity and versatility. It is the basis of many PC utilities that claim to "double the capacity of your hard drive".
LZW (Lempel–Ziv–Welch) how to work
• LZW compression works by reading a sequence of symbols, grouping the symbols into strings, and converting the strings into codes. Because the codes take up less space than the strings they replace, we get compression. Characteristic features of LZW include:
• LZW compression uses a code table, with 4096 as a common choice for the
number of table entries. Codes 0-255 in the code table are always assigned to
represent single bytes from the input file.
• When encoding begins the code table contains only the first 256 entries, with
the remainder of the table being blanks. Compression is achieved by using
codes 256 through 4095 to represent sequences of bytes.
• As the encoding continues, LZW identifies repeated sequences in the data,
and adds them to the code table.
• Decoding is achieved by taking each code from the compressed file and
translating it through the code table to find what character or characters it
represents.
• Example: ASCII code. Typically, every character is stored with 8 binary bits, allowing up to 256 unique symbols for the data. This algorithm tries to extend the library to 9- to 12-bit codes. The new unique symbols are made up of combinations of symbols that occurred previously in the string. It does not always compress well, especially with short, diverse strings, but it is good for compressing redundant data, and it does not have to save the new dictionary with the data: this method can both compress and uncompress data.
Excellent articles on LZW have already been written; for a more in-depth treatment, Mark Nelson's article is commendable.
• Implementation
• The idea of the compression algorithm is the following: as the input
data is being processed, a dictionary keeps a correspondence between
the longest encountered words and a list of code values. The words are
replaced by their corresponding codes and so the input file is
compressed. Therefore, the efficiency of the algorithm increases as the
number of long, repetitive words in the input data increases.
LZW ENCODING
• PSEUDOCODE
• Initialize the table with single-character strings
• P = first input character
• WHILE not end of input stream
•     C = next input character
•     IF P + C is in the string table
•         P = P + C
•     ELSE
•         output the code for P
•         add P + C to the string table
•         P = C
• END WHILE
• output the code for P
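
Here is a minimal runnable Python sketch of this encoder (our illustration; the function name lzw_encode and the 256-entry initial table are assumptions matching the description above):

    def lzw_encode(data: str) -> list:
        # Table starts with all single-byte strings (codes 0-255).
        table = {chr(i): i for i in range(256)}
        next_code = 256
        p = ""
        out = []
        for c in data:
            if p + c in table:
                p = p + c                  # keep extending the current match
            else:
                out.append(table[p])       # emit code for the longest known string
                table[p + c] = next_code   # learn the new string P + C
                next_code += 1
                p = c
        if p:
            out.append(table[p])           # flush the final match
        return out

    print(lzw_encode("^WED^WE^WEE^WEB^WET"))
    # [94, 87, 69, 68, 256, 69, 260, 261, 257, 66, 260, 84]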
Problems:
• The LZW algorithm is a very common compression technique.
• Suppose we want to encode the Concise Oxford English Dictionary, which contains about 159,000 entries. Why not just transmit each word as an 18-bit number (2^18 = 262,144, enough for every entry)?
• Problems:
• too many bits per word,
• everyone needs a copy of the dictionary,
• it only works for English text.
• Solution: find a way to build the dictionary adaptively.
• The original methods are due to Ziv and Lempel in 1977 and 1978; Terry Welch improved the scheme in 1984 (the result is called LZW compression).
• It is used in UNIX compress -- a 1D token stream (similar to below).
• It is used in GIF compression -- 2D window tokens (treating the image as with Huffman Coding above).
The LZW Compression Algorithm can be summarised as follows:
• w = NIL;
• while ( read a character k )
• {
•     if wk exists in the dictionary
•         w = wk;
•     else
•     {
•         output the code for w;
•         add wk to the dictionary;
•         w = k;
•     }
• }
• output the code for w;
• The original LZW used a dictionary with 4K entries; the first 256 (0-255) are the ASCII codes.
Example:
• Input string is "^WED^WE^WEE^WEB^WET".

      w     k     output  index  symbol
      ---------------------------------
      NIL   ^
      ^     W     ^       256    ^W
      W     E     W       257    WE
      E     D     E       258    ED
      D     ^     D       259    D^
      ^     W
      ^W    E     256     260    ^WE
      E     ^     E       261    E^
      ^     W
      ^W    E
      ^WE   E     260     262    ^WEE
      E     ^
      E^    W     261     263    E^W
      W     E
      WE    B     257     264    WEB
      B     ^     B       265    B^
      ^     W
      ^W    E
      ^WE   T     260     266    ^WET
      T     EOF   T

(The rows with an empty output column are steps where wk was already in the dictionary, so the encoder only extended w.)
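
The decoder rebuilds the same dictionary while reading codes, one step behind the encoder. A minimal Python sketch (ours; lzw_decode is an assumed name, paired with the lzw_encode sketch above):

    def lzw_decode(codes: list) -> str:
        # Start from the same 256 single-byte entries as the encoder.
        table = {i: chr(i) for i in range(256)}
        next_code = 256
        prev = table[codes[0]]
        out = [prev]
        for code in codes[1:]:
            if code in table:
                entry = table[code]
            else:
                entry = prev + prev[0]          # code not yet in table: the cScSc corner case
            out.append(entry)
            table[next_code] = prev + entry[0]  # learn what the encoder learned
            next_code += 1
            prev = entry
        return "".join(out)

    print(lzw_decode([94, 87, 69, 68, 256, 69, 260, 261, 257, 66, 260, 84]))
    # ^WED^WE^WEE^WEB^WET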
