
The document discusses lossless compression techniques such as Huffman coding. It defines entropy as a measure of uncertainty in a data source, and explains how entropy is calculated based on the probabilities of symbols in the source. Shannon's coding theorem establishes that the minimum average code length cannot be less than the source entropy. Huffman coding assigns variable-length codes to symbols based on their probabilities, with shorter codes for more probable symbols.


Lossless Compression:

Huffman Coding
Mikita Gandhi
Assistant Professor
ADIT
Application
In some applications, such as satellite
image analysis, medical and business
document archival, and medical images
for diagnosis, loss may not be tolerable
and lossless compression techniques
are to be used.
Compression techniques
Some of the popular lossless image
compression techniques in use are:
(a) Huffman coding,
(b) Arithmetic coding,
(c) Ziv-Lempel coding,
(d) Bit-plane coding,
(e) Run-length coding etc.
Source entropy- a measure of
information content
Generation of information is generally
modeled as a random process that has a
probability associated with it.
If P(E) is the probability of an event, its
information content I(E), also known as
self-information, is measured as

I(E) = log(1/P(E)) = -log P(E)

Source entropy- a measure of
information content
If P(E) = 1, that is, the event always
occurs (like saying "The sun rises in the
east"), then we obtain from the above that
I(E) = 0, which means that no information
is associated with it.
The base of the logarithm expresses the
unit of information: if the base is 2, the
unit is bits. For other values m of the base,
the information is expressed in m-ary
units. Unless otherwise mentioned, we
shall be using the base-2 system to
measure information content.
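
As a quick illustration of the definition above (not part of the original slides), the following Python sketch evaluates the self-information I(E) = -log2 P(E) for a few probabilities; a certain event (P(E) = 1) carries zero information:

    import math

    def self_information(p):
        """Self-information of an event with probability p, in bits (base-2 logarithm)."""
        return -math.log2(p)

    for p in (1.0, 0.5, 0.25, 0.125):
        print(f"P(E) = {p:5.3f}  ->  I(E) = {self_information(p):.3f} bits")
    # P(E) = 1.000 gives 0.000 bits; halving the probability adds one more bit each time.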

Source entropy- a measure of
information content
Now, suppose that we have an alphabet of n
symbols {a_i : i = 1, 2, ..., n} having
probabilities of occurrence P(a_1), P(a_2), ...,
P(a_n). If k is the number of source outputs
generated, which is considered to be
sufficiently large, then the average number of
occurrences of the symbol a_i is k P(a_i), and
the average self-information obtained from the
k outputs is given by

-k Σ_{i=1}^{n} P(a_i) log2 P(a_i)

and the average information per source output for
the source z is given by

H(z) = -Σ_{i=1}^{n} P(a_i) log2 P(a_i)

Source entropy- a measure of
information content
The above quantity is defined as the
entropy of the source and measures
the uncertainty of the source. The
relationship between uncertainty and
entropy can be illustrated by a simple
example of two symbols a_1 and a_2,
having probabilities P(a_1) and P(a_2)
respectively. Since the summation of
the probabilities is equal to 1,

P(a_1) + P(a_2) = 1

and, using the entropy equation above, we obtain

H(z) = -P(a_1) log2 P(a_1) - (1 - P(a_1)) log2 (1 - P(a_1))

Source entropy- a measure of
information content
If we plot H(z) versus P(a_1), we obtain
the graph shown on the slide (figure not
reproduced in this extract).

Source entropy- a measure of
information content
It is interesting to note that the entropy
is equal to zero for P(a_1) = 0 and
P(a_1) = 1. These correspond to the
cases where one of the two symbols is
certain to occur. H(z) assumes its
maximum value of 1 bit for P(a_1) = 1/2.
This corresponds to the most uncertain
case, where both symbols are equally
probable.
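
A short Python sketch (mine, not from the slides) that evaluates the binary entropy H(z) = -P(a_1) log2 P(a_1) - (1 - P(a_1)) log2 (1 - P(a_1)) at a few points, confirming the behaviour of the curve described above:

    import math

    def binary_entropy(p):
        """Entropy, in bits, of a two-symbol source with P(a1) = p and P(a2) = 1 - p."""
        if p in (0.0, 1.0):
            return 0.0          # the limit of x * log2(x) as x -> 0 is 0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
        print(f"P(a1) = {p:4.2f}  ->  H(z) = {binary_entropy(p):.4f} bits")
    # H(z) is 0 at P(a1) = 0 and P(a1) = 1, and peaks at 1.0000 bit for P(a1) = 0.5.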
Source entropy- a measure of
information content
Example: Measurement of source entropy
If the probabilities of the source symbols are
known, the source entropy can be measured. Say
we have five symbols a_1, a_2, a_3, a_4, a_5
having the probabilities listed on the slide (the
values appear in the slide figure and are not
reproduced in this extract); the source entropy is
then given by

H(z) = -Σ_{i=1}^{5} P(a_i) log2 P(a_i)

evaluated with those probabilities.
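
As an illustration of the calculation (not the slide's own numbers), the sketch below applies H(z) = -Σ P(a_i) log2 P(a_i) to an assumed five-symbol distribution:

    import math

    # Assumed probabilities for a_1 ... a_5 (illustrative only, not the slide's values).
    probs = {"a1": 0.30, "a2": 0.15, "a3": 0.10, "a4": 0.40, "a5": 0.05}

    def source_entropy(p_values):
        """H(z) = -sum of P(a_i) * log2 P(a_i), in bits per symbol."""
        return -sum(p * math.log2(p) for p in p_values if p > 0)

    print(f"H(z) = {source_entropy(probs.values()):.4f} bits/symbol")   # about 2.01 bits/symbol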

Shannon's Coding Theorem for
noiseless channels
We are now going to present a very important
theorem due to Shannon, which expresses the lower
limit of the average code word length of a source in
terms of its entropy. Stated formally, the theorem
says that in any coding scheme, the average
code word length of a source of symbols can at
best be equal to the source entropy and can never
be less than it. The theorem assumes the
coding to be lossless and the channel to be
noiseless.
If m(z) is the minimum of the average code word
lengths obtained over the different uniquely
decipherable coding schemes, then as per
Shannon's theorem, we can state that

m(z) ≥ H(z)

Coding efficiency
The coding efficiency (η) of an
encoding scheme is expressed as the
ratio of the source entropy H(z) to the
average code word length L(z) and is
given by

η = H(z) / L(z)

Since, according to Shannon's coding
theorem, L(z) ≥ H(z), and both L(z) and
H(z) are positive,

0 < η ≤ 1
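
A tiny numeric illustration with assumed values (not from the slides): for the hypothetical five-symbol source used above, the entropy is about 2.01 bits/symbol, and the binary Huffman code constructed for it in a later sketch has an average length of 2.05 bits/symbol, giving

    H_z = 2.01   # assumed source entropy, bits/symbol (illustrative)
    L_z = 2.05   # assumed average code word length, bits/symbol (illustrative)
    print(f"Coding efficiency = {H_z / L_z:.3f}")   # about 0.980, i.e. roughly 98%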

Basic principles of Huffman
Coding
Huffman coding is a popular lossless
Variable Length Coding (VLC) scheme,
based on the following principles:
(a) Shorter code words are assigned to
more probable symbols and longer code
words are assigned to less probable
symbols.
(b) No code word of a symbol is a prefix of
another code word. This makes Huffman
coding uniquely decodable.
(c) Every source symbol must have a
unique code word assigned to it.
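
Principle (b), the prefix property, is easy to check mechanically. The sketch below (not from the slides; the code table is hypothetical) verifies that no code word in a table is a prefix of another:

    # Hypothetical prefix-free code table (illustrative only).
    code_table = {"a1": "00", "a2": "01", "a3": "100", "a4": "101", "a5": "11"}

    def is_prefix_free(codes):
        """Return True if no code word is a prefix of another code word."""
        words = list(codes.values())
        for i, w in enumerate(words):
            for j, v in enumerate(words):
                if i != j and v.startswith(w):
                    return False
        return True

    print(is_prefix_free(code_table))   # True for the table above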

Basic principles of Huffman
Coding
In image compression systems,
Huffman coding is performed on the
quantized symbols. Quite often,
Huffman coding is used in conjunction
with other lossless coding schemes,
such as run-length coding. In terms of
Shannon's noiseless coding theorem,
Huffman coding is optimal for a fixed
alphabet size, subject to the constraint
that the source symbols are coded one
at a time.
Assigning Binary Huffman codes
to a set of symbols
We shall now discuss how Huffman
codes are assigned to a set of source
symbols of known probability. If the
probabilities are not known a priori,
they should be estimated from a
sufficiently large set of samples. The
code assignment is based on a series of
source reductions, and we shall
illustrate this with reference to the
example. The steps are illustrated on
the slides that follow.
Assigning Binary Huffman codes
to a set of symbols
(The step-by-step source reduction and the resulting code
assignment table appear as figures on these slides; their
content is not reproduced in this extract.)
This completes the Huffman code assignment
pertaining to this example. From the table, it is evident
that the shortest code word (length = 1) is assigned to the
most probable symbol a_4 and the longest code words
(length = 4) are assigned to the two least probable
symbols a_3 and a_5. Also, each symbol has a unique
code word and no code word is a prefix of the code word
of another symbol. The coding has therefore fulfilled the
basic requirements of Huffman coding.
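
Because the reduction tables themselves are not reproduced here, the sketch below is my own: it uses Python's heapq with assumed probabilities, chosen so that a_4 is the most probable symbol and a_3 and a_5 the least probable, and it produces a code with the same structure as the result described above (a 1-bit code for a_4 and 4-bit codes for a_3 and a_5):

    import heapq
    from itertools import count

    # Assumed probabilities (illustrative only; the slide's actual values are in its figures).
    probs = {"a1": 0.30, "a2": 0.15, "a3": 0.10, "a4": 0.40, "a5": 0.05}

    def huffman_codes(p):
        """Build a binary Huffman code by repeatedly merging the two least probable groups."""
        order = count()                          # tie-breaker so dicts are never compared
        heap = [(prob, next(order), {sym: ""}) for sym, prob in p.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            p1, _, group1 = heapq.heappop(heap)  # least probable group
            p2, _, group2 = heapq.heappop(heap)  # second least probable group
            merged = {s: "0" + c for s, c in group1.items()}
            merged.update({s: "1" + c for s, c in group2.items()})
            heapq.heappush(heap, (p1 + p2, next(order), merged))
        return heap[0][2]

    codes = huffman_codes(probs)
    for sym in sorted(codes):
        print(sym, codes[sym])
    avg_len = sum(probs[s] * len(c) for s, c in codes.items())
    print(f"Average code word length = {avg_len:.2f} bits/symbol")   # 2.05 for these probabilities

With these assumed probabilities the code words come out as a4 -> 0, a1 -> 10, a2 -> 110, a5 -> 1110, a3 -> 1111, which satisfies the prefix property and matches the length pattern noted above.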

Encoding a string of symbols
using Huffman codes
Decoding a Huffman coded bit
stream
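
The worked encoding and decoding examples on these two slides are given as figures. As a sketch of the procedure (mine, reusing the code table obtained from the assumed probabilities above), encoding simply concatenates code words, and decoding walks the bit stream, emitting a symbol each time a complete code word is matched; this is unambiguous because the code is prefix-free:

    # Code table from the assumed probabilities used earlier (not the slide's table).
    codes = {"a1": "10", "a2": "110", "a3": "1111", "a4": "0", "a5": "1110"}

    def encode(symbols, codes):
        """Concatenate the code words of the symbols into one bit string."""
        return "".join(codes[s] for s in symbols)

    def decode(bits, codes):
        """Scan the bit stream and emit a symbol whenever a complete code word is matched."""
        reverse = {c: s for s, c in codes.items()}
        symbols, current = [], ""
        for b in bits:
            current += b
            if current in reverse:
                symbols.append(reverse[current])
                current = ""
        return symbols

    message = ["a4", "a1", "a4", "a2", "a3"]
    bits = encode(message, codes)
    print(bits)                   # 01001101111  (0 | 10 | 0 | 110 | 1111)
    print(decode(bits, codes))    # ['a4', 'a1', 'a4', 'a2', 'a3']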
Questions?
1) Define the entropy of a source of
symbols.
2) How is entropy related to
uncertainty?
3) State Shannon's coding theorem on
noiseless channels.
4) Define the coding efficiency of an
encoding scheme.
5) State the basic principles of Huffman
coding.
Multiple Choice
The entropy of a source of symbols is
dependent upon
(A) The number of source outputs
generated.
(B) The average codeword length.
(C) The probabilities of the source
symbols.
(D) The order in which the source
outputs are generated.
Multiple Choice
We have two sources of symbols and wish to
compare their entropies. Source-1 has three
symbols a_1, a_2 and a_3 with a given set of
probabilities. Source-2 also has three symbols
a_1, a_2 and a_3, but with its own set of
probabilities. (The probability values are shown
in the slide figures and are not reproduced in
this extract.)
(A) Entropy of source-1 is higher than that of
source-2.
(B) Entropy of source-1 is lower than that of
source-2.
(C) Entropy of source-1 and source-2 are the
same.
(D) It is not possible to compute the entropies
from the given data.

Multiple Choice
Shannon's coding theorem on
noiseless channels provides us with
(A) A lower bound on the average
codeword length.
(B) An upper bound on the average
codeword length
(C) A lower bound on the source
entropy.
(D) An upper bound on the source
entropy.
Multiple Choice
Which one of the following is not true for
Huffman coding?
(A) No codeword of an elementary symbol
is a prefix of the codeword of another
elementary symbol.
(B) Each symbol has a one-to-one
mapping with its corresponding
codeword.
(C) The symbols are encoded as a group,
rather than encoding one symbol at a
time.
(D) Shorter code words are assigned to
more probable symbols.
Multiple Choice
Which of the following must be
ensured before assigning binary
Huffman codes to a set of symbols?
(A) The channel is noiseless.
(B) There must be exactly 2^n symbols
to encode.
(C) No two symbols should have the
same probability.
(D) The probabilities of the symbols
should be known a priori.
(Further multiple-choice questions, a practice problem,
and its solution appear on the remaining slides; their
content is not reproduced in this extract.)
