
Huffman Coding

Huffman coding is a method for compressing data that assigns variable-length codes to input characters. Code lengths depend on the characters' frequencies, so the most common characters get the shortest codes. The method builds a Huffman tree from the input characters and their frequencies, and the tree yields an optimal prefix-free code. The algorithm first creates a node for each unique character. It then repeatedly combines the two nodes with the lowest frequencies until a single root node remains. Each character's code is read along the path from the root to its leaf. The algorithm finds the optimal code in O(N log N) time without searching through all possible trees.

Uploaded by jitensoft
Copyright © All Rights Reserved

HUFFMAN CODING AND HUFFMAN TREE

Coding:

Reducing strings over an arbitrary alphabet $\Sigma_o$ to strings over a fixed alphabet $\Sigma_c$ standardizes machine operations ($|\Sigma_c| < |\Sigma_o|$).
Example: computers use a binary representation for both the operands and the operators in machine instructions.

It must be possible to uniquely decode a code-string (a string over $\Sigma_c$) into a source-string (a string over $\Sigma_o$).
Not every code-string needs to correspond to a source-string.

Both coding and decoding should be efficient.

Word: A finite non-empty string over an alphabet ($\Sigma_o$ or $\Sigma_c$).


Simple Coding Mechanism:

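$\text{code}(a_i)$ = a non-empty string over $\Sigma_c$, for $a_i \in \Sigma_o$.

$\text{code}(a_1 a_2 \cdots a_n) = \text{code}(a_1).\text{code}(a_2) \cdots \text{code}(a_n)$, which is the concatenation of the individual code-words.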
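The simple coding mechanism can be sketched in Python as follows. The sketch assumes that a code table is stored as a dict from source symbols to code-word strings. The names `encode` and `CODE1` are illustrative and do not come from the notes.

```python
def encode(message, code):
    """Return code(a1 a2 ... an) as the concatenation code(a1).code(a2)...code(an)."""
    return "".join(code[symbol] for symbol in message)


# Illustrative fixed-length 3-bit code over {A, B, C, D, E}.
CODE1 = {"A": "000", "B": "001", "C": "010", "D": "011", "E": "100"}

print(encode("AAB", CODE1))  # 000000001
```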

Example. $\Sigma_o = \{A, B, C, D, E\}$ and $\Sigma_c = \{0, 1\}$.

A      B      C      D      E        Prefix property   Remarks
000    001    010    011    100      yes               code(AAB) = 000.000.001; easy to decode
0      01     001    0001   00001    no                code(C) = code(AB) = 001; not always possible to uniquely decode
1      01     001    0001   00001    yes               prefix-free code
1      10     100    1000   10000    no                not prefix-free code
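This brute-force Python sketch shows concretely why the second code is not uniquely decodable. It lists every source-string that maps to a given code-string. It assumes the code is a dict from symbols to code-words. `all_decodings` is a hypothetical helper. Its running time is exponential in the worst case, so it only suits tiny examples.

```python
def all_decodings(bits, code):
    """Return every source-string whose encoding under `code` equals `bits`."""
    if bits == "":
        return [""]
    results = []
    for symbol, word in code.items():
        if bits.startswith(word):
            # Try `symbol` as the first source symbol, then parse the remaining bits.
            for rest in all_decodings(bits[len(word):], code):
                results.append(symbol + rest)
    return results


CODE2 = {"A": "0", "B": "01", "C": "001", "D": "0001", "E": "00001"}
CODE3 = {"A": "1", "B": "01", "C": "001", "D": "0001", "E": "00001"}

print(all_decodings("001", CODE2))  # ['AB', 'C']
print(all_decodings("001", CODE3))  # ['C']
```

Under the second code, 001 parses as both AB and C. Under the third (prefix-free) code, it parses only as C.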


PREFIX-FREE CODE
Definition:

No $\text{code}(a_i)$ is a prefix of another $\text{code}(a_j)$.

In particular, $\text{code}(a_i) \neq \text{code}(a_j)$ for $a_i \neq a_j$.
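The definition can be checked directly with the minimal quadratic-time sketch below. `is_prefix_free` is an illustrative name. The sketch is applied to the four codes of the earlier example.

```python
def is_prefix_free(code):
    """True if no code-word is a prefix of another symbol's code-word."""
    words = list(code.values())
    for i, u in enumerate(words):
        for j, v in enumerate(words):
            # i != j also catches two symbols sharing the same code-word.
            if i != j and v.startswith(u):
                return False
    return True


CODES = [
    {"A": "000", "B": "001", "C": "010", "D": "011", "E": "100"},
    {"A": "0", "B": "01", "C": "001", "D": "0001", "E": "00001"},
    {"A": "1", "B": "01", "C": "001", "D": "0001", "E": "00001"},
    {"A": "1", "B": "10", "C": "100", "D": "1000", "E": "10000"},
]
print([is_prefix_free(c) for c in CODES])  # [True, False, True, False]
```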

Binary-tree representation of prefix-free binary code:

0 = label(left branch) and 1 = label(right branch).

[Figure: binary trees of two prefix-free codes. Left tree: A = 000, B = 001, C = 011, D = 10, E = 110. Right tree: A = 000, B = 001, C = 01, D = 10, E = 11.]
In the right tree, C and E have shorter code-words, and each non-terminal node has 2 children.
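The tree view can be sketched by inserting each code-word into a binary trie, assuming the code is already prefix-free. `build_tree` and `is_full` are illustrative names. Nodes are plain dicts, with key "0" for the left branch and key "1" for the right branch. The second function tests whether each non-terminal node has 2 children.

```python
def build_tree(code):
    """Insert every code-word into a binary trie; a terminal node stores its symbol."""
    root = {}
    for symbol, word in code.items():
        node = root
        for bit in word:  # "0" = left branch, "1" = right branch
            node = node.setdefault(bit, {})
        node["symbol"] = symbol
    return root


def is_full(node):
    """True if every non-terminal node has exactly 2 children."""
    if "symbol" in node:
        return True  # terminal node
    children = [node[bit] for bit in ("0", "1") if bit in node]
    return len(children) == 2 and all(is_full(child) for child in children)


LEFT = {"A": "000", "B": "001", "C": "011", "D": "10", "E": "110"}
RIGHT = {"A": "000", "B": "001", "C": "01", "D": "10", "E": "11"}
print(is_full(build_tree(LEFT)), is_full(build_tree(RIGHT)))  # False True
```

The left code fails the check because the nodes reached by 01 and 11 each have only one child.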

Advantage:

One can decode the symbols from left to right, i.e., as they are
received.

A sufficient condition for left-to-right unique decoding is the prefix property.
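Left-to-right decoding of a prefix-free code can be sketched as follows. The decoder reads bits one at a time. As soon as the bits collected so far form a code-word, it emits that symbol. The sketch assumes the code is a dict from symbols to code-words. `decode` is an illustrative name. The prefix property makes it safe to emit at the first match, because no longer code-word can start with a code-word that has already matched.

```python
def decode(bits, code):
    """Decode `bits` left to right, assuming `code` (symbol -> code-word) is prefix-free."""
    symbol_of = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        # By the prefix property no longer code-word starts with `current`,
        # so a match can be emitted immediately.
        if current in symbol_of:
            out.append(symbol_of[current])
            current = ""
    if current:
        raise ValueError("leftover bits are not a complete code-word: " + current)
    return "".join(out)


CODE = {"A": "000", "B": "001", "C": "01", "D": "10", "E": "11"}
print(decode("0001011", CODE))  # ADE
```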

Questions:
How can we keep the prefix-free property and assign shorter codes to some of the symbols {A, B, …, E}?
What should we do if the symbols in $\Sigma_o$ occur with probabilities p(A) = 0.1 = p(B), p(C) = 0.3, p(D) = p(E) = 0.25?


HUFFMAN-TREE

A binary tree in which each non-terminal node has 2 children.

It gives an optimal (minimum average code-length) prefix-free binary code for each $a_i \in \Sigma_o$, for given probabilities $p(a_i) > 0$.
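The quantity being minimized is the average code-length $\sum_i p(a_i)\,|\text{code}(a_i)|$, which can be computed directly. This small Python sketch uses the probabilities from the question above. It compares a fixed 3-bit code with a shorter code whose tree has 2 children at every non-terminal node. `average_length` and the table names are illustrative.

```python
def average_length(code, probs):
    """Average code-length: the sum of p(a_i) * |code(a_i)| over all symbols a_i."""
    return sum(probs[symbol] * len(word) for symbol, word in code.items())


P = {"A": 0.1, "B": 0.1, "C": 0.3, "D": 0.25, "E": 0.25}
FIXED = {"A": "000", "B": "001", "C": "010", "D": "011", "E": "100"}
SHORTER = {"A": "000", "B": "001", "C": "01", "D": "10", "E": "11"}

print(round(average_length(FIXED, P), 6))    # 3.0
print(round(average_length(SHORTER, P), 6))  # 2.2
```

The results are rounded because binary floating point cannot represent 0.1 exactly. With these probabilities, the shorter code averages 2.2 bits per symbol, compared with 3 bits for the fixed-length code.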

Huffman's Algorithm:
1. Create a terminal node for each $a_i \in \Sigma_o$, with probability $p(a_i)$, and let S = the set of terminal nodes.
2. Select nodes x and y in S with the two smallest probabilities.
3. Replace x and y in S by a node with probability p(x) + p(y).
Also, create a node in the tree that is the parent of x and y.
4. Repeat (2)-(3) until |S| = 1.
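Steps 1-4 can be transcribed almost literally into Python. This is a minimal sketch rather than an efficient implementation. S is a plain list that is re-sorted every round, and ties are broken by list order. `huffman_tree`, `codewords`, and the tuple-based node layout are illustrative assumptions. Symbols are assumed not to be tuples.

```python
def huffman_tree(probs):
    """Huffman's algorithm following steps 1-4 literally; returns (probability, tree).

    `probs` maps each symbol to p(a_i) > 0. A tree is either a symbol
    (terminal node) or a pair (left, right) of subtrees.
    """
    # Step 1: one terminal node per symbol; S = the set of terminal nodes.
    S = [(p, symbol) for symbol, p in probs.items()]
    # Step 4: repeat steps 2-3 until |S| = 1.
    while len(S) > 1:
        # Step 2: bring the two nodes with the smallest probabilities to the front.
        S.sort(key=lambda node: node[0])
        x, y = S[0], S[1]
        # Step 3: replace x and y by their new parent, with probability p(x) + p(y).
        S = S[2:] + [(x[0] + y[0], (x[1], y[1]))]
    return S[0]


def codewords(subtree, prefix=""):
    """Read the code-words off the tree: 0 = left branch, 1 = right branch."""
    if isinstance(subtree, tuple):
        left, right = subtree
        return {**codewords(left, prefix + "0"), **codewords(right, prefix + "1")}
    return {subtree: prefix or "0"}  # a lone symbol still gets a 1-bit code


P = {"A": 0.1, "B": 0.1, "C": 0.3, "D": 0.25, "E": 0.25}
total, tree = huffman_tree(P)
print(codewords(tree))  # {'A': '000', 'B': '001', 'D': '01', 'E': '10', 'C': '11'}
```

A different tie-breaking rule can produce a different but equally optimal code. For example, C and E may trade code-words, since both end up at the same depth.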
Example. $\Sigma_o$ = {A, B, …, E} and p(A) = 0.1 = p(B), p(C) = 0.3, p(D) = p(E) = 0.25. The nodes in S are shown shaded.
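[Figure: successive stages of S for this example. A and B (0.1 each) merge into a node of probability 0.2. That node and D (0.25) merge into 0.45. E (0.25) and C (0.3) merge into 0.55. Finally, 0.45 and 0.55 merge into the root, 1.0. After the tree is redrawn, the codes are A = 000, B = 001, D = 01, C = 10, E = 11.]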


NUMBER OF BINARY TREES


0-2 Binary Tree: Each non-terminal node has 2 children.

#(Binary trees with N nodes) = $\frac{1}{2N+1}\binom{2N+1}{N}$.

#(0-2 binary trees with N terminal nodes) = #(Binary trees with N − 1 nodes) = $\frac{1}{2N-1}\binom{2N-1}{N-1} \geq 2^{N-2}$.

Example. Binary trees with N − 1 = 3 nodes correspond to 0-2 binary trees with N = 4 terminal nodes. By the formula, there are $\frac{1}{7}\binom{7}{3} = 5$ of each.
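Both counts can be checked numerically. The Python sketch below counts 0-2 binary trees by splitting the terminal nodes between the root's two subtrees. It then compares the result with the closed form $\frac{1}{2N-1}\binom{2N-1}{N-1}$. `count_full_trees` and `closed_form` are illustrative names, and `math.comb` requires Python 3.8 or later.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_full_trees(n):
    """Number of 0-2 binary trees with n terminal nodes, by recursion on the root's split."""
    if n == 1:
        return 1  # a single terminal node
    # The left subtree gets k terminal nodes and the right subtree gets n - k (k = 1..n-1).
    return sum(count_full_trees(k) * count_full_trees(n - k) for k in range(1, n))


def closed_form(n):
    """1/(2N - 1) * C(2N - 1, N - 1) with N = n; the division is exact."""
    return comb(2 * n - 1, n - 1) // (2 * n - 1)


# Each line shows N, the recursive count, and the closed form (N = 4 gives 5 and 5).
for n in range(1, 8):
    print(n, count_full_trees(n), closed_form(n))
```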

Merits of Huffman's Algorithm:

It finds the optimal coding in O(N · log N) time.

It does so without having to search through the $\frac{1}{2N-1}\binom{2N-1}{N-1}$ possible 0-2 binary trees.


DATA-STRUCTURE FOR IMPLEMENTING HUFFMAN'S ALGORITHM
Main Operations:

Choosing the two nodes with the minimum associated probabilities (and creating a parent node, etc.).
Use a heap data-structure for this part.
This is done N − 1 times; the total work for this part is O(N · log N).
Adding each parent node and connecting it to its children takes constant time per node.
A total of N − 1 parent nodes are added, so the total time for this part is O(N).
Complexity: O(N · log N).
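This is a heap-based version sketched with Python's standard `heapq` module. The function name and the tie-breaking counter are illustrative choices, not taken from the notes. Each heap entry carries a unique counter, so nodes with equal probabilities are ordered by insertion and subtrees are never compared directly. The sketch assumes at least one symbol.

```python
import heapq
from itertools import count


def huffman_code_heap(probs):
    """Huffman's algorithm with a min-heap; returns a dict symbol -> code-word."""
    tiebreak = count()  # unique second key: equal probabilities never compare subtrees
    heap = [(p, next(tiebreak), symbol) for symbol, p in probs.items()]
    heapq.heapify(heap)  # O(N)
    while len(heap) > 1:  # runs N - 1 times, once per new parent node
        px, _, x = heapq.heappop(heap)  # smallest probability, O(log N)
        py, _, y = heapq.heappop(heap)  # second smallest, O(log N)
        # New parent node with children x (left, bit 0) and y (right, bit 1).
        heapq.heappush(heap, (px + py, next(tiebreak), (x, y)))  # O(log N)
    root = heap[0][2]
    # Walk the finished tree (O(N) nodes) to read off the code-words.
    codes = {}
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if isinstance(node, tuple):
            stack.append((node[0], prefix + "0"))
            stack.append((node[1], prefix + "1"))
        else:
            codes[node] = prefix or "0"  # a lone symbol still gets a 1-bit code
    return codes


P = {"A": 0.1, "B": 0.1, "C": 0.3, "D": 0.25, "E": 0.25}
codes = huffman_code_heap(P)
print({s: len(w) for s, w in sorted(codes.items())})  # {'A': 3, 'B': 3, 'C': 2, 'D': 2, 'E': 2}
```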


EXERCISE
1. Consider the codes shown below.

A      B      C      D      E
000    001    011    10     110

(a) Arrange the codes in a binary tree form, with 0 = label(left branch) and 1 = label(right branch).
(b) Do these codes have the prefix property? How do you decode the string 10110001000?

(c) Modify the above code (keeping the prefix property) so that
the new code will have less average length no matter what
the probabilities of the symbols are. Show the binary tree
for the new code.
(d) What are the two key properties of the new binary tree
(hint: compare with your answer for part (a))?
(e) Give suitable probabilities for the symbols such that prob(A) < prob(B) < prob(C) < prob(D) < prob(E) and the new code in part (c) is optimal (minimum average length) for those probabilities.
2. Show the successive heaps in creating the Huffman-Tree for the probabilities p(A) = 0.1 = p(B), p(C) = 0.3, p(D) = 0.14, p(E) = 0.12, and p(F) = 0.24.
3. Give some probabilities for the items in $\Sigma_o$ = {A, B, …, F} that give the largest possible value for the optimal average code length.
4. Argue that for an optimal Huffman-tree, any subtree is optimal (w.r.t. the relative probabilities of its terminal nodes). Also argue that the tree obtained by removing all children and other descendants of a node x is optimal w.r.t. p(x) and the probabilities of its other terminal nodes.
