Huffman Coding
[Table: four example codings and whether each has the prefix property (yes, no, yes, no).]
PREFIX-FREE CODE
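Definition:
A code is prefix-free if no code-word is a prefix of another code-word.
[Figure: binary tree for the code A = 000, B = 001, C = 011, D = 10, E = 110.]
[Figure: binary tree for the modified code A = 000, B = 001, C = 01, D = 10, E = 11.]
In the second tree, C and E have shorter code-words, and
each non-terminal node has 2 children.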
Advantage:
One can decode the symbols from left to right, i.e., as they are
received.
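The left-to-right decoding described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names `is_prefix_free` and `decode` are hypothetical, and the code table is the five-symbol code from this section (A = 000, B = 001, C = 011, D = 10, E = 110).

```python
def is_prefix_free(code):
    """Return True if no code-word is a prefix of another code-word."""
    words = sorted(code.values())
    # In sorted order, a word that is a prefix of some other word is
    # always a prefix of the word that immediately follows it.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode(bits, code):
    """Decode a bit string left to right, emitting each symbol as soon
    as its code-word is complete."""
    if not is_prefix_free(code):
        raise ValueError("code is not prefix-free")
    inverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # a full code-word has been received
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("input ends in the middle of a code-word")
    return "".join(out)


code = {"A": "000", "B": "001", "C": "011", "D": "10", "E": "110"}
print(decode("00010011110", code))  # prints ADCE
```

Because no code-word is a prefix of another, the decoder never has to look ahead: the first match it finds is the only possible one.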
Question:
- How can we keep the prefix-free property and assign shorter codes
to some of the symbols in {A, B, ..., E}?
- What should we do if the symbols of the alphabet occur with probabilities
p(A) = p(B) = 0.1, p(C) = 0.3, p(D) = p(E) = 0.25?
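One way to make these questions concrete is to compare the average code-word length of two prefix-free codes under the stated probabilities. The sketch below assumes the two five-symbol codes of this section (the second shortens C to 01 and E to 11); the helper name `average_length` is hypothetical.

```python
def average_length(code, prob):
    """Expected number of bits per symbol: sum of p(s) * len(code-word of s)."""
    return sum(prob[s] * len(code[s]) for s in code)


prob = {"A": 0.1, "B": 0.1, "C": 0.3, "D": 0.25, "E": 0.25}
first = {"A": "000", "B": "001", "C": "011", "D": "10", "E": "110"}
second = {"A": "000", "B": "001", "C": "01", "D": "10", "E": "11"}

print(round(average_length(first, prob), 2))   # prints 2.75
print(round(average_length(second, prob), 2))  # prints 2.2
```

Shortening the code-words of C and E lowers the average from 2.75 to 2.2 bits per symbol without giving up the prefix-free property.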
HUFFMAN-TREE
Huffman's Algorithm:
1. Create a terminal node for each symbol a_i of the alphabet, with
probability p(a_i), and let S = the set of terminal nodes.
2. Select nodes x and y in S with the two smallest probabilities.
3. Replace x and y in S by a node with probability p(x) + p(y).
Also, create a node in the tree which is the parent of x and y.
4. Repeat (2)-(3) until |S| = 1.
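The four steps above can be sketched in Python with a binary heap playing the role of the set S. This is an illustrative sketch under a few assumptions that are not in the notes: the function name `huffman_code` is hypothetical, symbols are assumed to be strings, ties between equal probabilities are broken by insertion order, and the left branch is labeled 0.

```python
import heapq
from itertools import count


def huffman_code(prob):
    """Build a prefix-free code by repeatedly merging the two least
    probable nodes. A node is a symbol or a (left, right) pair."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    # Step 1: one terminal node per symbol; the heap is the set S.
    heap = [(p, next(tiebreak), symbol) for symbol, p in prob.items()]
    heapq.heapify(heap)
    # Steps 2-4: merge the two smallest nodes until |S| = 1.
    while len(heap) > 1:
        p_x, _, x = heapq.heappop(heap)
        p_y, _, y = heapq.heappop(heap)
        heapq.heappush(heap, (p_x + p_y, next(tiebreak), (x, y)))
    _, _, root = heap[0]

    code = {}

    def assign(node, word):
        if isinstance(node, tuple):  # non-terminal node: recurse into children
            assign(node[0], word + "0")
            assign(node[1], word + "1")
        else:  # terminal node; a one-symbol alphabet still gets one bit
            code[node] = word or "0"

    assign(root, "")
    return code


prob = {"A": 0.1, "B": 0.1, "C": 0.3, "D": 0.25, "E": 0.25}
for symbol, word in sorted(huffman_code(prob).items()):
    print(symbol, word)
```

For these probabilities the code-word lengths come out as 3, 3, 2, 2, 2 for A, B, C, D, E, an average of 2.2 bits per symbol. The individual bits depend on how ties are broken and which child is labeled 0, so they need not match any particular drawing of the tree.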
Example. The alphabet is {A, B, ..., E} with p(A) = p(B) = 0.1, p(C) = 0.3,
p(D) = p(E) = 0.25. The nodes in S are shown shaded.
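[Figure: successive stages of the construction. A and B (0.1 each) are merged into a node of probability 0.2; that node and D (0.25) are merged into a node of probability 0.45; E (0.25) and C (0.3) are merged into a node of probability 0.55; finally the nodes 0.45 and 0.55 are merged into the root, with probability 1.0.]
After redrawing the tree, the code-words are
A = 000, B = 001, D = 01, C = 10, E = 11.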
#(Binary trees with N nodes) = \frac{1}{2N+1} \binom{2N+1}{N}.
#(0-2 Binary trees with N terminal nodes)
= #(Binary trees with N-1 nodes) = \frac{1}{2N-1} \binom{2N-1}{N-1}.
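As a sanity check, the closed forms C(2N+1, N)/(2N+1) for binary trees with N nodes and C(2N-1, N-1)/(2N-1) for 0-2 binary trees with N terminal nodes can be compared against a direct recursive count. The sketch below is only a numerical check for small N, not a proof; the function names are hypothetical, and a 0-2 tree is taken to mean a tree in which every node has either 0 or 2 children.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_binary_trees(n):
    """Count binary trees with n nodes by splitting the n - 1 non-root
    nodes between the left and right subtrees."""
    if n == 0:
        return 1  # the empty tree
    return sum(count_binary_trees(k) * count_binary_trees(n - 1 - k)
               for k in range(n))


@lru_cache(maxsize=None)
def count_02_trees(leaves):
    """Count 0-2 binary trees with the given number of terminal nodes by
    splitting the terminal nodes between the two children of the root."""
    if leaves == 1:
        return 1  # a single terminal node
    return sum(count_02_trees(k) * count_02_trees(leaves - k)
               for k in range(1, leaves))


for n in range(1, 12):
    assert count_binary_trees(n) == comb(2 * n + 1, n) // (2 * n + 1)
    assert count_02_trees(n) == comb(2 * n - 1, n - 1) // (2 * n - 1)
    assert count_02_trees(n) == count_binary_trees(n - 1)

print([count_binary_trees(n) for n in range(6)])  # prints [1, 1, 2, 5, 14, 42]
```

For example, there are 14 different 0-2 tree shapes with 5 terminal nodes, which is the number of candidate tree shapes for a five-symbol alphabet such as {A, B, ..., E}.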
EXERCISE
1. Consider the codes shown below.
A    B    C    D    E
000  001  011  10   110
(a) Arrange the codes in a binary tree form, with 0 = label(left branch) and 1 = label(right branch).
(b
(c) Modify the above code (keeping the prefix property) so that
the new code has a smaller average length no matter what
the probabilities of the symbols are. Show the binary tree
for the new code.
(d) What are the two key properties of the new binary tree
(hint: compare with your answer for part (a))?
(e) Give suitable probabilities for the symbols such that
prob(A) < prob(B) < prob(C) < prob(D) < prob(E) and the
new code in part (c) is optimal (minimum average length) for
those probabilities.
2. Show the successive heaps in creating the Huffman-Tree for the
probabilities p(A) = p(B) = 0.1, p(C) = 0.3, p(D) = 0.14, p(E)
= 0.12, and p(F) = 0.24.
3. Give some probabilities for the items in the alphabet {A, B, ..., F} that
give the largest possible value for the optimal average code length.
4. Argue that for an optimal Huffman-tree, any subtree is optimal
(w.r.t. the relative probabilities of its terminal nodes), and also
that the tree obtained by removing all children and other descendants
of a node x is optimal w.r.t. p(x) and the
probabilities of its other terminal nodes.