Ex 7 DAA
Experiment : 7
Aim / Objective: To implement Huffman coding & determine its time complexity.
Theory: Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length
codes to input characters; the lengths of the assigned codes are based on the frequencies of the
corresponding characters.
The variable-length codes assigned to input characters are prefix codes, meaning the codes (bit sequences)
are assigned in such a way that the code assigned to one character is never a prefix of the code assigned to
any other character. This is how Huffman coding ensures that there is no ambiguity when decoding the
generated bitstream.
Let us understand prefix codes with a counterexample. Let there be four characters a, b, c and d, and let their
corresponding variable-length codes be 00, 01, 0 and 1. This coding leads to ambiguity because the code
assigned to c is a prefix of the codes assigned to a and b. If the compressed bitstream is 0001, the
decompressed output may be “cccd” or “ccb” or “acd” or “ab”, as the short sketch below demonstrates.
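A minimal sketch, assuming the four example codes above, makes the ambiguity concrete: a simple recursive routine finds more than one valid way to parse the bitstream 0001. (The dictionary and function names here are illustrative and are not part of the experiment's code.)

# Sketch: list every way the bitstream can be parsed with the
# non-prefix-free codes from the example above (a=00, b=01, c=0, d=1).
codes = {'a': '00', 'b': '01', 'c': '0', 'd': '1'}

def parses(bits, prefix=''):
    if not bits:
        return [prefix]
    results = []
    for ch, code in codes.items():
        if bits.startswith(code):
            results.extend(parses(bits[len(code):], prefix + ch))
    return results

print(parses('0001'))   # prints several different decodings -> ambiguous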
There are mainly two major parts in Huffman coding: building a Huffman tree from the input characters, and traversing the tree to assign codes to the characters.
Fig 7.1
Algorithm:
1. Count Frequencies:
*Scan the input and record how often each character occurs (a short counting sketch is given after this list).
2. Create a Min-Heap:
*Insert each character as a node into a priority queue (min-heap), where the frequency is the priority.
3. Build the Huffman Tree:
*Repeatedly extract the two lowest-frequency nodes and merge them into a new node, where frequency = sum of the two nodes; push the merged node back into the heap.
*Assign '0' for left edges and '1' for right edges.
4. Create a Dictionary:
*Traverse the tree from the root to every leaf, concatenating the edge labels to obtain each character's code.
5. Encode:
*Convert the input string into a sequence of Huffman codes using the dictionary.
6. Decode:
*Read the encoded bits and walk the tree from the root; when a leaf node (character) is reached, append it to the output and reset to the root.
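Step 1 is not shown in the program below, which starts from a fixed frequency table; a minimal sketch of the counting step using Python's collections.Counter (the sample text is illustrative) could look like this:

# Sketch of step 1: count how often each character occurs in the input.
from collections import Counter

text = "huffman coding example"      # illustrative sample input
freq_table = Counter(text)           # maps each character to its frequency

chars = list(freq_table.keys())      # characters to insert into the min-heap
freq = list(freq_table.values())     # their frequencies (the priorities)
print(freq_table)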
Code:
import heapq

# Node of the Huffman tree
class node:
    def __init__(self, freq, symbol, left=None, right=None):
        self.freq = freq            # frequency of the symbol
        self.symbol = symbol        # the character (or merged symbols)
        self.left = left            # left child
        self.right = right          # right child
        self.huff = ''              # edge label ('0' or '1') from the parent

    def __lt__(self, nxt):          # lets heapq order nodes by frequency
        return self.freq < nxt.freq

# Traverse the tree and print the Huffman code of every leaf character
def printNodes(node, val=''):
    newVal = val + str(node.huff)
    if(node.left):
        printNodes(node.left, newVal)
    if(node.right):
        printNodes(node.right, newVal)
    if(not node.left and not node.right):
        print(f"{node.symbol} -> {newVal}")

# Sample characters and their frequencies
chars = ['a', 'b', 'c', 'd', 'e', 'f']
freq = [5, 9, 12, 13, 16, 45]

# Insert every character as a leaf node into the min-heap
nodes = []
for x in range(len(chars)):
    heapq.heappush(nodes, node(freq[x], chars[x]))

# Repeatedly merge the two lowest-frequency nodes until one tree remains
while len(nodes) > 1:
    left = heapq.heappop(nodes)
    right = heapq.heappop(nodes)
    left.huff = 0
    right.huff = 1
    newNode = node(left.freq + right.freq, left.symbol + right.symbol, left, right)
    heapq.heappush(nodes, newNode)
printNodes(nodes[0])
Output:
Fig 7.2
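The program above only prints the code of each character. The dictionary, encoding and decoding steps of the algorithm can be sketched as an extension; this assumes the node class and the final tree root nodes[0] from the program above, and the helper names (buildCodes, encoded, decoded) are illustrative.

# Sketch: build the code dictionary, then encode and decode a sample string
# (assumes the node class and nodes[0] from the program above).
def buildCodes(tree, val='', codes=None):
    if codes is None:
        codes = {}
    newVal = val + str(tree.huff)
    if tree.left:
        buildCodes(tree.left, newVal, codes)
    if tree.right:
        buildCodes(tree.right, newVal, codes)
    if not tree.left and not tree.right:   # leaf: store the character's code
        codes[tree.symbol] = newVal
    return codes

codes = buildCodes(nodes[0])
encoded = ''.join(codes[ch] for ch in "abc")   # encode a sample string

decoded, current = '', nodes[0]
for bit in encoded:                            # walk the tree bit by bit
    current = current.left if bit == '0' else current.right
    if not current.left and not current.right: # leaf reached: emit character,
        decoded += current.symbol              # then reset to the root
        current = nodes[0]
print(encoded, decoded)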
Advantages:
1. Optimal Compression: Huffman coding provides an optimal prefix code for a given set of
character frequencies, leading to efficient compression.
2. Lossless Compression: Unlike lossy compression methods (e.g., JPEG, MP3), Huffman
coding ensures that no data is lost during encoding and decoding.
3. Widely Used: Huffman coding is used in file compression formats like ZIP, GZIP, and image
compression formats like PNG.
4. Prefix-Free Property: Huffman codes are prefix codes, meaning no code is a prefix of
another, ensuring unique decodability.
Disadvantages:
1. Requires Two Passes: One pass is needed to determine frequency counts, and another pass to
encode, which can be inefficient in some applications.
2. Not Suitable for Dynamic Data: If character frequencies change frequently, a new Huffman
tree must be built, making it inefficient for real-time streaming.
3. Not Always the Best: If all characters have nearly equal frequencies, Huffman coding does not
provide significant compression benefits.
Time Complexity:
*Building the Huffman Tree: O(n log n), where n is the number of distinct characters (using a min-heap for tree construction)
*Encoding the Input: O(n) for an input of n characters (replacing each character with its Huffman code)
Space Complexity:
*Huffman Tree and Code Dictionary: O(n) (the tree contains at most 2n − 1 nodes for n distinct characters)
*Encoded Output: O(n) (depends on the input size and compression efficiency)
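Since the aim also asks us to determine the time complexity, the O(n log n) growth of tree construction can be checked empirically with a small timing sketch (the alphabet sizes and the helper name build_tree are illustrative; it reuses the node class defined above):

# Sketch: time Huffman tree construction for growing alphabet sizes n
# to observe the expected O(n log n) trend (reuses the node class above).
import heapq, random, time

def build_tree(freqs):
    heap = [node(f, str(i)) for i, f in enumerate(freqs)]
    heapq.heapify(heap)
    while len(heap) > 1:
        left, right = heapq.heappop(heap), heapq.heappop(heap)
        left.huff, right.huff = 0, 1
        heapq.heappush(heap, node(left.freq + right.freq,
                                  left.symbol + right.symbol, left, right))
    return heap[0]

for n in [1000, 10000, 100000]:
    freqs = [random.randint(1, 100) for _ in range(n)]
    start = time.time()
    build_tree(freqs)
    print(n, round(time.time() - start, 4), "seconds")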