Huffman Coding by Akas

The document discusses Huffman coding and a feasibility study for developing a system to implement Huffman coding. It begins with an overview of Huffman coding and its history. It then discusses the requirements analysis phase, which involves gathering requirements, analyzing them for inconsistencies, and specifying the system based on the analyzed requirements. The main requirements are a frequency distribution table with valid symbols between 0-94 and ensuring the Huffman codes are "prefix-free" to avoid ambiguities. The feasibility study found the project to be technically, economically, and operationally feasible within the given time frame.

Uploaded by

Aquib
Copyright
© Attribution Non-Commercial (BY-NC)

HUFFMAN CODING 2010

1 FEASIBILITY STUDY
In computer science and information theory, Huffman coding is an entropy encoding algorithm
used for lossless data compression. The term refers to the use of a variable-length code table for
encoding a source symbol (such as a character in a file) where the variable-length code table has
been derived in a particular way based on the estimated probability of occurrence for each
possible value of the source symbol. It was developed by David A. Huffman at MIT, and
published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

Huffman coding uses a specific method for choosing the representation for each symbol,
resulting in a prefix code (sometimes called a "prefix-free code"): the bit string
representing any particular symbol is never a prefix of the bit string representing any other
symbol. The code expresses the most common symbols using shorter strings of bits than are used
for less common symbols. Huffman was able to design the most efficient compression
method of this type: no other mapping of individual source symbols to unique strings of bits will
produce a smaller average output size when the actual symbol frequencies agree with those used
to create the code.

1.1 History:

In 1951, David A. Huffman and his MIT information theory classmates were given the choice of
a term paper or a final exam. The professor, Robert M. Fano, assigned a term paper on the
problem of finding the most efficient binary code. Huffman, unable to prove any codes were the
most efficient, was about to give up and start studying for the final when he hit upon the idea of
using a frequency-sorted binary tree and quickly proved this method the most efficient. In doing
so, the student outdid his professor, who had worked with information theory inventor Claude
Shannon to develop a similar code.

1.2 The algorithm:


1. Scan text to be compressed and tally occurrence of all characters.

2. Sort or prioritize characters based on number of occurrences in text.

3. Build Huffman code tree based on prioritized list.

4. Perform a traversal of tree to determine all code words.

5. Scan text again and create new file using the Huffman code.

The above construction algorithm uses a priority queue in which the node with the lowest
probability is given the highest priority. This priority queue is used to build the Huffman tree,
which assigns fewer bits to more frequent symbols and more bits to symbols that occur less
frequently. In other words, shorter code words are assigned to the more common symbols and
longer code words to the less common ones. The longer code words cost little because they occur
rarely, and this is how data compression is achieved.
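
The five steps above can be sketched as follows. This is only a minimal illustration, assuming
the standard library's std::priority_queue in place of the singly linked list chosen later in
this report; the names Node, LowestFirst, collect_codes and huffman_codes are hypothetical and
do not come from the project code.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Hypothetical node type for this sketch; not the report's 'tree' struct.
struct Node {
    char symbol;                       // meaningful only for leaves
    long freq;                         // frequency (or sum of the children)
    std::unique_ptr<Node> left, right;
};

// Orders the queue so that the lowest frequency is served first.
struct LowestFirst {
    bool operator()(const Node* a, const Node* b) const { return a->freq > b->freq; }
};

// Walks the tree, appending 0 for a left branch and 1 for a right branch.
void collect_codes(const Node* node, const std::string& prefix,
                   std::map<char, std::string>& codes) {
    if (!node->left && !node->right) {
        codes[node->symbol] = prefix.empty() ? "1" : prefix;  // lone symbol: conventional code
        return;
    }
    collect_codes(node->left.get(), prefix + "0", codes);
    collect_codes(node->right.get(), prefix + "1", codes);
}

std::map<char, std::string> huffman_codes(const std::map<char, long>& freqs) {
    assert(!freqs.empty());
    std::priority_queue<Node*, std::vector<Node*>, LowestFirst> queue;
    for (const auto& entry : freqs)
        queue.push(new Node{entry.first, entry.second, nullptr, nullptr});

    // Repeatedly merge the two least frequent nodes into a composite node.
    while (queue.size() > 1) {
        Node* first = queue.top();  queue.pop();
        Node* second = queue.top(); queue.pop();
        queue.push(new Node{'\0', first->freq + second->freq,
                            std::unique_ptr<Node>(first), std::unique_ptr<Node>(second)});
    }

    std::unique_ptr<Node> root(queue.top());  // owns and frees the whole tree
    std::map<char, std::string> codes;
    collect_codes(root.get(), "", codes);
    return codes;
}
```

How ties between equal frequencies are broken is left to the queue in this sketch, so the exact
bit patterns may differ from those produced by the project code, although the code lengths are
the same whenever the merge order is forced by the frequencies.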

1.3 Problem definition:

“To design the system that will allow the user to enter the total number of characters with their
frequencies at the terminal and then display the Huffman codes on the terminal in an interactive
manner.”

The main aim of the feasibility study is to determine whether it would be financially and
technically feasible to develop the product. After thoroughly analyzing the problem definition
and the Huffman coding algorithm from various standard books on information theory and from the
internet, several strategies for solving the problem were considered, and finally the
algorithm based on a priority queue (implemented as a singly linked list) was chosen.

1.3.1 Technical feasibility

We need a C++ compiler and a computer to implement the Huffman algorithm; both are available.
The personnel involved in the project should be well versed in the basic concepts of C, C++
and data structures.

1.3.2 Economic Feasibility:

Economic feasibility depends on the complexity of the problem and the number of personnel
involved in the project. Huffman coding is a problem of moderate complexity and thus needs at
least 4 group members. Our team comprises 4 engineering students who are familiar with data
structures, so the project is economically feasible.

1.3.3 Operational Feasibility:

This project is developed for information theory professionals, and the proposed system provides
fast and efficient operation. It should therefore be acceptable to its users to a large extent,
so the proposed system is operationally feasible.


1.3.4 Time feasibility


Our Huffman project should be completed by the 12th of April 2010. This deadline is feasible
according to the schedule prepared by the project manager.

2 REQUIREMENT ANALYSIS AND SPECIFICATION


The aim of the requirements analysis and specification phase is to understand the exact
requirements of the customer and to document them properly. This phase consists of two distinct
activities:

2.1 Requirement gathering:

The goal of the requirement gathering activity is to collect all relevant information regarding
the product to be developed from the customer, with a view to clearly understanding the customer
requirements. To thoroughly understand the problem, a study of graph theory was first
conducted using standard textbooks (refer to the Bibliography) to understand the Huffman tree.
The study of Huffman coding was then taken up from various relevant sources such as the internet
and books on data compression. Finally, the method used by Huffman coding to compress data was
studied in order to understand the motive behind the project.

2.2 Requirement analysis:

The goal of the requirement analysis activity is to weed out the incompleteness and
inconsistencies in the requirements gathered above. Data collected from various sources,
including a group of users, usually contains contradictions, ambiguities and gaps, since each
user typically has only a partial and incomplete view of the system. In the case of the
Huffman coding algorithm, the main requirement is the frequency distribution table, which needs
to be checked for contradictions, ambiguities, and incompleteness.

For example, the user should not enter the same symbol twice, as this is an inconsistency: each
symbol has only one Huffman code, so a repeated symbol would lead to errors. If the user enters
only one symbol, the table is incomplete; this case is handled by not running the algorithm and
conventionally assigning 1 or 0 to the symbol. Similarly, the number of symbols can't be greater
than 94.


ASCII Hex Symbol    ASCII Hex Symbol    ASCII Hex Symbol    ASCII Hex Symbol

0 0 NUL 16 10 DLE 32 20 (space) 48 30 0
1 1 SOH 17 11 DC1 33 21 ! 49 31 1
2 2 STX 18 12 DC2 34 22 " 50 32 2
3 3 ETX 19 13 DC3 35 23 # 51 33 3
4 4 EOT 20 14 DC4 36 24 $ 52 34 4
5 5 ENQ 21 15 NAK 37 25 % 53 35 5
6 6 ACK 22 16 SYN 38 26 & 54 36 6
7 7 BEL 23 17 ETB 39 27 ' 55 37 7
8 8 BS 24 18 CAN 40 28 ( 56 38 8
9 9 TAB 25 19 EM 41 29 ) 57 39 9
10 A LF 26 1A SUB 42 2A * 58 3A :
11 B VT 27 1B ESC 43 2B + 59 3B ;
12 C FF 28 1C FS 44 2C , 60 3C <
13 D CR 29 1D GS 45 2D - 61 3D =
14 E SO 30 1E RS 46 2E . 62 3E >
15 F SI 31 1F US 47 2F / 63 3F ?

ASCII Hex Symbol    ASCII Hex Symbol    ASCII Hex Symbol    ASCII Hex Symbol

64 40 @ 80 50 P 96 60 ` 112 70 p
65 41 A 81 51 Q 97 61 a 113 71 q
66 42 B 82 52 R 98 62 b 114 72 r
67 43 C 83 53 S 99 63 c 115 73 s
68 44 D 84 54 T 100 64 d 116 74 t
69 45 E 85 55 U 101 65 e 117 75 u
70 46 F 86 56 V 102 66 f 118 76 v
71 47 G 87 57 W 103 67 g 119 77 w
72 48 H 88 58 X 104 68 h 120 78 x
73 49 I 89 59 Y 105 69 i 121 79 y
74 4A J 90 5A Z 106 6A j 122 7A z
75 4B K 91 5B [ 107 6B k 123 7B {
76 4C L 92 5C \ 108 6C l 124 7C |
77 4D M 93 5D ] 109 6D m 125 7D }

78 4E N 94 5E ^ 110 6E n 126 7E ~
79 4F O 95 5F _ 111 6F o 127 7F DEL

As can be seen from the above table, the number of valid symbols that can be entered can't be
greater than 94: standard ASCII uses codes 0 to 31 and 127 (DEL) for control purposes, and once
the space character (32) is also set aside, the 94 printable characters from 33 to 126 remain.

A symbol count outside this range needs to be rejected with a proper error message. Huffman
coding has the important feature of being “prefix free”, i.e., no code word is a prefix of any
other code word. Because of this prefix property, no ambiguity can arise among the Huffman code
words, which leads to correct decoding of the message.
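
The prefix property can be checked mechanically. The following is a small sketch, not part of
the project code, and the function name is_prefix_free is an assumption made for illustration.
It relies on the fact that, after sorting, a code word that is a prefix of any other word is
also a prefix of its immediate successor.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns true when no code word is a prefix of (or equal to) another code word.
bool is_prefix_free(std::vector<std::string> codes) {
    std::sort(codes.begin(), codes.end());  // lexicographic order
    for (std::size_t i = 0; i + 1 < codes.size(); ++i) {
        const std::string& shorter = codes[i];
        const std::string& next = codes[i + 1];
        // Compare the first shorter.size() characters of next with shorter.
        if (next.compare(0, shorter.size(), shorter) == 0) return false;
    }
    return true;
}
```

For instance, the set {0, 10, 110, 111} passes this check, while {0, 01, 11} fails because 0 is
a prefix of 01.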

2.3 SRS (software requirements specification):

The customer requirements identified during the requirements gathering and analysis activity are
organized into a SRS document. The important components of this document are:

Purpose:

a) Data compression: The main purpose of the software is to generate the Huffman codes
used for data compression.

b) Overview: In computer science and information theory, Huffman coding is an
algorithm used for lossless data compression. The term refers to the use of a variable-length code
table for encoding a source symbol (such as a character in a file) where the variable-length code
table has been derived in a particular way based on the estimated probability of occurrence for
each possible value of the source symbol. It was developed by David A. Huffman at MIT, and
published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".
Huffman coding uses a specific method for choosing the representation for each symbol,
resulting in a prefix code that expresses the most common characters using shorter strings of bits
than are used for less common source symbols.

c) End users: Information theory is a branch of applied mathematics and electrical
engineering involving the quantification of information. It is the study of ways in which
information can be represented or encoded. The Huffman Coding Software is meant for information
theory professionals.

 Functional requirements:

Functional requirements describe the functions to be supported by the system. Each
function can be characterized by the input data, the processing required on the input data and the
output data to be produced. The main functions to be supported are:

1. The user provides each symbol and its corresponding frequency as input. The frequency has to
be a positive integer.

2. The symbol and its corresponding frequency are inserted into a node of the priority queue.

3. Once all the symbols and their corresponding frequencies have been inserted into the priority
queue, we build the Huffman tree for the symbols.

4. Once the complete tree is created, we determine the Huffman code of each symbol by
traversing the tree.

 Non functional requirements:

Hardware requirements: 16 MB RAM

Software requirements: Windows, UNIX or Macintosh OS and C++ compiler.

Ways of executing on Unix or Mac

1. Install GCC; on a Debian-based Linux system, type “sudo apt-get install build-essential” at
the terminal.

2. Make minor changes so that the program runs on a POSIX platform, i.e. Unix or Mac; e.g.
replace clrscr with clear, which is the corresponding command on Linux.

3. The GNU Compiler Collection (usually shortened to GCC) is a compiler system
produced by the GNU Project supporting various programming languages.

4. GCC has been adopted as the standard compiler by most other modern Unix-like
computer operating systems, including GNU/Linux, the BSD family and Mac OS X.

5. Command for compiling on Unix after the installation of the GCC components: “g++
-Wall -W -Werror huff.cpp -o huff”


 Goals of implementation:

In computer science, data compression or source coding is the process of encoding
information using fewer bits (or other information-bearing units) than an unencoded
representation would use, through the use of specific encoding schemes.
As with any communication, compressed data communication only works when both the sender
and receiver of the information understand the encoding scheme. For example, this text makes
sense only if the receiver understands that it is intended to be interpreted as characters
representing the English language. Similarly, compressed data can only be understood if the
decoding method is known by the receiver.
Compression is useful because it helps reduce the consumption of expensive resources, such as
hard disk space or transmission bandwidth. On the downside, compressed data must be
decompressed to be used, and this extra processing may be detrimental to some applications. For
instance, a compression scheme for video may require expensive hardware for the video to be
decompressed fast enough to be viewed as it's being decompressed (the option of decompressing
the video in full before watching it may be inconvenient, and requires storage space for the
decompressed video). The design of data compression schemes therefore involves trade-offs
among various factors, including the degree of compression, the amount of distortion introduced
(if using a lossy compression scheme), and the computational resources required to compress and
uncompress the data.
The goal of the Huffman coding implementation is data compression, the process of reducing the
amount of data required to represent a given quantity of information, as illustrated by the
example below:

A source generates 4 different symbols {a1, a2, a3, a4} with probabilities {0.4; 0.35; 0.2; 0.05}.
A binary tree is generated from left to right by taking the two least probable symbols and putting
them together to form another equivalent symbol having a probability that equals the sum of the
two symbols. The process is repeated until there is just one symbol. The tree can then be read
backwards, from right to left, assigning different bits to different branches. The final Huffman
code is:

Symbol    Code
a1        0
a2        10
a3        110
a4        111

The standard way to represent a signal made of 4 symbols is by using 2 bits/symbol. If this
Huffman code is used to represent the signal, then the average length is lowered to
1.85 bits/symbol. The technique works by creating a binary tree of nodes. Thus, on average, this
achieves a saving of 0.15 bits/symbol and thereby data compression.

In run-length encoding, large runs of consecutive identical data values are replaced by a simple
code with the data value and length of the run. This is an example of lossless data compression.
Lossless compression is often used to optimize disk space on office computers, or to make better
use of the connection bandwidth in a computer network. For symbolic data such as spreadsheets,
text, executable programs, etc., losslessness is essential because changing even a single bit
cannot be tolerated (except in some limited cases).
For visual and audio data, some loss of quality can be tolerated without losing
the essential nature of the data. By taking advantage of the limitations of the
human sensory system, a great deal of space can be saved while producing an output which is
nearly indistinguishable from the original. These lossy data compression methods typically offer
a three-way tradeoff between compression speed, compressed data size and quality loss.
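
Returning to the four-symbol source described earlier in this section, the figure of
1.85 bits/symbol is the probability-weighted sum of the code lengths:
0.4*1 + 0.35*2 + 0.2*3 + 0.05*3 = 1.85. A small sketch of this calculation is given below; the
function name average_code_length is an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Expected number of bits per symbol: sum over all symbols of probability * code length.
double average_code_length(const std::vector<double>& probabilities,
                           const std::vector<std::string>& codes) {
    assert(probabilities.size() == codes.size());
    double bits = 0.0;
    for (std::size_t i = 0; i < probabilities.size(); ++i)
        bits += probabilities[i] * static_cast<double>(codes[i].size());
    return bits;
}
```

Called with the probabilities {0.4, 0.35, 0.2, 0.05} and the codes {0, 10, 110, 111}, the
function returns approximately 1.85, against 2 bits/symbol for a fixed-length code.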

Lossy image compression is used in digital cameras, to increase storage capacities with minimal
degradation of picture quality. Similarly, DVDs use the lossy MPEG-2 Video codec for video
compression.
In lossy audio compression, methods of psychoacoustics are used to remove non-audible (or less
audible) components of the signal. Compression of human speech is often performed with even
more specialized techniques, so that "speech compression" or "voice coding" is sometimes
distinguished as a separate discipline from "audio compression". Different audio and speech
compression standards are listed under audio codecs. Voice compression is used in Internet
telephony for example, while audio compression is used for CD ripping and is decoded by audio
players.

Huffman coding today is often used as a "back-end" to some other compression method.
DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3 have a front-end
model and quantization followed by Huffman coding.

Example implementations:
 DEFLATE (a combination of LZ77 and Huffman coding) – used by ZIP, gzip and PNG
files

 Unix pack utility (the .z file format) used Huffman coding


 bzip2 (a combination of the Burrows-Wheeler transform and Huffman coding)

 JPEG (image compression using a discrete cosine transform, then quantization, then
Huffman coding)
 MPEG (audio and video compression standards family in wide use, using DCT and
motion-compensated prediction for video)
o MP3 (a part of the MPEG-1 standard for sound and music compression, using
subbanding and MDCT, perceptual modeling, quantization, and Huffman coding)
o AAC (part of the MPEG-2 and MPEG-4 audio coding specifications, using
MDCT, perceptual modeling, quantization, and Huffman coding)


3 DESIGN PROCESS FOR HUFFMAN CODING

During the software design phase, the designer transforms the SRS document into the
design document. The design document produced at the end of the design phase should be
implemented using a programming language in the coding phase.

The items taken into consideration in the design phase are the different modules which
constitute the system. Control relationships and interfaces among the different modules are
identified. Suitable data structures for the data to be stored need to be properly designed and
documented.

One of the basic steps in the design process involves graphical representation of our main
problem. We use DFDs (data flow diagrams) for graphical representation.

Context level DFD

[Figure: context level DFD. Frequency distribution table -> Huffman Coding -> Huffman codes.]

The Huffman software takes the frequency distribution table as input and computes the
corresponding Huffman codes for each symbol.


Level 1 DFD

[Figure: Level 1 DFD. Processes: 0.1 Take input from user; 0.2 Insert into priority queue;
0.3 Insert into node of tree; 0.4 Generate codes. Data flows: input, valid, priority queue node,
composite node of tree, tree node, symbol, code.]


In the Level 1 DFD we provide a character and its corresponding frequency as input; if the input
is valid, it is inserted into the priority queue. Once all the symbols and their corresponding
frequencies have been inserted, we dequeue the two nodes of lowest frequency from the
priority queue and form a composite node of type tree, which is then reinserted into the
priority queue unless the queue is empty. Once the complete tree is created, we determine the
Huffman code of each symbol by traversing the tree.


Level 2 DFD (0.1)

[Figure: Level 2 DFD for process 0.1. input -> 0.1.1 Take no. of symbols -> valid ->
0.1.2 Enter the symbol -> valid -> 0.1.3 Enter frequency -> valid.]

In this DFD we take the number of symbols as input; the user can enter only integers from 1 to
94. If the user enters any other value, e.g. characters, an error message is displayed and
the user must enter a valid input. After a valid number of symbols has been accepted, the user
must enter the symbol for which the Huffman code is required.
After a valid symbol has been taken, we take the frequency of the symbol. Here only integer
values are accepted; if the user provides anything other than an integer, an error
message is displayed.
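
The three checks of this DFD can be sketched as below. This is a minimal illustration and not
the project's own validation code; the function names are hypothetical, and the sketch assumes
that each input is first read as a string, that a symbol is a single printable non-space ASCII
character, and that a frequency must be positive as stated in the functional requirements.

```cpp
#include <cctype>
#include <string>

// Returns the number parsed from text, or -1 if text is not a run of 1 to 9 decimal digits.
long parse_digits(const std::string& text) {
    if (text.empty() || text.size() > 9) return -1;
    long value = 0;
    for (char ch : text) {
        if (!std::isdigit(static_cast<unsigned char>(ch))) return -1;
        value = value * 10 + (ch - '0');
    }
    return value;
}

// Process 0.1.1: the number of symbols must be an integer from 1 to 94.
bool is_valid_symbol_count(const std::string& text) {
    long count = parse_digits(text);
    return count >= 1 && count <= 94;
}

// Process 0.1.2: a symbol is exactly one printable, non-space ASCII character.
bool is_valid_symbol(const std::string& text) {
    return text.size() == 1 && std::isgraph(static_cast<unsigned char>(text[0])) != 0;
}

// Process 0.1.3: a frequency must be a positive integer.
bool is_valid_frequency(const std::string& text) {
    return parse_digits(text) >= 1;
}
```

With these helpers, inputs such as "a", "0", "-1" and "95" are rejected as a symbol count, and
"asdd" is rejected as a symbol, so the caller can display the error message and ask again.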


Level 2 DFD (0.2)

[Figure: Level 2 DFD for process 0.2. valid (or composite node of tree) ->
0.2.1 Calculate location in Pque -> position -> 0.2.2 Insert node in Pque -> priority queue node.]

In this DFD, the proper location for the valid symbol or composite node is determined and the
node is placed at that location. If the front of the priority queue is null, the node is inserted
at the beginning of the priority queue; otherwise the node is placed at a specific position
depending on its frequency.
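
A sorted insertion of this kind into a singly linked list could look like the sketch below. The
struct QNode and the function enqueue are hypothetical names, loosely modelled on the
description of the priority queue node. Placing a new node after existing nodes of equal
frequency is an assumption of this sketch and may differ from the tie convention used in the
project.

```cpp
#include <string>

// Hypothetical queue node: symbol, frequency and pointer to the next node.
struct QNode {
    std::string symbol;
    int freq;
    QNode* next;
};

// Inserts node so that frequencies stay in non-decreasing order.
// Returns the (possibly new) front of the queue.
QNode* enqueue(QNode* front, QNode* node) {
    // Empty queue, or node belongs before the current front: insert at the beginning.
    if (front == nullptr || node->freq < front->freq) {
        node->next = front;
        return node;
    }
    // Otherwise find the last node whose frequency does not exceed the new one.
    QNode* prev = front;
    while (prev->next != nullptr && prev->next->freq <= node->freq)
        prev = prev->next;
    node->next = prev->next;
    prev->next = node;
    return front;
}
```

Dequeuing the node of lowest frequency is then simply a matter of returning the front node and
advancing front to front->next.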


Level 2 DFD (0.4)

[Figure: Level 2 DFD for process 0.4. symbol -> 0.4.1 Calculate path from root to leaf ->
path found -> 0.4.2 Traverse path assigning 0 to left and 1 to right child -> code.]

Here the input is the symbol for which we have to generate the Huffman code. While traversing
from the root to the leaf (symbol), we assign 0 as we move to a left child and 1 as we move to a
right child.


Data Dictionary

Name                            Description
Composite node                  Tree or priority queue node
Tree                            Symbol, its frequency and pointers to left and right children
Priority queue                  Symbol, its frequency and pointer to next node in the queue
Frequency distribution table    Symbols and their frequencies
Huffman code                    Bit pattern used to represent a symbol
Root                            Root node of the Huffman tree
Leaf                            Symbols entered (having no children)
Path                            Unique sequence of edges from root to a symbol


3.2 Structured design

3.2.1 Data design:

The main operations that the data structure must support are as follows:

- It must represent a binary Huffman tree with nonnegative frequencies.


- It must store a list of internal priority queue nodes in non decreasing order.

The Huffman algorithm uses two main data structures:

1. Priority queue:

A priority queue node holds the symbol, its frequency and the pointer to the next
node in the queue. This data structure is used to create a node in the priority queue, which
is kept in increasing order of frequency. The main functions that manipulate the priority
queue are finding the position for a node in the queue based on its frequency, and dequeuing
the first node of the priority queue for composite node creation.

2. Binary tree:

A binary tree node holds the symbol, the frequency of the character, and
pointers to the right and left children. This data structure is used to create a node of the
Huffman tree. Finally, the Huffman tree is traversed to generate the Huffman
code.


3.2.2 Work break down structure (wbs):

[Figure: work breakdown structure. MAIN at the top; below it WELCOME, INPUT, INSERT, DISPLAY and
ENCODE; below these ENQUE and DEQUE; at the bottom FP (find position).]

The main module calls the input module, which in turn calls the enque module. The enque module
inserts the nodes into the priority queue at the proper position based on frequency by calling
the find position module. The main module then calls the insert module. The insert module calls
the deque module, which returns the lowest priority node. This information is used to create a
composite node of type tree. The sum of the frequencies and the concatenation of the symbols of
the two dequeued nodes are then used as parameters to create a composite node of type
‘priority queue’, which is inserted into the queue unless the queue is empty. The main module
may call the display module to display the symbols in in-order for debugging purposes. Finally,
the encode module is used to generate the Huffman codes for each symbol.


3.2.3 Procedural design

3.2.3.1 Flowcharts:
[Figure: flowchart for main. start -> Enter frequency and symbol -> All symbols entered?
(no: enter the next frequency and symbol; yes: continue) -> Insert symbol in priority queue in
increasing order -> Build Huffman tree -> Traverse Huffman tree to generate code -> stop.]

Flowchart for main

[Figure: flowchart for enque. start -> Create a priority queue node -> Find position ->
Position = Null? (yes: Insert at beginning; no: Insert at specific position) -> stop.]

Flowchart for enque

[Figure: flowchart for deque. start -> Pointer = front -> Front = front->next ->
Return pointer -> stop.]

Flowchart for deque

[Figure: flowchart for display. start -> Root != Null? (yes: Root = left child -> Display root ->
Root = right child).]

Flowchart for display

[Figure: flowchart for insertion. start -> Front != Null? (no: stop; yes: continue) ->
Dequeue two nodes of lowest frequency -> Create composite node of tree -> Adjust nodes of tree.
Two further decisions follow: "Node has children?", with the action "Assign left & right child",
and "Priority queue empty?", with the action "Insert composite node into priority queue" ->
stop.]

Flowchart for insertion

[Figure: flowchart for encode. start -> Symbol != node? (no: stop; yes: continue) ->
Symbol found in left child? (yes: Print 0, Node = left child; no: Print 1, Node = right child) ->
back to the first test.]

Flowchart for encode
3.2.3.2 Algorithms

Algorithm for main :-

Step 1: Begin.
Step 2: Take frequency and symbol as input.
Step 3: Repeat Step 2 until there is no more input.
Step 4: Insert the symbols into the priority queue in increasing order.
Step 5: Build the Huffman tree.
Step 6: Traverse the tree to generate the Huffman code.
Step 7: Stop.

Algorithm for encode :-

Step 1: Begin.
Step 2: While symbol != node
            If symbol is found in the left subtree
                Print 0
                Node = left child
            Else
                Print 1
                Node = right child
            Go to Step 2
Step 3: Stop.
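
The encode algorithm can be sketched as a recursive search that records 0 for each move to a
left child and 1 for each move to a right child. This is only an illustration under stated
assumptions: TreeNode and find_code are hypothetical names, the sketch stores a single character
per leaf, and it searches both subtrees with backtracking instead of testing membership in the
left subtree first as the algorithm above does.

```cpp
#include <string>

// Hypothetical tree node for this sketch; composite nodes carry '\0' as symbol.
struct TreeNode {
    char symbol;
    const TreeNode* left;
    const TreeNode* right;
};

// Searches for symbol below node. On success, path holds the code (0 = left, 1 = right)
// and the function returns true; on failure, path is left as it was on entry.
bool find_code(const TreeNode* node, char symbol, std::string& path) {
    if (node == nullptr) return false;
    if (node->left == nullptr && node->right == nullptr)
        return node->symbol == symbol;            // reached a leaf
    path.push_back('0');                          // try the left child first
    if (find_code(node->left, symbol, path)) return true;
    path.back() = '1';                            // otherwise try the right child
    if (find_code(node->right, symbol, path)) return true;
    path.pop_back();                              // symbol is not in this subtree
    return false;
}
```

If the tree consists of a single leaf, the function succeeds with an empty path, so the caller
has to apply the conventional one-symbol code itself.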

Algorithm for display :-

Step 1: Begin.
Step 2: If root != NULL
            Display (left child of root)
            Display the symbol of root.
            Display (right child of root)
Step 3: Stop.

Algorithm for enque :-

Step 1: Begin.
Step 2: Create a node of the priority queue.
Step 3: Find the position of the symbol in the priority queue.
Step 4: Insert the node at the specific position.
Step 5: Stop.
Algorithm for deque :-

Step 1: Begin.
Step 2: Return the node having the lowest frequency from the priority queue.
Step 3: Update the front of the queue.
Step 4: Stop.

Algorithm for insertion :-

Step 1: Begin.
Step 2: Dequeue two nodes of lowest frequency.
Step 3: Create a composite node of the tree.
Step 4: Adjust the tiny trees.
Step 5: If front != NULL
Step 6:     Insert the composite node into the priority queue and repeat from Step 2.
Step 7: Stop.

4 CODING
/* HUFFMAN CODING (Mini Project) */

/*
   Implemented By :
   Asif Iqbal, Aquib Rahid Pandit, Kaleem Dar & Sahil Sholla
*/

/* header files: */
#include<iostream.h>
#include<string.h>
#include<math.h>
#include<stdlib.h>
#include<conio.h>
#include<ctype.h>
#include<graphics.h>

/* Global declarations */
int n;
char b[94][2];

/* Structure Specifications */

/* Binary Tree Node Specification */
/* 'tree' is a Binary Tree, that holds symbol, its
   frequency and pointers to left and right children */
struct tree
{
    char a[94];
    int s;
    struct tree *left,*right;
}*root=NULL,*tt[47]={NULL},*temp,*temp2,*t2,*ri,*le;

/* Priority Queue Node Specification */
/* 'pqu' is a priority queue node that holds symbol, its
   frequency and pointer to next node in the queue */
struct pqu
{
    int info;
    char a[94];
    struct pqu *ptr;
}*front=NULL,*t,*par,*t1,*p1,*p2;

[Code has been hidden for security reasons]

//main program
void main()
{
    int i;
    welcome();
    input();
    insert();
    //disp(root);
    clrscr();
    cout<<"\n\nThe corresponding codes are...\n\n";
    for(i=0;i<n;i++)
    {
        cout<<b[i] << "==>";
        encode(b[i]);
        cout<<"\t";
    }
    getch();
}

5 TESTING

Our Huffman coding software consists of different modules. An error in any one of these
modules will result in a system error, and if such errors are not debugged the result is a
defective system. This may lead to rejection by the customer at the customer testing phase, and
thus to failure of the project. Testing is therefore an important activity. We did not carry out
separate unit testing, as our program is simple and has few modules. However, we did carry out
phased testing of our code to verify that it works properly. The modules tested in this phase
perform the functions of inputting the frequency table, enqueuing the symbols into the priority
queue at their specific positions according to their frequency (a task performed by the find
position (fp) module), and then dequeuing the symbols from the priority queue and inserting
them into the binary tree.

In this phase we include a debugging module named display, which displays the symbols of the
complete binary tree by in-order traversal. Whenever the result of the display module was not
what was expected, the errors in the code were identified and corrected. We have adopted some
conventions as specified in the SRS document. They are:

1. We assign 0 to the left child and 1 to the right child in the tree.

2. When a new composite node has the same frequency as a symbol already present in the
priority queue, the node that enters the priority queue later is given the higher priority.

3. We assign the symbol with higher frequency as left child and the one with lower frequency as
right child in the tree.

4. When only one symbol is entered, we assign it the code 1.

Taking all the above conventions into consideration, we conducted integration testing using the
phased approach because of the small number of modules.

Test Case (0)

Input:

 Total number of symbols = 6.

 Frequency distribution table as:-

{ a=3,b=3,c=4,d=4,e=5,f=5 }

Expected output:

 ffeefebadcbbaabadcddcc

Result:

 Verification successful

Elaboration of above case

Iteration (1)

Priority queue

front a=3 b=3 c=4 d=4 e=5 f=5

Processing:

for (i=0, z=0)

p1=address of (a) p2=address of (b)

a1=a b1=b

[Figure: composite node ba=6 (labelled temp, root, tt[0] and temp2) with children b=3 and a=3;
the left and right pointers of both children are Null.]

Iteration (2)

Priority queue

front c=4 d=4 e=5 f=5 ba=6

Processing:

for (i=0, z=1)

p1=address of (c) p2=address of (d)

a1=c b1=d

[Figure: composite node dc=8 (labelled temp, root, tt[1] and temp2) with children d=4 and c=4;
the left and right pointers of both children are Null.]

Iteration (3)

Priority queue

front e=5 f=5 ba=6 dc=8

Processing:

for (i=0, z=2)

p1=address of (e) p2=address of (f)

a1=e b1=f

[Figure: composite node fe=10 (labelled temp, root, tt[2] and temp2) with children f=5 and e=5;
the left and right pointers of both children are Null.]

Iteration (4)

Priority queue

front ba=6 dc=8 fe=10

Processing:

for (i=0, z=3)

p1=address of (ba) p2=address of (dc)

a1=ba b1=dc

[Figure: composite node badc=14 (labelled temp, root, tt[2] and temp2) with children ba=6 and
dc=8; ba has children b=3 and a=3, dc has children d=4 and c=4; the left and right pointers of
all four leaves are Null.]
Iteration (5)

Priority queue

front fe=10 badc=14

Processing:

for (i=0, z=4)

p1=address of (fe) p2=address of (badc)

a1=fe b1=badc

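Resulting Huffman tree:

root: febadc=24
left child: fe=10, with children f=5 (left) and e=5 (right)
right child: badc=14
    left child: ba=6, with children b=3 (left) and a=3 (right)
    right child: dc=8, with children d=4 (left) and c=4 (right)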

 The inorder traversal of the final tree yields "ffeefebadcbbaabadcddcc"

This was the expected output, so the routines involved in creating the priority queue and the
Huffman tree are verified to be correct. This completes the phased integration testing.
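The check in Test Case (0) can be reproduced with a short C sketch. The node layout and function names below are hypothetical (the report does not list its structures), and the tree is built by hand from the five iterations rather than by the report's priority-queue routines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node layout: a composite node's label is the
   concatenation of its children's labels, as in the trace. */
struct node {
    const char *label;
    int freq;
    const struct node *left;
    const struct node *right;
};

/* Append the labels of the subtree rooted at n to out,
   visiting left subtree, node, right subtree. */
void inorder(const struct node *n, char *out, size_t cap)
{
    if (n == NULL)
        return;
    inorder(n->left, out, cap);
    strncat(out, n->label, cap - strlen(out) - 1);
    inorder(n->right, out, cap);
}

/* Build the final tree of Test Case (0) and write its inorder string to out. */
void test_case_0_inorder(char *out, size_t cap)
{
    static const struct node a = {"a", 3, NULL, NULL};
    static const struct node b = {"b", 3, NULL, NULL};
    static const struct node c = {"c", 4, NULL, NULL};
    static const struct node d = {"d", 4, NULL, NULL};
    static const struct node e = {"e", 5, NULL, NULL};
    static const struct node f = {"f", 5, NULL, NULL};
    static const struct node ba = {"ba", 6, &b, &a};
    static const struct node dc = {"dc", 8, &d, &c};
    static const struct node fe = {"fe", 10, &f, &e};
    static const struct node badc = {"badc", 14, &ba, &dc};
    static const struct node root = {"febadc", 24, &fe, &badc};

    assert(cap > 0);
    out[0] = '\0';
    inorder(&root, out, cap);
}
```

Calling test_case_0_inorder with a buffer of 64 bytes fills it with ffeefebadcbbaabadcddcc, the string listed as the expected output.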

Once the phased testing was complete, all the other modules were integrated to form
the complete Huffman software. System testing was then conducted according to the following
plan:

We first designed various test cases and then tested the software with each of them. The test
cases used were:

CASE(1)

Input:

 no. of symbols = a

Expected = error message

Result: Success

For test case (1) we added code that checks whether the input for the number of symbols is a
number. This validation uses the isdigit( ) function, which is declared in the ctype library
(ctype.h).

CASE (2)

Input:

 no of symbols = 0

Expected = error message

Result: success

We simply check whether the input for the number of symbols is 0. If it is, we display a message
asking for a non-zero number and then continue the program to take new input.

CASE (3).

Input:

 no. of symbols = -1

Expected = error message


Result: success

tn[0] contains the negative sign, so isdigit( ) returns false and the message asking for a valid
positive integer is displayed again.
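The isdigit( ) validation used in cases (1) and (3) can be sketched as below. The function name is_all_digits is hypothetical, and checking every character (rather than only tn[0]) is an assumption of this sketch, not necessarily what the project code does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if s is a non-empty string of decimal digits, otherwise 0.
   Inputs such as "a" or "-1" are rejected because 'a' and '-' are not digits. */
int is_all_digits(const char *s)
{
    assert(s != NULL);
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; ++s) {
        /* isdigit() expects a value representable as unsigned char. */
        if (!isdigit((unsigned char)*s))
            return 0;
    }
    return 1;
}
```

With this sketch, the string read into tn would be passed to is_all_digits before it is converted to a number.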

CASE(4).

Input:

 no. of symbols = 1

Expected=error message

Result: success

We do not need to execute the Huffman algorithm in this case, because with a single symbol it
would not result in any data compression. Thus we simply exit the program.

CASE (5).
Input:

enter symbol = asdd

Expected = error message

Result: success

strlen( ) is first used to calculate the length of the string entered. If the length is more than 1,
a message is displayed asking the user to enter one symbol only.

CASE (6).

Input:

 no.of symbols = 98

Expected = error message

Result: success

We check the condition n>94, i.e. if the number of symbols specified is more than 94, we display a
message asking the user to enter a number not greater than 94.
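The range checks on the number of symbols (0, 1 and more than 94) can be collected into one function, sketched below. The enum, the function name and the constant MAX_SYMBOLS are hypothetical names chosen for illustration, and the sketch assumes the input has already passed the digit check, so n is never negative.

```c
#include <assert.h>

#define MAX_SYMBOLS 94   /* upper limit stated in the report */

enum count_status {
    COUNT_OK,        /* 2..94: run the Huffman algorithm */
    COUNT_ZERO,      /* 0: ask for a non-zero number */
    COUNT_SINGLE,    /* 1: nothing to compress, exit */
    COUNT_TOO_MANY   /* more than 94: ask for a smaller number */
};

/* Classify an already-parsed, non-negative symbol count. */
enum count_status classify_symbol_count(int n)
{
    assert(n >= 0);
    if (n == 0)
        return COUNT_ZERO;
    if (n == 1)
        return COUNT_SINGLE;
    if (n > MAX_SYMBOLS)
        return COUNT_TOO_MANY;
    return COUNT_OK;
}
```

Returning a status instead of printing inside the check keeps the messages in one place in the caller.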

CASE (7).

Input:

 enter symbol = aa

Expected = error message

Result: success

To eliminate redundancy or ambiguity we include a loop which runs over the symbols entered so
far, comparing the newly input symbol with the previous input symbols, which are stored in
b[ ]. If a match is found, a message is displayed notifying the user that the newly input symbol
has already been entered, and the current loop is exited.
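The symbol checks of cases (5) and (7) can be sketched together as follows. The function and enum names are hypothetical, and the array seen plays the role of b[ ] in the report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum symbol_status { SYMBOL_OK, SYMBOL_BAD_LENGTH, SYMBOL_DUPLICATE };

/* input:      the string typed by the user for one symbol
   seen:       symbols accepted so far (the report stores these in b[])
   seen_count: how many entries of seen are in use */
enum symbol_status check_symbol(const char *input, const char *seen,
                                size_t seen_count)
{
    assert(input != NULL);
    /* Case (5): exactly one character is allowed per symbol. */
    if (strlen(input) != 1)
        return SYMBOL_BAD_LENGTH;
    /* Case (7): compare with every previously entered symbol. */
    for (size_t i = 0; i < seen_count; ++i) {
        if (seen[i] == input[0])
            return SYMBOL_DUPLICATE;
    }
    return SYMBOL_OK;
}
```

The caller would append input[0] to the stored symbols only when SYMBOL_OK is returned, and otherwise display the corresponding message and ask again.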

After all the above validations we performed regression testing of all the previous cases and
verified the results. All the cases behaved as expected, and thus system testing is complete.


Future scope:
The Huffman coding that we have considered is simple binary Huffman coding, but many
variations of Huffman coding exist:

1. n-ary Huffman coding:

The n-ary Huffman algorithm uses the {0, 1, ... , n − 1} alphabet to encode the message and build an
n-ary tree. This approach was considered by Huffman in his original paper. The same algorithm
applies as for binary (n equals 2) codes, except that the n least probable symbols are taken
together, instead of just the 2 least probable. Note that for n greater than 2, not all sets of source
words can properly form an n-ary tree for Huffman coding. Each merge replaces n nodes with one
and so reduces the number of nodes by n − 1, which means the set of source words forms a proper
Huffman tree only if the number of source words is congruent to 1 modulo n − 1. Otherwise,
additional 0-probability placeholders must be added.
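The number of placeholders follows directly from the congruence condition. The sketch below is illustrative only, and the function name placeholders_needed is hypothetical.

```c
#include <assert.h>

/* Number of 0-probability placeholders to add so that
   (num_words + placeholders) is congruent to 1 modulo (n - 1). */
int placeholders_needed(int num_words, int n)
{
    assert(n >= 2 && num_words >= 1);
    int r = (num_words - 1) % (n - 1);
    return r == 0 ? 0 : (n - 1) - r;
}
```

For the six symbols of Test Case (0), a binary code (n = 2) needs no placeholders, while a ternary code (n = 3) needs one, giving seven source words and 7 − 1 = 6, which is divisible by 2.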

2. Adaptive Huffman coding:

A variation called adaptive Huffman coding calculates the probabilities dynamically based on
recent actual frequencies in the source string. This is somewhat related to the LZ family of
algorithms.

3. Huffman template algorithm:

Most often, the weights used in implementations of Huffman coding represent numeric
probabilities, but the algorithm given above does not require this; it requires only a way to order
weights and to add them. The Huffman template algorithm enables one to use any kind of
weights (costs, frequencies, etc.).

4. Length-limited Huffman coding:

Length-limited Huffman coding is a variant where the goal is still to achieve a minimum
weighted path length, but there is an additional restriction that the length of each codeword must
be less than a given constant. The package-merge algorithm solves this problem with a simple
greedy approach very similar to that used by Huffman's algorithm. Its time complexity is O(nL),
where L is the maximum length of a codeword. No algorithm is known to solve this problem in
linear or linearithmic time, unlike the presorted and unsorted conventional Huffman problems,
respectively.

5. Huffman coding with unequal letter costs:

In the standard Huffman coding problem, it is assumed that each symbol in the set that the code
words are constructed from has an equal cost to transmit: a code word whose length is N digits
will always have a cost of N, no matter how many of those digits are 0s, how many are 1s, etc.
When working under this assumption, minimizing the total cost of the message and minimizing
the total number of digits are the same thing. Huffman coding with unequal letter costs is the
generalization in which this assumption no longer holds: the letters of the encoding
alphabet may have non-uniform lengths, due to characteristics of the transmission medium. An
example is the encoding alphabet of Morse code, where a 'dash' takes longer to send than a 'dot',
and therefore the cost of a dash in transmission time is higher. The goal is still to minimize the
weighted average codeword length, but it is no longer sufficient just to minimize the number of
symbols used by the message. No algorithm is known to solve this in the same manner or with
the same efficiency as conventional Huffman coding.
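The difference between counting digits and counting cost can be illustrated with a small C sketch. The function name codeword_cost is hypothetical, and the per-letter costs are passed in as parameters because the report does not fix any values.

```c
#include <assert.h>

/* Total transmission cost of a code word written with '.' (dot) and
   '-' (dash). Returns -1 if any other character is present. */
int codeword_cost(const char *codeword, int dot_cost, int dash_cost)
{
    assert(codeword != 0);
    int total = 0;
    for (; *codeword != '\0'; ++codeword) {
        if (*codeword == '.')
            total += dot_cost;
        else if (*codeword == '-')
            total += dash_cost;
        else
            return -1;
    }
    return total;
}
```

With equal costs (1 and 1), the code words ".." and "--" both cost 2. With an assumed dot cost of 1 and dash cost of 3, the same two code words cost 2 and 6, although both have two letters.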

Moreover, we can extend the range of the Huffman coding software to incorporate Unicode,
which will require an interfacing module that interprets a particular key in different languages
based on the option selected. In that case the maximum number of symbols will be of the order
of 2^32.
