COMPRESSION &
DECOMPRESSION
Submitted in partial fulfillment of the requirement for the
Award of the degree of
Bachelor of Technology
In
Information Technology
By
RAHUL SINGH
SHAKUN GARG
0407713057, 0407713042
CONTENTS
ACKNOWLEDGEMENT 4
CERTIFICATE 5
LIST OF TABLES 6
LIST OF FIGURES 6
ABSTRACT 7-13
MAIN REPORT 20-118
7.2 Code Construction 68
7.3 Huffing Program 68
7.4 Building Table 69
7.5 Decompressing 70
7.6 Transmission & storage of Huffman encoded data 72
8 Working of Project 73
8.1 Module & their description 73
9 Data Flow Diagram 75
10 Print Layouts 82
11 Implementation 85
12 Testing 87
12.1 Test plan 87
12.2 Terms in testing fundamentals 88
13 Conclusion 94
14 Future Enhancement & New Direction 95
14.1 New Direction 95
14.2 Scope of future work 96
14.3 Scope of future application 96
15 Source Code 97-118
16 References 119
ACKNOWLEDGEMENT
Keep away from people who try to belittle your ambitions. Small people always do that,
but the really great make you feel that you too, can become great.
We take this opportunity to express our sincere thanks and deep gratitude to all
those people who extended their wholehearted co-operation and helped us in
completing this project successfully.
First of all, we would like to thank Mr. Gaurav Vajpai (Project Guide) for his
strict supervision, constant encouragement, inspiration and guidance, which ensured the
worthiness of our work. Working under him was an enriching experience. His inspiring
suggestions and timely guidance enabled us to perceive the various aspects of the project in a
new light.
We would also like to thank the Head of the Department of IT, Prof. Jaideep Kumar, who
guided us a lot in completing this project. We would also like to thank our parents and project
mates for guiding and encouraging us throughout the duration of the project.
We would be failing in our mission if we did not thank the other people who directly or
indirectly helped us in the successful completion of this project. So, our heartfelt thanks to
all the teaching and non-teaching staff of the computer science and engineering department of
our institution for their valuable guidance throughout the working of this project.
RAHUL SINGH
SHAKUN GARG
MANISH SRIVASTAVA
Dr. K.N. Modi Institute of Engineering and Technology
Modinagar
Affiliated to UP Technical University, Lucknow
CERTIFICATE
This is to certify that RAHUL SINGH (0407713057), SHAKUN GARG (0407713042) and
MANISH SRIVASTAVA (0407713021) of the final year B. Tech. (IT) have carried out a
project on Compression & Decompression under the guidance of Mr. GAURAV VAJPAI in the
Department of IT, in partial fulfillment of the requirements for the award of the degree.
This is a bonafide record of the work done by them during the year 2007-2008.
Head, Department of IT
LIST OF TABLES
1. FILE TABLE
2. DETAIL TABLE
LIST OF FIGURES
1. ARCHITECTURE OF NETPOD 19
2. PERT CHART 36
3. GANTT CHART 38
COMPRESSION & DECOMPRESSION
In today's world of computing, it is hardly possible to do without graphics, images and sound.
Just by looking at the applications around us, the Internet, the development of Video CDs
(Compact Discs), video conferencing, and much more, we can see that all these applications
use graphics, images and sound.
Many of us have surfed the Internet. Have you ever become so frustrated waiting for a
graphics-intensive web page to open that you stopped the transfer? I bet you have. Guess
what would happen if those graphics were not compressed.
Uncompressed graphics, audio and video data consume a very large amount of physical
storage, which, in the case of uncompressed video, even present CD technology is unable to
handle.
Files available for transfer from one host to another over a network (or via modem) are often
stored in a compressed format or some other special format well-suited to the storage medium
and/or transfer method. There are many reasons for compressing/archiving files. The more
common are:
File compression can significantly reduce the size of a file (or group of files). Smaller files
take up less storage space on the host and less time to transfer over the network, saving both
storage space and transfer time.
The objective of this system is to compress and decompress files. This system will be used to
compress files so that they take less memory for storage and less time for transmission from
one system to another.
Our project will be able to compress a message into a form that can easily be transmitted
over the network or from one system to another. At the receiving end, after decompressing the
message, the receiver will get the original message. This is how effective transmission of data
is achieved.
Reusability:
Reusability is possible as and when we require it in this application, and the application can
be updated in the next version. Reusable software reduces design, coding and testing cost by
amortizing effort over several designs. Reducing the amount of code also simplifies
understanding, which increases the likelihood that the code is correct. We follow both types of
reusability: sharing of newly written code within a project, and reuse of previously written
code on new projects.
Extensibility:
This software can be extended in ways that its original developers may not have expected. The
following principles enhance extensibility: hide data structures, avoid traversing multiple links
or methods, avoid case statements on object type, and distinguish public from private
operations.
Robustness:
A method is robust if it does not fail even when it receives improper parameters. Practices that
support this include protecting against errors, optimizing only after the program runs
correctly, and validating arguments.
Understandability:
A method is understandable if someone other than its creator can understand the code (as can
the creator after a time lapse). We use methods that are small and coherent.
Cost-effectiveness:
The cost is within budget and the system is built within the given time period. It is desirable
to aim for a system with minimum cost, subject to the condition that it must satisfy all the
requirements.
The scope of this document is to put down the requirements, clearly identifying the
information needed by the user, the source of the information and the outputs expected from
the system.
METHODOLOGY ADOPTED
The methodology used is the classic life-cycle model, the WATERFALL MODEL.
HARDWARE & SOFTWARE REQUIREMENTS
HARDWARE SPECIFICATIONS:
RAM: 128 MB or higher
Language: Java
TESTING TECHNOLOGIES
Unit testing
Module testing
Integration testing
System testing
Acceptance testing
UNIT TESTING
Unit testing is the testing of a single program module in an isolated environment.
MODULE TESTING
A module encapsulates related components, so it can be tested without other system modules.
INTEGRATION TESTING
Integration testing is the testing of the interfaces among the system modules. In other words,
it verifies that the modules work together correctly.
SYSTEM TESTING
System testing is the testing of the system against its initial objectives. It is done either in a
simulated environment or in a live environment.
ACCEPTANCE TESTING
Acceptance testing is performed with realistic data of the client to demonstrate that the
software is working satisfactorily. Testing here is focused on the external behavior of the
system.
1. OBJECTIVE
The objective of this system is to compress and decompress files. This system will be used to
compress files so that they take less memory for storage and less time for transmission from
one system to another.
2. SCOPE
Our project will be able to compress a message into a form that can easily be transmitted
over the network or from one system to another. At the receiving end, after decompressing the
message, the receiver will get the original message. This is how effective transmission of data
is achieved.
Reusability:
Reusability is possible as and when we require it in this application, and the application can
be updated in the next version. Reusable software reduces design, coding and testing cost by
amortizing effort over several designs. Reducing the amount of code also simplifies
understanding, which increases the likelihood that the code is correct. We follow both types of
reusability: sharing of newly written code within a project, and reuse of previously written
code on new projects.
Extensibility:
This software can be extended in ways that its original developers may not have expected. The
following principles enhance extensibility: hide data structures, avoid traversing multiple links
or methods, avoid case statements on object type, and distinguish public from private
operations.
Robustness:
A method is robust if it does not fail even when it receives improper parameters. Practices that
support this include protecting against errors, optimizing only after the program runs
correctly, and validating arguments.
Understandability:
A method is understandable if someone other than its creator can understand the code (as can
the creator after a time lapse). We use methods that are small and coherent.
Cost-effectiveness:
The cost is within budget and the system is built within the given time period. It is desirable
to aim for a system with minimum cost, subject to the condition that it must satisfy all the
requirements.
The scope of this document is to put down the requirements, clearly identifying the
information needed by the user, the source of the information and the outputs expected from
the system.
MODULE DESCRIPTION
Huffman Zip
Encoder
Decoder
Table
DLnode
Priority Queue
Huffman Node
Huffman Zip is the main module, which uses an applet; it provides the user interface.
Encoder is the module for compressing the file. It implements the Huffman algorithm for
compressing text and image files. It first calculates the frequencies of all the occurring
symbols. Then, on the basis of these frequencies, it generates the priority queue. This priority
queue is used for finding the symbols with the least frequencies. The two symbols with the
lowest frequencies are deleted from the queue and a new symbol is added to the queue with
frequency equal to the sum of the frequencies of these two symbols. At the same time we grow
a tree in which the leaf nodes are the two deleted nodes and their parent is the new node added
to the queue. Finally we traverse the tree from the root node to the leaf nodes, assigning 0 to
the left child and 1 to the right child. In this way we assign a code to every symbol in the file.
These are binary codes; we then group the bits, compute the equivalent integer values and
write them to the compressed file.
Decoder works in the reverse order to the encoder. It reads the input from the compressed file
and converts it into the equivalent binary code. It has one other input, the binary tree generated
in the encoding process, and on the basis of these data it regenerates the original file.
Table is used for storing the code of each symbol. Priority Queue takes as input the symbols
and their related frequencies and, on the basis of these frequencies, assigns a priority to each
symbol. Huffman Node is used for creating the binary tree. It takes two symbols from the
priority queue and creates two nodes by comparing the frequencies of these two symbols. It
places the symbol with the lower frequency on the left and the symbol with the higher
frequency on the right; it then deletes these two symbols from the priority queue and inserts a
new symbol with frequency equal to the sum of the frequencies of the two deleted symbols. It
also generates a parent node for the two nodes and assigns it a frequency equal to the sum of
the frequencies of the two leaf nodes.
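To make the Encoder flow described above concrete, the following minimal sketch shows the
tree-building step in Java. It is illustrative only: it uses java.util.PriorityQueue instead of the
project's own Priority Queue, DLnode and Huffman Node classes, and the Node class, fields and
method names are assumptions made for this example rather than the project's actual source.

import java.util.PriorityQueue;

// Minimal sketch of the tree-building step described above.
class Node implements Comparable<Node> {
    final int symbol;      // -1 for internal nodes
    final int frequency;
    final Node left, right;

    Node(int symbol, int frequency, Node left, Node right) {
        this.symbol = symbol;
        this.frequency = frequency;
        this.left = left;
        this.right = right;
    }

    public int compareTo(Node other) {
        return Integer.compare(this.frequency, other.frequency);
    }
}

class TreeBuilder {
    // frequencies[b] = number of times byte value b occurs in the input file
    static Node build(int[] frequencies) {
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (int b = 0; b < frequencies.length; b++) {
            if (frequencies[b] > 0) {
                queue.add(new Node(b, frequencies[b], null, null));
            }
        }
        // Repeatedly remove the two least-frequent nodes and replace them with
        // a parent whose frequency is the sum of the two, exactly as described
        // in the Encoder module above.
        while (queue.size() > 1) {
            Node low = queue.poll();   // smaller frequency goes to the left
            Node high = queue.poll();  // larger frequency goes to the right
            queue.add(new Node(-1, low.frequency + high.frequency, low, high));
        }
        return queue.poll();           // root of the Huffman tree
    }
}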
Existing hardware will be used:
Intel Pentium-IV
128 MB RAM
Language: Java
Database: MS Access.
MAIN REPORT
OBJECTIVE AND SCOPE
The objective of this system is to compress and decompress files. This system will be used to
compress files so that they take less memory for storage and less time for transmission from
one system to another.
SCOPE
Our project will be able to compress a message into a form that can easily be transmitted
over the network or from one system to another. At the receiving end, after decompressing the
message, the receiver will get the original message. This is how effective transmission of data
is achieved.
Reusability:
Reusability is possible as and when we require it in this application, and the application can
be updated in the next version. Reusable software reduces design, coding and testing cost by
amortizing effort over several designs. Reducing the amount of code also simplifies
understanding, which increases the likelihood that the code is correct. We follow both types of
reusability: sharing of newly written code within a project, and reuse of previously written
code on new projects.
Extensibility:
This software can be extended in ways that its original developers may not have expected. The
following principles enhance extensibility: hide data structures, avoid traversing multiple links
or methods, avoid case statements on object type, and distinguish public from private
operations.
Robustness:
A method is robust if it does not fail even when it receives improper parameters. Practices that
support this include protecting against errors, optimizing only after the program runs
correctly, and validating arguments.
Understandability:
A method is understandable if someone other than its creator can understand the code (as can
the creator after a time lapse). We use methods that are small and coherent.
Cost-effectiveness:
The cost is within budget and the system is built within the given time period. It is desirable
to aim for a system with minimum cost, subject to the condition that it must satisfy all the
requirements.
The scope of this document is to put down the requirements, clearly identifying the
information needed by the user, the source of the information and the outputs expected from
the system.
THEORETICAL BACKGROUND
Introduction
A brief introduction to information theory is provided in this section, and the relevant
definitions and methods are discussed. A sample string of characters, referred to below as
EXAMPLE, is used to illustrate the concepts.
Theory:
The theoretical background of compression is provided by information theory (which is
closely related to algorithmic information theory) and by rate-distortion theory. These fields
of study were essentially created by Claude Shannon, who published fundamental papers on
the topic in the late 1940s and early 1950s. Doyle and Carlson (2000) wrote that data
compression "has one of the simplest and most elegant design theories in all of engineering".
Cryptography and coding theory are also closely related. The idea of data compression is
deeply connected with statistical inference.
Many lossless data compression systems can be viewed in terms of a four-stage model. Lossy
data compression systems typically include even more stages, including, for example,
prediction and quantization.
The Lempel-Ziv (LZ) compression methods are among the most popular algorithms for
lossless storage. Some variants are optimized for decompression speed and compression ratio,
although compression can be slow. LZW (Lempel-Ziv-Welch) is used in GIF images. LZ
methods utilize a table-based compression model in which table entries are substituted for
repeated strings of data. For most LZ methods, this table is generated dynamically from
earlier data in the input. The table itself is often Huffman encoded (e.g. SHRI, LZX).
The very best compressors use probabilistic models whose predictions are coupled to an
algorithm called arithmetic coding. Arithmetic coding, invented by Jorma Rissanen and
turned into a practical method by Witten, Neal, and Cleary, achieves superior compression to
the better-known Huffman algorithm and lends itself especially well to adaptive data
compression tasks where the predictions are strongly context-dependent.
Definition:
In computer science and information theory, data compression or source coding is the process
of encoding information using fewer bits (or other information-bearing units) than an
unencoded representation would use, through the use of specific encoding schemes. For
example, this article could be encoded with fewer bits if one were to accept the convention
that the word "compression" be encoded as "comp". One popular instance of compression with
which many computer users are familiar is the ZIP file format, which, as well as providing
compression, acts as an archiver, storing many files in a single output file.
As is the case with any form of communication, compressed data communication only works
when both the sender and receiver of the information understand the encoding scheme. For
example, this text makes sense only if the receiver understands that it is intended to be
interpreted as characters representing the English language. Similarly, compressed data can
only be understood if the decoding method is known by the receiver.
Compression is useful because it helps reduce the consumption of expensive resources, such
as hard disk space or transmission bandwidth. On the downside, compressed data must be
decompressed to be viewed (or heard), and this extra processing may be detrimental to some
applications. For instance, a compression scheme for video may require expensive hardware
for the video to be decompressed fast enough to be viewed as it's being decompressed (you
always have the option of decompressing the video in full before you watch it, but this is
inconvenient and requires storage space for the uncompressed video). The design of data
compression schemes therefore involves trade-offs among various factors, including the
degree of compression, the amount of distortion introduced (if using a lossy compression
scheme), and the computational resources required to compress and uncompress the data.
A code is a mapping of source messages (words from the source alphabet alpha) into
codewords (words of the code alphabet beta). The source messages are the basic units into
which the string to be represented is partitioned. These basic units may be single symbols
from the source alphabet, or they may be strings of symbols. For the string EXAMPLE, alpha
is its set of source symbols. Codes may be categorized by the lengths of their source messages
and codewords: block-block indicates that the source messages and codewords are of fixed
length, while variable-variable indicates that both are of variable length.
A block-block code for EXAMPLE is shown in Figure 1.1 and a variable-variable code is
given in Figure 1.2. If the string EXAMPLE were coded using the Figure 1.1 code, the length
of the coded message would be 120; using Figure 1.2 the length would be 30.
Figure 1.1 (block-block code)      Figure 1.2 (variable-variable code)
a      000                         aa      0
b      001                         bbb     1
c      010                         cccc    10
d      011                         ddddd   11
The oldest and most widely used codes, ASCII and EBCDIC, are examples of block-block
codes, mapping an alphabet of 64 (or 256) single characters onto 6-bit (or 8-bit) codewords.
These are not discussed further, as they do not provide compression. The codes featured here
are of the block-variable, variable-variable and variable-block types.
When source messages of variable length are allowed, the question arises of how a message
ensemble (sequence of messages) is parsed into individual messages. Many of the
algorithms described here are defined-word schemes; that is, the set of source messages is
determined prior to the invocation of the coding scheme. For example, in text file processing
each character may constitute a message, while in Pascal source code each token may
represent a message. All codes involving fixed-length source messages are, by default,
defined-word codes. In free-parse methods, the coding algorithm itself parses the ensemble
into variable-length sequences of symbols. Most of the known data compression methods are
defined-word schemes; the free-parse model differs in a fundamental way from the classical
coding paradigm.
A code is distinct if each codeword is distinguishable from every other (i.e., the mapping from
source messages to codewords is one-to-one). A distinct code is uniquely decodable if every
codeword is identifiable when immersed in a sequence of codewords. Clearly, each of
these features is desirable. The codes of Figure 1.1 and Figure 1.2 are both distinct, but the
code of Figure 1.2 is not uniquely decodable. For example, the coded message 11 could be
decoded as either ddddd or bbbbbb. A uniquely decodable code is a prefix code (or prefix-free
code) if it has the prefix property, which requires that no codeword is a proper prefix of any
other codeword. All uniquely decodable block-block and variable-block codes are prefix
codes. The code with codewords { 1, 100000, 00 } is an example of a code which is uniquely
decodable but which does not have the prefix property. Prefix codes are instantaneously
decodable; that is, they have the desirable property that the coded message can be parsed into
codewords without the need for lookahead. In order to decode a message encoded using the
codeword set { 1, 100000, 00 }, lookahead is required. For example, the first codeword of the
message 1000000001 is 1, but this cannot be determined until the last (tenth) symbol of the
message is read (if the string of zeros had been of odd length, the first codeword would have
been 100000).
A minimal prefix code is a prefix code such that, if x is a proper prefix of some codeword,
then x followed by sigma is either a codeword or a proper prefix of a codeword, for each letter
sigma in beta. The set of codewords { 00, 01, 10 } is an example of a prefix code which is not
minimal. The fact that 1 is a proper prefix of the codeword 10 requires that 11 be either a
codeword or a proper prefix of a codeword, and it is neither. Intuitively, the minimality
constraint prevents the use of codewords which are longer than necessary. In the above
example the codeword 10 could be replaced by the codeword 1, yielding a minimal prefix
code with shorter codewords. The codes discussed here are all minimal prefix codes.
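As a small illustration of the prefix property just described, the following Java snippet (not
part of the project code; the class and method names are invented for this example) checks
whether a set of codewords is prefix-free.

// A set of codewords is a prefix code if no codeword is a proper prefix of any other.
class PrefixCheck {
    static boolean isPrefixCode(String[] codewords) {
        for (String a : codewords) {
            for (String b : codewords) {
                if (a != b && b.startsWith(a)) {
                    return false;   // a is a proper prefix of b
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // { 1, 100000, 00 } is uniquely decodable but not a prefix code
        System.out.println(isPrefixCode(new String[] {"1", "100000", "00"})); // false
        // { 00, 01, 10 } from the text above is a (non-minimal) prefix code
        System.out.println(isPrefixCode(new String[] {"00", "01", "10"}));    // true
    }
}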
In this section, a code has been defined to be a mapping from a source alphabet to a code
alphabet; we now define related terms. The process of transforming a source ensemble into a
coded message is called encoding, and the result is an encoding of the source ensemble. The
algorithm which constructs the mapping and uses it to transform the source ensemble is called
the encoder. The decoder performs the inverse operation.
Lossless compression algorithms usually exploit statistical redundancy in such a way as to
represent the sender's data more concisely, but nevertheless perfectly. Lossless compression is
possible because most real-world data has statistical redundancy. For example, in English
text the letter 'e' is much more common than the letter 'z', and the probability that the letter 'q'
will be followed by the letter 'z' is very small.
Another kind of compression, called lossy data compression, is possible if some loss of
fidelity is acceptable. For example, a person viewing a picture or television video scene might
not notice if some of its finest details are removed or not represented perfectly (i.e. may not
even notice compression artifacts). Similarly, two clips of audio may be perceived as the
same by a listener even though one is missing details found in the other. Lossy data
compression algorithms introduce relatively minor differences and represent the picture,
video, or audio using fewer bits.
Lossless compression schemes are reversible, so the original data can be reconstructed exactly,
while lossy schemes accept some loss of data in order to achieve higher compression.
However, lossless data compression algorithms will always fail to compress some files;
indeed, any compression algorithm will necessarily fail to compress any data containing no
discernible patterns. Attempts to compress data that has already been compressed will
usually result in expansion, as will attempts to compress encrypted data.
In practice, lossy data compression will also come to a point where compressing again does
not work, although an extremely lossy algorithm, which for example always removes the last
byte of a file, will always compress a file up to the point where it is empty.
A good example of lossless vs. lossy compression is the following string: 888883333333.
What you just saw was the string written in an uncompressed form. However, you could save
space by writing it as 8[5]3[7]. By saying "5 eights, 7 threes", you still have the original string,
just written in a smaller form. In a lossy system, using 83 instead, you cannot get the original
string back, although the representation is even smaller.
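The 8[5]3[7] example above is essentially run-length encoding. The following short Java
sketch, written only to illustrate that idea and not part of the project, reproduces the encoding;
the class and method names are invented for this example.

// Each run of a repeated character is written as the character followed by its count in brackets.
class RunLength {
    static String encode(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int run = 1;
            while (i + run < input.length() && input.charAt(i + run) == c) {
                run++;
            }
            out.append(c).append('[').append(run).append(']');
            i += run;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("888883333333")); // prints 8[5]3[7]
    }
}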
Image compression:
Image here refers not only to still images but also to motion pictures, and compression is the
process of reducing the amount of data needed to represent them.
Compression is simply representing information more efficiently; "squeezing the air" out of
the data, so to speak. It takes advantage of three common qualities of graphical data.
Today, compression has made a great impact on the storing of large volumes of image data.
Even hardware and software for compression and decompression are increasingly being made
part of a computer platform. Compression does have its trade-offs: the more efficient the
compression technique, the more complicated the algorithm will be, and thus the more
computational resources or time it requires to decompress. This tends to affect speed. Speed is
not so important for still images but matters a lot for motion pictures. Surely you do not want
to see your favourite movies appearing frame by frame in front of you.
Most methods for irreversible, or "lossy", digital image compression consist of three main
steps: transformation, quantization and coding.
(Figure: the three steps of digital image compression.)
Image compression is the application of Data compression on digital images. In effect, the
objective is to reduce redundancy of the image data in order to be able to store or transmit
Image compression can be lossy or lossless. Lossless compression is sometimes preferred for
artificial images such as technical drawings, icons or comics. This is because lossy
compression methods, especially when used at low bit rates, introduce compression artifacts.
Lossless compression methods may also be preferred for high value content, such as medical
imagery or image scans made for archival purposes. Lossy methods are especially suitable for
natural images such as photos in applications where minor (sometimes imperceptible) loss of
fidelity is acceptable in order to achieve a substantial reduction in bit rate.
The best image quality at a given bit-rate (or compression rate) is the main goal of image
compression. However, there are other important properties of image compression schemes.
Scalability generally refers to a quality reduction achieved by manipulation of the bitstream
or file (without decompression and re-compression). Other names for scalability are
progressive coding or embedded bitstreams. Despite its contrary nature, scalability can also
be found in lossless codecs, usually in the form of coarse-to-fine pixel scans. Scalability is
especially useful for previewing images while downloading them (e.g. in a web browser) or
for providing variable-quality access to e.g. databases. There are several types of scalability,
such as quality-progressive, resolution-progressive and component-progressive coding.
Region of interest coding: certain parts of the image are encoded with higher quality than
others. This can be combined with scalability (encode these parts first, others later).
Meta information: compressed data can contain information about the image which can be
used to categorize, search or browse images. Such information can include color and texture
statistics, small preview images, and author or copyright information.
The quality of a compression method is often measured by the peak signal-to-noise ratio,
which measures the amount of noise introduced through a lossy compression of the image.
However, the subjective judgement of the viewer is also regarded as an important, perhaps
the most important, measure.
Video Compression:
A raw video stream tends to be quite demanding when it comes to storage requirements, and
it places heavy demands on network capacity when being transferred between computers, so it
is usually compressed before being stored or transmitted.
When compressing an image sequence, one may consider the sequence a series of
independent images and compress each frame using single-image compression methods, or
one may use specialized video sequence compression schemes that take advantage of
similarities in nearby frames. The latter will generally compress better, but may complicate
random access to individual frames and editing of the sequence.
Compression algorithms may be classified into two main groups, reversible and irreversible.
If the result of compression followed by decompression gives a bitwise exact copy of the
original for every compressed image, the method is reversible. This implies that no
quantizing is done, and that the transform is accurately invertible, i.e. it does not introduce
round-off errors.
When compressing general data, like an executable program file or an accounting database, it
is extremely important that the data can be reconstructed exactly. For images and sound, it is
often convenient, or even necessary, to allow a certain degradation, as long as it is not too
noticeable to an observer.
Text compression:
The following methods yield two basic data compression algorithms which give good
compression in practice.
The first strategy is a statistical encoding that takes into account the frequencies of symbols
to build a uniquely decipherable code that is optimal with respect to the compression criterion.
Huffman's method (1951) provides such an optimal statistical coding. It admits a dynamic
version in which symbol counting is done at coding time; the command "compact" of UNIX
implements this dynamic version.
Ziv and Lempel (1977) designed a compression method based on encoding segments. These
segments are stored in a dictionary that is built during the compression process. When a
segment of the dictionary is encountered later while scanning the original text, it is substituted
by its index in the dictionary. In the model where portions of the text are replaced by pointers
to previous occurrences, the Ziv and Lempel compression scheme can be proved to be
asymptotically optimal (on large enough texts satisfying suitable conditions on the probability
distribution of symbols). The dictionary is the central point of the algorithm, and a hashing
technique makes its implementation efficient. This technique, improved by Welch, is
implemented by the "compress" command of UNIX.
The problems and algorithms discussed above give a sample of text processing methods.
Several other algorithms improve on their performance when, for example, the memory space
or the number of processors of a parallel machine are taken into account. The methods also
extend to other kinds of data.
LZW ALGORITHM
Compressor algorithm:
w = NIL;
while (read a char c) do
if (wc exists in dictionary) then
w = wc;
else
add wc to the dictionary;
output the code for w;
w = c;
endif
done
output the code for w;
Decompressor algorithm:
read a code k;
output dictionary entry for k;
w = dictionary entry for k;
while (read a code k) do
if (code k exists in dictionary) then
entry = dictionary entry for k;
else if (k == next unused dictionary index) then
entry = w + w[0];
else
signal invalid code;
endif
output entry;
add w + entry[0] to the dictionary;
w = entry;
done
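The pseudocode above can be turned into Java (the language used for this project) fairly
directly. The sketch below is an illustrative compressor only: it works on a String, emits a list
of integer codes rather than a packed bit stream, and the class and method names are
assumptions made for this example.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Compact sketch of the LZW compressor pseudocode above.
class LzwCompress {
    static List<Integer> compress(String input) {
        Map<String, Integer> dictionary = new HashMap<>();
        for (int i = 0; i < 256; i++) {            // initial single-character entries
            dictionary.put(String.valueOf((char) i), i);
        }
        int nextCode = 256;
        List<Integer> output = new ArrayList<>();
        String w = "";
        for (char c : input.toCharArray()) {
            String wc = w + c;
            if (dictionary.containsKey(wc)) {
                w = wc;                            // keep extending the current match
            } else {
                output.add(dictionary.get(w));     // emit the code for w
                dictionary.put(wc, nextCode++);    // add the new string wc
                w = String.valueOf(c);
            }
        }
        if (!w.isEmpty()) {
            output.add(dictionary.get(w));         // emit the code for the final w
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(compress("TOBEORNOTTOBEORTOBEORNOT"));
    }
}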
DEFINITION OF THE PROBLEM
Problem Statement:
In today's world of computing, it is hardly possible to do without graphics, images and sound.
Just by looking at the applications around us, the Internet, the development of Video CDs
(Compact Discs), video conferencing, and much more, we can see that all these applications
use graphics, images and sound.
Many of us have surfed the Internet; have you ever become so frustrated waiting for a
graphics-intensive web page to open that you stopped the transfer? I bet you have. Guess
what would happen if those graphics were not compressed.
Uncompressed graphics, audio and video data consume a very large amount of physical
storage which, in the case of uncompressed video, even present CD technology is unable to
handle.
CASE 1
Take, for instance, displaying TV-quality full-motion video: how much physical storage will
be required? Szuprowicz states that "TV-quality video requires 720
kilobytes per frame (kbpf) displayed at 30 frames per second (fps) to obtain a full-motion
effect, which means that one second of digitised video consumes approximately 22 MB
(megabytes) of storage. A standard CD-ROM disk with 648 MB capacity and data transfer
rate of 150 KBps could only provide a total of 30 seconds of video and would take 5 seconds
to display a single frame." Based on Szuprowicz's statement we can see that this is clearly
unacceptable.
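As a quick check of the quoted figures: 720 KB per frame multiplied by 30 frames per second
is 21,600 KB, roughly 21.6 MB, which is where the "approximately 22 MB" per second comes
from; dividing the 648 MB disk capacity by about 21.6 MB per second gives roughly 30
seconds of video, matching the statement above.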
Transmission of uncompressed graphics, audio and video is a problem too. Expensive cables
with high bandwidth are required to achieve satisfactory result, which is not feasible for the
general market.
CASE 2
Take, for example, the transmission of an uncompressed audio signal over the line for one
second:
From the table we can see that for better quality of sound transmitted over the channel, both
the bandwidth and the storage requirement increase, and the sizes are not feasible at all.
Thus, to provide feasible and cost-effective solutions, most multimedia systems use
compression.
Therefore, this paper addresses one specific compression standard, JPEG. At the same time, it
also goes through the basic compression techniques that serve as its foundation.
This paper focuses on three forms of JPEG image compression: 1) Baseline Lossy JPEG,
2) Progressive JPEG and 3) Motion JPEG. The algorithm, characteristics and advantages of
each will be gone through.
I hope that by the end of the paper the reader will gain more knowledge of JPEG, understand
how it works, and not just know it as another image compression standard.
Analysis and design refer to the process of examining a business situation with the intent of
improving it through better procedures and methods. This involves two phases:
Analysis
Design
ANALYSIS:
Create a system definition that forms the foundation for all subsequent engineering work.
Both hardware and software expertise are required to successfully attain the objectives listed
above.
DESIGN
The most creative and challenging phase of the system life cycle is system design. The term
design describes both the final system and the process by which it is developed. It refers to
the technical specifications (analogous to an engineer's blueprints) that will be applied in
implementing the candidate system. It also includes the construction of programs and
program testing. The key question here is: how should the problem be solved? The major
design steps are described below.
The first step is to determine how the output is to be produced and in what format; samples of
the output (and input) are also presented. Second, input data and master files (database) have
to be designed to meet the requirements of the proposed output. The operational (processing)
phases are handled through program construction and testing, including a list of the programs
needed to meet the system's objectives, together with complete documentation. Finally, details
related to the justification of the system and an estimate of the impact of the candidate system
on the user and the organization are documented and evaluated by management as a step
towards implementation.
The final report prior to the implementation phase includes procedural flowcharts, record
layouts, report layouts, and workable plans for implementing the candidate system.
Information on personnel, money, h/w, facilities and their estimated cost must also be
available. At this point, projected costs must be close to actual cost of implementation.
In some firms, separate groups of programmers do the programming, whereas other firms
employ analyst-programmers who do the analysis and design as well as code the programs. For
this discussion, we assume that two separate persons carry out analysis and programming.
There are certain functions, though, that the analyst must perform while programs are being
written.
SYSTEM DESIGN:
Software design sits at the technical kernel of software engineering and is applied regardless
of the software process model that is used. Beginning once software requirements have been
analyzed and specified, software design is the first of the three technical activities, design,
code generation and test, that are required to build and verify the software. Each activity
transforms information in a manner that ultimately results in validated computer software.
The importance of design can be stated with a single word: quality. Design is the place where
quality is fostered in software engineering. Design provides us with representations of
software that can be assessed for quality. Design is the only way we can accurately translate a
customer's requirements into a finished software product or system. Software design serves as
the foundation for all the software engineering and software support steps that follow. Without
design we risk building an unstable system, one that will fail when small changes are made,
one that may be difficult to test, and one whose quality cannot be assessed until late in the
software process, when time is short and much money has already been spent.
DESIGN OBJECTIVES:
The design phase of software development deals with transforming the customer
requirements, as described in the SRS document, into a form implementable using a
programming language. We can broadly classify the design activities into two important parts:
High-level design
Detailed design
During high-level design, the different modules and the control relationships among them are
identified, and the interfaces among these modules are defined. The outcome of high-level
design is called the program structure or software architecture; the structure chart is used to
represent it.
During detailed design, the data structures and the algorithms used by the different modules
are designed. The outcome of detailed design is usually known as the module specification
document.
A good design should capture all the functionality of the system correctly. It should also be
easily understandable; understandability is a major criterion when judging the goodness of a
design, since a design that is easily understandable is also easy to maintain and change.
In order to enhance the understandability of a design, it should have the following features.
Clean decomposition of the problem into modules facilitates taking advantage of the
divide-and-conquer principle: if different modules are almost independent of each other, then
each module can be understood separately.
Clean decomposition of a design problem into modules means that the modules in the software
are largely independent of each other. The primary characteristics of clean decomposition are
high cohesion and low coupling.
A module having high cohesion and low coupling is said to be functionally independent of
other modules. By functional independence we mean that a cohesive module has minimal
interaction with other modules. Functional independence reduces error propagation: an error
existing in one module does not directly affect other modules, and errors existing in other
modules do not directly affect this module.
Reuse of a module is possible because each module performs some well-defined and precise
function, and its interface with other modules is simple and minimal. The complexity of the
design is also reduced, because different modules can be understood in isolation.
DESIGN PRINCIPLES:
Modularity
Abstraction
A system consists of components which have components of their own; indeed, a system is a
hierarchy of components. To design such hierarchies there are two possible approaches:
top-down and bottom-up. The top-down approach starts from the highest-level component of
the hierarchy and proceeds through to lower levels. By contrast, a bottom-up approach starts
with the lowest-level components of the hierarchy and proceeds through progressively higher
levels to the top-level component.
Top-down design methods often result in some form of stepwise refinement. Starting from
an abstract design, in each step the design is refined to a more concrete level, until we reach a
level where no more refinement is needed and the design can be implemented directly.
Bottom-up methods work with layers of abstraction. Starting from the very bottom, operations
that provide a layer of abstraction are implemented. The operations of this layer are then used
to implement more powerful operations and a still higher layer of abstraction, until the stage is
reached where the operations supported by the layer are those desired by the system.
MODULARITY:
The real power of partitioning comes if a system is partitioned into modules so that the
modules are solvable and modifiable separately. It is even better if the modules are also
independent, so that each component can be implemented separately and a change to one
component affects few other components. Modularity helps in system repair, since changing a
part of the system is easy as it affects few other parts, and in system building, since a modular
system can easily be built by putting its modules together.
ABSTRACTION:
Abstraction is a tool that permits a designer to consider a component at an abstract level
without worrying about the details of its implementation. Any component or system provides
some services to its environment, and an abstraction describes the external behavior of that
component without bothering with the internal details that produce the behavior. Presumably,
the abstract definition of a component is much simpler than the component itself.
There are two common abstraction mechanisms for software systems: functional abstraction
and data abstraction. In functional abstraction, a module is specified by the function it
performs; for example, a module to compute the log of a value can be abstractly represented
by the function log. Similarly, a module to sort an input array can be represented by the
specification of sorting. Functional abstraction is the basis of partitioning in function-oriented
approaches: when the problem is being partitioned, the overall transformation function for the
system is partitioned into smaller functions that comprise the system function. The
decomposition of the system is then in terms of functional modules.
The second mechanism is data abstraction. Data abstraction treats data together with the
operations defined on it as a unit providing some services. Hence, the decomposition of the
system is done with respect to the data and the operations on the data.
When solving a small problem, the entire problem can be tackled at once. For solving larger
problems, the basic principle is the time-tested principle of divide and conquer. Clearly,
dividing in such a manner that all the divisions have to be conquered together is not the intent
of this wisdom. This principle, if elaborated, would mean: divide into smaller pieces, so that
each piece can be conquered separately.
Problem partitioning, which is essential for solving a complex problem, leads to hierarchies
in the design. That is, the design produced by using problem partitioning can be represented
as a hierarchy of components. The relationship between the elements in this hierarchy can
vary depending on the method used. For example, the most common is the whole-part
relationship: the system consists of some parts, each part consists of subparts, and so on. This
relationship can be naturally represented as a hierarchical structure between the various
system parts. In general a hierarchical structure makes it much easier to comprehend a
complex system. Because of this, all design methodologies aim to produce a design that has a
nice hierarchical structure.
Requirement Determination
A system is intended to meet the needs of an organization, in this case to save storage capacity.
Thus the first step in the design is to specify these needs or requirements, that is, to determine
the requirements to be met by the system. Having done this, meetings with the various
departments are held and, through discussions, priorities among the various applications are
determined, subject to the constraints of available computer memory, bandwidth and the time
available for development.
Requirement Specification
The top management of an organization first decides that a compression and decompression
system would be desirable to improve the operations of the organization. Once this basic
decision is taken, a system analyst is consulted. The first job of the system analyst is to
understand the existing system. During this stage he studies the various aspects of the
algorithms and data structures involved. Based on this he identifies which aspects of the
operations of the project need changes. The analyst discusses this with the users and
determines the areas where changes would be effective. The applications where file transfer is
involved are checked. It is important to get the users involved from the initial stages of the
development of an application.
Feasibility Analysis
Having drawn up the rough specification, the next step is to check whether it is feasible to
implement the system. A feasibility study takes into account various constraints within which
the system should be implemented and operated. The resources needed for implementation
such as computing equipment, manpower and cost are estimated, based on the specifications
of the users' requirements. These estimates are compared with the available resources. A
comparison of the cost of the system and the benefits which will accrue is also made. This
document, known as the feasibility report, is given to the management of the organization.
Final Specifications
The developer of this s/w studies this feasibility report and suggests modifications in the
requirements, if any. Knowing the constraints on available resources, and the modified
requirements specified by the organization, the final specifications of the system to be
developed are drawn up by the system analyst. These specifications should be in a form
which can be easily understood by the users. The specifications state what the system will
achieve; they do not describe how the system will do it. These specifications are given back
to the users, who study them, consult their colleagues and offer suggestions to the systems
analyst for appropriate changes. These changes are incorporated by the system analyst and a
new set of specifications is given back to the users. After discussions between the system
analyst and the users, the final specifications are drawn up and approved for implementation.
Along with this, criteria for system approval are specified, which will be used to evaluate the
system after implementation.
Hardware Study
System Design
The next step is to develop the logical design of the system. The inputs to the system design
phase are the functional specifications of the system and details of the computer
configuration. During this phase the logic of the programs is designed, and program test
plans and an implementation plan are drawn up. The system design should begin from the
outputs required of the system.
System Implementation
The next phase is implementation of the system. In this phase all the programs are written,
user operational document is written, users are trained, and the system tested with operational
data.
System Evaluation
After the system has been in operation for a reasonable period, it is evaluated and a plan for
its improvement is drawn up. This is the system life cycle. The shortcomings of a system,
namely the gap between what a user expected from the system and what he actually got, are
realized only after a system has been used for a reasonable time. Similarly, the shortcomings of
this system will be realized only after it has been in use for some time.
System Modification
Modifications after installation definitely cost time and money, but users expect modifications
to be made as their needs evolve. Further, systems designed for use by clients cannot be static.
These systems are intended for real-world problems, and the environment in which an activity
is conducted never remains static. New requirements appear, and new, more efficient
algorithms emerge as research goes on. A system that cannot adapt to the changing needs of an
organization is a bad system; a system should be designed for change. The strength of a good
computer-based system is that it is amenable to change. A good system designer is one who
can foresee which aspects of a system are likely to change and designs the system in a flexible
manner to accommodate such changes.
SYSTEM PLANNING
A system, just like a living system or a new product, needs planning. System analysis and
design are keyed to this planning, and the analyst must progress from one stage to another
methodically.
RECOGNITION OF NEED
One must know what the problem is before it can be solved. The basis for a candidate system
is the recognition of a need for improving the existing system and a judgement about whether
an alternative system can solve the problem. It entails looking into duplication of effort,
bottlenecks and inefficient existing procedures, or whether parts of the existing system would
be candidates for improvement.
FEASIBILITY STUDY:
Many feasibility studies are disillusioning for both users and analysts. First, the study often
presupposes that when the feasibility document is being prepared, the analyst is in a position
to evaluate solutions. Second, most studies tend to overlook the confusion inherent in system
development: the constraints and the assumed attitudes. If the feasibility study is to serve as a
decision document, it must answer key questions:
Is there a new and better way to do the job that will benefit the user?
What is recommended?
The most successful system projects are not necessarily the biggest or most visible in a
business, but rather those that truly meet user expectations. More projects fail because of
inflated expectations than for any other reason. Three key considerations are involved in a
feasibility analysis:
Economic feasibility
Technical feasibility
Operational feasibility
1. ECONOMIC FEASIBILITY:
Economic analysis is the most frequently used method for evaluating the effectiveness of a
candidate system. More commonly known as cost/benefit analysis, the procedure determines
the benefits and savings that are expected from the system and compares them with the costs.
If the benefits outweigh the costs, the decision is made to design and implement the system.
Otherwise, further justification or alteration of the proposed system will have to be made if it
is to have a chance of being approved. This is an ongoing effort that improves in accuracy at
each phase of the system life cycle.
1. Hardware Cost:
It relates to the actual purchase or lease of the computer and peripherals (for example, printer,
disk drive, tape unit). Determining the actual cost of hardware is generally more difficult when
the system is shared by various users than for a dedicated stand-alone system. In some cases,
the best way to control this cost is to treat it as an operating cost.
In this system we are taking it as an operating cost so as to minimize the initial investment.
2. Personnel Cost:
It includes EDP staff salaries and benefits (health insurance, vacation time, sick pay, etc.) as
well as pay for those involved in developing the system. Costs incurred during the
development of a system are one-time costs and are labeled development costs. Once the
system is installed, the costs of operating and maintaining the system become recurring costs.
Facility costs are expenses incurred in the preparation of the physical site where the
application or the computer will be in operation. This includes wiring, flooring, acoustics,
lighting and air conditioning. These costs are treated as one-time costs and are incorporated
into the overall cost estimate of the candidate system.
Our proposed system incurs only wiring cost; nowadays all the sites are well maintained in
terms of flooring and lighting, so it would not incur extra expense.
Operating cost includes all costs associated with the day-to-day operation of the system; the
amount depends on the number of shifts, the nature of the applications, and the calibre of the
operating staff. There are various ways of covering operating costs. One approach is to treat
the operating cost as overhead. Another approach is to charge each authorized user for the
amount of processing they request from the system. The amount charged is based on computer
time, staff time, and the volume of the output produced. In any case, some accounting of
operating costs has to be done.
As our candidate system is not so big, we require only one server and a few terminals for
maintaining and processing data. Their costs can easily be determined at the installation time
of the proposed system. As the computer is a machine it also depreciates, and by using any of
the depreciation methods we can determine its annual cost.
Supply costs are variable costs that increase with increased use of paper, ribbons, disks, and
the like. They should be estimated and included in the overall cost of the system.
A system is also expected to provide benefits. The first task is to identify each benefit and
then assign a monetary value to it for cost/benefit analysis. Benefits may be tangible or
intangible, direct or indirect. The two major benefits are improving performance and
minimizing the cost of processing. The performance category emphasizes improvement in the
accuracy of, or access to, information and easier access to the system by authorized users.
Minimizing costs through an efficient system, error control or reduction of staff, is a benefit
that should be measured and included in the cost/benefit analysis.
This cost in our proposed system depends on the number of customers, so sometimes it is
more and sometimes it is less. It is not very easy to estimate this cost; what we can do is make
a rough estimate and, when the system is installed at a client site, compare this rough estimate
with the actual expenses incurred for supplies.
2. TECHNICAL FEASIBILITY:
Technical feasibility centres on the existing computer system (hardware, software, etc.) and
to what extent it can support the proposed addition. For example, if the current computer is
operating at 80 percent capacity, an arbitrary ceiling, then running another application could
overload the system or require additional hardware. This involves financial considerations to
accommodate technical enhancements. If the budget is a serious constraint, then the project is
judged not feasible.
Presently, at our client's side, all the work is done manually, so the question of overloading
the system or requiring additional hardware does not arise; thus our candidate system is
technically feasible.
3. OPERATIONAL FEASIBILITY:
People are inherently resistant to change, and computers have been known to facilitate change.
An estimate should be made of how strong a reaction the user staff is likely to have towards
the development of a computerized system. It is common knowledge that computer
installations have something to do with turnover, transfers, retraining, and changes in
employee job status. Therefore, the introduction of a candidate system requires special effort
to educate, sell, and train the staff on new ways of conducting business.
There is no doubt that people are inherently resistant to change, and computers have been
known to facilitate change. In today's world most work is computerized, and people only
benefit from computerization. As far as our system is concerned, it is only going to benefit the
users in their daily routine work. There is no danger of anyone losing their job or not getting
proper attention after the installation of our proposed system.
REQUIREMENT ANALYSIS
Analysis is a detailed study of the various operations performed by a system and their
relationships within and outside the system. One aspect of analysis is defining the boundaries
of the system and determining whether or not a candidate system should consider other
related systems. During analysis, data are collected on the available files, decision points and
transactions handled by the present system; interviews, on-site observations and questionnaires
are examples of the tools used. The interview is a commonly used tool in analysis. It requires
special skills and sensitivity to the subjects being interviewed. Bias in data collection and
interpretation can be a problem, so training, experience and common sense are required for
collecting the information needed for analysis.
Once analysis is completed, the next step is to decide how the problem might be solved. Thus,
in system design, we move from the logical to the physical aspects of the system life cycle.
HARDWARE & SOFTWARE REQUIREMENTS
HARDWARE SPECIFICATIONS:
Language: Java
PROJECT DESCRIPTION
Huffman coding is an algorithm presented by David Huffman in 1952. It is an algorithm which
works with integer-length codes; in fact, among algorithms that assign an integer number of
bits to each symbol, Huffman coding is optimal.
We use Huffman coding, for example, for compressing the bytes output by LZP. First we have
to know their probabilities; we use a QSM model for that purpose. Based on the probabilities,
the algorithm makes the codes, which can then be output. Decoding is more or less the reverse
process: based on the probabilities and the coded data, it outputs the decoded byte.
To assign the codes the algorithm uses a binary tree. It stores there the symbols and their
probabilities, and the position of a symbol depends on its probability. It then assigns a code to
each symbol based on its position in the tree. The codes have the prefix property and are
instantaneously decodable, so they are well suited for compression and decompression.
The Huffman compression algorithm assumes data files consist of some byte values that
occur more frequently than other byte values in the same file. This is very true for text files
and most raw gif images, as well as EXE and COM file code segments.
By analyzing the file, the algorithm builds a "frequency table" for each byte value within the
file. With the frequency table the algorithm can then build the "Huffman tree". The purpose of
the tree is to associate each byte value with a bit string of variable length. The more frequently
used characters get shorter bit strings, while the less frequent characters get longer bit strings.
Thus the data file may be compressed.
To compress the file, the Huffman algorithm reads the file a second time, converting each
byte value into the bit string assigned to it by the Huffman tree and then writing the bit string
to a new file. The decompression routine reverses the process by reading in the stored
frequency table (presumably stored in the compressed file as a header) that was used in
compressing the file. With the frequency table the decompressor can then rebuild the Huffman
tree and, from that, decode all the bit strings stored in the compressed file back into the
original byte values.
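The header idea mentioned above, storing the frequency table in the compressed file so the
decompressor can rebuild the same tree, can be sketched as follows. This is only one simple
layout assumed for illustration: it writes one int per possible byte value using the standard
java.io stream classes, and it is not the project's actual file format.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// The compressor writes the counts at the start of the file; the decompressor
// reads them back and rebuilds the same Huffman tree from them.
class FrequencyHeader {
    static final int ALPHABET = 256;   // one counter per possible byte value

    static void write(DataOutputStream out, int[] frequencies) throws IOException {
        for (int b = 0; b < ALPHABET; b++) {
            out.writeInt(frequencies[b]);
        }
    }

    static int[] read(DataInputStream in) throws IOException {
        int[] frequencies = new int[ALPHABET];
        for (int b = 0; b < ALPHABET; b++) {
            frequencies[b] = in.readInt();
        }
        return frequencies;            // feed this to the same tree builder
    }
}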
Huffman Encoding :
Huffman encoding works by substituting more efficient codes for data; the codes are then
stored as a conversion table and passed to the decoder before the decoding process takes
place. This approach was first introduced by David Huffman in 1952 for text files and has
spawned many variations. Even the CCITT (International Telegraph and Telephone
Consultative Committee) one-dimensional encoding used for bilevel, black-and-white image
data is a variation of this scheme.
Algorithm:
Basically, in Huffman encoding each unique value is assigned a binary code, with codes
varying in length. Shorter codes are used for more frequently occurring values. These codes
are stored in a conversion table and passed to the decoder before any decoding is done.
Let's imagine that the following data stream is going to be encoded by Huffman encoding:
AAAABCDEEEFFGGGH
The frequency of each unique value that appears is as follows:
A : 4, B : 1, C : 1, D : 1, E : 3, F : 2, G : 3, H : 1
Based on the frequency count the encoder can generate a statistical model reflecting the
probability of each value appearing in the stream.
From the statistical model the encoder can build a minimum-length code for each value and
store it in the conversion table. The algorithm pairs up the two values with the least
probability, in this case B and C, and combines their probabilities so that the pair is treated as
one value. Along the way each value, B, C and even BC, is assigned a 0 or 1 on its branch.
This means that 0 and 1 will be the least significant bits of the codes for B and C respectively.
From there the algorithm compares the remaining values, finds the next two values with the
smallest probability, and repeats the whole process until the values build up into an
upside-down tree.
(Figures: step-by-step construction of the upside-down Huffman tree for the example stream,
with the path to B marked by a blue arrow.)
The binary code for each unique value can then be read by following the upside-down tree
down from the top (most significant bit) until we reach the unique value we want (least
significant bit). For example, to find the code for B, follow the path shown by the blue arrow
on the diagram above until we arrive at B. Notice that beside each of the edges we take there
is a bit value; combining the values we came across gives the code for B: 1000. The same
approach is then used for all the unique values, and their codes are stored in the conversion
table.
Code Construction:
To assign codes you need only a single pass over the symbols, but before doing that you need
to calculate where the codes for each codelength start. To do so consider the following: The
longest code is all zeros and each code differs from the previous by 1 (I store them such that
the last bit of the code is in the least significant bit of a byte/word).
Codes with length three start at (0000+4*1)>>1 = 010. There are 4 codes with length 4 (that is where the 4 comes from), so the next length-4 code would start at 0100. But since it shall be a length-3 code we remove the last 0 (if we ever had to remove a 1, the code lengths would not form a valid code).
Codes with length 0 start at (10+0*1)>>1 = 1. If anything other than 1 is the start value for length 0, the code lengths are not consistent. Then visit each symbol in alphabetical sequence (to ensure the second condition) and assign the start value for the code length of that symbol as the code of that symbol. After that, increment the start value for that code length by one.
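For illustration, the start values described above can be computed with a small sketch (the names count, start and maxLen are our own):

class CanonicalStart
{
    // count[len] = number of symbols whose code has length len.
    // Returns start[len] = first code of length len, following the convention
    // above that the longest codes begin at all zeros.
    static int[] startValues(int[] count, int maxLen)
    {
        int[] start = new int[maxLen + 1];
        start[maxLen] = 0;                              // the longest code is all zeros
        for (int len = maxLen - 1; len >= 0; len--)
        {
            // e.g. (0000 + 4*1) >> 1 = 010 when four codes have length 4
            start[len] = (start[len + 1] + count[len + 1]) >> 1;
        }
        return start;
    }
}

Each symbol, visited in alphabetical order, then receives the current start value for its code length as its code, and that start value is incremented, exactly as described above.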
Apart from the ceil(log2(alphabetsize)) boundary for the nonzero bits in this particular canonical Huffman code, it is useful to know the maximum length a Huffman code can reach. The maximum length of the code also depends on the number of samples you use to derive your statistics from; the number of samples needed to force longer and longer codes grows roughly like the Fibonacci sequence (the samples include any fake samples added to give every symbol a nonzero count).
To compress a file (a sequence of characters) you need a table of bit encodings, e.g., an ASCII table, or a table giving the sequence of bits used to encode each character. This table is constructed from a coding tree, using root-to-leaf paths to generate the bit sequence that encodes each character.
Assuming you can write a specific number of bits at a time to a file, a compressed file is
made using the following top-level steps. These steps will be developed further into sub-
steps, and you'll eventually implement a program based on these ideas and sub-steps.
Build a table of per-character encodings. The table may be given to you, e.g., an ASCII table, or it may be constructed from the file being compressed. Read the file to be compressed (the plain file) and process one character at a time. To process each character, find the bit sequence that encodes it using the table built in the previous step and write this bit sequence to the compressed file.
To build a table of optimal per-character bit sequences you'll need to build a Huffman coding tree using the greedy Huffman algorithm. The table is generated by following every root-to-leaf path and recording the left/right 0/1 edges followed. These root-to-leaf paths give the optimal bit encodings. The steps are:
1 Count the number of times every character occurs. Use these counts to create an initial forest of one-node trees. Each node has a character and a weight equal to the number of times the character occurs.
2 Use the greedy Huffman algorithm to build a single tree. The final tree will be used in the
next step.
3 Follow every root-to-leaf path creating a table of bit sequence encodings for every character/leaf, as sketched below.
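A sketch of step 3, reusing the illustrative Node class from the earlier tree-building sketch: each left edge contributes a 0 and each right edge a 1 to the code of the leaf it leads to.

class CodeTable
{
    // Fill codes[symbol] with the bit string for every leaf of the tree.
    static void fill(Node n, String path, String[] codes)
    {
        if (n == null)
            return;
        if (n.left == null && n.right == null)
        {
            codes[n.symbol] = path;             // leaf: the path from the root is its code
            return;
        }
        fill(n.left, path + "0", codes);        // left edge = 0
        fill(n.right, path + "1", codes);       // right edge = 1
    }
}

Calling fill(root, "", new String[256]) after building the tree produces the per-character table used in the compression step.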
Header Information:
You must store some initial information in the compressed file that will be used by the uncompression/unhuffing program. Basically you must store the tree that was used to compress the original file. There are several alternatives for storing the tree; some are outlined here, and you may explore others.
Store the character counts at the beginning of the file. You can store counts for every character, or counts for only the non-zero characters. If you do the latter, you must include some method for indicating which character each count belongs to, e.g., store character/count pairs.
You could use a "standard" character frequency table: for any English-language text, for example, you could assume fixed weights/frequencies for every character and use these in building the tree.
You can store the tree itself at the beginning of the file. One method for doing this is to do a pre-order traversal, writing each node visited. You must differentiate leaf nodes from internal/non-leaf nodes. One way to do this is to write a single bit for each node, say 1 for a leaf and 0 for a non-leaf. For leaf nodes, you will also need to write the character stored. For non-leaf nodes there is no information that needs to be written, just the single bit. A sketch of this pre-order approach follows below.
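The following sketch illustrates the pre-order alternative (again reusing the illustrative Node class; the bits are collected as '0'/'1' characters in a StringBuilder purely to keep the example short):

class TreeHeader
{
    // Pre-order traversal: write 1 followed by the 8 bits of the character for a
    // leaf, and a single 0 for an internal node.
    static void write(Node n, StringBuilder out)
    {
        if (n.left == null && n.right == null)
        {
            out.append('1');                                        // leaf marker
            String bits = Integer.toBinaryString(n.symbol & 0xFF);
            while (bits.length() < 8)
                bits = "0" + bits;                                  // pad to 8 bits
            out.append(bits);
        }
        else
        {
            out.append('0');                                        // internal node marker
            write(n.left, out);
            write(n.right, out);
        }
    }
}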
Decompressing:
Decompression involves re-building the Huffman tree from a stored frequency table (again, presumably in the header of the compressed file) and converting its bit streams back into characters. You read the file a bit at a time. Beginning at the root node of the Huffman Tree, and depending on the value of the bit, you take the right or left branch of the tree and then return to read another bit. When the node you reach is a leaf (it has no right and left child nodes) you write its character value to the decompressed file and go back to the root node for the next bit.
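A sketch of this bit-by-bit walk (the compressed bits are shown here as a string of '0'/'1' characters; the real program reads them from the compressed file, as in the Decoder listing later):

class BitWalk
{
    // Walk the tree one bit at a time; each time a leaf is reached, emit its
    // character and return to the root.
    static String decode(Node root, String bits)
    {
        StringBuilder out = new StringBuilder();
        Node node = root;
        for (int i = 0; i < bits.length(); i++)
        {
            node = (bits.charAt(i) == '0') ? node.left : node.right;
            if (node.left == null && node.right == null)
            {
                out.append((char) node.symbol);     // leaf reached: output its character
                node = root;                        // start again at the root
            }
        }
        return out.toString();
    }
}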
If your system is continually dealing with data in which the symbols have similar frequencies of occurrence, then both encoders and decoders can use a standard encoding table/decoding tree. However, even text data from various sources will have quite different characteristics. For example, ordinary English text will generally have 'e' near the root of the tree, with short encodings for 'a' and 't', whereas C programs would generally have ';' near the root, with short encodings for other punctuation marks such as '(' and ')' (depending on the number and length of comments!). If the data has variable frequencies, then, for optimal encoding, we have to generate an encoding tree for each data set and store or transmit the encoding with the data. The extra cost of transmitting the encoding tree means that we will not gain an overall benefit unless the data stream to be encoded is quite long, so that the savings through compression more than compensate for the cost of transmitting the encoding tree.
WORKING OF PROJECT:
The project contains the following modules:
Huffman Zip
Encoder
Decoder
Table
DLNode
Priority Queue
Huffman Node
HuffmanZip is the main module; it provides the graphical user interface. Encoder is the module for compressing the file. It implements the Huffman algorithm for compressing text and image files. It first calculates the frequencies of all the occurring symbols. On the basis of these frequencies it builds a priority queue, which is used for finding the symbols with the least frequencies. The two symbols with the lowest frequencies are then deleted from the queue and a new symbol is added to the queue with a frequency equal to the sum of the two. At the same time we build a tree whose leaf nodes are the two deleted nodes and whose root is the new node added to the queue. Finally we traverse the tree from the root node to the leaf nodes, assigning 0 to each left child and 1 to each right child. In this way we assign a code to every symbol in the file. These binary codes are then grouped, converted to the equivalent integers, and stored in the output file.
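The grouping of the binary codes into whole bytes can be sketched as follows (the class, method and parameter names are illustrative; the project's Encoder does essentially the same thing with a pending string called outbyte):

import java.io.*;

class BitPacker
{
    // Append the code of each symbol to a pending bit string and write one byte
    // to the output whenever at least 8 bits have accumulated.
    static void pack(int[] symbols, String[] codes, OutputStream out) throws IOException
    {
        String pending = "";
        for (int s : symbols)
        {
            pending += codes[s];
            while (pending.length() >= 8)
            {
                out.write(Integer.parseInt(pending.substring(0, 8), 2));   // 8 bits -> 1 byte
                pending = pending.substring(8);
            }
        }
        if (pending.length() > 0)                       // pad the final partial byte with zeros
        {
            while (pending.length() < 8)
                pending += "0";
            out.write(Integer.parseInt(pending, 2));
        }
    }
}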
Decoder works in the reverse order to the Encoder. It reads the input from the compressed file and converts it into the equivalent binary code. It takes as an additional input the binary tree generated during encoding, and on the basis of these data it regenerates the original file. Table is used for storing the code of each symbol. PriorityQueue takes as input the symbols and their related frequencies and, on the basis of these frequencies, assigns a priority to each symbol. HuffmanNode is used for creating the binary tree: it takes two symbols from the priority queue and creates two nodes by comparing their frequencies. It places the symbol with the lower frequency to the left and the symbol with the higher frequency to the right, then deletes these two symbols from the priority queue and inserts a new symbol whose frequency equals the sum of the frequencies of the two deleted symbols. It also generates a parent node for the two nodes and assigns it a frequency equal to the sum of the frequencies of the two leaf nodes.
When solving a small problem, the entire problem can be tackled at once. For solving larger problems, the basic principle is the time-tested principle of divide and conquer. Clearly, dividing in such a manner that all the divisions have to be conquered together is not the intent of this wisdom. This principle, if elaborated, means dividing into smaller pieces, so that each piece can be conquered separately.
Problem partitioning, which is essential for solving a complex problem, leads to hierarchies
in the design. That is, the design produced by using problem partitioning can be represented
as a hierarchy of components. The relationship between the elements in this hierarchy can
vary depending on the method used. For example, the most common is the whole-part relationship. In this, the system consists of some parts, each part consists of subparts, and so on. This relationship can be naturally represented as a hierarchical structure between the various system parts. In general, a hierarchical structure makes it much easier to comprehend a complex system. Because of this, all design methodologies aim to produce a design that has a clean hierarchical structure.
The DFD was first designed by Larry Constantine as a way of expressing system requirements in graphical form. A DFD, also known as a bubble chart, has the purpose of clarifying system requirements and identifying major transformations that will become programs in system design. It is therefore the starting point of the design phase that functionally decomposes the requirement specifications down to the lowest level of detail. A DFD consists of a series of bubbles joined by lines; the bubbles represent data transformations and the lines represent the flows of data in the system.
DFD SYMBOLS
In the DFD, there are four symbols:
1 A square defines a source (originator) or destination of system data.
2 An arrow identifies data flow, that is, data in motion. It is a pipeline through which information flows.
3 A circle or a bubble (some people use an oval bubble) represents a process that transforms incoming data flows into outgoing data flows.
4 An open rectangle is a data store, that is, data at rest or a temporary repository of data.
[Figure: DFD symbols and their meanings]
CONSTRUCTING DFD
1 Processes should be named and numbered for easy reference. Each name should be representative of the process.
2 The direction of flow is from top to bottom and from left to right. Data traditionally flow from the source (upper left corner) to the destination (lower right corner), although they may flow back to a source. One way to indicate this is to draw a long flow line back to the source. An alternative way is to repeat the source symbol as a destination. Since it is used more than once in the DFD, it is marked with a short diagonal in the lower right corner.
3 When a process is exploded into lower-level details, the lower-level processes are numbered accordingly.
4 The names of data sources and destinations are written in capital letters. Process and data flow names have the first letter of each word capitalized.
The DFD is designed to aid communication. If it contains dozens of processes and data stores it gets too unwieldy. The rule of thumb is to explode the DFD to a functional level, so that the next sublevel does not exceed 10 processes. Beyond that, it is best to take each function separately and expand it to show the explosion of the single process. If a user wants to know
what happens within a given process, then the detailed explosion of that process may be
shown.
A DFD typically shows the minimum contents of data elements that flow in and out.
A leveled set has a starting DFD, which is a very abstract representation of the system,
identifying the major inputs and outputs and the major processes in the system. Then each
process is refined and a DFD is drawn for the process. In other words, a bubble in the DFD is expanded into a DFD during refinement. For the hierarchy to be consistent, it is important that the net inputs and outputs of the DFD for a process are the same as the inputs and outputs of that process in the higher-level DFD. This refinement stops when each bubble can be easily identified or understood. It should be pointed out that during refinement, though the net input and output are preserved, a refinement of the data might also occur. That is, a unit of data may be broken into its components for processing when the detailed DFD for a process is being drawn. So, as the processes are refined, the data may also be refined.
The DFD methodology is quite effective, especially when the required design is unclear and the analyst needs a notational language for communication. The DFD is easy to understand, even for non-technical users. The main problem, however, is the large number of iterations that are often required to arrive at the most accurate and complete solution.
The DFD helps in understanding the functioning of the modules used in the code. It shows clearly how data flows and where it is stored: which variables are given as input, how data moves through the program, and what the final output is. The following DFDs help in understanding the program.
[DFD figures: updation of the priority queue; code generator]
Print Layouts
IMPLEMENTATION:
The implementation phase is less creative than system design. It is primarily concerned with user training, site preparation, and file conversion. When the candidate system is linked to terminals or remote sites, the telecommunication network and tests of the network along with the system are also included under implementation. During the implementation phase, the system actually takes physical shape.
As in the other two stages, the analyst, his or her associates and the user perform many tasks, including evaluating the final system to make sure that it is fulfilling the original need and that it is operating as intended.
The analyst's involvement in each of these activities varies from organization to organization. In a small organization, specialists may work on different phases and tasks, such as training, ordering equipment, converting data from the old methods to the new, or certifying the system for operation. The implementation phase ends with an evaluation of the system after placing it into operation for a period of time. By then, most program errors will have shown up and most costs will
have become clear. The system audit is a last check or review of the system to make sure that it meets the design criteria; evaluation forms the feedback part of the cycle that leads to further improvements.
During the final testing, user acceptance is tested, followed by user training. Depending on the nature of the system, extensive user training may be required. Conversion usually takes place at about the same time as user training.
In the extreme, the programmer is falsely viewed as someone who ought to be isolated from other aspects of system development. Programming is itself design work, however; the initial design is usually refined as the code is written. Programming provides a reality test for the assumptions made by the analyst, and it is therefore an integral part of system development.
System testing checks the readiness and accuracy of the system to access, update and retrieve data from new files. Once the program becomes available, test data are read into the computer and processed against the files provided for testing. In most conversions a parallel run is conducted, where the new system runs simultaneously with the old system; this method, though costly, provides added assurance against errors in the new system.
TEST PLAN
A test plan tells the developer, the client, and the rest of the team what can be expected from the testing effort.
Introduction:
Summarizes the key features and expectations of the software along with the testing approach.
Scope:
It includes a description of the test types and the limits of testing.
Test resources:
Specifies testers and bug fixers.
Error:
The term Error is used in two different ways. It refers to the difference between the actual output of the software and the correct output. In this interpretation, error is essentially a measure of the difference between the actual and the ideal output. Error is also used to refer to human action that results in software containing a defect or fault.
Fault:
Fault is a condition that causes a system to fail in performing its required function. A
fault is a basic reason for software malfunction and is synonymous with the commonly used
term 'Bug'.
Failure:
Failure is the inability of a system or component to perform a required function according to its specifications. A software failure occurs if the behavior of the software is different from the specified behavior. Failure may be caused by functional or
performance reasons.
Unit Testing
Module testing
Integration testing
System testing
Acceptance testing
Unit Testing :
The term 'Unit Testing' refers to the set of tests performed by an individual programmer prior to the integration of the unit into a larger system.
A program unit is usually small enough that the programmer who developed it can test it in great detail, and certainly in greater detail than will be possible when the unit is integrated into an evolving software product. In unit testing, the programs are tested separately, independent of each other. Since the check is done at the program level, it is also called program testing.
Module Testing :
A module encapsulates related components, so it can be tested without other system modules.
Subsystem testing :
Subsystems may be independently designed and implemented. Common problems such as sub-system interface mismatches can be detected during this phase, and testing can concentrate on them.
There are four categories of tests that a programmer will typically perform on a program
unit:
Functional Tests
Performance Test
Stress Test
Structure Test
Functional Test :
Functional test cases involve exercising the code with nominal input values for which
expected results are known, as well as boundary values (minimum values, maximum
values, and values on and just outside the functional boundaries) and special values.
Performance Test :
Performance testing determines the amount of execution time spent in various parts of the unit, program throughput, response time, and device utilization by the
program unit. A certain amount of performance tuning may be done during testing,
however, caution must be exercised to avoid expending too much effort on fine tuning
of a program unit that contributes little to the overall performance of the entire system.
Stress Test :
Stress tests are those tests designed to intentionally break the unit. A great deal can be learned about the strengths and limitations of a program by examining the manner in which it breaks.
Structure Test :
Structure tests are concerned with exercising the internal logic of a program and traversing
particular execution paths. Some authors refer collectively to functional performance and
stress testing as black box testing, while structure testing is referred to as white box or
glass box testing. The major activities in structural testing are deciding which paths to exercise, deriving test data to exercise those paths, determining the test coverage criterion to be used, and executing the test cases. In practice, integration often mixes top-down testing on some modules and subsystems with bottom-up testing on others; this mix alleviates many of the problems encountered in pure top-down testing and retains the advantages of top-down integration at the subsystem level.
Automated tools used in integration testing include module drivers, test data generators, environment simulators, and coverage analyzers. Module drivers allow the specification of test cases (both input and expected results) in a descriptive language. The driver tool then calls the routine using the specified test cases, compares actual with the expected results, and reports any discrepancies.
Some module drivers also provide program stubs for top-down testing. Test cases
are written for the stub, and when the stub is invoked by the routine being tested, the
drivers examine the input parameters to the stub and return the corresponding outputs to
the routine. Automated test drivers include AUT, MTS, TEST MASTER and TPL.
Test data generators are of two varieties; those that generate files of random
data values according to some predefined format, and those that generate test data for
particular execution paths. In the latter category, symbolic executors such as ATTEST can
sometimes be used to derive a set of test data that will force program execution to follow a particular path. Environment simulators are sometimes used during integration and acceptance testing to simulate the operating environment in which the software will function. Simulators are used in situations in which operation of the actual environment is impractical or the environment does not yet exist; the Saturn Flight Program Simulator, for example, was used for simulating live flight tests. Coverage analyzers assist in judging the adequacy of test cases and measuring the coverage achieved when the test cases are exercised.
System Testing
Integration testing
Acceptance testing
Strategies for integrating software components into a functioning product include the
bottom-up strategy, the top-down strategy, and the sandwich strategy. Careful planning and
scheduling are required to ensure that modules will be available for integration into
the evolving software product when needed. The integration strategy dictates the order in
which modules must be available, and thus exerts a strong influence on the order in which modules are written, debugged, and unit tested. System testing includes functional tests, performance tests, and stress tests to verify that the implemented system satisfies its requirements. Acceptance testing is then performed, typically by the purchasing or using organizations.
CONCLUSIONS
Data compression is a topic of much importance and many applications. Methods of data compression have been studied for almost four decades. This report has provided an overview of data compression methods of general utility. The algorithms have been evaluated in terms of the amount of compression they provide, algorithm efficiency, and susceptibility to error. While algorithm efficiency and susceptibility to error are relatively independent of the characteristics of the source ensemble, the amount of compression achieved depends upon the characteristics of the source and on how well the method can exploit local redundancy or context information. A semantic dependent scheme can usually exploit such knowledge to achieve greater compression in its particular domain.
Susceptibility to error is the main drawback of each of the algorithms presented here. Although channel errors are more devastating to adaptive algorithms than to static ones, it is possible for an error to propagate without limit even in the static case. Methods of limiting the propagation of such errors deserve to be further investigated.
NEW DIRECTIONS:
Data compression is still very much an active research area. This section suggests possible directions for future work. The discussion above illustrates the susceptibility to error of the codes presented in this report. Strategies for increasing the reliability of these codes while incurring only a moderate loss of efficiency would be of great value. This area appears to be largely unexplored. Possible approaches include embedding the entire ensemble in an error-correcting code or reserving one or more codewords to act as error flags. For Huffman encoding and decoding it may be necessary for the receiver and sender to verify the current code mapping.
Another important research topic is the development of theoretical models for data compression which address the problem of local redundancy. Models based, like Huffman coding, only on the frequencies of individual symbols ignore symbol interaction; entropy tends to be overestimated when symbol interaction is not considered. Models which exploit relationships between source messages may therefore achieve better compression than the single-symbol models described here.
Since this system has been developed using object-oriented programming, there is every chance of reusing the code in other environments, even on different platforms. Its present features can also be enhanced by simple modifications to the code so as to reuse it in other applications. The application is easy to deploy, the code can be reused whenever required, the utility can be updated to a newer version, and new features can be added as and when we need them.
SOURCE CODE
HuffmanZip.java
import javax.swing.*;
import java.io.*;
import java.awt.*;
import java.awt.event.*;
public class HuffmanZip extends JFrame
{
// GUI components, the file chooser and the encoder/decoder used by the
// constructor below (these declarations are restored here; the original
// listing omits the class header and field declarations)
JButton enc,dec,center;
JLabel title;
JFileChooser choose;
ImageIcon icon;
File input1,input2;
Encoder encoder;
Decoder decoder;
public HuffmanZip()
{
// Container con=getContentPane();
Container c=getContentPane();
enc=new JButton("Encode");
dec=new JButton("Decode");
center=new JButton();
title=new JLabel(" Zip Utility V1.1 ");
choose=new JFileChooser();
icon=new ImageIcon("huff.jpg");
center.setIcon(icon);
enc.addActionListener(
new ActionListener()
{
public void actionPerformed(ActionEvent e)
{
int f=choose.showOpenDialog(HuffmanZip.this);
if (f==JFileChooser.APPROVE_OPTION)
{
input1=choose.getSelectedFile();
encoder=new Encoder(input1);
HuffmanZip.this.setTitle("Compressing.....");
encoder.encode();
JOptionPane.showMessageDialog(null,encoder.getSummary(),"Summary",JOptionPane.INFORMATION_MESSAGE);
HuffmanZip.this.setTitle("Zip utility v1.1");
}
}
}
);
dec.addActionListener(
new ActionListener()
{
public void actionPerformed(ActionEvent e)
{
int f=choose.showOpenDialog(HuffmanZip.this);
if (f==JFileChooser.APPROVE_OPTION)
{
input2=choose.getSelectedFile();
decoder=new Decoder(input2);
decoder.decode();
HuffmanZip.this.setTitle("Decompressing.....");
JOptionPane.showMessageDialog(null,decoder.getSummary(),"Summary",JOptionPane.INFORMATION_MESSAGE);
HuffmanZip.this.setTitle("Zip utility v1.1");
}
}
}
);
//c.add(bar,BorderLayout.SOUTH);
c.add(dec,BorderLayout.EAST);
c.add(enc,BorderLayout.WEST);
c.add(center,BorderLayout.CENTER);
c.add(title,BorderLayout.NORTH);
setSize(250,80);
setVisible(true);
}
// the main method and any remaining members are omitted in this listing
}
Encoder.java
import java.io.*;
import javax.swing.*;
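// NOTE: the Encoder class declaration, its field declarations (frequency table,
// code table, Huffman tree, input/output streams, byte counters, etc.) and some
// method headers are omitted in this listing; only the main fragments appear below.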
for(int i=0;i<256;i++)
{
freq[i]=0;
}
}
try
{
System.out.println(" "+in.available());
totalBytes=in.available();
int mycount=0;
in.mark(totalBytes);
while (mycount<totalBytes)
{
int a=in.read();
mycount++;
freq[a]++;
}
in.reset();
}
catch(IOException eofexc)
{
System.out.println("error");
try
{
for(int j=0;j<256;j++)
{
}
//create tree....................................
while (q.sizeQ()>1)
{
one=q.removeFirst();
two=q.removeFirst();
int f1=one.getFreq();
int f2=two.getFreq();
if (f1>f2)
{
HuffmanNode t=new HuffmanNode(null,
(f1+f2),0,two,one,null);
one.up=t;
two.up=t;
q.insertM(t);
}
else
{
HuffmanNode t=new HuffmanNode(null,
(f1+f2),0,one,two,null);
one.up=t;
two.up=t;
q.insertM(t);
}
}
tree =q.removeFirst();
}
catch(Exception e)
{
System.out.println("Priority Queue error");
}
code=new String[256];
for(int i=0;i<256;i++)
code[i]="";
traverse(tree);
if(freq[i]==0)
continue;
// System.out.println(""+i+" "+code[i]+" ");
}
// System.out.println("size of table"+rec.recSize());
try
{
while (count<totalBytes)
{
outbyte+=code[in.read()];
count++;
if (outbyte.length()>=8)
{
int k=toInt(outbyte.substring(0,8));
csize++;
outf.write(k);
outbyte=outbyte.substring(8);
}
}
while(outbyte.length()>8)
{
csize++;
int k=toInt(outbyte.substring(0,8));
outf.write(k);
outbyte=outbyte.substring(8);
}
if((recordLast=outbyte.length())>0)
{
while(outbyte.length()<8)
outbyte+=0;
outf.write(toInt(outbyte));
csize++;
}
outf.write(recordLast);
outf.close();
}
catch(Exception re)
{
System.out.println("Error in writng....");
}
float ff=(float)csize/((float)totalBytes);
System.out.println("Compression "+recordLast+" ratio"+csize+"
"+(ff*100)+" %");
done=true;
if (n.lchild==null&&n.rchild==null)
{
HuffmanNode m=n;
int arr[]=new int[20],p=0;
while (true)
{
if (m.up.lchild==m)
{
arr[p]=0;
}
else
{
arr[p]=1;
}
p++;
m=m.up;
if(m.up==null)
break;
}
for(int j=p-1;j>=0;j--)
code[n.getValue()]+=arr[j];
}
// System.out.println("Debug3");
if(n.lchild!=null)
traverse(n.lchild);
if(n.rchild!=null)
traverse(n.rchild);
}
}
for(int i=7;i>=0;i--)
{
s+=arr[i];
}
return s;
}
int output=0,wg=128;
for(int i=0;i<8;i++)
{
output+=wg*Integer.parseInt(""+b.charAt(i));
wg/=2;
}
return output;
}
Decoder.java
import java.io.*;
import javax.swing.*;
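// NOTE: the Decoder class declaration, its field declarations and some method
// headers are omitted in this listing; only the main fragments appear below.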
freq=new int[256];
for(int i=0;i<256;i++)
{
freq[i]=0;
}
try
{
in1 = new FileInputStream(inputFile);
inF=new ObjectInputStream(in1);
in=new BufferedInputStream(in1);
// int arr=0;
table=(Table)(inF.readObject());
outputFile = new File(table.fileName());
outf=new FileOutputStream(outputFile);
System.exit(0);
}
try
{
for(int j=0;j<256;j++)
{
int r =table.pop();
// System.out.println("Size of table "+r+" "+j);
if (r>0)
{
HuffmanNode t=new
HuffmanNode("dipu",r,j,null,null,null);
q.insertM(t);
}
}
//create tree....................................
while (q.sizeQ()>1)
{
one=q.removeFirst();
two=q.removeFirst();
int f1=one.getFreq();
int f2=two.getFreq();
if (f1>f2)
{
HuffmanNode t=new HuffmanNode(null,
(f1+f2),0,two,one,null);
one.up=t;
two.up=t;
q.insertM(t);
}
else
{
HuffmanNode t=new HuffmanNode(null,
(f1+f2),0,one,two,null);
one.up=t;
two.up=t;
q.insertM(t);
}
}
tree =q.removeFirst();
}
catch(Exception exc)
{
System.out.println("Priority queue exception");
}
String s="";
try
{
mycount=in.available();
while (totalBytes<mycount)
{
arr=in.read();
s+=toBinary(arr);
while (s.length()>32)
{
for(int a=0;a<32;a++)
{
int
wr=getCode(tree,s.substring(0,a+1));
if(wr==-1)continue;
else
{
outf.write(wr);
s=s.substring(a+1);
break;
}
}
totalBytes++;
}
s=s.substring(0,(s.length()-8));
s=s.substring(0,(s.length()-8+arr));
int counter;
while (s.length()>0)
{
if(s.length()>16)counter=16;
else counter=s.length();
for(int a=0;a<counter;a++)
{
int
wr=getCode(tree,s.substring(0,a+1));
if(wr==-1)continue;
else
{
outf.write(wr);
s=s.substring(a+1);
break;
}
}
}
outf.close();
}
catch(IOException eofexc)
{
System.out.println("IO error");
}
summary+="Compressed size : "+ mycount+" bytes.";
summary+="\n";
while (true)
{
if (decode.charAt(0)=='0')
{
node=node.lchild;
}
else
{
node=node.rchild;
}
if (node.lchild==null&&node.rchild==null)
{
return node.getValue();
}
if(decode.length()==1)break;
decode=decode.substring(1);
}
return -1;
}
}
for(int i=7;i>=0;i--)
{
s+=arr[i];
}
return s;
}
public int toInt(String b)
{
int output=0,wg=128;
for(int i=0;i<8;i++)
{
output+=wg*Integer.parseInt(""+b.charAt(i));
wg/=2;
}
return output;
}
public int getCurrent()
{
return totalBytes;
}
public int lengthOftask()
{
return mycount;
}
public String getSummary()
{
return summary;
}
}
DLnode.java
public class DLNode
{
private DLNode next,prev;
private HuffmanNode elem;
public DLNode()
{
next=null;
prev=null;
elem=null;
}
public DLNode(DLNode next,DLNode prev,HuffmanNode elem)
{
this.next=next;
this.prev=prev;
this.elem=elem;
}
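// accessor and mutator methods (getNext, setNext, getPrev, setPrev, getElement),
// which are used by PriorityQueue, are omitted in this listing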
}
HuffmanNode.java
import java.io.*;
public class HuffmanNode implements Serializable
{
// the symbol value, frequency, left/right child and parent links, and their
// accessors (getValue, getFreq) are omitted in this listing
}
PriorityQueue.java
public class PriorityQueue
{
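// NOTE: the head/tail sentinel nodes, the size field, the constructor and the
// beginning of insertM(HuffmanNode o) are omitted in this listing.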
if (head.getNext()==tail)
{
DLNode d=new DLNode(tail,head,o);
head.setNext(d);
tail.setPrev(d);
}
else
{
DLNode n=head.getNext();
HuffmanNode CurrenMax=null;
int key=o.getFreq();
while (true)
{
if (n.getElement().getFreq()>key)
{
DLNode second=n.getPrev();
if(isEmpty())
throw new Exception("Queue is empty");
HuffmanNode o=head.getNext().getElement();
DLNode sec=head.getNext().getNext();
head.setNext(sec);
sec.setPrev(head);
size--;
return o;
}
public HuffmanNode removeLast() throws Exception
{
if(isEmpty())
throw new Exception("Queue is empty");
DLNode d=tail.getPrev();
HuffmanNode o=tail.getPrev().getElement();
tail.setPrev(d.getPrev());
d.getPrev().setNext(tail);
size--;
return o;
}
public boolean isEmpty()
{
if(size==0)return true;
return false;
}
public int sizeQ()
{
return size;
}
public HuffmanNode first()throws Exception
{
if(isEmpty())
throw new Exception("Stack is empty");
return head.getNext().getElement();
}
Table.java
import java.io.*;
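// NOTE: the Table class body is omitted in this listing; from its use in the
// Encoder and Decoder it stores the original file name and the per-symbol
// frequency/code records (fileName(), pop(), recSize()).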
REFERENCES
WEBSITES:-
1. http://www.google.com
2. http://www.wikipedia.org
3. http://www.nist.gov