
Module 1

Information Theory and Source Coding


Syllabus :
Introduction to Information Theory, Uncertainty and Information, Entropy, Mutual information, Relationship
between entropy and mutual information, Shannon Fano coding.
Source Coding Techniques: Huffman Coding, Arithmetic coding, Lempel-Ziv Coding, Run length coding.

Text Book: Bose, Ranjan. Information Theory, Coding and Cryptography, 3rd Edition, Tata McGraw-Hill Education, 2015, ISBN: 978-9332901257.

General Introduction to Information Theory


Information Theory is a branch of mathematics that deals with the quantification of information. It
provides a framework for understanding how information is transmitted, stored, and processed.
Uncertainty and Information
 Uncertainty: The degree of unpredictability associated with an event.
 Information: The reduction in uncertainty.
Entropy
 Entropy (H): A measure of the average amount of information contained in a message.
o Higher entropy indicates greater uncertainty.
o Lower entropy indicates less uncertainty.
Formula for Entropy:
H(X) = -∑ P(xi) log₂ P(xi)
where:
 X is a random variable.
 P(xi) is the probability of the i-th outcome.
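To make the formula concrete, here is a minimal Python sketch that evaluates it for any discrete distribution (the example probabilities are illustrative):

import math

def entropy(probs, base=2):
    """Shannon entropy H(X) = -sum p_i * log_b(p_i), skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin carries 1 bit of entropy per toss.
print(entropy([0.5, 0.5]))         # 1.0
print(entropy([0.25, 0.25, 0.5]))  # 1.5, matching the worked example later in this module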
Mutual Information
 Mutual Information (I): A measure of the shared information between two random variables.
 It quantifies the reduction in uncertainty about one random variable when the other is known.
Formula for Mutual Information:
I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)
where:
 H(X|Y) is the conditional entropy of X given Y.
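As a rough illustration, a short Python sketch that computes I(X;Y) from a small joint distribution using the identity above (the joint probability table is illustrative):

import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Illustrative joint distribution p(x, y) over X in {0, 1} and Y in {0, 1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

px = [sum(p for (x, _), p in joint.items() if x == xv) for xv in (0, 1)]   # marginal of X
py = [sum(p for (_, y), p in joint.items() if y == yv) for yv in (0, 1)]   # marginal of Y

H_X  = H(px)
H_Y  = H(py)
H_XY = H(list(joint.values()))       # joint entropy H(X, Y)
H_X_given_Y = H_XY - H_Y             # chain rule: H(X|Y) = H(X, Y) - H(Y)

I = H_X - H_X_given_Y                # I(X;Y) = H(X) - H(X|Y)
print(round(I, 4))                   # about 0.28 bits, positive because X and Y are dependent here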
Relationship between Entropy and Mutual Information
 Mutual information is always non-negative.
 Mutual information is zero if and only if the two random variables are independent.
 Mutual information equals the entropy of a variable when that variable is a deterministic function of the other, since no uncertainty about it remains once the other is known.
Shannon Fano Coding
 A source coding technique that assigns variable-length codes to symbols based on their
probabilities.
 The codes are constructed such that more probable symbols have shorter codes.
Source Coding Techniques
Huffman Coding
 A greedy algorithm that constructs optimal prefix-free codes.
 It involves building a binary tree based on the probabilities of the symbols.
 The codes are assigned by traversing the tree from the root to the corresponding leaf; a short sketch of this procedure follows below.
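A minimal Python sketch of the greedy merge that builds the code table (the symbol probabilities are illustrative, and a code table is returned directly instead of an explicit tree):

import heapq

def huffman_codes(prob):
    """Build prefix-free codes from a dict {symbol: probability} by repeatedly merging the two least probable subtrees."""
    # Each heap entry: (probability, tiebreak, {symbol: partial code})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(prob.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

print(huffman_codes({"s0": 0.25, "s1": 0.25, "s2": 0.5}))
# e.g. {'s2': '0', 's0': '10', 's1': '11'}

For the probabilities {1/4, 1/4, 1/2} used in the entropy example later in this module, the resulting average code length is 1.5 bits/symbol, equal to the source entropy.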
Arithmetic Coding
 A source coding technique that represents a sequence of symbols as a single real number.
 It achieves higher compression efficiency than Huffman coding for many sources; a bare-bones encoder sketch follows below.
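As a rough illustration, a bare-bones Python sketch of the interval-narrowing idea (it uses exact fractions and a fixed symbol model, and omits the bit-output and renormalisation steps of a practical coder):

from fractions import Fraction as F

def arithmetic_encode(message, prob):
    """Return one number inside the final interval; any number in that interval identifies the message."""
    # Cumulative intervals for the (fixed) symbol probabilities.
    cum, c = {}, F(0)
    for s, p in prob.items():
        cum[s] = (c, c + F(p))
        c += F(p)
    low, high = F(0), F(1)
    for s in message:
        width = high - low
        lo_s, hi_s = cum[s]
        low, high = low + width * lo_s, low + width * hi_s   # narrow to the symbol's sub-interval
    return (low + high) / 2

print(arithmetic_encode("aab", {"a": F(1, 2), "b": F(1, 2)}))   # 3/16

Note that a real decoder also needs the message length or an end-of-message symbol to know when to stop.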
Lempel-Ziv Coding
 A class of algorithms that exploit repeated patterns in the input data.
 The data is compressed by replacing repeated sequences with pointers to previously seen
occurrences.
 Lempel-Ziv-Welch (LZW) is a popular variant of Lempel-Ziv coding.
Run Length Coding
 A simple compression technique that replaces sequences of identical symbols with a pair of values:
the symbol and the number of consecutive occurrences.
 It is effective for data with long runs of the same symbol; a minimal sketch follows below.
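A minimal Python sketch of run-length encoding and decoding:

from itertools import groupby

def rle_encode(data):
    """Replace each run of identical symbols with a (symbol, count) pair."""
    return [(sym, sum(1 for _ in run)) for sym, run in groupby(data)]

def rle_decode(pairs):
    return "".join(sym * count for sym, count in pairs)

encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded))  # 'AAAABBBCCD'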
Note: This is just a brief overview of the concepts involved in information theory and source coding. For a deeper understanding, it is recommended to explore textbooks and online resources.
Information Theory:
• Information theory applies the laws of probability theory, and mathematics in general, to study the
collection and processing of information.
• In the context of communication systems, information theory, originally called the mathematical
theory of communication, deals with mathematical modelling and analysis of communication
systems, rather than with physical sources and physical channels.
Can we measure information?
• Consider the following two sentences:
1. There is a traffic jam at New Horizon College of Engineering.
2. There is a traffic jam at New Horizon College of Engineering near Gate No 3.
Sentence 2 seems to carry more information than sentence 1; from the semantic viewpoint, sentence 2 provides more useful information.
Information theory is the scientific study of information and of the communication systems designed to handle it.
• Including telegraphy, radio communications, and all other systems concerned with the
processing and/or storage of signals.
• In particular, Information Theory provides answers to the following two fundamental questions:
• What is the minimum number of bits per symbol required to fully represent the source?—
Entropy of the source
• What is the ultimate transmission rate for reliable communication over a noisy channel?—
Capacity of a channel
An information source is an object that produces an event, the outcome of which is random and in
accordance with some probability distribution.
A practical information source in a communication system is a device that produces messages. It can be
either analogue or digital.
Here, we shall deal mainly with the discrete sources, since the analogue sources can be transformed to
discrete sources through the use of sampling and quantisation techniques.
A discrete information source is a source that has only a finite set of symbols as possible outputs. The set of possible source symbols is called the source alphabet, and the elements of the set are called symbols.
ENTROPY:
Conditions of Occurrence of Events
If we consider an event, there are three conditions of occurrence.
 If the event has not occurred, there is a condition of uncertainty.
 If the event has just occurred, there is a condition of surprise.
 If the event occurred some time back, there is a condition of having some information.
These three conditions occur at different times, and the differences between them help us gain knowledge of the probabilities of occurrence of events.
Entropy: When we observe how surprising or uncertain the occurrence of an event would be, we are trying to form an idea of the average information content delivered by the source of that event.
Entropy can be defined as a measure of the average information content per source symbol:

H = -∑ pi log_b(pi)

where pi is the probability of occurrence of character number i from a given stream of characters and b is the base of the logarithm used (b = 2 gives the entropy in bits). Hence, this is also called Shannon's entropy.
Conditional Entropy: The amount of uncertainty remaining about the channel input after observing the channel output is called the conditional entropy. It is denoted by H(X|Y):

H(X|Y) = -∑i ∑j p(xi, yj) log2 p(xi | yj)
Example:
Consider a diskette storing a data file consisting of 100,000 binary digits (binits), i.e., a total of 100,000
“0”s and “1”s . If the binits 0 and 1 occur with probabilities of ¼ and ¾ respectively, then binit 0 conveys
an amount of information equal to log2 (4/1) = 2 bits, while the binit 1 conveys information amounting to
log2 (4/3) = 0.42 bit.
The quantity H is called the entropy of a discrete memory-less source. It is a measure of the average
information content per source symbol. It may be noted that the entropy H depends on the probabilities of
the symbols in the alphabet of the source.
Example
Consider a discrete memory-less source with source alphabet {s0, s1, s2} with probabilities p0 = 1/4, p1 = 1/4 and p2 = 1/2. Find the entropy of the source.
Solution
The entropy of the given source is
H = p0log2(1/p0) + p1log2(1/p1) + p2log2(1/p2)
= ¼log2(4) + ¼log2(4) + ½log2(2)
= 2/4 + 2/4 + 1/2
= 1.5 bits
For a discrete memory-less source with a fixed alphabet:
• H = 0, if and only if the probability pk = 1 for some k, and the remaining probabilities in the set are all zero. This lower bound on the entropy corresponds to 'no uncertainty'.
• H = log2(K), if and only if pk = 1/K for all k (i.e. all the symbols in the alphabet are equiprobable). This upper bound on the entropy corresponds to 'maximum uncertainty'.
• 0 ≤ H ≤ log2(K), where K is the radix (number of symbols) of the alphabet S of the source.

Consider a binary source that emits one of two symbols, s0 or s1, with different probability assignments in the three cases below.
• In Case I, it is very easy to guess whether the message s0 with a probability of 0.01 will occur or the message s1 with a probability of 0.99 will occur (most of the time message s1 will occur). Thus, in this case, the uncertainty is less.
• In Case II, it is somewhat difficult to guess whether s0 or s1 will occur, as their probabilities are nearly equal. Thus, in this case, the uncertainty is more.
• In Case III, it is extremely difficult to guess whether s0 or s1 will occur, as their probabilities are equal. Thus, in this case, the uncertainty is maximum.
Entropy is less when uncertainty is less.
Entropy is more when uncertainty is more.
Thus, we can say that entropy is a measure of uncertainty.
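Evaluating the entropy formula for the three cases confirms this trend (the Case II probabilities of 0.4 and 0.6 are assumed here for illustration):
Case I: H = 0.01 log2(1/0.01) + 0.99 log2(1/0.99) ≈ 0.08 bits (least uncertainty)
Case II: H = 0.4 log2(1/0.4) + 0.6 log2(1/0.6) ≈ 0.97 bits
Case III: H = 0.5 log2(1/0.5) + 0.5 log2(1/0.5) = 1 bit (maximum uncertainty)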

An analog signal is band limited to B Hz, sampled at the Nyquist rate, and the samples are quantized into 4 levels. The quantization levels Q1, Q2, Q3 and Q4 (messages) are assumed independent and occur with probabilities P1 = P4 = 1/8 and P2 = P3 = 3/8. Find the information rate of the source.
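A sketch of the solution, assuming the probabilities given above (P1 = P4 = 1/8 and P2 = P3 = 3/8):
H = 2 × (1/8) log2(8) + 2 × (3/8) log2(8/3)
  = 0.75 + 1.06
  ≈ 1.81 bits/message
Sampling at the Nyquist rate gives r = 2B messages per second, so the information rate is
R = r × H = 2B × 1.81 ≈ 3.6B bits/s.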
Relation between Entropy and Mutual Information
Mutual Information: quantifies the amount of information that knowing one random variable Y gives about another random variable X. It is a measure of how much the uncertainty in X is reduced by knowing Y.
SHANNON-FANO CODING:
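The Shannon-Fano procedure sorts the symbols in decreasing order of probability and recursively splits the list into two groups whose total probabilities are as nearly equal as possible, prefixing the codes of one group with 0 and the other with 1. A minimal Python sketch of this idea (the symbol set and function names are illustrative):

def shannon_fano(symbols):
    """symbols: list of (symbol, probability) pairs; returns {symbol: code}."""
    symbols = sorted(symbols, key=lambda sp: sp[1], reverse=True)
    codes = {s: "" for s, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        best, cut = float("inf"), 1
        # Find the split point where the two halves are closest in total probability.
        for i in range(1, len(group)):
            left = sum(p for _, p in group[:i])
            diff = abs(total - 2 * left)
            if diff < best:
                best, cut = diff, i
        for s, _ in group[:cut]:
            codes[s] += "0"
        for s, _ in group[cut:]:
            codes[s] += "1"
        split(group[:cut])
        split(group[cut:])

    split(symbols)
    return codes

print(shannon_fano([("s2", 0.5), ("s0", 0.25), ("s1", 0.25)]))
# e.g. {'s2': '0', 's0': '10', 's1': '11'}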
Lempel-Ziv-Welch Coding
A drawback of the Huffman code is that it requires knowledge of a probabilistic model of the source; unfortunately, in practice, source statistics are not always known a priori, thereby compromising the efficiency of the code. To overcome these practical limitations, we may use the Lempel-Ziv algorithm, which is intrinsically adaptive and simpler to implement than Huffman coding.

A key to file data compression is to have repetitive patterns of data so that a pattern seen once can be encoded into a compact code symbol, which is then used to represent the pattern whenever it reappears in the file. For example, in images, consecutive scan lines (rows) of the image may be identical; they can then be encoded with a simple code character that represents the lines. In text processing, repetitive words, phrases, and sentences may also be recognized and represented as a code. A typical file data compression algorithm is known as LZW (Lempel-Ziv-Welch) encoding. Variants of this algorithm are used in many file compression schemes, such as GIF files. These are lossless compression algorithms in which no data is lost, and the original file can be entirely reconstructed from the encoded message file.

The LZW algorithm is a greedy algorithm in that it tries to recognize increasingly longer phrases that are repetitive, and encodes them. Each phrase is defined to have a prefix that is equal to a previously encoded phrase plus one additional character of the alphabet. Note that "alphabet" means the set of legal characters in the file. For a normal text file, this is the ASCII character set; for a gray-level image with 256 gray levels, it is an 8-bit number that represents the pixel's gray level.

In many texts, certain sequences of characters occur with high frequency. In English, for example, the word "the" occurs more often than any other sequence of three letters, with "and", "ion" and "ing" close behind. If we include the space character, there are other very common sequences, including longer ones like "of the". Although it is impossible to improve on Huffman encoding with any method that assigns a fixed encoding to each character, we can do better by encoding entire sequences of characters with just a few bits. The method of this section takes advantage of frequently occurring character sequences of any length. It typically produces an even smaller representation than is possible with Huffman trees, and, unlike basic Huffman encoding, it (1) reads through the text only once and (2) requires no extra space for overhead in the compressed representation.

The algorithm makes use of a dictionary that stores character sequences chosen dynamically from the text. With each character sequence the dictionary associates a number; if s is a character sequence, we use codeword(s) to denote the number assigned to s by the dictionary. The number codeword(s) is called the code or code number of s. All codes have the same length in bits; a typical code size is twelve bits, which permits a maximum dictionary size of 2^12 = 4096 character sequences.
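A minimal Python sketch of the encoding loop described above (for simplicity, the dictionary here starts from the characters actually present in the input and grows without bound, rather than using fixed 12-bit codes):

def lzw_encode(text):
    """Greedy LZW: emit the code of the longest dictionary match, then extend the dictionary."""
    # Initial dictionary: every single character that appears in the input.
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    next_code = len(dictionary)
    phrase, output = "", []
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch                          # keep growing the current match
        else:
            output.append(dictionary[phrase])     # emit code of the longest known phrase
            dictionary[phrase + ch] = next_code   # new phrase = known prefix + one extra character
            next_code += 1
            phrase = ch
    if phrase:
        output.append(dictionary[phrase])
    return output, dictionary

codes, _ = lzw_encode("ABABABA")
print(codes)   # [0, 1, 2, 4] with A->0, B->1, AB->2, BA->3, ABA->4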
