DC-PPT 5

The document covers key concepts in digital communication, including source coding theorems, optimal codes, and various coding techniques such as Huffman coding and prefix codes. It explains the characteristics of discrete memoryless sources, measures of information, entropy, and the differences between lossy and lossless compression. Additionally, it discusses the efficiency and redundancy of coding schemes, along with mathematical examples to illustrate Huffman coding and its applications.

DIGITAL COMMUNICATION

ETE 3111

1
LEARNING OUTCOME

• Source Coding Theorem


• Optimal Codes
• Prefix codes
• Huffman codes
• Shannon-Fano-Elias Codes

2
DISCRETE MEMORYLESS SOURCE

• A discrete memoryless source (DMS) is a source of information in which the probability of each emitted symbol does not depend on previously emitted symbols. It is a discrete source that produces a sequence of symbols drawn from a finite alphabet.

3
CHARACTERISTICS OF A DMS

• Alphabet: The set of symbols that the source can produce


• Symbol probabilities: The probability of each symbol in the alphabet
• Symbol rate: The number of symbols produced per second
• Entropy: The average amount of information delivered by the source
• Information rate: The product of the entropy and the average number of
symbols produced per second
• Redundancy: The difference between the actual entropy and the maximum
entropy

4
MEASURE OF INFORMATION

• It is the information content of a message.
• Consider an information source emitting independent messages m = {m1, m2, ..., mn} with probabilities of occurrence P = {P1, P2, ..., Pn}.
• Here, P1 + P2 + ... + Pn = 1.
• The amount of information carried by message mk is given by
• Ik = log2(1/Pk) = log10(1/Pk) / log10 2 bits.
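As an illustration, here is a minimal Python sketch of this formula (the helper name information_content is assumed, not from the slides):

import math

def information_content(p):
    # Amount of information, in bits, carried by a message of probability p.
    return math.log2(1 / p)

# Example: a message with probability 1/4 carries 2 bits of information.
print(information_content(0.25))  # 2.0
print(information_content(0.5))   # 1.0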

5
PROPERTIES OF INFORMATION

• The more uncertain (less probable) a message is, the more information it carries.
If P1 = 1/4 and P2 = 1/2,
then I1 = log2(1/P1) = log2 4 = 2 log2 2 = 2 bits
and I2 = log2(1/P2) = log2 2 = 1 bit.
• If the receiver already knows the message being transmitted, the information is zero.
In this case P = 1, so
I = log2(1/P) = log2 1 = 0 bits.

6
ENTROPY

• Entropy is the average information per message of a source. For a random variable X, the entropy is denoted H(X).
• Hence, H(X) = ∑ Pi Ii
= ∑ Pi log2(1/Pi)
= - ∑ Pi log2 Pi
Unit: bits/message or bits/symbol.
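A minimal Python sketch of this definition (the function name entropy is assumed for illustration):

import math

def entropy(probabilities):
    # H(X) = -sum(Pi * log2(Pi)), in bits per message.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Example: a source with message probabilities {1/2, 1/4, 1/4}.
print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits/message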

7
INFORMATION RATE

• The information rate is denoted by R and is given by
Information rate: R = r H
where H is the entropy and r is the rate at which messages are generated.

The information rate R is expressed as the average number of bits of information per second and is calculated as follows:
R = r (messages/second) × H (bits/message)
R = rH bits/second
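A quick numerical illustration in Python (the figures r = 1000 messages/second and H = 1.5 bits/message are assumed, not taken from the slides):

r = 1000       # messages per second (assumed)
H = 1.5        # entropy in bits per message (assumed)
R = r * H      # information rate
print(R)       # 1500.0 bits/second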

8
LOSSY VS LOSSLESS COMPRESSION

• Lossy compression reduces file size by permanently removing some data from the original file, yielding a smaller file of lower quality. Lossless compression reduces file size without discarding any information, allowing perfect reconstruction of the original data, although the compressed file is usually larger than the lossy equivalent.

9
LOSSY VS LOSSLESS COMPRESSION

Key differences:
Data loss:
• Lossy compression discards some data, leading to quality loss, while lossless
compression preserves all data, maintaining original quality.
File size:
• Lossy compression typically creates much smaller files compared to lossless
compression due to the discarded data.

10
LOSSY VS LOSSLESS COMPRESSION

Applications:
• Lossy compression is ideal for multimedia like images and videos where some
quality loss is acceptable for smaller file sizes, while lossless compression is
preferred for data where preserving exact information is crucial, like text
documents.

Examples:
• Lossy: JPEG image format
• Lossless: PNG image format, ZIP archive

11
AVERAGE CODEWORD LENGTH

• The average codeword length is the probability-weighted average of the lengths of the codewords in a code. It is a measurement used in coding theory to evaluate the efficiency of a coding scheme.
How is it used?
• In data compression, a shorter average codeword length can lead to better
compression ratios.
• In transmission, a shorter average codeword length means fewer bits are sent per symbol on average, which allows faster data transmission rates.

12
AVERAGE CODEWORD LENGTH

• How is it calculated?
• The average codeword length, denoted Lavg, is given by
Lavg = ∑ (Pi × li), where Pi is the probability of the i-th symbol and li is the length of its corresponding codeword.
• Example
• If there are 25 codewords, each five bits long and each with probability 0.04, the average codeword length is 25 × 0.04 × 5 = 5 bits.
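The same calculation as a short Python sketch (the helper name is assumed):

def average_codeword_length(probabilities, lengths):
    # Lavg = sum(Pi * li) over all codewords.
    return sum(p * l for p, l in zip(probabilities, lengths))

# The slide's example: 25 codewords, each 5 bits long, each with probability 0.04.
probs = [0.04] * 25
lens = [5] * 25
print(round(average_codeword_length(probs, lens), 6))  # 5.0 bits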

13
AVERAGE CODEWORD LENGTH

Calculate average codeword length.

14
PREFIX CODE

• A prefix code is a code system in which no codeword is a prefix of another codeword. This property makes prefix codes useful for data compression and searching.

• Other names
• Prefix codes are also known as prefix-free codes, prefix condition codes, and instantaneous codes; they are sometimes loosely called Huffman codes, although Huffman's algorithm is only one way of constructing them.

15
PREFIX CODE

16
PREFIX CODE

How does it work?


• In a prefix code, no code word can be derived from another by adding bits to a
shorter code word.
• Prefix codes are represented by binary trees, where each leaf corresponds to a
symbol.
• The left and right branches of the tree are coded by 0 and 1, respectively.
• The binary code word for a symbol is the sequence of 0s and 1s that
corresponds to the path from the root to the leaf.
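To make the tree-walk concrete, here is a minimal Python sketch; the code table {A: 0, B: 10, C: 11} is an assumed example, not taken from the slides:

# Hypothetical prefix code: no codeword is a prefix of another.
code = {"A": "0", "B": "10", "C": "11"}
decode_table = {v: k for k, v in code.items()}

def decode(bits):
    # Accumulate bits until they match a codeword; the prefix property
    # guarantees the match is unambiguous and immediate.
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in decode_table:
            symbols.append(decode_table[buffer])
            buffer = ""
    return "".join(symbols)

print(decode("010110"))  # decodes to "ABCA"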

17
PREFIX CODE

Why are prefix codes useful?


• Prefix codes are useful because they guarantee unique decodability and
instantaneous decoding.

• For example, if you have a sequence of values that are encoded using a prefix
code, you can pick out each value without needing to know where one value
starts and ends.

18
INSTANTANEOUS CODE/ PREFIX CODE

• An instantaneous code is a code in which each codeword can be decoded as soon as it is received. This means that you don't need to wait for later symbols to decode the current symbol.

19
INSTANTANEOUS CODE/ PREFIX CODE

Explanation
• Instantaneous codes are also known as prefix codes.
• A code is instantaneous if no codeword is a prefix of any other codeword.
• This means that you can decode each codeword in a sequence without
needing to refer to any later codewords.
• You can express instantaneous codes using code trees.
• Instantaneous codes are useful because you can decode each codeword as
soon as it's received.

20
FIXED-LENGTH ENCODING VS VARIABLE
LENGTH ENCODING

• In digital communication, "fixed-length encoding" means each piece of data (such as a character or symbol) is represented by a code with the same number of bits. "Variable-length encoding" allows different data elements to be represented by codes of varying bit lengths, which often gives better compression when some symbols occur more frequently than others: frequently used symbols get shorter codes.

21
EXAMPLES OF FLC & VLC

• Fixed-Length Coding: This technique uses a constant number of bits to represent each symbol from the source. Although simple, it is not always the optimal choice for minimizing the number of bits required. For instance, if we have three colors (Red, Green, and Blue), each occurring with equal probability, a fixed-length approach would use 2 bits per symbol.
• ASCII is a fixed-length code: the ASCII standard uses 7 bits per character.

22
EXAMPLES OF FLC & VLC

• Variable-Length Coding: In contrast, variable-length coding assigns codewords of different lengths to different symbols based on their probabilities. More frequent symbols get shorter codewords, and less frequent ones get longer codewords. This method can bring us closer to the theoretical minimum, the source's entropy. For the same three-color example, a variable-length code such as {0, 10, 11} reduces the requirement to about 1.67 bits per symbol, close to the entropy of log2 3 ≈ 1.585 bits.
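A short Python sketch comparing the two approaches for the three-color example (the variable-length code {0, 10, 11} is an assumed Huffman-style assignment):

import math

probs = {"Red": 1/3, "Green": 1/3, "Blue": 1/3}   # the slide's example

flc_bits = 2                                      # fixed length: 2 bits cover 3 symbols

vlc = {"Red": "0", "Green": "10", "Blue": "11"}   # assumed variable-length code
vlc_bits = sum(p * len(vlc[s]) for s, p in probs.items())

print(flc_bits)                    # 2 bits/symbol
print(round(vlc_bits, 3))          # 1.667 bits/symbol
print(round(math.log2(3), 3))      # 1.585 bits/symbol, the entropy lower bound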

23
OPTIMAL CODES

• An optimal code is one that achieves the shortest possible average codeword length for a given source. Such codes are built using algorithms like Huffman's.

24
SOURCE CODING THEOREM

• The source coding theorem, also known as Shannon's source coding theorem (or the noiseless coding theorem), states that the minimum average number of bits required to represent information from a source (such as text, audio, or video) is equal to its entropy. No compression technique can reliably represent the data using fewer bits per symbol than the entropy of the source without losing information; the theorem therefore sets a theoretical limit on how much data can be compressed without loss.
• The average codeword length Lavg for any distortionless source encoding is bounded as follows:
• H(X) ≤ Lavg

25
KEY POINTS ABOUT THE SOURCE
CODING THEOREM
Entropy as the limit:
The entropy of a source is a measure of its inherent uncertainty, and the source
coding theorem states that the average codeword length needed to represent
the source data cannot be less than the entropy.
Practical implications:
• This theorem is crucial in designing data compression algorithms, as it indicates that the best achievable compression encodes the data at a bit rate close to the source entropy.
Lossless compression:
• The source coding theorem applies primarily to lossless compression, where the original data can be perfectly reconstructed from the compressed version.

26
HUFFMAN CODING

• Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to the input characters, where the lengths of the assigned codes are based on the frequencies of the corresponding characters.
• The variable-length codes assigned to the input characters are prefix codes, meaning the codes (bit sequences) are assigned in such a way that the code assigned to one character is never the prefix of the code assigned to any other character. This is how Huffman coding ensures that there is no ambiguity when decoding the generated bitstream.

27
BACKGROUND OF HC

• In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code is Huffman coding, an algorithm developed by David A. Huffman while he was a Sc.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes".

28
HUFFMAN CODING

• There are two major parts in Huffman coding:

• Build a Huffman tree from the input characters.
• Traverse the Huffman tree and assign codes to the characters.

29
STEPS OF HUFFMAN CODING

The steps of Huffman coding are (a Python sketch follows this list):

• Count character frequency: Count how many times each character appears in the data.
• Sort by frequency: Order the characters by frequency; each unique character starts as a leaf node of the Huffman tree.
• Build the Huffman tree: Repeatedly remove the two least frequent nodes, combine them into a new node whose frequency is their sum, and re-insert it, until a single root node remains.
• Label the branches: Label the two branches leaving each internal node with 0 and 1.
• Read off the codes: The codeword for each character is the sequence of branch labels on the path from the root to its leaf, so the most frequent characters receive the shortest codes.
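The following is a minimal Python sketch of these steps; it uses the standard library heapq module as a priority queue, and the function and variable names are assumptions of this sketch, not names from the slides:

import heapq
from collections import Counter

def huffman_code(frequencies):
    # Build a binary Huffman code from a {symbol: frequency} mapping.
    # Each heap entry is (weight, tie_breaker, {symbol: partial_code}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                          # degenerate single-symbol source
        return {sym: "0" for sym in frequencies}
    counter = len(heap)                         # unique tie-breaker for merged nodes
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)     # least frequent node
        w2, _, codes2 = heapq.heappop(heap)     # second least frequent node
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Example usage on a short text.
text = "INFORMATION"
codes = huffman_code(Counter(text))
encoded = "".join(codes[ch] for ch in text)
print(codes)
print(encoded, len(encoded), "bits")

Because the two least frequent nodes are always combined first, frequent characters end up closer to the root and therefore receive shorter codewords.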

30
EFFICIENCY

• The merit of any code is measured by its average length Lavg in comparison to H(m) (the entropy, which is the minimum achievable average length).
• We define the code efficiency η (eta) as

• η = H(m) / Lavg
• For an r-ary code, Lavg is measured in r-ary digits, so the efficiency is η = H(m) / (Lavg log2 r); for a binary code this reduces to the expression above.

31
REDUNDANCY

• The redundancy γ (gamma) is defined as

• γ = 1 - η
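A small Python sketch combining the efficiency and redundancy definitions (the function name is assumed; the parameter r applies the r-ary form of the efficiency formula from the previous slide):

import math

def efficiency_and_redundancy(probabilities, lengths, r=2):
    # eta = H / (Lavg * log2 r); for a binary code (r = 2) this is H / Lavg.
    H = -sum(p * math.log2(p) for p in probabilities if p > 0)
    L_avg = sum(p * l for p, l in zip(probabilities, lengths))
    eta = H / (L_avg * math.log2(r))
    return eta, 1 - eta                # (efficiency, redundancy)

# Binary code with lengths {1, 2, 3, 3} for probabilities {1/2, 1/4, 1/8, 1/8}.
eta, gamma = efficiency_and_redundancy([0.5, 0.25, 0.125, 0.125], [1, 2, 3, 3])
print(eta, gamma)                      # 1.0 0.0, i.e. this code is 100 % efficient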

32
TOTAL NUMBERS OF ORIGINAL
MESSAGES REQUIREMENT

• For an r-ary code, we will have exactly r messages left in the last reduced set if
and only if, the total number of original messages is r+k(r-1), where k is an
integer.
• This is because each reduction decreases the number of messages by r-1.
• Hence, if there is a total of k reductions, the total number of original messages
must be r+k(r-1)
• In case the original messages do not satisfy this condition, we must add some
dummy messages with zero probability of occurrence until this condition is
fulfilled.
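A short Python sketch of this rule (the function name is assumed), computing how many zero-probability dummy messages must be added:

def dummy_messages_needed(n_messages, r):
    # Dummies needed so that the total equals r + k*(r - 1) for some integer k.
    remainder = (n_messages - r) % (r - 1)
    return 0 if remainder == 0 else (r - 1) - remainder

print(dummy_messages_needed(6, 4))  # 1: six messages, 4-ary code (cf. Problem 4 below)
print(dummy_messages_needed(7, 3))  # 0: seven messages, 3-ary code (cf. Problem 3 below)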

33
IN CASE OF EXTENSIONS

• For the N-th order extension of a source with m messages, the number of extended messages = m^N (i.e. (m-ary)^order).
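A minimal Python sketch that enumerates the messages of an extended source and their probabilities (the function name is assumed; the two-message source {0.8, 0.2} is the one used in Example 12.2 below):

from itertools import product

def source_extension(probabilities, n):
    # N-th order extension of a memoryless source: every length-n block of
    # messages, with probability equal to the product over the block.
    ext = {}
    for block in product(range(len(probabilities)), repeat=n):
        p = 1.0
        for i in block:
            p *= probabilities[i]
        ext[block] = p
    return ext

ext2 = source_extension([0.8, 0.2], 2)   # second-order extension (N = 2)
print(len(ext2))                         # 4 = 2**2 extended messages
print(ext2)                              # probabilities 0.64, 0.16, 0.16, 0.04 (up to rounding)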

34
EXAMPLE OF HUFFMAN CODING

• Ref book: Modern Digital and Analog Communication Systems by B.P.Lathi. (5th
Edition) chapter 12- Introduction to Information Theory.
• Example 12.1: A zero memory source emits six messages with probabilities
0.3, 0.25, 0.15, 0.12, 0.10 and 0.08. Find the 4-ary (Quaternary) Huffman code.
• Determine its average word length, the efficiency and the redundancy.

35
EXAMPLE OF HUFFMAN CODING

• Example 12.2: A memoryless source emits messages m1 and m2 with probabilities 0.8 and 0.2 respectively. Find the optimum (Huffman) binary code for this source as well as for its second and third order extensions (i.e. for N = 2 and 3). Determine the code efficiencies in each case.

36
MATHEMATICAL PROBLEMS

• Problem 1: A discrete memoryless source has an alphabet of seven symbols with probabilities for its output as described in the following table:

Symbol S0 S1 S2 S3 S4 S5 S6
Probability 0.25 0.25 0.125 0.125 0.125 0.0625 0.0625

• Compute the Huffman code for this source moving the ‘combined’ symbol as
high as possible. Explain why the computed source code has an efficiency of
100 percent.

37
SOLUTION- PROBLEM 1

Table 1:
Symbol Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 Stage 6
S0 0.25 0.25 0.25 0.25 0.50 0.50
S1 0.25 0.25 0.25 0.25 0.25 0.50
S2 0.125 0.125 0.25 0.25 0.25
S3 0.125 0.125 0.125 0.25
S4 0.125 0.125 0.125
S5 0.0625 0.125
S6 0.0625

38
SOLUTION: PROBLEM 1

Table 2:
Symbol Probability Codeword Codeword Length
S0 0.25 10 2
S1 0.25 11 2
S2 0.125 001 3
S3 0.125 010 3
S4 0.125 011 3
S5 0.0625 0000 4
S6 0.0625 0001 4

39
SOLUTION: PROBLEM 1

• Following the path indicated by the dotted line (in the original Huffman tree diagram) gives the codeword 10 for symbol S0.
• Similarly, we can obtain the codewords for the remaining symbols. These are listed in Table 2.
• Entropy, H= 2.625 bits/symbol
• Average codeword length, Lavg= 2.625 bits/symbol
• Efficiency, η = 100%. The efficiency is 100 percent because every symbol probability is a negative power of two, so each codeword length equals log2(1/Pi) exactly and Lavg coincides with the entropy H.
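The result can be verified with a few lines of Python (the probabilities come from the problem statement and the codeword lengths from Table 2):

import math

probs   = [0.25, 0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625]   # S0 ... S6
lengths = [2, 2, 3, 3, 3, 4, 4]                               # from Table 2

H     = -sum(p * math.log2(p) for p in probs)
L_avg = sum(p * l for p, l in zip(probs, lengths))

print(H)          # 2.625 bits/symbol
print(L_avg)      # 2.625 bits/symbol
print(H / L_avg)  # 1.0, i.e. 100 % efficiency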

40
MATHEMATICAL PROBLEMS

• Problem 2: A source emits seven messages with probabilities 1/2, 1/4, 1/8, 1/16, 1/32, 1/64 and 1/64 respectively. Find the entropy of the source. Obtain the compact binary code and find the average length of the codeword. Furthermore, determine the efficiency and the redundancy of the code.
• Solution: Problem 2:
• H = 1.96875 bits/message
• Lavg = 1.96875 binary digits/message
• η = 100% and redundancy = 0%

41
MATHEMATICAL PROBLEMS

• Problem 3: For the previous problem, obtain the compact 3-ary code and find the average length of the codeword. Determine the efficiency and redundancy of this code.
• Solution: Problem 3:
• H = 1.96875 bits/message
• Lavg = 1.3125 3-ary digits/message
• η = 94.64% and redundancy = 5.36%

42
MATHEMATICAL PROBLEMS

• Problem 4: A memoryless source emits six messages with probabilities 0.3, 0.25, 0.15, 0.12, 0.1 and 0.08. Find the 4-ary (Quaternary) Huffman code. Determine its average codeword length, the efficiency and the redundancy.

• Solution: Problem 4:
• Minimum number of messages required = r + k(r-1) = 4 + 1(4-1) = 7, where k is the number of reduction stages and r = 4 for 4-ary coding. Since the source emits only six messages, one dummy message with zero probability is added to reach seven.

43
MATHEMATICAL PROBLEMS

• Problem 5: A source produces 9 symbols (S1, S2, ..., S9). Construct ternary & quaternary Huffman codes by moving the combined symbols as high as possible. Also find the efficiency & redundancy of each code. The symbol probabilities are given by:
Pi = {0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11}

• Problem 6: For the text “INFORMATION THEORY”, find the 3-ary Huffman code and also determine the average length, entropy, code efficiency and redundancy.

44
