DC-PPT 5
ETE 3111
LEARNING OUTCOME
DISCRETE MEMORYLESS SOURCE
CHARACTERISTICS OF A DMS
MEASURE OF INFORMATION
PROPERTIES OF INFORMATION
ENTROPY
INFORMATION RATE
LOSSY VS LOSSLESS COMPRESSION
• Lossy compression reduces file size by permanently discarding some of the
original data, producing a smaller file at lower quality.
• Lossless compression reduces file size without losing any information, so the
original data can be perfectly reconstructed, although the compressed file is
usually larger than with lossy compression.
LOSSY VS LOSSLESS COMPRESSION
Key differences:
Data loss:
• Lossy compression discards some data, leading to quality loss, while lossless
compression preserves all data, maintaining original quality.
File size:
• Lossy compression typically creates much smaller files compared to lossless
compression due to the discarded data.
LOSSY VS LOSSLESS COMPRESSION
Applications:
• Lossy compression is ideal for multimedia like images and videos where some
quality loss is acceptable for smaller file sizes, while lossless compression is
preferred for data where preserving exact information is crucial, like text
documents.
Examples:
• Lossy: JPEG image format
• Lossless: PNG image format, ZIP archive
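• As a minimal illustration of the lossless round-trip property, the Python sketch
below uses the standard-library zlib module (the DEFLATE method also used in ZIP
and PNG); the sample data is only an assumed example:

    import zlib

    original = b"INFORMATION THEORY " * 100      # repetitive sample data (assumed example)
    compressed = zlib.compress(original)          # lossless DEFLATE compression
    restored = zlib.decompress(compressed)        # decompression recovers every byte

    print(len(original), "->", len(compressed), "bytes")
    print("Perfect reconstruction:", restored == original)   # True for any lossless codec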
AVERAGE CODEWORD LENGTH
• The average codeword length is the expected length of a codeword, averaged
over the symbol probabilities of the source. It is a measure used in coding
theory to evaluate the efficiency of a coding scheme.
How is it used?
• In data compression, a shorter average codeword length leads to a better
compression ratio.
• In data transmission, a shorter average codeword length means fewer code
digits per message and hence a higher effective transmission rate.
AVERAGE CODEWORD LENGTH
How is it calculated?
• The average codeword length, denoted Lavg, is given by:
Lavg = ∑ (Pi × li), where Pi is the probability of the i-th symbol and li is the
length of its corresponding codeword.
Example
• If a source has 25 codewords, each five bits long and each with probability
0.04, the average codeword length is 25 × (0.04 × 5) = 5 bits.
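• A minimal sketch of this calculation in Python; the probability and length
values are assumed for illustration:

    # Lavg = sum(Pi * li) over all symbols
    probabilities = [0.5, 0.25, 0.125, 0.125]   # Pi (assumed example source)
    lengths       = [1,   2,    3,     3]       # li, codeword length for each symbol

    l_avg = sum(p * l for p, l in zip(probabilities, lengths))
    print("Average codeword length:", l_avg, "bits")   # 1.75 bits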
PREFIX CODE
• Other names
• Prefix codes are also known as prefix-free codes, prefix condition codes, and
instantaneous codes; Huffman codes are a widely used example of prefix codes.
PREFIX CODE
• For example, if a sequence of values is encoded using a prefix code, you can
split the received bit stream back into individual codewords without any
separators, because no codeword is the beginning of another.
INSTANTANEOUS CODE/ PREFIX CODE
Explanation
• Instantaneous codes are also known as prefix codes.
• A code is instantaneous if no codeword is a prefix of any other codeword.
• This means that you can decode each codeword in a sequence without
needing to refer to any later codewords.
• You can express instantaneous codes using code trees.
• Instantaneous codes are useful because you can decode each codeword as
soon as it's received.
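• A minimal sketch of instantaneous decoding in Python; the codebook is an
assumed example of a prefix-free code:

    # No codeword below is a prefix of any other codeword
    code = {"a": "0", "b": "10", "c": "110", "d": "111"}
    decode = {cw: sym for sym, cw in code.items()}

    def decode_stream(bits):
        """Decode left to right, emitting each symbol as soon as its codeword is complete."""
        out, buffer = [], ""
        for bit in bits:
            buffer += bit
            if buffer in decode:           # a match can be emitted immediately,
                out.append(decode[buffer]) # with no need to look at later bits
                buffer = ""
        return "".join(out)

    print(decode_stream("010110111"))   # -> "abcd"  (0 | 10 | 110 | 111)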
FIXED-LENGTH ENCODING VS VARIABLE-LENGTH ENCODING
EXAMPLES OF FLC & VLC
OPTIMAL CODES
• An optimal code is one that achieves the shortest possible average codeword
length for a given set of symbol probabilities. Such codes are built using
algorithms like Huffman's.
SOURCE CODING THEOREM
KEY POINTS ABOUT THE SOURCE CODING THEOREM
Entropy as the limit:
• The entropy of a source measures its inherent uncertainty, and the source
coding theorem states that the average codeword length needed to represent
the source data cannot be less than the entropy.
Practical implications:
• This theorem is crucial in designing data compression algorithms: the best
achievable lossless encoding of the data uses an average bit rate that
approaches, but cannot fall below, the source entropy.
Lossless compression:
• The source coding theorem applies primarily to lossless compression, where
the original data can be perfectly reconstructed from the compressed version.
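• A minimal sketch of the entropy bound in Python; the source distribution is
assumed for illustration:

    import math

    p = [0.5, 0.25, 0.125, 0.125]   # assumed example symbol probabilities

    # H = -sum(Pi * log2 Pi): the lower limit on the average number of bits per
    # symbol that any lossless code for this source can achieve
    H = -sum(pi * math.log2(pi) for pi in p)
    print("Entropy H =", H, "bits/symbol")   # 1.75 bits/symbol

• With the codeword lengths used in the earlier average-length sketch (1, 2, 3, 3),
Lavg equals this entropy, so that code already attains the limit.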
HUFFMAN CODING
BACKGROUND OF HC
STEPS OF HUFFMAN CODING
EFFICIENCY
• η = H(m) / Lavg
REDUNDANCY
• γ = 1 - η
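• A minimal sketch of these two formulas in Python, using assumed values for
the entropy and the average codeword length:

    H    = 1.75   # source entropy, bits/symbol (assumed example)
    Lavg = 2.00   # average codeword length of some code, bits/symbol (assumed)

    eta   = H / Lavg    # code efficiency
    gamma = 1 - eta     # redundancy

    print(f"Efficiency = {eta:.2%}, Redundancy = {gamma:.2%}")   # 87.50%, 12.50%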
TOTAL NUMBER OF ORIGINAL MESSAGES REQUIRED
• For an r-ary code, we will have exactly r messages left in the last reduced set
if and only if the total number of original messages is r + k(r-1), where k is an
integer.
• This is because each reduction decreases the number of messages by r-1.
• Hence, if there are k reductions in total, the total number of original messages
must be r + k(r-1).
• If the original messages do not satisfy this condition, we must add dummy
messages with zero probability of occurrence until this condition is fulfilled
(see the sketch below).
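• A minimal sketch of this rule in Python; the function name and the 6-message,
4-ary example are assumptions for illustration:

    def dummy_messages_needed(num_messages, r):
        """Smallest number of zero-probability dummy messages to add so that the
        total has the form r + k(r-1) for some integer k >= 0."""
        extra = 0
        while num_messages + extra < r or (num_messages + extra - r) % (r - 1) != 0:
            extra += 1
        return extra

    # Example: 6 original messages, 4-ary code.  6 is not of the form 4 + 3k,
    # but 7 = 4 + 1*(4-1) is, so one dummy message must be added.
    print(dummy_messages_needed(6, 4))   # -> 1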
IN CASE OF EXTENSIONS
EXAMPLE OF HUFFMAN CODING
• Ref. book: Modern Digital and Analog Communication Systems by B. P. Lathi
(5th Edition), Chapter 12: Introduction to Information Theory.
• Example 12.1: A zero-memory source emits six messages with probabilities
0.3, 0.25, 0.15, 0.12, 0.10 and 0.08. Find the 4-ary (quaternary) Huffman code.
• Determine its average word length, the efficiency and the redundancy.
MATHEMATICAL PROBLEMS
• Problem 1: A source emits seven symbols with the probabilities given below.
Symbol:      S0    S1    S2     S3     S4     S5      S6
Probability: 0.25  0.25  0.125  0.125  0.125  0.0625  0.0625
• Compute the Huffman code for this source, moving the 'combined' symbol as
high as possible. Explain why the computed source code has an efficiency of
100 percent.
SOLUTION: PROBLEM 1
Table 1:
Symbol   Stage 1   Stage 2   Stage 3   Stage 4   Stage 5   Stage 6
S0       0.25      0.25      0.25      0.25      0.50      0.50
S1       0.25      0.25      0.25      0.25      0.25      0.50
S2       0.125     0.125     0.25      0.25      0.25
S3       0.125     0.125     0.125     0.25
S4       0.125     0.125     0.125
S5       0.0625    0.125
S6       0.0625
SOLUTION: PROBLEM 1
Table 2:
Symbol   Probability   Codeword   Codeword Length
S0       0.25          10         2
S1       0.25          11         2
S2       0.125         001        3
S3       0.125         010        3
S4       0.125         011        3
S5       0.0625        0000       4
S6       0.0625        0001       4
SOLUTION: PROBLEM 1
• We followed the path indicated by the dotted line to obtain the codeword for
symbol S0 as 10.
• Similarly, we can obtain the codewords for the remaining symbols. These are
listed in Table 2.
• Entropy, H = 2.625 bits/symbol
• Average codeword length, Lavg = 2.625 bits/symbol
• Efficiency, η = H / Lavg = 100%
• The efficiency is 100% because every symbol probability is a negative power
of 2, so each codeword length equals -log2(Pi) exactly and Lavg = H.
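• A minimal sketch of binary Huffman coding in Python that reproduces these
figures; it uses a standard heap-based construction rather than the tabular
reduction above, so tie-breaking may produce different (but equally optimal)
codewords:

    import heapq, math

    probs = {"S0": 0.25, "S1": 0.25, "S2": 0.125, "S3": 0.125,
             "S4": 0.125, "S5": 0.0625, "S6": 0.0625}

    def huffman(p):
        """Return a binary Huffman codebook {symbol: codeword}."""
        heap = [(prob, i, {sym: ""}) for i, (sym, prob) in enumerate(p.items())]
        heapq.heapify(heap)
        count = len(heap)
        while len(heap) > 1:
            p1, _, c1 = heapq.heappop(heap)   # two least probable groups
            p2, _, c2 = heapq.heappop(heap)
            merged = {s: "0" + w for s, w in c1.items()}
            merged.update({s: "1" + w for s, w in c2.items()})
            heapq.heappush(heap, (p1 + p2, count, merged))
            count += 1
        return heap[0][2]

    code = huffman(probs)
    H    = -sum(p * math.log2(p) for p in probs.values())
    Lavg = sum(probs[s] * len(w) for s, w in code.items())
    print("H =", H, "Lavg =", Lavg, "efficiency =", H / Lavg)   # 2.625 2.625 1.0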
MATHEMATICAL PROBLEMS
• Problem 3: For the previous problem, obtain the compact 3-ary code and find
the average length of the codeword. Determine the efficiency and redundancy
of this code.
• Sol: Problem 3:
• H = 1.96875 bits/message
• Lavg = 1.3125 3-ary digits/message
• η = 94.64% and redundancy, γ = 5.36%
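• A minimal check of this efficiency figure in Python, converting the 3-ary
average length into bits:

    import math

    H    = 1.96875   # entropy, bits/message (from the solution above)
    Lavg = 1.3125    # average codeword length, 3-ary digits/message

    # each 3-ary digit carries log2(3) bits, so:
    eta = H / (Lavg * math.log2(3))
    print(f"Efficiency = {eta:.2%}, Redundancy = {1 - eta:.2%}")   # 94.64%, 5.36%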
MATHEMATICAL PROBLEMS
• Sol: Problem 4:
• Minimum no. of messages required = r + k(r-1) = 4 + 1(4-1) = 7, where k = no.
of reductions (here k = 1) and r = 4 for 4-ary coding.
MATHEMATICAL PROBLEMS
• Problem 6: For the text "INFORMATION THEORY", find the 3-ary Huffman code
and determine the average length, entropy, code efficiency and redundancy.