Week 2
Convexity and Concavity of Information Measures
Often we need to maximize or minimize information measures in different contexts.
As we know, convexity and concavity guarantee that a local minimum/maximum is also a global one.
Concavity of H(X)
We have already seen this for a Bernoulli RV, but not for an arbitrary pmf. To show it, we take two random variables X1 and X2 with pmfs p1 and p2 defined over the same sample space.
Since H(X1) is a function of the pmf p1, we may represent it as H(p1). We need to show that
H(λp1 + (1 − λ)p2) ≥ λH(p1) + (1 − λ)H(p2) for λ ∈ [0, 1]
Let us define Z = Xθ, where θ = 1 with probability λ and θ = 2 with probability 1 − λ. Then
P[Z = z] = Σθ P[Z = z, θ] = P[Z = X1 = z | θ = 1] P[θ = 1] + P[Z = X2 = z | θ = 2] P[θ = 2] = λp1(z) + (1 − λ)p2(z).
As H(Z) ≥ H(Z | θ), it follows that H(λp1 + (1 − λ)p2) ≥ λH(p1) + (1 − λ)H(p2).
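A minimal numerical sanity check of this concavity, assuming two arbitrary example pmfs p1 and p2 (not taken from the slides):

import numpy as np

def entropy(p):
    # Shannon entropy in bits; terms with p = 0 contribute 0
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

p1 = np.array([0.5, 0.3, 0.2])   # assumed example pmf
p2 = np.array([0.1, 0.1, 0.8])   # assumed example pmf

for lam in np.linspace(0.0, 1.0, 11):
    mix = lam * p1 + (1 - lam) * p2
    # concavity: entropy of the mixture dominates the mixture of the entropies
    assert entropy(mix) >= lam * entropy(p1) + (1 - lam) * entropy(p2) - 1e-12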
Concavity of MI
Later, we will see that the capacity of a communication channel defined by p(y|x) is given as:
C = max_{p(x)} I(X; Y)
What is the guarantee that we will be able to obtain such a p(x)?
Lemma: Convexity/concavity is preserved under affine transformations.
Affine transformations are of the form Ax + b, where x and b are vectors. Show that if f(x) is convex, so is f(Ax + b).
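One way to see this: for λ ∈ [0, 1],
f(A(λx + (1 − λ)y) + b) = f(λ(Ax + b) + (1 − λ)(Ay + b)) ≤ λ f(Ax + b) + (1 − λ) f(Ay + b),
so g(x) = f(Ax + b) is convex; the concave case follows with the inequality reversed.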
Concavity of I(X; Y) for fixed p(y|x)
I(X; Y) = H(Y) − H(Y|X) = −Σy p(y) log p(y) − Σx p(x) H(Y|X = x)
p(y|x) is fixed, so we only need to worry about how I(X; Y) behaves w.r.t. p(x).
p(y) = Σx p(x) p(y|x). As p(y|x) is fixed, p(y) is a linear function of p(x). H(Y) is a concave function of p(y), which is a linear function of p(x), and hence H(Y) is a concave function of p(x). For fixed p(y|x), each H(Y|X = x) is a constant, so the second term is a linear function of p(x). A linear function is both concave and convex (it satisfies both defining inequalities with equality). So Concave − Convex(linear) = Concave + Concave = Concave. (Why?)
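As a concrete sketch (assuming a binary symmetric channel with crossover probability 0.1 as the fixed p(y|x); the example is not from the slides), one can check concavity of I(X; Y) in p(x) numerically and see that the maximizing input is uniform:

import numpy as np

def mutual_information(px, pyx):
    # I(X;Y) in bits for input pmf px and channel matrix pyx[x, y] = p(y|x)
    pxy = px[:, None] * pyx            # joint p(x, y)
    py = pxy.sum(axis=0)               # output pmf p(y)
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log2(pxy[mask] / (px[:, None] * py[None, :])[mask]))

pyx = np.array([[0.9, 0.1],            # p(y|x = 0)
                [0.1, 0.9]])           # p(y|x = 1)

def I(a):                              # I(X;Y) as a function of p(x) = (a, 1 - a)
    return mutual_information(np.array([a, 1 - a]), pyx)

a1, a2, lam = 0.2, 0.7, 0.4
# concavity along the segment between the two input pmfs
assert I(lam * a1 + (1 - lam) * a2) >= lam * I(a1) + (1 - lam) * I(a2) - 1e-12
print(I(0.5))                          # ≈ 0.531 bits = 1 − H(0.1), the capacity of this channel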
Data Processing Inequality
Markov Chain
A sequence of random variables is said to be a Markov chain if any RV in the chain depends only on the previous one. Graphically, let X, Y, Z be 3 RVs; X → Y → Z is a Markov chain if
P(Z|X, Y) = P(Z|Y) ⇒ P(X, Y, Z) = P(X)P(Y, Z|X) = P(X)P(Y|X)P(Z|Y, X) = P(X)P(Y|X)P(Z|Y)
We can also say that, given Y, X and Z are independent, as:
P(X, Z|Y) = P(X|Y)P(Z|X, Y) = P(X|Y)P(Z|Y)
Hence, I(X; Z|Y) = 0
I(X; Y) ≥ I(X; Z)
(Prove this by expanding I(X; Y, Z) = I(X; Z) + I(X; Y|Z) = I(X; Y) + I(X; Z|Y).)
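A minimal numerical check of the data processing inequality on an assumed Markov chain X → Y → Z with binary alphabets (the kernels below are arbitrary examples, not from the slides):

import numpy as np

def mi(pab):
    # I(A;B) in bits from a joint pmf pab[a, b]
    pa = pab.sum(axis=1, keepdims=True)
    pb = pab.sum(axis=0, keepdims=True)
    mask = pab > 0
    return np.sum(pab[mask] * np.log2(pab[mask] / (pa * pb)[mask]))

px  = np.array([0.3, 0.7])                     # p(x)
pyx = np.array([[0.8, 0.2], [0.25, 0.75]])     # p(y|x)
pzy = np.array([[0.9, 0.1], [0.4, 0.6]])       # p(z|y)

pxy = px[:, None] * pyx                        # p(x, y)
pxz = pxy @ pzy                                # p(x, z) = Σ_y p(x, y) p(z|y)

assert mi(pxy) >= mi(pxz) - 1e-12              # I(X;Y) ≥ I(X;Z)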
Fano’s Inequality
We may think of a communication system as X → Y → g(Y) = X̂.
We are interested in Pe = P[X̂ ̸= X] and its bounds.
Let E = 1 if X̂ ̸= X (w.p. Pe) and E = 0 otherwise (w.p. 1 − Pe).
H(E) = H(Pe) = −[Pe log Pe + (1 − Pe) log (1 − Pe)]
H(E, X | X̂) = H(X | X̂) + H(E | X, X̂), where H(E | X, X̂) = 0
            = H(E | X̂) + H(X | E, X̂), where H(E | X̂) ≤ H(E) = H(Pe) and H(X | E, X̂) ≤ Pe log |X|
Comparing the two expansions gives Fano's inequality: H(X | X̂) ≤ H(Pe) + Pe log |X|.
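A minimal numerical check of the bound H(X | X̂) ≤ H(Pe) + Pe log |X|, using an assumed joint pmf over (X, X̂) with |X| = 3 (the numbers are an arbitrary example, not from the slides):

import numpy as np

def h2(p):
    # binary entropy in bits (0·log 0 := 0)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

pxxhat = np.array([[0.30, 0.03, 0.02],   # rows: x, columns: xhat
                   [0.04, 0.25, 0.01],
                   [0.02, 0.03, 0.30]])

pe = 1.0 - np.trace(pxxhat)              # P[Xhat != X]
pxhat = pxxhat.sum(axis=0)               # p(xhat)

# H(X | Xhat) = -Σ_{x, xhat} p(x, xhat) log p(x | xhat)
cond = pxxhat / pxhat[None, :]
mask = pxxhat > 0
H_X_given_Xhat = -np.sum(pxxhat[mask] * np.log2(cond[mask]))

assert H_X_given_Xhat <= h2(pe) + pe * np.log2(3) + 1e-12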
Data Compression: Source Coding
X    Singular    Non-singular, but not    Uniquely decodable,      Instantaneous
                 uniquely decodable       but not instantaneous
1    0           0                        10                       0
2    0           010                      00                       10
3    0           01                       11                       110
4    0           10                       110                      111
Code Classes
Singular: Many-to-one mapping
Non-singular: One-to-one mapping
Uniquely Decodable: The extension of the code is non-singular (may need the following bits to decode)
Instantaneous/Prefix: Does not require the next bits to decode (self-punctuating/self-decoding/prefix-free)
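As a small sketch, one can verify the prefix property and the Kraft sums (with D = 2) for the four codes in the table above; the code below is an illustration, not from the slides:

def kraft_sum(code, D=2):
    # Σ_i D^(-l_i) over the codeword lengths
    return sum(D ** (-len(w)) for w in code.values())

def is_prefix_free(code):
    # True iff no codeword is a prefix of another codeword
    words = list(code.values())
    return not any(i != j and words[j].startswith(words[i])
                   for i in range(len(words)) for j in range(len(words)))

codes = {
    "singular":           {1: "0",  2: "0",   3: "0",   4: "0"},
    "non-singular":       {1: "0",  2: "010", 3: "01",  4: "10"},
    "uniquely decodable": {1: "10", 2: "00",  3: "11",  4: "110"},
    "instantaneous":      {1: "0",  2: "10",  3: "110", 4: "111"},
}

for name, code in codes.items():
    print(name, kraft_sum(code), is_prefix_free(code))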
The expected length L of any instantaneous D-ary code for a random variable X is greater than or equal to the entropy HD(X): L ≥ HD(X), with equality iff D^(−li) = pi.
Proof: Let L = Σi pi li be the expected length, and let ri = D^(−li) / Σj D^(−lj).
L − HD(X) = Σi pi li + Σi pi logD pi
          = −Σi pi logD D^(−li) + Σi pi logD pi
          = Σi pi logD (pi / ri) − logD Σi D^(−li)
          ≥ 0   [∵ Gibbs' inequality and Kraft's inequality]
Equality occurs iff D(p||r) = 0 and Σi D^(−li) = 1, which happens when pi = D^(−li) ⇒ li = −logD pi ∈ Z+ ∀i
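As a quick worked check (assuming, for illustration, the dyadic pmf p = (1/2, 1/4, 1/8, 1/8), which is not given in the slides): the instantaneous code in the table has lengths li = (1, 2, 3, 3), so L = 1/2·1 + 1/4·2 + 1/8·3 + 1/8·3 = 1.75 bits, while H2(X) = 1/2·1 + 1/4·2 + 1/8·3 + 1/8·3 = 1.75 bits. Here pi = 2^(−li) for every i, so L = H2(X), exactly the equality case of the bound.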
Upper Bound on Length
As we see, li = logD(1/pi) may not be an integer. The next best choice is li = ⌈logD(1/pi)⌉.
If we choose this li, we first check whether it satisfies Kraft's inequality.
Of course,
logD(1/pi) ≤ ⌈logD(1/pi)⌉ < logD(1/pi) + 1
D^(−li) = D^(−⌈logD(1/pi)⌉) ≤ D^(−logD(1/pi)) = pi
and hence, Σi D^(−⌈logD(1/pi)⌉) ≤ Σi D^(−logD(1/pi)) = Σi pi = 1
So, this choice of li satisfies Kraft's inequality, and by taking expectations over the first chain of inequalities, we have
E[logD(1/pi)] ≤ E[⌈logD(1/pi)⌉] < E[logD(1/pi)] + 1
∴ HD(X) ≤ L < HD(X) + 1
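A minimal sketch of this construction (the pmf below is an assumed example, not from the slides): compute li = ⌈log2(1/pi)⌉, then verify the Kraft inequality and the bound H ≤ L < H + 1:

import math

p = [0.45, 0.25, 0.20, 0.10]                      # assumed example pmf, D = 2
lengths = [math.ceil(math.log2(1 / pi)) for pi in p]

kraft = sum(2 ** (-li) for li in lengths)
L = sum(pi * li for pi, li in zip(p, lengths))
H = -sum(pi * math.log2(pi) for pi in p)

assert kraft <= 1.0                               # Kraft's inequality holds
assert H <= L < H + 1                             # H(X) ≤ L < H(X) + 1
print(lengths, round(L, 3), round(H, 3))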
Huffman Codes
We know li* = −logD pi is the optimal prefix code length, which might not be realizable since it need not be an integer. The code obtained by taking the ceiling has bounded expected length but is of course not optimal, so a question arises: is there an optimal realizable prefix code?
Huffman Codes
Given an alphabet, Huffman came up with a simple algorithm that generates a prefix code that is optimal, in the sense that no other prefix code over the same alphabet has a smaller expected length.
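A compact sketch of the binary (D = 2) Huffman construction using a heap; the pmf is an assumed example, not from the slides:

import heapq

def huffman_code(pmf):
    # returns a dict symbol -> binary codeword for a pmf given as {symbol: prob}
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)            # two least-probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

pmf = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}     # assumed example
code = huffman_code(pmf)
L = sum(pmf[s] * len(w) for s, w in code.items())  # expected length
print(code, L)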
Optimality Proof
The complete proof is provided in the book; the ideas are presented briefly here.
There are many optimal codes; the Huffman code is one of them.
WLOG the pmf p is ordered so that p1 ≥ p2 ≥ ··· ≥ pm. A code is optimal if Σi pi li is minimal.
Figure (Expansion): L*(p) → L(p′) via HR [(a) → (c)], and L*(p′) → L(p) via Expansion.