Advanced Digital Communication
Last time
Information Content: Review
h(x) = log2 (1 / p(x))
Entropy: Review
0 ≤ H(X) ≤ log|X|
This time
Outline
1 Decomposability of Entropy
2 Relative Entropy
3 Mutual Information
Definition
Joint and Conditional Mutual Information
4 Wrapping up
Decomposability of Entropy
Example 1 (Mackay, 2003)
p(X = 0) = 1/2
p(X = 1) = 1/4
p(X = 2) = 1/4
Decomposability of Entropy
Example 1 (Mackay, 2003) — Cont’d
By definition,
H(X) = (1/2) log 2 + (1/4) log 4 + (1/4) log 4 = 1.5 bits.
But imagine learning the value of X gradually:
1 First we learn whether X = 0: a binary question with probabilities (1/2, 1/2), costing H(1/2, 1/2) = 1 bit.
2 If X ≠ 0, we then learn whether X = 1 or X = 2: again a (1/2, 1/2) question costing 1 bit, but it is only asked with probability 1/2.
Total: 1 + (1/2) × 1 = 1.5 bits, the same as the direct computation.
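A small Python sketch (not part of the lecture; the helper name entropy is mine) that checks the 1.5-bit figure both directly and via the two-stage view:

import math

def entropy(probs):
    # Shannon entropy in bits; zero-probability outcomes contribute nothing
    return -sum(pi * math.log2(pi) for pi in probs if pi > 0)

# The example distribution: p(X=0) = 1/2, p(X=1) = 1/4, p(X=2) = 1/4
p = [0.5, 0.25, 0.25]

direct = entropy(p)                                       # 1.5 bits
# Stage 1: is X = 0?  A (1/2, 1/2) question, 1 bit.
# Stage 2: if X != 0, is it 1 or 2?  Another 1 bit, asked half the time.
staged = entropy([0.5, 0.5]) + 0.5 * entropy([0.5, 0.5])
print(direct, staged)                                     # both print 1.5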
Decomposability of Entropy
Generalization
H(p) = H(p1, 1 − p1) + (1 − p1) H(p2/(1 − p1), ..., p|X|/(1 − p1))

(1 − p1) = probability of X ≠ x0
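The general identity can also be checked numerically; the following sketch (function names are my own, and the test distribution is arbitrary) compares the direct entropy with the decomposed form:

import math

def entropy(probs):
    # Shannon entropy in bits
    return -sum(p * math.log2(p) for p in probs if p > 0)

def decomposed_entropy(probs):
    # H(p) = H(p1, 1 - p1) + (1 - p1) H(p2/(1 - p1), ..., p|X|/(1 - p1))
    p1, rest = probs[0], probs[1:]
    if p1 == 1.0:
        return 0.0                         # no residual uncertainty
    tail = [p / (1 - p1) for p in rest]
    return entropy([p1, 1 - p1]) + (1 - p1) * entropy(tail)

q = [0.5, 0.25, 0.125, 0.125]
print(entropy(q), decomposed_entropy(q))   # both 1.75 bits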
Entropy in Information Theory
Relative Entropy
Definition
The relative entropy or Kullback-Leibler (KL) divergence between two probability distributions p(X) and q(X) is defined as:

DKL(p‖q) = Σ_{x∈X} p(x) log [p(x)/q(x)] = E_{p(X)}[ log p(X)/q(X) ].

Note: both p(X) and q(X) are defined over the same alphabet X.
Conventions:
0 log (0/0) := 0      0 log (0/q) := 0      p log (p/0) := ∞
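As an illustrative sketch (not course code; the name kl_divergence and the base-2 logarithm are my choices), the definition and its conventions translate directly into Python:

import math

def kl_divergence(p, q):
    # D_KL(p || q) in bits, using the conventions above:
    # 0 log(0/0) = 0, 0 log(0/q) = 0, p log(p/0) = infinity
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue                       # contributes 0 by convention
        if qi == 0:
            return math.inf                # p log(p/0) = infinity
        total += pi * math.log2(pi / qi)
    return total

print(kl_divergence([0.5, 0.5], [0.25, 0.75]))   # > 0
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))     # 0.0 for identical distributions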
Relative Entropy
Properties
DKL(p‖q) ≥ 0
DKL(p‖q) = 0 ⇔ p = q
DKL(p‖q) ≠ DKL(q‖p)
  Not a true distance, since it is not symmetric and does not satisfy the triangle inequality
  Hence "KL divergence" rather than "KL distance"
Relative Entropy
Uniform q
Let q correspond to a uniform distribution: q(x) = 1/|X|. Then
DKL(p‖q) = Σ_{x∈X} p(x) log (p(x)|X|) = log|X| − H(X),
so the divergence from uniform is exactly the gap between the maximum entropy log|X| and H(X).
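A quick numerical check of this identity (the particular distribution p is arbitrary):

import math

p = [0.5, 0.25, 0.125, 0.125]
u = [1 / len(p)] * len(p)            # uniform q over the same alphabet

H  = -sum(pi * math.log2(pi) for pi in p if pi > 0)
kl = sum(pi * math.log2(pi / ui) for pi, ui in zip(p, u) if pi > 0)

print(kl, math.log2(len(p)) - H)     # both 0.25: D_KL(p || uniform) = log|X| - H(X)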
Relative Entropy
Example (from Cover & Thomas, 2006)
Let X ∈ {0, 1} and consider the distributions p(X ) and q(X ) such that:
p(X = 1) = θp    p(X = 0) = 1 − θp
q(X = 1) = θq    q(X = 0) = 1 − θq
What distributions are these?
Compute DKL(p‖q) and DKL(q‖p) with θp = 1/2 and θq = 1/4
Relative Entropy
Example (from Cover & Thomas, 2006) — Cont’d
DKL(p‖q) = θp log (θp/θq) + (1 − θp) log ((1 − θp)/(1 − θq))
         = (1/2) log ((1/2)/(1/4)) + (1/2) log ((1/2)/(3/4))
         = 1 − (1/2) log 3 ≈ 0.2075 bits

DKL(q‖p) = θq log (θq/θp) + (1 − θq) log ((1 − θq)/(1 − θp))
         = (1/4) log ((1/4)/(1/2)) + (3/4) log ((3/4)/(1/2))
         = −1 + (3/4) log 3 ≈ 0.1887 bits
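These two values are easy to verify numerically; the helper kl_bernoulli below is my own shorthand, not course code:

import math

def kl_bernoulli(a, b):
    # D_KL( Bernoulli(a) || Bernoulli(b) ) in bits, assuming 0 < a, b < 1
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))

print(kl_bernoulli(0.5, 0.25))   # ~0.2075 bits, i.e. D_KL(p || q)
print(kl_bernoulli(0.25, 0.5))   # ~0.1887 bits, i.e. D_KL(q || p)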
Mutual Information
Definition
Let X, Y be two random variables with joint distribution p(X, Y) and marginals p(X) and p(Y).

Definition
The mutual information I(X; Y) is the relative entropy between the joint distribution p(X, Y) and the product distribution p(X)p(Y):

I(X; Y) = DKL( p(X, Y) ‖ p(X)p(Y) ) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x)p(y)) ]
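To make the definition concrete, here is a sketch (the function name mutual_information and the example joint table are my own) that computes I(X; Y) from a joint probability table as the KL divergence between the joint and the product of its marginals:

import math

def mutual_information(joint):
    # I(X;Y) = D_KL( p(X,Y) || p(X)p(Y) ) in bits; joint is a nested list
    px = [sum(row) for row in joint]             # marginal of X
    py = [sum(col) for col in zip(*joint)]       # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

print(mutual_information([[0.4, 0.1],
                          [0.1, 0.4]]))          # > 0: X and Y are correlated
print(mutual_information([[0.25, 0.25],
                          [0.25, 0.25]]))        # 0.0: X and Y are independent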
Relationship between Entropy and Mutual Information
I(X; Y) = DKL( p(X, Y) ‖ p(X)p(Y) )
        = H(X) − H(X|Y)
        = H(Y) − H(Y|X)
        = H(X) + H(Y) − H(X, Y)
Mutual Information:
Properties
I (X ; Y ) ≥ 0 why?
I (X ; Y ) = I (Y ; X )
Finally:
[Venn diagram relating H(X), H(Y), H(X, Y), H(X|Y), H(Y|X), and I(X; Y)]
Mutual Information
Example 1 (from Mackay, 2003)
Let X and Y be independent binary random variables with
p(X = 0) = p    p(X = 1) = 1 − p
p(Y = 0) = q    p(Y = 1) = 1 − q
and let Z = (X + Y) mod 2.
(a) If q = 1/2, what is I(X; Z)?
(b) For general p and q, what is I(X; Z)?
Mutual Information
Example 1 (from Mackay, 2003) — Solution (a)
(a) As X ⊥⊥ Y and q = 1/2, the noise will flip the input with probability 1/2 regardless of the original input distribution. Therefore p(Z = 0) = p(Z = 1) = 1/2, so H(Z) = 1 bit, while H(Z|X) = H(1/2, 1/2) = 1 bit.
Hence:
I(X; Z) = H(Z) − H(Z|X) = 1 − 1 = 0
Mutual Information
Example 1 (from Mackay, 2003) — Solution (b)
(b)
ℓ := p(Z = 0) = p(X = 0) × p(no flip) + p(X = 1) × p(flip)
   = pq + (1 − p)(1 − q)
   = 1 + 2pq − q − p
Similarly:
p(Z = 1) = p(X = 1) × p(no flip) + p(X = 0) × p(flip)
         = (1 − p)q + p(1 − q)
         = q + p − 2pq
and:
I(Z; X) = H(Z) − H(Z|X)
        = H(ℓ, 1 − ℓ) − H(q, 1 − q)    why?
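Both parts can be checked numerically; in this sketch the helper names h2 and mi_xz are mine, and the values p = 0.9, q = 0.25 are arbitrary:

import math

def h2(t):
    # binary entropy H(t, 1 - t) in bits
    return 0.0 if t in (0.0, 1.0) else -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def mi_xz(p, q):
    # I(X;Z) = H(Z) - H(Z|X) = H(ell, 1 - ell) - H(q, 1 - q) for Z = (X + Y) mod 2
    ell = p * q + (1 - p) * (1 - q)    # p(Z = 0)
    return h2(ell) - h2(q)

print(mi_xz(0.9, 0.25))   # part (b): about 0.07 bits for these values
print(mi_xz(0.9, 0.5))    # part (a): q = 1/2 gives I(X;Z) = 1 - 1 = 0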
Joint Mutual Information
I(X; Y) = H(X) − H(X|Y)
Reduction in uncertainty in X due to knowledge of Y
The joint mutual information between collections of variables is defined analogously, and is likewise symmetric:
I(X1, ..., Xn; Y1, ..., Ym) = I(Y1, ..., Ym; X1, ..., Xn)
Conditional Mutual Information
The conditional mutual information between X and Y given Z = zk :
I(X; Y | Z = zk) = H(X | Z = zk) − H(X | Y, Z = zk).

Averaging over Z gives:
I(X; Y | Z) = H(X | Z) − H(X | Y, Z)
            = E_{p(X,Y,Z)}[ log p(X, Y | Z) / (p(X | Z) p(Y | Z)) ]
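As a final sketch (the dictionary representation and the name conditional_mi are my own choices), the expectation form above can be evaluated directly; the XOR example shows that conditioning can create dependence, since X ⊥⊥ Y yet I(X; Y | Z) = 1 bit when Z = X XOR Y:

import math

def conditional_mi(joint):
    # I(X;Y|Z) = E_{p(x,y,z)}[ log p(x,y|z) / (p(x|z) p(y|z)) ], in bits.
    # joint maps (x, y, z) tuples to probabilities.
    pz, pxz, pyz = {}, {}, {}
    for (x, y, z), pr in joint.items():
        pz[z] = pz.get(z, 0.0) + pr
        pxz[(x, z)] = pxz.get((x, z), 0.0) + pr
        pyz[(y, z)] = pyz.get((y, z), 0.0) + pr
    mi = 0.0
    for (x, y, z), pr in joint.items():
        if pr > 0:
            # p(x,y|z) / (p(x|z)p(y|z)) = p(x,y,z) p(z) / (p(x,z) p(y,z))
            mi += pr * math.log2(pr * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
    return mi

# X and Y independent fair bits, Z = X XOR Y
joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
print(conditional_mi(joint))   # 1.0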
Summary
Decomposability of entropy
Relative entropy
Mutual information
Reading: Mackay §2.5, Ch 8; Cover & Thomas §2.3 to §2.5
Next time
Jensen’s inequality