Advanced Digital Communication

This document summarizes a lecture on information theory. It reviews information content, entropy, and joint/conditional entropies, then covers the decomposability of entropy, relative entropy (KL divergence), and mutual information. Mutual information is defined as the reduction in uncertainty of one random variable due to knowledge of another, and a worked example illustrates how entropy decomposes into separate entropies for binary variables.

COMP2610/COMP6261 - Information Theory

Lecture 7: Relative Entropy and Mutual Information

Mark Reid and Aditya Menon

Research School of Computer Science


The Australian National University

August 12th, 2014

Last time

Information content and entropy: definition and computation


Entropy and average code length
Entropy and minimum expected number of binary questions
Joint and conditional entropies, chain rule

Information Content: Review

Let X be a random variable with outcomes in X

Let p(x) denote the probability of the outcome x ∈ X

The (Shannon) information content of outcome x is

h(x) = log2(1/p(x))

As p(x) → 0, h(x) → +∞ (rare outcomes are more informative)

Entropy: Review

The entropy is the average information content of all outcomes:


H(X) = Σ_x p(x) log2(1/p(x))

Entropy is minimised if p is peaked, and maximised if p is uniform:

0 ≤ H(X) ≤ log |X|

Entropy is related to the minimal number of bits needed to describe a random variable

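
To make the definitions above concrete, here is a minimal Python sketch (not part of the original slides; the function names `information_content` and `entropy` are mine) that computes h(x) and H(X) and checks the bounds 0 ≤ H(X) ≤ log |X| on a peaked and a uniform distribution.

```python
import math

def information_content(p_x, base=2):
    """Shannon information content h(x) = log(1/p(x)) of an outcome with probability p_x."""
    return math.log(1.0 / p_x, base)

def entropy(p, base=2):
    """Entropy H(X) = sum_x p(x) log(1/p(x)); terms with p(x) = 0 contribute 0."""
    return sum(px * math.log(1.0 / px, base) for px in p if px > 0)

peaked = [0.97, 0.01, 0.01, 0.01]   # nearly deterministic
uniform = [0.25, 0.25, 0.25, 0.25]  # maximally uncertain

print(information_content(0.25))        # 2.0 bits: a 1-in-4 outcome carries 2 bits
print(entropy(peaked))                  # close to 0
print(entropy(uniform), math.log2(4))   # 2.0 == log2 |X|
```
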
This time

The decomposability property of entropy


Relative entropy and divergences
Mutual information

Outline

1 Decomposability of Entropy

2 Relative Entropy / KL Divergence

3 Mutual Information
Definition
Joint and Conditional Mutual Information

4 Wrapping up

Decomposability of Entropy
Example 1 (MacKay, 2003)

Let X ∈ {0, 1, 2} be a r.v. created by the following process:

1 Flip a fair coin to determine whether X = 0
2 If X ≠ 0, flip another fair coin to determine whether X = 1 or X = 2

The probability distribution of X is given by:

p(X = 0) = 1/2
p(X = 1) = 1/4
p(X = 2) = 1/4

Decomposability of Entropy
Example 1 (MacKay, 2003) — Cont’d

By definition,

H(X) = (1/2) log 2 + (1/4) log 4 + (1/4) log 4 = 1.5 bits.

But imagine learning the value of X gradually:

1 First we learn whether X = 0:
  - Binary variable with p(1) = (1/2, 1/2)
  - Hence H(1/2, 1/2) = log2 2 = 1 bit.
2 If X ≠ 0, we learn the value of the second coin flip:
  - Also a binary variable with p(2) = (1/2, 1/2)
  - Therefore H(1/2, 1/2) = 1 bit.

However, the second revelation only happens half of the time:

H(X) = H(1/2, 1/2) + (1/2) H(1/2, 1/2) = 1.5 bits.

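
As a quick numerical sanity check of this decomposition (my own sketch, not from the slides), the direct computation of H(1/2, 1/4, 1/4) and the two-stage computation give the same 1.5 bits:

```python
import math

def H(*p):
    """Entropy in bits of a distribution given as positive probabilities."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

direct = H(1/2, 1/4, 1/4)                    # 1.5 bits
staged = H(1/2, 1/2) + 0.5 * H(1/2, 1/2)     # 1 bit, plus half the time another bit
print(direct, staged)                         # both 1.5
```
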
Decomposability of Entropy
Generalization

For a r.v. with probability distribution p = (p1, …, p|X|):

H(p) = H(p1, 1 − p1) + (1 − p1) · H( p2 / (1 − p1), …, p|X| / (1 − p1) )

H(p1, 1 − p1) = entropy for a random variable corresponding to “Is X = x0?”

H( p2 / (1 − p1), …, p|X| / (1 − p1) ) = entropy for a random variable corresponding to the outcomes when X ≠ x0

(1 − p1) = probability of X ≠ x0

Decomposability of Entropy
Generalization

In general, we have that for any m:

H(p) = H( Σ_{i=1..m} pi , Σ_{i=m+1..|X|} pi )
       + ( Σ_{i=1..m} pi ) · H( p1 / Σ_{i=1..m} pi , …, pm / Σ_{i=1..m} pi )
       + ( Σ_{i=m+1..|X|} pi ) · H( pm+1 / Σ_{i=m+1..|X|} pi , …, p|X| / Σ_{i=m+1..|X|} pi )

Apply this formula with m = 1, |X| = 3, p = (p1, p2, p3) = (1/2, 1/4, 1/4)

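
The general formula can be verified numerically. The sketch below (the helper names `H` and `decomposed_entropy` are mine) splits a distribution at an arbitrary index m and checks that the decomposed entropy equals the direct entropy, including the suggested case m = 1, p = (1/2, 1/4, 1/4):

```python
import math

def H(p):
    """Entropy in bits of a probability vector; zero entries contribute 0."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

def decomposed_entropy(p, m):
    """H(p) computed via the split into the first m outcomes and the rest."""
    a, b = sum(p[:m]), sum(p[m:])
    total = H([a, b])
    if a > 0:
        total += a * H([pi / a for pi in p[:m]])
    if b > 0:
        total += b * H([pi / b for pi in p[m:]])
    return total

p = [0.5, 0.25, 0.25]
print(H(p), decomposed_entropy(p, m=1))   # both 1.5

q = [0.1, 0.2, 0.3, 0.4]
print(H(q), decomposed_entropy(q, m=2))   # equal for any valid split point m
```
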
Entropy in Information Theory

If a random variable has distribution p, there exists an encoding with an average length of

H(p) bits

and this is the “best” possible encoding.

What happens if we use a “wrong” encoding?

e.g. because we make an incorrect assumption about the probability distribution

If the true distribution is p, but we assume it is q, it turns out we will need to use

H(p) + DKL(p‖q) bits

where DKL(p‖q) is some measure of “distance” between p and q

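
This claim can be illustrated numerically: the expected code length when codeword lengths are chosen for q but outcomes follow p (the cross-entropy) equals H(p) + DKL(p‖q). A minimal sketch, with arbitrarily chosen p and q of my own:

```python
import math

def entropy(p):
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

def kl(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    """Average code length when outcomes follow p but codeword lengths log2(1/q) assume q."""
    return sum(pi * math.log2(1.0 / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [1/3, 1/3, 1/3]
print(cross_entropy(p, q))       # bits actually used with the "wrong" code
print(entropy(p) + kl(p, q))     # identical: H(p) + DKL(p||q)
```
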
Relative Entropy

Definition
The relative entropy or Kullback-Leibler (KL) divergence between two probability distributions p(X) and q(X) is defined as:

DKL(p‖q) = Σ_{x∈X} p(x) log( p(x) / q(x) ) = E_{p(X)}[ log( p(X) / q(X) ) ]

Note:
- Both p(X) and q(X) are defined over the same alphabet X
- Conventions:

0 log(0/0) := 0      0 log(0/q) := 0      p log(p/0) := ∞

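
A direct implementation of this definition might look as follows (a sketch, not from the slides), with the conventions handled by skipping zero-probability terms of p and returning infinity when q(x) = 0 while p(x) > 0:

```python
import math

def kl_divergence(p, q, base=2):
    """D_KL(p || q) = sum_x p(x) log(p(x)/q(x)), for distributions over the same alphabet."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue                      # 0 log(0/q) = 0 by convention
        if qx == 0:
            return math.inf               # p log(p/0) = infinity
        total += px * math.log(px / qx, base)
    return total

print(kl_divergence([0.5, 0.5], [0.25, 0.75]))   # > 0
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))     # 0, since p = q
```
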
Relative Entropy
Properties

DKL(p‖q) ≥ 0
DKL(p‖q) = 0 ⇔ p = q
DKL(p‖q) ≠ DKL(q‖p)
- Not a true distance, since it is not symmetric and does not satisfy the triangle inequality
- Hence, “KL divergence” rather than “KL distance”

Relative Entropy
Uniform q

Let q correspond to a uniform distribution: q(x) = 1/|X|

Relative entropy between p and q:

DKL(p‖q) = Σ_{x∈X} p(x) log( p(x) / q(x) )
         = Σ_{x∈X} p(x) · (log p(x) + log |X|)
         = −H(X) + Σ_{x∈X} p(x) · log |X|
         = −H(X) + log |X|.

This matches the intuition that using the wrong encoding incurs a penalty in the number of bits.

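
A small numerical check of this identity (my own sketch): for an arbitrary p, the divergence from the uniform distribution equals log |X| − H(X).

```python
import math

def entropy(p):
    return sum(px * math.log2(1.0 / px) for px in p if px > 0)

def kl(p, q):
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.125, 0.125]
uniform = [1 / len(p)] * len(p)
print(kl(p, uniform))                      # 0.25
print(math.log2(len(p)) - entropy(p))      # same value: log|X| - H(X)
```
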
Relative Entropy
Example (from Cover & Thomas, 2006)

Let X ∈ {0, 1} and consider the distributions p(X) and q(X) such that:

p(X = 1) = θp    p(X = 0) = 1 − θp
q(X = 1) = θq    q(X = 0) = 1 − θq

What distributions are these?

Compute DKL(p‖q) and DKL(q‖p) with θp = 1/2 and θq = 1/4

Relative Entropy
Example (from Cover & Thomas, 2006) — Cont’d

DKL(p‖q) = θp log( θp / θq ) + (1 − θp) log( (1 − θp) / (1 − θq) )
         = (1/2) log( (1/2) / (1/4) ) + (1/2) log( (1/2) / (3/4) )
         = 1 − (1/2) log 3 ≈ 0.2075 bits

DKL(q‖p) = θq log( θq / θp ) + (1 − θq) log( (1 − θq) / (1 − θp) )
         = (1/4) log( (1/4) / (1/2) ) + (3/4) log( (3/4) / (1/2) )
         = −1 + (3/4) log 3 ≈ 0.1887 bits

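
The asymmetry is easy to reproduce; this sketch (the helper name `kl_bernoulli` is mine) computes both divergences for the two Bernoulli distributions above:

```python
import math

def kl_bernoulli(a, b):
    """D_KL(Bernoulli(a) || Bernoulli(b)) in bits, for parameters strictly between 0 and 1."""
    return a * math.log2(a / b) + (1 - a) * math.log2((1 - a) / (1 - b))

theta_p, theta_q = 1/2, 1/4
print(kl_bernoulli(theta_p, theta_q))   # ~0.2075 bits
print(kl_bernoulli(theta_q, theta_p))   # ~0.1887 bits: D_KL(p||q) != D_KL(q||p)
```
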
Mutual Information
Definition

Let X, Y be two r.v. with joint distribution p(X, Y) and marginals p(X) and p(Y):

Definition
The mutual information I(X; Y) is the relative entropy between the joint distribution p(X, Y) and the product distribution p(X)p(Y):

I(X; Y) = DKL( p(X, Y) ‖ p(X)p(Y) )
        = Σ_{x∈X} Σ_{y∈Y} p(x, y) log( p(x, y) / (p(x)p(y)) )

Intuitively, how much information, on average, X conveys about Y.

Relationship between Entropy and Mutual Information

We can re-write the definition of mutual information as:

I(X; Y) = Σ_{x∈X} Σ_{y∈Y} p(x, y) log( p(x, y) / (p(x)p(y)) )
        = Σ_{x∈X} Σ_{y∈Y} p(x, y) log( p(x|y) / p(x) )
        = − Σ_{x∈X} log p(x) Σ_{y∈Y} p(x, y) − ( − Σ_{x∈X} Σ_{y∈Y} p(x, y) log p(x|y) )
        = H(X) − H(X|Y)

The average reduction in uncertainty of X due to the knowledge of Y.

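
These identities can be verified on a small joint distribution. The sketch below uses an arbitrary joint table of my own (not from the slides), computes I(X; Y) from the definition, and checks that it equals H(X) + H(Y) − H(X, Y) and H(X) − H(X|Y):

```python
import math

# Joint distribution p(x, y) as a dictionary; an arbitrary illustrative example.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def marginal(joint, axis):
    """Sum the joint table over the other variable (axis 0 gives p(x), axis 1 gives p(y))."""
    m = {}
    for (x, y), p in joint.items():
        k = x if axis == 0 else y
        m[k] = m.get(k, 0.0) + p
    return m

def entropy(dist):
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

px, py = marginal(joint, 0), marginal(joint, 1)

# Mutual information from the definition: KL between the joint and the product of marginals.
I = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)

H_X, H_Y, H_XY = entropy(px), entropy(py), entropy(joint)
print(I)
print(H_X + H_Y - H_XY)       # same value: I(X;Y) = H(X) + H(Y) - H(X,Y)
print(H_X - (H_XY - H_Y))     # same value: I(X;Y) = H(X) - H(X|Y), with H(X|Y) = H(X,Y) - H(Y)
```
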
Mutual Information
Properties

Mutual Information is non-negative:

I(X; Y) ≥ 0    why?

We have seen that H(Y) − H(Y|X) = H(X) − H(X|Y), therefore:

I(X; Y) = I(Y; X)

Since H(X, Y) = H(X) + H(Y|X) we have that:

I(X; Y) = H(X) + H(Y) − H(X, Y)

Finally:

I(X; X) = H(X) − H(X|X) = H(X)

Sometimes the entropy is referred to as self-information

Breakdown of Joint Entropy

[Diagram: the joint entropy H(X, Y) splits into H(X|Y), I(X; Y) and H(Y|X); H(X) covers H(X|Y) and I(X; Y), while H(Y) covers I(X; Y) and H(Y|X).]

Mutual Information
Example 1 (from MacKay, 2003)

Let X, Y, Z be r.v. with X, Y ∈ {0, 1}, X ⊥⊥ Y and:

p(X = 0) = p    p(X = 1) = 1 − p
p(Y = 0) = q    p(Y = 1) = 1 − q
Z = (X + Y) mod 2

(a) If q = 1/2, what is P(Z = 0)? P(Z = 1)? I(Z; X)?
(b) For general p and q, what is P(Z = 0)? P(Z = 1)? I(Z; X)?

Mutual Information
Example 1 (from MacKay, 2003) — Solution (a)

(a) As X ⊥⊥ Y and q = 1/2, the noise will flip the input with probability q = 0.5 regardless of the original input distribution. Therefore:

p(Z = 1) = E[Z] = 1/2    p(Z = 0) = 1/2

Hence:

I(X; Z) = H(Z) − H(Z|X) = 1 − 1 = 0

Indeed, for q = 1/2 we see that Z ⊥⊥ X

Mutual Information
Example 1 (from MacKay, 2003) — Solution (b)

(b)

ℓ := p(Z = 0) = p(X = 0) × p(no flip) + p(X = 1) × p(flip)
             = pq + (1 − p)(1 − q)
             = 1 + 2pq − q − p

Similarly:

p(Z = 1) = p(X = 1) × p(no flip) + p(X = 0) × p(flip)
         = (1 − p)q + p(1 − q)
         = q + p − 2pq

and:

I(Z; X) = H(Z) − H(Z|X)
        = H(ℓ, 1 − ℓ) − H(q, 1 − q)    why?

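
A numerical sketch of this result (my own code, not from the slides): computing I(Z; X) = H(ℓ, 1 − ℓ) − H(q, 1 − q) for a couple of values of p and q shows that it vanishes at q = 1/2, consistent with part (a).

```python
import math

def H2(t):
    """Binary entropy H(t, 1 - t) in bits."""
    if t in (0.0, 1.0):
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)

def I_ZX(p, q):
    """I(Z; X) = H(Z) - H(Z|X) for Z = (X + Y) mod 2 with X, Y independent."""
    ell = p * q + (1 - p) * (1 - q)   # p(Z = 0)
    return H2(ell) - H2(q)            # H(Z|X) = H(q, 1 - q): given X, Z is determined by Y

print(I_ZX(p=0.8, q=0.5))   # 0.0: a fair "noise" coin makes Z independent of X
print(I_ZX(p=0.8, q=0.1))   # > 0: with less noise, Z reveals information about X
```
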
Joint Mutual Information

Recall that for random variables X, Y,

I(X; Y) = H(X) − H(X|Y)

Reduction in uncertainty in X due to knowledge of Y

More generally, for random variables X1, …, Xn, Y1, …, Ym,

I(X1, …, Xn; Y1, …, Ym) = H(X1, …, Xn) − H(X1, …, Xn | Y1, …, Ym)

Reduction in uncertainty in X1, …, Xn due to knowledge of Y1, …, Ym

Symmetry also generalises:

I(X1, …, Xn; Y1, …, Ym) = I(Y1, …, Ym; X1, …, Xn)

Conditional Mutual Information

The conditional mutual information between X and Y given Z = zk:

I(X; Y | Z = zk) = H(X | Z = zk) − H(X | Y, Z = zk).

Averaging over Z we obtain:

The conditional mutual information between X and Y given Z:

I(X; Y | Z) = H(X | Z) − H(X | Y, Z)
            = E_{p(X,Y,Z)}[ log( p(X, Y | Z) / (p(X | Z) p(Y | Z)) ) ]

The reduction in the uncertainty of X due to the knowledge of Y when Z is given.

Note that I(X; Y; Z) and I(X | Y; Z) are illegal terms, while e.g. I(A, B; C, D | E, F) is legal.

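
A minimal sketch of computing I(X; Y|Z) from a joint table, using the expectation form above; the three-variable distribution here is an arbitrary toy example of my own, not from the slides.

```python
import math

# Joint distribution p(x, y, z) over binary variables; an arbitrary illustrative example.
joint = {
    (0, 0, 0): 0.20, (0, 1, 0): 0.05, (1, 0, 0): 0.05, (1, 1, 0): 0.20,
    (0, 0, 1): 0.10, (0, 1, 1): 0.15, (1, 0, 1): 0.15, (1, 1, 1): 0.10,
}

def marg(joint, keep):
    """Marginalize onto the coordinates listed in `keep` (e.g. keep=(0, 2) gives p(x, z))."""
    out = {}
    for xyz, p in joint.items():
        key = tuple(xyz[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

p_xz, p_yz, p_z = marg(joint, (0, 2)), marg(joint, (1, 2)), marg(joint, (2,))

# I(X; Y | Z) = E_{p(x,y,z)} [ log p(x,y|z) / (p(x|z) p(y|z)) ]
I = 0.0
for (x, y, z), p in joint.items():
    if p > 0:
        ratio = (p / p_z[(z,)]) / ((p_xz[(x, z)] / p_z[(z,)]) * (p_yz[(y, z)] / p_z[(z,)]))
        I += p * math.log2(ratio)
print(I)   # conditional mutual information in bits
```
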
Summary

Decomposability of entropy
Relative entropy
Mutual information
Reading: MacKay §2.5, Ch. 8; Cover & Thomas §2.3 to §2.5

Next time

Mutual information chain rule

Jensen’s inequality

“Information cannot hurt”

Data processing inequality

