
Chapter 6: An Introduction to Information Theory
CO6019-∗: Digital Communications

Instructor: Jia-Chin Lin
Department of Communication Engineering, National Central University
Email: jiachin@ieee.org


Outline
• Mathematical Models for Information Sources

• A Logarithmic Measure Of Information

• Lossless Coding of Information Sources

• Lossy Data Compression

• Channel Model and Channel Capacity


Mathematical Models for Information Sources

• A discrete source emits letters from a finite alphabet $\{x_1, x_2, \ldots, x_L\}$, where each letter $x_k$ occurs with a given probability $p_k$:
$$p_k = P[X = x_k], \qquad 1 \le k \le L$$
where
$$\sum_{k=1}^{L} p_k = 1$$

• An analog source has an output waveform $x(t)$ that is a sample function of a stochastic process $X(t)$. If $X(t)$ is band-limited to $W$ Hz, the sampling theorem may be used to represent it as
$$X(t) = \sum_{n=-\infty}^{\infty} X\!\left(\frac{n}{2W}\right) \operatorname{sinc}\!\left[2W\left(t - \frac{n}{2W}\right)\right]$$
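To see the interpolation formula in action, here is a small Python sketch (the bandwidth, the test signal, and all variable names are my own illustrative choices, not from the slides) that reconstructs a band-limited waveform at an off-grid instant from its Nyquist-rate samples.

```python
import numpy as np

def sinc_reconstruct(samples, W, t):
    """Evaluate X(t) = sum_n X(n/2W) sinc[2W (t - n/2W)] from Nyquist-rate samples."""
    n = np.arange(len(samples))
    t_n = n / (2.0 * W)                                      # sampling instants n/(2W)
    return np.sum(samples * np.sinc(2.0 * W * (t - t_n)))    # np.sinc(x) = sin(pi x)/(pi x)

# Example: a signal band-limited to W = 4 Hz, sampled at 2W = 8 samples/s.
W = 4.0
fs = 2.0 * W
n = np.arange(256)
signal = lambda t: np.cos(2 * np.pi * 1.5 * t) + 0.5 * np.sin(2 * np.pi * 3.0 * t)
samples = signal(n / fs)

t0 = 10.3                                                    # an instant between sampling points
print(sinc_reconstruct(samples, W, t0), signal(t0))          # close, up to truncation of the sum
```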


A Logarithmic Measure Of Information


• The mutual information between the events $X = x$ and $Y = y$ is defined as
$$I(x; y) = \log \frac{P[x \mid y]}{P[x]}$$
• The (average) mutual information between the random variables $X$ and $Y$ is defined as the average of $I(x; y)$:
$$I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P[X = x, Y = y]\, I(x; y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P[X = x, Y = y] \log \frac{P[x \mid y]}{P[x]}$$


• $I(X; Y) = I(Y; X)$
• $I(X; Y) \ge 0$, with equality if and only if $X$ and $Y$ are independent
• $I(X; Y) \le \min\{\log |\mathcal{X}|, \log |\mathcal{Y}|\}$, where $|\mathcal{X}|$ and $|\mathcal{Y}|$ denote the sizes of the alphabets
• When the random variables $X$ and $Y$ are statistically independent, $P[x \mid y] = P[x]$, and therefore $I(X; Y) = 0$
• When the occurrence of the event $Y = y$ uniquely determines $X = x$, then $P[x \mid y] = 1$ and
$$I(x; y) = \log \frac{1}{P[X = x]} = -\log P[X = x]$$
which is the self-information of the event $X = x$. Averaging over $X$ and $Y$ gives
$$I(X; Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} P[X = x, Y = y] \log P[X = x] = -\sum_{x \in \mathcal{X}} P[X = x] \log P[X = x]$$
In this case the average mutual information equals the entropy of $X$, defined as
$$H(X) = -\sum_{x \in \mathcal{X}} P[X = x] \log P[X = x]$$


These quantities satisfy the following properties:

$$0 \le H(X) \le \log |\mathcal{X}| \tag{1}$$

$$I(X; X) = H(X) \tag{2}$$

$$I(X; Y) \le \min\{H(X), H(Y)\} \tag{3}$$

$$\text{If } Y = g(X), \text{ then } H(Y) \le H(X) \tag{4}$$
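To make these definitions concrete, the following Python sketch (the joint PMF values are a toy example of my own, not from the chapter) computes $H(X)$, $H(Y)$, and $I(X;Y)$ from a joint PMF, using the equivalent form $I(X;Y) = \sum_{x,y} P[x,y]\log_2\frac{P[x,y]}{P[x]P[y]}$, and checks properties (1) and (3) numerically.

```python
import numpy as np

# Toy joint PMF P[X = x, Y = y] with |X| = 2 (rows) and |Y| = 3 (columns)
P_xy = np.array([[0.20, 0.15, 0.05],
                 [0.10, 0.10, 0.40]])

P_x = P_xy.sum(axis=1)                 # marginal PMF of X
P_y = P_xy.sum(axis=0)                 # marginal PMF of Y

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))     # entropy in bits

def mutual_information(P_xy):
    P_x, P_y = P_xy.sum(axis=1), P_xy.sum(axis=0)
    I = 0.0
    for i in range(P_xy.shape[0]):
        for j in range(P_xy.shape[1]):
            if P_xy[i, j] > 0:
                I += P_xy[i, j] * np.log2(P_xy[i, j] / (P_x[i] * P_y[j]))
    return I

H_X, H_Y, I_XY = entropy(P_x), entropy(P_y), mutual_information(P_xy)
print(H_X, H_Y, I_XY)
assert 0 <= H_X <= np.log2(len(P_x)) + 1e-12            # property (1)
assert -1e-12 <= I_XY <= min(H_X, H_Y) + 1e-12          # I(X;Y) >= 0 and property (3)
```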


Joint and Conditional Entropy


• The joint entropy of $X$ and $Y$ is defined as
$$H(X, Y) = -\sum_{(x, y) \in \mathcal{X} \times \mathcal{Y}} P[X = x, Y = y] \log P[X = x, Y = y]$$

• When the value of the random variable $X$ is known to be $x$, the conditional entropy of $Y$ is
$$H(Y \mid X = x) = -\sum_{y \in \mathcal{Y}} P[Y = y \mid X = x] \log P[Y = y \mid X = x]$$

• Averaging over $X$ gives the conditional entropy of $Y$ given $X$:
$$H(Y \mid X) = \sum_{x \in \mathcal{X}} P[X = x]\, H(Y \mid X = x) = -\sum_{(x, y) \in \mathcal{X} \times \mathcal{Y}} P[X = x, Y = y] \log P[Y = y \mid X = x]$$

• The chain rule then follows:
$$H(X, Y) = H(X) + H(Y \mid X)$$
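A minimal numerical check of the chain rule, again with an illustrative joint PMF of my own choosing:

```python
import numpy as np

P_xy = np.array([[0.20, 0.15, 0.05],
                 [0.10, 0.10, 0.40]])
P_x = P_xy.sum(axis=1)

def H(p):                              # entropy of a PMF, in bits
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_XY = H(P_xy.ravel())                 # joint entropy H(X, Y)
# H(Y|X) = sum_x P[x] H(Y | X = x), with P[y|x] = P[x, y] / P[x]
H_Y_given_X = sum(P_x[i] * H(P_xy[i] / P_x[i]) for i in range(len(P_x)))

print(H_XY, H(P_x) + H_Y_given_X)      # the two values coincide: H(X,Y) = H(X) + H(Y|X)
```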

• More generally, the joint entropy of $n$ random variables is
$$H(X_1, X_2, \ldots, X_n) = -\sum_{x_1, x_2, \ldots, x_n} P[X_1 = x_1, \ldots, X_n = x_n] \log P[X_1 = x_1, \ldots, X_n = x_n]$$
$$= H(X_1) + H(X_2 \mid X_1) + H(X_3 \mid X_1, X_2) + \cdots + H(X_n \mid X_1, X_2, \ldots, X_{n-1}) \le \sum_{i=1}^{n} H(X_i)$$
with equality if and only if the $X_i$ are independent; when the $X_i$ are also identically distributed, the bound equals $nH(X)$.

Lossless Coding of Information Sources
• Consider a typical output sequence $\mathbf{x}$ of length $n$ from a discrete memoryless source with alphabet $\{a_1, \ldots, a_N\}$ and letter probabilities $p_i$. Since the number of occurrences of $a_i$ in $\mathbf{x}$ is roughly $np_i$ and the source is memoryless,
$$\log P[\mathbf{X} = \mathbf{x}] \approx \log \prod_{i=1}^{N} p_i^{\,np_i} = \sum_{i=1}^{N} n p_i \log p_i = -nH(X)$$

• Hence,
$$P[\mathbf{X} = \mathbf{x}] \approx 2^{-nH(X)}$$
• This states that all typical sequences have roughly the same probability, and this common probability is $2^{-nH(X)}$.
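This equipartition behavior is easy to observe by simulation; in the sketch below (the alphabet size and letter probabilities are my own choices), the normalized log-probability of a long i.i.d. sequence concentrates around $H(X)$.

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.25, 0.125, 0.125])       # letter probabilities of a DMS
H = -np.sum(p * np.log2(p))                   # source entropy: 1.75 bits/letter

n = 100_000
x = rng.choice(len(p), size=n, p=p)           # i.i.d. source output sequence of length n
log_prob = np.sum(np.log2(p[x]))              # log2 P[X = x] for this particular sequence

print(-log_prob / n, H)    # -(1/n) log2 P[x] is close to H(X), i.e., P[x] ~ 2^(-n H(X))
```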

Discrete Stationary Sources


• The entropy of a block of random variables $X_1, X_2, \ldots, X_k$ satisfies the chain rule
$$H(X_1 X_2 \cdots X_k) = \sum_{i=1}^{k} H(X_i \mid X_1 X_2 \cdots X_{i-1})$$

• The entropy per letter for the $k$-symbol block is
$$H_k(X) = \frac{1}{k} H(X_1 X_2 \cdots X_k)$$
• The entropy rate of the source is defined as the limit as $k \to \infty$:
$$H_\infty(X) \triangleq \lim_{k \to \infty} H_k(X) = \lim_{k \to \infty} \frac{1}{k} H(X_1 X_2 \cdots X_k)$$

• From the chain-rule expression above, $H_\infty(X)$ can equivalently be written in terms of the conditional entropies; it is shown below that
$$H_\infty(X) = \lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1})$$

• We first show that
$$H(X_k \mid X_1 X_2 \cdots X_{k-1}) \le H(X_{k-1} \mid X_1 X_2 \cdots X_{k-2})$$
for $k \ge 2$. From our previous result that conditioning on a random variable cannot increase entropy, we have
$$H(X_k \mid X_1 X_2 \cdots X_{k-1}) \le H(X_k \mid X_2 X_3 \cdots X_{k-1})$$
and, by stationarity of the source,
$$H(X_k \mid X_2 X_3 \cdots X_{k-1}) = H(X_{k-1} \mid X_1 X_2 \cdots X_{k-2})$$
• This result demonstrates that $H(X_k \mid X_1 X_2 \cdots X_{k-1})$ is a nonincreasing sequence in $k$.
• Since $H_k(X)$ is the average of the conditional entropies $H(X_i \mid X_1 \cdots X_{i-1})$, $i = 1, \ldots, k$, each of which is at least the last (smallest) term, we also have
$$H_k(X) \ge H(X_k \mid X_1 X_2 \cdots X_{k-1})$$
• This suggests defining the entropy rate of the source in terms of the conditional entropy, $\lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1})$.
• From the definition of $H_k(X)$ and the chain rule,
$$H_k(X) = \frac{1}{k}\left[H(X_1 X_2 \cdots X_{k-1}) + H(X_k \mid X_1 \cdots X_{k-1})\right] = \frac{1}{k}\left[(k-1)H_{k-1}(X) + H(X_k \mid X_1 \cdots X_{k-1})\right] \le \frac{k-1}{k} H_{k-1}(X) + \frac{1}{k} H_k(X)$$
• Rearranging gives
$$H_k(X) \le H_{k-1}(X)$$
• Since $H_k(X)$ and the conditional entropy $H(X_k \mid X_1 \cdots X_{k-1})$ are both nonnegative and nonincreasing with $k$, both limits must exist.
• Now express $H_{k+j}(X)$ as
$$H_{k+j}(X) = \frac{1}{k+j} H(X_1 X_2 \cdots X_{k-1}) + \frac{1}{k+j}\left[H(X_k \mid X_1 X_2 \cdots X_{k-1}) + H(X_{k+1} \mid X_1 \cdots X_k) + \cdots + H(X_{k+j} \mid X_1 \cdots X_{k+j-1})\right]$$
• Since the conditional entropies are nonincreasing, each of the $j+1$ bracketed terms is at most $H(X_k \mid X_1 X_2 \cdots X_{k-1})$, so
$$H_{k+j}(X) \le \frac{1}{k+j} H(X_1 X_2 \cdots X_{k-1}) + \frac{j+1}{k+j} H(X_k \mid X_1 X_2 \cdots X_{k-1})$$
• Letting $j \to \infty$,
$$H_\infty(X) \le H(X_k \mid X_1 X_2 \cdots X_{k-1})$$
• This holds for every $k$, so it remains valid as $k \to \infty$:
$$H_\infty(X) \le \lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1})$$
• From Equation 6.3-12 (the earlier result $H_k(X) \ge H(X_k \mid X_1 \cdots X_{k-1})$), taking limits gives the reverse inequality
$$H_\infty(X) \ge \lim_{k \to \infty} H(X_k \mid X_1 X_2 \cdots X_{k-1})$$
• Therefore the entropy rate of a discrete stationary source can be expressed in either form:
$$H_\infty(X) = \lim_{k \to \infty} H(X_k \mid X_1, X_2, \ldots, X_{k-1}) = \lim_{k \to \infty} \frac{1}{k} H(X_1, X_2, \ldots, X_k)$$
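As an illustration of the entropy rate, the sketch below (the two-state Markov source and its transition probabilities are my own example, not from the slides) computes $H_k(X)$ for a stationary first-order Markov source, using the fact that for such a source $H(X_1 \cdots X_k) = H(X_1) + (k-1)H(X_2 \mid X_1)$; the per-letter entropy decreases monotonically toward the entropy rate, as the derivation above predicts.

```python
import numpy as np

# Two-state stationary Markov source: T[i, j] = P[X_k = j | X_{k-1} = i]
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])
a, b = T[0, 1], T[1, 0]
pi = np.array([b / (a + b), a / (a + b)])      # stationary distribution (solves pi T = pi)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))             # entropy in bits

H_rate = pi[0] * H(T[0]) + pi[1] * H(T[1])     # H_inf(X) = sum_i pi_i H(X_k | X_{k-1} = i)

# For a stationary first-order Markov source, H(X_1 ... X_k) = H(pi) + (k - 1) H(X_2 | X_1),
# so the per-letter entropy H_k(X) decreases monotonically toward the entropy rate.
for k in (1, 2, 5, 20, 100):
    print(k, (H(pi) + (k - 1) * H_rate) / k)
print("entropy rate H_inf(X) =", H_rate)
```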

The Huffman Coding Algorithm


• Huffman (1952) devised a variable-length encoding algorithm based on the source letter probabilities $P(x_i)$, $i = 1, 2, \ldots, L$.
• This algorithm is optimum in the sense that the average number of binary digits required to represent the source symbols is a minimum, subject to the constraint that the code words satisfy the prefix condition, which makes the received sequence uniquely and instantaneously decodable.
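To complement this description, here is a compact Huffman encoder sketch in Python (the symbol probabilities are my own illustrative values; the heap-based merging of the two least probable subtrees is one standard way to implement the algorithm). The resulting code is prefix-free by construction, and its average length lies between $H(X)$ and $H(X) + 1$.

```python
import heapq
import math
from itertools import count

def huffman_code(probs):
    """Build a binary Huffman code for a dict {symbol: probability}."""
    tie = count()                            # tie-breaker so heapq never compares dicts
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)      # merge the two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()}
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

probs = {"a": 0.35, "b": 0.30, "c": 0.20, "d": 0.10, "e": 0.05}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
H = -sum(p * math.log2(p) for p in probs.values())
print(code)
print(avg_len, H)       # H(X) <= average codeword length < H(X) + 1
```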

Entropy and Mutual Information for Continuous Random Variables
• Mutual information:
$$I(X; Y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x)\, p(y \mid x) \log \frac{p(y \mid x)\, p(x)}{p(x)\, p(y)}\, dx\, dy$$

• Properties:
  – $I(X; Y) = I(Y; X)$
  – $I(X; Y) \ge 0$
  – $I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X)$, where $H(\cdot)$ here denotes differential entropy
• Conditional differential entropy:
$$H(X \mid Y) = -\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} p(x, y) \log p(x \mid y)\, dx\, dy$$

Now suppose the random variable $X$ is discrete with possible outcomes $x_i$, $i = 1, 2, \ldots, n$, while $Y$ is continuous. The marginal pdf of $Y$ can be expressed as
$$p(y) = \sum_{i=1}^{n} p(y \mid x_i)\, P[x_i]$$

The mutual information provided about the event $X = x_i$ by the occurrence of the event $Y = y$ is
$$I(x_i; y) = \log \frac{p(y \mid x_i)\, P[x_i]}{p(y)\, P[x_i]} = \log \frac{p(y \mid x_i)}{p(y)}$$

Then the average mutual information between $X$ and $Y$ is
$$I(X; Y) = \sum_{i=1}^{n} \int_{-\infty}^{\infty} p(y \mid x_i)\, P[x_i] \log \frac{p(y \mid x_i)}{p(y)}\, dy$$

ex 6.4-1. Suppose that $X$ is a discrete random variable with two equally probable outcomes $x_1 = A$ and $x_2 = -A$. Let the conditional pdfs $p(y \mid x_i)$, $i = 1, 2$, be Gaussian with mean $x_i$ and variance $\sigma^2$:
$$p(y \mid A) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y - A)^2}{2\sigma^2}\right)$$
$$p(y \mid -A) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(y + A)^2}{2\sigma^2}\right)$$
The average mutual information is
$$I(X; Y) = \frac{1}{2} \int_{-\infty}^{\infty} \left[ p(y \mid A) \log \frac{p(y \mid A)}{p(y)} + p(y \mid -A) \log \frac{p(y \mid -A)}{p(y)} \right] dy$$
where
$$p(y) = \frac{1}{2}\left[ p(y \mid A) + p(y \mid -A) \right]$$
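The integral has no simple closed form, but it is straightforward to evaluate numerically. The sketch below (the SNR values and the integration grid are my own choices) uses base-2 logarithms so that $I(X;Y)$ is in bits.

```python
import numpy as np

def binary_awgn_mutual_info(A, sigma):
    """Numerically evaluate I(X;Y) of Example 6.4-1 (in bits)."""
    y = np.linspace(-A - 10 * sigma, A + 10 * sigma, 200001)   # fine grid over the support
    dy = y[1] - y[0]
    p_pos = np.exp(-(y - A) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    p_neg = np.exp(-(y + A) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    p_y = 0.5 * (p_pos + p_neg)
    integrand = 0.5 * (p_pos * np.log2(p_pos / p_y) + p_neg * np.log2(p_neg / p_y))
    return np.sum(integrand) * dy                              # Riemann-sum approximation

for snr_db in (-10, 0, 10):
    A = 10 ** (snr_db / 20)        # choose A so that A^2 / sigma^2 equals the SNR (sigma = 1)
    print(snr_db, "dB:", round(binary_awgn_mutual_info(A, sigma=1.0), 4))
# I(X;Y) is near 0 at low SNR and approaches 1 bit (the entropy of X) at high SNR.
```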

Channel Model and Channel Capacity


A general communication channel is described in terms of its set of possible inputs and outputs:
• $\mathcal{X}$: the input alphabet
• $\mathcal{Y}$: the output alphabet
• the conditional probability $P[y_1, y_2, \ldots, y_n \mid x_1, x_2, \ldots, x_n]$ relating output sequences to input sequences
A channel is called memoryless if
$$P[\mathbf{y} \mid \mathbf{x}] = \prod_{i=1}^{n} P[y_i \mid x_i] \quad \text{for all } n$$
In other words, a channel is memoryless if the output at time $i$ depends only on the input at time $i$.

Binary Symmetric Channel

The transition diagram of the binary symmetric channel is described by the crossover probability $p$:
$$P[Y = 0 \mid X = 1] = P[Y = 1 \mid X = 0] = p$$
$$P[Y = 1 \mid X = 1] = P[Y = 0 \mid X = 0] = 1 - p$$

Discrete Memoryless Channel


The BSC is a special case of a more general discrete-input, discrete-output channel. The DMC is a channel model in which the input and output alphabets $\mathcal{X}$ and $\mathcal{Y}$ are discrete sets and the channel is memoryless.

AWGN Waveform Channel


We consider a channel model in which both the input and the output are waveforms. Assume the channel has a given bandwidth $W$, with ideal frequency response $C(f) = 1$ within the frequency range $[-W, +W]$. Let

$x(t)$: the band-limited channel input
$y(t)$: the corresponding channel output
$n(t)$: additive white Gaussian noise with power spectral density $\frac{N_0}{2}$

$$y(t) = x(t) + n(t)$$



The channel input is subject to a power constraint of the form
$$E[X^2(t)] \le P$$
which for ergodic inputs results in an input power constraint of the form
$$\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x^2(t)\, dt \le P$$



We can express $x(t)$, $y(t)$, and $n(t)$ in the form
$$x(t) = \sum_{j} x_j \phi_j(t), \qquad n(t) = \sum_{j} n_j \phi_j(t), \qquad y(t) = \sum_{j} y_j \phi_j(t)$$
where $\{x_j\}$, $\{y_j\}$, and $\{n_j\}$ are the sets of coefficients in the corresponding orthonormal expansions:
$$y_j = \int_{-\infty}^{\infty} y(t) \phi_j(t)\, dt = \int_{-\infty}^{\infty} \left( x(t) + n(t) \right) \phi_j(t)\, dt = x_j + n_j$$



Since the $n_j$'s are i.i.d. zero-mean Gaussian random variables with variance $\frac{N_0}{2}$, it follows that
$$p(y_j \mid x_j) = \frac{1}{\sqrt{\pi N_0}} \exp\!\left( -\frac{(y_j - x_j)^2}{N_0} \right), \qquad j = 1, 2, \ldots$$
and by the independence of the $n_j$'s,
$$p(y_1, y_2, \ldots, y_N \mid x_1, x_2, \ldots, x_N) = \prod_{j=1}^{N} p(y_j \mid x_j)$$
for any $N$.



The power constraint on the input waveform can be written as
$$\lim_{T \to \infty} \frac{1}{T} \int_{-T/2}^{T/2} x^2(t)\, dt = \lim_{T \to \infty} \frac{1}{T} \sum_{j=1}^{2WT} x_j^2 = \lim_{T \to \infty} \frac{1}{T} \times 2WT\, E[X^2] = 2W E[X^2] \le P$$

We conclude that in the discrete-time channel model we have
$$E[X^2] \le \frac{P}{2W}$$

Channel Capacity
• The entropy of a discrete memoryless source gives the minimum rate required to compress it losslessly, and the rate-distortion function gives the minimum rate required when a prescribed distortion is tolerated.
• We now introduce a third fundamental quantity, called channel capacity, that provides the maximum rate at which reliable communication over a channel is possible.

For an arbitrary DMC, the capacity is given by
$$C = \max_{\mathbf{p}} I(X; Y)$$
where the maximization is over all PMFs of the form $\mathbf{p} = (p_1, p_2, \ldots, p_{|\mathcal{X}|})$ on the input alphabet $\mathcal{X}$. The $p_i$'s naturally satisfy the constraints
$$p_i \ge 0, \qquad i = 1, 2, \ldots, |\mathcal{X}|$$
$$\sum_{i=1}^{|\mathcal{X}|} p_i = 1$$
The units of $C$ are bits per transmission, or bits per channel use.
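For small alphabets this maximization can be carried out numerically by brute force; the sketch below (the transition matrix is my own example) searches over binary input PMFs for a two-input, three-output DMC.

```python
import numpy as np

# Channel transition matrix W[i, j] = P[Y = j | X = i] for a 2-input, 3-output DMC
W = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.2, 0.7]])

def mutual_info(p_x, W):
    """I(X;Y) in bits for input PMF p_x and transition matrix W."""
    p_xy = p_x[:, None] * W                  # joint PMF P[X = i, Y = j]
    p_y = p_xy.sum(axis=0)                   # output marginal
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x[:, None] * p_y)[mask]))

# Brute-force search over p = (q, 1 - q); for larger alphabets one would use
# the Blahut-Arimoto algorithm or convex optimization instead.
qs = np.linspace(0.0, 1.0, 10001)
rates = [mutual_info(np.array([q, 1.0 - q]), W) for q in qs]
print("C ~", round(max(rates), 4), "bits/channel use at q =", round(qs[int(np.argmax(rates))], 3))
```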

ex 6.5-1. For a BSC, due to the symmetry of the channel, the capacity is achieved by a uniform input distribution, i.e., for $P[X = 0] = P[X = 1] = \frac{1}{2}$. The resulting maximum mutual information is
$$C = 1 + p \log_2 p + (1 - p) \log_2 (1 - p) = 1 - H(p)$$
where $H(p)$ denotes the binary entropy function.
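A quick numerical cross-check (the crossover probabilities below are arbitrary test values): maximizing $I(X;Y) = H(Y) - H(Y \mid X)$ over the input probability reproduces the closed form $1 - H(p)$, with the maximum attained at the uniform input.

```python
import numpy as np

def Hb(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def bsc_mutual_info(q, p):
    """I(X;Y) for a BSC with crossover probability p and input PMF (q, 1-q)."""
    r = q * (1 - p) + (1 - q) * p          # P[Y = 0]
    return Hb(r) - Hb(p)                   # I(X;Y) = H(Y) - H(Y|X)

qs = np.linspace(0.0, 1.0, 2001)
for p in (0.0, 0.05, 0.11, 0.5):
    C_numeric = max(bsc_mutual_info(q, p) for q in qs)
    print(p, round(C_numeric, 4), round(1 - Hb(p), 4))   # the two agree; maximum at q = 1/2
```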

Channel Capacity with Input Power Constraint

The waveform channel model is equivalent to $2W$ uses per second of a discrete-time AWGN channel with input power constraint $\frac{P}{2W}$ and noise variance $\sigma^2 = \frac{N_0}{2}$. The capacity of this discrete-time channel is
$$C = \frac{1}{2} \log_2\!\left( 1 + \frac{P/2W}{N_0/2} \right) = \frac{1}{2} \log_2\!\left( 1 + \frac{P}{N_0 W} \right) \ \text{bits/channel use}$$


The capacity of the continuous-time channel is then
$$C = 2W \times \frac{1}{2} \log_2\!\left( 1 + \frac{P}{N_0 W} \right) = W \log_2\!\left( 1 + \frac{P}{N_0 W} \right) \ \text{bits/s}$$

To see how the capacity behaves as $W \to \infty$, we use the relation $\ln(1 + x) \to x$ as $x \to 0$:
$$C_\infty = \lim_{W \to \infty} W \log_2\!\left( 1 + \frac{P}{N_0 W} \right) = \frac{P}{N_0} \log_2 e$$
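The bandwidth behavior is easy to check numerically; in the sketch below (the ratio $P/N_0$ is an arbitrary illustrative value) the capacity grows with $W$ but saturates at $C_\infty = (P/N_0)\log_2 e$.

```python
import numpy as np

P_over_N0 = 1000.0                      # P / N0 in hertz (arbitrary illustrative value)

def awgn_capacity(W, P_over_N0):
    """Capacity in bits/s of the band-limited AWGN waveform channel."""
    return W * np.log2(1.0 + P_over_N0 / W)

C_inf = P_over_N0 * np.log2(np.e)       # wideband limit (P / N0) log2(e)
for W in (100.0, 1e3, 1e4, 1e6, 1e9):
    print(f"W = {W:>12.0f} Hz   C = {awgn_capacity(W, P_over_N0):10.1f} bit/s")
print(f"W -> infinity      C = {C_inf:10.1f} bit/s")
```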
