
ECE 776 - Information Theory (Spring 2011)

Midterm

Please give well-motivated answers.

Q1 (1 point) A binary i.i.d. sequence X^4 = (X1, ..., X4) is passed through a memoryless binary symmetric channel with cross-over probability p to produce the output sequence Y^4 = (Y1, ..., Y4). In other words, pY|X(1|0) = pY|X(0|1) = p and p(y^4|x^4) = ∏_{i=1}^{4} pY|X(yi|xi). Set pX(1) = q and calculate pY(1). For p = 0.2 and q = 0.2, is x^4 = (0, 0, 1, 0) a typical sequence in A_0.1^{(4)}(pX(x)) (i.e., ε = 0.1), and is y^4 = (1, 0, 0, 0) in A_0.1^{(4)}(pY(y))?

Sol.: We have

H(X) = H(pX (x)) = −0.2 log2 0.2 − 0.8 log2 0.8 = 0.722 bit.

Moreover,
pY (1) = q(1 − p) + (1 − q)p = 2 · 0.2 · 0.8 = 0.32,
so that
H(Y ) = H(pY (y)) = −0.32 log2 0.32 − 0.68 log2 0.68 = 0.904 bit.
For the sequence x^4 = (0, 0, 1, 0), we calculate

−(1/4) log2 p(x^4) = −(1/4) Σ_{i=1}^{4} log2 p(xi) = −(1/4)(3 · log2 0.8 + log2 0.2) = 0.823,

and |0.823 − H(X)| > 0.1, so we can conclude that x^4 ∉ A_0.1^{(4)}(pX(x)). For y^4 = (1, 0, 0, 0),

−(1/4) log2 p(y^4) = −(1/4)(3 · log2 0.68 + log2 0.32) = 0.828,

and |0.828 − H(Y)| ≤ 0.1, so y^4 ∈ A_0.1^{(4)}(pY(y)).
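As a quick numerical sanity check (not part of the original solution; the helper names below are our own), the following Python sketch recomputes the entropies and the empirical rates of the two sequences.

```python
import numpy as np

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) random variable."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

p, q, eps = 0.2, 0.2, 0.1
pY1 = q * (1 - p) + (1 - q) * p                  # P[Y = 1] = 0.32

HX, HY = binary_entropy(q), binary_entropy(pY1)  # 0.722 and 0.904 bit

def empirical_rate(seq, p1):
    """-(1/n) log2 p(sequence) under an i.i.d. Bernoulli(p1) model."""
    probs = [p1 if s == 1 else 1 - p1 for s in seq]
    return -np.mean(np.log2(probs))

x4, y4 = (0, 0, 1, 0), (1, 0, 0, 0)
print(abs(empirical_rate(x4, q) - HX))    # deviation for x^4; compare against eps = 0.1
print(abs(empirical_rate(y4, pY1) - HY))  # about 0.076 <= eps, so y^4 is typical
```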

Q2 (1 point) For the problem at the point above, with any 0 ≤ p, q ≤ 1, argue using the AEP that the number of typical sequences at the output, A_ε^{(n)}(pY(y)), is larger than, or equal to, the number of typical sequences at the input, A_ε^{(n)}(pX(x)), for any n. For which values of p and q do the two sets have the same number of sequences? (You can neglect ε in your arguments.)

Sol.: The set of typical sequences at the input has around 2^{nH(q)} sequences, whereas the set of typical output sequences has about 2^{nH(p∗q)}, where we denoted p ∗ q = q(1 − p) + (1 − q)p.
We thus need to argue that H(p ∗ q) ≥ H(q). To see this you can use the strict concavity of
the entropy. In fact,
H(q) = (1 − p)H(q) + pH(1 − q)
for any 0 ≤ p, q ≤ 1 since H(q) = H(1 − q). Now using Jensen’s inequality

(1 − p)H(q) + pH(1 − q) ≤ H(p ∗ q),

with equality if and only if p = 0, 1 or q = 1/2. Under the latter conditions, the sets of typical sequences at the input and output have the same number of sequences.
Alternatively (and equivalently), you can argue that p ∗ q is strictly closer to 0.5 than q for any p ≠ 0, 1 and q ≠ 0.5.
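A small numerical sweep (a Python sketch of our own, not part of the exam) illustrates the claim that H(p ∗ q) ≥ H(q), with equality exactly when p ∈ {0, 1} or q = 1/2.

```python
import numpy as np

def binary_entropy(x):
    """Binary entropy H(x) in bits."""
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

def conv(p, q):
    """Binary convolution p * q = q(1 - p) + (1 - q)p."""
    return q * (1 - p) + (1 - q) * p

for p in (0.0, 0.2, 0.5, 1.0):
    for q in (0.1, 0.3, 0.5):
        gap = binary_entropy(conv(p, q)) - binary_entropy(q)
        print(f"p={p}, q={q}: H(p*q) - H(q) = {gap:.4f}")
# The printed gap is never negative; it is zero exactly for p in {0, 1} or q = 1/2.
```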

Q3 (1 point) Consider now an asymmetric binary channel with pY |X (1|0) = 0.9 and pY |X (0|1) =
0.1. Moreover, the input is X ∼ Ber(0.2). Calculate the joint distribution of X and Y and
find the best estimate X̂ of X given Y as a function X̂ = g(Y ) (in the sense of minimizing
the probability of error). Calculate the probability of error of such an estimator and check
that it satisfies Fano's inequality. Interpret your results.

Sol.: The joint distribution pXY (x, y) is

X\Y            0                      1
 0     0.8 · 0.1 = 0.08      0.8 · 0.9 = 0.72
 1     0.2 · 0.1 = 0.02      0.2 · 0.9 = 0.18

Note that X and Y are independent. In fact, the probability of receiving Y is independent
of which X is sent. We have pXY (x, y) = pX (x)pY (y) with pY (1) = 0.9.
The best estimator is thus x̂ = g(y) = 0 for both y = 0 and y = 1. The error probability is
thus

Pe = Pr[X̂ ≠ X] = Pr[X ≠ 0] = 0.02 + 0.18 = 0.2.

From Fano's inequality, we have

H(X|Y) ≤ H(X|X̂) ≤ H(Pe) + Pe log2(2 − 1) = H(Pe),

but

H(X|Y) = H(X|X̂) = H(X) = H(0.8) = 0.72 bit,

and H(Pe) = H(0.2) = H(0.8) = H(X). Fano's inequality is thus satisfied with equality.
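The numbers above can be reproduced with a short Python sketch (the helper names are our own): it rebuilds the joint table, finds the minimum-error estimate of X from Y, and evaluates the Fano bound.

```python
import numpy as np

pX = np.array([0.8, 0.2])                    # X ~ Ber(0.2)
pY_given_X = np.array([[0.1, 0.9],           # row x = 0: P[Y=0|X=0], P[Y=1|X=0]
                       [0.1, 0.9]])          # row x = 1: P[Y=0|X=1], P[Y=1|X=1]
pXY = pX[:, None] * pY_given_X               # joint distribution, matches the table above

def entropy(p):
    """Entropy in bits of a probability vector, ignoring zero entries."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Best (MAP) estimate of X for each value of Y: argmax over x of pXY[x, y]
x_hat = pXY.argmax(axis=0)                   # -> [0, 0]: guess X = 0 whatever Y is
Pe = 1 - sum(pXY[x_hat[y], y] for y in range(2))      # = 0.2

HX_given_Y = sum(pXY[:, y].sum() * entropy(pXY[:, y] / pXY[:, y].sum()) for y in range(2))
fano_bound = entropy(np.array([Pe, 1 - Pe])) + Pe * np.log2(2 - 1)
print(Pe, HX_given_Y, fano_bound)            # 0.2, 0.722, 0.722: the bound is tight here
```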

Q4 (1 point) Consider an arbitrary binary random vector X^n, not necessarily i.i.d. or even stationary, such that H(X^n) ≥ nα for some 0 ≤ α ≤ 1. Assume that this sequence goes through a memoryless binary symmetric channel with cross-over probability p, producing Y^n. Then, the following inequality is well known:

H(Y^n) ≥ nH(p ∗ H^{-1}(α)),

where H^{-1}(α) is the inverse of the binary entropy function in the interval [0, 1/2] (i.e., q = H^{-1}(α) is the value of 0 ≤ q ≤ 1/2 such that H(q) = α) and a ∗ b = a(1 − b) + b(1 − a). Provide an example of a random vector X^n with H(X^n) = nα for which this inequality is strict (i.e., H(Y^n) > nH(p ∗ H^{-1}(α))). You can build your example with n = 2, p = 0.1 and α = 1/2. (Hint: The vector X^n should not be i.i.d.! Moreover, H^{-1}(1/2) = 0.11.)

Sol.: First, we note that if X^n is i.i.d., then it must be Ber(H^{-1}(α)), since we want H(X^n) = nH(X) = nH(H^{-1}(α)) = nα. In this case, we have H(Y^n) = nH(Y) = nH(p ∗ H^{-1}(α)), so that the inequality at hand holds with equality.
We then consider a constant vector X1 = X2. To impose H(X^2) = H(Xi) = 2 · (1/2) = 1, we select Xi as Ber(1/2). To calculate H(Y^2) = H(Y1, Y2), we evaluate the joint distribution pY1Y2(y1, y2):

Y1\Y2                  0                                 1
  0     1/2 · 0.9^2 + 1/2 · 0.1^2 = 0.41          0.1 · 0.9 = 0.09
  1             0.1 · 0.9 = 0.09          1/2 · 0.1^2 + 1/2 · 0.9^2 = 0.41

We then obtain

H(Y1, Y2) = −2 · 0.41 · log2(0.41) − 2 · 0.09 · log2(0.09) = 1.68 bit,

which we must compare to

nH(p ∗ H^{-1}(α)) = 2H(0.1 ∗ H^{-1}(1/2)) = 2H(0.1 ∗ 0.11) = 2H(0.188) = 1.39 bit.

We clearly have H(Y^2) > 2H(0.1 ∗ H^{-1}(1/2)).
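A short Python sketch (again with helper names of our own) reproduces this comparison for n = 2, p = 0.1, α = 1/2.

```python
import numpy as np
from itertools import product

def binary_entropy(x):
    """Binary entropy H(x) in bits."""
    return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

p = 0.1                                   # BSC cross-over probability
q_star = 0.11                             # H^{-1}(1/2), as given in the hint

# Constant input X1 = X2 ~ Ber(1/2): build the joint distribution of (Y1, Y2).
pY2 = np.zeros((2, 2))
for x in (0, 1):
    for y1, y2 in product((0, 1), repeat=2):
        flip1 = p if y1 != x else 1 - p
        flip2 = p if y2 != x else 1 - p
        pY2[y1, y2] += 0.5 * flip1 * flip2

HY2 = -np.sum(pY2 * np.log2(pY2))         # joint entropy H(Y1, Y2), about 1.68 bit

pq = q_star * (1 - p) + (1 - q_star) * p  # p * H^{-1}(1/2) = 0.188
bound = 2 * binary_entropy(pq)            # about 1.39 bit
print(HY2, bound, HY2 > bound)            # the inequality is strict for this input
```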

P1 (2 points) Your friend picks uniformly at random a card X from a deck of 40 cards,
keeping it hidden from you. You can ask one question Q, out of a set of predetermined
questions, about the card X, and receive an answer A = f(X, Q), where f(·, ·) is a given
deterministic function. When selecting the question Q, the object X is unknown, and thus
Q and X must be independent.
a. You can choose a distribution p(q) on the questions Q. In order to maximize the usefulness of such a question, you want to maximize I(Q, A; X). Why? Show that I(Q, A; X) = H(A|Q) and interpret this equality. (Hint: Use the chain rule.)
b. Suppose that a second friend can select a question Q knowing X in order to increase
I(Q, A; X) and help you out. In this case, Q is not independent of X. Quantify the increase
in mutual information I(Q, A; X) with respect to point a. and interpret your result.
c. Now suppose that two independent questions Q1, Q2, both distributed according to p(q), are asked, independently of X, eliciting answers A1 and A2, where Ai = f(X, Qi) for i = 1, 2. Show that two questions are less valuable than twice a single question, in the sense that I(X; Q1, A1, Q2, A2) ≤ 2I(X; Q1, A1). (Hint: The data processing inequality may be useful.)

Sol.: a. Maximizing I(Q, A; X) maximizes the amount of information that the pair (Q, A)
brings about the card X. Using the fact that Q and X are independent, we have

I(Q, A; X) = I(Q; X) + I(A; X|Q)
= I(A; X|Q)
= H(A|Q) − H(A|X, Q)
= H(A|Q),

where the last line follows from the fact that A = f(X, Q). In other words, the amount of information that (Q, A) brings about the card X is equal to the uncertainty about the answer A when asking question Q.
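To make the identity concrete, here is a small Python sketch with a toy version of the game (a 4-card deck and two hypothetical questions of our own choosing, not the setup graded in the exam): it checks numerically that I(Q, A; X) = H(A|Q) when Q is independent of X.

```python
import numpy as np
from itertools import product

def entropy(probs):
    """Entropy in bits of an iterable of probabilities (zeros are skipped)."""
    return -sum(p * np.log2(p) for p in probs if p > 0)

cards = range(4)                 # toy deck of 4 cards, drawn uniformly
questions = (0, 1)               # two predetermined questions, asked uniformly at random

def f(x, q):
    """Deterministic answer: parity of the card for q = 0, 'is it >= 2?' for q = 1."""
    return x % 2 if q == 0 else int(x >= 2)

# Joint distribution of (X, Q, A), with Q independent of X.
pXQA = {}
for x, q in product(cards, questions):
    key = (x, q, f(x, q))
    pXQA[key] = pXQA.get(key, 0) + (1 / 4) * (1 / 2)

def marginal(coords):
    """Marginal distribution over the listed coordinates (0 = X, 1 = Q, 2 = A)."""
    out = {}
    for triple, p in pXQA.items():
        k = tuple(triple[i] for i in coords)
        out[k] = out.get(k, 0) + p
    return out

H = lambda dist: entropy(dist.values())
# I(Q, A; X) = H(X) + H(Q, A) - H(X, Q, A)   and   H(A|Q) = H(Q, A) - H(Q)
I_QA_X = H(marginal([0])) + H(marginal([1, 2])) - H(marginal([0, 1, 2]))
H_A_given_Q = H(marginal([1, 2])) - H(marginal([1]))
print(np.isclose(I_QA_X, H_A_given_Q))   # True
```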
b. We now have

I(Q, A; X) = I(Q; X) + I(A; X|Q)
= I(Q; X) + H(A|Q),

so that the increase in mutual information is I(Q; X). This has a simple interpretation: the choice of the question itself brings information about X.
c. We have
I(X; Q1 , A1 , Q2 , A2 ) = I(X; Q1 , A1 ) + I(X; Q2 , A2 |Q1 , A1 ).
We then need to prove that

I(X; Q2, A2|Q1, A1) ≤ I(X; Q2, A2) = I(X; Q1, A1),

where the last equality is clear, since (Q1, A1) and (Q2, A2) have the same joint distribution with X. Writing I(X; Q2, A2|Q1, A1) = H(Q2, A2|Q1, A1) − H(Q2, A2|X, Q1, A1) and noting that (Q1, A1) − X − (Q2, A2) forms a Markov chain, so that H(Q2, A2|X, Q1, A1) = H(Q2, A2|X), the required inequality is equivalent to

H(Q2, A2|Q1, A1) ≤ H(Q2, A2),

which is true since conditioning reduces entropy.

P2 (2 points) A constant process X1 = X2 = ... = Xn with Xi ∼ Ber(0.1) (i.e., Pr[Xi = 1] = 0.1) is passed through a memoryless binary symmetric channel with cross-over probability p = 0.3, producing output Y^n = (Y1, ..., Yn).
a. Is Y n stationary? Is it i.i.d.? Is it ergodic?
b. Demonstrate a lossless compression scheme that achieves compression rates arbitrarily
close to the entropy rate H(𝒴).
c. Answer the two questions above for the case where the input X n is i.i.d. Ber(0.1).

Sol.:
a. Given the generation mechanism, we have that with probability 0.1, the process Y n is
i.i.d. Ber(0.7), while with probability 0.9 process Y n is i.i.d. Ber(0.3). The output process
Y n is stationary since the mechanism used for generation does not depend on time. However,
it is neither i.i.d. nor ergodic. To show this, it is enough to notice that the empirical average

(1/n) Σ_{i=1}^{n} Yi

does not converge to E[Y ] = 0.1 · 0.7+ 0.9 · 0.3 = 0.34, but to a random variable equal to
0.7 with probability 0.1 and equal to 0.3 with probability 0.9.
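This behavior is easy to see in a short simulation (a Python sketch; the sample sizes and seed are arbitrary choices of ours): each realization's empirical mean settles near either 0.7 or 0.3, never near 0.34.

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_realizations, p = 100_000, 5, 0.3

for _ in range(num_realizations):
    x = rng.random() < 0.1           # the whole input sequence equals this single bit
    noise = rng.random(n) < p        # i.i.d. BSC(0.3) noise
    y = np.logical_xor(x, noise)
    print(y.mean())                  # close to 0.7 if x = 1, close to 0.3 if x = 0
```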
b. A lossless compression scheme that achieves compression rates arbitrarily close to the entropy rate H(𝒴) can be devised by applying Shannon codes on blocks of sufficient length, as discussed in class. We recall that the scheme works for any stationary process.
c. In this case, the process Y^n is i.i.d. Ber(0.34), since 0.1 · 0.7 + 0.9 · 0.3 = 0.34. Therefore, lossless compression schemes that achieve compression rates arbitrarily close to the entropy rate H(𝒴) = H(Y) can be devised based on typicality or using the approach mentioned above.

P3 (2 points) Consider a Markov chain X n = X1 , ..., Xn over the alphabet {0, 1, 2} with
transition probability Pr[Xm+1 = i|Xm = j] defined by the following table
i\j 0 1 2
0 1/2 1/8 3/8
1 3/8 1/2 1/8
2 1/8 3/8 1/2
Assume that the initial distribution is Pr[X1 = i] = 1/3 for i = 0, 1, 2.
a. Is X n stationary? (Hint: Check whether the initial distribution is the same as the
stationary distribution)
b. Find the entropy H(X) and the entropy rate H(𝒳). Which one is larger? Why?
c. Consider the process Z n = Z1 , ..., Zn , where Z1 = X1 and Zm = (Xm − Xm−1 ) mod 3
for m = 2, ..., n (recall that (0 + 3k) mod 3 = 0, (1 + 3k) mod 3 = 1, (2 + 3k) mod 3 = 2
for k = ..., −2, −1, 0, 1, 2, ...). Focus on Z2 , ..., Zn (that is, excluding Z1 ). Is this process
stationary? Is it i.i.d.?
d. Focusing on Z2, ..., Zn, calculate the entropy H(Z) and the entropy rate H(𝒵).
Sol.: a. The stationary distribution (μ0, μ1, μ2) satisfies the "equilibrium" conditions

μ0 = (1/2)μ0 + (3/8)μ1 + (1/8)μ2
μ1 = (1/8)μ0 + (1/2)μ1 + (3/8)μ2
μ2 = (3/8)μ0 + (1/8)μ1 + (1/2)μ2,

which are clearly satisfied by the initial distribution. So, the process is stationary.
b. Since the process is a stationary Markov chain, we can calculate
H(X) = H((1/3, 1/3, 1/3)) = log2 3 = 1.58 bit,
and
H(𝒳) = H(X2|X1) = H((1/2, 1/8, 3/8)) = −(1/2) log2(1/2) − (1/8) log2(1/8) − (3/8) log2(3/8) = 1.4 bit.

As we expect, H(𝒳) < H(X), since the process has memory.
c. The process is i.i.d. (and thus also stationary). In fact, Zm for m ≥ 2 is distributed as

p(z) = 1/2 for z = 0, 1/8 for z = 1, 3/8 for z = 2,

independently of all previous Zm−i for i ≥ 1.
d. The entropy is

H(Z) = H((1/2, 1/8, 3/8)) = 1.4 bit,

and, since the process is i.i.d., the entropy rate is H(𝒵) = H(Z).
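The numbers in parts a, b, and d can be checked with a few lines of Python (a sketch of ours; the transition matrix is entered with rows indexed by the current state, consistent with the equilibrium equations written above).

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a probability vector."""
    p = np.asarray(p)
    return float(-np.sum(p * np.log2(p)))

# Transition probabilities, rows indexed by the current state.
P = np.array([[1/2, 1/8, 3/8],
              [3/8, 1/2, 1/8],
              [1/8, 3/8, 1/2]])

mu = np.full(3, 1/3)             # candidate stationary distribution (the initial one)
print(np.allclose(mu @ P, mu))   # True: the uniform distribution is stationary

HX = entropy(mu)                 # H(X) = log2(3), about 1.58 bit
rate = sum(mu[j] * entropy(P[j]) for j in range(3))   # H(X2|X1), about 1.4 bit
print(HX, rate)

# The increment process Z has per-symbol distribution (1/2, 1/8, 3/8),
# so its entropy and entropy rate both equal about 1.4 bit.
print(entropy([1/2, 1/8, 3/8]))
```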
