
Statistical Signal Processing and Applications: Exercise Book

Andrea Ridolfi
Luciano Sbaiz
Martin Vetterli

EPFL - I&C - LCAV
Chapter 1

Review Material

1.1 Hilbert Spaces


Exercise 1. Parseval's equality.

Given a finite dimensional space $W = \mathbb{C}^N$ and an orthonormal basis $\{v^{(i)}\}$, $i = 0, \ldots, N-1$, verify that for any $x \in W$
\[
\|x\|_2^2 = \sum_{i=0}^{N-1} \left| \langle x, v^{(i)} \rangle \right|^2 .
\]

Solution 1. Parseval's equality

Let $x \in W$; then $x = \sum_{i=0}^{N-1} \alpha_i v^{(i)}$ with $\alpha_i = \langle x, v^{(i)} \rangle$. Therefore
\[
\|x\|_2^2 = \langle x, x \rangle = \Big\langle x, \sum_{i=0}^{N-1} \alpha_i v^{(i)} \Big\rangle = \sum_{i=0}^{N-1} \overline{\alpha_i} \, \langle x, v^{(i)} \rangle .
\]
Now, substituting $\alpha_i = \langle x, v^{(i)} \rangle$, we have
\[
\|x\|_2^2 = \sum_{i=0}^{N-1} \overline{\langle x, v^{(i)} \rangle} \, \langle x, v^{(i)} \rangle = \sum_{i=0}^{N-1} \left| \langle x, v^{(i)} \rangle \right|^2 .
\]
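As a quick numerical sanity check, the identity can be verified for a random orthonormal basis. A sketch in Python/NumPy (the QR factorization is just one convenient way to generate such a basis; all variable names are ours):

import numpy as np

N = 8
rng = np.random.default_rng(0)
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Q, _ = np.linalg.qr(A)           # columns of Q form an orthonormal basis
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

coeffs = Q.conj().T @ x          # <x, v^(i)> for each basis vector
lhs = np.linalg.norm(x) ** 2
rhs = np.sum(np.abs(coeffs) ** 2)
print(np.isclose(lhs, rhs))      # True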

Exercise 2. Optimal approximation.

Consider a Hilbert space $H$ and a subspace $W$ spanned by an orthonormal basis $\{v^{(i)}\}$, $i \in I$ (one can assume a finite dimensional case for simplicity). Prove that the approximation $\hat{x} \in W$ of $x \in H$ given by
\[
\hat{x} = \sum_{i \in I} \langle x, v^{(i)} \rangle \, v^{(i)}
\]
satisfies:

(a) $(x - \hat{x}) \perp \hat{x}$;

(b) $\|x - \hat{x}\|_2^2$ is minimum among all linear approximations in $W$.

Solution 2. Optimal approximation

(a) From the property of orthogonality we know that
\[
(x - \hat{x}) \perp \hat{x} \iff \langle x - \hat{x}, \hat{x} \rangle = 0 .
\]
Proof by writing out:
\[
\begin{aligned}
\langle x - \hat{x}, \hat{x} \rangle &= \langle x, \hat{x} \rangle - \langle \hat{x}, \hat{x} \rangle \\
&= \Big\langle x, \sum_{i \in I} \langle x, v^{(i)} \rangle v^{(i)} \Big\rangle - \|\hat{x}\|^2 \\
&= \sum_{i \in I} \overline{\langle x, v^{(i)} \rangle} \, \langle x, v^{(i)} \rangle - \|\hat{x}\|^2 \\
&= \sum_{i \in I} \left| \langle x, v^{(i)} \rangle \right|^2 - \|\hat{x}\|^2 \\
&= \|\hat{x}\|^2 - \|\hat{x}\|^2 = 0
\end{aligned}
\]
where the last equation holds by Parseval's equality. Note that $\sum_{i \in I} |\langle x, v^{(i)} \rangle|^2$ equals $\|\hat{x}\|^2$ and not $\|x\|^2$, because it represents the projection of $x$ onto the orthonormal basis $\{v^{(i)}\}$, which has fewer elements than would be required to fully represent $x$.

(b) The approximation $\hat{x}$ is a linear combination of the basis vectors, $\hat{x} = \sum_{n \in I} \alpha_n v^{(n)}$ for some $\alpha_n$. For simplicity, we can assume that $\alpha_n \in \mathbb{R}$. The case $\alpha_n \in \mathbb{C}$ can be analyzed in a similar way by considering the real and the imaginary part of $\alpha_n$.

We want to find the coefficients $\alpha_n$ for which $\|x - \hat{x}\|_2^2$ is minimum. Therefore we set
\[
\frac{\partial}{\partial \alpha_i} \|x - \hat{x}\|_2^2 = 0 .
\]
Thus,
\[
\begin{aligned}
\frac{\partial}{\partial \alpha_i} \|x - \hat{x}\|_2^2
&= \frac{\partial}{\partial \alpha_i} \langle x - \hat{x}, x - \hat{x} \rangle \\
&= \frac{\partial}{\partial \alpha_i} \big( \langle x, x \rangle - \langle x, \hat{x} \rangle - \langle \hat{x}, x \rangle + \langle \hat{x}, \hat{x} \rangle \big) \\
&= \frac{\partial}{\partial \alpha_i} \Big( \langle x, x \rangle - \sum_{n \in I} \alpha_n \langle x, v^{(n)} \rangle - \sum_{m \in I} \alpha_m \langle v^{(m)}, x \rangle + \sum_{n \in I} \sum_{m \in I} \alpha_n \alpha_m \langle v^{(n)}, v^{(m)} \rangle \Big) \\
&= \frac{\partial}{\partial \alpha_i} \Big( \langle x, x \rangle - \sum_{n \in I} \alpha_n \langle x, v^{(n)} \rangle - \sum_{m \in I} \alpha_m \langle v^{(m)}, x \rangle + \sum_{n \in I} \alpha_n^2 \Big) \\
&= -\langle x, v^{(i)} \rangle - \langle v^{(i)}, x \rangle + 2\alpha_i = 0 .
\end{aligned}
\]
From which,
\[
\alpha_i = \tfrac{1}{2}\big( \langle x, v^{(i)} \rangle + \langle v^{(i)}, x \rangle \big) = \operatorname{Re} \langle x, v^{(i)} \rangle
\]
as claimed.

Exercise 3. Hilbert Spaces in Probability.

Consider the random variables $X_0, X_1, X_2$, defined on the same probability space. Suppose that the mean of each variable is 0 and the joint correlation matrix is
\[
R_X = E\big[ [X_0\; X_1\; X_2]^T [X_0\; X_1\; X_2] \big] =
\begin{pmatrix} 8 & 4 & 1 \\ 4 & 8 & 4 \\ 1 & 4 & 8 \end{pmatrix} .
\]
Define the Hilbert space $H$ as the space generated by all the linear combinations of the variables $X_0$, $X_1$, and $X_2$, i.e.
\[
H = \{ a_0 X_0 + a_1 X_1 + a_2 X_2, \quad a_0, a_1, a_2 \in \mathbb{R} \} .
\]
(a) Determine an orthogonal basis $\{Y_0, Y_1\}$ for the subspace $W$ generated by $X_0$ and $X_1$.

(b) Find the best approximation of the variable $X_2$ in the subspace $W$, i.e. the random variable $Y$ that minimizes $E[|Y - X_2|^2]$, with $Y \in W$. (Hint: apply the projection theorem.)

Solution 3. Hilbert Spaces in Probability.

This exercise may seem strange at first glance, but it is actually a standard exercise of linear algebra. One should just replace the scalar product used in $\mathbb{R}^N$ with
\[
\langle X, Y \rangle = E[XY]
\]
where $X$ and $Y$ are random variables defined on the same probability space. One can easily verify that this product is actually a valid scalar product. The scalar product always induces a norm, defined by
\[
\|X\| = \sqrt{\langle X, X \rangle} .
\]
With these definitions, the space $H$ is actually a Hilbert space (one could verify that all the properties valid for vector spaces hold for the set $H$ and also that $H$ is complete). The space $H$ is generated by the random variables $X_0, X_1, X_2$, which represent a basis of the space. They are the vectors of the space and the usual vector operations can be applied to them.

(a) The subspace $W$ is the subspace of $H$ generated by the vectors (i.e. the random variables) $X_0$ and $X_1$. To determine an orthogonal basis, one can apply the Gram-Schmidt procedure:
\[
Y_0 = \frac{X_0}{\|X_0\|}, \qquad
Y_1 = \frac{X_1 - \langle X_1, Y_0 \rangle Y_0}{\| X_1 - \langle X_1, Y_0 \rangle Y_0 \|},
\]
and replace the scalar product and the norm with the definitions that we presented earlier. We obtain
\[
Y_0 = \frac{X_0}{2\sqrt{2}}, \qquad
Y_1 = \frac{X_1 - \big\langle X_1, \frac{X_0}{2\sqrt{2}} \big\rangle \frac{X_0}{2\sqrt{2}}}{\big\| X_1 - \big\langle X_1, \frac{X_0}{2\sqrt{2}} \big\rangle \frac{X_0}{2\sqrt{2}} \big\|}
= -\frac{1}{2\sqrt{6}} X_0 + \frac{1}{\sqrt{6}} X_1 .
\]

(b) To determine the best approximation of $X_2$ in $W$, say $\hat{X}_2$, we write it as a linear combination of $X_0$ and $X_1$ (or equivalently of $Y_0$ and $Y_1$),
\[
\hat{X}_2 = b_0 X_0 + b_1 X_1 .
\]
The error of the approximation is given by
\[
E = X_2 - \hat{X}_2 .
\]
To apply the projection theorem we impose that the approximation error is orthogonal to $W$. This corresponds to the two equations
\[
\langle E, X_0 \rangle = 0, \qquad \langle E, X_1 \rangle = 0,
\]
which give the linear system
\[
\begin{cases}
\langle X_0, X_0 \rangle b_0 + \langle X_1, X_0 \rangle b_1 = \langle X_2, X_0 \rangle \\
\langle X_0, X_1 \rangle b_0 + \langle X_1, X_1 \rangle b_1 = \langle X_2, X_1 \rangle .
\end{cases}
\]
The solution of the system is
\[
b_0 = -\frac{1}{6}, \qquad b_1 = \frac{7}{12};
\]
therefore $\hat{X}_2 = -X_0/6 + 7X_1/12$.
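Since every element of $H$ is represented by a coefficient vector $a = (a_0, a_1, a_2)$ and $\langle X, Y \rangle = a^T R_X b$, the whole exercise reduces to linear algebra on $R_X$. A sketch in Python/NumPy (variable names are ours):

import numpy as np

R = np.array([[8., 4., 1.],
              [4., 8., 4.],
              [1., 4., 8.]])

def inner(a, b):
    # <X, Y> = E[XY] for X, Y given by coefficient vectors a, b
    return a @ R @ b

# Gram-Schmidt on the coefficient vectors of X0 and X1
x0 = np.array([1., 0., 0.])
x1 = np.array([0., 1., 0.])
y0 = x0 / np.sqrt(inner(x0, x0))          # X0 / (2*sqrt(2))
u = x1 - inner(x1, y0) * y0
y1 = u / np.sqrt(inner(u, u))             # -X0/(2*sqrt(6)) + X1/sqrt(6)

# Projection of X2 onto W: solve the 2x2 normal equations
x2 = np.array([0., 0., 1.])
G = np.array([[inner(x0, x0), inner(x1, x0)],
              [inner(x0, x1), inner(x1, x1)]])
b = np.linalg.solve(G, np.array([inner(x2, x0), inner(x2, x1)]))
print(y0, y1, b)                          # b = [-1/6, 7/12]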

Exercise 4. Fourier basis.

Consider the Fourier basis $\{w^{(k)}\}_{k=0,\ldots,N-1}$, defined as
\[
w_n^{(k)} = e^{j \frac{2\pi}{N} nk} .
\]
(a) Prove that it is an orthogonal basis in $\mathbb{C}^N$. The inner product is defined as in the $l^2$ space.

(b) Normalize the vectors in order to get an orthonormal basis.

(c) Propose the best least squares approximation $\hat{y} \in \mathbb{C}^N$ of a general vector $y \in \mathbb{C}^{N+1}$.
Solution 4. Fourier basis

(a) The Fourier basis is a sequence of $N$-dimensional vectors
\[
\Big\{ w^{(k)} = \big( w_0^{(k)}, w_1^{(k)}, \ldots, w_{N-1}^{(k)} \big) \Big\}_{k=0,\ldots,N-1} .
\]
Recall that a set of $N$ non-zero orthogonal vectors in an $N$-dimensional space is a basis for that space. Therefore, we need to prove the orthogonality across the vectors $\{w^{(k)}\}_{k=0,\ldots,N-1}$. Let us compute the inner product:
\[
\langle w^{(k)}, w^{(h)} \rangle = \sum_{n=0}^{N-1} w_n^{(k)} \overline{w_n^{(h)}}
= \sum_{n=0}^{N-1} e^{j \frac{2\pi}{N} nk} e^{-j \frac{2\pi}{N} nh}
= \sum_{n=0}^{N-1} e^{j \frac{2\pi}{N} n(k-h)}
= \begin{cases} N & \text{if } k = h \\ 0 & \text{otherwise.} \end{cases}
\]
Since the inner product of the vectors is equal to 0 for $k \neq h$, we conclude that they are orthogonal. However, they do not have unit norm and therefore they are not orthonormal.

(b) In order to obtain an orthonormal basis we normalize the vectors by the factor $1/\sqrt{N}$, giving
\[
\langle w_{\mathrm{norm}}^{(k)}, w_{\mathrm{norm}}^{(h)} \rangle
= \sum_{n=0}^{N-1} \frac{1}{\sqrt{N}} e^{j \frac{2\pi}{N} nk} \, \frac{1}{\sqrt{N}} e^{-j \frac{2\pi}{N} nh}
= \frac{1}{N} \sum_{n=0}^{N-1} e^{j \frac{2\pi}{N} n(k-h)}
= \begin{cases} 1 & \text{if } k = h \\ 0 & \text{otherwise.} \end{cases}
\]

(c) In order to use the projection theorem we need to define a new space $S$ that is a subspace of $\mathbb{C}^{N+1}$. A natural extension from $\mathbb{C}^N$ to $\mathbb{C}^{N+1}$ is to define $S = \{ (c^T, 0)^T, \; c \in \mathbb{C}^N \}$. In that case, the orthonormal basis for the space $S$ is $w_s^{(k)} = \big( w_{\mathrm{norm}}^{(k)\,T}, 0 \big)^T$. Now, the best linear approximation of $y \in \mathbb{C}^{N+1}$ on the subspace $S$, which minimizes the norm $\|y - \hat{y}\|$, is obtained by projecting $y$ onto the orthonormal basis $w_s^{(k)}$:
\[
\hat{y} = \sum_{k=0}^{N-1} \langle y, w_s^{(k)} \rangle \, w_s^{(k)} .
\]

Exercise 5. Circulant matrices.

Prove that the Fourier basis vectors $\big(1, e^{-j\frac{2\pi}{N}k}, \ldots, e^{-j\frac{2\pi}{N}(N-1)k}\big)^T$ are left eigenvectors of an $N \times N$ circulant matrix.

Solution 5. Circulant matrices

Let us write the $N \times N$ circulant matrix as
\[
H = \begin{pmatrix}
h(0) & h(N-1) & \cdots & h(1) \\
h(1) & h(0) & \cdots & h(2) \\
\vdots & \vdots & \ddots & \vdots \\
h(N-1) & h(N-2) & \cdots & h(0)
\end{pmatrix}
= \begin{pmatrix} h^{(0)} & h^{(1)} & \cdots & h^{(N-1)} \end{pmatrix},
\]
i.e., each column $h^{(i)}$ is a downward shifted version of the first column $h^{(0)}$ of the matrix $H$. By definition, a left eigenvector satisfies the equality
\[
y = w^T H = \lambda w^T .
\]
The product $w^T H$ results in a row vector $y$ of length $N$, where the entry $y(i)$ corresponds to the inner product of the vector $w^T$ and the column $h^{(i)}$. Since
\[
w^T = \big( 1, e^{-j\frac{2\pi}{N}k}, \ldots, e^{-j\frac{2\pi}{N}(N-1)k} \big)
\]
and the column $h^{(i)}$ of the matrix $H$ is
\[
h^{(i)} = \big( h(N-i), h(N-i+1), \ldots, h(N-1), h(0), h(1), \ldots, h(N-i-1) \big)^T ,
\]
the element $y(i)$ of $y$ is
\[
y(i) = w^T h^{(i)} = \sum_{m=N-i}^{N-1} h(m) e^{-j\frac{2\pi}{N}(m-(N-i))k} + \sum_{m=0}^{N-i-1} h(m) e^{-j\frac{2\pi}{N}(m+i)k} .
\]
Now, knowing that the complex exponential $e^{j\omega}$ is periodic with period $2\pi$, we can write
\[
e^{-j\frac{2\pi}{N}mk} = e^{-j\frac{2\pi}{N}(m+N)k},
\]
and the exponential term of the first sum then reads $e^{-j\frac{2\pi}{N}(m-(N-i))k} = e^{-j\frac{2\pi}{N}(m+i)k}$. Finally, the inner product $y(i)$ is given by
\[
y(i) = \sum_{m=0}^{N-1} h(m) e^{-j\frac{2\pi}{N}(m+i)k} = e^{-j\frac{2\pi}{N}ik} \sum_{m=0}^{N-1} h(m) e^{-j\frac{2\pi}{N}mk} = e^{-j\frac{2\pi}{N}ik} H[k]
\]
where $H[k]$ is the $k$-th Fourier coefficient of the sequence $h^{(0)}$. Therefore, the product $y$ is given by
\[
y = w^T H = H[k] \big( 1, e^{-j\frac{2\pi}{N}k}, \ldots, e^{-j\frac{2\pi}{N}(N-1)k} \big) = H[k] w^T = \lambda w^T .
\]
The Fourier basis vector $w$ is therefore a left eigenvector, with the corresponding Fourier coefficient $H[k] as its eigenvalue.
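A sketch in Python/NumPy checking the statement for a random circulant matrix (the construction via np.roll is ours):

import numpy as np

N, k = 8, 3
rng = np.random.default_rng(1)
h = rng.standard_normal(N)                          # first column h^(0)
H = np.array([np.roll(h, i) for i in range(N)]).T   # columns are shifts
w = np.exp(-2j * np.pi * k * np.arange(N) / N)      # Fourier basis vector
lam = np.fft.fft(h)[k]                              # k-th DFT coefficient
print(np.allclose(w @ H, lam * w))                  # True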

1.2 Basics of discrete signal processing


Exercise 6. Discrete sinc function

A discrete sinc function is the inverse DFT (IDFT) of the indicator function $I_M[n]$ of an interval $[-M, M]$, that is:
\[
I_M[n] = \begin{cases} 1 & -M \leq n \leq M \\ 0 & \text{otherwise.} \end{cases} \tag{1.1}
\]
Assume that $2M$ is a divisor of $N$, with $N$ the length of the IDFT.

(a) Derive a formula for $\mathrm{sinc}_M[n]$.

(b) Using Matlab, compute the discrete sinc function and compare it with the result from (a).

Solution 6. Discrete sinc function
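A sketch: for part (a), summing the geometric series over the $2M+1$ non-zero bins gives a Dirichlet kernel,
\[
\mathrm{sinc}_M[n] = \frac{1}{N} \sum_{k=-M}^{M} e^{j\frac{2\pi}{N}kn} = \frac{\sin\big(\pi(2M+1)n/N\big)}{N \sin(\pi n/N)},
\]
with the limit value $(2M+1)/N$ at $n \equiv 0 \pmod N$. For part (b), the same comparison can be done in Python/NumPy instead of Matlab:

import numpy as np

N, M = 64, 4                                  # 2M = 8 divides N = 64
k = np.arange(N)
I = ((k <= M) | (k >= N - M)).astype(float)   # indicator of [-M, M] mod N
sinc_idft = np.fft.ifft(I).real               # part (b): numerical IDFT

n = np.arange(N)
with np.errstate(invalid="ignore", divide="ignore"):
    sinc_formula = np.sin(np.pi * (2*M + 1) * n / N) / (N * np.sin(np.pi * n / N))
sinc_formula[0] = (2*M + 1) / N               # limit value at n = 0

print(np.allclose(sinc_idft, sinc_formula))   # True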

Exercise 7. Shannon and orthonormal basis

Shannon's sampling theorem states: A real bandlimited signal $f(t)$ having no spectral components at or above $\omega_m$ is uniquely defined by its samples taken at twice $\omega_m$, often called the Nyquist frequency. Denoting $T_s = \pi/\omega_m$, a reconstruction formula that complements the sampling theorem is:
\[
f(t) = \sum_{n=-\infty}^{+\infty} f(nT_s) \, \mathrm{sinc}_{T_s}(t - nT_s), \tag{1.2}
\]
where
\[
\mathrm{sinc}_{T_s}(t) = \frac{\sin(\pi t / T_s)}{\pi t / T_s}. \tag{1.3}
\]
An alternative interpretation of the sampling theorem is as a series expansion of bandlimited signals on an orthonormal basis. Define:
\[
\varphi_{n,T_s}(t) = \frac{1}{\sqrt{T_s}} \, \mathrm{sinc}_{T_s}(t - nT_s). \tag{1.4}
\]
(a) Show that $\{\varphi_{n,T_s}(t)\}_{n \in \mathbb{Z}}$ forms an orthonormal set, i.e.
\[
\langle \varphi_{n,T_s}, \varphi_{m,T_s} \rangle = \delta_{nm}. \tag{1.5}
\]
(b) Show that any continuous-time signal $f(t)$ bandlimited to $\omega_m$ can be represented in the orthonormal basis $\{\varphi_{n,T_s}(t)\}_{n \in \mathbb{Z}}$. That is, another way to write the interpolation formula (1.2) is
\[
f(t) = \sum_{n=-\infty}^{+\infty} \langle \varphi_{n,T_s}, f \rangle \, \varphi_{n,T_s}(t).
\]

Solution 7. Shannon and orthonormal basis

(a) To prove that $\langle \varphi_n, \varphi_m \rangle = \delta_{nm}$, we use Parseval's relation,
\[
\langle \varphi_n, \varphi_m \rangle = \int_{-\infty}^{+\infty} \overline{\varphi_n(t)} \, \varphi_m(t) \, dt
= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \overline{\Phi_n(\omega)} \, \Phi_m(\omega) \, d\omega
\]
where $\Phi(\omega)$ is the Fourier transform of $\varphi(t)$. We also know that the sinc function is equivalent to a rectangular function in the Fourier domain,
\[
\mathcal{F}\{\mathrm{sinc}_{T_s}(t)\} = \begin{cases} T_s & -\pi/T_s < \omega < \pi/T_s \\ 0 & \text{otherwise,} \end{cases}
\]
and therefore
\[
\Phi_n(\omega) = \mathcal{F}\Big\{ \tfrac{1}{\sqrt{T_s}}\,\mathrm{sinc}_{T_s}(t - nT_s) \Big\}
= \begin{cases} \sqrt{T_s} \, e^{-j\omega n T_s} & -\pi/T_s < \omega < \pi/T_s \\ 0 & \text{otherwise.} \end{cases}
\]
Using this result in Parseval's relation, we get
\[
\begin{aligned}
\frac{1}{2\pi} \int_{-\infty}^{+\infty} \overline{\Phi_n(\omega)} \Phi_m(\omega) \, d\omega
&= \frac{1}{2\pi} \int_{-\pi/T_s}^{+\pi/T_s} \sqrt{T_s} \, e^{j\omega n T_s} \, \sqrt{T_s} \, e^{-j\omega m T_s} \, d\omega \\
&= \frac{T_s}{2\pi} \int_{-\pi/T_s}^{+\pi/T_s} e^{-j\omega(m-n)T_s} \, d\omega \\
&= \frac{T_s}{2\pi} \, \frac{1}{-j(m-n)T_s} \Big[ e^{-j\omega(m-n)T_s} \Big]_{-\pi/T_s}^{+\pi/T_s} \\
&= \frac{e^{j\pi(m-n)} - e^{-j\pi(m-n)}}{2j\pi(m-n)} \\
&= \frac{\sin[\pi(m-n)]}{\pi(m-n)} = \mathrm{sinc}(m-n) = 0, \qquad m \neq n .
\end{aligned}
\]
For $n = m$, we get
\[
\frac{T_s}{2\pi} \int_{-\pi/T_s}^{+\pi/T_s} e^{-j\omega(m-n)T_s} \, d\omega
= \frac{T_s}{2\pi} \int_{-\pi/T_s}^{+\pi/T_s} 1 \, d\omega
= \frac{T_s}{2\pi} \big[ \omega \big]_{-\pi/T_s}^{+\pi/T_s}
= \frac{T_s}{2\pi} \Big( \frac{\pi}{T_s} + \frac{\pi}{T_s} \Big) = 1 .
\]
Therefore,
\[
\langle \varphi_n, \varphi_m \rangle = \frac{1}{2\pi} \int_{-\infty}^{+\infty} \overline{\Phi_n(\omega)} \Phi_m(\omega) \, d\omega = \delta_{nm}
\]
as stated.

(b) We wish to prove that $\frac{1}{\sqrt{T_s}} \langle \varphi_n, f \rangle = f(nT_s)$. We can easily do this by using Parseval's relation:
\[
\frac{1}{\sqrt{T_s}} \langle \varphi_n, f \rangle
= \frac{1}{\sqrt{T_s}} \frac{1}{2\pi} \int_{-\pi/T_s}^{+\pi/T_s} \sqrt{T_s} \, e^{j\omega n T_s} F(\omega) \, d\omega
= \frac{1}{2\pi} \int_{-\omega_m}^{+\omega_m} F(\omega) \, e^{j\omega n T_s} \, d\omega
= f(nT_s)
\]
by definition of the (inverse) Fourier transform, since $F(\omega)$ vanishes outside $[-\omega_m, \omega_m]$. Hence $\langle \varphi_n, f \rangle = \sqrt{T_s}\, f(nT_s)$, and substituting into (1.2) gives
\[
f(t) = \sum_n f(nT_s) \sqrt{T_s} \, \varphi_{n,T_s}(t) = \sum_n \langle \varphi_{n,T_s}, f \rangle \, \varphi_{n,T_s}(t) .
\]

Exercise 8. Discrete time processing

Consider the following system:

    x_c(t) --[C/D]--> x[n] --[discrete-time system]--> y[n] --[D/C]--> y_r(t)
               |T                                                |T

where the discrete-time system is a squarer, i.e., $y[n] = x^2[n]$.

What is the largest value of $T$ such that $y_r(t) = x_c^2(t)$? Assume that $x_c(t)$ has maximal frequency $f_{\max}$.

Solution 8. Discrete time processing

Using the convolution-multiplication property, we have
\[
y[n] = x^2[n] \iff Y(e^{j\omega}) = \frac{1}{2\pi} \, X(e^{j\omega}) \circledast X(e^{j\omega}) .
\]
Therefore, $Y(e^{j\omega})$ will occupy twice the frequency band that $X(e^{j\omega})$ does, if no aliasing occurs. Hence, to avoid aliasing in $Y(e^{j\omega})$, we need $X(e^{j\omega}) = 0$ for $\pi/2 \leq |\omega| \leq \pi$. If $f_{\max}$ denotes the maximum frequency of $x_c(t)$, then
\[
2\pi T f_{\max} \leq \frac{\pi}{2}
\]
and
\[
T \leq \frac{1}{4 f_{\max}} .
\]

Exercise 9. Downsampling

Consider $x[n]$ and $y[n] = x[nN]$ as two sampled versions of the same continuous-time signal, with sampling periods $T$ and $NT$, respectively. Prove that
\[
Y(e^{j\omega}) = \frac{1}{N} \sum_{k=0}^{N-1} X\big(e^{j(\omega - 2\pi k)/N}\big) \tag{1.6}
\]
by going back to the underlying time-domain signal and resampling it with an $N$-times longer sampling period.

Hint: Recall that the discrete-time Fourier transform $X(e^{j\omega})$ of $x[n]$ is:
\[
X(e^{j\omega}) = \frac{1}{T} X_T\Big(\frac{\omega}{T}\Big) = \frac{1}{T} \sum_{k=-\infty}^{\infty} X_C\Big( \frac{\omega - 2\pi k}{T} \Big), \tag{1.7}
\]
where $T$ is the sampling period. Then $Y(e^{j\omega}) = \frac{1}{NT} X_{NT}(\omega/NT)$ (since the sampling period is now $NT$), where $X_{NT}(\omega/NT)$ can be written similarly to (1.7). Finally, split the sum involved in $X_{NT}(\omega/NT)$ using $k = nN + l$, and by gathering terms, (1.6) will follow.

Solution 9. Downsampling

Consider $x[n]$ and $y[n]$ to be obtained from sampling $x_c(t)$ with sampling periods $T$ and $NT$, respectively. Then
\[
\begin{aligned}
Y(e^{j\omega}) &= \frac{1}{NT} X_{NT}\Big(\frac{\omega}{NT}\Big)
= \frac{1}{NT} \sum_{k} X_c\Big( \frac{\omega - 2\pi k}{NT} \Big) \\
&= \frac{1}{NT} \sum_{l=0}^{N-1} \sum_{n=-\infty}^{\infty} X_c\Big( \frac{\omega - 2\pi(nN + l)}{NT} \Big) \qquad \text{(defining $n$ and $l$ through $k = nN + l$)} \\
&= \frac{1}{N} \sum_{l=0}^{N-1} \Bigg[ \frac{1}{T} \sum_{n=-\infty}^{\infty} X_c\Big( \frac{(\omega - 2\pi l)/N - 2\pi n}{T} \Big) \Bigg] \\
&= \frac{1}{N} \sum_{l=0}^{N-1} X\big( e^{j(\omega - 2\pi l)/N} \big) .
\end{aligned}
\]
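The identity has an exact finite-length analogue that is easy to check numerically: if $x$ has length $NL$ and $y[n] = x[nN]$, the $L$-point DFT of $y$ equals $\frac{1}{N}\sum_l X[k + lL]$, where $X$ is the $NL$-point DFT of $x$. A sketch in Python/NumPy:

import numpy as np

N, L = 3, 16                      # downsampling factor, output length
rng = np.random.default_rng(2)
x = rng.standard_normal(N * L)
y = x[::N]                        # y[n] = x[nN]

Y = np.fft.fft(y)                 # L-point DFT of the downsampled signal
X = np.fft.fft(x)                 # NL-point DFT of the original signal
aliased = X.reshape(N, L).sum(axis=0) / N   # (1/N) * sum_l X[k + lL]
print(np.allclose(Y, aliased))    # True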

1.3 Probability
Exercise 10. Gaussian random variable

Suppose that a measurement $X[n]$ is affected by a random noise $W_1[n]$ due to external interferences and by a random noise $W_2[n]$ due to a defective measurement device. The noise $W_1[n]$ is i.i.d., distributed as a Gaussian random variable with mean $m_1$ and variance $\sigma_1^2$, and the noise $W_2[n]$ is i.i.d., distributed as a Gaussian random variable with mean $m_2$ and variance $\sigma_2^2$. Call $W$ the sum of the two noises, i.e., $W = W_1 + W_2$.

(a) Compute the mean and the variance of $W$;

(b) Give the joint distribution of $W$.

Suppose now that we don't know the law of the process $W[n]$ but we have observed a realization of it, $w[1], \ldots, w[K]$. We are interested in estimating its mean based on the observation $w[1], \ldots, w[K]$.

(c) Propose an empirical estimator of the mean;

(d) Check if such an estimator is biased.

Solution 10. Gaussian random variable

(a) We compute the mean as
\[
E[W] = E[W_1 + W_2] = E[W_1] + E[W_2] = m_1 + m_2 = m
\]
and the variance as
\[
\mathrm{var}(W) = E\big[ |W_1 + W_2|^2 \big] - \big| E[W_1 + W_2] \big|^2 .
\]
For simplicity, we assume that $W_1$ and $W_2$ are real (and independent, so that $E[W_1 W_2] = E[W_1]E[W_2]$). Then
\[
\begin{aligned}
\mathrm{var}(W) &= E[|W_1|^2] + 2E[W_1 W_2] + E[|W_2|^2] - |E[W_1] + E[W_2]|^2 \\
&= E[|W_1|^2] + 2E[W_1]E[W_2] + E[|W_2|^2] - |E[W_1]|^2 - |E[W_2]|^2 - 2E[W_1]E[W_2] \\
&= \mathrm{var}(W_1) + \mathrm{var}(W_2) .
\end{aligned}
\]
(b) The process $W[n]$ is a Gaussian process with mean $m = m_1 + m_2$ and variance $\sigma^2 = \sigma_1^2 + \sigma_2^2$.

(c) The natural way to estimate the mean is to compute the average of the realizations $w[k]$,
\[
\hat{m}(w[1], \ldots, w[K]) = \frac{1}{K} \sum_{k=1}^{K} w[k] .
\]
(d) We analyze the bias:
\[
E[\hat{m}(w[1], \ldots, w[K])] - m = E\Big[ \frac{1}{K} \sum_{k=1}^{K} w[k] \Big] - m = \frac{1}{K} \sum_{k=1}^{K} E[w[k]] - m = m - m = 0 .
\]
The bias is zero, and we say that the estimator is unbiased.
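A quick Monte Carlo illustration (a sketch; the parameter values are arbitrary):

import numpy as np

m1, s1, m2, s2 = 1.0, 2.0, -0.5, 1.5
K, trials = 100, 20000
rng = np.random.default_rng(3)
w = rng.normal(m1, s1, (trials, K)) + rng.normal(m2, s2, (trials, K))
m_hat = w.mean(axis=1)                    # one estimate per realization
print(m_hat.mean(), m1 + m2)              # estimates average to the true mean
print(m_hat.var(), (s1**2 + s2**2) / K)   # variance of the estimator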

Exercise 11.

What is the correlation $R_X[n]$ of an independent identically distributed (i.i.d.) process $X$ of variance $\sigma_X^2 = 1$ and zero mean? What is the power spectral density $S_X(\omega)$?

Solution 11.

The correlation is $R_X[n] = m_X^2 + \sigma_X^2 \delta[n] = \delta[n]$. The power spectral density is $S_X(\omega) = \mathcal{F}[R_X[n]] = 1$.

Exercise 12. Power Spectrum Density

Consider the stochastic process defined as
\[
Y[n] = X[n] + \beta X[n-1]
\]
where $\beta \in \mathbb{R}$ and $X[n]$ is a zero-mean wide-sense stationary process with autocorrelation function given by
\[
R_X[k] = \sigma^2 \alpha^{|k|}
\]
for $|\alpha| < 1$.

(a) Compute the power spectrum density $S_Y(e^{j\omega})$ of $Y[n]$.

(b) For which values of $\beta$ does $Y[n]$ correspond to a white noise? Explain.

Solution 12. Power Spectrum Density

The process $Y[n]$ is obtained by filtering the wide-sense stationary process $X[n]$, i.e.
\[
Y[n] = H(z) X[n],
\]
with $H(z) = 1 + \beta z^{-1}$. We are in the case of application of the filtering formula. Therefore,
\[
S_Y(\omega) = |H(e^{j\omega})|^2 S_X(\omega) .
\]
The function $|H(e^{j\omega})|^2$ is given by
\[
|H(e^{j\omega})|^2 = 1 + 2\beta \cos\omega + \beta^2 .
\]
The PSD of $X[n]$ is computed by taking the DTFT of $R_X[k]$, which gives
\[
S_X(\omega) = \sum_{k=-\infty}^{\infty} R_X[k] e^{-j\omega k} = \sigma^2 \, \frac{1 - \alpha^2}{1 - 2\alpha\cos\omega + \alpha^2} .
\]
Hence, the PSD of $Y[n]$ is
\[
S_Y(\omega) = \sigma^2 (1 - \alpha^2) \, \frac{1 + 2\beta\cos\omega + \beta^2}{1 - 2\alpha\cos\omega + \alpha^2} .
\]
To have that $Y[n]$ is a white process, we should impose that the spectral density is a constant. This corresponds to setting $\beta = -\alpha$. The interpretation is the following. The process $X[n]$ is an AR process. In fact, it can be obtained by filtering a white noise $W[n]$, which has variance $\sigma^2(1 - \alpha^2)$, with the synthesis filter
\[
H_s(z) = \frac{1}{1 - \alpha z^{-1}},
\]
which has a pole at $z = \alpha$. The filter $H(z)$ is an FIR filter (i.e. it has only zeros) and has exactly one zero at $z = -\beta$. We can imagine that the process $Y[n]$ is obtained by filtering $W[n]$ with the cascade of the filters $H_s(z)$ and $H(z)$. Therefore, to obtain a white noise at the output, we must have that the zero of $H(z)$ cancels the pole of $H_s(z)$, i.e. $\beta = -\alpha$.
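A sketch checking numerically that $S_Y(\omega)$ is flat exactly when $\beta = -\alpha$ (the parameter values are arbitrary):

import numpy as np

alpha, sigma2 = 0.6, 1.0
w = np.linspace(-np.pi, np.pi, 1001)
Sx = sigma2 * (1 - alpha**2) / (1 - 2 * alpha * np.cos(w) + alpha**2)

for beta in (0.3, -alpha):
    Sy = (1 + 2 * beta * np.cos(w) + beta**2) * Sx
    print(beta, Sy.max() - Sy.min())   # ~0 only for beta = -alpha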

Exercise 13. Stationarity

Consider the following block diagram:

    X1[n] --[h1]--> Y1[n] --o "0"
                             \
                              SW --> Y[n]    (position "1": Y[n] = Y1[n] + Y2[n])
                             /
    X2[n] --[h2]--> Y2[n] --o "2"

The two input processes $X_1[n]$, $X_2[n]$ are jointly Gaussian, uncorrelated, white, with zero mean, and variances $\sigma_{X_1}^2 = 1$, $\sigma_{X_2}^2 = 2$ respectively. The two blocks $h_1$, $h_2$ are linear, time-invariant filters with transfer functions
\[
H_1(z) = 1 + z^{-1}, \qquad H_2(z) = 1 - z^{-1},
\]
respectively. The output process $Y[n]$ is obtained by means of the output switch SW.

(a) Suppose that the output switch is constantly in the position "0", or "2"; what can you say about the output process $Y[n]$? Is it stationary in wide and/or strict sense? Is it Gaussian? Compute the correlation and (if it exists) the spectral density of $Y[n]$.

(b) What happens if the switch is in the position "1"? Answer the same questions as in the previous case.

(c) Suppose that the position of the switch changes with the value of the time index $n$. The switch takes the position "0" when $n$ is even and the position "1" when $n$ is odd. Is $Y[n]$ stationary in this case? Is it Gaussian? Compute the correlation function of $Y[n]$ (be careful with the definition of the correlation function in this case!).

(d) Suppose that the switch takes a random position among "0" and "1" with equal probability and independently of the values of $X_1[n]$ and $X_2[n]$. Is $Y[n]$ stationary in some sense in this case? What is the correlation and (if it exists) the spectral density of $Y[n]$?

Solution 13. Stationarity

(a) The autocorrelation of $h_1[n]$ is given by
\[
\begin{aligned}
r_{h_1}[m] &= h_1[m] * h_1[-m] \\
&= (\delta[m] + \delta[m-1]) * (\delta[-m] + \delta[-m-1]) \\
&= (\delta[m] + \delta[m-1]) * (\delta[m] + \delta[m+1]) \\
&= \delta[m]*\delta[m] + \delta[m]*\delta[m+1] + \delta[m-1]*\delta[m] + \delta[m-1]*\delta[m+1] \\
&= \delta[m-1] + 2\delta[m] + \delta[m+1] .
\end{aligned}
\]
Since $X_1[n]$ is a white process, $r_{x_1}[m] = \sigma_{x_1}^2 \delta[m]$, and thus
\[
\begin{aligned}
r_{y_1}[m] &= r_{h_1}[m] * r_{x_1}[m] = (\delta[m-1] + 2\delta[m] + \delta[m+1]) * \sigma_{x_1}^2 \delta[m] \\
&= \sigma_{x_1}^2 \delta[m-1] + 2\sigma_{x_1}^2 \delta[m] + \sigma_{x_1}^2 \delta[m+1] \\
&= \delta[m-1] + 2\delta[m] + \delta[m+1] .
\end{aligned}
\]
The power spectral density $S_{y_1}(\omega)$ is the Fourier transform of the autocorrelation function $r_{y_1}[m]$,
\[
S_{y_1}(\omega) = \mathcal{F}\{r_{y_1}[m]\} = e^{-j\omega} + 2 + e^{j\omega} = 2 + 2\cos\omega .
\]
Another way of obtaining $S_{y_1}(\omega)$ is by performing all calculations in the $z$ domain:
\[
S_{y_1}(z) = S_{h_1}(z) S_{x_1}(z) = H_1(z) H_1(z^{-1}) \sigma_{x_1}^2 = (1 + z^{-1})(1 + z) = z^{-1} + 2 + z
\]
and then setting $z = e^{j\omega}$ to obtain the Fourier transform:
\[
S_{y_1}(\omega) = e^{-j\omega} + 2 + e^{j\omega} = 2 + 2\cos\omega .
\]
In both cases the output is an LTI filtering of a stationary Gaussian white process, so it is Gaussian and strictly stationary. For the second case, we get
\[
r_{y_2}[m] = -2\delta[m-1] + 4\delta[m] - 2\delta[m+1], \qquad S_{y_2}(\omega) = 4 - 4\cos\omega .
\]
(b) Since $X_1$ and $X_2$ are uncorrelated, $Y_1$ and $Y_2$ are also uncorrelated, and thus
\[
\begin{aligned}
r_y[m] &= E[Y[n] Y[n+m]] \\
&= E[(Y_1[n] + Y_2[n])(Y_1[n+m] + Y_2[n+m])] \\
&= E[Y_1[n]Y_1[n+m]] + E[Y_2[n]Y_2[n+m]] + E[Y_1[n]Y_2[n+m]] + E[Y_2[n]Y_1[n+m]] \\
&= r_{y_1}[m] + r_{y_2}[m] + 0 + 0 \\
&= -\delta[m-1] + 6\delta[m] - \delta[m+1] .
\end{aligned}
\]
And the power spectral density is given by
\[
S_y(\omega) = \mathcal{F}\{r_y[m]\} = -e^{-j\omega} + 6 - e^{j\omega} = 6 - 2\cos\omega,
\]
which is equivalent to $S_{y_1}(\omega) + S_{y_2}(\omega)$.

(c) We have four possible cases:
\[
E[y[k]y[l]] = \begin{cases}
E[y_1[k] y_1[l]] & k, l \text{ even} \\
E[(y_1[k] + y_2[k])(y_1[l] + y_2[l])] & k, l \text{ odd} \\
E[y_1[k](y_1[l] + y_2[l])] & k \text{ even}, l \text{ odd} \\
E[(y_1[k] + y_2[k]) y_1[l]] & k \text{ odd}, l \text{ even}
\end{cases}
\]
which, according to the results above, equals
\[
E[y[k]y[l]] = \begin{cases}
r_{y_1}[k-l] & k, l \text{ even} \\
r_{y_1}[k-l] + r_{y_2}[k-l] & k, l \text{ odd} \\
r_{y_1}[k-l] & k \text{ even}, l \text{ odd} \\
r_{y_1}[k-l] & k \text{ odd}, l \text{ even,}
\end{cases}
\]
which means that the process is not stationary anymore (the correlation depends on the parity of $k$ and $l$, not only on $k-l$).

(d) In this case, the autocorrelation function $r_y[k-l]$ takes into account the random variable that determines the position of the switch. Since the probability is equally distributed, each combination of positions for $k$ and $l$ occurs with probability $\frac{1}{4}$. Hence,
\[
r_y[k-l] = \frac{3}{4} r_{y_1}[k-l] + \frac{1}{4}\big( r_{y_1}[k-l] + r_{y_2}[k-l] \big) = r_{y_1}[k-l] + \frac{1}{4} r_{y_2}[k-l],
\]
which means that the process is, in fact, stationary.
Chapter 2

ARMA Models

Exercise 14. A Simple AR Process

Consider the discrete time stochastic process $\{X[n]\}_{n \geq 0}$ defined by
\[
X[n+1] = aX[n] + W[n+1], \qquad n \geq 0
\]
where $|a| < 1$, $X[0]$ is a Gaussian random variable of mean 0 and variance $c^2$, and $\{W[n]\}_{n \geq 1}$ is a sequence of i.i.d. Gaussian variables of mean 0 and variance $\sigma^2$, independent of $X[0]$.

(a) Express $X[n]$ in terms of $X[0], W[1], \ldots, W[n]$ (and $a$). Give the mean and variance of $X[n]$;

(b) Suppose now that $c^2 = \frac{\sigma^2}{1 - |a|^2}$. Show that with this specific choice for the variance of $X[0]$, the process $\{X[n]\}_{n \geq 0}$ is strictly stationary.

(c) Give the one-step predictor of $X[n]$: $\hat{X}[n|n-1]$.

(d) What is the whitening (or analysis) filter of $\{X[n]\}_n$? What is the generating (or synthesis) filter of $\{X[n]\}_n$?

(e) Give the covariance function $R_X[k] = E[X[n+k]\overline{X[n]}]$.

(f) Write $X[n]$ in terms of $W[n]$, $W[n-1]$ and $X[n-2]$. Deduce from this the two-step predictor of $X[n]$: $\hat{X}[n|n-2]$, the projection of $X[n]$ onto $H(X, n-2)$, the Hilbert subspace spanned by the random variables $X[n-2], X[n-3], \ldots$.

Solution 14. A Simple AR Process

(a) The recursion formula
\[
X[n+1] = aX[n] + W[n+1], \qquad n \geq 0
\]
yields
\[
\begin{aligned}
X[1] &= aX[0] + W[1] \\
X[2] &= aX[1] + W[2] = a^2 X[0] + aW[1] + W[2] \\
X[3] &= aX[2] + W[3] = a^3 X[0] + a^2 W[1] + aW[2] + W[3] \\
&\;\;\vdots \\
X[n] &= a^n X[0] + a^{n-1}W[1] + a^{n-2}W[2] + \cdots + W[n] = a^n X[0] + \sum_{k=0}^{n-1} a^k W[n-k] .
\end{aligned}
\]
Hence the mean of $X[n]$ is given by
\[
E[X[n]] = a^n E[X[0]] + \sum_{k=0}^{n-1} a^k E[W[n-k]] = 0
\]
since both $E[X[0]] = 0$ and $E[W[j]] = 0$, $j \geq 0$. The variance of $X[n]$ is given by
\[
\begin{aligned}
E\big[|X[n]|^2\big] &= E\Bigg[ \Big( a^n X[0] + \sum_{k=0}^{n-1} a^k W[n-k] \Big) \overline{\Big( a^n X[0] + \sum_{j=0}^{n-1} a^j W[n-j] \Big)} \Bigg] \\
&= |a|^{2n} E[|X[0]|^2] + a^n \sum_{j=0}^{n-1} \overline{a^j}\, E[X[0]\overline{W[n-j]}] + \overline{a^n} \sum_{k=0}^{n-1} a^k E[W[n-k]\overline{X[0]}] \\
&\quad + \sum_{k,j=0}^{n-1} a^k \overline{a^j}\, E[W[n-k]\overline{W[n-j]}] .
\end{aligned}
\]
Recall that $E[|X[0]|^2] = c^2$, that the random variables $W[n]$ and $X[0]$ are independent and centered, thus $E[X[0]\overline{W[n-j]}] = E[W[n-k]\overline{X[0]}] = 0$ for $0 \leq j, k \leq n-1$, and that $E[W[n-k]\overline{W[n-j]}] = \sigma^2 \delta[j-k]$. Combining these observations, we have
\[
E\big[|X[n]|^2\big] = |a|^{2n} c^2 + \sigma^2 \sum_{k,j=0}^{n-1} a^k \overline{a^j}\, \delta[j-k]
= |a|^{2n} c^2 + \sigma^2 \sum_{k=0}^{n-1} |a|^{2k}
= |a|^{2n} c^2 + \sigma^2 \, \frac{1 - |a|^{2n}}{1 - |a|^2} .
\]

(b) If $c^2 = \frac{\sigma^2}{1-|a|^2}$, the variance of $X[n]$ is independent of $n$ and is given by
\[
E\big[|X[n]|^2\big] = \frac{\sigma^2}{1 - |a|^2} .
\]
Following the same steps as in part (a), one can show that
\[
X[n+k] = a^k X[n] + a^{k-1} W[n+1] + a^{k-2} W[n+2] + \cdots + W[n+k]. \tag{2.1}
\]
Thus
\[
\begin{pmatrix} X[n] \\ X[n+1] \\ \vdots \\ X[n+k] \end{pmatrix}
= A \begin{pmatrix} X[n] \\ W[n+1] \\ \vdots \\ W[n+k] \end{pmatrix}
\]
for a fixed (lower triangular) matrix $A$. The distribution of $(X[n], W[n+1], \ldots, W[n+k])$ is independent of $n \geq 0$, and therefore the distribution of $(X[n], \ldots, X[n+k])$ is independent of $n \geq 0$; hence $\{X[n]\}_{n \geq 0}$ is strictly stationary.
(c) Recall that
\[
X[n] - aX[n-1] = W[n],
\]
thus $\langle X[n] - aX[n-1], u \rangle = \langle W[n], u \rangle = 0$ for all $u \in H(X, n-1)$, since $H(X, n-1) = H(W, n-1)$ (see Theorem 2.2 in the class notes). Recall that, roughly speaking, $H(W, n-1)$ is composed of linear combinations of $W[n-1], W[n-2], \ldots$. Note also that $aX[n-1] \in H(X, n-1)$ (since it is a linear function of $X[n-1]$); hence by the projection theorem this is the best linear approximation (best in the least squares sense) of $X[n]$, thus $\hat{X}[n|n-1] = aX[n-1]$.

(d) The whitening filter makes $\{X[n]\}_{n \geq 0}$ a white noise; here we have
\[
X[n] - aX[n-1] = W[n],
\]
so it is clear that $P(z) = 1 - az^{-1}$. The generating filter is given by
\[
H^s(z) = \frac{1}{P(z)} = \frac{1}{1 - az^{-1}} = \sum_{n \geq 0} a^n z^{-n}
\]
and
\[
X[n] = W[n] + aW[n-1] + a^2 W[n-2] + \cdots + a^k W[n-k] + \ldots
\]

(e) Using (2.1) we obtain
\[
\begin{aligned}
E[X[n+k]\overline{X[n]}] &= E\Bigg[ \Big( a^k X[n] + \sum_{j=0}^{k-1} a^j W[n+k-j] \Big) \overline{X[n]} \Bigg] \\
&= a^k E[|X[n]|^2] + \sum_{j=0}^{k-1} a^j E[W[n+k-j]\overline{X[n]}] \\
&= a^k E[|X[n]|^2] \\
&= a^k \Big( |a|^{2n} c^2 + \sigma^2 \, \frac{1 - |a|^{2n}}{1 - |a|^2} \Big)
\end{aligned}
\]
where the third equality follows from $E[W[n+k-j]\overline{X[n]}] = 0$, since $W[n+k-j]$ and $X[n]$ are independent for $0 \leq j \leq k-1$, and the last from part (a). Plugging in $c^2 = \frac{\sigma^2}{1-|a|^2}$ yields
\[
R_X[k] = a^k \, \frac{\sigma^2}{1 - |a|^2} .
\]
Note that the above equality, together with the fact that $E[X[n]] = 0$, shows that the process $\{X[n]\}_{n \geq 0}$ is wide-sense stationary under the special condition $c^2 = \frac{\sigma^2}{1-|a|^2}$. As one of your colleagues suggested in the exercise session, this could be an alternative proof for part (b): since the process is wide-sense stationary and Gaussian, it is strictly stationary. Recall that the statistics of a Gaussian process are completely determined by its first and second order properties.

(f) Again from (2.1), we have
\[
X[n] = a^2 X[n-2] + aW[n-1] + W[n],
\]
thus $\langle X[n] - a^2 X[n-2], u \rangle = \langle aW[n-1] + W[n], u \rangle = a\langle W[n-1], u \rangle + \langle W[n], u \rangle = 0$ for all $u \in H(X, n-2)$, since $H(X, n-2) = H(W, n-2)$ and the random variables $W[n-2], W[n-3], \ldots$ are independent of $W[n-1]$ and $W[n]$. Note also that $a^2 X[n-2] \in H(X, n-2)$; hence by the projection theorem this is the best least squares approximation of $X[n]$ knowing $X[n-2], X[n-3], \ldots$, thus $\hat{X}[n|n-2] = a^2 X[n-2]$.
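A quick simulation (a sketch; the parameter values are arbitrary) comparing the empirical covariance of the stationary AR(1) process with $R_X[k] = a^k \sigma^2/(1-a^2)$:

import numpy as np

a, sigma = 0.8, 1.0
n_samples = 200_000
rng = np.random.default_rng(4)

# Start from the stationary distribution: X[0] ~ N(0, sigma^2/(1-a^2))
x = np.empty(n_samples)
x[0] = rng.normal(0.0, sigma / np.sqrt(1 - a**2))
w = rng.normal(0.0, sigma, n_samples)
for n in range(1, n_samples):
    x[n] = a * x[n - 1] + w[n]

for k in range(4):
    emp = np.mean(x[k:] * x[:n_samples - k])   # empirical E[X[n+k]X[n]]
    theo = a**k * sigma**2 / (1 - a**2)
    print(k, round(emp, 3), round(theo, 3))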

Exercise 15. Canonical Representation

Let $\{X[n]\}_n$ be a centered AR signal with power spectral density
\[
S_X(\omega) = \frac{1}{26 - 10\cos\omega} .
\]
(a) Give the canonical representation of it,
\[
P(z) X[n] = W[n]
\]
(give $P(z)$, give the variance $\sigma^2$ of $\{W[n]\}_n$).

Hint: you can use Fejér's identity, i.e., for all $\lambda \in \mathbb{C}$, $\lambda \neq 0$, and for all $z \in \mathbb{C}$ such that $|z| = 1$:
\[
(z - \lambda)(z^{-1} - \bar{\lambda}) = |z - \lambda|^2 .
\]
(b) Give the one-step predictor of $X[n]$: $\hat{X}[n|n-1]$.

(c) What is the whitening (or analysis) filter of $\{X[n]\}_n$? What is the generating (or synthesis) filter of $\{X[n]\}_n$?

(d) Give the covariance function $R_X[k] = E[X[n+k]\overline{X[n]}]$.

(e) Write $X[n]$ in terms of $W[n]$, $W[n-1]$ and $X[n-2]$. Deduce from this the two-step predictor of $X[n]$: $\hat{X}[n|n-2]$, the projection of $X[n]$ onto $H(X, n-2)$, the Hilbert subspace spanned by the random variables $X[n-2], X[n-3], \ldots$.

Solution 15. Canonical Representation

(a)
\[
\begin{aligned}
S_X(\omega) &= \frac{1}{26 - 10\cos\omega} = \frac{1}{26 - 5e^{j\omega} - 5e^{-j\omega}} \\
&= \frac{1}{(5e^{j\omega} - 1)(5e^{-j\omega} - 1)}
= \frac{1}{5e^{j\omega}(1 - \frac{1}{5}e^{-j\omega}) \, 5e^{-j\omega}(1 - \frac{1}{5}e^{j\omega})} \\
&= \frac{1}{25 (1 - \frac{1}{5}e^{-j\omega})(1 - \frac{1}{5}e^{j\omega})} .
\end{aligned}
\]
Recalling that
\[
S_X(\omega) = \frac{\sigma^2}{|P(e^{j\omega})|^2},
\]
one can choose
\[
P(z) = 1 - \frac{1}{5} z^{-1}
\]
and $\sigma^2 = \frac{1}{25}$. Note that with this choice $P(z)$ is minimum phase, that is, it is stable, causal, and all its zeros lie inside the unit circle.

(b) With the above choice for $P(z)$, we have the following recursion for $X[n]$:
\[
X[n] - \frac{1}{5} X[n-1] = W[n] \tag{2.2}
\]
where $\{W[n]\}_n$ is a white noise sequence with variance $\sigma^2 = \frac{1}{25}$. The exact reasoning of Solution 14, part (c), yields $\hat{X}[n|n-1] = \frac{1}{5} X[n-1]$.

(c) The whitening filter is
\[
P(z) = 1 - \frac{1}{5} z^{-1}
\]
and the generating filter is
\[
H^s(z) = \frac{1}{P(z)} = \frac{1}{1 - \frac{1}{5} z^{-1}} = \sum_{n \geq 0} \Big(\frac{1}{5}\Big)^n z^{-n} .
\]
(d) Using the above expression for the generating filter yields
\[
X[n] = W[n] + \frac{1}{5} W[n-1] + \cdots + \Big(\frac{1}{5}\Big)^k W[n-k] + \ldots
\]
and
\[
X[n+k] = W[n+k] + \cdots + \Big(\frac{1}{5}\Big)^k W[n] + \Big(\frac{1}{5}\Big)^{k+1} W[n-1] + \ldots
\]
Since $E[W[i]\overline{W[j]}] = \sigma^2 \delta_{ij}$, the only non-zero terms in $E[X[n+k]\overline{X[n]}]$ are those with $i = j$. Therefore,
\[
R_X[k] = E[X[n+k]\overline{X[n]}]
= \Big(\frac{1}{5}\Big)^k \sigma^2 \Bigg( 1 + \Big(\frac{1}{5}\Big)^2 + \Big(\frac{1}{5}\Big)^4 + \ldots \Bigg)
= \Big(\frac{1}{5}\Big)^k \frac{1}{25} \cdot \frac{25}{24}
= \frac{(1/5)^k}{24} .
\]

(e) The recursion in (2.2) yields
\[
X[n] = \frac{1}{5} X[n-1] + W[n] = \Big(\frac{1}{5}\Big)^2 X[n-2] + \frac{1}{5} W[n-1] + W[n] .
\]
Then $\hat{X}[n|n-2] = \big(\frac{1}{5}\big)^2 X[n-2]$, since $\big(\frac{1}{5}\big)^2 X[n-2] \in H(X, n-2)$ (it is a linear function of $X[n-2]$) and $X[n] - \big(\frac{1}{5}\big)^2 X[n-2] = \frac{1}{5} W[n-1] + W[n] \perp H(X, n-2)$ (indeed, $\frac{1}{5}W[n-1] + W[n] \perp H(W, n-2)$, and $H(W, n-2) = H(X, n-2)$ for the canonical representation; see Theorem 2.2 in the class notes).

Exercise 16.

This exercise has a more theoretical flavor (no long computation).

We consider a voice synthesizer based on filtering of a white noise, $H(z)W[n] = X[n]$, where the input $W[n]$ is a real Gaussian white noise, centered, with correlation $R_W[n] = \lambda\delta[n]$ ($\lambda > 0$),
\[
H(z) = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2}}
\]
is a minimum phase filter with real coefficients $a_1$ and $a_2$, and $X[n]$ is the process describing the synthesized voice.

(a) Is $X[n]$ a wide-sense stationary process? Justify your answer precisely.

We are now interested in computing the second order properties of $X[n]$.

(b) Compute the power spectral density of $X[n]$, $S_X(\omega)$.

(c) Compute the mean of $X[n]$ and express the variance of $X[n]$ as a function of $S_X(\omega)$.

(d) Give the recursive expression of the correlation of $X[n]$.

(e) Using the recursive expression of the correlation, write a system of linear equations that allows one to obtain the variance of $X[n]$ as a function of $a_1$, $a_2$ and $\lambda$.
Hint: Exploit the fact that $X[n]$ is real.

In real life, the synthesizer is not perfect and its defects can be modelled as an additive white noise $V[n]$ with variance $\sigma_V^2$. Finally, the signal we obtain is given by
\[
Y[n] = X[n] + V[n]
\]
where $X[n]$ and $V[n]$ are supposed to be independent. In particular, if we call $H(X)$ the Hilbert space spanned by $X[n]$ and $H(V)$ the Hilbert space spanned by $V[n]$, the independence of $X$ and $V$ implies the orthogonality of the two spaces $H(X)$ and $H(V)$. Call now $H(W)$ the Hilbert space spanned by $W[n]$.

(f) Are the two spaces $H(V)$ and $H(W)$ orthogonal? Justify your answer precisely.

By listening to the noisy synthesized voice, we would like to estimate the characteristics of its generating system. More precisely:

A) From the process $Y[n]$ we would like to recover the original synthesized voice $X[n]$.

(g) Give the transfer function of a filter to optimally estimate, in the mean square sense, $X[n]$ from $Y[n]$, and express such a transfer function in terms of $a_1$, $a_2$, $\lambda$, and $\sigma_V^2$.

B) From the estimate of $X[n]$ we would like to recover the coefficients $a_1$ and $a_2$ of the transfer function $H(z)$ and the parameter $\lambda$ of the white noise $W[n]$.

(h) Write the system of linear equations to obtain $a_1$, $a_2$, and $\lambda$ from the correlation of $X[n]$.

Solution 16.

The first part of the exercise can be straightforwardly solved by applying the fundamental filtering formula for wide-sense stationary processes. We recall that this formula requires the input process to be wide-sense stationary with summable correlation and the filter to be stable. The input process $W[n]$ is a white noise, and it is very easy to check that it is wide-sense stationary with summable correlation. The filter is stable since, by assumption, it is minimum phase, and it is time invariant, since by construction its coefficients are constant.

We remark that, in general, we cannot directly prove properties of $X[n]$ from the fact that $X[n]$ is a linear combination of $W[n]$. The key point is the stability of the linear combination, i.e., the stability of the filter. If the filter is not stable or not time invariant, $X[n]$ is still a linear combination of $W[n]$, but $X[n]$ is no longer guaranteed to be wide-sense stationary, its mean is no longer guaranteed to be zero, and the Hilbert space it spans is no longer guaranteed to be equal to the space spanned by $W[n]$. We also remark that the exercise clearly asked for precise answers.

1) Yes, $X[n]$ is wide-sense stationary by the fundamental filtering formula for wide-sense stationary processes.

2) By the fundamental filtering formula,
\[
S_X(\omega) = |H(\omega)|^2 S_W(\omega) = \frac{\lambda}{(1 + a_1 e^{-j\omega} + a_2 e^{-2j\omega})(1 + a_1 e^{j\omega} + a_2 e^{2j\omega})} . \tag{2.3}
\]
Note that $S_W(\omega) = \lambda$ since $W[n]$ is a white noise.
3) Again, by the fundamental filtering formula,
\[
E[X] = E[W] \sum_{k=-\infty}^{\infty} h[k] .
\]
$W[n]$ is centered, that is $E[W] = 0$, and $\sum_k |h[k]| < \infty$ since $H(z)$ is stable; thus $E[X] = 0$.

Alternatively, one can study the specific structure of the problem. In symbolic notation, the filtering by $H(z)$ can be expressed as
\[
X[n] = H(z) W[n] = \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2}} W[n],
\]
which can be equivalently written as
\[
W[n] = (1 + a_1 z^{-1} + a_2 z^{-2}) X[n] .
\]
Interpreting $z^{-1}$ as the delay operator, we have
\[
W[n] = X[n] + a_1 X[n-1] + a_2 X[n-2], \tag{2.4}
\]
which also reveals the auto-regressive structure of the process. Taking expectations on both sides of the above equality gives
\[
E[W[n]] = E[X[n]] + a_1 E[X[n-1]] + a_2 E[X[n-2]], \qquad 0 = E[X](1 + a_1 + a_2) .
\]
Thus $E[X] = 0$ (the minimum phase assumption guarantees $1 + a_1 + a_2 \neq 0$). We remark that the right side of equation (2.4) is a finite linear combination of $X[n]$; therefore the expectation is just the sum of the expectations of each term.

The filter $H(z)$ being stable and the correlation of the input process $R_W[n] = \lambda\delta[n]$ being summable, the correlation of the output process $R_X[n]$ is also summable. Hence we have the following relation for the variance of $X[n]$:
\[
\mathrm{Var}(X) = R_X[0] = \frac{1}{2\pi} \int_{-\pi}^{\pi} S_X(\omega) \, d\omega .
\]
The fact that $\mathrm{Var}(X) = R_X[0]$ is because $X[n]$ is centered.
4) The correlation structure of $X[n]$ can be obtained either by using the formulas derived in class for AR processes (after observing that the process is an AR process, as we already did in part 3), or by doing the computation explicitly. Using the formula for AR processes directly yields
\[
R_X[m] + a_1 R_X[m-1] + a_2 R_X[m-2] = \lambda \delta[m], \qquad m \geq 0, \tag{2.5}
\]
and $R_X[-m] = R_X[m]$ since $X[n]$ is real.

5) Considering the recursive expression (2.5) for $m = 0, 1, 2$, together with the fact that $R_X[-m] = R_X[m]$ (since $X[n]$ is real), yields the following system of linear equations:
\[
\begin{cases}
R_X[0] + a_1 R_X[1] + a_2 R_X[2] = \lambda \\
R_X[1] + a_1 R_X[0] + a_2 R_X[1] = 0 \\
R_X[2] + a_1 R_X[1] + a_2 R_X[0] = 0 . \tag{2.6}
\end{cases}
\]
The three equations can easily be solved to obtain the three unknowns $R_X[0]$, $R_X[1]$ and $R_X[2]$. Recalling that $\mathrm{Var}(X) = R_X[0]$, the solution of the system gives us the variance.
6) Due to the stability and time invariance of the filter, we know from the class notes (see the section "A Hilbert Space Viewpoint of Linear Prediction" of the ARMA chapter; in the current notes it is Theorem 2.2 of Chapter 2) that
\[
H(X) = H(W)
\]
(we remark that the result was presented for the Hilbert spaces spanned by the past, i.e., $H(X, n)$, but it extends straightforwardly to the Hilbert spaces spanned by the whole processes). Since we know that $H(X)$ and $H(V)$ are orthogonal, $H(W)$ and $H(V)$ are also orthogonal.

Alternatively, from the fact that $W[n]$ is a finite linear combination of $X[n]$, $X[n-1]$ and $X[n-2]$ (see equation (2.4)), we have $H(W) \subseteq H(X)$. Consequently, the orthogonality between $H(X)$ and $H(V)$ implies the orthogonality between $H(W)$ and $H(V)$.
7) The filter that optimally estimates $X[n]$ from $Y[n]$, optimally in the sense that it minimizes the mean squared error, is the Wiener filter, whose transfer function is given by
\[
F(e^{j\omega}) = \frac{S_{XY}(\omega)}{S_Y(\omega)} . \tag{2.7}
\]
Recall that in our case
\[
Y[n] = X[n] + V[n]
\]
and $X[n]$ and $V[n]$ are independent. The cross-correlation of $X[n]$ and $Y[n]$ is given by
\[
\begin{aligned}
R_{XY}[m] &= E[X[n+m]Y[n]] = E[X[n+m](X[n] + V[n])] \\
&= E[X[n+m]X[n]] + E[X[n+m]V[n]] = R_X[m]
\end{aligned}
\]
since $X[n]$ and $V[n]$ are independent (and centered). Taking the Fourier transform and using the expression for $S_X(\omega)$ given in (2.3), we obtain
\[
S_{XY}(\omega) = S_X(\omega) = \frac{\lambda}{(1 + a_1 e^{-j\omega} + a_2 e^{-2j\omega})(1 + a_1 e^{j\omega} + a_2 e^{2j\omega})} . \tag{2.8}
\]
The auto-correlation of $Y[n]$ is given by
\[
\begin{aligned}
R_Y[m] &= E[Y[n+m]Y[n]] = E[(X[n+m] + V[n+m])(X[n] + V[n])] \\
&= E[X[n+m]X[n]] + E[X[n+m]V[n]] + E[V[n+m]X[n]] + E[V[n+m]V[n]] \\
&= R_X[m] + R_V[m] .
\end{aligned}
\]
Taking the Fourier transform, we obtain
\[
S_Y(\omega) = S_X(\omega) + S_V(\omega) = \frac{\lambda}{(1 + a_1 e^{-j\omega} + a_2 e^{-2j\omega})(1 + a_1 e^{j\omega} + a_2 e^{2j\omega})} + \sigma_V^2 . \tag{2.9}
\]
Note that $S_V(\omega) = \sigma_V^2$ since $V[n]$ is white. Combining (2.7), (2.8) and (2.9), we obtain
\[
F(e^{j\omega}) = \frac{\lambda}{\lambda + \sigma_V^2 (1 + a_1 e^{-j\omega} + a_2 e^{-2j\omega})(1 + a_1 e^{j\omega} + a_2 e^{2j\omega})} .
\]
8) Just observe that the system of linear equations in (2.6) can be solved for $a_1$, $a_2$ and $\lambda$ if the correlation of $X[n]$, that is $R_X[0]$, $R_X[1]$ and $R_X[2]$, is known. Thus the desired system of equations is the one given in (2.6). Note that we referred to this system of linear equations as the Yule-Walker equations in class.
Chapter 3

Prediction and Estimation in the General Non-ARMA Case

Exercise 17.

The process $X[n]$ is a real AR process of order $M$.

1) Write the recursion that allows one to synthesize the process $X[n]$ from a white noise process $W[n]$.

2) Write the correlation structure of $X$.

Suppose that the order $M$ is unknown but an estimate of the correlation function $R_X[n]$ is available for $n \geq 0$.

3) Describe precisely a procedure to determine the order $M$, the parameters of the AR model and the variance of the input noise $W$.

4) What is the complexity of the algorithm proposed in the previous question? Can you suggest an algorithm that does the same thing using a number of operations on the order of $M^2$? (Details are not needed.)

5) What is the expression of the power spectral density of $X[n]$, $S_X(\omega)$ (as a function of the parameters and the order $M$)?

Solution 17.

1) The recursion that allows one to synthesize the process $X[n]$ from a white noise process $W[n]$ is:
\[
X[n] = a_1 X[n-1] + \ldots + a_M X[n-M] + W[n] .
\]
2) The correlation structure of $X$ is
\[
R_X[n] = a_1 R_X[n-1] + \ldots + a_M R_X[n-M] + \delta[n]\sigma_W^2 .
\]
3) According to the Yule-Walker equations, compute the mean square error $\|\epsilon_m\|_2^2$ of the linear prediction for increasing orders $m$. This error strictly decreases as the order increases up to $M$ and is constant from then on. This is how you compute the order $M$ of the AR process. Then you can solve the Yule-Walker equations of order $M$ to get the parameters of the AR model. The mean square prediction error of the predictor of order $M$ corresponds to the variance of the input noise $W$. A sketch of this procedure is given after this solution.

4) In order to compute the mean square error of the predictor of order $m$, we need to compute the determinant of an $(m+1) \times (m+1)$ matrix. This computation is of complexity $m^2$. We repeat this operation from $m = 1$ to $m = M$; at this point the complexity already reaches $M^3$. Computing the AR coefficients requires the inversion of an $M \times M$ matrix, which demands on the order of $M^2$ operations. In the end, the proposed algorithm uses a number of operations of order $M^3$.

If you know that $M \leq K$, you can use Levinson's algorithm to obtain all the coefficients of order less than $K$ with on the order of $K^2$ operations. You obtain the set of coefficients of order 1 first, then of order 2, ..., until orders $M$ and $M+1$. At order $M+1$ you realize that the sets of coefficients at order $M$ and at order $M+1$ are the same, and you can deduce that the order of the AR process is $M$. Consequently it is cleverer to stop the algorithm at this point and not run it until $K$. This algorithm requires on the order of $M^2$ operations!

5)
\[
H(z) = \frac{1}{1 - a_1 z^{-1} - \ldots - a_M z^{-M}} .
\]
Then $X(z) = H(z)W(z)$. Using the fundamental filtering formula, we obtain
\[
S_X(\omega) = |H(e^{j\omega})|^2 \sigma_W^2 .
\]
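A sketch of the order-detection procedure from parts 3) and 4), using a standard Levinson-Durbin recursion (the implementation and variable names are ours):

import numpy as np

def levinson_durbin(r, max_order):
    # Levinson-Durbin recursion on autocorrelation values r[0..max_order].
    # Returns the prediction-error variance for each order 0..max_order.
    a = np.zeros(max_order + 1)
    err = r[0]
    errors = [err]
    for m in range(1, max_order + 1):
        k = (r[m] - a[1:m] @ r[m-1:0:-1]) / err   # reflection coefficient
        a[1:m] = a[1:m] - k * a[m-1:0:-1]
        a[m] = k
        err *= (1 - k**2)
        errors.append(err)
    return errors

# Example: AR(2) process X[n] = 0.5 X[n-1] - 0.3 X[n-2] + W[n]
rng = np.random.default_rng(5)
x = np.zeros(100_000)
w = rng.standard_normal(x.size)
for n in range(2, x.size):
    x[n] = 0.5 * x[n-1] - 0.3 * x[n-2] + w[n]
r = np.array([np.mean(x[k:] * x[:x.size - k]) for k in range(6)])
print(np.round(levinson_durbin(r, 5), 3))   # drops until order 2, then ~flat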

Exercise 18.

The process $X[n]$ is a real AR process of order 2:
\[
X[n] = \frac{1}{4} X[n-1] + \frac{1}{8} X[n-2] + W[n],
\]
where $W[n]$ is a white noise. Using the Yule-Walker equations, give the best linear predictor of order 1 of $X[n]$. Compare the obtained coefficient with $1/4$ and comment.

Extra training: Same exercise with $X[n] = 0.3X[n-1] + 0.7X[n-2] - 0.154X[n-3] + W[n]$, and a linear predictor of order 2. This exercise will make you handle the correlation matrix.

Solution 18.

We are looking for $a$ such that
\[
\hat{X}[n] = aX[n-1]
\]
minimizes
\[
\|\epsilon\|_2^2 = E[|X[n] - \hat{X}[n]|^2] .
\]
According to the Yule-Walker equations, $a = R_X[1]/R_X[0]$. Now,
\[
\begin{aligned}
R_X[1] &= E[X[n]X[n-1]] \\
&= \frac{1}{4} E[X[n-1]X[n-1]] + \frac{1}{8} E[X[n-2]X[n-1]] + E[W X[n-1]] \\
&= \frac{1}{4} R_X[0] + \frac{1}{8} R_X[1] .
\end{aligned}
\]
We conclude that $a = 2/7$. Now, $2/7$ is slightly larger than $1/4$. This can be explained by the positive covariance between $X[n-1]$ and $X[n-2]$. So this predictor says:
\[
\hat{X}[n] = \frac{1}{4} X[n-1] + \frac{1}{28} X[n-1] ;
\]
the term $\frac{1}{28} X[n-1]$ predicts the contribution of $\frac{1}{8} X[n-2]$.

For the extra question,
\[
a_1 \approx 0.196, \qquad a_2 \approx 0.670 .
\]
An important thing to know to solve this exercise is:
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} .
\]
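A sketch verifying both predictors numerically (solving for the normalized correlations from the AR recursion, then the Yule-Walker system):

import numpy as np

# Order-1 predictor for X[n] = (1/4)X[n-1] + (1/8)X[n-2] + W[n]:
# R_X[1] = (1/4)R_X[0] + (1/8)R_X[1]  =>  a = R_X[1]/R_X[0] = 2/7
print((1/4) / (1 - 1/8), 2/7)

# Extra training: X[n] = 0.3X[n-1] + 0.7X[n-2] - 0.154X[n-3] + W[n].
# Normalized correlations r1, r2 satisfy
#   r1 = 0.3 + 0.7 r1 - 0.154 r2
#   r2 = 0.3 r1 + 0.7 - 0.154 r1
A = np.array([[1 - 0.7, 0.154],
              [-0.3 + 0.154, 1.0]])
r1, r2 = np.linalg.solve(A, np.array([0.3, 0.7]))

# Order-2 Yule-Walker system: [[1, r1], [r1, 1]] [a1, a2]^T = [r1, r2]^T
a1, a2 = np.linalg.solve(np.array([[1, r1], [r1, 1]]),
                         np.array([r1, r2]))
print(round(a1, 3), round(a2, 3))   # ~0.196, ~0.670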

Exercise 19.

In this exercise, we consider the application of the Wiener filter to reducing additive noise. Consider a signal $X[n]$ embedded in additive zero mean white Gaussian noise. That is,
\[
Y[n] = X[n] + W[n] .
\]
Assume that $X[n]$ and $W[n]$ are uncorrelated.

(a) Derive the transfer function of an optimal non-causal filter.

(b) We define the following signal to noise ratio:
\[
a(\omega) = \frac{S_X(e^{j\omega})}{S_W(e^{j\omega})} .
\]
How does the Wiener filter respond in the case of noise-free frequencies, i.e., $a(\omega_0) \gg 1$? And in the case of very high noise, i.e., $a(\omega_0) \approx 0$? What can you conclude?

Solution 19.

(a) The expression for the Wiener filter is
\[
H(e^{j\omega}) = \frac{S_{XY}(\omega)}{S_Y(\omega)} .
\]
If $Y[n] = X[n] + W[n]$, and $X[n]$ and $W[n]$ are uncorrelated with $W[n]$ zero-mean, we have
\[
\begin{aligned}
R_{XY}[m] &= E[X[n+m](X[n] + W[n])] = E[X[n+m]X[n]] = R_X[m] \\
R_Y[m] &= E[(X[n+m] + W[n+m])(X[n] + W[n])] = R_X[m] + R_W[m],
\end{aligned}
\]
hence
\[
S_{XY}(\omega) = S_X(\omega), \qquad S_Y(\omega) = S_X(\omega) + S_W(\omega) .
\]
The Wiener filter is thus
\[
H(e^{j\omega}) = \frac{S_X(\omega)}{S_X(\omega) + S_W(\omega)} .
\]
(b)
\[
H(e^{j\omega}) = \frac{S_X(\omega)}{S_X(\omega) + S_W(\omega)} = \frac{a(\omega)}{a(\omega) + 1} .
\]
If $a(\omega_0) \gg 1$, $H(e^{j\omega_0}) \approx 1$. That is, the filter applies little or no attenuation to the noise-free frequency component. If $a(\omega_0) \approx 0$, $H(e^{j\omega_0}) \approx 0$. That is, the filter applies a high attenuation to the noisy frequency component. In conclusion, for additive noise, the Wiener filter attenuates each frequency component in proportion to an estimate of the signal to noise ratio.

Exercise 20.

Suppose that $X[n]$ is zero mean white Gaussian noise with variance $\sigma_X^2 = 1$. A desired response $D[n]$ is obtained by applying $X[n]$ to a linear filter $h[n]$. Our task is to design a linear filter $g[n]$ that minimizes the cost function $J$ given by
\[
J = E\big[E^2[n]\big] = E\big[(D[n] - Y[n])^2\big]. \tag{3.1}
\]
Suppose that $h[n]$ is a 3-tap FIR filter given by $[h_0, h_1, h_2]$. We want to determine a 2-tap optimum Wiener filter, which minimizes the cost function $J$.

(a) Compute the cross correlation vector $R_{DX}[m]$.

(b) Determine an optimal 2-tap filter $g[n]$.

(c) Repeat (a) and (b) for the case when $h[n]$ is an IIR filter with transfer function given by
\[
H(z) = \frac{1}{1 - az^{-1}}. \tag{3.2}
\]

Solution 20.

(a) The cross correlation vector $R_{DX}[m]$ is defined as:
\[
\begin{aligned}
R_{DX}[m] &= E[D[n]X[n-m]] \\
&= E[(h_0 X[n] + h_1 X[n-1] + h_2 X[n-2]) X[n-m]] \\
&= h_0 R_X[m] + h_1 R_X[m-1] + h_2 R_X[m-2] \\
&= h_0 \delta[m] + h_1 \delta[m-1] + h_2 \delta[m-2],
\end{aligned}
\]
where we have used the fact that $X[n]$ is a white noise with variance $\sigma_X^2 = 1$. Hence we find that
\[
R_{DX}[m] = \begin{cases}
h_0 & \text{if } m = 0 \\
h_1 & \text{if } m = 1 \\
h_2 & \text{if } m = 2 \\
0 & \text{otherwise.}
\end{cases}
\]
(b) The optimal filter that estimates $D[n]$ from $X[n]$ is given by the Wiener filter formula:
\[
\sum_k g[k] R_X[m-k] = R_{DX}[m] .
\]
Recall from the derivation of the formula that the equation above is obtained by differentiating with respect to $g[m]$ (say, the $m$-th tap of the Wiener filter in the context of this exercise). Note that in this exercise we are restricted to a two-tap filter whose tap gains we would like to choose optimally. Thus, in order for the first tap gain $g[0]$ to be optimal, the filter should satisfy ($m = 0$)
\[
g[0] R_X[0] + g[1] R_X[-1] = R_{DX}[0] \implies g[0] = h_0,
\]
and the optimality condition for the second tap gain ($m = 1$) yields
\[
g[0] R_X[1] + g[1] R_X[0] = R_{DX}[1] \implies g[1] = h_1,
\]
since $R_X[m] = \delta[m]$.

(c) To calculate $R_{DX}[m]$ we use the expansion
\[
\frac{1}{1 - az^{-1}} = 1 + az^{-1} + a^2 z^{-2} + \ldots + a^n z^{-n} + \ldots
\]
Therefore
\[
D[n] = \sum_{k=0}^{\infty} a^k X[n-k] .
\]
The cross correlation vector $R_{DX}[m]$ is then
\[
R_{DX}[m] = E\Bigg[ \sum_{k=0}^{\infty} a^k X[n-k] \, X[n-m] \Bigg] = \sum_{k=0}^{\infty} a^k \delta[m-k]
= \begin{cases} a^m & \text{if } m \geq 0 \\ 0 & \text{otherwise.} \end{cases}
\]
And by the same arguments as in part (b), one can easily find $g[0] = 1$ and $g[1] = a$.
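A sketch checking part (b) numerically, by estimating the correlations from data and solving the $2 \times 2$ normal equations (the filter values are arbitrary):

import numpy as np

h = np.array([0.9, -0.4, 0.2])        # arbitrary 3-tap filter [h0, h1, h2]
rng = np.random.default_rng(6)
x = rng.standard_normal(200_000)      # white noise, variance 1
d = np.convolve(x, h)[:x.size]        # desired response D[n]

Rx = lambda m: np.mean(x[abs(m):] * x[:x.size - abs(m)])   # E[X[n]X[n-m]]
Rdx = lambda m: np.mean(d[m:] * x[:x.size - m])            # E[D[n]X[n-m]]

G = np.array([[Rx(0), Rx(1)], [Rx(1), Rx(0)]])
g = np.linalg.solve(G, np.array([Rdx(0), Rdx(1)]))
print(np.round(g, 2))                 # ~[h0, h1] = [0.9, -0.4]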

Exercise 21.

Suppose that a desired process $X[n]$ is generated by filtering the white Gaussian noise $W[n]$ (centered, with variance 1) using a filter $h[k]$, where
\[
H(z) = \frac{1 + \frac{3}{4} z^{-1}}{1 + \frac{1}{2} z^{-1}}. \tag{3.3}
\]
Consider now the signal $Y[n] = X[n] + V[n]$, where $V[n]$ is zero mean white Gaussian noise with variance $1/2$ and uncorrelated with $W[n]$.

(a) Design a Wiener filter for estimating $X[n]$ from $Y[n]$.

(b) Repeat (a) for the case when $V[n]$ is a random variable given by:
\[
V[n] = \frac{1}{3} X[n-1] - \frac{1}{9} X[n-2] .
\]

Solution 21.

(a) By filtering the unit-variance white noise with a filter $H(e^{j\omega})$, we get an output signal $X[n]$ with power spectrum
\[
S_X(\omega) = |H(e^{j\omega})|^2 .
\]
The Wiener filter that estimates $X[n]$ from $Y[n]$ was obtained in Exercise 19:
\[
Q(e^{j\omega}) = \frac{S_X(\omega)}{S_X(\omega) + S_V(\omega)} = \frac{|H(e^{j\omega})|^2}{|H(e^{j\omega})|^2 + (1/2)} .
\]
(b) In this case, the signal $V[n]$ is not independent of the signal $X[n]$. Therefore, we cannot apply the formula derived in Exercise 19. We calculate the Wiener filter expression using the general formula, that is:
\[
Q(e^{j\omega}) = \frac{S_{XY}(\omega)}{S_Y(\omega)} .
\]
The cross-correlation is
\[
\begin{aligned}
R_{XY}[m] &= E[X[n](X[n-m] + V[n-m])] \\
&= E[X[n]X[n-m]] + E\Big[ X[n] \Big( \frac{1}{3}X[n-m-1] - \frac{1}{9}X[n-m-2] \Big) \Big] \\
&= R_X[m] + \frac{1}{3} R_X[m+1] - \frac{1}{9} R_X[m+2] .
\end{aligned}
\]
By taking the Fourier transform,
\[
S_{XY}(\omega) = S_X(\omega) \Big( 1 + \frac{1}{3} e^{j\omega} - \frac{1}{9} e^{2j\omega} \Big) .
\]
Similarly,
\[
\begin{aligned}
R_Y[m] &= E[(X[n] + V[n])(X[n-m] + V[n-m])] \\
&= E[X[n]X[n-m]] + E\Big[ X[n] \Big( \frac{1}{3}X[n-m-1] - \frac{1}{9}X[n-m-2] \Big) \Big] \\
&\quad + E\Big[ \Big( \frac{1}{3}X[n-1] - \frac{1}{9}X[n-2] \Big) X[n-m] \Big] \\
&\quad + E\Big[ \Big( \frac{1}{3}X[n-1] - \frac{1}{9}X[n-2] \Big) \Big( \frac{1}{3}X[n-m-1] - \frac{1}{9}X[n-m-2] \Big) \Big] \\
&= \frac{91}{81} R_X[m] + \frac{8}{27} R_X[m-1] - \frac{1}{9} R_X[m-2] + \frac{8}{27} R_X[m+1] - \frac{1}{9} R_X[m+2] .
\end{aligned}
\]
The Fourier transform is then
\[
\begin{aligned}
S_Y(\omega) &= S_X(\omega) \Big( \frac{91}{81} + \frac{1}{27}\big( 8e^{-j\omega} - 3e^{-2j\omega} + 8e^{j\omega} - 3e^{2j\omega} \big) \Big) \\
&= S_X(\omega) \Big( 1 + \frac{1}{3} e^{j\omega} - \frac{1}{9} e^{2j\omega} \Big) \Big( 1 + \frac{1}{3} e^{-j\omega} - \frac{1}{9} e^{-2j\omega} \Big) .
\end{aligned}
\]
And the Wiener filter is given by:
\[
Q(e^{j\omega}) = \frac{1}{1 + \frac{1}{3} e^{-j\omega} - \frac{1}{9} e^{-2j\omega}} .
\]

Exercise 22.

The process $X[n]$ is a real AR process:
\[
X[n] = 0.3X[n-1] - 0.4X[n-2] + 0.5X[n-3] + W[n],
\]
where $W[n]$ is a white noise.

(a) What is the order of the above AR process?

(b) Using the Yule-Walker equations, give the best linear predictor of order 2 of $X[n]$, i.e. find $a$ and $b$ in $\hat{X}[n] = aX[n-1] + bX[n-2]$ such that $\|\epsilon\|_2^2 = E[|X[n] - \hat{X}[n]|^2]$ is minimized.

Solution 22.

(a) The AR process is of order 3.

(b) According to the Yule-Walker equations,
\[
\begin{pmatrix} R_X[0] & R_X[1] \\ R_X[1] & R_X[0] \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix} =
\begin{pmatrix} R_X[1] \\ R_X[2] \end{pmatrix} .
\]
Knowing that
\[
\begin{aligned}
R_X[1] &= E[X[n]X[n-1]] \\
&= 0.3E[X[n-1]X[n-1]] - 0.4E[X[n-2]X[n-1]] + 0.5E[X[n-3]X[n-1]] + E[WX[n-1]] \\
&= 0.3R_X[0] - 0.4R_X[1] + 0.5R_X[2],
\end{aligned}
\]
and
\[
\begin{aligned}
R_X[2] &= E[X[n]X[n-2]] \\
&= 0.3E[X[n-1]X[n-2]] - 0.4E[X[n-2]X[n-2]] + 0.5E[X[n-3]X[n-2]] + E[WX[n-2]] \\
&= 0.3R_X[1] - 0.4R_X[0] + 0.5R_X[1],
\end{aligned}
\]
one obtains
\[
a \approx 0.133, \qquad b \approx -0.333 .
\]
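A sketch verifying these values numerically (same approach as for Exercise 18):

import numpy as np

# Normalized correlations r1 = R_X[1]/R_X[0], r2 = R_X[2]/R_X[0] from
#   r1 = 0.3 - 0.4 r1 + 0.5 r2
#   r2 = 0.3 r1 - 0.4 + 0.5 r1
A = np.array([[1.4, -0.5],
              [-0.8, 1.0]])
r1, r2 = np.linalg.solve(A, np.array([0.3, -0.4]))

# Order-2 Yule-Walker system
a, b = np.linalg.solve(np.array([[1, r1], [r1, 1]]),
                       np.array([r1, r2]))
print(round(a, 3), round(b, 3))   # ~0.133, ~-0.333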
Chapter 4

Adaptive Signal Processing

Exercise 23.

Consider the following diagram:

    V0[n] --[H0]--> D[n] ----(+)--> X0[n]
                             ^
    V1[n] --o "1"            |
             SW ---> S[n] ---+
    V2[n] --o "2"            |
                             v
    V3[n] --[H1]-----------(+)--> X1[n]

where $H_0$ and $H_1$ are causal filters with the transfer functions
\[
H_0(z) = 1 + z^{-1}, \qquad H_1(z) = 1 - z^{-1} .
\]
$V_0$, $V_1$, $V_2$ and $V_3$ are white stationary processes, uncorrelated, jointly Gaussian with zero mean and variances
\[
\sigma_{V_0}^2 = 1, \quad \sigma_{V_1}^2 = 1, \quad \sigma_{V_2}^2 = 2, \quad \sigma_{V_3}^2 = 1 .
\]
The switch SW is in position "1" when the time index $n$ is even and "2" when $n$ is odd.

(a) Is $X_0[n]$ a Gaussian process? Is $X_0[n]$ a wide-sense stationary process? Compute the correlation of $X_0[n]$.

(b) Determine the optimal filter that estimates the process $D[n]$, i.e. $\hat{D}[n]$, given the observation of the two samples $X_0[n]$ and $X_0[n-1]$.

(c) Determine the optimal filter that estimates the process $S[n]$, i.e. $\hat{S}[n]$, given the observation of the two samples $X_1[n]$ and $X_1[n-1]$. How can we use such an estimator to determine an estimate of $D[n]$? Compare this solution with the one obtained in question (b); which one would you prefer?

(d) Consider the output of the system as a vector $X[n] = [X_0[n]\; X_1[n]]^T$ and give the expression for the optimal estimator of $D[n]$ based on the two observations $X[n]$ and $X[n-1]$.

Hint: Write the estimator as
\[
\hat{D}[n] = f_{n,0}^T X[n] + f_{n,1}^T X[n-1] .
\]
Recall:
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - cb} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} .
\]

Solution 23.

(a) We have:
\[
X_0[n] = V_0[n] + V_0[n-1] + S[n] =
\begin{cases}
V_0[n] + V_0[n-1] + V_1[n], & n \text{ even} \\
V_0[n] + V_0[n-1] + V_2[n], & n \text{ odd.}
\end{cases}
\]
The process $X_0[n]$ is a sum of two processes $D[n]$ and $S[n]$ that are Gaussian at every instant. Therefore, $X_0[n]$ is Gaussian as well. To check if the process is wide-sense stationary we compute the mean and the variance:
\[
E[X_0[n]] = 0,
\]
\[
E[X_0[n]X_0[n]] = E[(V_0[n] + V_0[n-1] + S[n])^2] =
\begin{cases}
2\sigma_{V_0}^2 + \sigma_{V_1}^2, & n \text{ even} \\
2\sigma_{V_0}^2 + \sigma_{V_2}^2, & n \text{ odd.}
\end{cases}
\]
We can see that $S[n]$ is not a WSS process, and consequently $X_0[n]$ is not WSS.

In order to compute the correlation of $X_0[n]$ we first need the correlation of $S[n]$:
\[
R_S[n,m] = E[S[n]S[m]] =
\begin{cases}
E[V_1[n]V_1[m]] = \delta[n-m]\sigma_{V_1}^2, & n \text{ and } m \text{ even} \\
E[V_1[n]V_2[m]] = 0, & n \text{ even}, m \text{ odd} \\
E[V_2[n]V_1[m]] = 0, & n \text{ odd}, m \text{ even} \\
E[V_2[n]V_2[m]] = \delta[n-m]\sigma_{V_2}^2, & n \text{ and } m \text{ odd.}
\end{cases}
\]
Then,
\[
\begin{aligned}
R_{X_0}[n,m] &= E[(V_0[n] + V_0[n-1] + S[n])(V_0[m] + V_0[m-1] + S[m])] \\
&= 2\sigma_{V_0}^2 \delta[n-m] + \sigma_{V_0}^2 \delta[n-1-m] + \sigma_{V_0}^2 \delta[n-m+1] + R_S[n,m] \\
&= \begin{cases}
(2\sigma_{V_0}^2 + \sigma_{V_1}^2)\, \delta[n-m], & n \text{ and } m \text{ even} \\
\sigma_{V_0}^2 \delta[n-1-m] + \sigma_{V_0}^2 \delta[n-m+1], & n \text{ even}, m \text{ odd, or } n \text{ odd}, m \text{ even} \\
(2\sigma_{V_0}^2 + \sigma_{V_2}^2)\, \delta[n-m], & n \text{ and } m \text{ odd.}
\end{cases}
\end{aligned}
\]

(b) We define the cost function as:
\[
J_{\min} = E\Big[ \Big| D[n] - \sum_{k=0}^{1} f_n(k) X_0[n-k] \Big|^2 \Big] .
\]
The optimal filter is given by:
\[
\begin{pmatrix} f_n(0) \\ f_n(1) \end{pmatrix}
= \begin{pmatrix} R_{X_0}[n,n] & R_{X_0}[n-1,n] \\ R_{X_0}[n,n-1] & R_{X_0}[n-1,n-1] \end{pmatrix}^{-1}
\begin{pmatrix} R_{DX_0}[n,n] \\ R_{DX_0}[n,n-1] \end{pmatrix}
\]
where
\[
R_{DX_0}[n,m] = E[D[n]X_0[m]] = E[D[n](D[m] + S[m])] = R_D[n,m] + R_{DS}[n,m],
\]
\[
R_D[n,m] = 2\sigma_{V_0}^2 \delta[n-m] + \sigma_{V_0}^2 \delta[n-1-m] + \sigma_{V_0}^2 \delta[n-m+1], \qquad R_{DS}[n,m] = 0 \text{ for all } n, m .
\]
Then, when $n$ is even we have:
\[
\begin{pmatrix} f_n(0) \\ f_n(1) \end{pmatrix}
= \begin{pmatrix} 2\sigma_{V_0}^2 + \sigma_{V_1}^2 & \sigma_{V_0}^2 \\ \sigma_{V_0}^2 & 2\sigma_{V_0}^2 + \sigma_{V_2}^2 \end{pmatrix}^{-1}
\begin{pmatrix} 2\sigma_{V_0}^2 \\ \sigma_{V_0}^2 \end{pmatrix}
= \begin{pmatrix} 7/11 \\ 1/11 \end{pmatrix},
\]
and when $n$ is odd we have:
\[
\begin{pmatrix} f_n(0) \\ f_n(1) \end{pmatrix}
= \begin{pmatrix} 2\sigma_{V_0}^2 + \sigma_{V_2}^2 & \sigma_{V_0}^2 \\ \sigma_{V_0}^2 & 2\sigma_{V_0}^2 + \sigma_{V_1}^2 \end{pmatrix}^{-1}
\begin{pmatrix} 2\sigma_{V_0}^2 \\ \sigma_{V_0}^2 \end{pmatrix}
= \begin{pmatrix} 5/11 \\ 2/11 \end{pmatrix} .
\]

(c) We define the cost function as:
\[
J_{\min} = E\Big[ \Big| S[n] - \sum_{k=0}^{1} f_n(k) X_1[n-k] \Big|^2 \Big] .
\]
The optimal filter is given by:
\[
\begin{pmatrix} f_n(0) \\ f_n(1) \end{pmatrix}
= \begin{pmatrix} R_{X_1}[n,n] & R_{X_1}[n-1,n] \\ R_{X_1}[n,n-1] & R_{X_1}[n-1,n-1] \end{pmatrix}^{-1}
\begin{pmatrix} R_{SX_1}[n,n] \\ R_{SX_1}[n,n-1] \end{pmatrix}
\]
where
\[
\begin{aligned}
R_{X_1}[n,m] &= E[(V_3[n] - V_3[n-1] + S[n])(V_3[m] - V_3[m-1] + S[m])] \\
&= 2\sigma_{V_3}^2 \delta[n-m] - \sigma_{V_3}^2 \delta[n-1-m] - \sigma_{V_3}^2 \delta[n-m+1] + R_S[n,m] \\
&= \begin{cases}
(2\sigma_{V_3}^2 + \sigma_{V_1}^2)\, \delta[n-m], & n \text{ and } m \text{ even} \\
-\sigma_{V_3}^2 \delta[n-1-m] - \sigma_{V_3}^2 \delta[n-m+1], & n \text{ even}, m \text{ odd, or } n \text{ odd}, m \text{ even} \\
(2\sigma_{V_3}^2 + \sigma_{V_2}^2)\, \delta[n-m], & n \text{ and } m \text{ odd,}
\end{cases}
\end{aligned}
\]
\[
R_{SX_1}[n,m] = E[S[n](V_3[m] - V_3[m-1] + S[m])] = R_S[n,m] =
\begin{cases}
\sigma_{V_1}^2 \delta[n-m], & n, m \text{ even} \\
\sigma_{V_2}^2 \delta[n-m], & n, m \text{ odd} \\
0, & \text{otherwise.}
\end{cases}
\]
For even $n$ we have:
\[
\begin{pmatrix} f_n(0) \\ f_n(1) \end{pmatrix}
= \begin{pmatrix} 2\sigma_{V_3}^2 + \sigma_{V_1}^2 & -\sigma_{V_3}^2 \\ -\sigma_{V_3}^2 & 2\sigma_{V_3}^2 + \sigma_{V_2}^2 \end{pmatrix}^{-1}
\begin{pmatrix} \sigma_{V_1}^2 \\ 0 \end{pmatrix}
= \begin{pmatrix} 6/17 \\ 1/17 \end{pmatrix},
\]
and when $n$ is odd we have:
\[
\begin{pmatrix} f_n(0) \\ f_n(1) \end{pmatrix}
= \begin{pmatrix} 2\sigma_{V_3}^2 + \sigma_{V_2}^2 & -\sigma_{V_3}^2 \\ -\sigma_{V_3}^2 & 2\sigma_{V_3}^2 + \sigma_{V_1}^2 \end{pmatrix}^{-1}
\begin{pmatrix} \sigma_{V_2}^2 \\ 0 \end{pmatrix}
= \begin{pmatrix} 12/17 \\ 2/17 \end{pmatrix} .
\]
The estimate of $D[n]$ can then be determined as:
\[
\hat{D}[n] = X_0[n] - \hat{S}[n] .
\]

(d) The optimal filter is obtained in the same way as before:
\[
\begin{pmatrix} f_{n,0} \\ f_{n,1} \end{pmatrix}
= \begin{pmatrix} R_X[n,n] & R_X[n-1,n] \\ R_X[n,n-1] & R_X[n-1,n-1] \end{pmatrix}^{-1}
\begin{pmatrix} R_{DX}[n,n] \\ R_{DX}[n,n-1] \end{pmatrix}
\]
where
\[
R_X[n,m] = E\Bigg[ \begin{pmatrix} X_0[n] \\ X_1[n] \end{pmatrix} \begin{pmatrix} X_0[m] & X_1[m] \end{pmatrix} \Bigg]
= \begin{pmatrix} R_{X_0}[n,m] & R_{X_0X_1}[n,m] \\ R_{X_1X_0}[n,m] & R_{X_1}[n,m] \end{pmatrix},
\]
and
\[
R_{DX}[n,m] = \begin{pmatrix} R_{DX_0}[n,m] \\ R_{DX_1}[n,m] \end{pmatrix} .
\]

Exercise 24.

Consider the following schematic diagram:

    V0[n] --o
             \                      S[n]
              SW --> D[n] --[H(z)]--(+)--> X[n] --[f_n]--> Y[n]
             /
    V1[n] --o

The two processes $V_0[n]$ and $V_1[n]$ are zero mean jointly Gaussian, mutually uncorrelated, and their self-correlation functions are
\[
R_{V_0}[n] = E[V_0[n+m]V_0[m]] = \rho_0^{|n|}, \qquad
R_{V_1}[n] = E[V_1[n+m]V_1[m]] = \rho_1^{|n|} .
\]
In the following we take $\rho_0 = 1/2$, $\rho_1 = 1/3$. The switch SW selects one of the two processes to generate the desired process $D[n]$ that has to be estimated. The measurements are obtained by filtering $D[n]$ with the filter $H(z) = 1 + z^{-1}$ and adding the noise process $S[n]$, which is i.i.d., zero mean, jointly Gaussian with $V_0$ and $V_1$, and with variance $\sigma_S^2 = 1$. The measurement process $X[n]$ is filtered by the time-varying filter $f_n$ of length $L = 3$ to obtain the estimate $Y[n]$. The goal is to minimize the variance of the estimation error $E[n] = D[n] - Y[n]$.

(a) Assume first that the switch is in position "0" (i.e. $D[n] = V_0[n]$). Write the normal equations for the filter $f_n$ and find the optimal linear filter. Is this a Wiener filter? Compute the estimation error variance, $E[E[n]^2]$. Do the same with the switch in position "1".

(b) Assume now that the switch is in position "0" for the even samples and "1" for the odd samples. Do the following steps:

  (i) Compute the correlation function $R_D[n,m] = E[D[n]D[m]]$. Is the process $D[n]$ stationary?
  (ii) Compute the correlation functions $R_X[n,m] = E[X[n]X[m]]$ and $R_{DX}[n,m] = E[D[n]X[m]]$.
  (iii) Write the normal equations for the even and odd time indexes. Find the optimal linear filter and the error variance for the two cases (even and odd time indexes) and compare them with the result of question (a).

(c) Assume that the position of the switch is chosen randomly and independently for each sample. The probability of position "0" is $p_0 = 1/2$. Compute again the correlation $R_D[n,m] = E[D[n]D[m]]$ and check if the process is stationary. Compute the optimal linear filter in this case and compare the answer with the results of question (b).

Solution 24.

(a) We want to minimize the error
\[
E[n] = D[n] - Y[n] = D[n] - \sum_i f_n[i] X[n-i]
\]
where
\[
X[n] = \sum_k h[k] D[n-k] + S[n] .
\]
If we define the cost function as
\[
J_n = E[E[n]^2],
\]
then it has a unique minimum, which can be found by setting the first derivative to zero. Following the steps given in Section 4.2.1 of the lecture notes, we find that the normal equations are:
\[
\sum_{j=0}^{L-1} f_n[j] R_X[n-j, n-i] = R_{DX}[n, n-i], \qquad i = 0, \ldots, L-1, \quad \forall n .
\]
Now, we compute $R_X[n-j, n-i]$ and $R_{DX}[n, n-i]$ for the case $D[n] = V_0[n]$:
\[
\begin{aligned}
R_X[n-j, n-i] &= E[(h[0]V_0[n-j] + h[1]V_0[n-j-1] + S[n-j])(h[0]V_0[n-i] + h[1]V_0[n-i-1] + S[n-i])] \\
&= 2R_{V_0}[i-j] + R_{V_0}[i-j+1] + R_{V_0}[i-j-1] + \sigma_S^2 \delta[i-j] \\
&= 2\rho_0^{|i-j|} + \rho_0^{|i-j+1|} + \rho_0^{|i-j-1|} + \sigma_S^2 \delta[i-j],
\end{aligned}
\]
\[
R_{DX}[n, n-i] = E[V_0[n](h[0]V_0[n-i] + h[1]V_0[n-i-1] + S[n-i])] = R_{V_0}[i] + R_{V_0}[i+1] = \rho_0^{|i|} + \rho_0^{|i+1|} .
\]
We can see that the processes $D[n]$ and $X[n]$ are stationary, BUT the filter $f_n$ is not a Wiener filter, since we limit the length of the filter to $L = 3$. From the Yule-Walker equations we have $f_n = R_{X,n}^{-1} r_{DX,n}$:
\[
\begin{pmatrix} f_n[0] \\ f_n[1] \\ f_n[2] \end{pmatrix}
= \begin{pmatrix}
2 + 2\rho_0 + 1 & 2\rho_0 + 1 + \rho_0^2 & 2\rho_0^2 + \rho_0 + \rho_0^3 \\
2\rho_0 + 1 + \rho_0^2 & 2 + 2\rho_0 + 1 & 2\rho_0 + 1 + \rho_0^2 \\
2\rho_0^2 + \rho_0 + \rho_0^3 & 2\rho_0 + 1 + \rho_0^2 & 2 + 2\rho_0 + 1
\end{pmatrix}^{-1}
\begin{pmatrix} 1 + \rho_0 \\ \rho_0 + \rho_0^2 \\ \rho_0^2 + \rho_0^3 \end{pmatrix} .
\]
Setting $\rho_0 = 1/2$ we get
\[
f_n = [0.3944 \;\; -0.0361 \;\; 0.0031]^T .
\]
We compute $E[|E[n]|^2]$ from the formula
\[
E[|E[n]|^2] = E[(D[n] - f^T X_n)^2] = \sigma_D^2 + f^T R_X f - 2 f^T r_{DX} .
\]
Then we find
\[
E[|E[n]|^2] = 0.4343 .
\]
When the switch is in position "1", all the steps are the same with $V_0$ replaced by $V_1$. In that case we have
\[
\begin{pmatrix} f_n[0] \\ f_n[1] \\ f_n[2] \end{pmatrix}
= \begin{pmatrix}
2 + 2\rho_1 + 1 & 2\rho_1 + 1 + \rho_1^2 & 2\rho_1^2 + \rho_1 + \rho_1^3 \\
2\rho_1 + 1 + \rho_1^2 & 2 + 2\rho_1 + 1 & 2\rho_1 + 1 + \rho_1^2 \\
2\rho_1^2 + \rho_1 + \rho_1^3 & 2\rho_1 + 1 + \rho_1^2 & 2 + 2\rho_1 + 1
\end{pmatrix}^{-1}
\begin{pmatrix} 1 + \rho_1 \\ \rho_1 + \rho_1^2 \\ \rho_1^2 + \rho_1^3 \end{pmatrix}
= \begin{pmatrix} 0.3999 \\ -0.0797 \\ 0.0144 \end{pmatrix}
\]
and
\[
E[|E[n]|^2] = E[(D[n] - f^T X_n)^2] \approx 0.5000 .
\]
(b) (i) We need to distinguish four cases:
\[
R_D[n,m] = \begin{cases}
\rho_0^{|n-m|} & n, m \text{ even} \\
\rho_1^{|n-m|} & n, m \text{ odd} \\
0 & n \text{ even}, m \text{ odd} \\
0 & n \text{ odd}, m \text{ even.}
\end{cases}
\]
Clearly, the process $D[n]$ is not stationary.

(ii) Let us call $D_h[n] = \sum_k h[k] D[n-k] = D[n] + D[n-1]$. Then
\[
R_X[n,m] = E[(D_h[n] + S[n])(D_h[m] + S[m])] = R_{D_h}[n,m] + \sigma_S^2 \delta[n-m]
\]
and
\[
\begin{aligned}
R_{D_h}[n,m] &= E[(D[n] + D[n-1])(D[m] + D[m-1])] \\
&= R_D[n,m] + R_D[n,m-1] + R_D[n-1,m] + R_D[n-1,m-1] \\
&= \begin{cases}
\rho_0^{|n-m|} + \rho_1^{|n-m|} & n, m \text{ even} \\
\rho_1^{|n-m|} + \rho_0^{|n-m|} & n, m \text{ odd} \\
\rho_0^{|n-m+1|} + \rho_1^{|n-m-1|} & n \text{ even}, m \text{ odd} \\
\rho_1^{|n-m+1|} + \rho_0^{|n-m-1|} & n \text{ odd}, m \text{ even.}
\end{cases}
\end{aligned}
\]
The correlation $R_{DX}$ is equal to
\[
R_{DX}[n,m] = E[D[n](D[m] + D[m-1] + S[m])] = R_D[n,m] + R_D[n,m-1]
= \begin{cases}
\rho_0^{|n-m|} & n, m \text{ even} \\
\rho_1^{|n-m|} & n, m \text{ odd} \\
\rho_0^{|n-m+1|} & n \text{ even}, m \text{ odd} \\
\rho_1^{|n-m+1|} & n \text{ odd}, m \text{ even.}
\end{cases}
\]
Clearly, the process is not stationary.

(iii) We have the normal equations
\[
\sum_{j=0}^{L-1} f_n[j] R_X[n-j, n-i] = R_{DX}[n, n-i], \qquad i = 0, \ldots, L-1, \quad \forall n,
\]
and to evaluate $R_X$ and $R_{DX}$ we need to consider the cases when $n$ is even and $n$ is odd. Let us, for example, consider the case when $n$ is even. Then
\[
\begin{pmatrix} f_n[0] \\ f_n[1] \\ f_n[2] \end{pmatrix}
= \begin{pmatrix}
R_X[n,n] & R_X[n-1,n] & R_X[n-2,n] \\
R_X[n,n-1] & R_X[n-1,n-1] & R_X[n-2,n-1] \\
R_X[n,n-2] & R_X[n-1,n-2] & R_X[n-2,n-2]
\end{pmatrix}^{-1}
\begin{pmatrix} R_{DX}[n,n] \\ R_{DX}[n,n-1] \\ R_{DX}[n,n-2] \end{pmatrix}
= \begin{pmatrix}
3 & 1 + \rho_0^2 & \rho_0^2 + \rho_1^2 \\
\rho_0^2 + 1 & 3 & 1 + \rho_1^2 \\
\rho_0^2 + \rho_1^2 & \rho_1^2 + 1 & 3
\end{pmatrix}^{-1}
\begin{pmatrix} 1 \\ \rho_0^2 \\ \rho_0^2 \end{pmatrix}
\]
and we compute
\[
f_n = [0.3644 \;\; -0.0963 \;\; 0.0752]^T .
\]
Applying the same formula for computing the error as in the previous part, we get
\[
E[|E[n]|^2] = 0.6409 .
\]
The process $D[n]$ is not stationary, and this explains why the error is larger than in part (a) for both $D[n] = V_0[n]$ and $D[n] = V_1[n]$.
(c) Since the position of the switch is randomly chosen, we can introduce the random variable $S_W[n]$ that describes the position of the switch. Positions "0" and "1" appear with the same probability $1/2$. To compute $R_D[n,m] = E[D[n]D[m]]$, we can use the formula
\[
\begin{aligned}
E[f(D)] &= E\big[E[f(D) \mid S_W]\big] \\
&= \tfrac{1}{4} E[f(D) \mid s_W = (0,0)] + \tfrac{1}{4} E[f(D) \mid s_W = (0,1)] \\
&\quad + \tfrac{1}{4} E[f(D) \mid s_W = (1,0)] + \tfrac{1}{4} E[f(D) \mid s_W = (1,1)] .
\end{aligned}
\]
Then
\[
R_D[n,m] = \tfrac{1}{4} \big( E[D[n]D[m] \mid s_W = (0,0)] + E[D[n]D[m] \mid s_W = (1,1)] \big)
= \tfrac{1}{4} \rho_0^{|n-m|} + \tfrac{1}{4} \rho_1^{|n-m|} .
\]
The process is stationary. To compute the optimal filter we need
\[
R_X[n,m] = E[(D[n] + D[n-1] + S[n])(D[m] + D[m-1] + S[m])]
= 2R_D[n-m] + R_D[n-m+1] + R_D[n-m-1] + \sigma_S^2 \delta[n-m]
\]
and
\[
R_{DX}[n,m] = E[D[n](D[m] + D[m-1] + S[m])] = R_D[n-m] + R_D[n-m+1] .
\]
Substituting into the normal equations we get
\[
\begin{pmatrix} f[0] \\ f[1] \\ f[2] \end{pmatrix}
= \begin{pmatrix}
2.4167 & 1.0069 & 0.4294 \\
1.0069 & 2.4167 & 1.0069 \\
0.4294 & 1.0069 & 2.4167
\end{pmatrix}^{-1}
\begin{pmatrix} 0.7083 \\ 0.2986 \\ 0.1308 \end{pmatrix}
= \begin{pmatrix} 0.2923 \\ 0.0010 \\ 0.0018 \end{pmatrix} .
\]
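A sketch reproducing the numbers in part (c), and the stationary-case filters of part (a) (the helper function and its names are ours; the part (c) correlation follows the averaging formula above):

import numpy as np

rho0, rho1, sS2 = 1/2, 1/3, 1.0

def solve_case(RD, sigmaD2):
    # Normal equations for the length-3 filter: R f = r
    RX = lambda k: 2*RD(k) + RD(k+1) + RD(k-1) + sS2*(k == 0)
    R = np.array([[RX(i - j) for j in range(3)] for i in range(3)])
    r = np.array([RD(i) + RD(i + 1) for i in range(3)])
    f = np.linalg.solve(R, r)
    return f, sigmaD2 - f @ r           # filter and error variance

# Part (a): D = V0, then D = V1
f0, e0 = solve_case(lambda k: rho0**abs(k), 1.0)
f1, e1 = solve_case(lambda k: rho1**abs(k), 1.0)
print(np.round(f0, 4), round(e0, 4))    # [0.3944 -0.0361 0.0031] 0.4343
print(np.round(f1, 4), round(e1, 4))    # [0.3999 -0.0797 0.0144] 0.5

# Part (c): random switch, R_D[k] = (rho0^|k| + rho1^|k|)/4
fc, _ = solve_case(lambda k: (rho0**abs(k) + rho1**abs(k)) / 4, 0.5)
print(np.round(fc, 4))                  # [0.2923 0.001 0.0018]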

Exercise 25.

Consider the following block diagram: the process $S_0[n]$ is filtered by $h_0$ and the process $S_1[n]$ by $h_1$; the sum of the two filter outputs is $X_0[n]$, and the output of $h_1$ is the desired process $D[n]$. The process $S_1[n]$ is also filtered by $h_2$ and $S_2[n]$ by $h_3$; the sum of these two outputs is $X_1[n]$. The block $g$ combines $X_0[n]$ and $X_1[n]$ to produce $Y[n]$.

The three input processes $S_0[n]$, $S_1[n]$, and $S_2[n]$ are jointly Gaussian, with zero mean, and the cross-correlations are zero. The processes $S_0[n]$ and $S_1[n]$ are white and the variances are $\sigma_{S_0}^2 = \sigma_{S_1}^2 = 1$. The correlation of the process $S_2[n]$ is
\[
E[S_2[n]S_2[m]] = \begin{cases} \delta[n-m] & \text{if } n \text{ is even} \\ 2\delta[n-m] & \text{if } n \text{ is odd.} \end{cases}
\]
The blocks $h_0$, $h_1$, $h_2$, and $h_3$ are linear, time-invariant filters with transfer functions $H_0(z), \ldots, H_3(z)$ given by
\[
H_0(z) = H_2(z) = 1 - z^{-1}, \qquad H_1(z) = H_3(z) = 1 + z^{-1} .
\]
(a) (4 points) Is the process $X_0[n]$ Gaussian? Is it stationary?

(b) (7 points) Is the process $X_1[n]$ Gaussian? Is it stationary?

(c) (7 points) The block $g$ at the output combines the processes $X_0[n]$ and $X_1[n]$ to obtain an estimate of the process $D[n]$. The estimate is obtained by applying the expression
\[
Y[n] = aX_0[n] + bX_0[n-1] + cX_1[n] .
\]
Compute the parameters $a$, $b$, and $c$ in order to minimize the cost function $J_n = E[(D[n] - Y[n])^2]$.

(d) (7 points) Compute the error variance of the estimate, $J_n$, corresponding to the minimum (use the symbols $a$, $b$, and $c$ if you do not have the answer to the previous question).

(e) (5 points) Suppose that the filter $h_3$ is replaced by the filter $\tilde{h}_3$ with transfer function $\tilde{H}_3(z) = 1 + az^{-1}$, with $a \neq 1$. What can you say about the stationarity of the process $X_1[n]$? What would be substantially different concerning the design of the block $g$?
Solution 25.
1)

X0 [n] = S0 [n] S0 [n 1] + S1 [n] + S1 [n 1]


S0 and S1 are jointly Gaussian; hence, X0 is also Gaussian.
E[X0 [n]X0 [m]] = 2RS0 [n m] RS0 [n m + 1] RS0 [n m 1] + 2RS1 [n m] RS1 [n m +
1] RS1 [n m 1],
Therefore,
E[X0 [n]X0 [m]] = RX0 [n m] = 4[n m], also, clearly E[X0 [n]] = 0 and E[X02 [n]] < ; thus
X0 is stationary.

2)

X1 [n] = S1 [n] S1 [n 1] + S2 [n] + S2 [n 1]


as above, X1 is also Gaussian.
To study the stationarity, we observe that E[X0 [n]] = 0 and E[X02 [n]] = 1 + 1 + 1 + 2 = 5 < ,
Therefore, two of the three necessary conditions for the stationarity is satised, to verify the
third condition, we have,
RX1 = 2RS1 [n m] RS1 [n m + 1] RS1 [n m 1] + RS2 [n, m] + RS2 [n, m 1] + RS2 [n
1, m] + RS2 [n 1, m 1],
Thus,

RX1 = 5[n m] + [n m 1] for n even


RX1 = 5[n m] + [n m + 1] for n odd

RX1 (n, m) depends on n and not only on n m; hence, X1 is not stationary.

3)
44 Chapter 4.

X0 [n]
We build the vector X[n] = X0 [n 1] . By applying the projection theorem, we have
X1 [n]

a
E[(D[n] X[n]T b )X[n]] = 0
c
which
gives,

a
RX b = DX
c
with,
RX0 [0] RX0 [1] RX0 X1 [n, n]
RX = E[X[n]X[n]T ] = RX0 [1] RX0 [0]4 RX0 X1 [n 1, n] ,
RX0 X1 [n, n] RX0 X1 [n, n 1] RX1 [n, n]
and
DX = E[D[n]X[n]].
We already found that RX0 [0] = 4, RX0 [1] = 0, RX1 [n, n] = 5, therefore,
RX0 X1 [n, n] = E[X0 [n], X1 [n]] = 0,
RX0 X1 [n 1, n] = E[X0 [n 1], X1 [n]] = 1,
E[D[n]X0 [n]] = E[D[n]2 ] = 2,
E[D[n]X0 [n 1]] = E[D[n]D[n 1]] = 1,
E[D[n]X1 [n]] = 0.

We obtain
4 0 0 a 2
0 4 1 b = 1 .
0 1 5 c 0
Thus, a = 12 , b = 5
19 , c = 1
19

4)
The cost function corresponding
to the optimal lter is given
by
a   a
Jmin = E[(D[n] X[n]T b )2 ] = D
2
a b c RX b = 2 1 5
19 = 14
19 .
c c

5)
If a = 1, we have
E[X1 [n]X1 [n]] = 2 + RS2 [n, n] + a2 RS2 [n 1, n 1],
Therefore, the element (3, 3) of RX depends on n, which means that the optimal lter is time-
varying.

Exercise 26.
Consider the following diagram:
45

S0 [n] h0

+
+ X0 [n]
S1 [n] h1 g
Y [n]

h2 X1 [n]

The processes S0 [n], S1 [n] are jointly Gaussian, uncorrelated, white with zero-mean and unit
variance. The lters h0 , h1 , h2 have z-transforms

1
H0 (z) = 1 z 1 , H1 (z) = 1 + z 1 , H2 (z) = 1 + z 1
2
respectively. The lter g is not completely specied. We only know that:

1) It is stable.

2) The z-transform of g has the form

1
G(z) =
a0 + a1 z 1 + a2 z 2

where a0 , a1 , a2 are unknown real constants.

Answer precisely the following questions:

1) Are the processes X0 [n], X1 [n] Gaussian? Are they stationary?

2) Propose an algorithm that, by using only the measurements of the processes X0 and X1 ,
estimates the coecients a0 , a1 , a2 of G(z). Remark: You should give enough information
to be able to write a computer program, for example use a pseudo code. For the adaptive
elements, if any, computation of the step size will be addressed in the next question.

3) Assume by some means that we are able to estimate that the variance of the process Y ,
Y2 is in the range 1 < Y2 < 2. In such a case, what is a reasonable range for the step-size
of the adaptive elements used in the previous answer?

4) Consider the maximum step-size computed in the previous question. Compare the
behavior of the algorithm when we take a step size /2 and /10. (Explain in words
how the behavior of the algorithm diers when we use these two dierent values for the
step-size.)
46 Chapter 4.

Solution 26.
TBD.

Exercise 27.
Consider the following schematic diagram:

D[n]

S0 [n] h0
+
g0 X0 [n]
+
S1 [n] h1
+
g1 X1 [n]
+
S2 [n] h2

The processes S0 [n], S1 [n], S2 [n] are zero mean white processes, uncorrelated and with variance
1. The lters h0 , h1 , h2 are causal time invariant linear lters. We know that H1 (z) = 1 + z 1 ,
while the lters h0 and h2 are unknown. The lter g0 is also causal and time invariant and the
transfer function has the structure

G0 (z) = a0 z 1 + a1 z 2 + a2 z 3 ,

where a0 , a1 , a2 are unknown parameters. The lter g1 has transfer function G1 (z) = 1 + z 1 .
We want to estimate the process D[n]; unfortunately, the elements and the processes inside the
rectangle in dashed line are not accessible. Only the processes X0 [n], X1 [n] can be measured
and used for the estimation. Some measurements allows to say that the spectral density of the
process X1 [n] is given by
SX1 () = 2 + 2 cos .

(a) Propose a scheme to estimate D[n] based only on the observation of the processes X0 [n],
X1 [n]. (State clearly the error process E[n] whose variance is to be minimized by the
adaptive lter(s).)

(b) Which is the lter length that you would choose for the adaptive lter(s)? Justify your
answer.

(c) Assume that the lenght of the adaptive lter(s) is L = 4. What is the range of the step-size
that we can consider for the LMS algorithm (steepest-descent range)? Which is a more
conservative range?
47

Solution 27.

(a) The process D[n] can be estimated by the following adaptive lter which uses the least-
mean-squares (LMS) algorithm.

D[n]

S0 [n] h0
g0 ++ X0 [n] + E[n]

D1 [n] V [n]
S1 [n] h1
+
g1 X1 [n] Y [n]
f
+
S2 [n] h2
D2 [n]

Figure 4.1: Scheme 1.

The cost function to be minimised by the lter is given by

Jn = E[E[n]2 ] = E[(X0 [n] Y [n])2 ] = E[(D[n] + V [n] Y [n])2 ]


= E[D[n]2 ] + 2E[D[n](V [n] Y [n])] + E[(V [n] Y [n])2 ]
= E[D[n]2 ] + E[(V [n] Y [n])2 ].

Note that D[n] and (V [n] Y [n]) are uncorrelated and zero mean processes.

An alternative (and maybe more intuitional) scheme is given in Fig.??, where we rst
estimate the process D1 [n] by using a Wiener lter and then pass these estimates through
an adaptive lter (LMS) which again minimizes the cost function

Jn = E[E[n]2 ] = E[(X0 [n] Y [n])2 ] = E[D[n]2 ] + E[(V [n] Y [n])2 ]

1 [n].
but the input to the adaptive lter is now the estimated process D
The Wiener lter is given by

SD1 X1 ()
W (ej ) = .
SX1 ()

We know that the spectral density of the process X1 [n] is given by

SX1 () = 2 + 2 cos() = (1 + ej )(1 + ej )


48 Chapter 4.

D[n]

S0 [n] h0
+ X0 [n] + E[n]
g0 +
V [n]
D1 [n]
S1 [n] h1
+ 1 [n]
g1 X1 [n] Wiener D Y [n]
f
+ filter
S2 [n] h2
D2 [n]

Figure 4.2: Scheme 2

and the cross corelation of the processes D1 [n] and X1 [n] can be found as
RD1 X1 [m] =  "[D1[n + m]X1[n]] #

=  D1 [n + m] g1 [k](D1 [n k] + D2 [n k])
kZ

= 
g1 [k] ( [D1 [n + m]D1 [n k]] +  [D1[n + m]D2[n k]])
kZ

= g1 [k]RD1 [m + k]
kZ
= RD1 [m] + RD1 [m + 1]
where to obtain the last equality we used the fact that the impulse respose of the lter g1
is given by g1 [n] = [n] + [n 1]. Taking the Fourier transform of this expression and
using the ltering formula yields,
SD1 X1 () = (1 + ej )SD1 ()
= (1 + ej )|H1 (ej )|2 SS1 ()
= (1 + ej )2 (1 + ej ).
Hence the Wiener lter is given by
SD1 X1 ()
W (ej ) = = 1 + ej .
SX1 ()
In part (b) of the exercise we will see that the optimum solution for the adaptive lter in
the rst scheme actually decomposes into the structure in the second scheme.
(b) The optimal lenght for the adaptive lters can be determined by looking at the lenght of
the optimal solutions for the lters. We concantrate on the rst scheme given in Fig.??.
Let f denote the optimal solution for the adaptive lter in Fig.?? and F (ej ) denote its
transfer function (Fourier transform of f ). Thus,
SX0 X1 ()
F (ej ) =
SX1 ()
49

and we have
RX0 X1 [m] =  [X0[n + m]X1[n]]
= E[(D[n + m] + a0 D1 [n + m 1] + a1 D1 [n + m 2] + a2 D1 [n + m 3])
(D1 [n] + D2 [n] + D1 [n 1] + D2 [n 1])]
= a0 RD1 [m 1] + a1 RD1 [m 2] + a2 RD1 [m 3] + a0 RD1 [m] + a1 RD1 [m 1] + a2 RD1 [m 2].
Note that D[n], D1 [n] and D2 [n] are uncorrelated processes. Taking the fourier transform,
SX0 X1 () = (1 + ej )(a0 + a1 ej + a2 e2j )SD1 ().
Recalling that SD1 () = (1 + ej )(1 + ej ) = SX1 () we obtain
F (ej ) = (1 + ej )(a0 + a1 ej + a2 e2j ) (4.1)
j 2j 3j
= a0 + (a0 + a1 )e + (a1 + a2 )e + a2 e (4.2)
j j 2j 3j
= (1 + e )(a0 e + a1 e + a2 e ) (4.3)
(??) shows that the optimal lter f has four taps, hence our choice for the length of
the adaptive lter in Fig.?? should be L = 4, while (??) shows that the the optimal
solution for the adaptive lter in Fig.?? decomposes in to the structure in Fig.?? since
F (ej ) = W (ej )G0 (ej ).

(c)
RX1 [m] = 2[m] + [m 1] + [m + 1]
Thus,
2 1 0 0
1 2 1 0
RX1 =
0

1 2 1
0 0 1 2
Using Matlab, one can see that the maximum eigenvalue of RX1 is 3.6180. Hence 0 < <
2 2
max = 0.5528. A more conservative range will be 0 < < L2 = 0.25. X1

Exercise 28.
Consider the following diagram:

S0 [n] h0

+
Y [n] X0 [n]
S1 [n] h1 g
+

X1 [n]
h2
50 Chapter 4.

The processes S0 [n], S1 [n] are jointly Gaussian, uncorrelated, white with zero-mean and unit
variance. The lters h0 , h1 , h2 have z-transforms
1
H0 (z) = 1 z 1 , H1 (z) = 1 + z 1 , H2 (z) = 1 + z 1
2
respectively. The lter g is not completely specied. We only know that:

1) It is stable.

2) The z-transform of g has the form

1
G(z) =
a0 + a1 z1 + a2 z 2

where a0 , a1 , a2 are unknown real constants.

Answer precisely the following questions:

1) Are the processes X0 [n], X1 [n] Gaussian? Are they stationary?

2) Propose an algorithm that, by using only the measurements of the processes X0 and X1 ,
estimates the coecients a0 , a1 , a2 of G(z). Remark: You should give enough information
to be able to write a computer program, for example use a pseudo code. For the adaptive
elements, if any, computation of the step size will be addressed in the next question.

3) Assume by some means that we are able to estimate that the variance of the process Y ,
Y2 is in the range 1 < Y2 < 2. In such a case, what is a reasonable range for the step-size
of the adaptive elements used in the previous answer?

4) Consider the maximum step-size computed in the previous question. Compare the
behavior of the algorithm when we take a step size /2 and /10. (Explain in words
how the behavior of the algorithm diers when we use these two dierent values for the
step-size.)

Solution 28.

(a) The processes X0 [n], X1 [n] are obtained by ltering and adding jointly Gaussian processes.
Since all lters are linear, time-invariant, and stable, X0 [n] and X1 [n] jointly Gaussian and
stationary.

(b) Since the unknown lter has the structure of an IIR lter with only the denominator, it
is convenient to connect the adaptive lter on the output X0 [n] and compute the desired
process D[n] by using X1 [n]. In order to compensate for the presence of the lters h1 and
h2 , we add a lter h at the output X1 [n]. Such a lter has transfer function

H1 (x)
H(z) = .
H2 (z)

In this way, if we neglect the presence of S0 [n], the identity between the output of the
adaptive lter and D[n] is obtained when the transfer function of the adaptive lter is
F (z) = G(z)1 = a0 + a1 z 1 + a2 z 2 . The block diagram is the following:
51

S0 [n] h0

X0 [n]
+
Y [n]
S1 [n] h1 g f
+
+
D[n]
X1 [n]
h2 h

and the size of the adaptive lter is L = 3.


(c) We estimate the maximum value of the variance of the output X0 [n], i.e.


2 2
X 0 ,max
= Y,max + h0 [n]2 S2 0 = 2 + 2 = 4.
n=

A reasonable range for the step size is given by


2 1
0<< 2 = .
LX 0 ,max
6

(d) In both cases the LMS algorithm converges; however, we expect a faster convergence when
the step size is larger, i.e. = /2. On the other hand, the smaller value of the step size
reduces the noise amplitude of the lter coecients and asymptotically the cost function
takes a smaller value.

Exercise 29.
Consider the following matlab script
hdeterministic = [1 1];
hrandom = 5; % number of random terms that have to be identified
hlen = hrandom + length(hdeterministic) - 1; % length of the filter that
we should identify
BigNumber = 2000; % number of samples that we process
FilterChangeStep = 200; % the filter that we identify changes every
FilterChangeStep samples
eprocess = zeros(BigNumber, 1); % error value for each estimated sample
noisestd = 1e-3; % noise standard deviation
xvector = randn(hlen, 1); % xvector contains the last hlen samples of
the input process

%%% ADD HERE


52 Chapter 4.

%initialization of filter f for the first iteration


% (vector of size hlen x 1)
f = zeros(hlen, 1); % this is an example

%%% END ADD

% main loop
for n = 1:BigNumber,
x = randn(1,1); % new sample of the input process
xvector = [x; xvector(1:end-1)]; % update xvector
% check if the filter has to be changed
if rem(n, FilterChangeStep) == 1,
h = conv(hdeterministic, rand(hrandom, 1));
end;
d = h*xvector + noisestd * randn(1,1); % desired process
y = f*xvector; % current estimate
e = d - y; % current error

%%% ADD HERE

% update of filter f for next iteration


% YOU ARE NOT ALLOWED TO USE VARIABLE h
f =

%%% END ADD

eprocess(n) = e; % save e for final statistics


end;

figure; plot(eprocess);
fprintf(1, The average error variance is %f, mean(eprocess.^2));

The goal is to complete the program in order to minimize the variance of the error process
(variable eprocess) by nding an appropriate lter f according to the following rules:

- You cannot use the variable h, which corresponds to the lter that has to be estimated.

- You cannot modify the other variables of the program (only f or your own variables).

- Needless to say, you shall not cheat by playing with the seed of the random number gen-
erator!

This is a free exercise, that is, there is no unique correct answer to the problem. One can nd
dierent solutions to this exercise, with dierent complexity and performances. You should try
to nd the best solution for the problem by using all the information that is available to you on
the structure of the identied system.
Do not forget to write comments and if necessary explanations on what your Matlab code does
53

and give a sample output of your program (as well as the code itself).

Solution 29.
Here is one solution we suggest for the problem but you might come up with solutions that
perform better than the one below:

hdeterministic = [1 1];
hrandom = 5; % number of random terms that have to be identified
hlen = hrandom + length(hdeterministic) - 1; % length of the filter that we should identify
BigNumber = 2000; % number of samples that we process
FilterChangeStep = 200; % the filter that we identify changes every FilterChangeStep samples
eprocess = zeros(BigNumber, 1); % error value for each estimated sample
noisestd = 1e-3; % noise standard deviation
sigmax = 1; % input process standard deviation
xvector = randn(hlen, 1); % xvector contains the last hlen samples of the input process

%%% ADD HERE

%initialization of filter f for the first iteration


% (vector of size hlen x 1)
%f = 0.5 * ones(hlen, 1); % this is an example
frandom = 0.5 * ones(hrandom, 1); f = conv(hdeterministic,
frandom);

%%% END ADD

% main loop
for n = 1:BigNumber,
x = sigmax * randn(1,1); % new sample of the input process
xvector = [x; xvector(1:end-1)]; % update xvector
% check if the filter has to be changed
if rem(n, FilterChangeStep) == 1,
h = conv(hdeterministic, rand(hrandom, 1));
end;
d = h*xvector + noisestd * randn(1,1); % observed process
y = f*xvector; % current estimate
e = d - y; % current error

%%% ADD HERE

% update of filter f for next iteration


% YOU ARE NOT ALLOWED TO USE VARIABLE h
xrandom = conv(hdeterministic, xvector);
xrandom = xrandom(length(hdeterministic):end-length(hdeterministic)+1);
mu = 2 / hrandom / (sum(conv(hdeterministic(end:-1:1), hdeterministic)) * sigmax^2) / 2;
% mu = 2 / L / S_max / 2
frandom = frandom + mu * e * xrandom;
54 Chapter 4.

if rem(n + 1, FilterChangeStep) == 1,
frandom = 0.5 * ones(hrandom, 1);
end;
f = conv(hdeterministic, frandom);
% f = f + 0.1 * e * xvector; % this is an example

%%% END ADD

eprocess(n) = e; % save e for final statistics


end;

figure; plot(eprocess);
fprintf(1, The average error variance is %f, mean(eprocess.^2));

Exercise 30.
In all of the following gures the processes S0 [n] and S1 [n] are zero mean white processes, uncor-
related and with variance 1. The lters h0 , h1 , h2 are causal time invariant linear lters. Provide
the suitable schemes with adaptive, stable and causal lters for each section. We can measure
the signals pointed out of the boxes

(a) Considering the following scheme,

X0 [n]
S0 [n] h0 a0 + a1 z 1 + a2 z 2

X1 [n]
1 + z 1

X2 [n]
S1 [n] h1 1 z 1

we would like to determine the parameters a0 , a1 , a2 .

(b) In the following scheme,


55

1 X0 [n]
S0 [n] h0
a0 + a1 z 1 + a2 z 2

X1 [n]
1 + z 1

X2 [n]
S1 [n] h1 1 z 1

We would like to estimate a0 , a1 , a2 .

(c) Considering the following scheme,

D[n]
X0 [n]
S0 [n] h0 a0 + a1 z 1 + a2 z 2

1 X1 [n]
S1 [n] h1
b0 + b1 z 1 + b2 z 2

X2 [n]
S2 [n] h2 1 12 z 1

We would like to obtain an estimate of D.

Solution 30.

(a) The parameters that have to be estimated correspond to the taps of an FIR lter. There-
fore, it is convenient to use an adaptive lter in parallel to the unknown lter and use the
output X0 as desired signal D[n]. It remains to choose the input of the adaptive lter. We
remark that the output X2 [n] is perturbed by the noise S1 [n]; hence, we prefer to use the
56 Chapter 4.

output X1 [n] instead. An adaptive lter can be interpreted as a projection of the desired
process on the space generated by the input process; so, some additive noise at the input
(even if uncorrelated with S0 [n]) would perturb the estimate of the parameters. It remains
to remove the eect of the lter 1 + z 1 that generates X1 [n]. Obviously, we cannot use
an inverse lter to remove such an eect, since it would be marginally stable (it would
have a pole in 1). The only possibility is to add the same eect on the process X0 [n]. In
conclusion, we obtain the following diagram:

X1 [n]
a0 + a1 z 1 + a2 z 2 1 + z 1

D[n]
X2 [n]
1 + z 1 f

1 z 1

(b) In this case the lter we are looking for has the structure of an AR lter, i.e. its transfer
function has a denominator only. In order to use an adaptive FIR lter, we the process X0 [n]
as the input of the lter. The desired signal can be obtained by taking any combination
of X1 [n] and X2 [n]. In fact, in this case the process S1 [n] perturbs the desired signal, but
the projection theorem ensures that the eect is removed. Among all the possibilities to
generate the desired process, there is one that does not require additional lters and is
extremely simple. This consists in adding the outputs X1 [n] and X2 [n]. In fact, if we call
D0 [n] the process at the input of the lter a0 + a1 z 1 + a2 z 2 , the desired process would
be

D[n] = (1+z 1 )D0 [n]+(1z 1D0 [n]+(1z 1)H1 (z)S0 [n] = D0 [n]+(1z 1)H1 (z)S0 [n].

In conclusion, the resulting diagram is the following:


57

1 X0 [n]
f
a0 + a1 z 1 + a2 z 2

X1 [n]
1 + z 1
D[n]

X2 [n]
1 z 1

(c) This problem combines the two previous ones. To estimate D[n], we should take the error of
the projection of X0 [n] on the space generated by the input of the lter a0 + a1 z 1 + a2 z 2 .
Unfortunately such a process is not accessible. A way to compute it is to estimate the
output X2 [n] and apply the inverse of the lter 1 12 z 1 . Remark that the inverse lter is
stable, since it has a single pole inside the unit circle. This solution is not the optimal one,
in fact the output of the inverse lter is perturbed by the process S2 [n]. A better solution
is to use the output X1 [n] and apply a second adaptive lter connected as an equalizer.
The complete diagram is the following:

D[n]
D[n]
X0 [n]
a0 + a1 z 1 + a2 z 2 g

1 X1 [n]
f
b0 + b1 z 1 + b2 z 2

X2 [n] 1
1 12 z 1
1 12 z 1

Remark that the adaptive lter f has length 3 and, at convergence, we have that the transfer
function is F (z) = b0 + b1 z 1 + b2 z 2 . Since the process X2 [n] is used to determine the
desired signal, the eect of the process S2 [n] is removed. The lter g has also length 3 and
its transfer function converges to G(z) = b0 + b1 z 1 + b2 z 2 .
58 Chapter 4.

Exercise 31.

In all the following gures the processes S0 [n] and S1 [n] are zero mean white processes, uncorre-
lated and with variance 1. The lters h0 , h1 , h2 are causal time invariant linear lters. Provide
the suitable schemes with adaptive, stable and causal lters for each section. We can measure
only the signals pointed out of the boxes

(a) Considering the following scheme,

S0 [n] h0 h1

X0 [n]
S1 [n] a0 + a1 z 1 + a2 z 2 1 + 2z 1

1 X1 [n]
1
1 z 1
2

we would like to determine the parameters a0 , a1 , a2 .

(b) In the following scheme,


59

S0 [n] h0

X0 [n]
S1 [n] h1 a0 + a1 z 1 + a2 z 2

X1 [n]
1 + 2z 1

X2 [n]
1 + 7z 1 + 12z 2

We would like to estimate a0 , a1 , a2 .

Solution 31.

(a) The lter to be identied has a transfer function with only a numerator; hence, it is
convenient to place the adaptive lter in parallel with it. To compute the input of the
lter, we can use the process X1 [n] and apply the inverse of the lter

1
.
1 12 z 1

We should also take into account the presence of the lter 1 + 2z 1 on the output X0 [n]
(i.e. the desired process). This lter cannot be inverted, since the zero is outside of the
unit circle. To compensate for it, we had the same ltering operation on the input of the
adaptive lter. The complete diagram is the following
60 Chapter 4.

S0 [n] h0 h1

X0 [n]
S1 [n] a0 + a1 z 1 + a2 z 2 1 + 2z 1 f

X1 [n]
1 12 z 1 1 + 32 z 1 + z 2

Remark that the process S0 [n] does not perturb the convergence of the lter, since it is
orthogonal to the process S1 [n] and it aects the desired signal only.

(b) Since the lter for which we want to estimate the parameters has a transfer function
with only the numerator, we connect the adaptive lter by using the output X0 [n] as
desired process. As input process we can take any combination of the processes X1 [n]
and X2 [n]. Clearly, it is necessary to compensate for the presence of the lters 1 + 2z 1
and 1 + 7z 1 + 12z 2. A way to do that is to add a lter between the output X0 [n] and
the desired signal. An alternative is to combine X1 [n] and X2 [n] with two lters so that
the eect of the lters 1 + 2z 1 and 1 + 7z 1 + 12z 2 is removed and the process W [n] is
obtained (see the gure below). We remark that,

(1 + 2z 1 )(1 12z 1 ) + (1 + 7z 1 + 12z 2 )2 = 1,

which means that (1 12z 1)X1 [n] + 2X2 [n] = W [n] and no additional lter is needed.
The resulting diagram is shown below
61

h0

W [n] X0 [n]
a0 + a1 z 1 + a2 z 2 f

X1 [n]
1 + 2z 1 1 12z 1
X[n]
X2 [n]
1 + 7z 1 + 12z 2 2

Exercise 32.
Consider the following algorithm:

S0 [n] h0

+
S1 [n] h1 h2
+

+
f h3

E[n]

where S0 [n], S1 [n] are jointly Gaussian uncorrelated white processes, with zero mean and unit
variance. The blocks h0 , h1 , h2 and h3 are causal lters with z-transforms

1
H0 (z) = 1 z 1 ,
2
H1 (z) = 1 + z 1 ,
1
H2 (z) = ,
1 12 z 1
H3 (z) = 1 + z 1 ,
62 Chapter 4.

respectively. The block f is an adaptive lter of length L = 3. Remark that the error process is
not taken in the usual way, but it is given by the dierence of the outputs of the lters h2 and
h3 .
(a) Suppose that the lter f adapts ideally in order to minimize the error variance E[E 2 ].
What are the 3 lter coecients?
(b) In the same condition of the question 1), is the process E Gaussian? Is it stationary?
(c) Compute the variance of E.
(d) Suppose that, instead of the ideal case, we adapt the lter f using the LMS algorithm,
what is the range of the step size that you would consider? What do you expect for the
behavior of the lter and error variance J in the case where you take = 1/10? Can you
obtain J = Jmin (i.e. the same variance of question 3)?

Solution 32.
The block diagram may look complicate due to the unusual presence of the lter h3 between
the output of the adaptive lter f and the block that computes the error process. However, we
should remember that an adaptive lter has the same properties of a regular lter. In particular,
in a cascade we can change the order of the lters without changing the overall transfer function
(this comes from the commutative property of convolution). This is of course only valid if we do
not change the error process E[n] that drives the adaptation of the lter. In our case, we can
swap the adaptive lter and the lter h3 . Furthermore, we remark that the lter h1 is equal to
h3 ; therefore, we can move this lter before the split of the process S1 [n]. Concerning the process
S0 [n], we remark that it cannot aect the behavior of the adaptive lter, since the adaptive lter
receives the process S1 [n], which is uncorrelated with S0 [n] (geometrically, we can say that the
lter projects the output of h2 on the space generated by the process S1 [n]). In conclusion, the
process S0 [n], ltered by the cascade of h0 and h2 , appears on the error process E[n]. We remark
also that the convolution of the two lters is simply the identity. We can draw the following
equivalent block diagram:
S0 [n]

X1 [n] Y1 [n] +
S1 [n] h1 h2
+
D[n]
+
f

E[n]

This is the usual problem of identication of the lter h2 by the adaptive lter f .

(a) The optimal lter (corresponding to minimum variance of J) is given by the solution of
the normal equations:
RX1 f = rDX1 ,
63

where the matrix RX1 is the correlation matrix of the vector

X1 [n] = [ X1 [n] X1 [n 1] X1 [n 2] ]T

and the vector rDX1 is the correlation between the process D[n] and the vector X1 , i.e.
rDX1 = E[D[n]X1 [n]]. We recall that the correlation of the process X1 [n] is given by

RX1 [n] = h1 h
1 RS1 [n],

where h
1 represents the time reversal of the impulse response of h1 . This relation gives,

RX1 [n] = [n + 1] + 2[n] + [n 1].

This allows to compute the correlation matrix RX1 , which is



RX1 [0] RX1 [1] RX1 [2] 2 1 0
RX1 = RX1 [1] RX1 [0] RX1 [1] = 1 2 1 .
RX1 [2] RX1 [1] RX1 [0] 0 1 2

Concerning the cross correlation term , we remind that the correlation between the output
of h2 , say Y1 [n], and the input X1 [n] is given by the function

RY1 X1 [n] = h2 RX1 [n] (4.4)

(this can be proved by writing the output as a convolution of the input and taking into
account that the input is stationary). In our case, Y1 [n] is the only part of D[n] that is corre-
lated to X1 [n]; therefore, RDX1 [n] = RY1 X1 [n] and rDX1 = [ RY1 X1 [0] RY1 X1 [1] RY1 X1 [2] ]T .
To compute (??) for n = 0, 1, 2, we take into account that the inverse z-transform of H2 (z)
is h2 [n] = 1/2n 1 [n], which gives

5
RY1 X1 [0] = h[1]RX1 [1] + h[0]RX1 [0] =
2
9
RY1 X1 [1] = h[2]RX1 [1] + h[1]RX1 [0] + h[0]RX1 [1] =
4
9
RY1 X1 [2] = h[3]RX1 [1] + h[2]RX1 [0] + h[1]RX1 [1] = .
8
Therefore, the optimal lter f is

f = R1
X1 rDX1 = [
33
32
7
16
11
32 ]T

(b) The input processes are jointly gaussian and stationary. The process E[n] is computed by
ltering the input processes using linear time-invariant stable lters; therefore, E[n] is a
gaussian stationary process.

(c) The variance of E[n] is simply the minimum cost function Jmin of the adaptive lter, i.e.
2
Jmin = D rTDX1 R1 2 T
X1 rDX1 = D rDX1 f .

2
The variance D is given by
2
D = S2 0 + Y2 1
64 Chapter 4.

and
+


Y2 1 = RY1 [0] = (h2 h
2 RX1 )[0] = h2 [k]h2 [n + k]RX1 [n]
n= k=




1 1 1 1 1
=2 + + = 4.
22k 2k 2k+1 2k 2k1
k=0 k=0 k=1

2
Finally, we compute D = 1 + 4 = 5 and
33

32
Jmin = 5 [ 5 9 9
] 7 = 269 .
2 4 8 16 256
11
32

(d) If LMS is used, a raisonable range for the step size is


2 1
0<< 2 = .
LX 1
3

An alternative is to compute the spectral density of X1 ,

SX1 () = 2 + 2 cos()

and take
2 1
0<< = .
LSX1 ,max 6
If = 1/10, we expect that the LMS iteration converges, since is in the valid range. With
respect to the choice of maximum value of the step size, we expect that the convergence
will be slightly slower, while the error variance will be smaller. Since LMS is a stochastic
gradient method, the lter coecients remain noisy and the error variance never reaches
the minimum variance Jmin .
Chapter 5

Spectral Estimation

Exercise 33. AR once again


Consider a centered (zero-mean) real-valued AR process {Xn }n verifying the equation

X[n + 1] = aX[n] + W [n + 1], n 


where

- a , |a| < 1,

- W [n] is a real-valued white noise (i.e., a sequence of i.i.d. random variables), centered,
with variance 2 > 0.

We now observe a realization x[n] of the AR process X[n] and we would like to estimate the
power spectral density.

(a) Describe precisely a parametric method for estimating the power spectral density of the
AR process X[n].

(b) Suppose you can choose to observe either 100 or 1000 realizations of X[n]. How many
realizations would you choose for your spectral estimator? Justify your answer precisely.

(c) Propose a recursive method for estimating the power spectral density of a more general
AR process
N
 1
X[n + 1] = ai X[n i] + W [n + 1], n . 
i=0

(d) Compare the computational burden of the recursive method with the one of the direct
approach.

Solution 33. AR once again

(a) Using the symbolic notation we can express the process X[n] as

X[n + 1] = aX[n] + W [n + 1],

X[n + 1](1 az 1 ) = W [n + 1],

65
66 Chapter 5.

1
X[n + 1] = W [n + 1].
1 az 1
Then, the power spectral density SX () is given as:
1 1
SX () = 2 = 2 .
|1 aej |2 1 + a2 2a cos

We need to estimate a and 2 . The two parameters can be estimated using Yule-Walker
equations.
(b) Since the noise is white and centered then it is always better to use more realizations for
estimating the covariance matrix used in Yule-Walker equations.
(c) The power spectral density of a more general AR process can be estimated by rst esti-
mating the parameters a0 , . . . , aN 1 with Levinsons algorithm, by starting with a one-step
predictor, and iteratively computing the coecients of higher-order predictors until the co-
ecients of the N -th order predictor has been determined.
(d) In the case of having a model whose order N is known a priory, the computational com-
plexities of both using Levinsons recursive algorithm and directly solving Yule-Walker
equations are O(N 2 ). However, in the case where the model order N is not known a
priory, the computational complexity of Levinsons recursive algorithm stays the same,
whereas the complexity of iteratively solving Yule-Walker equations for dierent orders n,
until the right order N has been found, is O(N 3 ).

Exercise 34. Annihilating filter method vs. MUSIC


Assume that we have a random process X[n] that is composed of 3 complex sinusoids:
3

X[n] = k ej(2fk n+k ) ,
k=1

where (f1 , f2 , f3 ) = (0.2, 0.3, 0.4), (1 , 2 , 3 ) = (1, 2, 3) and the phases k are stationary random
variable, independent and uniformly distributed over [0, 2). The signal is aected by additive
2
zero-mean white noise with W , independent of X[n]. We have the access only to the noisy
realizations, i.e.
Y [n] = X[n] + W [n].
2
(a) Simulate 20 realizations of Y [n] when W = 1 and from this realizations estimate the
frequencies fk and the weights k of the sinusoids using:
(a) annihilating lter method,
(b) MUSIC method.
2
(b) Do the same steps when W = 4 and compare the two methods.
(c) Assume that the signal X[n] is deterministic, i.e. the phases k are known. We want
to estimate fk and k . Can we now use the annihilating lter method and the MUSIC
method? Point out the dierences.

Solution 34. Annihilating filter vs. MUSIC


67

(a) In Matlab we have the following code:

% Signal X
c = [ 1 2 3]; f = [.2 .3 .4]; X = zeros(1,30);
% We have 20 realizations of the process
for i = 1:20
theta = 2*pi*rand(1,3);
W = 1*randn(1); % or 4*randn(1);
% We choose 30 samples from each realization
n = 0:29; X(i,:) = c.*exp(j*theta)*exp(j*2*pi*f*n) + W;
end

% ANNIHILATING FILTER METHOD

% In this part we can only use one realization


Xl = toeplitz(conj(X(1,3:29)), X(1,3:-1:1));

Xr = -conj(X(1,4:30));

h = pinv(Xl)*Xr; root = roots([1conj(h)]);

% frequency estimates
fe = phase(root)/2/pi

% weight estimates
n1 = [ 0 1 2]; ce = abs(inv(exp(j*2*pi*n1*fe)) *conj(X(1,1:3)))

% As the result we have that the frequency the sinusoid with the amplitude
% 1 and frequency 0.2 is not well estimated since it is usually barried
% into the noise. The estimation accuracy for the weights are usually very
% poor.

% MUSIC

% First, we estimate the covariance matrix.


R = zeros(5,5);
for p = 1:20 % 20 realizations
for k=5:30 % 30 samples
R = R + conj([X(p,k) X(p,k-1) X(p,k-2) X(p,k-3) X(p,k-4)])*
[X(p,k) X(p,k-1) X(p,k-2) X(p,k-3) X(p,k-4)];
end
end
R = R/26/20; % average over the number of samples {1/(M-N)} and number of realizations
[G S V] = svd(R);
Gnoise=G(:,4:5); % the eigenvectors that correspond to the noise space

% plot the function in 100 points


n2 = [ 0 1 2 3 4 ]; for k = 1:100
root_music(k)=exp(j*2*pi*(k-1)*n2/100)*Gnoise*Gnoise*exp(j*2*pi*(k-1)*n2/100);
68 Chapter 5.

music(k) = 1/real(root_music(k));
end

figure; plot([0:1/100:1-1/100], real(root_music));


% The frequencies of the zeros of the plot correspond to the frequencies of
% the signal X.
figure; plot([0:99], music);
% The frequencies of the peaks of the plot correspond to the frequencies of
% the signal X.
% The weights can be estimated in the same way as for the annihilating
% filter method or also using the equation (5.15) from the lecture notes.
% The second option is more stable to noise since we are "substracting" the
% noise component.
(b) The annihilating lter method can be used in the same way as for the previous case. The
MUSIC method can be used as well. The only dierence would be when estimating the
weights because we cannot use straight forward the equation (5.15) from the lecture notes
(try to see why).

Exercise 35. Line Spectrum Estimation: the Dual Problem


Let x(t) be a continuous periodic signal of period T ,
 M1

x(t) = ak (t nT tk )
n k=0

where (t) is a Dirac delta function. Assume that you want to use the annihilating lter method to
estimate parameters tk , k = 0 . . . M 1 from an appropriate set of the Fourier series coecients.
[n] of x(t).
(a) Compute the Fourier series coecients x
(b) Write a system of equations that allows you to nd tk for M = 3. What is the minimum
number of Fourier series coecients required for a unique solution?
(c) How does the noisy case dier from the previous case, where the presence of noise was not
considered?

Solution 35. Spectral estimation


(a) The Fourier series coecients of x(t) are given by
 M1
1 T  2
[m] =
x ak (t nT tk )ej T mt dt
T 0
n k=0

 1  T M1
 2
= ak (t nT tk )ej T mt dt
T 0
n k=0
 T M1

1 2
= ak (t tk )ej T mt
dt
T 0 k=0
M1
1  2
= ak ej T mtk
T
k=0
69

(b) For M = 3, we have


1 2 2 2
[m] =
x (a0 ej T mt0 + a1 ej T mt1 + a2 ej T mt2 )
T
Using the annihilating lter method, we will choose a lter of length 4 being [1, h1 , h2 , h3 ].
We have then H(Z)X(m) = 0. In matrix notation we get

1
[3] x
x [2] x
[1] x[0] 0
h
x[4] x[3] x
[2] x[1] 1
h2 = 0

[5] x
x [4] x
[3] x[2] 0
h3

We thus need at least 6 components of Fourier series coecients to have a unique solution.
(c) As already discussed in class, the anhilating lter method is not robust to addition of noise.
For small variance of the noise and large values of the ak coecients, the algorithm will
still perform well, but as soon as the SNR becomes too small, the algorithm cannot be used
anymore.

Exercise 36. Numerical Analysis of the Periodogram (Matlab)


Consider an AR(1) process X[n] = cX[n 1] + W [n] where c 1 (take c = 0.9, X[0] 1) and
W to be an i.i.d. normally distributed noise of zero mean and unit variance.
(a) Write a function pergram that computes a periodogram Rp (ej ) of a process of length M .
Consider rst the signal without noise X[n] = cX[n 1]. For M = 256, 512, 1024 plot in
logarithmic scale Rp and compute the variance of Rp . Comment on your results. Repeat
the experiment for the noisy signal X[n] = cX[n 1] + W [n]. Does the variance decrease
as you increase M ? Explain your answer.
(b) Compute the averaged periodogram of N = 4 segments of size L = 256. Plot the result,
and compute its variance. Did the variance decrease with respect to the M = 256 case in
the exercise 4.1.?
(c) Add a sinusoid s[n] = A sin(2n/F ) to the signal X, with A = 5, F = 10. Compute and plot
the periodogram of the new signal for M = 256, 512, 1024. What do you notice? Compare
the resulting component corresponding to the sinusoid with the component corresponding
to the AR process. Repeat the experiment with the averaged periodogram. How do the two
components mentioned above modify? Next, add one more sinusoid s0[n] with amplitude A
and with a frequency close to the frequency of s[n] (F). Repeat the above experiment and
comment on your results.

Solution 36. The periodogram

(a) The variance of the periodogram Rp (ej ) does not decrease when the number of samples
are increased. This can be directly observed from the fact that the uctuations in the
plots do not decrease as M increases from 256 to 1024. The variance of the periodogram
V ar(Rp (ej )) can be computed by considering several realizations of the process X[n] and
looking at the value of Rp (ej ) for xed .
(b) The spectrum gets smoother by averaging and the variance decreases.
70 Chapter 5.

(c) The harmonic is well detected, while the spectrum of the AR signal is noisy. Averaging
especially helps to smooth the component corresponding to the AR signal. When adding
a second harmonic, if the resolution is not large enough, the two harmonics confound on
the spectrum and are not detectable separately.

Exercise 37. Alinghi II: Mast stress analysis (16 points)

The Mast (the large pole used to hold up the sails) is denitively a critical component of the
sailing boat. Here again, the high technology materials used for its construction are pushed to
their stress limit. During prototype testings, the behavior of the Mast must be monitored so to
assure that it is properly dimensioned: if the mast breaks, the game is over (as for NZ team in
the 2003 edition).

More precisely, we monitor the elongations of the Mast using a piezoelectric sensor positioned at
its middle point, as depicted in the gure below.
71

Much of the information on the mast stress is contained in the power spectrum of the signal
measured by the piezoelectric sensor. In particular such a power spectrum is smooth and can be
approximated by a fractional polynomial

1 
S () =
C(z) z=ej

1) Assuming that the approximation of a smooth spectrum (fractional polynomial) is correct,


precisely describe a method for estimating the spectrum. More precisely we need to

1.a) Estimate the number of parameters describing the spectrum (order of the polynomial,
etc.)
1.b) Estimate the value of such parameters
1.c) Provide an estimation error

We then realize that the smooth spectrum (fractional polynomial) approximation is not exactly
correct.

2) How this will aect the estimation of the number and values of the parameters?

The technical team complains that the method you have proposed is too complicated and ask
you to use a periodogram based approach

3) Give precise arguments to defend your choice.


72 Chapter 5.

Solution 37.

(1) The mean is given by

 1 4 1 1
H0 (z) = 4( )k 2k = = Y0 (z) = X(z)
3 1 + 13 z 1 P (z) P (z)

X[n] is WSS, so Y0 [n] is WSS too.

(2) From above it is clear that it is AR process.

(3) It has exactly the structure of correlation of AR processes; referee to the lecture notes.

(4) To check if the impulse response is really given by h0 [k], one needs to build up the analysis
lter and multiply it by the output of the system. The obtained signal should be white
noise.

(5) The wiener lter is the optimal lter that could be designed. The process is WSS and such
lter can be used.

Exercise 38. White Noise Periodogram


Let y(t) be a zero-mean white noise with variance 2 and let

N 1
1  2
Y (k ) = y(l)ejk l ; k = k (k = 0, . . . , N 1)
N l=0 N

denote its (normalized) DFT evaluated at the Fourier frequencies k .

(a) Derive the covariances

E[Y (k )Y (r )], k, r = 0, . . . , N 1

k) =
(b) Use the result of the previous calculation to conclude that the periodogram (
|Y (k )|2 is an unbiased estimator of the PSD of y(t).

(c) Explain whether the unbiasedness property holds for = k as well. Present an intuitive
explanation of your nding.

Solution 38. White Noise Periodogram


73

(a)

N 1 N 1
1  1 
E[Y (k )Y (r )] = E[ y(l)ejk l y (m)ejr m ]
N l=0 N m=0
N 1 N 1
1  
= E[ y(l)y (m)ej(k lr m) ]
N m=0
l=0
N
 1 N
 1
1
= E[y(l)y (m)]ej(k lr m)
N
l=0 m=0
N
 1 N
 1
1
= 2 lm ej(k lr m)
N
l=0 m=0
N 1
1 2  jl 2 (kr)
= e N
N
l=0
1 2
= N kr
N
= 2 kr .

(b) Since y(t) is a zero-mean white noise with variance 2 , we know that its PSD is equal to
the variance:

Sy () = 2 .

k ) is an unbiased estimator of the PSD, we should show that its expecta-


To show that (
tion is equal to the actual value of the PSD. We could do it in the following way:

k )] =
E[( E[|Y (k )|2 ]
= E[Y (k )Y (k )]
= 2 kk
= 2 .


(c) Taking () = |Y ()|2 for any , we can nd the expectation of the estimator ()
in the
74 Chapter 5.

following way:

E[()] = E[|Y ()|2 ]
N 1 N 1
1  1 
E[Y ()Y ()] = E[ y(l)ejl y (m)ejm ]
N l=0 N m=0
N 1 N 1
1  
= E[ y(l)y (m)ej(lm) ]
N m=0
l=0
N
 1 N
 1
1
= E[y(l)y (m)]ej(lm)
N
l=0 m=0
N
 1 N
 1
1
= 2 lm ej(lm)
N
l=0 m=0
N 1
1 2  j0
= e
N
l=0
1 2
= N
N
= 2 .

Since the expected value of the estimator () is equal to the actual value of the PSD
Sy (), it is unbiased for any frequency .

Exercise 39. Window selection for Blackman-Tukey method


Consider the case when the signal is composed of two harmonic components which are spaced
in frequency by a distance larger than 1/N . If you were to use a Blackman-Tukey method for
spectral estimation, what window would you use if:
(a) The two spectral lines are closely-spaced in frequency, and they have similar magnitudes?
(b) The two spectral lines are not closely-spaced in frequency, and their magnitudes dier
signicantly?

Solution 39. Window selection for Blackman-Tukey method


(a) In order to discriminate between the two spectral lines of similar magnitudes spaced at
a distance slightly larger than 1/N , one needs to use a window which provides the best
spectral resolution, that is, the rectangular window (or the unmodied periodogram).
(b) Although using a window with high spectral resolution is benecial, it has its downsides.
Namely, such windows have worse side-lobe attenuation, which in the case of Blackman-
Tukey method means that the spectral lines leak more energy to the surrounding frequen-
cies.
In the problem at hand, if the used window has low side-band attenuation, the spectral
line with signicantly higher magnitude can leak enough energy to the frequencies around
the weaker spectral line as to make the weaker spectral line less pronounced and harder
to detect. In order to reduce the contribution of the more powerful spectral line to the
far-away frequencies, one should use windows which have a better side-lobe attenuation,
such as Hamming, von Hann, variants of Kaiser window etc.
75

Exercise 40. Spectral factorization and estimation


Let {X[n]}nZ be a centered AR process with the power spectral density of the form

b
SX () = , |a1 | < 1, |a2 | < 1, b > 0 ,
(1 + a21 2a1 cos )(1 + a22 2a2 cos )

where a1 , a2 and b are unknown real valued parameters.

(a) Give the canonical representation of the process X[n]

P (z)X[n] = W [n]

(give the whitening lter P (z) and the variance 2 of the noise process {W [n]}nZ).

(b) Give the procedure to determine the parameters a1 , a2 and b of the AR process X[n], and
to estimate the power spectral density SX ().

Solution 40. Spectral factorization and estimation


The power spectral density SX () can be transformed in the following way:

b
SX () =
(1 + a21 2a1 cos )(1 + a22 2a2 cos )
b
=
|1 a1 ej |2 |1 a2 ej |2
b
= .
|(1 a1 ej )(1 a2 ej )|2

(a) The power spectral density of an AR process has the form

1
S() = 2
|P (ej )|2 W
2
where P (z) is the minimum phase whitening lter, and W is the noise variance. Since
|a1 | < 1 and |a2 | < 1, the polynomial P (z) = (1 a1 z )(1 a2 z 1 ) is strictly minimum
1

phase, and since b > 0, we can see that the PSD SX () corresponds to an AR process
P (z)X[n] = W [n], whose whitening lter is given by

P (z) = 1 (a1 + a2 )z 1 + a1 a2 z 2 ,
2
with the noise W [n] having the variance W = b.

(b) Making the substitutions p1 = a1 + a2 and p2 = a1 a2 , we can write P (z) = 1 p1 z 1


p2 z 2 . The parameters p1 , p2 and b can be determined by solving the following Yule-Walker
equations:

X [1] + p2 R
b + p1 R X [2] = R
X [0]
X [0] + p2 R
p1 R X [1] = R
X [1]

p1 RX [1] + p2 RX [0] = R X [2] ,

where R X [1] and R


X [0], R X [0] are the empirical correlation estimates at lags 0, 1 and 2.
76 Chapter 5.

Once the parameters p1 , p2 and b have been determined, the parameters a1 and a2 can be
determined by solving the non-linear system

a1 + a2 = p1
a1 a2 = p2 ,

under the constraints that |a1 | < 1 and |a2 | < 1. Furthermore, the estimated power spectral
density SX () has the form

b
SX () = .
|1 p1 ej p2 e2j |2
Chapter 6

Transforms

Exercise 41. Karhunen-Lo`


eve Transform
(a) Consider an i.i.d. (independent, identically distributed) sequence of normalized Gaussian
random variables X0 , X1 , . . . , XN 1 (E[Xi ] = 0, E[Xi2 ] = 1, for i = 0, 1, . . . , N 1). Dene
a new set of random variables Y = [Y0 , Y1 , . . . , YN 1 ]T as

X0
X1

Y = A .. ,
.
xN 1
where
0,0 0,1 0,N 1
1,0 1,1 1,N 1

A= .. .. ..
. . .
N 1,0 N 1,1 N 1,N 1
is a real square matrix.
Show that the correlation function satises:
N
 1
Ri,j = E[Yi Yj ] = i,k j,k ,
k=0

for i, j = 0, 1, . . . , N 1.
(b) Show that the following equality holds:
N
1 1
1/2
det (A) = i ,
i=0

where i s are eigenvalues of the correlation matrix dened as



r0,0 r0,1 r0,N 1
r1,0 r r1,N 1
1,1
Ry = .. .. .. .
. . .
rN 1,0 rN 1,1 rN 1,N 1

77
78 Chapter 6.

(c) Consider a time sequence of random vectors Y[n] = [Y0 [n], Y1 [n], . . . , YN 1 [n]]T . The KLT
of the random signal Y[n] is obtained as Z[n] = T Y[n], where the rows of the matrix
T are the eigenvectors of the correlation matrix of the signal Y[n] (sorted in descending
order of the corresponding eigenvalues).
Show that the resulting vector coecients Z[n] are uncorrelated. Are they independent?

Solution 41. Karhunen-Lo`


eve Transform

(a) The set of random variables Y is determined as

N
 1
Yi = i,k X[k].
k=0

The correlation function is given by


N
 1 N
 1
Ri,j = E[Yi Yj ] = E[ i,k j,l Xk Xl ]
k=0 l=0
N
 1 N
 1
= i,k j,l E[Xk Xl ].
k=0 l=0

Since the random variables Xi are normalized independent Gaussian variables, we know
that E[Xk ] = 0 and E[Xk2 ] = 1. It follows that E[Xk Xl ] = 0 for k = l. Therefore, we can
write
E[Xk Xl ] = [k l].

The correlation is now given by


N
 1 N
 1
Ri,j = i,k j,l [k l],
k=0 l=0

or, equivalently
N
 1
Ri,j = i,k j,k .
k=0

(b) From (a) we can see that the correlation matrix RY is dened as RY = A AT . If we
apply the det operator, we obtain
$ % 2
det (RY ) = det (A) det AT = (det (A)) . (6.1)

The correlation matrix Ry can also be expressed in terms of its eigenvalues and eigenvectors
as
RY = VY Y VYT ,
where Vy is a matrix, which contains the eigenvectors as columns and y is a diagonal
matrix with the eigenvalues along the diagonal.
79

Similarly, we can write

$ T % N1
1
det (RY ) = det (VY ) det (Y ) det VY = i (6.2)
i=0

because Y is diagonal and det (VY ) = 1. Therefore from (??) and (??), we have
N
1 1
2
det (RY ) = (det (A)) = i .
i=0

It follows that
N
1 1
1/2
det (A) = i .
i=0

(c) The KLT matrix T is given by T = VYT because it contains the eigenvectors of the corre-
lation matrix as the rows. Therefore, we can write

Z[n] = VYT Y[n].


 
The correlation RZ is given by RZ = E Z[n] ZT [n] . It follows
 
RZ = E VYT Y[n] YT [n] VY = VYT RY VY .

From (b) we know that RY = VY Y VYT . Therefore

RZ = VYT VY Y VYT VY = Y ,

because VYT VY = I. The correlation matrix RZ is diagonal and, thus, the variables
Zi [n], for i = 0, 1, . . . , N 1, are uncorrelated. Since the random variables are Gaussian,
uncorrelation is equivalent to independence.

Exercise 42. Using the Karhunen-Lo`


eve Transform in Matlab

(a) Generate an i.i.d. sequence of 5 normalized Gaussian random variables X0 , X1 , X2 , X3


and X4 .
(b) Using the sequence generated in (a) and the results from the theoretical part, generate a
sequence Y[n] of M = 10000 i.i.d. jointly Gaussian random vectors of size N = 5 (the
corresponding signal matrix has the size N M ) with the following correlation matrix:

1.9 0.5 0.3 0.2 0.05
0.5 2.3 0.4 0.2 0.1

RY = 0.3 0.4 1.5 0.9 0.7 .

0.2 0.2 0.9 1.1 0.8
0.05 0.1 0.7 0.8 1.2

Hint: You may use the Matlab function eig to calculate the eigenvectors and eigenvalues.
Y of the generated sequence. How far is it from the
(c) Evaluate the correlation matrix R

specied RY ? Compute Ry for dierent values of M and compare it to RY .
80 Chapter 6.

(d) Derive the KLT matrix T based on the evaluated correlation R Y . Calculate the transform
coecients Z = T Y. Now, evaluate the resulting correlation of the transform coecients
Z . Is it diagonal?
R

Solution 42. Using the Karhunen-Lo`


eve Transform in Matlab

(a) In Matlab:

N = 5; x = randn(N,1);

(b) First, let us generate the Gaussian normalized sequence X[n] of the length M :

M = 10000; x = randn(N,M);
1/2
Now, choose the matrix A, as A = VY Y , where VY and Y are eigenvectors and
eigenvalues of the autocorrelation matrix RY . This ensures that the correlation of the
variables Yi s is given by RY in limit (Exercise 1).

Ry = [1.9 0.5 0.3 0.2 0.05;


0.5 2.3 0.4 0.2 0.1;
0.3 0.4 1.5 0.9 0.7;
0.2 0.2 0.9 1.1 0.8;
0.05 0.1 0.7 0.8 1.2];
[Vy,Ly] = eig(Ry); A = Vy*Ly^0.5; y = A*x;

(c) The correlation of the generated sequence y is evaluated by

Ry1 = (y*y)/M;

y approximates the expected correlation Ry . They are not exactly the


The correlation R
same because of the nite length of the Gaussian sequence x. As the length M grows, the
approximated correlation R y is closer to the expected correlation Ry .

(d) The KLT matrix T contains the eigenvectors of the estimated correlation matrix R y as
rows in descending order of the corresponding eigenvalues. The KLT and the correlation
Rz are calculated by

% Compute the eigenvectors and eigenvalues of the estimated correlation matrix Ry1
[Vy1,Ly1] = eig(Ry1);
% Then, sort the eigenvalues
[Lsorted,I] = sort(diag(Ly1));
% Arrange them in descending order (sort gives ascending order)
I = I(length(I):-1:1);
% Take the corresponding columns from Vy1 and put them as rows in T
T = Vy1(1:N,I);
% Apply the KLT
z = T*y;
% Compute the correlation Rz
Rz = (z*z)/M;
81

The correlation Rz is diagonal. This is expected since we used the estimated correlation
matrix R y . The KLT is obviously signal-dependent, because it is constructed using the
properties of the generated signal.

Exercise 43. Karhunen-Loe ve transform


T
Consider a block of Gauss-Markov rst-order random variables of size 4: X = [X0 , X1 , X2 , X3 ] .
Its correlation matrix is given by:

1 2 3
1 2
RX =
2 1

3 2 1

where is the correlation coecient between two adjacent random variables.


Now take two sub-transforms of size 2 2, namely, KLT of [X0 , X1 ]T and KLT of [X2 , X3 ]T to
T
produce Y = [Y0 , Y1 , Y2 , Y3 ] .

(a) What is the resulting transform?

(b) Calculate the resulting correlation matrix RY .

(c) Calculate the coding gain associated to RY , i.e. the two sub-transforms, and compare with
the coding gain of the KLT.

Solution 43. Karhunen-Lo


eve transform

(a) The autocorrelation of [X0 , X1 ] is


 
1
R[X0 ,X1 ] =
1

The
eigenvectors of this 2 2 symmetric matrix are always v0 = 1/ 2[1, 1]T and v1 =
1/ 2[1, 1]T Using
 
1 1
T = 1/ 2
1 1

as a transform, we get
 
1+ 0
R[Y0 ,Y1 ] =
0 1

The resulting transform to be used is



Y0 1 1 0 0 X0
Y1 1 1 0 0 X1

Y2 = 1/ 2 0 0 1 1 X2
Y3 0 0 1 1 X3
82 Chapter 6.

(b) The resulting RY is obtained by


RY = T RX T T .
Calculations give

1+ 0 /2(1 + )2 /2(1 2 )
0 1 /2(2 1) /2(1 )2
RY =
/2(1 + )2

2
/2( 1) 1+ 0
/2(1 2 ) /2(1 )2 0 1

(c)
1 1
GY = 1/4
=
(3i=0 RY2 (i)) ((1 + )2 (1 )2 )1/4
1 1
GKT L = 1/4
= 1/4
(det Rx ) (1 32 + 34 6 )
Remark that GKT L > GY for any . This is in agreement with the fact that the KLT
maximizes the coding gain.

Exercise 44. KLT of circulant correlation matrices


Let X be a real periodic sequence of period N = 4 with correlation matrix Rx :

1 0.4 0.2 0.4
0.4 1 0.4 0.2
Rx = 0.2 0.4

1 0.4
0.4 0.2 0.4 1

(a) Compute its KLT, that is, the transform T that diagonalizes Rx .

(b) Consider now the DFT matrix SN of size N = 4. Compute SN Rx SN . What do you obtain?
Recall that the DFT can be formulated as a complex matrix multiplication X[k] = SN x[n]
where the DFT matrix SN is given by SN [k, n] = WNkn .
(c) Compare both solutions. What can you conclude?

Solution 44. KLT of circulant correlation matrices

(a) The KLT matrix T is given by the eigenvectors of Rx :



1/2 1/2 1/2 1/2
0 (2)/2 0 (2)/2
T =
 

(2)/2 0 (2)/2 0
1/2 1/2 1/2 1/2

To show that this is indeed a KLT matrix we compute T Rx T T :



0.4 0 0 0
0 0.8 0 0
T Rx T =
T
0
,
0 0.8 0
0 0 0 2
which is a diagonal matrix.
83

(b) The DFT matrix SN [k, n] = WNkn of size N = 4 is given by:



1 1 1 1
1 j 1 j
D= 1 1 1
.
1
1 j 1 j


If we compute now SN Rx SN we obtain:

8 0 0 0
0 3.2 0 0

SN Rx SN =
0
,
0 1.6 0
0 0 0 3.2

which is also a diagonal matrix.


(c) Both transforms T and SN give a diagonal correlation matrix and can be used as a decor-
relation transform. However, the DFT matrix is constant for a given N and much easier
to compute that the KLT matrix. However, the DFT matrix does not always produce the
same results of the KLT. This exercise is a particular case where X is periodic and Rx is a
circulant matrix. The reason is that the DFT matrix diagonalizes ANY circulant matrix.
Therefore, if Rx is a circulant matrix, the DFT matrix is preferable as a decorrelation
transform.

Exercise 45. Karhunen-Lo eve transform


Let Rx be the correlation matrix of a real periodic sequence x of period N .

(a) Calculate Rx for the case when N = 2 and Rx = [1, 0.5]. Find the KLT of x, that is,
y = T x.
(b) Assume next that N = 4 and the correlation matrix Rx of the sequence x is given by

1 0.4 0.2 0.4
0.4 1 0.4 0.2

Rx = (6.3)
0.2 0.4 1 0.4
0.4 0.2 0.4 1

Give a 4 4 matrix T that diagonalizes Rx . What is the resulting correlation matrix Ry ?

Hint: Note that the DFT matrix diagonalizes any circulant matrix.

Solution 45. Karhunen-Loe ve transform


Since x[n] is periodic, the correlation function is also periodic, i.e., Rx [n] = Rx [n+kN ], n, k Z.
Therefore, if we use blocks of N consecutive samples of x[n], we obtain vectors whose correlation
matrix is circulant.

(a) In this case N = 2 and Rx [n] = [1, 0.5] and the correlation matrix is
 
1 0.5
Rx = ,
0.5 1
84 Chapter 6.

and the KLT is the matrix


" #
1 1
H= 2 2 .
1 12
2

Remark that this is also the DFT of size 2 (properly scaled).



(b) In this case, we should use the DFT of size 4 and normalize 4, i.e.,

1 1 1 1
1 j 1 j
H = 12
1 1 1 1
1 j 1 j

we have that Y = H X; therefore,



2
0.8
RY = H Rx H =



0.4
0.8

Exercise 46. Discrete Cosine Transform in Matlab


The KLT is a signal-dependent transform. This property is inconvenient if a signal has to be
transmitted because the receiver needs to know both the transform coecients and the transform
basis vectors. The Discrete Cosine Transform (DCT) is signal-independent and very close to the
optimal KLT in terms of correlation of the transform coecients.

(a) For the jointly Gaussian sequence of vectors derived in Exercise 4 of the numerical part,
calculate the DCT coecients. Hint: Use the Matlab function dct.

(b) Evaluate the correlation matrix of the DCT coecients. How far is it from the KLT
correlation matrix?

Solution 46. Discrete Cosine Transform

(a) The Matlab function dct performs the DCT along columns of the input matrix, thus

z1 = dct(y);

(b) The correlation is calculated similarly as in Exercise 4:

Rz1 = (z1*z1)/M;
85

Notice that the correlation matrix of the DCT coecients is not diagonal, but the diagonal
terms carry most of energy in the transform domain. Comparing the DCT coecient
correlation to the KLT coecient correlation produced in Exercise 4, we can conclude that
the DCT coecients are more correlated. However, the main advantage of the DCT is
xed structure, that is, the basis vectors do not depend on the signal.

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy