Parameter Estimation in Laplace Noise
Ta-Hsin Li
Kai-Sheng Song
Department of Statistics
Florida State University
Tallahassee, FL 32306-4330, USA
kssong@stat.fsu.edu
I. INTRODUCTION
Consider the problem of estimating the frequency ω of a sinusoidal signal from noisy observations

  y_t := A cos(ωt) + B sin(ωt) + ε_t   (t = 1, . . . , n),   (1)

where {ε_t} denotes the noise. A standard approach is to maximize the periodogram n⁻¹ |Σ_{t=1}^n y_t exp(−itω)|², with i := √−1, and the asymptotic variance of this estimator coincides with the Gaussian Cramér–Rao lower bound (CRLB).
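By way of illustration, here is a minimal Python sketch of the periodogram estimator; the grid resolution and the direct matrix evaluation are our own implementation choices rather than anything prescribed in the paper:

```python
import numpy as np

def periodogram_freq(y):
    """Return the maximizer of the periodogram n^{-1} |sum_t y_t exp(-i t w)|^2
    over a finely oversampled frequency grid on (0, pi)."""
    n = len(y)
    t = np.arange(1, n + 1)
    grid = np.linspace(0.0, np.pi, 16 * n, endpoint=False)[1:]  # skip w = 0
    pgram = np.abs(np.exp(-1j * np.outer(grid, t)) @ y) ** 2 / n
    return grid[np.argmax(pgram)]
```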
Analytical and simulation studies show that typical frequency estimation procedures that exist in the literature either
attain the Gaussian CRLB asymptotically (e.g., the NLS
method [2]) or fall short of it (e.g., the signal/noise subspace
methods [16]). The question is: can we do better than the
Gaussian CRLB in non-Gaussian cases?
In this paper, we provide a positive answer to the question.
Toward that end, we first examine the CRLB in non-Gaussian
cases generally and show that the Gaussian CRLB is the
worst-case performance limit, i.e., the largest lower bound,
among a large family of noise distributions. Then, we focus
on the case of Laplace noise and show that the Laplace MLE
attains asymptotically the Laplace CRLB which is only one
half of the Gaussian CRLB.
The Gaussian assumption of the noise is often made
in practice for its mathematical tractability rather than its
goodness of fit to the data. In reality, departures from the
Gaussian assumption can occur in many different forms, one
of which is heavy tails. A heavy-tailed distribution has greater
tail probabilities than what the Gaussian model suggests.
It manifests itself as outliers in the observations that can
cause algorithms developed under the Gaussian assumption
to malfunction. The Laplace distribution has heavier tails than
the Gaussian distribution and therefore is often used to model
heavy-tailed data in the statistical literature. The Laplace
distribution can also be used as a surrogate in developing algorithms that are more robust against outliers, or in solving problems that have no solution under the Gaussian assumption (e.g., blind deconvolution of non-minimum-phase systems).
As with the methods of Gaussian maximum likelihood and periodogram maximization, it is very difficult to compute the Laplace MLE without an extremely good initial guess, because the likelihood function is full of local extrema.
Let the ε_t be i.i.d. random variables with mean zero, variance σ² > 0, and probability density function p(x). Assume that p(x) is almost everywhere differentiable with derivative ṗ(x) such that

  η := σ² ∫_{p(x)>0} {ṗ(x)}²/p(x) dx < ∞.

Under these assumptions, the Fisher information matrix of θ := [A, B, ω]ᵀ satisfies

  I(θ) = η I_g(θ).

In this expression, I_g(θ) is the Fisher information matrix under the Gaussian assumption, i.e.,

  I_g(θ) := σ⁻² XᵀX,

where X := [x₁, x₂, x₃], x₁ := vec[cos(ωt)], x₂ := vec[sin(ωt)], and x₃ := vec[−At sin(ωt) + Bt cos(ωt)]. With the normalization D_n := diag(n^{1/2}, n^{1/2}, n^{3/2}), the design matrix satisfies

  D_n⁻¹ (XᵀX) D_n⁻¹ → Γ := [ 1/2     0       B/4
                              0      1/2     −A/4
                              B/4    −A/4    (A²+B²)/6 ]   (symmetric).
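This limit is easy to check numerically; the following sketch (our own illustration, with arbitrary parameter values) compares the normalized matrix with Γ:

```python
import numpy as np

# Numerical check of D_n^{-1} (X^T X) D_n^{-1} -> Gamma for arbitrary A, B, omega.
A, B, w, n = 1.0, 0.5, 1.1, 100_000
t = np.arange(1, n + 1)
X = np.column_stack([np.cos(w * t),
                     np.sin(w * t),
                     t * (-A * np.sin(w * t) + B * np.cos(w * t))])
Dninv = np.diag([n ** -0.5, n ** -0.5, n ** -1.5])
Gamma = np.array([[0.5, 0.0,  B / 4],
                  [0.0, 0.5, -A / 4],
                  [B / 4, -A / 4, (A**2 + B**2) / 6]])
print(np.round(Dninv @ (X.T @ X) @ Dninv - Gamma, 3))  # approximately the zero matrix
```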
To verify this identity, write the log-likelihood function of θ given the data y as

  L(θ | y) = Σ_{t=1}^n log p(y_t − s_t(θ)),

where s_t(θ) := A cos(ωt) + B sin(ωt). The partial derivatives are

  ∂L/∂A = −Σ_{t=1}^n {ṗ(ε_t)/p(ε_t)} cos(ωt),

  ∂L/∂B = −Σ_{t=1}^n {ṗ(ε_t)/p(ε_t)} sin(ωt),

  ∂L/∂ω = −Σ_{t=1}^n {ṗ(ε_t)/p(ε_t)} t {−A sin(ωt) + B cos(ωt)},

so that

  I(θ) := E{∇L(θ | y) ∇ᵀL(θ | y)} = E[{ṗ(ε)/p(ε)}²] (XᵀX) = (η/σ²) (XᵀX).

In particular, the third column of (XᵀX)⁻¹ is asymptotically {2/(A²+B²)} [−6B/n², 6A/n², 12/n³]ᵀ, so the CRLB for the frequency, the (3,3) element of I⁻¹(θ), is asymptotically 24σ²/{η n³ (A²+B²)}.
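For Laplace noise this factor can be computed explicitly. The following worked calculation (elementary, included here for completeness) shows that η = 2, which is the source of the factor-of-two gain claimed in the Introduction:

```latex
% Laplace density with variance \sigma^2, i.e., scale b = \sigma/\sqrt{2}:
%   p(x) = \frac{1}{2b} e^{-|x|/b}, \qquad \dot p(x)/p(x) = -\operatorname{sign}(x)/b .
\eta := \sigma^2 \int_{p(x)>0} \frac{\{\dot p(x)\}^2}{p(x)}\,dx
      = \sigma^2 \int_{-\infty}^{\infty} \frac{p(x)}{b^2}\,dx
      = \frac{\sigma^2}{b^2} = 2
\;\Longrightarrow\;
I(\theta) = 2\,I_g(\theta), \qquad
\mathrm{CRLB}_{L}(\omega) = \tfrac{1}{2}\,\mathrm{CRLB}_{G}(\omega).
```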
TABLE I
ROLE OF BANDWIDTH PARAMETER

  Bandwidth 1 − α = O(n^{−δ})   Initial Accuracy ω̂_n^{(0)} − ω   Final Accuracy ω̂_n − ω
  δ = 0                         O(1)                              O(n^{−1/2})
  δ ∈ (1/5, 1/2]                O(n^{−δ})                         O(n^{−(1+3δ)/2})
  δ ∈ (1/2, 1)                  O(n^{−δ})                         O(n^{−(2+δ)/2})
  δ = 1                         O(n^{−1})                         O(n^{−3/2})
The gist of the method is the following [19]. For any given η ∈ (−2α/(1+α²), 2α/(1+α²)) with fixed α ∈ (0, 1), let y_t(η) be obtained by filtering the data with the 2nd-order IIR filter H(z⁻¹) := {1 − (1+α²) η z⁻¹ + α² z⁻²}⁻¹, i.e.,

  y_t(η) := H(z⁻¹) y_t,

and define

  η_n(η) := {2α Σ_{t=1}^n y_t(η) y_{t−1}(η)} / {(1+α²) Σ_{t=1}^n y_{t−1}(η)²}.

Using this function of η, a sequence {η̂_n^{(m)}} can be obtained from the accelerated version of fixed-point iteration

  η̂_n^{(m)} := 2 η_n(η̂_n^{(m−1)}) − η̂_n^{(m−1)}   (m = 1, 2, . . . ),

whose limit is a fixed point of η_n(·),

  lim_{m→∞} η̂_n^{(m)} = η̂_n = η_n(η̂_n),

from which the frequency estimate

  ω̂_n := arccos{(1+α²) η̂_n/(2α)}

is produced. The consistency and asymptotic normality of ω̂_n as an estimator of ω can be established [19]; a code sketch of the iteration is given after the three-step algorithm below.

Proof. We omit the proof here, except to say that the key in proving the assertion is to establish that asymptotically

  R_n(δ) := ℓ₁(θ + D_n⁻¹ δ) − ℓ₁(θ),

where ℓ₁(θ) := Σ_{t=1}^n |y_t − s_t(θ)| is the sum of absolute errors, can be approximated in distribution by the random process

  R(δ) := δᵀ z + {√2/(2σ)} δᵀ Γ δ

for a certain Gaussian random vector z.
In this algorithm the bandwidth parameter α plays an
important role in determining the required accuracy of the
initial values and the final accuracy of the frequency estimator.
The relationship is summarized in Table I.
Based on these results, the following three-step algorithm
(TSA) was proposed in [19] to bring an initial value of
accuracy O(1) to a final estimate of accuracy arbitrarily close
to O(n^{−3/2}) at the cost of computational complexity O(n log n):
1. Take 1 − α₁ = O(1) to accommodate initial values of accuracy O(1); iterate O(log n) times to obtain an estimate of accuracy O(n^{−1/2}).
2. Take 1 − α₂ = O(n^{−1/3}) and use the result from Step 1 as the initial value; iterate O(1) times to get an estimate of accuracy O(n^{−1}).
3. Take 1 − α₃ = O(n^{−δ}) with δ ∈ (1/2, 1) and use the result from Step 2 as the initial value; iterate O(1) times to reach the final accuracy O(n^{−(2+δ)/2}), which approaches O(n^{−3/2}) as δ → 1. A sketch of the whole procedure is given below.
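To make the initialization concrete, the following minimal Python sketch implements the filtering iteration and the three-step schedule as reconstructed above; the function names and the clipping safeguard in the arccos step are ours, and the α values and iteration counts mirror the simulation reported below:

```python
import numpy as np
from scipy.signal import lfilter

def eta_map(eta, y, alpha):
    """Evaluate eta_n(eta): filter y with H(z^-1) = 1/(1 - (1+a^2) eta z^-1 + a^2 z^-2)
    and form the normalized lag-one statistic of the filtered series."""
    yf = lfilter([1.0], [1.0, -(1.0 + alpha**2) * eta, alpha**2], y)
    return (2.0 * alpha * np.sum(yf[1:] * yf[:-1])) / ((1.0 + alpha**2) * np.sum(yf[:-1] ** 2))

def cm_iterate(omega0, y, alpha, n_iter):
    """Accelerated fixed-point iteration eta^(m) := 2 eta_n(eta^(m-1)) - eta^(m-1),
    started from the frequency value omega0."""
    eta = 2.0 * alpha * np.cos(omega0) / (1.0 + alpha**2)
    for _ in range(n_iter):
        eta = 2.0 * eta_map(eta, y, alpha) - eta
    return np.arccos(np.clip((1.0 + alpha**2) * eta / (2.0 * alpha), -1.0, 1.0))

def tsa(omega0, y):
    """Three-step schedule (alpha values and iteration counts as in the simulation)."""
    n = len(y)
    omega = cm_iterate(omega0, y, 0.85, 6)              # Step 1: fixed bandwidth
    omega = cm_iterate(omega, y, 1.0 - n ** -0.6, 3)    # Step 2: shrinking bandwidth
    return cm_iterate(omega, y, 1.0 - n ** -0.9, 11)    # Step 3: near-final bandwidth
```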
[Fig. 1: frequency estimate versus number of iterations (frequency axis 0.0–0.40, iteration axis 10–30).]

[Fig. 2: 1/MSE (dB) versus sample size (50–900) for the TSA, NLS, and MLE estimates, together with the Gaussian CRLB (CRLB_G) and the Laplace CRLB (CRLB_L).]
Because the ℓ₁ criterion is not everywhere differentiable, derivative-based routines suitable for the NLS cannot be used to compute the Laplace MLE. Fortunately, there are plenty of general-purpose algorithms that do not require differentiability. The simplex algorithm of
Nelder and Mead [24], available in many software packages
such as Mathematica and R, is such an example. There are also
special algorithms designed for nonlinear regression with the
ℓ₁ norm. For example, the interior point algorithm proposed
in [25] for nonlinear quantile regression is readily available
for R (the function nlrq in the quantreg package).
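As an illustration, the following minimal Python sketch computes the Laplace MLE by minimizing the sum of absolute errors with the Nelder–Mead method; the simulated data and the initial value are hypothetical stand-ins for the setting of the paper, where the TSA output would serve as the starting point:

```python
import numpy as np
from scipy.optimize import minimize

def ell1(theta, y, t):
    """l1 criterion: sum of absolute errors of the sinusoidal fit."""
    A, B, w = theta
    return np.abs(y - A * np.cos(w * t) - B * np.sin(w * t)).sum()

# Simulated data with unit-variance Laplace noise (scale sigma/sqrt(2), sigma = 1).
rng = np.random.default_rng(0)
n, w0 = 500, 0.15 * 2.0 * np.pi
t = np.arange(1, n + 1, dtype=float)
y = np.cos(w0 * t) + rng.laplace(scale=1.0 / np.sqrt(2.0), size=n)

# Start from a sufficiently accurate initial value (in practice, the TSA output).
theta0 = np.array([1.0, 0.0, w0 + 0.5 / n])
res = minimize(ell1, theta0, args=(y, t), method="Nelder-Mead")
A_hat, B_hat, w_hat = res.x
```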
Fig. 2 shows the result of a simulation study in the case
of Laplace noise based on 1,000 Monte Carlo trials for each
sample size. The signal and noise parameters are: A = 1, B = 0,
ω = 0.15 × 2π, and the SNR is 0 dB (normalized for each trial). The three values of α in the TSA procedure are 0.85, 1 − n^{−0.6}, and 1 − n^{−0.9}. The numbers of iterations with these values are 6, 3, and 11 for each trial. The TSA procedure is initialized with Prony's estimator, which is known to be biased regardless of the sample size and to have a standard error of O(n^{−1/2}), so the
initial MSE is merely O(1). Fig. 2 shows that the estimates
from the TSA procedure attain the Gaussian CRLB for all
the sample sizes. It also shows that using the TSA estimates
as initial values to minimize ℓ₂(θ) by a standard optimization routine (in this case the Nelder–Mead algorithm) does not lead to improved accuracy. However, by replacing ℓ₂(θ) with ℓ₁(θ), the initial TSA estimates are improved considerably, and the final estimates (the Laplace MLE) closely follow the Laplace CRLB, as predicted by Theorem 2, except for the smallest sample size n = 50. As expected, the improvement is approximately 3 dB, corresponding to the factor-of-two reduction of the CRLB.
V. CONCLUDING REMARKS
In this paper we have demonstrated the possibility of
achieving more accurate frequency estimates than the Gaussian
CRLB suggests for sinusoidal signals in non-Gaussian noise.
In particular, we have shown that in the case of Laplace noise
the maximum likelihood estimator, which minimizes the sum
of absolute errors, is able to attain asymptotically the Laplace
CRLB which is 50% smaller than the Gaussian CRLB attained
by nonlinear least squares and periodogram maximization.
In addition to the theoretical findings, we have also proposed
a computational procedure to obtain the maximum likelihood
estimates numerically. The procedure utilizes an iterative algorithm proposed in [19], [20] to produce sufficiently accurate
initial values for standard optimization routines. Owing to the
global convergence property of the initialization algorithm, the
proposed procedure is able to accommodate poor initial values
of accuracy O(1) and produce a final estimator of accuracy
O(n^{−3/2}) that attains the Laplace CRLB for sufficiently large
sample sizes.
The proposed procedure has also been generalized to the case
of multiple sinusoids, provided that the frequencies satisfy a
separation condition [20]. In principle, one can also generalize
the problem by assuming that the noise has a generalized Gaussian distribution of the form p(x) ∝ exp(−|x/c|^β), where β > 0 is a predetermined constant and c > 0 is the scale parameter. Note that β = 2 for Gaussian noise and β = 1 for Laplace noise. Under this assumption, the maximum likelihood estimator of θ can be found by minimizing ℓ_β(θ) := Σ_{t=1}^n |y_t − (A cos(ωt) + B sin(ωt))|^β. Because special mathematical and computational tools are needed to handle the general case of β ≠ 1, 2, this problem deserves further investigation.
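As a small pointer for such an investigation, the ℓ_β criterion itself is straightforward to set up; the sketch below (β = 1.5 and all names are our own choices) reuses the derivative-free routine from the earlier example:

```python
import numpy as np
from scipy.optimize import minimize

def ell_beta(theta, y, t, beta):
    """l_beta criterion; beta = 2 gives the NLS and beta = 1 the Laplace MLE."""
    A, B, w = theta
    return (np.abs(y - A * np.cos(w * t) - B * np.sin(w * t)) ** beta).sum()

# Usage, with data y, time index t, and initial value theta0 as in the previous sketch:
# res = minimize(ell_beta, theta0, args=(y, t, 1.5), method="Nelder-Mead")
```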
REFERENCES
[1] P. Stoica, "List of references on spectral line analysis," Signal Processing, vol. 31, no. 3, pp. 329–340, 1993.
[2] A. M. Walker, "On the estimation of a harmonic component in a time series with stationary independent residuals," Biometrika, vol. 58, no. 1, pp. 21–36, 1971.
[3] L. C. Palmer, "Coarse frequency estimation using the discrete Fourier transform," IEEE Trans. Inform. Theory, vol. 20, no. 1, pp. 104–109, 1974.
[4] D. C. Rife and R. R. Boorstyn, "Single-tone parameter estimation from discrete-time observations," IEEE Trans. Inform. Theory, vol. 20, no. 5, pp. 591–598, 1974.
[5] E. Aboutanios and B. Mulgrew, "Iterative frequency estimation by interpolation on Fourier coefficients," IEEE Trans. Signal Processing, vol. 53, no. 4, pp. 1237–1242, 2005.
[6] P. Stoica, R. L. Moses, B. Friedlander, and T. Söderström, "Maximum likelihood estimation of the parameters of multiple sinusoids from noisy measurements," IEEE Trans. Acoust., Speech, and Signal Processing, vol. 37, no. 3, pp. 378–392, 1989.
[7] D. Starer and A. Nehorai, "Newton algorithm for conditional and unconditional maximum likelihood estimation of the parameters of exponential signals in noise," IEEE Trans. Signal Processing, vol. 40, no. 6, pp. 1528–1534, 1992.
[8] H. Van Hamme, "Maximum likelihood estimation of superimposed complex sinusoids in white Gaussian noise by reduced effort coarse search (RECS)," IEEE Trans. Signal Processing, vol. 39, no. 2, pp. 536–538, 1991.