Robust Estimators of Scale: Finite-Sample Performance in Long-Tailed Symmetric Distributions
Author(s): David A. Lax
Source: Journal of the American Statistical Association, Sep., 1985, Vol. 80, No. 391 (Sep., 1985), pp. 736-741
Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association
Stable URL: https://www.jstor.org/stable/2288493


This content downloaded from 113.211.181.80 on Thu, 03 Mar 2022 08:32:30 UTC
All use subject to https://about.jstor.org/terms
Robust Estimators of Scale: Finite-Sample Performance in Long-Tailed Symmetric Distributions
DAVID A. LAX*

This article presents the results of a Monte Carlo study of the robustness of scale estimates in the presence of long-tailed, symmetric distributions. The article examines the performance of several families of estimates in samples of size 20 from several distributions. The family of A-estimators, which are finite-sample approximations to the asymptotic variance of M-estimators of location, appears to be more robust than the sample standard deviation, the median absolute deviation from the median (MAD), trimmed standard deviations, and M-estimators of scale. The most successful A estimator uses the biweight weighting function, which is also the basis for high-performance robust location estimates.

KEY WORDS: Robustness; Scale estimation; M-estimators; Biweight.

1. INTRODUCTION

This article presents the results of a Monte Carlo study of the robustness of scale estimators in the presence of errors that follow long-tailed, symmetric, unimodal distributions. The robustness of scale estimates matters both for describing the dispersion of a sample and for confidence statements and hypothesis tests about location parameters and regression coefficients, which rely on estimates of the dispersion of residuals from a fitted model.

The standard deviation, the most common scale estimate, is notably nonrobust to slight deviations from normality. For example, Tukey (1960) noted that when the data follow a unit-normal distribution contaminated by a N(0, 9) distribution, the standard deviation has lower asymptotic relative efficiency than the mean absolute deviation (the mean of the absolute values of the deviations of each observation from the sample mean) once the percentage of contamination exceeds .18%. A variety of robust scale estimators for univariate, unimodal, symmetric distributions have since been proposed (see Huber 1964, Gross 1976, Lax 1975a, b, Harter et al. 1979, De Wet and van Wyk 1979, and Lemmer 1979). In a review of work on robust scale estimators, Iglewicz (1982) combined the Monte Carlo results of Gross (1976) and Lax (1975a) with a number of new results.

Lax (1975a, b) performed a Monte Carlo study of more than 150 scale estimators. Seventeen of these estimators were selected because they were either promising or commonly used, and the behavior of this subset was studied with greater precision. This article describes these 17 estimators and presents the results of the high-precision Monte Carlo study.

2. SCALE AND SCALE ESTIMATION

An estimator is considered to be a scale estimator if it satisfies three properties: (a) the scale of a sample is nonnegative and is zero only when all the sample observations are identical; (b) the scale is invariant to additive shifts in the location of the sample; and (c) the scale is transparent to multiplicative shifts. Thus, if X is the vector of observations and a and b are scalar constants, then S(X) is a scale estimator if

$$ S(a + bX) = |b|\,S(X) \ge 0, \tag{2.1} $$

with equality only if all the elements of X are identical.

This article seeks robust estimators among this class of estimators. It does not first define the scale of a distribution and then seek robust estimators of this newly defined characteristic, because it seems difficult to define a single natural estimand that represents the scale of a distribution. Unlike the problem of estimating the location of a unimodal, symmetric density, there does not appear to be a single characteristic of a distribution that implies a useful and complete ordering of all distributions according to their scale. Two simple observations illustrate this difficulty in defining scale.

1. The family of densities f(x | μ, σ) = σ^{-1} f_0((x − μ)/σ) is naturally ordered according to scale by σ. The same ordering technique does not hold, however, when comparing distributions from two families, because the scale parameter in each family is defined only up to a constant of proportionality.
2. Measures of the informativeness of a distribution, such as the variance, log variance, entropy, or the rth mean, might seem to provide reasonable definitions of scale, but they do not exist for all plausible distributions. Moreover, the ordering implied by each is contradicted by the appealing and relatively weak partial ordering suggested by Rothschild and Stiglitz (1970): one distribution is more uncertain than another with the same mean if all risk-averse von Neumann-Morgenstern utility functions prefer the latter distribution. Because a decision maker with a utility function that is not quadratic will prefer some distributions with higher variances to others with the same mean but lower variances, for example, the variance does not order distributions according to their uncertainty in a completely appealing way (also see Bickel and Lehmann 1976, 1979, and Oja 1981).

This article thus considers scale estimators that satisfy Equation (2.1).
* David A. Lax is Assistant Professor, Business Administration, Harvard Business School, Boston, MA 02163. John Tukey, Gary Simon, and David Pasta made major contributions to this research. The author also thanks David Donoho, Paul Velleman, David Hoaglin, Frederick Mosteller, and James Sebenius. He gratefully acknowledges financial support from Army Research Office Grant DAHC04-74-0178 and National Science Foundation Grant SOC-75-15702.

© 1985 American Statistical Association
Journal of the American Statistical Association
September 1985, Vol. 80, No. 391, Theory and Methods


3. DESIGN OF A STUDY OF FINITE-SAMPLE PERFORMANCE

3.1 Evaluation Criteria

To assess an estimator's robustness to long-tailed symmetric noise, Tukey (see Hoaglin et al. 1982) proposed evaluating an estimator's performance for three distributions, which he calls "the three corners," that in his opinion "span" the space of distributions of concern. In addition to the unit-normal distribution, he proposed one distribution with consistently long tails and one with potentially erratic tail behavior. This article examines the performance of scale estimators in these three distributions:

1. normal: observations follow a normal distribution with mean 0 and variance 1.
2. slash: observations follow the same distribution as N/U, where N ~ N(0, 1), U ~ U(0, 1), U is independent of N, and U(0, 1) is the uniform density on (0, 1). This distribution has the consistently long tails of a Cauchy distribution.
3. one-wild: in a sample of size 20, 19 points are drawn from N(0, 1) and one point is drawn from N(0, 100). Every sample of size 20 from this distribution therefore contains one potentially wild point.

A reliable scale estimator should give similar estimates over repeated samples from a distribution. In other words, the estimator's sampling distribution should have small variation and display smooth tail behavior. Among estimators whose sampling distributions have smooth tail behavior, I prefer those with the smallest variability. The variance of the scale estimator's sampling distribution is, itself, an inappropriate measure of variability. A scale estimator S provides the same ordering of samples as the scale estimator kS, where k is an arbitrary positive constant, yet the variance of kS is k²var(S). Because var[ln(S)] is unaffected by the scaling of the estimator, estimators are compared using the variance of the log of the estimate. Because the distribution of a scale estimate is likely to be skewed to the right, the log transformation also has a symmetrizing influence. Other scale-free measures of variation were also used by Lax (1975b) and produced the same ranking of estimates as var[ln(S)]. For brevity, the remainder of the article refers to the variance of the log of the estimate as the variance of the estimator or of its sampling distribution.

Let V_min be the smallest known variance of (the log of) a scale estimator in repeated samples from a distribution. Then an estimator with variance V has conditional relative variance efficiency

$$ E = 100 \times V_{\min} / V. \tag{3.1} $$

For some distributions, the minimum possible variance is known. For other distributions, the minimum possible variance may not be known. Bounds like the Cramér-Rao lower bound are seldom sharp, and we know only the minimum variance among the estimators we have considered.

To protect against the nonrobustness of the variance as a measure of dispersion (which is, after all, the purpose of this study) and to examine the tail behavior of the sampling distributions of the logs of scale estimators, I also examine several pseudovariances of the sampling distributions. The 100p% pseudovariance is defined as the variance of the normal distribution that has the same 100(1 − 2p)% interpercentile distance. This distance runs between the 100p and 100(1 − p) percentiles. Formally, let F_n be the empirical cumulative distribution of a sample of size n and let Φ be the standard normal distribution function. The 100p% pseudovariance V*_p is then defined to be

$$ V_p^* = \left( \frac{F_n^{-1}(1-p) - F_n^{-1}(p)}{\Phi^{-1}(1-p) - \Phi^{-1}(p)} \right)^2. \tag{3.2} $$

The pseudovariances provide two kinds of information. They show (a) whether an estimator's sampling distribution is longer-tailed than a normal distribution, and (b) whether the sampling distribution has a smooth shape or instead shows erratic tail behavior that might yield a few wild values in a large sample (see Andrews et al. 1972). For a normal distribution, the pseudovariances for any selected p will be constant and equal to the variance. The 100p% pseudovariances of a long-tailed distribution increase as p decreases, and those of a short-tailed distribution decrease as p decreases.

Because the pseudovariances ignore the most extreme values, they distinguish between smooth and erratic tail behavior. Andrews et al. (1972) noted that when the tails of a symmetric distribution are smooth, the 4.2% pseudovariance will approximately equal the variance. When the tail behavior is erratic and a large sample contains a few extreme points, the variance will be inflated much more than the pseudovariances. For long-tailed sampling distributions, then, one might infer that the tail behavior is smooth if the variance is between, say, the 1% and 10% pseudovariances, and that the tail behavior is erratic if the variance exceeds the 1% pseudovariance.

The performance of an estimator across the three "corner" distributions is measured by its worst-case performance. Let E_N, E_S, and E_W be the efficiencies of the log of an estimator under the normal, slash, and one-wild densities. The estimator's worst-case performance, which Tukey calls its triefficiency, is the minimum of these efficiencies, min{E_N, E_S, E_W}. Estimators with smooth sampling distributions are ranked according to their triefficiencies.

Some estimators dominate others: estimator A dominates estimator B if A is more efficient than B in all three distributions. A dominated estimator can be discarded unless it has other advantages.

3.2 Monte Carlo Calculations

The finite-sample behavior of scale estimators under a given data distribution is estimated by approximating the sampling distribution of the (log of the) estimator using Monte Carlo methods. A random sample of size 20 is drawn from the data distribution, and each estimator computes its scale estimate on that sample. One thousand draws from the normal distribution and 640 draws each from the slash distribution and the one-wild distribution provide a good approximation of the sampling distributions of each estimator for each of the three distributions.

The Monte Carlo calculations use a swindle, or variance-reduction technique, so the estimated variances and pseudovariances are considerably more accurate than those a simple Monte Carlo study with 1,000 or 640 draws would give. Simon (1976) described the swindle, and Lax (1975b) presented details of the swindle calculations.

The random numbers used were generated by Andrews et al. (1972), who explained how they were generated. All of the Monte Carlo computations were performed in double-precision arithmetic on an IBM 360/91 computer.
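The evaluation loop of Sections 3.1 and 3.2 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's code. It uses plain Monte Carlo with Python's `random` module instead of the swindle and the Andrews et al. (1972) random numbers. It takes V_min to be the smallest variance among the estimators passed in. For the empirical quantile in (3.2) it uses linear interpolation between order statistics, a convention the article does not specify. All function names are hypothetical.

```python
import math
import random
import statistics


def draw_sample(corner, rng, n=20):
    """One sample of size n from one of Tukey's three corners (Section 3.1)."""
    if corner == "normal":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if corner == "slash":
        # N / U with N ~ N(0, 1) and U ~ U(0, 1) independent; 1 - random() lies in (0, 1]
        return [rng.gauss(0.0, 1.0) / (1.0 - rng.random()) for _ in range(n)]
    if corner == "one-wild":
        # n - 1 points from N(0, 1) and one from N(0, 100), i.e. standard deviation 10
        return [rng.gauss(0.0, 1.0) for _ in range(n - 1)] + [rng.gauss(0.0, 10.0)]
    raise ValueError(f"unknown corner: {corner}")


def log_variance(estimator, corner, reps, seed=0):
    """Plain Monte Carlo estimate of var[ln S] (no variance-reduction swindle).

    Reseeding on every call means every estimator sees the same samples.
    """
    rng = random.Random(seed)
    logs = [math.log(estimator(draw_sample(corner, rng))) for _ in range(reps)]
    return statistics.variance(logs)


def efficiency(v, v_min):
    # Equation (3.1): E = 100 * Vmin / V
    return 100.0 * v_min / v


def empirical_quantile(sorted_values, q):
    # Linear interpolation between order statistics (an assumed convention)
    h = (len(sorted_values) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def pseudovariance(values, p):
    """Equation (3.2): variance of the normal with the same 100p%-to-100(1-p)% spread."""
    xs = sorted(values)
    z = statistics.NormalDist().inv_cdf
    spread = empirical_quantile(xs, 1 - p) - empirical_quantile(xs, p)
    return (spread / (z(1 - p) - z(p))) ** 2


def mad(x):
    m = statistics.median(x)
    return statistics.median([abs(xi - m) for xi in x])


if __name__ == "__main__":
    for corner in ("normal", "one-wild", "slash"):
        v = {"sd": log_variance(statistics.stdev, corner, 640),
             "mad": log_variance(mad, corner, 640)}
        v_min = min(v.values())  # smallest variance among the estimators tried
        print(corner, {k: round(efficiency(val, v_min), 1) for k, val in v.items()})
```

Because `log_variance` reseeds its generator for every call, all estimators are compared on identical samples, which makes differences between them less noisy than independent runs would.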

4. THE ESTIMATORS

Some of the estimators tested in exploratory Monte Carlo studies (Lax 1975a) were based on functions of the order statistics of a sample. Others were derived from nonparametric confidence intervals or hypothesis tests, and still others from the Princeton robustness study (Andrews et al. 1972). I selected the following 17 of these estimates, either because they were in common use or because their performance in one or more distributions was promising.

4.1 The Sample Standard Deviation

For a vector of sample observations X = {x_1, x_2, ..., x_n} with average x̄, let d_i = x_i − x̄ be the deviations of x_i from the mean. The sample standard deviation equals

$$ S = \left( \sum_{i=1}^{n} d_i^2 \Big/ (n-1) \right)^{1/2}. \tag{4.1} $$

4.2 The Trimmed Standard Deviation

A two-sided p% trimmed mean, M_{2,p}(·), is obtained in three steps. First sort the observations, then temporarily set aside the [pn/2] smallest and the [pn/2] largest observations, where [q] means the greatest integer part of q. Finally, compute the arithmetic average of the remaining observations. A one-sided r% trimmed mean, M_{1,r}(·), is obtained the same way except that only the [rn] largest observations are set aside. Let M_{2,p}(X) be the p% two-sided trimmed mean of the vector of sample observations X, and let d_i = x_i − M_{2,p}(X) be the deviations of the x_i from M_{2,p}(X). By analogy to (4.1), a trimmed standard deviation is defined as

$$ S_{\text{trim}} = \left( M_{1,r}(d_1^2, d_2^2, \ldots, d_n^2) \right)^{1/2}. \tag{4.2} $$

The trimmed standard deviation evaluated in the study uses a 20% two-sided trimmed mean and a 20% one-sided trimmed mean; that is, p = r = 20.

4.3 The MAD

The median absolute deviation from the median, called the MAD, is a common resistant measure of scale (see Mosteller and Tukey 1977). Let m be the median of {x_1, x_2, ..., x_n} and let d_i = |x_i − m| be the absolute deviation of x_i from the median. The MAD is then the median of these absolute deviations:

$$ \mathrm{MAD} = \operatorname{med}\{d_1, d_2, \ldots, d_n\}. \tag{4.3} $$

4.4 The Gaussian Skip

Let x_(i) be the ith order statistic in a sample of n. Let M_2(X) be a measure of location evaluated at the sample X, and let d_(i) = x_(i) − M_2(X) be the deviation from M_2(X). To compute a Gaussian skip, first normalize the deviations d_(i) by dividing each one by the expected value of x_(i), assuming that each x_j ~ iid N(0, 1) for j = 1, ..., n. The normalized deviation is

$$ g_i = d_{(i)} \big/ E\left( x_{(i)} \mid x_j \sim N(0,1) \right). \tag{4.4} $$

If M_1(·) is another measure of location, a Gaussian skip is defined as

$$ S_G = \left( M_1(g_1^k, g_2^k, \ldots, g_n^k) \right)^{1/k}, \tag{4.5} $$

for k > 1. The Gaussian skip evaluated in this article has k = 2. The location estimator used for M_1 and M_2 was the robust biweight estimator of location (see Mosteller and Tukey 1977), which is defined as follows. Let X be a sample with median m and median absolute deviation from the median MAD, let c be a positive constant, and let u_i = (x_i − m)/(c MAD) be the normalized deviations. The biweight estimator of location is then defined as

$$ M_{bi}(X) = \sum_{i=1}^{n} x_i\, w_{bi}(u_i) \Big/ \sum_{i=1}^{n} w_{bi}(u_i), \tag{4.6} $$

with biweight weighting function

$$ w_{bi}(u) = \begin{cases} (1-u^2)^2, & |u| \le 1 \\ 0, & \text{otherwise.} \end{cases} \tag{4.7} $$

4.5 M-Estimates of Scale

Let T be an estimate of the center of a sample X, let ψ be some function, and let u_i = (x_i − T)/S be the normalized observations. Huber (1964) suggested solving the following equation for S:

$$ \frac{1}{n-1} \sum_{i=1}^{n} \psi^2(u_i) = E\left[ \psi^2(Z) \mid Z \sim N(0,1) \right]. \tag{4.8} $$

When ψ(u) = u and T = x̄, the right-hand side of (4.8) equals 1 and the sample standard deviation is the solution to (4.8). In other words, ψ(u) = u gives all the squared deviations equal weight no matter how large they are. Huber suggests a function ψ_H that limits the influence of points far from the estimated center T of the sample. The Huber ψ_H function is

$$ \psi_H(u) = \begin{cases} -b, & u < -b \\ u, & |u| \le b \\ b, & u > b, \end{cases} \tag{4.9} $$

for some b > 0. Under ψ_H, observations whose normalized deviations are bigger in magnitude than b are not included fully in the sum in (4.8). Their influence is limited because their contributions are set to b² rather than u_i².

When ψ is monotone, a solution to (4.8) will be unique if it exists. For nonmonotone ψ, if one solution to (4.8) exists, there will usually be two positive solutions. In the case of the Huber estimator, a good starting guess S_0 should lead to the appropriate solution. Nonetheless, Equations (4.8) and (4.9) may have negative solutions. When one intends to use a scale estimator automatically as part of a larger algorithm, the Huber scale estimator may therefore be an unsuitable choice.

Equation (4.8) can be solved iteratively. Replace the right-hand side of (4.8) by 1, take T to be the median and S_0 to be the MAD, and at the kth iteration define u_ik = (x_i − T)/S_{k−1}. Newton-Raphson iteration then gives

$$ S_k = S_{k-1} \left[ 1 + \frac{\sum_{i=1}^{n} \psi^2(u_{ik}) - (n-1)}{2 \sum_{i=1}^{n} u_{ik}\, \psi(u_{ik})\, \psi'(u_{ik})} \right]. \tag{4.10} $$

The article evaluates both an estimator that iterates the calculation of (4.10) to convergence and a one-step estimator with no further iteration.
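The estimators of Sections 4.2 through 4.5 can be sketched as follows, using Python's standard library only. Several details are assumptions rather than the article's specification:

- p and r are given as fractions.
- The biweight location uses c = 6. That is the value Mosteller and Tukey recommend for location, as Section 5 notes, but the article does not state the value used inside the Gaussian skip.
- The expected normal order statistics use Blom's approximation instead of exact values.
- The Huber iteration stops at a relative tolerance.

None of the estimators is rescaled to be consistent for σ. That does not matter here, because var[ln S] ignores constant factors.

```python
import math
import statistics


def mad(x):
    # Equation (4.3): median absolute deviation from the median
    m = statistics.median(x)
    return statistics.median([abs(xi - m) for xi in x])


def trimmed_sd(x, p=0.20, r=0.20):
    """Equation (4.2), with p and r given as fractions (p = r = 20% in the study)."""
    xs = sorted(x)
    n = len(xs)
    k = math.floor(p * n / 2)                 # [pn/2] observations set aside at each end
    center = statistics.fmean(xs[k:n - k])    # two-sided trimmed mean M_{2,p}(X)
    d2 = sorted((xi - center) ** 2 for xi in xs)
    keep = n - math.floor(r * n)              # one-sided trim: drop the [rn] largest d_i^2
    return math.sqrt(statistics.fmean(d2[:keep]))


def biweight_weight(u):
    # Equation (4.7)
    return (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


def biweight_location(x, c=6.0):
    """Equation (4.6) as written: one weighted pass around the median, scaled by c * MAD."""
    m, s = statistics.median(x), mad(x)
    w = [biweight_weight((xi - m) / (c * s)) for xi in x]
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


def gaussian_skip(x, k=2, c=6.0):
    """Equations (4.4)-(4.5) with M1 = M2 = biweight location.

    Needs an even sample size so that no expected order statistic is zero.
    """
    xs = sorted(x)
    n = len(xs)
    z = statistics.NormalDist().inv_cdf
    # Blom's approximation to E[x_(i)] under N(0, 1); an assumption, not exact values
    e = [z((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    center = biweight_location(xs, c)
    g = [(xi - center) / ei for xi, ei in zip(xs, e)]
    return biweight_location([gi ** k for gi in g], c) ** (1.0 / k)


def huber_psi(u, b):
    # Equation (4.9): clip u to [-b, b]
    return max(-b, min(b, u))


def huber_scale(x, b=1.4, one_step=False, max_iter=100, tol=1e-12):
    """Newton-Raphson update (4.10) with T = median, S0 = MAD, right side of (4.8) set to 1."""
    n, t, s = len(x), statistics.median(x), mad(x)
    for _ in range(1 if one_step else max_iter):
        u = [(xi - t) / s for xi in x]
        excess = sum(huber_psi(ui, b) ** 2 for ui in u) - (n - 1)
        # psi'(u) = 1 for |u| <= b and 0 outside, so u * psi(u) * psi'(u) = u^2 there
        slope = 2.0 * sum(ui * ui for ui in u if abs(ui) <= b)
        s_new = s * (1.0 + excess / slope)
        if s_new <= 0.0:
            raise ArithmeticError("Newton step produced a nonpositive scale")
        done = abs(s_new - s) <= tol * s
        s = s_new
        if done:
            break
    return s


if __name__ == "__main__":
    import random

    rng = random.Random(7)
    x = [rng.gauss(0.0, 1.0) for _ in range(20)]
    for name, value in [("trimmed SD", trimmed_sd(x)), ("MAD", mad(x)),
                        ("Gaussian skip", gaussian_skip(x)),
                        ("Huber b=1.4, iterated", huber_scale(x)),
                        ("Huber b=1.4, one-step", huber_scale(x, one_step=True))]:
        print(f"{name:24s} {value:.3f}")
```

The explicit check for a nonpositive step in `huber_scale` reflects the caution in Section 4.5 about using the Huber scale estimator unattended. Stopping with an error is one possible policy; others are equally defensible.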


4.6 A-Estimators of Scale

If x_i ~ iid N(0, σ²) and x̄ = n^{-1} Σ x_i, then as n → ∞,

$$ \left( n \operatorname{var}(\bar{x}) \right)^{1/2} \to \sigma. \tag{4.11} $$

Thus the variance of the location estimator x̄ can serve as a scale estimator.

An A estimate of scale is defined, analogously to (4.11), from the asymptotic variance of a robust estimator of location. Given a scale estimate S, a positive constant c, and some function ψ, the robust M estimate of location is defined to be the solution T_n of the following equation (Huber 1964):

$$ \sum_{i=1}^{n} \psi\left( (x_i - T_n)/(cS) \right) = 0. \tag{4.12} $$

By analogy to (4.11), under appropriate regularity conditions, as n → ∞,

$$ \left( n \operatorname{var}(T_n) \right)^{1/2} \to \left( A_\psi(T, F) \right)^{1/2}, \tag{4.13} $$

where A_ψ(T, F) is the asymptotic variance of the M estimator based on the function ψ when the data follow distribution F.

Let T be an estimate of the location of the sample, S_0 an estimate of its scale, c a positive constant, and u_i = (x_i − T)/(cS_0). It is not difficult to derive the following finite-sample approximation to A_ψ(T, F) (for example, see Gross 1976):

$$ S_{\psi,c}^2 = \frac{n}{n-1} \cdot \frac{n\,(cS_0)^2 \sum_{i=1}^{n} \psi^2(u_i)}{\left[ \sum_{i=1}^{n} \psi'(u_i) \right]^2}. \tag{4.14} $$

I shall call S_{ψ,c} an A estimator of scale. For each A estimator of scale evaluated in the Monte Carlo study, I set T to be the median of the sample and S_0 to be the MAD of the sample.

When ψ(u) = u (with T = x̄), the A estimator of scale reduces to the sample standard deviation. The factor n/(n − 1) is inserted to obtain the asymptotically correct estimate.

A-estimators using a number of different ψ functions were evaluated in the exploratory Monte Carlo study (Lax 1975a, b). The remainder of this section discusses the A-estimators that performed best in the exploratory study and were therefore included in the high-precision study. These estimators all outperformed A-estimators based on the Huber ψ function.

4.6.1 The Biweight A Estimator. The bisquare ψ function is ψ_bi(u) = u w_bi(u), where w_bi(u) is the biweight weighting function given by (4.7). Substituting ψ_bi into (4.14) and manipulating yields

$$ S_{bi} = \frac{n \left[ \sum_{|u_i|<1} (x_i - T)^2 (1-u_i^2)^4 \right]^{1/2}}{(n-1)^{1/2} \left| \sum_{|u_i|<1} (1-u_i^2)(1-5u_i^2) \right|}. \tag{4.15} $$

4.6.2 The Modified Biweight A Estimator. Because ψ_bi(u) = u w_bi(u), the result of substituting ψ_bi into (4.14) can be written as

$$ S_{bi} = \frac{n\,c S_0 \left[ \sum_{|u_i|<1} \left( u_i\, w_{bi}(u_i) \right)^2 \right]^{1/2}}{(n-1)^{1/2} \left| \sum_{|u_i|<1} \left( w_{bi}(u_i) + w_{bi}'(u_i)\, u_i \right) \right|}. \tag{4.16} $$

I created a modified biweight estimator by approximating the sum in the denominator under the assumption that w'_bi(u_i) u_i ≈ 0. The modified biweight estimator is thus a weighted average of the squared deviations from T,

$$ S_{mbi} = \frac{n \left[ \sum_{|u_i|<1} (x_i - T)^2\, w_{bi}^2(u_i) \right]^{1/2}}{(n-1)^{1/2} \sum_{|u_i|<1} w_{bi}(u_i)}. \tag{4.17} $$

4.6.3 The Sine A Estimator. Gross (1976) used an A estimator of scale with c set to 2.1, T chosen to be the median, S_0 chosen to be the MAD, and

$$ \psi(u) = \begin{cases} \sin(u), & |u| \le \pi \\ 0, & \text{otherwise.} \end{cases} \tag{4.18} $$

Thus the sine A estimator is

$$ S_{\sin} = \frac{n\,c S_0}{(n-1)^{1/2}} \left( \frac{\sum_{|u_i| \le \pi} \sin^2(u_i)}{\left[ \sum_{|u_i| \le \pi} \cos(u_i) \right]^2} \right)^{1/2}. \tag{4.19} $$

4.6.4 The Modified Sine A Estimator. I modified the sine A estimator by inserting an arctangent transformation in an attempt to symmetrize the ratio in (4.19). The modified sine A estimator is

$$ S_{ms} = \frac{n\,c S_0}{(n-1)^{1/2}} \left( \tan^{-1} \left[ \frac{\sum_{|u_i| \le \pi} \sin^2(u_i)}{\left[ \sum_{|u_i| \le \pi} \cos(u_i) \right]^2} \right] \right)^{1/2}. \tag{4.20} $$
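All the A estimators of Section 4.6 come from one formula, (4.14), so a sketch can take ψ and ψ' as arguments. The Python below assumes T = median and S_0 = MAD, as in the study, and its function names are hypothetical. `biweight_a` implements the closed form (4.15) separately so that it can be compared with the generic route through (4.14), and `modified_biweight_a` follows (4.17). The modified sine estimator (4.20) is left out of the sketch.

```python
import math
import statistics


def mad(x):
    m = statistics.median(x)
    return statistics.median([abs(xi - m) for xi in x])


def a_scale(x, psi, dpsi, c, center=None, s0=None):
    """Equation (4.14); T = median and S0 = MAD unless supplied."""
    n = len(x)
    t = statistics.median(x) if center is None else center
    s0 = mad(x) if s0 is None else s0
    u = [(xi - t) / (c * s0) for xi in x]
    num = sum(psi(ui) ** 2 for ui in u)
    den = sum(dpsi(ui) for ui in u)
    return n * c * s0 * math.sqrt(num) / (math.sqrt(n - 1) * abs(den))


def psi_bi(u):
    # bisquare psi(u) = u * w_bi(u), zero for |u| >= 1
    return u * (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0


def dpsi_bi(u):
    # derivative of psi_bi: (1 - u^2)(1 - 5u^2) for |u| < 1
    return (1.0 - u * u) * (1.0 - 5.0 * u * u) if abs(u) < 1.0 else 0.0


def psi_sin(u):
    # Equation (4.18)
    return math.sin(u) if abs(u) <= math.pi else 0.0


def dpsi_sin(u):
    return math.cos(u) if abs(u) <= math.pi else 0.0


def biweight_a(x, c=9.0):
    """Closed form (4.15) of the biweight A estimator."""
    n, t, s0 = len(x), statistics.median(x), mad(x)
    num = den = 0.0
    for xi in x:
        u = (xi - t) / (c * s0)
        if abs(u) < 1.0:
            num += (xi - t) ** 2 * (1.0 - u * u) ** 4
            den += (1.0 - u * u) * (1.0 - 5.0 * u * u)
    return n * math.sqrt(num) / (math.sqrt(n - 1) * abs(den))


def modified_biweight_a(x, c=6.0):
    """Equation (4.17): the w'(u) u term of (4.16) dropped from the denominator."""
    n, t, s0 = len(x), statistics.median(x), mad(x)
    num = den = 0.0
    for xi in x:
        u = (xi - t) / (c * s0)
        if abs(u) < 1.0:
            w = (1.0 - u * u) ** 2
            num += ((xi - t) * w) ** 2
            den += w
    return n * math.sqrt(num) / (math.sqrt(n - 1) * den)


if __name__ == "__main__":
    import random

    rng = random.Random(5)
    x = [rng.gauss(0.0, 1.0) for _ in range(19)] + [rng.gauss(0.0, 10.0)]  # one-wild sample
    print("biweight A, c = 9:      ", biweight_a(x))
    print("same via (4.14):        ", a_scale(x, psi_bi, dpsi_bi, 9.0))
    print("modified biweight, c = 6:", modified_biweight_a(x))
    print("sine A, c = 2.1:        ", a_scale(x, psi_sin, dpsi_sin, 2.1))
```

One design point shows up in the code. The denominator of (4.14) can approach zero when many u_i fall where ψ'(u) is negative, which is 1/√5 < |u| < 1 for the bisquare. The sum of weights in (4.17) avoids this in the sketch, because it is always positive once the MAD is positive.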


4.6.5 Redescending ψ Functions. The function ψ_H given by (4.9) is nondecreasing. A ψ function is redescending if ψ(u) → 0 as u → ∞. The ψ functions of the biweight and sine A-estimators are thus redescending. In redescending estimators, observations that are far from the estimated center T have minimal or no influence. With nondecreasing functions, by contrast, outlying observations always have some influence on the estimate.

Do we want to ignore outlying observations altogether? In the exploratory Monte Carlo study, A-estimators with redescending ψ functions outperformed A-estimators with nondecreasing ψ functions. This dominance suggests that we may indeed wish to ignore extreme outliers completely. The next section's comparison of nondecreasing M-estimators and redescending A-estimators in the high-precision Monte Carlo study also speaks to this question.

5. RESULTS

Table 1 presents the efficiencies of the scale estimators described in Section 4. As described in Section 3.1, the efficiencies are defined in terms of the variance of the log of an estimator in repeated samples of size 20 from a distribution.

Table 1. Variance Efficiencies for Selected Estimators: Monte Carlo Estimates in Samples of Size 20

                                        Efficiency
Estimator                       Normal  One-Wild(a)  Slash(b)  Triefficiency(c)
A-Estimators (ψ function)
  Bisquare (c = 6)                65.2      77.1      90.1              65.2
  Bisquare (c = 7)                74.8      82.9      89.3              74.8
  Bisquare (c = 8)                81.8      85.4      87.6              81.8
  Bisquare (c = 9)                86.7      85.8      86.1              85.8(d)
  Bisquare (c = 10)               90.0      84.8      84.6              84.6(d)
  Modified bisquare (c = 6)       47.5      56.8      96.8              47.5(d)
  Sine (c = 2.1)                  77.5      83.7      88.4              77.5
  Modified sine (c = 2.1)         82.1      89.6      94.5              82.1(d)
M-Estimators (Huber ψ function)
  b = 1.4 (iterated)              48.1      56.8     100.0              48.1(d)
  b = 1.7 (iterated)              72.3      83.8      83.8              72.3
  b = 1.4 (one-step)              55.2      68.1      86.8              55.2
  b = 1.7 (one-step)              60.5      71.8      83.1              60.5
  b = 2.0 (one-step)              69.8      76.1      75.9              69.8
Sample Standard Deviation        100.0      10.9         -(e)              -(e)
Trimmed Standard Deviation        89.9     100.0      28.1              28.1(d)
MAD                               35.3      41.5      91.8              35.3
Gaussian Skip                     54.7      59.3      90.1              54.7

(a) In samples of size 20 from the one-wild distribution, 19 data points are drawn from N(0, 1) and the remaining point is drawn from N(0, 100).
(b) The random variable Z = X/Y follows the slash distribution if X ~ N(0, 1) and Y ~ U(0, 1).
(c) An estimator's triefficiency is the smallest of its efficiencies over the three distributions.
(d) The estimate is undominated.
(e) The slash distribution has Cauchy tails. Thus the variance of the standard deviation should be infinite and the efficiency should be zero.
NOTE: The variance of an estimator used to compute efficiency is the variance of the sampling distribution of the log of the estimator. The variance efficiency of an estimator is the ratio of the smallest known variance in a distribution to the estimator's variance in that distribution. MAD is median absolute deviation from the median.

Seven estimators are undominated: the biweight A-estimators with c = 9 and c = 10, the modified biweight A estimator, the modified sine A estimator, the iterated Huber M estimator with b = 1.4, the sample standard deviation, and the trimmed standard deviation. Of these, only three performed well across the three distributions. The triefficiencies of the biweight A-estimators with c = 9 and c = 10 and of the modified sine A estimator exceed 82%, whereas the triefficiencies of the other four undominated estimators fall below 50%.

The biweight with c = 9 has the largest triefficiency (85.8%). The modified sine A estimator dominates the sine A estimator, and it can probably reach the same or better levels of performance if the scaling constant c is raised above 2.1. Raising the scaling constant gives positive weight to more of the observations. This should improve the estimator's performance in the normal and one-wild distributions while hurting its performance in the consistently long-tailed distribution.

The sample standard deviation performed quite poorly in both long-tailed distributions. The trimmed standard deviation successfully protects against the one potentially wild value in 20 and loses little in the normal distribution (90% efficiency), but its performance deteriorates to 28% efficiency in the Cauchy-tailed slash distribution. The MAD and Gaussian skip protect successfully against the Cauchy-tailed slash distribution, but they are relatively inefficient in the other two distributions. In terms of triefficiency, the biweight estimator with c = 9 is more than twice as efficient as the MAD.

The redescending biweight and sine A-estimators outperform M-estimators that use the nondecreasing Huber ψ function. Only the iterated Huber M estimator with b = 1.4 was undominated, and the highest triefficiency among the Huber M-estimators was 72.3% (iterated, b = 1.7). As Section 4.6 mentions, redescending A-estimators also dominated A-estimators with Huber functions.

The superiority of redescending ψ functions over nondecreasing functions suggests that robust scale estimators should completely ignore outlying observations. Many robust location estimators, including the biweight location estimator, also give extreme outliers zero influence.
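The dominance and triefficiency statements above can be checked mechanically against Table 1. The sketch below copies the table's efficiencies and enters the standard deviation's slash efficiency as zero, per footnote e. It then applies the definitions of Section 3.1. The sketch reads "more efficient in all three distributions" strictly. Under that reading, the iterated Huber estimator with b = 1.4 does not dominate the modified bisquare, because the two tie at 56.8 in the one-wild distribution.

```python
# Table 1 efficiencies as (normal, one-wild, slash).  Following footnote e,
# the sample standard deviation's slash efficiency is entered as zero.
TABLE_1 = {
    "Bisquare (c = 6)": (65.2, 77.1, 90.1),
    "Bisquare (c = 7)": (74.8, 82.9, 89.3),
    "Bisquare (c = 8)": (81.8, 85.4, 87.6),
    "Bisquare (c = 9)": (86.7, 85.8, 86.1),
    "Bisquare (c = 10)": (90.0, 84.8, 84.6),
    "Modified bisquare (c = 6)": (47.5, 56.8, 96.8),
    "Sine (c = 2.1)": (77.5, 83.7, 88.4),
    "Modified sine (c = 2.1)": (82.1, 89.6, 94.5),
    "Huber b = 1.4 (iterated)": (48.1, 56.8, 100.0),
    "Huber b = 1.7 (iterated)": (72.3, 83.8, 83.8),
    "Huber b = 1.4 (one-step)": (55.2, 68.1, 86.8),
    "Huber b = 1.7 (one-step)": (60.5, 71.8, 83.1),
    "Huber b = 2.0 (one-step)": (69.8, 76.1, 75.9),
    "Sample standard deviation": (100.0, 10.9, 0.0),
    "Trimmed standard deviation": (89.9, 100.0, 28.1),
    "MAD": (35.3, 41.5, 91.8),
    "Gaussian skip": (54.7, 59.3, 90.1),
}


def triefficiency(effs):
    # worst case over the three corners: min{E_N, E_W, E_S}
    return min(effs)


def dominates(a, b):
    # A dominates B if A is more efficient than B in all three distributions
    return all(ea > eb for ea, eb in zip(a, b))


def undominated(table):
    return [name for name, effs in table.items()
            if not any(dominates(other, effs)
                       for other_name, other in table.items() if other_name != name)]


for name in sorted(undominated(TABLE_1), key=lambda k: -triefficiency(TABLE_1[k])):
    print(f"{name:28s} {triefficiency(TABLE_1[name]):5.1f}")
```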

Table 2. Selected Pseudovariances Divided by the Variance of the Logarithm of the Estimator

Distribution                  Normal                      One-Wild(a)                 Slash(b)
Estimator                       .1%   1% 4.2%  10%  25%      .1%   1% 4.2%  10%  25%      .1%   1% 4.2%  10%  25%
Biweight A estimator (c = 9)   1.12 1.07 1.04 1.02 1.01    1.11 1.07 1.05 1.03 1.02    1.06 1.03 1.00  .96  .93
Biweight A estimator (c = 10)  1.10 1.07 1.04 1.03 1.02    1.10 1.07 1.05 1.03 1.02    1.08 1.04  .99  .96  .93
Modified sine A estimator      1.10 1.07 1.04 1.02 1.01    1.08 1.06 1.05 1.04 1.03    1.02 1.00  .98  .97  .95
MAD                            1.05 1.04 1.01 1.00  .98    1.00 1.03 1.02 1.01 1.00    1.04 1.00  .98  .98  .96

(a) In samples of size 20 from the one-wild distribution, 19 points are drawn from N(0, 1) and one is drawn from N(0, 100).
(b) The random variable Z = X/Y follows the slash distribution if X ~ N(0, 1) and Y ~ U(0, 1).
NOTE: MAD is median absolute deviation from the median.


How far away from the center of the sample must an observation be before we call it an outlier and give it zero weight? Should scale estimators ignore fewer or more outlying observations than location estimators?

Both the results of this Monte Carlo study and theoretical calculations suggest that scale estimators should ignore fewer points than location estimators. The biweight location estimator achieves the best balance across distributions when c = 6 (see Mosteller and Tukey 1977). Table 1 shows that the biweight scale estimator achieves the highest triefficiency with c = 9. A higher scaling constant c means that more points influence the estimate. In other words, the best biweight scale estimator uses more of the sample than the best biweight location estimator.

A simple calculation supports this conclusion. The Fisher information about a parameter p contained in the distribution g(x | p) can be written as

$$ I_p = E\left[ h_p^2(x) \right], \tag{5.1} $$

where h_p(x) is the score for p, and

$$ h_p(x) = \frac{\partial}{\partial p} \ln g(x \mid p). \tag{5.2} $$

If x follows the distribution f(x | μ, σ) = σ^{-1} f_0((x − μ)/σ), then the score for location h_μ is

$$ h_\mu(x) = \frac{\partial}{\partial \mu} \ln f(x \mid \mu, \sigma). \tag{5.3} $$

One can easily show that the score for scale is

$$ h_\sigma(x) = \frac{\partial}{\partial \sigma} \ln f(x \mid \mu, \sigma) = \left[ (x-\mu)/\sigma \right] h_\mu(x) - 1/\sigma. \tag{5.4} $$

Compare I_μ = E[h_μ²(x)] with I_σ = E[h_σ²(x)]. By Equation (5.4), the information about scale includes a term in

$$ E\left[ \left( (x-\mu)/\sigma \right)^2 h_\mu^2(x) \right]. $$

Extreme observations therefore contribute substantial information about scale, and relatively more about scale than about location. We should thus expect robust scale estimators to ignore less of the sample in order to attain efficiency. This intuition is consistent with the Monte Carlo results.

As Section 3.1 mentions, the pseudovariances allow us to examine the shape of the sampling distribution of the estimators. Table 2 presents the pseudovariances of the log estimator divided by the variance of the log estimator. It covers the biweight A-estimators with c = 9 and c = 10 and the modified sine A estimator, plus the MAD as a comparison. Because the 100p% pseudovariances increase as p decreases for all four estimators in all three distributions, we can infer that the sampling distribution of (the log of) each estimator is long-tailed. This long-tailedness is not problematic. Because the variance of each estimate is not substantially greater than its pseudovariances in all three distributions, we can infer that the sampling distribution of the estimate is consistently rather than erratically long-tailed.

In summary, A-estimators of scale are more robust in symmetric long-tailed distributions than the other estimators studied. Because they always yield positive values, A-estimators can be used without monitoring, unlike M-estimators. Finally, redescending ψ functions such as the biweight and sine outperform nonredescending ψ functions such as the Huber ψ function.

[Received April 1982. Revised February 1985.]

REFERENCES

Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., and Tukey, J. W. (1972), Robust Estimates of Location: Survey and Advances, Princeton, NJ: Princeton University Press.
Bickel, P. J., and Lehmann, E. L. (1976), "Descriptive Statistics for Nonparametric Models: III-Dispersion," The Annals of Statistics, 4, 1139-1158.
Bickel, P. J., and Lehmann, E. L. (1979), "Descriptive Statistics for Nonparametric Models: IV-Spread," in Contributions to Statistics, Jaroslav Hajek Memorial Volume, ed. Jana Jureckova, Prague: Academia, pp. 33-40.
De Wet, T., and van Wyk, J. W. J. (1979), "Efficiency and Robustness of Hogg's Adaptive Trimmed Means," Communications in Statistics, Part A-Theory and Methods, 8, 117-128.
Gross, A. M. (1976), "Confidence Interval Robustness With Long-Tailed Symmetric Distributions," Journal of the American Statistical Association, 71, 409-416.
Harter, H. L., Moore, A. H., and Curry, T. F. (1979), "Adaptive Robust Estimation of Location and Scale Parameters of Symmetric Populations," Communications in Statistics, Part A-Theory and Methods, 8, 1473-1491.
Hoaglin, D. C., Mosteller, F., and Tukey, J. W. (eds.) (1982), Understanding Robust and Exploratory Data Analysis, New York: John Wiley.
Huber, P. J. (1964), "Robust Estimation of a Location Parameter," The Annals of Mathematical Statistics, 35, 73-101.
Iglewicz, B. (1982), "Robust Scale Estimates," in Understanding Robust and Exploratory Data Analysis, eds. D. C. Hoaglin, F. Mosteller, and J. W. Tukey, New York: John Wiley.
Lax, D. A. (1975a), "An Interim Report of a Monte Carlo Study of Robust Estimates of Width," Technical Report 93 (Ser. 2), Princeton University, Dept. of Statistics.
Lax, D. A. (1975b), Robust Estimators of Widths in Long-Tailed Symmetric Distributions: Performance in Small Samples, unpublished A.B. thesis, Princeton University, Dept. of Statistics.
Lemmer, H. (1979), "A Robust Estimate of Spread," South African Statistical Journal, 13, 121-126.
Mosteller, F., and Tukey, J. W. (1977), Data Analysis and Regression: A Second Course in Statistics, Reading, MA: Addison-Wesley.
Oja, H. (1981), "On Location, Scale, Skewness, and Kurtosis of Univariate Distributions," Scandinavian Journal of Statistics, 8, 154-168.
Rothschild, M., and Stiglitz, J. (1970), "Increasing Risk I: A Definition," Journal of Economic Theory, 2, 225-243.
Simon, G. (1976), "Computer Simulation Swindles, With Applications to Estimates of Location and Dispersion," Applied Statistics, 25, 266-274.
Tukey, J. W. (1960), "A Survey of Sampling From Contaminated Distributions," in Contributions to Probability and Statistics, eds. I. Olkin, S. Ghurye, W. Hoeffding, W. Madow, and H. Mann, Stanford, CA: Stanford University Press.
