A Robust Method For Multiple Linear Regression: Technometrics
D. F. Andrews
To cite this article: D. F. Andrews (1974) A Robust Method for Multiple Linear Regression,
Technometrics, 16:4, 523-531
Techniques of fitting are said to be resistant when the result is not greatly altered if a small fraction of the data is altered; techniques of fitting are said to be robust of efficiency when their statistical efficiency remains high for conditions more realistic than the utopian cases of Gaussian distributions with errors of equal variance. These properties are particularly important in the formative stages of model building, when the form of the response is not known exactly. Techniques with these properties are proposed and discussed.
Downloaded by [71.74.132.239] at 00:05 11 May 2016
…maximize a function and many involve operations that sequentially treat one variable at a time. Non-Gaussian maximum likelihood estimates are obtained by numerically maximizing a function of the parameters. The same method is used in other approaches (see Jaeckel 1972). However, least-squares calculations, or equivalently Gaussian maximum likelihood calculations, lead to the solution of systems of linear equations. These are usually solved …

…where a < b < c < d and s is defined by

$$s = \operatorname{median}_i \big| x_i - \operatorname{median}_j \{x_j\} \big|.$$

In the same reference Andrews developed a SINE estimate using

$$\varphi(z) = \begin{cases} \sin(z/c), & |z| < c\pi, \\ 0, & |z| \ge c\pi. \end{cases} \qquad (3.3)$$
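A minimal Python sketch of the pieces defined so far, under stated assumptions: the sine function φ(z) = sin(z/c) for |z| < cπ, an objective ψ with −ψ′ = φ (its additive constant is arbitrary and chosen here for illustration), the median-absolute-deviation scale s, and a location sine estimate computed by iterative reweighting started at the median with s held fixed. The function names and the reweighting scheme are illustrative, not taken from the paper.

```python
import math
from statistics import median


def sine_phi(z, c=2.1):
    """phi(z) = sin(z/c) for |z| < c*pi and 0 otherwise."""
    return math.sin(z / c) if abs(z) < c * math.pi else 0.0


def sine_objective(z, c=2.1):
    """An objective psi with -d/dz psi(z) = phi(z); the additive constant is arbitrary."""
    return c * math.cos(z / c) if abs(z) < c * math.pi else -c


def mad(xs):
    """s = median_i |x_i - median_j x_j|, with no consistency factor."""
    centre = median(xs)
    return median(abs(x - centre) for x in xs)


def sine_location(xs, c=2.1, tol=1e-12, max_iter=200):
    """Location sine estimate by iterative reweighting, started at the median.

    With u_i = (x_i - mu)/s and weights w_i = phi(u_i)/u_i, a fixed point of
    mu = sum(w_i x_i)/sum(w_i) satisfies sum_i phi((x_i - mu)/s) = 0.
    The scale s is held at the MAD about the median (an assumption).
    """
    mu, s = median(xs), mad(xs)
    for _ in range(max_iter):
        num = den = 0.0
        for x in xs:
            u = (x - mu) / s
            # the limit of phi(u)/u as u -> 0 is 1/c
            w = 1.0 / c if u == 0 else sine_phi(u, c) / u
            num += w * x
            den += w
        new_mu = num / den
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


print(sine_location([1, 2, 3, 4, 5, 100]))  # the point 100 receives zero weight
```

Because φ vanishes beyond cπ, an observation further than cπ·s from the current estimate contributes nothing, which is what makes the estimate resistant.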
TABLE 1. Asymptotic Variances of the Sine Estimate Compared with those of Two Trimmed Means. Distributions have been Rescaled to have Equal Interquartile Ranges

Distribution                    Sine estimate   Trimmed means
                                (c = 2.1)       10%      25%
Normal                          1.04            1.06     1.19
Laplace (double exponential)    1.38            1.41     1.16
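The entries of Table 1 can be cross-checked numerically. The sketch below assumes that s takes its asymptotic value median|X| = F⁻¹(.75) and that the Laplace distribution is rescaled to the interquartile range of the standard normal. It computes the asymptotic variance a²E[sin²(X/a)]/(E[cos(X/a)])² of the sine estimate (a = cs, integrals over |X| < aπ) by Simpson's rule, and the trimmed-mean variances from the usual closed form [∫_{|x|<q} x² dF + 2αq²]/(1 − 2α)² with q = F⁻¹(1 − α). All names are illustrative.

```python
import math
from statistics import NormalDist

ND = NormalDist()                   # standard normal
Q75 = ND.inv_cdf(0.75)              # upper quartile; also median|X| for both laws below
LAPLACE_SCALE = Q75 / math.log(2)   # Laplace scale with the same interquartile range


def laplace_pdf(x, lam=LAPLACE_SCALE):
    return math.exp(-abs(x) / lam) / (2 * lam)


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def sine_avar(pdf, c=2.1):
    """a^2 E[sin^2(X/a)] / (E[cos(X/a)])^2 over |X| < a*pi, with a = c*s and s = Q75."""
    a = c * Q75
    lim = a * math.pi
    e_sin2 = simpson(lambda x: math.sin(x / a) ** 2 * pdf(x), -lim, lim)
    e_cos = simpson(lambda x: math.cos(x / a) * pdf(x), -lim, lim)
    return a * a * e_sin2 / e_cos ** 2


def trimmed_avar_normal(alpha):
    q = ND.inv_cdf(1 - alpha)
    inside = (1 - 2 * alpha) - 2 * q * ND.pdf(q)      # integral of x^2 over |x| < q
    return (inside + 2 * alpha * q * q) / (1 - 2 * alpha) ** 2


def trimmed_avar_laplace(alpha, lam=LAPLACE_SCALE):
    q = -math.log(2 * alpha)                          # upper alpha-quantile, unit scale
    inside = 2 - math.exp(-q) * (q * q + 2 * q + 2)   # integral of x^2 over |x| < q
    return lam ** 2 * (inside + 2 * alpha * q * q) / (1 - 2 * alpha) ** 2


for name, pdf, tm in [("Normal", ND.pdf, trimmed_avar_normal),
                      ("Laplace", laplace_pdf, trimmed_avar_laplace)]:
    print(name, round(sine_avar(pdf), 2), round(tm(0.10), 2), round(tm(0.25), 2))
```

Under these assumptions the computed values agree with the tabulated ones to within about 0.01.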
…to finding a local maximum of the function Σ_i ψ((x_i − μ)/s), where φ(z) = −(d/dz)ψ(z). In this second form they may be extended to regression …

The parameters β may be estimated by the location of a local maximum of the function Σ_i ψ(r_i(b)/s(b)), where −ψ is the integral of (3.2), given by …
TABLE 2. Asymptotic Variance and Efficiency of the Sine Estimate Relative to the Arithmetic Mean for Some Distributions

Distribution    Variance                               Efficiency
                Sine estimate (c = 2.1)     Mean       Var(mean)/Var(sine)
One possible starting value would be b_0 = β̂_LS, the least-squares estimate of β. However, if the data are far from Gaussian, β̂_LS may be far from the global maximum and a distant, local maximum may be encountered.

In the location case the median was used as the starting value. A regression analogue of the median is developed in the next section. The estimate requires much computation but has a relatively high "breakdown point", so that many observations may be perturbed greatly with only slight changes occurring in the estimate. See Hampel (1971) for further details on this concept and Andrews et al. (1972, Chapter 5) for a finite sample definition.

5. REGRESSION BY MEDIANS

The model (4.2) may be written in vector form

$$y = X\beta + \sigma e = x_1\beta_1 + \cdots + x_k\beta_k + \sigma e,$$

where x_i denotes a column vector of X. We want to find an estimate of β,

$$\hat\beta = (\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k),$$

with a high breakdown point. Such an estimate may be defined in terms of the following generalized "sweep" operator R, designed to estimate and remove the dependence of one variable on another.

The operator is defined on a data matrix M which initially contains the raw data,

$$M = [X : y].$$

Then R_ij is defined to operate on the columns of this matrix by adjusting the jth column by a multiple of the ith column,

$$R_{ij}:\; M_j \leftarrow M_j - b\,M_i,$$

where the coefficient b is a function of M_j and M_i. Let x and y denote the columns M_i and M_j respectively. A least-squares sweep operator uses b defined by the least-squares regression of y on x. The particular robust operator we shall discuss uses a quantity b defined in the following three paragraphs.

Two groups may be formed by
i) sorting the data according to x_ij, j = 1, ..., n;
ii) setting aside two sets of p_1 n points each, corresponding to the largest and the smallest x_i;
iii) setting aside two sets of p_2 n points each, with x_i immediately above and below the median of {x_ij}.

The remaining points form two groups, which will be denoted by L and H, corresponding to those with Low and High values of x_i. Thus if, for example, n = 20, p_1 = .15 and p_2 = .1, group L contains x_(4), x_(5), ..., x_(8) and group H contains x_(13), x_(14), ..., x_(17) from the sorted x_(j), together with the associated values of y.

The quantity b is defined in terms of medians,

$$b = \frac{\operatorname{med}_H\{y_k\} - \operatorname{med}_L\{y_k\}}{\operatorname{med}_H\{x_k\} - \operatorname{med}_L\{x_k\}}.$$

In the example to follow p_1 = p_2 = 0. In this case up to 25% of the x's and/or the y's may be perturbed arbitrarily far without greatly affecting b. In general ¼ − ½(p_1 − p_2) of the x's and ¼ − ½(p_1 + p_2) of the y's may be so perturbed.

The operator R is non-linear and non-idempotent: repeated operation by R will change the result. In the least-squares technology the sweep operator is applied to the independent variables successively and then to the dependent variable. This may be done here. The first variable is used to modify the remaining k by applying

$$R_{1,k+1}\big(\cdots\big(R_{1,3}(R_{1,2}(M))\big)\cdots\big) = M^*.$$

Then the second variable is used to modify the following k − 1 variables by applying R to M*, the result of the previous operation:

$$R_{2,k+1}\big(\cdots\big(R_{2,4}(R_{2,3}(M^*))\big)\cdots\big).$$

This process may be continued for all the independent variables. The operation is non-linear. Typically further iteration is required, the number of iterations depending in part on the number of regressors. The sequence of operations is repeated m times. This sequence may be represented by the algorithm

    DO l = 1 to m
      DO i = 1 to k
        DO j = i+1 to k+1
          apply R_ij

Notationally we may express this sequence as …, where Π^(i) denotes repeated operation with i increasing. The estimated coefficients may be calculated conveniently by applying R not to M itself but to the augmented matrix

$$M^+ = \begin{bmatrix} X & y \\ I & 0 \end{bmatrix}.$$

The end result of the above procedure is a set of …
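Regression by medians can be sketched in a few lines of Python, assuming the grouping rule, the DO-loop order and the augmented matrix M⁺ described above. Several details are assumptions not taken from the paper: p_1 n and p_2 n are rounded to integers, the middle point is dropped when n is odd, and a constant (intercept) column is swept with b = median(y), the median analogue of removing a mean, since the group-median slope is undefined for a constant regressor. All names are hypothetical.

```python
from statistics import median


def group_median_slope(x, y, p1=0.0, p2=0.0):
    """b = (med_H y - med_L y) / (med_H x - med_L x) for the groups L and H."""
    n = len(x)
    order = sorted(range(n), key=lambda j: x[j])
    k1 = int(round(p1 * n))                    # set aside at each extreme of x
    k2 = int(round(p2 * n))                    # set aside on each side of the median
    low = order[k1:n // 2 - k2]
    high = order[(n + 1) // 2 + k2:n - k1]
    num = median(y[j] for j in high) - median(y[j] for j in low)
    den = median(x[j] for j in high) - median(x[j] for j in low)
    return num / den


def sweep_coefficient(x, y, p1, p2):
    # Assumption: a constant column is handled by the median of y.
    if max(x) == min(x):
        return median(y) / x[0]
    return group_median_slope(x, y, p1, p2)


def regression_by_medians(x_cols, y, m=2, p1=0.0, p2=0.0):
    """Apply R_ij (i < j) to M+ = [X y; I 0] for m passes; return (coef, residuals)."""
    n, k = len(y), len(x_cols)
    cols = [list(c) + [1.0 if r == i else 0.0 for r in range(k)]
            for i, c in enumerate(x_cols)]
    cols.append(list(y) + [0.0] * k)
    for _ in range(m):                         # DO l = 1 to m
        for i in range(k):                     # DO i = 1 to k
            for j in range(i + 1, k + 1):      # DO j = i+1 to k+1
                # b uses only the n data rows; the whole column, including
                # the identity block, is updated: M_j <- M_j - b M_i
                b = sweep_coefficient(cols[i][:n], cols[j][:n], p1, p2)
                cols[j] = [vj - b * vi for vi, vj in zip(cols[i], cols[j])]
    coef = [-v for v in cols[k][n:]]           # the identity block accumulates -beta
    residuals = cols[k][:n]
    return coef, residuals


xs = list(range(10))
ys = [2.0 + 3.0 * x for x in xs]
ys[9] = 1000.0                                 # one wild response
print(regression_by_medians([[1.0] * 10, xs], ys)[0])
```

The identity rows make the coefficients a by-product of the sweeps: every column remains a linear combination of the original columns, so the last column holds y − Xβ̂ in its data rows and −β̂ in its identity rows.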
… s(b_i) = median_j |r_j(b_i)|, … starting with b_0. In the example to follow, the function (4.3) was used with c = 1.5. This corresponds to a procedure giving less influence to large residuals than the c = 2.1 case studied in Andrews et al. (1972). The function to be optimized (6.1) is a sum of cosines. Since most of the arguments of the cosines are small, the sum is nearly quadratic. A Fletcher-Powell (1963) type of procedure is efficient for optimizing this sort of function.

Alternatively, any least-squares program may be used. This follows from noting that the maximum of (6.1) satisfies the system of k equations

$$\sum_j x_{jk}\,\psi'\big(r_j(b_i)/s(b_{i-1})\big) = 0,$$

which may be rewritten as

$$\sum_j x_{jk}\,w_j\,r_j(b_i) = 0, \qquad (6.2)$$

where w_j = ψ′(r_j(b_i)/s(b_{i−1}))/r_j(b_i). Since any proportionality constant may be ignored, it is easy to calculate

$$w_j = \begin{cases} \sin\!\big[r_j(b_i)/\{c\,s(b_{i-1})\}\big]\,/\,r_j(b_i), & |r_j(b_i)/s(b_{i-1})| < \pi c, \\ 0, & \text{otherwise.} \end{cases}$$

For fixed weights, the system of equations (6.2) is just the system of weighted least-squares equations. Thus the estimate may be easily calculated by
i) selecting an initial estimate b^(0),
ii) using this estimate to find residuals r(b^(0)), scale estimate s^(0) and weights w^(0), …

… The Gaussian likelihood ratio test of the hypothesis γ = γ_0 against the alternative γ ≠ γ_0 is based on a t statistic to measure the regression of r = r(β̂ + γ_0 d) on X′d. The statistic can be written

$$t = \frac{(n-1)^{1/2}\sum_i (x_i'd)\,r_i}{\Big(\sum_i (x_i'd)^2 \sum_i r_i^2 - \big[\sum_i (x_i'd)\,r_i\big]^2\Big)^{1/2}}.$$

A robust analogue of this test is based on the regression of φ{r/s(β̂)} on φ(x_i′d/s_d), where β̂ is a robust estimate of β and where s_d = median{|x_i′d|}. To prevent a small number of points from strongly affecting the test, both variables have been modified.

If s is given its asymptotic value, ½[F⁻¹(.75) − F⁻¹(.25)], for symmetric cumulative distributions F, the moments of φ(x/s) (where φ is as defined in (3.2)) are given in Table 3. The similarity of the even moments suggests that the φ(x_i/s) may be combined to form a statistic with a t distribution. In particular, the ratio μ_4/μ_2² is less than 3, the value of μ_4/μ_2² for normal variables. Gayen (1950) shows that under these conditions the F test for the ratio of variances is conservative. This suggests that the t test based on the regression of φ{r/s(β̂)} on φ(x_i′d/s_d) is conservative. The proposed test is based on the statistic

$$t^* = \frac{(m-1)^{1/2}\sum_i \varphi_i(d)\,\varphi_i(r)}{\Big(\sum_i \varphi_i^2(d)\sum_i \varphi_i^2(r) - \big[\sum_i \varphi_i(d)\,\varphi_i(r)\big]^2\Big)^{1/2}},$$

where φ_i(d) = φ(x_i′d/s_d) and φ_i(r) = φ(r_i/s(β̂)), all summations are taken over all i such that |x_i′d/s_d| < 2.1 and |r_i/s(β̂)| < 2.1, and m is the number of such terms. Since this quantity involves only m terms, the significance of t* may be conservatively assessed by comparing it to a t distribution with m − 1 degrees of freedom. Efron's (1969) results, while not exactly relevant, provide further grounds for confidence in the present approach. The test is only locally powerful. Extreme departures from the hypothesis may be assessed using a simpler test such as the sign test.

TABLE 3. Moments of φ(x/s) by Distribution [table body not recoverable]
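The Gaussian t and the robust statistic t* can be sketched as follows. The inputs are assumed to be computed elsewhere: the projections x_i′d, the residuals r_i = r_i(β̂ + γ_0 d) and the scale s(β̂). The cutoff 2.1 follows the text, and the leading factor of t* is taken as (m − 1)^{1/2} to match the m − 1 degrees of freedom; that choice, and all names, are assumptions of this sketch.

```python
import math
from statistics import median


def sine_phi(z, c=2.1):
    return math.sin(z / c) if abs(z) < c * math.pi else 0.0


def corr_t(a, b):
    """(n-1)^{1/2} S_ab / (S_aa S_bb - S_ab^2)^{1/2}: t for regressing b on a through the origin."""
    n = len(a)
    sab = sum(p * q for p, q in zip(a, b))
    saa = sum(p * p for p in a)
    sbb = sum(q * q for q in b)
    return math.sqrt(n - 1) * sab / math.sqrt(saa * sbb - sab * sab)


def robust_t_star(xd, r, s_beta, c=2.1, bound=2.1):
    """Return (t*, m - 1) using only terms with both scaled values inside the bound."""
    s_d = median(abs(v) for v in xd)
    kept = [(v / s_d, e / s_beta) for v, e in zip(xd, r)
            if abs(v / s_d) < bound and abs(e / s_beta) < bound]
    if len(kept) < 2:
        raise ValueError("need at least two retained terms")
    phi_d = [sine_phi(u, c) for u, _ in kept]
    phi_r = [sine_phi(w, c) for _, w in kept]
    return corr_t(phi_d, phi_r), len(kept) - 1


xd = [-2.0, -1.0, 0.0, 1.0, 2.0, 50.0]         # one extreme design point
r = [-1.1, -0.4, 0.1, 0.6, 0.9, -30.0]
print(corr_t(xd, r), robust_t_star(xd, r, s_beta=1.0))
```

In this toy example the single extreme point reverses the sign of the Gaussian t, while t* discards it and reflects the trend of the remaining five points.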
TABLE 4. Data from Operation of a Plant for the Oxidation of Ammonia to Nitric Acid
(columns: observation number, stack loss y, air flow x1, cooling-water inlet temperature x2, acid concentration x3)
1 42 80 27 89
2 37 80 27 88
3 37 75 25 90
4 28 62 24 87
5 18 62 22 87
6 18 62 23 87
7 19 62 24 93
8 20 62 24 93
9 15 58 23 87
10 14 58 18 80
11 14 58 18 89
12 13 58 17 88
13 11 58 18 82
14 12 58 19 93
15 8 50 18 89
16 7 50 18 86
17 8 50 19 72
18 8 50 19 79
19 9 50 20 80
20 15 56 20 82
21 15 70 20 91
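Fit (1), the ordinary least-squares fit of y on x1, x2, x3, can be reproduced from Table 4 with a short Python sketch. It assumes the columns are observation number, stack loss y, air flow x1, cooling-water inlet temperature x2 and acid concentration x3 (the usual layout of these stack-loss data). It solves the normal equations directly, which is adequate for four coefficients, though a QR decomposition would be numerically preferable. It also reports the observation with the largest absolute residual, for comparison with the probability plot of Figure 1. Names are illustrative.

```python
# Table 4, read as (observation, y, x1, x2, x3)
ROWS = [
    (1, 42, 80, 27, 89), (2, 37, 80, 27, 88), (3, 37, 75, 25, 90),
    (4, 28, 62, 24, 87), (5, 18, 62, 22, 87), (6, 18, 62, 23, 87),
    (7, 19, 62, 24, 93), (8, 20, 62, 24, 93), (9, 15, 58, 23, 87),
    (10, 14, 58, 18, 80), (11, 14, 58, 18, 89), (12, 13, 58, 17, 88),
    (13, 11, 58, 18, 82), (14, 12, 58, 19, 93), (15, 8, 50, 18, 89),
    (16, 7, 50, 18, 86), (17, 8, 50, 19, 72), (18, 8, 50, 19, 79),
    (19, 9, 50, 20, 80), (20, 15, 56, 20, 82), (21, 15, 70, 20, 91),
]


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for cc in range(col, n + 1):
                M[r][cc] -= f * M[col][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][cc] * x[cc] for cc in range(r + 1, n))) / M[r][r]
    return x


def least_squares(X, y):
    """Solve the normal equations X'X beta = X'y."""
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


X = [[1.0, x1, x2, x3] for _, _, x1, x2, x3 in ROWS]
y = [float(yy) for _, yy, _, _, _ in ROWS]
beta = least_squares(X, y)
residuals = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
worst = max(range(len(ROWS)), key=lambda i: abs(residuals[i]))
print("coefficients:", [round(b, 4) for b in beta])
print("largest |residual| at observation", ROWS[worst][0])
```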
(The estimated standard errors for the robust fits (3), (4) were obtained from the weighted
least-squares procedure described at the end of Section 6.)
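The weighted least-squares procedure of Section 6 can be sketched as iteratively reweighted least squares. At each step the weights are w_j = sin[r_j/(c s)]/r_j for |r_j/s| < cπ and 0 otherwise, with s taken from the previous iterate. This is a minimal sketch under stated assumptions. It starts from least squares for brevity, although the text warns that a median-based start is safer when the data are far from Gaussian. The stopping rule, the small dense solver and all names are illustrative, and the example data are synthetic.

```python
import math
from statistics import median


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for cc in range(col, n + 1):
                M[r][cc] -= f * M[col][cc]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][cc] * x[cc] for cc in range(r + 1, n))) / M[r][r]
    return x


def weighted_ls(X, y, w):
    """Solve sum_j w_j x_jk (y_j - x_j'b) = 0 for b."""
    p = len(X[0])
    A = [[sum(wj * row[a] * row[b] for row, wj in zip(X, w)) for b in range(p)] for a in range(p)]
    rhs = [sum(wj * row[a] * yj for row, yj, wj in zip(X, y, w)) for a in range(p)]
    return solve(A, rhs)


def residuals(X, y, b):
    return [yj - sum(bk * xk for bk, xk in zip(b, row)) for row, yj in zip(X, y)]


def sine_weight(r, s, c):
    """sin(r/(c s))/r inside |r/s| < c*pi, else 0; the limit at r = 0 is 1/(c s)."""
    if abs(r / s) >= c * math.pi:
        return 0.0
    return 1.0 / (c * s) if r == 0 else math.sin(r / (c * s)) / r


def sine_regression(X, y, c=1.5, tol=1e-10, max_iter=100):
    b = weighted_ls(X, y, [1.0] * len(y))      # b(0): least-squares start (assumption)
    s = median(abs(e) for e in residuals(X, y, b))
    for _ in range(max_iter):
        r = residuals(X, y, b)
        if s == 0:
            raise ValueError("zero scale estimate")
        w = [sine_weight(e, s, c) for e in r]
        b_new = weighted_ls(X, y, w)
        s = median(abs(e) for e in r)          # s(b_{i-1}) for the next step
        if max(abs(u - v) for u, v in zip(b_new, b)) < tol:
            return b_new, w
        b = b_new
    return b, w


xs = list(range(20))
X = [[1.0, float(x)] for x in xs]
y = [1.0 + 2.0 * x + 0.1 * math.sin(x) for x in xs]
y[5] = 100.0                                   # a gross error
b, w = sine_regression(X, y)
print(b, w[5])
```

Points whose scaled residuals exceed cπ receive zero weight, so in this example the gross error at index 5 is effectively removed from the fit.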
… with an associated residual root mean square error of 1.12. (Our values for these coefficients differ slightly from those of Daniel and Wood because of differences in our treatment of roundoffs.)

Most researchers do not have the insight and perseverance of these authors. However, the fitting procedure described in the previous sections, applied to the original data, yields similar results, as we shall show. If, following the suggestion of Daniel and Wood, the variable x1² is included in the fit, the residuals are further reduced.

The four fits, two least-squares fits by Daniel and Wood and two robust fits, are summarized in Table 5 and Table 6.

Fit (1) is the original least-squares fit. The probability plot of residuals from this fit, Figure 1, suggests that 1 point (21) deserves particular attention. Fit (2) is the least-squares fit to the data after the 4 points eventually set aside by Daniel and Wood have been removed from the fitting equation. The probability plot of the residuals, Figure 2, exhibits only slight anomalies.

Fit (3) is a robust fit with c = 1.5. The probability plot of residuals from this fit, Figure 3, identifies the 4 points. Fit (4) is the same fitting procedure applied to the data with the 4 points removed. Note that the fit is unaffected by the 4 points. The probability plot of the remaining residuals, Figure 4, is comparable to Figure 2.

The robust fitting procedure (3) has immediately and routinely led to the identification of 4 questionable points. The fit is independent of these points. As seen in Table 6, the coefficients of both robust fits (3 and 4) are well within the standard errors …
Figure 1. Probability plot of residuals from least-squares fit of x1, x2, x3 (horizontal axis: probability × 100%).
Figure 2. Probability plot of residuals from least-squares fit, 4 points omitted (horizontal axis: probability × 100%).
Figure 3. Probability plot of residuals from robust fit of x1, x2, x3 (horizontal axis: probability × 100%).

10. ACKNOWLEDGEMENTS

The author is grateful for the many helpful comments and suggestions for further investigation he has received from J. M. Chambers, C. L. Mallows and J. W. Tukey. This work was supported in part by the National Research Council of Canada. The referees have made many suggestions helpful in the revision of this paper.

REFERENCES

[1] ANDREWS, D. F. (1971). Significance tests based on residuals. Biometrika 58, 139-148.