Technometrics

ISSN: 0040-1706 (Print) 1537-2723 (Online) Journal homepage: http://amstat.tandfonline.com/loi/utch20

A Robust Method for Multiple Linear Regression

D. F. Andrews

To cite this article: D. F. Andrews (1974) A Robust Method for Multiple Linear Regression,
Technometrics, 16:4, 523-531

To link to this article: http://dx.doi.org/10.1080/00401706.1974.10489233

Published online: 09 Apr 2012.



TECHNOMETRICS, VOL. 16, NO. 4, NOVEMBER 1974

A Robust Method for Multiple Linear Regression


D. F. Andrews
Bell Laboratories
Murray Hill, New Jersey
and
University of Toronto
Toronto, Ontario

Techniques of fitting are said to be resistant when the result is not greatly altered when a small fraction of the data is altered; techniques of fitting are said to be robust of efficiency when their statistical efficiency remains high for conditions more realistic than the utopian cases of Gaussian distributions with errors of equal variance. These properties are particularly important in the formative stages of model building when the form of the response is not known exactly. Techniques with these properties are proposed and discussed.

KEY WORDS

Linear Regression
Multiple Regression
Robust Estimation
Least Squares
Least Absolute Deviations
Sine Estimate
Huber Estimate

1. INTRODUCTION

Much of statistical computing is done on linear regression models. The linear regression program accounts for approximately one half of the number of uses of the UCLA BMD programs at the University of Toronto. If analysis of variance is included as a special case of linear regression this fraction is increased. Currently regression models are being applied widely in Linguistics, Sociology and History. Almost every discipline is making use of regression analysis.

Least-squares is an optimal procedure in many senses when the errors in a regression model have a Gaussian distribution or when linear estimates are required (Gauss-Markov Theorem). Least-squares is very far from optimal in many non-Gaussian situations with longer tails (see Andrews et al. 1972, Chapter 7 for further discussion). It is unlikely that the use of least squares is desirable in all instances. Some alternative to least squares is required. A recent study (Andrews et al. 1972) clearly demonstrates the inefficiency of least-squares relative to more robust estimates of location for a wide variety of distributions.

Even in careful experimental work, where errors are frequently assumed to be nearly Gaussian, alternatives may be required. If the form of the model is not known exactly, then a least squares fit to a hypothesized, invalid model may obscure the inappropriateness of this model. This inappropriateness may be revealed in certain plots of residuals. However the appreciation of such plots requires much skill and judgement, perhaps more than can be expected of the user in a non-mathematical area (see Andrews (1971) for examples). A robust fit may leave several residuals much larger, more clearly indicating that something is wrong. See the example in Section 8 for an illustration of this.

Procedures have been developed and will be described below which are resistant to gross deviations of a small number of points and relatively efficient over a broad range of distributions. If the data is Gaussian they will yield, with high probability, results very similar to those of a least squares analysis.

2. ROBUST REGRESSION: SOME KNOWN APPROACHES

Least-squares calculations have received much attention from numerical specialists. Golub and Reinsch (1970), Wilkinson (1970) and others have proposed procedures with very good computational properties. Non-linear least-squares has also received much attention from Marquardt (1963) and others. To date there seems to have been relatively little work done on other methods. Gentleman (1965) and Forsythe (1972) have considered algorithms for minimizing the sum of pth powers of residuals, a generalization of least squares. Recently some aspects of rank procedures have been discussed by Jurečková (1971) and Jaeckel (1972). Relles (1968) has studied regression extensions of Huber's (1964) estimates.

Received Jan. 1973; revised Feb. 1974

Many multiple regression estimation procedures

maximize a function and many involve operations that sequentially treat one variable at a time. Non-Gaussian maximum likelihood estimates are obtained by numerically maximizing a function of the parameters. The same method is used in other approaches (see Jaeckel 1972). However least-squares calculations, or equivalently Gaussian maximum likelihood calculations, lead to the solution of systems of linear equations. These are usually solved by applying a series of operators that "eliminate" each variable in succession.

In the proposed method an operator is defined that operates on one variable at a time. It is used to determine the starting point of a maximization procedure.

3. SOME RECENT NEW RESULTS ON ESTIMATES OF LOCATION

In a recent work (Andrews, Bickel, Hampel, Huber, Rogers and Tukey 1972) some new estimates of location were studied which had high efficiency for the Gaussian distribution and strong robustness under extreme departures from normality. These estimates may be usefully extended to regression situations.

An estimate μ̂ of location may be defined for a set of numbers x_1, …, x_n as a solution of the equation

Σ_i φ{(x_i − μ)/s(x)} = 0 (3.1)

where s(x) is an estimate of spread. Such an estimate is called an M-estimate (Huber (1964)). If the density function for x is a member of the location family p(x; μ, σ) = (1/σ)f([x − μ]/σ), Equation 3.1 is the maximum likelihood equation for μ with s = σ and φ = −f′/f.

The form of the function φ and the definition of the scale parameter s determine the properties of μ̂. Huber (1964) proposed solving for μ using φ defined by

φ(z) = −k for z < −k; φ(z) = z for |z| ≤ k; φ(z) = k for z > k,

with s determined simultaneously. Hampel (in Andrews et al. 1972) suggested a class of estimates for location based on a function φ of the form

φ(z) = z for |z| ≤ a;
φ(z) = a sgn(z) for a < |z| ≤ b;
φ(z) = a sgn(z)(c − |z|)/(c − b) for b < |z| ≤ c;
φ(z) = 0 for |z| > c,

where a < b < c and s is defined by

s = median {|x_i − median {x_j}|}.

In the same reference Andrews developed a SINE estimate using

φ(z) = sin(z/c) for |z| ≤ cπ; φ(z) = 0 for |z| > cπ (3.2)

and s as before. The estimate with c = 2.1 was studied.

The asymptotic variance of an estimate of this type may be readily evaluated for a density function f symmetric about μ and with interquartile range σ (see Huber (1964), Andrews et al. (1972)) by the formula

var (μ̂) = (s²/n) ∫ φ²{(x − μ)/s} f(x) dx / [∫ φ′{(x − μ)/s} f(x) dx]²

where φ′(z) denotes the derivative of φ with respect to z. This asymptotic variance is recorded in Table 1 below along with the asymptotic variances of two other commonly used robust estimators, the 10% trimmed mean and the 25% trimmed mean. The distributions used were standardized so that their interquartile ranges were equal to the corresponding value for the normal distribution. Table 2 presents the efficiency of the SINE estimate relative to the arithmetic mean.

The SINE estimate is more efficient for the Normal distribution and neither trimmed mean is more efficient for all of the other long tailed distributions. The variance depends also on c. As c increases the estimate and its properties tend to those of least-squares. For exploratory work more robust estimates with c = 1.5 or 1.8 are recommended.

Many robust estimates have to be computed using an iterative procedure. In most cases the SINE estimate requires only one iteration from the starting point μ_0, the median of the x_i, since the first order expansion of (3.1) can be solved in closed form,

tan {(μ̂ − μ_0)/(cs)} = Σ sin {(x_i − μ_0)/(cs)} / Σ cos {(x_i − μ_0)/(cs)} (3.3)

if the set of x_i satisfying |x_i − μ_0| ≤ cπs is the same as the set satisfying |x_i − μ̂| ≤ cπs (both summations are over this set).

4. EXTENSIONS TO THE REGRESSION PROBLEM

The M-estimates for location are defined to be solutions of the equation (3.1) where s is determined somehow, perhaps simultaneously. This is equivalent




TABLE 1—Asymptotic Variances of the Sine Estimate Compared with those of Two Trimmed Means. Distributions have been Rescaled to have Equal Interquartile Ranges

                                  SINE ESTIMATE   TRIMMED MEANS
DISTRIBUTION                      c = 2.1         10%     25%
Normal                            1.04            1.06    1.19
Cauchy                            1.31            2.17    1.15
Logistic                          1.15            1.14    1.19
Laplace or double exponential     1.38            1.41    1.16
t4                                1.19            1.19    1.18

to finding a local maximum of the function Σ_i ψ{(x_i − μ)/s} where φ(z) = −(d/dz) ψ(z). In this second form they may be extended to regression models since x_i − μ may be considered as a residual, r_i, and s as a scale statistic. The estimate is defined as the values of the parameters for which

Σ_i ψ{r_i/s}, (4.1)

a function of the corresponding residuals, attains a local maximum. Relles (1968) uses this method with convex ψ.

Consider the model

y_i = x_i1 β_1 + x_i2 β_2 + … + x_ik β_k + σe_i = x_i′β + σe_i (4.2)

where β is a vector of unknown parameters, x_i′ is a row vector of independent variables, σ is an unknown scale parameter and e_i is a residual. Given any k-vector b the residuals

r_i(b) = y_i − x_i′b

may be formed. A robust scale estimate can be defined by

s(b) = median {|r_i(b)|}.

The parameters β may be estimated by the location of a local maximum of the function Σ_i ψ{r_i(b)/s(b)} where −ψ is an integral of (3.2), given by

ψ(z) = 1 + cos(z/c) for |z| ≤ cπ; ψ(z) = 0 for |z| > cπ. (4.3)

The particular local maximum found by an iterative optimization program will depend on the starting value b_0 and on the numerical maximization procedure used.

If the parameter estimate b̂ is not to be greatly influenced by a few data points which are far from the regression plane then, in general, ψ(z) must be bounded and tend to a constant and hence, for smooth ψ,

lim_{|z|→∞} ψ′(z) = 0.

Hampel (1971) in a study of general properties of this kind notes the desirability of this property. As a result of this constraint it follows that there can be more than one local maximum of (4.1). Hence the choice of the starting point b_0 may be important.

TABLE 2—Asymptotic Variance and Efficiency of the Sine Estimate Relative to the Arithmetic Mean for Some Distributions

               VARIANCE                          EFFICIENCY
DISTRIBUTION   SINE ESTIMATE (c = 2.1)   MEAN    VAR(MEAN)/VAR(SINE)
Normal         1.04                      1.0     0.96
Cauchy         1.31                      ∞       ∞
Logistic       1.15                      1.24    1.08
Laplace        1.38                      1.89    1.37
t4             1.19                      1.65    1.39
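The one-step sine estimate of location whose variances appear above can be sketched as follows. This is an illustrative reconstruction of the closed form (3.3) (the function name is ours), assuming NumPy and a positive median absolute deviation:

```python
import numpy as np

def sine_location(x, c=2.1):
    """One-step sine M-estimate of location started from the median (Section 3).

    Solves tan{(mu - mu0)/(c s)} = sum sin / sum cos, with the sums taken over
    the points satisfying |x_i - mu0| <= c*pi*s, as in (3.3).
    """
    x = np.asarray(x, dtype=float)
    mu0 = np.median(x)
    s = np.median(np.abs(x - mu0))      # s = median |x_i - median{x_j}|
    z = (x - mu0) / (c * s)
    keep = np.abs(z) <= np.pi           # points inside the sine's support
    step = np.arctan2(np.sin(z[keep]).sum(), np.cos(z[keep]).sum())
    return mu0 + c * s * step
```

For a symmetric sample the correction step is zero and the estimate coincides with the median; a gross outlier falls outside the support and is ignored entirely.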




One possible starting value would be b_0 = b̂_LS, the least-squares estimate of β. However if the data is far from Gaussian, b̂_LS may be far from the global maximum and a distant, local, maximum may be encountered.

In the location case the median was used as the starting value. A regression analogue of the median is developed in the next section. The estimate requires much computation but has a relatively high "breakdown point", so that many observations may be perturbed greatly with only slight changes occurring in the estimate. See Hampel (1971) for further details on this concept and Andrews et al. (1972, Chapter 5) for a finite sample definition.

5. REGRESSION BY MEDIANS

The model (4.2) may be written in vector form

y = Xβ + σe = x_1 β_1 + … + x_k β_k + σe

where x_j denotes a column vector of X. We want to find an estimate of β,

b = (b_1, b_2, …, b_k),

with a high breakdown point. Such an estimate may be defined in terms of the following generalized "sweep" operator R designed to estimate and remove the dependence of one variable on another.

The operator is defined on a data matrix M which initially contains the raw data,

M = [X : y].

Then R_ij is defined to operate on the columns of this matrix by adjusting the jth column by a multiple of the ith column,

R_ij : M_j ← M_j − bM_i,

where the coefficient b is a function of M_i and M_j. Let x and y denote the columns M_i and M_j respectively. A least-squares sweep operator uses b defined by the least-squares regression of y on x. The particular robust operator we shall discuss uses a quantity b defined in the following three paragraphs.

Two groups may be formed by

i) sorting the data according to x_i, i = 1, …, n
ii) setting aside two sets of p_1 n points each corresponding to the largest and the smallest x_i
iii) setting aside two sets of p_2 n points each with x_i immediately above and below the median {x_i}.

The remaining points form two groups which will be denoted by L and H corresponding to those with Low and High values of x_i. Thus if, for example, n = 20, p_1 = .15 and p_2 = .1, group L contains x_(4), x_(5), …, x_(8) and group H contains x_(13), x_(14), …, x_(17) from the sorted x_(i), together with the associated values of y.

The quantity b is defined in terms of medians (med):

b = (med_H {y_i} − med_L {y_i}) / (med_H {x_i} − med_L {x_i}).

In the example to follow p_1 = p_2 = 0. In this case up to 25% of the x's and/or the y's may be perturbed arbitrarily far without greatly affecting b. In general ¼ − ½(p_1 − p_2) of the x's and ¼ − ½(p_1 + p_2) of the y's may be so perturbed.

The operator R is non-linear and non-idempotent. Repeated operation by R will change the result. In the least-squares technology the sweep operator is applied to the independent variables successively and then to the dependent variable. This may be done here. The first variable is used to modify the remaining k by applying

R_{1,k+1}(… (R_{1,3}(R_{1,2}(M))) …) = M*.

Then the second variable is used to modify the following k − 1 variables by applying R to M*, the result of the previous operation:

R_{2,k+1}(… (R_{2,4}(R_{2,3}(M*))) …).

This process may be continued for all the independent variables. The operation is non-linear. Typically further iteration is required, the number of iterations depending in part on the number of regressors. The sequence of operations is repeated m times. This sequence may be represented by the algorithm

DO l = 1 to m
  DO i = 1 to k
    DO j = i + 1 to k + 1
      apply R_ij.

Notationally we may express this sequence as

[Π_{l=1}^{m} Π_{i=1}^{k} Π_{j=i+1}^{k+1} R_ij](M)

where Π_i denotes repeated operation with i increasing. The estimated coefficients may be calculated conveniently by applying R not to M itself but to the augmented matrix

M+ = [M over (I : 0)],

in which a k × k identity matrix is appended below X and a zero vector below y. The end result of the above procedure is a set of



parameter estimates

b_m′ = −(M+_{n+1,k+1}, …, M+_{n+k,k+1})

and a residual vector r = r(b_m) = y − Xb_m where

r = (M+_{1,k+1}, …, M+_{n,k+1}).

It can be shown that this procedure has at least one fixed point. Round-off errors may make this computationally unattainable. However the procedure is used only to get a crude starting point for a subsequent optimization. In the computations the sequence of operations was repeated m = [k/2] + 2 times.

6. IMPROVING THE INITIAL ESTIMATE

The repeated use of the R operator yields crude residuals and b_0, a crude estimate of the parameters. These may be used as a starting point for a further iteration designed to improve the efficiency of the procedure.

This may be done by maximizing the function

Σ_i ψ{r_i(b_j)/s(b_{j−1})} (6.1)

(which is analogous to (4.1)) with respect to b_j, where

s(b_j) = median {|r_i(b_j)|},

starting with b_0. In the example to follow the function (4.3) was used with c = 1.5. This corresponds to a procedure giving less influence to large residuals than the c = 2.1 case studied in Andrews et al. (1972). The function to be optimized (6.1) is a sum of cosines. Since most of the arguments of the cosines are small, the sum is nearly quadratic. A Fletcher-Powell (1963) type of procedure is efficient for optimizing this sort of function.

Alternatively any least-squares program may be used. This follows from noting that the maximum of (6.1) satisfies the system of k equations

Σ_i x_ik ψ′{r_i(b_j)/s(b_{j−1})} = 0

which may be rewritten as

Σ_i x_ik w_i² r_i(b_j) = 0 (6.2)

where

w_i² = sin {r_i(b_j)/(c s(b_{j−1}))}/r_i(b_j) for |r_i(b_j)/s(b_{j−1})| ≤ πc; w_i² = 0 otherwise.

Since any proportionality constant may be ignored the weights are easy to calculate, and the system of equations (6.2) is just the system of weighted least squares equations. Thus the estimate may be easily calculated by

i) selecting an initial estimate b^(0),
ii) using this estimate to find residuals r(b^(0)), the scale estimate s^(0) and weights w^(0),
iii) solving the least-squares equations associated with the model

y_j = x_j′β + noise

with weights w_j for the next estimate in the iteration. If a weighted least-squares program is not available the system

w_j y_j = w_j x_j′b + noise

may be solved using ordinary least-squares.

It is not obvious that iterating until convergence is achieved is desirable. In Andrews et al. (1972) it was found that estimates based on a good starting point and a finite number of iterations had in some instances better properties than fully iterated estimates.

7. INFERENCE AND TESTS

We propose interval estimates or test procedures for one-dimensional subspaces of the parameter space. We consider parameter values some distance from θ_0 in a direction d and find a confidence interval for γ in the expression

θ = θ_0 + γd.

The Gaussian likelihood ratio test of the hypothesis γ = γ_0 against the alternative γ ≠ γ_0 is based on a t statistic to measure the regression of r = r(θ_0 + γ_0 d) on Xd. The statistic can be written

t = (n − 1)^{1/2} Σ(x_i′d r_i) / {Σ(x_i′d)² Σr_i² − [Σ(x_i′d r_i)]²}^{1/2}.

A robust analogue of this test is based on the regression of φ{r/s(θ̂)} on φ{x_i′d/s_d} where θ̂ is a robust estimate of θ and where s_d = median {|x_i′d|}. To prevent a small number of points from strongly affecting the test both variables have been modified.

If s is given its asymptotic value, ½[F⁻¹(.75) − F⁻¹(.25)], for symmetric cumulative distributions F, the moments of φ(x/s) (where φ is as defined in (3.2)) are given in Table 3. The similarity of the even moments suggests that the φ(x_i/s) may be combined to form a statistic with a t distribution. In particular, the ratio μ_4/μ_2² is less than 3, the value of μ_4/μ_2² for normal variables. Gayen (1950) shows that under these conditions the F test for the ratio of variances is conservative. This suggests that the t test based on the regression of φ{r/s(θ̂)} on φ{x_i′d/s_d} is conservative. The proposed test is based on the statistic

t* = (m − 1)^{1/2} Σφ_i(d)φ_i(r) / {Σφ_i²(d) Σφ_i²(r) − [Σφ_i(d)φ_i(r)]²}^{1/2}

where

φ_i(d) = φ(x_i′d/s_d) and φ_i(r) = φ(r_i/s(θ̂)),

all summations are taken over all i such that |x_i′d/s_d| < 2.1 and |r_i/s(θ̂)| < 2.1, and m is the number of such terms. Since this quantity involves




TABLE 3—Moments of φ(x/s) where 2s = F⁻¹(0.75) − F⁻¹(0.25)

DISTRIBUTION   μ2     μ4     μ6     μ4/μ2²   μ6/μ2³
Normal         0.32   0.19   0.14   1.94     4.55
Cauchy         0.26   0.17   0.13   2.44     7.27
Logistic       0.32   0.20   0.15   1.96     4.65
Laplace        0.32   0.21   0.17   2.05     4.98
t4             0.32   0.20   0.15   2.00     4.83

only m terms, the significance of t* may be conservatively assessed by comparing it to a t distribution with m − 1 degrees of freedom. Efron's (1969) results, while not exactly relevant, provide further grounds for confidence in the present approach. The test is only locally powerful. Extreme departures from the hypothesis may be assessed using a simpler test such as the sign test.
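A sketch of the t* computation, with the trimming at 2.1 as stated above. The function names and array-based interface are ours, not the paper's; φ is the sine function (3.2) with c = 2.1:

```python
import numpy as np

def phi(z, c=2.1):
    """The sine function (3.2): sin(z/c) for |z| <= c*pi, and 0 outside."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) <= c * np.pi, np.sin(z / c), 0.0)

def robust_t(xd, r, s, c=2.1):
    """The statistic t* of Section 7 (an illustrative sketch).

    xd : the values x_i'd for a direction d;  r : residuals from a robust fit;
    s  : the robust residual scale s(theta-hat).  Returns (t*, m); t* is
    referred to a t distribution on m - 1 degrees of freedom.
    """
    xd, r = np.asarray(xd, float), np.asarray(r, float)
    sd = np.median(np.abs(xd))                        # s_d = median |x_i'd|
    keep = (np.abs(xd / sd) < c) & (np.abs(r / s) < c)
    pd = phi(xd[keep] / sd, c)
    pr = phi(r[keep] / s, c)
    m = int(keep.sum())
    num = (pd * pr).sum()
    den = np.sqrt((pd ** 2).sum() * (pr ** 2).sum() - num ** 2)
    return float(np.sqrt(m - 1) * num / den), m
```

Observations extreme in either the direction x_i'd or the residual are dropped before the correlation-type statistic is formed, which is what makes the test resistant.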

TABLE 4—Data from Operation of a Plant for the Oxidation of Ammonia to Nitric Acid

Observation   Stack   Air Flow   Cooling Water           Acid
Number        Loss    x1         Inlet Temperature x2    Concentration x3
              y

1 42 80 27 89

2 37 80 27 88
3 37 75 25 90
4 28 62 24 87
5 18 62 22 87
6 18 62 23 87
7 19 62 24 93
8 20 62 24 93
9 15 58 23 87
10 14 58 18 80
11 14 58 18 89
12 13 58 17 88
13 11 58 18 82
14 12 58 19 93
15 8 50 18 89
16 7 50 18 86
17 8 50 19 72
18 8 50 19 79

19 9 50 20 80
20 15 56 20 82
21 15 70 20 91
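The resistant coefficient b of Section 5, the building block of the median sweep, is simple to compute for data such as Table 4. A minimal sketch, with our own function name and the trimming conventions read off the n = 20 example of Section 5:

```python
import numpy as np

def median_slope(x, y, p1=0.0, p2=0.0):
    """The resistant coefficient b of Section 5.

    Sort by x, set aside the p1*n smallest and largest points and the p2*n
    points on either side of the median of x, and take
        b = (med_H{y} - med_L{y}) / (med_H{x} - med_L{x})
    over the remaining Low and High groups.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    order = np.argsort(x)
    x, y = x[order], y[order]
    k1, k2 = int(p1 * n), int(p2 * n)
    half = n // 2
    lo = slice(k1, half - k2)              # group L: low x, trimmed
    hi = slice(n - half + k2, n - k1)      # group H: high x, trimmed
    return float((np.median(y[hi]) - np.median(y[lo]))
                 / (np.median(x[hi]) - np.median(x[lo])))
```

With n = 20, p1 = .15 and p2 = .1 this reproduces the groups x_(4), …, x_(8) and x_(13), …, x_(17) described in Section 5; taking medians within each group is what gives the estimate its high breakdown point.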




8. EXAMPLE

Daniel and Wood (1971, Chapter 5) consider in some detail an example with 21 observations and 3 independent variables. The example is based on data from Brownlee (1965, Section 13.12). The data are also presented in Draper and Smith (1966, Chapter 6) and given here in Table 4. Daniel and Wood note anomalies in the plot of residuals from a standard least-squares regression fit. From a normal probability plot of these residuals it is apparent that one observation (21) has an abnormally large residual. This observation has altered the coefficients of the fitted model considerably. After much careful work on this and other aspects Daniel and Wood set aside this observation and three others (1, 3, 4) and present an explanation for the unusual behaviour of these points. They then fit the variables x1, x2 and x1² to the remaining points to obtain the equation

y = -15.4 - 0.07x1 + 0.53x2 + 0.0068x1²

TABLE 5—Response and Residuals from Various Fits

                          Residuals
Observation   Response    Least-Squares                Robust Fit c = 1.5
Number                    with 1,3,4,21   without      with 1,3,4,21   without

1 42 3.24 6.08 6.11 6.11

2 37 -1.92 1.15 1.04 1.04



3 37 4.56 6.44 6.31 6.31

4 28 5.70 8.18 8.24 8.24

5 18 -1.71 -0.67 -1.24 -1.24

6 18 -3.01 -1.25 -0.71 -0.71

7 19 -2.39 -0.42 -0.33 -0.33

8 20 -1.39 0.58 0.67 0.67

9 15 -3.14 -1.06 -0.97 -0.97

10 14 1.27 0.35 0.14 0.14

11 14 2.64 0.96 0.79 0.79

12 13 2.78 0.47 0.24 0.24

13 11 -1.43 -2.51 -2.71 -2.71

14 12 -0.05 -1.34 -1.44 -1.44

15 8 2.36 1.34 1.33 1.33

16 7 0.91 0.14 0.11 0.11

17 8 -1.52 -0.37 -0.42 -0.42

18 8 -0.46 0.10 0.08 0.08

19 9 -0.60 0.59 0.63 0.63

20 15 1.41 1.93 1.87 1.87

21 15 -7.24 -8.63 -8.91 -8.91


Residuals given in italics come from points not included in the fitting procedure.




TABLE 6—Coefficients and Estimated Standard Errors

FIT

(1) E(y) = -39.9 + 0.72x1 + 1.30x2 - 0.15x3
    (S.E. Coef.)     (0.17)   (0.37)   (0.16)

(2) E(y) = -37.6 + 0.80x1 + 0.58x2 - 0.07x3
    (S.E. Coef.)     (0.07)   (0.17)   (0.06)

(3) & (4) E(y) = -37.2 + 0.82x1 + 0.52x2 - 0.07x3
    (S.E. Coef.)           (0.05)   (0.12)   (0.04)

(The estimated standard errors for the robust fits (3), (4) were obtained from the weighted least-squares procedure described at the end of Section 6.)
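The weighted least-squares iteration of Section 6, with the sine weights and c = 1.5, can be sketched as follows. This is a minimal illustration, not the author's code; in the full procedure the starting value b0 would come from the median sweep of Section 5:

```python
import numpy as np

def sine_weights(r, s, c=1.5):
    """Weights with w_i^2 = sin{r_i/(c s)}/r_i for |r_i/s| <= pi*c, 0 otherwise."""
    r = np.asarray(r, dtype=float)
    safe_r = np.where(r == 0.0, 1.0, r)
    # the ratio sin(z)/r tends to 1/(c s) as r -> 0
    w2 = np.where(r == 0.0, 1.0 / (c * s), np.sin(r / (c * s)) / safe_r)
    w2 = np.where(np.abs(r / s) <= np.pi * c, w2, 0.0)
    return np.sqrt(w2)

def robust_fit(X, y, b0, c=1.5, iters=3):
    """A few weighted least-squares steps from the starting value b0 (Section 6)."""
    b = np.asarray(b0, dtype=float)
    for _ in range(iters):
        r = y - X @ b
        s = np.median(np.abs(r))            # s(b) = median |r_i(b)|
        w = sine_weights(r, s, c)
        # solving w_i y_i = w_i x_i' b + noise by ordinary least-squares
        b, *_ = np.linalg.lstsq(w[:, None] * X, w * y, rcond=None)
    return b
```

Points with |r_i/s| beyond πc receive weight zero, so gross outliers drop out of the fit entirely; only a small, fixed number of iterations is taken, in line with the remark in Section 6 that full convergence is not necessarily desirable.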

with an associated residual root mean square error of 1.12. (Our values for these coefficients differ slightly from those of Daniel and Wood because of differences in our treatment of roundoffs.)

Most researchers do not have the insight and perseverance of these authors. However the fitting procedure described in the previous sections, applied to the original data, yields similar results, as we shall show. If, following the suggestion of Daniel and Wood, the variable x1² is included in the fit, the residuals are further reduced.

The four fits, two least-squares fits by Daniel and Wood and two robust fits, are summarized in Table 5 and Table 6.

Fit (1) is the original least-squares fit. The probability plot of residuals from this fit, Figure 1, suggests that 1 point (21) deserves particular attention. Fit (2) is the least-squares fit to the data after the 4 points eventually set aside by Daniel and Wood have been removed from the fitting equation. The probability plot of the residuals, Figure 2, exhibits only slight anomalies.

Fit (3) is a robust fit with c = 1.5. The probability plot of residuals from this fit, Figure 3, identifies the 4 points. Fit (4) is the same fitting procedure applied to the data with the 4 points removed. Note that the fit is unaffected by the 4 points. The probability plot of the remaining residuals, Figure 4, is comparable to Figure 2.

The robust fitting procedure (3) has immediately and routinely led to the identification of 4 questionable points. The fit is independent of these points. As seen in Table 6, the coefficients of both robust fits (3 and 4) are well within the standard errors

FIGURE 1—Probability Plot of Residuals from Least-Squares Fit of x1, x2, x3.

FIGURE 2—Probability Plot of Residuals from Least-Squares Fit, 4 Points Omitted.


of the coefficients of the least squares fit (2) with points 1, 3, 4 and 21 deleted.

The robust fitting procedure does not directly suggest any modifications of the original model as suggested by Daniel and Wood. However by providing residuals uncontaminated by the effects of the anomalous observations it gives the analyst a better chance to discover such improvements.

FIGURE 3—Probability Plot of Residuals from Robust Fit of x1, x2, x3.

FIGURE 4—Probability Plot of Residuals from Robust Fit, 4 Points Omitted.

9. CONCLUSION

A method for estimation and testing in robust regression has been developed. The method requires a crude, safe, initial fit which is refined to yield a procedure relatively efficient for near Gaussian data. The procedure is iterative and, compared with least-squares, relatively expensive to compute. On the other hand the procedure is insensitive to moderate numbers of extreme observations, with the result that these may be readily detected by examining residuals, and further calculation with these values set aside may not be necessary. However the principal advantage lies in the detection of observations to be studied further.

10. ACKNOWLEDGEMENTS

The author is grateful for the many helpful comments and suggestions for further investigation he has received from J. M. Chambers, C. L. Mallows and J. W. Tukey. This work was supported in part by the National Research Council of Canada. The referees have made many suggestions helpful in the revision of this paper.

REFERENCES

[1] ANDREWS, D. F. (1971). Significance tests based on residuals. Biometrika 58, 139-148.
[2] ANDREWS, D. F., BICKEL, P. J., HAMPEL, F. R., HUBER, P. J., ROGERS, W. H. and TUKEY, J. W. (1972). Robust Estimates of Location: Survey and Advances. Princeton Univ. Press.
[3] BROWNLEE, K. A. (1965). Statistical Theory and Methodology in Science and Engineering (2nd edition). Wiley, New York.
[4] DANIEL, C. and WOOD, F. S. (1971). Fitting Equations to Data. Wiley, New York.
[5] DRAPER, N. R. and SMITH, H. (1966). Applied Regression Analysis. Wiley, New York.
[6] EFRON, B. (1969). Student's t-test under symmetry conditions. J. Amer. Statist. Assoc. 64, 1278-1302.
[7] FLETCHER, R. and POWELL, M. J. D. (1963). A rapidly convergent descent method for minimization. Computer J. 6, 163-168.
[8] FORSYTHE, A. B. (1972). Robust estimation of straight line regression coefficients by minimizing p-th power deviations. Technometrics 14, 159-166.
[9] GAYEN, A. K. (1950). The distribution of the variance ratio in random samples of any size drawn from non-normal universes. Biometrika 37, 236-255.
[10] GENTLEMAN, W. M. (1965). Robust estimation of multivariate location by minimizing p-th power deviations. Unpublished Ph.D. thesis, Princeton University.
[11] GOLUB, G. H. and REINSCH, C. H. (1970). Singular value decomposition and least squares solutions. Numer. Math. 14, 403-420.
[12] HAMPEL, F. R. (1971). A general qualitative definition of robustness. Ann. Math. Statist. 42, 1887-1896.
[13] HUBER, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35, 73-101.
[14] JAECKEL, L. A. (1972). Estimating regression coefficients by minimizing the dispersion of the residuals. Ann. Math. Statist. 43, 1449-1458.
[15] JUREČKOVÁ, J. (1971). Nonparametric estimate of regression coefficients. Ann. Math. Statist. 42, 1328-1338.
[16] MARQUARDT, D. W. (1963). An algorithm for least-squares estimation of non-linear parameters. J. Soc. Ind. Appl. Math. 11, 431-441.
[17] RELLES, D. A. (1968). Robust Regression by Modified Least-Squares. Unpublished Ph.D. thesis, Yale University.
[18] WILKINSON, G. N. (1970). A general recursive procedure for analysis of variance. Biometrika 57, 19-46.
