MULTIPLE REGRESSION
The model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{p-1} X_{p-1} + \varepsilon \qquad (4.1)$$
is called a multiple linear regression model with p-1 predictor variables. $\beta_0, \beta_1, \dots, \beta_{p-1}$ are constant parameters called the regression coefficients. The parameter $\beta_k$ represents the expected change in the response Y per unit change in $X_k$ when all the remaining regressor variables $X_i$ ($i \neq k$) are held constant. For this reason the parameters $\beta_k$, k = 1, 2, ..., p-1, are often called partial regression coefficients. The error variables $\varepsilon_i$ are independent $N(0, \sigma^2)$.
Models that are more complex in structure than (4.1) may often still be analyzed
by multiple linear regression techniques. For example, consider the cubic polynomial
model
$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon \qquad (4.2)$$
If we let $x_1 = x$, $x_2 = x^2$ and $x_3 = x^3$, then (4.2) can be written as
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \qquad (4.3)$$
which is a multiple linear regression model with three regressor variables.
Models that include interaction effects may also be analyzed by multiple
regression models. For example, suppose that the model is
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon \qquad (4.4)$$
If we let x3 = x1 x2 and β3 = β12 , then (4.4) can be written as
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$$
which is a linear regression model. In general, any regression model that is linear in the
parameters (β's) is a linear regression model regardless of the shape of the surface that
it generates.
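To make this concrete, here is a minimal Python sketch (with simulated data, not data from the text) that fits the interaction model (4.4) by simply creating the extra column $x_3 = x_1 x_2$ and running ordinary least squares:

```python
# Minimal sketch: the interaction model (4.4) fitted as a multiple linear
# regression once x3 = x1*x2 is created. The data below are simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 30
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2 + 1.5*x1 - 0.8*x2 + 0.3*x1*x2 + rng.normal(0, 1.0, n)

x3 = x1 * x2                                    # interaction as a new regressor
X = np.column_stack([np.ones(n), x1, x2, x3])   # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # estimates of beta0, beta1, beta2, beta12
```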
With n independent observations on Y and the associated values of xi, the
complete model becomes
$$\begin{aligned}
Y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_{p-1} x_{1,p-1} + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_{p-1} x_{2,p-1} + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \dots + \beta_{p-1} x_{n,p-1} + \varepsilon_n
\end{aligned} \qquad (4.5)$$
or, in matrix notation,
$$\underset{(n\times 1)}{\mathbf{Y}} = \underset{(n\times p)}{\mathbf{X}}\,\underset{(p\times 1)}{\boldsymbol{\beta}} + \underset{(n\times 1)}{\boldsymbol{\varepsilon}} \qquad (4.7)$$
where:
$\mathbf{Y}$ is a vector of responses;
$\boldsymbol{\beta}$ is a vector of parameters;
$\mathbf{X}$ is a matrix of known constants;
$\boldsymbol{\varepsilon}$ is a vector of independent random errors with expectation
$$\mathrm{E}(\boldsymbol{\varepsilon}) = \mathbf{0}$$
and variance-covariance matrix
$$\mathrm{Var}(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I}$$
Consequently, the random vector $\mathbf{Y}$ has expectation
$$\mathrm{E}(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta} \qquad (4.8a)$$
and variance-covariance matrix
$$\mathrm{Var}(\mathbf{Y}) = \sigma^2 \mathbf{I} \qquad (4.8b)$$
Let us denote the vector of least squares estimates $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_{p-1}$ of the regression coefficients by
$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_{p-1} \end{bmatrix}$$
Then $\hat{\boldsymbol{\beta}}$ must satisfy, with $Q = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ the least squares criterion,
$$\left.\frac{\partial Q}{\partial \boldsymbol{\beta}}\right|_{\hat{\boldsymbol{\beta}}} = -2\mathbf{X}'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0} \;\Longrightarrow\; \mathbf{X}'\mathbf{Y} = \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} \qquad (4.10)$$
Equations (4.10) are the least squares normal equations. To solve them, multiply both sides of (4.10) on the left by the inverse of $\mathbf{X}'\mathbf{X}$; we obtain the least squares estimator $\hat{\boldsymbol{\beta}}$ as
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \qquad (4.11)$$
provided that $(\mathbf{X}'\mathbf{X})^{-1}$ exists.
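In practice (4.11) is usually computed by solving the normal equations directly rather than by forming the inverse explicitly. A small sketch (the function name is illustrative):

```python
# Sketch of (4.10)-(4.11): solve X'X beta = X'Y for the least squares estimate.
import numpy as np

def ols_normal_equations(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return beta_hat = (X'X)^{-1} X'Y, assuming X'X is nonsingular."""
    return np.linalg.solve(X.T @ X, X.T @ y)   # more stable than inverting X'X
```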
The vector of residuals is $\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}}$, where
$$\hat{\mathbf{Y}} = \begin{bmatrix} \hat Y_1 \\ \hat Y_2 \\ \vdots \\ \hat Y_n \end{bmatrix}, \quad \mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \quad\text{and}\quad \mathbf{e} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} \qquad (4.12)$$
The vector of fitted values is given by
$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{H}\mathbf{Y} \qquad (4.13)$$
where
$$\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \qquad (4.14)$$
Similarly, the vector of residuals is given by
$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{H}\mathbf{Y} = (\mathbf{I} - \mathbf{H})\mathbf{Y} \qquad (4.15)$$
The $n \times n$ matrix $\mathbf{H}$ is usually called the "hat" matrix because it maps the vector of observed values into the vector of fitted values. The hat matrix has several useful properties. It is symmetric ($\mathbf{H}' = \mathbf{H}$) and idempotent ($\mathbf{H}\mathbf{H} = \mathbf{H}$). Similarly, the matrix $(\mathbf{I} - \mathbf{H})$ is also symmetric and idempotent.
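These properties are easy to verify numerically; a small sketch with an arbitrary full-rank design matrix:

```python
# Sketch: build the hat matrix (4.14) and check symmetry and idempotency.
import numpy as np

X = np.column_stack([np.ones(5), np.array([1., 2., 3., 4., 5.])])
H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix H = X (X'X)^{-1} X'
M = np.eye(5) - H                          # I - H

print(np.allclose(H, H.T))                 # True: H' = H
print(np.allclose(H @ H, H))               # True: HH = H
print(np.allclose(M, M.T), np.allclose(M @ M, M))   # (I - H) likewise
```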
4.4 Properties of the Least Squares Estimators
The statistical properties of the least squares estimators ̂ may be easily
demonstrated. Consider first bias: by (4.11) and (4.8a) we have
$$\mathrm{E}(\hat{\boldsymbol{\beta}}) = \mathrm{E}[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathrm{E}(\mathbf{Y}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}$$
Thus $\hat{\boldsymbol{\beta}}$ is an unbiased estimator of $\boldsymbol{\beta}$. Secondly, the variance-covariance matrix of $\hat{\boldsymbol{\beta}}$ is
$$\mathrm{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$
Thirdly, if we further assume that the errors εi are normally distributed, then ̂
is also the maximum likelihood estimator of β .
4.5 Estimation of σ2
As in simple linear regression, we may develop an estimator of σ2 from the
residual sum of squares
$$SSE = \sum_{i=1}^{n} e_i^2 = \mathbf{e}'\mathbf{e} = (\mathbf{Y} - \hat{\mathbf{Y}})'(\mathbf{Y} - \hat{\mathbf{Y}}) = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y}$$
Since SSE has n - p degrees of freedom, an unbiased estimator of $\sigma^2$ is the residual mean square
$$\hat\sigma^2 = MSE = \frac{SSE}{n-p}$$
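A short sketch checking the two equivalent forms of SSE and returning MSE (the function name is illustrative):

```python
# Sketch: SSE computed as e'e and as Y'Y - beta_hat' X'Y, plus MSE = SSE/(n-p).
import numpy as np

def sse_mse(X: np.ndarray, y: np.ndarray):
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat                       # residual vector
    sse = e @ e                                # e'e
    assert np.isclose(sse, y @ y - beta_hat @ (X.T @ y))
    return sse, sse / (n - p)                  # (SSE, MSE)
```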
4.6 Analysis of Variance Results
In multiple regression problems certain tests of hypotheses about the model
parameters are useful in measuring model adequacy. The test for significance of
regression is a test to determine if there is a linear relationship between the response Y
and any of the regressor variables $x_1, x_2, \dots, x_{p-1}$. The appropriate hypotheses are
$$H_0: \beta_1 = \beta_2 = \dots = \beta_{p-1} = 0 \quad\text{versus}\quad H_a: \beta_j \neq 0 \text{ for at least one } j$$
Rejection of $H_0$ implies that at least one of the regressors $x_1, x_2, \dots, x_{p-1}$ contributes
significantly to the model. The test procedure is a generalization of that used in simple
linear regression. The total sum of squares Syy is partitioned into a sum of squares due
to regression and error (residual) sum of squares,
$$S_{yy} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = SSR + SSE$$
Therefore,
$$SSR = SST - SSE = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} - n\bar{Y}^2$$
The test statistic is $F_0 = MSR/MSE = [SSR/(p-1)]/[SSE/(n-p)]$, and $H_0$ is rejected when $F_0$ exceeds $F_{\alpha,\,p-1,\,n-p}$.
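The following sketch assembles these ANOVA quantities and the F statistic; scipy is assumed only for the p-value:

```python
# Sketch: significance-of-regression F test, F0 = MSR/MSE on (p-1, n-p) df.
import numpy as np
from scipy import stats

def f_test(X: np.ndarray, y: np.ndarray):
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    sst = np.sum((y - y.mean())**2)                 # Syy
    ssr = beta_hat @ (X.T @ y) - n * y.mean()**2    # beta_hat' X'Y - n ybar^2
    sse = sst - ssr
    f0 = (ssr / (p - 1)) / (sse / (n - p))
    return f0, stats.f.sf(f0, p - 1, n - p)         # (F0, p-value)
```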
4.7 Coefficient of Multiple Determination
The coefficient of multiple determination R2 is defined as
$$R^2 = \frac{SSR}{S_{yy}} = 1 - \frac{SSE}{S_{yy}}$$
It is customary to think of $R^2$ as a measure of the reduction in the variability of Y explained by the regressor variables $x_1, x_2, \dots, x_{p-1}$. Since
$$0 \le SSE \le S_{yy}$$
it follows that
$$0 \le R^2 \le 1$$
However, a large value of R2 does not necessarily imply that the regression model is a
good one. Adding a regressor to the model will always increase R2 regardless of
whether or not the additional regressor contributes to the model.
The positive square root of $R^2$,
$$0 \le R = \sqrt{R^2} \le 1$$
is the multiple correlation coefficient between Y and the set of regressor variables $x_1, x_2, \dots, x_{p-1}$. That is, R is a measure of the linear association between Y and $x_1, x_2, \dots, x_{p-1}$.
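A short sketch of the computation (the function name is illustrative):

```python
# Sketch: R^2 = 1 - SSE/Syy and the multiple correlation coefficient R.
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray):
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    sse = np.sum((y - X @ beta_hat)**2)
    syy = np.sum((y - y.mean())**2)
    r2 = 1.0 - sse / syy
    return r2, np.sqrt(r2)                     # (R^2, R)
```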
Example 4.1
The following data consist of the scores that 10 students obtained in an
examination, their IQ’s and the number of hours they spent studying for the
examination.
IQ Number of hours studied Score
X1 X2 Y
112 6 74
126 13 97
100 5 54
113 7 70
112 11 77
121 7 88
110 8 73
103 5 54
111 6 69
124 2 82
a- Fit a linear regression model to the data, and state the estimated regression function. How are $\hat\beta_1$ and $\hat\beta_2$ interpreted here?
b- Test whether there is a regression relation (significance of regression), using a level
of significance of 0.05, state the alternative, decision rule, and conclusion.
c- Calculate the coefficient of multiple determination R2. How is this measure
interpreted here?
d- Predict the score of a student with IQ of 108 who studied 6 hours for the
examination.
Solution
Here we have
$$\mathbf{X} = \begin{bmatrix} 1 & 112 & 6 \\ 1 & 126 & 13 \\ 1 & 100 & 5 \\ 1 & 113 & 7 \\ 1 & 112 & 11 \\ 1 & 121 & 7 \\ 1 & 110 & 8 \\ 1 & 103 & 5 \\ 1 & 111 & 6 \\ 1 & 124 & 2 \end{bmatrix}, \quad \mathbf{Y} = \begin{bmatrix} 74 \\ 97 \\ 54 \\ 70 \\ 77 \\ 88 \\ 73 \\ 54 \\ 69 \\ 82 \end{bmatrix}, \quad \mathbf{X}'\mathbf{Y} = \begin{bmatrix} 738 \\ 84511 \\ 5360 \end{bmatrix}, \quad \mathbf{X}'\mathbf{X} = \begin{bmatrix} 10 & 1132 & 70 \\ 1132 & 128780 & 7989 \\ 70 & 7989 & 578 \end{bmatrix}$$
Note that from the definition of $\mathbf{X}$ and from the first row of the matrix $\mathbf{X}'\mathbf{X}$, we have n = 10, $\sum_{i=1}^{n} x_{i1} = 1132$ and $\sum_{i=1}^{n} x_{i2} = 70$; therefore $\bar x_1 = 113.2$ and $\bar x_2 = 7.0$. Also, from the first row of the matrix $\mathbf{X}'\mathbf{Y}$, we have $\sum_{i=1}^{n} y_i = 738$, therefore $\bar y = 73.8$.
[FIG. 4.1 Scatter and residual plots of the students' data.]
$H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: at least one of $\beta_1$ and $\beta_2$ does not equal zero.
Hence the ANOVA table is built from $SST = S_{yy}$, $SSR = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} - n\bar Y^2$ and $SSE = SST - SSR$, and $H_0$ is rejected when $F_0 = MSR/MSE$ exceeds $F_{0.05,\,2,\,7}$.
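A minimal numpy sketch, using the data listed above, that produces the fitted coefficients, the ANOVA quantities for this test, $R^2$, and the prediction required in part (d):

```python
# Sketch of Example 4.1: IQ = x1, hours studied = x2, score = y.
import numpy as np

x1 = np.array([112, 126, 100, 113, 112, 121, 110, 103, 111, 124], float)
x2 = np.array([6, 13, 5, 7, 11, 7, 8, 5, 6, 2], float)
y  = np.array([74, 97, 54, 70, 77, 88, 73, 54, 69, 82], float)

n = len(y)
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # (4.11)

sst = np.sum((y - y.mean())**2)                    # Syy
ssr = beta_hat @ (X.T @ y) - n * y.mean()**2       # regression sum of squares
sse = sst - ssr
f0 = (ssr / 2) / (sse / (n - 3))                   # p - 1 = 2, n - p = 7
r2 = ssr / sst

y_hat_d = np.array([1, 108, 6]) @ beta_hat         # part (d): IQ 108, 6 hours
print(beta_hat, f0, r2, y_hat_d)
```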
4.8 Inferences About Regression Parameters
In section 4.4 we have proved that the least squares estimator ̂ is an unbiased
estimator of β
$$\mathrm{E}(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}$$
with variance-covariance matrix
$$\mathrm{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} \mathrm{var}(\hat\beta_0) & \mathrm{cov}(\hat\beta_0, \hat\beta_1) & \cdots & \mathrm{cov}(\hat\beta_0, \hat\beta_{p-1}) \\ \mathrm{cov}(\hat\beta_1, \hat\beta_0) & \mathrm{var}(\hat\beta_1) & \cdots & \mathrm{cov}(\hat\beta_1, \hat\beta_{p-1}) \\ \vdots & \vdots & & \vdots \\ \mathrm{cov}(\hat\beta_{p-1}, \hat\beta_0) & \mathrm{cov}(\hat\beta_{p-1}, \hat\beta_1) & \cdots & \mathrm{var}(\hat\beta_{p-1}) \end{bmatrix}$$
The estimated variance-covariance matrix $S^2(\hat{\boldsymbol{\beta}})$ is then
$$S^2(\hat{\boldsymbol{\beta}}) = \widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}}) = MSE\,(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} S^2(\hat\beta_0) & S(\hat\beta_0, \hat\beta_1) & \cdots & S(\hat\beta_0, \hat\beta_{p-1}) \\ S(\hat\beta_1, \hat\beta_0) & S^2(\hat\beta_1) & \cdots & S(\hat\beta_1, \hat\beta_{p-1}) \\ \vdots & \vdots & & \vdots \\ S(\hat\beta_{p-1}, \hat\beta_0) & S(\hat\beta_{p-1}, \hat\beta_1) & \cdots & S^2(\hat\beta_{p-1}) \end{bmatrix} \qquad (4.19)$$
From $S^2(\hat{\boldsymbol{\beta}})$ one can obtain $S^2(\hat\beta_0)$, $S^2(\hat\beta_1)$, or whatever other variance or covariance is needed.
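A sketch of (4.19): the estimated covariance matrix and the coefficient standard errors obtained from its diagonal (the function name is illustrative):

```python
# Sketch: S^2(beta_hat) = MSE (X'X)^{-1}; its diagonal gives S^2(beta_k).
import numpy as np

def coef_standard_errors(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    mse = np.sum((y - X @ beta_hat)**2) / (n - p)
    return np.sqrt(np.diag(mse * XtX_inv))     # S(beta_0), ..., S(beta_{p-1})
```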
4.9 Interval Estimation of the Mean Response
For given values of $X_1, X_2, \dots, X_{p-1}$, denoted by $x_{h1}, x_{h2}, \dots, x_{h,p-1}$, the mean response is denoted by $\mathrm{E}[Y_h \mid \mathbf{x}_h]$. We define the vector $\mathbf{x}_h$ as
$$\mathbf{x}_h = \begin{bmatrix} 1 \\ x_{h1} \\ x_{h2} \\ \vdots \\ x_{h,p-1} \end{bmatrix}$$
so that the mean response to be estimated is
$$\mathrm{E}[Y_h \mid \mathbf{x}_h] = \beta_0 + \beta_1 x_{h1} + \beta_2 x_{h2} + \dots + \beta_{p-1} x_{h,p-1} = \mathbf{x}_h'\boldsymbol{\beta} \qquad (4.24)$$
The estimated mean response corresponding to $\mathbf{x}_h$, denoted by $\hat y_h$, is
$$\hat y_h = \mathbf{x}_h'\hat{\boldsymbol{\beta}} \qquad (4.25)$$
This estimator is unbiased:
$$\mathrm{E}[\hat y_h] = \mathbf{x}_h'\,\mathrm{E}[\hat{\boldsymbol{\beta}}] = \mathbf{x}_h'\boldsymbol{\beta} = \mathrm{E}[Y_h \mid \mathbf{x}_h]$$
with variance
$$\mathrm{Var}(\hat y_h) = \mathrm{Var}(\mathbf{x}_h'\hat{\boldsymbol{\beta}}) = \mathbf{x}_h'[\mathrm{Var}(\hat{\boldsymbol{\beta}})]\mathbf{x}_h \qquad (4.26)$$
Thus an unbiased estimator of this variance, using (4.19), is given by
$$S^2(\hat y_h) = \mathbf{x}_h'[S^2(\hat{\boldsymbol{\beta}})]\mathbf{x}_h = MSE\,[\mathbf{x}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_h] \qquad (4.27)$$
Consequently, a 100(1 - α)% confidence interval for the mean response at $\mathbf{x}_h$ is
$$\hat y_h \pm t_{\alpha/2,\,n-p}\sqrt{MSE\,[\mathbf{x}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_h]} \qquad (4.28)$$
where $t_{\alpha/2,\,n-p}$ denotes the upper $\alpha/2$ percentage point of the t distribution with n-p degrees of freedom.
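A sketch of (4.25)-(4.28) as one function (scipy is assumed for the t percentile; the function name is illustrative):

```python
# Sketch: 100(1-alpha)% confidence interval (4.28) for the mean response at x_h.
import numpy as np
from scipy import stats

def mean_response_ci(X: np.ndarray, y: np.ndarray, x_h: np.ndarray, alpha=0.05):
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    mse = np.sum((y - X @ beta_hat)**2) / (n - p)
    y_h = x_h @ beta_hat                                   # (4.25)
    half = stats.t.ppf(1 - alpha/2, n - p) * np.sqrt(mse * (x_h @ XtX_inv @ x_h))
    return y_h - half, y_h + half
```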
Example 4.2
The following data show the number of bedrooms, the number of baths, and the
prices at which a random sample of eight one-family houses sold recently in a certain
large housing development:
Number of bedrooms Number of baths Price (dollars)
x1 x2 Y
3 2 78,800
2 1 74,300
4 3 83,800
2 1 74,200
3 2 79,700
2 2 74,900
5 3 88,400
4 2 82,900
Given that:
$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 8 & 25 & 16 \\ 25 & 87 & 55 \\ 16 & 55 & 36 \end{bmatrix}, \quad (\mathbf{X}'\mathbf{X})^{-1} = \frac{1}{84}\begin{bmatrix} 107 & -20 & -17 \\ -20 & 32 & -40 \\ -17 & -40 & 71 \end{bmatrix}, \quad \mathbf{X}'\mathbf{Y} = \begin{bmatrix} 637{,}000 \\ 2{,}031{,}100 \\ 1{,}297{,}700 \end{bmatrix}$$
a- Fit a linear multiple regression model to the data, and state the estimated regression function. How are $\hat\beta_1$ and $\hat\beta_2$ interpreted here?
Answer:
Here we have a linear multiple regression model with 2 independent variables (p-1 = 2) and n = 8. The least squares estimates of the parameters are given by
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \frac{1}{84}\begin{bmatrix} 107 & -20 & -17 \\ -20 & 32 & -40 \\ -17 & -40 & 71 \end{bmatrix}\begin{bmatrix} 637{,}000 \\ 2{,}031{,}100 \\ 1{,}297{,}700 \end{bmatrix} = \frac{1}{84}\begin{bmatrix} 5{,}476{,}100 \\ 347{,}200 \\ 63{,}700 \end{bmatrix} = \begin{bmatrix} 65{,}191.7 \\ 4{,}133.3 \\ 758.3 \end{bmatrix}$$
Thus the regression function is
ŷ = 65,192 + 4133x1 + 758 x2
This estimated regression function indicates that the mean price of houses is expected to increase by $4,133 when the number of bedrooms increases by one, holding the number of baths constant, and that the mean price of houses is expected to increase by $758 when the number of baths increases by one, holding the number of bedrooms constant.
b- Set up the ANOVA table. Conduct an F test to determine whether or not there is a
linear association between the house price and the number of bedrooms and the
number of baths; use α = 0.01. State the alternatives, decision rule, and conclusion.
What is the corresponding p-value?
Answer
To test whether house prices are linearly related to the number of bedrooms and the
number of baths, we construct the ANOVA table. The basic quantities needed are:
$$\bar{Y} = 79{,}625, \quad SST = S_{yy} = \sum_{i=1}^{n}(y_i - \bar y)^2 = 185{,}955{,}000, \quad SSE = SST - SSR = 685{,}834$$
Hence the ANOVA table is given by

Source of Variation   df   SS            MS           F0
Regression            2    185,269,166   92,634,583   675.3
Error                 5    685,834       137,167
Total                 7    185,955,000
Test of Regression relation. To test whether house prices are linearly related to
the number of bedrooms and the number of baths (significance of the regression
model):
H0 : β1 = β2 = 0 versus Ha : not both β1 and β2 equal 0
we use the F test statistic
$$F_0 = \frac{MSR}{MSE} = 675.3$$
For α = 0.01, we require $F_{0.01,\,2,\,5} = 13.3$. Since $F_0 = 675.3 > 13.3$, we reject $H_0$; that is, house prices are linearly related to the number of bedrooms and the number of baths. The p-value for this test is less than 0.001, since we note from the F table that $F_{0.001,\,2,\,5} = 37.1$.
The estimated variance-covariance matrix of $\hat{\boldsymbol{\beta}}$, using (4.19), is
$$S^2(\hat{\boldsymbol{\beta}}) = MSE\,(\mathbf{X}'\mathbf{X})^{-1} = 137{,}167 \times \frac{1}{84}\begin{bmatrix} 107 & -20 & -17 \\ -20 & 32 & -40 \\ -17 & -40 & 71 \end{bmatrix}$$
From which it follows that
$$S^2(\hat\beta_1) = 137{,}167 \times \tfrac{1}{84} \times 32 = 52{,}254.1$$
$$S^2(\hat\beta_2) = 137{,}167 \times \tfrac{1}{84} \times 71 = 115{,}938.8$$
Next, we require $t_{\alpha/2,\,n-p} = t_{0.05,\,5} = 2.015$. Thus, the 90% C.I. for $\beta_1$ is given by
$$\hat\beta_1 - t_{\alpha/2,\,n-p}\sqrt{S^2(\hat\beta_1)} \;\le\; \beta_1 \;\le\; \hat\beta_1 + t_{\alpha/2,\,n-p}\sqrt{S^2(\hat\beta_1)}$$
that is, $4{,}133.3 \pm 2.015\sqrt{52{,}254.1} = 4{,}133.3 \pm 460.6$, or $3{,}672.7 \le \beta_1 \le 4{,}593.9$.
e- Obtain a 98% confidence interval for the mean house price of houses having 3
bedrooms and 3 baths. Interpret your interval estimate.
Answer
Here we have $x_{h1} = 3$ and $x_{h2} = 3$, so we define
$$\mathbf{x}_h = \begin{bmatrix} 1 \\ 3 \\ 3 \end{bmatrix}$$
The point estimate of the mean house price corresponding to $\mathbf{x}_h$ is
$$\hat y_h = \mathbf{x}_h'\hat{\boldsymbol{\beta}} = \begin{bmatrix} 1 & 3 & 3 \end{bmatrix}\begin{bmatrix} 65{,}192 \\ 4{,}133 \\ 758 \end{bmatrix} = 79{,}865$$
The estimated variance, using (4.27), is given by
$$S^2(\hat y_h) = \mathbf{x}_h'[S^2(\hat{\boldsymbol{\beta}})]\mathbf{x}_h = MSE\,[\mathbf{x}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_h] = 137{,}167 \times \frac{92}{84} = 150{,}230.5$$
With $t_{0.01,\,5} = 3.365$, the 98% confidence limits are $79{,}865 \pm 3.365\sqrt{150{,}230.5} = 79{,}865 \pm 1{,}304$, i.e.
$$78{,}561 \le \mathrm{E}[Y_h] \le 81{,}169$$
Thus, with confidence coefficient 98%, we estimate that the mean house prices of
houses having 3 bedrooms and 3 baths are somewhere between 78,561 and 81,169
dollars.
f- Obtain a 98% prediction interval for the price of a new house with 3 bedrooms and 3 baths.
Answer
From the results obtained in part (d), the 98% prediction limits for a new observation $Y_{h(\mathrm{new})}$ corresponding to $\mathbf{x}_h$ are given by
$$\hat y_h \pm t_{\alpha/2,\,n-p}\sqrt{MSE\,[1 + \mathbf{x}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_h]}$$
$$= 79{,}865 \pm 3.365\sqrt{137{,}167\left(1 + \frac{92}{84}\right)} = 79{,}865 \pm 1{,}804 = (78{,}061,\ 81{,}669)$$
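The whole of Example 4.2 can be checked with a short sketch from the raw data (scipy is assumed for the t percentile):

```python
# Sketch of Example 4.2: fit, 98% C.I. for the mean price at x_h = (1, 3, 3)',
# and 98% prediction limits for a new house with 3 bedrooms and 3 baths.
import numpy as np
from scipy import stats

x1 = np.array([3, 2, 4, 2, 3, 2, 5, 4], float)             # bedrooms
x2 = np.array([2, 1, 3, 1, 2, 2, 3, 2], float)             # baths
y  = np.array([78800, 74300, 83800, 74200, 79700, 74900, 88400, 82900], float)

n, p = len(y), 3
X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)                             # ~ (65192, 4133, 758)

mse = np.sum((y - X @ beta_hat)**2) / (n - p)              # ~ 137,167
x_h = np.array([1, 3, 3], float)
y_h = x_h @ beta_hat                                       # ~ 79,865
t = stats.t.ppf(0.99, n - p)                               # t_{0.01,5} = 3.365
leverage = x_h @ XtX_inv @ x_h                             # = 92/84
ci = (y_h - t*np.sqrt(mse * leverage), y_h + t*np.sqrt(mse * leverage))
pi = (y_h - t*np.sqrt(mse * (1 + leverage)), y_h + t*np.sqrt(mse * (1 + leverage)))
print(beta_hat, ci, pi)
```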
EXERCISES
[2] The manager of a soft-drink company wants to use a model for studying the linear relationship of the number of cases ($x_1$) and the delivery distance in km ($x_2$) with the delivery time (Y). The following calculations are obtained from the observations:
$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 20 & 150 & 64 \\ 150 & 1626 & 530.5 \\ 64 & 530.5 & 266.6 \end{bmatrix}, \quad \mathbf{X}'\mathbf{Y} = \begin{bmatrix} 550 \\ 5513 \\ 2006 \end{bmatrix}, \quad S_{yy} = 4241$$
[3] The following data show the sales price (Y), the footage ($X_1$, in square meters), the number of rooms ($X_2$), and the age ($X_3$, in years) of a sample of residences:

Residence   Y    X1    X2   X3
6           94   198   7    2
7           66   132   3    3
8           55   106   3    3
9           52   104   2    2
10          70   144   5    4
11          83   176   6    2
12          69   154   5    4
13          92   202   8    6
14          66   136   5    1
15          72   164   5    5
16          58   122   3    3
a- Fit a linear multiple regression model to the data, and state the estimated regression function. How is $\hat\beta_1$ interpreted here?
b- Set up the ANOVA table. Conduct an F test to determine whether or not there is
a linear association between the sales price and the footage, the number of
rooms and the age; use α = 0.01. State the alternatives, decision rule, and
conclusion. What is the corresponding p-value?
c- Obtain a 95% prediction interval for the price of a residence that has 182 square meters, seven rooms, and is 4 years old.
d- Test the hypotheses:
$$H_0: \beta_1 = 0 \quad\text{against}\quad H_a: \beta_1 \neq 0$$
Use α = 0.01.
[4] A hospital administrator wished to study the relation between patient satisfaction (Y) and patient's age ($X_1$, in years), severity of illness ($X_2$, an index), and anxiety level ($X_3$, an index). The administrator randomly selected 23 patients and collected the data presented below, where larger values of Y, $X_2$ and $X_3$ are, respectively, associated with more satisfaction, increased severity of illness, and more anxiety.
X1 50 36 40 41 28 49 42 45 52 29 29 43
X2 51 46 48 44 43 54 50 48 62 50 48 53
X3 2.3 2.3 2.2 1.8 1.8 2.9 2.2 2.4 2.9 2.1 2.4 2.4
Y 48 57 66 70 89 36 46 54 26 77 89 67
X1 38 34 53 36 33 29 33 55 29 44 43
X2 55 51 54 49 56 46 49 51 52 58 50
X3 2.2 2.3 2.2 2 2.5 1.9 2.1 2.4 2.3 2.9 2.3
Y 47 51 57 66 79 88 60 49 77 52 60
a- Fit a linear multiple regression model for the three predictor variables to the data and state the estimated regression function. How is $\hat\beta_2$ interpreted here?
b- Obtain the residuals and prepare a box plot of the residuals. Do there appear to be
any outliers?
c- Plot the residuals against Ŷ , each of the predictor variables, and each two-factor
interaction term on separate graphs. Also perform a normality test. Interpret your
plots and summarize your findings.
d- Test whether there is a regression relation; use α = 0.10. State the alternatives,
decision rule, and conclusion. What does your test imply about β1, β2 and β3 ?
What is the p-value of the test?
e- Calculate the coefficient of multiple determination. What does it indicate here?
[5] Set up the $\mathbf{X}$ matrix and $\boldsymbol{\beta}$ vector for the following regression model (i = 1, 2, 3, 4):
$$\log Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} - \beta_3 X_{i1} X_{i2} + \varepsilon_i$$