MULTIPLE REGRESSION
The model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_{p-1} X_{p-1} + \varepsilon \qquad (4.1)$$
is called a multiple linear regression model with p-1 predictor variables. $\beta_0, \beta_1, \dots, \beta_{p-1}$ are constant parameters called the regression coefficients. The parameter $\beta_k$ represents the expected change in the response Y per unit change in $X_k$ when all the remaining regressor variables $X_i$ ($i \neq k$) are held constant. For this reason the parameters $\beta_k$, k = 1, 2, ..., p-1, are often called partial regression coefficients. The error variables $\varepsilon_i$ are independent $N(0, \sigma^2)$.
Models that are more complex in structure than (4.1) may often still be analyzed
by multiple linear regression techniques. For example, consider the cubic polynomial
model
$$Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon \qquad (4.2)$$
If we let $x_1 = x$, $x_2 = x^2$ and $x_3 = x^3$, then (4.2) can be written as
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon \qquad (4.3)$$
which is a multiple linear regression model with three regressor variables.
Models that include interaction effects may also be analyzed by multiple
regression models. For example, suppose that the model is
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon \qquad (4.4)$$
If we let x3 = x1 x2 and β3 = β12 , then (4.4) can be written as
$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$$
which is a linear regression model. In general, any regression model that is linear in the
parameters (β's) is a linear regression model regardless of the shape of the surface that
it generates.
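To make this concrete, here is a minimal Python sketch (with simulated data, not data from the text) that fits the interaction model (4.4) by simply creating the extra column $x_3 = x_1 x_2$ and running ordinary least squares:

```python
# Minimal sketch: the interaction model (4.4) fitted as a multiple linear
# regression once x3 = x1*x2 is created. The data below are simulated.
import numpy as np

rng = np.random.default_rng(0)
n = 30
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 2 + 1.5*x1 - 0.8*x2 + 0.3*x1*x2 + rng.normal(0, 1.0, n)

x3 = x1 * x2                                    # interaction as a new regressor
X = np.column_stack([np.ones(n), x1, x2, x3])   # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # estimates of beta0, beta1, beta2, beta12
```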
With n independent observations on Y and the associated values of xi, the
complete model becomes
$$\begin{aligned}
Y_1 &= \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \dots + \beta_{p-1} x_{1,p-1} + \varepsilon_1 \\
Y_2 &= \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \dots + \beta_{p-1} x_{2,p-1} + \varepsilon_2 \\
&\;\;\vdots \\
Y_n &= \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \dots + \beta_{p-1} x_{n,p-1} + \varepsilon_n
\end{aligned} \qquad (4.5)$$
or, in matrix notation,
$$\underset{(n\times 1)}{\mathbf{Y}} = \underset{(n\times p)}{\mathbf{X}}\,\underset{(p\times 1)}{\boldsymbol{\beta}} + \underset{(n\times 1)}{\boldsymbol{\varepsilon}} \qquad (4.7)$$
where:
$\mathbf{Y}$ is a vector of responses;
$\boldsymbol{\beta}$ is a vector of parameters;
$\mathbf{X}$ is a matrix of known constants;
$\boldsymbol{\varepsilon}$ is a vector of independent random errors with expectation
$$\mathrm{E}(\boldsymbol{\varepsilon}) = \mathbf{0}$$
and variance-covariance matrix
$$\mathrm{Var}(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{I}$$
Consequently, the random vector $\mathbf{Y}$ has expectation
$$\mathrm{E}(\mathbf{Y}) = \mathbf{X}\boldsymbol{\beta} \qquad (4.8a)$$
and variance-covariance matrix
$$\mathrm{Var}(\mathbf{Y}) = \sigma^2 \mathbf{I} \qquad (4.8b)$$
Let us denote the vector of least squares estimates $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_{p-1}$ of the regression coefficients by
$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_{p-1} \end{bmatrix}$$
Then $\hat{\boldsymbol{\beta}}$ must satisfy, with $Q = (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})$ the least squares criterion,
$$\left.\frac{\partial Q}{\partial \boldsymbol{\beta}}\right|_{\hat{\boldsymbol{\beta}}} = -2\mathbf{X}'(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{0} \;\Longrightarrow\; \mathbf{X}'\mathbf{Y} = \mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} \qquad (4.10)$$
Equations (4.10) are the least squares normal equations. To solve them, multiply both sides of (4.10) on the left by the inverse of $\mathbf{X}'\mathbf{X}$; we obtain the least squares estimator $\hat{\boldsymbol{\beta}}$ as
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \qquad (4.11)$$
provided that $(\mathbf{X}'\mathbf{X})^{-1}$ exists.
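In practice (4.11) is usually computed by solving the normal equations directly rather than by forming the inverse explicitly. A small sketch (the function name is illustrative):

```python
# Sketch of (4.10)-(4.11): solve X'X beta = X'Y for the least squares estimate.
import numpy as np

def ols_normal_equations(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return beta_hat = (X'X)^{-1} X'Y, assuming X'X is nonsingular."""
    return np.linalg.solve(X.T @ X, X.T @ y)   # more stable than inverting X'X
```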
The vector of residuals is $\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}}$, where
$$\hat{\mathbf{Y}} = \begin{bmatrix} \hat Y_1 \\ \hat Y_2 \\ \vdots \\ \hat Y_n \end{bmatrix}, \quad \mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} \quad\text{and}\quad \mathbf{e} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix} \qquad (4.12)$$
The vector of fitted values is given by
$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \mathbf{H}\mathbf{Y} \qquad (4.13)$$
where
$$\mathbf{H} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' \qquad (4.14)$$
Similarly, the vector of residuals is given by
$$\mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{H}\mathbf{Y} = (\mathbf{I} - \mathbf{H})\mathbf{Y} \qquad (4.15)$$
The $n \times n$ matrix $\mathbf{H}$ is usually called the "hat" matrix because it maps the vector of observed values into the vector of fitted values. The hat matrix has several useful properties. It is symmetric ($\mathbf{H}' = \mathbf{H}$) and idempotent ($\mathbf{H}\mathbf{H} = \mathbf{H}$). Similarly, the matrix $(\mathbf{I} - \mathbf{H})$ is also symmetric and idempotent.
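These properties are easy to verify numerically; a small sketch with an arbitrary full-rank design matrix:

```python
# Sketch: build the hat matrix (4.14) and check symmetry and idempotency.
import numpy as np

X = np.column_stack([np.ones(5), np.array([1., 2., 3., 4., 5.])])
H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix H = X (X'X)^{-1} X'
M = np.eye(5) - H                          # I - H

print(np.allclose(H, H.T))                 # True: H' = H
print(np.allclose(H @ H, H))               # True: HH = H
print(np.allclose(M, M.T), np.allclose(M @ M, M))   # (I - H) likewise
```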
4.4 Properties of the Least Squares Estimators
The statistical properties of the least squares estimators ̂ may be easily
demonstrated. Consider first bias: by (4.11) and (4.8a) we have
$$\mathrm{E}(\hat{\boldsymbol{\beta}}) = \mathrm{E}[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\,\mathrm{E}(\mathbf{Y}) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\boldsymbol{\beta} = \boldsymbol{\beta}$$
Thus $\hat{\boldsymbol{\beta}}$ is an unbiased estimator of $\boldsymbol{\beta}$. Secondly, the variance-covariance matrix of $\hat{\boldsymbol{\beta}}$ is
$$\mathrm{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1}$$
Thirdly, if we further assume that the errors εi are normally distributed, then ̂
is also the maximum likelihood estimator of β .
4.5 Estimation of σ2
As in simple linear regression, we may develop an estimator of σ2 from the
residual sum of squares
$$SSE = \sum_{i=1}^{n} e_i^2 = \mathbf{e}'\mathbf{e} = (\mathbf{Y} - \hat{\mathbf{Y}})'(\mathbf{Y} - \hat{\mathbf{Y}}) = \mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y}$$
Since SSE has n - p degrees of freedom, an unbiased estimator of $\sigma^2$ is the residual mean square
$$\hat\sigma^2 = MSE = \frac{SSE}{n-p}$$
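A short sketch checking the two equivalent forms of SSE and returning MSE (the function name is illustrative):

```python
# Sketch: SSE computed as e'e and as Y'Y - beta_hat' X'Y, plus MSE = SSE/(n-p).
import numpy as np

def sse_mse(X: np.ndarray, y: np.ndarray):
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat                       # residual vector
    sse = e @ e                                # e'e
    assert np.isclose(sse, y @ y - beta_hat @ (X.T @ y))
    return sse, sse / (n - p)                  # (SSE, MSE)
```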
4.6 Analysis of Variance Results
In multiple regression problems certain tests of hypotheses about the model
parameters are useful in measuring model adequacy. The test for significance of
regression is a test to determine if there is a linear relationship between the response Y
and any of the regressor variables $x_1, x_2, \dots, x_{p-1}$. The appropriate hypotheses are
$$H_0: \beta_1 = \beta_2 = \dots = \beta_{p-1} = 0 \quad\text{versus}\quad H_a: \beta_j \neq 0 \text{ for at least one } j$$
Rejection of $H_0$ implies that at least one of the regressors $x_1, x_2, \dots, x_{p-1}$ contributes
significantly to the model. The test procedure is a generalization of that used in simple
linear regression. The total sum of squares Syy is partitioned into a sum of squares due
to regression and error (residual) sum of squares,
$$S_{yy} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2 = SSR + SSE$$
Therefore,
$$SSR = SST - SSE = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} - n\bar{Y}^2$$
The test statistic is $F_0 = MSR/MSE = [SSR/(p-1)]/[SSE/(n-p)]$, and $H_0$ is rejected when $F_0$ exceeds $F_{\alpha,\,p-1,\,n-p}$.
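The following sketch assembles these ANOVA quantities and the F statistic; scipy is assumed only for the p-value:

```python
# Sketch: significance-of-regression F test, F0 = MSR/MSE on (p-1, n-p) df.
import numpy as np
from scipy import stats

def f_test(X: np.ndarray, y: np.ndarray):
    n, p = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    sst = np.sum((y - y.mean())**2)                 # Syy
    ssr = beta_hat @ (X.T @ y) - n * y.mean()**2    # beta_hat' X'Y - n ybar^2
    sse = sst - ssr
    f0 = (ssr / (p - 1)) / (sse / (n - p))
    return f0, stats.f.sf(f0, p - 1, n - p)         # (F0, p-value)
```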
4.7 Coefficient of Multiple Determination
The coefficient of multiple determination R2 is defined as
$$R^2 = \frac{SSR}{S_{yy}} = 1 - \frac{SSE}{S_{yy}}$$
It is customary to think of $R^2$ as a measure of the reduction in the variability of Y explained by the regressor variables $x_1, x_2, \dots, x_{p-1}$. Since
$$0 \le SSE \le S_{yy}$$
it follows that
$$0 \le R^2 \le 1$$
However, a large value of R2 does not necessarily imply that the regression model is a
good one. Adding a regressor to the model will always increase R2 regardless of
whether or not the additional regressor contributes to the model.
The positive square root of $R^2$,
$$0 \le R = \sqrt{R^2} \le 1$$
is the multiple correlation coefficient between Y and the set of regressor variables $x_1, x_2, \dots, x_{p-1}$. That is, R is a measure of the linear association between Y and $x_1, x_2, \dots, x_{p-1}$.
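A short sketch of the computation (the function name is illustrative):

```python
# Sketch: R^2 = 1 - SSE/Syy and the multiple correlation coefficient R.
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray):
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    sse = np.sum((y - X @ beta_hat)**2)
    syy = np.sum((y - y.mean())**2)
    r2 = 1.0 - sse / syy
    return r2, np.sqrt(r2)                     # (R^2, R)
```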
Example 4.1
The following data consist of the scores that 10 students obtained in an
examination, their IQ’s and the number of hours they spent studying for the
examination.
IQ Number of hours studied Score
X1 X2 Y
112 6 74
126 13 97
100 5 54
113 7 70
112 11 77
121 7 88
110 8 73
103 5 54
111 6 69
124 2 82
a- Fit a linear regression model to the data, and state the estimated regression function. How are $\hat\beta_1$ and $\hat\beta_2$ interpreted here?
b- Test whether there is a regression relation (significance of regression), using a level
of significance of 0.05, state the alternative, decision rule, and conclusion.
c- Calculate the coefficient of multiple determination R2. How is this measure
interpreted here?
d- Predict the score of a student with IQ of 108 who studied 6 hours for the
examination.
Solution
Here we have
$$\mathbf{X} = \begin{bmatrix} 1 & 112 & 6 \\ 1 & 126 & 13 \\ 1 & 100 & 5 \\ 1 & 113 & 7 \\ 1 & 112 & 11 \\ 1 & 121 & 7 \\ 1 & 110 & 8 \\ 1 & 103 & 5 \\ 1 & 111 & 6 \\ 1 & 124 & 2 \end{bmatrix}, \quad \mathbf{Y} = \begin{bmatrix} 74 \\ 97 \\ 54 \\ 70 \\ 77 \\ 88 \\ 73 \\ 54 \\ 69 \\ 82 \end{bmatrix}, \quad \mathbf{X}'\mathbf{Y} = \begin{bmatrix} 738 \\ 84511 \\ 5360 \end{bmatrix}, \quad \mathbf{X}'\mathbf{X} = \begin{bmatrix} 10 & 1132 & 70 \\ 1132 & 128780 & 7989 \\ 70 & 7989 & 578 \end{bmatrix}$$
Note that from the definition of $\mathbf{X}$ and from the first row of the matrix $\mathbf{X}'\mathbf{X}$, we have n = 10, $\sum_{i=1}^{n} x_{i1} = 1132$ and $\sum_{i=1}^{n} x_{i2} = 70$; therefore $\bar x_1 = 113.2$ and $\bar x_2 = 7.0$. Also, from the first row of the matrix $\mathbf{X}'\mathbf{Y}$, we have $\sum_{i=1}^{n} y_i = 738$, therefore $\bar y = 73.8$.
[FIG. 4.1 Scatter and residual plots of the students' data.]
$H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: at least one of $\beta_1$ and $\beta_2$ does not equal zero.
Hence the ANOVA table is built from $SST = S_{yy}$, $SSR = \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} - n\bar Y^2$ and $SSE = SST - SSR$, and $H_0$ is rejected when $F_0 = MSR/MSE$ exceeds $F_{0.05,\,2,\,7}$.
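A minimal numpy sketch, using the data listed above, that produces the fitted coefficients, the ANOVA quantities for this test, $R^2$, and the prediction required in part (d):

```python
# Sketch of Example 4.1: IQ = x1, hours studied = x2, score = y.
import numpy as np

x1 = np.array([112, 126, 100, 113, 112, 121, 110, 103, 111, 124], float)
x2 = np.array([6, 13, 5, 7, 11, 7, 8, 5, 6, 2], float)
y  = np.array([74, 97, 54, 70, 77, 88, 73, 54, 69, 82], float)

n = len(y)
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)       # (4.11)

sst = np.sum((y - y.mean())**2)                    # Syy
ssr = beta_hat @ (X.T @ y) - n * y.mean()**2       # regression sum of squares
sse = sst - ssr
f0 = (ssr / 2) / (sse / (n - 3))                   # p - 1 = 2, n - p = 7
r2 = ssr / sst

y_hat_d = np.array([1, 108, 6]) @ beta_hat         # part (d): IQ 108, 6 hours
print(beta_hat, f0, r2, y_hat_d)
```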
4.8 Inferences About Regression Parameters
In section 4.4 we have proved that the least squares estimator ̂ is an unbiased
estimator of β
$$\mathrm{E}(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}$$
with variance-covariance matrix
$$\mathrm{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} \mathrm{var}(\hat\beta_0) & \mathrm{cov}(\hat\beta_0, \hat\beta_1) & \cdots & \mathrm{cov}(\hat\beta_0, \hat\beta_{p-1}) \\ \mathrm{cov}(\hat\beta_1, \hat\beta_0) & \mathrm{var}(\hat\beta_1) & \cdots & \mathrm{cov}(\hat\beta_1, \hat\beta_{p-1}) \\ \vdots & \vdots & & \vdots \\ \mathrm{cov}(\hat\beta_{p-1}, \hat\beta_0) & \mathrm{cov}(\hat\beta_{p-1}, \hat\beta_1) & \cdots & \mathrm{var}(\hat\beta_{p-1}) \end{bmatrix}$$
The estimated variance-covariance matrix $S^2(\hat{\boldsymbol{\beta}})$ is then
$$S^2(\hat{\boldsymbol{\beta}}) = \widehat{\mathrm{Var}}(\hat{\boldsymbol{\beta}}) = MSE\,(\mathbf{X}'\mathbf{X})^{-1} = \begin{bmatrix} S^2(\hat\beta_0) & S(\hat\beta_0, \hat\beta_1) & \cdots & S(\hat\beta_0, \hat\beta_{p-1}) \\ S(\hat\beta_1, \hat\beta_0) & S^2(\hat\beta_1) & \cdots & S(\hat\beta_1, \hat\beta_{p-1}) \\ \vdots & \vdots & & \vdots \\ S(\hat\beta_{p-1}, \hat\beta_0) & S(\hat\beta_{p-1}, \hat\beta_1) & \cdots & S^2(\hat\beta_{p-1}) \end{bmatrix} \qquad (4.19)$$
From $S^2(\hat{\boldsymbol{\beta}})$ one can obtain $S^2(\hat\beta_0)$, $S^2(\hat\beta_1)$, or whatever other variance or covariance is needed.
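A sketch of (4.19): the estimated covariance matrix and the coefficient standard errors obtained from its diagonal (the function name is illustrative):

```python
# Sketch: S^2(beta_hat) = MSE (X'X)^{-1}; its diagonal gives S^2(beta_k).
import numpy as np

def coef_standard_errors(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    mse = np.sum((y - X @ beta_hat)**2) / (n - p)
    return np.sqrt(np.diag(mse * XtX_inv))     # S(beta_0), ..., S(beta_{p-1})
```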
4.9 Interval Estimation of the Mean Response
For given values of $X_1, X_2, \dots, X_{p-1}$, denoted by $x_{h1}, x_{h2}, \dots, x_{h,p-1}$, the mean response is denoted by $\mathrm{E}[Y_h \mid \mathbf{x}_h]$. We define the vector $\mathbf{x}_h$ as
$$\mathbf{x}_h = \begin{bmatrix} 1 \\ x_{h1} \\ x_{h2} \\ \vdots \\ x_{h,p-1} \end{bmatrix}$$
so that the mean response to be estimated is
$$\mathrm{E}[Y_h \mid \mathbf{x}_h] = \beta_0 + \beta_1 x_{h1} + \beta_2 x_{h2} + \dots + \beta_{p-1} x_{h,p-1} = \mathbf{x}_h'\boldsymbol{\beta} \qquad (4.24)$$
The estimated mean response corresponding to $\mathbf{x}_h$, denoted by $\hat y_h$, is
$$\hat y_h = \mathbf{x}_h'\hat{\boldsymbol{\beta}} \qquad (4.25)$$
This estimator is unbiased:
$$\mathrm{E}[\hat y_h] = \mathbf{x}_h'\,\mathrm{E}[\hat{\boldsymbol{\beta}}] = \mathbf{x}_h'\boldsymbol{\beta} = \mathrm{E}[Y_h \mid \mathbf{x}_h]$$
with variance
$$\mathrm{Var}(\hat y_h) = \mathrm{Var}(\mathbf{x}_h'\hat{\boldsymbol{\beta}}) = \mathbf{x}_h'[\mathrm{Var}(\hat{\boldsymbol{\beta}})]\mathbf{x}_h \qquad (4.26)$$
Thus an unbiased estimator of this variance, using (4.19), is given by
$$S^2(\hat y_h) = \mathbf{x}_h'[S^2(\hat{\boldsymbol{\beta}})]\mathbf{x}_h = MSE\,[\mathbf{x}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_h] \qquad (4.27)$$
Consequently, a 100(1 - α)% confidence interval for the mean response at $\mathbf{x}_h$ is
$$\hat y_h \pm t_{\alpha/2,\,n-p}\sqrt{MSE\,[\mathbf{x}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_h]} \qquad (4.28)$$
where $t_{\alpha/2,\,n-p}$ denotes the upper $\alpha/2$ percentage point of the t distribution with n-p degrees of freedom.
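A sketch of (4.25)-(4.28) as one function (scipy is assumed for the t percentile; the function name is illustrative):

```python
# Sketch: 100(1-alpha)% confidence interval (4.28) for the mean response at x_h.
import numpy as np
from scipy import stats

def mean_response_ci(X: np.ndarray, y: np.ndarray, x_h: np.ndarray, alpha=0.05):
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    mse = np.sum((y - X @ beta_hat)**2) / (n - p)
    y_h = x_h @ beta_hat                                   # (4.25)
    half = stats.t.ppf(1 - alpha/2, n - p) * np.sqrt(mse * (x_h @ XtX_inv @ x_h))
    return y_h - half, y_h + half
```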
Example 4.2
The following data show the number of bedrooms, the number of baths, and the
prices at which a random sample of eight one-family houses sold recently in a certain
large housing development:
Number of bedrooms Number of baths Price (dollars)
x1 x2 Y
3 2 78,800
2 1 74,300
4 3 83,800
2 1 74,200
3 2 79,700
2 2 74,900
5 3 88,400
4 2 82,900
Given that:
$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 8 & 25 & 16 \\ 25 & 87 & 55 \\ 16 & 55 & 36 \end{bmatrix}, \quad (\mathbf{X}'\mathbf{X})^{-1} = \frac{1}{84}\begin{bmatrix} 107 & -20 & -17 \\ -20 & 32 & -40 \\ -17 & -40 & 71 \end{bmatrix}, \quad \mathbf{X}'\mathbf{Y} = \begin{bmatrix} 637{,}000 \\ 2{,}031{,}100 \\ 1{,}297{,}700 \end{bmatrix}$$
a- Fit a linear multiple regression model to the data, and state the estimated regression function. How are $\hat\beta_1$ and $\hat\beta_2$ interpreted here?
Answer:
Here we have a linear multiple regression model with 2 independent variables (p-1 = 2) and n = 8. The least squares estimates of the parameters are given by
$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = \frac{1}{84}\begin{bmatrix} 107 & -20 & -17 \\ -20 & 32 & -40 \\ -17 & -40 & 71 \end{bmatrix}\begin{bmatrix} 637{,}000 \\ 2{,}031{,}100 \\ 1{,}297{,}700 \end{bmatrix} = \frac{1}{84}\begin{bmatrix} 5{,}476{,}100 \\ 347{,}200 \\ 63{,}700 \end{bmatrix} = \begin{bmatrix} 65{,}191.7 \\ 4{,}133.3 \\ 758.3 \end{bmatrix}$$
Thus the regression function is
ŷ = 65,192 + 4133x1 + 758 x2
This estimated regression function indicates that the mean price of houses is expected to increase by $4,133 when the number of bedrooms increases by one, holding the number of baths constant, and that the mean price of houses is expected to increase by $758 when the number of baths increases by one, holding the number of bedrooms constant.
b- Set up the ANOVA table. Conduct an F test to determine whether or not there is a
linear association between the house price and the number of bedrooms and the
number of baths; use α = 0.01. State the alternatives, decision rule, and conclusion.
What is the corresponding p-value?
Answer
To test whether house prices are linearly related to the number of bedrooms and the
number of baths, we construct the ANOVA table. The basic quantities needed are:
$$\bar{Y} = 79{,}625, \quad SST = S_{yy} = \sum_{i=1}^{n}(y_i - \bar y)^2 = 185{,}955{,}000, \quad SSE = SST - SSR = 685{,}834$$
Hence the ANOVA table is given by

Source of Variation   df   SS            MS           F0
Regression            2    185,269,166   92,634,583   675.3
Error                 5    685,834       137,167
Total                 7    185,955,000
Test of Regression relation. To test whether house prices are linearly related to
the number of bedrooms and the number of baths (significance of the regression
model):
H0 : β1 = β2 = 0 versus Ha : not both β1 and β2 equal 0
we use the F test statistic
$$F_0 = \frac{MSR}{MSE} = 675.3$$
For α = 0.01, we require $F_{0.01,\,2,\,5} = 13.3$. Since $F_0 = 675.3 > 13.3$, we reject $H_0$; that is, house prices are linearly related to the number of bedrooms and the number of baths. The p-value for this test is less than 0.001, since we note from the F table that $F_{0.001,\,2,\,5} = 37.1$.
The estimated variance-covariance matrix of $\hat{\boldsymbol{\beta}}$, using (4.19), is
$$S^2(\hat{\boldsymbol{\beta}}) = MSE\,(\mathbf{X}'\mathbf{X})^{-1} = 137{,}167 \times \frac{1}{84}\begin{bmatrix} 107 & -20 & -17 \\ -20 & 32 & -40 \\ -17 & -40 & 71 \end{bmatrix}$$
From which it follows that
$$S^2(\hat\beta_1) = 137{,}167 \times \tfrac{1}{84} \times 32 = 52{,}254.1$$
$$S^2(\hat\beta_2) = 137{,}167 \times \tfrac{1}{84} \times 71 = 115{,}938.8$$
Next, we require $t_{\alpha/2,\,n-p} = t_{0.05,\,5} = 2.015$. Thus, the 90% C.I. for $\beta_1$ is given by
$$\hat\beta_1 - t_{\alpha/2,\,n-p}\sqrt{S^2(\hat\beta_1)} \;\le\; \beta_1 \;\le\; \hat\beta_1 + t_{\alpha/2,\,n-p}\sqrt{S^2(\hat\beta_1)}$$
that is, $4{,}133.3 \pm 2.015\sqrt{52{,}254.1} = 4{,}133.3 \pm 460.6$, or $3{,}672.7 \le \beta_1 \le 4{,}593.9$.
e- Obtain a 98% confidence interval for the mean house price of houses having 3
bedrooms and 3 baths. Interpret your interval estimate.
Answer
Here we have $x_{h1} = 3$ and $x_{h2} = 3$, so we define
$$\mathbf{x}_h = \begin{bmatrix} 1 \\ 3 \\ 3 \end{bmatrix}$$
The point estimate of the mean house price corresponding to $\mathbf{x}_h$ is
$$\hat y_h = \mathbf{x}_h'\hat{\boldsymbol{\beta}} = \begin{bmatrix} 1 & 3 & 3 \end{bmatrix}\begin{bmatrix} 65{,}192 \\ 4{,}133 \\ 758 \end{bmatrix} = 79{,}865$$
The estimated variance, using (4.27), is given by
$$S^2(\hat y_h) = \mathbf{x}_h'[S^2(\hat{\boldsymbol{\beta}})]\mathbf{x}_h = MSE\,[\mathbf{x}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_h] = 137{,}167 \times \frac{92}{84} = 150{,}230.5$$
With $t_{0.01,\,5} = 3.365$, the 98% confidence limits are $79{,}865 \pm 3.365\sqrt{150{,}230.5} = 79{,}865 \pm 1{,}304$, i.e.
$$78{,}561 \le \mathrm{E}[Y_h] \le 81{,}169$$
Thus, with confidence coefficient 98%, we estimate that the mean house prices of
houses having 3 bedrooms and 3 baths are somewhere between 78,561 and 81,169
dollars.
f- Obtain a 98% prediction interval for the price of a new house with 3 bedrooms and 3 baths.
Answer
From the results obtained in part (d), the 98% prediction limits for a new observation $Y_{h(\mathrm{new})}$ corresponding to $\mathbf{x}_h$ are given by
$$\hat y_h \pm t_{\alpha/2,\,n-p}\sqrt{MSE\,[1 + \mathbf{x}_h'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_h]}$$
$$= 79{,}865 \pm 3.365\sqrt{137{,}167\left(1 + \frac{92}{84}\right)} = 79{,}865 \pm 1{,}804 = (78{,}061,\ 81{,}669)$$
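The whole of Example 4.2 can be checked with a short sketch from the raw data (scipy is assumed for the t percentile):

```python
# Sketch of Example 4.2: fit, 98% C.I. for the mean price at x_h = (1, 3, 3)',
# and 98% prediction limits for a new house with 3 bedrooms and 3 baths.
import numpy as np
from scipy import stats

x1 = np.array([3, 2, 4, 2, 3, 2, 5, 4], float)             # bedrooms
x2 = np.array([2, 1, 3, 1, 2, 2, 3, 2], float)             # baths
y  = np.array([78800, 74300, 83800, 74200, 79700, 74900, 88400, 82900], float)

n, p = len(y), 3
X = np.column_stack([np.ones(n), x1, x2])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)                             # ~ (65192, 4133, 758)

mse = np.sum((y - X @ beta_hat)**2) / (n - p)              # ~ 137,167
x_h = np.array([1, 3, 3], float)
y_h = x_h @ beta_hat                                       # ~ 79,865
t = stats.t.ppf(0.99, n - p)                               # t_{0.01,5} = 3.365
leverage = x_h @ XtX_inv @ x_h                             # = 92/84
ci = (y_h - t*np.sqrt(mse * leverage), y_h + t*np.sqrt(mse * leverage))
pi = (y_h - t*np.sqrt(mse * (1 + leverage)), y_h + t*np.sqrt(mse * (1 + leverage)))
print(beta_hat, ci, pi)
```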
EXERCISES
[2] The manager of a soft-drink company wants to use a model for studying the linear relationship of the number of cases ($x_1$) and the delivery distance in km ($x_2$) with the delivery time (Y). The following calculations are obtained from the observations:
$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 20 & 150 & 64 \\ 150 & 1626 & 530.5 \\ 64 & 530.5 & 266.6 \end{bmatrix}, \quad \mathbf{X}'\mathbf{Y} = \begin{bmatrix} 550 \\ 5513 \\ 2006 \end{bmatrix}, \quad S_{yy} = 4241$$
[3] The following data show the sales price (Y), the footage ($X_1$, in square meters), the number of rooms ($X_2$), and the age ($X_3$, in years) of a sample of residences:

Residence   Y    X1    X2   X3
6           94   198   7    2
7           66   132   3    3
8           55   106   3    3
9           52   104   2    2
10          70   144   5    4
11          83   176   6    2
12          69   154   5    4
13          92   202   8    6
14          66   136   5    1
15          72   164   5    5
16          58   122   3    3
a- Fit a linear multiple regression model to the data, and state the estimated regression function. How is $\hat\beta_1$ interpreted here?
b- Set up the ANOVA table. Conduct an F test to determine whether or not there is
a linear association between the sales price and the footage, the number of
rooms and the age; use α = 0.01. State the alternatives, decision rule, and
conclusion. What is the corresponding p-value?
c- Obtain a 95% prediction interval for the price of a residence that has 182 square meters, seven rooms, and is 4 years old.
d- Test the hypotheses:
$$H_0: \beta_1 = 0 \quad\text{against}\quad H_a: \beta_1 \neq 0$$
Use α = 0.01.
[4] A hospital administrator wished to study the relation between patient satisfaction (Y) and patient's age ($X_1$, in years), severity of illness ($X_2$, an index), and anxiety level ($X_3$, an index). The administrator randomly selected 23 patients and collected the data presented below, where larger values of Y, $X_2$ and $X_3$ are, respectively, associated with more satisfaction, increased severity of illness, and more anxiety.
X1 50 36 40 41 28 49 42 45 52 29 29 43
X2 51 46 48 44 43 54 50 48 62 50 48 53
X3 2.3 2.3 2.2 1.8 1.8 2.9 2.2 2.4 2.9 2.1 2.4 2.4
Y 48 57 66 70 89 36 46 54 26 77 89 67
X1 38 34 53 36 33 29 33 55 29 44 43
X2 55 51 54 49 56 46 49 51 52 58 50
X3 2.2 2.3 2.2 2 2.5 1.9 2.1 2.4 2.3 2.9 2.3
Y 47 51 57 66 79 88 60 49 77 52 60
a- Fit a linear multiple regression model for the three predictor variables to the data and state the estimated regression function. How is $\hat\beta_2$ interpreted here?
b- Obtain the residuals and prepare a box plot of the residuals. Do there appear to be
any outliers?
c- Plot the residuals against Ŷ , each of the predictor variables, and each two-factor
interaction term on separate graphs. Also perform a normality test. Interpret your
plots and summarize your findings.
d- Test whether there is a regression relation; use α = 0.10. State the alternatives,
decision rule, and conclusion. What does your test imply about β1, β2 and β3 ?
What is the p-value of the test?
e- Calculate the coefficient of multiple determination. What does it indicate here?
[5] Set up the $\mathbf{X}$ matrix and $\boldsymbol{\beta}$ vector for the following regression model (i = 1, 2, 3, 4):
$$\log Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} - \beta_3 X_{i1} X_{i2} + \varepsilon_i$$