
Lecture 7: The Multiple Regression Model - Part III

Dr. Keith Wong

The Chinese University of Hong Kong

SEEM 3570: Stochastic Models


Introduction

Having discussed the classical normal linear regression model, we now revisit some of its assumptions and explore situations in which they are violated.
We also seek improvements (if any) should violations occur.
We focus on revisiting the following two assumptions:
No linear relationship exists between two or more of the independent variables.
The error variances are identical and constant (homoscedasticity).



Multi-collinearity

If perfect collinearity exists, the regression estimators â, b̂1, …, b̂k are not well-defined.
Intuitively, recall that the coefficient bi measures the change in Y when xi is shifted by one unit, with all other variables held constant.
However, if a linear relationship exists between two or more of the independent variables, it is impossible to change the value of one of them without changing the value(s) of some of the rest.
Hence the previous interpretation is no longer valid.



Multi-collinearity
To illustrate, suppose R = a + b1 Sd + b2 Sw + ε, where R is the average exam score, Sd is the average study hours per day, and Sw is the average study hours per week.
Obviously, average study hours per day and per week possess an exact linear relationship: 7 Sd = Sw.
There is no way to vary one of them while holding the other constant.
Mathematically, recall that

β̂ = (XᵀX)⁻¹(XᵀY),

where Y = (Y1, …, Yn)ᵀ and X is the n × (k + 1) design matrix whose i-th row is (1, x1i, …, xki).

If an exact linear relationship exists among the independent variables, the determinant of XᵀX is zero.
Hence, in calculating β̂, the inverse (XᵀX)⁻¹ is not well defined.
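To see the singularity numerically, here is a minimal Python sketch with simulated (hypothetical) study-hours data; the variable names simply mirror the example above and are not from the lecture.

```python
import numpy as np

# Hypothetical data mirroring the example: Sw = 7 * Sd exactly.
rng = np.random.default_rng(0)
n = 20
Sd = rng.uniform(1, 5, size=n)                 # average study hours per day
Sw = 7 * Sd                                    # average study hours per week
R = 50 + 8 * Sd + rng.normal(0, 5, size=n)     # simulated exam scores

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), Sd, Sw])

print(np.linalg.matrix_rank(X))                # 2, not 3: one column is redundant
print(np.linalg.det(X.T @ X))                  # ~0 up to floating-point error
# Consequently (X'X)^{-1}, and hence beta_hat, is not well defined;
# np.linalg.inv(X.T @ X) would be numerically meaningless here.
```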
Multi-collinearity

In practice, we face a more challenging situation: independent variables with a high degree of multi-collinearity.
There is no clear-cut way to detect it in general.
Least squares estimates are still possible!
Interpretation remains difficult.
The distributions of β̂ are quite sensitive to the correlation between the independent variables, and also to the magnitude of the standard errors sâ, sb̂1, …, sb̂k.
Large standard errors are likely to result.



Multi-collinearity

For illustration purposes, a three-variable regression (two independent variables) gives

sb̂u² = s² / (Sxu xu (1 − r²)),   u = 1, 2,
Cov(b̂1, b̂2) = −s² r / ((1 − r²) √(Sx1 x1 Sx2 x2)),

where r = Sx1 x2 / √(Sx1 x1 Sx2 x2) is the simple correlation between x1 and x2.

When r → 1 or −1, sb̂u² gets very large.
In hypothesis tests, it then tends to be more difficult to reject the null hypothesis.
Remark: However, it can happen that the overall model is still significant (F-test).
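The variance inflation can be seen in a small simulation. The following is an illustrative sketch (simulated data, made-up coefficients), computing the OLS standard errors directly from s²(XᵀX)⁻¹:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

def slope_std_errors(r):
    """OLS standard errors of b1, b2 when corr(x1, x2) is approximately r."""
    x1 = rng.normal(size=n)
    x2 = r * x1 + np.sqrt(1 - r**2) * rng.normal(size=n)    # correlation ~ r
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    s2 = resid @ resid / (n - 3)                            # s^2 with n - k - 1 dof
    cov = s2 * np.linalg.inv(X.T @ X)                       # estimated Cov(beta_hat)
    return np.sqrt(np.diag(cov))[1:]

for r in (0.0, 0.9, 0.99, 0.999):
    print(r, slope_std_errors(r))   # standard errors grow roughly like 1/sqrt(1 - r^2)
```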



Heteroscedasticity

Recall the assumption of homoscedasticity, i.e. the variances of the error terms ε1, …, εn are identically equal to σ².
This may be inappropriate in some situations.
For example, consider family income vs. expenditure.
It is not surprising that low-income families have rather steady spending patterns, while high-income families have rather volatile spending patterns.
Hence the error variances associated with high-income families should be higher than those associated with low-income families.
If we apply the regression model to this situation, the least squares estimators are still unbiased, but no longer efficient (i.e. not of minimum variance among unbiased estimators).
This motivates us to consider techniques for handling non-identical error variances.



Heteroscedasticity

In general, there are many approaches.
For simplicity, we now consider a particular setting: the error variances vary directly with an independent variable.
Further assume the σi² are known.
Consider the following two-variable regression model:

Yi = a + b xi + εi   for i = 1, …, n.

Here, we assume that σi² = E[εi²] = C xi², where C > 0 is a constant.
Remark: this model may fit the income-expenditure example.
Idea: transform the model into one with identical error variances.



Heteroscedasticity

Upon dividing both sides of the model by xi, we have

Yi / xi = a / xi + b + εi / xi   for i = 1, …, n,

or, equivalently,

Yi* = a* + b* xi* + εi*   for i = 1, …, n,

where Yi* = Yi / xi, a* = b, b* = a, xi* = 1 / xi, and εi* = εi / xi.
Then Var(εi*) = E[(εi / xi)²] = (1 / xi²) · C xi² = C ← constant variance!
Least squares estimation of the parameters can now be applied to the transformed model.
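As a sanity check, here is a minimal sketch with simulated data satisfying σi² = C xi² (C = 0.5 is an arbitrary choice); it fits the transformed model by ordinary least squares and recovers a and b, keeping in mind the role swap a* = b, b* = a:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.uniform(1, 10, size=n)
eps = rng.normal(scale=np.sqrt(0.5) * x)       # Var(eps_i) = 0.5 * x_i^2
y = 3.0 + 2.0 * x + eps                        # true a = 3, b = 2

# Transformed model: Y_i/x_i = b + a*(1/x_i) + eps_i/x_i has constant error variance.
y_star = y / x
x_star = 1.0 / x
Xs = np.column_stack([np.ones(n), x_star])
a_star, b_star = np.linalg.lstsq(Xs, y_star, rcond=None)[0]

b_hat, a_hat = a_star, b_star                  # undo the role swap: a* = b, b* = a
print(a_hat, b_hat)                            # roughly 3 and 2
```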



Heteroscedasticity - Test

An informal but useful way is to examine the pattern of the residuals, e.g. a plot of the squared residuals ε̂i² against time for a time-series model (see the sketch below).
For the specific alternative hypothesis that σi² = C xi², the Goldfeld-Quandt test can be used.
Idea:
Calculate two regression lines, one using the data thought to be associated with low-variance errors, and the other using the data thought to be associated with high-variance errors.
If the residual variances associated with each regression line are approximately equal, the homoscedasticity assumption cannot be rejected.
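As an illustration of the informal residual check, the following sketch uses simulated data (plotted here against xi rather than time) and shows the squared residuals fanning out when the errors are heteroscedastic:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, size=100)
y = 3.0 + 2.0 * x + rng.normal(scale=x)        # error std dev grows with x

X = np.column_stack([np.ones(len(x)), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

plt.scatter(x, resid**2)                       # squared residuals spread out as x grows
plt.xlabel("x_i")
plt.ylabel("squared residual")
plt.show()
```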



Heteroscedasticity - Goldfeld-Quandt Test

The procedure is outlined as follows:

1. Order the data by the magnitude of the independent variable xi.
2. Omit the middle d observations. d might be chosen, for example, to be one-fifth of the total sample size.
3. Fit two separate regressions, the first for the portion of the data associated with low values of xi, and the second for the portion associated with high values of xi. Each model involves (n − d)/2 observations and (n − d)/2 − 2 degrees of freedom.
4. Calculate the residual sum of squares associated with each regression: ESS_low for the model with low values of xi, and ESS_high for that with high values of xi.
5. Given that the error process is normally distributed, ESS_high / ESS_low ∼ F((n − d − 4)/2, (n − d − 4)/2). We can reject the null hypothesis at a chosen level of significance if ESS_high / ESS_low is greater than the critical value of the F distribution.

Remark: The test can be applied to the multiple regression model with k independent variables; the degrees of freedom for the F distribution are then (n − d − 2k − 2)/2.
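Below is a minimal implementation sketch of this procedure for the two-variable model, run on simulated data; the drop fraction d ≈ n/5 follows the suggestion in step 2, and statsmodels also ships a het_goldfeldquandt diagnostic if a library routine is preferred.

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(y, x, drop_frac=0.2):
    """Goldfeld-Quandt test for y = a + b*x + eps against variance increasing with x."""
    n = len(y)
    order = np.argsort(x)                          # 1. order the data by x
    d = int(round(drop_frac * n))                  # 2. omit the middle d observations
    m = (n - d) // 2
    low, high = order[:m], order[-m:]

    def ess(idx):                                  # 3./4. fit a line, return residual sum of squares
        X = np.column_stack([np.ones(len(idx)), x[idx]])
        beta, *_ = np.linalg.lstsq(X, y[idx], rcond=None)
        resid = y[idx] - X @ beta
        return resid @ resid

    f_stat = ess(high) / ess(low)                  # 5. ESS_high / ESS_low
    dof = m - 2                                    # (n - d)/2 - 2 degrees of freedom each
    p_value = stats.f.sf(f_stat, dof, dof)
    return f_stat, p_value

# Simulated heteroscedastic data: Var(eps_i) proportional to x_i^2.
rng = np.random.default_rng(4)
x = rng.uniform(1, 10, size=120)
y = 3.0 + 2.0 * x + rng.normal(scale=x)
print(goldfeld_quandt(y, x))                       # large F and small p-value -> reject homoscedasticity
```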



Suggested Readings

Chapters 4 and 6 of Robert S. Pindyck and Daniel L. Rubinfeld, Econometric Models and Economic Forecasts (4th Edition), McGraw-Hill, Inc., 1997.

