
Topic 4 Multiple Regression_Estimation

The document discusses multiple regression estimation, focusing on the implications of omitted variable bias (OVB) when relevant variables are excluded from the model. It explains how OVB can lead to biased and inconsistent estimates, and outlines the conditions under which this occurs, along with potential solutions such as including omitted variables or using instrumental variables. Additionally, it covers the assumptions of multiple regression, the importance of avoiding multicollinearity, and the distribution of OLS estimators.

Uploaded by

yeelamnoah
Copyright
© © All Rights Reserved

Econ 3121

Topic 4: Multiple Regression


Estimation


➢ We covered
$$Y_i = \beta_0 + \beta_1 X_i + u_i$$
where we only have one regressor (explanatory variable, independent variable) $X_i$.
➢ But what if the true specification is
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$$
e.g. Y is wage; X1 is years of education; X2 is age; X3 is a female binary variable; etc.
➢ If all those variables determine Y but we leave them out, they are in effect captured by u.

1) Omitted Variable Bias

➢ The true model is
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$$
➢ But we instead run
$$Y_i = \beta_0 + \beta_1 X_{1i} + \tilde{u}_i, \quad \text{where } \tilde{u}_i = \beta_2 X_{2i} + u_i$$
➢ If X2 and X1 are correlated ($corr(X_2, X_1) \neq 0$) and X2 is truly a determinant of Y ($\beta_2 \neq 0$), then our OLS Assumption 1 breaks down: $E(\tilde{u}_i \mid X_{1i}) \neq 0$
➢ Then $\hat{\beta}_1$ will be neither unbiased nor consistent.


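➢ OVB (omitted variable bias)
$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \underbrace{\frac{cov(X_{1i}, \tilde{u}_i)}{var(X_{1i})}}_{\text{bias}}$$
where $\tilde{u}_i = \beta_2 X_{2i} + u_i$ is the error term of the regression that omits X2. Substituting,
$$\beta_1 + \underbrace{\frac{cov(X_{1i}, \beta_2 X_{2i} + u_i)}{var(X_{1i})}}_{\text{bias}} = \beta_1 + \underbrace{\beta_2 \frac{cov(X_{1i}, X_{2i})}{var(X_{1i})}}_{\text{bias}}$$
The last step uses $cov(X_{1i}, u_i) = 0$.

➢ In this context, we can determine the sign of the bias. The sign is determined by $\beta_2$ and $cov(X_{1i}, X_{2i})$:
$\beta_2 > 0$ and $cov(X_{1i}, X_{2i}) > 0$: positive bias
$\beta_2 < 0$ and $cov(X_{1i}, X_{2i}) < 0$: positive bias
$\beta_2 > 0$ and $cov(X_{1i}, X_{2i}) < 0$: negative bias
$\beta_2 < 0$ and $cov(X_{1i}, X_{2i}) > 0$: negative bias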

➢ OVB Example
➢ Suppose that the correct model is
$$Wage_i = \beta_0 + \beta_1 Education_i + \beta_2 WorkingExperience_i + u_i$$
➢ But we use the model
$$Wage_i = \beta_0 + \beta_1 Education_i + \tilde{u}_i$$
➢ Then we will have an OVB.
➢ What is the direction of the OVB?
$\beta_2 > 0$ and $cov(Education_i, WorkingExperience_i) < 0$
Negative bias: the estimated beta1_hat is smaller than the
true beta1. We underestimate the effect of education on wage.
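
The direction of the bias in this example can be checked with a small simulation. The sketch below is only illustrative: the data-generating process, the coefficient values, and the variable names are all assumptions made up for the demonstration, not estimates from real wage data. It compares the slope from the regression that omits experience with the value predicted by the OVB formula.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process (all numbers are illustrative assumptions)
education = rng.normal(12.0, 2.0, n)
# Experience falls with education, so cov(education, experience) < 0
experience = 30.0 - 1.5 * education + rng.normal(0.0, 3.0, n)
beta0, beta1, beta2 = 5.0, 2.0, 0.5
wage = beta0 + beta1 * education + beta2 * experience + rng.normal(0.0, 1.0, n)

# Short regression: wage on a constant and education only (experience omitted)
X_short = np.column_stack([np.ones(n), education])
b_short = np.linalg.lstsq(X_short, wage, rcond=None)[0]

# Long regression: experience included
X_long = np.column_stack([np.ones(n), education, experience])
b_long = np.linalg.lstsq(X_long, wage, rcond=None)[0]

# Bias predicted by the OVB formula: beta2 * cov(X1, X2) / var(X1)
cov = np.cov(education, experience)
predicted_bias = beta2 * cov[0, 1] / cov[0, 0]

print("short-regression slope:", b_short[1])
print("beta1 + predicted bias:", beta1 + predicted_bias)
print("long-regression slope: ", b_long[1])
```

With a positive beta2 and a negative covariance, the short-regression slope should fall below the true beta1, while the long regression should recover it approximately.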


➢ How do we solve the OVB problem?
➢ If we have data on the omitted variable, we simply include it in the regression, i.e., we use multiple regression.
➢ If we do not have data on the omitted variable, the problem is much more complicated. One potential solution is to use an instrumental variable (discussed later).

2) Multiple Regression Model
➢ Clearly, if we leave out relevant variables, we have problems. In most cases, a multivariate model specification is appropriate.
➢ $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$
➢ X1, X2, X3, …, Xk are the k different regressors
➢ beta0 is the constant (intercept)
➢ beta1: the effect of X1 on Y holding all other variables constant.
➢ beta2: the effect of X2 on Y holding all other variables constant.
….
➢ u: the error term, all other factors that affect Y.
➢ Everything from univariate regression carries over. Now we have to estimate not just beta0 and beta1, but also beta2, beta3, …, betak.
3) OLS in multiple Regression Model
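➢ We still want estimators of the betas such that the sum of squared errors is minimized:
$$u_i = Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i} - \dots - \beta_k X_{ki}$$
$$\min_{\beta_0, \beta_1, \dots, \beta_k} \sum u_i^2 = \sum \left(Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i} - \dots - \beta_k X_{ki}\right)^2$$
➢ FOC:
$$\frac{\partial \sum \left(Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i} - \dots - \beta_k X_{ki}\right)^2}{\partial \beta_0} = 0$$
$$\frac{\partial \sum \left(Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i} - \dots - \beta_k X_{ki}\right)^2}{\partial \beta_1} = 0$$
...
$$\frac{\partial \sum \left(Y_i - \beta_0 - \beta_1 X_{1i} - \beta_2 X_{2i} - \dots - \beta_k X_{ki}\right)^2}{\partial \beta_k} = 0$$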
➢ We want to find estimators of the betas (or the linear function of the X's) such that the sum of squared residuals is minimized.

➢ There are k+1 unknowns (beta0, beta1, beta2, …, betak) and k+1 equations.
➢ A lot of algebra, but the idea is the same as in the univariate case.
➢ The OLS "line" will be:
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_k X_{ki}$$
where $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k$ are the estimators from the OLS procedure.
➢ Again, the residual is $\hat{u}_i = Y_i - \hat{Y}_i$
➢ Again, $\sum \hat{u}_i = 0$
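
The k+1 first-order conditions can be written compactly as the normal equations Z'Z b = Z'Y, where Z is the data matrix with a leading column of ones. The following sketch solves them numerically on simulated data; the matrix notation, the simulated coefficients, and the variable names are assumptions introduced for illustration and are not part of the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3

# Simulated regressors and outcome (coefficients are illustrative assumptions)
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.5, -2.0, 3.0]) + rng.normal(size=n)

# Data matrix with a constant column, so there are k + 1 unknowns
Z = np.column_stack([np.ones(n), X])

# The k + 1 first-order conditions are the normal equations Z'Z b = Z'y
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)

y_hat = Z @ beta_hat      # fitted values (the OLS "line")
resid = y - y_hat         # residuals

print("beta_hat:", beta_hat)
print("sum of residuals:", resid.sum())   # zero up to rounding error
```

The residuals sum to zero because the first-order condition for beta0 requires exactly that; the remaining conditions make the residuals orthogonal to each regressor.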

➢ Regression of TestScore against STR:

➢ Now include percent English Learners in the district (PctEL):

➢ What happens to the coefficient on STR?


➢ Why? (Note: corr(STR, PctEL) = 0.19)

4) Fit

➢ As with the univariate case, everything extends to multiple regression:
$$SER = \sqrt{\frac{1}{n-k-1} \sum \hat{u}_i^2} = \sqrt{\frac{SSR}{n-k-1}}$$
where n-k-1 is the degrees of freedom (the number of parameters to estimate is k+1)
➢ R2 is again the proportion of variation in Y explained by the regressors in the linear model
$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$$


➢ R2 can never decrease when additional variables are added. This implies you can keep adding variables without penalty, even if the variables are ridiculous.
➢ The adjusted R2, $\bar{R}^2$, addresses this:
$$\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \frac{SSR}{TSS}$$
➢ Let's relate $\bar{R}^2$ to $R^2$:
$$R^2 = 1 - \frac{SSR}{TSS} \;\Rightarrow\; \frac{SSR}{TSS} = 1 - R^2$$
$$\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \frac{SSR}{TSS} = 1 - \frac{n-1}{n-k-1} \left(1 - R^2\right)$$

➢ The $\bar{R}^2$ (the "adjusted R2") corrects this problem by "penalizing" you for including another regressor: $\bar{R}^2$ does not necessarily increase when you add another regressor.
➢ Note $0 \le R^2 \le 1$
If $R^2 = 0$, then $\bar{R}^2 = 1 - \frac{n-1}{n-k-1} < 0$
If $R^2 = 1$, then $\bar{R}^2 = 1$
➢ Note that $\bar{R}^2 \le R^2$
➢ However, if n is large, the two will be very close.
As $n \to \infty$, $\frac{n-1}{n-k-1} \to 1$, so $\bar{R}^2 \to 1 - (1 - R^2) = R^2$
➢ Don't focus on $R^2$ or $\bar{R}^2$ to decide your model. Let theory dictate your variables. Use $R^2$ or $\bar{R}^2$ as an indication of whether you may have left stuff out.
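
To see these fit measures at work, the sketch below computes R2, the adjusted R2, and the SER before and after adding a regressor that is pure noise. The helper `fit_stats`, the simulated data, and the "junk" regressor are hypothetical names and values chosen for illustration only.

```python
import numpy as np

def fit_stats(Z, y):
    """Return (R2, adjusted R2, SER) for OLS of y on Z; Z must include a constant column."""
    n, p = Z.shape                      # p = k + 1 estimated parameters
    k = p - 1
    beta_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
    ssr = np.sum((y - Z @ beta_hat) ** 2)        # sum of squared residuals
    tss = np.sum((y - y.mean()) ** 2)            # total sum of squares
    r2 = 1.0 - ssr / tss
    adj_r2 = 1.0 - (n - 1) / (n - k - 1) * ssr / tss
    ser = np.sqrt(ssr / (n - k - 1))             # standard error of the regression
    return r2, adj_r2, ser

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)
junk = rng.normal(size=n)               # irrelevant regressor, unrelated to y

Z1 = np.column_stack([np.ones(n), x1])
Z2 = np.column_stack([Z1, junk])

r2_a, adj_a, ser_a = fit_stats(Z1, y)
r2_b, adj_b, ser_b = fit_stats(Z2, y)

print("without junk: R2 =", r2_a, " adjusted R2 =", adj_a)
print("with junk:    R2 =", r2_b, " adjusted R2 =", adj_b)
```

R2 cannot fall when the junk regressor is added, whereas the adjusted R2 may go either way because of the (n-1)/(n-k-1) penalty.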
5) Multiple Regression Assumption

➢ Assumption #1: the conditional mean of u given the included X's is zero.
➢ E(ui|X1i,…, Xki) = 0
➢ This has the same interpretation as in regression with
a single regressor.
➢ If an omitted variable (1) belongs in the equation (so
is in u) and (2) is correlated with an included X, then
this condition fails
➢ Failure of this condition leads to omitted variable bias
➢ The solution – if possible – is to include the omitted
variable in the regression.


➢ Assumption #2: (X1i,…,Xki,Yi), i =1,…,n, are i.i.d.


➢ This is satisfied automatically if the data are collected
by simple random sampling.

➢ Assumption #3: large outliers are rare (finite fourth moments)
➢ This is the same assumption as we had before for a
single regressor. As in the case of a single regressor,
OLS can be sensitive to large outliers, so you need to
check your data (scatterplots!) to make sure there are
no crazy values (typos or coding errors).

➢ Assumption #4: There is no perfect multicollinearity
➢ Perfect multicollinearity is when one of the regressors is an
exact linear function of the other regressors:
➢ Example: Suppose you accidentally include STR twice:
➢ Type: reg testscr str str in Stata:

➢ Example:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$$
Suppose X3: age in years
X2: age in days
$X_2 = 365 X_3$; $corr(X_3, X_2) = 1$
➢ This does not make sense. How do you interpret beta2?
➢ Literally, if we hold X3 and all other variables constant, Y will change by beta2 if we change X2 by 1 unit. How can you change X2 (age in days) but also keep X3 (age in years) constant?
➢ Stata will drop one automatically.
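
One way to see why OLS cannot separate the two age variables is to look at the rank of the data matrix. The sketch below uses made-up ages and NumPy (not Stata) as an illustration: with both age columns included, the matrix has three columns but only rank 2, so the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

# Hypothetical sample of ages (illustrative values only)
age_years = rng.integers(20, 60, size=n).astype(float)
age_days = 365.0 * age_years          # exact linear function of age_years

# Constant, age in days, age in years: three columns
Z = np.column_stack([np.ones(n), age_days, age_years])
rank = np.linalg.matrix_rank(Z)

# Dropping one of the two age variables restores full column rank
Z_ok = Z[:, [0, 2]]
rank_ok = np.linalg.matrix_rank(Z_ok)

print("columns:", Z.shape[1], "rank:", rank)
print("columns after dropping age_days:", Z_ok.shape[1], "rank:", rank_ok)
```

Dropping either age variable gives a full-rank matrix, which is in effect what a regression package does when it omits one of the collinear regressors.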

6) Distribution of OLS estimators

➢ The standard errors of beta1_hat, beta2_hat, …, betak_hat become difficult to compute without matrix algebra.
➢ Just as in the univariate case, each beta_hat has a normal distribution in large samples (as n goes to infinity) by the CLT.
➢ The usual hypothesis testing can be done, using t-statistics, p-values, and confidence intervals. We will do this in Chapter 7.

7) Multicollinearity

➢ Perfect vs. imperfect multicollinearity
➢ Perfect multicollinearity: there are two major reasons this happens, and in either case OLS cannot run.

(i) The age example: age in years and age in days contain exactly the same information.
Likewise, an interest rate in decimal (e.g. 0.08) and in percent (e.g. 8).

Solution: remove one of them (otherwise Stata will automatically drop one).

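➢ Perfect multicollinearity:
(ii) dummy variable trap
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots = \beta_0 X_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots, \quad \text{where } X_0 = (1, 1, \dots, 1)'$$
Suppose that X1 is a female dummy variable and X2 is a male dummy variable, e.g.
X1  X2  X0
1   0   1
0   1   1
0   1   1
⋮   ⋮   ⋮
0   1   1
1   0   1
1   0   1
In every row X1 + X2 = X0, so the constant is an exact linear function of the two dummies. So, don't include X1 (or X2).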

➢ Imperfect multicollinearity:
➢ X2 and X3 might be highly correlated, but not perfectly.
➢ OLS can still run, but the standard error on beta2, beta3, or both may be very high, which will result in a wide confidence interval, a low t-stat, and a high p-value (we are more likely NOT to reject the null beta=0).
➢ Classic Example:
$$Consumption_i = \beta_0 + \beta_1 Income_i + \beta_2 Wealth_i + \dots + u_i$$
➢ Income and wealth will usually have a correlation above 0.90
➢ What should you do? You can collect more data, you can combine the variables, or you can just do nothing and point out the
potential problem.
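
The effect of imperfect multicollinearity on standard errors can be illustrated by simulation. In the sketch below, the helper names `ols_se` and `simulate`, the coefficient values, and the use of the homoskedasticity-only standard error formula are all simplifying assumptions for the demonstration; the only thing that changes between the two runs is the correlation between income and wealth.

```python
import numpy as np

def ols_se(Z, y):
    """Homoskedasticity-only OLS standard errors (a simplifying assumption)."""
    n, p = Z.shape
    beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ y)
    resid = y - Z @ beta_hat
    s2 = resid @ resid / (n - p)                 # estimated error variance
    return np.sqrt(s2 * np.diag(np.linalg.inv(Z.T @ Z)))

def simulate(rho, n=500, seed=4):
    """Simulate consumption data where corr(income, wealth) is about rho."""
    rng = np.random.default_rng(seed)
    income = rng.normal(size=n)
    wealth = rho * income + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    # Illustrative coefficients, not estimates from real data
    consumption = 1.0 + 0.6 * income + 0.1 * wealth + rng.normal(size=n)
    Z = np.column_stack([np.ones(n), income, wealth])
    return ols_se(Z, consumption)

se_low = simulate(rho=0.0)     # regressors roughly uncorrelated
se_high = simulate(rho=0.95)   # regressors highly, but not perfectly, correlated

print("SE(income), SE(wealth) with rho = 0.00:", se_low[1], se_low[2])
print("SE(income), SE(wealth) with rho = 0.95:", se_high[1], se_high[2])
```

OLS runs in both cases, but the standard errors on both slopes should be several times larger when the two regressors are highly correlated, which is what widens the confidence intervals.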