Topic 4 Multiple Regression_Estimation
Econ 3121
➢ We covered
$Y_i = \beta_0 + \beta_1 X_i + u_i$
where we only have one regressor (explanatory variable,
independent variable) Xi
➢ But what if the true specification is
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$
e.g. Y is wage; X1 is years of education; X2 is age;
X3 is a female binary variable; etc.
➢ If all those variables determine Y, but we leave them out,
they are in effect captured by u.
1) Omitted Variable Bias
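$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \underbrace{\frac{cov(X_{1i},\ \beta_2 X_{2i} + u_i)}{var(X_{1i})}}_{\text{bias}} = \beta_1 + \underbrace{\beta_2 \frac{cov(X_{1i}, X_{2i})}{var(X_{1i})}}_{\text{bias}}$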
➢ OVB Example
➢ Suppose that the correct model is
$\text{Wage}_i = \beta_0 + \beta_1 \text{Education}_i + \beta_2 \text{WorkingExperience}_i + u_i$
➢ But we use the model
$\text{Wage}_i = \beta_0 + \beta_1 \text{Education}_i + u_i$
➢ Then we will have an OVB.
➢ What is the direction of the OVB?
$\beta_2 > 0$ and $cov(\text{Education}_i, \text{WorkingExperience}_i) < 0$
Negative bias: the estimated beta1_hat is smaller than the
true beta1. We underestimate the effect of education on wage.
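The direction of the bias can be checked with a small simulation. The sketch below is not part of the course material: the data-generating process, the coefficient values and every variable name are assumptions, chosen only so that the coefficient on experience is positive and education and experience are negatively correlated.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n = 100_000

# Hypothetical data-generating process (all numbers are illustrative).
education = rng.normal(12.0, 2.0, n)
# More schooling leaves fewer years in the labour market,
# so cov(education, experience) < 0 by construction.
experience = 30.0 - 1.0 * education + rng.normal(0.0, 2.0, n)
u = rng.normal(0.0, 1.0, n)
beta0, beta1, beta2 = 1.0, 2.0, 0.5
wage = beta0 + beta1 * education + beta2 * experience + u

# Short regression: wage on education only (experience is omitted).
cov_ew = np.cov(education, wage)
beta1_short = cov_ew[0, 1] / cov_ew[0, 0]

# Bias predicted by the OVB formula: beta2 * cov(X1, X2) / var(X1).
cov_x = np.cov(education, experience)
predicted_bias = beta2 * cov_x[0, 1] / cov_x[0, 0]

print(f"true beta1:             {beta1:.3f}")
print(f"short-regression slope: {beta1_short:.3f}")
print(f"beta1 + predicted bias: {beta1 + predicted_bias:.3f}")
```

Under these assumptions the short-regression slope falls below the true beta1 and sits close to beta1 plus the predicted bias.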
2) Multiple Regression Model
➢ Clearly, if we leave out relevant variables, we have got
problems. In most cases, a multivariate model specification is
appropriate.
➢ $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$
➢ X1,X2,X3…,Xk are the k different regressors
➢ beta0 is the constant (intercept)
➢ beta1: the effect of X1 on Y holding all other variables constant.
➢ beta2: the effect of X2 on Y holding all other variables constant.
….
➢ u: the error term, capturing all other factors that affect Y.
➢ Everything from univariate regression carries over. Now we
have to estimate not just beta0 and beta1, but also beta2,
beta3, …, betak.
3) OLS in multiple Regression Model
➢ We still want estimators of the betas that minimize the sum of
squared residuals:
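$\hat{u}_i = Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_k X_{ki}$
$\min_{\hat{\beta}_0, \dots, \hat{\beta}_k} \sum_i \hat{u}_i^2 = \sum_i \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_k X_{ki} \right)^2$
➢ FOC:
$\frac{\partial}{\partial \hat{\beta}_0} \sum_i \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_k X_{ki} \right)^2 = 0$
$\frac{\partial}{\partial \hat{\beta}_1} \sum_i \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_k X_{ki} \right)^2 = 0$
...
$\frac{\partial}{\partial \hat{\beta}_k} \sum_i \left( Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i} - \dots - \hat{\beta}_k X_{ki} \right)^2 = 0$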
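Carrying out the differentiation makes the k+1 conditions explicit. The following is a sketch of that step; the residual is written $\hat{u}_i$ and the sample size $n$, which is notation assumed here for the worked step.

```latex
\begin{aligned}
\frac{\partial}{\partial \hat{\beta}_0}: &\quad -2 \sum_{i=1}^{n} \hat{u}_i = 0 \\
\frac{\partial}{\partial \hat{\beta}_j}: &\quad -2 \sum_{i=1}^{n} X_{ji}\, \hat{u}_i = 0,
  \qquad j = 1, \dots, k
\end{aligned}
```

So the residuals sum to zero and have zero sample covariance with every regressor.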
➢ We want to find estimators of the betas (equivalently, the linear
function of the X’s) such that the sum of squared residuals is minimized.
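➢ There are k+1 unknowns (beta0, beta1, beta2, …, betak) and
k+1 equations.
➢ A lot of algebra. But the idea is the same as in the univariate case.
➢ OLS “line” will be:
$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \dots + \hat{\beta}_k X_{ki}$
$\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k$ are the estimators from the OLS procedure.
➢ Again, the residual is $\hat{u}_i = Y_i - \hat{Y}_i$
➢ Again, $\sum_i \hat{u}_i = 0$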
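The k+1 equations in k+1 unknowns can be solved numerically rather than by hand. The sketch below uses the matrix form of the normal equations, (X'X)b = X'Y, which is standard but is not the notation used in these slides. The simulated data, the coefficient values and the variable names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3

# Simulated regressors and error term (illustrative only).
X = rng.normal(size=(n, k))
u = rng.normal(size=n)
true_beta = np.array([1.0, 0.5, -2.0, 3.0])  # hypothetical beta0..beta3

# Prepend a column of ones so that beta0 is estimated with the slopes.
X_design = np.column_stack([np.ones(n), X])
Y = X_design @ true_beta + u

# Normal equations (X'X) b = X'Y: k+1 equations in k+1 unknowns.
beta_hat = np.linalg.solve(X_design.T @ X_design, X_design.T @ Y)

Y_hat = X_design @ beta_hat   # fitted values on the OLS "line"
residuals = Y - Y_hat         # u_hat_i = Y_i - Y_hat_i

print("beta_hat:", np.round(beta_hat, 3))
print("sum of residuals:", residuals.sum())
```

Up to floating-point rounding, the residuals sum to zero, as the first-order condition for beta0 requires.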
➢ Regression of TestScore against STR:
4) Fit
$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$
$R^2 = 1 - \frac{SSR}{TSS} \implies \frac{SSR}{TSS} = 1 - R^2$
$\bar{R}^2 = 1 - \frac{n-1}{n-k-1} \frac{SSR}{TSS} = 1 - \frac{n-1}{n-k-1} \left( 1 - R^2 \right)$
➢ The $\bar{R}^2$ (the “adjusted $R^2$”) corrects the problem that $R^2$
never falls when a regressor is added, by “penalizing” you for including
another regressor: $\bar{R}^2$ does not necessarily increase when you add
another regressor.
➢ Note $0 \le R^2 \le 1$
If $R^2 = 0$, then $\bar{R}^2 = 1 - \frac{n-1}{n-k-1} < 0$
If $R^2 = 1$, then $\bar{R}^2 = 1$
➢ Note that $\bar{R}^2 \le R^2$
➢ However, if n is large, the two will be very close.
As $n \to \infty$, $\frac{n-1}{n-k-1} \to 1$, so $\bar{R}^2 \to 1 - (1 - R^2) = R^2$
➢ Don’t focus on $R^2$ or $\bar{R}^2$ to decide your model. Let theory
dictate your variables. Use $R^2$ or $\bar{R}^2$ as an indication of
whether you may have left relevant variables out.
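A short numerical check of the two fit measures may help. The sketch below computes both from SSR and TSS, then adds a regressor that is pure noise. The helper functions `r_squared` and `ols_fit`, the simulated data and the coefficient values are hypothetical, introduced only for this illustration.

```python
import numpy as np

def r_squared(y, y_hat, k):
    """Return (R2, adjusted R2) for a fit with k regressors plus an intercept."""
    n = len(y)
    ssr = np.sum((y - y_hat) ** 2)       # sum of squared residuals
    tss = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    r2 = 1.0 - ssr / tss
    adj_r2 = 1.0 - (n - 1) / (n - k - 1) * ssr / tss
    return r2, adj_r2

def ols_fit(X, y):
    """Fitted values from OLS of y on X with an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ beta_hat

rng = np.random.default_rng(2)
n = 50
x1 = rng.normal(size=n)
noise_regressor = rng.normal(size=n)  # unrelated to y by construction
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

r2_a, adj_a = r_squared(y, ols_fit(x1.reshape(-1, 1), y), k=1)
r2_b, adj_b = r_squared(
    y, ols_fit(np.column_stack([x1, noise_regressor]), y), k=2
)

print(f"x1 only:    R2 = {r2_a:.4f}, adjusted R2 = {adj_a:.4f}")
print(f"x1 + noise: R2 = {r2_b:.4f}, adjusted R2 = {adj_b:.4f}")
```

R2 cannot fall when the noise regressor is added. Whether the adjusted R2 rises or falls depends on the particular sample.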
5) Multiple Regression Assumption
➢ Assumption #4: There is no perfect multicollinearity
➢ Perfect multicollinearity is when one of the regressors is an
exact linear function of the other regressors:
➢ Example: Suppose you accidentally include STR twice:
➢ In Stata, type: reg testscr str str
➢ Example:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$
Suppose $X_3$: age in years
$X_2$: age in days
$X_2 = 365 X_3$; $corr(X_3, X_2) = 1$
➢ This does not make sense. How do you interpret beta2?
➢ Literally, if we hold X3 and all other variables constant, Y
will change by beta2 if we change X2 by 1 unit. But how can
you change X2 (age in days) while keeping X3 (age in years)
constant?
➢ Stata will drop one automatically.
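The interpretation problem shows up numerically as a design matrix without full column rank. In the sketch below the ages are simulated and the variable names are assumptions; it checks the rank and shows that two different coefficient vectors produce the same fitted values, so the data cannot distinguish between them.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

# Simulated ages (illustrative); age in days is an exact multiple.
age_years = rng.integers(20, 60, size=n).astype(float)
age_days = 365.0 * age_years

# Columns: constant, X2 (age in days), X3 (age in years).
design = np.column_stack([np.ones(n), age_days, age_years])
rank = np.linalg.matrix_rank(design)
print("columns:", design.shape[1], "rank:", rank)

# Two different coefficient vectors with identical fitted values:
coef_a = np.array([0.0, 0.0, 1.0])          # all weight on age in years
coef_b = np.array([0.0, 1.0 / 365.0, 0.0])  # all weight on age in days
same_fit = np.allclose(design @ coef_a, design @ coef_b)
print("identical fitted values:", same_fit)
```

Because the rank is below the number of columns, the normal equations have no unique solution, which is why one of the two variables has to be dropped.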
6) Distribution of OLS estimators
7) Multicollinearity
(i) The age example: age in years and age in days contain
exactly the same information.
Interest rate in decimal (e.g. 0.08) or in percent (e.g.
8)
…
Solution: remove one of them (otherwise Stata will
automatically drop one).
➢ Perfect multicollinearity:
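(ii) dummy variable trap
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$, where the constant is a
regressor $X_0$ equal to 1 for every observation:
$Y = \beta_0 X_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots$
Suppose that X1 is a female dummy variable and X2 is a male
dummy variable. Then $X_1 + X_2 = X_0$ for every observation, e.g.
X1 X2 X0
1 0 1
0 1 1
0 1 1
0 1 1
1 0 1
1 0 1
So, don’t include both dummies: drop X1 (or X2).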
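The six-observation table above can be checked directly. The sketch below copies the X1 column from the table and derives X2 and X0 from it; the variable names are assumptions. It compares the rank of the design matrix with and without the redundant dummy.

```python
import numpy as np

# X1 (female) as in the table above; X2 (male) = 1 - X1; X0 = constant.
female = np.array([1, 0, 0, 0, 1, 1], dtype=float)
male = 1.0 - female
constant = np.ones_like(female)

# Constant plus both dummies: X1 + X2 = X0, so the columns are dependent.
trap = np.column_stack([constant, female, male])
rank_trap = np.linalg.matrix_rank(trap)

# Dropping one dummy restores full column rank.
fixed = np.column_stack([constant, female])
rank_fixed = np.linalg.matrix_rank(fixed)

print("with both dummies: columns =", trap.shape[1], "rank =", rank_trap)
print("one dummy dropped: columns =", fixed.shape[1], "rank =", rank_fixed)
```

With one dummy dropped, the coefficient on the remaining dummy is read relative to the omitted group.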
➢ Imperfect multicollinearity:
➢ X2 and X3 might be highly correlated, but not perfectly.
➢ OLS can still run, but the standard error on beta2, beta3 or
both may be very large, which results in a wide
confidence interval, a low t-stat and a high p-value (we are
less likely to reject the null beta=0).
➢ Classic Example:
$\text{Consumption}_i = \beta_0 + \beta_1 \text{Income}_i + \beta_2 \text{Wealth}_i + \dots + u_i$
➢ Income and wealth will usually have a correlation above 0.90
➢ What should you do? You can collect more data, you can
combine the variables, or just do nothing and point out the
potential problem.
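The effect of imperfect multicollinearity on standard errors can be seen in a simulation. The sketch below is not from the slides: the data are simulated, the coefficient values and function names are hypothetical, and the standard errors use the homoskedasticity-only formula as a simplifying assumption. It compares uncorrelated regressors with regressors whose population correlation is 0.95.

```python
import numpy as np

def ols_standard_errors(X, y):
    """OLS coefficients and homoskedasticity-only standard errors."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    xtx = design.T @ design
    beta_hat = np.linalg.solve(xtx, design.T @ y)
    residuals = y - design @ beta_hat
    # Error variance estimate with n - k - 1 degrees of freedom.
    sigma2 = residuals @ residuals / (n - design.shape[1])
    cov_beta = sigma2 * np.linalg.inv(xtx)
    return beta_hat, np.sqrt(np.diag(cov_beta))

def simulate(rho, seed=4, n=500):
    """Simulate consumption on income and wealth with corr(income, wealth) = rho."""
    rng = np.random.default_rng(seed)
    income = rng.normal(size=n)
    wealth = rho * income + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    consumption = 1.0 + 0.5 * income + 0.5 * wealth + rng.normal(size=n)
    return ols_standard_errors(np.column_stack([income, wealth]), consumption)

_, se_low = simulate(rho=0.0)
_, se_high = simulate(rho=0.95)

print("SE(income), SE(wealth) with rho = 0.00:", np.round(se_low[1:], 3))
print("SE(income), SE(wealth) with rho = 0.95:", np.round(se_high[1:], 3))
```

The sample size and error variance are the same in both runs, so the larger standard errors in the second run come from the correlation between the two regressors.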