Lec 5 V 11
Anbes Tenaye
Department of Agricultural Economics
Hawassa University
Reminder
Regression when X is a binary variable
Yi = β0 + β1 Di + ui
Di = 0: Yi = β0 + ui → E(Yi | Di = 0) = β0
Di = 1: Yi = β0 + β1 + ui → E(Yi | Di = 1) = β0 + β1
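The two conditional expectations above imply that the OLS slope on a binary regressor is just the difference in group means. A minimal sketch with simulated data (not from the lecture's examples):

```python
# Sketch: with a binary regressor D, the OLS slope beta1_hat equals
# mean(Y | D=1) - mean(Y | D=0). Data below are simulated for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
D = rng.integers(0, 2, size=n)           # binary regressor
Y = 2.0 + 1.5 * D + rng.normal(0, 1, n)  # true beta0 = 2, beta1 = 1.5

# OLS via least squares on [1, D]
X = np.column_stack([np.ones(n), D])
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]

diff_in_means = Y[D == 1].mean() - Y[D == 0].mean()
print(beta_hat[1], diff_in_means)  # identical up to floating-point rounding
```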
Dummy variables
• The group with an indicator of 0 is the base group, the group against
which comparisons are made.
• It does not matter how we choose the base group, but it is important
to keep track of which group is the base group.
• If the two groups do not differ, then β1 is zero.
Example
Proportions and percentages as dependent variables
Proportions and percentages as dependent variables
Homoskedasticity
The ideal analysis
Omitted variable bias - the ZCM assumption
• In the last lecture you saw that E (u|X ) = 0 is important in order for
the OLS estimator to be unbiased.
• An omitted variable thus matters if its omission leads to a
violation of the ZCM assumption.
• The bias that arises from such an omission is called omitted variable
bias.
Omitted variable bias
OVB example
We estimate:
yi = β0 + β1 Xi + ui
while the true model is:
yi = β0 + β1 Xi + β2 Zi + vi
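The consequence of estimating the short model can be seen in a small simulation, a sketch with made-up coefficients (β1 = 2, β2 = 3, Corr(Z, X) ≠ 0):

```python
# Sketch: omitting Z (correlated with X) from the regression biases the
# slope on X. Simulated data; coefficients are illustrative, not from the slides.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
X = rng.normal(0, 1, n)
Z = 0.8 * X + rng.normal(0, 1, n)                  # Corr(Z, X) != 0
y = 1.0 + 2.0 * X + 3.0 * Z + rng.normal(0, 1, n)  # true beta1 = 2

short = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
full = np.linalg.lstsq(np.column_stack([np.ones(n), X, Z]), y, rcond=None)[0]
print(short[1])  # around 2 + 3*0.8 = 4.4: biased upward
print(full[1])   # around 2: unbiased once Z is included
```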
Example: Corr(Z, X) ≠ 0
wages = β0 + β1 educ + ui, where ui = γ1 pinc + vi
(the omitted variable pinc is correlated with educ)
Example: Z is a determinant of Y
wages = β0 + β1 educ + ui, where ui = γ2 MS + vi
(the omitted variable MS is a determinant of wages)
Example: Omitted variable bias
wages = β0 + β1 educ + ui, where ui = γ3 ability + vi
• Ability: the higher your ability, the "easier" education is for you and
the more likely you are to have high education.
• Ability: the higher your ability, the better you are at your job and the
higher the wages you get.
How to overcome omitted variable bias
Cross tabulation
One can address omitted variable bias by splitting the data into subgroups.
For example:
Cross tabulation
Multiple linear regression model
Multiple linear regression model
Y          | X          | Other variables
-----------|------------|----------------------------------
Wages      | Education  | Experience, ability
Crop yield | Fertilizer | Soil quality, location (sun etc.)
Test score | STR        | Average family income
Multiple linear regression model
The general multiple linear regression model for the population can be
written as:
Multiple linear regression model
Example
Yi = β0 + β1 X1i + β2 X2i + ui
Example:
wagei = β0 + β1 educi + β2 experi + ui
wagei = β0 + β1 experi + β2 IQi + ui
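A sketch of fitting such a two-regressor model by OLS. The data and coefficient values below are simulated for illustration, not the wage data from the lecture:

```python
# Sketch: OLS for wage_i = beta0 + beta1*educ_i + beta2*exper_i + u_i,
# using simulated data with known coefficients (1.0, 0.6, 0.1).
import numpy as np

rng = np.random.default_rng(2)
n = 526
educ = rng.uniform(8, 18, n)
exper = rng.uniform(0, 30, n)
wage = 1.0 + 0.6 * educ + 0.1 * exper + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), educ, exper])
b = np.linalg.lstsq(X, wage, rcond=None)[0]
print(b)  # close to the true values [1.0, 0.6, 0.1]
```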
Interpretation of the coefficient
Ŷ = β̂0 + β̂1 X1 + β̂2 X2
Thus the predicted change in Y given changes in X1 and X2 is given by:
ΔŶ = β̂1 ΔX1 + β̂2 ΔX2
Thus if X2 is held fixed, then:
ΔŶ = β̂1 ΔX1
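The partial-effect interpretation can be checked numerically, a sketch with hypothetical estimates (the values of β̂0, β̂1, β̂2 below are made up):

```python
# Sketch: with X2 held fixed, the predicted change in Y from a one-unit
# change in X1 is exactly beta1_hat. Estimates below are hypothetical.
b0_hat, b1_hat, b2_hat = 1.0, 0.6, 0.1

def y_hat(x1, x2):
    return b0_hat + b1_hat * x1 + b2_hat * x2

delta = y_hat(x1=13, x2=5) - y_hat(x1=12, x2=5)  # X2 fixed at 5
print(delta)  # equals b1_hat = 0.6 (up to floating point)
```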
Interpretation of the coefficient
Using data on 526 observations on wage, education and experience, the
following Stata output was obtained:
[Stata regression output]
Interpretation of the coefficient
Example: Smoking and birthweight
Using the data set birthweight_smoking.dta you can estimate the following
regression:
birthweight-hat = 3432.06 − 253.2 Smoker
If we include the number of prenatal visits:
birthweight-hat = 3050.5 − 218.8 Smoker + 34.1 nprevist
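Plugging into the two fitted equations shows how the estimated smoking gap shrinks once prenatal visits are controlled for. A sketch using only the coefficients reported above:

```python
# Sketch: predicted birthweight (grams) from the two fitted equations.
def bw_short(smoker):
    return 3432.06 - 253.2 * smoker

def bw_long(smoker, nprevist):
    return 3050.5 - 218.8 * smoker + 34.1 * nprevist

gap_short = bw_short(1) - bw_short(0)        # -253.2
gap_long = bw_long(1, 11) - bw_long(0, 11)   # -218.8: smaller once visits are held fixed
print(gap_short, gap_long)
```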
Example education
[robust regression output: educ regressed on meduc and feduc; table not recovered]

. display _b[_cons]+_b[meduc]*12+_b[feduc]*12
5.8634189

. display _b[_cons]+_b[meduc]*16+_b[feduc]*16
7.4845585

. display 7.484-5.863
1.621

. *or

. display _b[meduc]*4+_b[feduc]*4
1.6211396

Or by hand:
Multiple linear regression model
Assumptions of the MLRM
E(u | X1, X2, ..., Xk) = 0
No exact linear relationships
Perfect collinearity
A situation in which one of the regressors is an exact linear function of the
other regressors.
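Perfect collinearity breaks OLS mechanically: the matrix X′X becomes singular, so the normal equations have no unique solution. A minimal sketch with one regressor that is an exact linear function of another:

```python
# Sketch: with X2 = 2*X1 (perfect collinearity), X'X is rank deficient
# and cannot be inverted, so OLS coefficients are not uniquely determined.
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(0, 1, n)
x2 = 2 * x1                         # exact linear function of x1
X = np.column_stack([np.ones(n), x1, x2])

XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))   # 2, not 3: X'X is singular
```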
No perfect collinearity
Solving the three first-order conditions for the model with two
independent variables gives:
where σ̂²_Xj (j = 1, 2), σ̂_Y,Xj and σ̂_X1,X2 are the empirical variances and
covariances. Thus we require that:
σ̂²_X1 > 0, σ̂²_X2 > 0 and r²_X1,X2 ≠ 1. Thus the sample correlation
coefficient between X1 and X2 cannot be one or minus one.
Imperfect collinearity
• Occurs when two or more of the regressors are highly correlated (but
not perfectly correlated).
• High correlation makes it hard to estimate the effect of one variable
holding the other constant.
• For the model with two independent variables and homoskedastic
errors:

  Var(β̂1) = (1/n) · 1/(1 − ρ²_X1,X2) · σ²_u / σ²_X1

• The two-variable case illustrates that the higher the correlation
between X1 and X2, the higher the variance of β̂1.
• Thus, when multiple regressors are imperfectly collinear, the
coefficients on one or more of these regressors will be imprecisely
estimated.
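The variance formula above can be checked by Monte Carlo, a sketch with simulated data: as the correlation between X1 and X2 rises, the sampling spread of β̂1 grows.

```python
# Monte Carlo sketch: the sampling standard deviation of beta1_hat
# increases with the correlation rho between X1 and X2.
import numpy as np

rng = np.random.default_rng(4)

def sd_beta1(rho, n=200, reps=2000):
    ests = []
    for _ in range(reps):
        x1 = rng.normal(0, 1, n)
        x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(0, 1, n)
        y = 1 + 2 * x1 + 3 * x2 + rng.normal(0, 1, n)
        X = np.column_stack([np.ones(n), x1, x2])
        ests.append(np.linalg.lstsq(X, y, rcond=None)[0][1])
    return np.std(ests)

sd_lo, sd_hi = sd_beta1(0.0), sd_beta1(0.9)
print(sd_lo, sd_hi)  # the second is noticeably larger, as the formula predicts
```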
Omitted variable bias
Example bias
Comparing estimates from simple and multiple regression. What is the
return to education? Simple regression:
[Stata output: robust regression of wage on educ; table not recovered]
Example bias - two independent variables
Yi = β0 + β1 X1i + β2 X2i + ui
Example bias - two independent variables
Thus the bias that arises from the omitted variable (in the model with two
independent variables) is given by β2 δ̃1, where δ̃1 is the slope from a
regression of X2 on X1, and the direction of the bias can be summarized by
the following table:
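In sample, the omitted-variable decomposition holds as an exact identity: the "short" slope equals the "long" slope plus β̂2 times the auxiliary slope δ̃1 from regressing X2 on X1. A sketch with simulated data:

```python
# Sketch: in-sample identity  short_slope = long_slope + beta2_hat * delta1_tilde,
# where delta1_tilde is the OLS slope from regressing X2 on X1.
import numpy as np

rng = np.random.default_rng(5)
n = 400
x1 = rng.normal(0, 1, n)
x2 = 0.5 * x1 + rng.normal(0, 1, n)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(0, 1, n)

ones = np.ones(n)
short = np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)[0]
full = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)[0]
aux = np.linalg.lstsq(np.column_stack([ones, x1]), x2, rcond=None)[0]

print(short[1], full[1] + full[2] * aux[1])  # identical up to rounding
```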
Comparing estimates from simple and multiple regression

Linear regression                          Number of obs =     935
                                           F(  2,   932) =   64.47
                                           Prob > F      =  0.0000
                                           R-squared     =  0.1339
                                           Root MSE      =  376.73

------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   42.05762   6.810074     6.18   0.000     28.69276    55.42247
          IQ |   5.137958   .9266458     5.54   0.000     3.319404    6.956512
       _cons |  -128.8899   93.09396    -1.38   0.167    -311.5879    53.80818
------------------------------------------------------------------------------

. reg educ IQ, robust

Linear regression                          Number of obs =     935
                                           F(  1,   933) =  342.94
                                           Prob > F      =  0.0000
                                           R-squared     =  0.2659
                                           Root MSE      =   1.883

[coefficient table for the regression of educ on IQ not recovered]
Bias - multiple independent variables
• Deriving the sign of omitted variable bias when there are more than
two independent variables in the model is more difficult.
• Note that correlation between a single explanatory variable and the
error generally results in all OLS estimators being biased.
• Suppose the true population model is:
Y = β0 + β1 X1 + β2 X2 + β3 X3 + u
• But we estimate
Ỹ = β̃0 + β̃1 X1 + β̃2 X2
• If Corr(X1, X3) ≠ 0 while Corr(X2, X3) = 0, then β̃2 will also be biased
unless Corr(X1, X2) = 0.
Bias - multiple independent variables
Causation