Econometrics Handout Session 2
Session II
β̂0 = Ȳ − β̂1 x̄

Standard errors of the estimated parameters:

SE(β̂0) = σ̂ √[ ((1/N) Σi xi²) / (Σi (xi − x̄)²) ]

SE(β̂1) = σ̂ / √[ Σi (xi − x̄)² ]

where σ̂ = √[ Σi ε̂i² / (N − 2) ].
Calculation of the R², a measure of fit of our model:

TSS := Σi (Yi − Ȳ)² = (N − 1) v̂ar(Yi)

ESS := Σi (Ŷi − Ȳ)² = (N − 1) v̂ar(Ŷi)

RSS := Σi ûi² = (N − 1) v̂ar(ûi)
1.2 Multiple Regressors Case
We have data (Yi, xi1, ..., xiK), i = 1, ..., N, upon which we seek to fit a linear model

Yi = β1 + β2 xi2 + ... + βK xiK + εi

In matrix notation, let

Y = (Y1, Y2, ..., Yi, ..., YN)' ,   β = (β1, β2, ..., βk, ..., βK)' ,   ε = (ε1, ε2, ..., εi, ..., εN)'

and

X = [ 1   x12   ...   x1k   ...   x1K
      1   x22   ...   x2k   ...   x2K
      ⋮     ⋮             ⋮             ⋮
      1   xi2   ...   xik   ...   xiK
      ⋮     ⋮             ⋮             ⋮
      1   xN2   ...   xNk   ...   xNK ]

and we can write

Y = Xβ + ε

where Y and ε are (N × 1) vectors, X is an (N × K) matrix, and β is a (K × 1) vector.
Appendix
1.3 One Regressor Case
We have data (Yi , xi )i=1,...,N upon which we seek to fit a univariate linear model
Yi = β0 + β1 xi + εi
The problem can be seen from a data fitting perspective, and an econometric perspective.
εi = Yi − β0 − β1 xi
∂/∂β̂1 :   −2 Σi xi (Yi − β̂0 − β̂1 xi) = 0   ⇒   Σi xi (Yi − β̂0 − β̂1 xi) = 0

⇒   Σi xi (Yi − Ȳ + β̂1 x̄ − β̂1 xi) = 0   ⇒   Σi xi (Yi − Ȳ) − β̂1 Σi xi (xi − x̄) = 0

⇒   Σi (xi − x̄)(Yi − Ȳ) − β̂1 Σi (xi − x̄)² = 0

⇒   β̂1 = Σi (xi − x̄)(Yi − Ȳ) / Σi (xi − x̄)²
       = [ (1/(N−1)) Σi (xi − x̄)(Yi − Ȳ) ] / [ (1/(N−1)) Σi (xi − x̄)² ]
       = ĉov(xi, Yi) / v̂ar(xi)
where, in the second step, I used β̂0 = Ȳ − β̂1 x̄; in the third step, I collected β̂1; and in the fourth step, I exploited the fact that

Σi (−x̄)(Yi − Ȳ) = −x̄ Σi Yi + x̄ N Ȳ = −x̄ N Ȳ + x̄ N Ȳ = 0

and

β̂1 Σi x̄ (xi − x̄) = β̂1 x̄ Σi xi − β̂1 x̄ N x̄ = β̂1 x̄ N x̄ − β̂1 x̄ N x̄ = 0

and added −Σi x̄ (Yi − Ȳ) and β̂1 Σi x̄ (xi − x̄) to the LHS.
1.4.1 Measures of Fit
Measures of fit are mostly related to data fitting rather than econometrics per se, so it is useful to introduce
them now.
Since

v̂ar(Yi) = (1/(N−1)) Σi (Yi − Ȳ)² ,   v̂ar(Ŷi) = (1/(N−1)) Σi (Ŷi − Ȳ)² ,   v̂ar(ûi) = (1/(N−1)) Σi ûi²

then

TSS := Σi (Yi − Ȳ)² = (N − 1) v̂ar(Yi)

ESS := Σi (Ŷi − Ȳ)² = (N − 1) v̂ar(Ŷi)

RSS := Σi ûi² = (N − 1) v̂ar(ûi)
5. Now notice that OLS imposes Σi (Ŷi − Ȳ) ûi = 0 by construction; indeed,

Σi (Ŷi − Ȳ) ûi = Σi (β̂0 + β̂1 xi − β̂0 − β̂1 x̄)(Yi − β̂0 − β̂1 xi) = β̂1 Σi (xi − x̄)(Yi − β̂0 − β̂1 xi)
now,

Σi (xi − x̄)(Yi − β̂0 − β̂1 xi) = Σi (xi − x̄) Yi − β̂0 Σi (xi − x̄) − β̂1 Σi (xi − x̄) xi

= Σi (xi − x̄)(Yi − Ȳ) − β̂1 Σi (xi − x̄)²

= Σi (xi − x̄)(Yi − Ȳ) − [ Σi (xi − x̄)(Yi − Ȳ) / Σi (xi − x̄)² ] Σi (xi − x̄)²

= Σi (xi − x̄)(Yi − Ȳ) − Σi (xi − x̄)(Yi − Ȳ) = 0

where in the third line we substituted

β̂1 = Σi (xi − x̄)(Yi − Ȳ) / Σi (xi − x̄)²
Now, we can exploit TSS = ESS + RSS to express the R² both in terms of the ESS and in terms of the RSS, as follows:

R² := ESS/TSS = (TSS − RSS)/TSS = 1 − RSS/TSS
Another interesting property of the R², limited to the univariate regression model, is that R² = [ĉorr(Y, Ŷ)]²; indeed:
1. By definition of R²,

R² := ESS/TSS = Σi (Ŷi − Ȳ)² / Σi (Yi − Ȳ)²

and since

Ŷi − Ȳ = β̂0 + β̂1 xi − β̂0 − β̂1 x̄ = β̂1 (xi − x̄)

then

R² = β̂1² Σi (xi − x̄)² / Σi (Yi − Ȳ)²

Letting

Sx = Σi (xi − x̄)² ,   Sy = Σi (Yi − Ȳ)²

we have

R² = β̂1² Sx / Sy
2. The (estimated) correlation coefficient between Y and Ŷ is

ĉorr(Ŷ, Y) = ĉov(Ŷ, Y) / [ σ̂(Ŷ) σ̂(Y) ]
= [ (1/(N−1)) Σi (Yi − Ȳ)(Ŷi − Ȳ) ] / [ √((1/(N−1)) Σi (Yi − Ȳ)²) · √((1/(N−1)) Σi (Ŷi − Ȳ)²) ]
= Σi (Yi − Ȳ)(Ŷi − Ȳ) / √[ Σi (Yi − Ȳ)² · Σi (Ŷi − Ȳ)² ]

Now, using Ŷi − Ȳ = β̂1 (xi − x̄) and

β̂1 = Σi (xi − x̄)(Yi − Ȳ) / Σi (xi − x̄)²   ⇒   Σi (xi − x̄)(Yi − Ȳ) = β̂1 Σi (xi − x̄)² = β̂1 Sx ,

the numerator becomes β̂1 Σi (xi − x̄)(Yi − Ȳ) = β̂1² Sx and the denominator becomes √(Sy · β̂1² Sx) = β̂1 √(Sy Sx); hence

ĉorr(Ŷ, Y) = β̂1 Sx / √(Sy Sx)

and therefore

[ĉorr(Ŷ, Y)]² = β̂1² Sx² / (Sy Sx) = β̂1² Sx / Sy = R²
The OLS estimator is then also a so-called Method of Moments estimator, imposing the Moment Conditions
E (xi εi ) = 0
E (εi ) = 0
on the sample at hand.
The so-called Sample Analogue of E (εi ) = 0 is
(1/N) Σi εi = 0
which, under the assumption that the true causal relationship that we seek to capture indeed takes a linear form
Yi = β0 + β1 xi + εi
gives exactly the same condition as the first FOC from the data fitting problem:

(1/N) Σi εi = 0   ⇒   (1/N) Σi (Yi − β0 − β1 xi) = 0

Similarly, the Sample Analogue of E(xi εi) = 0 is (1/N) Σi xi εi = 0, which, again under the assumption that the true causal relationship we seek to capture takes a linear form, gives exactly the same condition as the second FOC from the data fitting problem:

(1/N) Σi xi (Yi − β0 − β1 xi) = 0
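A quick numerical check of the two sample moment conditions, as a sketch in R (it assumes the wage2 data frame used in the applied example below):

fit <- lm(wage ~ educ, data = wage2)   # fitted univariate regression
e_hat <- resid(fit)                    # estimated residuals
mean(e_hat)                            # sample analogue of E(eps_i) = 0 (zero up to rounding)
mean(wage2$educ * e_hat)               # sample analogue of E(x_i * eps_i) = 0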
In matrix notation, the model is again

Y = Xβ + ε

We will show in more detail in the multivariate case (take this for granted, for the time being) that the OLS estimator is

β̂OLS = (β̂0, β̂1)' = (X'X)⁻¹ X'Y
Also, since Y = Xβ + ε,

β̂OLS = (X'X)⁻¹ X'Y
      = (X'X)⁻¹ X'(Xβ + ε)
      = (X'X)⁻¹ (X'X) β + (X'X)⁻¹ X'ε
      = β + (X'X)⁻¹ X'ε
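The decomposition β̂OLS = β + (X'X)⁻¹ X'ε can be illustrated with a small simulation sketch in R (the values of β, σ, and N below are illustrative, not taken from the handout):

set.seed(1)
N <- 1000
x <- rnorm(N, mean = 10, sd = 2)
X <- cbind(1, x)                              # design matrix with a constant
beta <- c(2, 0.5)                             # illustrative true parameters
eps <- rnorm(N, sd = 1)                       # homoskedastic, uncorrelated errors
Y <- X %*% beta + eps
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% Y  # (X'X)^{-1} X'Y
beta_hat - beta                               # equals (X'X)^{-1} X' eps: small in large samples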
Now, assume that

E(ε | X) = E(ε) = 0

and that errors are homoskedastic and not correlated:

var(εi | X) = σ² ,   cov(εi, εj | X) = 0 ,   ∀ i ≠ j
These latter two define the entries of the so-called variance-covariance matrix of ε, conditional on X:

V(ε | X) = [ var(ε1 | X)         cov(ε1, ε2 | X)   ...   cov(ε1, εN | X)
             cov(ε1, ε2 | X)     var(ε2 | X)       ...   cov(ε2, εN | X)
             ...                 ...               ...   ...
             cov(ε1, εN | X)     cov(ε2, εN | X)   ...   var(εN | X)     ]

         = [ σ²   0    ...   0
             0    σ²   ...   0
             ...  ...  ...   ...
             0    0    ...   σ² ]
To be clear: the first matrix shows how any variance-covariance matrix of the errors is defined, with variances on the diagonal and covariances off the diagonal; notice that the variance-covariance matrix is symmetric, as cov(εi, εj | X) = cov(εj, εi | X), ∀ i ≠ j. The second matrix shows what the variance-covariance matrix becomes under our assumptions of homoskedasticity and uncorrelation.
Since E(ε | X) = 0, the assumptions on variances (homoskedasticity) and covariances (uncorrelation) can be stated more compactly as

V(ε | X) = E[ (ε − E(ε | X)) (ε − E(ε | X))' | X ] = E(εε' | X) = σ² I
Therefore, β̂OLS − β = (X'X)⁻¹ X'ε, and the variance-covariance matrix of the OLS estimator, under our assumptions, is

V(β̂OLS | X) = E[ (β̂OLS − E(β̂OLS | X)) (β̂OLS − E(β̂OLS | X))' | X ]
            = E[ (β̂OLS − β) (β̂OLS − β)' | X ]
            = E[ ((X'X)⁻¹ X'ε) ((X'X)⁻¹ X'ε)' | X ]
            = E[ (X'X)⁻¹ X' εε' X (X'X)⁻¹ | X ]
            = (X'X)⁻¹ X' E(εε' | X) X (X'X)⁻¹
            = σ² (X'X)⁻¹ X'X (X'X)⁻¹ = σ² (X'X)⁻¹
Since σ² is unknown, it is estimated, given ε̂i = Yi − β̂0 − β̂1 xi, by

σ̂² = (1/(N − 2)) Σi ε̂i²

so that the estimated variance-covariance matrix is

V̂(β̂OLS | X) = σ̂² (X'X)⁻¹
This holds, under homoskedasticity and uncorrelation, both in the univariate and the multivariate case.
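In R, this estimated variance-covariance matrix is what vcov() returns for a fitted lm object; a minimal check, assuming the wage regression used in the applied example below:

fit <- lm(wage ~ educ, data = wage2)   # the univariate wage regression used later
vcov(fit)                              # estimated variance-covariance matrix of (b0, b1)
sqrt(diag(vcov(fit)))                  # standard errors, as reported by summary(fit)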
We now exploit some matrix algebra to specialise the formulation for the univariate case; in other words, we need to compute (X'X)⁻¹.
Recall that

X = [ 1   x1
      ⋮    ⋮
      1   xi
      ⋮    ⋮
      1   xN ]
Then, multiplying the (2 × N) matrix X' by the (N × 2) matrix X,

X'X = [ N         Σi xi
        Σi xi     Σi xi² ]
To invert X'X, we first need to compute its determinant:

det(X'X) = N Σi xi² − (Σi xi)² = N Σi xi² − (N x̄)²
         = N [ Σi xi² − N x̄² ]
         = N [ Σi xi² + N x̄² − 2 N x̄² ]
         = N [ Σi xi² + Σi x̄² − 2 x̄ Σi xi ]
         = N Σi (xi² + x̄² − 2 x̄ xi)
         = N Σi (xi − x̄)²
Then,

(X'X)⁻¹ = (1 / det(X'X)) [  Σi xi²    −Σi xi
                            −Σi xi     N      ]

        = (1 / (N Σi (xi − x̄)²)) [  Σi xi²    −Σi xi
                                    −Σi xi     N      ]

        = (1 / Σi (xi − x̄)²) [  (1/N) Σi xi²    −(1/N) Σi xi
                                −(1/N) Σi xi     1            ]
and

V̂(β̂OLS | X) = σ̂² (X'X)⁻¹ = (σ̂² / Σi (xi − x̄)²) [  (1/N) Σi xi²    −(1/N) Σi xi
                                                     −(1/N) Σi xi     1            ]
The entries on the diagonal of the estimated variance-covariance matrix are the estimated variances of the parameters; the off-diagonal (symmetric) element is the estimated covariance of the parameters.
V̂(β̂OLS | X) = [ v̂ar(β̂0 | X)          ĉov(β̂0, β̂1 | X)
                 ĉov(β̂0, β̂1 | X)      v̂ar(β̂1 | X)      ]

             = (σ̂² / Σi (xi − x̄)²) [  (1/N) Σi xi²    −(1/N) Σi xi
                                       −(1/N) Σi xi     1            ]
where, again, notice that the first matrix is how a general estimated variance-covariance matrix for β̂0 , β̂1 is
defined, while the second matrix is how the estimated variance-covariance matrix of the OLS estimator is
characterised under our assumptions of homoskedasticity and uncorrelation.
The standard errors of the parameters are their estimated standard deviations, that is
SE(β̂0) = σ̂ √[ ((1/N) Σi xi²) / (Σi (xi − x̄)²) ]

SE(β̂1) = σ̂ / √[ Σi (xi − x̄)² ]
2. log-level specification

ln(yi) = β0 + β1 xi + εi

Then,

β1 = d ln(y) / dx

In general,

d ln(z)/dz = 1/z   ⇒   d ln(z) = dz/z ≈ Δz/z

i.e. for small changes, we can approximately interpret log changes as % changes. Therefore,

β1 = d ln(y)/dx = (dy/y)/dx

A 1 unit increase in x is associated with an increase in y of 100·β1 %.
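For instance, in the applied example below the log-level wage regression gives β̂1 ≈ 0.0598, so one more year of education is associated with roughly a 6% higher wage.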
3. level-log specification

yi = β0 + β1 ln(xi) + εi

Then,

β1 = dy / d ln(x)

and therefore

dy/dx = β1 d ln(x)/dx   ⇒   dy = β1 dx/x
A 1% increase in x is associated with an increase in y of β1 /100 units.
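In the applied example below, the level-log wage regression gives β̂1 ≈ 828.4, so a 1% increase in education is associated with roughly an 8.3-unit increase in the wage.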
4. log-log specification
ln(yi ) = β0 + β1 ln(xi ) + εi
Then,

β1 = d ln(y)/d ln(x) = (dy/y)/(dx/x)
i.e. the coefficient of a log-log regression can be interpreted as an elasticity.
A 1% increase in x is associated with an increase in y of β1 %.
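In the applied example below, the log-log wage regression gives β̂1 ≈ 0.83, so a 1% increase in education is associated with roughly a 0.83% higher wage.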
Consider again the multiple-regressor model Yi = β1 + β2 xi2 + ... + βK xiK + εi, where the first regressor is set to xi1 = 1, so as to rationalise the presence of the constant β1 in a useful way.
As in the univariate case, the problem can be seen both from a data fitting perspective and from an econometric perspective.
In both cases, it is useful to rewrite the model in matrix notation. Let
Y = (Y1, Y2, ..., Yi, ..., YN)' ,   β = (β1, β2, ..., βk, ..., βK)' ,   ε = (ε1, ε2, ..., εi, ..., εN)' ,

X = [ 1   x12   ...   x1k   ...   x1K
      1   x22   ...   x2k   ...   x2K
      ⋮     ⋮             ⋮             ⋮
      1   xi2   ...   xik   ...   xiK
      ⋮     ⋮             ⋮             ⋮
      1   xN2   ...   xNk   ...   xNK ]
The data fitting problem is

min_β ε'ε   ⟺   min_β (Y − Xβ)'(Y − Xβ)   ⟺   min_β (Y' − β'X')(Y − Xβ)

⟺   min_β  Y'Y − β'X'Y − Y'Xβ + β'X'Xβ   ⟺   min_β  Y'Y − 2 β'X'Y + β'X'Xβ

where in the last step we exploit the fact that Y'Xβ is a (1 × 1) scalar, and a scalar equals its own transpose; hence Y'Xβ = (Y'Xβ)' = β'X'Y.
First order conditions are

−2 X'Y + 2 X'X β̂ = 0   ⟺   X'X β̂ = X'Y   ⟺   (X'X)⁻¹ X'X β̂ = (X'X)⁻¹ X'Y

therefore,

β̂ = (X'X)⁻¹ X'Y
Defining the residuals ε̂ = Y − X β̂ = Y − X (X'X)⁻¹ X'Y and σ̂² = ε̂'ε̂ / (N − K), the estimated variance-covariance matrix of the OLS estimator is

V̂(β̂ | X) = σ̂² (X'X)⁻¹
Elements on the diagonal are estimated variances, and their square roots are the estimated standard errors:

V̂(β̂ | X) = [ v̂ar(β̂1 | X)          ĉov(β̂1, β̂2 | X)    ...   ĉov(β̂1, β̂K | X)
              ĉov(β̂1, β̂2 | X)      v̂ar(β̂2 | X)        ...   ĉov(β̂2, β̂K | X)
              ...                   ...                 ...   ...
              ĉov(β̂1, β̂K | X)      ĉov(β̂2, β̂K | X)    ...   v̂ar(β̂K | X)     ]

SE(β̂j) = √[ v̂ar(β̂j | X) ] ,   j = 1, 2, ..., K
2 Applied R example
# we want to fit a simple model from our wage data, explaining how education
# affects wages:
# wage = a + b*education + u
# If we know we will be working with the same data frame object for a while,
# it is convenient to use "attach" to ease coding
# (the wage2 data frame is assumed to be loaded already, e.g. from the wooldridge package)
attach(wage2)
b_hat <- cov(wage,educ)/var(educ)
a_hat <- mean(wage) - b_hat*mean(educ)
detach(wage2)
wage_reg <- lm(wage ~ educ, data=wage2) # the fitted regression (assumed estimated in an earlier chunk)
y_hat <- fitted(wage_reg) # fitted values: y_hat = a_hat + b_hat*x
u_hat <- resid(wage_reg)  # residuals: u_hat = y - a_hat - b_hat*x = y - y_hat
# a visual comparison of y, y_hat, and u_hat may enhance the understanding
attach(wage2)
y <- data.frame(wage,y_hat,u_hat)
detach(wage2)
## [1] -3.486193e-13
## [1] -4.948877e-14
## [1] 0.1070001
1- RSS/TSS #same
## [1] 0.1070001
cor(wage2$wage,y_hat)^2 #same
## [1] 0.1070001
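# The TSS, ESS and RSS used above were computed in a chunk not shown here;
# a minimal sketch of that computation (assuming the wage2, y_hat and u_hat
# objects defined above) is:
TSS <- sum((wage2$wage - mean(wage2$wage))^2)  # total sum of squares
ESS <- sum((y_hat - mean(wage2$wage))^2)       # explained sum of squares
RSS <- sum(u_hat^2)                            # residual sum of squares
ESS/TSS      # R^2 as ESS/TSS
1 - RSS/TSS  # same value, as 1 - RSS/TSS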
# REMEMBER: we don't really care too much about the R^2 in many research scenarios.
# There can be "good" econometric models that have a very low R^2,
# as well as "bad" econometric models that have very high R^2.
# The R^2 is more related to data fitting than econometrics!
## [1] 77.71496
## [1] 5.694982
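# The two numbers above are the standard errors of the intercept and the slope,
# computed by hand from the formulas derived earlier; a minimal sketch of that
# computation (assuming the u_hat object and the wage2 data from above) is:
x <- wage2$educ
n <- length(x)
sigma_hat <- sqrt(sum(u_hat^2) / (n - 2))                   # estimate of sigma
se_a <- sigma_hat * sqrt(mean(x^2) / sum((x - mean(x))^2))  # SE of the intercept
se_b <- sigma_hat / sqrt(sum((x - mean(x))^2))              # SE of the slope
c(se_a, se_b)  # should match the Std. Error column of the summary output below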
# All of this is good for a thorough understanding of what we are doing when we
# perform regression analysis. But it wouldn't be very efficient to compute each
# time all this information singularly. R actually gives us directly all of this
# when running the lm() command
summary(lm(wage ~ educ, data=wage2))
##
## Call:
## lm(formula = wage ~ educ, data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -877.38 -268.63 -38.38 207.05 2148.26
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 146.952 77.715 1.891 0.0589 .
## educ 60.214 5.695 10.573 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 382.3 on 933 degrees of freedom
## Multiple R-squared: 0.107,Adjusted R-squared: 0.106
## F-statistic: 111.8 on 1 and 933 DF, p-value: < 2.2e-16
# log-level
wage2$logwage <- log(wage2$wage)
summary(lm(logwage ~ educ, data=wage2))
##
## Call:
## lm(formula = logwage ~ educ, data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.94620 -0.24832 0.03507 0.27440 1.28106
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.973062 0.081374 73.40 <2e-16 ***
## educ 0.059839 0.005963 10.04 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4003 on 933 degrees of freedom
## Multiple R-squared: 0.09742,Adjusted R-squared: 0.09645
## F-statistic: 100.7 on 1 and 933 DF, p-value: < 2.2e-16
# level-log
wage2$logeduc <- log(wage2$educ)
summary(lm(wage ~ logeduc, data=wage2))
##
## Call:
## lm(formula = wage ~ logeduc, data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -878.29 -262.41 -37.67 204.53 2138.72
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1185.5 203.5 -5.826 7.82e-09 ***
## logeduc 828.4 78.5 10.553 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 382.4 on 933 degrees of freedom
## Multiple R-squared: 0.1066,Adjusted R-squared: 0.1057
## F-statistic: 111.4 on 1 and 933 DF, p-value: < 2.2e-16
# log-log
wage2$logwage <- log(wage2$wage)
summary(lm(logwage ~ logeduc, data=wage2))
##
## Call:
## lm(formula = logwage ~ logeduc, data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.94925 -0.24818 0.03866 0.27282 1.27167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.63932 0.21297 21.78 <2e-16 ***
## logeduc 0.82694 0.08215 10.07 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4002 on 933 degrees of freedom
## Multiple R-squared: 0.09796,Adjusted R-squared: 0.09699
## F-statistic: 101.3 on 1 and 933 DF, p-value: < 2.2e-16
# Assume now we think wage is also dependent on experience and tenure, aside
# from education.
# We now fit a model with multiple regressors:
# log(wage) = b0 + b1*educ + b2*exper + b3*tenure + u
# notice that we have N = 526 observations and K = 4 regressors
# (including the constant)
summary(lm(log(wage) ~ educ + exper + tenure, data=wage2))
##
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure, data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.05802 -0.29645 -0.03265 0.28788 1.42809
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.284360 0.104190 2.729 0.00656 **
## educ 0.092029 0.007330 12.555 < 2e-16 ***
## exper 0.004121 0.001723 2.391 0.01714 *
## tenure 0.022067 0.003094 7.133 3.29e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4409 on 522 degrees of freedom
## Multiple R-squared: 0.316,Adjusted R-squared: 0.3121
## F-statistic: 80.39 on 3 and 522 DF, p-value: < 2.2e-16
# As before, the lm command makes our life way easier. To bridge this output with
# the theory studied in class, we can actually estimate the OLS parameters and
# standard errors step by step from our data
# we must add a first column of ones to the X matrix, so as to account for the
# constant b0 in beta_hat
# (X and Y are assumed to have been built from the regression variables in an earlier chunk)
const <- rep(1,N) # vector of ones of length N (the constant regressor)
X <- cbind(const,X)
XtX <- t(X)%*%X # X'X matrix (K x K): t(X) gives the transpose and %*% matrix multiplication
XtY <- t(X)%*%Y # X'Y matrix (K x 1)
invXtX <- solve(XtX) # (X'X)^{-1} matrix (K x K)
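# A possible continuation of this chunk (a sketch; it reuses the objects defined
# above and takes K to be the number of columns of X after adding the constant):
beta_hat   <- invXtX %*% XtY           # OLS estimates: (X'X)^{-1} X'Y
res_hat    <- Y - X %*% beta_hat       # residuals
K          <- ncol(X)
sigma2_hat <- sum(res_hat^2) / (N - K) # estimated error variance
V_hat      <- sigma2_hat * invXtX      # estimated variance-covariance matrix
se_hat     <- sqrt(diag(V_hat))        # standard errors
cbind(beta_hat, se_hat)                # compare with the summary(lm(...)) output above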