
The OLS Estimator

Econometrics (30413) — Applied Sessions with R

Session II

1 This Session in a snapshot


1.1 Univariate Regression
Univariate regression model:
Yi = β0 + β1 xi + εi , i = 1, ..., N
OLS estimator (in matrix notation):

β̂_OLS = (β̂0, β̂1)′ = (X′X)⁻¹ X′Y

β̂1 = Cov(Y, x) / Var(x) = Σ_{i=1}^{N} (Yi − Ȳ)(xi − x̄) / Σ_{i=1}^{N} (xi − x̄)²

β̂0 = Ȳ − β̂1 x̄
Standard errors of the estimated parameters:
SE(β̂0) = σ̂ · sqrt((1/N) Σ_i xi²) / sqrt(Σ_i (xi − x̄)²)

SE(β̂1) = σ̂ / sqrt(Σ_i (xi − x̄)²)

where σ̂ = sqrt(Σ_i ε̂i² / (N − 2)).
Calculation of the R², a measure of fit of our model:

TSS := Σ_{i=1}^{N} (Yi − Ȳ)² = (N − 1) var(Yi)

ESS := Σ_{i=1}^{N} (Ŷi − Ȳ)² = (N − 1) var(Ŷi)

RSS := Σ_{i=1}^{N} ûi² = (N − 1) var(ûi)

R² := ESS / TSS = (TSS − RSS) / TSS = 1 − RSS / TSS

(here var(·) denotes the sample variance, with denominator N − 1).
1.2 Multiple Regressors Case
We have data (Yi , xi1 , ..., xiK )i=1,...,N upon which we seek to fit a linear model

Yi = β1 + β2 xi2 + ... + βK xiK + εi

       
Y = (Y1, Y2, ..., Yi, ..., YN)′ ,    β = (β1, β2, ..., βk, ..., βK)′ ,    ε = (ε1, ε2, ..., εi, ..., εN)′ ,

    [ 1  x12  ...  x1k  ...  x1K ]
    [ 1  x22  ...  x2k  ...  x2K ]
    [ ⋮    ⋮          ⋮         ⋮  ]
X = [ 1  xi2  ...  xik  ...  xiK ]
    [ ⋮    ⋮          ⋮         ⋮  ]
    [ 1  xN2  ...  xNk  ...  xNK ]
and we can write

Y = Xβ + ε
where Y and ε are (N × 1) vectors, X is an (N × K) matrix, and β is a (K × 1) vector.

1.2.1 Variance of the OLS estimator


Var(β̂) = (X′X)⁻¹ X′ (σ² I) X (X′X)⁻¹ = σ² (X′X)⁻¹
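A small Monte Carlo sketch (simulated data; the design and parameter values are chosen here only for illustration) makes this formula concrete: across repeated samples drawn from the same design, the sampling variance of β̂ is close to σ²(X′X)⁻¹.

set.seed(99)
N <- 50; sigma <- 2
x <- runif(N, 0, 5)
X <- cbind(1, x)                          # fixed design, kept constant across replications

beta_hats <- replicate(5000, {
  y <- 1 + 0.5*x + rnorm(N, sd = sigma)   # true beta = (1, 0.5), homoskedastic errors
  coef(lm(y ~ x))                         # OLS estimates for this sample
})
var(t(beta_hats))                         # simulated variance-covariance matrix of beta_hat
sigma^2 * solve(t(X) %*% X)               # theoretical sigma^2 * (X'X)^{-1}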

Appendix
1.3 One Regressor Case
We have data (Yi , xi )i=1,...,N upon which we seek to fit a univariate linear model

Yi = β0 + β1 xi + εi

The problem can be seen from a data fitting perspective, and an econometric perspective.

1.4 OLS as a data fitting problem


From a pure data fitting perspective, we seek to find the line that best fits the data. This is done by taking
the vertical distances between the line we seek and the data

εi = Yi − β0 − β1 xi

and then minimising their sum of squares


(β̂0, β̂1)′ = argmin_{β0, β1} Σ_{i=1}^{N} εi² = argmin_{β0, β1} Σ_{i=1}^{N} (Yi − β0 − β1 xi)²

First Order Conditions of the Data Fitting problem are


∂/∂β̂0 :   −2 Σ_{i=1}^{N} (Yi − β̂0 − β̂1 xi) = 0   ⇒   Σ_{i=1}^{N} (Yi − β̂0 − β̂1 xi) = 0

          ⇒   Σ_{i=1}^{N} Yi − N β̂0 − β̂1 Σ_{i=1}^{N} xi = 0   ⇒   (1/N) Σ_{i=1}^{N} Yi = β̂0 + β̂1 (1/N) Σ_{i=1}^{N} xi

          ⇒   Ȳ = β̂0 + β̂1 x̄

∂/∂β̂1 :   −2 Σ_{i=1}^{N} xi (Yi − β̂0 − β̂1 xi) = 0   ⇒   Σ_{i=1}^{N} xi (Yi − β̂0 − β̂1 xi) = 0

          ⇒   Σ_{i=1}^{N} xi (Yi − Ȳ + β̂1 x̄ − β̂1 xi) = 0   ⇒   Σ_{i=1}^{N} xi (Yi − Ȳ) − β̂1 Σ_{i=1}^{N} xi (xi − x̄) = 0

          ⇒   Σ_{i=1}^{N} (xi − x̄)(Yi − Ȳ) − β̂1 Σ_{i=1}^{N} (xi − x̄)² = 0

          ⇒   β̂1 = Σ_{i=1}^{N} (xi − x̄)(Yi − Ȳ) / Σ_{i=1}^{N} (xi − x̄)²
               = [ (1/(N−1)) Σ_{i=1}^{N} (xi − x̄)(Yi − Ȳ) ] / [ (1/(N−1)) Σ_{i=1}^{N} (xi − x̄)² ]  =  cov(xi, Yi) / var(xi)

where, in the second step I used β̂0 = Ȳ − β̂1 x̄, in the third step I collected β̂1, and in the fourth step I
exploited the fact that
Σ_{i=1}^{N} (−x̄)(Yi − Ȳ) = −x̄ Σ_{i=1}^{N} Yi + x̄ N Ȳ = −x̄ N Ȳ + x̄ N Ȳ = 0

and

Σ_{i=1}^{N} β̂1 x̄ (xi − x̄) = β̂1 x̄ Σ_{i=1}^{N} xi − β̂1 x̄ N x̄ = β̂1 x̄ N x̄ − β̂1 x̄ N x̄ = 0

and added −Σ_{i=1}^{N} x̄ (Yi − Ȳ) and β̂1 Σ_{i=1}^{N} x̄ (xi − x̄) to the LHS.
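A minimal numerical check of this result (simulated data, illustrative only): minimising the sum of squared residuals with a generic optimiser gives the same coefficients as the closed-form expressions just derived.

set.seed(1)
N <- 100
x <- runif(N, 0, 10)
Y <- 2 + 0.7*x + rnorm(N)

ssr <- function(b) sum((Y - b[1] - b[2]*x)^2)     # objective: sum of squared residuals
num <- optim(c(0, 0), ssr)$par                    # numerical minimiser (Nelder-Mead)

b1  <- sum((x - mean(x)) * (Y - mean(Y))) / sum((x - mean(x))^2)
b0  <- mean(Y) - b1*mean(x)
rbind(numerical = num, closed_form = c(b0, b1))   # the two rows should agree closely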

1.4.1 Measures of Fit
Measures of fit are mostly related to data fitting rather than econometrics per se, so it is useful to introduce
them now.
Since
var(Yi) = (1/(N−1)) Σ_{i=1}^{N} (Yi − Ȳ)² ,    var(Ŷi) = (1/(N−1)) Σ_{i=1}^{N} (Ŷi − Ȳ)² ,    var(ûi) = (1/(N−1)) Σ_{i=1}^{N} ûi²

then
TSS := Σ_{i=1}^{N} (Yi − Ȳ)² = (N − 1) var(Yi)

ESS := Σ_{i=1}^{N} (Ŷi − Ȳ)² = (N − 1) var(Ŷi)

RSS := Σ_{i=1}^{N} ûi² = (N − 1) var(ûi)

We now show that


TSS = ESS + RSS

1. By the properties of OLS, Yi = Ŷi + ûi ; indeed

   Yi = (β̂0 + β̂1 xi) + (Yi − β̂0 − β̂1 xi) = Ŷi + ûi

2. Then, by subtracting Ȳ on both sides of Yi = Ŷi + ûi , we get

(Yi − Ȳ ) = (Ŷi − Ȳ ) + ûi

3. Taking squares on both sides,

(Yi − Ȳ)² = (Ŷi − Ȳ)² + ûi² + 2 (Ŷi − Ȳ) ûi

4. Summing over all observations,


Σ_{i=1}^{N} (Yi − Ȳ)² = Σ_{i=1}^{N} (Ŷi − Ȳ)² + Σ_{i=1}^{N} ûi² + 2 Σ_{i=1}^{N} (Ŷi − Ȳ) ûi

5. Now notice that OLS imposes Σ_i (Ŷi − Ȳ) ûi = 0 by construction; indeed

   Σ_{i=1}^{N} (Ŷi − Ȳ) ûi = Σ_{i=1}^{N} (β̂0 + β̂1 xi − β̂0 − β̂1 x̄)(Yi − β̂0 − β̂1 xi)
                            = β̂1 Σ_{i=1}^{N} (xi − x̄)(Yi − β̂0 − β̂1 xi)

   Now,

   Σ_{i=1}^{N} (xi − x̄)(Yi − β̂0 − β̂1 xi) = Σ_i (xi − x̄) Yi − β̂0 Σ_i (xi − x̄) − β̂1 Σ_i (xi − x̄) xi
                                          = Σ_i (xi − x̄)(Yi − Ȳ) − β̂1 Σ_i (xi − x̄)²
                                          = Σ_i (xi − x̄)(Yi − Ȳ) − [ Σ_i (xi − x̄)(Yi − Ȳ) / Σ_i (xi − x̄)² ] Σ_i (xi − x̄)²
                                          = Σ_i (xi − x̄)(Yi − Ȳ) − Σ_i (xi − x̄)(Yi − Ȳ) = 0

where we have again exploited that


−Σ_{i=1}^{N} (xi − x̄) Ȳ = 0        and        x̄ Σ_{i=1}^{N} (xi − x̄) = 0

and added them to the RHS; that

−β̂0 Σ_{i=1}^{N} (xi − x̄) = 0

and that

β̂1 = Σ_{i=1}^{N} (xi − x̄)(Yi − Ȳ) / Σ_{i=1}^{N} (xi − x̄)²

Now, we can exploit TSS = ESS + RSS to express the R² either in terms of the ESS or of the RSS, as follows:

R² := ESS / TSS = (TSS − RSS) / TSS = 1 − RSS / TSS
Another interesting property of the R², limited to the univariate regression model, is that R² = [corr(Y, Ŷ)]²; indeed:

1. By definition of R²,

   R² := ESS / TSS = Σ_i (Ŷi − Ȳ)² / Σ_i (Yi − Ȳ)²

   and since

   Ŷi − Ȳ = β̂0 + β̂1 xi − β̂0 − β̂1 x̄ = β̂1 (xi − x̄)

   then

   R² = β̂1² Σ_i (xi − x̄)² / Σ_i (Yi − Ȳ)²

   Letting

   Sx = Σ_i (xi − x̄)² ,    Sy = Σ_i (Yi − Ȳ)²

   we have

   R² = β̂1² Sx / Sy

2. The (estimated) correlation coefficient between Y and Ŷ is

   corr(Ŷ, Y) = cov(Ŷ, Y) / [ σ̂(Ŷ) σ̂(Y) ]
              = [ (1/(N−1)) Σ_i (Yi − Ȳ)(Ŷi − Ȳ) ] / [ sqrt((1/(N−1)) Σ_i (Yi − Ȳ)²) · sqrt((1/(N−1)) Σ_i (Ŷi − Ȳ)²) ]
              = Σ_i (Yi − Ȳ)(Ŷi − Ȳ) / sqrt( [Σ_i (Yi − Ȳ)²] [Σ_i (Ŷi − Ȳ)²] )

   Exploiting again the fact that Ŷi − Ȳ = β̂1 (xi − x̄),

   corr(Ŷ, Y) = β̂1 Σ_i (Yi − Ȳ)(xi − x̄) / sqrt( [Σ_i (Yi − Ȳ)²] [β̂1² Σ_i (xi − x̄)²] ) = Σ_i (Yi − Ȳ)(xi − x̄) / sqrt(Sy Sx)

   Now, since

   β̂1 = Σ_{i=1}^{N} (xi − x̄)(Yi − Ȳ) / Σ_{i=1}^{N} (xi − x̄)²   ⇒   Σ_{i=1}^{N} (xi − x̄)(Yi − Ȳ) = β̂1 Σ_{i=1}^{N} (xi − x̄)² = β̂1 Sx

   hence

   corr(Ŷ, Y) = β̂1 Sx / sqrt(Sy Sx)

   and therefore

   [corr(Ŷ, Y)]² = β̂1² Sx² / (Sy Sx) = β̂1² Sx / Sy = R²
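A quick numerical check of both results (simulated data, illustrative only): the decomposition TSS = ESS + RSS holds, and the R² equals the squared correlation between Y and the fitted values.

set.seed(11)
N <- 200
x <- rnorm(N)
Y <- 3 - x + rnorm(N)

fit <- lm(Y ~ x)
TSS <- sum((Y - mean(Y))^2)
ESS <- sum((fitted(fit) - mean(Y))^2)
RSS <- sum(resid(fit)^2)

c(TSS = TSS, ESS_plus_RSS = ESS + RSS)            # the decomposition TSS = ESS + RSS
c(R2 = ESS/TSS, corr_sq = cor(Y, fitted(fit))^2,
  lm_R2 = summary(fit)$r.squared)                 # R^2 = [corr(Y, Y_hat)]^2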

1.5 OLS from an econometric perspective


When doing econometrics, we seek a causal interpretation of the impact of x on Y. For this, we need a so-called exogeneity assumption

E(ε | x) = E(ε) = 0

which, roughly speaking, requires that ε and x are unrelated; otherwise β1 would capture other things besides the causal effect of x on Y.
We then have

E(ε | x) = E(ε) = 0   ⇒   cov(xi, εi) = E(xi εi) − E(xi) E(εi) = E(xi εi) = 0

The OLS estimator is then also a so-called Method of Moments estimator, imposing the Moment Conditions

E (xi εi ) = 0

E (εi ) = 0
on the sample at hand.
The so-called Sample Analogue of E (εi ) = 0 is
(1/N) Σ_{i=1}^{N} εi = 0

which, under the assumption that the true causal relationship we seek to capture is indeed linear,

Yi = β0 + β1 xi + εi

gives exactly the same condition as the first FOC from the data fitting problem:

(1/N) Σ_{i=1}^{N} εi = 0   ⇒   (1/N) Σ_{i=1}^{N} (Yi − β0 − β1 xi) = 0

The Sample Analogue of E (xi εi ) = 0 is instead


(1/N) Σ_{i=1}^{N} xi εi = 0

which, again under the assumption that the true causal relationship we seek to capture is indeed linear, gives exactly the same condition as the second FOC from the data fitting problem:

(1/N) Σ_{i=1}^{N} xi (Yi − β0 − β1 xi) = 0
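The two sample moment conditions can also be solved directly as a 2 × 2 linear system; a minimal sketch (simulated data, illustrative only):

set.seed(42)
N <- 150
x <- rnorm(N)
Y <- 1 + 2*x + rnorm(N)

A <- matrix(c(N,      sum(x),
              sum(x), sum(x^2)), nrow = 2, byrow = TRUE)  # from sum(e) = 0 and sum(x*e) = 0
b <- c(sum(Y), sum(x*Y))
solve(A, b)                                 # (beta0_hat, beta1_hat) from the moment conditions
coef(lm(Y ~ x))                             # same numbers from lm()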

1.5.1 OLS standard errors


To compute OLS standard errors, it is useful to introduce a matrix notation that will moreover become
essential in the multivariate case.
From our model
Yi = β0 + β1 xi + εi , i = 1, ..., N
where observations are randomly drawn from the same population, hence iid, let
     
Y = (Y1, ..., Yi, ..., YN)′ ,    β = (β0, β1)′ ,    ε = (ε1, ..., εi, ..., εN)′ ,

    [ 1  x1 ]
    [ ⋮   ⋮ ]
X = [ 1  xi ]
    [ ⋮   ⋮ ]
    [ 1  xN ]
Then, the full model, obtained from stacking all the observations i = 1, ..., N ,
   
(Y1, ..., Yi, ..., YN)′ = (β0 + β1 x1 + ε1, ..., β0 + β1 xi + εi, ..., β0 + β1 xN + εN)′
can be stated more compactly, in matrix notation, as

Y = Xβ + ε

We will show in more detail, in the multivariate case (take this for granted for the time being), that in matrix notation the OLS estimator is

β̂_OLS = (β̂0, β̂1)′ = (X′X)⁻¹ X′Y
Also, since Y = Xβ + ε,

β̂_OLS = (X′X)⁻¹ X′Y
       = (X′X)⁻¹ X′ (Xβ + ε)
       = (X′X)⁻¹ (X′X) β + (X′X)⁻¹ X′ε
       = β + (X′X)⁻¹ X′ε
Now, assume that
E(ε | X) = E(ε) = 0
and that errors are homoskedastic and uncorrelated:

var(εi | X) = σ²
cov(εi, εj | X) = 0 ,   ∀ i ≠ j
These two conditions define the entries of the so-called variance-covariance matrix of ε, conditional on X:

           [ var(ε1 | X)       cov(ε1, ε2 | X)   ...   cov(ε1, εN | X) ]   [ σ²   0    ...   0  ]
V(ε | X) = [ cov(ε1, ε2 | X)   var(ε2 | X)       ...   cov(ε2, εN | X) ] = [ 0    σ²   ...   0  ]
           [ ...               ...               ...   ...             ]   [ ...  ...  ...  ... ]
           [ cov(ε1, εN | X)   cov(ε2, εN | X)   ...   var(εN | X)     ]   [ 0    0    ...   σ² ]
To be clear: the first matrix shows how any variance-covariance matrix of the errors is defined, with variances on the diagonal and covariances off the diagonal; notice that the variance-covariance matrix is symmetric, as cov(εi, εj | X) = cov(εj, εi | X), ∀ i ≠ j. The second matrix shows what the variance-covariance matrix becomes under our assumptions of homoskedasticity and uncorrelation.

Since E(ε | X) = 0, the assumptions on variances (homoskedasticity) and covariances (uncorrelation) can be
stated more compactly as
V(ε | X) = E[ (ε − E(ε | X)) (ε − E(ε | X))′ | X ] = E(εε′ | X) = σ² I

In other words, given E(ε | X) = 0,


var(εi | X) = σ² ,    cov(εi, εj | X) = 0 ,  ∀ i ≠ j

or

E(εε′ | X) = σ² I
are alternative ways to state the same assumption.

Now, since E(ε | X) = 0,

E(β̂_OLS | X) = E(β + (X′X)⁻¹ X′ε | X)
             = E(β | X) + E((X′X)⁻¹ X′ε | X)
             = β + (X′X)⁻¹ X′ E(ε | X)
             = β

Therefore, β̂_OLS − β = (X′X)⁻¹ X′ε, and the variance-covariance matrix of the OLS estimator, under our assumptions, is

V(β̂_OLS | X) = E[ (β̂_OLS − E(β̂_OLS | X)) (β̂_OLS − E(β̂_OLS | X))′ | X ]
             = E[ (β̂_OLS − β) (β̂_OLS − β)′ | X ]
             = E[ (X′X)⁻¹ X′ε ((X′X)⁻¹ X′ε)′ | X ]
             = E[ (X′X)⁻¹ X′ εε′ X (X′X)⁻¹ | X ]
             = (X′X)⁻¹ X′ E[εε′ | X] X (X′X)⁻¹
             = σ² (X′X)⁻¹ X′X (X′X)⁻¹
             = σ² (X′X)⁻¹
Since σ² is unknown, it is estimated, given ε̂i = Yi − β̂0 − β̂1 xi, by

σ̂² = (1/(N − 2)) Σ_{i=1}^{N} ε̂i²

the estimated variance-covariance matrix of the OLS estimator is

V̂(β̂_OLS | X) = σ̂² (X′X)⁻¹

This holds, under homoskedasticity and uncorrelation, both in the univariate and the multivariate case.
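A minimal sketch (simulated data, illustrative only) that builds σ̂²(X′X)⁻¹ by hand and compares it with the variance-covariance matrix reported by R's vcov():

set.seed(7)
N <- 120
x <- rnorm(N)
Y <- 0.5 + 1.5*x + rnorm(N)

fit  <- lm(Y ~ x)
X    <- cbind(1, x)                        # N x 2 design matrix
ehat <- resid(fit)
s2   <- sum(ehat^2) / (N - 2)              # sigma2_hat, with the N - 2 dof correction

V_manual <- s2 * solve(t(X) %*% X)         # sigma2_hat * (X'X)^{-1}
V_manual
vcov(fit)                                  # should match (up to row/column names)
sqrt(diag(V_manual))                       # SE(beta0_hat), SE(beta1_hat)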

We now exploit some matrix algebra to specialise the formulation for the univariate case; in other words, we need to compute (X′X)⁻¹.
Recall that

    [ 1  x1 ]
    [ ⋮   ⋮ ]
X = [ 1  xi ]
    [ ⋮   ⋮ ]
    [ 1  xN ]
Then,
  
       [ 1   1   ...  1  ]   [ 1  x1 ]     [ N         Σ_i xi  ]
X′X =  [                 ] · [ ⋮   ⋮ ]  =  [                   ]
       [ x1  x2  ...  xN ]   [ 1  xN ]     [ Σ_i xi    Σ_i xi² ]
To invert (X′X), we first need to compute its determinant:

det(X′X) = N Σ_i xi² − (Σ_i xi)² = N Σ_i xi² − (N x̄)²
         = N [ Σ_i xi² − N x̄² ]
         = N [ Σ_i xi² + N x̄² − 2 N x̄² ]
         = N [ Σ_i xi² + Σ_i x̄² − 2 x̄ Σ_i xi ]
         = N Σ_i (xi² + x̄² − 2 x̄ xi)
         = N Σ_i (xi − x̄)²

Then,

(X′X)⁻¹ = (1 / det(X′X)) · [  Σ_i xi²   −Σ_i xi ]  =  (1 / (N Σ_i (xi − x̄)²)) · [  Σ_i xi²   −Σ_i xi ]
                           [ −Σ_i xi       N    ]                              [ −Σ_i xi       N    ]

        = (1 / Σ_i (xi − x̄)²) · [ (1/N) Σ_i xi²    −(1/N) Σ_i xi ]
                                [ −(1/N) Σ_i xi          1       ]

and

V̂(β̂_OLS | X) = σ̂² (X′X)⁻¹ = (σ̂² / Σ_i (xi − x̄)²) · [ (1/N) Σ_i xi²    −(1/N) Σ_i xi ]
                                                   [ −(1/N) Σ_i xi          1       ]
The entries on the diagonal of the estimated variance-covariance matrix are the estimated variances of the
parameters; the off-diagonal (symmetric) element, is the estimated covariance of the parameters.
               [ var(β̂0 | X)       cov(β̂0, β̂1 | X) ]
V̂(β̂_OLS | X) = [                                    ]
               [ cov(β̂0, β̂1 | X)   var(β̂1 | X)      ]

                                       [ (1/N) Σ_i xi²    −(1/N) Σ_i xi ]
             = (σ̂² / Σ_i (xi − x̄)²) ·  [                                ]
                                       [ −(1/N) Σ_i xi          1       ]

where, again, notice that the first matrix is how a general estimated variance-covariance matrix for β̂0 , β̂1 is
defined, while the second matrix is how the estimated variance-covariance matrix of the OLS estimator is
characterised under our assumptions of homoskedasticity and uncorrelation.

The standard errors of the parameters are their estimated standard deviations, that is
SE(β̂0) = σ̂ · sqrt((1/N) Σ_i xi²) / sqrt(Σ_i (xi − x̄)²)

SE(β̂1) = σ̂ / sqrt(Σ_i (xi − x̄)²)

1.6 Interpretation of the Coefficients


1. level-level specification

   yi = β0 + β1 xi + εi

   Then,

   β1 = dy/dx

   A 1-unit increase in x is associated with an increase in y of β1 units.

2. log-level specification

   ln(yi) = β0 + β1 xi + εi

   Then,

   β1 = d ln(y)/dx

   In general,

   d ln(z)/dz = 1/z   ⇒   d ln(z) = dz/z ≈ Δz/z

   i.e. for small changes, we can approximately interpret log changes as % changes (see the short R check after this list).

   Therefore,

   β1 = d ln(y)/dx = (dy/y)/dx

   A 1-unit increase in x is associated with an increase in y of 100·β1 %.
3. level-log specification

   yi = β0 + β1 ln(xi) + εi

   Then,

   β1 = dy/d ln(x)

   and therefore

   dy/dx = β1 · d ln(x)/dx   ⇒   dy = β1 · dx/x

   A 1% increase in x is associated with an increase in y of β1/100 units.
4. log-log specification

   ln(yi) = β0 + β1 ln(xi) + εi

   Then,

   β1 = d ln(y)/d ln(x) = (dy/y)/(dx/x)

   i.e. the coefficient of a log-log regression can be interpreted as an elasticity.
   A 1% increase in x is associated with an increase in y of β1 %.
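A short numerical check of the log approximation used above (the β1 values below are made up for illustration): in a log-level model the exact percentage effect of a 1-unit increase in x is 100·(exp(β1) − 1)%, which is close to 100·β1% only when β1 is small.

beta1 <- c(0.01, 0.05, 0.10, 0.30)                 # hypothetical log-level coefficients
data.frame(beta1      = beta1,
           approx_pct = 100 * beta1,               # approximate % change in y
           exact_pct  = 100 * (exp(beta1) - 1))    # exact % change in y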

1.7 Multiple Regressors Case


We have data (Yi , xi1 , ..., xiK )i=1,...,N upon which we seek to fit a linear model

Yi = β1 + β2 xi2 + ... + βK xiK + εi

where the first regressor is set to be xi1 = 1, so as to rationalise the presence of the constant β1 in a useful
way.
As in the univariate case, the problem can be seen both from a data fitting perspective, and an econometric
perspective.
In both cases, it is useful to rewrite the model in matrix notation. Let
       
Y = (Y1, Y2, ..., Yi, ..., YN)′ ,    β = (β1, β2, ..., βk, ..., βK)′ ,    ε = (ε1, ε2, ..., εi, ..., εN)′ ,

    [ 1  x12  ...  x1k  ...  x1K ]
    [ 1  x22  ...  x2k  ...  x2K ]
    [ ⋮    ⋮          ⋮         ⋮  ]
X = [ 1  xi2  ...  xik  ...  xiK ]
    [ ⋮    ⋮          ⋮         ⋮  ]
    [ 1  xN2  ...  xNk  ...  xNK ]

so that we can write


Y = Xβ + ε
where we can easily check that Y and ε are (N × 1) vectors, X is an (N × K) matrix, and β is a (K × 1)
vector.
Also in this case, the problem can be seen from a data fitting perspective, and an econometric perspective.

1.8 Data fitting problem


Given the vector ε, notice that
Σ_{i=1}^{N} εi² = ε′ε

therefore, the data fitting problem can be stated as follows

min_β ε′ε  ⇐⇒  min_β (Y − Xβ)′(Y − Xβ)  ⇐⇒  min_β (Y′ − β′X′)(Y − Xβ)

           ⇐⇒  min_β Y′Y − β′X′Y − Y′Xβ + β′X′Xβ  ⇐⇒  min_β Y′Y − 2 β′X′Y + β′X′Xβ

where in the last step we exploit that Y′(Xβ) is a scalar, so it equals its own transpose: Y′(Xβ) = (Y′Xβ)′ = β′X′Y.
First order conditions are
−2 X′Y + 2 X′X β̂ = 0  ⇐⇒  X′X β̂ = X′Y  ⇐⇒  (X′X)⁻¹ X′X β̂ = (X′X)⁻¹ X′Y

therefore,

β̂ = (X′X)⁻¹ X′Y
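A minimal sketch (simulated data, illustrative only) of three equivalent ways to obtain β̂ from the normal equations X′Xβ̂ = X′Y: explicitly inverting X′X is the textbook route, while solving the linear system, or using a QR decomposition as lm() does internally, is numerically preferable.

set.seed(3)
N <- 80
X <- cbind(1, rnorm(N), runif(N))                 # constant plus two regressors
Y <- X %*% c(1, 2, -1) + rnorm(N)

beta_inv <- solve(t(X) %*% X) %*% t(X) %*% Y      # textbook formula (X'X)^{-1} X'Y
beta_sys <- solve(crossprod(X), crossprod(X, Y))  # solve the system X'X beta = X'Y
beta_qr  <- qr.solve(X, Y)                        # least squares via a QR decomposition
cbind(beta_inv, beta_sys, beta_qr)                # the three columns should coincide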

1.9 Econometric perspective


From an econometric perspective, we have our usual exogeneity assumption
E(ε | X) = E(ε) = 0
which implies in turn
E(X′ε) = 0

Imposing this via its sample analogue, we have

(1/N) X′ε = 0  ⇐⇒  X′ε = 0  ⇐⇒  X′(Y − Xβ̂) = 0  ⇐⇒  X′Y − (X′X)β̂ = 0  ⇐⇒  (X′X)β̂ = X′Y  ⇐⇒  β̂ = (X′X)⁻¹ X′Y
Notice that in both cases we need a condition/assumption to ensure the existence of the OLS estimator: that the matrix (X′X)⁻¹ exists. This requires in turn that rank(X′X) = K; this is the "no collinearity" assumption: if two or more regressors in X are perfectly correlated (they are "collinear"), then the condition is not satisfied and we cannot compute the OLS estimator.
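A minimal sketch (simulated data, illustrative only) of what goes wrong under perfect collinearity:

set.seed(5)
N  <- 60
x1 <- rnorm(N)
x2 <- 2*x1                                  # perfectly collinear with x1
Y  <- 1 + x1 + rnorm(N)

X <- cbind(1, x1, x2)
qr(t(X) %*% X)$rank                         # rank is 2, not K = 3
try(solve(t(X) %*% X))                      # fails: the matrix is computationally singular
coef(lm(Y ~ x1 + x2))                       # lm() reports NA for the redundant regressor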

1.9.1 OLS variance-covariance matrix and standard errors


Now consider the full linear model, also under the exogeneity, homoskedasticity, and no-correlation assumptions:

Y = Xβ + ε
E(ε | X) = E(ε) = 0
V(ε | X) = σ² I
The steps to derive the OLS variance-covariance matrix are the same as in the one-regressor case, with the only difference that here X has generic dimension (N × K) instead of (N × 2).
Recall that we derived

V(β̂ | X) = σ² (X′X)⁻¹
and, given that σ² is unobserved and estimated by

σ̂² = (1/(N − K)) Σ_{i=1}^{N} ε̂i² = (1/(N − K)) ε̂′ε̂

where ε̂ = Y − Xβ̂ = Y − X(X′X)⁻¹X′Y, the estimated variance-covariance matrix of the OLS estimator is

V̂(β̂ | X) = σ̂² (X′X)⁻¹
Elements on the diagonal are estimated variances, and their square roots are the estimated standard errors:

            [ var(β̂1 | X)        cov(β̂1, β̂2 | X)   ...   cov(β̂1, β̂K | X) ]
            [ cov(β̂1, β̂2 | X)    var(β̂2 | X)       ...   cov(β̂2, β̂K | X) ]
V̂(β̂ | X) =  [ ...                ...               ...   ...              ]
            [ cov(β̂1, β̂K | X)    cov(β̂2, β̂K | X)   ...   var(β̂K | X)      ]

SE(β̂j) = sqrt( var(β̂j | X) ) ,    j = 1, 2, ..., K
2 Applied R example

######################## UNIVARIATE REGRESSIONS ########################


######################## Y = a + b*x + u ########################

# Clear the workspace


rm(list=ls())

# Load the data


library(foreign)
wage2 <- read.dta("wage2.dta")

# we want to fit a simple model from our wage data, explaining how education
# affects wages:

# wage = a + b*education + u

# we begin by choosing the right variables


colnames(wage2) # gives colnames of the data frame (= variables in the dataset)
## [1] "wage" "hours" "IQ" "KWW" "educ" "exper" "tenure"
## [8] "age" "married" "black" "south" "urban" "sibs" "brthord"
## [15] "meduc" "feduc" "lwage"
# on the console
class(wage2$wage)
## [1] "numeric"
class(wage2$educ)
## [1] "numeric"
# manual calculation
b_hat <- cov(wage2$wage,wage2$educ)/var(wage2$educ)
a_hat <- mean(wage2$wage) - b_hat*mean(wage2$educ)

# If we know we will be working with the same data frame object for a while,
# it is convenient to use "attach" to ease coding
attach(wage2)
b_hat <- cov(wage,educ)/var(educ)
a_hat <- mean(wage) - b_hat*mean(educ)
detach(wage2)

# R command for fitting linear models


(wage_reg <- lm(wage ~ educ, data=wage2))
##
## Call:
## lm(formula = wage ~ educ, data = wage2)
##
## Coefficients:
## (Intercept) educ
## 146.95 60.21

y_hat <- fitted(wage_reg) # fitted values: y_hat = a_hat + b_hat*x
u_hat <- resid(wage_reg) # residuals: u_hat = y - a_hat - b_hat*x = y - y_hat

# a visual comparison of y, y_hat, and u_hat may enhance understanding
attach(wage2)
y <- data.frame(wage,y_hat,u_hat)
detach(wage2)

# Recall that OLS imposes zero covariance between u and x ...


cov(wage2$educ,u_hat)

## [1] -3.486193e-13

# ... and zero mean for the error term


(u_bar <- mean(wage2$wage - a_hat - b_hat*wage2$educ))

## [1] -4.948877e-14

######################### GOODNESS OF FIT ############################

TSS <- (length(wage2$wage)-1)*var(wage2$wage)


ESS <- (length(wage2$wage)-1)*var(y_hat)
RSS <- (length(wage2$wage)-1)*var(u_hat)

(R2 <- ESS / TSS)

## [1] 0.1070001

1- RSS/TSS #same

## [1] 0.1070001

cor(wage2$wage,y_hat)^2 #same

## [1] 0.1070001

# REMEMBER: we don't really care too much about the R^2 in many research scenarios.
# There can be "good" econometric models that have a very low R^2,
# as well as "bad" econometric models that have very high R^2.
# The R^2 is more related to data fitting than econometrics!

# STANDARD ERRORS OF THE OLS COEFFICIENTS


N <- length(wage2$wage)
sigma_hat <- sd(u_hat)*sqrt((N-1)/(N-2)) # dof correction: sd() divides by N-1,
# but sigma_hat must divide the sum of squared residuals by N-2
ssx <- (N-1)*var(wage2$educ)

# standard error of beta_0


(se0 <- sigma_hat * sqrt(mean(wage2$educ^2)) / sqrt(ssx))

## [1] 77.71496

# standard error of beta_1


(se1 <- sigma_hat / sqrt(ssx))

## [1] 5.694982

# All of this is good for a thorough understanding of what we are doing when we
# perform regression analysis. But it would not be efficient to compute all of
# this piece by piece every time. R reports all of it directly when we run
# summary() on the fitted lm object
summary(lm(wage ~ educ, data=wage2))

##
## Call:
## lm(formula = wage ~ educ, data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -877.38 -268.63 -38.38 207.05 2148.26
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 146.952 77.715 1.891 0.0589 .
## educ 60.214 5.695 10.573 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 382.3 on 933 degrees of freedom
## Multiple R-squared: 0.107,Adjusted R-squared: 0.106
## F-statistic: 111.8 on 1 and 933 DF, p-value: < 2.2e-16

### INTERPRETATION OF COEFFICIENTS


# level-level
summary(lm(wage ~ educ, data=wage2))

##
## Call:
## lm(formula = wage ~ educ, data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -877.38 -268.63 -38.38 207.05 2148.26
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 146.952 77.715 1.891 0.0589 .
## educ 60.214 5.695 10.573 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 382.3 on 933 degrees of freedom
## Multiple R-squared: 0.107,Adjusted R-squared: 0.106
## F-statistic: 111.8 on 1 and 933 DF, p-value: < 2.2e-16

# log-level
wage2$logwage <- log(wage2$wage)
summary(lm(logwage ~ educ, data=wage2))

##
## Call:
## lm(formula = logwage ~ educ, data = wage2)

##
## Residuals:
## Min 1Q Median 3Q Max
## -1.94620 -0.24832 0.03507 0.27440 1.28106
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.973062 0.081374 73.40 <2e-16 ***
## educ 0.059839 0.005963 10.04 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4003 on 933 degrees of freedom
## Multiple R-squared: 0.09742,Adjusted R-squared: 0.09645
## F-statistic: 100.7 on 1 and 933 DF, p-value: < 2.2e-16

# level-log
wage2$logeduc <- log(wage2$educ)
summary(lm(wage ~ logeduc, data=wage2))

##
## Call:
## lm(formula = wage ~ logeduc, data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -878.29 -262.41 -37.67 204.53 2138.72
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1185.5 203.5 -5.826 7.82e-09 ***
## logeduc 828.4 78.5 10.553 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 382.4 on 933 degrees of freedom
## Multiple R-squared: 0.1066,Adjusted R-squared: 0.1057
## F-statistic: 111.4 on 1 and 933 DF, p-value: < 2.2e-16

# log-log
wage2$logwage <- log(wage2$wage)
summary(lm(logwage ~ logeduc, data=wage2))

##
## Call:
## lm(formula = logwage ~ logeduc, data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.94925 -0.24818 0.03866 0.27282 1.27167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.63932 0.21297 21.78 <2e-16 ***
## logeduc 0.82694 0.08215 10.07 <2e-16 ***

## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4002 on 933 degrees of freedom
## Multiple R-squared: 0.09796,Adjusted R-squared: 0.09699
## F-statistic: 101.3 on 1 and 933 DF, p-value: < 2.2e-16

################## MULTIVARIATE REGRESSIONS #################


################## Y = b0 + b1*x1 + ... + bk*xk + u #################

# Clear the workspace


rm(list=ls())

# Load the data


library(foreign)
wage2 <- read.dta("wage1.dta") # note: this loads a different dataset (wage1.dta) into an object still named wage2
N <- length(wage2$lwage)

# Assume now we think wage is also dependent on experience and tenure, aside
# from education.
# We now fit a model with multiple regressors:
# log(wage) = b0 + b1*educ + b2*exper + b3*tenure + u
# notice that we have N = 526 observations and K = 4 regressors
# (including the constant)
summary(lm(log(wage) ~ educ + exper + tenure, data=wage2))

##
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure, data = wage2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.05802 -0.29645 -0.03265 0.28788 1.42809
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.284360 0.104190 2.729 0.00656 **
## educ 0.092029 0.007330 12.555 < 2e-16 ***
## exper 0.004121 0.001723 2.391 0.01714 *
## tenure 0.022067 0.003094 7.133 3.29e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4409 on 522 degrees of freedom
## Multiple R-squared: 0.316,Adjusted R-squared: 0.3121
## F-statistic: 80.39 on 3 and 522 DF, p-value: < 2.2e-16

# As before, the lm command makes our life way easier. To bridge this output with
# the theory studied in class, we can actually estimate the OLS parameters and
# standard errors step by step from our data

# OLS in Matrix Form


Y <- as.matrix(wage2[c("lwage")] ) # Y vector (N x 1)
X <- as.matrix(wage2[c("educ","exper","tenure")] ) # regressors (N x (K-1)); the constant column is added below

# we must add a first column of ones to the X matrix, so as to account for the
# constant b0 in beta_hat
const <- rep(1,N) # column of ones of length N, for the constant b0
X <- cbind(const,X)
XtX <- t(X)%*%X # X'X matrix (K x K): t(X) gives the transpose and %*% matrix multiplication
XtY <- t(X)%*%Y # X'Y matrix (K x 1)
invXtX <- solve(XtX) # (X'X)^{-1} matrix (K x K)

beta_hat <- invXtX%*%XtY # OLS estimator


K <- length(beta_hat) # number of estimated parameters
# you can easily check that we get the same coefficients as with R's "lm" command

Yhat <- X%*%beta_hat # vector of fitted values Y_hat = X*beta_hat


uhat <- Y - Yhat # vector of residuals

# estimated variance of the errors (under homoskedasticity)


sig2_hat <- as.numeric(t(uhat)%*%uhat / (N-K))

# estimated variance-covariance matrix of beta_hat: Vhat = sig2_hat * (X'X)^{-1}


Vhat <- as.matrix(sig2_hat * invXtX)

# this command extracts diagonal elements of Vhat (estimated variances)


var_hat <- as.matrix(diag(Vhat)) # get a K-dimensional vector of estimated variances

# standard errors of the estimated parameters


se <- sqrt(var_hat)
# you can again easily check that we get the same standard errors as with R's
# "lm" command
