
ECO375H Slides 4

The document discusses multiple regression analysis, focusing on the unbiasedness of Ordinary Least Squares Estimators (OLSE) and the advantages of using multiple regressions over simple regressions. It explains the mechanical properties of OLSE, the importance of assumptions like SLR2 for unbiasedness, and the interpretation of regression coefficients. Additionally, it covers concepts such as R-squared and adjusted R-squared in evaluating model fit.


Lecture 4: Multiple Regression Analysis:

Estimation

Junichi Suzuki
University of Toronto

October 6th, 2011


Revisiting the Unbiasedness of OLSE in SR
I OLSE is unbiased under SLR.1-SLR.4

      E(β̂1 | x) = β1 + E[ (∑_{i=1}^n (xi − x̄) ui) / (∑_{i=1}^n (xi − x̄)²) | x ]
                 = β1 + (∑_{i=1}^n (xi − x̄) E(ui | x)) / (∑_{i=1}^n (xi − x̄)²)
                 = β1 + (∑_{i=1}^n (xi − x̄) E(ui | xi)) / (∑_{i=1}^n (xi − x̄)²)
                 = β1

I The proof relies on the following equality:

      E(ui | x) = E(ui | xi),  where x = [x1, x2, . . . , xn]
I Which assumption implies this equality?
Revisiting the Unbiasedness of OLSE in SR
I Short Answer: SLR2 (random sample)

I Technically, this assumption implies

      f(x1, u1, . . . , xn, un) = ∏_{i=1}^n f(xi, ui)
      (joint density)            (product of the individual densities)

I Intuitively, under SLR2, the distribution of (xi, ui) is not
  affected by any xi′ and ui′ with i′ ≠ i

I SLR2 may not hold when a sample is taken from workers


in the same company

I See the supplementary notes for a formal proof


Chapter 3: Multiple Regression Analysis:
Estimation
Big Picture
I Multiple regression (MR): Regression with more than one
regressors

      y = β0 + β1 x1 + · · · + βk xk + u

I Simple regressions are a special case of multiple
  regressions with βi = 0 for i ≥ 2

I Again, OLSE β̂0, β̂1, . . . , β̂k minimizes the SSR

      SSR = ∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − β̂2 xi2 − · · · − β̂k xik)²

I Many properties found in simple regressions still hold


Outline

I Motivation: Advantage of using MR instead of SR

I Definition

I Mechanical properties
I Partialling Out Interpretation
I Goodness of Fit: R-squared in MR
I Adjusted R-squared

I Statistical Properties
I Unbiasedness
I Omitted Variable Bias
Motivation for Multiple Regressions
Why Do We Want To Run Multiple Regressions?

I Accurate estimates of causal effects

I Better prediction

I Flexible functional form


Example: Wage Function with More Than One
Observables

I Suppose wage is a function of (1) education, (2)


experience and (3) some other unobservable factors

wage = β0 + β1 educ + β2 exper + u

I Assume E(u | educ, exper) = 0

I Want to find an unbiased estimator for β1


World Without MR
I Suppose all we know is a simple regression

I Consider running a SR of wage on educ using the full sample


wage = β0 + β1 educ + ũ
where ũ = β2 exper + u
I To maintain the unbiasedness of the OLS estimator β̂1, we have
  to assume E(exper | educ) = 0 (so that E(ũ | educ) = 0)

I This assumption collapses if education is related to the


degree of specialization
I College graduates (CG) specialize in one profession (e.g., accountant)
I High-school graduates (HG) tend to work in various categories (e.g., cashier,
  driver, etc.)

I β̂1 could be biased


World Without MR

I A more sophisticated option is to limit the data to workers


with the same experience exper

I Run regression by using this particular sample

wage = β̃0 + β1 educ + u


where β̃0 = β0 + β2 exper

I Since E(u | educ) = 0 holds, β̂1 is unbiased

I Might need to throw away many data points, causing


large variance
World With MR

I With multiple regressions


I can obtain an unbiased estimate of β1 without giving up
a lot of data points
I can separate the impacts of education from that of
experience

I Moreover, if both education and experience affect wage
  (i.e., β1 ≠ 0 and β2 ≠ 0), R-squared always increases
(i.e., better prediction)
Another Advantage: Inserting Nonlinearity
I Multiple regressions allow us to use a more flexible
functional form

I Suppose one’s consumption is a quadratic function of


one’s income

      cons = β0 + β1 inc + β2 inc² + u

      d cons / d inc = β1 + 2 β2 inc
I MR can estimate this model

I Simple regression cannot capture this type of nonlinearity


(even if you use log!)
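To make the functional-form point concrete, here is a minimal Python sketch on simulated data (the variable names and coefficient values are hypothetical, not from the lecture): adding inc² as a second regressor lets OLS fit the quadratic relationship, and the estimated marginal effect then varies with income.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
inc = rng.uniform(10, 100, n)                    # simulated income
u = rng.normal(0, 5, n)
cons = 2.0 + 0.8 * inc - 0.002 * inc**2 + u      # hypothetical quadratic consumption function

X = np.column_stack([np.ones(n), inc, inc**2])   # regressors: constant, inc, inc^2
beta_hat, *_ = np.linalg.lstsq(X, cons, rcond=None)
print(beta_hat)                                  # estimates of (beta0, beta1, beta2)
print("marginal effect at inc=50:", beta_hat[1] + 2 * beta_hat[2] * 50)
```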
Definition of the OLSE
Definition of OLSE in MR
I Model:
      y = β0 + β1 x1 + · · · + βk xk + u
I Data consist of n data points: {yi, xi1, xi2, . . . , xik}_{i=1}^n

I xij : ith observation of jth regressor

I Want to find the values of β̂0, β̂1, . . . , β̂k that minimize
  the SSR

      min_{β̃0, β̃1, . . . , β̃k}  ∑_{i=1}^n (yi − β̃0 − β̃1 xi1 − · · · − β̃k xik)²
      (the objective function is the SSR)

I OLS estimators β̂0, β̂1, . . . , β̂k are the solution of this
  optimization problem
How to Calculate OLSE in MR

I As before, the problem is a mere minimization problem

I The solution β̂0 , β̂1 , . . . , β̂k must satisfy the following


necessary conditions:

      (1 equation)   ∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik) = 0

      (k equations)  ∑_{i=1}^n xi1 (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik) = 0
                     ...
                     ∑_{i=1}^n xik (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik) = 0

I Find (k + 1) unknowns from a system of (k + 1) linear


equations
How to Calculate OLSE in MR

I In SR, we had a simple expression of OLSE

      β̂1 = ∑_{i=1}^n (yi − ȳ)(xi − x̄) / ∑_{i=1}^n (xi − x̄)²
      β̂0 = ȳ − β̂1 x̄

I In MR, the solution is too complicated to write down


without using matrices
      Matrix representation of OLSE: β̂ = (X′X)⁻¹ X′y

I Manual calculation becomes a mess but computers can


do this very well
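A minimal numpy sketch of this formula on simulated data (all variable names and values here are illustrative): the OLSE is obtained by solving the normal equations (X′X) β̂ = X′y; in practice a least-squares solver is preferred to forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # constant + k regressors
beta = np.array([1.0, 0.5, -2.0, 0.3])                      # hypothetical true coefficients
y = X @ beta + rng.normal(size=n)

# OLS via the normal equations: solve (X'X) b = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                                             # close to beta
```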
Mechanical Properties of OLSE
Mechanical Properties of OLSE
I Many mechanical properties in SR have their counterparts
in MR

I De…ne predicted values and residuals:


      (predicted value / SRF)  ŷi = β̂0 + β̂1 xi1 + · · · + β̂k xik
      (residual)               ûi = yi − ŷi
I (1) The sample average of the residuals is zero

      ∑_{i=1}^n ûi = 0

I (2) The sample covariance between xj and û is zero

      ∑_{i=1}^n ûi xij = 0  for all j ∈ {1, . . . , k}
Mechanical Properties of OLSE
I (3) The sample regression function (SRF) goes through
  the point of means (x̄1, . . . , x̄k, ȳ):

      ȳ = β̂0 + β̂1 x̄1 + · · · + β̂k x̄k

I Proof:

      ȳ = (1/n) ∑_{i=1}^n yi
        = (1/n) ∑_{i=1}^n (ŷi + ûi)
        = (1/n) ∑_{i=1}^n ŷi + (1/n) ∑_{i=1}^n ûi
        = β̂0 + β̂1 x̄1 + · · · + β̂k x̄k     (the residual term vanishes by property (1))
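These three properties can be checked numerically for any OLS fit. A small self-contained sketch on simulated data (illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # constant + two regressors
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat                                      # residuals

print(np.isclose(u_hat.sum(), 0.0))                           # (1) residuals average to zero
print(np.allclose(X[:, 1:].T @ u_hat, 0.0))                   # (2) zero sample covariance with each regressor
print(np.isclose(y.mean(), X.mean(axis=0) @ beta_hat))        # (3) SRF passes through the means
```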
Basic Interpretation of MR

I The constant term β̂0 represents the predicted value of y
  when x1 = · · · = xk = 0

I β̂1 represents the predicted marginal impact of x1 on y
  when all other regressors (i.e., xj for j > 1) do not change

      ∂y/∂x1 = β1
Example: Wage Equation

I Regression of ln w on education(educ) and


experience(exper) generates

ln w = .284 + .092 educ + .0041 exper

I Holding experience fixed, one more year of education
  increases one's wage by about 9.2%

I Holding education fixed, one more year of experience
  increases one's wage by about 0.41%
Partialling Out Interpretation of MR

I Can we always distinguish the effects of educ from those of
  exper in the data?

I Short answer: No

I Partialling out interpretation of MR is helpful to make


this point clear
Partialling Out Interpretation of MR
I The Model
      yi = β0 + β1 xi1 + β2 xi2 + · · · + βk xik + ui
I Goal: Obtain the MR OLSE β̂1 as an OLSE of a SR
I Steps
  1. Regress x1 on x2, . . . , xk and calculate the residual r̂1

        xi1 = α0 + α2 xi2 + · · · + αk xik + ri1
        r̂i1 = xi1 − x̂i1

  2. Regress y on r̂1

        yi = λ0 + λ1 r̂i1 + ei

  3. The resulting slope estimate λ̂1 is always equal to β̂1

        β̂1 = λ̂1 = ∑_{i=1}^n (yi − ȳ)(r̂i1 − 0) / ∑_{i=1}^n (r̂i1 − 0)²
                 = ∑_{i=1}^n yi r̂i1 / ∑_{i=1}^n r̂i1²
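A short simulation (hypothetical data-generating process) confirming that the two-step partialling-out procedure reproduces the multiple-regression β̂1 exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)           # x1 correlated with x2
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

def ols(X, y):
    return np.linalg.solve(X.T @ X, X.T @ y)

# Multiple regression of y on (1, x1, x2)
b_mr = ols(np.column_stack([np.ones(n), x1, x2]), y)

# Step 1: regress x1 on (1, x2) and keep the residual r1_hat
a = ols(np.column_stack([np.ones(n), x2]), x1)
r1_hat = x1 - np.column_stack([np.ones(n), x2]) @ a

# Step 2: regress y on r1_hat (r1_hat has mean zero, so the slope is sum(y*r)/sum(r^2))
lam1 = (r1_hat @ y) / (r1_hat @ r1_hat)

print(b_mr[1], lam1)    # the two estimates of beta_1 coincide
```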
Intuition Behind

I r̂1 : the variation in xi1 that is not explained by the
  variation in (xi2, xi3, . . . , xik)

I β̂1 is derived by a regression of y on r̂1, the variation that
  is unique to x1 (i.e., zero sample covariance with xj for j > 1)

      (1/n) ∑_{i=1}^n r̂i1 xij = 0  for all j ∈ {2, . . . , k}
I The variation that is common to both x1 and x2 is not
useful to distinguish β1 from β2
Example: No Unique Variation

I Wage function among college graduates

ln w = β0 + β1 age + β2 exper + u

I Assume
      exper = age − 22
I Can't estimate β1 since age has NO unique variation
I Regressing age on exper generates a perfect fit (i.e.,
  r̂i1 = 0 for all i)
I Can't regress ln w on r̂1 as ∑_{i=1}^n r̂i1² = 0
Example: No Unique Variation
I What does it mean that we cannot run this regression?

      min_{β̃0, β̃1, β̃2} ∑_{i=1}^n (yi − β̃0 − β̃1 agei − β̃2 (agei − 22))²
                                                (agei − 22 = experi)
    = min_{β̃0, β̃1, β̃2} ∑_{i=1}^n (yi − (β̃0 − 22 β̃2) − (β̃1 + β̃2) agei)²
                                    (α̃0 = β̃0 − 22 β̃2,  α̃1 = β̃1 + β̃2)
    = min_{α̃0, α̃1} ∑_{i=1}^n (yi − α̃0 − α̃1 agei)²

I The only thing we can hope to estimate from the data is (α̂0, α̂1)

I An infinite number of combinations of (β̂0, β̂1, β̂2) generate
  the same (α̂0, α̂1)
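A small numerical illustration of this identification failure (simulated data, assuming every worker starts at age 22 as in the example): with exper = age − 22 the regressor matrix loses full column rank, so X′X is singular and the normal equations have no unique solution.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
age = rng.integers(23, 60, n).astype(float)
exper = age - 22.0                                  # perfectly collinear with age
lnw = 1.0 + 0.05 * age + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), age, exper])
print(np.linalg.matrix_rank(X))                     # rank 2 < 3 columns
print(np.linalg.cond(X.T @ X))                      # enormous condition number: X'X is (numerically) singular
# Solving the normal equations here would fail or be meaningless;
# only alpha0 = b0 - 22*b2 and alpha1 = b1 + b2 are identified.
```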
Goodness of Fit
I As in the case of SR, we can define SST, SSE and SSR

      SST = ∑_{i=1}^n (yi − ȳ)²
      SSE = ∑_{i=1}^n (ŷi − ȳ)²
      SSR = ∑_{i=1}^n (yi − ŷi)²

I By following the same logic as before,

      SST = SSE + SSR

I Can also define R-squared:

      R² = SSE / SST = 1 − SSR / SST
I R-squared always falls between 0 and 1
R-squared in MR
I R-squared never decreases when more regressors are
added
I Consider two regressions
      (1) ln w = β0 + β1 educ + ũ
      (2) ln w = β0 + β1 educ + β2 exper + u
      where ũ = β2 exper + u

I SSRs are defined as

      SSR1 = min_{β̃0, β̃1}      ∑_{i=1}^n (ln wi − β̃0 − β̃1 educi − 0 · experi)²
      SSR2 = min_{β̃0, β̃1, β̃2}  ∑_{i=1}^n (ln wi − β̃0 − β̃1 educi − β̃2 experi)²

I The second problem has more freedom than the first, so

      SSR1 ≥ SSR2
Adjusted R-squared

I It might be unfair to compare a regression with fewer
  regressors to one with more regressors by their R²

I Adjusted R-squared penalizes the high R-squared brought by
  many regressors in a particular way

      AdjR² = 1 − [SSR / (n − k − 1)] / [SST / (n − 1)]
            = 1 − (SSR / SST) · (n − 1) / (n − k − 1)
      where SSR/SST ∈ [0, 1] and (n − 1)/(n − k − 1) > 1

I It can go below zero (but never above one)

I STATA reports this statistic as one of the default outputs
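A short sketch computing both statistics from the formula above on simulated data (variable names and coefficients are hypothetical): adding a pure-noise regressor never lowers R², but it can lower the adjusted version.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
educ = rng.normal(12, 2, n)
lnw = 0.5 + 0.09 * educ + rng.normal(scale=0.4, size=n)
noise = rng.normal(size=n)                      # irrelevant regressor

def r2_and_adj(X, y):
    n_obs = len(y)
    k = X.shape[1] - 1                          # number of slope coefficients
    b = np.linalg.solve(X.T @ X, X.T @ y)
    ssr = np.sum((y - X @ b) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ssr / sst
    adj = 1 - (ssr / (n_obs - k - 1)) / (sst / (n_obs - 1))
    return r2, adj

print(r2_and_adj(np.column_stack([np.ones(n), educ]), lnw))
print(r2_and_adj(np.column_stack([np.ones(n), educ, noise]), lnw))  # R2 rises, adjusted R2 may fall
```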


Statistical Properties of OLSE
Gauss-Markov Assumptions

MLR.1 (Linear in parameters):


y = β0 + β1 x1 + β2 x2 + · · · + βk xk + u

MLR.2 (Random Sampling):


{(xi1, xi2, . . . , xik, yi)}_{i=1}^n is a random sample
from the population model

MLR.3 (No perfect Collinearity):


In the sample, none of the regressors is constant,
and there are no exact linear relationships among them

MLR.4 (Zero Conditional Mean): E(u | x1, x2, . . . , xk) = 0

MLR.5 (Homoskedasticity): Var(u | x1, x2, . . . , xk) = σ²


What MLR3 Excludes
I Perfect linear relationships among regressors in the sample
      x1 = α0 + ∑_{j=2}^k αj xj

I Case 1: Perfect collinearity in the sample but not in the


population
I Happen to draw a sample with no variation
I More likely to occur when sample size is small
I e.g., All data points in the sample happen to be college
graduates (while not in population)

I Case 2: Perfect collinearity in the population (and hence


in the sample)
I Linear relationship between regressors by construction
Example: Perfect Collinearity in the Population

I Representing the same number in different units

  I Income in dollars (inc) and in thousand dollars (inck):
    inc = 1000 · inck

I Two variables with a linear relationship by construction

  I Age and experience: exper = age − 22

  I Using both the share of male workers (shm) and the
    share of female workers (shf): shm = 1 − shf
Unbiasedness of OLSE
I Under MLR.1 to MLR.4, OLSE is unbiased

      E(β̂j) = βj  for all j ∈ {0, . . . , k}

I Consider the case j = 1 and k = 2 without loss of
  generality
I Use the partialling out interpretation

      β̂1 = ∑_{i=1}^n r̂i1 yi / ∑_{i=1}^n r̂i1²

I Note that

      xi1 = α̂0 + α̂2 xi2 + r̂i1
      ∑_{i=1}^n r̂i1 = 0,   ∑_{i=1}^n r̂i1 xi2 = 0
      ∑_{i=1}^n r̂i1 xi1 = ∑_{i=1}^n r̂i1 (α̂0 + α̂2 xi2 + r̂i1) = ∑_{i=1}^n r̂i1²
Proving Unbiasedness of OLSE (j=1 and k=2)

      E(β̂1 | x) = E[ ∑_{i=1}^n r̂i1 yi / ∑_{i=1}^n r̂i1² | x ]
                 = E[ ∑_{i=1}^n r̂i1 (β0 + β1 xi1 + β2 xi2 + ui) / ∑_{i=1}^n r̂i1² | x ]
                 = [ β0 (∑_{i=1}^n r̂i1) + β1 (∑_{i=1}^n r̂i1 xi1) + β2 (∑_{i=1}^n r̂i1 xi2) + ∑_{i=1}^n r̂i1 E(ui | x) ] / ∑_{i=1}^n r̂i1²
                 = β1 ∑_{i=1}^n r̂i1² / ∑_{i=1}^n r̂i1²     (using E(ui | x) = 0 and the three facts above)
                 = β1
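A quick Monte Carlo sketch (a hypothetical data-generating process chosen to satisfy MLR.1-MLR.4) illustrating the result: the average of β̂1 across repeated samples is close to the true β1.

```python
import numpy as np

rng = np.random.default_rng(6)
beta = np.array([1.0, 2.0, -1.0])          # true (beta0, beta1, beta2)
n, reps = 100, 2000
estimates = np.empty(reps)

for r in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)     # regressors may be correlated
    u = rng.normal(size=n)                 # E(u | x1, x2) = 0 holds by construction
    y = beta[0] + beta[1] * x1 + beta[2] * x2 + u
    X = np.column_stack([np.ones(n), x1, x2])
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)[1]

print(estimates.mean())                    # close to beta1 = 2.0
```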
Choosing Regressors

I Which variables should we use as regressors?

I Two situations:
I Case 1: Include variables that are irrelevant
i.e., βj = 0

I Case 2: Relevant variables (i.e., βj ≠ 0) are not
  observable (hence can't be included)

I Need to know the cost of running misspecified regressions

I Is OLSE still unbiased?


Case 1: Including Irrelevant Variables
I True model: educ and exper affect w but train doesn't

      ln w = β0 + β1 educ + β2 exper + 0 · train + u
      E(u | educ, exper, train) = 0

I Consider two regressions:

      ln w = β̂0 + β̂1 educ + β̂2 exper + β̂3 train
      ln w = β̃0 + β̃1 educ + β̃2 exper

I In general, β̂0 ≠ β̃0, β̂1 ≠ β̃1, β̂2 ≠ β̃2, β̂3 ≠ 0
I However, both of them are unbiased estimators, as
  E(u | educ, exper, train) = 0 and hence E(u | educ, exper) = 0

      E(β̂i) = E(β̃i) = βi  for i ∈ {0, 1, 2}
      E(β̂3) = 0
Case 2: Not Including Relevant Variables

I Called omitted variables

I One of the most challenging problems in econometrics

I Consider two cases:


I Simple case (k = 2)

I General case (k ≥ 3): next week


Omitted Variable Bias: Simple Case (k=2)
I Consider wage function

ln w = β0 + β1 educ + β2 ability + u
E(u | educ, ability) = 0

I If ability were observable, the regression of ln w on educ
  and ability would yield unbiased estimators

      ln ŵ = β̂0 + β̂1 educ + β̂2 ability

I Since ability is not observable, all we can do is regress
  ln w on educ

      ln w̃ = β̃0 + β̃1 educ

I In general, β̂0 ≠ β̃0 and β̂1 ≠ β̃1

I But can we expect E(β̃i) = βi for i ∈ {0, 1}?


Omitted Variable Bias: Simple Case (k=2)

I Short answer: No, except in special cases

I Steps
I Derive

      β̃1 = β̂1 + β̂2 δ̃1
            (β̂1 is the unbiased estimate from the long regression)

  where δ̃1 is the OLSE of a regression of ability on educ

      ability = δ0 + δ1 educ + r1

I Show E(β̂2 δ̃1 | x) ≠ 0 in general


Omitted Variable Bias
I OLSE is biased when we omit some variables from a
  regression

      E(β̃1 | x) = E(β̂1 + β̂2 δ̃1 | x)
                 = E(β̂1 | x) + E(β̂2 | x) δ̃1
                 = β1 + β2 δ̃1      (the term β2 δ̃1 is the bias)

I When k = 2, it is easy to evaluate the direction of the bias

                   Cov(x1, x2) > 0    Cov(x1, x2) < 0
      β2 > 0             +                  −
      β2 < 0             −                  +

I Example: β2 > 0 and Cov(educ, ability) > 0
      =>  E(β̃1 | x) > β1   (upward bias)
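A short simulation of the bias formula (ability is generated here only for the simulation; the estimator sees only educ, and all parameter values are hypothetical): with β2 > 0 and Cov(educ, ability) > 0, the short regression's slope exceeds the true β1 by roughly β2 δ̃1.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 200, 2000
short_slopes = np.empty(reps)

for r in range(reps):
    ability = rng.normal(size=n)
    educ = 12 + 2 * ability + rng.normal(size=n)     # Cov(educ, ability) > 0
    lnw = 0.5 + 0.08 * educ + 0.10 * ability + rng.normal(scale=0.3, size=n)  # beta2 = 0.10 > 0
    X_short = np.column_stack([np.ones(n), educ])    # ability omitted
    short_slopes[r] = np.linalg.solve(X_short.T @ X_short, X_short.T @ lnw)[1]

print(short_slopes.mean())   # noticeably above the true beta1 = 0.08 (upward bias)
```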
Graphical Image of Omitted Variable Bias
Proof (k=2)
I Let y = ln w, x1 = educ, x2 = ability

      β̃1 = ∑_{i=1}^n (yi − ȳ)(xi1 − x̄1) / ∑_{i=1}^n (xi1 − x̄1)²
         = ∑_{i=1}^n yi (xi1 − x̄1) / ∑_{i=1}^n (xi1 − x̄1)²
         = ∑_{i=1}^n (β̂0 + β̂1 xi1 + β̂2 xi2 + ûi)(xi1 − x̄1) / ∑_{i=1}^n (xi1 − x̄1)²
         = [ β̂1 ∑_{i=1}^n xi1 (xi1 − x̄1) + β̂2 ∑_{i=1}^n xi2 (xi1 − x̄1) ] / ∑_{i=1}^n (xi1 − x̄1)²
           (the β̂0 and ûi terms drop out since ∑_{i=1}^n (xi1 − x̄1) = 0 and ∑_{i=1}^n ûi (xi1 − x̄1) = 0)
         = β̂1 + β̂2 ∑_{i=1}^n (xi2 − x̄2)(xi1 − x̄1) / ∑_{i=1}^n (xi1 − x̄1)²
         = β̂1 + β̂2 δ̃1
Summary

I Studied the very basics of multiple regressions

I Partialling out interpretations demonstrate that unique


variation is necessary

I Gauss-Markov assumptions 1-4 (but not 5) guarantee


unbiasedness

I Omitted variables often lead to biased estimators
