
Lecture 3: The Simple and Multiple Regression Models

Junichi Suzuki
University of Toronto

September 29th, 2011


Outline

- R-squared revisited

- OLSE as Random Variables

- Introduction to Multiple Regressions


A Note of Caution

- Those who took ECO220 need to be aware of a notation change:

      Quantity                      ECO220                         ECO375
      ∑_{i=1}^n (yi − ŷi)²          Sum of Sq. Errors (SSE)        Sum of Sq. Residuals (SSR)
      ∑_{i=1}^n (ŷi − ȳ)²           Sum of Sq. Regression (SSR)    Explained Sum of Sq. (SSE)

R-squared Revisited

- R-squared:

      R² = SSE / SST = 1 − SSR / SST

  where SST = ∑_{i=1}^n (yi − ȳ)²

- Properties
  - Always between 0 and 1
  - Increases when SSR decreases for a given SST
  - Well-defined only if SST > 0 (i.e., requires some variation in yi)

R-squared Revisited

- Consider two straight lines:
  - the horizontal line that best fits the data:

        min_{β̃0} ∑_{i=1}^n (yi − β̃0)²

    whose solution is β̃0 = ȳ, so the minimized value is ∑_{i=1}^n (yi − ȳ)² = SST

  - the (not necessarily horizontal) straight line that best fits the data:

        min_{β̃0, β̃1} ∑_{i=1}^n (yi − β̃0 − β̃1 xi)² = SSR

- R-squared measures the extent to which allowing a nonzero slope improves the fit, as the sketch below illustrates
  - SSR = SST (i.e., no improvement) => R² = 0
  - SSR = 0 (i.e., perfect fit) => R² = 1
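
To make the comparison concrete, here is a minimal Python sketch (NumPy only; the toy numbers are made up, not from the course) that fits both lines and confirms R² = 1 − SSR/SST:

```python
import numpy as np

# Hypothetical data with an upward trend
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.1, 4.8, 5.9, 7.2, 8.1])

# Best horizontal line: the minimizer is y.mean(), and the minimized value is SST
sst = np.sum((y - y.mean()) ** 2)

# Best (not necessarily horizontal) line: OLS fit, whose minimized value is SSR
b1, b0 = np.polyfit(x, y, 1)            # slope, intercept
ssr = np.sum((y - (b0 + b1 * x)) ** 2)

r2 = 1.0 - ssr / sst
print(f"SST={sst:.3f}  SSR={ssr:.3f}  R^2={r2:.3f}")
```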


OLS Estimators as Random Variables

Outline

- Statistical View on a Dataset

- OLS Estimators as Random Variables

- Statistical Properties of OLSE
  - Assumptions
  - Expected Value
  - Variance
  - Standard Deviation

Statistical View on a Dataset

- Goal: learn about the population by looking at a realized sample

- Challenge
  - A realized sample may or may not resemble the population
  - Any statement about the population based on a realized sample is subject to errors (sampling errors)

- Important to evaluate the size of such a risk

- The lower the risk, the more reliable the estimator


Example

- Population: workers in the population

- A realized sample: 526 workers in Table 1.1

- Goal: learn about the wage impacts of education in the population from a realized sample

- Examples of possible discrepancy
  - Fractions of college graduates
  - The average wage

OLSE as Random Variables

- The OLSE is a function of a sample of size n, {xi, yi}_{i=1}^n:

      β̂1 = ∑_{i=1}^n (xi − x̄)(yi − ȳ) / ∑_{i=1}^n (xi − x̄)²
      β̂0 = ȳ − β̂1 x̄

- The OLSE are random variables because {xi, yi}_{i=1}^n is random

- Can define their distributions, expected values and variances

- Can quantify the risk of using β̂0 and β̂1 instead of β0 and β1
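
A minimal Python sketch of these two formulas (the wage and education numbers below are invented for illustration, not the Table 1.1 sample):

```python
import numpy as np

def ols_simple(x, y):
    """Closed-form OLS estimators for the simple regression y = b0 + b1*x + u."""
    b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0_hat = y.mean() - b1_hat * x.mean()
    return b0_hat, b1_hat

educ = np.array([10.0, 12.0, 12.0, 14.0, 16.0, 16.0, 18.0])   # hypothetical years of education
wage = np.array([8.5, 10.2, 9.8, 12.0, 15.1, 14.7, 18.3])     # hypothetical hourly wages
print(ols_simple(educ, wage))
```
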
Key Assumptions

SLR.1 (Linear in parameters): y = β0 + β1 x + u

SLR.2 (Random Sampling): {(xi, yi)}_{i=1}^n is a random sample from the population

SLR.3 (Sample Variation in xi): {xi}_{i=1}^n are not all the same value

SLR.4 (Zero Conditional Mean): E(u | x) = 0

SLR.5 (Homoskedasticity): Var(u | x) = σ²
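
For intuition, here is a minimal Python sketch (the parameter values are arbitrary assumptions, not from the course) of a data-generating process that satisfies SLR.1 through SLR.5:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1, sigma = 200, 1.0, 0.5, 2.0

x = rng.uniform(8, 18, size=n)    # SLR.2/SLR.3: random draws with variation in x
u = rng.normal(0, sigma, size=n)  # SLR.4/SLR.5: u independent of x, constant variance
y = beta0 + beta1 * x + u         # SLR.1: linear in parameters
```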


These Assumptions Do Not Hold When ....

- (SLR.1) The function is nonlinear in parameters:

      y = β0 + x^β1 + u

- (SLR.2) We pick workers so that yi is always less than $100K

- (SLR.3) Everyone has the same education (e.g., HG):

      xi = 12 for all i

These Assumptions Do Not Hold When ....

- (SLR.4) Those with high IQ (i.e., u) are more likely to go to college:

      E(u | x = 12) = 80 and E(u | x = 16) = 120

- (SLR.5) The variation of IQ among CG is larger than that among HG:

      Var(u | x = 12) = 10 and Var(u | x = 16) = 15

- Notation
  - CG: college graduates
  - HG: high school graduates

Unbiasedness of OLS Estimators

- Conditional on x = {x1, x2, . . . , xn}, the OLS estimators are unbiased:

      E(β̂0 | x) = β0
      E(β̂1 | x) = β1

- The law of iterated expectations (CE.4, p. 735) implies that the OLS estimators are unbiased unconditionally as well:

      E(β̂0) = E[E(β̂0 | x)] = β0
      E(β̂1) = E[E(β̂1 | x)] = β1
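
Because unbiasedness is a statement about the average of the estimator across repeated samples, a small Monte Carlo sketch makes it concrete (the parameter values below are arbitrary assumptions, not from the text): the mean of β̂1 over many simulated samples should be close to β1, even though any single draw can be far from it.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma, n, reps = 1.0, 0.5, 2.0, 100, 5000

b1_draws = np.empty(reps)
for r in range(reps):
    x = rng.uniform(8, 18, size=n)
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    b1_draws[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(b1_draws.mean())   # close to beta1 = 0.5 on average; individual draws scatter around it
```
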
Intuitions Behind

- Unbiasedness merely means the expected value of the estimator equals the true parameter

- It says nothing about the estimate obtained from a particular realized sample

- A realized pair (β̂0, β̂1) can be very different from (β0, β1)

- We cannot tell whether that is the case for our sample

[Figure: Unbiased and Biased Estimators]

Proofs

- Fact (see the supplementary note for the derivation):

      β̂1 = β1 + ∑_{i=1}^n (xi − x̄) ui / ∑_{i=1}^n (xi − x̄)²

- Unbiasedness of β̂1 conditional on x:

      E(β̂1 | x) = β1 + E[ ∑_{i=1}^n (xi − x̄) ui / ∑_{i=1}^n (xi − x̄)² | x ]
                 = β1 + ∑_{i=1}^n (xi − x̄) E(ui | x) / ∑_{i=1}^n (xi − x̄)²
                 = β1 + ∑_{i=1}^n (xi − x̄) E(ui | xi) / ∑_{i=1}^n (xi − x̄)²
                 = β1

  where E(ui | x) = E(ui | xi) by random sampling (SLR.2) and E(ui | xi) = 0 by SLR.4

Proofs

- Unbiasedness of β̂0 conditional on x:

      E(β̂0 | x) = E(ȳ − β̂1 x̄ | x)
                 = E(β0 + β1 x̄ + ū − β̂1 x̄ | x)
                 = β0 + E(β1 − β̂1 | x) x̄ + E(ū | x)
                 = β0 + E(β1 − β̂1 | x) x̄ + (1/n) ∑_{i=1}^n E(ui | x)
                 = β0 + E(β1 − β̂1 | x) x̄ + (1/n) ∑_{i=1}^n E(ui | xi)
                 = β0

  using E(β̂1 | x) = β1 and E(ui | xi) = 0

Variance of OLS Estimators

- When two different estimators are unbiased, the next criterion is their variances

- The smaller the variance, the smaller the chance of making a serious mistake

- Unbiased estimators with smaller variance (called "more efficient") are preferred

Variance of OLS Estimators

      Var(β̂1 | x) = σ² / ∑_{i=1}^n (xi − x̄)²

      Var(β̂0 | x) = σ² (n⁻¹ ∑_{i=1}^n xi²) / ∑_{i=1}^n (xi − x̄)²

- Var(β̂1 | x) decreases when
  - the variation in x becomes larger
  - σ² becomes smaller

- Var(β̂0 | x) decreases as σ² becomes smaller

[Figure: Graphical Representation]

Proof

      Var(β̂1 | x) = Var( β1 + ∑_{i=1}^n (xi − x̄) ui / ∑_{i=1}^n (xi − x̄)² | x )
                   = ∑_{i=1}^n (xi − x̄)² Var(ui | x) / [∑_{i=1}^n (xi − x̄)²]²
                   = σ² ∑_{i=1}^n (xi − x̄)² / [∑_{i=1}^n (xi − x̄)²]²
                   = σ² / ∑_{i=1}^n (xi − x̄)²

  (the second line uses the independence of the ui under random sampling)

- See Question 2.10 for Var(β̂0 | x) (tedious)
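
The result can be checked numerically: holding x fixed across replications (we condition on x) and redrawing only the errors, the simulated variance of β̂1 should be close to σ² / ∑(xi − x̄)². The parameter values below are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma, n, reps = 1.0, 0.5, 2.0, 100, 20000

x = rng.uniform(8, 18, size=n)                    # x is drawn once and then held fixed
theory = sigma**2 / np.sum((x - x.mean()) ** 2)   # Var(beta1_hat | x) from the formula

draws = np.empty(reps)
for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)
    draws[r] = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

print(theory, draws.var())   # the two numbers should be close
```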


Estimating the Variance of the Error Term

- Estimating Var(β̂0 | x) and Var(β̂1 | x) requires an estimate of σ²

- Let's define σ̂² as follows:

      σ̂² = (1 / (n − 2)) ∑_{i=1}^n ûi²

- In fact, σ̂² is an unbiased estimator of σ² (see p. 57 for the proof):

      E(σ̂²) = σ²

- Can estimate Var(β̂1 | x) and Var(β̂0 | x) by

      V̂ar(β̂1 | x) = σ̂² / ∑_{i=1}^n (xi − x̄)²

      V̂ar(β̂0 | x) = σ̂² (n⁻¹ ∑_{i=1}^n xi²) / ∑_{i=1}^n (xi − x̄)²

Standard Errors

- Standard errors: estimators of the standard deviations of β̂0 and β̂1

      se(β̂1 | x) = √V̂ar(β̂1 | x) = σ̂ / √(∑_{i=1}^n (xi − x̄)²)

      se(β̂0 | x) = √V̂ar(β̂0 | x) = σ̂ √(n⁻¹ ∑_{i=1}^n xi²) / √(∑_{i=1}^n (xi − x̄)²)

- Here we used an intuitive candidate, σ̂ = √σ̂², as an estimator for σ

- It turns out that σ̂ is a biased estimator for σ

- Hence se(β̂1) and se(β̂0) are biased, too (see the sketch below)
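
Putting the last two slides together, here is a sketch of how σ̂² and the standard errors would be computed by hand in Python (illustrative data; the variable names are mine, not from the slides):

```python
import numpy as np

def simple_ols_with_se(x, y):
    """OLS estimates, sigma_hat^2 = SSR/(n-2), and standard errors for the simple regression."""
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    resid = y - b0 - b1 * x
    sigma2_hat = np.sum(resid ** 2) / (n - 2)             # unbiased estimator of sigma^2
    se_b1 = np.sqrt(sigma2_hat / sxx)
    se_b0 = np.sqrt(sigma2_hat * np.mean(x ** 2) / sxx)   # uses n^-1 * sum(x_i^2)
    return b0, b1, se_b0, se_b1

educ = np.array([10.0, 12.0, 12.0, 14.0, 16.0, 16.0, 18.0])
wage = np.array([8.5, 10.2, 9.8, 12.0, 15.1, 14.7, 18.3])
print(simple_ols_with_se(educ, wage))
```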


Chapter 3. Multiple Regression Analysis: Estimation

Big Picture

- Multiple regression (MR): regression with more than one regressor

      y = β0 + β1 x1 + · · · + βk xk + u

- Simple regressions are a special case of multiple regressions with βi = 0 for i ≥ 2

- Again, the OLSE (β̂0, β̂1, . . . , β̂k) minimizes the SSR:

      SSR = ∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − β̂2 xi2 − · · · − β̂k xik)²

- Many properties found in simple regressions still hold


Outline

- Motivation: the advantage of using MR instead of SR

- Definition of the OLSE

- Derivation of the OLSE


Motivation for Multiple Regressions

Why Do We Want To Run Multiple Regressions?

- Accurate estimates of causal effects

- Better prediction

- Flexible functional forms


Example: Wage Function with More Than One Observable

- Suppose wage is a function of (1) education, (2) experience, and (3) some other unobservable factors:

      wage = β0 + β1 educ + β2 exper + u

- Assume E(u | educ, exper) = 0

- We want to find an unbiased estimator for β1


World Without MR

- Suppose all we know is simple regression

- Consider running an SR of wage on educ using the whole sample:

      wage = β0 + β1 educ + ũ,  where ũ = u + β2 exper

- To maintain the unbiasedness of the OLS estimator β̂1, we have to assume E(exper | educ) = 0

- This assumption easily collapses if:
  - college graduates tend to work in one job category (e.g., accountant) for a long time
  - high school graduates tend to work in various positions (e.g., drivers, security guards)

- β̂1 could be biased


World Without MR

- A more sophisticated option is to limit the data to workers with the same experience exper

- Run the regression using this particular subsample:

      wage = β̃0 + β1 educ + u,  where β̃0 = β0 + β2 exper

- Since E(u | educ) = 0 holds, β̂1 is unbiased

- But we might need to throw away many data points


World With MR

- With multiple regressions, we can estimate β1 without giving up a lot of data points

- With multiple regressions, we can separate the impact of education from that of experience

- Moreover, if both education and experience affect wage (i.e., β1 ≠ 0 and β2 ≠ 0), R-squared always increases

Another Advantage: Inserting Nonlinearity

- Multiple regressions allow us to use a more flexible functional form

- Suppose one's consumption is a quadratic function of one's income:

      cons = β0 + β1 inc + β2 inc² + u

      dcons/dinc = β1 + 2 β2 inc

- MR can estimate this model (see the sketch below)

- Simple regression cannot capture this type of nonlinearity (even if you use log!)
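
In practice this just means adding inc² as a second regressor. A minimal sketch with simulated data (the coefficient values are invented):

```python
import numpy as np

rng = np.random.default_rng(3)
inc = rng.uniform(10, 100, size=300)
cons = 5 + 0.8 * inc - 0.002 * inc**2 + rng.normal(0, 2, size=300)

# Multiple regression of cons on inc and inc^2
X = np.column_stack([np.ones_like(inc), inc, inc**2])
b = np.linalg.lstsq(X, cons, rcond=None)[0]
print(b)                                                   # roughly [5, 0.8, -0.002]
print("marginal effect at inc=50:", b[1] + 2 * b[2] * 50)  # dcons/dinc = beta1 + 2*beta2*inc
```
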
Definition of the OLSE

Definition of OLSE in MR

- Model:

      y = β0 + β1 x1 + · · · + βk xk + u

- Data consist of n data points: {yi, xi1, xi2, . . . , xik}_{i=1}^n
  - xij: the ith observation of the jth regressor

- Want to find the values of β̂0, β̂1, . . . , β̂k that minimize the SSR:

      min_{β̂0, β̂1, . . . , β̂k} ∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik)²

  where the objective function is the SSR

- The OLS estimators (β̂0, β̂1, . . . , β̂k) are the solution to this optimization problem

How to Calculate OLSE in MR

- As before, the problem is a mere minimization problem

- The solution β̂0, β̂1, . . . , β̂k must satisfy the following necessary (first-order) conditions:

      1 eq:   ∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik) = 0

      k eqs:  ∑_{i=1}^n xi1 (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik) = 0
              ...
              ∑_{i=1}^n xik (yi − β̂0 − β̂1 xi1 − · · · − β̂k xik) = 0

- Find (k + 1) unknowns from a system of (k + 1) linear equations

How to Calculate OLSE in MR

- In SR, we had a simple expression for the OLSE:

      β̂1 = ∑_{i=1}^n (yi − ȳ)(xi − x̄) / ∑_{i=1}^n (xi − x̄)²
      β̂0 = ȳ − β̂1 x̄

- In MR, the solution is too complicated to write down without using matrices
  - Matrix representation of the OLSE: β̂ = (X′X)⁻¹ X′y

- Manual calculation becomes a mess, but computers can do this very well (see the sketch below)
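
A sketch of the matrix formula in Python; numerically it is better to solve the normal equations than to form the inverse explicitly (the data here are randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # intercept plus k regressors
beta = np.array([1.0, 0.5, -0.3, 2.0])
y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # computes (X'X)^{-1} X'y without an explicit inverse
print(beta_hat)                                # close to beta
```
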
Summary

- Multiple regression: regressions with multiple regressors

- Advantages
  - More precise estimates
  - Better prediction
  - More flexibility in functional forms

- Estimators: obtained by minimizing the SSR

- Lecture 4
  - Relationship with SR
  - Statistical properties
