
ECON 120B Econometrics

Lecture 3 Part 1: Introduction to Multiple Regression

Xinwei Ma

Department of Economics
UC San Diego

Spring 2020

© Xinwei Ma 2021


Outline

Omitted Variable Bias: An Example

Motivation for Multiple Regression

Mechanics and Interpretation in Multiple Regression

Finite-Sample Properties of OLS Estimates

Statistical Properties of OLS Estimates

Hypothesis Testing for an Individual Coefficient



Omitted Variable Bias: An Example

 EXAMPLE. Class size and test score, the California data

• The California data contains test scores and student-to-teacher ratios of 420 school districts. It is survey data, meaning that class size (the student-to-teacher ratio) is not randomized.

• We still believe there is a causal effect of class size (X ) on student performance (Y ), but there
might be other factors (u) affecting both X and Y at the same time.

• For example, a better funded school district is likely to have smaller classes and attract more experienced teachers.

[Causal diagram: expenPerStu affects both stuTeacherRatio and testScore.]

In this case, our zero conditional mean assumption is likely to fail: E[u|X] ≠ 0.

• The above causal diagram applies to many economic datasets. As social scientists, we rarely
have the luxury to conduct randomized experiments, and hence many economic studies rely on
careful analysis of the underlying causal mechanism and advanced econometric methods.

• We call a variable confounder if it affects both X and Y . We will discuss how to explicitly
control for confounding effects later in this class.



Omitted Variable Bias: An Example

 To assess the consequence of violating the zero conditional mean assumption, we start
from the model
testScore = β0 + β1 stuTeacherRatio + u,
where the error term includes the variable expenPerStu.

 If we estimate β1 by regressing testScore on stuTeacherRatio, will our estimate be consistent for the true causal effect?

• First, we expect the true causal effect to be negative:


lower stuTeacherRatio → smaller classes → better test score

• Now consider the conditional expectation of the error term. More specifically, assume the error
term contains expenPerStu (funding status of a school district).

• If we observe a school district with small classes, then it is likely that this school district is better
funded, and vice versa. Therefore,
E[expenPerStu|stuTeacherRatio is large] < E[expenPerStu|stuTeacherRatio is small].

• Therefore, our estimate, β̂1 , will overstate the magnitude of the true causal effect. That is, β̂1 < β1 < 0 in large samples.



Omitted Variable Bias: An Example

 The previous analysis tells us that the slope estimate is biased, and provides qualitative
analysis of this bias.

 To understand the magnitude of this bias, we need a quantitative analysis. We compare the two models:
Short regression: testScore = β0 + β1 stuTeacherRatio + ushort
Long regression: testScore = γ0 + β1 stuTeacherRatio + β2 expenPerStu + ulong

• These are two models about the relationship between class size and student performance.

• The difference is that the funding status of a school district (i.e., expenPerStu) is absorbed into
the error term in the short regression, while in the long regression, this information is separated
from the error term.

• Separating a variable from the error term may affect the intercept, but this is not relevant for our
purpose, as our primary focus is on the slope parameter.

• The slope parameter remains the same. Remember, the slope parameter represents the causal
effect of class size on student performance, which does not depend on how we model the
variables.

• Key assumption:
E[ulong |stuTeacherRatio, expenPerStu] = 0, but E[ushort |stuTeacherRatio] ≠ 0
Omitted Variable Bias: An Example

Short regression: testScore = β0 + β1 stuTeacherRatio + ushort


Long regression: testScore = γ0 + β1 stuTeacherRatio + β2 expenPerStu + ulong

 The zero conditional mean assumption, E[ulong |stuTeacherRatio, expenPerStu] = 0, allows us to consistently estimate β1 by regressing testScore on both stuTeacherRatio and expenPerStu. (More later.)

• Because we include both stuTeacherRatio and expenPerStu explicitly in the long regression,
we believe this assumption is more plausible. It is, however, still an assumption, which means it
may be violated in practice.

 Consider the short regression, and assume we obtained β̂1,short by regressing testScore
on stuTeacherRatio only. We know that β̂1,short is biased for β1 (i.e., inconsistent).

• The exact bias can be calculated as

β̂1,short = β1 + (sample cov of stuTeacherRatio and ushort) / (sample var of stuTeacherRatio)
         = β1 + (sample cov of stuTeacherRatio and (β2 expenPerStu + ulong)) / (sample var of stuTeacherRatio)
         →p β1 + β2 × Cov[stuTeacherRatio, expenPerStu] / V[stuTeacherRatio],

where β1 < 0, β2 > 0, and Cov[stuTeacherRatio, expenPerStu] / V[stuTeacherRatio] < 0.
Omitted Variable Bias: An Example

 SUMMARIZE. We consider the two regressions


Short regression: testScore = β0 + β1 stuTeacherRatio + ushort
Long regression: testScore = γ0 + β1 stuTeacherRatio + β2 expenPerStu + ulong

 It is more plausible to assume E[ulong |stuTeacherRatio, expenPerStu] = 0, because we explicitly control for expenPerStu in the long regression. As a result, the long regression seems more suitable for estimating the causal effect of class size on student performance (β1 ).

 We also computed the bias of β̂1,short , which is obtained from running the short regression:

β̂1,short →p β1 + β2 × Cov[stuTeacherRatio, expenPerStu] / V[stuTeacherRatio].

• The term β2 × Cov[stuTeacherRatio, expenPerStu] / V[stuTeacherRatio] is called the omitted variable bias (OVB).

• OVB arises under two conditions:


1. The omitted variable is correlated with the regressor.

2. The omitted variable affects the dependent variable.
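The plim formula above can be checked in a small simulation. The following Python/NumPy sketch is not part of the original slides: the data-generating process and every number in it are made up for illustration, chosen so the confounder raises test scores and lowers the student-to-teacher ratio, as in the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data-generating process mimicking the example: expenPerStu
# (the confounder) lowers class size and raises test scores.
expen = rng.normal(5000, 500, n)
stu_teacher = 25 - 0.002 * expen + rng.normal(0, 1, n)  # Cov(X, confounder) < 0
beta1, beta2 = -1.0, 0.01                               # true effects: beta1 < 0, beta2 > 0
score = 650 + beta1 * stu_teacher + beta2 * expen + rng.normal(0, 5, n)

# Short-regression slope: sample cov / sample var
b1_short = np.cov(stu_teacher, score)[0, 1] / np.var(stu_teacher, ddof=1)

# Omitted variable bias term: beta2 * Cov[X, confounder] / V[X]
ovb = beta2 * np.cov(stu_teacher, expen)[0, 1] / np.var(stu_teacher, ddof=1)

print(b1_short)        # close to beta1 + ovb, and below beta1: overstates the effect
print(beta1 + ovb)
```

With this design the OVB term is negative, so the short-regression slope lands below the true β1, exactly the β̂1,short < β1 < 0 pattern derived above.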


Outline

Omitted Variable Bias: An Example

Motivation for Multiple Regression

Mechanics and Interpretation in Multiple Regression

Finite-Sample Properties of OLS Estimates

Statistical Properties of OLS Estimates

Hypothesis Testing for an Individual Coefficient



Motivation for Multiple Regression

 EXAMPLE. Class size and test score.

testScore = β0 + β1 stuTeacherRatio + β2 expenPerStu + u

where expenPerStu is the average expenditure per student in a school district.

• Primarily interested in β1 , but β2 is of some interest, too. β1 still represents the effect of class
size on test score, holding all other factors constant:
∆testScore = β1 ∆stuTeacherRatio + β2 ∆expenPerStu + ∆u
= β1 ∆stuTeacherRatio (∆expenPerStu = 0, ∆u = 0).

• By explicitly including expenPerStu in the equation, we have taken it out of the error term.

• If expenPerStu is a good proxy for a school district’s funding status, this may lead to a more
persuasive estimate of the causal effect of class size, because it is more plausible that the zero
conditional mean assumption holds in our data: E[u|stuTeacherRatio, expenPerStu] = 0.



Motivation for Multiple Regression

 EXAMPLE. Missing lectures and final exam score.

• In a simple regression approach, we would relate final to missed (number of lectures missed).

• But part of the error term would be “student ability.”

• Instead, we can consider a multiple regression


final = β0 + β1 missed + β2 priGPA + u.
Here priGPA, grade point average at the beginning of the quarter, is added to account for
systematic differences in students that might affect performance and be correlated with missed
lectures.

• We are primarily interested in β1 . It is unclear if β2 bears any causal interpretation: we include priGPA simply as a proxy for “student ability.”



Motivation for Multiple Regression

 EXAMPLE. Education and wage: nonlinear effects.

ln(wage) = β0 + β1 educ + β2 iq + β3 exper + β4 exper² + u,

so that experience is allowed to have a quadratic effect on ln(wage).

• We already know that 100 · β1 is the percentage change in wage when education increases by
one year. 100 · β2 has a similar interpretation (for a one point increase in iq).
• β3 and β4 are harder to interpret, but we can use calculus to get the slope of ln(wage) with
respect to exper:
∂ln(wage)/∂exper = β3 + 2β4 exper
Multiply by 100 to get the percentage effect. (More later.)
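The marginal-effect formula is easy to evaluate numerically. In the Python sketch below, the coefficient values are hypothetical numbers chosen only to illustrate the calculation, not estimates from any dataset.

```python
# Hypothetical coefficient values for illustration only (not estimates from data)
beta3, beta4 = 0.041, -0.0007

def pct_effect_of_exper(exper):
    """Approximate percentage change in wage from one more year of experience,
    evaluated at a given experience level: 100 * (beta3 + 2 * beta4 * exper)."""
    return 100 * (beta3 + 2 * beta4 * exper)

print(pct_effect_of_exper(5))   # effect early in a career
print(pct_effect_of_exper(25))  # smaller effect later, since beta4 < 0
```

Because β4 < 0, the percentage effect of an extra year of experience shrinks as experience accumulates.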



Motivation for Multiple Regression

 Generally, we can write a model with two regressors as

Y = β0 + β1 X1 + β2 X2 + u,
where
• β0 is the intercept,

• β1 measures the change in Y with respect to X1 , holding other factors (X2 and u) fixed,

• β2 measures the change in Y with respect to X2 , holding other factors (X1 and u) fixed.

 The key assumption is now how u is related to X1 and X2 : E[u|X1 , X2 ] = 0.

• For any values of X1 and X2 in the population, the average unobservable is equal to zero. (The value zero is not important, because we have an intercept, β0 , in the equation.)

• In the class size example, the assumption is E[u|stuTeacherRatio, expenPerStu] = 0. Now u no longer contains “funding status” (we hope), and so this condition has a better chance of being true. In the simple/short regression, we had to assume expenPerStu and stuTeacherRatio are unrelated to justify leaving expenPerStu in the error term.

• Other factors, such as “motivation” and “teacher’s experience,” are part of u. Motivation is very difficult to measure. Experience is easier:
testScore = β0 + β1 stuTeacherRatio + β2 expenPerStu + β3 teacherExp + u



Motivation for Multiple Regression

 The multiple linear regression model can be written in the population as

Y = β0 + β1 X1 + β2 X2 + . . . + βk Xk + u.

• β0 is the intercept parameter.


• β1 is the parameter associated with X1 , β2 is the parameter associated with X2 , etc. They are
called slope parameters.
• There are k + 1 (unknown) population parameters.

 Multiple regression allows us to explicitly control for “other factors” and incorporate more
flexible functional forms.

 Key assumption for the general multiple regression model: the zero conditional mean
assumption
E[u|X1 , X2 , · · · , Xk ] = 0.

• Provided we are careful, we can make this condition closer to being true by “controlling for”
more variables. In the class size example, we “control for” expenditure per student when
estimating the effect of class size on student performance.



Outline

Omitted Variable Bias: An Example

Motivation for Multiple Regression

Mechanics and Interpretation in Multiple Regression

Finite-Sample Properties of OLS Estimates

Statistical Properties of OLS Estimates

Hypothesis Testing for an Individual Coefficient



Mechanics and Interpretation in Multiple Regression

 We want to estimate the k + 1 population parameters in

Y = β0 + β1 X1 + β2 X2 + · · · + βk Xk + u.

 As in the simple regression case, we motivate the estimation technique (estimators) by the zero conditional mean assumption E[u|X1 , X2 , · · · , Xk ] = 0.

 Let {(Xi1 , Xi2 , · · · , Xik , Yi ) : i = 1, 2, · · · , n} be a sample of size n (the number of observations) from the population. Think of this as a random sample.

• Plug any observation into the population equation:


Yi = β0 + β1 Xi1 + β2 Xi2 + · · · + βk Xik + ui , i = 1, 2, · · · , n,
where the i subscript indicates a particular observation.

• Now the regressors have two subscripts: i is the observation number (as always) and the second
subscript labels a particular variable.

− Example (k = 2): Xi1 = stuTeacherRatioi and Xi2 = expenPerStui where i = 1, · · · , n.

• More generally, we use Xij , 1 ≤ i ≤ n, 1 ≤ j ≤ k, to denote the ith observation of the jth regressor.



Mechanics and Interpretation in Multiple Regression

 We start from the zero conditional mean assumption

E[u|X1 , X2 , · · · , Xk ] = 0.

 Using iterated expectation, we obtain

0 = E[u] 0 = E[X1 u] 0 = E[X2 u] ··· 0 = E[Xk u].

 Now plug in u = Y − β0 − β1 X1 − β2 X2 − · · · − βk Xk , and we obtain the k + 1 population moment conditions:

0 = E[Y − β0 − β1 X1 − β2 X2 − · · · − βk Xk ]
0 = E[X1 (Y − β0 − β1 X1 − β2 X2 − · · · − βk Xk )]
0 = E[X2 (Y − β0 − β1 X1 − β2 X2 − · · · − βk Xk )]
...
0 = E[Xk (Y − β0 − β1 X1 − β2 X2 − · · · − βk Xk )].

These are the conditions in the population that determine the parameters. So we use
their sample analogs, which is a method of moments approach to estimation.



Mechanics and Interpretation in Multiple Regression

 We define our estimates as the solution to the following sample moment conditions:

0 = (1/n) Σ_{i=1}^n (Yi − β̂0 − β̂1 Xi1 − β̂2 Xi2 − · · · − β̂k Xik )
0 = (1/n) Σ_{i=1}^n Xi1 (Yi − β̂0 − β̂1 Xi1 − β̂2 Xi2 − · · · − β̂k Xik )
0 = (1/n) Σ_{i=1}^n Xi2 (Yi − β̂0 − β̂1 Xi1 − β̂2 Xi2 − · · · − β̂k Xik )
...
0 = (1/n) Σ_{i=1}^n Xik (Yi − β̂0 − β̂1 Xi1 − β̂2 Xi2 − · · · − β̂k Xik )

 The above are k + 1 equations with k + 1 unknowns, which allows us to obtain the
estimates β̂0 , β̂1 , β̂2 , · · · , β̂k .

 To obtain expressions for the estimates, β̂0 , β̂1 , β̂2 , · · · , β̂k , we need to use linear algebra
(not required). Fortunately, modern statistical platforms (such as Stata) can compute the
estimates very fast.
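For readers curious about that linear-algebra step: the k + 1 sample moment conditions are exactly the "normal equations" X′(y − Xb̂) = 0, which software solves as a linear system. A minimal sketch in Python/NumPy on simulated data (an illustration with made-up parameters; Stata is the platform used in this course):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
# Design matrix with a constant column and two regressors (simulated data)
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -3.0])     # made-up population parameters
y = X @ beta_true + rng.normal(size=n)

# The k+1 sample moment conditions are the normal equations X'(y - Xb) = 0,
# so the OLS estimates solve the linear system (X'X) b = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# At the solution, the moment conditions hold exactly (up to rounding)
u_hat = y - X @ beta_hat
print(beta_hat)              # close to [1, 2, -3]
print(X.T @ u_hat)           # numerically zero
```

The same one-line solve handles any number of regressors, which is why adding controls costs essentially nothing computationally.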
Mechanics and Interpretation in Multiple Regression

 EXAMPLE. Class size and test score.

• Sample of 420 California school districts.

• We first regress testScore on stuTeacherRatio:

Short regression: testScore-hat = 698.92 − 2.28 stuTeacherRatio.

The simple regression estimate implies that a one-unit decrease in the student-to-teacher ratio is associated with a 2.28-point increase in test score.

• Now we control for funding status (expenPerStu):

Long regression: testScore-hat = 675.60 − 1.76 stuTeacherRatio + 0.002 expenPerStu.

• The coefficient on stuTeacherRatio is still negative, but it is now smaller in magnitude.

• If we believe that the “long regression” better reflects a causal relationship between class size
and test score (because the zero conditional mean assumption is more plausible after we
explicitly control for funding status of a school district), then the “short regression” is likely to
over-state the effect of class size.



Mechanics and Interpretation in Multiple Regression
 EXAMPLE. Missed classes and final exam score.

• Sample of 680 students. Attendance recorded electronically.

• We first regress final (out of 40 points) on missed (lectures missed out of 32) in a simple
regression analysis. Then we add priGPA (prior GPA) as a control for student ability:
Short regression: final-hat = 26.60 − 0.121 missed.

• The simple regression estimate implies that 10 more missed classes reduce the predicted score by about 1.2 points (out of 40).

• Now we control for prior GPA:

Long regression: final-hat = 17.42 + 0.017 missed + 3.24 priGPA.

• The coefficient on missed actually becomes positive, but it is very small. (Later, we will see it is
not statistically different from zero.)

• The coefficient on priGPA means that one more point on prior GPA (for example, from 2.5 to
3.5) predicts a final exam score that is 3.24 points higher. However, it is unclear if this reflects
any interesting causal relationship.

• If we believe the “long regression” better reflects a causal relationship between attendance and
class performance (because the zero conditional mean assumption is more plausible after we
explicitly control for student ability), then the “short regression” is likely to over-state the effect
of attendance.
Outline

Omitted Variable Bias: An Example

Motivation for Multiple Regression

Mechanics and Interpretation in Multiple Regression

Finite-Sample Properties of OLS Estimates

Statistical Properties of OLS Estimates

Hypothesis Testing for an Individual Coefficient



Finite-Sample Properties of OLS Estimates

 For each i, we define the fitted (predicted) value


Ŷi = β̂0 + β̂1 Xi1 + β̂2 Xi2 + · · · + β̂k Xik .
This is also known as the sample regression line.

 We also define the regression residual


ûi = Yi − Ŷi = Yi − β̂0 − β̂1 Xi1 − β̂2 Xi2 − · · · − β̂k Xik .

 Some properties:

• The residuals always sum to zero: Σ_{i=1}^n ûi = 0. This implies that the sample average of the Yi equals the sample average of the fitted values Ŷi .

• Each regressor has a zero sample correlation (covariance) with the residuals: Σ_{i=1}^n Xij ûi = 0 for all 1 ≤ j ≤ k. This follows from the sample moment conditions. It implies that Ŷi and ûi are also uncorrelated: Σ_{i=1}^n Ŷi ûi = 0.

• The sample averages always fall on the regression line:

Ȳ = β̂0 + β̂1 X̄1 + β̂2 X̄2 + · · · + β̂k X̄k .

That is, if we plug in the sample average of each regressor, the fitted value is the sample average of the Yi .
Finite-Sample Properties of OLS Estimates

 As with simple regression, it can be shown that

SST = SSE + SSR,

where SST, SSE, and SSR are the total, explained, and residual sums of squares:

SST = Σ_{i=1}^n (Yi − Ȳ)², SSE = Σ_{i=1}^n (Ŷi − Ȳ)², SSR = Σ_{i=1}^n ûi².

 We define the R-squared as before:

R² = SSE/SST = 1 − SSR/SST

• In addition to 0 ≤ R² ≤ 1 and the interpretation of R², there is a useful fact to remember: using the same data and the same dependent variable, R² can never fall when another regressor is added to the regression.

• Adding another regressor cannot make SSR increase. The SSR falls unless the coefficient on the new variable is identically zero.

• This means that, if we focus on R², we might include silly variables in our regression.

• R² is a useful summary measure but tells us nothing about causality. Having a “high” R-squared is neither necessary nor sufficient to infer causality.
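The "R² never falls" fact is easy to demonstrate by adding a pure-noise regressor. A Python/NumPy sketch on simulated data (the variable named silly is deliberately unrelated to y):

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS regression of y on X (X includes a constant column)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr = ((y - X @ b) ** 2).sum()          # residual sum of squares
    sst = ((y - y.mean()) ** 2).sum()       # total sum of squares
    return 1 - ssr / sst

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
y = 1 + 2 * x1 + rng.normal(size=n)
silly = rng.normal(size=n)                  # pure noise, unrelated to y

X_short = np.column_stack([np.ones(n), x1])
X_long = np.column_stack([np.ones(n), x1, silly])

# R^2 weakly increases even though the added regressor is irrelevant
print(r_squared(X_short, y), r_squared(X_long, y))
```

The second R² is (weakly) larger even though silly carries no information about y, which is exactly why a high R² says nothing about whether the regressors belong in the model.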



Finite-Sample Properties of OLS Estimates

 We will illustrate how to use Stata for multiple regression with the dataset
Data-classSize-testScore.dta

 This dataset contains information on 420 school districts in California. We will focus on
three variables, testScore, stuTeacherRatio, and expenPerStu.

 To estimate the regression model

testScore = β0 + β1 stuTeacherRatio + β2 expenPerStu + u,

we simply use the command: reg testScore stuTeacherRatio expenPerStu


. reg testScore stuTeacherRatio expenPerStu

Source | SS df MS Number of obs = 420


-------------+---------------------------------- F(2, 417) = 12.23
Model | 8428.77883 2 4214.38941 Prob > F = 0.0000
Residual | 143672.672 417 344.538781 R-squared = 0.0554
-------------+---------------------------------- Adj R-squared = 0.0509
Total | 152101.45 419 363.010621 Root MSE = 18.562

---------------------------------------------------------------------------------
testScore | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
stuTeacherRatio | -1.763149 .6108781 -2.89 0.004 -2.963933 -.5623648
expenPerStu | .0024835 .0018231 1.36 0.174 -.0011 .006067
_cons | 675.596 19.56124 34.54 0.000 637.1451 714.047
---------------------------------------------------------------------------------



Finite-Sample Properties of OLS Estimates

 How to obtain the fitted values

testScore-hat = β̂0 + β̂1 stuTeacherRatio + β̂2 expenPerStu ?

 Method 1: Use the formula:

• Stata stores the estimates in an object called e(b). (Technically, this is a 1 × 3 matrix.) To show the slope estimates, we use disp e(b)[1,1] and disp e(b)[1,2]. We can also access the intercept estimate with disp e(b)[1,3].

• We define a new variable, testScoreHat to store the fitted values:


gen testScoreHat = e(b)[1,3] + e(b)[1,1] * stuTeacherRatio + e(b)[1,2] *
expenPerStu

 Method 2: Automatic:

• After running the regression, we can use the predict command to compute the fitted values:
predict testScoreHat2

• The above tells Stata to generate all fitted values and store them in a new variable called
testScoreHat2.

 We can check if the two approaches agree. Indeed, the two variables, testScoreHat and
testScoreHat2, are identical:
count if testScoreHat == testScoreHat2
Finite-Sample Properties of OLS Estimates

 How to obtain the residuals

û = testScore − β̂0 − β̂1 stuTeacherRatio − β̂2 expenPerStu ?

 Method 1: Use the formula:

• We define a new variable, resid to store the residuals:


gen resid = testScore - e(b)[1,3] - e(b)[1,1] * stuTeacherRatio - e(b)[1,2] *
expenPerStu

 Method 2: Automatic:

• After running the regression, we can use the predict command to compute the residuals:
predict resid2, residual

• The above tells Stata to generate all residuals and store them in a new variable called resid2.

 We can check if the two approaches agree. Indeed, the two variables, resid and resid2,
are identical:
count if resid == resid2



Finite-Sample Properties of OLS Estimates

 We mentioned a few properties of regression residuals. These properties can be illustrated using Stata.

• Residuals sum to zero: Σ_{i=1}^n ûi = 0
. summ resid

Variable | Obs Mean Std. Dev. Min Max


-------------+---------------------------------------------------------
resid | 420 7.27e-09 18.5174 -47.46049 48.436

. disp r(sum)
3.055e-06
The final result is very close to zero (it is not exactly zero due to numerical errors)

• The residuals and the regressors have zero sample covariance (correlation): (1/n) Σ_{i=1}^n ûi Xij = 0, and (1/n) Σ_{i=1}^n ûi Ŷi = 0
. corr resid stuTeacherRatio expenPerStu testScoreHat
(obs=420)

| resid stuTea~o expenP~u testSc~t


-------------+------------------------------------
resid | 1.0000
stuTeacher~o | 0.0000 1.0000
expenPerStu | 0.0000 -0.6200 1.0000
testScoreHat | 0.0000 -0.9613 0.8121 1.0000
The correlations are very close to zero (not exactly zero due to numerical errors; they display as 0.0000 after rounding).



Finite-Sample Properties of OLS Estimates

 REMARK. Reporting estimation results for several models.

• Assume a researcher is interested in estimating the effect of class size on test score. She considers four different models (specifications):
testScore = β0 + β1 stuTeacherRatio +u
testScore = β0 + β1 stuTeacherRatio + β2 expenPerStu +u
testScore = β0 + β1 stuTeacherRatio + β3 fracEnglish + u
testScore = β0 + β1 stuTeacherRatio + β2 expenPerStu + β3 fracEnglish + u.

• It is possible to report estimation results as individual regression lines, such as

testScore-hat = 698.922 − 2.279 stuTeacherRatio
testScore-hat = 675.596 − 1.763 stuTeacherRatio + 0.002 expenPerStu
testScore-hat = 686.027 − 1.101 stuTeacherRatio − 0.650 fracEnglish
testScore-hat = 649.581 − 0.286 stuTeacherRatio + 0.004 expenPerStu − 0.656 fracEnglish.

But this approach is cumbersome.



Finite-Sample Properties of OLS Estimates

Dependent variable: Test score

Model

(1) (2) (3) (4)

Student-teacher ratio −2.279 −1.763 −1.101 −0.286

Expenditure per student 0.002 0.004

Fraction learning English −0.650 −0.656

Intercept 698.922 675.596 686.027 649.581

R2 0.051 0.055 0.426 0.437

n 420 420 420 420

 REMARK. Reporting estimation results for several models.

• In general, we report estimation results in a table with each column representing one model
(specification).

• A missing entry indicates that a regressor is not included.



Outline

Omitted Variable Bias: An Example

Motivation for Multiple Regression

Mechanics and Interpretation in Multiple Regression

Finite-Sample Properties of OLS Estimates

Statistical Properties of OLS Estimates

Hypothesis Testing for an Individual Coefficient



Statistical Properties of OLS Estimates

 Assumption 1: Linear Model


The population model can be written as

Y = β0 + β1 X1 + β2 X2 + · · · + βk Xk + u,
where β0 , β1 , β2 , · · · , βk are the (unknown) population parameters.

• We view X1 , X2 , · · · , Xk and u as outcomes of random variables; thus, Y is of course random.

• Stating this assumption formally shows that our goal is to estimate the parameters.

 Assumption 2: Random Sampling


We have a random sample, {(Xi1 , Xi2 , · · · , Xik , Yi ) : i = 1, ..., n}, following the
population model.

• The observations are independently and identically distributed (iid).

• We know how to use these data to estimate β̂j for j = 0, 1, 2, · · · , k by OLS.

• Because each i is a draw from the population, we can write


Yi = β0 + β1 Xi1 + β2 Xi2 + · · · + βk Xik + ui .
• Notice that ui here is the unobserved error for observation i. It is not the residual that we
compute from the data!
Statistical Properties of OLS Estimates

 Assumption 3: Zero Conditional Mean


In the population, the error term has zero mean given any value of the regressors:
E[u|X1 , X2 , · · · , Xk ] = 0.

• This is the key assumption for showing that OLS is unbiased.

• We emphasized its importance if we would like to draw causal relationship from data.

 Assumption 4: No Perfect Multicollinearity


None of the regressors is constant, and there are no exact linear relationships among
them.

• Rules out the case that {Xij : i = 1, · · · , n} has no variation for some j.

• Rules out the (extreme) case that one (or more) of the regressors is an exact linear function of
the others.

• If, say, Xi1 is an exact linear function of Xi2 , · · · , Xik in the sample, we say the model suffers
from perfect multicollinearity.



Statistical Properties of OLS Estimates

 Assumption 5: Finite Moments


Both the regressors and the error term have finite fourth moments: E[Xj⁴] < ∞ for all j = 1, 2, · · · , k, and E[u⁴] < ∞.

• This is a technical condition which allows us to compute the variance of the OLS estimates.

• This condition can be understood as “no outliers.”



Statistical Properties of OLS Estimates

 REMARK. Perfect multicollinearity.

• Perfect multicollinearity can arise if n < k + 1. That is, if we include more regressors than the
sample size.

− This is rarely a concern in practice, because we usually have thousands of observations but only a few
regressors.

• Usually, perfect multicollinearity is the result of a bad model specification.

− For example, if we include both college (equal to 1 for college graduates) and nonCollege (equal to 1 for non-college graduates) in our regression, then we will have college + nonCollege = 1, which is a perfect linear relationship.

− This does not prevent us from including nonlinear transformations of a regressor. For example, we can include both exper (experience) and exper².

• Under perfect multicollinearity, there are no unique OLS estimators. Stata and other statistical
packages will indicate a problem.
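The college/nonCollege example can be reproduced numerically: once a constant and both dummies are included, one column of the design matrix is an exact linear combination of the others, so X′X is singular and no unique OLS solution exists. A Python/NumPy sketch with a simulated dummy variable:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
college = rng.integers(0, 2, n).astype(float)
non_college = 1.0 - college          # exact linear relation: college + nonCollege = 1

# Constant + both dummies: rank-deficient design matrix, X'X cannot be inverted
X_bad = np.column_stack([np.ones(n), college, non_college])
print(np.linalg.matrix_rank(X_bad), "of", X_bad.shape[1], "columns")   # rank 2 of 3

# Dropping one dummy removes the perfect multicollinearity
X_ok = np.column_stack([np.ones(n), college])
print(np.linalg.matrix_rank(X_ok), "of", X_ok.shape[1], "columns")     # full rank
```

Dropping one of the two dummies (or the constant) restores full column rank, which is the standard fix.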



Statistical Properties of OLS Estimates

 Under assumptions 1–5, the OLS estimates are unbiased:

E[β̂j ] = βj , j = 0, 1, 2, · · · , k.

• This unbiasedness result relies crucially on the zero conditional mean assumption.

• Often the hope is that if our focus is on, say, X1 , we can include enough other variables in
X2 , · · · , Xk to make the zero conditional mean assumption true, or “close” to be true.

• The easiest proof requires matrix algebra.

 The unbiasedness result allows for the βj to be any value, including zero.

• Suppose that we specify the model

testScore = β0 + β1 stuTeacherRatio + β2 expenPerStu + β3 fracEnglish + u,

where our assumptions 1–5 hold. Suppose that β3 = 0, but we do not know that. We estimate the full model by OLS:

testScore-hat = β̂0 + β̂1 stuTeacherRatio + β̂2 expenPerStu + β̂3 fracEnglish.
• We automatically know from the unbiasedness result that
E[β̂j ] = βj , j = 0, 1, 2, and E[β̂3 ] = 0
• Therefore, including an irrelevant variable, or overspecifying the model, does not cause bias in
any coefficients.
Statistical Properties of OLS Estimates

 Under assumptions 1–5, the OLS estimates are consistent and asymptotically normal:

β̂j →p βj ,   √n (β̂j − βj ) →d N(0, σ²(β̂j )),   j = 0, 1, 2, · · · , k, as n → ∞.

Recall that σ²(β̂j ) is called the asymptotic variance of β̂j .

• In practice, we interpret consistency as “β̂j is close to βj with high probability in large samples.”

• We also write asymptotic normality as a distributional approximation:

β̂j ∼ N(βj , σ̂²(β̂j )/n) approximately, for n large,

where σ̂²(β̂j ) is some consistent estimator of the asymptotic variance. For simplicity, we define the standard error of β̂j as se(β̂j ) = σ̂(β̂j )/√n; then (β̂j − βj )/se(β̂j ) is approximately normally distributed with mean 0 and variance 1.

• The exact formula of the standard error is complicated. We rely on statistical packages, such as
Stata, for computation.

 We emphasize that both consistency and asymptotic normality require the zero
conditional mean assumption.
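Although the slides leave the standard-error formula to software, one common version is the heteroskedasticity-robust (HC0) "sandwich" estimator, a simplified form of what Stata's robust option computes (Stata applies an additional small-sample adjustment). The Python/NumPy sketch below uses simulated heteroskedastic data with made-up parameters, purely as an illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
x = rng.normal(size=n)
u = rng.normal(size=n) * (1 + 0.5 * np.abs(x))   # heteroskedastic error
y = 1.0 + 2.0 * x + u                            # made-up parameters

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ (X.T @ y)
u_hat = y - X @ beta_hat

# HC0 sandwich variance: (X'X)^-1 [X' diag(u_hat^2) X] (X'X)^-1
meat = X.T @ (X * (u_hat ** 2)[:, None])
V_hat = XtX_inv @ meat @ XtX_inv
se = np.sqrt(np.diag(V_hat))        # plays the role of se(beta_hat_j)

print(beta_hat)
print(se)
```

The "bread" (X′X)⁻¹ terms and the "meat" in the middle give the estimator its sandwich name; under homoskedasticity it collapses to the classical formula.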



Statistical Properties of OLS Estimates

Dependent variable: Test score


Model
(1) (2) (3) (4)
Student-teacher ratio −2.279 −1.763 −1.101 −0.286
(0.480) (0.611) (0.380) (0.481)
Expenditure per student 0.002 0.004
(0.002) (0.001)
Fraction learning English −0.650 −0.656
(0.039) (0.039)
Intercept 698.922 675.596 686.027 649.581
(9.467) (19.561) (7.412) (15.207)
R2 0.051 0.055 0.426 0.437
n 420 420 420 420
Note. Standard errors are given in parentheses.

 REMARK. Reporting estimation results for several models.


• In general, we report estimation results in a table with each column representing one model
(specification).
• Standard errors are reported under the estimated regression coefficients. In Stata, use the
robust option with reg. Otherwise Stata will compute standard errors assuming
homoskedasticity.
• A missing entry indicates that a regressor is not included.



Outline

Omitted Variable Bias: An Example

Motivation for Multiple Regression

Mechanics and Interpretation in Multiple Regression

Finite-Sample Properties of OLS Estimates

Statistical Properties of OLS Estimates

Hypothesis Testing for an Individual Coefficient



Hypothesis Testing for an Individual Coefficient

 RECALL. We started from the population model


Y = β0 + β1 X1 + β2 X2 + · · · + βk Xk + u,
and the zero conditional mean assumption E[u|X1 , X2 , · · · , Xk ] = 0.

 Under assumptions 1–5, the estimates are consistent:

β̂j →p βj , as n → ∞, j = 0, 1, 2, · · · , k.

• In practice, we always have a finite sample, and we interpret consistency as β̂j should not be far
from βj in samples of reasonable size.

 The estimates are also asymptotically normal:

√n (β̂j − βj ) →d N(0, σ²(β̂j )), as n → ∞, j = 0, 1, 2, · · · , k.

• σ²(β̂j ) is called the asymptotic variance of β̂j , which is unknown. Let σ̂²(β̂j ) be a consistent estimate; we define the standard error se(β̂j ) = σ̂(β̂j )/√n.

• We interpret asymptotic normality as: β̂j is approximately normally distributed with mean βj and variance se(β̂j )². Therefore, we also write

β̂j ∼ N(βj , se(β̂j )²)   or   (β̂j − βj )/se(β̂j ) ∼ N(0, 1), approximately, for n large.
Hypothesis Testing for an Individual Coefficient
 EXAMPLE. Class size and test score
• Test scores and class sizes in 1990 in 420 California school districts that serve kindergarten
through eighth grade.
• Y (testScore): district-wide average of reading and math scores for fifth graders. X1 (stuTeacherRatio): district-wide student-to-teacher ratio. X2 (expenPerStu): district-wide expenditure per student.

 The estimation result is (recall that we need the robust option for valid standard errors; otherwise Stata will compute standard errors under homoskedasticity):

testScore-hat = 675.596 − 1.763 stuTeacherRatio + 0.002 expenPerStu,

with robust standard errors 18.844, 0.592, and 0.002, respectively.

• Does class size have a (statistically) significant effect on test score?


• Does more expenPerStu increase student performance?
. reg testScore stuTeacherRatio expenPerStu, robust

---------------------------------------------------------------------------------
| Robust
testScore | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
stuTeacherRatio | -1.763149 .5920629 -2.98 0.003 -2.926949 -.5993493
expenPerStu | .0024835 .0018916 1.31 0.190 -.0012348 .0062018
_cons | 675.596 18.84424 35.85 0.000 638.5545 712.6376
---------------------------------------------------------------------------------



Hypothesis Testing for an Individual Coefficient

 Hypothesis testing with two-sided alternative. Let β̂j be an OLS estimate with standard
error se(β̂j ).

• We want to test H0 : βj = c against H1 : βj 6= c.

• We use T = (β̂j − c) / se(β̂j ) as our test statistic.

• We use the following procedure: reject the null hypothesis if |T | > kα , where kα is called the critical value, computed from 2Φ(−kα ) = α.

      α    0.001   0.005   0.01    0.05    0.1
      kα   3.291   2.807   2.576   1.960   1.645

 Equivalently, we can use confidence intervals or p-values for hypothesis testing.

• CI1−α = [β̂j − kα se(β̂j ) , β̂j + kα se(β̂j )], and reject the null hypothesis if c ∉ CI1−α .

• pVal = 2Φ(−|T |), and reject the null hypothesis if pVal < α.
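The two-sided procedure above can be sketched in a few lines (Python rather than Stata, purely for illustration; the function name is mine, the default critical value is the α = 0.05 entry from the table, and the estimate and standard error are taken from the slides):

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def two_sided_test(bhat, se, c=0.0, k_alpha=1.960):
    """Test H0: beta_j = c against H1: beta_j != c.
    Returns the t-statistic, p-value, confidence interval, and reject decision."""
    T = (bhat - c) / se
    pval = 2 * Phi(-abs(T))
    ci = (bhat - k_alpha * se, bhat + k_alpha * se)
    return T, pval, ci, abs(T) > k_alpha

# stuTeacherRatio estimate and robust standard error from the slides
T, pval, ci, reject = two_sided_test(-1.763149, 0.5920629)
print(round(T, 3), round(pval, 4), reject)   # -2.978 0.0029 True
```

The rejection decision agrees with either of the three equivalent rules: |T| exceeds 1.960, the p-value is below 0.05, and 0 lies outside the confidence interval.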



Hypothesis Testing for an Individual Coefficient
. reg testScore stuTeacherRatio expenPerStu, robust

---------------------------------------------------------------------------------
| Robust
testScore | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
stuTeacherRatio | -1.763149 .5920629 (1) (2) (3) (4)
expenPerStu | .0024835 .0018916 1.31 0.190 -.0012348 .0062018
_cons | 675.596 18.84424 35.85 0.000 638.5545 712.6376
---------------------------------------------------------------------------------

 EXERCISE. Fill in (1)–(4)

• By default, Stata computes t-statistics for the null hypothesis H0 : β1 = 0. Therefore, (1) is (−1.763149 − 0)/0.5920629 = −2.978. (disp -1.763149 / .5920629)

• We can apply the formula of p-values, which gives (2): 2Φ(−2.9779758) = 0.0029. (disp 2 * normal(-1.763149 / .5920629)) Note that Stata uses the t distribution to compute p-values. We will only use the standard normal distribution.

• Finally we apply the formula of confidence intervals, which gives (3): −1.763149 − 1.960 × 0.5920629 = −2.9235923 (disp -1.763149 - 1.960 * .5920629) and (4): −1.763149 + 1.960 × 0.5920629 = −0.6027 (disp -1.763149 + 1.960 * .5920629). Again Stata uses critical values from the t distribution to compute confidence intervals. We will only use the normal critical values.
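The same arithmetic as the disp commands above, collected in one place (a Python illustration; the estimate and standard error are taken from the Stata output):

```python
from math import erf, sqrt

Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))  # standard normal CDF

bhat, se = -1.763149, 0.5920629   # stuTeacherRatio estimate and robust SE
t = (bhat - 0) / se               # (1) t-statistic for H0: beta1 = 0
pval = 2 * Phi(-abs(t))           # (2) two-sided p-value
lo = bhat - 1.960 * se            # (3) lower end of the 95% CI (normal critical value)
hi = bhat + 1.960 * se            # (4) upper end of the 95% CI
print(round(t, 3), round(pval, 4), round(lo, 3), round(hi, 3))
# -2.978 0.0029 -2.924 -0.603
```

The CI endpoints differ slightly from Stata's [−2.927, −0.599] because Stata uses t-distribution critical values while we use the normal critical value 1.960.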
Hypothesis Testing for an Individual Coefficient

. reg testScore stuTeacherRatio expenPerStu, robust

---------------------------------------------------------------------------------
| Robust
testScore | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
stuTeacherRatio | -1.763149 .5920629 -2.98 0.003 -2.926949 -.5993493
expenPerStu | .0024835 .0018916 1.31 0.190 -.0012348 .0062018
_cons | 675.596 18.84424 35.85 0.000 638.5545 712.6376
---------------------------------------------------------------------------------

 EXERCISE. Test H0 : β1 = −1 with significance level 5% (i.e., α = 0.05).


Note that we cannot directly use the t statistic or the p-value, because they were
computed for the null hypothesis H0 : β1 = 0!

• Method 1: Compute the t-statistic as (−1.763149 − (−1))/0.5920629 = −1.289, and its absolute value does not exceed the critical value 1.960. Therefore, we do not reject the hypothesis H0 : β1 = −1. You can also compute the p-value and compare it to 0.05.

• Method 2: Use the confidence interval. The null hypothesis, −1, is contained by the 95%
confidence interval, [−2.927, −0.599], and hence we do not reject this hypothesis.

• Method 3: Use the Stata command test stuTeacherRatio == -1 after running the regression.
This command will give an F-statistic (more later) and a p-value of 0.198. Since the p-value is
larger than 0.05, we do not reject the null hypothesis.
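To make the point concrete, here is Method 1 recomputed from scratch (a Python illustration; the estimate and standard error are taken from the Stata output above):

```python
from math import erf, sqrt

Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))  # standard normal CDF

bhat, se = -1.763149, 0.5920629
T = (bhat - (-1)) / se        # the t-statistic must be recomputed for c = -1
pval = 2 * Phi(-abs(T))       # two-sided p-value for H0: beta1 = -1
print(round(T, 3), round(pval, 3), abs(T) > 1.960)   # -1.289 0.197 False
```

The p-value is close to the 0.198 that Stata's test command reports; the small difference arises because test uses the F distribution rather than the standard normal.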



Hypothesis Testing for an Individual Coefficient
. reg testScore stuTeacherRatio expenPerStu, robust

---------------------------------------------------------------------------------
| Robust
testScore | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
stuTeacherRatio | -1.763149 .5920629 -2.98 0.003 -2.926949 -.5993493
expenPerStu | .0024835 .0018916 1.31 0.190 -.0012348 .0062018
_cons | 675.596 18.84424 35.85 0.000 638.5545 712.6376
---------------------------------------------------------------------------------

 EXERCISE. Test H0 : β1 = 0 with significance level 1% (i.e., α = 0.01).


Note that we cannot directly use the confidence interval, because it was computed at a
different confidence level (0.95 = 1 − 0.05)!

• Method 1: Use the t-statistic or the p-value. The absolute value of the t-statistic exceeds the
critical value 2.576, so that we reject the null hypothesis H0 : β1 = 0. This can also be seen
using the p-value, which is smaller than 0.01.

• Method 2: Re-run the regression with the option level(99), and Stata will compute the 99%
confidence interval. Again, we reject the null hypothesis because the 99% confidence interval
does not contain our null hypothesis, 0.
. reg testScore stuTeacherRatio expenPerStu, robust level(99)

---------------------------------------------------------------------------------
| Robust
testScore | Coef. Std. Err. t P>|t| [99% Conf. Interval]
----------------+----------------------------------------------------------------
stuTeacherRatio | -1.763149 .5920629 -2.98 0.003 -3.295213 -.2310853
expenPerStu | .0024835 .0018916 1.31 0.190 -.0024114 .0073784
_cons | 675.596 18.84424 35.85 0.000 626.8334 724.3587
---------------------------------------------------------------------------------
Hypothesis Testing for an Individual Coefficient

 Hypothesis testing with one-sided alternative. Let β̂j be an OLS estimate with standard
error se(β̂j ).

• We want to test H0 : βj ≤ c against H1 : βj > c.

• We use T = (β̂j − c) / se(β̂j ) as our test statistic.

• We use the following procedure: reject the null hypothesis if T > kα , where kα is called the critical value, computed from Φ(−kα ) = α.

      α    0.001   0.005   0.01    0.05    0.1
      kα   3.090   2.576   2.326   1.645   1.282

 Equivalently, we can use confidence intervals or p-values for hypothesis testing.

• CI1−α = [β̂j − kα se(β̂j ) , ∞), and reject the null hypothesis if c ∉ CI1−α .

• pVal = Φ(−T ), and reject the null hypothesis if pVal < α.
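A sketch of this one-sided procedure (Python for illustration; the function name is mine, the default critical value is the α = 0.05 entry from the table, and the estimates used below come from the slides):

```python
from math import erf, sqrt

Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))  # standard normal CDF

def one_sided_test_upper(bhat, se, c=0.0, k_alpha=1.645):
    """Test H0: beta_j <= c against H1: beta_j > c (default alpha = 0.05)."""
    T = (bhat - c) / se
    pval = Phi(-T)
    return T, pval, T > k_alpha

# H0: beta1 <= -2 for stuTeacherRatio, using the slide's estimates
T, pval, reject = one_sided_test_upper(-1.763149, 0.5920629, c=-2)
print(round(T, 3), round(pval, 3), reject)   # T ~ 0.400, p ~ 0.345, do not reject
```

Note the rejection rule uses T (not |T|): only large positive values of T are evidence against H0 : βj ≤ c.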



Hypothesis Testing for an Individual Coefficient

. reg testScore stuTeacherRatio expenPerStu, robust

---------------------------------------------------------------------------------
| Robust
testScore | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
stuTeacherRatio | -1.763149 .5920629 -2.98 0.003 -2.926949 -.5993493
expenPerStu | .0024835 .0018916 1.31 0.190 -.0012348 .0062018
_cons | 675.596 18.84424 35.85 0.000 638.5545 712.6376
---------------------------------------------------------------------------------

 EXERCISE. Test H0 : β1 ≤ −2 with significance level 5% (i.e., α = 0.05).


Note that we cannot directly use the t statistic or the p-value, because they were
computed for the null hypothesis H0 : β1 = 0!

• How to interpret this null hypothesis? A one-unit reduction in the student-to-teacher ratio leads to at least a two-unit increase in test score.

• Compute the t-statistic as (−1.763149 − (−2))/0.5920629 = 0.400, which does not exceed the critical value 1.645. Therefore, we do not reject the hypothesis H0 : β1 ≤ −2. You can also compute the p-value and compare it to 0.05.

• Unfortunately, Stata does not compute one-sided confidence intervals or p-values for one-sided
tests.



Hypothesis Testing for an Individual Coefficient

 Hypothesis testing with one-sided alternative. Let β̂j be an OLS estimate with standard
error se(β̂j ).

• We want to test H0 : βj ≥ c against H1 : βj < c.

• We use T = (β̂j − c) / se(β̂j ) as our test statistic.

• We use the following procedure: reject the null hypothesis if T < kα , where kα is called the critical value, computed from Φ(kα ) = α.

      α    0.001    0.005    0.01     0.05     0.1
      kα   -3.090   -2.576   -2.326   -1.645   -1.282

 Equivalently, we can use confidence intervals or p-values for hypothesis testing.

• CI1−α = (−∞ , β̂j − kα se(β̂j )], and reject the null hypothesis if c ∉ CI1−α .

• pVal = Φ(T ), and reject the null hypothesis if pVal < α.



Hypothesis Testing for an Individual Coefficient
. reg testScore stuTeacherRatio expenPerStu, robust

---------------------------------------------------------------------------------
| Robust
testScore | Coef. Std. Err. t P>|t| [95% Conf. Interval]
----------------+----------------------------------------------------------------
stuTeacherRatio | -1.763149 .5920629 -2.98 0.003 -2.926949 -.5993493
expenPerStu | .0024835 .0018916 1.31 0.190 -.0012348 .0062018
_cons | 675.596 18.84424 35.85 0.000 638.5545 712.6376
---------------------------------------------------------------------------------

 EXERCISE. Test H0 : β1 ≥ 0 with significance level 1% (i.e., α = 0.01).


We can use the t statistic (because our null hypothesis is 0), but not the p-value
(because the p-value was computed for a two-sided test).

• How to interpret this null hypothesis? Smaller class size does not help improve test score.

• Compute the t-statistic as (−1.763149 − 0)/0.5920629 = −2.978, which falls below the critical value −2.326. Therefore, we reject the hypothesis, and conclude that there is statistical evidence suggesting students perform better in smaller classes. You can also compute the p-value and compare it to 0.01.

• Unfortunately, Stata does not compute one-sided confidence intervals or p-values for one-sided
tests.
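As a quick check of this rejection rule in Python (the estimate and standard error come from the Stata output; the critical value is the α = 0.01 entry from the table above):

```python
bhat, se = -1.763149, 0.5920629
T = (bhat - 0) / se       # t-statistic for c = 0
k_alpha = -2.326          # one-sided 1% critical value, from Phi(k) = 0.01
print(round(T, 3), T < k_alpha)   # -2.978 True, so reject H0: beta1 >= 0
```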

 EXERCISE. Interpret the hypothesis H0 : β1 ≥ −1, and test with significance level 1%
(i.e., α = 0.01).
The lectures and course materials, including slides, tests, outlines, and similar
materials, are protected by U.S. copyright law and by University policy. You may take
notes and make copies of course materials for your own use. You may also share those
materials with another student who is enrolled in or auditing this course.

You may not reproduce, distribute or display (post/upload) lecture notes or recordings or course materials in any other way – whether or not a fee is charged – without my written consent. You also may not allow others to do so.

If you do so, you may be subject to student conduct proceedings under the UC San
Diego Student Code of Conduct.

© Xinwei Ma 2021
x1ma@ucsd.edu

