
Chapter 5

Regression with a
Single Regressor:
Hypothesis Tests
and Confidence
Intervals

Copyright © 2011 Pearson Addison-Wesley. All rights reserved.


Outline

1. The standard error of β̂1
2. Hypothesis tests concerning β1
3. Confidence intervals for β1
4. Regression when X is binary
5. Heteroskedasticity and homoskedasticity

A big picture view of where we are going…

We want to learn about the slope of the population regression
line. We have data from a sample, so there is sampling
uncertainty. There are five steps towards this goal:
1. State the population object of interest
2. Provide an estimator of this population object
3. Derive the sampling distribution of the estimator (this
requires certain assumptions). In large samples this
sampling distribution will be normal by the CLT.
4. The square root of the estimated variance of the sampling
distribution is the standard error (SE) of the estimator
5. Use the SE to construct t-statistics (for hypothesis tests)
and confidence intervals.

Object of interest: β1 in

Yi = β0 + β1Xi + ui, i = 1,…, n

β1 = ΔY/ΔX, for an autonomous change in X (causal effect)

Estimator: the OLS estimator β̂1.

The Sampling Distribution of β̂1:

To derive the large-sample distribution of β̂1, we make the
following assumptions:

The Least Squares Assumptions:

1. E(u|X = x) = 0.
2. (Xi, Yi), i = 1,…, n, are i.i.d.
3. Large outliers are rare (E(X⁴) < ∞, E(Y⁴) < ∞).
The Sampling Distribution of β̂1, ctd.

Under the Least Squares Assumptions, for n large, β̂1 is
approximately normally distributed:

β̂1 ~ N( β1, σ²v / (n(σ²X)²) ), where vi = (Xi − μX)ui

Hypothesis Testing and the Standard Error of β̂1
(Section 5.1)

The objective is to test a hypothesis, like β1 = 0, using
data – to reach a tentative conclusion whether the (null)
hypothesis is correct or incorrect.

General setup
Null hypothesis and two-sided alternative:
H0: β1 = β1,0 vs. H1: β1 ≠ β1,0
where β1,0 is the hypothesized value under the null.

Null hypothesis and one-sided alternative:


H0: β1 ≥ β1,0 vs. H1: β1 < β1,0

General approach: construct t-statistic, and compute p-
value (or compare to the N(0,1) critical value)

• In general:

t = (estimator − hypothesized value) / (standard error of the estimator)

where the SE of the estimator is the square root of the
variance of the estimator.

• For testing the mean of Y:

t = (Ȳ − μY,0) / (sY/√n)

• For testing β1:

t = (β̂1 − β1,0) / SE(β̂1),

where SE(β̂1) = the square root of an estimator of the
variance of the sampling distribution of β̂1.
Formula for SE(β̂1)

Recall the expression for the variance of β̂1 (large n):

var(β̂1) = var[(Xi − μX)ui] / (n(σ²X)²) = σ²v / (n(σ²X)²), where vi = (Xi − μX)ui.

The estimator of the variance of β̂1 replaces the unknown
population values of σ²v and σ²X by estimators constructed
from the data:

σ̂²β̂1 = (1/n) × [ (1/(n−2)) Σ v̂i² ] / [ (1/n) Σ (Xi − X̄)² ]²

where v̂i = (Xi − X̄)ûi.

σ̂²β̂1 = (1/n) × [ (1/(n−2)) Σ v̂i² ] / [ (1/n) Σ (Xi − X̄)² ]², where v̂i = (Xi − X̄)ûi.

SE(β̂1) = √σ̂²β̂1 = the standard error of β̂1

This is a bit nasty, but:

• It is less complicated than it seems: the numerator
estimates var(v), the denominator estimates [var(X)]².
• Why the degrees-of-freedom adjustment n − 2? Because
two coefficients have been estimated (β0 and β1).
• SE(β̂1) is computed by regression software, so you don't
need to memorize the formula.
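As a rough numerical check, the formula can be coded directly. This is a sketch on simulated data (the data-generating process and variable names are invented for illustration), assuming the heteroskedasticity-robust variance formula above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(10.0, 2.0, n)
u = rng.normal(0.0, 1.0, n) * (0.5 + 0.1 * np.abs(x - 10.0))  # heteroskedastic errors
y = 5.0 + 2.0 * x + u                                         # true beta1 = 2

# OLS estimates and residuals
xd = x - x.mean()
b1 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
b0 = y.mean() - b1 * x.mean()
uhat = y - b0 - b1 * x

# SE(b1): numerator estimates var(v), denominator estimates [var(X)]^2
vhat = xd * uhat
var_b1_hat = (np.sum(vhat ** 2) / (n - 2)) / (n * np.mean(xd ** 2) ** 2)
se_b1 = np.sqrt(var_b1_hat)
```

With n = 1000 the estimated slope lands close to the true value of 2, and se_b1 quantifies the remaining sampling uncertainty.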

Summary: To test H0: β1 = β1,0 vs. H1: β1 ≠ β1,0,

• Construct the t-statistic

t = (β̂1 − β1,0) / SE(β̂1) = (β̂1 − β1,0) / √σ̂²β̂1

• Reject at the 5% significance level if |t| > 1.96
• The p-value is p = Pr[|t| > |t^act|] = probability in the tails of the
standard normal distribution outside |t^act|; reject at the 5%
significance level if the p-value is < 5%.
• This procedure relies on the large-n approximation that β̂1 is
normally distributed; typically n = 50 is large enough for the
approximation to be excellent.
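A minimal sketch of this recipe (the function name is ours; it uses only the large-n standard normal approximation described above):

```python
import math

def t_and_pvalue(estimate, hypothesized, se):
    """t-statistic and two-sided p-value under the large-n normal approximation."""
    t = (estimate - hypothesized) / se
    p = math.erfc(abs(t) / math.sqrt(2.0))  # = 2 * (1 - Phi(|t|))
    return t, p

# Using the chapter's test-score numbers: beta1_hat = -2.28, SE = 0.52
t, p = t_and_pvalue(-2.28, 0.0, 0.52)
# |t| = 4.38 > 1.96, so H0: beta1 = 0 is rejected at the 5% level
```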

Example: Test Scores and STR, California data

Estimated regression line: TestScore = 698.9 − 2.28×STR

Regression software reports the standard errors:

SE(β̂0) = 10.4   SE(β̂1) = 0.52

t-statistic testing β1,0 = 0:

t = (β̂1 − β1,0)/SE(β̂1) = (−2.28 − 0)/0.52 = −4.38

• The 1% two-sided critical value is 2.58, so we reject the null at
the 1% significance level.
• Alternatively, we can compute the p-value…

The p-value based on the large-n standard normal
approximation to the t-statistic is 0.00001 (10⁻⁵)

Confidence Intervals for β1
(Section 5.2)

Recall that a 95% confidence interval is, equivalently:

• The set of points that cannot be rejected at the 5%
significance level;
• A set-valued function of the data (an interval that is a
function of the data) that contains the true parameter value
95% of the time in repeated samples.

Because the t-statistic for β1 is N(0,1) in large samples,
construction of a 95% confidence interval for β1 is just like
the case of the sample mean:

95% confidence interval for β1 = {β̂1 ± 1.96×SE(β̂1)}

Confidence interval example: Test Scores and STR

Estimated regression line: TestScore = 698.9 − 2.28×STR

SE(β̂0) = 10.4   SE(β̂1) = 0.52

95% confidence interval for β1:

{β̂1 ± 1.96×SE(β̂1)} = {−2.28 ± 1.96×0.52}
= (−3.30, −1.26)

The following two statements are equivalent (why?)

• The 95% confidence interval does not include zero;
• The hypothesis β1 = 0 is rejected at the 5% level
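A one-line version of this interval, using the chapter's numbers (the function name is ours), which also makes the equivalence of the two statements above concrete:

```python
def ci95(beta_hat, se):
    """Large-sample 95% confidence interval for a regression coefficient."""
    return (beta_hat - 1.96 * se, beta_hat + 1.96 * se)

lo, hi = ci95(-2.28, 0.52)
# The interval excludes 0 exactly when |t| > 1.96, i.e. when H0: beta1 = 0 is rejected
rejects_zero = not (lo <= 0.0 <= hi)
```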

A concise (and conventional) way to report regressions:
Put standard errors in parentheses below the estimated
coefficients to which they apply.

TestScore = 698.9 − 2.28×STR, R² = .05, SER = 18.6
           (10.4)  (0.52)

This expression gives a lot of information:

• The estimated regression line is
TestScore = 698.9 − 2.28×STR
• The standard error of β̂0 is 10.4
• The standard error of β̂1 is 0.52
• The R² is .05; the standard error of the regression is 18.6

OLS regression: reading STATA output

regress testscr str, robust

Regression with robust standard errors          Number of obs =    420
                                                F(  1,   418) =  19.26
                                                Prob > F      = 0.0000
                                                R-squared     = 0.0512
                                                Root MSE      = 18.581
-------------------------------------------------------------------------
         |               Robust
 testscr |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
---------+---------------------------------------------------------------
     str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
   _cons |    698.933   10.36436    67.44   0.000     678.5602    719.3057
-------------------------------------------------------------------------

so:
TestScore = 698.9 − 2.28×STR, R² = .05, SER = 18.6
           (10.4)  (0.52)
t (β1 = 0) = −4.38, p-value = 0.000 (2-sided)
95% 2-sided conf. interval for β1 is (−3.30, −1.26)
Summary of statistical inference about β0 and β1

Estimation:
• OLS estimators β̂0 and β̂1
• β̂0 and β̂1 have approximately normal sampling distributions in
large samples
Testing:
• H0: β1 = β1,0 vs. H1: β1 ≠ β1,0 (β1,0 is the value of β1 under H0)
• t = (β̂1 − β1,0)/SE(β̂1)
• p-value = area under the standard normal outside t^act (large n)
Confidence Intervals:
• 95% confidence interval for β1 is {β̂1 ± 1.96×SE(β̂1)}
• This is the set of β1 that is not rejected at the 5% level
• The 95% CI contains the true β1 in 95% of all samples.

Regression when X is Binary
(Section 5.3)

Sometimes a regressor is binary:


• X = 1 if small class size, = 0 if not
• X = 1 if female, = 0 if male
• X = 1 if treated (experimental drug), = 0 if not

Binary regressors are sometimes called “dummy” variables.

So far, β1 has been called a “slope,” but that doesn’t make
sense if X is binary.

How do we interpret regression with a binary regressor?

Interpreting regressions with a binary regressor
Yi = β0 + β1Xi + ui, where X is binary (Xi = 0 or 1):

When Xi = 0, Yi = β0 + ui
• the mean of Yi is β0
• that is, E(Yi|Xi=0) = β0

When Xi = 1, Yi = β0 + β1 + ui
• the mean of Yi is β0 + β1
• that is, E(Yi|Xi=1) = β0 + β1

so:
β1 = E(Yi|Xi=1) – E(Yi|Xi=0)
= population difference in group means

Example: Let Di = 1 if STRi < 20, and Di = 0 if STRi ≥ 20

OLS regression: TestScore = 650.0 + 7.4×D
                           (1.3)   (1.8)

Tabulation of group means:

Class Size          Average score (Ȳ)   Std. dev. (sY)    N
Small (STR < 20)         657.4               19.4        238
Large (STR ≥ 20)         650.0               17.9        182

Difference in means: Ȳsmall − Ȳlarge = 657.4 − 650.0 = 7.4

Standard error: SE = √(s²s/ns + s²l/nl) = √(19.4²/238 + 17.9²/182) = 1.8
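The equivalence between this regression and a difference in group means can be checked numerically. A sketch with simulated scores (the data are invented; the group sizes and moments mimic the table above):

```python
import numpy as np

rng = np.random.default_rng(1)
y_small = rng.normal(657.4, 19.4, 238)   # districts with STR < 20
y_large = rng.normal(650.0, 17.9, 182)   # districts with STR >= 20
y = np.concatenate([y_large, y_small])
d = np.concatenate([np.zeros(182), np.ones(238)])  # D = 1 if small class

# OLS with the binary regressor D
dd = d - d.mean()
b1 = np.sum(dd * (y - y.mean())) / np.sum(dd ** 2)
b0 = y.mean() - b1 * d.mean()

# b0 is exactly the large-class group mean; b1 is exactly the difference in means
```

The match is exact (up to floating-point error), not just approximate: with a binary regressor, the OLS fitted values are the two group means.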
Summary: regression when Xi is binary (0/1)

Yi = β0 + β1Xi + ui

• β0 = mean of Y when X = 0
• β0 + β1 = mean of Y when X = 1
• β1 = difference in group means, X =1 minus X = 0
• SE(β̂1) has the usual interpretation
• t-statistics, confidence intervals constructed as usual
• This is another way (an easy way) to do difference-in-means
analysis
• The regression formulation is especially useful when we have
additional regressors (as we will very soon)

Heteroskedasticity and Homoskedasticity, and
Homoskedasticity-Only Standard Errors
(Section 5.4)

1. What…?
2. Consequences of homoskedasticity
3. Implication for computing standard errors

What do these two terms mean?


If var(u|X=x) is constant – that is, if the variance
of the conditional distribution of u given X does
not depend on X – then u is said to be
homoskedastic. Otherwise, u is
heteroskedastic.

Example: hetero/homoskedasticity in the case of a binary
regressor (that is, the comparison of means)

• Standard error when group variances are unequal:

SE = √(s²s/ns + s²l/nl)

• Standard error when group variances are equal:

SE = sp × √(1/ns + 1/nl)

where s²p = [(ns − 1)s²s + (nl − 1)s²l] / (ns + nl − 2)   (SW, Sect 3.6)

s²p = “pooled estimator of σ²” when σ²l = σ²s

• Equal group variances = homoskedasticity
• Unequal group variances = heteroskedasticity
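Both standard errors are easy to compute directly; a sketch (function names ours), plugged into the class-size numbers from the earlier slide:

```python
import math

def se_unequal(s_s, n_s, s_l, n_l):
    """SE of a difference in means allowing unequal group variances."""
    return math.sqrt(s_s**2 / n_s + s_l**2 / n_l)

def se_pooled(s_s, n_s, s_l, n_l):
    """Pooled SE; valid only if the two group variances are equal."""
    sp2 = ((n_s - 1) * s_s**2 + (n_l - 1) * s_l**2) / (n_s + n_l - 2)
    return math.sqrt(sp2 * (1 / n_s + 1 / n_l))

se_u = se_unequal(19.4, 238, 17.9, 182)  # ~1.8, as reported above
se_p = se_pooled(19.4, 238, 17.9, 182)
```

Here the two group standard deviations (19.4 and 17.9) are close, so the two formulas give similar answers; they diverge when the group variances differ substantially.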
Homoskedasticity in a picture:

• E(u|X=x) = 0 (u satisfies Least Squares Assumption #1)


• The variance of u does not depend on x

A real-data example from labor economics: average hourly
earnings vs. years of education (data source: Current Population
Survey):

Heteroskedastic or homoskedastic?
The class size data:

Heteroskedastic or homoskedastic?
So far we have (without saying so)
assumed that u might be heteroskedastic.

Recall the three least squares assumptions:


1. E(u|X = x) = 0

2. (Xi,Yi), i =1,…,n, are i.i.d.

3. Large outliers are rare

Heteroskedasticity and homoskedasticity concern


var(u|X=x). Because we have not explicitly
assumed homoskedastic errors, we have implicitly
allowed for heteroskedasticity.

What if the errors are in fact homoskedastic?

• You can prove that OLS has the lowest variance among
estimators that are linear in Y… a result called the Gauss-
Markov theorem that we will return to shortly.
• The formula for the variance of β̂1 and the OLS standard
error simplifies: if var(ui|Xi = x) = σ²u, then

var(β̂1) = var[(Xi − μX)ui] / (n(σ²X)²)   (general formula)
         = σ²u / (nσ²X)                   (simplification if u is homoskedastic)

Note: var(β̂1) is inversely proportional to var(X): more
spread in X means more information about β̂1 – we
discussed this earlier but it is clearer from this formula.
• Along with this homoskedasticity-only formula for
the variance of β̂1, we have homoskedasticity-only
standard errors:

Homoskedasticity-only standard error formula:

SE(β̂1) = √[ (1/n) × ( (1/(n−2)) Σ ûi² ) / ( (1/n) Σ (Xi − X̄)² ) ]

Some people (e.g. Excel programmers) find the
homoskedasticity-only formula simpler – but it is
wrong unless the errors really are homoskedastic.
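The two formulas can be compared on data that is deliberately heteroskedastic. This is a sketch (the data-generating process is invented); on this design the error variance rises with (Xi − X̄)², so the robust SE comes out larger:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0.0, 10.0, n)
u = rng.normal(0.0, 1.0, n) * (x - 5.0) ** 2   # error variance largest at extreme x
y = 1.0 + 0.5 * x + u                          # true beta1 = 0.5

xd = x - x.mean()
b1 = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
b0 = y.mean() - b1 * x.mean()
uhat = y - b0 - b1 * x

# Homoskedasticity-only SE: s_u^2 / (n * sample var of X)
se_homo = np.sqrt((np.sum(uhat ** 2) / (n - 2)) / (n * np.mean(xd ** 2)))
# Heteroskedasticity-robust SE: est. var(v) / (n * [sample var of X]^2)
se_robust = np.sqrt((np.sum((xd * uhat) ** 2) / (n - 2)) / (n * np.mean(xd ** 2) ** 2))
```

Using se_homo here would understate the sampling uncertainty, which is exactly the danger described in the next slides.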

We now have two formulas for standard
errors for β̂1.

• Homoskedasticity-only standard errors – these are valid
only if the errors are homoskedastic.

• The usual standard errors – to differentiate the two, it is
conventional to call these heteroskedasticity-robust
standard errors, because they are valid whether or not the
errors are heteroskedastic.

• The main advantage of the homoskedasticity-only standard
errors is that the formula is simpler. But the disadvantage is
that the formula is only correct if the errors are
homoskedastic.

Practical implications…

• The homoskedasticity-only formula for the standard error of
β̂1 and the “heteroskedasticity-robust” formula differ – so in
general, you get different standard errors using the different
formulas.

• Homoskedasticity-only standard errors are the default
setting in regression software – sometimes the only setting
(e.g. Excel). To get the general “heteroskedasticity-robust”
standard errors you must override the default.

• If you don’t override the default and there is in fact
heteroskedasticity, your standard errors (and t-
statistics and confidence intervals) will be wrong –
typically, homoskedasticity-only SEs are too small.

Heteroskedasticity-robust standard
errors in STATA

regress testscr str, robust

Regression with robust standard errors          Number of obs =    420
                                                F(  1,   418) =  19.26
                                                Prob > F      = 0.0000
                                                R-squared     = 0.0512
                                                Root MSE      = 18.581
-------------------------------------------------------------------------
         |               Robust
 testscr |      Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
---------+---------------------------------------------------------------
     str |  -2.279808   .5194892    -4.39   0.000    -3.300945   -1.258671
   _cons |    698.933   10.36436    67.44   0.000     678.5602    719.3057
-------------------------------------------------------------------------

• If you use the “, robust” option, STATA computes heteroskedasticity-robust
standard errors
• Otherwise, STATA computes homoskedasticity-only standard errors

The bottom line:

• If the errors are either homoskedastic or heteroskedastic and
you use heteroskedasticity-robust standard errors, you are OK.

• If the errors are heteroskedastic and you use the
homoskedasticity-only formula for standard errors, your
standard errors will be wrong (the homoskedasticity-only
estimator of the variance of β̂1 is inconsistent if there is
heteroskedasticity).

• The two formulas coincide (when n is large) in the special
case of homoskedasticity.

• So, you should always use heteroskedasticity-robust
standard errors.

The Extended Least Squares Assumptions

These consist of the three LS assumptions, plus two more:

1. E(u|X = x) = 0.
2. (Xi, Yi), i = 1,…, n, are i.i.d.
3. Large outliers are rare (E(Y⁴) < ∞, E(X⁴) < ∞).
4. u is homoskedastic.
5. u is distributed N(0, σ²).

• Assumptions 4 and 5 are more restrictive – so they apply to
fewer cases in practice. However, if you make these
assumptions, then certain mathematical calculations simplify
and you can prove strong results – results that hold if these
additional assumptions are true.
• We start with a discussion of the efficiency of OLS.

Efficiency of OLS, part I: The Gauss-
Markov Theorem

Under extended LS assumptions 1–4 (the basic
three, plus homoskedasticity), β̂1 has the smallest
variance among all linear estimators (estimators that
are linear functions of Y1,…, Yn). This is the Gauss-
Markov theorem.

Comments
• The GM theorem is proven in SW Appendix 5.2

The Gauss-Markov Theorem, ctd.

• β̂1 is a linear estimator, that is, it can be written
as a linear function of Y1,…, Yn:

β̂1 − β1 = [ Σ (Xi − X̄)ui ] / [ Σ (Xi − X̄)² ] = (1/n) Σ wi ui,

where wi = (Xi − X̄) / [ (1/n) Σ (Xi − X̄)² ].

• The G-M theorem says that among all possible
choices of {wi}, the OLS weights yield the smallest
var(β̂1).
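That β̂1 is a linear function of the Yi can be verified numerically; a sketch on simulated data (the data-generating process is invented):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(0.0, 1.0, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, n)

xd = x - x.mean()
# Standard OLS slope formula
b1_direct = np.sum(xd * (y - y.mean())) / np.sum(xd ** 2)
# The same estimator written as a fixed-weight linear function of Y1,...,Yn
w = xd / np.sum(xd ** 2)   # weights depend only on the X's
b1_linear = np.sum(w * y)  # equal to b1_direct because the weights sum to zero
```

The two expressions agree because Σ w = 0, so subtracting Ȳ from each Yi changes nothing.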
Efficiency of OLS, part II:

• Under all five extended LS assumptions – including normally
distributed errors – β̂1 has the smallest variance of all
consistent estimators (linear or nonlinear functions of
Y1,…, Yn), as n → ∞.

• This is a pretty amazing result – it says that, if (in addition
to LSA 1–3) the errors are homoskedastic and normally
distributed, then OLS is a better choice than any other
consistent estimator. And because an estimator that isn’t
consistent is a poor choice, this says that OLS really is the
best you can do – if all five extended LS assumptions hold.

Some not-so-good things about OLS

The foregoing results are impressive, but these results – and
the OLS estimator – have important limitations.
1. The GM theorem really isn’t that compelling:
– The condition of homoskedasticity often doesn’t hold
(homoskedasticity is special)
– The result is only for linear estimators – only a small subset of
estimators (more on this in a moment)

2. The strongest optimality result (“part II” above) requires
homoskedastic normal errors – not plausible in applications
(think about the hourly earnings data!)

Limitations of OLS, ctd.

3. OLS is more sensitive to outliers than some other
estimators. In the case of estimating the population mean,
if there are big outliers, then the median is preferred to the
mean because the median is less sensitive to outliers – it
has a smaller variance than OLS when there are outliers.
Similarly, in regression, OLS can be sensitive to outliers,
and if there are big outliers other estimators can be more
efficient (have a smaller variance). One such estimator is
the least absolute deviations (LAD) estimator:

min over b0, b1 of  Σ |Yi − (b0 + b1Xi)|

In virtually all applied regression analysis, OLS is used – and
that is what we will do in this course too.
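The mean-vs-median comparison in point 3 is easy to see numerically (the numbers are invented toy data):

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
contaminated = np.append(data, 1000.0)  # add one big outlier

mean_shift = contaminated.mean() - data.mean()            # the mean moves a lot
median_shift = np.median(contaminated) - np.median(data)  # the median barely moves
```

A single outlier drags the mean far from the bulk of the data, while the median shifts only slightly; LAD regression inherits the median's robustness in the same way that OLS inherits the mean's sensitivity.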

Inference if u is homoskedastic and normally distributed:
the Student t distribution (Section 5.6)

Recall the five extended LS assumptions:

1. E(u|X = x) = 0.
2. (Xi, Yi), i = 1,…, n, are i.i.d.
3. Large outliers are rare (E(Y⁴) < ∞, E(X⁴) < ∞).
4. u is homoskedastic.
5. u is distributed N(0, σ²).

If all five assumptions hold, then:

• β̂0 and β̂1 are normally distributed for all n (!)
• the t-statistic has a Student t distribution with n − 2 degrees
of freedom – this holds exactly for all n (!)

Summary and Assessment (Section 5.7)

• The initial policy question:
Suppose new teachers are hired so the student-teacher ratio
falls by one student per class. What is the effect of this
policy intervention (“treatment”) on test scores?
• Does our regression analysis using the California data set
answer this convincingly?
Not really – districts with low STR tend to be ones with lots
of other resources and higher income families, which provide
kids with more learning opportunities outside school…this
suggests that corr(ui, STRi) < 0, so E(ui|Xi) ≠ 0.
• It seems that we have omitted some factors, or variables,
from our analysis, and this has biased our results...

