02 Advanced Applied Econometrics
Multiple hypothesis testing. Linear and non-linear hypotheses. Confidence intervals. Delta method.
Jakub Mućk
SGH Warsaw School of Economics
Jakub Mućk Advanced Applied Econometrics Testing economic hypotheses Least squares estimator 2 / 36
Multiple regression
y = β0 + β1 x1 + β2 x2 + . . . + βK xK + ε (1)
where
- y is the (outcome) dependent variable;
- x1 , x2 , . . . , xK is the set of independent variables;
- ε is the error term.
The dependent variable is explained by the components that vary with the independent variables and by the error term.
β0 is the intercept.
β1 , β2 , . . . , βK are the coefficients (slopes) on x1 , x2 , . . . , xK .
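As a sketch of how the coefficients in (1) are obtained by least squares — the simulated data, coefficient values, and variable names below are illustrative assumptions, not from the lecture:

```python
import numpy as np

# Simulate data for model (1) with K = 2 regressors (illustrative values)
rng = np.random.default_rng(0)
N = 200
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
eps = rng.normal(scale=0.1, size=N)      # error term
y = 1.0 + 2.0 * x1 - 0.5 * x2 + eps      # beta0 = 1, beta1 = 2, beta2 = -0.5

# Least squares: prepend a column of ones for the intercept beta0
X = np.column_stack([np.ones(N), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to [1.0, 2.0, -0.5]
```

With a small error variance and N = 200, the estimates land close to the true coefficients.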
Assumptions of the least squares estimators I
Assumption #1: true DGP (data generating process):
y = Xβ + ε. (2)
E (ε) = 0, (3)
E(Xε) = 0. (7)
Assumptions of the least squares estimators II
rank(X) = K + 1 ≤ N. (8)
ε ∼ N(0, σ²). (9)
Gauss-Markov Theorem
The least squares estimator
Statistical inference
Statistical inference
Statistical inference is the process of using sample data to deduce properties of underlying features of a population.
Statistical inference consists of:
- estimation of the underlying parameters,
- testing hypotheses.
Hypotheses testing
The test statistic and rejection region
Based on the value of a test statistic we decide either to reject the null hypothesis or not to reject it.
The rejection region consists of values of the test statistic that are unlikely, i.e., that have low probability of occurring when the null hypothesis is true.
The rejection region depends on:
- the distribution of the test statistic when the null is true;
- the alternative hypothesis;
- the level of significance.
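The three ingredients above can be sketched as a decision rule; the function name and the critical value passed in the usage line are illustrative assumptions:

```python
def reject_null(t_stat, t_crit, alternative="two-sided"):
    """Decide whether t_stat falls in the rejection region.

    t_crit is the critical value implied by the null distribution and
    the significance level; `alternative` selects the rejection region.
    """
    if alternative == "two-sided":
        return abs(t_stat) > t_crit   # both tails
    if alternative == "greater":
        return t_stat > t_crit        # right tail
    if alternative == "less":
        return t_stat < -t_crit       # left tail
    raise ValueError("unknown alternative")

# e.g. a two-sided test at the 5% level in a large sample (t_crit ~ 1.96)
print(reject_null(2.5, 1.96))  # True: 2.5 lies in the rejection region
```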
Type I & Type II error and significance level
Probability value (p-value)
Testing simple hypotheses
t-test and possible alternative hypotheses
One-tail test with alternative greater than
H0 : βi = c
H1 : βi > c
[Figure: density f(t) of the t-statistic with the rejection region in the right tail.]
One-tail test with alternative less than
H0 : βi = c
H1 : βi < c
[Figure: density f(t) of the t-statistic with the rejection region in the left tail.]
Two-tail test with alternative not equal to
H0 : βi = c
H1 : βi ≠ c
[Figure: density f(t) of the t-statistic with the rejection region split between both tails.]
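For the two-tail test, the p-value is the probability mass in both tails beyond |t|. A sketch using the standard normal approximation (reasonable for large N − K; with small samples the exact t-distribution should be used instead):

```python
from math import erf, sqrt

def two_tail_p(t_stat):
    """Two-tail p-value under a standard normal approximation."""
    phi = 0.5 * (1.0 + erf(abs(t_stat) / sqrt(2.0)))  # standard normal CDF
    return 2.0 * (1.0 - phi)                          # mass in both tails

print(round(two_tail_p(1.96), 3))  # ~ 0.05: the familiar 5% two-tail cutoff
```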
Linear combination of parameters I
A linear combination of parameters:
λ = c1 β1 + c2 β2 (16)
Its variance is var(λ̂) = c1² var(β̂1) + c2² var(β̂2) + 2 c1 c2 cov(β̂1, β̂2); therefore we can estimate the variance of λ̂ by replacing the variances and covariance with their (known) estimates.
If the assumption of error term normality holds, or if the sample is large, then:
t = (λ̂ − λ) / √var(λ̂) = (λ̂ − λ) / se(λ̂) ∼ tN−K . (22)
Based on the above formulation a variety of hypotheses can be tested. The null is typically:
H0 : λ = c1 β1 + c2 β2 = λ0 , (23)
while the possible alternative hypotheses are:
H1 : λ = c1 β1 + c2 β2 ≠ λ0 ,
H1 : λ = c1 β1 + c2 β2 < λ0 ,
H1 : λ = c1 β1 + c2 β2 > λ0 .
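The t-statistic in (22) for a null value λ0, with the estimated variances and covariance plugged in, can be sketched as follows (the numbers in the usage line are illustrative assumptions):

```python
from math import sqrt

def lincom_t(c1, c2, b1, b2, var1, var2, cov12, lam0):
    """t-statistic for H0: c1*beta1 + c2*beta2 = lam0 (eq. 22)."""
    lam_hat = c1 * b1 + c2 * b2
    # var(lam_hat) = c1^2 var(b1) + c2^2 var(b2) + 2 c1 c2 cov(b1, b2)
    se = sqrt(c1**2 * var1 + c2**2 * var2 + 2 * c1 * c2 * cov12)
    return (lam_hat - lam0) / se

# H0: beta1 + beta2 = 4 with estimates b1 = 2, b2 = 3,
# variances 0.25 each, and zero covariance
print(lincom_t(1, 1, 2.0, 3.0, 0.25, 0.25, 0.0, 4.0))  # ~ 1.414
```

The resulting value is then compared with the tN−K critical value implied by the chosen alternative.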
Confidence intervals
Point vs Interval estimation
Confidence intervals I
Under the assumption of normality of the error term, the least squares estimator β̂LS satisfies:
β̂LS ∼ N (β, Σ), (24)
where Σ is the variance-covariance matrix of the least squares estimator.
For illustrative purposes we focus on the slope parameter in the simple regression model (β̂1LS):
β̂1LS ∼ N ( β1 , σ² / ∑i (xi − x̄)² ), (25)
so that
Z = (β̂1LS − β1) / √( σ² / ∑i (xi − x̄)² ) ∼ N (0, 1). (26)
Confidence intervals II
Since P(−1.96 ≤ Z ≤ 1.96) = 0.95, we can substitute for Z:
P( −1.96 ≤ (β̂1LS − β1) / √( σ² / ∑i (xi − x̄)² ) ≤ 1.96 ) = 0.95. (28)
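Rearranging (28) for β1 gives the familiar large-sample 95% interval β̂1 ± 1.96 · se(β̂1). A minimal sketch — the point estimate and standard error below are illustrative assumptions, and with σ² unknown in a small sample the tN−K critical value replaces 1.96:

```python
def ci_95(b_hat, se):
    """Large-sample 95% confidence interval for a coefficient."""
    return (b_hat - 1.96 * se, b_hat + 1.96 * se)

lo, hi = ci_95(2.0, 0.5)
print(lo, hi)  # ~ (1.02, 2.98)
```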
Obtaining interval estimates
In simple regression, replacing σ² by its estimate σ̂² produces a random variable t:
t = (β̂1LS − β1) / √( σ̂² / ∑i (xi − x̄)² ) = (β̂1LS − β1) / √var̂(β̂1LS) = (β̂1LS − β1) / se(β̂1LS). (30)
Testing joint hypotheses
Testing joint hypotheses
A null hypothesis with multiple conjectures, expressed with more than one
equal sign, is called a joint hypothesis.
[Example] Wages (w) and experience (exper):
Restricted least squares estimator
Wald test I
The Wald test allows us to test a set of linear restrictions.
The F-statistic determines what constitutes a large or a small reduction in the sum of squared errors:
F = ( (SSER − SSEU ) / J ) / ( SSEU / (N − K) ), (36)
where:
- J is the number of restrictions,
- N is the number of observations,
- K is the number of coefficients in the unrestricted model,
- SSER is the sum of squared errors in the restricted model,
- SSEU is the sum of squared errors in the unrestricted model.
If the null is true then the F-statistic has an F-distribution with J numerator degrees of freedom and N − K denominator degrees of freedom.
The null is rejected when the difference between the sums of squared errors in the restricted model (SSER ) and in the unrestricted model (SSEU ) becomes large.
- In other words, the imposed restrictions significantly reduce the ability of the model to fit the data.
The F-test can be used in many applications:
- Testing economic hypotheses.
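Equation (36) translates directly into code; the SSE values, J, N, and K in the usage line are illustrative assumptions:

```python
def wald_f(sse_r, sse_u, J, N, K):
    """F-statistic for J linear restrictions (eq. 36).

    Compare the result against the F(J, N - K) critical value at the
    chosen significance level; a large F rejects the restrictions.
    """
    return ((sse_r - sse_u) / J) / (sse_u / (N - K))

# e.g. J = 2 restrictions, N = 100 observations, K = 5 coefficients
print(wald_f(1000.0, 800.0, 2, 100, 5))  # ~ 11.875
```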
Wald test II
W = JF, (37)
Testing the significance of the model
y = β0 + β1 x1 + . . . + βK xK + ε. (38)
Under the null hypothesis that all slope coefficients are zero, the restricted model is:
y = β0 + ε. (40)
Linear restriction in matrix form
General notation:
R β = q, (42)
where R is the J × (K + 1) matrix describing the linear restrictions and q is the vector of intercepts in each restriction.
Example #1: test of the overall significance of the regression model:

    R = | 0 1 0 . . . 0 |             | 0 |
        | 0 0 1 . . . 0 |   and   q = | 0 |
        | . . . . . . . |             | . |
        | 0 0 0 . . . 1 |             | 0 |
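For Example #1, R and q can be built programmatically; a sketch assuming K slope coefficients plus an intercept:

```python
import numpy as np

K = 3  # number of slope coefficients (illustrative)

# R is K x (K+1): a zero column for the intercept, then an identity
# block, so that R @ beta picks out (beta_1, ..., beta_K)
R = np.hstack([np.zeros((K, 1)), np.eye(K)])
q = np.zeros(K)

beta = np.array([5.0, 1.0, 2.0, 3.0])  # (beta_0, beta_1, beta_2, beta_3)
print(R @ beta)  # the slopes that R beta = q restricts to zero
```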
t and F statistics
Non-sample information
In many cases we have information over and above the information contained in the sample observations.
This non-sample information can be taken from, e.g., economic theory.
[Example] Production function. Consider the regression of logged output (y)
on logged capital (k) and logged labor input (l):
y = β0 + β1 k + β2 l + ε. (43)
Delta method
Delta method I
The delta method is a popular strategy for estimating the variance of a nonlinear function of the parameters.
Key assumption: g(β) is a nonlinear, continuously differentiable function of the parameters.
Taylor expansion around the true value of the parameters, i.e., β:
g(β̂) = g(β) + (∂g(β)/∂β)′ (β̂ − β) + o(‖β̂ − β‖), (45)
where
∂g(β)/∂β = ( ∂g/∂β1 , ∂g/∂β2 , . . . , ∂g/∂βK )′. (46)
After manipulation:
g(β̂) − g(β) = (∂g(β)/∂β)′ (β̂ − β) + o(‖β̂ − β‖). (47)
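A numerical sketch of the delta method: the gradient in (46) is approximated by central differences and the variance of g(β̂) is the quadratic form grad′ Σ grad implied by the linearization in (47). The function g and the covariance matrix in the usage lines are illustrative assumptions:

```python
def delta_method_var(g, beta_hat, Sigma, eps=1e-6):
    """Approximate var(g(beta_hat)) = grad' Sigma grad (delta method)."""
    K = len(beta_hat)
    grad = []
    for j in range(K):  # central-difference gradient of g at beta_hat
        bp, bm = list(beta_hat), list(beta_hat)
        bp[j] += eps
        bm[j] -= eps
        grad.append((g(bp) - g(bm)) / (2.0 * eps))
    # quadratic form grad' Sigma grad
    return sum(grad[i] * Sigma[i][j] * grad[j]
               for i in range(K) for j in range(K))

# e.g. g(beta) = beta1 * beta2 with an identity covariance matrix:
# gradient at (2, 3) is (3, 2), so the variance is 3^2 + 2^2 = 13
v = delta_method_var(lambda b: b[0] * b[1], [2.0, 3.0],
                     [[1.0, 0.0], [0.0, 1.0]])
print(v)  # ~ 13.0
```

In practice Σ would be the estimated variance-covariance matrix of β̂, and an analytic gradient can replace the numerical one when available.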