
Introductory Econometrics
Based on the textbook by Wooldridge:
Introductory Econometrics: A Modern Approach

Robert M. Kunst
robert.kunst@univie.ac.at

University of Vienna
and
Institute for Advanced Studies Vienna

October 5, 2018


Outline
Introduction
Simple linear regression
Multiple linear regression
  OLS in the multiple linear regression
  Statistical properties of OLS
  Inference in the multiple model
  OLS asymptotics
  Selection of regressors in the multiple model
Heteroskedasticity
Regressions with time-series observations
Serial correlation in time-series regression
Instrumental variables estimation

OLS in the multiple linear regression

A multiple linear regression model with two regressors

The simplest multiple linear model is one in which a dependent variable y depends on two explanatory variables x1 and x2, for example wages on education and work experience:

y = β0 + β1 x1 + β2 x2 + u,

where the slope β1 measures the reaction of y to a marginal change in x1 keeping x2 fixed (ceteris paribus), i.e. ∂y/∂x1. Often, regressors are closely related, and the ceteris paribus idea becomes problematic.


The multiple linear regression model


In the general multiple linear regression model, y is regressed on
k regressors

y = β0 + β1 x1 + β2 x2 + . . . + βk xk + u,

with an intercept β0 and k slope parameters (coefficients) βj, 1 ≤ j ≤ k. Again, for the error term u, it will be assumed that E(u|x1, . . . , xk) = 0.
The multiple linear regression is the most important statistical
model in econometrics. Note that ‘multiple’ should not be replaced
by ‘multivariate’: multivariate regression lets a vector of variables
y1 , . . . , yg depend on a vector of regressors. Here, y is just a scalar
dependent variable.

OLS in the multiple model

In order to generalize the idea of OLS estimation to the multiple model, one would minimize

∑_{i=1}^n (yi − β0 − β1 xi1 − . . . − βk xik)²

in β0, β1, . . . , βk, and call the minimizing values β̂0, β̂1, . . . , β̂k.


Note the two subscripts (i, k): the first subscript i denotes the
observation, the second one k the variable number. This is the
Cowles Commission convention; (k, i) is slightly less popular and it
is called ‘Dutch notation’.


Formally, the solution can be obtained by taking derivatives and solving a system of first-order conditions

∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − . . . − β̂k xik) = 0,
∑_{i=1}^n xi1 (yi − β̂0 − β̂1 xi1 − . . . − β̂k xik) = 0,
∑_{i=1}^n xi2 (yi − β̂0 − β̂1 xi1 − . . . − β̂k xik) = 0,
. . .
∑_{i=1}^n xik (yi − β̂0 − β̂1 xi1 − . . . − β̂k xik) = 0.

This system does not yield a nice closed form for the OLS
coefficients, unless matrix algebra is used.
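These conditions are easy to check numerically. The following is a minimal sketch (not from the slides; the simulated data and all numerical settings are assumptions for illustration) that fits OLS with numpy and verifies the first-order conditions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)     # assumed data-generating model

# Fit OLS and form the residuals u_hat = y - X beta_hat.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat

# First-order conditions as sample moment conditions:
print(u_hat.sum())         # ~0: residuals have sample mean zero (intercept condition)
print(X[:, 1:].T @ u_hat)  # ~0 vector: residuals orthogonal to every regressor
```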

Interpreting the OLS first-order conditions

Just like in the simple regression model, the first-order conditions have a method-of-moments interpretation:
- The condition for the intercept β̂0 says that the sample mean of the OLS residuals is 0. This corresponds to the population moment condition E(u) = 0;
- Each condition for a slope coefficient β̂j says that the sample covariance (and hence correlation) between the residuals and the regressor xj is 0. This corresponds to the population condition that the regressors and errors are uncorrelated.


Multiple linear regression in matrix form


Presume all values for the dependent variable y and all regressors are written in a vector y and in an {n × (k + 1)}–matrix X:

    ⎛ y1 ⎞        ⎛ 1  x11  . . .  x1k ⎞
    ⎜ y2 ⎟        ⎜ 1  x21  . . .  x2k ⎟
y = ⎜ ⋮  ⎟ ,  X = ⎜ ⋮   ⋮          ⋮  ⎟
    ⎝ yn ⎠        ⎝ 1  xn1  . . .  xnk ⎠

In this notation, the regression model becomes y = Xβ + u, and the OLS estimates β̂ = (β̂0, β̂1, . . . , β̂k)′ can be written compactly as

β̂ = (X′X)⁻¹X′y.
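A minimal sketch of the closed form (again with simulated data that are an assumption of the example); solving the normal equations is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

# beta_hat = (X'X)^{-1} X'y, computed by solving (X'X) b = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
assert np.isclose(y_hat.mean(), y.mean())  # averages lie on the regression hyperplane
```

The final assertion anticipates the next slide: with an intercept included, the mean of the fitted values equals ȳ.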

Fitted values and residuals

Just as in simple regression, OLS estimation decomposes observed y into an explained part ŷ (the fitted value) and an unexplained part or residual û:

yi = β̂0 + β̂1 xi1 + . . . + β̂k xik + ûi = ŷi + ûi.

Because the sample mean of the residuals is 0, the sample mean of the fitted values is ȳ, and the averages lie on the regression ‘hyperplane’:

ȳ = β̂0 + β̂1 x̄1 + . . . + β̂k x̄k.


Simple and multiple linear regression coefficients

In most cases, the estimate β̂1 in a simple linear regression

y = β̂0 + β̂1 x1 + û

differs from the estimate β̂1 in a comparable multiple regression

y = β̂0 + β̂1 x1 + . . . + β̂k xk + û.

The coefficient estimates only coincide in special cases, such as cov(x1, xj) = 0 for all j ≠ 1 or β̂j = 0 for all j ≠ 1. Note that cov(x1, xj) = 0 is not the typical case: regressors are usually correlated with other regressor variables.


Simple and two-regressor regression: a property


Consider the simple regression

y = β̃0 + β̃1 x1 + ũ

and the regression of y on x1 and on an additional x2 :

y = β̂0 + β̂1 x1 + β̂2 x2 + û.

It is easily(?) shown that

β̃1 = β̂1 + β̂2 δ̂,

where δ̂ is the slope coefficient in a regression of x2 on x1. Clearly, β̃1 = β̂1 iff one of the two factors β̂2 and δ̂ is 0. Note: β̂1 is not necessarily ‘better’ or ‘more correct’ than β̃1.
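The identity β̃1 = β̂1 + β̂2 δ̂ holds exactly in any sample, which a short numerical sketch can confirm (simulated, deliberately correlated regressors; the data-generating values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)               # x2 correlated with x1
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

c = np.ones(n)
b_tilde = ols(np.column_stack([c, x1]), y)       # simple regression of y on x1
b_hat = ols(np.column_stack([c, x1, x2]), y)     # multiple regression of y on x1, x2
delta = ols(np.column_stack([c, x1]), x2)        # regression of x2 on x1

# beta_tilde_1 equals beta_hat_1 + beta_hat_2 * delta_hat exactly.
print(b_tilde[1], b_hat[1] + b_hat[2] * delta[1])
```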


Goodness of fit in the multiple model


The variance decomposition equation

∑_{i=1}^n (yi − ȳ)² = ∑_{i=1}^n (ŷi − ȳ)² + ∑_{i=1}^n ûi²,

or

SST = SSE + SSR,

continues to hold in the multiple regression model. Likewise,

R² = SSE/SST = 1 − SSR/SST

defines a descriptive statistic in the interval [0, 1] that measures the goodness of fit. Note, however, that R² is not the squared correlation of y and any xj but the maximum squared correlation coefficient of y and linear combinations of x1, . . . , xk.
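A minimal sketch of the decomposition and of R² (simulated data; all settings are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)
sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)           # residual sum of squares
assert np.isclose(sst, sse + ssr)        # the decomposition holds with an intercept
print(sse / sst, 1.0 - ssr / sst)        # two equal expressions for R^2
```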

Statistical properties of OLS

Assumptions for multiple linear regression

In order to establish OLS properties such as unbiasedness, model assumptions have to be formulated. The first assumption is the natural counterpart to (SLR.1), the linearity in parameters:

MLR.1 The population model can be written

y = β0 + β1 x1 + β2 x2 + . . . + βk xk + u,

with unknown coefficient parameters β1, . . . , βk, an intercept parameter β0, and unobserved random error u.


Assumption of random sampling

MLR.2 The data constitute a random sample of n observations {(xi1, . . . , xik, yi) : i = 1, . . . , n} of random variables corresponding to the population model (MLR.1).

Due to the random-sampling assumption (MLR.2), observations and also errors are independent across different i.


No multicollinearity
In simple regression, OLS requires some variation in the regressor
(should not be entirely constant). In multiple regression, more is
needed. The matrix X′ X must be invertible:
MLR.3 There are no exact linear relationships connecting the
regressor variables, and no regressor is constant in sample or
in population.

Assumption (MLR.3) implies n > k. When it holds in the population, violation in the sample happens with probability 0 for continuous random variables. (MLR.3) is violated if a regressor is the sum or difference of other regressors. It is not violated by nonlinear identities, such as x2 = x1².

Danger: do not misinterpret the multicollinearity condition

MLR.3 does not imply that the correlation among explanatory variables is assumed to be 0. Regressors are not assumed to be independent of each other.

It is true that estimates are less precise under strong correlation among regressors, and that ceteris paribus interpretations become more difficult. Nonetheless, none of the basic properties of the OLS estimator depends on regressors being independent or uncorrelated.


Zero conditional expectation

The assumption E(u|x) = 0 is just the natural generalization of the simple regression assumption:

MLR.4 The error u has an expected value of zero given any values of the regressors, in symbols

E(u|x1, x2, . . . , xk) = 0.

Again, (MLR.4) implies E(u) = 0, but it is stronger than that property. (MLR.4) also implies cov(xj, u) = 0 for all regressors xj.


Violations of assumption MLR.4


There are several reasons why (MLR.4) may not hold, in particular the ensuing condition cov(xj, u) = 0, which is often called an exogeneity condition. When it is violated, xj is called an endogenous regressor.
- If the true relationship is nonlinear, E(u|xj) ≠ 0 even though E(u) = 0;
- If an important influence factor has been omitted from the list of regressors (‘omitted variable bias’), (MLR.4) is formally violated. The researcher must decide whether she wishes to estimate the regression without or with the doubtful control;
- If there is logical ‘feedback’ from y to some xj, then u and xj are correlated, xj is endogenous, and regression yields biased estimates of the true relationship. This case must be handled by special techniques (instrumental variables).


Unbiasedness of OLS

(MLR.1) to (MLR.4) suffice for unbiasedness:

Theorem
Under assumptions (MLR.1)–(MLR.4),

E(β̂j) = βj, j = 0, 1, . . . , k,

for any values of the parameters βj.

In words, OLS is an unbiased estimator for the intercept and all coefficients. In short, one may write E(β̂) = β, using the notation β for the (k + 1)–vector (β0, β1, . . . , βk)′ and a corresponding notation for the expectation operator.


Scylla and Charybdis


How many regressors should be included in a multiple regression?
- Omitting influential regressors (too low a k) tends to overstate the effects: effects due to the omitted variables are attributed to the included regressors (‘omitted variable bias’). Conversely, the effect of, say, a difference between two regressors may not be found if only one of them is included;
- ‘Profligate’ regressions with many regressors lack degrees of freedom. Results will be imprecise, and variances will be large.

Statistical tools for ‘model selection’ are important (R² and R̄² do not work). Generally, economists tend to include too many regressors.

Homoskedasticity

For the efficiency and variance properties, constant variance must be assumed:

MLR.5 The error u has the same variance given any values of the explanatory variables, in symbols

var(u|x1, . . . , xk) = σ².

If (MLR.5) is violated, the error variance, and hence also the variance of the dependent variable, will change with some xj. Heteroskedasticity is often observed in cross-section data.


The variance of OLS


The most informative way to represent the OLS variance is by using matrices:

Theorem
Under assumptions (MLR.1)–(MLR.5), the variance of the OLS estimator β̂ = (β̂0, β̂1, . . . , β̂k)′ is given by

var(β̂|X) = σ²(X′X)⁻¹,

where the operator var applied to a vector denotes a matrix of variances and covariances.

The matrix expression must be evaluated in OLS estimation anyway. As n → ∞, the matrix X′X divided by n may converge to a moment matrix of the regressors.

A property of the OLS variances

From the general formula in the theorem, the interesting formula

var(β̂j|X) = σ² / {SSTj (1 − Rj²)}

is obtained, where SSTj denotes ∑_{i=1}^n (xij − x̄j)² and Rj² is the R² from a regression of xj on the other regressors xl, l ≠ j. Note that this formula does not use any matrices.

Strong variation in the regressor xj and weak correlation with the other regressors benefit the precision of the coefficient estimate β̂j.
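The scalar formula and the matrix formula give the same numbers, which a sketch can confirm (simulated design; σ² is set to 1 as an assumption of the example):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
sigma2 = 1.0                                     # assumed error variance

# Matrix route: the (j, j) diagonal element of sigma^2 (X'X)^{-1}.
V = sigma2 * np.linalg.inv(X.T @ X)

# Scalar route for j = 1: SST_j and R_j^2 from regressing x_j on the others.
j = 1
xj = X[:, j]
Z = np.delete(X, j, axis=1)                      # remaining regressors incl. constant
xj_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ xj)
sst_j = np.sum((xj - xj.mean()) ** 2)
r2_j = 1.0 - np.sum((xj - xj_hat) ** 2) / sst_j

print(V[j, j], sigma2 / (sst_j * (1.0 - r2_j)))  # the two expressions agree
```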


Estimating the OLS variance


In the formulae for the OLS variance, the item σ 2 is unobserved
and must be estimated. In analogy to simple regression, the
following theorem holds:
Theorem
Under the assumptions (MLR.1)–(MLR.5), it holds that
∑n
û 2 SSR
E i=1 i = E = Eσ̂ 2 = σ 2 ,
n−k −1 n−k −1
i.e. the estimator of the error variance is unbiased.

The scale factor n − k − 1 corresponds to the ‘degrees of freedom’


concept: n observations yield k + 1 coefficient estimates, such that
n − k − 1 degrees of freedom remain. The proof is omitted.

Gauss-Markov and multiple regression


In direct analogy to the case of simple regression, there is the
celebrated Gauss-Markov Theorem for linear efficiency:
Theorem
Under assumptions (MLR.1)–(MLR.5), the OLS estimators
β̂0 , β̂1 , . . . , β̂k are the best linear unbiased estimators of
β0 , β1 , . . . , βk , respectively, i.e. BLUE.

For this reason, (MLR.1)–(MLR.5) are called the Gauss-Markov


conditions. It can be shown that a genuine multivariate
generalization holds and that linear combinations of OLS
estimators are BLUE for linear combinations of coefficient
parameters.
Inference in the multiple model

Normal regression

For some results, such as unbiasedness and linear efficiency, no exact distributional assumptions are needed. For others, it is convenient to assume a Gaussian (normal) distribution:

MLR.6 The error u is independent of the explanatory variables and is normally distributed with mean 0 and variance σ², in symbols u ∼ N(0, σ²).

Assumption (MLR.6) implies (MLR.4) and (MLR.5). Normality is often a reasonable working assumption, unless there is strong evidence to the contrary. In large samples, it can be tested.


OLS coefficient estimates as normal random variables

Theorem
Under the assumptions (MLR.1)–(MLR.6), the distribution of the OLS coefficient estimates β̂j conditional on the regressors is normal, i.e.

β̂j|X ∼ N{βj, var(β̂j)},

with var(β̂j) given either by the direct expression using the idea of a regression of xj on the other covariates or as the (j, j) element of the variance matrix σ²(X′X)⁻¹.

Note that the variance is formally a random variable, as it depends on X. The proof is quite obvious: for given X, β̂ is a linear function of y, which in turn is normal due to normal u.

Implications of the normality of OLS estimates


- Normality does not only hold for the individual β̂j; it holds that β̂ ∼ N(β, σ²(X′X)⁻¹) for a multivariate ((k + 1)–variate) normal distribution. Thus, all sums, differences, and linear combinations of coefficient estimates are also normally distributed;
- From the properties of the normal distribution, it follows that the theoretically standardized estimate

(β̂j − βj) / s.e.(β̂j)

is standard normal, i.e. N(0, 1) distributed. The standard error in the denominator, however, is the square root of the true and unknown variance. This distributional property does not hold for the estimated standard error.


The empirically standardized estimate

If the OLS coefficient estimates are standardized by estimated standard errors, the distribution follows the well-known t law (in the older literature, the ‘Student’ distribution):

Theorem
Under assumptions (MLR.1)–(MLR.6),

(β̂j − βj) / ŝ.e.(β̂j) ∼ t_{n−k−1},

in words, ‘the empirically standardized estimate follows a t distribution with n − k − 1 degrees of freedom’.


Remarks on the standardized estimate


- The t distribution with m degrees of freedom is defined from m + 1 independent standard normal random variables a, b1, . . . , bm as the distribution of the ratio a / √{(b1² + . . . + bm²)/m};
- For more than around 30 degrees of freedom, the t distribution becomes so close to the normal N(0, 1) that the standard normal can be used instead;
- The standardized estimator will not be t distributed if the normality assumption (MLR.6) is violated;
- Degrees of freedom can be remembered as follows: out of n original degrees of freedom, k + 1 are used up by estimating the coefficients and the intercept, and n − k − 1 remain.

Densities of t distributions

[Figure: densities of the t distribution with 5 (black), 10 (blue), and 20 (green) degrees of freedom.]


Testing the null hypothesis βj = 0

Researchers are interested in testing the null hypothesis

H0: βj = 0,

usually with the alternative βj ≠ 0, less often with the alternative βj > 0 or βj < 0. An appropriate statistic for testing this H0 is the empirically standardized estimate evaluated at βj = 0, i.e.

tβj = β̂j / ŝ.e.(β̂j),

which is called the t ratio or t statistic.


What is a hypothesis test?

A hypothesis test is a statistical decision procedure. Based on the value of a test statistic, which is a function of the sample and hence a random variable, it either rejects the null hypothesis or fails to reject it.

For example, the t–test rejects the null of βj = 0 if

tβj > c  or  |tβj| > c

in the one-sided and two-sided versions, respectively. c is called the critical value, and the region of ℝ where the test rejects is called the critical region.


How are the critical values determined?

Hypothesis tests are tuned to significance levels. A significance level is the probability of a type I error, i.e. of rejecting the null even though it is correct. The construction of a test requires knowledge of the distribution of the test statistic under the null.

Suppose the significance level (specified by the researcher) is 5%. Then any interval that has probability 5% under the null is a valid critical region for a valid test. In order to minimize the probability of a type II error, critical regions are situated in the tails of the null distribution. For example, the 95% quantile of the t–distribution is a good critical value for a 5% test against a one-sided alternative if the test statistic is t–distributed.


Practical implementation of hypothesis tests


Presume the researcher has the value of the test statistic and searches for critical values. Several options are available:
- If the null distribution is a known standard law, critical values can be found on the web or in books: inconvenient;
- Critical values may also be provided by statistical software in this case: slightly more convenient and flexible;
- If the software is smart, it will provide p–values instead of, or in addition to, the critical values: very precise and convenient;
- If the null distribution is non-standard and rare, the researcher may have to simulate the distribution via Monte Carlo or bootstrap procedures: computer skills needed.


Definition of the p–value


Correct definitions:
- The p–value is the significance level at which the test becomes indifferent between rejection and non-rejection for the sample at hand (the calculated value of the test statistic);
- The p–value is the probability, under the null hypothesis, of generating values of the test statistic that are even more unusual (less typical, often ‘larger’) than the one calculated from the sample.

Incorrect definition:
- The p–value is the probability of the null hypothesis for this sample.


Test based on quantiles

[Figure: 10% to 90% quantiles of the normal distribution. The observed value of 2.2 for the test statistic, which is normally distributed under H0, is significant at 10% for the one-sided test.]


Test based on p–values

[Figure: the area under the density curve to the right of the observed value of 2.2 is 0.014, which is the p–value. The one-sided test rejects at the 10% and 5% levels, but not at 1%.]
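The 0.014 figure is just the upper-tail area of the standard normal beyond 2.2; a one-line check with scipy (the use of scipy here is my choice, not the slides’):

```python
from scipy.stats import norm

p = norm.sf(2.2)    # survival function: P(Z > 2.2) under the standard normal null
print(round(p, 3))  # 0.014 -> reject at 10% and 5%, but not at 1%
```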

Return to the t–test


Assume (MLR.1)–(MLR.6). Under the null hypothesis H0: βj = 0, the t ratio for βj,

tβj = β̂j / ŝ.e.(β̂j),

will be t–distributed with n − k − 1 degrees of freedom, i.e. t_{n−k−1} distributed. Thus, reject H0 at 5% significance in favor of the alternative HA: βj > 0 if the test statistic is larger than the 95% quantile of the t_{n−k−1} distribution. Reject in favor of HA: βj ≠ 0 if the test statistic is larger than the 97.5% quantile or less than the 2.5% quantile.

When the t–test rejects, it is often said that ‘βj is significantly different from 0’, or simply that ‘βj is significant’, or also that ‘xj is significant’.
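A minimal end-to-end sketch of the t–test with numpy/scipy (simulated data; the sample size, the design, and the fact that β2 = 0 are assumptions of the example):

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(3)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)   # beta_2 is truly 0

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
df = n - k - 1
sigma2_hat = u_hat @ u_hat / df                          # SSR / (n - k - 1)
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

t_ratios = beta_hat / se                                 # t statistics for H0: beta_j = 0
p_two_sided = 2 * t_dist.sf(np.abs(t_ratios), df)        # two-sided p-values
print(np.round(t_ratios, 2), np.round(p_two_sided, 3))
```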


More general t–tests

Presume one wishes to test H0: βj = βj0 for a given value βj0, such as H0: βj = 2.15. Then, evaluate the statistic

(β̂j − βj0) / ŝ.e.(β̂j).

Under H0, it is clearly t_{n−k−1} distributed, and the usual quantiles can be used.


Testing several exclusion restrictions jointly


Assume the null hypothesis of concern is now

H0: βl+1 = βl+2 = . . . = βk = 0,

i.e. the exclusion of k − l regressors. Then, a suitable test statistic is

F = {(SSRr − SSRu)/(k − l)} / {SSRu/(n − k − 1)},

where SSRr is the SSR for the restricted model without the k − l regressors and SSRu that for the unrestricted model with all k regressors. Assuming (MLR.1)–(MLR.6), the statistic F is, under the null, distributed F with k − l ‘numerator’ and n − k − 1 ‘denominator’ degrees of freedom.
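A sketch of the exclusion F test built from the two SSRs (simulated data; the choice of k, l, and a null that is true by construction are assumptions of the example):

```python
import numpy as np
from scipy.stats import f as f_dist

def ssr(X, y):
    """Sum of squared residuals of an OLS fit."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b
    return u @ u

rng = np.random.default_rng(4)
n, k, l = 120, 4, 2                       # H0: beta_{l+1} = ... = beta_k = 0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X[:, :l + 1] @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

ssr_u = ssr(X, y)                         # unrestricted: all k regressors
ssr_r = ssr(X[:, :l + 1], y)              # restricted: first l regressors only
F = ((ssr_r - ssr_u) / (k - l)) / (ssr_u / (n - k - 1))
print(F, f_dist.sf(F, k - l, n - k - 1))  # large p expected: the null is true here
```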

Densities of F distributions

[Figure: densities of F distributions with 2 (black), 4 (blue), and 6 (green) numerator and 20 denominator degrees of freedom.]


Some remarks on the F test


- The F test is easily generalized to test general restrictions, such as

H0: β2 + β3 = 1, β5 = 3.61, β6 = 2β7,

as again there exist an SSRu and an SSRr. The main difficulty may be the estimation of the restricted model. Numerator degrees of freedom correspond to the number of linearly independent restrictions;
- For large n, the F statistic will be distributed like 1/(k − l) times a χ²(k − l) distribution;
- The F statistic for the exclusion of one regressor xj is the square of tβj.

The overall F test

A special F–test has the null hypothesis H0: β1 = . . . = βk = 0 and the alternative that at least one coefficient is non-zero. The statistic is

F = {(SST − SSR)/k} / {SSR/(n − k − 1)} = (R²/k) / {(1 − R²)/(n − k − 1)},

a transformation of the R². When it fails to reject, the regression model fails to provide a useful description of y. This is the only F–statistic that shows up in a standard regression printout.
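The two forms of the overall F statistic are algebraically identical, as a short sketch confirms (simulated data; all settings assumed):

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(5)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.5, 0.2]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
ssr = np.sum((y - X @ b) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ssr / sst

F_ssr = ((sst - ssr) / k) / (ssr / (n - k - 1))     # SSR form
F_r2 = (r2 / k) / ((1.0 - r2) / (n - k - 1))        # R^2 form
print(F_ssr, F_r2, f_dist.sf(F_ssr, k, n - k - 1))  # same statistic, two forms
```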


The importance of F and t tests


F and t tests are restriction tests that are tools in searching for
the best specification of a regression equation—the best selection
of regressors—that determines the targeted dependent variable y .

I Only nested models can be compared. For example,


y = β0 + β1 x1 + u can be tested against
y = β0 + β1 x1 + β2 x2 + u but not against y = β0 + β1 x2 + u;
I In the specification search, it is often recommended to start
with a profligate model and to eliminate insignificant
regressors (backward elimination, general-to-specific) rather
than to add regressors from a ‘small’ model;
I The decisions of t–tests, say, for two coefficients βl , βj and of
the F –test for βl = βj = 0 are often in conflict. Some
researchers prefer the decision of the F –test in doubtful cases.

Ouch!

The following statements are regarded as incorrect:
- The tested null hypothesis is H0: β̂j = 0;
- The test is rejected;
- The alternative hypothesis can be rejected;
- The test is 2.55;
- The coefficient β4 is significant at 95% (unless someone really uses an unusual 95% significance level);
- The hypothesis that β4 is insignificant can be rejected.

OLS asymptotics

The probability limit


When talking about asymptotics, i.e. large-sample behavior,
statistical convergence concepts are needed. For convergence of a
sequence of random variables X1 , . . . , Xn , . . . to a fixed limit, we
use
Definition
A sequence of random variables (Xn ) is said to converge in
probability to θ ∈ R, in symbols plim Xn = θ iff for every ε > 0
n→∞

P(|Xn − θ| > ε) → 0 as n → ∞.

This concept is relatively weak, as it does not imply that single


realizations of the random variable sequence converge. It allows
simple rules, such as plim(Xn Yn ) = (plimX
.
n.)(plimY
.
n ).
. . . 49/65


Consistency of OLS
An estimator θ̂ for the parameter θ is called consistent iff

plim_{n→∞} θ̂(n) = θ,

with θ̂(n) denoting an estimate from a sample of size n. For OLS in the linear regression model, consistency holds under relatively weak conditions:

Theorem
Under assumptions (MLR.1)–(MLR.4) and some technical conditions, the OLS estimator β̂ is consistent for β, which implies that plim_{n→∞} β̂j = βj for j = 0, 1, . . . , k.


A sketch of the consistency issue

Consider

β̂ = β + (X′X)⁻¹X′u = β + (n⁻¹X′X)⁻¹ n⁻¹X′u.

Typically, the term n⁻¹X′X will converge to some kind of variance matrix. The term n⁻¹X′u should converge to its expectation E(X′u), which is 0 if X and u are uncorrelated and E(u) = 0. Thus, the condition

MLR.4’ E(u) = 0, cov(xj, u) = 0, for j = 1, . . . , k,

will suffice for consistency and can be substituted for the stronger assumption (MLR.4).
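Consistency can be made visible by letting n grow in a small Monte Carlo sketch (the design, the seed, and the use of t(5) errors to stress non-normality are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(6)
beta = np.array([1.0, 0.5])                  # assumed true parameters
for n in (50, 500, 5_000, 50_000):
    x = rng.normal(size=n)
    u = rng.standard_t(df=5, size=n)         # non-normal errors with mean 0
    y = beta[0] + beta[1] * x + u
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    print(n, np.round(b - beta, 4))          # estimation error shrinks toward 0
```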


Correlation of regressor and error is pretty bad

It was shown before that correlation between a regressor and the errors (for example, with omitted variables and with endogeneity) usually causes a bias in the sense of E(β̂) ≠ β. If (MLR.4’) is violated, this bias will not disappear even as n → ∞ and becomes an inconsistency. As Clive Granger said,

    If you can’t get it right as n goes to infinity, you shouldn’t be in the business.

This means that inconsistent estimators should not be used at all. Inconsistency is more serious than a finite-sample bias.


Asymptotic normality of the OLS estimator


A (celebrated) Central Limit Theorem can be used to prove:

Theorem
Under the Gauss-Markov assumptions (MLR.1)–(MLR.5) and some technical conditions, it holds that

(β̂j − βj) / ŝ.e.(β̂j) →d N(0, 1),

and generally that √n (β̂j − βj) →d N(0, σ²_{βj}), with σ²_{βj} determined either from the matrix formula σ²(X′X)⁻¹ or by the aforementioned construction from regressions among the regressors.


Remarks on the asymptotic normality of OLS

- Note that normality of the errors is not required: even for most non-normal error distributions, β̂ will approach a normal limit distribution;
- Under the assumptions of the theorem, σ̂² will converge to σ²;
- This latter convergence is of type ‘plim’, while the main result of the theorem uses convergence in distribution (→d), a weaker type of convergence. Convergence in distribution means that the distribution of a random variable converges to a limit distribution; nothing else is stated about the random variables proper.


Lagrange multiplier tests: the idea


Restriction tests (t and F) follow the Wald test principle, one of the three test construction principles used in parametric statistics. The other two are the likelihood-ratio (LR) and the Lagrange multiplier (LM) principles. LR and LM tests are typically asymptotic tests: their small-sample null distributions are uncertain, while their large-sample distributions will be regular (chi-square) even in the absence of (MLR.6).

The LM test estimates the model under the null and checks the increase in the likelihood when moving toward the alternative. It is also called the ‘score test’, as the derivative of the likelihood is called the score. Often, the LM test can be made operational in a sequence of regressions, with the test statistic simply calculated as nR² for a specific regression (the ‘auxiliary regression’).

The LM test for exclusion of variables


Consider the multiple regression model

yi = β0 + β1 x1,i + . . . + βk xk,i + ui

and the null hypothesis H0: βk−q+1 = . . . = βk = 0. Estimate the ‘restricted regression model’

yi = β0 + β1 x1,i + . . . + βk−q xk−q,i + ui

by OLS and keep the residuals ũ. Then, regress these ũ on all k regressors:

ũi = γ0 + γ1 x1,i + . . . + γk xk,i + vi.

The nR² from this second, auxiliary regression is the LM test statistic. Under H0, it is asymptotically distributed as χ²(q).
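A sketch of the nR² procedure (simulated data with a null that is true by construction; all settings are assumptions):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
n, k, q = 200, 3, 2                           # test exclusion of the last q regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X[:, :k - q + 1] @ np.array([1.0, 0.5]) + rng.normal(size=n)

def residuals(X, y):
    """OLS residuals."""
    return y - X @ np.linalg.solve(X.T @ X, X.T @ y)

u_tilde = residuals(X[:, :k - q + 1], y)      # restricted model residuals
v = residuals(X, u_tilde)                     # auxiliary regression on ALL k regressors
r2_aux = 1.0 - (v @ v) / np.sum((u_tilde - u_tilde.mean()) ** 2)
LM = n * r2_aux
print(LM, chi2.sf(LM, q))                     # compare with the chi^2(q) distribution
```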
Selection of regressors in the multiple model

Model selection: the main issue


The typical situation in multiple regression is that y has been
specified a priori, and that the researcher looks for the optimal set
of regressors that offer the best explanation for y . Tools for this
specification search or regressor selection are:
I R 2 and R̄ 2 can be used for comparing any two or more
models, but tend to increase with adding any regressors;
I F and t tests can only be used for comparing nested models,
and lengthy search sequences tend to invalidate the
significance level;
I Information criteria such as AIC and BIC can compare any
two or more models and penalize complexity;
I Specification tests can be used to eliminate ill-specified
models but cannot find the optimal model.

Adjusted R²

The corrected or adjusted R², often denoted R̄² or R²c, is defined as

R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1).

It holds that R̄² ≤ R². If R² is seen as an estimator for corr²(y, β′x), then the bias of R̄² is smaller than the bias of R².

R² always increases if a new regressor is included in the regression. R̄² increases if the t–ratio for the additional variable is larger than 1, which corresponds to testing at an enormous significance level. It cannot be used for serious model selection.
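A one-function sketch of the adjustment (the function name is mine, not the slides’):

```python
def r2_adjusted(r2: float, n: int, k: int) -> float:
    """Adjusted R^2: penalizes added regressors via the degrees-of-freedom ratio."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Example: a regressor that barely raises R^2 can lower the adjusted version.
print(r2_adjusted(0.500, 50, 3), r2_adjusted(0.505, 50, 4))
```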


Penalizing complexity
Consider the estimated error variance

σ̂² = (1/(n − k − 1)) ∑_{i=1}^n ûi² = SSR/(n − k − 1),

which, just like R̄², improves (here: decreases) if a new regressor with a t–ratio greater than one is added. Thus, it cannot be used for serious model selection either. It takes a step in the right direction, however, by embodying a trade-off: the numerator improves (decreases) with increasing complexity, while the denominator deteriorates (decreases) with higher complexity. This idea is pursued by information criteria, which impose a stronger penalty on complexity, strong enough for useful model selection.

The AIC according to Akaike

Akaike introduced the AIC (‘An Information Criterion’), in one possible version

AIC = log σ̂² + 2(k + 1)/n,

which is to be minimized: complexity decreases the first term and increases the second. (In information criteria, σ̂² should be formed using the scale n, not n − k − 1.) In nested comparisons, minimizing AIC corresponds to t or F tests at an approximate 15% significance level. For n → ∞, minimizing AIC selects the best forecasting model, which tends to keep slightly more regressors than those with non-zero coefficients.


The BIC according to Schwarz


Schwarz simplified the BIC that had been introduced by
Akaike, in one version
(k + 1) log n
BIC = log σ̂ 2 + ,
n
which is to be minimized. The BIC complexity penalty is stronger
than the AIC penalty, so selected models tend to be more
parsimonious (smaller). In nested comparisons, minimizing BIC
corresponds to a significance level falling to 0 as n → ∞. For
n → ∞, BIC will select the ‘true’ model, exactly keeping all
regressors with non-zero coefficients. In smaller samples, BIC tends
to select too parsimonious models.
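A sketch computing both criteria in the versions given above, with σ̂² scaled by n as the AIC slide notes (the helper and its name are mine, not the slides’):

```python
import numpy as np

def info_criteria(X, y):
    """AIC and BIC in the slides' versions, with sigma^2_hat = SSR / n."""
    n, k1 = X.shape                            # k1 = k + 1 parameters incl. intercept
    b = np.linalg.solve(X.T @ X, X.T @ y)
    sigma2_hat = np.sum((y - X @ b) ** 2) / n  # scale n, not n - k - 1
    aic = np.log(sigma2_hat) + 2 * k1 / n
    bic = np.log(sigma2_hat) + k1 * np.log(n) / n
    return aic, bic

# Usage: evaluate each candidate regressor set and keep the minimizer.
# for cols in candidate_sets: print(cols, info_criteria(X[:, cols], y))
```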
