
Introductory Econometrics
Based on the textbook by Wooldridge:
Introductory Econometrics: A Modern Approach

Robert M. Kunst
robert.kunst@univie.ac.at

University of Vienna
and
Institute for Advanced Studies Vienna

October 5, 2018


Outline
Introduction
Simple linear regression
Multiple linear regression
  OLS in the multiple linear regression
  Statistical properties of OLS
  Inference in the multiple model
  OLS asymptotics
  Selection of regressors in the multiple model
Heteroskedasticity
Regressions with time-series observations
Serial correlation in time-series regression
Instrumental variables estimation

OLS in the multiple linear regression

A multiple linear regression model with two regressors

The simplest multiple linear model is one in which a dependent variable y depends on two explanatory variables x1 and x2, for example wages on education and work experience:

y = β0 + β1 x1 + β2 x2 + u,

where the slope β1 measures the reaction of y to a marginal change in x1 keeping x2 fixed (ceteris paribus), i.e. ∂y/∂x1. Often, regressors are closely related, and the ceteris paribus idea becomes problematic.


The multiple linear regression model


In the general multiple linear regression model, y is regressed on
k regressors

y = β0 + β1 x1 + β2 x2 + . . . + βk xk + u,

with an intercept β0 and k slope parameters (coefficients) βj, 1 ≤ j ≤ k. Again, for the error term u, it will be assumed that E(u|x1, . . . , xk) = 0.
The multiple linear regression is the most important statistical
model in econometrics. Note that ‘multiple’ should not be replaced
by ‘multivariate’: multivariate regression lets a vector of variables
y1 , . . . , yg depend on a vector of regressors. Here, y is just a scalar
dependent variable.

OLS in the multiple model

In order to generalize the idea of OLS estimation to the multiple model, one would minimize

∑_{i=1}^n (yi − β0 − β1 xi1 − . . . − βk xik)²

in β0, β1, . . . , βk, and call the minimizing values β̂0, β̂1, . . . , β̂k.


Note the two subscripts (i, k): the first subscript i denotes the
observation, the second one k the variable number. This is the
Cowles Commission convention; (k, i) is slightly less popular and it
is called ‘Dutch notation’.


Formally, the solution can be obtained by taking derivatives and solving a system of first-order conditions

∑_{i=1}^n (yi − β̂0 − β̂1 xi1 − . . . − β̂k xik) = 0,
∑_{i=1}^n xi1 (yi − β̂0 − β̂1 xi1 − . . . − β̂k xik) = 0,
∑_{i=1}^n xi2 (yi − β̂0 − β̂1 xi1 − . . . − β̂k xik) = 0,
. . .
∑_{i=1}^n xik (yi − β̂0 − β̂1 xi1 − . . . − β̂k xik) = 0.

This system does not yield a nice closed form for the OLS
coefficients, unless matrix algebra is used.
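These conditions are easy to check numerically. The following is a minimal sketch (not from the slides; the simulated data and all numerical settings are assumptions for illustration) that fits OLS with numpy and verifies the first-order conditions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + 2 regressors
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)     # assumed data-generating model

# Fit OLS and form the residuals u_hat = y - X beta_hat.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta_hat

# First-order conditions as sample moment conditions:
print(u_hat.sum())         # ~0: residuals have sample mean zero (intercept condition)
print(X[:, 1:].T @ u_hat)  # ~0 vector: residuals orthogonal to every regressor
```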

Interpreting the OLS first-order conditions

Just like in the simple regression model, the first-order conditions have a method-of-moments interpretation:
- The condition for the intercept β̂0 says that the sample mean of the OLS residuals is 0. This corresponds to the population moment condition E(u) = 0;
- Each condition for a slope coefficient β̂j says that the sample covariance (and hence correlation) between the residuals and the regressor xj is 0. This corresponds to the population condition that the regressors and errors are uncorrelated.


Multiple linear regression in matrix form


Presume all values for the dependent variable y and all regressors are written in a vector y and in an {n × (k + 1)}–matrix X:

    ⎛ y1 ⎞        ⎛ 1  x11  . . .  x1k ⎞
    ⎜ y2 ⎟        ⎜ 1  x21  . . .  x2k ⎟
y = ⎜ ⋮  ⎟ ,  X = ⎜ ⋮   ⋮          ⋮  ⎟
    ⎝ yn ⎠        ⎝ 1  xn1  . . .  xnk ⎠

In this notation, the regression model becomes y = Xβ + u, and the OLS estimates β̂ = (β̂0, β̂1, . . . , β̂k)′ can be written compactly as

β̂ = (X′X)⁻¹X′y.
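A minimal sketch of the closed form (again with simulated data that are an assumption of the example); solving the normal equations is numerically preferable to forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

# beta_hat = (X'X)^{-1} X'y, computed by solving (X'X) b = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
assert np.isclose(y_hat.mean(), y.mean())  # averages lie on the regression hyperplane
```

The final assertion anticipates the next slide: with an intercept included, the mean of the fitted values equals ȳ.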

Fitted values and residuals

Just as in simple regression, OLS estimation decomposes observed y into an explained part ŷ (the fitted value) and an unexplained part or residual û:

yi = β̂0 + β̂1 xi1 + . . . + β̂k xik + ûi = ŷi + ûi.

Because the sample mean of the residuals is 0, the sample mean of the fitted values is ȳ, and the averages lie on the regression ‘hyperplane’:

ȳ = β̂0 + β̂1 x̄1 + . . . + β̂k x̄k.


Simple and multiple linear regression coefficients

In most cases, the estimate β̂1 in a simple linear regression

y = β̂0 + β̂1 x1 + û

differs from the estimate β̂1 in a comparable multiple regression

y = β̂0 + β̂1 x1 + . . . + β̂k xk + û.

The coefficient estimates only coincide in special cases, such as cov(x1, xj) = 0 for all j ≠ 1 or β̂j = 0 for all j ≠ 1. Note that cov(x1, xj) = 0 is not the typical case: regressors are usually correlated with other regressor variables.


Simple and two-regressor regression: a property


Consider the simple regression

y = β̃0 + β̃1 x1 + ũ

and the regression of y on x1 and on an additional x2 :

y = β̂0 + β̂1 x1 + β̂2 x2 + û.

It is easily(?) shown that

β̃1 = β̂1 + β̂2 δ̂,

where δ̂ is the slope coefficient in a regression of x2 on x1. Clearly, β̃1 = β̂1 iff one of the two factors β̂2 and δ̂ is 0. Note: β̂1 is not necessarily ‘better’ or ‘more correct’ than β̃1.
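The identity β̃1 = β̂1 + β̂2 δ̂ holds exactly in any sample, which a short numerical sketch can confirm (simulated, deliberately correlated regressors; the data-generating values are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)               # x2 correlated with x1
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

def ols(X, y):
    """OLS coefficients via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

c = np.ones(n)
b_tilde = ols(np.column_stack([c, x1]), y)       # simple regression of y on x1
b_hat = ols(np.column_stack([c, x1, x2]), y)     # multiple regression of y on x1, x2
delta = ols(np.column_stack([c, x1]), x2)        # regression of x2 on x1

# beta_tilde_1 equals beta_hat_1 + beta_hat_2 * delta_hat exactly.
print(b_tilde[1], b_hat[1] + b_hat[2] * delta[1])
```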


Goodness of fit in the multiple model


The variance decomposition equation

∑_{i=1}^n (yi − ȳ)² = ∑_{i=1}^n (ŷi − ȳ)² + ∑_{i=1}^n ûi²,

or

SST = SSE + SSR,

continues to hold in the multiple regression model. Likewise,

R² = SSE/SST = 1 − SSR/SST

defines a descriptive statistic in the interval [0, 1] that measures the goodness of fit. Note, however, that R² is not the squared correlation of y and any xj but the maximum squared correlation coefficient of y and linear combinations of x1, . . . , xk.
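A minimal sketch of the decomposition and of R² (simulated data; all settings are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

y_hat = X @ np.linalg.solve(X.T @ X, X.T @ y)
sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)           # residual sum of squares
assert np.isclose(sst, sse + ssr)        # the decomposition holds with an intercept
print(sse / sst, 1.0 - ssr / sst)        # two equal expressions for R^2
```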

Statistical properties of OLS

Assumptions for multiple linear regression

In order to establish OLS properties such as unbiasedness, model assumptions have to be formulated. The first assumption is the natural counterpart to (SLR.1), the linearity in parameters:

MLR.1 The population model can be written

y = β0 + β1 x1 + β2 x2 + . . . + βk xk + u,

with unknown coefficient parameters β1, . . . , βk, an intercept parameter β0, and unobserved random error u.


Assumption of random sampling

MLR.2 The data constitute a random sample of n observations {(xi1, . . . , xik, yi) : i = 1, . . . , n} of random variables corresponding to the population model (MLR.1).

Due to the random-sampling assumption (MLR.2), observations and also errors are independent across different i.


No multicollinearity
In simple regression, OLS requires some variation in the regressor
(should not be entirely constant). In multiple regression, more is
needed. The matrix X′ X must be invertible:
MLR.3 There are no exact linear relationships connecting the
regressor variables, and no regressor is constant in sample or
in population.

Assumption (MLR.3) implies n > k. When it holds in the population, violation in the sample happens with probability 0 for continuous random variables. (MLR.3) is violated if a regressor is the sum or difference of other regressors. It is not violated by nonlinear identities, such as x2 = x1².

Danger: do not misinterpret the multicollinearity condition

MLR.3 does not imply that the correlation among explanatory variables is assumed to be 0. Regressors are not assumed to be independent of each other.

It is true that estimates are less precise under strong correlation among regressors, and that ceteris paribus interpretations become more difficult. Nonetheless, none of the basic properties of the OLS estimator depends on regressors being independent or uncorrelated.


Zero conditional expectation

The assumption E(u|x) = 0 is just the natural generalization of the simple regression assumption:

MLR.4 The error u has an expected value of zero given any values of the regressors, in symbols

E(u|x1, x2, . . . , xk) = 0.

Again, (MLR.4) implies E(u) = 0, but it is stronger than that property. (MLR.4) also implies cov(xj, u) = 0 for all regressors xj.


Violations of assumption MLR.4


There are several reasons why (MLR.4) may not hold, in particular the ensuing condition cov(xj, u) = 0, which is often called an exogeneity condition. When it is violated, xj is called an endogenous regressor.
- If the true relationship is nonlinear, E(u|xj) ≠ 0 even though E(u) = 0;
- If an important influence factor has been omitted from the list of regressors (‘omitted variable bias’), (MLR.4) is formally violated. The researcher must decide whether she wishes to estimate the regression without or with the doubtful control;
- If there is logical ‘feedback’ from y to some xj, then u and xj are correlated, xj is endogenous, and regression yields biased estimates of the true relationship. This case must be handled by special techniques (instrumental variables).


Unbiasedness of OLS

(MLR.1) to (MLR.4) suffice for unbiasedness:

Theorem
Under assumptions (MLR.1)–(MLR.4),

E(β̂j) = βj, j = 0, 1, . . . , k,

for any values of the parameters βj.

In words, OLS is an unbiased estimator for the intercept and all coefficients. In short, one may write E(β̂) = β, using the notation β for the (k + 1)–vector (β0, β1, . . . , βk)′ and a corresponding notation for the expectation operator.


Scylla and Charybdis


How many regressors should be included in a multiple regression?
- Omitting influential regressors (too low a k) tends to overstate the effects: effects due to the omitted variables are attributed to the included regressors (‘omitted variable bias’). Conversely, the effect of, say, a difference between two regressors may not be found if only one of them is included;
- ‘Profligate’ regressions with many regressors lack degrees of freedom. Results will be imprecise, and variances will be large.

Statistical tools for ‘model selection’ are important (R² and R̄² do not work). Generally, economists tend to include too many regressors.

Homoskedasticity

For the efficiency and variance properties, constant variance must be assumed:

MLR.5 The error u has the same variance given any values of the explanatory variables, in symbols

var(u|x1, . . . , xk) = σ².

If (MLR.5) is violated, the error variance, and hence also the variance of the dependent variable, will change with some xj. Heteroskedasticity is often observed in cross-section data.


The variance of OLS


The most informative way to represent the OLS variance is by using matrices:

Theorem
Under assumptions (MLR.1)–(MLR.5), the variance of the OLS estimator β̂ = (β̂0, β̂1, . . . , β̂k)′ is given by

var(β̂|X) = σ²(X′X)⁻¹,

where the operator var applied to a vector denotes a matrix of variances and covariances.

The matrix expression must be evaluated in OLS estimation anyway. As n → ∞, the matrix X′X divided by n may converge to a moment matrix of the regressors.

A property of the OLS variances

From the general formula in the theorem, the interesting formula

var(β̂j|X) = σ² / {SSTj (1 − Rj²)}

is obtained, where SSTj denotes ∑_{i=1}^n (xij − x̄j)² and Rj² is the R² from a regression of xj on the other regressors xl, l ≠ j. Note that this formula does not use any matrices.

Strong variation in the regressor xj and weak correlation with the other regressors benefit the precision of the coefficient estimate β̂j.
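The scalar formula and the matrix formula give the same numbers, which a sketch can confirm (simulated design; σ² is set to 1 as an assumption of the example):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
sigma2 = 1.0                                     # assumed error variance

# Matrix route: the (j, j) diagonal element of sigma^2 (X'X)^{-1}.
V = sigma2 * np.linalg.inv(X.T @ X)

# Scalar route for j = 1: SST_j and R_j^2 from regressing x_j on the others.
j = 1
xj = X[:, j]
Z = np.delete(X, j, axis=1)                      # remaining regressors incl. constant
xj_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ xj)
sst_j = np.sum((xj - xj.mean()) ** 2)
r2_j = 1.0 - np.sum((xj - xj_hat) ** 2) / sst_j

print(V[j, j], sigma2 / (sst_j * (1.0 - r2_j)))  # the two expressions agree
```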


Estimating the OLS variance


In the formulae for the OLS variance, the item σ 2 is unobserved
and must be estimated. In analogy to simple regression, the
following theorem holds:
Theorem
Under the assumptions (MLR.1)–(MLR.5), it holds that
∑n
û 2 SSR
E i=1 i = E = Eσ̂ 2 = σ 2 ,
n−k −1 n−k −1
i.e. the estimator of the error variance is unbiased.

The scale factor n − k − 1 corresponds to the ‘degrees of freedom’


concept: n observations yield k + 1 coefficient estimates, such that
n − k − 1 degrees of freedom remain. The proof is omitted.

Gauss-Markov and multiple regression


In direct analogy to the case of simple regression, there is the
celebrated Gauss-Markov Theorem for linear efficiency:
Theorem
Under assumptions (MLR.1)–(MLR.5), the OLS estimators
β̂0 , β̂1 , . . . , β̂k are the best linear unbiased estimators of
β0 , β1 , . . . , βk , respectively, i.e. BLUE.

For this reason, (MLR.1)–(MLR.5) are called the Gauss-Markov


conditions. It can be shown that a genuine multivariate
generalization holds and that linear combinations of OLS
estimators are BLUE for linear combinations of coefficient
parameters.
Inference in the multiple model

Normal regression

For some results, such as unbiasedness and linear efficiency, no exact distributional assumptions are needed. For others, it is convenient to assume a Gaussian (normal) distribution:

MLR.6 The error u is independent of the explanatory variables and is normally distributed with mean 0 and variance σ², in symbols u ∼ N(0, σ²).

Assumption (MLR.6) implies (MLR.4) and (MLR.5). Normality is often a reasonable working assumption, unless there is strong evidence to the contrary. In large samples, it can be tested.


OLS coefficient estimates as normal random variables

Theorem
Under the assumptions (MLR.1)–(MLR.6), the distribution of the OLS coefficient estimates β̂j conditional on the regressors is normal, i.e.

β̂j|X ∼ N{βj, var(β̂j)},

with var(β̂j) given either by the direct expression using the idea of a regression of xj on the other covariates or as the (j, j) element of the variance matrix σ²(X′X)⁻¹.

Note that the variance is formally a random variable, as it depends on X. The proof is quite obvious: for given X, β̂ is a linear function of y, which in turn is normal due to normal u.

Implications of the normality of OLS estimates


- Normality does not only hold for the individual β̂j; it holds that β̂ ∼ N(β, σ²(X′X)⁻¹) for a multivariate ((k + 1)–variate) normal distribution. Thus, all sums, differences, and linear combinations of coefficient estimates are also normally distributed;
- From the properties of the normal distribution, it follows that the theoretically standardized estimate

(β̂j − βj) / s.e.(β̂j)

is standard normal, i.e. N(0, 1) distributed. The standard error in the denominator, however, is the square root of the true and unknown variance. This distributional property does not hold for the estimated standard error.


The empirically standardized estimate

If the OLS coefficient estimates are standardized by estimated standard errors, the distribution follows the well-known t law (in the older literature, the ‘Student’ distribution):

Theorem
Under assumptions (MLR.1)–(MLR.6),

(β̂j − βj) / ŝ.e.(β̂j) ∼ t_{n−k−1},

in words, ‘the empirically standardized estimate follows a t distribution with n − k − 1 degrees of freedom’.


Remarks on the standardized estimate


- The t distribution with m degrees of freedom is defined from m + 1 independent standard normal random variables a, b1, . . . , bm as the distribution of the ratio a / √{(b1² + . . . + bm²)/m};
- For more than around 30 degrees of freedom, the t distribution becomes so close to the normal N(0, 1) that the standard normal can be used instead;
- The standardized estimator will not be t distributed if the normality assumption (MLR.6) is violated;
- Degrees of freedom can be remembered as follows: out of n original degrees of freedom, k + 1 are used up by estimating the coefficients and the intercept, and n − k − 1 remain.

Densities of t distributions

[Figure: densities of the t distribution with 5 (black), 10 (blue), and 20 (green) degrees of freedom.]


Testing the null hypothesis βj = 0

Researchers are interested in testing the null hypothesis

H0: βj = 0,

usually with the alternative βj ≠ 0, less often with the alternative βj > 0 or βj < 0. An appropriate statistic for testing this H0 is the empirically standardized estimate evaluated at βj = 0, i.e.

tβj = β̂j / ŝ.e.(β̂j),

which is called the t ratio or t statistic.


What is a hypothesis test?

A hypothesis test is a statistical decision procedure. Based on the value of a test statistic, which is a function of the sample and hence a random variable, it either rejects the null hypothesis or fails to reject it.

For example, the t–test rejects the null of βj = 0 if

tβj > c  or  |tβj| > c

in the one-sided and two-sided versions, respectively. c is called the critical value, and the region of ℝ where the test rejects is called the critical region.


How are the critical values determined?

Hypothesis tests are tuned to significance levels. A significance level is the probability of a type I error, i.e. of rejecting the null even though it is correct. The construction of a test requires knowledge of the distribution of the test statistic under the null.

Suppose the significance level (specified by the researcher) is 5%. Then any interval that has probability 5% under the null is a valid critical region for a valid test. In order to minimize the probability of a type II error, critical regions are situated in the tails of the null distribution. For example, the 95% quantile of the t–distribution is a good critical value for a 5% test against a one-sided alternative if the test statistic is t–distributed.


Practical implementation of hypothesis tests


Presume the researcher has the value of the test statistic and searches for critical values. Several options are available:
- If the null distribution is a known standard law, critical values can be found on the web or in books: inconvenient;
- Critical values may also be provided by statistical software in this case: slightly more convenient and flexible;
- If the software is smart, it will provide p–values instead of, or in addition to, the critical values: very precise and convenient;
- If the null distribution is non-standard and rare, the researcher may have to simulate the distribution via Monte Carlo or bootstrap procedures: computer skills needed.


Definition of the p–value


Correct definitions:
- The p–value is the significance level at which the test becomes indifferent between rejection and non-rejection for the sample at hand (the calculated value of the test statistic);
- The p–value is the probability, under the null hypothesis, of generating values of the test statistic that are even more unusual (less typical, often ‘larger’) than the one calculated from the sample.

Incorrect definition:
- The p–value is the probability of the null hypothesis for this sample.


Test based on quantiles

[Figure: 10% to 90% quantiles of the normal distribution. The observed value of 2.2 for the test statistic, which is normally distributed under H0, is significant at 10% for the one-sided test.]


Test based on p–values

[Figure: the area under the density curve to the right of the observed value of 2.2 is 0.014, which is the p–value. The one-sided test rejects at the 10% and 5% levels, but not at 1%.]
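The 0.014 figure is just the upper-tail area of the standard normal beyond 2.2; a one-line check with scipy (the use of scipy here is my choice, not the slides’):

```python
from scipy.stats import norm

p = norm.sf(2.2)    # survival function: P(Z > 2.2) under the standard normal null
print(round(p, 3))  # 0.014 -> reject at 10% and 5%, but not at 1%
```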

Return to the t–test


Assume (MLR.1)–(MLR.6). Under the null hypothesis H0: βj = 0, the t ratio for βj,

tβj = β̂j / ŝ.e.(β̂j),

will be t–distributed with n − k − 1 degrees of freedom, i.e. t_{n−k−1} distributed. Thus, reject H0 at 5% significance in favor of the alternative HA: βj > 0 if the test statistic is larger than the 95% quantile of the t_{n−k−1} distribution. Reject in favor of HA: βj ≠ 0 if the test statistic is larger than the 97.5% quantile or less than the 2.5% quantile.

When the t–test rejects, it is often said that ‘βj is significantly different from 0’, or simply that ‘βj is significant’, or also that ‘xj is significant’.
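A minimal end-to-end sketch of the t–test with numpy/scipy (simulated data; the sample size, the design, and the fact that β2 = 0 are assumptions of the example):

```python
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(3)
n, k = 100, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)   # beta_2 is truly 0

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ beta_hat
df = n - k - 1
sigma2_hat = u_hat @ u_hat / df                          # SSR / (n - k - 1)
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

t_ratios = beta_hat / se                                 # t statistics for H0: beta_j = 0
p_two_sided = 2 * t_dist.sf(np.abs(t_ratios), df)        # two-sided p-values
print(np.round(t_ratios, 2), np.round(p_two_sided, 3))
```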


More general t–tests

Presume one wishes to test H0: βj = βj0 for a given value βj0, such as H0: βj = 2.15. Then, evaluate the statistic

(β̂j − βj0) / ŝ.e.(β̂j).

Under H0, it is clearly t_{n−k−1} distributed, and the usual quantiles can be used.


Testing several exclusion restrictions jointly


Assume the null hypothesis of concern is now

H0: βl+1 = βl+2 = . . . = βk = 0,

i.e. the exclusion of k − l regressors. Then, a suitable test statistic is

F = {(SSRr − SSRu)/(k − l)} / {SSRu/(n − k − 1)},

where SSRr is the SSR for the restricted model without the k − l regressors and SSRu that for the unrestricted model with all k regressors. Assuming (MLR.1)–(MLR.6), the statistic F is, under the null, distributed F with k − l ‘numerator’ and n − k − 1 ‘denominator’ degrees of freedom.
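A sketch of the exclusion F test built from the two SSRs (simulated data; the choice of k, l, and a null that is true by construction are assumptions of the example):

```python
import numpy as np
from scipy.stats import f as f_dist

def ssr(X, y):
    """Sum of squared residuals of an OLS fit."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ b
    return u @ u

rng = np.random.default_rng(4)
n, k, l = 120, 4, 2                       # H0: beta_{l+1} = ... = beta_k = 0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X[:, :l + 1] @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

ssr_u = ssr(X, y)                         # unrestricted: all k regressors
ssr_r = ssr(X[:, :l + 1], y)              # restricted: first l regressors only
F = ((ssr_r - ssr_u) / (k - l)) / (ssr_u / (n - k - 1))
print(F, f_dist.sf(F, k - l, n - k - 1))  # large p expected: the null is true here
```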

Densities of F distributions

[Figure: densities of F distributions with 2 (black), 4 (blue), and 6 (green) numerator and 20 denominator degrees of freedom.]


Some remarks on the F test


- The F test is easily generalized to test general restrictions, such as

H0: β2 + β3 = 1, β5 = 3.61, β6 = 2β7,

as again there exist an SSRu and an SSRr. The main difficulty may be the estimation of the restricted model. Numerator degrees of freedom correspond to the number of linearly independent restrictions;
- For large n, the F statistic will be distributed like 1/(k − l) times a χ²(k − l) distribution;
- The F statistic for the exclusion of one regressor xj is the square of tβj.

The overall F test

A special F–test has the null hypothesis H0: β1 = . . . = βk = 0 and the alternative that at least one coefficient is non-zero. The statistic is

F = {(SST − SSR)/k} / {SSR/(n − k − 1)} = (R²/k) / {(1 − R²)/(n − k − 1)},

a transformation of the R². When it fails to reject, the regression model fails to provide a useful description of y. This is the only F–statistic that shows up in a standard regression printout.
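The two forms of the overall F statistic are algebraically identical, as a short sketch confirms (simulated data; all settings assumed):

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(5)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -0.5, 0.2]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
ssr = np.sum((y - X @ b) ** 2)
sst = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ssr / sst

F_ssr = ((sst - ssr) / k) / (ssr / (n - k - 1))     # SSR form
F_r2 = (r2 / k) / ((1.0 - r2) / (n - k - 1))        # R^2 form
print(F_ssr, F_r2, f_dist.sf(F_ssr, k, n - k - 1))  # same statistic, two forms
```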


The importance of F and t tests


F and t tests are restriction tests that are tools in searching for
the best specification of a regression equation—the best selection
of regressors—that determines the targeted dependent variable y .

I Only nested models can be compared. For example,


y = β0 + β1 x1 + u can be tested against
y = β0 + β1 x1 + β2 x2 + u but not against y = β0 + β1 x2 + u;
I In the specification search, it is often recommended to start
with a profligate model and to eliminate insignificant
regressors (backward elimination, general-to-specific) rather
than to add regressors from a ‘small’ model;
I The decisions of t–tests, say, for two coefficients βl , βj and of
the F –test for βl = βj = 0 are often in conflict. Some
researchers prefer the decision of the F –test in doubtful cases.

Ouch!

The following statements are regarded as incorrect:
- The tested null hypothesis is H0: β̂j = 0;
- The test is rejected;
- The alternative hypothesis can be rejected;
- The test is 2.55;
- The coefficient β4 is significant at 95% (unless someone really uses an unusual 95% significance level);
- The hypothesis that β4 is insignificant can be rejected.

OLS asymptotics

The probability limit


When talking about asymptotics, i.e. large-sample behavior,
statistical convergence concepts are needed. For convergence of a
sequence of random variables X1 , . . . , Xn , . . . to a fixed limit, we
use
Definition
A sequence of random variables (Xn ) is said to converge in
probability to θ ∈ R, in symbols plim Xn = θ iff for every ε > 0
n→∞

P(|Xn − θ| > ε) → 0 as n → ∞.

This concept is relatively weak, as it does not imply that single


realizations of the random variable sequence converge. It allows
simple rules, such as plim(Xn Yn ) = (plimX
.
n.)(plimY
.
n ).
. . . 49/65


Consistency of OLS
An estimator θ̂ for the parameter θ is called consistent iff

plim_{n→∞} θ̂(n) = θ,

with θ̂(n) denoting an estimate from a sample of size n. For OLS in the linear regression model, consistency holds under relatively weak conditions:

Theorem
Under assumptions (MLR.1)–(MLR.4) and some technical conditions, the OLS estimator β̂ is consistent for β, which implies that plim_{n→∞} β̂j = βj for j = 0, 1, . . . , k.


A sketch of the consistency issue

Consider

β̂ = β + (X′X)⁻¹X′u = β + (n⁻¹X′X)⁻¹ n⁻¹X′u.

Typically, the term n⁻¹X′X will converge to some kind of variance matrix. The term n⁻¹X′u should converge to its expectation E(X′u), which is 0 if X and u are uncorrelated and E(u) = 0. Thus, the condition

MLR.4’ E(u) = 0, cov(xj, u) = 0, for j = 1, . . . , k,

will suffice for consistency and can be substituted for the stronger assumption (MLR.4).
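Consistency can be made visible by letting n grow in a small Monte Carlo sketch (the design, the seed, and the use of t(5) errors to stress non-normality are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(6)
beta = np.array([1.0, 0.5])                  # assumed true parameters
for n in (50, 500, 5_000, 50_000):
    x = rng.normal(size=n)
    u = rng.standard_t(df=5, size=n)         # non-normal errors with mean 0
    y = beta[0] + beta[1] * x + u
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    print(n, np.round(b - beta, 4))          # estimation error shrinks toward 0
```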


Correlation of regressor and error is pretty bad

It was shown before that correlation between a regressor and the errors (for example, with omitted variables and with endogeneity) usually causes a bias in the sense of E(β̂) ≠ β. If (MLR.4’) is violated, this bias will not disappear even as n → ∞ and becomes an inconsistency. As Clive Granger said,

    If you can’t get it right as n goes to infinity, you shouldn’t be in the business.

This means that inconsistent estimators should not be used at all. Inconsistency is more serious than a finite-sample bias.


Asymptotic normality of the OLS estimator


A (celebrated) Central Limit Theorem can be used to prove:

Theorem
Under the Gauss-Markov assumptions (MLR.1)–(MLR.5) and some technical conditions, it holds that

(β̂j − βj) / ŝ.e.(β̂j) →d N(0, 1),

and generally that √n (β̂j − βj) →d N(0, σ²_{βj}), with σ²_{βj} determined either from the matrix formula σ²(X′X)⁻¹ or by the aforementioned construction from regressions among the regressors.


Remarks on the asymptotic normality of OLS

- Note that normality of the errors is not required: even for most non-normal error distributions, β̂ will approach a normal limit distribution;
- Under the assumptions of the theorem, σ̂² will converge to σ²;
- This latter convergence is of type ‘plim’, while the main result of the theorem uses convergence in distribution (→d), a weaker type of convergence. Convergence in distribution means that the distribution of a random variable converges to a limit distribution; nothing else is stated about the random variables proper.


Lagrange multiplier tests: the idea


Restriction tests (t and F) follow the Wald test principle, one of the three test construction principles used in parametric statistics. The other two are the likelihood-ratio (LR) and the Lagrange multiplier (LM) principles. LR and LM tests are typically asymptotic tests: their small-sample null distributions are uncertain, while their large-sample distributions will be regular (chi-square) even in the absence of (MLR.6).

The LM test estimates the model under the null and checks the increase in the likelihood when moving toward the alternative. It is also called the ‘score test’, as the derivative of the likelihood is called the score. Often, the LM test can be made operational in a sequence of regressions, with the test statistic simply calculated as nR² for a specific regression (the ‘auxiliary regression’).

The LM test for exclusion of variables


Consider the multiple regression model

yi = β0 + β1 x1,i + . . . + βk xk,i + ui

and the null hypothesis H0: βk−q+1 = . . . = βk = 0. Estimate the ‘restricted regression model’

yi = β0 + β1 x1,i + . . . + βk−q xk−q,i + ui

by OLS and keep the residuals ũ. Then, regress these ũ on all k regressors:

ũi = γ0 + γ1 x1,i + . . . + γk xk,i + vi.

The nR² from this second, auxiliary regression is the LM test statistic. Under H0, it is asymptotically distributed as χ²(q).
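A sketch of the nR² procedure (simulated data with a null that is true by construction; all settings are assumptions):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
n, k, q = 200, 3, 2                           # test exclusion of the last q regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X[:, :k - q + 1] @ np.array([1.0, 0.5]) + rng.normal(size=n)

def residuals(X, y):
    """OLS residuals."""
    return y - X @ np.linalg.solve(X.T @ X, X.T @ y)

u_tilde = residuals(X[:, :k - q + 1], y)      # restricted model residuals
v = residuals(X, u_tilde)                     # auxiliary regression on ALL k regressors
r2_aux = 1.0 - (v @ v) / np.sum((u_tilde - u_tilde.mean()) ** 2)
LM = n * r2_aux
print(LM, chi2.sf(LM, q))                     # compare with the chi^2(q) distribution
```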
Selection of regressors in the multiple model

Model selection: the main issue


The typical situation in multiple regression is that y has been
specified a priori, and that the researcher looks for the optimal set
of regressors that offer the best explanation for y . Tools for this
specification search or regressor selection are:
I R 2 and R̄ 2 can be used for comparing any two or more
models, but tend to increase with adding any regressors;
I F and t tests can only be used for comparing nested models,
and lengthy search sequences tend to invalidate the
significance level;
I Information criteria such as AIC and BIC can compare any
two or more models and penalize complexity;
I Specification tests can be used to eliminate ill-specified
models but cannot find the optimal model.

Adjusted R²

The corrected or adjusted R², often denoted R̄² or R²c, is defined as

R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1).

It holds that R̄² ≤ R². If R² is seen as an estimator for corr²(y, β′x), then the bias of R̄² is smaller than the bias of R².

R² always increases if a new regressor is included in the regression. R̄² increases if the t–ratio for the additional variable is larger than 1, which corresponds to testing at an enormous significance level. It cannot be used for serious model selection.
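A one-function sketch of the adjustment (the function name is mine, not the slides’):

```python
def r2_adjusted(r2: float, n: int, k: int) -> float:
    """Adjusted R^2: penalizes added regressors via the degrees-of-freedom ratio."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

# Example: a regressor that barely raises R^2 can lower the adjusted version.
print(r2_adjusted(0.500, 50, 3), r2_adjusted(0.505, 50, 4))
```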


Penalizing complexity
Consider the estimated error variance

σ̂² = (1/(n − k − 1)) ∑_{i=1}^n ûi² = SSR/(n − k − 1),

which, just like R̄², improves (here: decreases) if a new regressor with a t–ratio greater than one is added. Thus, it cannot be used for serious model selection either. It takes a step in the right direction, however, by embodying a trade-off: the numerator improves (decreases) with increasing complexity, while the denominator deteriorates (decreases) with higher complexity. This idea is pursued by information criteria, which impose a stronger penalty on complexity, strong enough for useful model selection.

The AIC according to Akaike

Akaike introduced the AIC (‘An Information Criterion’), in one possible version

AIC = log σ̂² + 2(k + 1)/n,

which is to be minimized: complexity decreases the first term and increases the second. (In information criteria, σ̂² should be formed using the scale n, not n − k − 1.) In nested comparisons, minimizing AIC corresponds to t or F tests at an approximate 15% significance level. For n → ∞, minimizing AIC selects the best forecasting model, which tends to keep slightly more regressors than those with non-zero coefficients.


The BIC according to Schwarz


Schwarz simplified the BIC that had been introduced by
Akaike, in one version
(k + 1) log n
BIC = log σ̂ 2 + ,
n
which is to be minimized. The BIC complexity penalty is stronger
than the AIC penalty, so selected models tend to be more
parsimonious (smaller). In nested comparisons, minimizing BIC
corresponds to a significance level falling to 0 as n → ∞. For
n → ∞, BIC will select the ‘true’ model, exactly keeping all
regressors with non-zero coefficients. In smaller samples, BIC tends
to select too parsimonious models.
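A sketch computing both criteria in the versions given above, with σ̂² scaled by n as the AIC slide notes (the helper and its name are mine, not the slides’):

```python
import numpy as np

def info_criteria(X, y):
    """AIC and BIC in the slides' versions, with sigma^2_hat = SSR / n."""
    n, k1 = X.shape                            # k1 = k + 1 parameters incl. intercept
    b = np.linalg.solve(X.T @ X, X.T @ y)
    sigma2_hat = np.sum((y - X @ b) ** 2) / n  # scale n, not n - k - 1
    aic = np.log(sigma2_hat) + 2 * k1 / n
    bic = np.log(sigma2_hat) + k1 * np.log(n) / n
    return aic, bic

# Usage: evaluate each candidate regressor set and keep the minimizer.
# for cols in candidate_sets: print(cols, info_criteria(X[:, cols], y))
```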
