EC1 Slides Part4

4 Finite-Sample Analysis [Greene, Wooldridge]

4.1 Finite Sample Properties of b

The formula for b as a function of the random sample is an estimator and thus it is random.
After applying the OLS formula to the numeric values of y and X, b is an estimate (not
random). We use the same letter b to represent both the estimator and the estimate; the
context should make clear which one is meant. In this section b is the OLS estimator (thus
it is random).

Theorem 4.1.1 (Finite-Sample Properties of b). We have:

(a) (Unbiasedness) Under Assumptions FS1-FS3, $E(b \mid X) = \beta$ and $E(b) = \beta$.

(b) (Expression for the variance) Under Assumptions FS1-FS4, $Var(b \mid X) = \sigma^2 (X'X)^{-1}$
and $Var(b) = \sigma^2 E\left[(X'X)^{-1}\right]$.

(c) (Gauss-Markov Theorem) Under Assumptions FS1-FS4, the OLS estimator is efficient in
the class of linear unbiased estimators (also called Minimum Variance Linear Unbiased
Estimator, MVLU, or Best Linear Unbiased Estimator, BLUE). That is, for any unbiased
estimator $\hat\beta$ that is linear in y, $Var(b \mid X) \le Var(\hat\beta \mid X)$ in the matrix sense (i.e.
$Var(\hat\beta \mid X) - Var(b \mid X)$ is a positive semidefinite matrix).

4.1.1 Unbiased Estimation

It can be proved under Assumptions FS1-FS3 that [board]

$$E(b \mid X) = \beta, \qquad E(b) = \beta.$$

However, if FS3 does not hold the estimator is biased. This is obvious from the representation

$$E(b \mid X) = \beta + (X'X)^{-1} X' E(\varepsilon \mid X).$$

See the following section for an example where $E(\varepsilon \mid X) \neq 0$.
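A minimal simulation sketch (not part of the original slides; the data-generating process is assumed) illustrating unbiasedness under FS1-FS3: averaging the OLS estimates over many samples drawn from a model with known $\beta$ gives values close to $\beta$.

import numpy as np

rng = np.random.default_rng(0)
n, R = 100, 5000
beta = np.array([1.0, 2.0, -0.5])            # true parameters (assumed for the illustration)

estimates = np.empty((R, 3))
for r in range(R):
    x2 = rng.normal(size=n)
    x3 = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    eps = rng.normal(size=n)                 # E(eps | X) = 0, so FS3 holds
    y = X @ beta + eps
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)   # OLS: b = (X'X)^{-1} X'y

print(estimates.mean(axis=0))                # close to beta, as unbiasedness predicts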

4.1.2 Bias Caused by Omission of Variables (Correlated with X1, i.e. X1'X2 not equal to 0)

The correct specification:

$$y = X\beta + \varepsilon = X_1 \beta_1 + X_2 \beta_2 + \varepsilon$$

If we regress y on X1 (we omit X2) then

$$b_1 = (X_1'X_1)^{-1} X_1' y = (X_1'X_1)^{-1} X_1' (X_1\beta_1 + X_2\beta_2 + \varepsilon)
= \beta_1 + (X_1'X_1)^{-1} X_1'X_2\, \beta_2 + (X_1'X_1)^{-1} X_1'\varepsilon.$$

Hence

$$E(b_1 \mid X) = \beta_1 + (X_1'X_1)^{-1} X_1'X_2\, \beta_2.$$

Unless $X_1'X_2 = 0$ and/or $\beta_2 = 0$, $b_1$ is biased.

If $X_1'X_2 \neq 0$ but the variables are uncorrelated, then only the coefficient of the intercept is
affected (see section 3.3.2). In general this is not a matter of concern, as usually we focus on
the other coefficients. Thus, in practical situations the issue is whether the omitted variables
are:

relevant (i.e. $\beta_2 \neq 0$ using the notation of the previous slide), or

correlated with the variables of the model.

If we regress y on X2 (we omit X1) then a similar situation occurs:
$$E(b_2 \mid X) = \beta_2 + (X_2'X_2)^{-1} X_2'X_1\, \beta_1.$$
The sketch below illustrates the resulting bias in a small simulation.
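A minimal sketch (assumed data-generating process, not from the slides) showing the omitted-variable bias formula at work: x2 and x3 are correlated, so regressing y on x2 alone biases the coefficient on x2.

import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x2 = rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)           # x3 correlated with x2 (X1'X2 != 0)
eps = rng.normal(size=n)
y = 1.0 + 2.0 * x2 + 1.5 * x3 + eps          # true model: beta2 = 2, beta3 = 1.5

X_full = np.column_stack([np.ones(n), x2, x3])
X_short = np.column_stack([np.ones(n), x2])  # x3 omitted

b_full = np.linalg.solve(X_full.T @ X_full, X_full.T @ y)
b_short = np.linalg.solve(X_short.T @ X_short, X_short.T @ y)

print(b_full[1])     # close to 2 (correct specification)
print(b_short[1])    # close to 2 + 0.8 * 1.5 = 3.2 (biased upward, as the formula predicts)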

4.1.3 The Variance of b

We have
$$b = (X'X)^{-1} X'y = (X'X)^{-1} X'(X\beta + \varepsilon) = \beta + (X'X)^{-1} X'\varepsilon$$
and therefore
$$
\begin{aligned}
Var(b \mid X) &= Var\!\left(\beta + (X'X)^{-1} X'\varepsilon \mid X\right) \\
&= (X'X)^{-1} X'\, Var(\varepsilon \mid X)\, X (X'X)^{-1} \quad \text{[Revisões, 29(a)]} \\
&= (X'X)^{-1} X'\, \sigma^2 I\, X (X'X)^{-1} \quad \text{(FS4: } Var(\varepsilon \mid X) = \sigma^2 I\text{)} \\
&= \sigma^2 (X'X)^{-1}.
\end{aligned}
$$
Finally,
$$Var(b) = E\left(Var(b \mid X)\right) + Var\left(E(b \mid X)\right) = \sigma^2 E\!\left[(X'X)^{-1}\right].$$

4.1.4 Decomposition of Var (bk )

Note:
$$Var(b \mid X) = \sigma^2 (X'X)^{-1} =
\begin{bmatrix}
Var(b_1 \mid X) & Cov(b_1, b_2 \mid X) & \cdots & Cov(b_1, b_K \mid X) \\
Cov(b_1, b_2 \mid X) & Var(b_2 \mid X) & \cdots & Cov(b_2, b_K \mid X) \\
\vdots & \vdots & \ddots & \vdots \\
Cov(b_1, b_K \mid X) & Cov(b_2, b_K \mid X) & \cdots & Var(b_K \mid X)
\end{bmatrix}$$

It can be proved that $Var(b_k \mid X)$, $k = 2, \ldots, K$, can be written as

$$Var(b_k \mid X) = \frac{\sigma^2}{\left(1 - R_k^2\right) \sum_{i=1}^{n} (x_{ik} - \bar{x}_k)^2}
= \frac{\sigma^2}{\left(1 - R_k^2\right) S_{x_k}^2\, n}$$

where $R_k^2$ is the coefficient of determination in the auxiliary regression of $x_{ik}$ on the remaining
explanatory variables, and $S_{x_k}^2 = \sum_i (x_{ik} - \bar{x}_k)^2 / n$. For example, if $k = K$ the auxiliary
regression is $x_{iK} = \delta_1 + \delta_2 x_{i2} + \ldots + \delta_{K-1} x_{i,K-1} + u_i$.

Let us see again the expression:

$$Var(b_k \mid X) = \frac{\sigma^2}{\left(1 - R_k^2\right) \sum_i (x_{ik} - \bar{x}_k)^2}
= \frac{\sigma^2}{\left(1 - R_k^2\right) S_{x_k}^2\, n}.$$

We can conclude that the precision of $b_k$ is high (i.e. $Var(b_k \mid X)$ is small) when:

$\sigma^2$ is low;

$S_{x_k}^2$ is high. For example, in the regression
$$wage = \beta_1 + \beta_2\, educ + \ldots + \varepsilon,$$
if most people (in the sample) report approximately the same education, $S_{x_k}^2$ will be low
and $\beta_2$ will be estimated very imprecisely. Little can be learned about the relationship
between wages and education if the majority of people interviewed report the same level
of education;

$n$ is high (a large sample is preferable to a small sample);

$R_k^2$ is low (multicollinearity increases $R_k^2$).

The sketch below verifies this decomposition numerically.
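A short sketch (simulated data; the error variance is assumed known for the illustration) checking that the diagonal element of $\sigma^2 (X'X)^{-1}$ coincides with $\sigma^2 / \left[(1 - R_k^2) S_{x_k}^2 n\right]$ computed from the auxiliary regression.

import numpy as np

rng = np.random.default_rng(2)
n = 500
x2 = rng.normal(size=n)
x3 = 0.7 * x2 + rng.normal(size=n)             # correlated regressors, so R_k^2 > 0
X = np.column_stack([np.ones(n), x2, x3])
sigma2 = 4.0                                    # assumed (known) error variance

# Direct formula: the diagonal element of sigma^2 (X'X)^{-1} corresponding to x3
var_b3_direct = (sigma2 * np.linalg.inv(X.T @ X))[2, 2]

# Decomposition: auxiliary regression of x3 on the remaining explanatory variables
Z = np.column_stack([np.ones(n), x2])
resid = x3 - Z @ np.linalg.solve(Z.T @ Z, Z.T @ x3)
S2_x3 = ((x3 - x3.mean()) ** 2).mean()          # S^2_{x_k}
R2_aux = 1 - (resid @ resid) / (n * S2_x3)      # R_k^2 of the auxiliary regression
var_b3_decomp = sigma2 / ((1 - R2_aux) * S2_x3 * n)

print(var_b3_direct, var_b3_decomp)             # the two numbers coincide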

4.1.5 Multicollinearity [Wooldridge, 2013]

If rank(X) < K then b is not defined. This is called strict multicollinearity. When this
happens, the statistical software will be unable to construct $(X'X)^{-1}$. Since the error is
discovered quickly, this is rarely a problem for applied econometric practice.

The more relevant situation is near multicollinearity, which is often called "multicollinearity"
for brevity. This is the situation where $X'X$ is near singular, i.e. the columns of X are
close to linearly dependent.

Consequence: the individual coefficient estimates will be imprecise. We have shown that
$$Var(b_K \mid X) = \frac{\sigma^2}{\left(1 - R_K^2\right) S_{x_K}^2\, n}$$
where $R_K^2$ is the coefficient of determination in the auxiliary regression
$$x_{iK} = \delta_1 + \delta_2 x_{i2} + \ldots + \delta_{K-1} x_{i,K-1} + u_i.$$

Therefore, multicollinearity translates into high values of $R_K^2$, which inflates $Var(b_K \mid X)$.

As a consequence:

Small changes in the data produce wide swings in the parameter estimates.

Coefficients may have the "wrong" sign or implausible magnitudes.

Coefficients may have very high standard errors and low significance levels even though
they are jointly significant and the $R^2$ of the regression is quite high.

And:

More data is a remedy for multicollinearity. Why?

A high degree of correlation between certain independent variables can be irrelevant to how
well we can estimate other parameters in the model. For example, in a model with K = 4,
suppose that $x_{i2}$ and $x_{i3}$ are strongly correlated with each other, but neither is
correlated with $x_{i4}$. Then the precision of $b_4$ does not suffer from the correlation
between $x_{i2}$ and $x_{i3}$ (see the sketch below). If $x_{i2}$ and $x_{i3}$ are merely control
variables, the "multicollinearity problem" is not even a problem. The problem would arise if the
control variables were removed.

Multicollinearity may make estimates of individual $\beta_j$'s imprecise, while still allowing
precise estimation of particular combinations of the elements of $\beta$ (see the exercises).
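A small simulation sketch (assumed design, not from the slides) illustrating the last point: strong correlation between x2 and x3 inflates their standard errors but leaves the standard error of b4 essentially unchanged.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x4 = rng.normal(size=n)

def std_errors(rho):
    # x2 and x3 share a common component when rho > 0; both are uncorrelated with x4
    common = rng.normal(size=n)
    x2 = rho * common + np.sqrt(1 - rho**2) * rng.normal(size=n)
    x3 = rho * common + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = 1 + x2 + x3 + x4 + rng.normal(size=n)
    X = sm.add_constant(np.column_stack([x2, x3, x4]))
    return sm.OLS(y, X).fit().bse            # standard errors of (const, b2, b3, b4)

print(std_errors(rho=0.0))
print(std_errors(rho=0.95))   # se(b2) and se(b3) blow up; se(b4) stays about the same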

4.1.6 The Gauss-Markov Theorem

The Gauss-Markov Theorem (Theorem 4.1.1, part (c)) states that under Assumptions FS1-
FS4, the OLS estimator is efficient in the class of linear unbiased estimators (also called
Minimum Variance Linear Unbiased Estimator, MVLU, or Best Linear Unbiased Estimator,
BLUE). That is, for any unbiased estimator $\hat\beta$ that is linear in y, $Var(b \mid X) \le Var(\hat\beta \mid X)$
in the matrix sense (i.e. $Var(\hat\beta \mid X) - Var(b \mid X)$ is a positive semidefinite matrix).

Proof: Let $\hat\beta$ be another linear unbiased estimator of $\beta$. It can be proved [board] that
$$Var(\hat\beta \mid X) - Var(b \mid X)$$
is a positive semidefinite matrix.

4.1.7 Estimating Var(b | X)

If we wish to test hypotheses about $\beta$ or to form confidence intervals, then we will require
a sample estimate of the covariance matrix
$$Var(b \mid X) = \sigma^2 (X'X)^{-1}.$$

The part $(X'X)^{-1}$ is known, but $\sigma^2$ is not. Note that $\sigma^2 = E(\varepsilon_i^2)$,
hence a possible but unfeasible estimator for $\sigma^2$ is
$$\frac{1}{n} \sum_{i=1}^{n} \varepsilon_i^2.$$

It is unfeasible because $\varepsilon_i$ is unobservable. Let
$$s^2 = \frac{1}{n - K} \sum_{i=1}^{n} e_i^2.$$

The standard error of the regression is s. It can be proved that [board]
$$E(s^2 \mid X) = \sigma^2.$$
Thus
$$\widehat{Var}(b \mid X) = s^2 (X'X)^{-1}.$$

The square root of the k-th diagonal element of this matrix, $\left[s^2 (X'X)^{-1}\right]_{kk}^{1/2}$, is the
standard error of the estimator $b_k$, which is often denoted simply "the standard error of $b_k$"
or $\hat\sigma_{b_k}$. For example,
$$\widehat{Var}(b \mid X) =
\begin{bmatrix}
\widehat{Var}(b_1 \mid X) & \widehat{Cov}(b_1, b_2 \mid X) \\
\widehat{Cov}(b_1, b_2 \mid X) & \widehat{Var}(b_2 \mid X)
\end{bmatrix}
=
\begin{bmatrix}
2.1 & 0.5 \\
0.5 & 3.2
\end{bmatrix}$$
and
$$\hat\sigma_{b_1} = \sqrt{2.1}, \qquad \hat\sigma_{b_2} = \sqrt{3.2}.$$
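A brief sketch (simulated data, illustrative parameter values) computing $s^2$ and the standard errors by hand and checking them against statsmodels.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, K = 100, 3
X = sm.add_constant(rng.normal(size=(n, K - 1)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)        # OLS estimate
e = y - X @ b                                 # residuals
s2 = (e @ e) / (n - K)                        # s^2 = e'e / (n - K)
V_hat = s2 * np.linalg.inv(X.T @ X)           # estimated Var(b | X)
se_manual = np.sqrt(np.diag(V_hat))

res = sm.OLS(y, X).fit()
print(se_manual)
print(res.bse)                                # statsmodels reports the same standard errors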

4.1.8 The Normality Assumption

The normality assumption FS5 allows us to conclude [board]
$$\varepsilon \mid X \sim N(0, \sigma^2 I) \;\Rightarrow\; b \mid X \sim N\!\left(\beta, \sigma^2 (X'X)^{-1}\right).$$

In particular,
$$b_k \mid X \sim N\!\left(\beta_k, \sigma^2 \left[(X'X)^{-1}\right]_{kk}\right) = N\!\left(\beta_k, \sigma_{b_k}^2\right),
\qquad \text{or} \qquad z_k = \frac{b_k - \beta_k}{\sigma_{b_k}} \sim N(0, 1).$$

We will see in the section "Hypothesis Testing" that the Normality Assumption leads to the
following result:
$$t_k = \frac{b_k - \beta_k}{\hat\sigma_{b_k}} \sim t(n - K).$$

The normal distribution of b in a finite sample is a consequence of our specific assumption
of normally distributed disturbances. What if the distribution of $\varepsilon$ is unknown? We must
rely on asymptotic theory (covered later).

4.2 Finite Sample - Statistical Inference under Normality [Hayashi, Goldberger & Wooldridge]

Recall the FS1-FS5 hypotheses:

Assumption (FS1 - Linearity). $y_i = \beta_1 + \beta_2 x_{i2} + \ldots + \beta_K x_{iK} + \varepsilon_i$.
Assumption (FS2 - Full rank). $rank(X) = K$.
Assumption (FS3 - Exogeneity of the independent variables). $E(\varepsilon_i \mid x_{j1}, \ldots, x_{jK}) = 0$.
Assumption (FS4 - Homoscedasticity and Nonautocorrelation). $Var(\varepsilon \mid X) = \sigma^2 I$.
Assumption (FS5 - Normal Distribution). $\varepsilon_i \mid X \sim$ Normal.

It can be proved that under assumptions FS1-FS5

$$t_k = \frac{b_k - \beta_k}{\hat\sigma_{b_k}} \sim t(n - K),$$

$$t = \frac{Rb - R\beta}{s \sqrt{R (X'X)^{-1} R'}} \sim t(n - K), \qquad R \text{ of size } 1 \times K,$$

$$F = (Rb - r)' \left[R (X'X)^{-1} R'\right]^{-1} (Rb - r) \big/ (p s^2) \sim F(p, n - K),$$

where R is a matrix of size $p \times K$ (with p linearly independent rows) and r is a $p \times 1$ vector.

The statistic F can give, for example, the distribution of
$$\begin{pmatrix} b_2 - b_3 \\ b_1 + b_2 + 3 b_3 \end{pmatrix}
= \begin{pmatrix} 0 & 1 & -1 \\ 1 & 1 & 3 \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = Rb$$

(after Rb is properly standardized according to the formula for F).

4.2.1 Testing on a Single Parameter

Suppose that we have a hypothesis about the k-th regression coefficient:
$$H_0: \beta_k = \beta_k^0$$
($\beta_k^0$ is a specific value, e.g. zero), and that this hypothesis is tested against the alternative
hypothesis
$$H_1: \beta_k \neq \beta_k^0.$$
Under the null hypothesis we have
$$t_k^0 = \frac{b_k - \beta_k^0}{\hat\sigma_{b_k}} \sim t(n - K).$$
If we observe $|t_{obs}| > t_{1-\alpha/2}$ and $H_0$ is true, then a low-probability event has occurred.
We take $|t_{obs}| > t_{1-\alpha/2}$ as evidence against the null, and the decision should be to reject
$H_0$.

Other cases:

$H_0: \beta_k = \beta_k^0$ vs. $H_1: \beta_k > \beta_k^0$:
if $t_{obs} > t_{1-\alpha}$ then reject $H_0$ at the $\alpha \cdot 100\%$ level; otherwise do not reject $H_0$.

$H_0: \beta_k = \beta_k^0$ vs. $H_1: \beta_k < \beta_k^0$:
if $t_{obs} < t_{\alpha} = -t_{1-\alpha}$ then reject $H_0$ at the $\alpha \cdot 100\%$ level; otherwise do not reject $H_0$.

Using the p-value to decide between H0 and H1

The p-value (or p) is the probability of obtaining a test statistic at least as extreme as the one
actually observed, assuming that the null hypothesis is true. p is an informal measure of
evidence about the null hypothesis. Calculating the p-value:

$H_0: \beta_k = \beta_k^0$ vs. $H_1: \beta_k \neq \beta_k^0$ $\rightarrow$ p-value $= 2 P\!\left(t_k^0 > |t_{obs}| \mid H_0 \text{ is true}\right)$.

$H_0: \beta_k = \beta_k^0$ vs. $H_1: \beta_k > \beta_k^0$ $\rightarrow$ p-value $= P\!\left(t_k^0 > t_{obs} \mid H_0 \text{ is true}\right)$.

$H_0: \beta_k = \beta_k^0$ vs. $H_1: \beta_k < \beta_k^0$ $\rightarrow$ p-value $= P\!\left(t_k^0 < t_{obs} \mid H_0 \text{ is true}\right)$.

In a Chi-Square and F test $\rightarrow$ p-value $= P\!\left(F^0 > F_{obs} \mid H_0 \text{ is true}\right)$.

For example, a p-value of 0.02 shows little evidence supporting $H_0$: at the 5% level you
should reject $H_0$. Rejection rule:

p-value $> \alpha$: do not reject $H_0$ at the $\alpha \cdot 100\%$ significance level.

p-value $\le \alpha$: reject $H_0$ at the $\alpha \cdot 100\%$ significance level.
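A minimal sketch (hypothetical numbers) of how these p-values can be computed from an observed t statistic with scipy.

from scipy import stats

t_obs, df = 2.3, 84          # illustrative values: observed t statistic and n - K degrees of freedom

p_two_sided = 2 * stats.t.sf(abs(t_obs), df)   # H1: beta_k != beta_k^0
p_right     = stats.t.sf(t_obs, df)            # H1: beta_k >  beta_k^0
p_left      = stats.t.cdf(t_obs, df)           # H1: beta_k <  beta_k^0

print(p_two_sided, p_right, p_left)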

In most applications, our primary interest lies in testing the null hypothesis
$$H_0: \beta_j = 0$$
where j corresponds to any of the K independent variables. Since $\beta_j$ measures the partial
effect of $x_{ij}$ (or $x_j$) on the expected value of y, after controlling for all other independent
variables, $\beta_j = 0$ means that, once the other variables have been accounted for, $x_{ij}$ has no
effect on the expected value of y.

For example, in
$$\log wage_i = \beta_1 + \beta_2\, educ_i + \beta_3\, exper_i + \beta_4\, tenure_i + \varepsilon_i,$$
the null hypothesis $H_0: \beta_3 = 0$ means that, once education and tenure have been
accounted for, the number of years in the workforce (exper) has no effect on hourly wage.
If it is true, it implies that a person's work history prior to the current employment does not
affect wage. If $\beta_3 > 0$, then prior work experience contributes to productivity, and hence to
wage.

4.2.2 Issues in Hypothesis Testing

Correct wording in reporting the outcome of a test involving $H_0: \beta_k = \beta_k^0$ vs. $H_1: \beta_k \neq \beta_k^0$:

When the null is rejected we say that $b_k$ (not $\beta_k$) is significantly different from $\beta_k^0$ at the
$\alpha \cdot 100\%$ level.

When the null isn't rejected we say that $b_k$ (not $\beta_k$) is not significantly different from
$\beta_k^0$ at the $\alpha \cdot 100\%$ level.

Correct wording in reporting the outcome of a test involving $H_0: \beta_k = 0$ vs. $H_1: \beta_k \neq 0$:

When the null is rejected we say that $b_k$ (not $\beta_k$) is significantly different from zero at
the $\alpha \cdot 100\%$ level, or the variable (associated with $b_k$) is statistically significant at the
$\alpha \cdot 100\%$ level.

When the null isn't rejected we say that $b_k$ (not $\beta_k$) is not significantly different from
zero at the $\alpha \cdot 100\%$ level, or the variable is not statistically significant at the $\alpha \cdot 100\%$ level.

More Remarks:

Rejection of the null is not proof that the null is false. Why?

Acceptance of the null is not proof that the null is true. Why? We prefer to use the
language "we fail to reject H0 at the x% level" rather than "H0 is accepted at the x%
level."

In a test of type $H_0: \beta_k = \beta_k^0$, if $\hat\sigma_{b_k}$ is large ($b_k$ is an imprecise estimator) it is more
difficult to reject the null. The sample contains little information about the true value
of the parameter $\beta_k$. Remember that $\hat\sigma_{b_k}$ depends on $\sigma^2$, $S_{x_k}^2$, $n$ and $R_k^2$.

Python OLS. In the cases $H_0: \beta_k = \beta_k^0$ vs. $H_1: \beta_k > \beta_k^0$ or $H_0: \beta_k = \beta_k^0$ vs.
$H_1: \beta_k < \beta_k^0$, divide the reported (two-sided) p-value by two.
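A short sketch of that rule with statsmodels (illustrative simulated data; halving the reported p-value is appropriate when the estimate lies on the side of the alternative).

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.normal(size=100)
y = 1 + 0.3 * x + rng.normal(size=100)
res = sm.OLS(y, sm.add_constant(x)).fit()

p_two_sided = res.pvalues[1]                      # statsmodels reports the two-sided p-value
# H0: beta = 0 vs. H1: beta > 0; halve the p-value when the estimated coefficient is positive
p_one_sided = p_two_sided / 2 if res.params[1] > 0 else 1 - p_two_sided / 2
print(p_two_sided, p_one_sided)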

Statistical Versus Economic Significance

The statistical significance of a variable is determined by the size of $t_{obs} = b_k / \hat\sigma_{b_k}$, whereas
the economic significance of a variable is related to the size and sign of $b_k$.

Example 4.2.1. Suppose that in a business activity we have
$$\widehat{\log(wage_i)} = .1 + \underset{(0.001)}{0.005}\, female_i + \ldots, \qquad n = 600.$$
$H_0: \beta_2 = 0$ vs. $H_1: \beta_2 \neq 0$. We have:
$$t_2^0 = \frac{b_2}{\hat\sigma_{b_2}} \sim t(600 - K) \approx N(0, 1) \quad \text{(under the null)},$$
$$t_{obs} = \frac{0.005}{0.001} = 5, \qquad
\text{p-value} = 2 P\!\left(t_2^0 > |5| \mid H_0 \text{ is true}\right) \approx 0.$$
Discuss statistical versus economic significance.

Example 4.2.2. Consider
$$\log(price_i) = \beta_1 + \beta_2 \log(lotsize_i) + \beta_3 \log(sqrft_i) + \beta_4\, bdrms_i + \varepsilon_i$$
where price: house price in $1000s; bdrms: number of bedrooms; lotsize: size of lot in square
feet; sqrft: size of house in square feet (py-file: 04_hprice1.py). Test $H_0: \beta_4 = 0$ vs.
$H_1: \beta_4 > 0$.

OLS Regression Results


==============================================================================
Dep. Variable: lprice R-squared: 0.643
Model: OLS Adj. R-squared: 0.630
Method: Least Squares F-statistic: 50.42
Date: nan Prob (F-statistic): 9.74e-19
Time: nan Log-Likelihood: 25.861
No. Observations: 88 AIC: -43.72
Df Residuals: 84 BIC: -33.81
Df Model: 3
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
c -1.2970 0.651 -1.992 0.050 -2.592 -0.002
llotsize 0.1680 0.038 4.388 0.000 0.092 0.244
lsqrft 0.7002 0.093 7.540 0.000 0.516 0.885
bdrms 0.0370 0.028 1.342 0.183 -0.018 0.092
==============================================================================
Omnibus: 12.060 Durbin-Watson: 2.089
Prob(Omnibus): 0.002 Jarque-Bera (JB): 34.890
Skew: -0.188 Prob(JB): 2.65e-08
Kurtosis: 6.062 Cond. No. 410.
==============================================================================

Note: The p-value = 0.183 concerns the test $H_0: \beta_4 = 0$ vs. $H_1: \beta_4 \neq 0$.

4.2.3 Test on a Set of Parameters I

Suppose that we have a joint null hypothesis about $\beta$:
$$H_0: R\beta = r \quad \text{vs.} \quad H_1: R\beta \neq r,$$
where R is $p \times K$. The test statistic is
$$F^0 = (Rb - r)' \left[R (X'X)^{-1} R'\right]^{-1} (Rb - r) \big/ (p s^2).$$

Let $F_{obs}$ be the observed test statistic. We have:

reject $H_0$ if $F_{obs} > F_{1-\alpha}$ (or if p-value $< \alpha$);

do not reject $H_0$ if $F_{obs} \le F_{1-\alpha}$.

The reasoning is as follows. Under the null hypothesis we have
$$F^0 \sim F(p, n - K).$$
If we observe $F_{obs} > F_{1-\alpha}$ and $H_0$ is true, then a low-probability event has occurred.

In the case p = 1 (a single linear combination of the elements of $\beta$) one may use the test
statistic
$$t^0 = \frac{Rb - R\beta}{s \sqrt{R (X'X)^{-1} R'}} \sim t(n - K).$$

4.2.4 Test on a Set of Parameters II

We focus on another way to test
$$H_0: R\beta = r \quad \text{vs.} \quad H_1: R\beta \neq r.$$
Note: R is $p \times K$. It can be proved that
$$F^0 = (Rb - r)' \left[R (X'X)^{-1} R'\right]^{-1} (Rb - r) \big/ (p s^2)
= \frac{\left(e^{*\prime} e^{*} - e'e\right) / p}{e'e / (n - K)}
= \frac{\left(R^2 - R^{*2}\right) / p}{\left(1 - R^2\right) / (n - K)} \sim F(p, n - K)$$
where $*$ refers to the short regression, i.e. the regression subject to the constraint $R\beta = r$.

Example 4.2.3. Consider Example 4.2.2 and $H_0: \beta_3 = 3\beta_2$ (i.e. $\beta_3 - 3\beta_2 = 0$). Write R and r (04_hprice1.py).

Example 4.2.4. Consider Example 4.2.2 and $H_0: \beta_3 = 3\beta_2$. Since p = 1 we may use
the t-test.
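A hedged sketch of how Examples 4.2.3-4.2.4 can be run with statsmodels; the CSV path is a placeholder and the column names (lprice, llotsize, lsqrft, bdrms) are assumed to match those built in 04_hprice1.py. The restriction is passed as a string constraint.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("hprice1.csv")   # hypothetical path; 04_hprice1.py builds the same variables
res = smf.ols("lprice ~ llotsize + lsqrft + bdrms", data=df).fit()

# H0: beta_lsqrft - 3 * beta_llotsize = 0  (p = 1, so the t-test and F-test are equivalent)
print(res.t_test("lsqrft - 3*llotsize = 0"))
print(res.f_test("lsqrft - 3*llotsize = 0"))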

Example 4.2.5. Consider Example 4.2.2 and test $H_0: \beta_3 = 3\beta_2$ against $H_1: \beta_3 \neq 3\beta_2$
(04_hprice1.py) using
$$F^0 = \frac{\left(e^{*\prime} e^{*} - e'e\right) / p}{e'e / (n - K)}.$$

-------------------------------
n = 88.0
K = 4
ssr0 (under H0) 2.9103860099383025
ssr1 (under H1) 2.8625632351852044
-------------------------------
Fobs = 1.4033272802095296
p-value = 0.23950822209892753
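A few lines (using the numbers printed above) showing how Fobs and the p-value follow from the restricted and unrestricted sums of squared residuals.

from scipy import stats

n, K, p = 88, 4, 1
ssr0, ssr1 = 2.9103860099383025, 2.8625632351852044   # restricted and unrestricted SSR from the output

F_obs = ((ssr0 - ssr1) / p) / (ssr1 / (n - K))
p_value = stats.f.sf(F_obs, p, n - K)
print(F_obs, p_value)          # ~1.4033 and ~0.2395, matching the output above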

In the case "all slopes zero" (test of global significance of the regression),
$$H_0: \beta_2 = \beta_3 = \ldots = \beta_K = 0,$$
it can be proved that $F^0$ equals
$$F^0 = \frac{R^2 / (K - 1)}{\left(1 - R^2\right) / (n - K)}.$$
Under the null we have $F^0 \sim F(K - 1, n - K)$.

Example 4.2.6. Consider Example 4.2.2 and test the global significance of the regression,
i.e.
$$H_0: \beta_2 = \beta_3 = \beta_4 = 0 \quad \text{vs.} \quad H_1: H_0 \text{ is false}.$$
From the output we have:
$$F_{obs} = 50.42 \quad \text{and} \quad \text{p-value} \simeq 0.$$
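As a quick check (values taken from the regression output of Example 4.2.2), the overall F statistic can be recomputed from the reported R-squared.

from scipy import stats

n, K = 88, 4
R2 = 0.643                                      # R-squared from the regression output

F_obs = (R2 / (K - 1)) / ((1 - R2) / (n - K))
print(F_obs, stats.f.sf(F_obs, K - 1, n - K))   # ~50.4 and a p-value of essentially zero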

4.3 Prediction and Forecasting

A common use of regression modeling is for prediction or forecasting of the dependent variable.

Prediction involves using the regression model to compute fitted (predicted) values of
the dependent variable, either within the sample or for observations outside the sample.
The same set of results will apply to cross sections, panels, and time series. We are
usually examining a "scenario" of our own design. For example, we might be interested
in predicting the price of a house with certain characteristics (rooms, etc.), not necessarily
one that actually exists in the sample.

Forecasting, while largely the same exercise, explicitly gives a role to “time” and often
involves lagged dependent variables and disturbances that are correlated with their past
values. This exercise usually involves predicting future outcomes. In the time-series
context, we will often try to forecast an event such as real investment next year, not
based on a hypothetical economy but based on our best estimate of what economic
conditions will be next year.

Suppose that we wish to predict the value of $y^0$ associated with a regressor vector $x^0$. The
actual value would be
$$y^0 = x^{0\prime} \beta + \varepsilon^0.$$

The best predictor in Mean Squared Error (MSE) is $E(y^0 \mid x^0) = x^{0\prime}\beta$, and the MVLUE
estimator of $x^{0\prime}\beta$ is
$$\hat{y}^0 = x^{0\prime} b.$$

$(1 - \alpha) \cdot 100\%$ prediction interval for $y^0$:
$$\hat{y}^0 \pm t_{1-\alpha/2}(n - K)\, s \sqrt{x^{0\prime} (X'X)^{-1} x^0 + 1}$$

where $t_{1-\alpha/2}(n - K)$ is the quantile of order $1 - \alpha/2$ of a t distribution with $n - K$
degrees of freedom.
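A minimal sketch (simulated data; x0 is an illustrative new observation) of this prediction interval, computed by hand and compared with the interval produced by statsmodels.

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(6)
n, K = 60, 2
X = sm.add_constant(rng.normal(size=(n, 1)))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)
res = sm.OLS(y, X).fit()

x0 = np.array([1.0, 0.8])                                  # new regressor vector (with constant)
y0_hat = x0 @ res.params
s = np.sqrt(res.mse_resid)                                 # s^2 = e'e / (n - K)
half = stats.t.ppf(0.975, n - K) * s * np.sqrt(x0 @ np.linalg.inv(X.T @ X) @ x0 + 1)
print(y0_hat - half, y0_hat + half)                        # 95% prediction interval by hand

pred = res.get_prediction(x0.reshape(1, -1)).summary_frame(alpha=0.05)
print(pred[["obs_ci_lower", "obs_ci_upper"]])              # statsmodels gives the same interval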

4.3.1 Prediction in a log Model

Consider
$$\log y_i = x_i'\beta + \varepsilon_i.$$
Suppose that we wish to predict the value of $y^0$ associated with a regressor vector $x^0$. The
actual value would be
$$y^0 = \exp\!\left(x^{0\prime}\beta + \varepsilon^0\right) = \exp\!\left(x^{0\prime}\beta\right) \exp\!\left(\varepsilon^0\right)$$

and the best predictor in Mean Squared Error (MSE) is again $E(y^0 \mid x^0)$:
$$E\!\left(y^0 \mid x^0\right) = \exp\!\left(x^{0\prime}\beta\right) E\!\left(\exp(\varepsilon^0) \mid x^0\right).$$

If $\varepsilon^0 \mid x^0 \sim N(0, \sigma^2)$ then $\exp(\varepsilon^0) \mid x^0$ has a log-normal distribution with
$$E\!\left(\exp(\varepsilon^0) \mid x^0\right) = \exp\!\left(0 + \frac{\sigma^2}{2}\right) = \exp\!\left(\frac{\sigma^2}{2}\right).$$

Therefore
$$\hat{y}^0 = \exp\!\left(x^{0\prime} b\right) \exp\!\left(\frac{\hat\sigma^2}{2}\right)
= \exp\!\left(x^{0\prime} b + \frac{s^2}{2}\right).$$

This prediction strongly relies on the normality assumption. If we wish to avoid this assump-
tion, we may replace $E(\exp(\varepsilon^0) \mid x^0)$ by the corresponding empirical moment, assuming
$E(\exp(\varepsilon^0) \mid x^0) = E(\exp(\varepsilon^0))$:
$$\widehat{E\left(\exp(\varepsilon^0)\right)} = \frac{1}{n} \sum_{i=1}^{n} \exp(e_i).$$

The estimator of $y^0$ is then
$$\tilde{y}^0 = \exp\!\left(x^{0\prime} b\right) \frac{1}{n} \sum_{i=1}^{n} \exp(e_i).$$

Example 4.3.1. Consider the example 4.2.2. Find the predicted value of price when lotsize
= 9019.8, sqrft = 2013 and bdrms = 4 (04_hprice1_prediction.py)

y0_hat = [306.01823605]
y0_hat (using statsmodels)= [301.09604431]
----------------------------------------------
Remarks:
This prediction (using statsmodels) neglects (np.exp(res)).mean()
The prediction is correct for log(price)
----------------------------------------------
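A hedged sketch of how the corrected prediction above can be reproduced (the CSV path is a placeholder; 04_hprice1_prediction.py presumably loads the same data): the naive prediction simply exponentiates the fitted log(price), and the correction multiplies it by the empirical mean of exp(e_i).

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("hprice1.csv")          # hypothetical path; the py-file loads the same data
res = smf.ols("np.log(price) ~ np.log(lotsize) + np.log(sqrft) + bdrms", data=df).fit()

x0 = pd.DataFrame({"lotsize": [9019.8], "sqrft": [2013], "bdrms": [4]})
log_price_hat = res.predict(x0)          # prediction on the log scale

naive = np.exp(log_price_hat)                          # neglects E[exp(eps)]
corrected = naive * np.mean(np.exp(res.resid))         # multiplies by (1/n) * sum exp(e_i)
print(naive.values, corrected.values)    # the corrected value corresponds to y0_hat above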
