EC1 Slides Part4
The formula for b, as a function of the random sample, is an estimator and thus it is random. After applying the OLS formula to the numeric values of y and X, b is an estimate (not random). We use the same letter b to represent both the estimator and the estimate; you should know from the context whether b is an estimator or an estimate. In this section b is the OLS estimator (thus it is random).
(b) (Expression for the variance) Under Assumptions FS1-FS4, $\mathrm{Var}(b \mid X) = \sigma^2 (X'X)^{-1}$ and $\mathrm{Var}(b) = \sigma^2\, \mathrm{E}\!\left[(X'X)^{-1}\right]$.
(c) (Gauss-Markov Theorem) Under Assumptions FS1-FS4, the OLS estimator is efficient in the class of linear unbiased estimators (it is also called the Minimum Variance Linear Unbiased Estimator, MVLU, or Best Linear Unbiased Estimator, BLUE). That is, for any unbiased estimator $\hat\beta$ that is linear in $y$, $\mathrm{Var}(b \mid X) \le \mathrm{Var}(\hat\beta \mid X)$ in the matrix sense (i.e. $\mathrm{Var}(\hat\beta \mid X) - \mathrm{Var}(b \mid X)$ is a positive semidefinite matrix).
(Unbiasedness) Under Assumptions FS1-FS3,
$$\mathrm{E}(b \mid X) = \beta \qquad \text{and} \qquad \mathrm{E}(b) = \beta.$$
However, if FS3 does not hold the estimator is biased. This is obvious from the representation
$$\mathrm{E}(b \mid X) = \beta + (X'X)^{-1} X'\, \mathrm{E}(\varepsilon \mid X).$$
See the following section for an example where $\mathrm{E}(\varepsilon \mid X) \neq 0$.
$$y = X\beta + \varepsilon = X_1\beta_1 + X_2\beta_2 + \varepsilon$$
If $X_1'X_2 \neq 0$ but the variables are uncorrelated, then only the coefficient of the intercept is affected (see Section 3.3.2). In general this is not a matter of concern, as usually we focus on the other coefficients. Thus, in practical situations the issue is whether the omitted variables are: (i) relevant ($\beta_2 \neq 0$) and (ii) correlated with the included explanatory variables.
We have
$$b = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon$$
so
$$\begin{aligned}
\mathrm{Var}(b \mid X) &= \mathrm{Var}\!\left(\beta + (X'X)^{-1}X'\varepsilon \mid X\right) \\
&= (X'X)^{-1}X'\, \mathrm{Var}(\varepsilon \mid X)\, X (X'X)^{-1} && \text{[Revisions, 29(a)]} \\
&= (X'X)^{-1}X'\, \sigma^2 I\, X (X'X)^{-1} && \text{FS4: } \mathrm{Var}(\varepsilon \mid X) = \sigma^2 I \\
&= \sigma^2 (X'X)^{-1}.
\end{aligned}$$
Finally,
$$\mathrm{Var}(b) = \mathrm{E}\left[\mathrm{Var}(b \mid X)\right] + \mathrm{Var}\left[\mathrm{E}(b \mid X)\right] = \sigma^2\, \mathrm{E}\!\left[(X'X)^{-1}\right].$$
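The derivation above translates directly into a few lines of NumPy. The following is a minimal sketch on simulated data (all names and values are illustrative): it computes b, the unbiased estimate s² of σ², the estimated conditional covariance matrix s²(X'X)⁻¹, and the standard errors of the coefficients.

```python
# Minimal sketch (simulated data): b = (X'X)^{-1} X'y and s^2 (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(0)
n, K = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)      # FS4: homoskedastic errors

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                  # OLS estimate
e = y - X @ b                          # residuals
s2 = e @ e / (n - K)                   # unbiased estimator of sigma^2
var_b = s2 * XtX_inv                   # estimate of Var(b|X)
se_b = np.sqrt(np.diag(var_b))         # standard errors of b_1, ..., b_K
print(b, se_b)
```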
Note: $\mathrm{Var}(b \mid X) = \sigma^2 (X'X)^{-1}$ is a $K \times K$ matrix whose first row is
$$\left(\mathrm{Var}(b_1 \mid X),\ \mathrm{Cov}(b_1, b_2 \mid X),\ \dots,\ \mathrm{Cov}(b_1, b_K \mid X)\right),$$
and similarly for the other rows. The $k$-th diagonal element can be written as
$$\mathrm{Var}(b_k \mid X) = \frac{\sigma^2}{\left(1 - R_k^2\right) S_{x_k}^2\, n}$$
where $R_k^2$ is the coefficient of determination in the auxiliary regression of $x_{ik}$ on the remaining explanatory variables, and $S_{x_k}^2 = \sum_i (x_{ik} - \bar{x}_k)^2 / n$. For example, if $k = K$ the auxiliary regression is $x_{iK} = \alpha_1 + \alpha_2 x_{i2} + \dots + \alpha_{K-1} x_{i,K-1} + u_i$.
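This identity can be checked numerically. The sketch below, on simulated data (the data-generating process is illustrative), compares the $K$-th diagonal element of $(X'X)^{-1}$ with $1/\left(n S_{x_K}^2 (1 - R_K^2)\right)$ obtained from the auxiliary regression.

```python
# Sketch: [(X'X)^{-1}]_{KK} equals 1 / (n * S2_xK * (1 - RK^2)).
import numpy as np

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)           # correlated regressors
X = np.column_stack([np.ones(n), x2, x3])

Z, xK = X[:, :-1], X[:, -1]                  # auxiliary regression of x_K
g = np.linalg.lstsq(Z, xK, rcond=None)[0]    # on the remaining columns
uK = xK - Z @ g
RK2 = 1 - (uK @ uK) / np.sum((xK - xK.mean()) ** 2)
S2_xK = np.mean((xK - xK.mean()) ** 2)

lhs = np.linalg.inv(X.T @ X)[-1, -1]
rhs = 1 / (n * S2_xK * (1 - RK2))
print(lhs, rhs)                              # the two should coincide
```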
We can conclude that the precision of $b_k$ is high (i.e. $\mathrm{Var}(b_k \mid X)$ is small) when:
$\sigma^2$ is low;
$n$ is large;
the sample variance $S_{x_k}^2$ is large;
$R_k^2$ is low (i.e. $x_{ik}$ is not strongly correlated with the other explanatory variables).
If $\mathrm{rank}(X) < K$ then $b$ is not defined. This is called strict multicollinearity. When this happens, the statistical software will be unable to construct $(X'X)^{-1}$. Since the error is discovered quickly, this is rarely a problem for applied econometric practice.
The more relevant situation is near multicollinearity, which is often called “multicollinearity” for brevity. This is the situation in which $X'X$ is near singular, i.e. the columns of $X$ are close to linearly dependent.
Consequence: the individual coefficient estimates will be imprecise. We have shown that
$$\mathrm{Var}(b_K \mid X) = \frac{\sigma^2}{\left(1 - R_K^2\right) S_{x_K}^2\, n}$$
where $R_K^2$ is the coefficient of determination in the auxiliary regression of $x_{iK}$ on the remaining explanatory variables.
As a consequence:
Small changes in the data produce wide swings in the parameter estimates.
Coefficients may have the “wrong” sign or implausible magnitudes.
Coefficients may have very high standard errors and low significance levels even though they are jointly significant and the $R^2$ for the regression is quite high.
A small simulation makes the effect on the standard errors visible (see the sketch below).
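The following sketch uses illustrative values only: as the correlation ρ between two regressors approaches 1, the standard errors of their coefficients explode while the rest of the regression looks unremarkable.

```python
# Sketch: near multicollinearity inflates the standard errors of b2 and b3.
import numpy as np

rng = np.random.default_rng(2)
n = 100
for rho in [0.0, 0.9, 0.999]:
    x2 = rng.normal(size=n)
    x3 = rho * x2 + np.sqrt(1 - rho ** 2) * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    y = X @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=n)
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (n - 3)
    se = np.sqrt(np.diag(s2 * XtX_inv))
    print(f"rho={rho}: se(b2)={se[1]:.3f}, se(b3)={se[2]:.3f}")
```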
The Gauss-Markov Theorem (Theorem 4.1.1, part (c)) states that under Assumptions FS1-FS4, the OLS estimator is efficient in the class of linear unbiased estimators (also called Minimum Variance Linear Unbiased Estimator, MVLU, or Best Linear Unbiased Estimator, BLUE). That is, for any unbiased estimator $\hat\beta$ that is linear in $y$, $\mathrm{Var}(b \mid X) \le \mathrm{Var}(\hat\beta \mid X)$ in the matrix sense (i.e. $\mathrm{Var}(\hat\beta \mid X) - \mathrm{Var}(b \mid X)$ is a positive semidefinite matrix).
Proof: Let $\hat\beta$ be another linear unbiased estimator of $\beta$. It can be proved [board] that
$$\mathrm{Var}(\hat\beta \mid X) - \mathrm{Var}(b \mid X)$$
is a positive semidefinite matrix.
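While the proof is left for the board, a Monte Carlo sketch can illustrate the result. Below, OLS is compared with another linear unbiased estimator, $\tilde\beta = (X'WX)^{-1}X'Wy$ with arbitrary fixed weights $W \neq I$ (a competitor chosen only for illustration); both are unbiased under FS1-FS3, but under FS4 OLS should exhibit the smaller sampling variance.

```python
# Sketch (simulation, not a proof): OLS vs. another linear unbiased estimator.
import numpy as np

rng = np.random.default_rng(3)
n, R = 50, 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
W = np.diag(rng.uniform(0.5, 2.0, size=n))   # arbitrary fixed weights, W != I

b_ols, b_w = [], []
for _ in range(R):
    y = X @ beta + rng.normal(size=n)        # homoskedastic errors (FS4)
    b_ols.append(np.linalg.solve(X.T @ X, X.T @ y))
    b_w.append(np.linalg.solve(X.T @ W @ X, X.T @ W @ y))

print(np.var(np.array(b_ols), axis=0))       # smaller, element by element
print(np.var(np.array(b_w), axis=0))
```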
If we wish to test hypotheses about $\beta$ or to form confidence intervals, then we will require a sample estimate of the covariance matrix
$$\mathrm{Var}(b \mid X) = \sigma^2 (X'X)^{-1}.$$
The part $(X'X)^{-1}$ is known, but $\sigma^2$ is not. Note that $\sigma^2 = \mathrm{E}\!\left(\varepsilon_i^2\right)$; it is estimated by $s^2 = e'e/(n - K)$, which is unbiased under FS1-FS4.
The square root of the $k$-th diagonal element of the matrix $s^2 (X'X)^{-1}$,
$$\left[s^2 \left(X'X\right)^{-1}\right]_{kk}^{1/2},$$
is the standard error of the estimator $b_k$, which is often denoted simply “the standard error of $b_k$” or $\hat\sigma_{b_k}$. For example, if
$$\widehat{\mathrm{Var}}(b \mid X) = \begin{pmatrix} \widehat{\mathrm{Var}}(b_1 \mid X) & \widehat{\mathrm{Cov}}(b_1, b_2 \mid X) \\ \widehat{\mathrm{Cov}}(b_1, b_2 \mid X) & \widehat{\mathrm{Var}}(b_2 \mid X) \end{pmatrix} = \begin{pmatrix} 2.1 & \cdot \\ \cdot & 3.2 \end{pmatrix}$$
then
$$\hat\sigma_{b_1} = \sqrt{2.1}, \qquad \hat\sigma_{b_2} = \sqrt{3.2}.$$
In particular,
$$b_k \mid X \sim N\!\left(\beta_k,\ \sigma^2 \left[(X'X)^{-1}\right]_{kk}\right), \qquad z_k = \frac{b_k - \beta_k}{\sigma_{b_k}} \sim N(0, 1).$$
We will see in the section “Hypothesis Testing” that the Normality Assumption leads to the following result:
$$t_k = \frac{b_k - \beta_k}{\hat\sigma_{b_k}} \sim t(n - K).$$
The normal distribution of $b$ in a finite sample is a consequence of our specific assumption of normally distributed disturbances. What if the distribution of $\varepsilon$ is unknown? We must rely on asymptotic theory (covered later).
$$F = \frac{(Rb - r)' \left[R (X'X)^{-1} R'\right]^{-1} (Rb - r)}{p\, s^2} \sim F(p,\ n - K).$$
For example, with $R = (1\ \ 1\ \ 3)$,
$$Rb = \begin{pmatrix} 1 & 1 & 3 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = b_1 + b_2 + 3 b_3.$$
Suppose that we want to test the null hypothesis
$$H_0: \beta_k = \beta_k^0$$
($\beta_k^0$ is a specific value, e.g. zero), and that this hypothesis is tested against the alternative hypothesis
$$H_1: \beta_k \neq \beta_k^0.$$
Under the null hypothesis we have
$$t_k^0 = \frac{b_k - \beta_k^0}{\hat\sigma_{b_k}} \sim t(n - K).$$
If we observe $|t_{obs}| > t_{1-\alpha/2}$ and $H_0$ is true, then a low-probability event has occurred. We take $|t_{obs}| > t_{1-\alpha/2}$ as evidence against the null, and the decision should be to reject $H_0$.
Other cases:
$H_0: \beta_k = \beta_k^0$ vs. $H_1: \beta_k > \beta_k^0$: if $t_{obs} > t_{1-\alpha}$ then reject $H_0$ at the $100\alpha\%$ level; otherwise do not reject $H_0$.
$H_0: \beta_k = \beta_k^0$ vs. $H_1: \beta_k < \beta_k^0$: if $t_{obs} < t_\alpha = -t_{1-\alpha}$ then reject $H_0$ at the $100\alpha\%$ level; otherwise do not reject $H_0$.
The p-value (or $p$) is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. $p$ is an informal measure of the evidence against the null hypothesis. To calculate the p-value for the two-sided test: $p = 2 P\!\left(t(n-K) > |t_{obs}|\right)$.
For example, a p-value of 0.02 shows little evidence supporting $H_0$: at the 5% level you should reject $H_0$. Rejection rule: reject $H_0$ at the $100\alpha\%$ level if $p < \alpha$.
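These rules are easy to compute with scipy.stats; the sketch below uses illustrative values for $t_{obs}$, $n - K$ and $\alpha$.

```python
# Sketch: critical values and p-values for the t-test of H0: beta_k = beta_k^0.
from scipy import stats

t_obs, df, alpha = 2.1, 84, 0.05
c = stats.t.ppf(1 - alpha / 2, df)               # two-sided critical value
p_two = 2 * (1 - stats.t.cdf(abs(t_obs), df))    # H1: beta_k != beta_k^0
p_right = 1 - stats.t.cdf(t_obs, df)             # H1: beta_k > beta_k^0
p_left = stats.t.cdf(t_obs, df)                  # H1: beta_k < beta_k^0
print(c, p_two, p_right, p_left)                 # reject H0 if p < alpha
```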
In most applications, our primary interest lies in testing the null hypothesis
$$H_0: \beta_j = 0$$
where $j$ corresponds to any of the $K$ independent variables. Since $\beta_j$ measures the partial effect of $x_{ij}$ (or $x_j$) on the expected value of $y$, after controlling for all other independent variables, $\beta_j = 0$ means that, once the other variables have been accounted for, $x_{ij}$ has no effect on the expected value of $y$.
For example:
The null hypothesis $H_0: \beta_3 = 0$ means that, once education and tenure have been accounted for, the number of years in the workforce (exper) has no effect on hourly wage. If it is true, it implies that a person’s work history prior to the current employment does not affect wage. If $\beta_3 > 0$, then prior work experience contributes to productivity, and hence to wage.
When the null is rejected we say that $b_k$ (not $\beta_k$) is significantly different from zero at the $100\alpha\%$ level, or that the variable (associated with $b_k$) is statistically significant at the $100\alpha\%$ level.
When the null isn’t rejected we say that $b_k$ (not $\beta_k$) is not significantly different from zero at the $100\alpha\%$ level, or that the variable is not statistically significant at the $100\alpha\%$ level.
More remarks:
Rejection of the null is not proof that the null is false. Why?
Acceptance of the null is not proof that the null is true. Why? We prefer to use the language “we fail to reject $H_0$ at the $x\%$ level” rather than “$H_0$ is accepted at the $x\%$ level.”
Example: consider the fitted regression (standard error in parentheses)
$$\widehat{\log(wage_i)} = \dots + \underset{(0.001)}{0.005}\, female_i + \dots, \qquad n = 600.$$
Test $H_0: \beta_2 = 0$ vs. $H_1: \beta_2 \neq 0$. We have (under the null):
$$t_2^0 = \frac{b_2}{\hat\sigma_{b_2}} \sim t(600 - K) \approx N(0, 1),$$
$$t_{obs} = \frac{0.005}{0.001} = 5,$$
$$\text{p-value} = 2 P\!\left(t_2^0 > |5| \mid H_0 \text{ is true}\right) \approx 0.$$
Discuss statistical versus economic significance.
$$H_0: R\beta = r \quad \text{vs.} \quad H_1: R\beta \neq r,$$
where $R$ is $p \times K$. The test statistic is
$$F_0 = \frac{(Rb - r)' \left[R (X'X)^{-1} R'\right]^{-1} (Rb - r)}{p\, s^2}.$$
Under $H_0$, $F_0 \sim F(p,\ n - K)$. If we observe $F_{obs} > F_{1-\alpha}$ and $H_0$ is true, then a low-probability event has occurred.
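As a sketch, the Wald form of $F_0$ can be computed with NumPy as follows; wald_F is a hypothetical helper, and b, s2 and $(X'X)^{-1}$ are assumed to come from a previously fitted regression.

```python
# Sketch: Wald F statistic for H0: R beta = r.
import numpy as np
from scipy import stats

def wald_F(b, s2, XtX_inv, R, r, n, K):
    """F0 = (Rb - r)' [R (X'X)^{-1} R']^{-1} (Rb - r) / (p s^2)."""
    d = R @ b - r
    p = R.shape[0]
    F0 = d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / (p * s2)
    p_value = 1 - stats.f.cdf(F0, p, n - K)
    return F0, p_value

# Example call with p = 2 restrictions on a 3-coefficient model:
# R = np.array([[0., 1., 0.], [0., 0., 1.]]); r = np.zeros(2)
```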
In the case $p = 1$ (a single linear combination of the elements of $\beta$) one may use the test statistic
$$t_0 = \frac{Rb - r}{\sqrt{s^2\, R (X'X)^{-1} R'}} \sim t(n - K).$$
$$H_0: R\beta = r \quad \text{vs.} \quad H_1: R\beta \neq r.$$
Note: $R$ is $p \times K$. It can be proved that
$$F_0 = \frac{(Rb - r)' \left[R (X'X)^{-1} R'\right]^{-1} (Rb - r)}{p\, s^2} = \frac{\left(\tilde{e}'\tilde{e} - e'e\right)/p}{e'e/(n - K)} = \frac{\left(R^2 - \tilde{R}^2\right)/p}{\left(1 - R^2\right)/(n - K)} \sim F(p,\ n - K)$$
where $\tilde{\ }$ refers to the short regression, i.e. the regression subjected to the constraint $R\beta = r$.
Example 4.2.3. Consider Example 4.2.2 and $H_0: \beta_3 - \beta_2 = 3$. Write $R$ and $r$ (04_hprice1.
Example 4.2.4. Consider Example 4.2.2 and $H_0: \beta_3 - \beta_2 = 3$. Since $p = 1$ we may use the t-test.
-------------------------------
n = 88.0
K = 4
ssr0 (under H0) 2.9103860099383025
ssr1 (under H1) 2.8625632351852044
-------------------------------
Fobs = 1.4033272802095296
p-value = 0.23950822209892753
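The figures above can be reproduced from the two sums of squared residuals alone; the following sketch plugs in the values printed in the output.

```python
# Sketch: F statistic from restricted (ssr0) and unrestricted (ssr1) SSRs.
from scipy import stats

n, K, p = 88, 4, 1
ssr0, ssr1 = 2.9103860099383025, 2.8625632351852044
F_obs = ((ssr0 - ssr1) / p) / (ssr1 / (n - K))
p_value = 1 - stats.f.cdf(F_obs, p, n - K)
print(F_obs, p_value)    # 1.4033..., 0.2395...
```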
In the case “all slopes zero” (test of the global significance of the regression),
$$H_0: \beta_2 = \beta_3 = \dots = \beta_K = 0.$$
It can be proved that $F_0$ equals
$$F_0 = \frac{R^2 / (K - 1)}{\left(1 - R^2\right)/(n - K)}.$$
Under the null we have $F_0 \sim F(K - 1,\ n - K)$.
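As a sketch, with illustrative values for $R^2$, $n$ and $K$ (the numbers below are hypothetical, not taken from any of the examples):

```python
# Sketch: "all slopes zero" F statistic computed from R^2 alone.
from scipy import stats

R2, n, K = 0.67, 88, 4          # illustrative values
F0 = (R2 / (K - 1)) / ((1 - R2) / (n - K))
p_value = 1 - stats.f.cdf(F0, K - 1, n - K)
print(F0, p_value)
```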
Example 4.2.6. Consider Example 4.2.2 and test the global significance of the regression, i.e.
$$H_0: \beta_2 = \beta_3 = \beta_4 = 0 \quad \text{vs.} \quad H_1: H_0 \text{ is false}.$$
From the output we have:
Prediction involves using the regression model to compute fitted (predicted) values of the dependent variable, either within the sample or for observations outside the sample. The same set of results will apply to cross sections, panels, and time series. We are usually examining a “scenario” of our own design. For example, we might be interested in predicting the price of a hypothetical house with certain characteristics (rooms, etc.), rather than one that actually exists in the sample.
Forecasting, while largely the same exercise, explicitly gives a role to “time” and often involves lagged dependent variables and disturbances that are correlated with their past values. This exercise usually involves predicting future outcomes. In the time-series context, we will often try to forecast an event such as real investment next year, based not on a hypothetical economy but on our best estimate of what economic conditions will be next year.
Suppose that we wish to predict the value of $y^0$ associated with a regressor vector $x^0$. The actual value would be
$$y^0 = x^{0\prime} \beta + \varepsilon^0.$$
A $1 - \alpha$ prediction interval is
$$\hat{y}^0 \pm t_{1-\alpha/2}(n - K)\, s \sqrt{x^{0\prime} \left(X'X\right)^{-1} x^0 + 1}.$$
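A sketch of this interval in code: prediction_interval is a hypothetical helper, and b, s2 and $(X'X)^{-1}$ are assumed to come from the fitted regression.

```python
# Sketch: 1 - alpha prediction interval for y0 at a new regressor vector x0.
import numpy as np
from scipy import stats

def prediction_interval(x0, b, s2, XtX_inv, n, K, alpha=0.05):
    y0_hat = x0 @ b
    t = stats.t.ppf(1 - alpha / 2, n - K)
    half = t * np.sqrt(s2 * (x0 @ XtX_inv @ x0 + 1))
    return y0_hat - half, y0_hat + half
```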
Consider
$$\log y_i = x_i'\beta + \varepsilon_i.$$
Suppose that we wish to predict the value of $y^0$ associated with a regressor vector $x^0$. The actual value would be
$$y^0 = \exp\!\left(x^{0\prime}\beta + \varepsilon^0\right) = \exp\!\left(x^{0\prime}\beta\right) \exp\!\left(\varepsilon^0\right),$$
so
$$\mathrm{E}\!\left(y^0 \mid x^0\right) = \mathrm{E}\!\left(\exp\!\left(x^{0\prime}\beta\right) \exp\!\left(\varepsilon^0\right) \mid x^0\right) = \exp\!\left(x^{0\prime}\beta\right) \mathrm{E}\!\left(\exp\!\left(\varepsilon^0\right) \mid x^0\right).$$
Therefore (using $\mathrm{E}\!\left(\exp \varepsilon^0\right) = \exp\!\left(\sigma^2/2\right)$ under normality)
$$\hat{y}^0 = \exp\!\left(x^{0\prime} b\right) \exp\!\left(\frac{\hat\sigma^2}{2}\right) = \exp\!\left(x^{0\prime} b + \frac{s^2}{2}\right).$$
This prediction strongly relies on the normality assumption. If we wish to avoid this assumption, we may replace $\mathrm{E}\!\left(\exp \varepsilon^0 \mid x^0\right)$ by the corresponding empirical moment, assuming $\mathrm{E}\!\left(\exp \varepsilon^0 \mid x^0\right) = \mathrm{E}\!\left(\exp \varepsilon^0\right)$:
$$\widehat{\mathrm{E}\!\left(\exp \varepsilon^0\right)} = \frac{1}{n}\sum_{i=1}^{n} \exp(e_i).$$
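Both corrections are one-liners in code. In the sketch below, predict_level is a hypothetical helper; b, the residuals e, and s2 are assumed to come from the fitted log-linear model.

```python
# Sketch: predicting y (not log y) after a log-linear regression.
import numpy as np

def predict_level(x0, b, e, s2, how="empirical"):
    """Predict y0 = exp(x0'b) with a correction for E[exp(eps)]."""
    if how == "normal":
        # exp(x0'b + s2/2): valid if the errors are normally distributed
        return np.exp(x0 @ b + s2 / 2)
    # empirical-moment correction: exp(x0'b) * mean(exp(e_i))
    return np.exp(x0 @ b) * np.mean(np.exp(e))
```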
Example 4.3.1. Consider Example 4.2.2. Find the predicted value of price when lotsize = 9019.8, sqrft = 2013 and bdrms = 4 (04_hprice1_prediction.py).
y0_hat = [306.01823605]
y0_hat (using statsmodels) = [301.09604431]
----------------------------------------------
Remarks:
The prediction obtained with statsmodels neglects the factor (np.exp(res)).mean().
The statsmodels prediction is correct as a prediction of log(price), not of price itself.
----------------------------------------------