Basee2 Students
3rd. year GE
2021-2022
Unauthorized reproduction of this text and distribution of copies are strictly prohibited, as
well as any other infringement of other rights, which correspond to the Department of Applied
Economics III (Econometrics and Statistics), University of the Basque Country UPV/EHU.
© UPV/EHU 2021.
Author (translation):
J. Arteche
Juan I. Modroño
where X2t is a fixed variable, X3t is a stochastic variable and β = (β1 , β2 , β3 )′ is the vector of
unknown parameters.
b) Which assumption guarantees the unbiasedness of the OLS estimator of β? Show why.
c) If X3t is stochastic and not independent of ut but E(X3t ut ) = 0, ∀t, is the OLS estimator
of β consistent? Prove it and indicate what additional assumptions are needed to get the
desired result.
d) If X3t is stochastic but the assumptions of the Mann-Wald theorem are satisfied, is
it possible to make inference on β even if the distribution of ut is unknown? Explain
thoroughly.
a) With three observations of Yt and Xt obtain by OLS in its matrix form the estimates of α
and β.
t 1 2 3
Yt 1 1 0
Xt 1 -1 1
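As an illustration of part a), the following minimal sketch applies the matrix formula (X'X)^{-1} X'Y to the three observations in the table. It assumes the model is Yt = α + βXt + ut (the model statement is not shown in this part of the text), and the variable names are chosen here for illustration only.

```python
# OLS in matrix form for Y_t = alpha + beta*X_t + u_t (assumed model),
# written out by hand for the 2x2 case so that no external library is needed.
Y = [1.0, 1.0, 0.0]
X = [1.0, -1.0, 1.0]
n = len(Y)

# Elements of X'X and X'Y when the regressor matrix has rows (1, X_t).
sx = sum(X)
sxx = sum(x * x for x in X)
sy = sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))

# Inverse of the 2x2 matrix X'X = [[n, sx], [sx, sxx]].
det = n * sxx - sx * sx
inv = [[sxx / det, -sx / det], [-sx / det, n / det]]

# beta_hat = (X'X)^{-1} X'Y
alpha_hat = inv[0][0] * sy + inv[0][1] * sxy
beta_hat = inv[1][0] * sy + inv[1][1] * sxy
print(alpha_hat, beta_hat)
```

Writing the inverse explicitly keeps each element of (X'X)^{-1} visible, which is also what is needed later for the variance-covariance matrix.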
E(u1 u3 ) = E(u3 u1 ) = 1
E(u1 u2 ) = E(u2 u1 ) = E(u2 u3 ) = E(u3 u2 ) = 0.
Given the observations of Yt and Xt and the information on E(ut us ), calculate the variance-
covariance matrix of the OLS estimator.
c) Given the above information, what are the statistical properties of the Ordinary Least
Squares estimator?
d) Do you know an estimator with better properties? Which one? Describe its properties
and write down its variance-covariance matrix (do not calculate it, just write down its
mathematical expression and explain each of its elements).
e) Consider now the model Yt = α + βXt + ut with
$E(u_t^2) = tX_t^2$ and $E(u_t u_s) = 0 \;\; \forall t, s,\; t \neq s$
Write down the transformed model that corrects this problem and show that the variances
of the disturbances in the transformed model are constant.
f) Estimate by OLS and using matrix algebra the parameters of the transformed model.
A researcher A wants to explain the students expenses with the following model:
$Y_i = \alpha + \beta X_i + u_i, \quad i = 1, \dots, N$ (1)
$E(u_i) = 0$
$Var(u_i) = \sigma_u^2 \;\; \forall i$
$E(u_i u_s) = 0 \;\; \forall i \neq s$
Another researcher B thinks that, in order to simplify calculations, it is better to group the
data for each classroom and estimate the parameters using the grouped data. The students are
grouped in 8 classes and the number of students in each class is n1 , n2 , . . . , n8 . Researcher B
will so be using 8 observations for each variable, one for every class:
$\bar{Y}_j = \frac{\sum_{k=1}^{n_j} Y_k}{n_j}, \qquad \bar{X}_j = \frac{\sum_{k=1}^{n_j} X_k}{n_j}, \qquad j = 1, 2, \dots, 8$
EXERCISE 4 (LADE-1997.1) (Jun-1997)
Consider the following linear regression model:
b) Assume now that a = 0 and b is an unknown parameter. After GLS estimation, we have
obtained the following estimates:
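$\hat\beta_{GLS} = \begin{pmatrix} 2 \\ 3 \\ -1 \end{pmatrix}, \qquad \hat{V}(\hat\beta_{GLS}) = \begin{pmatrix} 3 & -2 & 1 \\ -2 & 4 & 0 \\ 1 & 0 & 3 \end{pmatrix}$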
i) β3 = 0
ii) β3 = 0 and β1 + 2β2 = 5
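Assuming that i) and ii) are hypotheses to be tested with the GLS estimates above (the request itself is not shown here), a sketch of the two statistics could look as follows. The restriction matrix R and vector r are notation introduced for this illustration; whether the Wald statistic is compared with a χ² or, after dividing by the number of restrictions, with an F distribution depends on the assumptions made about the disturbances.

```python
# GLS estimates and estimated variance-covariance matrix from the exercise.
beta_gls = [2.0, 3.0, -1.0]
V = [[3.0, -2.0, 1.0],
     [-2.0, 4.0, 0.0],
     [1.0, 0.0, 3.0]]

# i) H0: beta3 = 0, ratio of the estimate to its estimated standard deviation.
t_i = beta_gls[2] / V[2][2] ** 0.5

# ii) H0: R beta = r with two restrictions (beta3 = 0, beta1 + 2*beta2 = 5).
R = [[0.0, 0.0, 1.0],
     [1.0, 2.0, 0.0]]
r = [0.0, 5.0]

# d = R*beta_hat - r
d = [sum(R[i][k] * beta_gls[k] for k in range(3)) - r[i] for i in range(2)]
# A = R V R'
RV = [[sum(R[i][k] * V[k][j] for k in range(3)) for j in range(3)] for i in range(2)]
A = [[sum(RV[i][k] * R[j][k] for k in range(3)) for j in range(2)] for i in range(2)]
# Inverse of the 2x2 matrix A.
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A_inv = [[A[1][1] / det, -A[0][1] / det],
         [-A[1][0] / det, A[0][0] / det]]
# Wald statistic d' (R V R')^{-1} d
wald = sum(d[i] * A_inv[i][j] * d[j] for i in range(2) for j in range(2))
print(t_i, wald)
```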
HETEROSCEDASTICITY
a) Write down the transformed model with homoscedastic disturbances. Show the properties
of the transformed disturbances.
b) If T = 4, write down the matrix of regressors X in the transformed model if
t 1 2 3 4
X2t 0 1 1 2
X3t 3 0.5 1 1
Decide which of the following models is correctly transformed in order to correct the
heteroscedasticity problem and explain why.
(1) $P_i Y_i = \alpha + \beta P_i X_i + P_i u_i$
(2) $\frac{Y_i}{P_i} = \frac{\alpha}{P_i} + \beta \frac{X_i}{P_i} + \frac{u_i}{P_i}$
(4) $\frac{Y_i}{P_i} = \alpha + \beta \frac{X_i}{P_i} + \frac{u_i}{P_i}$
EXERCISE 7 (PV-E.38) (Feb-1997)
Use the provided information in these two regressions to test the same null hypothesis as
in a).
c) Taking into account all the results obtained above, which estimation method would you
use for the consumption model? Why? Explain in detail.
An expanding commerce business wants to perform an analysis of the relationship between the
industrial sector and the number of offices per province. For that, a sample of 50 observations
for the variables S (no. of office branches per province) and L (no. of commercial licenses, as
an indicator of the importance of the commercial sector) are available. Its research department
estimates by OLS the following regression:
Si = β1 + β2 Li + ui (1)
The graphical representation of the endogenous variable Si and the OLS residuals against the
explanatory variable Li is:
[Figure: two panels, "Model variables" and "OLS Residuals"]
a) The research manager is not convinced by these results. Which problems do you think
these graphs reveal?
The same manager proposes two alternative ways to improve the estimation. The first one
consists in estimating by OLS the following equation:
$\frac{S_i}{\sqrt{L_i}} = \beta_1 \frac{1}{\sqrt{L_i}} + \beta_2 \sqrt{L_i} + \frac{u_i}{\sqrt{L_i}}$ (3)
b) What is the basic hypothesis that is not satisfied in model (1) in order to use model (3)?
What solution is proposed here? What is the expected improvement over the first OLS
estimation in (2)?
c) Considering the graphical representation of the variable $S_i/\sqrt{L_i}$ and the OLS residuals of
model (3) against $L_i$, do you think that the problem is correctly solved?
The second possibility is that the relationship between Si and Li is not linear but exponential
Si = exp{γ1 + γ2 Li + vi }, so that the following model is estimated by OLS:
$\ln S_i = \gamma_1 + \gamma_2 L_i + v_i$ (4)
giving the following results for the whole sample of 50 observations:
$\widehat{\ln S_i} = 3,31 + 0,02\, L_i, \qquad R^2 = 0,33, \quad RSS = 10,54$ (5)
(t-ratio) (31,0) (5,3)
$\frac{\hat{v}_i^2}{0,21} = 0,053 + 0,017\, L_i + \hat{e}_i, \qquad R^2 = 0,014, \quad RSS = 89,72$ (6)
(0,09) (1,6)
Furthermore, after sorting the sample according to the values of the variable L, two regressions
like (4) have been estimated using the first and the last 12 observations. The residual sums of
squares obtained are RSS1 = 0.77 and RSS2 = 0.992 respectively.
d) Do you think that model (4) has the same problem with the fulfilment of the basic
assumptions as model (1)? Justify your answer with a formal test. Explain what you do
and why.
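The information given for model (4) allows two standard heteroscedasticity statistics to be computed. The sketch below is one possible reading: it takes the Breusch-Pagan statistic as half the explained sum of squares of auxiliary regression (6), and the Goldfeld-Quandt statistic as the ratio of the residual sums of squares of the two 12-observation subsamples. Variable names are illustrative, and the 5% critical values are the usual tabulated ones.

```python
# Breusch-Pagan: ESS/2 of the auxiliary regression (6).
# ESS is recovered from R2 = ESS/TSS and RSS = TSS - ESS.
r2_aux, rss_aux = 0.014, 89.72
tss_aux = rss_aux / (1.0 - r2_aux)
ess_aux = tss_aux - rss_aux
bp_stat = ess_aux / 2.0          # asymptotically chi-square(1) under H0
chi2_1_crit = 3.84               # 5% critical value, chi-square with 1 d.f.

# Goldfeld-Quandt: two subsamples of 12 observations, 2 parameters each,
# so both residual sums of squares have 12 - 2 = 10 degrees of freedom.
rss_1, rss_2 = 0.77, 0.992
n_sub, k = 12, 2
gq_stat = (rss_2 / (n_sub - k)) / (rss_1 / (n_sub - k))
f_10_10_crit = 2.98              # 5% critical value, F(10, 10)

print("BP =", round(bp_stat, 3), "reject H0:", bp_stat > chi2_1_crit)
print("GQ =", round(gq_stat, 3), "reject H0:", gq_stat > f_10_10_crit)
```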
In order to model the relationship between household consumption (Y ) and income of the
householder (X) the following equation is proposed:
Yi = α + βXi + ui (1)
where ui is supposed to have a normal distribution. We have the following data from 10
households:
i 1 2 3 4 5 6 7 8 9 10 Sum
Y 8 91 191 22 55 32 81 176 138 31 825
X 4 49 100 9 25 16 36 81 64 16 400
$\frac{\hat{u}_i^2}{48,65} = -0,245 + 0,0311\, X_i + \hat{w}_i, \qquad \sum \hat{w}_i^2 = 1,1473, \quad R^2 = 0,89$ (2)
a) Use some graphical method to search for traces of heteroscedasticity. Comment on the results.
b) Test for the existence of heteroscedasticity caused by the variable Xi by means of the
Breusch-Pagan statistic. State clearly the null hypothesis, the alternative, the test
statistic and its distribution. Comment on the reliability of the above test in this
particular case.
c) Estimate model (1) by GLS under the assumption that $Var(u_i) = \sigma_i^2 = \sigma^2 X_i$.
d) Is the variable income of the householder, X, relevant to explain household consumption,
Y?
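Parts b) and c) can be sketched numerically as follows. The Breusch-Pagan statistic is taken as half the explained sum of squares of auxiliary regression (2), and GLS under Var(ui) = σ²Xi is computed as weighted least squares with weights 1/Xi, which is equivalent to OLS on the model divided by √Xi. The names used are illustrative assumptions, not notation from the exercise.

```python
# b) Breusch-Pagan statistic from auxiliary regression (2): ESS/2.
r2_aux, rss_aux = 0.89, 1.1473
tss_aux = rss_aux / (1.0 - r2_aux)
bp_stat = (tss_aux - rss_aux) / 2.0   # compare with chi-square(1); N = 10 is small

# c) GLS under Var(u_i) = sigma^2 * X_i, i.e. weighted least squares
#    with weights w_i = 1 / X_i.
Y = [8, 91, 191, 22, 55, 32, 81, 176, 138, 31]
X = [4, 49, 100, 9, 25, 16, 36, 81, 64, 16]
w = [1.0 / x for x in X]

s_w = sum(w)                                         # sum of 1/X_i
s_wx = sum(wi * x for wi, x in zip(w, X))            # equals N
s_wxx = sum(wi * x * x for wi, x in zip(w, X))       # equals sum of X_i
s_wy = sum(wi * y for wi, y in zip(w, Y))            # sum of Y_i/X_i
s_wxy = sum(wi * x * y for wi, x, y in zip(w, X, Y)) # equals sum of Y_i

# Solve the 2x2 weighted normal equations.
det = s_w * s_wxx - s_wx ** 2
alpha_gls = (s_wxx * s_wy - s_wx * s_wxy) / det
beta_gls = (s_w * s_wxy - s_wx * s_wy) / det
print(round(bp_stat, 3), round(alpha_gls, 4), round(beta_gls, 4))
```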
The effect of an increase in Social Security contributions on the part of the contributions paid
by the workers is to be estimated with a sample of 15 countries. The information (in 1982) of
the Social Security contributions (SSC) and the workers’ contributions part (WSSC) both as a
percentage of the full fiscal income is presented in the first two columns of the following table:
SSC WSSC û
Austria 31,9 13,5
Belgium 29,8 10,1 -0,08327
Denmark 2,8 1,5 -2,97434
France 43,2 11,5
Germany 36,2 16,1
Ireland 15,0 5,4 -1,65393
Italy 47,2 7,1
Japan 30,4 10,7 0,38986
Luxembourg 28,0 11,2 1,39732
The Netherlands 41,6 18,0
Portugal 28,5 10,8 0,89160
Spain 46,5 10,3
Switzerland 31,0 10,2 -0,23700
United Kingdom 16,9 7,6 0,14433
U.S.A. 27,7 10,8 1,06076
$WSSC_i = \beta_1 + \beta_2 SSC_i + u_i, \quad i = 1, \dots, 15$
The OLS estimated model using data from the 15 countries is:
$\widehat{WSSC}_i = 3,8823 + 0,211442\, SSC_i$ (1)
(t-stat.) (1,69) (3,01)
$\bar{R}^2 = 0,365 \qquad RSS = 132,7767$
a) Look at the table carefully, the OLS residuals ûi are displayed in the third column.
Indicate the general form to obtain ûi . Then complete the missing values in the same table
and in the following picture:
[Figure: plot of WSSC_i against SSC_i to be completed; the point SSC_i = 2,8, WSSC_i = 1,5 is labelled]
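The general form of the residuals in part a) is ûi = WSSCi − β̂1 − β̂2 SSCi. A short sketch, using the estimates of equation (1) and the data in the table (decimal commas written as points), reproduces the residuals already reported and yields the missing ones. The dictionary layout is just a convenient assumption for the illustration.

```python
# OLS estimates from equation (1).
b1, b2 = 3.8823, 0.211442

# country: (SSC, WSSC), taken from the table.
data = {
    "Austria": (31.9, 13.5), "Belgium": (29.8, 10.1), "Denmark": (2.8, 1.5),
    "France": (43.2, 11.5), "Germany": (36.2, 16.1), "Ireland": (15.0, 5.4),
    "Italy": (47.2, 7.1), "Japan": (30.4, 10.7), "Luxembourg": (28.0, 11.2),
    "The Netherlands": (41.6, 18.0), "Portugal": (28.5, 10.8),
    "Spain": (46.5, 10.3), "Switzerland": (31.0, 10.2),
    "United Kingdom": (16.9, 7.6), "U.S.A.": (27.7, 10.8),
}

# Residual = observed WSSC minus fitted WSSC.
resid = {c: wssc - (b1 + b2 * ssc) for c, (ssc, wssc) in data.items()}

for country, u in resid.items():
    print(f"{country:16s} {u: .5f}")
```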
• Second subsample
$\widehat{WSSC}_i = 28,9928 - 0,395203\, SSC_i$ (3)
[Figure: plot of WSSC_i against SSC_i; the point SSC_i = 31,9, WSSC_i = 13,5 is labelled]
a) What problem exists in the previous model? How could it be detected? Explain carefully
the proposed test.
b) What are the consequences on the tests of hypotheses about β1 and β2 of using in the t or
F statistics the estimator $\frac{\sum \hat{u}_i^2}{N-2} (X'X)^{-1}$?
where ûi = Yi − β̂1 − β̂2 Xi are the residuals obtained from OLS estimation of β1 and β2 .
d) If the White estimator has been used, how has the following estimation of the variance and
covariance matrix of the OLS estimator of β1 and β2 been obtained? Indicate explicitly
all the steps needed to reach this result.
$\widehat{Var}(\hat\beta_{OLS})_{WHITE} = \begin{pmatrix} 0,04 & -0,11 \\ -0,11 & 0,28 \end{pmatrix}$
f) Assuming that $\sigma_i^2 = 4X_i^2$, how could you obtain an efficient estimator of β1 and β2 ?
Explain thoroughly the estimation procedure.
g) Calculate the estimates of β1 and β2 with the efficient estimator and its
variance-covariance matrix.
i) Could we get different conclusions from the tests in e) and h)? Why?
A database is available with information about the selling price and certain characteristics of
224 houses in two residential areas of the Orange County in California (USA): Dove Canyon
and Coto de Caza 1 . Dove Canyon is a neighbourhood built around a golf course with single
family tract homes with relatively small lots. Coto de Caza is a more upscale area. It is more
rural with large custom homes. The variables considered are:
We next show the results of the Ordinary Least Squares (OLS) estimation of a model for the
housing selling price using this dataset:
Variable Coefficient Std. error t-ratio
sqft 0,252069 0,00815634 30,905
age 3,69805 3,02416 1,223
city 91,8038 21,7494 4,221
a) Write down the estimated theoretical model and comment on the results in terms of the
goodness of fit, significance and signs of the estimated coefficients.
b) Analyse the information provided by the following graphics and the auxiliary regression.
If you perform some test, describe all its elements. Which graphic is more informative and
why?
$\frac{\hat{u}_i^2}{RSS_A/224} = -5,94184 + 0,00172457\, sqft_i$
(−10,387) (12,727)
$N = 224 \qquad R^2 = 0,421826 \qquad RSS = 1478,52$
[Figure: OLS residuals against observation index]
Figure 2: OLS residuals against variable sqft
[Plot: residuals from the regression (= salepric − estimated salepric) against sqft]
Now we show the results of the OLS estimation using a consistent estimator of the variance
and covariance matrix of the coefficients under heteroscedasticity.
c) Describe the changes between the results now shown (RESULTS B) and the former results
(RESULTS A). What is the reason for those changes? Which ones are more reliable and
for what purpose?
RESULTS C
d) Explain what weighted data and original data mean and the differences between both.
Why is the inverse of sqft squared used as the weighting variable?
We are interested in analysing the relationship between Health aggregated expenditure, Yi and
the aggregated income, Xi , both in billions of dollars, for 51 North American states2 :
$Y_i = \beta_1 + \beta_2 X_i + u_i$ (1)
2 Ramanathan, R. (2002), Introductory Econometrics with Applications, data 3-2.
The results of the OLS estimation are:
[Figure 3: OLS residuals against Income]
a) Explain how the residuals have been calculated and what Figure 3 has been drawn for.
Interpret that figure.
c) Explain thoroughly which statistic you would use to test the significance of the variable
Income. Perform the test writing down all its elements.
d) Considering the results of model (1), the researcher decides to estimate the model again
assuming the following structure for the variance of the disturbances: $Var(u_i) = \sigma^2 X_i$. The
following results are obtained:
VARIABLE COEFFICIENT STDERROR T STAT
i) Why is $Var(u_i) = \sigma^2 X_i$ chosen as the variance of the disturbances? Explain how the
estimates have been obtained.
ii) Assuming normality for ui , test the significance of the variable Income.
e) The researcher is not convinced by the results obtained with the function chosen for
$Var(u_i)$ and wishes to re-estimate model (1) assuming that $Var(u_i) = a + bX_i$, where
a and b are unknown.
i) Explain in detail how you would estimate the coefficients of model (1) under this
assumption.
ii) Assuming $\hat\sigma_i^2 = \hat{a} + \hat{b}X_i$, perform the estimation described in the previous item with
the following sample information:
$\sum \hat{u}_i^2 = 148,699$, $\sum \hat{u}_i^2 X_i = 34945,67$, $\sum (X_i/\hat\sigma_i)^2 = 196420,998$, $\sum (X_i/\hat\sigma_i^2) = 1608,337$
$\sum (1/\hat\sigma_i)^2 = 34,738$, $\sum (Y_i/\hat\sigma_i^2) = 236,139$, $\sum (Y_i X_i/\hat\sigma_i^2) = 28484,578$, $\sum (Y_i^2/\hat\sigma_i^2) = 4168,919$
f) What would you comment on the validity of the tests performed in c), d.ii) and e.iii)?
The following regression model is proposed to analyse the effects of advertising spending, Xi ,
on the income of the restaurants, Yi , in a particular city:
With a sample of 166 restaurants, data on the average income (in thousands of euros) and on the
monthly advertising spending (in hundreds of euros) are available with the restaurants grouped
by districts.
District 1 2 3 4 5 6 7
Yj 10 12 14 18 17 18 20
Xj 3 5 9 12 15 17 19
nj 9 4 36 16 81 4 16
where $\bar{X}_j = \frac{1}{n_j}\sum_{i \in B_j} X_i$, $\bar{Y}_j = \frac{1}{n_j}\sum_{i \in B_j} Y_i$ and $n_j$ denotes the number of restaurants in district
$B_j$, $j = 1, 2, \dots, 7$.
a) Given that we only have information on the averages, which model could you use to
estimate α and β? Show the properties of the disturbances in that model.
b) Obtain efficient estimates of the parameters of the model and describe in detail the
estimator and its properties.
d) Without making any calculations, how would you estimate the model proposed in a) if the
variance of the disturbances in the original model (1) increases with the advertising spending
such that $Var(u_i) = \sigma_u^2 X_i$?
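For parts a) and b), a common approach with grouped data is weighted least squares with weights equal to the group sizes, since the variance of an average of nj homoscedastic, uncorrelated disturbances is σ²/nj. The sketch below assumes the original model is Yi = α + βXi + ui with constant variance (the model statement is not reproduced in this part of the text); all names are illustrative.

```python
# District averages and number of restaurants per district, from the table.
Y_bar = [10, 12, 14, 18, 17, 18, 20]
X_bar = [3, 5, 9, 12, 15, 17, 19]
n_j = [9, 4, 36, 16, 81, 4, 16]

# Weighted sums: multiplying the averaged model by sqrt(n_j) gives
# homoscedastic disturbances, which is WLS with weights n_j.
s_n = sum(n_j)
s_nx = sum(n * x for n, x in zip(n_j, X_bar))
s_nxx = sum(n * x * x for n, x in zip(n_j, X_bar))
s_ny = sum(n * y for n, y in zip(n_j, Y_bar))
s_nxy = sum(n * x * y for n, x, y in zip(n_j, X_bar, Y_bar))

# Solve the 2x2 weighted normal equations.
det = s_n * s_nxx - s_nx ** 2
beta_wls = (s_n * s_nxy - s_nx * s_ny) / det
alpha_wls = (s_ny - beta_wls * s_nx) / s_n
print(round(alpha_wls, 4), round(beta_wls, 4))
```

Under the alternative variance structure in part d), the weights would change, so this sketch applies to the homoscedastic original model only.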
A travel agency in Chicago wants to analyse if there exist differences in the distance travelled
by the families in their choice of destinations for vacation, as a function of the number of kids in
the family. For that purpose it has a sample of 200 households in Chicago interviewed in 2007³.
The following model is specified:
where Miles are the miles travelled by one household in the vacations of one year, Income is
the annual income in thousands of dollars, age is the average age of the adult members of the
household and kids is the number of children under 16 in the household.
$\widehat{Miles}_i = -391,55 + 14,201\, Income_i + 15,741\, age_i - 81,826\, kids_i$ (2)
$(\widehat{s.d.}(\hat\beta_{OLS}))$ (169,8) (1,80) (3,757) (27,13)
$R^2 = 0,340605 \qquad RSS = 40099000$
[Figure: OLS residuals against Income (left panel) and against age (right panel)]
b) After grouping the observations of all variables into two groups according to a decreasing
sorting of the variable Income, and estimating the above model (1) by OLS for each group
separately, the following results are obtained:
Variable coefficient std. error t-ratio p-value
const −339,64 220,160 −1,5427 0,1271
Income 9,68801 4,01043 2,4157 0,0181
age 18,6511 3,87408 4,8143 0,0000
kids −66,026 29,8963 −2,2085 0,0302
Perform a test to verify if what you have answered in a) is statistically significant. You
must point out clearly all the elements of the test, including the null and the alternative
hypotheses.
c) If the result of the performed test gives support to reject the null hypothesis, what would
you change in the results in (2) if you are unwilling to change the estimation method?
Why? Explain in detail.
An alternative method of estimation to OLS has also been used in order to improve the
efficiency in the estimation of the β coefficients. Using the Gretl software the following
results have been obtained:
d) Fill in the blanks in the following expressions concerning the disturbance term of the model
and the estimation method used to get these results.
Estimation criterion: ........ $RSS = \sum_{i=....}^{i=....} (Y_i^* - \hat\beta_1 X_{1i}^* - \hat\beta_2 X_{2i}^* - \hat\beta_3 X_{3i}^* - \hat\beta_4 X_{4i}^*)^2$
$Y_i^* = ...................$; $X_{1i}^* = .....................$; $X_{2i}^* = ...................$;
$X_{3i}^* = ...........................$; $X_{4i}^* = .............................$;
$\hat\beta_{......} = [\,......\,]^{-1}\,[\,......\,]$
e) If you had to test H0 : β2 = 10, how would you do it? Explain your answer in detail.
AUTOCORRELATION
A company wants to analyse the relationship between its consumption of petrol (Ct ) and its
price (Pt ). Using annual data the following OLS estimation is obtained:
Ĉt = 5278.44 − 23.36Pt
Year ût Year ût
1980 -112.93 1986 58.55
1981 -74.53 1987 155.71
1982 9.46 1988 43.67
1983 33.75 1989 -19.90
1984 58.49 1990 -85.66
1985 59.33 1991 -125.96
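With the residuals in the table ordered chronologically from 1980 to 1991, the Durbin-Watson statistic can be computed directly. The sketch below is only an illustration of the formula d = Σ(ût − ût−1)² / Σût²; the function name is chosen here and is not part of the exercise.

```python
def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared first differences of the
    residuals divided by the sum of squared residuals."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den

# OLS residuals for 1980, 1981, ..., 1991, read from the table.
u_hat = [-112.93, -74.53, 9.46, 33.75, 58.49, 59.33,
         58.55, 155.71, 43.67, -19.90, -85.66, -125.96]

dw = durbin_watson(u_hat)
print(round(dw, 3))
```

A value well below 2 points towards positive first order autocorrelation; the formal decision still requires the tabulated bounds for the sample size and number of regressors.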
Yt 3 3 4 3 2 2
Xt 1 2 3 4 5 6
a) If ρ = 0.7, estimate the parameters α and β using Generalized Least Squares (GLS).
Explain in detail all the steps.
b) Test the hypothesis H0 : β = 1 at 5% significance level.
c) Assuming that the sample size is large enough, how would you estimate the parameters of
the model if ρ were unknown? Explain all the process in detail.
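For part a), one possible route is the Prais-Winsten version of GLS. The sketch assumes the model is Yt = α + βXt + ut with ut = ρut−1 + εt (the model statement is not shown in this part of the text) and uses the six observations in the table. Dropping the first transformed observation would give the Cochrane-Orcutt variant instead.

```python
import math

rho = 0.7
Y = [3.0, 3.0, 4.0, 3.0, 2.0, 2.0]
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
T = len(Y)

# Prais-Winsten transformation: the first observation is scaled by
# sqrt(1 - rho^2); the remaining ones are quasi-differenced.
k = math.sqrt(1.0 - rho ** 2)
c_star = [k] + [1.0 - rho] * (T - 1)   # transformed constant term
x_star = [k * X[0]] + [X[t] - rho * X[t - 1] for t in range(1, T)]
y_star = [k * Y[0]] + [Y[t] - rho * Y[t - 1] for t in range(1, T)]

# OLS on the transformed model (no additional intercept): 2x2 normal equations.
s_cc = sum(c * c for c in c_star)
s_cx = sum(c * x for c, x in zip(c_star, x_star))
s_xx = sum(x * x for x in x_star)
s_cy = sum(c * y for c, y in zip(c_star, y_star))
s_xy = sum(x * y for x, y in zip(x_star, y_star))

det = s_cc * s_xx - s_cx ** 2
alpha_gls = (s_xx * s_cy - s_cx * s_xy) / det
beta_gls = (s_cc * s_xy - s_cx * s_cy) / det
print(round(alpha_gls, 4), round(beta_gls, 4))
```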
EXERCISE 18 (PV-E.44) (Sep-1997)
The next table shows data on wages (Y ) and worked hours (X) of the employees of a company. It
is also known if the worker is a man (M ) or a woman (F ):
Y 170 180 165 165 105 95 100 90
X 40 50 30 40 50 35 40 35
Gender M M M M F F F F
$\sum Y_i^2 = 153900$, $\sum Y_i = 1070$, $\sum X_i = 320$, $\sum X_i^2 = 13150$, $\sum X_i Y_i = 43075$
In order to explain the wages of the employees, a researcher proposes the following model:
$Y_i = \alpha + \beta X_i + u_i$ where $u_i \sim NID(0, \sigma_u^2)$.
a) Estimate by OLS the parameters of the model and check the significance of the explanatory
variable X.
c) Another researcher thinks that gender is a relevant variable to explain the salary. Propose
and estimate a model that includes this hypothesis and test it.
d) The Durbin-Watson test statistic is d = 2.2. Do you find evidence of AR(1) autocorrelation
in the disturbances of this model? Relate your answer to the result obtained in b).
The relationship between the sales of certain product (Y ) and its price (X) is analysed, specifying
the following model:
$Y_t = \alpha + \beta X_t + u_t$ (1)
t 1 2 3 4 5 6
Y 27 32 25 31 30 32
X 9 12 8 10 12 11
û -0,5 0 -1 2 -2 1,5
a) Is there any evidence of first order autocorrelation in model (1)? Base your answer on
some formal test.
The following model has also been estimated by OLS:
Yt − ρ∗ Yt−1 = α(1 − ρ∗ ) + β(Xt − ρ∗ Xt−1 ) + εt εt ∼ N (0, σε2 ) (2)
for different values of ρ∗ , resulting in the following Residual Sums of Squares (RSS):
ρ∗ -0,1 -0,2 -0,3 -0,4 -0,5 -0,6 -0,7 -0,8 -0,9 -0,99
RSS 9,4 7,8 6,5 5,3 4,2 3,3 2,6 2,1 1,7 2,1
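Two pieces of the information above can be summarised numerically: the Durbin-Watson statistic of the OLS residuals of model (1), and the value of ρ∗ that minimises the residual sum of squares of model (2) over the grid, which is the idea behind the Hildreth-Lu search. The sketch below is illustrative and the variable names are assumptions.

```python
# OLS residuals of model (1), t = 1, ..., 6.
u_hat = [-0.5, 0.0, -1.0, 2.0, -2.0, 1.5]

# Durbin-Watson statistic.
dw = (sum((u_hat[t] - u_hat[t - 1]) ** 2 for t in range(1, len(u_hat)))
      / sum(e * e for e in u_hat))

# RSS of model (2) for each value of rho* in the grid (from the table).
rss_grid = {-0.1: 9.4, -0.2: 7.8, -0.3: 6.5, -0.4: 5.3, -0.5: 4.2,
            -0.6: 3.3, -0.7: 2.6, -0.8: 2.1, -0.9: 1.7, -0.99: 2.1}

# Grid-search choice: the rho* with the smallest RSS.
rho_star = min(rss_grid, key=rss_grid.get)
print(round(dw, 4), rho_star)
```

A Durbin-Watson value well above 2 and a negative minimising ρ∗ both point in the same direction, namely negative first order autocorrelation.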
In order to analyse the sales structure of a certain model of car, the following model is specified,
Yt = β1 + β2 Pt + β3 Qt + β4 Xt + ut (1)
where Yt = income obtained from the sales of the car, Pt = car price, Qt = average price of the other
models with similar characteristics, Xt = income per capita. With a sample of 100 observations
the model has been estimated by OLS, obtaining the following results:
$\hat{Y}_t = 1,5 + 0,1\, P_t - 0,5\, Q_t + 0,7\, X_t$ (2)
$(\widehat{dev})$ (0,2) (0,3) (0,15) (0,05)
$R^2 = 0,87 \qquad RSS = 215$
a) Test the significance of Pt , assuming that $u_t \overset{iid}{\sim} (0, \sigma_u^2)$. Make some comments on the
obtained result.
b) Perform also a test for the existence of first order autocorrelation in the disturbances,
making use of one of the next results:
ût = 0, 2 + 0, 3ût−1 + 0, 15Pt + 0, 12Qt + 0, 01Xt + v̂1t R2 = 0, 15 ESS = 75
ût = 0, 35ût−1 + 0, 22ût−2 + 0, 1Pt + 0, 16Qt + 0, 04Xt + v̂2t R2 = 0, 18 ESS = 74
ût = 0, 3 + 0, 24ût−1 + v̂3t R2 = 0, 05 ESS = 56
$\frac{\hat{u}_t}{\hat\sigma^2} = 0,13 + 0,2\,\frac{\hat{u}_{t-1}}{\hat\sigma^2} + 0,19P_t + 0,02Q_t + 0,09X_t + \hat{v}_{4t}$  $R^2 = 0,35$  $ESS = 98$
Are the results of the test implemented in a) affected by the result of this test?
c) If $u_t = \rho u_{t-1} + \varepsilon_t$ where $\varepsilon_t \overset{iid}{\sim} (0, \sigma_\varepsilon^2)$ and $|\rho| < 1$ is unknown, explain in detail how you
would estimate the parameters of model (1) in the best possible way.
d) In the context described in c), how would you perform the significance test of Pt ? Explain.
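The statistics for parts a) and b) can be sketched as follows. The t-ratio uses the estimate and standard deviation reported in (2). For the Breusch-Godfrey test, this sketch assumes that the first auxiliary regression (residuals on a constant, one lagged residual and all the regressors) is the appropriate one and uses N·R² with N = 100; some texts use the number of observations actually available in the auxiliary regression instead.

```python
# a) Significance of P_t: t-ratio from equation (2).
b_P, sd_P = 0.1, 0.3
t_P = b_P / sd_P            # compare with N(0,1) or t(96) critical values

# b) Breusch-Godfrey test for AR(1)/MA(1) disturbances: N * R^2 from the
#    auxiliary regression of the residuals on the regressors and u_hat_{t-1}.
N = 100
r2_aux = 0.15
bg_stat = N * r2_aux        # asymptotically chi-square(1) under H0
chi2_1_crit = 3.84          # 5% critical value, chi-square with 1 d.f.

print(round(t_P, 3), bg_stat, bg_stat > chi2_1_crit)
```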
EXERCISE 21 (LADE-2001.4) (Sep-2001)
t Y X
1 2 -3
2 10,2 5
3 17,9 13
4 2,3 -3
5 10 5
6 18,2 13
7 -5,7 -11
8 -14,1 -19
Sums 40,8 0
b) Perform a test to check if ut follows a first order autoregressive process. State clearly the
null and the alternative hypothesis, the test statistic and the decision rule.
d) Making use of the previous result, estimate the parameters α and β by FGLS.
e) Is the variable X relevant to explain Y ? Use a formal test, specifying clearly the null and
the alternative hypotheses and the distribution of the test statistic.
Consider the following yearly observations (the first three columns) of the variables Consumption
(Ct ) and National Income (Rt ):
Obs. C R Ĉ û
1 8,547 11,0 8,0483680 0,498632
2 8,942 13,5 9,7986580 -0,856658
3 10,497 14,0 10,148716 0,348284
4 10,173 14,9 10,778820 -0,605820
5 11,997 15,1 10,918843 1,078157
6 10,729 18,0 12,949180 -2,220180
7 12,750 18,8 13,509273 -0,759273
8 15,611 19,1 13,719307 1,891693
9 13,545 21,0 15,049528 -1,504528
10 17,843 21,2 15,189551
11 21,610 34,0 24,151036
12 25,473 34,3 24,361070
13 24,434 35,0 24,851152
14 28,274 38,0 26,951500
Ct = β1 + β2 Rt + ut
are:
a) The last column in the table shows the OLS residuals. Fill in that column and the time
series displayed in the residual plot shown below. Bearing that plot in mind, do you find
evidence of any problem?
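The missing residuals follow from ût = Ct − Ĉt, and once the column is complete the Durbin-Watson statistic requested in the next part can be computed from it. A minimal sketch, with the C and Ĉ columns typed in from the table (decimal commas written as points) and illustrative variable names:

```python
# Observed consumption C and fitted values C_hat, observations 1 to 14.
C = [8.547, 8.942, 10.497, 10.173, 11.997, 10.729, 12.750,
     15.611, 13.545, 17.843, 21.610, 25.473, 24.434, 28.274]
C_hat = [8.0483680, 9.7986580, 10.148716, 10.778820, 10.918843,
         12.949180, 13.509273, 13.719307, 15.049528, 15.189551,
         24.151036, 24.361070, 24.851152, 26.951500]

# OLS residuals: observed minus fitted.
u_hat = [c - f for c, f in zip(C, C_hat)]

# Durbin-Watson statistic from the completed residual series.
dw = (sum((u_hat[t] - u_hat[t - 1]) ** 2 for t in range(1, len(u_hat)))
      / sum(e * e for e in u_hat))

for obs, u in enumerate(u_hat, start=1):
    print(obs, round(u, 6))
print("DW =", round(dw, 3))
```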
b) Obtain the value of the Durbin and Watson statistic and perform the corresponding
test. Indicate all the elements of the test, including the null and the alternative hypothesis.
c) Perform the Breusch and Godfrey test making use of the following information.
Indicate all the elements of the test, including the null and the alternative hypothesis.
d) Explain the consequences of the evidence found in the previous sections on:
i) the finite sample properties of the estimator of the parameters of the model. Prove
these properties.
ii) the inference based on the t-statistics shown in equation (1).
e) Would your answer to the previous section change if the detected problem was a conse-
quence of omitting a relevant variable? Explain in detail.
f) Consider the following information and fill in all missing data, (as indicated with dots).
ρ̂ -0,99 -0,9 -0,8 -0,7 -0,6 -0,5 -0,4 -0,3 -0,2 -0,1 0,0 0,1
RSS ∗ 15,9 14,8 14,2 14,1 14,7 15,8 17,5 19,9 22,8 26,2 30,3 34,9
where
$RSS^* = \sum_{t=....}^{t=....} (Y_t^* - \hat\beta_1 X_{1t}^* - \hat\beta_2 X_{2t}^*)^2$ (3)
$Y_t^* = C_t - \hat\rho C_{t-1}$; $X_{1t}^* = ....................$; $X_{2t}^* = ....................$
$\begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} .......... & .......... \\ .......... & .......... \end{pmatrix}^{-1} \begin{pmatrix} .......... \\ .......... \end{pmatrix}$
Consider the following model for the supply of sugar cane in Bangladesh:
where A is the area dedicated to the plantation of cane and P is the market price of cane. With
34 yearly observations for A and P we obtain the OLS estimated model:
$\widehat{\ln(A_t)} = 6,11 + 0,97 \ln(P_t), \qquad R^2 = 0,706$ (2)
$(\widehat{dev})$ (0,17) (0,11)
[Figure: (a) Data, log(A) against log(P); (b) OLS residuals against Year]
a) What information can be extracted from the figure of the data in a)?
b) What information can be extracted from the figure of the OLS residuals in b)?
c) Now, we want to check if the variances of the disturbances change over time. Perform an
appropriate test for this hypothesis, specifying all its elements.
$\widehat{\ln(A_t)} = 6,12 + 0,97 \ln(P_t), \qquad RSS = 3,052, \quad \hat\sigma_t = 0,30/\sqrt{t}$ (3)
$(\widehat{dev})$ (0,18) (0,14)
$\widehat{\ln(A_t)} = 6,82 + 1,31 \ln(P_t), \qquad RSS = 5,620, \quad \hat\sigma_t = 5,066 \times t$ (4)
$(\widehat{dev})$ (0,29) (0,12)
$\widehat{\ln(A_t)} = 6,09 + 0,94 \ln(P_t), \qquad RSS = 2,642, \quad \hat{u}_t = 0,34\hat{u}_{t-1} + e_t$ (5)
$(\widehat{dev})$ (0,24) (0,16)
$\widehat{\ln(A_t)} = 6,13 + 0,98 \ln(P_t), \qquad RSS = 2,532, \quad \hat{u}_t = 0,36\hat{u}_{t-1} + 0,002\hat{u}_{t-2} + e_t$ (6)
$(\widehat{dev})$ (0,25) (0,17)
e) Explain how you would test if the price-elasticity is zero or not, stating clearly the estimator
you use and how it has been obtained. Use the information above to perform the test.
In order to estimate a Cobb-Douglas production function for the farming sector in the U.S.A.
there is a database⁴ of yearly data for the period 1948-1993 on the following index variables (1982
= 100 for all of them):
• Yt = farm output
• Lt = farm labour
The following model is specified, where all the variables are in logarithms
Yt = β1 + β2 Lt + β3 EXt + β4 Kt + ut (1)
Figure 5: OLS residuals for model (2)
[Plot: residual against year, 1950 to 1990]
a) Explain how the residuals have been calculated. What information can be extracted from
Figure 5?
b) Perform the autocorrelation tests you consider relevant using all the information provided.
Explain in detail.
c) Is it reliable to test the significance of the Farm Labour factor using the information
provided in (2)? Why? How should the test statistic be modified if the OLS estimator
is still used in order to estimate the parameter β2 ?
Not convinced by the estimation of the model in (1) the econometrician estimates again
the production function by the Hildreth-Lu method. The results (using the Gretl software)
are:
d) Explain what Figure 6 is showing. What does it mean that the RSS is minimum at
ρ∗ = 0.35?
Figure 6: Hildreth-Lu RSS function. The RSS is minimum at ρ∗ = 0,35
[Plot: RSS against rho, for rho between −1 and 1]
f) Using the Hildreth-Lu estimates and knowing that the estimate of the variance-covariance
matrix of the estimator of the coefficients is
$\widehat{Var}(\hat\beta_{HL}) =$
1,70446 0,03642 −0,47824 0,07057
0,03642 0,00189 −0,012883 0,00307
−0,47824 −0,01283 0,143331 −0,02647
test the null hypothesis H0 : β3 = 2β4 . Explain all the elements of the test.
An American consulting firm has signed a contract to produce a report on the relationship
between the number of patents and the expenditure in Research and Development (RD) in
the United States. The firm has got annual data for the period 1960 to 1993 of the following
variables5 :
• R&D expenditures, billions of 1992 dollars (Range 57.94 - 166.7)
$PATENTS_t = \beta_1 + \beta_2 RD_t + u_t, \quad t = 1, \dots, 34$ (1)
a) Interpret the estimated coefficient related to the variable RD. Has it got the expected
sign? Is it a significant variable?
Figure 7: PATENTS on RD, PATENTS on estimated PATENTS and OLS Residuals on time
[Three panels: PATENTS on R_D, with fitted line Y = 34,6 + 0,792X; actual and estimated PATENTS, 1960 to 1990; residuals of the regression (= PATENTS − estimated PATENTS), 1960 to 1990]
Which problem exists in the previous model? Explain why and comment on the possible
consequences for the results shown here and those in the previous question.
After testing several specifications the consulting firm decides to choose one of the following two
models:
3. Are these two models linear? Why? Are both models dynamic? Why?
[Figure 8: residuals of the two models against year, 1960 to 1990]
5. Do you think that the plots of the residuals in Figure 8 evidence any problem? Test it.
6. Why do you think that the Newey-West estimator of the standard deviations has been
used? Do you find its use reasonable in both specifications?
7. Using all the information provided, which one is the best specification to explain the
number of patents? Does the selected model include some dynamics?
8. Given the selected model, obtain the mean increment in the number of patents filed when
the expenditure in research and development in that year increases by one billion dollars,
all other factors remaining constant. Given the sample range, is the estimated increment
positive?
STOCHASTIC REGRESSORS
The next specification is proposed for the demand of wine in a particular country:
Qt = βPt + ut
where ut ∼ iid(0, 0.0921). Given that the price Pt is simultaneously determined with the
demanded quantity Qt , it is suspected that Pt can be correlated with ut . Data on an index of
storage costs, St , which is exogenously determined, and thus considered independent of ut , are
available.
a) Use the Hausman test to check that conjecture, explaining in detail the testing procedure.
b) Given the result of the test, which estimator of β would you choose? Why?
Yt = βXt + ut
where $u_t \sim iid(0, \sigma_u^2)$ and Xt is non-stochastic. The variable Xt is not observable but
observations are available on another variable, $X_t^*$, whose behaviour is similar to that of Xt , such
that:
$X_t^* = X_t + \varepsilon_t, \qquad \varepsilon_t \sim iid(0, \sigma_\varepsilon^2)$
where $E(\varepsilon_t u_t) = 0 \;\; \forall t$.
Yt = βXt∗ + vt t = 1, ..., T
b) What method of estimation can be used to obtain a consistent estimator of β? Write down
the formula for the proposed estimator and the conditions under which this estimator is
consistent.
where X1t is known to be jointly determined with Yt as X1t = Yt + X2t and E(X2t ut ) = 0 ∀t.
b) What are the implications of this fact on the estimator of β in (1) by Ordinary Least
Squares (OLS)? Justify.
c) Write down explicitly the formula of an alternative estimator of β for this model justifying
your choice.
A sample of 60 observations is available where the following cross-products have been obtained:
Yt X1t X2t
Yt 100 40 -60
X1t 80 40
X2t 100
for instance, $\sum Y_t X_{2t} = -60$.
d) Obtain the estimate of β with the method proposed in c) and also by OLS.
f) If the researcher ignores that $X_{1t} = Y_t + X_{2t}$, how could he or she realize that $E(X_{1t} u_t) \neq 0$?
Explain and perform the test. Assume that $\sigma^2 = 1$.
The model Yt = βXt + ut is to be estimated and it is suspected that there may be unobservable
factors included in ut correlated with Xt .
a) If this suspicion is true, what are the implications for the properties of the OLS estimator
of β? Justify your answer in a formal way.
b) Under which conditions would Xt−1 be a good instrument for Xt in order to get an
instrumental variables estimator of β? Give formal reasons for your answer.
A sample of 60 observations is available where the following cross-products have been
obtained:
Yt Xt Xt−1
Yt 50 20 -30
Xt 40 20
Xt−1 50
for instance, $\sum Y_t X_{t-1} = -30$.
c) Using the variable Xt−1 as instrument for Xt , obtain the estimate of β by means of the
instrumental variables method.
d) What would have happened if $\sum X_t X_{t-1} = 0$?
e) Assuming that ut ∼ iid(0, 1), test the H0 : E(Xt ut ) = 0 explaining in detail the testing
procedure.
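Parts c) and e) can be illustrated with the cross-products in the table. For the single-regressor model Yt = βXt + ut with instrument Zt = Xt−1, the estimators reduce to ratios of sums, and with σ² = 1 the Hausman statistic compares them using the difference of their variances. The sketch below is written under these assumptions, with illustrative variable names.

```python
# Cross-products from the table (sample of 60 observations).
s_yx = 20.0    # sum of Y_t * X_t
s_xx = 40.0    # sum of X_t^2
s_yz = -30.0   # sum of Y_t * X_{t-1}
s_xz = 20.0    # sum of X_t * X_{t-1}
s_zz = 50.0    # sum of X_{t-1}^2
sigma2 = 1.0   # u_t ~ iid(0, 1), as stated in part e)

# OLS and IV estimators of beta in Y_t = beta * X_t + u_t.
beta_ols = s_yx / s_xx
beta_iv = s_yz / s_xz            # undefined if sum X_t X_{t-1} were zero

# Variances of both estimators for known sigma^2.
var_ols = sigma2 / s_xx
var_iv = sigma2 * s_zz / s_xz ** 2

# Hausman statistic, asymptotically chi-square(1) under H0: E(X_t u_t) = 0.
hausman = (beta_iv - beta_ols) ** 2 / (var_iv - var_ols)
print(beta_ols, beta_iv, hausman)
```

The IV ratio also makes the point of part d) visible: the denominator is the cross-product between the regressor and the instrument.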
where X2t is a fixed variable, X3t is a stochastic variable and β = (β1 , β2 , β3 )′ is the vector of
unknown parameters.
b) Which assumption guarantees that the OLS estimator of β is unbiased? Prove it.
c) If X3t is stochastic and not independent of ut but E(X3t ut ) = 0, ∀t, is the OLS estimator
of β consistent? Show and indicate the additional assumptions that are necessary to get
this result.
d) If X3t is stochastic but the conditions of the Mann-Wald theorem hold, can we make
inference on β even if the distribution of ut is not known? Give rigorous reasons to
support your answer.
EXERCISE 31 (LE-2002.7) (Sep-2002)
A sample of 25 observations gives rise to the following sums of squares and of
cross-products:
where, for instance, $\sum Y_{1t} X_{1t} = -60$ and $\sum Y_{1t}^2 = 100$
b) Under the assumption that $E(Y_{2t} u_t) \neq 0$, define a consistent estimator of β1 and β2 . Write
down formally the conditions that guarantee this property and explain if they hold in this
case.
d) Under the assumption of σu2 = 1, use the Hausman test to check if there is evidence of
correlation between Y2t and ut . Explain the testing procedure, including the null and the
alternative hypotheses.
e) Given that last result, which estimator is preferable in this case? Why?
Assume that the individual savings depend on the individual permanent income according to
the relationship:
Yi = α + βIi + vi (1)
where Yi are annual savings and Ii annual permanent income per worker. The permanent income
I cannot be observed, so the regression model to be estimated is:
Yi = α + βXi + ui (2)
where Xi is the worker’s annual income, used as an approximation to I. The results of the OLS
estimation of the model with data on 50 individuals for year 1999 are:
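$$\begin{pmatrix} \hat\alpha \\ \hat\beta \end{pmatrix}_{OLS} = \begin{pmatrix} 4.34 \\ -0.856 \end{pmatrix}, \qquad \hat\sigma^2_{OLS}(X'X)^{-1} = 1.023 \times \begin{pmatrix} 0.7165 & -0.009 \\ -0.009 & 0.0001 \end{pmatrix}$$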
i) Economic theory says that the permanent income-savings relationship is positive. However,
the OLS estimate of the slope β is actually negative. Can you find an explanation
for this apparent contradiction? Justify your answer.
Later, model (2) is re-estimated by instrumental variables. The variable used as instrument is
the average income obtained during the past 10 years (1989-98), which is obviously strongly
related to the permanent income and also to the current annual income. The results are:
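$$\begin{pmatrix} \tilde\alpha \\ \tilde\beta \end{pmatrix}_{IV} = \begin{pmatrix} 0.988 \\ 0.039 \end{pmatrix}, \qquad \tilde\sigma^2_{IV}(Z'X)^{-1} Z'Z (X'Z)^{-1} = 1.3595 \times \begin{pmatrix} 1.7088 & -0.0223 \\ -0.0223 & 0.0003 \end{pmatrix}$$
ii) What is the expression for $\tilde\beta_{IV}$? And for $\tilde\sigma^2_{IV}$?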
iii) Run the Hausman test. Relate the result you obtain with your answer to question i).
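A minimal sketch of the arithmetic behind question iii), using only the slope estimates and the (2,2) elements of the two variance matrices reported above. It assumes the scalar version of the Hausman statistic and takes each estimator's own reported variance; variants of the test that use a common estimate of σ² would give a different number. The helper name `hausman_scalar` is illustrative.

```python
# Slope estimates and variance terms read from the OLS and IV outputs above.
beta_ols, beta_iv = -0.856, 0.039
var_ols = 1.023 * 0.0001     # sigma2_OLS times the (2,2) element of (X'X)^-1
var_iv = 1.3595 * 0.0003     # sigma2_IV times the (2,2) element of the IV matrix


def hausman_scalar(b_iv: float, b_ols: float, v_iv: float, v_ols: float) -> float:
    """Hausman statistic for a single coefficient: (b_IV - b_OLS)^2 / (V_IV - V_OLS)."""
    diff_var = v_iv - v_ols
    if diff_var <= 0:
        raise ValueError("variance difference must be positive")
    return (b_iv - b_ols) ** 2 / diff_var


H = hausman_scalar(beta_iv, beta_ols, var_iv, var_ols)
CHI2_1_CRIT_5PCT = 3.841  # 5% critical value of a chi-square with 1 d.f.
print(round(H, 1), H > CHI2_1_CRIT_5PCT)
```

Under the null hypothesis that Xi is uncorrelated with the disturbance, the statistic is asymptotically chi-square with one degree of freedom, so a value above the critical value points towards the IV estimator.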
Yi = β1 + β2 EDUi + wi i = 1, ..., N
where Yi and EDUi are wage earnings per year (in tens of thousands of euros) and education
level of individual i, respectively. Furthermore, E(EDUi wi ) = 0 for all i and wi is a white noise.
The sample consists of 1000 individuals. However, the education level is approximated by
the observable variable Si , years of education. Such variable is measured with error, as Si =
EDUi + εi where εi is a white noise independent of EDUi and wi .
Using Ordinary Least Squares (OLS), the following results have been obtained:
b) Explain in detail the properties of the OLS estimator of β1 and β2 if Si has been used
instead of EDUi in the model. Reason your answer.
We have now information on an additional variable, Pi , measuring the years of education of the
father of individual i. For the sample of 1000 individuals we have the following information:
$\sum_i Y_i = 2988.232$, $\sum_i S_i = 16707$, $\sum_i Y_i S_i = 50071.6$, $\sum_i S_i^2 = 283539$
$\sum_i P_i = 14343$, $\sum_i Y_i P_i = 42914.7$, $\sum_i P_i S_i = 240466$, $\sum_i P_i^2 = 206469$
$\sum_i Y_i^2 = 9028.9$
c) Propose a consistent estimator alternative to OLS. What conditions guarantee its
consistency? What is its asymptotic distribution? Justify your answer.
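One possible answer to c) is the IV estimator that uses the father's years of education Pi as instrument for Si. The sketch below, under that assumption (the exercise leaves the choice open), computes the estimates from the sample sums given above; the function name `iv_with_intercept` is illustrative.

```python
# Sample sums from the exercise (N = 1000 individuals).
N = 1000
sum_y, sum_s, sum_p = 2988.232, 16707.0, 14343.0
sum_yp, sum_ps = 42914.7, 240466.0


def iv_with_intercept(n, sy, sx, sz, szy, szx):
    """IV estimates (intercept, slope) of y = b1 + b2*x + e using instrument z.

    Solves Z'X b = Z'y for Z = [1, z] and X = [1, x] in closed form.
    """
    slope = (n * szy - sz * sy) / (n * szx - sz * sx)
    intercept = sy / n - slope * sx / n
    return intercept, slope


b1_iv, b2_iv = iv_with_intercept(N, sum_y, sum_s, sum_p, sum_yp, sum_ps)
print(round(b1_iv, 4), round(b2_iv, 4))
```

Consistency of this estimator requires, among other conditions, that Pi is uncorrelated with the composite disturbance and correlated with Si.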
e) If a consistent estimator has been used, describe how the following estimate of the
asymptotic variance and covariance matrix of the estimator proposed in c) has been obtained.
Indicate all steps leading to this result.
$$\widehat{Var}(\hat\beta) = \frac{98.88}{998} \begin{pmatrix} 0.2984084 & -0.0178 \\ -0.0178 & 0.001065 \end{pmatrix}$$
f) Using the estimator proposed in c), test the hypothesis that an additional year of education
implies an average increment of 720 euros in the annual earnings. Write down the null
hypothesis, the alternative hypothesis and all the elements of the test.
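A hedged sketch of the test in f), assuming the estimator proposed in c) is the IV estimator with Pi as instrument and that the variance matrix given in e) belongs to it. Earnings are measured in tens of thousands of euros, so 720 euros corresponds to 0.072.

```python
import math

# Slope estimate: IV with P_i as instrument, recomputed from the sample sums.
N = 1000
sum_y, sum_s, sum_p = 2988.232, 16707.0, 14343.0
sum_yp, sum_ps = 42914.7, 240466.0
b2_iv = (N * sum_yp - sum_p * sum_y) / (N * sum_ps - sum_p * sum_s)

# Estimated asymptotic variance of the slope, from the matrix given in e).
var_b2 = (98.88 / 998) * 0.001065

# H0: beta2 = 0.072 against Ha: beta2 != 0.072.
beta_null = 720 / 10_000
t_stat = (b2_iv - beta_null) / math.sqrt(var_b2)
reject_5pct = abs(t_stat) > 1.96  # asymptotic N(0,1) two-sided critical value
print(round(t_stat, 3), reject_5pct)
```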
g) Run the Hausman test to analyse if the problem of measurement error is important or not.
Write down the null and the alternative hypotheses as well as all the elements of the test.
h) Indicate, with adequate reasoning, which one of the two estimators you would choose,
taking into account the result of the Hausman test.
where Xt is an exogenous variable, independent of u1s and of u2s for all t and s.
a) Obtain the expression of the instrumental variables (IV) estimator of β using Xt as
instrument.
c) Is it consistent? Why?
$\sum_t y_{2t}^2 = 42$, $\sum_t y_{1t} y_{2t} = 5$, $\sum_t y_{2t} X_t = 12$, $\sum_t X_t^2 = 10$, $\sum_t X_t y_{1t} = 3$, $\sum_t y_{1t}^2 = 11$
Furthermore, a consistent estimator of $\sigma_1^2$ is available, with $\hat\sigma_1^2 = 0.01$. Use the Hausman
test to decide whether there is statistical evidence that y2t is an endogenous variable.
Explain the process in detail.
where Yt , X1t and X2t are the consumption growth rates, interest rate and inflation at period t
respectively. It is assumed that ut ∼ iid(0, σ²). X1t is assumed to be nonstochastic, but inflation
is determined by the demand for consumption and is thus stochastic. In addition, information
on the growth rate of the costs of production Pt (nonstochastic) is also available.
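$$Z'Z = \begin{pmatrix} 100 & -14 & -16 \\ -14 & 95 & -15 \\ -16 & -15 & 155 \end{pmatrix} \qquad (X'Z)^{-1} Z'Z \,[(X'Z)^{-1}]' = \begin{pmatrix} 0.012 & -0.030 & 0.059 \\ 0.002 & 0.008 & -0.010 \\ -0.006 & -0.033 & 0.142 \end{pmatrix}$$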
If Z is the matrix of instruments, estimate the model using instrumental variables. Write
down the matrix Z and the instrument that is used, explaining why it has been chosen as
instrument. What are the properties of this estimator?
c) If $\sum \hat{u}_{t,IV}^2 = 2.037$, how would you test if the OLS estimator is consistent? Explain the
procedure and test that hypothesis. Based on the result of the test, what method of
estimation would you choose? Why?
Yi = β1 + β2 Xi + ui ,   i = 1, . . . , N,   (1)
a) Which problem exists in this model? How could it be detected? Explain in detail the
proposed test and the consequences of rejecting or not rejecting the null hypothesis.
b) What are the consequences of the use of the OLS estimator on the tests of hypotheses
about β1 and β2 ? And of the use of the IV estimator? Justify in detail your answer.
With a sample of 500 observations we obtain the following sums of squares
and cross-products⁶:
         1        Yi         Xi         Z1i
1        500      1530.17    14.48      −0.23
Yi                7163.54    1551.83    448.79
Xi                           1037.57    451.24
Z1i                                     509.40
where, say, $\sum Y_i X_i = 1551.83$ and $\sum Y_i = 1530.17$
c) Using this information fill in the blank elements inside the matrices below in order to
obtain the IV estimates of β1 and β2 , considering Z1 as the unique instrument:
⁶ Source: file vacation.dat from the book Undergraduate Econometrics by Hill, Griffiths and Judge (2001).
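$$\hat\beta_{IV} = \begin{pmatrix} \ldots & \ldots \\ \ldots & \ldots \end{pmatrix}^{-1} \begin{pmatrix} \ldots \\ \ldots \end{pmatrix} = \begin{pmatrix} 3.03 \\ 0.996 \end{pmatrix}$$
$\hat{Y}_t = \ldots$
$(\widehat{dev}(\hat\beta_{IV}))$   (      )   (      )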
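As a check on the figures in c), the sketch below builds Z'X and Z'Y from the table of cross-products, with Z = [1, Z1] and X = [1, X], and solves the two-equation system. The helper `solve_2x2` is illustrative and not part of the original exercise.

```python
# Sums read from the cross-product table (Z1 is the instrument for X).
n = 500.0
sum_x, sum_z = 14.48, -0.23
sum_zx = 451.24
sum_y, sum_zy = 1530.17, 448.79

# Z'X and Z'Y for Z = [1, Z1] and X = [1, X].
ZtX = [[n, sum_x], [sum_z, sum_zx]]
ZtY = [sum_y, sum_zy]


def solve_2x2(a, b):
    """Solve a @ x = b for a 2x2 matrix a using Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return x0, x1


b1_iv, b2_iv = solve_2x2(ZtX, ZtY)
print(round(b1_iv, 2), round(b2_iv, 3))  # 3.03 0.996
```

The result reproduces the estimates 3.03 and 0.996 printed in the exercise.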
Restriction set
1: b[const] = 3
2: b[X] = 1
Test statistic: chi^2(2) = 0.490224, with p-value = 0.782617
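The p-value in the Gretl output can be reproduced by hand, because the upper-tail probability of a chi-square distribution with 2 degrees of freedom has the closed form exp(−x/2). A short sketch (the function name is illustrative):

```python
import math


def chi2_sf_2df(x: float) -> float:
    """Upper-tail probability of a chi-square with 2 d.f.: P(X > x) = exp(-x / 2)."""
    return math.exp(-x / 2.0)


p_value = chi2_sf_2df(0.490224)
print(round(p_value, 6))
```

A p-value this large means that the joint restriction b[const] = 3 and b[X] = 1 is not rejected at the usual significance levels.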
f) An alternative estimator to that in c) has been considered, obtaining the following results
(using Gretl):
Explain, step by step, the process leading to the calculation of this estimator. Is it better
than the previous one?
DYNAMIC MODELS
The following model has been proposed to analyse the dependency of the Madrid Stock Exchange
market on their New York and London counterparts
$$\widehat{MAD}_t = \underset{(0.0032)}{0.0095} + \underset{(0.1200)}{0.4990}\,LON_{t-1} + \underset{(0.1900)}{0.1800}\,NY_{t-1} \qquad DW = 0.82 \quad R^2 = 0.88 \qquad (1)$$
(standard deviations in parentheses)
Later the explanatory variable M ADt−1 is added to the model, which is then estimated
with the same data:
$$MAD_t = \underset{(0.0012)}{0.0031} + \underset{(0.0800)}{0.1910}\,MAD_{t-1} + \underset{(0.2460)}{0.8400}\,LON_{t-1} + \underset{(0.0120)}{0.0600}\,NY_{t-1} + \hat v_t \qquad (2)$$
with DW = 1.9 and
Yt = β1 Yt−1 + β2 Xt + ut (1)
where: Yt is the sale price of a first hand flat at time t.
Xt is the interest rate at time t.
We have the following information:
• The model is correctly specified.
The three researchers do not agree about what the best estimation method is, proposing three
different alternatives:
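$$\begin{pmatrix} 0.831371 \\ 0.882068 \end{pmatrix} = \begin{pmatrix} 0.00046 & -0.00134 \\ -0.00134 & 0.0076 \end{pmatrix} \begin{pmatrix} 4442.139 \\ 903.487 \end{pmatrix} \qquad (3)$$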
b) What are the properties of the estimators? Perform some test if you think it is necessary.
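$$\begin{pmatrix} 0.770343 \\ 1.060368 \end{pmatrix} = \begin{pmatrix} 0.003809 & -0.00291 \\ -0.01112 & 0.012178 \end{pmatrix} \begin{pmatrix} 0.770343 \\ 903.0487 \end{pmatrix} \qquad (5)$$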
where BG(1) = 27.66 RSS = 165.5112
d) What are the properties of the estimators? Perform some test if you think it is necessary.
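$$\begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} \sum Y_{t-1}^{*2} & \sum Y_{t-1}^* X_t^* \\ \sum Y_{t-1}^* X_t^* & \sum X_t^{*2} \end{pmatrix}^{-1} \begin{pmatrix} \sum Y_{t-1}^* Y_t^* \\ \sum X_t^* Y_t^* \end{pmatrix} \qquad (6)$$
$$\begin{pmatrix} 0.775642 \\ 1.090742 \end{pmatrix} = \begin{pmatrix} 0.001035 & -0.00117 \\ -0.00117 & 0.00938 \end{pmatrix} \begin{pmatrix} 1014.806 \\ 245.7676 \end{pmatrix} \qquad (7)$$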
$$\rho^* = \frac{\sum \hat u_t \hat u_{t-1}}{\sum \hat u_{t-1}^2} = 0.5387823 \qquad (8)$$
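Expression (8) is the OLS slope of a regression of the residuals on their own first lag, without intercept. The sketch below implements that formula; the residual series is hypothetical and only illustrates the calculation, since the residuals of the exercise are not available here.

```python
def ar1_rho(residuals):
    """Estimate rho as sum(u_t * u_{t-1}) / sum(u_{t-1}^2), for t = 2..T."""
    num = sum(u * u_lag for u, u_lag in zip(residuals[1:], residuals[:-1]))
    den = sum(u_lag ** 2 for u_lag in residuals[:-1])
    return num / den


# Hypothetical residuals, for illustration only (not data from the exercise).
u_hat = [1.0, 0.5, 0.25, -0.5, -0.25]
rho = ar1_rho(u_hat)
print(rho)
```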
f) Given all previous answers, which researcher has used the best estimator? Reason your
answer.
A professional report proposed two possible models to explain the evolution of the demand of
petrol. There exist quarterly data from 1959 to 1990 (both years included) for the following
variables, all measured in logarithms:
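$$\hat u_t = \underset{(0.09)}{-0.01} - \underset{(0.008)}{0.003}\,X_{2t} - \underset{(0.012)}{0.004}\,X_{3t} + \underset{(0.004)}{0.004}\,X_{4t} + \underset{(0.09)}{0.62}\,\hat u_{t-1} - \underset{(0.107)}{0.007}\,\hat u_{t-2}$$
(estimated standard deviations in parentheses)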
The OLS results are:
a) Based on the results for model (1), do you think that the basic assumptions are verified
in that model? Perform any tests you judge relevant.
c) Based on the results of model (2), do you think that the basic assumptions are verified in
that model? Perform any tests you judge relevant. Reason your answer.
e) How would you test the hypothesis that the income-elasticity is equal to 1? Explain all the
elements of the test, such as the used model, the null and the alternative hypothesis, the
used estimator, the statistic, its distribution under the null hypothesis and the decision
rule. If you have got enough information, perform the test.
The following table shows the first 8 observations of the variables Yt , Xt and ût,OLS :
t     Yt      Xt      ût,OLS
1     8.5     11
2     8.9     13
3     16      14
4     7.8     14.9
5     16.4    15.1     0.625
6     7.9     18      −1.864
7     18      18.8     1.304
8     8       19.1    −0.797
⋮     ⋮       ⋮        ⋮
a) Using the observations in the table, obtain the initial OLS residuals. Analyse graphically
the possible presence of first order autocorrelation in the disturbances. Explain how you
would test this assumption in a formal way.
b) If the null hypothesis in the previous question is rejected and assuming that the
disturbances follow an AR(1) process, that is, $u_t = \rho u_{t-1} + \epsilon_t$, $\epsilon_t \sim iid(0, \sigma_\epsilon^2)$, show the
properties of the OLS estimator of the parameters in equation (1).
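$$(Z'X)^{-1} = \begin{pmatrix} -0.233 & 0.203 & -0.207 \\ -0.0062 & 0.0032 & -0.0032 \\ 0.033 & -0.021 & 0.021 \end{pmatrix} \qquad (X'Z)^{-1} = \begin{pmatrix} -0.233 & -0.0062 & 0.033 \\ 0.203 & 0.0032 & -0.021 \\ -0.207 & -0.0032 & 0.021 \end{pmatrix}$$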
d) Do you think that the previous estimator solves the autocorrelation problem? Reason your
answer.
where $Y_t^* = Y_t - \hat\rho Y_{t-1}$, $X_t^* = X_t - \hat\rho X_{t-1}$, $Y_{t-1}^* = Y_{t-1} - \hat\rho Y_{t-2}$ and $\hat\rho$ is a
consistent estimator of the parameter of the first order autoregressive process.
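The starred variables are quasi-differences of the original series. A minimal sketch of the transformation follows; the series and the value of ρ̂ are hypothetical, and the function name `quasi_difference` is illustrative.

```python
def quasi_difference(series, rho):
    """Return x_t - rho * x_{t-1} for t = 2..T (the first observation is lost)."""
    return [x - rho * x_lag for x, x_lag in zip(series[1:], series[:-1])]


# Hypothetical series and rho, for illustration only.
y = [10.0, 12.0, 11.0, 13.0]
rho_hat = 0.5

y_star = quasi_difference(y, rho_hat)            # Y_t - rho * Y_{t-1},     t = 2..T
y_lag_star = quasi_difference(y[:-1], rho_hat)   # Y_{t-1} - rho * Y_{t-2}, t = 3..T
print(y_star, y_lag_star)
```

Because the lagged dependent variable is also transformed, the usable sample for the transformed regression starts at t = 3, so `y_star[1:]` lines up with `y_lag_star`.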
e) Describe the consistent estimation of the parameter ρ used to obtain $Y_t^*$, $X_t^*$ and $Y_{t-1}^*$ in
the above expressions.
f) With the above information, is it possible to estimate the parameters of equation (1)
improving the properties of the IV estimator? Describe the proposed method and place the
above sums in the matrix formulae of the corresponding estimator (but do not perform
any calculations).
g) How would you test the null hypothesis H0 : β2 = 1? Describe in detail all the elements
intervening in the test.
A farmer wants to measure the relationship between the amount of collected strawberries in
kilograms Q and the number of employed labourers L. An analysis is outsourced to a professional
econometrician who specifies the following model:
where Lt is non-stochastic and ut has a normal distribution. The OLS estimation provides:
$$\hat Q_t = \underset{(36.62)}{1115.93} - \underset{(-14.20)}{2.4462}\,L_t \qquad R^2 = 0.8594 \quad DW = 0.3210 \quad T = 35 \qquad (2)$$
(t-statistics in parentheses)
The following regressions are also available, where ût are the OLS residuals from (2):
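$$\frac{\hat u_t^2}{\hat u'\hat u/35} = 0.4432 + 2.2378\,L_t + \hat\zeta_{3t} \qquad RSS = 70.4985 \quad R^2 = 0.0427 \qquad (C)$$
$$\frac{\hat u_t^2}{\hat u'\hat u/35} = 1.7899 + 0.9955\,\hat u_{t-1} + \hat\zeta_{4t} \qquad RSS = 55.2297 \quad R^2 = 0.0577 \qquad (D)$$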
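Regression (C) has the form of a Breusch-Pagan auxiliary regression: standardized squared residuals on the variable suspected of driving the variance. The sketch below, under that reading, recovers the explained sum of squares from the reported RSS and R² and forms the statistic ESS/2; the helper name is illustrative.

```python
def explained_ss(rss: float, r2: float) -> float:
    """Recover ESS from RSS and R^2, using R^2 = ESS / (ESS + RSS)."""
    return rss * r2 / (1.0 - r2)


# Auxiliary regression (C): standardized squared residuals on L_t.
ess_c = explained_ss(rss=70.4985, r2=0.0427)
bp_c = ess_c / 2.0            # Breusch-Pagan statistic, ESS / 2
CHI2_1_CRIT_5PCT = 3.841      # one regressor besides the constant -> 1 d.f.
reject_homoscedasticity = bp_c > CHI2_1_CRIT_5PCT
print(round(bp_c, 3), reject_homoscedasticity)
```

The ESS/2 form relies on normality of the disturbances, which the exercise assumes for ut.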
[Figure: left panel, Q and estimated Q (actual and fitted series); right panel, residuals from the regression (= Q − estimated Q); both plotted against time.]
3. Comment on the plot featuring the actual and the fitted values of the endogenous variable.
Is it a good fit? Comment on the residuals plot. Given both graphics, do you think that
the model satisfies all basic assumptions?
4. Based on the provided information verify if the disturbances satisfy the basic assumptions.
5. Given the evidence you have found, explain its consequences on the OLS estimator of the
coefficients and the reliability of the statistics shown above.
6. Based on the results obtained above the econometrician estimates model (1) using an
estimator which is thought to be more adequate in this context. The results are:
Which estimation method is used here? Specify all the steps needed in order to obtain
these results. Why is it more adequate than the previous one? Reason your answer based
on the properties of the estimator.
After a conversation with the farmer, it is known that a good harvest is generally followed by
another good one and, conversely, that a poor harvest is likely to be followed by another poor
one. This leads the econometrician to think that the amount of strawberries harvested in the
previous season could affect the current one. The econometrician then specifies and estimates
the following model:
Qt = α1 + α2 Lt + α3 Qt−1 + wt (3)
Model 3: OLS estimates using 34 observations 1971–2004
Dependent variable: Q
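$$\hat w_t = -255.47 + 0.579406\,L_t + 0.231059\,Q_{t-1} - 0.804475\,\hat w_{t-1} + \hat\eta_{3t} \qquad RSS = 10958.4 \quad R^2 = 0.4869 \qquad (G)$$
$$\frac{\hat w_t^2}{\hat w'\hat w/34} = 0.4432 + 2.2378\,L_t + \hat\eta_{4t} \qquad RSS = 77.8328 \quad R^2 = 0.05665 \qquad (H)$$
$$\frac{\hat w_t^2}{\hat w'\hat w/34} = 3.9229 + 2.2552\,\hat w_{t-1} + 0.3463\,Q_{t-1} + \hat\eta_{5t} \qquad RSS = 50.0805 \quad R^2 = 0.0064 \qquad (I)$$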
7. Perform the tests you think relevant and calculate (or explain in detail) the following
equalities:
E(wt ) =
E(wt2 ) =
Cov(wt , ws ) =
E(Lt wt ) =
E(Qt−1 wt ) =
E(β̂OLS ) =
8. What can you say about the Mann and Wald’s theorem and the consistency of the OLS
estimator?
9. In order to test whether the harvest from the previous season is a relevant factor to explain
the current harvest, model (3) has been estimated with a method that is consistent,
asymptotically efficient and valid for inference, with the results:
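$$\underbrace{Q_t - \hat\rho Q_{t-1}}_{Q_t^*} = 25.28\,\underbrace{(1 - \hat\rho)}_{X_t^*} + \underset{(0.125)}{0.064}\,\underbrace{(L_t - \hat\rho L_{t-1})}_{L_t^*} + \underset{(0.048)}{1.067}\,(Q_{t-1} - \hat\rho Q_{t-2}) + \hat\epsilon_t \qquad (4)$$
(estimated standard deviations $\widehat{dev}(\hat\alpha_i)$ in parentheses)
R2 = 0.981   DW = 1.98
where εt is a white noise such that εt = wt − ρwt−1 and wt are the disturbances of model
(3).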
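a) $\epsilon_t \sim (\ \ldots\ ,\ \ldots\ )$
b)
$$\begin{pmatrix} \hat\alpha_1 \\ \hat\alpha_2 \\ \hat\alpha_3 \end{pmatrix} = \begin{pmatrix} 25.28 \\ 0.064 \\ 1.067 \end{pmatrix} = \begin{pmatrix} \ldots & \ldots & \ldots \\ \ldots & \ldots & \ldots \\ \ldots & \ldots & \ldots \end{pmatrix}^{-1} \begin{pmatrix} \ldots \\ \ldots \\ \ldots \end{pmatrix}$$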
c) Which consistent estimator of ρ has been used? Detail all the elements and conditions
that guarantee the consistency of the estimator of ρ.
d) Is it true that the harvest of the previous season is a factor determining the current harvest?
What are the implications on the result?
The ALIMENTAX S.A. food store wants to implement an expansion policy inside its region. For
that purpose it has requested a management report on the consumption function in such area.
The data available7 are yearly from 1959 to 1994 with observations of the following variables:
Ct = β1 + β2 Wt + β3 Pt + ut t = 1, . . . , T (1)
Coefficient Std. Error t-ratio p-value
const −222,158 19,5527 −11,3620 0,0000
W 0,693262 0,0326064 21,2615 0,0000
P 0,735916 0,0488218 15,0735 0,0000
PART 1:
a) What does the sentence "the variables are measured in constant terms" mean?
b) Interpret the parameter β2 :
c) Comment on the plot of the residuals below.
[Figure: OLS residuals plotted against time, 1960-1990.]
d) Do all basic assumptions on the disturbances hold? Analyse the displayed results and fill
in the blank elements in the matrices below according to the test or tests performed:
E(u) = . . . . . .
E(uu′) = . . . . . .
e) Assuming that Wt and Pt are non-stochastic, what are the properties of the OLS estimator
of the coefficients in model (1)? Justify your answer.
PART 2:
The manager is not convinced with the model specification and decides to consider two
alternative specifications.
• Specification A:
Ct = β1 + β2 Wt + β3 Wt−1 + β4 Pt + ut t = 2, . . . , T
Estimating by OLS:
• Specification B:
Ct = β1 + β2 Wt + β3 Pt + β4 Ct−1 + ut t = 2, . . . , T (2)
[Figure: residuals from the regression (= C − estimated C) plotted against time, 1960-1990.]
g) Given all the results of the previous estimation, what can you say about the validity of
the displayed significance tests?
PART 3:
i) Why has the manager used this method of estimation?
k) Describe step by step the procedure that the manager has followed to obtain these results.
What is the difference between this estimator and that from Results 2?
l) Among the three results obtained for model (2), which one do you think is the best?
Reason your answer.
EXAMS
A company dedicated to the assembly, sale and installation of windows wants to analyse its
sales. A sample of monthly observations from January 2005 to December 2011 is available on
the quantity of windows sold (V , in thousands of units) and the average price per window (P , in
dozens of Euros). The economist of the company assumes that Pt is non stochastic and proposes
the following model to be estimated with the available information:
Vt = α1 + α2 Pt + wt . (1)
[Figure: two panels plotted against time, 2005-2012; one of the panels shows the series V.]
1) Given all the available information, are all the basic assumptions on the disturbances
satisfied? Explain your response based on the figures and the possible tests you may implement.
2) How could we know, using the OLS estimator, if price is a statistically significant variable?
Explain in detail the testing procedure you propose.
Given the results above the economist decides to take into account that sales and subsequent
installation of windows are more frequent in the hot months (July, August and September) than
in other months, and includes monthly dummy variables:
where the dummy variables dm7t , dm8t and dm9t are equal to one if the observation at time t
is in July, August or September respectively and zero otherwise.
3) Do you think that the inclusion of the dummy variables has influenced the characteristics
of the disturbances?
4) Given your answer to the previous question, what are the properties of the estimator of the
coefficients used in Model (2)? What are the properties of the estimator of the standard
deviation of β̂? Explain your answer in detail.
The economist has some doubts about the method of estimation to be used. Thus he/she
performs some trials. The results of the estimation in the first trial are:
FIRST TRIAL
Dependent variable: V
ρ̂ = 0.277338
5) Which is the method of estimation used by the economist in this first trial? Explain in
detail.
6) What is the improvement in the estimation of Model (2) that you expect to achieve with
this strategy?
The economist suspects that price may be a stochastic variable. The second trial gives the
following results:
SECOND TRIAL
7) Which method of estimation is the economist using in this second trial? Describe it in
detail and write down the elements in the matrices below needed to get the estimates.
$$\hat\beta_{\ldots\ldots} = \begin{pmatrix} \ldots \end{pmatrix}^{-1} \begin{pmatrix} \ldots \end{pmatrix}$$
THIRD TRIAL
Durbin-Watson = 2.123
Breusch-Godfrey (first order autocorrelation) = 0.07005
[Figure: two panels plotted against time, 2005-2012: the residuals and the series V.]
9) Discuss the figures and compare them with those obtained before. What do you think is
the reason of the differences observed?
10) Analyse the results obtained in the three trials and choose the best method of estimation.
Explain your answer.
The owner of a restaurant wants to know if spending on advertising (PUB, in euros) and the
reforms implemented in the restaurant in January 2012 (REF, 1 from January 2012 onwards
and 0 before 2012) have a significant effect on the total number of meals served (M). For
that purpose the owner has monthly data from January 2010 to October 2012, with which the
following estimation has been obtained.
2) Analyse the following information and Figure 9 and explain its implications for the
conclusions in the previous section.
Figure 9: OLS estimation
[Left panel: residuals of the regression (= V observed − estimated) plotted against PUB. Right panel: M observed and estimated, plotted against time, 2010-2012.]
3) It is suspected that the variance of the disturbance could be a quadratic function of
advertising expenditure. Propose one structure for the covariance matrix of the disturbances
consistent with that suspicion and obtain the transformed model that solves the problem.
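One possible structure can be sketched as follows. It assumes that the model is M_t = β1 + β2 PUB_t + β3 REF_t + u_t (the estimated equation is not reproduced in this excerpt, so the form and the coefficient names are assumptions), that the variance is proportional to the square of advertising expenditure, and that the disturbances are not autocorrelated. Other quadratic forms are equally admissible answers.

```latex
\begin{aligned}
E(uu') &= \sigma^2 \,\mathrm{diag}\left(PUB_1^2, \ldots, PUB_T^2\right), \\
\frac{M_t}{PUB_t} &= \beta_1 \frac{1}{PUB_t} + \beta_2 + \beta_3 \frac{REF_t}{PUB_t} + \frac{u_t}{PUB_t}, \\
Var\!\left(\frac{u_t}{PUB_t}\right) &= \frac{\sigma^2 PUB_t^2}{PUB_t^2} = \sigma^2 .
\end{aligned}
```

Under these assumptions the transformed disturbances are homoscedastic, so OLS applied to the transformed model is the GLS estimator of the original coefficients.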
Given the previous suspicion, the model is reestimated assuming a particular structure for the
variance. The following results are obtained:
4) What method of estimation has been used? Why has it been chosen to estimate the model?
5) What would you tell the owner of the restaurant about the expenditures on advertising
and the reform of the restaurant?
To analyse the variables that affect the consumption of cigarettes in the U.S., data for 48 U.S.
states in 1995 are available for the following variables⁸ (all of them in logarithms):
⁸ Source: Introduction to Econometrics by Stock, J.H. and Watson, M.W.
• l_pop: state population
• l_tax: average state, federal and local excise taxes for fiscal year (exogenous and thus
non-stochastic)
With that purpose the following model has been estimated by OLS.
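$$\widehat{\mathit{l\_packpc}}_i = \underset{(1.1152)}{10.9745} + \underset{(0.24436)}{0.436418}\,\mathit{l\_income}_i - \underset{(0.25004)}{1.38842}\,\mathit{l\_avgprs}_i - \underset{(0.25466)}{0.474018}\,\mathit{l\_pop}_i \qquad (1)$$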
a) Interpret β̂lavgprs = −1.388. Are all the variables significant at the 5% significance level?
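The individual significance tests in a) only need the ratio of each coefficient to its standard error. A short sketch follows; the variable names are written with underscores and the critical value 2.015 is an approximation to the two-sided 5% point of a t distribution with 48 − 4 = 44 degrees of freedom, both of which are assumptions of this illustration.

```python
# Coefficients and standard errors from estimated model (1).
estimates = {
    "const":    (10.9745, 1.1152),
    "l_income": (0.436418, 0.24436),
    "l_avgprs": (-1.38842, 0.25004),
    "l_pop":    (-0.474018, 0.25466),
}
T_CRIT = 2.015  # approximate two-sided 5% critical value of t(44)

# t-ratio for H0: coefficient = 0, and the resulting 5% decision.
t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}
significant = {name: abs(t) > T_CRIT for name, t in t_ratios.items()}

for name, t in t_ratios.items():
    print(f"{name:9s} t = {t:7.3f}  significant at 5%: {significant[name]}")
```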
b) The researcher thinks that there may exist heteroscedasticity in the disturbances of the
model. In order to check that possibility he/she makes use of the following auxiliary
regression:
$$\hat e_i^2 = \underset{(10.939)}{9.247} - \underset{(2.397)}{3.588}\,\mathit{l\_income}_i + \underset{(2.453)}{0.947}\,\mathit{l\_avgprs}_i + \underset{(2.498)}{3.452}\,\mathit{l\_pop}_i + \hat w_i \qquad (2)$$
T = 48   ESS = 11.856
where $\hat e_i^2 = \hat u_i^2/\hat\sigma^2$ with $\hat\sigma^2 = \sum \hat u_i^2/T$ for û the OLS residuals. Explain in detail how you
would use this result to test the hypothesis of homoscedasticity. What is the conclusion
of the test?
c) Another researcher is reluctant to accept the results in equation (1) because he/she believes
that the price (avgprs) is also affected by the demanded quantity of packs of cigarettes,
and thus the disturbances in model (1) are likely to be correlated with l_avgprs. Therefore
he/she proposes to estimate the model by IV using l_tax as the instrument, obtaining the
following result:
$$\widehat{\mathit{l\_packpc}}_i = \underset{(1.1959)}{11.0754} + \underset{(0.25078)}{0.449578}\,\mathit{l\_income}_i - \underset{(0.27684)}{1.41622}\,\mathit{l\_avgprs}_i - \underset{(0.26066)}{0.486993}\,\mathit{l\_pop}_i \qquad (3)$$
T = 48   σ̂ = 0.18604
(standard errors in parentheses)
Explain in detail how this estimated model has been obtained. Why is l_tax chosen as the
instrument?
d) Which estimated model, (1) or (3), should be used to analyse the consumption of cigarettes?
Implement the test you consider necessary to support your answer.
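One way to support the choice in d) is a Hausman-type comparison of the price coefficient in models (1) and (3). The sketch below is only an approximation: it uses the standard errors as reported, which rest on a different estimate of σ in each model, whereas some versions of the test impose a common estimate.

```python
# Price coefficient and its standard error in model (1) (OLS) and model (3) (IV).
b_ols, se_ols = -1.38842, 0.25004
b_iv, se_iv = -1.41622, 0.27684

q = b_iv - b_ols                       # difference between the two estimates
var_q = se_iv ** 2 - se_ols ** 2       # Var(b_IV) - Var(b_OLS) under H0
H = q ** 2 / var_q                     # asymptotically chi-square(1) under H0

CHI2_1_CRIT_5PCT = 3.841
print(round(H, 4), H > CHI2_1_CRIT_5PCT)
```

If the null hypothesis of no correlation between l_avgprs and the disturbance is not rejected, OLS is consistent and more efficient than IV.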
To analyse the relationship between the growth rates of consumption and of personal disposable
income in USA, the following model is proposed:
Ct = α + βY dt + ut
where
The explanatory variable Y dt is assumed exogenous (non stochastic) and the model is estimated
by OLS using data from II-1947 to III-2003 with the result
$$\hat C_t = \underset{(0.00068906)}{0.00610406} + \underset{(0.050942)}{0.298581}\,Yd_t \qquad (1)$$
c) If the disturbances were autocorrelated, what would be the effects on the results
(estimated coefficients and standard errors) shown in equation (1)?
d) Another researcher thinks that there is first order autoregressive autocorrelation in the
disturbances and estimates the model by Cochrane-Orcutt obtaining the results:
Figure 10: OLS residuals
[Regression residuals (= observed − fitted C) plotted against time, 1950-2000.]
e) Let û∗t be the OLS residuals in the transformed model used to get the results in equation
(2). The following auxiliary regression has been obtained by OLS:
$$\hat u_t^* = 0.001\,X_{1t}^* - 0.012\,X_{2t}^* + 0.055\,\hat u_{t-1}^* + \hat\epsilon_t \qquad (3)$$
R2 = 0.052   RSS = 0.012   ESS = 0.043   DW = 1.97
where $X_{1t}^*$ and $X_{2t}^*$ are the explanatory variables in the transformed model. Test if there
is autocorrelation in the transformed model, explaining clearly all the elements of the
test (null and alternative hypotheses, test statistic, distribution ...).
f) A third researcher thinks that the relationship between consumption and personal
disposable income is dynamic and obtains by OLS the following estimated model:
where v̂t are the OLS residuals in the estimated model in (4). Can you say something
about the compliance of the basic assumptions on the disturbances?
g) Which estimated model (1), (2) or (4) is more adequate to explain the relationship between
consumption and personal disposable income? Why?
Yt = β0 + β1 Xt + ut ,   t = 1, ..., T
where all the basic assumptions of the GLRM are satisfied. However, Yt is not directly observable
but is measured with error as Yt∗ such that Yt = Yt∗ + εt , ε ∼ N(0, σε² I), and the researcher
only has data to estimate the model:
Yt∗ = β0∗ + β1∗ Xt + u∗t
a) Explain the relationship between the coefficients of the original model and those of the
model to be estimated.
b) What are the characteristics of the disturbances u∗t in the model to be estimated?
c) Consider the OLS estimators β̂0∗ and β̂1∗ . Are they unbiased estimators of β0∗ and β1∗ ?
Prove it.
d) Would your answer to the previous question change if Xt were a stochastic variable?
A group of researchers in an NGO wants to analyse the factors that affect the global warming.
To that end they propose the following model:
where
• X1t : number of sunspots at time t (non-stochastic).
• X2t : index of CO2 emitted to the atmosphere at time t (non-stochastic).
The model is estimated by OLS with a sample of 100 monthly observations, obtaining the
following results9 :
a) Are X1t and X2t individually significant factors to explain global warming?
b) Do you think that the disturbances ut satisfy all basic assumptions? Base your answer on
some formal test.
c) Based on your answer to the previous question, comment on the validity of your answer in
a) and the properties of the OLS estimator of model (1).
d) One of the researchers thinks that the variable temperature exhibits some time dependence
and proposes the following dynamic model:
Yt = β0 + β1 X1t + β2 X2t + β3 Yt−1 + vt (3)
The model is estimated by OLS obtaining
$$\hat Y_t = \underset{(2.11)}{8.97} + \underset{(0.11)}{0.36}\,X_{1t} + \underset{(0.12)}{0.14}\,X_{2t} + \underset{(0.04)}{0.37}\,Y_{t-1} \qquad (4)$$
e) Taking into account this new estimation, would you change your answers to questions a)
and c)?
⁹ Fictitious results, not based on real data.
EXERCISE 49 (GE.7) (May-2014)
In order to analyse the factors that affect the wage of an individual, the following model is
proposed:
lwi = β0 + β1 Edui + β2 agei + ui (1)
With a sample of 3010 individuals recorded in 1976 the following results have been obtained by
OLS:
[Figure: OLS residuals plotted against Edu (left panel) and against age (right panel).]
$$\frac{\hat u_i^2}{\hat\sigma^2} = \underset{(0.289)}{1.072} - \underset{(0.010)}{0.011}\,Edu_i + \underset{(0.009)}{0.002}\,age_i + \hat w_i$$
T = 3010   ESS = 2.606   $\hat\sigma^2 = \sum \hat u_i^2/3010$
a) Do you find some evidence of failure of some basic hypotheses on the disturbances? Use all
the information provided, including Figure 21 and the auxiliary regression.
b) Interpret β̂1 = 0.052. According to your answer in a), what are the properties of β̂1 ,
assuming that Edu and age are non stochastic?
c) Another researcher thinks that Edui does not reflect completely the education of individual
i, but it is a proxy of the true level of education, edi , such that Edui = edi + εi , where the
measurement error εi and edi are independent of each other. If ed is the factor that affects
the wages, what are the properties of the OLS estimator of the coefficients in model (1)?
Explain in detail.
d) This researcher estimates the model using the variables Edufi (years of education of the
father of individual i) and Edumi (years of education of the mother of individual i) as
instruments of Edui . The following results are obtained:
$$\hat\beta_{IV} = \begin{pmatrix} 4.012 \\ 0.077 \\ 0.043 \end{pmatrix}, \qquad \widehat{Var}(\hat\beta_{IV}) = \begin{pmatrix} 0.01321 & 0.00852 & 0.00745 \\ 0.00852 & 0.00004 & 0.00003 \\ 0.00745 & 0.00003 & 0.00001 \end{pmatrix}$$
e) Explain the properties of this estimator, stating clearly the assumptions needed for them to
hold.
f ) Use a formal test to see if this researcher was right in his/her suspicion about the factor
education.
g) Taking into account all your answers to the previous questions, test if education has a
positive effect on wages.
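If the answers to the previous questions lead to the IV estimator (an assumption of this sketch, since that choice depends on the outcome of the test in f)), the one-sided test in g) uses the estimate 0.077 and the second diagonal element of the reported variance matrix.

```python
import math

beta_edu_iv = 0.077        # IV estimate of the coefficient on Edu
var_edu_iv = 0.00004       # second diagonal element of the reported variance matrix

# H0: beta1 <= 0 against Ha: beta1 > 0, using the asymptotic N(0,1) distribution.
t_stat = beta_edu_iv / math.sqrt(var_edu_iv)
Z_CRIT_ONE_SIDED_5PCT = 1.645
reject = t_stat > Z_CRIT_ONE_SIDED_5PCT
print(round(t_stat, 2), reject)
```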
A researcher wants to analyse the effects of the economic globalization on unemployment (Yt ).
An index based on the exchange rate euro/US dollar, Xt , which is assumed to be nonstochastic,
is used as a proxy of the degree of economic globalization. The sample is composed of monthly
data of both X and Y .
Figure 12: OLS residuals
[OLS residuals plotted against time, 1963-1998.]
b) Is Xt individually significant?
[Figure: OLS residuals in the two subsamples, plotted against time: 1963-1975 (left panel) and 1983-1998 (right panel).]
Compare the graph of the residuals in models (2) and (3) with those in the whole sample in
Figure 1. Test for the presence of heteroscedasticity in the whole sample. Explain clearly
all the elements of the test.
d) Do you think that the OLS estimation in (1) is adequate? And the test made in question b)?
e) Test the hypothesis of no autocorrelation in the second subsample: from 1983 to 1999.
Explain clearly all the elements of the testing procedure.
f ) Finally, a new model including Yt−1 as regressor is estimated by OLS for the period 1983-
1999 (196 observations). The OLS residuals v̂t are used in the auxiliary regression in
(5).
Compare the results in models (3) and (4) and explain the properties of the OLS estimator
in both models. Run all the tests you judge necessary.
a) Starting from equation (1) propose an estimable model based on Yt and Z1t .
Yt = βZ1t + vt t = 1, 2, . . . , T (3)
c) Assume now that observations of two exogenous variables Z2t and Z3t are available, and that
both of them are correlated with Z1t . Bearing in mind this new information, how would
you estimate consistently β? How would you estimate the variance of the disturbances in
the model (8)?
d) Estimation of the model in equation (8) has led to the following results:
Test if the measurement error is important. Based on the results of the test, which method
of estimation is more reliable? Why?
The World Health Organization is worried about the differences in life expectancy around the
world and starts a research searching for the causes of these differences. As a first step, the
following model is proposed:
where
The model is estimated by OLS with a sample of 119 countries, obtaining the following results:
The researcher is worried about the possibility of GDP being correlated with the disturbances.
To avoid problems he/she estimates also the model by IV using T V (Televisions per 100 people)
as an instrument of GDP ,
a) What are the properties of the OLS estimation in Model (1) if ui ∼ iid(0, σu2 )? Base your
answer on some formal test.
b) Do the estimated coefficients have the expected signs?
Assume hereafter that GDP is an exogenous (non stochastic) variable. Consider the following
graphs of the OLS residuals:
[Figure: OLS residuals plotted against GDP (left panel) and against PopDoc (right panel).]
c) Based on the graphs of the OLS residuals, do you think that the disturbances ui satisfy all
basic assumptions?
d) The researcher in charge of the investigation also estimates the model by OLS using the 43
observations corresponding to the smallest GDP per capita and the 43 corresponding to the
largest, obtaining:
Sample: 43 countries with low GDP
With this information, can you add something to your answer in question c) about the
fulfillment of the basic assumptions of the disturbances in Model (1)?
e) Not convinced by the results obtained with OLS, the researcher proposes to estimate the
parameters in Model (1) by applying OLS to a transformed model. The transformation
consists in multiplying dependent and explanatory variables by the square root of GDP,
obtaining the following results
Describe in detail the method of estimation that he/she is proposing. When is this
estimated model better than the one obtained by OLS? Why?
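The transformation described in e) can be sketched as below. Multiplying every variable, including the constant, by the square root of GDP makes the transformed disturbances homoscedastic when Var(ui) is inversely proportional to GDPi; that variance structure is an assumption implied by the transformation, not a result stated in the exercise. The data are hypothetical and the function name is illustrative.

```python
import math


def weight_by_sqrt(y, regressors, gdp):
    """Multiply y and every regressor (constant included) by sqrt(GDP_i)."""
    y_star = [yi * math.sqrt(g) for yi, g in zip(y, gdp)]
    x_star = [[xij * math.sqrt(g) for xij in row] for row, g in zip(regressors, gdp)]
    return y_star, x_star


# Hypothetical data, for illustration only: columns of X are [constant, GDP].
y = [60.0, 70.0, 80.0]
gdp = [4.0, 9.0, 16.0]
X = [[1.0, g] for g in gdp]

y_star, X_star = weight_by_sqrt(y, X, gdp)
print(y_star)
print(X_star)
```

OLS on `y_star` and `X_star` is then the weighted least squares estimator of the original coefficients; note that the transformed model has no ordinary constant term, because the column of ones becomes sqrt(GDP).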
f ) The following regression is also estimated with the OLS residuals in the transformed model
(denoted û∗i ):
A group of researchers wants to analyse the factors that affect the consumption of spirits. To
that end the following model is proposed:
Qt = β0 + β1 It + ut (1)
where:
• Qt : Growth rate of the consumption of spirits in year t,
• It : Growth rate of the income per capita in year t (assumed exogenous, non-stochastic).
The following estimated model has been obtained by OLS with a sample of annual observations
from 1870 to 1938¹⁰:
$$\hat Q_t = \underset{(0.0047)}{-0.0144} + \underset{(0.2512)}{0.8386}\,I_t$$
T = 68   R2 = 0.1444   DW = 1.4584   σ̂ = 0.0370
(standard errors in parentheses)
[Figure 15: OLS residuals plotted over time, 1870-1938]
a) Do you find any evidence that some basic hypothesis on the disturbances fails? Use all
the information provided, including Figure 15.
Three researchers of the group consider that the estimation can be improved in different ways.
The first one thinks that the disturbances are autocorrelated and proposes to estimate Model
(1) using Cochrane-Orcutt. The results are
         Coefficient    Std. Error    t-ratio    p-value
const    −0.0152441     0.00609762    −2.5000    0.0150
I         0.886346      0.245357       3.6125    0.0006
b) Describe in detail how the value BG(1) (Breusch-Godfrey statistic for first-order autocorrelation)
has been obtained (note that the statistic is based on the rho-differenced data).
c) Is there any improvement in the properties of the estimated coefficients with respect to OLS
in Model (1)?
The second researcher thinks that the prices of spirits should be included in the model to explain
its consumption and proposes the following model
$$Q_t = \beta_0 + \beta_1 I_t + \beta_2 P_t + v_t \qquad (2)$$
where Pt is the growth rate of the prices of spirits in year t (assumed non-stochastic). The
estimated model by OLS is
d) Taking into account this new information, would you change your answer to question c)?
Finally, the third researcher thinks that the dynamism in the consumption of spirits should be
included as
$$Q_t = \beta_0 + \beta_1 I_t + \beta_2 P_t + \beta_3 Q_{t-1} + w_t \qquad (3)$$
e) Use the estimated model you consider most adequate to test if the growth rate of income
per capita is a significant variable to explain the variations in the consumption of spirits.
Which evidences do you use for your choice of the estimated model in which the test is
implemented?
The following variables have been used to study the labour market in the USA in 1991¹¹:
In particular, the wages of married women are to be analysed. With that purpose, the following
OLS estimated model is obtained:
a) Write down the sample regression function, indicating what is the sample size N .
b) According to this estimated model, and assuming that all basic hypotheses of the GLRM
are satisfied, are wives' wages affected by having children younger than 6 years?
Figure 16: OLS residuals (= earns observed − estimated), plotted against educ
b) Explain how to test that problem, indicating all the elements of the test.
a) What is this regression used for? Use it to test whether any of the basic hypotheses is not
satisfied.
b) Taking into account the results obtained with that regression, and that V ar(ui ) is
unknown, do you think that a better estimator than that in question a) exists? Why?
Explain how you would obtain it and its properties.
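$$\widehat{\mathit{earns}}_i = \underset{(25.539)}{-123.041} + \underset{(1.4265)}{35.5347}\,\mathit{educ}_i - \underset{(0.38620)}{1.77105}\,\mathit{age}_i - \underset{(8.1434)}{25.3981}\,\mathit{kidge6}_i - \underset{(9.7634)}{99.7546}\,\mathit{kidlt6}_i$$
$$T = 5634 \qquad R^2 = 0.1406 \qquad \hat{\sigma} = 244.21$$
(Heteroskedasticity-robust standard errors in parentheses)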
Are wives' wages affected by having children younger than 6? Justify your answer in detail,
as well as the validity of the test. Compare it with the test used in question b).
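As a hedged illustration of the kind of test asked for here, the sketch below computes the t-ratio of kidlt6 from the coefficient and robust standard error reported in the regression above. The variable names, the two-sided alternative and the 5% asymptotic N(0,1) critical value 1.96 are assumptions made for the example, not part of the exercise.

```python
# Coefficient and heteroskedasticity-robust standard error of kidlt6,
# copied from the regression with robust standard errors reported above.
coef_kidlt6 = -99.7546
robust_se_kidlt6 = 9.7634

# t-ratio for H0: coefficient of kidlt6 = 0.
# With robust standard errors it is asymptotically N(0,1) under H0.
t_ratio = coef_kidlt6 / robust_se_kidlt6

# Two-sided 5% critical value of the standard normal (assumed level).
CRITICAL_5PCT = 1.96
reject_h0 = abs(t_ratio) > CRITICAL_5PCT

print(f"t-ratio = {t_ratio:.3f}, reject H0 at 5%: {reject_h0}")
```

The same ratio computed with conventional OLS standard errors is only valid if the disturbances are homoskedastic, which is the point of the comparison requested in the question.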
f) Now, data on the same variables are available for husbands: $\mathit{husearns}_i$ is the salary of wife
$i$'s husband; $\mathit{huseduc}_i$ his years of education and $\mathit{husage}_i$ his age. The salary of the woman
($\mathit{earns}_i$) is believed to depend on the salary of her husband ($\mathit{husearns}_i$) and vice versa.
Consider the following OLS estimated model:
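$$\widehat{\mathit{earns}}_i = \underset{(24.351)}{-121.142} + \underset{(1.2701)}{33.4438}\,\mathit{educ}_i - \underset{(0.38484)}{1.78987}\,\mathit{age}_i - \underset{(7.9674)}{29.8467}\,\mathit{kidge6}_i - \underset{(9.4145)}{102.056}\,\mathit{kidlt6}_i + \underset{(0.0081764)}{0.0617422}\,\mathit{husearns}_i \qquad (1)$$
$$T = 5634 \qquad R^2 = 0.1492 \qquad \hat{\sigma} = 243.00$$
(Standard errors in parentheses)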
Hausman test - Null hypothesis: OLS is consistent
Asymptotic test statistic: Chi-square(1) = 4.36097 with p-value = 0.0367713
Explain the relationship of the regression in (2) with the Hausman test indicated after
regression (1). Implement the Hausman test, indicating all its elements. What are the
implications of the result of the test on the OLS estimation in (1)?
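The reported p-value can be checked with a short sketch, using the fact that for one degree of freedom $P(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$. The variable names and the 5% significance level are assumptions made for illustration.

```python
import math

# Hausman statistic reported after regression (1); chi-square(1) under H0.
hausman_stat = 4.36097

# For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)) exactly.
p_value = math.erfc(math.sqrt(hausman_stat / 2.0))

ALPHA = 0.05  # significance level, assumed for the illustration
reject_ols_consistency = p_value < ALPHA

print(f"p-value = {p_value:.4f}, reject H0 (OLS consistent): {reject_ols_consistency}")
```

At the assumed 5% level the null of OLS consistency is rejected; at a 1% level it would not be, so the conclusion depends on the level chosen.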
We have information on the industrial production function in Greece¹² for the period 1961-1987:
Figure 17: Time series plot of OLS residuals (= lOUTPUT observed − estimated)
a) Do you think that it is reasonable to assume that all the basic hypotheses of the GLRM are
satisfied? Base your answer on Figure 17 and on some test.
[Figure 18: residual sum of squares (RSS) plotted against rho, for rho between -1 and 1]
a) What does the term rho in Figure 18 refer to? What is (approximately) an estimate
of it? Explain and mark it in Figure 18.
b) What is the improvement that you expect with this FGLS estimation over the initial
OLS estimation? Which requirement should the disturbances in the initial model
satisfy in order to gain that improvement?
c) Test the hypothesis of increasing returns to scale in the production function (that is,
the sum of the coefficients of labour and capital factors is larger than one).
a) What are the properties of OLS in this model? Justify your answer.
b) Taking into account the results obtained in this new model, what are the properties
of OLS in the initial model?
In order to analyse the impact of advertising on TV channels in 2014, a model is proposed for
$Y_i$ = total advertising revenue of channel $i$, in millions of euros, as a function of $X_i$ = average total
market share (audience) of channel $i$, in percentages:
$$Y_i = \alpha + \beta X_i + u_i, \qquad i = 1, \dots, N, \qquad (1)$$
where $N$ is the number of free-to-air channels, $X_i$ is non-stochastic and $u_i \overset{iid}{\sim} (0, \sigma_u^2)$.
The total market share is unknown; instead, an estimate of it is obtained from a sample
of 4625 measurements taken by audiometers installed in the same number of randomly selected
houses. Let $A_i$ = approximate market share obtained from the audiometers, related to the
total market share by the equation:
$$A_i = X_i + v_i \quad \text{with } v_i \overset{iid}{\sim} (0, \sigma_v^2) \text{ and } u_i \text{ independent of } v_i \qquad (2)$$
a) Write down the model that can be estimated with the information available. Does it satisfy
all the basic hypotheses of the GLRM? Explain in detail.
b) What are the properties of OLS in the model to be estimated, proposed in the previous
question? Explain in detail.
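A sketch of the algebra behind both questions, using only equations (1) and (2) and the stated assumptions; the symbol $\varepsilon_i$ for the composite disturbance is introduced here for illustration:

```latex
\begin{aligned}
Y_i &= \alpha + \beta (A_i - v_i) + u_i = \alpha + \beta A_i + \varepsilon_i,
      \qquad \varepsilon_i = u_i - \beta v_i, \\
E(\varepsilon_i) &= 0, \qquad
Var(\varepsilon_i) = \sigma_u^2 + \beta^2 \sigma_v^2, \\
Cov(A_i, \varepsilon_i) &= E\left[ v_i (u_i - \beta v_i) \right] = -\beta \sigma_v^2 .
\end{aligned}
```

The last line uses that $X_i$ is non-stochastic, so $A_i - E(A_i) = v_i$, and that $u_i$ and $v_i$ are independent. Unless $\beta = 0$, the observed regressor $A_i$ is therefore correlated with the disturbance of the estimable model.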
A researcher wants to analyse the factors that affect the level of salaries in the USA. With that
purpose, a sample of 3010 men in 1976 is considered with information on the following variables:
• nearc4 = dummy variable, = 1 if the individual grew up near a four-year college, = 0 otherwise.
b) Explain how the OLS residuals have been obtained and make some comments on Figure 19.
Figure 19: OLS residuals (= observed − fitted l_wage) against experience (exper)
c) Using one of the following auxiliary regressions test for the fulfilment of the basic hypothesis
in the disturbances:
Not convinced by the results, the researcher estimates the model by weighted least squares,
obtaining
Breusch-Pagan test for heteroskedasticity -
Null hypothesis: heteroskedasticity not present
Test statistic: LM = 2945.2 with p-value = P(Chi-square(3) > 2945.2) = 0
d) Explain how these WLS estimates have been obtained, and what improvement is expected
with respect to OLS. Indicate in what context that improvement is actually achieved.
e) Do you think that the estimation by WLS achieves that improvement over OLS?
A second researcher thinks that education and wages are affected by the same factors such that
educ and the disturbances in model (1) are correlated. He/she decides then to estimate the
model by Instrumental Variables using nearc4 as instrument of educ, obtaining the results
f) What conditions does nearc4 need to satisfy to be a good instrument for educ? What are
the properties of the IV estimator if those conditions are satisfied?
g) Use some formal test to decide if the suspicion of the second researcher is correct.
h) With all this new information, would you change your answer in a) about race
discrimination in wages?
The relationship between unemployment and inflation specified by the Phillips curve implies a
trade-off between both variables, such that high unemployment is accompanied by low inflation.
To analyse this relationship the following model is proposed:
where:
• unemt : rate of unemployment in year t.
With a sample of annual observations from 1948 to 2003 the following estimated model has been
obtained by OLS:
[Figure 20: OLS residuals plotted over time, 1948-2003]
a) Do you find any evidence of failure of some basic hypotheses on the disturbances? Use all
the information provided, including Figure 20.
$$\sum_{t=2}^{56} \hat{u}_t^2 = 474.25, \qquad \sum_{t=2}^{56} \hat{u}_t \hat{u}_{t-1} = 270.98, \qquad \sum_{t=2}^{56} (\hat{u}_t - \hat{u}_{t-1})^2 = 7.76$$
Assuming that ut ∼ AR(1), explain in detail how you would estimate the model in an
asymptotically efficient way.
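A first step of such a feasible GLS procedure is an estimate of the autoregressive coefficient. The sketch below obtains one from the residual sums reported above; the variable names are illustrative, and using the sum of $\hat{u}_t^2$ in the denominator (rather than the sum of $\hat{u}_{t-1}^2$, which is not reported) is an approximation assumed here.

```python
# Residual sums reported above (t = 2, ..., 56).
sum_u_sq = 474.25     # sum of squared OLS residuals
sum_u_ulag = 270.98   # sum of u_t * u_{t-1}

# Least-squares estimate of rho in u_t = rho * u_{t-1} + e_t.
# Assumption: the reported sum of u_t^2 stands in for the sum of
# u_{t-1}^2; the two differ only by end-point terms.
rho_hat = sum_u_ulag / sum_u_sq

print(f"rho_hat = {rho_hat:.4f}")
```

This estimate would then be used to quasi-difference the variables of the model, with OLS applied to the transformed data.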
It is now believed that the temporal dependence in inflation should be
incorporated in the model, as in
c) Explain how the value BG(1) = 2.15 has been obtained and use it to test for the fulfilment
of the basic hypothesis in vt .
d) With all the results in a) and c), what do you think about the negative relationship between
unemployment and inflation specified by the Phillips curve? Support your answer with
some valid test.
A researcher wants to analyse the factors that affect the rate of employment in Puerto Rico.
Annual series for the period 1950-1987 are available of the following variables:
• lmincov : logarithm of the rate of the minimum salary over the average salary in Puerto
Rico.
Model 1: OLS, using observations 1950–1987 (T = 38)
Dependent variable: lprepop
[Figure 21: OLS residuals plotted over time, 1950-1987]
a) Considering Figure 21 and some formal test, what can you say about the basic hypothesis
in the disturbances?
b) Interpret the estimate of β2 . Taking into account your response in a), do you consider this
estimation reliable? Justify your answer with the properties of the employed estimator.
c) Test the individual significance of the variable lmincov. Do you think that the result of this
test is reliable? Justify your answer.
The researcher suspects that there is autocorrelation in the disturbances and decides to apply
Hildreth-Lu to estimate the model by FGLS. The results are as follows:
Dependent variable: lprepop
ρ̂ = 0.96
d) What process is the researcher assuming for the disturbances in model (1)? Write it down
and explain in detail the method of estimation that she/he is using, specifying clearly the
transformed model and explaining all the steps followed to obtain the FGLS results shown
above.
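The transformation behind this kind of FGLS estimator can be sketched as follows, taking the reported value 0.96 for rho as given. The function name `quasi_difference` and the toy series are hypothetical, and dropping the first observation is an assumption of this sketch (a Prais-Winsten variant would keep it).

```python
def quasi_difference(series, rho):
    """Return [x_t - rho * x_{t-1}] for t = 2, ..., T (first observation dropped)."""
    return [curr - rho * prev for prev, curr in zip(series, series[1:])]

RHO_HAT = 0.96  # estimate of rho reported above

# Hypothetical toy series, for illustration only (not the exercise data).
toy_y = [1.0, 1.5, 1.2, 1.8]
toy_const = [1.0] * len(toy_y)

y_star = quasi_difference(toy_y, RHO_HAT)
# The transformed constant equals 1 - rho in every period.
const_star = quasi_difference(toy_const, RHO_HAT)

print(y_star, const_star)
```

In a Hildreth-Lu search, a transformation of this form is applied for each value of rho on a grid, and the value that minimises the residual sum of squares of the transformed regression is retained.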
Figure 22 shows the OLS residuals of the transformed model over time.
e) With all the information obtained so far, what can you say about the fulfilment of the basic
hypothesis in the disturbances of the transformed model? Justify your answer.
Another researcher thinks that the model in equation (1) is not correctly specified because the
rate of employment is a dynamic variable such that it depends on past employment rates. Then,
the following model is specified and estimated:
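$$\mathit{lprepop}_t = \beta_1 + \beta_2\, \mathit{lmincov}_t + \beta_3\, \mathit{lprgnp}_t + \beta_4\, \mathit{lprepop}_{t-1} + u_t \qquad (2)$$
with OLS estimation:
$$\widehat{\mathit{lprepop}}_t = \underset{(0.318)}{-0.815} - \underset{(0.041)}{0.098}\, \mathit{lmincov}_t + \underset{(0.032)}{0.059}\, \mathit{lprgnp}_t + \underset{(0.085)}{0.764}\, \mathit{lprepop}_{t-1} \qquad (3)$$
$$\hat{u}_t = \underset{(0.311)}{-0.010} + \underset{(0.195)}{0.302}\, \hat{u}_{t-1} - \underset{(0.040)}{0.002}\, \mathit{lmincov}_t - \underset{(0.031)}{0.006}\, \mathit{lprgnp}_t - \underset{(0.093)}{0.062}\, \mathit{lprepop}_{t-1} \qquad (4)$$
$$T = 37 \qquad R^2 = 0.0695$$
(Standard errors in parentheses)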
f) In view of the previous results, what are the properties of OLS in the model in equation
(1)? And in equation (2)? Which model do you prefer? Justify your answer.
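One way to use auxiliary regression (4) is a Breusch-Godfrey test for first-order autocorrelation in model (2). The sketch below computes the $T R^2$ statistic from the reported values; the variable names and the 5% level are assumptions, and 3.841 is the usual tabulated $\chi^2(1)$ critical value.

```python
T_AUX = 37        # observations in auxiliary regression (4)
R2_AUX = 0.0695   # R-squared of auxiliary regression (4)

# LM statistic; asymptotically chi-square(1) under H0: no AR(1) autocorrelation.
bg_stat = T_AUX * R2_AUX

# 5% critical value of chi-square(1), from standard tables (assumed level).
CHI2_1_CRIT_5PCT = 3.841
reject_no_autocorr = bg_stat > CHI2_1_CRIT_5PCT

print(f"BG(1) = {bg_stat:.4f}, reject H0 at 5%: {reject_no_autocorr}")
```

Because model (2) includes the lagged dependent variable, the Durbin-Watson statistic is not appropriate there, which is why an LM-type test is used in this sketch.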
The factors that affect the final score in an exam of a particular university course are to be
analysed. The exam consists of 40 questions, each worth one point. The researcher proposes the
following regression model:
where:
Model 5: OLS, using observations 1–674 (n = 674)
Dependent variable: final
A second researcher thinks that the variable atndrte could be endogenous. Therefore, he/she
proposes to estimate the model with Instrumental Variables, using distance to campus, dist, as
instrument of atndrte. The results are:
a) What do you think about the suspicion of the second researcher? Justify it with some test.
b) In view of the previous results, what method of estimation would you choose? Why? What
are the properties of the selected estimator?
c) Test the individual significance of the variable hwrte using the estimator you consider
better.
d) Finally, a third researcher thinks that the variance of the disturbances could possibly change
with the variables atndrte and ACT. Explain in detail how you would test this possibility
in the model in equation (1). If the test statistic is 3.861, do you find evidence in favour
of the null hypothesis?
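As a hedged numerical illustration, suppose the statistic comes from a Breusch-Pagan type test whose auxiliary regression includes only atndrte and ACT, so that it is asymptotically $\chi^2(2)$ under the null of homoskedasticity. That choice of test, the variable names and the 5% level are assumptions of this sketch; a White test with squares and cross-products would have more degrees of freedom.

```python
import math

bp_stat = 3.861  # test statistic given in the question

# Assumed: two variables (atndrte, ACT) in the auxiliary regression,
# so the statistic is chi-square(2) under H0.
# For 2 degrees of freedom, P(chi2 > x) = exp(-x / 2) exactly.
p_value = math.exp(-bp_stat / 2.0)

CHI2_2_CRIT_5PCT = 5.991  # tabulated 5% critical value of chi-square(2)
reject_homoskedasticity = bp_stat > CHI2_2_CRIT_5PCT

print(f"p-value = {p_value:.4f}, reject H0 at 5%: {reject_homoskedasticity}")
```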
e) Taking into account your answer in the previous question, would you change your answer in
question b)? In that case, what method of estimation do you suggest? Justify your answer.
a) Obtain (with a formal and detailed proof) the properties of the OLS estimator. Can you
apply the Mann-Wald theorem?
b) Assume now that $u_t$ follows an AR(2) process such that $u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \epsilon_t$, where
$\epsilon_t \overset{iid}{\sim} (0, \sigma_\epsilon^2)$ and $\rho_1, \rho_2$ are known parameters. Is OLS an estimator with good properties?
If your answer is negative, what alternative estimator would you suggest?
In order to analyse the share of disposable income that is dedicated to food expenditure,
information on 235 Belgian families is available on the following variables:
$$\widehat{\mathit{foodexp}}_i = \underset{(15.957)}{147.475} + \underset{(0.0144)}{0.4852}\, \mathit{income}_i \qquad (2)$$
leading to the fitted values and OLS residuals shown in Figure 23:
Figure 23: Results with OLS estimation. (a) Actual and fitted values of foodexp against income; (b) OLS residuals against income
b) Using one of the following auxiliary regressions test for the fulfilment of the basic hypothesis
in the disturbances:
c) The researcher also estimates the model by OLS but using the White estimator of the
variance covariance matrix, with the results:
$$\widehat{\mathit{foodexp}}_i = \underset{(46.648)}{147.475} + \underset{(0.0520)}{0.4852}\, \mathit{income}_i \qquad (3)$$
Why do you think that White estimation is used? Describe in detail this estimator.
Not convinced by the results, the researcher estimates the model by Weighted Least Squares
(WLS), where all the variables are weighted by dividing them by incomei , obtaining
$$\widetilde{\mathit{foodexp}}_i = \underset{(11.207)}{66.1830} + \underset{(0.014980)}{0.574002}\, \mathit{income}_i \qquad (4)$$
Statistics based on the weighted data:
d) Explain how these WLS estimates have been obtained. What improvement is expected with
respect to OLS? Indicate in what context that improvement is actually achieved.
e) Do you think that the estimation by WLS achieves that improvement over OLS? Base your
answer on some formal test.
f) Taking into account all the previous results, test in the most appropriate way if the share
of disposable income dedicated to food is larger than a half.
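To see how much the choice of estimator matters for this test, the sketch below computes the t-statistic for $H_0: \beta = 0.5$ against $H_a: \beta > 0.5$ from each of the three reported results. Treating the income coefficient as the relevant share, the dictionary labels and the one-sided 5% normal critical value 1.645 are assumptions of this illustration; which row is the appropriate one depends on the answers to the earlier questions.

```python
H0_VALUE = 0.5

# (estimate, standard error) of the income coefficient in equations (2)-(4).
estimates = {
    "OLS, conventional s.e. (2)": (0.4852, 0.0144),
    "OLS, White s.e. (3)": (0.4852, 0.0520),
    "WLS (4)": (0.574002, 0.014980),
}

# t-statistic for H0: beta = 0.5 under each set of results.
t_stats = {name: (b - H0_VALUE) / se for name, (b, se) in estimates.items()}

Z_CRIT_5PCT_ONE_SIDED = 1.645  # asymptotic one-sided 5% critical value (assumed level)
for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}, reject H0 in favour of beta > 0.5: {t > Z_CRIT_5PCT_ONE_SIDED}")
```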
The search for a model to explain the aggregate consumption in the USA has been one of the
most active topics of research from the beginning of the last century. One of the first models
proposed is:
$$C_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + \beta_3 W_t + u_t \qquad (1)$$
where:
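With a sample of annual observations from 1920 to 1941 the following estimated model has been
obtained by OLS (year 1920 corresponding to t = 0):
$$\widehat{C}_t = \underset{(1.3027)}{16.2366} + \underset{(0.0912)}{0.1929}\, P_t + \underset{(0.0906)}{0.0899}\, P_{t-1} + \underset{(0.0399)}{0.7962}\, W_t, \qquad t = 1, 2, \dots, 21 \qquad (2)$$
$$T = 21 \qquad R^2 = 0.981 \qquad \hat{\sigma} = 1.0255$$
(standard errors in parentheses)
$$\sum_{t=2}^{21} \hat{u}_t \hat{u}_{t-1} = 3.2402 \qquad \sum_{t=2}^{21} (\hat{u}_t - \hat{u}_{t-1}) = -1.8496 \qquad \sum_{t=2}^{21} (\hat{u}_t - \hat{u}_{t-1})^2 = 24.4497$$
$$\sum_{t=1}^{21} \hat{u}_t^2 = 17.8794 \qquad \sum_{t=2}^{21} \hat{u}_t^2 = 17.7750$$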
Figure 24: OLS residuals (= observed − fitted C), plotted over time
a) Do you find any evidence of failure of some basic hypotheses on the disturbances? Use all
the information provided, including Figure 24 and a formal test.
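The formal test can be based on the Durbin-Watson statistic, which is computable from the residual sums reported with equation (2). A minimal sketch, with illustrative variable names; the bounds $d_L$ and $d_U$ needed for the decision are not computed here and have to be looked up in tables for T = 21 and three regressors.

```python
# Sums taken from the information reported with equation (2).
sum_sq_diff = 24.4497    # sum over t = 2..21 of (u_t - u_{t-1})^2
sum_sq_resid = 17.8794   # sum over t = 1..21 of u_t^2

# Durbin-Watson statistic.
dw = sum_sq_diff / sum_sq_resid

# Large-sample approximation DW ~ 2 * (1 - rho), solved for rho.
rho_approx = 1.0 - dw / 2.0

print(f"DW = {dw:.4f}, implied rho (approx.) = {rho_approx:.4f}")
```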
b) Later, it is thought that profits and consumption are jointly determined, inducing
contemporaneous correlation between $P_t$ and $u_t$ (however $cov(P_{t-1}, u_t) = 0$). What are the
consequences of this correlation on the previous OLS estimation? Explain in detail.
c) Using the variable It (investment at time t) as an instrument for Pt , the same model has
been estimated by Instrumental Variables, obtaining:
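$$\widehat{C}_t = \underset{(1.3105)}{16.2341} + \underset{(0.1017)}{0.1516}\, P_t + \underset{(0.0953)}{0.1161}\, P_{t-1} + \underset{(0.0408)}{0.8028}\, W_t \qquad (3)$$
$$T = 21 \qquad R^2 = 0.9808 \qquad \hat{\sigma} = 1.0317$$
(standard errors in parentheses)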
Describe how these estimates (coefficients and standard errors) have been obtained, their
properties and the characteristics that It has to satisfy to guarantee those properties.
e) A different researcher thinks that the model should include the dynamics of
consumption and proposes the following model, estimated by OLS:
$$\widehat{C}_t = \underset{(2.5214)}{10.1435} + \underset{(0.1186)}{0.4337}\, P_t - \underset{(0.1267)}{0.1700}\, P_{t-1} + \underset{(0.1019)}{0.5377}\, W_t + \underset{(0.1213)}{0.3267}\, C_{t-1} \qquad (4)$$
$$T = 21 \qquad R^2 = 0.987 \qquad \hat{\sigma} = 0.87682 \qquad BG(1) = 0.011$$
(standard errors in parentheses)
Explain how the value BG(1) = 0.011 has been obtained and use it to test for the
fulfilment of the basic hypotheses in the disturbances.
f) Based on your answer in e), what are the properties of the OLS estimator in this model?
Explain in detail.
g) Taking into account all the results in this exercise, what is in your opinion the best
estimated model to analyse the factors that affect $C_t$? Justify clearly and thoroughly.
a) Write down the matrix of variances and covariances of the disturbances. Do they satisfy
the basic assumptions of the GLRM? Justify your answer.
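$$\sum Y_i = 1201.88 \qquad \sum Y_i^2 = 26456.32$$
$$\sum X_i = 276.2 \qquad \sum X_i^2 = 1603.98 \qquad \sum X_i^3 = 10649.57 \qquad \sum X_i^4 = 76704.71$$
$$\sum \frac{1}{X_i} = 18.66 \qquad \sum \frac{1}{X_i^2} = 8.69 \qquad \sum \frac{1}{X_i^3} = 5.54 \qquad \sum \frac{1}{X_i^4} = 4.28$$
$$\sum X_i Y_i = 6372.93 \qquad \sum X_i^2 Y_i = 40513.12 \qquad \sum X_i^3 Y_i = 285095.50 \qquad \sum X_i^4 Y_i = 2137825.06$$
$$\sum \frac{Y_i}{X_i} = 311.54 \qquad \sum \frac{Y_i}{X_i^2} = 124.09 \qquad \sum \frac{Y_i}{X_i^3} = 72.36 \qquad \sum \frac{Y_i}{X_i^4} = 53.63$$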
b) How would you estimate the coefficients of the model? What are the properties of the
proposed estimator? Estimate finally the coefficients.
c) A different researcher claims that he/she has estimated the model efficiently. This
researcher has estimated a transformed model by OLS. Based on this transformed model,
the researcher has estimated two new regressions: one with the 20 observations corresponding
to the lowest values of the explanatory variable and another with the 20 observations with the
highest values. The Residual Sums of Squares (RSS) of these two regressions are 2.73 and
3.91 respectively. Do you think that the claim that the model has been estimated
efficiently is true?
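The two subsample regressions point to a Goldfeld-Quandt type comparison in the transformed model, which can be sketched as follows. The number of coefficients per regression (K = 2) is an assumption, since the model is not reproduced here, and the critical value is the approximate tabulated 5% point of an F(18, 18); the variable names are illustrative.

```python
rss_low = 2.73    # RSS, 20 observations with the lowest values of X
rss_high = 3.91   # RSS, 20 observations with the highest values of X

N_SUB = 20  # observations in each subsample
K = 2       # assumed number of estimated coefficients in each regression

# Goldfeld-Quandt statistic: ratio of the two estimated disturbance variances.
df = N_SUB - K
gq_stat = (rss_high / df) / (rss_low / df)

F_CRIT_5PCT = 2.22  # approximate F(18, 18) 5% critical value from tables
reject_homoskedasticity = gq_stat > F_CRIT_5PCT

print(f"GQ = {gq_stat:.4f} with ({df}, {df}) df, reject H0 at 5%: {reject_homoskedasticity}")
```

If homoskedasticity of the transformed disturbances is not rejected, the claim of efficient estimation is not contradicted by this evidence; that is weaker than proving it.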
d) If the variances of the disturbances were not known and you did not know how to estimate
them, how could you test the individual significance of Xi ? Explain in detail all the steps,
describing every element in the testing procedure.
EXERCISE 65 (GE.23) (July-2017)
A group of researchers wants to detect the factors that affect the nominal interest rate in Spain.
To that end, information on the following variables is available:
where the variables $\mathit{inf}_t$ and $\mathit{def}_t$ are considered to be non-stochastic and $t$ runs from the first
quarter of 1980 to the third quarter of 2000 (83 observations).
T = 83 R2 = 0.3865 DW = 0.6287
[Figure: OLS residuals plotted over time, 1980-2000]
a) Using all the information provided, do you think that the disturbances satisfy all basic
hypotheses? Justify your answer.
b) According to the results obtained in the previous question, what are the properties of the
OLS estimator? Justify your answer.
T = 82 R2 = 0.9321 DW = 1.2786
T = 82 R2 = 0.9319 DW = 1.3017
c) Explain how these TSLS estimates have been obtained. In particular, identify the
appropriate instruments and explain in detail the steps needed to get the previous results.
d) Which one of the two previous estimators is more adequate for model (4)? Justify your
answer.
e) Given your answer to question d), do you think that the basic assumptions on the regressors
are satisfied? And on the disturbances? Justify your answer.
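$$\widehat{\mathit{int}}_t = \underset{(0.6399)}{0.8596} + \underset{(0.1613)}{0.2159}\, \mathit{inf}_t + \underset{(0.1598)}{0.0839}\, \mathit{inf}_{t-1} - \underset{(0.4577)}{0.0498}\, \mathit{def}_t + \underset{(0.4706)}{0.1618}\, \mathit{def}_{t-1} + \underset{(0.0458)}{0.9020}\, \mathit{int}_{t-1} \qquad (8)$$
(standard deviations in parentheses)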
f) Which model and which method of estimation do you think is most adequate? Justify
your answer.
EXERCISE 66 (GE.24) (July-2017)
where $X_t$ is a non-stochastic variable that is not directly observable but we observe instead
$X_t^* = X_t + \varepsilon_t$, with $\varepsilon_t \overset{iid}{\sim} (0, 1)$. It is also known that $E(u_t \varepsilon_t) = 0.5\beta$ ($\beta \neq 0$), $E(u_t \varepsilon_s) = 0 \ \forall t \neq s$
and $corr(X_t^*, X_{t-1}^*) = 0.85$.
a) Write down the model to be estimated. What are the mean and variance of the
disturbances?
b) Is there any basic hypothesis that is not satisfied? Justify your answer.
c) Having in mind your answer in the previous question, what method of estimation would you
use to estimate the model in question a)? Justify your choice and fill in the blanks in the
following matrices corresponding to the chosen method of estimation.
$$\hat{\beta}_{\ldots\ldots} = \begin{pmatrix} \cdots & \cdots \\ \cdots & \cdots \end{pmatrix}^{-1} \begin{pmatrix} \cdots \\ \cdots \end{pmatrix}$$
A credit institution wants to analyse the factors that affect the expenditure of individuals using
credit cards. A sample of 100 observations is available on the following variables:
with its OLS estimation:
[Figure 26: OLS residuals plotted against AGE (left panel) and against INCOME (right panel)]
a) It is now believed that Income and Avgexp are affected by the same factors such that
Incomei and ui in model (1) are correlated. If this suspicion is true, what are the effects
on the previous OLS estimation?
b) A dummy variable Ownrent has now been constructed such that Ownrenti = 0 if individual
i rents a house and Ownrenti = 1 if he/she owns it. With that information the following
IV estimator has been obtained:
$$\widehat{\mathit{Avgexp}}_i = \underset{(157.60)}{-80.0032} - \underset{(5.1108)}{5.29625}\, \mathit{Age}_i + \underset{(62.934)}{130.273}\, \mathit{Income}_i$$
Perform a formal test to analyse if the suspicion in question a) can be considered as being
true.
c) What characteristics should Ownrenti have to be a good instrument for Incomei ? If those
characteristics are satisfied, what are the properties of the IV estimator?
d) OLS residuals have been plotted in Figure 26. What information can be extracted from
both plots?
e) Given the information in Figure 26, explain in detail a formal test to check if the
disturbances in model (1) satisfy the basic hypotheses of the GLRM.
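f) The following estimated model has also been obtained with OLS:
$$\widehat{\left(\frac{\mathit{Avgexp}_i}{\mathit{Income}_i^2}\right)} = \underset{(53.996)}{-23.8625}\, \frac{1}{\mathit{Income}_i^2} - \underset{(1.6493)}{2.6307}\, \frac{\mathit{Age}_i}{\mathit{Income}_i^2} + \underset{(19.744)}{89.2990}\, \frac{1}{\mathit{Income}_i}$$
$$T = 100 \qquad R^2 = 0.423 \qquad \hat{\sigma} = 20.431$$
(standard errors in parentheses)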
What is the final purpose of this transformed model? Why has it been estimated by OLS?
where v̂i are the OLS residuals in the transformed model in question f). Use this auxiliary
regression to test the adequacy of the transformed model.
h) Taking into account all the results obtained so far, test the individual significance of Income
to explain the variations of Avgexp. Justify your selection of the method of estimation
employed in the testing procedure.
where
The model has been estimated by OLS using quarterly observations from 1950Q1 to 2000Q4,
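$$\widehat{\log M1}_t = \underset{(0.2286)}{-1.6331} + \underset{(0.0474)}{0.2871}\, \log GDP_t + \underset{(0.0338)}{0.9718}\, \log CPI_t \qquad (2)$$
$$T = 204 \qquad R^2 = 0.9895 \qquad \hat{\sigma} = 0.0829$$
(standard errors in parentheses)
$$\sum_{t=2}^{204} \hat{u}_t \hat{u}_{t-1} = 1.357 \qquad \sum_{t=2}^{204} (\hat{u}_t - \hat{u}_{t-1}) = -0.156 \qquad \sum_{t=2}^{204} (\hat{u}_t - \hat{u}_{t-1})^2 = 0.034$$
$$\sum_{t=1}^{204} \hat{u}_t^2 = 1.381 \qquad \sum_{t=2}^{204} \hat{u}_t^2 = 1.374$$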
[Figure 27: OLS residuals plotted over time, 1950-2000]
a) What can you say about the fulfilment of the basic hypotheses of the disturbances from
Figure 27?
b) Now carry out a formal test to check whether the disturbances satisfy the basic hypotheses.
$$\widehat{\log M1}_t = \underset{(0.3116)}{-1.6331} + \underset{(0.0723)}{0.2871}\, \log GDP_t + \underset{(0.0608)}{0.9718}\, \log CPI_t \qquad (3)$$
Explain the differences (if any) between this and the estimated model in equation (2).
d) A new model that includes four lags of the dependent variable is also estimated by OLS,
with the results:
$$\widehat{\log M1}_t = \underset{(0.0289)}{-0.0426} + \underset{(0.0058)}{0.0081}\, \log GDP_t + \underset{(0.0080)}{0.0226}\, \log CPI_t + \underset{(0.0716)}{1.3462}\, \log M1_{t-1} - \underset{(0.1203)}{0.1510}\, \log M1_{t-2} - \underset{(0.1202)}{0.0968}\, \log M1_{t-3} - \underset{(0.0705)}{0.1225}\, \log M1_{t-4} \qquad (4)$$
$$T = 200 \qquad R^2 = 0.9999 \qquad BG(4) = 6.2657 \qquad \hat{\sigma} = 0.0088$$
(standard errors in parentheses)
What is the improvement expected by the inclusion of four lags of the dependent variable?
e) Carry out a formal test to decide if such expected improvement is actually achieved. Explain
in detail all the elements of the testing procedure.
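A natural candidate is the reported BG(4) statistic, asymptotically $\chi^2(4)$ under the null of no autocorrelation up to order four. The sketch below computes its p-value with the closed-form survival function that holds for even degrees of freedom; the helper name `chi2_sf_even_df` is hypothetical and the 5% level is an assumption.

```python
import math

def chi2_sf_even_df(x, df):
    """P(chi2_df > x) for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    terms = (half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * sum(terms)

bg4_stat = 6.2657  # BG(4) reported with equation (4)
p_value = chi2_sf_even_df(bg4_stat, 4)

ALPHA = 0.05  # significance level, assumed for the illustration
reject_no_autocorr = p_value < ALPHA

print(f"p-value = {p_value:.4f}, reject H0 at 5%: {reject_no_autocorr}")
```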
f) Which one of the three estimated models (2), (3) or (4) should be used to analyse the money
demand?
Daily data (5 work days per week) are available on the exchange rate of the following currencies
against the American Dollar (Dollar):
The following model is considered to explain the Dollar/Euro exchange rate fluctuations:
$$\mathit{euro}_t = \beta_1 + \beta_2\, \mathit{bp}_t + \beta_3\, \mathit{cd}_t + \beta_4\, \mathit{dy}_t + \beta_5\, \mathit{sf}_t + u_t \qquad (1)$$
where all the basic hypotheses in the GLRM are, in principle, assumed to be satisfied (unless
some evidence against them is found out). The OLS estimation is:
Figure 28: Results from OLS estimation, 1980-1987. (a) Time series of OLS residuals (= euro observed − estimated); (b) Observed and estimated dependent variable euro
a) Explain how the OLS residuals in Figure 28(a) have been obtained. Using Figure 28(a),
draw, roughly but clearly, the line corresponding to the estimated dependent variable $\widehat{\mathit{euro}}_t$
that is absent in Figure 28(b). Explain the implications of these two figures for the OLS
estimation in (1).
b) Given the graphs in Figure 28, test the hypothesis that you consider relevant on the
behaviour of the disturbances.
c) Taking into account the previous results, propose and explain in detail a method of
estimation of the model in equation (1), mentioning its expected properties and the context in
which those properties are actually achieved.
d) The strategy described in the previous question leads to the following OLS regression:
Explain what estimator is obtained with this regression and how the variables euro∗t , bp∗t ,
..., sf∗t have been constructed. Do you think that this estimator is better than the OLS
estimation in equation (2)? Support your answer with some formal test.
e) It is thought that the exchange rates Dollar/Euro and Dollar/British Pound are jointly
determined, such that
bpt = γ1 + γ2 eurot + vt (4)
Describe in detail how to estimate the parameters of the model in equation (1) in this
case, as well as the properties that the estimator should have.
TSLS, using observations 1980-01-03–1987-02-26 (T = 1866)
Dependent Variable: euro
Instrumented: bp
Instruments: const cd dy sf cd_1 dy_1 sf_1
g) Explain in detail how to test the null hypothesis that the expected effect of the Dollar/Pound
and Dollar/Canadian Dollar exchange rates on the Dollar/Euro are equal.
where now all the regressors are assumed to be non-stochastic and there is no autocorrelation.
The graphs in Figure 29 are obtained from the OLS estimation in equation (2):
a) Explain the graphs in Figure 29. What effects can be deduced on the properties of the OLS
estimator?
b) Consider now Figure 29(c). In view of this figure, explain how you would test a relevant
hypothesis using Goldfeld and Quandt.
Figure 29: OLS residuals (= euro observed − estimated) plotted (a) against $\mathit{bp}_t$, (b) against $\mathit{cd}_t$, (c) against $\mathit{dy}_t$ and (d) against $\mathit{sf}_t$
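$$\widehat{\mathit{usq}}_t = \underset{(0.0009)}{0.0014} - \underset{(0.0001)}{0.0012}\, \mathit{bp}_t - \underset{(0.0012)}{0.0010}\, \mathit{cd}_t - \underset{(0.0785)}{1.1545}\, \mathit{dy}_t + \underset{(0.0009)}{0.0144}\, \mathit{sf}_t \qquad (1)$$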
where usqt are squared OLS residuals obtained from equation (2). Explain for what test
the previous regression is necessary. Is the result of that test compatible with the graphs
in Figure 29?
where $\widehat{\mathit{usq}}_t$ is the estimated dependent variable in equation (1); and
d) Taking into account the previous results and the information provided, explain the
properties of the OLS estimators in equations (2) and (3).
e) Test if the effect of the Dollar/Pound on Dollar/Euro is 1. Justify the choice of the test
statistic and the estimator used in the testing procedure.
f) If the disturbances of the initial model (1) showed both heteroscedasticity and
autocorrelation, how would you test the previous hypothesis in question e)?
A political institution is concerned about the effects of smoking on the weight of new babies at
birth. They want to analyse if increasing the price of cigarettes (perhaps via taxes) may have
some effect on the birth weight. A sample of 1388 individuals of different states in the US is
available on the following variables:
with its OLS estimation:
The following OLS regression is also obtained using the OLS residuals ûi :
a) Using the information provided, do you perceive evidence that any basic hypothesis
is violated?
b) Considering your previous answer, test in the best way if increasing the price of cigarettes
has a positive effect on the birth weight.
c) It is now suspected that the income of the families is not exogenous, but is determined by
socio-economic variables that may also affect the health environment and the birth weight,
such that $\mathit{faminc}_i$ and $u_i$ are correlated. If that is the case, what are the implications for
the previous OLS estimation?
Describe the method of estimation used here and justify its properties (assume $E(u_i u_j) = 0$
for $i \neq j$).
f) Taking into account all the previous results, would you change the test implemented in
question b)? Justify your answer.
A company dedicated to manufacturing cars wants to analyse the factors that affect its demand.
For that, the following model is first considered:
where
The model has been first estimated by OLS using quarterly observations from 1976Q1 to 1990Q4,
a) What can you say about the fulfilment of the basic hypotheses of the disturbances from
Figure 30?
b) How has the value DW = 1.461 been obtained? Use it to make some formal test.
Figure 30: OLS residuals, plotted over time (1976-1990)
c) Taking into account your answer to question b), what are the properties of the estimator
used in equation (2)? Do you know any other estimator with better properties? Describe
in detail.
Explain how the value BG(1) has been obtained and use it to implement the corresponding
test.
e) Use one of the estimated models, (2) or (3), to test if price has any effect on the sales of
new cars. Justify your choice of the selected model.
In order to analyse the relationship between gasoline consumption (kml, in kilometres per
litre) and the power of the engine of the vehicle (pot, in cubic cm), the following regression model
is proposed:
A first researcher estimates the model by OLS, with the results:
[Figure: OLS residuals plotted against pot]
a) Based on all the information provided, what can you say about the fulfilment of the basic
assumptions?
b) What are the implications of your answer to the previous question on the properties of the
OLS estimator?
c) Bearing in mind your answer to question a), obtain step by step the matrix of variances and
covariances of the OLS estimator β̂OLS . Is β̂OLS consistent? Prove it.
The OLS estimation gives:
\[
\widehat{kml}^{*}_i = \underset{(0{,}224876)}{15{,}4117}\; const^{*}_i - \underset{(0{,}000522)}{0{,}011024}\; pot^{*}_i \qquad (5)
\]
d) Why do you think that the researcher decides to estimate the model in this way? Do you
think that he/she has achieved his/her goal? Justify your answer.
A third researcher estimates Model (1) by OLS but uses White's estimator of the matrix of
variances and covariances of β̂OLS . The results are:
\[
\widehat{kml}_i = \underset{(0{,}241112)}{14{,}9313} - \underset{(0{,}000371)}{0{,}010051}\; pot_i \qquad (7)
\]
EXERCISE 74 (GE.32) (July-2019)
The Department of Traffic in California wants to analyse the factors that influence the number
of traffic accidents. Monthly information, from January 1981 to December 1989, is available for
the following variables:
• Spdlaw: dummy variable, = 1 from May 1987, the month in which the 105 km/h speed limit
came into force; = 0 before May 1987.
The following figure shows the evolution of OLS residuals with time:
Figure 32: OLS residuals (resid) plotted over time, 1981 to 1990
Finally, the following regression is also estimated by OLS:
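\[
\hat{u}_t = \underset{(3231{,}84)}{1679{,}29} - \underset{(221{,}63)}{149{,}08}\, Wkends_t + \underset{(175{,}99)}{33{,}65}\, Unem_t + \underset{(686{,}61)}{68{,}92}\, Spdlaw_t + \underset{(0{,}0846)}{0{,}5274}\, \hat{u}_{t-1} + \hat{w}_t \qquad (3)
\]
T = 108    R² = 0,2738
(standard errors in parentheses)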
a) What can you say about the fulfilment of the basic assumptions on the disturbances? Base
your answer on Figure 32 and a formal test.
A second researcher estimates Model (1) using Cochrane-Orcutt. The results in the implicit
transformed model are:
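\[
\widehat{Totacc}^{*}_t = \underset{(3237{,}35)}{51541{,}8}\, const^{*}_t + \underset{(163{,}37)}{355{,}70}\, Wkends^{*}_t - \underset{(297{,}04)}{1889{,}80}\, Unem^{*}_t + \underset{(1224{,}14)}{857{,}35}\, Spdlaw^{*}_t \qquad (4)
\]
\[
\hat{u}^{*}_t = \underset{(3239{,}02)}{-262{,}908}\, const^{*}_t + \underset{(165{,}33)}{32{,}13}\, Wkends^{*}_t - \underset{(296{,}89)}{18{,}41}\, Unem^{*}_t - \underset{(1223{,}28)}{69{,}39}\, Spdlaw^{*}_t - \underset{(0{,}1001)}{0{,}1179}\, \hat{u}^{*}_{t-1} + \hat{v}_t \qquad (5)
\]
T = 107    R² = 0,0134
(standard errors in parentheses)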
c) Given the results obtained so far, what are the properties of the estimator used in (4)?
Justify your answer.
A third analyst believes that the specification in model (1) is wrong and proposes to
include a lag of the dependent variable as regressor. The model is then estimated by OLS,
obtaining the following results:
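\[
\widehat{Totacc}_t = \underset{\substack{(5621{,}54)\\ [7805{,}59]}}{28915{,}1} + \underset{\substack{(232{,}21)\\ [224{,}61]}}{301{,}20}\, Wkends_t - \underset{\substack{(235{,}16)\\ [344{,}12]}}{1168{,}27}\, Unem_t + \underset{\substack{(730{,}27)\\ [847{,}33]}}{309{,}53}\, Spdlaw_t + \underset{\substack{(0{,}0829)\\ [0{,}1250]}}{0{,}4280}\, Totacc_{t-1} \qquad (6)
\]
T = 107    R² = 0,7372    DW = 2,00236    BG(1) = 0,1441
(standard errors in parentheses)
[Robust standard errors (Newey-West) in square brackets]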
d) What specification of the model do you prefer? Justify your answer.
The third analyst now realizes that Unem_t is a stochastic variable that can be contemporaneously
correlated with the disturbances. A new method of estimation is then proposed:
Instrumental Variables using Unem_{t−1} as instrument.
\[
\widehat{Totacc}_t = \underset{(6288{,}62)}{22730{,}6} + \underset{(234{,}99)}{296{,}93}\, Wkends_t - \underset{(286{,}47)}{800{,}31}\, Unem_t + \underset{(771{,}98)}{824{,}63}\, Spdlaw_t + \underset{(0{,}0908)}{0{,}5084}\, Totacc_{t-1} \qquad (7)
\]
T = 107    R² = 0,7312    DW = 2,1959
(standard errors in parentheses)
e) Taking into account the information provided so far, what method of estimation do you
think is better: the method used in (6) or in (7)? Justify your answer.
f ) Use one of the estimated models [(2), (4), (6) or (7)] to test if the variable Spdlaw has a
negative effect on Totacc.
A factory dedicated to making ice creams wants to analyse the factors that affect the demand
for its production. With that purpose the owners have been collecting information every four
weeks from March 1951 to July 1953, giving a total of 30 observations of the following variables:
Three different advisors are consulted, who provide the following reports:
Advisor 1:
\[
\widehat{consum}_t = \underset{(0.27022)}{0.19731} - \underset{(0.83436)}{1.04441}\, price_t + \underset{(0.00117)}{0.00331}\, income_t + \underset{(0.00045)}{0.00346}\, temp_t
\]
T = 30    R² = 0.7190    σ̂ = 0.036833
(standard errors in parentheses)
with the OLS residuals ût , displayed in Figure 33 for t = 1, ..., 30.
Figure 33: OLS residuals (residual plotted against t = 1, ..., 30)
a) What information does Figure 33 give about the fulfilment of the basic hypotheses of the
disturbances?
c) Advisor 1 tests for the significance of the variable pricet and, in view of the results, concludes
that the owner of the factory should raise the prices to increase the profits. Do you agree?
Advisor 2:
Advisor 2 suspects that the disturbances follow an AR(1) process (ut = ρut−1 + εt , εt ∼ iid(0, σ²)).
d) Get a consistent estimate of ρ.
e) How do you think that the previous estimate of ρ has been used to obtain the FGLS esti-
mates of the parameters of the model? Describe in detail.
f ) Describe how the value BG(1) = 0.326 has been obtained. Use it to decide which estimated
model: OLS by Advisor 1 or FGLS by Advisor 2 is preferable. Base your answer on the
properties of the estimators.
Advisor 3:
+ 0.09879 consumt−1
(0.24746)
g) What are the properties of OLS in this case? Make some tests to justify your answer.
h) Use some of the results offered by Advisors 2 or 3 to perform a test that helps you to support
or advise against the recommendation of Advisor 1 in question c).
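Questions d) and e) involve estimating ρ from the OLS residuals and then using it to transform the data. The following Python lines are a minimal sketch of one common route, assuming ρ is estimated by regressing ût on ût−1 without intercept and that the first observation is dropped when quasi-differencing (as in Cochrane-Orcutt; Prais-Winsten would keep a rescaled first observation). The residuals and the series are hypothetical, and both function names are illustrative.

```python
def estimate_rho(residuals):
    """OLS slope (no intercept) of u_t on u_{t-1}."""
    num = sum(residuals[t] * residuals[t - 1]
              for t in range(1, len(residuals)))
    den = sum(residuals[t - 1] ** 2 for t in range(1, len(residuals)))
    return num / den


def quasi_difference(series, rho):
    """Return series_t - rho * series_{t-1} for t = 2, ..., T."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


# Hypothetical OLS residuals, for illustration only.
u_hat = [1.0, 0.5, 0.25, 0.125]
rho_hat = estimate_rho(u_hat)

# Each variable of the model (including the constant) is transformed with the
# same rho_hat; OLS on the transformed data then gives the FGLS estimates.
y_star = quasi_difference([2.0, 3.0, 5.0], rho_hat)
const_star = quasi_difference([1.0, 1.0, 1.0], rho_hat)
print(rho_hat, y_star, const_star)
```

If the AR(1) assumption is adequate, the residuals of the transformed regression should no longer show first-order autocorrelation, which is what the BG(1) statistic in question f) is meant to check.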
EXERCISE 76 (GE.34) (May-2020)
In order to analyse the factors that affect the salary of very young men in the USA, a sample of 758
observations of men aged 14-24 in 1966 is available on the following variables:
Using the OLS residuals ûi , the following auxiliary regressions have been also obtained:
1) \( \hat{u}_i^2 = 0.05 + 0.03\,rns_i + 0.02\,mrt_i - 0.00\,iq_i - 0.00\,age_i + 0.02\,smsa_i + \hat{v}_i \), R² = 0.011, RSS = 23.28,
3) \( \hat{u}_i = 0.03 + 0.12\,\hat{u}_{i-1} + 0.02\,rns_i + 0.12\,mrt_i + 0.02\,iq_i + \hat{v}_i \), R² = 0.002, RSS = 33.54,
4) \( \hat{u}_i = 0.03 + 0.06\,rns_i + 0.00\,mrt_i + 0.01\,iq_i - 0.01\,age_i + 0.00\,smsa_i + \hat{v}_i \), R² = 0.01, RSS = 28.12,
a) Use one of the previous auxiliary regressions to test for evidence that some basic
hypothesis on the disturbances is violated.
b) The variable iqi is used as a proxy of the true ability of individual i and thus it may be
subject to significant measurement error. What are the implications of this error on the
previous OLS estimation?
Describe the method of estimation used here and justify its properties.
In 1992, a study was carried out on the safety of several models of cars. At that time, only a
few cars had airbags. The study was done by seating crash-test dummies inside each car and
crashing the car against a wall. The dependent variable is Pinjuryi : percentage (expressed as a
decimal) of damage suffered by the dummy. There are 231 observations (each observation
corresponds to a dummy) and the regressors considered are:
• weighti : weight of car i (in thousands of pounds),
The OLS residuals have been used to obtain the graphics in Figure 34.
Figure 34: OLS residuals plotted against the regressors (four panels)
a) Interpret the estimated coefficient of the variable airbag.
Using the squared OLS residuals obtained from the regression in equation (1), usq1i , the
following result is obtained:
c) Explain step by step the test in which the previous regression is necessary. Perform that
test. Is the result of the test compatible with the graphs in Figure 34?
d) Taking into account the previous results, justify, based on their properties, the validity of
the estimated coefficients and variances in (1).
A weighting variable has been constructed: ponde = the inverse of the square root of the
endogenous variable estimated in section (c). Using it, the following estimated model has
been obtained:
Sum squared resid 87361.92 S.E. of regression 19.66106
R2 0.025505 Adjusted R2 0.008257
F (4, 226) 1.478725 P-value(F ) 0.209495
e) Explain in detail the method of estimation used in Model 3. What are its expected proper-
ties?
f ) We want to check whether having an airbag reduces injuries in the event of an accident.
Would you prefer to do this in the model in equation (1) or in the model estimated in the
previous question? Justify your answer and implement the test.
We want to analyse the relationship between income, R, and consumption, C (both in billions
of dollars) of a country using quarterly data from 1970:1 to 1997:4 (T =112). The disturbances
are assumed to be normal and R is considered a non-stochastic regressor.
and concludes that this country dedicates to consumption half of each additional dollar of income.
a) Test the hypothesis on which the previous conclusion is based. Do you agree with Analyst 1?
b) Given the value of DW , do you think that the previous conclusion is correct?
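\[
\hat{C}_t = \underset{\substack{(0.198)\\ \{0.150\}}}{0.830} + \underset{\substack{(0.044)\\ \{0.037\}}}{0.069}\, R_t + \underset{\substack{(0.055)\\ \{0.045\}}}{0.030}\, R_{t-1} + \underset{\substack{(0.054)\\ \{0.055\}}}{0.660}\, R_{t-2} - \underset{\substack{(0.060)\\ \{0.057\}}}{0.002}\, C_{t-1} + \underset{\substack{(0.052)\\ \{0.056\}}}{0.073}\, C_{t-2} \qquad (2)
\]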
c) What are the properties of the estimator used by Analyst 2? Justify your answer in detail.
d) Can you think of any reason why Analyst 2 would have obtained the N-W standard de-
viations? That is, does he get any improvement in this case by using them for inference
instead of the usual estimator of the standard deviations?
Analyst 3 eliminates variables that are apparently irrelevant and estimates by OLS the
following dynamic model:
e) Do you think that any basic hypothesis is missing from the model specified by Analyst 3?
Justify your answer in detail.
f ) Test in the model proposed by Analyst 3 the hypothesis that an additional dollar in income
today causes (on average) an increase in consumption of more than 0.5 two quarters ahead.
Data on the U.S. gasoline market for the years 1960-1995 are available on the following variables:
In order to analyse the factors that affect gasoline consumption the following model is estimated
by OLS:
\[
\widehat{lnGC}_t = \underset{(0.724)}{-13.679} - \underset{(0.027)}{0.082}\, lnGP_t + \underset{(0.081)}{1.523}\, lnDI_t - \underset{(0.031)}{0.195}\, lnPPT_t \qquad (1)
\]
Figure 35: OLS residuals (time series plot of the residuals, 1960 to 1995)
a) What can you say about the fulfilment of the basic hypotheses of the disturbances? Use
the residuals in Figure 35 and a formal test to justify your answer.
Alternatively, two estimates of a different model are considered. First, the following model is
estimated by OLS:
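\[
\widehat{lnGC}_t = \underset{(1.107)}{-4.645} + \underset{(0.123)}{0.523}\, lnDI_t - \underset{(0.025)}{0.092}\, lnPPT_t - \underset{(0.032)}{0.296}\, lnGP_t + \underset{(0.037)}{0.260}\, lnGP_{t-1} + \underset{(0.080)}{0.775}\, lnGC_{t-1} \qquad (2)
\]
T = 35    R² = 0.990    DW = 1.734    BG(1) = 1.759
(standard errors in parentheses)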
c) Does the model in (2) imply any improvement over the estimated model in (1)?
d) If the price of public transportation were affected by the factors determining the consump-
tion of gasoline such that the disturbances and lnPPT were contemporaneously correlated,
what would be the properties of the previous OLS estimation?
           Coefficient   Std. Error   t-ratio   p-value
const       −4.9859      1.2711       −3.923    0.0005
lnDI         0.5604      0.1414        3.963    0.0004
lnPPT       −0.1018      0.03075      −3.311    0.0025
lnGP        −0.2954      0.03206      −9.214    0.0000
lnGP_1       0.2624      0.0377        6.951    0.0000
lnGC_1       0.7560      0.0868        8.709    0.0000
(3)
e) What method of estimation is used in equation (3)? Describe it in detail together with its
properties and the conditions the instruments should satisfy.
f ) Test if the disposable income elasticity is lower than one using a valid inference technique
based on the most efficient estimation. You may need to perform some prior test to select
the estimated model (1), (2) or (3).
A health insurance company wants to analyse why the families decide to visit a doctor. A
survey is then made with 485 household heads who may or may not have visited a doctor during
a certain period of time. The variables in the survey are:
• statusi : measure of health status (larger positive numbers are associated with poorer
health).
a) Considering that BP = 413.82 is the Breusch-Pagan statistic to test if some or all regressors
affect the variance of the disturbances, describe in detail how this value has been obtained
and test the corresponding hypothesis.
b) If var(ui ) = σ²/kidsi , describe how to estimate the model efficiently.
Assuming that var(ui ) = σ²/kidsi , the following transformed model has been estimated by OLS:
and the following auxiliary regression with the residuals from (2), û∗i :
\[
\widehat{u^{*2}_i} = \underset{(8.738)}{19.749} - \underset{(3.336)}{1.993}\, kids_i + \underset{(3.069)}{8.420}\, status_i \qquad (3)
\]
T = 485    R² = 0.016    RSS = 4516647    σ̂ = 96.802
(standard errors in parentheses)
c) Do you think that the assumption that var(ui ) = σ²/kidsi is correct? Why?
d) How would you test if the variable access is significant to explain the number of visits to
the doctor? Describe every element of the test.
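As a worked illustration of the variance assumption in question b), the following LaTeX lines sketch why weighting by the square root of kidsi yields constant variance. The generic notation Y_i, x_i'β is an assumption made here for illustration (the regressors of the model are not restated), and the argument requires kids_i > 0 for every household.

```latex
% Assumed generic model: Y_i = x_i'\beta + u_i, with var(u_i) = \sigma^2 / kids_i
\[
\sqrt{kids_i}\, Y_i = \sqrt{kids_i}\, x_i'\beta + \sqrt{kids_i}\, u_i ,
\qquad
var\!\left(\sqrt{kids_i}\, u_i\right) = kids_i \cdot \frac{\sigma^2}{kids_i} = \sigma^2 .
\]
```

OLS applied to the variables multiplied by the square root of kidsi is then the GLS estimator under that assumption.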
A researcher wants to analyse the factors that determine the volume of shares that are traded
in the New York Stock Exchange market (NYSE). For this, monthly data are available from
January 1980 to September 1995 of the following variables:
Model 1: OLS, using observations 1980:01–1995:09 (T = 189)
Dependent variable: volume
Figure 36 shows the time series of residuals from the estimated Model 1:
Figure 36: OLS residuals from Model 1
(Regression residuals = observed − fitted volume, plotted over time, 1980 to 1996.)
a) Use Figure 36 to make some comments on the behaviour of the disturbances. Use also the
information given in Model 1 to make a formal test of this behaviour.
The researcher thinks now that the volume of traded shares can also be determined by the
volume traded in the previous period, so the following model is specified:
            Coefficient   Std. Error   t-statistic   p-value
const        −70.9246     310.190       −0.2286      0.8194
sp500          6.09143      0.899202     6.774       0.0000
tbill        −36.3394      20.3537      −1.785       0.0759
cconf          4.03150      2.05414      1.963       0.0512
volume_1       0.480018     0.0644920    7.443       0.0000
R² = 0,2735
b) Use the information here provided to test the fulfilment of the basic hypotheses in Model
2.
c) Given your answer in question b), what are the properties (asymptotic and in finite samples)
of the estimator used in Model 2? Justify your answer.
A second researcher estimates by OLS the following alternative model (Model 3):
\[
\widehat{volume}_t = \underset{(272.602)}{202.534} + \underset{(0.8986)}{2.5702}\, sp500_t - \underset{(18.1205)}{23.7499}\, tbill_t + \underset{(1.8539)}{0.1725}\, cconf_t
\]
d) Do you think that Model 3 shows some improvement over Model 2? Justify your answer.
In order to carry out a more in-depth analysis, this second researcher re-estimates the model by
IV using the lags of sp500, tbill and cconf as instruments for volumet−1 , volumet−2 and volumet−3 .
The results are:
\[
\widehat{volume}_t = \underset{(2092.93)}{-595.167} + \underset{(14.5026)}{14.0904}\, sp500_t - \underset{(61.9663)}{68.5734}\, tbill_t + \underset{(21.4592)}{12.1332}\, cconf_t
\]
e) Do you prefer this model or the estimated model selected in question d)? Justify your an-
swer.
EXERCISE 82 (GE.40) (July-2021)
The goal of this exercise is to analyse the factors that affect the value of the mortgages requested
from banks. Information is available on 1971 mortgages for the following variables:
A first researcher estimates the model using OLS, with the results:
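\[
\hat{H}_i = \underset{\substack{(1{,}9158)\\ \{5{,}44623\}}}{35{,}6272} + \underset{\substack{(0{,}2316)\\ \{0{,}6531\}}}{1{,}9275}\, R_i + \underset{\substack{(0{,}0101)\\ \{0{,}04237\}}}{0{,}4725}\, P_i + \underset{\substack{(3{,}3148)\\ \{2{,}5515\}}}{9{,}2429}\, B_i - \underset{\substack{(4{,}3467)\\ \{2{,}7960\}}}{10{,}8113}\, S_i + \underset{\substack{(2{,}9632)\\ \{3{,}6224\}}}{2{,}9609}\, A_i \qquad (1)
\]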
(Figure: plot of the OLS residuals; vertical axis: residual.)
a) What can you say about the fulfilment of the basic hypotheses? Base your answer on all
the information provided.
b) What are the implications of your answer to the previous question on the properties of the
OLS estimator? And on the inference?
c) Taking into account your answer in question a), obtain step by step the matrix of variances
and covariances of β̂OLS . Is β̂OLS consistent? Prove it.
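\[
\hat{H}^{*}_i = \underset{(0{,}982573)}{7{,}15520}\, const^{*}_i + \underset{(0{,}176878)}{0{,}520844}\, R^{*}_i + \underset{(0{,}0078919)}{0{,}706845}\, P^{*}_i + \underset{(1{,}64088)}{9{,}97720}\, B^{*}_i + \underset{(1{,}77960)}{7{,}02707}\, S^{*}_i + \underset{(1{,}29349)}{1{,}55189}\, A^{*}_i \qquad (3)
\]
R² = 0,653398    RSS = 0,446520
(Standard errors in parentheses)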
In addition, the OLS residuals of this transformed model (û∗i ) are used in the following
OLS regression:
\[
\widehat{u^{*2}_i} = \underset{(0{,}0000619403)}{0{,}000374784} - \underset{(0{,}000000263659)}{0{,}00000075384}\, P_i \qquad (4)
\]
d) Why do you think that the researcher is using this method of estimation? Do you think
that the researcher has achieved his/her goal? Justify your answer.
e) Test if the monthly income has a positive effect on the value of the mortgage. Justify your
selection of the test statistic.
COMPUTER EXERCISES
These computing exercises are addressed to the development of the following competences:
A. Learn the importance of the assumptions underlying a basic econometric model in order to
be able to propose and use more realistic assumptions.
B. Distinguish among different methods of estimation and evaluate their use according to the
economic variables of interest in order to obtain reliable results.
C. Use diverse statistical sources and acquire econometric software skills to analyse relationships
among economic variables.
COMPUTER EXERCISE 1
A database is available with information about the selling price and certain characteristics of
224 houses in two residential areas of the Orange County in California (USA): Dove Canyon
and Coto de Caza 13 . Dove Canyon is a neighbourhood built around a golf course with single
family tract homes with relatively small lots. Coto de Caza is a more upscale area. It is more
rural with large custom homes. The variables considered are:
These data can be accessed in GRETL through File → Open data → Sample file →
Ramanathan, file data7-24.
a) a.1) Specify a model to analyse if the size and age of the building are factors relevant to
explain the price of the house. Estimate the model by Ordinary Least Squares.
a.2) Comment on the obtained results in terms of goodness of fit, significance and signs
of the estimated coefficients.
b) Plot the OLS residuals from this first specification. What does the plot
suggest? Do you think that there is a misspecification problem? Why?
c) Introduce the variable city as explanatory variable in the model. Give an interpretation
of the accompanying coefficient.
13 Source: Ramanathan, Ramu (1992), Introductory Econometrics with Applications.
d) Estimate this second specification by OLS. Comment on the results and compare them
with those obtained in a). Is this specification better than the previous one? Why?
g) Perform some heteroscedasticity test(s). Explain the testing procedure and comment on the
results obtained.
i) Estimate the model by Generalized or Weighted Least Squares using as weighting variable
the inverse of the squared living area. Analyse the results.
j) What do we mean by weighted data and original data? Why is the inverse of sqft² used
as weighting variable? Explain.
k) Propose another specification for the variances of the disturbances that
includes both age and sqft. Using this functional form for the variance, estimate the model
by Feasible Generalized Least Squares.
l) Write down a section summarizing all the results obtained throughout the exercise. Which
results are more reliable? Why?
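The weighting step in questions i) and j) can be illustrated with a small Python sketch. It is deliberately simplified and rests on assumptions: a single regressor x (standing in for sqft), the variance assumption var(ui) = σ² xi², and hypothetical data rather than the data7-24 file. The function names are illustrative; in GRETL the same estimate is obtained with the weighted least squares option.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def wls_variance_prop_x2(x, y):
    """WLS for y = b1 + b2*x + u when var(u_i) = sigma^2 * x_i^2.

    Dividing every term by x_i gives y/x = b1*(1/x) + b2 + u/x, whose
    disturbance u/x has constant variance sigma^2. OLS on the transformed
    data is therefore the weighted (GLS) estimator.
    """
    y_star = [yi / xi for xi, yi in zip(x, y)]
    inv_x = [1.0 / xi for xi in x]
    # In the transformed model the intercept is b2 and the slope on 1/x is b1.
    b2, b1 = simple_ols(inv_x, y_star)
    return b1, b2


# Hypothetical, noise-free data generated from y = 3 + 2x, for illustration only.
x = [1.0, 2.0, 4.0, 5.0]
y = [5.0, 7.0, 11.0, 13.0]
print(wls_variance_prop_x2(x, y))
```

Weighting by the inverse of x² in a weighted least squares routine minimises the same criterion, the sum of squared residuals each divided by xi², which is why the transformed ("weighted") data and the original data lead to different residuals but refer to the same coefficients.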
COMPUTER EXERCISE 2
To access these data:
GRETL → In File → Open data → sample file
Then select Ramanathan, file data8-2.gdt
a) Specify a model to analyse if the personal income explains the expenditure on domestic
travel. Interpret the coefficients.
b) Estimate the model by Ordinary Least Squares. Comment on the results in terms of
goodness of fit, significance and expected signs of the estimated coefficients.
The states with larger population are likely to show higher variability in travel expenditure than
states with smaller number of citizens. Therefore, it can be expected that the variance of the
disturbances grows with population. To analyse this possibility we have data on the population
of the different states. Then,
– OLS residuals.
– Plot of OLS residuals against the variable POP.
e) Perform the Goldfeld and Quandt test assuming that the variance of the disturbances
increases with the variable POP. Explain the testing procedure and comment on the
obtained results.
f) Perform the Breusch and Pagan test under the assumption that the variance of the
disturbances depends on the variable POP. Explain the testing procedure and comment on
the obtained results.
g) Given the evidence found in e) and f), comment on the reliability of the results obtained
in b). Using β̂OLS , could an increase of one billion dollars in personal income produce an
increment of one million dollars in the aggregate expenditure on domestic travel?
h) Estimate the model by Generalized or Weighted Least Squares using as weights the inverse
of the squared population. Plot the residuals against the explanatory variable. Analyse
the results.
i) What do we mean by weighted data and by original data? Why has the inverse of the
variable POP² been used as weighting variable? Explain your reasoning.
j.1) Write down the corresponding transformed model. Estimate efficiently the proposed
transformed model. Compare these results with those obtained in h), can you estab-
lish any conclusion?
j.2) Plot the residuals from the estimation of the transformed model against their cor-
responding exogenous variable. Interpret such plot and compare it with that in h).
How do you interpret what you see?
k) Specify a model for the relationship between the variables EXPTRAV and INCOME
under the assumption that σi² = α1 + α2 POPi . Estimate the corresponding transformed
model pointing out clearly all the steps needed to do it.
l) Write down a concluding section where all the results obtained throughout the exercise
are summarized.
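The Breusch-Pagan procedure of question f) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: one regressor x, one variance-driving variable z (standing in for POP), the original form of the statistic (explained sum of squares of the auxiliary regression divided by 2), and hypothetical data. GRETL's built-in test may report a different variant (for example the Koenker version), so the numbers need not coincide.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on a constant and x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def breusch_pagan(x, y, z):
    """Breusch-Pagan statistic for H0: the variance does not depend on z."""
    # Step 1: OLS residuals of the original model.
    b0, b1 = simple_ols(x, y)
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    n = len(resid)
    # Step 2: squared residuals scaled by the ML variance estimate (RSS / N).
    sigma2 = sum(e ** 2 for e in resid) / n
    g = [e ** 2 / sigma2 for e in resid]
    # Step 3: auxiliary regression of the scaled squared residuals on z.
    a0, a1 = simple_ols(z, g)
    g_bar = sum(g) / n
    ess = sum((a0 + a1 * zi - g_bar) ** 2 for zi in z)
    # Step 4: ESS / 2 is asymptotically chi-square with 1 degree of freedom
    # under H0 (one variable in the auxiliary regression besides the constant).
    return ess / 2


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.5, 7.2, 11.0, 10.9]
bp_demo = breusch_pagan(x, y, x)
print(bp_demo, "reject H0 at 5%" if bp_demo > 3.84 else "do not reject H0 at 5%")
```

The value 3.84 is the 5% critical value of the chi-square distribution with one degree of freedom; with more variables in the auxiliary regression the degrees of freedom increase accordingly.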
COMPUTER EXERCISE 3
The following model is proposed to analyse the factors that affect the salary of married women
in U.S.A.:
where
This data set is accessible by opening gretl → menu → File → Sample file → Wooldridge
→ (by alphabetical order) mroz
The data set contains 753 observations for 1975. The first 428 are from working women and the
rest are for unemployed women. Select those women that are employed and thus have a salary.
To restrict the sample: sample → set range → Start: 1, End: 428.
b) Comment on the results, in particular the goodness of fit, the estimated coefficients and
their significance.
c) Is there evidence of a quadratic relationship between lwage and exper? Use a formal test
to support your conclusion.
d) The variable educ is considered stochastic and correlated with the disturbances in Model
(1). Explain the consequences of this correlation on the results obtained before.
e) Estimate now Model (1) by Instrumental Variables using the father's years of schooling
(fatheduc) as instrument for educ. Are the results very different from those
obtained by OLS?
f) Write down every element of the IV estimator: the matrix of instruments Z and the data
matrix X. What are the dimensions of these matrices?
g) An additional instrument for educ is now available: the years of schooling of the mother
(variable motheduc). Estimate Model (1) by TSLS (Two Stages Least Squares) using all
available instruments. Compare the results with those obtained in e).
h) Obtain the correlations between the instruments and the instrumented variable educ.
What are the implications of these correlations for the validity of the instruments?
j) Using the results in e), implement Hausman’s test. Taking into account the result of the
test, how would you estimate the coefficients in Model (1)?
k) Test if the variable exper is significant. What is the estimated percentage change in the
average salary when labour market experience increases by one year and all other
factors remain constant? Is this change constant for every woman in the sample?
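The IV estimator of questions e) and f) can be illustrated with a short Python sketch. It is a simplification under stated assumptions: a single regressor x with a single instrument z (exactly identified case), so that the matrix expression (Z′X)⁻¹Z′Y with Z = [1, z] and X = [1, x] reduces to two scalar formulas. The data are hypothetical, not the mroz sample, and `iv_simple` is an illustrative name.

```python
def iv_simple(y, x, z):
    """IV estimates (b1, b2) for y = b1 + b2*x + u using z as instrument for x.

    Solves the two moment conditions sum(u_hat) = 0 and sum(z * u_hat) = 0,
    which is the scalar form of (Z'X)^{-1} Z'y with Z = [1, z], X = [1, x].
    """
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    z_bar = sum(z) / n
    szy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    szx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    b2 = szy / szx  # requires the sample covariance between z and x to be nonzero
    b1 = y_bar - b2 * x_bar
    return b1, b2


# Hypothetical, noise-free data generated from y = 1 + 2x, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
z = [0.0, 1.0, 0.0, 3.0]
y = [3.0, 5.0, 7.0, 9.0]
print(iv_simple(y, x, z))
```

Setting z equal to x reproduces the OLS estimates, and a sample covariance between z and x close to zero makes the slope estimate unstable, which is the weak-instrument concern behind question h).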
COMPUTER EXERCISE 4
We use the sample file smoke from Wooldridge’s book available in Gretl. The database consists
of observations from males resident in different American states for the year 1979. The variables
included in the file are:
• white: =1 if white, =0 otherwise
This data set is accessible by opening gretl → menu → File → Sample file → Wooldridge
→ (by alphabetical order) smoke15
b) Comment on the obtained results, particularly on the goodness of fit, the estimated coeffi-
cients and their significance.
c) Is there any evidence of a quadratic relationship between the variables lincome and age
(holding all other variables constant)? Show the results of any test performed to support your
conclusions.
d) It is believed that cigarette consumption can be jointly determined with income, such that
cigs is a stochastic regressor correlated with the disturbance term in model (1). Explain the
consequences of this correlation on the previous results obtained by OLS.
e) Show the results of the Instrumental Variable method of estimation of model (1) using the
variable restaurn as instrument for cigs. Are the results very different from those obtained
by OLS? Comment on these results.
f) Write down the matrix of instruments Z and the matrix of explanatory variables X. Do not
use the numbers but write down the name of the variables in the columns. Write down also
the dimension of the matrices.
g) Write down the expression of the IV estimator, including the elements in Z ′ X and Z ′ Y
(using sums) and their dimension. What characteristics should Z have in order to make this
estimator feasible? What characteristics should Z have in order to guarantee that the IV
estimator has the desirable properties and the inference be valid?
h) The variable lcigpric is now considered as an additional instrument for cigs. Estimate model
(1) by two stage least squares using all instruments. Display the results obtained in Gretl.
Compare these results with those obtained in e).
i) Calculate the correlations between the instruments and the variable cigs. What do these
correlations indicate on the adequacy of these instruments?
j) Perform the regression of the variable cigs on all possible instruments including the
constant.
k) Perform the Hausman test on the results of e). According to the results of the test, how
would you estimate the coefficients of model (1)?
l) Test the significance of variable age. What is the estimated change on lincome when the
individual is one year older and all other characteristics remain constant? Is this change the
same across all individuals in the sample?
COMPUTER EXERCISE 5
We want to estimate a Cobb-Douglas production function for the farming sector in the U.S.A.
For that purpose, a database 16 of yearly data for the period 1948-1993 is available on the
following indices (with 1982 = 100):
ln Yt = β1 + β2 ln Lt + β3 ln EXt + β4 ln Kt + ut (1)
b) What is the meaning of indices having base 1982 = 100? Which are the values of the
variables in that year?
c) Estimate by Ordinary Least Squares the log-log (or double logarithmic) specification in
(1). Interpret the results.
d) Analyse the time series of the residuals. Interpret the graphic and comment on the evidence
of any foreseen problem.
f) Perform the Breusch-Godfrey test to detect a possible AR(1) or MA(1) process in the
disturbances of the model.
g) How would you modify the t statistic for individual significance if the OLS estimator is
still used to get the estimates of the coefficients? Use the robust standard errors option
in order to estimate by OLS.
h) Estimate again the production function by the Cochrane-Orcutt (CO) and the grid search
of Hildreth and Lu (HL) methods.
16 Ramanathan, R. (2002), Introductory Econometrics with Applications, data9-5.gdt
i) Comment on the results obtained with each method of estimation (OLS, CO and HL).
j) Using the Hildreth-Lu estimation results, test the null hypothesis H0 : β3 = 2β4 . Explain
all elements of the test.
COMPUTER EXERCISE 6
To analyse the effects of the pronatalist policy of the U.S. government in the 20th century,
yearly data for the period 1913-1984 are available on the following variables 17 :
1) Give the data a time series structure by clicking on the Gretl main window in
Data → Dataset structure → . . .
2) Estimate by OLS the model proposed in (1). Give an interpretation of the results.
3) Obtain the time series plot of the variable gfr_t and that of the residuals. Comment on them
considering the R² obtained in the previous question.
4) Re-estimate the model adding the regressors pill_t and ww2_t . What phenomena are they
intended to take into account? Does their inclusion have any effect on the previous plots?
5) Test for the existence of first order autocorrelation by means of the Durbin and Watson
statistic.
6) Bearing in mind all the above information and results, test for the individual significance of
the variable pe_t .
7) Estimate the model by the Cochrane-Orcutt method. Comment on the results obtained,
performing any test considered necessary.
8) Estimate the model by the Hildreth-Lu method. Is there any significant difference? Why?
9) Add as regressor the one-period lagged variable gfr_{t−1} and estimate the new model by OLS.
Obtain the plot of the residuals and comment on it. Perform a test for first order autocorrelation.
According to the result of the test, what would you say about the results of the analysis? Do
you think it is necessary to use any other estimator? Why?
17 Wooldridge, J.M. (2001), Introductory Econometrics, data fertil3.gdt.
10) Add a time trend t to the model. Given the residual plot, try the inclusion of a quadratic
trend, t².
11) Perform the Durbin-Watson test in this model. Comment on the results.
12) How would you test the individual significance of the explanatory variables in the model? Use
an adequate estimate of the standard deviations of the coefficients, given all the information
available so far.
COMPUTER EXERCISE 7
To analyse the effects of the pronatalist policy of the U.S. government in the 20th century,
yearly data for the period 1913-1984 are available on the following variables 18 :
1) Give the data a time series structure by clicking on the Gretl main window in
Data → Dataset structure → . . .
2) Estimate by OLS the model proposed in (1). Check for the existence of autocorrelation in
this model.
3) Specify a dynamic model by including as regressors 4 (four) consecutive lags of the endogenous
variable gfr_t , that is, add gfr_{t−1} , ..., gfr_{t−4} to the list of regressors. Check for their joint and
individual significance using valid statistics.
4) Specify a different dynamic model by including as regressors 4 (four) consecutive lags of the
variable pe_t , that is, add pe_{t−1} , ..., pe_{t−4} to the list of regressors. Check for their joint
and individual significance using valid statistics. Does this specification introduce any other
problem that you may identify?
5) Include all lagged variables considered in questions 3 and 4 above. Then, based on formal
tests and sequentially:
ii) Omit all those variables you find not significant at the 5% significance level, including
lagged and non-lagged variables. You may have to re-estimate the model more than
once.
iii) Save to session as an icon the final model you consider as best and write down its Sample
Regression Function.
6) Test for the presence of autocorrelation in model (2). Instead of adding any lagged vari-
ables, obtain an asymptotically efficient estimator of its parameters. Write down the related
transformed model and the values of all estimated parameters (β̂i and ρ̂).
7) Try to choose the best specification between those in questions 5c) and 6. You can base your
decision on both:
i) Testing the restrictions you judge convenient on the final model of question 5c).
ii) Looking carefully at the residual plots of both estimated models.