
APPLIED ECONOMETRICS

3rd year GE

Recommended Problems Set

2021-2022
Unauthorized reproduction of this text and distribution of copies are strictly prohibited, as
well as any other infringement of other rights, which correspond to the Department of Applied
Economics III (Econometrics and Statistics), University of the Basque Country UPV/EHU.

© UPV/EHU 2021.

Author (translation):
J. Arteche
Juan I. Modroño
Contents

GENERALIZED LEAST SQUARES
EXERCISE 1 (LE-2002.2) (Jun-2002)
EXERCISE 2 (PV-E.8) (Sep-1993)
EXERCISE 3 (PV-E.15) (Sep-1994)
EXERCISE 4 (LADE-1997.1) (Jun-1997)

HETEROSCEDASTICITY
EXERCISE 5 (PV-E.6) (Jun-1993)
EXERCISE 6 (PV-E.20) (Feb-1995)
EXERCISE 7 (PV-E.38) (Feb-1997)
EXERCISE 8 (LE-2000.7) (Sep-2000)
EXERCISE 9 (LADE-2001.1) (Jun-2001)
EXERCISE 10 (LE-2002.1) (Jun-2002)
EXERCISE 11 (LE-2003.7) (Sep-2003)
EXERCISE 12 (LE-2004.5) (Sep-2004)
EXERCISE 13 (LE-2005.1) (Jun-2005)
EXERCISE 14 (LADE-2005.5) (Sep-2005)
EXERCISE 15 (LE-2008.5) (Sep-2008. Final examination. Written exam.)

AUTOCORRELATION
EXERCISE 16 (PV-E.2) (Feb-1993)
EXERCISE 17 (LADE-1999.2) (Jun-1999)
EXERCISE 18 (PV-E.44) (Sep-1997)
EXERCISE 19 (LADE-1998.5) (Sep-1998)
EXERCISE 20 (LADE-2001.2) (Jun-2001)
EXERCISE 21 (LADE-2001.4) (Sep-2001)
EXERCISE 22 (LE-2002.5) (Sep-2002)
EXERCISE 23 (LE-2003.5) (Jun-2003)
EXERCISE 24 (LE-2005.4) (Sep-2005)
EXERCISE 25 (LE-2007.1) (Jun-2007)

STOCHASTIC REGRESSORS
EXERCISE 26 (PV-G.18) (Jun-1995)
EXERCISE 27 (PV-G.22) (Feb-1996)
EXERCISE 28 (LADE-1999.3) (Jun-1999)
EXERCISE 29 (LE-2000.3) (Jun-2000)
EXERCISE 30 (LE-2002.2) (Jun-2002)
EXERCISE 31 (LE-2002.7) (Sep-2002)
EXERCISE 32 (LE-2003.3) (Jun-2003)
EXERCISE 33 (LE-2003.4) (Jun-2003)
EXERCISE 34 (LE-2004.3) (Jun-2004)
EXERCISE 35 (LADE-2004.4) (Jun-2004)
EXERCISE 36 (LE-2008.4) (Sep-2008. Final examination. Written test.)

DYNAMIC MODELS
EXERCISE 37 (PV-E.42) (Jun-1997)
EXERCISE 38 (LADE-1998.6) (Sep-1998)
EXERCISE 39 (LE-2000.1) (Jun-2000)
EXERCISE 40 (LADE-2004.6) (Sep-2004)
EXERCISE 41 (LE-2006.4) (Sep-2006)
EXERCISE 42 (LE-2008.2) (Jun-2008. Final examination. Applied test.)

EXAMS
EXERCISE 43 (GE.1) (May-2013)
EXERCISE 44 (GE.2) (May-2013)
EXERCISE 45 (GE.3) (June-2013)
EXERCISE 46 (GE.4) (June-2013)
EXERCISE 47 (GE.5) (June-2013)
EXERCISE 48 (GE.6) (May-2014)
EXERCISE 49 (GE.7) (May-2014)
EXERCISE 50 (GE.8) (Jun-2014)
EXERCISE 51 (GE.9) (Jun-2014)
EXERCISE 52 (GE.10) (May-2015)
EXERCISE 53 (GE.11) (May-2015)
EXERCISE 54 (GE.12) (June-2015)
EXERCISE 55 (GE.13) (June-2015)
EXERCISE 56 (GE.14) (June-2015)
EXERCISE 57 (GE.15) (May-2016)
EXERCISE 58 (GE.16) (May-2016)
EXERCISE 59 (GE.17) (June-2016)
EXERCISE 60 (GE.18) (June-2016)
EXERCISE 61 (GE.19) (June-2016)
EXERCISE 62 (GE.20) (May-2017)
EXERCISE 63 (GE.21) (May-2017)
EXERCISE 64 (GE.22) (July-2017)
EXERCISE 65 (GE.23) (July-2017)
EXERCISE 66 (GE.24) (July-2017)
EXERCISE 67 (GE.25) (May-2018)
EXERCISE 68 (GE.26) (May-2018)
EXERCISE 69 (GE.27) (July-2018)
EXERCISE 70 (GE.28) (July-2018)
EXERCISE 71 (GE.29) (May-2019)
EXERCISE 72 (GE.30) (May-2019)
EXERCISE 73 (GE.31) (July-2019)
EXERCISE 74 (GE.32) (July-2019)
EXERCISE 75 (GE.33) (May-2020)
EXERCISE 76 (GE.34) (May-2020)
EXERCISE 77 (GE.35) (July-2020)
EXERCISE 78 (GE.36) (July-2020)
EXERCISE 79 (GE.37) (May-2021)
EXERCISE 80 (GE.38) (May-2021)
EXERCISE 81 (GE.39) (July-2021)
EXERCISE 82 (GE.40) (July-2021)

COMPUTER EXERCISES
COMPUTER EXERCISE 1
COMPUTER EXERCISE 2
COMPUTER EXERCISE 3
COMPUTER EXERCISE 4
COMPUTER EXERCISE 5
COMPUTER EXERCISE 6
COMPUTER EXERCISE 7

GENERALIZED LEAST SQUARES

EXERCISE 1 (LE-2002.2) (Jun-2002)

Consider the model

$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t, \qquad u_t \sim iid(0, \sigma^2)$

where $X_{2t}$ is a fixed variable, $X_{3t}$ is a stochastic variable and $\beta = (\beta_1, \beta_2, \beta_3)'$ is the vector of unknown parameters.

a) Why is the OLS estimator of β non-linear?

b) Which assumption guarantees the unbiasedness of the OLS estimator of β? Show why.

c) If $X_{3t}$ is stochastic and not independent of $u_t$ but $E(X_{3t} u_t) = 0,\ \forall t$, is the OLS estimator of β consistent? Prove it and indicate what additional assumptions are needed to obtain this result.

d) If $X_{3t}$ is stochastic but the assumptions of the Mann-Wald theorem are satisfied, is it possible to make inference on β even if the distribution of $u_t$ is unknown? Explain thoroughly.

EXERCISE 2 (PV-E.8) (Sep-1993)

Consider the model $Y_t = \alpha + \beta X_t + u_t$, where $E(u_t^2) = t X_t^2$.

a) Using the three observations of $Y_t$ and $X_t$ below, obtain the OLS estimates of α and β in matrix form.

t 1 2 3
Yt 1 1 0
Xt 1 -1 1

b) It is now also known that:

$E(u_1 u_3) = E(u_3 u_1) = 1$
$E(u_1 u_2) = E(u_2 u_1) = E(u_2 u_3) = E(u_3 u_2) = 0$.

Given the observations of Yt and Xt and the information on E(ut us ), calculate the variance-
covariance matrix of the OLS estimator.

c) Given the above information, what are the statistical properties of the Ordinary Least
Squares estimator?
d) Do you know an estimator with better properties? Which one? Describe its properties
and write down its variance-covariance matrix (do not calculate it, just write down its
mathematical expression and explain each of its elements).
e) Consider now the model $Y_t = \alpha + \beta X_t + u_t$ with
$E(u_t^2) = t X_t^2$ and $E(u_t u_s) = 0 \quad \forall t, s,\ t \neq s$.
Write down the transformed model that corrects this problem and show that the variances of the disturbances in the transformed model are constant.
f) Estimate by OLS and using matrix algebra the parameters of the transformed model.
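The following is a minimal pure-Python sketch (standard library only) of the matrix computations in a), b) and f), using the three observations in the table. It treats $E(u_t^2) = tX_t^2$ and the covariances given in b) as known exactly. For f) it assumes that every term of the model is divided by $\sqrt{t}\,|X_t|$. The helper names are illustrative and are not part of the exercise.

```python
# Data from the table: t = 1, 2, 3
Y = [1.0, 1.0, 0.0]
X = [1.0, -1.0, 1.0]

def inv2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

# a) OLS in matrix form: beta = (X'X)^{-1} X'Y, first column of ones for alpha
Xmat = [[1.0, x] for x in X]
Xt = transpose(Xmat)
XtX_inv = inv2(matmul(Xt, Xmat))
beta_ols = [row[0] for row in matmul(XtX_inv, matmul(Xt, [[y] for y in Y]))]

# b) Var(beta_OLS) = (X'X)^{-1} X' Sigma X (X'X)^{-1}
# diagonal: E(u_t^2) = t * X_t^2; off-diagonal: E(u1 u3) = 1, all others 0
Sigma = [[1.0, 0.0, 1.0],
         [0.0, 2.0, 0.0],
         [1.0, 0.0, 3.0]]
V_ols = matmul(matmul(XtX_inv, matmul(matmul(Xt, Sigma), Xmat)), XtX_inv)

# f) OLS on the transformed model (E(u_t u_s) = 0 as in part e)):
# every term, including the constant, is divided by sqrt(t) * |X_t|
rows, ys = [], []
for t, (x, y) in enumerate(zip(X, Y), start=1):
    w = 1.0 / ((t ** 0.5) * abs(x))
    rows.append([w, w * x])
    ys.append([w * y])
Rt = transpose(rows)
beta_star = [row[0] for row in matmul(inv2(matmul(Rt, rows)), matmul(Rt, ys))]

print("OLS:", beta_ols)                  # approx [0.75, -0.25]
print("Var(OLS):", V_ols)                # approx [[0.875, -0.125], [-0.125, 0.875]]
print("Transformed OLS:", beta_star)     # approx [0.875, -0.125]
```

The same helpers also work for part d) if `Sigma` is kept with its off-diagonal terms and the GLS formula $(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$ is used. That formula would need a 3x3 inverse, which is not written here.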

EXERCISE 3 (PV-E.15) (Sep-1994)

Researcher A wants to explain students' expenses with the following model:

$Y_i = \alpha + \beta X_i + u_i, \qquad i = 1, \dots, N \qquad (1)$

where $Y_i$: i-th student’s expenses
$X_i$: i-th student’s income
In model (1) all the basic Gauss-Markov assumptions are satisfied; in particular

$E(u_i) = 0$
$Var(u_i) = \sigma_u^2 \quad \forall i$
$E(u_i u_s) = 0 \quad \forall i \neq s$

Another researcher, B, thinks that, to simplify calculations, it is better to group the data by classroom and estimate the parameters using the grouped data. The students are grouped in 8 classes, and the number of students in each class is $n_1, n_2, \dots, n_8$. Researcher B will thus use 8 observations of each variable, one for every class:

$\bar Y_j = \frac{1}{n_j}\sum_{k=1}^{n_j} Y_k, \qquad \bar X_j = \frac{1}{n_j}\sum_{k=1}^{n_j} X_k, \qquad j = 1, 2, \dots, 8$

The model considered by researcher B is then:

$\bar Y_j = \alpha + \beta \bar X_j + v_j, \qquad j = 1, 2, \dots, 8$

a) What are the mean and variance of the disturbance $v_j$?


b) Both researchers want to estimate their models by OLS. Are they right? Which estimation
method is more adequate for these cases? Why?
c) How would your previous answer change if the number of students were the same in all
the classes?
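Under the assumptions of model (1), $v_j$ is the average of $n_j$ uncorrelated, homoscedastic disturbances. It therefore has $E(v_j) = 0$ and $Var(v_j) = \sigma_u^2/n_j$. Multiplying each class's equation by $\sqrt{n_j}$ gives a constant variance again, so GLS amounts to OLS on the transformed data. Below is a minimal pure-Python sketch. The function name and the numbers in the usage example are hypothetical illustrations and are not data from the exercise.

```python
import math

def grouped_gls(ybar, xbar, n):
    """GLS for ybar_j = alpha + beta * xbar_j + v_j with Var(v_j) = sigma^2 / n_j.

    Each group equation is multiplied by sqrt(n_j); the 2x2 normal equations
    of the transformed model are then solved for (alpha, beta).
    """
    w = [math.sqrt(nj) for nj in n]
    c = w                                       # transformed constant: sqrt(n_j)
    x = [wj * xj for wj, xj in zip(w, xbar)]
    y = [wj * yj for wj, yj in zip(w, ybar)]
    scc = sum(ci * ci for ci in c)              # equals sum(n_j)
    scx = sum(ci * xi for ci, xi in zip(c, x))
    sxx = sum(xi * xi for xi in x)
    scy = sum(ci * yi for ci, yi in zip(c, y))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = scc * sxx - scx ** 2
    alpha = (sxx * scy - scx * sxy) / det
    beta = (scc * sxy - scx * scy) / det
    return alpha, beta

# Hypothetical group means lying exactly on y = 2 + 3x, with unequal class sizes
xbar = [1.0, 2.0, 4.0, 5.0]
ybar = [2.0 + 3.0 * x for x in xbar]
n = [10, 25, 8, 40]
print(grouped_gls(ybar, xbar, n))   # approx (2.0, 3.0)
```

If every $n_j$ is the same, all the weights share one constant. The estimator then coincides with plain OLS on the class means, which is the case considered in c).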

EXERCISE 4 (LADE-1997.1) (Jun-1997)
Consider the following linear regression model:

$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t, \qquad t = 1, \dots, 100$

where $X_2$ and $X_3$ are non-stochastic and $u_t \sim NID(0, \sigma_t^2 = a + b t^2)$.

a) Assume that a = 2b with b an unknown parameter.

i) Obtain the variance-covariance matrix of Y.


ii) Indicate the adequate method of estimation for this model, explaining your answer.

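b) Assume now that a = 0 and b is an unknown parameter. After GLS estimation, we have obtained the following estimates:

$$\hat\beta_{GLS} = \begin{pmatrix} 2 \\ 3 \\ -1 \end{pmatrix}, \qquad \widehat{V}(\hat\beta_{GLS}) = \begin{pmatrix} 3 & -2 & 1 \\ -2 & 4 & 0 \\ 1 & 0 & 3 \end{pmatrix}$$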

Test the following hypotheses:

i) $\beta_3 = 0$
ii) $\beta_3 = 0$ and $\beta_1 + 2\beta_2 = 5$
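Here is a minimal pure-Python sketch of the Wald-type statistics for part b), computed from the reported $\hat\beta_{GLS}$ and $\widehat{V}(\hat\beta_{GLS})$. The reference distribution is left to the reader. Because b is estimated, one reasonable choice (an assumption here) uses the asymptotic N(0,1) and χ²(2) distributions. With T = 100 these are close to the t(97) and F(2, 97) alternatives.

```python
import math

# GLS estimates and estimated covariance matrix from part b)
beta = [2.0, 3.0, -1.0]
V = [[3.0, -2.0, 1.0],
     [-2.0, 4.0, 0.0],
     [1.0, 0.0, 3.0]]

# i) H0: beta3 = 0, a single restriction -> t-type ratio
t_beta3 = beta[2] / math.sqrt(V[2][2])

# ii) H0: R beta = r with R = [[0, 0, 1], [1, 2, 0]] and r = [0, 5]
R = [[0.0, 0.0, 1.0],
     [1.0, 2.0, 0.0]]
r = [0.0, 5.0]
d = [sum(R[i][k] * beta[k] for k in range(3)) - r[i] for i in range(2)]
RV = [[sum(R[i][k] * V[k][j] for k in range(3)) for j in range(3)] for i in range(2)]
M = [[sum(RV[i][k] * R[j][k] for k in range(3)) for j in range(2)] for i in range(2)]  # R V R'
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
M_inv = [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]
wald = sum(d[i] * M_inv[i][j] * d[j] for i in range(2) for j in range(2))
F = wald / 2   # q = 2 restrictions

print(round(t_beta3, 4), wald, F)   # -0.5774 1.375 0.6875
```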

HETEROSCEDASTICITY

EXERCISE 5 (PV-E.6) (Jun-1993)

Consider the model:

$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + u_t, \qquad t = 1, 2, \dots, T$

where:
$E(u_t) = 0 \quad \forall t$
$E(u_t^2) = \dfrac{1}{X_{3t}^2} \quad \forall t$
$E(u_t u_s) = 0 \quad \forall t \neq s$

a) Write down the transformed model with homoscedastic disturbances. Show the properties
of the transformed disturbances.
b) If T = 4, write down the matrix of regressors X in the transformed model, given the following data:

t 1 2 3 4
X2t 0 1 1 2
X3t 3 0.5 1 1
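A small sketch that builds the transformed regressor matrix from the table. It assumes the transformation multiplies every term of the model by $X_{3t}$. The transformed disturbance $X_{3t}u_t$ then has variance $X_{3t}^2 \cdot (1/X_{3t}^2) = 1$.

```python
# Original data for t = 1, ..., 4
X2 = [0.0, 1.0, 1.0, 2.0]
X3 = [3.0, 0.5, 1.0, 1.0]

# Transformed model: X3t*Yt = b1*X3t + b2*X2t*X3t + b3*X3t^2 + X3t*ut
X_star = [[x3, x2 * x3, x3 ** 2] for x2, x3 in zip(X2, X3)]
for row in X_star:
    print(row)
# [3.0, 0.0, 9.0]
# [0.5, 0.5, 0.25]
# [1.0, 1.0, 1.0]
# [1.0, 2.0, 1.0]
```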

EXERCISE 6 (PV-E.20) (Feb-1995)

Consider the model

$Y_i = \alpha + \beta X_i + u_i, \quad i = 1, \dots, N, \quad \text{with } Var(u_i) = P_i^2 \sigma^2$

Decide which of the following models is correctly transformed to correct the heteroscedasticity problem, and explain why.

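(1) $P_i Y_i = \alpha + \beta P_i X_i + P_i u_i$

(2) $\dfrac{Y_i}{P_i} = \dfrac{\alpha}{P_i} + \beta \dfrac{X_i}{P_i} + \dfrac{u_i}{P_i}$

(3) $P_i Y_i = \alpha P_i + \beta P_i X_i + P_i u_i$

(4) $\dfrac{Y_i}{P_i} = \alpha + \beta \dfrac{X_i}{P_i} + \dfrac{u_i}{P_i}$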
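The variance of the transformed disturbance can be checked directly. The check uses only $Var(u_i) = P_i^2\sigma^2$, with $P_i$ treated as known and non-stochastic:

```latex
\begin{aligned}
Var\!\left(\frac{u_i}{P_i}\right) &= \frac{1}{P_i^2}\,Var(u_i) = \frac{P_i^2\sigma^2}{P_i^2} = \sigma^2 \\
Var(P_i u_i) &= P_i^2\,Var(u_i) = P_i^4\sigma^2
\end{aligned}
```

A valid transformation must also apply the same factor to every term of the equation, including the intercept, so that the transformed equation is still an identity.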

EXERCISE 7 (PV-E.38) (Feb-1997)

The consumption model

$C_t = \beta_1 + \beta_2 I_t + u_t \qquad (1)$

has been estimated for the Basque Country using yearly data from 1965 to 1994. Two separate OLS estimations have been performed, using the first ten and the last ten observations:

1965-1974: $\hat C_t = 22699{,}0 + 0{,}336\, I_t, \qquad TSS_1 = 9703500{,}0, \qquad R_1^2 = 0{,}85$

1985-1994: $\hat C_t = 38767{,}0 + 0{,}6542\, I_t, \qquad TSS_2 = 457036363{,}0, \qquad R_2^2 = 0{,}78$

a) Implement the Goldfeld-Quandt test of homoscedasticity.

b) The OLS estimation of the full sample (1965-1994) is:

$C_t = 35205{,}0 + 0{,}586\, I_t + \hat u_t, \qquad R^2 = 0{,}82 \qquad (2)$

with the auxiliary regression:

$\hat u_t^2 = 64519{,}0 + 0{,}52\, I_t + \hat v_t, \qquad R^2 = 0{,}71 \qquad (3)$

Use the information provided by these two regressions to test the same null hypothesis as in a).

c) Taking into account all the results obtained above, which estimation method would you
use for the consumption model? Why? Explain in detail.
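The Goldfeld-Quandt statistic needs only each subsample's RSS, and each RSS can be recovered from the reported TSS and R² as $RSS = TSS(1 - R^2)$. For b), the auxiliary regression (3) uses the unscaled $\hat u_t^2$. One option, assumed in this sketch, is therefore the LM statistic $N R^2$, which is asymptotically χ²(1) under homoscedasticity. Critical values are left to the F and χ² tables.

```python
# a) Goldfeld-Quandt: recover each RSS from RSS = TSS * (1 - R^2)
tss1, r2_1 = 9703500.0, 0.85       # 1965-1974
tss2, r2_2 = 457036363.0, 0.78     # 1985-1994
rss1 = tss1 * (1 - r2_1)
rss2 = tss2 * (1 - r2_2)
n_sub, k = 10, 2
gq = (rss2 / (n_sub - k)) / (rss1 / (n_sub - k))   # ~ F(8, 8) under H0

# b) LM statistic from auxiliary regression (3) of u_hat^2 on I_t
N, r2_aux = 30, 0.71
lm = N * r2_aux                                     # ~ chi2(1) asymptotically under H0

print(round(gq, 2), round(lm, 2))   # 69.08 21.3
```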

EXERCISE 8 (LE-2000.7) (Sep-2000)

An expanding commercial business wants to analyse the relationship between the commercial sector and its number of branch offices per province. For that purpose, a sample of 50 observations is available for the variables S (number of branch offices per province) and L (number of commercial licenses, as an indicator of the importance of the commercial sector). Its research department estimates the following regression by OLS:

$S_i = \beta_1 + \beta_2 L_i + u_i \qquad (1)$

The results of this estimation are:

$\hat S_i = 22{,}2 + 0{,}5\, L_i, \qquad R^2 = 0{,}3 \qquad (2)$
(t-ratio) (3,9) (5,05)

The graphical representation of the endogenous variable $S_i$ and of the OLS residuals against the explanatory variable $L_i$ is:

[Figure panels: "Model variables" and "OLS Residuals", both plotted against $L_i$]

a) The research manager is not convinced by these results. Which problems do you think
these graphics evidence?

The same manager proposes two alternative ways to improve the estimation. The first one consists of estimating the following equation by OLS:

$\dfrac{S_i}{\sqrt{L_i}} = \beta_1 \dfrac{1}{\sqrt{L_i}} + \beta_2 \sqrt{L_i} + \dfrac{u_i}{\sqrt{L_i}} \qquad (3)$

b) Which basic assumption, not satisfied in model (1), motivates the use of model (3)? What solution is proposed here? What improvement over the first OLS estimation in (2) is expected?

c) Considering the graphical representation of the variable $S_i/\sqrt{L_i}$ and of the OLS residuals of model (3) against $L_i$, do you think that the problem is correctly solved?

The second possibility is that the relationship between $S_i$ and $L_i$ is not linear but exponential, $S_i = \exp\{\gamma_1 + \gamma_2 L_i + v_i\}$, so that the following model is estimated by OLS:

$\ln S_i = \gamma_1 + \gamma_2 L_i + v_i \qquad (4)$

giving the following results for the whole sample of 50 observations:

$\widehat{\ln S_i} = 3{,}31 + 0{,}02\, L_i, \qquad R^2 = 0{,}33, \qquad RSS = 10{,}54 \qquad (5)$
(t-ratio) (31,0) (5,3)

$\dfrac{\hat v_i^2}{0{,}21} = 0{,}053 + 0{,}017\, L_i + \hat e_i, \qquad R^2 = 0{,}014, \qquad RSS = 89{,}72 \qquad (6)$
(0,09) (1,6)

Furthermore, after sorting the sample according to the values of the variable L, two regressions
like (4) have been estimated using the first and the last 12 observations. The residual sums of
squares obtained are RSS1 = 0.77 and RSS2 = 0.992 respectively.

d) Do you think that model (4) has the same problem as model (1) regarding the fulfilment of the basic assumptions? Justify your answer with a formal test. Explain what you do and why.

e) Which one of the two proposed solutions is better? Explain in detail.
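This pure-Python sketch computes the heteroscedasticity statistics for model (4) from the reported numbers. The auxiliary regression (6) scales $\hat v_i^2$ by 0,21, which is approximately $RSS/N = 10{,}54/50$ from (5). Its explained sum of squares can therefore be recovered from RSS and R² and halved to give the Breusch-Pagan statistic. The $N R^2$ form (Koenker's studentized variant) and the Goldfeld-Quandt ratio are shown alongside for comparison.

```python
N = 50

# Breusch-Pagan from auxiliary regression (6): ESS / 2
rss_aux, r2_aux = 89.72, 0.014
tss_aux = rss_aux / (1 - r2_aux)
ess_aux = tss_aux - rss_aux
bp = ess_aux / 2            # ~ chi2(1) under H0 with normal disturbances
lm = N * r2_aux             # studentized N*R^2 variant, also ~ chi2(1)

# Goldfeld-Quandt with the first and last 12 observations sorted by L
rss1, rss2, n_sub, k = 0.77, 0.992, 12, 2
gq = (rss2 / (n_sub - k)) / (rss1 / (n_sub - k))   # ~ F(10, 10) under H0

print(round(bp, 3), round(lm, 2), round(gq, 3))   # 0.637 0.7 1.288
```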

EXERCISE 9 (LADE-2001.1) (Jun-2001)

In order to model the relationship between household consumption (Y ) and income of the
householder (X) the following equation is proposed:

$Y_i = \alpha + \beta X_i + u_i \qquad (1)$

where $u_i$ is assumed to be normally distributed. The following data from 10 households are available:

i 1 2 3 4 5 6 7 8 9 10 Sum
Y 8 91 191 22 55 32 81 176 138 31 825
X 4 49 100 9 25 16 36 81 64 16 400

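The following OLS estimates have been obtained:

$$\begin{pmatrix}\hat\alpha \\ \hat\beta\end{pmatrix} = \begin{pmatrix} N & \sum X_i \\ \sum X_i & \sum X_i^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum Y_i \\ \sum X_i Y_i \end{pmatrix} = \begin{pmatrix} 10 & 400 \\ 400 & 25588 \end{pmatrix}^{-1} \begin{pmatrix} 825 \\ 52176 \end{pmatrix} = \begin{pmatrix} 2{,}5 \\ 2 \end{pmatrix}$$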

In addition, the following auxiliary regression has been calculated:

$\dfrac{\hat u_i^2}{48{,}65} = -0{,}245 + 0{,}0311\, X_i + \hat w_i, \qquad \sum \hat w_i^2 = 1{,}1473, \qquad R^2 = 0{,}89 \qquad (2)$

where $\hat u_i$ are the OLS residuals from model (1).

a) Use a graphical method to look for signs of heteroscedasticity. Comment on the results.

b) Test for the existence of heteroscedasticity caused by the variable $X_i$ by means of the Breusch-Pagan statistic. State clearly the null hypothesis, the alternative, the test statistic and its distribution. Comment on the reliability of this test in this particular case.

c) Estimate model (1) by GLS under the assumption that $Var(u_i) = \sigma_i^2 = \sigma^2 X_i$.

d) Is the variable income of the householder, X, relevant to explain household consumption, Y?
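The following minimal pure-Python sketch reproduces the OLS estimates and the scale 48,65 used in (2). It then computes the Breusch-Pagan statistic for b) as half the explained sum of squares of (2), recovered from its RSS and R². For c) it applies GLS under $Var(u_i) = \sigma^2 X_i$ by dividing every term of (1) by $\sqrt{X_i}$. The helper `ols2` is an illustrative name. With only 10 observations, the χ²(1) reference for the Breusch-Pagan statistic is an asymptotic approximation.

```python
import math

Y = [8, 91, 191, 22, 55, 32, 81, 176, 138, 31]
X = [4, 49, 100, 9, 25, 16, 36, 81, 64, 16]
N = len(Y)

def ols2(c, x, y):
    """Solve the 2x2 normal equations for y = a*c + b*x (c plays the role of the constant)."""
    scc = sum(ci * ci for ci in c)
    scx = sum(ci * xi for ci, xi in zip(c, x))
    sxx = sum(xi * xi for xi in x)
    scy = sum(ci * yi for ci, yi in zip(c, y))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = scc * sxx - scx ** 2
    return (sxx * scy - scx * sxy) / det, (scc * sxy - scx * scy) / det

# OLS of model (1)
a_ols, b_ols = ols2([1.0] * N, X, Y)
resid = [y - a_ols - b_ols * x for x, y in zip(X, Y)]
sigma2_tilde = sum(e * e for e in resid) / N      # the scale used in regression (2)

# b) Breusch-Pagan from auxiliary regression (2): ESS / 2
rss_aux, r2_aux = 1.1473, 0.89
bp = (rss_aux / (1 - r2_aux) - rss_aux) / 2       # compare with chi2(1)

# c) GLS with Var(u_i) = sigma^2 * X_i: divide every term by sqrt(X_i)
s = [math.sqrt(x) for x in X]
a_gls, b_gls = ols2([1 / si for si in s],
                    [x / si for x, si in zip(X, s)],
                    [y / si for y, si in zip(Y, s)])

print(a_ols, b_ols, round(sigma2_tilde, 2), round(bp, 2))
print(round(a_gls, 3), round(b_gls, 3))
```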

EXERCISE 10 (LE-2002.1) (Jun-2002)

The effect of an increase in Social Security contributions on the share of contributions paid by workers is to be estimated with a sample of 15 countries. Data for 1982 on Social Security contributions (SSC) and on the workers’ share of contributions (WSSC), both as a percentage of total tax revenue, are presented in the first two columns of the following table:

SSC WSSC û
Austria 31,9 13,5
Belgium 29,8 10,1 -0,08327
Denmark 2,8 1,5 -2,97434
France 43,2 11,5
Germany 36,2 16,1
Ireland 15,0 5,4 -1,65393
Italy 47,2 7,1
Japan 30,4 10,7 0,38986
Luxembourg 28,0 11,2 1,39732
The Netherlands 41,6 18,0
Portugal 28,5 10,8 0,89160
Spain 46,5 10,3
Switzerland 31,0 10,2 -0,23700
United Kingdom 16,9 7,6 0,14433
U.S.A. 27,7 10,8 1,06076

We consider the following model:

$WSSC_i = \beta_1 + \beta_2 SSC_i + u_i, \qquad i = 1, \dots, 15$

The OLS estimated model using data from the 15 countries is:

$\widehat{WSSC}_i = 3{,}8823 + 0{,}211442\, SSC_i \qquad (1)$
(t-stat.) (1,69) (3,01)

$\bar R^2 = 0{,}365, \qquad RSS = 132{,}7767$

a) Look at the table carefully: the OLS residuals $\hat u_i$ are displayed in the third column. Indicate the general expression used to obtain $\hat u_i$. Then complete the missing values in the table and in the following figure:

b) Once the graphic is completed, do you perceive any problem?


c) Using the following information, perform the Goldfeld and Quandt test. You must fill
in the missing information and indicate clearly all the elements of the testing procedure,
including the null and alternative hypotheses.
• First subsample
$\widehat{WSSC}_i = 0{,}463351 + 0{,}374431\, SSC_i \qquad (2)$

W SSCi 1,5

SSCi 2,8

û1 -0,011759 0,808758 0,25257

• Second subsample
$\widehat{WSSC}_i = 28{,}9928 - 0{,}395203\, SSC_i \qquad (3)$

W SSCi 13,5

SSCi 31,9

û2 1,413507 -0,420075 -3,239264


d) Given the previous evidence and the following information, estimate the coefficients of the model efficiently. Explain how this estimator is obtained and the assumptions needed for its efficiency.

Sums of squares and cross-products of the transformed variables:

                 WSSC_i/SSC_i   1/SSC_i     Constant_i = 1
WSSC_i/SSC_i     2,12814        0,3672255   5,47296
1/SSC_i                         0,1463262   0,8374455
Constant_i = 1                              15

where, for example, $\sum_i WSSC_i/SSC_i = 5{,}47296$.
e) With the estimator proposed in the previous item, test the null hypothesis that an increase in Social Security contributions would fall fully on the workers’ side, that is, $H_0: \beta_2 = 1$. Indicate all the assumptions needed for this test to be valid.
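This is a pure-Python sketch for a), d) and e). Part a) uses the residual formula $\hat u_i = WSSC_i - 3{,}8823 - 0{,}211442\, SSC_i$, checked against two residuals already printed in the table. For d), the sketch assumes $Var(u_i) = \sigma^2 SSC_i^2$, which is the transformation implied by the variables in the cross-product table. The model becomes $WSSC_i/SSC_i = \beta_1 (1/SSC_i) + \beta_2 + u_i/SSC_i$, and the normal equations are solved from the tabulated sums. Under normality, the t statistic in e) is compared with t(13).

```python
import math

# a) OLS residual: u_hat_i = WSSC_i - (3.8823 + 0.211442 * SSC_i)
b1_ols, b2_ols = 3.8823, 0.211442

def ols_residual(wssc, ssc):
    return wssc - (b1_ols + b2_ols * ssc)

print(round(ols_residual(10.1, 29.8), 5))   # Belgium: -0.08327, as in the table

# d) Transformed model WSSC/SSC = b1 * (1/SSC) + b2 + u/SSC (assumed Var(u_i) ~ SSC_i^2)
n = 15
s_yy = 2.12814          # sum (WSSC/SSC)^2
s_y_inv = 0.3672255     # sum (WSSC/SSC) * (1/SSC)
s_y = 5.47296           # sum WSSC/SSC
s_inv_inv = 0.1463262   # sum (1/SSC)^2
s_inv = 0.8374455       # sum 1/SSC

det = s_inv_inv * n - s_inv ** 2
b1 = (n * s_y_inv - s_inv * s_y) / det
b2 = (s_inv_inv * s_y - s_inv * s_y_inv) / det

# e) t statistic for H0: beta2 = 1 using the transformed-model residuals
rss = s_yy - (b1 * s_y_inv + b2 * s_y)
sigma2 = rss / (n - 2)
se_b2 = math.sqrt(sigma2 * s_inv_inv / det)
t_stat = (b2 - 1) / se_b2

print(round(b1, 3), round(b2, 3), round(t_stat, 1))
```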

EXERCISE 11 (LE-2003.7) (Sep-2003)

Consider the following regression model:

$Y_i = \beta_1 + \beta_2 X_i + u_i, \qquad i = 1, \dots, N$

where $X_i$ is nonstochastic, $u_i \sim N(0, \sigma_i^2)$, $E(u_i u_j) = 0$ for $i \neq j$, and $\sigma_i^2$ is an increasing function of $X_i$.

a) What problem exists in the previous model? How could it be detected? Explain carefully
the proposed test.
b) What are the consequences for the tests of hypotheses about β1 and β2 of using the estimator $\dfrac{\sum_i \hat u_i^2}{N-2}(X'X)^{-1}$ in the t or F statistics?

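A sample of 800 observations is available with the following information:

$\sum_i X_i = 330, \quad \sum_i X_i^2 = 144, \quad \sum_i \frac{1}{X_i} = 2058, \quad \sum_i \frac{1}{X_i^2} = 5683$
$\sum_i \frac{1}{\sqrt{X_i}} = 1273, \quad \sum_i Y_i = 2672, \quad \sum_i Y_i^2 = 9576$
$\sum_i X_i Y_i = 1108, \quad \sum_i \frac{Y_i}{X_i} = 6835, \quad \sum_i \frac{Y_i}{X_i^2} = 18755, \quad \sum_i \frac{Y_i}{\sqrt{X_i}} = 4239$
$\sum_i \hat u_i^2 = 660, \quad \sum_i \hat u_i^2 X_i^2 = 160, \quad \sum_i \hat u_i^2 X_i = 309$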
where ûi = Yi − β̂1 − β̂2 Xi are the residuals obtained from OLS estimation of β1 and β2 .

c) Obtain the OLS estimates of β1 and β2 .

d) If the White estimator has been used, how has the following estimate of the variance-covariance matrix of the OLS estimator of β1 and β2 been obtained? Indicate explicitly all the steps needed to reach this result.

$$\widehat{Var}(\hat\beta_{OLS})_{WHITE} = \begin{pmatrix} 0{,}04 & -0{,}11 \\ -0{,}11 & 0{,}28 \end{pmatrix}$$

e) Using the estimates in c) and d), test $H_0: \beta_2 = 0$ against $H_a: \beta_2 \neq 0$.

f) Assuming that $\sigma_i^2 = 4X_i^2$, how could you obtain an efficient estimator of β1 and β2? Explain the estimation procedure thoroughly.

g) Calculate the estimates of β1 and β2 with the efficient estimator, and its variance-covariance matrix.

h) Test $H_0: \beta_2 = 0$ against $H_a: \beta_2 \neq 0$ using the efficient estimator of β2.

i) Could we get different conclusions from the tests in e) and h)? Why?
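A pure-Python sketch that works from the sample sums alone. The sums are read as $\sum \hat u_i^2 X_i = 309$, $\sum \hat u_i^2 X_i^2 = 160$, $\sum Y_i/X_i = 6835$ and $\sum Y_i/X_i^2 = 18755$. With this reading, the White (HC0) formula reproduces the matrix given in d). For f) to h), dividing the model by $X_i$ gives $Y_i/X_i = \beta_1(1/X_i) + \beta_2 + u_i/X_i$, and the transformed disturbance has known variance 4, so the GLS covariance matrix is $4(X^{*\prime}X^*)^{-1}$. The helper names are illustrative.

```python
import math

N = 800
sx, sxx = 330.0, 144.0
sy, sxy = 2672.0, 1108.0
s_inv, s_inv2 = 2058.0, 5683.0           # sum 1/X, sum 1/X^2
s_y_x, s_y_x2 = 6835.0, 18755.0          # sum Y/X, sum Y/X^2
su2, su2x, su2x2 = 660.0, 309.0, 160.0   # sum u^2, u^2 X, u^2 X^2

def inv2(a, b, c, d):
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mult(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# c) OLS
XtX_inv = inv2(N, sx, sx, sxx)
b1 = XtX_inv[0][0] * sy + XtX_inv[0][1] * sxy
b2 = XtX_inv[1][0] * sy + XtX_inv[1][1] * sxy

# d) White (HC0): (X'X)^{-1} [sum u_i^2 x_i x_i'] (X'X)^{-1}
meat = [[su2, su2x], [su2x, su2x2]]
V_white = mult(mult(XtX_inv, meat), XtX_inv)
t_white = b2 / math.sqrt(V_white[1][1])          # e) ~ N(0,1) asymptotically

# f)-g) GLS with sigma_i^2 = 4 X_i^2: regress Y/X on 1/X and a constant
G_inv = inv2(s_inv2, s_inv, s_inv, N)
g1 = G_inv[0][0] * s_y_x2 + G_inv[0][1] * s_y_x
g2 = G_inv[1][0] * s_y_x2 + G_inv[1][1] * s_y_x
V_gls = [[4 * v for v in row] for row in G_inv]  # Var(u_i / X_i) = 4 is known
z_gls = g2 / math.sqrt(V_gls[1][1])              # h) ~ N(0,1) under H0

print([round(v, 2) for row in V_white for v in row])   # [0.04, -0.11, -0.11, 0.28]
print(round(b1, 3), round(b2, 3), round(t_white, 2))
print(round(g1, 3), round(g2, 3), round(z_gls, 2))
```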

EXERCISE 12 (LE-2004.5) (Sep-2004)

A database is available with information about the selling price and certain characteristics of 224 houses in two residential areas of Orange County in California (USA): Dove Canyon and Coto de Caza¹. Dove Canyon is a neighbourhood built around a golf course, with single-family tract homes on relatively small lots. Coto de Caza is a more upscale area; it is more rural, with large custom homes. The variables considered are:

salepric = sale price in thousands of dollars
sqft = living area in square feet
age = age of the house in years
city = 1 for Coto de Caza and 0 for Dove Canyon

¹ Source: Ramanathan, Ramu (1992), Introductory Econometrics with Applications.

The results of the Ordinary Least Squares (OLS) estimation of a model for the house selling price using this dataset are:

RESULTS A Dependent variable: salepric

VARIABLE COEFFICIENT STDERROR T STAT
const -440,312 35,3203 -12,466
sqft 0,252069 0,00815634 30,905
age 3,69805 3,02416 1,223
city 91,8038 21,7494 4,221
Mean of dependent variable = 642,929


Standard deviation of dep. var. = 371,376
Sum of squared residuals = 4,27804e+06
Standard error of residuals = 139,448
R-squared = 0,860905
Adjusted R-squared = 0,859008
F-statistic (3, 220) = 453,884

a) Write down the estimated theoretical model and comment on the results in terms of goodness of fit, significance and signs of the estimated coefficients.

b) Analyse the information provided by the following graphics and the auxiliary regression.
If you perform some test, describe all its elements. Which graphic is more informative and
why?


$\dfrac{\hat u_i^2}{RSS_A/224} = -5{,}94184 + 0{,}00172457\, sqft_i$
(-10,387) (12,727)
$N = 224, \qquad R^2 = 0{,}421826, \qquad RSS = 1478{,}52$

Figure 1: OLS residuals by observation i = 1, ..., 224
[Plot: residuals from the regression (= salepric − estimated salepric) against the observation index]

Figure 2: OLS residuals against variable sqft
[Plot: residuals from the regression (= salepric − estimated salepric) against sqft]

Now we show the results of the OLS estimation using a consistent estimator of the variance
and covariance matrix of the coefficients under heteroscedasticity.

RESULTS B Dependent variable: salepric

VARIABLE COEFFICIENT STDERROR T STAT

const -440,312 110,631 -3,980


sqft 0,252069 0,0279076 9,032
age 3,69805 5,15553 0,717
city 91,8038 26,3404 3,485

Mean of dependent variable = 642,929


Standard deviation of dep. var. = 371,376
Sum of squared residuals = 4,27804e+06
Standard error of residuals = 139,448
Unadjusted R-squared = 0,860905
Adjusted R-squared = 0,859008

c) Describe the changes between the results now shown (RESULTS B) and the former results (RESULTS A). What is the reason for those changes? Which results are more reliable, and for what purpose?

Finally, Generalized or Weighted Least Squares estimation is implemented using as weighting variable the inverse of the squared house size, that is, $1/sqft_i^2$.

RESULTS C

WLS estimates using the 224 observations 1-224


Dependent variable: salepric
Variable used as weight: sqft_inv = 1/sqft^2 (see Gretl help)

VARIABLE COEFFICIENT STDERROR T STAT

const -285,205 37,2121 -7,664


sqft 0,215569 0,00959143 22,475
age -0,549288 2,28001 -0,241
city 110,780 15,6896 7,061

Statistics based on the weighted data:

Sum of squared residuals = 0,150742


Standard error of residuals = 0,0261762
Unadjusted R-squared = 0,798817
Adjusted R-squared = 0,796073
F-statistic (3, 220) = 291,177

Statistics based on the original data:

Mean of dependent variable = 642,929


Standard deviation of dep. var. = 371,376
Sum of squared residuals = 4,73514e+06
Standard error of residuals = 146,708

d) Explain what weighted data and original data mean, and the differences between them. Why is the inverse of sqft squared used as the weighting variable?

e) Which results, A, B, or C are in your opinion the best? Why?
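The sketch below shows in pure Python how the reported numbers feed the tests in b) and the comparison in c). The auxiliary regression scales $\hat u_i^2$ by $RSS_A/224$, so half of its explained sum of squares is the Breusch-Pagan statistic, with ESS recovered from the reported RSS and R². The $N R^2$ variant is shown for comparison. The loop recomputes the t ratios of RESULTS A and B from the coefficients and the two sets of standard errors.

```python
N = 224

# b) Auxiliary regression of u_hat^2 / (RSS_A / 224) on sqft
rss_aux, r2_aux = 1478.52, 0.421826
ess_aux = rss_aux / (1 - r2_aux) - rss_aux
bp = ess_aux / 2       # Breusch-Pagan, ~ chi2(1) under H0 with normal errors
lm = N * r2_aux        # studentized N*R^2 variant, ~ chi2(1) under H0

# c) t ratios with conventional (A) and heteroscedasticity-robust (B) std. errors
coef = {"const": -440.312, "sqft": 0.252069, "age": 3.69805, "city": 91.8038}
se_a = {"const": 35.3203, "sqft": 0.00815634, "age": 3.02416, "city": 21.7494}
se_b = {"const": 110.631, "sqft": 0.0279076, "age": 5.15553, "city": 26.3404}
for name in coef:
    print(name, round(coef[name] / se_a[name], 3), round(coef[name] / se_b[name], 3))

print(round(bp, 1), round(lm, 1))
```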

EXERCISE 13 (LE-2005.1) (Jun-2005)

We are interested in analysing the relationship between aggregate health expenditure, $Y_i$, and aggregate income, $X_i$, both in billions of dollars, for 51 North American states²:

$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (1)$

² Ramanathan, R. (2002), Introductory Econometrics with Applications, data 3-2.

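The results of the OLS estimation are:

$\hat Y_i = 0{,}3256 + 0{,}1420\, X_i, \qquad R^2 = 0{,}999 \qquad (2)$
($\widehat{sd}(\hat\beta_i)$) (0,3197) (0,0019)
($\widehat{sd}(\hat\beta_i)_{White}$) (0,2577) (0,0031)

$\dfrac{\hat u_i^2}{\hat u'\hat u/T} = 0{,}113 + 0{,}008\, X_i + \hat\epsilon_i, \qquad R^2 = 0{,}3269, \qquad ESS = 55{,}89 \qquad (3)$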

Figure 3 shows the residuals against aggregate income.

Figure 3: OLS residuals against Income in (1)
[Plot: residuals against Income]

a) Explain how the residuals have been calculated and what Figure 3 has been drawn for.
Interpret that figure.

b) Having in mind Figure 3, perform the test you judge relevant.

c) Explain thoroughly which statistic you would use to test the significance of the variable Income. Perform the test, writing down all its elements.

d) Considering the results of model (1), the researcher decides to re-estimate the model assuming the following structure for the variance of the disturbances: $Var(u_i) = \sigma^2 X_i$. The following results are obtained:

WLS estimates using the 51 observations 1-51


Dependent variable: exphlth
Variable used as weight: inv_inc

VARIABLE COEFFICIENT STDERROR T STAT

const 0,104510 0,162476 0,643


income 0,144202 0,00259765 55,513

Statistics based on the weighted data:

Sum of squared residuals = 1,14534


Standard error of residuals = 0,152887
Unadjusted R-squared = 0,984348
Adjusted R-squared = 0,984029

i) Why is $Var(u_i) = \sigma^2 X_i$ chosen as the variance of the disturbances? Explain how the estimates have been obtained.
ii) Assuming normality for $u_i$, test the significance of the variable Income.

e) The researcher is not convinced by the results obtained with the function chosen for $Var(u_i)$ and wishes to re-estimate model (1) assuming that $Var(u_i) = a + bX_i$, where a and b are unknown.

i) Explain in detail how you would estimate the coefficients of model (1) under this assumption.
ii) Assuming $\hat\sigma_i^2 = \hat a + \hat b X_i$, perform the estimation described in the previous item with the following sample information:

$\sum \hat u_i^2 = 148{,}699, \quad \sum \hat u_i^2 X_i = 34945{,}67, \quad \sum (X_i/\hat\sigma_i)^2 = 196420{,}998, \quad \sum X_i/\hat\sigma_i^2 = 1608{,}337$
$\sum (1/\hat\sigma_i)^2 = 34{,}738, \quad \sum Y_i/\hat\sigma_i^2 = 236{,}139, \quad \sum Y_i X_i/\hat\sigma_i^2 = 28484{,}578, \quad \sum Y_i^2/\hat\sigma_i^2 = 4168{,}919$

iii) Test the significance of the explanatory variable.

f) What would you comment on the validity of the tests performed in c), d.ii) and e.iii)?
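A pure-Python sketch for the numerical parts. For b), the auxiliary regression (3) is scaled by $\hat u'\hat u/T$, so the Breusch-Pagan statistic is taken as ESS/2. For e.ii), the sums are read as $\sum (1/\hat\sigma_i)^2$, $\sum X_i/\hat\sigma_i^2$, $\sum (X_i/\hat\sigma_i)^2$, $\sum Y_i/\hat\sigma_i^2$ and $\sum Y_iX_i/\hat\sigma_i^2$, and feasible GLS solves the normal equations of the model divided by $\hat\sigma_i$. Treating $\hat\sigma_i^2$ as the true variances, which is an assumption, the transformed disturbances have unit variance, so $(X^{*\prime}X^*)^{-1}$ is used directly as the covariance matrix.

```python
import math

# b) Breusch-Pagan from auxiliary regression (3): ESS / 2
bp = 55.89 / 2                  # compare with chi2(1)

# e.ii) FGLS with sigma_i^2 = a + b X_i: OLS on Y/sigma, 1/sigma, X/sigma
s11 = 34.738        # sum (1/sigma_i)^2
s1x = 1608.337      # sum X_i / sigma_i^2
sxx = 196420.998    # sum (X_i/sigma_i)^2
s1y = 236.139       # sum Y_i / sigma_i^2
sxy = 28484.578     # sum Y_i X_i / sigma_i^2

det = s11 * sxx - s1x ** 2
b1 = (sxx * s1y - s1x * sxy) / det
b2 = (s11 * sxy - s1x * s1y) / det

# e.iii) Var(b2) = [(X*'X*)^{-1}]_22, transformed disturbances with unit variance
se_b2 = math.sqrt(s11 / det)
t_b2 = b2 / se_b2

# t ratios for Income with the OLS and White standard errors reported in (2)
t_ols = 0.1420 / 0.0019
t_white = 0.1420 / 0.0031

print(bp, round(b1, 3), round(b2, 4), round(t_b2, 1))
print(round(t_ols, 1), round(t_white, 1))
```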

EXERCISE 14 (LADE-2005.5) (Sep-2005)

The following regression model is proposed to analyse the effects of advertising spending, Xi ,
on the income of the restaurants, Yi , in a particular city:

$Y_i = \alpha + \beta X_i + u_i, \qquad u_i \sim NID(0, \sigma_u^2) \qquad (1)$

With a sample of 166 restaurants, data on the average income (in thousands of euros) and on the
monthly advertising spending (in hundreds of euros) are available with the restaurants grouped
by districts.

District 1 2 3 4 5 6 7
Yj 10 12 14 18 17 18 20
Xj 3 5 9 12 15 17 19
nj 9 4 36 16 81 4 16

where $\bar X_j = \frac{1}{n_j}\sum_{i \in B_j} X_i$, $\bar Y_j = \frac{1}{n_j}\sum_{i \in B_j} Y_i$, and $n_j$ denotes the number of restaurants in district $B_j$, $j = 1, 2, \dots, 7$.

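In addition, the following information is also available (all sums run over $j = 1, \dots, 7$):

$\sum \sqrt{n_j}\,\bar X_j = 366; \quad \sum \sqrt{n_j}\,\bar Y_j = 479; \quad \sum \sqrt{n_j}\,\bar X_j^2 = 5186; \quad \sum \sqrt{n_j}\,\bar Y_j^2 = 7909; \quad \sum \sqrt{n_j}\,\bar X_j \bar Y_j = 6257$

$\sum \frac{\bar X_j}{n_j} = 8{,}21; \quad \sum \frac{\bar Y_j}{n_j} = 11{,}59; \quad \sum \frac{\bar X_j^2}{n_j} = 116{,}09; \quad \sum \frac{\bar Y_j^2}{n_j} = 182{,}37; \quad \sum \frac{\bar X_j \bar Y_j}{n_j} = 138{,}73$

$\sum n_j \bar X_j = 2150; \quad \sum n_j \bar Y_j = 2699; \quad \sum n_j \bar X_j^2 = 30558; \quad \sum n_j \bar Y_j^2 = 44821; \quad \sum n_j \bar X_j \bar Y_j = 36461$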

a) Given that we only have information on the averages, which model could you use to
estimate α and β? Show the properties of the disturbances in that model.

b) Obtain efficient estimates of the parameters of the model and describe the estimator and its properties in detail.

c) Test if advertising spending has a positive marginal effect on income.

d) Without doing any calculations, how would you estimate the model proposed in a) if the variance of the disturbances in the original model (1) increases with advertising spending such that $Var(u_i) = \sigma_u^2 X_i$?
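Assuming model (1) holds as stated, the district-mean model $\bar Y_j = \alpha + \beta \bar X_j + \bar u_j$ has $Var(\bar u_j) = \sigma_u^2/n_j$. GLS then multiplies each district's equation by $\sqrt{n_j}$. The pure-Python sketch below rebuilds the $n_j$-weighted sums from the district table (they match the third row of sums above). It solves the normal equations and computes a one-sided t ratio for c), to be compared with the upper 5% point of t(5) under normality.

```python
import math

xbar = [3, 5, 9, 12, 15, 17, 19]
ybar = [10, 12, 14, 18, 17, 18, 20]
n = [9, 4, 36, 16, 81, 4, 16]

# Sums of the transformed (sqrt(n_j)-weighted) variables
S_n = sum(n)                                                   # 166
S_nx = sum(nj * x for nj, x in zip(n, xbar))                   # 2150
S_nxx = sum(nj * x * x for nj, x in zip(n, xbar))              # 30558
S_ny = sum(nj * y for nj, y in zip(n, ybar))                   # 2699
S_nxy = sum(nj * x * y for nj, x, y in zip(n, xbar, ybar))     # 36461
S_nyy = sum(nj * y * y for nj, y in zip(n, ybar))              # 44821

det = S_n * S_nxx - S_nx ** 2
alpha = (S_nxx * S_ny - S_nx * S_nxy) / det
beta = (S_n * S_nxy - S_nx * S_ny) / det

# c) one-sided test H0: beta = 0 vs H1: beta > 0, with 7 - 2 degrees of freedom
rss = S_nyy - alpha * S_ny - beta * S_nxy
sigma2 = rss / (len(n) - 2)
se_beta = math.sqrt(sigma2 * S_n / det)
t_beta = beta / se_beta

print(round(alpha, 3), round(beta, 4), round(t_beta, 2))
```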

EXERCISE 15 (LE-2008.5) (Sep-2008. Final examination. Written exam.)

A travel agency in Chicago wants to analyse whether the distance travelled by families to their chosen vacation destinations differs according to the number of children in the family. For that purpose it has a sample of 200 households in Chicago interviewed in 2007³. The following model is specified:

$Miles_i = \beta_1 + \beta_2 Income_i + \beta_3 age_i + \beta_4 kids_i + u_i, \qquad i = 1, \dots, 200 \qquad (1)$


³ These data come from the file vacation.dat of the book Undergraduate Econometrics by Hill, Griffiths and Judge (2001).
where Miles is the number of miles travelled by a household on vacation in one year, Income is annual income in thousands of dollars, age is the average age of the adult members of the household and kids is the number of children under 16 in the household.

A first estimation by OLS gives:

$\widehat{Miles}_i = -391{,}55 + 14{,}201\, Income_i + 15{,}741\, age_i - 81{,}826\, kids_i \qquad (2)$
($\widehat{s.d.}(\hat\beta_{OLS})$) (169,8) (1,80) (3,757) (27,13)
$R^2 = 0{,}340605, \qquad RSS = 40099000$

Figure 4: OLS residuals on Income and Age
[Two plots: residuals of the regression (= Miles − estimated Miles) against Income, and against age]

a) What do these plots suggest? Comment each of them in detail.

b) After grouping the observations of all variables into two groups according to a decreasing
sorting of the variable Income, and estimating the above model (1) by OLS for each group
separately, the following results are obtained:

First subsample: OLS estimates using 80 observations 1-80


Dependent variable: miles

Variable Coefficient std. error t-ratio p-value


const −129,22 615,610 −0,2099 0,8343
Income 13,1490 6,14562 2,1396 0,0356
age 13,3666 7,59215 1,7606 0,0823
kids −114,18 52,9888 −2,1549 0,0343

Residual Sum of Squares 2,42765e+07


R2 0,116112

Second subsample: OLS estimates using 80 observations 121-200


Dependent variable: miles

Variable coefficient std. error t-ratio p-value
const −339,64 220,160 −1,5427 0,1271
Income 9,68801 4,01043 2,4157 0,0181
age 18,6511 3,87408 4,8143 0,0000
kids −66,026 29,8963 −2,2085 0,0302

Residual Sum of Squares 7,04816e+06


R2 0,308962

Perform a test to check whether what you answered in a) is statistically significant. You must point out clearly all the elements of the test, including the null and the alternative hypotheses.

c) If the result of the performed test gives support to reject the null hypothesis, what would
you change in the results in (2) if you are unwilling to change the estimation method?
Why? Explain in detail.

An alternative method of estimation to OLS has also been used in order to improve the
efficiency in the estimation of the β coefficients. Using the Gretl software the following
results have been obtained:

WLS estimates using 200 observations 1-200


Dependent variable: miles
Variable used as weight: 1/Income

Variable coefficient std. error t-ratio p-value


const −408,37 145,717 −2,8025 0,0056
Income 13,9705 1,64821 8,4762 0,0000
age 16,3483 3,42222 4,7771 0,0000
kids −78,363 24,7355 −3,1680 0,0018

Statistics based on the weighted data:

Residual Sum of Squares 580616,


Standard deviation of the residuals (σ̂) 54,4272
R2 0,390722
Adjusted R̄2 0,381397
F (3, 196) 41,8975

Statistics based on the original data:

Mean of the dependent variable 1054,23


S. D. of the dependent variable 552,799
Residual Sum of Squares 4,01134e+07
Standard deviation of the residuals (σ̂) 452,394

d) Fill in the blanks in the following expressions concerning the disturbance term of the model
and the estimation method used to get these results.

$E(u_i) =$ ........     $E(u_i^2) =$ ........     $E(u_i u_j) =$ ........

$E(uu') =$ [ .................................... ]   (........ × ........)

Estimation criterion: ........

$RSS = \sum_{i=\dots}^{\dots} \left(Y_i^* - \hat\beta_1 X_{1i}^* - \hat\beta_2 X_{2i}^* - \hat\beta_3 X_{3i}^* - \hat\beta_4 X_{4i}^*\right)^2$

$Y_i^* =$ ........;  $X_{1i}^* =$ ........;  $X_{2i}^* =$ ........;  $X_{3i}^* =$ ........;  $X_{4i}^* =$ ........

$\hat\beta_{\dots} =$ ( .................... )$^{-1}$ ( .................... )
e) If you had to test H0 : β2 = 10, how would you do it? Explain your answer in detail.
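A pure-Python sketch of the Goldfeld-Quandt statistic in b), using the two reported RSS values with 80 - 4 = 76 degrees of freedom each. The first, high-income subsample goes in the numerator because of the decreasing sort. For e), it also computes the t ratio for $H_0: \beta_2 = 10$ with each set of reported standard errors. Which ratio is valid depends on the heteroscedasticity diagnosis, so both are shown only for comparison.

```python
# b) Goldfeld-Quandt: 80 observations and 4 coefficients in each subsample
rss_high, rss_low = 2.42765e7, 7.04816e6
df = 80 - 4
gq = (rss_high / df) / (rss_low / df)    # ~ F(76, 76) under H0

# e) t ratios for H0: beta2 = 10
t_ols = (14.201 - 10) / 1.80       # conventional OLS std. error from (2)
t_wls = (13.9705 - 10) / 1.64821   # WLS std. error

print(round(gq, 3), round(t_ols, 3), round(t_wls, 3))   # 3.444 2.334 2.409
```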

AUTOCORRELATION

EXERCISE 16 (PV-E.2) (Feb-1993)

A company wants to analyse the relationship between its consumption of petrol ($C_t$) and the price of petrol ($P_t$). Using annual data, the following OLS estimation is obtained:
$\hat C_t = 5278.44 - 23.36 P_t$
Year ût Year ût
1980 -112.93 1986 58.55
1981 -74.53 1987 155.71
1982 9.46 1988 43.67
1983 33.75 1989 -19.90
1984 58.49 1990 -85.66
1985 59.33 1991 -125.96

a) Test the hypothesis of no autocorrelation using the Durbin-Watson procedure. Explain the implications of the results of this test for the method of estimation used in this model.
b) If the true relationship between $C_t$ and $P_t$ is
$C_t = \beta_1 + \beta_2 P_t + \beta_3 P_t^2 + u_t$,
would you maintain the same conclusions about the properties of the OLS estimator?
c) The company closes for vacation in August. If you had monthly data, propose a model that captures this fact in the relationship between $C_t$ and $P_t$.
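A pure-Python sketch of the Durbin-Watson statistic for a), computed from the residuals in the table as $d = \sum_{t=2}^{T}(\hat u_t - \hat u_{t-1})^2 / \sum_{t=1}^{T}\hat u_t^2$. The value is then compared with the tabulated bounds $d_L$ and $d_U$ for T = 12 and one regressor. The implied $\hat\rho \approx 1 - d/2$ is only a rough approximation.

```python
# OLS residuals for 1980-1991 from the table
u = [-112.93, -74.53, 9.46, 33.75, 58.49, 59.33,
     58.55, 155.71, 43.67, -19.90, -85.66, -125.96]

num = sum((u[t] - u[t - 1]) ** 2 for t in range(1, len(u)))
den = sum(e ** 2 for e in u)
dw = num / den
rho_hat = 1 - dw / 2    # rough implied first-order autocorrelation

print(round(dw, 3), round(rho_hat, 2))
```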

EXERCISE 17 (LADE-1999.2) (Jun-1999)

Consider the following model: $Y_t = \alpha + \beta X_t + u_t$ with $u_t = \rho u_{t-1} + \varepsilon_t$, $\varepsilon_t \sim NID(0, \sigma_\varepsilon^2)$.


The following data are available:

Yt 3 3 4 3 2 2
Xt 1 2 3 4 5 6

a) If ρ = 0.7, estimate the parameters α and β using Generalized Least Squares (GLS).
Explain in detail all the steps.
b) Test the hypothesis $H_0: \beta = 1$ at the 5% significance level.
c) Assuming that the sample size is large enough, how would you estimate the parameters of the model if ρ were unknown? Explain the whole process in detail.
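A pure-Python sketch of GLS with known ρ = 0.7. It assumes the Prais-Winsten form of the transformation: the first observation is scaled by $\sqrt{1-\rho^2}$, and the remaining ones are quasi-differenced as $Y_t - \rho Y_{t-1}$, including the constant column. Dropping the first observation instead (Cochrane-Orcutt) would give somewhat different numbers. The t test for b) uses T - 2 = 4 degrees of freedom.

```python
import math

Y = [3, 3, 4, 3, 2, 2]
X = [1, 2, 3, 4, 5, 6]
rho = 0.7

# Prais-Winsten transformation of the constant, X and Y
k = math.sqrt(1 - rho ** 2)
c = [k] + [1 - rho] * (len(Y) - 1)
x = [k * X[0]] + [X[t] - rho * X[t - 1] for t in range(1, len(X))]
y = [k * Y[0]] + [Y[t] - rho * Y[t - 1] for t in range(1, len(Y))]

# OLS on the transformed data (2x2 normal equations)
scc = sum(v * v for v in c)
scx = sum(a * b for a, b in zip(c, x))
sxx = sum(v * v for v in x)
scy = sum(a * b for a, b in zip(c, y))
sxy = sum(a * b for a, b in zip(x, y))
det = scc * sxx - scx ** 2
alpha = (sxx * scy - scx * sxy) / det
beta = (scc * sxy - scx * scy) / det

# b) t test of H0: beta = 1
resid = [yi - alpha * ci - beta * xi for ci, xi, yi in zip(c, x, y)]
sigma2 = sum(e * e for e in resid) / (len(Y) - 2)
se_beta = math.sqrt(sigma2 * scc / det)
t_stat = (beta - 1) / se_beta

print(round(alpha, 3), round(beta, 3), round(t_stat, 2))
```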

EXERCISE 18 (PV-E.44) (Sep-1997)

The following table shows data on wages (Y) and hours worked (X) for the employees of a company. It also records whether each worker is a man (M) or a woman (W):

Y 170 180 165 165 105 95 100 90
X 40 50 30 40 50 35 40 35
Gender M M M M W W W W

$\sum Y_i = 1070, \quad \sum Y_i^2 = 153900, \quad \sum X_i = 320, \quad \sum X_i^2 = 13150, \quad \sum X_i Y_i = 43075$

In order to explain the wages of the employees, a researcher proposes the following model: Yi = α + βXi + ui, where ui ∼ NID(0, σu²).

a) Estimate by OLS the parameters of the model and check the significance of the explanatory
variable X.

b) Is there any evidence of AR(1) autocorrelation in the disturbances?

c) Another researcher thinks that gender is a relevant variable to explain the salary. Propose
and estimate a model that includes this hypothesis and test it.

d) The Durbin-Watson test statistic is d = 2.2. Do you find evidence of AR(1) autocorrelation
in the disturbances of this model? Relate your answer to the result obtained in b).

e) Is Xi significant? Relate your answer to the result obtained in a).
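A minimal NumPy sketch for parts a) to e): it fits the wage equation with and without a gender dummy, computes the t-ratio of X in each case and the Durbin-Watson statistic of the first model with the data in the order listed. Coding men as 1 and women as 0 is our own choice, and `ols` is a hypothetical helper.

```python
import numpy as np

Y = np.array([170, 180, 165, 165, 105, 95, 100, 90], dtype=float)
X = np.array([40, 50, 30, 40, 50, 35, 40, 35], dtype=float)
D = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)  # 1 = man, 0 = woman (our coding)


def ols(y, regressors):
    """OLS with a constant: returns coefficients, standard errors and residuals."""
    Xmat = np.column_stack([np.ones(len(y))] + regressors)
    XtX_inv = np.linalg.inv(Xmat.T @ Xmat)
    b = XtX_inv @ Xmat.T @ y
    e = y - Xmat @ b
    s2 = e @ e / (len(y) - Xmat.shape[1])     # unbiased estimate of sigma_u^2
    return b, np.sqrt(np.diag(s2 * XtX_inv)), e


b_a, se_a, e_a = ols(Y, [X])        # a) wages on hours only
b_c, se_c, e_c = ols(Y, [X, D])     # c) adding a gender dummy
dw_a = np.sum(np.diff(e_a) ** 2) / np.sum(e_a ** 2)   # b) data in the order listed
print(f"a) beta = {b_a[1]:.4f}, t = {b_a[1] / se_a[1]:.2f}, DW = {dw_a:.3f}")
print(f"c) beta = {b_c[1]:.4f}, t = {b_c[1] / se_c[1]:.2f}, gender = {b_c[2]:.2f}")
```

Because the observations are sorted by gender, an omitted gender effect shows up as long runs of residuals with the same sign, which is worth keeping in mind when relating b) to d).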

EXERCISE 19 (LADE-1998.5) (Sep-1998)

The relationship between the sales of a certain product (Y) and its price (X) is analysed using the following model:

$$Y_t = \alpha + \beta X_t + u_t \qquad (1)$$

We have the following data:

t 1 2 3 4 5 6
Y 27 32 25 31 30 32
X 9 12 8 10 12 11
û -0,5 0 -1 2 -2 1,5

$$\hat u = Y - X\hat\beta_{OLS}, \qquad \hat\beta_{OLS} = (X'X)^{-1}X'Y$$

a) Is there any evidence of first order autocorrelation in model (1)? Base your answer on
some formal test.

The following model has also been estimated by OLS:
$$Y_t - \rho^* Y_{t-1} = \alpha(1 - \rho^*) + \beta(X_t - \rho^* X_{t-1}) + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2) \qquad (2)$$
for different values of ρ∗, resulting in the following Residual Sums of Squares (RSS):

ρ∗     0,9   0,8   0,7   0,6   0,5   0,4   0,3   0,2   0,1   0
RSS   34,2  30,9  27,8  24,9  22,2  19,6  17,2  15,1  13,0  11,1

ρ∗    -0,1  -0,2  -0,3  -0,4  -0,5  -0,6  -0,7  -0,8  -0,9  -0,99
RSS    9,4   7,8   6,5   5,3   4,2   3,3   2,6   2,1   1,7   2,1

b) Given the information above, calculate the estimates of ρ, α and β by Hildreth-Lu.


c) What are the properties of the estimators in the previous question?
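Part b) reduces to a grid search. The sketch below (Python with NumPy; the helper `hildreth_lu` is hypothetical) first picks the tabulated ρ∗ with the smallest RSS, then shows how the same search could be run from the data by regressing Yt − ρ∗Yt−1 on (1 − ρ∗) and Xt − ρ∗Xt−1 for t = 2, ..., 6, as in model (2). A finer grid around the minimum could refine the choice.

```python
import numpy as np

# Tabulated RSS of model (2) for each trial value of rho*
rho_grid = np.array([0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0,
                     -0.1, -0.2, -0.3, -0.4, -0.5, -0.6, -0.7, -0.8, -0.9, -0.99])
rss_table = np.array([34.2, 30.9, 27.8, 24.9, 22.2, 19.6, 17.2, 15.1, 13.0, 11.1,
                      9.4, 7.8, 6.5, 5.3, 4.2, 3.3, 2.6, 2.1, 1.7, 2.1])
rho_hl = rho_grid[np.argmin(rss_table)]      # Hildreth-Lu keeps the minimum-RSS value


def hildreth_lu(y, x, grid):
    """For each rho, OLS of y_t - rho*y_{t-1} on (1 - rho) and x_t - rho*x_{t-1},
    t = 2..T (model (2)); returns (rho, alpha, beta, rss) with the smallest RSS."""
    best = None
    for rho in grid:
        y_s = y[1:] - rho * y[:-1]
        Z = np.column_stack([np.full(len(y_s), 1.0 - rho), x[1:] - rho * x[:-1]])
        coef, *_ = np.linalg.lstsq(Z, y_s, rcond=None)
        rss = float(np.sum((y_s - Z @ coef) ** 2))
        if best is None or rss < best[3]:
            best = (float(rho), float(coef[0]), float(coef[1]), rss)
    return best


Y = np.array([27.0, 32.0, 25.0, 31.0, 30.0, 32.0])
X = np.array([9.0, 12.0, 8.0, 10.0, 12.0, 11.0])
print("rho from the table:", rho_hl)
print("grid search on the data:", hildreth_lu(Y, X, rho_grid))
```

Because the regressor multiplying α is (1 − ρ∗), the estimated coefficient is α itself rather than α(1 − ρ∗), so no back-transformation is needed.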

EXERCISE 20 (LADE-2001.2) (Jun-2001)

In order to analyse the sales structure of a certain car model, the following model is specified:
$$Y_t = \beta_1 + \beta_2 P_t + \beta_3 Q_t + \beta_4 X_t + u_t \qquad (1)$$
where Yt = income obtained from the sales of the car, Pt = car price, Qt = average price of the other models with similar characteristics, and Xt = income per capita. With a sample of 100 observations, the model has been estimated by OLS, with the following results:
$$\hat Y_t = 1{,}5 + 0{,}1\,P_t - 0{,}5\,Q_t + 0{,}7\,X_t \qquad (2)$$
(dev)   (0,2)   (0,3)   (0,15)   (0,05)

R² = 0,87   RSS = 215

a) Test the significance of Pt, assuming that ut ∼ iid(0, σu²). Comment on the result obtained.
b) Also test for first-order autocorrelation in the disturbances, using one of the following results:
ût = 0,2 + 0,3 ût−1 + 0,15 Pt + 0,12 Qt + 0,01 Xt + v̂1t    R² = 0,15   ESS = 75
ût = 0,35 ût−1 + 0,22 ût−2 + 0,1 Pt + 0,16 Qt + 0,04 Xt + v̂2t    R² = 0,18   ESS = 74
ût = 0,3 + 0,24 ût−1 + v̂3t    R² = 0,05   ESS = 56
û²t/σ̂² = 0,13 + 0,2 û²t−1/σ̂² + 0,19 Pt + 0,02 Qt + 0,09 Xt + v̂4t    R² = 0,35   ESS = 98
Is the result of the test performed in a) affected by the outcome of this test?
c) If ut = ρut−1 + εt, where εt ∼ iid(0, σε²) and |ρ| < 1 is unknown, explain in detail how you would estimate the parameters of model (1) in the best possible way.
d) In the context described in c), how would you perform the significance test of Pt ? Explain.
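The arithmetic behind parts a) and b) can be sketched with SciPy. We assume the first auxiliary regression is used for the Breusch-Godfrey LM statistic T·R² with T = 100; some textbooks use the number of observations in the auxiliary regression (99 here) instead, which does not change the conclusion in this case.

```python
from scipy import stats

# a) t-ratio of P_t in (2); only meaningful if the iid assumption holds
T, k = 100, 4
t_P = 0.1 / 0.3
t_crit = stats.t.ppf(0.975, df=T - k)

# b) Breusch-Godfrey LM test for AR(1), first auxiliary regression
r2_aux = 0.15
lm = T * r2_aux                          # ~ chi2(1) under H0: no first-order autocorrelation
chi2_crit = stats.chi2.ppf(0.95, df=1)
p_value = stats.chi2.sf(lm, df=1)
print(f"t = {t_P:.3f} (crit {t_crit:.3f}); LM = {lm:.2f} (crit {chi2_crit:.2f}, p = {p_value:.4f})")
```

The auxiliary regression must contain all the original regressors plus the lagged residual, which is why the regression without Pt, Qt and Xt is not the appropriate choice.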

EXERCISE 21 (LADE-2001.4) (Sep-2001)

Consider the model Yt = α + βXt + ut and the following data:

t Y X
1 2 -3
2 10,2 5
3 17,9 13
4 2,3 -3
5 10 5
6 18,2 13
7 -5,7 -11
8 -14,1 -19
Sums 40,8 0

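The OLS estimates are:
$$\begin{pmatrix}\hat\alpha \\ \hat\beta\end{pmatrix} = \begin{pmatrix} T & \sum X_t \\ \sum X_t & \sum X_t^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum Y_t \\ \sum X_t Y_t \end{pmatrix} = \begin{pmatrix} 8 & 0 \\ 0 & 888 \end{pmatrix}^{-1} \begin{pmatrix} 40{,}8 \\ 888 \end{pmatrix} = \begin{pmatrix} 5{,}1 \\ 1 \end{pmatrix}$$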

a) Use some graphical method to look for evidence of autocorrelation.

b) Perform a test to check if ut follows a first order autoregressive process. State clearly the
null and the alternative hypothesis, the test statistic and the decision rule.

c) Estimate ρ, assuming that the disturbances follow a first-order autoregressive process, ut = ρ ut−1 + εt, where εt ∼ iid(0, σε²) and |ρ| < 1.

d) Making use of the previous result, estimate the parameters α and β by FGLS.

e) Is the variable X relevant to explain Y ? Use a formal test, specifying clearly the null and
the alternative hypotheses and the distribution of the test statistic.
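Parts c) and d) can be sketched in NumPy as follows. We estimate ρ by regressing the OLS residuals on their first lag (no constant) and then apply FGLS in its Cochrane-Orcutt form, dropping the first observation; both choices are ours, and estimating ρ from 1 − d/2 or keeping the first observation (Prais-Winsten) are common alternatives.

```python
import numpy as np

Y = np.array([2.0, 10.2, 17.9, 2.3, 10.0, 18.2, -5.7, -14.1])
X = np.array([-3.0, 5.0, 13.0, -3.0, 5.0, 13.0, -11.0, -19.0])
T = len(Y)

# OLS and residuals
Xmat = np.column_stack([np.ones(T), X])
b_ols = np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ Y)          # (alpha, beta)
u = Y - Xmat @ b_ols

# Durbin-Watson statistic and rho from the regression of u_t on u_{t-1}
dw = np.sum(np.diff(u) ** 2) / np.sum(u ** 2)
rho_hat = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)

# FGLS, Cochrane-Orcutt form: quasi-differences for t = 2..T
y_s = Y[1:] - rho_hat * Y[:-1]
Z = np.column_stack([np.full(T - 1, 1.0 - rho_hat), X[1:] - rho_hat * X[:-1]])
b_fgls = np.linalg.solve(Z.T @ Z, Z.T @ y_s)                 # (alpha, beta)
print(f"OLS {b_ols}, DW = {dw:.3f}, rho = {rho_hat:.3f}, FGLS {b_fgls}")
```

With only eight observations the FGLS estimator's desirable properties are asymptotic, so the resulting estimates should be read with caution.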

EXERCISE 22 (LE-2002.5) (Sep-2002)

Consider the following yearly observations (the first three columns) of the variables Consumption
(Ct ) and National Income (Rt ):

Obs. C R Cb û
1 8,547 11,0 8,0483680 0,498632
2 8,942 13,5 9,7986580 -0,856658
3 10,497 14,0 10,148716 0,348284
4 10,173 14,9 10,778820 -0,605820
5 11,997 15,1 10,918843 1,078157
6 10,729 18,0 12,949180 -2,220180
7 12,750 18,8 13,509273 -0,759273
8 15,611 19,1 13,719307 1,891693
9 13,545 21,0 15,049528 -1,504528
10 17,843 21,2 15,189551
11 21,610 34,0 24,151036
12 25,473 34,3 24,361070
13 24,434 35,0 24,851152
14 28,274 38,0 26,951500

The OLS estimates of the consumption function

Ct = β1 + β2 Rt + ut

are:

$$\hat C_t = 0{,}347092 + 0{,}700116\,R_t \qquad (1)$$
(t-stat.)   (0,31)   (14,61)

R̄² = 0,942   RSS = 30,6381

a) The last column in the table shows the OLS residuals. Complete that column and the residual time plot shown below. In view of this plot, do you find evidence of any problem?

b) Obtain the value of the Durbin-Watson statistic and perform the corresponding test. Indicate all the elements of the test, including the null and the alternative hypotheses.

c) Perform the Breusch-Godfrey test using the following information:
$$\hat u_t = -0{,}5679 + 0{,}0198\,R_t - 0{,}75\,\hat u_{t-1} + \hat\omega_t \qquad R^2 = 0{,}433 \qquad (2)$$
(t-stat.)   (−0,603)   (0,0385)   (−3,338)

Indicate all the elements of the test, including the null and the alternative hypothesis.

d) Explain the consequences of the evidence found in the previous sections on:

i) the finite sample properties of the estimator of the parameters of the model. Prove
these properties.
ii) the inference based on the t-statistics shown in equation (1).

e) Would your answer to the previous section change if the detected problem were a consequence of omitting a relevant variable? Explain in detail.

f) Consider the following information and fill in all the missing entries (indicated with dots).

ρ̂      -0,99  -0,9  -0,8  -0,7  -0,6  -0,5  -0,4  -0,3  -0,2  -0,1   0,0   0,1
RSS∗    15,9  14,8  14,2  14,1  14,7  15,8  17,5  19,9  22,8  26,2  30,3  34,9

where
$$RSS^* = \sum_{t=\ldots}^{t=\ldots}\left(Y_t^* - \hat\beta_1 X_{1t}^* - \hat\beta_2 X_{2t}^*\right)^2 \qquad (3)$$
Yt∗ = Ct − ρ̂Ct−1;  X∗1t = ....................;  X∗2t = ....................
$$\begin{pmatrix}\hat\beta_1 \\ \hat\beta_2\end{pmatrix} = \begin{pmatrix} \ldots & \ldots \\ \ldots & \ldots \end{pmatrix}^{-1} \begin{pmatrix} \ldots \\ \ldots \end{pmatrix}$$

i) What estimation method is being used here?


ii) How would you obtain the final estimates of β1 and β2 using this method? Indicate
the chosen value for ρ̂ and the formula to obtain the estimators of β1 and β2 . What
are the properties of these estimators?
iii) How would you test H0 : β2 = 1? Indicate all the elements of the test statistic and
the decision rule.
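A NumPy sketch for parts a) to c): the residuals are C − Ĉ from the table, and the Breusch-Godfrey statistic is computed as (T − 1)R² because one observation is lost to the lag in regression (2); using T·R² instead is also common and leads to the same decision here.

```python
import numpy as np

C = np.array([8.547, 8.942, 10.497, 10.173, 11.997, 10.729, 12.750,
              15.611, 13.545, 17.843, 21.610, 25.473, 24.434, 28.274])
C_hat = np.array([8.0483680, 9.7986580, 10.148716, 10.778820, 10.918843,
                  12.949180, 13.509273, 13.719307, 15.049528, 15.189551,
                  24.151036, 24.361070, 24.851152, 26.951500])

u_hat = C - C_hat                        # completes the residual column
rss = np.sum(u_hat ** 2)                 # compare with RSS = 30,6381 in (1)
dw = np.sum(np.diff(u_hat) ** 2) / rss   # b) Durbin-Watson statistic

# c) Breusch-Godfrey LM statistic from regression (2): one observation lost to the lag
T, r2_aux = len(C), 0.433
lm = (T - 1) * r2_aux                    # compare with the chi2(1) 5% value, 3.84
print(np.round(u_hat[9:], 6), round(rss, 4), round(dw, 3), round(lm, 3))
```

A d above 2 is tested against 4 − dL and 4 − dU, the mirror image of the usual bounds, since the alternative of interest is then negative autocorrelation.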

EXERCISE 23 (LE-2003.5) (Jun-2003)

Consider the following model for the supply of sugar cane in Bangladesh:

ln(At ) = α + β ln(Pt ) + ut (1)

where A is the area dedicated to the plantation of cane and P is the market price of cane. With
34 yearly observations for A and P we obtain the OLS estimated model:
$$\widehat{\ln(A_t)} = 6{,}11 + 0{,}97\,\ln(P_t) \qquad R^2 = 0{,}706 \qquad (2)$$
(dev)   (0,17)   (0,11)

Furthermore, the following graphs have been obtained:

[Figure: (a) Data, log(A) plotted against log(P); (b) OLS residuals plotted against time (Year).]

and the following regressions based on the OLS residuals û:

ût = −0,02 + 0,012 ln(Pt) + 0,34 ût−1    R² = 0,116   RSS = 2,7
ût = −0,38 + 0,01 t − 0,18 ln(Pt) + 0,32 ût−1    R² = 0,13   RSS = 2,61
ê²t = 1,32 − 0,02 t    R² = 0,023   RSS = 46,48
ê²t = 5,20 − 0,1 t + 1,74 ln(Pt)    R² = 0,10   RSS = 42,76
ê²t = 5,74 − 0,11 t + 1,87 ln(Pt) − 0,18 ût−1    R² = 0,13   RSS = 41,21
êt = −0,22 + 0,01 t    R² = 0,001   RSS = 378,62
êt = −3,59 + 0,08 t − 1,51 ln(Pt)    R² = 0,009   RSS = 375,82
êt = 0,51 − 0,009 t + 0,17 ln(Pt) − 0,18 êt−1    R² = 0,13   RSS = 0,33
with êt = ût/σ̃ and σ̃² = Σt û²t/34.

a) What information can be extracted from panel (a), the plot of the data?

b) What information can be extracted from panel (b), the plot of the OLS residuals?

c) Now, we want to check if the variances of the disturbances change over time. Perform an
appropriate test for this hypothesis, specifying all its elements.

d) Test if there exists autocorrelation in the model.

The following FGLS estimates have also been obtained:


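$$\widehat{\ln(A_t)} = 6{,}12 + 0{,}97\,\ln(P_t) \qquad RSS = 3{,}052 \qquad \hat\sigma_t = 0{,}30/\sqrt{t} \qquad (3)$$
(dev)   (0,18)   (0,14)

$$\widehat{\ln(A_t)} = 6{,}82 + 1{,}31\,\ln(P_t) \qquad RSS = 5{,}620 \qquad \hat\sigma_t = 5{,}066 \times t \qquad (4)$$
(dev)   (0,29)   (0,12)

$$\widehat{\ln(A_t)} = 6{,}09 + 0{,}94\,\ln(P_t) \qquad RSS = 2{,}642 \qquad \hat u_t = 0{,}34\,\hat u_{t-1} + e_t \qquad (5)$$
(dev)   (0,24)   (0,16)

$$\widehat{\ln(A_t)} = 6{,}13 + 0{,}98\,\ln(P_t) \qquad RSS = 2{,}532 \qquad \hat u_t = 0{,}36\,\hat u_{t-1} + 0{,}002\,\hat u_{t-2} + e_t \qquad (6)$$
(dev)   (0,25)   (0,17)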

e) Explain how you would test whether the price elasticity is zero, stating clearly the estimator you use and how it has been obtained. Use the information above to perform the test.

EXERCISE 24 (LE-2005.4) (Sep-2005)

In order to estimate a Cobb-Douglas production function for the farming sector in the U.S.A., a database⁴ of yearly data for the period 1948-1993 is available on the following index variables (1982 = 100 for all of them):

• Yt = farm output

• Lt = farm labour

• EXt = size of the farm real estate

• Kt = expenditure in durable equipment (machinery stock)

The following model is specified, where all the variables are in logarithms

Yt = β1 + β2 Lt + β3 EXt + β4 Kt + ut (1)

The results of the Ordinary Least Squares estimation are:

$$\hat Y_t = 4{,}112 - 0{,}739\,L_t + 1{,}063\,EX_t - 0{,}233\,K_t \qquad (2)$$
(dev(β̂i))   (1,286)   (0,039)   (0,377)   (0,077)
R² = 0,974   DW = 1,304
$$\hat u_t = -0{,}3215 - 0{,}0068\,L_t + 0{,}084\,EX_t - 0{,}007\,K_t + 0{,}349\,\hat u_{t-1} + \hat w_t \qquad (3)$$
R² = 0,1225

Figure 5 shows the series of OLS residuals.


⁴ Ramanathan, R. (2002), Introductory Econometrics with Applications, data 9-5.gdt

Figure 5: OLS residuals for model (2)
[Plot: OLS residuals of the regression against year.]

a) Explain how the residuals have been calculated. What information can be extracted from
Figure 5?

b) Perform the autocorrelation tests you consider relevant using all the information provided.
Explain in detail.

c) Is it reliable to test the significance of the farm labour factor using the information provided in (2)? Why? How should the test statistic be modified if the OLS estimator is still used to estimate the parameter β2?

Not convinced by the estimation of model (1), the econometrician re-estimates the production function by the Hildreth-Lu method. The results (obtained with the Gretl software) are:

Model 1: Hildreth-Lu estimates using the 45 observations 1949-1993
Dependent variable: Y

VARIABLE   COEFFICIENT   STDERROR    T STAT    P-VALUE
const        3,70258     1,30555      2,836    0,007064 ***
L           -0,741430    0,0434648  -17,058    0,000010 ***
EX           1,14724     0,378590     3,030    0,004219 ***
K           -0,224659    0,0906423   -2,479    0,017399 **

d) Explain what Figure 6 shows. What does it mean that the RSS reaches its minimum at ρ∗ = 0.35?

Figure 6: Hildreth-Lu RSS function. The RSS is minimum at ρ∗ = 0,35
[Plot: RSS against rho, for rho between −1 and 1.]

e) Explain how the estimates of the coefficients have been obtained.

f) Using the Hildreth-Lu estimates, and knowing that the estimated variance-covariance matrix of the estimator of the coefficients is
$$\widehat{Var}(\hat\beta_{HL}) = \begin{pmatrix} 1{,}70446 & 0{,}03642 & -0{,}47824 & 0{,}07057 \\ 0{,}03642 & 0{,}00189 & -0{,}012883 & 0{,}00307 \\ -0{,}47824 & -0{,}01283 & 0{,}143331 & -0{,}02647 \\ 0{,}07057 & 0{,}00307 & -0{,}02647 & 0{,}00827 \end{pmatrix},$$
test the null hypothesis H0: β3 = 2β4. Explain all the elements of the test.
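Part f) is a single linear restriction Rβ = 0 with R = (0, 0, 1, −2). A NumPy/SciPy sketch of the Wald (t-type) statistic follows; comparing it with the standard normal is our choice, justified by the asymptotic nature of the Hildreth-Lu (FGLS) estimator. Only the entries for β3 and β4 enter the computation, so the slightly different values reported for the (β2, β3) covariance do not matter here.

```python
import numpy as np
from scipy import stats

# Hildreth-Lu estimates (const, L, EX, K) and estimated covariance matrix, as reported
b_hl = np.array([3.70258, -0.741430, 1.14724, -0.224659])
V_hl = np.array([[1.70446, 0.03642, -0.47824, 0.07057],
                 [0.03642, 0.00189, -0.012883, 0.00307],
                 [-0.47824, -0.01283, 0.143331, -0.02647],
                 [0.07057, 0.00307, -0.02647, 0.00827]])

# H0: beta3 - 2*beta4 = 0, i.e. R @ beta = 0
R = np.array([0.0, 0.0, 1.0, -2.0])
estimate = R @ b_hl
variance = R @ V_hl @ R                 # Var(b3) + 4 Var(b4) - 4 Cov(b3, b4)
z_stat = estimate / np.sqrt(variance)
crit = stats.norm.ppf(0.975)
print(f"R b = {estimate:.4f}, se = {np.sqrt(variance):.4f}, z = {z_stat:.3f}, crit = {crit:.3f}")
```

Squaring the statistic gives the equivalent χ²(1) form of the same Wald test.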

EXERCISE 25 (LE-2007.1) (Jun-2007)

An American consulting firm has signed a contract to produce a report on the relationship between the number of patents and the expenditure on Research and Development (RD) in the United States. The firm has obtained annual data for the period 1960 to 1993 on the following variables⁵:

• PATENTS: Number of patent applications filed, in thousands (Range 84.5 - 189.4).


⁵ Source: Ramanathan, Ramu (2002): Introductory Econometrics with Applications, file data3-3.

• RD: R&D expenditure, in billions of 1992 dollars (Range 57.94 - 166.7).

First, the following simple model is estimated by OLS:

$$PATENTS_t = \beta_1 + \beta_2 RD_t + u_t, \qquad t = 1, \ldots, 34 \qquad (1)$$

Dependent variable: PATENTS

         Coefficient   Std. Error   t-ratio   p-value
const    34,5711       6,35787      5,4375    0,0000
RD       0,791935      0,0567036    13,9662   0,0000

S.E. of regression (σ̂)   11.17237
R²                       0.859065
Durbin–Watson            0.233951

a) Interpret the estimated coefficient of the variable RD. Does it have the expected sign? Is the variable significant?

b) Comment in detail on the following three graphs.

Figure 7: PATENTS on RD, PATENTS on estimated PATENTS and OLS Residuals on time
[Three panels: (i) PATENTS against R_D, with the fitted line Y = 34,6 + 0,792X; (ii) actual and estimated PATENTS against time, 1960-1993; (iii) residuals of the regression (= PATENTS − estimated PATENTS) against time, 1960-1993.]

Which problem is present in the previous model? Explain why, and comment on its possible consequences for the results shown here and for those in the previous question.

After testing several specifications the consulting firm decides to choose one of the following two
models:

$$PATENTS_t = \beta_1 + \beta_2 RD_t + \beta_3 RD_t^2 + u_{1t} \qquad (2)$$

$$PATENTS_t = \alpha_1 + \alpha_2 RD_t + \alpha_3 RD_{t-4} + \alpha_4 RD_t^2 + u_{2t} \qquad (3)$$

3. Are these two models linear? Why? Are both models dynamic? Why?

4. Write down the data matrix corresponding to each model.

The OLS estimation results of the two alternative specifications are:

MODEL A:
$$\widehat{PATENTS}_t = 121.575 - 0.852\,RD_t + 0.00706\,RD_t^2$$
(s.d.)               (23.243)   (0.429)   (0.00183)
(s.d., Newey-West)   (27.615)   (0.503)   (0.002)
R² = 0.904   DW = 0.284   BG(4) = 27.171

MODEL B:
$$\widehat{PATENTS}_t = 135.887 - 1.789\,RD_t + 0.813\,RD_{t-4} + 0.00790\,RD_t^2$$
(s.d.)               (22.493)   (0.356)   (0.097)   (0.00160)
(s.d., Newey-West)   (30.555)   (0.475)   (0.120)   (0.002)
R² = 0.979   DW = 0.842   BG(4) = 11.974

Figure 8: Residuals of models A and B
[Two panels: residuals of Model A and residuals of Model B, each plotted against time, 1960-1993.]

5. Do the residual plots in Figure 8 reveal any problem? Test for it.

6. Why do you think that the Newey-West estimator of the standard deviations has been
used? Do you find its use reasonable in both specifications?

7. Using all the information provided, which one is the best specification to explain the
number of patents? Does the selected model include some dynamics?

8. Given the selected model, obtain the mean increase in the number of patents filed when expenditure on research and development in that year increases by one billion dollars, all other factors remaining constant. Given the sample range of RD, is the estimated increase positive?
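For question 8, the marginal effect of RD in a model with a quadratic term depends on the level of RD. The sketch below assumes Model B is the selected specification (that choice belongs to question 7); with Model A, replace the coefficients by −0.852 and 0.00706. The function name `marginal_effect` is hypothetical.

```python
import numpy as np

# Model B coefficients on RD_t and RD_t^2 (the RD_{t-4} term is held fixed)
a2, a4 = -1.789, 0.00790


def marginal_effect(rd):
    """Change in mean PATENTS_t (thousands) per extra billion dollars of RD_t."""
    return a2 + 2.0 * a4 * rd


rd_range = np.array([57.94, 166.7])          # sample range of RD
turning_point = -a2 / (2.0 * a4)             # RD level where the effect changes sign
print(marginal_effect(rd_range), round(turning_point, 2))
```

Evaluating the derivative at both ends of the sample range, and locating where it crosses zero, answers whether the estimated increase is positive throughout the sample.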

STOCHASTIC REGRESSORS

EXERCISE 26 (PV-G.18) (Jun-1995)

The following specification is proposed for the demand for wine in a particular country:

Qt = βPt + ut

where ut ∼ iid(0, 0.0921). Since the price Pt is determined simultaneously with the quantity demanded Qt, it is suspected that Pt may be correlated with ut. Data are available on an index of storage costs, St, which is exogenously determined and thus considered independent of ut.

Given the following quarterly data for the years 1955-1975:


ΣPtQt = 1.78   ΣSt² = 2.1417   ΣPt² = 0.507   ΣPtSt = 0.50   ΣStQt = 2.754

a) Use the Hausman test to check that conjecture, explaining in detail the testing procedure.

b) Given the result of the test, which estimator of β would you choose? Why?
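A minimal sketch of the Hausman test for this no-constant model, assuming the five sums read ΣPtQt = 1.78, ΣSt² = 2.1417, ΣPt² = 0.507, ΣPtSt = 0.50 and ΣStQt = 2.754 (our reading of the list above). OLS is efficient under H0: E(Pt ut) = 0, IV with St as instrument is consistent under both hypotheses, and σu² = 0.0921 is treated as known.

```python
from scipy import stats

sigma2 = 0.0921                          # known variance of u_t
s_PQ, s_SS, s_PP, s_PS, s_SQ = 1.78, 2.1417, 0.507, 0.50, 2.754

b_ols = s_PQ / s_PP                      # OLS in Q_t = beta P_t + u_t (no constant)
b_iv = s_SQ / s_PS                       # IV with S_t as instrument for P_t
var_ols = sigma2 / s_PP
var_iv = sigma2 * s_SS / s_PS ** 2

# Hausman statistic, chi2(1) under H0
H = (b_iv - b_ols) ** 2 / (var_iv - var_ols)
crit = stats.chi2.ppf(0.95, df=1)
print(f"OLS = {b_ols:.3f}, IV = {b_iv:.3f}, H = {H:.3f}, crit = {crit:.3f}")
```

The denominator uses Var(IV) − Var(OLS), which is positive because OLS is the efficient estimator under the null.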

EXERCISE 27 (PV-G.22) (Feb-1996)

Consider the following model

Yt = βXt + ut

where ut ∼ iid(0, σu²) and Xt is non-stochastic. The variable Xt is not observable, but observations are available on another variable, Xt∗, whose behaviour is similar to that of Xt, such that
Xt∗ = Xt + εt,   εt ∼ iid(0, σε²),
where E(εt ut) = 0 ∀t.

a) Show that if Xt∗ is used instead of Xt to estimate β by OLS in the model:

Yt = βXt∗ + vt t = 1, ..., T

the OLS estimator of β is not consistent.

b) What method of estimation can be used to obtain a consistent estimator of β? Write down
the formula for the proposed estimator and the conditions under which this estimator is
consistent.

EXERCISE 28 (LADE-1999.3) (Jun-1999)

The following model is to be estimated

Yt = βX1t + ut,   ut ∼ iid(0, σ²)   (1)

where X1t is known to be jointly determined with Yt as X1t = Yt + X2t and E(X2t ut ) = 0 ∀t.

a) Show that E(X1t ut) = (1 − β)⁻¹σ². It is assumed that β ≠ 1.

b) What are the implications of this fact on the estimator of β in (1) by Ordinary Least
Squares (OLS)? Justify.

c) Write down explicitly the formula of an alternative estimator of β for this model justifying
your choice.

A sample of 60 observations is available where the following cross-products have been obtained:

Yt X1t X2t
Yt 100 40 -60
X1t 80 40
X2t 100

for instance, ΣYtX2t = −60.

d) Obtain the estimate of β with the method proposed in c) and also by OLS.

e) Test, at the significance level of 5%, the H0 : β = 0. Assume that σ 2 = 1.

f) If the researcher does not know that X1t = Yt + X2t, how could he or she realize that E(X1t ut) ≠ 0? Explain and perform the test. Assume that σ² = 1.
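A Monte Carlo check of parts a) and b), written in Python with NumPy. The sample size, β = 0.5, σ² = 1 and X2t ∼ N(0, 1) are illustrative assumptions, not values from the exercise. The sketch simulates the reduced form of the two equations, compares the sample mean of X1t·ut with σ²/(1 − β), and contrasts OLS with IV using X2t as instrument.

```python
import numpy as np

rng = np.random.default_rng(0)
T, beta, sigma2 = 200_000, 0.5, 1.0          # illustrative values only

x2 = rng.normal(0.0, 1.0, T)                 # exogenous: E(X2t ut) = 0
u = rng.normal(0.0, np.sqrt(sigma2), T)
# Reduced form of Yt = beta*X1t + ut and X1t = Yt + X2t
y = (beta * x2 + u) / (1.0 - beta)
x1 = y + x2

print("mean of X1t*ut:", np.mean(x1 * u), "theory:", sigma2 / (1.0 - beta))
# OLS: plim = beta + sigma2*(1 - beta)/(Var(X2) + sigma2) = 0.75 here
b_ols = np.sum(x1 * y) / np.sum(x1 ** 2)
# IV with X2t as instrument: plim = beta
b_iv = np.sum(x2 * y) / np.sum(x2 * x1)
print("OLS:", b_ols, "IV:", b_iv)
```

The gap between the OLS and IV estimates in large samples is exactly what the Hausman test in part f) exploits.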

EXERCISE 29 (LE-2000.3) (Jun-2000)

The model Yt = βXt + ut is to be estimated and it is suspected that there may be unobservable
factors included in ut correlated with Xt .

a) If this suspicion is true, what are the implications for the properties of the OLS estimator
of β? Justify your answer in a formal way.

b) Under which conditions would Xt−1 be a good instrument for Xt in order to obtain an instrumental variables estimator of β? Give formal reasons for your answer.
A sample of 60 observations is available, for which the following cross-products have been obtained:

Yt Xt Xt−1
Yt 50 20 -30
Xt 40 20
Xt−1 50
for instance, ΣYtXt−1 = −30.

c) Using the variable Xt−1 as instrument for Xt , obtain the estimate of β by means of the
instrumental variables method.
d) What would have happened if ΣXtXt−1 = 0?

e) Assuming that ut ∼ iid(0, 1), test the H0 : E(Xt ut ) = 0 explaining in detail the testing
procedure.

EXERCISE 30 (LE-2002.2) (Jun-2002)

Consider the model

Yt = β1 + β2 X2t + β3 X3t + ut,   ut ∼ iid(0, σ²)

where X2t is a fixed variable, X3t is a stochastic variable and β = (β1 , β2 , β3 )′ is the vector of
unknown parameters.

a) Why is the OLS estimator of β non-linear?

b) Which assumption guarantees that the OLS estimator of β is unbiased? Prove it.

c) If X3t is stochastic and not independent of ut, but E(X3t ut) = 0 ∀t, is the OLS estimator of β consistent? Show it, and indicate the additional assumptions that are necessary to obtain this result.

d) If X3t is stochastic but the conditions on the Mann-Wald theorem hold, can we make
inference on β even if the distribution of ut is not known? Give rigorous reasons to
support your answer.

EXERCISE 31 (LE-2002.7) (Sep-2002)

Consider the following model


Y1t = β1 Y2t + β2 X1t + ut (1)
where X1t is a nonstochastic variable and the variable Y2t is believed to be correlated with the disturbance term ut, which is assumed to be white noise, that is, ut ∼ iid(0, σu²). It is also known that
Y2t = γX2t + εt   (2)
where X2t is a nonstochastic regressor and εt ∼ iid(0, σε²).

A sample of 25 observations gives way to the following sums of squares and of cross-
products:

Y1t Y2t X1t X2t


Y1t 100 80 -60 60
Y2t 80 100 -40 -10
X1t -60 -40 80 50
X2t 60 -10 50 40

where, for instance, ΣY1tX1t = −60 and ΣY1t² = 100.

a) Obtain the estimates of β1 and β2 in equation (1) by Ordinary Least Squares.

b) Under the assumption that E(Y2t ut) ≠ 0, define a consistent estimator of β1 and β2. Write down formally the conditions that guarantee this property and explain whether they hold in this case.

c) Obtain the estimates of β1 and β2 with the estimator just proposed.

d) Under the assumption σu² = 1, use the Hausman test to check if there is evidence of
correlation between Y2t and ut . Explain the testing procedure, including the null and the
alternative hypotheses.

e) Given that last result, which estimator is preferable in this case? Why?
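The computations for parts a), c) and d) only need blocks of the cross-product table. A NumPy sketch follows; the helper `block` is hypothetical, and the instrument set (X2t, X1t) is our reading of part b), based on equation (2). Because there is a single endogenous regressor, the Hausman statistic is computed for β1 alone: with these numbers the difference between the full IV and OLS covariance matrices has rank one, so its ordinary inverse does not exist.

```python
import numpy as np

# Sums of squares and cross-products, order (Y1, Y2, X1, X2)
M = np.array([[100.0, 80.0, -60.0, 60.0],
              [80.0, 100.0, -40.0, -10.0],
              [-60.0, -40.0, 80.0, 50.0],
              [60.0, -10.0, 50.0, 40.0]])
Y1, Y2, X1, X2 = 0, 1, 2, 3


def block(rows, cols):
    """Sub-matrix of cross-products, e.g. block([Y2, X1], [Y1]) = X'y."""
    return M[np.ix_(rows, cols)]


reg, inst = [Y2, X1], [X2, X1]              # regressors and instruments
b_ols = np.linalg.solve(block(reg, reg), block(reg, [Y1]).ravel())
b_iv = np.linalg.solve(block(inst, reg), block(inst, [Y1]).ravel())

# Hausman test on beta1 with sigma_u^2 = 1
V_ols = np.linalg.inv(block(reg, reg))
A = np.linalg.inv(block(inst, reg))         # (Z'X)^{-1}
V_iv = A @ block(inst, inst) @ A.T          # (Z'X)^{-1} Z'Z (X'Z)^{-1}
H = (b_iv[0] - b_ols[0]) ** 2 / (V_iv[0, 0] - V_ols[0, 0])   # ~ chi2(1) under H0
print(b_ols, b_iv, round(H, 2))
```

Using np.linalg.solve rather than forming explicit inverses for the coefficient estimates is the numerically preferred way to solve the normal equations.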

EXERCISE 32 (LE-2003.3) (Jun-2003)

Assume that individual savings depend on individual permanent income according to the relationship:
Yi = α + βIi + vi (1)

where Yi are annual savings and Ii annual permanent income per worker. The permanent income
I cannot be observed, so the regression model to be estimated is:

Yi = α + βXi + ui (2)

where Xi is the worker’s annual income, used as an approximation to I. The results of the OLS
estimation of the model with data on 50 individuals for year 1999 are:
     
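$$\begin{pmatrix}\hat\alpha \\ \hat\beta\end{pmatrix}_{OLS} = \begin{pmatrix} 4.34 \\ -0.856 \end{pmatrix}, \qquad \hat\sigma^2_{OLS}(X'X)^{-1} = 1.023 \times \begin{pmatrix} 0.7165 & -0.009 \\ -0.009 & 0.0001 \end{pmatrix}$$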

i) Economic theory says that the relationship between permanent income and savings is positive. However, the OLS estimate of the slope β is actually negative. Can you find an explanation for this apparent contradiction? Reason your answer.

Later, model (2) is re-estimated by instrumental variables. The variable used as instrument is
the average income obtained during the past 10 years (1989-98), which is obviously strongly
related to the permanent income and also to the current annual income. The results are:
     
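$$\begin{pmatrix}\tilde\alpha \\ \tilde\beta\end{pmatrix}_{IV} = \begin{pmatrix} 0.988 \\ 0.039 \end{pmatrix}, \qquad \tilde\sigma^2_{IV}(Z'X)^{-1}Z'Z(X'Z)^{-1} = 1.3595 \times \begin{pmatrix} 1.7088 & -0.0223 \\ -0.0223 & 0.0003 \end{pmatrix}$$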

ii) What is the expression for β̃IV? And for σ̃²IV?

iii) Run the Hausman test. Relate the result you obtain to your answer to question i).

EXERCISE 33 (LE-2003.4) (Jun-2003)

In order to evaluate the returns to education, the following model is specified:

Yi = β1 + β2 EDUi + wi i = 1, ..., N

where Yi and EDUi are wage earnings per year (in tens of thousands of euros) and education
level of individual i, respectively. Furthermore, E(EDUi wi ) = 0 for all i and wi is a white noise.

The sample consists of 1000 individuals. However, the education level is approximated by the observable variable Si, years of education, which measures it with error: Si = EDUi + εi, where εi is a white noise independent of EDUi and wi.

Using Ordinary Least Squares (OLS), the following results have been obtained:

$$\hat Y_i = 2{,}431 + 0{,}03332\,S_i$$
(dev.)   (0,078)   (0,0046)

a) Interpret the estimate of the parameter β2.

b) Explain in detail the properties of the OLS estimator of β1 and β2 if Si has been used
instead of EDUi in the model. Reason your answer.

Information is now available on an additional variable, Pi, measuring the years of education of the father of individual i. For the sample of 1000 individuals we have:
Σi Yi = 2988,232   Σi Si = 16707   Σi YiSi = 50071,6   Σi Si² = 283539
Σi Pi = 14343   Σi YiPi = 42914,7   Σi PiSi = 240466   Σi Pi² = 206469
Σi Yi² = 9028,9

c) Propose a consistent estimator alternative to OLS. What conditions guarantee its consistency? What is its asymptotic distribution? Justify your answer.

d) Estimate β1 and β2 using the estimator proposed just previously.

e) If a consistent estimator has been used, describe how the following estimate of the asymptotic variance-covariance matrix of the estimator proposed in c) has been obtained. Indicate all the steps leading to this result.
$$\widehat{Var}(\hat\beta) = \frac{98{,}88}{998}\begin{pmatrix} 0{,}2984084 & -0{,}0178 \\ -0{,}0178 & 0{,}001065 \end{pmatrix}$$

f) Using the estimator proposed in c), test the hypothesis that an additional year of education
implies an average increment of 720 euros in the annual earnings. Write down the null
hypothesis, the alternative hypothesis and all the elements of the test.

g) Run the Hausman test to analyse if the problem of measurement error is important or not.
Write down the null and the alternative hypotheses as well as all the elements of the test.

h) Indicate, with adequate reasoning, which one of the two estimators you would choose,
taking into account the result of the Hausman test.
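A NumPy sketch of parts c), d) and f), using the father's years of education Pi as the instrument for Si (our choice, since it is the additional variable provided). Because Yi is measured in tens of thousands of euros, 720 euros corresponds to β2 = 0.072; the covariance matrix is the one reported in part e).

```python
import numpy as np

N = 1000
sY, sS, sYS, sSS = 2988.232, 16707.0, 50071.6, 283539.0
sP, sYP, sPS = 14343.0, 42914.7, 240466.0

# OLS with regressors (1, S)
b_ols = np.linalg.solve(np.array([[N, sS], [sS, sSS]]), np.array([sY, sYS]))

# IV with instruments (1, P): solve (Z'X) b = Z'y
ZtX = np.array([[N, sS], [sP, sPS]])
b_iv = np.linalg.solve(ZtX, np.array([sY, sYP]))

# f) H0: beta2 = 0.072 (720 euros, Y in tens of thousands of euros)
V_iv = 98.88 / 998 * np.array([[0.2984084, -0.0178], [-0.0178, 0.001065]])
t_stat = (b_iv[1] - 0.072) / np.sqrt(V_iv[1, 1])
print(np.round(b_ols, 5), np.round(b_iv, 5), round(t_stat, 3))
```

Recomputing the OLS estimates from the sums is a useful consistency check, since they should reproduce the reported 2,431 and 0,03332.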

EXERCISE 34 (LE-2004.3) (Jun-2004)

We want to estimate the parameter β in the following equation

y1t = βy2t + u1t,   u1t ∼ NID(0, σ1²)   (1)

It is known that y1t and y2t are simultaneously determined as

y2t = α1 y1t + α2 Xt + u2t,   u2t ∼ NID(0, σ2²)   (2)

where Xt is an exogenous variable, independent of u1s and of u2s for all t and s.

a) Obtain the expression of the instrumental variables (IV) estimator of β using Xt as instrument.

b) Is this estimator linear? Is it unbiased? Why?

c) Is it consistent? Why?

d) Do you know its distribution? And the asymptotic one? Why?

e) Would any of the previous answers change if α2 = 0 ? Reason your answer.

f) With a sample of size T = 1000 it is obtained that:

Σt y2t² = 42   Σt y1t y2t = 5   Σt y2t Xt = 12   Σt Xt² = 10   Σt Xt y1t = 3   Σt y1t² = 11

Furthermore, a consistent estimator of σ1² is available, with σ̂1² = 0,01. Use the Hausman test to decide whether there is statistical evidence that y2t is an endogenous variable. Explain the process in detail.

EXERCISE 35 (LADE-2004.4) (Jun-2004)

The following model is proposed to analyse the consumption in a country:

Yt = β0 + β1 X1t + β2 X2t + ut , t = 1, 2, ..., 100

where Yt , X1t and X2t are the consumption growth rates, interest rate and inflation at period t
respectively. It is assumed that ut ∼ iid(0, σ 2 ). X1t is assumed to be nonstochastic, but inflation
is determined by the demand for consumption and is thus stochastic. In addition, information
on the growth rate of the costs of production Pt (nonstochastic) is also available.

The model has been estimated by OLS with the results:

Ŷt = 0.046 − 0.021X1t − 0.055X2t (1)

a) When is the estimator in equation (1) inconsistent?

b) The following sample information is available:


   
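$$(X'X)^{-1} = \begin{pmatrix} 0{,}010 & 0{,}012 & 0{,}000 \\ 0{,}012 & 0{,}011 & -0{,}033 \\ 0{,}000 & -0{,}033 & 0{,}022 \end{pmatrix} \qquad (Z'X)^{-1} = \begin{pmatrix} 0{,}011 & 0{,}000 & 0{,}003 \\ -0{,}034 & -0{,}012 & 0{,}000 \\ -0{,}023 & 0{,}000 & -0{,}032 \end{pmatrix}$$
$$(Z'X)^{-1}Z'Z[(Z'X)^{-1}]' = \begin{pmatrix} 0{,}012 & -0{,}033 & -0{,}033 \\ -0{,}033 & 0{,}118 & 0{,}051 \\ -0{,}033 & 0{,}051 & 0{,}188 \end{pmatrix} \qquad Z'Y = \begin{pmatrix} 1{,}0 \\ 3{,}0 \\ 1{,}8 \end{pmatrix} \qquad X'Y = \begin{pmatrix} 1{,}0 \\ 3{,}0 \\ 2{,}0 \end{pmatrix}$$
$$Z'Z = \begin{pmatrix} 100 & -14 & -16 \\ -14 & 95 & -15 \\ -16 & -15 & 155 \end{pmatrix} \qquad (X'Z)^{-1}Z'Z[(X'Z)^{-1}]' = \begin{pmatrix} 0{,}012 & -0{,}030 & 0{,}059 \\ 0{,}002 & 0{,}008 & -0{,}010 \\ -0{,}006 & -0{,}033 & 0{,}142 \end{pmatrix}$$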

If Z is the matrix of instruments, estimate the model using instrumental variables. Write
down the matrix Z and the instrument that is used, explaining why it has been chosen as
instrument. What are the properties of this estimator?
c) If Σ û²t,IV = 2.037, how would you test whether the OLS estimator is consistent? Explain the procedure and test that hypothesis. Based on the result of the test, which estimation method would you choose? Why?

EXERCISE 36 (LE-2008.4) (Sep-2008. Final examination. Written test.)

Consider the regression model

$$Y_i = \beta_1 + \beta_2 X_i + u_i, \qquad i = 1, \ldots, N, \qquad (1)$$

where Xi is stochastic, ui ∼ N(0, σ²), E(ui uj) = 0 for i ≠ j, and E(Xi ui) = 0.9.

a) Which problem is present in this model? How could it be detected? Explain in detail the proposed test and the consequences of rejecting or not rejecting the null hypothesis.

b) What are the consequences of using the OLS estimator for hypothesis tests about β1 and β2? And of using the IV estimator? Justify your answer in detail.
With a sample of 500 observations, the following sums of squares and cross-products⁶ are obtained:

1 Yi Xi Z1i
1 500 1530.17 14.48 -0.23
Yi 7163.54 1551.83 448.79
Xi 1037.57 451.24
Z1i 509.40
where, say, ΣYiXi = 1551.83 and ΣYi = 1530.17.

c) Using this information, fill in the blank elements of the matrices below to obtain the IV estimates of β1 and β2, using Z1 as the only instrument:
⁶ Source: file vacation.dat from the book Undergraduate Econometrics by Hill, Griffiths and Judge (2001).

 −1  
   
   
     
    3.03
   
     
     
βbIV =





=
 


     0.996 
   
   
   
   
   

The following estimate of the variance-covariance matrix of the IV estimator of β1 and β2 has been obtained:
$$\widehat{Var}(\hat\beta_{IV}) = \begin{pmatrix} 0.00203608 & -0.000074 \\ -0.000074 & 0.00254410 \end{pmatrix}.$$
Fill in the equation for the estimated model:
Ŷt = ...
(dev(β̂IV))   (   )   (   )

d) Under which conditions is the above IV estimator consistent? Is it an asymptotically efficient estimator? Reason your answer.
e) In order to test H0: β1 = 3, β2 = 1, write down the test statistic and its distribution under the null hypothesis. Give details of all the elements of the test statistic. Test the hypothesis using the information below and the estimator in c).

Restriction set
1: b[const] = 3
2: b[X] = 1
Test statistic: chi^2(2) = 0.490224, with p-value = 0.782617

f) An alternative estimator to that in c) has been considered, obtaining the following results
(using Gretl):

Model 2: TSLS estimates using 500 observations 1–500


Dependent variable: Y
Instruments: const Z1 Z2
Variable coefficient std. error t-ratio p-value
const 3.03113 0.0445796 67.9936 0.0000
X 1.00899 0.0448997 22.4721 0.0000

Explain, step by step, the process leading to the calculation of this estimator. Is it better
than the previous one?
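Parts c) and e) can be checked with NumPy and SciPy: the IV estimator with instruments (1, Z1) solves (Z'X)β = Z'Y, and the Wald statistic uses the reported covariance matrix. The sketch assumes that matrix is symmetric, with lower-left element equal to −0.000074.

```python
import numpy as np
from scipy import stats

N = 500
sY, sX, sZ = 1530.17, 14.48, -0.23          # sums of Y, X and Z1
sZY, sZX = 448.79, 451.24                   # cross-products of Z1 with Y and X

ZtX = np.array([[N, sX], [sZ, sZX]])        # rows: instruments (1, Z1); columns: (1, X)
b_iv = np.linalg.solve(ZtX, np.array([sY, sZY]))

# e) Wald test of H0: beta1 = 3, beta2 = 1 (asymptotically chi2(2) under H0)
V = np.array([[0.00203608, -0.000074], [-0.000074, 0.00254410]])
d = b_iv - np.array([3.0, 1.0])
W = float(d @ np.linalg.solve(V, d))
p_value = stats.chi2.sf(W, df=2)
print(np.round(b_iv, 4), round(W, 4), round(p_value, 4))
```

Small differences from the Gretl value of the statistic come from the rounding of the reported sums and covariance entries.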

DYNAMIC MODELS

EXERCISE 37 (PV-E.42) (Jun-1997)

The following model has been proposed to analyse the dependence of the Madrid Stock Exchange on its New York and London counterparts:

$$MAD_t = \beta_0 + \beta_1 LON_{t-1} + \beta_2 NY_{t-1} + u_t, \qquad t = 2, \ldots, 30.$$

Its OLS estimation provides the following results:

$$\widehat{MAD}_t = 0{,}0095 + 0{,}4990\,LON_{t-1} + 0{,}1800\,NY_{t-1} \qquad DW = 0{,}82 \qquad R^2 = 0{,}88 \qquad (1)$$
(St. deviations →)   (0,0032)   (0,1200)   (0,1900)

a) Test the individual significance of the explanatory variables.


b) Test the existence of AR(1) autocorrelation in the disturbances (specify clearly the null
hypothesis, the alternative, the testing statistic and the decision rule).

Later, the explanatory variable MADt−1 is added to the model, which is then estimated with the same data:
$$MAD_t = 0{,}0031 + 0{,}1910\,MAD_{t-1} + 0{,}8400\,LON_{t-1} + 0{,}0600\,NY_{t-1} + \hat v_t \qquad (2)$$
(0,0012)   (0,0800)   (0,2460)   (0,0120)

with DW = 1,9 and
$$\hat v_t = 0{,}0001 + 0{,}03\,\hat v_{t-1} + 0{,}009\,MAD_{t-1} + 0{,}04\,LON_{t-1} + 0{,}006\,NY_{t-1} + \hat e_t \qquad R^2 = 0{,}09$$
(0,002)   (0,09)   (0,3)   (0,1)   (0,03)

c) Test the hypothesis of AR(1) autocorrelation in vt .


d) In view of the results obtained in b) and c), what can you conclude about the validity of models (1) and (2)?
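Because model (2) contains MADt−1 as a regressor, its DW statistic is biased towards 2 and cannot be used directly. One classical alternative, which the exercise does not mention and which we add only as an illustration, is Durbin's h statistic; the sketch also computes the t-ratio of v̂t−1 in the auxiliary regression above. Taking T = 29 (t = 2, ..., 30) is our assumption.

```python
import math
from scipy import stats

T = 29                        # observations t = 2, ..., 30
d = 1.9                       # DW statistic of model (2)
var_lag = 0.08 ** 2           # squared std. deviation of the coefficient on MAD_{t-1}

# Durbin's h ~ N(0, 1) under H0: rho = 0 (requires T * var_lag < 1)
h = (1.0 - d / 2.0) * math.sqrt(T / (1.0 - T * var_lag))
t_aux = 0.03 / 0.09           # t-ratio of v_{t-1} in the auxiliary regression
crit = stats.norm.ppf(0.975)
print(f"h = {h:.3f}, t(v_t-1) = {t_aux:.3f}, crit = {crit:.3f}")
```

When T·Var(coefficient on MADt−1) ≥ 1, h cannot be computed, and the auxiliary-regression (Breusch-Godfrey type) test is the usual fallback.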

EXERCISE 38 (LADE-1998.6) (Sep-1998)

Three researchers want to estimate the following model:

Yt = β1 Yt−1 + β2 Xt + ut (1)
where: Yt is the sale price of a first hand flat at time t.
Xt is the interest rate at time t.
We have the following information:

• The model is correctly specified.

• The disturbance ut follows a normal distribution with E(ut ) = 0 ∀t

The three researchers do not agree about what the best estimation method is, proposing three
different alternatives:

1st researcher: The following results are obtained with t = 2, . . . , 101


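$$\begin{pmatrix}\hat\beta_1 \\ \hat\beta_2\end{pmatrix} = \begin{pmatrix} \sum Y_{t-1}^2 & \sum Y_{t-1}X_t \\ \sum Y_{t-1}X_t & \sum X_t^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum Y_{t-1}Y_t \\ \sum X_t Y_t \end{pmatrix} \qquad (2)$$
$$\begin{pmatrix} 0.831371 \\ 0.882068 \end{pmatrix} = \begin{pmatrix} 0.00046 & -0.00134 \\ -0.00134 & 0.0076 \end{pmatrix} \begin{pmatrix} 4442.139 \\ 903.487 \end{pmatrix} \qquad (3)$$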

where BG(1) = 23.24 RSS = 157.43

a) What method of estimation has been used? Reason.

b) What are the properties of the estimators? Perform some test if you think it is necessary.

2nd researcher: The following results are obtained with t = 2, . . . , 101


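$$\begin{pmatrix}\hat\beta_1 \\ \hat\beta_2\end{pmatrix} = \begin{pmatrix} \sum X_{t-1}Y_{t-1} & \sum X_{t-1}X_t \\ \sum X_t Y_{t-1} & \sum X_t^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum X_{t-1}Y_t \\ \sum X_t Y_t \end{pmatrix} \qquad (4)$$
$$\begin{pmatrix} 0.770343 \\ 1.060368 \end{pmatrix} = \begin{pmatrix} 0.003809 & -0.00291 \\ -0.01112 & 0.012178 \end{pmatrix} \begin{pmatrix} 0.770343 \\ 903.0487 \end{pmatrix} \qquad (5)$$
where BG(1) = 27.66   RSS = 165.5112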

c) What method of estimation has been used? Reason.

d) What are the properties of the estimators? Perform some test if you think it is necessary.
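The matrix in (4) has the form (Z′X)⁻¹Z′y, with regressors X = [Y_{t−1}, X_t] and instruments Z = [X_{t−1}, X_t]. The sketch below uses simulated data with hypothetical values (β1 = 0.5, β2 = 1, an AR(1) disturbance with ρ = 0.6 and an AR(1) regressor) to compare (X′X)⁻¹X′y with (Z′X)⁻¹Z′y when Y_{t−1} is a regressor and the disturbance is autocorrelated. It illustrates large-sample behaviour only and is not the researchers' data.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20_000
beta1, beta2, rho = 0.5, 1.0, 0.6     # hypothetical true values

x = np.zeros(T)
u = np.zeros(T)
y = np.zeros(T)
e = rng.normal(size=T)
eps = rng.normal(size=T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + e[t]              # exogenous, serially correlated regressor
    u[t] = rho * u[t - 1] + eps[t]            # AR(1) disturbance
    y[t] = beta1 * y[t - 1] + beta2 * x[t] + u[t]

# Regressors [Y_{t-1}, X_t] and instruments [X_{t-1}, X_t]
X = np.column_stack([y[:-1], x[1:]])
Z = np.column_stack([x[:-1], x[1:]])
yy = y[1:]

b_ols = np.linalg.solve(X.T @ X, X.T @ yy)    # (X'X)^{-1} X'y
b_iv = np.linalg.solve(Z.T @ X, Z.T @ yy)     # (Z'X)^{-1} Z'y
print("OLS:", b_ols, "IV:", b_iv)
```

With this design the OLS coefficient on Y_{t−1} stays noticeably above 0.5 even in a long sample, while the IV estimate settles near it.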

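3rd researcher: The following results are obtained with t = 3, ..., 101. Let $Y_t^* = Y_t - \rho^* Y_{t-1}$ and $X_t^* = X_t - \rho^* X_{t-1}$:

$$\begin{pmatrix}\hat\beta_1\\ \hat\beta_2\end{pmatrix} = \begin{pmatrix}\sum Y_{t-1}^{*2} & \sum Y_{t-1}^*X_t^*\\ \sum Y_{t-1}^*X_t^* & \sum X_t^{*2}\end{pmatrix}^{-1}\begin{pmatrix}\sum Y_{t-1}^*Y_t^*\\ \sum X_t^*Y_t^*\end{pmatrix} \qquad (6)$$

$$\begin{pmatrix}0.775642\\ 1.090742\end{pmatrix} = \begin{pmatrix}0.001035 & -0.00117\\ -0.00117 & 0.00938\end{pmatrix}\begin{pmatrix}1014.806\\ 245.7676\end{pmatrix} \qquad (7)$$

$$\rho^* = \frac{\sum \hat u_t \hat u_{t-1}}{\sum \hat u_{t-1}^2} = 0.5387823 \qquad (8)$$

where $\hat u = Y - X\hat\beta_{IV}$, BG(1) = 0.27 and RSS = 118.0408.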

e) What method of estimation has been used? Reason.

f) Given all previous answers, which researcher has used the best estimator? Reason your
answer.
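The third researcher's ρ* in (8) is a ratio of residual cross-products, and (6) applies OLS to quasi-differenced data. A minimal sketch of those two building blocks (the function names are illustrative, not from the source):

```python
import numpy as np

def rho_from_residuals(resid):
    """rho* = sum_t u_t u_{t-1} / sum_t u_{t-1}^2, as in (8)."""
    r = np.asarray(resid, dtype=float)
    return float(np.sum(r[1:] * r[:-1]) / np.sum(r[:-1] ** 2))

def quasi_difference(series, rho):
    """Returns z_t - rho * z_{t-1} for t = 2, ..., n (one observation is lost)."""
    s = np.asarray(series, dtype=float)
    return s[1:] - rho * s[:-1]
```

In (8) the residuals passed in are the IV residuals û = Y − Xβ̂_IV; quasi-differencing Y_{t−1} requires Y_{t−2}, which is why the transformed sample starts at t = 3.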

EXERCISE 39 (LE-2000.1) (Jun-2000)

A professional report proposed two possible models to explain the evolution of the demand for petrol. Quarterly data from 1959 to 1990 (both years included) are available for the following variables, all measured in logarithms:

• Y = Expenditure on petrol per capita, in constant terms.

• X2 = Price of petrol in constant terms. Non-stochastic.

• X3 = Per capita net income, in constant terms. Non-stochastic.

• X4 = Miles per gallon of petrol. Non-stochastic.

The first model is:


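$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \beta_4 X_{4t} + u_t \qquad (1)$$

The results of the OLS estimation are:

$$\hat Y_t = \underset{(0.12)}{-1.51} - \underset{(0.01)}{0.14}\,X_{2t} + \underset{(0.015)}{0.998}\,X_{3t} - \underset{(0.02)}{0.52}\,X_{4t} \qquad R^2 = 0.97 \quad DW = 0.74$$

$$\hat u_t = \underset{(0.09)}{-0.01} - \underset{(0.008)}{0.003}\,X_{2t} - \underset{(0.012)}{0.004}\,X_{3t} + \underset{(0.004)}{0.004}\,X_{4t} + \underset{(0.09)}{0.62}\,\hat u_{t-1} - \underset{(0.107)}{0.007}\,\hat u_{t-2} + \underset{(0.107)}{0.005}\,\hat u_{t-3} + \underset{(0.09)}{0.087}\,\hat u_{t-4} + \hat e_{1t}$$

$$R^2 = 0.42 \qquad DW = 2.03$$

(estimated standard deviations in parentheses)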

The second model is:


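$$Y_t = \gamma_1 + \gamma_2 X_{2t} + \gamma_3 X_{3t} + \gamma_4 X_{4t} + \gamma_5 Y_{t-1} + v_t \qquad (2)$$

The OLS results are:

$$\hat Y_t = \underset{(0.13)}{-0.65} - \underset{(0.01)}{0.06}\,X_{2t} + \underset{(0.06)}{0.47}\,X_{3t} - \underset{(0.03)}{0.24}\,X_{4t} + \underset{(0.09)}{0.54}\,Y_{t-1} \qquad R^2 = 0.98 \quad DW = 1.76$$

$$\hat v_t = \underset{(0.17)}{-0.24} - \underset{(0.02)}{0.02}\,X_{2t} + \underset{(0.09)}{0.13}\,X_{3t} - \underset{(0.047)}{0.072}\,X_{4t} - \underset{(0.09)}{0.14}\,Y_{t-1} + \underset{(0.12)}{0.22}\,\hat v_{t-1} + \underset{(0.101)}{0.128}\,\hat v_{t-2} + \underset{(0.091)}{0.105}\,\hat v_{t-3} + \underset{(0.09)}{0.118}\,\hat v_{t-4} + \hat e_{2t}$$

$$R^2 = 0.067 \qquad DW = 2.01$$

(estimated standard deviations in parentheses)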

a) Based on the results for model (1), do you think that the basic assumptions are verified
in that model? Perform any tests you judge relevant.

b) Describe in detail the properties of the OLS estimator in model (1).

c) Based on the results of model (2), do you think that the basic assumptions are verified in
that model? Perform any tests you judge relevant. Reason your answer.

d) Describe in detail the properties of the OLS estimator in model (2).

e) How would you test the hypothesis that the income elasticity is equal to 1? Explain all the elements of the test: the model used, the null and alternative hypotheses, the estimator used, the test statistic, its distribution under the null hypothesis and the decision rule. If you have enough information, perform the test.
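The auxiliary regressions for models (1) and (2) are of the Breusch–Godfrey type with p = 4 lagged residuals, and, unlike DW, this test remains valid when Y_{t−1} is a regressor, as in model (2). A sketch of the n·R² arithmetic, assuming 1959–1990 gives 128 quarterly observations and that each auxiliary regression loses the first 4 of them (model (2) loses one more through Y_{t−1}):

```python
def breusch_godfrey_lm(n_aux, r2_aux):
    """BG statistic n * R^2 of the auxiliary regression; chi2(p) under H0 (no autocorrelation up to order p)."""
    return n_aux * r2_aux

CHI2_4_5PCT = 9.488       # 5% critical value of chi-square with p = 4 degrees of freedom

# Assumption: 128 quarters, minus 4 lost to the lagged residuals
n_model1 = 128 - 4        # model (1)
n_model2 = 128 - 1 - 4    # model (2) also loses one observation through Y_{t-1}

lm1 = breusch_godfrey_lm(n_model1, 0.42)
lm2 = breusch_godfrey_lm(n_model2, 0.067)
print(f"model (1): LM = {lm1:.2f}, reject H0: {lm1 > CHI2_4_5PCT}")
print(f"model (2): LM = {lm2:.2f}, reject H0: {lm2 > CHI2_4_5PCT}")
```

Using 128 and 127 observations instead (the T·R² variant) gives 53.76 and 8.51, so the comparison with 9.488 does not change.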

EXERCISE 40 (LADE-2004.6) (Sep-2004)

Consider the model


$$Y_t = \beta_1 + \beta_2 X_t + \beta_3 Y_{t-1} + u_t \qquad t = 1, \dots, T \qquad (1)$$

where $X_t$ is a non-stochastic regressor. The OLS estimated model is:

$$\hat Y_t = 17.86 + 0.27\,X_t - 0.79\,Y_{t-1} \qquad t = 2, \dots, 51$$

The following table shows the first 8 observations of the variables $Y_t$, $X_t$ and $\hat u_{t,OLS}$:

 t     Y_t     X_t     û_t,OLS
 1     8.5     11
 2     8.9     13
 3    16       14
 4     7.8     14.9
 5    16.4     15.1     0.625
 6     7.9     18      -1.864
 7    18       18.8     1.304
 8     8       19.1    -0.797
 ..    ..      ..       ..

a) Using the observations in the table, obtain the initial OLS residuals. Analyse graphically
the possible presence of first order autocorrelation in the disturbances. Explain how you
would test this assumption in a formal way.
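For a), the missing residuals follow from û_t = Y_t − (17.86 + 0.27X_t − 0.79Y_{t−1}). A sketch of that arithmetic with the table values:

```python
# OLS fitted equation of model (1): Y_t = 17.86 + 0.27 X_t - 0.79 Y_{t-1}
y = [8.5, 8.9, 16.0, 7.8, 16.4, 7.9, 18.0, 8.0]      # Y_1, ..., Y_8 from the table
x = [11.0, 13.0, 14.0, 14.9, 15.1, 18.0, 18.8, 19.1]  # X_1, ..., X_8 from the table

resid = {}
for t in range(2, 9):            # no residual for t = 1, since Y_0 is not observed
    fitted = 17.86 + 0.27 * x[t - 1] - 0.79 * y[t - 2]
    resid[t] = y[t - 1] - fitted

for t, u in resid.items():
    print(t, round(u, 3))
```

The values for t = 5, ..., 8 reproduce the table up to rounding (1.305 against 1.304 for t = 7). The usual graphical check is then a time plot of û_t or a scatter of û_t against û_{t−1}; over this short stretch the signs alternate, which on its own is only suggestive.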

b) If the null hypothesis in the previous question is rejected, and assuming that the disturbances follow an AR(1) process, that is, $u_t = \rho u_{t-1} + \epsilon_t$ with $\epsilon_t \sim iid(0, \sigma_\epsilon^2)$, derive the properties of the OLS estimator of the parameters in equation (1).

The following sample information is also available:

$$\sum_{t=2}^{51} X_t = 3323.4 \qquad \sum_{t=2}^{51} Y_t = 1022 \qquad \sum_{t=2}^{51} Y_{t-1} = 998.5$$

$$\sum_{t=2}^{51} X_tY_t = 77268.38 \qquad \sum_{t=2}^{51} Y_tY_{t-1} = 14146.83 \qquad \sum_{t=2}^{51} X_tY_{t-1} = 75652.8$$

$$\sum_{t=2}^{51} X_t^2 = 281168.2 \qquad \sum_{t=2}^{51} X_tX_{t-1} = 272614.67 \qquad \sum_{t=2}^{51} Y_{t-1}^2 = 31068.07$$

$$\sum_{t=2}^{51} X_{t-1} = 3205.4 \qquad \sum_{t=2}^{51} X_{t-1}Y_{t-1} = 73233.88 \qquad \sum_{t=2}^{51} X_{t-1}Y_t = 74499.05$$

$$(X'X)^{-1} = \begin{pmatrix}0.103060 & -0.000948 & -0.001003\\ -0.000948 & 0.000019 & -0.000015\\ -0.001003 & -0.000015 & 0.000103\end{pmatrix}$$

$$(Z'X)^{-1} = \begin{pmatrix}-0.233 & 0.203 & -0.207\\ -0.0062 & 0.0032 & -0.0032\\ 0.033 & -0.021 & 0.021\end{pmatrix} \qquad (X'Z)^{-1} = \begin{pmatrix}-0.233 & -0.0062 & 0.033\\ 0.203 & 0.0032 & -0.021\\ -0.207 & -0.0032 & 0.021\end{pmatrix}$$

c) If Z is the matrix of instruments, estimate the model by Instrumental Variables, where $X_{t-1}$ is the instrument for $Y_{t-1}$. Explain in detail all the properties of this estimator.

d) Do you think that the previous estimator solves the autocorrelation problem? Reason your
answer.

The following additional information is available:

$$\sum \hat u_{t-1,OLS}^2 = 3353.54 \qquad \sum X_t^* = 4627.25 \qquad \sum X_t^* Y_{t-1}^* = 148191.84$$

$$\sum \hat u_{t,OLS}\,\hat u_{t-1,OLS} = 1331.60 \qquad \sum Y_t^* = 1421.21 \qquad \sum X_t^* Y_t^* = 151394.54$$

$$\sum \hat u_{t-1,IV}^2 = 477634.63 \qquad \sum Y_{t-1}^* = 1388.42 \qquad \sum Y_t^* Y_{t-1}^* = 41014.33$$

$$\sum \hat u_{t,IV}\,\hat u_{t-1,IV} = -196899.12 \qquad \sum (X_t^*)^2 = 550599.31 \qquad \sum (Y_{t-1}^*)^2 = 46920.97$$

where $Y_t^* = Y_t - \hat\rho Y_{t-1}$, $X_t^* = X_t - \hat\rho X_{t-1}$, $Y_{t-1}^* = Y_{t-1} - \hat\rho Y_{t-2}$ and $\hat\rho$ is a consistent estimator of the parameter of the first-order autoregressive process.

e) Describe the consistent estimation of the parameter $\rho$ used to obtain $Y_t^*$, $X_t^*$ and $Y_{t-1}^*$ in the above expressions.
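The ratio ρ̂ = Σû_tû_{t−1}/Σû²_{t−1} can be evaluated with both sets of residual sums above. As a sketch, and assuming the transformed sums run over t = 3, ..., 51, the IV-based value reproduces the reported ΣX*_t and ΣY*_t, which suggests it is the ρ̂ actually used. That is consistent with the fact that OLS is inconsistent in a model with Y_{t−1} and AR(1) disturbances, so its residuals do not deliver a consistent ρ̂.

```python
# Sums reported in the exercise
s_uu_lag_iv = -196899.12     # sum of u_t,IV * u_{t-1},IV
s_ulag2_iv = 477634.63       # sum of u_{t-1},IV ^ 2
s_uu_lag_ols = 1331.60       # sum of u_t,OLS * u_{t-1},OLS
s_ulag2_ols = 3353.54        # sum of u_{t-1},OLS ^ 2

rho_iv = s_uu_lag_iv / s_ulag2_iv
rho_ols = s_uu_lag_ols / s_ulag2_ols

# Plausibility check, assuming the transformed sample is t = 3, ..., 51:
# drop X_2 = 13 and X_1 = 11 (and Y_2 = 8.9, Y_1 = 8.5) from the t = 2..51 sums
sum_x_star = (3323.4 - 13.0) - rho_iv * (3205.4 - 11.0)
sum_y_star = (1022 - 8.9) - rho_iv * (998.5 - 8.5)
print(f"rho_IV = {rho_iv:.4f}, rho_OLS = {rho_ols:.4f}")
print(f"sum X* = {sum_x_star:.2f}, sum Y* = {sum_y_star:.2f}")
```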

f) With the above information, is it possible to estimate the parameters of equation (1) with better properties than those of the IV estimator? Describe the proposed method and place the above sums in the matrix formulae of the corresponding estimator (but do not perform any calculations).

g) How would you test the null hypothesis H0 : β2 = 1? Describe in detail all the elements
intervening in the test.

EXERCISE 41 (LE-2006.4) (Sep-2006)

A farmer wants to measure the relationship between the amount of collected strawberries in
kilograms Q and the number of employed labourers L. An analysis is outsourced to a professional
econometrician who specifies the following model:

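$$Q_t = \beta_1 + \beta_2 L_t + u_t \qquad t = 1970, \dots, 2004 \qquad (1)$$

where $L_t$ is non-stochastic and $u_t$ has a normal distribution. The OLS estimation provides:

$$\hat Q_t = \underset{(36.62)}{1115.93} - \underset{(-14.20)}{2.4462}\,L_t \qquad R^2 = 0.8594 \quad DW = 0.3210 \quad T = 35 \qquad (2)$$

(t-statistics in parentheses)

The following regressions are also available, where $\hat u_t$ are the OLS residuals from (2):

$$\hat u_t = 31.25 - 0.1814\,L_t + 0.8958\,\hat u_{t-1} + \hat\zeta_{1t} \qquad RSS = 26981.8 \quad R^2 = 0.7041 \qquad (A)$$

$$\hat u_t = 1.1397 + 0.8958\,\hat u_{t-1} + \hat\zeta_{2t} \qquad RSS = 29807.6 \quad R^2 = 0.6731 \qquad (B)$$

$$\frac{\hat u_t^2}{\hat u'\hat u/35} = 0.4432 + 2.2378\,L_t + \hat\zeta_{3t} \qquad RSS = 70.4985 \quad R^2 = 0.0427 \qquad (C)$$

$$\frac{\hat u_t^2}{\hat u'\hat u/35} = 1.7899 + 0.9955\,\hat u_{t-1} + \hat\zeta_{4t} \qquad RSS = 55.2297 \quad R^2 = 0.0577 \qquad (D)$$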

and the following plots:

[Figure: left panel “Q and estimated Q” (actual and estimated Q, 1970–2004); right panel “Residuals from regression (= Q − estimated Q)”, 1970–2004.]

1. Is the sample composed of cross-section or time-series data? Why?

2. Interpret the coefficient β2 . Which sign is expected?

3. Comment on the plot of the actual and fitted values of the endogenous variable. Is it a good fit? Comment on the residuals plot. Given both graphs, do you think that the model satisfies all the basic assumptions?

4. Based on the provided information verify if the disturbances satisfy the basic assumptions.

5. Given the evidence you have found, explain its consequences for the OLS estimator of the coefficients and for the reliability of the statistics shown above.
6. Based on the results obtained above, the econometrician estimates model (1) using an estimator considered more appropriate in this context. The results are:

Model 2: Cochrane–Orcutt estimates using 34 observations 1971–2004
Dependent variable: Q
Final iteration ρ̂ = 0.976619

Variable   Coefficient   Std. error   t statistic   p value
const      1456.54       186.561      7.8073        0.0000
L          2.74197       1.13652      2.4126        0.0217

Which estimation method is used here? Specify all the steps needed to obtain these results. Why is it more appropriate than the previous one? Base your answer on the properties of the estimator.
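For question 4, regressions (A) and (C) can be turned into LM statistics. A sketch assuming the usual forms: Breusch–Pagan LM = ESS/2 for the regression of scaled squared residuals (C), with ESS recovered from the reported RSS and R², and Breusch–Godfrey LM = n·R² for (A), with n = 34 assumed because û_{t−1} is not available for 1970.

```python
def ess_from_r2(rss, r2):
    """ESS = RSS * R^2 / (1 - R^2), since R^2 = ESS / (ESS + RSS)."""
    return rss * r2 / (1.0 - r2)

# Breusch-Pagan on regression (C): LM = ESS / 2 ~ chi2(1) under homoscedasticity
bp_stat = ess_from_r2(70.4985, 0.0427) / 2

# Breusch-Godfrey on regression (A): LM = n * R^2 ~ chi2(1) under no AR(1)
# (assumption: (A) uses the 34 observations for which u_hat_{t-1} exists)
bg_stat = 34 * 0.7041

CHI2_1_5PCT = 3.841       # 5% critical value of chi-square with 1 degree of freedom
print(f"BP = {bp_stat:.3f}, BG = {bg_stat:.3f}, critical value = {CHI2_1_5PCT}")
```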

After a conversation with the farmer, it becomes known that a good harvest is generally followed by another good one and, conversely, a poor harvest is likely to be followed by another poor one. This leads the econometrician to think that the amount of strawberries harvested in the previous season could affect the current one. The econometrician then specifies and estimates the following model:

$$Q_t = \alpha_1 + \alpha_2 L_t + \alpha_3 Q_{t-1} + w_t \qquad (3)$$

Results of the estimation:

Model 3: OLS estimates using 34 observations 1971–2004
Dependent variable: Q

Variable   Coefficient   Std. error   t statistic   p value
const      90.9866       99.9536      0.9103        0.3697
L          -0.230355     0.115477     -1.9948       0.0470
Q_{t-1}    0.944638      0.0898926    10.5085       0.0000

Durbin–Watson statistic = 3.10304

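The following auxiliary regressions are also available:

$$\hat w_t = 21.32 - 0.1766\,L_t + 0.8788\,\hat w_{t-1} + \hat\eta_{1t} \qquad RSS = 25671.3 \quad R^2 = 0.4734 \qquad (E)$$

$$\hat w_t = 1.7943 + 0.2398\,\hat w_{t-1} + 0.5647\,Q_{t-1} + \hat\eta_{2t} \qquad RSS = 23398.1 \quad R^2 = 0.4767 \qquad (F)$$

$$\hat w_t = -255.47 + 0.579406\,L_t + 0.231059\,Q_{t-1} - 0.804475\,\hat w_{t-1} + \hat\eta_{3t} \qquad RSS = 10958.4 \quad R^2 = 0.4869 \qquad (G)$$

$$\frac{\hat w_t^2}{\hat w'\hat w/34} = 0.4432 + 2.2378\,L_t + \hat\eta_{4t} \qquad RSS = 77.8328 \quad R^2 = 0.05665 \qquad (H)$$

$$\frac{\hat w_t}{\hat w'\hat w/34} = 3.9229 + 2.2552\,\hat w_{t-1} + 0.3463\,Q_{t-1} + \hat\eta_{5t} \qquad RSS = 50.0805 \quad R^2 = 0.0064 \qquad (I)$$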

7. Perform the tests you consider relevant and complete (or explain in detail) the following equalities:

$E(w_t) =$

$E(w_t^2) =$

$Cov(w_t, w_s) =$

$E(L_t w_t) =$

$E(Q_{t-1} w_t) =$

$E(\hat\beta_{OLS}) =$
8. What can you say about the Mann–Wald theorem and the consistency of the OLS estimator?
9. In order to test whether the harvest from the previous season is a relevant factor in explaining the current harvest, a consistent and asymptotically efficient estimation procedure, valid for inference, has been applied to model (3), with the results:

$$\underbrace{Q_t - \hat\rho Q_{t-1}}_{Q_t^*} = 25.28\,\underbrace{(1-\hat\rho)}_{X_t^*} + \underset{(0.125)}{0.064}\,\underbrace{(L_t - \hat\rho L_{t-1})}_{L_t^*} + \underset{(0.048)}{1.067}\,(Q_{t-1} - \hat\rho Q_{t-2}) + \hat\epsilon_t \qquad (4)$$

(estimated standard deviations $\widehat{dev}(\hat\alpha_i)$ in parentheses)

$$R^2 = 0.981 \qquad DW = 1.98$$

where $\epsilon_t$ is white noise such that $\epsilon_t = w_t - \rho w_{t-1}$ and $w_t$ are the disturbances of model (3).

Fill in the blanks in the following expressions:

a) $\epsilon_t \sim (\quad , \quad)$

b)
$$\begin{pmatrix}\hat\alpha_1\\ \hat\alpha_2\\ \hat\alpha_3\end{pmatrix} = \begin{pmatrix}25.28\\ 0.064\\ 1.067\end{pmatrix} = \begin{pmatrix}\dots & \dots & \dots\\ \dots & \dots & \dots\\ \dots & \dots & \dots\end{pmatrix}^{-1}\begin{pmatrix}\dots\\ \dots\\ \dots\end{pmatrix}$$

c) Which consistent estimator of ρ has been used? Detail all the elements and conditions
that guarantee the consistency of the estimator of ρ.

d) Is it true that the harvest of the previous season is a factor determining the current harvest? What are the implications of this result?

EXERCISE 42 (LE-2008.2) (Jun-2008. Final examination. Applied test.)

The ALIMENTAX S.A. food store wants to implement an expansion policy within its region. For that purpose it has requested a management report on the consumption function in that area. The available data⁷ are annual, from 1959 to 1994, with observations of the following variables:

C: Real consumption in billion dollars.


W: Real wages in billion dollars.
P: Other Income (no wages), in constant terms, in billion dollars.

The manager estimates the following model by ordinary least squares:

$$C_t = \beta_1 + \beta_2 W_t + \beta_3 P_t + u_t \qquad t = 1, \dots, T \qquad (1)$$

with the following results:

Model 1: OLS estimates using 36 observations 1959–1994
Dependent variable: C

           Coefficient   Std. Error   t-ratio    p-value
const      −222.158      19.5527      −11.3620   0.0000
W          0.693262      0.0326064    21.2615    0.0000
P          0.735916      0.0488218    15.0735    0.0000

Residual sum of squares   38976.50    R²              0.998754
F(2, 33)                  13230.34    Durbin–Watson   0.969426
ρ̂                         0.494451    BG(1)           9.621

⁷ Ramanathan, R. (2002), Introductory Econometrics with Applications, South-Western.

PART 1:

a) What does the phrase “the variables are measured in constant terms” mean?
b) Interpret the parameter $\beta_2$.
c) Comment on the plot of the residuals below.

[Figure: Residuals from regression (= C − estimated C), plotted over 1959–1994.]

d) Do all the basic assumptions on the disturbances hold? Analyse the displayed results and fill in the blank elements in the matrices below according to the test or tests performed:

$$E(u) = \begin{pmatrix} \; \\ \; \\ \; \end{pmatrix} \qquad E(uu') = \begin{pmatrix} \; & & \\ & \cdots & \\ & & \; \end{pmatrix}$$

e) Assuming that Wt and Pt are non-stochastic, what are the properties of the OLS estimator
of the coefficients in model (1)? Justify your answer.

PART 2:

The manager is not convinced by the model specification and decides to consider two alternative specifications.
• Specification A:

$$C_t = \beta_1 + \beta_2 W_t + \beta_3 W_{t-1} + \beta_4 P_t + u_t \qquad t = 2, \dots, T$$

Estimating by OLS:

Specification A: OLS estimates using 35 observations 1960–1994
Dependent variable: C

           Coefficient   Std. Error   t-ratio    p-value
const      −223.323      21.9777      −10.1613   0.0000
W          0.618833      0.113718     5.4418     0.0000
W_1        0.0839831     0.108643     0.7730     0.4454
P          0.725303      0.0494033    14.6813    0.0000

Sum squared resid   36407.32    R²              0.998754
F(3, 31)            8284.444    Durbin–Watson   0.949518
ρ̂                   0.493482

f) Comment on the following statements, stating whether they are true or false and why:

i) “The OLS estimator used in Specification A is non-linear”.

ii) “The variance-covariance matrix of the estimated OLS coefficients in Specification A is $V(\hat\beta) = \sigma^2 (X'X)^{-1}$”. Perform any test you consider necessary.

• Specification B:

$$C_t = \beta_1 + \beta_2 W_t + \beta_3 P_t + \beta_4 C_{t-1} + u_t \qquad t = 2, \dots, T \qquad (2)$$

Estimating by OLS, the following results are obtained:

Results 1: OLS estimates using 35 observations 1960–1994
Dependent variable: C

           Coefficient   Std. Error   t-ratio   p-value
const      −155.770      33.1278      −4.7021   0.0001
W          0.513348      0.0766851    6.6942    0.0000
P          0.535774      0.0835316    6.4140    0.0000
C_1        0.270081      0.100359     2.6911    0.0114

Residual sum of squares   30081.45    R²                        0.998971
F(3, 31)                  10028.76    Durbin–Watson statistic   1.00858
ρ̂                         0.481818    BG(1)                     8.704344
BG(4)                     12.040592   Hausman                   11.7299

And the following plot of the OLS residuals:

[Figure: Residuals from regression (= C − estimated C), plotted over 1960–1994.]

g) Given all the results of the previous estimation, what can you say about the validity of
the displayed significance tests?

PART 3:

After analysing Results 1, the manager re-estimates model (2), obtaining:

Results 2: TSLS estimates using 35 observations 1960–1994
Dependent variable: C
Instruments: W_1

           Coefficient   Std. Error   t-stat    p-value
const      −202.339      38.7791      −5.2177   0.0000
W          0.632776      0.0918106    6.8922    0.0000
P          0.655223      0.0981252    6.6774    0.0000
C_1        0.0998823     0.122778     0.8135    0.4159

Residual sum of squares   32872.29    F(3, 31)        9175.339
ρ̂                         0.475857    Durbin–Watson   0.993249

h) Given the method of estimation used, fill in the blanks:

$$\hat\beta_{\dots\dots} = \begin{pmatrix} \qquad\qquad \\ \\ \\ \end{pmatrix}^{-1} \begin{pmatrix} \qquad \\ \\ \\ \end{pmatrix}$$
i) Why has the manager used this estimation method?

The following results are also available:

Results 3: TSLS estimates using 35 observations 1960–1994
Dependent variable: C
Instruments: P_1 W_1

           Coefficient   Std. Error   t-stat    p-value
const      −207.249      38.3003      −5.4111   0.0000
W          0.645366      0.0903268    7.1448    0.0000
P          0.667815      0.0968543    6.8950    0.0000
C_1        0.0819400     0.120362     0.6808    0.4960

Residual sum of squares   33491.72    F(3, 31)        9005.577
ρ̂                         0.473424    Durbin–Watson   0.995420

j) Explain the meaning of:


Instruments: W_1 P_1

k) Describe step by step the procedure that the manager has followed to obtain these results.
What is the difference between this estimator and that from Results 2?

l) Among the three results obtained for model (2), which one do you think is the best?
Reason your answer.
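Results 2 and 3 are TSLS estimates with C_{t−1} treated as endogenous (one instrument, W_1, in Results 2; two, P_1 and W_1, in Results 3). The sketch below uses simulated data: the names `c_lag`, `z1`, `z2`, `w` and all coefficients are hypothetical stand-ins, not the ALIMENTAX data. It checks that the matrix formula (X̂′X)⁻¹X̂′y, with X̂ = Z(Z′Z)⁻¹Z′X, coincides with an explicit two-stage computation.

```python
import numpy as np

def tsls(y, X, Z):
    """2SLS: beta = (X'P_Z X)^{-1} X'P_Z y with P_Z = Z (Z'Z)^{-1} Z'."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)   # first stage: fitted X
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

rng = np.random.default_rng(1)
n = 50_000
z1, z2, w = rng.normal(size=(3, n))        # two excluded instruments, one exogenous regressor
v = rng.normal(size=n)
u = 0.8 * v + rng.normal(size=n)           # u correlated with the endogenous regressor
c_lag = 0.5 * z1 + 0.5 * z2 + v            # endogenous regressor (role of C_1)
y = 1.0 + 0.6 * w + 0.3 * c_lag + u

X = np.column_stack([np.ones(n), w, c_lag])
Z = np.column_stack([np.ones(n), w, z1, z2])   # exogenous regressors instrument themselves
b_2sls = tsls(y, X, Z)

# Two explicit stages give the same point estimates
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_two_step = np.linalg.lstsq(X_hat, y, rcond=None)[0]
print(b_2sls, b_two_step)
```

Only the point estimates coincide: standard errors from a naive second-stage OLS use residuals computed with X̂ instead of X, which is why TSLS routines recompute them.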

EXAMS

EXERCISE 43 (GE.1) (May-2013)

A company dedicated to the assembly, sale and installation of windows wants to analyse its sales. A sample of monthly observations from January 2005 to December 2011 is available on the quantity of windows sold (V, in thousands of units) and the average price per window (P, in tens of euros). The company economist assumes that $P_t$ is non-stochastic and proposes the following model to be estimated with the available information:

$$V_t = \alpha_1 + \alpha_2 P_t + w_t \qquad (1)$$

The results obtained are:

OLS, using observations 2005:01–2011:12 (T = 84)
Dependent variable: V

           Coefficient   St. error   t statistic   p value
const      81.4536       12.8121     6.3576        0.0000
P          −0.486483     0.127807    −3.8064       0.0003

Residual sum of squares   30179.60    R²               0.150160
F(1, 82)                  14.48872    p value (of F)   0.000271
ρ̂                         0.321033    Durbin–Watson    1.333627

[Figure: left panel, OLS residuals, 2005–2012; right panel, “V observed and estimated”, 2005–2012.]

1) Given all the available information, are all the basic assumptions on the disturbances satisfied? Explain your answer based on the figures and the tests you may implement.

2) How could we determine, using the OLS estimator, whether price is a statistically significant variable? Explain in detail the testing procedure you propose.
Given the results above, the economist decides to take into account that sales and subsequent installation of windows are more frequent in the hot months (July, August and September) than in other months, and includes monthly dummy variables:

$$V_t = \beta_1 + \beta_2 P_t + \beta_3\, dm7_t + \beta_4\, dm8_t + \beta_5\, dm9_t + u_t \qquad (2)$$

where the dummy variables $dm7_t$, $dm8_t$ and $dm9_t$ are equal to one if the observation at time t is in July, August or September, respectively, and zero otherwise.

OLS, using observations 2005:01–2011:12 (T = 84)
Dependent variable: V

           Coefficient   St. error   t statistic   p value
const      71.8093       12.5757     5.7102        0.0000
P          −0.422747     0.123919    −3.4115       0.0010
dm7        5.89463       7.29872     0.8076        0.4217
dm8        22.3907       7.39779     3.0267        0.0033
dm9        11.8055       7.30058     1.6171        0.1098

$$\hat u_t = -0.04 + 0.012\,P_t + 0.34\,\hat u_{t-1} \qquad R^2 = 0.1160$$

$$\hat u_t = 0.84 - 0.008\,P_t + 0.11\,dm7_t - 0.065\,dm8_t - 0.0003\,dm9_t + 0.269\,\hat u_{t-1} \qquad R^2 = 0.0736$$

$$\hat u_t = -0.77 + 0.012\,P_t + 0.41\,dm7_t + 0.023\,dm8_t - 0.00453\,dm9_t + 0.69\,V_{t-1} \qquad R^2 = 0.0464$$

$$\hat u_t = 0.89 + 0.45\,\hat u_{t-1} \qquad R^2 = 0.1730$$

where $\hat u_t$ is the OLS residual in model (2).

3) Do you think that the inclusion of the dummy variables has influenced the characteristics
of the disturbances?

4) Given your answer to the previous question, what are the properties of the estimator of the coefficients used in Model (2)? What are the properties of the estimator of the standard deviation of $\hat\beta$? Explain your answer in detail.

The economist has some doubts about the method of estimation to be used and therefore performs some trials. The results of the estimation in the first trial are:

FIRST TRIAL

Calculating rho iteratively...

ITERATION   RHO       RSS
1           0.26913   23905.4
2           0.27711   23903.7
3           0.27734   23903.7

Cochrane–Orcutt, using observations 2005:02–2011:12 (T = 83)
Dependent variable: V
ρ̂ = 0.277338

           Coefficient   St. error   t statistic   p value
const      73.8902       12.5904     5.8688        0.0000
P          −0.435264     0.123017    −3.5382       0.0007
dm7        5.21858       6.93735     0.7522        0.4542
dm8        20.8686       7.37907     2.8281        0.0059
dm9        8.61689       6.93381     1.2427        0.2177

Durbin–Watson = 2.103826
Breusch–Godfrey (first-order autocorrelation) = 0.4019

5) Which is the method of estimation used by the economist in this first trial? Explain in
detail.

6) What is the improvement in the estimation of Model (2) that you expect to achieve with
this strategy?

The economist suspects that price may be a stochastic variable. The second trial gives the following results:

SECOND TRIAL

IV, using observations 2005:02–2011:12 (T = 83)
Dependent variable: V
Instruments: const P_1 dm7 dm8 dm9

           Coefficient   St. error   z         p value
const      54.0866       46.9992     1.1508    0.2498
P          −0.240913     0.471228    −0.5112   0.6092
dm7        5.10140       7.42620     0.6869    0.4921
dm8        23.7834       8.68417     2.7387    0.0062
dm9        11.8352       7.43271     1.5923    0.1113

Durbin–Watson = 1.495457
Breusch–Godfrey (first-order autocorrelation) = 17.823

7) Which method of estimation is the economist using in this second trial? Describe it in
detail and write down the elements in the matrices below needed to get the estimates.

$$\hat\beta_{\dots\dots} = \begin{pmatrix} \qquad\qquad \\ \\ \\ \end{pmatrix}^{-1} \begin{pmatrix} \qquad \\ \\ \\ \end{pmatrix}$$

8) Which conditions guarantee the consistency of the estimator?

The results of the estimation in the third trial are:

THIRD TRIAL

OLS, using observations 2005:02–2011:12 (T = 83)
Dependent variable: V

           Coefficient   St. error   t statistic   p value
const      61.0837       12.8676     4.7471        0.0000
P          −0.387008     0.120420    −3.2138       0.0019
dm7        6.12198       7.06271     0.8668        0.3887
dm8        21.5672       7.15355     3.0149        0.0035
dm9        5.52165       7.47845     0.7383        0.4626
V_1        0.240634      0.100527    2.3937        0.0191

Durbin–Watson = 2.123
Breusch–Godfrey (first-order autocorrelation) = 0.07005

[Figure: left panel, OLS residuals, 2005–2012; right panel, “V observed and estimated”, 2005–2012.]
9) Discuss the figures and compare them with those obtained before. What do you think is the reason for the differences observed?

10) Analyse the results obtained in the three trials and choose the best method of estimation. Explain your answer.

EXERCISE 44 (GE.2) (May-2013)

The owner of a restaurant wants to know whether spending on advertising (PUB, in euros) and the renovation carried out in the restaurant in January 2012 (REF, equal to 1 from January 2012 onwards and 0 before 2012) have a significant effect on the total number of meals served (M). For that purpose the owner has monthly data from January 2010 to October 2012, with which the following estimation has been obtained.

OLS, using observations 2010:01–2012:10 (T = 34)
Dependent variable: M

           Coefficient   St. error   t statistic   p value
const      368.262       244.045     1.5090        0.1414
PUB        0.976053      0.172945    5.6437        0.0000
REF        5.94283       115.430     0.0515        0.9593

Residual sum of squares = 483269.0    R² = 0.859077

1) Are the variables PUB and REF individually significant?

2) Analyse the following information and Figure 9 and explain its implications for the conclusions in the previous section.

Breusch-Pagan OLS test of homoscedasticity, using observations 2010:01–2012:10 (T = 34)
Dependent variable: scaled uhat^2

           Coefficient   St. error    t statistic   p value
  ---------------------------------------------------------------
  const    0.295674      1.90772      0.1550        0.8778
  PUB      0.00154489    0.00135192   1.143         0.2619

  Explained sum of squares = 30.9048

  Test statistic: LM = 15.452413
  P(Chi-square(1) > 15.452413) = 0.000441

Figure 9: OLS estimation. Left panel: residuals of the regression (= M observed − estimated) plotted against PUB; right panel: M observed and estimated, 2010–2012.

3) It is suspected that the variance of the disturbance could be a quadratic function of advertising expenditure. Propose a structure for the covariance matrix of the disturbances consistent with that suspicion and obtain the transformed model that solves the problem.
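One structure in line with question 3 is Var(u_i) = σ²PUB_i², i.e. E(uu′) = σ² diag(PUB_1², ..., PUB_T²); dividing the whole model by PUB_i then gives homoscedastic disturbances, which corresponds to a weight of 1/PUB² on the squared residuals. A sketch on simulated data, where every value (the range of `pub`, the split of the dummy `ref`, the coefficients) is hypothetical:

```python
import numpy as np

def wls_divide_by(y, X, s):
    """OLS on y_i/s_i = (X_i/s_i) b + u_i/s_i, the transformed model when Var(u_i) = sigma^2 * s_i^2."""
    s = np.asarray(s, dtype=float)
    return np.linalg.lstsq(X / s[:, None], y / s, rcond=None)[0]

rng = np.random.default_rng(5)
n = 2000
pub = rng.uniform(600, 1400, n)                # hypothetical advertising spending
ref = (np.arange(n) >= n // 2).astype(float)   # hypothetical 0/1 dummy
u = 0.2 * pub * rng.normal(size=n)             # sd proportional to PUB, so Var(u) is proportional to PUB^2
m = 100.0 + 1.3 * pub + 150.0 * ref + u
X = np.column_stack([np.ones(n), pub, ref])

b_wls = wls_divide_by(m, X, pub)

# Same estimator written as (X'WX)^{-1} X'Wy with weights w_i = 1/PUB_i^2
w = 1.0 / pub**2
b_gls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * m))
print(b_wls, b_gls)
```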

Given the previous suspicion, the model is re-estimated assuming a particular structure for the variance. The following results are obtained:

Weighted Least Squares, using observations 2010:01–2012:10 (T = 34)
Dependent variable: M
Variable used as weight: 1/PUB^2

           Coefficient   St. error   t statistic   p value
const      −142.862      325.929     −0.4383       0.6642
PUB        1.35565       0.230850    5.8724        0.0000
REF        185.105       43.0527     4.2995        0.0000

4) What method of estimation has been used? Why has it been chosen to estimate the model?

5) What would you tell the owner of the restaurant about the expenditure on advertising and the renovation of the restaurant?

EXERCISE 45 (GE.3) (June-2013)

To analyse the variables that affect the consumption of cigarettes in the U.S., data for 48 U.S. states in 1995 are available for the following variables⁸ (all of them in logarithms):

• l_pop: state population

• l_packpc: number of packs per capita

• l_income: state personal income (total, nominal)

• l_tax: average state, federal and local excise taxes for the fiscal year (exogenous and thus non-stochastic)

• l_avgprs: average price during the fiscal year, including sales taxes

⁸ Source: Stock, J.H. and Watson, M.W., Introduction to Econometrics.

For that purpose, the following model has been estimated by OLS:

$$\widehat{l\_packpc}_i = \underset{(1.1152)}{10.9745} + \underset{(0.24436)}{0.436418}\, l\_income_i - \underset{(0.25004)}{1.38842}\, l\_avgprs_i - \underset{(0.25466)}{0.474018}\, l\_pop_i \qquad (1)$$

T = 48    R² = 0.453    F(3, 44) = 12.146    σ̂ = 0.186
(standard errors in parentheses)

a) Interpret $\hat\beta_{l\_avgprs} = -1.388$. Are all the variables significant at the 5% significance level?

b) The researcher thinks that there may be heteroscedasticity in the disturbances of the model. In order to check that possibility he/she uses the following auxiliary regression:

$$\hat e_i^2 = \underset{(10.939)}{9.247} - \underset{(2.397)}{3.588}\, l\_income_i + \underset{(2.453)}{0.947}\, l\_avgprs_i + \underset{(2.498)}{3.452}\, l\_pop_i + \hat w_i \qquad (2)$$

T = 48    ESS = 11.856

where $\hat e_i^2 = \hat u_i^2/\hat\sigma^2$ with $\hat\sigma^2 = \sum \hat u_i^2 / T$, for $\hat u$ the OLS residuals. Explain in detail how you would use this result to test the hypothesis of homoscedasticity. What is the conclusion of the test?
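Regression (2) uses squared residuals scaled by σ̂² = Σû²_i/T, which is the setting of the original Breusch–Pagan statistic LM = ESS/2, distributed as χ² with as many degrees of freedom as slope regressors in the auxiliary regression (3 here) under homoscedasticity and normality. A sketch; as a cross-check, the same formula applied to the ESS reported in Exercise 44 reproduces the LM value printed there.

```python
def breusch_pagan_lm(ess):
    """Original Breusch-Pagan LM = ESS / 2 from the regression of u_hat^2 / sigma_hat^2
    on the candidate variables; chi2(q) under H0, q = number of slope regressors."""
    return ess / 2.0

lm_45 = breusch_pagan_lm(11.856)      # this exercise, q = 3
lm_44 = breusch_pagan_lm(30.9048)     # Exercise 44 output, printed LM = 15.452413
CHI2_3_5PCT = 7.815                   # 5% critical value of chi-square with 3 degrees of freedom
print(f"LM = {lm_45:.3f}, reject H0: {lm_45 > CHI2_3_5PCT}")
```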

c) Another researcher is reluctant to accept the results in equation (1) because he/she believes that the price (avgprs) is also affected by the quantity of cigarette packs demanded, and thus the disturbances in model (1) are likely to be correlated with l_avgprs. Therefore he/she proposes to estimate the model by IV using l_tax as the instrument, obtaining the following result:

$$\widehat{l\_packpc}_i = \underset{(1.1959)}{11.0754} + \underset{(0.25078)}{0.449578}\, l\_income_i - \underset{(0.27684)}{1.41622}\, l\_avgprs_i - \underset{(0.26066)}{0.486993}\, l\_pop_i \qquad (3)$$

T = 48    σ̂ = 0.18604
(standard errors in parentheses)

Explain in detail how this estimated model has been obtained. Why is l_tax chosen as the instrument?

d) Which estimated model, (1) or (3), should be used to analyse the consumption of cigarettes?
Implement the test you consider necessary to support your answer.
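For d), a commonly used single-coefficient form of the Hausman test compares the OLS and IV estimates of the l_avgprs coefficient: H = (β̂_IV − β̂_OLS)² / (V̂(β̂_IV) − V̂(β̂_OLS)), which is χ²(1) under H0 that l_avgprs is uncorrelated with the disturbance. A sketch with the reported numbers, assuming the standard errors in (1) and (3) are the relevant variance estimates:

```python
b_ols, se_ols = -1.38842, 0.25004   # model (1), coefficient on l_avgprs
b_iv, se_iv = -1.41622, 0.27684     # model (3), coefficient on l_avgprs

# Single-coefficient Hausman statistic; OLS is efficient under H0, so the variance difference should be positive
H = (b_iv - b_ols) ** 2 / (se_iv ** 2 - se_ols ** 2)
CHI2_1_5PCT = 3.841
print(f"H = {H:.4f}, reject H0: {H > CHI2_1_5PCT}")
```

The regression-based (augmented regression) version of the test is an alternative, and is the safer choice when the estimated variance difference is not positive.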

EXERCISE 46 (GE.4) (June-2013)

To analyse the relationship between the growth rates of consumption and of personal disposable income in the USA, the following model is proposed:

$$C_t = \alpha + \beta\, Yd_t + u_t$$

where

• $C_t$: quarterly growth rate of real personal consumption,

• $Yd_t$: quarterly growth rate of real personal disposable income.

The explanatory variable $Yd_t$ is assumed exogenous (non-stochastic) and the model is estimated by OLS using data from II-1947 to III-2003 with the result

$$\hat C_t = \underset{(0.00068906)}{0.00610406} + \underset{(0.050942)}{0.298581}\, Yd_t \qquad (1)$$

(standard errors in parentheses)

T = 226    R² = 0.133    σ̂ = 0.0079904

$$\sum_{t=1}^{226} \hat u_t^2 = 0.0143 \qquad \sum_{t=2}^{226} \hat u_t^2 = 0.0136 \qquad \sum_{t=2}^{226} (\hat u_t - \hat u_{t-1})^2 = 0.0329 \qquad \sum_{t=2}^{226} \hat u_t \hat u_{t-1} = -0.0023$$

where $\hat u_t$ are the OLS residuals.

a) Figure 10 displays the OLS residuals. Do you perceive any problem?

b) Test if the disturbances ut behave as an autoregressive process of order 1.
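For b), the DW statistic follows directly from the reported sums, and Σû_tû_{t−1} gives a rough estimate of ρ. A sketch; Σ_{t=2}^{226} û²_{t−1} is not reported, so Σ_{t=1}^{226} û²_t is used as an approximation of that denominator. The bounds d_L and d_U for T = 226 and one regressor must be taken from tables, and since DW is above 2 the relevant comparison is 4 − DW against them (test against negative autocorrelation).

```python
sum_u2 = 0.0143          # sum_{t=1}^{226} u_t^2
sum_du2 = 0.0329         # sum_{t=2}^{226} (u_t - u_{t-1})^2
sum_uu_lag = -0.0023     # sum_{t=2}^{226} u_t u_{t-1}

dw = sum_du2 / sum_u2
rho_approx = 1 - dw / 2               # large-sample link DW ~ 2(1 - rho)
rho_ratio = sum_uu_lag / sum_u2       # denominator approximates sum_{t=2} u_{t-1}^2
print(f"DW = {dw:.4f}, 4 - DW = {4 - dw:.4f}, rho ~ {rho_approx:.3f} / {rho_ratio:.3f}")
```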

c) If the disturbances were autocorrelated, what would be the effects on the results (estimated coefficients and standard errors) shown in equation (1)?

d) Another researcher thinks that there is first order autoregressive autocorrelation in the
disturbances and estimates the model by Cochrane-Orcutt obtaining the results:

Figure 10: OLS residuals (regression residuals = observed − fitted C), plotted over 1947–2003.

$$\hat C_t = \underset{(0.00060609)}{0.00513029} + \underset{(0.049777)}{0.403364}\, Yd_t \qquad (2)$$

T = 225    σ̂ = 0.0077757    ρ̂ = −0.22361
(standard errors in parentheses)

Explain in detail how this estimated model has been obtained.

e) Let $\hat u_t^*$ be the OLS residuals in the transformed model used to obtain the results in equation (2). The following auxiliary regression has been obtained by OLS:

$$\hat u_t^* = 0.001\,X_{1t}^* - 0.012\,X_{2t}^* + 0.055\,\hat u_{t-1}^* + \hat\epsilon_t \qquad (3)$$

R² = 0.052    RSS = 0.012    ESS = 0.043    DW = 1.97

where $X_{1t}^*$ and $X_{2t}^*$ are the explanatory variables in the transformed model. Test whether there is autocorrelation in the transformed model, explaining clearly all the elements of the test (null and alternative hypotheses, test statistic, distribution, ...).

f) A third researcher thinks that the relationship between consumption and personal disposable income is dynamic and obtains by OLS the following estimated model:

$$\hat C_t = \underset{(0.00084691)}{0.00520055} + \underset{(0.052357)}{0.370505}\, Yd_t + \underset{(0.053297)}{0.211723}\, Yd_{t-1} - \underset{(0.067410)}{0.186264}\, C_{t-1} \qquad (4)$$

T = 225    R² = 0.209    RSS = 0.013    DW = 2.031
(standard errors in parentheses)

$$\hat v_t = -0.001 - 0.001\, Yd_t - 0.051\, Yd_{t-1} + 0.172\, C_t + 0.172\, C_{t-1} - 0.188\, \hat v_{t-1} + \hat\epsilon_t \qquad (5)$$

R² = 0.0032    ESS = 0.0043    DW = 1.34

where $\hat v_t$ are the OLS residuals of the estimated model (4). Can you say anything about whether the basic assumptions on the disturbances are satisfied?
g) Which estimated model (1), (2) or (4) is more adequate to explain the relationship between
consumption and personal disposable income? Why?

EXERCISE 47 (GE.5) (June-2013)

A researcher wants to estimate the following model

$$Y_t = \beta_0 + \beta_1 X_t + u_t \qquad t = 1, \dots, T$$

where all the basic assumptions of the GLRM are satisfied. However, $Y_t$ is not directly observable: it is measured with error as $Y_t^*$, such that $Y_t = Y_t^* + \epsilon_t$, $\epsilon \sim N(0, \sigma_\epsilon^2 I)$, and the researcher only has data to estimate the model:

$$Y_t^* = \beta_0^* + \beta_1^* X_t + u_t^* \qquad t = 1, \dots, T$$

Assuming that $u_t$ and $\epsilon_t$ are independent:

a) Explain the relationship between the coefficients of the original model and those of the model to be estimated.

b) What are the characteristics of the disturbances $u_t^*$ in the model to be estimated?

c) Consider the OLS estimators $\hat\beta_0^*$ and $\hat\beta_1^*$. Are they unbiased estimators of $\beta_0^*$ and $\beta_1^*$? Prove it.

d) Would your answer to the previous question change if Xt were a stochastic variable?
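A Monte Carlo sketch can make parts a) to c) concrete: since Y_t = Y*_t + ε_t, the observed Y*_t equals β0 + β1X_t + (u_t − ε_t). All values below (β0 = 1, β1 = 2, σ_u = 1, σ_ε = 0.5, a fixed grid for X) are hypothetical. The sketch reports the average OLS slope across replications and the average residual variance.

```python
import numpy as np

rng = np.random.default_rng(2)
T, reps = 200, 2000
beta0, beta1 = 1.0, 2.0                 # hypothetical true values
x = np.linspace(0.0, 10.0, T)           # fixed (non-stochastic) regressor
X = np.column_stack([np.ones(T), x])

slopes = np.empty(reps)
resid_var = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, 1.0, T)         # sigma_u = 1
    eps = rng.normal(0.0, 0.5, T)       # measurement error, sigma_eps = 0.5, independent of u
    y = beta0 + beta1 * x + u
    y_obs = y - eps                     # observed Y* with Y = Y* + eps
    b = np.linalg.lstsq(X, y_obs, rcond=None)[0]
    slopes[r] = b[1]
    resid_var[r] = np.sum((y_obs - X @ b) ** 2) / (T - 2)

print(f"mean slope = {slopes.mean():.4f}, mean s^2 = {resid_var.mean():.4f}")
```

The mean slope stays at β1, while s² averages about σ_u² + σ_ε² = 1.25.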

EXERCISE 48 (GE.6) (May-2014)

A group of researchers in an NGO wants to analyse the factors that affect global warming. To that end they propose the following model:

$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + u_t \qquad (1)$$

where

• $Y_t$: temperature in a particular place at time t.

• $X_{1t}$: number of sunspots at time t (non-stochastic).

• $X_{2t}$: index of CO2 emitted to the atmosphere at time t (non-stochastic).

The model is estimated by OLS with a sample of 100 monthly observations, obtaining the following results⁹:

$$\hat Y_t = \underset{(1.11)}{10.97} + \underset{(0.04)}{0.41}\, X_{1t} + \underset{(0.01)}{0.03}\, X_{2t} \qquad (2)$$

(standard errors in parentheses)

T = 100    R² = 0.253

$$\sum_{t=1}^{100} \hat u_t^2 = 34.20 \qquad \sum_{t=2}^{100} \hat u_t^2 = 33.32 \qquad \sum_{t=2}^{100} \hat u_{t-1}^2 = 32.87$$

$$\sum_{t=2}^{100} \hat u_t \hat u_{t-1} = 24.20 \qquad \sum_{t=2}^{100} (\hat u_t - \hat u_{t-1})^2 = 4.23$$

a) Are X1t and X2t individually significant factors to explain global warming?

b) Do you think that the disturbances ut satisfy all basic assumptions? Base your answer on
some formal test.

c) Based on your answer to the previous question, comment on the validity of your answer in
a) and the properties of the OLS estimator of model (1).

d) One of the researchers thinks that temperature exhibits some time dependence and proposes the following dynamic model:

$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \beta_3 Y_{t-1} + v_t \qquad (3)$$

The model is estimated by OLS, obtaining

$$\hat Y_t = \underset{(2.11)}{8.97} + \underset{(0.11)}{0.36}\, X_{1t} + \underset{(0.12)}{0.14}\, X_{2t} + \underset{(0.04)}{0.37}\, Y_{t-1} \qquad (4)$$

(standard errors in parentheses)

T = 99    R² = 0.503    DW = 2.03

and the auxiliary regression

$$\hat v_t = 0.004 + 0.001\, X_{1t} - 0.012\, X_{2t} + 0.32\, Y_{t-1} + 0.055\, \hat v_{t-1} + \hat\epsilon_t \qquad (5)$$

R² = 0.022    ESS = 0.003    DW = 1.97

What do you think about the fulfilment of the basic assumptions on the disturbances in model (3)?

e) Taking into account this new estimation, would you change your answers to questions a)
and c)?

⁹ Fictitious results, not based on real data.
EXERCISE 49 (GE.7) (May-2014)

In order to analyse the factors that affect the wage of an individual, the following model is
proposed:
$$lw_i = \beta_0 + \beta_1 Edu_i + \beta_2 age_i + u_i \qquad (1)$$

where

• lwi : logarithm of the hourly wage (in cents) of individual i,

• Edui : years of schooling of individual i,

• agei : age of individual i, in years.

With a sample of 3010 individuals recorded in 1976, the following results have been obtained by
OLS:

$$\widehat{lw}_i = \underset{(0.076)}{4.422} + \underset{(0.003)}{0.052}\,Edu_i + \underset{(0.002)}{0.041}\,age_i$$

$$T = 3010, \quad R^2 = 0.1808, \quad \hat{\sigma} = 0.40169$$

(standard errors in parentheses)

with OLS residuals ûi :

Figure 11: OLS residuals (= observed − fitted lw) plotted against (a) Edu and (b) age.

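The following auxiliary regression has also been estimated by OLS:

$$\frac{\hat{u}_i^2}{\hat{\sigma}^2} = \underset{(0.289)}{1.072} - \underset{(0.010)}{0.011}\,Edu_i + \underset{(0.009)}{0.002}\,age_i + \hat{w}_i$$

$$T = 3010, \quad ESS = 2.606, \quad \hat{\sigma}^2 = \sum \hat{u}_i^2 / 3010$$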

a) Do you find any evidence that some of the basic hypotheses on the disturbances fail? Use all
the information provided, including Figure 11 and the auxiliary regression.

b) Interpret β̂1 = 0.052. According to your answer in a), what are the properties of β̂1,
assuming that Edu and age are non-stochastic?

c) Another researcher thinks that Edui does not fully reflect the education of individual i,
but is a proxy for the true level of education, edi , such that Edui = edi + εi , where the
measurement error εi and edi are independent of each other. If ed is the factor that affects
wages, what are the properties of the OLS estimator of the coefficients in model (1)?
Explain in detail.

d) This researcher estimates the model using the variables Edufi (years of education of the
father of individual i) and Edumi (years of education of the mother of individual i) as
instruments for Edui . The following results are obtained:

$$\hat{\beta}_{IV} = \begin{pmatrix} 4.012 \\ 0.077 \\ 0.043 \end{pmatrix}, \qquad \widehat{Var}(\hat{\beta}_{IV}) = \begin{pmatrix} 0.01321 & 0.00852 & 0.00745 \\ & 0.00004 & 0.00003 \\ & & 0.00001 \end{pmatrix}$$

Explain in detail how these estimates have been obtained.

e) Explain the properties of this estimator, stating clearly the assumptions needed for them to
hold.

f) Use a formal test to check whether this researcher was right in his/her suspicion about the
education variable.

g) Taking into account all your answers to the previous questions, test whether education has a
positive effect on wages.
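For question d), this is a minimal two-stage least squares sketch. It uses simulated data with hypothetical names and parameter values, not the 1976 sample. In the simulation the observed regressor Edu = ed + ε carries measurement error, and two instruments stand in for Edufi and Edumi. Both instruments are correlated with ed but not with the errors. With more instruments than endogenous regressors, the IV estimator is β̂IV = (X'PZ X)⁻¹X'PZ y, where PZ = Z(Z'Z)⁻¹Z'. Z contains the exogenous regressors plus the excluded instruments.

```python
import numpy as np


def tsls(y, X, Z):
    """Two-stage least squares: beta = (X' P_Z X)^{-1} X' P_Z y.

    X holds all regressors (constant included); Z holds the exogenous
    regressors plus the excluded instruments."""
    # First stage: project every column of X on Z
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: (X_hat' X)^{-1} X_hat' y, since X_hat' X = X' P_Z X
    beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
    # Residuals use the ORIGINAL regressors, not the fitted ones
    resid = y - X @ beta
    sigma2 = resid @ resid / len(y)          # asymptotic (divide by n) version
    var_beta = sigma2 * np.linalg.inv(X_hat.T @ X_hat)
    return beta, var_beta


# Hypothetical simulated data (illustration only)
rng = np.random.default_rng(0)
n = 20_000
edu_f = rng.normal(10, 3, n)                 # stand-in for father's education
edu_m = rng.normal(10, 3, n)                 # stand-in for mother's education
age = rng.uniform(24, 34, n)
ed = 2 + 0.4 * edu_f + 0.4 * edu_m + rng.normal(0, 2, n)   # true education
edu = ed + rng.normal(0, 2, n)               # observed with measurement error
lw = 4.0 + 0.08 * ed + 0.04 * age + rng.normal(0, 0.4, n)

X = np.column_stack([np.ones(n), edu, age])
Z = np.column_stack([np.ones(n), edu_f, edu_m, age])

beta_ols = np.linalg.lstsq(X, lw, rcond=None)[0]
beta_iv, var_iv = tsls(lw, X, Z)
print("OLS:", beta_ols.round(3))   # education coefficient biased towards zero
print("IV: ", beta_iv.round(3))
```

In this simulation the OLS slope on the mismeasured regressor is attenuated, while the IV slope is close to the value used to generate the data.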

EXERCISE 50 (GE.8) (Jun-2014)

A researcher wants to analyse the effects of economic globalization on unemployment (Yt ).
An index based on the euro/US dollar exchange rate, Xt , which is assumed to be non-stochastic,
is used as a proxy for the degree of economic globalization. The sample consists of monthly
data on both X and Y .

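The results obtained with an OLS regression are:

$$\hat{Y}_t = \underset{(0.002)}{0.0004} + \underset{(0.066)}{0.064}\,X_t \qquad (1)$$

(estimated standard deviations in parentheses)

$$R^2 = 0.002, \quad T = 435, \quad RSS = 0.820, \quad DW = 1.425$$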

Figure 12: OLS residuals of model (1) over time (horizontal axis 1963–1998).

a) What would you say about the residuals in Figure 12?

b) Is Xt individually significant?

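c) Next, the following OLS regressions are obtained:

From 1962 to 1975:

$$\hat{Y}_t = \underset{(0.006)}{0.005} - \underset{(0.362)}{0.102}\,X_t \qquad (2)$$

$$R^2 = 0.0005, \quad T_1 = 155, \quad RSS = 0.753, \quad DW = 1.441$$

From 1983 to 1999:

$$\hat{Y}_t = \underset{(0.0007)}{-0.002} + \underset{(0.020)}{0.067}\,X_t \qquad (3)$$

$$R^2 = 0.055, \quad T_2 = 196, \quad RSS = 0.021, \quad DW = 0.997$$

(estimated standard deviations in parentheses)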

Figure 13: OLS residuals of models (2) and (3): left panel 1962–1975 (vertical axis from −0.24 to 0.32); right panel 1983–1999 (vertical axis from −0.032 to 0.032).

Compare the graphs of the residuals of models (2) and (3) with the residuals for the whole sample in
Figure 12. Test for the presence of heteroscedasticity in the whole sample. Explain clearly
all the elements of the test.

d) Do you think that the OLS estimation in (1) is adequate? And the test made in question b)?

e) Test the hypothesis of no autocorrelation in the second subsample: from 1983 to 1999.
Explain clearly all the elements of the testing procedure.

f) Finally, a new model including Yt−1 as a regressor is estimated by OLS for the period
1983–1999 (196 observations). Its OLS residuals v̂t are used in the auxiliary regression (5).

$$\hat{Y}_t = \underset{(0.0007)}{-0.0009} + \underset{(0.018)}{0.047}\,X_t + \underset{(0.061)}{0.480}\,Y_{t-1}, \qquad R^2 = 0.281 \qquad (4)$$

$$\hat{v}_t = \underset{(0.0007)}{0.0002} - \underset{(0.136)}{0.152}\,\hat{v}_{t-1} - \underset{(0.018)}{0.002}\,X_t + \underset{(0.117)}{0.116}\,Y_{t-1}, \qquad R^2 = 0.006 \qquad (5)$$

(estimated standard deviations in parentheses)

Compare the results in models (3) and (4) and explain the properties of the OLS estimator
in both models. Run all the tests you judge necessary.
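For question c), one possible formal test is a Goldfeld–Quandt-type F test, sketched here under stated assumptions. It compares the residual variances of the two subsample regressions (2) and (3). The years 1976–1982, which belong to neither subsample, play the role of the omitted central observations. Each regression is assumed to have k = 2 estimated coefficients, and the larger variance goes in the numerator. Under homoscedasticity (and no autocorrelation) the statistic is compared with an F(T1 − 2, T2 − 2) critical value from tables.

```python
# Goldfeld-Quandt-type statistic from the two subsample regressions
k = 2                                   # coefficients per regression (constant + X_t)
rss1, T1 = 0.753, 155                   # model (2), 1962-1975
rss2, T2 = 0.021, 196                   # model (3), 1983-1999

s2_1 = rss1 / (T1 - k)                  # residual variance, first subsample
s2_2 = rss2 / (T2 - k)                  # residual variance, second subsample
gq = s2_1 / s2_2                        # larger variance in the numerator
df = (T1 - k, T2 - k)

print(f"GQ = {gq:.2f} ~ F{df} under H0 of homoscedasticity")
```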

EXERCISE 51 (GE.9) (Jun-2014)

Consider the following model:


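$$Y_t = \beta X_t + u_t \qquad (1)$$
where $u_t \sim iid(0, \sigma_u^2)$ and Xt is non-stochastic but not observable.

However, the variable Z1t is observable, and it is known that

$$Z_{1t} = X_t + \varepsilon_t, \qquad \varepsilon_t \sim iid(0, \sigma_\varepsilon^2) \qquad (2)$$

where $E(\varepsilon_t u_t) = 0 \ \forall t$.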

a) Starting from equation (1) propose an estimable model based on Yt and Z1t .

b) Prove the inconsistency of the OLS estimator of β in the following model:

$$Y_t = \beta Z_{1t} + v_t, \qquad t = 1, 2, \ldots, T \qquad (3)$$

c) Assume now that observations of two exogenous variables, Z2t and Z3t , are available, and that
both of them are correlated with Z1t . Bearing in mind this new information, how would
you estimate β consistently? How would you estimate the variance of the disturbances in
model (3)?
d) Estimation of the model in equation (3) has led to the following results:

$$\hat{\beta}_{OLS} = 0.052, \quad \hat{\beta}_{IV} = 0.077, \quad \widehat{Var}(\hat{\beta}_{OLS}) = 0.000009, \quad \widehat{Var}(\hat{\beta}_{IV}) = 0.00004$$

Test if the measurement error is important. Based on the results of the test, which method
of estimation is more reliable? Why?
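For question d), this sketches the Hausman contrast for a single coefficient, using the reported estimates: H = (β̂IV − β̂OLS)²/(V̂ar(β̂IV) − V̂ar(β̂OLS)). The simplified variance of the difference relies on OLS being efficient under the null of no relevant measurement error. Under that null, H is asymptotically χ²(1). The variance difference must be positive for the statistic to be usable.

```python
import math

b_ols, b_iv = 0.052, 0.077
var_ols, var_iv = 0.000009, 0.00004

# Under H0 OLS is consistent and efficient, so asymptotically
# Var(b_iv - b_ols) = Var(b_iv) - Var(b_ols).
var_diff = var_iv - var_ols
assert var_diff > 0, "variance difference must be positive"
h = (b_iv - b_ols) ** 2 / var_diff
p_value = math.erfc(math.sqrt(h / 2.0))   # chi-square(1) upper tail
print(f"H = {h:.2f}, p-value = {p_value:.1e}")
```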

EXERCISE 52 (GE.10) (May-2015)

The World Health Organization is concerned about the differences in life expectancy around the
world and starts a research project to find the causes of these differences. As a first step, the
following model is proposed:

$$lifeex_i = \beta_0 + \beta_1 PopDoc_i + \beta_2 GDP_i + u_i \qquad (1)$$

where

• Lifeex = Life expectancy at birth.

• PopDoc = Population per doctor (assumed non-stochastic).

• GDP = real GDP (Gross Domestic Product) per capita.

The model is estimated by OLS with a sample of 119 countries, obtaining the following results:

$$\widehat{lifeex}_i = \underset{(0.978)}{60.598} - \underset{(0.00004)}{0.00030}\,PopDoc_i + \underset{(0.00009)}{0.0010}\,GDP_i$$

$$N = 119, \quad R^2 = 0.706, \quad F(2, 116) = 139.59, \quad \hat{\sigma} = 5.7528$$
(standard errors in parentheses)

The researcher is worried that GDP may be correlated with the disturbances.
To address this, he/she also estimates the model by IV, using TV (televisions per 100 people)
as an instrument for GDP:

$$\widehat{lifeex}_i = \underset{(1.1238)}{59.4883} - \underset{(0.00004)}{0.00027}\,PopDoc_i + \underset{(0.00012)}{0.00113}\,GDP_i$$

(standard errors in parentheses)

a) What are the properties of the OLS estimator in Model (1) if $u_i \sim iid(0, \sigma_u^2)$? Base your
answer on some formal test.
b) Do the estimated coefficients have the expected signs?

Assume hereafter that GDP is an exogenous (non-stochastic) variable. Consider the following
graphs of the OLS residuals:

Figure 14: OLS residuals (= observed − fitted lifeex) plotted against (a) GDP per capita and (b) PopDoc.

c) Based on the graphs of the OLS residuals, do you think that the disturbances ui satisfy all
basic assumptions?

d) The researcher in charge of the investigation also estimates the model by OLS using the 43
observations with the smallest GDP per capita and the 43 with the largest GDP per capita,
obtaining:

Sample: 43 countries with low GDP

$$\widehat{lifeex}_i = \underset{(2.214)}{44.400} - \underset{(0.00004)}{0.00006}\,PopDoc_i + \underset{(0.0012)}{0.0076}\,GDP_i$$

$$N = 43, \quad R^2 = 0.573, \quad RSS = 979.859, \quad \hat{\sigma} = 4.9494$$
(standard errors in parentheses)

Sample: 43 countries with large GDP

$$\widehat{lifeex}_i = \underset{(1.245)}{70.399} - \underset{(0.0007)}{0.0013}\,PopDoc_i + \underset{(0.00006)}{0.00036}\,GDP_i$$

$$N = 43, \quad R^2 = 0.537, \quad RSS = 167.218, \quad \hat{\sigma} = 2.045$$
(standard errors in parentheses)

With this information, can you add something to your answer in question c) about the
fulfillment of the basic assumptions of the disturbances in Model (1)?

e) Not convinced by the results obtained with OLS, the researcher proposes to estimate the
parameters in Model (1) by applying OLS to a transformed model. The transformation
consists in multiplying dependent and explanatory variables by the square root of GDP,
obtaining the following results

GLS: WLS, using observations 1–119
Dependent variable: lifeex
Variable used as weight: GDP

            Coefficient    Std. Error     t-ratio    p-value
const       65.0973        0.806408       80.7250    0.0000
PopDoc      −0.000366400   5.50281e-05    −6.6584    0.0000
GDP         0.000640313    5.56696e-05    11.5020    0.0000

Statistics based on the weighted data:

Sum squared resid    10079013    S.E. of regression    294.7678
R2                   0.698268    Adjusted R2           0.693066
F(2, 116)            134.2236    P-value(F)            6.58e-31

Describe in detail the method of estimation that he/she is proposing. When is this esti-
mated model better than the one obtained by OLS? Why?

f) The following regression is also estimated with the OLS residuals of the transformed model
(denoted û∗i ):

$$\widehat{e_i^2} = \underset{(0.236)}{0.831} + \underset{(0.00001)}{0.00003}\,PopDoc_i + \underset{(0.00002)}{0.0000006}\,GDP_i$$

$$N = 119, \quad R^2 = 0.0418, \quad RSS = 224.17, \quad \hat{\sigma} = 1.3901$$

(standard errors in parentheses)

where $e_i^2 = \hat{u}_i^{*2} / \hat{\sigma}^2$ for $\hat{\sigma}^2 = \sum \hat{u}_i^{*2} / 119$. Taking into account that one of the main
objectives of this investigation is the analysis of the effects of the number of doctors per capita
on life expectancy, test the significance of the variable PopDoc.
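This sketches the estimator described in question e). Multiplying every variable by √GDP_i and applying OLS gives weighted least squares with weights GDP_i. This must include the column of ones for the constant. The result is GLS when Var(u_i) is proportional to 1/GDP_i. The data below are simulated, with variables in thousands to keep the numerics well conditioned. The variance pattern is an assumption for illustration only. The check confirms that the transformed OLS coincides with (X'WX)⁻¹X'Wy for W = diag(GDP).

```python
import numpy as np


def wls_by_transformation(y, X, w):
    """OLS on sqrt(w)-scaled data, which equals weighted least squares with weights w."""
    s = np.sqrt(w)
    X_star = X * s[:, None]   # scale every column, including the constant
    y_star = y * s
    return np.linalg.lstsq(X_star, y_star, rcond=None)[0]


# Hypothetical data (units in thousands): Var(u_i) = 4 / gdp_i, so weights = gdp_i
rng = np.random.default_rng(1)
n = 119
gdp = rng.uniform(0.5, 20.0, n)
popdoc = rng.uniform(0.5, 70.0, n)
u = rng.normal(0.0, np.sqrt(4.0 / gdp))
lifeex = 60.0 - 0.3 * popdoc + 1.0 * gdp + u
X = np.column_stack([np.ones(n), popdoc, gdp])

beta_wls = wls_by_transformation(lifeex, X, gdp)

# Same estimator written as (X' W X)^{-1} X' W y with W = diag(gdp)
W = np.diag(gdp)
beta_check = np.linalg.solve(X.T @ W @ X, X.T @ W @ lifeex)
print(beta_wls.round(4), beta_check.round(4))
```

After the transformation the constant term becomes the regressor √GDP_i, so the transformed regression has no column of ones.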

EXERCISE 53 (GE.11) (May-2015)

A group of researchers wants to analyse the factors that affect the consumption of spirits. To
that end the following model is proposed:

$$Q_t = \beta_0 + \beta_1 I_t + u_t \qquad (1)$$

where:

• Qt : Growth rate of the consumption of spirits in year t,

• It : Growth rate of the income per capita in year t (assumed exogenous, non-stochastic).

The following estimated model has been obtained by OLS with a sample of annual observations
from 1870 to 1938¹⁰:

$$\hat{Q}_t = \underset{(0.0047)}{-0.0144} + \underset{(0.2512)}{0.8386}\,I_t$$

$$T = 68, \quad R^2 = 0.1444, \quad DW = 1.4584, \quad \hat{\sigma} = 0.0370$$
(standard errors in parentheses)

The OLS residuals are plotted in Figure 15:

Figure 15: OLS residuals (= observed − fitted Q) over time, 1870–1938.

a) Do you find any evidence that some of the basic hypotheses on the disturbances fail? Use all
the information provided, including Figure 15.

Three researchers of the group consider that the estimation can be improved in different ways.
The first one thinks that the disturbances are autocorrelated and proposes to estimate Model
(1) using Cochrane-Orcutt. The results are

C-O: Cochrane–Orcutt, using observations 1872–1938 (T = 67)
Dependent variable: Q
ρ̂ = 0.268254

         Coefficient   Std. Error   t-ratio   p-value
const    −0.0152441    0.00609762   −2.5000   0.0150
I         0.886346     0.245357      3.6125   0.0006

Statistics based on the rho-differenced data:

Mean dependent var   −0.010557   S.D. dependent var   0.039817
Sum squared resid     0.083146   S.E. of regression   0.035766
R2                    0.212529   Adjusted R2          0.200414
BG(1)                 2.094274   Durbin–Watson        1.791301

¹⁰ Source: J. Durbin and G.S. Watson, "Testing for Serial Correlation in Least Squares Regression, II", Biometrika, vol. 38, pp. 159–178.

b) Describe in detail how the value BG(1) (Breusch–Godfrey test for first-order autocorrelation)
has been obtained (note that the statistic is based on the rho-differenced data).

c) Is there any improvement in the properties of the estimated coefficients with respect to OLS
in Model (1)?

The second researcher thinks that the price of spirits should be included in the model to explain
spirits consumption and proposes the following model

$$Q_t = \beta_0 + \beta_1 I_t + \beta_2 P_t + v_t \qquad (2)$$

where Pt is the growth rate of the prices of spirits in year t (assumed non-stochastic). The
model estimated by OLS is

$$\hat{Q}_t = \underset{(0.0028)}{-0.0070} + \underset{(0.1462)}{0.7475}\,I_t - \underset{(0.0765)}{0.8740}\,P_t$$

$$T = 68, \quad R^2 = 0.7155, \quad DW = 2.205, \quad BG(1) = 0.98$$

(standard errors in parentheses)

d) Taking into account this new information, would you change your answer to question c)?

Finally, the third researcher thinks that the dynamics of spirits consumption should be
included, as in

$$Q_t = \beta_0 + \beta_1 I_t + \beta_2 P_t + \beta_3 Q_{t-1} + w_t \qquad (3)$$

This new model, estimated by OLS, is

$$\hat{Q}_t = \underset{(0.0028)}{-0.0071} + \underset{(0.1584)}{0.7820}\,I_t - \underset{(0.0854)}{0.8479}\,P_t + \underset{(0.0776)}{0.0498}\,Q_{t-1}$$

$$T = 67, \quad R^2 = 0.7158, \quad DW = 2.125, \quad BG(1) = 1.83$$
(standard errors in parentheses)

e) Use the estimated model you consider most adequate to test whether the growth rate of income
per capita is a significant variable in explaining the variations in the consumption of spirits.
What evidence do you use to choose the estimated model in which the test is
implemented?
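For question b), this sketches how a BG(1) statistic is typically computed. The residuals are regressed on the original regressors and the lagged residual, and LM = T·R² is used, which is asymptotically χ²(1) under no first-order autocorrelation. In the Cochrane–Orcutt case, the residuals and regressors would be the rho-differenced ones. The data here are simulated for illustration. The way the first observation is handled (dropped here rather than padding ê₀ = 0) is an assumption, and software packages may use a different convention.

```python
import numpy as np


def breusch_godfrey_1(resid, X):
    """LM = T * R^2 from regressing e_t on X_t and e_{t-1} (t = 2..T).

    The first observation is dropped rather than padding e_0 with zero.
    X must contain a constant column, so the centred R^2 is appropriate."""
    e = resid[1:]
    Z = np.column_stack([X[1:], resid[:-1]])
    coef = np.linalg.lstsq(Z, e, rcond=None)[0]
    fitted = Z @ coef
    r2 = 1.0 - np.sum((e - fitted) ** 2) / np.sum((e - e.mean()) ** 2)
    return len(e) * r2


# Hypothetical illustration: OLS residuals from a model with AR(1) errors
rng = np.random.default_rng(2)
T = 200
x = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 0.8 * x + u
X = np.column_stack([np.ones(T), x])
resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

lm = breusch_godfrey_1(resid, X)
print(f"BG(1) LM = {lm:.2f}  (5% chi-square(1) critical value 3.84)")
```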

EXERCISE 54 (GE.12) (June-2015)

The following variables have been used to study the labour market in the USA in 1991¹¹:

• earnsi = weekly earnings of wife i, in US dollars.
• educi = wife i’s years of schooling.
• agei = wife i’s age.
• kidge6i = 1 if all her children are older than 6 years.
• kidlt6i = 1 if any of her children is younger than 6 years.

In particular, the wages of married women are to be analysed. For that purpose, the following
OLS estimated model is obtained:

OLS, using observations 1–5634
Dependent variable: earns

          Coefficient   Std. Error   t-ratio    p-value
const     −123.041      24.4701      −5.0282    0.0000
educ        35.5347      1.24570     28.5258    0.0000
age         −1.77105     0.386746    −4.5794    0.0000
kidge6     −25.3981      7.98506     −3.1807    0.0015
kidlt6     −99.7546      9.45624    −10.5491    0.0000

Residual Sum of Squares   3.36e+08   S.E. of regression   244.2057
R2                        0.140563   Adjusted R2          0.139952

a) Write down the sample regression function, indicating the sample size N.

b) According to this estimated model, and assuming that all the basic hypotheses of the GLRM
are satisfied, are wives’ wages affected by having children younger than 6 years?

c) Figure 16 shows OLS residuals against educ.


¹¹ Wooldridge, J. (2006): Introductory Econometrics: A Modern Approach, Thomson/South-Western. gretl data file cps91.gdt.
Figure 16: OLS residuals (= observed − estimated earns) plotted against educ.

a) Explain what problem can be observed in this figure.

b) Explain how to test that problem, indicating all the elements of the test.

d) The following regression is also available:

OLS, using observations 1–5634
Dependent variable: $\hat{u}_i^2 / 59637.91$, where $\hat{u}_i = earns_i - \widehat{earns}_i$ and $RSS/N = 59637.91$

          Coefficient   Std. Error   t-ratio   p-value
const     −1.80055      0.338151     −5.325    1.05e-07
educ       0.187365     0.0172143    10.88     2.56e-27
age        0.0118422    0.00534441    2.216    0.0267
kidge6    −0.174970     0.110345     −1.586    0.1129
kidlt6    −0.162081     0.130675     −1.240    0.2149

Explained sum of squares = 1469.4

a) What is this regression used for? Use it to test whether any of the basic hypotheses is not
satisfied.

b) Taking into account the results obtained with that regression, and that Var(ui ) is
unknown, do you think that a better estimator than that in question a) exists? Why?
Explain how you would obtain it and its properties.

e) Consider the following OLS regression:

$$\widehat{earns}_i = \underset{(25.539)}{-123.041} + \underset{(1.4265)}{35.5347}\,educ_i - \underset{(0.38620)}{1.77105}\,age_i - \underset{(8.1434)}{25.3981}\,kidge6_i - \underset{(9.7634)}{99.7546}\,kidlt6_i$$

$$T = 5634, \quad R^2 = 0.1406, \quad \hat{\sigma} = 244.21$$
(heteroskedasticity-robust standard errors in parentheses)
Are wives’ wages affected by having children younger than 6? Justify your answer in detail,
as well as the validity of the test. Compare it with the test used in question b).

f) Now, data on the same variables are available for husbands: husearnsi is the salary of wife
i’s husband, huseduci his years of education and husagei his age. The salary of the woman
(earnsi ) is believed to depend on the salary of her husband (husearnsi ) and vice versa.
Consider the following OLS estimated model:

$$\widehat{earns}_i = \underset{(24.351)}{-121.142} + \underset{(1.2701)}{33.4438}\,educ_i - \underset{(0.38484)}{1.78987}\,age_i - \underset{(7.9674)}{29.8467}\,kidge6_i - \underset{(9.4145)}{102.056}\,kidlt6_i + \underset{(0.0081764)}{0.0617422}\,husearns_i \qquad (1)$$

$$T = 5634, \quad R^2 = 0.1492, \quad \hat{\sigma} = 243.00$$
(standard errors in parentheses)

Hausman test. Null hypothesis: OLS is consistent
Asymptotic test statistic: Chi-square(1) = 4.36097, with p-value = 0.0367713

$$\widehat{husearns}_i = \underset{(31.107)}{0.00633109} + \underset{(1.7298)}{42.0318}\,huseduc_i - \underset{(0.45866)}{2.33402}\,husage_i \qquad (2)$$

$$T = 5634, \quad R^2 = 0.1007, \quad \hat{\sigma} = 386.02$$

(standard errors in parentheses)

Explain the relationship of the regression in (2) with the Hausman test indicated after
regression (1). Implement the Hausman test, indicating all its elements. What are the
implications of the result of the test on the OLS estimation in (1)?

EXERCISE 55 (GE.13) (June-2015)

We have information on the industrial production function in Greece¹² for the period 1961–1987:

• OUTPUTt = Industrial Production, billions of Drachmas at 1970 prices,


• CAPITALt = Capital input,
• LABORt = Labor input, thousands of worker-years,

with the following model estimated by OLS:


$$\widehat{\ln OUTPUT}_t = \underset{(3.2111)}{-11.9366} + \underset{(0.16539)}{0.139810}\,\ln CAPITAL_t + \underset{(0.59949)}{2.32840}\,\ln LABOR_t$$

$$T = 27, \quad R^2 = 0.9714, \quad \hat{\rho} = 0.7944, \quad DW = 0.3738$$
(standard errors in parentheses; ln = natural logarithm)

¹² Gujarati, data file 7.11.
Figure 17: Time series plot of OLS residuals (= observed − estimated ln OUTPUT), 1961–1987.

a) Do you think that it is reasonable to assume that all the basic hypotheses of the GLRM are
satisfied? Base your answer on Figure 17 and on some test.

b) We have also obtained the following estimation by FGLS:


Figure 18: RSS plotted against rho (horizontal axis from −1 to 1).

Hildreth–Lu, using observations 1962–1987 (T = 26)
Dependent variable: ln OUTPUT

              Coefficient   Std. Error   t-ratio   p-value
const         −6.11840      2.73241      −2.2392   0.0351
ln CAPITAL     0.213504     0.173902      1.2277   0.2320
ln LABOR       1.42165      0.480354      2.9596   0.0070

Statistics based on the rho-differenced data:

Residual Sum of Squares   0.043709   R2              0.990664
ρ̂                         0.045520   Durbin–Watson   1.872358

$$\widehat{Var}(\hat{\beta}_{HL}) = \begin{pmatrix} 7.4661 & 0.2202 & -1.2670 \\ & 0.0302 & -0.0567 \\ & & 0.2307 \end{pmatrix}$$
a) What does the term rho in Figure 18 refer to? What is (approximately) an estimate
of it? Explain and mark it in Figure 18.

b) What is the improvement that you expect with this FGLS estimation over the initial
OLS estimation? Which requirement should the disturbances in the initial model
satisfy in order to gain that improvement?

c) Test the hypothesis of increasing returns to scale in the production function (that is,
the sum of the coefficients of labour and capital factors is larger than one).

c) We also have the following estimated model:

OLS, using observations 1962–1987 (T = 26)
Dependent variable: ln OUTPUT

                Coefficient   Std. Error   t-statistic   p-value
const            1.91978      2.45906       0.7807       0.4441
ln CAPITAL       1.56605      0.756414      2.0704       0.0516
ln CAPITAL_1    −1.25756      0.658399     −1.9100       0.0706
ln LABOR         1.23549      0.530538      2.3288       0.0305
ln LABOR_1      −1.60204      0.497884     −3.2177       0.0043
ln OUTPUT_1      0.753924     0.162858      4.6293       0.0002

(the suffix _1 denotes the variable lagged one period)

Residual Sum of Squares   0.034920   S.E. of regression   0.041785
R2                        0.992519   Adjusted R2          0.990649
ρ̂                        −0.074356   BG(1)                0.183176

a) What are the properties of OLS in this model? Justify your answer.

b) Taking into account the results obtained in this new model, what are the properties
of OLS in the initial model?
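For the test of increasing returns to scale in question b), this sketches the arithmetic using the reported Hildreth–Lu coefficients and covariance matrix. The null is H0: βK + βL ≤ 1 and the alternative is βK + βL > 1. The statistic is t = (β̂K + β̂L − 1)/se(β̂K + β̂L), with Var(β̂K + β̂L) = Var(β̂K) + Var(β̂L) + 2Cov(β̂K, β̂L). The statistic would then be compared with the one-sided critical value of the chosen reference distribution, either t(23) or, asymptotically, N(0, 1).

```python
import math

# Hildreth-Lu estimates and their estimated (co)variances (ln CAPITAL, ln LABOR)
b_k, b_l = 0.213504, 1.42165
var_k, var_l, cov_kl = 0.0302, 0.2307, -0.0567

sum_b = b_k + b_l
se_sum = math.sqrt(var_k + var_l + 2 * cov_kl)   # se of b_k + b_l
t_stat = (sum_b - 1) / se_sum                     # H0: b_k + b_l <= 1 vs Ha: > 1
print(f"sum = {sum_b:.4f}, se = {se_sum:.4f}, t = {t_stat:.3f}")
```

Ignoring the covariance term would overstate the standard error here, because Cov(β̂K, β̂L) is negative.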

EXERCISE 56 (GE.14) (June-2015)

In order to analyse the impact of advertising on TV channels in 2014, a model is proposed for
Yi = total advertising revenue of channel i, in millions of euros, as a function of Xi = average
total market share (audience) of channel i, in percentages:

$$Y_i = \alpha + \beta X_i + u_i, \qquad i = 1, \ldots, N \qquad (1)$$

where N = number of free-to-air channels, with Xi non-stochastic and $u_i \overset{iid}{\sim} (0, \sigma_u^2)$.

Since the total market share is unknown, it is estimated from a sample of 4625 measurements
obtained from audiometers installed in the same number of randomly selected households. Let
Ai = approximate market share obtained from the audiometers, related to the total market
share by the equation:

$$A_i = X_i + v_i, \qquad v_i \overset{iid}{\sim} (0, \sigma_v^2), \quad u_i \text{ independent of } v_i \qquad (2)$$

a) Write down the model that can be estimated with the information available. Does it satisfy
all the basic hypotheses of the GLRM? Explain in detail.

b) What are the properties of OLS in the model to be estimated, proposed in the previous
question? Explain in detail.
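For question b), a small simulation can illustrate the attenuation result for a regressor measured with error. If Ai = Xi + vi with vi independent of Xi and ui, then plim β̂OLS = β·σ²X/(σ²X + σ²v), where σ²X is the limiting sample variance of the non-stochastic Xi. The parameter values below are hypothetical. X is drawn once and then treated as fixed.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
alpha, beta = 1.0, 2.0
sigma_v = 1.0

X = rng.normal(0.0, 1.0, n)              # true market share (drawn once, then fixed)
u = rng.normal(0.0, 0.5, n)
v = rng.normal(0.0, sigma_v, n)          # measurement error, independent of u
Y = alpha + beta * X + u
A = X + v                                # observed proxy used as the regressor

# OLS slope of Y on A: sample covariance over sample variance
beta_ols = np.cov(A, Y)[0, 1] / np.var(A, ddof=1)
plim = beta * np.var(X, ddof=1) / (np.var(X, ddof=1) + sigma_v**2)
print(f"OLS slope = {beta_ols:.3f}, theoretical plim = {plim:.3f}")
```

With σ²X ≈ σ²v, the OLS slope settles near half of β instead of β itself.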

EXERCISE 57 (GE.15) (May-2016)

A researcher wants to analyse the factors that affect wages in the USA. For that
purpose, a sample of 3010 men in 1976 is considered, with information on the following variables:

• lwage = logarithm of hourly wage in cents.

• educ = years of schooling.

• exper = years of working experience.

• black = dummy variable, =1 if the individual is black, =0 otherwise.

• nearc4 = dummy variable, =1 if the individual grew up near a four-year college, =0 otherwise.

The following model is proposed:

$$lwage_i = \beta_0 + \beta_1 educ_i + \beta_2 exper_i + \beta_3 black_i + u_i \qquad (1)$$

with its OLS estimation:


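$$\widehat{lwage}_i = \underset{(0.064)}{4.885} + \underset{(0.004)}{0.082}\,educ_i + \underset{(0.002)}{0.039}\,exper_i - \underset{(0.017)}{0.232}\,black_i \qquad (2)$$

$$T = 3010, \quad R^2 = 0.227, \quad F(3, 3006) = 293.56, \quad \hat{\sigma}_u = 0.390$$
(standard errors in parentheses)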

leading to the OLS residuals shown in Figure 19:

a) Test for race discrimination in wages.

b) Explain how the OLS residuals have been obtained and make some comments on Figure 19.

Figure 19: OLS residuals (= observed − fitted l_wage) plotted against experience (exper).

c) Using one of the following auxiliary regressions, test whether the basic hypotheses on the
disturbances are fulfilled:

1- $\hat{e}_i^2 = 0.900 + 0.007\,educ_i + \hat{v}_i$, $R^2 = 0.0002$, $TSS = 7171$,
2- $\hat{e}_i^2 = 1.021 + 0.002\,educ_i - 0.005\,exper_i + \hat{v}_i$, $R^2 = 0.0003$, $TSS = 7171$,
3- $\hat{e}_i = 1.123 + 0.001\,educ_i + \hat{v}_i$, $R^2 = 0.023$, $TSS = 141$,
4- $\hat{e}_i = 1.021 + 0.002\,educ_i - 0.005\,exper_i + \hat{e}_{i-1} + \hat{v}_i$, $R^2 = 0.312$, $TSS = 141$.

where $\hat{e}_i = \hat{u}_i / \hat{\sigma}_u$ for $\hat{u}_i$ the OLS residuals and $\hat{\sigma}_u^2 = \sum \hat{u}_i^2 / 3010$.

Not convinced by the results, the researcher estimates the model by weighted least squares,
obtaining

Model: WLS, using observations 1–3010
Dependent variable: lwage
Variable used as weight: 1/educ²

          Coefficient   Std. Error   t-ratio    p-value
const      5.108        0.056        90.514     0.000
educ       0.070        0.003        23.493     0.000
exper      0.033        0.002        14.908     0.000
black     −0.313        0.016       −20.067     0.000

Statistics based on the weighted data:

Sum squared resid    3.153     S.E. of regression   0.032
R2                   0.290     Adjusted R2          0.289
F(3, 3006)           409.551   P-value(F)           4.8e-223

Breusch-Pagan test for heteroskedasticity
Null hypothesis: heteroskedasticity not present
Test statistic: LM = 2945.2, with p-value = P(Chi-square(3) > 2945.2) = 0

d) Explain how these WLS estimates have been obtained, and what improvement is expected
with respect to OLS. Indicate in what context that improvement is actually achieved.

e) Do you think that the estimation by WLS achieves that improvement over OLS?

A second researcher thinks that education and wages are affected by the same factors, so that
educ and the disturbances in model (1) are correlated. He/she then decides to estimate the
model by Instrumental Variables using nearc4 as an instrument for educ, obtaining the results

$$\widehat{lwage}_i = \underset{(0.663)}{1.845} + \underset{(0.039)}{0.259}\,educ_i + \underset{(0.016)}{0.111}\,exper_i - \underset{(0.050)}{0.028}\,black_i \qquad (3)$$

$$T = 3010, \quad R^2 = 0.185, \quad F(3, 3006) = 83.264, \quad \hat{\sigma}_u = 0.524$$
(standard errors in parentheses)

f) What conditions does nearc4 need to satisfy to be a good instrument for educ? What are
the properties of the IV estimator if those conditions are satisfied?

g) Use some formal test to decide if the suspicion of the second researcher is correct.

h) With all this new information, would you change your answer in a) about race discrimination
in wages?
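For question c), take regression 2 as the Breusch–Pagan auxiliary regression (squared standardized residuals on educ and exper). Under that assumption, the statistic can be recovered from the reported R² and TSS, since ESS = R²·TSS. Two versions are sketched. The first is the original BP statistic ESS/2, which assumes normal disturbances. The second is Koenker's studentized N·R² form. Both are asymptotically χ²(2) under homoscedasticity. For two degrees of freedom the χ² tail probability has the closed form exp(−x/2).

```python
import math

n = 3010
r2, tss = 0.0003, 7171       # auxiliary regression 2 (e_i^2 on educ and exper)

ess = r2 * tss               # explained sum of squares
bp = ess / 2                 # Breusch-Pagan statistic (normal errors)
koenker = n * r2             # Koenker's studentized N * R^2 version

# Chi-square(2) upper tail: P(chi2_2 > x) = exp(-x / 2)
p_bp = math.exp(-bp / 2)
p_koenker = math.exp(-koenker / 2)
print(f"ESS = {ess:.4f}, BP = {bp:.4f} (p = {p_bp:.3f})")
print(f"N*R^2 = {koenker:.3f} (p = {p_koenker:.3f})")
```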

EXERCISE 58 (GE.16) (May-2016)

The relationship between unemployment and inflation specified by the Phillips curve implies a
trade-off between the two variables, such that high unemployment is accompanied by low inflation.
To analyse this relationship, the following model is proposed:

$$inf_t = \beta_0 + \beta_1 unem_t + u_t \qquad (1)$$

where:

• inft : annual inflation in year t,

• unemt : rate of unemployment in year t.

With a sample of annual observations from 1948 to 2003, the following estimated model has been
obtained by OLS:

$$\widehat{inf}_t = \underset{(1.548)}{1.0535} + \underset{(0.266)}{0.502}\,unem_t \qquad (2)$$

$$T = 56, \quad R^2 = 0.045, \quad BG(1) = 31.53, \quad DW = 0.801$$
(standard errors in parentheses)

The OLS residuals are plotted in Figure 20:

Figure 20: OLS residuals (= observed − fitted inf) over time, 1948–2003.

a) Do you find any evidence that some of the basic hypotheses on the disturbances fail? Use all
the information provided, including Figure 20.

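b) Consider also the following sums of OLS residuals:

$$\sum_{t=1}^{56} \hat{u}_t^2 = 476.82, \qquad \sum_{t=2}^{56} \hat{u}_{t-1}^2 = 473.69,$$

$$\sum_{t=2}^{56} \hat{u}_t^2 = 474.25, \qquad \sum_{t=2}^{56} \hat{u}_t \hat{u}_{t-1} = 270.98, \qquad \sum_{t=2}^{56} (\hat{u}_t - \hat{u}_{t-1})^2 = 7.76$$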

Assuming that ut ∼ AR(1), explain in detail how you would estimate the model in an
asymptotically efficient way.

It is now believed that the temporal dependence in inflation should be incorporated in the
model, as in

$$inf_t = \beta_0 + \beta_1 unem_t + \beta_2 inf_{t-1} + v_t \qquad (3)$$

Model (3) is estimated by OLS, obtaining

OLS, using observations 1949–2003 (T = 55)
Dependent variable: inf

          Coefficient   Std. Error   t-ratio   p-value
const      2.210        1.210         1.826    0.073
unem      −0.224        0.239        −0.938    0.352
inf_1      0.733        0.117         6.251    0.000

Mean dependent var   3.807     S.D. dependent var   3.013
Sum squared resid    256.453   S.E. of regression   2.221
R2                   0.477     Adjusted R2          0.457
BG(1)                2.15      DW                   1.87

c) Explain how the value BG(1) = 2.15 has been obtained and use it to test for the fulfilment
of the basic hypothesis in vt .

d) With all the results in a) and c), what do you think about the negative relationship between
unemployment and inflation specified by the Phillips curve? Support your answer with
some valid test.
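For question b), this sketches a two-step FGLS (Prais–Winsten) procedure under the AR(1) assumption. First, ρ is estimated from the reported residual sums as ρ̂ = Σûtût−1/Σû²t−1. Second, OLS is applied to the quasi-differenced data, with the first observation rescaled by √(1 − ρ̂²). Dropping the first observation instead gives Cochrane–Orcutt. With exogenous regressors and a consistent ρ̂, both are asymptotically equivalent to GLS with known ρ. The series in the usage part are simulated placeholders, not the 1948–2003 data.

```python
import numpy as np

# Step 1: rho from the reported residual sums
sum_cross = 270.98     # sum_{t=2}^{56} u_t * u_{t-1}
sum_lag_sq = 473.69    # sum_{t=2}^{56} u_{t-1}^2
rho_hat = sum_cross / sum_lag_sq


def prais_winsten(y, X, rho):
    """Step 2: OLS on quasi-differenced data, keeping the first observation."""
    y_star = np.empty_like(y, dtype=float)
    X_star = np.empty_like(X, dtype=float)
    scale = np.sqrt(1.0 - rho**2)
    y_star[0] = scale * y[0]
    X_star[0] = scale * X[0]
    y_star[1:] = y[1:] - rho * y[:-1]
    X_star[1:] = X[1:] - rho * X[:-1]     # the constant column becomes 1 - rho
    return np.linalg.lstsq(X_star, y_star, rcond=None)[0]


# Hypothetical usage with simulated series (not the actual inflation data)
rng = np.random.default_rng(4)
T = 56
unem = rng.uniform(3, 10, T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.57 * u[t - 1] + rng.normal()
infl = 1.0 + 0.5 * unem + u
X = np.column_stack([np.ones(T), unem])

beta_fgls = prais_winsten(infl, X, rho_hat)
print(f"rho_hat = {rho_hat:.4f}, FGLS coefficients = {beta_fgls.round(3)}")
```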

EXERCISE 59 (GE.17) (June-2016)

A researcher wants to analyse the factors that affect the rate of employment in Puerto Rico.
Annual series for the period 1950-1987 are available of the following variables:

• lprepop: logarithm of the rate of employment in Puerto Rico.

• lmincov : logarithm of the ratio of the minimum wage to the average wage in Puerto
Rico.

• lprgnp: logarithm of the Gross Domestic Product in Puerto Rico.

The following model is proposed:

$$lprepop_t = \beta_1 + \beta_2\, lmincov_t + \beta_3\, lprgnp_t + u_t \qquad (1)$$

and its OLS estimation is:
Model 1: OLS, using observations 1950–1987 (T = 38)
Dependent variable: lprepop

           Coefficient   Std. Error   t-ratio   p-value
const      −1.94966      0.522971     −3.7280   0.0007
lmincov    −0.257443     0.0653404    −3.9400   0.0004
lprgnp      0.0859111    0.0568042     1.5124   0.1394

Sum squared resid   0.101993   S.E. of regression   0.053982
R2                  0.681129   Adjusted R2          0.662908
ρ̂                   0.782173   Durbin–Watson        0.432131

Figure 21 shows the OLS residuals along time.

Figure 21: OLS residuals (= observed − estimated lprepop) over time, 1950–1987.

a) Considering Figure 21 and some formal test, what can you say about the basic hypotheses on
the disturbances?

b) Interpret the estimate of β2 . Taking into account your response in a), do you consider this
estimation reliable? Justify your answer with the properties of the employed estimator.

c) Test the individual significance of the variable lmincov. Do you think that the result of this
test is reliable? Justify your answer.

The researcher suspects that there is autocorrelation in the disturbances and decides to apply
Hildreth-Lu to estimate the model by FGLS. The results are as follows:

Model 1: Hildreth–Lu, using observations 1951–1987 (T = 37)
Dependent variable: lprepop
ρ̂ = 0.96

           Coefficient   Std. Error   t-ratio   p-value
const      −6.21215      1.37616      −4.5141   0.0001
lmincov    −0.0615490    0.0434592    −1.4162   0.1658
lprgnp      0.575888     0.155795      3.6965   0.0008

Statistics based on rho-differenced data:

Sum squared resid   0.024016   S.E. of regression   0.026577
R2                  0.917825   Adjusted R2          0.912991
ρ̂                  −0.049555   Durbin–Watson        2.043080

d) What process is the researcher assuming for the disturbances in model (1)? Write it down
and explain in detail the method of estimation that she/he is using, specifying clearly the
transformed model and explaining all the steps followed to obtain the FGLS results shown
above.

Figure 22 shows the OLS residuals over time in the transformed model.

Figure 22: OLS residuals in the transformed model over time (horizontal axis 1955–1985).

e) With all the information obtained so far, what can you say about the fulfilment of the basic
hypotheses on the disturbances of the transformed model? Justify your answer.

Another researcher thinks that the model in equation (1) is not correctly specified because the
rate of employment is a dynamic variable such that it depends on past employment rates. Then,
the following model is specified and estimated:

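$$lprepop_t = \beta_1 + \beta_2\, lmincov_t + \beta_3\, lprgnp_t + \beta_4\, lprepop_{t-1} + u_t \qquad (2)$$
with OLS estimation:

$$\widehat{lprepop}_t = \underset{(0.318)}{-0.815} - \underset{(0.041)}{0.098}\,lmincov_t + \underset{(0.032)}{0.059}\,lprgnp_t + \underset{(0.085)}{0.764}\,lprepop_{t-1} \qquad (3)$$

$$T = 37, \quad R^2 = 0.896, \quad RSS = 0.030524$$

The following auxiliary regression is also estimated:

$$\hat{u}_t = \underset{(0.311)}{-0.010} + \underset{(0.195)}{0.302}\,\hat{u}_{t-1} - \underset{(0.040)}{0.002}\,lmincov_t - \underset{(0.031)}{0.006}\,lprgnp_t - \underset{(0.093)}{0.062}\,lprepop_{t-1} \qquad (4)$$

$$T = 37, \quad R^2 = 0.0695$$
(standard errors in parentheses)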

f) In view of the previous results, what are the properties of OLS in the model in equation
(1)? And in equation (2)? Which model do you prefer? Justify your answer.
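For question d), this sketches the Hildreth–Lu grid search. For each ρ on a grid over (−1, 1), OLS is applied to the quasi-differenced model y_t − ρy_{t−1} = β1(1 − ρ) + β2(x_t − ρx_{t−1}) + …. The first observation is dropped, which is consistent with the output's sample 1951–1987. The ρ with the smallest RSS is kept. This is the same idea as the RSS-against-rho plot in Figure 18. The simulated data, the true ρ = 0.7 and the 0.01 grid step are assumptions for illustration. A finer grid around the minimum could refine the estimate.

```python
import numpy as np


def hildreth_lu(y, X, grid=None):
    """Return (rho, beta, rss) minimising the RSS of OLS on
    y_t - rho*y_{t-1} and X_t - rho*X_{t-1} (first observation dropped)."""
    if grid is None:
        grid = np.arange(-0.99, 0.995, 0.01)
    best = None
    for rho in grid:
        y_star = y[1:] - rho * y[:-1]
        X_star = X[1:] - rho * X[:-1]   # constant column becomes (1 - rho)
        beta = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
        rss = float(np.sum((y_star - X_star @ beta) ** 2))
        if best is None or rss < best[2]:
            best = (float(rho), beta, rss)
    return best


# Hypothetical data with AR(1) disturbances, not the Puerto Rico series
rng = np.random.default_rng(5)
T = 2000
x1, x2 = rng.normal(size=T), rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.normal(scale=0.05)
y = -2.0 - 0.25 * x1 + 0.1 * x2 + u
X = np.column_stack([np.ones(T), x1, x2])

rho_hl, beta_hl, rss_hl = hildreth_lu(y, X)
print(f"rho = {rho_hl:.2f}, beta = {beta_hl.round(3)}")
```

Because the constant column is quasi-differenced too, the first coefficient returned is β1 itself rather than β1(1 − ρ).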

EXERCISE 60 (GE.18) (June-2016)

The factors that affect the final score in an exam of a particular university course are to be
analysed. The exam consists of 40 questions, worth one point each. The researcher proposes the
following regression model:

$$final_i = \beta_1 + \beta_2\, atndrte_i + \beta_3\, ACT_i + \beta_4\, hwrte_i + u_i \qquad (1)$$

where:

• finali : score of the i-th student in the exam.
• atndrtei : class attendance of the i-th student (in %)
• ACTi : average marks of the i-th student
• hwrtei : exercises handed in by the i-th student (in %)

Using a sample of 674 students, the model is estimated by OLS:

Model 5: OLS, using observations 1–674 (n = 674)
Dependent variable: final

           Coefficient   Std. Error   t-ratio   p-value
const      8.69467       1.49182      5.8282    0.0000
atndrte    0.0408583     0.0129489    3.1553    0.0017
ACT        0.527771      0.0480319   10.9879    0.0000
hwrte      0.0224737     0.0110316    2.0372    0.0420

Sum squared residuals   12390.73   S.E. of regression   4.300422
R2                      0.175307   Adjusted R2          0.171614

A second researcher thinks that the variable atndrte could be endogenous. Therefore, he/she
proposes to estimate the model by Instrumental Variables, using distance to campus, dist, as an
instrument for atndrte. The results are:

Model 5: TSLS, using observations 1–674 (n = 674)
Dependent variable: final
Instrumented variable: atndrte
Instruments: const dist ACT hwrte

           Coefficient   Std. Error   t-ratio   p-value
const      8.63433       1.66015      5.2009    0.0000
atndrte    0.0422318     0.0210375    2.0075    0.0447
ACT        0.528284      0.0484306   10.9081    0.0000
hwrte      0.0217456     0.0141040    1.5418    0.1231

Sum squared residuals   12390.94   S.E. of regression   4.300458
R2                      0.175294   Adjusted R2          0.171601

a) What do you think about the suspicion of the second researcher? Justify it with some test.

b) In view of the previous results, what method of estimation would you choose? Why? What
are the properties of the selected estimator?

c) Test the individual significance of the variable hwrte using the estimator you consider
better.

d) Finally, a third researcher thinks that the variance of the disturbances may change with the
variables atndrte and ACT. Explain in detail how you would test this possibility in the model in
equation (1). If the test statistic is 3.861, do you find evidence in favour
of the null hypothesis?
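For d), if the statistic is a Breusch-Pagan/Koenker-type LM statistic from an auxiliary regression on atndrte and ACT (an assumption: two slopes, hence two degrees of freedom), its chi-square tail can be computed without tables, because for even degrees of freedom $2m$, $P(\chi^2_{2m} > x) = e^{-x/2}\sum_{k=0}^{m-1}(x/2)^k/k!$. A minimal sketch; `chi2_sf_even_df` is a hypothetical helper.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi2_df > x) for an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    # Poisson-type series: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total


stat = 3.861                            # test statistic quoted in question d)
p_value = chi2_sf_even_df(stat, 2)
critical_5pct = -2.0 * math.log(0.05)   # for df = 2, P(chi2_2 > c) = exp(-c / 2)
print(f"p-value = {p_value:.4f}, 5% critical value = {critical_5pct:.3f}")
```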

e) Taking into account your answer to the previous question, would you change your answer to
question b)? If so, what method of estimation do you suggest? Justify your answer.

EXERCISE 61 (GE.19) (June-2016)

Consider the following regression model:

$Y_t = \beta_1 + \beta_2 Y_{t-2} + \beta_3 X_t + \beta_4 X_{t-1} + u_t \qquad (1)$


where $X_t$ is a non-stochastic variable and $u_t \overset{iid}{\sim} (0, \sigma_u^2)$.

a) Obtain (with a formal and detailed proof) the properties of the OLS estimator. Can you
use the Mann-Wald theorem?

b) Assume now that $u_t$ follows an AR(2) process, $u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \epsilon_t$, where $\epsilon_t \overset{iid}{\sim} (0, \sigma_\epsilon^2)$
and $\rho_1, \rho_2$ are known parameters. Is OLS an estimator with good properties? If your
answer is negative, what alternative estimator would you suggest?
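With $\rho_1$ and $\rho_2$ known, a standard remedy for AR(2) disturbances is to apply OLS to quasi-differenced data (a GLS-type transformation). The sketch below shows only the transformation step, assuming the first two observations are simply dropped (a Prais-Winsten-style treatment of the initial observations would keep them); the function name and the values of $\rho_1, \rho_2$ are hypothetical. The constant column becomes $1 - \rho_1 - \rho_2$, so the coefficient of the transformed constant still estimates $\beta_1$, and the lagged term $Y_{t-2}$ is transformed like every other regressor.

```python
def ar2_quasi_difference(series, rho1, rho2):
    """Return z_t - rho1*z_{t-1} - rho2*z_{t-2} for t = 3, ..., T (first two observations dropped)."""
    return [series[t] - rho1 * series[t - 1] - rho2 * series[t - 2]
            for t in range(2, len(series))]


rho1, rho2 = 0.5, 0.25            # illustrative values, not taken from the exercise
const = [1.0] * 6
transformed_const = ar2_quasi_difference(const, rho1, rho2)
print(transformed_const)          # every entry equals 1 - rho1 - rho2
```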

EXERCISE 62 (GE.20) (May-2017)

In order to analyse the share of disposable income that is dedicated to food expenditure, information
on 235 Belgian families is available on the following variables:

• $foodexp_i$: annual food expenditure of family i, in Belgian francs,

• $income_i$: annual income of family i, in Belgian francs.

The following model is proposed:

$foodexp_i = \beta_0 + \beta_1 income_i + u_i \qquad (1)$

with its OLS estimation:

$\widehat{foodexp}_i = \underset{(15.957)}{147.475} + \underset{(0.0144)}{0.4852}\, income_i \qquad (2)$

$T = 235 \quad R^2 = 0.830 \quad \hat\sigma = 114.11$

(standard errors in parentheses)

leading to the fitted values and OLS residuals shown in Figure 23:

Figure 23: Results with OLS estimation. (a) Actual and fitted values of foodexp against income; (b) OLS residuals against income.

a) What can be deduced from the two plots in Figure 23?

b) Using one of the following auxiliary regressions, test whether the disturbances satisfy the basic
hypotheses:

1- $\hat u_i^2 = -3871 + 52.007\, foodexp_i + 0.014\, \hat u_{i-1}^2 + \hat v_i$, $R^2 = 0.232$, $TSS = 455925640644.87$,
2- $\hat u_i^2 = -4401.2 + 57.967\, income_i + \hat v_i$, $R^2 = 0.465$, $TSS = 455925640644.87$,
3- $\hat u_i = 0.0001 + 0.001\, income_i + \hat v_i$, $R^2 = 0.023$, $TSS = 3033804.58$,
4- $\hat u_i = -105.88 + 0.169\, foodexp_i + \hat v_i$, $R^2 = 0.169$, $TSS = 3033804.58$.

where $\hat u_i$ are the OLS residuals.
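For b), one common way to use a regression of squared residuals on a candidate variable is the LM statistic $T R^2$ (Koenker's studentized Breusch-Pagan form), asymptotically $\chi^2$ under homoscedasticity with as many degrees of freedom as slope regressors. The sketch below only shows the arithmetic, applied for illustration to regression 2; deciding which of the four auxiliary regressions is appropriate is part of the exercise, and the helper name is hypothetical.

```python
import math


def lm_stat_one_slope(n_obs, r_squared):
    """LM = T * R^2 from an auxiliary regression with one slope; chi-square(1) p-value."""
    stat = n_obs * r_squared
    p_value = math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
    return stat, p_value


stat, p_value = lm_stat_one_slope(235, 0.465)   # T and R^2 of auxiliary regression 2
reject_at_5pct = stat > 3.841                   # tabulated 5% critical value of chi-square(1)
print(f"LM = {stat:.3f}, p-value = {p_value:.2e}, reject H0 at 5%: {reject_at_5pct}")
```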

c) The researcher also estimates the model by OLS but using the White estimator of the
variance-covariance matrix, with the results:

$\widehat{foodexp}_i = \underset{(46.648)}{147.475} + \underset{(0.0520)}{0.4852}\, income_i \qquad (3)$

$T = 235 \quad R^2 = 0.8296 \quad \hat\sigma = 114.11$

(White standard errors in parentheses)

Why do you think that the White estimator is used? Describe this estimator in detail.
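In a simple regression, the White (HC0) estimator of the slope variance reduces to $\sum_i (x_i-\bar x)^2 \hat u_i^2 / [\sum_i (x_i-\bar x)^2]^2$, which is the scalar form of the sandwich $(X'X)^{-1} X'\hat\Omega X (X'X)^{-1}$ with $\hat\Omega = \mathrm{diag}(\hat u_i^2)$. A minimal sketch with made-up numbers (not the Belgian data), assuming the residuals come from an OLS fit with an intercept; the helper name is hypothetical.

```python
import math


def white_hc0_slope_se(x, residuals):
    """HC0 (White) standard error of the OLS slope in y = b0 + b1*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dev)
    # Sandwich "meat": sum of (x_i - x_bar)^2 * u_i^2, no homoscedasticity assumed
    meat = sum((d * u) ** 2 for d, u in zip(dev, residuals))
    return math.sqrt(meat / sxx ** 2)


# Toy data: residuals sum to zero and are orthogonal to x, as OLS residuals must be
x = [1.0, 2.0, 3.0]
u_hat = [1.0, -2.0, 1.0]
print(white_hc0_slope_se(x, u_hat))
```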

Not convinced by the results, the researcher estimates the model by Weighted Least Squares
(WLS), where all the variables are weighted by dividing them by $income_i$, obtaining

$\widetilde{foodexp}_i = \underset{(11.207)}{66.1830} + \underset{(0.014980)}{0.574002}\, income_i \qquad (4)$

$T = 235 \quad R^2 = 0.863 \quad \hat\sigma = 0.087545$

(standard errors in parentheses)

Statistics based on the weighted data:

Sum squared resid 1.785735     R2 0.863050

Breusch-Pagan test statistic for heteroskedasticity = 1.94603,
with p-value = P(Chi-square(1) > 1.94603) = 0.163016

d) Explain how these WLS estimates have been obtained. What improvement is expected with
respect to OLS? Indicate in what context that improvement is actually achieved.
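Dividing model (1) by $income_i$ gives $foodexp_i/income_i = \beta_1 + \beta_0 (1/income_i) + u_i/income_i$, so OLS on the transformed data returns $\hat\beta_1$ as the intercept and $\hat\beta_0$ as the coefficient of $1/income_i$; the transformed disturbance is homoscedastic if $Var(u_i) = \sigma^2 income_i^2$, which is the assumption behind this choice of weights. A minimal sketch on toy numbers (not the exercise data); the helper names are hypothetical.

```python
def simple_ols(x, y):
    """Intercept and slope of the OLS line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def wls_divide_by_regressor(income, foodexp):
    """WLS for foodexp = b0 + b1*income + u, assuming Var(u_i) proportional to income_i^2."""
    y_star = [f / m for f, m in zip(foodexp, income)]  # foodexp_i / income_i
    x_star = [1.0 / m for m in income]                 # 1 / income_i
    intercept_star, slope_star = simple_ols(x_star, y_star)
    # The roles swap: the transformed intercept estimates b1, the transformed slope estimates b0
    return {"b0": slope_star, "b1": intercept_star}


income = [500.0, 1000.0, 2000.0, 4000.0]
foodexp = [100.0 + 0.5 * m for m in income]  # exact line: b0 = 100, b1 = 0.5
est = wls_divide_by_regressor(income, foodexp)
print(est)
```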

e) Do you think that the estimation by WLS achieves that improvement over OLS? Base your
answer on some formal test.

f) Taking into account all the previous results, test in the most appropriate way whether the share
of disposable income dedicated to food is larger than one half.
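The hypothesis in f) concerns the slope: $H_0: \beta_1 \le 0.5$ against $H_1: \beta_1 > 0.5$, with statistic $t = (\hat\beta_1 - 0.5)/se(\hat\beta_1)$ compared with the upper 5% critical value (about 1.645 in large samples). The sketch below only shows the arithmetic, with a normal-approximation p-value; which estimate and standard error to use is exactly what the question asks you to decide, so the two calls are illustrative and the helper name is hypothetical.

```python
import math


def one_sided_t(estimate, std_error, null_value):
    """t statistic for H0: beta <= null_value vs H1: beta > null_value, normal-approximation p-value."""
    t_stat = (estimate - null_value) / std_error
    p_value = 0.5 * math.erfc(t_stat / math.sqrt(2.0))  # P(Z > t)
    return t_stat, p_value


t_ols, p_ols = one_sided_t(0.4852, 0.0520, 0.5)      # OLS slope, White standard error, equation (3)
t_wls, p_wls = one_sided_t(0.574002, 0.014980, 0.5)  # WLS slope, equation (4)
print(f"OLS-White: t = {t_ols:.3f}, p = {p_ols:.3f}; WLS: t = {t_wls:.3f}, p = {p_wls:.2e}")
```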

EXERCISE 63 (GE.21) (May-2017)

The search for a model to explain aggregate consumption in the USA has been one of the
most active topics of research since the beginning of the last century. One of the first models
proposed is:

$C_t = \beta_0 + \beta_1 P_t + \beta_2 P_{t-1} + \beta_3 W_t + u_t \qquad (1)$
where:

• $C_t$: aggregate consumption in year t,

• $P_t$: corporate profits in year t,

• $W_t$: aggregate wages in year t.

With a sample of annual observations from 1920 to 1941, the following model has been estimated
by OLS (year 1920 corresponds to t = 0):

$\hat C_t = \underset{(1.3027)}{16.2366} + \underset{(0.0912)}{0.1929} P_t + \underset{(0.0906)}{0.0899} P_{t-1} + \underset{(0.0399)}{0.7962} W_t, \qquad t = 1, 2, \ldots, 21 \qquad (2)$

$T = 21 \quad R^2 = 0.981 \quad \hat\sigma = 1.0255$

$\sum_{t=2}^{21} \hat u_t \hat u_{t-1} = 3.2402 \qquad \sum_{t=2}^{21} (\hat u_t - \hat u_{t-1}) = -1.8496 \qquad \sum_{t=2}^{21} (\hat u_t - \hat u_{t-1})^2 = 24.4497$

$\sum_{t=1}^{21} \hat u_t^2 = 17.8794 \qquad \sum_{t=2}^{21} \hat u_t^2 = 17.7750$

(standard errors in parentheses)

Figure 24: OLS residuals. Regression residuals (= observed - fitted C), plotted against time.

The OLS residuals are plotted in Figure 24:

a) Do you find any evidence of failure of some basic hypotheses on the disturbances? Use all
the information provided, including Figure 24 and a formal test.
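The Durbin-Watson statistic can be computed directly from the sums reported below equation (2): $DW = \sum_{t=2}^{T}(\hat u_t - \hat u_{t-1})^2 / \sum_{t=1}^{T}\hat u_t^2$, with $DW \approx 2(1-\hat\rho)$ (an approximation that can be rough in a sample as small as T = 21). A minimal sketch; the $\hat\rho$ here uses $\sum_{t=1}^{T}\hat u_t^2$ as denominator, one of several common conventions, and the comparison with the tabulated bounds $d_L$, $d_U$ is left to the reader.

```python
def durbin_watson_from_sums(sum_sq_diff, sum_sq_resid):
    """DW = sum_{t=2}^T (u_t - u_{t-1})^2 / sum_{t=1}^T u_t^2."""
    return sum_sq_diff / sum_sq_resid


dw = durbin_watson_from_sums(24.4497, 17.8794)
rho_hat = 3.2402 / 17.8794          # sum u_t u_{t-1} / sum u_t^2
print(f"DW = {dw:.4f}, rho_hat = {rho_hat:.4f}, 2*(1 - rho_hat) = {2 * (1 - rho_hat):.4f}")
```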

b) Later, it is thought that profits and consumption are jointly determined, inducing contemporaneous
correlation between $P_t$ and $u_t$ (however, $cov(P_{t-1}, u_t) = 0$). What are the
consequences of this correlation for the previous OLS estimation? Explain in detail.

c) Using the variable $I_t$ (investment at time t) as an instrument for $P_t$, the same model has
been estimated by Instrumental Variables, obtaining:

$\hat C_t = \underset{(1.3105)}{16.2341} + \underset{(0.1017)}{0.1516} P_t + \underset{(0.0953)}{0.1161} P_{t-1} + \underset{(0.0408)}{0.8028} W_t \qquad (3)$

$T = 21 \quad R^2 = 0.9808 \quad \hat\sigma = 1.0317$

(standard errors in parentheses)

Describe how these estimates (coefficients and standard errors) have been obtained, their
properties, and the characteristics that $I_t$ has to satisfy to guarantee those properties.
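A minimal sketch of the idea behind c) in the simplest just-identified case (one regressor, one instrument), where $(Z'X)^{-1}Z'y$ reduces to a ratio of sample covariances, $\hat\beta_{IV} = \sum (z_t - \bar z)(y_t - \bar y) / \sum (z_t - \bar z)(x_t - \bar x)$. In model (1) the same formula holds in matrix form with $Z = [1, I_t, P_{t-1}, W_t]$ in place of $X = [1, P_t, P_{t-1}, W_t]$, assuming $P_{t-1}$ and $W_t$ act as their own instruments. The toy numbers and the helper name are hypothetical.

```python
def iv_slope(x, y, z):
    """Simple IV intercept and slope for y = a + b*x + u using instrument z."""
    n = len(x)
    x_bar, y_bar, z_bar = sum(x) / n, sum(y) / n, sum(z) / n
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    s_zx = sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
    if abs(s_zx) < 1e-12:
        raise ValueError("instrument is (numerically) uncorrelated with the regressor")
    slope = s_zy / s_zx
    return y_bar - slope * x_bar, slope


x = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * xi for xi in x]   # exact relation, so IV must return (1, 2)
a_hat, b_hat = iv_slope(x, y, z)
print(a_hat, b_hat)
```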

d) Test whether the suspected correlation between $P_t$ and $u_t$ exists.

e) A different researcher thinks that the model should capture the dynamics of consumption and
proposes the following model, estimated by OLS:

$\hat C_t = \underset{(2.5214)}{10.1435} + \underset{(0.1186)}{0.4337} P_t - \underset{(0.1267)}{0.1700} P_{t-1} + \underset{(0.1019)}{0.5377} W_t + \underset{(0.1213)}{0.3267} C_{t-1} \qquad (4)$

$T = 21 \quad R^2 = 0.987 \quad \hat\sigma = 0.87682 \quad BG(1) = 0.011$

(standard errors in parentheses)

Explain how the value BG(1) = 0.011 has been obtained and use it to test whether the disturbances
satisfy the basic hypotheses.

f) Based on your answer to e), what are the properties of the OLS estimator in this model?
Explain in detail.

g) Taking into account all the results in this exercise, what is in your opinion the best estimated
model to analyse the factors that affect $C_t$? Justify clearly and thoroughly.

EXERCISE 64 (GE.22) (July-2017)

Consider the following model $Y_i = \beta_1 + \beta_2 X_i + u_i$, where $X_i$ is a non-stochastic variable and
$u_i \sim (0, \sigma^2 X_i^2)$ for $i = 1, \ldots, 60$. It is also known that $E(u_i u_j) = 0 \ \forall i \neq j$.

a) Write down the matrix of variances and covariances of the disturbances. Do they satisfy
the basic assumptions of the GLRM? Justify your answer.

The following sample information is also observed:

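$\sum Y_i = 1201.88 \qquad \sum Y_i^2 = 26456.32$

$\sum X_i = 276.2 \qquad \sum X_i^2 = 1603.98 \qquad \sum X_i^3 = 10649.57 \qquad \sum X_i^4 = 76704.71$

$\sum \frac{1}{X_i} = 18.66 \qquad \sum \frac{1}{X_i^2} = 8.69 \qquad \sum \frac{1}{X_i^3} = 5.54 \qquad \sum \frac{1}{X_i^4} = 4.28$

$\sum X_i Y_i = 6372.93 \qquad \sum X_i^2 Y_i = 40513.12 \qquad \sum X_i^3 Y_i = 285095.50 \qquad \sum X_i^4 Y_i = 2137825.06$

$\sum \frac{Y_i}{X_i} = 311.54 \qquad \sum \frac{Y_i}{X_i^2} = 124.09 \qquad \sum \frac{Y_i}{X_i^3} = 72.36 \qquad \sum \frac{Y_i}{X_i^4} = 53.63$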

b) How would you estimate the coefficients of the model? What are the properties of the
proposed estimator? Finally, estimate the coefficients.
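Since $Var(u_i) = \sigma^2 X_i^2$, dividing the model by $X_i$ gives $Y_i/X_i = \beta_1 (1/X_i) + \beta_2 + u_i/X_i$ with a homoscedastic disturbance, and OLS on this transformed model is the GLS estimator. Its normal equations only need the sums of $1/X_i$, $1/X_i^2$, $Y_i/X_i$ and $Y_i/X_i^2$. The sketch below solves that 2×2 system by Cramer's rule; it assumes the sums are read as $\sum 1/X_i = 18.66$, $\sum 1/X_i^2 = 8.69$, $\sum Y_i/X_i = 311.54$ and $\sum Y_i/X_i^2 = 124.09$ (please double-check against the original layout), and the helper name is hypothetical.

```python
def gls_from_sums(n, s_inv_x, s_inv_x2, s_y_over_x, s_y_over_x2):
    """GLS for Y = b1 + b2*X + u with Var(u_i) = sigma^2 * X_i^2.

    OLS on Y/X = b1*(1/X) + b2 + u/X has normal equations
        [sum 1/X^2  sum 1/X] [b1]   [sum Y/X^2]
        [sum 1/X    n      ] [b2] = [sum Y/X  ]
    solved here by Cramer's rule.
    """
    det = s_inv_x2 * n - s_inv_x ** 2
    b1 = (n * s_y_over_x2 - s_inv_x * s_y_over_x) / det
    b2 = (s_inv_x2 * s_y_over_x - s_inv_x * s_y_over_x2) / det
    return b1, b2


b1, b2 = gls_from_sums(n=60, s_inv_x=18.66, s_inv_x2=8.69,
                       s_y_over_x=311.54, s_y_over_x2=124.09)
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
```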

c) A different researcher claims that he/she has estimated the model efficiently. This
researcher has estimated a transformed model by OLS. Based on this transformed model,
the researcher has estimated two new regressions: one with the 20 observations corresponding
to the lowest values of the explanatory variable and another with the 20 observations corresponding
to the highest values. The Residual Sums of Squares (RSS) of these two regressions are 2.73 and
3.91, respectively. Do you think that the claim that the model has been estimated
efficiently is true?
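A minimal sketch of the Goldfeld-Quandt procedure behind c): order the observations by the variable suspected of driving the variance, drop the central ones, fit the regression separately in the two tails and compare $F = [RSS_2/(n_2 - k)] / [RSS_1/(n_1 - k)]$ with an $F(n_2 - k, n_1 - k)$ critical value, the group with the larger expected variance going in the numerator. The helpers and toy data are hypothetical; the last lines only redo the arithmetic with the RSS values quoted in c), assuming k = 2 estimated coefficients per subsample.

```python
def simple_ols_rss(x, y):
    """Residual sum of squares of the OLS line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(x, y, n_tail, n_params=2):
    """GQ statistic RSS(high-x group)/RSS(low-x group), each divided by its df."""
    pairs = sorted(zip(x, y))                     # order observations by x
    low, high = pairs[:n_tail], pairs[-n_tail:]   # central observations are dropped
    x_low, y_low = zip(*low)
    x_high, y_high = zip(*high)
    rss_low = simple_ols_rss(list(x_low), list(y_low))
    rss_high = simple_ols_rss(list(x_high), list(y_high))
    df = n_tail - n_params
    return (rss_high / df) / (rss_low / df), (df, df)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
y = [1.0, 3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 2.0, 6.0, 4.0]
stat, df = goldfeld_quandt(x, y, n_tail=3)
print(stat, df)

# Arithmetic with the figures quoted in c): 20 observations per tail, k = 2
gq_exercise = (3.91 / 18) / (2.73 / 18)
print(f"GQ = {gq_exercise:.4f} with (18, 18) degrees of freedom")
```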

d) If the variances of the disturbances were not known and you did not know how to estimate
them, how could you test the individual significance of Xi ? Explain in detail all the steps,
describing every element in the testing procedure.

EXERCISE 65 (GE.23) (July-2017)

A group of researchers wants to detect the factors that affect the nominal interest rate in Spain.
To that end, information on the following variables is available:

• $int_t$: nominal interest rate (in %)

• $inf_t$: inflation (in %)

• $def_t$: public deficit (in % of GDP)

where the variables $inf_t$ and $def_t$ are considered to be non-stochastic and t goes from the first
quarter of 1980 to the third quarter of 2000 (83 observations).

A first researcher proposes the following regression model:

$int_t = \alpha_1 + \alpha_2 inf_t + \alpha_3 def_t + u_t \qquad (1)$

which is estimated by OLS:

$\widehat{int}_t = \underset{(1.3175)}{9.7832} + \underset{(0.4066)}{2.1187} inf_t + \underset{(0.2483)}{0.5144} def_t \qquad (2)$

$T = 83 \quad R^2 = 0.3865 \quad DW = 0.6287$

(standard errors in parentheses)

Figure 25 shows the evolution of the OLS residuals.

Figure 25: OLS residuals (= observed - estimated), plotted against time, 1980-2000.

The following auxiliary regression is also obtained:

$\hat u_t = 3.6501 - 1.3728\, inf_t + 0.4488\, def_t + 0.8667\, \hat u_{t-1} + \hat w_t \qquad (3)$

$T = 83 \quad R^2 = 0.5769$

a) Using all the information provided, do you think that the disturbances satisfy all basic
hypotheses? Justify your answer.

b) According to the results obtained in the previous question, what are the properties of the
OLS estimator? Justify your answer.

A second researcher proposes the following alternative model:

$int_t = \beta_1 + \beta_2 inf_t + \beta_3 def_t + \beta_4 int_{t-1} + v_t \qquad (4)$


which is estimated by OLS:

$\widehat{int}_t = \underset{(0.5633)}{0.8275} + \underset{(0.1586)}{0.2101} inf_t + \underset{(0.0842)}{0.1170} def_t + \underset{(0.0362)}{0.9190} int_{t-1} \qquad (5)$

$T = 82 \quad R^2 = 0.9321 \quad DW = 1.2786$

and by TSLS (Two-Stage Least Squares):

$\widehat{int}_t = \underset{(0.7360)}{0.5353} + \underset{(0.1889)}{0.1470} inf_t + \underset{(0.0871)}{0.1039} def_t + \underset{(0.0605)}{0.9490} int_{t-1} \qquad (6)$

$T = 82 \quad R^2 = 0.9319 \quad DW = 1.3017$

c) Explain how these TSLS estimates have been obtained. In particular, identify the appropriate
instruments and explain in detail the steps needed to obtain the previous results.
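Two-stage least squares can be sketched in a few lines: in the first stage each endogenous regressor is regressed on all instruments (the exogenous regressors plus the excluded instruments) and replaced by its fitted values; in the second stage $y$ is regressed by OLS on the resulting matrix. The pure-Python helpers below are hypothetical and take the instrument matrix as given (identifying the instruments for model (4) is part of the exercise). They reproduce the coefficients only: valid 2SLS standard errors must use residuals $y - X\hat\beta$ computed with the original, not the fitted, regressors.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y (X is a list of rows)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def two_stage_least_squares(X, Z, y, endog_cols):
    """2SLS coefficients: replace endogenous columns of X by first-stage fitted values on Z."""
    X_hat = [list(row) for row in X]
    for j in endog_cols:
        gamma = ols(Z, [row[j] for row in X])          # first stage
        for i, z_row in enumerate(Z):
            X_hat[i][j] = sum(g * z for g, z in zip(gamma, z_row))
    return ols(X_hat, y)                                 # second stage


# Toy check: y = 1 + 2x exactly, x treated as endogenous, two excluded instruments
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z1 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
z2 = [1.0, 3.0, 2.0, 5.0, 4.0, 7.0]
X = [[1.0, xi] for xi in x]
Z = [[1.0, a, b] for a, b in zip(z1, z2)]
y = [1.0 + 2.0 * xi for xi in x]
coef = two_stage_least_squares(X, Z, y, endog_cols=[1])
print(coef)
```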

d) Which one of the two previous estimators is more adequate for model (4)? Justify your
answer.

e) Given your answer to question d), do you think that the basic assumptions on the regressors
are satisfied? And on the disturbances? Justify your answer.
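With a lagged dependent variable among the regressors the DW statistic is biased towards 2, so for the OLS fit (5) a common alternative is Durbin's h, $h = \hat\rho\sqrt{T/(1 - T\,\widehat{Var}(\hat\beta_4))}$ with $\hat\rho \approx 1 - DW/2$, asymptotically N(0,1) under no first-order autocorrelation. A minimal sketch using the figures reported for (5); h is undefined when $T\,\widehat{Var}(\hat\beta_4) \ge 1$, in which case a Breusch-Godfrey test can be used instead. The helper name is hypothetical.

```python
import math


def durbin_h(dw, n_obs, se_lagged_dep):
    """Durbin's h statistic for OLS with a lagged dependent variable."""
    rho_hat = 1.0 - dw / 2.0
    denom = 1.0 - n_obs * se_lagged_dep ** 2
    if denom <= 0:
        raise ValueError("h is not defined: T * Var(coef of lagged y) >= 1")
    return rho_hat * math.sqrt(n_obs / denom)


h = durbin_h(dw=1.2786, n_obs=82, se_lagged_dep=0.0362)  # figures from equation (5)
print(f"h = {h:.3f}  (compare with N(0,1) critical values)")
```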

Finally, a third researcher proposes the following model:

$int_t = \gamma_1 + \gamma_2 inf_t + \gamma_3 inf_{t-1} + \gamma_4 def_t + \gamma_5 def_{t-1} + \gamma_6 int_{t-1} + \varepsilon_t \qquad (7)$


which is estimated by OLS:

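$\widehat{int}_t = \underset{(0.6399)}{0.8596} + \underset{(0.1613)}{0.2159} inf_t + \underset{(0.1598)}{0.0839} inf_{t-1} - \underset{(0.4577)}{0.0498} def_t + \underset{(0.4706)}{0.1618} def_{t-1} + \underset{(0.0458)}{0.9020} int_{t-1} \qquad (8)$

$T = 82 \quad R^2 = 0.9325 \quad DW = 1.2658 \quad BG(1) = 2.1207$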

f) Which model and which method of estimation do you think are more adequate? Justify
your answer.

EXERCISE 66 (GE.24) (July-2017)

Consider the following linear regression model:


$Y_t = \alpha + \beta X_t + u_t, \quad u_t \overset{iid}{\sim} (0, 1), \quad t = 1, \ldots, T$

where $X_t$ is a non-stochastic variable that is not directly observable; instead we observe
$X_t^* = X_t + \varepsilon_t$, with $\varepsilon_t \overset{iid}{\sim} (0, 1)$. It is also known that $E(u_t \varepsilon_t) = 0.5\beta$ ($\beta \neq 0$), $E(u_t \varepsilon_s) = 0 \ \forall t \neq s$
and $corr(X_t^*, X_{t-1}^*) = 0.85$.

a) Write down the model to be estimated. What are the mean and variance of the
disturbances?

b) Is there any basic hypothesis that is not satisfied? Justify your answer.

c) Having in mind your answer to the previous question, what method of estimation would you
use to estimate the model in question a)? Justify your choice and fill in the blanks in the
following matrices corresponding to the chosen method of estimation:

$\hat\beta_{\ldots\ldots} = \begin{pmatrix} \cdots & \cdots \\ \cdots & \cdots \end{pmatrix}^{-1} \begin{pmatrix} \cdots \\ \cdots \end{pmatrix}$

EXERCISE 67 (GE.25) (May-2018)

A credit institution wants to analyse the factors that affect the expenditure of individuals using
credit cards. A sample of 100 observations is available on the following variables:

• $Avgexp_i$: average monthly credit card expenditure of individual i,

• $Income_i$: monthly income of individual i,

• $Age_i$: age in years of individual i.

The following model is proposed:

$Avgexp_i = \beta_0 + \beta_1 Age_i + \beta_2 Income_i + u_i \qquad (1)$

with its OLS estimation:

$\widehat{Avgexp}_i = \underset{(119.31)}{11.4750} - \underset{(3.6498)}{2.0547} Age_i + \underset{(17.540)}{72.2590} Income_i \qquad (2)$

$T = 100 \quad R^2 = 0.151 \quad \hat\sigma = 273.86$

(standard errors in parentheses)

leading to the OLS residuals shown in Figure 26:

Figure 26: OLS residuals. (a) Against Age; (b) Against Income.

a) It is now believed that Income and Avgexp are affected by the same factors, such that
$Income_i$ and $u_i$ in model (1) are correlated. If this suspicion is true, what are the effects
on the previous OLS estimation?

b) A dummy variable Ownrent has now been constructed such that $Ownrent_i = 0$ if individual
i rents a house and $Ownrent_i = 1$ if he/she owns it. With that information the following
IV estimation has been obtained:

$\widehat{Avgexp}_i = -\underset{(157.60)}{80.0032} - \underset{(5.1108)}{5.29625} Age_i + \underset{(62.934)}{130.273} Income_i$

(standard errors in parentheses)

Perform a formal test to analyse if the suspicion in question a) can be considered as being
true.

c) What characteristics should $Ownrent_i$ have to be a good instrument for $Income_i$? If those
characteristics are satisfied, what are the properties of the IV estimator?

d) OLS residuals have been plotted in Figure 26. What information can be extracted from
both plots?

e) Given the information in Figure 26, explain in detail a formal test to check whether the
disturbances in model (1) satisfy the basic hypotheses of the GLRM.

f) The following estimated model has also been obtained by OLS:

$\widehat{\left(\frac{Avgexp_i}{Income_i^2}\right)} = -\underset{(53.996)}{23.8625} \frac{1}{Income_i^2} - \underset{(1.6493)}{2.6307} \frac{Age_i}{Income_i^2} + \underset{(19.744)}{89.2990} \frac{1}{Income_i}$

$T = 100 \quad R^2 = 0.423 \quad \hat\sigma = 20.431$

(standard errors in parentheses)

What is the final purpose of this transformed model? Why has it been estimated by OLS?

g) The following auxiliary regression has also been obtained by OLS:

$\hat v_i^2 = \underset{(4.6888)}{0.179110} - \underset{(1.2541)}{0.0498332} Income_i$

$T = 100 \quad R^2 = 0.000016 \quad \hat\sigma = 20.327$

(standard errors in parentheses)

where v̂i are the OLS residuals in the transformed model in question f). Use this auxiliary
regression to test the adequacy of the transformed model.

h) Taking into account all the results obtained so far, test the individual significance of Income
to explain the variations of Avgexp. Justify your selection of the method of estimation
employed in the testing procedure.

EXERCISE 68 (GE.26) (May-2018)

Consider the following simple model of money demand:

$\log M1_t = \beta_0 + \beta_1 \log GDP_t + \beta_2 \log CPI_t + u_t \qquad (1)$

where

• $M1_t$: nominal money stock at time t,

• $GDP_t$: real Gross Domestic Product at time t,

• $CPI_t$: Consumer Price Index at time t.

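The model has been estimated by OLS using quarterly observations from 1950Q1 to 2000Q4:

$\widehat{\log M1}_t = -\underset{(0.2286)}{1.6331} + \underset{(0.0474)}{0.2871} \log GDP_t + \underset{(0.0338)}{0.9718} \log CPI_t \qquad (2)$

$T = 204 \quad R^2 = 0.9895 \quad \hat\sigma = 0.0829$

$\sum_{t=2}^{204} \hat u_t \hat u_{t-1} = 1.357 \qquad \sum_{t=2}^{204} (\hat u_t - \hat u_{t-1}) = -0.156 \qquad \sum_{t=2}^{204} (\hat u_t - \hat u_{t-1})^2 = 0.034$

$\sum_{t=1}^{204} \hat u_t^2 = 1.381 \qquad \sum_{t=2}^{204} \hat u_t^2 = 1.374$

(standard errors in parentheses)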

The OLS residuals are plotted in Figure 27:

Figure 27: OLS residuals, plotted against time, 1950-2000.

a) What can you say about the fulfilment of the basic hypotheses of the disturbances from
Figure 27?

b) Now carry out a formal test to check whether the disturbances satisfy the basic hypotheses.

c) The model is now re-estimated, resulting in:

$\widehat{\log M1}_t = -\underset{(0.3116)}{1.6331} + \underset{(0.0723)}{0.2871} \log GDP_t + \underset{(0.0608)}{0.9718} \log CPI_t \qquad (3)$

$T = 204 \quad R^2 = 0.9895 \quad \hat\sigma = 0.0829$

(HAC, Newey-West standard errors in parentheses)

Explain the differences (if any) between this and the estimated model in equation (2).
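HAC (Newey-West) standard errors keep the OLS coefficients and only change the estimated covariance matrix: to the White term they add weighted cross-products of residual-score terms at lags 1 to L, with Bartlett weights $1 - l/(L+1)$. For a single slope in a simple regression this reduces to the sketch below (hypothetical helper; the truncation lag L used by the software for (3) is not reported). With L = 0 it coincides with White's HC0.

```python
def newey_west_slope_var(x, residuals, max_lag):
    """Newey-West (Bartlett kernel) variance of the OLS slope in y = a + b*x + u."""
    n = len(x)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dev)
    s = [d * u for d, u in zip(dev, residuals)]  # score terms (x_t - x_bar) * u_t
    meat = sum(v * v for v in s)                 # lag-0 (White) term
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1.0)     # Bartlett weight
        meat += 2.0 * weight * sum(s[t] * s[t - lag] for t in range(lag, n))
    return meat / sxx ** 2


x = [1.0, 2.0, 3.0, 4.0]
u_hat = [1.0, -1.0, -1.0, 1.0]  # sums to zero and orthogonal to x
var_hc0 = newey_west_slope_var(x, u_hat, 0)
var_nw1 = newey_west_slope_var(x, u_hat, 1)
print(var_hc0, var_nw1)
```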

d) A new model that includes four lags of the dependent variable is also estimated by OLS,
with the results:

$\widehat{\log M1}_t = -\underset{(0.0289)}{0.0426} + \underset{(0.0058)}{0.0081} \log GDP_t + \underset{(0.0080)}{0.0226} \log CPI_t + \underset{(0.0716)}{1.3462} \log M1_{t-1}$
$\qquad - \underset{(0.1203)}{0.1510} \log M1_{t-2} - \underset{(0.1202)}{0.0968} \log M1_{t-3} - \underset{(0.0705)}{0.1225} \log M1_{t-4} \qquad (4)$

$T = 200 \quad R^2 = 0.9999 \quad BG(4) = 6.2657 \quad \hat\sigma = 0.0088$

(standard errors in parentheses)
What is the improvement expected by the inclusion of four lags of the dependent variable?

e) Carry out a formal test to decide whether such expected improvement is actually achieved. Explain
in detail all the elements of the testing procedure.

f) Which one of the three estimated models (2), (3) or (4) should be used to analyse money
demand?

EXERCISE 69 (GE.27) (July-2018)

Daily data (5 work days per week) are available on the exchange rate of the following currencies
against the American Dollar (Dollar):

• $bp_t$: Dollar/British Pound.

• $cd_t$: Dollar/Canadian Dollar.

• $dy_t$: Dollar/Japanese Yen.

• $sf_t$: Dollar/Swiss Franc.

• $euro_t$: Dollar/Euro.

The following model is considered to explain the Dollar/Euro exchange rate fluctuations:

$euro_t = \beta_1 + \beta_2 bp_t + \beta_3 cd_t + \beta_4 dy_t + \beta_5 sf_t + u_t \qquad (1)$

where all the basic hypotheses of the GLRM are, in principle, assumed to be satisfied (unless
some evidence against them is found). The OLS estimation is:

$\widehat{euro}_t = \underset{(0.0228)}{0.2803} + \underset{(0.0037)}{0.1434} bp_t - \underset{(0.0306)}{0.5340} cd_t - \underset{(1.9689)}{23.3073} dy_t + \underset{(0.0215)}{1.7183} sf_t \qquad (2)$

$T = 1867 \quad R^2 = 0.9717 \quad \hat\rho = 0.9909 \quad DW = 0.0176$

(standard errors in parentheses)

giving rise to Figure 28.

Figure 28: Results from OLS estimation. (a) Time series of OLS residuals (= euro observed - estimated), 1980-1987; (b) Observed and estimated dependent variable euro, 1980-1987.

a) Explain how the OLS residuals in Figure 28(a) have been obtained. Using Figure 28(a),
draw, roughly but clearly, the line corresponding to the estimated dependent variable $\widehat{euro}_t$
that is absent in Figure 28(b). Explain the implications of these two figures for the OLS
estimation of (1).

b) Given the graphs in Figure 28, test the hypothesis that you consider relevant on the behaviour
of the disturbances.

c) Taking into account the previous results, propose and explain in detail a method of estimation
of the model in equation (1), mentioning its expected properties and the context in
which those properties are actually achieved.

d) The strategy described in the previous question leads to the following OLS regression:

$\widehat{euro}_t^* = \underset{(0.0001)}{8.69 \times 10^{-6}} + \underset{(0.0063)}{0.0654} bp_t^* + \underset{(0.0305)}{0.0640} cd_t^* + \underset{(2.5364)}{15.6646} dy_t^* + \underset{(0.0230)}{1.2062} sf_t^* \qquad (3)$

$T = 1866 \quad R^2 = 0.8537 \quad DW = 2.1704 \quad BG(1) = 13.5523 \quad RSS = 0.0119$

(standard errors in parentheses)

Explain what estimator is obtained with this regression and how the variables $euro_t^*$, $bp_t^*$,
..., $sf_t^*$ have been constructed. Do you think that this estimator is better than the OLS
estimation in equation (2)? Support your answer with a formal test.

e) It is thought that the exchange rates Dollar/Euro and Dollar/British Pound are jointly
determined, such that

$bp_t = \gamma_1 + \gamma_2 euro_t + v_t \qquad (4)$

Describe in detail how to estimate the parameters of the model in equation (1) in this
case, as well as the properties that the estimator should have.

f) Consider the results in the following table:

TSLS, using observations 1980-01-03–1987-02-26 (T = 1866)
Dependent variable: euro
Instrumented: bp
Instruments: const cd dy sf cd_1 dy_1 sf_1

           Coefficient    Std. Error    z          p-value
  const    0.528386       0.232499      2.2726     0.0230
  bp       0.227092       0.0777555     2.9206     0.0035
  cd       −0.917772      0.359008      −2.5564    0.0106
  dy       −19.2185       4.27069       −4.5001    0.0000
  sf       1.51224        0.191040      7.9158     0.0000

  R2 0.964216     ρ̂ 0.991331     Durbin-Watson 0.016525

Hausman test: asymptotic test statistic = 1.4928, with p-value = 0.221782

Explain the meaning of: Instruments: const cd dy sf cd_1 dy_1 sf_1. Which one
of all the previous estimated models do you prefer? Why?

g) Explain in detail how to test the null hypothesis that the expected effects of the Dollar/Pound
and Dollar/Canadian Dollar exchange rates on the Dollar/Euro are equal.
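Under $H_0: \beta_2 = \beta_3$ the usual statistic is $t = (\hat\beta_2 - \hat\beta_3)/\sqrt{\widehat{Var}(\hat\beta_2) + \widehat{Var}(\hat\beta_3) - 2\widehat{Cov}(\hat\beta_2, \hat\beta_3)}$, which needs the estimated covariance between the two coefficients, an element of the covariance matrix not reported in the tables above. A minimal sketch with made-up inputs; which covariance matrix to use (OLS, HAC, IV, ...) depends on the model preferred in f), and the helper name is hypothetical.

```python
import math


def t_equal_coefficients(b2, b3, var_b2, var_b3, cov_b2_b3):
    """t statistic for H0: beta2 = beta3, i.e. R*beta = 0 with R = [0, 1, -1, 0, 0]."""
    var_diff = var_b2 + var_b3 - 2.0 * cov_b2_b3
    return (b2 - b3) / math.sqrt(var_diff)


# Made-up numbers, only to show the arithmetic
t_stat = t_equal_coefficients(0.5, 0.2, 0.04, 0.01, 0.005)
print(t_stat)
```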

EXERCISE 70 (GE.28) (July-2018)

Consider the initial model

$euro_t = \beta_1 + \beta_2 bp_t + \beta_3 cd_t + \beta_4 dy_t + \beta_5 sf_t + u_t$

where now all the regressors are assumed to be non-stochastic and there is no autocorrelation.
The graphs in Figure 29 are obtained from the OLS estimation in equation (2):

a) Explain the graphs in Figure 29. What effects can be deduced on the properties of the OLS
estimator?

b) Consider now Figure 29(c). In view of this figure, explain how you would test a relevant
hypothesis using the Goldfeld-Quandt test.

c) The following auxiliary regression is now obtained with OLS:

Figure 29: OLS residuals (= euro observed - estimated). (a) Against $bp_t$; (b) against $cd_t$; (c) against $dy_t$; (d) against $sf_t$.

$\widehat{usq}_t = \underset{(0.0009)}{0.0014} - \underset{(0.0001)}{0.0012} bp_t - \underset{(0.0012)}{0.0010} cd_t - \underset{(0.0785)}{1.1545} dy_t + \underset{(0.0009)}{0.0144} sf_t \qquad (1)$

$T = 1867 \quad R^2 = 0.2073 \quad RSS = 0.0019$

(standard errors in parentheses)

where $usq_t$ are the squared OLS residuals obtained from equation (2). Explain for which test
the previous regression is necessary. Is the result of that test compatible with the graphs
in Figure 29?

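Two additional OLS estimations are obtained:

$\widehat{\left(\frac{euro_t}{\sqrt{\widehat{usq}_t}}\right)} = \underset{(0.0124)}{0.2796} \frac{1}{\sqrt{\widehat{usq}_t}} + \underset{(0.0022)}{0.1351} \frac{bp_t}{\sqrt{\widehat{usq}_t}} - \underset{(0.0177)}{0.4676} \frac{cd_t}{\sqrt{\widehat{usq}_t}} - \underset{(1.0940)}{21.3993} \frac{dy_t}{\sqrt{\widehat{usq}_t}} + \underset{(0.0133)}{1.6317} \frac{sf_t}{\sqrt{\widehat{usq}_t}} \qquad (2)$

$T = 1867 \quad R^2 = 0.9901 \quad BP(4) = 18.8022 \quad \hat\sigma = 1.2160$

(standard errors in parentheses)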

where $\widehat{usq}_t$ is the fitted value of the dependent variable in equation (1); and

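$\widehat{euro}_t = \underset{(0.0202)}{0.2803} + \underset{(0.0029)}{0.1434} bp_t - \underset{(0.0268)}{0.5340} cd_t - \underset{(2.0966)}{23.3073} dy_t + \underset{(0.0229)}{1.7183} sf_t \qquad (3)$

$T = 1867 \quad R^2 = 0.9717 \quad \hat\sigma = 0.0251$

(Standard errors HC0, White, in parentheses)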
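Equation (2) above is a feasible WLS regression: every variable, including the constant, is divided by $\sqrt{\widehat{usq}_t}$, the fitted value from auxiliary regression (1). A minimal sketch of that transformation step (hypothetical helper). One practical caveat: fitted values from a linear regression of squared residuals can be zero or negative, in which case the weight is undefined; a common workaround, not necessarily the one used here, is to model $\log \hat u_t^2$ and exponentiate the fitted values.

```python
import math


def fgls_transform(y, X, fitted_var):
    """Divide y_t and each row of X (constant included) by sqrt(fitted variance)."""
    y_star, X_star = [], []
    for y_t, row, h in zip(y, X, fitted_var):
        if h <= 0:
            raise ValueError("non-positive fitted variance; the weight 1/sqrt(h) is undefined")
        s = math.sqrt(h)
        y_star.append(y_t / s)
        X_star.append([v / s for v in row])
    return y_star, X_star


y_star, X_star = fgls_transform([4.0, 9.0], [[1.0, 2.0], [1.0, 3.0]], [4.0, 9.0])
print(y_star, X_star)
```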

d) Taking into account the previous results and the information provided, explain the proper-
ties of the OLS estimators in equations (2) and (3).

e) Test if the effect of the Dollar/Pound on Dollar/Euro is 1. Justify the choice of the test
statistic and the estimator used in the testing procedure.

f) If the disturbances of the initial model (1) showed both heteroscedasticity and autocorrelation,
how would you test the hypothesis in question e)?

EXERCISE 71 (GE.29) (May-2019)

A political institution is concerned about the effects of smoking on the weight of new babies at
birth. They want to analyse if increasing the price of cigarettes (perhaps via taxes) may have
some effect on the birth weight. A sample of 1388 individuals of different states in the US is
available on the following variables:

• $bwght_i$: birth weight in ounces,

• $cigprice_i$: price of cigarettes in the home state,

• $faminc_i$: family income,

• $male_i$: =1 if the newborn is male, =0 if female,

• $white_i$: =1 if the newborn is white, =0 otherwise,

• $fatheduc_i$: father's years of education.

The following model is proposed:

$bwght_i = \beta_0 + \beta_1 cigprice_i + \beta_2 faminc_i + \beta_3 male_i + \beta_4 white_i + u_i \qquad (1)$

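with its OLS estimation:

$\widehat{bwght}_i = \underset{\substack{(6.909)\\ \{6.726\}}}{103.478} + \underset{\substack{(0.053)\\ \{0.052\}}}{0.055} cigprice_i + \underset{\substack{(0.030)\\ \{0.029\}}}{0.086} faminc_i + \underset{\substack{(1.081)\\ \{1.074\}}}{3.146} male_i + \underset{\substack{(1.382)\\ \{1.435\}}}{5.025} white_i \qquad (2)$

$T = 1388 \quad R^2 = 0.028 \quad F(4, 1383) = 10.122 \quad \hat\sigma = 20.091$

(standard errors in parentheses)
{Robust (White) standard errors in brackets}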

The following OLS regression is also obtained using the OLS residuals $\hat u_i$:

$\widehat{u_i^2} = \underset{(317.61)}{325.306} + \underset{(2.438)}{0.973} cigprice_i - \underset{(49.678)}{3.123} male_i - \underset{(60.696)}{61.816} white_i$

$T = 1388 \quad R^2 = 0.0008 \quad F(3, 1384) = 0.373 \quad \hat\sigma = 923.63$

(standard errors in parentheses)

a) Using the information provided, do you find evidence that any basic hypothesis is not
satisfied?

b) Considering your previous answer, test in the most appropriate way whether increasing the price
of cigarettes has a positive effect on birth weight.

c) It is now suspected that family income is not exogenous, but is determined by
socio-economic variables that may also affect the health environment and birth weight,
such that $faminc_i$ and $u_i$ are correlated. If that is the case, what are the implications for
the previous OLS estimation?

d) This alternative estimation of the model is also proposed:

Model: IV, using observations 1–1388
Dependent variable: bwght
Instrumented: faminc
Instruments: fatheduc

             Coefficient    Std. Error    t ratio    p-value
  const      104.557        7.418         14.096     0.000
  cigprice   0.032          0.057         0.556      0.578
  faminc     0.188          0.076         2.474      0.013
  male       4.150          1.170         3.547      0.001
  white      3.039          1.768         1.718      0.086

  Mean dependent var   119.519     S.D. dependent var   20.136
  Sum squared resid    476804.5    R2                   0.018

Describe the method of estimation used here and justify its properties (assume $E(u_i u_j) = 0$
for $i \neq j$).

e) Test whether the data confirm the suspicion in question c).

f) Taking into account all the previous results, would you change the test implemented in
question b)? Justify your answer.

EXERCISE 72 (GE.30) (May-2019)

A company dedicated to manufacturing cars wants to analyse the factors that affect its demand.
For that, the following model is first considered:

$nocars_t = \beta_0 + \beta_1 pop_t + \beta_2 DPI_t + \beta_3 price_t + u_t \qquad (1)$

where

• $nocars_t$: number of new car sales, in thousands,

• $pop_t$: population, in millions,

• $DPI_t$: disposable personal income, in thousands of dollars,

• $price_t$: new car price index (1982 base year).

The model has been first estimated by OLS using quarterly observations from 1976Q1 to 1990Q4:

$\widehat{nocars}_t = \underset{(3899.1)}{8292.81} - \underset{(26.169)}{58.314} pop_t + \underset{(153.44)}{750.269} DPI_t - \underset{(10.514)}{6.621} price_t \qquad (2)$

$T = 60 \quad R^2 = 0.488 \quad F(3, 56) = 17.771 \quad DW = 1.461$

(standard errors in parentheses)

The OLS residuals are plotted in Figure 30:

a) What can you say about the fulfilment of the basic hypotheses of the disturbances from
Figure 30?

b) How has the value DW = 1.461 been obtained? Use it to make some formal test.

Figure 30: OLS residuals, plotted against time, 1976-1990.

c) Taking into account your answer to question b), what are the properties of the estimator
used in equation (2)? Do you know any other estimator with better properties? Describe
in detail.
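If the disturbances follow an AR(1), $u_t = \rho u_{t-1} + \varepsilon_t$, a feasible GLS estimator applies OLS to quasi-differenced data $z_t - \hat\rho z_{t-1}$ for $t \ge 2$; the Prais-Winsten version keeps the first observation scaled by $\sqrt{1-\hat\rho^2}$, whereas Cochrane-Orcutt drops it. A minimal sketch of the transformation (hypothetical helper), to be applied to the dependent variable and to every regressor including the constant; $\hat\rho$ could be taken, for instance, from $1 - DW/2$ or from a regression of the residuals on their lag.

```python
import math


def prais_winsten_transform(series, rho):
    """AR(1) GLS transform: first value scaled by sqrt(1 - rho^2), then z_t - rho * z_{t-1}."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    first = math.sqrt(1.0 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


out = prais_winsten_transform([1.0, 2.0, 3.0], 0.5)
print(out)
print(prais_winsten_transform([1.0, 1.0, 1.0], 0.5))  # transformed constant column
```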

d) The following model is also estimated by OLS:

$\widehat{nocars}_t = \underset{(3994.4)}{6328.77} - \underset{(26.975)}{44.027} pop_t + \underset{(184.91)}{558.569} DPI_t - \underset{(10.578)}{4.867} price_t + \underset{(0.138)}{0.246} nocars_{t-1} \qquad (3)$

$T = 59 \quad R^2 = 0.516 \quad DW = 1.878 \quad BG(1) = 0.675$

(standard errors in parentheses)

Explain how the value BG(1) has been obtained and use it to implement the corresponding
test.

e) Use one of the estimated models, (2) or (3), to test if price has any effect on the sales of
new cars. Justify your choice of the selected model.

EXERCISE 73 (GE.31) (July-2019)

In order to analyse the relationship between gasoline consumption (kml, in kilometres per
litre) and the power of the engine of the vehicle (pot, in cubic cm), the following regression model
is proposed:

$kml_i = \beta_1 + \beta_2 pot_i + u_i \qquad (1)$

A first researcher estimates the model by OLS, with the results:

$\widehat{kml}_i = \underset{(0.210203)}{14.9313} - \underset{(0.000375)}{0.010051} pot_i \qquad (2)$

$T = 392 \quad R^2 = 0.6482 \quad RSS = 1514.44$

(standard errors in parentheses)

and the residuals in the following figure:

Figure 31: OLS residuals against pot.

He/She also estimates the following regression by OLS:

$\widehat{\hat u_i^2} = \underset{(0.734319)}{6.72338} - \underset{(0.001310)}{0.005792} pot_i \qquad (3)$

$T = 392 \quad R^2 = 0.0477 \quad RSS = 18481.87$

(standard errors in parentheses)

a) Based on all the information provided, what can you say about the fulfilment of the basic
assumptions?

b) What are the implications of your answer to the previous question on the properties of the
OLS estimator?

c) Having in mind your answer to question a), obtain step by step the matrix of variances and
covariances of the OLS estimator $\hat\beta_{OLS}$. Is $\hat\beta_{OLS}$ consistent? Prove it.

A second researcher proposes the following model:

$kml_i^* = \alpha_1 const_i^* + \alpha_2 pot_i^* + u_i^* \qquad (4)$

where $kml_i^* = kml_i/\sqrt{pot_i}$, $const_i^* = 1/\sqrt{pot_i}$, $pot_i^* = \sqrt{pot_i}$ and $u_i^* = u_i/\sqrt{pot_i}$.

The OLS estimation gives:

$\widehat{kml}_i^* = \underset{(0.224876)}{15.4117} const_i^* - \underset{(0.000522)}{0.011024} pot_i^* \qquad (5)$

$T = 392 \quad R^2 = 0.9655 \quad RSS = 4.9452$

(standard errors in parentheses)

The following regression is also estimated by OLS:

$\widehat{\hat u_i^{*2}} = \underset{(0.049445)}{0.508395} const_i^* - \underset{(0.000115)}{0.000591} pot_i^* \qquad (6)$

$T = 392 \quad R^2 = 0.2992 \quad RSS = 0.239086$

(standard errors in parentheses)

d) Why do you think that the researcher decides to estimate the model in this way? Do you
think that he/she has achieved his/her goal? Justify your answer.

A third researcher estimates Model (1) by OLS but uses White's estimator for the matrix of
variances and covariances of $\hat\beta_{OLS}$. The results are:

$\widehat{kml}_i = \underset{(0.241112)}{14.9313} - \underset{(0.000371)}{0.010051} pot_i \qquad (7)$

$T = 392 \quad R^2 = 0.6482 \quad RSS = 1514.44$

(White standard errors in parentheses)

e) Describe the White estimator and explain why it is used.

Finally, a new researcher proposes to estimate the following regression model:

$kml_i = \alpha_1 + \alpha_2 pot_i + \alpha_3 pot_i^2 + v_i \qquad (8)$


The OLS estimation is:

$\widehat{kml}_i = \underset{\substack{(0.4582)\\ [0.4997]}}{17.87} - \underset{\substack{(0.0019)\\ [0.0017]}}{0.0231} pot_i + \underset{\substack{(0.0000016)\\ [0.0000013]}}{0.0000111} pot_i^2 \qquad (9)$

$T = 392 \quad R^2 = 0.6688 \quad RSS = 1339.74$

(standard errors in parentheses)
[White standard errors in square brackets]

The following OLS regression is also obtained:

$\widehat{\hat v_i^2} = \underset{(0.730271)}{6.7707} - \underset{(0.001302)}{0.006790} pot_i \qquad (10)$

$T = 392 \quad R^2 = 0.0651 \quad RSS = 18278.68$

(standard errors in parentheses)

f) Which estimated model do you prefer? Why?

g) What is the estimated average effect on kml of a unit increase in pot?
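In the quadratic model (8) the effect of a one-unit increase in pot is not constant: $\partial kml/\partial pot = \alpha_2 + 2\alpha_3\, pot$, so it has to be evaluated at some value of pot (for example its sample mean, which is not reported here), and it changes sign at $pot = -\alpha_2/(2\alpha_3)$. A minimal sketch with the estimates in (9); the evaluation point 500 is arbitrary and the helper name is hypothetical.

```python
def quadratic_marginal_effect(alpha2, alpha3, pot):
    """d kml / d pot = alpha2 + 2 * alpha3 * pot for kml = a1 + a2*pot + a3*pot^2."""
    return alpha2 + 2.0 * alpha3 * pot


alpha2_hat, alpha3_hat = -0.0231, 0.0000111      # estimates from equation (9)
effect_at_500 = quadratic_marginal_effect(alpha2_hat, alpha3_hat, 500.0)
turning_point = -alpha2_hat / (2.0 * alpha3_hat)  # value of pot where the effect changes sign
print(f"effect at pot = 500: {effect_at_500:.4f}; turning point: {turning_point:.1f}")
```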

EXERCISE 74 (GE.32) (July-2019)

The Department of Traffic in California wants to analyse the factors that influence the number
of traffic accidents. Monthly information, from January 1981 to December 1989, is available for
the following variables:

• Totacc: number of traffic accidents.

• Wkends: number of weekends in the month.

• Unem: unemployment rate (in %).

• Spdlaw: dummy variable, = 1 from May 1987, the month in which the speed limit of
105 km/h started; = 0 before May 1987.

A researcher proposes the following regression model:

$Totacc_t = \beta_1 + \beta_2 Wkends_t + \beta_3 Unem_t + \beta_4 Spdlaw_t + u_t \qquad (1)$

Its OLS estimation is:

$\widehat{Totacc}_t = \underset{\substack{(3761.10)\\ [3893.57]}}{52181.6} + \underset{\substack{(257.31)\\ [193.60]}}{321.65} Wkends_t - \underset{\substack{(205.42)\\ [371.31]}}{1920.11} Unem_t + \underset{\substack{(801.74)\\ [1391.44]}}{914.19} Spdlaw_t \qquad (2)$

$T = 108 \quad R^2 = 0.6688 \quad DW = 0.9575$

(standard errors in parentheses)
[Robust (Newey-West) standard errors in square brackets]

The following figure shows the evolution of OLS residuals with time:

Figure 32: OLS residuals along time, 1981-1990.

Finally, the following regression is also estimated by OLS:

$\hat u_t = \underset{(3231.84)}{1679.29} - \underset{(221.63)}{149.08} Wkends_t + \underset{(175.99)}{33.65} Unem_t + \underset{(686.61)}{68.92} Spdlaw_t + \underset{(0.0846)}{0.5274} \hat u_{t-1} + \hat w_t \qquad (3)$

$T = 108 \quad R^2 = 0.2738$

(standard errors in parentheses)

a) What can you say about the fulfilment of the basic assumptions on the disturbances? Base
your answer on Figure 32 and a formal test.

A second researcher estimates Model (1) using Cochrane-Orcutt. The results in the implicit
transformed model are:

$\widehat{Totacc}_t^* = \underset{(3237.35)}{51541.8} const_t^* + \underset{(163.37)}{355.70} Wkends_t^* - \underset{(297.04)}{1889.80} Unem_t^* + \underset{(1224.14)}{857.35} Spdlaw_t^* \qquad (4)$

$T = 107 \quad R^2 = 0.4437 \quad DW = 2.1915$

(standard errors in parentheses)

b) Explain the estimating procedure.
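The building blocks of the Cochrane–Orcutt procedure can be sketched as follows: (i) estimate (1) by OLS and keep the residuals; (ii) estimate ρ by regressing û_t on û_{t−1}; (iii) quasi-difference every variable, the constant included, as x*_t = x_t − ρ̂ x_{t−1} for t = 2, ..., T, which is why (4) has T = 107 and a regressor const*_t; (iv) estimate the transformed model by OLS and, in the iterative version, recompute the residuals of (1) with the new coefficients and return to (ii) until ρ̂ settles. The helper names below are hypothetical and the OLS step itself is omitted.

```python
def rho_from_residuals(resid):
    """OLS slope of u_t on u_{t-1} without a constant, t = 2, ..., T."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    den = sum(resid[t - 1] ** 2 for t in range(1, len(resid)))
    return num / den


def quasi_difference(series, rho):
    """Cochrane-Orcutt transformation x*_t = x_t - rho * x_{t-1}; the first observation is lost."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


# The constant column turns into a column of (1 - rho): this is const*_t in (4).
const_star = quasi_difference([1.0] * 5, 0.5)
print(const_star)  # [0.5, 0.5, 0.5, 0.5]
```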

The following OLS regression is also obtained:

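$$\hat u^{*}_t = -\underset{(3239{,}02)}{262{,}908}\, const^{*}_t + \underset{(165{,}33)}{32{,}13}\, Wkends^{*}_t - \underset{(296{,}89)}{18{,}41}\, Unem^{*}_t - \underset{(1223{,}28)}{69{,}39}\, Spdlaw^{*}_t - \underset{(0{,}1001)}{0{,}1179}\, \hat u^{*}_{t-1} + \hat v_t \qquad (5)$$

T = 107   R² = 0,0134
(standard errors in parentheses)

where û*_t are the residuals from (4).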

c) Given the results obtained so far, what are the properties of the estimator used in (4)?
Justify your answer.

A third analyst believes that the specification in model (1) is wrong and proposes including a lag of the dependent variable as a regressor. The model is then estimated by OLS, with the following results:

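$$\widehat{Totacc}_t = \underset{\substack{(5621{,}54)\\ [7805{,}59]}}{28915{,}1} + \underset{\substack{(232{,}21)\\ [224{,}61]}}{301{,}20}\, Wkends_t - \underset{\substack{(235{,}16)\\ [344{,}12]}}{1168{,}27}\, Unem_t + \underset{\substack{(730{,}27)\\ [847{,}33]}}{309{,}53}\, Spdlaw_t + \underset{\substack{(0{,}0829)\\ [0{,}1250]}}{0{,}4280}\, Totacc_{t-1} \qquad (6)$$

T = 107   R² = 0,7372   DW = 2,00236   BG(1) = 0,1441
(standard errors in parentheses)
[Robust (Newey-West) standard errors in square brackets]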

d) What specification of the model do you prefer? Justify your answer.

The third analyst now realizes that Unem_t is a stochastic variable that may be contemporaneously correlated with the disturbances. A new estimation method is therefore proposed: Instrumental Variables, using Unem_{t−1} as the instrument.

$$\widehat{Totacc}_t = \underset{(6288{,}62)}{22730{,}6} + \underset{(234{,}99)}{296{,}93}\, Wkends_t - \underset{(286{,}47)}{800{,}31}\, Unem_t + \underset{(771{,}98)}{824{,}63}\, Spdlaw_t + \underset{(0{,}0908)}{0{,}5084}\, Totacc_{t-1} \qquad (7)$$

T = 107   R² = 0,7312   DW = 2,1959
(standard errors in parentheses)
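The IV idea behind (7) can be illustrated in its simplest just-identified form, with one regressor x, a constant and one instrument z: the estimator solves the sample moment conditions Σ z_t(y_t − a − b x_t) = 0 and Σ(y_t − a − b x_t) = 0, which gives b = cov(z, y)/cov(z, x). In (7) the same principle is applied to all regressors, with Unem_{t−1} taking the place of Unem_t in the instrument set and the remaining regressors presumably acting as their own instruments. This is a sketch under those simplifying assumptions; the function name and the data in the check are made up.

```python
def iv_simple(y, x, z):
    """Just-identified IV for y = a + b*x + u with instrument z: b = cov(z, y) / cov(z, x)."""
    n = len(y)
    ybar, xbar, zbar = sum(y) / n, sum(x) / n, sum(z) / n
    s_zy = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    s_zx = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    if s_zx == 0:
        raise ValueError("instrument is uncorrelated with the regressor in the sample")
    b = s_zy / s_zx
    a = ybar - b * xbar
    return a, b


# Made-up data lying exactly on y = 2 + 3x: IV recovers (2, 3) with any relevant instrument.
a_iv, b_iv = iv_simple([5.0, 8.0, 11.0, 14.0], [1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 2.0])
print(a_iv, b_iv)
```

Relevance of z (a denominator clearly away from zero) and its lack of correlation with the disturbance are what make b consistent; the check above only exercises the arithmetic.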

e) Taking into account the information provided so far, which estimation method do you think is better: the one used in (6) or the one used in (7)? Justify your answer.

f) Use one of the estimated models [(2), (4), (6) or (7)] to test whether the variable Spdlaw has a negative effect on Totacc.

EXERCISE 75 (GE.33) (May-2020)

An ice-cream factory wants to analyse the factors that affect the demand for its products. To that end, the owners collected data every four weeks from March 1951 to July 1953, giving a total of 30 observations on the following variables:

• consumt : consumption of ice cream per head (in pints),

• pricet : price of ice cream per pint (in US Dollars),

• incomet : average family income per week (in US Dollars),

• tempt : average temperature (in degrees Fahrenheit).

Three different advisors are consulted, who provide the following reports:

Advisor 1:

The first advisor estimates by OLS the following regression model:

$$\widehat{consum}_t = \underset{(0.27022)}{0.19731} - \underset{(0.83436)}{1.04441}\, price_t + \underset{(0.00117)}{0.00331}\, income_t + \underset{(0.00045)}{0.00346}\, temp_t$$

T = 30   R² = 0.7190   σ̂ = 0.036833
(standard errors in parentheses)

with the OLS residuals û_t, t = 1, ..., 30, displayed in Figure 33.

Figure 33: OLS residuals (regression residuals = observed − fitted consum, plotted against t = 1, ..., 30)

The following information on the residuals is also obtained:


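$$\sum_{t=1}^{30}\hat u_t^2 = 0.0353 \qquad \sum_{t=2}^{30}\hat u_{t-1}^2 = 0.0290 \qquad \sum_{t=2}^{30}(\hat u_t-\hat u_{t-1})^2 = 0.036$$

$$\sum_{t=2}^{30}\hat u_t\,\hat u_{t-1} = 0.012 \qquad \sum_{t=2}^{30}(\hat u_t-\hat u_{t-1}) = 0.008 \qquad \sum_{t=1}^{30}(\hat u_t-\hat u_{t-1})^2 = 3.1454$$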

a) What information does Figure 33 give about the fulfilment of the basic hypotheses on the disturbances?

b) Use the information provided to test whether the disturbances follow an AR(1) process.
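One way to carry out b) is the Durbin–Watson test, DW = Σ_{t=2}^{30}(û_t − û_{t−1})² / Σ_{t=1}^{30} û_t², computed directly from the sums above. A minimal sketch of the arithmetic; the decision then compares DW with the tabulated bounds d_L and d_U (for T = 30 and three regressors besides the constant, about 1.21 and 1.65 at the 5% level in the standard tables; check against the table used in the course).

```python
sum_sq_resid = 0.0353   # sum_{t=1}^{30} u_t^2
sum_sq_diff = 0.036     # sum_{t=2}^{30} (u_t - u_{t-1})^2

dw = sum_sq_diff / sum_sq_resid
rho_implied = 1 - dw / 2      # DW is approximately 2(1 - rho) in large samples
print(f"DW = {dw:.4f}, implied rho = {rho_implied:.4f}")
```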

c) Advisor 1 tests for the significance of the variable pricet and, in view of the results, concludes
that the owner of the factory should raise the prices to increase the profits. Do you agree?

Advisor 2:

Advisor 2 suspects that the disturbances follow an AR(1) process: $u_t = \rho u_{t-1} + \epsilon_t$, $\epsilon_t \sim iid(0, \sigma^2)$.

d) Obtain a consistent estimate of ρ.
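A natural consistent estimate for d) is the OLS slope of û_t on û_{t−1}, ρ̂ = Σ_{t=2}^{30} û_t û_{t−1} / Σ_{t=2}^{30} û²_{t−1}; it inherits consistency from the OLS residuals, provided the OLS estimator of the β's is itself consistent. The arithmetic from the sums above, as a minimal sketch:

```python
sum_cross = 0.012      # sum_{t=2}^{30} u_t u_{t-1}
sum_lag_sq = 0.0290    # sum_{t=2}^{30} u_{t-1}^2

rho_hat = sum_cross / sum_lag_sq
print(f"rho_hat = {rho_hat:.4f}")
```

The result can be compared with the value 1 − DW/2 implied by the Durbin–Watson statistic.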

Advisor 2 prefers to estimate the model by FGLS, obtaining:

OLS, using observations 2–30 (T = 29)


Dependent variable: consum*

Coefficient Std. Error t-ratio p-value


const* 0.156990 0.289602 0.5421 0.5926
price* −0.892272 0.810840 −1.100 0.2816
income* 0.003204 0.001546 2.073 0.0486
temp* 0.003559 0.000554 6.417 0.0000

Mean dependent var 0.217122 S.D. dependent var 0.050892


Sum squared resid 0.025452 S.E. of regression 0.031907
R2 0.649038 Durbin–Watson = 1.548635 BG(1)= 0.326

e) How do you think the previous estimate of ρ has been used to obtain the FGLS estimates of the parameters of the model? Describe the procedure in detail.

f) Describe how the value BG(1) = 0.326 has been obtained. Use it to decide which estimated model is preferable: OLS by Advisor 1 or FGLS by Advisor 2. Base your answer on the properties of the estimators.

Advisor 3:

Advisor 3 instead proposes a dynamic model which, estimated by OLS, gives:

$$\widehat{consum}_t = \underset{(0.27105)}{0.02910} - \underset{(0.81946)}{0.70750}\, price_t + \underset{(0.00083)}{0.00383}\, income_t + \underset{(0.00095)}{0.00332}\, temp_t + \underset{(0.24746)}{0.09879}\, consum_{t-1}$$

T = 29   R² = 0.7645   F(4, 24) = 20.218   σ̂ = 0.034992
DW = 1.1764   BG(1) = 3.991
(HAC (Newey-West) standard errors in parentheses)

g) What are the properties of OLS in this case? Carry out the relevant tests to justify your answer.

h) Use some of the results reported by Advisor 2 or Advisor 3 to carry out a test that helps you support or reject the recommendation made by Advisor 1 in question c).

EXERCISE 76 (GE.34) (May-2020)

To analyse the factors that affect the salary of young men in the USA, a sample of 758 observations on men aged 14–24 in 1966 is available for the following variables:

• lwi : log wage,

• rnsi : =1 if individual i lives in the south and 0 otherwise,

• mrti : =1 if married and 0 otherwise,

• iqi : Intelligence Quotient score,

• agei : age of individual i,

• smsai : =1 if individual i resides in an urban area and 0 otherwise,

• medi : years of education of individual i’s mother.

The following model is proposed

$$lw_i = \beta_0 + \beta_1\, rns_i + \beta_2\, mrt_i + \beta_3\, iq_i + \beta_4\, age_i + \beta_5\, smsa_i + u_i, \qquad i = 1, \dots, 758 \qquad (1)$$

This model is first estimated by OLS with the results

OLS, using observations 1–758


Dependent variable: lw

Coefficient Std. Error t-ratio p-value


const 3.40997 0.12418 27.46 0.0000
rns −0.09146 0.02805 −3.261 0.0012
mrt 0.08262 0.02748 3.007 0.0027
iq 0.00752 0.00092 8.162 0.0000
age 0.06309 0.00468 13.47 0.0000
smsa 0.14219 0.02719 5.230 0.0000

Mean dependent var 5.68674 S.D. dependent var 0.42895


Sum squared resid 84.57760 R2 0.392778

Using the OLS residuals û_i, the following auxiliary regressions have also been obtained:

1) $\hat u_i^2 = 0.05 + 0.03\, rns_i + 0.02\, mrt_i - 0.00\, iq_i - 0.00\, age_i + 0.02\, smsa_i + \hat v_i$, R² = 0.011, RSS = 23.28,

2) $\hat u_i^2 = 0.02 + 0.31\, \hat u_{i-1} + 0.00\, rns_i + 0.02\, mrt_i + 0.02\, iq_i - 0.01\, age_i + 0.01\, smsa_i + \hat v_i$, R² = 0.211, RSS = 13.28,

3) $\hat u_i = 0.03 + 0.12\, \hat u_{i-1} + 0.02\, rns_i + 0.12\, mrt_i + 0.02\, iq_i + \hat v_i$, R² = 0.002, RSS = 33.54,

4) $\hat u_i = 0.03 + 0.06\, rns_i + 0.00\, mrt_i + 0.01\, iq_i - 0.01\, age_i + 0.00\, smsa_i + \hat v_i$, R² = 0.01, RSS = 28.12.

a) Use one of the previous auxiliary regressions to test whether some basic hypothesis on the disturbances fails to hold.
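Regression 1) regresses û_i² on all the regressors of (1), which is the auxiliary regression of a Breusch–Pagan type heteroskedasticity test in its N·R² (Koenker) form: under homoskedasticity LM = N·R² is asymptotically χ² with as many degrees of freedom as slope regressors, here 5. The sketch below computes the statistic and a χ² upper-tail probability from closed-form expressions valid for integer degrees of freedom, so it needs only the Python standard library; the function name is illustrative.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-squared(df), integer df >= 1, x >= 0."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) exp(-x/2) sum_{k=1}^{(df-1)/2} x^(k-1/2) / (2k-1)!!
    total = math.erfc(math.sqrt(half))
    if df > 1:
        term = math.sqrt(x)                  # k = 1 term: x^(1/2) / 1
        series = term
        for k in range(2, (df - 1) // 2 + 1):
            term *= x / (2 * k - 1)          # next term: multiply by x / (2k - 1)
            series += term
        total += math.sqrt(2.0 / math.pi) * math.exp(-half) * series
    return total


# Auxiliary regression 1): N = 758 observations, R^2 = 0.011, five slope regressors
lm = 758 * 0.011
p_value = chi2_sf(lm, 5)
print(f"LM = {lm:.3f}, p-value = {p_value:.3f}")
```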

b) The variable iqi is used as a proxy of the true ability of individual i and thus it may be
subject to significant measurement error. What are the implications of this error on the
previous OLS estimation?

c) This alternative estimation of the model is also proposed:

IV using observations 1–758


Dependent variable: lw
Instrumented: iq
Instruments: const rns mrt med age smsa

Coefficient Std. Error t-ratio p-value


const 2.50331 0.38195 6.554 0.0000
rns −0.05163 0.03428 −1.506 0.1325
mrt 0.09962 0.03060 3.256 0.0012
iq 0.01827 0.00435 4.198 0.0000
age 0.05329 0.00639 8.341 0.0000
smsa 0.12085 0.03072 3.934 0.0001

Mean dependent var 5.686739 S.D. dependent var 0.42895


Sum squared resid 99.87656 R2 0.33605

Describe the method of estimation used here and justify its properties.

d) Test whether the data confirm the suspicion raised in question b).
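For d), one textbook version of the Hausman test contrasts the IV and OLS estimates of the coefficient on iq: H = (β̂_IV − β̂_OLS)² / [V̂(β̂_IV) − V̂(β̂_OLS)], asymptotically χ²(1) under the null that iq is uncorrelated with the disturbance (OLS then consistent and efficient, IV consistent). A minimal sketch using the reported estimates and standard errors; it assumes homoskedastic disturbances, and the regression-based variant (adding the first-stage residuals to (1)) would need output that is not shown here. The function name is illustrative.

```python
import math


def hausman_single(b_iv: float, se_iv: float, b_ols: float, se_ols: float) -> float:
    """Hausman contrast for one coefficient: (b_IV - b_OLS)^2 / (Var_IV - Var_OLS)."""
    var_diff = se_iv ** 2 - se_ols ** 2
    if var_diff <= 0:
        raise ValueError("Var(IV) - Var(OLS) must be positive for this version of the test")
    return (b_iv - b_ols) ** 2 / var_diff


# Coefficient on iq: IV 0.01827 (se 0.00435), OLS 0.00752 (se 0.00092)
h = hausman_single(0.01827, 0.00435, 0.00752, 0.00092)
p_value = math.erfc(math.sqrt(h / 2.0))  # chi-squared(1) upper tail
print(f"H = {h:.3f}, p-value = {p_value:.4f}")
```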

EXERCISE 77 (GE.35) (July-2020)

In 1992, a study was carried out on the safety of several car models. At that time, only a few cars had airbags. The study was done by seating crash-test dummies inside each car and crashing the car into a wall. The dependent variable is Pinjury_i: the percentage (expressed as a decimal) to which the dummy is damaged. There are 231 observations (each observation corresponds to one dummy) and the regressors considered are:

• weighti : weight of car i (in thousands of pounds),

• driveri : =1 if the dummy was in the driver's seat, =0 otherwise,

• airbagi : =1 if the car has airbag(s), =0 otherwise,

• threei : =1 if the car has 3 or more doors, =0 otherwise.

All the regressors are assumed to be nonstochastic.

The OLS estimated model is:


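$$\widehat{Pinjury}_i = \underset{(0{,}060)}{0{,}215} - \underset{(0{,}024)}{0{,}025}\, weight_i + \underset{(0{,}020)}{0{,}130}\, driver_i - \underset{(0{,}026)}{0{,}096}\, airbag_i + \underset{(0{,}021)}{0{,}073}\, three_i \qquad (1)$$

T = 231   R² = 0,241   σ̂ = 0,150

(HC0 (White) standard errors in parentheses)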

The OLS residuals have been used to obtain the graphics in Figure 34.

Figure 34: OLS residuals plotted (a) against weight_i, (b) against driver_i, (c) against airbag_i, (d) against three_i

a) Interpret the estimated coefficient of the variable airbag.

b) Interpret the graphs in Figure 34 in terms of the fulfilment of the basic hypotheses.

Using the squared OLS residuals from the regression in equation (1), usq1_i, the following result is obtained:

Model 2: OLS, using observations 1-231


Dependent variable: usq1

coefficient Std. Error t-ratio p-value


----------------------------------------------------------------
const 0.0163450 0.0204120 0.8008 0.4241
weight 8.18003e-06 0.00760303 0.001076 0.9991
driver 0.0195255 0.00677193 2.883 0.0043 ***
airbag -0.0200149 0.00871604 -2.296 0.0226 **

Mean dependent var 0.021914 S.D. dependent var 0.052462


Sum squared resid 0.596611 S.E. of regression 0.051266
R squared 0.057534 R-squared corrected 0.045079

c) Explain step by step the test in which the previous regression is necessary. Perform that
test. Is the result of the test compatible with the graphs in Figure 34?

d) Taking into account the previous results, justify, based on their properties, the validity of
the estimated coefficients and variances in (1).

A weighting variable has been constructed: ponde = the inverse of the square root of the fitted values of the dependent variable (usq1) in the regression used in question c). Using it, the following estimated model has been obtained:

Model 3: WLS, using observations 1–231


Dependent variable: Pinjury
Variable used as weight: ponde

Coefficient Std. Error t-ratio


const 0.138557 0.247796 0.5592
weight 0.0213943 0.0109887 1.947
driver 0.0964578 0.322691 0.2989
airbag −0.0696942 0.246379 −0.2829
three 0.00571187 0.00956279 0.5973

Statistics based on the weighted data:

Sum squared resid 87361.92 S.E. of regression 19.66106
R2 0.025505 Adjusted R2 0.008257
F (4, 226) 1.478725 P-value(F ) 0.209495

e) Explain in detail the method of estimation used in Model 3. What are its expected proper-
ties?
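WLS minimises Σ w_i (y_i − x_i'β)², which is the same as OLS on the data multiplied by √w_i (the constant included); with w_i = 1/σ_i² the transformed disturbances are homoskedastic, and when σ_i² is replaced by an estimate the method is a feasible GLS. A minimal sketch for a single regressor that takes the weights as given (it does not reconstruct the ponde variable); the function name and data are illustrative.

```python
def wls_simple(x, y, w):
    """Weighted least squares for y = a + b*x: minimise sum_i w_i (y_i - a - b x_i)^2."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw    # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw    # weighted mean of y
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b


# With equal weights WLS reduces to OLS.
a_ols, b_ols = wls_simple([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 5.0], [1.0] * 4)
print(a_ols, b_ols)
```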

f ) We want to check whether having an airbag reduces injuries in the event of an accident.
Do you prefer to do this in the model in equation (1) or in the model estimated in the
previous question? Reason the answer and implement the test.

EXERCISE 78 (GE.36) (July-2020)

We want to analyse the relationship between income, R, and consumption, C (both in billions
of dollars) of a country using quarterly data from 1970:1 to 1997:4 (T =112). The disturbances
are assumed to be normal and R is considered a non-stochastic regressor.

Analyst 1 estimates by OLS a simple model:

$$\hat C_t = \underset{(0.30)}{2.28} + \underset{(0.06)}{0.53}\, R_t \qquad DW = 1.09 \qquad (1)$$

(standard errors in parentheses)

and concludes that this country devotes half of each additional dollar of income to consumption.

a) Test the hypothesis on which the previous conclusion is based. Do you agree with Analyst 1?
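The test in a) is a t test of H0: β₂ = 0.5 against H1: β₂ ≠ 0.5, with t = (β̂₂ − 0.5)/se(β̂₂). A minimal sketch using the figures in (1); it uses the large-sample 5% critical value 1.96 as an approximation (with 110 degrees of freedom the t critical value is slightly larger, about 1.98), and it takes the reported standard error at face value, which question b) invites you to question. The function name is illustrative.

```python
def t_statistic(estimate: float, std_error: float, null_value: float = 0.0) -> float:
    """t = (estimate - null value) / standard error."""
    return (estimate - null_value) / std_error


# H0: beta_2 = 0.5 in (1): estimate 0.53, standard error 0.06
t = t_statistic(0.53, 0.06, null_value=0.5)
reject_at_5pct = abs(t) > 1.96   # two-sided test, normal approximation
print(f"t = {t:.2f}, reject H0: {reject_at_5pct}")
```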

b) Given the value of DW , do you think that the previous conclusion is correct?

Analyst 2 uses OLS to estimate the following dynamic models:

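$$\hat C_t = \underset{\substack{(0.198)\\ \{0.150\}}}{0.830} + \underset{\substack{(0.044)\\ \{0.037\}}}{0.069}\, R_t + \underset{\substack{(0.055)\\ \{0.045\}}}{0.030}\, R_{t-1} + \underset{\substack{(0.054)\\ \{0.055\}}}{0.660}\, R_{t-2} - \underset{\substack{(0.060)\\ \{0.057\}}}{0.002}\, C_{t-1} + \underset{\substack{(0.052)\\ \{0.056\}}}{0.073}\, C_{t-2} \qquad (2)$$

T = 112   R² = 0.881   F(4, 106) = 156.31   DW = 1.88   BG(4) = 15.12

(standard errors in parentheses)
{Robust (Newey-West) standard errors in braces}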

c) What are the properties of the estimator used by Analyst 2? Justify your answer in detail.

d) Can you think of any reason why Analyst 2 would have computed the N-W standard errors? That is, does he gain anything in this case by using them for inference instead of the usual standard error estimator?

Analyst 3 eliminates variables that are apparently irrelevant and estimates by OLS the
following dynamic model:

$$\hat C_t = \underset{(0.155)}{1.011} + \underset{(0.036)}{0.077}\, R_t + \underset{(0.036)}{0.717}\, R_{t-2} \qquad DW = 2.08 \qquad BG(4) = 7.94 \qquad (3)$$

(standard errors in parentheses)

e) Do you think that any basic hypothesis fails to hold in the model specified by Analyst 3? Justify your answer in detail.

f) Using the model proposed by Analyst 3, test the hypothesis that an additional dollar of income today increases consumption two quarters ahead by more than 0.5 on average.

EXERCISE 79 (GE.37) (May-2021)

Data on the U.S. gasoline market for the years 1960-1995 are available on the following variables:

• lnGCt : logarithm of gasoline consumption per capita in year t,

• lnGPt : logarithm of gasoline price index,

• lnDIt : logarithm of per capita disposable income,

• lnPPTt : logarithm of price index for public transportation,

• lnPNCt : logarithm of price index for new cars,

• lnPUCt : logarithm of price index for used cars.

In order to analyse the factors that affect gasoline consumption the following model is estimated
by OLS:

$$\widehat{\ln GC}_t = \underset{(0.724)}{-13.679} - \underset{(0.027)}{0.082}\, \ln GP_t + \underset{(0.081)}{1.523}\, \ln DI_t - \underset{(0.031)}{0.195}\, \ln PPT_t \qquad (1)$$

T = 36   R² = 0.961   DW = 0.640   BG(1) = 15.54

(standard errors in parentheses)

with the residuals in Figure 35:

Figure 35: OLS residuals (residual plotted against time, 1960–1995)

a) What can you say about the fulfilment of the basic hypotheses of the disturbances? Use
the residuals in Figure 35 and a formal test to justify your answer.

b) If the disturbances followed an AR(1) process, describe an asymptotically efficient estimator of the model.
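One asymptotically efficient option for b) is feasible GLS: estimate ρ consistently (for example from the OLS residuals, or as 1 − DW/2), transform every variable, the constant included, and apply OLS to the transformed model. The Prais–Winsten transformation keeps the first observation, scaled by √(1 − ρ²); Cochrane–Orcutt drops it, which does not change the asymptotic distribution. A minimal sketch of the transformation, assuming |ρ| < 1; the function name is illustrative.

```python
import math


def prais_winsten(series, rho):
    """Prais-Winsten transformation for AR(1) disturbances with |rho| < 1.

    The first observation is kept and scaled by sqrt(1 - rho^2);
    the rest are quasi-differenced: x*_t = x_t - rho * x_{t-1}.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    first = math.sqrt(1.0 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


# The constant column becomes sqrt(1 - rho^2) for t = 1 and (1 - rho) afterwards.
const_pw = prais_winsten([1.0] * 4, 0.6)
print(const_pw)
```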

Alternatively, two estimates of a different model are considered. First, the following model is
estimated by OLS:

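$$\widehat{\ln GC}_t = \underset{(1.107)}{-4.645} + \underset{(0.123)}{0.523}\, \ln DI_t - \underset{(0.025)}{0.092}\, \ln PPT_t - \underset{(0.032)}{0.296}\, \ln GP_t + \underset{(0.037)}{0.260}\, \ln GP_{t-1} + \underset{(0.080)}{0.775}\, \ln GC_{t-1} \qquad (2)$$

T = 35   R² = 0.990   DW = 1.734   BG(1) = 1.759
(standard errors in parentheses)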

c) Does the model in (2) imply any improvement over the estimated model in (1)?

d) If the price of public transportation were affected by the factors determining the consump-
tion of gasoline such that the disturbances and lnPPT were contemporaneously correlated,
what would be the properties of the previous OLS estimation?

Finally, consider the following estimated model:

TSLS, using observations 1961–1995 (T = 35)
Dependent variable: lnGC
Instrumented: lnPPT
Instruments: lnPNC lnPUC

          Coefficient   Std. Error   t-ratio   p-value
const     −4.9859       1.2711       −3.923    0.0005
lnDI       0.5604       0.1414        3.963    0.0004
lnPPT     −0.1018       0.03075      −3.311    0.0025
lnGP      −0.2954       0.03206      −9.214    0.0000
lnGP_1     0.2624       0.0377        6.951    0.0000
lnGC_1     0.7560       0.0868        8.709    0.0000          (3)

e) What method of estimation is used in equation (3)? Describe it in detail together with its
properties and the conditions the instruments should satisfy.

f ) Test if the disposable income elasticity is lower than one using a valid inference technique
based on the most efficient estimation. You may need to perform some prior test to select
the estimated model (1), (2) or (3).

EXERCISE 80 (GE.38) (May-2021)

A health insurance company wants to analyse why families decide to visit a doctor. A survey is carried out among 485 household heads, who may or may not have visited a doctor during a given period of time. The variables in the survey are:

• docvisiti : number of doctor visits by family i,

• kidsi : number of children in the household,

• accessi : measure of access to health care,

• statusi : measure of health status (larger positive numbers are associated with poorer
health).

The following model is estimated by OLS:


$$\widehat{docvisit}_i = \underset{(0.418)}{1.607} - \underset{(0.110)}{0.251}\, kids_i + \underset{(0.783)}{1.500}\, access_i + \underset{(0.102)}{0.651}\, status_i \qquad (1)$$

T = 485   R² = 0.092   BP = 413.82   σ̂ = 3.200


(standard errors in parentheses)

a) Considering that BP = 413.82 is the Breusch-Pagan statistic to test if some or all regressors
affect the variance of the disturbances, describe in detail how this value has been obtained
and test the corresponding hypothesis.
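In its original form the Breusch–Pagan statistic is obtained as follows: (i) estimate (1) by OLS and compute σ̃² = Σû_i²/N; (ii) form g_i = û_i²/σ̃²; (iii) regress g_i on a constant and the variables suspected of driving the variance (here kids, access and status); (iv) BP = ESS/2, asymptotically χ²(3) under homoskedasticity with normal disturbances. Koenker's robust variant uses N·R² from the regression of û_i² instead; the text does not say which variant produced 413.82. The sketch below handles a single variance regressor to stay short; the function name and data are illustrative.

```python
def breusch_pagan_one_regressor(resid, z):
    """Original Breusch-Pagan statistic (ESS / 2) with a single variance regressor z."""
    n = len(resid)
    sigma2 = sum(u * u for u in resid) / n          # ML estimate of the variance
    g = [u * u / sigma2 for u in resid]             # scaled squared residuals
    gbar, zbar = sum(g) / n, sum(z) / n
    szz = sum((zi - zbar) ** 2 for zi in z)
    b = sum((zi - zbar) * (gi - gbar) for zi, gi in zip(z, g)) / szz
    ess = b * b * szz                               # explained SS of g regressed on (1, z)
    return ess / 2.0


bp = breusch_pagan_one_regressor([1.0, -1.0, 2.0, -2.0], [0.0, 0.0, 1.0, 1.0])
print(f"BP = {bp:.2f}")
```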

b) If $var(u_i) = \sigma^2/kids_i$, describe how to estimate the model efficiently.

Assuming that $var(u_i) = \sigma^2/kids_i$, the following transformed model has been estimated by OLS:

$$\widehat{docvisit}{}^{*}_i = \underset{(0.353)}{1.666}\, const^{*}_i - \underset{(0.076)}{0.189}\, kids^{*}_i + \underset{(0.654)}{0.853}\, access^{*}_i + \underset{(0.0847)}{0.561}\, status^{*}_i \qquad (2)$$

T = 485   R² = 0.2845   F(4, 481) = 48.857   σ̂ = 3.9196


(standard errors in parentheses)

and the following auxiliary regression with the residuals from (2), û*_i:

$$\widehat{\hat u^{*2}_i} = \underset{(8.738)}{19.749} - \underset{(3.336)}{1.993}\, kids_i + \underset{(3.069)}{8.420}\, status_i \qquad (3)$$

T = 485   R² = 0.016   RSS = 4516647   σ̂ = 96.802
(standard errors in parentheses)

c) Do you think that the assumption $var(u_i) = \sigma^2/kids_i$ is correct? Why?

d) How would you test if the variable access is significant to explain the number of visits to
the doctor? Describe every element of the test.

EXERCISE 81 (GE.39) (July-2021)

A researcher wants to analyse the factors that determine the volume of shares that are traded
in the New York Stock Exchange market (NYSE). For this, monthly data are available from
January 1980 to September 1995 of the following variables:

• volume: NYSE reported share volume, measured in millions of shares.

• sp500: S&P common stock price index, in dollars.

• tbill: U.S. Treasury bill rate (3 months), measured in %.

• cconf: consumer confidence index, measured as 1985=100.

The researcher specifies the following regression model:

$$volume_t = \beta_1 + \beta_2\, sp500_t + \beta_3\, tbill_t + \beta_4\, cconf_t + u_t, \qquad t = 1, \dots, 189,$$

and the OLS estimated model is:

Model 1: OLS, using observations 1980:01–1995:09 (T = 189)
Dependent variable: volume

Coefficient Std. Error t-statistic p value


const −291.133 350.594 −0.8304 0.4074
sp500 11.7000 0.556349 21.03 0.0000
tbill −63.2079 22.7033 −2.784 0.0059
cconf 8.68326 2.22137 3.909 0.0001

Residual sum of squares 66825249 S.E. of regression 601.0138


R2 0.886799 Durbin–Watson 1.035297

Figure 36 shows the time series of residuals from the estimated Model 1:
Figure 36: OLS residuals from Model 1 (regression residuals = observed − fitted volume, plotted against time, 1980–1996)

a) Use Figure 36 to make some comments on the behaviour of the disturbances. Use also the
information given in Model 1 to make a formal test of this behaviour.

The researcher thinks now that the volume of traded shares can also be determined by the
volume traded in the previous period, so the following model is specified:

$$volume_t = \beta_1 + \beta_2\, sp500_t + \beta_3\, tbill_t + \beta_4\, cconf_t + \beta_5\, volume_{t-1} + v_t,$$

and the OLS estimated model is

Model 2: OLS, using observations 1980:02–1995:09 (T = 188)


Dependent variable: volume

Coefficient Std. Error t-statistic p value
const −70.9246 310.190 −0.2286 0.8194
sp500 6.09143 0.899202 6.774 0.0000
tbill −36.3394 20.3537 −1.785 0.0759
cconf 4.03150 2.05414 1.963 0.0512
volume_1 0.480018 0.0644920 7.443 0.0000

Sum of squared residuals 51274663 R2 0.912468


ρ̂ −0.224805 Durbin-Watson 2.449365

The researcher estimates also by OLS the following auxiliary regression:



$$\hat v_t = 172{,}51 - 0{,}42\, \hat v_{t-1} + 0{,}32\, \hat v_{t-2} + 0{,}34\, \hat v_{t-3} - 1{,}67\, sp500_t + 6{,}67\, tbill_t - 2{,}39\, cconf_t + 0{,}14\, volume_{t-1}$$

R² = 0,2735

b) Use the information provided here to test the fulfilment of the basic hypotheses in Model 2.

c) Given your answer in question b), what are the properties (asymptotic and in finite samples)
of the estimator used in Model 2? Justify your answer.

A second researcher estimates by OLS the following alternative model (Model 3):
$$\widehat{volume}_t = \underset{(272.602)}{202.534} + \underset{(0.8986)}{2.5702}\, sp500_t - \underset{(18.1205)}{23.7499}\, tbill_t + \underset{(1.8539)}{0.1725}\, cconf_t + \underset{(0.0731)}{0.1533}\, volume_{t-1} + \underset{(0.0666)}{0.4386}\, volume_{t-2} + \underset{(0.0732)}{0.1916}\, volume_{t-3}$$

T = 186   R² = 0.9343   DW = 1.9621   BG(3) = 6.0906
(standard errors in parentheses)

d) Do you think that Model 3 shows some improvement over Model 2? Justify your answer.

In order to carry out a more in-depth analysis, this second researcher re-estimates the model by IV, using the lags of sp500, tbill and cconf as instruments for volume_{t−1}, volume_{t−2} and volume_{t−3}. The results are:
$$\widehat{volume}_t = \underset{(2092.93)}{-595.167} + \underset{(14.5026)}{14.0904}\, sp500_t - \underset{(61.9663)}{68.5734}\, tbill_t + \underset{(21.4592)}{12.1332}\, cconf_t + \underset{(6.3105)}{0.8815}\, volume_{t-1} - \underset{(5.3721)}{1.1304}\, volume_{t-2} + \underset{(2.1647)}{0.0442}\, volume_{t-3}$$

T = 186   R² = 0.7251   DW = 2.4107
(standard errors in parentheses)

e) Do you prefer this model or the estimated model selected in question d)? Justify your an-
swer.

EXERCISE 82 (GE.40) (July-2021)

The goal of this exercise is to analyse the factors that affect the value of the mortgages requested from banks. Information is available on 1971 mortgages for the following variables:

H_i: value of the mortgage, in thousands of dollars.
R_i: applicant’s monthly income, in thousands of dollars.
P_i: price of the house to be purchased, in thousands of dollars.
B_i: dummy variable, =1 if the applicant is African American, =0 otherwise.
S_i: dummy variable, =1 if the applicant is Hispanic American, =0 otherwise.
W_i: dummy variable, =1 if the applicant is Caucasian, =0 otherwise.
A_i: dummy variable, =1 if the applicant is a freelancer (self-employed), =0 otherwise.

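A first researcher estimates the model by OLS, with the following results:

$$\hat H_i = \underset{\substack{(1{,}9158)\\ \{5{,}44623\}}}{35{,}6272} + \underset{\substack{(0{,}2316)\\ \{0{,}6531\}}}{1{,}9275}\, R_i + \underset{\substack{(0{,}0101)\\ \{0{,}04237\}}}{0{,}4725}\, P_i + \underset{\substack{(3{,}3148)\\ \{2{,}5515\}}}{9{,}2429}\, B_i - \underset{\substack{(4{,}3467)\\ \{2{,}7960\}}}{10{,}8113}\, S_i + \underset{\substack{(2{,}9632)\\ \{3{,}6224\}}}{2{,}9609}\, A_i \qquad (1)$$

R² = 0,7091   RSS = 3739357

(standard errors in parentheses)
{robust (White) standard errors in braces}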

Figure 37 shows a graph of the OLS residuals û_i against P_i:

Figure 37: OLS residuals against P_i (residual on the vertical axis, P_i from 0 to 1400 on the horizontal axis)

The following regression is also estimated by OLS:

$$\widehat{\hat u_i^{2}} = \underset{(401{,}736)}{-5672{,}93} + \underset{(1{,}71005)}{38{,}4961}\, P_i \qquad (2)$$

R² = 0,204694   RSS = 187488005014

(standard errors in parentheses)

a) What can you say about the fulfilment of the basic hypotheses? Base your answer on all
the information provided.

b) What are the implications of your answer to the previous question on the properties of the
OLS estimator? And on the inference?

c) Taking into account your answer in question a), obtain step by step the variance-covariance matrix of β̂_OLS. Is β̂_OLS consistent? Prove it.
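With nonstochastic regressors and heteroskedastic disturbances, Var(β̂_OLS) = (X'X)⁻¹X'ΩX(X'X)⁻¹ with Ω = diag(σ₁², ..., σ_N²); White's HC0 estimator replaces each σ_i² by û_i², which is the kind of robust standard error reported in braces in (1). For a regression on a constant and one regressor the slope element reduces to Σ(x_i − x̄)²û_i² / [Σ(x_i − x̄)²]². A minimal sketch with made-up data; it illustrates the formula, not the mortgage regression itself, and the function name is illustrative.

```python
def ols_slope_with_hc0(x, y):
    """OLS slope of y on (1, x) and its White HC0 (heteroskedasticity-robust) variance."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dev = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dev)
    b = sum(d * (yi - ybar) for d, yi in zip(dev, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Slope element of the sandwich (X'X)^{-1} X' diag(u^2) X (X'X)^{-1}
    meat = sum(d * d * u * u for d, u in zip(dev, resid))
    return b, meat / sxx ** 2


b_hat, var_hc0 = ols_slope_with_hc0([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 5.0])
print(f"slope = {b_hat:.3f}, HC0 variance = {var_hc0:.4f}")
```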

A second researcher estimates by OLS the following model:

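$$\hat H^{*}_i = \underset{(0{,}982573)}{7{,}15520}\, const^{*}_i + \underset{(0{,}176878)}{0{,}520844}\, R^{*}_i + \underset{(0{,}0078919)}{0{,}706845}\, P^{*}_i + \underset{(1{,}64088)}{9{,}97720}\, B^{*}_i + \underset{(1{,}77960)}{7{,}02707}\, S^{*}_i + \underset{(1{,}29349)}{1{,}55189}\, A^{*}_i \qquad (3)$$

R² = 0,653398   RSS = 0,446520
(standard errors in parentheses)

where $H^{*}_i = H_i/P_i^{3/2}$, $const^{*}_i = 1/P_i^{3/2}$, $R^{*}_i = R_i/P_i^{3/2}$, $P^{*}_i = P_i/P_i^{3/2}$, $B^{*}_i = B_i/P_i^{3/2}$, $S^{*}_i = S_i/P_i^{3/2}$, $A^{*}_i = A_i/P_i^{3/2}$.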

In addition, the OLS residuals of this transformed model (û∗i ) are used in the following
OLS regression:

$$\widehat{\hat u^{*2}_i} = \underset{(0{,}0000619403)}{0{,}000374784} - \underset{(0{,}000000263659)}{0{,}00000075384}\, P_i \qquad (4)$$

R² = 0,004135   RSS = 0,004457

(standard errors in parentheses)

d) Why do you think that the researcher is using this method of estimation? Do you think
that the researcher has achieved his/her goal? Justify your answer.

e) Test if the monthly income has a positive effect on the value of the mortgage. Justify your
selection of the test statistic.

COMPUTER EXERCISES

These computer exercises aim to develop the following competences:

A. Learn the importance of the assumptions underlying a basic econometric model in order to
be able to propose and use more realistic assumptions.

B. Distinguish among different methods of estimation and evaluate their use according to the
economic variables of interest in order to obtain reliable results.

C. Use diverse statistical sources and acquire econometric software skills to analyse relationships
among economic variables.

D. Interpret adequately the obtained results to prepare economic reports.

COMPUTER EXERCISE 1

A database is available with information about the selling price and certain characteristics of 224 houses in two residential areas of Orange County, California (USA): Dove Canyon and Coto de Caza¹³. Dove Canyon is a neighbourhood built around a golf course, with single-family tract homes on relatively small lots. Coto de Caza is a more upscale and more rural area, with large custom-built homes. The variables considered are:

salepric = sale price in thousands of dollars


sqft = living area in square feet
age = age of house in years
city = 1 for Coto de Caza and 0 for Dove Canyon

To access these data, run GRETL and go to File → Open data → Sample file, then choose Ramanathan and the file data7-24.

a) a.1) Specify a model to analyse if the size and age of the building are factors relevant to
explain the price of the house. Estimate the model by Ordinary Least Squares.
a.2) Comment on the obtained results in terms of goodness of fit, significance and signs
of the estimated coefficients.

b) Obtain the graph of the OLS residuals in this first specification. What does this graph suggest? Do you think that there is a misspecification problem? Why?

c) Introduce the variable city as explanatory variable in the model. Give an interpretation
of the accompanying coefficient.
13
Source: Ramanathan, Ramu (1992) Introductory Econometrics with Applications

d) Estimate this second specification by OLS. Comment on the results and compare them
with those obtained in a). Is this specification better than the previous one? Why?

Consider hereafter this second specification of the model.

e) Obtain the following graphs:

– Graph of the OLS residuals.
– Graph of the OLS residuals against the age variable.
– Graph of the OLS residuals against the sqft variable.

f) Analyse, with reasoning, the information provided by the graphs.

g) Perform some heteroscedasticity test(s). Explain the testing procedure and comment on the results obtained.

i) Estimate the model by Generalized or Weighted Least Squares using as weighting variable
the inverse of the squared living area. Analyse the results.

j) What do we mean by weighted data and original data? Why is the inverse of sqft² used as the weighting variable? Explain.

k) Propose another specification for the variances of the disturbances that includes both age and sqft. Using this functional form for the variance, estimate the model by Feasible Generalized Least Squares.

l) Write down a section summarizing all the results obtained throughout the exercise. Which
results are more reliable? Why?

COMPUTER EXERCISE 2

A database of 51 U.S. states is available with observations on aggregate expenditure on domestic travel (EXPTRAV) and aggregate personal income (INCOME) in 1993¹⁴. The variables considered are:

EXPTRAV = Travel expenditures in billions of dollars,


(Range 0.708 - 42.48).
INCOME = Personal Income in billions of dollars,
(Range 9.3 - 683.5).
POP = Population, in millions,
(Range 0.47 - 31.217).
14
Source: Statistical Abstract of U.S. (1995), in Ramanathan, Ramu (2002) Introductory Econometrics with
Applications.

To access these data:
GRETL → In File → Open data → sample file
Then select Ramanathan, file data8-2.gdt

a) Specify a model to analyse if the personal income explains the expenditure on domestic
travel. Interpret the coefficients.

b) Estimate the model by Ordinary Least Squares. Comment on the results in terms of
goodness of fit, significance and expected signs of the estimated coefficients.

The states with larger population are likely to show higher variability in travel expenditure than
states with smaller number of citizens. Therefore, it can be expected that the variance of the
disturbances grows with population. To analyse this possibility we have data on the population
of the different states. Then,

c) Obtain the following plots:

– OLS residuals.
– Plot of OLS residuals against the variable POP.

d) Analyse the information provided by these plots.

e) Perform the Goldfeld and Quandt test assuming that the variance of the disturbances increases with the variable POP. Explain the testing procedure and comment on the results obtained.
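The Goldfeld–Quandt procedure in e) can be sketched as: (i) sort the observations by POP; (ii) omit c central observations and split the rest into a low-POP and a high-POP group of m observations each; (iii) estimate the model by OLS in each group; (iv) compute F = [RSS_high/(m − k)] / [RSS_low/(m − k)], which under homoskedasticity and normal disturbances follows F(m − k, m − k), rejecting for large values. The number of omitted observations is a choice left to the analyst. A minimal sketch for one regressor (k = 2) with made-up data; the function names are illustrative.

```python
def ols_rss(x, y):
    """Residual sum of squares from the OLS regression of y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def goldfeld_quandt(y, x, z, n_omit):
    """Goldfeld-Quandt F statistic when the variance is suspected to increase with z.

    Returns (F, df): F = [RSS_high / df] / [RSS_low / df] with df = m - 2,
    where m is the size of each tail group.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: z[i])
    m = (n - n_omit) // 2
    low, high = order[:m], order[n - m:]
    rss_low = ols_rss([x[i] for i in low], [y[i] for i in low])
    rss_high = ols_rss([x[i] for i in high], [y[i] for i in high])
    df = m - 2
    return (rss_high / df) / (rss_low / df), df


# Made-up data: 7 observations, the central one (by z) is omitted
z = [1, 2, 3, 4, 5, 6, 7]
x = [0.0, 1.0, 2.0, 9.0, 0.0, 1.0, 2.0]
y = [0.0, 2.0, 1.0, 100.0, 0.0, 4.0, 2.0]
f_stat, df = goldfeld_quandt(y, x, z, n_omit=1)
print(f"F({df}, {df}) = {f_stat:.2f}")  # F(1, 1) = 4.00
```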

f) Perform the Breusch and Pagan test under the assumption that the variance of the disturbances depends on the variable POP. Explain the testing procedure and comment on the results obtained.

g) Given the evidence found in e) and f), comment on the reliability of the results obtained
in b). Using β̂OLS , could an increase of one billion dollars in personal income produce an
increment of one million dollars in the aggregate expenditure on domestic travel?

h) Estimate the model by Generalized or Weighted Least Squares using as weights the inverse
of the squared population. Plot the residuals against the explanatory variable. Analyse
the results.

i) What do we mean by weighted data and by original data? Why has the inverse of the variable POP² been used as the weighting variable? Explain with reasoning.

j) Considering the mathematical expression for the variance considered in i):

j.1) Write down the corresponding transformed model. Estimate the proposed transformed model efficiently. Compare these results with those obtained in h): can you draw any conclusion?
j.2) Plot the residuals from the estimation of the transformed model against their corresponding exogenous variable. Interpret this plot and compare it with that in h). How do you interpret what you see?

k) Specify a model for the relationship between the variables EXPTRAV and INCOME under the assumption that $\sigma_i^2 = \alpha_1 + \alpha_2 POP_i$. Estimate the corresponding transformed model, pointing out clearly all the steps needed to do so.

l) Write down a concluding section where all the results obtained throughout the exercise
are summarized.

COMPUTER EXERCISE 3

The following model is proposed to analyse the factors that affect the salary of married women
in U.S.A.:

$$lwage_i = \beta_1 + \beta_2\, educ_i + \beta_3\, huswage_i + \beta_4\, exper_i + \beta_5\, expersq_i + u_i \qquad (1)$$

where

• lwagei : logarithm of the salary per hour (in dollars) of woman i.

• huswagei : salary per hour (in dollars) of woman i’s husband.

• educi : years of schooling of woman i.

• experi : labour market experience of woman i.

• expersqi : squared labour market experience of woman i.

This data set is accessible by opening gretl → menu → File → Sample file → Wooldridge
→ (by alphabetical order) mroz

The data set contains 753 observations for 1975. The first 428 are from working women and the
rest are for unemployed women. Select those women that are employed and thus have a salary.
To restrict the sample: sample → set range → Start: 1, End: 428.

a) Obtain the OLS estimation of Model (1).

b) Comment on the results, in particular the goodness of fit, the estimated coefficients and their significance.

c) Is there evidence of a quadratic relationship between lwage and exper? Use a formal test
to support your conclusion.

d) The variable educ is considered stochastic and correlated with the disturbances in Model
(1). Explain the consequences of this correlation for the results obtained so far.

e) Now estimate Model (1) by Instrumental Variables, using the father's years of schooling
(fatheduc) as an instrument for educ. Are the results very different from those
obtained by OLS?
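In the just-identified case of e), the IV estimator solves $(Z'X)\hat\beta_{IV} = Z'y$, where Z is X with the educ column replaced by fatheduc. The pure-Python sketch below shows only that computation (no standard errors); the helper names are hypothetical and in practice Gretl's two-stage least squares option does this for you.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def iv_estimator(y, X, Z):
    """Just-identified IV: beta = (Z'X)^{-1} Z'y. X and Z are n x k lists of rows."""
    Zt = transpose(Z)
    ZtX = matmul(Zt, X)                                          # k x k
    Zty = [sum(z * yi for z, yi in zip(col, y)) for col in Zt]   # k x 1
    return solve(ZtX, Zty)
```

The estimator exists only if $Z'X$ is invertible, which requires the instrument to be correlated with the regressor it replaces.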

f) Write down every element of the IV estimator: the matrix of instruments Z and the data
matrix X. What are the dimensions of these matrices?

g) An additional instrument for educ is now available: the mother's years of schooling
(variable motheduc). Estimate Model (1) by TSLS (Two-Stage Least Squares) using all
available instruments. Compare the results with those obtained in e).
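The two stages behind TSLS can be sketched as below, assuming the instrument matrix W holds the constant, huswage, exper, expersq, fatheduc and motheduc, and that educ is column 1 of X. Names are hypothetical. Only the coefficients are computed: standard errors taken from a manual second-stage OLS are not the correct TSLS ones, because its residuals use the fitted values instead of educ.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def ols(y, X):
    """OLS coefficients via the normal equations (X'X) b = X'y."""
    Xt = transpose(X)
    Xty = [sum(a * b for a, b in zip(col, y)) for col in Xt]
    return solve(matmul(Xt, X), Xty)


def tsls(y, X, W, endog):
    """Two-stage least squares, coefficients only.

    X: n x k regressors (constant included); column `endog` is the endogenous one.
    W: n x m instruments (constant, exogenous regressors, excluded instruments), m >= k.
    """
    # Stage 1: regress the endogenous regressor on every instrument.
    xe = [row[endog] for row in X]
    g = ols(xe, W)
    xhat = [sum(gi * wi for gi, wi in zip(g, row)) for row in W]
    # Stage 2: OLS of y on X with the endogenous column replaced by its fitted values.
    Xhat = [row[:endog] + [xh] + row[endog + 1:] for row, xh in zip(X, xhat)]
    return ols(y, Xhat)
```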

h) Obtain the correlations between the instruments and the instrumented variable educ.
What do these correlations imply for the validity of the instruments?
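The correlations in h) are ordinary sample (Pearson) correlations, for example between educ and fatheduc and between educ and motheduc. A minimal sketch, with a hypothetical function name:

```python
from math import sqrt


def pearson(a, b):
    """Sample correlation coefficient between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)
```

A correlation close to zero points to a weak instrument. Note that these correlations speak only to instrument relevance; the other requirement, no correlation with the disturbance u, cannot be checked this way because u is unobserved.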

i) Regress the variable educ on all possible instruments, including a constant:

$$\text{educ}_i = \alpha_1 + \alpha_2\,\text{huswage}_i + \alpha_3\,\text{exper}_i + \alpha_4\,\text{expersq}_i + \alpha_5\,\text{fatheduc}_i + \alpha_6\,\text{motheduc}_i + v_i$$

Save the fitted values $\widehat{\text{educ}}_i$, $i = 1, \ldots, 428$, and use this variable as an instrument for educ.
Compare these results with those obtained in g). Are the variables fatheduc and motheduc
significant?

j) Using the results in e), carry out the Hausman test. Taking into account the result of the
test, how would you estimate the coefficients in Model (1)?
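One textbook form of the Hausman test compares the IV and OLS estimates of a single coefficient, here that of educ. The sketch below assumes that contrast form, with hypothetical input values; the version Gretl reports alongside IV results may be computed differently (for example as a regression-based test), so the numbers need not coincide.

```python
def hausman_single(b_iv, b_ols, var_iv, var_ols, crit=3.841):
    """Contrast-form Hausman statistic for one coefficient.

    H = (b_iv - b_ols)^2 / (var_iv - var_ols), approximately chi-square(1) under
    H0 (the regressor is exogenous, so OLS is consistent and efficient).
    crit is the 5% critical value of a chi-square(1). Returns (H, reject_H0).
    """
    dv = var_iv - var_ols
    if dv <= 0:
        raise ValueError("var_iv - var_ols must be positive for this form of the test")
    h = (b_iv - b_ols) ** 2 / dv
    return h, h > crit
```

Rejecting H0 favours IV; not rejecting it suggests OLS is adequate and more efficient.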

k) Test whether the variable exper is significant. What is the estimated percentage change in
the average wage when labour market experience increases by one year, all other factors
remaining constant? Is this change the same for every woman in the sample?
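Because exper enters Model (1) in levels and squared, the effect of one more year depends on the current value of exper. The sketch below computes it from $\hat\beta_4$ and $\hat\beta_5$; the function name and the coefficient values used when checking it are hypothetical.

```python
def pct_change_exper(b4, b5, exper):
    """Approximate % change in the wage from one more year of experience in Model (1).

    Returns (marginal, discrete), both 100 x change in lwage (approximate percentages):
      marginal = 100 * (b4 + 2*b5*exper)        derivative of lwage w.r.t. exper
      discrete = 100 * (b4 + b5*(2*exper + 1))  exact change in lwage, exper -> exper + 1
    """
    marginal = 100.0 * (b4 + 2.0 * b5 * exper)
    discrete = 100.0 * (b4 + b5 * (2.0 * exper + 1.0))
    return marginal, discrete
```

With illustrative values $\hat\beta_4 = 0.04$ and $\hat\beta_5 = -0.0008$, the marginal effect is about 2.4% at exper = 10 but only 0.8% at exper = 20, so the change differs across women.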

COMPUTER EXERCISE 4

We use the sample file smoke from Wooldridge's textbook, available in Gretl. The database consists
of observations on males resident in different U.S. states for the year 1979. The variables
included in the file are:

• income: annual household income in thousands of dollars.

• lincome: logarithm of variable income.

• cigs: no. of cigarettes smoked per day.

• educ: years of schooling.

• age: age in years.

• agesq: squared age.

• cigpric: state cigarette price, cents per pack.

• lcigpric: logarithm of cigpric.

• restaurn: =1 if state has restaurant smoking restrictions, =0 otherwise

• white: =1 if white, =0 otherwise

This data set can be opened in gretl via File → Open data → Sample file → Wooldridge
→ smoke[15] (files are listed in alphabetical order).

Consider the model specification (1):

$$\text{lincome}_i = \beta_1 + \beta_2\,\text{cigs}_i + \beta_3\,\text{educ}_i + \beta_4\,\text{age}_i + \beta_5\,\text{agesq}_i + u_i \tag{1}$$

a) Show the OLS estimation results for model (1).

b) Comment on the results, particularly on the goodness of fit, the estimated
coefficients and their significance.

c) Is there any evidence of a quadratic relationship between the variables lincome and age
(holding all other variables constant)? Show the results of any test performed to support
your conclusions.

d) It is believed that cigarette consumption can be jointly determined with income, such that
cigs is a stochastic regressor correlated with the disturbance term in model (1). Explain the
consequences of this correlation on the previous results obtained by OLS.

e) Estimate model (1) by Instrumental Variables, using the variable restaurn as an instrument
for cigs, and show the results. Are the results very different from those obtained by OLS?
Comment on these results.

f) Write down the matrix of instruments Z and the matrix of explanatory variables X. Do not
use the numbers but write down the name of the variables in the columns. Write down also
the dimension of the matrices.

g) Write down the expression of the IV estimator, including the elements of Z′X and Z′Y
(using sums) and their dimensions. What characteristics should Z have for this estimator
to be feasible? What characteristics should Z have to guarantee that the IV estimator has
the desirable properties and that inference is valid?

h) The variable lcigpric is now considered as an additional instrument for cigs. Estimate model
(1) by two-stage least squares using all instruments. Display the results obtained in Gretl
and compare them with those obtained in e).

i) Calculate the correlations between the instruments and the variable cigs. What do these
correlations indicate about the adequacy of these instruments?

j) Regress the variable cigs on all possible instruments, including the constant:

$$\text{cigs}_i = \alpha_1 + \alpha_2\,\text{educ}_i + \alpha_3\,\text{age}_i + \alpha_4\,\text{age}_i^2 + \alpha_5\,\text{lcigpric}_i + \alpha_6\,\text{restaurn}_i + u_i \tag{2}$$

Store the fitted values $\widehat{\text{cigs}}_i$, $i = 1, \ldots, 879$, of this regression and use this variable as an
instrument for cigs. Do you obtain the same results as in h)? Why? Are the variables
lcigpric and restaurn significant?
[15] Source: Wooldridge, J. M. (2003): Introductory Econometrics, file smoke.gdt.

k) Perform the Hausman test on the results of e). According to the results of the test, how
would you estimate the coefficients of model (1)?

l) Test the significance of variable age. What is the estimated change on lincome when the
individual is one year older and all other characteristics remain constant? Is this change the
same across all individuals in the sample?

COMPUTER EXERCISE 5

We want to estimate a Cobb-Douglas production function for the U.S. farming sector.
For that purpose, a database[16] of yearly data for the period 1948-1993 is available on the
following indices (1982 = 100):

• $Y_t$: farm output (range 51-116)

• $L_t$: farm labour (range 81-278)

• $EX_t$: farm real estate (range 89-102), a size indicator variable

• $K_t$: durable equipment (range 38-102), the machinery stock

The following model is specified:

$$\ln Y_t = \beta_1 + \beta_2 \ln L_t + \beta_3 \ln EX_t + \beta_4 \ln K_t + u_t \tag{1}$$

a) Explain why the variables have been considered in logarithms.

b) What does it mean for an index to have base 1982 = 100? What are the values of the
variables in that year?

c) Estimate by Ordinary Least Squares the log-log (or double logarithmic) specification in
(1). Interpret the results.

d) Analyse the time series of the residuals. Interpret the graph and comment on any evidence
of potential problems.

e) Perform the Durbin-Watson test. Explain in detail.
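The Durbin-Watson statistic is computed from the OLS residuals as shown in the minimal sketch below (hypothetical function name). Since $d \approx 2(1-\hat\rho)$, values near 2 suggest no first-order autocorrelation and values near 0 suggest positive autocorrelation; the formal decision compares d with the tabulated bounds $d_L$ and $d_U$ for n = 46 and three regressors besides the constant.

```python
def durbin_watson(e):
    """d = sum_{t=2}^{n} (e_t - e_{t-1})^2 / sum_{t=1}^{n} e_t^2 for OLS residuals e."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den
```

The tabulated bounds assume non-stochastic regressors, so the test is not appropriate when a lagged dependent variable is among the regressors.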

f) Perform the Breusch-Godfrey test to detect a possible AR(1) or MA(1) process in the
disturbances of the model.

g) How would you modify the t statistic for individual significance if the OLS estimator is
still used to estimate the coefficients? Estimate the model by OLS using the robust
standard errors option.
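With autocorrelated disturbances, the relevant robust covariance matrix is of the HAC (Newey-West) type: the OLS coefficients stay the same and only their standard errors change. The sketch below computes the sandwich $(X'X)^{-1}\hat S(X'X)^{-1}$ with Bartlett weights $w_l = 1 - l/(L+1)$. The lag truncation L is a user choice here, the names are hypothetical, and Gretl's default lag length and any small-sample scaling may differ, so its output need not match exactly. The robust t statistic is then $\hat\beta_j/\sqrt{V_{jj}}$.

```python
def inverse(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        piv = m[c][c]
        m[c] = [v / piv for v in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def hac_cov(X, e, L):
    """Newey-West (Bartlett-weighted) HAC covariance matrix of the OLS coefficients.

    X: n x k regressors as lists of rows, e: OLS residuals, L: number of lags.
    """
    n, k = len(X), len(X[0])
    A = inverse([[sum(X[t][i] * X[t][j] for t in range(n)) for j in range(k)]
                 for i in range(k)])                      # (X'X)^{-1}
    S = [[0.0] * k for _ in range(k)]
    for l in range(L + 1):
        w = 1.0 if l == 0 else 1.0 - l / (L + 1.0)        # Bartlett weight
        for t in range(l, n):
            g = e[t] * e[t - l]
            for i in range(k):
                for j in range(k):
                    if l == 0:
                        S[i][j] += g * X[t][i] * X[t][j]
                    else:  # add x_t x_{t-l}' and its transpose
                        S[i][j] += w * g * (X[t][i] * X[t - l][j] + X[t - l][i] * X[t][j])
    AS = [[sum(A[i][m] * S[m][j] for m in range(k)) for j in range(k)] for i in range(k)]
    return [[sum(AS[i][m] * A[m][j] for m in range(k)) for j in range(k)] for i in range(k)]
```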

h) Estimate the production function again using the Cochrane-Orcutt (CO) method and the
Hildreth-Lu (HL) grid search method.
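Both methods estimate the quasi-differenced model $y_t - \rho y_{t-1} = \beta_1(1-\rho) + \beta_2(x_t - \rho x_{t-1}) + \varepsilon_t$; they differ in how $\rho$ is chosen. The sketch below uses a single regressor for brevity (an assumption: in (1), each of $\ln L_t$, $\ln EX_t$ and $\ln K_t$ would be quasi-differenced in the same way), drops the first observation, and uses a hypothetical HL grid of step 0.01. Gretl's own grid, convergence criterion and handling of the first observation may differ.

```python
def ols_line(y, x):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(y)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def quasi_diff_fit(y, x, rho):
    """OLS on y_t - rho*y_{t-1} = b1*(1 - rho) + b2*(x_t - rho*x_{t-1}) + e_t, t = 2..n."""
    ys = [y[t] - rho * y[t - 1] for t in range(1, len(y))]
    xs = [x[t] - rho * x[t - 1] for t in range(1, len(x))]
    c, b2 = ols_line(ys, xs)
    ssr = sum((a - c - b2 * b) ** 2 for a, b in zip(ys, xs))
    return c / (1.0 - rho), b2, ssr   # recover b1 from the intercept b1*(1 - rho)


def hildreth_lu(y, x, step=0.01):
    """Grid search over rho in [-0.99, 0.99]; keeps the rho with the smallest SSR."""
    n_steps = int(round(1.98 / step))
    grid = [round(-0.99 + step * k, 10) for k in range(n_steps + 1)]
    rho = min(grid, key=lambda r: quasi_diff_fit(y, x, r)[2])
    b1, b2, _ = quasi_diff_fit(y, x, rho)
    return rho, b1, b2


def cochrane_orcutt(y, x, tol=1e-8, max_iter=100):
    """Iterate residuals -> rho_hat -> quasi-differenced OLS until rho_hat settles."""
    b1, b2 = ols_line(y, x)
    rho = 0.0
    for _ in range(max_iter):
        e = [a - b1 - b2 * b for a, b in zip(y, x)]   # residuals of the original model
        new_rho = (sum(e[t] * e[t - 1] for t in range(1, len(e)))
                   / sum(v * v for v in e[:-1]))
        b1, b2, _ = quasi_diff_fit(y, x, new_rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return rho, b1, b2
```

CO may converge to a local minimum of the sum of squared residuals, whereas the HL grid search examines the whole range of $\rho$; this is one reason to compare both.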
[16] Ramanathan, R. (2002), Introductory Econometrics with Applications, data 9-5.gdt

i) Comment on the results obtained with each estimation method (OLS, CO and HL).

j) Using the Hildreth-Lu estimation results, test the null hypothesis $H_0: \beta_3 = 2\beta_4$. Explain
all the elements of the test.
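Under $H_0$ the linear combination $\beta_3 - 2\beta_4$ is zero, and its estimated variance is $\widehat{Var}(\hat\beta_3) + 4\,\widehat{Var}(\hat\beta_4) - 4\,\widehat{Cov}(\hat\beta_3, \hat\beta_4)$, all taken from the estimated covariance matrix of the HL estimates. A minimal sketch with a hypothetical function name; since HL is a feasible GLS estimator, the statistic is only asymptotically valid.

```python
from math import sqrt


def t_beta3_eq_2beta4(b3, b4, var3, var4, cov34):
    """t statistic for H0: beta3 = 2*beta4, i.e. H0: beta3 - 2*beta4 = 0.

    Var(b3 - 2*b4) = Var(b3) + 4*Var(b4) - 4*Cov(b3, b4).
    """
    return (b3 - 2.0 * b4) / sqrt(var3 + 4.0 * var4 - 4.0 * cov34)
```

Its square is the Wald (or F) statistic for this single restriction; omitting the covariance term is a common mistake.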

COMPUTER EXERCISE 6

To analyse the effects of the pronatalist policy of the U.S. government in the 20th century,
yearly data for the period 1913-1984 are available on the following variables[17]:

• gfr: births per 1,000 women aged 15-44.

• pe: real value of the personal exemption, in dollars.

• year: 1913 to 1984.

• pill: =1 if year >= 1963 (year of introduction of the contraceptive pill), =0 otherwise.

• ww2: =1 from 1941 to 1945 (Second World War period), =0 otherwise.

The following naive model is specified first:

$$gfr_t = \beta_1 + \beta_2\, pe_t + u_t \tag{1}$$

1) Give the data a time series structure by clicking on the Gretl main window in
Data → Dataset structure → . . .

2) Estimate by OLS the model proposed in (1). Give an interpretation of the results.

3) Obtain the time series plots of the variable $gfr_t$ and of the residuals. Comment on them
in light of the $R^2$ obtained in the previous question.

4) Re-estimate the model adding the regressors $pill_t$ and $ww2_t$. What phenomena are they
intended to capture? Does their inclusion have any effect on the previous plots?

5) Test for first-order autocorrelation by means of the Durbin-Watson
statistic.

6) Bearing in mind all the above information and results, test the individual significance of
the variable $pe_t$.

7) Estimate the model by the Cochrane-Orcutt method. Comment on the results, performing
any test you consider necessary.

8) Estimate the model by the Hildreth-Lu method. Is there any significant difference? Why?

9) Add the one-period lagged variable $gfr_{t-1}$ as a regressor and estimate the new model by OLS.
Obtain the plot of the residuals and comment on it. Perform a test for first-order autocorrelation.
According to the result of the test, what would you say about the results of the analysis? Do
you think it is necessary to use another estimator? Why?
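With $gfr_{t-1}$ among the regressors, the Durbin-Watson statistic is biased towards 2, so it should not be used as is. One classic alternative is Durbin's h, $h = \hat\rho\sqrt{n/(1 - n\,\widehat{Var}(\hat\beta_{lag}))}$ with $\hat\rho \approx 1 - d/2$, which is approximately N(0,1) under no autocorrelation; the Breusch-Godfrey test is another. A minimal sketch with hypothetical names:

```python
from math import sqrt


def durbin_h(d, n, se_lag):
    """Durbin's h for a model with the lagged dependent variable as a regressor.

    d: Durbin-Watson statistic, n: number of observations used,
    se_lag: standard error of the coefficient on gfr_{t-1}.
    Approximately N(0, 1) under H0: rho = 0.
    """
    denom = 1.0 - n * se_lag ** 2
    if denom <= 0:
        raise ValueError("h is not defined here; use e.g. the Breusch-Godfrey test")
    return (1.0 - d / 2.0) * sqrt(n / denom)
```

For example, d = 1.6, n = 50 and a standard error of 0.1 (illustrative numbers) give h = 2.0, which exceeds 1.96 at the 5% level.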
[17] Wooldridge, J.M. (2001), Introductory Econometrics, data fertil3.gdt.

10) Add a time trend $t$ to the model. In view of the residual plot, try including a quadratic
trend, $t^2$.

11) Perform the Durbin-Watson test in this model. Comment on the results.

12) How would you test the individual significance of the explanatory variables in the model? Use
an adequate estimate of the standard errors of the coefficients, given all the information
available so far.

COMPUTER EXERCISE 7

To analyse the effects of the pronatalist policy of the U.S. government in the 20th century,
yearly data for the period 1913-1984 are available on the following variables[18]:

• gfr: births per 1,000 women aged 15-44.

• pe: real value of the personal exemption, in dollars.

• year: 1913 to 1984.

• pill: =1 if year >= 1963 (year of introduction of the contraceptive pill), =0 otherwise.

• ww2: =1 from 1941 to 1945 (Second World War period), =0 otherwise.

The following naive model is specified first:

$$gfr_t = \beta_1 + \beta_2\, pe_t + \beta_3\, pill_t + \beta_4\, ww2_t + u_t \tag{1}$$

1) Give the data a time series structure by clicking on the Gretl main window in
Data → Dataset structure → . . .

2) Estimate by OLS the model proposed in (1). Check for the existence of autocorrelation in
this model.

3) Specify a dynamic model by including as regressors 4 (four) consecutive lags of the endogenous
variable $gfr_t$, that is, add $gfr_{t-1}, \ldots, gfr_{t-4}$ to the list of regressors. Check their joint and
individual significance using valid statistics.
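The joint significance of the four lags can be tested with the usual exclusion F statistic built from the restricted and unrestricted sums of squared residuals. A minimal sketch with a hypothetical function name. With four lags the usable sample is 1917-1984, so n = 68, and the unrestricted model has k = 8 coefficients (constant, $pe_t$, $pill_t$, $ww2_t$ and four lags); the restricted model must be re-estimated on those same 68 observations. With lagged dependent variables the F distribution holds only approximately.

```python
def f_exclusion(ssr_r, ssr_u, q, n, k):
    """F = ((SSR_R - SSR_U) / q) / (SSR_U / (n - k)) for q exclusion restrictions.

    Both models must use the same n observations; k counts the coefficients
    (constant included) of the unrestricted model. Compare with F(q, n - k).
    """
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
```

For instance, made-up values SSR_R = 120 and SSR_U = 100 give F = 3.0, to be compared with the F(4, 60) critical value.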

4) Specify a different dynamic model by including as regressors 4 (four) consecutive lags of the
variable $pe_t$, that is, add $pe_{t-1}, \ldots, pe_{t-4}$ to the list of regressors. Check their joint
and individual significance using valid statistics. Does this specification introduce any other
problem that you can identify?

5) Include all lagged variables considered in questions 3 and 4 above. Then, based on formal
tests and sequentially:

i) Omit the variable $gfr_{t-4}$.


[18] Wooldridge, J.M. (2001), Introductory Econometrics, data fertil3.gdt.

ii) Omit all those variables you find not significant at the 5% significance level, including
lagged and non-lagged variables. You may have to re-estimate the model more than
once.
iii) Save the final model you consider best to the session as an icon and write down its Sample
Regression Function.

Now, consider the following model:

$$gfr_t = \beta_1 + \beta_2\, pe_{t-2} + \beta_3\, pill_t + \beta_4\, ww2_t + u_t \tag{2}$$

6) Test for the presence of autocorrelation in model (2). Instead of adding any lagged
variables, obtain an asymptotically efficient estimator of its parameters. Write down the corresponding
transformed model and the values of all estimated parameters ($\hat\beta_i$ and $\hat\rho$).

7) Try to choose the best specification between those in questions 5 iii) and 6. You may base
your decision on both:

i) Testing the restrictions you judge convenient on the final model of question 5 iii).
ii) Looking carefully at the residual plots of both estimated models.
