PS4 Solution
Econometrics (30413)
Spring 2021
Theory Questions
Question 1
$$y_i = X_i\beta + \varepsilon_i$$
b) Is $\varepsilon_i$ normally distributed?
c) Is $\varepsilon_i$ homoskedastic?
Solution
a) In this model,
Moreover, since y is Bernoulli-distributed,
Therefore,
$$P(y_i = 1 \mid X_i) = E(y_i \mid X_i) = X_i\beta$$
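b) No. Given $X_i$, $\varepsilon_i = y_i - X_i\beta$ can take only two values: $1 - X_i\beta$ (with probability $X_i\beta$) and $-X_i\beta$ (with probability $1 - X_i\beta$). Hence $\varepsilon_i$ is not normally distributed and, as a consequence, $\hat\beta_{OLS}$ is not normally distributed in finite samples, although it is still asymptotically normal.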
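A minimal sketch of this point, assuming a hypothetical value $p = X_i\beta$ strictly between 0 and 1: conditional on $X_i$, $\varepsilon_i$ equals $1 - p$ with probability $p$ and $-p$ with probability $1 - p$. The illustrative function `lpm_error_distribution` enumerates this two-point distribution exactly; a random variable with only two support points cannot be normal, and its variance $p(1 - p)$ already anticipates part c).

```python
def lpm_error_distribution(p):
    """Exact conditional distribution of the LPM error given X_i.

    p is the (hypothetical) value of X_i * beta, assumed to lie in (0, 1).
    Returns the (value, probability) pairs, the mean and the variance.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("X_i * beta must lie strictly between 0 and 1")
    # y_i = 1 with probability p, so eps_i = 1 - p; otherwise eps_i = -p
    support = [(1.0 - p, p), (-p, 1.0 - p)]
    mean = sum(v * w for v, w in support)
    var = sum((v - mean) ** 2 * w for v, w in support)
    return support, mean, var


for p in (0.1, 0.5, 0.9):
    support, mean, var = lpm_error_distribution(p)
    # two support points, zero mean, variance equal to p * (1 - p)
    print(p, len(support), round(mean, 12), round(var, 12), round(p * (1 - p), 12))
```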
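c) $\varepsilon_i$ is heteroskedastic. Indeed,
$$\operatorname{Var}(\varepsilon_i \mid X_i) = (1 - X_i\beta)^2 X_i\beta + (-X_i\beta)^2 (1 - X_i\beta) = X_i\beta(1 - X_i\beta)\left[(1 - X_i\beta) + X_i\beta\right] = X_i\beta(1 - X_i\beta),$$
which varies with $X_i$.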
As a consequence,
$$V(\hat\beta_{OLS} \mid X) \neq \sigma^2 (X'X)^{-1}$$
and OLS is no longer BLUE. Therefore, standard errors must be adjusted for heteroskedasticity (with heteroskedasticity-robust standard errors) in order to test the significance of the parameters.
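The correction can be illustrated with a minimal sketch, restricted for simplicity to a constant plus a single regressor (so that only 2×2 matrices are needed) and using simulated data from a hypothetical LPM with $P(y_i = 1 \mid x_i) = 0.2 + 0.6\,x_i$. It compares the conventional estimate $s^2 (X'X)^{-1}$ with the White (HC0) sandwich estimator $(X'X)^{-1}\left(\sum_i e_i^2 x_i x_i'\right)(X'X)^{-1}$; the function names are illustrative.

```python
import math
import random


def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def ols_with_robust_se(x, y):
    """OLS of y on a constant and one regressor x.

    Returns (coefficients, conventional SEs, HC0 heteroskedasticity-robust SEs).
    """
    n = len(x)
    sx, sxx = sum(x), sum(xi * xi for xi in x)
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    # (X'X)^{-1} for the design matrix with rows (1, x_i)
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

    # Conventional s^2 (X'X)^{-1}: valid only under homoskedasticity
    s2 = sum(e * e for e in resid) / (n - 2)
    se_conv = [math.sqrt(s2 * inv[0][0]), math.sqrt(s2 * inv[1][1])]

    # HC0 "meat": sum over i of e_i^2 * x_i x_i'
    m00 = sum(e * e for e in resid)
    m01 = sum(e * e * xi for e, xi in zip(resid, x))
    m11 = sum(e * e * xi * xi for e, xi in zip(resid, x))
    v_rob = matmul2(matmul2(inv, [[m00, m01], [m01, m11]]), inv)
    se_rob = [math.sqrt(v_rob[0][0]), math.sqrt(v_rob[1][1])]
    return (b0, b1), se_conv, se_rob


# Hypothetical LPM data: P(y = 1 | x) = 0.2 + 0.6 x, with x uniform on (0, 1)
rng = random.Random(0)
x = [rng.random() for _ in range(2000)]
y = [1.0 if rng.random() < 0.2 + 0.6 * xi else 0.0 for xi in x]
coef, se_conv, se_rob = ols_with_robust_se(x, y)
print(coef, se_conv, se_rob)
```

In this sketch only the standard errors change; the OLS coefficients themselves are the same under both variance estimators.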
d) Since the fit is linear, the fitted values $X_i\hat\beta_{OLS}$ are unbounded and can therefore be lower than 0 or greater than 1, in which case they cannot be interpreted as probabilities.
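A minimal illustration with hypothetical, deterministic data ($x = 0, 1, \dots, 20$ and $y = 1$ exactly when $x \geq 10$): the OLS line fitted by the illustrative helper `ols_line` produces fitted values below 0 for the smallest values of $x$ and above 1 for the largest.

```python
def ols_line(x, y):
    """Intercept and slope of the OLS fit of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: the outcome switches from 0 to 1 at x = 10
x = list(range(21))
y = [1.0 if xi >= 10 else 0.0 for xi in x]
a, b = ols_line(x, y)
fitted = [a + b * xi for xi in x]
# Points where the linear "probability" leaves the unit interval
outside = [(xi, round(f, 3)) for xi, f in zip(x, fitted) if f < 0 or f > 1]
print(round(a, 4), round(b, 4))
print(outside)
```

With these data the fitted line is $-4/21 + x/14$, so the fitted values fall outside $[0, 1]$ at $x = 0, 1, 2$ and at $x = 17, \dots, 20$.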
Question 2
Suppose you have a sample of households and you are interested in determining the variables
which are relevant to the choice of buying a boat.
a) Which type of latent theoretical model can be considered? Which type of estimable model
should we consider?
Solution
a) We can imagine that there is a latent unobserved variable (willingness to pay for a boat) such that, when it exceeds a certain threshold (a so-called reserve price), the boat is purchased, and otherwise it is not.
By contrast, in the estimable model the dependent variable is binary and captures the observed buying choice, i.e. whether or not a household purchases a boat.
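b) Letting $y_i^*$ be the latent unobserved variable (willingness to pay for a boat), we can relate it linearly to a set of observables $X_i$ as follows:
$$y_i^* = X_i\beta + \varepsilon_i$$
The boat is bought ($y_i = 1$) if $y_i^* > 0$ and not bought ($y_i = 0$) otherwise, where the threshold is normalised to zero (any nonzero threshold is absorbed by the intercept). Letting $F$ be the CDF of $\varepsilon_i$ conditional on $X_i$,
$$P(y_i = 1 \mid X_i) = P(\varepsilon_i > -X_i\beta \mid X_i) = 1 - F(-X_i\beta) = F(X_i\beta),$$
where the last step follows from the assumption that the distribution of $\varepsilon$ conditional on $X$ is symmetric.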
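The derivation can be checked numerically with a minimal sketch, assuming standard normal errors (the probit case) and a hypothetical index $X_i\beta = 0.4$. The code first verifies the symmetry property $1 - F(-z) = F(z)$ for the standard normal and logistic CDFs, then simulates the latent model and compares the share of $y_i = 1$ with $F(X_i\beta)$; all function names are illustrative.

```python
import math
import random


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


# Symmetry used in the last step: 1 - F(-z) = F(z)
for z in (-2.0, -0.5, 0.0, 0.7, 3.0):
    assert abs((1.0 - std_normal_cdf(-z)) - std_normal_cdf(z)) < 1e-12
    assert abs((1.0 - logistic_cdf(-z)) - logistic_cdf(z)) < 1e-12


def simulate_choice_share(index, n, seed=0):
    """Share of y_i = 1 when y_i* = index + eps_i, eps_i ~ N(0, 1), and y_i = 1 iff y_i* > 0."""
    rng = random.Random(seed)
    buys = sum(1 for _ in range(n) if index + rng.gauss(0.0, 1.0) > 0.0)
    return buys / n


index = 0.4  # hypothetical value of X_i * beta
share = simulate_choice_share(index, 200_000)
print(share, std_normal_cdf(index))
```

With 200,000 draws the simulated share should differ from $\Phi(0.4) \approx 0.655$ only by sampling noise of roughly 0.001.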
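$$\frac{\partial P(y_i = 1 \mid X_i)}{\partial x_{ij}} = \frac{\partial}{\partial x_{ij}} F(X_i\beta) = \beta_j \cdot f(X_i\beta)$$
where $f$ is the density corresponding to $F$.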
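The formula $\beta_j f(X_i\beta)$ can be checked against a finite-difference derivative of $F(X_i\beta)$. The sketch below does so for the logistic and standard normal cases, with hypothetical coefficients and covariate values; the helper names are illustrative.

```python
import math


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def analytic_me(beta, x, j, pdf):
    """beta_j * f(x'beta): marginal effect of regressor j on P(y = 1 | x)."""
    index = sum(b * xi for b, xi in zip(beta, x))
    return beta[j] * pdf(index)


def numerical_me(beta, x, j, cdf, h=1e-6):
    """Central finite difference of F(x'beta) with respect to x_j."""
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    f_up = cdf(sum(b * xi for b, xi in zip(beta, up)))
    f_down = cdf(sum(b * xi for b, xi in zip(beta, down)))
    return (f_up - f_down) / (2.0 * h)


# Hypothetical coefficients and one observation: (constant, x_1, x_2)
beta = [-1.0, 0.5, -0.3]
x = [1.0, 2.0, 1.5]
for cdf, pdf in ((logistic_cdf, logistic_pdf), (normal_cdf, normal_pdf)):
    for j in (1, 2):
        print(j, analytic_me(beta, x, j, pdf), numerical_me(beta, x, j, cdf))
```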
Applied Questions
Question 3
The summary statistics of the data and the output of a logit model are reported below.
a) Comment on the estimated parameters. Are they statistically significant?
b) What test would you use to evaluate whether the whole estimated model is statistically
significant?
Solution
a) The significance of each coefficient is evaluated by means of t-tests. Given the reported p-values associated with the t-tests, all coefficients (except the intercept) are statistically significant at the 1% level.
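For reference, a minimal sketch of how each reported p-value is obtained: the statistic is the coefficient divided by its standard error and, since logit coefficients are maximum likelihood estimates, it is compared with a standard normal distribution (asymptotically). The function name and the numbers in the example are hypothetical, not those of the reported output.

```python
import math


def wald_z_test(coef, se):
    """z statistic and two-sided p-value using the asymptotic N(0, 1) reference."""
    z = coef / se
    # two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical estimate and standard error
z, p = wald_z_test(0.5, 0.1)
print(round(z, 2), p, p < 0.01)
```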
b) The joint significance of the estimated coefficients (excluding the intercept) is evaluated by
means of a likelihood ratio (LR) test.
$$LR = 2\,[\ln(L_1) - \ln(L_0)] \overset{H_0}{\approx} \chi^2_K$$
where $\ln(L_1)$ is the maximised log-likelihood of the full model, and $\ln(L_0)$ is the maximised log-likelihood of a model where only the constant is included. The LR statistic is asymptotically distributed as a $\chi^2_r$ with $r$ degrees of freedom, where $r$ is the number of restrictions imposed by the null. In our case, $r = K$, where $K$ is the number of regressors (excluding the constant), as the null imposes that all coefficients except the constant are equal to zero.
In our case, we reject the null hypothesis of no overall significance of the model, i.e. we conclude in favor of the alternative that the estimated model is significant overall.
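A minimal sketch of the LR test, assuming the two maximised log-likelihoods are available. The constant-only log-likelihood of a binary model has the closed form $n[\bar y \ln \bar y + (1 - \bar y)\ln(1 - \bar y)]$, and here the $\chi^2$ upper tail is computed from the power series of the regularised incomplete gamma function to keep the code dependency-free (a statistics library would normally be used instead). All names and the log-likelihood values in the example are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x).

    Computed as 1 - P(df/2, x/2), where P is the regularised lower incomplete
    gamma function evaluated by its power series. Adequate for moderate x.
    """
    if x <= 0.0:
        return 1.0
    s, t = df / 2.0, x / 2.0
    # first series term: t^s * exp(-t) / Gamma(s + 1)
    term = math.exp(s * math.log(t) - t - math.lgamma(s + 1.0))
    if term == 0.0:
        # underflow: x lies far in the upper tail (t > s) or far in the lower tail
        return 0.0 if t > s else 1.0
    total, n = term, 0
    while term > 1e-17 * total:
        n += 1
        term *= t / (s + n)  # ratio of consecutive series terms
        total += term
    return max(0.0, 1.0 - total)


def null_loglik(y):
    """Maximised log-likelihood of a binary model with only a constant (needs 0 < mean(y) < 1)."""
    n = len(y)
    p = sum(y) / n
    return n * (p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def lr_test(loglik_full, loglik_null, k):
    """LR statistic and its asymptotic chi-squared(k) p-value."""
    lr = 2.0 * (loglik_full - loglik_null)
    return lr, chi2_sf(lr, k)


# Hypothetical maximised log-likelihoods and K = 3 restrictions
lr, p = lr_test(-400.0, -420.0, 3)
print(lr, p < 0.01)
```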
Question 4
c) Is the model overall significant?
Solution
a) A probit model assumes that the errors in the (latent) theoretical model follow a standard normal distribution, whereas a logit model assumes that they follow a (standard) logistic distribution.
$$\frac{\partial P(y_i = 1 \mid X_i)}{\partial\, educ_i} = \frac{\partial}{\partial\, educ_i} \Phi(X_i\beta) = \beta_{educ} \cdot \phi(X_i\beta)$$
where $\Phi$ is the standard normal CDF and $\phi$ is the standard normal pdf.
Evaluated at the sample mean ($X_i = \bar X$), this marginal effect is approximately 0.04, i.e. an additional year of education is associated with an increase of around 4 percentage points in the probability that a married woman is working.
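A minimal sketch of the marginal effect at the mean, $\hat\beta_{educ}\,\phi(\bar X\hat\beta)$. The coefficients and the four observations below are hypothetical (the actual estimates are not reproduced here), the first entry of each observation is the constant, and the function name is illustrative.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def probit_me_at_mean(beta, X, j):
    """Probit marginal effect of regressor j at the sample mean: beta_j * phi(x_bar' beta).

    X is a list of observations; each is a list whose first entry is the constant 1.
    """
    k, n = len(beta), len(X)
    x_bar = [sum(row[c] for row in X) / n for c in range(k)]
    index = sum(b * m for b, m in zip(beta, x_bar))
    return beta[j] * normal_pdf(index)


# Hypothetical probit coefficients for (constant, educ) and a toy sample
beta = [-1.5, 0.12]
X = [[1.0, 10.0], [1.0, 12.0], [1.0, 14.0], [1.0, 16.0]]
print(probit_me_at_mean(beta, X, j=1))
```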
c) The model is overall significant according to the reported LR test of overall significance: the
LR statistic is 41.767 and the associated p-value is less than 1%.
Question 5
Now consider the marginal effects from the logit model estimated in Question 3.
a) Derive the marginal effect of education on the dependent variable in the logit case.
b) Can we give any direct interpretation to the estimated coefficients βj ? What is the value of
the estimated coefficient on the education variable?
Solution
b) The estimated coefficient on the education variable is $\hat\beta_{educ} = 0.16067$. It can be interpreted directly as the marginal effect of education in the latent model.
Moreover, since
$$\frac{\partial P(y_i = 1 \mid X_i)}{\partial\, educ_i} = \frac{\partial}{\partial\, educ_i} F(X_i\beta) = \beta_{educ} \cdot f(X_i\beta)$$
and the density $f(X_i\beta)$ is always positive, the sign of the estimated coefficient and that of
the corresponding marginal effect are the same. That is, we can already say, by looking only
at the estimated coefficient, that the estimated marginal effect of education will be positive.
To determine its exact magnitude we need instead to pick a sample point (e.g. the mean)
and compute it.
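A minimal sketch, using the reported $\hat\beta_{educ} = 0.16067$ and a few hypothetical values of the index $X_i\hat\beta$. For the logit, $f(z) = \Lambda(z)[1 - \Lambda(z)]$, with $\Lambda$ the logistic CDF; this density is positive and at most $1/4$ (attained at $z = 0$), so the marginal effect has the sign of $\hat\beta_{educ}$ and, at any sample point, is no larger than $0.16067/4 \approx 0.040$. The function name is illustrative.

```python
import math


def logit_marginal_effect(beta_j, index):
    """beta_j * f(index), where f(z) = Lambda(z) * (1 - Lambda(z)) is the logistic density."""
    lam = 1.0 / (1.0 + math.exp(-index))
    return beta_j * lam * (1.0 - lam)


beta_educ = 0.16067  # estimated coefficient reported above
# Hypothetical values of X_i * beta_hat at which the effect could be evaluated
for index in (-2.0, -0.5, 0.0, 0.5, 2.0):
    me = logit_marginal_effect(beta_educ, index)
    # positive density bounded by 1/4: same sign as beta_educ, at most beta_educ / 4
    print(index, round(me, 5))
```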