
Econometrics

Solutions to Computer Exercise 1: Introduction Stata + Simple regression model


--------------------------------------------------------------------------------
log: ce1.log
log type: text
opened on: X

. do ce1

1.

. * cd [change to your work directory, e.g. u:\data]

2.
. use caschool.dta, clear

3.
. * list testscr str [output suppressed]

4.
. summarize testscr str

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     testscr |       420    654.1565    19.05335     605.55     706.75
         str |       420    19.64043    1.891812         14       25.8

5.
. summarize testscr str, detail

                          testscr
-------------------------------------------------------------
      Percentiles      Smallest
 1%        612.65         605.55
 5%        623.15         606.75
10%       630.375            609       Obs                 420
25%           640          612.5       Sum of Wgt.         420

50%        654.45                      Mean           654.1565
                        Largest       Std. Dev.      19.05335
75%       666.675          699.1
90%         679.1          700.3       Variance       363.0301
95%         685.5          704.3       Skewness       .0916151
99%        698.45         706.75       Kurtosis       2.745712

                            str
-------------------------------------------------------------
      Percentiles      Smallest
 1%      15.13898             14
 5%      16.41658       14.20176
10%      17.34573       14.54214       Obs                 420
25%      18.58179       14.70588       Sum of Wgt.         420

50%      19.72321                      Mean           19.64043
                        Largest       Std. Dev.      1.891812
75%      20.87183          24.95
90%      21.87561       25.05263       Variance       3.578952
95%      22.64514       25.78512       Skewness      -.0253655
99%      24.88889           25.8       Kurtosis       3.609597

6.
. correlate testscr str
(obs=420)

| testscr str
-------------+------------------
testscr | 1.0000
str | -0.2264 1.0000

7.
. scatter testscr str

8.
. twoway (lfit testscr str) (scatter testscr str)

9.
. regress testscr str

      Source |       SS       df       MS              Number of obs =     420
-------------+------------------------------           F(  1,   418) =   22.58
       Model |  7794.11004     1  7794.11004           Prob > F      =  0.0000
    Residual |  144315.484   418  345.252353           R-squared     =  0.0512
-------------+------------------------------           Adj R-squared =  0.0490
       Total |  152109.594   419  363.030056           Root MSE      =  18.581

------------------------------------------------------------------------------
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -2.279808 .4798256 -4.75 0.000 -3.22298 -1.336637
_cons | 698.933 9.467491 73.82 0.000 680.3231 717.5428
------------------------------------------------------------------------------

10.
. regress testscr str, robust

Linear regression                                      Number of obs =     420
                                                       F(  1,   418) =   19.26
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.0512
                                                       Root MSE      =  18.581

------------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
str | -2.279808 .5194892 -4.39 0.000 -3.300945 -1.258671
_cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
------------------------------------------------------------------------------

11.
. clear

12.
. set seed [use your student number]

BE AWARE: From here on, the output shown will be somewhat different from the output
you get. The samples you draw below depend on the seed you chose; the output below
is based on a different seed, hence the differences.

13.
. set obs 1000
obs was 0, now 1000

14.
. generate x = 10+2*rnormal()

. generate u = rnormal()

. generate y = 10+x+u

. * list y x
. summarize y x

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           y |      1000    19.83624    2.260635   13.34501    27.6075
           x |      1000    9.951573    2.029736   3.664238   16.34289

15.
. regress y x

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) = 3927.52
       Model |  4070.91891     1  4070.91891           Prob > F      =  0.0000
    Residual |    1034.439   998  1.03651202           R-squared     =  0.7974
-------------+------------------------------           Adj R-squared =  0.7972
       Total |  5105.35791   999  5.11046838           Root MSE      =  1.0181

------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .9945438 .0158696 62.67 0.000 .9634022 1.025685
_cons | 9.938967 .1611753 61.67 0.000 9.622686 10.25525
------------------------------------------------------------------------------

From the definition of y (part 14) we see that β0 = 10 and β1 = 1. Every student has drawn a different sample
from the population regression model, so the numerical values of the OLS estimates, and hence of the
estimation errors, will differ. For the seed used here (see part 12) the estimation errors are
9.939 - 10 = -0.061 for β0 and 0.995 - 1 = -0.005 for β1.
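
As a quick check, the estimation errors can also be computed directly from the stored coefficients after regress; a minimal sketch (_b[_cons] and _b[x] are standard Stata stored results; the true values 10 and 1 come from part 14):

. quietly regress y x
. display "estimation error b0: " _b[_cons] - 10
. display "estimation error b1: " _b[x] - 1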

. twoway (lfit y x) (scatter y x)

16.
. generate xx = 10+2*rnormal()

. generate uu = rnormal()

. generate yy = 10+xx+uu

. regress yy xx

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) = 3696.19
       Model |  3922.74633     1  3922.74633           Prob > F      =  0.0000
    Residual |  1059.17114   998  1.06129373           R-squared     =  0.7874
-------------+------------------------------           Adj R-squared =  0.7872
       Total |  4981.91747   999  4.98690437           Root MSE      =  1.0302

------------------------------------------------------------------------------
yy | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
xx | 1.013315 .0166674 60.80 0.000 .9806074 1.046022
_cons | 9.915967 .1697014 58.43 0.000 9.582954 10.24898
------------------------------------------------------------------------------

The estimation errors are now 9.916 - 10 = -0.084 for β0 and 1.013 - 1 = 0.013 for β1. They differ from those
in part 15 because we have drawn new data: new x's or new u's yield new y's, and running the regression on a
new sample yields new estimates.

17.
If you repeated this experiment many times you would obtain many different estimates of the parameters β0
and β1: a new set of estimates for each new draw of x and u. The average of all these estimates would equal
the true values 10 and 1 used to create y; this is essentially the unbiasedness property. Equivalently, the
average estimation error would be zero: resampling the data, and hence the OLS estimates, you obtain the
true values on average.
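
A minimal Monte Carlo sketch of this idea (the program name mcreg is chosen here for illustration; program, simulate, rclass and rnormal() are standard Stata):

program define mcreg, rclass
    drop _all
    set obs 1000
    generate x = 10 + 2*rnormal()    // same design as part 14
    generate u = rnormal()
    generate y = 10 + x + u
    regress y x
    return scalar b0 = _b[_cons]
    return scalar b1 = _b[x]
end

simulate b0=r(b0) b1=r(b1), reps(500): mcreg
summarize b0 b1

The means of b0 and b1 reported by summarize should be close to the true values 10 and 1.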

18.
In the case of artificial data (Monte Carlo) we have full control over the model and the data: we know the
model and, in particular, the true parameter values. As a result we can calculate estimation errors. For real
data this is impossible, since we do not know the true model, and in particular not the true parameter values.
A Monte Carlo analysis is therefore only used to evaluate the quality of an estimation method; it has no real
empirical content.

19.
The relevant output is in part 15. When performing a test, always go through the steps you learned in your
statistics class; be aware that we require you to do so in examinations as well.
(1) H0: β1 = 0.95 vs H1: β1 ≠ 0.95
(2) t = (0.995 - 0.95)/0.016 = 2.81. Under the null, t is approximately/asymptotically N(0,1) distributed.
(3) Critical value at 5%: 1.96.
(4) Conclusion: reject the null since |t| > critical value.
No error is made, because you reject a false hypothesis (the true value of β1 is 1).

20.
The relevant output is in part 15.
(1) 𝐻𝐻0 : 𝛽𝛽1 = 1 vs 𝐻𝐻1 : 𝛽𝛽1 ≠ 1
(2) 𝑡𝑡 = (1.025 − 1)/0.016 = 1.563. Under the null and approximately/asymptotically t is N(0,1)
distributed.
(3) Critical value at 5%: 1.96.
(4) Conclusion: do not reject the null since |t| < critical value.
No error is made, because you do not reject a correct hypothesis (the true value of β1 is 1). About 5% of the
students will make a Type 1 error (rejecting H0 while it is correct).
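
In Stata the same tests can be carried out with the built-in test command after regress; a minimal sketch (test and regress are standard Stata; the nulls are those of parts 19 and 20, and test reports an F statistic that here equals the square of the t statistic):

. quietly regress y x
. test x = 0.95
. test x = 1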

21.
. regress y x in 1/10

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =   40.04
       Model |  40.2992617     1  40.2992617           Prob > F      =  0.0002
    Residual |  8.05202958     8   1.0065037           R-squared     =  0.8335
-------------+------------------------------           Adj R-squared =  0.8127
       Total |  48.3512912     9  5.37236569           Root MSE      =  1.0032

------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 1.163825 .1839276 6.33 0.000 .7396873 1.587963
_cons | 8.523538 1.854429 4.60 0.002 4.247217 12.79986
------------------------------------------------------------------------------

The estimation errors are now 8.524 - 10 = -1.476 for β0 and 1.164 - 1 = 0.164 for β1. With 10 observations
instead of 1000 the estimators are less precise, so the estimation errors are typically larger for n = 10 than
for n = 1000. Now and then they can be smaller by coincidence, because each estimate depends on the randomly
drawn sample.

22.
More observations imply less uncertainty about the estimates, hence lower standard errors. Since standard
errors play a prominent role in the calculation of confidence intervals, more observations imply narrower
confidence intervals. To understand this better, look at the variance of the regression coefficients as given
in Appendix 5.1 of S&W; a back-of-the-envelope version is given below.
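
Under homoskedasticity (which holds for the simulated u), the variance formula in Appendix 5.1 simplifies, and a rough check reproduces the standard errors seen above:

$$SE(\hat\beta_1) \approx \frac{\sigma_u}{\sqrt{n}\,\sigma_x} = \frac{1}{\sqrt{1000}\cdot 2} \approx 0.016 \quad (n = 1000), \qquad \frac{1}{\sqrt{10}\cdot 2} \approx 0.16 \quad (n = 10).$$

The reported values (0.0159 and 0.184) deviate slightly because the in-sample standard deviations of x and u differ from the population values 2 and 1.
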
23.
. generate w=2*x

. regress y w

      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  1,   998) = 3927.52
       Model |  4070.91891     1  4070.91891           Prob > F      =  0.0000
    Residual |    1034.439   998  1.03651202           R-squared     =  0.7974
-------------+------------------------------           Adj R-squared =  0.7972
       Total |  5105.35791   999  5.11046838           Root MSE      =  1.0181

------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
w | .4972719 .0079348 62.67 0.000 .4817011 .5128426
_cons | 9.938967 .1611753 61.67 0.000 9.622686 10.25525
------------------------------------------------------------------------------

. predict resid, residuals

We multiply the regressor x by 2, i.e. w = 2x. Comparing the new population regression model
$y_i = \beta_0^* + \beta_1^* w_i + u_i^*$ with the original one $y_i = \beta_0 + \beta_1 x_i + u_i$, it is
easily seen that we should have β0* = β0 and β1* = β1/2. Hence, we expect the slope coefficient to change
(and its standard error), while all other regression outcomes (t values, R², etc.) stay unchanged. Try as an
exercise to derive all this analytically (hint: use the variance-covariance rules stated in Chapter 3 of
Stock & Watson). For example,

$$\hat\beta_1^* = \frac{\sum_{i=1}^{n}(w_i-\bar w)(y_i-\bar y)}{\sum_{i=1}^{n}(w_i-\bar w)^2} = \frac{\sum_{i=1}^{n}(2x_i-2\bar x)(y_i-\bar y)}{\sum_{i=1}^{n}(2x_i-2\bar x)^2} = \frac{2}{4}\,\hat\beta_1 = 0.5\,\hat\beta_1.$$
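
As a quick numerical check with the stored results (standard _b[] and scalar syntax; run after the two regressions above):

. quietly regress y x
. scalar b1 = _b[x]
. quietly regress y w
. display "b1* = " _b[w] "   0.5*b1 = " 0.5*scalar(b1)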

24.
. correlate y x w resid, covariance
(obs=1000)

             |        y        x        w    resid
-------------+------------------------------------
           y |  5.11047
           x |  4.09735  4.11983
           w |   8.1947  8.23966  16.4793
       resid |  1.03547  2.2e-09  4.4e-09  1.03547

. correlate y x w resid
(obs=1000)

             |        y        x        w    resid
-------------+------------------------------------
           y |   1.0000
           x |   0.8930   1.0000
           w |   0.8930   1.0000   1.0000
       resid |   0.4501   0.0000   0.0000   1.0000

We have var(w) = var(2x) = 4 var(x). Note also that the residual mean is zero,

$$\bar{\hat u} = \frac{1}{n}\sum_{i=1}^{n}\hat u_i = 0,$$

which follows from the OLS normal equation (the first-order condition of minimizing the SSR with respect to
the constant). Hence we also have

$$\widehat{var}(\hat u) = \frac{1}{n-1}\sum_{i=1}^{n}(\hat u_i - \bar{\hat u})^2 = \frac{1}{n-1}\sum_{i=1}^{n}\hat u_i^2 = \frac{SSR}{n-1}.$$
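
The last identity matches the log above: SSR/(n-1) = 1034.439/999 ≈ 1.03547, exactly the resid variance reported by correlate, covariance. A minimal check using standard stored results after regress (e(rss) and e(N)):

. quietly regress y x
. display "SSR/(n-1) = " e(rss)/(e(N)-1)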

25.
All this can be derived analytically using the variance-covariance rules stated in Chapter 3 of Stock
& Watson, but applied to this specific sample. The OLS normal equations (= first order condition of
minimizing the SSR) for the simple regression model are ∑𝑛𝑛𝑖𝑖=1 𝑢𝑢�𝚤𝚤 = 0 and ∑𝑛𝑛𝑖𝑖=1 𝑢𝑢�𝚤𝚤 𝑥𝑥𝑖𝑖 = 0. From these
we can derive
𝑛𝑛 𝑛𝑛
1 1 1 1
𝑐𝑐𝑐𝑐𝑐𝑐(𝑥𝑥, 𝑢𝑢�) = �(𝑥𝑥𝑖𝑖 − 𝑥𝑥̄ ) (𝑢𝑢�𝚤𝚤 − 𝑢𝑢�̄) = �(𝑥𝑥𝑖𝑖 − 𝑥𝑥̄ ) (𝑢𝑢�𝚤𝚤 ) = ∑𝑥𝑥𝑖𝑖 𝑢𝑢�𝚤𝚤 − 𝑥𝑥̄ ∑𝑢𝑢�𝚤𝚤
𝑛𝑛 − 1 𝑛𝑛 − 1 𝑛𝑛 − 1 𝑛𝑛 − 1
𝑖𝑖=1 𝑖𝑖=1
= 0.

Furthermore we have $\widehat{cov}(w,\hat u) = \widehat{cov}(2x,\hat u) = 2\,\widehat{cov}(x,\hat u) = 0$.


Finally, we have

$$\widehat{cov}(y,\hat u) = \widehat{cov}(\hat\beta_0 + \hat\beta_1 x + \hat u,\,\hat u) = \widehat{cov}(\hat\beta_0,\hat u) + \widehat{cov}(\hat\beta_1 x,\hat u) + \widehat{cov}(\hat u,\hat u) = \widehat{var}(\hat u).$$

Of course, all variances and covariances used here are sample variances and covariances.
