Econ 251 PS3 Solutions v2

The document provides solutions to questions from Problem Set #3 in Econ 251. It tests hypotheses about population parameters using data from the STATA file WAGE2.dta. Question 1 asks students to find sample mean log-wages for blacks and non-blacks, which are 6.52 and 6.82 respectively. Question 2 tests whether population mean log-wages differ between blacks and non-blacks using a t-test. The null hypothesis of no difference is rejected with a t-statistic of 7.2876. Question 3 tests for differences in population mean years of education and also rejects the null hypothesis, with a t-statistic of 5.5720. Calculations of the t-statistics

Uploaded by

Peter Shang

Econ 251

Problem Set #3

SOLUTIONS

Part I: Testing hypotheses in STATA (13 points in total)

This part of the problem set introduces you to using STATA for testing hypotheses about
the population parameters.
Instructions:
Following each question, please handwrite or type your answers and copy/paste the
STATA output (please, use the ‘copy as picture’ option).

The problem set uses the same STATA file WAGE2.dta as the second problem set. As
a reminder, WAGE2.dta contains the following variables:
wage monthly earnings (in 1976 USD)
hours average weekly hours of work
IQ IQ (intelligence quotient) score
educ years of education
exper years of work experience
age age in years
married =1 if the person is married
black =1 if the person is black
meduc mother’s education
(meaning the education level of the person’s mother)
feduc father’s education
(meaning the education level of the person’s father)

1. (i) Economists usually use log-earnings, rather than earnings, since log-earnings
allow modelling percentage changes in earnings, rather than absolute changes (we’ll
see this very soon in class).
Generate a new variable equal to the natural logarithm of variable wage. Call this new
variable lwage. Note: You do not need to submit anything for part 1.(i).

(ii) Find the sample mean log-wage (lwage) for blacks and non-blacks separately.
Hint: in order to generate variable log-wage type the following command in STATA:
gen lwage=log(wage)
(1 point)
Solution:

The sample mean log-wage (lwage) for blacks is 6.52, while non-blacks’ mean log
earnings is 6.82.
(See STATA output below.)

. gen lwage=log(wage)

. tab black, sum(lwage)

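            |          Summary of lwage
=1 if black |        Mean   Std. Dev.       Freq.
------------+------------------------------------
          0 |   6.8164865   .41214464         815
          1 |   6.5244342   .39392362         120
------------+------------------------------------
      Total |   6.7790038    .4211439         935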

FYI: after generating variable lwage you may wish to have a look at the new variable
and at variable wage in order to double check that lwage was generated correctly. If
you type browse wage lwage you can view the two variables and check that for each
observation the value of lwage=log(wage). E.g. for observation 1, wage=769 and
lwage=6.645091, which is the value of log(769) displayed to the 6th decimal.
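
As a cross-check outside STATA, the same value can be reproduced in a few lines of Python. This is only a minimal sketch: Python is not part of the problem set and is used here purely for illustration. Like STATA's log(), Python's math.log returns the natural logarithm.

```python
import math

# Observation 1 in WAGE2.dta has wage = 769 (value taken from the text above).
wage = 769
lwage = math.log(wage)  # natural logarithm, like STATA's log()

print(f"{lwage:.6f}")  # prints 6.645091
```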

2. Test the hypothesis that the population mean log-wage is equal for black and non-
black men, against the two-sided alternative. Use a significance level α = 0.05.
Hint: Use the STATA command ttest var1, by (var2)
In this particular example, var1 is lwage and var2 is black; hence, what you would need to
type in the command window in STATA is: ttest lwage, by (black)

(i) What is the null hypothesis being tested in terms of the notation used in class? What
is the alternative hypothesis in terms of the notation used in class? Be very precise and
explain what the notation stands for.
(ii) What do you conclude (i.e. do you reject the null at the 5% significance level), and
why?
(iii) Show how the t-statistic of 7.2876 was calculated.
(1 point each, 3 points in total)

Solution:

. ttest lwage, by (black)

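Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |     815    6.816486    .0144368    .4121446    6.788149    6.844824
       1 |     120    6.524434    .0359601    .3939236     6.45323    6.595639
---------+--------------------------------------------------------------------
combined |     935    6.779004    .0137729    .4211439    6.751974    6.806033
---------+--------------------------------------------------------------------
    diff |            .2920523    .0400754                .2134039    .3707006
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   7.2876
Ho: diff = 0                                     degrees of freedom =      933

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000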

Here is a step-by-step solution to parts (i) and (ii) of this question.

STEP 1: define the null and the alternative hypotheses

Stated statistically, our null hypothesis is:

H0: µo = µ1 where µ1 is the population mean log-wage for blacks and µo is the
population mean log-wage for non-blacks. THE NULL IS THE HYPOTHESIS WE
WANT TO DISPROVE.
Note that this is equivalent to H0: µo - µ1 = 0. This is how the null hypothesis is written
in STATA: “H0: diff = 0” (i.e. the difference in population mean log-wage for non-
blacks and blacks equals zero).

The null hypothesis is tested against the alternative:

H1: µo ≠ µ1
In STATA this alternative is written as: “Ha: diff != 0” (i.e. the difference in
population means does not equal zero).

STEP 2: look at the p-value for the relevant alternative hypothesis and compare it to the
significance level

The p-value for this alternative hypothesis is given by Pr (|T| > |t|) = 0.0000. STATA rounds to four decimals, so a reported p-value of 0.0000 means that the p-value is below 0.00005: if the null hypothesis were true, there would be virtually no chance of observing a test statistic at least as extreme as the one we actually observed (i.e. t-stat=7.29).
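
To see how small this p-value is, here is a rough sketch in Python (not part of the problem set). It assumes a standard normal approximation to the t distribution, which is reasonable with 933 degrees of freedom; the exact t-based p-value differs slightly but is equally negligible.

```python
import math

t_stat = 7.2876  # t-statistic reported by STATA for the lwage test
alpha = 0.05     # significance level

# Two-sided p-value under a standard normal approximation:
# P(|Z| > |t|) = erfc(|t| / sqrt(2)).
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"{p_approx:.4f}")   # rounded to four decimals, as STATA displays it
print(f"{p_approx:.2e}")   # the same number in scientific notation
print(p_approx <= alpha)   # decision rule: reject H0 if p-value <= alpha
```

The p-value is positive but far too small to show up in four decimals, which is why STATA prints 0.0000.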

STEP 3: What do you conclude?

Reject the null if p-value ≤ significance level α
Fail to reject null if p-value > significance level α
To get full points in this question, all you had to do was state the null and the alternative hypotheses for part (i), and say that we reject the null hypothesis (in favour of the alternative) because the p-value = 0.0000 < significance level α = 0.05. We conclude that the population mean log-wage differs significantly for black and non-black men.

IMPORTANT NOTE

In Question 1 we talk about the sample mean log-wage: in this particular sample of 935 men the average log-wage is different for blacks and non-blacks. WE NEVER TEST A HYPOTHESIS ABOUT THE SAMPLE MEANS: we know they are different numbers (in our example they are 6.82 and 6.52).

In Question 2 we talk about the population mean: we are referring to the population from which this sample was drawn, and we are asking the question: based on our random sample and the sample statistics, do we have enough information to conclude that the population means of the two groups (blacks and non-blacks) differ?

(iii) Show how the t-statistic of 7.2876 was calculated.

(1 point)
Solution:

$$
t\text{-stat} = \frac{\widehat{diff} - diff_0}{SE(\widehat{diff})} = \frac{0.29205 - 0}{0.04008} = 7.287,
$$

where
$\widehat{diff}$ is the estimated difference in means (i.e. this is just the difference between the two sample means = 6.8165-6.5244=0.2921).
$diff_0$ is the difference between the two population means under the null hypothesis (recall H0: µo = µ1 ⟺ H0: diff≡µo - µ1 = 0, i.e. $diff_0 = 0$).
STATA reports this as “H0: diff=0”.
$SE(\widehat{diff})$ is the standard error of the difference in means.
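
The standard error itself can be rebuilt from the group summaries in the STATA output. Here is a minimal Python sketch, assuming the usual pooled-variance formula for a two-sample t test with equal variances (the test named in the output header). The variable names are made up for this illustration.

```python
import math

# Group summaries copied from the STATA output above.
n0, mean0, sd0 = 815, 6.816486, 0.4121446   # non-blacks (black == 0)
n1, mean1, sd1 = 120, 6.524434, 0.3939236   # blacks (black == 1)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n0 - 1) * sd0**2 + (n1 - 1) * sd1**2) / (n0 + n1 - 2)

# Standard error of the difference in means under equal variances.
se_diff = math.sqrt(pooled_var * (1 / n0 + 1 / n1))

diff_hat = mean0 - mean1
t_stat = (diff_hat - 0) / se_diff  # diff_0 = 0 under the null

print(f"diff = {diff_hat:.5f}, SE = {se_diff:.5f}, t = {t_stat:.3f}")
```

The degrees of freedom reported by STATA, 933, equal n0 + n1 - 2, the same divisor used in the pooled variance.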

3. Test the hypothesis that the population mean years of education (variable educ) is equal
for black and non-black men, against the two-sided alternative. Use a significance level
α = 0.05.
(i) What is the null hypothesis being tested in terms of the notation used in class?
What is the alternative hypothesis in terms of the notation used in class?
(ii) What do you conclude (i.e. do you reject the null at the 5% significance level), and
why?
(iii) Show how the t-statistic of 5.5720 was calculated.
(1 point each, 3 points in total)

Solution:
H0: µo = µ1
H1: µo ≠ µ1

Here µ1 is the population mean years of education for blacks and µo is the population
mean years of education for non-blacks. Again, please note that we test a hypothesis about the population means; we know the sample means are not equal (they are different numbers), as we saw in problem set #2.

(ii) What do you conclude (i.e. do you reject the null at the 5% significance level), and
why?

. ttest educ, by (black)

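Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |     815    13.61963    .0776698    2.217333    13.46718    13.77209
       1 |     120    12.44167    .1586868    1.738326    12.12745    12.75588
---------+--------------------------------------------------------------------
combined |     935    13.46845    .0718383    2.196654    13.32747    13.60943
---------+--------------------------------------------------------------------
    diff |            1.177965    .2114084                .7630741    1.592856
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   5.5720
Ho: diff = 0                                     degrees of freedom =      933

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000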

We reject the null hypothesis (in favour of the alternative) because the p-value = 0.0000 < significance level α = 0.05, and conclude that the population mean years of education differs significantly for black and non-black men.

(iii) Show how the t-statistic of 5.5720 was calculated.

Solution:

$$
t\text{-stat} = \frac{\widehat{diff} - diff_0}{SE(\widehat{diff})} = \frac{1.17797 - 0}{0.21141} = 5.572,
$$

where
$\widehat{diff}$ is the estimated difference in means (i.e. this is just the difference between the two sample means = 13.61963-12.44167=1.17796).
$diff_0 = 0$ is the difference between the two population means under the null hypothesis.
$SE(\widehat{diff})$ is the standard error of the difference in means.

4. Test the hypothesis that the population mean years of mother’s education (variable
meduc) is larger for non-black men than for black men. Use a significance level α =
0.05.
(i) What is the null hypothesis being tested in terms of the notation used in class?
What is the alternative hypothesis in terms of the notation used in class?

(ii) What do you conclude (i.e. do you reject the null at the 5% significance level), and
why?
(iii) Show how the t-statistic of 6.6322 was calculated.
(1 point each, 3 points in total)

Solution:
H0: µo = µ1
H1: µo > µ1

Here µ1 is the population mean years of mother’s education for blacks and µo is the population mean years of mother’s education for non-blacks.

(ii) What do you conclude (i.e. do you reject the null at the 5% significance level), and
why?

. ttest meduc, by (black)

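Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |     758    10.91029     .099562     2.74112    10.71484    11.10574
       1 |      99    8.939394     .308546    3.069994    8.327095    9.551693
---------+--------------------------------------------------------------------
combined |     857    10.68261    .0973458    2.849756    10.49155    10.87368
---------+--------------------------------------------------------------------
    diff |            1.970896    .2971709                1.387626    2.554166
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t =   6.6322
Ho: diff = 0                                     degrees of freedom =      855

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 1.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 0.0000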

We reject the null hypothesis (in favour of the alternative) because the p-value for the relevant one-sided alternative (Ha: diff > 0), Pr(T > t) = 0.0000, is below the significance level α = 0.05. We conclude that the population mean years of mother’s education is significantly larger for non-black than for black men.

(iii) Show how the t-statistic of 6.6322 was calculated.

(1 point)
Solution:

$$
t\text{-stat} = \frac{\widehat{diff} - diff_0}{SE(\widehat{diff})} = \frac{1.9709 - 0}{0.2972} = 6.632,
$$

where
$\widehat{diff}$ is the estimated difference in means (i.e. this is just the difference between the two sample means = 10.91029-8.939394=1.970896).
$diff_0 = 0$ is the difference between the two population means under the null hypothesis.
$SE(\widehat{diff})$ is the standard error of the difference in means.

5. Finally, test the hypothesis that the population mean (average) weekly hours of work
(variable hours) is equal to 45 hours, against the two-sided alternative. Use a
significance level α = 0.05.
(i) What is the null hypothesis being tested in terms of the notation used in class?
What is the alternative hypothesis in terms of the notation used in class?
(ii) What do you conclude (i.e. do you reject the null at the 5% significance level), and
why?
(iii) Show how the t-statistic of -4.5314 was calculated.
(1 point each, 3 points in total)

Solution:
H0: µ = 45
H1: µ ≠ 45

Here µ is the population mean weekly hours of work.

(ii) What do you conclude (i.e. do you reject the null at the 5% significance level), and
why?

. ttest hours=45

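One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   hours |     935    43.92941    .2362584    7.224256    43.46575    44.39307
------------------------------------------------------------------------------
    mean = mean(hours)                                            t =  -4.5314
Ho: mean = 45                                    degrees of freedom =      934

    Ha: mean < 45               Ha: mean != 45                 Ha: mean > 45
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000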

We reject the null hypothesis (in favour of the alternative) because the p-value = 0.0000 < significance level α = 0.05, and conclude that the population mean weekly hours of work is significantly different from 45 hours a week.

(iii) Show how the t-statistic of -4.5314 was calculated.

Solution:

$$
t\text{-stat} = \frac{\bar{X} - \mu_0}{s/\sqrt{N}} = \frac{43.92941 - 45}{7.224256/\sqrt{935}} = -4.5314,
$$

where
$\bar{X}$ is the sample mean hours of work for everyone in the sample.
$\mu_0 = 45$ is the population mean hours of work under the null hypothesis.
$SE(\bar{X}) = s/\sqrt{N}$ is the standard error of the sample mean of variable hours, which is calculated as the ratio of the sample standard deviation of variable hours, s=7.224256, to the square root of the sample size, N=935.
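
The same arithmetic can be checked outside STATA. The Python sketch below is for illustration only; it plugs the summary numbers from the STATA output into the one-sample formula above.

```python
import math

n = 935            # sample size
x_bar = 43.92941   # sample mean of hours
s = 7.224256       # sample standard deviation of hours
mu_0 = 45          # population mean under the null hypothesis

se = s / math.sqrt(n)         # standard error of the sample mean
t_stat = (x_bar - mu_0) / se  # one-sample t-statistic

print(f"SE = {se:.7f}, t = {t_stat:.4f}")
```

The sign of the t-statistic is negative because the sample mean lies below the hypothesized value of 45.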

Part II: Simple linear regression in STATA (9 points in total)

This part of the problem set introduces you to running regressions in STATA.

This problem uses STATA file BIG9SALARY.dta. This data was used by O. Baser and
E. Pema (2003) in a paper titled “The Return of Publications for Economics Faculty”.
BIG9SALARY.dta contains data on 223 faculty members of Economics departments in 9
universities in the US (Ohio State, Iowa, Indiana, Purdue, Michigan State, Minnesota,
Michigan, Wisconsin and Illinois) collected in year 1995. The dataset contains the
following variables:
id person identifier
salary total gross annual salary of the faculty member (in 1999 USD)
totpge total number of standardized article pages published
pubindex publication index (the product of number of articles published by
the rank of the journal)
top20phd =1 if Ph.D. was obtained from a top 20 Economics department; 0
otherwise
yearphd year when the Ph.D. was obtained
age age in years
female =1 if female; 0 otherwise
mich =1 for University of Michigan professors; 0 otherwise
(as well as dummies for each of the other universities).

Use dataset BIG9SALARY.dta to answer the following questions.

Consider the simple linear regression model relating a faculty member’s annual salary
(salary) to total number of standardized article pages published (totpge):

salary = 𝜷0 + 𝜷1 totpge + u

1. What is the meaning of the error term u?

(1 point)
Solution:

The error term u represents all factors that affect the outcome variable Y (in our
example, a faculty member’s annual salary), other than the explanatory variable X (in
our case, total number of article pages published).

2. Provide an example of a variable (factor) contained in u.

(1 point)
Solution:
Examples of variables in u are: a faculty member’s seniority level (assistant, associate
or full professor), tenure vs. non-tenure track; teaching evaluations; gender;
experience; type of university (e.g. public or private); geographic location, etc.

3. Now use the data in BIG9SALARY.dta to estimate this simple regression model and
answer the following questions:

Hint: in order to estimate the model in STATA, type: reg salary totpge

(i) Interpret the OLS estimate of the intercept.

(2 points)

Solution:
. reg salary totpge

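      Source |       SS       df       MS              Number of obs =     233
-------------+------------------------------           F(1, 231)     =   64.66
       Model |  3.8897e+10     1  3.8897e+10           Prob > F      =  0.0000
    Residual |  1.3896e+11   231   601542208           R-squared     =  0.2187
-------------+------------------------------           Adj R-squared =  0.2153
       Total |  1.7785e+11   232   766608670           Root MSE      =   24526

------------------------------------------------------------------------------
      salary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      totpge |   89.65586   11.14946     8.04   0.000     67.68822    111.6235
       _cons |   66626.78   2414.201    27.60   0.000     61870.11    71383.45
------------------------------------------------------------------------------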

The OLS estimate of the intercept is $66,626.78 (note that STATA calls the intercept _cons). The intercept is always interpreted as the predicted (average) value of Y when X=0. In this example this means that a faculty member who has zero total standardized article pages published (i.e. has no published articles) has an annual salary of $66,626.78, on average.

(ii) Interpret the slope (coefficient) on totpge.

(2 points)
Solution:
The OLS estimate of the slope is $89.66 (rounded to the second decimal).

The slope is always interpreted as the estimated effect of a one-unit increase in X on the average value of Y. In our example this means that every additional article page increases a faculty member’s salary by $89.66, on average.

(iii) What is the effect of a 100-page increase in total number of standardized article
pages published on annual salary?
(1 point)
Solution:
From part (ii) we found that 1 more article page increases salary by $89.66; therefore, 100 more pages will increase salary by 100*($89.65586)=$8965.59 (using the unrounded slope estimate). You can calculate this with STATA by typing:
di 100 * 89.65586
8965.586
Notice there is a linear relationship between X (totpge) and Y (salary): every unit increase of X (totpge) increases Y (salary) by the same amount, $89.66.
FYI: You may find it easier to use the formula from class linking the change in Y for any given change in X:
$$\Delta\hat{y} = \hat{\beta}_1 \Delta X = 100 \times \$89.65586 \approx \$8965.59.$$

(iv) Use STATA command

browse totpge salary if id==330

to display the actual salary and number of published standardized article pages of the
faculty member with identifier number 330 (this is a MSU faculty member, who has the
highest number of published article pages amongst everyone in the sample). What is the
predicted annual salary of this faculty member?
Hint: The predicted annual salary of this faculty member 𝑌̂𝑖 is given by the value on the
regression line corresponding to totpge =1060.5.
(1 point)

Solution:

The predicted annual salary of this faculty member, $\hat{Y}_i$, is given by the value on the regression line corresponding to totpge =1060.5.
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i = 66626.78 + 89.65586 \times 1060.5 = \$161{,}706.82$$
You can calculate this with STATA by typing
. di 66626.78 + 89.65586*1060.5
161706.82

(v) What is the actual salary of this faculty member? Find the residual for this faculty
member. Does our regression suggest this faculty member is underpaid or overpaid?

Hint: The residual 𝑢̂i is given by the difference between the actual and the predicted salary
for this observation: 𝑢̂i = 𝑌𝑖 − 𝑌̂𝑖 . You calculated the predicted salary in part (iv).
(2 points)

Solution:
The actual salary of this faculty member is $92,083 (this is given by the value of
variable salary for this faculty member).
The residual $\hat{u}_i$ for this observation is given by the difference between the actual and the predicted salary for this observation:
$$\hat{u}_i = Y_i - \hat{Y}_i = \$92{,}083 - \$161{,}706.82 = -\$69{,}623.82$$
With STATA:
. di 92083 - 161706.82
-69623.82
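Since the residual is negative (the actual salary is below the salary predicted for this number of published pages), our regression suggests that this faculty member is underpaid.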
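
Parts (iv) and (v) can also be put together in one short calculation. The Python sketch below is only an illustration; the helper name predict_salary is hypothetical, and the coefficients are copied from the STATA output for reg salary totpge.

```python
# OLS estimates copied from the STATA output for: reg salary totpge
b0 = 66626.78   # intercept (_cons)
b1 = 89.65586   # slope on totpge


def predict_salary(totpge):
    """Predicted salary on the estimated regression line."""
    return b0 + b1 * totpge


# Faculty member with id 330: values taken from the solution text.
actual = 92083
predicted = predict_salary(1060.5)
residual = actual - predicted

# Negative residual: actual salary is below the regression prediction.
status = "underpaid" if residual < 0 else "overpaid"
print(f"predicted = {predicted:.2f}, residual = {residual:.2f}, {status}")
```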

(vi) Now estimate the model separately for the male and female faculty members.
Are the estimates of the intercepts the same? What about the slope estimates? How do you
interpret this?
Do the regression results provide convincing evidence that either group of faculty members
is discriminated against? Explain.
(3 points)

Hint: This question asks you to think about the difference between estimates and population parameters, and about correlation versus causation.
To estimate the model separately for women and men type in the command window:
reg salary totpge if female==1
reg salary totpge if female==0

Solution:
The estimation results are presented below. The estimates of the intercepts are different, suggesting that with no articles published (totpge=0) female faculty members earn about $10,580 less, on average. However, the slope estimates suggest that female faculty members are paid about $37 more for every additional page published.
The regression results suggest that there may be differential payment for each group, but they do not provide convincing evidence that either male or female faculty members are discriminated against. First, we can only see the difference in the estimates of the slopes and intercepts; we still don’t know whether the population parameters differ. Secondly, our results may not have a causal interpretation, since we are not sure male and female faculty members are the same, on average, in all dimensions affecting earnings (e.g. it might be the case that female faculty members are mainly employed in language schools, where the salary levels are generally lower).
Ultimately, we need to develop tools for analysing this question at a deeper level.
. reg salary totpge if female==1

Source | SS df MS Number of obs = 21

-------------+---------------------------------- F(1, 19) = 4.07
Model | 1.7705e+09 1 1.7705e+09 Prob > F = 0.0579
Residual | 8.2608e+09 19 434781062 R-squared = 0.1765
-------------+---------------------------------- Adj R-squared = 0.1332
Total | 1.0031e+10 20 501566396 Root MSE = 20851
------------------------------------------------------------------------------
salary | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
totpge | 122.0529 60.48352 2.02 0.058 -4.540533 248.6464
_cons | 57594.09 6323.063 9.11 0.000 44359.77 70828.42
------------------------------------------------------------------------------

. reg salary totpge if female==0

Source | SS df MS Number of obs = 211

-------------+---------------------------------- F(1, 209) = 52.77
Model | 3.2585e+10 1 3.2585e+10 Prob > F = 0.0000
Residual | 1.2906e+11 209 617526916 R-squared = 0.2016
-------------+---------------------------------- Adj R-squared = 0.1978
Total | 1.6165e+11 210 769752285 Root MSE = 24850
------------------------------------------------------------------------------
salary | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
totpge | 85.01794 11.7039 7.26 0.000 61.94511 108.0908
_cons | 68174.95 2634.973 25.87 0.000 62980.42 73369.48
------------------------------------------------------------------------------

. di 68174.95- 57594.09
10580.86

. di 85.01794- 122.0529
-37.03496
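
One way to read the two sets of estimates together is to ask at what page count the two fitted lines predict the same salary. The Python sketch below is an illustration only. It uses the point estimates from the two regressions above and says nothing about whether the differences are statistically significant; the variable names are made up for this example.

```python
# Estimates copied from the two STATA regressions above.
b0_f, b1_f = 57594.09, 122.0529    # female == 1
b0_m, b1_m = 68174.95, 85.01794    # female == 0

intercept_gap = b0_m - b0_f        # male minus female intercept
slope_gap = b1_f - b1_m            # female minus male slope

# Page count at which the two fitted lines predict the same salary:
# b0_f + b1_f * x = b0_m + b1_m * x  =>  x = intercept_gap / slope_gap
crossover_pages = intercept_gap / slope_gap

print(f"intercept gap = {intercept_gap:.2f}")
print(f"slope gap = {slope_gap:.5f}")
print(f"fitted lines cross at about {crossover_pages:.1f} pages")
```

Below the crossover the fitted line for women lies under the fitted line for men, and above it the ordering reverses. This is a property of the estimates, not a statement about the population parameters.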

Part III: Algebra of OLS (17 points in total)

For this part of the problem set no STATA submission is required.

1. Consider the simple linear regression model

Y = 𝜷0 + 𝜷1X + u.
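Show that the OLS estimators of 𝜷0 and 𝜷1 minimizing the sum of squared residuals

$$\min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{N} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$$

are given by

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$$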

Hint: We proved this in class. No need to show that

$$\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{N} X_i Y_i - N \bar{X} \bar{Y}$$

$$\sum_{i=1}^{N} (X_i - \bar{X})^2 = \sum_{i=1}^{N} X_i^2 - N \bar{X}^2$$

as we did this in homework 1.

(7 points)

Solution:

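First order conditions (FOC):

$$\{\hat{\beta}_0\}: \quad 2 \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)(-1) = 0 \qquad (1)$$

$$\{\hat{\beta}_1\}: \quad 2 \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)(-X_i) = 0 \qquad (2)$$

Simplifying by multiplying both sides of each equation by $-\frac{1}{2}$ yields:

$$\{\hat{\beta}_0\}: \quad \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0 \qquad (1)$$

$$\{\hat{\beta}_1\}: \quad \sum \left[ (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) X_i \right] = 0 \qquad (2)$$

These two expressions are called the normal equations.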

Let’s solve these and find the formulae of 𝛽̂0 and 𝛽̂1.

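From (1):

$$\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$$

$$\iff \sum Y_i - N \hat{\beta}_0 - \hat{\beta}_1 \sum X_i = 0$$

Solving for $\hat{\beta}_0$ yields:

$$\hat{\beta}_0 = \frac{-\sum Y_i}{-N} + \hat{\beta}_1 \frac{\sum X_i}{-N}$$

$$\iff \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

In order to obtain the formula for $\hat{\beta}_1$, we’re going to substitute the expression for $\hat{\beta}_0$ into (2). But before this, let’s simplify (2):

$$\sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) X_i = 0$$

$$\iff \sum Y_i X_i - \hat{\beta}_0 \sum X_i - \hat{\beta}_1 \sum X_i^2 = 0$$

Substituting $\hat{\beta}_0$ from (1) and dividing and multiplying the second term by $N$ yields:

$$\iff \sum Y_i X_i - N (\bar{Y} - \hat{\beta}_1 \bar{X}) \frac{\sum X_i}{N} - \hat{\beta}_1 \sum X_i^2 = 0$$

$$\iff \sum Y_i X_i - N \bar{X} \bar{Y} + N \hat{\beta}_1 \bar{X}^2 - \hat{\beta}_1 \sum X_i^2 = 0$$

$$\iff \left( \sum X_i Y_i - N \bar{X} \bar{Y} \right) - \hat{\beta}_1 \left( \sum X_i^2 - N \bar{X}^2 \right) = 0$$

Solving for $\hat{\beta}_1$:

$$\hat{\beta}_1 = \frac{\sum X_i Y_i - N \bar{X} \bar{Y}}{\sum X_i^2 - N \bar{X}^2}$$

And we’ve already shown in lecture 1 (and in homework 1) that this can be rewritten as:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$$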
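
The derivation can be checked numerically. The Python sketch below is an illustration only: the function name ols_simple and the four data points are made up for this example. It computes the estimates from the formulas above, then verifies that both normal equations hold and that the two expressions for the slope agree.

```python
def ols_simple(x, y):
    """Closed-form OLS intercept and slope for Y = b0 + b1*X + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data, made up for this illustration only.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 7.0, 8.0]

b0, b1 = ols_simple(x, y)
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Normal equations (1) and (2): both sums should be zero up to rounding.
foc_1 = sum(residuals)
foc_2 = sum(r * xi for r, xi in zip(residuals, x))

# Shortcut form of the slope, using sums of products and squares.
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1_alt = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
    sum(xi ** 2 for xi in x) - n * x_bar ** 2
)

print(b0, b1, foc_1, foc_2, b1_alt)
```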

2. Throughout this class, we are going to be using STATA to calculate regression lines,
but it is a good idea to compute a regression line by hand once.
a) Calculate the OLS estimates of the intercept and slope from a regression of Y
on X, using the following data. Show all your calculations!

Obs. i Xi Yi
1 3 3
2 9 6
3 6 3

(1 point for each estimate, 2 points in total)

b) Check that the point (𝑋̅, 𝑌̅) lies on the regression line, where 𝑋̅ is the sample
average of variable X, and 𝑌̅ is the sample average of variable Y. (1 point)

c) Show that the property you illustrated in part b) also holds in general, i.e. that
the point (𝑋̅, 𝑌̅) always lies on the regression line. Your solution cannot use
the formula for the intercept estimate. (7 points)

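Hint: write the residual for a single observation:

$$\hat{u}_i = Y_i - \hat{Y}_i$$

From here:

$$Y_i = \hat{Y}_i + \hat{u}_i$$

Replace $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$:

$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i$$

Then sum over all the data points, from 1 to N:

$$\sum_{i=1}^{N} Y_i = \sum_{i=1}^{N} (\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{u}_i)$$

$$\iff \sum_{i=1}^{N} Y_i = \sum_{i=1}^{N} \hat{\beta}_0 + \sum_{i=1}^{N} \hat{\beta}_1 X_i + \sum_{i=1}^{N} \hat{u}_i$$

$$\iff \sum_{i=1}^{N} Y_i = N \hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{N} X_i$$

(since $\sum_{i=1}^{N} \hat{u}_i = 0$ from the normal equation).

From here, dividing both sides by N: $\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}$, which means that the point $(\bar{X}, \bar{Y})$ lies on the regression line.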

OPTIONAL: double-check your calculation for part a) with STATA by following the
steps below:

1) Open STATA and select Data/Data editor/Data editor (Edit).

2) Type in the values of X as var1, Type in the values of Y as var2 in the data editor (or
just copy/paste the numbers from the Word table):

3) Close the Data Editor. You can now see var1 and var2 in the variable list.

4) Rename var1 as X, var2 as Y by typing the following code:

rename var1 X
rename var2 Y
You can now see the variables as X and Y:

5) Regress Y on X by typing:
reg Y X
(Beware that STATA is case sensitive, if you call the variables Y and X (caps) you
should use caps in the regression command as well).

Do you get the same result for the slope and intercept as the one you calculated?

Solution:
a)

Obs. i   Xi   Yi   (Xi − X̄)   (Yi − Ȳ)   (Xi − X̄)²   (Xi − X̄)(Yi − Ȳ)
1        3    3    -3         -1         9           3
2        9    6    3          2          9           6
3        6    3    0          -1         0           0
∑                  0          0          18          9

$\bar{X} = 6$, $\bar{Y} = 4$

$$\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}) = 9$$

$$\sum_{i=1}^{N} (X_i - \bar{X})^2 = 18$$

Our estimate of the slope parameter is $\hat{\beta}_1 = 9/18 = 0.5$ (we estimate that a 1-unit increase in X increases the average of Y by 0.5).

Our estimate of the intercept is:

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 4 - 6 \cdot 0.5 = 1$ (we estimate that the average of Y is 1 when X equals zero).

Alternatively, you can use the properties of summations we proved earlier in the course, which will save you some calculations:

$$\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{N} X_i Y_i - N \bar{X} \bar{Y}$$

$$\sum_{i=1}^{N} (X_i - \bar{X})^2 = \sum_{i=1}^{N} X_i^2 - N \bar{X}^2$$

Obs. i   Xi   Yi   XiYi   Xi²
1        3    3    9      9
2        9    6    54     81
3        6    3    18     36
∑                  81     126

$$\sum_{i=1}^{N} X_i Y_i - N \bar{X} \bar{Y} = 81 - 3 \cdot 6 \cdot 4 = 9$$

$$\sum_{i=1}^{N} X_i^2 - N \bar{X}^2 = 126 - 3 \cdot 6^2 = 18$$

$\hat{\beta}_1 = 9/18 = 0.5$, $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} = 4 - 6 \cdot 0.5 = 1$
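
If you want to double-check the hand calculation without STATA, the short Python sketch below (for illustration only) repeats the shortcut computation with the three observations from question 2a) and also checks part b).

```python
# Data from the table in question 2a).
x = [3, 9, 6]
y = [3, 6, 3]
n = len(x)

x_bar = sum(x) / n   # sample mean of X
y_bar = sum(y) / n   # sample mean of Y

sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # sum of X_i * Y_i
sum_xx = sum(xi ** 2 for xi in x)               # sum of X_i squared

b1 = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
b0 = y_bar - b1 * x_bar

# Part b): the fitted value at X-bar should equal Y-bar.
fitted_at_mean = b0 + b1 * x_bar

print(sum_xy, sum_xx, b1, b0, fitted_at_mean)
```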

The STATA output below confirms our calculations.

. reg Y X

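      Source |       SS       df       MS              Number of obs =       3
-------------+------------------------------           F( 1, 1)      =    3.00
       Model |         4.5     1         4.5           Prob > F      =  0.3333
    Residual |         1.5     1         1.5           R-squared     =  0.7500
-------------+------------------------------           Adj R-squared =  0.5000
       Total |           6     2           3           Root MSE      =  1.2247

------------------------------------------------------------------------------
           Y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           X |         .5   .2886751     1.73   0.333    -3.167965    4.167965
       _cons |          1   1.870829     0.53   0.687    -22.77113    24.77113
------------------------------------------------------------------------------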

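b)
$\bar{X} = 6$, $\bar{Y} = 4$. Plugging $\bar{X} = 6$ into the equation of the regression line we obtain: $\hat{Y} = 1 + 0.5 \cdot 6 = 4 = \bar{Y}$, so the point $(\bar{X}, \bar{Y})$ lies on the regression line.
c) The hint contains the complete solution; I just wanted you to go through it.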
See graph below for illustration:

No ratings yet
Quantum Mechanics - I Tutorial 0
15 pages
Time Series
No ratings yet
Time Series
31 pages
Xlstat 14
No ratings yet
Xlstat 14
3 pages
Chapter 8 Review
No ratings yet
Chapter 8 Review
6 pages
Stat 97
No ratings yet
Stat 97
10 pages
One Hundred Years of Quantum Mysteries
No ratings yet
One Hundred Years of Quantum Mysteries
8 pages
T Test 1
No ratings yet
T Test 1
23 pages
Chap 05 Decision Theory
No ratings yet
Chap 05 Decision Theory
35 pages
Hydrogen PDF
No ratings yet
Hydrogen PDF
8 pages
Big 5 and Relationship Satisfaction
No ratings yet
Big 5 and Relationship Satisfaction
8 pages
EC 744 Lecture Notes: Incomplete Markets and Bewley Models: Jianjun Miao
No ratings yet
EC 744 Lecture Notes: Incomplete Markets and Bewley Models: Jianjun Miao
39 pages
B.Sc. (Hons.) Biotechnology Core Course 13: Basics of Bioinformatics and Biostatistics (BIOT 3013) Biostatistics (BIOT 3013)
No ratings yet
B.Sc. (Hons.) Biotechnology Core Course 13: Basics of Bioinformatics and Biostatistics (BIOT 3013) Biostatistics (BIOT 3013)
29 pages
General Relativity and Cosmology: Unsolved Questions and Future Directions
No ratings yet
General Relativity and Cosmology: Unsolved Questions and Future Directions
82 pages
Quick and Dirty Regression Tutorial
No ratings yet
Quick and Dirty Regression Tutorial
6 pages
Newbold Stat7 Ism 09
No ratings yet
Newbold Stat7 Ism 09
17 pages
International Journal of Quantum Chemistry - 1985 - 28
No ratings yet
International Journal of Quantum Chemistry - 1985 - 28
7 pages
SOC 101 - 2022 Course Outline
No ratings yet
SOC 101 - 2022 Course Outline
6 pages
CH 1 Lec 2
No ratings yet
CH 1 Lec 2
9 pages
Quantum Condensed Matter Past Papers
No ratings yet
Quantum Condensed Matter Past Papers
3 pages
Correction Homework Chapter 12 Fall 2018
No ratings yet
Correction Homework Chapter 12 Fall 2018
6 pages
Momentum Conservation Principle
No ratings yet
Momentum Conservation Principle
6 pages
Atomic Structure DTS-9
No ratings yet
Atomic Structure DTS-9
2 pages
