Pset 7

This document contains a problem set analyzing the relationship between height and earnings using statistical tests and regression analysis. Problem 1 finds the median height is 67. Problem 2 uses a t-test to show that taller workers earn more on average. Problem 3 performs linear regression and finds that for each additional inch in height, earnings rise by $707.70. Problem 4 finds that the relationship between height and earnings differs for males and females. Specifically, male earnings rise more steeply with height.

Uploaded by

jake frei

Problem Set 7

November 30, 2015

Problem 1

The median value of height is 67.

Problem 2

Part a.

The mean earnings for those of height ≤ 67 is 44488.44

Part b.

The mean earnings for those of height > 67 is 49987.88

Part c.

We do a hypothesis test to determine whether taller workers earn more, on average.

Suppose there are $n_t$ workers with height > 67, and suppose there are $n_s$ workers with height ≤ 67.

Let $E_1^{tall}, \dots, E_{n_t}^{tall}$ denote the earnings of tall workers, and let $E_1^{short}, \dots, E_{n_s}^{short}$ denote the earnings of short workers. Suppose they are all drawn IID from the respective tall and short populations.

Let $\frac{1}{n_t} \sum_i E_i^{tall} = \bar{E}^{tall}$ denote the sample mean of tall workers, and let $\frac{1}{n_s} \sum_i E_i^{short} = \bar{E}^{short}$ denote the sample mean of short workers.

Let $E[E_i^{tall}] = \mu^{tall}$ denote the population mean of tall workers, and let $E[E_i^{short}] = \mu^{short}$ denote the population mean of short workers.

Then we are interested in the hypotheses:

\[
H_0 : \mu_0^{tall} - \mu_0^{short} = 0 \quad \text{vs.} \quad H_1 : \mu_1^{tall} - \mu_1^{short} \neq 0
\]

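for which we use the following t-statistic

\[
\begin{aligned}
t &= \frac{\bar{E}^{tall} - \bar{E}^{short} - (\mu_0^{tall} - \mu_0^{short})}{SE(\bar{E}^{tall} - \bar{E}^{short})} \\
  &= \frac{\bar{E}^{tall} - \bar{E}^{short} - (\mu_0^{tall} - \mu_0^{short})}{\sqrt{SE(\bar{E}^{tall})^2 + SE(\bar{E}^{short})^2}}
\end{aligned}
\]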

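where $\mu_0^{tall} - \mu_0^{short} = 0$ under the null hypothesis, and

\[
SE(\bar{E}^{tall})^2 = n_t^{-1} \left( \frac{\sum_i (E_i^{tall} - \bar{E}^{tall})^2}{n_t - 1} \right) = n_t^{-1} \hat{\sigma}_{tall}^2
\]

\[
SE(\bar{E}^{short})^2 = n_s^{-1} \left( \frac{\sum_i (E_i^{short} - \bar{E}^{short})^2}{n_s - 1} \right) = n_s^{-1} \hat{\sigma}_{short}^2
\]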

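Plugging in numbers from the dataset,

\[
\begin{aligned}
t &= \frac{\bar{E}^{tall} - \bar{E}^{short} - (\mu_0^{tall} - \mu_0^{short})}{\sqrt{SE(\bar{E}^{tall})^2 + SE(\bar{E}^{short})^2}} \\
  &= \frac{49987.88 - 44488.44 - (0)}{\sqrt{n_t^{-1} \hat{\sigma}_{tall}^2 + n_s^{-1} \hat{\sigma}_{short}^2}} \\
  &= \frac{49987.88 - 44488.44 - (0)}{\sqrt{(7756)^{-1} (26896.56)^2 + (10114)^{-1} (26700.39)^2}} \\
  &= 13.6
\end{aligned}
\]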

which leads to a rejection of the null hypothesis at the 0.001 (0.1%) level. Hence, we conclude that average earnings are different for tall and short workers. Keep in mind, however, that this does not necessarily imply that the difference in earnings is caused by the difference in height.
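As a check on the arithmetic, the t-statistic can be recomputed from the summary figures quoted above. This is a minimal sketch, not the code used for the problem set: the function name `two_sample_t` and its parameter names are our own, and the inputs are the rounded means, standard deviations and sample sizes reported in the text.

```python
import math

def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b, null_diff=0.0):
    """t-statistic for a difference in means, allowing unequal variances."""
    # Standard error of the difference: sqrt(sd_a^2/n_a + sd_b^2/n_b)
    se_diff = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b - null_diff) / se_diff

# Tall workers (height > 67) versus short workers (height <= 67)
t_stat = two_sample_t(49987.88, 26896.56, 7756,
                      44488.44, 26700.39, 10114)
print(round(t_stat, 1))  # 13.6
```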

Part d.

The 95% confidence interval is

\[
(49987.88 - 44488.44) \pm 1.96 \cdot SE(\bar{E}^{tall} - \bar{E}^{short}) = [4707.0, 6291.9]
\]
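The interval can be approximately reproduced from the rounded summary statistics in Part c. The sketch below is ours (the variable names are illustrative); because it starts from rounded inputs, its endpoints land within about a dollar of the reported ones rather than matching them exactly.

```python
import math

# Rounded summary statistics quoted in Part c
mean_tall, sd_tall, n_tall = 49987.88, 26896.56, 7756
mean_short, sd_short, n_short = 44488.44, 26700.39, 10114

diff = mean_tall - mean_short
se_diff = math.sqrt(sd_tall**2 / n_tall + sd_short**2 / n_short)

# Normal-approximation 95% interval: estimate +/- 1.96 * SE
lower = diff - 1.96 * se_diff
upper = diff + 1.96 * se_diff
print(f"[{lower:.1f}, {upper:.1f}]")
```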

Problem 3

Part a.

The estimated slope is 707.7, and the estimated intercept is -512.7.

Part b.

The estimated regression equation is

\[
\widehat{earnings} = \hat{\beta}_0 + \hat{\beta}_1 \cdot height = -512.7 + 707.7 \cdot height
\]

When height is 65, predicted earnings are

\[
-512.7 + 707.7 \cdot (65) = 45488
\]

When height is 67, predicted earnings are

\[
-512.7 + 707.7 \cdot (67) = 46903
\]

When height is 70, predicted earnings are

\[
-512.7 + 707.7 \cdot (70) = 49026
\]
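The three predictions are the fitted line evaluated at each height. A short sketch, assuming the rounded coefficients from Part a (the helper name `predicted_earnings` is ours):

```python
INTERCEPT = -512.7  # estimated intercept from Part a
SLOPE = 707.7       # estimated slope from Part a

def predicted_earnings(height):
    """Fitted value from the estimated regression line."""
    return INTERCEPT + SLOPE * height

for h in (65, 67, 70):
    print(h, round(predicted_earnings(h)))
# 65 45488
# 67 46903
# 70 49026
```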

Part c.

The robust standard error of $\hat{\beta}_1$ is 50.395, and the 95% confidence interval is

\[
707.7 \pm 1.96 \cdot (50.395) = [608.89, 806.45]
\]

Problem 4

Part a.

The estimated slope is $\hat{\beta}_1^{female} = 511$, and the estimated intercept is 12650.9. The standard error of the slope is 97.6, and the 95% confidence interval is

\[
511 \pm 1.96 \cdot (97.6) = [319.9, 702.5]
\]

Part b.

The estimated slope is $\hat{\beta}^{male} = 1307$, and the estimated intercept is -43130. The standard error of the slope is 98.9, and the 95% confidence interval is

\[
1307 \pm 1.96 \cdot (98.9) = [1113.1, 1500.6]
\]
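The slope intervals in Problems 3 and 4 all use the same formula, estimate ± 1.96 × standard error. The sketch below applies it to the rounded estimates quoted in the text. The helper `normal_ci` and the dictionary labels are our own. Its endpoints differ from the reported ones by a few tenths at most, presumably because the reported intervals were computed from unrounded estimates.

```python
def normal_ci(estimate, std_err, z=1.96):
    """Normal-approximation confidence interval: estimate +/- z * SE."""
    half_width = z * std_err
    return estimate - half_width, estimate + half_width

# (slope, standard error) as reported in Problem 3c and Problem 4a/4b
slopes = {
    "all workers": (707.7, 50.395),
    "female": (511, 97.6),
    "male": (1307, 98.9),
}

intervals = {label: normal_ci(b, se) for label, (b, se) in slopes.items()}
for label, (lo, hi) in intervals.items():
    print(f"{label}: [{lo:.1f}, {hi:.1f}]")
```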

Part c.

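We can do a 2-sample t-test to determine this. The hypotheses are

\[
H_0 : \beta_0^{male} - \beta_0^{female} = 0 \quad \text{vs.} \quad H_1 : \beta_0^{male} - \beta_0^{female} \neq 0,
\]

and the t-statistic is

\[
\begin{aligned}
t &= \frac{\hat{\beta}^{male} - \hat{\beta}^{female} - (\beta_0^{male} - \beta_0^{female})}{SE(\hat{\beta}^{male} - \hat{\beta}^{female})} \\
  &= \frac{\hat{\beta}^{male} - \hat{\beta}^{female} - (\beta_0^{male} - \beta_0^{female})}{\sqrt{SE(\hat{\beta}^{male})^2 + SE(\hat{\beta}^{female})^2}} \\
  &= \frac{1307 - 511 - (0)}{\sqrt{97.6^2 + 98.9^2}} \\
  &= 5.72
\end{aligned}
\]

which leads to a rejection of the null hypothesis at the 0.001 (0.1%) level.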
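The same calculation can be sketched in a few lines, assuming the rounded slopes and standard errors from Parts a and b (the function name `slope_difference_t` is ours). The denominator adds the two squared standard errors, which treats the male and female estimates as independent. With rounded inputs the result comes out at about 5.73, which agrees with the reported 5.72 up to rounding.

```python
import math

def slope_difference_t(b_a, se_a, b_b, se_b, null_diff=0.0):
    """t-statistic for the difference between two independent slope estimates."""
    se_diff = math.sqrt(se_a**2 + se_b**2)
    return (b_a - b_b - null_diff) / se_diff

# Male slope and SE from Part b, female slope and SE from Part a
t_stat = slope_difference_t(1307, 98.9, 511, 97.6)
print(round(t_stat, 2))
```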

Alternatively, you could have estimated the equation

\[
earnings_i = \beta_0 + \beta_1 height_i + \beta_2 sex_i + \beta_3 (sex_i \times height_i) + u_i
\]

where sex × height is an “interaction” term which equals height when sex = 1 (male) and equals 0 when sex = 0 (female). Then you would have gotten an estimate of $\hat{\beta}_3 = 795.6$ with a robust standard error of $SE(\hat{\beta}_3) = 138.91$, leading to a t-statistic of 5.73.
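The interaction coefficient lines up with the gap between the male and female slopes because, in a model with both a sex dummy and the sex × height term, the estimate of $\beta_3$ equals the difference between the slopes from the two separate regressions. The sketch below illustrates this on synthetic data. The coefficients, sample size and noise level are invented for the illustration and are not the problem-set dataset. It uses NumPy's least-squares solver and does not compute robust standard errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
sex = rng.integers(0, 2, size=n)       # 1 = male, 0 = female, as in the text
height = rng.normal(67, 4, size=n)     # synthetic heights (assumption)
# Synthetic earnings with a steeper slope for males; all numbers are made up.
earnings = (10000 + 500 * height - 50000 * sex + 800 * sex * height
            + rng.normal(0, 20000, size=n))

def ols(X, y):
    """Ordinary least squares coefficients via numpy.linalg.lstsq."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

ones = np.ones(n)

# Pooled model: intercept, height, sex, sex x height
beta = ols(np.column_stack([ones, height, sex, sex * height]), earnings)

# Separate simple regressions for each group
male = sex == 1
slope_male = ols(np.column_stack([ones[male], height[male]]), earnings[male])[1]
slope_female = ols(np.column_stack([ones[~male], height[~male]]), earnings[~male])[1]

# The interaction coefficient matches the difference in group slopes
print(beta[3], slope_male - slope_female)
```

In the problem set the same pattern shows up in the reported numbers: 1307 - 511 = 796, against an interaction estimate of 795.6.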

Problem 5

Unless this is a very special dataset, there can be many factors that are correlated with height and
affect earnings. For example: race, geography, nutrition, parents’ wealth, etc.
