Practice Questions - Final With Feedback

Uploaded by

tanisha.m.dighe

EC1108

Professor Michael Spagat

Spring 2022

Questions for the Wednesday, April 19 Revision Session with Answers

A test for COVID-19 has the following properties:

- 95% of individuals who have the virus test positive

- 8% of individuals who do not have the virus test positive

40% of the population has the virus. What is the probability that an individual who tests
positive really does have the virus?

Answer – It’s helpful to imagine many people, say 100,000, who all take a COVID test.

Of these, 40,000 have the virus (40%)

Of these, 0.95*40,000 = 38,000 test positive

60,000 out of the original 100,000 do not have the virus

Of these, 0.08*60,000 = 4,800 test positive

So the probability that a person who has tested positive actually has the virus is
38,000/(38,000 + 4,800) = 0.89
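The same answer can be reached directly from the probabilities, without imagining 100,000 people. The following is a minimal sketch of that calculation in R; the variable names are illustrative choices, not part of the question.

```r
# Inputs taken from the question
prevalence  <- 0.40   # share of the population that has the virus
sensitivity <- 0.95   # P(test positive | has the virus)
false_pos   <- 0.08   # P(test positive | does not have the virus)

# Probability of each route to a positive test
true_positives  <- sensitivity * prevalence
false_positives <- false_pos * (1 - prevalence)

# Bayes' rule: share of all positive tests that come from infected people
p_virus_given_positive <- true_positives / (true_positives + false_positives)
round(p_virus_given_positive, 2)
```

This is the same ratio as 38,000/(38,000 + 4,800), just expressed per person rather than per 100,000 people.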

The following table reports data on public health expenditures and COVID-19 death
rates for a selected set of countries.

Country           Public health expenditure   COVID-19 death rate
                  as % of GDP                 in %
Kapitaland        2%                          6%
Unified Islands   4%                          4.5%
Scandiland        8%                          2%
Medi              4%                          5%
Teutonic          6%                          3%

Provide R code to compute the correlation between public health expenditures and the
COVID-19 death rate.

Answer

health <- c(2,4,8,4,6)

death <- c(6, 4.5, 2, 5, 3)

cor(health, death)

If you actually ran this in R, which you would not be able to do on the final, you would find
a strong negative correlation of -0.99 between health expenditure as a percentage of GDP and
the COVID-19 death rate.
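If you want to see what cor() is doing, the sketch below computes the Pearson correlation from deviations around the means and compares it with the built-in function. The names dev_h, dev_d and r_manual are illustrative only.

```r
health <- c(2, 4, 8, 4, 6)
death  <- c(6, 4.5, 2, 5, 3)

# Deviations of each observation from its own mean
dev_h <- health - mean(health)
dev_d <- death - mean(death)

# Pearson correlation: sum of cross-products over the
# square root of the product of the sums of squares
r_manual <- sum(dev_h * dev_d) / sqrt(sum(dev_h^2) * sum(dev_d^2))
round(r_manual, 2)

# Check against the built-in function
all.equal(r_manual, cor(health, death))
```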

In a randomly selected sample of 1,000 people, the number of individuals testing
positive for COVID-19 antibodies is 342. Find the lower bound of a confidence interval
at the 95% level for the fraction of the population that has COVID-19 antibodies. Note –
you can answer this question without any R code; please do so for this revision session.

Answer
p = 342/1000 = 0.342

In words, the central estimate for this proportion is 0.342

Note - We have quite a large sample so we can use the normal distribution. You should
remember that the critical value for a 95% confidence interval using the normal
distribution is 1.96. If you forget this fact you could write the code qnorm(0.975) into
your answer.

Compute the confidence interval

$$
\text{CI} = p \pm z_{\alpha}\sqrt{\frac{p(1-p)}{n}} = 0.342 \pm 1.96\sqrt{\frac{0.342(1-0.342)}{1000}} = 0.342 \pm 1.96 \cdot 0.015001 = [0.3126,\ 0.3714]
$$

The confidence interval for the estimate of the population proportion of individuals
testing positive runs from 31.26% to 37.14%. The lower bound is 31.26%.
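Although the question asks for an answer without R, the calculation can be checked afterwards with a few lines. This is a sketch using the normal approximation described above; the variable names are illustrative.

```r
n         <- 1000
positives <- 342

p_hat <- positives / n                   # central estimate of the proportion
z     <- qnorm(0.975)                    # two-sided 95% critical value, about 1.96
se    <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of the proportion

ci <- c(lower = p_hat - z * se, upper = p_hat + z * se)
round(ci, 4)
```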

A government official claims that each of the 100 test centres in the country conducts, on
average, 1,000 COVID-19 tests per week. You obtain data from a random sample of 9
testing centres and find that, on average, they have conducted 820 tests per week with
a sample variance of 144. Assume that the distribution of tests per centre is normal and
provide R code that will help you determine whether, at a 5% significance level, the
official claim is valid. State how acceptance or rejection of the government claim will
depend on the result you would get from running your R code.

Answer

This is a small sample so we should use the t distribution rather than the normal
distribution.

Step 1 – State the null and alternative hypotheses

We should conduct a one-sided test of the hypothesis that mean tests per week are
1,000 against the alternative hypothesis that mean tests per week are less than 1,000.

Step 2 – Determine the Critical Value

It’s a one-sided test with 5% significance so we need to find the value for which 5% of
the area of a t distribution with 9 – 1 = 8 degrees of freedom lies to the left. We get this
from:

qt(0.05, 8)

Step 3: Calculate the test statistic

You can do this on a calculator, or even with paper and pencil in this case, if you want.

Probably it’s helpful to do this calculation in two stages:

i. Calculate the standard error. This is the sample standard deviation divided by the
square root of the sample size. The sample standard deviation is the square root of the
sample variance so the standard error is:

12/3 = 4

ii. The test statistic is the difference between the sample mean and the null-hypothesis
mean divided by the standard error.

(820 – 1000)/4 = -45

In R you could get here with the ugly looking:

(820 - 1000)/((144^0.5)/9^0.5)

Step 4 – Draw a conclusion

The question is whether -45 is smaller than qt(0.05, 8). If so, we reject at the
5% level the hypothesis that the mean number of tests at each centre is 1,000 in favour
of the hypothesis that this mean is less than 1,000.

FYI, if you compute qt(0.05, 8) in R you get -1.86, so the null hypothesis would get
rejected by a country mile. But on the test it would be good enough to leave your
answer with the previous paragraph.
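The four steps can be put together in one short script. This is a sketch built only from the summary statistics in the question; the variable names are illustrative. Note that qt(0.05, 8) is a left-tail critical value, so it is negative.

```r
n           <- 9      # number of centres sampled
sample_mean <- 820    # average tests per week in the sample
sample_var  <- 144    # sample variance
null_mean   <- 1000   # the official's claim

# Standard error: sample standard deviation over the square root of n
se <- sqrt(sample_var) / sqrt(n)

# Test statistic and left-tail critical value with n - 1 degrees of freedom
t_stat <- (sample_mean - null_mean) / se
t_crit <- qt(0.05, df = n - 1)

# Reject the claim if the statistic falls below the critical value
reject_null <- t_stat < t_crit
c(t_stat = t_stat, t_crit = round(t_crit, 2))
reject_null
```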

Among a random sample of 100 students, 20 test positive for COVID-19. A politician
claims that at least 1/3rd of students in the country have COVID. Provide R code that
would test the politician’s claim at a 5% significance level.

Answer –

Step 1: formulate the hypothesis

This will be a one-sided test (note the phrase “at least” above). The null hypothesis that
1/3rd of students have COVID will be tested against the alternative hypothesis that less
than 1/3rd of students have COVID.

Step 2: determine the critical value

100 students can be considered a large sample so we can use the normal distribution.
However, it is totally fine to use the t instead. The two will hardly differ.

For the normal the R command is:

qnorm(0.05)

For the t the R command is:

qt(0.05, 99)

FYI, these two numbers differ by only about 0.015.

Step 3: calculate the test statistic

$$
z = \frac{p - \pi}{\sqrt{\frac{\pi(1-\pi)}{n}}} = \frac{0.2 - 1/3}{\sqrt{\frac{(1/3)(1-1/3)}{100}}} = -2.8284
$$

Step 4: draw a conclusion

If the test statistic, -2.8284, is less than the left-tail critical value qnorm(0.05), which
is about -1.645, then reject the null hypothesis.

If you run the R code you see that we would reject the politician’s claim in favour of the
alternative that less than a third of students nationwide have COVID.
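Here is one way the whole test might be written out in R. It is a sketch based on the numbers in the question, with illustrative variable names; it also computes the t critical value so you can see how little the choice between the normal and the t matters here.

```r
n     <- 100      # sample size
p_hat <- 20 / n   # sample proportion testing positive
pi0   <- 1 / 3    # proportion under the null hypothesis

# Test statistic, using the standard error implied by the null hypothesis
z_stat <- (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)

# Left-tail critical values at the 5% level
z_crit <- qnorm(0.05)
t_crit <- qt(0.05, df = n - 1)

round(c(z_stat = z_stat, z_crit = z_crit, t_crit = t_crit), 4)

# Reject the politician's claim if the statistic is below the critical value
reject_null <- z_stat < z_crit
reject_null
```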

A scientist wants to assess whether age affects the effectiveness of a drug. She runs a
regression on a large dataset where a measure of drug effectiveness is the response
variable and Age is the explanatory variable. She obtains the following results:

            Coef.     Std. Err.
Age         -0.200    0.110
Constant     0.512    0.443

Calculate the t-statistic on Age for a 5% significance test. Decide whether age is likely to
be informative regarding the effectiveness of the drug. How would your answer change
if the estimated coefficient came out to be 0.100 or 0.350? Note – you do not need R
code to answer this question.

Answer

Step 1: Formulate the hypothesis

This is a two-sided hypothesis

Ho: beta=0 Ha: beta ≠0

Step 2: Determine the critical value

The question states that the dataset is large. So, although you are asked to do a t test,
we can still use the normal distribution to get the critical value. In fact, you should
probably just remember that the critical value for the normal distribution for 5%
significance in a two-sided test is 1.96. But, if not, you can use the following, which
returns the lower-tail value -1.96 (the upper-tail value is +1.96 by symmetry):

qnorm(0.025)

Step 3: Calculate the test statistic:

t = -0.200/0.110 = -1.8182

Step 4: Draw a conclusion:

The test statistic is smaller, in absolute value, than the critical value, so you would not
reject the null hypothesis at the 5% level.

If the coefficient is 0.100: test statistic = 0.9091, so again do not reject the null hypothesis.

If the coefficient is 0.350: test statistic = 3.1818, so reject the null hypothesis that age is
not related to the effectiveness of the drug in favour of the alternative hypothesis that there
is a relationship between age and the effectiveness of the drug.
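You do not need R for this question, but the three cases can be checked side by side with a short sketch like the one below. The variable names are illustrative, and the standard error is held at 0.110 for all three coefficients, as the question implies.

```r
se_age <- 0.110                      # standard error on Age from the table
coefs  <- c(-0.200, 0.100, 0.350)    # estimated and hypothetical coefficients

# t-statistic for the null hypothesis that the coefficient is zero
t_stats <- coefs / se_age

# Two-sided 5% critical value from the normal distribution (large sample)
z_crit <- qnorm(0.975)

# Reject when the statistic exceeds the critical value in absolute value
reject <- abs(t_stats) > z_crit
data.frame(coef = coefs, t_stat = round(t_stats, 4), reject_null = reject)
```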

A researcher wants to investigate the relationship between economic development and
health, so she downloads the following dataset, called “development_dataset”, giving life
expectancy and GDP per capita in 2018 for 20 countries.

Country          Life_Expectancy   GDPpc
Bangladesh       73.4               4.440
Belarus          74.3              18.900
Belgium          81.5              51.200
Benin            64.9               3.160
Bolivia          73.1               8.660
Italy            83.3              42.200
Japan            84.4              41.100
Jordan           79.5               9.850
Kazakhstan       72.5              25.500
Kenya            66.3               4.200
Netherlands      81.7              56.500
New Zealand      81.8              42.600
Nicaragua        79                 5.690
Niger            62.8               1.200
Nigeria          64.8               5.160
United States    78.6              61.400
Uruguay          77.2              21.600
Uzbekistan       70.5               6.760
Vietnam          74.6               7.590
Zambia           63.7               3.520

1. Write down the regression equation for Life expectancy as a function of GDP per
capita.

Answer

Life_Expectancy = a + b*GDPpc + e

2. Provide R code that will estimate the intercept and the slope of the linear model.

Answer

lm(development_dataset$Life_Expectancy ~ development_dataset$GDPpc)

3. If your answer to part 2 does not already do so, then provide R code that will give an
estimate of the R² for your regression.

Answer

The code in part 2 will only give you the coefficient estimates. You get a lot more
information, including the R², by using the summary() command:

summary(lm(development_dataset$Life_Expectancy ~ development_dataset$GDPpc))
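If you want to try this outside the exam, the sketch below types the table into a data frame and fits the same model. It assumes the data frame has the same name and column names as in the question; the object name fit is an illustrative choice. It uses the data argument of lm() instead of the $ notation, which estimates the same regression and gives shorter coefficient labels.

```r
# Recreate the dataset from the table in the question
development_dataset <- data.frame(
  Country = c("Bangladesh", "Belarus", "Belgium", "Benin", "Bolivia",
              "Italy", "Japan", "Jordan", "Kazakhstan", "Kenya",
              "Netherlands", "New Zealand", "Nicaragua", "Niger", "Nigeria",
              "United States", "Uruguay", "Uzbekistan", "Vietnam", "Zambia"),
  Life_Expectancy = c(73.4, 74.3, 81.5, 64.9, 73.1,
                      83.3, 84.4, 79.5, 72.5, 66.3,
                      81.7, 81.8, 79, 62.8, 64.8,
                      78.6, 77.2, 70.5, 74.6, 63.7),
  GDPpc = c(4.440, 18.900, 51.200, 3.160, 8.660,
            42.200, 41.100, 9.850, 25.500, 4.200,
            56.500, 42.600, 5.690, 1.200, 5.160,
            61.400, 21.600, 6.760, 7.590, 3.520)
)

# Fit the linear model Life_Expectancy = a + b*GDPpc + e
fit <- lm(Life_Expectancy ~ GDPpc, data = development_dataset)

coef(fit)                # intercept (a) and slope (b)
summary(fit)$r.squared   # the R-squared on its own
```

Calling summary(fit) prints the full regression output, while summary(fit)$r.squared pulls out just the one number asked for in part 3.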
