Biostatistics 541/699, Exam 2: Solutions
699 Students: Please complete the indicated parts of problems 1 and 5.
541 Students: Please complete all parts of all questions.
1. [14pts, 541 & 699] True or False (no need to explain your answers):
(a) (541 only) The χ² test of independence for a 2 by 10 table has 9 degrees of freedom.
True.
(b) (541 only) If the estimate for the population standard deviation does not change, the 95% confidence interval
for the population mean µ will get narrower as the number of observations increases.
True.
(c) (541 only) The 95% confidence interval for the population mean µ will always contain µ if all necessary
assumptions are satisfied.
False.
(d) (541 & 699) The log transformation will always transform right skewed data to symmetric data.
False.
(e) (541 & 699) In a two sample t-test, different data transformations can be used for the two samples if
necessary.
False.
(f) (541 only) The p-value for a chi-squared test is calculated using only one tail of the distribution.
True.
(g) (541 & 699) McNemar’s test is used to compare two proportions calculated from independent samples.
False.
2. [26pts, 541 only] The American Hospital Association reports that the mean cost to community hospitals per
patient per day in U.S. hospitals was $2035 last year. In that same year, a random sample of 25 Massachusetts
hospitals yielded a mean daily cost of $2100 with a sample standard deviation of $160. It is known that the
distribution of cost per patient per day is symmetric in the population of community hospitals. Is there evidence
that the cost of Massachusetts hospitals is different from the rest of the country?
(a) [4pts] State the null and alternative hypotheses (define any symbols you use).
H0 : µ = 2035    HA : µ ≠ 2035
µ = the mean cost per day per patient at Massachusetts hospitals.
(b) [5pts] Calculate the test statistic.
t = (x̄ − µ0)/(s/√n) = (2100 − 2035)/(160/5) = 2.0312
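As a quick check of the arithmetic in part (b), the statistic can be recomputed from the summary values given in the problem. This is a minimal sketch; the variable names are illustrative and not part of the exam.

```python
import math

# Summary values stated in problem 2
x_bar = 2100.0   # sample mean daily cost, Massachusetts hospitals
mu_0 = 2035.0    # national mean cost under H0
s = 160.0        # sample standard deviation
n = 25           # number of hospitals sampled

standard_error = s / math.sqrt(n)          # 160 / 5 = 32
t_stat = (x_bar - mu_0) / standard_error   # one-sample t statistic
df = n - 1                                 # degrees of freedom

print(f"t = {t_stat:.4f} on {df} degrees of freedom")
```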
(c) [5pts] What is the distribution of the test statistic? (no need to explain your answer)
A t distribution with 24 degrees of freedom.
(d) [8pts] Draw a picture of the p-value calculation:
i. Draw a picture of the distribution of the test statistic. Your picture should reflect the general shape of
the distribution and whether it is symmetric or not.
ii. Shade (fill in) the area(s) under the curve that correspond(s) to the p-value (this will be approximate).
iii. Indicate the probability contained in the shaded area(s).
iv. Label the following on the x-axis (horizontal axis) in your picture, USING NUMBERS not symbols:
• the mean of the distribution of the test statistic.
• the value(s) that define the shaded area.
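[Figure: t distribution with 24 degrees of freedom, symmetric about 0. The two tails beyond −2.03 and 2.03 are shaded, and each shaded tail contains probability 0.0285. The x-axis is labeled at −2.03, 0, and 2.03.]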
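The shaded tail area can also be approximated numerically. The sketch below integrates the t density with 24 degrees of freedom using Simpson's rule and only the Python standard library. The helper names are hypothetical. Because it uses the unrounded statistic 2.03125, the result may differ slightly from the approximate value shown in the picture.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule integral of the density on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_density(i * h, df)
    return 0.5 - total * h / 3

one_tail = upper_tail(2.03125, 24)
p_value = 2 * one_tail  # two-sided alternative, so both tails count
print(f"one tail = {one_tail:.4f}, two-sided p-value = {p_value:.4f}")
```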
3. [14pts, 541 only] For a class of 200 students there are 2 teaching assistants (TAs). The professor wonders if the
TAs will give similar grades on the first exam. To answer this question she chooses 20 of the 200 ungraded exams.
She gives the 20 exams to the first TA and asks him to grade them without making any marks on them. He then
gives the same 20 exams to the second TA with the same instructions.
(a) [4pts] The professor wants to know how much evidence there is that the TAs are grading differently. What
method should she use? (No need to explain your answer)
Paired t-test
(b) [5pts] What assumptions are required so that the method in part (a) will give accurate results? (No need to
explain your answer)
1) The mean difference in grades is approximately normally distributed.
2) The 20 exams are a simple random sample from the set of 200 exams.
(c) [5pts] The 95% confidence interval for the difference in the mean grade assigned by the two TAs
is (2.1, 4.5).
If you were to calculate the p-value for the test of the difference in the mean grades assigned by the two TAs
would it be less than 0.05? (Yes, No or Can’t tell; no explanation necessary)
Yes. (0 is not in the 95% confidence interval)
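The reasoning in part (c) rests on the duality between a 95% confidence interval and a two-sided test at the 0.05 level. As a rough sketch, and assuming the interval was built as (mean difference) ± t* × (standard error) with 19 degrees of freedom, the test statistic can be backed out of the interval. The critical value 2.093 is the standard tabulated t value for 19 degrees of freedom; the variable names are illustrative.

```python
# 95% confidence interval for the mean difference, from part (c)
lower, upper = 2.1, 4.5

# Tabulated critical value t*(0.975, df = 19); assumes the usual paired t interval
t_crit = 2.093

mean_diff = (lower + upper) / 2        # centre of the interval
margin = (upper - lower) / 2           # equals t_crit * standard error
standard_error = margin / t_crit
t_stat = mean_diff / standard_error    # statistic for H0: mean difference = 0

# 0 lies outside the interval exactly when |t_stat| exceeds t_crit
zero_in_interval = lower <= 0 <= upper
reject_at_5_percent = abs(t_stat) > t_crit
print(zero_in_interval, reject_at_5_percent)
```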
4. [18pts, 541 only] In an investigation of pregnancy-induced hypertension, a group of 17 women with the disorder
were treated with low-dose aspirin plus standard care. A second group of 24 women were given the standard care.
In the aspirin group 13 women had a positive response to treatment. In the standard care only group 8 women
had a positive response to treatment.
(a) [5pts] Calculate an estimate for the population proportion of successful responses for the aspirin group minus
the population proportion of successful responses for the standard care only group.
p̂1 − p̂2 = 13/17 − 8/24 = 0.431
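The point estimate in part (a) can be reproduced directly from the counts in the problem. A minimal sketch with illustrative variable names:

```python
# Counts from problem 4
aspirin_pos, aspirin_n = 13, 17      # aspirin plus standard care
standard_pos, standard_n = 8, 24     # standard care only

p_hat_1 = aspirin_pos / aspirin_n    # sample proportion, aspirin group
p_hat_2 = standard_pos / standard_n  # sample proportion, standard care only group
diff = p_hat_1 - p_hat_2

print(f"{p_hat_1:.3f} - {p_hat_2:.3f} = {diff:.3f}")
```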
(b) [4pts] Construct the 2 by 2 table used to test the null hypothesis that the fraction of positive responses is the
same in each group. Label the rows and columns. You do not need to actually do the test.
                     aspirin    standard care only
positive response      13              8
negative response       4             16
total                  17             24
(c) [4pts] What type of test is used to test the null hypothesis in part (b)? (No need to explain your answer)
χ² test of independence
(d) [5pts] What assumptions are required for the hypothesis test in part (c) to give accurate results? (No need
to explain your answer)
1) The expected number in each table cell is > 1
2) The expected number in at least 80% of cells is ≥ 5 (Note: The observed value in the aspirin/negative cell is less
than 5 but the expected number is (4 + 16)(13 + 4)/41 = 8.29).
3) The two groups are simple random samples from the population of women with pregnancy-induced
hypertension
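The expected counts referred to in the first two assumptions can be computed for every cell as (row total × column total) / (grand total). The sketch below does this for the table in part (b); the data layout and names are illustrative.

```python
# Observed counts: rows = response (positive, negative),
# columns = group (aspirin, standard care only)
observed = [[13, 8],
            [4, 16]]

row_totals = [sum(row) for row in observed]          # [21, 20]
col_totals = [sum(col) for col in zip(*observed)]    # [17, 24]
grand_total = sum(row_totals)                        # 41

# Expected count under independence for each cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

cells = [e for row in expected for e in row]
all_above_one = all(e > 1 for e in cells)
share_at_least_five = sum(e >= 5 for e in cells) / len(cells)

for row in expected:
    print([round(e, 2) for e in row])
print(all_above_one, share_at_least_five)
```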
5. [28pts, 541 & 699] For each part (a-g) enter the method that you would use to answer the question and additional
information as described. You may assume symmetric population distributions for continuous random variables
and that the expected counts are ≥ 5 in any contingency table.
IN ALL CASES you must define any symbols you use in terms of the problem (e.g. x, n, x̄, µ, σ, p, p0 , p̂, t, z).
(a) [4pts] (541 only) In a study of physical therapy patients a simple random sample of 300 patients with knee
pain was obtained. After treatment, each patient was asked to rate their pain as “decreased”, “no change”
or “increased”. The patients' ages were divided into the 3 categories 15-25, 26-45, and 46 and above. The
researchers would like to know if the effectiveness of the treatment is different for different age groups.
Method: χ² test of independence
H0 : Age group and reported pain group are independent.
(b) [4pts] (541 & 699) A simple random sample of 50 liver transplant patients is obtained. The amount of
bilirubin in each patient is measured using both the established method and a proposed new method. The
researcher would like to know if there is evidence that the two methods differ in the amount of bilirubin
measured.
Method: Paired t-test.
H0 : µ1 = µ2
µ1 = the population mean bilirubin using the established method
µ2 = the population mean bilirubin using the new method
(c) [4pts] (541 & 699) Two simple random samples were obtained: 100 lung cancer patients and 200 healthy
controls. Each patient was asked whether they had ever smoked cigarettes. Researchers would like to know
whether there is a difference between the fraction of smokers in the lung cancer group and in the healthy
group.
(d) [4pts] (541 only) For the smoking data in part (c) the researchers would like to estimate the fraction of
smokers in each group.
Method: Estimate for one proportion using the data in each group separately.
p̂1 = x1 /n1 where x1 = the number of smokers in the lung cancer group and n1 is the total number in the
lung cancer group.
p̂2 = x2 /n2 where x2 = the number of smokers in the control group and n2 is the total number in the control
group.
(e) [4pts] (541 only) A simple random sample of size 288 was obtained from all births in a particular city in a
four week period. The day of the week when the births occurred is recorded. The investigator would like to
know if there is an equal probability of a birth occurring on each day of the week.
(f) [4pts] (541 & 699) Researchers are interested in evaluating the effectiveness of a once-a-week educational
program designed to prevent or stop smoking in teenagers. A simple random sample of 25 families with twin
children in high school is obtained. In each family, a randomly selected twin child is assigned to attend the
educational program. The other twin does not attend. Two months after the end of the program the smoking
status of each of the twins is assessed via a blood sample. The investigator would like to test whether the
proportion of smokers among the twins who attended the educational program is different from the proportion of
smokers among the twins who did not.
(g) [4pts] (541 only) In a particular strain of mice it is thought that 90% of offspring survive past their first
day. It is known that birth order does not affect survival. An experimenter observes a simple random sample
of 200 litters and records whether the first pup born survives or not. She wishes to test the claim that there
is a 90% one-day survival rate.
Method: Test of one proportion
H0 : p = 0.9 where p is the probability of a pup surviving their first day.
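For part (g), the usual large-sample statistic is z = (p̂ − p0)/√(p0(1 − p0)/n), where p̂ is the sample proportion of first pups that survive and n is the number of litters. The sketch below implements it with the standard library only. The problem gives no observed count, so the value x = 172 is purely hypothetical, as is the function name.

```python
import math

def one_proportion_z_test(x, n, p0):
    """Large-sample z test of H0: p = p0 against a two-sided alternative."""
    p_hat = x / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)   # uses p0, the value under H0
    z = (p_hat - p0) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))      # 2 * P(Z > |z|) for standard normal Z
    return z, p_value

# n = 200 litters and p0 = 0.9 come from the problem; x = 172 is a hypothetical count
z, p_value = one_proportion_z_test(x=172, n=200, p0=0.9)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
```

The normal approximation is reasonable here because n·p0 = 180 and n·(1 − p0) = 20 are both well above 5.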