STAT305 Main Exam 2023 Solutions
MAIN EXAMINATION
18 November 2023
COURSE AND CODE:
BIOSTATISTICS METHODS (STAT305 W2)
SOLUTIONS
INSTRUCTIONS:
1. Fill in the following:
   Student Number__________________
   Signature________________________

QUESTION    MARK
1           [16]
2           [8]
3           [18]
1.1) What is a standardized mortality ratio? Discuss two scenarios where it is better to
use this ratio compared to a comparative mortality figure. [4]
SMR is the ratio of the number of deaths observed in the special population over a
given period to the total number that would be expected over the same period if the
special population had the same age-specific rates as the standard population.
SMR is less sensitive to changes in the population and is more appropriate for small
population sizes. In addition, it is more appropriate when the age-specific numbers
of deaths for the special population are unknown.
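The definition above can be illustrated with a minimal sketch. All numbers below are hypothetical and are not taken from the exam; the point is only that the SMR needs the standard population's age-specific rates and the total observed deaths, not the age-specific deaths of the special population.

```python
# Hypothetical illustration: none of these numbers come from the exam.
# Age-specific death rates in the standard population (deaths per person-year).
standard_rates = {"0-39": 0.001, "40-64": 0.005, "65+": 0.030}
# Person-years at risk in the special population, by the same age bands.
study_person_years = {"0-39": 4000, "40-64": 2500, "65+": 500}
# Only the TOTAL observed deaths in the special population is required.
observed_deaths = 40

# Expected deaths if the special population had the standard rates.
expected_deaths = sum(
    standard_rates[age] * study_person_years[age] for age in standard_rates
)
smr = observed_deaths / expected_deaths

print(f"expected deaths = {expected_deaths:.1f}, SMR = {smr:.3f}")
```

An SMR above 1 would indicate more deaths than expected under the standard rates.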
1.2) Give two basic summary measures used to describe the amount of disease in the
population and briefly explain what each one measures. [4]
Incidence and prevalence rates.
The incidence rate measures the rate at which new cases of a specific condition or
disease occur in a population over a defined period of time. The prevalence rate
measures the total number of existing cases (both new and pre-existing) of a
specific condition or disease in a population at a particular point in time or over a
specific period.
Time to an event can be highly skewed, and therefore the data are non-normal. Censoring
can occur when some individuals drop out or do not experience the event by the end
of the study.
University of KwaZulu-Natal, Main Examination: STAT305 W2 2023
QUESTION 2 [8 marks]
A cohort study trying to investigate the association between high cholesterol level
(>=250 milligram per deciliter of blood) and coronary heart disease (CHD) was
conducted. The data obtained is summarized in the following two by two table below.
                       Outcome
Cholesterol level      CHD     No CHD    Total
High (≥250)            10      2         12
Not high (<250)        2       4         6
Total                  12      6         18
2.2) Which of the above Statistics is the most appropriate? Explain your answer. [3]
$e_3 = \frac{6 \times 6}{18} = 2$. Therefore, since some of the expected cell counts are less than 5,
we use Fisher's Exact Test.
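The expected-count check can be sketched in a few lines. This is a minimal illustration using the two-by-two table of Question 2; the variable names are my own.

```python
# Observed counts: rows = cholesterol (high, not high), columns = (CHD, no CHD).
table = [[10, 2],
         [2, 4]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]
smallest = min(min(row) for row in expected)

print(expected)
print("any expected count below 5:", smallest < 5)
```

The smallest expected count is the one computed by hand above, which is why the chi-square approximation is not relied on here.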
2.3) Test the hypotheses you stated in part (2.1). This must include the statistic you
choose, the decision you made and the conclusion. [3]
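A minimal sketch of Fisher's exact test for the table above, using only the standard library. It assumes a two-sided alternative (the hypotheses from part 2.1 are not reproduced here) and uses the common convention of summing the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb

# Observed 2x2 table: a, b = first row; c, d = second row.
a, b, c, d = 10, 2, 2, 4
row1, row2, col1 = a + b, c + d, a + c
n = row1 + row2

def table_weight(x):
    """Number of ways to obtain x in the top-left cell with all margins fixed."""
    return comb(row1, x) * comb(row2, col1 - x)

# Feasible values of the top-left cell given the margins.
lowest = max(0, col1 - row2)
highest = min(row1, col1)
weights = {x: table_weight(x) for x in range(lowest, highest + 1)}
total = sum(weights.values())  # equals comb(n, col1)

# Two-sided p-value: tables whose probability does not exceed the observed one.
p_two_sided = sum(w for w in weights.values() if w <= weights[a]) / total
# One-sided p-value (assumption: alternative of a positive association).
p_one_sided = sum(w for x, w in weights.items() if x >= a) / total

print(f"two-sided p = {p_two_sided:.4f}, one-sided p = {p_one_sided:.4f}")
```

Integer weights are compared rather than floating-point probabilities, which avoids rounding problems when deciding which tables are "as extreme".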
The data below is based on a case-control study investigating the relationship between
lung cancer and employment in shipbuilding (exposure to asbestos). The smoking
status of the individuals was controlled for.
3.1) Using relevant point estimates, show that the rate of lung cancer differs based on
smoking status. [2]
For non-smokers: $\hat{p} = \frac{61}{299} = 20.4\%$
For smokers: $\hat{p} = \frac{110}{163} = 67.5\%$
3.2) Show that the assumption of a common odds ratio of lung cancer for employment
in shipbuilding across the smoking statuses is valid. Clearly state the hypotheses
to be tested. [6]
$H_0: \psi_1 = \psi_2 = \psi$
$H_1: \psi_1 \neq \psi_2$
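\[ \hat{\psi}_1 = \frac{11(203)}{35(50)} = 1.276, \qquad \hat{\theta}_1 = \ln 1.276 = 0.244 \]
\[ w_1 = \left[ \frac{1}{11} + \frac{1}{35} + \frac{1}{203} + \frac{1}{50} \right]^{-1} = 6.925 \]
\[ \hat{\psi}_2 = \frac{14(50)}{3(96)} = 2.431, \qquad \hat{\theta}_2 = \ln 2.431 = 0.888 \]
\[ w_2 = \left[ \frac{1}{14} + \frac{1}{3} + \frac{1}{50} + \frac{1}{96} \right]^{-1} = 2.298 \]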
\[ w = \sum w_i \hat{\theta}_i^2 - \frac{\left( \sum w_i \hat{\theta}_i \right)^2}{\sum w_i} \]
\[ = 6.925(0.244)^2 + 2.298(0.888)^2 - \frac{[6.925(0.244) + 2.298(0.888)]^2}{6.925 + 2.298} \]
\[ = 0.716 \]
Critical value: $\chi^2_{1,0.05} = 3.841$
Since $w < 3.841$, $H_0$ is not rejected. The common odds ratio assumption is met.
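The homogeneity calculation above can be reproduced with a short sketch. The cell counts (a, b, c, d) per stratum are read off the worked odds ratios in this solution; the stratum labels and function name are my own.

```python
import math

# (a, b, c, d) per stratum, so that the stratum odds ratio is a*d / (b*c).
strata = {
    "non-smokers": (11, 35, 50, 203),
    "smokers": (14, 3, 96, 50),
}

def log_or_and_weight(a, b, c, d):
    """Log odds ratio and its inverse-variance weight."""
    theta = math.log((a * d) / (b * c))
    weight = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)
    return theta, weight

pairs = [log_or_and_weight(*cells) for cells in strata.values()]
sum_w = sum(w for _, w in pairs)
sum_w_theta = sum(w * theta for theta, w in pairs)
sum_w_theta2 = sum(w * theta ** 2 for theta, w in pairs)

# Homogeneity statistic, compared with chi-square on (strata - 1) = 1 df.
homogeneity_stat = sum_w_theta2 - sum_w_theta ** 2 / sum_w
critical_value = 3.841  # chi-square, 1 df, 5% level (from the solution)

print(f"statistic = {homogeneity_stat:.3f}")
print("reject H0:", homogeneity_stat > critical_value)
```

Because the code carries full precision, the result agrees with the hand calculation to three decimals.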
3.3) Estimate the common odds ratio of lung cancer for employment in shipbuilding. [3]
\[ \hat{\psi}_{MH} = \frac{\sum a_i d_i / n_i}{\sum b_i c_i / n_i} = \frac{\frac{11(203)}{299} + \frac{14(50)}{163}}{\frac{35(50)}{299} + \frac{3(96)}{163}} = 1.544 \]
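As a check on the arithmetic, the Mantel-Haenszel estimate can be computed directly from the stratum counts used above. This is a minimal sketch; the tuple layout (a, b, c, d) is my own convention.

```python
# (a, b, c, d) per stratum: non-smokers first, then smokers.
strata = [(11, 35, 50, 203), (14, 3, 96, 50)]

numerator = 0.0    # sum of a_i * d_i / n_i
denominator = 0.0  # sum of b_i * c_i / n_i
for a, b, c, d in strata:
    n = a + b + c + d
    numerator += a * d / n
    denominator += b * c / n

or_mh = numerator / denominator
print(f"Mantel-Haenszel odds ratio = {or_mh:.3f}")
```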
3.4) Determine a 95% confidence interval for the common odds ratio of lung cancer for
employment in shipbuilding and use your answer to determine whether
employment in shipbuilding is associated with a higher or lower likelihood of lung
cancer. Give a reason for your answer. [7]
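\[ W_1 = \frac{35(50)}{299} = 5.853, \qquad W_2 = \frac{3(96)}{163} = 1.767 \]
\[ v_1 = \frac{1}{11} + \frac{1}{203} + \frac{1}{35} + \frac{1}{50} = 0.144, \qquad v_2 = \frac{1}{14} + \frac{1}{50} + \frac{1}{3} + \frac{1}{96} = 0.435 \]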
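One way the quantities $W_i$ and $v_i$ above can be combined into a confidence interval is sketched below. It assumes the approximation $\mathrm{var}(\ln \hat{\psi}_{MH}) \approx \sum W_i^2 v_i / (\sum W_i)^2$ and a normal interval on the log scale; that formula is an assumption on my part, since the remaining steps of the worked solution are not shown here.

```python
import math

# (a, b, c, d) per stratum: non-smokers first, then smokers.
strata = [(11, 35, 50, 203), (14, 3, 96, 50)]

sum_ad = sum_w = sum_w2_v = 0.0
for a, b, c, d in strata:
    n = a + b + c + d
    w = b * c / n                      # W_i as defined in the solution
    v = 1 / a + 1 / b + 1 / c + 1 / d  # v_i as defined in the solution
    sum_ad += a * d / n
    sum_w += w
    sum_w2_v += w ** 2 * v

or_mh = sum_ad / sum_w
# ASSUMED variance approximation for ln(OR_MH); not confirmed by the source.
var_log_or = sum_w2_v / sum_w ** 2
se_log_or = math.sqrt(var_log_or)

z = 1.96  # two-sided 95% normal quantile
lower = math.exp(math.log(or_mh) - z * se_log_or)
upper = math.exp(math.log(or_mh) + z * se_log_or)

print(f"OR_MH = {or_mh:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("interval contains 1:", lower < 1 < upper)
```

Whether the interval contains 1 is what decides if the association is statistically distinguishable from no association at the 5% level.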
QUESTION 4 [9 marks]
           Decrease in WBC
Drug       Yes     No      Total
1          43      14      57
2          65      34      99
3          91      16      107
Total      199     64      263
It has been determined that there is a significant association between the dose of drug
and a decrease in WBC, which resulted in a test statistic of 10.502. Determine if there
is a significant linear trend in a decrease in WBC with an increase in the dose. Clearly
state all hypotheses. [9]
𝐻0 : There is no trend of 𝑝𝑖 on 𝑥𝑖
𝐻1 : There is a trend of 𝑝𝑖 on 𝑥𝑖
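x_i      r_i     n_i     r_i x_i    n_i x_i    n_i x_i^2
1        43      57      43         57         57
2        65      99      130        198        396
3        91      107     273        321        963
Total    199     263     446        576        1416

Trend test statistic $= 3.634$
Critical value: $\chi^2_{1,0.05} = 3.841$
Since $3.634 < 3.841$, $H_0$ is not rejected: there is no significant linear trend at the 5% level.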
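The trend statistic can be reproduced with a short sketch. It assumes the standard chi-square test for linear trend in proportions (Cochran-Armitage form) with scores $x_i = 1, 2, 3$; the formula itself is not shown in the solution, but this form reproduces the reported value.

```python
# Dose scores, number with a decrease in WBC, and group sizes (from the table).
scores = [1, 2, 3]
decreased = [43, 65, 91]
group_sizes = [57, 99, 107]

N = sum(group_sizes)
R = sum(decreased)
p_overall = R / N

sum_rx = sum(r * x for r, x in zip(decreased, scores))
sum_nx = sum(n * x for n, x in zip(group_sizes, scores))
sum_nx2 = sum(n * x ** 2 for n, x in zip(group_sizes, scores))

# Chi-square for linear trend on 1 degree of freedom (assumed form).
numerator = (sum_rx - R * sum_nx / N) ** 2
denominator = p_overall * (1 - p_overall) * (sum_nx2 - sum_nx ** 2 / N)
chi2_trend = numerator / denominator

critical_value = 3.841  # chi-square, 1 df, 5% level
print(f"trend chi-square = {chi2_trend:.3f}")
print("significant linear trend:", chi2_trend > critical_value)
```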
$= e^{-0.9t}$
b) What is the probability of a patient taking longer than 5 weeks to recover? [2]
d) What is the probability of a patient recovering in week 3 given that they had not
yet recovered before then? [2]
\[ = \frac{f(t)}{S(t)} = 0.9 \]
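The quantities in parts (b) and (d) can be sketched numerically. This assumes, from the fragments above, that recovery time is exponential with survival function $S(t) = e^{-0.9t}$ (t in weeks); the function names are my own.

```python
import math

RATE = 0.9  # assumed exponential rate per week, read from S(t) = exp(-0.9 t)

def survival(t):
    """P(T > t) for the assumed exponential model."""
    return math.exp(-RATE * t)

def density(t):
    """Probability density f(t) = rate * exp(-rate * t)."""
    return RATE * math.exp(-RATE * t)

# Part (b): probability that recovery takes longer than 5 weeks.
p_longer_than_5 = survival(5)

# Part (d): hazard f(t)/S(t), constant for the exponential distribution.
hazard_at_3 = density(3) / survival(3)

print(f"P(T > 5) = {p_longer_than_5:.4f}")
print(f"hazard at t = 3: {hazard_at_3:.1f}")
```

The hazard does not depend on t here, which is the memoryless property of the exponential model.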
5.2) Suppose the survival times (in months since transplant) for eight patients who
received bone marrow transplants are given below (note ‘*’ denotes a censored
observation):
3, 4.5, 6*, 11, 18.5*, 20, 28* and 36.
a) Calculate the Kaplan Meier estimates for the survival function. [5]
t_i      m_i    r_i    1 - m_i/r_i    Ŝ_KM(t)
3        1      8      0.875          0.875
4.5      1      7      0.857          0.75
11       1      5      0.8            0.6
20       1      3      0.667          0.4
36       1      1      0              0
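The table above can be reproduced with a minimal Kaplan-Meier sketch. It assumes no tied event times (true for these data) and one event per uncensored observation; the data layout is my own.

```python
# (time in months, censored?) for the eight transplant patients.
observations = [
    (3, False), (4.5, False), (6, True), (11, False),
    (18.5, True), (20, False), (28, True), (36, False),
]

def kaplan_meier(obs):
    """Return (time, number at risk, survival estimate) at each event time.

    Assumes no tied times, so each event time has exactly one event.
    """
    rows = []
    at_risk = len(obs)
    survival = 1.0
    for time, censored in sorted(obs):
        if not censored:
            survival *= 1 - 1 / at_risk
            rows.append((time, at_risk, survival))
        at_risk -= 1  # the subject leaves the risk set either way
    return rows

km_rows = kaplan_meier(observations)
for time, at_risk, survival in km_rows:
    print(f"t = {time:>4}: at risk = {at_risk}, S(t) = {survival:.3f}")
```

Censored observations contribute no step of their own; they only reduce the number at risk for later event times.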
b) Plot the survival function based on the estimates obtained in part (a). [4]
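[Figure: step plot of the Kaplan-Meier estimate Ŝ_KM(t) (vertical axis, marked at 0.875 and 0.5) against t in months (horizontal axis, marked at 3, 4.5, 11, 20 and 36).]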
c) Determine a 95% confidence interval for the survival function at 3 months. [4]
\[ \frac{m}{r(r - m)} = \frac{1}{8(8 - 1)} = 0.0179 \]
\[ s.e.[\hat{S}_{KM}(t)] \cong \hat{S}_{KM}(t) \sqrt{\sum_{t_i \le t} \frac{m_i}{r_i(r_i - m_i)}} = 0.875\sqrt{0.0179} = 0.1169 \]
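A minimal sketch of how the standard error above leads to an interval. It assumes a plain normal-approximation interval, $\hat{S} \pm 1.96 \times s.e.$, truncated to the range [0, 1]; the truncation is a common convention and an assumption here, not something stated in the solution.

```python
import math

survival_at_3 = 0.875   # Kaplan-Meier estimate at t = 3 months
deaths, at_risk = 1, 8  # m and r at the only event time up to t = 3

# Greenwood's formula: S(t) * sqrt(sum of m / (r * (r - m))).
greenwood_sum = deaths / (at_risk * (at_risk - deaths))
standard_error = survival_at_3 * math.sqrt(greenwood_sum)

z = 1.96  # two-sided 95% normal quantile
# Truncate to [0, 1] because a survival probability cannot leave that range.
lower = max(0.0, survival_at_3 - z * standard_error)
upper = min(1.0, survival_at_3 + z * standard_error)

print(f"s.e. = {standard_error:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```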
6.1) Explain what the coefficients in a logistic regression model tell us:
a) This represents the change in the log-odds of the event of interest occurring for
a one-unit increase in the explanatory variable.
b) This represents the change in the log-odds of the event of interest occurring for
a particular category of the explanatory variable compared to the reference
category.
6.2) A study on the relationship between education level and the cholesterol status of
an individual was undertaken. The following data was obtained:
                   Cholesterol
Education level    High    Low     Total
None               52      27      79
Primary            33      55      88
Secondary          27      12      39
Higher             9       26      35
Total              121     120     241
b) Fit a logistic regression model to the data to model the likelihood of an individual
having high cholesterol. Use higher education as a reference category for
education level. [5]
Education level    x1    x2    x3
None               1     0     0
Primary            0     1     0
Secondary          0     0     1
Higher             0     0     0
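\[ \text{odds of high chol (none)} = \frac{52/79}{27/79} = 1.93 \]
\[ \text{odds of high chol (primary)} = \frac{33/88}{55/88} = 0.6 \]
\[ \text{odds of high chol (secondary)} = \frac{27/39}{12/39} = 2.25 \]
\[ \text{odds of high chol (higher)} = \frac{9/35}{26/35} = 0.34 \]
\[ \widehat{OR}(\text{none, higher}) = \frac{1.93}{0.34} = 5.6765, \qquad \hat{\beta}_1 = \ln(5.6765) = 1.73 \]
\[ \widehat{OR}(\text{primary, higher}) = \frac{0.6}{0.34} = 1.7647, \qquad \hat{\beta}_2 = \ln(1.7647) = 0.57 \]
\[ \widehat{OR}(\text{secondary, higher}) = \frac{2.25}{0.34} = 6.6176, \qquad \hat{\beta}_3 = \ln(6.6176) = 1.89 \]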
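The coefficients can also be computed straight from the counts. With a single categorical predictor the model is saturated, so the maximum likelihood estimates are the log of the reference-group odds (intercept) and the log odds ratios against the reference. The sketch below uses no intermediate rounding, so its values can differ in the second decimal from a hand calculation that rounds the odds first.

```python
import math

# (high cholesterol, low cholesterol) counts per education level.
counts = {
    "None": (52, 27),
    "Primary": (33, 55),
    "Secondary": (27, 12),
    "Higher": (9, 26),
}
reference = "Higher"

odds = {level: high / low for level, (high, low) in counts.items()}

# Intercept: log-odds of high cholesterol in the reference category.
intercept = math.log(odds[reference])
# Slopes: log odds ratio of each level against the reference category.
betas = {
    level: math.log(odds[level] / odds[reference])
    for level in counts
    if level != reference
}

print(f"intercept = {intercept:.3f}")
for level, beta in betas.items():
    print(f"beta({level}) = {beta:.3f}, OR = {math.exp(beta):.3f}")
```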
6.3) For a study using logistic regression to examine the data on rheumatoid arthritis,
we consider age in years of patient as the predictor variable. The response
measured whether the patient showed any improvement (1=yes, 0 =no). The
following partial SAS output was obtained using age to predict the probability of
improvement:
Deviance 10.9549 11
Pearson 19.0054 11
\[ \hat{\pi} = \frac{e^{2.239}}{1 + e^{2.239}} = 0.9037 \]
c) Find the age at which a patient shows a 75% chance of improvement from
rheumatoid arthritis. [3]
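\[ \hat{\eta} = 5.289 - 0.122\,Age \]
\[ \frac{e^{\hat{\eta}}}{1 + e^{\hat{\eta}}} = 0.75 \]
\[ e^{\hat{\eta}} = 0.75 + 0.75 e^{\hat{\eta}} \]
\[ e^{\hat{\eta}} = \frac{0.75}{0.25} \]
\[ \hat{\eta} = 5.289 - 0.122\,Age = \ln 3 \]
\[ \therefore Age = 34.35 \]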
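The same inversion can be written as a small function, which makes it easy to try other target probabilities. This is a sketch using the fitted linear predictor above; the function names are my own.

```python
import math

INTERCEPT = 5.289  # from the fitted linear predictor
SLOPE = -0.122     # change in log-odds per year of age

def probability_at(age):
    """Predicted probability of improvement at a given age."""
    eta = INTERCEPT + SLOPE * age
    return math.exp(eta) / (1 + math.exp(eta))

def age_for_probability(p):
    """Age at which the predicted probability equals p (inverse of the logit)."""
    logit = math.log(p / (1 - p))
    return (logit - INTERCEPT) / SLOPE

age_75 = age_for_probability(0.75)
print(f"age for a 75% chance of improvement: {age_75:.2f}")
print(f"check: probability at that age = {probability_at(age_75):.2f}")
```

Substituting the solved age back into the model recovers 0.75, which is a quick sanity check on the algebra.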
$H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$
$p$-value $= 0.0210$
Since p-value < 0.05, reject 𝐻0 . Therefore, age has a significant effect on showing
improvement from rheumatoid arthritis.
Overdispersion $= \frac{19.0054}{11} = 1.73$; thus, based on Pearson's chi-square statistic,
there is some overdispersion present.
Based on the deviance statistic, $\frac{10.9549}{11} = 0.9959$, so there is no overdispersion.
f) Determine the estimated odds ratio for an increase in age by 10 years. [3]
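A minimal sketch of this calculation, assuming the age coefficient of -0.122 from the fitted linear predictor in part (c). For a k-unit increase in a continuous predictor the odds ratio is $e^{k\hat{\beta}}$.

```python
import math

SLOPE = -0.122      # assumed: age coefficient from the linear predictor in part (c)
YEARS = 10          # size of the age increase

# Odds ratio for a 10-year increase in age: exp(10 * beta).
odds_ratio_10_years = math.exp(YEARS * SLOPE)
# Equivalent route: raise the one-year odds ratio to the 10th power.
odds_ratio_1_year = math.exp(SLOPE)

print(f"OR per year = {odds_ratio_1_year:.4f}")
print(f"OR per 10 years = {odds_ratio_10_years:.4f}")
```

A value below 1 means the odds of improvement fall as age increases.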