
STAT305 Main Exam 2023 Solutions

The document is an examination paper for the Biostatistics Methods course (STAT305 W2) at the University of KwaZulu-Natal, dated November 18, 2023, with a total of 100 marks. It includes various questions related to biostatistics concepts such as standardized mortality ratio, incidence and prevalence rates, screening test sensitivity and specificity, survival data challenges, and statistical analyses in case-control studies. The exam consists of multiple questions requiring definitions, explanations, hypothesis testing, and calculations based on provided data.

Uploaded by

naseeha.syed18

SCHOOL OF MATHEMATICS, STATISTICS & COMPUTER SCIENCE

MAIN EXAMINATION
18 November 2023
COURSE AND CODE:
BIOSTATISTICS METHODS (STAT305 W2)
SOLUTIONS

DURATION: 3 Hours TOTAL MARKS: 100

INTERNAL EXAMINER: Dr. DJ Roberts


EXTERNAL EXAMINER: Dr. Y Shiferaw (University of Johannesburg)

THIS EXAM CONSISTS OF 14 PAGES INCLUDING THIS ONE.


PLEASE ENSURE THAT YOU HAVE THEM ALL.

INSTRUCTIONS:
1. Fill in the following:
   Student Number__________________
   Signature________________________
1. Attempt all questions in the space provided.
2. Show all workings where appropriate; do not re-calculate information that is given.
3. Write in ink. Rough work can be done in pencil and will not be marked.
4. Unless otherwise stated, use a 5% level of significance.
5. A formula sheet and statistical tables are attached at the end of the paper. These may be detached.

QUESTION    MARK
1           [16]
2           [8]
3           [18]
4           [9]
5           [22]
6           [27]
TOTAL
University of KwaZulu-Natal, Main Examination: STAT305 W2 2023

QUESTION 1 [16 marks]

1.1) What is a standardized mortality ratio? Discuss two scenarios where it is better to
use this ratio compared to a comparative mortality figure. [4]

SMR is the ratio of the number of deaths observed in the special population over a
given period to the total number that would be expected over the same period if the
special population had the same age-specific rates as the standard population.

SMR is less sensitive to changes in the population and is more appropriate for small
population sizes. In addition, it is more appropriate when the age-specific numbers
of deaths for the special population are unknown.
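
As a minimal sketch of the definition above, the SMR can be computed from the total observed deaths, the age-specific sizes of the special population, and the age-specific rates of the standard population. All numbers below are hypothetical and chosen only for illustration; the function name is an assumption, not part of the exam.

```python
def standardized_mortality_ratio(observed_deaths, stratum_sizes, standard_rates):
    """Observed deaths divided by the deaths expected under the standard rates."""
    # Expected deaths: apply each standard age-specific rate to the
    # matching age stratum of the special population and add them up.
    expected = sum(n * rate for n, rate in zip(stratum_sizes, standard_rates))
    return observed_deaths / expected

# Hypothetical data: three age strata of the special population.
sizes = [1000, 2000, 1500]
standard_rates = [0.002, 0.005, 0.020]   # deaths per person in the standard population
observed = 50                            # only the TOTAL observed deaths is needed

smr = standardized_mortality_ratio(observed, sizes, standard_rates)
print(round(smr, 3))
```

Only the total number of observed deaths enters the calculation, which is why the SMR remains usable when the age-specific deaths of the special population are unknown.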

1.2) Give two basic summary measures used to describe the amount of disease in the
population and briefly explain what each one measures. [4]
Incidence and prevalence rates.

The incidence rate measures the rate at which new cases of a specific condition or
disease occur in a population over a defined period of time. The prevalence rate
measures the total number of existing cases (both new and pre-existing) of a
specific condition or disease in a population at a particular point in time or over a
specific period.

1.3) Define the sensitivity and specificity of screening tests. [4]


Sensitivity, also known as the true positive rate or recall, is the ability of a screening
test to correctly identify individuals who have a particular condition or disease. It is
the proportion of true positives (correctly identified cases) among all individuals who
actually have the condition.
Specificity is the ability of a screening test to correctly identify individuals without a
particular condition. It is the proportion of true negatives (correctly identified non-
cases) among all individuals who do not have the condition.
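
The two definitions can be sketched as simple proportions. The counts used here are hypothetical and serve only to illustrate the arithmetic.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of individuals WITH the condition who test positive."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Proportion of individuals WITHOUT the condition who test negative."""
    return true_negatives / (true_negatives + false_positives)

# Hypothetical screening results: 100 diseased and 200 non-diseased individuals.
sens = sensitivity(true_positives=90, false_negatives=10)
spec = specificity(true_negatives=160, false_positives=40)
print(sens, spec)
```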

1.4) Discuss the unique challenges of survival data. [4]

Time taken to an event can be highly skewed, there it is non-normal data. Censoring
can occur where some individuals drop out or do not experience the event by the end
of the study.


QUESTION 2 [8 marks]

A cohort study investigating the association between high cholesterol level
(>=250 milligrams per deciliter of blood) and coronary heart disease (CHD) was
conducted. The data obtained are summarized in the two-by-two table below.

                           Outcome
Cholesterol level       CHD    No CHD    Total
High (≥250)              10         2       12
Not high (<250)           2         4        6
Total                    12         6       18

The following output was obtained from SAS frequency procedure.

Statistics for Table of Cholesterol by outcome

Statistic                      DF     Value    Prob (two-sided p-value)
Chi-Square                      1    4.5000    0.0339
Likelihood Ratio Chi-Square     1    4.4629    0.0346
Fisher's Exact Test                            0.1070
Mantel-Haenszel Chi-Square      1    4.2500    0.0393
Phi Coefficient                      0.5000

Using the above information answer the following questions.

2.1) State the null and alternative hypotheses to be tested. [2]

$H_0$: There is no association between high cholesterol level and coronary heart disease.

$H_1$: There is an association between high cholesterol level and coronary heart disease.

2.2) Which of the above Statistics is the most appropriate? Explain your answer. [3]
$e_3 = \frac{6 \times 6}{18} = 2$. Therefore, since some of the expected cell counts are less than 5,
we use Fisher's Exact Test.

2.3) Test the hypotheses you stated in part (2.1). This must include the statistic you
choose, the decision you made and the conclusion. [3]

Using Fisher's Exact Test: since p-value = 0.1070 > 0.05, $H_0$ is not rejected. There is no
significant association between high cholesterol level and coronary heart disease.
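
The choice of test and its p-value can be checked with a short standard-library sketch. It computes the expected counts under independence and a two-sided Fisher exact p-value by summing the hypergeometric probabilities of all tables no more likely than the observed one. This is one common definition of the two-sided p-value; the function names are illustrative assumptions.

```python
from math import comb

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table with fixed margins."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    p_observed = prob(a)
    # Sum over all tables that are no more probable than the observed one.
    return sum(prob(x) for x in support if prob(x) <= p_observed * (1 + 1e-9))

cholesterol_table = [[10, 2], [2, 4]]   # rows: High, Not high; columns: CHD, No CHD
expected = expected_counts(cholesterol_table)
p_value = fisher_exact_two_sided(cholesterol_table)
print(expected, round(p_value, 4))
```

The smallest expected count is 2, which is below 5, and the exact p-value agrees with the SAS output above.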


QUESTION 3 [18 marks]

The data below is based on a case-control study investigating the relationship between
lung cancer and employment in shipbuilding (exposure to asbestos). The smoking
status of the individuals was controlled for.

Smoking status    Employment in shipbuilding    Lung cancer: Yes    Lung cancer: No    Total
Non-smoker        Yes                                         11                 35       46
Non-smoker        No                                          50                203      253
                  Total                                       61                238      299
Smoker            Yes                                         14                  3       17
Smoker            No                                          96                 50      146
                  Total                                      110                 53      163

3.1) Using relevant point estimates, show that the rate of lung cancer differs based on
smoking status. [2]

For non-smokers: $\hat{p} = \frac{61}{299} = 20.4\%$

For smokers: $\hat{p} = \frac{110}{163} = 67.5\%$

3.2) Show that the assumption of a common odds ratio of lung cancer for employment
in shipbuilding across the smoking statuses is valid. Clearly state the hypotheses
to be tested. [6]

$H_0: \psi_1 = \psi_2 = \psi$
$H_1: \psi_1 \neq \psi_2$
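$\hat{\psi}_1 = \frac{11(203)}{35(50)} = 1.276$, $\hat{\theta}_1 = \ln 1.276 = 0.244$

$w_1 = \left[\frac{1}{11} + \frac{1}{35} + \frac{1}{203} + \frac{1}{50}\right]^{-1} = 6.925$

$\hat{\psi}_2 = \frac{14(50)}{3(96)} = 2.431$, $\hat{\theta}_2 = \ln 2.431 = 0.888$

$w_2 = \left[\frac{1}{14} + \frac{1}{3} + \frac{1}{50} + \frac{1}{96}\right]^{-1} = 2.298$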


$w = \sum w_i \hat{\theta}_i^2 - \frac{(\sum w_i \hat{\theta}_i)^2}{\sum w_i}$

$= 6.925(0.244)^2 + 2.298(0.888)^2 - \frac{[6.925(0.244) + 2.298(0.888)]^2}{6.925 + 2.298}$

$= 0.716$

Critical value $\chi^2_{1,0.05} = 3.841$

Since $w < 3.841$, $H_0$ is not rejected. The common odds assumption is met.
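
The homogeneity statistic above can be reproduced with a short sketch. Each stratum is given as (a, b, c, d) with a = exposed cases, b = exposed controls, c = unexposed cases and d = unexposed controls; this cell ordering and the function name are assumptions made for the illustration.

```python
from math import log

def homogeneity_statistic(strata):
    """Weighted sum of squared deviations of the stratum log odds ratios."""
    thetas, weights = [], []
    for a, b, c, d in strata:
        thetas.append(log((a * d) / (b * c)))            # stratum log odds ratio
        weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))  # inverse variance weight
    sum_w = sum(weights)
    sum_w_theta = sum(w * t for w, t in zip(weights, thetas))
    sum_w_theta_sq = sum(w * t * t for w, t in zip(weights, thetas))
    return sum_w_theta_sq - sum_w_theta ** 2 / sum_w

# (a, b, c, d) for non-smokers and smokers, read from the table in Question 3.
strata = [(11, 35, 50, 203), (14, 3, 96, 50)]
w_stat = homogeneity_statistic(strata)
print(round(w_stat, 3), w_stat < 3.841)
```

With two strata the statistic has 1 degree of freedom, so it is compared with 3.841 at the 5% level.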

3.3) Estimate the common odds ratio of lung cancer for employment in shipbuilding. [3]

$\hat{\psi}_{MH} = \frac{\sum a_i d_i / n_i}{\sum b_i c_i / n_i} = \frac{\frac{11(203)}{299} + \frac{14(50)}{163}}{\frac{35(50)}{299} + \frac{3(96)}{163}} = 1.544$

3.4) Determine a 95% confidence interval for the common odds ratio of lung cancer for
employment in shipbuilding and use your answer to determine whether
employment in shipbuilding is associated with a higher or lower likelihood of lung
cancer. Give a reason for your answer. [7]
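$W_1 = \frac{35(50)}{299} = 5.853$, $W_2 = \frac{3(96)}{163} = 1.767$

$v_1 = \frac{1}{11} + \frac{1}{203} + \frac{1}{35} + \frac{1}{50} = 0.144$, $v_2 = \frac{1}{14} + \frac{1}{50} + \frac{1}{3} + \frac{1}{96} = 0.435$

$var[\ln(\hat{\psi}_{MH})] = \frac{\sum W_i^2 v_i}{(\sum W_i)^2} = \frac{5.853^2(0.144) + 1.767^2(0.435)}{(5.853 + 1.767)^2} = 0.1084$

$\ln(\hat{\psi}_{MH}) \pm Z_{\alpha/2}\sqrt{var[\ln(\hat{\psi}_{MH})]} = \ln(1.544) \pm 1.96\sqrt{0.1084} = (-0.2109; 1.0796)$

$(\exp(-0.2109); \exp(1.0796)) = (0.810; 2.944)$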

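The point estimate is above 1, which points towards higher odds of lung cancer for those
employed in shipbuilding. However, since the CI contains 1, employment in the shipbuilding
industry is not significantly associated with lung cancer.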
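
The Mantel-Haenszel estimate and the confidence interval from parts 3.3 and 3.4 can be sketched as follows, using the same variance formula as the solution above. The (a, b, c, d) cell ordering and the function names are assumptions for the illustration; small differences in the third decimal arise because the code does not round intermediate values.

```python
from math import exp, log, sqrt

def mantel_haenszel_or(strata):
    """Common odds ratio: sum(a*d/n) / sum(b*c/n) over the strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

def mh_confidence_interval(strata, z=1.96):
    """CI for the common odds ratio using var = sum(W^2 v) / (sum W)^2."""
    big_w = [b * c / (a + b + c + d) for a, b, c, d in strata]
    v = [1 / a + 1 / b + 1 / c + 1 / d for a, b, c, d in strata]
    variance = sum(w * w * vi for w, vi in zip(big_w, v)) / sum(big_w) ** 2
    centre = log(mantel_haenszel_or(strata))
    half_width = z * sqrt(variance)
    return exp(centre - half_width), exp(centre + half_width)

strata = [(11, 35, 50, 203), (14, 3, 96, 50)]   # non-smokers, smokers
psi_mh = mantel_haenszel_or(strata)
ci_lower, ci_upper = mh_confidence_interval(strata)
print(round(psi_mh, 3), round(ci_lower, 3), round(ci_upper, 3))
```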


QUESTION 4 [9 marks]

In a randomized clinical trial on the treatment of leukemia, patients were assigned to
3 doses of drugs which increased from drug 1 to drug 3. Whether or not the drugs
resulted in a decrease in the white blood cell (WBC) count in the patient was recorded.
The data is as follows:

Drug     Decrease in WBC: Yes    Decrease in WBC: No    Total
1                          43                     14       57
2                          65                     34       99
3                          91                     16      107
Total                     199                     64      263

It has been determined that there is a significant association between the dose of drug
and a decrease in WBC, which resulted in a test statistic of 10.502. Determine if there
is a significant linear trend in a decrease in WBC with an increase in the dose. Clearly
state all hypotheses. [9]

$H_0$: There is no trend of $p_i$ on $x_i$

$H_1$: There is a trend of $p_i$ on $x_i$

$x_i$      $r_i$        $n_i$        $r_i x_i$    $n_i x_i$    $n_i x_i^2$
1          43           57           43           57           57
2          65           99           130          198          396
3          91           107          273          321          963
Total      $R = 199$    $N = 263$    446          576          1416

$\chi^2_1 = \frac{N\{N \sum r_i x_i - R \sum n_i x_i\}^2}{R(N - R)\{N \sum n_i x_i^2 - (\sum n_i x_i)^2\}} = \frac{263\{263(446) - 199(576)\}^2}{199(263 - 199)\{263(1416) - (576)^2\}} = 3.634$

Critical value $\chi^2_{1,0.05} = 3.841$

Since $\chi^2_1 < 3.841$, $H_0$ is not rejected. There is not a significant trend.

Since there is no significant trend, there is no need to determine if the trend is linear.
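
The trend statistic can be checked with a small sketch that implements the formula used above, with dose scores 1, 2, 3. The function and variable names are illustrative assumptions.

```python
def chi_square_trend(scores, successes, totals):
    """Chi-square statistic (1 df) for a linear trend in proportions."""
    n_total = sum(totals)                     # N
    r_total = sum(successes)                  # R
    sum_rx = sum(r * x for r, x in zip(successes, scores))
    sum_nx = sum(n * x for n, x in zip(totals, scores))
    sum_nx2 = sum(n * x * x for n, x in zip(totals, scores))
    numerator = n_total * (n_total * sum_rx - r_total * sum_nx) ** 2
    denominator = r_total * (n_total - r_total) * (n_total * sum_nx2 - sum_nx ** 2)
    return numerator / denominator

dose_scores = [1, 2, 3]
decreases = [43, 65, 91]      # patients with a decrease in WBC
patients = [57, 99, 107]      # patients per dose group

trend_stat = chi_square_trend(dose_scores, decreases, patients)
print(round(trend_stat, 3), trend_stat < 3.841)
```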


QUESTION 5 [22 marks]


5.1) Let $T$ be a random variable denoting the time in weeks it takes for a patient to
recover from COVID-19. Suppose $T$ has the following density function:

$f(t) = \begin{cases} 0.9e^{-0.9t} & t > 0 \\ 0 & \text{otherwise} \end{cases}$

a) Derive the survival function. [3]

$S(t) = P(T > t) = 1 - F(t)$

$= 1 - \int_0^t 0.9e^{-0.9x}\,dx$

$= 1 - \left[-e^{-0.9x}\right]_0^t$

$= e^{-0.9t}$

b) What is the probability of a patient taking longer than 5 weeks to recover? [2]

$P(T > 5) = S(5) = e^{-0.9(5)} = 0.0111$

c) What is the median time taken for a patient to recover? [2]

$S(t) = 0.5 = e^{-0.9t}$

$t = \frac{\ln(2)}{0.9} = 0.77$ weeks

d) What is the probability of a patient recovering in week 3 given that they had not
yet recovered before then? [2]

$P(T = 3 \mid T \geq 3) = h(3)$

$= \frac{f(3)}{S(3)} = \frac{0.9e^{-0.9(3)}}{e^{-0.9(3)}}$

$= 0.9$
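
Parts (a) to (d) can be verified numerically with a brief sketch. The constant and function names are assumptions for the illustration; the rate 0.9 comes from the density given in the question.

```python
from math import exp, log

RATE = 0.9   # rate parameter of the exponential density f(t) = 0.9 * exp(-0.9 t)

def density(t):
    return RATE * exp(-RATE * t)

def survival(t):
    # S(t) = exp(-0.9 t), as derived in part (a)
    return exp(-RATE * t)

def hazard(t):
    # h(t) = f(t) / S(t); constant for the exponential distribution
    return density(t) / survival(t)

prob_longer_than_5 = survival(5)
median_time = log(2) / RATE
hazard_at_3 = hazard(3)
print(round(prob_longer_than_5, 4), round(median_time, 2), round(hazard_at_3, 2))
```

The hazard does not depend on $t$, which reflects the memoryless property of the exponential distribution.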


5.2) Suppose the survival times (in months since transplant) for eight patients who
received bone marrow transplants are given below (note ‘*’ denotes a censored
observation):
3, 4.5, 6*, 11, 18.5*, 20, 28* and 36.

a) Calculate the Kaplan Meier estimates for the survival function. [5]
$t_i$    $m_i$    $r_i$    $1 - \frac{m_i}{r_i}$    $\hat{S}_{KM}(t)$
3        1        8        0.875                    0.875
4.5      1        7        0.857                    0.75
11       1        5        0.8                      0.6
20       1        3        0.667                    0.4
36       1        1        0                        0
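
The table above can be reproduced with a minimal Kaplan-Meier sketch. Censored observations reduce the number at risk but do not contribute a factor to the product. The function name and the use of 1/0 flags for event/censored are assumptions for the illustration.

```python
def kaplan_meier(times, observed):
    """Return rows (t_i, m_i, r_i, S_KM(t_i)) at each distinct event time."""
    data = sorted(zip(times, observed))
    at_risk = len(data)
    survival = 1.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [flag for time, flag in data if time == t]
        events = sum(tied)                    # m_i: events at time t
        if events > 0:
            survival *= 1 - events / at_risk  # multiply by (1 - m_i / r_i)
            rows.append((t, events, at_risk, survival))
        at_risk -= len(tied)                  # events and censored both leave the risk set
        i += len(tied)
    return rows

# Survival times in months; 0 marks a censored observation ('*' in the question).
times = [3, 4.5, 6, 11, 18.5, 20, 28, 36]
observed = [1, 1, 0, 1, 0, 1, 0, 1]

km_rows = kaplan_meier(times, observed)
for t, m, r, s in km_rows:
    print(t, m, r, round(s, 3))
```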

b) Plot the survival function based on the estimates obtained in part (a). [4]

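[Figure: Kaplan-Meier step plot of $\hat{S}_{KM}(t)$ (vertical axis, marked at 0.875 and 0.5) against $t$ in months (horizontal axis, marked at 3, 4.5, 11, 20 and 36).]
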
c) Determine a 95% confidence interval for the survival function at 3 months. [4]
$\frac{m}{r(r - m)} = \frac{1}{8(8 - 1)} = 0.0179$

$s.e.[\hat{S}_{KM}(t)] \cong [\hat{S}_{KM}(t)]\sqrt{\sum_{t_i \leq t} \frac{m_i}{r_i(r_i - m_i)}} = 0.875\sqrt{0.0179} = 0.1169$

CI: $0.875 \pm 1.96(0.1169) = (0.646, 1.104)$
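
The standard error formula used above can be sketched as a small function that takes the survival estimate at time $t$ and the $(m_i, r_i)$ pairs for all event times up to $t$. The function name and argument layout are assumptions for the illustration.

```python
from math import sqrt

def km_confidence_interval(s_hat, steps, z=1.96):
    """Plain (untransformed) CI for S_KM(t).

    steps: list of (m_i, r_i) pairs for every event time t_i <= t.
    """
    greenwood_sum = sum(m / (r * (r - m)) for m, r in steps)
    std_error = s_hat * sqrt(greenwood_sum)
    return s_hat - z * std_error, s_hat + z * std_error, std_error

# At t = 3 months only the first event time contributes: m = 1, r = 8.
lower, upper, std_error = km_confidence_interval(0.875, [(1, 8)])
print(round(std_error, 4), round(lower, 3), round(upper, 3))
```

The upper limit exceeds 1 because the plain interval is symmetric around the estimate; since a survival probability cannot exceed 1, such a limit is often reported as 1 in practice.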


QUESTION 6 [27 marks]

6.1) Explain what the coefficients in a logistic regression model tell us:

a) for a continuous explanatory variable. [2]


b) for a categorical explanatory variable. [2]

a) This represents the change in the log-odds of the event of interest occurring for
a one-unit increase in the explanatory variable.
b) This represents the change in the log-odds of the event of interest occurring for
a particular category of the explanatory variable compared to the reference
category.

6.2) A study on the relationship between education level and the cholesterol status of
an individual was undertaken. The following data was obtained:

Education level    Cholesterol: High    Cholesterol: Low    Total
None                              52                  27       79
Primary                           33                  55       88
Secondary                         27                  12       39
Higher                             9                  26       35
Total                            121                 120      241

a) State 2 assumptions of a logistic regression model. [2]

Any two of the following:


− The observations are independent
− The explanatory variables have a linear relationship with the log-odds of the
event of interest occurring
− There is no multicollinearity
− The sample size is sufficient


b) Fit a logistic regression model to the data to model the likelihood of an individual
having high cholesterol. Use higher education as a reference category for
education level. [5]

Design variables:

Education level    $x_1$    $x_2$    $x_3$
None               1        0        0
Primary            0        1        0
Secondary          0        0        1
Higher             0        0        0

odds of high cholesterol (none): $\frac{52/79}{27/79} = 1.9259$

odds of high cholesterol (primary): $\frac{33/88}{55/88} = 0.6$

odds of high cholesterol (secondary): $\frac{27/39}{12/39} = 2.25$

odds of high cholesterol (higher): $\frac{9/35}{26/35} = 0.3462$

(there may be slight rounding differences)

$\widehat{OR}$(none, higher): $\frac{1.9259}{0.3462} = 5.5630 \Rightarrow \hat{\beta}_1 = \ln(5.5630) = 1.72$

$\widehat{OR}$(primary, higher): $\frac{0.6}{0.3462} = 1.7331 \Rightarrow \hat{\beta}_2 = \ln(1.7331) = 0.55$

$\widehat{OR}$(secondary, higher): $\frac{2.25}{0.3462} = 6.4991 \Rightarrow \hat{\beta}_3 = \ln(6.4991) = 1.87$

$\hat{\beta}_0 = \ln(0.3462) = -1.06$

$logit(\hat{y}) = -1.06 + 1.72x_1 + 0.55x_2 + 1.87x_3$
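
With a single categorical predictor, the fitted coefficients are the log odds of the reference category and the log odds ratios against it, so they can be obtained directly from the table. The sketch below works from the unrounded cell counts, so its output can differ in the second decimal place from hand calculations that carry rounded odds. The dictionary layout and function name are assumptions for the illustration.

```python
from math import log

# (high cholesterol, low cholesterol) counts per education level.
counts = {
    "None": (52, 27),
    "Primary": (33, 55),
    "Secondary": (27, 12),
    "Higher": (9, 26),
}

def logit_coefficients(table, reference):
    """Intercept and design-variable coefficients for one categorical predictor."""
    odds = {level: high / low for level, (high, low) in table.items()}
    intercept = log(odds[reference])          # log odds in the reference category
    slopes = {
        level: log(odds[level] / odds[reference])   # log odds ratio vs reference
        for level in table
        if level != reference
    }
    return intercept, slopes

beta_0, betas = logit_coefficients(counts, reference="Higher")
print(round(beta_0, 2), {level: round(b, 2) for level, b in betas.items()})
```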


6.3) For a study using logistic regression to examine the data on rheumatoid arthritis,
we consider age in years of patient as the predictor variable. The response
measured whether the patient showed any improvement (1=yes, 0 =no). The
following partial SAS output was obtained using age to predict the probability of
improvement:

Deviance and Pearson Goodness-of-Fit Statistics

Criterion    Value      DF    Value/DF    Pr > ChiSq
Deviance     10.9549    11
Pearson      19.0054    11

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
Intercept     1       5.289             2.157             6.0124        0.0140
Age           1      -0.122             0.058             4.4244        0.0210

Hosmer and Lemeshow Goodness-of-Fit Test

Chi-Square    DF    Pr > ChiSq
8.9534         6    0.1762

a) Specify the fitted logistic regression model. [2]

$logit(\hat{y}) = 5.289 - 0.122\,Age$

b) Determine the predicted probability of improvement for a patient who is 25 years
old and interpret. [3]

$\hat{\eta} = 5.289 - 0.122(25) = 2.239$

$\hat{\pi} = \frac{e^{2.239}}{1 + e^{2.239}} = 0.9037$

A person who is 25 years old has a 90.37% chance of showing improvement in
rheumatoid arthritis.


c) Find the age at which a patient shows a 75% chance of improvement from
rheumatoid arthritis. [3]
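$\hat{\eta} = 5.289 - 0.122\,Age$

$\frac{e^{\hat{\eta}}}{1 + e^{\hat{\eta}}} = 0.75$

$e^{\hat{\eta}} = 0.75 + 0.75e^{\hat{\eta}}$

$e^{\hat{\eta}} = \frac{0.75}{0.25}$

$\hat{\eta} = 5.289 - 0.122\,Age = \ln 3$

$\therefore Age = 34.35$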
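
Parts (b) and (c) are the two directions of the same relationship: from age to probability, and from probability back to age. A short sketch, with the coefficients taken from the SAS output and illustrative function names:

```python
from math import exp, log

INTERCEPT = 5.289
AGE_SLOPE = -0.122

def predicted_probability(age):
    """Inverse logit of the linear predictor."""
    eta = INTERCEPT + AGE_SLOPE * age
    return 1 / (1 + exp(-eta))

def age_for_probability(p):
    """Solve INTERCEPT + AGE_SLOPE * age = ln(p / (1 - p)) for age."""
    return (log(p / (1 - p)) - INTERCEPT) / AGE_SLOPE

prob_at_25 = predicted_probability(25)
age_at_75_percent = age_for_probability(0.75)
print(round(prob_at_25, 4), round(age_at_75_percent, 2))
```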

d) Does age have a significant effect on the likelihood of a patient showing
improvement from rheumatoid arthritis? Clearly state the hypotheses. [3]

$H_0: \beta_1 = 0$ vs $H_1: \beta_1 \neq 0$
p-value = 0.0210
Since p-value < 0.05, reject $H_0$. Therefore, age has a significant effect on showing
improvement from rheumatoid arthritis.

e) Comment on the presence of overdispersion. [2]

Based on Pearson's chi-square statistic, $\frac{19.0054}{11} = 1.73$, so there is some
overdispersion present.

Based on the deviance statistic, $\frac{10.9549}{11} = 0.9959$, there is no overdispersion.

f) Determine the estimated odds ratio for an increase in age by 10 years. [3]

$\widehat{OR} = \exp(10 \times (-0.122)) = 0.295$
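
The dispersion ratios in part (e) and the odds ratio in part (f) are one-line calculations; a brief sketch with illustrative variable names, using the values from the SAS output:

```python
from math import exp

DEGREES_OF_FREEDOM = 11
AGE_SLOPE = -0.122

# Value/DF ratios: values well above 1 suggest overdispersion.
pearson_ratio = 19.0054 / DEGREES_OF_FREEDOM
deviance_ratio = 10.9549 / DEGREES_OF_FREEDOM

# Odds ratio for a 10-year increase in age: exp(10 * slope).
odds_ratio_10_years = exp(10 * AGE_SLOPE)

print(round(pearson_ratio, 2), round(deviance_ratio, 4), round(odds_ratio_10_years, 3))
```

An odds ratio below 1 means the odds of improvement fall as age increases; here they are multiplied by roughly 0.3 for every additional 10 years.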
