
Kubsa Guyo Advance Biostatistic

This document appears to be a course assignment submitted by Kubsa Guyo Boru for an Advanced Biostatistics course at Salale University College of Health Science Department of Pediatrics. The assignment contains two questions analyzing cholesterol and forced expiratory volume (FEV) data using statistical tests like one-way ANOVA, simple and multiple linear regression. For question 1, the student checks assumptions, conducts a one-way ANOVA on cholesterol data, and identifies values used to calculate the between groups mean square, within groups mean square, and F-statistic. For question 2, the student checks distribution assumptions, performs simple linear regression of FEV on age and reports findings, calculates the coefficient of determination, and fits a multiple regression model

Uploaded by

Kubsa Guyo

SALALE UNIVERSITY

COLLEGE OF HEALTH SCIENCE


DEPARTMENT OF PEDIATRICS

Course: Advanced Biostatistics

By: Kubsa Guyo Boru (BSc, MSc candidate)

ID No.: RM0083/15

Department: Pediatrics

Submitted to: Mrs. Dube Jara (BSc in PH, MPHE, Assistant Professor, PhD Candidate)

January 2023
Fiche, Ethiopia
QUESTION 1: Data description: cholesterol level dataset.
An experiment was carried out to investigate the effect of four different
cholesterol-reducing diets in persons with hypercholesterolemia. Thirty-two
people were randomly assigned to one of the 4 diets. Each subject had their
initial cholesterol measured, was followed for a year, and then had a further
cholesterol measurement taken.
1.1 Check distribution assumptions.
1.2 Conduct a one-way ANOVA on the data using SPSS. What is your
conclusion?
1.3 Which numbers in the ANOVA table were used for the calculation of:
a. The Between groups mean square
b. The Within groups mean square
c. The F-statistic
Answer:
1.1. Check distribution assumptions.
i. Test of homogeneity of variances (constant variance).

Test of Homogeneity of Variances (cholest)

                                       Levene Statistic   df1   df2      Sig.
Based on Mean                          0.224              3     28       0.879
Based on Median                        0.225              3     28       0.878
Based on Median and with adjusted df   0.225              3     26.955   0.878
Based on trimmed mean                  0.226              3     28       0.878

Interpretation: In the table above, the Levene test based on the mean has
p = 0.879, which is greater than the significance level (alpha = 0.05). The test
is not significant, so the assumption of equal variances is met.

ii. Testing normal distribution.

Tests of Normality (cholest, by diet)

        Kolmogorov-Smirnov (a)        Shapiro-Wilk
Diet    Statistic   df   Sig.         Statistic   df   Sig.
1       .128        8    .200*        .970        8    .902
2       .211        8    .200*        .931        8    .524
3       .177        8    .200*        .967        8    .873
4       .150        8    .200*        .973        8    .921
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction

Interpretation: To test the assumption of normality we use the Shapiro-Wilk
test at the alpha = 0.05 level of significance. For every diet group the p-value
is greater than 0.05 (the smallest is 0.524, for diet 2), so we conclude that
the assumption of normality is met.

Histogram plot for normality: Three histograms were plotted to check whether
the observations are independently and normally distributed. On their own they
do not demonstrate normality: the cholest histogram is slightly right-skewed,
and the other two histograms look rectangular, which indicates a roughly
uniform distribution.

iii. Testing skewness and kurtosis.

Descriptive Statistics

                     N           Skewness                 Kurtosis
                     Statistic   Statistic   Std. Error   Statistic   Std. Error
Residence            32          0.000       0.414        -2.138      0.809
cholest              32          0.469       0.414        0.005       0.809
diet                 32          0.000       0.414        -1.385      0.809
Valid N (listwise)   32

Interpretation: One-way ANOVA is a robust test. It can tolerate non-normal data
(skewed or kurtotic distributions) with only a small effect on the type I error
rate. From the table above, the skewness values for Residence, cholest and diet
are 0.000, 0.469 and 0.000 respectively, and the kurtosis values are -2.138,
0.005 and -1.385. Skewness and kurtosis are assessed by dividing each statistic
by its standard error. For the outcome variable cholest this gives
0.469/0.414 = 1.13 for skewness and 0.005/0.809 = 0.01 for kurtosis. Both ratios
are within the commonly used range of -1.96 to +1.96, so the departure from
normality has little effect on the error rate and we accept the cholesterol
data as normal.
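
The ratio check described above can be sketched in a few lines of Python. This is only an illustration: the statistics are copied from the Descriptive Statistics table, while the 1.96 cut-off and the names `z_ratio` and `THRESHOLD` are assumptions made for this sketch, not part of the SPSS output.

```python
# Skewness and kurtosis statistics with their standard errors,
# copied from the Descriptive Statistics table.
stats = {
    "Residence": {"skew": 0.000, "skew_se": 0.414, "kurt": -2.138, "kurt_se": 0.809},
    "cholest":   {"skew": 0.469, "skew_se": 0.414, "kurt": 0.005,  "kurt_se": 0.809},
    "diet":      {"skew": 0.000, "skew_se": 0.414, "kurt": -1.385, "kurt_se": 0.809},
}

THRESHOLD = 1.96  # assumed cut-off for an approximately normal shape


def z_ratio(statistic, std_error):
    """Return the statistic divided by its standard error."""
    return statistic / std_error


for name, s in stats.items():
    z_skew = z_ratio(s["skew"], s["skew_se"])
    z_kurt = z_ratio(s["kurt"], s["kurt_se"])
    within = abs(z_skew) < THRESHOLD and abs(z_kurt) < THRESHOLD
    print(f"{name}: skew ratio={z_skew:.2f}, kurtosis ratio={z_kurt:.2f}, within range={within}")
```

Only the cholest row matters for the ANOVA assumption, since it is the outcome; Residence and diet are grouping variables.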

iv. Independent observations: The observations in each group are independent
of the observations in every other group, and the observations within each
group were obtained by random sampling. This assumption can only be satisfied
through the study design, that is, if a random design was used.
1.2. Conduct a one-way ANOVA on the data using SPSS. What are your
conclusions?
Answer:
The dataset has one response (cholest) and two independent variables (diet and
residence), so in principle two separate one-way ANOVAs could be run. Residence,
however, has only two levels, so comparing cholesterol by residence calls for an
independent-samples t test rather than a one-way ANOVA. Analysing both
independent variables in a single model would require a two-way ANOVA, which I
was not able to run in STATA. The one-way ANOVA below therefore compares
cholesterol across the four diets.
Descriptives: cholest

                                                     95% Confidence Interval for Mean
Diet    N    Mean     Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
1       8    12.575   4.8872           1.7279       8.489         16.661        6.2       21.5
2       8    10.362   3.9170           1.3849       7.088         13.637        4.1       15.2
3       8    10.463   3.9892           1.4104       7.127         13.798        5.1       17.3
4       8    9.938    3.7144           1.3132       6.832         13.043        4.9       15.9
Total   32   10.834   4.0804           0.7213       9.363         12.306        4.1       21.5

ANOVA: cholest

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   33.561           3    11.187        0.649   0.590
Within Groups    482.571          28   17.235
Total            516.132          31
Hypothesis:
Ho: The population mean cholesterol levels of the four diets are equal.
Ha: At least one population mean differs from the others.
Interpretation: The p-value (0.590) is greater than 0.05, so we fail to
reject Ho.
Conclusion: At the alpha = 0.05 level of significance, there is not enough
evidence to conclude that mean cholesterol level differs between the
cholesterol-reducing diets.
1.3. Which numbers in the ANOVA table were used for the calculation of:
1. the between groups mean square (BMS)
2. the within groups mean square (WMS)
3. the F-statistic
Solution:
1. SSB = 33.561, df = k - 1 = 4 - 1 = 3
BMS = SSB/(k - 1) = 33.561/3 = 11.187

2. SSW = 482.571, df = n - k = (8+8+8+8) - 4 = 28
WMS = SSW/(n - k) = 482.571/28 = 17.235
3. F-statistic = BMS/WMS = 11.187/17.235 = 0.649
Conclusion: The tabulated critical value of F at alpha = 0.05 with df1 = 3 and
df2 = 28 is 2.9467. Since 0.649 < 2.95 (p-value > 0.05), we fail to reject Ho:
there is no evidence that mean cholesterol level differs between the
cholesterol-reducing diets.
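
As a cross-check of the arithmetic above, the mean squares and F-statistic can be recomputed from the two sums of squares and the group sizes. The function name `one_way_anova_table` is hypothetical and introduced only for this sketch; it does not compute a p-value.

```python
def one_way_anova_table(ss_between, ss_within, group_sizes):
    """Rebuild mean squares and F from sums of squares and group sizes."""
    k = len(group_sizes)          # number of groups
    n = sum(group_sizes)          # total sample size
    df_between = k - 1
    df_within = n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Sums of squares from the ANOVA table; four diet groups of 8 subjects each.
result = one_way_anova_table(33.561, 482.571, [8, 8, 8, 8])
print(result)
```

Comparing `result["f"]` with the tabulated critical value 2.9467 reproduces the decision to retain Ho.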

QUESTION 2: Data description: forced expiratory volume (litres) (FEV) data set.
The data file fev data.sav contains measurements of forced expiratory volume (FEV) made in
654 children and young adults aged 6 to 22 in a city. Dependent variable: forced expiratory
volume (litres) (FEV). Independent variables: sex (male/female), age, height (cm) and
smoking status.
Do the following analyses:
2.1. Check distribution assumptions.
2.2. Conduct simple linear regression of FEV on age.
i. Is there a significant association between age and FEV?
ii. What is the expected change in FEV for a one-year increase in age?
iii. How much of the variation in FEV is explained by its association with age?
2.3. Calculate the coefficient of determination and interpret the values. Is the association
statistically significant?
2.4. Fit a multiple regression model of FEV on age, sex, height and smoking.
2.5. Report the unadjusted and adjusted regression coefficients and interpret your findings.

Answer:

2.1. Check distribution assumptions


Answer:
The distributional assumptions are that each of the populations from which the
samples come is normally distributed and that the populations have the same
variance. Normality and homoscedasticity (constant variance) can be checked
with formal tests and diagnostic plots. For normality we use a histogram for
each dependent and independent variable. For constant variance we use the
residual versus fitted plot and the residual versus predictor plots.
i. Normality test.
By graphical examination:
Interpretation: The two graphs (P-P plot and histogram) indicate that the data
meet the assumption of normality.

ii. No multicollinearity.

Coefficients (a)

                  Unstandardized Coefficients   Standardized                       Collinearity Statistics
Model 1           B         Std. Error          Beta           t         Sig.     Tolerance   VIF
(Constant)        -4.300    .230                               -18.679   .000
Sex               -.157     .033                -.091          -4.731    .000     .943        1.060
Smoker            -.087     .059                -.030          -1.472    .141     .827        1.210
height in cms     .041      .002                .685           21.901    .000     .353        2.830
age in years      .066      .009                .223           6.904     .000     .331        3.019
a. Dependent Variable: forced expiratory volume (litres)

Interpretation: The table above indicates that the assumption of no
multicollinearity is met, because all VIF values are less than 10 and all
tolerance values are greater than 0.1. A correlation matrix is not required
because there are more than two independent variables.
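
VIF and tolerance are reciprocals (VIF = 1/tolerance), so the two columns carry the same information. The sketch below recomputes VIF from the rounded tolerances in the table and applies the cut-offs used above; small differences from the printed VIF values come from rounding. The helper name `vif_from_tolerance` is an assumption for illustration.

```python
# Tolerance and VIF values copied from the Coefficients table.
collinearity = {
    "Sex":           {"tolerance": 0.943, "vif": 1.060},
    "Smoker":        {"tolerance": 0.827, "vif": 1.210},
    "height in cms": {"tolerance": 0.353, "vif": 2.830},
    "age in years":  {"tolerance": 0.331, "vif": 3.019},
}


def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance."""
    return 1.0 / tolerance


for name, row in collinearity.items():
    recomputed = vif_from_tolerance(row["tolerance"])
    # Cut-offs used in the text: VIF < 10 and tolerance > 0.1.
    ok = row["vif"] < 10 and row["tolerance"] > 0.1
    print(f"{name}: reported VIF={row['vif']:.3f}, 1/tolerance={recomputed:.3f}, acceptable={ok}")
```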

iii. Linear relationship.

Interpretation: The first two graphs indicate a positive linear relationship,
and the 3rd and 4th indicate a negative linear relationship. In general, the
assumption of linearity is met.

iv. Homoscedasticity (constant variance).

Interpretation: Constant variance is checked with the residual versus fitted
plot and the residual versus predictor plots. The graphs indicate that the
assumption of constant variance is met.
v. Independent observations: The observations in each group are independent
of the observations in every other group, and the observations within each
group were obtained by random sampling. This assumption can only be satisfied
if a random design was used.

2.2. Conduct simple linear regression of FEV on age.

Solution:
Model Summary (b)

                                              Std. Error of   Change Statistics
Model   R           R Square   Adjusted R Sq  the Estimate    R Sq Change   F Change   df1   df2   Sig. F Change
1       0.756 (a)   0.572      0.572          0.567527        0.572         872.184    1     652   .000
a. Predictors: (Constant), age in years
b. Dependent Variable: forced expiratory volume (litres)

Coefficients (a)

               Unstandardized Coefficients   Standardized                      95.0% Confidence Interval for B
Model 1        B         Std. Error          Beta           t        Sig.     Lower Bound   Upper Bound
(Constant)     0.432     0.078                              5.541    0.000    0.279         0.585
age in years   0.222     0.008               .756           29.533   0.000    0.207         0.237
a. Dependent Variable: forced expiratory volume (litres)
i. Is there a significant association between age and FEV?

Interpretation: Yes. The p-value for age is <0.001, which is below 0.05, so there is a
statistically significant association between age and FEV. The correlation r = 0.756
indicates a strong positive linear relationship.

ii. What is the expected change in FEV for a one-year increase in age?
Solution:

Y = Bo + B1X1
FEV = 0.432 + 0.222 x Age
The expected change in FEV for a one-year increase in age is 0.222 litres
(95% CI 0.207 to 0.237). The slope tells us that for each 1-year increase in
the age of the children, FEV increases on average by 0.222 litres.
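
To make the meaning of the slope concrete, the fitted line can be written as a small function. This is a sketch using the rounded coefficients from the Coefficients table; `predict_fev` is a hypothetical name, and predictions are only meaningful within the observed age range (6 to 22 years).

```python
INTERCEPT = 0.432  # constant from the Coefficients table (litres)
SLOPE_AGE = 0.222  # change in FEV (litres) per additional year of age


def predict_fev(age_years):
    """Predicted FEV in litres from the simple regression on age."""
    return INTERCEPT + SLOPE_AGE * age_years


fev_10 = predict_fev(10)
fev_11 = predict_fev(11)
print(f"Predicted FEV at age 10: {fev_10:.3f} litres")
print(f"Predicted FEV at age 11: {fev_11:.3f} litres")
print(f"Difference for one extra year: {fev_11 - fev_10:.3f} litres")
```

The difference between any two ages one year apart is always the slope, which is what "expected change for a one-year increase" means.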
iii. How much of the variation in FEV is explained by its association with
age?
Solution:
The Model Summary table above shows R2 = 0.572, so 57.2% of the variation in
FEV is explained by its association with age.

2.3. Calculate the coefficient of determination and interpret the values. Is
the association statistically significant?

ANOVA (a)
Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   280.919          1     280.919       872.184   .000 (b)
Residual     210.001          652   .322
Total        490.920          653
a. Dependent Variable: forced expiratory volume (litres)
b. Predictors: (Constant), age in years
Solution:
From the ANOVA table for FEV and age we obtain the residual (error) sum of
squares and the total sum of squares: SSE = 210.001, SST = 490.920.
R2 = 1 - SSE/SST = 1 - 210.001/490.920 = 0.5722.
Interpretation: About 57.22% of the variability in FEV is accounted for by
age, and the association between FEV and age is statistically significant
because the p-value (<0.001) is less than the significance level (0.05).
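
The same ANOVA table also yields the F-statistic, so both quantities can be checked together. The sketch below uses only the sums of squares and degrees of freedom printed above; the variable names are illustrative.

```python
# Values copied from the regression ANOVA table (FEV on age).
ss_regression = 280.919
ss_residual = 210.001   # SSE
ss_total = 490.920      # SST
df_regression = 1
df_residual = 652

# Coefficient of determination: share of total variation explained by age.
r_squared = 1 - ss_residual / ss_total

# F-statistic: regression mean square divided by residual mean square.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

print(f"R squared   = {r_squared:.4f}")
print(f"F statistic = {f_statistic:.3f}")
```

The recomputed F differs from the printed 872.184 only in the last decimals because the table values are rounded.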
2.4. Fit a multiple regression model of FEV on age, sex, height and smoking.

Model Summary (b)

                                              Std. Error of   Change Statistics
Model   R          R Square   Adjusted R Sq   the Estimate    R Sq Change   F Change   df1   df2   Sig. F Change
1       .881 (a)   .775       .774            .412216         .775          560.021    4     649   .000

ANOVA (a)
Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   380.640          4     95.160        560.021   .000 (b)
Residual     110.280          649   .170
Total        490.920          653
a. Dependent Variable: forced expiratory volume (litres)
b. Predictors: (Constant), age in years, sex, smoker, height in cms

Coefficients (a)

                Unstandardized Coefficients   Standardized                       95.0% Confidence Interval for B
Model 1         B         Std. Error          Beta           t         Sig.     Lower Bound   Upper Bound
(Constant)      -4.300    0.230                              -18.679   0.000    -4.752        -3.848
Sex             -0.157    0.033               -0.091         -4.731    0.000    -0.222        -0.092
Smoker          -0.087    0.059               -0.030         -1.472    0.141    -0.204        0.029
height in cms   0.041     0.002               0.685          21.901    0.000    0.037         0.045
age in years    0.066     0.009               0.223          6.904     0.000    0.047         0.084
a. Dependent Variable: forced expiratory volume (litres)

Interpretation: Age, sex and height are each significantly associated with FEV (p < 0.001)
after adjusting for the other variables in the model, but smoking status is not (p = 0.141).

Fit model: Y = Bo + B1X1 + B2X2 + B3X3 + B4X4

FEV = -4.300 - 0.157 Sex - 0.087 Smoker + 0.041 Height + 0.066 Age
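
The fitted equation can be evaluated for a given child as sketched below. The coefficients are those in the Coefficients table, but the numeric coding of Sex and Smoker is not stated in the output, so the 0/1 codes used here are an assumption, as are the function name `predict_fev_multiple` and the example child.

```python
# Coefficients copied from the multiple regression Coefficients table.
COEF = {
    "constant": -4.300,
    "sex": -0.157,      # assumed 0/1 indicator; coding not shown in the output
    "smoker": -0.087,   # assumed 0/1 indicator; coding not shown in the output
    "height_cm": 0.041,
    "age_years": 0.066,
}


def predict_fev_multiple(sex, smoker, height_cm, age_years):
    """Predicted FEV (litres) from the fitted multiple regression equation."""
    return (COEF["constant"]
            + COEF["sex"] * sex
            + COEF["smoker"] * smoker
            + COEF["height_cm"] * height_cm
            + COEF["age_years"] * age_years)


# Hypothetical child: sex code 0, smoker code 0, 150 cm tall, 10 years old.
base = predict_fev_multiple(sex=0, smoker=0, height_cm=150, age_years=10)
taller = predict_fev_multiple(sex=0, smoker=0, height_cm=151, age_years=10)
print(f"Predicted FEV: {base:.3f} litres")
print(f"Effect of one extra centimetre, other variables fixed: {taller - base:.3f} litres")
```

Changing one argument while holding the others fixed shows what each adjusted coefficient measures.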

2.5. Report the unadjusted and adjusted regression coefficients and
interpret your findings.

Coefficients (a)

                Unstandardized Coefficients   Standardized                       95.0% Confidence Interval for B
Model 1         B         Std. Error          Beta           t         Sig.     Lower Bound   Upper Bound
(Constant)      -4.300    0.230                              -18.679   0.000    -4.752        -3.848
Sex             -0.157    0.033               -0.091         -4.731    0.000    -0.222        -0.092
Smoker          -0.087    0.059               -0.030         -1.472    0.141    -0.204        0.029
height in cms   0.041     0.002               0.685          21.901    0.000    0.037         0.045
age in years    0.066     0.009               0.223          6.904     0.000    0.047         0.084

Variables      Unstandardized coefficient (B)   Standardized coefficient (Beta)
Sex            -0.157                           -0.091
Smoker         -0.087                           -0.030
Height         0.041                            0.685
Age in years   0.066                            0.223

Interpretation: Both columns above come from the multiple regression model, so
both are adjusted for the other covariates; B is expressed in the original
units of each predictor and Beta in standard deviation units. An unadjusted
coefficient comes from a model containing only that predictor: its slope
compares mean FEV across groups differing by 1 unit in the modeled predictor.
An adjusted coefficient compares mean FEV across groups differing by 1 unit in
the modeled predictor but similar with respect to the other model covariates.
For age, the unadjusted coefficient from the simple regression in 2.2 is 0.222
and the adjusted coefficient is 0.066.
1. For a 1-year increase in the age of the children, FEV increases by 0.066
litres, holding sex, smoking and height constant.
2. For a 1-centimetre increase in height, FEV increases by 0.041 litres,
holding the other variables constant.
3. FEV is 0.157 litres lower in the sex category with the higher code than in
the category with the lower code, holding the other variables constant.

QUESTION 3: Data description: undernutrition determinants data set.

A total of 636 HIV-infected patients were assessed for the determinants
of undernutrition in the year 2009.
The event of interest: nutritional status (good/poor) based on body
mass index.
Explanatory variables: Sex of participants (male/female), Eating problem (yes/no), Presence of
candidiasis (yes/no), Disclosure status (yes/no), Adherence (good/fair/poor), Residence
(urban/rural), age (continuous), children (yes/no), Anemia (yes/no) and all other
independent variables.
Do the following statistics:
1. Check the assumptions of logistic regression
2. Conduct bivariate analysis and select candidate variables for multivariable
analysis using selection criteria p-value
3. Fit multivariable logistic regression model with all selected variables
4. Check the model fitness using all statistics for fitness
5. Interpret the results for those show significant association
Answer:
3.1. Check distribution assumptions of logistic regression.
i. The logit of the dependent variable is linearly related to the continuous
independent variable.
This is checked by creating a new variable, Lnage (the natural log of age), and
adding the interaction of age with Lnage to the model (the Box-Tidwell
approach). In the table below, the p-value for the Lnage by age interaction is
0.504, which is greater than alpha = 0.05, so we assume that the logit of the
dependent variable is linearly related to the continuous independent variable.

                                                                          95% C.I. for EXP(B)
                B        S.E.        Wald   df   Sig.   Exp(B)            Lower   Upper
Step 1 (a)
Lnage by age    .037     .055        .446   1    .504   1.037             .931    1.156
Constant        22.767   27305.941   .000   1    .999   7721689422.000

a. Variable(s) entered on step 1: nce of the drug, age, male, religion, marriage, ethnic, urban,
secondary education, children, eating_prob, phone, fam_support, disclose, Lnage * age.
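
The interaction term used in this check is simply age multiplied by its natural logarithm. A minimal sketch of how the term is built and how the decision rule is applied follows; the ages in the example are made up for illustration, and `box_tidwell_term` and `linearity_holds` are hypothetical helper names.

```python
import math


def box_tidwell_term(age):
    """Return age * ln(age), the term added to test linearity of the logit."""
    return age * math.log(age)


def linearity_holds(p_value, alpha=0.05):
    """Linearity in the logit is retained when the added term is not significant."""
    return p_value > alpha


# Illustrative ages only; these are not taken from the dataset.
example_ages = [18, 30, 45]
for age in example_ages:
    print(f"age={age}, Lnage by age term={box_tidwell_term(age):.3f}")

# p-value of the Lnage by age term reported in the table above.
print("Linearity assumption met:", linearity_holds(0.504))
```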

ii. Independence of observations: The observations in each group are
independent of the observations in every other group, and the observations
within each group were obtained by random sampling. This assumption can only
be satisfied if a random design was used.
iii. No multicollinearity: In the table below, VIF is below 10 for each
independent variable and tolerance is greater than 0.1 for each variable, so
the assumption is met.
Coefficientsa

Collinearity Statistics

Model Tolerance VIF

1 Adherence 0.909 1.100

Age 0.317 3.150

Male 0.859 1.164

Religion 0.969 1.032

Marriage 0.900 1.111

Ethnic 0.948 1.054

Urban 0.851 1.175

Secondary education 0.882 1.134

Children 0.848 1.179

eating_prob 0.913 1.095

Anemia 0.964 1.037

Phone 0.795 1.257

Fam support 0.914 1.094

Disclose 0.888 1.126

age_cat .325 3.081

a. Dependent Variable: bmi

Interpretation: The table above indicates that the assumption of no multicollinearity is
met, because all VIF values are less than 10 and all tolerance values are greater
than 0.1.
iv. Adequate sample size. The required sample size is based on the number of
independent variables: each independent variable needs a minimum of 10 cases.
Our data have 15 independent variables, so the minimum sample size is 150. The
dataset contains 636 cases, so this assumption is met (sample size > 10 cases
per independent variable).

3.2. Conduct bivariate analysis and select candidate variables for
multivariable analysis using selection criteria p-value.
Variables in the Equation
95% C.I.for EXP(B)
B S.E. Wald df Sig. Exp(B) Lower Upper
Step 1a Adherence .998 .178 31.541 1 .000 2.712 1.914 3.841

Constant -1.883 .227 68.759 1 .000 .152

a. Variable(s) entered on step 1: adherence.

Variables in the Equation


95% C.I.for EXP(B)
B S.E. Wald df Sig. Exp(B) Lower Upper
Step 1a Age .018 .008 4.559 1 .033 1.018 1.001 1.035

Constant -1.314 .305 18.575 1 .000 .269

a. Variable(s) entered on step 1: age.

Variables in the Equation


95% C.I.for EXP(B)
B S.E. Wald Df Sig. Exp(B) Lower Upper
Step 1a Male -.102 .175 .337 1 .561 .903 .641 1.273

Constant -.656 .105 38.810 1 .000 .519

a. Variable(s) entered on step 1: Male.

Variables in the Equation


95% C.I.for EXP(B)

B S.E. Wald df Sig. Exp(B) Lower Upper


Step 1a religion -.448 .359 1.558 1 .212 .639 .316 1.291

Constant -.227 .380 .356 1 .551 .797

a. Variable(s) entered on step 1: religion.

Variables in the Equation


95% C.I.for EXP(B)
B S.E. Wald df Sig. Exp(B) Lower Upper
Step 1a marriage .148 .073 4.073 1 .044 1.160 1.004 1.339

Constant -.997 .175 32.608 1 .000 .369

a. Variable(s) entered on step 1: marriage.

Variables in the Equation


95% C.I.for EXP(B)
B S.E. Wald df Sig. Exp(B) Lower Upper
Step 1a ethnic .547 .367 2.228 1 .136 1.729 .843 3.546

Constant -1.259 .388 10.523 1 .001 .284

a. Variable(s) entered on step 1: ethnic.

Variables in the Equation


95% C.I.for EXP(B)
B S.E. Wald Df Sig. Exp(B) Lower Upper
Step 1a Urban -.373 .227 2.686 1 .101 .689 .441 1.076
Constant -.379 .208 3.335 1 .068 .684
a. Variable(s) entered on step 1: Urban.

Variables in the Equation


95% C.I.for EXP(B)
B S.E. Wald Sig. Exp(B) Lower Upper
Step 1a Secondary education -.164 .182 .811 .368 .849 .594 1.213

Constant -.642 .101 40.168 .000 .527


a. Variable(s) entered on step 1: Secondary education.

Variables in the Equation


95% C.I.for EXP(B)

B S.E. Wald df Sig. Exp(B) Lower Upper


Step 1a children -.159 .177 .799 1 .371 .853 .603 1.208

Constant -.588 .144 16.658 1 .000 .556

a. Variable(s) entered on step 1: children.

Variables in the Equation


95% C.I.for EXP(B)
B S.E. Wald df Sig. Exp(B) Lower Upper
Step 1a eating_prob 1.627 .181 80.788 1 .000 5.089 3.569 7.256

Constant -1.426 .129 122.871 1 .000 .240

a. Variable(s) entered on step 1: eating_prob.

Variables in the Equation


95% C.I.for EXP(B)
B S.E. Wald df Sig. Exp(B) Lower Upper
Step 1a anemia .719 .228 9.898 1 .002 2.052 1.311 3.210

Constant -.806 .093 75.364 1 .000 .447

a. Variable(s) entered on step 1: anemia.

Variables in the Equation


95% C.I.for EXP(B)

B S.E. Wald df Sig. Exp(B) Lower Upper


Step 1a phone -.635 .170 13.941 1 .000 .530 .379 .739

Constant -.356 .121 8.711 1 .003 .701

a. Variable(s) entered on step 1: phone.

Variables in the Equation


95% C.I.for EXP(B)
B S.E. Wald Df Sig. Exp(B) Lower Upper
Step 1a fam_support -.921 .240 14.774 1 .000 .398 .249 .637

Constant .098 .221 .195 1 .659 1.103

a. Variable(s) entered on step 1: fam_support.

Variables in the Equation


95% C.I.for EXP(B)

B S.E. Wald df Sig. Exp(B) Lower Upper


Step 1a disclose -.892 .173 26.724 1 .000 .410 .292 .575

Constant -.250 .117 4.585 1 .032 .778


a. Variable(s) entered on step 1: disclose.

Variables in the Equation


95% C.I.for EXP(B)

B S.E. Wald df Sig. Exp(B) Lower Upper


Step 1a age_cat .335 .171 3.825 1 .051 1.398 .999 1.957

Constant -.827 .110 56.573 1 .000 .438

a. Variable(s) entered on step 1: age_cat.

Interpretation: From the output above, secondary education, male and children
are not candidate variables for the multivariable model, because their
p-values are greater than 0.25. The p-value cut-off of 0.25 follows the
purposeful selection algorithm proposed by Hosmer and Lemeshow (2000).

Variables with p-values less than 0.25 enter the multiple logistic regression.
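
The screening rule can be written out explicitly. In the sketch below the p-values are copied from the bivariate tables above (a printed .000 is entered as 0.000), and the 0.25 cut-off is the one stated in the text; the variable names `selected` and `excluded` are illustrative.

```python
# Bivariate p-values (Sig. column) copied from the Variables in the Equation tables.
bivariate_p = {
    "Adherence": 0.000,
    "Age": 0.033,
    "Male": 0.561,
    "religion": 0.212,
    "marriage": 0.044,
    "ethnic": 0.136,
    "Urban": 0.101,
    "Secondary education": 0.368,
    "children": 0.371,
    "eating_prob": 0.000,
    "anemia": 0.002,
    "phone": 0.000,
    "fam_support": 0.000,
    "disclose": 0.000,
    "age_cat": 0.051,
}

CUTOFF = 0.25  # screening cut-off stated in the text

selected = [name for name, p in bivariate_p.items() if p < CUTOFF]
excluded = [name for name, p in bivariate_p.items() if p >= CUTOFF]

print("Candidates for the multivariable model:", selected)
print("Excluded:", excluded)
```

Twelve variables pass the screen, which matches the 12 degrees of freedom reported for the multivariable model.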

3.3. Fit multiple logistic regression model with all selected variables.
Variables in the Equation
95% C.I.for EXP(B)
B S.E. Wald Df Sig. Exp(B) Lower Upper
Step 1a adherence .797 .197 16.356 1 .000 2.219 1.508 3.265
age .006 .017 .109 1 .741 1.006 .973 1.039
religion -.427 .473 .813 1 .367 .653 .258 1.650
marriage .052 .085 .381 1 .537 1.054 .892 1.245
ethnic .601 .407 2.183 1 .140 1.824 .822 4.051
Urban .090 .284 .100 1 .752 1.094 .627 1.910
eating_prob 1.422 .193 54.073 1 .000 4.146 2.838 6.057
Anemia .636 .264 5.811 1 .016 1.889 1.126 3.168
Phone -.245 .204 1.439 1 .230 .783 .524 1.168
fam_support -.648 .275 5.539 1 .019 .523 .305 .897
disclose -.570 .199 8.218 1 .004 .565 .383 .835
age_cat .404 .341 1.409 1 .235 1.498 .768 2.922
Constant -2.120 .946 5.022 1 .025 .120
a. Variable(s) entered on step 1: nce of the drug, age, religion, marriage, ethnic, Urban, eating_prob, anemia, phone, fam_support,
disclose, age_cat.

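Interpretation: All variables entered in the multiple logistic regression model
had a bivariate p-value < 0.25. In the multivariable model, adherence,
eating_prob, anemia, fam_support and disclose are significant at the 0.05
level.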
3.4. Check the model fitness using all statistic for fitness.

Hypothesis:
Ho: The model is good fit.
Ha: The model is not good fit.

i. By likelihood test:
Iteration History
Coefficients
Iteration -2 Log likelihood Constant
Step 0 1 808.932 -.665
2 808.834 -.691
3 808.834 -.691
a. Constant is included in the model.
b. Initial -2 Log Likelihood: 808.834
c. Estimation terminated at iteration number 3 because parameter estimates
changed by less than .001.

Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      662.958 (a)         .205                   .285
a. Estimation terminated at iteration number 5 because parameter estimates
changed by less than .001.

Omnibus Tests of Model Coefficients


Chi-square Df Sig.
Step 1 Step 145.876 12 .000
Block 145.876 12 .000
Model 145.876 12 .000

Test hypothesis:
The likelihood ratio statistic is the difference between the two -2 log
likelihoods:
χ² = -2LL0 - (-2LL1) = 808.834 - 662.958 = 145.876

Interpretation: The difference between the -2 log likelihood at step 0
(Iteration History) and at step 1 (Model Summary) equals the model chi-square
in the Omnibus Tests of Model Coefficients (145.876, df = 12, p < 0.001). For
this test the null hypothesis is that all regression coefficients are zero.
Because p < 0.05 that hypothesis is rejected, which means the model with the
selected predictors fits significantly better than the constant-only model.
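
The subtraction behind the omnibus chi-square can be verified directly. This sketch uses the two -2 log likelihood values printed in the SPSS output; the degrees of freedom equal the number of predictors added at step 1. Variable names are illustrative.

```python
# -2 log likelihood values copied from the SPSS output.
neg2ll_null = 808.834    # step 0, constant-only model (Iteration History)
neg2ll_model = 662.958   # step 1, model with predictors (Model Summary)
n_predictors = 12        # variables entered on step 1

# Likelihood ratio chi-square: drop in -2LL when the predictors are added.
lr_chi_square = neg2ll_null - neg2ll_model
df = n_predictors

print(f"Likelihood ratio chi-square = {lr_chi_square:.3f} on {df} df")
```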

ii. By Hosmer and Lemeshow test:

Hosmer and Lemeshow Test


Step Chi-square Df Sig.
1 4.556 8 .804

Interpretation: The Hosmer and Lemeshow test is another check of the goodness
of fit of the model. If the p-value is greater than 0.05, the model fits the
data well.
Conclusion: The p-value is 0.804 > 0.05, so we fail to reject Ho and conclude
that the model is a good fit. Equivalently, we fail to reject Ho when the
tabulated χ² exceeds the calculated χ²: here χ² tabulated = 15.507 (df = 8,
alpha = 0.05) and χ² calculated = 4.556, and 15.507 > 4.556, so we fail to
reject Ho.

3.5. Interpret the results for those show significant association.

Categorical Variables Codings

                                   Parameter coding
                       Frequency   (1)      (2)
Adherence     Good     555         .000     .000
              Fair     49          1.000    .000
              Poor     32          .000     1.000
eating_prob   No       387         .000
              Yes      249         1.000
Anemia        No       544         .000
              Yes      92          1.000
Disclose      No       297         .000
              Yes      339         1.000
fam_support   No       82          .000
              Yes      554         1.000

The first category of each variable is used as the reference category.

Significant variables only:


Variables in the Equation

                                                                  95% C.I. for EXP(B)
                  B       S.E.   Wald     df   Sig.   Exp(B)   Lower   Upper
Step 1 (a)
Adherence                        19.146   2    .000
adherence (1)     .995    .335   8.832    1    .003   2.704    1.403   5.211
adherence (2)     1.480   .434   11.647   1    .001   4.394    1.878   10.282
eating_prob (1)   1.463   .190   59.190   1    .000   4.318    2.974   6.267
Anemia (1)        .594    .258   5.324    1    .021   1.812    1.094   3.002
fam_support (1)   -.631   .270   5.458    1    .019   .532     .313    .903
Disclose (1)      -.490   .194   6.404    1    .011   .613     .419    .895
Constant          -.823   .278   8.766    1    .003   .439
a. Variable(s) entered on step 1: adherence, eating_prob, anemia, fam_support, disclose.

Interpretation (the odds refer to the nutritional status outcome as coded in
the model):

1. The odds of the outcome among respondents with poor adherence were 4.394
times the odds among respondents with good adherence [AOR=4.394, 95% CI
(1.878-10.282)].

2. The odds of the outcome among respondents with fair adherence were 2.704
times the odds among respondents with good adherence [AOR=2.704, 95% CI
(1.403-5.211)].

3. The odds of the outcome among respondents with an eating problem were 4.318
times the odds among respondents without an eating problem [AOR=4.318, 95% CI
(2.974-6.267)].

4. The odds of the outcome among respondents with anemia were 1.812 times the
odds among respondents without anemia [AOR=1.812, 95% CI (1.094-3.002)].

5. The odds of the outcome among respondents who get family support were
46.8% lower than among those who do not get family support [AOR=0.532, 95%
CI (0.313-0.903)].

6. The odds of the outcome among respondents who disclosed their HIV status
were 38.7% lower than among respondents who did not disclose their HIV status
[AOR=0.613, 95% CI (0.419-0.895)].
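
Each adjusted odds ratio is the exponential of its coefficient B, the 95% confidence limits are exp(B ± 1.96 × S.E.), and the percentage change in odds is (OR − 1) × 100. The sketch below recomputes these from the B and S.E. columns of the table; small differences from the printed values arise because B and S.E. are rounded to three decimals. The function names are hypothetical.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


def percent_change_in_odds(odds_ratio):
    """Percentage change in odds relative to the reference category."""
    return (odds_ratio - 1.0) * 100.0


# B and S.E. copied from the final Variables in the Equation table.
coefficients = {
    "adherence (1) fair": (0.995, 0.335),
    "adherence (2) poor": (1.480, 0.434),
    "eating_prob (1)": (1.463, 0.190),
    "Anemia (1)": (0.594, 0.258),
    "fam_support (1)": (-0.631, 0.270),
    "Disclose (1)": (-0.490, 0.194),
}

for name, (b, se) in coefficients.items():
    or_, lower, upper = odds_ratio_with_ci(b, se)
    change = percent_change_in_odds(or_)
    print(f"{name}: OR={or_:.3f} (95% CI {lower:.3f}-{upper:.3f}), change in odds={change:+.1f}%")
```

An odds ratio of 0.532 therefore corresponds to odds that are about 46.8% lower, not 53.2% lower.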

QUESTION 4: Data description: adverse drug reaction data set.

A total of 485 patients were assessed for adverse drug reaction in the year 2010.
The event of interest: adverse drug reaction status (event/censored)
Time: the time taken to develop adverse drug reaction (if any)
Patients who did not develop ADR until the date of interview were considered right
censored.
Explanatory variables: Sex of participants (male/female), History of CPT (yes/no), WHO
stage in two categories (stage I and II, stage III and IV), Age (15-24, 24-34, 35-44, >=45),
Regimen change (yes/no), Residence (within/outside catchment area), BMI
(underweight/normal), Baseline CD4 (cd4cat) (=<200/>200).
Do the following analyses:
1. Life table with 5 intervals; report the survival probability at the end of the
fourth interval.
2. KM with log-rank test for the variables sex, residence, WHO stage and regimen
change.
3. Fit a Cox regression model with some selected independent variables and
interpret the output. Hint: Sex, age_cat, residence, baseline CD4, regimen
change, WHO stage, BMI and CPT can be used for the Cox model.

4.1. Life table with 5 interval and report the survival probability at the end
of the fourth interval.
ltable id status, survival interval (5)

                 Beg.                                Std.
Interval         total   Deaths   Lost   Survival   error    [95% conf. int.]
1.    0    5     485     4        0      0.9918     0.0041   0.9782   0.9969
2.    5   10     481     5        0      0.9814     0.0061   0.9646   0.9903
3.   10   15     476     5        0      0.9711     0.0076   0.9517   0.9828
4.   15   20     471     5        0      0.9608     0.0088   0.9393   0.9748

Answer: The survival probability at the end of the fourth interval is 0.9608.
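
The Survival column can be reproduced by hand with the actuarial life table method: in each interval the conditional survival is 1 − deaths / (number at start − lost/2), and the cumulative survival is the running product. The sketch below applies this to the counts in the table; `life_table_survival` is a hypothetical helper, not a Stata or SPSS function.

```python
def life_table_survival(n_start, intervals):
    """Cumulative survival by the actuarial method.

    intervals is a list of (deaths, lost) pairs, one per interval.
    """
    at_risk = n_start
    cumulative = 1.0
    survival = []
    for deaths, lost in intervals:
        # Subjects lost during the interval count as at risk for half of it.
        effective_at_risk = at_risk - lost / 2
        cumulative *= 1 - deaths / effective_at_risk
        survival.append(cumulative)
        at_risk -= deaths + lost
    return survival


# (events, lost) for the four intervals shown in the life table.
counts = [(4, 0), (5, 0), (5, 0), (5, 0)]
survival = life_table_survival(485, counts)
for i, s in enumerate(survival, start=1):
    print(f"End of interval {i}: survival = {s:.4f}")
```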

4.2. KM with log-rank test for variables-sex, residence, who stage and
regimen change.
1. ADR by sex:

H0: There is no difference between the survival curves.

Ha: There is difference between the survival curves.

Overall Comparisons

Chi-Square Df Sig.

Log Rank (Mantel-Cox) 6.685 1 0.010

Test of equality of survival distributions for the different levels of Sex of the
respondents.

Conclusion: The log-rank statistic is 6.685 with a corresponding p-value of
0.010. Because the p-value is less than 0.05, we reject H0 and accept Ha. We
conclude that males and females have significantly different KM survival
curves for time to adverse drug reaction; that is, the probabilities of
surviving (remaining free of an adverse drug reaction) in the two groups are
significantly different.

Interpretation: The probability of surviving without developing an adverse
drug reaction is greater for males than for females.
2. ADR by residence

H0: There is no difference between the survival curves.

Ha: There is difference between the survival curves.

Overall Comparisons

Chi-Square Df Sig.

Log Rank (Mantel-Cox) 4.883 1 0.027

Test of equality of survival distributions for the different levels of Residence of
the respondents.

Conclusion: The log-rank statistic is 4.883 with a corresponding p-value of
0.027. Because the p-value is less than 0.05, we reject H0 and accept Ha. We
conclude that respondents residing within the catchment area and respondents
residing outside the catchment area have significantly different KM survival
curves for time to adverse drug reaction; that is, the probabilities of
surviving in the two groups are significantly different.

Interpretation: The probability of surviving without developing an adverse
drug reaction is greater for respondents residing within the catchment area
than for respondents residing outside the catchment area.

3.ADR by WHO stage.

H0: There is no difference between the survival curves.

Ha: There is difference between the survival curves.

Overall Comparisons

Chi-Square Df Sig.

Log Rank (Mantel-Cox) 18.168 1 0.000

Test of equality of survival distributions for the different levels of who stage categorized.

Conclusion: The log-rank statistic is 18.168 with a corresponding p-value of
<0.001. Since p < 0.05, we reject H0 in favor of Ha. We conclude that
respondents in WHO stage I and II and respondents in WHO stage III and IV
have significantly different KM survival curves for developing an adverse drug
reaction; that is, the survival probabilities in the two groups differ
significantly.

Interpretation: With respect to developing an adverse drug reaction, the
probability of surviving is greater for respondents in WHO stage I and II than
for respondents in WHO stage III and IV.

4. ADR by regimen

H0: There is no difference between the survival curves.

Ha: There is a difference between the survival curves.

Overall Comparisons

                         Chi-Square   df   Sig.
Log Rank (Mantel-Cox)    67.767       1    0.000

Test of equality of survival distributions for the different levels of regimen change.

Conclusion: The log-rank statistic is 67.767 with a corresponding p-value of
<0.001. Since p < 0.05, we reject H0 in favor of Ha. We conclude that
respondents without a regimen change and respondents with a regimen change
have significantly different KM survival curves for developing an adverse drug
reaction; that is, the survival probabilities in the two groups differ
significantly.

Interpretation: With respect to developing an adverse drug reaction, the
probability of surviving is greater for respondents without a regimen change
than for respondents with a regimen change.
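
The p-values quoted in the conclusions above can be checked from the reported chi-square statistics. The following is a minimal sketch, assuming each log-rank statistic is referred to a chi-square distribution with 1 degree of freedom (as the df column indicates); the function name `logrank_p_value_df1` is made up for this illustration and is not part of SPSS.

```python
import math


def logrank_p_value_df1(chi_square: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_square / 2.0))


# Log-rank (Mantel-Cox) statistics as reported in the Overall Comparisons tables.
reported = {
    "residence": 4.883,
    "WHO stage": 18.168,
    "regimen change": 67.767,
}

for factor, chi_square in reported.items():
    p = logrank_p_value_df1(chi_square)
    # SPSS prints very small p-values as 0.000; report them as <0.001 instead.
    label = "<0.001" if p < 0.001 else f"{p:.3f}"
    print(f"{factor}: chi-square = {chi_square}, p = {label}")
```

A Sig. value printed as 0.000 therefore means that the p-value rounds to zero at three decimals, not that it is exactly zero.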

4.3. Fit a Cox regression model with some selected independent variables and
interpret the output. (Hint: sex, age_cat, residence, baseline CD4, regimen
change, WHO stage, BMI and CPT can be used to fit the Cox model.)
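
As background for reading the output, the Cox proportional hazards model in its standard form is sketched below. The symbols (h_0, beta, x, HR) are generic textbook notation introduced here, not taken from the SPSS tables.

```latex
% Cox proportional hazards model with covariates x_1, ..., x_p
\[
h(t \mid x) = h_0(t)\, \exp\!\left( \beta_1 x_1 + \dots + \beta_p x_p \right)
\]
% Hazard ratio for a 0/1 indicator x_k, holding the other covariates fixed
\[
\mathrm{HR}_k = \frac{h(t \mid x_k = 1)}{h(t \mid x_k = 0)} = e^{\beta_k},
\qquad
95\%\ \mathrm{CI}: \ \exp\!\left( \hat{\beta}_k \pm 1.96\, \mathrm{SE}(\hat{\beta}_k) \right)
\]
```

In the SPSS output, B corresponds to the estimated coefficient and Exp(B) to the hazard ratio relative to the reference category.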

Categorical Variable Codings

Variable                          Category                      Frequency  (1)  (2)  (3)
Sex of the respondents (b)        0=male                        195        0
                                  1=female                      290        1
Residence of the respondents (b)  0=within the catchment area   273        0
                                  1=outside the catchment area  212        1
regimen change (b)                0=yes                         187        0
                                  1=no                          298        1
WHO stage categorized (b)         0=stage I and II              270        0
                                  1=stage III and IV            215        1
CD4 at base line categorized (b)  0=CD4 <=200                   291        0
                                  1=CD4 >200                    194        1
history of CPT (b)                0=yes                         372        0
                                  1=no                          113        1
BMI categorized (b)               0=under weight                167        0
                                  1=normal                      318        1
categorized age (b)               0=15-24                       63         1    0    0
                                  1=25-34                       203        0    1    0
                                  2=35-44                       146        0    0    1
                                  3=>=45                        73         0    0    0

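The first category was chosen as the reference for all variables. Note, however, that in the coding table the all-zero (reference) row for categorized age is the last category (>=45).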

Omnibus Tests of Model Coefficients (a)

-2 Log Likelihood: 656.604

                              Chi-square   df   Sig.
Overall (score)               91.445       10   .000
Change from Previous Step     99.130       10   .000
Change from Previous Block    99.130       10   .000

a. Beginning Block Number 1. Method = Enter
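
The omnibus test can be checked by hand. The sketch below assumes that the "Change from Previous Block" chi-square is the drop in -2 log likelihood relative to the model without covariates, and it uses a closed-form chi-square tail that is valid only for even degrees of freedom (df = 10 here). The helper name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, valid only for even df.

    For df = 2m: P(X > x) = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total


minus2ll_model = 656.604  # -2 log likelihood of the fitted model (reported)
lr_chi_square = 99.130    # change from previous block (reported)
df = 10

# Implied -2LL of the model without covariates (an inference, not shown in the output).
minus2ll_null = minus2ll_model + lr_chi_square
p_value = chi2_sf_even_df(lr_chi_square, df)

print(f"-2LL without covariates: {minus2ll_null:.3f}")
print(f"likelihood-ratio p-value: {p_value:.2e}")
```

The resulting p-value is far below 0.001, which is consistent with the Sig. value of .000 in the table: taken together, the covariates improve the fit significantly.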

Variables in the Equation

                                                                       95.0% CI for Exp(B)
Variable                      B       SE     Wald    df  Sig.  Exp(B)  Lower  Upper
Sex of the respondents        .588    .287   4.207   1   .040  1.800   1.026  3.157
Residence of the respondents  .492    .251   3.857   1   .050  1.636   1.001  2.674
regimen change                -2.697  .471   32.791  1   .000  .067    .027   .170
WHO stage categorized         .848    .257   10.906  1   .001  2.334   1.411  3.859
CD4 at base line categorized  -.373   .298   1.559   1   .212  .689    .384   1.237
history of CPT                .103    .285   .131    1   .718  1.109   .634   1.939
categorized age                              2.350   3   .503
categorized age (1)           .318    .403   .622    1   .430  1.374   .624   3.029
categorized age (2)           -.131   .437   .089    1   .765  .878    .373   2.066
categorized age (3)           .201    .471   .182    1   .669  1.223   .486   3.076
Body mass index               .037    .046   .663    1   .416  1.038   .949   1.135
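
The columns of this table are linked by simple formulas: Wald = (B / SE)^2, Exp(B) = e^B, and the 95% CI is exp(B ± 1.96 × SE). The following minimal sketch recomputes them from the reported B and SE values; `hazard_ratio_summary` is an illustrative helper, not an SPSS function, and small differences from the table are expected because B and SE are rounded to three decimals.

```python
import math
from statistics import NormalDist


def hazard_ratio_summary(b: float, se: float, level: float = 0.95) -> dict:
    """Return the Wald chi-square, its p-value, Exp(B) and the CI for Exp(B)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for 95%
    wald = (b / se) ** 2
    p_value = math.erfc(math.sqrt(wald / 2.0))  # chi-square tail, df = 1
    return {
        "wald": wald,
        "p": p_value,
        "exp_b": math.exp(b),
        "lower": math.exp(b - z_crit * se),
        "upper": math.exp(b + z_crit * se),
    }


# (B, SE) pairs copied from the Variables in the Equation table.
rows = {
    "Sex of the respondents": (0.588, 0.287),
    "regimen change": (-2.697, 0.471),
    "WHO stage categorized": (0.848, 0.257),
}

for name, (b, se) in rows.items():
    s = hazard_ratio_summary(b, se)
    print(
        f"{name}: Wald = {s['wald']:.3f}, p = {s['p']:.3f}, "
        f"Exp(B) = {s['exp_b']:.3f}, "
        f"95% CI = ({s['lower']:.3f}, {s['upper']:.3f})"
    )
```

A variable is significant at the 5% level exactly when its 95% CI for Exp(B) excludes 1, which is why sex, regimen change and WHO stage stand out in the table.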

Interpretation:
1. Female respondents have a statistically significantly higher hazard: 1.8
times (about 2 times) that of male respondents for a history of adverse drug
reaction (95% CI 1.026 to 3.157, p = 0.040).

2. Respondents residing outside the catchment area have a statistically
significantly higher hazard: 1.636 times that of respondents residing within
the catchment area for a history of adverse drug reaction (95% CI 1.001 to
2.674, p = 0.050).

3. Not having a regimen change is statistically significantly protective: the
hazard is 0.067 times that of respondents with a regimen change for a history
of adverse drug reaction (95% CI 0.027 to 0.170, p < 0.001).

4. Respondents in WHO stage III and IV have a statistically significantly
higher hazard: 2.334 times that of respondents in WHO stage I and II for a
history of adverse drug reaction (95% CI 1.411 to 3.859, p = 0.001).

THE END!
