Kubsa Guyo Advance Biostatistic
ID NO-RM0083/15
Department Pediatrics
Submitted to: Mrs. Dube Jara (BSc in PH, MPHE, Assistant Professor,
PhD Candidate)
January 2023,
Fiche, Ethiopia.
QUESTION 1: Data description: Using the cholesterol level dataset.
An experiment was carried out to investigate the effect of four different
cholesterol-reducing diets on persons with hypercholesterolemia. Thirty-two
people were randomly assigned to one of the 4 diets. Each subject had their
initial cholesterol measured; they were then followed for a year, after which
a further measurement of their cholesterol was taken.
1.1 Check distribution assumptions?
1.2 Conduct a one-way ANOVA on the data using SPSS. What is your
conclusion?
1.3 Which numbers in the ANOVA table were used for the calculation of:
a. The Between groups mean square
b. The Within groups mean square
c. The F-statistic
Answer:
1.1. Check distribution assumptions.
i. Test of homogeneity of variances (constant variance).
Interpretation: From the table above, the significance value for Levene's test
is 0.879, which is greater than the significance level (alpha = 0.05), so the
assumption of equal variances is met.
Tests of Normality
                 Kolmogorov-Smirnov(a)         Shapiro-Wilk
          Diet   Statistic   df   Sig.         Statistic   df   Sig.
cholest   1      .128        8    .200*        .970        8    .902
          2      .211        8    .200*        .931        8    .524
          3      .177        8    .200*        .967        8    .873
          4      .150        8    .200*        .973        8    .921
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
Interpretation: To test the assumption of normality we use the Shapiro-Wilk
test, typically at the alpha = 0.05 level of significance.
For every diet group the significance (p) value is greater than alpha = 0.05
(0.902, 0.524, 0.873 and 0.921), so we conclude that the normality assumption
is met.
Histogram plot for normality: The three graphs below are plotted to check
whether the observations are independently and normally distributed.
However, they do not clearly indicate normality: the cholest graph is
slightly right-skewed, which departs from the normal distribution, and the
other two graphs are roughly rectangular, which suggests a uniform
distribution.
iii. Testing skewness and kurtosis.
Descriptive Statistics
N Skewness Kurtosis
Valid N (listwise) 32
Interpretation: One-way ANOVA is a robust test. It can tolerate data that are
non-normal (skewed or kurtotic distributions) with only a small effect on the type I
error rate. From the table above, the skewness results for cholesterol, residence and diet
are 0.00, 0.469 and 0.00 respectively, and the kurtosis results for cholesterol, residence
and diet are 0.005, -2.138 and 0.809. Since those values are small in absolute value, they
have only a small effect on the error rate, so we accept the data as normal. The skewness
and kurtosis values are calculated by dividing each statistic by its standard error.
2. SSW = 482.571, df = n - k = (8+8+8+8) - 4 = 28
WMS = SSW/(n - k) = 482.571/28 = 17.235
3. F-statistic = BMS/WMS = 11.187/17.235 = 0.649
Conclusion: The tabulated critical value of F at alpha = 0.05 with df1 = 3 and
df2 = 28 is 2.9467. Since 0.649 < 2.95 (p-value > 0.05), we fail to reject Ho:
there is no evidence that the mean cholesterol level differs among the four
cholesterol-reducing diets.
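The arithmetic behind the within-groups mean square and the F-statistic can be re-checked with a few lines of Python. This is only a sketch that reuses the figures quoted in the answer (SSW = 482.571, BMS = 11.187, tabulated F = 2.9467); the variable names are illustrative and are not SPSS output.

```python
# Re-check of the one-way ANOVA arithmetic using the figures quoted above.
group_sizes = [8, 8, 8, 8]      # four diets, eight subjects each
k = len(group_sizes)            # number of groups
n = sum(group_sizes)            # total sample size

ssw = 482.571                   # within-groups sum of squares (from the answer)
bms = 11.187                    # between-groups mean square (from the answer)
f_critical = 2.9467             # tabulated F(3, 28) at alpha = 0.05 (from the answer)

df_between = k - 1              # 3
df_within = n - k               # 28

wms = ssw / df_within           # within-groups mean square
f_stat = bms / wms              # F-statistic
reject_h0 = f_stat > f_critical

print(f"df = ({df_between}, {df_within}), WMS = {wms:.3f}, F = {f_stat:.3f}")
print("Reject Ho" if reject_h0 else "Fail to reject Ho")
```

Because the computed F is far below the tabulated value, the decision does not depend on rounding in the intermediate steps.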
Answer:
Coefficients(a)
                  Unstandardized         Standardized
                  Coefficients           Coefficients                      Collinearity Statistics
Model             B        Std. Error    Beta           t         Sig.     Tolerance   VIF
1 (Constant)      -4.300   .230                         -18.679   .000
  Sex             -.157    .033          -.091          -4.731    .000     .943        1.060
  Smoker          -.087    .059          -.030          -1.472    .141     .827        1.210
  height in cms   .041     .002          .685           21.901    .000     .353        2.830
  age in years    .066     .009          .223           6.904     .000     .331        3.019
a. Dependent Variable: forced expiratory volume (litres)
Interpretation: The first two graphs indicate a positive linear relationship,
while the 3rd and 4th indicate a negative linear relationship. In general, the
linearity assumption is met.
Interpretation: To check constant variance we use the residual versus fitted
plot and the residual versus predictor plots. The graphs above indicate that
the constant variance assumption is met.
v. Independent observations: The observations in each group are independent
of the observations in every other group. The observations within each group
were obtained by random sampling. This assumption can be satisfied only if
a randomized design was used.
Solution:
Model Summary
Coefficients(a)
                 Unstandardized         Standardized                       95.0% Confidence
                 Coefficients           Coefficients                       Interval for B
Model            B        Std. Error    Beta           t        Sig.      Lower Bound   Upper Bound
1 (Constant)     0.432    0.078                        5.541    0.000     0.279         0.585
  age in years   0.222    0.008         .756           29.533   0.000     0.207         0.237
a. Dependent Variable: forced expiratory volume (litres)
i. Is there a significant association between age and FEV?
Interpretation: Yes. The p-value for the age variable is < 0.001, which is less than 0.05,
so there is a statistically significant association between age and FEV. The correlation
r = 0.756 indicates a strong positive linear relationship.
ii. What is the expected change in FEV for a one-year increase in age?
Solution:
Y = B0 + B1X1
FEV = 0.432 + 0.222 x Age
The expected change in FEV for a one-year increase in age is
0.222 litres. The slope tells us that for each one-year increase in a child's
age, FEV increases by 0.222 litres on average.
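The meaning of the slope can be illustrated by evaluating the fitted line at two ages one year apart. This is a minimal sketch that assumes the intercept and slope reported in the Coefficients table (0.432 and 0.222); the function name predicted_fev is hypothetical.

```python
# Simple linear regression line from the Coefficients table: FEV = 0.432 + 0.222 * age.
INTERCEPT = 0.432   # constant (B0)
SLOPE = 0.222       # coefficient for age in years (B1)

def predicted_fev(age_years: float) -> float:
    """Return the fitted FEV (litres) for a given age in years."""
    return INTERCEPT + SLOPE * age_years

# The difference between two ages one year apart equals the slope.
change = predicted_fev(11) - predicted_fev(10)
print(f"Predicted FEV at 10 years: {predicted_fev(10):.3f} litres")
print(f"Change for one extra year: {change:.3f} litres")
```

Any pair of ages one year apart gives the same difference, which is why the slope alone answers the question.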
iii. How much of the variation in FEV is explained by its association with
age?
Solution:
The Model Summary table above shows that
R2 = 0.572, so 57.2% of the variation in FEV is explained by its
association with age.
ANOVA(a)
Model          Sum of Squares   df    Mean Square   F         Sig.
1 Regression   280.919          1     280.919       872.184   .000(b)
  Residual     210.001          652   .322
  Total        490.920          653
a. Dependent Variable: forced expiratory volume (litres)
b. Predictors: (Constant), age in years
Solution:
From the ANOVA table above for FEV and age
we can obtain the residual (error) sum of squares and the total sum of
squares: SSE = 210.001, SST = 490.920.
R2 = 1 - SSE/SST = 1 - 210.001/490.920 = 0.5722.
Interpretation: About 57.22% of the variability in FEV is accounted for
by age, and the association between FEV and age is statistically significant,
since the significance value (< 0.001) is less than alpha = 0.05.
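The R2 and F values can be reproduced directly from the sums of squares in the ANOVA table. The sketch below assumes only the figures printed in that table; the variable names are illustrative.

```python
# Reproduce R-squared and F from the ANOVA table for FEV on age.
ss_regression = 280.919
ss_residual = 210.001          # SSE
ss_total = 490.920             # SST
df_regression = 1
df_residual = 652

r_squared = 1 - ss_residual / ss_total
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

print(f"R2 = {r_squared:.4f}")
print(f"MS residual = {ms_residual:.3f}, F = {f_stat:.2f}")
```

The small difference between the recomputed F and the tabulated 872.184 comes from the table showing rounded sums of squares.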
2.4. Fit a multiple regression model of FEV on age, sex, height and smoking.
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .881(a)   .775       .774                .412216
Change Statistics
Model   R Square Change   F Change   df1   df2   Sig. F Change
1       .775              560.021    4     649   .000
ANOVA(a)
Model          Sum of Squares   df    Mean Square   F         Sig.
1 Regression   380.640          4     95.160        560.021   .000(b)
  Residual     110.280          649   .170
  Total        490.920          653
a. Dependent Variable: forced expiratory volume (litres)
b. Predictors: (Constant), age in years, sex, smoker, height in cms
Coefficients(a)
                  Unstandardized         Standardized                        95.0% Confidence
                  Coefficients           Coefficients                        Interval for B
Model             B        Std. Error    Beta           t         Sig.      Lower Bound   Upper Bound
1 (Constant)      -4.300   0.230                        -18.679   0.000     -4.752        -3.848
  Sex             -0.157   0.033         -0.091         -4.731    0.000     -0.222        -0.092
  Smoker          -0.087   0.059         -0.030         -1.472    0.141     -0.204        0.029
  height in cms   0.041    0.002         0.685          21.901    0.000     0.037         0.045
  age in years    0.066    0.009         0.223          6.904     0.000     0.047         0.084
a. Dependent Variable: forced expiratory volume (litres)
Interpretation: There is a significant association between FEV and age, sex and height
(p < 0.001 for each), but no significant association between FEV and smoking (p = 0.141).
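2. For each 1 centimetre increase in a child's height, FEV increases by
0.041 litres, holding the other variables constant.
3. For sex, FEV is 0.157 litres lower in the category coded 1 than in the
reference category, holding the other variables constant.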
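The "holding the other variables constant" reading of each coefficient can be illustrated by changing one predictor at a time in the fitted equation. This is a sketch under stated assumptions: the coefficients are those in the Coefficients table, sex and smoker are assumed to be coded 0/1 (the report does not say which category is coded 1), and the function name is hypothetical.

```python
# Fitted multiple regression equation from the Coefficients table.
# Assumption: sex and smoker are 0/1 indicator variables; the report does not
# state which category is coded 1.
COEF = {
    "constant": -4.300,
    "sex": -0.157,
    "smoker": -0.087,
    "height_cm": 0.041,
    "age_years": 0.066,
}

def predicted_fev_multiple(sex: int, smoker: int, height_cm: float, age_years: float) -> float:
    """Return the fitted FEV (litres) for one child."""
    return (COEF["constant"]
            + COEF["sex"] * sex
            + COEF["smoker"] * smoker
            + COEF["height_cm"] * height_cm
            + COEF["age_years"] * age_years)

base = predicted_fev_multiple(sex=0, smoker=0, height_cm=150, age_years=10)
taller = predicted_fev_multiple(sex=0, smoker=0, height_cm=151, age_years=10)
other_sex = predicted_fev_multiple(sex=1, smoker=0, height_cm=150, age_years=10)

print(f"Baseline prediction: {base:.3f} litres")
print(f"Effect of +1 cm height: {taller - base:.3f} litres")
print(f"Effect of sex coded 1 vs 0: {other_sex - base:.3f} litres")
```

The baseline child (150 cm, 10 years) is an arbitrary illustrative choice; the two differences are the same for any baseline.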
a. Variable(s) entered on step 1: nce of the drug, age, male, religion, marriage, ethnic, urban,
secondary education, children, eating_prob, phone, fam_support, disclose, Lnage * age.
ii. Independence of observations: The observations in each group are
independent of the observations in every other group. The observations within
each group were obtained by random sampling. This assumption can be
satisfied only if a randomized design was used.
iii. No multicollinearity: In the table below, VIF is below ten (< 10) for
each independent variable and tolerance is greater than 0.1 for each
variable, so the assumption is met.
Coefficientsa
Collinearity Statistics
Interpretation: The above table indicates that the assumption of no multicollinearity is
met, because all VIF values are less than ten (10) and all tolerance values are greater
than 0.1.
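The VIF and tolerance rule can be expressed as a small check, using the fact that VIF is the reciprocal of tolerance. The sketch below is illustrative only: because the collinearity values for this model are not reproduced here, it borrows the tolerance values from the FEV regression table in Question 2, and the function name is hypothetical.

```python
# Rule of thumb used above: VIF < 10 and tolerance > 0.1 for every predictor.
def collinearity_check(tolerance: float, vif_limit: float = 10.0,
                       tolerance_limit: float = 0.1) -> tuple[float, bool]:
    """Return (VIF, passes) where VIF = 1 / tolerance."""
    vif = 1.0 / tolerance
    return vif, (vif < vif_limit and tolerance > tolerance_limit)

# Illustrative tolerance values taken from the Question 2 Coefficients table.
tolerances = {
    "Sex": 0.943,
    "Smoker": 0.827,
    "height in cms": 0.353,
    "age in years": 0.331,
}

results = {name: collinearity_check(tol) for name, tol in tolerances.items()}
for name, (vif, ok) in results.items():
    print(f"{name}: VIF = {vif:.3f}, assumption met = {ok}")

all_met = all(ok for _, ok in results.values())
```

The recomputed VIF values differ from the printed ones only in the third decimal place, because the tolerances are themselves rounded.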
iv. Adequate sample size: The required sample size depends on the number of
independent variables; each independent variable needs a minimum of ten (10)
cases. Our data have 15 independent variables, so the minimum sample size is
150. Our data contain 636 cases, so this assumption is met (sample size
> 10 cases per independent variable).
Constant -.997 .175 32.608 1 .000 .369
Lower Upper
Step 1a anemia .719 .228 9.898 1 .002 2.052 1.311 3.210
which is a purposeful selection algorithm as proposed by Hosmer and
Lemeshow (2000).
Variables with a p-value less than 0.25 are entered into the multiple logistic regression.
3.3. Fit multiple logistic regression model with all selected variables.
Variables in the Equation
                                                                      95% C.I. for EXP(B)
                        B        S.E.   Wald     df   Sig.   Exp(B)   Lower    Upper
Step 1(a) adherence     .797     .197   16.356   1    .000   2.219    1.508    3.265
          age           .006     .017   .109     1    .741   1.006    .973     1.039
          religion      -.427    .473   .813     1    .367   .653     .258     1.650
          marriage      .052     .085   .381     1    .537   1.054    .892     1.245
          ethnic        .601     .407   2.183    1    .140   1.824    .822     4.051
          Urban         .090     .284   .100     1    .752   1.094    .627     1.910
          eating_prob   1.422    .193   54.073   1    .000   4.146    2.838    6.057
          Anemia        .636     .264   5.811    1    .016   1.889    1.126    3.168
          Phone         -.245    .204   1.439    1    .230   .783     .524     1.168
          fam_support   -.648    .275   5.539    1    .019   .523     .305     .897
          disclose      -.570    .199   8.218    1    .004   .565     .383     .835
          age_cat       .404     .341   1.409    1    .235   1.498    .768     2.922
          Constant      -2.120   .946   5.022    1    .025   .120
a. Variable(s) entered on step 1: nce of the drug, age, religion, marriage, ethnic, Urban, eating_prob, anemia, phone, fam_support,
disclose, age_cat.
Interpretation: The above table shows the multiple logistic regression model
fitted with all variables that were selected at p-value < 0.25.
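The Wald, Exp(B) and confidence-interval columns in the table all follow from B and S.E. The sketch below recomputes them for the adherence row (B = 0.797, S.E. = 0.197), assuming the usual normal-approximation interval exp(B ± 1.96 × S.E.); the function name is hypothetical and small differences from the table are due to rounding of B and S.E.

```python
import math

def odds_ratio_summary(b: float, se: float, z: float = 1.96) -> dict:
    """Recompute Wald, odds ratio and 95% CI from a logistic coefficient."""
    return {
        "wald": (b / se) ** 2,            # Wald chi-square with 1 df
        "odds_ratio": math.exp(b),        # Exp(B)
        "lower": math.exp(b - z * se),    # lower 95% confidence limit
        "upper": math.exp(b + z * se),    # upper 95% confidence limit
    }

# Adherence row of the Variables in the Equation table.
adherence = odds_ratio_summary(b=0.797, se=0.197)
for key, value in adherence.items():
    print(f"{key}: {value:.3f}")

# A predictor is significant at the 5% level when the interval excludes 1.
significant = adherence["lower"] > 1 or adherence["upper"] < 1
```

The same function applies to any other row of the table by substituting its B and S.E.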
3.4. Check the model fitness using all fitness statistics.
Hypothesis:
Ho: The model is a good fit.
Ha: The model is not a good fit.
i. By likelihood ratio test:
Iteration History
Coefficients
Iteration -2 Log likelihood Constant
Step 0 1 808.932 -.665
2 808.834 -.691
3 808.834 -.691
a. Constant is included in the model.
b. Initial -2 Log Likelihood: 808.834
c. Estimation terminated at iteration number 3 because parameter estimates
changed by less than .001.
Model Summary
Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
1      662.958(a)          .205                   .285
a. Estimation terminated at iteration number 5 because parameter estimates
changed by less than .001.
Test hypothesis:
By the likelihood ratio test, the statistic equals:
X² = -2LL0 - (-2LL1) = 808.834 - 662.958 = 145.876
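The likelihood ratio statistic and the two pseudo R-square values in the Model Summary can be recomputed from the two -2 log likelihood values. This is a sketch, not SPSS output: it uses the standard Cox & Snell and Nagelkerke formulas and assumes n = 636, the sample size stated in the sample-size check.

```python
import math

# -2 log likelihood of the null model and of the fitted model.
neg2ll_null = 808.834
neg2ll_model = 662.958
n = 636   # assumption: sample size reported earlier in this answer

# Likelihood ratio chi-square statistic.
lr_chi2 = neg2ll_null - neg2ll_model

# Cox & Snell R-square: 1 - (L0 / L1)^(2/n) = 1 - exp(-LR / n).
cox_snell = 1 - math.exp(-lr_chi2 / n)

# Nagelkerke R-square rescales Cox & Snell by its maximum, 1 - L0^(2/n).
nagelkerke = cox_snell / (1 - math.exp(-neg2ll_null / n))

print(f"LR chi-square = {lr_chi2:.3f}")
print(f"Cox & Snell R2 = {cox_snell:.3f}, Nagelkerke R2 = {nagelkerke:.3f}")
```

Both recomputed values agree with the Model Summary (.205 and .285), which supports n = 636 as the analysed sample size.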
hypothesis. Alternatively, when the tabulated X² is greater than the calculated X²,
we fail to reject Ho. Here X² tab. = 15.507 and X² cal. = 4.556; since
15.507 > 4.556, we fail to reject Ho.
Interpretation:
1. The odds of having good nutritional status among respondents with poor
adherence were 4.394 times those among respondents with good adherence
[AOR = 4.394, 95% CI (1.878-10.282)].
2. The odds of having good nutritional status among respondents with fair
adherence were 2.704 times those among respondents with good adherence
[AOR = 2.704, 95% CI (1.403-5.211)].
3. The odds of having good nutritional status among respondents with an eating
problem were 4.318 times those among respondents without an eating problem
[AOR = 4.318, 95% CI (2.974-6.267)].
4. The odds of having good nutritional status among respondents with anemia
were 1.812 times those among respondents without anemia [AOR = 1.812, 95%
CI (1.094-3.002)].
5. The odds of having good nutritional status were 46.8% lower among
respondents who received family support than among those who did not
[AOR = 0.532, 95% CI (0.313-0.903)].
6. The odds of having good nutritional status were 38.7% lower among
respondents who disclosed their HIV status than among those who did not
[AOR = 0.613, 95% CI (0.419-0.895)].
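An adjusted odds ratio converts to a percentage change in odds as (AOR - 1) x 100, so an AOR below 1 corresponds to a reduction of (1 - AOR) x 100 percent. The sketch below applies this to the AOR values reported above; the function name is illustrative.

```python
def percent_change_in_odds(aor: float) -> float:
    """Return the percentage change in odds implied by an odds ratio."""
    return (aor - 1) * 100

# AOR values reported in the interpretation above.
reported = {
    "family support": 0.532,
    "disclosed HIV status": 0.613,
    "anemia": 1.812,
}

for name, aor in reported.items():
    change = percent_change_in_odds(aor)
    direction = "lower" if change < 0 else "higher"
    print(f"{name}: AOR = {aor}, odds are {abs(change):.1f}% {direction}")
```

For AOR = 0.532 this gives odds that are 46.8% lower, and for AOR = 0.613 odds that are 38.7% lower.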
4.1. Life table with 5 interval and report the survival probability at the end
of the fourth interval.
ltable id status, survival interval (5)
     Interval     Beg. total   Deaths   Lost   Survival   Std. error   [95% conf. int.]
1.    0     5     485          4        0      0.9918     0.0041       0.9782   0.9969
2.    5    10     481          5        0      0.9814     0.0061       0.9646   0.9903
3.   10    15     476          5        0      0.9711     0.0076       0.9517   0.9828
4.   15    20     471          5        0      0.9608     0.0088       0.9393   0.9748
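The Survival column of the life table can be reproduced from the counts in each interval. The sketch below assumes the standard actuarial method, in which subjects lost during an interval count as half at risk; with no losses in these rows, that adjustment has no effect. The function name is hypothetical.

```python
# Each row: (interval start, interval end, number at start, deaths, lost).
rows = [
    (0, 5, 485, 4, 0),
    (5, 10, 481, 5, 0),
    (10, 15, 476, 5, 0),
    (15, 20, 471, 5, 0),
]

def life_table_survival(table):
    """Return the cumulative survival at the end of each interval."""
    survival = 1.0
    cumulative = []
    for _start, _end, at_risk, deaths, lost in table:
        effective_at_risk = at_risk - lost / 2      # actuarial adjustment
        survival *= 1 - deaths / effective_at_risk  # conditional survival
        cumulative.append(round(survival, 4))
    return cumulative

survival_by_interval = life_table_survival(rows)
print(survival_by_interval)
print(f"Survival at the end of the fourth interval: {survival_by_interval[3]}")
```

The fourth value, 0.9608, is the survival probability at the end of the fourth interval that the question asks for.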
4.2. KM with log-rank test for the variables sex, residence, WHO stage and
regimen change.
1. ADR by sex:
Overall Comparisons
Chi-Square Df Sig.
Test of equality of survival distributions for the different levels of Sex of the
respondents.
Interpretation: The survival probability for males who develop an adverse
drug reaction is greater than that for females who develop an adverse drug
reaction.
2. ADR by residence
Overall Comparisons
Chi-Square Df Sig.
Overall Comparisons
Chi-Square Df Sig.
Test of equality of survival distributions for the different levels of who stage categorized.
Interpretation: The survival probability for respondents in WHO stage I and II
who develop an adverse drug reaction is greater than that for respondents in
WHO stage III and IV who develop an adverse drug reaction.
4. ADR by regimen
Overall Comparisons
Chi-Square df Sig.
Interpretation: The survival probability for respondents without a regimen
change who develop an adverse drug reaction is greater than that for
respondents with a regimen change who develop an adverse drug reaction.
4.3. Fit a Cox regression model with some selected independent variables and
interpret the output. (Hint: sex, age_cat, residence, baseline CD4, regimen
change, WHO stage, BMI and CPT can be used for the Cox model fitness).
respondents(b)                    1=outside the catchment area   212   1
regimen change(b)                 0=yes                          187   0
                                  1=no                           298   1
who stage categorized(b)          0=stage I and II               270   0
                                  1=stage III and IV             215   1
CD4 at base line categorized(b)   0=CD4 <=200                    291   0
                                  1=CD4 >200                     194   1
history of CPT(b)                 0=yes                          372   0
                                  1=no                           113   1
BMI categorized(b)                0=under weight                 167   0
                                  1=normal                       318   1
categorized age(b)                0=15-24                         63   1   0   0
                                  1=25-34                        203   0   1   0
                                  2=35-44                        146   0   0   1
                                  3=>=45                          73   0   0   0
Variables in the Equation
                                                                             95.0% CI for Exp(B)
                                B       SE     Wald     df   Sig.   Exp(B)   Lower    Upper
Sex of the respondents          .588    .287   4.207    1    .040   1.800    1.026    3.157
who stage categorized           .848    .257   10.906   1    .001   2.334    1.411    3.859
CD4 at base line categorized    -.373   .298   1.559    1    .212   .689     .384     1.237
categorized age (1)             .318    .403   .622     1    .430   1.374    .624     3.029
categorized age (2)             -.131   .437   .089     1    .765   .878     .373     2.066
categorized age (3)             .201    .471   .182     1    .669   1.223    .486     3.076
Body mass index                 .037    .046   .663     1    .416   1.038    .949     1.135
Interpretation:
1. Sex of the respondents is statistically significant: among those with a
history of adverse drug reaction, the risk of dying for females is 1.8 times
(about 2 times) that for males [HR = 1.800, 95% CI (1.026-3.157)].
4. WHO stage is statistically significant: among those with a history of
adverse drug reaction, the risk of dying for respondents in WHO stage III and
IV is 2.334 times that for respondents in WHO stage I and II
[HR = 2.334, 95% CI (1.411-3.859)].
THE END!