5.1) Binary logistic regression
March 2013
Logistic regression
Logistic regression: Uses and selection of independent variables
The Model:
► The basic principle of selecting variables for a logistic regression model is much the same as for ordinary multiple regression:
a) Find the single variable that has the strongest association with the
dependent variable and enter it into the model (i.e., the variable
with the smallest p-value).
b) Find the variable among those not in the model that, when added
to the model so far obtained, explains the largest amount of the
remaining variability.
c) Repeat step (b) until the addition of an extra variable is not
statistically significant at some chosen level such as P=.05.
► N.B. You have to stop the process at some point; otherwise you will
end up with all the variables in the model.
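Steps (a) to (c) can be sketched as follows, assuming NumPy is available. The slides do not say which statistic measures "the largest amount of the remaining variability", so this sketch assumes the likelihood-ratio chi-square (the drop in -2LL when one variable is added, 1 df). Because every candidate adds 1 df, the largest drop is also the smallest p-value. The names `fit_logit` and `forward_select` are hypothetical, and the data are simulated.

```python
import math

import numpy as np


def fit_logit(X, y, iters=25):
    """Fit a logistic regression by Newton-Raphson; return (beta, log-likelihood).

    X must already contain a column of ones for the constant term.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        beta = beta + np.linalg.solve(hess, grad)     # Newton step
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, loglik


def forward_select(candidates, y, alpha=0.05):
    """Forward stepwise selection following steps (a)-(c).

    At each step, add the candidate whose entry gives the largest drop in -2LL;
    stop when the 1-df chi-square p-value of that drop is not below alpha.
    """
    X = np.ones((len(y), 1))                  # start from the constant-only model
    _, ll_current = fit_logit(X, y)
    selected, remaining = [], list(candidates)
    while remaining:
        best = None
        for name in remaining:
            X_try = np.column_stack([X, candidates[name]])
            _, ll = fit_logit(X_try, y)
            g = max(2.0 * (ll - ll_current), 0.0)   # drop in -2LL
            if best is None or g > best[1]:
                best = (name, g, ll)
        name, g, ll = best
        p_value = math.erfc(math.sqrt(g / 2.0))     # P(chi-square with 1 df > g)
        if p_value >= alpha:                        # step (c): stop adding variables
            break
        selected.append(name)
        remaining.remove(name)
        X = np.column_stack([X, candidates[name]])
        ll_current = ll
    return selected


# Simulated data (illustration only): x1 affects the outcome, x2 is pure noise.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
true_p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x1)))
y = (rng.uniform(size=n) < true_p).astype(float)
chosen = forward_select({"x1": x1, "x2": x2}, y)
```

In practice a statistical package would do the fitting; the sketch only shows the loop structure: fit each candidate, keep the best, and stop at the chosen significance level.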
Logistic regression: Backward stepwise regression
Logistic regression
♣ Logistic regression is a powerful statistical tool for estimating the magnitude
of the association between an exposure and a binary outcome after
adjusting simultaneously for a number of potential confounding
factors.
♣ We can fit regression models to the logit (the natural logarithm of the
odds) that are very similar to the ordinary multiple regression models used
for normally distributed data.
The odds are what you get when you divide the probability that
the event occurs by the probability that the event does not
occur. Because both probabilities have the same denominator,
it cancels, leaving the number of events divided by the
number of non-events.
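A small check of the cancellation described above, using hypothetical counts of 30 events and 10 non-events; `odds` is an illustrative name.

```python
def odds(events: int, non_events: int) -> float:
    """Odds = P(event) / P(no event); the shared denominator (the total) cancels."""
    total = events + non_events
    p_event = events / total
    p_no_event = non_events / total
    return p_event / p_no_event


# Hypothetical counts: 30 events and 10 non-events, so the odds equal 30 / 10.
example = odds(30, 10)
```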
Compare the adjusted odds ratio (AOR) with the crude odds
ratio (COR): a marked difference between them suggests
confounding by the variables adjusted for.
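$$P = \frac{e^{Z}}{1 + e^{Z}}, \qquad \text{where } Z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$$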
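The formula above converts the linear predictor Z into a probability, and the logit reverses that conversion. The sketch below uses hypothetical coefficients (β0 = -1.0, β1 = 0.5) and a single predictor. It also shows the standard fact that exp(β1) is the odds ratio for a one-unit increase in X1, which is how adjusted odds ratios are read off a fitted model.

```python
import math


def logistic(z: float) -> float:
    """P = e^Z / (1 + e^Z): turn the linear predictor Z into a probability."""
    return math.exp(z) / (1.0 + math.exp(z))


def logit(p: float) -> float:
    """Inverse transformation: the log-odds ln(P / (1 - P)) recovers Z."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients beta0 = -1.0 and beta1 = 0.5, evaluated at X1 = 2.
beta0, beta1, x1 = -1.0, 0.5, 2.0
z = beta0 + beta1 * x1          # Z = 0.0
prob = logistic(z)              # 0.5

# exp(beta1) is the odds ratio for a one-unit increase in X1.
odds_ratio = math.exp(beta1)
```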
Significance tests
► Several different techniques can be used to test coefficients for
significance, i.e. to decide whether a variable should be included in
or eliminated from the model.
I) Z-test
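The significance of each variable can be assessed by treating
$$Z = \frac{b}{\mathrm{se}(b)}$$
as a standard normal deviate, where b is the estimated coefficient and se(b) is its standard error.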
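The Z-test above reduces to one division and a normal tail probability. This sketch uses only the standard library; the coefficient and standard error are hypothetical values for illustration, and `wald_test` is an illustrative name.

```python
import math


def wald_test(b: float, se: float) -> tuple[float, float]:
    """Return Z = b / se(b) and its two-sided p-value from the standard normal."""
    z = b / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|Z|))
    return z, p_value


# Hypothetical coefficient and standard error, for illustration only.
z, p = wald_test(b=0.88, se=0.128)
```

A |Z| of about 1.96 corresponds to a two-sided p-value of 0.05.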
II) Likelihood-Ratio Test:
► In its most basic form, it tests the hypothesis that all the
coefficients in a model are equal to 0:
$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$
5. No multicollinearity: To the extent that one
independent variable is a linear function of another,
multicollinearity will be a problem in logistic
regression, just as it is in OLS regression. As the
independent variables become more highly correlated
with each other, the standard errors of the logit
(effect) coefficients become inflated.
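The slides do not name a diagnostic, but a common one is the variance inflation factor (VIF), borrowed from OLS regression: VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others. With only two predictors, R² is simply r², the squared correlation between them. This sketch assumes that two-predictor case; the function names and data are hypothetical. In logistic regression the VIF is only an approximate guide, because the exact variance inflation also depends on the model's fitted weights.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2).

    Perfectly collinear predictors (r = +/-1) give division by zero,
    i.e. an infinite VIF.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical data: uncorrelated predictors versus nearly collinear ones.
vif_low = vif_two_predictors([1, -1, 1, -1], [1, 1, -1, -1])
vif_high = vif_two_predictors([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.9, 5.1])
```

Uncorrelated predictors give a VIF of 1 (no inflation); the nearly collinear pair gives a VIF far above 10.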
♣ The log likelihood ratio test (sometimes called the model chi-square
test) of a model tests the difference between -2LL for the null (initial)
model and -2LL for the full model, where LL is the log-likelihood. That is,
model chi-square is computed as -2LL for the null (initial) model minus
-2LL for the researcher's model:
$$\chi^2_{\text{model}} = (-2LL_{\text{null}}) - (-2LL_{\text{model}})$$
♣ The initial chi-square is -2LL for the null model, i.e. the model that
accepts the null hypothesis that all the β coefficients (other than the
constant) are zero.
♣ The log likelihood ratio test tests the null hypothesis that all population
logistic regression coefficients except the constant are zero. It is an
overall model test, which does not guarantee that every independent
variable is significant.
♣ LRT measures the improvement in fit that the
explanatory variables make compared to the null
model.
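To turn the model chi-square into a p-value, compare it with a chi-square distribution whose degrees of freedom equal the number of coefficients set to zero under H0 (k). The sketch below uses only the standard library, through the closed-form chi-square upper tail for integer degrees of freedom. The -2LL values are hypothetical and not taken from any table in these slides; `chi2_sf` and `model_chi_square` are illustrative names.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2.0) / i
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: 2 * (1 - Phi(sqrt x)) + 2 * phi(sqrt x) * sum_j x^(j - 1/2) / (1*3*...*(2j-1))
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    pdf = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, series = root, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        series += term
    return tail + 2.0 * pdf * series


def model_chi_square(neg2ll_null: float, neg2ll_model: float, k: int):
    """Model chi-square = -2LL(null) - (-2LL(model)), tested on k degrees of freedom."""
    g = neg2ll_null - neg2ll_model
    return g, chi2_sf(g, k)


# Hypothetical -2LL values for a model with k = 3 explanatory variables.
g, p = model_chi_square(neg2ll_null=420.0, neg2ll_model=400.0, k=3)
```

A statistical package reports the same quantity; this sketch only makes the subtraction and the degrees of freedom explicit.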
Table X: Results of separately regressing fertility level (high versus low) on each
explanatory variable relating to women's sexual behaviour and use of contraceptives, North
and South Gondar zones, northwest Ethiopia, 2007 (bivariate analyses)

| Explanatory variable | Fertility level: high | Fertility level: low | Odds ratio (crude) | 95% CI: lower | 95% CI: upper | P-value |
|---|---|---|---|---|---|---|
| Religion | | | | | | |
| Orthodox Christian | 945 | 1312 | 1.00 | | | |
| Muslim | 64 | 95 | 0.94 | 0.67 | 1.3 | 0.69 |
| Knowledge of the respondent regarding the period of pregnancy | | | | | | |
| Correct | 92 | 274 | 1.00 | | | |
| Wrong | 919 | 1139 | 2.40 | 1.87 | 3.09 | < 0.001 |
| Do you approve wife beating by the husband for various reasons? | | | | | | |
| Yes | 735 | 847 | 1.78 | 1.49 | 2.12 | < 0.001 |
| No | 276 | 566 | 1.00 | | | |
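The crude odds ratios in Table X can be checked from the cell counts. The slides do not state how the confidence intervals were computed; this sketch assumes Woolf's (logit) method, SE(ln OR) = √(1/a + 1/b + 1/c + 1/d), with a Wald p-value. Under that assumption it reproduces the "Wrong" and "Muslim" rows to two decimals. `crude_odds_ratio` is an illustrative name.

```python
import math


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Crude OR, Woolf 95% CI and Wald p-value for a 2x2 table.

    a, b: high and low fertility counts in the compared category;
    c, d: high and low fertility counts in the reference category.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    p_value = math.erfc(abs(log_or / se) / math.sqrt(2.0))  # two-sided
    return math.exp(log_or), lower, upper, p_value


# Counts from Table X: "Wrong" vs "Correct" (reference), "Muslim" vs "Orthodox Christian".
wrong = crude_odds_ratio(919, 1139, 92, 274)
muslim = crude_odds_ratio(64, 95, 945, 1312)
```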
Table Z: Results from the multivariate analysis, adjusted for
demographic, socio-economic and reproductive variables, North
and South Gondar zones, northwest Ethiopia, 2007

| Explanatory variable | Fertility level: high | Fertility level: low | Odds ratio (adjusted) | 95% CI: lower | 95% CI: upper | P-value |
|---|---|---|---|---|---|---|
Categorical variables
Educational status:
0 = Women who could not read/write
1 = Women with some primary school education
2 = Women with secondary and above education
Wash face:
1 = at most once a day (without soap)
2 = twice a day (without soap)
3 = at least once a day (with soap)
B) Does 'washing face' at least once a day (with soap) have any significant
effect on the prevention of trachoma?
C) Estimate the probability that a woman with some primary school education
(4th grade) who washes her face twice a day (without soap) will have
trachoma.
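A sketch of how question C could be answered once the fitted coefficients are available. The coefficients are not given in this section, so the values below are placeholders, not results. The sketch also assumes, for illustration, a model containing only these two categorical predictors, dummy-coded with education = 0 and wash face = 1 as the reference categories. The dictionary keys are hypothetical names.

```python
import math


def predicted_probability(coef, education, wash_face):
    """Predicted probability from a logistic model with dummy-coded predictors.

    Assumed reference categories: education = 0 and wash_face = 1, so those
    levels contribute nothing to Z beyond the constant.
    """
    z = coef["const"]
    if education != 0:
        z += coef[f"education_{education}"]
    if wash_face != 1:
        z += coef[f"wash_face_{wash_face}"]
    return math.exp(z) / (1.0 + math.exp(z))


# Placeholder coefficients for illustration only; substitute the fitted values.
coef = {"const": -0.2, "education_1": -0.3, "education_2": -0.9,
        "wash_face_2": -0.4, "wash_face_3": -1.1}

# Some primary education (education = 1), washes twice a day without soap (wash_face = 2).
p_c = predicted_probability(coef, education=1, wash_face=2)
```

Only the dummy variables for the woman's own categories enter Z; every reference category is absorbed into the constant.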
G) If you were asked to include more independent variables in a similar
study, what additional variables (predictors) would you suggest including
in the proposed study? Why?
H) What recommendations would you make based on your findings?