
Biostatistics

Binary Logistic Regression

Prof. Getu Degu

March 2013
Logistic regression

► In many studies the outcome variable of interest is the presence or absence of some condition, such as whether or not the subject has a particular characteristic (e.g., a symptom of a certain disease).

► We cannot use ordinary multiple linear regression for such data, but instead we can use a similar approach known as multiple linear logistic regression, or just logistic regression.

Logistic regression: Uses and selection of independent variables

♣ The first use is the prediction (estimation) of the probability that an individual will have (develop) the characteristic.

♣ For example, logistic regression is often used in epidemiological studies where the result of the analysis is the probability of developing cancer after controlling for other associated risks.

♣ Logistic regression also provides knowledge of the relationships and strengths between an outcome variable (a dependent variable having only two categories) and explanatory (independent) variables that can be categorical or continuous.
Logistic regression
Example: smoking 10 packs a day puts you at a higher risk for developing cancer than working in an asbestos mine.

♣ Logistic regression can be applied to case-control and cross-sectional data.

The Model:
► The basic principle of logistic regression is much the same as for ordinary multiple regression.

► The main difference is that instead of developing a model that uses a combination of the values of a group of explanatory variables to predict the value of the dependent variable itself, we predict a transformation of the dependent variable.

► The dependent variable in logistic regression is usually dichotomous: it can take the value 1 with probability of success π, or the value 0 with probability of failure 1 − π.

This type of variable is called a binomial (or binary) variable.


Logistic regression
• Logistic regression extends ordinary least
squares (OLS) methods to model data with
binary (yes/no, success/failure) outcomes.

• Instead of directly estimating the value of the outcome, logistic regression allows you to estimate the probability of a success or failure.
Logistic regression
 Applications of logistic regression have also been extended to cases where the dependent variable has more than two categories, known as multinomial logistic regression.

 When the multiple classes of the dependent variable can be ranked, ordinal logistic regression is preferred to multinomial logistic regression.

 As mentioned previously, one of the goals of logistic regression is to correctly predict the category of outcome for individual cases using the most parsimonious (condensed) model.
Logistic regression: model creation
 To accomplish this goal, a model is created
that includes all predictor variables that are
useful in predicting the response variable.
 Several different options are available during
model creation.

 Variables can be entered into the model in the order specified by the researcher.

 Logistic regression can also test the fit of the model after each coefficient is added or deleted, called stepwise regression.
Logistic regression: Forward stepwise regression

► The first step in many analyses of multivariate data is to examine the simple relation between each potential explanatory variable and the outcome variable of interest, ignoring all the other variables.

► Forward stepwise regression analysis uses this analysis as its starting point. Steps in applying this method, sketched in code below, are:

a) Find the single variable that has the strongest association with the
dependent variable and enter it into the model (i.e., the variable
with the smallest p-value).
b) Find the variable among those not in the model that, when added
to the model so far obtained, explains the largest amount of the
remaining variability.
c) Repeat step (b) until the addition of an extra variable is not
statistically significant at some chosen level such as P=.05.

► N.B. You have to stop the process at some point, otherwise you will end up with all the variables in the model.
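A minimal sketch of this forward procedure, assuming a pandas DataFrame df with a 0/1 outcome column and a list of candidate predictor names (all names here are hypothetical). It selects on the Wald p-value of the newly added term; real analyses often use a likelihood-ratio criterion instead.

```python
# Hypothetical sketch: forward stepwise selection for a binary outcome.
import statsmodels.api as sm

def forward_select(df, outcome, candidates, alpha=0.05):
    selected = []
    candidates = list(candidates)
    while candidates:
        pvals = {}
        for var in candidates:
            X = sm.add_constant(df[selected + [var]])
            fit = sm.Logit(df[outcome], X).fit(disp=0)
            pvals[var] = fit.pvalues[var]          # p-value of the new term
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:                   # step (c): stop when the best
            break                                  # addition is not significant
        selected.append(best)                      # steps (a) and (b)
        candidates.remove(best)
    return selected
```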
Logistic regression: Backward stepwise regression

♣ Backward stepwise regression appears to be the preferred method for exploratory analyses: the analysis begins with a full or saturated model, and variables are eliminated from the model in an iterative process.

♣ The fit of the model is tested after the elimination of each variable to ensure that the model still adequately fits the data. When no more variables can be eliminated from the model, the analysis has been completed.
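A matching sketch of backward elimination, under the same hypothetical setup as the forward sketch above: start from the full model and repeatedly drop the least significant variable, refitting and re-testing after each elimination.

```python
# Hypothetical sketch: backward elimination for a binary outcome.
import statsmodels.api as sm

def backward_eliminate(df, outcome, variables, alpha=0.05):
    variables = list(variables)                # start from the full model
    while variables:
        X = sm.add_constant(df[variables])
        fit = sm.Logit(df[outcome], X).fit(disp=0)
        pvals = fit.pvalues.drop("const")      # ignore the intercept
        worst = pvals.idxmax()
        if pvals[worst] < alpha:               # every remaining term significant:
            break                              # the analysis is complete
        variables.remove(worst)
    return variables
```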

Logistic regression
♣ Logistic regression is a powerful statistical tool for estimating the magnitude
of the association between an exposure and a binary outcome after
adjusting simultaneously for a number of potential confounding
factors.

♣ If we have a binary variable and give the categories numerical values of 0 and 1, usually representing ‘No’ and ‘Yes’ respectively, then the mean of these values in a sample of individuals is the same as the proportion of individuals with the characteristic.

♣ We could expect, therefore, that the appropriate regression model would estimate the probability (proportion) that an individual will have the characteristic.

♣ We cannot use an ordinary linear regression, because this might predict proportions less than zero or greater than one, which would be meaningless.

♣ In practice, a statistically preferable method is to use a transformation of this proportion.
Logistic regression
♣ The transformation we use is called the logit transformation, written as
logit (p). Here p is the proportion of individuals with the characteristic.

♣ For example, if p is the probability of a subject having a myocardial infarction, then 1 − p is the probability that they do not have one.

♣ The ratio p / (1 − p) is called the odds, and thus

logit(p) = ln[p / (1 − p)]

is the log odds. The logit can take any value from minus infinity to plus infinity.

♣ We can fit regression models to the logit which are very similar to the
ordinary multiple regression models found for data from a normal
distribution.

♣ We assume that relationships are linear on the logistic scale:

ln[p / (1 − p)] = a + b1X1 + b2X2 + … + bnXn

where X1, …, Xn are the predictor variables and p is the proportion to be predicted. The calculation is computer intensive.
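To make the model concrete, here is a minimal fitting sketch with Python's statsmodels on simulated data; the variable names ("age", "income") and coefficient values are invented for illustration.

```python
# Sketch: fit ln(p/(1-p)) = a + b1*X1 + b2*X2 by maximum likelihood.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({"age": rng.integers(16, 66, n),
                   "income": rng.normal(500, 150, n)})
true_logit = -4 + 0.05 * df["age"] + 0.003 * df["income"]
df["y"] = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

X = sm.add_constant(df[["age", "income"]])   # adds the intercept a
model = sm.Logit(df["y"], X).fit(disp=0)     # iterative ML fit
print(model.params)                          # a, b1, b2 on the logit scale
```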
Logistic regression
 The quantity to the left of the equal sign is called a logit.
It’s the log of the odds that an event occurs.

 The odds that an event occurs is the ratio of the number of people who experience the event to the number of people who do not.

 This is what you get when you divide the probability that the event occurs by the probability that the event does not occur: both probabilities have the same denominator, which cancels, leaving the number of events divided by the number of non-events.

 The coefficients in the logistic regression model tell you how much the logit changes based on the values of the predictor variables.
 Because the logistic regression equation predicts
the log odds, the coefficients represent the
difference between two log odds, a log odds ratio.
 The antilog of the coefficients is thus an odds ratio. Most
programs print these odds ratios. These are often called
adjusted odds ratios.

 It is informative to compare the adjusted odds ratio (AOR) with the crude odds ratio (COR).
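Continuing the hypothetical statsmodels fit sketched earlier, exponentiating the coefficients and their confidence limits gives the adjusted odds ratios that most programs print:

```python
# Sketch: adjusted odds ratios from a fitted Logit result named "model".
import numpy as np
import pandas as pd

aor = np.exp(model.params)       # antilog of each coefficient
ci = np.exp(model.conf_int())    # CI endpoints on the odds-ratio scale
print(pd.DataFrame({"AOR": aor, "2.5%": ci[0], "97.5%": ci[1]}))
```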

 The above equation can be rewritten to represent the probability of disease as:

P = 1 / (1 + e^−(a + b1X1 + b2X2 + … + bnXn))

or, equivalently,

P = e^(a + b1X1 + b2X2 + … + bnXn) / (1 + e^(a + b1X1 + b2X2 + … + bnXn))

If Z = a + b1X1 + b2X2 + … + bnXn, the above equation turns out to be:

P = e^Z / (1 + e^Z)
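A small numeric check of this formula, with invented values a = −2, b1 = 0.8 and a subject with X1 = 3:

```python
# Sketch: predicted probability P = e^Z / (1 + e^Z) for hypothetical values.
import math

a, b1, x1 = -2.0, 0.8, 3.0
z = a + b1 * x1                       # Z = a + b1*X1 = 0.4
p = math.exp(z) / (1 + math.exp(z))   # identical to 1 / (1 + exp(-Z))
print(round(p, 3))                    # 0.599
```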
Significance tests
► The process by which coefficients are tested for significance
for inclusion or elimination from the model involves several
different techniques.

I) Z-test
The significance of each variable can be assessed by treating

Z = b / se(b)

as a standard normal deviate, where b is the estimated coefficient and se(b) is its standard error.

► This z value is then squared, yielding a Wald statistic with a chi-square distribution. However, there are problems with the use of the Wald statistic. The likelihood-ratio test is more reliable for small sample sizes than the Wald test.
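A sketch of the computation from a fitted statsmodels Logit result (the result object model and the predictor name "age" come from the earlier hypothetical fit; summary() reports these z values directly):

```python
# Sketch: Wald test for a single coefficient.
from scipy import stats

b = model.params["age"]        # estimated coefficient
se = model.bse["age"]          # its standard error
z = b / se
wald = z ** 2                  # Wald statistic, chi-square with 1 df
p = stats.chi2.sf(wald, df=1)
print(z, wald, p)
```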

Significance tests
II) Likelihood-Ratio Test:

► Logistic regression uses maximum-likelihood estimation to compute the coefficients for the logistic regression equation.

N.B. Multiple regression uses the least-squares method to find the coefficients for the independent variables in the regression equation (it computes coefficients that minimize the residuals for all cases).

► Maximum-likelihood estimation is an iterative procedure that successively tries to get closer and closer to the correct answer.

► Before proceeding to the likelihood ratio test, we need to know about the deviance, which is analogous to the residual sum of squares in linear regression.
Deviance
 The deviance of a model is -2 times the log
likelihood (-2LL) associated with each model.

 As a model’s ability to predict outcomes improves, the deviance falls. Poorly fitting models have higher deviance.

 If a model perfectly predicts outcomes, the deviance will be zero. This is analogous to the situation in linear regression, where the residual sum of squares falls to 0 if the model predicts the values of the dependent variable perfectly.
 Based on the deviance, it is possible to construct a statistic analogous to r² for logistic regression, commonly referred to as the pseudo r².

 If G1² is the deviance of a model with variables, and G0² is the deviance of a null model, the pseudo r² of the model is:

r² = 1 − G1² / G0² = 1 − (ln L1 / ln L0)

 One might think of the pseudo r² as the proportion of deviance explained.

 Note that the deviance of a model is −2 times the log likelihood (i.e., −2LL) associated with each model.
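A sketch of this quantity using the simulated df from the earlier fitting example; statsmodels also reports McFadden's version directly as the prsquared attribute, so the two lines below should agree.

```python
# Sketch: pseudo r-squared = 1 - lnL1/lnL0 (deviance = -2*lnL, so the -2 cancels).
import numpy as np
import statsmodels.api as sm

full = sm.Logit(df["y"], sm.add_constant(df[["age", "income"]])).fit(disp=0)
null = sm.Logit(df["y"], np.ones(len(df))).fit(disp=0)  # intercept-only model
print(1 - full.llf / null.llf)   # pseudo r-squared from the two log likelihoods
print(full.prsquared)            # statsmodels' built-in McFadden value
```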
►The likelihood ratio test (LRT), which makes use of the deviance,
is analogous to the F-test from linear regression.

► In its most basic form, it can test the hypothesis that all the
coefficients in a model are all equal to 0:
H0: ß1 = ß2 = . . . = ßk = 0

► The test statistic has a chi-square distribution with k degrees of freedom.

► If we want to test whether a subset consisting of q coefficients in a model are all equal to zero, the test statistic is the same, except that for L0 we use the likelihood from the model without the coefficients, and L1 is the likelihood from the model with them.

► This chi-square has q degrees of freedom.
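A sketch of the test for two nested statsmodels fits, full and reduced (full as in the pseudo r² sketch above; reduced is a hypothetical fit without the q coefficients being tested):

```python
# Sketch: likelihood ratio test between nested Logit fits.
from scipy import stats

lr = 2 * (full.llf - reduced.llf)      # = deviance(reduced) - deviance(full)
q = full.df_model - reduced.df_model   # number of coefficients tested
p_value = stats.chi2.sf(lr, df=q)      # chi-square with q df
print(lr, q, p_value)
```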


Assumptions
► Logistic regression is popular in part because it enables the researcher to overcome many of the restrictive assumptions of OLS regression:

1. Logistic regression does not assume a linear relationship between the dependent and the independents.
 It is possible and permitted to add explicit interaction and power terms as variables on the right-hand side of the logistic equation, as in OLS regression.

2. The dependent variable need not be normally distributed.

3. The dependent variable need not be homoscedastic for each level of the independents; that is, there is no homogeneity-of-variance assumption.
However, other assumptions still apply:
1. Meaningful coding. Logistic coefficients will be difficult to interpret if not
coded meaningfully. The convention for binomial logistic regression is to
code the dependent class of greatest interest as 1 and the other class as 0.

2. Inclusion of all relevant variables in the regression model

3. Error terms are assumed to be independent (independent sampling). Violations of this assumption can have serious effects. Violations are apt to occur, for instance, in correlated samples and repeated-measures designs, such as before-after or matched-pairs studies, cluster sampling, or time-series data. That is, subjects cannot provide multiple observations at different time points.

4. Linearity: Logistic regression does not require linear relationships between the independents and the dependent, as OLS regression does, but it does assume a linear relationship between the continuous independents and the logit of the dependent.
5. No multicollinearity: To the extent that one
independent is a linear function of another
independent, the problem of multicollinearity will occur
in logistic regression, as it does in OLS regression. As
the independents increase in correlation with each
other, the standard errors of the logit (effect)
coefficients will become inflated.
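A common screening tool here is the variance inflation factor (VIF); a hedged sketch with statsmodels, assuming X_df is a DataFrame of the independent variables (values above roughly 10 are a conventional warning sign):

```python
# Sketch: variance inflation factors for the independent variables.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(X_df)   # X_df: hypothetical predictor DataFrame
for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))
```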

6. No outliers: As in OLS regression, outliers can affect results significantly. The researcher should analyze standardized residuals for outliers and consider removing them or modeling them separately. Standardized residuals > 2.58 are outliers at the .01 level, which is the customary level (standardized residuals > 1.96 are outliers at the less-used .05 level).
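A sketch of that screening from a fitted statsmodels Logit result, using the 2.58 cutoff above:

```python
# Sketch: flag cases whose standardized (Pearson) residuals exceed 2.58.
import numpy as np

resid = model.resid_pearson                   # one residual per case
outliers = np.where(np.abs(resid) > 2.58)[0]
print(outliers)   # indices of cases to inspect, remove, or model separately
```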
7. Large samples: Unlike OLS regression,
logistic regression uses maximum likelihood
estimation (MLE) rather than ordinary least
squares (OLS) to derive parameters.

♦ MLE relies on large-sample asymptotic normality, which means that the reliability of estimates declines when there are few cases for each observed combination of independent variables.

♦ That is, in small samples one may get high standard errors. In the extreme, if there are too few cases in relation to the number of variables, it may be impossible to converge on a solution.

♦ Very high parameter estimates (logistic coefficients) may signal inadequate sample size.
Hosmer and Lemeshow Test
♣ The Hosmer-Lemeshow goodness-of-fit statistic is used to assess whether the necessary assumptions for the application of multiple logistic regression are fulfilled.

♣ The Hosmer and Lemeshow goodness-of-fit statistic is computed as the Pearson chi-square from the contingency table of observed frequencies and expected frequencies.

♣ A good fit as measured by Hosmer and Lemeshow's test will yield a large p-value (much larger than 0.05).

♣ In SPSS, the result of the Hosmer-Lemeshow goodness-of-fit test is easily obtained by clicking on the appropriate menu commands of logistic regression. That is,
Analyze → Regression → Binary logistic → Options → Hosmer-Lemeshow goodness-of-fit
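Outside SPSS the statistic can be computed by hand. A hand-rolled sketch (it is not a built-in statsmodels function): group cases into deciles of predicted risk and compare observed with expected events via a Pearson chi-square on g − 2 degrees of freedom.

```python
# Sketch: Hosmer-Lemeshow test from outcomes y and predicted probabilities p_hat.
import pandas as pd
from scipy import stats

def hosmer_lemeshow(y, p_hat, g=10):
    d = pd.DataFrame({"y": y, "p": p_hat})
    d["group"] = pd.qcut(d["p"], g, duplicates="drop")  # deciles of risk
    grp = d.groupby("group", observed=True)
    obs, exp, n = grp["y"].sum(), grp["p"].sum(), grp["y"].count()
    hl = (((obs - exp) ** 2) / (exp * (1 - exp / n))).sum()
    return hl, stats.chi2.sf(hl, df=len(obs) - 2)   # large p => adequate fit
```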
Summary
♣ A likelihood is a probability, specifically the probability that the values of
the dependent variable may be predicted from the values of the
independent variables. Like any probability, the likelihood varies from
0 to 1.

♣ The log likelihood ratio test (sometimes called the model chi-square test) of a model tests the difference between −2LL for the initial (null) model and −2LL for the full model. That is, the model chi-square is computed as −2LL for the null (initial) model minus −2LL for the researcher’s model.

♣ The initial chi-square is -2LL for the model which accepts the null
hypothesis that all the β coefficients are zero.

♣ The log likelihood ratio test tests the null hypothesis that all population logistic regression coefficients except the constant are zero. It is an overall model test which does not assure that every independent is significant.
♣ LRT measures the improvement in fit that the
explanatory variables make compared to the null
model.

♣ The method of analysis uses an iterative procedure whereby the answer is obtained by several repeated cycles of calculation using the maximum likelihood approach.

♣ Because of this extra complexity, logistic regression is only found in large statistical packages or those primarily intended for the analysis of epidemiological data.
EXAMPLES

The following tables are given to show the formats of selected presentations of results from logistic regression analysis (only some of the independent variables are taken).
Table X: Results of separately regressing fertility levels (high versus low) on each
explanatory variable relating to women's sexual behaviour and use of contraceptives,
North and South Gondar zones, northwest Ethiopia, 2007 (bivariate analyses)

Explanatory variable                    Fertility level   Odds Ratio   95% CI          P-value
                                        high      low     (crude)      lower   upper

Educational level                                                                      0.002
  no education (does not read/write)     884       989    1.00
  primary                                104       214    0.54         0.42    0.70
  secondary and above                     23       210    0.12         0.08    0.19

Monthly household income                                                               < 0.001
  ≤ 320 Eth Birr                         256       556    1.00
  321 – 500 Eth Birr                     309       487    1.38         1.12    1.69
  501 – 999 Eth Birr                     327       292    2.43         1.96    3.02
  ≥ 1000 Eth Birr                        119        78    3.31         2.40    4.57

Religion
  Orthodox Christian                     945      1312    1.00
  Muslim                                  64        95    0.94         0.67    1.30    0.69

Knowledge of the respondent regarding
the period of pregnancy
  Correct                                 92       274    1.00
  Wrong                                  919      1139    2.40         1.87    3.09    < 0.001

Do you approve wife beating by the
husband for various reasons?
  Yes                                    735       847    1.78         1.49    2.12    < 0.001
  No                                     276       566    1.00
Table Z: Results from the multivariate analysis – adjusted for demographic,
socio-economic and reproductive variables, North and South Gondar zones,
northwest Ethiopia, 2007

Explanatory variable                    Fertility level   Odds Ratio   95% CI          P-value
                                        high      low     (adjusted)   lower   upper

Educational level                                                                      0.002
  no education (does not read/write)     884       989    1.00
  primary                                104       214    0.92         0.67    1.27
  secondary and above                     23       210    0.37         0.21    0.64

Monthly household income                                                               < 0.001
  ≤ 320 Eth Birr                         256       556    1.00
  321 – 500 Eth Birr                     309       487    1.48         1.16    1.88
  501 – 999 Eth Birr                     327       292    3.39         2.60    4.43
  ≥ 1000 Eth Birr                        119        78    6.97         4.54    10.71

Knowledge of the respondent regarding
the period of pregnancy
  Correct                                 92       274    1.00
  Wrong                                  919      1139    1.42         1.04    1.93    0.027

♣ For variables having more than two categories, the overall significance (shown in red in the original slide) is given by their corresponding P-values.
♣ The assessment of whether the required assumptions for the application of multiple logistic regression were fulfilled showed that this parsimonious model adequately fits the data, P = 0.88 (by the Hosmer and Lemeshow test).
Exercise
• Fifty women aged 16 to 65 were randomly taken from
a certain village to assess the level of trachoma (all
stages) and its associated risk factors.

• Selected socio-demographic characteristics of the women together with their status (presence/absence) of trachoma were recorded.

• The dependent variable (i.e., trachoma) is coded as 1 for ‘yes’ and 0 for ‘no’.

• There were three independent variables (predictors) relating to the study subjects collected during the investigation (age, educational status and wash face).
Exercise
► Age was taken as a continuous variable.

► Educational status and wash face were taken as categorical variables.

Categorical variables

Educational status:
0 = Women who could not read/write
1 = Women with some primary school education
2 = Women with secondary and above education

Wash face:
1 = at most once a day (without soap)
2 = twice a day (without soap)
3 = at least once a day (with soap)
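Before fitting, the two categorical predictors would be dummy-coded with the first category of each as the reference; a small sketch (the column names are hypothetical):

```python
# Sketch: dummy-code the exercise's categorical predictors.
import pandas as pd

df = pd.DataFrame({"educ": [0, 1, 2, 0], "wash_face": [1, 3, 2, 1]})
dummies = pd.get_dummies(df, columns=["educ", "wash_face"],
                         drop_first=True, dtype=int)
print(dummies)   # educ_1, educ_2, wash_face_2, wash_face_3 enter the model
```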

By taking the above information into account, answer the following questions.
Exercise
A) Are the necessary assumptions for the application of multiple logistic
regression fulfilled? How? If so, use the forward LR method to analyze the
given data (This procedure should be preceded by the classical bivariate analyses).

B) Does 'washing face' at least once a day (with soap) have any significant
effect on the prevention of trachoma?

C) Estimate the probability that a woman with some primary school education (4th grade) who washes her face twice a day (without soap) will have trachoma.

D) Estimate the probability that a woman with a secondary school education who washes her face twice a day (without soap) will have trachoma.

E) Estimate the probability that a woman with a secondary school education who washes her face at least once a day (with soap) will have trachoma.

F) What do you understand from your answers in parts C, D and E?

G) If you were asked to include more independent variables and undertake a similar study, what additional variables (predictors) would you suggest be included in the proposed study? Why?

H) What recommendations do you forward based on your findings?
