LOGISTIC
REGRESSION
LIMITATIONS / STRENGTHS
The Disadvantages of Logistic Regression
Logistic regression,
also called logit regression or logit modeling, is a statistical
technique that allows researchers to create predictive models.
The technique is most useful for understanding the influence
of several independent variables on a single dichotomous
outcome variable. For example, logistic regression would
allow a researcher to evaluate the influence of grade point
average, test scores and curriculum difficulty on the
outcome variable of admission to a particular university. The
technique is useful, but it has significant limitations.
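The admissions example above can be sketched in a few lines of scikit-learn. The GPA values and labels here are invented purely for illustration:

```python
# Hypothetical admissions data: the single feature is GPA,
# and the label is 1 for admitted, 0 for rejected.
from sklearn.linear_model import LogisticRegression

X = [[2.0], [2.5], [3.0], [3.6], [3.8], [4.0]]  # GPA (made up)
y = [0, 0, 0, 1, 1, 1]                          # admitted? (made up)

model = LogisticRegression()
model.fit(X, y)

# The fitted model maps GPA to a predicted admission probability:
# low GPA gives a low probability, high GPA a high one.
print(model.predict_proba([[2.2]])[0][1])
print(model.predict_proba([[3.9]])[0][1])
```

In practice the model would include several independent variables (test scores, curriculum difficulty) as additional columns of `X`.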
Identifying Independent Variables
Logistic regression attempts to predict outcomes based on a set
of independent variables, but if researchers include the wrong
independent variables, the model will have little to no predictive
value. For example, if college admissions decisions depend more
on letters of recommendation than test scores, and researchers
don't include a measure for letters of recommendation in their
data set, then the logit model will not provide useful or accurate
predictions. This means that logistic regression is not a useful
tool unless researchers have already identified all the relevant
independent variables.
Limited Outcome Variables
Logistic regression works well for predicting
categorical outcomes like admission or
rejection at a particular college. It can also
predict multinomial outcomes, like admission,
rejection or wait list. However, logistic
regression cannot predict continuous
outcomes. For example, logistic regression
could not be used to determine how high an
influenza patient's fever will rise, because the
scale of measurement (temperature) is
continuous. Researchers could attempt to
convert the measurement of temperature into
discrete categories like "high fever" or "low
fever," but doing so would sacrifice the
precision of the data set. This is a significant
disadvantage for researchers working with
continuous scales.
Independent Observations Required
Logistic regression requires that each data
point be independent of all other data points. If
observations are related to one another, then
the model will tend to overweight the
significance of those observations. This is a
major disadvantage, because a lot of scientific
and social-scientific research relies on research
techniques involving multiple observations of
the same individuals. For example, drug trials
often use matched pair designs that compare
two similar individuals, one taking a drug and
the other taking a placebo. Logistic regression
is not an appropriate technique for studies
using this design.
Overfitting the Model
Logistic regression attempts to predict outcomes based on a set
of independent variables, but logit models are vulnerable to
overconfidence. That is, the models can appear to have more
predictive power than they actually do as a result of sampling
bias. In the college admissions example, a random sample of
applicants might lead a logit model to predict that all students
with a GPA of at least 3.7 and an SAT score in the 90th
percentile will always be admitted. In reality, however, the
college might reject some small percentage of these applicants.
A logistic regression would therefore be "overfit," meaning that
it overstates the accuracy of its predictions.
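One quick way to see this overconfidence is to fit a logit model to pure noise. With more features than observations, the training points are linearly separable, so the model looks perfect in-sample while having no real predictive power. A sketch using synthetic data and scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 25))     # 30 observations, 25 pure-noise features
y = rng.integers(0, 2, size=30)   # labels are coin flips, unrelated to X

X_train, y_train = X[:20], y[:20]  # 20 points in 25 dimensions are
X_test, y_test = X[20:], y[20:]    # always linearly separable

model = LogisticRegression(C=1e4, max_iter=5000).fit(X_train, y_train)
train_acc = model.score(X_train, y_train)  # essentially perfect in-sample
test_acc = model.score(X_test, y_test)     # near chance on held-out data
```

The in-sample accuracy wildly overstates how well the model predicts new observations, which is exactly the overfitting problem described above.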
The problem with logistic regression
The OR overestimates the Relative Risk (RR) when
the outcome is common (rule of thumb: > 10%)
Despite the advice about the rare-outcome
assumption, consumers of the health research
literature often interpret the OR as an RR,
which can exaggerate the apparent effect
Logistic regression became easy to use and
very popular, and there is a perception that
alternative methods do not exist
But there are simple and potentially more
appropriate methods when you want
to estimate relative risk.
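The gap between the OR and the RR is easy to verify with arithmetic. Using made-up risks for an exposed and an unexposed group:

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

def odds_ratio(p1, p0):
    return odds(p1) / odds(p0)

def relative_risk(p1, p0):
    return p1 / p0

# Common outcome: 50% vs 25% risk. The RR is 2, but the OR is 3.
print(relative_risk(0.50, 0.25))  # 2.0
print(odds_ratio(0.50, 0.25))     # 3.0

# Rare outcome: 2% vs 1% risk. The RR is 2, and the OR is about 2.02.
print(relative_risk(0.02, 0.01))  # 2.0
print(odds_ratio(0.02, 0.01))     # ~2.02
```

With the common outcome, reading the OR of 3 as a relative risk would exaggerate a true doubling of risk into an apparent tripling; with the rare outcome, the OR is a close approximation.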
The strengths of the logistic regression approach
Logistic regression can be applied to many different
study designs (cohort, case-control, cross-sectional)
The Odds Ratio (OR) provides a good approximation
of the Relative Risk when the outcome is rare.
Fairly easy to run using many different statistical
software packages... too easy?
Multivariate: can model the influence of several
independent variables at once