An Overview of Logistic Regression: Jill McCracken, May 28, 2004

The document provides an overview of logistic regression modeling. It discusses the logistic regression model form and how parameter estimates are obtained through maximum likelihood estimation. It describes strategies for model building such as variable selection and transformations. Methods for assessing model fit are presented, including goodness-of-fit measures and graphical analyses. The document also discusses interpreting logistic regression coefficients and provides an example of applying the Hosmer-Lemeshow method for determining the best functional form for a continuous variable.
Copyright
© Attribution Non-Commercial (BY-NC)

An Overview of Logistic Regression

Jill McCracken
May 28, 2004
1 Filename/RPS Number
The objective of this presentation is to cover some of the basics of
logistic regression modeling
Logistic Regression Basics
Form of the model
Parameter estimation
Significance Testing

Model Building Strategies
Variable Selection
Transformations

Assessing Model Fit
Common goodness-of-fit measures
Graphical methods
The logistic regression model is a specific generalized linear model that is often used to represent the relationship between a number of explanatory variables and a binary response variable
The $Y_i$'s are independent and follow a Bernoulli distribution
Objective: Want to relate $E(Y|x)$ to $\beta_0 + \beta_1 x$
The Logistic Equation:
$$\log\left(\frac{\pi(x)}{1-\pi(x)}\right) = \beta_0 + \beta_1 x,$$
where $\pi(x) = E(Y|x)$ and $g(x) = \log\left(\frac{\pi(x)}{1-\pi(x)}\right)$ is the logit transformation (or log-odds)
Alternatively, we can write the above equation as
$$\pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
(Note that $0 < \pi(x) < 1$)
The logit transformation isn't the only suitable link function, but it is used most often to model binary response data for reasons of interpretation
Parameter estimates are obtained using the method of maximum likelihood
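The two forms of the logistic equation and the maximum likelihood fit can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function names, the Newton-Raphson solver and the small data set are all assumptions made for the example, and the solver handles only a single explanatory variable.

```python
import math

def logit(p):
    """Log-odds g = log(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """pi = e^eta / (1 + e^eta), arranged to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

def fit_logistic(x, y, iterations=25):
    """Newton-Raphson maximum likelihood for logit(pi(x)) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the score vector and the 2x2 information matrix.
        s0 = s1 = i00 = i01 = i11 = 0.0
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1.0 - p)
            s0 += yi - p
            s1 += (yi - p) * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        # Solve information * step = score for the 2x2 case.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return b0, b1

# Made-up illustrative data (not from the presentation).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print("estimated intercept and slope:", b0, b1)
print("fitted probability at x = 5:", inv_logit(b0 + b1 * 5))
```

The sketch assumes the data are not perfectly separated; with separated data the maximum likelihood estimates do not exist and the iterations would diverge.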
Significance tests and interval estimates are typically obtained using large sample approximations
Likelihood Ratio Tests can be applied to determine the significance of coefficients and significance of the model
  Basic approach is to compare observed values to predictions from models with and without the variable(s) of interest
  Use the test statistic
  $$G = -2\log\left(\frac{L(m_1)}{L(m_2)}\right)$$
  (where $m_1$ = model w/o the variable(s) and $m_2$ = model w/ the variable(s)); under the null hypothesis, $G$ follows an asymptotic chi-square distribution with degrees of freedom equal to the number of coefficients being tested
Wald Test: compares the MLE of the slope to an estimate of its standard error; under the null hypothesis, their ratio follows an asymptotically standard normal distribution
Score Test: asymptotically equivalent to the Wald test but does not require an explicit calculation of the MLE
Interval estimates are formed using the Wald interval, again exploiting the asymptotic normality of the MLE:
$$\hat{\beta} \pm z_{1-\alpha/2}\,\widehat{SE}(\hat{\beta})$$
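As a rough worked example of the likelihood ratio test, the Wald test and the Wald interval, the sketch below uses a single dichotomous explanatory variable, because in that case the fitted probabilities and the standard error have closed forms and no iterative fit is needed. The 2x2 counts and all variable names are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

# Hypothetical 2x2 counts (illustrative only, not from the presentation):
#            event   no event
#   x = 1      a        b
#   x = 0      c        d
a, b, c, d = 30, 20, 15, 35

def binom_loglik(events, non_events, p):
    """Bernoulli log-likelihood for a group with a common probability p."""
    return events * math.log(p) + non_events * math.log(1.0 - p)

# Model m1 (without x): a single common event probability.
p_all = (a + c) / (a + b + c + d)
loglik_m1 = binom_loglik(a + c, b + d, p_all)

# Model m2 (with x): fitted probabilities equal the two group proportions.
loglik_m2 = binom_loglik(a, b, a / (a + b)) + binom_loglik(c, d, c / (c + d))

# G = -2 log(L(m1) / L(m2)), referred to a chi-square with 1 d.o.f.
G = -2.0 * (loglik_m1 - loglik_m2)
p_value_lr = math.erfc(math.sqrt(G / 2.0))  # chi-square(1) upper tail

# Wald test: slope estimate (log odds ratio) over its estimated standard error.
beta1_hat = math.log((a * d) / (b * c))
se_beta1 = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
wald_z = beta1_hat / se_beta1
p_value_wald = math.erfc(abs(wald_z) / math.sqrt(2.0))  # two-sided normal tail

# Wald interval: beta_hat +/- z_{1 - alpha/2} * SE.
alpha = 0.05
z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
ci = (beta1_hat - z_crit * se_beta1, beta1_hat + z_crit * se_beta1)

print("G =", G, "p =", p_value_lr)
print("Wald z =", wald_z, "p =", p_value_wald)
print("95% Wald interval for the slope:", ci)
```

The two tests usually give similar but not identical answers in moderate samples, which is consistent with their agreement being a large-sample property.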

One desirable attribute of the logistic regression model is the interpretability of the estimated coefficients
In general, the slope coefficient in a logistic regression model represents the change in the logit that arises from a one-unit change in the independent variable, $\beta_1 = g(x+1) - g(x)$
For a dichotomous independent variable, the slope coefficient represents the log of the odds ratio (the ratio of the odds for x=1 to the odds for x=0), adjusting for all of the other variables in the model
For a polychotomous independent variable, the slope coefficient represents the log of the odds ratio for the design group relative to the reference group, adjusting for all of the other variables in the model
For a continuous independent variable, the slope coefficient represents the change in the log of the odds corresponding to a unit increase in x, adjusting for all of the other variables in the model
  May require a change of units to be useful
  Assumes that the relationship between the logit and the variable is linear
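A short numerical sketch of these interpretations follows. The coefficient values are hypothetical and the 10-unit rescaling is just one example of a "change of units"; neither comes from the presentation.

```python
import math

def g(x, b0, b1):
    """Logit (log-odds) as a linear function of x."""
    return b0 + b1 * x

# Hypothetical coefficients for a continuous x (for example, age in years).
b0, b1 = -3.0, 0.04

# The slope is the change in the logit for a one-unit increase in x.
slope_check = g(11, b0, b1) - g(10, b0, b1)

# Exponentiating the slope gives the odds ratio for a one-unit increase.
odds_ratio_1 = math.exp(b1)

# A one-unit change may be too small to be meaningful; for a c-unit change
# the odds ratio is exp(c * b1), here with c = 10.
odds_ratio_10 = math.exp(10 * b1)

print("change in logit per unit:", slope_check)
print("odds ratio per 1 unit:", odds_ratio_1)
print("odds ratio per 10 units:", odds_ratio_10)
```

The rescaled odds ratio relies on the linearity-in-the-logit assumption noted above: it is the same for any 10-unit increase only if the logit really is linear in x.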
Hosmer and Lemeshow provide a step-by-step guide to building
logistic regression models
STEP 1: Begin variable selection process with a careful univariate analysis of each variable, using contingency tables or
by fitting univariable logistic regression models to obtain preliminary diagnostics
STEP 2: Based on results of the univariable analyses, select variables and build a multivariable model (a good rule of thumb is to include any variable with a univariable p-value < 0.25)
STEP 3: Verify the importance of each variable in the model (iterative)
Look at Wald statistic for each variable
Compare estimated coefficients to those from the univariable models
Eliminate variables that do not contribute significantly and compare this model to the original one
Add any variables not included in the original multivariable model back in and check their significance
Yields the preliminary main effects model once this step is complete
STEP 4: Check the assumption of linearity in the logit for continuous variables
Use the design variable method or fractional polynomials to determine the correct scale of each variable
Yields the main effects model once this step is complete
STEP 5: Check for interactions between variables
Add in each possible interaction, one at a time, to the main effects model to see which are significant
Add all of the significant interactions to the main effects model at once to see which remain significant
STEP 6: Check the goodness-of-fit
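The screening rule in STEP 2 amounts to a simple filter on the univariable p-values. The sketch below assumes those p-values have already been computed; the variable names and p-values are invented for illustration.

```python
# Hypothetical univariable p-values (illustrative only).
univariable_p = {"age": 0.003, "weight": 0.18, "smoker": 0.31, "race": 0.07}

THRESHOLD = 0.25  # rule-of-thumb cutoff from STEP 2

# Variables entering the first multivariable model.
candidates = sorted(name for name, p in univariable_p.items() if p < THRESHOLD)

# Variables held back; STEP 3 adds these back in to re-check their significance.
held_back = sorted(name for name in univariable_p if name not in candidates)

print("enter multivariable model:", candidates)
print("re-check in STEP 3:", held_back)
```

The threshold is deliberately generous, so a variable that looks weak on its own still gets a second look once other variables are in the model.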
A number of goodness-of-fit measures exist to provide insight into how well the fitted model describes the outcome variable
Measures of difference between observed and fitted values
  Pearson residual: the difference between the observed and fitted values divided by its estimated standard error; form the test statistic by summing the squared residuals (asymptotically chi-squared)
  Deviance residual**: the signed square root of each observation's contribution to the total model deviance; form the test statistic by summing the squared residuals (asymptotically chi-squared)
Hosmer-Lemeshow Tests: form g groups of cases based on the values of the estimated probabilities and perform an ordinary chi-square test comparing the mean predicted probability in each group against the observed fraction of events (g-2 d.o.f.)
Classification Tables: use estimated probabilities to classify cases into the two groups
Area under ROC Curves: plot sensitivity (probability of detecting a true positive) vs. 1-specificity (probability of detecting a false positive) over a given range of possible cutpoints
  The area under this curve gives an idea of how well the model can discriminate between classes
  Useful tool for graphical comparison of multiple models
**Note that the deviance statistic does not follow an asymptotic chi-squared distribution when the number of distinct x-values is close to the number of observations
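Two of these measures, the Hosmer-Lemeshow statistic and the area under the ROC curve, are easy to sketch directly from observed outcomes and estimated probabilities. The function names, the equal-size grouping rule and the small data set below are assumptions made for this example; the statistic would still need to be referred to a chi-square table with g-2 degrees of freedom, which is not done here.

```python
def hosmer_lemeshow(y, p_hat, groups=10):
    """Return (statistic, d.o.f.) using groups of roughly equal size formed
    after sorting cases by estimated probability."""
    pairs = sorted(zip(p_hat, y))
    n = len(pairs)
    stat = 0.0
    for k in range(groups):
        chunk = pairs[k * n // groups:(k + 1) * n // groups]
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed events in the group
        expected = sum(pi for pi, _ in chunk)   # expected events in the group
        mean_p = expected / size
        stat += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return stat, groups - 2

def auc(y, p_hat):
    """Area under the ROC curve, computed as the probability that a random
    event gets a higher estimated probability than a random non-event
    (ties count one half)."""
    pos = [p for p, yi in zip(p_hat, y) if yi == 1]
    neg = [p for p, yi in zip(p_hat, y) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Made-up estimated probabilities and outcomes (illustrative only).
p_hat = [0.1, 0.1, 0.2, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.8, 0.9, 0.9]
y = [0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]

hl_stat, hl_dof = hosmer_lemeshow(y, p_hat, groups=3)
model_auc = auc(y, p_hat)
print("Hosmer-Lemeshow statistic:", hl_stat, "with", hl_dof, "d.o.f.")
print("area under ROC curve:", model_auc)
```

The grouping function assumes at least one case per group and group mean probabilities strictly between 0 and 1; three groups are used only because the example data set is tiny.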
Bibliography

Hosmer & Lemeshow, Applied Logistic Regression, Wiley & Sons, NY, 2000.
Harrell, Regression Modeling Strategies, Springer-Verlag, NY, 2001.
Casella & Berger, Statistical Inference, 2nd Ed., Duxbury Press, 2002.
Application of the Hosmer & Lemeshow Design Variable
approach suggested that converting RM to a binary indicator might
work well in the regression model
Hosmer & Lemeshow Design Variable Method
STEP 1: Obtain the quartiles for the variable of interest
STEP 2: Create a categorical variable with four levels
using three cutpoints based on the quartiles
STEP 3: Fit the multivariable model replacing the
continuous variable with the four-level categorical
variable (this will require three design variables, using the
lowest quartile as the reference category)
STEP 4: Plot the estimated coefficients versus the
quartile midpoints for each group; Plot coefficient = 0 at
the midpoint of the first quartile
STEP 5: Visually inspect plot & choose most logical
parametric shape for the scale of the variable
STEP 6: Refit the model using the parametric form
suggested by the plot
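STEPS 1, 2 and 4 of the design variable method involve only simple data preparation, sketched below. The function name, the sample values and the use of the observed minimum and maximum as the outer ends of the first and last quartile groups are assumptions made for illustration; fitting the model in STEP 3 is not shown.

```python
from statistics import quantiles

def quartile_design_variables(values):
    """Return the three quartile cutpoints, the four group midpoints and,
    for each value, three 0/1 design variables (lowest quartile = reference)."""
    q1, q2, q3 = quantiles(values, n=4)
    rows = []
    for v in values:
        # Level 0 is the lowest quartile group, level 3 the highest.
        level = (v > q1) + (v > q2) + (v > q3)
        rows.append([int(level == k) for k in (1, 2, 3)])
    edges = [min(values), q1, q2, q3, max(values)]
    midpoints = [(lo + hi) / 2.0 for lo, hi in zip(edges, edges[1:])]
    return (q1, q2, q3), midpoints, rows

# Made-up values of a continuous variable (illustrative only).
values = [4.9, 5.6, 5.9, 6.0, 6.2, 6.3, 6.5, 6.8, 7.1, 7.4, 7.9, 8.4]
cutpoints, midpoints, design = quartile_design_variables(values)

print("cutpoints:", cutpoints)
print("quartile midpoints for the plot in STEP 4:", midpoints)
print("design variables for the smallest value:", design[0])
print("design variables for the largest value:", design[-1])
```

The three fitted design-variable coefficients would then be plotted against the last three midpoints, with a coefficient of 0 plotted at the first midpoint for the reference group.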
[Figure: H-L Design Variable Method. Fit coefficient (y-axis, -0.05 to 0.35) plotted against quartile midpoint (x-axis, 4 to 9).]
Further investigation showed a more gradual increase of the
coefficient with respect to the value of RM

APPROACH: Followed the Hosmer & Lemeshow method using deciles instead of quartiles to identify the cutpoint for the binary indicator
RESULT: Fairly smooth increase in the value of the coefficient that was not evident in the quartile plot
Tried creating a binary indicator cutting RM at the 80th percentile and fit a new OLS model for comparison with the others
[Figure: H-L Design Variable Method. Fit coefficient (y-axis, -0.05 to 0.35) plotted against decile midpoint (x-axis, 4 to 9). Annotation: created binary indicator cutting RM at the 80th percentile.]
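Cutting a continuous variable at its 80th percentile to form a binary indicator can be sketched as follows. The sample values are made up, and the choice to code values strictly above the cutpoint as 1 is an assumption; the presentation does not say how values equal to the cutpoint were treated.

```python
from statistics import quantiles

def indicator_at_80th_percentile(values):
    """Return the 80th-percentile cutpoint and a 0/1 indicator per value."""
    deciles = quantiles(values, n=10)  # nine cutpoints: 10th, 20th, ..., 90th
    cut = deciles[7]                   # eighth cutpoint = 80th percentile
    return cut, [int(v > cut) for v in values]

# Made-up values of a continuous variable (illustrative only).
values = [4.9, 5.6, 5.9, 6.0, 6.2, 6.3, 6.5, 6.8, 7.1, 7.4, 7.9, 8.4]
cut, flags = indicator_at_80th_percentile(values)

print("80th percentile cutpoint:", cut)
print("binary indicator:", flags)
```

The same list of decile cutpoints could also be used to build the ten-level design variables for the decile version of the plot.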
