L9 Logistical Regression Models Updated

1) Logistic regression is used when the outcome variable is categorical with two or more categories. Binary logistic regression is used for two categories, and multinomial logistic regression is used for more than two categories.
2) Unlike linear regression, logistic regression models the log odds rather than the probability of the outcome variable. This allows the predicted probabilities to remain between 0 and 1.
3) Odds ratios from logistic regression indicate how much more or less likely an outcome is given a one unit increase in the predictor variable, compared to the baseline level. Odds ratios above 1 indicate increased likelihood, while those between 0 and 1 indicate decreased likelihood.

Uploaded by Mahmudul hasan

3/31/2023

Logit Models: Nonstop fun for the whole family
L9
“Gee dad, after dinner will you show us how to do logit models?”
“Hey I want to know about ordered Logit Models”
“Dad, I want to learn about Multinomial Logit models!!!”
“But I want Conditional Logit Models”

Logistic regression
• A version of the multiple regression in which the outcome (i.e., the DV) is a categorical variable.
– If the # of categories = 2, then it is called binary logistic regression (or simply a logit model)
– If the # of categories > 2, then it is called multinomial logistic regression (or multinomial logit model)

Multiple regression (MR) vs. logistical regression (LR)
• MR: Predicting value of Y for a given value of X…
• LR: Predict the probability of Y occurring given known values of X (or X’s)
– Linear probability models
– Logit models
– Binomial logit models

Form of Logit Relationship b/t DV & IV
[Figure: Prob of Event (DV), from 0 to 1.0, plotted against Level of Independent variable, Low to High. Hair et al, 4th ed; p. 131]


Dad, why can’t we use ordinary linear regression?
• Model needs to be linear…violated with categorical variable.
• Values are constrained to 0-1 …with a linear regression predicted values of Y may exceed 1 or be less than 0. (See diagram)
• To deal with this problem…
– Transform the data to express nonlinear relationships in a linear form

Prob (event)/ Prob (No event) = e^(B0 + B1X1 +….BnXn)
– “e” is the base of the natural log (~2.71…)
– B0, B1, …, Bn are the estimated coefficients
– ODDS RATIO - Measures the change in the ratio of the probabilities.

For coefficients
• Take natural log of both sides of the equation
• → Log odds ratio
Ln(Prob (event)/ Prob (No event)) = B0 + B1X1 +….BnXn

Dad…I’m kinda confused about the odds ratio thing
• Probability of success of some event is 0.8
• Then the probability of failure is 1 - 0.8 = 0.2
• Odds of success = Prob of success / Prob of failure
• Odds of success = .8/.2 = 4
• Often loosely called the Odds Ratio (OR); strictly, an odds ratio compares two such odds
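The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the helper names odds_from_prob and prob_from_odds are made up for illustration and are not part of the slides.

```python
import math

def odds_from_prob(p):
    """Odds in favor of an event that has probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)

def prob_from_odds(odds):
    """Convert odds back into a probability."""
    return odds / (1.0 + odds)

p_success = 0.8                      # probability of success from the slide
odds = odds_from_prob(p_success)     # .8 / .2, about 4
log_odds = math.log(odds)            # the quantity a logit model is linear in

print("odds of success:", round(odds, 4))
print("log odds:", round(log_odds, 4))
print("back to probability:", round(prob_from_odds(odds), 4))
```

Going in the other direction, odds of 1 correspond to a probability of 0.5 and odds of 2 to a probability of about 0.667.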


Gee dad, how do I interpret the sign?
• If reporting coefficients
– A positive coefficient → increase the probability or likelihood of Y = 1.
– A negative coefficient → decreases the likelihood of Y = 1.
• If reporting the odds ratio
– A value greater than 1 → increase the odds in favor of Y = 1
– A value less than 1 (but positive…it will always be positive) → decrease the odds in favor of Y = 1

Odds Ratios versus Coefficients

Odds Ratio    Coefficient = Ln(Odds Ratio)
1.002267       0.00226
2.234545       0.80404
0.9           -0.10536
0.5           -0.69315
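The table above can be reproduced by taking the natural log of each odds ratio. The following Python sketch does that; the variable names are chosen here for illustration.

```python
import math

# (odds ratio, coefficient as printed on the slide)
table = [
    (1.002267, 0.00226),
    (2.234545, 0.80404),
    (0.9, -0.10536),
    (0.5, -0.69315),
]

# Coefficient = ln(odds ratio); odds ratio = exp(coefficient).
recomputed = [math.log(odds_ratio) for odds_ratio, _ in table]

for (odds_ratio, printed), coef in zip(table, recomputed):
    sign = "positive" if coef > 0 else "negative"
    print(f"OR = {odds_ratio:<9} ln(OR) = {coef: .5f} (slide: {printed}) -> {sign} coefficient")
```

An odds ratio above 1 always maps to a positive coefficient and an odds ratio below 1 to a negative one, which is why either can be reported.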

In other words…
• Odds ratio = 1.0 → the associated regression coefficient is zero (B = 0)…indicating an absence of a relationship.
• Odds ratio > 1.0 → corresponds to a positive B (regression) coefficient…reflecting an increase in odds of being a case category associated with each unit increase in X.
• Odds ratio < 1.0 → corresponds to a negative B (regression) coefficient…reflecting a decrease in odds of being a case category associated with each unit increase in X.

A Fun Logit Example
• Predicting the Probability of promotion to Associate Professor
– Logit (promotion) = B0 + B1*Publications = .39*publications – 6.00; where B1 = .39 and B0 = -6.00
– The .39 implies that the “predicted logit” increases by 0.39 for each increase of one publication
• Odds (promotion) = e^(.39*publications – 6.00)
• Probability (promotion) = 1/ (1 + e^-(.39*publications – 6.00))
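To see how the promotion example behaves, the Python sketch below plugs a few publication counts into the fitted equation. The coefficients B0 = -6.00 and B1 = .39 come from the slide; the publication counts and function names are hypothetical. The probability uses the standard logistic form 1/(1 + e^(-logit)), which is the same as odds/(1 + odds).

```python
import math

B0 = -6.00   # constant from the slide
B1 = 0.39    # coefficient on publications from the slide

def logit_promotion(publications):
    """Predicted log odds of promotion."""
    return B0 + B1 * publications

def odds_promotion(publications):
    """Predicted odds of promotion: e raised to the logit."""
    return math.exp(logit_promotion(publications))

def prob_promotion(publications):
    """Predicted probability of promotion: 1 / (1 + e^(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit_promotion(publications)))

# Hypothetical publication counts, chosen only to show the shape of the curve.
for pubs in (0, 5, 10, 15, 20):
    print(pubs, round(logit_promotion(pubs), 2),
          round(odds_promotion(pubs), 4), round(prob_promotion(pubs), 4))
```

Each extra publication adds 0.39 to the logit and multiplies the odds by e^.39, but the change in probability depends on where along the curve the candidate starts.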


Another Really Fun Logit Example
• DV = Predicting compliance with Mammogram Screening guidelines (1=compliance; 0=no compliance)

Variable     B        Exp(B) (Odds ratio)
PHYSREC      1.842    6.311
KNOWLEDG     -.079    .924
BENEFITS     .544     1.722
BARRIERS     -.581    .559
Const.       -3.051

• Interpretation
– For a physician’s recommendation, (c.p.), 1.842 → higher likelihood of compliance. Recall that it is a logistical relationship… So the odds ratio is computed using the exponential function → e^1.842 = 6.311 (“e” equals about 2.718).
– So the odds of compliance with a physician’s recommendation increase by a factor of over 6!
– Alternatively, perceived barriers (B = -.581) corresponds to an odds ratio of e^-.581 = .559. So the odds of compliance with perceived barriers are multiplied by .559 (a decrease of about 44%).

Dad, you’re the best!!! But how do I talk about it in a paper?
• Hypotheses
– H1: X1 increases the likelihood of Y
– H2: X2 decreases the likelihood of Y
– Publications increase the likelihood of receiving tenure.
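As a rough check on the mammogram example, the Python sketch below exponentiates each coefficient from the slide and also expresses the odds ratio as a percentage change in the odds. The dictionary and variable names are illustrative only.

```python
import math

# Coefficients (B) as listed on the slide.
coefficients = {
    "PHYSREC": 1.842,
    "KNOWLEDG": -0.079,
    "BENEFITS": 0.544,
    "BARRIERS": -0.581,
}

# Odds ratio = exp(B); percent change in odds = (odds ratio - 1) * 100.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}
percent_change = {name: (ratio - 1.0) * 100.0 for name, ratio in odds_ratios.items()}

for name in coefficients:
    print(f"{name:<9} B = {coefficients[name]: .3f}  "
          f"OR = {odds_ratios[name]:.3f}  change in odds = {percent_change[name]:+.1f}%")
```

The recomputed odds ratios agree with the slide to within rounding of the printed coefficients.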

Odds Ratio & Ln(Odds Ratio)
• X1 – Gender dummy (Male =1; female =0)
• DV = Admittance to a particular graduate program
• Assume B1 coefficient = 1.694596
– Interpretation → a one unit change in gender (i.e., being male) results in a 1.694596 unit change in the log odds of being admitted.
• Odds ratio → e^1.694596 = 5.44
– Meaning = The odds of being admitted increases by a factor of 5.44 for males.

Assessing the model
• Likelihood value: Similar to sum of squared errors…in that it is an indicator of how much unexplained error information there is after the model has been fitted.
– It measures -2 log of the likelihood value
– -2LL or -2 Log likelihood
– Well fitted model → -2LL has small value.
– Perfect fit → likelihood = 1; -2LL = 0
• LL…Compare (new model) vs. a baseline model (only a constant…that is it assumes all coefficients equal “0”)…
– R^2Logit = (-2LLnull – (-2LLmodel))/-2LLnull
– Chi square distribution
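The R^2Logit formula can be tried on the log pseudolikelihood values from the Stata output reproduced later in these notes. This Python sketch assumes that the iteration 0 value (-249.98826) is the constant-only baseline and the final value (-229.25875) is the fitted model; that reading is an assumption made here, not something the slides state.

```python
# Log pseudolikelihoods taken from the Stata output in these notes.
# Assumption: iteration 0 is the constant-only (null) model.
ll_null = -249.98826
ll_model = -229.25875

neg2ll_null = -2.0 * ll_null     # -2LL of the baseline model
neg2ll_model = -2.0 * ll_model   # -2LL of the fitted model

# R^2Logit = (-2LLnull - (-2LLmodel)) / -2LLnull
pseudo_r2 = (neg2ll_null - neg2ll_model) / neg2ll_null

# The drop in -2LL is the quantity compared against a chi-square distribution.
drop_in_neg2ll = neg2ll_null - neg2ll_model

print("pseudo R2:", round(pseudo_r2, 4))
print("drop in -2LL:", round(drop_in_neg2ll, 3))
```

The pseudo R2 computed this way rounds to the 0.0829 printed in the output. That output reports a Wald chi-square because robust standard errors were requested, so the drop in -2LL here is shown only to illustrate the formula.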


But how do you estimate it???
• Maximum likelihood estimation
– Originally developed by R.A. Fisher in the 1920s, states that the desired probability distribution is the one that makes the observed data “most likely,”
– → one must seek the value of the parameter vector that maximizes the likelihood function.
• The resulting parameter vector (i.e., the coefficients), which is sought by searching the multi-dimensional parameter space, is called the MLE estimate

But dad, what about the predictor variables?
• The Wald test is unreliable in logit analysis…
– Use a LR (likelihood ratio) test
– Estimation of a logit model is usually by MLE.
– No universally-accepted goodness of fit measure (i.e., pseudo R2)
– Be careful if you use percentage of correct predictions
– Kennedy mentions a test of the fraction of 1’s correctly predicted + the fraction of zeroes correctly predicted. Should be greater than 1 (see p. 249)

Dichotomous DV (Two choices/categories)
• DV → 0-1 dummy variable
– Can’t have values outside of 0 or 1 …(e.g., it is a buy, no buy decision)
• Logit model → uses logistical distribution
• Probit model → uses standard normal distribution
– They tend to produce similar results
– Heterosked. causes major problems for Logit and Probit models
– Use probit models for sample selection (Heckman models)!!!

Bowen & Wiersema (Modeling Limited DVs)
• The interpretation of the directional impact (+ or -) of a change in an explanatory variable in the binary LM or PM is identical to that for OLS, except, …the direction of the effect refers to the change in the probability of the choice for which y = 1.


See STATA Logit example

. logit admit gre gpa i.rank, robust

Iteration 0:   log pseudolikelihood = -249.98826
Iteration 1:   log pseudolikelihood = -229.66446
Iteration 2:   log pseudolikelihood = -229.25955
Iteration 3:   log pseudolikelihood = -229.25875
Iteration 4:   log pseudolikelihood = -229.25875

Logistic regression                             Number of obs   =        400
                                                Wald chi2(5)    =      36.66
                                                Prob > chi2     =     0.0000
Log pseudolikelihood = -229.25875               Pseudo R2       =     0.0829

------------------------------------------------------------------------------
             |               Robust
       admit |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gre |   .0022644   .0011027     2.05   0.040     .0001032    .0044257
         gpa |   .8040377   .3451359     2.33   0.020     .1275838    1.480492
             |
        rank |
          2  |  -.6754429   .3144686    -2.15   0.032     -1.29179   -.0590958
          3  |  -1.340204   .3445257    -3.89   0.000    -2.015462   -.6649459
          4  |  -1.551464   .4160544    -3.73   0.000    -2.366915   -.7360121
             |
       _cons |  -3.989979   1.138089    -3.51   0.000    -6.220593   -1.759366
------------------------------------------------------------------------------

Interpreting the logit output
See Logit model annotated output
– For every one unit change in gre, the log odds of admission (versus non-admission) increases by 0.002.
– For a one unit increase in gpa, the log odds of being admitted to graduate school increases by 0.804.
– The indicator variables for rank have a slightly different interpretation. For example, having attended an undergraduate institution with rank of 2, versus an institution with a rank of 1, decreases the log odds of admission by 0.675.
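The coefficients in the output are on the log odds scale, so it can help to turn them into a predicted probability. The Python sketch below does this for a hypothetical applicant (gre = 600, gpa = 3.5, both made-up values) at each rank. The coefficient values are copied from the output above and the function name is illustrative.

```python
import math

# Coefficients copied from the Stata output above.
COEF = {"_cons": -3.989979, "gre": 0.0022644, "gpa": 0.8040377}
# Rank 1 is the omitted reference category, so its shift is 0.
RANK_SHIFT = {1: 0.0, 2: -0.6754429, 3: -1.340204, 4: -1.551464}

def predicted_probability(gre, gpa, rank):
    """Predicted probability of admission from the fitted logit model."""
    log_odds = COEF["_cons"] + COEF["gre"] * gre + COEF["gpa"] * gpa + RANK_SHIFT[rank]
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical applicant: gre = 600, gpa = 3.5, evaluated at each rank.
for rank in (1, 2, 3, 4):
    print("rank", rank, "->", round(predicted_probability(600, 3.5, rank), 3))
```

Moving from rank 1 to rank 2 lowers the log odds by 0.675 for every applicant, but the drop in predicted probability depends on the applicant's gre and gpa.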

Similar results for a probit model

Additional tests


Hoetker, 2007
• Since y* is unobserved, we do not know the distribution of the errors, ε
• In order to use maximum likelihood estimation (ML), we need to make some assumption about the distribution of the errors.
• A good (but a bit technical) summary of MLE:
– https://online.stat.psu.edu/stat415/lesson/1/1.2

• For logit models that report the odds ratio
– 1:1 → an event is equally likely to occur (50% prob)
– 2:1 → an event is twice as likely to occur as not (66.7% prob)
– The effect of a one unit change in variable X is to change the odds by a factor of exp(Bx)
– Values > 1 increase the odds of the event occurring
– Values < 1 decrease the odds of the event occurring

Dad, “What if you have more than 2 categories?”
• Multinomial logit models
– Use when there are > 2 categories
– Categories
• Ordinal → consisting of ordered categories
– Socio-economic status (e.g., lower, middle, upper class)
– Use ordered logit model (in Stata, ologit command)
• or
• Nominal → consisting of unordered categories
– Favorite ice cream flavor (e.g., Vanilla, chocolate, strawberry, manatee poop (FL))

CEO’s preferred flavor of ice cream: Chocolate, Vanilla, or Strawberry
Manatee Poop… Served at Lickety Split in Englewood FL!!!


The Florida Manatee

Hypotheses and testing for multiple categories
• Hypothesis examples
– Age increases the likelihood of choosing chocolate versus manatee poop
– Musicians are more likely to choose vanilla versus manatee poop
• Testing
– A variable named flavor
• chocolate = 1; vanilla = 2; manatee poop = 3
– Anchor on manatee poop
– mlogit flavor age Musicians gender, base(3) robust

Multinomial logit/multinomial probit
– Polychotomous DVs (Many choices/categories)
• Consider four choices → A, B, C, and D.
• Consider A as the Base case…compare A – B, A – C, and A – D
– Weakness of the MNL →
• Characterized by the IIA (Independence of irrelevant alternatives)
• MNL → it is inappropriate when two or more alternatives are close substitutes
• MNP allows the error terms to be correlated across alternatives thereby permitting it to circumvent the IIA problem.

Interpretation of coefficient estimates
Manatee Poop vs. Vanilla or Chocolate vs. Vanilla
• Different than linear regression
– Other choices (chocolate or Manatee Poop) relative to the reference group (vanilla)
– For a 1 unit change in the IV, the multinomial log odds for preferring Manatee Poop to the reference group (Vanilla) is expected to increase/decrease by coefficient units, ceteris paribus.
• More useful to examine the marginal effects of a variable (Hoetker, 2007)
• “How much a change in a variable changes the probability of the focal outcome” (p. 334)
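A small numeric sketch may help separate the two interpretations above. The Python code below uses made-up coefficients (not estimates from any slide) with vanilla as the reference group. It shows that the log odds relative to vanilla equal the linear predictor, and it approximates the marginal effect of age as the change in probability for a one-year increase.

```python
import math

# Hypothetical coefficients for illustration only (not estimates from the slides).
# Vanilla is the reference group, so its linear predictor is fixed at 0.
COEFS = {
    "chocolate": {"_cons": 0.5, "age": -0.02},
    "manatee poop": {"_cons": -1.0, "age": 0.01},
}
BASE = "vanilla"

def choice_probabilities(age):
    """Multinomial logit probabilities for every flavor at a given age."""
    linear = {BASE: 0.0}
    for flavor, b in COEFS.items():
        linear[flavor] = b["_cons"] + b["age"] * age
    denom = sum(math.exp(v) for v in linear.values())
    return {flavor: math.exp(v) / denom for flavor, v in linear.items()}

probs = choice_probabilities(age=40)

# Coefficient interpretation: log odds of manatee poop versus vanilla.
log_odds_vs_base = math.log(probs["manatee poop"] / probs[BASE])

# Marginal-effect style interpretation: change in probability for one more year of age.
delta_prob = choice_probabilities(age=41)["manatee poop"] - probs["manatee poop"]

print({flavor: round(p, 4) for flavor, p in probs.items()})
print("log odds vs vanilla:", round(log_odds_vs_base, 4))
print("change in Pr(manatee poop) for +1 year:", round(delta_prob, 5))
```

The coefficient describes a change in log odds relative to the reference group, while the change in probability also depends on the coefficients of the other alternatives.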


Stata code
• Logit and Probit
– logit DV IV1 IV2…
– probit DV IV1 IV2…
• Multinomial Logit & Multinomial Probit
– mlogit DV IV1 IV2…
– mprobit DV IV1 IV2…

See STATA handout
• Multinomial Logit Model Example handout

Dad, one more question, “What if I have panel data?”
• Logit and Probit
– xtlogit DV IV1 IV2…
– xtprobit DV IV1 IV2…
• Multinomial Logit & Multinomial Probit
– xtmlogit DV IV1 IV2…
– xtmprobit DV IV1 IV2…

Dad, will you talk about Conditional Logit Models?
• “Not today son…We have to save some of the fun for next time”
• “Ok dad”


For more on Logit Models,
• https://www.youtube.com/watch?v=vCSh613UMic
