Dummy Variable CH 5
Dummy variables
Logit and Probit models
• Logit and probit models are regression models for a binary
dependent variable, commonly used in statistical analysis for
binary classification.
• This means that the outcome of interest can take only two
possible values (classes).
• In most cases, these models are used to predict whether or
not an event will happen.
• For example, a bank might want to know whether a particular
borrower will default on a loan.
What are Logit Models?
• Logit models are statistical models used to predict the
probability of an event occurring.
• Logit models are also called logistic regression models.
• The logit model is based on the logistic function (also called
the sigmoid function), which is used to model situations with
two possible (binary) outcomes.
• The logistic function can be used to model binary
(dichotomous) dependent variables and, with extensions such as
the multinomial logit, categorical data with more than two classes.
Cont.
• The logistic function models the relationship between the
predictors and the probability of the event occurring: it maps
any value of the linear predictor to an output on a continuous
scale between 0 and 1.
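As a minimal sketch of the function described above, the logistic (sigmoid) function can be written in a few lines of Python. The function name `logistic` and the sample values of z are illustrative assumptions, not part of the slides.

```python
import math

def logistic(z: float) -> float:
    """Map a linear predictor z (any real number) to a probability in (0, 1)."""
    # Branch on the sign of z so that math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Illustrative values only: very negative z gives a probability near 0,
# z = 0 gives exactly 0.5, and very positive z gives a probability near 1.
for z in (-5.0, 0.0, 5.0):
    print(z, round(logistic(z), 4))
```

However large or small z becomes, the output stays strictly between 0 and 1, which is what lets it be read as a probability.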
Cont.
• Logit models generally take one of two forms: multinomial logits and binary
logits.
• Multinomial logits predict one of several mutually exclusive outcomes,
while binary logits predict a 1 or 0 outcome for a single binary dependent variable.
• In both cases, the model takes into account independent variables that may
influence the outcome. When predicting whether a borrower will default on a
loan, these might include the borrower's credit score, income,
debt-to-income ratio, and loan amount.
• The model then produces an estimated probability, which is compared
against a predetermined threshold (for example, 0.5) to assign the
predicted outcome.
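The two steps in the loan example (estimate a probability, then compare it with a threshold) can be sketched as follows. The coefficients, the variable names, and the borrower values are hypothetical and chosen only for illustration; in practice the coefficients would be estimated from data.

```python
import math

# Hypothetical coefficients, for illustration only (not estimated from data).
INTERCEPT = -1.0
COEFS = {"debt_to_income": 4.0, "credit_score_scaled": -2.0}

def default_probability(borrower: dict) -> float:
    """Estimated probability of default from the linear predictor z."""
    z = INTERCEPT + sum(COEFS[name] * borrower[name] for name in COEFS)
    return 1.0 / (1.0 + math.exp(-z))

def classify(p: float, threshold: float = 0.5) -> int:
    """Predict 1 (default) if the probability reaches the threshold, else 0."""
    return 1 if p >= threshold else 0

# Hypothetical borrower: z = -1.0 + 4.0 * 0.6 - 2.0 * 0.3 = 0.8
borrower = {"debt_to_income": 0.6, "credit_score_scaled": 0.3}
p = default_probability(borrower)
print(round(p, 3), classify(p), classify(p, threshold=0.8))
```

The threshold is a separate decision from the model itself: the same estimated probability is classified as a default at a threshold of 0.5 but not at a stricter threshold of 0.8.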
Cont.
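• The logit model expresses the log of the odds of an event as a linear
function of the independent variables:
$$\ln(l) = \ln\left(\frac{P}{1-P}\right) = Z = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k$$
• where
• P is the probability of the event occurring,
• l = P / (1 − P) is the odds of the event occurring, and
• Z is the linear combination of the independent variables X_1, ..., X_k with coefficients β_0, ..., β_k.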
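The link between the probability P, the odds, and the linear combination Z can be checked numerically. The sketch below assumes the standard definitions (odds = P / (1 − P), and Z equal to the natural log of the odds); the function names and the example value P = 0.8 are illustrative.

```python
import math

def odds(p: float) -> float:
    """Odds of an event with probability p (requires 0 < p < 1)."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Log-odds: the value of Z that corresponds to probability p."""
    return math.log(odds(p))

def inverse_logit(z: float) -> float:
    """Logistic function: recover the probability from Z."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8                   # illustrative probability
z = logit(p)              # log of odds of about 4, i.e. roughly 1.386
print(odds(p), z, inverse_logit(z))
```

Under this form, a one-unit increase in an independent variable with coefficient β adds β to Z and therefore multiplies the odds by e^β.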
What are Probit Models?
• Link function:
• The main difference between logit and probit models lies in the
choice of the link function used to relate the predictor variables
to the probability of the event occurring.
• The logit model maps the linear predictor to a probability with
the logistic (sigmoid) function, while the probit model uses Φ,
the cumulative distribution function of the standard normal
distribution.
• Usage:
• The logit model is more widely used than the probit model and
has a more extensive literature.
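To make the difference in link functions concrete, the sketch below evaluates both curves at the same values of the linear predictor Z. It uses only the Python standard library; the function names and the grid of Z values are illustrative assumptions.

```python
import math

def logistic_cdf(z: float) -> float:
    """Logit model: P = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def standard_normal_cdf(z: float) -> float:
    """Probit model: P = Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Both curves are S-shaped, pass through 0.5 at z = 0, and stay within (0, 1).
for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(z, round(logistic_cdf(z), 4), round(standard_normal_cdf(z), 4))
```

For the same Z, the two functions give different probabilities because they are on different scales, so logit and probit coefficients estimated on the same data are not directly comparable in magnitude.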
Key Differences between Logit & Probit Models
• Model Assumptions:
• Logit and Probit models make different assumptions about the
distribution of the error term.
• Logit models assume that the error term follows a logistic
distribution, while Probit models assume that the error term
follows a normal distribution.
• Outliers:
• The logit model is generally considered more robust to outliers
because the logistic distribution has heavier tails than the normal
distribution; the probit model is more sensitive to outliers.
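One way to see where the difference in sensitivity to outliers comes from is to compare how much probability each error distribution places far from zero. The sketch below assumes both distributions are rescaled to variance 1 so the comparison is like for like; that rescaling and the function names are assumptions made for illustration.

```python
import math

# Scale s that gives a logistic distribution variance 1
# (the variance of a logistic distribution is (pi * s)^2 / 3).
UNIT_VARIANCE_SCALE = math.sqrt(3.0) / math.pi

def normal_tail(k: float) -> float:
    """P(error > k) for a standard normal error (probit assumption)."""
    return 0.5 * math.erfc(k / math.sqrt(2.0))

def logistic_tail(k: float) -> float:
    """P(error > k) for a logistic error rescaled to variance 1 (logit assumption)."""
    return 1.0 / (1.0 + math.exp(k / UNIT_VARIANCE_SCALE))

# Compare the upper-tail probabilities k standard deviations from zero.
for k in (2.0, 3.0, 4.0):
    print(k, f"{logistic_tail(k):.2e}", f"{normal_tail(k):.2e}")
```

Far from zero the logistic tail probability is noticeably larger than the normal one, so an extreme observation is less surprising under the logit assumption and tends to pull the fit less.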