Discrete Choice Methods and Their Applications To Short Term Travel Decisions
Introduction
2 M. Ben-Akiva and M. Bierlaire
DISCRETE CHOICE METHODS
DRAFT
Decision-maker
Discrete choice models are also referred to as disaggregate
models, meaning that the decision-maker is assumed to be an
individual. The “individual” decision-making entity depends on the
particular application. For instance, we may consider that a group
of persons (a household or an organization, for example) is the
decision-maker. In doing so, we may ignore all internal interactions
within the group, and consider only the decisions of the group as a
whole. We refer to “decision-maker” and “individual”
interchangeably throughout this chapter. To explain the
heterogeneity of preferences among decision-makers, a
disaggregate model must include their characteristics such as the
socio-economic variables of age, gender, education and income.
Alternatives
Analyzing individual decision making requires not only
knowledge of what has been chosen, but also of what has not been
chosen. Therefore, assumptions must be made about available
options, or alternatives, that an individual considers during a choice
process. The set of considered alternatives is called the choice set.
A discrete choice set contains a finite number of alternatives that
can be explicitly listed. The choice of a travel mode is a typical
example of a choice from a discrete choice set. The identification
of the list of alternatives is a complex process usually referred to as
choice set generation. The most widely used method for choice set
generation uses deterministic criteria of alternative availability. For
example, the possession of a driver’s license determines the
availability of the auto drive option.
The universal choice set contains all potential alternatives in the
application’s context. The choice set is the subset of the universal
choice set considered by, or available to, a particular individual.
Alternatives in the universal choice set that are not available to the
individual are therefore excluded from the choice set.
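The deterministic availability rule described above can be illustrated with a minimal sketch. The alternative names, the `Traveler` class, and the single licence-based rule are hypothetical and chosen only for illustration; a real application would define its own universal choice set and criteria.

```python
from dataclasses import dataclass


@dataclass
class Traveler:
    has_license: bool


# Hypothetical universal choice set for a mode choice application.
UNIVERSAL_CHOICE_SET = ["auto_drive", "auto_passenger", "transit", "walk"]


def is_available(alternative: str, traveler: Traveler) -> bool:
    """Deterministic availability criterion (illustrative assumption)."""
    if alternative == "auto_drive":
        return traveler.has_license
    return True


def choice_set(traveler: Traveler) -> list:
    """Subset of the universal choice set available to this traveler."""
    return [a for a in UNIVERSAL_CHOICE_SET if is_available(a, traveler)]


print(choice_set(Traveler(has_license=False)))
```

A traveler without a licence simply loses the auto drive alternative; every other alternative of the universal set remains in the choice set.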
In addition to availability, the decision-maker’s awareness of the
alternative could also affect the choice set. The behavioral aspects
of awareness introduce uncertainty in modeling the choice set
generation process and motivate the use of probabilistic choice set
generation models that predict the probability of each feasible
choice set within the universal set. A discrete choice model with a
probabilistic choice set generation model is described later in this
chapter as a special case of the latent class choice model.
Attributes
Each alternative in the choice set is characterized by a set of
attributes. Note that some attributes may be generic to all
alternatives, and some may be alternative-specific.
An attribute is not necessarily a directly measurable quantity. It
can be any function of available data. For example, instead of
considering travel time as an attribute of a transportation mode, the
logarithm of the travel time may be used, or the effect of out-of-
pocket cost may be represented by the ratio between the out-of-
pocket cost and the income of the individual. Alternative
definitions of attributes as functions of available data must usually
be tested to identify the most appropriate.
Decision rule
distributed (or Type I extreme value). That is, εin for all i,n is
distributed as:
\[ F(\varepsilon) = \exp\left[-e^{-\mu(\varepsilon-\eta)}\right], \quad \mu > 0, \]
\[ f(\varepsilon) = \mu e^{-\mu(\varepsilon-\eta)} \exp\left[-e^{-\mu(\varepsilon-\eta)}\right]. \]
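As a quick sanity check of the two expressions, the following sketch evaluates the Gumbel distribution function and density and compares the density with a numerical derivative of the distribution function. The parameter values are arbitrary and serve only as an illustration.

```python
import math


def gumbel_cdf(eps: float, eta: float = 0.0, mu: float = 1.0) -> float:
    """F(eps) = exp(-exp(-mu * (eps - eta))), with mu > 0."""
    return math.exp(-math.exp(-mu * (eps - eta)))


def gumbel_pdf(eps: float, eta: float = 0.0, mu: float = 1.0) -> float:
    """f(eps) = mu * exp(-mu * (eps - eta)) * exp(-exp(-mu * (eps - eta)))."""
    z = math.exp(-mu * (eps - eta))
    return mu * z * math.exp(-z)


# Arbitrary illustrative values for the location eta and the scale mu.
eta, mu, x, h = 0.5, 2.0, 0.7, 1e-6

# Central finite difference of F should match f.
numeric = (gumbel_cdf(x + h, eta, mu) - gumbel_cdf(x - h, eta, mu)) / (2 * h)
print(abs(numeric - gumbel_pdf(x, eta, mu)) < 1e-6)
```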
j∈C n
\[ \frac{P(i|C_1)}{P(j|C_1)} = \frac{P(i|C_2)}{P(j|C_2)}. \]
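The ratio property can be checked numerically. The sketch below assumes the usual Multinomial Logit form, with probabilities proportional to exp(µV); the alternatives and utility values are hypothetical.

```python
import math


def mnl_probabilities(utilities: dict, mu: float = 1.0) -> dict:
    """Multinomial Logit probabilities, proportional to exp(mu * V)."""
    v_max = max(utilities.values())  # subtract the maximum for numerical stability
    weights = {k: math.exp(mu * (v - v_max)) for k, v in utilities.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical systematic utilities.
V = {"car": -1.0, "bus": -1.5, "rail": -1.2}

p_c1 = mnl_probabilities(V)                                    # C1 = {car, bus, rail}
p_c2 = mnl_probabilities({k: V[k] for k in ("car", "bus")})    # C2 = {car, bus}

ratio_c1 = p_c1["car"] / p_c1["bus"]
ratio_c2 = p_c2["car"] / p_c2["bus"]
print(ratio_c1, ratio_c2)
```

Both ratios equal exp(V_car − V_bus) for µ=1, whatever other alternatives the choice set contains.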
[Figure 1: Route choice example between origin O and destination D, with Path 1 and Path 2; Path 2 has two variants, a and b.]
\[ P(1|\{1,2a,2b\}) = P(2a|\{1,2a,2b\}) = P(2b|\{1,2a,2b\}) = \frac{e^{\mu T}}{\sum_{j\in\{1,2a,2b\}} e^{\mu T}} = \frac{1}{3} \]
\[ C_n = \bigcup_{m=1}^{M} C_{mn} \]
and
\[ C_{mn} \cap C_{m'n} = \emptyset \quad \forall\, m \neq m'. \]
l =1
and
\[ P(i|C_{mn}) = \frac{e^{\mu_m \tilde{V}_{in}}}{\sum_{j\in C_{mn}} e^{\mu_m \tilde{V}_{jn}}}. \]
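\[ \mathrm{Cov}(U_{in}, U_{jn}) = \begin{cases} \mathrm{var}(\tilde{\varepsilon}_{C_{mn}}) & \text{if } i \text{ and } j \in C_{mn}, \\ 0 & \text{otherwise,} \end{cases} \]
and the correlation is
\[ \mathrm{Corr}(U_{in}, U_{jn}) = \begin{cases} 1 - \dfrac{\mu^2}{\mu_m^2} & \text{if } i \text{ and } j \in C_{mn}, \\ 0 & \text{otherwise.} \end{cases} \]
Therefore, as the correlation is non-negative, we have
\[ 0 \le \frac{\mu}{\mu_m} \le 1, \]
and
\[ \frac{\mu}{\mu_m} = 1 \iff \mathrm{Corr}(U_{in}, U_{jn}) = 0. \]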
The parameters µ and µm are closely related in the model.
Actually, only their ratio is meaningful. It is not possible to identify
them separately. A common practice is to arbitrarily constrain one
of them to a specific value (usually 1).
As an example, we now apply the Nested Logit Model to the
route choice problem described in Figure 1. We partition the choice
set Cn={1,2a,2b} into C1n={1} and C2n={2a,2b}. The probability of
choosing path 1 is given by
\[ P(1|\{1,2a,2b\}) = \frac{1}{1 + 2^{\mu/\mu_2}}, \]
where µ2 is the scale parameter of the random term associated with
C2n, and µ is the scale parameter of the choice between C1n and C2n.
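The closed form for path 1 can be reproduced from the nest structure. The sketch below assumes the standard logsum form for the composite utility of nest C2n, namely (1/µ2) ln Σ exp(µ2 V), and the same systematic utility for all three paths; the function names and numerical values are illustrative.

```python
import math


def prob_path1_closed_form(mu: float, mu2: float) -> float:
    """P(1 | {1, 2a, 2b}) = 1 / (1 + 2**(mu / mu2))."""
    return 1.0 / (1.0 + 2.0 ** (mu / mu2))


def prob_path1_from_nests(v: float, mu: float, mu2: float) -> float:
    """Same probability built from the nests C1n={1} and C2n={2a,2b}.

    Assumes every path has systematic utility v and that the composite
    utility of a nest is the logsum (1/mu_m) * ln(sum_j exp(mu_m * v)).
    """
    composite_nest1 = v                                       # single alternative
    composite_nest2 = math.log(2.0 * math.exp(mu2 * v)) / mu2
    numerator = math.exp(mu * composite_nest1)
    return numerator / (numerator + math.exp(mu * composite_nest2))


# mu / mu2 = 1: no correlation within the nest, same result as Multinomial Logit.
print(prob_path1_closed_form(1.0, 1.0))
# mu / mu2 close to 0: paths 2a and 2b behave almost like a single path.
print(prob_path1_closed_form(0.01, 1.0))
```

With µ/µ2 = 1 the probability is 1/3, and as µ/µ2 approaches 0 it approaches 1/2.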
with 0≤θ1,θ2≤1.
This formulation simplifies the estimation process. For this
reason, it has been adopted by the Ben-Akiva and Lerman (1985)
textbook and in estimation packages like ALOGIT (Daly, 1987)
and HieLoW (Bierlaire, 1995, Bierlaire and Vandevyvere, 1995).
We emphasize here that these packages should be used with caution
when the same parameters are present in more than one nest.
Specific techniques inspired by the artificial trees proposed by
Bradley and Daly (1991) must be used to obtain a correct
specification of the model. In the above example, if µ1=µ2, then
imposing the restriction β1=β2 is straightforward. However, for the
case of µ1≠µ2 and β1=β2=β, we define β*=µ1µ2β and create
artificial nodes below each alternative, with a scale µ2 for the first
nest and scale µ1 for the second. We refer the reader to Koppelman
and Wen (1998) for further discussion.
A direct extension of the Nested Logit Model consists of
partitioning some or all nests into sub-nests, which can in turn be
divided into sub-nests. The model described above is valid at every
The error terms $\tilde{\varepsilon}_{in}$ and $\tilde{\varepsilon}_{C_{mn}}$ are independent. The error terms
$\tilde{\varepsilon}_{in}$ are independent and identically Gumbel distributed, with unit
scale parameter (this assumption is not the most general, but
simplifies the derivation of the model). The distribution of $\tilde{\varepsilon}_{C_{mn}}$ is
such that the random variable $\max_{j\in C_{mn}} U_{jmn}$ is Gumbel distributed with
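\[ P(i|C_n) = \sum_{m=1}^{M} P(C_{mn}|C_n)\, P(i|C_{mn}) \]
where
\[ P(C_{mn}|C_n) = \frac{e^{\mu V_{C_{mn}}}}{\sum_{l=1}^{M} e^{\mu V_{C_{ln}}}}, \]
\[ P(i|C_{mn}) = \frac{\alpha_{im} e^{\tilde{V}_{in}}}{\sum_{j\in C_{mn}} \alpha_{jm} e^{\tilde{V}_{jn}}}, \]
and
\[ V_{C_{mn}} = \tilde{V}_{C_{mn}} + \ln \sum_{j\in C_{mn}} \alpha_{jm} e^{\tilde{V}_{jn}}. \]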
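The three expressions above combine into a short computation. The following is a minimal sketch using NumPy; the array layout (alternatives in rows, nests in columns), the function name, and the numerical values are assumptions made for illustration, and every nest is assumed to contain at least one alternative with a positive α.

```python
import numpy as np


def cross_nested_logit(v_alt, v_nest, alpha, mu):
    """Choice probabilities P(i|C_n) = sum_m P(C_mn|C_n) P(i|C_mn).

    v_alt  : (J,)   utilities V~_in of the alternatives
    v_nest : (M,)   nest-specific utilities V~_Cmn
    alpha  : (J, M) membership parameters alpha_jm (0 if j is not in nest m)
    mu     : scale parameter of the choice among nests
    """
    v_alt = np.asarray(v_alt, dtype=float)
    v_nest = np.asarray(v_nest, dtype=float)
    alpha = np.asarray(alpha, dtype=float)

    weights = alpha * np.exp(v_alt)[:, None]        # alpha_jm * exp(V~_jn)
    nest_sums = weights.sum(axis=0)                 # one sum per nest
    p_alt_given_nest = weights / nest_sums          # P(i|C_mn), shape (J, M)

    v_composite = v_nest + np.log(nest_sums)        # V_Cmn
    expo = np.exp(mu * (v_composite - v_composite.max()))
    p_nest = expo / expo.sum()                      # P(C_mn|C_n)

    return p_alt_given_nest @ p_nest


# Hypothetical example: alternative 1 belongs to both nests.
v = [0.0, 0.5, 1.0]
alpha = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
print(cross_nested_logit(v, [0.0, 0.0], alpha, mu=0.7))
```

When α describes a partition (each alternative in exactly one nest with α=1), the nest utilities are zero, and µ=1, the result reduces to the Multinomial Logit probabilities.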
\[ P(i|C_n) = \frac{e^{V_{in}} \dfrac{\partial G}{\partial e^{V_{in}}}\left(e^{V_{1n}}, \ldots, e^{V_{J_n}}\right)}{\mu\, G\left(e^{V_{1n}}, \ldots, e^{V_{J_n}}\right)}. \]
\[ (-1)^k \frac{\partial^k G}{\partial x_{i_1} \ldots \partial x_{i_k}}(x) \le 0 \quad \forall x \in \mathbb{R}_+^{J_n}. \]
The Multinomial Logit Model, the Nested Logit Model and the
Cross-Nested Logit Model are GEV models, with
\[ G(x) = \sum_{i=1}^{J_n} x_i^{\mu} \]
1 McFadden’s original formulation with µ=1 was generalized to
µ>0 by Ben-Akiva and François (1983).
\[ G(x) = \sum_{m=1}^{M} \left( \sum_{j\in C_n} \alpha_{jm} x_j^{\mu_m} \right)^{\mu/\mu_m} \]
for the Cross-Nested Logit Model.
The Probability Unit (or Probit) model should have been called
Normit, for Normal Probability Unit model. It is derived from the
assumption that the error terms of the utility functions are normally
distributed. The Probit model captures explicitly the correlation
among all alternatives. Therefore, we adopt a vector notation for
the utility functions:
Un = Vn + εn,
where Un, Vn and εn are (Jn×1) vectors. The vector of error terms
$\varepsilon_n = [\varepsilon_{1n}, \varepsilon_{2n}, \ldots, \varepsilon_{J_n}]^T$ is multivariate normally distributed with a vector of
means 0 and a Jn×Jn variance-covariance matrix Σn.
The probability that a given individual n chooses alternative i
from the choice set Cn is given by
P(i|Cn ) = P(U jn − U in ≤ 0 ∀j ∈ Cn ) .
we have that
∆iUn ~ N(∆iVn, ∆iΣn ∆iT).
The density function is given by
\[ f_i(x) = \lambda \exp\left[-\frac{1}{2}(x - \Delta_i V_n)^T (\Delta_i \Sigma_n \Delta_i^T)^{-1} (x - \Delta_i V_n)\right] \]
where
\[ \lambda = (2\pi)^{-\frac{J_n - 1}{2}} \left|\Delta_i \Sigma_n \Delta_i^T\right|^{-1/2} \]
and
\[ P(i|C_n) = P(\Delta_i U_n \le 0) = \int_{-\infty}^{0} \cdots \int_{-\infty}^{0} f_i(x)\, dx_1 \ldots dx_{i-1}\, dx_{i+1} \ldots dx_{J_n}. \]
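The multiple integral has no closed form, so Probit probabilities are typically approximated numerically. The sketch below is not the integral evaluation itself but a simple frequency simulator, used here only to illustrate the definition P(i|Cn) = P(Ujn − Uin ≤ 0 ∀j ∈ Cn): it draws error vectors from N(0, Σn) and counts how often each alternative has the highest utility. The function name, number of draws, and inputs are illustrative assumptions.

```python
import numpy as np


def probit_probabilities(v, sigma, n_draws=100_000, seed=0):
    """Frequency simulator of Probit choice probabilities.

    v     : (J,)   systematic utilities V_n
    sigma : (J, J) variance-covariance matrix Sigma_n of the error terms
    """
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    eps = rng.multivariate_normal(np.zeros(len(v)), sigma, size=n_draws)
    chosen = np.argmax(v + eps, axis=1)     # alternative with the highest utility
    return np.bincount(chosen, minlength=len(v)) / n_draws


# Three alternatives with equal utilities and independent unit-variance errors.
p = probit_probabilities([0.0, 0.0, 0.0], np.eye(3))
print(p)
```

With identical utilities and an identity covariance matrix, the simulated shares are close to 1/3 each; off-diagonal terms in Σn shift the shares toward the alternatives that are less correlated with the others.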
Un = Vn + εn = Vn + Fn ζn,
where Un is a (Jn×1) vector of utilities, Vn is a (Jn×1) vector of
deterministic utilities, εn is a (Jn×1) vector of random terms, ζn is a
(M×1) vector of factors which are IID standard normal distributed,
and Fn is a Jn × M matrix of loadings that map the factors to the
random utility vector. This specification is very general. If M = J,
the number of alternatives in the universal set, we can define the
matrix F as the Cholesky factor of the variance-covariance matrix
Σ, that is Σ=F FT. Fn is then obtained by removing the rows
associated with unavailable alternatives. We describe here special
cases of factor analytical representations. They are discussed in
more detail by Ben-Akiva and Bolduc (1996).
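The construction of Fn from the Cholesky factor can be sketched as follows. The covariance matrix and the set of available alternatives are hypothetical; `numpy.linalg.cholesky` returns the lower-triangular factor F with Σ = F Fᵀ.

```python
import numpy as np

# Hypothetical variance-covariance matrix for J = 3 alternatives in the universal set.
sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])

F = np.linalg.cholesky(sigma)       # lower triangular, sigma = F @ F.T

# Suppose only alternatives 0 and 2 are available to individual n.
available = [0, 2]
F_n = F[available, :]               # remove the rows of unavailable alternatives

# Covariance of the random terms F_n @ zeta_n for the available alternatives.
sigma_n = F_n @ F_n.T
print(sigma_n)
```

The product Fn Fnᵀ reproduces the rows and columns of Σ that correspond to the available alternatives, which is why removing rows of F is sufficient.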
Heteroscedasticity
A heteroscedastic model (footnote 2) is obtained when Fn is a Jn×Jn
diagonal matrix. Let T be a diagonal matrix containing the
alternative-specific standard deviations σi. Fn is obtained by removing
the rows and columns of the unavailable alternatives. We obtain the
following model, in scalar form:
Uin = Vin + σiζin.
Factor Analytic
In this model, the general matrix Fn is divided into a matrix of
loadings Qn and a diagonal matrix T containing the factor specific
standard deviations. We obtain the following model,
2 Heteroscedasticity here refers to different variances among the
alternatives. We use it in this context to refer to a diagonal
variance-covariance matrix with potentially different terms on the
diagonal.
Un = Vn + Qn T ζn.
Or, in scalar form:
\[ U_{in} = V_{in} + \sum_{m=1}^{M} q_{imn} \sigma_m \zeta_{mn}, \]
where qimn are the elements of Qn and σm are the diagonal elements
of T. The matrix Qn is normalized so that
\[ \sum_{i,m} q_{imn}^2 = 1 \quad \forall n. \]
3 Sometimes called Mixed Logit.
Random coefficients
We conclude our discussion of the Hybrid Logit model with a
formulation of the Multinomial Logit Model with randomly
distributed coefficients:
Un = Vn + νn = Xn βn + νn.
Assume that βn~N(β,Ω). If Γ is the Cholesky factor of Ω such that
ΓΓT=Ω, we replace βn by β+Γζn to obtain
Un = Xn β + Xn Γ ζn + νn.
This is a Hybrid Logit model with a factor analytic representation
with Fn = Xn Γ.
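One way to evaluate such a model numerically is by simulation. The sketch below assumes that, conditional on ζn, the model is a Multinomial Logit with unit scale, and it averages the conditional probabilities over draws of ζn. The function name, the attribute matrix, and the parameter values are hypothetical, and Ω is assumed to be positive definite.

```python
import numpy as np


def simulated_probabilities(X, beta, omega, n_draws=2000, seed=0):
    """Average Multinomial Logit probabilities over draws of beta_n = beta + Gamma zeta_n.

    X     : (J, K) attributes of the J alternatives
    beta  : (K,)   mean of the random coefficients
    omega : (K, K) covariance matrix of the random coefficients
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    gamma = np.linalg.cholesky(omega)                   # Gamma Gamma^T = Omega
    zeta = rng.standard_normal((n_draws, len(beta)))    # IID standard normal factors
    betas = beta + zeta @ gamma.T                       # one coefficient vector per draw
    v = betas @ X.T                                     # systematic utilities, (n_draws, J)
    v -= v.max(axis=1, keepdims=True)                   # numerical stability
    expv = np.exp(v)
    probs = expv / expv.sum(axis=1, keepdims=True)      # conditional logit probabilities
    return probs.mean(axis=0)


# Hypothetical data: 3 alternatives, 2 attributes.
X = np.array([[1.0, 0.2], [0.5, 0.8], [0.0, 1.0]])
beta = np.array([-1.0, 0.5])
omega = np.diag([0.25, 0.04])
print(simulated_probabilities(X, beta, omega))
```

As Ω shrinks toward zero, the simulated probabilities approach those of a Multinomial Logit with fixed coefficients β.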
\[ P(C_n) = \frac{\prod_{i\in C_n} P(A_{in} = 1) \prod_{j\notin C_n} P(A_{jn} = 0)}{1 - \prod_{l\in M} P(A_{ln} = 0)}. \]
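The choice set probability can be enumerated directly for a small universal set. The sketch below reads Ain as an indicator that alternative i is available to (or considered by) individual n, and treats these indicators as independent across alternatives, as the product form suggests; both readings are assumptions, as are the alternative names and probabilities.

```python
from itertools import combinations
from math import prod


def choice_set_probabilities(avail):
    """Probability of every non-empty choice set.

    avail : dict mapping each alternative of the universal set to P(A_in = 1)
    """
    alternatives = list(avail)
    # Probability that no alternative is available (the empty set is excluded).
    p_empty = prod(1.0 - avail[a] for a in alternatives)
    result = {}
    for size in range(1, len(alternatives) + 1):
        for subset in combinations(alternatives, size):
            p_in = prod(avail[a] for a in subset)
            p_out = prod(1.0 - avail[a] for a in alternatives if a not in subset)
            result[frozenset(subset)] = p_in * p_out / (1.0 - p_empty)
    return result


# Hypothetical availability probabilities.
probs = choice_set_probabilities({"car": 0.9, "bus": 0.6, "rail": 0.3})
print(probs[frozenset({"car", "bus"})])
```

The denominator renormalizes over the non-empty subsets, so the probabilities of all feasible choice sets sum to one.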
Decision-Maker
Alternatives
Attributes
designed to avoid path enumeration, use link attributes and not path
attributes.
Among the many attributes that can potentially be included in a
utility function, travel time is probably the most important. But
what does travel time mean for the decision-maker? How does
she/he perceive travel time? Many models are based on the
assumption that most travelers are sufficiently experienced and
knowledgeable about usual network conditions and, therefore, are
able to estimate travel times accurately. This assumption may be
satisfactory for planning applications using static models. With the
emergence of Intelligent Transportation Systems, models that are
able to predict the impact of real-time information have been
developed. In this context, the "perfect knowledge" assumption
contradicts the very purpose of ITS services, which is to provide
information.
Several approaches can be used to capture perceptions of travel
times. One approach represents travel time as a random variable in
the utility function. This idea was introduced by Burrell (1968) and
is captured by a random utility model. Also, the uncertainty or the
variability of travel time along a given path can be explicitly
included as an attribute of the path.
In addition to travel time, the following attributes are usually
included.
• Path length. The length of the path is likely to influence the
decision-maker’s choice. Also, this attribute is easy to
measure. Note that it may be highly correlated with travel
time, especially in uncongested networks.
• Travel cost. In addition to the obvious behavioral motivation,
including travel cost in the utility function is necessary to
forecast the impact of tolls and congestion pricing, for
example. It is common practice to distinguish the so-called
out-of-pocket costs (like tolls), which are directly associated
Decision Rules
Shortest path
The simplest possible decision rule in the route choice context
assumes that each individual chooses the path with the highest
utility. Models based on deterministic utility maximization are
supported by efficient algorithms to compute shortest paths in a
graph (e.g. Dijkstra, 1959, and Dial, 1969). However, the
behavioral limitations of this approach have motivated the
development of stochastic models based on the random utility
model.
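When the utility of a path is the negative of a link-additive generalized cost, the path with the highest utility is the least-cost path. A minimal sketch of Dijkstra's algorithm with a binary heap follows; the network, node names, and link costs are hypothetical, and link costs are assumed to be non-negative.

```python
import heapq


def shortest_path(graph, origin, destination):
    """Dijkstra's algorithm.

    graph : dict mapping each node to a list of (neighbor, non-negative link cost)
    Returns (cost, list of nodes), or (inf, []) if the destination is unreachable.
    """
    best = {origin: 0.0}
    previous = {}
    visited = set()
    queue = [(0.0, origin)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue                      # stale queue entry
        visited.add(node)
        if node == destination:
            break
        for neighbor, link_cost in graph.get(node, []):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if destination not in best:
        return float("inf"), []
    path = [destination]
    while path[-1] != origin:
        path.append(previous[path[-1]])
    return best[destination], path[::-1]


# Hypothetical network with link costs (e.g. travel times).
network = {
    "O": [("A", 2.0), ("B", 5.0)],
    "A": [("B", 1.0), ("D", 6.0)],
    "B": [("D", 2.0)],
    "D": [],
}
print(shortest_path(network, "O", "D"))
```

Every traveler between the same origin and destination is assigned to this single path, which is the behavioral limitation that stochastic models address.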
C-Logit
The C-Logit model, proposed by Cascetta et al. (1996) in the
context of route choice, is a Multinomial Logit Model which
captures the correlation among alternatives in a deterministic way.
They add to the deterministic part of the utility function a term,
j∈Cn
where Lij is the length (footnote 4) of links common to paths i and j, and
Li and Lj are the overall lengths of paths i and j, respectively. βCF is a
coefficient to be estimated. The parameter γ may be estimated or
constrained to a convenient value, often 1 or 2.
Considering the path choice example in Figure 1, the
commonality factor for path 1 is zero because it does not overlap
with any other path. The commonality factor for paths 2a and 2b is
\[ \beta_{CF} \ln\left(1 + \left[\frac{T-\delta}{T}\right]^{\gamma}\right). \]
Note that the commonality factor of an alternative is not one of
its attributes. It can be viewed as a measure of how the alternative
is perceived within a choice set.
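The values quoted for Figure 1 can be reproduced with a short computation. The sketch assumes the commonality factor of Cascetta et al. (1996) in the form βCF ln Σj (Lij / (√Li √Lj))^γ, with Lii = Li, and assumes that each path in Figure 1 has length T and that paths 2a and 2b share a common portion of length T−δ. The numerical values of T and δ are arbitrary.

```python
import math


def commonality_factor(i, lengths, common, beta_cf=1.0, gamma=1.0):
    """beta_CF * ln( sum_j (L_ij / sqrt(L_i * L_j)) ** gamma ).

    lengths : dict path -> overall length L_i
    common  : dict frozenset({i, j}) -> length L_ij shared by paths i and j
    """
    total = 0.0
    for j in lengths:
        # A path shares its whole length with itself.
        l_ij = lengths[i] if i == j else common.get(frozenset((i, j)), 0.0)
        total += (l_ij / math.sqrt(lengths[i] * lengths[j])) ** gamma
    return beta_cf * math.log(total)


# Figure 1 under the stated assumptions, with arbitrary T and delta.
T, delta, gamma = 10.0, 4.0, 2.0
lengths = {"1": T, "2a": T, "2b": T}
common = {frozenset(("2a", "2b")): T - delta}

print(commonality_factor("1", lengths, common, gamma=gamma))
print(commonality_factor("2a", lengths, common, gamma=gamma))
```

Path 1 yields ln(1) = 0, and paths 2a and 2b yield ln(1 + [(T−δ)/T]^γ), matching the expressions in the text for βCF = 1.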
PS-Logit
Path-Size Logit is an application of the notion of elemental
alternatives and size variables. See Ben-Akiva and Lerman
(Chapter 9) for details about models with elemental and aggregate
alternatives. In the route choice context, we assume that an
4 Or any other link-additive attribute.
j∈Cn
and Γi is the set of links in path i; la and Li are the length of link a
and path i, respectively; δaj is the link-path incidence variable that is
one if link a is on path j and 0 otherwise; and L*Cn is the length of
the shortest path in Cn.
Considering again the path choice problem from Figure 1, the
size of path 1 is 1, and the size of paths 2a and 2b is (T+δ)/(2T). It is
interesting to note that the size variable formulation is equivalent to
the commonality factor formulation for the extreme cases where
δ=0 or δ=T, assuming that βCF=1 and for any value of γ. However,
the two models are different for intermediate values.
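The sizes quoted above can be checked with a short computation. The sketch assumes the path size takes the form Σ over links a of path i of (la/Li) divided by Σj δaj L*Cn/Lj, which is consistent with the symbols defined in the text, and it assumes a link layout for Figure 1: one link of length T for path 1, a link of length T−δ shared by paths 2a and 2b, and one link of length δ specific to each of them. Link names and numerical values are hypothetical.

```python
def path_size(i, paths, link_length):
    """Size of path i: sum_a (l_a / L_i) / (sum_j delta_aj * L*_Cn / L_j).

    paths       : dict path -> list of links on that path (Gamma_i)
    link_length : dict link -> length l_a
    """
    path_len = {p: sum(link_length[a] for a in links) for p, links in paths.items()}
    shortest = min(path_len.values())           # L*_Cn
    size = 0.0
    for a in paths[i]:
        # Sum over the paths j that use link a (delta_aj = 1).
        denom = sum(shortest / path_len[j] for j, links in paths.items() if a in links)
        size += (link_length[a] / path_len[i]) / denom
    return size


# Figure 1 under the stated assumptions, with arbitrary T and delta.
T, delta = 10.0, 4.0
link_length = {"p1": T, "shared": T - delta, "a": delta, "b": delta}
paths = {"1": ["p1"], "2a": ["shared", "a"], "2b": ["shared", "b"]}

print(path_size("1", paths, link_length))
print(path_size("2a", paths, link_length))
```

Path 1 has size 1, and paths 2a and 2b have size (T−δ)/(2T) + δ/T = (T+δ)/(2T).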
itself and the choice of changing departure time. The latter appears
usually in the context of Traveler Information Systems, where
individuals may revisit a previous choice using additional
information. We now describe typical modeling assumptions
associated with the departure time choice model.
Decision-Maker
Alternatives
Attributes
Decision Rules
Conclusion
Acknowledgment
References
Anderson, S. P., de Palma, A. and Thisse, J.-F. (1992). Discrete Choice Theory of
Product Differentiation, MIT Press, Cambridge, MA.
Antoniou, C., Ben-Akiva, M.E., Bierlaire, M., and Mishalani, R. (1997) Demand
Simulation for Dynamic Traffic Assignment. Proceedings of the 8th
IFAC/IFIP/IFORS symposium on transportation systems.
Ben-Akiva, M. E. and Boccara, B. (1995). Discrete choice models with latent choice
sets, International Journal of Research in Marketing 12: 9–24.
Ben-Akiva, M. E., Cyna, M. and de Palma, A. (1984). Dynamic model of peak period
congestion, Transportation Research B 18(4–5): 339–355.
Bolduc, D., Fortin, B. and Fournier, M.-A. (1996). The effect of incentive policies on
the practice location of doctors: A multinomial Probit analysis, Journal of Labor
Economics 14(4): 703.
Bradley, M. A. and Daly, A. (1991). Estimation of logit choice models using mixed
stated preferences and revealed preferences information, Methods for understanding
travel behaviour in the 1990's, International Association for Travel Behaviour,
Québec, pp. 116–133. 6th international conference on travel behaviour.
Cascetta, E., Nuzzolo, A., Russo, F. and Vitetta, A. (1996). A modified logit route
choice model overcoming path overlapping problems. Specification and some
calibration results for interurban networks, Proceedings of the 13th International
Symposium on the Theory of Road Traffic Flow (Lyon, France).
Chang, G. L. and Mahmassani, H. S. (1988). Travel time prediction and departure time
adjustment dynamics in a congested traffic system, Transportation Research B
22(3): 217–232.
Dial, R. B. (1969). Algorithm 360: shortest path forest with topological ordering,
Communications of the ACM 12: 632–633.
Hendrickson, C. and Kocur, G. (1981). Schedule delay and departure time decisions in
a deterministic model, Transportation Science 15: 62–77.
Hendrickson, C. and Plank, E. (1984). The flexibility of departure times for work trips,
Transportation Research A 18: 25–36.
Koppelman, F. S. and Wen, C.-H. (1997). The paired combinatorial logit model:
properties, estimation and application, Transportation Research Board, 76th Annual
Meeting, Washington DC. Paper #970953.
Koppelman, F. S. and Wen, C.-H. (1998). Alternative nested logit models: Structure,
properties and estimation, Transportation Research B. (forthcoming).
Liu, Y.-H. and Mahmassani, H. (1998). Dynamic aspects of departure time and route
decision behavior under ATIS: modeling framework and experimental results,
presented at the 77th annual meeting of the Transportation Research Board,
Washington DC.
Luce, R. (1959). Individual choice behavior: a theoretical analysis, J. Wiley and Sons,
New York.
Manski, C. (1977). The structure of random utility models, Theory and Decision
8: 229–254.
McFadden, D. and Train, K. (1997). Mixed multinomial logit models for discrete
response, Technical report, University of California, Berkeley, CA.
Nguyen, S. and Pallottino, S. (1987). Traffic assignment for large scale transit
networks, in A. Odoni (ed.), Flow control of congested networks, Springer Verlag.
Small, K. (1982). The scheduling of consumer activities: work trips, The American
Economic Review pp. 467–479.
Vovsha, P. (1997). Cross-nested logit model: an application to mode choice in the Tel-
Aviv metropolitan area, Transportation Research Board, 76th Annual Meeting,
Washington DC. Paper #970387.
Yai, T., Iwakura, S. and Morichi, S. (1997). Multinomial Probit with structured
covariance for route choice behavior, Transportation Research B 31(3): 195–208.