The Bayesian Information Criterion
Joseph E. Cavanaugh
171:290 Model Selection
Introduction
BIC, the Bayesian information criterion, was introduced by
Schwarz (1978) as a competitor to the Akaike (1973, 1974)
information criterion.
Schwarz derived BIC to serve as an asymptotic approximation
to a transformation of the Bayesian posterior probability of a
candidate model.
In large-sample settings, the fitted model favored by BIC
ideally corresponds to the candidate model which is
a posteriori most probable; i.e., the model which is rendered
most plausible by the data at hand.
The computation of BIC is based on the empirical
log-likelihood and does not require the specification of priors.
In Bayesian applications, pairwise comparisons between
models are often based on Bayes factors.
Assuming two candidate models are regarded as equally
probable a priori, a Bayes factor represents the ratio of the
posterior probabilities of the models. The model which is
a posteriori most probable is determined by whether the Bayes
factor is less than or greater than one.
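As a small illustration of the statement above, the following sketch converts a Bayes factor into posterior model probabilities. It assumes the usual definition of the Bayes factor as the ratio of the marginal likelihoods of the two models; the function name and the `prior1` argument are hypothetical, introduced only for this example.

```python
def posterior_probs_from_bayes_factor(b12, prior1=0.5):
    """Posterior probabilities of models 1 and 2 given the Bayes factor B12.

    B12 is assumed to be (marginal likelihood of model 1) /
    (marginal likelihood of model 2). prior1 is the prior probability
    of model 1; the default 0.5 means the models are equally probable a priori.
    """
    if b12 <= 0 or not 0.0 < prior1 < 1.0:
        raise ValueError("need b12 > 0 and 0 < prior1 < 1")
    prior_odds = prior1 / (1.0 - prior1)
    posterior_odds = b12 * prior_odds      # posterior odds = Bayes factor x prior odds
    p1 = posterior_odds / (1.0 + posterior_odds)
    return p1, 1.0 - p1


p1, p2 = posterior_probs_from_bayes_factor(3.0)
print(p1, p2)  # 0.75 0.25
```

With equal prior probabilities the posterior odds equal the Bayes factor, so a value above one favors the first model and a value below one favors the second.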
In certain settings, model selection based on BIC is roughly
equivalent to model selection based on Bayes factors (Kass
and Raftery, 1995; Kass and Wasserman, 1995).
Thus, BIC has appeal in many Bayesian modeling problems
where priors are hard to set precisely.
Outline:
Overview of BIC
Derivation of BIC
BIC and Bayes Factors
BIC versus AIC
Use of BIC
Overview of BIC
Key Constructs:
True or generating model: $g(y)$.
Candidate or approximating model: $f(y \mid \theta_k)$.
Candidate class: $\mathcal{F}(k) = \{ f(y \mid \theta_k) \mid \theta_k \in \Theta(k) \}$.
Fitted model: $f(y \mid \hat{\theta}_k)$.
The Bayesian information criterion is often called the Schwarz
information criterion.
Common acronyms: BIC, SIC, SBC, SC.
AIC provides an asymptotically unbiased estimator of the
expected Kullback discrepancy between the generating model
and the fitted approximating model.
BIC provides a large-sample estimator of a transformation of
the Bayesian posterior probability associated with the
approximating model.
By choosing the fitted candidate model corresponding to the
minimum value of BIC, one is attempting to select the
candidate model corresponding to the highest Bayesian
posterior probability.
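The minimum-BIC rule can be sketched on a toy problem. The example below assumes the standard form BIC = -2 ln L + k ln n (consistent with the k ln n penalty quoted later in these notes) and compares two hypothetical Gaussian candidates: one with the mean fixed at zero and one with the mean estimated. The data values and function names are made up for illustration.

```python
import math


def gaussian_max_loglik(y, mean=None):
    """Maximized Gaussian log-likelihood.

    If mean is None it is estimated by the sample mean; otherwise it is
    held fixed. The variance is always estimated by maximum likelihood.
    """
    n = len(y)
    mu = sum(y) / n if mean is None else mean
    sigma2 = sum((v - mu) ** 2 for v in y) / n  # MLE of the variance
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def bic(loglik, k, n):
    """BIC in its assumed standard form: goodness-of-fit term plus k ln n."""
    return -2.0 * loglik + k * math.log(n)


# Hypothetical data, used only to exercise the functions.
y = [1.9, 2.4, 1.7, 2.8, 2.1, 2.5, 1.6, 2.3, 2.0, 2.7]
n = len(y)

candidates = {
    "mean fixed at 0 (k=1)": bic(gaussian_max_loglik(y, mean=0.0), 1, n),
    "mean estimated (k=2)": bic(gaussian_max_loglik(y), 2, n),
}
best = min(candidates, key=candidates.get)  # candidate with the smallest BIC

for name, value in candidates.items():
    print(f"{name}: BIC = {value:.2f}")
print("selected:", best)
```

Here the extra parameter improves the fit by far more than the added penalty, so the larger model attains the smaller BIC.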
Derivation of BIC
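Applying Bayes' Theorem, the joint posterior of $M_k$ and $\theta_k$
can be written as
\[
h((k, \theta_k) \mid y) = \frac{\pi(k) \, g(\theta_k \mid k) \, L(\theta_k \mid y)}{m(y)},
\]
where $\pi(k)$ is the prior probability of model $M_k$, $g(\theta_k \mid k)$ is the
prior on $\theta_k$ given $M_k$, $L(\theta_k \mid y)$ is the likelihood, and $m(y)$ is
the marginal distribution of $y$.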
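Integrating the joint posterior over $\theta_k$ gives the posterior
probability $P(k \mid y)$ of model $k$. We obtain
\[
-2 \ln P(k \mid y) = 2 \ln\{m(y)\} + S(k \mid y),
\]
where the term $2 \ln\{m(y)\}$ does not depend on $k$ and
\[
S(k \mid y) \equiv -2 \ln\{\pi(k)\} - 2 \ln\left\{ \int L(\theta_k \mid y) \, g(\theta_k \mid k) \, d\theta_k \right\}.
\]
Now consider the integral which appears above:
\[
\int L(\theta_k \mid y) \, g(\theta_k \mid k) \, d\theta_k .
\]
In order to obtain an approximation to this term, we take a
second-order Taylor series expansion of the log-likelihood
about $\hat{\theta}_k$.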
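We have
\begin{align*}
\ln L(\theta_k \mid y)
 &\approx \ln L(\hat{\theta}_k \mid y)
   + (\theta_k - \hat{\theta}_k)' \, \frac{\partial \ln L(\hat{\theta}_k \mid y)}{\partial \theta_k} \\
 &\quad + \frac{1}{2} (\theta_k - \hat{\theta}_k)'
   \left[ \frac{\partial^2 \ln L(\hat{\theta}_k \mid y)}{\partial \theta_k \, \partial \theta_k'} \right]
   (\theta_k - \hat{\theta}_k) \\
 &= \ln L(\hat{\theta}_k \mid y)
   - \frac{1}{2} (\theta_k - \hat{\theta}_k)' \left[ n \bar{I}(\hat{\theta}_k, y) \right] (\theta_k - \hat{\theta}_k),
\end{align*}
where
\[
\bar{I}(\hat{\theta}_k, y) = -\frac{1}{n} \, \frac{\partial^2 \ln L(\hat{\theta}_k \mid y)}{\partial \theta_k \, \partial \theta_k'}
\]
is the average observed Fisher information matrix. The first-order term
vanishes because the gradient of the log-likelihood is zero at the
maximum likelihood estimate $\hat{\theta}_k$.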
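Thus,
\[
L(\theta_k \mid y) \approx L(\hat{\theta}_k \mid y) \,
\exp\left\{ -\frac{1}{2} (\theta_k - \hat{\theta}_k)' \left[ n \bar{I}(\hat{\theta}_k, y) \right] (\theta_k - \hat{\theta}_k) \right\}.
\]
We therefore have the following approximation for our
integral:
\[
\int L(\theta_k \mid y) \, g(\theta_k \mid k) \, d\theta_k
\approx L(\hat{\theta}_k \mid y)
\int \exp\left\{ -\frac{1}{2} (\theta_k - \hat{\theta}_k)' \left[ n \bar{I}(\hat{\theta}_k, y) \right] (\theta_k - \hat{\theta}_k) \right\}
g(\theta_k \mid k) \, d\theta_k .
\]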
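With a noninformative prior, $g(\theta_k \mid k) = 1$, the remaining
integral is that of an unnormalized multivariate normal density with
mean $\hat{\theta}_k$ and covariance matrix $[n \bar{I}(\hat{\theta}_k, y)]^{-1}$.
We therefore have
\begin{align*}
\int L(\theta_k \mid y) \, g(\theta_k \mid k) \, d\theta_k
 &\approx L(\hat{\theta}_k \mid y) \, (2\pi)^{k/2} \, |n \bar{I}(\hat{\theta}_k, y)|^{-1/2} \\
 &= L(\hat{\theta}_k \mid y) \, (2\pi)^{k/2} \, n^{-k/2} \, |\bar{I}(\hat{\theta}_k, y)|^{-1/2} \\
 &= L(\hat{\theta}_k \mid y) \left( \frac{2\pi}{n} \right)^{k/2} |\bar{I}(\hat{\theta}_k, y)|^{-1/2} .
\end{align*}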
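The quality of this approximation can be checked numerically in a case where the integral is known exactly. The sketch below is an illustration only, not part of the derivation: it assumes a one-parameter Bernoulli likelihood $L(\theta \mid y) = \theta^s (1-\theta)^{n-s}$ with a flat prior, for which the exact integral is the Beta function $B(s+1, n-s+1)$ and the average observed information at $\hat{\theta} = s/n$ is $1 / \{\hat{\theta}(1-\hat{\theta})\}$. Function names are hypothetical.

```python
import math


def log_exact_integral(s, n):
    """ln of the exact integral of theta^s (1-theta)^(n-s) over (0, 1),
    i.e. ln Beta(s + 1, n - s + 1)."""
    return math.lgamma(s + 1) + math.lgamma(n - s + 1) - math.lgamma(n + 2)


def log_laplace_integral(s, n):
    """ln of L(theta_hat) * (2*pi/n)^(k/2) * |avg_info|^(-1/2) with k = 1."""
    theta_hat = s / n                                  # maximum likelihood estimate
    log_lik_hat = s * math.log(theta_hat) + (n - s) * math.log(1.0 - theta_hat)
    avg_info = 1.0 / (theta_hat * (1.0 - theta_hat))   # average observed information
    k = 1
    return log_lik_hat + 0.5 * k * math.log(2.0 * math.pi / n) - 0.5 * math.log(avg_info)


# (successes, sample size) pairs with the same success fraction 0.3
for s, n in ((6, 20), (30, 100), (300, 1000)):
    exact = log_exact_integral(s, n)
    approx = log_laplace_integral(s, n)
    print(f"n={n:5d}  ln exact={exact:.4f}  ln approx={approx:.4f}  diff={exact - approx:.4f}")
```

In this example the gap between the two log-integrals shrinks as n grows, in line with the large-sample character of the approximation.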
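We can now write
\begin{align*}
S(k \mid y)
 &= -2 \ln\{\pi(k)\} - 2 \ln\left\{ \int L(\theta_k \mid y) \, g(\theta_k \mid k) \, d\theta_k \right\} \\
 &\approx -2 \ln\{\pi(k)\}
   - 2 \ln\left[ L(\hat{\theta}_k \mid y) \left( \frac{2\pi}{n} \right)^{k/2} |\bar{I}(\hat{\theta}_k, y)|^{-1/2} \right] \\
 &= -2 \ln\{\pi(k)\} - 2 \ln L(\hat{\theta}_k \mid y)
   + k \ln\left\{ \frac{n}{2\pi} \right\} + \ln |\bar{I}(\hat{\theta}_k, y)| .
\end{align*}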
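The final step to the familiar criterion can be sketched as follows. This is a hedged outline rather than a rigorous argument: it assumes the usual regularity conditions, under which $\bar{I}(\hat{\theta}_k, y)$ converges to a positive definite limit, so that the terms collected in the bracket below stay bounded as $n$ grows while the goodness-of-fit term and $k \ln n$ do not.

```latex
\begin{align*}
S(k \mid y)
 &\approx -2 \ln L(\hat{\theta}_k \mid y) + k \ln n
   + \underbrace{\Bigl[ -2 \ln\{\pi(k)\} - k \ln(2\pi)
   + \ln |\bar{I}(\hat{\theta}_k, y)| \Bigr]}_{O(1) \text{ as } n \to \infty} \\
 &\approx -2 \ln L(\hat{\theta}_k \mid y) + k \ln n \;\equiv\; \mathrm{BIC}.
\end{align*}
```

Dropping the bounded terms leaves a goodness-of-fit term plus the penalty $k \ln n$, which requires neither the model prior $\pi(k)$ nor the parameter prior $g(\theta_k \mid k)$ to compute.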
BIC versus AIC
AIC and BIC share the same goodness-of-fit term, but the
penalty term of BIC (k ln n) is potentially much more
stringent than the penalty term of AIC (2k).
Thus, BIC tends to choose fitted models that are more
parsimonious than those favored by AIC.
The differences in selected models may be especially
pronounced in large-sample settings, since the BIC penalty grows
with n while the AIC penalty does not.
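The two penalty terms can be compared directly. The short sketch below uses only the penalties quoted above, k ln n for BIC and 2k for AIC; the function names and the chosen values of k and n are arbitrary illustrations.

```python
import math


def aic_penalty(k):
    """AIC complexity penalty: 2k."""
    return 2.0 * k


def bic_penalty(k, n):
    """BIC complexity penalty: k ln n."""
    return k * math.log(n)


k = 5  # hypothetical number of estimated parameters
for n in (5, 8, 100, 10_000):
    ratio = bic_penalty(k, n) / aic_penalty(k)
    print(f"n={n:6d}  AIC penalty={aic_penalty(k):.1f}  "
          f"BIC penalty={bic_penalty(k, n):.2f}  ratio={ratio:.2f}")
```

Because ln n exceeds 2 once n is at least 8 (e squared is roughly 7.39), the BIC penalty is the larger of the two for all but very small samples, and the ratio (ln n)/2 keeps growing with n.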
Intuitively, why is the complexity penalization so much greater
for BIC than for AIC?
Use of BIC
References