Examples1 2up
Bayes Risk
1. In many pattern classification problems, one has the option either to assign the
pattern to one of the c classes, or to reject it as being unrecognizable. If the cost to
reject is not too high, rejection may be a desirable action. Let the cost of classification
be defined as
\[
\lambda(\omega_i \mid \omega_j) =
\begin{cases}
0 & \omega_i = \omega_j & \text{(correct classification)} \\
\lambda_r & \omega_i = \omega_0 & \text{(rejection)} \\
\lambda_s & \text{otherwise} & \text{(substitution error)}
\end{cases}
\]
Show that for minimum risk classification, the decision rule should associate a test
vector x with class ωi , if P (ωi |x) ≥ P (ωj |x) for all j and P (ωi |x) ≥ 1 − λr /λs , and
reject otherwise.
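One possible starting point (a sketch only, assuming the standard definition of conditional risk from Bayes decision theory) is to compare the conditional risk of assigning x to ωi with that of rejecting:
\[
R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\omega_i \mid \omega_j)\, P(\omega_j \mid x)
= \lambda_s \bigl(1 - P(\omega_i \mid x)\bigr),
\qquad
R(\alpha_0 \mid x) = \lambda_r
\]
Choosing, for each x, the action with minimum conditional risk then leads to the stated rule.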
EM and Mixture Models
2. † For d-dimensional data compare the computational cost of calculating the log-likelihood with a diagonal covariance matrix Gaussian distribution, a full covariance matrix Gaussian distribution and an M-component diagonal covariance matrix Gaussian mixture model. Clearly state any assumptions made.
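As a rough guide to the comparison, the sketch below (a minimal numpy illustration, not part of the question; the function names are my own) evaluates one sample's log-likelihood in each of the three cases. The dominant per-sample costs are O(d) for the diagonal covariance, O(d²) for the full covariance once its inverse and log-determinant have been precomputed (the one-off inversion is O(d³)), and O(Md) for an M-component diagonal covariance mixture.

```python
import numpy as np

def diag_loglik(x, mu, var):
    # Diagonal covariance: elementwise operations only, O(d) per sample
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def full_loglik(x, mu, inv_cov, log_det):
    # Full covariance: O(d^2) per sample with inv_cov and log_det precomputed
    d = x.shape[0]
    diff = x - mu
    return -0.5 * (d * np.log(2 * np.pi) + log_det + diff @ inv_cov @ diff)

def diag_mixture_loglik(x, weights, mus, vars_):
    # M-component diagonal mixture: O(M d) per sample, combined via log-sum-exp
    comp = np.array([np.log(w) + diag_loglik(x, m, v)
                     for w, m, v in zip(weights, mus, vars_)])
    return np.logaddexp.reduce(comp)
```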
3. A 1-dimensional 2-component mixture distribution has a common fixed known variance of 1, initial mean values µ1 = 0 and µ2 = 2, and mixture weights c1 = c2 = 0.5. A data set of 9 training data points is provided.
(a) Calculate the log likelihood of the training data for the mixture distribution with
the initial parameters.
(b) Calculate updated values for the mean and mixture weights for 1 iteration of the
E-M algorithm.
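A minimal numpy sketch of both parts is given below. The nine training points from the sheet are not reproduced here, so the data vector is a clearly labelled placeholder; the updates (responsibilities, then means and weights, with the variance held fixed at 1) follow the standard EM recipe for Gaussian mixtures.

```python
import numpy as np

def gauss(x, mu, var):
    # Univariate Gaussian density
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

x = np.array([-0.5, 0.1, 0.3, 0.8, 1.2, 1.6, 2.0, 2.4, 2.9])  # placeholder only,
# NOT the 9 points from the question sheet

mu = np.array([0.0, 2.0])   # initial means
c = np.array([0.5, 0.5])    # initial mixture weights
var = 1.0                   # common fixed known variance

# (a) log-likelihood under the initial parameters
lik = c[0] * gauss(x, mu[0], var) + c[1] * gauss(x, mu[1], var)
print("log-likelihood:", np.sum(np.log(lik)))

# (b) one E-M iteration
resp = np.stack([c[m] * gauss(x, mu[m], var) for m in range(2)], axis=1)
resp /= resp.sum(axis=1, keepdims=True)                       # E-step: responsibilities

mu_new = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)   # M-step: means
c_new = resp.sum(axis=0) / len(x)                             # M-step: weights
print("updated means:", mu_new, "updated weights:", c_new)
```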
4. Consider an M component mixture model of d-dimensional binary data x of the form
\[
p(x) = \sum_{m=1}^{M} P(\omega_m)\, p(x \mid \omega_m)
\]
where the j-th component PDF has parameters λj1 , . . . , λjd and
\[
p(x \mid \omega_j) = \prod_{i=1}^{d} \lambda_{ji}^{x_i}\,(1 - \lambda_{ji})^{1 - x_i}
\]
A set of training samples x1 , . . . , xn is used to train the mixture model. Using
the standard form of EM with mixture models show that the maximum likelihood
estimate for the “new” parameters, λ̂ji , is given by
\[
\hat{\lambda}_{ji} = \frac{\sum_{k=1}^{n} P(\omega_j \mid x_k)\, x_{ki}}{\sum_{k=1}^{n} P(\omega_j \mid x_k)}
\]
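The update the question asks you to derive can be implemented directly; the sketch below (illustrative numpy, names are my own) performs one EM iteration for the Bernoulli mixture, with the E-step done in the log domain for numerical stability.

```python
import numpy as np

def em_step(X, weights, lam):
    # One EM iteration for an M-component mixture of multivariate Bernoullis.
    # X: (n, d) binary data; weights: (M,) P(omega_m); lam: (M, d) lambda_{ji}.

    # E-step: posteriors P(omega_j | x_k), computed via log-probabilities
    log_p = X @ np.log(lam).T + (1 - X) @ np.log(1 - lam).T + np.log(weights)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)           # (n, M)

    # M-step: the lambda update from the question, plus the weight update
    denom = post.sum(axis=0)                          # sum_k P(omega_j | x_k)
    lam_new = (post.T @ X) / denom[:, None]           # lambda_hat_{ji}
    weights_new = denom / X.shape[0]
    return weights_new, lam_new
```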
(a) Show that the overall sequence of observations can be written in the following form
\[
A\mathbf{x} = A \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} x_1 - 0 \\ x_2 - x_1 \\ x_3 - x_2 \end{bmatrix}
\]
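(For reference: the matrix consistent with this mapping is the first-difference operator, assuming the xt are scalars; if they are vectors, the entries become identity blocks of the corresponding size.)
\[
A = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}
\]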
Show that the posterior probability of the hidden variables and the conditional distribution of the observed variables can be expressed as
\[
P(h_j = 1 \mid \mathbf{x}, \theta) = \frac{1}{1 + \exp\!\left(-b_j - \sum_{i=1}^{d} \dfrac{x_i}{\sigma_i}\, w_{ij}\right)}
\]
\[
p(x_i \mid \mathbf{h}, \theta) = \mathcal{N}\!\left(x_i;\; a_i + \sigma_i \sum_{j} h_j w_{ij},\; \sigma_i^2\right)
\]
Why is this form of expression important when training Restricted Boltzmann ma-
chines?
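Both conditionals above factorise over the hidden and visible units respectively, so block Gibbs sampling needs only two matrix products per step; this is what makes contrastive-divergence style training practical. A minimal sketch, assuming a Gaussian-visible, binary-hidden RBM with the parameterisation above (function and variable names are my own):

```python
import numpy as np

def sample_hidden(x, W, b, sigma, rng):
    # P(h_j = 1 | x) factorises over j, so every hidden unit is sampled at once
    p = 1.0 / (1.0 + np.exp(-(b + (x / sigma) @ W)))
    return (rng.random(p.shape) < p).astype(float)

def sample_visible(h, W, a, sigma, rng):
    # p(x_i | h) is an independent Gaussian per visible unit
    mean = a + sigma * (h @ W.T)
    return mean + sigma * rng.standard_normal(mean.shape)

# one step of block Gibbs sampling (the core of CD-1 training)
rng = np.random.default_rng(0)
d, H = 4, 3
W, a, b, sigma = rng.standard_normal((d, H)), np.zeros(d), np.zeros(H), np.ones(d)
x0 = rng.standard_normal(d)
h0 = sample_hidden(x0, W, b, sigma, rng)
x1 = sample_visible(h0, W, a, sigma, rng)
```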
Single Layer Perceptrons
8. The standard single layer perceptron is used to discriminate between two classes.
There are two simple techniques for generalising this to a K class problem. The
first is to build a set of pairwise classifiers i.e. ωi versus ωj , j ̸= i. The second
is to build a set of classifiers of each class versus all other classes i.e. ωi versus
{ω1 , . . . , ωi−1 , ωi+1 , . . . , ωK }. Compare the two forms of classifier in terms of training and
testing computational cost. By drawing a specific example with K = 3 show that
both forms of classifier can result in an “ambiguous” region i.e. no decision can be
made. Describe how multiple binary classifiers may be trained so that no ambiguous regions exist.
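For concreteness, the sketch below (illustrative numpy, not part of the question) contrasts the classifier counts and shows one standard way of removing the ambiguity: keep one weight vector per class, train them jointly with perceptron-style corrections, and decide by the arg-max of the K linear scores, so every point receives exactly one label.

```python
import numpy as np

K = 3
n_pairwise = K * (K - 1) // 2      # one-vs-one: K(K-1)/2 binary classifiers
n_one_vs_rest = K                  # one-vs-rest: K binary classifiers

def train_multiclass_perceptron(X, y, K, epochs=10):
    # Jointly trained K-class linear machine (Kesler-style updates).
    # Deciding by argmax over K linear discriminants partitions the space into
    # exactly K convex regions, so no ambiguous region can arise.
    n, d = X.shape
    Xa = np.hstack([X, np.ones((n, 1))])     # augment with a bias term
    W = np.zeros((K, d + 1))                 # one weight vector per class
    for _ in range(epochs):
        for xk, t in zip(Xa, y):
            pred = int(np.argmax(W @ xk))
            if pred != t:                    # perceptron-style correction
                W[t] += xk
                W[pred] -= xk
    return W

def classify(x, W):
    return int(np.argmax(W @ np.append(x, 1.0)))
```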
Answers
3. (a) total log-likelihood of the data (natural log): −15.302 (likelihood 2.262e-07); (b)
µ̂1 = −0.0426 ; µ̂2 = 1.878 ; ĉ1 = 0.5266; ĉ2 = 0.4734
M.J.F. Gales
P.C. Woodland
Oct 2003 - Jan 2007