An Alternative View of EM - Poornima
In practice, the complete data set {X, Z} cannot be obtained, so only the incomplete data X will be available.
The state of knowledge of the values of the latent variables in Z is given only by the posterior
distribution p(Z|X, θ).
Because the complete data are unavailable, EM works instead with the expectation of the complete-data log likelihood under this posterior, and each cycle of EM increases the incomplete-data log likelihood (unless it is already at a local maximum).
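In this general setting, the quantity maximized in the M step is the expectation of the complete-data log likelihood taken with respect to this posterior (Bishop's Q function):
$\mathcal{Q}(\boldsymbol{\theta}, \boldsymbol{\theta}^{\text{old}}) = \sum_{\mathbf{Z}} p(\mathbf{Z} \mid \mathbf{X}, \boldsymbol{\theta}^{\text{old}}) \ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\theta})$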
Solution:
The distribution of the observed values is obtained by taking the joint distribution of all the
variables and then marginalizing over the missing ones.
EM can then be used to maximize the corresponding likelihood function.
This is a valid procedure only if the data values are missing at random, i.e., the mechanism causing values to be missing does not depend on the unobserved values.
This would not be the case if, for example, a sensor fails to return a value whenever the quantity it is measuring exceeds some threshold.
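As an illustrative sketch (not from the source): for a multivariate Gaussian, marginalizing over the missing components amounts to selecting the observed rows and columns of the mean and covariance. A minimal Python example, assuming NumPy and SciPy and using np.nan to mark missing entries:

import numpy as np
from scipy.stats import multivariate_normal

def observed_loglik(x, mean, cov):
    # Log-likelihood of a partially observed point under a Gaussian:
    # the marginal over missing components is the Gaussian restricted
    # to the observed rows and columns of the mean and covariance.
    obs = ~np.isnan(x)
    return multivariate_normal.logpdf(x[obs], mean[obs], cov[np.ix_(obs, obs)])

mean = np.zeros(3)
cov = np.eye(3)
x = np.array([0.5, np.nan, -1.0])  # second component missing
print(observed_loglik(x, mean, cov))

This marginal likelihood of the observed values is what EM maximizes when values are missing at random.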
Relation to K-means
Comparison of the K-means algorithm with the EM algorithm for Gaussian mixtures shows that there is
a close similarity.
The K-means algorithm performs a hard assignment of data points to clusters, in which each data point is associated uniquely with one cluster, whereas the EM algorithm makes a soft assignment based on the posterior probabilities.
Source: Christopher M Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
Derive the K-means algorithm as a particular limit of EM for Gaussian mixtures
Consider a Gaussian mixture in which each component has covariance $\epsilon \mathbf{I}$. In the limit $\epsilon \to 0$, the expected complete-data log likelihood is given by
$\mathbb{E}_{\mathbf{Z}}[\ln p(\mathbf{X}, \mathbf{Z} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}, \boldsymbol{\pi})] \to -\frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^{2} + \text{const.}$
Maximizing the expected complete-data log likelihood is then equivalent to minimizing the distortion measure J for the K-means algorithm.
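As an illustrative numerical sketch (not from the source), the hard-assignment limit can be seen by computing Gaussian-mixture responsibilities with a shared covariance $\epsilon \mathbf{I}$ and letting $\epsilon \to 0$; each row of responsibilities tends to a one-hot vector picking the nearest mean. All names below are illustrative, assuming NumPy:

import numpy as np

def responsibilities(X, means, weights, eps):
    # Squared Euclidean distances from each point to each mean: shape (N, K)
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # Log responsibilities up to a constant; -d2/(2*eps) dominates as eps -> 0
    log_r = np.log(weights)[None, :] - d2 / (2.0 * eps)
    log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
means = np.array([[0.0, 0.0], [2.0, 2.0], [-2.0, 1.0]])
weights = np.array([0.5, 0.3, 0.2])
for eps in [1.0, 0.1, 0.001]:
    # each printed row approaches a one-hot assignment as eps shrinks
    print(eps, np.round(responsibilities(X, means, weights, eps)[0], 3))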
Mixtures of Bernoulli distributions
Mixtures of Bernoulli distributions describe sets of discrete binary variables; this model is also known as latent class analysis.
For a set of D binary variables $x_i$, each governed by a Bernoulli distribution with parameter $\mu_i$, the distribution is given by
$p(\mathbf{x} \mid \boldsymbol{\mu}) = \prod_{i=1}^{D} \mu_i^{x_i} (1 - \mu_i)^{1 - x_i}$
Derive the EM algorithm for maximizing the likelihood function for the mixture of Bernoulli
distributions with latent variable z.
The conditional distribution of x, given the latent variable z, is given by
$p(\mathbf{x} \mid \mathbf{z}, \boldsymbol{\mu}) = \prod_{k=1}^{K} p(\mathbf{x} \mid \boldsymbol{\mu}_k)^{z_k}$
The likelihood function does not go to infinity, since it is bounded above because $0 \le p(\mathbf{x}_n \mid \boldsymbol{\mu}_k) \le 1$ (in contrast to a Gaussian mixture, where a component can collapse onto a single data point).
The EM algorithm always increases the value of the likelihood function until a local maximum is found.
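A minimal sketch of the resulting EM updates in Python (assuming NumPy; the function name, initialization, and clipping are illustrative choices, not from the source):

import numpy as np

def em_bernoulli_mixture(X, K, n_iter=50, seed=0):
    # X: (N, D) binary data matrix; returns mixing weights pi and means mu
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)
    mu = rng.uniform(0.25, 0.75, size=(K, D))  # initialize away from 0 and 1

    for _ in range(n_iter):
        # E step: responsibilities gamma(z_nk) from log p(x_n | mu_k)
        log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T  # (N, K)
        log_r = np.log(pi)[None, :] + log_p
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M step: each mu_k is the responsibility-weighted mean of the data,
        # and pi_k is the effective fraction of points in component k
        Nk = r.sum(axis=0)
        mu = np.clip((r.T @ X) / Nk[:, None], 1e-6, 1 - 1e-6)
        pi = Nk / N
    return pi, mu

Because each $p(\mathbf{x}_n \mid \boldsymbol{\mu}_k) \le 1$, the log responsibilities above stay finite, reflecting the bounded likelihood noted earlier.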
EM for Bayesian linear regression
The EM algorithm provides an alternative approach for finding the hyperparameters α and β, in which the parameter vector w is treated as the latent variable.
The complete-data log likelihood function is then given by
$\ln p(\mathbf{t}, \mathbf{w} \mid \alpha, \beta) = \ln p(\mathbf{t} \mid \mathbf{w}, \beta) + \ln p(\mathbf{w} \mid \alpha)$
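A minimal sketch of the resulting updates (following the re-estimation equations in Bishop, Section 9.3.4: the E step computes the posterior mean $\mathbf{m}_N$ and covariance $\mathbf{S}_N$ of $\mathbf{w}$, and the M step sets $\alpha = M / (\mathbf{m}_N^{\mathsf{T}}\mathbf{m}_N + \operatorname{Tr}(\mathbf{S}_N))$; the $\beta$ update uses the analogous expected squared residuals, and the function name and direct matrix inversion are illustrative choices):

import numpy as np

def em_bayes_linreg(Phi, t, n_iter=100, alpha=1.0, beta=1.0):
    # Phi: (N, M) design matrix; t: (N,) targets
    N, M = Phi.shape
    for _ in range(n_iter):
        # E step: posterior over w is Gaussian with mean m_N, covariance S_N
        S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
        m_N = beta * S_N @ Phi.T @ t

        # M step: maximize the expected complete-data log likelihood
        alpha = M / (m_N @ m_N + np.trace(S_N))
        resid = t - Phi @ m_N
        beta = N / (resid @ resid + np.trace(Phi.T @ Phi @ S_N))
    return alpha, beta, m_N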