
An Alternative View of EM

 In practice, the complete data set {X, Z} is usually not available; only the incomplete data X is observed.
 The state of knowledge of the values of the latent variables in Z is given only by the posterior
distribution p(Z|X, θ).
 Each cycle of EM will increase the incomplete-data log likelihood (unless it is already at a local maximum).
 Solution:
 The distribution of the observed values is obtained by taking the joint distribution of all the
variables and then marginalizing over the missing ones.
 EM can then be used to maximize the corresponding likelihood function.
 This is a valid procedure only if the data values are missing at random, i.e., the mechanism causing values to be missing does not depend on the unobserved values.
 Counterexample: if a sensor fails to return a value whenever the quantity it is measuring exceeds some threshold, the data are not missing at random.
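As a minimal numerical sketch of this idea (a hypothetical two-component Gaussian mixture; all values are illustrative, not from the slides): the incomplete-data likelihood of an observed x is obtained by summing the joint distribution over the latent z, and the posterior over z captures the state of knowledge of the missing value.

```python
import numpy as np

# Hypothetical two-component Gaussian mixture: p(x, z=k) = pi_k * N(x | mu_k, sigma_k^2)
pi = np.array([0.4, 0.6])      # mixing coefficients
mu = np.array([-1.0, 2.0])     # component means
sigma = np.array([0.5, 1.0])   # component standard deviations

def gauss(x, m, s):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

x = 0.3  # a single observed value; the component label z is unobserved (missing)

# Incomplete-data likelihood: marginalize the joint distribution over the latent z
p_x = sum(pi[k] * gauss(x, mu[k], sigma[k]) for k in range(2))

# Posterior p(z = k | x, theta): the state of knowledge about the latent variable
posterior = np.array([pi[k] * gauss(x, mu[k], sigma[k]) for k in range(2)]) / p_x

print(p_x)        # incomplete-data likelihood p(x | theta)
print(posterior)  # sums to 1
```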
 Relation to K-means
 Comparison of the K-means algorithm with the EM algorithm for Gaussian mixtures shows that there is
a close similarity.
 The K-means algorithm performs a hard assignment of data points to clusters, in which each data point is associated uniquely with one cluster, whereas the EM algorithm makes a soft assignment based on the posterior probabilities.
An Alternative View of EM
 The K-means algorithm can be derived as a particular limit of EM for Gaussian mixtures, in which each component has covariance εI and we take the limit ε → 0.
 In this limit, the expected complete-data log likelihood is given by

E_Z[\ln p(X, Z \mid \mu, \Sigma, \pi)] \to -\frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \lVert x_n - \mu_k \rVert^2 + \text{const}

where r_{nk} denotes the hard (0/1) assignment of data point n to cluster k.
 Maximizing the expected complete-data log likelihood is equivalent to minimizing the distortion
measure J for the K-means algorithm
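A small numerical sketch of this limit, assuming isotropic covariances εI and equal mixing coefficients (toy data, not from the slides): as ε → 0 the responsibilities collapse to hard 0/1 assignments, and the responsibility-weighted distortion approaches the K-means distortion measure J.

```python
import numpy as np

X = np.array([[0.0, 0.1], [2.1, 1.9], [1.0, 0.9]])     # toy data points
mu = np.array([[0.0, 0.0], [2.0, 2.0]])                # fixed component means
d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)   # squared distances ||x_n - mu_k||^2

J = d2.min(axis=1).sum()                               # K-means distortion measure J
for eps in [1.0, 0.1, 0.001]:
    logits = -0.5 * d2 / eps                           # responsibilities for covariance eps * I
    logits -= logits.max(axis=1, keepdims=True)        # stabilize before exponentiating
    r = np.exp(logits)
    r /= r.sum(axis=1, keepdims=True)                  # equal mixing coefficients assumed
    print(eps, np.round(r, 3), (r * d2).sum())         # soft assignments and soft distortion
print("K-means distortion J:", J)                      # the eps -> 0 limit of the soft distortion
```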
 Mixtures of Bernoulli distributions
 Mixtures of Bernoulli distributions describe sets of discrete binary variables; this model is also known as latent class analysis.
 Consider a set of D binary variables x_i (i = 1, ..., D), each governed by a Bernoulli distribution with parameter μ_i, so that

p(x \mid \mu) = \prod_{i=1}^{D} \mu_i^{x_i} (1 - \mu_i)^{1 - x_i}

where x = (x_1, ..., x_D)^T and μ = (μ_1, ..., μ_D)^T.
 To derive the EM algorithm for maximizing the likelihood function of a mixture of Bernoulli distributions, introduce an explicit latent variable z associated with each instance of x.
 The conditional distribution of x, given the latent variable z, is given by

p(x \mid z, \mu) = \prod_{k=1}^{K} p(x \mid \mu_k)^{z_k}
An Alternative View of EM
 The expectation of the complete-data log likelihood with respect to the posterior distribution of the latent variables is

E_Z[\ln p(X, Z \mid \mu, \pi)] = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk}) \left\{ \ln \pi_k + \sum_{i=1}^{D} \left[ x_{ni} \ln \mu_{ki} + (1 - x_{ni}) \ln(1 - \mu_{ki}) \right] \right\}
where γ(z_{nk}) = E[z_{nk}] is the posterior probability (responsibility) of component k for data point x_n.


 In the E step, the posterior probabilities (responsibilities) are evaluated using Bayes' theorem,

\gamma(z_{nk}) = \frac{\pi_k \, p(x_n \mid \mu_k)}{\sum_{j=1}^{K} \pi_j \, p(x_n \mid \mu_j)}
 Unlike Gaussian mixtures, the likelihood function cannot diverge to infinity: it is bounded above because 0 ≤ p(x_n | μ_k) ≤ 1.
 The EM algorithm always increases the value of the likelihood function, until a local maximum is found
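A compact sketch of this EM loop for a mixture of Bernoulli distributions (NumPy, synthetic toy data; the clipping of μ is a numerical safeguard, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10)).astype(float)   # N x D binary data (toy)
N, D = X.shape
K = 3

pi = np.full(K, 1.0 / K)                    # mixing coefficients
mu = rng.uniform(0.25, 0.75, size=(K, D))   # Bernoulli parameters mu_ki

for _ in range(50):
    # E step: responsibilities gamma(z_nk) via Bayes' theorem
    log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T   # log p(x_n | mu_k)
    log_r = np.log(pi) + log_p
    log_r -= log_r.max(axis=1, keepdims=True)               # stabilize before exponentiating
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)

    # M step: maximize the expected complete-data log likelihood
    Nk = r.sum(axis=0)
    mu = (r.T @ X) / Nk[:, None]             # weighted means of the data
    mu = np.clip(mu, 1e-6, 1 - 1e-6)         # keep log(mu), log(1 - mu) finite
    pi = Nk / N
```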
 EM for Bayesian linear regression
 An alternative approach for finding the hyperparameters α and β is based on the EM algorithm.
 The complete-data log likelihood function is then given by

\ln p(t, w \mid \alpha, \beta) = \ln p(t \mid w, \beta) + \ln p(w \mid \alpha)
An Alternative View of EM
 E step: taking the expectation of the complete-data log likelihood with respect to the posterior distribution of w gives

E[\ln p(t, w \mid \alpha, \beta)] = \frac{M}{2}\ln\frac{\alpha}{2\pi} - \frac{\alpha}{2} E[w^T w] + \frac{N}{2}\ln\frac{\beta}{2\pi} - \frac{\beta}{2} \sum_{n=1}^{N} E\left[(t_n - w^T \phi_n)^2\right]
 M step: setting the derivative of this quantity with respect to α to zero gives the re-estimation equation

\alpha = \frac{M}{E[w^T w]} = \frac{M}{m_N^T m_N + \mathrm{Tr}(S_N)}

where M is the number of parameters (the dimensionality of w), and m_N and S_N are the mean and covariance of the posterior distribution of w.


 In the E step, compute the posterior distribution of w given the current setting of the parameters
α and β and then use this to find the expected complete-data log likelihood.
 In the M step, maximize this quantity with respect to α and β.
 Thus, this method optimizes the marginal likelihood function using EM.
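A minimal sketch of these two steps for Bayesian linear regression (NumPy; the design matrix Phi, the targets t, and the β re-estimation formula shown are illustrative assumptions, since the slides only write out the α update explicitly):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 50, 4                                   # data points, basis functions (parameters)
Phi = rng.normal(size=(N, M))                  # hypothetical design matrix
w_true = np.array([0.5, -1.0, 2.0, 0.0])
t = Phi @ w_true + rng.normal(scale=0.3, size=N)

alpha, beta = 1.0, 1.0                         # initial hyperparameters
for _ in range(100):
    # E step: posterior over w given the current alpha, beta
    S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)   # posterior covariance
    m_N = beta * S_N @ Phi.T @ t                                  # posterior mean

    # M step: maximize the expected complete-data log likelihood w.r.t. alpha, beta
    alpha = M / (m_N @ m_N + np.trace(S_N))                       # alpha = M / E[w^T w]
    resid = t - Phi @ m_N
    beta = N / (resid @ resid + np.trace(Phi @ S_N @ Phi.T))      # assumed analogous beta update

print(alpha, beta, m_N)
```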

Dept of Computing Technologies


Dr.S.Poornima 4
Source: Christopher M Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
