Machine Learning

CSE343/CSE543/ECE363/ECE563
Lecture 16 | Take your own notes during lectures
Vinayak Abrol <abrol@iiitd.ac.in>
Gaussian Mixture Model (GMM)

A GMM is a parametric probability density function represented as a weighted sum of Gaussian component densities.

Soft assignment: each data point is assigned to each cluster with a probability.
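For concreteness, a minimal sketch of "weighted sum of Gaussian component densities" and of soft assignment (this code is not from the slides; the component count and all parameter values are made up for illustration):

# Sketch: evaluate a GMM density p(x) = sum_k w_k * N(x; mu_k, Sigma_k) and the
# soft (probabilistic) cluster assignment of a point. Illustrative parameters only.
import numpy as np
from scipy.stats import multivariate_normal

weights = np.array([0.5, 0.3, 0.2])                        # mixture weights w_k (sum to 1)
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0]), np.array([-3.0, 2.0])]
covs = [np.eye(2), 0.5 * np.eye(2), np.diag([1.0, 2.0])]   # component covariances Sigma_k

def gmm_density(x, weights, means, covs):
    """p(x) = sum_k w_k * N(x; mu_k, Sigma_k)."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

def soft_assignment(x, weights, means, covs):
    """Probability of each cluster given x (soft assignment)."""
    comp = np.array([w * multivariate_normal.pdf(x, mean=m, cov=c)
                     for w, m, c in zip(weights, means, covs)])
    return comp / comp.sum()

x = np.array([1.0, 1.0])
print(gmm_density(x, weights, means, covs))
print(soft_assignment(x, weights, means, covs))  # three probabilities summing to 1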
GMM: Mathematical Model

Assignment: wk is the probability that point i's cluster is k. It follows a discrete probability distribution. The wk are called mixture weights.

Generation: given the cluster assignment, generate each example from its cluster's distribution.
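A minimal sketch of this two-step generative process (not from the slides; it reuses the illustrative parameters from the sketch above):

# Sketch: sample from a GMM by (1) drawing a cluster assignment z_i from the
# discrete distribution given by the mixture weights, then (2) generating x_i
# from that cluster's Gaussian. Illustrative parameters only.
import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.3, 0.2])
means = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]])
covs = np.stack([np.eye(2), 0.5 * np.eye(2), np.diag([1.0, 2.0])])

def sample_gmm(n, weights, means, covs, rng):
    z = rng.choice(len(weights), size=n, p=weights)            # assignment step
    x = np.array([rng.multivariate_normal(means[k], covs[k])   # generation step
                  for k in z])
    return x, z

X, z = sample_gmm(500, weights, means, covs, rng)
print(X.shape, np.bincount(z))  # cluster counts roughly follow the mixture weights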
GMM: The Likelihood Function

Likelihood and the joint probability distribution

Recall the lectures on Linear Regression and MLE/MAP!

We already know the mathematical trick.
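For reference, the standard form of the GMM density and its log-likelihood (reconstructed here because the slide's equations are not in this export; notation follows the slides):

% GMM density and log-likelihood (standard form, reconstructed).
p(x_i \mid \theta) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x_i ; \mu_k, \Sigma_k),
\qquad \theta = \{w_k, \mu_k, \Sigma_k\}_{k=1}^{K}

\log p(X \mid \theta) = \sum_{i=1}^{N} \log \sum_{k=1}^{K} w_k \, \mathcal{N}(x_i ; \mu_k, \Sigma_k)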


GMM: The Likelihood Function

In this case we cannot pass the log through the sum.

This problem occurs in many ML formulations when we have latent (hidden) variables, e.g., the cluster assignment here.

Note that if you knew the zi's (the cluster assignments), there would be no inner sum, and the issue with the sum and the log would go away!

The mathematical tool to solve this problem is called Expectation-Maximization (EM).

EM is an iterative procedure where we update the zi's and then update µ, Σ, and w.

E-step: compute the cluster assignments (which are probabilistic)

M-step: update θ (which are the clusters' properties)
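To see why knowing the zi's removes the problem, compare the log-likelihood above with the complete-data log-likelihood (a standard-form reconstruction, not copied from the slides): with zi known, the inner sum collapses and the log reaches each Gaussian directly.

% Complete-data log-likelihood: no inner sum, so the log splits into simple terms.
\log p(X, Z \mid \theta)
= \sum_{i=1}^{N} \log \big( w_{z_i} \, \mathcal{N}(x_i ; \mu_{z_i}, \Sigma_{z_i}) \big)
= \sum_{i=1}^{N} \big[ \log w_{z_i} + \log \mathcal{N}(x_i ; \mu_{z_i}, \Sigma_{z_i}) \big]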
EM Algorithm

The main idea in EM is to find a lower bound on the likelihood. Maximizing the lower bound always leads to higher values of the likelihood.

Procedure:
Starting at θt at iteration t, we construct a surrogate lower bound A(θ, θt).
Maximizing A(θ, θt) increases our likelihood, and the maximum occurs at θt+1.
We again construct a surrogate lower bound A(θ, θt+1) and maximize it to reach the next iterate, θt+2, and so on.
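For the GMM, the surrogate the slides call A(θ, θt) is, in the standard formulation (often written Q), the Jensen lower bound built from the posterior cluster probabilities at θt; this is a reconstruction, not copied from the slides:

% Standard EM surrogate lower bound for the GMM (equality holds at theta = theta^t).
A(\theta, \theta^{t}) = \sum_{i=1}^{N} \sum_{k=1}^{K}
    p(z_i = k \mid x_i, \theta^{t})
    \log \frac{w_k \, \mathcal{N}(x_i ; \mu_k, \Sigma_k)}{p(z_i = k \mid x_i, \theta^{t})}
\;\le\; \log p(X \mid \theta)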
Back to GMM

Bayes Rule

E Step: Assignment

This is similar to k-means: we assign each point to a cluster, but probabilistically, using the parameters from iteration t.
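A minimal sketch of this E-step (not the lecture's code; variable names are mine): Bayes rule turns the prior weights and Gaussian likelihoods into responsibilities, i.e., the probabilistic assignments at iteration t.

# Sketch of the E-step: responsibilities r_ik = P(z_i = k | x_i, theta^t).
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, weights, means, covs):
    """Return an (N, K) matrix of responsibilities; each row sums to 1."""
    N, K = X.shape[0], len(weights)
    r = np.zeros((N, K))
    for k in range(K):
        # Bayes-rule numerator: prior weight times Gaussian likelihood
        r[:, k] = weights[k] * multivariate_normal.pdf(X, mean=means[k], cov=covs[k])
    r /= r.sum(axis=1, keepdims=True)  # normalize over clusters (the evidence term)
    return r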
Back to GMM

M Step: Model Update

sum-log-sum → sum-log-E (convert the sum to an average) → sum-E-log (apply Jensen's inequality) → sum-sum-log (convert the average back to a sum)

Ignoring trivial substitutions and calculations, we get closed-form updates for µ and Σ.
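Those closed-form updates are the standard GMM ones; a reconstruction (the slide's equations are not in this export), with r_ik denoting the responsibilities from the E-step:

% Standard M-step updates for the means and covariances.
N_k = \sum_{i=1}^{N} r_{ik}, \qquad
\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} r_{ik} \, x_i, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} r_{ik} \, (x_i - \mu_k)(x_i - \mu_k)^{\top}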


Back to GMM

M Step: Model Update

The update for w is trickier because of the constraint that the mixture weights must sum to one.

And we are back to the lecture on SVMs. Can you guess what we can do?

Method of Lagrange Multipliers

Here we use the index k′ so as not to confuse it with the sum over k.
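A sketch of that Lagrange-multiplier step in standard form (reconstructed, not copied from the slides), keeping the constraint written with the index k′:

% Maximize the w-dependent part of the bound subject to sum_{k'} w_{k'} = 1.
\mathcal{L}(w, \lambda) = \sum_{i=1}^{N} \sum_{k=1}^{K} r_{ik} \log w_k
    + \lambda \Big( 1 - \sum_{k'=1}^{K} w_{k'} \Big)

\frac{\partial \mathcal{L}}{\partial w_k} = \frac{\sum_i r_{ik}}{w_k} - \lambda = 0
\;\Rightarrow\; w_k = \frac{\sum_i r_{ik}}{\lambda}
\;\Rightarrow\; \lambda = N, \quad w_k = \frac{1}{N} \sum_{i=1}^{N} r_{ik}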
GMM Likelihood in Detail
Visualization of EM procedure in GMM
Covariance Matrix: same vs different | full vs diagonal
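These "same vs different, full vs diagonal" choices correspond to how the component covariances are constrained. As one concrete illustration (not from the lecture), scikit-learn's GaussianMixture exposes them through covariance_type; the data below is synthetic.

# 'full'      - each cluster has its own full covariance matrix  (different, full)
# 'tied'      - all clusters share one full covariance matrix    (same, full)
# 'diag'      - each cluster has its own diagonal covariance     (different, diagonal)
# 'spherical' - each cluster has a single variance               (a further restriction)
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(200, 2)),
               rng.normal([4.0, 4.0], 0.5, size=(200, 2))])  # synthetic two-blob data

for cov_type in ["full", "tied", "diag", "spherical"]:
    gmm = GaussianMixture(n_components=2, covariance_type=cov_type, random_state=0).fit(X)
    print(cov_type, "average log-likelihood:", gmm.score(X))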
Thanks
