Generative Models
So far, we looked at probability theory as a tool to express the belief of
an ML algorithm that the true label is such and such
Likelihood: given a model $\mathbf{w}$, it tells us $P(y \mid \mathbf{x}, \mathbf{w})$
We also looked at how to use probability theory to express our beliefs
about which models are preferred by us and which are not
Prior: this just tells us $P(\mathbf{w})$
Notice that in all of this, the data features were always considered
constant and never questioned as being random or flexible
Can we also talk about $P(\mathbf{x} \mid y)$?
Very beneficial: given a label $y$, this would allow us to generate a new $\mathbf{x}$ from the
distribution $P(\mathbf{x} \mid y)$
Can generate new cat images, new laptop designs (GANs do this very thing!)
Generative Algorithms
ML algos that can learn distributions of the form $P(\mathbf{x})$ or $P(\mathbf{x} \mid y)$ or $P(\mathbf{x}, y)$
A slightly funny bit of terminology used in machine learning
Discriminative Algorithms: those that only use $P(y \mid \mathbf{x})$ to do their stuff
Generative Algorithms: those that use $P(\mathbf{x} \mid y)$ or $P(\mathbf{x}, y)$ etc. to do their stuff
Generative Algorithms have their advantages and disadvantages
More expensive: slower train times, slower test times, larger models
An overkill: often, we need only $P(y \mid \mathbf{x})$ to make predictions – disc. algos are enough!
More frugal: can work even if we have very little training data (e.g. RecSys)
More robust: can work even if features are corrupted, e.g. some features are missing
A recent application of generative techniques (GANs etc) allows us to
Generate novel examples of a certain class of data points
A very simple generative model
Given a few feature vectors (never mind labels for now) $\mathbf{x}^1, \dots, \mathbf{x}^n \in \mathbb{R}^d$
We wish to learn a probability distribution $P(\mathbf{x})$ with support over $\mathbb{R}^d$
This distribution should capture interesting properties about the data in a way
that allows us to do things like generate similar-looking feature vectors etc
Let us try to learn a spherical Gaussian as this distribution i.e. we
wish to learn $\boldsymbol{\mu} \in \mathbb{R}^d$ so that the distribution $\mathcal{N}(\boldsymbol{\mu}, I_d)$ explains the data well
One way is to look for a $\boldsymbol{\mu}$ that achieves maximum likelihood i.e. MLE!!
As before, assume that our feature vectors were independently generated
$\hat{\boldsymbol{\mu}}_{\text{MLE}} = \arg\max_{\boldsymbol{\mu}} \sum_{i=1}^n \ln \mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}, I_d)$ which, upon applying first order optimality, gives us $\hat{\boldsymbol{\mu}}_{\text{MLE}} = \frac{1}{n} \sum_{i=1}^n \mathbf{x}^i$
We just learnt $\mathcal{N}(\hat{\boldsymbol{\mu}}_{\text{MLE}}, I_d)$ as our generating dist. for data features!
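A minimal sketch of this calculation, assuming NumPy and synthetic data in place of our actual feature vectors: the MLE is just the sample mean, and the learnt $\mathcal{N}(\hat{\boldsymbol{\mu}}_{\text{MLE}}, I_d)$ can then be sampled to generate new feature vectors.

import numpy as np

# Minimal sketch: MLE for the mean of a spherical Gaussian N(mu, I_d).
# Synthetic data stands in for the feature vectors x^1, ..., x^n.
rng = np.random.default_rng(0)
true_mu = np.array([2.0, -1.0])
X = rng.normal(loc=true_mu, scale=1.0, size=(500, 2))   # n x d data matrix

# First-order optimality of the log-likelihood gives the sample mean.
mu_mle = X.mean(axis=0)
print("MLE estimate of mu:", mu_mle)

# New samples can now be drawn from N(mu_mle, I_d), i.e. from the learnt generative model.
new_points = rng.normal(loc=mu_mle, scale=1.0, size=(5, 2))
print(new_points)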
Still more powerful generative model?
Suppose we are concerned that a single Gaussian cannot capture all
the variations in our data
Can we learn 2 (or more) Gaussians to represent our data instead?
Such a generative model is often called a mixture of Gaussians
The Expectation Maximization (EM) algorithm is a very powerful
technique for performing this and several other tasks
Soft clustering, learning Gaussian mixture models (GMM)
Robust learning, Mixed Regression
Also underlies more powerful variational algorithms such as VAE
Learning a Mixture of Two Gaussians
We suspect that instead of one Gaussian, two Gaussians are involved in generating our feature vectors
Let us call them $\mathcal{N}(\boldsymbol{\mu}_1, I_d)$ and $\mathcal{N}(\boldsymbol{\mu}_2, I_d)$
Each of these is called a component of this GMM
Covariance matrices, more than two components can also be incorporated
Since we are unsure which data point came from which component,
we introduce a latent variable $z^i \in \{1, 2\}$ per data point to denote this (see the short sketch below)
If someone tells us that $z^i = 1$, this means that the first Gaussian is responsible for that data point and consequently, the likelihood expression is $\mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}_1, I_d)$. Similarly, if someone tells us that $z^i = 2$, this means that the second Gaussian is responsible for that data point and the likelihood expression is $\mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}_2, I_d)$
The English word “latent” means hidden or dormant or concealed
Nice name since this variable describes something that was hidden from us
These latent variables may seem similar to the one we used in (soft) k-means
Not an accident – the connections will be clear soon!
Latent variables can be discrete or continuous
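A short sketch of the generative story above, assuming NumPy; the means and mixing weights below are made up purely for illustration: first draw the latent $z^i$, then draw $\mathbf{x}^i$ from the corresponding Gaussian.

import numpy as np

# Minimal sketch of the generative story: draw the latent z^i first, then draw x^i
# from the corresponding Gaussian. The means and mixing weights below are made up.
rng = np.random.default_rng(1)
mu = np.array([[0.0, 0.0], [5.0, 5.0]])    # mu_1, mu_2
pi = np.array([0.5, 0.5])                  # P(z = 1), P(z = 2), assumed uniform

n = 1000
z = rng.choice(2, size=n, p=pi)            # latent component assignments (hidden in practice)
X = rng.normal(loc=mu[z], scale=1.0)       # x^i ~ N(mu_{z^i}, I_d)
print(X.shape, np.bincount(z))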
MLE with Latent Variables
We wish to obtain the maximum (log) likelihood models i.e.
$(\hat{\boldsymbol{\mu}}_1, \hat{\boldsymbol{\mu}}_2) = \arg\max_{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2} \sum_{i=1}^n \ln P(\mathbf{x}^i \mid \boldsymbol{\mu}_1, \boldsymbol{\mu}_2)$
Since we do not know the values of latent variables, force them
into the expression using the law of total probability
We did a similar thing (introduce models) in predictive posterior calculations
$\sum_{i=1}^n \ln P(\mathbf{x}^i \mid \boldsymbol{\mu}_1, \boldsymbol{\mu}_2) = \sum_{i=1}^n \ln\left(\sum_{z^i \in \{1,2\}} P(\mathbf{x}^i, z^i \mid \boldsymbol{\mu}_1, \boldsymbol{\mu}_2)\right)$
Very difficult optimization problem – NP-hard in general
However, two heuristics exist which work reasonably well in practice
Also theoretically sound if data is “nice” (details in a learning theory course)
Heuristic 1: Alternating Optimization
Convert the original optimization problem
$\arg\max_{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2} \sum_{i=1}^n \ln\left(\sum_{z^i \in \{1,2\}} P(\mathbf{x}^i, z^i \mid \boldsymbol{\mu}_1, \boldsymbol{\mu}_2)\right)$
to a double maximization problem (assume $P(z^i)$ const)
$\arg\max_{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2}\ \max_{z^1, \dots, z^n} \sum_{i=1}^n \ln P(\mathbf{x}^i \mid z^i, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2)$
Keep alternating between step 1 and step 2 till you are tired or till the process has converged!
In several ML problems with latent vars, although the above double
optimization problem is (still) difficult, the following two problems are easy
Step 1: Fix $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2$ and update the latent variables to their optimal values i.e. $\hat{z}^i = \arg\max_{z \in \{1,2\}} \mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}_z, I_d)$
Step 2: Fix the latent variables and update $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2$ to their optimal values i.e. $(\hat{\boldsymbol{\mu}}_1, \hat{\boldsymbol{\mu}}_2) = \arg\max_{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2} \sum_{i=1}^n \ln \mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}_{\hat{z}^i}, I_d)$
The intuition behind reducing things to a double optimization is that it may mostly be the case that only one of the terms in the summation will dominate and, if this is the case, then approximating the sum by the largest term should be okay
The most important difference between the original and the new problem is that the original has a sum of log of sum which is very difficult to optimize whereas the new problem gets rid of this and looks simply like an MLE problem. We know how to solve MLE problems very easily!
Heuristic 1 at Work
As discussed before, we assume a mixture of two Gaussians $\mathcal{N}(\boldsymbol{\mu}_1, I_d)$ and $\mathcal{N}(\boldsymbol{\mu}_2, I_d)$
Step 1 becomes $\hat{z}^i = \arg\min_{k \in \{1,2\}} \left\| \mathbf{x}^i - \boldsymbol{\mu}_k \right\|_2$
Step 2 becomes $(\hat{\boldsymbol{\mu}}_1, \hat{\boldsymbol{\mu}}_2) = \arg\max_{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2} \sum_{i=1}^n \ln \mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}_{\hat{z}^i}, I_d)$
Thus, $\hat{\boldsymbol{\mu}}_1 = \frac{1}{n_1} \sum_{i: \hat{z}^i = 1} \mathbf{x}^i$ and $\hat{\boldsymbol{\mu}}_2 = \frac{1}{n_2} \sum_{i: \hat{z}^i = 2} \mathbf{x}^i$ where $n_k$ is the number of data points for which we have $\hat{z}^i = k$
Repeat!
Isn’t this like the k-means clustering algorithm?
Not just “like” – this is the k-means algorithm! This means that the k-means algorithm is one heuristic way to compute an MLE which is difficult to compute directly!
I have a feeling that the second heuristic will also give us something familiar!
Indeed! Notice that even here, instead of choosing just one value of the latent variables $z^i$ at each time step, we can instead use a distribution over their support $\{1, 2\}$
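A minimal sketch of Heuristic 1 on synthetic data, assuming NumPy; with identity covariances the two steps are exactly the 2-means updates above. The data-generation choices are made up for illustration.

import numpy as np

# Minimal sketch of Heuristic 1 (alternating optimization) for a mixture of two
# spherical Gaussians; with identity covariances this is exactly 2-means clustering.
def alt_opt_two_gaussians(X, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=2, replace=False)]       # initialise mu_1, mu_2
    for _ in range(n_iters):
        # Step 1: fix the means and pick the most likely component for each point,
        # which for identity covariance is simply the nearest mean
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=-1)   # (n, 2)
        z_hat = dists.argmin(axis=1)
        # Step 2: fix the assignments and set each mean to the average of its points
        # (the MLE for that component); keep the old mean if a component is empty
        for k in range(2):
            if np.any(z_hat == k):
                mu[k] = X[z_hat == k].mean(axis=0)
    return mu, z_hat

# Example usage on synthetic data from two well-separated components:
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0], 1.0, (300, 2)), rng.normal([5, 5], 1.0, (300, 2))])
mu_hat, z_hat = alt_opt_two_gaussians(X)
print(mu_hat)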
Heuristic 2: Expectation Maximization
Original Prob: $\arg\max_{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2} \sum_{i=1}^n \ln\left(\sum_{z^i \in \{1,2\}} P(\mathbf{x}^i, z^i \mid \boldsymbol{\mu}_1, \boldsymbol{\mu}_2)\right)$
Step 1 (E Step) Consists of two sub-steps
Step 1.1 Assume our current model estimates are $\boldsymbol{\mu}_1^t, \boldsymbol{\mu}_2^t$
Use the current models to ascertain how likely are different values of $z^i$ for the $i$-th data point i.e. compute $q^i_k = P(z^i = k \mid \mathbf{x}^i, \boldsymbol{\mu}_1^t, \boldsymbol{\mu}_2^t)$ for both $k \in \{1, 2\}$
Step 1.2 Use weights $q^i_k$ to set up a new objective function
As before, assume $P(z^i)$ constant for sake of simplicity
$\tilde{Q}(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2) = \sum_{i=1}^n \sum_{k \in \{1,2\}} q^i_k \ln \mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}_k, I_d)$
Yet again, the new problem gets rid of the treacherous “sum of log of sum” terms which are difficult to optimize. The new problem instead looks simply like a weighted MLE problem with weights $q^i_k$ and we know how to solve MLE problems very easily!
Step 2 (M Step) Maximize the new obj. fn. to get new models
$(\boldsymbol{\mu}_1^{t+1}, \boldsymbol{\mu}_2^{t+1}) = \arg\max_{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2} \tilde{Q}(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2) = \arg\max_{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2} \sum_{i=1}^n \sum_{k \in \{1,2\}} q^i_k \ln \mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}_k, I_d)$
Repeat!
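A minimal sketch of Heuristic 2 under the same assumptions as the text (identity covariances, uniform $P(z^i)$), again assuming NumPy and synthetic data; the E step computes the weights $q^i_k$ and the M step takes weighted averages.

import numpy as np

# Minimal sketch of Heuristic 2 (EM) for a mixture of two spherical Gaussians with
# identity covariances and uniform P(z), matching the assumptions in the text.
def em_two_gaussians(X, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=2, replace=False)]       # current models mu_1^t, mu_2^t
    for _ in range(n_iters):
        # E step: q^i_k = P(z^i = k | x^i, mu^t) via Bayes rule with a uniform prior on z
        sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)   # (n, 2)
        logq = -0.5 * sq_dists                                            # log-likelihood up to constants
        logq -= logq.max(axis=1, keepdims=True)                           # for numerical stability
        q = np.exp(logq)
        q /= q.sum(axis=1, keepdims=True)                                 # normalise the weights
        # M step: maximise the weighted MLE objective; for Gaussian means this is
        # just a weighted average of the data points
        mu = (q.T @ X) / q.sum(axis=0)[:, None]
    return mu, q

# Example usage on synthetic data from two well-separated components:
rng = np.random.default_rng(3)
X = np.vstack([rng.normal([0, 0], 1.0, (300, 2)), rng.normal([5, 5], 1.0, (300, 2))])
mu_hat, q = em_two_gaussians(X)
print(mu_hat)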
Derivation of the E Step
Jensen’s inequality tells us that $f(\mathbb{E}[X]) \le \mathbb{E}[f(X)]$ for any convex function $f$. We used the fact
that $\ln$ is a concave function and so the inequality reverses, since every concave
function is the negative of a convex function
Let $\Theta$ denote the models $(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2)$ to avoid clutter. Also let $\Theta^t$ denote our
current estimate of the model
Just need to see the derivation for a single point, say the $i$-th point
$\ln P(\mathbf{x}^i \mid \Theta) = \ln \sum_{z^i \in \{1,2\}} P(\mathbf{x}^i, z^i \mid \Theta)$ (law of total probability)
$= \ln \sum_{z^i \in \{1,2\}} P(z^i \mid \mathbf{x}^i, \Theta^t) \cdot \frac{P(\mathbf{x}^i, z^i \mid \Theta)}{P(z^i \mid \mathbf{x}^i, \Theta^t)}$ (simply multiply and divide by the same term)
$\ge \sum_{z^i \in \{1,2\}} P(z^i \mid \mathbf{x}^i, \Theta^t) \ln \frac{P(\mathbf{x}^i, z^i \mid \Theta)}{P(z^i \mid \mathbf{x}^i, \Theta^t)}$ (Jensen’s inequality)
$= \sum_{k \in \{1,2\}} q^i_k \ln P(\mathbf{x}^i, z^i = k \mid \Theta) + \text{const}$ (just renaming $q^i_k = P(z^i = k \mid \mathbf{x}^i, \Theta^t)$; the denominator does not depend on $\Theta$)
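A small numeric sanity check of this bound, assuming NumPy and the same spherical-Gaussian, uniform-$P(z)$ setting; the specific point and models below are made up: the log-likelihood at any $\Theta$ should never fall below the bound built from the posterior weights under $\Theta^t$.

import numpy as np

# Small numeric check of the bound derived above: for any point x and any models Theta,
#   ln P(x | Theta) >= sum_k q_k * ln( P(x, z = k | Theta) / q_k ),
# where q_k = P(z = k | x, Theta^t) are the posterior weights under the current estimate.
# Spherical Gaussians with uniform P(z) are assumed; all numbers below are made up.
rng = np.random.default_rng(4)
x = rng.normal(size=2)
mu_t = np.array([[0.0, 0.0], [3.0, 3.0]])        # current estimate Theta^t
mu = np.array([[1.0, -1.0], [2.0, 4.0]])         # some other models Theta

def joint(x, mu_k):
    # P(x, z = k | Theta) for identity covariance and P(z = k) = 1/2
    d = len(x)
    return 0.5 * np.exp(-0.5 * np.sum((x - mu_k) ** 2)) / (2 * np.pi) ** (d / 2)

q = np.array([joint(x, mu_t[0]), joint(x, mu_t[1])])
q /= q.sum()                                     # posterior weights under Theta^t

joints = np.array([joint(x, mu[0]), joint(x, mu[1])])
log_lik = np.log(joints.sum())                   # ln P(x | Theta)
lower_bound = np.sum(q * np.log(joints / q))
print(log_lik, lower_bound, log_lik >= lower_bound)   # the bound holds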
Latent Variables to the Rescue
Once again, the information about which model generated each data point may be incomplete/latent
We could have had separate noise variances $\sigma_1$ and $\sigma_2$ for the two components as well, which we could also learn. However, this would make things more tedious so for now, let us assume $\sigma_1 = \sigma_2 = 1$ and also that $P(z^i = 1) = P(z^i = 2) = \frac{1}{2}$
As before, if we believe that our data is best explained using two
linear regression models instead of one, we should work with a
mixed model (aka mixture of experts)
Will fit two regression models to the data and use a latent variable
$z^i \in \{1, 2\}$ to keep track of which data point belongs to which model
Let us use Gaussian likelihoods since we are comfortable with them
$P(y^i \mid \mathbf{x}^i, z^i = 1, \mathbf{w}_1, \mathbf{w}_2) = \mathcal{N}(y^i; \langle \mathbf{w}_1, \mathbf{x}^i \rangle, 1)$
$P(y^i \mid \mathbf{x}^i, z^i = 2, \mathbf{w}_1, \mathbf{w}_2) = \mathcal{N}(y^i; \langle \mathbf{w}_2, \mathbf{x}^i \rangle, 1)$
Note: this is not generative learning since we are still learning
discriminative distributions of the form $P(y \mid \mathbf{x})$
Will see soon how to perform generative learning in supervised settings
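A short sketch of this mixed regression setup, assuming NumPy; the weight vectors, dimensions, and noise level are made up for illustration: a hidden $z^i$ picks which of the two linear models generates each $y^i$.

import numpy as np

# Minimal sketch of the mixed regression setup: two linear models w_1, w_2, a hidden
# z^i per point choosing the model, and unit-variance Gaussian noise as assumed above.
# The weight vectors, dimensions, and sample size are made up for illustration.
rng = np.random.default_rng(5)
n, d = 400, 3
w1, w2 = rng.normal(size=d), rng.normal(size=d)
W = np.stack([w1, w2])                          # (2, d)
X = rng.normal(size=(n, d))
z = rng.choice(2, size=n)                       # latent: which expert generated each point
y = (X * W[z]).sum(axis=1) + rng.normal(size=n) # y^i ~ N(<w_{z^i}, x^i>, 1)
print(X.shape, y.shape)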
MLE for Mixed Regression
$\arg\max_{\mathbf{w}_1, \mathbf{w}_2} \sum_{i=1}^n \ln P(y^i \mid \mathbf{x}^i, \mathbf{w}_1, \mathbf{w}_2)$ which, upon introducing latent variables, gives $\arg\max_{\mathbf{w}_1, \mathbf{w}_2} \sum_{i=1}^n \ln\left(\sum_{z^i \in \{1,2\}} P(y^i, z^i \mid \mathbf{x}^i, \mathbf{w}_1, \mathbf{w}_2)\right)$
Method 1: Alternating Optimization
$\arg\max_{\mathbf{w}_1, \mathbf{w}_2}\ \max_{z^1, \dots, z^n} \sum_{i=1}^n \ln P(y^i, z^i \mid \mathbf{x}^i, \mathbf{w}_1, \mathbf{w}_2)$
As before, assume $P(z^i)$ constant for sake of simplicity to get
$\arg\max_{\mathbf{w}_1, \mathbf{w}_2}\ \max_{z^1, \dots, z^n} \sum_{i=1}^n \ln P(y^i \mid \mathbf{x}^i, z^i, \mathbf{w}_1, \mathbf{w}_2)$
Step 1: Fix $\mathbf{w}_1, \mathbf{w}_2$ and update all the $z^i$
Step 2: Fix all the $z^i$ and update the models $\mathbf{w}_1, \mathbf{w}_2$
Alternating Optimization for MR
As before, we assumed the likelihood distributions as $P(y^i \mid \mathbf{x}^i, z^i = k, \mathbf{w}_1, \mathbf{w}_2) = \mathcal{N}(y^i; \langle \mathbf{w}_k, \mathbf{x}^i \rangle, 1)$ for $k \in \{1, 2\}$
Step 1 becomes $\hat{z}^i = \arg\min_{k \in \{1,2\}} \left( y^i - \langle \mathbf{w}_k, \mathbf{x}^i \rangle \right)^2$
i.e. assign every data point to its “closest” line or the line which fits it better
Step 2 becomes $\hat{\mathbf{w}}_k = \arg\min_{\mathbf{w}} \sum_{i: \hat{z}^i = k} \left( y^i - \langle \mathbf{w}, \mathbf{x}^i \rangle \right)^2$ for $k \in \{1, 2\}$
i.e. perform least squares on the data points assigned to each component
May incorporate a prior as well to add a regularizer (ridge regression)
Repeat!
AltOpt for MR
1. Initialize models $\mathbf{w}_1^0, \mathbf{w}_2^0$
2. For $i \in [n]$, update the assignments using $\hat{z}^i = \arg\min_{k \in \{1,2\}} \left( y^i - \langle \mathbf{w}_k^{t-1}, \mathbf{x}^i \rangle \right)^2$
3. Update the models using $\mathbf{w}_k^t = \arg\min_{\mathbf{w}} \sum_{i: \hat{z}^i = k} \left( y^i - \langle \mathbf{w}, \mathbf{x}^i \rangle \right)^2$
4. Repeat until convergence
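A minimal sketch of AltOpt for mixed regression on synthetic two-line data, assuming NumPy; the function name and data-generation choices are made up for illustration.

import numpy as np

# Minimal sketch of AltOpt for mixed regression: assign each point to the line that
# fits it better, then refit each line by least squares on its assigned points.
def alt_opt_mixed_regression(X, y, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(2, X.shape[1]))                # initialise w_1, w_2
    for _ in range(n_iters):
        # Step 1: hard-assign each point to its "closest" line (smaller squared residual)
        residuals = (y[:, None] - X @ W.T) ** 2         # (n, 2)
        z_hat = residuals.argmin(axis=1)
        # Step 2: ordinary least squares per component (a ridge term could be added
        # here if we also wanted the regularizer mentioned above)
        for k in range(2):
            mask = z_hat == k
            if mask.sum() >= X.shape[1]:                # skip update if a component is starved
                W[k], *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    return W, z_hat

# Example usage on synthetic two-line data:
rng = np.random.default_rng(6)
X = rng.normal(size=(400, 3))
z = rng.choice(2, size=400)
W_true = rng.normal(size=(2, 3))
y = (X * W_true[z]).sum(axis=1) + 0.1 * rng.normal(size=400)
W_hat, z_hat = alt_opt_mixed_regression(X, y)
print(W_true)
print(W_hat)   # should match W_true up to a possible swap of the two rows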
EM for Mixed Regression
Original Prob: $\arg\max_{\mathbf{w}_1, \mathbf{w}_2} \sum_{i=1}^n \ln\left(\sum_{z^i \in \{1,2\}} P(y^i, z^i \mid \mathbf{x}^i, \mathbf{w}_1, \mathbf{w}_2)\right)$
Step 1 (E Step) Consists of two sub-steps
Step 1.1 Assume our current model estimates are $\mathbf{w}_1^t, \mathbf{w}_2^t$
Use the current models to ascertain how likely are different values of $z^i$ for the $i$-th data point i.e. compute $q^i_k = P(z^i = k \mid \mathbf{x}^i, y^i, \mathbf{w}_1^t, \mathbf{w}_2^t)$ for both $k \in \{1, 2\}$
Step 1.2 Use weights $q^i_k$ to set up a new objective function
As before, assume $P(z^i)$ constant for sake of simplicity
Step 2 (M Step) Maximize the new obj. fn. to get new models
$(\mathbf{w}_1^{t+1}, \mathbf{w}_2^{t+1}) = \arg\max_{\mathbf{w}_1, \mathbf{w}_2} \sum_{i=1}^n \sum_{k \in \{1,2\}} q^i_k \ln \mathcal{N}(y^i; \langle \mathbf{w}_k, \mathbf{x}^i \rangle, 1)$
Repeat!
EM for MR
1. Initialize models $\mathbf{w}_1^0, \mathbf{w}_2^0$ (for the two components)
2. For $i \in [n]$, compute the weights using the current models: let $\tilde{q}^i_k = \exp\left( -\tfrac{1}{2} \left( y^i - \langle \mathbf{w}_k^t, \mathbf{x}^i \rangle \right)^2 \right)$ and then $q^i_k = \tilde{q}^i_k / (\tilde{q}^i_1 + \tilde{q}^i_2)$ (normalize)
3. Update $\mathbf{w}_k^{t+1} = S_k^{-1} \mathbf{b}_k$ where $S_k = \sum_{i=1}^n q^i_k \, \mathbf{x}^i (\mathbf{x}^i)^\top$ and $\mathbf{b}_k = \sum_{i=1}^n q^i_k \, y^i \mathbf{x}^i$ (apply first order optimality)
4. Repeat until convergence
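A minimal sketch of EM for mixed regression under the same assumptions (unit-variance Gaussian likelihoods, uniform $P(z^i)$), assuming NumPy; the closed-form M step is the standard weighted least squares solution obtained from first order optimality.

import numpy as np

# Minimal sketch of EM for mixed regression with unit-variance Gaussian likelihoods
# and uniform P(z): the E step computes soft weights q^i_k per point per line, the
# M step solves a weighted least squares problem for each line.
def em_mixed_regression(X, y, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(2, X.shape[1]))                # current models w_1^t, w_2^t
    for _ in range(n_iters):
        # E step: unnormalised weights exp(-(y - <w_k, x>)^2 / 2), then normalise
        sq_res = (y[:, None] - X @ W.T) ** 2            # (n, 2)
        logq = -0.5 * sq_res
        logq -= logq.max(axis=1, keepdims=True)         # for numerical stability
        q = np.exp(logq)
        q /= q.sum(axis=1, keepdims=True)
        # M step: weighted least squares per component,
        #   w_k <- (sum_i q^i_k x^i x^i^T)^{-1} (sum_i q^i_k y^i x^i)
        for k in range(2):
            S = (X * q[:, k:k + 1]).T @ X               # (d, d)
            b = X.T @ (q[:, k] * y)                     # (d,)
            W[k] = np.linalg.solve(S, b)
    return W, q

# Example usage with the same kind of synthetic data as before:
rng = np.random.default_rng(7)
X = rng.normal(size=(400, 3))
z = rng.choice(2, size=400)
W_true = rng.normal(size=(2, 3))
y = (X * W_true[z]).sum(axis=1) + 0.1 * rng.normal(size=400)
W_hat, q = em_mixed_regression(X, y)
print(W_true)
print(W_hat)   # again, correct up to a possible swap of the two components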