Expectation Maximization Algorithm

The Expectation Maximization (EM) algorithm, introduced by Dempster, Laird, and Rubin in 1977, is an iterative method for deriving maximum likelihood estimates in the presence of incomplete data. It consists of two main steps: the Expectation step (E-step), which computes the expected value of the complete-data log-likelihood given the observations and the current parameter estimate, and the Maximization step (M-step), which maximizes the resulting Q-function. The algorithm is attractive because it produces valid parameters without requiring gradients, although likelihoods that involve components such as uniform distributions can be non-differentiable, which complicates gradient-based optimization.

Introduction
• Presented by Dempster, Laird and Rubin in 1977
• Essentially the same principle had already been proposed earlier by other authors for specific special cases
• The EM algorithm is an iterative estimation algorithm that can derive maximum likelihood (ML) estimates in the presence of missing/hidden data (“incomplete data”)
Many-to-One Mapping
• The observed (incomplete) data can be produced by many different complete-data configurations, so the mapping from complete data to observed data is many to one
Expectation Step (E-Step)
• The basic functioning of the EM algorithm can be divided into two steps (the parameter to be estimated is θ):
• Expectation step (E-step):
• Take the expected value of the complete-data log-likelihood, given the observations and the current parameter estimate (see the formula below)
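
In standard notation (the slide's original formula is not preserved in this copy), the expectation computed in the E-step is the Q-function:

Q(θ | θ^(t)) = E_{Z | X, θ^(t)} [ log p(X, Z | θ) ]

where X is the observed data, Z is the missing/hidden data, and θ^(t) is the current parameter estimate.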
Maximization Step (M-Step)
• Maximization step (M-step):
• Maximize the Q-function formed in the E-step (essentially, the expected data from the E-step are treated as if they were measured observations); see the update below
• The likelihood of the parameter is increased at every iteration
• EM converges towards some local maximum of the likelihood function
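
Written out in standard notation (again, the slide's formula is not preserved here), the M-step update is:

θ^(t+1) = argmax_θ Q(θ | θ^(t))

and the two steps are repeated until the parameter estimates stop changing.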
Coin Toss Example
• We have two coins: A and B
• The probabilities for heads are θA and θB
• We have 5 measurement sets with 10 coin tosses in each set
• If we know which coin was tossed in each set, we can calculate the ML estimates of θA and θB directly
• If we don't know which coin was tossed in each set, the ML estimates cannot be calculated directly → EM algorithm
Maximum Likelihood (ML) Method
• ML method if we know the coins (see the formula below)
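
The known-coins ML estimate can be written out as follows (a standard result, not shown in this copy of the slides): each heads probability is just the observed head fraction for that coin,

θ̂A = (number of heads in sets tossed with coin A) / (number of tosses in sets tossed with coin A)

and similarly for θ̂B.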
Binomial Distribution
• The likelihood of observing k heads in n tosses of coin i (heads probability p_i) is given by the binomial distribution:

P(k heads in n tosses) = nCk × p_i^k × (1 − p_i)^(n − k)
Case 1:
• HTTTHHTHTH (5 heads, 5 tails)
• Initially assumed values: θ̂A(0) = 0.6 and θ̂B(0) = 0.5
• Coin A: 10C5 × (0.6)^5 × (0.4)^5 = 252 × 0.07776 × 0.01024 ≈ 0.201
• Coin B: 10C5 × (0.5)^5 × (0.5)^5 = 252 × 0.03125 × 0.03125 ≈ 0.246
• Normalization factor = 1 / (0.201 + 0.246) ≈ 2.237
• Normalized value for A: 0.201 × 2.237 ≈ 0.45
• Normalized value for B: 0.246 × 2.237 ≈ 0.55
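
The same normalization, as a quick self-contained Python sketch (not from the original slides):

from math import comb

# E-step responsibility for set 1 (5 heads out of 10), with theta_A = 0.6 and theta_B = 0.5
like_A = comb(10, 5) * 0.6**5 * 0.4**5   # ~0.201
like_B = comb(10, 5) * 0.5**5 * 0.5**5   # ~0.246
resp_A = like_A / (like_A + like_B)      # ~0.45 (normalized value for A)
resp_B = like_B / (like_A + like_B)      # ~0.55 (normalized value for B)
print(resp_A, resp_B)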
Case 2:
• HHHHTHHHHH (9 heads, 1 tail)
• Initially assumed values: θ̂A(0) = 0.6 and θ̂B(0) = 0.5
• Coin A: 10C9 × (0.6)^9 × (0.4)^1 ≈ 0.0403
• Coin B: 10C9 × (0.5)^9 × (0.5)^1 ≈ 0.0098
• Normalization factor = 1 / (0.0403 + 0.0098) ≈ 19.96
• Normalized value for A: 0.0403 × 19.96 ≈ 0.80
• Normalized value for B: 0.0098 × 19.96 ≈ 0.20
Case 3:
• HTHHHHHTHH (8 heads, 2 tails)
• Initially assumed values: θ̂A(0) = 0.6 and θ̂B(0) = 0.5
• Coin A: 10C8 × (0.6)^8 × (0.4)^2 ≈ 0.121
• Coin B: 10C8 × (0.5)^8 × (0.5)^2 ≈ 0.044
• Normalization factor = 1 / (0.121 + 0.044) ≈ 6.061
• Normalized value for A: 0.121 × 6.061 ≈ 0.73
• Normalized value for B: 0.044 × 6.061 ≈ 0.27
Case 4:
• HTHTTTTHHT (4 heads, 6 tails)
• Initially assumed values: θ̂A(0) = 0.6 and θ̂B(0) = 0.5
• Coin A: 10C4 × (0.6)^4 × (0.4)^6 ≈ 0.1115
• Coin B: 10C4 × (0.5)^4 × (0.5)^6 ≈ 0.2051
• Normalization factor = 1 / (0.1115 + 0.2051) ≈ 3.1586
• Normalized value for A: 0.1115 × 3.1586 ≈ 0.35
• Normalized value for B: 0.2051 × 3.1586 ≈ 0.65
Case 5:
• THHHTHHHTH (7 heads, 3 tails)
• Initially assumed values: θ̂A(0) = 0.6 and θ̂B(0) = 0.5
• Coin A: 10C7 × (0.6)^7 × (0.4)^3 ≈ 0.215
• Coin B: 10C7 × (0.5)^7 × (0.5)^3 ≈ 0.1172
• Normalization factor = 1 / (0.215 + 0.1172) ≈ 3.01
• Normalized value for A: 0.215 × 3.01 ≈ 0.65
• Normalized value for B: 0.1172 × 3.01 ≈ 0.35
Calculations (expected head/tail counts per set)

Set    Coin A weight × (H, T)    Coin B weight × (H, T)    Coin A expected        Coin B expected
1      0.45 × (5, 5)             0.55 × (5, 5)             ≈ 2.2 H, 2.2 T         ≈ 2.8 H, 2.8 T
2      0.80 × (9, 1)             0.20 × (9, 1)             ≈ 7.2 H, 0.8 T         ≈ 1.8 H, 0.2 T
3      0.73 × (8, 2)             0.27 × (8, 2)             ≈ 5.8 H, 1.5 T         ≈ 2.2 H, 0.5 T
4      0.35 × (4, 6)             0.65 × (4, 6)             ≈ 1.4 H, 2.1 T         ≈ 2.6 H, 3.9 T
5      0.65 × (7, 3)             0.35 × (7, 3)             ≈ 4.6 H, 2.0 T         ≈ 2.4 H, 1.0 T
Total                                                      ≈ 21.2 H, 8.6 T        ≈ 11.8 H, 8.4 T
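
The whole table can be reproduced with a short script (again a sketch of my own, not from the slides), looping the E-step over all five sets:

from math import comb

def binom_likelihood(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

sets = ["HTTTHHTHTH", "HHHHTHHHHH", "HTHHHHHTHH", "HTHTTTTHHT", "THHHTHHHTH"]
theta_A, theta_B = 0.6, 0.5

heads_A = tails_A = heads_B = tails_B = 0.0
for s in sets:
    h, t = s.count("H"), s.count("T")
    like_A = binom_likelihood(h + t, h, theta_A)
    like_B = binom_likelihood(h + t, h, theta_B)
    resp_A = like_A / (like_A + like_B)   # responsibility of coin A for this set
    resp_B = 1.0 - resp_A                 # responsibility of coin B
    heads_A += resp_A * h
    tails_A += resp_A * t
    heads_B += resp_B * h
    tails_B += resp_B * t

print(heads_A, tails_A)  # ~21.3 H, ~8.6 T for coin A (the table shows 21.2 H because of per-set rounding)
print(heads_B, tails_B)  # ~11.7 H, ~8.4 T for coin B (the table shows 11.8 H because of per-set rounding)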
M-Step: Calculation for the Next Iteration

θ̂A(1) = 21.2 / (21.2 + 8.6) ≈ 0.71

θ̂B(1) = 11.8 / (11.8 + 8.4) ≈ 0.58
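
Putting both steps together, here is a compact sketch of the full EM loop for this two-coin problem (my own illustration, not from the slides; the function and variable names are hypothetical):

from math import comb

def binom_likelihood(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def em_two_coins(sets, theta_A=0.6, theta_B=0.5, iterations=10):
    """Estimate the heads probabilities of two coins from unlabeled sets of tosses."""
    for _ in range(iterations):
        heads_A = tails_A = heads_B = tails_B = 0.0
        # E-step: soft-assign each set to the coins and accumulate expected counts
        for s in sets:
            h, t = s.count("H"), s.count("T")
            like_A = binom_likelihood(h + t, h, theta_A)
            like_B = binom_likelihood(h + t, h, theta_B)
            resp_A = like_A / (like_A + like_B)
            heads_A += resp_A * h
            tails_A += resp_A * t
            heads_B += (1 - resp_A) * h
            tails_B += (1 - resp_A) * t
        # M-step: re-estimate the heads probabilities from the expected counts
        theta_A = heads_A / (heads_A + tails_A)
        theta_B = heads_B / (heads_B + tails_B)
    return theta_A, theta_B

sets = ["HTTTHHTHTH", "HHHHTHHHHH", "HTHHHHHTHH", "HTHTTTTHHT", "THHHTHHHTH"]
print(em_two_coins(sets, iterations=1))   # first iteration: ~(0.71, 0.58), as above
print(em_two_coins(sets, iterations=10))  # approaches ~(0.80, 0.52), the converged values reported in Do & Batzoglou (2008)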
Advantages
• The EM algorithm naturally produces valid parameters for the mixture distribution (e.g. mixing weights that stay in [0, 1] and sum to one).
• The EM algorithm doesn't require gradients of the likelihood.
• From an implementation standpoint, the EM updates are often described as very simple, although handing the likelihood to a standard off-the-shelf optimization solver can be even simpler to set up.
Disadvantages
• With a uniform component in the mixture, the objective function can behave badly.
• For example, imagine sliding the uniform distribution sideways: the likelihood does not change at all (its derivative with respect to the location parameter is zero) until a data point falls into or out of the support of the distribution.
• The likelihood then jumps abruptly to a new value, and the function is non-differentiable at that point, as the sketch below illustrates.
• This kind of behavior does not play nicely with gradient-based optimization algorithms, and its effect on the EM algorithm is less clear.
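
A small demonstration of that behavior (my own illustration, not from the slides), using a Uniform(loc, loc + 1) component:

import math

# The log-likelihood of the data under a Uniform(loc, loc + 1) density is flat in `loc`
# while all points lie inside the support, then drops to -inf the moment one point falls
# outside, so the derivative w.r.t. loc is zero almost everywhere and undefined at the jump.
data = [0.2, 0.5, 0.9]

def uniform_loglik(loc, width=1.0):
    if all(loc <= x <= loc + width for x in data):
        return -len(data) * math.log(width)   # density is 1/width for every point inside
    return -math.inf                          # a point fell outside the support

for loc in [-0.05, 0.0, 0.1, 0.15, 0.25]:
    print(loc, uniform_loglik(loc))
# loc = -0.05, 0.0, 0.1, 0.15 all give the same log-likelihood; loc = 0.25 gives -inf
# because the data point 0.2 has fallen out of the support.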
References
• Andrew Ng. CS229 Lecture Notes – EM Algorithm.
• Chuong B. Do & Serafim Batzoglou. What is the expectation maximization algorithm? Nature Biotechnology 26, 897–899 (2008).
