
Missing Data and the EM algorithm

MSc Further Statistical Methods


Lecture 4 and 5
Hilary Term 2007
Steffen Lauritzen, University of Oxford; January 31, 2007
Missing data problems

case   A    B    C    D    E    F
1      a1   b1   ∗    d1   e1   ∗
2      a2   ∗    c2   d2   e2   ∗
...    ...  ...  ...  ...  ...  ...
n      an   bn   cn   ∗    ∗    ∗

∗ or NA denotes values that are missing, i.e. non-observed.


Examples of missingness

• non-reply in surveys;
• non-reply to specific questions: "missing" ∼ don't
know, essentially an additional state for the variable
in question;
• recording error;
• variable out of range;
• just not recorded (e.g. too expensive).

Different types of missingness demand different treatment.


Notation for missingness

Data matrix Y , missing data matrix M = {Mij }:

    Mij = 1 if Yij is missing,  Mij = 0 if Yij is observed.

Convenient to introduce the notation Y = (Yobs , Ymis ),


where Ymis are conceptual and denote the data that were
not observed.
This notation follows Little and Rubin (2002).
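
To make the notation concrete, here is a small Python/NumPy sketch (not part of the original slides) that builds the missingness matrix M from a data matrix Y; the array values are invented purely for illustration.

import numpy as np

# Hypothetical data matrix Y with np.nan playing the role of NA
# (values invented purely for illustration).
Y = np.array([[1.2, np.nan, 0.7],
              [np.nan, 2.3, 1.1],
              [0.4, 0.9, np.nan]])

# Missingness matrix M: M[i, j] = 1 if Y[i, j] is missing, 0 if observed.
M = np.isnan(Y).astype(int)

# Y_obs collects the observed entries; Y_mis is only conceptual (unknown).
Y_obs = Y[M == 0]

print(M)
print(Y_obs)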
Patterns of missingness

Little and Rubin (2002) classify patterns of missingness into the following technical categories.
We shall illustrate with a case of cross-classification of Sex,
Race, Admission and Department, S, R, A, D.

Univariate: Mij = 0 unless j = j ∗ , e.g. an unmeasured


response. Example: R unobserved for some, but data
otherwise complete.
Multivariate: Mij = 0 unless j ∈ J ⊂ V , as above, just
with multivariate response, e.g. in surveys. Example:
For some subjects, both R and S unobserved.
Monotone: There is an ordering of V so Mik = 0 implies
Mij = 0 for j < k, e.g. drop-out in longitudinal
studies. Example: For some, A is unobserved, others
neither A nor R, but data otherwise complete.
Disjoint: Two subsets of variables never observed
together. Controversial. Appears in Rubin’s causal
model. Example: S and R never both observed.
General: none of the above. Haphazardly scattered
missing values. Example: R unobserved for some, A
unobserved for others, S, D for some.
Latent: A certain variable is never observed. Maybe it is
even unobservable. Example: S never observed, but
believed to be important for explaining the data.
Methods for dealing with missing data

Complete case analysis: analyse only cases where all


variables are observed. Can be adequate if most cases
are present, but will generally give serious biases in
the analysis. In surveys, for example, this
corresponds to making inference about the population
of responders, not the full population;
Weighting methods: For example, if a population mean
µ = E(Y ) is to be estimated and unit i has been
selected with probability πi , a standard method is the
Horvitz–Thompson estimator

    µ̂ = (Σi Yi /πi ) / (Σi 1/πi ).

To correct for non-response, one could let ρi be the
response probability, estimate it in some way as ρ̂i ,
and then let

    µ̃ = {Σi Yi /(πi ρ̂i )} / {Σi 1/(πi ρ̂i )}

(a small numerical sketch of these weighted estimators
is given after this list of methods).

Imputation methods: Find ways of estimating the
unobserved values as Ŷmis , then proceed as if
there were complete data. Without care, this can give
misleading results, in particular because the "sample
size" can be grossly overestimated.
Model-based likelihood methods: Model the missing data
mechanism and then proceed to make a proper
likelihood-based analysis, either via the method of
maximum likelihood or using Bayesian methods. This
appears to be the most sensible way.
Typically this approach was not computationally
feasible in the past, but modern algorithms and
computers have changed things completely. Ironically,
the efficient algorithms are themselves based upon
imputation of missing values, but with proper
corrections applied.
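
Below is a small Python sketch (not part of the original slides) of the two weighted estimators above; the values, selection probabilities and estimated response probabilities are invented, and the non-response-corrected estimator is computed over the responding units only.

import numpy as np

# Invented example: y-values, selection probabilities pi_i and estimated
# response probabilities rho_hat_i (assumptions, for illustration only).
y    = np.array([4.0, 7.5, 3.2, 6.1, 5.0])
pi   = np.array([0.2, 0.5, 0.1, 0.4, 0.3])        # selection probabilities
rho  = np.array([0.9, 0.8, 0.7, 0.95, 0.85])      # estimated response probabilities
resp = np.array([True, True, False, True, True])  # which units responded

# Horvitz-Thompson type ratio estimator of mu using all selected units:
mu_hat = np.sum(y / pi) / np.sum(1.0 / pi)

# Correction for non-response: restrict to responders, weight by 1/(pi * rho).
w = 1.0 / (pi[resp] * rho[resp])
mu_tilde = np.sum(w * y[resp]) / np.sum(w)

print(mu_hat, mu_tilde)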
Mechanisms of missingness

The data are missing completely at random, MCAR, if

    f (M | Y, θ) = f (M | θ),   i.e.   M ⊥⊥ Y | θ.

Heuristically, the values of Y have themselves no
influence on the missingness. Examples are recording
error, latent variables, and variables that are missing
by design (e.g. measuring certain values only for the
first m out of n cases). Beware: it may be
counterintuitive that missing by design is MCAR.
The data are missing at random, MAR, if

f (M | Y, θ) = f (M | Yobs , θ), i.e. M ⊥⊥ Ymis | (Yobs , θ).


Heuristically, only the observed values of Y have
influence on the missingness. This can happen by design,
e.g. if individuals with certain characteristics of Yobs
are not included in part of the study (where Ymis is measured).
The data are not missing at random, NMAR, in all other
cases.
For example, if certain values of Y cannot be
recorded when they are out of range, e.g. in survival
analysis.

The classifications above of the mechanism of missingness


lead again to increasingly complex analyses.
It is not clear that the notion of MCAR is helpful, but MAR
is. Note that if data are MCAR, they are also MAR.
Likelihood-based methods

The most convincing treatment of missing data problems


seems to be via modelling the missing data mechanism, i.e.
by considering the missing data matrix M as an explicit
part of the data.
The likelihood function then takes the form

    L(θ | M, yobs ) ∝ ∫ f (M, yobs , ymis | θ) dymis
                    = ∫ Cmis (θ | M, yobs , ymis ) f (yobs , ymis | θ) dymis ,   (1)

where the factor Cmis (θ | M, y) = f (M | yobs , ymis , θ) is


based on an explicit model for the missing data mechanism.
Ignoring the missing data mechanism
The likelihood function ignoring the missing data
mechanism is
    Lign (θ | yobs ) ∝ f (yobs | θ) = ∫ f (yobs , ymis | θ) dymis .   (2)

When is L ∝ Lign so the missing data mechanism can be


ignored for further analysis? This is true if:

1. The data are MAR;


2. The parameters η governing the missingness are
separate from the parameters of interest ψ, i.e. the
parameters vary in a product region, so that
information about the value of one does not restrict
the other.
Ignorable missingness

If data are MAR and the missingness parameter is separate
from the parameter of interest, we have θ = (η, ψ) and

    Cmis (θ) = f (M | yobs , ymis , η) = f (M | yobs , η).

Hence the correction factor Cmis does not depend on ymis and
can be taken outside the integral in (1), so that

    L(θ | M, yobs ) ∝ Cmis (η) Lign (θ | yobs ),

and since

    f (yobs , ymis | θ) = f (yobs , ymis | ψ)

we get

    L(θ | M, yobs ) ∝ Cmis (η) Lign (ψ | yobs ),

which shows that the missingness mechanism can be
ignored when concerned with likelihood inference about ψ.
For a Bayesian analysis the parameters must in addition be
independent w.r.t. the prior:

f (η, ψ) = f (η)f (ψ).

If the data are NMAR or the parameters are not separate,


then the missing data mechanism cannot be ignored.
Care must then be taken to model the mechanism
f (M | yobs , ymis , θ) and the corresponding likelihood term
must be properly included in the analysis.
Note: Ymis is MAR if the data are (M, Y ), i.e. if M is considered
part of the data, since then M ⊥⊥ Ymis | (M, Yobs , θ).
The EM algorithm

The EM algorithm is an alternative to Newton–Raphson or
the method of scoring for computing the MLE in cases where
the complications in calculating the MLE are due to
incomplete observation and the data are MAR (missing at
random), with separate parameters for the observation model
and the missing data mechanism, so that the missing data
mechanism can be ignored.
Data (X, Y ) are the complete data whereas only
incomplete data Y = y are observed. (Rubin uses Y = Yobs
and X = Ymis ).
The complete data log-likelihood is:

l(θ) = log L(θ; x, y) = log f (x, y; θ).


The marginal log-likelihood or incomplete data
log-likelihood is based on y alone and is equal to
ly (θ) = log L(θ; y) = log f (y; θ).
We wish to maximize ly in θ, but ly is typically quite
unpleasant:

    ly (θ) = log ∫ f (x, y; θ) dx.

The EM algorithm is a method of maximizing the latter


iteratively and alternates between two steps, one known as
the E-step and one as the M-step, to be detailed below.
We let θ∗ be an arbitrary but fixed value, typically the
value of θ at the current iteration.
The E-step calculates the expected complete data
log-likelihood ratio q(θ | θ∗ ):
 
    q(θ | θ∗ ) = Eθ∗ { log [ f (X, y; θ) / f (X, y; θ∗ ) ] | Y = y }
               = ∫ log [ f (x, y; θ) / f (x, y; θ∗ ) ] f (x | y; θ∗ ) dx.

The M-step maximizes q(θ | θ∗ ) in θ for fixed θ∗ , i.e.
calculates

    θ∗∗ = argmaxθ q(θ | θ∗ ).

After an E-step and subsequent M-step, the likelihood
function never decreases.
The picture on the next overhead should show it all.
Expected and complete data likelihood

[Figure: ly (θ) − ly (θ∗ ) and q(θ | θ∗ ) − q(θ∗ | θ∗ ) plotted against θ; the two curves touch at θ = θ∗ with common tangent of slope ∇ly (θ∗ ), and the gap between them is KL(f_θ∗^y : f_θ^y ) ≥ 0.]

    ly (θ) − ly (θ∗ ) = q(θ | θ∗ ) + KL(f_θ∗^y : f_θ^y )

    ∇ly (θ∗ ) = ∂/∂θ ly (θ) |θ=θ∗ = ∂/∂θ q(θ | θ∗ ) |θ=θ∗ .
Kullback-Leibler divergence

The KL divergence between f and g is


    KL(f : g) = ∫ f (x) log [ f (x) / g(x) ] dx.
Also known as relative entropy of g with respect to f .
Since − log x is a convex function, Jensen’s inequality gives
KL(f : g) ≥ 0 and KL(f : g) = 0 if and only if f = g,
since
    KL(f : g) = ∫ f (x) log [ f (x) / g(x) ] dx ≥ − log ∫ f (x) [ g(x) / f (x) ] dx = 0,
so KL divergence defines an (asymmetric) distance measure
between probability distributions.
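
A small numerical illustration in Python (not from the slides), assuming discrete distributions with strictly positive probabilities:

import numpy as np

def kl_divergence(f, g):
    """KL(f : g) = sum_x f(x) log(f(x)/g(x)) for strictly positive discrete distributions."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    return float(np.sum(f * np.log(f / g)))

# Two invented distributions on three points.
f = [0.5, 0.3, 0.2]
g = [0.4, 0.4, 0.2]

print(kl_divergence(f, g))   # >= 0
print(kl_divergence(g, f))   # generally different: KL is asymmetric
print(kl_divergence(f, f))   # 0 when the distributions coincide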
Expected and marginal log-likelihood

Since f (x | y; θ) = f (x, y; θ)/f (y; θ) we have

    q(θ | θ∗ ) = ∫ log [ f (y; θ) f (x | y; θ) / { f (y; θ∗ ) f (x | y; θ∗ ) } ] f (x | y; θ∗ ) dx
               = log f (y; θ) − log f (y; θ∗ ) + ∫ log [ f (x | y; θ) / f (x | y; θ∗ ) ] f (x | y; θ∗ ) dx
               = ly (θ) − ly (θ∗ ) − KL(f_θ∗^y : f_θ^y ),

where f_θ^y denotes the conditional density f (· | y; θ).

Since the KL-divergence is minimized for θ = θ∗ ,


differentiation of the above expression yields
    ∂/∂θ q(θ | θ∗ ) |θ=θ∗ = ∂/∂θ ly (θ) |θ=θ∗ .
Let now θ0 = θ∗ and define the iteration

    θn+1 = argmaxθ q(θ | θn ).

Then

    ly (θn+1 ) = ly (θn ) + q(θn+1 | θn ) + KL(f_θn^y : f_θn+1^y )
               ≥ ly (θn ) + 0 + 0.

So the log-likelihood never decreases after a combined


E-step and M-step.
It follows that any limit point must be a saddle point or a
local maximum of the likelihood function.
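
A minimal sketch in Python of the generic iteration just described; e_step and m_step are problem-specific placeholders (the names are illustrative, not from the slides), and concrete versions of these computations appear in the mixture example below.

def em(y, theta0, e_step, m_step, n_iter=100):
    """Generic EM loop: theta_{n+1} = argmax_theta q(theta | theta_n).

    e_step(y, theta) should return whatever expected complete-data quantities
    q(. | theta) depends on; m_step(y, stats) should return the maximizer of q.
    """
    theta = theta0
    for _ in range(n_iter):
        stats = e_step(y, theta)   # E-step: expectations under the current theta
        theta = m_step(y, stats)   # M-step: maximize q(theta | current theta)
    return theta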
Mixtures

Consider a sample Y = (Y1 , . . . , Yn ) from individual


densities

f (y; α, µ) = {αφ(y − µ) + (1 − α)φ(y)}

where φ is the normal density


    φ(y) = (1/√(2π)) e^(−y²/2)

and α and µ are both unknown, 0 < α < 1.
This corresponds to a fraction α of the observations being
contaminated, or originating from a different population.
Incomplete observation

The likelihood function

    Ly (α, µ) = ∏i {αφ(yi − µ) + (1 − α)φ(yi )}

is quite unpleasant, although both Newton–Raphson and
the method of scoring can be used.
But suppose we knew which observations came from which
population?
In other words, let X = (X1 , . . . , Xn ) be i.i.d. with
P (Xi = 1) = α and suppose that the conditional
distribution of Yi given Xi = 1 was N (µ, 1) whereas given
Xi = 0 it was N (0, 1), i.e. that Xi was indicating whether
Yi was contaminated or not.
Then the marginal distribution of Y is precisely the mixture
distribution and the ‘complete data likelihood’ is
    Lx,y (α, µ) = ∏i α^xi φ(yi − µ)^xi (1 − α)^(1−xi ) φ(yi )^(1−xi )
                ∝ α^(Σ xi ) (1 − α)^(n − Σ xi ) ∏i φ(yi − µ)^xi

so taking logarithms we get (ignoring a constant) that


    lx,y (α, µ) = (Σ xi ) log α + (n − Σ xi ) log(1 − α) − Σi xi (yi − µ)²/2.

If we did not know how to maximize this explicitly,


differentiation easily leads to:
    α̂ = Σ xi / n,   µ̂ = Σ xi yi / Σ xi .

Thus, when complete data are available the frequency of


contaminated observations is estimated by the observed
frequency and the mean µ of these is estimated by the
average among the contaminated observations.
E-step and M-step

By taking expectations, we get the E-step as

    q(α, µ | α∗ , µ∗ ) = Eα∗ ,µ∗ {lX,y (α, µ) | Y = y}
                       = (Σ x∗i ) log α + (n − Σ x∗i ) log(1 − α) − Σi x∗i (yi − µ)²/2

where

x∗i = Eα∗ ,µ∗ (Xi | Yi = yi ) = Pα∗ ,µ∗ (Xi = 1 | Yi = yi ).

Since this has the same form as the complete data


likelihood, just with x∗i replacing xi , the M-step simply
becomes
    α∗∗ = Σ x∗i / n,   µ∗∗ = Σ x∗i yi / Σ x∗i ,

i.e. here the mean of the contaminated observations is


estimated by a weighted average of all the observations, the
weight being proportional to the probability that this
observation is contaminated. In effect, x∗i act as imputed
values of xi .
The imputed values x∗i needed in the E-step are calculated
as follows:

    x∗i = E(Xi | Yi = yi ) = P (Xi = 1 | Yi = yi )
        = α∗ φ(yi − µ∗ ) / { α∗ φ(yi − µ∗ ) + (1 − α∗ )φ(yi ) }.
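
The following Python sketch (not part of the original slides) implements these E- and M-steps for the contamination mixture; the simulated data, starting values and fixed iteration count are assumptions made for illustration.

import numpy as np

def phi(y):
    """Standard normal density."""
    return np.exp(-0.5 * y**2) / np.sqrt(2.0 * np.pi)

def em_mixture(y, alpha=0.5, mu=1.0, n_iter=200):
    """EM for the contamination mixture alpha*phi(y - mu) + (1 - alpha)*phi(y)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    for _ in range(n_iter):
        # E-step: x_i* = P(X_i = 1 | Y_i = y_i) under the current (alpha, mu)
        num = alpha * phi(y - mu)
        x_star = num / (num + (1.0 - alpha) * phi(y))
        # M-step: complete-data MLEs with x_i* in place of x_i
        alpha = x_star.sum() / n
        mu = (x_star * y).sum() / x_star.sum()
    return alpha, mu

# Simulated data with invented true values alpha = 0.3, mu = 2 (illustration only).
rng = np.random.default_rng(0)
x = rng.random(1000) < 0.3
y = rng.normal(loc=np.where(x, 2.0, 0.0), scale=1.0)
print(em_mixture(y))   # estimates should be roughly (0.3, 2.0)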
Incomplete two-way tables

As another example, let us consider a 2 × 2 table with
n1 = {n1_ij } complete observations of two binary variables I
and J, n2 = {n2_i+ } observations where only I was observed,
and n3 = {n3_+j } observations where only J was observed,
and let us assume that the mechanism of missingness can
be ignored.
The complete data log-likelihood is
    log L(p) = Σij (n1_ij + n2_ij + n3_ij ) log pij

and the E-step needs

    n∗_ij = n1_ij + n2∗_ij + n3∗_ij

where

    n2∗_ij = E(N2_ij | p, n2_i+ ) = p_(j|i) n2_i+

and

    n3∗_ij = E(N3_ij | p, n3_+j ) = p_(i|j) n3_+j .

We thus get
    n2∗_ij = pij / (p_i0 + p_i1 ) · n2_i+ ,    n3∗_ij = pij / (p_0j + p_1j ) · n3_+j .   (3)
The M-step now maximizes log L(p) = Σij n∗_ij log pij by
letting

    pij = (n1_ij + n2∗_ij + n3∗_ij )/n   (4)
where n is the total number of observations.
The EM algorithm alternates between (3) and (4) until
convergence.
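
A Python sketch (not from the slides) of this alternation between (3) and (4); the counts, starting values and fixed number of iterations are invented for illustration.

import numpy as np

def em_two_way(n1, n2_row, n3_col, n_iter=200):
    """EM for a 2x2 table of probabilities p[i, j] with partially classified counts.

    n1:     2x2 array of fully classified counts n1[i, j]
    n2_row: counts n2[i] with only I observed (row totals)
    n3_col: counts n3[j] with only J observed (column totals)
    """
    n1 = np.asarray(n1, dtype=float)
    n2_row = np.asarray(n2_row, dtype=float)
    n3_col = np.asarray(n3_col, dtype=float)
    n = n1.sum() + n2_row.sum() + n3_col.sum()

    p = np.full((2, 2), 0.25)  # arbitrary uniform starting value
    for _ in range(n_iter):
        # E-step, eq. (3): spread the partially classified counts over the cells
        n2_star = n2_row[:, None] * p / p.sum(axis=1, keepdims=True)
        n3_star = n3_col[None, :] * p / p.sum(axis=0, keepdims=True)
        # M-step, eq. (4): complete-data MLE from the completed table
        p = (n1 + n2_star + n3_star) / n
    return p

# Invented counts, for illustration only.
print(em_two_way(n1=[[20, 10], [15, 25]], n2_row=[8, 12], n3_col=[5, 9]))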
