
Learning with Maximum Likelihood

Andrew W. Moore
Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~awm
awm@cs.cmu.edu
412-268-7599

Note to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials: http://www.cs.cmu.edu/~awm/tutorials . Comments and corrections gratefully received.

Copyright © 2001, 2004, Andrew W. Moore                                   Sep 6th, 2001


Maximum Likelihood learning of Gaussians for Data Mining
• Why we should care
• Learning Univariate Gaussians
• Learning Multivariate Gaussians
• What's a biased estimator?
• Bayesian Learning of Gaussians


Why we should care
• Maximum Likelihood Estimation is a very very very very fundamental part of data analysis.
• "MLE for Gaussians" is training wheels for our future techniques
• Learning Gaussians is more useful than you might guess…


Learning Gaussians from Data
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(μ, σ²)
• But you don't know μ (you do know σ²)  [Sneer]

MLE: For which μ is x1, x2, … xR most likely?

MAP: Which μ maximizes p(μ | x1, x2, … xR, σ²)?

Despite this, we'll spend 95% of our time on MLE. Why? Wait and see…


MLE for univariate Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(μ, σ²)
• But you don't know μ (you do know σ²)
• MLE: For which μ is x1, x2, … xR most likely?

\mu^{\text{mle}} = \arg\max_{\mu} p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2)


Algebra Euphoria

\mu^{\text{mle}} = \arg\max_{\mu} p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2)

    = ?   (by i.i.d.)

    = ?   (monotonicity of log)

    = ?   (plug in formula for Gaussian)

    = ?   (after simplification)


Algebra Euphoria

\mu^{\text{mle}} = \arg\max_{\mu} p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2)

    = \arg\max_{\mu} \prod_{i=1}^{R} p(x_i \mid \mu, \sigma^2)                      (by i.i.d.)

    = \arg\max_{\mu} \sum_{i=1}^{R} \log p(x_i \mid \mu, \sigma^2)                  (monotonicity of log)

    = \arg\max_{\mu} \left( -\frac{1}{2\sigma^2} \sum_{i=1}^{R} (x_i - \mu)^2 \right)   (plug in formula for Gaussian)

    = \arg\min_{\mu} \sum_{i=1}^{R} (x_i - \mu)^2                                   (after simplification)


Intermission: A General Scalar MLE strategy

Task: Find MLE θ assuming known form for p(Data | θ, stuff)
1. Write LL = log P(Data | θ, stuff)
2. Work out ∂LL/∂θ using high-school calculus
3. Set ∂LL/∂θ = 0 for a maximum, creating an equation in terms of θ
4. Solve it*
5. Check that you've found a maximum rather than a minimum or saddle-point, and be careful if θ is constrained

*This is a perfect example of something that works perfectly in all textbook examples and usually involves surprising pain if you need it for something new.

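The recipe is mechanical enough to hand to a computer-algebra system. Below is a minimal sketch (my addition, not from the slides) that applies steps 1–4 to the known-σ² Gaussian case from the previous slides using sympy; the dataset and symbol names are made up for illustration.

```python
import sympy as sp

# Step 0: a tiny made-up dataset and the unknown parameter
data = [2.0, 3.5, 4.1, 2.9]
mu = sp.Symbol('mu')
sigma2 = 1.0  # assumed known, as on the previous slides

# Step 1: LL = log P(Data | mu, sigma2), a sum of i.i.d. Gaussian log-densities
LL = sum(sp.log(1 / sp.sqrt(2 * sp.pi * sigma2))
         - (x - mu) ** 2 / (2 * sigma2) for x in data)

# Step 2: work out dLL/dmu
dLL = sp.diff(LL, mu)

# Steps 3-4: set dLL/dmu = 0 and solve for mu
mu_mle = sp.solve(sp.Eq(dLL, 0), mu)[0]

# Step 5 (sanity check): second derivative is negative, so this is a maximum
assert sp.diff(LL, mu, 2).subs(mu, mu_mle) < 0

print(mu_mle)                  # equals the sample mean
print(sum(data) / len(data))
```
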
The MLE μ

\mu^{\text{mle}} = \arg\max_{\mu} p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2)

             = \arg\min_{\mu} \sum_{i=1}^{R} (x_i - \mu)^2

             = \mu \text{ such that } 0 = \frac{\partial}{\partial\mu} \sum_{i=1}^{R} (x_i - \mu)^2

             = ?   (what?)


The MLE μ

\mu^{\text{mle}} = \arg\max_{\mu} p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2)

             = \arg\min_{\mu} \sum_{i=1}^{R} (x_i - \mu)^2

             = \mu \text{ such that } 0 = \frac{\partial}{\partial\mu} \sum_{i=1}^{R} (x_i - \mu)^2
                                        = \sum_{i=1}^{R} -2(x_i - \mu)

Thus  \mu = \frac{1}{R} \sum_{i=1}^{R} x_i

Lawks-a-lawdy!

\mu^{\text{mle}} = \frac{1}{R} \sum_{i=1}^{R} x_i

• The best estimate of the mean of a distribution is the mean of the sample!

At first sight:
This kind of pedantic, algebra-filled and ultimately unsurprising fact is exactly the reason people throw down their "Statistics" book and pick up their "Agent Based Evolutionary Data Mining Using The Neuro-Fuzz Transform" book.
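As a quick sanity check (my addition, not part of the slides), here is a minimal numpy sketch showing that the arg-min of the sum of squared deviations really is the sample mean; the data values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100)    # made-up sample, true mu = 5

mu_mle = x.mean()                                # the MLE derived on the slide

# Brute-force check: the sum of squared deviations is minimized at the sample mean
candidates = np.linspace(x.min(), x.max(), 2001)
sse = ((x[None, :] - candidates[:, None]) ** 2).sum(axis=1)
print(mu_mle, candidates[np.argmin(sse)])        # agree up to the grid spacing
```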


A General MLE strategy

Suppose θ = (θ1, θ2, …, θn)^T is a vector of parameters.
Task: Find MLE θ assuming known form for p(Data | θ, stuff)

1. Write LL = log P(Data | θ, stuff)
2. Work out ∂LL/∂θ using high-school calculus:

\frac{\partial LL}{\partial \boldsymbol{\theta}} =
\begin{pmatrix} \dfrac{\partial LL}{\partial \theta_1} \\ \dfrac{\partial LL}{\partial \theta_2} \\ \vdots \\ \dfrac{\partial LL}{\partial \theta_n} \end{pmatrix}

3. Solve the set of simultaneous equations

\frac{\partial LL}{\partial \theta_1} = 0, \quad
\frac{\partial LL}{\partial \theta_2} = 0, \quad \ldots, \quad
\frac{\partial LL}{\partial \theta_n} = 0

4. Check that you're at a maximum

If you can't solve them, what should you do? (One option is sketched below.)

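When the simultaneous equations have no closed-form solution, a standard fallback is numerical optimization of the log-likelihood. The following sketch (my addition, not from the slides) maximizes the Gaussian log-likelihood with scipy by minimizing its negative; the data and starting point are made up.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=1.5, size=200)    # made-up data

def neg_log_likelihood(theta):
    """theta = (mu, log_sigma); optimizing log(sigma) keeps sigma positive."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    return -np.sum(-np.log(sigma * np.sqrt(2 * np.pi))
                   - (x - mu) ** 2 / (2 * sigma ** 2))

result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# Compare against the closed-form answers from the upcoming slides
print(mu_hat, x.mean())
print(sigma_hat, x.std())      # np.std uses the 1/R (MLE) form by default
```
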
MLE for univariate Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(μ, σ²)
• But you don't know μ or σ²
• MLE: For which θ = (μ, σ²) is x1, x2, … xR most likely?

\log p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2)
    = -R\left(\log\sqrt{2\pi} + \log\sigma\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{R} (x_i - \mu)^2

\frac{\partial LL}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{R} (x_i - \mu)

\frac{\partial LL}{\partial \sigma^2} = -\frac{R}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{R} (x_i - \mu)^2


MLE for univariate Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(μ, σ²)
• But you don't know μ or σ²
• MLE: For which θ = (μ, σ²) is x1, x2, … xR most likely?

\log p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2)
    = -R\left(\log\sqrt{2\pi} + \log\sigma\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{R} (x_i - \mu)^2

Setting both partial derivatives to zero:

0 = \frac{1}{\sigma^2} \sum_{i=1}^{R} (x_i - \mu)

0 = -\frac{R}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{R} (x_i - \mu)^2


MLE for univariate Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(μ, σ²)
• But you don't know μ or σ²
• MLE: For which θ = (μ, σ²) is x1, x2, … xR most likely?

\log p(x_1, x_2, \ldots, x_R \mid \mu, \sigma^2)
    = -R\left(\log\sqrt{2\pi} + \log\sigma\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{R} (x_i - \mu)^2

0 = \frac{1}{\sigma^2} \sum_{i=1}^{R} (x_i - \mu)  \;\Rightarrow\;  \mu = \frac{1}{R}\sum_{i=1}^{R} x_i

0 = -\frac{R}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{R} (x_i - \mu)^2  \;\Rightarrow\;  \sigma^2 = \text{what?}


MLE for univariate Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(μ, σ²)
• But you don't know μ or σ²
• MLE: For which θ = (μ, σ²) is x1, x2, … xR most likely?

\mu^{\text{mle}} = \frac{1}{R} \sum_{i=1}^{R} x_i

\sigma^2_{\text{mle}} = \frac{1}{R} \sum_{i=1}^{R} (x_i - \mu^{\text{mle}})^2
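A minimal numpy sketch of these two formulas (my addition; the dataset is made up):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=3.0, size=500)   # made-up data

R = len(x)
mu_mle = x.sum() / R                            # (1/R) * sum of x_i
sigma2_mle = ((x - mu_mle) ** 2).sum() / R      # (1/R) * sum of squared deviations

# numpy's built-ins give the same answers (np.var defaults to the 1/R form)
assert np.isclose(mu_mle, x.mean())
assert np.isclose(sigma2_mle, x.var())
print(mu_mle, sigma2_mle)
```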


Unbiased Estimators
• An estimator of a parameter is unbiased if the expected value of the estimate is the same as the true value of the parameter.
• If x1, x2, … xR ~ (i.i.d.) N(μ, σ²) then

E\left[\mu^{\text{mle}}\right] = E\left[\frac{1}{R}\sum_{i=1}^{R} x_i\right] = \mu

μ^mle is unbiased


Biased Estimators
• An estimator of a parameter is biased if the expected value of the estimate is different from the true value of the parameter.
• If x1, x2, … xR ~ (i.i.d.) N(μ, σ²) then

E\left[\sigma^2_{\text{mle}}\right] = E\left[\frac{1}{R}\sum_{i=1}^{R}(x_i - \mu^{\text{mle}})^2\right]
    = E\left[\frac{1}{R}\sum_{i=1}^{R}\left(x_i - \frac{1}{R}\sum_{j=1}^{R} x_j\right)^2\right] \ne \sigma^2

σ²_mle is biased


MLE Variance Bias
• If x1, x2, … xR ~ (i.i.d.) N(μ, σ²) then

E\left[\sigma^2_{\text{mle}}\right] = E\left[\frac{1}{R}\sum_{i=1}^{R}\left(x_i - \frac{1}{R}\sum_{j=1}^{R} x_j\right)^2\right]
    = \left(1 - \frac{1}{R}\right)\sigma^2 \ne \sigma^2

Intuition check: consider the case of R = 1.
Why should our guts expect that σ²_mle would be an underestimate of true σ²?
How could you prove that?


Unbiased estimate of Variance
• If x1, x2, … xR ~ (i.i.d.) N(μ, σ²) then

E\left[\sigma^2_{\text{mle}}\right] = E\left[\frac{1}{R}\sum_{i=1}^{R}\left(x_i - \frac{1}{R}\sum_{j=1}^{R} x_j\right)^2\right]
    = \left(1 - \frac{1}{R}\right)\sigma^2

So define  \sigma^2_{\text{unbiased}} = \frac{\sigma^2_{\text{mle}}}{1 - \frac{1}{R}}   so that   E\left[\sigma^2_{\text{unbiased}}\right] = \sigma^2

\sigma^2_{\text{unbiased}} = \frac{1}{R-1}\sum_{i=1}^{R}(x_i - \mu^{\text{mle}})^2
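To see the (1 − 1/R) shrinkage in practice, here is a small Monte Carlo sketch (my addition, not from the slides) comparing the two variance estimators over many repeated small samples; all numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(3)
true_sigma2 = 4.0
R = 5                       # deliberately small so the bias is visible
trials = 200_000

samples = rng.normal(0.0, np.sqrt(true_sigma2), size=(trials, R))
mu_mle = samples.mean(axis=1, keepdims=True)

sigma2_mle = ((samples - mu_mle) ** 2).sum(axis=1) / R              # 1/R version
sigma2_unbiased = ((samples - mu_mle) ** 2).sum(axis=1) / (R - 1)   # 1/(R-1) version

print(sigma2_mle.mean())        # close to (1 - 1/R) * 4.0 = 3.2
print(sigma2_unbiased.mean())   # close to 4.0
```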


Unbiaseditude discussion
• Which is best?

\sigma^2_{\text{mle}} = \frac{1}{R}\sum_{i=1}^{R}(x_i - \mu^{\text{mle}})^2

\sigma^2_{\text{unbiased}} = \frac{1}{R-1}\sum_{i=1}^{R}(x_i - \mu^{\text{mle}})^2

Answer:
• It depends on the task
• And it doesn't make much difference once R → large


Don’t get too excited about being
unbiased
• Assume x1, x2, … xR ~(i.i.d) N(,2)
• Suppose we had these estimators for the mean
R
1
 suboptimal 
R7 R
x
i 1
i

Are either of these unbiased?


 crap  x1 Will either of them asymptote to the
correct value as R gets large?
Which is more useful?

Copyright © 2001, 2004, Andrew W. Moore Maximum Likelihood: Slide 28
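If you want to check your answers to these questions empirically, here is a small simulation sketch (my addition, using the two estimators as written above, with made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(4)
true_mu, sigma = 10.0, 2.0
trials = 5_000

for R in (10, 100, 1000):
    samples = rng.normal(true_mu, sigma, size=(trials, R))
    mu_suboptimal = samples.sum(axis=1) / (R + 7)   # divides by R+7 instead of R
    mu_crap = samples[:, 0]                          # ignores all but the first datapoint
    # Print the average value (reveals bias) and the spread (reveals usefulness)
    print(R,
          round(mu_suboptimal.mean(), 3), round(mu_suboptimal.std(), 3),
          round(mu_crap.mean(), 3), round(mu_crap.std(), 3))
```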


MLE for m-dimensional Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(μ, Σ)
• But you don't know μ or Σ
• MLE: For which θ = (μ, Σ) is x1, x2, … xR most likely?

\boldsymbol{\mu}^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} \mathbf{x}_k

\boldsymbol{\Sigma}^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} (\mathbf{x}_k - \boldsymbol{\mu}^{\text{mle}})(\mathbf{x}_k - \boldsymbol{\mu}^{\text{mle}})^T


MLE for m-dimensional Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(μ, Σ)
• But you don't know μ or Σ
• MLE: For which θ = (μ, Σ) is x1, x2, … xR most likely?

\boldsymbol{\mu}^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} \mathbf{x}_k
\qquad\qquad
\mu_i^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} x_{ki}

where 1 ≤ i ≤ m, x_ki is the value of the ith component of x_k (the ith attribute of the kth record), and μ_i^mle is the ith component of μ^mle.

\boldsymbol{\Sigma}^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} (\mathbf{x}_k - \boldsymbol{\mu}^{\text{mle}})(\mathbf{x}_k - \boldsymbol{\mu}^{\text{mle}})^T

MLE for m-dimensional Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(μ, Σ)
• But you don't know μ or Σ
• MLE: For which θ = (μ, Σ) is x1, x2, … xR most likely?

\boldsymbol{\mu}^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} \mathbf{x}_k

\boldsymbol{\Sigma}^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} (\mathbf{x}_k - \boldsymbol{\mu}^{\text{mle}})(\mathbf{x}_k - \boldsymbol{\mu}^{\text{mle}})^T

\sigma_{ij}^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} (x_{ki} - \mu_i^{\text{mle}})(x_{kj} - \mu_j^{\text{mle}})

where 1 ≤ i ≤ m, 1 ≤ j ≤ m, x_ki is the value of the ith component of x_k (the ith attribute of the kth record), and σ_ij^mle is the (i,j)th component of Σ^mle.

MLE for m-dimensional Gaussian
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(μ, Σ)
• But you don't know μ or Σ
• MLE: For which θ = (μ, Σ) is x1, x2, … xR most likely?

\boldsymbol{\mu}^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} \mathbf{x}_k

\boldsymbol{\Sigma}^{\text{mle}} = \frac{1}{R}\sum_{k=1}^{R} (\mathbf{x}_k - \boldsymbol{\mu}^{\text{mle}})(\mathbf{x}_k - \boldsymbol{\mu}^{\text{mle}})^T

\boldsymbol{\Sigma}^{\text{unbiased}} = \frac{\boldsymbol{\Sigma}^{\text{mle}}}{1 - \frac{1}{R}} = \frac{1}{R-1}\sum_{k=1}^{R} (\mathbf{x}_k - \boldsymbol{\mu}^{\text{mle}})(\mathbf{x}_k - \boldsymbol{\mu}^{\text{mle}})^T

Q: How would you prove this?
A: Just plug through the MLE recipe.

Note how Σ^mle is forced to be symmetric non-negative definite.
Note the unbiased case.
How many datapoints would you need before the Gaussian has a chance of being non-degenerate?

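A short numpy sketch of the m-dimensional formulas above (my addition; the toy data are made up):

```python
import numpy as np

rng = np.random.default_rng(5)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[2.0, 0.6],
                       [0.6, 1.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=300)   # R=300 records, m=2

R = X.shape[0]
mu_mle = X.sum(axis=0) / R

centered = X - mu_mle
Sigma_mle = (centered.T @ centered) / R            # sum of outer products, divided by R
Sigma_unbiased = (centered.T @ centered) / (R - 1)

# np.cov uses the 1/(R-1) form by default (rowvar=False treats rows as records)
assert np.allclose(Sigma_unbiased, np.cov(X, rowvar=False))
print(mu_mle)
print(Sigma_mle)
```
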
Confidence intervals
We need to talk

We need to discuss how accurate we expect μ^mle and σ^mle to be as a function of R.
And we need to consider how to estimate these accuracies from data…
• Analytically *
• Non-parametrically (using randomization and bootstrapping) *
But we won't. Not yet.

* Will be discussed in future Andrew lectures…just before we need this technology.


Structural error

Actually, we need to talk about something else too…
What if we do all this analysis when the true distribution is in fact not Gaussian?
How can we tell? *
How can we survive? *

* Will be discussed in future Andrew lectures…just before we need this technology.


Gaussian MLE in action

Using R = 392 cars from the "MPG" UCI dataset supplied by Ross Quinlan.


Data-starved Gaussian MLE

Using three subsets of MPG. Each subset has 6 randomly-chosen cars.


Bivariate MLE in action



Multivariate MLE

Covariance matrices are not exciting to look at



Being Bayesian: MAP estimates for Gaussians
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(μ, Σ)
• But you don't know μ or Σ
• MAP: Which (μ, Σ) maximizes p(μ, Σ | x1, x2, … xR)?

Step 1: Put a prior on (μ, Σ)

Step 1a: Put a prior on Σ:

\boldsymbol{\Sigma} \sim \mathrm{IW}\!\left(\nu_0,\; (\nu_0 - m - 1)\,\boldsymbol{\Sigma}_0\right)

This thing is called the Inverse-Wishart distribution. A PDF over SPD (symmetric positive-definite) matrices!
  • Σ₀: (roughly) my best guess of Σ
  • ν₀ small: "I am not sure about my guess of Σ₀"
  • ν₀ large: "I'm pretty sure about my guess of Σ₀"

Step 1b: Put a prior on μ | Σ:

\boldsymbol{\mu} \mid \boldsymbol{\Sigma} \sim N\!\left(\boldsymbol{\mu}_0,\; \boldsymbol{\Sigma}/\kappa_0\right)

Together, "Σ" and "μ | Σ" define a joint distribution on (μ, Σ).
  • μ₀: my best guess of μ;  E[μ] = μ₀
  • κ₀ small: "I am not sure about my guess of μ₀"
  • κ₀ large: "I'm pretty sure about my guess of μ₀"

Notice how we are forced to express our ignorance of μ proportionally to Σ.


Being Bayesian: MAP estimates for Gaussians

Why do we use this form of prior?
• Actually, we don't have to.
• But it is computationally and algebraically convenient…
• …it's a conjugate prior.


Being Bayesian: MAP estimates for Gaussians
• Suppose you have x1, x2, … xR ~ (i.i.d.) N(μ, Σ)
• MAP: Which (μ, Σ) maximizes p(μ, Σ | x1, x2, … xR)?

Step 1: Prior:
\boldsymbol{\Sigma} \sim \mathrm{IW}\!\left(\nu_0,\; (\nu_0 - m - 1)\,\boldsymbol{\Sigma}_0\right), \qquad
\boldsymbol{\mu} \mid \boldsymbol{\Sigma} \sim N\!\left(\boldsymbol{\mu}_0,\; \boldsymbol{\Sigma}/\kappa_0\right)

Step 2:
\bar{\mathbf{x}} = \frac{1}{R}\sum_{k=1}^{R} \mathbf{x}_k

\boldsymbol{\mu}_R = \frac{\kappa_0 \boldsymbol{\mu}_0 + R\,\bar{\mathbf{x}}}{\kappa_0 + R}, \qquad
\kappa_R = \kappa_0 + R, \qquad
\nu_R = \nu_0 + R

(\nu_R - m - 1)\,\boldsymbol{\Sigma}_R = (\nu_0 - m - 1)\,\boldsymbol{\Sigma}_0
    + \sum_{k=1}^{R} (\mathbf{x}_k - \bar{\mathbf{x}})(\mathbf{x}_k - \bar{\mathbf{x}})^T
    + \frac{(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)^T}{1/\kappa_0 + 1/R}

Step 3: Posterior:
\boldsymbol{\Sigma} \sim \mathrm{IW}\!\left(\nu_R,\; (\nu_R - m - 1)\,\boldsymbol{\Sigma}_R\right), \qquad
\boldsymbol{\mu} \mid \boldsymbol{\Sigma} \sim N\!\left(\boldsymbol{\mu}_R,\; \boldsymbol{\Sigma}/\kappa_R\right)

Result:  μ^map = μ_R,   E[Σ | x1, x2, … xR] = Σ_R

• Look carefully at what these formulae are doing. It's all very sensible.
• Conjugate priors mean the prior form and posterior form are the same and characterized by "sufficient statistics" of the data.
• The marginal distribution on μ is a student-t.
• One point of view: it's pretty academic if R > 30.

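Here is a minimal numpy sketch of the Step 2 update above (my addition, not from the slides). It assumes the (ν₀, κ₀, μ₀, Σ₀) parametrization written on this slide; all prior values and data are made up.

```python
import numpy as np

def niw_map_update(X, mu0, kappa0, nu0, Sigma0):
    """Step 2 of the slide: combine the prior (mu0, kappa0, nu0, Sigma0) with data X."""
    R, m = X.shape
    xbar = X.mean(axis=0)

    mu_R = (kappa0 * mu0 + R * xbar) / (kappa0 + R)
    kappa_R = kappa0 + R
    nu_R = nu0 + R

    centered = X - xbar
    scatter = centered.T @ centered                     # sum_k (x_k - xbar)(x_k - xbar)^T
    d = (xbar - mu0).reshape(-1, 1)
    Sigma_R = ((nu0 - m - 1) * Sigma0 + scatter
               + (d @ d.T) / (1.0 / kappa0 + 1.0 / R)) / (nu_R - m - 1)
    return mu_R, kappa_R, nu_R, Sigma_R

rng = np.random.default_rng(6)
X = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=20)   # made-up data

# Made-up prior: "I guess the mean is near (1, 1) and the covariance is near I"
mu_R, kappa_R, nu_R, Sigma_R = niw_map_update(
    X, mu0=np.array([1.0, 1.0]), kappa0=2.0, nu0=6.0, Sigma0=np.eye(2))

print(mu_R)      # mu_map
print(Sigma_R)   # E[Sigma | data]
```
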
Where we're at

                        Inputs:   Categorical          Real-valued      Mixed Real / Cat
                                  inputs only          inputs only      okay

Classifier    (predict category)  Joint BC, Naïve BC                    Dec Tree

Density       (probability)       Joint DE, Naïve DE   Gauss DE
Estimator

Regressor     (predict real no.)


What you should know
• The Recipe for MLE
• Why do we sometimes prefer MLE to MAP?
• Understand MLE estimation of Gaussian parameters
• Understand "biased estimator" versus "unbiased estimator"
• Appreciate the outline behind Bayesian estimation of Gaussian parameters


Useful exercise
• We'd already done some MLE in this class without even telling you!
• Suppose categorical arity-n inputs x1, x2, … xR ~ (i.i.d.) from a multinomial
  M(p1, p2, … pn)
  where P(xk = j | p) = pj
• What is the MLE p = (p1, p2, … pn)?
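For reference, a tiny numpy sketch of the estimator this exercise is after (my addition). It assumes the answer is the vector of empirical frequencies, which you can verify with the MLE recipe plus a Lagrange multiplier for the constraint that the pj sum to 1.

```python
import numpy as np

def multinomial_mle(x, n):
    """MLE of (p_1, ..., p_n) from categorical observations x_k in {1, ..., n}:
    p_j = (# of records with x_k = j) / R, i.e. the empirical frequencies."""
    x = np.asarray(x)
    counts = np.array([(x == j).sum() for j in range(1, n + 1)])
    return counts / len(x)

# Made-up arity-3 data
x = [1, 3, 3, 2, 3, 1, 3, 2, 3, 3]
print(multinomial_mle(x, n=3))   # -> [0.2, 0.2, 0.6]
```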
