CS464 Ch3 Estimation
Estimation
• Density Estimation
– Maximum Likelihood Estimator (MLE)
– Maximum A Posteriori Estimate (MAP)
• Where do we get these probability estimates?
Density Estimation
• We assume that the variable of interest is sampled
from a distribution
Thumbtack - Bernoulli Trial
• Flipping a thumbtack is a Bernoulli trial with two outcomes, heads or tails, with P(heads) = θ
• Flips produce a data set D of heads/tails outcomes
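A minimal Python sketch of generating such a data set (not from the slides; the helper name and the value θ = 0.3 are illustrative assumptions):

```python
import random

def flip_thumbtack(theta, n, seed=0):
    """Simulate n Bernoulli flips with P(heads) = theta; returns 'H'/'T' outcomes."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return ["H" if rng.random() < theta else "T" for _ in range(n)]

# The data set D from the slide: a sequence of flip outcomes
D = flip_thumbtack(theta=0.3, n=10)
```

Estimation then asks: given only D, what is our best guess for θ?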
Learning Parameters For a Gaussian
• Assume we have i.i.d. data — exam scores:

  i    xi (exam score)
  0    80
  1    70
  2    12
  3    99
  …

• Learn the parameters
  – The mean, µ
  – The standard deviation, σ
Learning a Gaussian Distribution

p(x | µ, σ) = (1/(σ√(2π))) · exp(−(x − µ)² / (2σ²))
MLE for the Mean

Setting the derivative of the log-likelihood with respect to µ to zero gives:

µ_MLE = (1/N) · ∑ᵢ xᵢ
MLE for the Variance

Setting the derivative of the log-likelihood with respect to σ² to zero gives:

σ²_MLE = (1/N) · ∑ᵢ (xᵢ − µ_MLE)²
MLE of Gaussian Parameters
The MLE for the variance of a Gaussian is biased: the expected value of the
estimator is not equal to the true parameter. An unbiased variance estimator:

σ²_unbiased = (1/(N − 1)) · ∑ᵢ (xᵢ − µ_MLE)²
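These estimators can be checked numerically. A short sketch using the exam scores from the table above (pure Python, no libraries):

```python
from math import fsum

scores = [80, 70, 12, 99]  # the exam scores xi from the table
N = len(scores)

# MLE for the mean: sample average
mu_mle = fsum(scores) / N

# MLE for the variance: average squared deviation (divides by N, biased)
var_mle = fsum((x - mu_mle) ** 2 for x in scores) / N

# Unbiased variance estimator: divides by N - 1 instead
var_unbiased = fsum((x - mu_mle) ** 2 for x in scores) / (N - 1)
```

With only four data points the two variance estimates differ noticeably; as N grows, the factor N/(N − 1) approaches 1 and the bias vanishes.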
What if we have prior beliefs?
• Billionaire says wait, I think the thumbtack is close
to 50-50. How can you use this information?
MLE
Maximum A Posteriori (MAP) Approximation
MAP estimation
• Our prior could be in the form of a probability
distribution
• Priors can have different forms
• Uninformative prior:
– Uniform distribution
• Conjugate prior:
– Prior and the posterior have the same form
Posterior Distribution
Beta Distribution
𝑝(𝜃) = (1/𝐵(𝛼, 𝛽)) · 𝜃^(𝛼−1) · (1 − 𝜃)^(𝛽−1),   0 ≤ 𝜃 ≤ 1,   𝛼, 𝛽 > 0

where 𝐵(𝛼, 𝛽) is the Beta function.
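The density above can be evaluated directly, using the identity B(α, β) = Γ(α)Γ(β)/Γ(α + β) (a sketch, not from the slides):

```python
from math import gamma

def beta_pdf(theta, a, b):
    """Beta(a, b) density: theta**(a-1) * (1-theta)**(b-1) / B(a, b)."""
    B = gamma(a) * gamma(b) / gamma(a + b)  # Beta function via Gamma
    return theta ** (a - 1) * (1 - theta) ** (b - 1) / B
```

For example, Beta(1, 1) is the uniform (uninformative) prior, while Beta(2, 2) is symmetric around θ = 0.5 — a way to encode a "close to 50-50" belief.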
Posterior Distribution
Prior: 𝑝(𝜃) = (1/𝐵(𝛼, 𝛽)) · 𝜃^(𝛼−1) · (1 − 𝜃)^(𝛽−1)

Flip it N times, and k times it was heads. Since the Beta prior is conjugate to the Bernoulli likelihood, the posterior is again a Beta distribution, Beta(𝛼 + k, 𝛽 + N − k); its mode is the MAP estimate.

Example: N = 3, k = 1, 𝛼 = 𝛽 = 2

𝜃_MLE = k/N = 1/3

𝜃_MAP = (k + 𝛼 − 1)/(N + 𝛼 + 𝛽 − 2) = 2/5

In general:

𝜃_MLE = k/N

𝜃_MAP = (k + 𝛼 − 1)/(N + 𝛼 + 𝛽 − 2)
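The two estimators above can be written as one-liners and compared on the slide's example (a sketch; the function names are illustrative):

```python
def theta_mle(k, n):
    """MLE for a Bernoulli parameter: fraction of heads."""
    return k / n

def theta_map(k, n, alpha, beta):
    """MAP estimate: mode of the Beta(alpha + k, beta + n - k) posterior."""
    return (k + alpha - 1) / (n + alpha + beta - 2)

# Slide example: N = 3 flips, k = 1 head, Beta(2, 2) prior
mle = theta_mle(1, 3)          # 1/3
map_ = theta_map(1, 3, 2, 2)   # 2/5
```

Note that with the uniform prior (𝛼 = 𝛽 = 1) the MAP estimate reduces to the MLE, and the Beta(2, 2) prior pulls the estimate from 1/3 toward 1/2 — exactly the billionaire's "close to 50-50" belief.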
Bayesian Estimation
• For the parameters to estimate, we assign an a priori
distribution, which captures our prior belief about
the parameter