
CS464

Introduction to Machine Learning

Estimation

(slides based on the slides provided by Öznur Taştan and Mehmet Koyutürk)
Motivation
• In machine learning, we are trying to figure out the relationship between
  variables (features and outcomes)
  – For this purpose, we use a model (an assumption on the structure of this
    relationship)
  – A probability distribution usually serves as a good model
  – How do we use observations to learn this distribution?

• Density Estimation
  – Maximum Likelihood Estimator (MLE)
  – Maximum A Posteriori (MAP) Estimate

• Where do we get these probability estimates?
Density Estimation
• We assume that the variable of interest is sampled from a distribution

• We have some observations on the variable

• How do we use observations to learn the distribution?
Density Estimation
• A billionaire asks you a question:

• He says: I have a thumbtack; if I flip it, what's the probability it will
  fall with the nail up (heads)?

• You say: Please flip it a few times…

Data
• The billionaire flips the thumbtack 5 times: 3 times it lands with the nail
  up (heads), 2 times tails

• You say the probability that it falls with the nail up is 3/5

• Why frequency of heads?

• How good is this estimate?
• Why is this a machine learning problem?
Why frequency of heads?
• Frequency of heads is exactly the maximum likelihood estimator for this problem
Thumbtack - Bernoulli Trial

D = {x_1, x_2, …, x_N}, where each outcome x_i is heads or tails

• Flips produce a data set D

• Flips are independent, identically distributed, and each is a Bernoulli
  trial with P(heads) = θ and P(tails) = 1 − θ

• Maximum Likelihood Estimator (MLE):
  Choose the θ that maximizes the probability of the observed data
Estimation vs. Learning

• Density estimation is a learning problem too:
  – Data: Observed set of flips with α_H heads and α_T tails
  – Model: Bernoulli distribution
  – Learning: Finding θ, which is an optimization problem

• Once we estimate θ, we can predict the probability of the next flip being
  a head
  – We can do more than that too: for example, predict the number of heads in
    the next 100 flips (a small sketch follows below)
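
A minimal sketch of these two predictions in Python, assuming the counts of
the running example (3 heads, 2 tails) and the MLE derived on the next slides:

```python
import numpy as np

# Observed counts (the running thumbtack example: 3 heads, 2 tails)
alpha_H, alpha_T = 3, 2
theta_hat = alpha_H / (alpha_H + alpha_T)  # MLE, derived on the next slides

# Prediction 1: probability that the next flip lands heads (nail up)
print(f"P(next flip is heads) = {theta_hat:.2f}")

# Prediction 2: heads in the next 100 flips. Under the model the count is
# Binomial(100, theta_hat), so its expected value is 100 * theta_hat.
print(f"Expected heads in next 100 flips: {100 * theta_hat:.0f}")
print(f"One simulated run: {np.random.binomial(n=100, p=theta_hat)}")
```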
Maximum Likelihood Estimation

MLE: Choose the θ that maximizes the probability of the observed data
(the likelihood of the data)

The likelihood of observing this data is the joint probability (by the
independence assumption):

P(D | θ) = θ^(α_H) · (1 − θ)^(α_T)

Maximum likelihood estimate of θ:

θ̂_MLE = argmax_θ P(D | θ)
Your First Parameter Learning Algorithm

• Why do we take the log?
  – Joint probabilities are often in the form of multiplications and
    exponents (this comes from the independence assumption)
  – Log transforms multiplication into addition
  – The resulting equations are easier to manage

The log-likelihood:

log P(D | θ) = α_H log θ + α_T log(1 − θ)

• Take the derivative and set it to 0:

d/dθ [α_H log θ + α_T log(1 − θ)] = α_H/θ − α_T/(1 − θ) = 0

Solving for θ gives the frequency of heads:

θ̂_MLE = α_H / (α_H + α_T)
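
A small sketch that checks this closed form against a brute-force grid search
over the log-likelihood, using the running example's counts (only NumPy is
assumed):

```python
import numpy as np

alpha_H, alpha_T = 3, 2  # heads and tails counts from the running example

# Closed-form MLE derived above: frequency of heads
theta_mle = alpha_H / (alpha_H + alpha_T)

# Brute-force check: evaluate the log-likelihood on a dense grid of theta
thetas = np.linspace(1e-6, 1 - 1e-6, 100_000)
log_lik = alpha_H * np.log(thetas) + alpha_T * np.log(1 - thetas)
theta_grid = thetas[np.argmax(log_lik)]

print(f"Closed-form MLE:    {theta_mle:.4f}")   # 0.6000
print(f"Grid-search argmax: {theta_grid:.4f}")  # ~0.6000
```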
How Many Flips Do I Need?

• Your answer to the billionaire: 3/5

• He says: "While you have been calculating, I flipped 50 times, and 30 times
  it was heads." He asks: what is your answer now?

• You say: 30 / 50 = 3/5

• He says: Did I waste my time flipping more?

• You say: No! On the contrary, the more data the merrier

• This is why…
A Bound (from Hoeffding's Inequality)

For N i.i.d. flips, Hoeffding's inequality bounds how far the MLE can be from
the true parameter θ*:

P(|θ̂_MLE − θ*| ≥ ε) ≤ 2e^(−2Nε²)

Probably Approximately Correct (PAC)

To be within ε of θ* with probability at least 1 − δ, it suffices that
2e^(−2Nε²) ≤ δ, i.e.,

N ≥ ln(2/δ) / (2ε²)
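
A quick sketch of the sample-size rule this bound implies; the tolerance ε
and confidence level below are illustrative choices:

```python
import math

def flips_needed(eps: float, delta: float) -> int:
    """Smallest N such that 2 * exp(-2 * N * eps**2) <= delta,
    i.e. P(|theta_hat - theta*| >= eps) <= delta by Hoeffding."""
    return math.ceil(math.log(2 / delta) / (2 * eps ** 2))

# Example: be within 0.05 of the true theta with probability at least 0.95
print(flips_needed(eps=0.05, delta=0.05))  # 738 flips
```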
What if we have a continuous variable?

• What if we are measuring a continuous variable?

[Figure: a continuous probability density p(x)]
Learning Parameters for a Gaussian

• Assume we have i.i.d. data

    i   Exam score x_i
    0   80
    1   70
    2   12
    3   99

• Learn the parameters
  – The mean, µ
  – The standard deviation, σ
Learning a Gaussian Distribution

The likelihood of the i.i.d. data under a Gaussian N(µ, σ²):

P(D | µ, σ) = ∏_{i=1}^{N} (1 / (σ√(2π))) · e^(−(x_i − µ)² / (2σ²))

MLE for the Mean

Taking the derivative of the log-likelihood with respect to µ and setting it
to 0 gives the sample mean:

µ̂_MLE = (1/N) ∑_{i=1}^{N} x_i

MLE for the Variance

Taking the derivative with respect to σ and setting it to 0 gives:

σ̂²_MLE = (1/N) ∑_{i=1}^{N} (x_i − µ̂_MLE)²
MLE of Gaussian Parameters

The MLE for the variance of a Gaussian is biased: the expected value of the
estimator is not equal to the true parameter. An unbiased variance estimator
divides by N − 1 instead of N:

σ̂²_unbiased = (1/(N − 1)) ∑_{i=1}^{N} (x_i − µ̂)²
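
A minimal sketch on the exam-score data above. NumPy's ddof argument switches
between the biased MLE (divide by N) and the unbiased estimator (divide by
N − 1):

```python
import numpy as np

scores = np.array([80, 70, 12, 99])  # exam scores from the table above

mu_mle = scores.mean()               # MLE of the mean: the sample mean
var_mle = scores.var(ddof=0)         # MLE of the variance: divides by N
var_unbiased = scores.var(ddof=1)    # unbiased estimator: divides by N - 1

print(f"mu_MLE       = {mu_mle:.2f}")        # 65.25
print(f"var_MLE      = {var_mle:.2f}")       # biased (too small on average)
print(f"var_unbiased = {var_unbiased:.2f}")  # larger by a factor N/(N-1)
```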
What if we have prior beliefs?

• Billionaire says: wait, I think the thumbtack is close to 50-50. How can
  you use this information?

• You say: I can learn it the Bayesian way.

Bayes Rule

• What if we have prior beliefs? We can utilize prior information through
  Bayes rule:

P(θ | D) = P(D | θ) · P(θ) / P(D)

– P(θ | D): posterior
– P(D | θ): likelihood
– P(θ): prior
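
A tiny numerical sketch of Bayes rule for the thumbtack: the posterior over a
grid of θ values is the likelihood times the prior, renormalized (the
normalization plays the role of P(D)). The counts and the uniform prior are
illustrative:

```python
import numpy as np

thetas = np.linspace(0.001, 0.999, 999)        # grid over the parameter theta
prior = np.full_like(thetas, 1 / len(thetas))  # uniform (uninformative) prior

k, n = 3, 5                                    # illustrative: 3 heads in 5 flips
likelihood = thetas ** k * (1 - thetas) ** (n - k)  # P(D | theta)

posterior = likelihood * prior
posterior /= posterior.sum()                   # dividing by (a proxy for) P(D)

print(f"Posterior mode: {thetas[np.argmax(posterior)]:.3f}")  # ~0.6 = k/n
```

With a uniform prior the posterior mode coincides with the MLE, which is why
the prior only matters when it is informative or the data is scarce.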
Maximum A Posteriori (MAP) Estimation

MLE: θ̂_MLE = argmax_θ P(D | θ)

MAP: θ̂_MAP = argmax_θ P(θ | D) = argmax_θ P(D | θ) · P(θ)

(P(D) does not depend on θ, so it can be dropped from the maximization.)
MAP Estimation

• Our prior could be in the form of a probability distribution

• Priors can have different forms
  – Uninformative prior: uniform distribution
  – Conjugate prior: the prior and the posterior have the same form
Posterior Distribution

Beta Distribution:

p(θ) = (1 / B(α, β)) · θ^(α−1) · (1 − θ)^(β−1),    0 ≤ θ ≤ 1,  α, β > 0
Posterior Distribution

Flip it N times, and k times it was heads. With the Beta prior

p(θ) = (1 / B(α, β)) · θ^(α−1) · (1 − θ)^(β−1)

the two estimates are:

θ̂_MLE = k / N

θ̂_MAP = (k + α − 1) / (N + α + β − 2)

Example: N = 3, k = 1, α = β = 2

θ̂_MLE = k / N = 1/3

θ̂_MAP = (k + α − 1) / (N + α + β − 2) = 2/5
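
A small sketch reproducing the slide's numbers; the MAP formula is the mode
of the Beta(α + k, β + N − k) posterior:

```python
def theta_mle(k: int, n: int) -> float:
    """MLE: the observed frequency of heads."""
    return k / n

def theta_map(k: int, n: int, alpha: float, beta: float) -> float:
    """MAP under a Beta(alpha, beta) prior: the mode of the posterior
    Beta(alpha + k, beta + n - k)."""
    return (k + alpha - 1) / (n + alpha + beta - 2)

# The slide's example: N = 3 flips, k = 1 head, Beta(2, 2) prior
print(theta_mle(1, 3))        # 0.333... (= 1/3)
print(theta_map(1, 3, 2, 2))  # 0.4      (= 2/5, pulled toward the 50-50 prior)
```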
Bayesian Estimation

• For the parameters to estimate, we assign an a priori distribution, which
  is used to capture our prior belief about the parameter

• When the data is sparse, this allows us to fall back on the prior and avoid
  the issues faced by Maximum Likelihood Estimation (example: univariate
  Gaussian)

• When the data is abundant, the likelihood will dominate the prior, and the
  prior will not have much of an effect on the posterior distribution (a
  small sketch follows below)
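
A minimal sketch of this effect for the thumbtack model, assuming a
Beta(2, 2) prior and an (illustrative) true heads probability of 0.3: as N
grows, the MAP estimate converges to the MLE and the prior washes out:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
alpha, beta, true_theta = 2.0, 2.0, 0.3  # illustrative prior and ground truth

for n in (3, 30, 300, 3000):
    k = rng.binomial(n, true_theta)                   # heads in n flips
    mle = k / n
    map_est = (k + alpha - 1) / (n + alpha + beta - 2)
    print(f"N = {n:4d}   MLE = {mle:.3f}   MAP = {map_est:.3f}")
# Small N: the prior pulls MAP toward 0.5. Large N: MAP ~ MLE.
```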
Estimating Parameters
