04 Lecturenote MLE MAP Discriminative
Application
Let’s consider our yield prediction problem from the last lecture. This can be cast as a classical discriminative supervised learning problem: predict the production of bushels of corn per acre on a farm as a function of the proportion of that farm’s planting area that was treated with a new pesticide, by modeling p(y ∣ x) in a way that incorporates a reasonable model of the noise in the observed data (https://www.developer.com/mgmt/real-world-machine-learning-model-evaluation-and-optimization.html).
In addition to the point estimate of the yield for a given amount of treated area, it will be very informative for the farmer to know the expected deviation from this point estimate. In other words, we would like to provide the standard deviation as an estimate of uncertainty (see also http://www.corncapitalinnovations.com/production/300-bushel-corn/).
1 Introduction
1.1 Predictive Distribution
In discriminative supervised machine learning our goal is to model the posterior predictive distribution:
p(y ∣ D, x) = ∫_θ p(y, θ ∣ D, x) dθ
            = ∫_θ p(y ∣ D, x, θ) p(θ ∣ D) dθ        (1)
This makes sense, since we really want to incorporate all possible models parameterized by their respective model parameters θ, weighted by each parameter setting’s probability (i.e. the posterior probability over parameters); cf. fcml 3.8.6.
Unfortunately, the above integral is generally intractable in closed form, and sampling techniques, such as Monte Carlo approximations, are used to approximate the distribution. So, oftentimes we will not actually use this distribution for predictions but instead estimate the model parameters via mle or map and then plug those into our model p(y ∣ x, θ̂) for predictions. We will meet the posterior predictive distribution again when discussing Gaussian processes later in the course.
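For concreteness, a Monte Carlo approximation of Eq. (1) replaces the integral with an average over S posterior samples (S is just an illustrative sample count):

p(y ∣ D, x) ≈ (1/S) ∑_{s=1}^S p(y ∣ x, θ^(s)),   θ^(s) ∼ p(θ ∣ D),

whereas the plug-in approach described above uses the single point estimate θ̂, i.e. p(y ∣ D, x) ≈ p(y ∣ x, θ̂).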
Our goal is to estimate w directly from D = {(xᵢ, yᵢ)}_{i=1}^n using the joint conditional likelihood p(y ∣ X, w).
Lemma 1.1. Maximizing the (data) likelihood p(D ∣ w) = p(y, X ∣ w) is equivalent to maximizing the (joint)
conditional likelihood p(y ∣ X, w).
Notation Reminder: X = [x₁, ..., xₙ] ∈ R^{d×n} where xᵢ ∈ R^d;  y = [y₁, ..., yₙ]⊺ ∈ Rⁿ.
Exercise 1.1. Prove Lemma 1.1. hint: use assumption (1).
Maximum-a-posteriori Estimation
Bayesian Way: Model w as a random variable with prior p(w) and use the posterior p(w ∣ D). Choose w to maximize the posterior over parameters p(w ∣ X, y).
yᵢ ∣ xᵢ, w ∼ N(w⊺xᵢ, σ²)   ⇒   p(yᵢ ∣ xᵢ, w) = 1/√(2πσ²) · e^{−(w⊺xᵢ − yᵢ)²/(2σ²)}        (4)
MLE
Use Eq.(2):
ŵ_MLE = arg max_w ∑_{i=1}^n log p(yᵢ ∣ xᵢ, w)
      = arg max_w ∑_{i=1}^n [ log(1/√(2πσ²)) + log(e^{−(w⊺xᵢ − yᵢ)²/(2σ²)}) ]
      = arg max_w ∑_{i=1}^n −(w⊺xᵢ − yᵢ)²
      = arg min_w (1/n) ∑_{i=1}^n (w⊺xᵢ − yᵢ)²        (5)

The last expression is the OLS/squared-loss objective.
The loss ℓ(w) = (1/n) ∑_{i=1}^n (w⊺xᵢ − yᵢ)² is thus the squared loss, also known as Ordinary Least Squares (OLS). OLS can be optimized with gradient descent, Newton’s method, or in closed form, as sketched below.
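As an illustration, here is a minimal numpy sketch of the closed-form solution via the normal equations and of plain gradient descent on the squared loss, using the notation above (X ∈ R^{d×n}, y ∈ Rⁿ); the synthetic data, step size, and iteration count are illustrative choices only.

import numpy as np

# synthetic data for illustration: X is d x n, y has length n (notation as above)
rng = np.random.default_rng(0)
d, n = 3, 100
w_true = rng.normal(size=d)
X = rng.normal(size=(d, n))
y = X.T @ w_true + 0.1 * rng.normal(size=n)   # additive Gaussian noise

# closed-form OLS: solve the normal equations (X X^T) w = X y
w_ols = np.linalg.solve(X @ X.T, X @ y)

# alternative: gradient descent on l(w) = (1/n) * sum_i (w^T x_i - y_i)^2
w_gd = np.zeros(d)
eta = 0.1                                     # step size (illustrative)
for _ in range(500):
    grad = (2.0 / n) * X @ (X.T @ w_gd - y)   # gradient of the squared loss
    w_gd -= eta * grad

In practice one would typically call np.linalg.lstsq(X.T, y, rcond=None) rather than forming X X⊺ explicitly; both solve the same least-squares problem.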
MAP
Additional Model Assumption (prior distribution):

w ∼ N(0, σ_p² I),   p(w) = (2πσ_p²)^{−d/2} e^{−w⊺w/(2σ_p²)}
Convince yourself that this prior is conjugate to our likelihood.
Now, use Eq.(3):
ŵ_MAP = arg max_w ∑_{i=1}^n log p(yᵢ ∣ xᵢ, w) + log p(w)
      = arg min_w (1/(2σ²)) ∑_{i=1}^n (w⊺xᵢ − yᵢ)² + (1/(2σ_p²)) w⊺w
      = arg min_w (1/n) ∑_{i=1}^n (w⊺xᵢ − yᵢ)² + λ∣∣w∣∣₂²        (6)

The first term is again the squared loss; the second term is the ℓ2-regularization.
This formulation is known as ridge regression; we have derived it before in a frequentist setting using structural risk minimization (srm). In the last step the constants are absorbed into λ = σ²/(nσ_p²), a hyperparameter controlling the amount of regularization. It can be chosen via cross-validation, as sketched below.
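A minimal sketch, assuming X ∈ R^{d×n} and y ∈ Rⁿ as above, of the closed-form ridge solution and of a simple hold-out search for λ; the function names, candidate grid, and split fraction are illustrative.

import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X X^T + n*lam*I)^(-1) X y."""
    d, n = X.shape
    return np.linalg.solve(X @ X.T + n * lam * np.eye(d), X @ y)

def holdout_lambda(X, y, lams, frac=0.8):
    """Pick lambda by validation error on a single hold-out split (illustrative)."""
    d, n = X.shape
    n_tr = int(frac * n)
    X_tr, y_tr = X[:, :n_tr], y[:n_tr]
    X_va, y_va = X[:, n_tr:], y[n_tr:]
    errs = []
    for lam in lams:
        w = ridge_fit(X_tr, y_tr, lam)
        errs.append(np.mean((X_va.T @ w - y_va) ** 2))
    return lams[int(np.argmin(errs))]

For example, lam = holdout_lambda(X, y, [1e-3, 1e-2, 1e-1, 1.0]) picks the best value from a small grid; a proper k-fold cross-validation would average the validation error over several splits instead of using a single hold-out set.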
2.3 Summary
• mle solution is equivalent to ordinary least squares regression.
• map solution is equivalent to regularized ols using an l2 regularizer.
• We could use a different noise model such as the full Gaussian N(µ, Σ), multiplicative noise, or non-stationary noise (e.g. heteroscedastic noise) to make this model more expressive.
We need to estimate the parameter w. To find the values of the parameter at the minimum, we can try to find solutions of ∇_w ∑_{i=1}^n log(1 + e^{−yᵢ(w⊺xᵢ)}) = 0. This equation has no closed-form solution, so we will use Gradient Descent on the negative log likelihood nll(w) = ∑_{i=1}^n log(1 + e^{−yᵢ(w⊺xᵢ)}), as sketched below.
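A minimal gradient-descent sketch for this nll, using the gradient ∇nll(w) = −∑_{i=1}^n yᵢ xᵢ σ(−yᵢ w⊺xᵢ) with the sigmoid σ(z) = 1/(1 + e^{−z}); X ∈ R^{d×n}, labels yᵢ ∈ {−1, +1}, and the step size and iteration count are illustrative and may need tuning.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_nll_gd(X, y, eta=0.01, n_iters=2000):
    """Gradient Descent on nll(w) = sum_i log(1 + exp(-y_i * w^T x_i)).
    X: d x n design matrix, y: labels in {-1, +1}."""
    d, n = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        margins = y * (X.T @ w)                # y_i * w^T x_i for all i
        grad = -X @ (y * sigmoid(-margins))    # gradient of the nll
        w -= eta * grad
    return w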
MAP
In the MAP estimate we treat w as a random variable and can specify a prior belief distribution over it.
Additional Model Assumption:
w ∼ N(0, σ² I),   p(w) = (2πσ²)^{−d/2} e^{−w⊺w/(2σ²)}
Then the MAP estimator is given by
ŵ_MAP = arg min_w ∑_{i=1}^n log(1 + e^{−yᵢ(w⊺xᵢ)}) + λ∣∣w∣∣₂²        (9)

where the objective on the right is the negative log posterior (nlp).
Once again, this function has no closed-form solution, but we can use Gradient Descent on the negative log posterior nlp(w) = ∑_{i=1}^n log(1 + e^{−yᵢ(w⊺xᵢ)}) + λ∣∣w∣∣₂² to find the optimal parameter, as sketched below. Note again that we derived this before via srm using the log-loss and ℓ2-regularization (frequentist approach).
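Compared with the sketch for the nll above, only the gradient changes: it picks up the regularization term 2λw. A minimal sketch, reusing the sigmoid helper from the earlier block (λ is the hyperparameter of Eq. (9)):

def logistic_nlp_grad(w, X, y, lam):
    """Gradient of nlp(w) = sum_i log(1 + exp(-y_i * w^T x_i)) + lam * ||w||_2^2."""
    margins = y * (X.T @ w)
    return -X @ (y * sigmoid(-margins)) + 2.0 * lam * w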
Exercise 3.2. Derive Eq.(9), the negative log-posterior for logistic regression.
• Derive an algorithm for sampling from the posterior and use this as an approximation.
We will not cover this approach in this course. For further reference see FCML 4.4 and 4.5.
3.3 Summary
Logistic regression is easy to
• fit (estimate w directly from D, linear in dn)
• interpret as log odds: log [ p(y = 1 ∣ x) / p(y = −1 ∣ x) ] = w⊺x
• extend to multi-class classification: p(y = c ∣ x, w) = e^{w_c⊺ x} / ∑_{c′} e^{w_{c′}⊺ x} (see the sketch below)
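A small numpy sketch of these multi-class (softmax) probabilities, where W is assumed to collect the class weight vectors w_c as its columns (a notational choice for the sketch, not from the lecture):

import numpy as np

def softmax_probs(W, x):
    """p(y = c | x, W) = exp(w_c^T x) / sum_c' exp(w_c'^T x).
    W: d x C matrix whose columns are the class weight vectors w_c."""
    scores = W.T @ x                # w_c^T x for every class c
    scores = scores - scores.max()  # subtract the max for numerical stability
    exps = np.exp(scores)
    return exps / exps.sum()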
Exercise 3.3. One benefit of LR is that it is easy to interpret. This can be seen by looking at the log odds

log [ p(y = 1 ∣ x, w) / p(y = −1 ∣ x, w) ].

Show that

log [ p(y = 1 ∣ x, w) / p(y = −1 ∣ x, w) ] = w⊺x.
Our Application