(b) (2 pts) Derive the maximum likelihood estimate for θ, i.e., the value of θ that maximizes
the function of part (a). (Hint: the likelihood function is monotonic, so the maximizing
solution lies at an extreme of the feasible range; there is no need to take the derivative in this case.)
2. Weighted linear regression. [10pt] In class, when discussing linear regression, we assumed that the
Gaussian noise is i.i.d. (independent and identically distributed). In practice, we may have extra
information regarding the fidelity of each data point. For example, we may know that some examples
have higher noise variance than others. To model this, we can treat the noise variables ε₁, ε₂, · · · , εₙ
as distinct Gaussians, i.e., εᵢ ∼ N(0, σᵢ²) with known variance σᵢ². How will this influence our linear
regression model? Let's work it out.
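For reference, the linear model itself is assumed from lecture rather than restated in the problem; the setup presumably being worked with is:

$$y_i = w^\top x_i + \epsilon_i,\quad \epsilon_i \sim \mathcal{N}(0, \sigma_i^2) \;\Longrightarrow\; p(y_i \mid x_i; w) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\!\Big(-\frac{(y_i - w^\top x_i)^2}{2\sigma_i^2}\Big)$$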
(a) (3 pts) Write down the log-likelihood function of w under this new modeling assumption.
(c) (3 pts) Take the gradient of the loss function J(w) and provide the batch gradient descent update
rule for optimizing w.
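Part (b) is not reproduced above, so the following sketch assumes the usual weighted least-squares loss J(w) = ½ Σᵢ aᵢ(wᵀxᵢ − yᵢ)² with aᵢ = 1/σᵢ²; under that assumption, a minimal batch gradient descent loop in MATLAB might look like this (the data, step size, and iteration count are all illustrative):

% Hedged sketch: batch gradient descent for weighted linear regression,
% assuming J(w) = 0.5 * sum_i a_i * (w'*x_i - y_i)^2 with a_i = 1/sigma_i^2.
n = 100; d = 3;
X = randn(n, d);                  % illustrative design matrix (rows = examples)
sigma2 = 0.1 + rand(n, 1);        % known per-example noise variances
w_true = [1; -2; 0.5];            % illustrative ground-truth weights
y = X * w_true + sqrt(sigma2) .* randn(n, 1);
a = 1 ./ sigma2;                  % per-example weights (assumed form)
A = diag(a);                      % diagonal weight matrix, A(i,i) = a_i
w = zeros(d, 1);                  % initial weights
eta = 1e-3;                       % illustrative step size
for t = 1:5000
    grad = X' * A * (X * w - y);  % gradient of J(w) in matrix form
    w = w - eta * grad;           % batch gradient descent update rule
end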
(d) (3 pts) Derive a closed-form solution to this optimization problem. Hint: begin by rewriting the
objective in matrix form using a diagonal matrix A with A(i, i) = aᵢ.
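As a consistency check on the hint (still under the weighted least-squares assumption used in the sketch above), the matrix form and its stationary point would be:

$$J(w) = \tfrac{1}{2}(Xw - y)^\top A (Xw - y), \qquad \nabla_w J = X^\top A (Xw - y) = 0 \;\Longrightarrow\; w^\ast = (X^\top A X)^{-1} X^\top A y,$$

provided XᵀAX is invertible. In MATLAB this is computed more stably as w = (X' * A * X) \ (X' * A * y), reusing the variables from the sketch above.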
Here we will go through some questions to help you figure out how to use the probability and misclassification costs to make predictions.
(a) (2 pts) You received an email for which the spam filter predicts that it is spam with p = 0.8. We
want to make the decision that minimizes the expected cost.
Question: Should you classify this particular email as spam or non-spam? [Hint: Compare the
expected cost of classifying the email as spam versus non-spam, and choose the classification that
results in the lower expected cost.]
(b) (2 pts) The MAP decision rule would classify an email as spam if p > 0.5, but this rule does not
minimize expected cost in this case. We need a new rule that compares p to a different threshold
θ, whose value should be chosen to minimize the expected cost based on the costs in the table.
Question: What is the value of θ that works for the costs specified in Table 1? [Hint: To find
the threshold θ, set up the decision rule by comparing the expected cost of each decision, as you
did in (a), then solve for p in terms of the costs.]
(c) (2 pts) Now, imagine that the optimal decision rule would use θ = 1/5 as the threshold for
classifying an email as spam. Question: Can you provide a new cost table where this would be
the case? [Hint: Use the relationship between the costs and θ that you derived in part (b). Based
on this relationship, adjust the misclassification costs in the table to achieve θ = 1/5.]
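Since Table 1 is not reproduced here, the sketch below uses hypothetical placeholder costs c_fp (classifying non-spam as spam) and c_fn (classifying spam as non-spam), with correct decisions costing 0; under those assumptions the expected-cost comparison in (a) and the threshold in (b) work out as follows:

% Hedged sketch with placeholder costs (Table 1 is not shown here):
% c_fp = cost of classifying a non-spam email as spam (false positive),
% c_fn = cost of classifying a spam email as non-spam (false negative).
c_fp = 1;  c_fn = 4;               % placeholders chosen so theta = 1/5
p = 0.8;                           % filter's predicted probability of spam
cost_as_spam    = (1 - p) * c_fp;  % expected cost of the "spam" decision
cost_as_nonspam = p * c_fn;        % expected cost of the "non-spam" decision
theta = c_fp / (c_fp + c_fn);      % classify as spam whenever p > theta
fprintf('theta = %.2f, classify as spam: %d\n', theta, p > theta);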
4. Maximum A-Posteriori Estimation. [8pt] Suppose we observe the values of n IID random vari-
ables X1 , . . . , Xn drawn from a single Bernoulli distribution with parameter θ. In other words, for
each Xi , we know that P (Xi = 1) = θ and P (Xi = 0) = 1 − θ. In the Bayesian framework, we
treat θ as a random variable, and use a prior probability distribution over θ to express our prior
knowledge/preference about θ. In this framework, X1 , . . . , Xn can be viewed as generated by:
• First, the value of θ is drawn from a given prior probability distribution
• Second, X1 , . . . , Xn are drawn independently from a Bernoulli distribution with this θ value.
In this setting, Maximum A-Posteriori (MAP) estimation is a way to estimate θ by finding the value
that maximizes the posterior probability, given both its prior distribution and the observed data. The
MAP estimate of θ is given by:

$$\hat\theta_{\mathrm{MAP}} = \arg\max_{\hat\theta}\; L(\hat\theta)\, p(\hat\theta)$$

where L(θ̂) is the likelihood function of the data given θ, and p(θ̂) is the prior distribution over θ.
Now consider using a beta distribution as the prior: θ ∼ Beta(α, β), whose PDF is

$$p(\hat\theta) = \frac{\hat\theta^{\alpha-1}(1-\hat\theta)^{\beta-1}}{B(\alpha, \beta)}$$
(a) (3 pts) Derive the posterior distribution p(θ̂ | X₁, . . . , Xₙ, α, β). Comparing the form of the posterior
with that of the beta distribution, you will see that the posterior is also a beta distribution.
What are the updated α and β parameters for the posterior?
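For reference, a sketch of the standard conjugacy computation (writing k = Σᵢ Xᵢ for the number of ones observed; this is the usual argument, not a substitute for the full derivation asked for above):

$$p(\hat\theta \mid X_1, \dots, X_n, \alpha, \beta) \;\propto\; \hat\theta^{k}(1-\hat\theta)^{n-k} \cdot \hat\theta^{\alpha-1}(1-\hat\theta)^{\beta-1} \;=\; \hat\theta^{(\alpha+k)-1}(1-\hat\theta)^{(\beta+n-k)-1},$$

i.e., the posterior is Beta(α + k, β + n − k).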
(b) (2 pts) Suppose we use Beta(2, 2) as the prior. What Beta distribution do we get for the posterior
after we observe 5 coin tosses, 2 of which are heads? What is the posterior distribution of θ after
we observe 50 coin tosses, 20 of which are heads? (You don't need to write out the distributions;
simply providing the α and β parameters would suffice.)
(c) (1 pt) Plot the PDF of the prior Beta(2, 2) and the two posterior distributions. You can
use any software (e.g., R, Python, MATLAB) for this plot.
(d) (2 pts) Assume that θ = 0.4 is the true probability. How would the shape of the posterior change
as more and more coin tosses from this coin are observed? Will the MAP estimate converge
toward the true value?
5. Perceptron. [3pt] Assume a data set consists of only a single data point {(x, +1)}. How many times
would the Perceptron algorithm misclassify this point x before convergence? What if the initial weight
vector w₀ were initialized randomly rather than as the all-zero vector? Derive the number of mistakes
as a function of w₀ and x.
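One way to sanity-check the derivation is to simulate the standard Perceptron update w ← w + x, applied whenever wᵀx ≤ 0 (the label is +1), on the single point; the data point and initialization below are illustrative:

% Hedged sketch: count Perceptron mistakes on the single point (x, +1),
% using the standard update w <- w + x whenever w'*x <= 0 (label is +1).
x = [2; -1];             % illustrative data point
w = randn(size(x));      % random initial weight vector w0
mistakes = 0;
while w' * x <= 0        % the point is currently misclassified
    w = w + x;           % Perceptron update for a positive example
    mistakes = mistakes + 1;
end
fprintf('Number of mistakes: %d\n', mistakes);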
6. Bonus: MLE for multi-class logistic regression. [6 pts] Consider the maximum likelihood
estimation problem for multi-class logistic regression using the soft-max function defined below:
$$p(y = k \mid x) = \frac{\exp(w_k^\top x)}{\sum_{j=1}^{K} \exp(w_j^\top x)}$$
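As a small numerical companion (not part of the required derivation), the soft-max probabilities can be computed with the usual max-subtraction trick for numerical stability; the weights and input below are illustrative:

% Hedged sketch: soft-max class probabilities p(y = k | x).
% W is d-by-K with the weight vectors w_1, ..., w_K as columns; x is d-by-1.
W = [1 0 -1; 0 2 1];        % illustrative weights (d = 2, K = 3)
x = [0.5; -1];              % illustrative input
s = W' * x;                 % scores w_k' * x, one per class
s = s - max(s);             % subtract the max before exponentiating (stability)
p = exp(s) / sum(exp(s));   % p(y = k | x) for k = 1, ..., K; sums to 1
disp(p');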
Code.m (the plotting script for question 4(c)):
clear;
close all;
clc;
% Vector of x values
x = 0:0.001:1;
% Prior: Beta(2, 2)
prior = betapdf(x, 2, 2);
% Posterior after 5 tosses with 2 heads: Beta(2+2, 2+3) = Beta(4, 5)
after1 = betapdf(x, 4, 5);
% Posterior after 50 tosses with 20 heads: Beta(2+20, 2+30) = Beta(22, 32)
after2 = betapdf(x, 22, 32);
% Plot
figure;
plot(x, prior, 'DisplayName', 'Prior Beta(2,2)', 'LineWidth', 1.5);
hold on;
plot(x, after1, 'DisplayName', 'Posterior Beta(4,5)', 'LineWidth', 1.5);
plot(x, after2, 'DisplayName', 'Posterior Beta(22,32)', 'LineWidth', 1.5);
xlabel('Theta');
ylabel('Density');
legend('show');
grid on;
hold off;
[Figure: PDFs of the prior Beta(2,2) and the posteriors Beta(4,5) and Beta(22,32); x-axis: Theta, y-axis: Density.]