
CS 189/289A Introduction to Machine Learning

Fall 2024 Jennifer Listgarten, Saeed Saremi DIS1

1 Maximizing Likelihood & Minimizing Cost


Maximum Likelihood Estimation (MLE) is a method for estimating the parameters of a statistical
model given observations.
Data Suppose we obtain n discrete observations belonging to B := {1, 2, 3, 4}. Our dataset looks
something like the following.

r_1 = 1
r_2 = 1
r_3 = 3
⋮
r_n = 1

Assumptions Suppose we aim to estimate the occurrence probabilities of each class in B based
on the observed data. We additionally assume that observations are independent and identically
distributed (i.i.d.). In particular, this assumption implies that the order of the data does not matter.
Model Based on these assumptions, a natural model for our data is the multinomial distribution. In a multinomial distribution, the order of the data does not matter, and we can equivalently represent our dataset as (y, c_y)_{y ∈ B}, where c_y is the number of items of class y.
The probability mass function (PMF) of the multinomial distribution, that is, the probability in n trials of obtaining each class i exactly x_i times, is

P(x_1, \ldots, x_k) = n! \prod_{i=1}^{k} \frac{p_i^{x_i}}{x_i!}.
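
For concreteness, here is a minimal sketch (in Python, with hypothetical observations and class probabilities that are not part of the worksheet) of converting raw observations into the count representation (y, c_y) and evaluating this PMF:

```python
from math import factorial, prod

# Hypothetical raw observations r_1, ..., r_n in B = {1, 2, 3, 4} (illustrative only).
r = [1, 1, 3, 2, 1, 4, 1, 3]
n = len(r)

# Equivalent count representation (y, c_y): number of items of each class y.
counts = {y: r.count(y) for y in [1, 2, 3, 4]}

# Assumed class probabilities p_y (the parameters of the multinomial model).
p = {1: 0.5, 2: 0.1, 3: 0.25, 4: 0.15}

# Multinomial PMF: n! * prod_y p_y^{c_y} / c_y!
pmf = factorial(n) * prod(p[y] ** counts[y] / factorial(counts[y]) for y in counts)
print(counts, pmf)
```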

(a) Derive an expression for the likelihood for this problem. What are the observations? What are
the parameters? What parameters are we trying to estimate with MLE?

(b) Typically, the log-likelihood ℓ(θ) = log L(θ) is used instead of L(θ). Write down the expression
for ℓ(θ). Why might this be a good idea?
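
As a rough illustration of one practical reason (not part of the problem statement): with many i.i.d. observations, the product of per-observation probabilities underflows in floating point, while the sum of their logs stays well-scaled. A small sketch, assuming hypothetical data and probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.1, 0.25, 0.15])            # assumed class probabilities
obs = rng.choice([1, 2, 3, 4], size=2000, p=p)  # hypothetical i.i.d. observations

per_obs = p[obs - 1]               # probability of each observation under the model
print(np.prod(per_obs))            # product of 2000 numbers < 1: underflows to 0.0
print(np.sum(np.log(per_obs)))     # log-likelihood: a finite, well-scaled number
```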

(c) Another idea might be to minimize the cross-entropy based on raw observations, corresponding
to the following program
argmin_{p ∈ R^4_+, ∥p∥_1 = 1}  −\sum_{i=1}^{n} \sum_{y ∈ B} δ_{r_i y} \log p_y

where p = [p_1 \; p_2 \; p_3 \; p_4]^⊤ is the vector of probabilities per class, and δ_{r_i y} is the Kronecker
delta that outputs 1 if r_i = y and 0 otherwise.
Show that this program is equivalent to the MLE program.
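
Before proving the equivalence, it can help to sanity-check it numerically. A minimal sketch, assuming hypothetical observations and an arbitrary candidate p (this is a numerical check, not the proof):

```python
import numpy as np

r = np.array([1, 1, 3, 2, 1, 4, 1, 3])   # hypothetical observations
p = np.array([0.5, 0.1, 0.25, 0.15])     # a candidate probability vector (sums to 1)

# Cross-entropy objective: -sum_i sum_y delta_{r_i y} log p_y
ce = -sum(np.log(p[r_i - 1]) for r_i in r)

# Negative log-likelihood of the same data, up to an additive constant (from the
# factorial terms) that does not depend on p and so does not affect the argmin.
counts = np.array([(r == y).sum() for y in [1, 2, 3, 4]])
nll_variable_part = -np.sum(counts * np.log(p))

print(ce, nll_variable_part)   # identical: both programs penalize p the same way
```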

2 Independence and Multivariate Gaussians
As described in lecture, the covariance matrix Σ ∈ R^{N×N} of a random variable X ∈ R^N has entries given by cov(X_i, X_j) = E[(X_i − µ_i)(X_j − µ_j)], the covariance between the i-th and j-th elements of the random vector X:

Σ = \begin{bmatrix} cov(X_1, X_1) & \cdots & cov(X_1, X_N) \\ \vdots & \ddots & \vdots \\ cov(X_N, X_1) & \cdots & cov(X_N, X_N) \end{bmatrix}.    (1)


Recall that the density of an N-dimensional Multivariate Gaussian Distribution N(µ, Σ) is defined as follows when Σ is positive definite:

f(x) = \frac{1}{\sqrt{(2π)^N |Σ|}} \exp\left( −\frac{1}{2} (x − µ)^⊤ Σ^{−1} (x − µ) \right).    (2)

Here, |Σ| denotes the determinant of the matrix Σ.
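
A minimal numerical sketch (with an illustrative µ and Σ that are not part of the worksheet) that evaluates equation (2) directly and compares it against scipy's implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

N = 2
mu = np.array([0.0, 1.0])                    # assumed mean (illustrative)
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # assumed positive-definite covariance
x = np.array([0.3, 0.7])

# Density from equation (2): (2*pi)^{-N/2} |Sigma|^{-1/2} exp(-1/2 (x-mu)^T Sigma^{-1} (x-mu))
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)
f = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** N * np.linalg.det(Sigma))

print(f, multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # should match
```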

(a) Consider the random variables X and Y in R with the following conditions.

(i) X and Y can take values in {−1, 0, 1}.


(ii) When X is 0, Y takes values 1 and −1 with equal probability (1/2). When Y is 0, X takes values 1 and −1 with equal probability (1/2).
(iii) Either X is 0 with probability (1/2), or Y is 0 with probability (1/2).

Are X and Y uncorrelated? Are X and Y independent? Prove your assertions. Hint: Write
down the joint probability of (X, Y) for each possible pair of values they can take.
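
If it helps to experiment before writing the proof, here is a hedged sketch that tabulates the joint probabilities implied by conditions (i)-(iii) and compares them with the products of the marginals (the argument itself still needs to be written out):

```python
from itertools import product

# Joint probabilities P(X = x, Y = y) implied by conditions (i)-(iii):
# X = 0 with prob 1/2 (then Y is ±1 equally), or Y = 0 with prob 1/2 (then X is ±1 equally).
joint = {(0, 1): 0.25, (0, -1): 0.25, (1, 0): 0.25, (-1, 0): 0.25}

E_X  = sum(prob * x     for (x, y), prob in joint.items())
E_Y  = sum(prob * y     for (x, y), prob in joint.items())
E_XY = sum(prob * x * y for (x, y), prob in joint.items())
cov = E_XY - E_X * E_Y

# Independence check: compare P(X=x, Y=y) with P(X=x) * P(Y=y) for every pair.
P_X = {x: sum(prob for (xx, _), prob in joint.items() if xx == x) for x in [-1, 0, 1]}
P_Y = {y: sum(prob for (_, yy), prob in joint.items() if yy == y) for y in [-1, 0, 1]}
for x, y in product([-1, 0, 1], repeat=2):
    print(x, y, joint.get((x, y), 0.0), P_X[x] * P_Y[y])
print("cov(X, Y) =", cov)
```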

(b) For X = [X_1, · · · , X_n]^⊤ ∼ N(µ, Σ), verify that if X_i, X_j are independent (for all i ≠ j), then Σ must be diagonal, i.e., X_i, X_j are uncorrelated.
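
As an empirical illustration only (not a proof), sampling independent coordinates with different scales and computing the sample covariance shows the off-diagonal entries shrinking toward zero; a minimal sketch with illustrative distributions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical independent coordinates with different scales (illustrative only).
X = np.column_stack([rng.normal(0, 1, 100_000),
                     rng.normal(2, 3, 100_000),
                     rng.uniform(-1, 1, 100_000)])

# Empirical covariance: off-diagonal entries are close to 0 for independent coordinates.
print(np.round(np.cov(X, rowvar=False), 3))
```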

(c) Let N = 2, µ = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, and Σ = \begin{bmatrix} α & β \\ β & γ \end{bmatrix}. Suppose X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} ∼ N(µ, Σ). Show that X_1, X_2 are independent if β = 0. Recall that two continuous random variables W, Y with joint density f_{W,Y} and marginal densities f_W, f_Y are independent if f_{W,Y}(w, y) = f_W(w) f_Y(y).

(d) Consider a data point x drawn from an N-dimensional zero-mean Multivariate Gaussian distribution N(0, Σ), as shown above. Assume that Σ^{−1} exists. Prove that there exists a matrix A ∈ R^{N×N} such that x^⊤ Σ^{−1} x = ∥Ax∥_2^2 for all vectors x. What is the matrix A?
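
If you want to check a candidate A numerically, here is a hedged sketch with an illustrative Σ; it uses a Cholesky-based candidate, which may or may not be the A you derive:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])   # assumed positive-definite covariance
x = rng.normal(size=2)

# One candidate: A = L^{-1}, where Sigma = L L^T is the Cholesky factorization,
# so that A^T A = Sigma^{-1}. Whether this is the A you derive is left to the proof.
L = np.linalg.cholesky(Sigma)
A = np.linalg.inv(L)

print(x @ np.linalg.solve(Sigma, x))   # x^T Sigma^{-1} x
print(np.sum((A @ x) ** 2))            # ||A x||_2^2  (should match)
```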

3 Least Squares (using vector calculus)

In ordinary least-squares linear regression, we typically have n > d so that there is no w such that
Xw = y (these are typically overdetermined systems — too many equations given the number of
unknowns). Hence, we need to find an approximate solution to this problem. The residual vector
will be r = Xw − y and we want to make it as small as possible. The most common case is to measure the residual error with the standard Euclidean ℓ_2-norm. So the problem becomes:

min_w ∥Xw − y∥_2^2,

where X ∈ R^{n×d}, w ∈ R^d, y ∈ R^n.
Assume that X is full rank.
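
As a concrete illustration (with hypothetical data, and np.linalg.lstsq used only as a reference solver), the sketch below sets up an overdetermined system and minimizes the residual numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3                    # n > d: overdetermined (illustrative sizes)
X = rng.normal(size=(n, d))     # hypothetical full-rank design matrix
y = rng.normal(size=n)          # hypothetical targets

# Generically there is no w with Xw = y exactly, so we minimize ||Xw - y||_2^2.
w_hat, residual_ss, rank, _ = np.linalg.lstsq(X, y, rcond=None)

print("rank(X) =", rank)                               # d, since X is full rank
print("||Xw - y||_2^2 =", np.sum((X @ w_hat - y)**2))  # the minimized residual
```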

(a) How do we know that X⊤ X is invertible?

(b) Derive using vector calculus an expression for an optimal estimate for w for this problem.

(c) What should we do if X is not full rank?

DIS1, ©UCB CS 189/289A, Fall 2024. All Rights Reserved. This may not be publicly shared without explicit permission.
