Output 25
(b) Provide examples of how optimization techniques are applied in the training of models such
as linear regression and logistic regression.
Answer : In linear regression, the goal is to find the parameters of the model that minimize the
error between the predicted and actual target values. The model is typically represented as:
\[ y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_p x_p + \epsilon \]
Where:
• y is the target variable.
• x1 , x2 , . . . , xp are the input features.
• θ0 , θ1 , . . . , θp are the model parameters (coefficients).
• ϵ is the error term, usually assumed to be Gaussian noise.
The objective is to minimize the MSE loss function:
\[ L(\theta) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]
Where:
• $\hat{y}_i = \theta_0 + \theta_1 x_{i1} + \dots + \theta_p x_{ip}$ is the predicted value for the $i$-th data point.
• n is the total number of data points.
The optimization is typically performed using Gradient Descent, which iteratively updates the
parameters θ in the direction of the negative gradient of the loss function. The update rule is:
\[ \theta_j := \theta_j - \alpha \frac{\partial L(\theta)}{\partial \theta_j} \]
Where:
• α is the learning rate.
• $\frac{\partial L(\theta)}{\partial \theta_j}$ is the partial derivative of the loss function with respect to the parameter $\theta_j$.
The gradient descent algorithm continues until the loss function L(θ) converges to a minimum.
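As a concrete illustration (not part of the original answer), the following is a minimal NumPy sketch of batch gradient descent for the MSE loss above; the learning rate, iteration count, and synthetic data are illustrative assumptions.

```python
import numpy as np

def gradient_descent_linear(X, y, alpha=0.01, n_iters=1000):
    """Minimize the MSE loss L(theta) = (1/n) * sum((y - y_hat)**2) by batch gradient descent."""
    n, p = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])   # prepend a column of ones for the intercept theta_0
    theta = np.zeros(p + 1)
    for _ in range(n_iters):
        residual = Xb @ theta - y          # (y_hat_i - y_i) for every data point
        grad = (2.0 / n) * Xb.T @ residual # dL/dtheta_j = (2/n) * sum_i (y_hat_i - y_i) * x_ij
        theta -= alpha * grad              # update: theta_j := theta_j - alpha * dL/dtheta_j
    return theta

# Illustrative usage on synthetic data (assumed, not from the original answer).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.1 * rng.normal(size=100)
print(gradient_descent_linear(X, y))       # should be close to [1.0, 2.0, -3.0]
```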
In logistic regression, the goal is to predict the probability of a binary outcome (0 or 1) based
on input features. The model is similar to linear regression, but it applies the sigmoid function to
the output:
\[ h_\theta(x) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_1 + \dots + \theta_p x_p)}} \]
Where:
• hθ (x) is the predicted probability that y = 1.
• x1 , x2 , . . . , xp are the input features.
• θ0 , θ1 , . . . , θp are the model parameters.
The objective in logistic regression is to minimize the cross-entropy loss, which is given by:
\[ L(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log(h_\theta(x_i)) + (1 - y_i) \log(1 - h_\theta(x_i)) \right] \]
Where:
• yi is the actual class label for the i-th data point.
• hθ (xi ) is the predicted probability for the i-th data point.
Similar to linear regression, Gradient Descent is used to optimize the parameters in logistic regression. The update rule is:
\[ \theta_j := \theta_j - \alpha \frac{\partial L(\theta)}{\partial \theta_j}, \]
where $\frac{\partial L(\theta)}{\partial \theta_j}$ is the partial derivative of the loss function with respect to the parameter $\theta_j$.
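As with linear regression, a minimal NumPy sketch (not from the original answer; the learning rate and iteration count are illustrative assumptions) shows how the same update rule trains logistic regression on the cross-entropy loss:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent_logistic(X, y, alpha=0.1, n_iters=2000):
    """Minimize the cross-entropy loss by batch gradient descent.

    For the sigmoid model, dL/dtheta_j = (1/n) * sum_i (h_theta(x_i) - y_i) * x_ij,
    so the update has exactly the same form as in linear regression.
    """
    n, p = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])   # intercept column for theta_0
    theta = np.zeros(p + 1)
    for _ in range(n_iters):
        h = sigmoid(Xb @ theta)            # predicted probabilities h_theta(x_i)
        grad = Xb.T @ (h - y) / n          # gradient of the cross-entropy loss
        theta -= alpha * grad
    return theta
```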
(c) Discuss the role of the loss (or cost) function in this context and how it guides the optimization
process.
Answer:
The loss function plays a crucial role in the optimization process of machine learning models. It
quantifies how well the model’s predictions match the actual target values. The primary objective
in training any machine learning model is to minimize loss, which reflects the error between the
predicted and true values.
By evaluating the predictions against the actual values, the loss function provides feedback to
guide the optimization algorithm in adjusting the model parameters.
The optimization process involves iteratively updating the model parameters (such as θ0 , θ1 , . . . , θp )
in such a way that the loss function is minimized. This can be done using Gradient Descent or other
optimization algorithms. The gradient of the loss function with respect to the model parameters
provides the direction in which the parameters should be updated to reduce the error. The optimiza-
tion process stops when the loss function reaches its minimum, indicating that the model parameters
have been optimized.
In both linear and logistic regression, the loss function serves as a guiding signal for the optimization process, directing the model toward minimizing error. By optimizing the loss function, we improve the model's performance and make it more accurate in predicting future data.
Assume the observed data $X = \{x_1, x_2, \dots, x_n\}$ are drawn independently from a normal distribution with unknown mean $\mu$ and known variance $\sigma^2$, so each observation has density:
\[ f(x_i; \mu) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) \]
Thus, the likelihood function L(µ) is:
\[ L(\mu) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) \]
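Taking the logarithm turns the product into a sum (an intermediate step filled in here for completeness); up to a constant that does not depend on $\mu$, the log-likelihood is:
\[ \log L(\mu) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \]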
Take the derivative of log L(µ) with respect to µ and set it equal to zero:
\[ \frac{d}{d\mu} \log L(\mu) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) \]
Setting the derivative equal to zero to maximize:
\[ \sum_{i=1}^{n} (x_i - \mu) = 0, \]
which is solved by the sample mean, $\hat{\mu}_{MLE} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
The MAP estimator instead maximizes the posterior distribution of $\mu$ given the data,
\[ \hat{\mu}_{MAP} = \arg\max_{\mu} p(\mu \mid X), \]
where:
• $p(\mu \mid X)$ is the posterior probability of the parameter $\mu$ given the observed data $X$,
• $p(X \mid \mu)$ is the likelihood of the data given the parameter,
• $p(\mu)$ is the prior probability of the parameter.
Using Bayes' theorem, the posterior can be expressed as:
\[ p(\mu \mid X) = \frac{p(X \mid \mu) \cdot p(\mu)}{p(X)}. \]
Here, $p(X)$ is the evidence (a normalizing constant) which does not depend on $\mu$. Thus, the MAP estimate becomes:
\[ \hat{\mu}_{MAP} = \arg\max_{\mu} \, p(X \mid \mu) \cdot p(\mu). \]
Taking the logarithm, the MAP estimate $\hat{\mu}_{MAP}$ is found by maximizing the log-posterior:
\[ \hat{\mu}_{MAP} = \arg\max_{\mu} \log p(\mu \mid X) = \arg\max_{\mu} \left[ \log p(X \mid \mu) + \log p(\mu) \right]. \]
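Assuming, as implied by the expression differentiated below and by part (c), a Gaussian prior $p(\mu) = \mathcal{N}(\mu_0, \tau^2)$, the log-posterior becomes, up to additive constants:
\[ \log p(X \mid \mu) + \log p(\mu) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 - \frac{1}{2\tau^2} (\mu - \mu_0)^2 + \text{const}. \]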
To maximize this, we differentiate with respect to µ:
\[ \frac{\partial}{\partial \mu} \left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 - \frac{1}{2\tau^2} (\mu - \mu_0)^2 \right) = 0. \]
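Carrying out the differentiation and solving for $\mu$ (a step not written out above) gives:
\[ \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) - \frac{1}{\tau^2} (\mu - \mu_0) = 0 \quad\Longrightarrow\quad \hat{\mu}_{MAP} = \frac{\frac{1}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\mu_0}{\tau^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}, \]
which is the estimator quoted in part (c) below.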
(c) Compare the MLE and MAP estimators. Discuss how the choice of µ0 and τ 2 affects the MAP
estimator. Answer:
The MLE estimates $\mu$ by maximizing the likelihood function $p(X \mid \mu)$ based solely on the observed data. The MLE for the mean of a normal distribution is:
\[ \hat{\mu}_{MLE} = \frac{1}{n} \sum_{i=1}^{n} x_i. \]
The MAP estimator incorporates prior information by maximizing the posterior distribution $p(\mu \mid X)$, which combines the likelihood $p(X \mid \mu)$ and the prior $p(\mu)$. The MAP estimate for $\mu$ with a Gaussian prior is:
\[ \hat{\mu}_{MAP} = \frac{\frac{1}{\sigma^2} \sum_{i=1}^{n} x_i + \frac{\mu_0}{\tau^2}}{\frac{n}{\sigma^2} + \frac{1}{\tau^2}}. \]
The prior mean $\mu_0$ pulls the MAP estimate towards itself: if $\mu_0$ is close to the true mean, the MAP estimate improves over the MLE. A smaller $\tau^2$ (a stronger prior) gives more weight to the prior, pushing the estimate closer to $\mu_0$. As $\tau^2 \to \infty$, the MAP estimate converges to the MLE.
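As a quick numerical illustration (not part of the original answer; the synthetic data, prior mean $\mu_0$, and prior variance $\tau^2$ below are arbitrary assumptions), the two estimators can be compared directly:

```python
import numpy as np

def mle_mean(x):
    """MLE of the mean of a normal distribution: the sample mean."""
    return x.mean()

def map_mean(x, sigma2, mu0, tau2):
    """MAP estimate of the mean under a N(mu0, tau2) prior with known variance sigma2."""
    n = x.size
    return (x.sum() / sigma2 + mu0 / tau2) / (n / sigma2 + 1.0 / tau2)

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=20)          # true mean 5, known sigma^2 = 4 (assumed)

print(mle_mean(x))                                   # sample mean
print(map_mean(x, sigma2=4.0, mu0=0.0, tau2=1.0))    # strong prior at 0 pulls the estimate toward mu0
print(map_mean(x, sigma2=4.0, mu0=0.0, tau2=1e6))    # very weak prior: close to the MLE
```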
In text classification, each document is treated as a set of words, and the goal is to classify it
into one of the categories, such as "Sports" or "Politics" in this case. The Naive Bayes classifier
computes the probability of a document belonging to each class and selects the class with the highest
probability. Specifically, for a given document with words X = {x1 , x2 , . . . , xn }, the probability of
the document belonging to class C is given by Bayes’ Theorem:
\[ P(C \mid X) = \frac{P(X \mid C) \cdot P(C)}{P(X)} \]
If we have a dataset with two classes, "Sports" and "Politics", and words like "win", "team", "election", and "vote" in the vocabulary, we can compute the likelihood of the document belonging
to either class based on the frequencies of these words in the training data. The classifier will
compute the likelihood for each class, multiply it by the prior probability of the class, and select the
class with the highest posterior probability.
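A minimal Python sketch of this computation follows (not part of the original answer). The word counts for the Politics class below are made up purely for illustration; only the Sports totals 50, 60, 15, 10 appear in the text, and the function simply mirrors the smoothed-likelihood-times-prior recipe described above.

```python
import math

def naive_bayes_log_posterior(doc_words, class_word_counts, class_prior, alpha=1):
    """Unnormalized log-posterior: log P(C) + sum_w log P(w | C), with Laplace smoothing."""
    total = sum(class_word_counts.values())
    vocab_size = len(class_word_counts)
    log_post = math.log(class_prior)
    for w in doc_words:
        count = class_word_counts.get(w, 0)
        log_post += math.log((count + alpha) / (total + alpha * vocab_size))
    return log_post

# Hypothetical count table: the Politics counts are invented for illustration only.
counts = {
    "Sports":   {"win": 50, "team": 60, "election": 15, "vote": 10},
    "Politics": {"win": 10, "team": 5,  "election": 60, "vote": 55},
}
doc = ["win", "vote"]
scores = {c: naive_bayes_log_posterior(doc, counts[c], class_prior=0.5) for c in counts}
print(max(scores, key=scores.get), scores)   # class with the highest posterior wins
```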
(b) Using the data above, calculate the probability that a document containing the words win
and vote belongs to the Sports category versus the Politics category. Assume uniform class priors
and apply Laplace smoothing with α = 1.
Answer:
The total number of words in each class is calculated as:
Total words in Sports = 50 + 60 + 15 + 10 = 135
For Politics:
\[ P(\text{Politics} \mid X) = \frac{1}{P(X)} \cdot \frac{1}{2} \cdot 0.065 \cdot 0.479 \]
Since $P(\text{Politics} \mid X)$ is greater than $P(\text{Sports} \mid X)$, the document is more likely to belong to the Politics category.
(c) Interpret the results and discuss any limitations of the Naive Bayes classifier in this context.
Answer:
Naive Bayes assumes that the presence of each word is independent of the others, which may not hold in real-world data, where words are often correlated. Words that rarely or never appear in the training data are handled poorly: without Laplace smoothing they would drive the class probability to zero, and even with smoothing their estimated probabilities are unreliable. The classifier also relies on a predefined vocabulary, which may not capture all important words in the documents.
4 Logistic Regression
Consider a binary classification problem where the goal is to predict whether a student will pass
or fail an exam based on the number of hours spent studying and sleeping. Formulate the logistic regression model for this problem. Answer:
https://github.com/lamvu0607/MLH W 1
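The full working is in the linked repository. As a brief sketch consistent with the sigmoid model defined earlier (the two feature names below simply label the predictors named in the question), the model can be written as:
\[ P(\text{pass} \mid x) = h_\theta(x) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_{\text{study}} + \theta_2 x_{\text{sleep}})}}, \]
where $x_{\text{study}}$ and $x_{\text{sleep}}$ are the hours spent studying and sleeping, the parameters $\theta_0, \theta_1, \theta_2$ are fit by minimizing the cross-entropy loss (for example by gradient descent, as above), and a student is predicted to pass when $h_\theta(x) \ge 0.5$.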
6 Regularization Techniques
Regularization is a technique used to prevent overfitting in machine learning models.
(a) Explain the difference between L1 (Lasso) and L2 (Ridge) regularization in the context of linear
regression.
Answer:
Regularization is an important technique in machine learning for preventing overfitting. Mathematically, it adds a penalty term to the loss function that discourages the coefficients from fitting the training data too closely.
The difference between L1 and L2 regularization is that the L2 penalty is the sum of the squares of the weights, while the L1 penalty is the sum of their absolute values.
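Concretely, using the MSE loss defined earlier and writing $\lambda > 0$ for the regularization strength (a symbol introduced here for illustration), the two penalized linear-regression objectives can be written as:
\[ L_{\text{Ridge}}(\theta) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \theta_j^2, \qquad L_{\text{Lasso}}(\theta) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\theta_j|. \]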
Solution Uniqueness: L2-norm (Ridge) always provides a unique solution. The penalty is the
sum of the squares of coefficients, leading to a smooth, differentiable function and a single global
minimum. L1-norm (Lasso) solution is not always unique. The penalty is the sum of the absolute
values of coefficients, which can lead to sparse solutions (some coefficients set to zero), but multiple
valid solutions can exist when features are correlated.
Sparsity: L1-norm has the property of producing many coefficients with zero values or very
small values and few large coefficients, allowing it to perform feature selection. L2-norm keeps all
features.
Computational Efficiency: L1-norm does not have an analytical solution. However, its spar-
sity properties allow it to be used with sparse algorithms, improving computational efficiency. L2-
norm has an analytical solution, making its computation more straightforward and efficient.
Stability: The L1-norm solution is sensitive to small changes in the data, especially when features are correlated. The L2-norm shrinks all coefficients smoothly and retains every feature, so its solution is less sensitive to small perturbations of the data.
(b) Given a dataset with multiple features that are highly correlated, discuss which regularization
method would be more appropriate and why. Answer:
In cases of high feature correlation, L2 (Ridge) regularization is typically the better choice: it handles multicollinearity by shrinking correlated coefficients together rather than arbitrarily zeroing out some of them, providing stable, interpretable results without discarding important features.
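A small numerical sketch of this behaviour (assuming scikit-learn is available; the synthetic data and regularization strengths are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Two highly correlated features that carry the same signal.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)        # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 3.0 * x2 + rng.normal(size=200)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("Ridge coefficients:", ridge.coef_)    # typically splits the weight across both features
print("Lasso coefficients:", lasso.coef_)    # typically keeps one feature and shrinks the other toward zero
```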