
Homework Assignment 6

Due: Friday, November 8, 2024, 11:59 p.m. Mountain time


Total marks: 26

Policies:
For all multiple-choice questions, note that multiple correct answers may exist. However, selecting
an incorrect option will cancel out a correct one. For example, if you select two answers, one
correct and one incorrect, you will receive zero points for that question. Similarly, if the number
of incorrect answers selected exceeds the correct ones, your score for that question will be zero.
Please note that it is not possible to receive negative marks. You must select all the correct
options to get full marks for the question.
While the syllabus initially indicated the need to submit a paragraph explaining the use of AI or
other resources in your assignments, this requirement no longer applies as we are now utilizing
eClass quizzes instead of handwritten submissions. Therefore, you are not required to submit any
explanation regarding the tools or resources (such as online tools or AI) used in completing this
quiz.
This PDF version of the questions has been provided for your convenience should you wish to print
them and work offline.
Only answers submitted through the eClass quiz system will be graded. Please do not
submit a written copy of your responses.

Question 1. [1 mark]
Consider the predictor f(x) = xw, where w ∈ R is a one-dimensional parameter, and x represents the feature with no bias term. Suppose you are given a dataset of n data points D = ((x_1, y_1), (x_2, y_2), . . . , (x_n, y_n)), where each y_i is the target variable corresponding to feature x_i. Let the loss function be the scaled squared loss ℓ(f(x), y) = c(f(x) − y)^2 where c ∈ R. The estimate of the expected loss for a parameter w ∈ R is defined as the following convex function:

\hat{L}(w) = \frac{1}{n} \sum_{i=1}^{n} c(x_i w - y_i)^2

What is the closed form solution for ŵ = arg min_{w ∈ R} L̂(w)?


a. \hat{w} = \frac{\sum_{i=1}^{n} c x_i y_i}{\sum_{i=1}^{n} x_i^2}

b. \hat{w} = \frac{\sum_{i=1}^{n} y_i}{n}

c. \hat{w} = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}

d. \hat{w} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}

Solution:
Answer: d.
Explanation: To find the closed-form solution for ŵ, we need to minimize L̂(w). This is equivalent
to minimizing the function:

\hat{L}(w) = \frac{c}{n} \sum_{i=1}^{n} (x_i w - y_i)^2

Taking the derivative with respect to w and setting it to zero gives:


\frac{\partial \hat{L}}{\partial w} = \frac{c}{n} \sum_{i=1}^{n} 2(x_i w - y_i) x_i = 0

Simplifying leads to:


\sum_{i=1}^{n} x_i (x_i w) = \sum_{i=1}^{n} x_i y_i

Thus, solving for w yields:


\hat{w} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}
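
As a quick illustration, here is a minimal NumPy sketch of this closed form (the arrays x and y are hypothetical example data, and the scale c is assumed positive so that it does not change the argmin):

import numpy as np

# Hypothetical 1-D data (feature and target), purely for illustration.
x = np.array([0.5, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.0, 3.9, 6.2])

# Closed-form minimizer of (1/n) * sum_i c * (x_i * w - y_i)^2;
# a positive scale c does not change the minimizer.
w_hat = np.sum(x * y) / np.sum(x ** 2)
print(w_hat)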

Question 2. [1 mark]
Let everything be defined as in the previous question. Suppose we consider the multivariate case where f(x) = x^⊤ w, and w ∈ R^{d+1}. What is the closed form solution for ŵ = arg min_{w ∈ R^{d+1}} L̂(w)?

a. \hat{w} = A^{-1} b where A = \sum_{i=1}^{n} x_i x_i^\top and b = \sum_{i=1}^{n} x_i y_i (assume that A is invertible).

b. \hat{w} = A x where A = \sum_{i=1}^{n} x_i x_i^\top

c. \hat{w} = \frac{1}{n} \sum_{i=1}^{n} x_i

d. \hat{w} = \frac{\sum_{i=1}^{n} c x_i y_i}{\sum_{i=1}^{n} c x_i^2}

Solution:
Answer: a.
Explanation: To find the closed-form solution for ŵ, we minimize the expected loss defined as:
\hat{L}(w) = \frac{1}{n} \sum_{i=1}^{n} c(x_i^\top w - y_i)^2

Taking the derivative with respect to w and setting it to zero gives:


\frac{\partial \hat{L}}{\partial w} = \frac{c}{n} \sum_{i=1}^{n} 2(x_i^\top w - y_i) x_i = 0

This simplifies to:


\sum_{i=1}^{n} x_i (x_i^\top w) = \sum_{i=1}^{n} y_i x_i

We can express this using the definitions A and b:


Aw = b
Thus, we can write:

\hat{w} = A^{-1} b
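
A minimal NumPy sketch of the multivariate closed form (the design matrix X, whose rows are the x_i, and the targets y are hypothetical; solving the linear system A w = b is used instead of forming A^{-1} explicitly):

import numpy as np

# Hypothetical design matrix (first column is the bias feature) and targets.
X = np.array([[1.0, 0.5, 2.0],
              [1.0, 1.5, 0.3],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 0.7]])
y = np.array([1.0, 2.1, 3.2, 4.8])

A = X.T @ X                    # sum_i x_i x_i^T
b = X.T @ y                    # sum_i x_i y_i
w_hat = np.linalg.solve(A, b)  # solves A w = b, assuming A is invertible
print(w_hat)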

Question 3. [1 mark]
Let g(w) = -\ln(w) \sum_{i=1}^{n} y_i - \ln(1 - w) \sum_{i=1}^{n} (1 - y_i), where w ∈ R. We can rewrite this a bit more simply as g(w) = -s \ln w - (n - s) \ln(1 - w), where s = \sum_{i=1}^{n} y_i. What is the derivative g'(w) and the first-order gradient descent update rule with a constant step size η?
 
a. g'(w) = -\frac{s}{1-w} + \frac{n-s}{w} and update rule w \leftarrow w - \eta\left(-\frac{s}{1-w} + \frac{n-s}{w}\right)

b. g'(w) = -\frac{s}{w} + \frac{n-s}{1-w} and update rule w \leftarrow w - \eta\left(-\frac{s}{1-w} + \frac{n-s}{w}\right)

c. g'(w) = -\frac{s}{w} + \frac{n-s}{1-w} and update rule w \leftarrow w - \eta\left(-\frac{s}{w} + \frac{n-s}{1-w}\right)

d. g'(w) = -\frac{s}{1-w} - \frac{n-s}{w} and update rule w \leftarrow w - \eta\left(-\frac{s}{1-w} - \frac{n-s}{w}\right)

Solution:
Answer: c.
Explanation: To find the derivative g'(w), we differentiate g(w):

g(w) = -s \ln w - (n - s) \ln(1 - w)

Taking the derivative:

g'(w) = -s \cdot \frac{1}{w} + (n - s) \cdot \frac{1}{1 - w}

The gradient descent update rule with a constant step size η is given by:

w \leftarrow w - \eta g'(w) = w - \eta\left(-\frac{s}{w} + \frac{n - s}{1 - w}\right)

Thus, the first-order gradient descent update rule becomes:

w \leftarrow w + \eta\left(\frac{s}{w} - \frac{n - s}{1 - w}\right)
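
A small sketch of this first-order update on the Bernoulli negative log-likelihood (the values of n, s, the step size, the starting point, and the iteration count are arbitrary choices for illustration):

# Hypothetical data summary: s ones out of n observations.
n, s = 20, 7
eta = 0.001    # constant step size
w = 0.5        # initial guess, kept strictly inside (0, 1)

for _ in range(1000):
    grad = -s / w + (n - s) / (1 - w)   # g'(w)
    w = w - eta * grad                  # first-order update
print(w)   # approaches s / n = 0.35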

Question 4. [1 mark]
Let everything be defined as in the previous question. What is the second derivative g''(w) and the second-order gradient descent update rule?
a. g''(w) = \frac{s}{w^2} - \frac{n-s}{(1-w)^2} and update: w \leftarrow w - \frac{-\frac{s}{w} + \frac{n-s}{1-w}}{\frac{s}{w^2} - \frac{n-s}{(1-w)^2}}

b. g''(w) = \frac{s}{w^2} + \frac{n-s}{(1-w)^2} and update: w \leftarrow w - \frac{-\frac{s}{w} + \frac{n-s}{1-w}}{\frac{s}{w^2} + \frac{n-s}{(1-w)^2}}

c. g''(w) = -\frac{s}{w^2} + \frac{n-s}{(1-w)^2} and update: w \leftarrow w - \frac{-\frac{s}{w} + \frac{n-s}{1-w}}{-\frac{s}{w^2} + \frac{n-s}{(1-w)^2}}

d. g''(w) = \frac{s}{w^2} + \frac{n-s}{(1-w)^2} and update: w \leftarrow w + \frac{-\frac{s}{w} + \frac{n-s}{1-w}}{\frac{s}{w^2} + \frac{n-s}{(1-w)^2}}

Solution:
Answer: b.
Explanation: To find the second derivative:
g''(w) = \frac{s}{w^2} + \frac{n - s}{(1 - w)^2}

The first derivative is given by:

g'(w) = -\frac{s}{w} + \frac{n - s}{1 - w}

Thus, the second-order gradient descent update rule is:

w \leftarrow w - \frac{g'(w)}{g''(w)} = w - \frac{-\frac{s}{w} + \frac{n - s}{1 - w}}{\frac{s}{w^2} + \frac{n - s}{(1 - w)^2}}
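
The second-order (Newton) version of the same update, again as a hedged sketch with the same hypothetical n and s as above:

n, s = 20, 7
w = 0.5
for _ in range(10):
    g1 = -s / w + (n - s) / (1 - w)        # g'(w)
    g2 = s / w**2 + (n - s) / (1 - w)**2   # g''(w)
    w = w - g1 / g2                        # second-order update
print(w)   # converges to s / n = 0.35 in a few iterations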

Question 5. [1 mark]
Let everything be defined as in the previous question. What is the closed form solution for

w^* = \arg\min_{w \in R} g(w)

a. w∗ = n/s

b. w∗ = s/(n − s)

c. w∗ = s/(s − n)

d. w∗ = s/n

Solution:
Answer: d.
Explanation: Set derivative of g(w) to zero and solve for w:
-\frac{s}{w} + \frac{n - s}{1 - w} = 0 \implies \frac{s}{w} = \frac{n - s}{1 - w} \implies w = \frac{s}{n}.
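
A quick numerical sanity check that the minimizer is s/n (the values of n and s and the grid are arbitrary illustrations):

import numpy as np

n, s = 20, 7
w_grid = np.linspace(1e-4, 1 - 1e-4, 100001)
g = -s * np.log(w_grid) - (n - s) * np.log(1 - w_grid)
print(w_grid[np.argmin(g)], s / n)   # both approximately 0.35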

Question 6. [1 mark]
Let g(w) = w^4 + e^{-w} where w ∈ R. What is the derivative g'(w) and the first-order gradient descent update rule with a constant step size η?

a. g'(w) = 4w^3 - e^{-w} and update: w \leftarrow w - \eta(4w^3 - e^{-w})

b. g'(w) = 4w^3 + e^{-w} and update: w \leftarrow w - \eta(4w^3 + e^{-w})

c. g'(w) = 4w^3 + e^{-w} and update: w \leftarrow w + \eta(4w^3 + e^{-w})

d. g'(w) = 4w^3 - e^{-w} and update: w \leftarrow w + \eta(4w^3 - e^{-w})

Solution:
Answer: a.
Explanation: To find the derivative g'(w):

g'(w) = \frac{d}{dw}(w^4) + \frac{d}{dw}(e^{-w}) = 4w^3 - e^{-w}.

The first-order gradient descent update rule is given by:

w \leftarrow w - \eta g'(w) = w - \eta(4w^3 - e^{-w}).
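
A minimal sketch of this first-order update for g(w) = w^4 + e^{-w} (step size, starting point, and iteration count are arbitrary choices):

import math

eta = 0.1
w = 0.0
for _ in range(200):
    grad = 4 * w**3 - math.exp(-w)   # g'(w) = 4w^3 - e^{-w}
    w = w - eta * grad               # first-order update
print(w)   # approaches the stationary point where 4w^3 = e^{-w} (about 0.53)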

Question 7. [1 mark]
Let everything be defined as in the previous question. What is the second derivative g''(w) and the second-order gradient descent update rule?

a. g''(w) = 12w^2 - e^{-w} and update: w \leftarrow w - \frac{4w^3 - e^{-w}}{12w^2 - e^{-w}}

b. g''(w) = 12w^2 + e^{-w} and update: w \leftarrow w + \frac{4w^3 - e^{-w}}{12w^2 + e^{-w}}

c. g''(w) = 12w^2 + e^{-w} and update: w \leftarrow w - \frac{4w^3 - e^{-w}}{12w^2 + e^{-w}}

d. g''(w) = 12w^2 - e^{-w} and update: w \leftarrow w + \frac{4w^3 - e^{-w}}{12w^2 - e^{-w}}

Solution:
Answer: c.
Explanation: To find the second derivative g''(w):
1. First, we have the first derivative:

g'(w) = 4w^3 - e^{-w}.

2. Next, we differentiate g'(w) to get g''(w):

g''(w) = \frac{d}{dw}(4w^3) - \frac{d}{dw}(e^{-w}) = 12w^2 + e^{-w}.

The second-order gradient descent update rule is given by:

w \leftarrow w - \frac{g'(w)}{g''(w)} = w - \frac{4w^3 - e^{-w}}{12w^2 + e^{-w}}.

Question 8. [1 mark]
Let everything be defined as in the previous question. For the second-order update rule, calculate w^{(1)} if w^{(0)} = 0.


Solution:
Answer: 1.
Explanation: Let g(w) = w^4 + e^{-w} where w ∈ R. We want to compute w^{(1)} using the second-order gradient descent update rule, given w^{(0)} = 0.
The second-order update rule is given by:

w^{(1)} = w^{(0)} - \frac{g'(w^{(0)})}{g''(w^{(0)})}

Step 1: Calculate g'(w). The first derivative is:

g'(w) = 4w^3 - e^{-w}

Substituting w^{(0)} = 0:

g'(0) = 4(0)^3 - e^0 = -1

Step 2: Calculate g''(w). The second derivative is:

g''(w) = 12w^2 + e^{-w}

Substituting w^{(0)} = 0:

g''(0) = 12(0)^2 + e^0 = 1

Step 3: Update the value of w. Now we can compute w^{(1)}:

w^{(1)} = 0 - \frac{-1}{1} = 0 + 1 = 1
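
The arithmetic of this single Newton step can be checked directly; a small illustrative sketch:

import math

def g1(w): return 4 * w**3 - math.exp(-w)    # g'(w)
def g2(w): return 12 * w**2 + math.exp(-w)   # g''(w)

w0 = 0.0
w1 = w0 - g1(w0) / g2(w0)   # 0 - (-1)/1
print(w1)                   # 1.0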

Question 9. [1 mark]
Let everything be defined as in the previous question. Change the step size to be calculated using
the normalized gradient. For the first-order update rule, calculate w^{(1)} if w^{(0)} = 0 and η = 1. Only for
this problem, set ε = 0.

Solution:
Answer: 1.
Explanation: We know that the derivative at w^{(0)} = 0 is g'(0) = -1. The normalized-gradient step size is given by

\eta^{(0)} = \frac{\eta}{|g'(w^{(0)})|} = 1

Therefore

w^{(1)} = w^{(0)} - \eta^{(0)} g'(w^{(0)}) = 0 - 1 \times (-1) = 1.
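
The normalized-gradient step (with ε = 0, as the question specifies) can be checked the same way; a small illustrative sketch:

import math

eta = 1.0
w0 = 0.0
grad = 4 * w0**3 - math.exp(-w0)   # g'(0) = -1
step = eta / abs(grad)             # normalized step size, with epsilon = 0
w1 = w0 - step * grad              # 0 - 1 * (-1)
print(w1)                          # 1.0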

Question 10. [1 mark]


Let g(w) = g(w_1, w_2) = w_1^2 w_2^2 + e^{-w_1} + e^{-w_2} where w ∈ R^2. What is the gradient of g(w) and the first-order gradient descent update rule with a constant step size η?

a. w^{(t+1)} = w^{(t)} - \eta \left( 2w_1^{(t)} (w_2^{(t)})^2, \; 2w_2^{(t)} (w_1^{(t)})^2 \right)^\top

b. w^{(t+1)} = w^{(t)} - \eta \left( 2w_1^{(t)} (w_2^{(t)})^2 - e^{-w_1^{(t)}}, \; 2w_2^{(t)} (w_1^{(t)})^2 - e^{-w_2^{(t)}} \right)^\top

c. w^{(t+1)} = w^{(t)} - \eta \left( 2w_1^{(t)} (w_2^{(t)})^2 + e^{-w_1^{(t)}}, \; 2w_2^{(t)} (w_1^{(t)})^2 + e^{-w_2^{(t)}} \right)^\top

d. w^{(t+1)} = w^{(t)} - \eta \left( -2w_1^{(t)} (w_2^{(t)})^2 + e^{-w_1^{(t)}}, \; -2w_2^{(t)} (w_1^{(t)})^2 + e^{-w_2^{(t)}} \right)^\top

Solution:
Answer: b.
Explanation: Let g(w) = g(w_1, w_2) = w_1^2 w_2^2 + e^{-w_1} + e^{-w_2} where w ∈ R^2.
Step 1: Calculate the gradient ∇g(w). The gradient is given by:

\nabla g(w) = \left( \frac{\partial g}{\partial w_1}, \frac{\partial g}{\partial w_2} \right)^\top

Calculating the partial derivatives:

\frac{\partial g}{\partial w_1} = 2w_1 w_2^2 - e^{-w_1}

\frac{\partial g}{\partial w_2} = 2w_2 w_1^2 - e^{-w_2}

Step 2: Write the gradient. Thus, the gradient is:

\nabla g(w) = \left( 2w_1 w_2^2 - e^{-w_1}, \; 2w_2 w_1^2 - e^{-w_2} \right)^\top

Step 3: Gradient descent update rule. The first-order gradient descent update rule is given by:

w^{(t+1)} = w^{(t)} - \eta \nabla g(w^{(t)})

Plugging in the gradient:

w^{(t+1)} = w^{(t)} - \eta \left( 2w_1^{(t)} (w_2^{(t)})^2 - e^{-w_1^{(t)}}, \; 2w_2^{(t)} (w_1^{(t)})^2 - e^{-w_2^{(t)}} \right)^\top
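
A short sketch of this two-dimensional update (the step size, starting point, and iteration count are arbitrary choices for illustration):

import numpy as np

def grad_g(w):
    w1, w2 = w
    return np.array([2 * w1 * w2**2 - np.exp(-w1),
                     2 * w2 * w1**2 - np.exp(-w2)])

eta = 0.05
w = np.array([1.0, 1.0])
for _ in range(500):
    w = w - eta * grad_g(w)   # first-order update
print(w, grad_g(w))           # the gradient is approximately zero at the end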

Question 11. [1 mark]


If F ⊂ G, then is it true that min_{f∈F} L̂(f) ≥ min_{g∈G} L̂(g)?

Solution:
Answer: True.
Explanation: For any f ∈ G, we know that L̂(f) ≥ min_{g∈G} L̂(g), since the right-hand side is the minimum value over G. Since F ⊂ G, this holds in particular for every f ∈ F. Taking the minimum over f ∈ F on the left-hand side gives min_{f∈F} L̂(f) ≥ min_{g∈G} L̂(g).

Question 12. [1 mark]


Consider the setting of polynomial regression. Let d = 2, such that x = (x_0 = 1, x_1, x_2), and p = 4, then p̄ = 10. True or False?


Solution:
False. It's \binom{2+4}{4} = \frac{6 \cdot 5}{2} = 15.

Question 13. [1 mark]


Let everything be defined as in the previous question. The expression for φ_p(x) is given by

φ(x) = \left( x_1, x_2, x_1^2, x_1 x_2, x_2^2, x_1^3, x_1^2 x_2, x_1 x_2^2, x_2^3, x_1^4, x_1^3 x_2, x_1^2 x_2^2, x_1 x_2^3, x_2^4 \right).


True or False?

Solution:
False. The constant term 1 is missing.
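
A quick way to sanity-check the counts in the last two questions (the variable names are arbitrary): enumerate all exponent pairs (a, b) with a + b ≤ p, each corresponding to the monomial x_1^a x_2^b.

import itertools
import math

d, p = 2, 4
# All monomials x1^a * x2^b of total degree at most p, including the constant term (a, b) = (0, 0).
terms = [(a, b) for a, b in itertools.product(range(p + 1), repeat=d) if a + b <= p]
print(len(terms))           # 15
print(math.comb(d + p, p))  # 15, i.e. C(2+4, 4)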

Question 14. [1 mark]


Suppose that

F̄_p = { f | f : R^{d+1} → R, and f(x) = log(φ_p(x)^⊤ w), for some w ∈ R^{p̄} }.

Is it true that F̄1 ⊂ F̄2 ?

Solution:
True. Any f ∈ F̄_1 can also be written as an element of F̄_2 by setting the weights on the extra higher-degree features to zero, so as we increase the degree the function class becomes more expressive.

Question 15. [1 mark]


You are predicting house prices. Suppose you want to make the irreducible error smaller. If you gather a new feature about houses (that you didn't already have), such as the number of swimming pools in the backyard, is it likely to decrease the irreducible error? True or False?

Solution:
True. Irreducible error can be reduced by adding more features that are relevant to the prediction
task.

Question 16. [1 mark]


Consider the same setting as the previous problem. The estimation error can be reduced by reducing
the number of data points. True or False?

Solution:
False. Estimation error can be reduced by adding more data points or by using a simpler model.

Question 17. [1 mark]


Consider the same setting as the previous problem. The approximation error can be reduced by
using a larger function class. True or False?

Solution:
True. Approximation error can be reduced by using a more complex model.


Question 18. [1 mark]


You notice your predictor is overfitting. To reduce overfitting, we should make the degree p of the
polynomial function class larger. True or False?

Solution:
False. We need to make p smaller.

Question 19. [1 mark]


Suppose that you have a small dataset, but a large function class. Would the variance be large or small? Would you expect the bias to be large or small? Would you expect the predictor f̂_D to be underfitting or overfitting the data or neither?

a. variance large, bias large, overfit.

b. variance small, bias large, overfit.

c. variance large, bias small, overfit.

d. variance small, bias small, underfit.

Solution:
Answer: c. Variance large, bias small, overfit.

Question 20. [1 mark]


Suppose that you have a large dataset, but a small function class, and f_Bayes is much more complex than any function in the function class. Would the variance be large or small? Would you expect the bias to be large or small? Would you expect the predictor f̂_D to be underfitting or overfitting the data or neither?

a. variance large, bias large, underfit.

b. variance small, bias large, underfit.

c. variance small, bias small, neither overfitting nor underfitting.

d. variance large, bias large, neither overfitting nor underfitting.

Solution:
Answer: b. Variance small, bias large, underfit.

Question 21. [1 mark]


Suppose that you have a large dataset, a small function class F, and f_Bayes ∈ F. Would the variance be large or small? Would you expect the bias to be large or small? Would you expect the predictor f̂_D to be underfitting or overfitting the data or neither?

a. variance large, bias large, overfitting.

b. variance small, bias small, overfitting.


c. variance small, bias small, neither overfitting nor underfitting.

d. variance large, bias large, neither overfitting nor underfitting.

Solution:
Answer: c. Variance small, bias small, neither overfitting nor underfitting.

Question 22. [1 mark]


You are using regularization. You notice you are underfitting. You should decrease the value of
lambda to reduce underfitting and get a smaller test loss. True or False?

Solution:
True. Decreasing the value of lambda will reduce the regularization strength and allow the model
to fit the data better.

Question 23. [1 mark]


Suppose you have a dataset D = (z_1, . . . , z_n) containing n i.i.d. flips of a coin. Since the flips are i.i.d., you know they all follow the distribution Bernoulli(α*). However, you do not know what α* is, so you would like to estimate it using MLE. Which of the following is the maximum likelihood estimate α_MLE?

a. α_MLE = \frac{1}{n} \sum_{i=1}^{n} α_i

b. α_MLE = \frac{1}{n} \sum_{i=1}^{n} z_i

c. α_MLE = \frac{1}{n-1} \sum_{i=1}^{n} z_i

d. α_MLE = \frac{1}{n} \sum_{i=1}^{n-1} z_i

Solution:
Answer: b.
Explanation: The probability of each flip z_i is p(z_i | α) = α^{z_i} (1 - α)^{1 - z_i}. The likelihood is:

p(D | α) = \prod_{i=1}^{n} α^{z_i} (1 - α)^{1 - z_i}

The negative log-likelihood is:

-\log p(D | α) = -\sum_{i=1}^{n} \log\left( α^{z_i} (1 - α)^{1 - z_i} \right)
             = -\sum_{i=1}^{n} \left( z_i \log α + (1 - z_i) \log(1 - α) \right)
             = -\left( \sum_{i=1}^{n} z_i \right) \log α - \left( n - \sum_{i=1}^{n} z_i \right) \log(1 - α)

Differentiating and setting \frac{d}{dα}\left( -\log p(D | α) \right) = 0, we find:

\frac{d}{dα}\left( -\log p(D | α) \right) = -\frac{\sum_{i=1}^{n} z_i}{α} + \frac{n - \sum_{i=1}^{n} z_i}{1 - α} = 0
\implies \frac{\sum_{i=1}^{n} z_i}{α} = \frac{n - \sum_{i=1}^{n} z_i}{1 - α}
\implies (1 - α) \sum_{i=1}^{n} z_i = α \left( n - \sum_{i=1}^{n} z_i \right)
\implies \sum_{i=1}^{n} z_i - α \sum_{i=1}^{n} z_i = αn - α \sum_{i=1}^{n} z_i
\implies \sum_{i=1}^{n} z_i = αn
\implies α = \frac{1}{n} \sum_{i=1}^{n} z_i = α_MLE
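
A small simulation that illustrates the result (the true parameter, the seed, and the sample size are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
alpha_true = 0.3
z = rng.binomial(1, alpha_true, size=10_000)   # n i.i.d. Bernoulli(alpha_true) flips

alpha_mle = z.mean()   # closed-form MLE: (1/n) * sum_i z_i
print(alpha_mle)       # close to 0.3 for large n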

Question 24. [1 mark]


Assume that Y | X follows a Gaussian distribution with mean µ = x w_1 and variance σ^2 = exp(x w_2) for all x ∈ R and w = (w_1, w_2) where w_1, w_2 ∈ R. The negative log-likelihood can be written as follows for a dataset D = ((x_1, y_1), · · · , (x_n, y_n)):

g(w) = \sum_{i=1}^{n} g_i(w) where g_i(w) = -\ln p(y_i | x_i, w),

where p(· | ·) is the density of the above Gaussian distribution. What is the partial derivative of g with respect to w_1?
a. \frac{\partial g}{\partial w_1} = \sum_{i=1}^{n} \frac{x_i (y_i - x_i w_1)}{\exp(x_i w_2)}

b. \frac{\partial g}{\partial w_1} = \sum_{i=1}^{n} \frac{(y_i - x_i w_1)^2}{2 \exp(x_i w_2)}

c. \frac{\partial g}{\partial w_1} = -\sum_{i=1}^{n} \frac{x_i (y_i - x_i w_1)}{\exp(x_i w_2)}

d. \frac{\partial g}{\partial w_1} = -\sum_{i=1}^{n} \frac{(y_i - x_i w_1)^2}{\exp(x_i w_2)}

Solution:
Answer: c.
Explanation: To find \frac{\partial g}{\partial w_1}, note that the density of Y | X is

p(y_i | x_i, w) = \frac{1}{\sqrt{2\pi \exp(x_i w_2)}} \exp\left( -\frac{(y_i - x_i w_1)^2}{2 \exp(x_i w_2)} \right)

The negative log-likelihood term g_i(w) is:

g_i(w) = \frac{(y_i - x_i w_1)^2}{2 \exp(x_i w_2)} + \frac{1}{2} \ln(2\pi \exp(x_i w_2))

Differentiating g_i(w) with respect to w_1 gives:

\frac{\partial g}{\partial w_1} = -\sum_{i=1}^{n} \frac{x_i (y_i - x_i w_1)}{\exp(x_i w_2)}.

Question 25. [1 mark]


Let everything be defined as in the previous question. What is the partial derivative of g with respect to w_2?
a. \sum_{i=1}^{n} \left( -\frac{(y_i - x_i w_1)^2}{2 \exp(x_i w_2)} + x_i \right)

b. \sum_{i=1}^{n} \left( \frac{(y_i - x_i w_1)^2}{2 \exp(x_i w_2)} + \frac{x_i}{2} \right)

c. \sum_{i=1}^{n} \left( \frac{x_i (y_i - x_i w_1)^2}{2 \exp(x_i w_2)} - \frac{x_i}{2} \right)

d. \sum_{i=1}^{n} \left( -\frac{x_i (y_i - x_i w_1)^2}{2 \exp(x_i w_2)} + \frac{x_i}{2} \right)

Solution:
Answer: d.
Explanation: To find \frac{\partial g}{\partial w_2}, we start with the expression for g_i(w):

g_i(w) = \frac{(y_i - x_i w_1)^2}{2 \exp(x_i w_2)} + \frac{1}{2} \ln(2\pi \exp(x_i w_2))

Differentiating g_i(w) with respect to w_2 gives:

\frac{\partial g}{\partial w_2} = \sum_{i=1}^{n} \left( -\frac{x_i (y_i - x_i w_1)^2}{2 \exp(x_i w_2)} + \frac{x_i}{2} \right).

Question 26. [1 mark]


Let everything be defined as in the previous question. You want to solve for w_MLE using gradient descent. Using the partial derivatives you calculated in the previous questions, what would the gradient update rule look like with a constant step size η?

a. w_1 \leftarrow w_1 - \eta \sum_{i=1}^{n} \frac{x_i (y_i - x_i w_1)}{\exp(x_i w_2)}, \quad w_2 \leftarrow w_2 - \eta \sum_{i=1}^{n} \left( \frac{x_i (y_i - x_i w_1)^2}{2 \exp(x_i w_2)} - \frac{x_i}{2} \right)

b. w_1 \leftarrow w_1 + \eta \sum_{i=1}^{n} \frac{x_i (y_i - x_i w_1)}{\exp(x_i w_2)}, \quad w_2 \leftarrow w_2 + \eta \sum_{i=1}^{n} \left( \frac{x_i (y_i - x_i w_1)^2}{2 \exp(x_i w_2)} - \frac{x_i}{2} \right)

c. w_1 \leftarrow w_1 - \eta \sum_{i=1}^{n} \frac{y_i - x_i w_1}{2}, \quad w_2 \leftarrow w_2 - \eta \sum_{i=1}^{n} \left( \frac{(y_i - x_i w_1)^2}{2 \exp(x_i w_2)} - \frac{x_i}{2} \right)

d. w_1 \leftarrow w_1 - \eta \sum_{i=1}^{n} \frac{y_i - x_i w_1}{\exp(x_i w_2)}, \quad w_2 \leftarrow w_2 + \eta \sum_{i=1}^{n} \left( \frac{(y_i - x_i w_1)^2}{2} - \frac{x_i}{2} \right)


Solution:
Answer: b.
Explanation: Plugging in the partial derivatives from the previous questions into the gradient
descent update rule, we get:
w_1 \leftarrow w_1 + \eta \sum_{i=1}^{n} \frac{x_i (y_i - x_i w_1)}{\exp(x_i w_2)},

w_2 \leftarrow w_2 + \eta \sum_{i=1}^{n} \left( \frac{x_i (y_i - x_i w_1)^2}{2 \exp(x_i w_2)} - \frac{x_i}{2} \right).
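
A hedged sketch of this update on synthetic data (the data-generating parameters, seed, step size, and iteration count are arbitrary illustrations, and convergence is not carefully tuned):

import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, size=n)
w1_true, w2_true = 2.0, 0.5
y = x * w1_true + rng.normal(0.0, np.sqrt(np.exp(x * w2_true)))

eta = 0.001
w1, w2 = 0.0, 0.0
for _ in range(5000):
    resid = y - x * w1
    inv_var = np.exp(-x * w2)   # 1 / exp(x_i * w_2)
    w1_step = np.sum(x * resid * inv_var)
    w2_step = np.sum(x * resid**2 * inv_var / 2 - x / 2)
    w1, w2 = w1 + eta * w1_step, w2 + eta * w2_step
print(w1, w2)   # roughly recovers the hypothetical (2.0, 0.5)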
