
Assignment 4 Solution - NOC-CS 41

 
1. C.
   [ 2  4 ]
   [ 4  2 ]

2. D.
   [ 6/√2 ]
   [ 6/√2 ]
Explanation: The rate of change of a vector-valued function at a point in the direction of a vector is given by the directional derivative. The directional derivative of a vector-valued function f(x, y) = [x² + y², 2xy] at a point (x₀, y₀) in the direction of a unit vector u is D_u f(x₀, y₀) = f_x(x₀, y₀)·u₁ + f_y(x₀, y₀)·u₂, where f_x and f_y are the partial derivatives of f with respect to x and y respectively, and u₁ and u₂ are the components of the unit vector u.
In this case, f(x, y) = [x² + y², 2xy], so f_x(x, y) = [2x, 2y] and f_y(x, y) = [2y, 2x]. At the point (1, 2), f_x(1, 2) = [2, 4] and f_y(1, 2) = [4, 2]. The vector (1, 1) is not a unit vector: its magnitude is √(1² + 1²) = √2, so the unit vector in the direction of (1, 1) is (1/√2, 1/√2). Therefore, the directional derivative of f at (1, 2) in the direction of (1, 1) is D_u f(1, 2) = f_x(1, 2)·(1/√2) + f_y(1, 2)·(1/√2) = [6/√2, 6/√2].
So the rate of change of the vector-valued function f(x, y) at the point (1, 2) in the direction of the vector (1, 1) is [6/√2, 6/√2].
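As a sanity check (not part of the original solution), here is a minimal NumPy sketch that approximates the directional derivative at (1, 2) along (1, 1)/√2 with a central finite difference; the function name f and the step size h are illustrative choices:

    import numpy as np

    # f(x, y) = [x^2 + y^2, 2xy] as a vector-valued function of p = (x, y)
    def f(p):
        x, y = p
        return np.array([x**2 + y**2, 2 * x * y])

    p0 = np.array([1.0, 2.0])
    u = np.array([1.0, 1.0]) / np.sqrt(2.0)   # unit vector in the direction (1, 1)

    # central finite-difference approximation of D_u f(p0) = J_f(p0) @ u
    h = 1e-6
    D_u = (f(p0 + h * u) - f(p0 - h * u)) / (2 * h)
    print(D_u)               # approximately [4.2426, 4.2426]
    print(6 / np.sqrt(2))    # 4.2426..., i.e. 6/√2 per component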
3. D. [9/2, 9/2]
Explanation: To find the minimum value of the vector-valued function f(x, y) = [x² + y², 2xy] subject to the constraint x + y = 3, one of the methods we can use is the method of Lagrange multipliers. Let g(x, y) = x + y − 3. Then we need to solve the system of equations given by ∇f(x, y) = λ∇g(x, y) and g(x, y) = 0 for x, y, and λ.
The gradient of f is given by ∇f(x, y) = [2x, 2y]. The gradient of g is given by ∇g(x, y) = [1, 1]. So we need to solve the system of equations

    2x = λ
    2y = λ
    x + y = 3

From the first two equations, we see that 2x = 2y, so x = y. Substituting this into the third equation gives us x + x = 3, so x = 3/2. Since x = y, we also have y = 3/2.
Therefore, the minimum value of the vector-valued function f(x, y) subject to the constraint x + y = 3 is f(3/2, 3/2) = [(3/2)² + (3/2)², 2·(3/2)·(3/2)] = [9/2, 9/2].
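For completeness, a small SymPy sketch (assuming SymPy is installed; variable names are illustrative) that reproduces the Lagrange system above for the component x² + y² and recovers x = y = 3/2:

    import sympy as sp

    x, y, lam = sp.symbols('x y lam', real=True)

    # stationarity conditions 2x = λ, 2y = λ and the constraint x + y = 3
    equations = [
        sp.Eq(2 * x, lam),
        sp.Eq(2 * y, lam),
        sp.Eq(x + y, 3),
    ]
    solution = sp.solve(equations, [x, y, lam], dict=True)[0]
    print(solution)   # {x: 3/2, y: 3/2, lam: 3}

    # evaluating both components of f at (3/2, 3/2) gives [9/2, 9/2]
    half3 = sp.Rational(3, 2)
    print(half3**2 + half3**2, 2 * half3 * half3)   # 9/2 9/2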
    

4. D. For a scalar-valued function f : Rⁿ → R, a necessary and sufficient condition for f to be strictly convex is that the Hessian of f at each point must be a positive definite matrix.
5. A. The domain D of the function f_i(x) must be convex. This is a necessary condition for a function to be convex, but it is not sufficient on its own. Consider the function f(x) = x³ on the domain D = (−∞, ∞), which is a convex set.
Recall that convexity requires f(t·x + (1 − t)·y) ≤ t·f(x) + (1 − t)·f(y) for all x, y in D and t in [0, 1]. Let's take x = −1, y = 1, and t = 0.5. We have:

    f(0.5·(−1) + 0.5·1) = f(0) = 0

and

    0.5·f(−1) + 0.5·f(1) = 0.5·(−1) + 0.5·1 = 0

In this case, the convex combination inequality holds. However, if we take x = −2, y = 1, and t = 0.5, we get:

    f(0.5·(−2) + 0.5·1) = f(−0.5) = −0.125

and

    0.5·f(−2) + 0.5·f(1) = 0.5·(−8) + 0.5·1 = −3.5

Here −0.125 is not less than or equal to −3.5, so the convex combination inequality does not hold.
Therefore, the function f(x) = x³ is not convex, even though its domain D is convex. This demonstrates that convexity of the domain is a necessary but not sufficient condition for a function to be convex.
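The failing case can be checked directly. A minimal plain-Python sketch of the convexity test used above (the helper name convexity_holds is illustrative):

    # convexity requires f(t*x + (1 - t)*y) <= t*f(x) + (1 - t)*f(y)
    def f(x):
        return x ** 3

    def convexity_holds(x, y, t):
        lhs = f(t * x + (1 - t) * y)
        rhs = t * f(x) + (1 - t) * f(y)
        return lhs <= rhs

    print(convexity_holds(-1, 1, 0.5))   # True:  0 <= 0
    print(convexity_holds(-2, 1, 0.5))   # False: -0.125 > -3.5, so x**3 is not convex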
6. A. In a neural network, the primary purpose of a weight is indeed to increase or reduce the importance
of a certain input feature. This is how a neural network learns to prioritise certain features over others
during the training process.
7. B. To introduce non-linearity in the network
8. A. During the forward pass in a neural network, the network makes a prediction based on the input
data (option A). The weights and biases are not updated during the forward pass (this happens during
the backward pass), the gradient of the loss function is not calculated (this also happens during the
backward pass), and the activation function is determined before the forward pass begins, not during it.
9. B. The given function is not convex, as neither component of the vector function is convex.
1. The first component f₁(x₁, x₂) = x₁² − x₂² is not convex. This is because its Hessian matrix is

       [ 2   0 ]
       [ 0  −2 ]

   which is not positive semi-definite (the second eigenvalue is negative).
2. The second component f₂(x₁, x₂) = 2x₁x₂ is also not convex. Its Hessian matrix is

       [ 0  2 ]
       [ 2  0 ]

   which is not positive semi-definite (one of its eigenvalues is negative).
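A quick NumPy check of both Hessians (illustrative, not part of the original solution):

    import numpy as np

    H1 = np.array([[2.0, 0.0], [0.0, -2.0]])   # Hessian of f1(x1, x2) = x1^2 - x2^2
    H2 = np.array([[0.0, 2.0], [2.0, 0.0]])    # Hessian of f2(x1, x2) = 2*x1*x2

    for name, H in [("H1", H1), ("H2", H2)]:
        eigenvalues = np.linalg.eigvalsh(H)    # real eigenvalues of a symmetric matrix
        print(name, eigenvalues, "PSD:", bool(np.all(eigenvalues >= 0)))
    # H1 [-2.  2.] PSD: False
    # H2 [-2.  2.] PSD: False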
10. D.
Proof: Remember that ||x − u|| = √(Σᵢ₌₁ⁿ (xᵢ − uᵢ)²). Now, consider ∂/∂xᵢ (1/||x − u||); we have the following:

    ∂/∂xᵢ (1/||x − u||) = ∂/∂xᵢ [1/√(Σᵢ₌₁ⁿ (xᵢ − uᵢ)²)]
                        = −1/(2·(Σᵢ₌₁ⁿ (xᵢ − uᵢ)²)^(3/2)) · ∂/∂xᵢ Σᵢ₌₁ⁿ (xᵢ − uᵢ)²
                        = −2(xᵢ − uᵢ)/(2·(Σᵢ₌₁ⁿ (xᵢ − uᵢ)²)^(3/2))
                        = −(xᵢ − uᵢ)/(√(Σᵢ₌₁ⁿ (xᵢ − uᵢ)²))³ = −(xᵢ − uᵢ)/||x − u||³

Following this logic, we get the gradient ∇f(x) = −(x − u)/||x − u||³.
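A finite-difference sanity check of this gradient formula (NumPy, with randomly chosen illustrative vectors x and u):

    import numpy as np

    def f(x, u):
        return 1.0 / np.linalg.norm(x - u)    # f(x) = 1/||x - u||

    rng = np.random.default_rng(0)
    x = rng.normal(size=4)
    u = rng.normal(size=4)

    analytic = -(x - u) / np.linalg.norm(x - u) ** 3   # the gradient derived above

    h = 1e-6
    numeric = np.array([
        (f(x + h * e, u) - f(x - h * e, u)) / (2 * h)
        for e in np.eye(len(x))
    ])
    print(np.allclose(analytic, numeric, atol=1e-6))   # True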

11. A. Q must be positive definite
12. A. ∂J/∂w₁ = (ŷ − y) · ŷ(1 − ŷ) · x₁
Explanation: Let's say the two inputs to the neural network are x₁ and x₂, the two weights are w₁ and w₂, and the bias is b. The output of the neural network before applying the activation function is z = w₁x₁ + w₂x₂ + b. After applying the sigmoid activation function, the predicted output is ŷ = σ(z) = 1/(1 + e^(−z)). Let's say the true output is y. The objective function, which is the mean squared error between the predicted output and the true output, is J = (1/2)(ŷ − y)².
The gradient of the objective function with respect to the first weight w₁ is ∂J/∂w₁. We can use the chain rule to compute this gradient as follows:

    ∂J/∂w₁ = (∂J/∂ŷ) · (∂ŷ/∂z) · (∂z/∂w₁)
    ∂J/∂ŷ = (ŷ − y)
    ∂ŷ/∂z = σ′(z) = σ(z)(1 − σ(z)) = ŷ(1 − ŷ)
    ∂z/∂w₁ = x₁

Substituting these values back into the expression for ∂J/∂w₁, we get:

    ∂J/∂w₁ = (ŷ − y) · ŷ(1 − ŷ) · x₁
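A minimal NumPy sketch of this single sigmoid neuron (all parameter and input values below are illustrative assumptions) that compares the chain-rule gradients for w₁ and b against central finite differences:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def loss(w1, w2, b, x1, x2, y):
        yhat = sigmoid(w1 * x1 + w2 * x2 + b)
        return 0.5 * (yhat - y) ** 2              # J = (1/2)(yhat - y)^2

    w1, w2, b = 0.3, -0.2, 0.1                    # assumed weights and bias
    x1, x2, y = 0.5, 1.5, 1.0                     # assumed inputs and target

    yhat = sigmoid(w1 * x1 + w2 * x2 + b)
    grad_w1 = (yhat - y) * yhat * (1 - yhat) * x1   # chain-rule result for dJ/dw1
    grad_b = (yhat - y) * yhat * (1 - yhat)         # same factor with dz/db = 1 (question 13)

    h = 1e-6
    num_w1 = (loss(w1 + h, w2, b, x1, x2, y) - loss(w1 - h, w2, b, x1, x2, y)) / (2 * h)
    num_b = (loss(w1, w2, b + h, x1, x2, y) - loss(w1, w2, b - h, x1, x2, y)) / (2 * h)
    print(np.isclose(grad_w1, num_w1), np.isclose(grad_b, num_b))   # True True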
13. B. ∂J/∂b = (ŷ − y) · ŷ(1 − ŷ)
Explanation: The gradient of the objective function with respect to the bias is given by ∂J/∂b = (ŷ − y) · ŷ(1 − ŷ). This is because the bias term is added to the weighted sum of inputs before being passed through the activation function. The derivative of the objective function with respect to the bias is similar to the derivative with respect to a weight, except that the input term xᵢ is replaced by 1, since the bias has no corresponding input.
14. A. ∂J/∂w₁ = (ŷ − y) · (1 − ŷ²) · x₁
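As a cross-check of the form of this expression, note that the factor (1 − ŷ²) matches the derivative of tanh, so the sketch below assumes a tanh activation; all numeric values are illustrative:

    import numpy as np

    def loss_tanh(w1, w2, b, x1, x2, y):
        yhat = np.tanh(w1 * x1 + w2 * x2 + b)
        return 0.5 * (yhat - y) ** 2

    w1, w2, b = 0.3, -0.2, 0.1                    # assumed weights and bias
    x1, x2, y = 0.5, 1.5, 1.0                     # assumed inputs and target

    yhat = np.tanh(w1 * x1 + w2 * x2 + b)
    grad_w1 = (yhat - y) * (1 - yhat ** 2) * x1   # formula from the answer

    h = 1e-6
    num_w1 = (loss_tanh(w1 + h, w2, b, x1, x2, y) - loss_tanh(w1 - h, w2, b, x1, x2, y)) / (2 * h)
    print(np.isclose(grad_w1, num_w1))            # True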
