Assignment 4 Solution
1. C. $\begin{bmatrix} 2 & 4 \\ 4 & 2 \end{bmatrix}$
2. D. $\left[\tfrac{6}{\sqrt{2}}, \tfrac{6}{\sqrt{2}}\right]$
Explanation: The rate of change of a vector-valued function at a point in the direction of a vector is given by the directional derivative. The directional derivative of a vector-valued function $f(x, y) = [x^2 + y^2, \, 2xy]$ at a point $(x_0, y_0)$ in the direction of a unit vector $u$ is given by $D_u f(x_0, y_0) = f_x(x_0, y_0)\, u_1 + f_y(x_0, y_0)\, u_2$, where $f_x$ and $f_y$ are the partial derivatives of $f$ with respect to $x$ and $y$ respectively, and $u_1$ and $u_2$ are the components of the unit vector $u$.
In this case, $f(x, y) = [x^2 + y^2, \, 2xy]$, so $f_x(x, y) = [2x, 2y]$ and $f_y(x, y) = [2y, 2x]$. At the point $(1, 2)$, $f_x(1, 2) = [2, 4]$ and $f_y(1, 2) = [4, 2]$. The vector $(1, 1)$ is not a unit vector: its magnitude is $\sqrt{1^2 + 1^2} = \sqrt{2}$, so the unit vector in the direction of $(1, 1)$ is $\left(\tfrac{1}{\sqrt{2}}, \tfrac{1}{\sqrt{2}}\right)$. Therefore, the directional derivative of $f$ at $(1, 2)$ in the direction of $(1, 1)$ is
$$D_u f(1, 2) = f_x(1, 2) \cdot \tfrac{1}{\sqrt{2}} + f_y(1, 2) \cdot \tfrac{1}{\sqrt{2}} = \left[\tfrac{6}{\sqrt{2}}, \tfrac{6}{\sqrt{2}}\right].$$
So the rate of change of the vector-valued function $f(x, y)$ at the point $(1, 2)$ in the direction of the vector $(1, 1)$ is $\left[\tfrac{6}{\sqrt{2}}, \tfrac{6}{\sqrt{2}}\right]$.
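As a quick numerical sanity check (not part of the original solution), the sketch below approximates the directional derivative with a finite difference and compares it to the closed-form value $\left[\tfrac{6}{\sqrt{2}}, \tfrac{6}{\sqrt{2}}\right]$. The names `f`, `p`, `u`, and the step size `h` are ad hoc choices.

```python
import numpy as np

def f(x, y):
    # Vector-valued function from question 2.
    return np.array([x**2 + y**2, 2 * x * y])

# Point and (unnormalised) direction from the question.
p = np.array([1.0, 2.0])
v = np.array([1.0, 1.0])
u = v / np.linalg.norm(v)          # unit vector (1/sqrt(2), 1/sqrt(2))

# Finite-difference approximation: D_u f(p) ~ (f(p + h*u) - f(p)) / h for small h.
h = 1e-6
numeric = (f(*(p + h * u)) - f(*p)) / h

analytic = np.array([6.0, 6.0]) / np.sqrt(2)   # closed-form answer [6/sqrt(2), 6/sqrt(2)]
print(numeric, analytic)                       # both approximately [4.2426, 4.2426]
```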
3. D. $\left[\tfrac{9}{2}, \tfrac{9}{2}\right]$
Explanation: To find the minimum value of the vector-valued function $f(x, y) = [x^2 + y^2, \, 2xy]$ subject to the constraint $x + y = 3$, one of the methods we can use is the method of Lagrange multipliers. Let $g(x, y) = x + y - 3$. Then we need to solve the system of equations given by $\nabla f(x, y) = \lambda \nabla g(x, y)$ and $g(x, y) = 0$ for $x$, $y$, and $\lambda$.
The gradient of $f$ is given by $\nabla f(x, y) = [2x, 2y]$. The gradient of $g$ is given by $\nabla g(x, y) = [1, 1]$. So we need to solve the system of equations given by
$$2x = \lambda, \qquad 2y = \lambda, \qquad x + y = 3.$$
From the first two equations, we see that $2x = 2y$, so $x = y$. Substituting this into the third equation gives us $x + x = 3$, so $x = \tfrac{3}{2}$. Since $x = y$, we also have $y = \tfrac{3}{2}$.
Therefore, the minimum value of the vector-valued function $f(x, y)$ subject to the constraint $x + y = 3$ is
$$f\left(\tfrac{3}{2}, \tfrac{3}{2}\right) = \left[\left(\tfrac{3}{2}\right)^2 + \left(\tfrac{3}{2}\right)^2, \; 2 \cdot \tfrac{3}{2} \cdot \tfrac{3}{2}\right] = \left[\tfrac{9}{2}, \tfrac{9}{2}\right].$$
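To double-check the stationary point found with the Lagrange conditions, the sketch below (an illustrative check, not part of the original solution) substitutes the constraint $y = 3 - x$ into the component $x^2 + y^2$ and scans $x$ on a grid; the grid bounds and resolution are arbitrary.

```python
import numpy as np

# Substitute the constraint y = 3 - x and scan x to locate the minimum of
# the component x^2 + y^2 used in the Lagrange conditions above.
xs = np.linspace(-5.0, 8.0, 130_001)
ys = 3.0 - xs
first = xs**2 + ys**2

i = np.argmin(first)
x_star, y_star = xs[i], ys[i]
print(x_star, y_star)                                  # ~1.5, 1.5 (i.e. 3/2, 3/2)
print([x_star**2 + y_star**2, 2 * x_star * y_star])    # ~[4.5, 4.5] = [9/2, 9/2]
```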
and
In this case, the convex combination property holds. However, if we take x = −2, y = 1, and t = 0.5,
we get:
$f(tx + (1-t)y) = f(0.5 \cdot (-2) + 0.5 \cdot 1) = f(-0.5) = (-0.5)^3 = -0.125$
and
$t f(x) + (1-t) f(y) = 0.5 \cdot (-2)^3 + 0.5 \cdot 1^3 = -4 + 0.5 = -3.5$
In this case, −0.125 is not less than or equal to −3.5, so the convex combination property does not hold.
Therefore, the function $f(x) = x^3$ is not convex, even though its domain $D$ is convex. This demonstrates that the convexity of the domain is a necessary but not sufficient condition for a function to be convex.
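The counterexample above is easy to verify numerically. The snippet below (an illustrative check, not part of the original solution) evaluates both sides of the convex-combination inequality for $f(x) = x^3$ at the values used above.

```python
# Convexity requires f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) for all t in [0, 1].
f = lambda x: x**3

x, y, t = -2.0, 1.0, 0.5
lhs = f(t * x + (1 - t) * y)       # f(-0.5) = -0.125
rhs = t * f(x) + (1 - t) * f(y)    # 0.5*(-8) + 0.5*1 = -3.5

print(lhs, rhs, lhs <= rhs)        # -0.125, -3.5, False -> f(x) = x^3 is not convex
```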
6. A. In a neural network, the primary purpose of a weight is indeed to increase or reduce the importance
of a certain input feature. This is how a neural network learns to prioritise certain features over others
during the training process.
7. B. To introduce non-linearity in the network
8. A. During the forward pass in a neural network, the network makes a prediction based on the input
data (option A). The weights and biases are not updated during the forward pass (this happens during
the backward pass), the gradient of the loss function is not calculated (this also happens during the
backward pass), and the activation function is determined before the forward pass begins, not during it.
9. B. The given function is not convex as neither component of the vector function is convex.
1. The first component $f_1(x_1, x_2) = x_1^2 - x_2^2$ is not convex. This is because its Hessian matrix is $\begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix}$, which is not positive semi-definite (the second eigenvalue is negative).
2. The second component $f_2(x_1, x_2) = 2 x_1 x_2$ is also not convex. Its Hessian matrix is $\begin{bmatrix} 0 & 2 \\ 2 & 0 \end{bmatrix}$, which is not positive semi-definite (the first eigenvalue is negative).
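A quick way to confirm the eigenvalue claims is to compute them with NumPy; the sketch below is an illustrative check, not part of the original solution.

```python
import numpy as np

# A twice-differentiable function is convex only if its Hessian is
# positive semi-definite, i.e. all eigenvalues are >= 0.
H1 = np.array([[2.0, 0.0],
               [0.0, -2.0]])   # Hessian of f1(x1, x2) = x1^2 - x2^2
H2 = np.array([[0.0, 2.0],
               [2.0, 0.0]])    # Hessian of f2(x1, x2) = 2*x1*x2

print(np.linalg.eigvalsh(H1))  # [-2.  2.] -> not PSD, so f1 is not convex
print(np.linalg.eigvalsh(H2))  # [-2.  2.] -> not PSD, so f2 is not convex
```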
10. D.
Proof: Remember that $\|x - u\| = \sqrt{\sum_{i=1}^{n}(x_i - u_i)^2}$. Now, consider $\frac{\partial}{\partial x_i} \frac{1}{\|x - u\|}$; we have the following:
$$\begin{aligned}
\frac{\partial}{\partial x_i} \frac{1}{\|x - u\|} &= \frac{\partial}{\partial x_i} \frac{1}{\sqrt{\sum_{i=1}^{n}(x_i - u_i)^2}} \\
&= -\frac{1}{2\left(\sum_{i=1}^{n}(x_i - u_i)^2\right)^{3/2}} \cdot \frac{\partial}{\partial x_i} \sum_{i=1}^{n}(x_i - u_i)^2 \\
&= -\frac{2(x_i - u_i)}{2\left(\sum_{i=1}^{n}(x_i - u_i)^2\right)^{3/2}} \\
&= -\frac{x_i - u_i}{\left(\sqrt{\sum_{i=1}^{n}(x_i - u_i)^2}\right)^{3}} = -\frac{x_i - u_i}{\|x - u\|^3}
\end{aligned}$$
Following this logic, we get the gradient $\nabla f(x) = -\dfrac{x - u}{\|x - u\|^3}$.
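The closed form can be verified numerically with a central finite difference; the sketch below uses arbitrary test vectors for $x$ and $u$ (not taken from the question).

```python
import numpy as np

# Finite-difference check of grad f for f(x) = 1 / ||x - u|| against the
# closed form -(x - u) / ||x - u||^3 derived above.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
u = rng.normal(size=5)

f = lambda v: 1.0 / np.linalg.norm(v - u)

h = 1e-6
numeric = np.array([
    (f(x + h * e) - f(x - h * e)) / (2 * h)      # central difference per coordinate
    for e in np.eye(len(x))
])
analytic = -(x - u) / np.linalg.norm(x - u) ** 3

print(np.max(np.abs(numeric - analytic)))        # tiny (~1e-10): the two agree
```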
11. A. Q must be positive definite
12. A. $\dfrac{\partial J}{\partial w_1} = (\hat{y} - y) \cdot \hat{y}(1 - \hat{y}) \cdot x_1$
Explanation: Let's say the two inputs to the neural network are $x_1$ and $x_2$, the two weights are $w_1$ and $w_2$, and the bias is $b$. The output of the neural network before applying the activation function is $z = w_1 x_1 + w_2 x_2 + b$. After applying the sigmoid activation function, the predicted output is $\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}$. Let's say the true output is $y$. The objective function, which is the mean squared error between the predicted output and the true output, is $J = \frac{1}{2}(\hat{y} - y)^2$.
∂J
The gradient of the objective function with respect to the first weight w1 is ∂w1 . We can use the chain
rule to compute this gradient as follows:
∂J ∂J ∂ ŷ ∂z
∂w1 = ∂ ŷ · ∂z · ∂w1
∂J
∂ ŷ = (ŷ − y)
∂ ŷ ′
∂z = σ (z) = σ(z)(1 − σ(z)) = ŷ(1 − ŷ)
∂z
∂w1 = x1
∂J
Substituting these values back into the expression for ∂w1 , we get:
∂J
∂w1 = (ŷ − y) · ŷ(1 − ŷ) · x1
13. B. $\dfrac{\partial J}{\partial b} = (\hat{y} - y) \cdot \hat{y}(1 - \hat{y})$
Explanation: The gradient of the objective function with respect to the bias is given by $\frac{\partial J}{\partial b} = (\hat{y} - y) \cdot \hat{y}(1 - \hat{y})$. This is because the bias term is added to the weighted sum of inputs before being passed through the activation function. The derivative of the objective function with respect to the bias is similar to the derivative with respect to a weight, except that the input term $x_i$ is replaced by 1, since the bias has no corresponding input.
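The expressions from questions 12 and 13 can be confirmed with a finite-difference check on a single sigmoid neuron; the concrete input, weight, and target values below are arbitrary choices for illustration, not part of the original solution.

```python
import numpy as np

# Single sigmoid neuron with squared-error loss J = 0.5 * (yhat - y)^2.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x1, x2, y = 0.7, -1.3, 1.0          # arbitrary inputs and target
w1, w2, b = 0.2, -0.4, 0.1          # arbitrary weights and bias

def loss(w1, w2, b):
    z = w1 * x1 + w2 * x2 + b
    return 0.5 * (sigmoid(z) - y) ** 2

yhat = sigmoid(w1 * x1 + w2 * x2 + b)
grad_w1 = (yhat - y) * yhat * (1 - yhat) * x1   # expression from question 12
grad_b = (yhat - y) * yhat * (1 - yhat)         # expression from question 13

# Central finite differences for comparison.
h = 1e-6
num_w1 = (loss(w1 + h, w2, b) - loss(w1 - h, w2, b)) / (2 * h)
num_b = (loss(w1, w2, b + h) - loss(w1, w2, b - h)) / (2 * h)

print(grad_w1, num_w1)   # the analytic and numeric values match closely
print(grad_b, num_b)
```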
14. A. $\dfrac{\partial J}{\partial w_1} = (\hat{y} - y) \cdot (1 - \hat{y}^2) \cdot x_1$.