Challenging Questions
1. Consider the case of the XOR function, in which the two points {(0, 0), (1, 1)} belong to one class and the other two points {(1, 0), (0, 1)} belong to the other class. Show how you can use the ReLU activation function to separate the two classes in a manner similar to the example in Figure 1.14.
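As a sanity check for Exercise 1, here is a minimal NumPy sketch of one possible construction; the particular weights are one choice of many in the spirit of Figure 1.14, not taken from the book. The two hidden units compute ReLU(x1 − x2) and ReLU(x2 − x1), and their sum is 0 on one class and 1 on the other:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

# One of many valid weight choices (an assumption, not the book's):
# h1 = ReLU(x1 - x2), h2 = ReLU(x2 - x1), z = h1 + h2
X = np.array([[0, 0], [1, 1], [1, 0], [0, 1]], dtype=float)
W = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])    # one row per hidden unit
H = relu(X @ W.T)               # hidden-layer activations
z = H.sum(axis=1)               # the output unit simply sums

print(z)  # [0. 0. 1. 1.]: thresholding z at 0.5 separates the classes
```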
2. Show the following properties of the sigmoid and tanh activation functions (denoted by Φ(·) in each case):
(a) Sigmoid activation: Φ(−v) = 1 − Φ(v)
(b) Tanh activation: Φ(−v) = −Φ(v)
(c) Hard tanh activation: Φ(−v) = −Φ(v)
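For Exercise 2, part (a) is representative; a one-line algebraic manipulation of the sigmoid definition Φ(v) = 1/(1 + e^{−v}) gives the identity, and (b) and (c) follow the same pattern:

```latex
\Phi(-v) = \frac{1}{1+e^{v}}
         = \frac{e^{-v}}{e^{-v}+1}
         = \frac{(1+e^{-v})-1}{1+e^{-v}}
         = 1 - \Phi(v).
```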
3. Show that the tanh function is a re-scaled sigmoid function with both horizontal and vertical stretching, as well as vertical translation: tanh(v) = 2 · sigmoid(2v) − 1
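A worked route for Exercise 3, starting from the right-hand side (the last step multiplies numerator and denominator by e^{v}):

```latex
2\,\mathrm{sigmoid}(2v) - 1
  = \frac{2}{1+e^{-2v}} - 1
  = \frac{1-e^{-2v}}{1+e^{-2v}}
  = \frac{e^{v}-e^{-v}}{e^{v}+e^{-v}}
  = \tanh(v).
```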
4. Consider a data set in which the two points {(−1, −1), (1, 1)} belong to one class, and the other two points {(1, −1), (−1, 1)} belong to the other class. Start with perceptron parameter values at (0, 0), and work out a few stochastic gradient-descent updates with α = 1. While performing the stochastic gradient-descent updates, cycle through the training points in any order.
(a) Does the algorithm converge in the sense that the change in objective function becomes extremely small over time?
(b) Explain why the situation in (a) occurs.
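A minimal sketch for Exercise 4, assuming the classical perceptron update w ← w + α·y·x on misclassified points (one common reading of stochastic gradient descent on the perceptron criterion); the exact trajectory depends on the cycling order you choose:

```python
import numpy as np

# Labels: +1 for {(-1,-1), (1,1)}, -1 for {(1,-1), (-1,1)}
X = np.array([[-1, -1], [1, 1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1], dtype=float)

w, alpha = np.zeros(2), 1.0
for epoch in range(3):
    for xi, yi in zip(X, y):
        if yi * (w @ xi) <= 0:       # misclassified (or on the boundary)
            w = w + alpha * yi * xi  # perceptron update
    print(w)                         # returns to [0. 0.] every epoch

# The weights cycle instead of settling: the data is the XOR pattern,
# which is not linearly separable, so no weight vector classifies all
# four points correctly and the objective keeps oscillating.
```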
5. For the data set in Exercise 4, where the two features are denoted by (x1, x2), define a new 1-dimensional representation z as follows: z = x1 · x2. Is the data set linearly separable in terms of the 1-dimensional representation corresponding to z? Explain the importance of nonlinear transformations in classification problems.
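For Exercise 5, a quick check of the transformation z = x1 · x2 on the four points:

```python
points = {(-1, -1): +1, (1, 1): +1, (1, -1): -1, (-1, 1): -1}
for (x1, x2), label in points.items():
    print((x1, x2), "-> z =", x1 * x2, " label:", label)
# z = +1 for the first class and z = -1 for the second, so a simple
# threshold at z = 0 separates them in the 1-dimensional space.
```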
http://ndl.ethernet.edu.et/bitstream/123456789/88552/1/2018_Book_NeuralNetworksAndDeepLearning.pdf
1. Consider the following recurrence:

(x_{t+1}, y_{t+1}) = (f(x_t, y_t), g(x_t, y_t))    (3.66)

Here, f(·) and g(·) are multivariate functions.
(a) Derive an expression for ∂x_{t+2}/∂x_t in terms of only x_t and y_t.
(b) Can you draw an architecture of a neural network corresponding to the above recursion for t varying from 1 to 5? Assume that the neurons can compute any function you want.
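A sketch of the chain-rule setup for part (a); substituting (x_{t+1}, y_{t+1}) = (f(x_t, y_t), g(x_t, y_t)) into the partial derivatives of f expresses everything in terms of x_t and y_t:

```latex
\frac{\partial x_{t+2}}{\partial x_t}
  = \frac{\partial f}{\partial x}(x_{t+1}, y_{t+1})\,
    \frac{\partial f}{\partial x}(x_t, y_t)
  + \frac{\partial f}{\partial y}(x_{t+1}, y_{t+1})\,
    \frac{\partial g}{\partial x}(x_t, y_t).
```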
2. Consider a two-input neuron that multiplies its two inputs x1 and x2 to obtain the output o. Let L be the loss function that is computed at o. Suppose that you know that ∂L/∂o = 5, x1 = 2, and x2 = 3. Compute the values of ∂L/∂x1 and ∂L/∂x2.
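A quick chain-rule check for Exercise 2: since o = x1 · x2, the local derivatives are ∂o/∂x1 = x2 and ∂o/∂x2 = x1:

```python
dL_do, x1, x2 = 5.0, 2.0, 3.0

dL_dx1 = dL_do * x2   # 5 * 3 = 15
dL_dx2 = dL_do * x1   # 5 * 2 = 10
print(dL_dx1, dL_dx2)
```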
3. Consider a neural network with three layers, including the input layer. The first (input) layer has four inputs x1, x2, x3, and x4. The second layer has six hidden units corresponding to all pairwise multiplications. The output node o simply adds the values in the six hidden units. Let L be the loss at the output node. Suppose that you know that ∂L/∂o = 2, and that x1 = 1, x2 = 2, x3 = 3, and x4 = 4. Compute ∂L/∂xi for each i.
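A numeric sketch for Exercise 3: the output is o = Σ_{i<j} xi · xj, so ∂o/∂xi is the sum of the other three inputs:

```python
from itertools import combinations

x = [1.0, 2.0, 3.0, 4.0]
dL_do = 2.0

o = sum(a * b for a, b in combinations(x, 2))  # the six hidden units, summed
for i in range(4):
    do_dxi = sum(x) - x[i]                     # sum of the other inputs
    print(f"dL/dx{i+1} = {dL_do * do_dxi}")
# dL/dx1 = 18, dL/dx2 = 16, dL/dx3 = 14, dL/dx4 = 12
```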
4. How does your answer to the previous question change when the output o is computed as a
maximum of its six inputs rather than its sum?
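For Exercise 4, only the hidden unit that attains the maximum passes the gradient through (ignoring ties, where the maximum is not differentiable); adapting the sketch above:

```python
from itertools import combinations

x = [1.0, 2.0, 3.0, 4.0]
dL_do = 2.0

pairs = list(combinations(range(4), 2))
i, j = max(pairs, key=lambda p: x[p[0]] * x[p[1]])  # winner: x3 * x4 = 12
grad = [0.0] * 4
grad[i] = dL_do * x[j]   # dL/dx3 = 2 * 4 = 8
grad[j] = dL_do * x[i]   # dL/dx4 = 2 * 3 = 6
print(grad)              # [0.0, 0.0, 8.0, 6.0]
```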
5. The chapter discusses (cf. Table 3.1) how one can perform backpropagation of an arbitrary function by multiplying with the Jacobian matrix. Discuss why one must be careful in using this matrix-centric approach. [Hint: Compute the Jacobian with respect to the sigmoid function.]
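A sketch of the point behind the hint for Exercise 5: for an elementwise activation such as the sigmoid, the Jacobian is diagonal, so materializing and multiplying the full d × d matrix wastes O(d²) time and memory on entries that are almost all zero:

```python
import numpy as np

v = np.array([0.5, -1.0, 2.0])
s = 1.0 / (1.0 + np.exp(-v))       # elementwise sigmoid
g = np.array([1.0, 1.0, 1.0])      # some upstream gradient

# Matrix-centric: build the full (diagonal) Jacobian, then multiply: O(d^2).
J = np.diag(s * (1.0 - s))
print(J.T @ g)

# Exploiting the diagonal structure directly: O(d), same result.
print(s * (1.0 - s) * g)
```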