Shallow Networks Versus Deep Networks
When we hear the name Neural Network, we tend to imagine a model with many hidden
layers, but there is also a type of neural network with only a few hidden layers. Shallow
neural networks consist of only 1 or 2 hidden layers. Understanding a shallow neural network
gives us insight into what exactly is going on inside a deep neural network. The figure
below shows a shallow neural network with 1 input layer, 1 hidden layer and 1 output layer.
Shallow vs. deep is a matter of degree. Logistic regression is a very shallow model, as it has
only one layer (remember, we don't count the input as a layer):
A deeper neural network simply has more hidden layers:
These are some of the notations which we will be using in the upcoming sections. Keep them in
mind as we proceed, or just quickly hop back here in case you miss something.
We can vectorize these steps for ‘m’ training examples as shown below:
Z[l] = W[l] A[l-1] + B[l]
A[l] = g[l](Z[l])
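As a concrete illustration, here is a minimal NumPy sketch of this vectorized forward step for a single layer. The function names and the choice of ReLU as the activation g[l] are assumptions made for illustration, not part of the original notation.

    import numpy as np

    def relu(Z):
        # element-wise ReLU activation
        return np.maximum(0, Z)

    def layer_forward(A_prev, W, b):
        # A_prev: activations of the previous layer, shape (n[l-1], m)
        # W:      weights of this layer,             shape (n[l], n[l-1])
        # b:      biases of this layer,              shape (n[l], 1)
        Z = np.dot(W, A_prev) + b   # Z[l] = W[l] A[l-1] + B[l], bias broadcast over the m examples
        A = relu(Z)                 # A[l] = g[l](Z[l])
        return Z, A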
The outputs of one layer act as the inputs to the next layer. We can't compute the forward
propagation for all the layers of a neural network without a for loop, so it's fine to have a for loop
here. Before moving further, let's look at the dimensions of the various matrices, which will help
us understand these steps better.
Can you figure out the number of layers (L) in this neural network? You are correct if you
guessed 5. There are 4 hidden layers and 1 output layer. The units in each layer are:
where ‘m’ is the number of training examples. These are some of the generalized matrix
dimensions, which will help you run your code smoothly.
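To make these dimensions concrete, here is a small sketch that initializes parameters with the generalized shapes W[l]: (n[l], n[l-1]) and b[l]: (n[l], 1), then runs the layer-by-layer for loop and checks that Z[l] and A[l] come out with shape (n[l], m). The specific layer sizes are assumed purely for illustration.

    import numpy as np

    # assumed layer sizes: n[0] inputs, 4 hidden layers, 1 output unit (L = 5)
    layer_dims = [8, 5, 5, 4, 3, 1]
    m = 10                                       # number of training examples

    params = {}
    for l in range(1, len(layer_dims)):
        params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))

    A = np.random.randn(layer_dims[0], m)        # A[0] = X, shape (n[0], m)
    for l in range(1, len(layer_dims)):
        Z = np.dot(params["W" + str(l)], A) + params["b" + str(l)]
        A = np.maximum(0, Z)                     # ReLU used here just as an example activation
        assert Z.shape == A.shape == (layer_dims[l], m)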
We have seen some of the basics of deep neural networks up to this point. But why do we need
deep representations?
Deep neural networks find relations within the data, from simpler to more complex relations. The
first hidden layer might be trying to find simple functions, like identifying the edges in the
above image. As we go deeper into the network, these simple functions combine to form more
complex functions, like identifying a face. Some common examples of leveraging a deep neural
network are:
Face recognition
o Image ==> Edges ==> Face parts ==> Faces ==> Desired face
Audio recognition
o Audio ==> Low-level sound features (like "sss", "bb") ==> Phonemes ==> Words ==> Sentences
Input: a[l-1]
Output: a[l]
This layer first calculates z[l], on which the activation is applied. This z[l] is saved as the cache.
For the backward propagation step, the block takes da[l], i.e., the derivative of the activation at
layer l, and uses it (together with the cache) to calculate dz[l], the derivative of the weights dw[l],
db[l], and finally da[l-1]. Let's visualize these steps to reduce the complexity:
This is how each block (layer) of a deep neural network works. Next, we will see how to
implement all of these blocks.
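Below is a minimal sketch of one such block, assuming a ReLU activation. The forward function saves the quantities needed later (including z[l]) as the cache, and the backward function consumes da[l] together with that cache to produce da[l-1], dw[l] and db[l]. The function names are assumptions made for illustration.

    import numpy as np

    def block_forward(A_prev, W, b):
        # forward pass of one layer: input a[l-1], output a[l]
        Z = np.dot(W, A_prev) + b
        A = np.maximum(0, Z)                          # ReLU activation
        cache = (A_prev, W, Z)                        # saved for the backward pass
        return A, cache

    def block_backward(dA, cache):
        # backward pass of one layer: input da[l], outputs da[l-1], dw[l], db[l]
        A_prev, W, Z = cache
        m = A_prev.shape[1]
        dZ = dA * (Z > 0)                             # dz[l]: ReLU derivative is 1 where Z > 0
        dW = np.dot(dZ, A_prev.T) / m                 # dw[l]
        db = np.sum(dZ, axis=1, keepdims=True) / m    # db[l]
        dA_prev = np.dot(W.T, dZ)                     # da[l-1]
        return dA_prev, dW, db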
For the forward pass, we will calculate Z and A for each layer of the network:
Z[l] = W[l] A[l-1] + B[l]
A[l] = g[l](Z[l])
After calculating the activations, the next step is backward propagation, where we update the
weights using the derivatives. The input for backward propagation is da[l] and the outputs are
da[l-1], dW[l] and db[l]. Let's look at the vectorized equations for backward propagation:
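In the same notation as the forward pass (with * denoting element-wise multiplication and g[l]' the derivative of the activation at layer l), the standard vectorized equations are:
dZ[l] = dA[l] * g[l]'(Z[l])
dW[l] = (1/m) dZ[l] A[l-1]^T
db[l] = (1/m) (sum of the columns of dZ[l])
dA[l-1] = W[l]^T dZ[l]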
Deep Neural Networks perform surprisingly well (maybe not so surprising if you’ve used them
before!). Running only a few lines of code gives us satisfactory results. This is because we are
feeding a large amount of data to the network and it is learning from that data using the hidden
layers.
Choosing the right hyperparameters helps us make our model more efficient. The parameters of a
deep neural network are W and b, which the model updates during the backward propagation step.
On the other hand, there are a lot of hyperparameters for a deep NN, including the following (a
small sketch of collecting them in one place appears after the list):
Learning rate – α
Number of iterations
Number of hidden layers
Units in each hidden layer
Choice of activation function
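As a small illustration, these hyperparameters are often collected in one place and passed to whatever training routine you use. The specific values and the train_model name below are assumptions for illustration only, not recommendations.

    # hypothetical example of gathering the hyperparameters listed above
    hyperparameters = {
        "learning_rate": 0.01,               # alpha
        "num_iterations": 2500,              # number of gradient-descent iterations
        "layer_dims": [8, 5, 5, 4, 3, 1],    # number of hidden layers and units in each
        "hidden_activation": "relu",         # choice of activation function
    }

    # parameters = train_model(X_train, Y_train, **hyperparameters)   # hypothetical call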