
SHALLOW NETWORKS VERSUS DEEP NETWORKS

When we hear the name "neural network", we tend to picture many hidden layers, but there are neural networks with only a few. Shallow neural networks consist of only 1 or 2 hidden layers. Understanding a shallow neural network gives us insight into what exactly is going on inside a deep neural network. The figure below shows a shallow neural network with 1 input layer, 1 hidden layer and 1 output layer.

Deep L-Layer Neural Network


In this section, we will look at how the concepts of forward and backward propagation can be applied to deep neural networks. But you might be wondering at this point: what exactly are deep neural networks?

Shallow versus deep is a matter of degree. Logistic regression is a very shallow model, as it has only one layer (remember, we don't count the input as a layer). A deeper neural network simply has more hidden layers.

Let’s look at some of the notations related to deep neural networks:

 L is the number of layers in the neural network
 n[l] is the number of units in layer l
 a[l] is the vector of activations in layer l
 W[l] is the weight matrix used to compute z[l], and b[l] is the corresponding bias vector

These are some of the notations which we will be using in the upcoming sections. Keep them in
mind as we proceed, or just quickly hop back here in case you miss something.

Forward Propagation in a Deep Neural Network


For a single training example, the forward propagation steps can be written as:

z[l] = W[l]a[l-1] + b[l]

a[l] = g[l] (z[l])

We can vectorize these steps for ‘m’ training examples as shown below:
Z[l] = W[l] A[l-1] + b[l]

A[l] = g[l] (Z[l])

The outputs of one layer act as the input to the next layer. Since each layer depends on the previous one, we cannot compute forward propagation for all the layers of a neural network without a for loop, so it is fine to use one here; a short sketch of this loop is given below. After that, we will look at the dimensions of the various matrices, which will help us understand these steps better.
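As a rough NumPy sketch of this loop (the activation choices and the dictionary layout for the parameters are illustrative assumptions, not prescribed by the text):

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_propagation(X, parameters, L):
    """Loop over layers 1..L, computing Z[l] = W[l] A[l-1] + b[l] and A[l] = g[l](Z[l])."""
    A = X                                        # A[0] is the input matrix of shape (n[0], m)
    caches = []
    for l in range(1, L + 1):
        W = parameters["W" + str(l)]             # weights for layer l
        b = parameters["b" + str(l)]             # bias for layer l
        Z = np.dot(W, A) + b                     # vectorized linear step
        A = sigmoid(Z) if l == L else relu(Z)    # assumed: ReLU hidden layers, sigmoid output
        caches.append((Z, A))                    # saved for the backward pass
    return A, caches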

Getting your matrix dimensions right


Analyzing matrix dimensions is one of the best debugging tools for checking whether our code is correct. In this section, we will discuss the correct dimensions for each matrix. Consider the following example:

Can you figure out the number of layers (L) in this neural network? You are correct if you
guessed 5. There are 4 hidden layers and 1 output layer. The units in each layer are:

n[0] = 2, n[1] = 3, n[2] = 5, n[3] = 4, n[4] = 2, and n[5] = 1

The generalized form of dimensions of W, b and their derivatives is:

 W[l] = (n[l], n[l-1])


 b[l] = (n[l], 1)
 dW[l] = (n[l], n[l-1])
 db[l] = (n[l], 1)
 Dimension of Z[l], A[l], dZ[l] and dA[l] = (n[l], m)

where ‘m’ is the number of training examples. Keeping these generalized dimensions in mind will help you run your code smoothly.
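As a small illustrative sketch (the helper name, the random seed and the small random initialization are assumptions, not from the text), we can create parameters with exactly these shapes and assert them:

import numpy as np

layer_dims = [2, 3, 5, 4, 2, 1]          # n[0] ... n[5] from the example above

def initialize_parameters(layer_dims):
    """Create W[l] with shape (n[l], n[l-1]) and b[l] with shape (n[l], 1)."""
    np.random.seed(0)
    parameters = {}
    for l in range(1, len(layer_dims)):
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
        # These assertions mirror the generalized dimensions listed above
        assert parameters["W" + str(l)].shape == (layer_dims[l], layer_dims[l - 1])
        assert parameters["b" + str(l)].shape == (layer_dims[l], 1)
    return parameters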

We have seen some of the basics of deep neural networks up to this point. But why do we need deep representations?

Deep neural networks learn relations in the data, progressing from simple to complex ones. The first hidden layer might be finding simple functions, such as identifying edges in an image. As we go deeper into the network, these simple functions combine to form more complex functions, such as identifying a face. Some common examples of leveraging a deep neural network are:


 Face Recognition
o Image ==> Edges ==> Face parts ==> Faces ==> desired face
 Audio recognition
o Audio ==> Low-level sound features (like "sss", "bb") ==> Phonemes ==> Words ==> Sentences

Building Blocks of Deep Neural Networks


Consider any layer in a deep neural network. The input to this layer will be the activations from
the previous layer (l-1), and the output of this layer will be its own activations.

 Input: a[l-1]
 Output: a[l]

This layer first calculates z[l], to which the activation function is applied; this z[l] is saved as a cache. In the backward propagation step, the layer takes da[l], i.e., the derivative of the activation at layer l, and uses it (together with the cached z[l]) to compute dz[l], the weight derivatives dW[l] and db[l], and finally da[l-1]. Let's visualize these steps to reduce the complexity:
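As a minimal sketch of one block's forward step (the function name and cache layout are illustrative assumptions):

import numpy as np

def layer_forward(A_prev, W, b, g):
    """Forward step of one block: takes a[l-1], returns a[l] plus a cache for backprop."""
    Z = np.dot(W, A_prev) + b      # z[l] = W[l] a[l-1] + b[l]
    A = g(Z)                       # a[l] = g[l](z[l])
    cache = (A_prev, W, b, Z)      # z[l] and the values around it are saved as the cache
    return A, cache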
This is how each block (layer) of a deep neural network works. Next, we will see how to
implement all of these blocks.

Forward and Backward Propagation


The input to a forward propagation step is a[l-1], and the outputs are a[l] and the cache z[l], which is a function of W[l] and b[l]. The vectorized form for calculating Z[l] and A[l] is:

Z[l] = W[l] * A[l-1] + b[l]

A[l] = g[l](Z[l])

We will calculate Z and A for each layer of the network. After calculating the activations, the next step is backward propagation, where we compute the derivatives used to update the weights. The input to backward propagation is dA[l], and the outputs are dA[l-1], dW[l] and db[l]. Let's look at the vectorized equations for backward propagation:

dZ[l] = dA[l] * g'[l](Z[l])    (element-wise product)

dW[l] = 1/m * np.dot(dZ[l], A[l-1].T)

db[l] = 1/m * np.sum(dZ[l], axis=1, keepdims=True)

dA[l-1] = np.dot(W[l].T, dZ[l])

This is how we implement deep neural networks.
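A minimal NumPy sketch of these equations for one layer, assuming the cache layout from the forward sketch above (g_prime is the derivative of that layer's activation):

import numpy as np

def layer_backward(dA, cache, g_prime):
    """Backward step of one layer, implementing the vectorized equations above."""
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]                            # number of training examples
    dZ = dA * g_prime(Z)                           # dZ[l] = dA[l] * g'[l](Z[l])
    dW = np.dot(dZ, A_prev.T) / m                  # dW[l]
    db = np.sum(dZ, axis=1, keepdims=True) / m     # db[l]
    dA_prev = np.dot(W.T, dZ)                      # dA[l-1]
    return dA_prev, dW, db

The returned dW[l] and db[l] are then used to update the parameters, for example W[l] = W[l] - ⍺ * dW[l] with learning rate ⍺.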

Deep Neural Networks perform surprisingly well (maybe not so surprising if you’ve used them
before!). Running only a few lines of code gives us satisfactory results. This is because we are
feeding a large amount of data to the network and it is learning from that data using the hidden
layers.

Choosing the right hyperparameters helps us make our model more efficient.

Parameters vs Hyperparameters


This is an oft-asked question among deep learning newcomers. The major difference is that parameters are learned by the model during training, while hyperparameters are set (and can be changed) before training the model.

The parameters of a deep neural network are W and b, which the model updates during the backpropagation step. A deep NN, on the other hand, has many hyperparameters, including the ones listed below (a small illustrative sketch follows the list):

 Learning rate – ⍺
 Number of iterations
 Number of hidden layers
 Units in each hidden layer
 Choice of activation function
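As a rough sketch of this split (all values below are illustrative assumptions, not recommendations from the text):

# Hyperparameters: chosen before training (illustrative values only)
hyperparameters = {
    "learning_rate": 0.01,            # ⍺
    "num_iterations": 1000,
    "num_hidden_layers": 4,
    "units_per_layer": [3, 5, 4, 2],
    "activation": "relu",
}

# Parameters: W[l] and b[l], learned during training, e.g. updated as
#   W[l] = W[l] - learning_rate * dW[l]
#   b[l] = b[l] - learning_rate * db[l]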
