
Lecture 2

Shallow Neural Networks


Shallow Neural Networks
 In the single-layer neural network, the training process is relatively straightforward because
the error (or loss function) can be computed as a direct function of the weights, which
allows easy gradient computation.
 In the case of multi-layer networks, the problem is that the loss is a complicated
composition function of the weights in earlier layers.
 The gradient of a composition function is computed using the backpropagation algorithm.
 The backpropagation algorithm leverages the chain rule of differential calculus, which
computes the error gradients in terms of summations of local-gradient products over the
various paths from a node to the output.
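To make the chain rule concrete, here is a small sketch (not taken from the slides; the input values and the squared-error loss are illustrative assumptions) that multiplies the local gradients along the single path from w to z to a to L and checks the result numerically:

```python
import numpy as np

# Minimal sketch: the chain rule as a product of local gradients for the
# composition L = (a - y)^2 with a = sigmoid(z) and z = w*x + b.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, x, y = 0.5, 0.1, 2.0, 1.0

# Forward: compute each intermediate quantity.
z = w * x + b
a = sigmoid(z)
L = (a - y) ** 2

# Backward: multiply the local gradients along the path w -> z -> a -> L.
dL_da = 2 * (a - y)          # dL/da
da_dz = a * (1 - a)          # d(sigmoid)/dz
dz_dw = x                    # dz/dw
dL_dw = dL_da * da_dz * dz_dw

# Numerical check of the chain-rule result.
eps = 1e-6
L_eps = (sigmoid((w + eps) * x + b) - y) ** 2
print(dL_dw, (L_eps - L) / eps)   # the two values should nearly agree
```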
Neural Network Representation

[Figure: a neural network with inputs x1, x2, x3, one hidden layer, and output ŷ]


Shallow Neural Networks

 The backpropagation algorithm contains two main phases, referred to as the forward and backward passes, respectively.
 Forward pass: In this pass, the inputs for a training instance are fed into the neural
network. This results in a forward cascade of computations across the layers, using the
current set of weights.
 Backward pass: The main goal of the backward pass is to learn the gradient of the loss
function with respect to the different weights by using the chain rule of differential
calculus. These gradients are used to update the weights.
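As a hedged illustration of the two passes, the sketch below runs one forward and one backward pass on a tiny 2-layer network; the sigmoid activations, squared-error loss, layer sizes, and learning rate are assumptions made for this example, not choices fixed by the slides:

```python
import numpy as np

# Sketch: forward and backward pass on a tiny network with 3 inputs,
# 2 hidden units and 1 output unit, followed by one weight update.
rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([[0.5], [-1.0], [2.0]])    # one training instance, shape (3, 1)
y = 1.0                                 # its target

W1, b1 = rng.standard_normal((2, 3)), np.zeros((2, 1))
W2, b2 = rng.standard_normal((1, 2)), np.zeros((1, 1))

# Forward pass: a cascade of computations using the current weights.
z1 = W1 @ x + b1
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2
a2 = sigmoid(z2)
loss = 0.5 * (a2 - y) ** 2

# Backward pass: chain rule, propagating gradients from the output back.
dz2 = (a2 - y) * a2 * (1 - a2)          # dL/dz2
dW2, db2 = dz2 @ a1.T, dz2
dz1 = (W2.T @ dz2) * a1 * (1 - a1)      # dL/dz1
dW1, db1 = dz1 @ x.T, dz1

# The gradients are used to update the weights.
lr = 0.1
W1 -= lr * dW1; b1 -= lr * db1
W2 -= lr * dW2; b2 -= lr * db2
```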
Calculating a Neural Network's Output
Forward Pass
Neural Network Representation
Consider the following representation of a neural network.
 It has two layers, i.e., one hidden layer and one output layer.
 The input layer is referred to as a[0], the hidden layer as a[1], and the output layer as a[2]. Here 'a' stands for activations.
 The corresponding parameters are W[1], b[1] and W[2], b[2].

[Figure: the network described above, with inputs x1, x2, x3 feeding the hidden layer a[1], which feeds the output unit a[2] = ŷ]


Computing a Neural Network’s Output
Let's look in detail at how each neuron of a neural network works. Each neuron takes its inputs, performs a linear operation on them (it calculates z = wᵀx + b), and then applies an activation function, here the sigmoid:

z = wᵀx + b
a = σ(z)

[Figure: a single neuron computing z = wᵀx + b and a = σ(z), shown alongside the full network producing ŷ]
Computing a Neural Network’s Output
This step is performed by each neuron in the network.

[Figure: the z = wᵀx + b and a = σ(z) computation highlighted inside one hidden neuron of the full network]
Computing a Neural Network’s Output
 This step is performed by each neuron. The equations for the first hidden layer with four neurons will be:

z₁[1] = w₁[1]ᵀx + b₁[1],   a₁[1] = σ(z₁[1])
z₂[1] = w₂[1]ᵀx + b₂[1],   a₂[1] = σ(z₂[1])
z₃[1] = w₃[1]ᵀx + b₃[1],   a₃[1] = σ(z₃[1])
z₄[1] = w₄[1]ᵀx + b₄[1],   a₄[1] = σ(z₄[1])
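A minimal NumPy sketch of these per-neuron equations, looping over the four hidden neurons one at a time (the input values and random weights are hypothetical):

```python
import numpy as np

# Each hidden neuron i computes z_i[1] = w_i[1]^T x + b_i[1] and a_i[1] = sigmoid(z_i[1]).
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])              # inputs x1, x2, x3
W1 = np.random.randn(4, 3)                  # row i holds w_i[1]^T
b1 = np.zeros(4)

a1 = np.zeros(4)
for i in range(4):                          # one loop iteration per hidden neuron
    z1_i = W1[i] @ x + b1[i]
    a1[i] = sigmoid(z1_i)
print(a1)
```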


Computing a Neural Network’s Output
So, for a given input x, the outputs for each layer will be:

z[1] = W[1]x + b[1]
a[1] = σ(z[1])
z[2] = W[2]a[1] + b[2]
a[2] = σ(z[2]) = ŷ

To compute these outputs, we could run a for loop that calculates the values individually for each neuron. But recall that using an explicit for loop makes the computations very slow, and hence we should vectorize the code to get rid of this for loop and run it faster.
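The same computation without the per-neuron loop, using the layer-level equations above; a sketch assuming a 3-input, 4-hidden-unit, 1-output network with sigmoid activations and arbitrary weights:

```python
import numpy as np

# Layer-level forward pass for a single input x: W1 is (4, 3), b1 is (4, 1),
# W2 is (1, 4), b2 is (1, 1), so no loop over individual neurons is needed.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = np.array([[0.5], [-1.0], [2.0]])
W1, b1 = np.random.randn(4, 3), np.zeros((4, 1))
W2, b2 = np.random.randn(1, 4), np.zeros((1, 1))

z1 = W1 @ x + b1        # z[1] = W[1] x + b[1]
a1 = sigmoid(z1)        # a[1] = sigma(z[1])
z2 = W2 @ a1 + b2       # z[2] = W[2] a[1] + b[2]
a2 = sigmoid(z2)        # a[2] = sigma(z[2]) = y_hat
```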
Vectorizing across multiple examples
The non-vectorized form of computing the output from a neural network is:

for i = 1 to m:
    z[1](i) = W[1]x(i) + b[1]
    a[1](i) = σ(z[1](i))
    z[2](i) = W[2]a[1](i) + b[2]
    a[2](i) = σ(z[2](i))
Using this for loop, we calculate the z and a values for each training example separately. Now we will look at how this can be vectorized. All the training examples will be merged into a single matrix X:
Vectorizing across multiple examples
With the m examples stacked as the columns of X (and the corresponding z[1](i), a[1](i), z[2](i), a[2](i) stacked column-wise into Z[1], A[1], Z[2], A[2]), the explicit loop disappears and the forward pass becomes:

Z[1] = W[1]X + b[1]
A[1] = σ(Z[1])
Z[2] = W[2]A[1] + b[2]
A[2] = σ(Z[2])
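A NumPy sketch of this stacked-matrix forward pass; the network sizes, random weights, and the choice of m = 5 examples are illustrative assumptions:

```python
import numpy as np

# Forward pass for all m examples at once. Each column of X is one training
# example, so Z1, A1, Z2, A2 also hold one column per example.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

m = 5
X = np.random.randn(3, m)                   # 3 features, m examples
W1, b1 = np.random.randn(4, 3), np.zeros((4, 1))
W2, b2 = np.random.randn(1, 4), np.zeros((1, 1))

Z1 = W1 @ X + b1        # b1 broadcasts across the m columns
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)        # shape (1, m): one prediction per example
```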


Activation functions
What are activation functions?
An activation function decides whether a neuron should be activated or not. The purpose of the activation function is to introduce non-linearity into the output of a neuron.
Why do we need non-linear activation functions?
A neural network without an activation function is essentially just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
Sigmoid activation function
It is a function which is plotted as an 'S'-shaped graph.

sigmoid: a = 1 / (1 + e^(-z))

[Figure: sigmoid curve, a versus z]

Nature: Non-linear.
Value Range: 0 to 1.
Uses: Usually used in the output layer of a binary classifier, where the result is either 0 or 1. Since the sigmoid's value lies between 0 and 1 only, the result can easily be predicted as 1 if the value is greater than 0.5 and 0 otherwise.
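A minimal NumPy sketch of the sigmoid and this 0.5 decision threshold (the sample inputs are arbitrary):

```python
import numpy as np

# Sigmoid activation and the 0.5 threshold for binary classification.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, 0.0, 3.0])
a = sigmoid(z)                 # values lie strictly between 0 and 1
pred = (a > 0.5).astype(int)   # predict class 1 when a > 0.5, else class 0
print(a, pred)                 # approximately [0.047 0.5 0.953] and [0 0 1]
```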
Tanh activation function
The activation that almost always works better than the sigmoid function is the tanh function, also known as the hyperbolic tangent function. It is a mathematically shifted and scaled version of the sigmoid function.

Formula: tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z)) = 2 * sigmoid(2z) - 1

[Figure: tanh curve, a versus z]

Value Range: -1 to +1.
Nature: Non-linear.
Uses: Usually used in the hidden layers of a neural network. Because its values lie between -1 and 1, the mean of the hidden-layer activations comes out to be 0 or very close to it, which helps centre the data by bringing the mean close to 0. This makes learning for the next layer much easier.
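A short sketch checking the shifted-sigmoid identity above against NumPy's built-in tanh (the inputs are arbitrary):

```python
import numpy as np

# Verify tanh(z) = 2*sigmoid(2z) - 1 and the near-zero mean for centred inputs.
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-3, 3, 7)
print(np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1))   # True
print(np.tanh(z).mean())   # close to 0 for inputs centred around 0
```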
ReLU activation function
ReLU stands for Rectified Linear Unit. It is the most widely used activation function, mainly implemented in the hidden layers of neural networks.

Equation: A(z) = max(0, z). It gives an output of z if z is positive and 0 otherwise.

[Figure: ReLU curve, a versus z]

Value Range: [0, inf).
Nature: Non-linear, which means we can easily backpropagate the errors and have multiple layers of neurons activated by the ReLU function.
Uses: ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. In simple words, ReLU learns much faster than the sigmoid and tanh functions.
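A minimal sketch of A(z) = max(0, z) in NumPy (sample inputs arbitrary):

```python
import numpy as np

# ReLU: a single cheap elementwise comparison.
def relu(z):
    return np.maximum(0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z))   # [0. 0. 0. 1.5]
```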
Leaky ReLU activation function
 It is an attempt to solve the dying ReLU problem.
 Equation: A(z) = max(0.01z, z). It gives an output of z if z is positive and 0.01z otherwise.
 The leak helps to increase the range of the ReLU function. Usually, the value of the slope a is 0.01 or so. When a is chosen randomly rather than fixed, the function is called Randomized ReLU. The range of the Leaky ReLU is (-infinity, +infinity).
 Both the Leaky and Randomized ReLU functions are monotonic in nature, and so are their derivatives.
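A minimal sketch of the leaky variant with slope a = 0.01 (sample inputs arbitrary):

```python
import numpy as np

# Leaky ReLU: a small slope a (0.01 here) for negative inputs instead of a
# hard zero, so gradients do not die when z < 0.
def leaky_relu(z, a=0.01):
    return np.where(z > 0, z, a * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(z))   # -0.02, -0.005, 0.0, 1.5
```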
Softmax activation function
 The softmax function is a generalization of the sigmoid function that is handy when we are trying to handle classification problems with multiple classes.
 Nature: Non-linear.
 Uses: Usually used when handling multiple classes. The softmax function squeezes the output for each class to between 0 and 1 and divides by the sum of the outputs, so the outputs can be read as probabilities.
 Output: The softmax function is ideally used in the output layer of the classifier, where we are actually trying to obtain the probabilities that define the class of each input.
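A short sketch of softmax on a vector of class scores; subtracting the maximum score before exponentiating is a common numerical-stability trick and does not change the result (the scores are arbitrary):

```python
import numpy as np

# Softmax: exponentiate the scores and divide by their sum.
def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p, p.sum())   # each entry is in (0, 1) and the entries sum to 1
```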
Activation Functions

Activation Function | Pros | Cons
Sigmoid | Useful for binary classification | Output is restricted between 0 and 1
tanh | Better than sigmoid | Parameters are updated slowly when points are at extreme ends
ReLU | Parameters are updated faster, as the slope is 1 when x > 0 | Zero slope when x < 0

• The basic rule of thumb is that if you really don't know what activation function to use, then simply use ReLU, as it is a general-purpose activation function and is used in most cases these days.
• If your output is for binary classification, then the sigmoid function is a very natural choice for the output layer.
Choosing a good W

f(x,W) = Wx + b

1. Use a loss function to quantify how good a value of W is

2. Find a W that minimizes the loss function (optimization)
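As a hedged illustration of these two steps, the sketch below scores two randomly drawn candidate weight matrices with a mean squared error loss and keeps the better one; the loss, data, and sizes are assumptions made for the example, not prescribed by the slides:

```python
import numpy as np

# Quantify how good two candidate W matrices are, then keep the lower-loss one.
def f(x, W, b):
    return W @ x + b

X = np.random.randn(4, 10)            # 10 examples with 4 features each
Y = np.random.randn(3, 10)            # 3 outputs per example (hypothetical targets)
b = np.zeros((3, 1))

candidates = [np.random.randn(3, 4) for _ in range(2)]
losses = [np.mean((f(X, W, b) - Y) ** 2) for W in candidates]
best = int(np.argmin(losses))
print(losses, "keeping candidate", best)
```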



Loss Function
• A loss function tells how good our current classifier is.
• Low loss = good classifier; high loss = bad classifier.
• (Also called: objective function; cost function.)
• The negative of a loss function is sometimes called a reward function, profit function, utility function, fitness function, etc.

Given a dataset of examples {(xᵢ, yᵢ)} for i = 1, …, N, where xᵢ is an image and yᵢ is an (integer) label, the loss for a single example is Lᵢ(f(xᵢ, W), yᵢ).
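As an illustration of the per-example loss Lᵢ(f(xᵢ, W), yᵢ), the sketch below uses softmax cross-entropy, one common choice when the labels are integer class indices; the slides do not commit to a particular loss, and the sizes and values here are hypothetical:

```python
import numpy as np

# Per-example loss L_i(f(x_i, W), y_i) using softmax cross-entropy.
def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

W = np.random.randn(3, 4)      # 3 classes, 4 input features (hypothetical sizes)
x_i = np.random.randn(4)       # one example
y_i = 2                        # its integer label

scores = W @ x_i               # f(x_i, W)
L_i = -np.log(softmax(scores)[y_i])
print(L_i)                     # low when the classifier scores class y_i highly
```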
Next Lecture:
Gradient descent for neural networks
Backward Pass
