4-Neural Networks and Activation Function
[Figure: a two-layer neural network with inputs x1, x2, x3 and output ŷ]
Neural Network Representation
Consider the following representation of a neural network.
It has two layers, i.e., one hidden layer and one output layer.
The activations of the input layer are referred to as a[0], those of the hidden layer as a[1], and those of the output layer as a[2]. Here 'a' stands for activations.
The corresponding parameters are W[1], b[1] for the hidden layer and W[2], b[2] for the output layer.
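As a concrete illustration of these parameter shapes, here is a minimal NumPy sketch; the layer sizes (3 inputs, 4 hidden units, 1 output) are an assumption matching the diagram, not something fixed by the notation:

```python
import numpy as np

# Assumed layer sizes: 3 input features, 4 hidden units, 1 output unit.
n_x, n_h, n_y = 3, 4, 1

# Hidden-layer parameters W[1], b[1] and output-layer parameters W[2], b[2].
W1 = np.random.randn(n_h, n_x) * 0.01   # shape (4, 3)
b1 = np.zeros((n_h, 1))                 # shape (4, 1)
W2 = np.random.randn(n_y, n_h) * 0.01   # shape (1, 4)
b2 = np.zeros((n_y, 1))                 # shape (1, 1)
```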
Neural Network Representation
[Figure: the network with inputs x1, x2, x3 and output ŷ, with one neuron magnified to show its two-step computation]

z = w^T x + b
a = σ(z)
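A minimal NumPy sketch of what one such neuron computes; the input and weight values below are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[0.5], [-1.2], [3.0]])   # one example with 3 features, shape (3, 1)
w = np.array([[0.1], [0.4], [-0.2]])   # weights of a single neuron, shape (3, 1)
b = 0.3                                # bias of that neuron

z = w.T @ x + b    # z = w^T x + b, a 1x1 array
a = sigmoid(z)     # a = sigma(z)
print(z, a)
```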
Computing a Neural Network’s Output
This step is performed by each neuron.
[Figure: each hidden neuron receives x1, x2, x3 and performs the same two-step computation, and the output neuron does the same on the hidden activations]

z = w^T x + b
a = σ(z)
Computing a Neural Network’s Output
This step is performed by each neuron. The equations for the first hidden layer with four
neurons will be:
z[1]_1 = w[1]_1^T x + b[1]_1,   a[1]_1 = σ(z[1]_1)
z[1]_2 = w[1]_2^T x + b[1]_2,   a[1]_2 = σ(z[1]_2)
z[1]_3 = w[1]_3^T x + b[1]_3,   a[1]_3 = σ(z[1]_3)
z[1]_4 = w[1]_4^T x + b[1]_4,   a[1]_4 = σ(z[1]_4)

Stacking the four neurons' weights into a matrix W[1] and their biases and activations into vectors b[1] and a[1] gives:

z[1] = W[1] x + b[1]
a[1] = σ(z[1])
z[2] = W[2] a[1] + b[2]
a[2] = σ(z[2])
To compute these outputs one at a time, we would need a for loop that calculates the value for each neuron individually. But recall that a for loop makes the computation very slow, so we should vectorize the code to get rid of the for loop and make it run faster.
Vectorizing across multiple examples
The non-vectorized form of computing the output from a neural network is:
for i = 1 to m:
    z[1](i) = W[1] x(i) + b[1]
    a[1](i) = σ(z[1](i))
    z[2](i) = W[2] a[1](i) + b[2]
    a[2](i) = σ(z[2](i))
Using this for loop, we calculate the z and a values for each training example separately.
Now we will look at how this can be vectorized. All the training examples are stacked as columns of a single matrix X of shape (n_x, m):

X = [ x(1)  x(2)  ⋯  x(m) ]
Vectorizing across multiple examples
for i = 1 to m:
    z[1](i) = W[1] x(i) + b[1]
    a[1](i) = σ(z[1](i))
    z[2](i) = W[2] a[1](i) + b[2]
    a[2](i) = σ(z[2](i))

Vectorized across all m examples at once, with X, Z[1], A[1], Z[2], A[2] holding one column per example, this becomes:

Z[1] = W[1] X + b[1]
A[1] = σ(Z[1])
Z[2] = W[2] A[1] + b[2]
A[2] = σ(Z[2])
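A minimal NumPy sketch contrasting the explicit loop with the vectorized version; the layer sizes, the number of examples m, and the random parameters are placeholders, and sigmoid is used in both layers as in the slides:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, n_h, n_y, m = 3, 4, 1, 5
W1, b1 = rng.standard_normal((n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = rng.standard_normal((n_y, n_h)), np.zeros((n_y, 1))
X = rng.standard_normal((n_x, m))   # m examples stacked as columns

# Non-vectorized: loop over the m examples one at a time.
A2_loop = np.zeros((n_y, m))
for i in range(m):
    x_i = X[:, i:i+1]
    a1_i = sigmoid(W1 @ x_i + b1)
    A2_loop[:, i:i+1] = sigmoid(W2 @ a1_i + b2)

# Vectorized: one matrix product handles all m examples at once.
A1 = sigmoid(W1 @ X + b1)    # Z[1] = W[1] X + b[1]
A2 = sigmoid(W2 @ A1 + b2)   # Z[2] = W[2] A[1] + b[2]

assert np.allclose(A2, A2_loop)   # both versions give the same outputs
```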
Activation functions
What are activation functions?
An activation function decides whether a neuron should be activated or not. Its purpose is to introduce non-linearity into the output of a neuron.
Why do we need non-linear activation functions?
A neural network without an activation function is essentially just a linear regression model. The activation function applies a non-linear transformation to the input, making the network capable of learning and performing more complex tasks.
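To see why, here is a small NumPy sketch (layer sizes chosen arbitrarily): composing two layers with no activation in between collapses into a single linear map Wx + b, no matter how many layers are stacked.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal((4, 1))
W2, b2 = rng.standard_normal((1, 4)), rng.standard_normal((1, 1))
x = rng.standard_normal((3, 1))

# Two "layers" with no activation in between...
out = W2 @ (W1 @ x + b1) + b2

# ...are equivalent to one linear layer W x + b.
W, b = W2 @ W1, W2 @ b1 + b2
assert np.allclose(out, W @ x + b)
```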
Sigmoid activation function
It is a function which is plotted as an 'S'-shaped graph.

sigmoid: a = 1 / (1 + e^(−z))

Nature: non-linear.
Value range: 0 to 1.
Uses: usually used in the output layer of a binary classifier, where the result is either 0 or 1. Since the sigmoid's value lies between 0 and 1, the result can easily be predicted to be 1 if the value is greater than 0.5 and 0 otherwise.
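A minimal NumPy sketch of the sigmoid and the 0.5 thresholding rule described above; the input values are arbitrary examples:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-3.0, 0.0, 2.0])
a = sigmoid(z)                  # values in (0, 1): roughly [0.047, 0.5, 0.881]
y_hat = (a > 0.5).astype(int)   # predict 1 if a > 0.5, else 0 -> [0, 0, 1]
```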
Tanh activation function
The activation that works almost always better than the sigmoid function is the tanh function, also known as the hyperbolic tangent function. It is mathematically a shifted version of the sigmoid function.

tanh: a = (e^z − e^(−z)) / (e^z + e^(−z))

Nature: non-linear.
Value range: −1 to 1.
ReLU activation function
Equation: A(z) = max(0, z). It gives an output of z if z is positive and 0 otherwise.
Value range: [0, ∞)
Nature: non-linear, which means we can easily backpropagate the errors and have multiple layers of neurons activated by the ReLU function.
Uses: ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. In simple words, ReLU learns much faster than the sigmoid and tanh functions.
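A minimal NumPy sketch of both functions; the input values are arbitrary:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)   # A(z) = max(0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(np.tanh(z))   # tanh: (e^z - e^-z) / (e^z + e^-z), values in (-1, 1), zero-centered
print(relu(z))      # 0 for the negative inputs, 1.5 passed through unchanged
```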
Leaky ReLU activation function
It is an attempt to solve the dying ReLU problem.
Equation: A(z) = max(0.01z, z). It gives an output of z if z is positive and 0.01z otherwise.
The leak helps to increase the range of the ReLU function. Usually, the value of the slope a is 0.01 or so. When a is not fixed at 0.01 but chosen randomly, it is called Randomized ReLU. Therefore, the range of the Leaky ReLU is (−∞, ∞).
Both the Leaky and Randomized ReLU functions are monotonic in nature, and so are their derivatives.
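A minimal NumPy sketch of Leaky ReLU with the slope a = 0.01 mentioned above; the input values are arbitrary:

```python
import numpy as np

def leaky_relu(z, a=0.01):
    # A(z) = max(a*z, z): returns z when z > 0, the small leak a*z otherwise
    return np.maximum(a * z, z)

z = np.array([-3.0, -0.5, 0.0, 2.0])
print(leaky_relu(z))   # [-0.03, -0.005, 0., 2.]
```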
Softmax activation function
The softmax function is a generalization of the sigmoid function that is handy when we are trying to handle multi-class classification problems.
Nature: non-linear.
Uses: usually used when handling multiple classes. The softmax function squeezes the output for each class to between 0 and 1 and also divides each output by the sum of all the outputs.
Output: the softmax function is ideally used in the output layer of the classifier, where we are actually trying to obtain the probabilities that define the class of each input.
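A minimal NumPy sketch of softmax applied to the raw scores of a 3-class example; the scores are made up, and subtracting the maximum is a standard numerical-stability trick rather than part of the definition:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()          # each output in (0, 1), all outputs sum to 1

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)              # roughly [0.659, 0.242, 0.099]
pred_class = int(np.argmax(probs))   # index of the most probable class -> 0
```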
Activation Functions
Activation Function | Pros | Cons
Sigmoid | Useful for binary classification | Output is restricted between 0 and 1
tanh | Better than sigmoid | Parameters are updated slowly when points are at extreme ends
ReLU | Parameters are updated faster as the slope is 1 when x > 0 | Zero slope when x < 0
• The basic rule of thumb is that if you really don't know which activation function to use, simply use ReLU, as it is a general activation function and is used in most cases these days.
• If your output is for binary classification, then the sigmoid function is a very natural choice for the output layer.
Choosing a good W
f(x,W) = Wx + b