Deep Learning
Ms Mounira Zouaghi
Content
1. Introduction to deep learning
2. Fundamentals of deep learning
3. Deep learning frameworks: PyTorch, TensorFlow, Keras…
Chapter 2: Fundamentals of Deep Learning
What is Deep Learning?
Deep learning is a subset of machine learning that is based on artificial neural networks.
https://www.youtube.com/watch?v=ER2It2mIagI
What is a Neuron?
[Figure: biological neuron vs artificial neuron — synapses correspond to weights, the axon to the output]
How does a neuron work?
Activation function (1)
• The neuron's output is computed in two steps:
Step 1 (weighted sum): y = Σᵢ wᵢ·xᵢ + bᵢ   (bᵢ: bias)
Step 2 (activation): z = Activation(y) = f(y)
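A minimal sketch of these two steps in Python (the sigmoid activation and the input values are illustrative assumptions, not from the slides):

```python
import numpy as np

def neuron_forward(x, w, b, activation):
    """Two-step computation of a single artificial neuron."""
    # Step 1: weighted sum of the inputs plus the bias
    y = np.dot(w, x) + b
    # Step 2: apply the activation (transfer) function
    z = activation(y)
    return z

# Example with a sigmoid activation
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
x = np.array([0.5, -1.0, 2.0])   # inputs x1, x2, x3
w = np.array([0.1, 0.4, -0.3])   # weights w1, w2, w3
b = 0.2                          # bias
print(neuron_forward(x, w, b, sigmoid))
```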
Activation function (2)
What?
An activation function is a mathematical function used to compute the output of a node; it is also called a transfer function.
Why?
It determines the output of the neuron, e.g. yes or no (neuron activated or not). The output is typically bounded, for example between 0 and 1.
Examples?
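Typical examples (the slide does not list them explicitly; these are the usual choices) include sigmoid, tanh, and ReLU. A small sketch:

```python
import numpy as np

def sigmoid(t):
    # Squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-t))

def tanh(t):
    # Squashes any real value into (-1, 1)
    return np.tanh(t)

def relu(t):
    # Keeps positive values, zeroes out negative ones
    return np.maximum(0.0, t)

t = np.array([-2.0, 0.0, 3.0])
print(sigmoid(t), tanh(t), relu(t))
```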
How to train a NN with Backpropagation
Weight update rule: w₁(new) = w₁(old) − μ · ∂loss/∂w₁
μ: learning rate
loss = (y − ŷ)², where y is the correct result and ŷ is the calculated result
[Figure: a single neuron with inputs x1–x3, weights w1–w3, and output ŷ]
The same process is repeated for a number of epochs until the loss function is reduced toward 0.
How does it work?
https://www.youtube.com/watch?v=bfmFfD2RIcg
https://www.youtube.com/watch?v=ER2It2mIagI
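A minimal sketch of this update rule for one linear neuron and the squared loss (the data, the initial weights, and the learning rate are illustrative assumptions):

```python
import numpy as np

# Toy data: three inputs and the correct result y
x = np.array([1.0, 2.0, 3.0])
y = 1.0

w = np.array([0.1, 0.1, 0.1])   # initial weights w1..w3
mu = 0.01                       # learning rate

for epoch in range(100):            # repeat for a number of epochs
    y_hat = np.dot(w, x)            # calculated result ŷ (no activation, for simplicity)
    loss = (y - y_hat) ** 2         # loss = (y - ŷ)²
    grad = -2.0 * (y - y_hat) * x   # ∂loss/∂w by the chain rule
    w = w - mu * grad               # weight update rule: w_new = w_old - μ·∂loss/∂w
print(w, loss)
```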
Train a Multilayer NN: Gradient Descent
[Figure: a network with inputs x1–x4, hidden layer 1 (neurons f11–f13, outputs O11–O13), hidden layer 2 (neurons f21–f22, outputs O21–O22), and an output layer producing ŷ]
μ: learning rate
Loss = (y − ŷ)², where y is the expected result and ŷ is the calculated result
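A sketch of the forward pass through a network shaped like the one in this figure (4 inputs, hidden layers of sizes 3 and 2, one output). The random weights and the sigmoid activation are assumptions for illustration:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
# Layer shapes follow the figure: 4 inputs -> 3 hidden -> 2 hidden -> 1 output
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)
W3, b3 = rng.normal(size=(1, 2)), np.zeros(1)

x = np.array([0.5, -1.0, 2.0, 0.1])   # inputs x1..x4

O1 = sigmoid(W1 @ x + b1)    # hidden layer 1 outputs O11..O13
O2 = sigmoid(W2 @ O1 + b2)   # hidden layer 2 outputs O21, O22
y_hat = (W3 @ O2 + b3)[0]    # output layer: ŷ
print(y_hat)
```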
Gradient Descent Optimizer
1. Initialize the NN parameters: weights and biases
2. Update the parameters
3. Repeat to obtain the minimum cost
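This initialize/update/repeat loop, sketched on a simple one-parameter cost (the cost function is an illustrative assumption, not from the slides):

```python
# Minimize cost(w) = (w - 3)² with plain gradient descent
w = 0.0      # 1. initialize the parameter
mu = 0.1     # learning rate

for step in range(50):        # 3. repeat until the cost is (near) minimal
    grad = 2.0 * (w - 3.0)    # gradient of the cost at the current w
    w = w - mu * grad         # 2. update the parameter
print(w)   # close to 3, the minimizer of the cost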
Types of Gradient Descent
SGD (Stochastic Gradient Descent): a single randomly chosen training example is used to estimate the gradient in each iteration. This introduces randomness and can lead to faster convergence and help escape local minima.
Batch Gradient Descent: the entire training dataset is used to compute the gradient in each iteration. It can be computationally expensive for large datasets but is more stable than SGD.
Mini-Batch Gradient Descent: combines the benefits of batch gradient descent and SGD by using a small random subset (mini-batch) of the training data in each iteration (see the sketch below).
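The three variants differ only in how many examples feed each gradient estimate. A sketch on toy linear-regression data (the data, model, and hyperparameters are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 training examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])     # targets from a known linear model

w = np.zeros(3)
mu = 0.05

def grad(w, Xb, yb):
    # Gradient of the mean squared error on the given batch
    return -2.0 * Xb.T @ (yb - Xb @ w) / len(yb)

for step in range(200):
    # Batch GD: use all examples        -> idx = np.arange(len(X))
    # SGD: use a single random example  -> idx = rng.integers(len(X), size=1)
    # Mini-batch GD: a small random subset, e.g. 16 examples:
    idx = rng.choice(len(X), size=16, replace=False)
    w = w - mu * grad(w, X[idx], y[idx])
print(w)   # approaches [1.0, -2.0, 0.5]
```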
Chain Rule in Backpropagation
[Figure: the same multilayer network, with per-layer weights wˡᵢⱼ and outputs Oˡᵢ, producing ŷ at output neuron O³₁]
μ: learning rate
Chain rule for an output-layer weight: ∂loss/∂w³₁₁ = (∂loss/∂O³₁) · (∂O³₁/∂w³₁₁)
Loss = (y − ŷ)², where y is the expected result and ŷ is the calculated result
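A sketch of this chain rule written out by hand for a tiny network with one hidden neuron and one output neuron (the sigmoid activations, weight values, and variable names are assumptions for illustration):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Tiny network: one hidden neuron, one output neuron
x, y = 0.5, 1.0            # input and correct result
w_h, w_o = 0.3, 0.7        # hidden weight and output weight

# Forward pass
O_h = sigmoid(w_h * x)     # hidden output
O_o = sigmoid(w_o * O_h)   # calculated result ŷ
loss = (y - O_o) ** 2

# Chain rule: ∂loss/∂w_o = ∂loss/∂O_o · ∂O_o/∂w_o
dloss_dOo = -2.0 * (y - O_o)
dOo_dwo = O_o * (1.0 - O_o) * O_h   # sigmoid derivative times its input
dloss_dwo = dloss_dOo * dOo_dwo

mu = 0.1
w_o = w_o - mu * dloss_dwo          # gradient-descent update of the output weight
print(dloss_dwo, w_o)
```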
Types of NN
Perceptron
Multilayer Perceptron
Advantages: Perceptrons can implement logic gates like AND, OR, or NAND.
Disadvantages: Perceptrons can only learn linearly separable problems, such as the boolean AND problem.
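A sketch of a perceptron computing the boolean AND gate with a step activation (the weight and bias values are one of many valid choices, chosen here for illustration):

```python
import numpy as np

def perceptron_and(x1, x2):
    # Weights and bias chosen so the weighted sum crosses 0 only when both inputs are 1
    w = np.array([1.0, 1.0])
    b = -1.5
    y = np.dot(w, np.array([x1, x2])) + b
    return 1 if y > 0 else 0   # step activation

for a in (0, 1):
    for b_in in (0, 1):
        print(a, b_in, perceptron_and(a, b_in))
```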