DL Unit1 HD
UNIT - I
By,
Dr. Himani Deshpande
1.1
HISTORY OF DEEP LEARNING
BIOLOGICAL NEURON
[Figure: biological neuron, with input and output labelled]
The myelin sheath allows electrical impulses to travel quickly and efficiently along the nerve cell. It gets thicker as memories are formed and strengthened.
NEURONS
History of Deep Learning
1943: EVOLUTION OF DEEP LEARNING
[Figure: a simple artificial neuron with inputs x1, x2, weights W1, W2 and output Y]
1943: LIMITATIONS OF THE MP NEURON
• One of the difficulties with the McCulloch-Pitts neuron was its simplicity: it only used the threshold step activation function and did not incorporate weights on the different inputs.
• Rosenblatt's perceptron has one or more inputs, a processor, and a single output (a minimal sketch follows below).
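A minimal sketch (not from the slides) of a Rosenblatt-style perceptron: weighted inputs, one processor that sums them, and a single thresholded output. The weights, bias and the AND-gate example are arbitrary illustrations.

```python
# Perceptron sketch: weighted sum of inputs plus a bias, passed through a hard threshold.
def perceptron(inputs, weights, bias):
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0   # threshold step activation

# Hypothetical weights/bias chosen by hand so the unit behaves like a logical AND gate
print(perceptron([1, 1], [0.5, 0.5], -0.7))  # -> 1
print(perceptron([1, 0], [0.5, 0.5], -0.7))  # -> 0
```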
1957: THE PERCEPTRON
1960: DELTA LEARNING RULE (WIDROW-HOFF RULE)
The delta rule was developed by Bernard Widrow and Marcian Hoff. It depends on supervised learning and uses a continuous activation function.
• It is also known as the Least Mean Square (LMS) method; it minimizes the error over all the training patterns (a short sketch follows below).
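A hedged sketch of the Widrow-Hoff (delta / LMS) update, w ← w + η(target − output)·x, for a single linear unit. The learning rate, epoch count and toy data are illustrative assumptions, not from the slides.

```python
# Delta / LMS rule: repeatedly nudge the weights in proportion to the prediction error.
def train_lms(samples, eta=0.1, epochs=50):
    n = len(samples[0][0])
    w = [0.0] * (n + 1)                                # w[0] acts as the bias weight
    for _ in range(epochs):
        for x, target in samples:
            xb = [1.0] + list(x)                       # prepend 1 for the bias input
            y = sum(wi * xi for wi, xi in zip(w, xb))  # continuous (linear) activation
            err = target - y
            w = [wi + eta * err * xi for wi, xi in zip(w, xb)]
    return w

# Toy example: the rule recovers roughly y = 2x (bias near 0, weight near 2)
print(train_lms([([1.0], 2.0), ([2.0], 4.0), ([3.0], 6.0)]))
```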
MLP
PERCEPTRON LIMITATION
BACKPROPAGATION
GRADIENT DESCENT
UAT (UNIVERSAL APPROXIMATION THEOREM)
THE HISTORY OF DEEP LEARNING
• The history of deep learning dates back to 1943, when Warren McCulloch and Walter Pitts created a computer model based on the neural networks of the human brain.
• McCulloch and Pitts used a combination of mathematics and algorithms they called threshold logic to mimic the thought process.
• Since then, deep learning has evolved steadily over the years, with two significant breaks in its development.
• The development of the basics of a continuous backpropagation model is credited to Henry J. Kelley in 1960. Stuart Dreyfus came up with a simpler version based only on the chain rule in 1962. The concept of backpropagation existed in the early 1960s but did not become widely useful until 1985.
DEEP LEARNING SUCCESS STORIES
APPLICATIONS OF NEURAL NETWORKS
ALEXA
DEEP LEARNING SUCCESS STORIES
• Self-Driving Cars and Autonomous Vehicles
• News Aggregation and Fake News Detection
• Natural Language Processing (NLP)
• Virtual Assistants
• Visual Recognition
• Deep Learning Applications in Healthcare
• Personalization
• Colourization of Black-and-White Images
• Adding Sounds to Silent Movies
MULTILAYER NEURAL NETWORK
MLP
MLP REPRESENTATION POWER
[Figures: MLP representation power, shown with 2 inputs and with 3 inputs]
SIGMOID FUNCTION
SIGMOID NEURON
WHY SIGMOID NEURON?
The perceptron suffers from a major disadvantage: it is very harsh at the boundaries when it comes to deciding the output.
SIGMOID NEURONS
• Sigmoid neurons can be used for both binary classification and regression problems. The output is a value between 0 and 1.
• Sigmoid neurons are similar to perceptrons, but modified so that small changes in their weights and bias cause only a small change in their output (see the sketch below).
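A small sketch of a sigmoid neuron under these definitions: the same weighted sum as a perceptron, but the hard threshold is replaced by the smooth sigmoid, so the output varies continuously in (0, 1). The example weights and bias are made up to show that a small change in a weight only nudges the output.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Small changes in weights/bias now produce only small changes in the output:
print(sigmoid_neuron([1.0, 0.5], [0.40, 0.6], -0.5))  # ~0.550
print(sigmoid_neuron([1.0, 0.5], [0.41, 0.6], -0.5))  # ~0.552 (barely moves)
```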
PERCEPTRON FUNCTION
SIGMOID FUNCTION
PERCEPTRON VS SIGMOID
GUESS THE GRE SCORE
GOAL
• Find a better way of traversing the error surface so that we can reach the minimum value quickly, without resorting to a brute-force search.
GD → an algorithm to minimize a function by optimizing its parameters.
VALUES OF WEIGHTS AND BIAS
CHANGING THE WEIGHTS
GRADIENT DESCENT
GRADIENT DESCENT
• First-Order Derivative
• The first-order derivative tells us the direction of the function, i.e. whether the function is increasing or decreasing at a point.
• It can be interpreted as an instantaneous rate of change.
• It is also given by the slope of the tangent line at that point.
GRADIENT DESCENT
The main objective of the gradient descent algorithm is to minimize the cost function by iteration. In each iteration it:
• Calculates the first-order derivative of the function to compute the gradient, or slope, of that function at the current point.
• Moves in the direction opposite to the gradient, i.e. steps away from the direction of increasing slope by alpha times the gradient, where alpha is the learning rate.
• The learning rate is a tuning parameter in the optimization process which helps decide the length of the steps.
GRADIENT DESCENT
ERROR SURFACE
[Figure: error surface with the maximum error and minimum error marked]
GRADIENT DESCENT
x_{t+1} = x_t − η_t ∇f(x_t)
where η_t is the step size at step t (also called the learning rate)
Possible stopping criterion: iterate until the gradient is close to zero, i.e. ‖∇f(x_t)‖ ≤ ε
GRADIENT DESCENT: WORKED EXAMPLE
f(x) = x², so the derivative is f'(x) = 2x
Step size: 0.8, starting point x^(0) = −4
New value = Old value − learning rate × gradient
x^(0) = −4
x^(1) = 2.4
x^(2) = −1.44
x^(3) = 0.864
x^(4) = −0.5184
x^(5) = 0.31104
GRADIENT DESCENT: the same example repeated with step size 0.2
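A short code sketch reproducing the worked example above (f(x) = x², step size 0.8, starting at x = −4); the number of steps shown is arbitrary.

```python
# Gradient descent on f(x) = x^2, whose derivative is f'(x) = 2x.
def f_prime(x):
    return 2 * x

x, step_size = -4.0, 0.8
for t in range(1, 6):
    x = x - step_size * f_prime(x)      # new value = old value - learning rate * gradient
    print(f"x({t}) = {round(x, 5)}")    # 2.4, -1.44, 0.864, -0.5184, 0.31104
# The iterates oscillate around the minimum at x = 0 but shrink towards it;
# rerunning with step_size = 0.2 instead approaches 0 smoothly from one side.
```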
LOSS FUNCTION
• y = mx + c
• Loss function L(m, c)
[Figure: loss surface plotted over the parameters m and c]
GRADIENT DESCENT WITH TWO VARIABLES
• y = mx + c
• Training data: (x, y) = (1, 2) and (3, 4)
• Initial assumptions → c = 0, m = 1
• Loss L(m, c) = [2 − (c + m·1)]² + [4 − (c + 3m)]²
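A hedged sketch of gradient descent on the two-variable loss above, starting from the slide's initial assumptions c = 0, m = 1; the learning rate and iteration count are assumed for illustration.

```python
# Gradient descent on L(m, c) = [2 - (c + m*1)]^2 + [4 - (c + 3m)]^2.
def loss(m, c):
    return (2 - (c + m * 1)) ** 2 + (4 - (c + 3 * m)) ** 2

def gradients(m, c):
    r1 = 2 - (c + m)        # residual for the point (x=1, y=2)
    r2 = 4 - (c + 3 * m)    # residual for the point (x=3, y=4)
    dL_dm = -2 * r1 * 1 - 2 * r2 * 3
    dL_dc = -2 * r1 - 2 * r2
    return dL_dm, dL_dc

m, c = 1.0, 0.0             # initial assumptions from the slide
lr = 0.05                   # assumed learning rate
for _ in range(200):
    dm, dc = gradients(m, c)
    m, c = m - lr * dm, c - lr * dc
print(m, c, loss(m, c))     # approaches m = 1, c = 1, which fits both training points exactly
```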
NEURONS
MP NEURON → PERCEPTRON (real inputs & binary output, harsh threshold) → SIGMOID NEURON (smooth output) → MLP → FEED FORWARD NETWORKS
MULTI LAYERED
FEED FORWARD NN
REPRESENTATION POWER
BASIC TERMINOLOGIES OF NN
NN WORKING
[Diagram walkthrough reconstructed from the slides:]
• A 28 pixel × 28 pixel image (28 × 28 = 784 pixels) is flattened into 784 input nodes x1 … x784, which form the input layer.
• Each input is multiplied by a weight (illustrative values such as 0.8, 0.2, 0.1, 0.3, 0.7 are shown on the slides) and feeds into hidden nodes with biases B1 … B6.
• At each hidden node, the weighted sum plus the bias, e.g. (x1·0.8 + x3·0.2) + B1, is passed through an activation function.
• The hidden-layer outputs feed forward through the following layers to the output layer, producing predicted values.
• The predictions are compared with the actual outputs: for example, predictions of 0.5, 0.4 and 0.1 against actual outputs 0, 1 and 0 give errors of −0.5, +0.6 and −0.1.
• The network then moves back to the previous values: the weights are adjusted backwards through the layers to reduce the error, and the forward pass is repeated (a code sketch follows below).
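A tiny sketch of the forward pass just described, with a 784-node input layer, six hidden nodes (B1…B6) and three outputs. The random weights, zero biases and stand-in pixel values are assumptions; the slide figures only show illustrative numbers.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    # one fully connected layer: weighted sum + bias, then a sigmoid activation
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

n_in, n_hidden, n_out = 784, 6, 3
pixels = [random.random() for _ in range(n_in)]   # stand-in for the flattened 28x28 image
w1 = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[random.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out

hidden = dense(pixels, w1, b1)
prediction = dense(hidden, w2, b2)
actual = [0, 1, 0]
errors = [a - p for a, p in zip(actual, prediction)]   # cf. -0.5, +0.6, -0.1 on the slide
print(prediction, errors)
```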
NN PREDICTION EXAMPLE
NN IS A BLACK BOX
NN TERMINOLOGIES
In the simplest network we have an input layer, a hidden layer and an output layer. Each layer has multiple neurons, and every neuron in one layer is connected to every neuron in the next layer. Such networks are also called fully connected networks.
NN TERMINOLOGIES
LEARNING RATE
The learning rate is defined as the amount of minimization in the cost function in each iteration.
In simple terms, the rate at which we descend towards the minimum of the cost function is the learning rate.
We should choose the learning rate carefully: it should neither be so large that the optimal solution is missed, nor so low that it takes forever for the network to converge (a small illustration follows below).
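A small assumed illustration of this trade-off on f(x) = x², whose gradient is 2x: too large a rate overshoots and diverges, too small a rate converges painfully slowly.

```python
# Run a few gradient descent steps on f(x) = x^2 with different learning rates.
def run(lr, steps=20, x=1.0):
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(run(1.2))    # |1 - 2*1.2| > 1, so the iterates blow up: the optimum is missed
print(run(0.001))  # still close to the starting point after 20 steps: very slow convergence
print(run(0.4))    # a moderate rate lands essentially at the minimum x = 0
```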
NN TERMINOLOGIES
FORWARD PROPAGATION
Forward propagation refers to the movement of the input through the hidden layers to the output layer.
In forward propagation, the information travels in a single direction: FORWARD.
The input layer supplies the input to the hidden layers and then the output is generated.
There is no backward movement.
BACK PROPAGATION
NN TERMINOLOGIES
BACKPROPAGATION
• When we define a neural network, we assign random weights and bias values to our nodes.
• Once we have received the output for a single iteration, we can calculate the error of the network. This error is then fed back to the network, along with the gradient of the cost function, to update the weights of the network.
• The weights are updated so that the error in subsequent iterations is reduced. This updating of weights using the gradient of the cost function is known as back-propagation.
• In back-propagation the movement through the network is backwards: the error, along with the gradient, flows back from the output layer through the hidden layers, and the weights are updated (see the sketch below).
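A hedged sketch of the weight update at the heart of back-propagation, shown for a single linear output unit with squared-error loss; a full multi-layer network applies the same chain-rule idea through every hidden layer. The sample, learning rate and loop length are illustrative assumptions.

```python
# One gradient-based weight update for a single linear unit with loss = (y - target)^2.
def backprop_step(w, b, x, target, lr=0.1):
    y = sum(wi * xi for wi, xi in zip(w, x)) + b   # forward pass
    error = y - target
    grad_w = [2 * error * xi for xi in x]          # d(loss)/dw
    grad_b = 2 * error                             # d(loss)/db
    w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return w, b - lr * grad_b

w, b = [0.0, 0.0], 0.0
for _ in range(50):
    w, b = backprop_step(w, b, [1.0, 2.0], 1.0)    # hypothetical sample x = (1, 2), target = 1
print(w, b)                                        # the unit's output is now close to the target
```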
NN TERMINOLOGIES
COST FUNCTION
When we build a network, the network tries to predict the output as closely as possible to the actual value.
The cost or loss function penalizes the network when it makes errors.
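One concrete cost function as a minimal example: mean squared error. MSE is a common choice, though the slides do not commit to a particular loss.

```python
# Mean squared error: larger mistakes are penalized more heavily.
def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

print(mse([0.5, 0.4, 0.1], [0, 1, 0]))   # the example predictions and actual outputs used earlier
```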
NN TERMINOLOGIES
EPOCH
One forward pass and one backward pass over the entire training dataset is 1 epoch.
GRADIENT DESCENT
• The local minimum or local maximum of a function can be found using gradient descent as follows:
→ If we move towards the negative gradient, i.e. away from the gradient of the function at the current point, we will reach the local minimum of that function.
→ If we move towards the positive gradient, i.e. towards the gradient of the function at the current point, we will reach the local maximum of that function.
GRADIENT DESCENT
• Moving towards the positive gradient in this way is known as gradient ascent; following the negative gradient is gradient descent, also known as steepest descent.
• The main objective of using a gradient descent algorithm is to minimize the cost function by iteration. To achieve this goal, it performs two steps iteratively:
→ Calculate the first-order derivative of the function to compute the gradient, or slope, of that function.
→ Move in the direction opposite to the gradient, stepping away from the current point by alpha times the gradient, where alpha is defined as the learning rate. It is a tuning parameter in the optimization process which helps decide the length of the steps.
FEED FORWARD NN
A multi-layer perceptron (MLP) is a form of feed-forward neural network that consists of multiple layers of computation nodes connected in a feed-forward way.
• A feed-forward neural network is the polar opposite of a recurrent neural network, in which some routes are cycled.
• The feed-forward model is the basic type of neural network because the input is only processed in one direction. The data always flows forward and never backwards.
The process of receiving an input to produce some kind of output to make some kind of prediction is known as feed-forward.
• The feed-forward neural network is the core of many other important neural networks, such as the convolutional neural network.
• There can be multiple hidden layers, depending on the kind of data you are dealing with. The number of hidden layers is known as the depth of the neural network, and a deeper network can learn more complex functions. The input layer first provides the neural network with data, and the output layer then makes predictions on that data based on a series of functions. The ReLU function is the most commonly used activation function in deep neural networks (a sketch follows below).
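A compact sketch of such a feed-forward network with two hidden ReLU layers; the layer sizes, random weights and example input are arbitrary illustrations, not from the slides.

```python
import random

random.seed(1)

def relu(z):
    return max(0.0, z)

def layer(inputs, weights, biases, activation=relu):
    # fully connected layer: weighted sums plus biases, then the activation
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init(n_in, n_out):
    w = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

sizes = [4, 8, 8, 2]                 # input -> two hidden layers (the network's depth) -> output
params = [init(a, b) for a, b in zip(sizes, sizes[1:])]

x = [0.1, 0.7, 0.3, 0.9]             # example input vector
for w, b in params:
    x = layer(x, w, b)               # data flows strictly forward, layer by layer
print(x)
```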
CLASSES OF DEEP LEARNING
• Supervised,
• Unsupervised, and
• Hybrid
SUPERVISED LEARNING
• Supervised learning involves training a deep learning model on labelled data, where both the input data and the corresponding output labels are provided.
• The model learns to map the input data to the correct output by minimizing the discrepancy between the predicted and true labels.
• It is commonly used for tasks such as image classification, object detection, and natural language processing, where the goal is to learn the mapping between inputs and pre-defined outputs.
UNSUPERVISED LEARNING