
Artificial Neural Network: Training

Credits:

Soft Computing Applications, Dr. Debasis Samanta, IIT Kharagpur


Soft Computing: Fundamentals and Applications, Dr. D. K. Pratihar, IIT Kharagpur
“Principles of Soft Computing” by S.N. Sivanandam & SN Deepa
Neural Networks, Fuzzy Logic and Genetic Algorithms by S. Rajasekaran and GAV Pai
Neural networks- a comprehensive foundation by Simon S Haykin
The concept of learning

Learning is an important feature of human computational ability.

Learning may be viewed as a change in behaviour acquired through practice or experience, one that lasts for a relatively long time.

As learning occurs, the effective coupling between neurons is modified.

In the case of artificial neural networks, learning is the process of modifying a neural network by updating its weights, biases and other parameters, if any.

During learning, the parameters of the network are optimized; the process is, in effect, one of curve fitting.

The network is then said to have passed through a learning phase.
Kinds of Learning

Parameter learning: involves changing and updating the connecting weights in the neural net.

Structure learning: focuses on changing the structure or architecture of the neural net.
Types of learning

There are several learning techniques. A taxonomy of well-known learning techniques is shown below.
Different learning techniques: Supervised learning

Supervised learning

In this learning, a teacher is present during the learning process and presents the expected output.

Thus, in this form of learning, the input-output relationships of the training scenarios are available.

Here, the output of the network is compared with the corresponding target value and the error is determined.

The "error" generated is used to change the network parameters, resulting in improved performance.

This type of training is called learning with the help of a teacher.

Different learning techniques: Unsupervised learning

Unsupervised learning: no teacher is present.

If the target output is not available, the error in prediction cannot be determined; in such a situation, the system learns on its own by discovering and adapting to structural features in the input patterns.

This type of training is called learning without a teacher.
Different learning techniques: Reinforced learning

Reinforced learning

In this technique a teacher is available, but it does not tell the expected answer; it only tells whether the computed output is correct or incorrect. A reward is given for a correctly computed answer and a penalty for a wrong one. This information helps the network in its learning process.

Note: Supervised and unsupervised learning are the most popular forms of learning.

Unsupervised learning is very common in biological systems.

It is also important for artificial neural networks: training data are not always available for the intended application of the neural network.
Different learning techniques: Gradient descent learning

Gradient descent learning

This learning technique is based on the minimization of the error E, defined in terms of the weights and the activation function of the network.

It is also required that the activation function employed by the network be differentiable, as the weight update depends on the gradient of the error E.

Thus, if ∆W_ij denotes the weight update of the link connecting the i-th and j-th neurons of two neighbouring layers, then

∆W_ij = −η ∂E/∂W_ij

where η is the learning-rate parameter and ∂E/∂W_ij is the error gradient with respect to the weight W_ij.

The least-mean-square and back-propagation algorithms use this learning technique.
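The update rule can be sketched on a one-dimensional toy problem; the quadratic error surface E(w) = (w − 3)² and the learning rate below are illustrative choices, not taken from the slides:

```python
def gradient_descent_step(w, grad, eta=0.1):
    """Apply the weight update  ΔW = -η ∂E/∂W  to a single weight."""
    return w - eta * grad

# Toy error surface E(w) = (w - 3)^2, whose gradient is dE/dw = 2(w - 3);
# repeated updates move w toward the minimum at w = 3.
w = 0.0
for _ in range(100):
    w = gradient_descent_step(w, grad=2 * (w - 3), eta=0.1)
```

Because the gradient vanishes at the minimum, the effective step size shrinks automatically as w approaches 3.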
Different learning techniques: Stochastic learning

Stochastic learning

In this method, weights are adjusted in a probabilistic fashion.

Simulated annealing, as employed by the Boltzmann and Cauchy machines, is an example of such learning.
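A minimal sketch of a probabilistic weight adjustment in the spirit of simulated annealing; the quadratic error surface, cooling schedule, and step size are illustrative assumptions:

```python
import math
import random

def anneal_step(w, error_fn, temperature, rng, step=0.1):
    """One probabilistic weight adjustment: a random perturbation is
    always accepted if it lowers the error, and accepted with
    probability exp(-dE / T) if it raises it."""
    w_new = w + rng.uniform(-step, step)
    dE = error_fn(w_new) - error_fn(w)
    if dE < 0 or rng.random() < math.exp(-dE / temperature):
        return w_new
    return w

# Illustrative run: quadratic error surface, simple cooling schedule.
error = lambda w: (w - 1.0) ** 2
rng = random.Random(0)
w = 5.0
for i in range(2000):
    w = anneal_step(w, error, temperature=1.0 / (1 + i), rng=rng)
```

As the temperature falls, uphill moves become increasingly rare, so the weight settles near the minimum of the error surface.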
Different learning techniques: Competitive learning

Competitive learning

In this learning method, those neurons which respond strongly to the input stimuli have their weights updated.

When an input pattern is presented, all neurons in the layer compete, and the winning neuron undergoes weight adjustment.

This is why it is called a winner-takes-all strategy.

Next, we will discuss a generalized approach of supervised learning to train different types of neural network architectures.
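The winner-takes-all step described above can be sketched as follows; the squared-distance competition criterion, learning rate, and sample values are illustrative assumptions:

```python
def winner_takes_all_step(x, weights, eta=0.5):
    """One competitive-learning step: every neuron competes by measuring
    how close its weight vector is to the input x; only the winner (the
    closest one) has its weights moved toward x."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    winner = dists.index(min(dists))
    weights[winner] = [wi + eta * (xi - wi)
                       for wi, xi in zip(weights[winner], x)]
    return winner

# Two neurons; the input resembles neuron 1, so only neuron 1 is updated.
weights = [[0.0, 0.0], [1.0, 1.0]]
winner = winner_takes_all_step([0.9, 1.1], weights)
```

The loser's weights are untouched, which is exactly the winner-takes-all behaviour: repeated presentations pull each winning neuron toward the cluster of inputs it responds to.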
Classification of NN Systems

Training SLFFNNs

Single layer feed forward NN training

We know that several neurons are arranged in one layer, with the inputs and weights connected to every neuron.

Learning in such a network occurs by adjusting the weights associated with the inputs so that the network can classify the input patterns.

A single neuron in such a neural network is called a perceptron. The algorithm to train a perceptron is stated below.

Let there be a perceptron with (m + 1) inputs x0, x1, x2, ..., xm, where x0 = 1 is the bias input.

Let f denote the transfer function of the neuron. Suppose X̄ and Ȳ denote the input-output vectors of the training data set, and W̄ denotes the weight matrix.

With this input-output relationship pattern and configuration of a perceptron, the algorithm Training Perceptron, stated on the following slide, trains the perceptron.
Single layer feed forward NN training

Algorithm 1 Perceptron Learning

w = [w0, w1, w2, ..., wm]
x = [1, x1, x2, ..., xm]
P ← inputs with label 1;
N ← inputs with label 0;
Initialize w randomly;
while !convergence do
    Pick random x ∈ P ∪ N
    if x ∈ P and wᵀx < 0 then
        w = w + x
    end
    if x ∈ N and wᵀx ≥ 0 then
        w = w − x
    end
end
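Algorithm 1 can be sketched in Python as follows. For determinism the sketch sweeps over all patterns each epoch instead of sampling them at random, and the training set (logical AND) is an illustrative choice:

```python
import random

def train_perceptron(P, N, dim, max_epochs=1000, seed=0):
    """Perceptron learning as in Algorithm 1: add x to w for a
    misclassified positive pattern, subtract x for a misclassified
    negative one.  Each pattern carries the bias input x0 = 1."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]   # random init
    for _ in range(max_epochs):
        changed = False
        for x in P + N:
            s = sum(wi * xi for wi, xi in zip(w, x))   # w^T x
            if x in P and s < 0:
                w = [wi + xi for wi, xi in zip(w, x)]
                changed = True
            elif x in N and s >= 0:
                w = [wi - xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:        # convergence: no misclassified pattern
            break
    return w

# Illustrative training set: logical AND, with bias input 1 prepended.
P = [[1, 1, 1]]                            # label 1
N = [[1, 0, 0], [1, 0, 1], [1, 1, 0]]      # label 0
w = train_perceptron(P, N, dim=3)
```

Since AND is linearly separable, the perceptron convergence theorem guarantees the loop terminates with every pattern correctly classified.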
Single layer feed forward NN training
Algorithm 2 Perceptron Convergence Algorithm
Single layer feed forward NN training

Note:

The algorithm is based on the supervised learning technique.

ADALINE (Adaptive Linear Element) is an alternative neuron to the perceptron.

If there are, say, 10 neurons in the single layer feed forward neural network to be trained, then we have to iterate the algorithm for each perceptron in the network.
Training MLFFNNs

Training multilayer feed forward neural network

As with a single layer feed forward neural network, a supervisory training methodology is followed to train a multilayer feed forward neural network.

Before going into the training of such a neural network, we redefine some terms involved in it.

A block diagram and its configuration for a three-layer multilayer FF NN of type m − n − p are shown on the next slide.
Back-propagation Network

Learning a MLFFNN

The whole learning method consists of the following three computations:

1. Input layer computation
2. Hidden layer computation
3. Output layer computation

Input-output data
Specifying a MLFFNN
Back Propagation Algorithm

The above discussion covers how to calculate the values of the different parameters in an m − n − p multilayer feed forward neural network.

Next, we will discuss how to train such a neural network.

We consider the most popular algorithm, called the Back-Propagation algorithm, which is a supervised learning method.

The principle of the Back-Propagation algorithm is based on error correction with the steepest-descent method.

We first discuss the method of steepest descent, followed by its use in the training algorithm.
Method of Steepest Descent

Supervised learning is, in fact, error-based learning.

In other words, with reference to an external (teacher) signal (i.e. the target output), it calculates the error by comparing the target output with the computed output.

Based on the error signal, the neural network should modify its configuration, which includes the synaptic connections, that is, the weight matrices.

It should try to reach a state which yields minimum error.

In other words, it searches for suitable values of the parameters that minimize the error, given a training set.

Note that this problem turns out to be an optimization problem.
Method of Steepest Descent

[Figure: (a) searching for a minimum of the error E, moving from the initial weights through adjusted weights to the best weight; (b) the error surface over two parameters V and W.]
Method of Steepest Descent

For simplicity, let us consider the connecting weights to be the only design parameters.

Suppose V and W are the weight matrices of the hidden and output layers, respectively.

Thus, given a training set of size T, the error surface E can be represented as

E = Σ_{t=1}^{T} e_t(V, W, I_t)

where I_t is the t-th input pattern in the training set and e_t(·) denotes the error computed for the t-th input.

Now, we will discuss the steepest descent method of computing the error, given changes in the V and W matrices.
Calculation of error in a neural network

Let us consider any k-th neuron at the output layer. For an input pattern I_t ∈ T (an input in the training set), let the target output of the k-th neuron be t_k and its computed output be y_k.

Then the error e_k of the k-th neuron, corresponding to the input I_t, is defined as

E = (1/2)(t_k − y_k)²

where y_k denotes the observed output of the k-th neuron.
Supervised learning: Back-propagation algorithm

The back-propagation algorithm can be followed to train a neural network to set its topology, connecting weights, bias values and many other parameters. In the present discussion, we will only consider updating weights.

Thus, we can write the error E corresponding to a particular training set T as a function of the variables V and W. That is,

E = f(V, W)

In the BP algorithm, this error E is to be minimized using the gradient descent method. According to the gradient descent method, the changes in the weight values are given by

∆V = −η ∂E/∂V    (1)

and

∆W = −η ∂E/∂W    (2)
Supervised learning: Back-propagation algorithm

Note that the −ve sign signifies that if ∂E/∂V (or ∂E/∂W) > 0, then we have to decrease V (or W), and vice versa.

Let v_ij denote the weight connecting the i-th neuron (at the input layer) to the j-th neuron (at the hidden layer), and w_jk the weight connecting the j-th neuron (at the hidden layer) to the k-th neuron (at the output layer).

Also, let E_k denote the error at the k-th output neuron.
Supervised learning: Back-propagation algorithm

E_k = (1/2)(t_k − y_k)²

Activation functions used in each layer:

Layer  | Activation function
-------|-----------------------------
Input  | Identity (linear AF)
Hidden | Binary sigmoid (log-sigmoid)
Output | Binary sigmoid (log-sigmoid)
Calculation of Δw_jk

E_k = (1/2)(t_k − y_k)²

We need ∂E_k/∂w_jk. The signal flows as w_jk → y_ink → f(·) → y_k, and y_k is compared with the target t_k to give E_k.

Using the chain rule of differentiation:

∂E_k/∂w_jk = (∂E_k/∂y_k) × (∂y_k/∂y_ink) × (∂y_ink/∂w_jk)

For the first factor, since E_k = (1/2)(t_k − y_k)²,

∂E_k/∂y_k = −(t_k − y_k)
Calculation of Δw_jk

For the second factor, ∂y_k/∂y_ink: in the output layer the activation function is chosen to be the logistic sigmoid,

y_k = f(y_ink) = 1 / (1 + e^(−λ y_ink)),  with f′(x) = λ f(x)(1 − f(x))

so that

∂y_k/∂y_ink = λ y_k (1 − y_k)
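The derivative identity f′(x) = λ f(x)(1 − f(x)) used above can be checked numerically against a central-difference approximation; the test point x = 0.3 is an arbitrary choice:

```python
import math

def sigmoid(x, lam=1.0):
    """Logistic sigmoid f(x) = 1 / (1 + e^(-λx))."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def sigmoid_prime(x, lam=1.0):
    """The identity f'(x) = λ f(x) (1 - f(x)) used for the delta terms."""
    y = sigmoid(x, lam)
    return lam * y * (1.0 - y)

# Compare the closed form with a central-difference approximation.
x, h = 0.3, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
```

This identity is what lets back-propagation express the derivative purely in terms of the already-computed activation y_k, avoiding any extra evaluation of f.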
Calculation of Δw_jk

For the third factor, ∂y_ink/∂w_jk: since

y_ink = z_1 w_1k + z_2 w_2k + ... + z_j w_jk + ... + z_n w_nk

we have

∂y_ink/∂w_jk = z_j
Calculation of Δw_jk

Putting all the above values together:

∂E_k/∂w_jk = (∂E_k/∂y_k) × (∂y_k/∂y_ink) × (∂y_ink/∂w_jk)
           = −(t_k − y_k) · λ y_k (1 − y_k) · z_j
           = −λ (t_k − y_k) y_k (1 − y_k) z_j

Letting λ = 1 and writing δ_k = (t_k − y_k) y_k (1 − y_k), the weight update is

Δw_jk = −η ∂E_k/∂w_jk = η δ_k z_j
Calculation of Δw_jk

Δw_jk = −η ∂E_k/∂w_jk = η δ_k z_j

In matrix form, for an m − n − p network with weight matrices V and W:

ΔW_{n×p} = η · z_{n×1} · δ_{1×p}

where η is the learning rate, z is the output of the hidden layer, and δ is the local gradient of the output layer.
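The scalar update Δw_jk = η δ_k z_j can be computed directly as below; the learning rate η, λ = 1, and the sample values are illustrative:

```python
def output_layer_update(t_k, y_k, z, eta=0.25, lam=1.0):
    """Compute Δw_jk = η δ_k z_j for every hidden output z_j, where
    δ_k = λ (t_k - y_k) y_k (1 - y_k) is the output-layer local gradient."""
    delta_k = lam * (t_k - y_k) * y_k * (1.0 - y_k)
    return delta_k, [eta * delta_k * z_j for z_j in z]

# Example numbers: target 1.0, computed output 0.5, two hidden outputs.
delta_k, dw = output_layer_update(t_k=1.0, y_k=0.5, z=[0.2, 0.8])
```

Note the sign behaviour: when the output is too small (t_k > y_k), δ_k is positive and every Δw_jk pushes the weighted sum, and hence y_k, upward.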
Calculation of Δv_ij

The signal now flows as v_ij → z_inj → f(·) → z_j → y_ink → f(·) → y_k, and y_k is compared with the target t_k to give E_k.

Using the chain rule of differentiation:

∂E_k/∂v_ij = (∂E_k/∂y_k) × (∂y_k/∂y_ink) × (∂y_ink/∂z_j) × (∂z_j/∂z_inj) × (∂z_inj/∂v_ij)
Calculation of Δv_ij

For ∂y_ink/∂z_j: since

y_ink = z_1 w_1k + z_2 w_2k + ... + z_j w_jk + ... + z_n w_nk

we have

∂y_ink/∂z_j = w_jk
Calculation of Δv_ij

For ∂z_j/∂z_inj: in the hidden layer the activation function is also chosen to be the logistic sigmoid,

z_j = f(z_inj) = 1 / (1 + e^(−λ z_inj)),  with f′(x) = λ f(x)(1 − f(x))

so that

∂z_j/∂z_inj = λ z_j (1 − z_j)
Calculation of Δv_ij

For ∂z_inj/∂v_ij: since

z_inj = x_1 v_1j + x_2 v_2j + ... + x_i v_ij + ... + x_m v_mj

we have

∂z_inj/∂v_ij = x_i
Calculation of Δv_ij

Putting all the above values together:

∂E_k/∂v_ij = −(t_k − y_k) · λ y_k (1 − y_k) · w_jk · λ z_j (1 − z_j) · x_i

With λ = 1, write δ_k = (t_k − y_k) y_k (1 − y_k), δ_inj = δ_k w_jk, and δ_j = δ_inj f′(z_inj) = δ_inj z_j (1 − z_j). Then the weight update is

Δv_ij = −η ∂E_k/∂v_ij = η δ_j x_i
Calculation of Δv_ij

Δv_ij = −η ∂E_k/∂v_ij = η δ_j x_i

In matrix form, for an m − n − p network with weight matrices V and W:

ΔV_{m×n} = η · x_{m×1} · δ_{1×n}

where η is the learning rate, x is the input to the input layer, and δ is the local gradient of the hidden layer.
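The hidden-layer update Δv_ij = η δ_j x_i can be sketched the same way; η, λ = 1, and the sample numbers are illustrative. With several output neurons the fed-back error δ_inj is summed over k; with a single output it reduces to δ_k w_jk as in the derivation above:

```python
def hidden_layer_update(deltas_out, w_j, z_j, x, eta=0.25, lam=1.0):
    """Compute Δv_ij = η δ_j x_i for every input x_i, where
    δ_inj = Σ_k δ_k w_jk   (error fed back from the output layer)
    δ_j   = δ_inj · λ z_j (1 - z_j)."""
    delta_inj = sum(d_k * w_jk for d_k, w_jk in zip(deltas_out, w_j))
    delta_j = delta_inj * lam * z_j * (1.0 - z_j)
    return [eta * delta_j * x_i for x_i in x]

# Example numbers: one output neuron with local gradient 0.125,
# connecting weight w_jk = 0.4, hidden output z_j = 0.5, two inputs.
dv = hidden_layer_update(deltas_out=[0.125], w_j=[0.4], z_j=0.5,
                         x=[1.0, 2.0])
```

This is the "back-propagation" step proper: the output-layer gradients δ_k travel backwards through the weights w_jk to produce the hidden-layer gradients δ_j.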
Example

Input the first training data (the superscript 0 denotes the initial weight matrices):

x = [0.4, −0.7]ᵀ  (2 × 1)

V⁰ = [v11 v12; v21 v22] = [0.1 0.4; −0.2 0.2]  (2 × 2)

W⁰ = [w11; w21] = [0.2; −0.5]  (2 × 1)
Example

Hidden layer inputs:

z_in = [z_in1; z_in2] = (V⁰)ᵀ x
     = [v11 v21; v12 v22] [x1; x2]
     = [0.1 −0.2; 0.4 0.2] [0.4; −0.7]
     = [0.18; 0.02]  (2 × 1)

Hidden layer outputs:

z⁰ = [z1; z2]  (2 × 1)
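The hidden-layer computation of this example can be reproduced in a few lines of pure Python, taking λ = 1 for the sigmoid:

```python
import math

# Values from the worked example (initial weights).
x = [0.4, -0.7]                    # input vector, 2 x 1
V = [[0.1, 0.4],                   # v_ij connects input i to hidden j
     [-0.2, 0.2]]

# Hidden layer net inputs: z_in = V^T x
z_in = [sum(V[i][j] * x[i] for i in range(2)) for j in range(2)]

# Hidden layer outputs: z_j = f(z_inj), logistic sigmoid with lambda = 1
z = [1.0 / (1.0 + math.exp(-s)) for s in z_in]
```

This reproduces z_in = [0.18, 0.02] from the slide; the sigmoid then maps both net inputs to values slightly above 0.5.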
