12-Backpropagation Algorithm-07-08-2024
Dr.S.ALBERT ALEXANDER
SCHOOL OF ELECTRICAL ENGINEERING
albert.alexander@vit.ac.in
Module 2
Artificial Neural Networks
❖ Perceptron Learning Algorithm
2.3 Backpropagation Algorithm
❖ The demonstration of the limitations of single-layer neural
networks was a significant factor in the decline of interest
in neural networks in the 1970s
❖ The discovery (by several researchers independently) and
widespread dissemination of an effective general method
of training a multilayer neural network played a major role
in the reemergence of neural networks as a tool for solving
a wide variety of problems
❖ One such training method is known as backpropagation (of
errors) or the generalized delta rule
❖ It is simply a gradient descent method to minimize the total
squared error of the output computed by the net
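❖ For a single training pattern, this total squared error and the
corresponding gradient-descent step can be written (in a standard
formulation) as
E = ½ Σ_{k=1}^{m} (tk − yk)² and Δwjk = −α ∂E/∂wjk
where tk is the target, yk the computed output, and α the learning rate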
Backpropagation Algorithm
❖ The training of a network by backpropagation involves
three stages: the feedforward of the input training pattern,
the calculation and backpropagation of the associated
error, and the adjustment of the weights
❖ After training, application of the net involves only the
computations of the feedforward phase
❖ Even if training is slow, a trained net can produce its output
very rapidly
❖ Numerous variations of backpropagation have been
developed to improve the speed of the training process
❖ More than one hidden layer may be beneficial for some
applications, but one hidden layer is sufficient for many problems
Architecture
❖ A multilayer neural network with one layer of hidden units
(the Z units) is shown in the figure below
❖ The output units (the Y units) and the hidden units may
also have biases (as shown)
❖ The bias on a typical output unit Yk is denoted by w0k
❖ The bias on a typical hidden unit Zj is denoted by v0j
❖ These bias terms act like weights on connections from
units whose output is always 1
❖ Only the direction of information flow for the feedforward
phase of operation is shown
❖ During the backpropagation phase of learning, signals are
sent in the reverse direction
Architecture
[Figure: backpropagation net with input units X1, …, Xn, one layer of
hidden units Z1, …, Zp, and output units Y1, …, Ym, with bias
connections to the hidden and output units]
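A minimal sketch of this architecture in Python/NumPy (the names n, p, m, V, W and the choice of sigmoid are illustrative assumptions, not taken from the slides); row 0 of each weight matrix holds the bias weights v0j and w0k:

```python
import numpy as np

# Dimensions (illustrative): n input units, p hidden units, m output units
n, p, m = 2, 2, 1

rng = np.random.default_rng(0)
# Step 0 of the training algorithm: initialize weights to small random values.
# Row 0 of each matrix holds the bias weights, i.e. the weights on
# connections from a unit whose output is always 1.
V = rng.uniform(-0.5, 0.5, size=(n + 1, p))   # input  -> hidden weights v_ij
W = rng.uniform(-0.5, 0.5, size=(p + 1, m))   # hidden -> output weights w_jk

def f(x):
    """Binary sigmoid, a common activation choice for backpropagation."""
    return 1.0 / (1.0 + np.exp(-x))
```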
Algorithm
❖ During feedforward, each input unit (Xi) receives an input
signal and broadcasts this signal to each of the hidden
units Z1, …, Zp
❖ Each hidden unit then computes its activation and sends its
signal (zj) to each output unit
❖ Each output unit (Yk) computes its activation (yk) to form
the response of the net for the given input pattern
Algorithm
❖ For each output unit Yk, the error between the target tk and
the computed activation yk is used to form an error
information term δk
❖ δk is used to distribute the error at output unit Yk back to all
units in the previous layer (the hidden units that are
connected to Yk)
❖ It is also used (later) to update the weights between the
output and the hidden layer
❖ In a similar manner, the factor δj (j=1, …, p) is computed for
each hidden unit Zj
❖ It is not necessary to propagate the error back to the input
layer, but δj is used to update the weights between the
hidden layer and the input layer
❖ After all of the δ factors have been determined, the weights
for all layers are adjusted simultaneously
Algorithm
❖ The adjustment to the weight wjk (from hidden unit Zj to
output unit Yk) is based on the factor δk and the activation zj
of the hidden unit Zj
❖ The adjustment to the weight vij (from input unit Xi to
hidden unit Zj) is based on the factor δj and the activation xi
of the input unit
Training Algorithm
Step 0:
❖ Initialize weights (set to small random values)
Step 1:
❖ While stopping condition is false, do Steps 2-9
Step 2:
❖ For each training pair, do Steps 3-8
Feedforward
Step 3:
❖ Each input unit (Xi, i=1,...,n) receives the input signal xi and
broadcasts this signal to all units in the layer above (the
hidden units)
Training Algorithm
Step 4:
❖ Each hidden unit (Zj, j=1,...,p) sums its weighted input
signals
z_inj = v0j + Σ_{i=1}^{n} xi vij
❖ Applies its activation function to compute its output signal
zj = f(z_inj)
❖ Sends this signal to all units in the layer above (output
units)
Training Algorithm
Step 5:
❖ Each output unit (Yk, k=1,...,m) sums its weighted input
signals
y_ink = w0k + Σ_{j=1}^{p} zj wjk
❖ Applies its activation function to compute its output signal
yk = f(y_ink)
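A sketch of the feedforward phase (Steps 3-5) in NumPy; the binary sigmoid and all variable names are assumptions for illustration, not taken from the slides:

```python
import numpy as np

def f(x):
    """Binary sigmoid activation (assumed choice)."""
    return 1.0 / (1.0 + np.exp(-x))

def feedforward(x, V, W):
    """Steps 3-5: propagate one input pattern x through the net.

    V has shape (n+1, p) and W has shape (p+1, m); row 0 of each
    matrix holds the bias weights v0j and w0k.
    """
    z_in = V[0] + x @ V[1:]     # Step 4: z_inj = v0j + sum_i xi*vij
    z = f(z_in)                 #         zj = f(z_inj)
    y_in = W[0] + z @ W[1:]     # Step 5: y_ink = w0k + sum_j zj*wjk
    y = f(y_in)                 #         yk = f(y_ink)
    return z_in, z, y_in, y

# usage with made-up weights: 2 inputs, 2 hidden units, 1 output
x = np.array([0.0, 1.0])
V = np.array([[0.1, 0.2], [0.3, -0.2], [-0.4, 0.5]])   # biases in row 0
W = np.array([[0.1], [0.2], [-0.3]])
print(feedforward(x, V, W)[3])
```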
Training Algorithm
Backpropagation of error
Step 6:
❖ Each output unit (Yk, k=1,...,m) receives a target pattern
corresponding to the input training pattern and computes its
error information term
δk = (tk − yk) f′(y_ink)
❖ Calculates its weight correction term (used to update wjk
later), Δwjk = α δk zj
❖ Calculates its bias correction term (used to update w0k
later), Δw0k = α δk
❖ Sends δk to units in the layer below
Training Algorithm
Backpropagation of error
Step 7:
❖ Each hidden unit (Zj, j=1,...,p) sums its delta inputs (from
units in the layer above)
δ_inj = Σ_{k=1}^{m} δk wjk
❖ Multiplies by the derivative of its activation function to
calculate its error information term
δj = δ_inj f′(z_inj)
❖ Calculates its weight correction term (used to update vij
later), Δvij = α δj xi
❖ Calculates its bias correction term (used to update v0j
later), Δv0j = α δj
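A sketch of Steps 6-7 in NumPy, under the same assumptions as the earlier sketch (bias weights in row 0 of each matrix, binary sigmoid so f′(y_in) = y(1 − y) and f′(z_in) = z(1 − z)); names and numbers are illustrative:

```python
import numpy as np

def backprop_terms(x, z, y, t, W, alpha):
    """Steps 6-7: error terms and correction terms for one training pair."""
    # Step 6: output error terms, weight and bias corrections
    delta_k = (t - y) * y * (1.0 - y)        # δk = (tk - yk) f'(y_ink)
    dW = np.zeros_like(W)
    dW[0] = alpha * delta_k                  # Δw0k = α δk
    dW[1:] = alpha * np.outer(z, delta_k)    # Δwjk = α δk zj

    # Step 7: hidden error terms, weight and bias corrections
    delta_in = W[1:] @ delta_k               # δ_inj = Σ_k δk wjk
    delta_j = delta_in * z * (1.0 - z)       # δj = δ_inj f'(z_inj)
    dV = np.zeros((x.size + 1, z.size))
    dV[0] = alpha * delta_j                  # Δv0j = α δj
    dV[1:] = alpha * np.outer(x, delta_j)    # Δvij = α δj xi
    return dW, dV

# usage with made-up activations and weights
x = np.array([0.0, 1.0])
z = np.array([0.55, 0.71])
y = np.array([0.52])
t = np.array([1.0])
W = np.array([[-0.3], [0.2], [0.1]])
dW, dV = backprop_terms(x, z, y, t, W, alpha=0.25)
print(dW, dV, sep="\n")
```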
Training Algorithm
Update weights and biases
Step 8:
❖ Each output unit (Yk, k=1,…,m) updates its bias and
weights (j = 0,…,p):
wjk(new) = wjk(old) + Δwjk
❖ Each hidden unit (Zj, j=1,…,p) updates its bias and weights
(i=0,…,n):
vij(new) = vij(old) + Δvij
Step 9:
❖ Test the stopping condition (e.g. the error over an epoch
falling below a tolerance, or a maximum number of epochs)
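Putting Steps 0-9 together, a minimal per-pattern training loop might look like the following sketch (binary sigmoid assumed; a fixed number of epochs serves as the stopping condition here, and all names are illustrative):

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))   # binary sigmoid

def train(X, T, p, alpha=0.25, epochs=5000, seed=0):
    """Train a one-hidden-layer net by backpropagation (Steps 0-9)."""
    n, m = X.shape[1], T.shape[1]
    rng = np.random.default_rng(seed)
    V = rng.uniform(-0.5, 0.5, (n + 1, p))    # Step 0: small random weights
    W = rng.uniform(-0.5, 0.5, (p + 1, m))
    for _ in range(epochs):                   # Steps 1 and 9: fixed number of epochs
        for x, t in zip(X, T):                # Step 2: each training pair
            z = f(V[0] + x @ V[1:])           # Steps 3-4: feedforward, hidden layer
            y = f(W[0] + z @ W[1:])           # Step 5: output layer
            delta_k = (t - y) * y * (1 - y)   # Step 6: output error terms
            delta_j = (W[1:] @ delta_k) * z * (1 - z)   # Step 7: hidden error terms
            W[0] += alpha * delta_k           # Step 8: update weights and biases
            W[1:] += alpha * np.outer(z, delta_k)
            V[0] += alpha * delta_j
            V[1:] += alpha * np.outer(x, delta_j)
    return V, W

# usage: try to learn XOR (convergence depends on initialization and learning rate)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])
V, W = train(X, T, p=4, alpha=0.5)
print(f(W[0] + f(V[0] + X @ V[1:]) @ W[1:]))
```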
Analysis
❖ In implementing the backpropagation algorithm, separate
arrays should be used for the deltas for the output units
(Step 6, δk) and the deltas for the hidden units (Step 7, δj)
❖ An epoch is one cycle through the entire set of training
vectors
❖ Typically, many epochs are required for training a
backpropagation neural net
❖ The foregoing algorithm updates the weights after each
training pattern is presented
❖ A common variation is batch updating, in which weight
updates are accumulated over an entire epoch (or some
other number of presentations of patterns) before being
applied
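A sketch of the batch variant, under the same assumptions as the training-loop sketch above: the corrections are accumulated over the whole epoch and applied only once at the end:

```python
import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))   # binary sigmoid

def train_batch(X, T, V, W, alpha=0.25, epochs=5000):
    """Batch updating: accumulate the corrections over an epoch, then apply."""
    for _ in range(epochs):
        dV, dW = np.zeros_like(V), np.zeros_like(W)
        for x, t in zip(X, T):
            z = f(V[0] + x @ V[1:])
            y = f(W[0] + z @ W[1:])
            delta_k = (t - y) * y * (1 - y)
            delta_j = (W[1:] @ delta_k) * z * (1 - z)
            dW[0] += alpha * delta_k
            dW[1:] += alpha * np.outer(z, delta_k)
            dV[0] += alpha * delta_j
            dV[1:] += alpha * np.outer(x, delta_j)
        V += dV    # weights change only once per epoch
        W += dW
    return V, W
```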
Analysis
❖ Note that f′(y_ink) and f′(z_inj) can be expressed in terms
of yk and zj, respectively, using the appropriate formulas
(depending on the choice of activation function; see the
formulas below)
❖ The mathematical basis for the backpropagation algorithm
is the optimization technique known as gradient descent
❖ The gradient of a function (in this case, the function is the
error and the variables are the weights of the net) gives the
direction in which the function increases most rapidly
❖ The negative of the gradient gives the direction in which
the function decreases most rapidly
❖ The derivation clarifies the reason why the weight updates
should be done after all of the δk and δj expressions have
been calculated, rather than during backpropagation
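❖ For the two activation functions most commonly used with
backpropagation, these derivative identities are
Binary sigmoid: f(x) = 1/(1 + e^(−x)), so f′(x) = f(x)[1 − f(x)]
Bipolar sigmoid: f(x) = 2/(1 + e^(−x)) − 1, so f′(x) = ½[1 + f(x)][1 − f(x)]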
Example-1
Using a backpropagation network, calculate the new weights
for the network shown in the figure. It is presented with the
input pattern [0, 1] and the target output is 1. Use a learning
rate α = 0.25 and the identity activation function.
Solution
The initial weights are:
❖ [v11 v21 v01] = [0.6 -0.1 0.3]
❖ Output-layer weights (from the figure): [w1 w2 w0] = [0.4 0.1 -0.2]
Given sample:
❖ [x1, x2]=[0,1] and target t=1
Solution
For Z1 layer
❖ Zin1 = v01 + x1v11 + x2v21 = 0.3 + 0(0.6) + 1(-0.1) = 0.2
For Z2 layer
❖ Zin2 = v02 + x1v12 + x2v22 = 0.9 (using the Z2 weights from the figure)
Solution
Calculate the net input entering the output layer (the y layer)
❖ Yin = w0 + z1w1 + z2w2 = -0.2 + 0.2(0.4) + 0.9(0.1) = -0.03
❖ Y = f(Yin) = -0.03
Now,
f′(y_ink) = f(yin)[1 − f(yin)]
Solution
❖ f(yin) = yk = 0.2(0.4) + 0.9(0.1) + 1(-0.2) = -0.03
❖ f′(y_ink) = -0.03[1 - (-0.03)] = -0.0309
This implies,
❖ δk = (tk - yk) f′(y_ink) = (1 - (-0.03)) × (-0.0309) = -0.031827
Solution
Compute the error term δj for each hidden unit (j = 1 to 2),
used for the weights between the input and hidden layers
❖ δj = δ_inj f′(zinj)
❖ For Z1: δ_in1 = δk w1 = (-0.031827)(0.4) = -0.0127308 and
f′(zin1) = 0.2(1 - 0.2) = 0.16, so δ1 = (-0.0127308)(0.16)
= -0.002036928
Solution
Now find the changes in weights between the input and
hidden layers
❖ ∆v11 = α δ1 x1 = 0.25 × (-0.002036928) × 0 = 0
Solution
Compute the final weights of the network
❖ v11(new) = v11(old) + ∆v11 = 0.6 + 0 = 0.6
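The arithmetic above can be checked with a short script. It follows the slide's computation step by step (identity activation in the forward pass, binary-sigmoid derivative form for the error terms) and takes Zin2 = 0.9 directly from the figure, since the Z2 weights are not listed in the text:

```python
# Numerical check of Example 1, reproducing the slide's arithmetic.
alpha = 0.25
x1, x2, t = 0.0, 1.0, 1.0

# Hidden layer (identity activation in the forward pass, so z = z_in)
v11, v21, v01 = 0.6, -0.1, 0.3
z1 = v01 + x1 * v11 + x2 * v21      # 0.2
z2 = 0.9                            # Zin2 read from the figure (assumption)

# Output layer
w1, w2, w0 = 0.4, 0.1, -0.2
y = w0 + z1 * w1 + z2 * w2          # -0.03

# Error term at the output (binary-sigmoid derivative form, as on the slide)
f_prime_y = y * (1 - y)             # -0.0309
delta_k = (t - y) * f_prime_y       # -0.031827

# Error term at hidden unit Z1
delta_in1 = delta_k * w1            # -0.0127308
delta_1 = delta_in1 * z1 * (1 - z1) # -0.002036928

# Weight correction and updated weight
dv11 = alpha * delta_1 * x1         # 0 (since x1 = 0)
print(z1, y, delta_k, delta_1, v11 + dv11)
```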