2 - Neural Network

This document provides an overview of artificial neural networks and their biological inspiration. It discusses the basic structure and functioning of biological neurons, comparing them to artificial neurons. The key components of artificial neurons, such as weights, inputs, outputs and activation functions, are explained. Different types of neural network topologies, such as feedforward and recurrent networks, are described. The document aims to introduce some of the fundamental concepts behind artificial neural networks and their relationship to biological neural systems.

PARALA MAHARAJA ENGINEERING COLLEGE

BERHAMPUR

A Course on
SOFT COMPUTING

Prepared By
Mr. Suryalok Dash
Asst. Professor, Dept. of Electrical Engineering.
2 PMEC- Berhampur 11/13/2018
Artificial Neural Network
(ANN)

Human brain
 The biological nervous system is one of the most important parts of the human body.
 The brain lies at the center of the human nervous system.
 Unlike computers, which are programmed to solve problems using sequential
algorithms, the brain makes use of a massive network of parallel and distributed
computational elements called neurons.
 Neurons are responsible for thought, emotion, cognition, pattern recognition, etc.
 Each neuron is approximately 10 microns in length, and neurons operate in parallel.
Typically, the human brain consists of approximately 10^11 neurons.
 A neuron receives electrochemical signals from its various sources and in turn responds
by transmitting electrical impulses to other neurons.
 Characteristics of the Human Brain
 Ability to learn from experience
 Ability to generalize the knowledge it possesses
 Ability to perform abstraction
 Ability to make errors.

Working of Neuron:
 A neuron is composed of a nucleus within a cell body (also known as the soma).
 Long, irregularly shaped filaments called dendrites are connected to the soma. These
behave as the inputs to the neuron.
 The output channel of the neuron is called the axon. Axons are electrically active;
they are nonlinear threshold devices which produce a voltage pulse that lasts for about a
millisecond.
 If the cumulative input received by the soma raises the internal potential of the cell
(known as the membrane potential) beyond a threshold, the neuron fires to excite or inhibit other neurons.
 The axon terminates in a contact called a synapse (or synaptic junction) that connects
the axon with the dendrites of other neurons.
 The synaptic junction contains a neurotransmitter fluid, which is responsible for accelerating
or retarding the electric charges reaching the soma.
 A single neuron may have many synaptic inputs and synaptic outputs.

Computer and Human Brain:
Similarities
 Both operate on electrical signals.
 Both are compositions of a large number of simple elements.
 Both perform functions that are computational.
Differences
 Compared to the µs or ns time scales of digital computation, nerve impulses are
slow (on the order of milliseconds).
 The brain’s huge computation rate is achieved by a tremendous number of
parallel computational units, far beyond any proposed for a computer system.
 A digital computer is inherently error-free, but the brain often produces best
guesses and approximations from partially incomplete and incorrect inputs,
and these may be wrong.

Aspect                 Modern Computer                Biological Neural System
Processor              Complex                        Simple
                       High speed                     Low speed
                       One or few                     A large number
Memory                 Separate from processor        Integrated into processor
                       Localized                      Distributed
Computing              Centralized                    Distributed
                       Sequential                     Parallel
                       Stored programs                Self-learning
Reliability            Very vulnerable                Robust
Expertise              Numerical and symbolic         Perceptual problems
                       manipulations
Operating environment  Well-defined                   Poorly defined
                       Well-constrained               Unconstrained


Artificial neural network (ANN)
ANNs (or NNs) are simplified models of the biological nervous system.
Definition (Haykin, 1994): A neural network is a massively parallel
distributed processor that has a natural propensity for storing experiential knowledge and
making it available for use. It resembles the brain in two respects:
 Knowledge is acquired by the network through a learning process.
 Interneuron connection strengths, known as synaptic weights, are used to store the
knowledge.
Objective: To develop a computational device that models the brain and performs various
computational tasks at a faster rate than traditional systems.
Motivation:
 Scientists are challenged to use machines more effectively for tasks currently solved
by humans.
 Symbolic rules don't reflect the processes actually used by humans.
 Traditional computing excels in many areas, but not in others.

Applications: Pattern recognition, classification, optimization, function approximation,
vector quantization, data clustering, etc.
 An artificial neuron is also referred to as a “perceptron”.
Analogy Between BNN and ANN:

 A biological neuron receives all inputs through the dendrites, sums them, and produces
an output if the sum is greater than a threshold value.
 The input signals are passed on to the cell body through the synapses, which may
accelerate or retard an arriving signal.
 This acceleration or retardation of the input signals is modelled by the weights.
 A strong synapse, which transmits a stronger signal, has a correspondingly larger
weight, while a weak synapse has a smaller weight.

Biological Neuron    Artificial Neuron
Cell                 Neuron
Dendrites            Weights
Soma                 Net input
Axon                 Output

Features of ANN
 An artificial neural network (ANN) is typically composed of a set of parallel and
distributed processing units, called nodes or neurons.
 These are usually ordered into layers, appropriately interconnected by means of
unidirectional (or bi-directional in some cases) weighted signal channels, called
connections or synaptic weights. (Fig. next slide)
 The internal architecture of ANN provides powerful computational capabilities,
allowing for the simultaneous exploration of different competing hypotheses.
 Massive parallelism and computationally intensive learning through examples
make them suitable for application in nonlinear functional mapping, speech and
pattern recognition, categorization, data compression, and many other
applications characterized by complex dynamics and possibly uncertain
behavior.
 Three important features generally characterize an artificial neural network:
the network topology, the network transfer functions, and the
network learning algorithm.
Neural network topologies:
 The way the nodes and the interconnections are arranged within the layers of a
given ANN determines its topology.
 The choice for using a given topology is mainly dictated by the type of problem
being considered.
 The following are the topologies of the Neural Network.
The feedforward (FF) topology
 It has nodes hierarchically arranged in layers starting with the input layer and
ending with the output layer.
 In between, a number of internal layers, also called hidden layers, provide most
of the network computational power.
 The nodes in each layer are connected to the next layer through unidirectional
paths starting from one layer (source) and ending at the subsequent layer (sink).
This means that the outputs of a given layer feed the nodes of the following layer
in a forward path.
 Feedforward networks are static in nature, i.e., the output for a given pattern of
inputs is independent of the previous state of the network.
The recurrent topology
 Recurrent networks (RN) allow for feedback connections among their nodes.
 They are structured in such a way as to permit storage of information in their
output nodes through dynamic states, hence providing the network with some
sort of “memory.”
 Recurrent networks map states into states and as such are very useful for
modeling and identifying dynamic systems.

Neural network activation functions:
 The neurons take the weighted sum of their inputs from other nodes and apply
to it a mapping (usually nonlinear) called an activation function before
delivering the output to the next neuron.

 The output Ok of a typical neuron (k) having (l) inputs is given as

Ok = f( Σi=1..l wik·xi − θk )

 where f is the node’s activation function, x1, x2, . . . , xl are the node’s inputs,
w1k, w2k, . . . , wlk are the connection weights, and θk is the node’s threshold.
 The bias effect (threshold value) is intended to occasionally inhibit the activity of
some nodes.
 The activation functions can take different forms: sigmoid mapping, signum
function, step function, or linear correspondence.
 The mathematical representations for some of these mappings are:
  sigmoid: f(x) = 1 / (1 + e^(−x))
  step:    f(x) = 1 if x ≥ 0, else 0
  signum:  f(x) = +1 if x > 0, −1 if x < 0
  linear:  f(x) = x
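As a concrete illustration, the node output and these standard mappings can be written in a few lines of Python (a minimal sketch; the function names are ours, not the author's):

```python
import math

def sigmoid(x):
    # smooth squashing of the net input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def step(x):
    # hard limiter: 1 if x >= 0, else 0
    return 1 if x >= 0 else 0

def signum(x):
    # returns -1, 0 or +1 depending on the sign of x
    return (x > 0) - (x < 0)

def neuron_output(inputs, weights, theta, f):
    """O_k = f( sum_i w_ik * x_i - theta_k )"""
    net = sum(w * x for w, x in zip(weights, inputs)) - theta
    return f(net)

# two inputs, unit weights, threshold 1.5 -> behaves like a logical AND
print(neuron_output([1, 1], [1, 1], 1.5, step))  # net = 0.5 >= 0, so prints 1
```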

Neural network learning algorithms:
 Learning algorithms are used to update the weight parameters at the
interconnection level of the neurons during the training process of the network.
 The three well-known and most often used learning mechanisms are the
supervised, the unsupervised (or self-organized) and the reinforced.
Supervised learning:
 The main feature of the supervised (or active) learning mechanism is the
training by examples.
 This means that an external teacher provides the network with a set of input
stimuli for which the output is a priori known.
 During the training process, the output results are continuously compared with
the desired data.
 An appropriate learning rule uses the error between the actual output and the
target data to adjust the connection weights so as to obtain, after a number of
iterations, the closest match between the target output and the actual output.
 Supervised learning is particularly useful for feedforward networks.

 The gradient descent optimization technique and the least-mean-square (LMS)
algorithm are among the most commonly used supervised learning rules.

Unsupervised learning:
 Unsupervised or self-organized learning does not involve an external teacher
and relies instead upon local information and internal control.
 The training data and input patterns are presented to the system, and through
predefined guidelines, the system discovers emergent collective properties and
organizes the data into clusters or categories.

 An unsupervised learning scheme operates as follows. A set of training data is
presented to the system at the input layer level. The network connection
weights are then adjusted through some sort of competition among the nodes of
the output layer, where the successful candidate will be the node with the
highest value. In the process, the algorithm strengthens the connection between
the incoming pattern at the input layer and the output node corresponding to
the winning candidate.

Reinforcement learning:
 Reinforcement learning, also known as graded learning, mimics the way humans adjust
their behavior when interacting with a given physical environment.
 This is another type of learning mechanism by means of which the network
connections are modified according to feedback information provided to the
network by its environment.
 This information simply instructs the system on whether or not a correct
response has been obtained. In the case of a correct response, the corresponding
connections leading to that output are strengthened; otherwise they are weakened.

Other important terminologies of ANN
Weights:
 In an ANN, each neuron is connected to other neurons by direct links with
weights.
 The weights contain information about the input signal.
 The weights can be represented in the form of a matrix called the connection
matrix.
Bias:
 The bias included in the NN has its impact in
calculating the net input.
 The bias is considered as another input:
 Yin = Σ wi·xi + θ, where θ is the bias
Threshold:
Threshold is a set value based upon which the final output of the network may
be calculated. The threshold value is used in the activation function. A
comparison is made between the calculated net input and the threshold to
obtain the network output.
Learning Rate:
 The learning rate is denoted by “η”.
 It is used to control the amount of weight adjustment at each step of
training.
 The learning rate, ranging from 0 to 1, determines the rate of learning at
each time step.
Momentum Factor
 Convergence is made faster if a momentum factor is added to the weight-update
process.
 This is generally done in back-propagation networks.

Evolution of ANN
Year       Neural Network                   Designer
1943       McCulloch–Pitts (M-P) neuron     McCulloch and Pitts
1949       Hebb network                     Hebb
1958       Perceptron                       Frank Rosenblatt and others
1960       Adaline                          Widrow and Hoff
1972       Kohonen self-organizing map      Kohonen
1982       Hopfield network                 John Hopfield and Tank
1986       Back-propagation network         Rumelhart and others
1988       Counter-propagation network      Grossberg
1987-1990  Adaptive Resonance Theory        Carpenter and Grossberg
1988       Radial Basis Function network    Broomhead and Lowe

McCulloch–Pitts model
 This model is considered by many as the first serious attempt to model the computing
process of the biological neuron.
 The model is limited in terms of computing capability and does not actually provide
learning.
 The neuron model is quite simple in design, as shown in the figure.

 It collects the incoming signals x1, x2, . . . , xl, multiplies them by corresponding
weights w1, w2, . . . , wl, and compares the result with a predetermined bias θ before
applying the activation function, resulting in the output o.
The output is expressed by

o = 1 if Σi wi·xi > θ, otherwise o = 0

 If the weighted sum of the different signal inputs is larger than the bias θ, then
an output value of 1 is generated; otherwise the result is zero. This is done
through the step activation function.
Note:
 The McCulloch-Pitts neuron allows binary 0 or 1 states only, i.e., it is binary-activated.
 The weights can be excitatory (positive) or inhibitory (negative).
 The model topology is based on a fixed set of weights and thresholds.
 This model is most widely used for implementing logic functions.
 Notice the absence of any type of learning: there is no updating
mechanism for the synaptic weights once the system has been presented with a
set of training input–output data.
Example: Implement the AND function using an M-P neuron.

X1  X2 | O
0   0  | 0
0   1  | 0
1   0  | 0
1   1  | 1

Solution:
Assume weights W1 = W2 = 1 and θ = 0. With these assumed weights, the net inputs
for the four patterns are:
Yin = 0 (for input 0,0), Yin = 1 (for input 0,1)
Yin = 1 (for input 1,0), Yin = 2 (for input 1,1)
Hence, if we set the threshold of the activation function anywhere > 1 and < 2, we
will get the desired output.
Alternate solution: w1 = 1, w2 = 1, θ = −1, threshold = 0.1
Note: There are many such solutions!
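Both solutions can be checked with a short script (an illustrative sketch; `mp_neuron` is a name we introduce here, not part of the original model):

```python
def mp_neuron(x1, x2, w1, w2, theta, thresh):
    # McCulloch-Pitts unit: fire (output 1) when the net input reaches the threshold
    net = x1 * w1 + x2 * w2 + theta
    return 1 if net >= thresh else 0

AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

# Solution 1: w1 = w2 = 1, theta = 0, firing threshold 1.5 (any value between 1 and 2)
assert all(mp_neuron(x1, x2, 1, 1, 0, 1.5) == t for (x1, x2), t in AND.items())

# Alternate solution: w1 = w2 = 1, theta = -1, firing threshold 0.1
assert all(mp_neuron(x1, x2, 1, 1, -1, 0.1) == t for (x1, x2), t in AND.items())
print("Both M-P solutions implement AND")
```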

Hebb Network
 Hebb proposed a very simple learning method for neural networks.
 According to the Hebb rule, the weight vector increases proportionately
to the product of the input (xi) and the learning signal. Here, the learning signal
is equal to the neuron’s target output (t).
 In Hebb learning, if two interconnected neurons are ON simultaneously, then
the weights associated with these neurons are increased. The weight update rule is
wi(new) = wi(old) + t·xi (for all i, including i = 0)
Note: The Hebb rule is better suited for bipolar data (1, −1) than binary data (0, 1).

Example: Training of an AND gate with bipolar data.

X1  X2 | Target (t)
−1  −1 | −1
−1   1 | −1
 1  −1 | −1
 1   1 |  1

Solution:
 Initialize the weights to w1 = 0 and w2 = 0 with bias θ = 0.
 Take the first pattern, (−1,−1), target = −1:
Δw1 = (−1)(−1) = 1, Δw2 = (−1)(−1) = 1, Δθ = −1
So the updated weights are w1 = 1 and w2 = 1 with bias = −1.
 Take the second pattern, (−1,1), target = −1:
Δw1 = (−1)(−1) = 1, Δw2 = (−1)(1) = −1, Δθ = −1
So the updated weights are w1 = 2 and w2 = 0 with bias = −2.
 Take the third pattern, (1,−1), target = −1:
Δw1 = (−1)(1) = −1, Δw2 = (−1)(−1) = 1, Δθ = −1
So the updated weights are w1 = 1 and w2 = 1 with bias = −3.
 Take the fourth pattern, (1,1), target = 1:
Δw1 = (1)(1) = 1, Δw2 = (1)(1) = 1, Δθ = 1
So the updated weights are w1 = 2 and w2 = 2 with bias = −2.
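The hand computation above can be replayed in a few lines (an illustrative sketch; the bias is treated as a weight on a constant input x0 = 1):

```python
def hebb_train(patterns, w1=0.0, w2=0.0, bias=0.0):
    # Hebb rule: w_i(new) = w_i(old) + t * x_i  (the bias uses x0 = 1)
    for x1, x2, t in patterns:
        w1 += t * x1
        w2 += t * x2
        bias += t
    return w1, w2, bias

bipolar_and = [(-1, -1, -1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]
print(hebb_train(bipolar_and))  # matches the hand computation: (2.0, 2.0, -2.0)
```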
Example: Let’s try to solve the same problem with binary data (0, 1).
Solution:

I/P    W1    W2    W0    Target  Δw1  Δw2  Δw0  W1     W2     W0
x1,x2  (old) (old) (old) (t)                    (new)  (new)  (new)
Epoch 1 (take W0 = 0, x0 = 1)
0,0    0     0     0     0       0    0    0    0      0      0
0,1    0     0     0     0       0    0    0    0      0      0
1,0    0     0     0     0       0    0    0    0      0      0
1,1    0     0     0     1       1    1    1    1      1      1
Epoch 2
0,0    1     1     1     0       0    0    0    1      1      1
0,1    1     1     1     0       0    0    0    1      1      1
1,0    1     1     1     0       0    0    0    1      1      1
1,1    1     1     1     1       1    1    1    2      2      2

It can be seen that the weights keep increasing with every epoch; hence the weights will never settle.

Perceptron
 To overcome the limitations of the M-P model, other models involving learning
capabilities have been suggested, on the basis that they would adjust the
connection weights in accordance with some sort of optimization mechanism.
 In the early 1960s, Rosenblatt of Cornell University developed a trainable
model of an artificial neuron using a supervised learning procedure in which the
system adjusts its weights in response to a comparative signal computed
between the actual output and the target output. This is called the perceptron.
 The motivation behind this development was pattern
classification of linearly separable sets. By definition, sets are linearly separable if
there exists a hyperplanar multidimensional decision boundary that classifies the
data input into two classes.
 The perceptron architecture is a hierarchical three-level structure, shown in the figure on the
next slide.
 The input level corresponds to the sensory unit or retina.
 The second-level unit (hidden unit) is called the feature detector or associator
unit.
 The input and hidden units are connected with fixed connection weights and
thresholds.
 The third-level unit (response unit) comprises the output layer, composed of a
single node with adjustable connection weights.
 The activation function used in the perceptron is the step or the hard-limiting
(signum) activation function.
 The learning algorithm used to adjust the weights is the perceptron learning
rule. It is used to update the weights between the associator unit and the response
unit.
 The error calculation is based on comparing the target values with
those of the calculated output. The target value is either 1 or −1.
 The weights are adjusted according to the learning rule if an error occurs. If
there is no error, no weights are updated.

Training Algorithm for Perceptron:
 Step 1: Initialize the weights and thresholds to small random values.
 Step 2: Choose an input–output pattern (x(k), t(k)) from the training
input–output dataset.
 Step 3: Compute the net input and the actual output O(k).
 Step 4: If the actual output differs from the target, update each weight as

wi(new) = wi(old) + Δwi, where Δwi = η (t(k) − O(k)) xi(k)

with η being a positive update rate ranging from 0 to 1 representing the learning
rate. The same rule applies whether the activation is the step function
(Case 1, targets 0 or 1) or the signum function (Case 2, targets 1 or −1).
Step 5: If the weights have not reached steady-state values (Δwi = 0), repeat
from Step 2 and choose another training pattern.
 This learning procedure is very similar to the Hebbian learning rule, with the
difference that the connection weights here are not updated if the network
responds correctly.
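The procedure can be sketched as follows (a minimal illustration, assuming the common rule Δwi = η(t − O)xi, with the bias folded in as a weight on a constant input x0 = 1 and the signum output taken as 0 when the net input is exactly 0):

```python
def sign(net):
    # signum activation: -1 / 0 / +1 (0 exactly when net == 0)
    return (net > 0) - (net < 0)

def train_perceptron(data, eta=0.3, epochs=10):
    w = [0.0, 0.0, 0.0]                      # [w1, w2, bias w0]
    for _ in range(epochs):
        changed = False
        for x1, x2, t in data:
            x = (x1, x2, 1)                  # x0 = 1 carries the bias
            net = sum(wi * xi for wi, xi in zip(w, x))
            o = sign(net)
            if o != t:                       # update only when an error occurs
                for i in range(3):
                    w[i] += eta * (t - o) * x[i]
                changed = True
        if not changed:                      # one full error-free epoch: converged
            break
    return w

bipolar_and = [(-1, -1, -1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]
w = train_perceptron(bipolar_and)
assert all(sign(w[0]*x1 + w[1]*x2 + w[2]) == t for x1, x2, t in bipolar_and)
```

On this data the loop stops after a couple of epochs, since a single update on the first pattern already yields a separating line.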

Example: Training of an AND gate (use bipolar data and the signum activation
function), i.e.
y = −1 if net i/p < 0
y = 0  if net i/p = 0
y = 1  if net i/p > 0

X1  X2 | Target (t)
−1  −1 | −1
−1   1 | −1
 1  −1 | −1
 1   1 |  1

Solution:
 Let’s initialize the weights to random values w1 = 0 and w2 = 0 with bias = 0.
 Also assume η = 0.3.

x1  x2  x0 | t  | Net i/p | Output O | Δw1   Δw2   Δw0 (bias) | W1(new) W2(new) W0(new)
Epoch 1
−1  −1  1  | −1 | 0       | 0        | 0.6   0.6   −0.6       | 0.6     0.6     −0.6
−1   1  1  | −1 | 0.6     | 1        | 0.6   −0.6  −0.6       | 1.2     0       0
 1  −1  1  | −1 | 1.2     | 1        | −0.6  0.6   −0.6       | 0.6     0.6     −0.6
 1   1  1  |  1 | 0.6     | 1        | 0     0     0          | 0.6     0.6     −0.6
Epoch 2
−1  −1  1  | −1 | −0.6    | −1       | 0     0     0          | 0.6     0.6     −0.6
−1   1  1  | −1 | −0.6    | −1       | 0     0     0          | 0.6     0.6     −0.6
 1  −1  1  | −1 | 0.6     | −1       | 0     0     0          | 0.6     0.6     −0.6
 1   1  1  |  1 | 0.6     | 1        | 0     0     0          | 0.6     0.6     −0.6

Example: The AND gate can also be trained using binary data and the step
activation function, as below.

X1  X2 | Target (t)
0   0  | 0
0   1  | 0
1   0  | 0
1   1  | 1

Solution:
 Initialize the weights to random values w1 = 0.3 and w2 = 0.2 with bias = 0.
 Take the first pattern, (0,0): Net input = x1w1 + x2w2 = 0.
Compute the output: O = f(net input) = 0 and t = 0. Hence, no weight change.
Note: The function f is a step function.
 Take the second pattern, (0,1): Net input = x1w1 + x2w2 = 0.2.
Output O = f(net input) = 1, but target t = 0, so the weights need to be
updated. Assume the learning rate η = 0.6.
Δw1 = 0.6[0 − 1]·0 = 0 and Δw2 = 0.6[0 − 1]·1 = −0.6
So the updated weights become W1 = 0.3 + 0 = 0.3 and W2 = 0.2 − 0.6 = −0.4.

 Take the third pattern, (1,0): Net input = x1w1 + x2w2 = 0.3.
Output O = f(net input) = 1, but target t = 0, so the weights need to be
updated (η = 0.6).
Δw1 = 0.6[0 − 1]·1 = −0.6 and Δw2 = 0.6[0 − 1]·0 = 0
So the updated weights become W1 = 0.3 − 0.6 = −0.3 and W2 = −0.4 + 0 = −0.4.
 Take the fourth pattern, (1,1): Net input = x1w1 + x2w2 = −0.7.
Output O = f(net input) = 0, but target t = 1, so the weights need to be
updated (η = 0.6).
Δw1 = 0.6[1 − 0]·1 = 0.6 and Δw2 = 0.6[1 − 0]·1 = 0.6
So the updated weights become W1 = −0.3 + 0.6 = 0.3 and W2 = −0.4 + 0.6 = 0.2.
Remember, we started from the initial weights 0.3 and 0.2, and after training on all 4 patterns we are
back at the same weights.
So, NO CONVERGENCE!
Probable solution: add a bias.

 Initialize the weights to random values w1 = 0.3 and w2 = 0.2 with bias θ = −1.
 Take the first pattern, (0,0): Net input = x1w1 + x2w2 − 1 = −1.
O = f(net input) = 0 and t = 0. Hence, no weight change.
 Take the second pattern, (0,1): Net input = x1w1 + x2w2 − 1 = −0.8.
Output O = f(net input) = 0 and target t = 0. Hence, no weight change.
 Take the third pattern, (1,0): Net input = x1w1 + x2w2 − 1 = −0.7.
Output O = f(net input) = 0 and target t = 0. Hence, no weight change.
 Take the fourth pattern, (1,1): Net input = x1w1 + x2w2 − 1 = −0.5.
Output O = f(net input) = 0, but target t = 1. Hence, the weights need to be updated:
Δw1 = 0.6[1 − 0]·1 = 0.6 and Δw2 = 0.6[1 − 0]·1 = 0.6
So the updated weights become W1 = 0.3 + 0.6 = 0.9 and W2 = 0.2 + 0.6 = 0.8.
 One iteration is completed. Now rerun the next iteration over all 4 patterns:
 Net input = −1, O = f(net input) = 0 and t = 0. Hence, no weight change.
 Net input = −0.2, O = f(net input) = 0 and t = 0. Hence, no weight change.
 Net input = −0.1, O = f(net input) = 0 and t = 0. Hence, no weight change.
 Net input = 0.7, O = f(net input) = 1 and t = 1. Hence, no weight change.
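This hand computation can be replayed in code (a sketch; the bias is kept fixed at −1 and only w1, w2 are trained, as in the worked example, with the step output taken as 1 only for a strictly positive net input):

```python
def step(x):
    # step activation as used in this example: 1 only for strictly positive net input
    return 1 if x > 0 else 0

def train_biased(data, w1=0.3, w2=0.2, eta=0.6, epochs=5):
    # the bias is fixed at -1 and is not updated, matching the worked example
    for _ in range(epochs):
        for x1, x2, t in data:
            o = step(x1 * w1 + x2 * w2 - 1)
            w1 += eta * (t - o) * x1
            w2 += eta * (t - o) * x2
    return w1, w2

binary_and = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
w1, w2 = train_biased(binary_and)
print(round(w1, 1), round(w2, 1))  # settles at 0.9 0.8, as in the hand computation
assert all(step(x1 * w1 + x2 * w2 - 1) == t for x1, x2, t in binary_and)
```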
Linear Separability:
 According to the perceptron convergence theorem, as long as the patterns
used to train the perceptron are linearly separable, the learning algorithm
converges in a finite number of steps. This means that a decision boundary is
obtained that classifies the patterns, as shown in the figure below.

 To illustrate the idea, the hyperplane takes the form

w1x1 + w2x2 + · · · + wlxl − θ = 0

 In the two-dimensional case, this translates into finding the line given by w1x1 + w2x2
− θ = 0, which after learning should adequately classify the patterns.
 On training, if the weights of training input vectors of correct response +1 lie on one
side of the boundary and that of -1 lie on the other side of the boundary, then the
problem is linearly separable.

(Figure: decision regions over the points (0,0), (0,1), (1,0), (1,1). For AND, a single
straight line separates (1,1) from the other three points; for XOR, no single line can
separate the two classes.)

 Exercise 1:
Can we train a perceptron to solve the exclusive-OR (XOR) logic problem? Justify the answer.
 Exercise 2:
Train the network using the following set of input and desired-output training vectors:
X1 X2 X3 X4 t
1 -2 0 -1 -1
0 1.5 -0.5 -1 -1
-1 1 0.5 -1 1
with initial weight vector w(1) = [1, −1, 0, 0.5]T, and the learning rate η = 0.1.
Ans: w1 = −0.2, w2 = 0.3, w3 = 0.5, and w4 = 0.3.

 Single-layer perceptrons were subjected to two major criticisms:
 The perceptron lacked the ability to solve problems dealing with nonlinearly
separable patterns.
 The perceptron, once trained with a set of training data, cannot adapt its connection
weights to a new set of data (lack of generalization).
 These shortcomings put connectionist research on hold for a number of years
(1960–1970).

Adaline
 The Adaptive Linear Neuron (Adaline) was developed by Widrow in 1962 and is composed
of a linear combiner, an activation function (hard limiter), and a set of trainable signed
weights, as shown in the figure.
 In an Adaline, the input-output relationship
is linear.
 It has only one output.
 This model is more versatile in terms of
generalization and more powerful in terms
of weight adaptation.
 The weights in this model are adjusted
according to the least mean square (LMS)
algorithm which is also known as the
Widrow–Hoff learning rule or delta rule.
 The learning rule for the adaline is
formally derived using the gradient
descent algorithm.

 The perceptron learning rule stops after a finite number of learning steps, but the gradient
descent approach continues forever, converging only asymptotically to the solution.
 The LMS rule adjusts the weights at every iteration step by an
amount proportional to the gradient of the cumulative error of the network E(w):

Δwi = −η ∂E(w)/∂wi

where

E(w) = ½ Σk=1..n (tk − yk)²

 This is the cumulative error over all patterns k (k = 1 . . . n) between the desired
response tk and the actual output yk of the linear combiner.
 The weights are updated individually according to the formula

wi(new) = wi(old) + η (tk − yk) xi
 Once Adaline has been trained for a set of patterns, it can be applied successfully to
classify a noisy version of the same set.
 The steps involved in training the Adaline are quite similar to those used for the
perceptron, with the difference that the weight update law uses the desired
output and the actual output of the linear combiner (not the output of the
activation function).
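A minimal sketch of the Adaline (delta) rule on the bipolar AND data; note that the error term uses the linear combiner output y, not the activation output (the function name and parameter values are illustrative):

```python
def train_adaline(data, eta=0.1, epochs=50):
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for x1, x2, t in data:
            y = w1 * x1 + w2 * x2 + b   # linear combiner output
            err = t - y                 # delta rule uses t - y, not t - f(y)
            w1 += eta * err * x1
            w2 += eta * err * x2
            b  += eta * err
    return w1, w2, b

bipolar_and = [(-1, -1, -1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]
w1, w2, b = train_adaline(bipolar_and)
# after training, the signum of the combiner output classifies every pattern
assert all((1 if w1*x1 + w2*x2 + b > 0 else -1) == t for x1, x2, t in bipolar_and)
```

On this data the weights settle near the least-squares solution (0.5, 0.5, −0.5), which classifies all four patterns correctly when followed by a hard limiter.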

 Training Algorithm for Adaline:
The steps mirror those of the perceptron: initialize the weights, present a training
pattern, compute the linear combiner output y, update each weight as
wi(new) = wi(old) + η (t − y) xi, and repeat until the cumulative error stops
decreasing. The only difference from the perceptron is that the update uses the
linear combiner output y rather than the output of the activation function.
Madaline
 Like the perceptron, the Adaline is also unable to train on patterns belonging to nonlinearly
separable spaces.
 It was found that combining a number of Adaline units in parallel can solve this
problem. This network is called a Madaline (Many Adalines).
 For instance, it was found that the XOR logic function could be effectively trained by
combining two Adaline units in parallel through an AND logic gate. (Fig. next slide)
 The weights connecting the Adaline layer to the Madaline layer are fixed and
positive, with fixed values. The weights between the input layer and the Adaline layer are
adjustable during the training process.
 The training process of the Madaline is similar to that of the Adaline.
 However, the Madaline was still restricted in its capability for dealing with complex
functional mappings and multi-class pattern recognition problems.

Kohonen’s self-organizing network (or Map)
(KSON or KSOM)
 The Kohonen self-organizing network (KSON), also known as the Kohonen self-
organizing map (KSOM) or self-organizing map (SOM), belongs to the class of
unsupervised learning networks, i.e., it updates its weighting parameters without the need
for performance feedback from a teacher or a network trainer.
 One major feature of this network is that the nodes distribute themselves across the
input space to recognize groups of similar input vectors, while the output nodes
compete among themselves to be fired one at a time in response to a particular input
vector. This process is known as competitive learning.
 When suitably trained, the network produces a low dimension representation of the
input space that preserves the ordering of the original structure of the network. This
implies that two input vectors with similar pattern characteristics excite two physically
close layer nodes.
 KSONs have been used extensively for clustering applications such as speech
recognition, vector coding, and texture segmentation and robotics applications.
 A schematic representation of a typical KSON with a 2-D output configuration is shown
in Figure next slide
 The learning here permits the clustering of input data into a smaller set of elements
having similar characteristics (features).
 It is based on the competitive learning technique also known as the ‘winner take all’
strategy.
 Let us presume that the input pattern is given by the vector x and let us denote by wij
the weight vector connecting the pattern input elements to an output node with
coordinates provided by indices i and j.
 Let us also denote Nc as being the neighborhood around the winning output candidate,
which has its size decreasing at every iteration of the algorithm until convergence
occurs.
 The steps of the learning algorithm are summarized as follows:
Step 1: Initialize all weights to small random values. Set a value for the initial learning rate
α and a value for the neighborhood Nc.
Step 2: Choose an input pattern x from the input data set.
Step 3: Select the winning unit c (the index of the best-matching output unit) such that the
performance index I, given by the Euclidean distance from x to wij, is minimized:

I = min(i,j) ‖x − wij‖
Step 4: Update the weights according to the global network updating phase from iteration
k to iteration k + 1 as:

wij(k + 1) = wij(k) + α(k) [x − wij(k)]   for units (i, j) within Nc(k)
wij(k + 1) = wij(k)                        otherwise

where α(k) is the adaptive learning rate (a strictly positive value smaller than
unity) and Nc(k) is the neighborhood of the unit c at iteration k.
Step 5: The learning rate and the neighborhood are decreased at every iteration according
to an appropriate scheme. For instance, Kohonen suggested a shrinking function in the
form of α(k) = α(0)(1 − k/T), with T being the total number of training cycles and α(0)
the starting learning rate bounded by one.
Step 6: The learning scheme continues until a sufficient number of iterations has been
reached or until each output reaches a threshold of sensitivity with respect to a portion of
the input space.
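The steps above can be sketched for a 1-D map with scalar inputs (an illustrative toy; the data, node count, and shrinking schedules are our assumptions, not the author's):

```python
import random

def train_ksom(data, n_nodes=4, T=200, alpha0=0.5, seed=0):
    rng = random.Random(seed)
    # Step 1: small random weights for a 1-D map over scalar inputs
    w = [rng.uniform(0, 1) for _ in range(n_nodes)]
    for k in range(T):
        alpha = alpha0 * (1 - k / T)        # Kohonen's shrinking learning rate
        nc = 1 if k < T // 2 else 0         # neighborhood radius shrinks over time
        x = rng.choice(data)                # Step 2: pick an input pattern
        # Step 3: winner = node whose weight is closest to x (min Euclidean distance)
        c = min(range(n_nodes), key=lambda i: abs(x - w[i]))
        # Step 4: move the winner and its neighbors toward x
        for i in range(n_nodes):
            if abs(i - c) <= nc:
                w[i] += alpha * (x - w[i])
    return sorted(w)

# two clusters of scalar inputs, around 0.1 and 0.9
data = [0.05, 0.1, 0.15, 0.85, 0.9, 0.95]
w = train_ksom(data)
# some map nodes should settle near each cluster centre
assert min(abs(v - 0.1) for v in w) < 0.15
assert min(abs(v - 0.9) for v in w) < 0.15
```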

Recurrent network
 Feedforward networks are networks in which the output of a given node is always
injected into the following layer. Recurrent networks, on the other hand, have some
outputs directed back as inputs to node(s) in the same or preceding layer.
 The presence of cycles in a recurrent network provides the abilities of dynamically
encoding, storing, and retrieving context information and hence their name as dynamic
neural networks.
 The figure shows a typical structure of a recurrent neural network (RNN). The “feedback”
connections allow the network to tackle problems involving dynamic processes.

 The architecture is similar to the feedforward network, except that there are
connections going from nodes in the output layer to those in the input layer. There
are also some self-loops in the hidden layer.
 When inputs are presented at the input layer over time, a unit computes its activation
output just as a feedforward node does. However, its net input carries additional
information reflecting the state of the network. For subsequent inputs, the state is
essentially a function of the previous and current inputs. As a result, the behavior of
the network depends not only on the current excitation, but also on past history.
 A large amount of information can be stored in a recurrent network. At the same time, however, the
feedback connections pose a challenging problem in terms of training.
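This dependence on past history can be illustrated with a single recurrent unit (a sketch with arbitrary weights, not the author's network):

```python
import math

def rnn_step(x, h_prev, w_in=1.0, w_rec=0.5):
    # net input combines the current input with the fed-back previous state
    return math.tanh(w_in * x + w_rec * h_prev)

def run(sequence):
    h = 0.0                 # initial network state
    for x in sequence:
        h = rnn_step(x, h)  # the state carries past information forward
    return h

# identical final input (0.5), different histories -> different final outputs
assert run([1.0, 0.0, 0.5]) != run([0.0, 0.0, 0.5])
```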

Mr. Suryalok Dash


59 PMEC- Berhampur 11/13/2018

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy