Applications
• Introduction
• Winner-take-all learning
• Max net
• Hamming net
• Counter-propagation network
• Feature mapping
• Self-organizing feature map
• Applications of neural algorithms
• Neural network control
1
Introduction
The main property of a neural network is its ability to learn from its
environment and to improve its performance through learning. So far we
have considered supervised or active learning: learning with an external
“teacher”, or supervisor, who presents a training set to the network. But
another type of learning also exists: unsupervised learning.
• Learning without a teacher.
• No feedback to indicate the desired outputs.
• The network must discover by itself the relationships of interest in the
input data – e.g., patterns, features, regularities, correlations, or
categories.
• The discovered relationships are translated into outputs.
2
• Unsupervised learning
– Training samples contain only input patterns
• No desired output is given (teacher-less)
– Learn to form classes/clusters of sample patterns according to
similarities among them
• Patterns in a cluster would have similar features
• No prior knowledge of which features are important for
classification, or of how many classes there are.
• In contrast to supervised learning, unsupervised or self-
organised learning does not require an external teacher.
During the training session, the neural network receives a
number of different input patterns, discovers significant
features in these patterns and learns how to classify input
data into appropriate categories. Unsupervised learning
tends to follow the neuro-biological organisation of the brain.
• Unsupervised learning algorithms aim to learn rapidly and
can be used in real time.
3
Hebbian learning
In 1949, Donald Hebb proposed one of the key
ideas in biological learning, commonly known as
Hebb’s Law. Hebb’s Law states that if neuron i is
near enough to excite neuron j and repeatedly
participates in its activation, the synaptic connection
between these two neurons is strengthened and
neuron j becomes more sensitive to stimuli from
neuron i.
4
Hebb’s Law can be represented in the form of two
rules:
1. If two neurons on either side of a connection
are activated synchronously, then the weight of
that connection is increased.
2. If two neurons on either side of a connection
are activated asynchronously, then the weight
of that connection is decreased.
Hebb’s Law provides the basis for learning
without a teacher. Learning here is a local
phenomenon occurring without feedback from
the environment.
5
Hebbian learning in a neural network
[Figure: a layer of input neurons (input signals) fully connected to a layer of output neurons (output signals); neuron i in the input layer connects to neuron j in the output layer.]
6
Using Hebb’s Law we can express the adjustment
applied to the weight w_ij at iteration p in the
following form:

Δw_ij(p) = F[ y_j(p), x_i(p) ]
Step 2: Activation.
10
Hebbian learning example
To illustrate Hebbian learning, consider a fully
connected feedforward network with a
single layer of five computation neurons. Each
neuron is represented by a McCulloch and Pitts model
with the sign activation function. The network is
trained on the following set of input vectors:
X1 = [0 0 0 0 0]ᵀ   X2 = [0 1 0 0 1]ᵀ   X3 = [0 0 0 1 0]ᵀ   X4 = [0 0 1 0 0]ᵀ   X5 = [0 1 0 0 1]ᵀ
11
Initial and final states of the network
[Figure: five input neurons x1–x5 connected to five output neurons y1–y5, shown in the initial state (left) and the final state (right); input layer and output layer labelled.]
12
Initial and final weight matrices
Initial weight matrix (rows: input neurons 1–5, columns: output-layer neurons 1–5):
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1

Final weight matrix:
0      0      0      0      0
0      2.0204 0      0      2.0204
0      0      1.0200 0      0
0      0      0      0.9996 0
0      2.0204 0      0      2.0204
13
A test input vector, or probe, is defined as
X = [1 0 0 0 1]ᵀ
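To make the example concrete, here is a minimal sketch in NumPy. Assumptions not on the slides: a plain activity-product Hebb rule with learning rate 0.1 and a firing threshold of 0.5; the slides' exact final weights come from a generalised Hebbian rule with a forgetting factor, so the numbers will differ, but the same connections — those between co-active inputs and outputs — are the ones that grow.

```python
import numpy as np

# Hedged sketch of the five-neuron Hebbian example.
# Assumptions: plain Hebb rule (no forgetting factor), alpha = 0.1,
# binary threshold 0.5 instead of the slides' sign activation.

def fire(net, theta=0.5):
    return (net > theta).astype(float)          # 1 if the neuron fires, else 0

X_train = [np.array([0., 0., 0., 0., 0.]),      # X1
           np.array([0., 1., 0., 0., 1.]),      # X2
           np.array([0., 0., 0., 1., 0.]),      # X3
           np.array([0., 0., 1., 0., 0.]),      # X4
           np.array([0., 1., 0., 0., 1.])]      # X5

W = np.eye(5)                                   # initial weights: identity matrix
alpha = 0.1

for epoch in range(50):
    for x in X_train:
        y = fire(W @ x)                         # output of each neuron
        W += alpha * np.outer(y, x)             # Hebb: reinforce co-active pairs

print(np.round(W, 2))                           # w22, w25, w52, w55 (and w33, w44) grow
probe = np.array([1., 0., 0., 0., 1.])
print(fire(W @ probe))                          # response to the probe vector
# (with the slides' forgetting-factor variant, the unused self-connection
#  w11 would also have decayed, which changes the response to this probe)
```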
– If these class nodes compete with each other, eventually only one may
win and all the others lose (winner-takes-all). The winner represents
the computed classification of the input.
15
Competitive learning
In competitive learning, neurons compete among
themselves to be activated.
While in Hebbian learning, several output neurons
can be activated simultaneously, in competitive
learning, only a single output neuron is active at
any time.
The output neuron that wins the “competition” is
called the winner-takes-all neuron.
16
• Winner-takes-all (WTA):
– Among all competing nodes, only one will win and all
others will lose
– We mainly deal with single winner WTA, but multiple
winners WTA are possible (and useful in some
applications)
– Easiest way to realize WTA: have an external, central
arbitrator (a program) to decide the winner by
comparing the current outputs of the competitors (break
the tie arbitrarily)
– This is biologically unsound (no such external
arbitrator exists in the biological nervous system).
17
• Ways to realize competition in NN
– Lateral inhibition (Maxnet, Mexican hat): the output of each node
feeds to the others through inhibitory connections
(negative weights, w_ij, w_ji < 0)
– Resource competition: the output of node k is distributed to
nodes i and j in proportion to w_ik and w_jk, as well as to x_i and x_j
• self decay (w_ii, w_jj < 0)
• biologically sound
18
• This rule is an example of competitive learning, and it is used for
unsupervised network training.
• Typically, winner-takes-all learning is used for learning the statistical
properties of the inputs.
• The learning is based on the premise that one of the neurons in
the layer, say the mth, has the maximum response due to input x
• This neuron is declared the winner.
• The weight vectors are defined as below
20
The individual weight adjustment becomes
21
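The adjustment formula on the original slide was lost in extraction. As a hedged sketch of the standard winner-take-all rule, only the winning neuron's weight vector is moved toward the current input:

```python
import numpy as np

def wta_update(W, x, alpha=0.1):
    """One winner-take-all learning step (standard rule assumed here):
    the neuron with the maximum response wins, and only its weight
    vector is moved a fraction alpha toward the input x."""
    m = int(np.argmax(W @ x))        # winner: maximum response to x
    W[m] += alpha * (x - W[m])       # adjust the winner's weights only
    return m
```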
HAMMING NET AND MAXNET
23
Fig: Hamming network for n bit bipolar binary vectors representing p classes:
(a) classifier network and (b) neurons' activation function.
24
The above equation is equivalent to
The weight matrix WH of the Hamming network can be created by encoding the
class prototype vectors as its rows, as shown below.
25
Adding the fixed bias value of n/2 to the input of each neuron results
in the total input net_m.
An input vector that is the complement of the prototype of class m results
in f(net_m) = 0.
26
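As a hedged sketch of the matching layer described above (the two 4-bit bipolar prototypes below are illustrative, not taken from the slides): each row of WH is a prototype scaled by 1/2, the bias is n/2, and net_m then equals n minus the Hamming distance between the input and prototype m.

```python
import numpy as np

# Hedged sketch of the Hamming-net matching layer (illustrative prototypes).
prototypes = np.array([[ 1, -1,  1, -1],    # class 1 prototype
                       [-1, -1,  1,  1]],   # class 2 prototype
                      dtype=float)
n = prototypes.shape[1]                     # n-bit bipolar vectors

WH = prototypes / 2.0                       # prototypes encoded as scaled rows
x = np.array([1, -1, 1, -1], dtype=float)   # input equal to the class 1 prototype

net = WH @ x + n / 2.0                      # = n - Hamming distance to each prototype
f = net / n                                 # activation scaled to [0, 1]
print(net, f)                               # class 1: net = 4, f = 1; class 2: net = 2, f = 0.5
print("winner:", int(np.argmax(net)) + 1)   # strongest match selects the class
```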
Fig. MAXNET for p classes: (a) network architecture and (b) neuron's activation function.
27
• MAXNET needs to be employed as a second layer only for cases in which an
enhancement of the initial dominant response of the m-th node is required.
• As a result of MAXNET recurrent processing, the mth node responds
positively, as opposed to all remaining nodes whose responses should have
decayed to zero.
• MAXNET is a recurrent network involving both excitatory and inhibitory
connections.
• The excitatory connection within the network is implemented in the form of
a single positive self-feedback loop with a weighting coefficient of 1.
• All the remaining connections of this fully coupled feedback network are
inhibitory.
• They are represented as p − 1 cross-feedback synapses with coefficient −ε
from each output.
• The second-layer weight matrix W_M, of size p × p, therefore has 1 on its
diagonal (self-excitation) and −ε in every off-diagonal entry (lateral inhibition).
28
• The quantity ε is called the lateral interaction coefficient.
• With the activation function as shown in the previous figure, and the
initializing inputs y⁰ fulfilling the conditions 0 ≤ y_i⁰ < 1, the recurrence of
the MAXNET layer is y^(k+1) = Γ[ W_M y^k ],
where Γ[·] is a nonlinear diagonal matrix operator with entries f(·).
29
• Assume that y_m⁰ > y_i⁰ for i = 1, 2, . . . , p and i ≠ m.
• During the first recurrence, all entries of y¹ are computed
on the linear portion of f(net).
• The smallest of all y⁰ entries is the first to reach the level f(net) = 0,
say at the k-th step.
• The clipping of one output entry slows down the decrease
of y_m in all forthcoming steps.
• Then the second smallest entry of y⁰ reaches f(net) = 0.
• The process repeats itself until only one value, at the output of the
m-th node, remains nonzero.
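A hedged sketch of these recurrences, assuming p = 4 initial responses and a lateral interaction coefficient ε = 0.15 (any 0 < ε < 1/p works):

```python
import numpy as np

def maxnet(y0, eps=0.15, max_iters=100):
    """MAXNET recurrence: self-feedback weight 1, lateral inhibition -eps,
    activations clipped at zero, repeated until one node survives."""
    y = np.array(y0, dtype=float)
    p = len(y)
    WM = (1 + eps) * np.eye(p) - eps * np.ones((p, p))   # 1 on diagonal, -eps elsewhere
    for _ in range(max_iters):
        y = np.maximum(0.0, WM @ y)                      # clip negative activations
        if np.count_nonzero(y) <= 1:
            break
    return y

print(maxnet([0.3, 0.7, 0.5, 0.2]))   # only the node that started at 0.7 stays positive
```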
30
The basic idea of competitive learning was
introduced in the early 1970s.
In the late 1980s, Teuvo Kohonen introduced a
special class of artificial neural networks called
self-organizing feature maps. These maps are
based on competitive learning.
31
What is a self-organizing feature map?
Our brain is dominated by the cerebral cortex, a
very complex structure of billions of neurons and
hundreds of billions of synapses. The cortex
includes areas that are responsible for different
human activities (motor, visual, auditory,
somatosensory, etc.), and associated with different
sensory inputs. We can say that each sensory
input is mapped into a corresponding area of the
cerebral cortex. The cortex is a self-organizing
computational map in the human brain.
32
The Kohonen network
The Kohonen model provides a topological
mapping. It places a fixed number of input
patterns from the input layer into a higher-
dimensional output or Kohonen layer.
Training in the Kohonen network begins with the
winner’s neighborhood of a fairly large size. Then,
as training proceeds, the neighborhood size
gradually decreases.
33
Feature-mapping Kohonen model
[Figure: feature-mapping Kohonen model, panels (a) and (b); the Kohonen layer is shown with input (1, 0) in (a) and input (0, 1) in (b).]
34
Architecture of the Kohonen Network
[Figure: input layer with neurons x1, x2 fully connected to output layer with neurons y1, y2, y3.]
35
The lateral connections are used to create a
competition between neurons.
The neuron with the largest activation level
among all neurons in the output layer becomes
the winner. This neuron is the only neuron that
produces an output signal. The activity of all
other neurons is suppressed in the competition.
The lateral feedback connections produce
excitatory or inhibitory effects, depending on the
distance from the winning neuron. This is
achieved by the use of a Mexican hat function
which describes synaptic weights between
neurons in the Kohonen layer.
36
The Mexican hat function of lateral connection
[Figure: connection strength (peak 1) versus distance from the winning neuron — a central excitatory effect flanked by inhibitory effects on either side.]
37
In the Kohonen network, a neuron learns by
shifting its weights from inactive connections to
active ones. Only the winning neuron and its
neighbourhood are allowed to learn. If a neuron
does not respond to a given input pattern, then
learning cannot occur in that particular neuron.
The competitive learning rule defines the change
Δw_ij applied to synaptic weight w_ij as

Δw_ij = α (x_i − w_ij), if neuron j wins the competition
Δw_ij = 0,              if neuron j loses the competition

where the winning (best-matching) neuron j_X is found by the
minimum-distance Euclidean criterion:

j_X = arg min_j ‖ X − W_j ‖ , j = 1, 2, . . ., m
40
Suppose, for instance, that the 2-dimensional input
vector X is presented to the three-neuron Kohonen
network,
X = [0.52  0.12]ᵀ
41
We find the winning (best-matching) neuron j_X using the minimum-distance Euclidean criterion.
42
The updated weight vector W_3 at iteration (p + 1)
is determined as:
43
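The distance calculations and the update on these slides were lost in extraction, so the sketch below shows the step with illustrative initial weight vectors (an assumption, not the slides' values): compute the Euclidean distance from X to each neuron's weight vector, pick the minimum, and move only that neuron's weights toward X.

```python
import numpy as np

X = np.array([0.52, 0.12])            # the input vector from the slide
W = np.array([[0.27, 0.81],           # neuron 1 (illustrative initial weights)
              [0.42, 0.70],           # neuron 2
              [0.43, 0.21]])          # neuron 3
alpha = 0.1

d = np.linalg.norm(X - W, axis=1)     # Euclidean distance to each weight vector
j = int(np.argmin(d))                 # winner-takes-all (best-matching) neuron
W[j] += alpha * (X - W[j])            # only the winner moves toward X

print("winner:", j + 1, "updated weights:", np.round(W[j], 3))
```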
Competitive Learning Algorithm
Step 1: Initialization.
Set initial synaptic weights to small random
values, say in an interval [0, 1], and assign a small
positive value to the learning rate parameter α.
44
Step 2: Activation and Similarity Matching.
Activate the Kohonen network by applying the
input vector X, and find the winner-takes-all (best-matching)
neuron j_X at iteration p, using the
minimum-distance Euclidean criterion.
47
Competitive learning in the Kohonen network
To illustrate competitive learning, consider the
Kohonen network with 100 neurons arranged in the
form of a two-dimensional lattice with 10 rows and
10 columns. The network is required to classify
two-dimensional input vectors - each neuron in the
network should respond only to the input vectors
occurring in its region.
The network is trained with 1000 two-dimensional
input vectors generated randomly in a square
region in the interval between –1 and +1. The
learning rate parameter α is equal to 0.1.
48
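As a hedged sketch of this experiment (winner-only updates with α = 0.1; the slides' full Kohonen network would also update a shrinking neighbourhood around the winner, and the initial weight range here is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.uniform(-0.5, 0.5, size=(100, 2))   # 100 neurons, random initial weights (assumed range)
alpha = 0.1

samples = rng.uniform(-1.0, 1.0, size=(1000, 2))   # 1000 random 2-D inputs in [-1, 1]
for epoch in range(10):                            # ~10,000 presentations in total
    for x in samples:
        j = np.argmin(np.linalg.norm(x - W, axis=1))   # best-matching neuron
        W[j] += alpha * (x - W[j])                     # pull only the winner toward x

print(np.round(W[:5], 2))   # after training, the weights spread over the input square
```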
Initial random weights
[Plot: W(2,j) versus W(1,j); both axes run from −1 to 1.]
49
Network after 100 iterations
[Plot: W(2,j) versus W(1,j); both axes run from −1 to 1.]
50
Network after 1000 iterations
[Plot: W(2,j) versus W(1,j); both axes run from −1 to 1.]
51
Network after 10,000 iterations
[Plot: W(2,j) versus W(1,j); both axes run from −1 to 1.]
52
Competitive learning (Kohonen 1982) is a special case of
SOM (Kohonen 1989)
In competitive learning,
the network is trained to organize input vector space into
subspaces/classes/clusters
each output node corresponds to one class
the output nodes are not ordered: random map
[Figure slides: scatter plots of sample patterns in the Height–IQ plane. The first plots mark cluster_1 and labelled points A, B and C, posing the classification question “try: which class is C?”; the later plots show the same samples without any class labels.]
Categorize the input patterns into several
classes based on the similarity among patterns.
[Plot: sample patterns in the Height–IQ plane.]
61
Categorize the input patterns into several
classes based on the similarity among patterns.
How many classes may we have?
[Plot: sample patterns in the Height–IQ plane.]
62
Categorize the input patterns into several
classes based on the similarity among patterns.
2 clusters
[Plot: the sample patterns in the Height–IQ plane grouped into 2 clusters.]
63
Categorize the input patterns into several
classes based on the similarity among patterns.
3 clusters
[Plot: the sample patterns in the Height–IQ plane grouped into 3 clusters.]
64
Categorize the input patterns into several
classes based on the similarity among patterns.
4 clusters
[Plot: the sample patterns in the Height–IQ plane grouped into 4 clusters.]
65
Suppose that we have p prototypes centered at x(1),
x(2), …, x(p).
Given pattern x, it is assigned to the class label of the i-th prototype if x is
closer to x(i) than to any other prototype, i.e. if ‖x − x(i)‖ ≤ ‖x − x(j)‖ for all j = 1, 2, …, p.
[Figure: the input plane divided into four regions labelled 1–4, centred at the prototypes x(1), x(2), x(3) and x(4).]
67
[Figure: the same four prototype regions, with a new pattern marked “Class ?” to be assigned to the nearest prototype.]
68
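A brief sketch of this nearest-prototype assignment, with four illustrative 2-D prototype centres (the actual centres in the figure are not given):

```python
import numpy as np

prototypes = np.array([[1.0, 8.0],    # x(1) -- illustrative centres
                       [7.0, 8.0],    # x(2)
                       [2.0, 2.0],    # x(3)
                       [8.0, 1.0]])   # x(4)

def classify(x):
    """Assign x the label of the closest prototype (minimum Euclidean distance)."""
    d = np.linalg.norm(prototypes - x, axis=1)
    return int(np.argmin(d)) + 1      # 1-based class label

print(classify(np.array([6.5, 7.0])))  # closest to x(2) -> class 2
```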
Applications of neural algorithms
• Neural networks represent a class of very
powerful, general-purpose tools that have been
successfully applied to prediction, classification
and clustering problems. They are used in a
variety of areas, from speech and character
recognition to detecting fraudulent transactions,
from medical diagnosis of heart attacks to process
control and robotics, from predicting foreign
exchange rates to detecting and identifying
radar targets.
69
We demonstrate an application of a multilayer
feedforward network for printed
character recognition.
70
Bit maps for digit recognition
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36 37 38 39 40
41 42 43 44 45
71
How do we choose the architecture of a
neural network?
The number of neurons in the input layer is
decided by the number of pixels in the bit map.
The bit map in our example consists of 45
pixels, and thus we need 45 input neurons.
The output layer has 10 neurons – one neuron for each digit to be recognised.
72
How do we determine an optimal number
of hidden neurons?
Complex patterns cannot be detected by a small
number of hidden neurons; however too many
of them can dramatically increase the
computational burden.
Another problem is overfitting. The greater the
number of hidden neurons, the greater the
ability of the network to recognise existing
patterns. However, if the number of hidden
neurons is too big, the network might simply
memorise all training examples.
73
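As a hedged sketch of the resulting architecture — 45 inputs for the bit-map pixels, one hidden layer, and 10 outputs, one per digit — a forward pass might look as follows; the hidden-layer size of 10 and the sigmoid activations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 45, 10, 10           # 45 pixels, assumed 10 hidden neurons, 10 digits

W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))  # input-to-hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 0.1, (n_out, n_hidden)) # hidden-to-output weights
b2 = np.zeros(n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(bitmap):
    """bitmap: 45-element 0/1 vector; returns one activation per digit 0-9."""
    h = sigmoid(W1 @ bitmap + b1)
    return sigmoid(W2 @ h + b2)

digit = rng.integers(0, 2, size=45).astype(float)   # placeholder bit map
print(int(np.argmax(forward(digit))))                # index of the most active output neuron
```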
Neural network for printed digit recognition
[Figure: the 45 bit-map pixels feed 45 input neurons, which connect through a hidden layer to 10 output neurons; for the digit presented, a single output neuron responds with 1 while the others remain at 0.]
74
What are the test examples for character
recognition?
75
Learning curves of the digit-recognition three-layer neural networks
[Plot: error on a logarithmic scale (10² down to 10⁻⁴) versus training epochs (0 to 250) for four networks:
1 – two hidden neurons; 2 – five hidden neurons; 3 – ten hidden neurons; 4 – twenty hidden neurons.]
76
Performance evaluation of the digit-recognition neural networks
[Plot: recognition error versus noise level (0 to 0.5).]
77
Can we improve the performance of the
character recognition neural
network?
A neural network is as good as the examples
used to train it.
Therefore, we can attempt to improve digit
recognition by feeding the
network with
“noisy” examples of digits from 0 to 9.
78
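One way such “noisy” examples might be generated — a hedged sketch, since the slides do not say how theirs were produced — is to flip a random fraction of the bit-map pixels:

```python
import numpy as np

def add_noise(bitmap, noise_level, rng):
    """Flip a fraction noise_level of the pixels in a 0/1 bit map."""
    noisy = bitmap.copy()
    n_flip = int(round(noise_level * noisy.size))
    idx = rng.choice(noisy.size, size=n_flip, replace=False)
    noisy[idx] = 1 - noisy[idx]              # flip 0 <-> 1
    return noisy

rng = np.random.default_rng(0)
clean = rng.integers(0, 2, size=45)          # placeholder digit bit map
print(add_noise(clean, 0.2, rng))            # 20% of the pixels flipped
```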
Performance evaluation of the digit recognition
network trained with “noisy”
examples
[Plot: recognition error versus noise level (0 to 0.5) for the network trained with “noisy” examples.]
79
Characteristics of NNs
Learning from experience: complex, difficult-to-solve problems can be
tackled when plenty of data describing the problem is available
Generalizing from examples: Can interpolate from previous
learning and give the correct response to unseen data
Rapid applications development: NNs are generic machines
and quite independent from domain knowledge
Adaptability: Adapts to a changing environment, if is
properly designed
Computational efficiency: Although the training of a neural
network demands a lot of computer power, a trained network
demands almost nothing in recall mode
Non-linearity: Not based on linear assumptions about the real
world
80
Neural Networks Projects Are Different
• Projects are data driven: Therefore, there is a need to collect and analyse data
as part of the design process and to train the neural network. This task is often
time-consuming and the effort, resources and time required are frequently
underestimated
• It is not usually possible to specify fully the solution at the design stage:
Therefore, it is necessary to build prototypes and experiment with them in order
to resolve design issues. This iterative development process can be difficult to
control
• Performance, rather than speed of processing, is the key issue: More
attention must be paid to performance issues during the requirements analysis,
design and test phases. Furthermore, demonstrating that the performance meets
the requirements can be particularly difficult.
• These issues affect the following areas:
– Project planning
– Project management
– Project documentation
81
Project/Process life cycle:
Application Identification
Feasibility Study
Development and validation of prototype:
– Design Prototype
– Data Collection
– Build, Train and Test
– Optimize prototype
– Validate prototype
Implement System
Validate System
82
NNs in real problems
Rest of System
Raw data
Pre-processing
Feature vector
Input encode
Network inputs
Neural Network
Network outputs
Output decode
Decoded outputs
Post-processing
Rest of System
83
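The stages above might be wired together as in the sketch below; every function here is a hypothetical placeholder standing in for application-specific code, not part of any real API.

```python
# Hedged sketch of the processing chain around a deployed network.
# All names below are hypothetical placeholders.

def preprocess(raw_data):
    """Rest of system -> feature vector: clean the raw data, extract features."""
    return [float(v) for v in raw_data]

def encode_inputs(features):
    """Feature vector -> network inputs: scale into the trained input range."""
    lo, hi = min(features), max(features)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in features]

def decode_outputs(outputs):
    """Network outputs -> decoded outputs: e.g. pick the most active class."""
    return max(range(len(outputs)), key=lambda i: outputs[i])

def run(raw_data, network, postprocess):
    x = encode_inputs(preprocess(raw_data))  # pre-processing and input encoding
    y = network(x)                           # the neural network itself
    return postprocess(decode_outputs(y))    # decoding and post-processing
```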
Pre-processing
84