Artificial Neural Networks
Artificial neural networks
• Background
• Artificial neurons, what they can and cannot do
• The multilayer perceptron (MLP)
An artificial neuron
[Figure: inputs x1 ... xn with weights w1 ... wn, a bias input x0 = +1 with weight w0 = −θ, a summing unit Σ producing S, and a transfer function f producing the output y]

S = Σ_{i=1..n} w_i x_i − θ = Σ_{i=0..n} w_i x_i
y = f(S)
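A minimal Python sketch of the neuron defined above, assuming a sigmoid transfer function f and example weights chosen purely for illustration:

```python
import math

def sigmoid(s):
    # One possible transfer function f(S); a hard threshold would also work here.
    return 1.0 / (1.0 + math.exp(-s))

def neuron(x, w, theta):
    # S = sum_{i=1..n} w_i * x_i - theta   (equivalently, w_0 = -theta on x_0 = +1)
    S = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return sigmoid(S)          # y = f(S)

# Example (assumed weights): a neuron computing a fuzzy AND of two binary inputs.
print(neuron([1, 1], w=[10, 10], theta=15))   # close to 1 -> class 1
print(neuron([0, 1], w=[10, 10], theta=15))   # close to 0 -> class 0
```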
A single neuron as a classifier
• The neuron can be used as a classifier:
  y < 0.5 → class 0
  y > 0.5 → class 1
• The decision boundary (the linear discriminant) is a hyperplane.
• 2D example: the boundary is a line,
  x2 = −(w1/w2) x1 + θ/w2
• Only linearly separable classification problems can be solved.

The XOR problem
• XOR is not linearly separable – we must combine two linear discriminants.
• [Figure: the four XOR inputs in the (x1, x2) plane, with the discriminant line x2 = −x1 + 1]
• Two sigmoids implement fuzzy AND and NOR (see the sketch below).
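To make the XOR construction concrete, here is a hand-weighted sketch: two sigmoid hidden units implement fuzzy AND and NOR, and an output unit fires only when neither of them does. The weights are picked for illustration, not taken from the slides.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def unit(x, w, theta):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) - theta)

def xor_net(x1, x2):
    h_and = unit([x1, x2], w=[10, 10], theta=15)      # fuzzy AND
    h_nor = unit([x1, x2], w=[-10, -10], theta=-5)    # fuzzy NOR
    # Output unit: high only if the input is neither AND nor NOR, i.e. XOR.
    return unit([h_and, h_nor], w=[-10, -10], theta=-5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(xor_net(a, b), 3))   # ~0, ~1, ~1, ~0
```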
Application areas
• Finance
  – Forecasting
  – Fraud detection
• Industry
  – Adaptive control
  – Signal analysis
  – Data mining
• Medicine
  – Image analysis
• Consumer market
  – Household equipment
  – Character recognition
  – Speech recognition

Why neural networks?
(statistical methods are always at least as good, right?)
• Neural networks are statistical methods
• Model independence
• Adaptivity/Flexibility
• Concurrency
• Economical reasons (rapid prototyping)
Learning
[Figure: a learning agent – the learning system receives an error signal and suggests actions to the agent's action selector]
• The contribution to the error E from a particular weight w_ji is ∂E/∂w_ji
• The weight should be moved in proportion to that contribution, but in the other direction:
  ∆w_ji = −η ∂E/∂w_ji
• The error function and the transfer function must both be differentiable.
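A tiny numerical sketch of the rule ∆w = −η ∂E/∂w, using an assumed squared-error function for a single linear unit (the learning rate and the training sample are invented for illustration):

```python
# Gradient descent on a single weight: Delta_w = -eta * dE/dw,
# with E = 0.5 * (d - y)^2 and y = w * x (assumed error and model).
eta = 0.1           # learning rate (assumed)
w = 0.0
x, d = 2.0, 1.0     # one training sample: input and desired output (assumed)

for step in range(20):
    y = w * x
    dE_dw = -(d - y) * x      # derivative of E with respect to w
    w -= eta * dE_dw          # move against the gradient
print(round(w, 3))            # approaches d / x = 0.5
```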
Backpropagation
• If the transfer function is the sigmoid
  y_j = f(S_j) = 1 / (1 + e^(−S_j))
  then the derivative of the sigmoid is f′(S_j) = y_j (1 − y_j), and
  δ_j = y_j (1 − y_j) (d_j − y_j)        if j is an output node (d_j is the target – this is the result that counts!)
  δ_j = y_j (1 − y_j) Σ_k w_kj δ_k       otherwise (sum over all nodes k in the ’next’ layer, closer to the outputs)
• The weights can be updated after each pattern presentation (pattern learning), or the weight changes (∆w) can be accumulated until the end of the training set is reached (epoch or batch learning).
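A sketch of pattern (online) backpropagation with the sigmoid deltas above, on a tiny 2-2-1 network; the task (XOR), the initial weights, the learning rate and the number of epochs are all assumptions for illustration.

```python
import math, random

def f(s):                                         # sigmoid transfer function
    return 1.0 / (1.0 + math.exp(-s))

random.seed(0)
eta = 0.5                                         # learning rate (assumed)
# w_hid[j][i]: weight from input i to hidden node j (index 0 is the bias weight w0 = -theta)
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(3)]

def forward(x1, x2):
    xs = [1.0, x1, x2]                            # x0 = +1 carries the bias
    h = [f(sum(w * x for w, x in zip(w_hid[j], xs))) for j in range(2)]
    hs = [1.0] + h
    y = f(sum(w * v for w, v in zip(w_out, hs)))
    return xs, hs, y

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # example task: XOR
for epoch in range(20000):                        # pattern learning: update after each presentation
    for (x1, x2), d in data:
        xs, hs, y = forward(x1, x2)
        delta_out = y * (1 - y) * (d - y)                       # output node
        delta_hid = [hs[j + 1] * (1 - hs[j + 1]) * w_out[j + 1] * delta_out
                     for j in range(2)]                         # sum over the 'next' layer (one output here)
        w_out = [w + eta * delta_out * v for w, v in zip(w_out, hs)]
        for j in range(2):
            w_hid[j] = [w + eta * delta_hid[j] * x for w, x in zip(w_hid[j], xs)]

print([round(forward(a, b)[2], 2) for (a, b), _ in data])       # usually close to [0, 1, 1, 0]
```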
Overtraining
[Figure: typical error curves – the training set error (E) keeps decreasing over time (epochs), while the test or validation set error first decreases and then starts rising when overtraining sets in]
• Cross validation: use a third set, a validation set, to decide when to stop (find the minimum for this set, and retrain for that number of epochs).

Network size
• Overtraining is more likely to occur …
  – if we train on too little data
  – if the network has too many hidden nodes
  – if we train for too long
• The network should be slightly larger than the size necessary to represent the target function
• Unfortunately, the target function is unknown ...
• Need much more training data than the number of weights!
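A small sketch of the cross-validation stopping rule just described: record the validation-set error per epoch, find its minimum, and retrain for that number of epochs. The error curve below is invented, purely to show the shape of the procedure.

```python
def stopping_epoch(validation_errors):
    # Return the (1-based) epoch at which the validation-set error is lowest.
    best = min(range(len(validation_errors)), key=lambda e: validation_errors[e])
    return best + 1

# Illustrative (invented) validation-error curve: falls, then rises again.
val_curve = [0.90, 0.55, 0.38, 0.31, 0.29, 0.30, 0.34, 0.41]
print(stopping_epoch(val_curve))   # -> 5: retrain for 5 epochs
```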
RBF training
• Could use backprop (the transfer function is still differentiable)
• Better: train the layers separately (see the sketch below)
  – Hidden layer: find the position and size of the Gaussians by unsupervised learning (e.g. competitive learning, K-means)
  – Output layer: supervised, e.g. the Delta rule, LMS, backprop

MLP vs. RBF
• RBF (hidden) nodes work in a local region; MLP nodes are global
• MLPs do better in high-dimensional spaces
• MLPs require fewer nodes and generalize better
• RBFs can learn faster
• RBFs are less sensitive to the order in which data is presented
• RBFs make fewer false-yes classification errors
• MLPs extrapolate better
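A sketch of the two-stage RBF training described above, on an assumed toy task: the Gaussian centres come from K-means (unsupervised), a single common width is chosen heuristically, and the output layer is then fitted by supervised least squares (one alternative to the Delta rule/LMS mentioned on the slide). The data, sizes and width heuristic are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression task (assumed): learn y = sin(x) on [0, 2*pi].
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X[:, 0])

# Hidden layer: unsupervised placement of Gaussian centres with (naive) K-means.
k = 10
centres = X[rng.choice(len(X), k, replace=False)].copy()
for _ in range(50):
    labels = np.argmin(np.linalg.norm(X[:, None, :] - centres[None], axis=2), axis=1)
    for j in range(k):
        if np.any(labels == j):
            centres[j] = X[labels == j].mean(axis=0)
sigma = (X.max() - X.min()) / k        # one common Gaussian width (assumed heuristic)

def phi(Xq):
    d2 = np.sum((Xq[:, None, :] - centres[None]) ** 2, axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

# Output layer: supervised; here a direct least-squares fit of the linear output weights.
H = np.hstack([phi(X), np.ones((len(X), 1))])          # Gaussian activations + bias
w, *_ = np.linalg.lstsq(H, y, rcond=None)

X_test = np.linspace(0, 2 * np.pi, 5).reshape(-1, 1)
pred = np.hstack([phi(X_test), np.ones((len(X_test), 1))]) @ w
print(np.round(pred, 2))   # roughly sin(x) at the five test points
```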
The winner takes it all
• Problem with competitive learning: a node may become invincible.
• [Figure: two data clusters, A and B, far from the region W where the weight vectors were initialised]
• Poor initialisation: the weight vectors have been initialised to small random numbers (in W), but these are far from the data (A and B).
• The first node to win will move from W towards A or B and will always win henceforth.
• Solution: use the data to initialise the weights (as in K-means), or include the winning frequency in the distance measure, or move more nodes than only the winner.

Self organising maps
• The cerebral cortex is a two-dimensional structure, yet we can reason in more than two dimensions → dimensional reduction.
• Different neurons in the auditory cortex respond to different frequencies. These neurons are located in frequency order! → topological preservation / topographic map.
• Kohonen’s self-organising feature map (SOFM or SOM): non-linear, topologically preserving, dimensional reduction (like pressing a flower). A training sketch follows below.
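A minimal sketch of SOM training, as referenced above: the best-matching node and its neighbours on the 2-D grid are pulled towards each input, with a shrinking neighbourhood, which is what gives the topology-preserving dimensional reduction. Map size, schedules and the toy data are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
grid_h, grid_w, dim = 10, 10, 3                 # 10x10 map of 3-D codebook vectors (assumed)
W = rng.random((grid_h, grid_w, dim))           # initial codebook vectors
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"), axis=-1)

data = rng.random((2000, dim))                  # toy data: uniform in the unit cube (assumed)
n_steps = len(data)
for t, x in enumerate(data):
    eta = 0.5 * (1 - t / n_steps)               # decaying learning rate
    radius = 1 + 4 * (1 - t / n_steps)          # decaying neighbourhood radius
    # Best-matching unit (winner) in the map
    d = np.linalg.norm(W - x, axis=2)
    bmu = np.unravel_index(np.argmin(d), d.shape)
    # Gaussian neighbourhood measured on the 2-D grid (not in input space!)
    grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-grid_dist2 / (2 * radius ** 2))
    # Move the winner and its grid neighbours towards the input
    W += eta * h[..., None] * (x - W)

print(W.shape)   # the trained 10x10x3 map; nearby grid nodes end up with similar vectors
```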
SOM offline example
• A 10x10 SOM is trained on a chemical analysis of 178 wines from one region in Italy, where the grapes have grown on three different types of soil. The input is 13-dimensional.
• After training, wines from different soil types activate different regions of the SOM.
• Note that the network is not told that the difference between the wines is the soil type, nor how many such types (how many classes) there are.

SOM online example
• http://websom.hut.fi
• A two-dimensional, clickable map of Usenet news articles (from comp.ai.neural-nets)
Growing neural gas
• Growing unsupervised network (starting from two nodes)
• Dynamic neighbourhood
• Constant parameters
• Very good at following moving targets
• Can also follow jumping targets
• Current work: using GNG to define and train the hidden layer of Gaussians in an RBF network

Node positions
• Start with two nodes
• Each node has a set of neighbours, indicated by edges
• The edges are created and destroyed dynamically during training
• For each sample, the closest node, k, and all its current neighbours are moved towards the input (see the sketch below)
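A sketch of the node-position update referenced above (growing, i.e. node insertion, and edge management are omitted here; the movement fractions and the toy data are assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
eps_b, eps_n = 0.05, 0.006                  # movement fractions for winner / neighbours (assumed)

nodes = [rng.random(2), rng.random(2)]      # start with two nodes
edges = {(0, 1)}                            # neighbourhood edges (pairs of node indices)

def neighbours(k):
    return [j for (a, b) in edges for j in (a, b) if k in (a, b) and j != k]

for x in rng.random((1000, 2)):             # toy 2-D input samples (assumed)
    dists = [np.linalg.norm(x - n) for n in nodes]
    k = int(np.argmin(dists))               # closest node
    nodes[k] += eps_b * (x - nodes[k])      # move the winner towards the input
    for j in neighbours(k):                 # ... and all its current neighbours, a little less
        nodes[j] += eps_n * (x - nodes[j])

print(np.round(nodes, 2))
```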
Delaunay triangulation
• Connect the codebook vectors in all adjacent Voronoi regions.
• [Figure: Voronoi regions (red) and the Delaunay triangulation (yellow)]
• The graph of GNG edges is a subset of the Delaunay triangulation.

Dead units
• There is only one way for an edge to get ’younger’ – when the two nodes it interconnects are the two closest to the input.
• If one of the two nodes wins, but the other one is not the runner-up, then, and only then, the edge ages.
• If neither of the two nodes wins, the edge does not age!
• [Figure: the input distribution has jumped from the lower left to the upper right corner]
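A sketch of the edge-ageing rule just described, plus the standard GNG clean-up it enables (removing over-age edges and then edge-less nodes); that clean-up step and the age limit are assumptions, since the bullets above only state the ageing rule itself.

```python
import numpy as np

rng = np.random.default_rng(3)
max_age = 50                                       # edge age limit (assumed)

nodes = {i: rng.random(2) for i in range(4)}       # a few fixed codebook vectors (assumed)
ages = {(0, 1): 0, (1, 2): 0, (2, 3): 0}           # edges (i, j), i < j  ->  age

def edge(i, j):
    return (i, j) if i < j else (j, i)

for x in rng.random((2000, 2)):                    # toy 2-D inputs (assumed)
    # Winner s1 and runner-up s2 for this input
    s1, s2 = sorted(nodes, key=lambda i: np.linalg.norm(x - nodes[i]))[:2]
    # Every edge of the winner ages ...
    for e in list(ages):
        if s1 in e:
            ages[e] += 1
    # ... except the winner/runner-up edge, which is (re)created and made 'young' again.
    ages[edge(s1, s2)] = 0
    # Clean-up (assumed, standard GNG): drop over-age edges, then edge-less (dead) nodes.
    for e, age in list(ages.items()):
        if age > max_age:
            del ages[e]
    connected = {i for e in ages for i in e}
    for i in list(nodes):
        if i not in connected:
            del nodes[i]

print(sorted(nodes), sorted(ages))                 # surviving nodes and their edges
```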
The lab
(in room 1515!)
• Classification of bitmaps by supervised learning (backpropagation), using the SNNS simulator
• An illustration of some unsupervised learning algorithms, using
the GNG demo applet
– LBG/LBG-U (≈ K-means)
– HCL (Hard competitive learning)
– Neural gas
– CHL (Competitive Hebbian learning)
– Neural gas with CHL
– GNG/GNG-U (Growing neural gas)
– SOM (Self organising map)
– Growing grid