Artificial Neural Networks

Artificial neural networks are computational models inspired by biological neural networks. They are made up of interconnected nodes that store information in their weights. Neural networks are trained by adjusting the weights based on examples, allowing them to learn patterns and generalize to new data. They can implement any function given enough nodes and layers. Common applications include finance, industry, medicine, and consumer markets. Neural networks learn in three forms: supervised, unsupervised, and reinforcement learning. The backpropagation algorithm is commonly used for supervised learning by calculating the error gradient and adjusting weights to minimize error.



Artificial neural networks

• Background
• Artificial neurons, what they can and cannot do
• The multilayer perceptron (MLP)
• Three forms of learning
• The back propagation algorithm
• Radial basis function networks
• Competitive learning (and relatives)

An artificial neuron

Inputs x1 … xn are weighted by w1 … wn and summed. A constant input
x0 = +1 with weight w0 = −θ folds the threshold θ into the sum:

    S = Σ_{i=1}^{n} w_i x_i − θ = Σ_{i=0}^{n} w_i x_i

    y = f(S)

f(S) is any non-linear, saturating function, e.g. a step function or
a sigmoid:

    f(S) = 1 / (1 + e^−S)
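The neuron above can be sketched in a few lines of Python (an illustration, not code from the slides; the function names are my own):

```python
import math

def neuron(x, w, theta):
    """Weighted sum of the inputs minus the threshold, passed
    through the sigmoid transfer function."""
    S = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1.0 / (1.0 + math.exp(-S))

def neuron_bias(x, w, w0):
    """Same neuron with the threshold folded in as a bias weight
    w0 = -theta on the constant input x0 = +1."""
    S = w0 * 1.0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-S))
```

With w0 = −θ the two formulations produce identical outputs, which is why the bias trick is used throughout the rest of the slides.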


A single neuron as a classifier

The neuron can be used as a classifier:
    y < 0.5 → class 0
    y > 0.5 → class 1
The linear discriminant is a hyperplane (in a 2D example: a line).
Only linearly separable classification problems can be solved.

The XOR problem

XOR is not linearly separable – we must combine two linear
discriminants, e.g. the line x2 = −x1 + 1 together with a second one.
Two sigmoids implement fuzzy AND and NOR.

The multilayer perceptron

Inputs feed forward through one or more hidden layers to the outputs.
Output nodes are linear (for function approximation) or sigmoidal
(for classification). An MLP can implement any function, given a
sufficiently rich internal structure (number of nodes and layers).

Artificial neural networks ...

• store information in the weights, not in the nodes
• are trained, by adjusting the weights, not programmed
• can generalize to previously unseen data
• are adaptive
• are fast computational devices, well suited for parallel simulation
  and/or hardware implementation
• are fault tolerant
Application areas

Finance: forecasting, fraud detection
Industry: adaptive control, signal analysis, data mining
Medicine: image analysis
Consumer market: household equipment, character recognition, speech
recognition

Why neural networks?
(statistical methods are always at least as good, right?)

• Neural networks are statistical methods
• Model independence
• Adaptivity/flexibility
• Concurrency
• Economical reasons (rapid prototyping)

Three forms of learning

• Supervised: the learning system is given an input together with a
  target, and an error function compares the system's output to the
  target.
• Unsupervised: the learning system receives only inputs, with no
  targets.
• Reinforcement: an agent observes the state of its environment, an
  action selector picks among the suggested actions, and the
  environment returns a reward.

Back propagation

The network maps an input to an output (y), which is compared to the
desired output (d) by an error function E. The contribution to the
error E from a particular weight wji is ∂E/∂wji. The weight should be
moved in proportion to that contribution, but in the other direction:

    Δwji = −η ∂E/∂wji

The error function and the transfer function must both be
differentiable.

Back propagation update rule

Consider three consecutive layers with nodes indexed i, j, k, and the
weight wji from node i to node j.

Assumptions:

• The error is the squared error:
      E = 1/2 Σ_j (d_j − y_j)²
• The transfer (activation) function is the sigmoid:
      y_j = f(S_j) = 1 / (1 + e^−S_j)

The update rule then becomes

    Δwji = −η ∂E/∂wji = η δ_j x_i

where the factor y_j(1 − y_j) is the derivative of the sigmoid and

    δ_j = y_j (1 − y_j)(d_j − y_j)       if node j is an output node
    δ_j = y_j (1 − y_j) Σ_k w_kj δ_k     otherwise

The sum runs over all nodes k in the 'next' layer (closer to the
outputs).

Training procedure (1)

The network is initialised with small random weights. Split the data
in two – a training set and a test set.

• The training set is used for training and is passed through many
  times. Either update the weights after each presentation (pattern
  learning), or accumulate the weight changes (Δw) until the end of
  the training set is reached (epoch or batch learning).
• The test set is used to test for generalization (to see how well
  the net does on previously unseen data). This is the result that
  counts!
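The update rule above can be sketched as one pattern-learning step for a net with a single hidden layer. This is an illustrative sketch, not code from the lecture; the weight layout (a bias weight at index 0, fed by the constant input x0 = +1) is my own assumption:

```python
import math

def sigmoid(S):
    return 1.0 / (1.0 + math.exp(-S))

def train_step(x, d, W_hid, W_out, eta=0.5):
    """One pattern-learning step of back propagation.
    W_hid[j][i] feeds hidden node j, W_out[k][j] feeds output node k;
    index 0 of every weight row is a bias weight (x0 = +1)."""
    x = [1.0] + x                            # prepend the constant input
    h = [1.0] + [sigmoid(sum(w * xi for w, xi in zip(wj, x))) for wj in W_hid]
    y = [sigmoid(sum(w * hi for w, hi in zip(wk, h))) for wk in W_out]

    # Output nodes: delta_j = y_j (1 - y_j)(d_j - y_j)
    delta_out = [yk * (1 - yk) * (dk - yk) for yk, dk in zip(y, d)]
    # Hidden nodes: delta_j = y_j (1 - y_j) * sum_k w_kj delta_k,
    # summed over the nodes k in the next layer (closer to the outputs)
    delta_hid = [h[j] * (1 - h[j]) * sum(W_out[k][j] * delta_out[k]
                                         for k in range(len(W_out)))
                 for j in range(1, len(h))]

    # Delta w_ji = eta * delta_j * x_i
    for k, wk in enumerate(W_out):
        for j in range(len(wk)):
            wk[j] += eta * delta_out[k] * h[j]
    for j, wj in enumerate(W_hid):
        for i in range(len(wj)):
            wj[i] += eta * delta_hid[j] * x[i]
    return y
```

Presenting the same pattern repeatedly drives the output towards its target, since each step moves every weight against its error gradient.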

Overtraining

Typical error curves: the training set error keeps decreasing over
time (epochs), while the test or validation set error reaches a
minimum and then rises again – this is overtraining.

• Overtraining is more likely to occur …
  – if we train on too little data
  – if the network has too many hidden nodes
  – if we train for too long
• Cross validation: use a third set, a validation set, to decide when
  to stop (find the minimum for this set, and retrain for that number
  of epochs)

Network size

• The network should be slightly larger than the size necessary to
  represent the target function
• Unfortunately, the target function is unknown ...
• Need much more training data than the number of weights!

Training procedure (2)

1. Start with a small network, train, increase the size, train again,
   etc., until the error on the training set can be reduced to
   acceptable levels.
2. If an acceptable error level was found, increase the size by a few
   percent and retrain, this time using the cross-validation
   procedure to decide when to stop. Publish the result on the
   independent test set.
3. If the network failed to reduce the error on the training set,
   despite a large number of nodes and attempts, something is likely
   to be wrong with the data.

Practical considerations

• What happens if the mapping represented by the data is not a
  function? For example, what if the same input does not always lead
  to the same output?
• In what order should data be presented? Sequentially? At random?
• How should data be represented? Compact? Distributed?
• What can be done about missing data?
• Trick of the trade: monotonic functions are easier to learn than
  non-monotonic functions! (at least for the MLP)

Radial basis functions (RBF)

• Layered structure, like the MLP, with one hidden layer
• Output nodes are conventional
• Each hidden node …
  – measures the distance between its weight vector and the input
    vector (instead of a weighted sum)
  – feeds that through a Gaussian (instead of a sigmoid)

Geometric interpretation

• The input space is covered with overlapping Gaussians.
• In classification, the discriminants become hyperspheres (circles
  in 2D).

RBF training

• Could use backprop (the transfer function is still differentiable)
• Better: train the layers separately
  – Hidden layer: find the position and size of the Gaussians by
    unsupervised learning (e.g. competitive learning, K-means)
  – Output layer: supervised, e.g. the delta rule, LMS, backprop

MLP vs. RBF

• RBF (hidden) nodes work in a local region; MLP nodes are global
• MLPs do better in high-dimensional spaces
• MLPs require fewer nodes and generalize better
• RBFs can learn faster
• RBFs are less sensitive to the order in which data is presented
• RBFs make fewer false-yes classification errors
• MLPs extrapolate better
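The RBF forward pass described above can be sketched as follows (an illustration, not from the slides; the per-node width parameter and all names are my own assumptions):

```python
import math

def rbf_forward(x, centres, widths, W_out):
    """Each hidden node measures the distance between its weight
    vector (centre) and the input, and feeds it through a Gaussian;
    the output nodes form a conventional weighted sum."""
    h = [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * s * s))
         for c, s in zip(centres, widths)]
    return [sum(w * hj for w, hj in zip(wk, h)) for wk in W_out]
```

An input that coincides with a centre drives that hidden node to its maximum output of 1, which is what makes RBF nodes local: far from the centre, the Gaussian response vanishes.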

Unsupervised learning

Classifying unlabeled data.

Nearest neighbour classifiers

• Classify the unknown sample (vector) x to the class of its closest
  previously classified neighbour
• Problem 1: The closest neighbour may be an outlier from the wrong
  class
• Problem 2: Must store lots of samples and compute the distance to
  each one, for every new sample

K-means

K-means, for K = 2:
1. Make a codebook of two vectors, c1 and c2
2. Sample (at random) two vectors from the data as initial values of
   c1 and c2
3. Split the data in two subsets, D1 and D2, where D1 is the set of
   all points with c1 as their closest codebook vector, and vice
   versa
4. Move c1 towards the mean of D1 and c2 towards the mean of D2
5. Repeat from 3 until convergence (until the codebook vectors stop
   moving)
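The steps above can be sketched directly in Python (an illustration; the helper names are made up, and squared distance is used since only distance comparisons matter):

```python
import random

def dist(a, b):
    """Squared Euclidean distance (enough for comparisons)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean(D):
    """Component-wise mean of a non-empty set of vectors."""
    n = len(D)
    return tuple(sum(x[i] for x in D) / n for i in range(len(D[0])))

def kmeans2(data, iters=100):
    """K-means for K = 2: sample two codebook vectors from the data,
    split the data by nearest codebook vector, move each vector to
    the mean of its subset, and repeat until they stop moving."""
    c1, c2 = random.sample(data, 2)
    for _ in range(iters):
        D1 = [x for x in data if dist(x, c1) <= dist(x, c2)]
        D2 = [x for x in data if dist(x, c1) > dist(x, c2)]
        new1, new2 = mean(D1), mean(D2)
        if new1 == c1 and new2 == c2:      # codebook vectors stopped moving
            break
        c1, c2 = new1, new2
    return c1, c2
```

This sketch moves each codebook vector all the way to its subset's mean each pass (the batch form); moving it only a fraction of the way per sample gives the online form mentioned later under competitive learning.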

Voronoi regions

• K-means forms so-called Voronoi regions in the input space
• The Voronoi region around a codebook vector ci is the region in
  which ci is the closest codebook vector

Competitive learning

M linear, threshold-less nodes (only weighted sums) with N inputs.

1. Present a pattern (sample), x
2. The node with the largest output (node k) is declared winner.
   With normalised weights, this is equivalent to finding the node
   with the minimum distance between its weight vector and the input
   vector (network node = codebook vector).
3. The weights of the winner are updated so that it will become even
   stronger the next time the same pattern is presented. All other
   weights are left unchanged.

The standard competitive learning rule:

    Δwki = η(xi − wki)    1 ≤ i ≤ N

Competitive learning + batch learning = K-means
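A minimal sketch of one step of the rule Δwki = η(xi − wki), using the minimum-distance formulation of the winner (equivalent, per the slide, to the largest output when the weights are normalised); the names are my own:

```python
def competitive_step(x, W, eta=0.1):
    """Find the node whose weight vector is closest to x, move only
    that node's weights towards x, and return the winner's index."""
    k = min(range(len(W)),
            key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, W[j])))
    W[k] = [wi + eta * (xi - wi) for xi, wi in zip(x, W[k])]
    return k
```

Repeated over a data set, each weight vector drifts towards the mean of the patterns it wins, which is why batching these updates reproduces K-means.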

4
The winner takes it all

Problem with competitive learning: a node may become invincible.

• Poor initialisation: the weight vectors have been initialised to
  small random numbers (in a region W), but these are far from the
  data (clusters A and B)
• The first node to win will move from W towards A or B and will
  always win, henceforth
• Solution: use the data to initialise the weights (as in K-means),
  or include the winning frequency in the distance measure, or move
  more nodes than only the winner

Self organising maps

The cerebral cortex is a two-dimensional structure, yet we can reason
in more than two dimensions (dimensional reduction). Different
neurons in the auditory cortex respond to different frequencies, and
these neurons are located in frequency order (topological
preservation / topographic map).

Kohonen's self-organising feature map (SOFM or SOM) performs a
non-linear, topologically preserving, dimensional reduction (like
pressing a flower).

SOM

Competitive learning, extended in two ways:

1. The nodes are organised in a two-dimensional grid
   (in competitive learning, there is no defined order between
   nodes). For example, a 3x3 grid makes a two-dimensional map of a
   four-dimensional input space.
2. A neighbourhood function is introduced
   (not only the winner is updated, but also its closest neighbours
   in the grid)

SOM update rule

• Find the winner, node k, and then update all weights by:

      Δwki = η f(j, k)(xi − wki)    1 ≤ i ≤ N

• f(j, k) is a neighbourhood function in the range [0,1], with a
  maximum for the winner (j = k) and decreasing with distance from
  the winner, e.g. a Gaussian
• Gradually decrease the neighbourhood radius (the width of the
  Gaussians) and the learning rate (η) over time
• Result: vectors that are close in the high-dimensional input space
  will activate areas that are close on the grid
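The update rule above can be sketched as follows (illustrative only; a Gaussian over squared grid distance serves as the neighbourhood function f(j, k), as the slide suggests, and the names are my own):

```python
import math

def som_step(x, W, grid_pos, eta, radius):
    """One SOM step: find the winner k in input space, then update
    every node j by Delta w_ji = eta * f(j, k) * (x_i - w_ji), where
    f is a Gaussian over the nodes' distance on the grid."""
    k = min(range(len(W)),
            key=lambda j: sum((xi - wi) ** 2 for xi, wi in zip(x, W[j])))
    for j in range(len(W)):
        d2 = sum((a - b) ** 2 for a, b in zip(grid_pos[j], grid_pos[k]))
        f = math.exp(-d2 / (2 * radius * radius))   # f(k, k) = 1
        W[j] = [wi + eta * f * (xi - wi) for xi, wi in zip(x, W[j])]
    return k
```

Note that the winner is chosen by distance in input space, but the neighbourhood is measured on the grid; shrinking `radius` and `eta` over time would complete the training schedule described above.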

SOM offline example

A 10x10 SOM is trained on a chemical analysis of 178 wines from one
region in Italy, where the grapes have grown on three different types
of soil. The input is 13-dimensional. After training, wines from
different soil types activate different regions of the SOM. Note that
the network is not told that the difference between the wines is the
soil type, nor how many such types (how many classes) there are.

SOM online example

http://websom.hut.fi – a two-dimensional, clickable map of Usenet
news articles (from comp.ai.neural-nets).

Growing neural gas

• Growing unsupervised network (starting from two nodes)
• Dynamic neighbourhood
• Constant parameters
• Very good at following moving targets
• Can also follow jumping targets
• Current work: using GNG to define and train the hidden layer of
  Gaussians in an RBF network

Node positions

• Start with two nodes
• Each node has a set of neighbours, indicated by edges
• The edges are created and destroyed dynamically during training
• For each sample, the closest node, k, and all its current
  neighbours are moved towards the input

Node creation

• A new node is created every λ'th time step, unless the maximum
  number of nodes has been reached
• The new node is placed halfway between the node with the greatest
  error and the node among its current neighbours with the greatest
  error
• The node with the greatest error is the most unstable one
• In effect, new nodes are created close to where they are most
  likely needed
• The exact position of the new node is not crucial, since nodes
  move around

After a while …

The network grows, e.g. from 7 nodes to 50 nodes, its Voronoi regions
covering the input distribution.

Neighbourhood

Neighbourhood edges are created and destroyed as follows:
• For each sample, let k denote the winner (the node closest to the
  sample) and r the runner-up (the second closest)
• If an edge exists between k and r, reset its age to 0; otherwise,
  create such an edge and set its age to 0
• Increment the age of all other edges emanating from node k
• Edges older than amax are removed, as are any nodes that in this
  way lose their last remaining edge
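The edge bookkeeping above can be sketched as follows (an illustration; storing edges as a dict from node pairs to ages is my own choice, and the removal of nodes that lose their last edge is left out):

```python
def update_edges(edges, k, r, a_max):
    """GNG edge maintenance for one sample. k is the winner, r the
    runner-up; edges maps a frozenset of two node ids to the edge's
    age. Refresh (or create) the k-r edge, age all other edges at k,
    and drop edges older than a_max."""
    edges[frozenset((k, r))] = 0          # reset, or create with age 0
    for e in list(edges):
        if k in e and e != frozenset((k, r)):
            edges[e] += 1                 # age the winner's other edges
            if edges[e] > a_max:
                del edges[e]
    return edges
```

Because only the winner's other edges age, an edge stays young exactly as long as its two nodes keep being the two closest to the input, which is the mechanism the "Dead units" slide relies on.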

Delaunay triangulation

Connect the codebook vectors in all adjacent Voronoi regions. The
graph of GNG edges is a subset of the Delaunay triangulation.

Dead units

• There is only one way for an edge to get 'younger' – when the two
  nodes it interconnects are the two closest to the input
• If one of the two nodes wins, but the other one is not the
  runner-up, then, and only then, the edge ages
• If neither of the two nodes wins, the edge does not age!
• Example: when the input distribution jumps from the lower left to
  the upper right corner, the edges in the abandoned region age away
  and the dead units are removed

The lab
(in room 1515!)

• Classification of bitmaps, by supervised learning (back
  propagation), using the SNNS simulator
• An illustration of some unsupervised learning algorithms, using
  the GNG demo applet:
  – LBG/LBG-U (≈ K-means)
  – HCL (Hard competitive learning)
  – Neural gas
  – CHL (Competitive Hebbian learning)
  – Neural gas with CHL
  – GNG/GNG-U (Growing neural gas)
  – SOM (Self organising map)
  – Growing grid
