
Lecture Six

Self-Organizing Maps

Xiang Cheng
Associate Professor
Department of Electrical & Computer Engineering
The National University of Singapore

Phone: 65166210 Office: Block E4-08-07


Email: elexc@nus.edu.sg

Learning

What is learning in neural networks?

Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place.

Process of learning:

1. The neural network is stimulated by an environment.
2. The neural network undergoes changes in its free parameters as a result of this stimulation.
3. The neural network responds in a new way to the environment because of the changes that have occurred in its internal structure.
Learning with a teacher:
Supervised Learning

Process of learning:

1. The neural network is fed with input, and produces an output.
2. The teacher tells what the desired output should be, and an error signal is generated.
3. The weights are adjusted by the error signals.

What are the examples? MLP and RBFN.
Learning without a teacher:
Reinforcement Learning

Process of learning:

1. The neural network interacts with the environment by taking various actions.
2. The learning system is rewarded or penalized for its actions.
3. The weights are adjusted by the reinforcement signal.

Did we ever encounter RL in our studies so far?
No.
But you are going to learn it in the second part of this course!
Learning without a teacher:
Unsupervised or Self-Organized Learning

Process of learning:

1. The neural network is fed with input.
2. The weights are adjusted based upon the input signals only!

How can the system adjust the weights without any error or reward signal?

Did we introduce any type of unsupervised learning in this course so far?
In fact, K-means clustering, as discussed in Lecture Five, is unsupervised learning.

Today, we are going to learn about a neural network designed for unsupervised learning.
In MLP, we used supervised learning.
Is supervised learning biologically plausible? Does our brain use "backpropagation" to adjust the weights?
This is not biologically plausible: in a biological system, there is no external "teacher" who manipulates the network's weights from outside the network.
Biologically more adequate: unsupervised learning.
Reinforcement learning is, of course, well grounded biologically.

We will study Self-Organizing Maps (SOMs) as another example of unsupervised learning, which also has a sound biological basis.
Feature Mapping of the Human Cortex

The neurons are well organized! Neighboring areas in these maps represent neighboring areas in the sensory input space or motor output space.
Is this topographical organization entirely genetically programmed?
Self-Organizing Maps: History

In 1973, von der Malsburg studied the self-organizing property of the visual cortex, and concluded that a model of the visual cortex could not be entirely genetically predetermined; rather, a self-organizing process involving synaptic learning may be responsible for the local ordering of feature-sensitive cortical cells.

The computer simulation by C. von der Malsburg (1942-present) was perhaps the first to demonstrate self-organization. However, global topographic ordering was not achieved in his model.

The SOM is now always associated with Teuvo Kohonen (1934-2021), who published his work on the SOM in 1982, four years before BP.

Principle of Topographic Map Formation:
The spatial location of an output neuron in a topographic map corresponds to a particular domain or feature of data drawn from the input space.
The neurons are spatially organized in a meaningful way!
The topology-conserving mapping can be achieved by SOMs:
• Two layers: input layer and output (map) layer.
  There is no hidden layer, because the SOM was not designed for function approximation.
• Input and output layers are fully connected.
• A topology (neighborhood relation) is defined on the output layer.
  The location of the neurons in the output layer is important!

For MLP or RBFN, do we care about where the neurons are in the output layer?
No.
Topological organization is the unique property of the SOM.
SOM – Goal

[Figure: an input space mapped onto a discrete lattice of neurons for visualization]

The principal goal of the self-organizing map is to transform an incoming signal pattern of arbitrary dimension into a one- or two-dimensional discrete feature map, and to perform this transformation adaptively in a topologically ordered fashion.
SOM – Architecture

[Figure: a 2-d array of neurons; each neuron j receives synaptic weights wj1, wj2, ..., wjn from the input vector x1, x2, ..., xn, which is connected to all neurons in the lattice]

For each output neuron j, there is a set of synaptic weights wji connected from all the input neurons.
What is the purpose of these weights?
Do we use these weights to compute the outputs by the McCulloch and Pitts model,

$y_j = \varphi\left(\sum_{i=1}^{n} w_{ji} x_i + b_j\right)$?

No. In fact, there are no output signals produced by the output neurons!
The synaptic weights act as the underlying "codes" of the neurons!
Are the output neurons connected to each other by synaptic weights?
No. They are not connected by synaptic weights.
But they are related to each other by their locations in the 2-d map!
How to describe their locations in the map?
They can simply be described by their indices in the 2-d array: a neuron with index (i,j) in the 2-d array has location (i,j) in the 2-d plane.
SOM – Algorithm Overview

[Figure: the same 2-d array of neurons with weighted synapses from the input vector]

1. Randomly initialise all the weights.
2. Select input vector x = [x1, x2, x3, ..., xn] from the training set.
3. Compare x with the weights wj of each neuron j to determine the winner.
4. Update the winner so that it becomes more like x, together with the winner's neighbours.
5. Adjust parameters: learning rate & 'neighbourhood function'.
6. Repeat from (2) until the map has converged (i.e. no noticeable changes in the weights) or a pre-defined number of training cycles has passed.
(i) Randomly initialise the weight vectors wj for all nodes j.

This can be done by assigning them small values picked from a random number generator (such as the MATLAB command "rand").

(ii) Sampling: choose an input vector x from the training set to stimulate the network.

It can be chosen randomly from the training set (if the set is huge), or one by one in a deterministic manner, like that for sequential learning for MLP.
[Figure: the same 2-d array of neurons with weighted synapses from the input vector]

The formation of the self-organizing map involves three processes:

1. Competition
2. Cooperation
3. Synaptic Adaptation
Competitive Process (Finding a Winner)

A continuous input space of activation patterns is mapped onto a discrete output space of neurons by a process of competition among the neurons in the network.

Find the best-matching neuron i(x), usually the neuron whose weight vector has the smallest Euclidean distance from the input vector x:

$i(x) = \arg\min_j \|x - w_j\|$

The winning neuron is the one that is in some sense 'closest' to the input vector.
'Euclidean distance' is the straight-line distance between the data points, if they are plotted on a (multi-dimensional) graph. The Euclidean distance between two vectors a = (a1, a2, ..., an) and b = (b1, b2, ..., bn) is calculated as

$d_{a,b} = \sqrt{\sum_i (a_i - b_i)^2}$
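
As a concrete illustration, the winner search takes only a few lines; this is a sketch assuming the weights are stored as a NumPy array W of shape (m, n), one row per lattice neuron (names are illustrative):

```python
import numpy as np

def find_winner(W, x):
    """Return the index of the neuron whose weight vector is closest to x.

    W: (num_neurons, n) array, one weight vector per lattice neuron.
    x: (n,) input vector.
    """
    # Squared Euclidean distance to every neuron; the arg-min is the
    # same whether or not we take the square root.
    d2 = np.sum((W - x) ** 2, axis=1)
    return int(np.argmin(d2))
```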
The winning neuron locates the center of a topological neighborhood of cooperating neurons.
How to define a topological neighborhood that is neurobiologically correct?
There is neurobiological evidence for lateral interaction:
A neuron that is firing tends to excite the neurons in its immediate neighborhood more than those farther away from it. The closer a neuron is to the winner, the more impact it receives.

From the biological evidence:
1. The topological neighborhood is symmetric about the maximum point (the winner neuron).
2. The amplitude of the topological neighborhood decreases monotonically with increasing lateral distance.

What is the natural choice of function to describe this bell shape?

[Figure: a bell-shaped lateral-interaction curve centered at the position of the winner neuron i, evaluated at the position of a neighboring neuron k]
The typical choice of topological neighborhood function is the Gaussian

$h_{j,i(x)} = \exp\left(-\frac{d_{j,i}^2}{2\sigma^2}\right)$

where $d_{j,i}$ is the Euclidean distance from neuron j to the winning neuron i (associated with the input vector x). The position of a neuron is described by its index in the matrix (2-d lattice). For instance, the distance from the neuron at position (l, m) to the one at (n, k) is

$d_{j,i} = \sqrt{(l - n)^2 + (m - k)^2}$

$h_{j,i(x)}$ is a measure of the effectiveness of the winner on its neighbors, and σ is the "effective width" of the topological neighborhood:
When a neuron is σ away from the winner, $h_{j,i} = \exp(-0.5) \approx 0.61$: it is heavily affected.
When a neuron is 2σ away from the winner, $h_{j,i} = \exp(-2) \approx 0.14$: it is much less affected.
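
A sketch of this neighborhood computation over the whole lattice, assuming each neuron's (row, col) lattice index is stored in an array `grid` (a hypothetical layout, matching the index convention above):

```python
import numpy as np

def neighborhood(grid, winner_idx, sigma):
    """Gaussian neighborhood h_{j,i(x)} for every neuron j in the lattice.

    grid:       (num_neurons, 2) array of (row, col) lattice indices.
    winner_idx: index (into grid) of the winning neuron.
    sigma:      effective width of the neighborhood.
    """
    # Squared lattice distance d_{j,i}^2 from each neuron to the winner.
    d2 = np.sum((grid - grid[winner_idx]) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```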
Another unique feature of the SOM algorithm is that the size of the topological neighborhood shrinks with time.

What is the difference between the two Gaussians above? The effective width!
The "effective width" decreases with time. A popular choice is the time-varying width

$\sigma(n) = \sigma_0 \exp\left(-\frac{n}{\tau_1}\right)$

where τ1 is a time constant controlling the decay rate of the effective width.

Time-varying neighborhood function:

$h_{j,i(x)}(n) = \exp\left(-\frac{d_{j,i}^2}{2\sigma(n)^2}\right)$

It means that the influence of the winner on its neighbors decreases with time.
Adaptation

$\Delta w_j(n) = w_j(n+1) - w_j(n) = \eta(n)\, h_{j,i(x)}(n)\, (x - w_j(n))$

What is the geometrical meaning of this algorithm?

[Figure: the weight vector w(n) moves part of the way toward the input x, giving w(n+1)]

The change of the weight is in the direction from the current weight to the input!
The synaptic weight vector wi of the winning neuron i moves toward the input vector x!
All the neurons in the neighborhood of the winning neuron also move toward the input vector: the farther away, the smaller the change.
Upon repeated presentations of the training data, the synaptic weight vectors tend to follow the distribution of the input vectors, due to the neighborhood updating.
The algorithm leads to a topological ordering of the feature map, in the sense that neurons that are adjacent in the lattice will tend to have similar synaptic weight vectors.
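
Combining competition, cooperation, and adaptation, one update step might look like the following sketch (reusing the hypothetical `find_winner` and `neighborhood` helpers from above):

```python
def som_step(W, grid, x, eta, sigma):
    """One SOM adaptation step: move the winner and its lattice
    neighbors toward the input x, weighted by the neighborhood."""
    i = find_winner(W, x)               # competition
    h = neighborhood(grid, i, sigma)    # cooperation
    W += eta * h[:, None] * (x - W)     # synaptic adaptation
    return W
```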
Adaptation – Learning Rate

Would it converge if the learning rate were a constant?

No. Since $h_{i(x),i(x)}(n) = 1$ at the winner itself, the winner neuron will always update its weights (unless the winner matches the input exactly), even though its effect on its neighbors decreases with time.

How to make sure that the weights converge?

The learning-rate parameter η(n) should also decrease with time. One possible choice:

$\eta(n) = \eta_0 \exp\left(-\frac{n}{\tau_2}\right)$

where τ2 is another time constant, controlling the decay rate of the learning rate.
The first phase of the adaptive process

The adaptation of the weights is decomposed into two phases: an ordering or self-organizing phase, followed by a convergence phase.

1. Self-organizing or ordering phase

The ordering phase may take as many as 1000 iterations, and possibly more.

• The learning rate η(n) should begin with a value close to 0.1; thereafter it should decrease gradually, but remain above 0.01:

$\eta_0 = 0.1,\quad \tau_2 = T \;\Rightarrow\; \eta(n) = 0.1\exp\left(-\frac{n}{T}\right),\quad n = 0, 1, 2, \ldots, T$

where T is the total number of iterations for the first phase.

• The neighborhood function should initially include almost all neurons, and then shrink with time. We may set the initial width equal to the "radius" of the lattice. For instance, if the size of the lattice is M×N, then the initial width can be set as

$\sigma_0 = \frac{\sqrt{M^2 + N^2}}{2}$

Correspondingly, the time constant can be chosen as

$\tau_1 = \frac{T}{\log(\sigma_0)}$

At the end, $n = T$ and $\sigma(T) = 1$:
the winner only affects its immediate neighbors at the end of the first phase!
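
A sketch of these ordering-phase schedules, under the parameter choices above (M, N, and T as named in the text):

```python
import numpy as np

def ordering_phase_schedules(M, N, T):
    """Learning-rate and width schedules for the ordering phase."""
    eta0 = 0.1
    sigma0 = np.sqrt(M**2 + N**2) / 2.0   # "radius" of the M x N lattice
    tau1 = T / np.log(sigma0)             # width time constant
    tau2 = T                              # learning-rate time constant
    eta = lambda n: eta0 * np.exp(-n / tau2)
    sigma = lambda n: sigma0 * np.exp(-n / tau1)
    return eta, sigma

# e.g. for a 10 x 10 lattice and 1000 ordering iterations:
eta, sigma = ordering_phase_schedules(10, 10, 1000)
# sigma(1000) = 1.0 (immediate neighbors only); eta(1000) is about 0.037
```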
The second phase of the adaptive process

2. Convergence phase: this second phase is needed to fine-tune the feature map.

As a general rule, the number of iterations must be at least 500 times the number of neurons in the network. Thus, the convergence phase may have to go on for thousands, and possibly tens of thousands, of iterations.

For good statistical accuracy, the learning parameter η(n) should be maintained at a small value, on the order of 0.01. In any event, it must not be allowed to decrease to zero.

The neighborhood function should contain only the nearest neighbors of the winning neuron, and may eventually reduce to one or zero neighboring neurons.

In many applications, the second phase is not needed if convergence of the parameters is not critical.
Summary of the Algorithm

For an n-dimensional input space and m output neurons:

(1) Randomly initialize the weight vector wi for each neuron i, i = 1, ..., m.
(2) Sampling: choose an input vector x from the training set.
(3) Determine the winner neuron k:
    $\|w_k - x\| = \min_i \|w_i - x\|$ (Euclidean distance)
(4) Update the weight vectors of all neurons i in the neighborhood of the winning neuron k:
    $w_i(n+1) = w_i(n) + \eta(n)\, h_{i,k(x)}(n)\, (x - w_i(n))$
(5) If the convergence criterion is met, STOP. Otherwise, go to (2).

Isn't it a simple algorithm? It is the simplest compared to MLP and RBFN.

Kohonen's SOM algorithm is so simple to implement, yet mathematically so difficult to analyze in a general setting. It is still a hot research topic.
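
For concreteness, a minimal end-to-end training sketch that strings the previous pieces together (assumptions: the hypothetical helpers defined earlier, and a data matrix X with one sample per row):

```python
import numpy as np

def train_som(X, M, N, T, seed=0):
    """Ordering-phase SOM training on data X of shape (num_samples, n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # (1) Random initialization of the m = M*N weight vectors.
    W = rng.random((M * N, n)) * 0.1
    # Lattice (row, col) index of every neuron.
    grid = np.array([(r, c) for r in range(M) for c in range(N)], float)
    eta, sigma = ordering_phase_schedules(M, N, T)
    for step in range(T):
        x = X[rng.integers(len(X))]                       # (2) sampling
        W = som_step(W, grid, x, eta(step), sigma(step))  # (3)-(4)
    return W, grid
```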
How to use SOM?

After training the network, how do we use the map?

Question: how do you find out whether the neurons are organized in a meaningful way? What does each neuron stand for?

There are many possible ways to interpret the neurons. The simplest is just to find out which input signal (in the training set) stimulates each particular neuron the most.

Determine the winner input signal xk for neuron j:
$\|x_k - w_j\| = \min_i \|x_i - w_j\|$ (Euclidean distance)

Mark each neuron by the particular input signal for which it produces the best response.
The resulting map is the so-called "contextual map" or "semantic map".
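
A sketch of this labeling step, assuming trained weights W and a list of labels for the training inputs (names are illustrative):

```python
import numpy as np

def contextual_map(W, X, labels):
    """Label each neuron by the training input closest to its weights."""
    # Pairwise distances: entry (j, i) is ||w_j - x_i||.
    d = np.linalg.norm(W[:, None, :] - X[None, :, :], axis=2)
    best = np.argmin(d, axis=1)   # winner input for each neuron
    return [labels[i] for i in best]
```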
Example I: Learning a one-dimensional representation of a two-dimensional (triangular) input space.
In this case, the topological map is one-dimensional.

[Figure: snapshots of the map after 0, 20, 100, 1000, 10000, and 25000 iterations]
Example II: Learning a two-dimensional representation of a two-dimensional (square) input space.
In this case, the neurons are organized in a 2-d lattice (the most common arrangement).

demo
How about high-dimensional input?

Example III: Learning a two-dimensional mapping of texture images.

The inputs are texture images. The SOM is a 10x8 map.
Example IV: Classifying World Poverty (Helsinki University of Technology)

'Poverty map' based on 39 indicators from World Bank statistics (1992).


Example V: WEBSOM

SOM analysis technique to map thousands of articles posted on Usenet newsgroups.

Lagus et al. (1996); Honkela et al. (1998) – HUT NN Research Centre
Example VI: Contextual Maps by SOM

Animal names and their attributes:

                       Dove   Hen  Duck Goose   Owl  Hawk Eagle   Fox   Dog  Wolf   Cat Tiger  Lion Horse Zebra   Cow
is       Small            1     1     1     1     1     1     0     0     0     0     1     0     0     0     0     0
         Medium           0     0     0     0     0     0     1     1     1     1     0     0     0     0     0     0
         Big              0     0     0     0     0     0     0     0     0     0     0     1     1     1     1     1
has      2 legs           1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
         4 legs           0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
         Hair             0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
         Hooves           0     0     0     0     0     0     0     0     0     0     0     0     0     1     1     1
         Mane             0     0     0     0     0     0     0     0     0     1     0     0     1     1     1     0
         Feathers         1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
likes to Hunt             0     0     0     0     1     1     1     1     0     1     1     1     1     0     0     0
         Run              0     0     0     0     0     0     0     0     1     1     0     1     1     1     1     0
         Fly              1     0     0     1     1     1     1     0     0     0     0     0     0     0     0     0
         Swim             0     0     1     1     0     0     0     0     0     0     0     0     0     0     0     0

A grouping according to similarity has emerged: on the trained map, the "birds", the "hunters", and the "peaceful" animals occupy contiguous regions.

SOM can really organize the data!

[Teuvo Kohonen (2001), Self-Organizing Maps, Springer]
Break

AlphaGo (DeepMind)
What is a Neural Network (NN)?

A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use.

• It employs a massive inter-connection of "simple" computing units: neurons.
• It is capable of organizing its structure, which consists of many neurons, to perform tasks many times faster than the fastest digital computers nowadays.
• Knowledge is obtained from the data/input signals provided.
• Knowledge is learned by adjusting the synapses!

Artificial neural networks are largely inspired by biological neural networks, with the ultimate goal of building an intelligent machine which can mimic the human brain.
The understanding of the neuron started more than 100 years ago:
Santiago Ramón y Cajal (1852-1934)


Biological Neuron

The major jobs of a neuron:
1. It receives information, usually in the form of electrical pulses, from many other neurons.
2. It does what is, in effect, a complex dynamic sum of these inputs.
3. It sends out information in the form of a stream of electrical impulses down its axon and on to many other neurons.
4. The connections (synapses) are crucial for excitation or inhibition of the cells.
5. Learning is possible by adjusting the synapses!

How to build a mathematical model of the neuron?
This is the starting point of the artificial neural networks!

The beginning of the artificial neural networks:
McCulloch and Pitts, 1943
The next major step: the Perceptron – single-layer neural networks
Frank Rosenblatt (1928-1971), 1958

Supervised learning:

$w(n+1) = w(n) + \eta e(n) x(n)$
$e(n) = d(n) - y(n)$

The weights were initially random. Then it could learn to perform certain simple tasks in pattern recognition.
Rosenblatt proved that for a certain class of problems, it could learn to behave correctly! During the 1960s, it seemed that neural networks could do anything.

Perceptron Learning Algorithm

Start with a randomly chosen weight vector w(1); update the weight vector to

$w(n+1) = w(n) + \eta e(n) x(n)$
$e(n) = d(n) - y(n)$

Perceptron Convergence Theorem (Rosenblatt, 1962):

If C1 and C2 are linearly separable, then the perceptron training algorithm "converges" in the sense that, after a finite number of steps, the synaptic weights remain unchanged and the perceptron correctly classifies all elements of the training set.

Is there any condition on the learning rate to make the algorithm work?
Any positive learning rate would be fine!
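
A minimal sketch of the perceptron rule, assuming a sign activation, targets in {-1, +1}, and the bias folded into the weight vector:

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, max_epochs=100):
    """Rosenblatt perceptron: w(n+1) = w(n) + eta * e(n) * x(n)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # fold bias into weights
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(Xb, d):
            y = 1.0 if w @ x >= 0 else -1.0     # sign activation
            e = target - y                      # e(n) = d(n) - y(n)
            if e != 0:
                w += eta * e * x
                errors += 1
        if errors == 0:   # converged: all training samples classified
            break
    return w
```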
Regression Problem

Consider a multiple-input single-output system whose mathematical characterization is unknown.

Given a set of observations of input-output data
{(x(i), d(i)), i = 1, ..., N},
where m is the dimensionality of the input space and i is the time index.

How to design a model of the unknown system?
Optimization problem: minimize the cost function!
What is the most common cost function?

$E(w) = \sum_{i=1}^{n} e(i)^2 = \sum_{i=1}^{n} (d(i) - y(i))^2$

What is the optimality condition? $\frac{\partial E(w)}{\partial w} = 0$

There are two ways to solve the problem:

• If the model is simple, then directly solve the optimality condition.
• Iterative descent algorithm: starting with an initial guess w(0), generate a sequence of weight vectors w(1), w(2), ..., such that the cost function E(w) is reduced at each iteration of the algorithm, as shown by

$E(w(n+1)) < E(w(n))$

How to choose the iterative algorithm
$w(n+1) = w(n) + \Delta w(n)$
such that the cost is always decreasing? What is the simplest way if the gradient is known?
Method of Steepest Descent (Gradient Descent)

$w(n+1) = w(n) + \Delta w(n)$

Successive adjustments applied to the weight vector w are in the direction of steepest descent, i.e. a direction opposite to the gradient vector ∇E(w).
Let g(n) = ∇E(w(n)); the steepest descent algorithm is formally described by

$w(n+1) = w(n) - \eta g(n)$

where η is a positive constant called the stepsize or learning rate.

Is there any condition on the learning rate to make the algorithm work?
The rate has to be sufficiently small!
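
A generic steepest-descent loop might look like this sketch, with an assumed callable `grad` supplying ∇E(w):

```python
import numpy as np

def steepest_descent(grad, w0, eta=0.01, steps=1000):
    """w(n+1) = w(n) - eta * g(n), with g(n) = grad E(w(n))."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - eta * grad(w)   # step against the gradient
    return w

# e.g. minimizing E(w) = ||w - 1||^2, whose gradient is 2(w - 1):
w_star = steepest_descent(lambda w: 2 * (w - 1), w0=np.zeros(3))
```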
Linear Regression Problem

Consider that we are trying to fit a linear model to a set of input-output pairs (x(1), d(1)), (x(2), d(2)), ..., (x(n), d(n)) observed in an interval of duration n:

$y(x) = w_1 x_1 + w_2 x_2 + \ldots + w_m x_m + b$

Cost function:

$E(w) = \sum_{i=1}^{n} e(i)^2 = \sum_{i=1}^{n} (d(i) - y(i))^2$

Of course, the answer can be easily found by solving directly.
Standard linear least squares:

$w = (X^T X)^{-1} X^T d$

The Least-Mean-Square (LMS) algorithm:

$e(n) = d(n) - w^T(n) x(n)$
$w(n+1) = w(n) + \eta e(n) x(n)$

Does it take the same form as that for the perceptron? Yes.
Does it also converge in finite steps? No.
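
A sketch contrasting the two solutions on synthetic data (illustrative, not from the lecture): the batch least-squares formula and the sequential LMS update:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # inputs (bias omitted for brevity)
w_true = np.array([2.0, -1.0, 0.5])
d = X @ w_true + 0.01 * rng.normal(size=200)  # noisy desired outputs

# Batch solution: w = (X^T X)^{-1} X^T d, via a numerically safer solve.
w_ls = np.linalg.solve(X.T @ X, X.T @ d)

# Sequential LMS: w(n+1) = w(n) + eta * e(n) * x(n).
w = np.zeros(3)
eta = 0.01
for x, target in zip(X, d):
    e = target - w @ x
    w += eta * e * x

print(w_ls, w)   # both approach w_true; LMS only approximately
```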
The fundamental limits of single-layer perceptrons:

For the pattern recognition problem: the classes have to be linearly separable.
For the regression problem: the process has to be close to a linear model!
Multilayer Perceptron (MLP) and Back-Propagation
David Rumelhart (1942-2011) and his colleagues, 1986

The hidden layer provides a nonlinear map from the original input space to a feature space.
A nonlinearly separable problem in the input space can become a linearly separable problem in the feature space!
Single-Layer Perceptron vs. Multi-Layer Perceptrons

Pattern recognition problem: [figure]

Regression problem:
Multi-layer perceptrons can approximate any bounded continuous function!

The learning algorithms are based upon the steepest descent method:

$w(k+1) = w(k) - \eta g(k)$

$w(n+1) = w(n) + \eta e(n) x(n)$
(output error × input signal at the output layer; local error × input signal at the hidden layers)

Signal-flow graphic representation of BP: [figure]
How to design and train the neural network?

How many hidden layers?

Normally one hidden layer is enough.
Two or more hidden layers may be better if the target function can be clearly decomposed into sub-functions. The number of layers depends upon the number of levels into which the function can be decomposed.

How many hidden neurons?

The hidden neurons are simply building blocks.
If the geometrical shape can be perceived, then use the minimal number of line segments (or hyper-planes) as the starting point.
For a higher-dimensional problem, start with a large network, then use SVD to determine the effective number of hidden neurons.
How to design and train the neural network?

How to choose the activation function in the hidden layers?
Hyperbolic tangent (tansig) is the preferred one for all the hidden neurons.

How to choose the activation functions in the output neurons?
Logsig for pattern recognition, purelin for regression problems.

How to pre-process the input data?
Normalize all the input variables to the same range, such that the mean is close to zero.

When to use sequential learning, and when to use batch learning?
Batch learning with a second-order algorithm (such as trainlm in MATLAB) is usually faster, but prone to local minima. Sequential learning can produce better solutions, in particular for large databases with lots of redundant samples.

How to deal with over-fitting?
Either identify the minimal structure, or use regularization (trainbr in MATLAB).
What can MLP do?

MLP can solve regression and pattern recognition problems.
They all have the following characteristic:
There exists a function (map) between inputs x and outputs y: y = F(x).

input x → Map → output y

Unfortunately, the mathematical form of this function (map) is unknown!
If the model is just too difficult to build, or the map is too complicated to be expressed by any known simple functions, then we can use MLP to approximate this function.

MLP is a universal approximator! It can approximate any bounded function!
All it needs is a training set.
Given a set of observations of input-output data
{(x(1), d(1)), (x(2), d(2)), (x(3), d(3)), ..., (x(N), d(N))},
use this training data to train the MLP such that the difference between the desired outputs and the outputs of the MLP is minimized.
NETtalk – Terrence Sejnowski and Charles Rosenberg, 1987
NETtalk was created to learn how to correctly pronounce English from written English text.
"Autonomous Land Vehicle In a
Neural Network” (ALVINN)
by Dean Pomerleau and Todd Jochem,
1993, CMU

The state of the art self-driving car.

51
2

Radial Basis Functions (RBFs)

The activation of a hidden unit is determined by the distance between the input vector and a prototype vector, the center:

$\varphi(x) = \varphi(\|x - c\|) = \varphi(r)$

In most cases, if the distance is zero, the activation is maximum; the activation level drops off farther away from the center.
The RBF network output:

$y(x) = \sum_{i=1}^{M} w_i \varphi_i(\|x - \mu_i\|) + b$

$\varphi_i(\|x - \mu_i\|) = \exp\left(-\frac{\|x - \mu_i\|^2}{2\sigma_i^2}\right)$

We know that MLP can approximate any bounded continuous function. Can RBFN also do this?
RBFN can approximate any bounded continuous function!
But this is just an existence result! It does not tell you how to obtain the weights.

Given a set of sampling points and desired outputs {(x(i), d(i)), i = 1, ..., N}, how to find the parameters {wi} (and the centers and widths) such that the cost function

$E(w) = \sum_{i=1}^{N} e(i)^2 = \sum_{i=1}^{N} (d(i) - y(i))^2$

is minimized?
Hybrid Training of RBF Networks

Two-stage 'hybrid' learning process:

(Stage 1) Parameterize the hidden layer of RBFs:
- hidden unit number (M)
- centre/position
- spread/width
Use unsupervised methods, as they are quick:
1. Random selection
2. Clustering

(Stage 2) Find the weight values between the hidden and output units.
Aim: minimize the sum-of-squares error between the actual outputs and the desired responses:

$E(w) = \sum_{i=1}^{N} e(i)^2 = \sum_{i=1}^{N} (y(i) - d(i))^2 = (\Phi w - d)^T (\Phi w - d)$

$\frac{\partial E(w)}{\partial w} = 0 \;\Rightarrow\; w = (\Phi^T \Phi)^{-1} \Phi^T d$
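
A sketch of the two-stage procedure, assuming Gaussian basis functions, centers picked by random selection (Stage 1, option 1), and a shared width (names are illustrative):

```python
import numpy as np

def design_matrix(X, centers, sigma):
    """Phi[i, j] = exp(-||x(i) - mu_j||^2 / (2 sigma^2))."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_rbfn(X, d, M=10, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1: pick M centers by random selection from the training set.
    centers = X[rng.choice(len(X), size=M, replace=False)]
    # Stage 2: linear least squares, w = (Phi^T Phi)^{-1} Phi^T d.
    Phi = design_matrix(X, centers, sigma)
    w, *_ = np.linalg.lstsq(Phi, d, rcond=None)
    return centers, w

# Usage: approximate a 1-d function.
X = np.linspace(-3, 3, 100)[:, None]
d = np.sin(X).ravel()
centers, w = train_rbfn(X, d, M=10, sigma=0.8)
y = design_matrix(X, centers, 0.8) @ w   # predicted outputs
```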
Regularization Theory

Both MLP and RBFN can lead to over-fitting (learning the noise present in the training data), which results in poor generalization.

An alternative approach to cope with over-fitting comes from the theory of regularization, which is a method of controlling the smoothness of mapping functions.
Regularization Methods

Cost function:

$F = E_D + \lambda E_w$
(training error + cost on smoothness)

It involves adding an extra term to the error measure which penalizes mappings that are not smooth.

For RBFN, a simple way to choose the penalty term is the size of the weights:

$F(w) = \frac{1}{2}\sum_{i=1}^{N} (y(i) - d(i))^2 + \frac{1}{2}\lambda \|w\|^2 = \frac{1}{2}(\Phi w - d)^T(\Phi w - d) + \frac{1}{2}\lambda w^T w$

$w = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T d$

where λ is the regularization factor.

For MLP, the regularization algorithm is more involved: trainbr in MATLAB.
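
The regularized weights change only one line relative to the hypothetical RBFN sketch above:

```python
import numpy as np

def train_rbfn_regularized(Phi, d, lam=0.1):
    """Ridge solution w = (Phi^T Phi + lambda I)^{-1} Phi^T d."""
    M = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ d)
```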
Feature Mapping of the Human Cortex

Neighboring areas in these maps represent neighboring areas in the sensory input space or motor output space.
The map can change with learning!
SOM – Architecture and Algorithm Overview

[Figure: the same 2-d array of neurons with weighted synapses from the input vector]

1. Randomly initialise all weights.
2. Select input vector x = [x1, x2, x3, ..., xn].
3. Compare x with the weights wj of each neuron j to determine the winner.
4. Update the winner so that it becomes more like x, together with the winner's neighbours.
5. Adjust parameters: learning rate & 'neighbourhood function'.
6. Repeat from (2) until the map has converged (i.e. no noticeable changes in the weights) or a pre-defined number of training cycles has passed.
What lies in the future for AI?
What is happening now?
What would happen in the distant future?
It could be better or worse!
And people are worried!

After taking this course, you will be better prepared to face AI!

