
Lecture Six

Self-Organizing Maps

Xiang Cheng
Associate Professor
Department of Electrical & Computer Engineering
The National University of Singapore

Phone: 65166210 Office: Block E4-08-07


Email: elexc@nus.edu.sg

Learning

What is learning in neural networks?

Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place.

Process of learning:

1. The neural network is stimulated by an environment.
2. The neural network undergoes changes in its free parameters as a result of this stimulation.
3. The neural network responds in a new way to the environment because of the changes that have occurred in its internal structure.
Learning with a teacher:
Supervised Learning

Process of learning:

1. The neural network is fed with input, and produces an output.
2. The teacher tells what the desired output should be, and an error signal is generated.
3. The weights are adjusted by the error signals.

What are the examples? MLP and RBFN.
Learning without a teacher:
Reinforcement Learning

Process of learning:

1. The neural network interacts with the environment by taking various actions.
2. The learning system is rewarded or penalized for its actions.
3. The weights are adjusted by the reinforcement signal.

Did we ever encounter RL in our studies so far?
No.
But you are going to learn it in the second part of this course!
Learning without a teacher:
Unsupervised or Self-Organized Learning

Process of learning:

1. The neural network is fed with input.
2. The weights are adjusted based upon the input signals only!

How can the system adjust the weights without any error or reward signal?

Did we introduce any type of unsupervised learning in this course so far?
In fact, K-means clustering, as discussed in Lecture Five, is unsupervised learning.

Today, we are going to learn about a neural network designed for unsupervised learning.
In MLP, we used supervised learning.
Is supervised learning biologically plausible? Does our brain use "backpropagation" to adjust the weights?
This is not biologically plausible: in a biological system, there is no external "teacher" who manipulates the network's weights from outside the network.
Biologically more adequate: unsupervised learning.
Reinforcement learning is, of course, well grounded biologically.

We will study Self-Organizing Maps (SOMs) as another example of unsupervised learning, which also has a sound biological basis.
Feature Mapping of the Human Cortex

The neurons are well organized! Neighboring areas in these maps represent neighboring areas in the sensory input space or motor output space.
Is this topographical organization entirely genetically programmed?
Self-Organizing Maps: History

In 1973, von der Malsburg studied the self-organizing property of the visual cortex, and concluded that a model of the visual cortex could not be entirely genetically predetermined; rather, a self-organizing process involving synaptic learning may be responsible for the local ordering of feature-sensitive cortical cells.

The computer simulation by C. von der Malsburg (1942-present) was perhaps the first to demonstrate self-organization. However, global topographic ordering was not achieved in his model.

The SOM is now always associated with Teuvo Kohonen (1934-2021), who published his work on the SOM in 1982, four years before BP.

Principle of Topographic Map Formation:
The spatial location of an output neuron in a topographic map corresponds to a particular domain or feature of data drawn from the input space.
The neurons are spatially organized in a meaningful way!
The topology-conserving mapping can be achieved by SOMs:
• Two layers: input layer and output (map) layer.
  There is no hidden layer, because the SOM was not designed for function approximation.
• Input and output layers are fully connected.
• A topology (neighborhood relation) is defined on the output layer.
  The location of the neurons in the output layer is important!

For MLP or RBFN, do we care about where the neurons are in the output layer?
No.
Topological organization is the unique property of the SOM.
SOM – Goal

[Figure: an input space mapped onto a discrete lattice of neurons for visualization]

The principal goal of the self-organizing map is to transform an incoming signal pattern of arbitrary dimension into a one- or two-dimensional discrete feature map, and to perform this transformation adaptively in a topologically ordered fashion.
SOM – Architecture

[Figure: a 2-d array of neurons; each neuron j receives synaptic weights wj1, wj2, ..., wjn from the input vector x1, x2, ..., xn, which is connected to all neurons in the lattice]

For each output neuron j, there is a set of synaptic weights wji connected from all the input neurons.
What is the purpose of these weights?
Do we use these weights to compute the outputs by the McCulloch and Pitts model,

$y_j = \varphi\left(\sum_{i=1}^{n} w_{ji} x_i + b_j\right)$?

No. In fact, there are no output signals produced by the output neurons!
The synaptic weights act as the underlying "codes" of the neurons!
Are the output neurons connected to each other by synaptic weights?
No. They are not connected by synaptic weights.
But they are related to each other by their locations in the 2-d map!
How to describe their locations in the map?
They can simply be described by their indices in the 2-d array: a neuron with index (i,j) in the 2-d array has location (i,j) in the 2-d plane.
SOM – Algorithm Overview

[Figure: the same 2-d array of neurons with weighted synapses from the input vector]

1. Randomly initialise all the weights.
2. Select input vector x = [x1, x2, x3, ..., xn] from the training set.
3. Compare x with the weights wj of each neuron j to determine the winner.
4. Update the winner so that it becomes more like x, together with the winner's neighbours.
5. Adjust parameters: learning rate & 'neighbourhood function'.
6. Repeat from (2) until the map has converged (i.e. no noticeable changes in the weights) or a pre-defined number of training cycles has passed.
(i) Randomly initialise the weight vectors wj for all nodes j.

This can be done by assigning them small values picked from a random number generator (such as the MATLAB command "rand").

(ii) Sampling: choose an input vector x from the training set to stimulate the network.

It can be chosen randomly from the training set (if the set is huge), or one by one in a deterministic manner, like that for sequential learning for MLP.
[Figure: the same 2-d array of neurons with weighted synapses from the input vector]

The formation of the self-organizing map involves three processes:

1. Competition
2. Cooperation
3. Synaptic Adaptation
Competitive Process (Finding a Winner)

A continuous input space of activation patterns is mapped onto a discrete output space of neurons by a process of competition among the neurons in the network.

Find the best-matching neuron i(x), usually the neuron whose weight vector has the smallest Euclidean distance from the input vector x:

$i(x) = \arg\min_j \|x - w_j\|$

The winning neuron is the one that is in some sense 'closest' to the input vector.
'Euclidean distance' is the straight-line distance between the data points, if they are plotted on a (multi-dimensional) graph. The Euclidean distance between two vectors a = (a1, a2, ..., an) and b = (b1, b2, ..., bn) is calculated as

$d_{a,b} = \sqrt{\sum_i (a_i - b_i)^2}$
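
As a concrete illustration, the winner search takes only a few lines; this is a sketch assuming the weights are stored as a NumPy array W of shape (m, n), one row per lattice neuron (names are illustrative):

```python
import numpy as np

def find_winner(W, x):
    """Return the index of the neuron whose weight vector is closest to x.

    W: (num_neurons, n) array, one weight vector per lattice neuron.
    x: (n,) input vector.
    """
    # Squared Euclidean distance to every neuron; the arg-min is the
    # same whether or not we take the square root.
    d2 = np.sum((W - x) ** 2, axis=1)
    return int(np.argmin(d2))
```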
The winning neuron locates the center of a topological neighborhood of cooperating neurons.
How to define a topological neighborhood that is neurobiologically correct?
There is neurobiological evidence for lateral interaction:
A neuron that is firing tends to excite the neurons in its immediate neighborhood more than those farther away from it. The closer a neuron is to the winner, the more impact it receives.

From the biological evidence:
1. The topological neighborhood is symmetric about the maximum point (the winner neuron).
2. The amplitude of the topological neighborhood decreases monotonically with increasing lateral distance.

What is the natural choice of function to describe this bell shape?

[Figure: a bell-shaped lateral-interaction curve centered at the position of the winner neuron i, evaluated at the position of a neighboring neuron k]
The typical choice of topological neighborhood function is the Gaussian

$h_{j,i(x)} = \exp\left(-\frac{d_{j,i}^2}{2\sigma^2}\right)$

where $d_{j,i}$ is the Euclidean distance from neuron j to the winning neuron i (associated with the input vector x). The position of a neuron is described by its index in the matrix (2-d lattice). For instance, the distance from the neuron at position (l, m) to the one at (n, k) is

$d_{j,i} = \sqrt{(l - n)^2 + (m - k)^2}$

$h_{j,i(x)}$ is a measure of the effectiveness of the winner on its neighbors, and σ is the "effective width" of the topological neighborhood:
When a neuron is σ away from the winner, $h_{j,i} = \exp(-0.5) \approx 0.61$: it is heavily affected.
When a neuron is 2σ away from the winner, $h_{j,i} = \exp(-2) \approx 0.14$: it is much less affected.
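
A sketch of this neighborhood computation over the whole lattice, assuming each neuron's (row, col) lattice index is stored in an array `grid` (a hypothetical layout, matching the index convention above):

```python
import numpy as np

def neighborhood(grid, winner_idx, sigma):
    """Gaussian neighborhood h_{j,i(x)} for every neuron j in the lattice.

    grid:       (num_neurons, 2) array of (row, col) lattice indices.
    winner_idx: index (into grid) of the winning neuron.
    sigma:      effective width of the neighborhood.
    """
    # Squared lattice distance d_{j,i}^2 from each neuron to the winner.
    d2 = np.sum((grid - grid[winner_idx]) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```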
Another unique feature of the SOM algorithm is that the size of the topological neighborhood shrinks with time.

What is the difference between the two Gaussians above? The effective width!
The "effective width" decreases with time. A popular choice is the time-varying width

$\sigma(n) = \sigma_0 \exp\left(-\frac{n}{\tau_1}\right)$

where τ1 is a time constant controlling the decay rate of the effective width.

Time-varying neighborhood function:

$h_{j,i(x)}(n) = \exp\left(-\frac{d_{j,i}^2}{2\sigma(n)^2}\right)$

It means that the influence of the winner on its neighbors decreases with time.
Adaptation

$\Delta w_j(n) = w_j(n+1) - w_j(n) = \eta(n)\, h_{j,i(x)}(n)\, (x - w_j(n))$

What is the geometrical meaning of this algorithm?

[Figure: the weight vector w(n) moves part of the way toward the input x, giving w(n+1)]

The change of the weight is in the direction from the current weight to the input!
The synaptic weight vector wi of the winning neuron i moves toward the input vector x!
All the neurons in the neighborhood of the winning neuron also move toward the input vector: the farther away, the smaller the change.
Upon repeated presentations of the training data, the synaptic weight vectors tend to follow the distribution of the input vectors, due to the neighborhood updating.
The algorithm leads to a topological ordering of the feature map, in the sense that neurons that are adjacent in the lattice will tend to have similar synaptic weight vectors.
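
Combining competition, cooperation, and adaptation, one update step might look like the following sketch (reusing the hypothetical `find_winner` and `neighborhood` helpers from above):

```python
def som_step(W, grid, x, eta, sigma):
    """One SOM adaptation step: move the winner and its lattice
    neighbors toward the input x, weighted by the neighborhood."""
    i = find_winner(W, x)               # competition
    h = neighborhood(grid, i, sigma)    # cooperation
    W += eta * h[:, None] * (x - W)     # synaptic adaptation
    return W
```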
Adaptation – Learning Rate

Would it converge if the learning rate were a constant?

No. Since $h_{i(x),i(x)}(n) = 1$ at the winner itself, the winner neuron will always update its weights (unless the winner matches the input exactly), even though its effect on its neighbors decreases with time.

How to make sure that the weights converge?

The learning-rate parameter η(n) should also decrease with time. One possible choice:

$\eta(n) = \eta_0 \exp\left(-\frac{n}{\tau_2}\right)$

where τ2 is another time constant, controlling the decay rate of the learning rate.
The first phase of the adaptive process

The adaptation of the weights is decomposed into two phases: an ordering or self-organizing phase, followed by a convergence phase.

1. Self-organizing or ordering phase

The ordering phase may take as many as 1000 iterations, and possibly more.

• The learning rate η(n) should begin with a value close to 0.1; thereafter it should decrease gradually, but remain above 0.01:

$\eta_0 = 0.1,\quad \tau_2 = T \;\Rightarrow\; \eta(n) = 0.1\exp\left(-\frac{n}{T}\right),\quad n = 0, 1, 2, \ldots, T$

where T is the total number of iterations for the first phase.

• The neighborhood function should initially include almost all neurons, and then shrink with time. We may set the initial width equal to the "radius" of the lattice. For instance, if the size of the lattice is M×N, then the initial width can be set as

$\sigma_0 = \frac{\sqrt{M^2 + N^2}}{2}$

Correspondingly, the time constant can be chosen as

$\tau_1 = \frac{T}{\log(\sigma_0)}$

At the end, $n = T$ and $\sigma(T) = 1$:
the winner only affects its immediate neighbors at the end of the first phase!
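
A sketch of these ordering-phase schedules, under the parameter choices above (M, N, and T as named in the text):

```python
import numpy as np

def ordering_phase_schedules(M, N, T):
    """Learning-rate and width schedules for the ordering phase."""
    eta0 = 0.1
    sigma0 = np.sqrt(M**2 + N**2) / 2.0   # "radius" of the M x N lattice
    tau1 = T / np.log(sigma0)             # width time constant
    tau2 = T                              # learning-rate time constant
    eta = lambda n: eta0 * np.exp(-n / tau2)
    sigma = lambda n: sigma0 * np.exp(-n / tau1)
    return eta, sigma

# e.g. for a 10 x 10 lattice and 1000 ordering iterations:
eta, sigma = ordering_phase_schedules(10, 10, 1000)
# sigma(1000) = 1.0 (immediate neighbors only); eta(1000) is about 0.037
```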
The second phase of the adaptive process

2. Convergence phase: this second phase is needed to fine-tune the feature map.

As a general rule, the number of iterations must be at least 500 times the number of neurons in the network. Thus, the convergence phase may have to go on for thousands, and possibly tens of thousands, of iterations.

For good statistical accuracy, the learning parameter η(n) should be maintained at a small value, on the order of 0.01. In any event, it must not be allowed to decrease to zero.

The neighborhood function should contain only the nearest neighbors of the winning neuron, and may eventually reduce to one or zero neighboring neurons.

In many applications, the second phase is not needed if convergence of the parameters is not critical.
Summary of the Algorithm

For an n-dimensional input space and m output neurons:

(1) Randomly initialize the weight vector wi for each neuron i, i = 1, ..., m.
(2) Sampling: choose an input vector x from the training set.
(3) Determine the winner neuron k:
    $\|w_k - x\| = \min_i \|w_i - x\|$ (Euclidean distance)
(4) Update the weight vectors of all neurons i in the neighborhood of the winning neuron k:
    $w_i(n+1) = w_i(n) + \eta(n)\, h_{i,k(x)}(n)\, (x - w_i(n))$
(5) If the convergence criterion is met, STOP. Otherwise, go to (2).

Isn't it a simple algorithm? It is the simplest compared to MLP and RBFN.

Kohonen's SOM algorithm is so simple to implement, yet mathematically so difficult to analyze in a general setting. It is still a hot research topic.
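
For concreteness, a minimal end-to-end training sketch that strings the previous pieces together (assumptions: the hypothetical helpers defined earlier, and a data matrix X with one sample per row):

```python
import numpy as np

def train_som(X, M, N, T, seed=0):
    """Ordering-phase SOM training on data X of shape (num_samples, n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    # (1) Random initialization of the m = M*N weight vectors.
    W = rng.random((M * N, n)) * 0.1
    # Lattice (row, col) index of every neuron.
    grid = np.array([(r, c) for r in range(M) for c in range(N)], float)
    eta, sigma = ordering_phase_schedules(M, N, T)
    for step in range(T):
        x = X[rng.integers(len(X))]                       # (2) sampling
        W = som_step(W, grid, x, eta(step), sigma(step))  # (3)-(4)
    return W, grid
```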
How to use SOM?

After training the network, how do we use the map?

Question: how do you find out whether the neurons are organized in a meaningful way? What does each neuron stand for?

There are many possible ways to interpret the neurons. The simplest is just to find out which input signal (in the training set) stimulates each particular neuron the most.

Determine the winner input signal xk for neuron j:
$\|x_k - w_j\| = \min_i \|x_i - w_j\|$ (Euclidean distance)

Mark each neuron by the particular input signal for which it produces the best response.
The resulting map is the so-called "contextual map" or "semantic map".
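
A sketch of this labeling step, assuming trained weights W and a list of labels for the training inputs (names are illustrative):

```python
import numpy as np

def contextual_map(W, X, labels):
    """Label each neuron by the training input closest to its weights."""
    # Pairwise distances: entry (j, i) is ||w_j - x_i||.
    d = np.linalg.norm(W[:, None, :] - X[None, :, :], axis=2)
    best = np.argmin(d, axis=1)   # winner input for each neuron
    return [labels[i] for i in best]
```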
Example I: Learning a one-dimensional representation of a two-dimensional (triangular) input space.
In this case, the topological map is one-dimensional.

[Figure: snapshots of the map after 0, 20, 100, 1000, 10000, and 25000 iterations]
Example II: Learning a two-dimensional representation of a two-dimensional (square) input space.
In this case, the neurons are organized in a 2-d lattice (the most common arrangement).

demo
How about high-dimensional input?

Example III: Learning a two-dimensional mapping of texture images.

The inputs are texture images. The SOM is a 10x8 map.
Example IV: Classifying World Poverty (Helsinki University of Technology)

'Poverty map' based on 39 indicators from World Bank statistics (1992).


Example V: WEBSOM

SOM analysis technique to map thousands of articles posted on Usenet newsgroups.

Lagus et al. (1996); Honkela et al. (1998) – HUT NN Research Centre
Example VI: Contextual Maps by SOM

Animal names and their attributes:

                       Dove   Hen  Duck Goose   Owl  Hawk Eagle   Fox   Dog  Wolf   Cat Tiger  Lion Horse Zebra   Cow
is       Small            1     1     1     1     1     1     0     0     0     0     1     0     0     0     0     0
         Medium           0     0     0     0     0     0     1     1     1     1     0     0     0     0     0     0
         Big              0     0     0     0     0     0     0     0     0     0     0     1     1     1     1     1
has      2 legs           1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
         4 legs           0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
         Hair             0     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1
         Hooves           0     0     0     0     0     0     0     0     0     0     0     0     0     1     1     1
         Mane             0     0     0     0     0     0     0     0     0     1     0     0     1     1     1     0
         Feathers         1     1     1     1     1     1     1     0     0     0     0     0     0     0     0     0
likes to Hunt             0     0     0     0     1     1     1     1     0     1     1     1     1     0     0     0
         Run              0     0     0     0     0     0     0     0     1     1     0     1     1     1     1     0
         Fly              1     0     0     1     1     1     1     0     0     0     0     0     0     0     0     0
         Swim             0     0     1     1     0     0     0     0     0     0     0     0     0     0     0     0

A grouping according to similarity has emerged: on the trained map, the "birds", the "hunters", and the "peaceful" animals occupy contiguous regions.

SOM can really organize the data!

[Teuvo Kohonen (2001), Self-Organizing Maps, Springer]
Break

AlphaGo (DeepMind)
What is a Neural Network (NN)?

A neural network is a massively parallel distributed processor made up of simple processing units, which has a natural propensity for storing experiential knowledge and making it available for use.

• It employs a massive inter-connection of "simple" computing units: neurons.
• It is capable of organizing its structure, which consists of many neurons, to perform tasks many times faster than the fastest digital computers nowadays.
• Knowledge is obtained from the data/input signals provided.
• Knowledge is learned by adjusting the synapses!

Artificial neural networks are largely inspired by biological neural networks, with the ultimate goal of building an intelligent machine which can mimic the human brain.
The understanding of the neuron started more than 100 years ago:
Santiago Ramón y Cajal (1852-1934)


Biological Neuron

The major jobs of a neuron:
1. It receives information, usually in the form of electrical pulses, from many other neurons.
2. It does what is, in effect, a complex dynamic sum of these inputs.
3. It sends out information in the form of a stream of electrical impulses down its axon and on to many other neurons.
4. The connections (synapses) are crucial for excitation or inhibition of the cells.
5. Learning is possible by adjusting the synapses!

How to build a mathematical model of the neuron?
This is the starting point of the artificial neural networks!

The beginning of the artificial neural networks:
McCulloch and Pitts, 1943
The next major step: the Perceptron – single-layer neural networks
Frank Rosenblatt (1928-1971), 1958

Supervised learning:

$w(n+1) = w(n) + \eta e(n) x(n)$
$e(n) = d(n) - y(n)$

The weights were initially random. Then it could learn to perform certain simple tasks in pattern recognition.
Rosenblatt proved that for a certain class of problems, it could learn to behave correctly! During the 1960s, it seemed that neural networks could do anything.

Perceptron Learning Algorithm

Start with a randomly chosen weight vector w(1); update the weight vector to

$w(n+1) = w(n) + \eta e(n) x(n)$
$e(n) = d(n) - y(n)$

Perceptron Convergence Theorem (Rosenblatt, 1962):

If C1 and C2 are linearly separable, then the perceptron training algorithm "converges" in the sense that, after a finite number of steps, the synaptic weights remain unchanged and the perceptron correctly classifies all elements of the training set.

Is there any condition on the learning rate to make the algorithm work?
Any positive learning rate would be fine!
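
A minimal sketch of the perceptron rule, assuming a sign activation, targets in {-1, +1}, and the bias folded into the weight vector:

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, max_epochs=100):
    """Rosenblatt perceptron: w(n+1) = w(n) + eta * e(n) * x(n)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # fold bias into weights
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x, target in zip(Xb, d):
            y = 1.0 if w @ x >= 0 else -1.0     # sign activation
            e = target - y                      # e(n) = d(n) - y(n)
            if e != 0:
                w += eta * e * x
                errors += 1
        if errors == 0:   # converged: all training samples classified
            break
    return w
```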
Regression Problem

Consider a multiple-input single-output system whose mathematical characterization is unknown.

Given a set of observations of input-output data
{(x(i), d(i)), i = 1, ..., N},
where m is the dimensionality of the input space and i is the time index.

How to design a model of the unknown system?
Optimization problem: minimize the cost function!
What is the most common cost function?

$E(w) = \sum_{i=1}^{n} e(i)^2 = \sum_{i=1}^{n} (d(i) - y(i))^2$

What is the optimality condition? $\frac{\partial E(w)}{\partial w} = 0$

There are two ways to solve the problem:

• If the model is simple, then directly solve the optimality condition.
• Iterative descent algorithm: starting with an initial guess w(0), generate a sequence of weight vectors w(1), w(2), ..., such that the cost function E(w) is reduced at each iteration of the algorithm, as shown by

$E(w(n+1)) < E(w(n))$

How to choose the iterative algorithm
$w(n+1) = w(n) + \Delta w(n)$
such that the cost is always decreasing? What is the simplest way if the gradient is known?
Method of Steepest Descent (Gradient Descent)

$w(n+1) = w(n) + \Delta w(n)$

Successive adjustments applied to the weight vector w are in the direction of steepest descent, i.e. a direction opposite to the gradient vector ∇E(w).
Let g(n) = ∇E(w(n)); the steepest descent algorithm is formally described by

$w(n+1) = w(n) - \eta g(n)$

where η is a positive constant called the stepsize or learning rate.

Is there any condition on the learning rate to make the algorithm work?
The rate has to be sufficiently small!
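
A generic steepest-descent loop might look like this sketch, with an assumed callable `grad` supplying ∇E(w):

```python
import numpy as np

def steepest_descent(grad, w0, eta=0.01, steps=1000):
    """w(n+1) = w(n) - eta * g(n), with g(n) = grad E(w(n))."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - eta * grad(w)   # step against the gradient
    return w

# e.g. minimizing E(w) = ||w - 1||^2, whose gradient is 2(w - 1):
w_star = steepest_descent(lambda w: 2 * (w - 1), w0=np.zeros(3))
```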
Linear Regression Problem

Consider that we are trying to fit a linear model to a set of input-output pairs (x(1), d(1)), (x(2), d(2)), ..., (x(n), d(n)) observed in an interval of duration n:

$y(x) = w_1 x_1 + w_2 x_2 + \ldots + w_m x_m + b$

Cost function:

$E(w) = \sum_{i=1}^{n} e(i)^2 = \sum_{i=1}^{n} (d(i) - y(i))^2$

Of course, the answer can be easily found by solving directly.
Standard linear least squares:

$w = (X^T X)^{-1} X^T d$

The Least-Mean-Square (LMS) algorithm:

$e(n) = d(n) - w^T(n) x(n)$
$w(n+1) = w(n) + \eta e(n) x(n)$

Does it take the same form as that for the perceptron? Yes.
Does it also converge in finite steps? No.
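
A sketch contrasting the two solutions on synthetic data (illustrative, not from the lecture): the batch least-squares formula and the sequential LMS update:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # inputs (bias omitted for brevity)
w_true = np.array([2.0, -1.0, 0.5])
d = X @ w_true + 0.01 * rng.normal(size=200)  # noisy desired outputs

# Batch solution: w = (X^T X)^{-1} X^T d, via a numerically safer solve.
w_ls = np.linalg.solve(X.T @ X, X.T @ d)

# Sequential LMS: w(n+1) = w(n) + eta * e(n) * x(n).
w = np.zeros(3)
eta = 0.01
for x, target in zip(X, d):
    e = target - w @ x
    w += eta * e * x

print(w_ls, w)   # both approach w_true; LMS only approximately
```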
The fundamental limits of single-layer perceptrons:

For the pattern recognition problem: the classes have to be linearly separable.
For the regression problem: the process has to be close to a linear model!
Multilayer Perceptron (MLP) and Back-Propagation
David Rumelhart (1942-2011) and his colleagues, 1986

The hidden layer provides a nonlinear map from the original input space to a feature space.
A nonlinearly separable problem in the input space can become a linearly separable problem in the feature space!
Single-Layer Perceptron vs. Multi-Layer Perceptrons

Pattern recognition problem: [figure]

Regression problem:
Multi-layer perceptrons can approximate any bounded continuous function!

The learning algorithms are based upon the steepest descent method:

$w(k+1) = w(k) - \eta g(k)$

$w(n+1) = w(n) + \eta e(n) x(n)$
(output error × input signal at the output layer; local error × input signal at the hidden layers)

Signal-flow graphic representation of BP: [figure]
How to design and train the neural network?

How many hidden layers?

Normally one hidden layer is enough.
Two or more hidden layers may be better if the target function can be clearly decomposed into sub-functions. The number of layers depends upon the number of levels into which the function can be decomposed.

How many hidden neurons?

The hidden neurons are simply building blocks.
If the geometrical shape can be perceived, then use the minimal number of line segments (or hyper-planes) as the starting point.
For a higher-dimensional problem, start with a large network, then use SVD to determine the effective number of hidden neurons.
How to design and train the neural network?

How to choose the activation function in the hidden layers?
Hyperbolic tangent (tansig) is the preferred one for all the hidden neurons.

How to choose the activation functions in the output neurons?
Logsig for pattern recognition, purelin for regression problems.

How to pre-process the input data?
Normalize all the input variables to the same range, such that the mean is close to zero.

When to use sequential learning, and when to use batch learning?
Batch learning with a second-order algorithm (such as trainlm in MATLAB) is usually faster, but prone to local minima. Sequential learning can produce better solutions, in particular for large databases with lots of redundant samples.

How to deal with over-fitting?
Either identify the minimal structure, or use regularization (trainbr in MATLAB).
What can MLP do?

MLP can solve regression and pattern recognition problems.
They all have the following characteristic:
There exists a function (map) between inputs x and outputs y: y = F(x).

input x → Map → output y

Unfortunately, the mathematical form of this function (map) is unknown!
If the model is just too difficult to build, or the map is too complicated to be expressed by any known simple functions, then we can use MLP to approximate this function.

MLP is a universal approximator! It can approximate any bounded function!
All it needs is a training set.
Given a set of observations of input-output data
{(x(1), d(1)), (x(2), d(2)), (x(3), d(3)), ..., (x(N), d(N))},
use this training data to train the MLP such that the difference between the desired outputs and the outputs of the MLP is minimized.
NETtalk – Terrence Sejnowski and Charles Rosenberg, 1987
NETtalk was created to learn how to correctly pronounce English from written English text.
"Autonomous Land Vehicle In a
Neural Network” (ALVINN)
by Dean Pomerleau and Todd Jochem,
1993, CMU

The state of the art self-driving car.

51
2

Radial Basis Functions (RBFs)

The activation of a hidden unit is determined by the distance between the input vector and a prototype vector, the center:

$\varphi(x) = \varphi(\|x - c\|) = \varphi(r)$

In most cases, if the distance is zero, the activation is maximum; the activation level drops off farther away from the center.
The RBF network output:

$y(x) = \sum_{i=1}^{M} w_i \varphi_i(\|x - \mu_i\|) + b$

$\varphi_i(\|x - \mu_i\|) = \exp\left(-\frac{\|x - \mu_i\|^2}{2\sigma_i^2}\right)$

We know that MLP can approximate any bounded continuous function. Can RBFN also do this?
RBFN can approximate any bounded continuous function!
But this is just an existence result! It does not tell you how to obtain the weights.

Given a set of sampling points and desired outputs {(x(i), d(i)), i = 1, ..., N}, how to find the parameters {wi} (and the centers and widths) such that the cost function

$E(w) = \sum_{i=1}^{N} e(i)^2 = \sum_{i=1}^{N} (d(i) - y(i))^2$

is minimized?
Hybrid Training of RBF Networks

Two-stage 'hybrid' learning process:

(Stage 1) Parameterize the hidden layer of RBFs:
- hidden unit number (M)
- centre/position
- spread/width
Use unsupervised methods, as they are quick:
1. Random selection
2. Clustering

(Stage 2) Find the weight values between the hidden and output units.
Aim: minimize the sum-of-squares error between the actual outputs and the desired responses:

$E(w) = \sum_{i=1}^{N} e(i)^2 = \sum_{i=1}^{N} (y(i) - d(i))^2 = (\Phi w - d)^T (\Phi w - d)$

$\frac{\partial E(w)}{\partial w} = 0 \;\Rightarrow\; w = (\Phi^T \Phi)^{-1} \Phi^T d$
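
A sketch of the two-stage procedure, assuming Gaussian basis functions, centers picked by random selection (Stage 1, option 1), and a shared width (names are illustrative):

```python
import numpy as np

def design_matrix(X, centers, sigma):
    """Phi[i, j] = exp(-||x(i) - mu_j||^2 / (2 sigma^2))."""
    d2 = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_rbfn(X, d, M=10, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Stage 1: pick M centers by random selection from the training set.
    centers = X[rng.choice(len(X), size=M, replace=False)]
    # Stage 2: linear least squares, w = (Phi^T Phi)^{-1} Phi^T d.
    Phi = design_matrix(X, centers, sigma)
    w, *_ = np.linalg.lstsq(Phi, d, rcond=None)
    return centers, w

# Usage: approximate a 1-d function.
X = np.linspace(-3, 3, 100)[:, None]
d = np.sin(X).ravel()
centers, w = train_rbfn(X, d, M=10, sigma=0.8)
y = design_matrix(X, centers, 0.8) @ w   # predicted outputs
```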
Regularization Theory

Both MLP and RBFN can lead to over-fitting (learning the noise present in the training data), which results in poor generalization.

An alternative approach to cope with over-fitting comes from the theory of regularization, which is a method of controlling the smoothness of mapping functions.
Regularization Methods

Cost function:

$F = E_D + \lambda E_w$
(training error + cost on smoothness)

It involves adding an extra term to the error measure which penalizes mappings that are not smooth.

For RBFN, a simple way to choose the penalty term is the size of the weights:

$F(w) = \frac{1}{2}\sum_{i=1}^{N} (y(i) - d(i))^2 + \frac{1}{2}\lambda \|w\|^2 = \frac{1}{2}(\Phi w - d)^T(\Phi w - d) + \frac{1}{2}\lambda w^T w$

$w = (\Phi^T \Phi + \lambda I)^{-1} \Phi^T d$

where λ is the regularization factor.

For MLP, the regularization algorithm is more involved: trainbr in MATLAB.
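
The regularized weights change only one line relative to the hypothetical RBFN sketch above:

```python
import numpy as np

def train_rbfn_regularized(Phi, d, lam=0.1):
    """Ridge solution w = (Phi^T Phi + lambda I)^{-1} Phi^T d."""
    M = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ d)
```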
Feature Mapping of the Human Cortex

Neighboring areas in these maps represent neighboring areas in the sensory input space or motor output space.
The map can change with learning!
SOM – Architecture and Algorithm Overview

[Figure: the same 2-d array of neurons with weighted synapses from the input vector]

1. Randomly initialise all weights.
2. Select input vector x = [x1, x2, x3, ..., xn].
3. Compare x with the weights wj of each neuron j to determine the winner.
4. Update the winner so that it becomes more like x, together with the winner's neighbours.
5. Adjust parameters: learning rate & 'neighbourhood function'.
6. Repeat from (2) until the map has converged (i.e. no noticeable changes in the weights) or a pre-defined number of training cycles has passed.
What lies in the future for AI?
What is happening now?
What would happen in the distant future?
It could be better or worse!
And people are worried!

After taking this course, you will be better prepared to face AI!

