Neural Networks Six
Self-Organizing Maps
Xiang Cheng
Associate Professor
Department of Electrical & Computer Engineering
The National University of Singapore
Learning
Process of learning:
Learning with a teacher: Supervised Learning
Process of learning:
How can the system adjust the weights without any error or reward signal?
The neurons are well organized! Neighboring areas in these maps represent
neighboring areas in the sensory input space or motor output space.
Is this topographical organization entirely genetically programmed?
Self-Organizing Maps: History
In 1973, von der Malsburg studied the self-organizing property of
the visual cortex, and concluded that:
A model of the visual cortex could not be entirely genetically
predetermined; rather, a self-organizing process involving
synaptic learning may be responsible for the local ordering of
feature-sensitive cortical cells.
C. von der Malsburg (1942-present)
The computer simulation by von der Malsburg was perhaps the first to demonstrate
self-organization. However, global topographic ordering was not achieved in his model.
SOM: Goal
[Figure: the SOM lattice, a 2-D array of neurons used for visualization. Each neuron j is
connected to all components of the input x = (x1, x2, ..., xn) through its synaptic weights
wj1, wj2, wj3, ..., wjn.]
Competitive Process (Finding a winner)
A continuous input space of activation patterns is mapped
onto a discrete output space of neurons by a process of
competition among the neurons in the network.
Find the best-matching neuron i(x), usually the neuron whose weight
vector has the smallest Euclidean distance from the input vector x:
i(x) = arg minj ||x − wj||
The winning neuron is that which is in some sense ‘closest’ to the input
vector.
‘Euclidean distance’ is the straight line distance between the data points,
if they are plotted on a (multi-dimensional) graph
Euclidean distance between two vectors a and b, a = (a1,a2,…,an), b =
(b1,b2,…bn), is calculated as:
d(a, b) = √( Σi (ai − bi)² )
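A minimal sketch of this competitive step in Python (not from the lecture; it assumes NumPy and a weight matrix W that simply stacks one weight vector per neuron):

```python
import numpy as np

def find_winner(x, W):
    """Return the index i(x) of the best-matching neuron for input x.

    W is an (m, n) array holding one n-dimensional weight vector per neuron.
    The winner is i(x) = arg min_j ||x - w_j|| (Euclidean distance).
    """
    distances = np.linalg.norm(W - x, axis=1)  # ||x - w_j|| for every neuron j
    return int(np.argmin(distances))

# Example: three neurons with 2-dimensional weights
W = np.array([[0.1, 0.9],
              [0.5, 0.5],
              [0.9, 0.1]])
x = np.array([0.8, 0.2])
print(find_winner(x, W))  # -> 2, the neuron whose weight vector is closest to x
```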
The winning neuron locates the center of a topological
neighborhood of cooperating neurons.
How to define a topological neighborhood that is neurobiologically correct?
There is neurobiological evidence for lateral interaction:
A neuron that is firing tends to excite the neurons in its immediate
neighborhood more than those farther away from it.
The closer a neuron is to the winner neuron, the more excitation it receives.
From the biological evidence,
1. The topological neighborhood is symmetric about the maximum point (the
winner neuron).
2. The amplitude of the topological neighborhood decreases monotonically with
the increasing lateral distance.
hj,i(x) = exp(− d²j,i / (2σ²))
What is the difference between the two Gaussians above? The effective width!
The effective width decreases with time. A popular choice is the exponentially
decaying, time-varying width σ(n) = σ0 exp(−n/τ1), n = 0, 1, 2, ...
It means that the influence of the winner on its neighbors decreases with time.
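As an illustration (not from the lecture), here is the Gaussian neighborhood with an exponentially shrinking width; sigma0 and tau1 are placeholder constants chosen only for the example:

```python
import numpy as np

def neighborhood(d_ji, n, sigma0=3.0, tau1=1000.0):
    """Topological neighborhood h_{j,i(x)}(n) = exp(-d_ji^2 / (2*sigma(n)^2)).

    d_ji : lateral distance between neuron j and the winner i(x) on the lattice
    n    : iteration index; the width shrinks as sigma(n) = sigma0 * exp(-n / tau1)
    """
    sigma = sigma0 * np.exp(-n / tau1)
    return np.exp(-d_ji**2 / (2.0 * sigma**2))

# The same lateral distance receives less influence as training proceeds:
print(neighborhood(2.0, n=0))     # early: wide neighborhood, value close to 1
print(neighborhood(2.0, n=3000))  # late: narrow neighborhood, value close to 0
```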
Adaptation
Does the winner neuron itself ever stop learning? No. Since hi(x),i(x)(n) = 1 for every n
(the lateral distance from the winner to itself is zero), the winner neuron will always
update its weights (unless the winner matches the input exactly), although its effect on
its neighbors decreases with time.
At the end of the first phase, n = T and σ(T) = 1:
the winner only affects its immediate neighbors at the end of the first phase!
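Putting the three processes together, here is a minimal sketch of one SOM training step (illustrative only; the lecture's exact schedules for the learning rate and width may differ, and the learning rate eta is simply held constant here):

```python
import numpy as np

def som_step(x, W, coords, n, eta=0.1, sigma0=3.0, tau1=1000.0):
    """One SOM iteration: competition, cooperation, adaptation.

    W      : (m, d) weight matrix, one row per neuron
    coords : (m, 2) lattice coordinates of the neurons
    Update rule: w_j <- w_j + eta * h_{j,i(x)}(n) * (x - w_j)
    """
    i = np.argmin(np.linalg.norm(W - x, axis=1))    # competition: find the winner
    d = np.linalg.norm(coords - coords[i], axis=1)  # lateral distances to the winner
    sigma = sigma0 * np.exp(-n / tau1)              # shrinking effective width
    h = np.exp(-d**2 / (2.0 * sigma**2))            # cooperation: Gaussian neighborhood
    W += eta * h[:, None] * (x - W)                 # adaptation: move weights toward x
    return W
```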
The second phase of the adaptive process
How to use SOM?
After training the network, how to use the map?
Question: how do you find out whether the neurons are organized in a meaningful way?
What does each neuron stand for?
There are many possible ways to interpret the neurons. The simplest one is just
finding out which input signal (in the training set) stimulates the particular neuron
the most.
Mark each neuron by the particular input signal for which it produces the best response.
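A possible implementation of this labelling step (illustrative; the function and variable names are my own):

```python
import numpy as np

def label_neurons(W, X, labels):
    """Mark each neuron by the training input that stimulates it most.

    For neuron j, pick the training sample whose distance ||x - w_j|| is
    smallest and reuse that sample's label as the neuron's tag.
    """
    tags = []
    for w in W:
        k = np.argmin(np.linalg.norm(X - w, axis=1))  # best-responding training input
        tags.append(labels[k])
    return tags
```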
demo
How about high-dimensional input?
Example III: Learning a two-dimensional mapping of texture images
Example IV: Classifying World Poverty (Helsinki University of Technology)
Example V: SOM analysis technique to map thousands of articles posted on Usenet newsgroups
Example VI: Contextual Maps by SOM
Animal names and their attributes
                 Dove Hen Duck Goose Owl Hawk Eagle Fox Dog Wolf Cat Tiger Lion Horse Zebra Cow
is    Small        1   1    1    1    1    1    0    0   0    0   1    0    0     0     0    0
      Medium       0   0    0    0    0    0    1    1   1    1   0    0    0     0     0    0
      Big          0   0    0    0    0    0    0    0   0    0   0    1    1     1     1    1
has   2 legs       1   1    1    1    1    1    1    0   0    0   0    0    0     0     0    0
      4 legs       0   0    0    0    0    0    0    1   1    1   1    1    1     1     1    1
      Hair         0   0    0    0    0    0    0    1   1    1   1    1    1     1     1    1
      Hooves       0   0    0    0    0    0    0    0   0    0   0    0    0     1     1    1
      Mane         0   0    0    0    0    0    0    0   0    1   0    0    1     1     1    0
      Feathers     1   1    1    1    1    1    1    0   0    0   0    0    0     0     0    0
likes Hunt         0   0    0    0    1    1    1    1   0    1   1    1    1     0     0    0
to    Run          0   0    0    0    0    0    0    0   1    1   0    1    1     1     1    0
      Fly          1   0    0    1    1    1    1    0   0    0   0    0    0     0     0    0
      Swim         0   0    1    1    0    0    0    0   0    0   0    0    0     0     0    0
A grouping according to similarity has emerged: on the trained map the animal names fall
into coherent regions, e.g. the birds and the peaceful species cluster together.
What is a Neural Network (NN)?
Artificial neural networks are largely inspired by biological neural networks, with the
ultimate goal of building an intelligent machine that can mimic the human brain.
The understanding of the neuron started more than 100 years ago:
The next major step: Perceptron—single layer neural networks
Frank Rosenblatt, 1958
Supervised learning:
w(n + 1) = w(n) + ηe(n) x(n)
e(n) = d(n) − y(n)
Frank Rosenblatt (1928-1971)
The weights were initially random. Then it could learn to perform certain simple
tasks in pattern recognition.
Rosenblatt proved that for a certain class of problems, it could learn to behave
correctly! During the 1960s, it seemed that neural networks could do anything.
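A minimal sketch of Rosenblatt's learning rule (illustrative, not from the lecture; it assumes a hard-limiter output and a bias absorbed into the weight vector):

```python
import numpy as np

def train_perceptron(X, d, eta=0.1, epochs=100):
    """Perceptron learning: w(n+1) = w(n) + eta * e(n) * x(n), e(n) = d(n) - y(n).

    X : (N, m) input patterns (a bias column of ones is appended internally)
    d : (N,) desired outputs in {0, 1}
    """
    X = np.hstack([X, np.ones((len(X), 1))])  # absorb the bias into the weights
    w = 0.01 * np.random.randn(X.shape[1])    # initially random weights
    for _ in range(epochs):
        for x, dn in zip(X, d):
            y = 1.0 if w @ x >= 0 else 0.0    # hard-limiter output
            w += eta * (dn - y) * x           # update only when the output is wrong
    return w

# Example: the linearly separable AND problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 0, 0, 1], dtype=float)
w = train_perceptron(X, d)
```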
Perceptron Convergence Theorem (Rosenblatt, 1962)
Regression Problem
Consider a multiple-input, single-output system whose mathematical
characterization is unknown:
Optimization problem: Minimize the cost function!
What is the most common cost function?
E(w) = Σi=1..n e(i)² = Σi=1..n (d(i) − y(i))²
Does the learning algorithm converge? Yes.
Does it also converge in finite steps? No.
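For illustration (not from the lecture), here is a gradient-descent (LMS-style) sketch that minimizes this cost for a model that is linear in the weights; consistent with the point above, it converges only asymptotically, not in a finite number of steps:

```python
import numpy as np

def lms(X, d, eta=0.01, epochs=500):
    """Sample-by-sample gradient descent on E(w) = sum_i (d(i) - y(i))^2 with y = w.x.

    X : (N, m) inputs (append a column of ones if a bias is needed)
    d : (N,) desired outputs
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, dn in zip(X, d):
            e = dn - w @ x    # instantaneous error e(i)
            w += eta * e * x  # step against the gradient of e(i)^2
    return w
```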
The fundamental limits of Single-Layer Perceptrons
For pattern recognition problems: only linearly separable classes can be handled.
Multilayer Perceptron (MLP) and Back Propagation
David Rumelhart and his colleagues, 1986
Single-Layer Perceptron vs. Multi-Layer Perceptrons
Single-Layer Perceptron vs. Multi-Layer Perceptrons
Regression Problem:
How to design and train the neural network?
How many hidden layers?
Two or more hidden layers may be better if the target function can be clearly
decomposed into sub-functions. The number of layers depends on the number of
levels into which the function can be decomposed.
If the geometrical shape can be perceived, then use the minimal number of line segments
(or hyper-planes) as the starting point.
For higher-dimensional problems, start with a large network, then use SVD to determine
the effective number of hidden neurons (a possible sketch of this follows below).
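One way to read the SVD suggestion (my own sketch; the exact procedure used in the course may differ) is to take the matrix of hidden-layer activations over the training set and count the singular values that are significant relative to the largest one:

```python
import numpy as np

def effective_hidden_neurons(H, tol=1e-3):
    """Estimate the effective number of hidden neurons from H, the (N, m) matrix
    of hidden-layer outputs over the N training samples. Near-zero singular
    values indicate redundant hidden neurons."""
    s = np.linalg.svd(H, compute_uv=False)  # singular values, largest first
    return int(np.sum(s > tol * s[0]))
```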
How to design and train the neural network?
What can MLP do?
MLP can solve regression and pattern recognition problems.
They all have the following characteristic:
There exists a function (map) between inputs x and outputs y: y=F(x)
input x → Map → output y
"Autonomous Land Vehicle In a
Neural Network” (ALVINN)
by Dean Pomerleau and Todd Jochem,
1993, CMU
Radial Basis Function Networks (RBFN)
ϕ(x) = ϕ(||x − c||) = ϕ(r)
y(x) = Σi=1..M wi ϕi(||x − µi||) + b
ϕi(||x − µi||) = exp(− ||x − µi||² / (2σi²))
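A direct transcription of these two equations (illustrative; it assumes NumPy arrays for the centers, widths and weights):

```python
import numpy as np

def rbfn_output(x, centers, sigmas, w, b):
    """y(x) = sum_i w_i * phi_i(||x - mu_i||) + b with Gaussian basis functions."""
    r = np.linalg.norm(centers - x, axis=1)  # ||x - mu_i|| for every center mu_i
    phi = np.exp(-r**2 / (2.0 * sigmas**2))  # phi_i(r) = exp(-r^2 / (2*sigma_i^2))
    return w @ phi + b
```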
Given a set of sampling points and desired outputs {(x(i), d(i)), i = 1, ..., N},
how to find the parameters {wi} such that the cost function
E(w) = Σi=1..N e(i)² = Σi=1..N (d(i) − y(i))²
is minimized?
∂E(w)/∂w = 0  ⇒  w = (ΦᵀΦ)⁻¹ Φᵀ d
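In code, the least-squares weights can be obtained as below (a sketch; lstsq is used instead of forming the inverse explicitly, which is numerically safer but gives the same solution when ΦᵀΦ is invertible):

```python
import numpy as np

def ls_weights(Phi, d):
    """Solve w = (Phi^T Phi)^(-1) Phi^T d, where Phi[i, j] = phi_j(||x(i) - mu_j||)."""
    w, *_ = np.linalg.lstsq(Phi, d, rcond=None)
    return w
```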
Regularization Theory
Both MLP and RBFN can lead to over-fitting (learning the noise present in the
training data) and result in poor generalization.
Regularization Methods
Cost function:
F = ED + λEW
For RBFN, a simple way to choose the penalty term is the size of the
weights.
F(w) = (1/2) Σi=1..N (y(i) − d(i))² + (λ/2) ||w||² = (1/2)(Φw − d)ᵀ(Φw − d) + (λ/2) wᵀw
w = (ΦᵀΦ + λI)⁻¹ Φᵀ d
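The regularized solution in code (a sketch; lam stands for λ and its value is only illustrative):

```python
import numpy as np

def ridge_weights(Phi, d, lam=0.1):
    """Solve w = (Phi^T Phi + lambda*I)^(-1) Phi^T d (regularized least squares)."""
    M = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ d)
```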
After taking this course, you will be better prepared to face AI!