Artificial Neural Networks and Deep Learning
Contents
• Introduction
Motivation, Biological Background
• Threshold Logic Units
Definition, Geometric Interpretation, Limitations, Networks of TLUs, Training
• General Neural Networks
Structure, Operation, Training
• Multi-layer Perceptrons
Definition, Function Approximation, Gradient Descent, Backpropagation, Variants, Sensitivity Analysis
• Deep Learning
Many-layered Perceptrons, Rectified Linear Units, Auto-Encoders, Feature Construction, Image Analysis
• Radial Basis Function Networks
Definition, Function Approximation, Initialization, Training, Generalized Version
• Self-Organizing Maps
Definition, Learning Vector Quantization, Neighborhood of Output Neurons
Radial Basis Function Networks
Radial Basis Function Networks

A radial basis function network (RBFN) is a feed-forward neural network with exactly one hidden layer. The network input function of each hidden neuron is a distance function d of the input vector and the neuron's weight vector (its center), i.e. a function satisfying for all vectors x, y, z:
d(x, y) = 0 ⇔ x = y,
d(x, y) = d(y, x)   (symmetry),
d(x, z) ≤ d(x, y) + d(y, z)   (triangle inequality).
Distance Functions
An important family of distance functions are the Minkowski distances
d_k(x, y) = ( Σ_{i=1}^n |x_i − y_i|^k )^{1/k},
with the well-known special cases k = 1 (Manhattan or city-block distance), k = 2 (Euclidean distance) and k → ∞ (maximum distance, d_∞(x, y) = max_i |x_i − y_i|). The illustration shows, for each of these three cases, the set of points at distance 1 from a given center.
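As a concrete illustration (not part of the original slides), a minimal NumPy sketch computing the three special cases of the Minkowski distance; the function name is chosen for this example only.

import numpy as np

def minkowski_distance(x, y, k):
    """Minkowski distance d_k; k = np.inf gives the maximum distance."""
    diff = np.abs(np.asarray(x, float) - np.asarray(y, float))
    if np.isinf(k):
        return diff.max()
    return (diff ** k).sum() ** (1.0 / k)

x, y = [0.0, 0.0], [3.0, 4.0]
print(minkowski_distance(x, y, 1))       # 7.0  (Manhattan / city-block)
print(minkowski_distance(x, y, 2))       # 5.0  (Euclidean)
print(minkowski_distance(x, y, np.inf))  # 4.0  (maximum distance)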
Radial Basis Function Networks
The network input function of the output neurons is the weighted sum of their inputs:
∀u ∈ U_out:   f_net^(u)(w_u, in_u) = w_u^⊤ · in_u = Σ_{v ∈ pred(u)} w_{uv} · out_v.
The activation function of each hidden neuron is a so-called radial function, that is, a monotonically decreasing function
f: [0, ∞) → [0, 1]   with   f(0) = 1   and   lim_{x→∞} f(x) = 0.
The activation function of each output neuron is a linear function, f_act(net, θ) = net − θ.
Radial Activation Functions
Plots of radial activation functions over the network input net with reference radius σ: for example the rectangle function, which is 1 for net ≤ σ and 0 beyond, and the Gaussian function f_act(net; σ) = e^{−net²/(2σ²)}, which decays smoothly and takes the value e^{−1/2} at net = σ.
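A minimal NumPy sketch (my own illustration; the function names are not from the slides) of two radial activation functions, the rectangle function and the Gaussian function:

import numpy as np

def rectangle_activation(net, sigma):
    """Rectangle function: 1 up to the reference radius sigma, 0 beyond it."""
    return np.where(net <= sigma, 1.0, 0.0)

def gaussian_activation(net, sigma):
    """Gaussian function: monotonically decreasing in the network input net."""
    return np.exp(-net ** 2 / (2 * sigma ** 2))

net = np.linspace(0.0, 3.0, 7)
print(rectangle_activation(net, sigma=1.0))
print(gaussian_activation(net, sigma=1.0))   # value e^{-1/2} ~ 0.607 at net = sigma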
Radial Basis Function Networks: Examples
Radial basis function networks computing the conjunction x1 ∧ x2 with a single hidden neuron: one solution places the center at (1, 1) with radius 1/2 and a positive output weight, an alternative solution places the center at (0, 0) with a larger radius (6/5) and a negative output weight. Next to each network diagram the input space is shown with the circle of the reference radius around the respective center.
Radial Basis Function Networks: Examples
Biimplication x1 ↔ x2 ≡ (x1 ∧ x2) ∨ ¬(x1 ∨ x2):
A radial basis function network with two hidden neurons, one centered at (1, 1) and one at (0, 0), each with radius 1/2; both feed the output neuron y with weight 1 (threshold 0). The input-space plot shows the two circles of radius 1/2 around the corners (0, 0) and (1, 1).
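A minimal NumPy sketch (my own illustration) of this network, using a rectangle activation for the hidden neurons and a linear output neuron, with the centers, radius, weights and threshold read off the diagram above:

import numpy as np

centers = np.array([[1.0, 1.0], [0.0, 0.0]])  # centers of the two hidden neurons
sigma = 0.5                                   # common radius
weights = np.array([1.0, 1.0])                # hidden-to-output weights
theta = 0.0                                   # threshold of the (linear) output neuron

def rbfn_output(x):
    dists = np.linalg.norm(centers - x, axis=1)   # distance of the input to each center
    hidden = (dists <= sigma).astype(float)       # rectangle (radial) activation
    return weights @ hidden - theta               # weighted sum minus threshold

for x1 in (0.0, 1.0):
    for x2 in (0.0, 1.0):
        print((x1, x2), rbfn_output(np.array([x1, x2])))
# Prints 1.0 for (0,0) and (1,1), and 0.0 for (0,1) and (1,0), i.e. the biimplication.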
Radial Basis Function Networks: Function Approximation
Approximation of a function given by the sample points (x1, y1), ..., (x4, y4) by a step function built from rectangular pulses, each of which can be represented by one neuron of a radial basis function network. The lower plots decompose the approximation into the four pulses weighted with y1, ..., y4.
Radial Basis Function Networks: Function Approximation
Network diagram: the input x is connected to four hidden neurons with the centers x1, x2, x3, x4 and a common radius
σ = 1/2 · Δx = 1/2 · (x_{i+1} − x_i);
the hidden neurons feed the output neuron y with the weights y1, y2, y3, y4 (threshold 0).
A radial basis function network of this form computes the step function on the preceding slide or the piecewise linear function on the next slide, depending on the activation function of the hidden neurons.
Radial Basis Function Networks: Function Approximation
Approximation of the same function by a piecewise linear function built from triangular pulses, each of which can be represented by one neuron of a radial basis function network. The lower plots again decompose the approximation into the four pulses weighted with y1, ..., y4.
Radial Basis Function Networks: Function Approximation
Approximation of a function by a weighted sum of Gaussian functions with radius σ = 1 and the weights w1 = 1, w2 = 3 and w3 = −2. The lower plots decompose the approximation into the three individual weighted Gaussians.
Radial Basis Function Networks: Function Approximation
Network diagram: the input x is connected to three hidden neurons with the centers 2, 5 and 6, each with radius 1; they feed the output neuron y with the weights 1, 3 and −2 (threshold 0).
• The weights of the connections from the input neuron to the hidden neurons
determine the locations of the Gaussian functions.
• The weights of the connections from the hidden neurons to the output neuron
determine the height/direction (upward or downward) of the Gaussian functions.
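A minimal NumPy sketch (my own illustration) that evaluates this network: the input-to-hidden weights 2, 5 and 6 are the Gaussian centers, the hidden-to-output weights 1, 3 and −2 their heights/directions.

import numpy as np

centers = np.array([2.0, 5.0, 6.0])    # locations of the Gaussians (input-to-hidden weights)
weights = np.array([1.0, 3.0, -2.0])   # heights/directions (hidden-to-output weights)
sigma = 1.0                            # common radius
theta = 0.0                            # threshold of the linear output neuron

def rbfn_output(x):
    """Weighted sum of the Gaussian basis functions evaluated at x."""
    activation = np.exp(-(x - centers) ** 2 / (2 * sigma ** 2))
    return weights @ activation - theta

for x in np.linspace(0.0, 8.0, 9):
    print(x, round(rbfn_output(x), 3))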
Training Radial Basis Function Networks
Radial Basis Function Networks: Initialization
Simple radial basis function network:
Given a fixed learning task with m training patterns l_1, ..., l_m, use one hidden neuron v_k per training pattern and set its weight vector (center) to the input vector of that pattern:
∀k ∈ {1, ..., m}:   w_{v_k} = x^{(l_k)}.
If the activation function is the Gaussian function, the radii σ_k are chosen heuristically, e.g.
∀k ∈ {1, ..., m}:   σ_k = d_max / √(2m),
where d_max is the maximum distance between two training input vectors.
Radial Basis Function Networks: Initialization
Initializing the connections from the hidden to the output neurons:
∀u ∈ U_out, ∀l:   Σ_{k=1}^m w_{u v_k} · out_{v_k}^{(l)} − θ_u = o_u^{(l)},   or written compactly   A · w_u = o_u,
where o_u = (o_u^{(l_1)}, ..., o_u^{(l_m)})^⊤ is the vector of desired outputs, θ_u = 0, and A is the m × m matrix of hidden neuron outputs, A_{jk} = out_{v_k}^{(l_j)}.
This is a linear equation system that can be solved by inverting the matrix A:
w_u = A^{−1} · o_u.
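A minimal NumPy sketch of this initialization for the biimplication example that follows (Gaussian activation, one hidden neuron per training pattern, radius 1/2); this is my own illustration, not code from the slides:

import numpy as np

# Biimplication training data: inputs and desired outputs.
X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
o = np.array([1., 0., 0., 1.])

centers = X.copy()   # one hidden neuron per training pattern
sigma = 0.5          # common radius

# Matrix A: activation of hidden neuron k for training input j.
dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
A = np.exp(-dists ** 2 / (2 * sigma ** 2))

w = np.linalg.solve(A, o)   # equivalent to w = A^{-1} o (theta = 0)
print(w)                    # output weights, two distinct values by symmetry
print(A @ w)                # reproduces the desired outputs exactly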
RBFN Initialization: Example
Simple radial basis function network for the biimplication x1 ↔ x2:

x1  x2 | y
0   0  | 1
1   0  | 0
0   1  | 0
1   1  | 1

One hidden neuron is used for each training pattern, with the centers (0, 0), (1, 0), (0, 1) and (1, 1) and radius σ = 1/2 each; the connections to the output neuron y carry the weights w1, w2, w3, w4 (threshold θ = 0).
RBFN Initialization: Example
With the training patterns ordered as in the table above and the Gaussian activation with radius σ = 1/2 (so that an input at squared distance d² from a center yields the activation e^{−2d²}), the linear equation system A · w_u = o_u reads

A = ( 1       e^{−2}   e^{−2}   e^{−4} )
    ( e^{−2}  1        e^{−4}   e^{−2} )          o_u = (1, 0, 0, 1)^⊤.
    ( e^{−2}  e^{−4}   1        e^{−2} )
    ( e^{−4}  e^{−2}   e^{−2}   1      )
RBFN Initialization: Example
Solving this system (e.g. by inverting A) yields output weights of approximately 1.0567 for the hidden neurons centered at (0, 0) and (1, 1) and approximately −0.2809 for those centered at (1, 0) and (0, 1). The plots show the activation of a single hidden (basis function) neuron and the output of the whole network over the input space.
Radial Basis Function Networks: Initialization
Usually there are fewer hidden (radial basis function) neurons than training patterns, so that the system A · w_u = o_u is over-determined and A is not square. It is then solved in the least-squares sense with the help of the pseudo-inverse
A⁺ = (A^⊤ A)^{−1} A^⊤.
The weights can then be computed by
w_u = A⁺ · o_u = (A^⊤ A)^{−1} A^⊤ · o_u.
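A minimal NumPy sketch (my own illustration) of the pseudo-inverse initialization for the reduced biimplication network on the next slide, with only two hidden neurons and a free output threshold:

import numpy as np

X = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])   # training inputs
o = np.array([1., 0., 0., 1.])                            # desired outputs (biimplication)

centers = np.array([[1., 1.], [0., 0.]])                  # only two hidden neurons
sigma = 0.5

dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
hidden_out = np.exp(-dists ** 2 / (2 * sigma ** 2))

# Unknowns: output threshold theta and the weights w1, w2 (column of -1 for theta).
A = np.column_stack([-np.ones(len(X)), hidden_out])
A_plus = np.linalg.inv(A.T @ A) @ A.T        # pseudo-inverse (A^T A)^{-1} A^T
theta, w1, w2 = A_plus @ o
print(theta, w1, w2)
print(A @ np.array([theta, w1, w2]))         # how well the desired outputs are met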
223
RBFN Initialization: Example
Radial basis function network with only two hidden neurons for the biimplication: one neuron centered at (1, 1) and one centered at (0, 0), both with radius σ = 1/2; the connections to the output neuron y carry the weights w1 and w2, and the output threshold θ is now also a free parameter.
RBFN Initialization: Example
where a ≈ −0.1810, b ≈ 0.6810, c ≈ 1.1781, d ≈ −0.6688, e ≈ 0.1594 (abbreviations used in the pseudo-inverse computation).
Resulting weights: w1 = w2 ≈ 1.3375 and output threshold θ ≈ 0.3620. (In this example the desired outputs are even met exactly, because the equations for the inputs (1, 0) and (0, 1) coincide.)
RBFN Initialization: Example
Plots over the input space of the activation of the hidden neurons and of the output of the initialized network, e.g. its value at the input (1, 0).
Radial Basis Function Networks: Initialization
• Use all data points as centers for the radial basis functions.
◦ Advantages: Only radius and output weights need to be determined; desired
output values can be achieved exactly (unless there are inconsistencies).
◦ Disadvantage: Often far too many radial basis functions; computing the
weights to the output neuron via a pseudo-inverse can become infeasible.
• Use a random subset of data points as centers for the radial basis functions.
◦ Advantages: Fast; only radius and output weights need to be determined.
◦ Disadvantage: Performance depends heavily on the choice of data points.
• Use the result of clustering as centers for the radial basis functions, e.g.
◦ c-means clustering (on the next slides)
◦ Learning vector quantization (to be discussed later)
RBFN Initialization: c-Means Clustering
• Choose a number c of clusters and initialize the cluster centers, e.g. by randomly selecting c data points.
• Data point assignment:
Assign each data point to the cluster center that is closest to it
(that is, closer than any other cluster center).
• Cluster center update:
Compute new cluster centers as the mean vectors of the assigned data points.
(Intuitively: center of gravity if each data point has unit weight.)
• Repeat these two steps (data point assignment and cluster center update)
until the cluster centers do not change anymore (a code sketch of this procedure follows below).
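The sketch is my own NumPy illustration; variable and function names are not from the slides.

import numpy as np

def c_means(data, c, n_iter=100, seed=0):
    """Plain c-means clustering: alternate data point assignment and center update."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=c, replace=False)]   # random data points as start
    for _ in range(n_iter):
        # Data point assignment: index of the closest cluster center for every point.
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # Cluster center update: mean vector of the assigned data points.
        new_centers = np.array([data[assignment == k].mean(axis=0)
                                if np.any(assignment == k) else centers[k]
                                for k in range(c)])
        if np.allclose(new_centers, centers):   # stop when the centers no longer change
            break
        centers = new_centers
    return centers, assignment

# Example: two well-separated point clouds; the found centers can serve as RBF centers.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(3.0, 0.3, (50, 2))])
centers, assignment = c_means(data, c=2)
print(centers)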
c-Means Clustering: Example
Delaunay Triangulations and Voronoi Diagrams
Radial Basis Function Networks: Training

The update rules are derived by gradient descent, analogous to the training of multi-layer perceptrons: the error is differentiated with respect to the weights from the hidden neurons to the output neurons as well as with respect to the center coordinates and the radii of the hidden neurons.
(Two more learning rates are therefore needed for the center coordinates and the radii.)
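The gradient formulas themselves are not reproduced here; as an illustration, here is a minimal NumPy sketch of one batch gradient-descent step for a Gaussian radial basis function network with Euclidean distance and a linear output neuron. The gradient expressions are derived from the sum of squared errors (my own derivation, not copied from the slides), and the three learning-rate values are purely illustrative.

import numpy as np

def activations(X, centers, sigmas):
    """Gaussian activations of all hidden neurons for all inputs (patterns x neurons)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigmas ** 2))

def train_step(X, o, w, theta, centers, sigmas, eta_w=0.001, eta_c=0.02, eta_s=0.01):
    """One batch gradient-descent step on the error sum_l (o_l - y_l)^2."""
    phi = activations(X, centers, sigmas)      # hidden neuron outputs
    err = o - (phi @ w - theta)                # per-pattern error of the linear output
    diff = X[:, None, :] - centers[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    # Gradient-based updates (constant factors folded into the learning rates).
    new_w = w + eta_w * (phi.T @ err)                                   # output weights
    new_theta = theta - eta_w * err.sum()                               # output threshold
    new_centers = centers + eta_c * np.einsum('l,k,lk,lkd->kd',
                                              err, w, phi, diff) / sigmas[:, None] ** 2
    new_sigmas = sigmas + eta_s * np.einsum('l,k,lk,lk->k',
                                            err, w, phi, d2) / sigmas ** 3
    return new_w, new_theta, new_centers, new_sigmas

Repeatedly applying train_step after one of the initializations above fine-tunes the output weights, the center coordinates and the radii simultaneously, each with its own learning rate.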
Radial Basis Function Networks: Generalization
Generalized radial basis function networks use a more general distance function (for example a Mahalanobis distance), so that a hidden neuron can cover an elliptical instead of a circular region of the input space.

Example: biimplication
With such a distance function a single hidden neuron with center (1/2, 1/2) suffices: its elliptical region, oriented along the diagonal of the unit square, contains the inputs (0, 0) and (1, 1) but not (1, 0) and (0, 1). The plot shows this ellipse in the input space.
Application: Recognition of Handwritten Digits
• Comparison of various classifiers:
◦ Nearest Neighbor (1NN) ◦ Learning Vector Quantization (LVQ)
◦ Decision Tree (C4.5) ◦ Radial Basis Function Network (RBF)
◦ Multi-Layer Perceptron (MLP) ◦ Support Vector Machine (SVM)
• Distinction by the number of RBF training phases:
◦ 1 phase: find the output connection weights, e.g. with the pseudo-inverse.
◦ 2 phases: find the RBF centers, e.g. with a clustering method, plus the 1-phase step.
◦ 3 phases: the 2-phase procedure plus error backpropagation training.
Application: Recognition of Handwritten Digits
Classification results: