
Artificial Neural Networks and Deep Learning

1
Contents

• Introduction
Motivation, Biological Background

• Threshold Logic Units
Definition, Geometric Interpretation, Limitations, Networks of TLUs, Training

• General Neural Networks
Structure, Operation, Training

• Multi-layer Perceptrons
Definition, Function Approximation, Gradient Descent, Backpropagation, Variants, Sensitivity Analysis

• Deep Learning
Many-layered Perceptrons, Rectified Linear Units, Auto-Encoders, Feature Construction, Image Analysis

• Radial Basis Function Networks
Definition, Function Approximation, Initialization, Training, Generalized Version

• Self-Organizing Maps
Definition, Learning Vector Quantization, Neighborhood of Output Neurons

• Hopfield Networks and Boltzmann Machines


Definition, Convergence, Associative Memory, Solving Optimization Problems, Probabilistic Models

• Recurrent Neural Networks


Differential Equations, Vector Networks, Backpropagation through Time

2
Radial Basis Function Networks

205
Radial Basis Function Networks

A radial basis function network (RBFN) is a neural network with a graph G = (U, C) that satisfies the following conditions:

(i)  Uin ∩ Uout = ∅,

(ii) C = (Uin × Uhidden) ∪ C′,  C′ ⊆ (Uhidden × Uout).

The network input function of each hidden neuron is a distance function of the input vector and the weight vector, that is,

∀u ∈ Uhidden :  f_net^(u)(w_u, in_u) = d(w_u, in_u),

where d : IR^n × IR^n → IR_0^+ is a function satisfying, for all x, y, z ∈ IR^n:

(i)   d(x, y) = 0  ⇔  x = y,
(ii)  d(x, y) = d(y, x)                      (symmetry),
(iii) d(x, z) ≤ d(x, y) + d(y, z)            (triangle inequality).

206
Distance Functions

Illustration of distance functions: Minkowski family

d_k(x, y) = ( Σ_{i=1}^{n} |x_i − y_i|^k )^{1/k}

Well-known special cases from this family are:

k = 1 :  Manhattan or city block distance,
k = 2 :  Euclidean distance,
k → ∞ :  maximum distance, that is, d_∞(x, y) = max_{i=1}^{n} |x_i − y_i|.

(Figure: the sets of points at distance 1 from a center, i.e. the unit "circles", for k = 1, k = 2, and k → ∞.)
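A minimal Python sketch of this distance family (the function name and the use of NumPy are illustrative choices, not part of the slides):

import numpy as np

def minkowski_distance(x, y, k):
    """Minkowski distance d_k(x, y): k=1 Manhattan, k=2 Euclidean, k=inf maximum."""
    diff = np.abs(np.asarray(x, float) - np.asarray(y, float))
    if np.isinf(k):
        return diff.max()
    return (diff ** k).sum() ** (1.0 / k)

# For the points (0,0) and (1,1): d_1 = 2, d_2 = sqrt(2), d_inf = 1.
for k in (1, 2, np.inf):
    print(k, minkowski_distance([0, 0], [1, 1], k))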

207
Radial Basis Function Networks

The network input function of the output neurons is the weighted sum of their inputs:

∀u ∈ Uout :  f_net^(u)(w_u, in_u) = w_u^T in_u = Σ_{v ∈ pred(u)} w_uv out_v .

The activation function of each hidden neuron is a so-called radial function, that is, a monotonically decreasing function

f : IR_0^+ → [0, 1]  with  f(0) = 1  and  lim_{x→∞} f(x) = 0.

The activation function of each output neuron is a linear function, namely

f_act^(u)(net_u, θ_u) = net_u − θ_u .

208
Radial Activation Functions

rectangle function, triangle function, cosine until zero, Gaussian function

(Figure: plots of the four radial activation functions over the network input net, each with reference radius σ. The rectangle, triangle, and cosine functions reach 0 at net = σ or net = 2σ; the Gaussian function e^{−net²/(2σ²)} decays smoothly, passing e^{−1/2} at net = σ and e^{−2} at net = 2σ.)
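A small Python sketch of the four radial activation functions, following one common reading of the plots (the exact boundary behavior of the rectangle and triangle functions is an assumption):

import numpy as np

def rectangle(net, sigma):
    # 1 inside the radius, 0 outside
    return np.where(net <= sigma, 1.0, 0.0)

def triangle(net, sigma):
    # linear decay from 1 at net = 0 to 0 at net = sigma
    return np.where(net <= sigma, 1.0 - net / sigma, 0.0)

def cosine_until_zero(net, sigma):
    # half cosine wave, reaching 0 at net = 2 * sigma
    return np.where(net <= 2 * sigma, (np.cos(np.pi * net / (2 * sigma)) + 1) / 2, 0.0)

def gaussian(net, sigma):
    # smooth decay: e^(-1/2) at net = sigma, e^(-2) at net = 2 * sigma
    return np.exp(-net ** 2 / (2 * sigma ** 2))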

209
Radial Basis Function Networks: Examples

Radial basis function networks for the conjunction x1 ∧ x2

(Figure: two RBF networks computing the conjunction. Top: a hidden neuron with center (1, 1) and radius 1/2 feeding the output neuron with threshold 0. Bottom: a hidden neuron with center (0, 0), the larger radius 6/5, output weight −1, and output threshold −1. Next to each network the input space is shown with the circle inside which the hidden neuron is active.)
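A sketch of the first network's forward pass, reading the parameters off the figure as center (1, 1), radius 1/2, unit output weight, and output threshold 0 (Euclidean distance and rectangle activation assumed):

import numpy as np

def rbfn_and(x1, x2):
    center = np.array([1.0, 1.0])                              # weight vector of the hidden neuron
    sigma = 0.5                                                # radius
    net_hidden = np.linalg.norm(np.array([x1, x2]) - center)   # distance-based network input
    out_hidden = 1.0 if net_hidden <= sigma else 0.0           # rectangle activation
    return 1.0 * out_hidden - 0.0                              # weighted sum minus threshold

for a in (0, 1):
    for b in (0, 1):
        print(a, b, rbfn_and(a, b))   # only the input (1, 1) yields output 1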

210
Radial Basis Function Networks: Examples

Radial basis function networks for the biimplication x1 ↔ x2

Idea: logical decomposition

x1 ↔ x2 ≡ (x1 ∧ x2) ∨ ¬(x1 ∨ x2)

(Figure: RBF network with two hidden neurons, one centered at (1, 1) and one at (0, 0), each with radius 1/2, both connected to the output neuron with weight 1 and output threshold 0; next to it the input space with the two circular active regions.)

211
Radial Basis Function Networks: Function Approximation

Approximation of a function by rectangular pulses, each of which can be represented by a neuron of a radial basis function network.

(Figure, left: a function sampled at x1, ..., x4 with values y1, ..., y4. Right: its step-function approximation. Below: the decomposition of the approximation into four rectangular pulses with heights y1, ..., y4.)

212
Radial Basis Function Networks: Function Approximation

(Figure: RBF network with a single input x, four hidden neurons with centers x1, ..., x4 and common radius σ, output weights y1, ..., y4, and output neuron y with threshold 0.)

σ = (1/2) Δx = (1/2)(x_{i+1} − x_i)

A radial basis function network that computes the step function on the preceding slide and the piecewise linear function on the next slide (depending on the activation function).

213
Radial Basis Function Networks: Function Approximation

Approximation of a function by triangular pulses, each of which can be represented by a neuron of a radial basis function network.

(Figure, left: the sampled function and its piecewise linear approximation through the points (x1, y1), ..., (x4, y4). Below: the decomposition into four triangular pulses with heights y1, ..., y4.)

214
Radial Basis Function Networks: Function Approximation

Approximation of a function by Gaussian functions with radius σ = 1. It is w1 = 1, w2 = 3 and w3 = −2.

(Figure, left: the function to approximate. Right: its approximation as a weighted sum of Gaussians. Below: the three weighted Gaussian components.)

215
Radial Basis Function Networks: Function Approximation

Radial basis function network for a sum of three Gaussian functions

(Figure: network with one input x, three hidden neurons with centers 2, 5, and 6 and radius 1, output weights 1, 3, and −2, and output threshold 0.)

• The weights of the connections from the input neuron to the hidden neurons
determine the locations of the Gaussian functions.

• The weights of the connections from the hidden neurons to the output neuron
determine the height/direction (upward or downward) of the Gaussian functions.
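A brief sketch of this network's output as a function of x, using the centers 2, 5, 6 and the output weights 1, 3, −2 from the figure (variable names are my own):

import numpy as np

centers = np.array([2.0, 5.0, 6.0])    # weights input -> hidden (locations of the Gaussians)
weights = np.array([1.0, 3.0, -2.0])   # weights hidden -> output (heights/directions)
sigma = 1.0                            # common radius

def rbfn_output(x):
    activations = np.exp(-(x - centers) ** 2 / (2 * sigma ** 2))   # Gaussian hidden neurons
    return np.dot(weights, activations)                            # linear output, threshold 0

print(rbfn_output(5.0))   # dominated by the upward Gaussian centered at 5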

216
Training Radial Basis Function Networks

217
Radial Basis Function Networks: Initialization

Let L_fixed = {l_1, ..., l_m} be a fixed learning task, consisting of m training patterns l = (i^(l), o^(l)).

Simple radial basis function network:
One hidden neuron v_k, k = 1, ..., m, for each training pattern:

∀k ∈ {1, ..., m} :  w_{v_k} = i^(l_k).

If the activation function is the Gaussian function, the radii σ_k are chosen heuristically as

∀k ∈ {1, ..., m} :  σ_k = d_max / √(2m),

where  d_max = max_{l_j, l_k ∈ L_fixed} d( i^(l_j), i^(l_k) ).

218
Radial Basis Function Networks: Initialization

Initializing the connections from the hidden to the output neurons:

∀u ∈ Uout :  ∀l ∈ L_fixed :  Σ_{k=1}^{m} w_{u v_k} out_{v_k}^(l) − θ_u = o_u^(l),   or abbreviated   A · w_u = o_u,

where o_u = (o_u^(l_1), ..., o_u^(l_m))^T is the vector of desired outputs, θ_u = 0, and A is the m × m matrix of hidden neuron outputs with entries A_{jk} = out_{v_k}^(l_j).

This is a linear equation system that can be solved by inverting the matrix A:

w_u = A^{−1} · o_u .

219
RBFN Initialization: Example

Simple radial basis function network for the biimplication x1 ↔ x2

x1  x2  y
 0   0  1
 1   0  0
 0   1  0
 1   1  1

(Figure: network with one hidden neuron per training pattern, centers (0, 0), (1, 0), (0, 1), (1, 1), each with radius 1/2, output weights w1, ..., w4, and output threshold 0.)

220
RBFN Initialization: Example

Simple radial basis function network for the biimplication x1 ↔ x2

A · w_u = o_u,   where   o_u = (1, 0, 0, 1)^T   and

A =  ( 1       e^{−2}   e^{−2}   e^{−4} )
     ( e^{−2}  1        e^{−4}   e^{−2} )
     ( e^{−2}  e^{−4}   1        e^{−2} )
     ( e^{−4}  e^{−2}   e^{−2}   1      )

(Gaussian activation with σ = 1/2; the entry in row j and column k is the activation of the hidden neuron centered at the k-th training pattern for the j-th input pattern.)

221
RBFN Initialization: Example

Simple radial basis function network for the biimplication x1 ↔ x2

(Figure: activation of a single basis function, superposition of all four basis functions, and the resulting network output over the input space.)

• Initialization already leads to a perfect solution of the learning task.

• Subsequent training is not necessary.
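The initialization can be checked numerically. The following sketch builds the matrix A for the biimplication task with Gaussian activations and σ = 1/2 and solves A · w = o (the NumPy usage and variable names are my own choices):

import numpy as np

X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], float)   # training inputs = centers
o = np.array([1, 0, 0, 1], float)                        # desired outputs (biimplication)
sigma = 0.5

# A[j, k] = activation of hidden neuron k (centered at X[k]) for input pattern j
dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
A = np.exp(-dist ** 2 / (2 * sigma ** 2))

w = np.linalg.solve(A, o)   # weights hidden -> output (threshold 0)
print(w)
print(A @ w)                # reproduces the desired outputs exactly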

222
Radial Basis Function Networks: Initialization

Normal radial basis function networks:
Select a subset of k training patterns as centers of the radial basis functions.

The linear equation system A · w_u = o_u then has more equations than unknowns (it is over-determined), so A cannot simply be inverted.

Compute the (Moore–Penrose) pseudo-inverse:

A^+ = (A^T A)^{−1} A^T .

The weights can then be computed by

w_u = A^+ · o_u = (A^T A)^{−1} A^T · o_u .
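A minimal sketch of this computation; np.linalg.pinv computes the Moore–Penrose pseudo-inverse (using it instead of the explicit formula is my choice):

import numpy as np

def output_weights(A, o):
    # A: m x k matrix of hidden-neuron outputs for the m training patterns,
    # o: vector of the m desired outputs; returns the least-squares weights.
    return np.linalg.pinv(A) @ o     # equals (A^T A)^-1 A^T o when A has full column rank

An equivalent alternative is np.linalg.lstsq(A, o, rcond=None)[0].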

223
RBFN Initialization: Example

Normal radial basis function network for the biimplication x1 ↔ x2

Select two training patterns:

• l1 = (i^(l1), o^(l1)) = ((0, 0), (1))
• l4 = (i^(l4), o^(l4)) = ((1, 1), (1))

(Figure: network with two hidden neurons, one centered at (0, 0) and one at (1, 1), each with radius 1/2, output weights w1 and w2, and output threshold θ.)

224
RBFN Initialization: Example

Normal radial basis function network for the biimplication x1 ↔ x2

where
a ≈ −0.1810, b ≈ 0.6810,
c ≈ 1.1781, d ≈ −0.6688, e ≈ 0.1594.

Resulting weights:

225
RBFN Initialization: Example

Normal radial basis function network for the biimplication x1 ↔ x2

(Figure: activation of the basis function centered at (0, 0), activation of the basis function centered at (1, 1), and the resulting network output over the input space.)

• Initialization already leads to a perfect solution of the learning task.

• This is an accident: the linear equation system is not over-determined, because it contains linearly dependent equations.

226
Radial Basis Function Networks: Initialization

H o w to choose the radial basis function centers?

• Use all data points as centers for the radial basis functions.
◦ Advantages: Only radius and output weights need to be determined; desired
output values can be achieved exactly (unless there are inconsistencies).
◦ Disadvantage: Often far too many radial basis functions; computing the
weights to the output neuron via a pseudo-inverse can become infeasible.

• Use a random subset of data points as centers for the radial basis functions.
◦ Advantages: Fast; only radius and output weights need to be determined.
◦ Disadvantages: Performance depends heavily on the choice of data points.

• Use the result of clustering as centers for the radial basis functions, e.g.
◦ c-means clustering (on the next slides)
◦ Learning vector quantization (to be discussed later)

227
RBFN Initialization: c-Means Clustering

• Choose a number c of clusters to be found (user input).

• Initialize the cluster centers randomly


(for instance, by randomly selecting c data points).

• D a t a point assignment:
Assign each data point to the cluster center that is closest to it
(that is, closer than any other cluster center).

• C l u s t e r center update:
Compute new cluster centers as the mean vectors of the assigned data points.
(Intuitively: center of gravity if each data point has unit weight.)

• Repeat these two steps (data point assignment and cluster center update)
until the cluster centers do not change anymore.

It can be shown that this scheme must converge,


that is, the update of the cluster centers cannot go on forever.
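A compact Python sketch of the two alternating steps (the function name, the convergence test, and the assumption that every cluster keeps at least one assigned point are my own choices):

import numpy as np

def c_means(data, c, seed=0):
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=c, replace=False)]   # randomly selected data points
    while True:
        # data point assignment: index of the closest center for each point
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        assignment = dist.argmin(axis=1)
        # cluster center update: mean vector of the assigned data points
        new_centers = np.array([data[assignment == j].mean(axis=0) for j in range(c)])
        if np.allclose(new_centers, centers):       # centers no longer change -> converged
            return centers, assignment
        centers = new_centers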

228
c-Means Clustering: Example

(Figure, left: the data set to cluster; c = 3 clusters are chosen, which can be difficult to determine from visual inspection in general. Right: initial position of the cluster centers, obtained by randomly selecting data points; alternative methods include e.g. Latin hypercube sampling.)

229
Delaunay Triangulations and Voronoi Diagrams

• Dots represent cluster centers.

• Left: Delaunay Triangulation

The circle through the corners of a triangle does not contain another point.

• Right: Voronoi Diagram / Tessellation

The perpendicular bisectors (midperpendiculars) of the Delaunay triangulation form the boundaries of the regions of points that are closest to the enclosed cluster center (Voronoi cells).

230
Delaunay Triangulations and Voronoi Diagrams

• Delaunay triangulation: simple triangle (shown in gray on the left)

• Voronoi diagram: midperpendiculars of the triangle's edges
(shown in blue on the left, in gray on the right)

231
c-Means Clustering: Example

232
Radial Basis Function Networks: Training

Training radial basis function networks:
Derivation of the update rules is analogous to that of multi-layer perceptrons.

Weights from the hidden to the output neurons:

Gradient:   ∇_{w_u} e_u^(l) = ∂e_u^(l) / ∂w_u = −2 (o_u^(l) − out_u^(l)) in_u^(l)

Weight update rule:   Δw_u^(l) = −(η3 / 2) ∇_{w_u} e_u^(l) = η3 (o_u^(l) − out_u^(l)) in_u^(l)

Typical learning rate: η3 ≈ 0.001.

(Two more learning rates are needed for the center coordinates and the radii.)

233
Radial Basis Function Networks: Training

Training radial basis function networks:


Center coordinates (weights from the input to the hidden neurons).

Gradient:

Weight update rule:

Typical learning rate: η1 ≈ 0.02.

234
Radial Basis Function Networks: Training

Training radial basis function networks:

Center coordinates (weights from the input to the hidden neurons).

Special case: Euclidean distance

Special case: Gaussian activation function

235
Radial Basis Function Networks: Training

Training radial basis function networks:


Radii of radial basis functions.
Gradient:

Weight update rule:

Typical learning rate: η2 ≈ 0.01.

236
Radial Basis Function Networks: Training

Training radial basis function networks:

Radii of radial basis functions.

Special case: Gaussian activation function

(The distance function is irrelevant for the radius update,


since it only enters the network input function.)
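As an illustration of these three update rules, here is a sketch of one online gradient step for the Euclidean-distance / Gaussian-activation special case with a single output neuron. The gradient expressions follow from the chain rule for this architecture; all names and the interface are my own assumptions:

import numpy as np

def train_step(x, o, centers, sigmas, w, theta, eta1=0.02, eta2=0.01, eta3=0.001):
    # forward pass: Euclidean distance, Gaussian activation, linear output neuron
    net = np.linalg.norm(x - centers, axis=1)      # one distance per hidden neuron
    act = np.exp(-net ** 2 / (2 * sigmas ** 2))    # hidden activations
    out = np.dot(w, act) - theta                   # network output
    err = o - out                                  # desired minus actual output

    # hidden -> output weights and output threshold (learning rate eta3)
    dw = eta3 * err * act
    dtheta = -eta3 * err

    # center coordinates: move centers toward x proportionally to their contribution (eta1)
    dcenters = eta1 * err * (w * act / sigmas ** 2)[:, None] * (x - centers)

    # radii (eta2)
    dsigmas = eta2 * err * w * act * net ** 2 / sigmas ** 3

    return centers + dcenters, sigmas + dsigmas, w + dw, theta + dtheta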

237
Radial Basis Function Networks: Generalization

Generalization of the distance function:

Idea: Use an anisotropic (direction-dependent) distance function.

Example: Mahalanobis distance

d(x, y) = √( (x − y)^T Σ^{−1} (x − y) ),   where Σ is a covariance matrix describing the shape and orientation of the ellipsoid of equal distance.

Example: biimplication

(Figure: RBF network for the biimplication with a single hidden neuron that uses a Mahalanobis distance; the region of high activation is an ellipse in the input space aligned with the diagonal through (0, 0) and (1, 1), so that exactly these two inputs activate the output.)
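A small sketch of a Mahalanobis-based hidden neuron activation; the center (0.5, 0.5) and the covariance matrix below are illustrative choices, not read off the figure:

import numpy as np

def mahalanobis(x, center, cov):
    d = x - center
    return np.sqrt(d @ np.linalg.inv(cov) @ d)

center = np.array([0.5, 0.5])
cov = np.array([[0.9, 0.8],
                [0.8, 0.9]])          # strong correlation -> ellipse along the diagonal

for p in ([0, 0], [1, 1], [1, 0], [0, 1]):
    net = mahalanobis(np.array(p, float), center, cov)
    print(p, np.exp(-net ** 2 / 2))   # Gaussian activation, sigma = 1

The diagonal points (0, 0) and (1, 1) receive a much higher activation than (1, 0) and (0, 1), which is exactly what the biimplication requires from a single hidden neuron.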

238
Application: Recognition of Handwritten Digits

• Comparison of various classifiers:
◦ Nearest Neighbor (1NN)
◦ Learning Vector Quantization (LVQ)
◦ Decision Tree (C4.5)
◦ Radial Basis Function Network (RBF)
◦ Multi-Layer Perceptron (MLP)
◦ Support Vector Machine (SVM)

• Distinction of the number of RBF training phases:
◦ 1 phase: find output connection weights, e.g. with the pseudo-inverse.
◦ 2 phase: find RBF centers, e.g. with some clustering, plus 1 phase.
◦ 3 phase: 2 phase plus error backpropagation training.

• Initialization of radial basis function centers:


◦ Random choice of data points
◦ c-means Clustering
◦ Learning Vector Quantization
◦ Decision Tree (one RBF center per leaf)

240
Application: Recognition of Handwritten Digits

Classification results:

Classifier                              Accuracy
Nearest Neighbor (1NN)                  97.68%
Learning Vector Quantization (LVQ)      96.99%
Decision Tree (C4.5)                    91.12%
2-Phase-RBF (data points)               95.24%
2-Phase-RBF (c-means)                   96.94%
2-Phase-RBF (LVQ)                       95.86%
2-Phase-RBF (C4.5)                      92.72%
3-Phase-RBF (data points)               97.23%
3-Phase-RBF (c-means)                   98.06%
3-Phase-RBF (LVQ)                       98.49%
3-Phase-RBF (C4.5)                      94.83%
Support Vector Machine (SVM)            98.76%
Multi-Layer Perceptron (MLP)            97.59%

• Model sizes: LVQ: 200 vectors (20 per class); C4.5: 505 leaves; c-means: 60 centers (6 per class); SVM: 10 classifiers, ≈ 4200 vectors; MLP: 1 hidden layer with 200 neurons.

• Results are medians of three training/test runs.

• Error backpropagation improves the RBF results.

245
