I2ml3e Chap11
INTRODUCTION TO MACHINE LEARNING
3RD EDITION
ETHEM ALPAYDIN
© The MIT Press, 2014
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 11: MULTILAYER PERCEPTRONS
Neural Networks
Perceptron
$$y = \sum_{j=1}^{d} w_j x_j + w_0 = \mathbf{w}^T\mathbf{x}$$
(Rosenblatt, 1962)
What a Perceptron Does
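[Figure: a perceptron with input x, bias input x_0 = +1, and weights w, w_0, producing output y; panels show its use for regression and for classification]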
$$y = \mathrm{sigmoid}(o) = \frac{1}{1 + \exp\left[-\mathbf{w}^T\mathbf{x}\right]}$$
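A minimal sketch of the perceptron outputs in plain Python, assuming the weights are stored as a list `[w_0, w_1, ..., w_d]` so that the bias is handled by the constant input x_0 = +1. The function names and example weights are hypothetical.

```python
import math

def perceptron(w, x):
    """Linear output y = sum_j w_j x_j + w_0 = w^T x.

    Assumes w = [w_0, w_1, ..., w_d] and x = [x_1, ..., x_d]; the constant
    input x_0 = +1 is prepended so that w_0 acts as the bias.
    """
    x_aug = [1.0] + list(x)
    return sum(wj * xj for wj, xj in zip(w, x_aug))

def sigmoid(o):
    """Logistic function 1 / (1 + exp(-o))."""
    return 1.0 / (1.0 + math.exp(-o))

w = [-1.0, 2.0, 0.5]   # hypothetical weights: w_0 = -1, w_1 = 2, w_2 = 0.5
x = [1.0, 2.0]
o = perceptron(w, x)   # -1 + 2*1 + 0.5*2 = 2.0
y = sigmoid(o)         # about 0.881; for two classes, choose C_1 if y > 0.5
print(o, y)
```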
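K Outputs
Regression:
$$y_i = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} = \mathbf{w}_i^T\mathbf{x}, \qquad \mathbf{y} = \mathbf{W}\mathbf{x}$$
Classification:
$$o_i = \mathbf{w}_i^T\mathbf{x}, \qquad y_i = \frac{\exp o_i}{\sum_k \exp o_k}$$
Choose $C_i$ if $y_i = \max_k y_k$.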
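The K-output classifier can be sketched as follows, assuming each row of `W` is `[w_i0, w_i1, ..., w_id]`. Subtracting max(o) before exponentiating is a common numerical safeguard that leaves the softmax values unchanged; the names and weights are hypothetical.

```python
import math

def linear_outputs(W, x):
    """o_i = w_i^T x for each row w_i = [w_i0, w_i1, ..., w_id] of W (x_0 = +1 prepended)."""
    x_aug = [1.0] + list(x)
    return [sum(wij * xj for wij, xj in zip(w_i, x_aug)) for w_i in W]

def softmax(o):
    """y_i = exp(o_i) / sum_k exp(o_k), shifted by max(o) to avoid overflow."""
    m = max(o)
    e = [math.exp(oi - m) for oi in o]
    s = sum(e)
    return [ei / s for ei in e]

def choose_class(y):
    """Choose C_i if y_i = max_k y_k (returns the 0-based index i)."""
    return max(range(len(y)), key=lambda i: y[i])

W = [[0.0, 1.0, -1.0],   # hypothetical weights for K = 3 classes, d = 2
     [0.5, -1.0, 1.0],
     [0.0, 0.0, 0.0]]
x = [2.0, 1.0]
o = linear_outputs(W, x)   # [1.0, -0.5, 0.0]
y = softmax(o)
print(o, y, choose_class(y))
```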
Training
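$$E^t(\mathbf{w} \mid \mathbf{x}^t, r^t) = \frac{1}{2}\left(r^t - y^t\right)^2 = \frac{1}{2}\left[r^t - \mathbf{w}^T\mathbf{x}^t\right]^2$$
$$\Delta w_j^t = \eta\,(r^t - y^t)\,x_j^t$$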
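The online update Δw_j^t = η(r^t − y^t)x_j^t can be sketched as a stochastic gradient-descent loop that updates after every instance. The learning rate, epoch count, random seed, and the toy data r = 1 + 2x are illustrative assumptions, not values from the text.

```python
import random

def train_lms(X, R, eta=0.1, epochs=200, seed=0):
    """Online training of a linear perceptron (x_0 = +1 prepended):
    after each instance t, w_j <- w_j + eta * (r^t - y^t) * x_j^t."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)                     # visit instances in random order
        for t in order:
            x_aug = [1.0] + list(X[t])
            y = sum(wj * xj for wj, xj in zip(w, x_aug))
            err = R[t] - y                     # r^t - y^t
            w = [wj + eta * err * xj for wj, xj in zip(w, x_aug)]
    return w

# Noise-free toy data from r = 1 + 2x, so the fit should approach w_0 = 1, w_1 = 2.
X = [[0.0], [0.5], [1.0], [1.5], [2.0]]
R = [1.0 + 2.0 * x[0] for x in X]
w = train_lms(X, R)
print(w)
```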
Classification
$$y_i^t = \frac{\exp \mathbf{w}_i^T\mathbf{x}^t}{\sum_k \exp \mathbf{w}_k^T\mathbf{x}^t}$$
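Multilayer Perceptrons
$$y_i = \mathbf{v}_i^T\mathbf{z} = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}$$
$$z_h = \mathrm{sigmoid}(\mathbf{w}_h^T\mathbf{x}) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}$$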
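A forward-pass sketch for one hidden layer, assuming weight rows are stored bias first (`[w_h0, ..., w_hd]` and `[v_i0, ..., v_iH]`) so that x_0 = z_0 = +1. The function name and the small example network are hypothetical.

```python
import math

def sigmoid(a):
    """Logistic function 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def mlp_forward(W, V, x):
    """One-hidden-layer MLP.

    W: H rows [w_h0, w_h1, ..., w_hd] (first layer, bias first)
    V: K rows [v_i0, v_i1, ..., v_iH] (second layer, bias first)
    Returns (z, y) with z_h = sigmoid(w_h^T x) and y_i = v_i^T z, where x_0 = z_0 = +1.
    """
    x_aug = [1.0] + list(x)
    z = [sigmoid(sum(w * xj for w, xj in zip(w_h, x_aug))) for w_h in W]
    z_aug = [1.0] + z
    y = [sum(v * zh for v, zh in zip(v_i, z_aug)) for v_i in V]
    return z, y

# Hypothetical network: d = 2 inputs, H = 2 hidden units, K = 1 output.
W = [[0.0, 1.0, -1.0],
     [0.0, 0.0, 0.0]]
V = [[0.5, 2.0, -2.0]]
z, y = mlp_forward(W, V, [3.0, 3.0])
print(z, y)   # both hidden sums are 0, so z = [0.5, 0.5] and y = [0.5]
```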
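Backpropagation
$$\frac{\partial E}{\partial w_{hj}} = \frac{\partial E}{\partial y_i}\,\frac{\partial y_i}{\partial z_h}\,\frac{\partial z_h}{\partial w_{hj}}$$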
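Regression
$$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = \frac{1}{2}\sum_t \left(r^t - y^t\right)^2, \qquad y^t = \sum_{h=1}^{H} v_h z_h^t + v_0$$
$$\Delta v_h = \eta \sum_t \left(r^t - y^t\right) z_h^t$$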
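[Figure: the forward pass computes z_h = sigmoid(w_h^T x) from the inputs x; the error is propagated backward to the first-layer weights w_hj]
$$\Delta w_{hj} = -\eta\frac{\partial E}{\partial w_{hj}} = -\eta\sum_t \frac{\partial E}{\partial y^t}\frac{\partial y^t}{\partial z_h^t}\frac{\partial z_h^t}{\partial w_{hj}} = -\eta\sum_t -\left(r^t - y^t\right) v_h\, z_h^t\left(1 - z_h^t\right) x_j^t = \eta\sum_t \left(r^t - y^t\right) v_h\, z_h^t\left(1 - z_h^t\right) x_j^t$$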
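The two regression updates can be checked numerically: each Δ should equal −η times the derivative of E. A sketch in plain Python, assuming batch updates summed over the whole sample and weights stored bias first; the helper names (`forward`, `error`, `backprop_updates`) and all numbers are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(W, v, x):
    """z_h = sigmoid(w_h^T x), y = sum_h v_h z_h + v_0.
    W rows: [w_h0, w_h1, ..., w_hd]; v = [v_0, v_1, ..., v_H]; x_0 = +1 is prepended."""
    x_aug = [1.0] + list(x)
    z = [sigmoid(sum(w * xj for w, xj in zip(w_h, x_aug))) for w_h in W]
    y = v[0] + sum(vh * zh for vh, zh in zip(v[1:], z))
    return x_aug, z, y

def error(W, v, X, R):
    """E(W, v | X) = 1/2 sum_t (r^t - y^t)^2."""
    return 0.5 * sum((r - forward(W, v, x)[2]) ** 2 for x, r in zip(X, R))

def backprop_updates(W, v, X, R, eta):
    """Delta v_h  = eta sum_t (r^t - y^t) z_h^t                        (z_0 = +1 for v_0)
    Delta w_hj = eta sum_t (r^t - y^t) v_h z_h^t (1 - z_h^t) x_j^t  (x_0 = +1 for w_h0)"""
    dv = [0.0] * len(v)
    dW = [[0.0] * len(w_h) for w_h in W]
    for x, r in zip(X, R):
        x_aug, z, y = forward(W, v, x)
        err = r - y                                    # r^t - y^t
        for h, zh in enumerate([1.0] + z):
            dv[h] += eta * err * zh
        for h, zh in enumerate(z):
            back = err * v[h + 1] * zh * (1.0 - zh)    # error sent back to hidden unit h
            for j, xj in enumerate(x_aug):
                dW[h][j] += eta * back * xj
    return dW, dv

# Usage: compare one update with -eta times a central finite difference of E.
W = [[0.1, -0.2], [0.3, 0.4]]    # hypothetical weights: H = 2 hidden units, d = 1
v = [0.05, 0.5, -0.7]
X = [[0.5], [-1.0], [2.0]]
R = [1.0, 0.0, 0.5]
dW, dv = backprop_updates(W, v, X, R, eta=1.0)
eps = 1e-6
W_plus = [row[:] for row in W]
W_plus[0][1] += eps
W_minus = [row[:] for row in W]
W_minus[0][1] -= eps
numeric = -(error(W_plus, v, X, R) - error(W_minus, v, X, R)) / (2 * eps)
print(dW[0][1], numeric)         # the two numbers should agree closely
```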
Regression with Multiple Outputs
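[Figure: network with inputs x_j, first-layer weights w_hj, hidden units z_h, second-layer weights v_ih, and outputs y_i]
$$E(\mathbf{W}, \mathbf{V} \mid \mathcal{X}) = \frac{1}{2}\sum_t \sum_i \left(r_i^t - y_i^t\right)^2, \qquad y_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}$$
$$\Delta v_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) z_h^t$$
$$\Delta w_{hj} = \eta \sum_t \left[\sum_i \left(r_i^t - y_i^t\right) v_{ih}\right] z_h^t\left(1 - z_h^t\right) x_j^t$$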
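With several outputs, each hidden unit collects the backward error Σ_i (r_i − y_i) v_ih from all outputs. A sketch of the per-instance (online) updates, assuming summing them over t gives the batch rule; names, sizes, and values are hypothetical, and the result is checked against a finite difference.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(W, V, x):
    """z_h = sigmoid(w_h^T x); y_i = sum_h v_ih z_h + v_i0.
    W rows: [w_h0, ..., w_hd]; V rows: [v_i0, v_i1, ..., v_iH]."""
    x_aug = [1.0] + list(x)
    z_aug = [1.0] + [sigmoid(sum(w * xj for w, xj in zip(w_h, x_aug))) for w_h in W]
    y = [sum(v * zh for v, zh in zip(v_i, z_aug)) for v_i in V]
    return x_aug, z_aug, y

def error_one(W, V, x, r):
    """1/2 sum_i (r_i - y_i)^2 for a single instance."""
    return 0.5 * sum((ri - yi) ** 2 for ri, yi in zip(r, forward(W, V, x)[2]))

def updates_one(W, V, x, r, eta):
    """Delta v_ih = eta (r_i - y_i) z_h
    Delta w_hj = eta [sum_i (r_i - y_i) v_ih] z_h (1 - z_h) x_j"""
    x_aug, z_aug, y = forward(W, V, x)
    err = [ri - yi for ri, yi in zip(r, y)]
    dV = [[eta * e * zh for zh in z_aug] for e in err]
    dW = []
    for h in range(len(W)):
        zh = z_aug[h + 1]
        back = sum(e * V[i][h + 1] for i, e in enumerate(err))   # sum_i (r_i - y_i) v_ih
        dW.append([eta * back * zh * (1.0 - zh) * xj for xj in x_aug])
    return dW, dV

# Hypothetical values: d = 2 inputs, H = 2 hidden units, K = 2 outputs.
W = [[0.1, 0.2, -0.3], [-0.1, 0.4, 0.25]]
V = [[0.0, 0.6, -0.5], [0.2, -0.3, 0.8]]
x, r = [1.0, -0.5], [0.3, 0.9]
dW, dV = updates_one(W, V, x, r, eta=1.0)
eps = 1e-6
W_plus = [row[:] for row in W]
W_plus[1][2] += eps
W_minus = [row[:] for row in W]
W_minus[1][2] -= eps
numeric = -(error_one(W_plus, V, x, r) - error_one(W_minus, V, x, r)) / (2 * eps)
print(dW[1][2], numeric)   # should agree closely
```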
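[Figure: for a scalar input x, each hidden unit's w_h x + w_0, its sigmoid output z_h, and its contribution v_h z_h to the output]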
Two-Class Discrimination
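$$y^t = \mathrm{sigmoid}\left(\sum_{h=1}^{H} v_h z_h^t + v_0\right)$$
$$E(\mathbf{W}, \mathbf{v} \mid \mathcal{X}) = -\sum_t \left[r^t \log y^t + \left(1 - r^t\right)\log\left(1 - y^t\right)\right]$$
$$\Delta v_h = \eta \sum_t \left(r^t - y^t\right) z_h^t$$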
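The two-class Δv_h has the same form as in regression because, for a sigmoid output y^t = sigmoid(o^t) with the cross-entropy error, the derivative of E^t with respect to o^t is y^t − r^t. A quick numerical check of that identity; the helper names and the test values of r and o are hypothetical.

```python
import math

def sigmoid(a):
    """Logistic function 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def cross_entropy(r, y):
    """Two-class error for one instance: -[r log y + (1 - r) log(1 - y)], r in {0, 1}."""
    return -(r * math.log(y) + (1 - r) * math.log(1 - y))

def output_delta(r, o):
    """-dE^t/do^t for y^t = sigmoid(o^t); analytically this equals r^t - y^t."""
    return r - sigmoid(o)

def numeric_delta(r, o, eps=1e-6):
    """Central finite difference of -E^t with respect to o^t."""
    return -(cross_entropy(r, sigmoid(o + eps)) - cross_entropy(r, sigmoid(o - eps))) / (2 * eps)

for r, o in [(1, 0.3), (0, -1.2)]:                      # hypothetical labels and output sums
    print(output_delta(r, o), numeric_delta(r, o))      # each pair should agree closely
```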
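K > 2 Classes
$$o_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}, \qquad y_i^t = \frac{\exp o_i^t}{\sum_k \exp o_k^t} \equiv P(C_i \mid \mathbf{x}^t)$$
$$\Delta v_{ih} = \eta \sum_t \left(r_i^t - y_i^t\right) z_h^t$$
$$\Delta w_{hj} = \eta \sum_t \left[\sum_i \left(r_i^t - y_i^t\right) v_{ih}\right] z_h^t\left(1 - z_h^t\right) x_j^t$$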
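The factor (r_i^t − y_i^t) in the K-class updates is the negative derivative of the cross-entropy error E^t = −Σ_i r_i^t log y_i^t with respect to o_i^t when y is the softmax. That error is an assumption here (it is the one these updates descend); the sketch checks the identity numerically with hypothetical values.

```python
import math

def softmax(o):
    """y_i = exp(o_i) / sum_k exp(o_k), computed stably by subtracting max(o)."""
    m = max(o)
    e = [math.exp(oi - m) for oi in o]
    s = sum(e)
    return [ei / s for ei in e]

def cross_entropy(r, y):
    """Assumed error for one instance: E^t = -sum_i r_i log y_i, with r one-hot."""
    return -sum(ri * math.log(yi) for ri, yi in zip(r, y) if ri > 0)

def output_deltas(r, o):
    """-dE^t/do_i = r_i - y_i, the factor shared by Delta v_ih and Delta w_hj."""
    return [ri - yi for ri, yi in zip(r, softmax(o))]

def numeric_deltas(r, o, eps=1e-6):
    """Central finite differences of -E^t with respect to each o_i."""
    out = []
    for i in range(len(o)):
        o_plus, o_minus = list(o), list(o)
        o_plus[i] += eps
        o_minus[i] -= eps
        out.append(-(cross_entropy(r, softmax(o_plus)) - cross_entropy(r, softmax(o_minus))) / (2 * eps))
    return out

o = [0.5, -1.0, 2.0]   # hypothetical output sums o_i^t
r = [0, 1, 0]          # the instance belongs to C_2
print(output_deltas(r, o), numeric_deltas(r, o))   # the two lists should agree closely
```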
Multiple Hidden Layers
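$$z_{1h} = \mathrm{sigmoid}(\mathbf{w}_{1h}^T\mathbf{x}) = \mathrm{sigmoid}\left(\sum_{j=1}^{d} w_{1hj} x_j + w_{1h0}\right), \quad h = 1, \ldots, H_1$$
$$z_{2l} = \mathrm{sigmoid}(\mathbf{w}_{2l}^T\mathbf{z}_1) = \mathrm{sigmoid}\left(\sum_{h=1}^{H_1} w_{2lh} z_{1h} + w_{2l0}\right), \quad l = 1, \ldots, H_2$$
$$y = \mathbf{v}^T\mathbf{z}_2 = \sum_{l=1}^{H_2} v_l z_{2l} + v_0$$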
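A forward-pass sketch for two hidden layers, assuming both hidden layers use the sigmoid, the single output is linear as in the regression case, and weight rows are stored bias first. Sizes, weights, and function names are hypothetical; all-zero hidden weights are used so the result is easy to verify by hand.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def layer(Wl, inputs):
    """Sigmoid layer: each row [w_0, w_1, ...] yields sigmoid(w_0 + sum_j w_j * input_j)."""
    aug = [1.0] + list(inputs)
    return [sigmoid(sum(w * a for w, a in zip(row, aug))) for row in Wl]

def mlp2_forward(W1, W2, v, x):
    """z1 = layer(W1, x), z2 = layer(W2, z1), y = v_0 + sum_l v_l z2_l."""
    z1 = layer(W1, x)
    z2 = layer(W2, z1)
    return v[0] + sum(vl * zl for vl, zl in zip(v[1:], z2))

# Hypothetical sizes: d = 2, H1 = 3, H2 = 2; zero weights make every hidden output 0.5.
W1 = [[0.0] * 3 for _ in range(3)]
W2 = [[0.0] * 4 for _ in range(2)]
v = [0.1, 1.0, 2.0]
print(mlp2_forward(W1, W2, v, [0.7, -0.2]))   # 0.1 + 1.0*0.5 + 2.0*0.5 = 1.6
```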
Improving Convergence
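Momentum
$$\Delta w_i^t = -\eta\frac{\partial E^t}{\partial w_i} + \alpha\,\Delta w_i^{t-1}$$
Adaptive learning rate
$$\Delta\eta = \begin{cases} +a & \text{if } E^{t+\tau} < E^t \\ -b\eta & \text{otherwise} \end{cases}$$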
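A sketch of the momentum update and the adaptive learning-rate rule. The ill-conditioned quadratic used for the demo, the values of η, α, a, b, and the function names are hypothetical; how far apart the two compared errors are (τ in the rule) is left to the caller.

```python
def momentum_step(w, grad, prev_dw, eta, alpha):
    """Delta w_i^t = -eta dE^t/dw_i + alpha Delta w_i^{t-1}; returns (new weights, Delta w^t)."""
    dw = [-eta * g + alpha * p for g, p in zip(grad, prev_dw)]
    return [wi + d for wi, d in zip(w, dw)], dw

def adapt_eta(eta, E_new, E_old, a, b):
    """Delta eta = +a if the error decreased, -b * eta otherwise."""
    return eta + a if E_new < E_old else eta - b * eta

def grad_E(w):
    """Gradient of the hypothetical E(w) = 1/2 (w_1^2 + 10 w_2^2), minimized at 0."""
    return [w[0], 10.0 * w[1]]

w, dw = [1.0, 1.0], [0.0, 0.0]
for _ in range(300):
    w, dw = momentum_step(w, grad_E(w), dw, eta=0.05, alpha=0.9)
print(w)                                                     # both coordinates close to 0
print(adapt_eta(0.1, E_new=0.4, E_old=0.5, a=0.01, b=0.5))   # error fell: 0.1 + 0.01
print(adapt_eta(0.1, E_new=0.6, E_old=0.5, a=0.01, b=0.5))   # error rose: 0.1 - 0.05
```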
Overfitting/Overtraining
Destructive: weight decay
$$\Delta w_i = -\eta\frac{\partial E}{\partial w_i} - \lambda w_i, \qquad E' = E + \frac{\lambda}{2}\sum_i w_i^2$$
Constructive: growing networks
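Differentiating the penalized error gives ∂E'/∂w_i = ∂E/∂w_i + λw_i, so a plain gradient step on E' is Δw_i = −η∂E/∂w_i − ηλw_i; the weight-decay update matches it once the factor η is absorbed into λ. A sketch that checks this, with hypothetical weights, gradient, and function names:

```python
def weight_decay_step(w, grad_E, eta, lam):
    """Delta w_i = -eta dE/dw_i - lam w_i: the usual step plus a pull toward zero."""
    return [wi - eta * g - lam * wi for wi, g in zip(w, grad_E)]

def penalized_gradient_step(w, grad_E, eta, lam_penalty):
    """Plain gradient descent on E' = E + (lam_penalty / 2) sum_i w_i^2."""
    return [wi - eta * (g + lam_penalty * wi) for wi, g in zip(w, grad_E)]

w = [2.0, -4.0]
g = [0.3, -0.1]              # hypothetical gradient of E at w
eta, lam_penalty = 0.1, 0.5
a = weight_decay_step(w, g, eta, lam=eta * lam_penalty)
b = penalized_gradient_step(w, g, eta, lam_penalty)
print(a, b)                  # identical up to rounding
print(weight_decay_step(w, [0.0, 0.0], eta, lam=0.25))   # [1.5, -3.0]: pure shrinkage
```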
Autoencoder networks
Learning Time
Applications:
- Sequence recognition (e.g., speech recognition)
- Sequence reproduction (e.g., time-series prediction)
- Sequence association
Network architectures:
- Time-delay networks (Waibel et al., 1989)
- Recurrent networks (Rumelhart et al., 1986)
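A time-delay network handles a sequence by feeding an ordinary feedforward network a sliding window of the most recent inputs. A sketch of building such windows, assuming one-step-ahead time-series prediction as the task; `delay_windows` and the window length d are hypothetical.

```python
def delay_windows(seq, d):
    """Turn a sequence into inputs [x^{t-d+1}, ..., x^t] paired with targets x^{t+1}."""
    X, R = [], []
    for t in range(d - 1, len(seq) - 1):
        X.append(seq[t - d + 1 : t + 1])   # the last d values up to time t
        R.append(seq[t + 1])               # the next value to predict
    return X, R

X, R = delay_windows([1, 2, 3, 4, 5], d=3)
print(X, R)   # [[1, 2, 3], [2, 3, 4]] [4, 5]
```

Each window can then be given to a static MLP; the network itself needs no memory because the delay line supplies the recent history.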
Time-Delay Neural Networks
Recurrent Networks
Unfolding in Time
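Unfolding a recurrent network over a sequence of length T gives a T-layer feedforward network whose layers share the same weights, which is what allows backpropagation to be applied (backpropagation through time). A minimal sketch with one scalar hidden unit, assuming a sigmoid hidden unit, a linear output, and z^0 = 0; all weights and function names are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def rnn_outputs(xs, w_in, w_rec, b, v, v0):
    """z^t = sigmoid(w_in x^t + w_rec z^{t-1} + b) with z^0 = 0, and y^t = v z^t + v0.
    The same weights are reused at every time step."""
    z, ys = 0.0, []
    for x in xs:
        z = sigmoid(w_in * x + w_rec * z + b)
        ys.append(v * z + v0)
    return ys

def unfolded_three_steps(x1, x2, x3, w_in, w_rec, b, v, v0):
    """The same computation written as a 3-layer feedforward chain with shared weights."""
    z1 = sigmoid(w_in * x1 + w_rec * 0.0 + b)
    z2 = sigmoid(w_in * x2 + w_rec * z1 + b)
    z3 = sigmoid(w_in * x3 + w_rec * z2 + b)
    return v * z3 + v0

params = dict(w_in=0.8, w_rec=-0.5, b=0.1, v=1.5, v0=-0.2)   # hypothetical weights
ys = rnn_outputs([1.0, 0.0, -1.0], **params)
print(ys[-1], unfolded_three_steps(1.0, 0.0, -1.0, **params))   # same value
```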
Deep Networks