
Lecture Slides for

INTRODUCTION
TO
MACHINE
LEARNING
3RD EDITION
ETHEM ALPAYDIN
© The MIT Press, 2014

alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
CHAPTER 11:

MULTILAYER PERCEPTRONS
Neural Networks
3

- Networks of processing units (neurons) with connections (synapses) between them
- Large number of neurons: 10^10
- Large connectivity: 10^5
- Parallel processing
- Distributed computation/memory
- Robust to noise, failures
Understanding the Brain
4

- Levels of analysis (Marr, 1982):
  1. Computational theory
  2. Representation and algorithm
  3. Hardware implementation
- Reverse engineering: from hardware to theory
- Parallel processing: SIMD vs MIMD
- Neural net: SIMD with modifiable local memory
- Learning: update by training/experience
Perceptron
5

y = \sum_{j=1}^{d} w_j x_j + w_0 = w^T x

w = [w_0, w_1, \ldots, w_d]^T
x = [1, x_1, \ldots, x_d]^T

(Rosenblatt, 1962)
What a Perceptron Does
6

- Regression: y = wx + w_0
- Classification: y = 1(wx + w_0 > 0)

[Figure: perceptrons for regression (linear output) and classification (thresholded output), with input x, bias unit x_0 = +1, and weights w, w_0]

y = \text{sigmoid}(o) = \frac{1}{1 + \exp(-w^T x)}
K Outputs
7

Regression:
  y_i = \sum_{j=1}^{d} w_{ij} x_j + w_{i0} = w_i^T x
  y = W x

Classification:
  o_i = w_i^T x
  y_i = \frac{\exp o_i}{\sum_k \exp o_k}
  choose C_i if y_i = \max_k y_k
Training
8

- Online (instances seen one by one) vs batch (whole sample) learning:
  - No need to store the whole sample
  - Problem may change in time
  - Wear and degradation in system components
- Stochastic gradient descent: update after a single pattern
- Generic update rule (LMS rule), see the sketch below:
  \Delta w_{ij}^t = \eta (r_i^t - y_i^t) x_j^t
  Update = LearningFactor \cdot (DesiredOutput - ActualOutput) \cdot Input
Training a Perceptron: Regression
9

- Regression (linear output):
  E^t(w \mid x^t, r^t) = \frac{1}{2}(r^t - y^t)^2 = \frac{1}{2}\left(r^t - w^T x^t\right)^2
  \Delta w_j^t = \eta (r^t - y^t) x_j^t
Classification
10

- Single sigmoid output:
  y^t = \text{sigmoid}(w^T x^t)
  E^t(w \mid x^t, r^t) = -r^t \log y^t - (1 - r^t) \log (1 - y^t)
  \Delta w_j^t = \eta (r^t - y^t) x_j^t

- K > 2 softmax outputs (see the sketch below):
  y_i^t = \frac{\exp w_i^T x^t}{\sum_k \exp w_k^T x^t}
  E^t(\{w_i\}_i \mid x^t, r^t) = -\sum_i r_i^t \log y_i^t
  \Delta w_{ij}^t = \eta (r_i^t - y_i^t) x_j^t
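For concreteness, a minimal sketch of the K > 2 softmax case with one online cross-entropy update per pattern; variable names, hyperparameters, and data shapes are hypothetical, not from the slides.

```python
import numpy as np

def softmax(o):
    o = o - o.max()                    # subtract max for numerical stability
    e = np.exp(o)
    return e / e.sum()

def train_softmax_perceptron(X, R, K, eta=0.1, epochs=100):
    """Online training of K linear units with softmax outputs.

    X : (N, d) inputs, R : (N, K) one-hot desired outputs.
    """
    N, d = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])          # prepend bias x_0 = +1
    W = np.random.uniform(-0.01, 0.01, (K, d + 1))
    for _ in range(epochs):
        for t in np.random.permutation(N):
            y = softmax(W @ Xb[t])                # y_i = exp(o_i) / Σ_k exp(o_k)
            W += eta * np.outer(R[t] - y, Xb[t])  # Δw_ij = η (r_i − y_i) x_j
    return W
```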


Learning Boolean AND
11
XOR
12

- No w_0, w_1, w_2 satisfy:
  w_0 \le 0
  w_2 + w_0 > 0
  w_1 + w_0 > 0
  w_1 + w_2 + w_0 \le 0

(Minsky and Papert, 1969)
Multilayer Perceptrons
13

y_i = v_i^T z = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}

z_h = \text{sigmoid}(w_h^T x) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}

(Rumelhart et al., 1986)


14
x1 XOR x2 = (x1 AND ~x2) OR (~x1 AND x2)
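To make this decomposition concrete, here is a small sketch of the forward pass of a two-hidden-unit MLP that computes XOR; the specific weight values are one hand-picked choice, an assumption for illustration rather than something taken from the slides.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hand-picked weights (illustrative choice): each row is [bias, w_1, w_2]
W = np.array([[-5.0,  10.0, -10.0],   # z_1 ≈ x1 AND NOT x2
              [-5.0, -10.0,  10.0]])  # z_2 ≈ NOT x1 AND x2
v = np.array([-5.0, 10.0, 10.0])      # y ≈ z_1 OR z_2

def mlp_xor(x1, x2):
    x = np.array([1.0, x1, x2])                   # input with bias unit x_0 = +1
    z = sigmoid(W @ x)                            # hidden layer
    y = sigmoid(v @ np.concatenate(([1.0], z)))   # output layer
    return y

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, int(mlp_xor(x1, x2) > 0.5))     # prints the XOR truth table
```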
Backpropagation
15

y_i = v_i^T z = \sum_{h=1}^{H} v_{ih} z_h + v_{i0}

z_h = \text{sigmoid}(w_h^T x) = \frac{1}{1 + \exp\left[-\left(\sum_{j=1}^{d} w_{hj} x_j + w_{h0}\right)\right]}

Chain rule:
\frac{\partial E}{\partial w_{hj}} = \frac{\partial E}{\partial y_i} \frac{\partial y_i}{\partial z_h} \frac{\partial z_h}{\partial w_{hj}}

Regression
16

E(W, v \mid X) = \frac{1}{2} \sum_t (r^t - y^t)^2

Forward:
z_h = \text{sigmoid}(w_h^T x)
y^t = \sum_{h=1}^{H} v_h z_h^t + v_0

Backward:
\Delta v_h = \eta \sum_t (r^t - y^t) z_h^t
\Delta w_{hj} = -\eta \frac{\partial E}{\partial w_{hj}}
             = -\eta \sum_t \frac{\partial E}{\partial y^t} \frac{\partial y^t}{\partial z_h^t} \frac{\partial z_h^t}{\partial w_{hj}}
             = \eta \sum_t (r^t - y^t) v_h z_h^t (1 - z_h^t) x_j^t
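Putting the forward and backward passes together, here is a minimal NumPy sketch of stochastic backpropagation for a single-output regression MLP with one sigmoid hidden layer; the function name, hidden-layer size, learning rate, epoch count, and toy target function are assumptions for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_mlp_regression(X, r, H=10, eta=0.05, epochs=500):
    """Backprop for y = sum_h v_h z_h + v_0 with z_h = sigmoid(w_h^T x)."""
    N, d = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])             # prepend bias x_0 = +1
    W = np.random.uniform(-0.01, 0.01, (H, d + 1))   # hidden weights w_hj
    v = np.random.uniform(-0.01, 0.01, H + 1)        # output weights v_h
    for _ in range(epochs):
        for t in np.random.permutation(N):
            z = sigmoid(W @ Xb[t])                   # forward: hidden layer
            zb = np.concatenate(([1.0], z))          # prepend z_0 = +1
            y = v @ zb                               # forward: linear output
            delta = r[t] - y                         # (r^t − y^t)
            dv = eta * delta * zb                    # Δv_h = η (r^t − y^t) z_h^t
            # Δw_hj = η (r^t − y^t) v_h z_h (1 − z_h) x_j
            dW = eta * np.outer(delta * v[1:] * z * (1 - z), Xb[t])
            v += dv
            W += dW
    return W, v

# Hypothetical usage: fit a noisy sine
X = np.random.uniform(-3, 3, (200, 1))
r = np.sin(X[:, 0]) + 0.1 * np.random.randn(200)
W, v = train_mlp_regression(X, r)
```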
Regression with Multiple Outputs
17

E(W, V \mid X) = \frac{1}{2} \sum_t \sum_i (r_i^t - y_i^t)^2

y_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}

\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t) z_h^t

\Delta w_{hj} = \eta \sum_t \left[ \sum_i (r_i^t - y_i^t) v_{ih} \right] z_h^t (1 - z_h^t) x_j^t

[Figure: network with inputs x_j, hidden units z_h, outputs y_i, and weights w_{hj}, v_{ih}]
18-20
[Figures: hidden-unit activations w_h x + w_0, sigmoid outputs z_h, and weighted contributions v_h z_h on a regression example]

Two-Class Discrimination
21

- One sigmoid output y^t for P(C_1 \mid x^t), with P(C_2 \mid x^t) \equiv 1 - y^t

y^t = \text{sigmoid}\left( \sum_{h=1}^{H} v_h z_h^t + v_0 \right)

E(W, v \mid X) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log (1 - y^t) \right]

\Delta v_h = \eta \sum_t (r^t - y^t) z_h^t

\Delta w_{hj} = \eta \sum_t (r^t - y^t) v_h z_h^t (1 - z_h^t) x_j^t
K>2 Classes
22

o_i^t = \sum_{h=1}^{H} v_{ih} z_h^t + v_{i0}

y_i^t = \frac{\exp o_i^t}{\sum_k \exp o_k^t} \approx P(C_i \mid x^t)

E(W, v \mid X) = -\sum_t \sum_i r_i^t \log y_i^t

\Delta v_{ih} = \eta \sum_t (r_i^t - y_i^t) z_h^t

\Delta w_{hj} = \eta \sum_t \left[ \sum_i (r_i^t - y_i^t) v_{ih} \right] z_h^t (1 - z_h^t) x_j^t
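The K > 2 case differs from the regression loop sketched earlier only in the softmax output and the inner sum over outputs in the hidden-layer update. A brief sketch, with function names, layer sizes, and hyperparameters assumed for illustration:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(o):
    e = np.exp(o - o.max())
    return e / e.sum()

def train_mlp_classifier(X, R, K, H=10, eta=0.05, epochs=300):
    """Backprop for a K-class MLP: softmax outputs, cross-entropy error.

    X : (N, d) inputs, R : (N, K) one-hot labels.
    """
    N, d = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])
    W = np.random.uniform(-0.01, 0.01, (H, d + 1))   # hidden weights w_hj
    V = np.random.uniform(-0.01, 0.01, (K, H + 1))   # output weights v_ih
    for _ in range(epochs):
        for t in np.random.permutation(N):
            z = sigmoid(W @ Xb[t])
            zb = np.concatenate(([1.0], z))
            y = softmax(V @ zb)                      # y_i = exp(o_i) / Σ_k exp(o_k)
            delta = R[t] - y                         # (r_i − y_i)
            dV = eta * np.outer(delta, zb)           # Δv_ih = η (r_i − y_i) z_h
            # Δw_hj = η [Σ_i (r_i − y_i) v_ih] z_h (1 − z_h) x_j
            dW = eta * np.outer((delta @ V[:, 1:]) * z * (1 - z), Xb[t])
            V += dV
            W += dW
    return W, V
```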
Multiple Hidden Layers
23

- MLP with one hidden layer is a universal approximator (Hornik et al., 1989), but using multiple layers may lead to simpler networks

z_{1h} = \text{sigmoid}(w_{1h}^T x) = \text{sigmoid}\left( \sum_{j=1}^{d} w_{1hj} x_j + w_{1h0} \right), \quad h = 1, \ldots, H_1

z_{2l} = \text{sigmoid}(w_{2l}^T z_1) = \text{sigmoid}\left( \sum_{h=1}^{H_1} w_{2lh} z_{1h} + w_{2l0} \right), \quad l = 1, \ldots, H_2

y = v^T z_2 = \sum_{l=1}^{H_2} v_l z_{2l} + v_0
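A minimal sketch of this two-hidden-layer forward pass in NumPy; the layer sizes and random weights are placeholders chosen for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

d, H1, H2 = 4, 8, 3                                 # illustrative layer sizes
W1 = np.random.uniform(-0.1, 0.1, (H1, d + 1))      # first hidden layer, w_1h
W2 = np.random.uniform(-0.1, 0.1, (H2, H1 + 1))     # second hidden layer, w_2l
v  = np.random.uniform(-0.1, 0.1, H2 + 1)           # output weights

def forward(x):
    z1 = sigmoid(W1 @ np.concatenate(([1.0], x)))   # z_1h = sigmoid(w_1h^T x)
    z2 = sigmoid(W2 @ np.concatenate(([1.0], z1)))  # z_2l = sigmoid(w_2l^T z_1)
    return v @ np.concatenate(([1.0], z2))          # y = v^T z_2

y = forward(np.random.randn(d))
```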
Improving Convergence
24

- Momentum (see the sketch below):
  \Delta w_i^t = -\eta \frac{\partial E^t}{\partial w_i} + \alpha \Delta w_i^{t-1}

- Adaptive learning rate:
  \Delta \eta = \begin{cases} +a & \text{if } E^{t+\tau} < E^t \\ -b\eta & \text{otherwise} \end{cases}
Overfitting/Overtraining
25

- Number of weights: H(d + 1) + (H + 1)K

26
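As a quick worked example of the weight-count formula (the values d = 10, H = 5, K = 3 are made up for illustration): a single-hidden-layer MLP with 10 inputs, 5 hidden units, and 3 outputs has H(d + 1) + (H + 1)K = 5·11 + 6·3 = 55 + 18 = 73 free parameters.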
Structured MLP
27

 Convolutional networks (Deep learning)

(Le Cun et al, 1989)


Weight Sharing
28
Hints
29

- Invariance to translation, rotation, size
- Virtual examples (Abu-Mostafa, 1995)
- Augmented error: E' = E + \lambda_h E_h
  If x' and x are the "same": E_h = [g(x \mid \theta) - g(x' \mid \theta)]^2
  Approximation hint:
  E_h = \begin{cases} 0 & \text{if } g(x \mid \theta) \in [a_x, b_x] \\ (g(x \mid \theta) - a_x)^2 & \text{if } g(x \mid \theta) < a_x \\ (g(x \mid \theta) - b_x)^2 & \text{if } g(x \mid \theta) > b_x \end{cases}
Tuning the Network Size
30

- Destructive: weight decay
  \Delta w_i = -\eta \frac{\partial E}{\partial w_i} - \lambda w_i
  E' = E + \frac{\lambda}{2} \sum_i w_i^2

- Constructive: growing networks
  (Ash, 1989; Fahlman and Lebiere, 1989)


Bayesian Learning
31

- Consider weights w_i as random variables, with prior p(w_i)

p(w \mid X) = \frac{p(X \mid w) \, p(w)}{p(X)}
\hat{w}_{MAP} = \arg\max_w \log p(w \mid X)
\log p(w \mid X) = \log p(X \mid w) + \log p(w) + C
p(w) = \prod_i p(w_i) \quad \text{where} \quad p(w_i) = c \cdot \exp\left[ -\frac{w_i^2}{2(1/2\lambda)} \right]
E' = E + \lambda \|w\|^2

- Weight decay, ridge regression, regularization:
  cost = data-misfit + \lambda \cdot complexity
More about Bayesian methods in chapter 14
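To show how the Gaussian prior above turns into weight decay in the gradient update, a minimal sketch; the step function name, learning rate, and λ value are assumptions for illustration.

```python
import numpy as np

def weight_decay_step(w, grad_E, eta=0.05, lam=0.01):
    """Gradient step on E' = E + λ ||w||^2 (MAP estimate with a Gaussian prior):
    Δw_i = −η ∂E/∂w_i − 2ηλ w_i, i.e. the weights decay toward zero."""
    return w - eta * grad_E - 2 * eta * lam * w

# Hypothetical usage: grad_E is the ordinary backprop gradient of E w.r.t. w
# w = weight_decay_step(w, grad_E)
```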
Dimensionality Reduction
32

Autoencoder networks
33
Learning Time
34

- Applications:
  - Sequence recognition: speech recognition
  - Sequence reproduction: time-series prediction
  - Sequence association
- Network architectures:
  - Time-delay networks (Waibel et al., 1989)
  - Recurrent networks (Rumelhart et al., 1986)
Time-Delay Neural Networks
35
Recurrent Networks
36
Unfolding in Time
37
Deep Networks
38

- Layers of feature extraction units
- Can have local receptive fields as in convolutional networks, or can be fully connected
- Can be trained layer by layer using an autoencoder in an unsupervised manner
- No need to craft the right features, the right basis functions, or the right dimensionality reduction method; the network learns multiple layers of abstraction by itself, given a lot of data and a lot of computation
- Applications in vision, language processing, ...
