The Perceptron
First of all, the coolest algorithm name! (Well, maybe "neocognitron," also the name of a real ML algorithm, is cooler.) It is based on the 1943 model of neurons made by McCulloch and Pitts and by Hebb. It was developed by Rosenblatt in 1962. At the time, it was not interpreted as attempting to optimize any particular criterion; it was presented directly as an algorithm. There has, since, been a huge amount of study and analysis of its convergence properties and other aspects of its behavior.
1 Algorithm
Recall that we have a training dataset Dn with x ∈ Rd and y ∈ {−1, +1}. The Perceptron algorithm trains a binary classifier h(x; θ, θ0) using the following algorithm to find θ and θ0 using τ iterative steps. (We use the Greek letter τ here instead of T so we don't confuse it with transpose!)

PERCEPTRON(τ, Dn)
1   θ = [0 0 ··· 0]T
2   θ0 = 0
3   for t = 1 to τ
4       for i = 1 to n
5           if y(i) (θT x(i) + θ0) ≤ 0
6               θ = θ + y(i) x(i)
7               θ0 = θ0 + y(i)
8   return θ, θ0
Intuitively, on each step, if the current hypothesis θ, θ0 classifies example x(i) correctly, then no change is made. If it classifies x(i) incorrectly, then it moves θ, θ0 so that it is "closer" to classifying x(i), y(i) correctly. (Let's check dimensions: remember that θ is d × 1, x(i) is d × 1, and y(i) is a scalar. Does everything match?)
Note that if the algorithm ever goes through one iteration of the loop on line 4 without
making an update, it will never make any further updates (verify that you believe this!)
and so it should just terminate at that point.
Study Question: What is true about En if that happens?
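Although the notes present the Perceptron only as pseudocode, a minimal NumPy sketch may help make the loop concrete. It assumes Dn is given as an n × d array X and a length-n array y of ±1 labels; the Python names here are ours, not part of the notes. It also includes the early-termination check just described.

import numpy as np

def perceptron(X, y, tau):
    # Sketch of PERCEPTRON(tau, Dn). X: (n, d) array of examples; y: (n,) array of +/-1 labels.
    n, d = X.shape
    theta, theta_0 = np.zeros(d), 0.0
    for t in range(tau):
        made_update = False
        for i in range(n):
            if y[i] * (theta @ X[i] + theta_0) <= 0:   # mistake (or exactly on the boundary)
                theta = theta + y[i] * X[i]            # pseudocode line 6
                theta_0 = theta_0 + y[i]               # pseudocode line 7
                made_update = True
        if not made_update:   # a full pass with no update: no further updates can ever happen
            return theta, theta_0
    return theta, theta_0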
Example: Let h be the linear classifier defined by θ(0) = [1 −1]T, θ0(0) = 1. The diagram below shows several points classified by h. However, in this case, h (represented by the bold line) misclassifies the point x(1) = [1 3]T, which has label y(1) = 1. Indeed,

y(1) (θ(0)T x(1) + θ0(0)) = [1 −1] [1 3]T + 1 = −1 < 0 .

By running an iteration of the Perceptron algorithm, we update

θ(1) = θ(0) + y(1) x(1) = [2 2]T
θ0(1) = θ0(0) + y(1) = 2
The new classifier (represented by the dashed line) now correctly classifies that
point, but now makes a mistake on the negatively labeled point.
[Diagram: the original separator θ(0)T x + θ0(0) = 0 (the bold line), the updated separator θ(1)T x + θ0(1) = 0 (the dashed line), the misclassified point x(1), and the normal vectors θ(0) and θ(1).]
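For a quick numeric check (not part of the notes), the single update in this example can be reproduced in a few lines of NumPy; the variable names are ours.

import numpy as np

theta = np.array([1.0, -1.0])      # theta^(0)
theta_0 = 1.0                      # theta_0^(0)
x1, y1 = np.array([1.0, 3.0]), 1   # x^(1), y^(1)

print(y1 * (theta @ x1 + theta_0))   # -1.0: nonpositive, so x^(1) is misclassified
theta = theta + y1 * x1              # theta^(1) = [2, 2]
theta_0 = theta_0 + y1               # theta_0^(1) = 2
print(y1 * (theta @ x1 + theta_0))   # 10.0: x^(1) is now classified correctly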
A really important fact about the perceptron algorithm is that, if there is a linear classi-
fier with 0 training error, then this algorithm will (eventually) find it! We’ll look at a proof
of this in detail, next.
2 Offset
Sometimes, it can be easier to implement or analyze classifiers of the form

h(x; θ) = +1 if θT x > 0
          −1 otherwise.
Without an explicit offset term (θ0 ), this separator must pass through the origin, which may
appear to be limiting. However, we can convert any problem involving a linear separator
with offset into one with no offset (but of higher dimension)!
Consider the d-dimensional linear separator defined by θ = [θ1 θ2 ··· θd]T and offset θ0.

• to each data point x ∈ Rd, append a coordinate with value +1, forming
  xnew = [x1 ··· xd 1]T ;
• define
  θnew = [θ1 ··· θd θ0]T .

Then,

θnew · xnew = θ1 x1 + · · · + θd xd + θ0 · 1
            = θT x + θ0

Thus, θnew is an equivalent ((d + 1)-dimensional) separator to our original, but with no offset.
Consider the 1D dataset of four points at x = 1, 2, 3, 4, with the points at 1 and 2 labeled +1 and the points at 3 and 4 labeled −1. It is linearly separable in d = 1 with θ = [−1] and θ0 = 2.5. But it is not linearly separable through the origin! Now, let

Xnew = [ 1 2 3 4 ]
       [ 1 1 1 1 ]

where each column is one of the original points with a coordinate of value 1 appended. This new dataset is separable through the origin, with θnew = [−1, 2.5]T.
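A small NumPy sketch of this conversion, under the assumption that the data are stored as an array with one row per example (the variable names are ours):

import numpy as np

# The 1D example: points 1..4, the first two labeled +1, the last two labeled -1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1, 1, -1, -1])

# Append a constant coordinate of 1 to every point (the x_new construction).
X_new = np.hstack([X, np.ones((X.shape[0], 1))])

theta_new = np.array([-1.0, 2.5])   # [theta; theta_0] from the text
print(np.sign(X_new @ theta_new))   # [ 1.  1. -1. -1.] -- agrees with y, with no offset term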
We can make a simplified version of the perceptron algorithm if we restrict ourselves to separators through the origin. (We list it here because this is the version of the algorithm we'll study in more detail.)

PERCEPTRON-THROUGH-ORIGIN(τ, Dn)
1   θ = [0 0 ··· 0]T
2   for t = 1 to τ
3       for i = 1 to n
4           if y(i) θT x(i) ≤ 0
5               θ = θ + y(i) x(i)
6   return θ
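Here is a corresponding NumPy sketch of the through-origin version, again assuming X is an n × d array and y a ±1 label vector (names ours). Running it on the converted 1D dataset from above recovers a separator through the origin.

import numpy as np

def perceptron_through_origin(X, y, tau):
    # Sketch of PERCEPTRON-THROUGH-ORIGIN(tau, Dn); X is (n, d), y holds +/-1 labels.
    theta = np.zeros(X.shape[1])
    for t in range(tau):
        mistakes_this_pass = 0
        for i in range(X.shape[0]):
            if y[i] * (theta @ X[i]) <= 0:    # pseudocode line 4
                theta = theta + y[i] * X[i]   # pseudocode line 5
                mistakes_this_pass += 1
        if mistakes_this_pass == 0:           # theta now separates the training data
            break
    return theta

X_new = np.array([[1., 1.], [2., 1.], [3., 1.], [4., 1.]])
y = np.array([1, 1, -1, -1])
theta = perceptron_through_origin(X_new, y, tau=1000)
print(np.sign(X_new @ theta))   # matches y: [ 1.  1. -1. -1.]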
3 Theory of the Perceptron

We say that a training set Dn is linearly separable if there exist θ, θ0 such that, for all i = 1, . . . , n,

y(i) (θT x(i) + θ0) > 0 .

Another way to say this is that all predictions on the training set are correct:

h(x(i); θ, θ0) = y(i) .

And, another way to say this is that the training error is zero:

En(h) = 0 .

Next, define the margin of a labeled point (x, y) with respect to the hyperplane θ, θ0 to be

y · (θT x + θ0) / ‖θ‖ .

This quantity will be positive if and only if the point x is classified as y by the linear classifier represented by this hyperplane.
Study Question: What sign does the margin have if the point is incorrectly classi-
fied? Be sure you can explain why.
Now, the margin of a dataset Dn with respect to the hyperplane θ, θ0 is the minimum margin of any point with respect to θ, θ0:

min_i  y(i) · (θT x(i) + θ0) / ‖θ‖ .
The margin is positive if and only if all of the points in the data-set are classified correctly.
In that case (only!) it represents the distance from the hyperplane to the closest point.
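These two definitions translate directly into code. A minimal sketch, assuming the same array conventions as before (the function names are ours):

import numpy as np

def point_margin(x, y, theta, theta_0):
    # Margin of a labeled point (x, y) with respect to the hyperplane theta, theta_0.
    return y * (theta @ x + theta_0) / np.linalg.norm(theta)

def dataset_margin(X, y, theta, theta_0):
    # Minimum point margin over the dataset; positive iff every point is classified correctly.
    return min(point_margin(X[i], y[i], theta, theta_0) for i in range(X.shape[0]))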
Example: Let h be the linear classifier defined by θ = [1 −1]T, θ0 = 1. The diagram below shows several points classified by h, one of which is misclassified. We compute the margin for each point:

[Diagram: the separator θT x + θ0 = 0 and the points x(1), x(2), x(3).]

y(1) · (θT x(1) + θ0) / ‖θ‖ = 1 · (−2 + 1) / √2 = −√2 / 2
y(2) · (θT x(2) + θ0) / ‖θ‖ = 1 · (1 + 1) / √2 = √2
y(3) · (θT x(3) + θ0) / ‖θ‖ = −1 · (−3 + 1) / √2 = √2
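These numbers can be checked numerically. The notes give x(1) = [1 3]T explicitly, but not the coordinates of x(2) and x(3), so the points used for x(2) and x(3) below are hypothetical, chosen only to be consistent with the dot products shown above (θT x(2) = 1, θT x(3) = −3).

import numpy as np

theta, theta_0 = np.array([1.0, -1.0]), 1.0
points = [
    (np.array([1.0, 3.0]),  1),   # x^(1), y^(1) = +1 (given in the notes)
    (np.array([2.0, 1.0]),  1),   # hypothetical x^(2) with theta.x = 1, y^(2) = +1
    (np.array([0.0, 3.0]), -1),   # hypothetical x^(3) with theta.x = -3, y^(3) = -1
]
for x, y in points:
    print(y * (theta @ x + theta_0) / np.linalg.norm(theta))
# prints approximately -0.707 (-sqrt(2)/2), 1.414 (sqrt(2)), 1.414 (sqrt(2))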
Theorem 3.1 (Perceptron Convergence). For simplicity, we consider the case where the linear
separator must pass through the origin. If the following conditions hold:
(a) there exists θ∗ such that y(i) θ∗T x(i) / ‖θ∗‖ ≥ γ for all i = 1, . . . , n and for some γ > 0, and

(b) all the examples have bounded magnitude: ‖x(i)‖ ≤ R for all i = 1, . . . , n,

then the perceptron algorithm will make at most (R/γ)² mistakes. At this point, its hypothesis will be a linear separator of the data.
Proof. We initialize θ(0) = 0, and let θ(k) define our hyperplane after the perceptron algorithm has made k mistakes. We are going to think about the angle between the hypothesis we have now, θ(k), and the assumed good separator θ∗. Since they both go through the origin, if we can show that the angle between them is decreasing usefully on every iteration, then we will get close to that separator.
So, let's think about the cosine of the angle between them, and recall, by the definition of dot product:

cos(θ(k), θ∗) = (θ(k) · θ∗) / (‖θ(k)‖ ‖θ∗‖)

We'll divide this up into two factors,

cos(θ(k), θ∗) = ( (θ(k) · θ∗) / ‖θ∗‖ ) · ( 1 / ‖θ(k)‖ ) ,    (3.1)

and work on each factor separately. For the first factor, suppose the kth mistake is made on example (x(i), y(i)), so that θ(k) = θ(k−1) + y(i) x(i). Then,
(θ(k) · θ∗) / ‖θ∗‖ = ( (θ(k−1) + y(i) x(i)) · θ∗ ) / ‖θ∗‖
                   = (θ(k−1) · θ∗) / ‖θ∗‖ + (y(i) x(i) · θ∗) / ‖θ∗‖
                   ≥ (θ(k−1) · θ∗) / ‖θ∗‖ + γ
                   ≥ kγ
where we have first applied the margin condition from (a) and then applied simple induc-
tion.
Now, we'll look at the second factor in equation 3.1. We note that since (x(i), y(i)) is classified incorrectly, y(i) θ(k−1)T x(i) ≤ 0. Thus,

‖θ(k)‖² = ‖θ(k−1) + y(i) x(i)‖²
        = ‖θ(k−1)‖² + 2 y(i) θ(k−1)T x(i) + ‖x(i)‖²
        ≤ ‖θ(k−1)‖² + R²
        ≤ kR²
where we have additionally applied the assumption from (b) and then again used simple
induction.
Returning to the definition of the dot product, we have

cos(θ(k), θ∗) = ( (θ(k) · θ∗) / ‖θ∗‖ ) · ( 1 / ‖θ(k)‖ ) ≥ (kγ) · ( 1 / (√k R) ) = √k · (γ / R) .

Since the cosine of an angle is at most 1, this gives 1 ≥ √k · (γ / R), and therefore k ≤ (R/γ)²: the algorithm can make at most (R/γ)² mistakes.
This result endows the margin γ of Dn with an operational meaning: when using the
Perceptron algorithm for classification, at most (R/γ)2 classification errors will be made,
where R is an upper bound on the magnitude of the training vectors.
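As an illustration (not from the notes), a small NumPy experiment on synthetic, linearly separable data can confirm that the mistake count stays below (R/γ)². The data-generating choices here are assumptions made only for the sake of the demo.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data that is linearly separable through the origin with a strictly positive margin.
theta_star = np.array([3.0, -2.0])
X = rng.uniform(-1, 1, size=(200, 2))
X = X[np.abs(X @ theta_star) > 0.2]   # discard points too close to the separator
y = np.sign(X @ theta_star)

# Run the through-origin perceptron, counting every mistake it makes.
theta, mistakes = np.zeros(2), 0
for t in range(1000):
    errs = 0
    for i in range(X.shape[0]):
        if y[i] * (theta @ X[i]) <= 0:
            theta = theta + y[i] * X[i]
            errs += 1
    mistakes += errs
    if errs == 0:   # a clean pass: theta separates the data
        break

gamma = np.min(y * (X @ theta_star) / np.linalg.norm(theta_star))   # margin of theta_star
R = np.max(np.linalg.norm(X, axis=1))                               # magnitude bound
print(mistakes, (R / gamma) ** 2)   # the mistake count never exceeds the (R/gamma)^2 bound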