
Frank Rosenblatt’s Perceptron Model

Course: EE-241 (Neural Networks and Fuzzy Systems)


Instructor: Pankaj K. Mishra, NIT Hamirpur, India

1 Perceptron algorithm
Consider a labeled dataset:
D = {(pi , yi ) ∣ pi ∈ Rd , yi ∈ {0, 1}, i = 1, . . . , n}. (1)
The classifier predicts:

ŷi = 1 if wT pi ≥ 0, and ŷi = 0 otherwise.

A point pj is misclassified if ŷj ≠ yj .

1.1 Perceptron update rule


For a misclassified pj , update:

wk+1 = wk + ek pj , where ek = yj − ŷj ∈ {−1, 1}. (2)
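As an illustration, the prediction rule and the update (2) translate directly into a few lines of NumPy. This is a minimal sketch, not part of the original notes; the helper names predict and perceptron_update are ours, and the ≥ 0 convention from the classifier above is assumed.

import numpy as np

def predict(w, p):
    # ŷ = 1 if w^T p >= 0, else 0
    return 1 if np.dot(w, p) >= 0 else 0

def perceptron_update(w, p, y):
    # e = y - ŷ is 0 for a correctly classified point and ±1 for a misclassified one
    e = y - predict(w, p)
    return w + e * p            # w_{k+1} = w_k + e_k p_j  (no change when e = 0)

w = np.zeros(2)                                      # start from the zero vector
w = perceptron_update(w, np.array([2.0, 1.0]), 0)    # misclassified point: e = -1
print(w)                                             # -> [-2. -1.]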

1.2 Assumptions
1. Linear Separability with Margin: There exists an ideal weight vector w∗ ∈ Rd and σ > 0 such that:

∣w∗T pi ∣ / ∥w∗ ∥ ≥ σ ∀i ∈ {1, . . . , n}. (3)

2. Bounded Input Norms:


∥pi ∥2 ≤ M 2 ∀i ∈ {1, . . . , n}. (4)
3. Zero Initialization:
w0 = 0. (5)
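The two quantities in Assumptions 1 and 2 can be computed directly for any candidate separator. The snippet below is an illustrative sketch; the dataset and the direction w_star are assumed choices, not taken from the notes.

import numpy as np

# Illustrative dataset in the format of (1) and an assumed separating direction w_star
P = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
w_star = np.array([1.0, 1.0])

# Assumption 1: margin sigma = min_i |w_star^T p_i| / ||w_star||
sigma = np.min(np.abs(P @ w_star) / np.linalg.norm(w_star))

# Assumption 2: M^2 = max_i ||p_i||^2
M2 = np.max(np.sum(P ** 2, axis=1))

print(sigma, M2)                # -> about 2.12 and 10.0 for this dataset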

1.3 Auxiliary conditions for the proof of convergence


Property 1: Inner Product Bound
For any vectors a, b ∈ Rd , the inner product satisfies:

aT b ≤ ∥a∥∥b∥. (6)
Proof: By the geometric definition of the dot product:

aT b = ∥a∥∥b∥ cos θ. (7)


Since cos θ ≤ 1 for all θ, it follows that:

aT b ≤ ∥a∥∥b∥. (8)
Thus, Property 1 is proved.

Property 2: Expansion of Squared Norm


For any vectors a, b ∈ Rd , the squared norm of their sum satisfies:

∥a + b∥2 = ∥a∥2 + ∥b∥2 + 2aT b. (9)


Proof: Expanding the squared norm:

∥a + b∥2 = (a + b)T (a + b)
= aT a + aT b + bT a + bT b
= ∥a∥2 + ∥b∥2 + 2aT b.

Thus, Property 2 is proved.
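A quick numerical sanity check of both properties (an illustrative snippet with arbitrary vectors a and b):

import numpy as np

a, b = np.array([3.0, -1.0]), np.array([2.0, 4.0])

# Property 1 (Cauchy-Schwarz): a^T b <= ||a|| ||b||
print(a @ b, np.linalg.norm(a) * np.linalg.norm(b))             # 2.0 <= 14.14...

# Property 2: ||a + b||^2 = ||a||^2 + ||b||^2 + 2 a^T b
print(np.linalg.norm(a + b) ** 2, a @ a + b @ b + 2 * (a @ b))  # both ≈ 34.0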

Lemma 1: Misclassification condition for the ideal weight vector.
For any misclassified pj :
w∗T pj ek = ∣w∗T pj ∣. (10)
Lemma 2: Misclassification condition for the weights to be updated.
If pj is misclassified by wk , then:
wkT pj ek ≤ 0. (11)
Remark 1 : The ideal weight vector w∗ remains constant and is not subject to updates during the perceptron
algorithm. However, its interaction with the error term is analyzed to compare it against the updated weight vector
wk .
For a misclassified point, the prediction made with the current weight vector wk produces an error term ek ≠ 0. The objective is to determine
the behaviour of both:

• The projection of the misclassified point onto the updated weight vector wk .
• The projection of the same point onto the ideal weight vector w∗ .
Since the classification of the point by wk is incorrect, the corresponding dot product wkT pj ek will always be non-
positive. However, since the ideal weight vector w∗ correctly classifies all points, its interaction with the misclassified
point results in a dot product that is always positive or zero. This behaviour is demonstrated in the proof.

Proof of Lemma 1 and Lemma 2


Since pj is misclassified, the analysis is performed for two possible cases.
Case 1: yj = 1 (i.e., w∗T pj ≥ 0) but ŷj = 0 (i.e., wkT pj < 0)
The prediction satisfies ŷj = 0, implying ek = yj − ŷj = 1. Multiplying both sides of w∗T pj ≥ 0 and wkT pj < 0 by ek = 1:

w∗T pj ek ≥ 0. (12)

wkT pj ek < 0. (13)


Case 2: yj = 0 (i.e., w∗T pj < 0) but ŷj = 1 (i.e., wkT pj ≥ 0)
The prediction satisfies ŷj = 1, implying ek = yj − ŷj = −1. Multiplying both sides of w∗T pj < 0 and wkT pj ≥ 0 by ek = −1 (which reverses the inequality directions):

w∗T pj ek > 0. (14)


wkT pj ek ≤ 0. (15)
Analyzing the results for both cases, it can be concluded that w∗T pj ek is always non-negative, and since ek ∈ {−1, 1}, it can be written as w∗T pj ek = ∣w∗T pj ∣. This proves Lemma 1.

Further, it follows directly from the results for both cases that wkT pj ek is always non-positive, i.e., wkT pj ek ≤ 0. This proves Lemma 2.
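Both lemmas can also be checked numerically on a single misclassified point; the vectors below are arbitrary illustrative choices, not from the notes.

import numpy as np

w_star = np.array([1.0, 1.0])     # ideal separator: classifies p correctly
w_k    = np.array([-1.0, 0.0])    # current weights: misclassify p
p, y   = np.array([1.0, 2.0]), 1

y_hat = 1 if w_k @ p >= 0 else 0  # here w_k^T p = -1 < 0, so y_hat = 0
e = y - y_hat                     # e = 1

print(w_star @ p * e, abs(w_star @ p))   # 3.0 and 3.0  (Lemma 1: equal, non-negative)
print(w_k @ p * e)                       # -1.0         (Lemma 2: non-positive)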

1.4 Convergence Proof


Step 1: Lower bound on weight norm The perceptron update rule is given by:

wk = wk−1 + ek pj . (16)
Multiplying both sides on the left by the ideal weight vector w∗T :

w∗T wk = w∗T wk−1 + w∗T ek pj


= w∗T wk−1 + ∣w∗T pj ∣ (Using Lemma 1)
≥ w∗T wk−1 + σ∥w∗ ∥ (Using Assumption 1)

Recursively applying this inequality over all k updates and using w0 = 0 (Assumption 3):

w∗T wk ≥ kσ∥w∗ ∥. (17)


Using Property 1, i.e., w∗T wk ≤ ∥w∗ ∥∥wk ∥, (17) can be written as ∥w∗ ∥∥wk ∥ ≥ kσ∥w∗ ∥, i.e., ∥wk ∥ ≥ kσ, or

∥wk ∥2 ≥ k 2 σ 2 . (18)

Step 2: Upper bound on weight norm After k updates:

∥wk+1 ∥2 = ∥wk + ek pj ∥2
= ∥wk ∥2 + 2wkT ek pj + ∥pj ∥2 (Using Property 2)
≤ ∥wk ∥2 + M 2 (Using Lemma 2, i.e., 2wkT ek pj ≤ 0, and Assumption 2, i.e., ∥pj ∥2 ≤ M 2 ).

Recursively applying this inequality for all k updates (with w0 = 0):

∥wk ∥2 ≤ kM 2 . (19)

Step 3: Upper bound on the number of updates k:


Using (18) and (19), we have

k 2 σ 2 ≤ kM 2 . (20)

Dividing both sides by kσ 2 ,

k ≤ M 2 /σ 2 . (21)

Thus, the Perceptron algorithm will converge after at most M 2 /σ 2 updates.

The convergence result is independent of the number of samples!


The convergence result is independent of the input dimension!
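The bound in (21) can be verified empirically. The sketch below is illustrative (the dataset, the separating direction w_star, and the training loop are our assumptions, not from the notes): it trains a bias-free Perceptron to convergence and compares the observed number of updates with M 2 /σ 2 .

import numpy as np

# Small linearly separable dataset (labels in {0, 1}) and an assumed separator w_star
P = np.array([[1., 1.], [2., 0.], [0., 2.], [-1., -1.], [-2., 0.], [0., -2.]])
Y = np.array([1, 1, 1, 0, 0, 0])
w_star = np.array([1., 1.])

w, updates, changed = np.zeros(2), 0, True
while changed:                               # one pass per epoch until no mistakes
    changed = False
    for p, y in zip(P, Y):
        e = y - (1 if w @ p >= 0 else 0)
        if e != 0:
            w += e * p
            updates += 1
            changed = True

sigma = np.min(np.abs(P @ w_star) / np.linalg.norm(w_star))   # margin (Assumption 1)
M2 = np.max(np.sum(P ** 2, axis=1))                           # norm bound (Assumption 2)
print(updates, M2 / sigma ** 2)              # here: 1 update vs. a bound of ≈ 2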

Problem 1: Given the dataset:
D = {(1, 2, 1), (2, 1, 0), (3, 3, 1), (0, 0, 0)},
where each tuple (x1 , x2 , y) represents a data point with features x1 , x2 and label y ∈ {0, 1}. Initialize the weight
vector W = [0, 0]T and bias b = 0. Perform one iteration of the Perceptron algorithm using the first misclassified
point. Update the weights and bias using the Perceptron update rule. State the new decision boundary after the
update.
Solution:
• Initialization:
W0 = [0, 0]T , b0 = 0.

• Prediction
For the first data point (1, 2): W0T [1, 2]T + b0 ⋅ 1 = 0, so the prediction is ŷ = 0.
However, the actual output is y = 1, hence the data point is misclassified by W0 and b0 . An update is needed!
• First misclassified point: (1, 2, 1)

W1 = W0 + (y − ŷ) ⋅ [1, 2]T = [0, 0]T + 1 ⋅ [1, 2]T = [1, 2]T ,
b1 = b0 + (y − ŷ) = 0 + 1 = 1.

• New decision boundary:


1 ⋅ x1 + 2 ⋅ x2 + 1 = 0.
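The arithmetic of this solution can be reproduced with a short script (a sketch that hard-codes ŷ = 0 for the first point, as stated in the solution above):

import numpy as np

# Reproducing the single update of Problem 1
W, b = np.zeros(2), 0.0                 # initialization W0 = [0, 0]^T, b0 = 0
p, y = np.array([1.0, 2.0]), 1          # first misclassified point (1, 2, 1)
y_hat = 0                               # prediction by W0, b0, as stated in the solution
e = y - y_hat                           # error = 1

W, b = W + e * p, b + e                 # Perceptron update with bias
print(W, b)                             # -> [1. 2.] 1.0
print(f"decision boundary: {W[0]:g}*x1 + {W[1]:g}*x2 + {b:g} = 0")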

Problem 2: Given the dataset:

D = {(1, 1, 1), (2, 2, 1), (−1, −1, 0), (−2, −2, 0)},

and the margin σ = √2, prove that the Perceptron algorithm will converge in at most k = 4 updates. Assume that
the initial weight is a null vector.
Solution:

Bounded input norm: M 2 = maxi ∥pi ∥2 = ∥(2, 2)∥2 = 8, and the margin gives σ 2 = 2.

Convergence bound:

k ≤ M 2 /σ 2 = 8/2 = 4.

Hence the Perceptron algorithm converges after at most k = 4 updates on this dataset.
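The quantities used in the bound can be computed directly from the dataset of Problem 2. In this sketch, the separating direction w_star = [1, 1]T is an assumed choice that attains the margin:

import numpy as np

P = np.array([[1., 1.], [2., 2.], [-1., -1.], [-2., -2.]])
w_star = np.array([1., 1.])

sigma = np.min(np.abs(P @ w_star) / np.linalg.norm(w_star))   # = sqrt(2)
M2 = np.max(np.sum(P ** 2, axis=1))                           # = 8
print(sigma ** 2, M2, M2 / sigma ** 2)                        # ≈ 2.0, 8.0, 4.0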
Problem 3: After training the Perceptron algorithm on a dataset, the converged weight vector is W = [2, −1]T and
bias b = −1. Perform the following:
1. Write the equation of the decision boundary.
2. Classify the following points using the decision boundary: (1,1), (-1,-1), (2,0).

3. Explain why the converged weights are important for generalization.


Solution:
• Decision boundary:
W T x + b = 0 ⇒ 2x1 − x2 − 1 = 0.

• Classification:
– For (1, 1):
2(1) − 1(1) − 1 = 0 ≥ 0 (on the boundary; Class 1 under the ≥ 0 convention).
– For (−1, −1):
2(−1) − 1(−1) − 1 = −2 + 1 − 1 = −2 < 0 (Class 0).
– For (2, 0):
2(2) − 1(0) − 1 = 4 − 0 − 1 = 3 > 0 (Class 1).
• Importance of converged weights:
– The learned weight vector determines the decision boundary.

– They ensure correct classification of the training data and support generalization to unseen data.
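The classifications above can be reproduced with a few lines (a sketch using the converged weights from the problem statement and the ≥ 0 convention):

import numpy as np

# Classifying the points of Problem 3 with the converged weights
W, b = np.array([2.0, -1.0]), -1.0

for x in [(1, 1), (-1, -1), (2, 0)]:
    net = W @ np.array(x, dtype=float) + b
    label = 1 if net >= 0 else 0            # boundary points (net = 0) go to Class 1
    print(x, net, "Class", label)           # -> 0.0: Class 1, -2.0: Class 0, 3.0: Class 1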
Problem 4: Train the Perceptron algorithm on the OR gate dataset:

DOR = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)}.

Initialize W0 = [0, 0]T and b0 = 0, and perform updates until convergence. Also, manually solve for W and b by
solving inequalities before applying the Perceptron algorithm. Provide epoch-by-epoch weight updates in a tabular
format.

Manual Derivation - Solving Inequalities

The perceptron must satisfy:

w1 (0) + w2 (0) + b < 0 for y = 0,


w1 (0) + w2 (1) + b ≥ 0 for y = 1,
w1 (1) + w2 (0) + b ≥ 0 for y = 1,
w1 (1) + w2 (1) + b ≥ 0 for y = 1.

The above can be written as:

b<0
w2 + b ≥ 0,
w1 + b ≥ 0,
w1 + w2 + b ≥ 0.

A valid solution is:

W = [1, 1]T , b = −1.

Epoch-by-Epoch Training (starting from W0 = [0, 0]T and b0 = 0):

Epoch 1:
| Input p | True y | W T p + b | Predicted ŷ | Error | Wnew = Wold + Error ⋅ Input | bnew = bold + Error ⋅ 1 |
| (0,0)   | 0      | 0         | 1           | -1    | [0, 0]T                     | -1                      |
| (0,1)   | 1      | -1        | 0           | 1     | [0, 1]T                     | 0                       |
| (1,0)   | 1      | 0         | 1           | 0     | No update                   | No update               |
| (1,1)   | 1      | 1         | 1           | 0     | No update                   | No update               |

Epoch 2:
| Input p | True y | W T p + b | Predicted ŷ | Error | Wnew = Wold + Error ⋅ Input | bnew = bold + Error ⋅ 1 |
| (0,0)   | 0      | 0         | 1           | -1    | [0, 1]T                     | -1                      |
| (0,1)   | 1      | 0         | 1           | 0     | No update                   | No update               |
| (1,0)   | 1      | -1        | 0           | 1     | [1, 1]T                     | 0                       |
| (1,1)   | 1      | 2         | 1           | 0     | No update                   | No update               |

Epoch 3:
| Input p | True y | W T p + b | Predicted ŷ | Error | Wnew = Wold + Error ⋅ Input | bnew = bold + Error ⋅ 1 |
| (0,0)   | 0      | 0         | 1           | -1    | [1, 1]T                     | -1                      |
| (0,1)   | 1      | 0         | 1           | 0     | No update                   | No update               |
| (1,0)   | 1      | 0         | 1           | 0     | No update                   | No update               |
| (1,1)   | 1      | 1         | 1           | 0     | No update                   | No update               |

Epoch 4:
| Input p | True y | W T p + b | Predicted ŷ | Error | Wnew = Wold + Error ⋅ Input | bnew = bold + Error ⋅ 1 |
| (0,0)   | 0      | -1        | 0           | 0     | No update                   | No update               |

The remaining points in Epoch 4 are also classified correctly (their nets 0, 0, and 1 all give ŷ = 1), so no further updates occur and the algorithm has converged.

Final Weights and Decision Boundary:

W = [1, 1]T , b = −1, decision boundary: x1 + x2 − 1 = 0, i.e., x1 + x2 = 1.
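The epoch-by-epoch table can be reproduced with the following training sketch (the loop structure and the epoch cap are our choices; the dataset and update rule are as above):

import numpy as np

# Training sketch for the OR gate
P = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([0, 1, 1, 1])

W, b = np.zeros(2), 0.0
for epoch in range(1, 11):                  # at most 10 epochs; OR converges within 4
    errors = 0
    for p, y in zip(P, Y):
        y_hat = 1 if W @ p + b >= 0 else 0
        e = y - y_hat
        if e != 0:
            W, b = W + e * p, b + e
            errors += 1
    print(f"epoch {epoch}: W={W}, b={b}, updates={errors}")
    if errors == 0:                         # a full pass with no updates => converged
        break
# Final: W = [1, 1], b = -1  ->  decision boundary x1 + x2 - 1 = 0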
Problem 5: Consider the XOR dataset:

DXOR = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}.

1. Show that no weight vector W = [w1 , w2 ]T and bias b can solve the XOR problem.

2. Solve the XOR problem using a hidden layer with four perceptrons and consider zero bias for all the perceptrons
in the network.
Solution:

Part 1: A single perceptron with weights W = [w1 , w2 ]T and bias b would have to satisfy b < 0 (for (0,0)), w2 + b ≥ 0 (for (0,1)), w1 + b ≥ 0 (for (1,0)), and w1 + w2 + b < 0 (for (1,1)). Adding the second and third inequalities gives w1 + w2 + 2b ≥ 0, i.e., w1 + w2 + b ≥ −b > 0, which contradicts the fourth inequality. Hence no such W and b exist; XOR is not linearly separable.

Part 2: To solve the XOR problem, a hidden layer with four perceptrons can be introduced. The following hidden layer weights are used, corresponding to the four neurons:

W1 = [−1, −1]T , W2 = [1, −1]T , W3 = [−1, 1]T , W4 = [1, 1]T .

Each perceptron in the hidden layer computes an intermediate value, which is then passed to the output perceptron.
Hidden Layer Computations:

With the step activation f (z) = 1 if z ≥ 0 and f (z) = 0 otherwise (zero bias in every hidden neuron), the hidden outputs Hi = f (WiT [x1 , x2 ]T ) evaluate to:

Input (x1 , x2 )   H1   H2   H3   H4   Target Output y

(0,0)              1    1    1    1    0
(0,1)              0    0    1    1    1
(1,0)              0    1    0    1    1
(1,1)              0    1    1    1    0

Table 1: Computation of Hidden Layer Outputs for XOR Dataset

The output perceptron (also with zero bias, as required) computes:

O = f (w1 H1 + w2 H2 + w3 H3 + w4 H4 ),

where the output weights w1 , w2 , w3 , w4 must satisfy the conditions (reading the Hi values from Table 1):

w1 (1) + w2 (1) + w3 (1) + w4 (1) < 0, (for (0,0))


w1 (0) + w2 (0) + w3 (1) + w4 (1) ≥ 0, (for (0,1))
w1 (0) + w2 (1) + w3 (0) + w4 (1) ≥ 0, (for (1,0))
w1 (0) + w2 (1) + w3 (1) + w4 (1) < 0. (for (1,1))

A valid solution that satisfies these constraints is:

w1 = −1, w2 = −1, w3 = −1, w4 = 1.

(Check: for (0,0), −1 − 1 − 1 + 1 = −2 < 0; for (0,1), −1 + 1 = 0 ≥ 0; for (1,0), −1 + 1 = 0 ≥ 0; for (1,1), −1 − 1 + 1 = −1 < 0.)

Thus, by introducing a hidden layer, the XOR function becomes linearly separable in the hidden-layer representation and can be correctly classified.
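The complete two-layer network can be checked with a short script (a sketch of the construction above; the step helper and array layout are our implementation choices):

import numpy as np

# Four zero-bias hidden perceptrons followed by a zero-bias output perceptron
W_hidden = np.array([[-1.0, -1.0],    # W1
                     [ 1.0, -1.0],    # W2
                     [-1.0,  1.0],    # W3
                     [ 1.0,  1.0]])   # W4
w_out = np.array([-1.0, -1.0, -1.0, 1.0])   # output weights w1..w4 from the solution above

def step(z):
    # elementwise step activation with the >= 0 convention
    return (np.asarray(z) >= 0).astype(int)

for x, y in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]:
    H = step(W_hidden @ np.array(x, dtype=float))   # hidden outputs H1..H4
    O = 1 if w_out @ H >= 0 else 0                  # output perceptron
    print(x, H, O, "target", y)                     # O matches the XOR target for every input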
