
CS 188: Artificial Intelligence

Perceptrons and Logistic Regression

Spring 2023
University of California, Berkeley
Linear Classifiers
Feature Vectors

Example: spam filtering. The email

  "Hello, Do you want free printr cartriges? Why pay more for when you can
  get them ABSOLUTELY FREE! Just ..."

maps to the feature vector

  # free      : 2
  YOUR_NAME   : 0
  MISSPELLED  : 2
  FROM_FRIEND : 0
  ...

and is labeled SPAM (the positive class).

Example: digit recognition. An image of a handwritten "2" maps to the feature vector

  PIXEL-7,12 : 1
  PIXEL-7,13 : 0
  ...
  NUM_LOOPS  : 1

and is labeled "2".
Some (Simplified) Biology
§ Very loose inspiration: human neurons
Linear Classifiers

§ Inputs are feature values
§ Each feature has a weight
§ Sum is the activation:
  activation_w(x) = Σ_i w_i · f_i(x) = w · f(x)
§ If the activation is:
  § Positive, output +1
  § Negative, output -1

[Figure: features f1, f2, f3 are multiplied by weights w1, w2, w3, summed (Σ), and passed through a ">0?" test]
Weights
Dot product positive means the positive class (spam)

  w (weights)         f (email 1)         f (email 2)
  # free      :  4    # free      : 2     # free      : 0
  YOUR_NAME   : -1    YOUR_NAME   : 0     YOUR_NAME   : 1
  MISSPELLED  :  1    MISSPELLED  : 2     MISSPELLED  : 1
  FROM_FRIEND : -3    FROM_FRIEND : 0     FROM_FRIEND : 1
  ...                 ...                 ...

Do these weights make sense for spam classification?
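To make the rule concrete, here is a minimal Python sketch (not from the lecture; the weight and feature values are copied from the table above) that computes the two dot products and applies the sign rule:

# Sketch only: values from the slide above, code is illustrative.
weights = {"# free": 4, "YOUR_NAME": -1, "MISSPELLED": 1, "FROM_FRIEND": -3}
email_1 = {"# free": 2, "YOUR_NAME": 0, "MISSPELLED": 2, "FROM_FRIEND": 0}
email_2 = {"# free": 0, "YOUR_NAME": 1, "MISSPELLED": 1, "FROM_FRIEND": 1}

def dot(w, f):
    # w . f over the named features
    return sum(w.get(k, 0) * v for k, v in f.items())

for name, f in [("email 1", email_1), ("email 2", email_2)]:
    z = dot(weights, f)
    print(name, z, "SPAM (+1)" if z > 0 else "HAM (-1)")
# email 1: 4*2 - 1*0 + 1*2 - 3*0 = 10 > 0  -> SPAM
# email 2: 4*0 - 1*1 + 1*1 - 3*1 = -3 < 0  -> HAM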


Review: Vectors
§ A tuple like (2, 3) can be interpreted two different ways:
  § A point on a coordinate grid
  § A vector in space (notice we are not on a coordinate grid)
§ A tuple with more elements like (2, 7, -3, 6) is a point or vector in higher-dimensional space (hard to visualize)
Review: Vectors
§ Definition of dot product:
  § a · b = |a| |b| cos(θ)
  § θ is the angle between the vectors a and b
§ Consequences of this definition:
  § Vectors closer together = "similar" vectors = smaller angle θ between vectors = larger (more positive) dot product
  § If θ < 90°, then the dot product is positive
  § If θ = 90°, then the dot product is zero
  § If θ > 90°, then the dot product is negative

[Figure: four vector pairs illustrating a · b large and positive, small and positive, zero, and negative]
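A quick numeric check (not from the slides): for a = (2, 3) and b = (4, 1), a · b = 2·4 + 3·1 = 11 > 0, so θ < 90°. Indeed |a| = √13 ≈ 3.61, |b| = √17 ≈ 4.12, and cos(θ) = 11 / (3.61 · 4.12) ≈ 0.74, i.e. θ ≈ 42°.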
Weights
§ Binary case: compare features to a weight vector
§ Learning: figure out the weight vector from examples

  w (weights)         f (email 1)         f (email 2)
  # free      :  4    # free      : 2     # free      : 0
  YOUR_NAME   : -1    YOUR_NAME   : 0     YOUR_NAME   : 1
  MISSPELLED  :  1    MISSPELLED  : 2     MISSPELLED  : 1
  FROM_FRIEND : -3    FROM_FRIEND : 0     FROM_FRIEND : 1
  ...                 ...                 ...

Dot product positive means the positive class
Decision Rules
Binary Decision Rule
§ In the space of feature vectors
  § Examples are points
  § Any weight vector is a hyperplane (divides space into two sides)
  § One side corresponds to Y = +1, the other corresponds to Y = -1
§ In the example, w = (free : 4, money : 2):
  § f · w > 0 when 4·free + 2·money > 0
  § f · w < 0 when 4·free + 2·money < 0
    These two inequalities correspond to the two halves of the feature space
  § f · w = 0 when 4·free + 2·money = 0
    This equation corresponds to the decision boundary (a line in 2D, a hyperplane in higher dimensions)

[Figure: the free / money plane, with the decision boundary separating the +1 = SPAM half-space from the -1 = HAM half-space]
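For instance (a worked check, not on the slide): an email with free = 2 and money = 1 gives f · w = 4·2 + 2·1 = 10 > 0, so it lands on the +1 = SPAM side of the boundary 4·free + 2·money = 0; a point with f · w below 0 would land on the -1 = HAM side.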
Weight Updates
Learning: Binary Perceptron
§ Start with weights = 0
§ For each training instance:
  § Classify with current weights:
    y = +1 if w · f(x) ≥ 0, otherwise y = -1
  § If correct (i.e., y = y*), no change!
  § If wrong: adjust the weight vector by adding or subtracting the feature vector. Subtract if y* is -1:
    w = w + y* · f
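A minimal Python sketch of this training loop (illustrative only: the tie-breaking at w · f(x) = 0 and the small 2D data set below are assumptions, not from the slides):

import numpy as np

def train_binary_perceptron(data, num_features, passes=10):
    """data: list of (f, y_star) pairs, with f a feature vector and y_star in {+1, -1}."""
    w = np.zeros(num_features)                  # start with weights = 0
    for _ in range(passes):                     # sweep over the training instances
        for f, y_star in data:
            y = 1 if np.dot(w, f) >= 0 else -1  # classify with current weights
            if y != y_star:                     # if wrong: w = w + y* * f
                w = w + y_star * f
    return w

# Made-up, linearly separable 2D data.
data = [(np.array([2.0, 1.0]), 1),
        (np.array([-1.0, 0.5]), -1),
        (np.array([1.5, -0.5]), 1)]
print(train_binary_perceptron(data, num_features=2))   # converges quickly for this data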
Learning: Binary Perceptron
§ Misclassification, Case I:
§ w · f > 0, so we predict +1
§ True class is -1
§ We want to modify w to w' such that dot product w' · f is lower
§ Update if we misclassify a true class -1 sample: w' = w – f
§ Proof: w' · f = (w − f) · f = (w · f) − (f · f) = (w · f) − |f|²
Note that |f|² is always positive
§ Misclassification, Case II:
§ w · f < 0, so we predict -1
§ True class is +1
§ We want to modify w to w' such that dot product w' · f is higher
§ Update if we misclassify a true class +1 sample: w' = w + f
§ Proof: w' · f = (w + f) · f = (w · f) + (f · f) = (w · f) + |f|²
Note that |f|² is always positive
§ Write update compactly as w' = w + y* · f, where y* = true class
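A numeric instance of Case I (numbers made up): if w · f = 3 and |f|² = 5, then the update w' = w − f gives w' · f = 3 − 5 = −2, so the same example would now be scored on the −1 side.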
Examples: Perceptron
§ Separable Case
Multiclass Decision Rule

§ If we have multiple classes:

  § A weight vector for each class: w_y

  § Score (activation) of a class y: w_y · f(x)

  § Prediction: the class with the highest score wins, y = argmax_y w_y · f(x)

§ Binary = multiclass where the negative class has weight zero


Learning: Multiclass Perceptron

§ Start with all weights = 0

§ Pick up training examples one by one
§ Predict with current weights: y = argmax_y w_y · f(x)

§ If correct, no change!
§ If wrong: lower the score of the wrong answer, raise the score of the right answer:
  w_y  = w_y  − f(x)   (wrong answer)
  w_y* = w_y* + f(x)   (right answer)
Example: Multiclass Perceptron

Training sentences: "win the vote", "win the election", "win the game"

Initial weights, one vector per class (three classes):

  BIAS : 1    BIAS : 0    BIAS : 0
  win  : 0    win  : 0    win  : 0
  game : 0    game : 0    game : 0
  vote : 0    vote : 0    vote : 0
  the  : 0    the  : 0    the  : 0
  ...         ...         ...
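A runnable Python sketch of multiclass perceptron training on these three sentences (everything beyond the sentences is an assumption: the class names politics/sports/tech, the true labels, and the all-zero initialization are illustrative choices, not given on this slide):

from collections import defaultdict

def features(sentence):
    # BIAS feature plus one count per word, as in the table above.
    f = defaultdict(float)
    f["BIAS"] = 1.0
    for word in sentence.split():
        f[word] += 1.0
    return f

def score(w, f):
    return sum(w[k] * v for k, v in f.items())

def train_multiclass_perceptron(data, classes, passes=10):
    weights = {c: defaultdict(float) for c in classes}            # one weight vector per class
    for _ in range(passes):
        for sentence, y_star in data:
            f = features(sentence)
            y = max(classes, key=lambda c: score(weights[c], f))  # highest score wins
            if y != y_star:
                for k, v in f.items():
                    weights[y][k] -= v         # lower the score of the wrong answer
                    weights[y_star][k] += v    # raise the score of the right answer
    return weights

# Hypothetical labels for the slide's sentences.
data = [("win the vote", "politics"),
        ("win the election", "politics"),
        ("win the game", "sports")]
learned = train_multiclass_perceptron(data, classes=["politics", "sports", "tech"])
print({c: dict(w) for c, w in learned.items()})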
Properties of Perceptrons
§ Separability: true if some parameters get the training set perfectly correct

§ Convergence: if the training data are separable, the perceptron will eventually converge (binary case)

§ Mistake Bound: the maximum number of mistakes (binary case) is related to the margin or degree of separability

[Figure: a separable point set vs. a non-separable one]
Problems with the Perceptron

§ Noise: if the data isn't separable, weights might thrash
  § Averaging weight vectors over time can help (averaged perceptron)

§ Mediocre generalization: finds a "barely" separating solution

§ Overtraining: test / held-out accuracy usually rises, then falls
  § Overtraining is a kind of overfitting
Improving the Perceptron
Non-Separable Case: Deterministic Decision
Even the best linear boundary makes at least one mistake
Non-Separable Case: Probabilistic Decision
[Figure: a probabilistic boundary, with class probabilities 0.9 | 0.1, 0.7 | 0.3, 0.5 | 0.5, 0.3 | 0.7, 0.1 | 0.9 shown in bands across the feature space]
How to get deterministic decisions?
§ Perceptron scoring: z = w · f(x)
§ If z = w · f(x) is positive → classifier says: 1.0 probability this is class +1
§ If z = w · f(x) is negative → classifier says: 0.0 probability this is class +1
§ Step function:
  H(z) = 1 for z > 0, H(z) = 0 for z < 0 (a step at z = 0)
§ z = output of perceptron
  H(z) = probability the class is +1, according to the classifier
How to get probabilistic decisions?
§ Perceptron scoring: z = w · f(x)
§ If z = w · f(x) is very positive → probability of class +1 should approach 1.0
§ If z = w · f(x) is very negative → probability of class +1 should approach 0.0

§ Sigmoid function:
  φ(z) = 1 / (1 + e^(-z))
§ z = output of perceptron
  φ(z) = probability the class is +1, according to the classifier
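In Python, the sigmoid is one line (a sketch, not lecture code):

import numpy as np

def sigmoid(z):
    # phi(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(4.0))    # ~0.982: very positive z, P(+1) near 1.0
print(sigmoid(0.0))    # 0.5: on the decision boundary
print(sigmoid(-4.0))   # ~0.018: very negative z, P(+1) near 0.0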
A 1D Example
The probability of the red class is a sigmoid of the score, P(red | x) = φ(w · x),
where w is some weight constant (1D vector) we have to learn
(assume w is positive in this example)

[Figure: definitely blue (x negative), not sure (x near 0), definitely red (x positive)]
Best w?
§ Recall maximum likelihood estimation: Choose the w value that
maximizes the probability of the observed (training) data
Best w?
§ Maximum likelihood estimation:

  max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)

with:
  P(y^(i) = +1 | x^(i); w) = 1 / (1 + e^(-w · f(x^(i))))
  P(y^(i) = -1 | x^(i); w) = 1 − 1 / (1 + e^(-w · f(x^(i))))

= Logistic Regression
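As a concrete sketch of this objective in Python (the data and weights below are made up; how to actually maximize ll(w) is the topic of the next lecture):

import numpy as np

def log_likelihood(w, X, y):
    """w: weights; X: rows are feature vectors f(x^(i)); y: labels in {+1, -1}."""
    z = X @ w
    # log P(y^(i) | x^(i); w) = log sigmoid(y^(i) * w . f(x^(i))),
    # since P(-1 | x; w) = 1 - sigmoid(w . f(x)) = sigmoid(-w . f(x)).
    return np.sum(np.log(1.0 / (1.0 + np.exp(-y * z))))

X = np.array([[2.0, 1.0], [-1.0, 0.5], [1.5, -0.5]])
y = np.array([1, -1, 1])
print(log_likelihood(np.array([1.0, 0.0]), X, y))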
Separable Case: Deterministic Decision – Many Options
Separable Case: Probabilistic Decision – Clear Preference

[Figure: the same separable data under several candidate boundaries, with class probabilities such as 0.7 | 0.3, 0.5 | 0.5, 0.3 | 0.7 shown along bands of the feature space]
Multiclass Logistic Regression
§ Recall Perceptron:
  § A weight vector for each class: w_y
  § Score (activation) of a class y: w_y · f(x)
  § Prediction: the class with the highest score wins, y = argmax_y w_y · f(x)

§ How to make the scores into probabilities?

  z1, z2, z3  →  e^(z1) / (e^(z1) + e^(z2) + e^(z3)),  e^(z2) / (e^(z1) + e^(z2) + e^(z3)),  e^(z3) / (e^(z1) + e^(z2) + e^(z3))
  original activations  →  softmax activations
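A small Python sketch of the softmax transformation above (the activation values are made up):

import numpy as np

def softmax(z):
    # e^(z_i) / sum_j e^(z_j); subtracting max(z) avoids overflow without changing the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, -1.0])    # original activations z1, z2, z3
print(softmax(z))                 # softmax activations, roughly [0.705, 0.259, 0.035]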
Best w?
§ Recall maximum likelihood estimation: Choose the w value that
maximizes the probability of the observed (training) data
Best w?
§ Maximum likelihood estimation:

  max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)

with:
  P(y^(i) | x^(i); w) = e^(w_y^(i) · f(x^(i))) / Σ_y e^(w_y · f(x^(i)))

= Multi-Class Logistic Regression
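A Python sketch of this multiclass objective (the weights, data, and class indices below are made up):

import numpy as np

def multiclass_log_likelihood(W, X, y):
    """W: one weight vector per class (rows); X: rows are f(x^(i)); y: integer class indices."""
    scores = X @ W.T                                 # w_y . f(x^(i)) for every class y
    scores -= scores.max(axis=1, keepdims=True)      # shift scores for numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return log_probs[np.arange(len(y)), y].sum()     # sum_i log P(y^(i) | x^(i); w)

W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])  # 3 classes, 2 features
X = np.array([[2.0, 1.0], [0.5, 2.0]])
y = np.array([0, 1])
print(multiclass_log_likelihood(W, X, y))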


Softmax with Different Bases
Softmax and Sigmoid
§ Recall: Binary perceptron is a special case of multi-class perceptron
  § Multi-class: compute w_y · f(x) for each class y, pick the class with the highest activation
  § Binary case:
    Let the weight vector of +1 be w (which we learn).
    Let the weight vector of -1 always be 0 (constant).
§ Binary classification as a multi-class problem:
  Activation of the negative class is always 0.
  If w · f is positive, then the activation of +1 (namely w · f) is higher than that of -1 (namely 0).
  If w · f is negative, then the activation of -1 (namely 0) is higher than that of +1 (namely w · f).
Softmax
  P(+1 | x) = e^(w · f(x)) / (e^(w · f(x)) + e^(w_red · f(x)))
with w_red = 0 (the fixed weight vector of the negative class) becomes:
Sigmoid
  P(+1 | x) = e^(w · f(x)) / (e^(w · f(x)) + 1) = 1 / (1 + e^(-w · f(x)))
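A quick numeric check of this identity in Python (the score value is arbitrary):

import numpy as np

z = 1.7                                                # some value of w . f(x)
softmax_pos = np.exp(z) / (np.exp(z) + np.exp(0.0))    # two-class softmax, negative class weights fixed at 0
sigmoid_pos = 1.0 / (1.0 + np.exp(-z))
print(softmax_pos, sigmoid_pos)                        # both ~0.8455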
Next Lecture

§ Optimization

§ i.e., how do we solve:


  max_w ll(w) = max_w Σ_i log P(y^(i) | x^(i); w)
