cs188 sp23 Lec25 - Z
Spring 2023
University of California, Berkeley
Linear Classifiers
Feature Vectors
§ Example email (spam):
  "Hello, Do you want free printr cartriges? Why pay more
   when you can get them ABSOLUTELY FREE! Just ..."
  Feature vector f(x):
    # free      : 2
    YOUR_NAME   : 0
    MISSPELLED  : 2
    FROM_FRIEND : 0
    ...
  Label: SPAM (+)
§ Example digit image ("2"):
  Feature vector f(x):
    PIXEL-7,12  : 1
    PIXEL-7,13  : 0
    ...
    NUM_LOOPS   : 1
    ...
  Label: "2"
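The feature names above (# free, YOUR_NAME, MISSPELLED, FROM_FRIEND) suggest simple word counts and indicator features. Below is a minimal, hypothetical sketch of how such a feature vector might be computed from raw email text; the word list and the helper name extract_features are illustrative, not the course's actual pipeline.

```python
# Hypothetical feature extractor for the spam example; feature names follow the slide,
# but the dictionary of known words and the helper itself are illustrative only.
def extract_features(email_text, sender_is_friend=False, recipient_name="alice"):
    words = [w.strip("!?.,").lower() for w in email_text.split()]
    known_words = {"hello", "do", "you", "want", "free", "printer", "cartridges",
                   "why", "pay", "more", "when", "can", "get", "them",
                   "absolutely", "just"}                    # toy dictionary
    return {
        "# free":      words.count("free"),                 # how many times "free" appears
        "YOUR_NAME":   int(recipient_name.lower() in words),
        "MISSPELLED":  sum(1 for w in words if w and w not in known_words),
        "FROM_FRIEND": int(sender_is_friend),
    }

email = ("Hello, Do you want free printr cartriges? "
         "Why pay more when you can get them ABSOLUTELY FREE! Just")
print(extract_features(email))
# -> {'# free': 2, 'YOUR_NAME': 0, 'MISSPELLED': 2, 'FROM_FRIEND': 0}
```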
Some (Simplified) Biology
§ Very loose inspiration: human neurons
Linear Classifiers
§ If the activation Σ_i w_i · f_i(x) is:
  § Positive, output +1
  § Negative, output -1
[Figure: feature values f2, f3, ... are each multiplied by a weight w2, w3, ..., summed (Σ), and the sum is tested: > 0?]
Weights
§ Dot product positive means the positive class (spam)

  Weight vector w          Feature vector f(x1)     Feature vector f(x2)
  # free      : 4          # free      : 2          # free      : 0
  YOUR_NAME   : -1         YOUR_NAME   : 0          YOUR_NAME   : 1
  MISSPELLED  : 1          MISSPELLED  : 2          MISSPELLED  : 1
  FROM_FRIEND : -3         FROM_FRIEND : 0          FROM_FRIEND : 1
  ...                      ...                      ...
Review: Vectors
§ A tuple like (2, 2) can be interpreted as:
  § A point on a coordinate grid
  § A vector in space (notice we are not on a coordinate grid)
§ A tuple with more elements like (2, 7, -3, 6) is a point or vector in higher-dimensional space (hard to visualize)
Review: Vectors
§ Definition of dot product:
  § a · b = |a| |b| cos(θ)
  § θ is the angle between the vectors a and b
§ Consequences of this definition:
  § Vectors closer together
    = "similar" vectors
    = smaller angle θ between vectors
    = larger (more positive) dot product
  § If θ < 90°, then dot product is positive
  § If θ = 90°, then dot product is zero
  § If θ > 90°, then dot product is negative
[Figure: vector pairs at increasing angles θ, with a · b large and positive, small and positive, zero, then negative]
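A small numeric check of these dot-product facts; the example vectors are made up for illustration:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def angle_deg(a, b):
    # Solve a . b = |a| |b| cos(theta) for theta.
    cos_theta = dot(a, b) / (math.hypot(*a) * math.hypot(*b))
    return math.degrees(math.acos(cos_theta))

a = (2, 2)
for b in [(2, 3), (3, -1), (1, -1), (-3, -1)]:
    print(b, "dot =", dot(a, b), "theta =", round(angle_deg(a, b), 1))
# (2, 3):  large positive dot product, small angle
# (3, -1): small positive dot product, angle below 90 degrees
# (1, -1): dot product 0, angle exactly 90 degrees
# (-3, -1): negative dot product, angle above 90 degrees
```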
Weights
§ Binary case: compare features to a weight vector
§ Learning: figure out the weight vector from examples
§ Dot product positive means the positive class

  Weight vector w          Feature vector f(x1)     Feature vector f(x2)
  # free      : 4          # free      : 2          # free      : 0
  YOUR_NAME   : -1         YOUR_NAME   : 0          YOUR_NAME   : 1
  MISSPELLED  : 1          MISSPELLED  : 2          MISSPELLED  : 1
  FROM_FRIEND : -3         FROM_FRIEND : 0          FROM_FRIEND : 1
  ...                      ...                      ...
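As a concrete check of "dot product positive means the positive class", here is a minimal sketch that scores the slide's two feature vectors against the weight vector w; the sparse-dict representation and the helper name score are my own:

```python
def score(weights, features):
    # Dot product over sparse feature dictionaries; missing keys contribute 0.
    return sum(weights.get(k, 0) * v for k, v in features.items())

w    = {"# free": 4, "YOUR_NAME": -1, "MISSPELLED": 1, "FROM_FRIEND": -3}
f_x1 = {"# free": 2, "YOUR_NAME": 0,  "MISSPELLED": 2, "FROM_FRIEND": 0}
f_x2 = {"# free": 0, "YOUR_NAME": 1,  "MISSPELLED": 1, "FROM_FRIEND": 1}

for name, f in [("x1", f_x1), ("x2", f_x2)]:
    z = score(w, f)
    print(name, "score =", z, "->", "SPAM (+1)" if z > 0 else "HAM (-1)")
# x1: 4*2 + (-1)*0 + 1*2 + (-3)*0 = 10 -> SPAM
# x2: 4*0 + (-1)*1 + 1*1 + (-3)*1 = -3 -> HAM
```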
Decision Rules
Binary Decision Rule
§ In the space of feature vectors
  § Examples are points
  § Any weight vector is a hyperplane (divides space into two sides)
  § One side corresponds to Y = +1, the other corresponds to Y = -1
§ In the example, the weight vector is w = (free : 4, money : 2)
  § f · w > 0 when 4*free + 2*money > 0, and f · w < 0 when 4*free + 2*money < 0.
    These equations correspond to the two halves of the feature space
    (+1 = SPAM on one side, -1 = HAM on the other)
  § f · w = 0 when 4*free + 2*money = 0.
    This equation corresponds to the decision boundary (a line in 2D, a hyperplane in higher dimensions)
[Figure: the (free, money) feature plane with the decision boundary separating the +1 = SPAM region from the -1 = HAM region]
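A quick numeric illustration of this decision rule, evaluating 4*free + 2*money for a few made-up points in the feature plane:

```python
w = {"free": 4, "money": 2}

def decide(f):
    activation = w["free"] * f["free"] + w["money"] * f["money"]   # f . w
    if activation > 0:
        return "+1 = SPAM"
    if activation < 0:
        return "-1 = HAM"
    return "on the decision boundary"

for f in [{"free": 1, "money": 1}, {"free": -1, "money": 1}, {"free": 1, "money": -2}]:
    print(f, "->", decide(f))
# 4*1 + 2*1 = 6 > 0    -> SPAM
# 4*(-1) + 2*1 = -2    -> HAM
# 4*1 + 2*(-2) = 0     -> exactly on the boundary
```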
Weight Updates
Learning: Binary Perceptron
§ Start with weights = 0
§ For each training instance:
§ Classify with current weights
§ If correct, no change!
§ If wrong: lower score of wrong answer,
raise score of right answer
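A minimal sketch of this training loop for the binary case, where the mistake update w ← w + y·f(x) moves the score toward the right answer; the tiny dataset and feature layout below are made up:

```python
def perceptron_train(data, num_features, passes=10):
    # data: list of (feature_vector, label) with label in {+1, -1}
    w = [0.0] * num_features                               # start with weights = 0
    for _ in range(passes):
        for f, y in data:
            z = sum(wi * fi for wi, fi in zip(w, f))       # classify with current weights
            y_hat = 1 if z > 0 else -1
            if y_hat != y:                                 # if wrong: adjust toward the right answer
                w = [wi + y * fi for wi, fi in zip(w, f)]
    return w

# Tiny illustrative dataset: feature vector = (bias, # free, FROM_FRIEND)
data = [((1, 2, 0), +1), ((1, 3, 0), +1), ((1, 0, 1), -1), ((1, 0, 2), -1)]
print(perceptron_train(data, num_features=3))
```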
Example: Multiclass Perceptron
§ z = output of perceptron
§ H(z) = probability the class is +1, according to the classifier
[Figure: plot of H(z) against z, with the threshold at z = 0]
How to get probabilistic decisions?
§ Perceptron scoring: z = w · f(x)
§ If z = w · f(x) is very positive → probability of class +1 should approach 1.0
§ If z = w · f(x) is very negative → probability of class +1 should approach 0.0
§ Sigmoid function:
  φ(z) = 1 / (1 + e^(-z))
§ z = output of perceptron
§ φ(z) = probability the class is +1, according to the classifier
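The sigmoid in code, as a small sketch:

```python
import math

def sigmoid(z):
    # phi(z) = 1 / (1 + e^(-z)): squashes the perceptron score into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

for z in [-5, -1, 0, 1, 5]:
    print(z, round(sigmoid(z), 3))
# Very negative z -> probability of class +1 near 0; very positive z -> near 1.
```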
A 1D Example
with:
  P(y^(i) = +1 | x^(i) ; w) = 1 / (1 + e^(-w · f(x^(i))))
  P(y^(i) = -1 | x^(i) ; w) = 1 - 1 / (1 + e^(-w · f(x^(i))))
where w is some weight constant (1D vector) we have to learn
(assume w is positive in this example)
= Logistic Regression
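A numeric sketch of these two probabilities in 1D, taking f(x) = x and an assumed positive weight w = 2 (in practice w would be learned):

```python
import math

def p_plus(x, w):
    # P(y = +1 | x ; w) with f(x) = x
    return 1.0 / (1.0 + math.exp(-w * x))

w = 2.0   # assumed weight for illustration
for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    print(x, "P(+1) =", round(p_plus(x, w), 3), "P(-1) =", round(1 - p_plus(x, w), 3))
# More positive x -> P(+1) approaches 1; more negative x -> P(+1) approaches 0.
```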
Separable Case: Deterministic Decision – Many Options
Separable Case: Probabilistic Decision – Clear Preference
[Figure: candidate separating boundaries with class probabilities annotated at nearby points, e.g. 0.7 | 0.3, 0.5 | 0.5, 0.3 | 0.7]
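To see why the probabilistic decision gives a clear preference even when many boundaries separate the data, this sketch compares the likelihood a logistic model assigns to a toy 1D dataset under two separating boundaries; the data and boundary positions are made up:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy separable 1D data: negatives at -2, -1 and positives at +1, +2.
data = [(-2, -1), (-1, -1), (1, +1), (2, +1)]

def likelihood(boundary):
    # Boundary at x = boundary; model P(y = +1 | x) = sigmoid(x - boundary).
    prob = 1.0
    for x, y in data:
        p = sigmoid(x - boundary)
        prob *= p if y == +1 else (1 - p)
    return prob

for b in [0.0, 0.9]:          # both boundaries separate the data perfectly
    print("boundary at", b, "-> likelihood", round(likelihood(b), 3))
# The centered boundary gets the higher likelihood, so it is preferred.
```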
Multiclass Logistic Regression
§ Recall Perceptron:
  § A weight vector for each class: w_y
§ with:
  P(y^(i) | x^(i) ; w) = e^(w_y^(i) · f(x^(i))) / Σ_y' e^(w_y' · f(x^(i)))
  (the softmax function)
§ Softmax with w_red = 0 becomes: Sigmoid
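A minimal softmax sketch over per-class weight vectors (class names and numbers are illustrative); with two classes and w_red set to zero, the blue-class probability indeed matches the sigmoid of w_blue · f(x):

```python
import math

def softmax_probs(weights_by_class, f):
    # P(y | x ; w) = exp(w_y . f(x)) / sum over y' of exp(w_y' . f(x))
    scores = {y: sum(wi * fi for wi, fi in zip(w, f)) for y, w in weights_by_class.items()}
    total = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / total for y, s in scores.items()}

f = (1.0, 2.0)
weights = {"blue": (0.5, 0.3), "red": (0.0, 0.0)}        # w_red = 0
probs = softmax_probs(weights, f)

z_blue = 0.5 * 1.0 + 0.3 * 2.0                           # w_blue . f(x)
print(round(probs["blue"], 3), round(1.0 / (1.0 + math.exp(-z_blue)), 3))
# Both print 0.75: the two-class softmax with w_red = 0 equals the sigmoid.
```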
Next Lecture
§ Optimization