Support Vector Machines (SVM)
Agenda
In this session, we will discuss:
• Introduction to SVM
• Concept of Hyperplane
Support Vector Machines
• We approach the two-class classification problem in a direct way by trying to find a plane that separates the classes in the feature space.
• Support vector machines (SVMs) choose the linear separator with the largest margin.
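To make this concrete, here is a minimal sketch of fitting a largest-margin linear separator on synthetic two-class data. It assumes scikit-learn and NumPy are available (the slides name no library), and the data are made up for illustration.

```python
# A minimal sketch: fit a linear SVM to two synthetic, well-separated clusters.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two linearly separable clusters in a 2-D feature space (X1, X2).
X = np.vstack([rng.normal(loc=-2.0, size=(50, 2)),
               rng.normal(loc=+2.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# kernel="linear" chooses a separating line; the fit maximizes the margin.
clf = SVC(kernel="linear").fit(X, y)
print("coefficients (beta1, beta2):", clf.coef_)
print("intercept (beta0):", clf.intercept_)
print("support vectors per class:", clf.n_support_)
```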
Support Vector Machines
• Imagine a situation where you have a two-class classification problem with two predictors, X1 and X2.
• Suppose that the two classes are linearly separable (i.e., one can draw a straight line in which all points
on one side belong to the first class and points on the other side to the second class).
• Then, a natural approach is to find the straight line that gives the largest margin (biggest separation) between the classes (i.e., the points are as far from the line as possible).
Support Vector Machines
• C is the minimum perpendicular distance between each point and the separating line; the maximal-margin line is the one that maximizes C.
Support Vector Machines
• What is a hyperplane? A hyperplane in p dimensions is a flat affine subspace of dimension p − 1, defined by an equation of the form β0 + β1X1 + . . . + βpXp = 0.
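As a quick illustration, the sign of β0 + β1X1 + . . . + βpXp tells you which side of the hyperplane a point falls on. Below is a minimal NumPy sketch; the hyperplane coefficients and points are hypothetical, chosen only for the example.

```python
# Classify points by the sign of beta0 + beta1*X1 + beta2*X2 for a
# hypothetical hyperplane 1 - 2*X1 + 3*X2 = 0 (coefficients made up).
import numpy as np

beta0, beta = 1.0, np.array([-2.0, 3.0])
points = np.array([[1.0, 1.0], [2.0, 0.5], [0.0, -1.0]])
side = np.sign(beta0 + points @ beta)  # +1 on one side, -1 on the other
print(side)
```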
Support Vector Machines
• Hyperplane in 2 dimensions: β0 + β1X1 + β2X2 = 0, i.e., a straight line.
Support Vector Machines
• Separating hyperplanes: a hyperplane with all training observations of one class on one side and all observations of the other class on the other side.
Support Vector Machines
• This idea works just as well with more than two predictors, too!
• For example, with three predictors, you want to find the plane that produces the largest separation
between the classes.
• With more than three dimensions, it becomes hard to visualize a plane, but it still exists. In general, these separators are called hyperplanes.
• In practice, it is not usually possible to find a hyperplane that perfectly separates two classes.
Summary
Here is a quick recap:
• We discussed the support vector machine in detail, which is used to find the straight line that gives
the largest margin (biggest separation) between the classes.
• We learned the concept of a hyperplane and saw that the maximal-margin line maximizes the minimum perpendicular distance between the points and the separating line.
• We then discussed hyperplanes in two dimensions and visualized separating hyperplanes graphically to understand the concept in depth.
Maximal Margin and Support Vector Classifier
Agenda
In this session, we will discuss:
• The idea of the Maximal-margin classifier
• Limitations of Maximal-margin classifier
• The concept of support vector classifier using an example
Maximal Margin Classifier
Among all separating hyperplanes, find the one that makes the biggest gap or margin between the two
classes.
maximize M over β0, β1, . . . , βp
subject to Σ_{j=1}^p βj² = 1,
yi(β0 + β1xi1 + . . . + βpxip) ≥ M for all i = 1, . . . , N.
• The constraint guarantees that each observation is on the correct side of the hyperplane, at a distance of at least M from it.
• Support vectors: the observations that lie exactly on the margin and determine the hyperplane.
[Figure: separating hyperplane in the (X1, X2) plane with the margin and its support vectors.]
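As an illustration, the optimization above can be approximated in scikit-learn (an assumption; not named in the slides) by making margin violations extremely expensive: for separable data, a very large penalty parameter yields essentially the hard-margin fit.

```python
# Sketch: near maximal-margin fit on separable synthetic data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, size=(30, 2)),
               rng.normal(+2, 0.5, size=(30, 2))])
y = np.array([-1] * 30 + [+1] * 30)

# Huge penalty C makes violations prohibitively costly (hard-margin limit).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# In scikit-learn's scaling, the margin width is 2 / ||beta||.
print("margin width:", 2.0 / np.linalg.norm(clf.coef_))
print("support vectors:\n", clf.support_vectors_)
```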
Maximal Margin Classifier (Issues): Non-separable Data
• The data shown are not separable by a linear boundary, so no maximal-margin hyperplane exists.
[Figure: two overlapping classes in the (X1, X2) plane.]
An additional issue…noisy data
[Two-panel figure: X1 vs. X2 scatter plots of separable but noisy data and the resulting maximal-margin fits.]
Sometimes the data are separable but noisy. This can lead to a poor solution for
the maximal-margin classifier.
Support Vector Classifier
[Two-panel figure: numbered observations in the (X1, X2) plane, shown on the correct side of the margin, the wrong side of the margin, or the wrong side of the hyperplane.]
• The support vector classifier (soft margin) solves:
maximize M over β0, β1, . . . , βp, ϵ1, . . . , ϵn
subject to Σ_{j=1}^p βj² = 1,
yi(β0 + β1xi1 + . . . + βpxip) ≥ M(1 − ϵi) for all i = 1, . . . , n,
ϵi ≥ 0, Σ_{i=1}^n ϵi ≤ C.
• Slack variables ϵi allow observations to be on the wrong side of the margin, or even on the wrong side of the hyperplane.
• C is a non-negative tuning parameter that bounds the number and severity of the violations.
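A sketch of inspecting the slack variables after a soft-margin fit, again assuming scikit-learn. Caveat: scikit-learn's C is a violation *penalty*, the opposite role of the budget C above, and the slack formula used is the standard ϵi = max(0, 1 − yi f(xi)) in scikit-learn's scaling, where the margin edges sit at f(x) = ±1.

```python
# Sketch: count margin violations under a soft-margin fit on overlapping data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1.0, size=(40, 2)),
               rng.normal(+1, 1.0, size=(40, 2))])  # overlapping classes
y = np.array([-1] * 40 + [+1] * 40)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Slack per point: eps_i = max(0, 1 - y_i * f(x_i)); eps > 0 means the point
# violates the margin, eps > 1 means it is on the wrong side of the hyperplane.
f = clf.decision_function(X)
slack = np.maximum(0.0, 1.0 - y * f)
print("violations (eps > 0):", int((slack > 0).sum()))
print("wrong side of hyperplane (eps > 1):", int((slack > 1).sum()))
```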
Support Vector Classifier
[Four-panel figure: support vector classifier fits for decreasing values of C, from large C to smaller C; the margin narrows as C decreases.]
• The tolerance for observations being on the wrong side of the margin decreases as C decreases (and so does the margin).
• With C = 0, no violations are allowed and we recover the maximal-margin hyperplane optimization.
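Because C governs this bias–variance trade-off, it is typically chosen by cross-validation. A minimal sketch with scikit-learn's GridSearchCV (assumed; remember that scikit-learn's C is inversely related to the budget C in these slides):

```python
# Sketch: choose the tuning parameter by 5-fold cross-validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

grid = GridSearchCV(SVC(kernel="linear"),
                    param_grid={"C": [0.001, 0.01, 0.1, 1, 10, 100]},
                    cv=5)
grid.fit(X, y)
print("best C:", grid.best_params_["C"])
print("cross-validated accuracy:", grid.best_score_)
```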
Linear boundary can fail
[Figure: two classes in the (X1, X2) plane for which any linear decision boundary performs poorly.]
Feature Expansion
• Enlarge the space of features by including transformations, e.g. X1², X1³, X1X2, X1X2², . . .; hence go from a p-dimensional space to an M > p dimensional space.
• Fit a support-vector classifier in the enlarged space.
• This results in non-linear decision boundaries in the original space.
Example: Suppose we use (X1, X2, X1², X2², X1X2) instead of just (X1, X2). Then the decision boundary would be of the form:
β0 + β1X1 + β2X2 + β3X1² + β4X2² + β5X1X2 = 0
This is linear in the enlarged feature space but non-linear in the original (X1, X2) space; see the sketch under "A simple example" below.
A simple example
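A minimal sketch of this idea, assuming scikit-learn: expand (X1, X2) to the degree-2 monomials (exactly the five features of the example above) and fit a *linear* support vector classifier on synthetic data with a circular class boundary.

```python
# Sketch: linear SVC in an enlarged polynomial feature space.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # circular boundary

# degree=2 expansion yields X1, X2, X1^2, X1*X2, X2^2.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearSVC(C=1.0, max_iter=10000))
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```

The fitted boundary is a plane in the 5-dimensional enlarged space, which maps back to a circle-like curve in the original (X1, X2) space.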
Summary
Here is a quick recap:
• We discussed the Maximal-margin classifier, which finds the biggest gap or margin between the two classes.
• We talked about the limitations of the Maximal-margin classifier, such as noisy data, non-separable
data, etc.
• We learned the concept of a support vector classifier using an example and understood why it is
better than a Maximal-margin classifier.
Cubic Polynomials and Kernels in Support Vector Machines
Agenda
In this session, we will discuss:
• The concept of Cubic polynomials
• Introduction to Kernels in SVM
Cubic Polynomials
• Here we use a basis expansion of cubic polynomials: from 2 variables to 9.
• The support-vector classifier in the enlarged space solves the problem in the lower-dimensional space.
[Figure: non-linear decision boundary in the original (X1, X2) space obtained from the cubic basis expansion.]
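Rather than constructing the nine cubic basis features by hand, the same enlarged-space fit can be obtained with a degree-3 polynomial kernel. A minimal sketch, assuming scikit-learn and synthetic data with a roughly cubic boundary:

```python
# Sketch: degree-3 polynomial kernel instead of explicit cubic features.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.uniform(-4, 4, size=(300, 2))
y = ((X[:, 0] ** 3 - 3 * X[:, 0]) > X[:, 1]).astype(int)  # cubic-ish boundary

# The kernel computes enlarged-space inner products implicitly.
clf = SVC(kernel="poly", degree=3, coef0=1.0, C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
```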
Nonlinearities and Kernels
• Polynomials (especially high-dimensional ones) get wild rather fast.
• There is a more elegant and controlled way to introduce nonlinearities in support-vector classifiers —
through the use of kernels.
• Before we discuss these, we must understand the role of inner products in support-vector classifiers.
Inner Products and Support Vectors
⟨xi, xi′⟩ = Σ_{j=1}^p xij xi′j — the inner product between two observations.
• The linear support vector classifier can be represented using only these inner products: f(x) = β0 + Σ_{i=1}^n αi⟨x, xi⟩, where the αi are non-zero only for the support vectors.
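A tiny NumPy sketch of this: the inner products the classifier needs are just the entries of the Gram matrix XXᵀ (the data values here are arbitrary).

```python
# The Gram matrix collects all pairwise inner products of the observations.
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 0.5],
              [0.0, 1.0]])
gram = X @ X.T  # gram[i, k] = <x_i, x_k> = sum_j x_ij * x_kj
print(gram)
```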
Radial Kernel
K(xi, xi′) = exp(−γ Σ_{j=1}^p (xij − xi′j)²), with γ a positive tuning constant.
[Figure: non-linear decision boundary in the (X1, X2) plane produced by an SVM with a radial kernel.]
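A minimal sketch, assuming scikit-learn: the radial kernel computed by hand for one pair of points, and an RBF-kernel SVM fit to ring-shaped synthetic classes.

```python
# Sketch: radial (RBF) kernel by hand, and an RBF SVM fit.
import numpy as np
from sklearn.svm import SVC

def radial_kernel(x, x_prime, gamma=1.0):
    # K(x, x') = exp(-gamma * sum_j (x_j - x'_j)^2)
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2))
y = (np.hypot(X[:, 0], X[:, 1]) > 1.2).astype(int)  # ring-shaped classes

clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("K(x0, x1) =", radial_kernel(X[0], X[1]))
```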
Which to use: SVM or Logistic Regression?
• When classes are (nearly) separable, SVM does better than LR. So does LDA.
• When not, LR (with ridge penalty) and SVM are very similar.
• For nonlinear boundaries, kernel SVMs are popular. Can use kernels with LR and LDA as well, but
computations are more expensive.
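A small sketch of this comparison, assuming scikit-learn and synthetic data: on data that are not cleanly separable, the cross-validated accuracies of a linear SVM and an L2-penalized logistic regression typically come out close.

```python
# Sketch: compare a linear SVM and ridge-penalized logistic regression.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
for name, model in [("linear SVM", SVC(kernel="linear")),
                    ("logistic (L2)", LogisticRegression(penalty="l2",
                                                         max_iter=1000))]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```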
Summary
Here is a quick recap:
• We discussed basis expansions with cubic polynomials, and how the support-vector classifier in the enlarged space solves the problem in the lower-dimensional space.
• We learned how to introduce nonlinearities in support-vector classifiers through the use of kernels, and also understood the concept of radial kernels.
• We talked about when the support vector machine is preferable to logistic regression.