Support Vector Machines

The document provides an overview of Support Vector Machines (SVM), focusing on their use in two-class classification problems by finding the optimal separating hyperplane that maximizes the margin between classes. It discusses the concept of maximal-margin classifiers, limitations such as non-separable data and noise, and introduces the support vector classifier which allows for soft margins. Additionally, the document touches on feature expansion to achieve non-linear decision boundaries in the original feature space.

Support Vector Machines (SVM)

Agenda
In this session, we will discuss:
• Introduction to SVM
• Concept of Hyperplane

Support Vector Machines
• We approach the two-class classification problem in a direct way by trying to find a plane that separates the classes in the feature space.

• Support vector machines (SVMs) choose the linear separator with the largest margin.

Support Vector Machines
• Imagine a situation where you have a two-class classification problem with two predictors, X1 and X2.

• Suppose that the two classes are linearly separable (i.e., one can draw a straight line in which all points
on one side belong to the first class and points on the other side to the second class).

• Then, a natural approach is to find the straight line that gives the largest margin (biggest separation) between the classes (i.e., the points are as far from the line as possible).

• This is the basic idea of a support vector classifier.

Support Vector Machines
• C is the minimum perpendicular distance between each point and the separating line.

• We find the line which maximizes C.

• This line is called the optimal separating hyperplane.

• The classification of a point depends on which side of the line it falls on.
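The side-of-the-line rule can be written as the sign of β0 + β·x. Below is a minimal sketch (not from the slides) with hypothetical coefficients, showing how the sign gives the class and how the perpendicular distance is computed:

```python
import numpy as np

# Hypothetical hyperplane coefficients: beta0 is the intercept, beta the normal vector.
beta0 = -1.0
beta = np.array([2.0, 1.0])

# A few example points in the (X1, X2) feature space.
X = np.array([[1.0, 2.0],
              [0.0, 0.0],
              [0.5, 0.1]])

# Signed score beta0 + beta . x: its sign says which side of the line a point falls on,
# and |score| / ||beta|| is the perpendicular distance to the line.
scores = beta0 + X @ beta
sides = np.sign(scores)
distances = np.abs(scores) / np.linalg.norm(beta)
print(sides, distances)
```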

Support Vector Machines
• What is a hyperplane? In p dimensions, a hyperplane is a flat affine subspace of dimension p − 1, defined by an equation of the form β0 + β1X1 + · · · + βpXp = 0.

Support Vector Machines
• Hyperplane in 2 dimensions: a line of the form β0 + β1X1 + β2X2 = 0, which divides the plane into two halves.

Support Vector Machines
• Separating hyperplanes: a hyperplane separates the two classes if all points of one class lie on one side and all points of the other class lie on the other side; with labels yi ∈ {−1, +1}, this means yi(β0 + β1xi1 + · · · + βpxip) > 0 for all i.

Support Vector Machines
• This idea works just as well with more than two predictors, too!

• For example, with three predictors, you want to find the plane that produces the largest separation
between the classes.

• With more than three dimensions, it becomes hard to visualize a plane, but it still exists. In general, these separating surfaces are called hyperplanes.

• In practice, it is not usually possible to find a hyperplane that perfectly separates two classes.

Summary
Here is a quick recap:
• We discussed the support vector machine, which finds the separating line that gives the largest margin (biggest separation) between the classes.
• We learned the concept of a hyperplane, and that the optimal separating hyperplane maximizes the minimum perpendicular distance between the points and the separating line.
• We then looked at hyperplanes in two dimensions and at separating hyperplanes graphically to understand the concept in depth.
Maximal Margin and Support Vector Classifier

Agenda
In this session, we will discuss:
• The idea of the Maximal-margin classifier
• Limitations of Maximal-margin classifier
• The concept of support vector classifier using an example

Maximal Margin Classifier
Among all separating hyperplanes, find the one that makes the biggest gap or margin between the two
classes.

Constrained optimization problem:

    maximize over β0, β1, ..., βp:   M        (maximizing the margin of the hyperplane, i.e., its width)

    subject to   Σ_{j=1}^{p} βj² = 1,
                 yi(β0 + β1xi1 + . . . + βpxip) ≥ M   for all i = 1, . . . , N.

Distances are measured perpendicular to the hyperplane, and the constraints ensure that each observation is on the correct side of the hyperplane, at a distance of at least M. The observations lying on the margin are the support vectors.

[Figure: two classes plotted against X1 and X2 with the maximal margin hyperplane, its margin boundaries, and the support vectors marked.]
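As a rough sketch (not part of the slides), the maximal margin fit can be approximated with scikit-learn's `SVC` by using a linear kernel and a very large penalty `C` on hypothetical, linearly separable toy data:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data: two well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.3, size=(20, 2)),
               rng.normal([2, 2], 0.3, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

# A very large penalty leaves essentially no tolerance for margin violations,
# so the fit approximates the maximal margin (hard-margin) classifier.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("support vectors:\n", clf.support_vectors_)  # the points lying on the margin
print("coefficients:", clf.coef_[0], "intercept:", clf.intercept_[0])
```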
Maximal Margin Classifier (Issues): Non-separable Data
• The data in the figure are not separable by a linear boundary.

• This is often the case, unless N < p.

• Another issue: perfect separation could lead to overfitting (sensitivity to individual observations).

[Figure: two overlapping classes plotted against X1 and X2; no straight line separates them.]

An additional issue…noisy data
[Figure: two panels of the same separable but noisy data plotted against X1 and X2; one additional observation causes a dramatic change in the fitted maximal-margin boundary.]

Sometimes the data are separable but noisy. This can lead to a poor solution for
the maximal-margin classifier.

The support vector classifier maximizes a soft margin.


- Greater robustness to individual observations
- Better classification of most training observations
Support Vector Classifier
Some observations can be on the wrong side of the margin. Soft margin: the margin can be violated by some of the training observations.

[Figure: two panels of numbered observations plotted against X1 and X2; most points are on the correct side of the margin, while a few are on the wrong side of the margin or even on the wrong side of the hyperplane.]

Constrained optimization problem:

    maximize over β0, β1, ..., βp, ε1, ..., εn:   M

    subject to   Σ_{j=1}^{p} βj² = 1,
                 yi(β0 + β1xi1 + . . . + βpxip) ≥ M(1 − εi),
                 εi ≥ 0,   Σ_{i=1}^{n} εi ≤ C.

The εi are slack variables: they allow individual observations to be on the wrong side of the margin or even of the hyperplane. C is a non-negative tuning parameter that limits the number and severity of the violations.
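A minimal sketch of a soft-margin fit on hypothetical overlapping data. Note that scikit-learn's `C` is a penalty on violations, so it plays the inverse role of the budget C in the formulation above:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical overlapping (non-separable) two-class data.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(30, 2)),
               rng.normal([1.5, 1.5], 1.0, size=(30, 2))])
y = np.array([-1] * 30 + [1] * 30)

# scikit-learn's C penalizes margin violations (inverse of the slide's budget C):
# a small penalty tolerates many violations (soft, wide margin), a large one few.
soft = SVC(kernel="linear", C=0.1).fit(X, y)
strict = SVC(kernel="linear", C=100.0).fit(X, y)

print("support vectors with C=0.1:  ", len(soft.support_))
print("support vectors with C=100.0:", len(strict.support_))
```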
Support Vector Classifier
[Figure: four panels of the same data fit with a large C, a smaller C, an even smaller C, and the smallest C (with C > 0); the margin shrinks along with C.]

• The tolerance for observations being on the wrong side of the margin decreases as C decreases (and so does the margin).

• C = 0 corresponds to the maximal margin hyperplane optimization.
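In practice the tuning parameter is chosen by cross-validation. A sketch using `GridSearchCV` on hypothetical data (again, scikit-learn's `C` is the penalty, not the violation budget):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical overlapping two-class data.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(40, 2)),
               rng.normal([1.5, 1.5], 1.0, size=(40, 2))])
y = np.array([-1] * 40 + [1] * 40)

# Pick the penalty C by 5-fold cross-validation over a small grid.
search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print("best C:", search.best_params_["C"], " CV accuracy:", round(search.best_score_, 3))
```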
Linear boundary can fail
Sometimes, a linear boundary simply won't work, no matter what the value of C is.

[Figure: two classes plotted against X1 and X2 whose separation is clearly non-linear; no straight line separates them well.]

Feature Expansion
• Enlarge the space of features by including transformations; e.g. X1², X1³, X1X2, X1X2², . . . Hence we go from a p-dimensional space to an M > p dimensional space.
• Fit a support-vector classifier in the enlarged space.
• This results in non-linear decision boundaries in the original space.
Example: Suppose we use (X1, X2, X1², X2², X1X2) instead of just (X1, X2). Then the decision boundary would be of the form

    β0 + β1X1 + β2X2 + β3X1² + β4X2² + β5X1X2 = 0

This leads to non-linear decision boundaries in the original space (quadratic conic sections).
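A sketch of feature expansion on hypothetical ring-shaped data: a degree-2 polynomial expansion followed by a linear support vector classifier yields a quadratic boundary in the original (X1, X2) space:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

# Hypothetical data where one class surrounds the other, so no straight line works.
rng = np.random.default_rng(3)
theta = rng.uniform(0, 2 * np.pi, 100)
r = np.concatenate([rng.uniform(0.0, 1.0, 50), rng.uniform(2.0, 3.0, 50)])
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.array([0] * 50 + [1] * 50)

# Degree-2 expansion adds X1^2, X1*X2, X2^2; a *linear* classifier in the enlarged
# space corresponds to a quadratic (conic-section) boundary in the original space.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearSVC(C=1.0, max_iter=10000))
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```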
A simple example

For this example, the altitude feature can be expressed mathematically as an interaction between latitude and longitude.
Summary
Here is a quick recap:
• We discussed the Maximal-margin classifier, which finds the biggest gap or margin between the two classes.
• We talked about the limitations of the Maximal-margin classifier, such as noisy data, non-separable
data, etc.
• We learned the concept of a support vector classifier using an example and understood why it is
better than a Maximal-margin classifier.
Cubic Polynomial and Kernel in Support Vector Machine

Agenda
In this session, we will discuss:
• The concept of Cubic polynomials
• Introduction to Kernels in SVM

Cubic Polynomials
• Here we use a basis expansion of cubic polynomials.
  - From 2 variables to 9.

• The support-vector classifier in the enlarged space solves the problem in the lower-dimensional space.

[Figure: the non-linearly separable data plotted against X1 and X2, now separated by a curved boundary obtained from a linear fit in the cubic-polynomial feature space.]
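A small sketch of the cubic expansion: with 2 input variables, a degree-3 polynomial expansion (without the constant term) produces exactly 9 features, and a linear SVC on them corresponds to a cubic boundary in the original space:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.svm import LinearSVC

# Cubic expansion of (X1, X2) yields 9 features:
# X1, X2, X1^2, X1*X2, X2^2, X1^3, X1^2*X2, X1*X2^2, X2^3.
expand = PolynomialFeatures(degree=3, include_bias=False)
print(expand.fit_transform([[2.0, 3.0]]).shape)  # -> (1, 9)

# A linear support vector classifier on the 9 expanded features corresponds to
# a cubic decision boundary back in the original 2-dimensional space.
model = make_pipeline(expand, LinearSVC(C=1.0, max_iter=10000))
```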

Nonlinearities and Kernels
• Polynomials (especially high-dimensional ones) get wild rather fast.

• There is a more elegant and controlled way to introduce nonlinearities in support-vector classifiers —
through the use of kernels.

• Before we discuss these, we must understand the role of inner products in support-vector classifiers.
Inner Products and Support Vectors
⟨xi, xi′⟩ = Σ_{j=1}^{p} xij xi′j — the inner product between two observations (vectors).

• The linear support vector classifier can be represented as

    f(x) = β0 + Σ_{i=1}^{n} αi ⟨x, xi⟩ — n parameters.

• To estimate the parameters α1, . . . , αn and β0, all we need are the (n choose 2) inner products ⟨xi, xi′⟩ between all pairs of training observations.

• It turns out that most of the α̂i can be zero (they are non-zero only for the support vectors):

    f(x) = β0 + Σ_{i∈S} α̂i ⟨x, xi⟩

  where S is the support set of indices i such that α̂i > 0.
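A minimal numpy check of this representation, assuming a linear `SVC` fit with scikit-learn (whose `dual_coef_` stores α̂i·yi for the support vectors): the decision value for a new point can be rebuilt from inner products with the support vectors alone.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical two-class data.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal([0, 0], 0.8, size=(25, 2)),
               rng.normal([2, 2], 0.8, size=(25, 2))])
y = np.array([-1] * 25 + [1] * 25)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# f(x) = beta0 + sum over support vectors of (alpha_i * y_i) * <x, x_i>.
x_new = np.array([1.0, 1.0])
f_manual = clf.intercept_[0] + np.sum(clf.dual_coef_[0] * (clf.support_vectors_ @ x_new))
f_builtin = clf.decision_function([x_new])[0]
print(np.isclose(f_manual, f_builtin))  # True: only the support vectors contribute
```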
Kernels and Support Vector Machines
• If we can compute inner products between observations, we can fit a SV classifier. Can be quite abstract!

• Some special kernel functions can do this for us. E.g. the polynomial kernel

    K(xi, xi′) = (1 + Σ_{j=1}^{p} xij xi′j)^d

  computes the inner products needed for degree-d polynomials — (p + d choose d) basis functions! A kernel quantifies the similarity of two observations (i.e., it directly generalizes the inner product).

• The solution has the form

    f(x) = β0 + Σ_{i∈S} α̂i K(x, xi)

  also known as the Support Vector Machine (in its non-linear form).
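A sketch of a polynomial-kernel SVM on hypothetical ring-shaped data. In scikit-learn the polynomial kernel is (γ⟨x, x′⟩ + coef0)^d, so setting γ = 1 and coef0 = 1 matches the (1 + ⟨x, x′⟩)^d form above:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical non-linearly separable data (one class surrounding the other).
rng = np.random.default_rng(5)
theta = rng.uniform(0, 2 * np.pi, 100)
r = np.concatenate([rng.uniform(0.0, 1.0, 50), rng.uniform(2.0, 3.0, 50)])
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.array([0] * 50 + [1] * 50)

# Degree-3 polynomial kernel: K(x, x') = (1 + <x, x'>)^3 with gamma=1, coef0=1.
poly_svm = SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0, C=1.0).fit(X, y)
print("training accuracy:", poly_svm.score(X, y))
```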
Radial Kernel

K(xi, xi′) = exp(−γ Σ_{j=1}^{p} (xij − xi′j)²) — the radial (RBF) kernel.

• Implicit feature space; very high dimensional.

• Controls variance by squashing down most dimensions severely.

[Figure: the non-linearly separable data plotted against X1 and X2, fit with a radial-kernel SVM that produces a smooth, curved decision boundary.]
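A corresponding sketch with the radial kernel; `gamma` controls how local the influence of each support vector is (large values give a wigglier boundary):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical ring-shaped data again.
rng = np.random.default_rng(6)
theta = rng.uniform(0, 2 * np.pi, 100)
r = np.concatenate([rng.uniform(0.0, 1.0, 50), rng.uniform(2.0, 3.0, 50)])
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
y = np.array([0] * 50 + [1] * 50)

# Radial kernel: K(x, x') = exp(-gamma * ||x - x'||^2).
rbf_svm = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)
print("training accuracy:", rbf_svm.score(X, y))
```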
Which to use: SVM or Logistic Regression?
• When classes are (nearly) separable, SVM does better than LR. So does LDA.

• When not, LR (with ridge penalty) and SVM are very similar.

• If you wish to estimate probabilities, LR is the choice.



• For nonlinear boundaries, kernel SVMs are popular. Can use kernels with LR and LDA as well, but
computations are more expensive.
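A short sketch of the probability point on hypothetical data: logistic regression returns class probabilities directly, while a plain SVM only returns a signed distance to the boundary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Hypothetical overlapping two-class data.
rng = np.random.default_rng(7)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(40, 2)),
               rng.normal([1.5, 1.5], 1.0, size=(40, 2))])
y = np.array([0] * 40 + [1] * 40)

lr = LogisticRegression().fit(X, y)
svm = SVC(kernel="linear", C=1.0).fit(X, y)

x_new = np.array([[1.0, 1.0]])
print("LR  P(class 1):    ", lr.predict_proba(x_new)[0, 1])    # a probability
print("SVM decision value:", svm.decision_function(x_new)[0])  # a signed distance
```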

Summary
Here is a quick recap:
• We discussed the basis expansion with cubic polynomials and how the support-vector classifier in the enlarged space solves the problem in the lower-dimensional space.
• We learned to introduce nonlinearities in support-vector classifiers through the use of kernels, and
also understood the concept of radial kernels.
• We talked about when the support vector machine is preferred over logistic regression, and when the two behave similarly.
